Insight 105

Run Frontier AI for Free — Ollama Cloud Models with OpenCode

No GPU. No subscription. No kidding.

Here's how to run powerful cloud-hosted AI models through Ollama — completely free — using just one command.


The Secret Nobody's Talking About

Most developers assume that running a model like GLM-4.7, GPT-OSS 120B, or Gemma3 27B requires either expensive hardware or a paid cloud API. But Ollama quietly introduced something called cloud models — models that run on Ollama's infrastructure, not your machine, and many of them are free.

The catch? You need a smart way to use them for coding. Enter OpenCode — an AI-powered coding agent that plugs right into Ollama.


What You'll Need

Before anything else, make sure you have both tools installed:

1. Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Or download the installer from https://ollama.com for Windows/macOS.

2. Install OpenCode

npm install -g opencode-ai

OpenCode is a terminal-based AI coding agent. Think of it as a free, local alternative to GitHub Copilot Workspace.

That's it. You don't need to ollama pull anything. Cloud models are fetched on-demand — no gigabytes of weights filling up your disk.
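Before launching anything, it's worth confirming both CLIs actually landed on your PATH. A minimal check (works in any POSIX shell):

```shell
# Confirm the ollama and opencode CLIs are reachable on PATH
for tool in ollama opencode; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: installed"
  else
    echo "$tool: missing"
  fi
done
```

If either line says `missing`, re-run the matching install step above before continuing.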


One Command to Rule Them All

ollama launch opencode --model glm-4.7:cloud

That's the whole magic. Swap glm-4.7:cloud with any free cloud model below and you're done.

OpenCode will open an interactive coding session powered by the model you chose, running in Ollama's cloud — no local GPU required.
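If your Ollama build predates the launch subcommand, you can also point OpenCode at a local Ollama yourself through an opencode.json file. The snippet below follows OpenCode's custom-provider pattern (OpenAI-compatible endpoint on Ollama's default port); treat the exact fields as a sketch and check the OpenCode docs for the current schema:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "gemma3:27b-cloud": { "name": "Gemma3 27B (cloud)" }
      }
    }
  }
}
```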


Free Cloud Models You Can Use Right Now

These models have been tested and confirmed to work without a Pro subscription:

| Model | Command | Notes |
|---|---|---|
| GLM-4.7 (Z.AI) | `--model glm-4.7:cloud` | Strong reasoning, free cloud-only |
| GPT-OSS 20B (OpenAI) | `--model gpt-oss:20b-cloud` | OpenAI open-source, confirmed ✅ |
| Gemma3 27B (Google) | `--model gemma3:27b-cloud` | Google's latest, confirmed ✅ |
| Gemma3 4B (Google) | `--model gemma3:4b-cloud` | Lighter, fast, great for quick tasks |
| Devstral Small 2 (Mistral) | `--model devstral-small-2:24b-cloud` | Coding-specialized, confirmed ✅ |
| Minimax M2.5 | `--model minimax-m2.5:cloud` | Top open-source SWE benchmark |
| Qwen3 Coder 480B (Alibaba) | `--model qwen3-coder:480b-cloud` | Massive coding model, free! |
| Qwen3 Next 80B (Alibaba) | `--model qwen3-next:80b-cloud` | General purpose powerhouse |
| Qwen3 Coder Next | `--model qwen3-coder-next:cloud` | Latest Qwen coder variant |
| Nemotron 3 Super (NVIDIA) | `--model nemotron-3-super:cloud` | NVIDIA's flagship reasoning model |
| Ministral 3 (Mistral) | `--model ministral-3:8b-cloud` | Efficient, fast, multilingual |
| RNJ-1 (Essential AI) | `--model rnj-1:8b-cloud` | Lightweight and capable |

Tip: Start with gemma3:27b-cloud or gpt-oss:20b-cloud — both responded instantly in testing.
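You can re-run the availability check yourself before committing to a session. A quick smoke test (the same method used to verify the table, assuming ollama is installed and you're signed in) is to send each model a one-word prompt:

```shell
# Send a trivial prompt to a few free cloud models; a reply means the free tier works
for model in gemma3:27b-cloud gpt-oss:20b-cloud glm-4.7:cloud; do
  echo "== $model =="
  ollama run "$model" "hello" || echo "unavailable (not signed in, or tier changed)"
done
```

The `||` fallback keeps the loop going, so one Pro-gated or renamed model won't abort the whole check.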


Example Session

# Launch OpenCode with Google's Gemma3 27B — free, no install needed
ollama launch opencode --model gemma3:27b-cloud
OpenCode v1.x — powered by gemma3:27b-cloud
Type your task or press Ctrl+C to exit.

> Refactor this Python function to be async and add error handling

◆ Reading your codebase...
◆ Generating solution...

[gemma3:27b-cloud] Here's the refactored version:
...

What About the Pro-Only Models?

Some of the most capable frontier models require an Ollama Pro subscription. You'll get a 403 Forbidden if you try them without one:

| Model | Tier |
|---|---|
| DeepSeek V4 Pro (1.6T MoE) | ❌ Pro only |
| Qwen3.5 Cloud | ❌ Pro only |
| Kimi K2.6 (multimodal agentic) | ❌ Pro only |
| GLM-5.1 (SOTA SWE-Bench) | ❌ Pro only |
| Mistral Large 3 (675B) | ❌ Pro only |
| Gemini 3 Flash Preview | ❌ Pro only |

These are genuinely frontier-class models. If you find the free tier useful, it's worth checking out Ollama Pro to unlock them.
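If you're unsure which tier a model falls under, you can probe it directly: a 403 in the error output means it's Pro-gated. The model tag below is purely illustrative (check ollama.com/search?c=cloud for real tags):

```shell
# Probe whether a cloud model is free-tier; a 403 in the error output means Pro-only.
MODEL="deepseek-v4:cloud"   # illustrative tag, not verified
if ollama run "$MODEL" "hello" 2>&1 | grep -qi "403"; then
  echo "$MODEL: Pro subscription required"
else
  echo "$MODEL: responded (or failed for another reason)"
fi
```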


Why This Matters

| | Traditional Setup | Ollama Cloud |
|---|---|---|
| Hardware | GPU required | Any machine |
| Disk space | 5–290 GB per model | 0 GB |
| Setup time | Minutes to hours | Seconds |
| Cost | Hardware + electricity | Free |
| Model size | Limited by your VRAM | Up to 480B parameters |

Running Qwen3 Coder 480B locally would require ~290 GB of disk and multiple high-end GPUs. Via Ollama Cloud? One command, zero setup.
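You can check the zero-disk-footprint claim yourself: cloud models add nothing to Ollama's local model store (default path on Linux/macOS shown; this assumes you haven't relocated it with the OLLAMA_MODELS environment variable):

```shell
# Local pulls live under ~/.ollama/models by default; cloud models add nothing here
du -sh ~/.ollama/models 2>/dev/null || echo "no local model store yet"
```

Run it before and after a cloud session: the size (or absence of the directory) won't change.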


Quick Reference

# Coding-focused (recommended for OpenCode)
ollama launch opencode --model qwen3-coder:480b-cloud
ollama launch opencode --model devstral-small-2:24b-cloud
ollama launch opencode --model gpt-oss:20b-cloud

# General purpose powerhouses
ollama launch opencode --model gemma3:27b-cloud
ollama launch opencode --model glm-4.7:cloud
ollama launch opencode --model nemotron-3-super:cloud

# Lightweight and fast
ollama launch opencode --model gemma3:4b-cloud
ollama launch opencode --model ministral-3:8b-cloud
ollama launch opencode --model rnj-1:8b-cloud

Final Thought

The AI infrastructure barrier is quietly disappearing. You don't need a $10,000 GPU cluster or a pricey API subscription to run capable, large-scale models anymore. With Ollama Cloud and OpenCode, a curl command and an npm install are all that stand between you and a 480-billion-parameter coding assistant.

No GPU. No subscription. No excuses.


Models tested directly via ollama run <model> "hello" — May 10, 2026. Free tier availability may change. Check ollama.com/search?c=cloud for the latest.
