Insight 105

Run Frontier AI for Free — Ollama Cloud Models with OpenCode

No GPU. No subscription. No kidding.

Here's how to run powerful cloud-hosted AI models through Ollama — completely free — using just one command.


The Secret Nobody's Talking About

Most developers assume that running a model like GLM-4.7, GPT-OSS 120B, or Gemma3 27B requires either expensive hardware or a paid cloud API. But Ollama quietly introduced something called cloud models — models that run on Ollama's infrastructure, not your machine, and many of them are free.

The catch? You need a smart way to use them for coding. Enter OpenCode — an AI-powered coding agent that plugs right into Ollama.


What You'll Need

Before anything else, make sure you have both tools installed:

1. Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

Or download the installer from https://ollama.com for Windows/macOS.

2. Install OpenCode

npm install -g opencode-ai

OpenCode is a terminal-based AI coding agent. Think of it as a free, local alternative to GitHub Copilot Workspace.

That's it. You don't need to ollama pull anything. Cloud models are fetched on-demand — no gigabytes of weights filling up your disk.
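Before launching anything, it's worth confirming both CLIs actually landed on your PATH. A minimal check (works in any POSIX shell):

```shell
# Confirm the ollama and opencode CLIs are reachable on PATH
for tool in ollama opencode; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: installed"
  else
    echo "$tool: missing"
  fi
done
```

If either line says `missing`, re-run the matching install step above before continuing.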


One Command to Rule Them All

ollama launch opencode --model glm-4.7:cloud

That's the whole magic. Swap glm-4.7:cloud with any free cloud model below and you're done.

OpenCode will open an interactive coding session powered by the model you chose, running in Ollama's cloud — no local GPU required.
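If your Ollama build predates the launch subcommand, you can also point OpenCode at a local Ollama yourself through an opencode.json file. The snippet below follows OpenCode's custom-provider pattern (OpenAI-compatible endpoint on Ollama's default port); treat the exact fields as a sketch and check the OpenCode docs for the current schema:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama",
      "options": { "baseURL": "http://localhost:11434/v1" },
      "models": {
        "gemma3:27b-cloud": { "name": "Gemma3 27B (cloud)" }
      }
    }
  }
}
```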


Free Cloud Models You Can Use Right Now

These models have been tested and confirmed to work without a Pro subscription:

| Model | Command | Notes |
|---|---|---|
| GLM-4.7 (Z.AI) | `--model glm-4.7:cloud` | Strong reasoning, free cloud-only |
| GPT-OSS 20B (OpenAI) | `--model gpt-oss:20b-cloud` | OpenAI open-source, confirmed ✅ |
| Gemma3 27B (Google) | `--model gemma3:27b-cloud` | Google's latest, confirmed ✅ |
| Gemma3 4B (Google) | `--model gemma3:4b-cloud` | Lighter, fast, great for quick tasks |
| Devstral Small 2 (Mistral) | `--model devstral-small-2:24b-cloud` | Coding-specialized, confirmed ✅ |
| Minimax M2.5 | `--model minimax-m2.5:cloud` | Top open-source SWE benchmark |
| Qwen3 Coder 480B (Alibaba) | `--model qwen3-coder:480b-cloud` | Massive coding model, free! |
| Qwen3 Next 80B (Alibaba) | `--model qwen3-next:80b-cloud` | General purpose powerhouse |
| Qwen3 Coder Next | `--model qwen3-coder-next:cloud` | Latest Qwen coder variant |
| Nemotron 3 Super (NVIDIA) | `--model nemotron-3-super:cloud` | NVIDIA's flagship reasoning model |
| Ministral 3 (Mistral) | `--model ministral-3:8b-cloud` | Efficient, fast, multilingual |
| RNJ-1 (Essential AI) | `--model rnj-1:8b-cloud` | Lightweight and capable |

Tip: Start with gemma3:27b-cloud or gpt-oss:20b-cloud — both responded instantly in testing.
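You can re-run the availability check yourself before committing to a session. A quick smoke test (the same method used to verify the table, assuming ollama is installed and you're signed in) is to send each model a one-word prompt:

```shell
# Send a trivial prompt to a few free cloud models; a reply means the free tier works
for model in gemma3:27b-cloud gpt-oss:20b-cloud glm-4.7:cloud; do
  echo "== $model =="
  ollama run "$model" "hello" || echo "unavailable (not signed in, or tier changed)"
done
```

The `||` fallback keeps the loop going, so one Pro-gated or renamed model won't abort the whole check.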


Example Session

# Launch OpenCode with Google's Gemma3 27B — free, no install needed
ollama launch opencode --model gemma3:27b-cloud
OpenCode v1.x — powered by gemma3:27b-cloud
Type your task or press Ctrl+C to exit.

> Refactor this Python function to be async and add error handling

◆ Reading your codebase...
◆ Generating solution...

[gemma3:27b-cloud] Here's the refactored version:
...

What About the Pro-Only Models?

Some of the most capable frontier models require an Ollama Pro subscription. You'll get a 403 Forbidden if you try them without one:

| Model | Tier |
|---|---|
| DeepSeek V4 Pro (1.6T MoE) | ❌ Pro only |
| Qwen3.5 Cloud | ❌ Pro only |
| Kimi K2.6 (multimodal agentic) | ❌ Pro only |
| GLM-5.1 (SOTA SWE-Bench) | ❌ Pro only |
| Mistral Large 3 (675B) | ❌ Pro only |
| Gemini 3 Flash Preview | ❌ Pro only |

These are genuinely frontier-class models. If you find the free tier useful, it's worth checking out Ollama Pro to unlock them.
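If you're unsure which tier a model falls under, you can probe it directly: a 403 in the error output means it's Pro-gated. The model tag below is purely illustrative (check ollama.com/search?c=cloud for real tags):

```shell
# Probe whether a cloud model is free-tier; a 403 in the error output means Pro-only.
MODEL="deepseek-v4:cloud"   # illustrative tag, not verified
if ollama run "$MODEL" "hello" 2>&1 | grep -qi "403"; then
  echo "$MODEL: Pro subscription required"
else
  echo "$MODEL: responded (or failed for another reason)"
fi
```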


Why This Matters

| | Traditional Setup | Ollama Cloud |
|---|---|---|
| Hardware | GPU required | Any machine |
| Disk space | 5–290 GB per model | 0 GB |
| Setup time | Minutes to hours | Seconds |
| Cost | Hardware + electricity | Free |
| Model size | Limited by your VRAM | Up to 480B parameters |

Running Qwen3 Coder 480B locally would require ~290 GB of disk and multiple high-end GPUs. Via Ollama Cloud? One command, zero setup.
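You can check the zero-disk-footprint claim yourself: cloud models add nothing to Ollama's local model store (default path on Linux/macOS shown; this assumes you haven't relocated it with the OLLAMA_MODELS environment variable):

```shell
# Local pulls live under ~/.ollama/models by default; cloud models add nothing here
du -sh ~/.ollama/models 2>/dev/null || echo "no local model store yet"
```

Run it before and after a cloud session: the size (or absence of the directory) won't change.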


Quick Reference

# Coding-focused (recommended for OpenCode)
ollama launch opencode --model qwen3-coder:480b-cloud
ollama launch opencode --model devstral-small-2:24b-cloud
ollama launch opencode --model gpt-oss:20b-cloud

# General purpose powerhouses
ollama launch opencode --model gemma3:27b-cloud
ollama launch opencode --model glm-4.7:cloud
ollama launch opencode --model nemotron-3-super:cloud

# Lightweight and fast
ollama launch opencode --model gemma3:4b-cloud
ollama launch opencode --model ministral-3:8b-cloud
ollama launch opencode --model rnj-1:8b-cloud

Final Thought

The AI infrastructure barrier is quietly disappearing. You don't need a $10,000 GPU cluster or a pricey API subscription to run capable, large-scale models anymore. With Ollama Cloud and OpenCode, a curl command and an npm install are all that stand between you and a 480-billion-parameter coding assistant.

No GPU. No subscription. No excuses.


Models tested directly via ollama run <model> "hello" — May 10, 2026. Free tier availability may change. Check ollama.com/search?c=cloud for the latest.
