I spent about a week migrating my agent setup off OpenClaw and onto Hermes Agent, with a hybrid backend. NVIDIA-hosted inference for the heavy stuff, a local Ollama daemon for everything I didn't want leaving the box. This is what I ended up with after two false starts and one evening of yelling at WSL networking.
If you're already happy with your current loop, skip this. If you've been hitting the same things I have (flaky routing, agents that lose the plot on multi-file edits, surprise bills), maybe useful.
References I actually opened while writing:
- Hermes repo: https://github.com/NousResearch/hermes-agent
- Hermes docs: https://hermes-agent.nousresearch.com/docs/
- OpenClaw repo: https://github.com/openclaw/openclaw
- OpenClaw docs: https://docs.openclaw.ai
- NVIDIA NIM / build catalog: https://build.nvidia.com
- Ollama repo: https://github.com/ollama/ollama
- OpenRouter coding category: https://openrouter.ai/apps/category/coding
Note on framing. OpenClaw and Hermes overlap but they are not the same shape of tool. OpenClaw is a personal-assistant gateway whose main surface is messaging channels (WhatsApp, Telegram, Discord, iMessage). Hermes ships a terminal TUI plus a messaging gateway plus a skills/memory loop, and includes a hermes claw migrate command that imports OpenClaw configs directly. So the comparison below is based on how I was using OpenClaw in practice (terminal-first), not its actual elevator pitch. If you came to OpenClaw for the WhatsApp bot, your mileage will differ.
Why I bothered
The OpenRouter coding category keeps growing. Hermes started showing up in those threads, and Nous shipping an explicit OpenClaw migration path made "try it for a week" cheap.
My OpenClaw setup had drifted into a state I didn't trust. Three things in particular:
- Long sessions silently lost context after compaction. Asked it to recall a migration plan we'd sketched two hours earlier and got back a confidently wrong summary that mixed in details from a totally different repo.
- Provider routing was opaque. I'd ask for a specific model and it'd quietly fall back to something cheaper. Only noticed because the latency dropped.
- Multi-file refactors needed too much hand-holding. Edit file A correctly, edit B as if A's edit hadn't happened, loop.
Not OpenClaw-specific. General failure mode of agents that conflate "context window" with "memory." But it added up.
Why Hermes Over an OpenClaw-Style Workflow
What got better, in roughly a week of use:
- Skills + persistent memory as first-class concepts. Hermes has a built-in skill loop and FTS5 session search. OpenClaw has a skills system too (ClawHub) but cross-session recall in Hermes felt tighter. Asking "how did we set up the OpenRouter pinning last week" actually returned the snippet.
- A real terminal UI. hermes drops into a TUI with multiline editing, slash-command autocomplete, conversation history, streaming tool output. OpenClaw's chat surface is fine. Hermes' is just better suited to how I work.
- Config is YAML. Everything in ~/.hermes/config.yaml, secrets in ~/.hermes/.env. You can diff it. You can copy it.
- hermes model for switching providers. Or hermes config set model openrouter/google/gemini-2.5-flash directly. No restart dance.
Where OpenClaw is still the better pick:
- Messaging-first workflows. OpenClaw's channel coverage is broader (WeChat, Matrix, Feishu, LINE, Nostr, the long tail). If your bot lives on WhatsApp, stay there.
- Live Canvas and Voice Wake are nice if you're building a voice assistant rather than a coding agent. Hermes has voice memo transcription, not the same thing.
- If you're on Node-only infra, npm install -g openclaw@latest is one line. Hermes pulls in uv, Python 3.11, Node, ripgrep, ffmpeg.
The thing that mattered most to me architecturally: Hermes treats the provider as configuration, not code. The same model.base_url field handles NVIDIA NIM, Ollama (local or Cloud), OpenRouter, anything OpenAI-compatible. One CLI command flips between them. OpenClaw can do this too. Hermes' YAML-first version is just faster to reason about when something breaks at 11pm.
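To make "OpenAI-compatible" concrete, here's a minimal sketch of the request shape all three backends accept; only the base URL and key change. The model slug is illustrative, so check it against whichever catalog you're actually pointing at.
# Same payload, different base URL: NVIDIA NIM, local Ollama, or OpenRouter.
# Slug is illustrative; verify it exists before pasting.
BASE_URL="https://integrate.api.nvidia.com/v1"   # or http://localhost:11434/v1, or https://openrouter.ai/api/v1
API_KEY="${NVIDIA_API_KEY:-replace-me}"          # local Ollama ignores the key entirely
curl -sS "$BASE_URL/chat/completions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta/llama-3.1-70b-instruct", "messages": [{"role": "user", "content": "Say hi in one word."}]}'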
Installation
macOS daily, Ubuntu workstation, WSL2 on a Windows laptop I travel with. Same one-liner everywhere.
macOS
The official installer handles uv, Python 3.11, Node, ripgrep, ffmpeg:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
Output is illustrative, lines vary by version:
==> Installing uv
==> Installing Python 3.11
==> Installing Node.js
==> Cloning hermes-agent
==> Symlinking ~/.local/bin/hermes
✓ Hermes installed. Run: source ~/.zshrc && hermes
Then:
source ~/.zshrc
hermes --version
hermes doctor
hermes doctor is the most useful command during setup. Checks PATH, config location, provider reachability. Run it before anything else.
One macOS thing that cost me twenty minutes: if you have an older Homebrew Python on PATH, the installer prefers its own uv-managed Python (correct), but python3 on your shell is now a different interpreter than the one Hermes is using. Mostly fine, occasionally surprising when you're debugging. If you're hacking on Hermes itself, prefer the dev path:
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
./setup-hermes.sh
./hermes
Ubuntu / Linux
Same one-liner.
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
If you're on a minimal server image without curl, install it first (sudo apt install -y curl). Installer pulls into ~/.hermes/ and symlinks ~/.local/bin/hermes. Make sure that's on PATH:
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
which hermes
hermes doctor
If which hermes is empty, your rc is overriding PATH late. zsh+oh-my-zsh does this. Grep your .zshrc for export PATH= lines that come after the installer's edits.
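A quick way to confirm the ordering problem (assumes zsh; swap in ~/.bashrc for bash):
grep -n 'export PATH' ~/.zshrc     # is something resetting PATH after the installer's line?
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc   # re-append so it wins
exec zsh -l
which hermes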
Windows + WSL2
The Hermes README has a native Windows PowerShell installer flagged as early beta:
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
I tried it. It works, but I went back to WSL2 within a day. Inside WSL2 Ubuntu the Linux one-liner is fine:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
WSL was annoying as hell, mostly for reasons that aren't Hermes' fault.
- Don't put your project on /mnt/c/.... The 9P translation layer makes file watches and large reads slow enough that tool calls visibly lag. Workspaces on the WSL native filesystem (~/work/...) only.
- If you install Ollama on Windows and Hermes inside WSL, you have to reach the Windows host from WSL. Host IP is in /etc/resolv.conf as nameserver (sketched below). On some configurations it changes between reboots. I gave up. Installed Ollama inside WSL.
- Ctrl+C in some Windows terminals doesn't propagate cleanly to long tool calls. Windows Terminal is better than the legacy console here. Don't use the legacy one.
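For completeness, this is the host-IP lookup I was fighting before giving up. It only works when WSL2 is in the default NAT mode with an auto-generated resolv.conf, and Ollama on the Windows side has to be listening on more than 127.0.0.1.
# From inside WSL2: find the Windows host IP, then point Hermes at an Ollama running on Windows.
# Hypothetical wiring -- I ended up installing Ollama inside WSL instead.
WIN_HOST="$(awk '/^nameserver/ {print $2; exit}' /etc/resolv.conf)"
echo "Windows host: $WIN_HOST"
echo "OLLAMA_BASE_URL=http://${WIN_HOST}:11434/v1" >> ~/.hermes/.env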
The Hermes browser-based dashboard chat pane requires WSL2 specifically (it uses a POSIX PTY). Classic CLI and gateway run natively. So if you only need the terminal, the PowerShell install is technically fine. I just didn't trust it enough.
First-Run Setup
hermes setup
Wizard. Walks you through provider selection, key entry, writes the config. If you have an existing ~/.openclaw it offers to migrate skills, memories, command allowlists, API keys. From the README:
hermes claw migrate # Interactive migration
hermes claw migrate --dry-run # Preview what would be migrated
hermes claw migrate --preset user-data # Migrate without secrets
hermes claw migrate --overwrite # Overwrite existing conflicts
Run --dry-run first. It prints exactly what would be copied where. Useful, and the kind of thing that suggests someone actually thought about the migration UX. I imported user-data only and re-pasted my keys by hand because the OpenClaw config had three stale keys I'd forgotten about.
After setup:
~/.hermes/
├── config.yaml # Settings (model, terminal, TTS, compression, etc.)
├── .env # API keys and secrets
├── auth.json # OAuth provider credentials
├── SOUL.md # Primary agent identity
├── memories/ # Persistent memory
├── skills/ # Agent-created and imported skills
├── cron/ # Scheduled jobs
├── sessions/ # Gateway sessions
└── logs/ # Error and gateway logs
HERMES_HOME overrides the location if you want parallel installations.
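A throwaway parallel install looks something like this, assuming every subcommand respects the variable (the docs say it overrides the home location; I've only used it briefly):
export HERMES_HOME="$HOME/hermes-scratch"   # hypothetical scratch location
mkdir -p "$HERMES_HOME"
hermes setup      # writes config.yaml/.env under ~/hermes-scratch instead of ~/.hermes
hermes doctor
unset HERMES_HOME # back to the normal install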
hermes config after setup gives you a one-screen view of where everything resolves from. Useful to verify the model is actually pointed where you think.
NVIDIA Model Configuration
NVIDIA's hosted endpoints (build.nvidia.com, the NIM-style ones) are OpenAI-compatible. Hermes already speaks OpenAI-compatible. So plugging them in is base URL plus key.
Why I leaned on them:
- Latency was good and stayed good. I expected hosted endpoints to be uneven. Over a week they weren't, with one exception (more on that below).
- Llama variants, Qwen-Coder variants, DeepSeek-Coder, Nemotron, all reachable from one provider with one key. No juggling four credentials.
- A 70B-class model running in the cloud is, from my workstation's perspective, free RAM.
The exception: on April 18 the integrate.api.nvidia.com endpoint started throwing 5xx for about twenty minutes around midday Pacific. Hermes retried with backoff but the session was effectively frozen until I noticed and flipped to local Ollama. Not a big deal. Worth knowing the failure mode.
Getting a key
Sign in at build.nvidia.com, generate an API key. Treat it like any other credential.
Wiring it into Hermes
Fastest path is the env-var route. Hermes reads provider keys and base URLs from ~/.hermes/.env:
# ~/.hermes/.env
NVIDIA_API_KEY=nvapi-...
NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1
HERMES_INFERENCE_PROVIDER=nvidia
NVIDIA_BASE_URL defaults to https://integrate.api.nvidia.com/v1 per the Hermes environment variables reference, so you can omit it unless you're hitting a self-hosted NIM. HERMES_INFERENCE_PROVIDER accepts values like nvidia, openrouter, anthropic, ollama-cloud (from the same reference). It's the global "which provider is the default" switch.
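Before involving Hermes at all, I like to confirm the key works with a bare curl. Listing /models is the cheapest check (assumes .env is plain KEY=value lines, so sourcing it works):
source ~/.hermes/.env   # picks up NVIDIA_API_KEY (and NVIDIA_BASE_URL if you set it)
curl -sS "${NVIDIA_BASE_URL:-https://integrate.api.nvidia.com/v1}/models" \
  -H "Authorization: Bearer $NVIDIA_API_KEY" | head -c 400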
Pick a model:
hermes model
# or directly
hermes config set model nvidia/meta/llama-3.1-70b-instruct
Model identifier depends on what NVIDIA exposes in the catalog at the time. The catalog moves. Verify the slug before pasting. Common ones I've used:
- meta/llama-3.1-70b-instruct
- qwen/qwen2.5-coder-32b-instruct
- deepseek-ai/deepseek-coder-...
- Various nemotron variants
If the slug doesn't resolve, Hermes tells you on first call rather than at config time. Mildly annoying. Fine once you know.
YAML equivalent
# ~/.hermes/config.yaml
model:
  default: meta/llama-3.1-70b-instruct
  provider: nvidia
  base_url: ""              # leave empty to use NVIDIA_BASE_URL from .env
  context_length: 32768
The Hermes config docs are explicit about base_url: when set, Hermes ignores the provider and calls that endpoint directly. Useful for self-hosted NIMs. Footgun if you forget about a stale URL from an experiment three weeks ago. Empty string is the safe default.
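My cheap guard against the stale-URL footgun, run before any deeper debugging (the grep over hermes config assumes the resolved-settings view mentioned earlier prints these fields):
grep -n 'base_url' ~/.hermes/config.yaml
hermes config | grep -iE 'provider|base_url' || true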
Operational notes
Rate limits exist and I haven't found a definitive published cap. In practice agent loops hit limits well before chat sessions do, because every tool result is going back into the context.
Free-tier quotas are real. I'd planned to do bulk repo analysis on hosted models. Switched to local once I realized how fast the quota burns. Reserve the hosted ones for the parts that benefit from a 70B-class model.
Advertised context windows and the windows that actually behave well are not the same. Past ~32K tokens on some models the recall got noticeably worse. I cap context_length at 32768 even on models that claim more. (There's a separate question about whether the model is "using" the long context or just paying its memory cost. I haven't dug in.)
Default timeouts were fine for chat, occasionally too short for long tool-augmented planning. Bump if you're seeing premature aborts.
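For what it's worth, the dotted-path hermes config set syntax used elsewhere in this post should cover the context cap too; the timeout key name below is a guess, so verify it against hermes config and the schema before relying on it:
hermes config set model.context_length 32768
# hermes config set model.timeout 300    # hypothetical key name -- check the config docs first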
Ollama Configuration
NVIDIA-hosted is great until you're on a plane, on a hotspot, or working on something you don't want leaving the machine.
Install
macOS:
brew install ollama
brew services start ollama
Linux:
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable --now ollama
Default listen address is 127.0.0.1:11434. If you need it reachable on a LAN, set OLLAMA_HOST=0.0.0.0:11434 before starting. Heads up: there's no auth on the Ollama API. Don't expose it on a public network. Don't bind it to 0.0.0.0 on coffee-shop wifi.
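If you do need LAN access on a trusted network, the systemd route is an environment override (sketch below; on macOS the Homebrew service is fussier about environment, so there I'd just run ollama serve in a shell with the variable set):
# Linux/systemd: persist OLLAMA_HOST for the service, then restart.
sudo systemctl edit ollama
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama
# One-off alternative (any platform): OLLAMA_HOST=0.0.0.0:11434 ollama serve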
Pulling models
ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:14b # if you have the VRAM
ollama pull deepseek-coder-v2:16b # MoE, surprisingly fast for its size
ollama pull llama3.1:8b
pulling manifest
pulling abc123... 100% ▕████████████████▏ 4.7 GB
verifying sha256 digest
writing manifest
success
Verify it runs before wiring it into Hermes:
ollama run qwen2.5-coder:7b "write a python function that reverses a string"
If that hangs >30s on first invocation, the model is loading. Subsequent calls are fast. If it hangs forever, you're probably on CPU fallback because the GPU couldn't initialize. Check ollama ps and nvidia-smi (or Activity Monitor on a Mac).
Wiring local Ollama into Hermes
This is the gotcha that cost me an hour. Hermes' default for OLLAMA_BASE_URL is https://ollama.com/v1, which is Ollama Cloud, not your local daemon. Want local? Override it:
# ~/.hermes/.env
OLLAMA_API_KEY=ollama # any non-empty string; local Ollama ignores it
OLLAMA_BASE_URL=http://localhost:11434/v1 # local daemon, NOT Ollama Cloud
Doc-verified path uses the YAML provider: custom form, which bypasses provider-name routing and calls base_url directly:
# ~/.hermes/config.yaml
model:
  default: qwen2.5-coder:14b
  provider: custom
  base_url: "http://localhost:11434/v1"
Or from the CLI:
hermes config set model.provider custom
hermes config set model.base_url http://localhost:11434/v1
hermes config set model.default qwen2.5-coder:14b
For Ollama Cloud, leave OLLAMA_BASE_URL at default and set HERMES_INFERENCE_PROVIDER=ollama-cloud. The env-vars reference lists ollama-cloud explicitly. A bare ollama provider isn't documented there at the time of writing, so I stuck with provider: custom for local rather than guess.
Sanity check:
hermes doctor
hermes config
hermes doctor will tell you if the configured base URL is unreachable. Faster signal than waiting for the first chat turn to fail.
Operational notes
VRAM is the constraint. A 14b Q4 quant runs comfortably on a 16 GB GPU. A 32b does not. On my M2 Pro 16 GB Mac, 14b is the practical ceiling and I notice the memory pressure with a browser open.
Quantization matters more than I expected going in. q4_K_M is the sweet spot for coding tasks. q8_0 is noticeably better on nuanced refactors but the memory cost is real and you'll feel it.
CPU fallback is unusable for interactive work. A 7b on pure CPU can take 30+ seconds per response. Fine for batch, painful for an agent loop.
Ollama default context is 2048 tokens on some models. Trips people up constantly. Set num_ctx via the model's Modelfile or pass it through Hermes; verify with ollama show <model>. I lost an evening to this before realizing the model wasn't dumb, it was just blind past the first 2K tokens.
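One way to bake a larger context into a local model is deriving a new tag from a Modelfile; the -16k tag name below is mine, not an upstream one:
cat > Modelfile <<'EOF'
FROM qwen2.5-coder:14b
PARAMETER num_ctx 16384
EOF
ollama create qwen2.5-coder-14b-16k -f Modelfile
ollama show qwen2.5-coder-14b-16k        # confirm num_ctx stuck
# then: hermes config set model.default qwen2.5-coder-14b-16k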
Recommended Model Setup
Qualitative, week of real use, no benchmarks.
| Model | Provider | Coding Quality | Latency | VRAM | Cost | Best For |
|---|---|---|---|---|---|---|
| Llama 3.1 70B Instruct | NVIDIA | Strong | Medium | n/a (hosted) | Free-tier OK | Planning, long-context reasoning |
| Qwen2.5-Coder 32B | NVIDIA | Very strong | Medium | n/a (hosted) | Free-tier OK | Multi-file refactors, code review |
| DeepSeek-Coder (large) | NVIDIA | Strong | Medium | n/a (hosted) | Free-tier OK | Algorithmic / DSA-style tasks |
| Nemotron family | NVIDIA | Variable | Variable | n/a (hosted) | Free-tier OK | Worth A/B-testing on your domain |
| Qwen2.5-Coder 14B (q4_K_M) | Ollama | Solid | Fast | ~10–12 GB | Local only | Daily driver, offline work |
| Qwen2.5-Coder 7B (q4_K_M) | Ollama | OK | Very fast | ~5–6 GB | Local only | Quick edits, autocomplete-style use |
| DeepSeek-Coder-V2 16B (MoE) | Ollama | Strong | Fast | ~10–12 GB | Local only | Surprisingly capable for its footprint |
| Llama 3.1 8B | Ollama | OK | Very fast | ~5–6 GB | Local only | Lightweight planning / chat |
Day-to-day I use Qwen2.5-Coder 32B on NVIDIA for serious work, Qwen2.5-Coder 14B locally for everything else, Llama 3.1 70B on NVIDIA when I need long-context planning. Tried the rest, rotated them out. A coworker on an M3 Max says the local 32B is usable for him; on my 16 GB Pro it isn't, so don't take VRAM numbers above as the floor for everyone.
Switching between them is one line for hosted, three for local because of the base_url switch:
# hosted
hermes config set model nvidia/qwen/qwen2.5-coder-32b-instruct
# local
hermes config set model.provider custom
hermes config set model.base_url http://localhost:11434/v1
hermes config set model.default qwen2.5-coder:14b
hermes model (the interactive picker) does the same thing in fewer keystrokes once you've used it twice.
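I ended up wrapping the two switches in shell functions in ~/.zshrc. A sketch, assuming hermes config set accepts an empty string to clear base_url; adjust the slugs to whatever you actually run:
hermes-hosted() {
  hermes config set model.base_url ""     # clear any stale local override first
  hermes config set model nvidia/qwen/qwen2.5-coder-32b-instruct
}
hermes-local() {
  hermes config set model.provider custom
  hermes config set model.base_url http://localhost:11434/v1
  hermes config set model.default qwen2.5-coder:14b
}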
Real Workflow Improvements
Concrete things that got better:
Repository analysis. Pointing at a 200-file Python repo and asking "where does the auth flow start" used to be a coin flip. With Hermes routing the analysis pass to a 32B-class hosted model and edits to a local 14B, I get useful answers in under a minute, with file paths I can actually open.
Multi-file refactors. Renaming a domain concept across a service used to require me to micromanage every file. Hermes' tool-call sequencing handles "edit A, re-read A, then edit B based on A's new state" without me nudging it each step. Not magic, it still gets confused on circular imports, but the baseline is better.
Long-context exploration works. Pasting a stack trace plus three relevant files into context and asking for a hypothesis is reliable on the 70B hosted model. Local 14B handles shorter cases.
Cross-session recall is the feature I miss most when I temporarily switch back to anything else. "How did I configure the NVIDIA timeout last week" returns the actual config snippet, not a guess. Different in kind.
Skills. I haven't gone deep here yet. The bundled openclaw-migration skill walked me through the import with dry-run previews and that alone saved a chunk of time. The autonomous skill creation after complex tasks is the part I want to evaluate over a longer horizon, ask me in a month.
What didn't change:
Tool calls run my tests fine. Interpreting flaky test output is still on me.
Frontend work. All current models are mediocre at non-trivial CSS. Hermes doesn't fix that.
Truly novel architectural decisions: the agent produces something plausible, which is worse than producing nothing if you're not careful.
Failure Modes and Rough Edges
The section that made me want to write the post.
- OLLAMA_BASE_URL defaults to Ollama Cloud, not local. Most common silent failure I've seen. Override to http://localhost:11434/v1 for local.
- API key not picked up. Hermes reads ~/.hermes/.env at startup. Edit while running, restart the session or run hermes config check. (I keep meaning to file an issue about a hermes reload command. Haven't.)
- OpenRouter routing inconsistencies. The underlying provider OpenRouter selects can change between requests. Pin a provider preference if reproducibility matters.
- Ollama context default of 2048 on some models. Your model isn't dumb. Set num_ctx, verify with ollama show <model>.
- WSL filesystem. File watch events on /mnt/c/... are unreliable. Workspaces on the WSL native filesystem only.
- model.base_url overrides model.provider silently per the docs. A stale base_url from an earlier experiment will quietly route everything to the wrong endpoint. I did this to myself twice.
- Free-tier throttling. NVIDIA's free tier will throttle. Hermes retries on 429s. You'll see a session pause for 5–30s with no obvious indicator unless you're tailing ~/.hermes/logs/.
- Reasoning-heavy variants. Some Nemotron-family reasoning models produce great output 90% of the time and absolute nonsense the other 10%. Worth keeping in your config, don't make them the default.
- Token cost surprises. Long agent loops consume an order of magnitude more tokens than chat sessions because every tool call result goes back in. Watch the dashboard the first few days.
- Migration imports more than you might want. Default preset brings API keys over. Use --preset user-data to skip.
The defaults aren't great. Not wrong, exactly. Just the combination of "Ollama base URL pointing at Cloud, plus 2048 context, plus free-tier quota" produces a setup that works for twenty minutes and then mysteriously degrades, and you spend an evening figuring out which knob.
Troubleshooting
Quick reference for things I've actually hit:
- hermes: command not found. ~/.local/bin not on PATH. Add it.
- PermissionError on config write. Set HERMES_HOME to a writable path.
- 401 Unauthorized from NVIDIA. Key not in ~/.hermes/.env, or rotated. cat ~/.hermes/.env | grep NVIDIA.
- connection refused to Ollama. Daemon not running. ollama serve, or brew services start ollama, or systemctl start ollama.
- Hermes calls https://ollama.com/v1 instead of localhost. OLLAMA_BASE_URL not overridden.
- Ollama model "doesn't follow instructions". Almost always the 2048-context default.
- Tool calls hang forever. Provider timeout too short, or the model is in a tool-call loop. Inspect ~/.hermes/logs/.
- Hermes "loses" the workspace. You're on WSL with the project on /mnt/c/....
- Different answers from the same prompt. Provider-side cache, or a routing layer selecting a different upstream. Pin the provider, disable cache while debugging.
- Migration wizard doesn't see OpenClaw. Wizard looks at ~/.openclaw. Symlink if elsewhere.
- Sudden latency spike. Check the provider's status page. NVIDIA's hosted endpoints have been stable, mostly, but they're not magic and April 18 happened.
When in doubt, hermes doctor first. Catches more first-line problems than you'd expect.
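The checks I actually run, rolled into one rough preflight (the log filename pattern is a guess; adjust to whatever lands in ~/.hermes/logs/):
hermes doctor || true
grep -n 'OLLAMA_BASE_URL\|NVIDIA' ~/.hermes/.env                 # keys present, local URL overridden?
curl -sS --max-time 3 http://localhost:11434/api/tags >/dev/null \
  && echo "local ollama: up" || echo "local ollama: DOWN"
tail -n 20 ~/.hermes/logs/*.log 2>/dev/null || true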
Final Thoughts
Hermes is best for engineers who already have an opinion about how their tooling should work and want an agent that exposes its config rather than hiding it. If you want defaults that just work with no thought, OpenClaw and the more polished alternatives are friendlier on day one. And OpenClaw is genuinely the better tool if your primary surface is messaging channels rather than a terminal.
Where Hermes still needs work, in my opinion:
- OLLAMA_BASE_URL defaulting to Cloud is a usability footgun. A clearer default or a louder warning on first call would help.
- Configuration documentation lags the schema in places. I read source more than once.
- Error messages on misconfig are sometimes cryptic. I'd take slower startup for better diagnostics.
Hybrid setups make sense right now because hosted inference is fast and capable but unreliable in ways out of your control (rate limits, quotas, occasional regressions on newly-deployed models). Local inference is reliable but capacity-constrained. Running both, routing deliberately, gives you a setup that degrades gracefully. Not a revolution. Just how the production-engineering side of any "use a service" problem has always worked. The fact that we're now doing it for inference is the new part.
I'll keep using this. If the NVIDIA endpoints change, or the Hermes config schema churns again, I'll update the post. Probably.
Appendix A — Suggested image directory layout
images/
├── hermes-doctor.png # 'hermes doctor' diagnostic output
├── hermes-config.png # 'hermes config' resolved settings
├── nvidia-dashboard.png # build.nvidia.com API key management
├── nvidia-key-creation.png # NVIDIA API key creation (optional)
├── ollama-running.png # ollama ps / loaded model
├── ollama-pull.png # 'ollama pull' progress bar (optional)
├── workflow-multifile-refactor.gif # multi-file refactor session (optional)
├── workflow-repo-analysis.png # repo analysis output (optional)
└── failure-mode-429.png # 429 retry-with-backoff (optional)
Appendix B — assets/commands.sh
Snippets I keep around as quick-reference. Adjust paths and model identifiers for your setup.
#!/usr/bin/env bash
set -euo pipefail
# --- Hermes install (Linux/macOS/WSL2) ---
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
# Reload shell so 'hermes' is on PATH
# shellcheck disable=SC1090
source "${ZDOTDIR:-$HOME}/.zshrc" 2>/dev/null || source "$HOME/.bashrc"
# --- First-run setup (also offers OpenClaw migration if ~/.openclaw exists) ---
hermes setup
# --- Optional: explicit OpenClaw migration ---
hermes claw migrate --dry-run
# hermes claw migrate --preset user-data
# hermes claw migrate --overwrite
# --- Provider env: append to ~/.hermes/.env ---
ENV_FILE="$HOME/.hermes/.env"
mkdir -p "$HOME/.hermes"
{
echo "NVIDIA_API_KEY=${NVIDIA_API_KEY:-replace-me}"
echo "NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1"
echo "OLLAMA_API_KEY=ollama" # any non-empty value
echo "OLLAMA_BASE_URL=http://localhost:11434/v1" # local Ollama, NOT Ollama Cloud
echo "HERMES_INFERENCE_PROVIDER=nvidia"
} >> "$ENV_FILE"
# --- Ollama install + local models ---
# macOS: brew install ollama && brew services start ollama
# Linux: curl -fsSL https://ollama.com/install.sh | sh && sudo systemctl enable --now ollama
ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:14b
ollama pull deepseek-coder-v2:16b
ollama pull llama3.1:8b
# --- Sanity checks ---
hermes --version
hermes doctor || true
hermes config
ollama list
ollama ps
# --- Smoke tests ---
ollama run qwen2.5-coder:7b "print('hello from local')" || true
curl -sS "${NVIDIA_BASE_URL:-https://integrate.api.nvidia.com/v1}/models" \
-H "Authorization: Bearer ${NVIDIA_API_KEY:-replace-me}" | head -c 400 || true
# --- Switch models from CLI ---
# hermes config set model nvidia/meta/llama-3.1-70b-instruct
# hermes config set model nvidia/qwen/qwen2.5-coder-32b-instruct
#
# Local Ollama (provider=custom + base_url; bare 'ollama/...' isn't documented):
# hermes config set model.provider custom
# hermes config set model.base_url http://localhost:11434/v1
# hermes config set model.default qwen2.5-coder:14b
That's the setup.



