Mary Olowu

I Stopped Using Claude Code as a Giant Prompt and Started Using It as Project Ops

If you use Claude Code on a real project for more than one-off coding tasks, you eventually hit the same wall:

the model is good at solving the task in front of it, but every new session still has to reconstruct the project.

For me, that got especially annoying in a solo-dev monorepo. I was not just asking Claude to write code. I was also using it for:

  • backlog triage
  • bug capture
  • planning the next task
  • weekly status summaries
  • preserving decisions across sessions

At some point I realized I was trying to solve a workflow problem with a better prompt.

That was the wrong move.

What helped was building a thin project-ops layer around Claude Code instead.

My current version uses Jira MCP for backlog work, Confluence for published reports, a local JSON context DB for working memory, maintainer docs for durable context, and a few commands like /standup, /bug, and /rfe.

Then I pulled the reusable parts into a public starter repo without shipping the private project details around them.

The repo is here: restofstack/claude-project-ops-starter.

TL;DR

The useful part of my setup is not one giant prompt.

It is:

  1. a short CLAUDE.md for guardrails
  2. a docs/maintainers/ folder for durable project context
  3. a tiny local JSON file for rolling memory
  4. real systems of record for backlog, PRs, and releases
  5. reusable commands for common project-ops tasks

That is the pattern I extracted into a public starter repo.

The Problem With Default AI Usage on Ongoing Projects

The default interaction pattern looks like this:

  1. open Claude Code
  2. paste context
  3. explain the task
  4. repeat tomorrow

That is fine for isolated implementation work.

It breaks down when each session has to renegotiate:

  • what matters in the repo
  • where architecture context lives
  • what work is already in progress
  • which tools are authoritative
  • how status should be reported

Once a project is large enough, "just paste more context" stops being a serious strategy.

The Structure I Ended Up With

This is the rough shape:

```
CLAUDE.md
docs/
  maintainers/
    README.md
    overview/
    development/
.workspace-temp/
  context-db.json
.claude/
  commands/
    standup.md
    bug.md
    rfe.md
    reflect.md
    weekly-report.md
```

Each part has a different job.

1. CLAUDE.md Is for Rules, Not Everything

I keep CLAUDE.md short and boring on purpose.

It only contains the repo-level rules that should apply in every session, things like:

  • prefer the existing system of record over invented state
  • finish work in progress before proposing new work
  • never fabricate backlog items or counts
  • keep outputs concise and actionable

That file is not where I put architecture notes, runbooks, or a giant project brain dump.

If you overload it, both you and the model stop trusting it.
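
For a sense of scale, the whole file can be roughly this (illustrative wording, not the literal file from the starter):

```markdown
# Project rules

- The backlog system is the source of truth. Never invent ticket IDs or counts.
- Durable context lives in docs/maintainers/. Check there before asking me to re-explain the repo.
- Finish in-progress work before proposing new work.
- Keep outputs concise and actionable: lead with the recommendation, not the preamble.
```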

2. docs/maintainers/ Holds the Durable Context

Anything that should survive beyond a session goes into maintainers docs:

  • system overviews
  • service boundaries
  • local development notes
  • runbooks
  • release notes

This gives Claude a clean place to start, and it has a side benefit: the docs also become useful to humans.

That matters more than it sounds. If a doc is good enough for future-you, it is usually better context for AI too.

3. Local JSON Memory Is Good Enough

I use a small local JSON file for rolling working memory.

Not a service. Not a database product. Just a file.

It stores a few useful things (a sample shape follows the list):

  • what shipped recently
  • branch or PR context
  • decisions worth remembering
  • estimate patterns
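
Something like this, where the field names are my own illustration rather than a fixed schema:

```json
{
  "_comment": "illustrative sample data, not a required schema",
  "recent_ships": [
    { "date": "2025-06-02", "summary": "billing webhook retries", "pr": "#142" }
  ],
  "active": { "branch": "feat/invoice-export", "next": "wire up the CSV writer" },
  "decisions": [
    "Publish reports to Confluence; backlog comments are working notes only"
  ],
  "estimate_notes": [
    "webhook tasks keep running about 2x the first guess"
  ]
}
```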

This has been the right level of complexity for solo work because it is:

  • cheap
  • easy to inspect
  • easy to edit
  • easy to replace later

I also use a /reflect command to append small memory items instead of trying to manually maintain that file all the time.
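
That command is just a markdown prompt file in .claude/commands/. A trimmed-down reflect.md might read something like this (paraphrased, not the starter's exact file):

```markdown
Append a memory item to .workspace-temp/context-db.json.

1. Summarize $ARGUMENTS into one entry: a date, a one-line summary, and which
   list it belongs in (recent_ships, decisions, or estimate_notes).
2. Append it without rewriting or reordering existing entries.
3. Show me the resulting diff before finishing.
```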

4. Real Systems of Record Stay Real

Claude should not become your shadow Jira, shadow GitHub, or shadow release tracker.

The actual source of truth should stay in the actual system:

  • Jira, GitHub Issues, Linear, or whatever you use for backlog
  • PR system
  • release history
  • docs or wiki

The AI should read from those systems and synthesize useful outputs. It should not replace them.

That boundary is what keeps the workflow practical instead of magical-and-fragile.

The Commands Were the Biggest UX Upgrade

The structure matters, but the commands are what made the whole thing usable day to day.

I extracted the workflows I kept repeating:

  • /standup
  • /bug
  • /rfe
  • /reflect
  • /weekly-report

And when I cleaned the setup for a public starter, a few surrounding workflows became part of the picture too: /checkpoint, /sanitize, /docs-sync, /release-notes, and /root-cause.

Each one has a defined job.

For example:

  • /standup checks memory, git state, PR state, backlog state, and maintainers docs, then recommends the next actions (sketched below)
  • /bug captures a clean bug report without turning it into a full debugging session
  • /weekly-report turns project signals into a durable report instead of a one-off chat response
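
A standup.md in that spirit is mostly an ordered checklist (again paraphrased, not the exact file):

```markdown
Prepare my standup.

1. Read .workspace-temp/context-db.json for recent context and open decisions.
2. Check git status, the current branch, and open PR state.
3. Pull my open items from the backlog system of record. Do not invent items or counts.
4. Cross-check docs/maintainers/ if anything looks ambiguous.
5. Output: what shipped, what is in progress, what is blocked, and the single
   next action you recommend, with a one-line reason.
```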

That consistency removed a lot of prompt thrash.

Without commands, every request is basically a blank page. With commands, the common project tasks have defaults.

Where This Fits Relative to Spec Kit

I still use Spec Kit, and I do not think this starter replaces it.

Spec Kit is useful when I want to take one feature or product change and push it toward a clearer spec and implementation path.

This starter handles a different layer: working memory, maintainer docs, standups, bug and RFE capture, reports, handoff, and the repeatable repo workflows that help Claude pick up the thread again tomorrow.

So for me this fills a different gap than Spec Kit or other Claude "superpowers" style workflows.

Why This Was Especially Useful in a Monorepo

Monorepos create a context problem fast.

Even as a solo developer, I still need a reliable way to answer:

  • what changed recently?
  • what is in progress?
  • what got forgotten?
  • what should be picked up next?
  • what decisions should persist beyond this session?

I did not want to build a custom agent platform to solve that.

I also did not want to keep improvising.

This setup gave me a middle path.

The Portable Part

The original version of this workflow was tied pretty closely to my own stack.

The part worth sharing was the pattern, not the exact tools.

It also needed a cleanup pass before it was publishable. A real working setup usually contains things you should not ship as-is:

  • local Claude settings
  • project-specific IDs and URLs
  • live working memory files
  • internal naming conventions
  • backlog details that only make sense inside the project

That is why I split the starter into adapters like:

  • Jira + Confluence
  • GitHub Issues + repo docs
  • local JSON + markdown only

So if your stack is different, you can still keep the same model:

  • stable guardrails
  • durable docs
  • lightweight memory
  • real systems of record
  • repeatable workflows

That was the real extraction goal: publish the useful workflows, not the private residue of my specific project.

What I Would Recommend If You Try This

If you want to copy the idea, I would start with this:

  1. Keep CLAUDE.md short.
  2. Move durable project context into maintainers docs.
  3. Use a tiny local memory file before building anything fancier.
  4. Pick 3-5 workflows you repeat all the time and formalize them.
  5. Keep the real backlog and release data in the systems you already trust.

That is enough to get most of the value.

Closing

The useful change was not "add more prompt."

It was "design better interfaces for the model."

That is what made Claude Code feel less like a clever autocomplete session and more like a practical project-ops layer for an ongoing codebase.

If you are already using AI on a real repo, that is where I think the leverage is.

Top comments (3)

HARD IN SOFT OUT

This shift is maturity in action. I fell into the mega‑prompt trap myself, and it never delivered. Treating the model as a project operator is a mental leap many won’t make—you’ve articulated it clearly. Real trouble with this pattern: partial failure. If step 3 fails, you’re left with a half‑refactored mess. In CI/CD, we enforce idempotency and explicit artifact passing. Do you have something similar here, or is it still a linear chain of hope? One idea: let each operation drop a tiny “manifest” of file changes. After everything runs, compare intended state to actual git diff and auto‑rollback mismatches. A mini-Terraform for code ops.

Mary Olowu

Exactly. I’m not claiming this is fully transactional code ops yet. Right now it’s more guardrails, stable context, and repeatable workflows than true idempotent orchestration.

The protections I rely on today are smaller scoped operations, explicit systems of record, checkpoints/handoffs, maintainers docs, and reviewing actual git state instead of trusting a long agent chain. So yes, partial failure is still a real gap.

I really like your manifest idea. A mini-Terraform for code ops is a good way to describe the next layer: each workflow declares intended changes, emits a small artifact or manifest, and gets reconciled against the actual diff before the run is considered complete.
That’s a different maturity level than the starter I shared, but it’s probably the right direction for higher-risk or multi-step flows.

HARD IN SOFT OUT

Glad the manifest idea landed. The guardrails you listed—smaller scoped ops, explicit systems of record, checkpointing—already set the foundation for it. You're closer than you think.

One lightweight starting point: a declarative YAML file per workflow that lists expected file paths and their intended state (created, modified, deleted). After the run, diff it against actual git state. No discrepancy? The manifest auto‑commits as an audit trail. Mismatch? Rollback and flag for review. No heavy orchestration engine needed.
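
Sketching it (names illustrative):

```yaml
# sketch only: one manifest per workflow run
workflow: release-notes
intended:
  - path: docs/maintainers/releases/latest.md
    state: created
  - path: CHANGELOG.md
    state: modified
on_mismatch: rollback-and-flag  # reconcile against `git status --porcelain` after the run
```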

That "starter" version could slip right into your current checkpointing flow without a rewrite. Curious if you've already prototyped something like that, or if it's still on the whiteboard.