tasuku fujioka

Git tracks what changed. It doesn't track why. So I built project-memory.

In 3 lines

  • I built a skill that solves a real problem in long-running AI-assisted work: decisions, hypotheses, experiment results, and half-formed ideas keep disappearing.
  • It has a promotion rule so hypotheses do not silently become facts, and a priority rule for resolving contradictions between files.
  • It works not only for software projects, but also for research, exploratory engineering, and paper writing, without depending on a specific model vendor.

Have you run into this?

When you work with AI agents over a long period of time, the problem is not just that chats disappear.

Even when the chat still exists, these kinds of things get lost:

  • why you chose option A over option B
  • what you already tried and why it failed
  • a promising idea you mentioned once and never found again
  • which hypotheses are still untested
  • which assumptions are actually verified
  • how last week's experiment connects to today's decision

Typical examples look like this:

  • “We compared A and B and chose A.” → A month later: Why did we choose A again?
  • “We tried X and it didn't work.” → Two weeks later: you accidentally try X again.
  • “Maybe Y is worth exploring too.” → It disappears into the conversation.
  • Research notes, experiment notes, and the actual paper draft stop lining up.

Git tracks what changed.
It does not reliably track why you changed it, what you ruled out, or what is still only a hypothesis.

And now that AI often writes commits too, even commit messages do not always reflect the human reasoning behind the work.


What I built

I built an Agent Skill called project-memory.

It manages project and research knowledge through role-separated markdown files inside the repository.
The AI agent updates them during work sessions. Humans mostly just read them when needed.

README.md            ← entry point
CURRENT_STATE.md     ← things safe to treat as current working truth
ROADMAP.md           ← future plan
DECISION_LOG.md      ← decisions and why they were made
RESEARCH_LOG.md      ← experiments, investigation, observations
HYPOTHESIS_LAB.md    ← unverified ideas and hypotheses
HUMAN_BRIEF.md       ← the summary a human should read first
RECOVERY_NOTES.md    ← restart checkpoint after interruption
CONTEXT_MANIFEST.md  ← what to read, and in what order

For research-heavy work, I also use files like these:

LITERATURE_NOTES.md  ← literature and reading notes
FIGURES_LOG.md       ← figures, diagrams, and output tracking

No database.
No vector store.
No vendor lock-in.
Just markdown files.

But the important part is that the information is separated by state and responsibility.
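
To make "just markdown files" concrete, here is a minimal sketch in Python of what initialization amounts to. The seed contents are illustrative only; the actual scripts/init_memory_workspace.py in the repo handles profiles and the full file set.

from pathlib import Path

# Illustrative seeds only; the real script supports profiles and more files.
MEMORY_FILES = {
    "CURRENT_STATE.md": "# Current State\n\nVerified working assumptions only.\n",
    "DECISION_LOG.md": "# Decision Log\n\nDecisions and the reasoning behind them.\n",
    "RESEARCH_LOG.md": "# Research Log\n\nExperiments, investigation, observations.\n",
    "HYPOTHESIS_LAB.md": "# Hypothesis Lab\n\n## Raw sparks\n\n## Working hypotheses\n",
    "HUMAN_BRIEF.md": "# Human Brief\n\nThe summary a human should read first.\n",
}

def init_workspace(root: str) -> None:
    base = Path(root)
    base.mkdir(parents=True, exist_ok=True)
    for name, seed in MEMORY_FILES.items():
        target = base / name
        if not target.exists():  # never overwrite existing memory
            target.write_text(seed, encoding="utf-8")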


Why this is useful

1. It keeps not only the decision, but the reason behind it

The most useful file is probably DECISION_LOG.md.

## DEC-014 — Authentication strategy

Date: 2026-04-15
Status: adopted

### Context
We needed to decide how to implement user authentication.

### Alternatives considered
1. Session-based auth — simple, but harder to scale
2. JWT — stateless, but token revocation becomes complicated
3. OAuth2 + PKCE — standard, and easier to integrate with external IdPs

### Decision
Adopt OAuth2 + PKCE.

### Rationale
We expect to support Google and GitHub login later.
Starting with OAuth2 now reduces migration cost later.
We also rejected plain JWT because token revocation management would add operational complexity.

### Risks
- The first implementation of the OAuth2 flow is more complex
- Login depends on IdP availability

### Revisit when
- Running our own IdP exceeds X hours per month
- User count exceeds 10,000 and rate limits start becoming a real problem

The benefit is not just that it says what we chose.
It also records:

  • what we rejected
  • why we rejected it
  • what risks remain
  • when we should revisit the choice

So one month later, “Why did we go with OAuth2 again?” is no longer a mystery buried in old chats.


2. Half-formed ideas do not disappear

Some of the most important thoughts show up as:

  • “maybe this is the cause”
  • “this might be worth testing”
  • “this is messy, but I don't want to lose it”

Those ideas usually vanish first.

So HYPOTHESIS_LAB.md is split into two layers.

## Raw sparks

Low-commitment captures. Vague or incomplete ideas go here first.

- HYP-031: The latency problem might be DNS resolution, not N+1 queries
- HYP-032: User drop-off might be caused by onboarding screen 3, not pricing

## Working hypotheses

Ideas that keep coming back, or already have a clear next step.

### HYP-018 — Cache invalidation timing is causing the performance regression

Status: testing
Originated: 2026-04-10
Evidence: RSC-039
Next step: change TTL from 30s to 120s and measure for one week
Revisit: after measurement results arrive

The important part is this:

the AI captures these automatically.

It does not stop and ask, “Do you want me to log that?”
If the user says something like “maybe this is relevant,” it can go straight into Raw sparks.

You can clean up later.
But once a useful idea is lost in a conversation, it usually stays lost.
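
As a sketch of how cheap capture can be, a hypothetical helper might append a spark with an auto-incremented HYP id. This helper is not part of the published skill; it only illustrates the mechanics.

import re
from pathlib import Path

def add_raw_spark(lab_path: str, text: str) -> str:
    """Append a low-commitment idea under the Raw sparks heading and return its id."""
    path = Path(lab_path)
    content = path.read_text(encoding="utf-8")
    # Reuse the existing HYP numbering so ids stay unique across the file.
    ids = [int(n) for n in re.findall(r"HYP-(\d+)", content)]
    hyp_id = f"HYP-{max(ids, default=0) + 1:03d}"
    marker = "## Raw sparks"
    content = content.replace(marker, f"{marker}\n\n- {hyp_id}: {text}", 1)
    path.write_text(content, encoding="utf-8")
    return hyp_id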


3. Hypotheses do not silently turn into facts

This is probably the core design choice.

A very common failure mode in long-running projects is this:

Someone writes “I think X might be true.”

A few weeks later, everyone is acting as if X has already been confirmed.

To prevent that, project-memory has a promotion rule.

To move something from HYPOTHESIS_LAB.md into CURRENT_STATE.md, at least one of the following must exist:

  • evidence in RESEARCH_LOG.md
  • an explicit operating decision in DECISION_LOG.md
  • a clear user decision
  • support from an external source

And the promoted item must also satisfy all of these:

  • it is clear enough to be used as a stable working assumption
  • the source file is traceable
  • it is not dominated by unresolved objections
  • it has a revisit_when condition

So the rule is:

  • capture broadly
  • promote narrowly

That is how you avoid losing ideas without polluting your working truth.
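
A minimal sketch of the promotion gate in Python, assuming hypothetical field names (the skill itself defines this rule in prose, not code):

from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    statement: str
    evidence_refs: list[str] = field(default_factory=list)   # e.g. ["RSC-039"]
    decision_refs: list[str] = field(default_factory=list)   # e.g. ["DEC-014"]
    user_decision: bool = False       # explicit adoption by the user
    external_support: bool = False    # backed by an external source
    stable_assumption: bool = False   # clear enough to build on
    source_traceable: bool = False    # origin file can be found
    dominated_by_objections: bool = False
    revisit_when: str = ""            # required revisit condition

def can_promote(h: Hypothesis) -> bool:
    # At least one accepted evidence source...
    has_source = bool(h.evidence_refs or h.decision_refs
                      or h.user_decision or h.external_support)
    # ...and all stability conditions must hold.
    is_stable = (h.stable_assumption and h.source_traceable
                 and not h.dominated_by_objections and bool(h.revisit_when))
    return has_source and is_stable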


4. Contradictions between files are not silently merged away

In long-running work, contradictions happen.

For example:

  • CURRENT_STATE.md says “we are using A”
  • but the latest DECISION_LOG.md says “we switched to B”

If an AI agent casually merges that into a vague compromise, your working state is silently wrong.

So I define an explicit priority order:

  • latest entry in CURRENT_STATE.md
  • latest entry in DECISION_LOG.md
  • latest entry in RESEARCH_LOG.md
  • RECOVERY_NOTES.md
  • HUMAN_BRIEF.md
  • ROADMAP.md
  • HYPOTHESIS_LAB.md

When the agent detects a conflict, it should not silently “fix” it.
It should report the contradiction and propose a patch.


5. Parallel work becomes visible

In real projects and research, there is rarely only one thread.

You may be doing several of these at once:

  • implementation
  • experiments
  • literature review
  • writing the paper itself
  • preparing figures
  • submission work

A pure chronological log mixes these together and becomes hard to scan.

So HUMAN_BRIEF.md includes a Tracked threads table.

## Tracked threads

| Thread | Status | Next action / blocker | Source |
| --- | --- | --- | --- |
| Authentication migration | active | Waiting for OAuth2 test results | RSC-042 |
| Performance audit | paused | Waiting for staging deployment | BLK-003 |
| Billing model design | active | Comparing 3 pricing options | HYP-028 |

This file is not updated on every small change.
It only changes when the human-facing picture changes:

  • current goal
  • major blocker
  • key risk
  • next decision
  • tracked threads

That keeps it short and actually readable.


It works for research and paper writing too

This system is not only for software projects.
It works surprisingly well for research, writing, and exploratory analysis.

Common problems in paper-writing workflows

When you work on research with AI, these things happen a lot:

  • literature notes become scattered
  • hypotheses and sourced claims get mixed together
  • figures drift away from the argument they were meant to support
  • abandoned directions get rediscovered and repeated
  • you forget which source supported which claim

How the research profile handles it

In research-heavy work, I split things like this:

LITERATURE_NOTES.md  ← key points, objections, citation candidates
RESEARCH_LOG.md      ← comparisons, tests, investigation, analysis
HYPOTHESIS_LAB.md    ← ideas not ready for the paper yet
DECISION_LOG.md      ← decisions about structure, method, scope
CURRENT_STATE.md     ← assumptions safe to use in the current draft
FIGURES_LOG.md       ← figure ideas, adoption reasons, revision history

A concrete example

Suppose you are writing a paper in comparative mythology.

A workflow might look like this:

  • After reading a source, write into LITERATURE_NOTES.md: what it claims, where it might be useful, and where it is weak
  • While thinking, you say: “maybe this myth motif spread through a different transmission route” → that goes into HYPOTHESIS_LAB.md under Raw sparks
  • When you actually build comparison tables or check primary sources, that goes into RESEARCH_LOG.md
  • When you decide, “we will use this comparison axis in section 3,” that goes into DECISION_LOG.md
  • Only the things safe to treat as current draft assumptions go into CURRENT_STATE.md

That way, hypothesis, investigation, and adopted working truth do not get mixed together.

And that matters, because papers are not just the final claims.
A lot of value is in the path that led you there.
If that path disappears, you often end up rethinking the same ground from scratch.

Figure logs help more than expected

Figures drift easily in research workflows.
A chart gets made, then later you wonder:

Why did we make this figure in the first place?

FIGURES_LOG.md helps by recording:

  • what the figure is for
  • which section it supports
  • which version was adopted
  • which versions were rejected

That reduces a surprising amount of confusion later.
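
An entry can stay very small. Here is an illustrative example (the id and fields are hypothetical, following the same shape as the other logs):

## FIG-007 — Motif distribution map

Purpose: support the transmission-route argument
Supports: section 3.2
Adopted: v3 (clearer region boundaries)
Rejected: v1 (too dense), v2 (projection distorted the comparison)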

So in practice, project-memory is less about “developer notes” and more about state management for long-running thinking work with AI.


A failure mode this avoids

I have seen this break things many times:

  • someone says “this is probably the cause” in conversation
  • it gets copied into a README or note
  • a few weeks later it is treated as established fact
  • implementation or writing proceeds based on it
  • nobody can trace where it came from

Once that happens, the cleanup cost is high.

That is why project-memory keeps this distinction strict:

ideas can be captured broadly in hypotheses,

but CURRENT_STATE.md must stay strict.

That separation matters a lot.


How to use it

New project

git clone https://github.com/tasuku-9/project-memory-skill
cd project-memory-skill
python scripts/init_memory_workspace.py /path/to/project --profile research

Then tell your AI agent something like:

Read SKILL.md and start in Init mode.

It can then ask about the goal, current state, immediate next target, and known constraints, and write the initial files.

Existing project

If the repository already has code and documents, use an adoption flow.

Adopt project-memory into this project.
Classify information from the README, existing docs, and git history,
and route it into the correct files.

The AI can then separate things like this:

  • README stays the entry point
  • decisions go into DECISION_LOG.md
  • experiments go into RESEARCH_LOG.md
  • hypotheses go into HYPOTHESIS_LAB.md

Being able to turn an overloaded README back into a real entry point is one of the nicest side effects.

Resume a session

Resume this project.
Read CONTEXT_MANIFEST.md and RECOVERY_NOTES.md first,
and tell me what I should do next.

Switch models

You can move from Claude to GPT to Gemini more easily if the repo itself holds the durable memory.

The point is that the repository markdown is the canonical memory, not the chat thread.


The three profiles

| Profile | Use case |
| --- | --- |
| light | small projects, personal notes |
| standard | normal long-running projects |
| research | research, experiments, exploratory engineering, paper workflows |

If full structure feels heavy, start with light and move up only when needed.


FAQ

Why not just put everything in the README?

Because READMEs bloat fast.

They start absorbing all of this:

  • setup instructions
  • current status
  • decision history
  • hypotheses
  • experiment notes
  • recovery steps

Once all of that lives in one file, people stop maintaining it, and then they stop trusting it.

The whole point of project-memory is to separate files by responsibility.

Can smaller models handle this?

The light profile probably can.

But standard and especially research depend on classification quality and on following the promotion rule consistently, so stronger models will be more reliable.

That said, if you are already doing this kind of long-running AI-assisted work, you are probably using a stronger model anyway.

Is it really safe to let AI write the memory?

It is safer than letting the AI write into one giant undifferentiated note.

The rules are what make it work:

  • ideas go broadly into HYPOTHESIS_LAB.md
  • experiments and observations go into RESEARCH_LOG.md
  • decisions go into DECISION_LOG.md
  • CURRENT_STATE.md has strict promotion requirements

So the agent is not just “writing stuff down.”
It is routing information by state.
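
As a sketch, that routing is a small, closed mapping; the kind names here are illustrative:

ROUTES = {
    "idea": "HYPOTHESIS_LAB.md",         # captured broadly, cleaned up later
    "observation": "RESEARCH_LOG.md",    # experiments and measurements
    "decision": "DECISION_LOG.md",       # the choice and its rationale
    "working_truth": "CURRENT_STATE.md", # only via the promotion rule
}

def route(kind: str) -> str:
    # Unknown kinds raise a KeyError instead of guessing a destination.
    return ROUTES[kind]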

Are there already similar skills?

Similar ideas exist in publicly shared memory skills and project-notes systems.

But in the examples I found, I did not see many that combine all of the following:

  • repository markdown as canonical memory
  • promotion rules for hypotheses
  • contradiction detection between files
  • visibility for parallel threads
  • an adoption flow for existing projects

So this is my attempt to generalize a workflow I had been developing for myself.


Final thought

What I want from this skill is actually pretty simple:

  • do not lose ideas
  • do not let hypotheses silently become facts
  • keep the reason behind decisions
  • connect experiments to judgments
  • make context portable across models

If you work with AI for long stretches, I think this matters more than chat length.

What matters is what gets preserved, where, and under what rules.

And if that part is handled well, you stop having to re-derive the same reasoning every time you come back to the work.


Links

  • GitHub: tasuku-9/project-memory-skill
  • PR to anthropics/skills: #1001
  • License: MIT
  • Tools: Claude Code, Codex CLI, Gemini CLI, Cursor

If you notice something useful or broken, an issue would be appreciated.
