When deleting a database column, I ran grep "\.html\b" across a Django codebase to check for references. It returned 1,202 hits. The column had 10 actual attribute-access references. The other 1,192 were template paths, HTML file extensions in strings, comments, and import fragments — none of which mattered.
Filtering 1,200+ grep hits by hand every time you drop a column isn't a workflow, it's a chore I kept putting off.
So I built colref — a CLI tool that uses AST parsing to find only the attribute-access references to a model field, filtering out everything grep can't.
The Haystack: One Field Name, Ten Thousand Strings
grep treats your codebase as a flat stream of characters. .html matches everything containing those five characters — in code, in strings, in comments, in template paths.
grep -rn "\.html\b" --include="*.py" wagtail/
# 1,202 hits
The 1,192 noise hits in Wagtail break down like this:
| Category | Count | Example |
|---|---|---|
| HTML file extensions in strings | 1,087 | template_name = "pages/publish.html" |
| Other string literals | 27 | format_html(...) |
| Comments | 21 | # See docs/settings.html#... |
| Other | 57 | template_html = base + ".html" |
The same problem appears across every project and every field name. On Mastodon, .domain gives 269 hits; 175 are spec files and SQL heredocs. On Zulip, .name for Stream returns 1,347 hits; 10 are noise.
grep matches characters. It cannot distinguish obj.html from publish.html in a path string.
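To see the problem at small scale, here is a minimal Python sketch that applies the same pattern to three lines. The sample source and the `grep_hits` helper are made up for illustration; only the regular expression comes from the grep command above.

```python
import re

# Made-up sample: a template path, a comment, and one real attribute access.
SOURCE = '''\
template_name = "pages/publish.html"
# See docs/settings.html for details
rendered = embed.html
'''

# Same pattern as the grep command: a literal ".html" followed by a word boundary.
PATTERN = re.compile(r"\.html\b")


def grep_hits(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) for every matching line, similar to grep -n."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if PATTERN.search(line)
    ]


hits = grep_hits(SOURCE)
for number, line in hits:
    print(f"{number}: {line}")
```

All three lines match, although only the last one touches a field. The pattern has no notion of whether the text sits inside a string, a comment, or an expression.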
The Blueprint: AST as a Code Structure Map
colref parses your source files into an Abstract Syntax Tree and walks only the attribute-access nodes — the ones that represent obj.field in running code.
colref check --orm django --model Embed --field html ./wagtail/
# 10 hits
String literals, comments, Django template strings, SQL heredocs, and docstring-embedded code examples are all invisible to the AST walker.
| Approach | Hits | What's included |
|---|---|---|
| grep "html" | 3,534 | Everything |
| grep "\.html\b" | 1,202 | File extensions, strings, comments |
| colref | 10 | Attribute accesses only |
The AST sees obj.html as an attribute access and "publish.html" as a string literal — two different node types.
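The node-type distinction can be shown with Python's standard `ast` module. This is an illustration of the idea, not colref's own code (colref is installed with `go install`, so its internals will look different). The sample source and the `attribute_accesses` helper are assumptions for the example.

```python
import ast

# Made-up sample: a template path, a comment, and one real attribute access.
SOURCE = '''\
template_name = "pages/publish.html"
# See docs/settings.html for details
rendered = embed.html
'''


def attribute_accesses(source: str, field: str) -> list[tuple[int, str]]:
    """Return (line, receiver_source) for each `<expr>.<field>` attribute node."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        # ast.Attribute is the node type for `value.attr`; string literals are
        # ast.Constant nodes, and comments never reach the tree at all.
        if isinstance(node, ast.Attribute) and node.attr == field:
            found.append((node.lineno, ast.unparse(node.value)))
    return found


accesses = attribute_accesses(SOURCE, "html")
print(accesses)
```

On this sample the walker reports only line 3, with `embed` as the receiver. The path string and the comment never show up as attribute nodes, so there is nothing to filter out afterwards.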
The Anchor: ORM Schema as a Disambiguation Layer
AST parsing alone is not enough. obj.name might be Stream.name, User.name, or a method call with no relation to your database. colref resolves this by reading the ORM schema first.
For Django, it parses models.py files to find which fields are declared on which model. For Rails, it reads db/schema.rb (or replays migrations if schema.rb is absent). Only references to a field that actually exists on the target model are reported.
colref check --orm rails --model Account --field username ./mastodon/
# 40 hits (grep \.username\b gives 196)
Without the schema, colref would have no way to distinguish account.username from config.username in a settings file.
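A rough sketch of the schema-reading step for Django, again in Python and again not colref's implementation: it parses a models file as text and records which names each class assigns from a `models.<Something>(...)` call. The sample models, `declared_fields`, and `field_exists` are hypothetical. The sketch ignores inheritance, abstract base classes, and field classes imported under other names, and it does not cover how a receiver expression gets tied back to a model, which the post does not describe.

```python
import ast

# Made-up models.py content, parsed as text; Django itself is never imported.
MODELS_SOURCE = '''\
from django.db import models

class Embed(models.Model):
    url = models.URLField()
    html = models.TextField()

class Page(models.Model):
    title = models.CharField(max_length=255)
'''


def declared_fields(models_source: str) -> dict[str, set[str]]:
    """Map each top-level class to the names assigned from `models.X(...)` calls."""
    schema: dict[str, set[str]] = {}
    for node in ast.parse(models_source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        fields: set[str] = set()
        for stmt in node.body:
            if (
                isinstance(stmt, ast.Assign)
                and isinstance(stmt.value, ast.Call)
                and isinstance(stmt.value.func, ast.Attribute)
                and isinstance(stmt.value.func.value, ast.Name)
                and stmt.value.func.value.id == "models"
            ):
                for target in stmt.targets:
                    if isinstance(target, ast.Name):
                        fields.add(target.id)
        schema[node.name] = fields
    return schema


schema = declared_fields(MODELS_SOURCE)


def field_exists(model: str, field: str) -> bool:
    """True only if `field` is declared on `model` in the parsed schema."""
    return field in schema.get(model, set())


print(field_exists("Embed", "html"))  # field declared on Embed
print(field_exists("Page", "html"))   # Page has no html field
```

With a map like this, a check for `Page.html` can be rejected before any source file is walked, and a hit on `.html` only counts when the target model really declares that field.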
The "Implicit Self" Trap
The most common false positive colref produces comes from bare method calls with no explicit receiver:
# Forem — app/views/articles/show.html.erb
<% title "Welcome!" %>
<% title @article.title_with_query_preamble(user_signed_in?) %>
When the field name matches a helper method called on implicit self, colref currently includes it. For the Forem title field, this produced 50 false positives out of 340 reported hits.
The fix is a receiver-aware pass: treat a call node as a candidate only when it has an explicit receiver. That work is on the roadmap.
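The receiver-aware rule is simple to state in code. The sketch below uses a hypothetical, simplified `CallNode` shape rather than a real Ruby parser's node types, and the second sample node (`@article.title`) is invented to show a call that should survive the filter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallNode:
    """Hypothetical call node: `receiver.name(...)` or a bare `name(...)`."""
    name: str
    receiver: Optional[str]  # None means the call is on implicit self
    line: int


def candidates(nodes: list[CallNode], field: str) -> list[CallNode]:
    """Keep only calls to `field` that have an explicit receiver."""
    return [n for n in nodes if n.name == field and n.receiver is not None]


nodes = [
    CallNode(name="title", receiver=None, line=1),        # <% title "Welcome!" %>
    CallNode(name="title", receiver="@article", line=2),  # invented: @article.title
    CallNode(name="title_with_query_preamble", receiver="@article", line=3),
]

kept = candidates(nodes, "title")
for node in kept:
    print(node.line, node.receiver, node.name)
```

One caveat with this rule: inside a Rails model, a bare `title` in an instance method does read the attribute, so a real pass would probably need to keep receiverless calls that appear within the model class itself.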
Verified Against 15+ Real-World Projects
colref has been tested against real OSS codebases across both ORMs:
Django (10 projects: Wagtail, Saleor, Zulip, NetBox, BookWyrm, Misago, django-wiki, and others):
- Zero false negatives across all tested model/field pairs
- Zero false positives after fixing the models/ package scanner (#65)
- Remaining gap: abstract_models.py patterns (django-oscar style) not yet supported
Rails (Mastodon, Forem, Fat Free CRM, Lobsters, Publify, mutual-aid, and others):
- Same precision/recall profile
- Projects without a committed db/schema.rb now supported via migration replay
Detailed results with TP/FN/FP breakdowns per project are in the GitHub issues.
What's Next
Django and Rails are the first two ORMs. The roadmap includes:
- Laravel (PHP) — migration-based schema, Eloquent attribute access
- Spring Boot / JPA (Java) — entity annotations, JPA field resolution
- Prisma (TypeScript/Node) — schema.prisma as the source of truth
If you use one of these and want to help shape the implementation, the issues are open.
Where it stands
colref filters out the text noise that makes grep unreliable for column reference checks. On Wagtail, Mastodon, and Zulip the signal-to-noise ratio went from roughly 1% to 100%, and I now reach for it before grep when removing a column.
The implicit-self false positives are still there, abstract_models.py isn't handled, and 15 projects is a small slice of the Django and Rails worlds.
If you maintain a Django or Rails codebase, I'd like to know how colref does on your models — especially the cases where it misses something obvious or reports a hit that's clearly noise.
Try it and open an issue if it breaks.
go install github.com/shinagawa-web/colref@latest
colref check --orm django --model YourModel --field your_field ./