Most of the GEO/SEO tooling on the market right now reads like it was written to sell a course, not to solve a problem.
So I wrote four tools instead.
Four Node CLIs, zero runtime dependencies, MIT, each one does one thing. They all live under the @geosuite scope on npm, and the source is at github.com/TryGeoSuite.
Here's what they do, and the design call behind each one.
1. @geosuite/ai-crawler-bots
What it does: tells you whether GPTBot, ClaudeBot, PerplexityBot, and ~20 other AI crawlers can actually reach your site, and where the block is coming from when they can't.
npx @geosuite/ai-crawler-bots robots https://your-site.com
The non-obvious part: when a request comes back 403, the result distinguishes between an edge block (Cloudflare / CloudFront / Vercel / Akamai / Fastly / Netlify fingerprint in the response) and an origin block (no such fingerprint — your application or web server). The remediation is different in each case: edge means flip a toggle in your CDN dashboard, origin means update a config.
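The gist of that classification, sketched with Node's built-in fetch. The header fingerprints here are common CDN ones and purely illustrative; they are not necessarily the exact set the tool checks:

```js
// Rough sketch of the edge-vs-origin call on a 403, not the tool's actual code.
// Header names below are common CDN fingerprints; a real list is longer.
const EDGE_FINGERPRINTS = [
  { name: 'Cloudflare', test: h => h.get('cf-ray') || /cloudflare/i.test(h.get('server') || '') },
  { name: 'CloudFront', test: h => h.get('x-amz-cf-id') || /cloudfront/i.test(h.get('via') || '') },
  { name: 'Vercel',     test: h => h.get('x-vercel-id') },
  { name: 'Fastly',     test: h => h.get('x-fastly-request-id') },
  { name: 'Netlify',    test: h => h.get('x-nf-request-id') },
];

async function classify403(url, userAgent) {
  const res = await fetch(url, { headers: { 'user-agent': userAgent } });
  if (res.status !== 403) return { status: res.status, block: null };
  const edge = EDGE_FINGERPRINTS.find(f => f.test(res.headers));
  // Edge fingerprint present: the CDN blocked it. Absent: your origin did.
  return { status: 403, block: edge ? `edge (${edge.name})` : 'origin' };
}
```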
It also parses robots.txt with line-level provenance, so when a bot is Disallowed it tells you which line in which group did it. And it detects the # BEGIN Cloudflare Managed content … # END Cloudflare Managed content markers Cloudflare injects when "Block AI Bots" is enabled — if your own rules would have allowed the bot but the managed block disallows it, the report says so.
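A stripped-down version of the provenance idea, assuming plain prefix matching. A real parser also has to handle Allow rules, wildcards, and longest-match precedence:

```js
// Sketch: report which robots.txt line, in which user-agent group, disallows a path.
function whoBlocks(robotsTxt, botToken, path) {
  let group = [];        // user-agent tokens collected for the current group
  let seenRules = false; // true once the current group has rule lines
  const hits = [];
  robotsTxt.split('\n').forEach((raw, i) => {
    const line = raw.replace(/#.*$/, '').trim();
    const m = line.match(/^(user-agent|disallow)\s*:\s*(.*)$/i);
    if (!m) return;
    const [, field, value] = m;
    if (field.toLowerCase() === 'user-agent') {
      if (seenRules) { group = []; seenRules = false; } // a new group starts here
      group.push(value.toLowerCase());
    } else {
      seenRules = true;
      const applies = group.includes(botToken.toLowerCase()) || group.includes('*');
      if (applies && value && path.startsWith(value)) {
        hits.push({ line: i + 1, group: [...group], rule: value });
      }
    }
  });
  return hits; // each hit names the exact line and group that disallowed the path
}
```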
UA strings come from operator docs, not third-party SEO blogs that copy each other. We don't accept entries without a docs link.
2. @geosuite/schema-templates
What it does: ships 23 copy-paste-ready schema.org JSON-LD templates plus an offline structural validator.
npx @geosuite/schema-templates list
npx @geosuite/schema-templates show Product
JSON-LD is the cheapest, least ambiguous signal you can give an AI assistant about what your page is. It will not on its own make ChatGPT cite you — authority and freshness still matter — but it removes a class of avoidable failures. The AI no longer has to guess your prices, your author, or whether a number on the page is a benchmark or a typo.
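For a concrete sense of what that signal looks like, here is a minimal Product block. It's illustrative only, not necessarily the package's exact template:

```html
<!-- Illustrative JSON-LD; the actual @geosuite/schema-templates output may differ. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Acme Widget",
  "description": "A widget that does one thing.",
  "offers": {
    "@type": "Offer",
    "price": "19.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
</script>
```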
I deliberately excluded fields that aren't truly recommended for each type. Padding templates with every optional schema.org property dilutes the signal. If you need a field that's not there, schema.org is the source of truth — add it yourself.
There's also geosuite-schema fill <Type> --url <url> --ai if you want the LLM to populate placeholders from a real page, but the deterministic side (templates + validator) does not need a network or an API key.
3. @geosuite/llms-txt-generator
What it does: turns a sitemap.xml into an llms.txt file per the proposed standard at llmstxt.org.
npx @geosuite/llms-txt-generator https://your-site.com/sitemap.xml \
--name="Your Site" --enrich --out=public/llms.txt
llms.txt is intended to be the LLM-shaped equivalent of a sitemap: a curated, sectioned, markdown index of your most important pages. The format is small enough to be parsed by classical tooling (regex) and also legible to a model — that's the point.
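The shape, roughly, per llmstxt.org (placeholder names and URLs):

```markdown
# Your Site

> One-paragraph summary of what the site is and who it's for.

## Docs

- [Getting started](https://your-site.com/docs/start): install and first run
- [API reference](https://your-site.com/docs/api): every endpoint, with examples

## Optional

- [Changelog](https://your-site.com/changelog)
```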
The generator is deterministic. With --enrich it fetches each URL once and pulls <title> + <meta name="description"> via regex. No headless browser, no LLM dependency in the default path. (--ai is opt-in if you want the LLM to rewrite descriptions; we send only URL + title + meta, never the page body.)
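Per URL, --enrich amounts to something like this. It's a sketch; the package's actual regexes and error handling will differ:

```js
// Fetch a page once and pull <title> and the meta description with regexes.
async function enrich(url) {
  const html = await (await fetch(url)).text();
  const title = html.match(/<title[^>]*>([^<]*)<\/title>/i)?.[1]?.trim() ?? '';
  const description = html.match(
    /<meta[^>]+name=["']description["'][^>]*content=["']([^"']*)["']/i
  )?.[1]?.trim() ?? '';
  return { url, title, description };
}
```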
Sitemap-index files are flattened automatically. Pass them like a flat sitemap.
4. @geosuite/sitemap-builder
What it does: crawls a site and emits a valid sitemap.xml. For sites that ship without one (more common than you'd think on custom builds).
npx @geosuite/sitemap-builder https://your-site.com --output sitemap.xml
BFS, same-origin only, with three caps that stack: page count, depth, and a wall-clock budget. Whichever fires first wins. It drops obvious non-HTML extensions and fragment-only links. Output is sitemaps.org-compliant: <loc> plus optional <lastmod>, no <changefreq> or <priority> (deprecated, ignored by every major engine).
Whole tool is around 250 lines of vanilla Node. No puppeteer, no cheerio, no axios. Just node:http, node:https, and a few regexes.
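The core loop is small enough to sketch: a BFS queue, the three caps checked on every iteration, and a same-origin filter on extracted links. Illustrative only, not the published source (it uses fetch and a content-type check where the real tool filters by extension):

```js
// Minimal shape of the crawl: BFS, same-origin, first cap to fire wins.
async function crawl(startUrl, { maxPages = 500, maxDepth = 5, budgetMs = 60_000 } = {}) {
  const origin = new URL(startUrl).origin;
  const seen = new Set([startUrl]);
  const queue = [{ url: startUrl, depth: 0 }];
  const pages = [];
  const deadline = Date.now() + budgetMs;

  while (queue.length && pages.length < maxPages && Date.now() < deadline) {
    const { url, depth } = queue.shift();
    const res = await fetch(url, { redirect: 'follow' }).catch(() => null);
    if (!res || !res.ok || !(res.headers.get('content-type') || '').includes('text/html')) continue;
    pages.push({ loc: url, lastmod: res.headers.get('last-modified') || undefined });
    if (depth >= maxDepth) continue;

    const html = await res.text();
    for (const [, href] of html.matchAll(/<a[^>]+href=["']([^"']+)["']/gi)) {
      if (href.startsWith('#')) continue; // fragment-only link
      try {
        const next = new URL(href, url);
        next.hash = '';
        if (next.origin === origin && !seen.has(next.href)) {
          seen.add(next.href);
          queue.push({ url: next.href, depth: depth + 1 });
        }
      } catch { /* skip unparsable hrefs */ }
    }
  }
  return pages; // feed these into the sitemap.xml writer
}
```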
The design choices, all in one place
- Zero runtime dependencies. The four packages combined add ~0 install footprint to your project. The only exception is llms-txt-generator, which depends on fast-xml-parser for the sitemap-index path, because writing your own XML parser is a footgun.
- AI mode is opt-in. Every CLI has a --ai flag. Without it, behaviour is fully deterministic. With it, payloads are minimal and structured (verdicts, titles, depths), never raw HTML or page bodies.
- One tool, one job. Composable via stdout/JSON. If you want to chain sitemap-builder into llms-txt-generator, that's a single pipe.
- Boring code. No clever metaprogramming. The whole stack is meant to be readable in an afternoon. If it isn't, that's a bug, not a feature.
Why open source the building blocks
The same checks power GeoSuite, the hosted product I'm building (history, alerts, dashboards, integrations into your content pipeline). But the building blocks belong open: I find it dishonest to sell a black box that does things any developer could verify.
If you find a bot UA missing — or worse, a wrong one — the place to send it is bots.json in ai-crawler-bots, with a link to the operator's docs. UA strings drift a couple of times per year per operator, and that file ages faster than anything else in the suite.
PRs and issues welcome. Especially the ones that prove me wrong.