Most OSINT tools age quickly: sites change endpoints, captchas evolve, and static scanners stop working. Maigret is different. It has survived for years by using a versioned site-signature database, deterministic detection rules, recursive verification, and update workflows that keep 3,000+ site checks usable over time.
This guide is for engineers. We will look at what Maigret does, when it is legitimate to use, how its architecture works, and how the same patterns apply to API testing with Apidog.
If you have not read it yet, our API testing without Postman in 2026 post covers similar pattern-matching and drift-detection ideas in a friendlier domain.
TL;DR
- Maigret checks 3,000+ public sites for accounts matching a username.
- It uses a versioned signature database instead of hard-coded one-off checks.
- It compares multiple signals: URL patterns, presence strings, absence strings, headers, and page content.
- Legitimate use cases include self-audits, account recovery, brand-abuse monitoring, missing-person work with consent, authorized red-team engagements, and investigative journalism.
- The same engineering patterns map directly to API testing: signature-driven checks, multi-signal assertions, scheduled replay, fixture-based drift detection, and LLM postprocessing.
- Apidog can apply these patterns to API contracts and regression suites.
What Maigret is and is not
Maigret is a Python tool maintained by soxoj. Its README describes it as a way to “collect a dossier on a person by username from 3,000+ sites.”
Install it with Python:
pip install maigret
Run a basic scan:
maigret some_username
Run against the full site database:
maigret some_username -a
Important boundaries:
- Maigret only reads public data.
- It does not require credentials, private API keys, or login bypasses.
- If a site exposes a profile to anonymous visitors, Maigret can inspect it.
- If a site does not expose the profile, Maigret returns “not found”, “unknown”, or a flagged result.
- It can be misused if pointed at private individuals without consent.
Use it only in legitimate contexts: your own accounts, written authorization, approved journalism, consent-based investigations, or scoped security testing.
The rest of this article focuses on the engineering patterns, not on targeting people.
The site-signature database
Maigret’s most important design choice is its site-signature database.
Instead of hard-coding every check in Python, Maigret stores site behavior as data. Each site entry describes how to answer questions like:
- What URL should be queried?
- What does a valid profile page look like?
- What does a “user not found” page look like?
- Which strings must appear when an account exists?
- Which strings prove an account does not exist?
- Are special headers required?
- Is the site known to rate-limit or show captchas?
Conceptually, a signature looks like this:
{
"name": "ExampleSite",
"urlMain": "https://example.com",
"url": "https://example.com/{username}",
"presenseStrs": ["Profile", "@{username}"],
"absenceStrs": ["User not found", "This account does not exist"],
"headers": {
"User-Agent": "Mozilla/5.0"
},
"tags": ["social", "global"]
}
That is the same pattern you want in an API test suite.
For APIs, each endpoint has a signature:
{
"method": "GET",
"path": "/users/{id}",
"expectedStatus": 200,
"requiredFields": ["id", "email", "createdAt"],
"forbiddenFields": ["password", "internalToken"],
"requiredHeaders": ["content-type"],
"errorEnvelope": {
"code": "string",
"message": "string"
}
}
When the API response drifts, the signature fails and gives you a useful diff.
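The signature above can be exercised with a small generic checker. This is a minimal sketch, not Apidog's or Maigret's implementation; the response dict and field names are hypothetical stand-ins for a parsed HTTP response:

```python
# Sketch: validating one API response against a data-driven signature.
# The signature fields mirror the JSON above; the response dict stands in
# for a parsed HTTP response (hypothetical data).

signature = {
    "method": "GET",
    "path": "/users/{id}",
    "expectedStatus": 200,
    "requiredFields": ["id", "email", "createdAt"],
    "forbiddenFields": ["password", "internalToken"],
    "requiredHeaders": ["content-type"],
}

def check_signature(sig, status, headers, body):
    """Return a list of human-readable violations (empty list = pass)."""
    problems = []
    if status != sig["expectedStatus"]:
        problems.append(f"status {status} != {sig['expectedStatus']}")
    for field in sig["requiredFields"]:
        if field not in body:
            problems.append(f"missing required field: {field}")
    for field in sig["forbiddenFields"]:
        if field in body:
            problems.append(f"forbidden field present: {field}")
    lower_headers = {h.lower() for h in headers}
    for header in sig["requiredHeaders"]:
        if header.lower() not in lower_headers:
            problems.append(f"missing header: {header}")
    return problems

# A drifted response: "email" was renamed, so the check yields a useful diff.
drifted = {"id": "u1", "emailAddress": "a@b.c", "createdAt": "2026-01-01"}
print(check_signature(signature, 200, {"Content-Type": "application/json"}, drifted))
```

Because the signature is plain data, the same checker serves every endpoint; only the data changes.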
We covered related workflows in contract-first API development and the MCP server testing playbook.
How Maigret detects “found” vs “not found”
A naive scanner might do this:
curl https://example.com/user/alice
Then it checks the status code:
200 = found
404 = not found
That breaks quickly.
Many real sites return 200 OK for all of these cases:
- A valid profile
- A “user not found” page
- A homepage redirect
- A captcha page
- A cached fallback page
- A soft error
Maigret avoids this by using multiple signals.
A “found” result requires:
- The expected URL pattern to resolve
- All configured presenseStrs to appear
- No configured absenceStrs to appear
- Optional extraction rules to match
- Optional headers or response behavior to look correct
A “not found” result requires the inverse.
Anything ambiguous becomes unknown, which is safer than pretending the scanner knows.
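The three-state decision rule above can be sketched in a few lines. This is an illustrative simplification of the idea, not Maigret's actual matcher, and the strings are made up:

```python
# Sketch of Maigret-style three-state detection: "found" requires every
# presence string and no absence string; the inverse means "not_found";
# anything else is "unknown". Strings here are illustrative, not Maigret's.

def classify(page_text, presence_strs, absence_strs):
    has_presence = all(s in page_text for s in presence_strs)
    has_absence = any(s in page_text for s in absence_strs)
    if has_presence and not has_absence:
        return "found"
    if has_absence and not has_presence:
        return "not_found"
    return "unknown"  # ambiguous pages never count as evidence

presence = ["Profile", "@alice"]
absence = ["User not found"]

print(classify("<h1>Profile</h1> @alice", presence, absence))      # found
print(classify("User not found", presence, absence))               # not_found
print(classify("Please complete the captcha", presence, absence))  # unknown
```

Note how the captcha page falls into "unknown" rather than being misread as a missing account.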
Apply the same idea to API testing. Do not stop at this:
pm.response.to.have.status(200);
Prefer multi-signal assertions:
pm.response.to.have.status(200);
const body = pm.response.json();
pm.expect(body).to.have.property("id");
pm.expect(body).to.have.property("email");
pm.expect(body).to.not.have.property("password");
pm.expect(pm.response.headers.get("content-type")).to.include("application/json");
In Apidog, the equivalent is to combine:
- Status-code assertions
- JSON schema checks
- Required field checks
- Forbidden field checks
- Header assertions
- Saved example comparisons
That is the API-testing version of Maigret’s presence and absence strings.
Recursive search and information extraction
After Maigret finds an account, it can extract public profile data from the page.
Examples of public identifiers include:
- Linked usernames
- Display names
- Public email addresses
- Public phone numbers
- Profile links
- Social handles
The extraction rules are site-specific. A GitHub profile exposes different fields than a LinkedIn profile or a forum account.
Then Maigret can recurse: new identifiers feed back into the search loop.
For OSINT, this turns one username into a graph of possible related public accounts.
For API testing, the same pattern is useful when exploring systems:
- Call one endpoint.
- Extract IDs or links from the response.
- Follow those IDs to related endpoints.
- Validate that downstream responses still match expected contracts.
- Add newly discovered behavior to your test suite.
Example:
GET /orders/ord_123
Response:
{
"id": "ord_123",
"customerId": "cus_456",
"paymentId": "pay_789"
}
A recursive API test should then check:
GET /customers/cus_456
GET /payments/pay_789
This helps uncover broken joins, stale references, missing permissions, and undocumented dependencies.
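The recursive walk above can be sketched as a small graph traversal. Everything here is hypothetical: the in-memory store stands in for HTTP calls, and the route table maps reference fields to endpoints:

```python
# Sketch of a recursive contract walk: start from one order, extract
# referenced IDs, and verify each downstream resource resolves. fetch()
# fakes HTTP with an in-memory store (all data hypothetical).

STORE = {
    "/orders/ord_123": {"id": "ord_123", "customerId": "cus_456", "paymentId": "pay_789"},
    "/customers/cus_456": {"id": "cus_456"},
    # /payments/pay_789 is deliberately missing: a broken join.
}

REF_ROUTES = {"customerId": "/customers/{}", "paymentId": "/payments/{}"}

def fetch(path):
    return STORE.get(path)  # None stands in for a 404

def walk(path, seen=None):
    """Follow ID references depth-first; return paths that failed to resolve."""
    seen = seen if seen is not None else set()
    if path in seen:
        return []
    seen.add(path)
    body = fetch(path)
    if body is None:
        return [path]
    broken = []
    for field, route in REF_ROUTES.items():
        if field in body:
            broken += walk(route.format(body[field]), seen)
    return broken

print(walk("/orders/ord_123"))  # the missing payment shows up
```

The `seen` set matters: real APIs contain reference cycles, and a naive walker would loop forever.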
Captcha and rate-limit handling
Maigret detects captchas and rate limits by reading response shape and known site behavior.
Its strategies can include:
- Rotating user agents
- Respecting retry headers
- Falling back to mobile or simplified domains
- Routing through Tor or I2P where permitted
- Marking the result as captcha-protected or unknown
The important point: Maigret does not treat every failure as a missing account. It separates:
- “Not found”
- “Blocked”
- “Rate-limited”
- “Captcha detected”
- “Unknown”
API clients and API test runners should do the same.
For example, treat these differently:
404 Not Found => resource does not exist
401 Unauthorized => authentication failed
403 Forbidden => caller lacks access
429 Too Many Requests => rate limit hit
503 Service Unavailable => upstream or service issue
A useful API test should back off on 429, not hammer the endpoint.
Example retry logic:
if (response.status === 429) {
const retryAfter = response.headers.get("retry-after");
console.log(`Rate limited. Retry after: ${retryAfter || "unknown"} seconds`);
// Do not brute force retries.
// Mark the test as rate-limited or reschedule it.
}
This protects your test infrastructure and avoids polluting results with false failures.
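The same back-off decision can be sketched in Python. This is a minimal illustration, assuming a delta-seconds Retry-After value and a made-up default; HTTP-date parsing is deliberately omitted:

```python
# Sketch: turning a 429 into a scheduling decision instead of a retry loop.
# parse_retry_after handles delta-seconds plus a fallback; HTTP-date parsing
# is omitted to keep the sketch small.

DEFAULT_BACKOFF = 60  # assumed fallback when the server gives no hint

def parse_retry_after(header_value):
    """Return seconds to wait, or the default if the header is absent/odd."""
    if header_value is None:
        return DEFAULT_BACKOFF
    try:
        return max(0, int(header_value))
    except ValueError:
        return DEFAULT_BACKOFF  # e.g. an HTTP-date we choose not to parse here

def handle_rate_limit(status, headers):
    """Mark the test rate-limited and say when to reschedule it."""
    if status != 429:
        return None
    wait = parse_retry_after(headers.get("retry-after"))
    return {"state": "rate_limited", "retry_in_seconds": wait}

print(handle_rate_limit(429, {"retry-after": "30"}))
print(handle_rate_limit(429, {}))
```

The key design choice mirrors Maigret: a 429 produces a distinct state, not a failed assertion.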
The signature drift problem
A signature database is only valuable if it stays current.
Sites change:
- URL paths
- HTML templates
- Profile layouts
- Error messages
- Captcha behavior
- Redirect behavior
- Brand names and domains
APIs drift too:
- Fields are renamed
- Nullable fields become required
- Error envelopes change
- Pagination formats change
- Headers disappear
- Vendors ship undocumented updates
Maigret handles drift with several layers:
- Auto-update from the central GitHub repository
- Community pull requests for stale signatures
- A manual --update flag
- A test harness that validates signatures against known-existing usernames
That last part matters most.
For each supported site, a known-good username can be used to verify that the signature still detects an existing account. If the known-good check fails, the signature may be stale.
For APIs, the equivalent is fixture-based regression testing:
- Save a known-good response.
- Replay the request on a schedule.
- Compare the live response against the saved contract.
- Alert when the response shape changes.
Example expected fixture:
{
"id": "cus_456",
"email": "user@example.com",
"createdAt": "2026-01-01T00:00:00Z"
}
Example drift:
{
"id": "cus_456",
"emailAddress": "user@example.com",
"created_at": "2026-01-01T00:00:00Z"
}
The values may still be there, but the contract changed. A good test suite should catch that before production clients break.
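A key-set comparison is enough to surface renames like the ones above. This sketch compares the fixture and drifted responses from the example; a production version would recurse into nested objects:

```python
# Sketch of fixture-based drift detection: compare the key sets of a saved
# known-good response with a live one and report removals and surprises.

fixture = {"id": "cus_456", "email": "user@example.com", "createdAt": "2026-01-01T00:00:00Z"}
live = {"id": "cus_456", "emailAddress": "user@example.com", "created_at": "2026-01-01T00:00:00Z"}

def diff_contract(expected, actual):
    missing = sorted(set(expected) - set(actual))      # fields clients depend on
    unexpected = sorted(set(actual) - set(expected))   # possible renames/additions
    return {"missing": missing, "unexpected": unexpected}

print(diff_contract(fixture, live))
```

A missing field paired with a similar unexpected one is the classic rename signature; that is exactly the diff you want in an alert.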
Apidog supports this pattern with saved examples, assertions, scheduled runs, and response comparisons. Our DeepSeek V4 API guide shows the manual side of this workflow for one vendor API.
The optional AI summary mode
Maigret’s --ai flag uses an OpenAI-compatible LLM endpoint to summarize raw findings.
The key architectural decision: the LLM does not decide whether a username matches.
Maigret keeps detection deterministic:
- Rules decide found/not found/unknown.
- The LLM summarizes the final report.
- The model operates over constrained input.
That is the safer pattern for API monitoring too.
Use deterministic checks for pass/fail:
status code == 200
required field exists
schema matches
forbidden field absent
latency under threshold
Then optionally use an LLM to summarize the run:
17 endpoints passed.
2 endpoints failed due to schema drift.
1 endpoint returned 429 and should be retried later.
The breaking change is in /customers/{id}: email was renamed to emailAddress.
Do not let the model be the judge. Let it be the reporter.
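The judge/reporter split can be made concrete. In this sketch, `summarize` stands in for an LLM call (a plain string template keeps it runnable); the point is that verdicts are fixed before the model ever sees them:

```python
# Sketch of the split: deterministic checks produce the verdicts, and only
# the already-decided results reach the summarizer. summarize() is a
# stand-in for an LLM prompt over structured results (hypothetical data).

def run_checks(results):
    """results: list of (endpoint, passed, note). Verdicts are fixed here."""
    passed = [r for r in results if r[1]]
    failed = [r for r in results if not r[1]]
    return passed, failed

def summarize(passed, failed):
    # In production this could prompt an LLM with the structured results;
    # the model never changes pass/fail, it only narrates them.
    lines = [f"{len(passed)} passed, {len(failed)} failed."]
    for endpoint, _, note in failed:
        lines.append(f"- {endpoint}: {note}")
    return "\n".join(lines)

results = [
    ("/users/{id}", True, ""),
    ("/customers/{id}", False, "email renamed to emailAddress"),
]
passed, failed = run_checks(results)
print(summarize(passed, failed))
```

Swapping the template for a real model changes the prose quality, never the pass/fail counts.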
This is the same structured-first approach discussed in computer use vs structured APIs.
Legitimate use cases
Here are legitimate contexts for Maigret.
1. Account recovery for yourself
Run Maigret against usernames you used in the past.
Useful for:
- Privacy audits
- Closing old accounts
- Reducing abandoned public profiles
- Finding forgotten forum or social accounts
Example:
maigret old_username -a --pdf
2. Brand-abuse monitoring
Organizations can check brand names, product names, or public handles to detect impersonation accounts.
Example:
maigret yourbrand -a --tags social
This can help security, legal, and trust-and-safety teams triage possible impersonation.
3. Missing-person work with consent
Search-and-rescue and missing-person organizations may use OSINT tools with family consent and coordination with law enforcement.
Do not freelance here. Uncoordinated searches can harm investigations.
4. Authorized red-team engagements
A red team with written scope can use Maigret to map public exposure before deeper testing.
Example workflow:
- Confirm written authorization.
- Define usernames, brands, and domains in scope.
- Run Maigret.
- Validate findings manually.
- Include only relevant public exposure in the report.
5. Investigative journalism
Reporters may use OSINT tools under editorial and legal review when investigating fraud, public-interest misconduct, or organized crime.
What is not appropriate:
- Looking up strangers out of curiosity
- Monitoring an ex-partner
- Building datasets about people without consent
- Publishing unverified extracted profile data as fact
Treat Maigret findings as leads, not proof.
Patterns from Maigret you can apply to API testing
1. Use signature databases instead of hand-coded checks
Represent endpoint behavior as data.
Bad:
if (endpoint === "/users") {
expect(status).toBe(200);
}
Better:
{
"endpoint": "/users",
"method": "GET",
"expectedStatus": 200,
"requiredFields": ["data", "pagination"],
"forbiddenFields": ["debugToken"]
}
Data-driven checks are easier to update, review, diff, and share.
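The data-driven approach also scales: one generic runner can iterate every signature. This sketch stubs responses in memory, and the endpoints and fields are illustrative:

```python
# Sketch of a data-driven runner: signatures live in data (e.g. a JSON file
# under version control) and one generic function checks them all.
# Responses are stubbed in-memory; endpoint names are illustrative.

SIGNATURES = [
    {"endpoint": "/users", "expectedStatus": 200,
     "requiredFields": ["data", "pagination"], "forbiddenFields": ["debugToken"]},
    {"endpoint": "/orders", "expectedStatus": 200,
     "requiredFields": ["data"], "forbiddenFields": []},
]

STUB_RESPONSES = {
    "/users": (200, {"data": [], "pagination": {}, "debugToken": "oops"}),
    "/orders": (200, {"data": []}),
}

def run_suite(signatures, responses):
    """Return {endpoint: [violations]} for every signature, data-driven."""
    report = {}
    for sig in signatures:
        status, body = responses[sig["endpoint"]]
        problems = []
        if status != sig["expectedStatus"]:
            problems.append(f"status {status}")
        problems += [f"missing {f}" for f in sig["requiredFields"] if f not in body]
        problems += [f"forbidden {f}" for f in sig["forbiddenFields"] if f in body]
        report[sig["endpoint"]] = problems
    return report

print(run_suite(SIGNATURES, STUB_RESPONSES))
```

Adding an endpoint means adding one dict, not another `if` branch; reviewers diff data, not control flow.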
2. Use multi-signal assertions
Do not rely on status codes alone.
Check:
- Status code
- Response body
- Schema
- Required fields
- Forbidden fields
- Headers
- Error envelope
- Latency
- Auth behavior
This reduces false positives from generic success pages, cached responses, or partial failures.
3. Sync signatures centrally
Maigret updates its site database from a central repo.
API teams should do the same with contract definitions and test collections.
Apidog projects support collaborative API design and testing workflows. We covered this workflow in API testing without Postman.
4. Detect drift with scheduled replay
Set up scheduled runs against known-good fixtures.
Minimum workflow:
- Save a valid request.
- Save a known-good response.
- Run the request periodically.
- Compare the live response to the expected contract.
- Alert on breaking changes.
5. Use LLMs only after deterministic checks
Good use:
Summarize these failed API assertions for Slack.
Bad use:
Decide whether this API response is correct.
Let deterministic tests decide correctness. Let the LLM make the output easier to read.
Common pitfalls when running Maigret
Running without -a and assuming the scan is complete
By default, Maigret focuses on a smaller set of popular sites.
Use -a for the full database:
maigret username -a
Expect the run to take longer.
Ignoring tags
Use tags to narrow by category or country.
Example:
maigret username --tags social
Tag filtering helps when your scope is region-specific or platform-specific.
Skipping updates
Old signatures cause false positives and false negatives.
Force an update before serious work:
maigret --update
Misreading Tor blocks
Some sites block Tor exit nodes. A Tor block is not evidence about the username. It is evidence about the network path.
Treating extracted fields as proof
Maigret extracts what public pages expose. Pages can be fake, stale, or impersonated.
Always verify before using findings in a report.
Real-world implementation patterns
Red-team scoping
A consultancy can use Maigret at the start of a scoped engagement:
- Confirm the client’s written authorization.
- Define usernames, brand names, and domains.
- Run Maigret.
- Manually validate relevant public accounts.
- Include public exposure in the kickoff report.
Fraud investigation summaries
An investigator can run a broad scan and use --ai to summarize deterministic findings for non-technical readers.
The split is important:
- Maigret generates the data.
- The LLM summarizes the report.
- The investigator verifies the conclusions.
API regression testing
An engineering team can apply Maigret’s architecture to internal APIs:
- Store endpoint contracts as signatures.
- Run multi-signal assertions.
- Save known-good fixtures.
- Schedule replay.
- Alert on drift.
- Summarize failures for the team.
This is where Apidog fits naturally: define the API, save examples, add assertions, run tests, and detect contract changes before clients break.
Conclusion
Maigret is worth studying because it solves a hard maintenance problem: thousands of detection rules across changing external surfaces.
The transferable ideas are:
- Store detection logic as versioned signatures.
- Use multiple signals instead of one status code.
- Separate found, not found, blocked, rate-limited, and unknown states.
- Keep signatures updated.
- Test signatures against known-good fixtures.
- Treat drift as a first-class failure mode.
- Use LLMs for summaries, not decisions.
For API teams, the next step is practical: open Apidog, pick one important endpoint, and model it like a Maigret signature.
Define:
- Expected status
- Required fields
- Forbidden fields
- Required headers
- Error shape
- Saved known-good response
- Scheduled drift check
That discipline pays off the first time a vendor renames a field at 2 a.m. and your test suite catches it before users do.
FAQ
Is Maigret legal to use?
It depends on the jurisdiction and the target.
Running it on yourself, accounts you own, a company you are authorized to test, or approved journalism is generally different from running it on an unsuspecting individual. Targeting private people without consent can cross stalking, harassment, privacy, or computer-misuse laws.
Check your local rules and get authorization.
Does Maigret work without Python?
The official package requires Python 3.10+. The author also maintains a Telegram bot and a Cloud Shell setup for users who do not want a local install.
How accurate is the 3,000-site claim?
The repository database contains 3,000+ entries, but not every site is active or reachable at all times. Auto-updates and community maintenance keep a working subset current. Tags help narrow the scan to sites relevant to your scope.
What does the AI mode add?
The --ai flag summarizes deterministic findings with an OpenAI-compatible LLM endpoint. It does not change account detection. You provide the API key.
Can I use Maigret in CI?
For OSINT investigations, usually no; they require human review and context.
But Maigret’s architecture belongs in CI for API testing: signature databases, fixture replay, drift detection, and deterministic assertions. Apidog supports those API testing workflows.
How is Maigret different from Sherlock?
Sherlock is the older and simpler username-search tool. Maigret extends the idea with richer site signatures, information extraction, recursive search, captcha handling, AI summary mode, and a larger database. Both are MIT-licensed.
Where do I report a stale signature?
Use GitHub issues or pull requests in the Maigret repository. Community contributions keep the database current, and one pull request per stale site is the usual workflow.