Most OSINT tools age quickly: sites change endpoints, captchas evolve, and static scanners stop working. Maigret is different. It has survived for years by using a versioned site-signature database, deterministic detection rules, recursive verification, and update workflows that keep 3,000+ site checks usable over time.
This guide is for engineers. We will look at what Maigret does, when it is legitimate to use, how its architecture works, and how the same patterns apply to API testing with Apidog.
If you have not read it yet, our API testing without Postman in 2026 post covers similar pattern-matching and drift-detection ideas in a friendlier domain.
TL;DR
- Maigret checks 3,000+ public sites for accounts matching a username.
- It uses a versioned signature database instead of hard-coded one-off checks.
- It compares multiple signals: URL patterns, presence strings, absence strings, headers, and page content.
- Legitimate use cases include self-audits, account recovery, brand-abuse monitoring, missing-person work with consent, authorized red-team engagements, and investigative journalism.
- The same engineering patterns map directly to API testing: signature-driven checks, multi-signal assertions, scheduled replay, fixture-based drift detection, and LLM postprocessing.
- Apidog can apply these patterns to API contracts and regression suites.
What Maigret is and is not
Maigret is a Python tool maintained by soxoj. Its README describes it as a way to “collect a dossier on a person by username from 3,000+ sites.”
Install it with Python:
pip install maigret
Run a basic scan:
maigret some_username
Run against the full site database:
maigret some_username -a
Important boundaries:
- Maigret only reads public data.
- It does not require credentials, private API keys, or login bypasses.
- If a site exposes a profile to anonymous visitors, Maigret can inspect it.
- If a site does not expose the profile, Maigret returns “not found”, “unknown”, or a flagged result.
- It can be misused if pointed at private individuals without consent.
Use it only in legitimate contexts: your own accounts, written authorization, approved journalism, consent-based investigations, or scoped security testing.
The rest of this article focuses on the engineering patterns, not on targeting people.
The site-signature database
Maigret’s most important design choice is its site-signature database.
Instead of hard-coding every check in Python, Maigret stores site behavior as data. Each site entry describes how to answer questions like:
- What URL should be queried?
- What does a valid profile page look like?
- What does a “user not found” page look like?
- Which strings must appear when an account exists?
- Which strings prove an account does not exist?
- Are special headers required?
- Is the site known to rate-limit or show captchas?
Conceptually, a signature looks like this:
{
"name": "ExampleSite",
"urlMain": "https://example.com",
"url": "https://example.com/{username}",
"presenseStrs": ["Profile", "@{username}"],
"absenceStrs": ["User not found", "This account does not exist"],
"headers": {
"User-Agent": "Mozilla/5.0"
},
"tags": ["social", "global"]
}
That is the same pattern you want in an API test suite.
For APIs, each endpoint has a signature:
{
"method": "GET",
"path": "/users/{id}",
"expectedStatus": 200,
"requiredFields": ["id", "email", "createdAt"],
"forbiddenFields": ["password", "internalToken"],
"requiredHeaders": ["content-type"],
"errorEnvelope": {
"code": "string",
"message": "string"
}
}
When the API response drifts, the signature fails and gives you a useful diff.
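The signature above can be exercised with a small generic checker. This is a minimal sketch, not Apidog's or Maigret's implementation; the response dict and field names are hypothetical stand-ins for a parsed HTTP response:

```python
# Sketch: validating one API response against a data-driven signature.
# The signature fields mirror the JSON above; the response dict stands in
# for a parsed HTTP response (hypothetical data).

signature = {
    "method": "GET",
    "path": "/users/{id}",
    "expectedStatus": 200,
    "requiredFields": ["id", "email", "createdAt"],
    "forbiddenFields": ["password", "internalToken"],
    "requiredHeaders": ["content-type"],
}

def check_signature(sig, status, headers, body):
    """Return a list of human-readable violations (empty list = pass)."""
    problems = []
    if status != sig["expectedStatus"]:
        problems.append(f"status {status} != {sig['expectedStatus']}")
    for field in sig["requiredFields"]:
        if field not in body:
            problems.append(f"missing required field: {field}")
    for field in sig["forbiddenFields"]:
        if field in body:
            problems.append(f"forbidden field present: {field}")
    lower_headers = {h.lower() for h in headers}
    for header in sig["requiredHeaders"]:
        if header.lower() not in lower_headers:
            problems.append(f"missing header: {header}")
    return problems

# A drifted response: "email" was renamed, so the check yields a useful diff.
drifted = {"id": "u1", "emailAddress": "a@b.c", "createdAt": "2026-01-01"}
print(check_signature(signature, 200, {"Content-Type": "application/json"}, drifted))
```

Because the signature is plain data, the same checker serves every endpoint; only the data changes.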
We covered related workflows in contract-first API development and the MCP server testing playbook.
How Maigret detects “found” vs “not found”
A naive scanner might do this:
curl https://example.com/user/alice
Then it checks the status code:
200 = found
404 = not found
That breaks quickly.
Many real sites return 200 OK for all of these cases:
- A valid profile
- A “user not found” page
- A homepage redirect
- A captcha page
- A cached fallback page
- A soft error
Maigret avoids this by using multiple signals.
A “found” result requires:
- The expected URL pattern to resolve
- All configured presenseStrs to appear
- No configured absenceStrs to appear
- Optional extraction rules to match
- Optional headers or response behavior to look correct
A “not found” result requires the inverse.
Anything ambiguous becomes unknown, which is safer than pretending the scanner knows.
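The three-state decision rule above can be sketched in a few lines. This is an illustrative simplification of the idea, not Maigret's actual matcher, and the strings are made up:

```python
# Sketch of Maigret-style three-state detection: "found" requires every
# presence string and no absence string; the inverse means "not_found";
# anything else is "unknown". Strings here are illustrative, not Maigret's.

def classify(page_text, presence_strs, absence_strs):
    has_presence = all(s in page_text for s in presence_strs)
    has_absence = any(s in page_text for s in absence_strs)
    if has_presence and not has_absence:
        return "found"
    if has_absence and not has_presence:
        return "not_found"
    return "unknown"  # ambiguous pages never count as evidence

presence = ["Profile", "@alice"]
absence = ["User not found"]

print(classify("<h1>Profile</h1> @alice", presence, absence))      # found
print(classify("User not found", presence, absence))               # not_found
print(classify("Please complete the captcha", presence, absence))  # unknown
```

Note how the captcha page falls into "unknown" rather than being misread as a missing account.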
Apply the same idea to API testing. Do not stop at this:
pm.response.to.have.status(200);
Prefer multi-signal assertions:
pm.response.to.have.status(200);
const body = pm.response.json();
pm.expect(body).to.have.property("id");
pm.expect(body).to.have.property("email");
pm.expect(body).to.not.have.property("password");
pm.expect(pm.response.headers.get("content-type")).to.include("application/json");
In Apidog, the equivalent is to combine:
- Status-code assertions
- JSON schema checks
- Required field checks
- Forbidden field checks
- Header assertions
- Saved example comparisons
That is the API-testing version of Maigret’s presence and absence strings.
Recursive search and information extraction
After Maigret finds an account, it can extract public profile data from the page.
Examples of public identifiers include:
- Linked usernames
- Display names
- Public email addresses
- Public phone numbers
- Profile links
- Social handles
The extraction rules are site-specific. A GitHub profile exposes different fields than a LinkedIn profile or a forum account.
Then Maigret can recurse: new identifiers feed back into the search loop.
For OSINT, this turns one username into a graph of possible related public accounts.
For API testing, the same pattern is useful when exploring systems:
- Call one endpoint.
- Extract IDs or links from the response.
- Follow those IDs to related endpoints.
- Validate that downstream responses still match expected contracts.
- Add newly discovered behavior to your test suite.
Example:
GET /orders/ord_123
Response:
{
"id": "ord_123",
"customerId": "cus_456",
"paymentId": "pay_789"
}
A recursive API test should then check:
GET /customers/cus_456
GET /payments/pay_789
This helps uncover broken joins, stale references, missing permissions, and undocumented dependencies.
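The recursive walk above can be sketched as a small graph traversal. Everything here is hypothetical: the in-memory store stands in for HTTP calls, and the route table maps reference fields to endpoints:

```python
# Sketch of a recursive contract walk: start from one order, extract
# referenced IDs, and verify each downstream resource resolves. fetch()
# fakes HTTP with an in-memory store (all data hypothetical).

STORE = {
    "/orders/ord_123": {"id": "ord_123", "customerId": "cus_456", "paymentId": "pay_789"},
    "/customers/cus_456": {"id": "cus_456"},
    # /payments/pay_789 is deliberately missing: a broken join.
}

REF_ROUTES = {"customerId": "/customers/{}", "paymentId": "/payments/{}"}

def fetch(path):
    return STORE.get(path)  # None stands in for a 404

def walk(path, seen=None):
    """Follow ID references depth-first; return paths that failed to resolve."""
    seen = seen if seen is not None else set()
    if path in seen:
        return []
    seen.add(path)
    body = fetch(path)
    if body is None:
        return [path]
    broken = []
    for field, route in REF_ROUTES.items():
        if field in body:
            broken += walk(route.format(body[field]), seen)
    return broken

print(walk("/orders/ord_123"))  # the missing payment shows up
```

The `seen` set matters: real APIs contain reference cycles, and a naive walker would loop forever.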
Captcha and rate-limit handling
Maigret detects captchas and rate limits by reading response shape and known site behavior.
Its strategies can include:
- Rotating user agents
- Respecting retry headers
- Falling back to mobile or simplified domains
- Routing through Tor or I2P where permitted
- Marking the result as captcha-protected or unknown
The important point: Maigret does not treat every failure as a missing account. It separates:
- “Not found”
- “Blocked”
- “Rate-limited”
- “Captcha detected”
- “Unknown”
API clients and API test runners should do the same.
For example, treat these differently:
404 Not Found => resource does not exist
401 Unauthorized => authentication failed
403 Forbidden => caller lacks access
429 Too Many Requests => rate limit hit
503 Service Unavailable => upstream or service issue
A useful API test should back off on 429, not hammer the endpoint.
Example retry logic:
if (response.status === 429) {
const retryAfter = response.headers.get("retry-after");
console.log(`Rate limited. Retry after: ${retryAfter || "unknown"} seconds`);
// Do not brute force retries.
// Mark the test as rate-limited or reschedule it.
}
This protects your test infrastructure and avoids polluting results with false failures.
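The same back-off decision can be sketched in Python. This is a minimal illustration, assuming a delta-seconds Retry-After value and a made-up default; HTTP-date parsing is deliberately omitted:

```python
# Sketch: turning a 429 into a scheduling decision instead of a retry loop.
# parse_retry_after handles delta-seconds plus a fallback; HTTP-date parsing
# is omitted to keep the sketch small.

DEFAULT_BACKOFF = 60  # assumed fallback when the server gives no hint

def parse_retry_after(header_value):
    """Return seconds to wait, or the default if the header is absent/odd."""
    if header_value is None:
        return DEFAULT_BACKOFF
    try:
        return max(0, int(header_value))
    except ValueError:
        return DEFAULT_BACKOFF  # e.g. an HTTP-date we choose not to parse here

def handle_rate_limit(status, headers):
    """Mark the test rate-limited and say when to reschedule it."""
    if status != 429:
        return None
    wait = parse_retry_after(headers.get("retry-after"))
    return {"state": "rate_limited", "retry_in_seconds": wait}

print(handle_rate_limit(429, {"retry-after": "30"}))
print(handle_rate_limit(429, {}))
```

The key design choice mirrors Maigret: a 429 produces a distinct state, not a failed assertion.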
The signature drift problem
A signature database is only valuable if it stays current.
Sites change:
- URL paths
- HTML templates
- Profile layouts
- Error messages
- Captcha behavior
- Redirect behavior
- Brand names and domains
APIs drift too:
- Fields are renamed
- Nullable fields become required
- Error envelopes change
- Pagination formats change
- Headers disappear
- Vendors ship undocumented updates
Maigret handles drift with several layers:
- Auto-update from the central GitHub repository
- Community pull requests for stale signatures
- A manual --update flag
- A test harness that validates signatures against known-existing usernames
That last part matters most.
For each supported site, a known-good username can be used to verify that the signature still detects an existing account. If the known-good check fails, the signature may be stale.
For APIs, the equivalent is fixture-based regression testing:
- Save a known-good response.
- Replay the request on a schedule.
- Compare the live response against the saved contract.
- Alert when the response shape changes.
Example expected fixture:
{
"id": "cus_456",
"email": "user@example.com",
"createdAt": "2026-01-01T00:00:00Z"
}
Example drift:
{
"id": "cus_456",
"emailAddress": "user@example.com",
"created_at": "2026-01-01T00:00:00Z"
}
The values may still be there, but the contract changed. A good test suite should catch that before production clients break.
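A key-set comparison is enough to surface renames like the ones above. This sketch compares the fixture and drifted responses from the example; a production version would recurse into nested objects:

```python
# Sketch of fixture-based drift detection: compare the key sets of a saved
# known-good response with a live one and report removals and surprises.

fixture = {"id": "cus_456", "email": "user@example.com", "createdAt": "2026-01-01T00:00:00Z"}
live = {"id": "cus_456", "emailAddress": "user@example.com", "created_at": "2026-01-01T00:00:00Z"}

def diff_contract(expected, actual):
    missing = sorted(set(expected) - set(actual))      # fields clients depend on
    unexpected = sorted(set(actual) - set(expected))   # possible renames/additions
    return {"missing": missing, "unexpected": unexpected}

print(diff_contract(fixture, live))
```

A missing field paired with a similar unexpected one is the classic rename signature; that is exactly the diff you want in an alert.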
Apidog supports this pattern with saved examples, assertions, scheduled runs, and response comparisons. Our DeepSeek V4 API guide shows the manual side of this workflow for one vendor API.
The optional AI summary mode
Maigret’s --ai flag uses an OpenAI-compatible LLM endpoint to summarize raw findings.
The key architectural decision: the LLM does not decide whether a username matches.
Maigret keeps detection deterministic:
- Rules decide found/not found/unknown.
- The LLM summarizes the final report.
- The model operates over constrained input.
That is the safer pattern for API monitoring too.
Use deterministic checks for pass/fail:
status code == 200
required field exists
schema matches
forbidden field absent
latency under threshold
Then optionally use an LLM to summarize the run:
17 endpoints passed.
2 endpoints failed due to schema drift.
1 endpoint returned 429 and should be retried later.
The breaking change is in /customers/{id}: email was renamed to emailAddress.
Do not let the model be the judge. Let it be the reporter.
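The judge/reporter split can be made concrete. In this sketch, `summarize` stands in for an LLM call (a plain string template keeps it runnable); the point is that verdicts are fixed before the model ever sees them:

```python
# Sketch of the split: deterministic checks produce the verdicts, and only
# the already-decided results reach the summarizer. summarize() is a
# stand-in for an LLM prompt over structured results (hypothetical data).

def run_checks(results):
    """results: list of (endpoint, passed, note). Verdicts are fixed here."""
    passed = [r for r in results if r[1]]
    failed = [r for r in results if not r[1]]
    return passed, failed

def summarize(passed, failed):
    # In production this could prompt an LLM with the structured results;
    # the model never changes pass/fail, it only narrates them.
    lines = [f"{len(passed)} passed, {len(failed)} failed."]
    for endpoint, _, note in failed:
        lines.append(f"- {endpoint}: {note}")
    return "\n".join(lines)

results = [
    ("/users/{id}", True, ""),
    ("/customers/{id}", False, "email renamed to emailAddress"),
]
passed, failed = run_checks(results)
print(summarize(passed, failed))
```

Swapping the template for a real model changes the prose quality, never the pass/fail counts.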
This is the same structured-first approach discussed in computer use vs structured APIs.
Legitimate use cases
Here are legitimate contexts for Maigret.
1. Account recovery for yourself
Run Maigret against usernames you used in the past.
Useful for:
- Privacy audits
- Closing old accounts
- Reducing abandoned public profiles
- Finding forgotten forum or social accounts
Example:
maigret old_username -a --pdf
2. Brand-abuse monitoring
Organizations can check brand names, product names, or public handles to detect impersonation accounts.
Example:
maigret yourbrand -a --tags social
This can help security, legal, and trust-and-safety teams triage possible impersonation.
3. Missing-person work with consent
Search-and-rescue and missing-person organizations may use OSINT tools with family consent and coordination with law enforcement.
Do not freelance here. Uncoordinated searches can harm investigations.
4. Authorized red-team engagements
A red team with written scope can use Maigret to map public exposure before deeper testing.
Example workflow:
- Confirm written authorization.
- Define usernames, brands, and domains in scope.
- Run Maigret.
- Validate findings manually.
- Include only relevant public exposure in the report.
5. Investigative journalism
Reporters may use OSINT tools under editorial and legal review when investigating fraud, public-interest misconduct, or organized crime.
What is not appropriate:
- Looking up strangers out of curiosity
- Monitoring an ex-partner
- Building datasets about people without consent
- Publishing unverified extracted profile data as fact
Treat Maigret findings as leads, not proof.
Patterns from Maigret you can apply to API testing
1. Use signature databases instead of hand-coded checks
Represent endpoint behavior as data.
Bad:
if (endpoint === "/users") {
expect(status).toBe(200);
}
Better:
{
"endpoint": "/users",
"method": "GET",
"expectedStatus": 200,
"requiredFields": ["data", "pagination"],
"forbiddenFields": ["debugToken"]
}
Data-driven checks are easier to update, review, diff, and share.
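The data-driven approach also scales: one generic runner can iterate every signature. This sketch stubs responses in memory, and the endpoints and fields are illustrative:

```python
# Sketch of a data-driven runner: signatures live in data (e.g. a JSON file
# under version control) and one generic function checks them all.
# Responses are stubbed in-memory; endpoint names are illustrative.

SIGNATURES = [
    {"endpoint": "/users", "expectedStatus": 200,
     "requiredFields": ["data", "pagination"], "forbiddenFields": ["debugToken"]},
    {"endpoint": "/orders", "expectedStatus": 200,
     "requiredFields": ["data"], "forbiddenFields": []},
]

STUB_RESPONSES = {
    "/users": (200, {"data": [], "pagination": {}, "debugToken": "oops"}),
    "/orders": (200, {"data": []}),
}

def run_suite(signatures, responses):
    """Return {endpoint: [violations]} for every signature, data-driven."""
    report = {}
    for sig in signatures:
        status, body = responses[sig["endpoint"]]
        problems = []
        if status != sig["expectedStatus"]:
            problems.append(f"status {status}")
        problems += [f"missing {f}" for f in sig["requiredFields"] if f not in body]
        problems += [f"forbidden {f}" for f in sig["forbiddenFields"] if f in body]
        report[sig["endpoint"]] = problems
    return report

print(run_suite(SIGNATURES, STUB_RESPONSES))
```

Adding an endpoint means adding one dict, not another `if` branch; reviewers diff data, not control flow.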
2. Use multi-signal assertions
Do not rely on status codes alone.
Check:
- Status code
- Response body
- Schema
- Required fields
- Forbidden fields
- Headers
- Error envelope
- Latency
- Auth behavior
This reduces false positives from generic success pages, cached responses, or partial failures.
3. Sync signatures centrally
Maigret updates its site database from a central repo.
API teams should do the same with contract definitions and test collections.
Apidog projects support collaborative API design and testing workflows. We covered this workflow in API testing without Postman.
4. Detect drift with scheduled replay
Set up scheduled runs against known-good fixtures.
Minimum workflow:
- Save a valid request.
- Save a known-good response.
- Run the request periodically.
- Compare the live response to the expected contract.
- Alert on breaking changes.
5. Use LLMs only after deterministic checks
Good use:
Summarize these failed API assertions for Slack.
Bad use:
Decide whether this API response is correct.
Let deterministic tests decide correctness. Let the LLM make the output easier to read.
Common pitfalls when running Maigret
Running without -a and assuming the scan is complete
By default, Maigret focuses on a smaller set of popular sites.
Use -a for the full database:
maigret username -a
Expect the run to take longer.
Ignoring tags
Use tags to narrow by category or country.
Example:
maigret username --tags social
Tag filtering helps when your scope is region-specific or platform-specific.
Skipping updates
Old signatures cause false positives and false negatives.
Force an update before serious work:
maigret --update
Misreading Tor blocks
Some sites block Tor exit nodes. A Tor block is not evidence about the username. It is evidence about the network path.
Treating extracted fields as proof
Maigret extracts what public pages expose. Pages can be fake, stale, or impersonated.
Always verify before using findings in a report.
Real-world implementation patterns
Red-team scoping
A consultancy can use Maigret at the start of a scoped engagement:
- Confirm the client’s written authorization.
- Define usernames, brand names, and domains.
- Run Maigret.
- Manually validate relevant public accounts.
- Include public exposure in the kickoff report.
Fraud investigation summaries
An investigator can run a broad scan and use --ai to summarize deterministic findings for non-technical readers.
The split is important:
- Maigret generates the data.
- The LLM summarizes the report.
- The investigator verifies the conclusions.
API regression testing
An engineering team can apply Maigret’s architecture to internal APIs:
- Store endpoint contracts as signatures.
- Run multi-signal assertions.
- Save known-good fixtures.
- Schedule replay.
- Alert on drift.
- Summarize failures for the team.
This is where Apidog fits naturally: define the API, save examples, add assertions, run tests, and detect contract changes before clients break.
Conclusion
Maigret is worth studying because it solves a hard maintenance problem: thousands of detection rules across changing external surfaces.
The transferable ideas are:
- Store detection logic as versioned signatures.
- Use multiple signals instead of one status code.
- Separate found, not found, blocked, rate-limited, and unknown states.
- Keep signatures updated.
- Test signatures against known-good fixtures.
- Treat drift as a first-class failure mode.
- Use LLMs for summaries, not decisions.
For API teams, the next step is practical: open Apidog, pick one important endpoint, and model it like a Maigret signature.
Define:
- Expected status
- Required fields
- Forbidden fields
- Required headers
- Error shape
- Saved known-good response
- Scheduled drift check
That discipline pays off the first time a vendor renames a field at 2 a.m. and your test suite catches it before users do.
FAQ
Is Maigret legal to use?
It depends on the jurisdiction and the target.
Running it on yourself, accounts you own, a company you are authorized to test, or approved journalism is generally different from running it on an unsuspecting individual. Targeting private people without consent can cross stalking, harassment, privacy, or computer-misuse laws.
Check your local rules and get authorization.
Does Maigret work without Python?
The official package requires Python 3.10+. The author also maintains a Telegram bot and a Cloud Shell setup for users who do not want a local install.
How accurate is the 3,000-site claim?
The repository database contains 3,000+ entries, but not every site is active or reachable at all times. Auto-updates and community maintenance keep a working subset current. Tags help narrow the scan to sites relevant to your scope.
What does the AI mode add?
The --ai flag summarizes deterministic findings with an OpenAI-compatible LLM endpoint. It does not change account detection. You provide the API key.
Can I use Maigret in CI?
For OSINT investigations, usually no; they require human review and context.
But Maigret’s architecture belongs in CI for API testing: signature databases, fixture replay, drift detection, and deterministic assertions. Apidog supports those API testing workflows.
How is Maigret different from Sherlock?
Sherlock is the older and simpler username-search tool. Maigret extends the idea with richer site signatures, information extraction, recursive search, captcha handling, AI summary mode, and a larger database. Both are MIT-licensed.
Where do I report a stale signature?
Use GitHub issues or pull requests in the Maigret repository. Community contributions keep the database current, and one pull request per stale site is the usual workflow.