GnomeMan4201

Posted on May 12

A Prompt Is a Control Surface, Not a Magic Spell

#ai #llm #promptengineering #security

Most prompt books optimize for better answers. I wanted prompts that fail visibly.

Most prompt collections are fine if all you need is a nicer answer. They save time. If you've never thought about how to ask an AI to reformat a table or draft a meeting summary, someone compiled a list and you can paste from it. Useful. Fine.

But that is not the problem I was trying to solve.

I Got Tired of Prompt Dumps

The problem I had was this: I was using LLMs for serious work. Building tools that run in production. Running security research. Publishing findings I have to defend. The AI-assisted parts of that workflow needed to hold up — not just in demos, not just on clean inputs, but under pressure.

Messy inputs. Adversarial conditions. Situations where a confident-sounding wrong answer is worse than no answer. Systems where someone might actively try to manipulate what the model does.

A prompt dump gives you words to paste. A field manual gives you a way to test what comes back.

That distinction is the whole thing. Most prompt collections stop at the first part. I needed the second.

So I wrote The GNOME Prompt Field Manual: Prompts That Survive Pressure — and this article is an introduction to how it thinks, not a preview of everything in it.

A Prompt Is a Control Surface

Here's where most of the discussion about prompts goes wrong: it treats a prompt as a request. You ask something, you get something back, you judge whether it sounds good.

A prompt is more than that.

A prompt is a control surface. It shapes what the model sees. What it ignores. What kind of answer is structurally allowed by the framing. What failure looks like — and whether you'll recognize failure when it happens. Whether the output can be tested, cited, committed, or published.

When you understand a prompt that way, three things follow.

First: a prompt is a thinking tool. The framing of a question determines what kinds of answers are possible. A prompt that asks for a steelman produces different cognitive output than one that asks for a critique — not because the model changed, but because the operation changed. Getting this right matters.

Second: a prompt is an attack surface. Anywhere a prompt accepts external input, it can be manipulated. Prompt injection is a real attack class. RAG poisoning is a documented threat. Evaluator capture — where a model used as a judge gets gamed into inflating scores — is a real vulnerability in AI evaluation pipelines. A prompt that doesn't account for adversarial use is not production-ready.

Third: a prompt is a quality filter. A well-constructed prompt raises the floor of what you'll accept as output. If you can't specify what failure looks like, you won't catch it when it happens. Some of the most useful prompts in the manual are designed specifically to detect failure — in the output of other prompts.

Most collections only address the first of these.

The Admission Filter

Every prompt in the manual was tested against ten inclusion criteria before it was allowed in. A prompt had to do at least one of the following:

Reveal a hidden assumption
Convert failure into a test
Separate signal from noise
Harden an idea against attack
Protect a system from bad inputs
Turn messy work into a reusable artifact
Expose a trust boundary
Improve decision quality under uncertainty
Help a builder ship safer or cleaner
Produce output that can be reused, tested, cited, published, or committed

"It's useful" is not on the list. "It's interesting" is not on the list.

A prompt that just gets you a better answer to a question you already knew how to ask is a fine prompt. It belongs somewhere else.

This filter eliminated a lot of prompts that felt good. The first draft had 22 prompts. Every one of them passed the criteria when checked in isolation. But "passes at least one criterion" turned out to be a floor, not a standard. A prompt can technically reveal a hidden assumption while being indistinguishable from every other "list your assumptions" prompt in every other AI book.

I ran a written audit. Each prompt got a verdict: KEEP, UPGRADE, CUT, or MERGE — with specific documented reasons. Not vibes. Specific deficiencies, specific fixes. If the fix wasn't visible in the revised version, the upgrade didn't happen.

A prompt had to do serious work before it earned a slot.

Example: The Idea Stress-Test

Here's one of the core entries — not the full manual text, just enough to show how the framework works in practice.

The Idea Stress-Test is a six-lens pressure test. Each lens is a distinct attack angle, designed to assault the idea from a non-overlapping position. The output is a structured report ending in a single weakest-link verdict and a minimum test requirement.

Lens 1 — Assumption Lens
What are the load-bearing assumptions this idea requires to be true? Which assumption is most likely false? Which is least verifiable?

Lens 2 — Adversarial User Lens
Who would use this idea in ways it was not intended? What would a motivated bad actor, competitor, or non-compliant user do with it, to it, or against it? How does that use break the idea's core value?

Lens 3 — Historical Analog Lens
What similar idea has been tried before? What happened? Where did it succeed and fail? What does the analog predict about this idea's failure mode?

Lens 4 — Incentive Lens
Who benefits if this succeeds? Who is harmed or threatened by it? Where are the misaligned incentives that will generate resistance, gaming, or sabotage?

Lens 5 — Failure Cascade Lens
If this idea fails, what fails because of it? Map the downstream collapse. What is the maximum realistic blast radius?

Lens 6 — Weakest Link Lens
Given the five lenses above: state one falsifiable condition — "This idea fails if [specific condition]." Why this link and not others?

A short version of the prompt:

Run a six-lens stress test on the following idea.

Each lens is a distinct attack angle. Do not merge lenses or 
treat one as a variant of another.

For each lens:
- state the attack angle
- produce a specific finding, not a generic risk
- rate severity HIGH / MEDIUM / LOW
- recommend one action

After all six lenses:
- identify the single weakest link as a falsifiable condition
- explain why this link matters more than the others
- state the minimum test that determines whether it holds
- give a verdict: PROCEED / REVISE / ABANDON

Idea: [IDEA]

The key instruction buried in the full entry: every finding must be specific to the idea under test. If you can swap the idea out and the findings still read as true, the prompt failed. Rerun with: "No finding should be applicable to a different idea without modification."

That's what I mean by a control surface. The prompt isn't just asking for analysis. It's structuring what kind of analysis is allowed, what counts as failure, and what the output needs to demonstrate before it's worth using.

Why Failure Modes Are Not Optional

Every serious prompt in the manual includes a documented failure mode. Not as a caveat. Not as a footnote. As part of the entry.

A prompt that cannot tell you how it fails is not reliable enough for serious work.

For the Idea Stress-Test, the documented failure mode is specific: the model will list generic risks instead of running the actual lenses. Signs of failure include findings that could apply to any idea — "the assumptions may not hold," "competitors could react negatively." If the adversarial user lens and the incentive lens produce findings that overlap substantially, the lenses weren't actually distinct. The failure mode tells you exactly what to look for and what to do about it.

For the Dense-to-Clear rewrite prompt, the documented failure is different: the model produces a fluent rewrite that loses technical force, compresses meaning, or weakens claims — and it does this smoothly enough that the loss isn't obvious unless you check. The rewrite is easier to read and less accurate. That's why the prompt mandates a rule-application log: a written record of every change made and why. Without the log, there is no accountability.

The failure mode is how you test the prompt. Without it, you're running a process you can't verify.

Anti-Prompts

The manual contains twenty anti-prompts. These are not regular prompts.

Anti-prompts are diagnostic tools run on the output of other prompts. You use them when you suspect a model has given you something that looks right but isn't. They are not about generating better output. They are about catching bad output before it becomes a decision, a deployment, or a published claim.

A few examples of what they detect:

Over-Smoothing Detector — Catches the failure where an AI rewrite has averaged away the specificity, friction, and technical precision that made the original worth using. Checks for specificity loss, tone flattening, claim weakening, missing content. Returns a verdict: PRESERVED / SLIGHTLY SMOOTHED / SIGNIFICANTLY SMOOTHED / GUTTED.

Confidence Laundering Probe — Detects the specific technique where uncertain or weak evidence is made to appear strong through structure, repetition, or rhetorical framing — without any actual improvement in the underlying evidence. Six named techniques: citation laundering, consensus laundering, repetition-as-evidence, precision-as-confidence, structure-as-authority, appeal to publication.

Sycophancy Tripwire — Probes whether a model is agreeing with your framing rather than evaluating it. Sycophancy failure is specific: the model detected what you wanted to hear and gave you that instead of what you asked for. The tripwire surfaces it.

Injection Residue Check — Checks whether prompt injection residue is still observable in model outputs after a structural refactor. Structural separation alone doesn't block semantic injection. This is the follow-on instrument after you've done the architectural work.

Hallucinated Structure Detector — Audits AI output for organization and structure that sounds rigorous but wasn't present in the source material and doesn't hold up when traced.

The point of an anti-prompt is not to distrust every output. It's to have a test you can run when something feels off — and to run it before you ship, publish, or commit based on the output.

The point is not better-sounding output. The point is output that can survive pressure.

Why the Manual Is Organized by Operational Trigger

The manual is not organized by model capability, topic area, or difficulty level.

It's organized by what you need to do right now.

The sections are: Idea Hardening. Dense-to-Clear Without Weakening. Turning Raw Work Into Evidence. Build, Break, Observe. Prompts That Survive Attack. Publishing Serious Artifacts. Then the chains. Then the anti-prompts. Then the field cards.

I don't care which model you're using. I care whether the output can be tested, cited, committed, published, or safely used. Those are operational questions, not capability questions. The right entry point to the manual is: what problem just surfaced, and what needs to hold up?

The security section — prompt injection recognition, RAG poisoning audits, evaluator capture scanning, command/data separation, trust boundary mapping, tool-agent offense probing — exists because those are real failure modes in real deployed systems, not theoretical edge cases. If you're building anything that processes external inputs with an LLM, these prompts belong in your workflow. Not as theory. As instruments.

The field cards in the last section are for live use. When you're in the middle of something and don't have time to read a full entry, the field card gives you the prompt and the failure mode. That's all you need under pressure.

That's what makes it a field manual.

Closing

There are plenty of ways to make AI produce better-sounding output. Better framing, better context, better format instructions. That work is real.

But "sounds better" and "can be trusted" are different things. The difference shows up when inputs are bad. When someone is trying to break the system. When you need to cite what came back. When a confident-sounding wrong answer has real consequences.

The prompts in this manual were selected because they work under conditions that generic prompts don't. Because they fail visibly when they fail. Because they produce work you can actually test.

Prompts should not just make AI sound smarter. They should make failure harder to hide.

I'm building this as The GNOME Prompt Field Manual: Prompts That Survive Pressure — 50 prompts, 10 chains, and 20 anti-prompts, organized by operational trigger.

Top comments (1)

GnomeMan4201 • May 12

This post is the framing layer for the manual.

The part I’m most interested in building out publicly is the anti-prompt side: prompts that test whether another prompt failed.

If there’s interest, I’ll write the next post around one concrete anti-prompt probably Over-Smoothing Detector or Confidence Laundering Probe and show it running against a real AI-generated output.