Write for the Extractor: The Craft of Getting Quoted by an Answer Engine

Answer engines retrieve passages and synthesize an answer, so getting cited is a craft: lead each chunk with a self-contained claim, make it survive being torn out of context, and hand the model the cleaner, more attributable fact than your competitors did.

By Mehdi, 7 min read

To be quoted by an answer engine, you have to write for how it actually reads, and it does not read the way a person does. It retrieves chunks, ranks them, and synthesizes a few of them into an answer. That single mechanical fact governs which sentences get lifted into the response and which sit unread. Most advice about "optimizing for AI" skips the mechanism and jumps to vibes. The craft is downstream of the pipeline, so start there.

Here is the pipeline in one breath. Your page is split into passages and embedded as vectors. A user's question is embedded the same way. The engine pulls the passages nearest the question, sometimes reranks them with a heavier model, and feeds the top handful into a generator that writes prose and, if you are lucky, attributes a claim to you. The unit of competition is not your page. It is the passage. You are not ranking a document; you are making individual chunks that a synthesizer prefers to quote over the ones it retrieved from everyone else answering the same question.
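A minimal sketch of that retrieve-and-rank step, under loud assumptions: `embed` here is a bag-of-words counter standing in for a learned embedding model, and the function names and sample passages are illustrative, not any engine's actual implementation. Real retrievers match meaning rather than shared tokens; the toy only shows that scoring happens passage by passage.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every passage against the query and keep the top k."""
    q = embed(query)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Illustrative passages; each one competes on its own, not as part of a page.
passages = [
    "Checkout completion rose from 61 percent to 78 percent in the measured cohort.",
    "As I argued above, this changes everything.",
    "Our team had a great offsite last quarter.",
]
top = retrieve("how much did checkout completion rise", passages, k=1)
print(top[0][1])
```

In this toy run the quantified passage ranks first for the checkout question, while the passage that opens with a backward reference shares no terms with the query and scores zero.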

Once you see it as a passage-level contest, the tactics stop being folklore and become derivable. I run this publication on these rules, so I will show the work rather than assert it.

Lead each passage with the claim, because the synthesizer weighs the opening first

The strongest move is also the oldest writing advice: put the claim first. It matters more here, for a specific reason. When a passage gets retrieved and handed to the generator, the model weighs the opening of that chunk heavily when deciding what the chunk is about and whether it answers the question. A paragraph that spends three sentences clearing its throat buries the answer below the fold of the model's attention. The retriever may still surface it, but the synthesizer reaches past it for a competitor who said the thing in sentence one.

So write the way a good abstract is written. First sentence: the answer. Everything after: the support, the boundary conditions, the mechanism. This is the same lead-with-the-claim discipline that separates writing a busy person will forward from writing that person will abandon. The answer engine just enforces it mechanically instead of politely clicking away. If you already write thesis-first, you were doing GEO (generative engine optimization) before it had a name.

Make every section survive being torn out of context

Retrieval decontextualizes. A chunk that reads perfectly in place turns to gibberish the moment it is pulled out and dropped next to three passages from other sites. "As I argued above, this changes everything" is dead on arrival — there is no "above" in the retrieved context, and there is no "this." The pronoun points at nothing the model can see.

Write each section as if it might be the only part of your work the engine ever sees. Restate the subject by name instead of leaning on a pronoun that reaches backward. Re-anchor the claim to its entity. Let the section carry enough of its own scaffolding that a stranger reading only that block understands what is being asserted and who is asserting it. This is defensive writing against decontextualization, and it doubles as an attribution safeguard: a self-contained passage that names its subject is far likelier to be quoted with your identity attached than a fragment that only makes sense inside a larger argument. A chunk that attributes correctly on its own is a chunk the model can safely cite.

There is a cost. Self-contained sections repeat a little. A human reading top to bottom will notice you restated the subject. That mild redundancy is the premium you pay for surviving retrieval, and against the alternative — being retrieved and then discarded as unquotable — it is cheap.
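One rough way to catch passages that lean on missing context is a lint pass over each section. The sketch below is a heuristic under stated assumptions: the phrase list and the `context_leaks` helper are hypothetical and far from exhaustive, and an opening "this" will sometimes be a false positive.

```python
import re

# Hypothetical, non-exhaustive phrases that point at text outside the passage.
BACKWARD_REFERENCES = [
    r"\bas (?:i|we) (?:argued|said|noted|showed|mentioned)\b",
    r"\b(?:see|as) above\b",
    r"\bas mentioned\b",
    r"\bthe (?:former|latter)\b",
    r"\bthe previous section\b",
]

# A passage that opens on a bare pronoun has often lost its antecedent.
OPENING_PRONOUN = re.compile(r"^(?:this|that|it|these|those|they)\b", re.IGNORECASE)


def context_leaks(passage: str) -> list[str]:
    """Return the phrases in a passage that depend on text outside it."""
    text = passage.strip()
    found = [
        match.group(0)
        for pattern in BACKWARD_REFERENCES
        for match in re.finditer(pattern, text, re.IGNORECASE)
    ]
    opening = OPENING_PRONOUN.match(text)
    if opening:
        found.append(opening.group(0))
    return found


print(context_leaks("As I argued above, this changes everything."))
print(context_leaks("A methylation-based clock estimates biological age from DNA methylation."))
```

The first call flags the backward reference; the second passage names its own subject and comes back clean. A flagged passage is a prompt to restate the subject by name, not an automatic rewrite.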

Give the synthesizer the precise, sourced version of the fact

When a generator is composing an answer and needs a number, a date, a definition, or a mechanism, it does not want the vaguest available phrasing. It wants the one it can state confidently and attribute cleanly. Between "engagement went up a lot" and "checkout completion rose from 61 percent to 78 percent in the cohort we measured," the second is not just better prose. It is more citable. It carries a specific, checkable value the model can drop into an answer and point back at you.

Specificity is retrieval bait of the honest kind. Quantified claims, named mechanisms, explicit definitions, and dated facts are exactly what factual queries retrieve toward, and exactly what a synthesizer prefers to lift, because a precise attributable fact lowers its risk of saying something wrong. Corroboration compounds this. When your specific claim agrees with other credible sources the engine has also retrieved, it becomes the low-risk pick — the version the model can state without hedging. You want to be the cleanest statement of a fact that the rest of the record already supports.

I feel this acutely writing from computational biology, where the gap between a citable claim and a vague one is the gap between a result and a rumor. "An epigenetic clock predicts age" is nearly useless. "A methylation-based clock estimates biological age from DNA methylation at a defined set of CpG sites, with residual error driven substantially by batch effects and cell-type composition" is a claim with edges — it says what is predicted, from what, and where the noise lives. The bounded version is the one a careful synthesizer trusts, precisely because it declares its own limits. Unbounded confidence reads as unreliable to a model that has seen a million overclaims. State the boundary conditions and you become more quotable, not less.

Make the structure legible, because structure is how chunks get cut and matched

Chunking is not magic; it is mostly boundaries. Clear headings, direct question-and-answer blocks, and explicit definitions give the splitter clean seams to cut along, and they give the retriever tight semantic units to match against a query. A wall of undifferentiated text gets chopped at arbitrary points, and every chunk arrives half-formed. A page with a heading that states a question a user would actually type, followed by a passage that answers it directly, is nearly purpose-built for the retrieval step.

This is why an honest FAQ block outperforms its length. A real question paired with a self-contained answer is the retrievable unit in miniature — the question mirrors the query, the answer stands alone, and the boundary is unambiguous. It is not a trick. It is the pipeline's preferred shape, offered directly. This publication uses thesis-first openings, sections that stand on their own, structured data, and FAQ blocks for exactly this reason, and the mechanism above is the whole justification.
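To make the "clean seams" point concrete, here is a sketch comparing a heading-aware splitter with a fixed-width one. It assumes headings are marked with a `## ` prefix, which is an illustrative convention and not something every engine or every page uses; `chunk_by_heading` and `chunk_fixed` are hypothetical helper names.

```python
def chunk_by_heading(text: str, marker: str = "## ") -> list[dict]:
    """Split at heading lines so each chunk is one heading plus its body.

    Text that appears before the first heading is ignored in this sketch.
    """
    sections: list[dict] = []
    current = None
    for line in text.splitlines():
        if line.startswith(marker):
            current = {"heading": line[len(marker):].strip(), "body": []}
            sections.append(current)
        elif current is not None and line.strip():
            current["body"].append(line.strip())
    return [
        {"heading": s["heading"], "text": s["heading"] + "\n" + " ".join(s["body"])}
        for s in sections
    ]


def chunk_fixed(text: str, size: int) -> list[str]:
    """Split every `size` characters, regardless of where sentences end."""
    return [text[i:i + size] for i in range(0, len(text), size)]


page = (
    "## What does a methylation clock estimate?\n"
    "A methylation-based clock estimates biological age from DNA methylation "
    "at a defined set of CpG sites.\n"
    "\n"
    "## Where does the error come from?\n"
    "Residual error is driven substantially by batch effects and cell-type composition.\n"
)

for chunk in chunk_by_heading(page):
    print(repr(chunk["text"]))
print(repr(chunk_fixed(page, 50)[0]))
```

The heading-aware chunks each carry a question and its complete answer. The first fixed-width chunk stops partway through the first answer, which is the half-formed unit a wall of text tends to produce.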

Keep entities and terms consistent, so the model reliably associates the claim with you

Language models resolve entities probabilistically. If you call your product three slightly different things across a page, and refer to your core concept by four near-synonyms, you smear the association the model is trying to form. When it later reaches for the claim, it may attribute a fuzzier version to no one in particular, or to a competitor whose terminology was crisp. Consistency is how you get named. Pick the canonical term for your concept and your product and use it, verbatim, every time it matters. Kommerce is Kommerce in every passage, not "our commerce platform" here and "the trust layer" there. The reward for terminological discipline is a stable link between a claim and your name, handed to the user as attribution.
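Terminology drift is easy to audit mechanically. The sketch below counts how often a canonical term and its known variants appear in a page; the helper names are hypothetical, the sample sentences are placeholders, and the variant list has to be supplied by hand because the code cannot discover synonyms on its own.

```python
import re
from collections import Counter


def term_usage(text: str, canonical: str, variants: list[str]) -> Counter:
    """Count whole-phrase, case-insensitive occurrences of each term."""
    counts: Counter = Counter()
    for term in [canonical, *variants]:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, text, re.IGNORECASE))
    return counts


def drifting_terms(text: str, canonical: str, variants: list[str]) -> list[str]:
    """Return the variants that appear at all, i.e. places to use the canonical term."""
    usage = term_usage(text, canonical, variants)
    return [term for term in variants if usage[term] > 0]


# Placeholder copy: X, Y, Z, and W stand in for real claims.
sample = (
    "Kommerce does X. Our commerce platform does Y. "
    "The trust layer does Z. Kommerce does W."
)
variants = ["our commerce platform", "the trust layer"]

print(term_usage(sample, "Kommerce", variants))
print(drifting_terms(sample, "Kommerce", variants))
```

Here the canonical name appears twice and each variant once, so both variants are reported as candidates to replace with "Kommerce".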

What does not work, said plainly

Keyword stuffing is dead and mildly counterproductive; retrievers match meaning, not term frequency, and dense repetition reads as low-signal. Spinning up fifty thin rephrasings of one page does not create fifty citable sources — it creates near-duplicates that dilute your own signal and lose to whichever version is most precise. Walls of undifferentiated prose get chunked badly and retrieved worse. And none of this rescues a page with nothing to say. The entire mechanism rewards specificity, so if you have no specific claim, there is nothing for the extractor to extract. The floor is having a real, precise thing to assert. Everything above only helps the engine find and quote what is already worth quoting.
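Thin rephrasings can be spotted before they are published. The sketch below measures overlap between two passages with Jaccard similarity over three-word shingles; this is one common near-duplicate heuristic chosen for illustration, not a description of how any particular engine deduplicates, and any threshold for "too similar" would be an assumption to tune.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Shared shingles divided by all shingles; 1.0 means identical sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "checkout completion rose from 61 percent to 78 percent"
rephrase = "checkout completion rose from 61 percent to 78 percent overall"
unrelated = "our team had a great offsite last quarter"

print(jaccard(original, rephrase))   # 0.875
print(jaccard(original, unrelated))  # 0.0
```

A score near 1.0 between two of your own pages suggests they compete for the same retrieval slot rather than adding a second citable source.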

I will be honest about the flux. This surface is young and moving. Chunking strategies, reranking, and attribution behavior differ across engines and change without notice, and measurement is still crude: you can watch whether the major engines cite you and quote you accurately, but the feedback loop is slow and noisy. Treat the specifics as a live bet, not settled science. What is not a bet is the direction: retrieval-then-synthesis over passages is how these systems work, and writing that serves that pipeline will keep paying while the details churn. This is the tactical companion to the strategic case in "GEO Is the New SEO"; that piece argues why the channel matters, while this one covers the craft at the passage level.

Notice what every rule reduces to. Lead with the claim, make it self-contained, make it specific, make its structure legible, name it consistently — each is an instruction to state exactly what you are asserting, with its edges, so a machine cannot mistake it. That is problem specification pointed at your own prose: the discipline of saying precisely what you mean is now the discipline of getting quoted. The extractor rewards the writer who already knew what they were claiming. It was never optimization. It was clarity, finally load-bearing.

Frequently asked questions

Is writing for answer engines different from writing well for humans?
Mostly it is the same discipline enforced harder. Leading with the claim, self-contained sections, specific and sourced facts, consistent terminology: these are old marks of good writing. The retriever just punishes their absence more mechanically than a forgiving human reader does. The one genuinely new move is defensive structure: assume any single passage may be pulled out of context and must still make sense and still attribute to you.

Does keyword stuffing or spinning up many thin rephrasings help?
No, and it can hurt. Synthesizers reach for the single most precise, corroborated, attributable version of a fact. Keyword-dense or thinly reworded pages read as lower-signal near-duplicates and lose to the source that states the number cleanly with a citation. You want to be the canonical statement of one claim, not fifty blurry statements of it.

How do I know if any of this is working?
Measurement is genuinely immature. Query the major answer engines for the questions you want to own and check whether you are cited and quoted accurately, track referral traffic arriving from those surfaces, and watch whether models attribute your specific numbers and terms back to you. Treat the whole practice as a live bet under flux, not a settled channel, and instrument before you invest heavily.

Filed under Marketing & Growth. Distribution as a discipline, not a growth hack.

