Generating the Markdown — content negotiation guide

Content negotiation is only useful if you have Markdown to negotiate. Three approaches, in rough order of fidelity:

1. Markdown is the source

Your content is authored in Markdown or MDX. You render it to HTML for browsers and serve the original (or lightly processed) Markdown to agents.

When this fits: blogs, docs sites, anything static-site-generator driven. Hugo, Astro, Next.js with MDX, Eleventy, Jekyll — all ship content as Markdown files already.

How:

The build step emits page.html and page.md side by side.
At request time, the server chooses which one to serve based on the Accept header.
Strip front-matter if you don’t want it in the Markdown response.
Consider stripping MDX-specific components (e.g., <Alert>) that don’t render as pure Markdown.
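
The runtime choice between the two build outputs can be sketched as follows. This is a minimal illustration, not a full Accept parser: the function names (`parse_accept`, `pick_variant`) are hypothetical, and the policy (serve Markdown only when text/markdown is listed explicitly and ranked at least as high as HTML) is an assumption chosen so that ordinary browsers keep getting HTML.

```python
def parse_accept(header: str) -> dict[str, float]:
    """Parse an Accept header into {media_type: q}. Parameters other than q are ignored."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def pick_variant(accept_header: str, base: str = "page") -> str:
    """Return '<base>.md' only if text/markdown is explicitly listed and
    ranked at least as high as HTML; otherwise return '<base>.html'."""
    prefs = parse_accept(accept_header or "")
    md_q = prefs.get("text/markdown", 0.0)
    # Fall back to wildcards when text/html is not listed explicitly.
    html_q = prefs.get("text/html", prefs.get("text/*", prefs.get("*/*", 0.0)))
    if md_q > 0 and md_q >= html_q:
        return f"{base}.md"
    return f"{base}.html"
```

A request with `Accept: text/markdown` gets `page.md`, while a typical browser header such as `text/html,application/xhtml+xml,*/*;q=0.8` gets `page.html`, because a bare wildcard is never treated as a request for Markdown.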

This is the highest-fidelity approach because the Markdown never round-trips through HTML.

2. Database or CMS content, dual-rendered at write-time

Your content is in a CMS or a database. Content is authored in rich-text or block-based editors and stored as HTML or structured JSON.

How:

When content is saved, run the HTML through a converter and store both representations.
The cost is storage: the content table grows by the size of the Markdown variant.
Serve the cached Markdown at request time — no runtime conversion.

Converters:

3. Runtime HTML-to-Markdown conversion

Your content is dynamic, not stored, and you can’t touch the render pipeline. You negotiate Accept at the edge or proxy, fetch the HTML, and convert on the fly.

How:

A reverse proxy (Worker, Lambda, middleware) intercepts requests.
On Accept: text/markdown, it fetches the HTML from origin.
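It runs an HTML-to-Markdown converter on the response body.
It returns the Markdown with the correct headers: Content-Type: text/markdown, plus Vary: Accept so caches keep the two variants apart.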
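
The four steps can be sketched as a framework-neutral handler. Everything here is illustrative: `handle`, `wants_markdown`, and the `fetch_origin` and `convert` callables are hypothetical names, and the request headers are assumed to arrive as a dict keyed by `"Accept"`. A real proxy would plug in its own HTTP client and converter.

```python
from typing import Callable

Response = tuple[int, dict[str, str], str]  # (status, headers, body)


def wants_markdown(accept: str) -> bool:
    """Deliberately simple check: text/markdown is listed with q > 0."""
    for part in accept.split(","):
        fields = [f.strip().lower() for f in part.split(";")]
        if fields[0] != "text/markdown":
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def handle(
    path: str,
    request_headers: dict[str, str],
    fetch_origin: Callable[[str], Response],
    convert: Callable[[str], str],
) -> Response:
    """Fetch HTML from origin and convert it only when Markdown was requested."""
    status, origin_headers, body = fetch_origin(path)
    origin_type = origin_headers.get("Content-Type", "text/html; charset=utf-8")
    # Vary: Accept goes on both variants so shared caches keep them separate.
    headers = {"Vary": "Accept"}
    is_html = origin_type.startswith("text/html")
    if status == 200 and is_html and wants_markdown(request_headers.get("Accept", "")):
        headers["Content-Type"] = "text/markdown; charset=utf-8"
        return status, headers, convert(body)
    headers["Content-Type"] = origin_type
    return status, headers, body
```

Error responses and non-HTML assets pass through unconverted, which keeps the proxy from mangling images, JSON, or error pages.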

The Roots post-content-to-markdown plugin is this approach for WordPress — it converts a post’s content to Markdown on request.

Cloudflare’s Markdown for Agents is the managed version of this approach. If you need it self-hosted, a Cloudflare Worker running turndown takes roughly 20 lines.

Tradeoffs:

Runtime cost per request (unless you cache aggressively).
Lossy on complex content — custom components, embedded widgets, interactive elements don’t translate.
No fidelity guarantees: the converter sees only the rendered markup, so the same content may convert differently after a template or markup change.
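
One way to blunt the per-request cost is to cache conversions keyed on a hash of the HTML, so a page is converted again only when its markup actually changes. The class below is a hypothetical in-memory sketch; a real deployment would more likely use the platform's cache or a key-value store, and would need an eviction policy, which this sketch omits.

```python
import hashlib
from typing import Callable


class ConversionCache:
    """Memoize HTML-to-Markdown conversions by content hash (unbounded, in-memory)."""

    def __init__(self, convert: Callable[[str], str]) -> None:
        self._convert = convert
        self._store: dict[str, str] = {}
        self.misses = 0  # how many times the converter actually ran

    def get(self, html: str) -> str:
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._convert(html)
        return self._store[key]
```

Keying on the content rather than the URL means a template change invalidates the entry automatically, since the hash of the new HTML differs.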

What to strip from the Markdown

Regardless of approach, the Markdown representation should be just the content. Strip:

Site navigation and footer chrome
Related-content sidebars and widgets
Share buttons, social metadata
Cookie banners and consent UI
Advertisements
Newsletter signup forms
Comment threads (unless integral to the content)

If your HTML renders the content inside a specific container (<main>, <article>, .post-body), scope the conversion to that container.
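
Scoping to a container can be sketched with the standard library's `html.parser`. The extractor below keeps the inner HTML of the first `<main>` or `<article>` and drops a few chrome elements nested inside it. The tag sets and the names `ContentExtractor` and `extract_content` are assumptions for illustration; selecting by class (such as .post-body) would need extra attribute checks that are not shown.

```python
from html.parser import HTMLParser

CONTAINERS = {"main", "article"}  # assumed content containers
SKIP = {"nav", "footer", "aside", "form", "script", "style"}  # assumed chrome


class ContentExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=False)  # keep entities as written
        self.out: list[str] = []
        self.container: str | None = None  # tag name of the container found
        self.depth = 0                     # nesting of same-named tags
        self.skip_tag: str | None = None
        self.skip_depth = 0
        self.done = False

    def _active(self) -> bool:
        return self.container is not None and self.skip_tag is None and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.container is None:
            if tag in CONTAINERS:
                self.container, self.depth = tag, 1
            return
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth += 1
            return
        if tag in SKIP:
            self.skip_tag, self.skip_depth = tag, 1
            return
        if tag == self.container:
            self.depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._active():  # self-closing tags such as <br/>
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or self.container is None:
            return
        if self.skip_tag:
            if tag == self.skip_tag:
                self.skip_depth -= 1
                if self.skip_depth == 0:
                    self.skip_tag = None
            return
        if tag == self.container:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # container closed: ignore the rest of the page
                return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._active():
            self.out.append(data)

    def handle_entityref(self, name):
        if self._active():
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self._active():
            self.out.append(f"&#{name};")


def extract_content(html: str) -> str:
    """Return the inner HTML of the first container, or the input if none is found."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.out) if parser.container else html
```

The result is what gets passed to the converter, so navigation, footers, and sidebars never reach it.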

Preserve what matters

Headings and their hierarchy (# through ######)
Links with their text and target
Code blocks with language hints (the triple-backtick fence)
Lists (ordered and unordered)
Emphasis (*italic*, **bold**)
Tables (GFM syntax is widely supported)
Images — include the alt text and URL
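
To make the list concrete, here is a toy converter built on the standard library that maps each preserved element to its Markdown form. It is a sketch under stated assumptions, not a replacement for a real converter: it assumes code blocks are marked up as `<pre><code class="language-…">`, it does no Markdown escaping, and it does not handle tables or blockquotes. The names `MarkdownSketch` and `to_markdown` are hypothetical.

```python
import re
from html.parser import HTMLParser

HEADING = re.compile(r"h([1-6])")


class MarkdownSketch(HTMLParser):
    def __init__(self) -> None:
        super().__init__()  # default convert_charrefs=True: entities arrive decoded
        self.text = ""
        self.hrefs: list[str] = []   # stack of open link targets
        self.lists: list[list] = []  # stack of [kind, item_count]
        self.in_pre = False
        self.fence_open = False      # opening fence line not yet terminated

    def _block(self) -> None:
        """Separate block-level elements with exactly one blank line."""
        self.text = self.text.rstrip()
        if self.text:
            self.text += "\n\n"

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        m = HEADING.fullmatch(tag)
        if m:
            self._block()
            self.text += "#" * int(m.group(1)) + " "  # keep heading level
        elif tag == "p":
            self._block()
        elif tag in ("em", "i"):
            self.text += "*"
        elif tag in ("strong", "b"):
            self.text += "**"
        elif tag == "a":
            self.hrefs.append(a.get("href") or "")
            self.text += "["
        elif tag == "img":
            self.text += f"![{a.get('alt') or ''}]({a.get('src') or ''})"
        elif tag in ("ul", "ol"):
            if not self.lists:
                self._block()
            self.lists.append([tag, 0])
        elif tag == "li" and self.lists:
            self.lists[-1][1] += 1
            kind, n = self.lists[-1]
            self.text = self.text.rstrip(" ")
            if self.text and not self.text.endswith("\n"):
                self.text += "\n"
            indent = "   " * (len(self.lists) - 1)  # rough fixed indent for nesting
            self.text += indent + (f"{n}. " if kind == "ol" else "- ")
        elif tag == "pre":
            self._block()
            self.text += "```"
            self.in_pre = self.fence_open = True
        elif tag == "code":
            if self.in_pre:  # carry the language hint onto the fence
                lang = re.search(r"language-([\w+-]+)", a.get("class") or "")
                self.text += lang.group(1) if lang else ""
            else:
                self.text += "`"

    def handle_endtag(self, tag):
        if HEADING.fullmatch(tag) or tag == "p":
            self._block()
        elif tag in ("em", "i"):
            self.text += "*"
        elif tag in ("strong", "b"):
            self.text += "**"
        elif tag == "a" and self.hrefs:
            self.text += f"]({self.hrefs.pop()})"
        elif tag in ("ul", "ol") and self.lists:
            self.lists.pop()
            if not self.lists:
                self._block()
        elif tag == "pre" and self.in_pre:
            if not self.text.endswith("\n"):
                self.text += "\n"
            self.text += "```"
            self.in_pre = self.fence_open = False
            self._block()
        elif tag == "code" and not self.in_pre:
            self.text += "`"

    def handle_data(self, data):
        if self.in_pre:  # code is copied verbatim
            if self.fence_open:
                self.text += "\n"
                self.fence_open = False
            self.text += data
            return
        chunk = re.sub(r"\s+", " ", data)
        if not self.text or self.text[-1] in "\n ":
            chunk = chunk.lstrip()
        self.text += chunk


def to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return parser.text.strip()
```

For example, `to_markdown('<h2>Setup</h2><p>Read the <a href="/docs">docs</a>.</p>')` yields a level-two heading followed by a paragraph with an inline link, with the heading level and link target intact.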

What not to generate

Front-matter in the response body (YAML --- blocks). Useful for SSG input, noise for agents. Strip before serving.
HTML fallbacks inside Markdown (e.g., <div class="callout">) unless your Markdown flavor requires them. GitHub Flavored Markdown (GFM) expresses most common structures natively.
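
Stripping front-matter before serving can be as small as the sketch below. It assumes YAML front-matter delimited by `---` lines at the very start of the file; the function name is hypothetical. A document that opens with a thematic break and contains a second `---` line later would be misread as front-matter, so treat this as a starting point rather than a complete parser.

```python
import re

# A leading '---' line, an optional body, and a closing '---' line.
FRONT_MATTER = re.compile(
    r"\A---[ \t]*\r?\n(?:.*?\r?\n)?---[ \t]*(?:\r?\n|\Z)", re.S
)


def strip_front_matter(markdown: str) -> str:
    """Remove a leading YAML front-matter block, if present."""
    return FRONT_MATTER.sub("", markdown, count=1).lstrip("\n")
```

Because the pattern is anchored at the start of the text, a `---` horizontal rule further down the document is left alone.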