Generating the Markdown
Three approaches: Markdown as the source of truth, write-time dual rendering, and runtime HTML-to-Markdown conversion. Pick one based on where your content lives.
Content negotiation is only useful if you have Markdown to negotiate. There are three approaches, in rough order of decreasing fidelity:
1. Markdown is the source
Your content is authored in Markdown or MDX. You render it to HTML for browsers and serve the original Markdown, or a lightly processed version of it, to agents.
When this fits: blogs, docs sites, anything static-site-generator driven. Hugo, Astro, Next.js with MDX, Eleventy, Jekyll — all ship content as Markdown files already.
How:
- Build step emits page.html and page.md side by side.
- Runtime chooses which to serve based on Accept.
- Strip front-matter if you don’t want it in the Markdown response.
- Consider stripping MDX-specific components (e.g., <Alert>) that don’t render as pure Markdown.
This is the highest-fidelity approach because you never round-trip.
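The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names strip_front_matter and pick_variant are hypothetical, the front-matter pattern assumes a YAML block delimited by --- at the very top of the file, and the Accept check deliberately ignores q-values.

```python
import re
from pathlib import PurePosixPath

# Assumption: front-matter is a YAML block fenced by '---' lines at the top of the file.
FRONT_MATTER = re.compile(r"\A---[ \t]*\r?\n.*?\r?\n---[ \t]*\r?\n?", re.DOTALL)


def strip_front_matter(markdown: str) -> str:
    """Remove a leading YAML front-matter block, if one is present."""
    return FRONT_MATTER.sub("", markdown, count=1).lstrip("\n")


def pick_variant(html_path: str, accept: str) -> str:
    """Return the sibling .md path when the client lists text/markdown.

    Simplification: q-values and wildcards are ignored here.
    """
    media_types = [part.split(";")[0].strip().lower() for part in accept.split(",")]
    if "text/markdown" in media_types:
        return str(PurePosixPath(html_path).with_suffix(".md"))
    return html_path
```

For example, pick_variant("docs/page.html", "text/markdown") selects docs/page.md, while a browser-style Accept header without text/markdown keeps docs/page.html.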
2. Database or CMS content, dual-rendered at write-time
Your content is in a CMS or a database. Content is authored in rich-text or block-based editors, stored as HTML or structured JSON.
How:
- When content is saved, run the HTML through a converter and store both representations.
- Content table grows by the size of the Markdown variant.
- Serve the cached Markdown at request time — no runtime conversion.
Converters:
- JavaScript: turndown, html-to-md
- Python: html2text, markdownify
- PHP: league/html-to-markdown
- Ruby: reverse_markdown
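A rough sketch of the write-time flow, using SQLite from the Python standard library so it runs without dependencies. The table layout and the function names (init_store, save_content, load_markdown) are assumptions made for illustration. The converter is passed in as a plain callable, so any of the libraries listed above could be plugged in; none of them is called here.

```python
import sqlite3
from typing import Callable, Optional


def init_store(conn: sqlite3.Connection) -> None:
    """Create a hypothetical content table holding both representations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS content ("
        "slug TEXT PRIMARY KEY, html TEXT NOT NULL, markdown TEXT NOT NULL)"
    )


def save_content(
    conn: sqlite3.Connection,
    slug: str,
    html: str,
    convert: Callable[[str], str],
) -> None:
    """Convert once at save time and store HTML and Markdown together."""
    markdown = convert(html)
    conn.execute(
        "INSERT OR REPLACE INTO content (slug, html, markdown) VALUES (?, ?, ?)",
        (slug, html, markdown),
    )
    conn.commit()


def load_markdown(conn: sqlite3.Connection, slug: str) -> Optional[str]:
    """Serve the stored Markdown; no conversion happens at request time."""
    row = conn.execute(
        "SELECT markdown FROM content WHERE slug = ?", (slug,)
    ).fetchone()
    return row[0] if row else None
```

Because the conversion runs on every save, the stored Markdown cannot drift from the stored HTML, and the request path is a single lookup.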
3. Runtime HTML-to-Markdown conversion
Your content is dynamic, not stored, and you can’t touch the render
pipeline. You negotiate Accept at the edge or proxy, fetch the HTML,
and convert on the fly.
How:
- A reverse proxy (Worker, Lambda, middleware) intercepts requests.
- On Accept: text/markdown, it fetches the HTML from origin.
- Runs an HTML-to-Markdown converter.
- Returns the Markdown with correct headers.
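The proxy logic above might look like the following sketch. It is framework-agnostic and makes several assumptions: prefers_markdown and respond are hypothetical names, fetch_html and convert stand in for the origin fetch and whichever converter you choose, and the Accept parsing is a simplification that compares only the q-values of text/markdown and text/html and ignores wildcards. Setting Vary: Accept tells caches that the response depends on the request's Accept header.

```python
from typing import Callable, Dict, Tuple


def prefers_markdown(accept: str) -> bool:
    """True when text/markdown is listed with q > 0 and ranks at least as high as text/html."""
    weights: Dict[str, float] = {}
    for part in accept.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a media type without an explicit q parameter has weight 1
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        weights[media_type] = q
    markdown_q = weights.get("text/markdown", 0.0)
    html_q = weights.get("text/html", 0.0)
    return markdown_q > 0 and markdown_q >= html_q


def respond(
    accept: str,
    fetch_html: Callable[[], str],
    convert: Callable[[str], str],
) -> Tuple[str, Dict[str, str]]:
    """Fetch HTML from origin and convert it only when Markdown is preferred."""
    html = fetch_html()
    headers = {"Vary": "Accept"}
    if prefers_markdown(accept):
        headers["Content-Type"] = "text/markdown; charset=utf-8"
        return convert(html), headers
    headers["Content-Type"] = "text/html; charset=utf-8"
    return html, headers
```

A typical browser Accept header does not list text/markdown, so it falls through to the HTML branch unchanged.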
The Roots post-content-to-markdown
plugin is this approach for WordPress — it converts a post’s content
to Markdown on request.
Cloudflare’s
Markdown for Agents is this
approach, managed. If you need it self-hosted, a Cloudflare Worker
running turndown is a 20-line implementation.
Tradeoffs:
- Runtime cost per request (unless you cache aggressively).
- Lossy on complex content — custom components, embedded widgets, interactive elements don’t translate.
- No fidelity guarantees: the same content may convert differently after a template or markup change, because the converter only sees whatever HTML the page emits.
What to strip from the Markdown
Regardless of approach, the Markdown representation should be just the content. Strip:
- Site navigation and footer chrome
- Related-content sidebars and widgets
- Share buttons, social metadata
- Cookie banners and consent UI
- Advertisements
- Newsletter signup forms
- Comment threads (unless integral to the content)
If your HTML renders the content inside a specific container (<main>,
<article>, .post-body), scope the conversion to that container.
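One way to scope the conversion, sketched with Python's standard html.parser. The class and function names are hypothetical, and the sketch matches by tag name only (main, article); selecting by class, such as .post-body, would need a real selector library. It returns the inner markup of the first matching element, or an empty string when none is found, and leaves the fallback decision to the caller.

```python
from html.parser import HTMLParser
from typing import List


class ContainerExtractor(HTMLParser):
    """Collect the inner markup of the first element with the given tag name."""

    def __init__(self, tag: str) -> None:
        super().__init__(convert_charrefs=False)
        self.tag = tag
        self.depth = 0  # nesting depth of self.tag while capturing
        self.done = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag:
            self.depth = 1  # start capturing after the opening tag

    def handle_startendtag(self, tag, attrs):
        if self.depth > 0 and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or self.done:
            return
        if tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # the container closed; ignore the rest
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0 and not self.done:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0 and not self.done:
            self.parts.append(f"&#{name};")


def extract_container(html: str, tag: str = "main") -> str:
    """Return the inner HTML of the first <tag> element, or '' if absent."""
    parser = ContainerExtractor(tag)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

Passing only the extracted fragment to the converter keeps navigation, footers, and banners out of the Markdown without needing a rule for each of them.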
Preserve what matters
- Headings and their hierarchy (# through ######)
- Links with their text and target
- Code blocks with language hints (the triple-backtick fence)
- Lists (ordered and unordered)
- Emphasis (*italic*, **bold**)
- Tables (GFM syntax is widely supported)
- Images, including the alt text and URL
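A rough way to check that a conversion kept these elements is to count them on both sides. The sketch below is a heuristic only, and its function names are hypothetical: it compares headings, links, and images, uses regular expressions rather than a Markdown parser, and will miscount lines starting with # inside fenced code blocks.

```python
import re
from html.parser import HTMLParser
from typing import Dict

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _HtmlCounter(HTMLParser):
    """Count headings, links with an href, and images in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Dict[str, int] = {"headings": 0, "links": 0, "images": 0}

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.counts["headings"] += 1
        elif tag == "a" and dict(attrs).get("href"):
            self.counts["links"] += 1
        elif tag == "img":
            self.counts["images"] += 1


def count_html(html: str) -> Dict[str, int]:
    counter = _HtmlCounter()
    counter.feed(html)
    counter.close()
    return counter.counts


def count_markdown(markdown: str) -> Dict[str, int]:
    """Heuristic counts based on ATX headings and inline link/image syntax."""
    return {
        "headings": len(re.findall(r"^#{1,6} ", markdown, flags=re.MULTILINE)),
        "links": len(re.findall(r"(?<!!)\[[^\]]*\]\([^)]+\)", markdown)),
        "images": len(re.findall(r"!\[[^\]]*\]\([^)]+\)", markdown)),
    }


def missing_elements(html: str, markdown: str) -> Dict[str, int]:
    """Report how many of each element the Markdown has lost, if any."""
    before, after = count_html(html), count_markdown(markdown)
    return {key: before[key] - after[key] for key in before if before[key] > after[key]}
```

An empty result means the counts line up. It does not prove that link targets or alt text survived, so treat it as a smoke test.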
What not to generate
- Front-matter in the response body (YAML --- blocks). Useful for SSG input, noise for agents. Strip before serving.
- HTML fallbacks inside Markdown (e.g., <div class="callout">) unless your Markdown flavor really needs them. GitHub-flavored Markdown renders most things natively.