Converting between Markdown and HTML: GFM, sanitization, attribute loss

Apr 28, 2026

“Generate blog HTML from Markdown” / “Migrate existing HTML articles to Markdown” — bidirectional conversion comes up routinely. Both directions have pitfalls.

Markdown → HTML: not lossless

Markdown lacks features that HTML has, so conversion can drop information. Markdown variants also disagree on edge cases, so the same source can produce different HTML depending on the renderer.

1. Line breaks

Line 1
Line 2

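CommonMark: <p>Line 1\nLine 2</p> (the newline is a soft break and renders as whitespace)
GFM (GitHub Flavored Markdown): same by default; renderers can optionally produce Line 1<br>Line 2 (GitHub comments do this, and marked and markdown-it expose a breaks option)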

Force a break with two trailing spaces or an explicit <br>.
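To make the difference concrete, here is a minimal sketch of the two behaviors. The function name render_paragraph and its breaks flag are hypothetical, chosen to mirror the option of the same name in some renderers. It handles only line breaks inside a single paragraph and is not a Markdown parser.

```python
import html


def render_paragraph(text: str, breaks: bool = False) -> str:
    """Render one Markdown paragraph, handling only line breaks.

    breaks=False: CommonMark-style soft breaks (newline kept as whitespace).
    breaks=True:  every newline becomes <br />, as some renderers offer.
    """
    lines = text.split("\n")
    out = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        # Two or more trailing spaces request a hard break.
        hard = line.endswith("  ") and not is_last
        out.append(html.escape(line.strip(), quote=False))
        if not is_last:
            out.append("<br />\n" if (hard or breaks) else "\n")
    return "<p>" + "".join(out) + "</p>"


print(render_paragraph("Line 1\nLine 2"))               # soft break
print(render_paragraph("Line 1\nLine 2", breaks=True))  # forced <br />
print(render_paragraph("Line 1  \nLine 2"))             # two trailing spaces
```

Only the first call keeps the newline as plain whitespace; the other two emit a <br /> between the lines.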

2. Tables (GFM-only)

| Col 1 | Col 2 |
| ----- | ----- |
| a     | b     |

CommonMark’s pure spec has no table syntax — only GFM-style extensions render tables. Verify your renderer supports GFM.
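As a rough illustration of what a GFM-capable renderer does with the table above, the sketch below recognizes the delimiter row and emits a table element. The helper names (split_row, is_delimiter_row, gfm_table_to_html) are made up for this example. Alignment colons, escaped pipes, and rows with a mismatched cell count are deliberately ignored.

```python
import html
from typing import List, Optional


def split_row(line: str) -> List[str]:
    """Split '| a | b |' into ['a', 'b'] (escaped pipes are not handled)."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def is_delimiter_row(line: str) -> bool:
    """True for rows such as '| ----- | :---: |'."""
    cells = split_row(line)
    return bool(cells) and all(
        c and set(c) <= set("-:") and "-" in c for c in cells
    )


def gfm_table_to_html(markdown: str) -> Optional[str]:
    """Return HTML for a GFM-style table, or None if the text is not one."""
    lines = [ln for ln in markdown.strip().split("\n") if ln.strip()]
    if len(lines) < 2 or not is_delimiter_row(lines[1]):
        return None
    header = split_row(lines[0])
    if len(header) != len(split_row(lines[1])):
        return None

    def row(cells: List[str], tag: str) -> str:
        inner = "".join(f"<{tag}>{html.escape(c)}</{tag}>" for c in cells)
        return f"<tr>{inner}</tr>"

    body = "".join(row(split_row(ln), "td") for ln in lines[2:])
    return f"<table><thead>{row(header, 'th')}</thead><tbody>{body}</tbody></table>"


source = "| Col 1 | Col 2 |\n| ----- | ----- |\n| a     | b     |"
print(gfm_table_to_html(source))
```

A renderer without the table extension has no equivalent of is_delimiter_row, so the same input falls through to ordinary paragraph handling and the pipes appear as literal text.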

3. Inline HTML

Markdown lets you embed HTML directly:

This is **bold** and <span style="color: red">red</span>

Most tools: pass the <span> through to output
Strict CMS comment areas: strip or escape HTML
Static site generators: typically pass through (Hugo’s default Goldmark renderer omits raw HTML unless its unsafe option is enabled)

4. Auto-linking

https://example.com ← GFM auto-links this
[text](https://example.com) ← explicit link

GFM converts bare URLs to <a>; CommonMark leaves them as plain text unless they are wrapped in angle brackets (<https://example.com>).
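The auto-linking step can be approximated with a regular expression. This is a sketch under simplifying assumptions, not the GFM algorithm: the pattern BARE_URL and the function autolink are illustrative names, only http and https URLs are considered, and a URL directly preceded by (, < or a double quote is assumed to be part of an existing link and left alone.

```python
import html
import re

# Bare http(s) URL that does not directly follow "(", "<" or a double quote.
BARE_URL = re.compile(r'(?<![(<"])\bhttps?://[^\s<>()]+')


def autolink(text: str) -> str:
    """Wrap bare URLs in <a> tags, roughly like GFM's extended autolinks."""

    def repl(match: "re.Match[str]") -> str:
        url = match.group(0)
        # Trailing punctuation usually belongs to the sentence, not the URL.
        trimmed = url.rstrip(".,;:!?")
        tail = url[len(trimmed):]
        safe = html.escape(trimmed, quote=True)
        return f'<a href="{safe}">{safe}</a>{tail}'

    return BARE_URL.sub(repl, text)


print(autolink("See https://example.com."))
print(autolink("[text](https://example.com)"))  # explicit link is left untouched
```

Running the pass over text that already contains explicit links is the case to watch: without the lookbehind, the URL inside [text](...) would be wrapped a second time.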

5. Math and diagrams (extensions)

LaTeX ( $E = mc^2$ ) and Mermaid are renderer-extension features. GitHub, Notion, and custom sites support different subsets.

HTML → Markdown: more is lost

The reverse is structurally lossier because HTML is more expressive.

1. Cannot be expressed

HTML	Markdown
Inline `<style>`	Not representable
`<script>`	Not representable
`<iframe>`	Usually not (embed shortcodes only)
`<form>`	Not representable
Class names, IDs	Not representable (some extensions add this)
`<table>` `colspan` / `rowspan`	Not in GFM tables

2. Partially preserved

HTML	Markdown
`<strong>` `<b>`	`**bold**`
`<em>` `<i>`	`*italic*`
`<a href="...">`	`[text](URL)`
`<img src="...">`	`![alt](URL)`
`<code>`	`` `code` ``
`<pre><code>`	Fenced code block
`<ul>` `<ol>`	`-` `1.`
`<blockquote>`	`>`
`<h1>`–`<h6>`	`#` – `######`

3. Usually lost

target="_blank" on links, width / height on images — most converters drop these. If you must preserve them, embed raw HTML.
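One way to apply the raw-HTML fallback is sketched below, using only Python's standard html.parser. The class and function names are hypothetical, and the converter covers just a handful of inline tags. A link whose only attribute is href becomes [text](URL); a link carrying anything else (target, rel, class) is kept as raw HTML so the attribute survives. Markdown special characters in the text are not escaped here, which a real converter would need to do.

```python
from html.parser import HTMLParser

INLINE_MARKERS = {"strong": "**", "b": "**", "em": "*", "i": "*", "code": "`"}


class InlineHtmlToMarkdown(HTMLParser):
    """Convert a few inline tags; keep links with extra attributes as raw HTML."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        # One entry per open <a>: its href, or None when kept as raw HTML.
        self.link_stack = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_MARKERS:
            self.out.append(INLINE_MARKERS[tag])
        elif tag == "a":
            attr_map = dict(attrs)
            if set(attr_map) <= {"href"}:
                self.link_stack.append(attr_map.get("href") or "")
                self.out.append("[")
            else:
                # target, rel, class, ... have no Markdown syntax: keep the tag.
                self.link_stack.append(None)
                self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in INLINE_MARKERS:
            self.out.append(INLINE_MARKERS[tag])
        elif tag == "a" and self.link_stack:
            href = self.link_stack.pop()
            self.out.append("</a>" if href is None else f"]({href})")

    def handle_data(self, data):
        self.out.append(data)


def html_to_markdown(fragment: str) -> str:
    parser = InlineHtmlToMarkdown()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(html_to_markdown('<strong>bold</strong> and <a href="https://example.com">text</a>'))
print(html_to_markdown('<a href="https://example.com" target="_blank">text</a>'))
```

The trade-off is that the output is no longer pure Markdown, so it only round-trips through renderers that pass raw HTML through.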

Sanitization: handling untrusted Markdown

User-submitted Markdown is dangerous if inline HTML is allowed:

This is <script>alert('XSS')</script> an attack

Defenses:

Always sanitize (DOMPurify, sanitize-html, etc.)
Pick a Markdown renderer that can disable raw HTML (markdown-it’s html: false etc.)
Allow-list image URLs (data: URIs can carry SVG XSS payloads)
Add noopener noreferrer to outbound links automatically
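The sketch below shows how several of these defenses fit together as an allow-list pass over rendered HTML. It is for illustration only: the tag and attribute lists are assumptions picked for the example, the names AllowListSanitizer and sanitize are hypothetical, and a production system should rely on a maintained sanitizer such as the ones named above rather than hand-rolled code like this.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Illustrative allow-lists; a real policy would be chosen per application.
ALLOWED_TAGS = {"p", "br", "strong", "em", "code", "pre", "blockquote",
                "ul", "ol", "li", "a", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}
URL_ATTRS = {"href", "src"}
ALLOWED_SCHEMES = {"http", "https"}  # data:, javascript:, relative URLs rejected
DROP_WITH_CONTENT = {"script", "style"}
VOID_TAGS = {"br", "img"}


def is_safe_url(value: str) -> bool:
    try:
        return urlsplit(value.strip()).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False


class AllowListSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text content is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and anything else unlisted
            if name in URL_ATTRS and not is_safe_url(value):
                continue
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        if tag == "a":
            kept.append(' rel="noopener noreferrer"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def sanitize(fragment: str) -> str:
    parser = AllowListSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize("This is <script>alert('XSS')</script> an attack"))
print(sanitize('<a href="javascript:alert(1)" onclick="x()">hi</a>'))
print(sanitize('<img src="data:image/svg+xml;base64,AAAA" alt="x">'))
```

The design choice worth copying is the direction of the check: everything is rejected unless it is explicitly listed, so a new tag or URL scheme is blocked by default instead of slipping past a deny-list.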

GitHub, Reddit, Stack Overflow — most public-facing services sanitize server-side before storing.

Behavior across renderers

Renderer	GFM	Raw HTML	Sanitized	Math
GitHub	✅	△ (some)	Server-side	MathJax
Notion	✅	❌	Auto	KaTeX
Obsidian	✅	✅	No (local)	MathJax
Hugo / Jekyll	✅	✅ (Hugo: opt-in)	Off by default	Plugin
marked.js	✅	Configurable	Separate	Separate
markdown-it	✅	Configurable	Separate	Plugin

Picking the direction

Markdown → HTML

Choose Markdown when authoring comfort matters
Reach for raw HTML only when pixel-level control is required

HTML → Markdown

Common in blog migrations (WordPress → Hugo, etc.)
Full automation isn’t realistic; expect manual cleanup
Libraries like turndown get you ~80% of the way

Summary

Markdown and HTML are asymmetric in information content (HTML carries more)
GFM extension support changes conversion results
Always sanitize untrusted Markdown
HTML → Markdown is lossy by design — automate then review

For ad-hoc conversion in either direction, the Markdown-to-HTML and HTML-to-Markdown tools on this site handle GFM and run entirely in-browser, so internal content stays local while you experiment.