Markdown TOC and anchors: GitHub-style slug generation and the pitfalls
A long technical article without a table of contents is hard to scan and easy to bounce off of. Markdown’s table-of-contents links use the simple form [text](#anchor), but the anchor name is generated from the heading text by the renderer (GitHub, Notion, etc.) using rules that aren’t always obvious. This article walks through those rules and what an auto-generator has to handle.
Why have a TOC at all
Articles past ~3000 characters are hard to navigate without a TOC. Adding one gives you:
- Skimmability — readers can see the structure at a glance.
- Direct jumps — they can land on the section they want.
- SEO benefits — Google occasionally surfaces “Jump to” sublinks in search results.
- Shareable links — you can point someone at “the X section” with a deep link.
For short pieces (under 800 characters), skip it. Beyond ~1500, consider it.
Markdown anchor links: the basic form
In Markdown, headings get auto-generated id attributes, and you link to them with [text](#anchor):
## Section A
Body of Section A.
## Section B
[Back to Section A](#section-a)

The #section-a part is the automatically generated slug. The catch: each renderer generates slugs slightly differently.
GitHub Flavored Markdown’s slug rules
GitHub Flavored Markdown (GFM) is the de-facto standard. GitHub’s renderer turns headings into slugs by:
- Lowercasing the text.
- Stripping punctuation and symbols such as ! @ # $ % ^ & * ( ) + = < > ? , . ; : ' " \ | / [ ] { }. Hyphens and underscores are kept.
- Replacing each space with a hyphen. Runs are not collapsed, so "a & b" becomes a--b.
- Suffixing duplicates with -1, -2, …
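The rules above can be sketched in a few lines of Python. This is an approximation, not GitHub's actual code: it assumes Python's `\w` is a reasonable stand-in for "letters, digits, and underscore", and it maps each space to its own hyphen. The names `slugify` and `SlugRegistry` are made up for this example.

```python
import re

# Assumption: anything that is not a word character, a hyphen, or a space is dropped.
_STRIP = re.compile(r"[^\w\- ]")


def slugify(heading_text: str) -> str:
    """Lowercase, strip punctuation, then turn each space into a hyphen."""
    text = heading_text.strip().lower()
    text = _STRIP.sub("", text)
    return text.replace(" ", "-")


class SlugRegistry:
    """Hands out unique slugs within one document by appending -1, -2, ..."""

    def __init__(self) -> None:
        self._used: set[str] = set()
        self._counts: dict[str, int] = {}

    def unique(self, heading_text: str) -> str:
        base = slugify(heading_text)
        slug = base
        n = self._counts.get(base, 0)
        # Keep counting up until the candidate is not already taken.
        while slug in self._used:
            n += 1
            slug = f"{base}-{n}"
        self._counts[base] = n
        self._used.add(slug)
        return slug
```

Create one `SlugRegistry` per document, since the duplicate counter only makes sense within a single page.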
Examples
| Heading | Generated slug |
|---|---|
| `## Hello World` | `hello-world` |
| `## What's that?` | `whats-that` |
| `## Setup` (1st occurrence) | `setup` |
| `## Setup` (2nd occurrence) | `setup-1` |
| `## Setup` (3rd occurrence) | `setup-2` |
| `## 日本語の見出し` | `日本語の見出し` (kept as-is) |
| `## API Reference (v2)` | `api-reference-v2` |
CJK characters are preserved, but the URL form gets percent-encoded (%E6%97%A5%…), which is why “Japanese anchors look messy in URLs”.
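To see what that percent-encoded form looks like, here is a small sketch using Python's standard `urllib.parse`. The helper name `anchor_href` is hypothetical; it only shows how a slug turns into the fragment a browser displays.

```python
from urllib.parse import quote, unquote


def anchor_href(slug: str) -> str:
    """Return the URL fragment for a slug, percent-encoding non-ASCII characters."""
    # quote() leaves ASCII letters, digits, and "-", "_", ".", "~" untouched.
    return "#" + quote(slug)


def slug_from_href(href: str) -> str:
    """Inverse of anchor_href: decode the fragment back to the readable slug."""
    return unquote(href.lstrip("#"))
```

An ASCII slug such as hello-world passes through unchanged, while each CJK character expands to three percent-encoded bytes.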
Differences across renderers
Other major renderers each have variations:
| Renderer | Slug differences |
|---|---|
| GitHub | The reference rules above. |
| GitLab | Similar, but collapses consecutive hyphens into one. |
| Notion | Uses opaque hash IDs unrelated to heading text. |
| HackMD | Close to GitHub with small differences. |
| Hugo / Jekyll | Configurable; can match GFM or use their own scheme. |
Moving an article between renderers is a common cause of broken TOC links. Know which renderer’s rules you’re targeting.
Anatomy of a TOC generator
The basic flow:
1. Walk the Markdown line by line.
2. Match lines that start with #.
3. Record the heading level (number of #s).
4. Slugify the heading text.
5. Detect duplicate slugs and append -1, -2, …
6. Emit a nested bullet list as Markdown.

What an implementation has to handle correctly:
1. Don’t treat # inside code blocks as a heading
```python
# This is a Python comment, not a heading
def foo():
    pass
```
Track open/close fences (``` or ~~~) and skip lines inside them.
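One way to do that fence tracking is sketched below. It assumes a fence closes only on a line holding the same fence character, at least as long as the opener, with nothing else on it. The function name `extract_headings` and its return shape are choices made for this example.

```python
import re

FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")
HEADING = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.*)$")


def extract_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, raw text) for ATX headings that are outside fenced code."""
    headings = []
    open_fence = None  # marker that opened the current code block, if any
    for line in markdown.splitlines():
        fence = FENCE.match(line)
        if fence:
            marker = fence.group(1)
            if open_fence is None:
                open_fence = marker  # entering a code block
            elif (
                marker[0] == open_fence[0]
                and len(marker) >= len(open_fence)
                and line.strip() == marker
            ):
                open_fence = None  # matching bare fence: leaving the block
            continue
        if open_fence is not None:
            continue  # inside a code block, so "#" is not a heading
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2).strip()))
    return headings
```

Indented (4-space) code blocks and Setext headings are deliberately left out of this sketch.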
2. Closed ATX form ## Section ##
ATX headings can be closed with trailing hashes:
## Section ##
Strip the trailing #s before slugifying.
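A minimal sketch of that stripping step follows. It relies on the CommonMark rule that the closing run of #s must be preceded by a space or tab, so a heading like "C#" is left alone. The helper name `strip_closing_hashes` is made up for illustration.

```python
import re

# Whitespace, then one or more "#", then optional trailing whitespace at end of line.
_CLOSING = re.compile(r"[ \t]+#+[ \t]*$")


def strip_closing_hashes(heading_text: str) -> str:
    """Remove an optional closing sequence from ATX heading text."""
    return _CLOSING.sub("", heading_text).strip()
```

Run this on the heading text after the leading #s have already been removed.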
3. Inline Markdown in headings
If a heading contains formatting:
## **Important** stuff
On GitHub:
- Display: rendered with bold.
- Slug: bare text, important-stuff.
A regex strip works for simple cases; parsing to an AST and taking only text nodes is more robust.
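For the simple cases, a regex strip might look like the sketch below. It handles only inline links, asterisk emphasis, strikethrough, and code spans. Underscore emphasis is skipped on purpose so that names like my_var survive, and nested or escaped markup is out of scope. The function name is hypothetical.

```python
import re


def strip_inline_markdown(text: str) -> str:
    """Reduce simple inline Markdown to its visible text (rough approximation)."""
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)          # **bold** -> bold
    text = re.sub(r"~~(.+?)~~", r"\1", text)              # ~~strike~~ -> strike
    text = re.sub(r"\*(.+?)\*", r"\1", text)              # *italic* -> italic
    text = re.sub(r"`([^`]*)`", r"\1", text)              # `code` -> code
    return text
```

Anything beyond these patterns is where the AST approach earns its keep.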
4. Indent width
Nested bullet lists typically use 2 or 4 spaces:
- [Section A](#section-a)
  - [Subsection A.1](#subsection-a1) ← 2 spaces
- [Section B](#section-b)

GitHub recognizes 2-space indentation as nested; some other parsers require 4. Match your target renderer.
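Making the indent width a parameter keeps the generator portable across renderers. The sketch below assumes the headings have already been collected as (level, text, slug) tuples; that shape and the name `render_toc` are choices made for this example.

```python
def render_toc(entries: list[tuple[int, str, str]], indent: int = 2) -> str:
    """Render (level, text, slug) entries as a nested Markdown bullet list."""
    if not entries:
        return ""
    # Treat the shallowest heading present as the top level of the list.
    base = min(level for level, _, _ in entries)
    lines = []
    for level, text, slug in entries:
        pad = " " * (indent * (level - base))
        lines.append(f"{pad}- [{text}](#{slug})")
    return "\n".join(lines)
```

Pass `indent=4` when the target parser needs four spaces per nesting level.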
TOC design tips
1. Skip H1
H1 is the page title; the TOC typically covers H2 and below.
2. Stop at H3
Including H4 and H5 makes the TOC dense and reduces its skim value. The point of a TOC is to show the structure, so keep depth shallow.
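The first two tips amount to a level filter applied before rendering. A minimal sketch, assuming headings arrive as (level, text) pairs and the TOC should cover H2 through H3; both defaults are illustrative and easy to change.

```python
def filter_by_depth(
    headings: list[tuple[int, str]], min_level: int = 2, max_level: int = 3
) -> list[tuple[int, str]]:
    """Keep only headings whose level falls within [min_level, max_level]."""
    return [(level, text) for level, text in headings if min_level <= level <= max_level]
```

Given an H1 title, two H2/H3 sections, and an H4, only the H2 and H3 entries survive.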
3. Avoid duplicate heading names
Duplicate suffixes (-1, -2) make links fragile — if the article is reordered later, the link points to a different section.
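A generator can warn about this up front. The sketch below flags heading texts that appear more than once, comparing them case-insensitively as a rough proxy for "would produce the same slug". The function name is made up for this example.

```python
from collections import Counter


def duplicate_headings(heading_texts: list[str]) -> list[str]:
    """Return the normalized heading texts that occur more than once."""
    counts = Counter(text.strip().lower() for text in heading_texts)
    return sorted(text for text, n in counts.items() if n > 1)
```

An empty result means no heading will need a numeric suffix under this approximation.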
4. Regenerate after edits
A TOC drifts out of date the moment you add or rename headings. Either re-run the generator or build it into your CI / commit hook.
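A CI step can also check that every in-page link still resolves. The sketch below is a simplified checker under stated assumptions: it uses an approximate GitHub-style slug, and it does not skip fenced code blocks, so it can miss some stale links. `find_broken_anchors` is a hypothetical name.

```python
import re

HEADING = re.compile(r"^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")
ANCHOR_LINK = re.compile(r"\]\(#([^)\s]+)\)")


def slugify(text: str) -> str:
    # Assumption: simplified slug (lowercase, drop punctuation, spaces -> hyphens).
    return re.sub(r"[^\w\- ]", "", text.strip().lower()).replace(" ", "-")


def find_broken_anchors(markdown: str) -> list[str]:
    """Return the #anchor targets that match no heading slug in the document."""
    slugs = set()
    counts: dict[str, int] = {}
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            base = slugify(match.group(2))
            n = counts.get(base, 0)
            counts[base] = n + 1
            # The first occurrence keeps the bare slug; later ones get -1, -2, ...
            slugs.add(base if n == 0 else f"{base}-{n}")
    return [a for a in ANCHOR_LINK.findall(markdown) if a not in slugs]
```

In a commit hook, a non-empty result would be the signal to fail the check and regenerate the TOC.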
Summary
- Articles past ~1500 characters benefit from a TOC.
- Markdown anchors follow “lowercase + strip punct + spaces→hyphens + dedupe” in GFM.
- Renderers differ; know which one you’re targeting.
- Generators have to handle code fences, closed ATX, inline formatting, and indent width.
The Markdown TOC generator on this site implements GFM-compatible slug generation. Paste your draft, copy the generated TOC, and drop it at the top of your article.