Unicode Normalizer (NFC / NFD / NFKC / NFKD)
How to Use
Type or paste text containing combining marks, fullwidth ASCII, ligatures, or compatibility characters. The tool shows the text in all four Unicode normalization forms side by side, with code points and UTF-8 byte counts for each. This is useful for debugging filename equality issues, search indexing, and database collation.
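A minimal sketch of the kind of per-form report described above, assuming a runtime that provides `String.prototype.normalize()` and `TextEncoder` (modern browsers and Node.js). The function name `describeForms` and the shape of the returned object are illustrative assumptions, not the tool's actual source.

```javascript
// Hypothetical helper: normalize `text` into all four forms and report,
// for each form, its code points and its UTF-8 byte length.
function describeForms(text) {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const result = {};
  for (const form of ["NFC", "NFD", "NFKC", "NFKD"]) {
    const normalized = text.normalize(form);
    // Spreading a string iterates by code point, not by UTF-16 code unit.
    const codePoints = [...normalized].map(
      (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
    );
    result[form] = {
      text: normalized,
      codePoints,
      chars: codePoints.length,
      bytes: encoder.encode(normalized).length,
    };
  }
  return result;
}

const report = describeForms("\u00E9"); // "é" as a single precomposed code point
console.log(report.NFC.codePoints, report.NFC.bytes); // one code point (U+00E9), 2 bytes
console.log(report.NFD.codePoints, report.NFD.bytes); // two code points (U+0065 U+0301), 3 bytes
```

Counting by code point rather than by `string.length` matters for characters outside the Basic Multilingual Plane, which occupy two UTF-16 code units each.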
The four forms
Unicode defines four normalization forms (UAX #15). They differ along two axes: canonical vs compatibility decomposition, and whether to compose afterwards.
- NFC (Canonical Composition): decompose then re-compose canonically. The default for most text storage and comparison. "が" stays "が" (one code point).
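- NFD (Canonical Decomposition): decompose into base + combining marks. "が" (U+304C) becomes "か" (U+304B) followed by the combining voiced sound mark (U+3099), two code points. macOS HFS+ stores filenames in a variant of NFD; APFS keeps filenames as written but compares them normalization-insensitively.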
- NFKC (Compatibility Composition): like NFC but also folds compatibility variants — fullwidth A becomes ASCII A, ㈱ becomes (株). Use for search and identifier comparison.
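- NFKD (Compatibility Decomposition): the most aggressive form. It applies compatibility folding and leaves the result decomposed. Useful as a first step for stripping diacritics or accent-insensitive search. Normalization does not change case, so case-insensitive search needs a separate case-folding step.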
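The differences between the four forms can be checked directly with `String.prototype.normalize()`. The sketch below is illustrative: the helper `codePoints` and the sample variable names are assumptions made for this example, and the inputs are written as escapes so that the source file's own normalization cannot change them.

```javascript
// Illustrative helper: render a string as space-separated "U+XXXX" code points.
const codePoints = (s) =>
  [...s]
    .map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"))
    .join(" ");

const ga = "\u304C";         // "が", precomposed
const fullwidthA = "\uFF21"; // "Ａ", FULLWIDTH LATIN CAPITAL LETTER A
const kabu = "\u3231";       // "㈱", PARENTHESIZED IDEOGRAPH STOCK
const fi = "\uFB01";         // "ﬁ", LATIN SMALL LIGATURE FI

// Canonical axis: composed vs decomposed.
console.log(codePoints(ga.normalize("NFC"))); // U+304C
console.log(codePoints(ga.normalize("NFD"))); // U+304B U+3099

// Canonical forms leave compatibility characters untouched...
console.log(fullwidthA.normalize("NFC") === fullwidthA); // true
// ...while the K forms fold them.
console.log(fullwidthA.normalize("NFKC")); // A
console.log(kabu.normalize("NFKC"));       // (株)
console.log(fi.normalize("NFKD"));         // fi

// NFKC re-composes after folding; NFKD leaves the result decomposed.
const fullwidthCafe = "\uFF23\uFF41\uFF46\u00E9"; // "Ｃａｆé"
console.log(codePoints(fullwidthCafe.normalize("NFKC"))); // U+0043 U+0061 U+0066 U+00E9
console.log(codePoints(fullwidthCafe.normalize("NFKD"))); // U+0043 U+0061 U+0066 U+0065 U+0301
```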
When to use which
- Storing user-supplied text: NFC (usually the most compact canonical form, and the one most software expects).
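- Indexing for search: NFKC (so fullwidth "Ｃａｆé" and decomposed "Cafe" + U+0301 both collapse to "Café"). NFKC does not remove accents or change case, so matching "Cafe" and "cafe" as well requires accent stripping and case folding on top.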
- Comparing filenames across OSes: normalize both sides to NFC before comparing.
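- Stripping accents: normalize to NFKD (or NFD, to leave compatibility characters intact), then remove combining marks (`\p{M}` regex; in JavaScript this requires the `u` flag).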
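The recipes above can be sketched in a few lines. The function names `sameFilename`, `stripAccents`, and `searchKey` are hypothetical, and `searchKey` is one possible combination of steps rather than a standard algorithm; in particular, `toLowerCase()` only approximates full Unicode case folding.

```javascript
// Compare two filenames regardless of which normalization form each OS produced.
function sameFilename(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Strip accents: decompose, then drop every combining mark.
// The `u` flag is required for \p{...} property escapes in JavaScript.
function stripAccents(text) {
  return text.normalize("NFKD").replace(/\p{M}/gu, "");
}

// Hypothetical search key: fold compatibility variants, drop accents,
// lowercase, and finish in NFKC so equal keys compare equal as strings.
function searchKey(text) {
  return stripAccents(text).toLowerCase().normalize("NFKC");
}

console.log(sameFilename("caf\u00E9.txt", "cafe\u0301.txt")); // true
console.log(stripAccents("Caf\u00E9"));                       // Cafe
console.log(searchKey("\uFF23\uFF41\uFF46\u00E9"));           // cafe
```

Removing all of `\p{M}` is a blunt instrument: it also deletes marks that carry meaning in some scripts (for example, "が" is reduced to "か"), so a key like this is better suited to an index than to text shown to users.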
Privacy & Security
All normalization happens in your browser via the standard String.prototype.normalize() API. No text is ever sent to a server.