Unicode Normalizer (NFC / NFD / NFKC / NFKD)
How to Use
Type or paste text containing combining marks, fullwidth ASCII, ligatures, or compatibility characters. The tool shows the text in all four Unicode normalization forms side by side, with the code points and UTF-8 byte count of each result. This is useful for debugging filename equality issues, search indexing, and database collation.
The four forms
Unicode defines four normalization forms (UAX #15). They differ along two axes: canonical vs compatibility decomposition, and whether to compose afterwards.
- NFC (Canonical Composition): decompose then re-compose canonically. The default for most text storage and comparison. "が" stays "が" (one code point).
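- NFD (Canonical Decomposition): decompose into base characters plus combining marks. "が" (U+304C) becomes "か" (U+304B) followed by the combining voiced sound mark (U+3099): two code points that render the same as the original. macOS HFS+ stores filenames in a variant of NFD; APFS keeps the form it is given but, on macOS, compares names normalization-insensitively.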
- NFKC (Compatibility Composition): like NFC but also folds compatibility variants. Fullwidth "Ａ" (U+FF21) becomes ASCII "A", and ㈱ becomes (株). Use for search and identifier comparison.
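- NFKD (Compatibility Decomposition): the most aggressive form. It applies compatibility folding and leaves the result decomposed. Useful as a first step for stripping diacritics or building accent-insensitive search. Case-insensitive matching additionally needs case folding, which normalization does not perform.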
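The difference between the four forms is easiest to see on a single input. The following is a minimal sketch using the standard `String.prototype.normalize()`; the sample string and the variable names are chosen here purely for illustration.

```javascript
// Sample: FULLWIDTH LATIN CAPITAL LETTER A (U+FF21) + precomposed "が" (U+304C).
const sample = "\uFF21\u304C";

const forms = ["NFC", "NFD", "NFKC", "NFKD"];

// Map each form name to the normalized string.
const results = Object.fromEntries(
  forms.map((form) => [form, sample.normalize(form)])
);

for (const form of forms) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  console.log(form, [...results[form]].length, "code points");
}
// NFC  2 code points  (U+FF21 U+304C, unchanged)
// NFD  3 code points  (U+FF21 U+304B U+3099)
// NFKC 2 code points  (U+0041 U+304C)
// NFKD 3 code points  (U+0041 U+304B U+3099)
```

Only the two compatibility forms replace the fullwidth letter, and only the two decomposed forms split the kana.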
When to use which
- Storing user-supplied text: NFC (usually the most compact representation, and the form most software expects).
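- Indexing for search: NFKC, so that fullwidth "Ｃａｆé" and ordinary "Café" collapse to the same key. Matching "Cafe" or "cafe" as well requires separate accent stripping and case folding, because NFKC does neither.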
- Comparing filenames across OSes: normalize both sides to NFC before comparing.
- Stripping accents: NFKD (or NFD), then remove combining marks with a regex such as `\p{M}` (in JavaScript this needs the `u` flag).
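For search indexing, a key function might combine NFKC with lowercasing. This is a sketch under assumptions: `searchKey` is a hypothetical helper, and `toLowerCase()` only approximates full Unicode case folding, which JavaScript does not expose directly.

```javascript
// Hypothetical helper: build a comparison key for search indexing.
function searchKey(text) {
  // NFKC folds compatibility variants (fullwidth letters, ligatures)
  // and composes combining sequences; toLowerCase() approximates case folding.
  return text.normalize("NFKC").toLowerCase();
}

const fullwidth = "\uFF23\uFF41\uFF46\u00E9"; // "Ｃａｆé" with fullwidth C, a, f
const decomposed = "Cafe\u0301";              // "e" + COMBINING ACUTE ACCENT
const plain = "Cafe";                         // no accent at all

console.log(searchKey(fullwidth) === searchKey(decomposed)); // true
console.log(searchKey(plain) === searchKey(decomposed));     // false
```

The second comparison is false because NFKC keeps the accent. If unaccented queries should also match, the key needs an extra accent-stripping step.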
Privacy & Security
All normalization happens in your browser via the standard String.prototype.normalize() API. No text is ever sent to a server.
FAQ
What is the difference between NFC and NFKC?
Both compose after decomposing, but NFC applies only canonical transformations and preserves the visible characters. NFKC additionally folds compatibility variants, turning fullwidth A into ASCII A and ㈱ into (株). Use NFKC for search and identifier comparison.
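A side-by-side comparison makes the contrast concrete. This is a small sketch; the three sample characters and the `comparison` variable are illustrative choices.

```javascript
// Compatibility characters: fullwidth A, parenthesized ideograph ㈱, ligature ﬁ.
const samples = ["\uFF21", "\u3231", "\uFB01"];

const comparison = samples.map((input) => ({
  input,
  nfc: input.normalize("NFC"),   // canonical only: all three are left unchanged
  nfkc: input.normalize("NFKC"), // compatibility folding applied
}));

for (const row of comparison) {
  console.log(JSON.stringify(row));
}
// NFKC yields "A", "(株)" with ASCII parentheses, and "fi" (two letters).
```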
Is the text I enter sent to a server?
No. Normalization is performed locally using the browser's standard String.prototype.normalize(), and the text you enter is never sent to or stored on any server.
Is there a limit on how much text I can process?
There is no explicit limit; the tool handles long text up to your browser's memory capacity. Alongside all four forms it also shows the code point sequence and UTF-8 byte counts.
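The code point sequence and UTF-8 byte count can be derived with standard APIs. The following is a sketch of one way to do it, not the tool's actual source; `describe` is a hypothetical name.

```javascript
// Hypothetical helper: list code points and count UTF-8 bytes for a string.
function describe(text) {
  // Spreading iterates by code point, so astral characters are not split.
  const codePoints = [...text].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
  // TextEncoder always encodes to UTF-8.
  const utf8Bytes = new TextEncoder().encode(text).length;
  return { codePoints, utf8Bytes };
}

console.log(describe("\u304C".normalize("NFC")));
// { codePoints: [ 'U+304C' ], utf8Bytes: 3 }
console.log(describe("\u304C".normalize("NFD")));
// { codePoints: [ 'U+304B', 'U+3099' ], utf8Bytes: 6 }
```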
Why don't filenames match between macOS and other operating systems?
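macOS HFS+ stores filenames in a variant of NFD (decomposed form), while most other systems keep whatever form they are given, which is usually NFC. APFS preserves the form it receives, so names created on HFS+ or copied between systems can differ at the code point level even when they look identical. Normalize both sides to NFC before comparing.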
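A comparison that ignores the stored form could look like the sketch below. `sameFilename` is a hypothetical helper, and the two sample names are made up for illustration.

```javascript
// Hypothetical helper: compare two filenames regardless of normalization form.
function sameFilename(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const decomposedName = "re\u0301sume\u0301.txt"; // "e" + combining acute (NFD style)
const composedName = "r\u00E9sum\u00E9.txt";     // precomposed "é" (NFC style)

console.log(decomposedName === composedName);            // false
console.log(sameFilename(decomposedName, composedName)); // true
```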
Which form should I use to strip accent marks?
Decompose with NFKD (or NFD) to separate base characters from combining marks, then remove the combining marks with a regex such as \p{M}. NFKD also folds compatibility characters, so use NFD if only the accents should be removed.
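In JavaScript the decompose-then-remove approach might be sketched as follows. `stripAccents` is a hypothetical helper. It uses NFD so that compatibility characters are left alone, and it recomposes to NFC at the end, which is a choice made here rather than a requirement.

```javascript
// Hypothetical helper: remove combining marks after canonical decomposition.
function stripAccents(text) {
  return text
    .normalize("NFD")        // split base characters from combining marks
    .replace(/\p{M}/gu, "")  // \p{M} requires the "u" flag in JavaScript
    .normalize("NFC");       // recompose anything that remains
}

console.log(stripAccents("Caf\u00E9")); // "Cafe"
console.log(stripAccents("\u00F8"));    // "ø" (no decomposition, so unchanged)
console.log(stripAccents("\u304C"));    // "か" (the voiced mark is removed too)
```

The last line shows a caveat: marks that carry meaning, such as the Japanese voiced sound mark, are stripped as well, so this is best limited to text where that loss is acceptable.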