Punycode and internationalized domain names: what happens behind a Unicode URL
When you visit a domain like 日本.jp, the address bar shows Japanese characters, but the DNS query carries an ASCII representation called Punycode. This article walks through how that conversion works and the security implications it brings.
DNS only handles ASCII
DNS was designed for ASCII:
- Hostname characters: ASCII letters, digits, and hyphens, with dots separating labels.
- Case-insensitive.
- Up to 63 characters per label.
You cannot put Unicode (Japanese, Cyrillic, accented Latin, …) directly into a hostname. IDNA (Internationalizing Domain Names in Applications) is the workaround, and Punycode is the encoding it relies on.
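As a rough illustration of that restriction, here is a minimal Python sketch of the classic hostname (LDH) rule from RFC 1123. The helper name is_ldh_hostname is hypothetical, and it only checks the hostname convention listed above, not everything the DNS protocol itself can carry.

```python
import re

# One LDH label: ASCII letters/digits, hyphens allowed only in the middle,
# total length 1-63 characters.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label follows the LDH rule."""
    labels = name.rstrip(".").split(".")  # ignore trailing root dot(s)
    return all(LDH_LABEL.fullmatch(label) for label in labels)


print(is_ldh_hostname("example.com"))    # True
print(is_ldh_hostname("日本.jp"))         # False: non-ASCII label
print(is_ldh_hostname("xn--wgv71a.jp"))  # True: the Punycode form passes
```

The last two lines are the whole motivation for IDNA: the Unicode label is rejected, while its Punycode form fits the old rules.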
Punycode: encode Unicode using ASCII
Punycode (RFC 3492) is a reversible encoding that expresses any Unicode string in ASCII letters, digits, and hyphens.
| Original domain | Punycode |
|---|---|
| 日本.jp | xn--wgv71a.jp |
| münchen.de | xn--mnchen-3ya.de |
| пример.рф | xn--e1afmkfd.xn--p1ai |
The xn-- prefix marks a Punycode-encoded label.
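Python's standard library can reproduce the table above. Its built-in idna codec implements the older IDNA 2003 rules (RFC 3490), so treat this as a quick check rather than a production converter; the third-party idna package implements IDNA 2008. The helper names to_ascii and to_unicode are illustrative.

```python
def to_ascii(domain: str) -> str:
    """Unicode domain -> ASCII (Punycode) form, converted label by label."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """ASCII (Punycode) form -> Unicode domain."""
    return domain.encode("ascii").decode("idna")


for name in ["日本.jp", "münchen.de", "пример.рф"]:
    ace = to_ascii(name)
    print(name, "->", ace, "->", to_unicode(ace))
# 日本.jp -> xn--wgv71a.jp -> 日本.jp
# münchen.de -> xn--mnchen-3ya.de -> münchen.de
# пример.рф -> xn--e1afmkfd.xn--p1ai -> пример.рф
```

Note that ASCII labels such as jp and de pass through unchanged; only labels containing non-ASCII characters get the xn-- form.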
Encoding outline
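Converting a label takes three steps:
- Copy the ASCII characters, in their original order, to the front of the output, followed by a hyphen delimiter if there were any.
- Encode the non-ASCII characters in code point order: each one's code point and insertion position are folded into a single number (a delta), written as variable-length base-36 digits (a-z, 0-9).
- IDNA then prefixes the result with xn--.
For mixed input like mañana.com, the ASCII letters m, a, a, n, a are copied as maana, and the code point and position of ñ are encoded compactly as pta, giving xn--maana-pta.com.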
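The steps above can be sketched directly from the encoder pseudocode in RFC 3492 (section 6.3). This is a minimal, single-label Python version for illustration: it omits the overflow checks the RFC requires for fixed-width integers (Python integers do not overflow) and does not add xn--, which is IDNA's job. The function names are hypothetical; the output can be cross-checked against Python's built-in punycode codec.

```python
# Parameter values fixed by RFC 3492 for Punycode.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128


def adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    """Bias adaptation function from RFC 3492 section 6.1."""
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + (BASE - TMIN + 1) * delta // (delta + SKEW)


def digit(d: int) -> str:
    """Map 0-25 to 'a'-'z' and 26-35 to '0'-'9'."""
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def punycode_encode(label: str) -> str:
    # Step 1: copy the basic (ASCII) code points, then a delimiter.
    output = [c for c in label if ord(c) < 128]
    b = h = len(output)
    if b:
        output.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    # Step 2: handle the remaining code points in increasing order.
    while h < len(label):
        m = min(ord(c) for c in label if ord(c) >= n)
        delta += (m - n) * (h + 1)  # skip code point values in between
        n = m
        for c in label:
            cp = ord(c)
            if cp < n:
                delta += 1  # one more insertion position to step over
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while True:
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(digit(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(digit(q))
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return "".join(output)


print(punycode_encode("mañana"))  # maana-pta
print(punycode_encode("日本"))     # wgv71a
```

Both results match the ASCII forms quoted in this article once IDNA adds the xn-- prefix.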
Homograph attacks: same shape, different character
Punycode enables a class of attack: register a domain that looks visually identical to a real one.
- apple.com (Latin a)
- аpple.com (Cyrillic а instead of Latin a)
The second is registered under the ASCII name xn--pple-43d.com. A user who thinks they’re visiting Apple ends up somewhere else.
Browsers defend against this by showing the Punycode form when a single label mixes scripts.
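To make the trick concrete, this short Python check shows that the two strings differ only in their first code point, and that the lookalike encodes to a completely different ASCII name. The helper first_char_report is hypothetical.

```python
import unicodedata

LATIN = "apple.com"
SPOOF = "\u0430pple.com"  # first character is U+0430, not U+0061


def first_char_report(host: str) -> str:
    """Describe the first code point of host and the host's IDNA ASCII form."""
    ch = host[0]
    ascii_form = host.encode("idna").decode("ascii")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {ascii_form}"


print(LATIN == SPOOF)            # False
print(first_char_report(LATIN))  # U+0061 LATIN SMALL LETTER A -> apple.com
print(first_char_report(SPOOF))  # U+0430 CYRILLIC SMALL LETTER A -> xn--pple-43d.com
```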
How browsers decide what to display
A simplified version of Chrome / Firefox rules:
- Pure ASCII, or one script throughout (all Japanese, all Cyrillic) → show Unicode.
- Multiple scripts mixed within one label → show Punycode.
- Some specific safe combinations (Japanese + ASCII digits, etc.) → show Unicode.
That’s why 日本.jp displays as 日本.jp, but аpple.com (Cyrillic а) is shown as xn--pple-43d.com.
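The simplified rules above can be approximated in a few lines of Python. This is a rough sketch, not what Chrome or Firefox actually run: it guesses each character's script from the first word of its Unicode name instead of the real Unicode Script property, treats ASCII digits and hyphens as neutral, and hard-codes one "safe" combination (the scripts used in Japanese). All names here are hypothetical.

```python
import unicodedata
from typing import Optional

# First words of Unicode character names for the scripts used in Japanese.
JAPANESE_NAME_PREFIXES = {"CJK", "HIRAGANA", "KATAKANA", "KATAKANA-HIRAGANA"}


def rough_script(ch: str) -> Optional[str]:
    """Guess a character's script from its Unicode name (approximation only)."""
    if ch.isascii() and not ch.isalpha():
        return None  # ASCII non-letters (digits, hyphens) are script-neutral
    first_word = unicodedata.name(ch, "UNKNOWN").split()[0]
    return "JAPANESE" if first_word in JAPANESE_NAME_PREFIXES else first_word


def display_label(label: str) -> str:
    scripts = {s for s in map(rough_script, label) if s is not None}
    if len(scripts) <= 1:
        return label  # one script (or none): show Unicode
    return label.encode("idna").decode("ascii")  # mixed scripts: show Punycode


def display_host(host: str) -> str:
    return ".".join(display_label(label) for label in host.split("."))


print(display_host("日本.jp"))          # 日本.jp
print(display_host("\u0430pple.com"))  # xn--pple-43d.com
```

The decision is made per label, which is why 日本.jp stays readable even though the name as a whole mixes Japanese and Latin.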
Email Address Internationalization (EAI)
Internationalized email addresses (RFC 6530) follow the same pattern:
- Local part (left of @): Unicode allowed.
- Domain part: IDNA / Punycode.
EAI-aware mail servers are still uncommon, so many systems stick to ASCII.
Where this comes up in code
1. Domain input forms
If users type 日本.jp, the value you store usually needs to be the Punycode form (xn--wgv71a.jp). In browsers, a URL object already exposes that form: new URL("https://日本.jp/").hostname is xn--wgv71a.jp, and host, origin, and href use it as well.
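A minimal Python sketch of normalizing form input before storage, assuming you want to store the lowercase ASCII (Punycode) form. normalize_domain_input is a hypothetical helper built on the standard library's IDNA 2003 codec, which also rejects empty labels and labels longer than 63 characters.

```python
def normalize_domain_input(raw: str) -> str:
    """Turn user-typed domain text into the lowercase Punycode form to store."""
    host = raw.strip().rstrip(".").lower()  # trim spaces and trailing root dot(s)
    if not host:
        raise ValueError("empty domain")
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError as exc:  # e.g. an empty or over-long label
        raise ValueError(f"invalid domain: {raw!r}") from exc


print(normalize_domain_input(" 日本.jp "))     # xn--wgv71a.jp
print(normalize_domain_input("MÜNCHEN.de"))  # xn--mnchen-3ya.de
```

Storing one canonical form makes lookups and uniqueness checks simple; convert back to Unicode only when displaying the value.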
2. Email delivery
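Mail to a Unicode address needs its domain in Punycode for DNS (MX) lookups and for SMTP servers without EAI support. A non-ASCII local part has no ASCII equivalent, so it can only be delivered through EAI-capable (SMTPUTF8) servers.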
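One way to apply this, sketched in Python under the assumption that you only need to prepare the address string (actual delivery is left to your mail library): convert the domain with the standard idna codec and flag addresses whose local part is non-ASCII, since those require the SMTPUTF8 extension (RFC 6531). prepare_address is a hypothetical helper.

```python
def prepare_address(address: str) -> tuple[str, bool]:
    """Return (address with Punycode domain, whether SMTPUTF8 is required)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    ascii_domain = domain.encode("idna").decode("ascii")
    needs_smtputf8 = not local.isascii()  # the local part cannot be converted
    return f"{local}@{ascii_domain}", needs_smtputf8


print(prepare_address("info@日本.jp"))     # ('info@xn--wgv71a.jp', False)
print(prepare_address("ユーザー@日本.jp"))  # ('ユーザー@xn--wgv71a.jp', True)
```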
3. TLS certificates
Let’s Encrypt and other ACME-based CAs issue certificates against the Punycode form of IDN domains.
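Because the certificate lists the Punycode name, code that compares a user-facing hostname against a certificate has to convert first. A minimal Python sketch, assuming the certificate is a dict in the shape returned by ssl.SSLSocket.getpeercert() and doing exact matching only (no wildcard handling); cert_covers_host and the sample certificate are hypothetical. In practice, letting the ssl module verify the hostname is the safer choice.

```python
def cert_covers_host(cert: dict, host: str) -> bool:
    """Exact (non-wildcard) check of host against the cert's DNS SAN entries."""
    wanted = host.rstrip(".").encode("idna").decode("ascii").lower()
    dns_names = [
        value.lower()
        for kind, value in cert.get("subjectAltName", ())
        if kind == "DNS"
    ]
    return wanted in dns_names


# Same shape as getpeercert() output, trimmed to the relevant field.
sample_cert = {"subjectAltName": (("DNS", "xn--wgv71a.jp"),)}
print(cert_covers_host(sample_cert, "日本.jp"))      # True
print(cert_covers_host(sample_cert, "example.jp"))  # False
```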
4. Search engine indexing
Google can treat the IDN and Punycode forms of the same page as separate URLs, so point rel="canonical" at one form to avoid duplicates.
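If you generate canonical links server-side, normalizing the host first keeps both spellings pointing at one URL. This Python sketch (hypothetical canonical_url) picks the Punycode form; choosing the Unicode form instead would work the same way with the codec reversed. It assumes absolute http(s) URLs without user info.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Rewrite the host of an absolute URL into its Punycode form."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in {url!r}")
    # urlsplit already lowercases .hostname; encode it label by label.
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(canonical_url("https://日本.jp/about"))        # https://xn--wgv71a.jp/about
print(canonical_url("https://xn--wgv71a.jp/about"))  # https://xn--wgv71a.jp/about
```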
Summary
- DNS only carries ASCII; Punycode encodes Unicode into ASCII.
- The xn-- prefix marks a Punycode label.
- Homograph attacks led browsers to display mixed-script labels as Punycode.
- Code that handles domains, emails, TLS, and SEO all touches Punycode somewhere.
To convert between Unicode and Punycode forms, the Punycode converter on this site does both directions.