HTML entity escaping: why each context needs its own rules
“Replace < with &lt;” is the basic HTML escape, but what to escape depends on where the value lands. Body text versus attribute values, HTML versus JavaScript, HTML versus URLs: each context has its own rules. This article maps the differences and the traps that come with mixing them up.
The five core entities
The five characters most often escaped in HTML, with their named entities and numeric references:
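| Character | Entity | Numeric reference |
|---|---|---|
| < | &lt; | &#60; |
| > | &gt; | &#62; |
| & | &amp; | &#38; |
| " | &quot; | &#34; |
| ' | &apos; or &#39; | &#39; |

Note: `&apos;` is defined in XML/XHTML and HTML5 but not in HTML 4. For old-browser compatibility, `&#39;` is safer.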
Escaping these five is enough to defuse most HTML-context XSS.
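As a minimal sketch of that five-character escape, assuming JavaScript and a hypothetical helper named `escapeHtml` (an illustration, not a library function), the mapping from the table can be applied in a single pass:

```javascript
// Hypothetical helper: escapes the five core characters for HTML body text
// and quoted attribute values. It is not meant for script, URL, or CSS contexts.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;", // numeric form, because &apos; is not defined in HTML 4
};

function escapeHtml(value) {
  // One regex pass, so an "&" produced by a replacement is never escaped again.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml(`<a href="x">Tom & 'Jerry'</a>`));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; &#39;Jerry&#39;&lt;/a&gt;
```

Doing the replacement in one pass, rather than as chained per-character replaces, avoids having to order the `&` step first.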
Rules per context
The “what to escape” set changes by destination.
1. HTML body (text inside elements)
Escape: <, >, &.
<p>2 &lt; 3 &amp;&amp; foo</p>
<!-- Renders: 2 < 3 && foo -->

< and & will confuse the parser if left raw. > isn’t strictly required, but escaping it is the defensive default.
2. HTML attribute values
Escape: <, >, &, and the quote character used (" or ').
<a title="hello &quot;world&quot;">link</a>
<a title='hello &#39;world&#39;'>link</a>

Inside an attribute value, the surrounding quote character becomes an escape target, because a raw copy of it would end the value early.
If the attribute is unquoted (e.g. <a title=hello>), any whitespace ends the value and the text after it is parsed as new attributes, so far more characters become special. Always quote attribute values.
3. JavaScript string literal
<script>
const name = '<?= $name ?>'; // ❌ unsafe
</script>

HTML escaping isn’t enough here. If the value contains </script>, the HTML parser ends the script block early:

<script>
const name = "</script><img src=x onerror=alert(1)>";
</script>

The script ends mid-string, the rest is parsed as new HTML, and you have an XSS.
Safer patterns for JS-context interpolation:
- JSON via DOM: `<script>const name = JSON.parse(document.getElementById('data').textContent);</script>` reads the value from a DOM element instead of splicing it into the script source.
- `JSON.stringify` with `<`, `>`, and `&` escaped to `\u003C` etc. This closes the `</script>` hole.
- Avoid embedding into a script tag when possible.
The takeaway: HTML escaping alone is not enough for values inside <script>. Use a JS-aware encoder.
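The `JSON.stringify` pattern can be sketched as follows. The helper name `toScriptSafeJson` is hypothetical, and the sketch assumes the value is JSON-serializable (for `undefined`, `JSON.stringify` returns `undefined` and the helper would throw):

```javascript
// Hypothetical helper: serializes a value for embedding inside a <script> block.
// Assumes the value is JSON-serializable.
function toScriptSafeJson(value) {
  return JSON.stringify(value)
    .replace(/</g, "\\u003C") // "</script>" can no longer appear in the output
    .replace(/>/g, "\\u003E")
    .replace(/&/g, "\\u0026");
}

const payload = "</script><img src=x onerror=alert(1)>";
const literal = toScriptSafeJson(payload);

console.log(literal);
// "\u003C/script\u003E\u003Cimg src=x onerror=alert(1)\u003E"

// The escaped literal still evaluates to the original string.
console.log(JSON.parse(literal) === payload); // true
```

The `\u003C` form is an ordinary escape inside a JavaScript or JSON string, so the value is unchanged at runtime while the HTML parser never sees a raw `<`.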
4. Inside a URL
<a href="/search?q=<?= $query ?>">search</a>

You need both URL encoding and HTML attribute escaping:
- First, URL-encode the parameter value (`encodeURIComponent`).
- Then HTML-escape the result (`&` → `&amp;`).

<a href="/search?q=hello%20world&amp;page=1">search</a>

A raw & in an attribute value tempts the parser to interpret the next characters as an entity reference (e.g. &page looks like the start of one).
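The two steps can be sketched in JavaScript as below. `escapeAttr` and `searchLink` are hypothetical names introduced for illustration, and the `page` parameter mirrors the example link:

```javascript
// Hypothetical helper: HTML-escapes a value for a quoted attribute.
function escapeAttr(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical helper: builds the search link from untrusted input.
function searchLink(query, page) {
  // Step 1: URL-encode each parameter value on its own.
  const url =
    "/search?q=" + encodeURIComponent(query) + "&page=" + encodeURIComponent(page);
  // Step 2: HTML-escape the finished URL for the quoted attribute.
  return '<a href="' + escapeAttr(url) + '">search</a>';
}

console.log(searchLink("hello world & more", 1));
// <a href="/search?q=hello%20world%20%26%20more&amp;page=1">search</a>
```

Note the two different treatments of `&`: the one inside the user's query becomes `%26` in step 1, while the separator between parameters becomes `&amp;` in step 2.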
5. Inside CSS
<style>
.user-bg {
background: url('<?= $url ?>');
}
</style>

CSS strings have their own rules: escape ' as \27, etc. HTML escaping is the wrong layer.
Embedding untrusted data in CSS is high-risk in general — avoid it if you can.
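If a value does have to go into a CSS string, one conservative approach is to hex-escape everything that is not an ASCII letter or digit. The sketch below assumes that policy; `escapeCssString` is a hypothetical helper name:

```javascript
// Hypothetical helper: escapes a value for use inside a quoted CSS string.
// Policy assumed here: keep ASCII letters and digits, hex-escape the rest.
function escapeCssString(value) {
  let out = "";
  for (const ch of String(value)) {
    if (/^[A-Za-z0-9]$/.test(ch)) {
      out += ch;
    } else {
      // Backslash, hex code point, then one space that terminates the escape.
      out += "\\" + ch.codePointAt(0).toString(16) + " ";
    }
  }
  return out;
}

console.log(escapeCssString("a'b"));
// a\27 b
```

Escaping only stops the value from breaking out of the string. It does not check what the URL points to, so the destination still needs its own validation.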
Two forms of numeric references
Numeric character references come in decimal and hex:
&#38; ← decimal, & (U+0026)
&#x26; ← hex, & (U+0026)

Mixing both forms makes search and replace error-prone. Standardize on one; decimal is more legible.
Also, leading zeros are allowed: &#38; and &#0038; mean the same. Attackers exploit this to bypass naive denylists (e.g. one that only blocks &#39; lets &#0039; through).
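To see why a string-matching denylist falls short, it helps to decode the references first and compare the results. The sketch below uses a hypothetical `decodeNumericRefs` helper and only handles references that end in a semicolon, which is a simplification of what HTML parsers accept:

```javascript
// Hypothetical helper: decodes decimal and hex numeric character references.
// Simplification: only references terminated by ";" are handled.
function decodeNumericRefs(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values untouched instead of throwing a RangeError.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

// Four different spellings, one character.
for (const ref of ["&#39;", "&#0039;", "&#x27;", "&#X0027;"]) {
  console.log(ref, "->", decodeNumericRefs(ref));
}
```

All four spellings decode to the same apostrophe, so a filter that compares raw text against `&#39;` alone misses the other three.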
Double escaping and multiple decoding
Double escape
Escaping already-escaped data breaks display:
input: "Hello"
once: &quot;Hello&quot;
twice: &amp;quot;Hello&amp;quot; ← user sees &quot;Hello&quot;

When DB → API → frontend each tries to escape, this is the outcome. Escape exactly once, at the moment of output.
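The effect is easy to reproduce. In this sketch, `escapeOnce` is a hypothetical stand-in for whatever escaping each layer applies:

```javascript
// Hypothetical stand-in for the escaping applied by one layer.
function escapeOnce(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

const input = '"Hello"';
const escapedOnce = escapeOnce(input);
const escapedTwice = escapeOnce(escapedOnce);

console.log(escapedOnce);  // &quot;Hello&quot;
console.log(escapedTwice); // &amp;quot;Hello&amp;quot;
```

The second pass escapes the `&` that the first pass introduced, which is why the browser ends up showing the entity text instead of the quote.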
Multi-step decode
Conversely, decoding more than once is an attack vector:
input: &amp;lt;script&amp;gt;
once: &lt;script&gt;
twice: <script> ← XSS triggered

Browsers decode HTML once. If the server has already decoded the value once before sending it, the browser’s decode is the second one, and the value lands as raw HTML.
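The same two-step sequence can be traced in code. `decodeOnce` is a hypothetical helper that decodes a single level of the five core entities, standing in for both the server-side decode and the browser's:

```javascript
// Hypothetical helper: decodes one level of the five core entities.
function decodeOnce(text) {
  const map = { "&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": '"', "&#39;": "'" };
  // Single pass: the "&" produced from "&amp;" is not re-examined.
  return text.replace(/&(?:lt|gt|amp|quot|#39);/g, (entity) => map[entity]);
}

const stored = "&amp;lt;script&amp;gt;";
const decodedOnce = decodeOnce(stored);       // e.g. a server-side decode
const decodedTwice = decodeOnce(decodedOnce); // the browser's own decode

console.log(decodedOnce);  // &lt;script&gt;
console.log(decodedTwice); // <script>
```

Each pass on its own looks harmless. The markup only appears when the two are chained.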
Trust your template engine
Implementing these rules manually is fragile. In practice:
- Use the template engine’s auto-escape (Jinja2 `autoescape`, ERB `<%= %>` in Rails).
- Treat raw output (`{!! !!}`, `<%= raw %>`) as a code smell, to be used only when truly necessary.
- For JS interpolation, go through `JSON.stringify`; for URLs, `encodeURIComponent`.
- For DOM insertion, prefer `textContent` over `innerHTML`.
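A context-aware template engine (Go’s html/template is one example) knows whether a value lands in body text, an attribute, or a script, and escapes accordingly. Many engines apply HTML escaping only, so check what yours covers before leaning on it.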
Cheat sheet by destination
| Destination | Required encoding |
|---|---|
| HTML body | Entities for <, >, & |
| HTML attribute value (quoted) | Entities for <, >, &, " (or ') |
| JavaScript string literal | JSON.stringify + < > & escaped to \u00XX |
| URL in an attribute | encodeURIComponent + HTML attribute escape |
| CSS string | CSS \HEX escape (avoid putting user data here) |
The common discipline is: know where the value will land, and apply the right escape for that context.
To experiment with HTML entity encoding/decoding, the HTML entity tool on this site converts strings both ways. Useful when you want to verify a hand-written HTML escape.