XML formatting: attributes vs elements, indentation, namespaces
XML is structured data, but its flexibility in formatting means presentation choices matter for readability. This article covers the rules and tooling pitfalls.
Indentation basics
Indent for each nesting level:
<order>
  <id>123</id>
  <items>
    <item>
      <name>Book</name>
      <price>1500</price>
    </item>
  </items>
</order>
Use 2 or 4 spaces per level. Avoid tabs, which render at different widths depending on the environment.
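As a minimal sketch of applying this rule mechanically, Python's standard library can re-indent a parsed tree. It assumes Python 3.9 or later, where xml.etree.ElementTree.indent is available; the flattened input string and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative input: the order document with all indentation removed.
flat = (
    "<order><id>123</id><items><item>"
    "<name>Book</name><price>1500</price>"
    "</item></items></order>"
)

root = ET.fromstring(flat)
ET.indent(root, space="  ")  # two spaces per nesting level (Python 3.9+)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Changing the space argument to four spaces gives the other common style. Leaf values such as 123 are left untouched because only whitespace-only text between elements is rewritten.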
Attribute vs child element
Where to put data:
As attribute:
<book id="123" title="SQL Basics" published="2024" />
As child element:
<book>
  <id>123</id>
  <title>SQL Basics</title>
  <published>2024</published>
</book>
Heuristics:
- Attributes — simple values, metadata, identifiers, configuration.
- Elements — complex structures, repeating data, long text.
Attributes are constrained: single value, no repetition, no nested structure. Elements are flexible.
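The constraints above show up directly in how a parser exposes the data. The following is a small sketch using Python's xml.etree.ElementTree; the author elements and their values are hypothetical additions to the book example, included only to show repetition.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: id as an attribute, repeating authors as elements.
doc = """<book id="123">
  <title>SQL Basics</title>
  <author>A. Writer</author>
  <author>B. Editor</author>
</book>"""

book = ET.fromstring(doc)

# Attributes behave like a flat name-to-string mapping: one value per name.
book_id = book.get("id")

# Child elements can repeat and can carry nested structure of their own.
title = book.findtext("title")
authors = [a.text for a in book.findall("author")]

# Repeating an attribute name is not well-formed XML, so parsing fails.
try:
    ET.fromstring('<book id="1" id="2" />')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
```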
Attribute formatting
With many attributes, put one per line:
<server
  hostname="example.com"
  port="443"
  protocol="https"
  timeout="30" />
With few attributes, keep them inline:
<server hostname="example.com" port="443" />
Self-closing tags
Empty elements close themselves:
<!-- Good -->
<br />
<img src="logo.png" />
<!-- Verbose -->
<br></br>
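<img src="logo.png"></img>
Both forms are well-formed and equivalent in XML, including XHTML and SVG parsed as XML. For XHTML served as text/html, use the self-closing form: HTML parsers treat <br></br> as two line breaks.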
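Serializers usually let you pick one form. As a sketch, Python's xml.etree.ElementTree exposes this through the short_empty_elements argument of tostring; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

img = ET.Element("img", {"src": "logo.png"})

# Default: empty elements are written in the self-closing form.
short_form = ET.tostring(img, encoding="unicode")

# Opt out to get an explicit start tag and end tag instead.
long_form = ET.tostring(img, encoding="unicode", short_empty_elements=False)

# Both spellings parse to the same empty element.
same_content = ET.fromstring(short_form).text == ET.fromstring(long_form).text
```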
Whitespace handling
XML parsers pass all whitespace inside element content through to the application by default:
<name> Alice </name>
The value is " Alice ", with the leading and trailing spaces included.
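A quick way to see this is to parse the sample and inspect the text the parser hands back. This is a sketch using Python's xml.etree.ElementTree; the second document is an illustrative addition showing that indentation between tags is also reported as text.

```python
import xml.etree.ElementTree as ET

name = ET.fromstring("<name> Alice </name>")
raw_value = name.text        # whitespace is kept exactly as written
trimmed = raw_value.strip()  # trimming is an application-level decision

# Indentation between tags also reaches the application as text.
root = ET.fromstring("<a>\n  <b />\n</a>")
before_child = root.text     # newline plus two spaces before <b />
after_child = root[0].tail   # newline after <b />
```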
xml:space="preserve" makes preservation explicit:
<code xml:space="preserve">
if (x > 0) {
  return x;
}
</code>
Formatters should respect this.
CDATA sections
For text containing <, >, & literally:
<script>
<![CDATA[
if (a < b && b > c) {
  console.log("hello");
}
]]>
</script>
Formatters must not touch CDATA contents.
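One practical pitfall is that some tools keep the text but drop the CDATA wrapper on a round trip. The sketch below shows this with Python's xml.etree.ElementTree, which exposes CDATA content as ordinary text and escapes it when writing; the sample script body is illustrative.

```python
import xml.etree.ElementTree as ET

with_cdata = "<script><![CDATA[if (a < b && b > c) { run(); }]]></script>"

script = ET.fromstring(with_cdata)
code = script.text  # the parser hands back the literal characters

# Serializing again escapes the special characters; the CDATA section is gone.
round_tripped = ET.tostring(script, encoding="unicode")

# The text content is unchanged even though the markup differs.
same_text = ET.fromstring(round_tripped).text == code
```

The two spellings are equivalent to a parser, but a formatter built on such a round trip would produce a noisy diff, so it is worth checking how a given tool treats CDATA.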
Comments
XML comments:
<!-- single line -->
<!--
multi-line
comment
-->
Caveats:
- No -- inside the comment; the XML specification forbids it and parsers reject it.
- Comments can’t nest.
- Comments can’t appear inside a tag, and in a DTD they can’t appear inside another declaration.
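Two of these points can be checked with a short sketch in Python's xml.etree.ElementTree. The helper is_well_formed is a hypothetical name, and the comment-preserving parser assumes Python 3.8 or later, where TreeBuilder accepts insert_comments.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

ok = is_well_formed("<a><!-- fine --></a>")
bad = is_well_formed("<a><!-- not -- fine --></a>")  # double hyphen inside

src = "<a><!-- keep me --><b /></a>"

# The default parser silently drops comments.
default_out = ET.tostring(ET.fromstring(src), encoding="unicode")

# A TreeBuilder with insert_comments=True keeps them in the tree (3.8+).
keeping = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_out = ET.tostring(ET.fromstring(src, parser=keeping), encoding="unicode")
```

A formatter built on the default parser would delete every comment, which is one way comments get mangled in practice.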
Namespaces
Mixing schemas:
<root
  xmlns="http://example.com/default"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:svg="http://www.w3.org/2000/svg">
  <html:p>paragraph</html:p>
  <svg:rect width="100" height="50" />
</root>
Formatters should preserve and align namespace prefixes.
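Prefix preservation is not automatic in every toolkit. As a sketch, Python's xml.etree.ElementTree stores names by namespace URI and regenerates prefixes on output unless they are registered; the NS mapping below is an illustrative helper, not part of the document.

```python
import xml.etree.ElementTree as ET

doc = """<root
  xmlns="http://example.com/default"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:svg="http://www.w3.org/2000/svg">
  <html:p>paragraph</html:p>
  <svg:rect width="100" height="50" />
</root>"""

# Prefix-to-URI mapping used for lookups (illustrative helper).
NS = {
    "html": "http://www.w3.org/1999/xhtml",
    "svg": "http://www.w3.org/2000/svg",
}

root = ET.fromstring(doc)
p = root.find("html:p", NS)
rect = root.find("svg:rect", NS)

# Internally the prefix is gone; the tag carries the full namespace URI.
internal_tag = p.tag

# Registering prefixes makes the serializer reuse them on output.
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)
out = ET.tostring(root, encoding="unicode")
```

Note that register_namespace changes process-wide state, and the default namespace on root is not registered here, so its prefix in the output is generated by the library.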
Escaping
XML special characters:
| Character | Entity |
|---|---|
| < | &lt; |
| > | &gt; |
| & | &amp; |
| " | &quot; |
| ' | &apos; |
Inside a double-quoted attribute value, " becomes &quot; (or use single quotes around the attribute value and escape any ' as &apos;).
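When generating XML from code, it is safer to let a library apply these rules. The sketch below uses escape and quoteattr from Python's xml.sax.saxutils; the sample strings are illustrative.

```python
from xml.sax.saxutils import escape, quoteattr

# escape() handles &, < and > for element text.
text = escape("a < b && b > c")

# quoteattr() escapes the value and picks a quoting style that fits it.
attr_plain = quoteattr("SQL Basics")
attr_double = quoteattr('say "hi"')        # switches to single quotes
attr_both = quoteattr('it\'s "quoted"')    # falls back to &quot;

element = "<book title=%s>%s</book>" % (attr_plain, text)
```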
XML declaration
At the top of the file:
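<?xml version="1.0" encoding="UTF-8"?>
Optional in XML 1.0, but when present it must be the very first thing in the file. Include it with an encoding whenever the file is not UTF-8 or UTF-16. UTF-8 with a BOM is discouraged.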
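As a sketch of how a serializer handles the declaration, Python's xml.etree.ElementTree writes it on request for UTF-8 and adds it on its own for other encodings. This assumes Python 3.8 or later for the xml_declaration argument of tostring; the name element and its value are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

root = ET.Element("order")
ET.SubElement(root, "id").text = "123"
ET.SubElement(root, "name").text = "Café"  # hypothetical non-ASCII value

# UTF-8: the declaration is optional, so it must be requested explicitly.
utf8_bytes = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# Other encodings: the declaration is emitted automatically, because a
# reader could not otherwise know how to decode the bytes.
latin1_bytes = ET.tostring(root, encoding="ISO-8859-1")
```

ElementTree writes the declaration with single quotes, which is equally valid, and it does not prepend a byte order mark.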
DOCTYPE
DTD reference (older XML / HTML):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Largely replaced by XML Schema (XSD). New documents shouldn’t use DOCTYPE.
XML Schema vs DTD
Schema options:
- XSD (XML Schema) — XML-based, rich type system.
- DTD (Document Type Definition) — older, terse, custom syntax.
- RELAX NG — often considered more readable than XSD.
XSD is the modern default.
Formatter pitfalls in mixed content
Mixed content (text + child elements) is fragile:
<p>Hello <b>world</b>!</p>
If a formatter expands this to:
<p>
  Hello
  <b>world</b>
  !
</p>
you get extra rendered whitespace. Mixed-content elements should be touched conservatively.
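The difference between an aggressive and a conservative formatter can be reproduced with Python's standard library. In this sketch, xml.dom.minidom's toprettyxml expands the paragraph, while xml.etree.ElementTree.indent (Python 3.9+) leaves it alone because the text around the child is not whitespace-only; text_of is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

src = "<p>Hello <b>world</b>!</p>"

def text_of(xml_text):
    """Concatenate all character data in document order."""
    return "".join(ET.fromstring(xml_text).itertext())

original_text = text_of(src)

# Aggressive: every child node goes on its own indented line.
expanded = minidom.parseString(src).toprettyxml(indent="  ")
expanded_text = text_of(expanded)
collapsed = " ".join(expanded_text.split())  # roughly what a browser renders

# Conservative: only whitespace-only text and tails are rewritten.
safe = ET.fromstring(src)
ET.indent(safe, space="  ")
safe_out = ET.tostring(safe, encoding="unicode")
```

After collapsing whitespace, the expanded version has gained a space before the exclamation mark, while the conservative output is identical to the input.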
XML parsers
Implementations differ:
- DOM parser — loads everything into memory (small/medium docs).
- SAX parser — event-driven, streaming (large docs).
- StAX — pull-style streaming.
Config files → DOM. Bulk data → SAX/StAX.
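The list names Java-style parser families; as a rough Python analogue, xml.etree.ElementTree.parse builds the whole tree like a DOM parser, while iterparse streams events and lets you discard elements as you go. The sketch below assumes a hypothetical items document and a hypothetical sum_prices helper.

```python
import io
import xml.etree.ElementTree as ET

def sum_prices(stream):
    """Add up all <price> values without keeping the whole tree in memory."""
    total = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "price":
            total += int(elem.text)
        elif elem.tag == "item":
            elem.clear()  # drop the finished item's children to free memory
    return total

# Hypothetical bulk data; a real input would be a large file opened in binary mode.
data = io.BytesIO(
    b"<items>"
    b"<item><name>Book</name><price>1500</price></item>"
    b"<item><name>Pen</name><price>500</price></item>"
    b"</items>"
)
total = sum_prices(data)
```

For a small config file, ET.parse followed by find calls is simpler and the memory cost is negligible.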
XML vs JSON vs YAML
XML:
<user>
  <name>Alice</name>
  <age>30</age>
</user>
JSON:
{
  "name": "Alice",
  "age": 30
}
YAML:
name: Alice
age: 30
When to use each:
- XML: structured documents, DOM operations, legacy systems.
- JSON: APIs, web, simple data.
- YAML: configs, human-authored files.
New APIs default to JSON. XML still dominates SOAP, Office documents, SVG, Atom feeds.
Formatters
- xmllint (libxml2): xmllint --format file.xml.
- prettier-plugin-xml: Prettier integration.
- xmlstarlet: XML manipulation and formatting.
- VS Code “XML Tools” etc.
Run the formatter in CI on every pull request to keep diffs clean.
Summary
- Indent 2 or 4 spaces; avoid tabs.
- Attributes for simple values, elements for structure.
- Self-close empty elements.
- Don’t mangle CDATA or comments.
- Preserve namespace prefixes.
For XML formatting and validation, the XML formatter on this site handles the common cases.