XML formatting: attributes vs elements, indentation, namespaces

XML is structured data, but its flexibility in formatting means presentation choices matter for readability. This article covers the rules and tooling pitfalls.

Indentation basics

Indent one step per level of nesting:

<order>
  <id>123</id>
  <items>
    <item>
      <name>Book</name>
      <price>1500</price>
    </item>
  </items>
</order>

Use 2 or 4 spaces. Avoid tabs, which render at different widths in different editors.
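If Python is in your toolchain, its standard library can apply this convention. A minimal sketch, assuming Python 3.9+ (where ElementTree.indent() was added), that re-indents the order above from a single unindented line using 2 spaces:

```python
import xml.etree.ElementTree as ET

# The same <order> document, written on one line with no indentation.
flat = ("<order><id>123</id><items><item><name>Book</name>"
        "<price>1500</price></item></items></order>")

root = ET.fromstring(flat)
# indent() rewrites whitespace-only text and tails in place (Python 3.9+).
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```

Passing space="    " gives 4-space indentation instead. indent() only checks whether text is whitespace-only, so it suits data-style documents better than mixed content.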

Attribute vs child element

Where to put data:

As attribute

<book id="123" title="SQL Basics" published="2024" />

As child element

<book>
  <id>123</id>
  <title>SQL Basics</title>
  <published>2024</published>
</book>

Heuristics:

  • Attributes — simple values, metadata, identifiers, configuration.
  • Elements — complex structures, repeating data, long text.

Attributes are constrained: each holds a single string value, can appear at most once per element, and cannot contain nested structure. Elements have none of these limits.
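The no-repetition rule is enforced by the parser itself. A short sketch using Python's standard ElementTree (the book and author documents are made-up examples): a repeated child element parses fine, a repeated attribute is rejected as not well-formed, and attribute values always come back as strings.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Return True if text is well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Repeating a child element is allowed...
book = ET.fromstring("<book><author>Ann</author><author>Bob</author></book>")
authors = [a.text for a in book.findall("author")]
print(authors)  # ['Ann', 'Bob']

# ...but repeating an attribute is a well-formedness error.
print(parses('<book author="Ann" author="Bob" />'))  # False

# Attribute values are plain strings, even when they look numeric.
book_id = ET.fromstring('<book id="123" />').get("id")
print(repr(book_id))  # '123'
```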

Attribute formatting

Many attributes — break to multiple lines:

<server
  hostname="example.com"
  port="443"
  protocol="https"
  timeout="30" />

Few attributes — inline:

<server hostname="example.com" port="443" />
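A formatter that applies this rule needs a threshold. Below is a sketch of a hypothetical helper, empty_tag(), that keeps up to max_inline attributes on one line and otherwise puts one attribute per line. The default of 3 is an arbitrary assumption, not a standard. Values go through xml.sax.saxutils.quoteattr so they are escaped and quoted correctly.

```python
from xml.sax.saxutils import quoteattr

def empty_tag(name: str, attrs: dict[str, str],
              max_inline: int = 3, indent: str = "  ") -> str:
    """Render a self-closing tag; wrap attributes one per line past max_inline.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    parts = [f"{key}={quoteattr(value)}" for key, value in attrs.items()]
    if len(parts) <= max_inline:
        return f"<{name} {' '.join(parts)} />" if parts else f"<{name} />"
    lines = [f"<{name}"] + [indent + part for part in parts]
    lines[-1] += " />"  # close the tag on the last attribute line
    return "\n".join(lines)

print(empty_tag("server", {"hostname": "example.com", "port": "443"}))
print(empty_tag("server", {"hostname": "example.com", "port": "443",
                           "protocol": "https", "timeout": "30"}))
```

The two calls reproduce the inline and wrapped layouts shown above.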

Self-closing tags

Write empty elements in self-closing form:

<!-- Good -->
<br />
<img src="logo.png" />

<!-- Verbose -->
<br></br>
<img src="logo.png"></img>

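Both forms are equivalent in plain XML, and in XHTML and SVG as well, since those are XML vocabularies: the parser produces the same empty element either way. XHTML 1.0's HTML-compatibility guidelines recommend the <br /> form, with a space before the slash, for elements that are always empty.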
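To see that the two spellings are interchangeable, parse both with Python's ElementTree and compare. The sketch also shows that ElementTree writes empty elements in self-closing form by default and switches to the start/end pair when short_empty_elements=False.

```python
import xml.etree.ElementTree as ET

short = ET.fromstring("<br />")
long_form = ET.fromstring("<br></br>")

# Both spellings parse to an element with no text and no children.
print(short.tag, short.text, len(short))              # br None 0
print(long_form.tag, long_form.text, len(long_form))  # br None 0

# ElementTree serializes empty elements self-closed by default...
print(ET.tostring(long_form, encoding="unicode"))  # <br />
# ...and can be told to emit the start/end-tag pair instead.
print(ET.tostring(short, encoding="unicode", short_empty_elements=False))  # <br></br>
```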

Whitespace handling

An XML parser passes all character data, whitespace included, through to the application:

<name>  Alice  </name>

The value is "  Alice  ", including the leading and trailing spaces.
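In Python's ElementTree, for instance, the spaces survive parsing, and whitespace between tags is kept as well, either as the parent's text or as a child's tail. Whether to trim any of it is up to the code that reads the values.

```python
import xml.etree.ElementTree as ET

name = ET.fromstring("<name>  Alice  </name>")
print(repr(name.text))  # '  Alice  '

# Indentation between tags is character data too.
order = ET.fromstring("<order>\n  <id>123</id>\n</order>")
print(repr(order.text))     # '\n  '
print(repr(order[0].tail))  # '\n'
```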

The attribute xml:space="preserve" states explicitly that whitespace in an element and its descendants is significant:

<code xml:space="preserve">
  if (x > 0) {
    return x;
  }
</code>

Formatters should leave the whitespace inside such elements exactly as written.
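ElementTree's built-in indent() does not consult xml:space. Below is a sketch of a hypothetical re-indenter, indent_respecting_space(), that skips any element marked xml:space="preserve" together with its descendants, and elsewhere only replaces whitespace-only text. ElementTree exposes the attribute under the expanded name {http://www.w3.org/XML/1998/namespace}space. The sample document is made up.

```python
import xml.etree.ElementTree as ET

# Expanded name ElementTree uses for the xml:space attribute.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def indent_respecting_space(elem: ET.Element, level: int = 0, space: str = "  ") -> None:
    """Re-indent in place, leaving xml:space="preserve" subtrees untouched.

    Hypothetical helper; only whitespace-only text and tails are replaced.
    """
    if elem.get(XML_SPACE) == "preserve":
        return  # this element and its descendants keep their whitespace
    children = list(elem)
    if not children:
        return
    pad = "\n" + space * (level + 1)
    if not (elem.text and elem.text.strip()):
        elem.text = pad
    for child in children:
        indent_respecting_space(child, level + 1, space)
        if not (child.tail and child.tail.strip()):
            child.tail = pad
    # The last child's tail steps back out to the parent's indentation.
    last = children[-1]
    if not (last.tail and last.tail.strip()):
        last.tail = "\n" + space * level

src = ('<doc><a><b>x</b></a>'
       '<code xml:space="preserve">\n  keep   this\n</code></doc>')
doc = ET.fromstring(src)
indent_respecting_space(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```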

CDATA sections

CDATA sections hold text containing <, > and & literally, without escaping; the only sequence a CDATA section cannot contain is ]]>:

<script>
  <![CDATA[
    if (a < b && b > c) {
      console.log("hello");
    }
  ]]>
</script>

Formatters must not touch CDATA contents.
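Python's ElementTree shows why this needs care: it parses a CDATA section into plain text and, when writing the document back out, escapes that text instead of restoring the CDATA section. The output is equivalent XML but a different, noisier file, which is exactly the kind of change a formatter should avoid.

```python
import xml.etree.ElementTree as ET

src = "<script><![CDATA[if (a < b && b > c) {}]]></script>"
script = ET.fromstring(src)

# The parser returns the raw text; the CDATA wrapper is not part of the value.
print(script.text)  # if (a < b && b > c) {}

# Re-serializing gives equivalent XML, escaped rather than wrapped in CDATA.
round_trip = ET.tostring(script, encoding="unicode")
print(round_trip)  # <script>if (a &lt; b &amp;&amp; b &gt; c) {}</script>
```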

Comments

XML comments:

<!-- single line -->

<!--
  multi-line
  comment
-->

Caveats:

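  • The string -- must not appear inside a comment; the XML spec makes it a well-formedness error.
  • Comments can’t nest (an inner <!-- would itself contain --).
  • Comments can’t appear inside a tag or a markup declaration (such as an <!ELEMENT …> in a DTD), or before the XML declaration.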
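Both caveats can be checked with Python's ElementTree. A sketch: the "--" rule is enforced by the parser, and the default parser silently drops comments, so a formatter built on it loses them unless it opts in with TreeBuilder(insert_comments=True) (available since Python 3.8).

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><!-- fine --></a>"))            # True
print(is_well_formed("<a><!-- bad -- comment --></a>"))  # False

src = "<a><!-- keep me --><b /></a>"

# The default parser discards comments.
dropped = ET.tostring(ET.fromstring(src), encoding="unicode")
print(dropped)  # <a><b /></a>

# Opting in keeps them in the tree, so they survive re-serialization.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.tostring(ET.fromstring(src, parser=parser), encoding="unicode")
print(kept)  # <a><!-- keep me --><b /></a>
```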

Namespaces

Namespaces let one document mix vocabularies from different schemas:

<root
  xmlns="http://example.com/default"
  xmlns:html="http://www.w3.org/1999/xhtml"
  xmlns:svg="http://www.w3.org/2000/svg">
  <html:p>paragraph</html:p>
  <svg:rect width="100" height="50" />
</root>

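Formatters should keep namespace declarations and prefixes exactly as written. Only the namespace URI carries meaning, but renaming prefixes produces noisy diffs and can break attribute values that refer to a prefix, such as xsi:type.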
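Python's ElementTree is one tool that does not keep prefixes: it stores names as {uri}local and invents ns0, ns1, … on output unless a prefix has been registered. A minimal demonstration with the SVG namespace from the example above. Note that register_namespace() is process-wide, so it affects every later serialization.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
src = f'<root xmlns:svg="{SVG}"><svg:rect width="100" /></root>'

root = ET.fromstring(src)
print(root[0].tag)  # {http://www.w3.org/2000/svg}rect

# Without registration, the original "svg" prefix is replaced.
before = ET.tostring(root, encoding="unicode")
print(before)  # <root xmlns:ns0="http://www.w3.org/2000/svg"><ns0:rect width="100" /></root>

# Registering the prefix (a global setting) restores the original spelling.
ET.register_namespace("svg", SVG)
after = ET.tostring(root, encoding="unicode")
print(after)   # <root xmlns:svg="http://www.w3.org/2000/svg"><svg:rect width="100" /></root>
```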

Escaping

XML special characters:

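Character   Entity
<           &lt;
>           &gt;
&           &amp;
"           &quot;
'           &apos;

In text content, < and & must always be escaped; > only needs escaping in the sequence ]]>. In attribute values, < and & must be escaped too, along with the delimiting quote: " becomes &quot; (or delimit the attribute with single quotes and escape ' as &apos;).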
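Python's xml.sax.saxutils covers both cases: escape() for text content and quoteattr() for attribute values. quoteattr() also picks the delimiter, switching to single quotes when the value contains only double quotes and falling back to &quot; when it contains both.

```python
from xml.sax.saxutils import escape, quoteattr

# Text content: escape() replaces &, < and >.
text = escape("if (a < b && b > c)")
print(text)  # if (a &lt; b &amp;&amp; b &gt; c)

# Attribute values: quoteattr() escapes the value and adds the delimiters.
print(quoteattr("SQL Basics"))  # "SQL Basics"
print(quoteattr('say "hi"'))    # 'say "hi"'
print(quoteattr("it's \"x\""))  # "it's &quot;x&quot;"
```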

XML declaration

At the top of the file:

<?xml version="1.0" encoding="UTF-8"?>

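Optional for UTF-8 and UTF-16 documents, but required for any other encoding unless the transport layer (for example, an HTTP Content-Type header) supplies it. A UTF-8 byte-order mark is permitted by the spec but generally discouraged.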
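Python's ElementTree follows the same rule, which makes it a convenient illustration: for UTF-8 it omits the declaration unless asked (xml_declaration=True on tostring() needs Python 3.8+), while for other encodings it writes one on its own. ElementTree quotes the declaration's values with single quotes, which XML accepts just as well as double quotes. The <note> document is a made-up example.

```python
import xml.etree.ElementTree as ET

root = ET.Element("note")
root.text = "café"

# UTF-8 output: the declaration is optional, so it is omitted unless requested.
plain = ET.tostring(root, encoding="UTF-8")
declared = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# Non-UTF-8 output: ElementTree adds the declaration automatically.
latin1 = ET.tostring(root, encoding="ISO-8859-1")
print(latin1)  # b"<?xml version='1.0' encoding='ISO-8859-1'?>\n<note>caf\xe9</note>"

# The declaration is what lets a parser decode the Latin-1 bytes correctly.
print(ET.fromstring(latin1).text)  # café
```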

DOCTYPE

DTD reference (older XML / HTML):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

For validation, DTDs have largely given way to XML Schema (XSD), and new document types rarely need a DOCTYPE.

XML Schema vs DTD

Schema options:

  • XSD (XML Schema) — XML-based, rich type system.
  • DTD (Document Type Definition) — older, terse, custom syntax.
  • RELAX NG — often considered more readable than XSD.

XSD is the most widely supported of the three today.

Formatter pitfalls in mixed content

Mixed content (text + child elements) is fragile:

<p>Hello <b>world</b>!</p>

If a formatter expands to:

<p>
  Hello
  <b>world</b>
  !
</p>

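… the element's text now contains newlines and indentation that were not in the original, and they can render as extra whitespace (for example, a space before the "!"). Formatters should leave whitespace inside mixed-content elements exactly as written.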
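A formatter has to detect mixed content before it can leave it alone. Below is a sketch of a hypothetical heuristic, is_mixed(), in Python's ElementTree: an element counts as mixed if it has child elements and non-whitespace text beside them. It will miss cases where whitespace alone is meaningful (such as the space in <b>a</b> <i>b</i>), so treat it as a starting point. Comparing the concatenated text of the two versions above shows what the expansion changes.

```python
import xml.etree.ElementTree as ET

def is_mixed(elem: ET.Element) -> bool:
    """Heuristic: child elements plus non-whitespace text beside them."""
    if len(elem) == 0:
        return False
    if elem.text and elem.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in elem)

original = ET.fromstring("<p>Hello <b>world</b>!</p>")
expanded = ET.fromstring("<p>\n  Hello\n  <b>world</b>\n  !\n</p>")

print(is_mixed(original))                  # True
print(repr("".join(original.itertext())))  # 'Hello world!'
print(repr("".join(expanded.itertext())))  # '\n  Hello\n  world\n  !\n'
```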

XML parsers

Parser APIs come in three main styles:

  • DOM parser — loads everything into memory (small/medium docs).
  • SAX parser — event-driven, streaming (large docs).
  • StAX — pull-style streaming.

Config files → DOM. Bulk data → SAX/StAX.
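Python's standard library includes a DOM implementation (xml.dom.minidom) and a SAX parser (xml.sax), so the contrast can be sketched there. Both functions below sum the <price> values of a small sample order (the second item is made up): the DOM version builds the whole tree first, while the SAX version keeps only a running total as events stream past. The pull style is not shown; in Python, xml.dom.pulldom and ElementTree.iterparse fill a similar role.

```python
import xml.sax
from xml.dom import minidom

ORDER = b"""<order>
  <items>
    <item><name>Book</name><price>1500</price></item>
    <item><name>Pen</name><price>200</price></item>
  </items>
</order>"""

def total_dom(data: bytes) -> int:
    """DOM: parse everything into an in-memory tree, then query it."""
    doc = minidom.parseString(data)
    return sum(int(p.firstChild.data) for p in doc.getElementsByTagName("price"))

class PriceSum(xml.sax.ContentHandler):
    """SAX: react to start/end/text events without keeping the tree."""

    def __init__(self):
        super().__init__()
        self.total = 0
        self._in_price = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "price":
            self._in_price = True
            self._buf = []

    def characters(self, content):
        if self._in_price:
            self._buf.append(content)  # text may arrive in several chunks

    def endElement(self, name):
        if name == "price":
            self.total += int("".join(self._buf))
            self._in_price = False

def total_sax(data: bytes) -> int:
    handler = PriceSum()
    xml.sax.parseString(data, handler)
    return handler.total

print(total_dom(ORDER), total_sax(ORDER))  # 1700 1700
```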

XML vs JSON vs YAML

XML:

<user>
  <name>Alice</name>
  <age>30</age>
</user>

JSON:

{
  "name": "Alice",
  "age": 30
}

YAML:

name: Alice
age: 30

  • XML — structured documents, DOM operations, legacy systems.
  • JSON — APIs, web, simple.
  • YAML — configs, human-authored.

New APIs default to JSON. XML still dominates SOAP, Office documents, SVG, Atom feeds.
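One practical difference is visible in the three samples: XML text carries no types, so without a schema 30 is just the string "30", while JSON and YAML read it as a number. A small Python sketch that converts the flat <user> element to JSON; the conversion rule (one key per child element) is an assumption that only fits flat documents like this one.

```python
import json
import xml.etree.ElementTree as ET

user = ET.fromstring("<user><name>Alice</name><age>30</age></user>")
from_xml = {child.tag: child.text for child in user}
from_json = json.loads('{"name": "Alice", "age": 30}')

print(json.dumps(from_xml))   # {"name": "Alice", "age": "30"}
print(json.dumps(from_json))  # {"name": "Alice", "age": 30}
```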

Formatters

  • xmllint (libxml2) — xmllint --format file.xml.
  • @prettier/plugin-xml — Prettier integration.
  • xmlstarlet — XML manipulation and formatting.
  • VS Code extensions such as “XML Tools”.

Run the formatter in CI (for example, as a check on every pull request) so diffs show only real changes.

Summary

  • Indent 2 or 4 spaces; avoid tabs.
  • Attributes for simple values, elements for structure.
  • Self-close empty elements.
  • Don’t mangle CDATA or comments.
  • Preserve namespace prefixes.
