Base64 padding deep dive: when `=` is required, when it's optional, and how libraries disagree


Most developers know the = at the end of a Base64 string is “to make the length a multiple of four.” The harder question — when can you omit it and when must you keep it — is the one that actually shows up in code review. This article works from RFC 4648 outward to the per-library decoder behavior that bites in production.

Why = exists

Base64 maps 3 input bytes (24 bits) to 4 output characters (4 × 6 bits). When the input length is not a multiple of 3, the final group is short of 4 chars. = pads it to 4.

Input bytes mod 3   Trailing     Example (A = 0x41)
0                   no padding   AAA (3 B) → QUFB
1                   ==           A (1 B) → QQ==
2                   =            AA (2 B) → QUE=

The number of = characters (0/1/2) encodes input length mod 3 — that’s the actual function, not just length-rounding.
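The relationship between input length and padding count can be checked directly. The following is a minimal Python sketch; the helper name `padding_count` is made up for illustration and is not part of any library.

```python
import base64

def padding_count(data: bytes) -> int:
    """Return how many '=' characters the standard encoder appends."""
    encoded = base64.b64encode(data).decode("ascii")
    return len(encoded) - len(encoded.rstrip("="))

# Print input length, length mod 3, the encoded form, and the pad count.
for n in range(1, 7):
    data = b"A" * n
    print(n, n % 3, base64.b64encode(data).decode("ascii"), padding_count(data))
```

For every length n, the pad count equals (3 - n % 3) % 3, which is why a decoder can recover the exact byte count from the padding alone.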

What RFC 4648 actually says

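RFC 4648 is the spec. The general rule on padding is in §3.2: encoders must emit it unless the specification that references RFC 4648 explicitly says otherwise. §5, which defines the URL-safe alphabet, adds:

The pad character "=" is typically percent-encoded when used in an URI, but if the data length is known implicitly, this can be avoided by skipping the padding;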

So: if the consumer knows where the data ends from external framing, padding can be omitted. Examples cited downstream include JWT (uses . separators) and Base64 in data: URIs (the enclosing attribute or string delimits the payload).

When padding is required:

  • Streaming consumption where the end isn’t known
  • Concatenated Base64 blocks where you need a boundary marker

“External framing” means things like HTTP Content-Length, URL path-segment boundaries, JSON string-value closing quotes — anywhere the outer protocol fixes the end.

Why JWT drops padding

JWT joins three Base64url parts with .:

eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJqb2huIn0.AbCdEf...

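The . is the boundary marker, so padding is unnecessary. Worse, leaving = in puts a character that has reserved meaning in query strings into URLs. RFC 7515 §2 (JWS, the signing format JWT builds on) defines Base64url encoding as the RFC 4648 URL-safe alphabet with all trailing = characters omitted.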
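As a rough illustration of how the . separator replaces padding, here is a Python sketch that splits the first two parts of the example token and decodes them. The helper name `decode_segment` is hypothetical, and the sketch only decodes: it performs no signature verification and should not be mistaken for JWT validation.

```python
import base64
import json

def decode_segment(segment: str) -> bytes:
    """Decode one unpadded Base64url segment by restoring its padding."""
    padded = segment + "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(padded)

# Header and payload are taken from the example token; the signature part
# is elided in the article, so it is left out here.
header_b64, payload_b64 = "eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJqb2huIn0".split(".")
header = json.loads(decode_segment(header_b64))
payload = json.loads(decode_segment(payload_b64))
print(header, payload)
```

The split on . fixes each segment's length, so the decoder can compute the missing padding from `len(segment) % 4` without any = on the wire.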

Library disagreement

The spec permits omission, but every standard library has its own opinion on whether to accept unpadded input on decode.

Language / library                       Padded input   Unpadded input
Node.js Buffer.from(s, 'base64url')      ✓              ✓
Node.js Buffer.from(s, 'base64')         ✓              ✓ (lenient)
Python base64.urlsafe_b64decode          ✓              ✗ binascii.Error
Python base64.b64decode                  ✓              ✗
Java Base64.getUrlDecoder()              ✓              ✓
Java Base64.getDecoder()                 ✓              ✓
Go base64.URLEncoding.DecodeString       ✓              ✗
Go base64.RawURLEncoding.DecodeString    ✗              ✓
PHP base64_decode                        ✓              ✓
Ruby Base64.urlsafe_decode64             ✓              ✗ (Ruby ≤ 2.4)

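Python and Go are strict; Java and Node are lenient. Go is strict in both directions: URLEncoding rejects unpadded input and RawURLEncoding rejects padded input, so the caller has to pick the encoding that matches the wire format. “It worked in Node but Python rejects it” is a frequent porting bug.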
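The Python side of that porting bug is easy to reproduce. The sketch below uses only the standard library; `try_decode` is an illustrative helper, not an existing API.

```python
import base64
import binascii

def try_decode(s: str) -> str:
    """Return the decoded bytes as a repr, or the error Python raises."""
    try:
        return repr(base64.urlsafe_b64decode(s))
    except binascii.Error as exc:
        return f"binascii.Error: {exc}"

print(try_decode("QQ=="))  # padded input decodes normally
print(try_decode("QQ"))    # unpadded input is rejected by the strict decoder
```

The same two strings passed to a lenient decoder both yield the single byte 0x41, which is exactly why the mismatch tends to surface only after a port.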

Defensive code

When handling tokens that may arrive unpadded (JWT, etc.), pad before decoding:

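JavaScript:

function padBase64url(s) {
	// Append 0-2 '=' so the length becomes a multiple of 4.
	// A length with remainder 1 is never valid Base64; reject it upstream.
	return s + '='.repeat((4 - (s.length % 4)) % 4);
}

Python:

def pad_base64url(s: str) -> str:
    # Same rule as the JavaScript version: pad up to the next multiple of 4.
    return s + '=' * ((4 - len(s) % 4) % 4)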

The reverse direction (stripping) is just s.replace(/=+$/, '').
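Combining both directions gives a decoder that accepts either form. This is a hedged sketch rather than a reference implementation: the function name `decode_base64url_lenient` is invented here, and it assumes the input is already known to be Base64url text. Python's non-validating decoder silently discards characters outside the alphabet, so alphabet checks would need to happen separately.

```python
import base64

def decode_base64url_lenient(s: str) -> bytes:
    """Decode Base64url text whether or not it carries trailing '='."""
    stripped = s.rstrip("=")
    # A remainder of 1 cannot come from any byte sequence: one character
    # carries only 6 bits, which is less than a full byte.
    if len(stripped) % 4 == 1:
        raise ValueError("invalid Base64url length")
    padded = stripped + "=" * (-len(stripped) % 4)
    return base64.urlsafe_b64decode(padded)

print(decode_base64url_lenient("QQ"))    # unpadded
print(decode_base64url_lenient("QQ=="))  # padded
```

Stripping first and then re-padding means the function never has to guess whether the existing padding is complete.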

What RFC 4648 does NOT permit

Some related rules that get confused:

  • §3.2 Padding: encoders must pad unless the referencing specification explicitly says otherwise; omission is safe only when external framing is present.
  • §3.3 Non-alphabet characters: = is allowed only at the end. Middle-of-string = is invalid.
  • §3.5 Canonical Encoding: encoders must set the unused bits in the final quad to zero, and decoders may reject input where they are not. It says nothing about omitting padding.

“Padding may be omitted” is a different rule from “= may appear anywhere.” Reject = anywhere except trailing.
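One way to enforce the trailing-only rule before decoding is a shape check with a regular expression. The sketch below assumes the URL-safe alphabet; the function name `has_valid_padding_position` is hypothetical. It checks only character set and pad position, not whether the length is decodable.

```python
import re

# Zero or more URL-safe alphabet characters, then at most two '=' at the end.
_B64URL_SHAPE = re.compile(r"[A-Za-z0-9_-]*={0,2}")

def has_valid_padding_position(s: str) -> bool:
    """True if '=' appears only as a trailing run of at most two characters."""
    return _B64URL_SHAPE.fullmatch(s) is not None

print(has_valid_padding_position("QQ=="))  # trailing padding
print(has_valid_padding_position("Q=Q="))  # '=' in the middle
```

Running this check first matters because lenient decoders may skip or truncate at a stray = instead of rejecting it.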

The unused-bit malleability

Separate from padding: when input has 1 byte (8 bits) and the encoder emits 2 chars (12 bits), there are 4 leftover bits. RFC 4648 canonical form requires those bits to be zero — but most libraries decode non-zero leftover bits without complaint.

Base64: "QQ"  → 0x41 + 4 leftover bits = 0
Base64: "QR"  → 0x41 + different 4 leftover bits → still 0x41

"QQ" and "QR" both decode to "A". This is Base64 malleability: distinct Base64 strings yielding the same bytes. Attackers can use it to bypass equality checks on encoded inputs.

For signature/MAC pipelines that compare Base64-decoded bytes, enforce canonical form: decode, re-encode, and require equality with the original input.
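The decode, re-encode, and compare step might look like the following in Python. This is a sketch under stated assumptions: it expects padded, standard-alphabet Base64, and `decode_canonical` is an illustrative name rather than a library function.

```python
import base64
import binascii

def decode_canonical(s: str) -> bytes:
    """Decode padded standard Base64, rejecting non-canonical encodings."""
    try:
        raw = base64.b64decode(s, validate=True)
    except binascii.Error as exc:
        raise ValueError("not valid Base64") from exc
    # Re-encoding always produces zero leftover bits, so any input with
    # non-zero leftover bits fails this equality check.
    if base64.b64encode(raw).decode("ascii") != s:
        raise ValueError("non-canonical Base64 encoding")
    return raw

print(decode_canonical("QQ=="))
```

With this in place, "QQ==" is accepted and "QR==" raises ValueError, even though a plain `base64.b64decode` call maps both to the same byte.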

Practical guidelines

  • JWT, OAuth, WebPush — pad before decode, since wire tokens are unpadded.
  • Base64 over HTTP API parameters — emit with padding; accept either form on decode.
  • Data URIs (data:image/png;base64,...) — keep padding; not strictly required but browsers vary.
  • Cryptographic signatures and MACs — enforce canonical form (zero leftover bits) and re-encoding equality.

Summary

Base64’s = is not just length padding: it encodes the input’s length % 3, letting decoders recover exact byte counts. RFC 4648 lets a referencing specification drop it when external framing supplies the length (§3.2, §5), which is what JWT relies on. Whether a given library accepts unpadded input is implementation-dependent: Python and Go are strict, Node and Java are lenient. Padding before decode is the highest-compatibility approach for tokens that may arrive in either form.
