Why Base64 grows by 4/3, and how URL-safe Base64 differs

Base64 is one of those technologies you keep running into without ever quite remembering how it works. Why does the output get bigger? Why are there two variants, one using + and / and another using - and _? Understanding the spec makes debugging and implementation much less mysterious. This article unpacks Base64 from first principles.

Why Base64 exists at all

Many text-oriented protocols and formats (email, HTTP headers, JSON, URLs) were not designed to carry binary payloads as-is:

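  • Classic SMTP assumes 7-bit ASCII, so raw 8-bit bytes may not survive transit.
  • JSON strings are Unicode text: control characters such as a null byte or newline must be escaped, and byte sequences that are not valid UTF-8 cannot be represented at all.
  • An unescaped ? or & inside a URL parameter confuses query parsers.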

Base64 solves this by encoding arbitrary bytes as printable ASCII characters. The spec lives in RFC 4648.
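A quick way to see this in practice: the sketch below feeds three awkward bytes (a null byte, a newline, and 0xFF) through btoa(), which browsers and Node.js 16+ provide as a global. btoa() takes a "binary string" in which each character's code (0 to 255) stands for one byte, hence the String.fromCharCode step.

```javascript
// Three bytes that text protocols handle badly: NUL, LF (newline), and 0xFF.
const awkward = [0x00, 0x0a, 0xff];

// btoa() expects a "binary string": one character per byte, codes 0-255.
const binary = String.fromCharCode(...awkward);
const encoded = btoa(binary);

console.log(encoded); // "AAr/"
// Every character of the result is printable ASCII from the Base64 alphabet.
console.log(/^[A-Za-z0-9+/=]*$/.test(encoded)); // true
```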

The core idea: 3 bytes → 4 characters

Base64 takes 3 input bytes (24 bits), splits them into four 6-bit chunks, and represents each chunk as one character:

input:    [byte0       ][byte1       ][byte2       ]
          8 + 8 + 8 = 24 bits

output:   [c0   ][c1   ][c2   ][c3   ]
          6 + 6 + 6 + 6 = 24 bits

Each 6-bit chunk has 2^6 = 64 possible values, hence the 64-character alphabet (“Base64”): A–Z for values 0–25, a–z for 26–51, 0–9 for 52–61, then + for 62 and / for 63.

For example, encoding Man:

'M'  = 0x4D = 01001101
'a'  = 0x61 = 01100001
'n'  = 0x6E = 01101110

bit concat: 010011010110000101101110
6-bit slice: 010011 010110 000101 101110
             = 19, 22, 5, 46
table lookup: T, W, F, u

→ "TWFu"
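The same arithmetic in code: a minimal sketch of encoding one complete 3-byte group with shifts and masks. The function name encodeGroup is hypothetical; the alphabet string follows the RFC 4648 ordering described above.

```javascript
// RFC 4648 standard alphabet: the index of each character is its 6-bit value.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three bytes (one full 24-bit group) into four characters.
function encodeGroup(b0, b1, b2) {
  // Concatenate the three bytes into one 24-bit number.
  const bits = (b0 << 16) | (b1 << 8) | b2;
  // Slice it into four 6-bit values, most significant first.
  const sextets = [
    (bits >> 18) & 0x3f,
    (bits >> 12) & 0x3f,
    (bits >> 6) & 0x3f,
    bits & 0x3f,
  ];
  // Look each value up in the alphabet.
  return sextets.map((v) => ALPHABET[v]).join("");
}

console.log(encodeGroup(0x4d, 0x61, 0x6e)); // "TWFu"
```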

Decoding is the exact inverse: look each character up in the reverse table to get its 6-bit value, concatenate the bits, and regroup them into 8-bit bytes.
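Going the other way, here is a minimal sketch of decoding one 4-character group: a reverse lookup turns each character back into its 6-bit value, and shifts reassemble the three bytes. decodeGroup is a hypothetical name, and the sketch assumes one complete group with no padding.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode exactly four alphabet characters back into three byte values.
function decodeGroup(chars) {
  if (chars.length !== 4) throw new Error("expected exactly 4 characters");
  let bits = 0;
  for (const c of chars) {
    // Reverse lookup: character -> 6-bit value.
    const v = ALPHABET.indexOf(c);
    if (v < 0) throw new Error(`not a Base64 character: ${c}`);
    bits = (bits << 6) | v; // append 6 more bits
  }
  // Regroup the 24 bits into three 8-bit bytes.
  return [(bits >> 16) & 0xff, (bits >> 8) & 0xff, bits & 0xff];
}

console.log(String.fromCharCode(...decodeGroup("TWFu"))); // "Man"
```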

Why the 4/3 expansion

Note that we represent 3 input bytes (24 bits) using 4 output characters (4 × 8 bits = 32 bits). Each output character is one ASCII byte (8 bits) carrying only 6 bits of useful data, so 2 out of every 8 bits are wasted.

The size ratio:

output_size / input_size = 32 / 24 = 4/3 ≈ 1.333

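Base64 inflates data by about 33%: with padding, n input bytes become 4 × ⌈n/3⌉ output characters. The overhead is inherent in carrying 6 bits per 8-bit character, and it is why images embedded as data URIs are roughly a third larger than the original files.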
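The exact padded length follows from the 3-to-4 grouping: n input bytes occupy ⌈n/3⌉ groups of 4 characters. A small sketch (base64Length is a hypothetical helper) checks the formula against btoa() and shows the ratio settling at 4/3 for large inputs.

```javascript
// Padded Base64 length for n input bytes: ceil(n / 3) groups of 4 chars.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

// Cross-check against btoa() for a few sizes ("x" is one byte per character).
for (const n of [0, 1, 2, 3, 4, 100]) {
  const actual = btoa("x".repeat(n)).length;
  console.log(n, base64Length(n), actual);
}

// For large inputs the ratio is 4/3.
console.log(base64Length(3000000) / 3000000); // 1.3333333333333333
```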

Padding for non-multiples of 3 bytes

Inputs are not always a multiple of 3 bytes. Handling the leftover 1 or 2 bytes is the slightly fiddly part of Base64.

Leftover bytes     Bit split                                    Output
1 byte (8 bits)    6 + 2, plus 4 zero bits → 2 chars + ==        4 chars (2 of them =)
2 bytes (16 bits)  6 + 6 + 4, plus 2 zero bits → 3 chars + =     4 chars (1 of them =)
3 bytes (24 bits)  6 + 6 + 6 + 6, no padding                    4 chars
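Putting the table into code: a minimal general encoder sketch (encodeBase64 is a hypothetical name) that processes full 3-byte groups and then pads a 1- or 2-byte remainder with zero bits and = signs. In practice btoa() or a library does this; the point is to show where each row of the table comes from.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array of byte values as padded standard Base64.
function encodeBase64(bytes) {
  let out = "";
  let i = 0;
  // Full 3-byte groups: 24 bits -> 4 characters, no padding.
  for (; i + 2 < bytes.length; i += 3) {
    const bits = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
    out += ALPHABET[(bits >> 18) & 63] + ALPHABET[(bits >> 12) & 63] +
           ALPHABET[(bits >> 6) & 63] + ALPHABET[bits & 63];
  }
  const rest = bytes.length - i;
  if (rest === 1) {
    // 8 bits = 6 + 2; the 2 bits get 4 zero bits appended -> 2 chars + "==".
    const bits = bytes[i] << 16;
    out += ALPHABET[(bits >> 18) & 63] + ALPHABET[(bits >> 12) & 63] + "==";
  } else if (rest === 2) {
    // 16 bits = 6 + 6 + 4; the 4 bits get 2 zero bits appended -> 3 chars + "=".
    const bits = (bytes[i] << 16) | (bytes[i + 1] << 8);
    out += ALPHABET[(bits >> 18) & 63] + ALPHABET[(bits >> 12) & 63] +
           ALPHABET[(bits >> 6) & 63] + "=";
  }
  return out;
}

// Helper for ASCII test strings: one byte per character.
const bytesOf = (s) => Array.from(s, (c) => c.charCodeAt(0));
console.log(encodeBase64(bytesOf("Man"))); // "TWFu"
console.log(encodeBase64(bytesOf("Ma")));  // "TWE="
console.log(encodeBase64(bytesOf("M")));   // "TQ=="
```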

= is a “this slot is empty” marker. Decoders count trailing = to recover the original byte length.
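On the decoding side, the = count tells the decoder how many bytes the final group really holds: no = means 3, one means 2, two mean 1. A minimal sketch, assuming padded input without whitespace (decodeBase64 is a hypothetical name):

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode padded standard Base64 into an array of byte values.
function decodeBase64(s) {
  if (s.length % 4 !== 0) throw new Error("padded Base64 length must be a multiple of 4");
  // Count trailing "=": 0, 1 or 2.
  const pad = s.endsWith("==") ? 2 : s.endsWith("=") ? 1 : 0;
  const bytes = [];
  for (let i = 0; i < s.length; i += 4) {
    let bits = 0;
    for (let j = 0; j < 4; j++) {
      const c = s[i + j];
      // "=" is only legal in the final padding positions.
      if (c === "=" && i + j < s.length - pad) throw new Error("misplaced '='");
      // Padding contributes six zero bits; other characters use the reverse lookup.
      const v = c === "=" ? 0 : ALPHABET.indexOf(c);
      if (v < 0) throw new Error(`not a Base64 character: ${c}`);
      bits = (bits << 6) | v;
    }
    bytes.push((bits >> 16) & 255, (bits >> 8) & 255, bits & 255);
  }
  // Drop the filler bytes that correspond to the padding.
  return bytes.slice(0, bytes.length - pad);
}

console.log(String.fromCharCode(...decodeBase64("TWE="))); // "Ma"
console.log(String.fromCharCode(...decodeBase64("TQ=="))); // "M"
```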

Some implementations omit = (“unpadded Base64”). A decoder then either infers the leftover length from the input length modulo 4 or rejects the input. Base64URL commonly drops padding.
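Inferring the leftover length is simple arithmetic, because every complete group is exactly 4 characters: a 2-character remainder encodes 1 byte, a 3-character remainder encodes 2 bytes, and a 1-character remainder cannot occur. A minimal sketch of restoring padding before handing the string to a strict decoder (restorePadding is a hypothetical name):

```javascript
// Re-add "=" to unpadded Base64 so that strict decoders accept it.
function restorePadding(s) {
  switch (s.length % 4) {
    case 0: return s;        // complete groups only
    case 2: return s + "=="; // final group holds 1 byte
    case 3: return s + "=";  // final group holds 2 bytes
    default: throw new Error("invalid unpadded Base64: length % 4 === 1");
  }
}

console.log(restorePadding("TQ"));        // "TQ=="
console.log(restorePadding("TWE"));       // "TWE="
console.log(atob(restorePadding("TWE"))); // "Ma"
```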

Why a URL-safe variant exists

Standard Base64 uses + and / in its alphabet and = for padding, and all three are problematic in URLs:

  • + means space in URL query strings (application/x-www-form-urlencoded).
  • / is a path separator.
  • = separates key=value.
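These problems are easy to reproduce with the URL APIs built into browsers and Node.js (encodeURIComponent and URLSearchParams). The token value below is made up for illustration.

```javascript
// A made-up standard-Base64 value containing +, / and =.
const token = "ab+c/d==";

// Putting it into a URL safely requires percent-encoding.
console.log(encodeURIComponent(token)); // "ab%2Bc%2Fd%3D%3D"

// Left unencoded in a query string, "+" is decoded as a space.
const params = new URLSearchParams("t=" + token);
console.log(params.get("t")); // "ab c/d=="
```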

Embedding such characters directly forces percent-encoding. To avoid that, Base64URL (RFC 4648 §5) defines:

Standard Base64    Base64URL
+                  -
/                  _
= padding          typically omitted

JWTs use unpadded Base64URL for the header, payload, and signature. If an encoded string contains - or _, you are looking at the URL-safe variant, since neither character appears in the standard alphabet.
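As an illustration, the sketch below decodes segments of a compact JWT. It assumes a Node.js version recent enough for Buffer's 'base64url' encoding (added around v15.7); decodeJwtSegment is a hypothetical name, the token is a made-up example with a placeholder signature, and nothing here verifies the signature.

```javascript
// Decode one dot-separated segment of a compact JWT as JSON.
// Assumes Node.js with Buffer's "base64url" encoding; does NOT check the signature.
function decodeJwtSegment(token, index) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("expected header.payload.signature");
  const json = Buffer.from(parts[index], "base64url").toString("utf8");
  return JSON.parse(json);
}

// Made-up token: header {"alg":"HS256","typ":"JWT"}, empty payload {}, fake signature.
const token = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.e30.sig";
console.log(decodeJwtSegment(token, 0)); // { alg: 'HS256', typ: 'JWT' }
console.log(decodeJwtSegment(token, 1)); // {}
```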

Converting between Base64URL and standard Base64

atob() (available in browsers and in Node.js 16+) accepts only the standard alphabet, so Base64URL needs a little pre-processing first. In Node.js, Buffer.from(str, 'base64url') decodes the URL-safe form directly.

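function fromBase64Url(s) {
	// Map the URL-safe alphabet back to the standard one: '-' → '+', '_' → '/'
	let b = s.replace(/-/g, '+').replace(/_/g, '/');
	// A length of 1 mod 4 can never come from valid Base64
	if (b.length % 4 === 1) throw new Error('invalid Base64URL length');
	// Restore '=' padding up to a multiple of 4 characters
	while (b.length % 4) b += '=';
	// atob() returns a "binary string": one character (code 0-255) per byte
	return atob(b);
}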

Encoding reverses the steps: replace + with -, replace / with _, and strip the trailing =.
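A matching sketch for the encoding direction, assuming a Uint8Array input and the global btoa() (browsers, Node.js 16+). The name toBase64Url is hypothetical. It builds the binary string with a loop rather than spreading the whole array into String.fromCharCode, which can exceed argument limits on large inputs.

```javascript
// Encode bytes as unpadded Base64URL (RFC 4648 §5 alphabet).
function toBase64Url(bytes) {
  // btoa() wants a "binary string": one character per byte.
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary)
    .replace(/\+/g, "-") // '+' -> '-'
    .replace(/\//g, "_") // '/' -> '_'
    .replace(/=+$/, ""); // drop trailing padding
}

console.log(toBase64Url(new Uint8Array([0xfb, 0xff])));    // "-_8"
console.log(toBase64Url(new TextEncoder().encode("Man"))); // "TWFu"
```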

Encoding is not encryption

The most consequential misconception about Base64 is “Base64 makes my data a little bit safe”. Base64 is not encryption. Anyone can decode it via a lookup table.
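To make the point concrete: anyone holding the encoded string gets the original back with one built-in call, no key required. The "secret" below is a made-up placeholder.

```javascript
// A made-up "secret" that someone Base64-encoded before logging it.
const logged = btoa("secret");
console.log(logged); // "c2VjcmV0"

// No key needed: decoding is just the reverse table lookup.
console.log(atob(logged)); // "secret"
```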

Logging an API key as Base64 is just as dangerous as logging it in plaintext. Use encryption to keep data secret, hashing to derive identifiers or check integrity, and Base64 only to move bytes through text-only channels.

When you need to encode or decode something quickly, the Base64 tool on this site does it in your browser. Nothing leaves your device, so it’s safe to try with sensitive inputs.