CSV quoting rules and the spreadsheet compatibility you need to think about

4 min read

CSV is “simple” until your data contains a comma, newline, or double quote. RFC 4180 specifies the rules; if you ignore them, Excel and Google Sheets disagree, parsers fail, and rows go missing. This article walks through the canonical rules and the spreadsheet quirks that come with them.

RFC 4180: the de-facto standard

CSV had no formal spec for a long time, but RFC 4180 (2005) defined a baseline. It is informational rather than mandatory, but most modern libraries follow it.

Key rules:

  1. Fields are separated by commas.
  2. Lines are separated by CRLF (\r\n).
  3. A field that contains ,, ", or \r\n must be enclosed in double quotes.
  4. A double quote inside a quoted field is escaped as "".
  5. The first row may be a header (optional).
  6. Encoding is ASCII (in practice, UTF-8 dominates).

When fields need quoting

name,age,comment
Taro Yamada,30,Lives in Tokyo
"Sato, Hanako",25,"Name with a comma"
"Comment with
a newline",40,"Includes ""quotes"""

By rule:

FieldNeeds quoting?
Hello worldNo
Hello, worldYes (contains a comma)
Line1\nLine2Yes (contains a newline)
He said "Hi"Yes (contains a double quote)
123No

Escaping double quotes: ""

To include a " in a field, write two consecutive "" and quote the whole field:

input data: He said "Hi"
CSV form:   "He said ""Hi"""

Note this is not backslash escape (\"). CSV doesn’t treat backslash specially — it’s just another character.

Line ending pitfalls: CRLF vs LF

RFC 4180 calls for CRLF (\r\n), but in practice CSV files mix:

  • Windows — CRLF (\r\n)
  • macOS / Linux — LF (\n)
  • Old Mac — CR (\r)

Most parsers accept all three for line breaks between rows, but newlines inside quoted fields are a different story:

  • Outside fields (row separators): parsers normalize.
  • Inside fields (quoted data): preserved as-is.

Move a CSV with embedded newlines across operating systems, and you can end up with mixed line endings in the same file. Excel re-saves to Windows CRLF, which can silently rewrite intentional content.

The BOM problem with Excel

Saving a UTF-8 CSV and opening it in Excel often mojibakes Japanese characters. The reason:

  • UTF-8 with BOM → Excel reads as UTF-8.
  • UTF-8 without BOM → Excel falls back to the system locale (Shift_JIS on Japanese Windows).

The UTF-8 BOM is three bytes:

0xEF 0xBB 0xBF

Prepending it lets Excel detect UTF-8. Google Sheets and Numbers are fine with or without BOM.

// Emit Excel-friendly UTF-8 CSV
const bom = '';
const csv = bom + 'name,age\nTaro,30\n';

Adding a BOM where it isn’t needed can leave a stray character at the start of the first field. Add it only when Excel compatibility is in scope.

Excel’s silent type coercion

Opening a CSV in Excel can silently rewrite values:

  • Leading zeros stripped000123123
  • Long numbers in scientific notation12345678901234561.23457E+15
  • Date-like strings parsed as dates1-2 → “January 2”
  • Phone numbers misread as expressions03-1234-5678

Mitigations:

  • Wrap the value as ="..." to force text: ="000123"
  • Use TSV — Excel is gentler with tabs.
  • Use the “Get Data → From Text” import wizard, which lets you set per-column types.

Double-clicking a CSV gets you Excel’s defaults; the wizard gets you control.

Regional separator differences

In countries that use , for the decimal separator, CSV typically uses ; as the field separator:

  • Japan / US / UK — separator ,, decimal .
  • Germany / France / Italy — separator ;, decimal ,

Internationally distributed CSV exports often need locale-aware separators. Strictly speaking, RFC 4180 says comma, but reality forces flexibility.

Type inference when converting to JSON

Every CSV field is a string. Converting to JSON forces a design choice on whether to infer types:

CSV:   name,age,active
       Taro,30,true

leave as strings: { "name": "Taro", "age": "30", "active": "true" }
type-coerced:     { "name": "Taro", "age": 30, "active": true }

Type inference is convenient — until "03-1234" (a phone number) gets parsed as -1231. Safest default: keep everything as strings; offer an explicit per-column type option.

CSV checklist

  • Quote any field containing ,, newline, or ".
  • Escape " as "", not \".
  • Add UTF-8 BOM only when Excel compatibility is required.
  • Tolerate CRLF and LF for line endings.
  • Be careful about Excel’s automatic type conversions.
  • Account for regional separator differences (; in parts of Europe).

To convert between CSV and JSON, the converter on this site handles quoting and escapes correctly. Running tricky data through it before integrating is a good sanity check.