What Is a CSV File? A Practical CSV Definition for Real-World Use

Despite their simplicity, CSV files come with surprising complexity. If you’re working with structured data, here’s what you really need to know about how CSV works—and where it doesn’t.

CSV files aim to satisfy at least three goals:

  1. Computers should be able to read and process the content
  2. The files should be easy for humans to read
  3. Storage should be rather terse and not waste space

With these goals in mind, lots of the quirks we see in practical usage are easier to understand.


CSV isn’t a real standard

There’s no official, universally followed specification for CSV. The most commonly referenced document is RFC 4180, published in 2005. But it’s not a strict standard—just a best-effort attempt to describe what CSV should look like.

Even RFC 4180 itself acknowledges this:

While there are various specifications and implementations for the CSV format, there is no formal specification in existence, which allows for a wide variety of interpretations of CSV files.

In other words: RFC 4180 defines one way CSV could work. But many tools, programming languages, and legacy systems don’t follow it at all.


What RFC 4180 actually says

Here’s a condensed version of what RFC 4180 recommends:

  • One record per line
    Each line is a new row. Line endings should be CRLF (\r\n), but many systems use LF (\n) anyway.

  • Commas as field separators
    It assumes commas—not semicolons or tabs.

  • Optional header row
    The first line may contain column names, but it’s not required.

  • Fields can be quoted with double quotes
    Quoting is needed when a field contains the delimiter, quotes, or line breaks.

  • Escaping quotes by doubling
    If a quoted field includes a double quote, it must be escaped by writing two double quotes ("").

Sounds reasonable? Yes—but most real-world CSV files deviate from these rules in one way or another. That’s why relying solely on RFC 4180 compatibility isn’t enough.


Many CSV files don’t have a header row

A CSV file doesn’t need to start with a header. Some do, some don’t. Some use metadata elsewhere (like in a README or separate spec file). You can’t rely on the first row being column names. At the other extreme, some so-called CSV files may even include multiple header rows.


CSV doesn’t always mean comma-separated

While “CSV” stands for Comma-Separated Values, many files use semicolons, tabs, or pipes instead. In some regions, semicolon is the default because the comma is used as a decimal separator. Quoted fields can still contain commas, but this often makes the file harder for humans to read. Switching to semicolon as the separator keeps the file easier to scan for human readers in such a case. This is the reason CSV is sometimes explained as Character-Separated Values.


Quoting and escaping rules vary

Some files quote every field. Others only quote fields that contain a separator or newline. Quotes within quoted fields are usually escaped by doubling them ("He said ""hi"""), but not always. There’s no universal escaping rule. CSV parsers should aim to be as fault-tolerant as possible. Troubles arise when applying sloppy rules leads to ambiguity.


Newlines are allowed inside fields

If a field is quoted, it can contain actual line breaks. This means you can’t reliably split a CSV file by line and expect one record per line, though you’ll find lots of such false solutions when people try to process a CSV file using command-line tools. Only if you’re completely sure that no field contains a line break should you use such a naive approach.


Everything is a string

Coming from spreadsheet applications, many users assume columns or fields in a CSV document have some kind of well-defined types like Integer or Float or Date. This isn’t technically possible as in a CSV there’s no metadata that would contain such information. CSV is extremely WYSIWYG—what you see is what you get.

Some applications—like Excel, and even some CSV-specific tools—try to guess a column’s data type based on its content. While this seems to be rather comfortable, it can cause all kinds of problems. Perhaps your CSV contains the product number 9007199254740993. An application reading it and converting it to a numeric type using IEEE 754 floating point numbers internally, converts this number to 9007199254740990. If you don’t believe it, try it for yourself in Excel. Or in Python.


Spaces are part of the content

If a CSV document has this line abc,def ,ghi (please note the space after “def” here), then the three fields would have this content: abc, def and ghi. According to RFC 4180, spaces around the “actual” content are part of this content.

You shouldn’t use spaces (or other whitespace characters) to make a CSV file more readable for humans.


Duplicate column names are technically allowed

Even if a header row is present, nothing stops it from having duplicate column names. Most software will accept this without warning—and if you just use CSV editors or spreadsheet applications to open and read such a file, there shouldn’t be any problems. The trouble begins when you try to import a CSV file with duplicate column names into a relational database where each column name has to be unique. Or if you convert the CSV document to JSON or XML.


Summary

CSV is best understood as a convention, not a specification. It’s flexible—and that’s both its strength and its biggest weakness. If you’re building or using tools that handle CSV files, make as few assumptions as possible:

  • Don’t expect a header row.
  • Don’t assume the separator is a comma.
  • Don’t rely on consistent quoting or escaping.
  • Don’t assume one row per line.
  • Don’t trust that each row has the same number of columns.
  • Don’t treat CSV fields as typed data.
  • Don’t expect field names to be unique.

Bottom line:
A CSV file is just text—nothing more. Its structure depends entirely on convention and context. Treat it with caution, and always validate what you’re working with.

 


 

This tutorial is based on a newsletter issue from The Missing Header — my practical data newsletter. Subscribe now to get more real-world insights into data handling, CSV quirks, and tooling tips: