mirror of
https://github.com/adambard/learnxinyminutes-docs.git
synced 2025-04-26 15:13:56 +00:00
[csv] shorten and move quote rules to first example
parent
49be924382
commit
7b79257f21
112
csv.md
@@ -1,94 +1,62 @@
|
|||||||
---
|
---
|
||||||
language: CSV
|
name: CSV
|
||||||
contributors:
|
contributors:
|
||||||
- [Timon Erhart, 'https://github.com/turbotimon/']
|
- [Timon Erhart, 'https://github.com/turbotimon/']
|
||||||
---
|
---
|
||||||
|
|
||||||
CSV (Comma-Separated Values) is a lightweight file format used to store tabular
|
CSV (Comma-Separated Values) is a file format used to store tabular
|
||||||
data in plain text, designed for easy data exchange between programs,
|
data in plain text.
|
||||||
particularly spreadsheets and databases. Its simplicity and human readability
|
|
||||||
have made it a cornerstone of data interoperability. It is often used for
|
|
||||||
moving data between programs with incompatible or proprietary formats.
|
|
||||||
|
|
||||||
While RFC 4180 provides a standard for the format, in practice, the term "CSV"
|
|
||||||
is often used more broadly to refer to any text file that:
|
|
||||||
|
|
||||||
- Can be interpreted as tabular data
|
|
||||||
- Uses a delimiter to separate fields (columns)
|
|
||||||
- Uses line breaks to separate records (rows)
|
|
||||||
- Optionally includes a header in the first row
|
|
||||||
|
|
||||||
```csv
|
```csv
|
||||||
Name, Age, DateOfBirth
|
Name,Age,DateOfBirth,Comment
|
||||||
Alice, 30, 1993-05-14
|
Alice,30,1993-05-14,
|
||||||
Bob, 25, 1998-11-02
|
Bob,25,1998-11-02,
|
||||||
Charlie, 35, 1988-03-21
|
Eve,,,data might be missing because it's just text
|
||||||
|
"Charlie Brown",35,1988-03-21,strings can be quoted
|
||||||
|
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
|
||||||
|
"Walter ""The Danger"" White",52,1958-09-07,quotes are escaped by doubling them up
|
||||||
|
Joe Smith,33,1990-06-02,"multi line strings
|
||||||
|
span multiple lines
|
||||||
|
there are no escape characters"
|
||||||
```
|
```
|
||||||
|
|
||||||
## Delimiters for Rows and Columns
|
The first row might be a header of field names or there might be no header and
|
||||||
|
the first line is already data.
|
||||||
|
|
||||||
Rows are typically separated by line breaks (`\n` or `\r\n`), while columns
|
## Delimiters
|
||||||
(fields) are separated by a specific delimiter. Although commas are the most
|
|
||||||
common delimiter for fields, other characters, such as semicolons (`;`), are
|
|
||||||
commonly used in regions where commas are decimal separators (e.g., Germany).
|
|
||||||
Tabs (`\t`) are also used as delimiters in some cases, with such files often
|
|
||||||
referred to as "TSV" (Tab-Separated Values).
|
|
||||||
|
|
||||||
Example using semicolons as delimiter and comma for decimal separator:
|
Rows are separated by line breaks (`\n` or `\r\n`), columns are separated by a comma.
|
||||||
|
|
||||||
|
Tabs (`\t`) are sometimes used instead of commas and those files are called "TSVs"
|
||||||
|
(Tab-Separated Values). They are easier to paste into Excel.
|
||||||
|
|
||||||
|
Occasionally other characters can be used, for example semicolons (`;`) may be used
|
||||||
|
in Europe because commas are [decimal separators](https://en.wikipedia.org/wiki/Decimal_separator)
|
||||||
|
instead of the decimal point.
|
||||||
|
|
||||||
```csv
|
```csv
|
||||||
Name; Age; Grade
|
Name;Age;Grade
|
||||||
Alice; 30; 50,50
|
Alice;30;50,50
|
||||||
Bob; 25; 45,75
|
Bob;25;45,75
|
||||||
Charlie; 35; 60,00
|
Charlie;35;60,00
|
||||||
```
|
```
|
||||||
|
|
||||||
## Data Types
|
## Data Types
|
||||||
|
|
||||||
CSV files do not inherently define data types. Numbers and dates are stored as
|
CSV files do not inherently define data types. Numbers and dates are stored as
|
||||||
plain text, and their interpretation depends on the software importing the
|
text. Interpreting and parsing them is left up to software using them.
|
||||||
file. Typically, data is interpreted as follows:
|
Typically, data is interpreted as follows:
|
||||||
|
|
||||||
```csv
|
```csv
|
||||||
Data, Comment
|
Data,Comment
|
||||||
100, Interpreted as a number (integer)
|
100,Interpreted as a number (integer)
|
||||||
100.00, Interpreted as a number (floating-point)
|
100.00,Interpreted as a number (floating-point)
|
||||||
2024-12-03, Interpreted as a date or a string (depending on the parser)
|
2024-12-03,Interpreted as a date or a string (depending on the parser)
|
||||||
Hello World, Interpreted as text (string)
|
Hello World,Interpreted as text (string)
|
||||||
"1234", Interpreted as text instead of a number
|
"1234",Interpreted as text instead of a number
|
||||||
```
|
```
|
||||||
|
|
||||||
## Quoting Strings and Special Characters
|
## Further reading
|
||||||
|
|
||||||
Quoting strings is only required if the string contains the delimiter, special
|
* [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
|
||||||
characters, or otherwise could be interpreted as a number. However, it is
|
* [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
|
||||||
often considered good practice to quote all strings to enhance readability and
|
|
||||||
robustness.
|
|
||||||
|
|
||||||
```csv
|
|
||||||
Quoting strings examples,
|
|
||||||
Unquoted string,
|
|
||||||
"Optionally quoted string (good practice)",
|
|
||||||
"If it contains the delimiter, it needs to be quoted",
|
|
||||||
"Also, if it contains special characters like \n newlines or \t tabs",
|
|
||||||
"The quoting "" character itself typically is escaped by doubling the quote ("")",
|
|
||||||
"or in some systems with a backslash \" (like other escapes)",
|
|
||||||
```
|
|
||||||
|
|
||||||
However, make sure that for one document, the quoting method is consistent.
|
|
||||||
For example, the last two examples of quoting with either "" or \" would
|
|
||||||
not be consistent and could cause problems.
|
|
||||||
|
|
||||||
## Encoding
|
|
||||||
|
|
||||||
Different encodings are used. Most modern CSV files use UTF-8 encoding, but
|
|
||||||
older systems might use others like ASCII or ISO-8859.
|
|
||||||
|
|
||||||
If the file is transferred or shared between different systems, it is a good
|
|
||||||
practice to explicitly define the encoding used, to avoid issues with
|
|
||||||
character misinterpretation.
|
|
||||||
|
|
||||||
## More Resources
|
|
||||||
|
|
||||||
+ [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
|
|
||||||
+ [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
|
|
||||||
|
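The quoting rules that this commit moves into the first example can be checked against a standard parser. Below is a minimal sketch, assuming Python's standard `csv` module. The sample text is abridged from the new `csv.md` example, and `parse` is a hypothetical helper introduced here for illustration, not part of the commit.

```python
import csv
import io

# Abridged from the first example in the new csv.md (comments shortened).
SAMPLE = (
    'Name,Age,DateOfBirth,Comment\n'
    'Eve,,,data might be missing\n'
    '"Louis XIV, King of France",76,1638-09-05,commas need quotes\n'
    '"Walter ""The Danger"" White",52,1958-09-07,quotes are doubled\n'
    'Joe Smith,33,1990-06-02,"multi line strings\nspan multiple lines"\n'
)


def parse(text, delimiter=","):
    """Return every record as a list of strings (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


rows = parse(SAMPLE)
header, records = rows[0], rows[1:]

# Missing values come back as empty strings, not as None.
print(records[0])
# A quoted field keeps its comma; doubled quotes collapse to one.
print(records[1][0])
print(records[2][0])
# A quoted field may span lines; the line break stays in the value.
print(repr(records[3][3]))

# Semicolon-delimited data with a decimal comma, as in the Delimiters
# section. The parser still returns text; converting it is up to the caller.
semi = parse("Name;Age;Grade\nAlice;30;50,50\n", delimiter=";")
grade = float(semi[1][2].replace(",", "."))
print(grade)
```

Every field is returned as a string, which matches the Data Types section: interpreting `76` as a number or `1638-09-05` as a date is left to the software reading the file.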