mirror of https://github.com/adambard/learnxinyminutes-docs.git (synced 2025-04-26 07:03:57 +00:00)
[csv] shorten and move quote rules to first example
parent 49be924382
commit 7b79257f21
112
csv.md
@@ -1,94 +1,62 @@
---
language: CSV
name: CSV
contributors:
- [Timon Erhart, 'https://github.com/turbotimon/']
- [Timon Erhart, 'https://github.com/turbotimon/']
---

CSV (Comma-Separated Values) is a lightweight file format used to store tabular
data in plain text, designed for easy data exchange between programs,
particularly spreadsheets and databases. Its simplicity and human readability
have made it a cornerstone of data interoperability. It is often used for
moving data between programs with incompatible or proprietary formats.

While RFC 4180 provides a standard for the format, in practice, the term "CSV"
is often used more broadly to refer to any text file that:

- Can be interpreted as tabular data
- Uses a delimiter to separate fields (columns)
- Uses line breaks to separate records (rows)
- Optionally includes a header in the first row
CSV (Comma-Separated Values) is a file format used to store tabular
data in plain text.

```csv
Name, Age, DateOfBirth
Alice, 30, 1993-05-14
Bob, 25, 1998-11-02
Charlie, 35, 1988-03-21
Name,Age,DateOfBirth,Comment
Alice,30,1993-05-14,
Bob,25,1998-11-02,
Eve,,,data might be missing because it's just text
"Charlie Brown",35,1988-03-21,strings can be quoted
"Louis XIV, King of France",76,1638-09-05,strings containing commas must be quoted
"Walter ""The Danger"" White",52,1958-09-07,quotes are escaped by doubling them up
Joe Smith,33,1990-06-02,"multi line strings
span multiple lines
there are no escape characters"
```
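
The quoting rules shown in the example rows above can be checked with a standard CSV parser. The following is a minimal sketch using Python's built-in `csv` module; the choice of Python, the variable names, and the shortened comment fields are assumptions for illustration and are not part of the commit.

```python
import csv
import io

# Rows adapted from the example above (comment fields shortened).
# io.StringIO stands in for a real file opened with newline="".
text = (
    'Name,Age,DateOfBirth,Comment\n'
    'Eve,,,data might be missing\n'
    '"Louis XIV, King of France",76,1638-09-05,commas need quotes\n'
    '"Walter ""The Danger"" White",52,1958-09-07,doubled quotes\n'
    'Joe Smith,33,1990-06-02,"multi line strings\nspan multiple lines"\n'
)

rows = list(csv.reader(io.StringIO(text, newline="")))

for row in rows:
    print(row)
```

Under these assumptions the parser returns five records: missing values come back as empty strings, the quoted comma stays inside the name field, the doubled quotes collapse to single quotes, and the quoted line break is kept inside one field instead of starting a new record.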

## Delimiters for Rows and Columns
The first row might be a header of field names or there might be no header and
the first line is already data.

Rows are typically separated by line breaks (`\n` or `\r\n`), while columns
(fields) are separated by a specific delimiter. Although commas are the most
common delimiter for fields, other characters, such as semicolons (`;`), are
commonly used in regions where commas are decimal separators (e.g., Germany).
Tabs (`\t`) are also used as delimiters in some cases, with such files often
referred to as "TSV" (Tab-Separated Values).
## Delimiters

Example using semicolons as delimiter and comma for decimal separator:
Rows are separated by line breaks (`\n` or `\r\n`), columns are separated by a comma.

Tabs (`\t`) are sometimes used instead of commas and those files are called "TSVs"
(Tab-Separated Values). They are easier to paste into Excel.

Occasionally other characters can be used, for example semicolons (`;`) may be used
in Europe because commas are [decimal separators](https://en.wikipedia.org/wiki/Decimal_separator)
instead of the decimal point.

```csv
Name; Age; Grade
Alice; 30; 50,50
Bob; 25; 45,75
Charlie; 35; 60,00
Name;Age;Grade
Alice;30;50,50
Bob;25;45,75
Charlie;35;60,00
```
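
A semicolon-delimited file with decimal commas needs two adjustments when it is read: the delimiter has to be passed to the parser, and the decimal comma has to be converted by the caller. The sketch below assumes Python's `csv` module; the `grades` dictionary and the `replace(",", ".")` conversion are illustrative choices, not something the commit prescribes.

```python
import csv
import io

# The semicolon-delimited rows from the example above.
text = "Name;Age;Grade\nAlice;30;50,50\nBob;25;45,75\nCharlie;35;60,00\n"

# delimiter=";" tells the parser that commas are ordinary characters here.
reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=";")

# The parser returns "50,50" as text; swapping the decimal comma for a
# point is a separate, explicit step before calling float().
grades = {row["Name"]: float(row["Grade"].replace(",", ".")) for row in reader}

print(grades)
```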

## Data Types

CSV files do not inherently define data types. Numbers and dates are stored as
plain text, and their interpretation depends on the software importing the
file. Typically, data is interpreted as follows:
text. Interpreting and parsing them is left up to software using them.
Typically, data is interpreted as follows:

```csv
Data, Comment
100, Interpreted as a number (integer)
100.00, Interpreted as a number (floating-point)
2024-12-03, Interpreted as a date or a string (depending on the parser)
Hello World, Interpreted as text (string)
"1234", Interpreted as text instead of a number
Data,Comment
100,Interpreted as a number (integer)
100.00,Interpreted as a number (floating-point)
2024-12-03,Interpreted as a date or a string (depending on the parser)
Hello World,Interpreted as text (string)
"1234",Interpreted as text instead of a number
```
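
How the values in the table above are interpreted depends entirely on the consuming software. As one concrete case, Python's `csv` module does no type inference by default, so conversion is an explicit step. The sketch below is an illustration under that assumption; the variable names and the single-line sample input are hypothetical.

```python
import csv
import io
from datetime import date

line = '100,100.00,2024-12-03,Hello World,"1234"\n'

# The default reader returns every field as a str; the quotes around
# 1234 are consumed by the parser and carry no type information.
fields = next(csv.reader(io.StringIO(line)))

# Any conversion is chosen by the consumer, field by field.
as_int = int(fields[0])
as_float = float(fields[1])
as_date = date.fromisoformat(fields[2])

# With QUOTE_NONNUMERIC the reader converts unquoted fields to float and
# leaves quoted fields as str, so every unquoted field must be numeric.
mixed = next(
    csv.reader(io.StringIO('100,100.00,"1234"\n'), quoting=csv.QUOTE_NONNUMERIC)
)

print(fields, as_int, as_float, as_date, mixed)
```

Other tools, such as spreadsheets, may guess types automatically, which is why the same file can be read differently by different programs.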

## Quoting Strings and Special Characters
## Further reading

Quoting strings is only required if the string contains the delimiter, special
characters, or otherwise could be interpreted as a number. However, it is
often considered good practice to quote all strings to enhance readability and
robustness.

```csv
Quoting strings examples,
Unquoted string,
"Optionally quoted string (good practice)",
"If it contains the delimiter, it needs to be quoted",
"Also, if it contains special characters like \n newlines or \t tabs",
"The quoting "" character itself typically is escaped by doubling the quote ("")",
"or in some systems with a backslash \" (like other escapes)",
```
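
The two escaping conventions named in the example above (doubled quotes and backslash escapes) require different parser settings, which is one reason a single file should not mix them. A minimal sketch with Python's `csv` module follows; the sample strings are hypothetical, and the `doublequote` and `escapechar` settings are that module's options, not something the original text specifies.

```python
import csv
import io

# Same logical record, written with each escaping convention.
doubled = '"Walter ""The Danger"" White",52\n'
backslash = '"Walter \\"The Danger\\" White",52\n'

# Default dialect: a doubled quote inside a quoted field is one literal quote.
row_a = next(csv.reader(io.StringIO(doubled)))

# Backslash convention: turn off quote doubling and name the escape character.
row_b = next(
    csv.reader(io.StringIO(backslash), doublequote=False, escapechar="\\")
)

print(row_a, row_b)
```

Both calls yield the same record, but only because each input is read with the settings that match how it was written.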

However, make sure that for one document, the quoting method is consistent.
For example, the last two examples of quoting with either "" or \" would
not be consistent and could cause problems.

## Encoding

Different encodings are used. Most modern CSV files use UTF-8 encoding, but
older systems might use others like ASCII or ISO-8859.

If the file is transferred or shared between different systems, it is a good
practice to explicitly define the encoding used, to avoid issues with
character misinterpretation.

## More Resources

+ [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
+ [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)
* [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
* [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)