From a44f874f78cea0378bdbcad50c82247f250712c7 Mon Sep 17 00:00:00 2001
From: Timon Erhart <57718207+turbotimon@users.noreply.github.com>
Date: Tue, 3 Dec 2024 15:19:30 +0000
Subject: [PATCH 1/3] finish first version of csv

---
 csv.html.markdown | 76 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)
 create mode 100644 csv.html.markdown

diff --git a/csv.html.markdown b/csv.html.markdown
new file mode 100644
index 00000000..56a4a731
--- /dev/null
+++ b/csv.html.markdown
@@ -0,0 +1,76 @@
+---
+language: CSV
+filename: learncsv.csv
+contributors:
+- [Timon Erhart, 'https://github.com/turbotimon/']
+---
+
+CSV (Comma-Separated Values) is a lightweight file format used to store tabular data in plain text, designed for easy data exchange between programs, particularly spreadsheets and databases. Its simplicity and human readability have made it a cornerstone of data interoperability. It is often used for moving data between programs with incompatible or proprietary formats.
+
+While RFC 4180 provides a standard for the format, in practice, the term "CSV" is often used more broadly to refer to any text file that:
+
+- Can be interpreted as tabular data
+- Uses a delimiter to separate fields (columns)
+- Uses line breaks to separate records (rows)
+- Optionally includes a header in the first row
+
+
+```csv
+Name, Age, DateOfBirth
+Alice, 30, 1993-05-14
+Bob, 25, 1998-11-02
+Charlie, 35, 1988-03-21
+```
+
+**Delimiters for Rows and Columns**
+
+Rows are typically separated by line breaks (\n or \r\n), while columns (fields) are separated by a specific delimiter. Although commas are the most common delimiter for fields, other characters, such as semicolons (;), are commonly used in regions where commas are decimal separators (e.g., Germany). Tabs (\t) are also used as delimiters in some cases, with such files often referred to as "TSV" (Tab-Separated Values).
+
+Example using semicolons as delimiter and comma for decimal separator:
+
+```csv
+Name; Age; Grade
+Alice; 30; 50,50
+Bob; 25; 45,75
+Charlie; 35; 60,00
+```csv
+
+**Data Types**
+
+CSV files do not inherently define data types. Numbers and dates are stored as plain text, and their interpretation depends on the software importing the file. Typically, data is interpreted as follows:
+
+```csv
+Data, Comment
+100, Interpreted as a number (integer)
+100.00, Interpreted as a number (floating-point)
+2024-12-03, Interpreted as a date or a string (depending on the parser)
+Hello World, Interpreted as text (string)
+"1234", Interpreted as text instead of a number
+```csv
+
+**Quoting Strings and Special Characters**
+
+Quoting strings is only required if the string contains the delimiter, special characters, or otherwise could be interpreted as a number. However, it is often considered good practice to quote all strings to enhance readability and robustness.
+
+```csv
+Quoting strings examples,
+Unquoted string,
+"Optionally quoted string (good practice)",
+"If it contains the delimiter, it needs to be quoted",
+"Also, if it contains special characters like \n newlines or \t tabs",
+"The quoting "" character itself typically is escaped by doubling the quote ("")",
+"or in some systems with a backslash \" (like other escapes)",
+```
+
+However, make sure that for one document, the quoting method is consistent. For example, the last two examples of quoting with either "" or \" would not be consistent and could cause problems.
+
+** Encoding **
+
+Most modern CSV files use UTF-8 encoding, but older systems might use others like ASCII or ISO-8859.
+
+If the file is transferred or shared between different systems, it is a good practice to explicitly define the encoding used, to avoid issues with character misinterpretation.
+
+### More Resources
+
++ [Wikipedia](https://en.wikipedia.org/wiki/Comma-separated_values)
++ [RFC 4180](https://datatracker.ietf.org/doc/html/rfc4180)

From 813969f8987eb880f71a54c3826ada8e17a7a0e3 Mon Sep 17 00:00:00 2001
From: Timon Erhart <57718207+turbotimon@users.noreply.github.com>
Date: Tue, 3 Dec 2024 15:36:47 +0000
Subject: [PATCH 2/3] fix format

---
 csv.html.markdown | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/csv.html.markdown b/csv.html.markdown
index 56a4a731..cc74cfb1 100644
--- a/csv.html.markdown
+++ b/csv.html.markdown
@@ -24,7 +24,7 @@ Charlie, 35, 1988-03-21
 
 **Delimiters for Rows and Columns**
 
-Rows are typically separated by line breaks (\n or \r\n), while columns (fields) are separated by a specific delimiter. Although commas are the most common delimiter for fields, other characters, such as semicolons (;), are commonly used in regions where commas are decimal separators (e.g., Germany). Tabs (\t) are also used as delimiters in some cases, with such files often referred to as "TSV" (Tab-Separated Values).
+Rows are typically separated by line breaks (`\n` or `\r\n`), while columns (fields) are separated by a specific delimiter. Although commas are the most common delimiter for fields, other characters, such as semicolons (`;`), are commonly used in regions where commas are decimal separators (e.g., Germany). Tabs (`\t`) are also used as delimiters in some cases, with such files often referred to as "TSV" (Tab-Separated Values).
 
 Example using semicolons as delimiter and comma for decimal separator:
 
@@ -33,7 +33,7 @@ Name; Age; Grade
 Alice; 30; 50,50
 Bob; 25; 45,75
 Charlie; 35; 60,00
-```csv
+```
 
 **Data Types**
 
@@ -46,7 +46,7 @@ Data, Comment
 2024-12-03, Interpreted as a date or a string (depending on the parser)
 Hello World, Interpreted as text (string)
 "1234", Interpreted as text instead of a number
-```csv
+```
 
 **Quoting Strings and Special Characters**
 
@@ -64,9 +64,9 @@ Unquoted string,
 
 However, make sure that for one document, the quoting method is consistent. For example, the last two examples of quoting with either "" or \" would not be consistent and could cause problems.
 
-** Encoding **
+**Encoding**
 
-Most modern CSV files use UTF-8 encoding, but older systems might use others like ASCII or ISO-8859.
+Different encodings are used. Most modern CSV files use UTF-8 encoding, but older systems might use others like ASCII or ISO-8859.
 
 If the file is transferred or shared between different systems, it is a good practice to explicitly define the encoding used, to avoid issues with character misinterpretation.
 

From 6016dcba8c966381d7ac4e115ee36ee83464a467 Mon Sep 17 00:00:00 2001
From: Timon Erhart <57718207+turbotimon@users.noreply.github.com>
Date: Tue, 3 Dec 2024 17:53:42 +0000
Subject: [PATCH 3/3] fix line length

---
 csv.html.markdown | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/csv.html.markdown b/csv.html.markdown
index cc74cfb1..2376c863 100644
--- a/csv.html.markdown
+++ b/csv.html.markdown
@@ -5,16 +5,20 @@ contributors:
 - [Timon Erhart, 'https://github.com/turbotimon/']
 ---
 
-CSV (Comma-Separated Values) is a lightweight file format used to store tabular data in plain text, designed for easy data exchange between programs, particularly spreadsheets and databases. Its simplicity and human readability have made it a cornerstone of data interoperability. It is often used for moving data between programs with incompatible or proprietary formats.
+CSV (Comma-Separated Values) is a lightweight file format used to store tabular
+data in plain text, designed for easy data exchange between programs,
+particularly spreadsheets and databases. Its simplicity and human readability
+have made it a cornerstone of data interoperability. It is often used for
+moving data between programs with incompatible or proprietary formats.
 
-While RFC 4180 provides a standard for the format, in practice, the term "CSV" is often used more broadly to refer to any text file that:
+While RFC 4180 provides a standard for the format, in practice, the term "CSV"
+ is often used more broadly to refer to any text file that:
 
 - Can be interpreted as tabular data
 - Uses a delimiter to separate fields (columns)
 - Uses line breaks to separate records (rows)
 - Optionally includes a header in the first row
 
-
 ```csv
 Name, Age, DateOfBirth
 Alice, 30, 1993-05-14
@@ -24,7 +28,12 @@ Charlie, 35, 1988-03-21
 
 **Delimiters for Rows and Columns**
 
-Rows are typically separated by line breaks (`\n` or `\r\n`), while columns (fields) are separated by a specific delimiter. Although commas are the most common delimiter for fields, other characters, such as semicolons (`;`), are commonly used in regions where commas are decimal separators (e.g., Germany). Tabs (`\t`) are also used as delimiters in some cases, with such files often referred to as "TSV" (Tab-Separated Values).
+Rows are typically separated by line breaks (`\n` or `\r\n`), while columns
+ (fields) are separated by a specific delimiter. Although commas are the most
+ common delimiter for fields, other characters, such as semicolons (`;`), are
+ commonly used in regions where commas are decimal separators (e.g., Germany).
+ Tabs (`\t`) are also used as delimiters in some cases, with such files often
+ referred to as "TSV" (Tab-Separated Values).
 
 Example using semicolons as delimiter and comma for decimal separator:
 
@@ -37,7 +46,9 @@ Charlie; 35; 60,00
 
 **Data Types**
 
-CSV files do not inherently define data types. Numbers and dates are stored as plain text, and their interpretation depends on the software importing the file. Typically, data is interpreted as follows:
+CSV files do not inherently define data types. Numbers and dates are stored as
+ plain text, and their interpretation depends on the software importing the
+ file. Typically, data is interpreted as follows:
 
 ```csv
 Data, Comment
@@ -50,7 +61,10 @@ Hello World, Interpreted as text (string)
 
 **Quoting Strings and Special Characters**
 
-Quoting strings is only required if the string contains the delimiter, special characters, or otherwise could be interpreted as a number. However, it is often considered good practice to quote all strings to enhance readability and robustness.
+Quoting strings is only required if the string contains the delimiter, special
+ characters, or otherwise could be interpreted as a number. However, it is
+ often considered good practice to quote all strings to enhance readability and
+ robustness.
 
 ```csv
 Quoting strings examples,
@@ -62,13 +76,18 @@ Unquoted string,
 "or in some systems with a backslash \" (like other escapes)",
 ```
 
-However, make sure that for one document, the quoting method is consistent. For example, the last two examples of quoting with either "" or \" would not be consistent and could cause problems.
+However, make sure that for one document, the quoting method is consistent.
+ For example, the last two examples of quoting with either "" or \" would
+ not be consistent and could cause problems.
 
 **Encoding**
 
-Different encodings are used. Most modern CSV files use UTF-8 encoding, but older systems might use others like ASCII or ISO-8859.
+Different encodings are used. Most modern CSV files use UTF-8 encoding, but
+ older systems might use others like ASCII or ISO-8859.
 
-If the file is transferred or shared between different systems, it is a good practice to explicitly define the encoding used, to avoid issues with character misinterpretation.
+If the file is transferred or shared between different systems, it is a good
+ practice to explicitly define the encoding used, to avoid issues with
+ character misinterpretation.
 
 ### More Resources
 