From d1216a4253c1b03641c10b171030d04227ad8408 Mon Sep 17 00:00:00 2001 From: Sachin Divekar Date: Sun, 26 Jun 2016 18:08:05 +0530 Subject: [PATCH] Add an example of trap command (#1826) * Begin writing document for PCRE Started writing learnxinyminutes document for PCRE to cover general purpose regular expressions. Added introduction and a couple of details. * Change introductory example for regex The old example was incorrect. It's replaced with a simple one. * Add some more introductory text * Add first example * Added more example and a table for proper formatting * Add few more examples * Formatting * Improve example * Edit description of character classes * Add a way to test regex Add https://regex101.com/ web application to test the regex provided in example. * Add example of trap command trap is a very important command to intercept a fatal signal, perform cleanup, and then exit gracefully. It needs an entry in this document. Here a simple and most common example of using trap command i.e. cleanup upon receiving signal is added. * Revert "Add example of trap command" * Add an example of trap command `trap` is a very important command to intercept a fatal signal, perform cleanup, and then exit gracefully. It needs an entry in this document. Here a simple and most common example of using `trap` command i.e. cleanup upon receiving signal is added. --- bash.html.markdown | 3 ++ pcre.html.markdown | 82 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 85 insertions(+) create mode 100644 pcre.html.markdown diff --git a/bash.html.markdown b/bash.html.markdown index 02d7f31e..c2c3e3f1 100644 --- a/bash.html.markdown +++ b/bash.html.markdown @@ -272,6 +272,9 @@ grep -c "^foo.*bar$" file.txt # and not the regex, use fgrep (or grep -F) fgrep "foobar" file.txt +# trap command allows you to execute a command when a signal is received by your script. +# Here trap command will execute rm if any one of the three listed signals is received. +trap "rm $TEMP_FILE; exit" SIGHUP SIGINT SIGTERM # Read Bash shell builtins documentation with the bash 'help' builtin: help diff --git a/pcre.html.markdown b/pcre.html.markdown new file mode 100644 index 00000000..0b61653d --- /dev/null +++ b/pcre.html.markdown @@ -0,0 +1,82 @@ +--- +language: PCRE +filename: pcre.txt +contributors: + - ["Sachin Divekar", "http://github.com/ssd532"] + +--- + +A regular expression (regex or regexp for short) is a special text string for describing a search pattern. e.g. to extract domain name from a string we can say `/^[a-z]+:/` and it will match `http:` from `http://github.com/`. + +PCRE (Perl Compatible Regular Expressions) is a C library implementing regex. It was written in 1997 when Perl was the de-facto choice for complex text processing tasks. The syntax for patterns used in PCRE closely resembles Perl. PCRE syntax is being used in many big projects including PHP, Apache, R to name a few. + + +There are two different sets of metacharacters: +* Those that are recognized anywhere in the pattern except within square brackets +``` + \ general escape character with several uses + ^ assert start of string (or line, in multiline mode) + $ assert end of string (or line, in multiline mode) + . match any character except newline (by default) + [ start character class definition + | start of alternative branch + ( start subpattern + ) end subpattern + ? extends the meaning of ( + also 0 or 1 quantifier + also quantifier minimizer + * 0 or more quantifier + + 1 or more quantifier + also "possessive quantifier" + { start min/max quantifier +``` + +* Those that are recognized within square brackets. Outside square brackets. They are also called as character classes. + +``` + + \ general escape character + ^ negate the class, but only if the first character + - indicates character range + [ POSIX character class (only if followed by POSIX syntax) + ] terminates the character class + +``` + +PCRE provides some generic character types, also called as character classes. +``` + \d any decimal digit + \D any character that is not a decimal digit + \h any horizontal white space character + \H any character that is not a horizontal white space character + \s any white space character + \S any character that is not a white space character + \v any vertical white space character + \V any character that is not a vertical white space character + \w any "word" character + \W any "non-word" character +``` + +## Examples + +We will test our examples on following string `66.249.64.13 - - [18/Sep/2004:11:07:48 +1000] "GET /robots.txt HTTP/1.0" 200 468 "-" "Googlebot/2.1"`. It is a standard Apache access log. + +| Regex | Result | Comment | +| :---- | :-------------- | :------ | +| GET | GET | GET matches the characters GET literally (case sensitive) | +| \d+.\d+.\d+.\d+ | 66.249.64.13 | `\d+` match a digit [0-9] one or more times defined by `+` quantifier, `\.` matches `.` literally | +| (\d+\.){3}\d+ | 66.249.64.13 | `(\d+\.){3}` is trying to match group (`\d+\.`) exactly three times. | +| \[.+\] | [18/Sep/2004:11:07:48 +1000] | `.+` matches any character (except newline), `.` is any character | +| ^\S+ | 66.249.64.13 | `^` means start of the line, `\S+` matches any number of non-space characters | +| \+[0-9]+ | +1000 | `\+` matches the character `+` literally. `[0-9]` character class means single number. Same can be achieved using `\+\d+` | + +All these examples can be tried at https://regex101.com/ + +1. Copy the example string in `TEST STRING` section +2. Copy regex code in `Regular Expression` section +3. The web application will show the matching result + + +## Further Reading + +