Table Of Contents
- Regular Expressions
Regular Expressions
What is a Regular Expression
- recommended site: https://regexr.com
Searching with Regular Expressions
- 4 primary components
  - character classes
    - character set | [ABC]
    - negated set | [^ABC]
    - range | [A-Z] or [a-z]
    - word | \w
    - digit | \d
    - single character | .
      - any character except a newline
    - whitespace | \s
  - quantifiers and alternation
    - + | one or more of the previous token
    - * | zero or more of the previous token
    - {min,max} | range
      - like {0,3}
    - ? | optional
    - | | alternation, one token or another
  - groups
  - anchors
    - let us specify where the match must start and where it must end
    - ^ | beginning of the string, or of each line if the multiline option is enabled
    - $ | end of the string, or of each line if the multiline option is enabled
    - in the date example, ^ and $ mean that the entire line must match the date
      - if there is other text on the line with the date --> the date will not match
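The components above can be tried out in a few lines of Python. This is only a sketch: the sample strings and variable names are made up for illustration, and the date pattern is a simplified variant of the one in these notes (it assumes a four-digit year).

```python
import re

text = "Order 66 shipped on May 4"

# Character classes
digits = re.findall(r"\d", text)                      # each single digit
capitals = re.findall(r"[A-Z]", text)                 # range: uppercase letters
punctuation = re.findall(r"[^\w\s]", "a-b c!")        # negated set: not word, not space

# Quantifiers and alternation
numbers = re.findall(r"\d+", text)                    # + : one or more digits per match
spellings = re.findall(r"colou?r", "color colour")    # ? : the u is optional
pets = re.findall(r"cat|dog", "cat, dog, bird")       # | : one token or another

# Anchors: the whole string has to be the date
date = re.compile(r"^[A-Z][a-z]{2,}\s+[0-3]?[0-9],\s+[12][0-9]{3}$")
whole_line = date.search("May 4, 2021") is not None
with_extra_text = date.search("Shipped May 4, 2021") is not None

# With re.MULTILINE, ^ and $ match at the start and end of every line
single_word_lines = re.findall(r"^\w+$", "one\ntwo words\nthree", flags=re.MULTILINE)
```

Here `whole_line` is true and `with_extra_text` is false, which is the anchor behaviour described above: extra text on the line stops the date from matching.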
- examples
  - date
    ^[A-Z][a-z]{2,}\s+[0-3]?[0-9],\s+[12]?[0-9]{0,3}$
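  - number greater than or equal to 42
    - case 1 -- two-digit numbers that start with 4 (42-49)
    - case 2 -- the other two-digit numbers (50-99)
    - case 3 -- 3 or more digits
    - the group makes ^ and $ apply to every alternative, not only the first and the last
    ^(4[2-9]|[5-9]\d|[1-9]\d{2,})$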
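  - monetary value -- faulty
    - escape the dollar sign with \
    - the optional comma (,?) lets grouped and ungrouped digits mix, so 1,234567 matches
    ^\$?\s*[1-9][0-9]{0,2}(,?[0-9]{3})*(\.[0-9]{2})?$
  - monetary value -- fixed
    - the comma is now required before every group of three digits, so 1,234567 no longer matches (and neither does 1234)
    ^\$?\s*[1-9][0-9]{0,2}((,[0-9]{3})*|)(\.[0-9]{2})?$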
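A quick way to check the number and monetary examples is to run them against valid and invalid inputs. The sketch below uses Python's `re` module; the test strings and variable names are illustrative assumptions, not part of the original notes.

```python
import re

# Number >= 42. Without the group, ^ binds only to the first
# alternative and $ only to the last one.
ungrouped = re.compile(r"^4[2-9]|[5-9]\d|[1-9]\d{2,}$")
grouped = re.compile(r"^(4[2-9]|[5-9]\d|[1-9]\d{2,})$")

accepted = [s for s in ["7", "41", "42", "99", "100", "041"] if grouped.search(s)]

# "id-55-a" is not a number, but the unanchored middle alternative finds "55"
ungrouped_hit = ungrouped.search("id-55-a") is not None
grouped_hit = grouped.search("id-55-a") is not None

# Monetary value: the faulty and the fixed pattern from the notes
faulty = re.compile(r"^\$?\s*[1-9][0-9]{0,2}(,?[0-9]{3})*(\.[0-9]{2})?$")
fixed = re.compile(r"^\$?\s*[1-9][0-9]{0,2}((,[0-9]{3})*|)(\.[0-9]{2})?$")

samples = ["$1,234.56", "$ 12.50", "$1,234567", "$1234", "$0.50"]
faulty_ok = [s for s in samples if faulty.search(s)]
fixed_ok = [s for s in samples if fixed.search(s)]
```

Comparing `faulty_ok` with `fixed_ok` shows the trade-off: the fixed pattern rejects the mixed form `$1,234567`, but it also rejects `$1234`, because every group of three digits now needs its comma.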
- examples | using with grep
  # won't work properly -- \d is not recognized by grep -E
  grep -E '^(4[2-9]|[5-9]\d|[1-9]\d{2,})$' numbers.txt
  # replace \d with its character class
  grep -E '^(4[2-9]|[5-9][0-9]|[1-9][0-9]{2,})$' numbers.txt
  # or use the POSIX class, which needs double brackets
  grep -E '^(4[2-9]|[5-9][[:digit:]]|[1-9][[:digit:]]{2,})$' numbers.txt
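For comparison, the same line filter can be sketched in Python, where `\d` is understood. `grep_lines` is a hypothetical helper written for this note, and the `sample` list stands in for the contents of `numbers.txt`, which are not shown in the notes.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that contain a match, roughly like `grep -E pattern`."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Made-up stand-in for numbers.txt, one value per line
sample = ["7", "41", "42", "99", "100", "abc"]

# Python's re accepts the \d shorthand...
with_shorthand = grep_lines(r"^(4[2-9]|[5-9]\d|[1-9]\d{2,})$", sample)
# ...and the explicit class works in both Python and grep -E
with_class = grep_lines(r"^(4[2-9]|[5-9][0-9]|[1-9][0-9]{2,})$", sample)
```

Python's `re` does not support POSIX classes such as `[[:digit:]]`, so the explicit `[0-9]` form is the one that carries over between the two tools.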
Replacing with Regular Expressions | on regexr
- for the whole match | $&
- for partial matches | capturing groups --> wrap the parts you want to reuse in () and refer to them later by number
  - var 1 --> $1
  - var 2 --> $2
  - rearranging the captured parts of a date --> $3-$1-$2
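The same replacements can be sketched in Python, with one caveat: Python's `re.sub` writes group references as `\1`, `\2`, `\3` and the whole match as `\g<0>`, where regexr uses `$1`, `$2`, `$3` and `$&`. The date pattern and sample text below are assumptions made for illustration, with the groups capturing month, day, and year in that order.

```python
import re

# Three capturing groups: (month) (day), (year)
date = re.compile(r"([A-Z][a-z]{2,})\s+([0-3]?[0-9]),\s+([12][0-9]{3})")
text = "Released on May 4, 2021."

# \g<0> plays the role of $& : the whole match
bracketed = date.sub(r"[\g<0>]", text)

# \3-\1-\2 plays the role of $3-$1-$2 : year-month-day
reordered = date.sub(r"\3-\1-\2", text)
```

Only the matched date is rewritten; the surrounding text is left as it was.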
Tips on Building Regular Expressions
- regular expressions are greedy
  - add a ? after * or + to make it lazy
- don't build an expression all at once
  - build a piece, then test it --> repeat
- use multiple, simpler expressions
- test with valid and invalid data
- add comments using the x modifier
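Two of these tips are easy to see in code. The sketch below, in Python, contrasts a greedy and a lazy quantifier and then uses `re.VERBOSE` (Python's name for the x modifier) to comment a pattern piece by piece. The sample strings are made up, and the commented pattern is a simplified take on the monetary example.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ runs to the last > in the string, giving one long match
greedy = re.findall(r"<.+>", html)
# Lazy: .+? stops at the first > it can, giving one match per tag
lazy = re.findall(r"<.+?>", html)

# re.VERBOSE ignores whitespace in the pattern and allows # comments
money = re.compile(r"""
    ^\$?\s*            # optional dollar sign, then optional spaces
    [1-9][0-9]{0,2}    # one to three leading digits, no leading zero
    (,[0-9]{3})*       # further groups of three digits, each after a comma
    (\.[0-9]{2})?      # optional cents
    $
""", re.VERBOSE)

valid = money.search("$1,234.56") is not None
invalid = money.search("$1,234567") is not None
```

Testing with both a valid and an invalid value, as the tips suggest, gives `valid` true and `invalid` false.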