Regular Expressions

What is a Regular Expression

Searching with Regular Expressions

  • 4 primary components
    • character classes
      • character set | [ABC]
      • negated set | [^ABC]
      • range | [A-Z] or [a-z]
      • word | \w
      • digit | \d
      • single character | .
        • any character except new line
      • whitespace | \s
    • quantifiers and alternation
      • + | one or more of the previous token
      • * | zero or more of the previous token
      • {min,max} | range
        • like {0,3}
      • ? | optional
      • | | alternation | one token or another
    • groups
      • ( ) | treats the enclosed tokens as one unit and captures what they match
    • anchors
      • specify where the match must start and where it must end
      • ^ | beginning of the string/line (line -- if multi line option is enabled)
      • $ | end of the string/line (line -- if multi line option is enabled)
      • in the date example below
        • the anchors mean that the entire line must match the date
        • if there is other text on the line with the date --> the date will not match
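The components above can be tried out in a few lines. This is a minimal sketch using Python's re module; the sample string and the variable names are made up for illustration, and other regex flavors may differ in details.

```python
import re

text = "cat 7 dogs 42"  # hypothetical sample input

# Character class: \d matches exactly one digit.
digits = re.findall(r"\d", text)
# Quantifier: + repeats the previous token one or more times.
numbers = re.findall(r"\d+", text)
# Alternation: match one token or the other.
pets = re.findall(r"cat|dog", text)
# Optional token: the trailing s may or may not be present.
plurals = re.findall(r"dogs?", text)
# Anchors: ^ and $ force the pattern to cover the entire string.
whole = re.search(r"^\d+$", "42") is not None
partial = re.search(r"^\d+$", "42 dogs") is not None

print(digits)    # ['7', '4', '2']
print(numbers)   # ['7', '42']
print(pets)      # ['cat', 'dog']
print(plurals)   # ['dogs']
print(whole, partial)  # True False
```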
  • examples
    • date
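      • ^[A-Z][a-z]{2,}\s+[0-3]?[0-9],\s+[12]?[0-9]{0,3}$
        • the day uses the range [0-3]; written as [0,3] it would match a literal comma
        • matches a line such as January 15, 2024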
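    • number greater-or-equal 42
      • case 1 -- two digits starting with 4 (42-49) | 4[2-9]
      • case 2 -- other double digits (50-99) | [5-9]\d
      • case 3 -- 3 or more digits | [1-9]\d{2,}
      • ^(4[2-9]|[5-9]\d|[1-9]\d{2,})$
        • the group makes ^ and $ apply to all three alternatives, not only to the first and the last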
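    • monetary value -- faulty
      • escape the dollar sign with \
      • ^\$?\s*[1-9][0-9]{0,2}(,?[0-9]{3})*(\.[0-9]{2})?$
        • the comma is optional in every group, so mixed input such as $1,000000 also matches
    • monetary value -- fixed
      • ^\$?\s*[1-9][0-9]{0,2}((,[0-9]{3})*|([0-9]{3})*)(\.[0-9]{2})?$
        • either every group of three digits has a comma, or none does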
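One way to check patterns like these is to run them against valid and invalid samples. The sketch below uses Python's re module. The sample strings and the names DATE, UNGROUPED, AT_LEAST_42, MONEY_LOOSE, MONEY_STRICT and matches are illustrative. The day range [0-3]?[0-9] and the explicit second alternative ([0-9]{3})* in the strict monetary pattern are assumptions about what the patterns are meant to accept.

```python
import re

# Assumption: the day is written as [0-3]?[0-9].
DATE = re.compile(r"^[A-Z][a-z]{2,}\s+[0-3]?[0-9],\s+[12]?[0-9]{0,3}$")
# Without a group, ^ belongs only to the first alternative and $ to the last.
UNGROUPED = re.compile(r"^4[2-9]|[5-9]\d|[1-9]\d{2,}$")
AT_LEAST_42 = re.compile(r"^(4[2-9]|[5-9]\d|[1-9]\d{2,})$")
# Loose: the comma is optional per group. Strict: all commas or none (assumed intent).
MONEY_LOOSE = re.compile(r"^\$?\s*[1-9][0-9]{0,2}(,?[0-9]{3})*(\.[0-9]{2})?$")
MONEY_STRICT = re.compile(
    r"^\$?\s*[1-9][0-9]{0,2}((,[0-9]{3})*|([0-9]{3})*)(\.[0-9]{2})?$"
)

def matches(pattern, text):
    """Return True when the pattern is found in the text."""
    return pattern.search(text) is not None

print(matches(DATE, "January 15, 2024"))         # True
print(matches(DATE, "Due on January 15, 2024"))  # False: extra text on the line
print(matches(UNGROUPED, "x50x"))                # True: the middle alternative is unanchored
print(matches(AT_LEAST_42, "x50x"))              # False
print([n for n in ["7", "41", "42", "99", "100"] if matches(AT_LEAST_42, n)])
print(matches(MONEY_LOOSE, "$1,000000.00"))      # True: mixed grouping slips through
print(matches(MONEY_STRICT, "$1,000000.00"))     # False
print(matches(MONEY_STRICT, "$1,234,567.89"))    # True
print(matches(MONEY_STRICT, "$1234567.89"))      # True
```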
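  • examples | using with grep
    # won't work properly -- \d is not recognized by grep -E
    grep -E '^4[2-9]|[5-9]\d|[1-9]\d{2,}$' numbers.txt
    # replace \d with its bracket class; the group keeps ^ and $ on every alternative
    grep -E '^(4[2-9]|[5-9][0-9]|[1-9][0-9]{2,})$' numbers.txt
    # or use the POSIX class, which must sit inside a bracket expression
    grep -E '^(4[2-9]|[5-9][[:digit:]]|[1-9][[:digit:]]{2,})$' numbers.txt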

Replacing with Regular Expressions | on RegExr

  • for the whole match | $&
  • for partial matches | capturing groups --> wrap the parts you want to reuse in () and refer to them by number
    • group 1 --> $1
    • group 2 --> $2
    • reordering the three captured parts of a date --> $3-$1-$2
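The same replacements can be sketched in Python, with one syntax difference: Python's re.sub writes \g<0> where RegExr uses $&, and \1, \2, \3 where RegExr uses $1, $2, $3. The pattern and the sample sentence below are hypothetical.

```python
import re

# Hypothetical date in "Month day, year" form, one capturing group per part.
pattern = r"([A-Z][a-z]{2,})\s+(\d{1,2}),\s+(\d{4})"
text = "Released on March 5, 2021."

# Whole match: \g<0> in Python corresponds to $& on RegExr.
wrapped = re.sub(pattern, r"[\g<0>]", text)
# Partial matches: \3, \1, \2 correspond to $3, $1, $2.
reordered = re.sub(pattern, r"\3-\1-\2", text)

print(wrapped)    # Released on [March 5, 2021].
print(reordered)  # Released on 2021-March-5.
```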

Tips on Building Regular Expressions

  • quantifiers are greedy by default -- they match as much as possible
    • add a ? after * or + to make it lazy (match as little as possible)
  • don't build an expression all at once
    • build a piece, then test it --> repeat
    • use multiple, simpler expressions
  • test with valid and invalid data
  • add comments using the x (extended) modifier, which ignores whitespace and allows comments inside the pattern
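Two of these tips are easy to see in code. The sketch below, in Python, compares a greedy and a lazy quantifier, then writes a commented pattern with re.VERBOSE, which is Python's name for the x modifier. The sample HTML string and the simplified monetary pattern are illustrative only.

```python
import re

html = "<b>one</b> and <b>two</b>"  # hypothetical sample input

# Greedy: .* runs to the last </b> on the line.
greedy = re.findall(r"<b>.*</b>", html)
# Lazy: .*? stops at the first </b> it can reach.
lazy = re.findall(r"<b>.*?</b>", html)

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
money = re.compile(r"""
    ^\$?\s*            # optional dollar sign, optional spaces
    [1-9][0-9]{0,2}    # leading group of one to three digits
    (,[0-9]{3})*       # further groups of three digits, each with a comma
    (\.[0-9]{2})?$     # optional cents
""", re.VERBOSE)

print(greedy)  # ['<b>one</b> and <b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
print(money.search("$1,234.50") is not None)  # True
print(money.search("$12,34") is not None)     # False
```

Testing with both a valid and an invalid value, as in the last two lines, follows the tip about checking each piece before moving on.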