Regular expressions
The bane of every programmer's existence.
Intro
- Abbreviated as regex or regexp
- Powerful pattern-matching for specific character combinations in strings
- Used for searching, editing and manipulating strings
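The three uses above can be sketched in a few lines. This is a minimal illustration that assumes Python's standard re module; the sample sentence and variable names are made up for the example.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order 66 shipped on 2024-05-17; order 67 is pending."

# Searching: find the first date-like substring (YYYY-MM-DD).
match = re.search(r"\d{4}-\d{2}-\d{2}", text)
first_date = match.group(0) if match else None

# Editing: replace every run of digits with a placeholder.
masked = re.sub(r"\d+", "#", text)

# Manipulating: split on a semicolon followed by optional whitespace.
parts = re.split(r";\s*", text)

print(first_date)
print(masked)
print(parts)
```

Other languages expose the same three operations under different names, so treat the function names here as Python-specific.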
Quickstart
- Regular expressions consist of both literal characters and metacharacters
  - Literal character: a character that matches itself (eg. a, b, c, 1, 2, 3)
  - Metacharacter: special character with a specific meaning (eg. ., ^, $, *)
Literal characters
- Alphanumeric characters: a, b, c, ... and 1, 2, 3, ...
- Whitespace characters: spaces, tabs, newlines, ... (typed directly or written as escapes such as \t and \n)
- Punctuation and symbols: @, #, %, ... (unless they are already designated metacharacters)
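The difference between a literal and a metacharacter shows up as soon as a symbol such as $ or . appears in the text being searched. A rough sketch, again assuming Python's re module (the price string is invented for the example):

```python
import re

# Hypothetical input containing characters that are also metacharacters.
price = "Total: $5.00 (approx.)"

# Unescaped, "$" is an end-of-string anchor and "." matches any character,
# so this pattern cannot match the literal text "$5.00".
unescaped = re.search(r"$5.00", price)

# A backslash turns each metacharacter into a literal character.
escaped = re.search(r"\$5\.00", price)

# re.escape builds the escaped pattern from an arbitrary literal string.
auto = re.search(re.escape("$5.00"), price)

print(unescaped)         # no match object is returned
print(escaped.group(0))
print(auto.group(0))
```

Using re.escape is usually the safer choice when the literal text comes from user input rather than from the pattern author.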
Metacharacters
- .: matches any character except the newline character
- ^: matches the start of a string
- $: matches the end of a string
- *: matches 0 or more repetitions of the preceding element
- +: matches 1 or more repetitions of the preceding element
- ?: matches 0 or 1 repetition of the preceding element
- {n}: matches exactly n repetitions of the preceding element
- {n,}: matches n or more repetitions of the preceding element
- {n,m}: matches between n and m repetitions of the preceding element
- []: matches any one character within the square brackets
- |: logical OR operator (matches the expression on either side)
- \: escape character that treats the following metacharacter as a literal character
- (): groups multiple tokens together to create a capture group for extracting substrings
- \d: matches any digit (equivalent to [0-9])
- \D: matches any non-digit (equivalent to [^0-9])
- \w: matches any alphanumeric character and the underscore _ (equivalent to [a-zA-Z0-9_])
- \W: matches any character that is not a word character (equivalent to [^a-zA-Z0-9_])
- \s: matches any whitespace character (equivalent to [ \t\n\r\f\v])
- \S: matches any character that is not a whitespace character (equivalent to [^ \t\n\r\f\v])
- \b: matches a position between a word character and a non-word character (word boundary)
- \B: matches a position that is not a word boundary (non-word boundary)
- (?:): groups multiple tokens together without creating a capture group (non-capturing group)
- (?=): asserts that a group of characters can be matched to the right of the current position without including it in the match (positive lookahead)
- (?!): asserts that a group of characters cannot be matched to the right of the current position (negative lookahead)
- (?<=): asserts that a group of characters can be matched to the left of the current position (positive lookbehind)
- (?<!): asserts that a group of characters cannot be matched to the left of the current position (negative lookbehind)
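Several of these metacharacters are easier to grasp when combined. The sketch below assumes Python's re module and an invented log line; it exercises word boundaries, shorthand classes, capturing and non-capturing groups, and two lookarounds.

```python
import re

# Hypothetical log line, used only for illustration.
log = "user_42 paid 100USD, user_7 paid 25EUR"

# \b and \d+ : whole tokens of the form user_<digits>.
users = re.findall(r"\buser_\d+\b", log)

# (...) captures the amount; (?:...) groups the currency without capturing it.
amounts = re.findall(r"(\d+)(?:USD|EUR)", log)

# (?=...) positive lookahead: digits only when followed by "USD".
usd_only = re.findall(r"\d+(?=USD)", log)

# (?<=...) positive lookbehind: digits only when preceded by "paid ".
paid = re.findall(r"(?<=paid )\d+", log)

print(users)
print(amounts)
print(usd_only)
print(paid)
```

Note that re.findall returns only the captured group when the pattern contains exactly one capture group, which is why the currency is wrapped in a non-capturing group here.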
Worked example
# ----- WORKED EXAMPLE -----
hello # this matches the exact string "hello"
h.llo # this matches "hello", "hallo", "hxllo", etc.
^hello # this matches "hello" only if it's at the start of the string (or of a line in multiline mode)
world$ # this matches "world" only if it's at the end of the string (or of a line in multiline mode)
a* # this matches "a", "aa", "aaa", etc., including an empty string
a+ # this matches "a", "aa", "aaa", etc., but not an empty string
a? # this matches "a" or an empty string
a{3} # this matches exactly "aaa"
a{2,4} # this matches "aa", "aaa", or "aaaa"
[abc] # this matches "a", "b", or "c"
[^abc] # this matches any character except "a", "b", or "c"
[a-z] # this matches any lowercase letter
[A-Z] # this matches any uppercase letter
[0-9] # this matches any digit
(ab|cd) # this matches "ab" or "cd"
(grape|apple)s # this matches "grapes" or "apples"
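One way to check the patterns above is to run them against a few sample strings. This is a small sketch assuming Python's re module; the check helper and the sample strings are hypothetical and not part of the original notes.

```python
import re

# (pattern, strings that should fully match, strings that should not)
cases = [
    (r"hello", ["hello"], ["hallo"]),
    (r"h.llo", ["hello", "hallo", "hxllo"], ["hllo"]),
    (r"a*", ["", "a", "aaa"], ["b"]),
    (r"a+", ["a", "aaa"], [""]),
    (r"a?", ["", "a"], ["aa"]),
    (r"a{3}", ["aaa"], ["aa", "aaaa"]),
    (r"a{2,4}", ["aa", "aaa", "aaaa"], ["a", "aaaaa"]),
    (r"[abc]", ["a", "b", "c"], ["d"]),
    (r"[^abc]", ["d", "1"], ["a"]),
    (r"(ab|cd)", ["ab", "cd"], ["ac"]),
    (r"(grape|apple)s", ["grapes", "apples"], ["grape"]),
]

def check(pattern, should_match, should_not_match):
    """Return True if re.fullmatch agrees with both expectation lists."""
    ok = all(re.fullmatch(pattern, s) for s in should_match)
    return ok and not any(re.fullmatch(pattern, s) for s in should_not_match)

results = {pattern: check(pattern, yes, no) for pattern, yes, no in cases}

# Anchors depend on position, so they are checked with re.search instead.
starts = bool(re.search(r"^hello", "hello world"))
not_start = bool(re.search(r"^hello", "say hello"))
ends = bool(re.search(r"world$", "hello world"))

print(results)
print(starts, not_start, ends)
```

re.fullmatch is used so that each pattern has to account for the whole string; with re.search, a pattern like a* would succeed on any input because it can match the empty string.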