Python RegEx Cheatsheet with Examples
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. RegExes are
typically used to find a sequence of characters within a string so you can extract and manipulate it.
For example, the following returns both instances of ‘active’:
import re
pattern = 'ac..ve'
test_string = 'my activestate platform account is now active'
result = re.findall(pattern, test_string)
print(result)  # ['active', 'active']
RegExes are extremely useful, but the syntax can be hard to recall. With that in mind, ActiveState offers
this “cheatsheet” to help point you in the right direction when building RegExes in Python.
Special characters
.           match any char except newline (e.g., ac..ve)
^           match at beginning of string (e.g., ^active)
$           match at end of string (e.g., state$)
[3a-c]      match any char in the set (i.e., 3 or a or b or c)
[^x-z1]     match any char except x, y, z or 1
A|S         match either A or S regex
()          capture & match a group of chars (e.g., (8097ba))
\           escape special characters

Quantifiers
*           match 0 or more occurrences (e.g., py*)
+           match 1 or more occurrences (e.g., py+)
?           match 0 or 1 occurrences (e.g., py?)
{m}         match exactly m occurrences (e.g., py{3})
{m,n}       match from m to n occurrences (e.g., py{1,3})
{,n}        match from 0 to n occurrences (e.g., py{,3})
{m,}        match m to infinite occurrences (e.g., py{3,})
{m,n}?      match m to n occurrences, but as few as possible (e.g., py{1,3}?)

Special sequences
\A          match occurrence only at start of string
\Z          match occurrence only at end of string
\b          match empty string at word boundary (e.g., between \w and \W)
\B          match empty string not at word boundary
\d          match a digit
\D          match a non-digit
\s          match any whitespace char: [ \t\n\r\f\v]
\S          match any non-whitespace char
\w          match any alphanumeric char or underscore: [0-9a-zA-Z_]
\W          match any non-alphanumeric char
\g<id>      refer to a previously captured group (in re.sub replacement strings)
(?:A)       match expression represented by A (non-capture group)
A(?=B)      match expression A only if followed by B
A(?!B)      match expression A only if not followed by B
(?<=B)A     match expression A only if it follows B
(?<!B)A     match expression A only if not preceded by B
(?aiLmsux)  where a, i, L, m, s, u, and x are flags:
            a = match ASCII only
            i = make matches ignore case
            L = make matches locale dependent
            m = multi-line makes ^ and $ match at the beginning/end of each line, respectively
            s = makes '.' match any char including newline
            u = Unicode matching (the default for str patterns)
            x = verbose increases legibility by allowing comments & ignoring most whitespace
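The entries above are easier to remember once you have seen them run. The following is a minimal sketch, not an exhaustive tour: the sample strings and variable names are made up for illustration, and it covers only greedy versus lazy quantifiers, word boundaries, lookarounds, and two of the inline flags.

```python
import re

text = "py pyy pyyy pyyyy"  # hypothetical sample string

# Greedy {1,3} takes as many 'y' characters as it can (up to 3).
greedy = re.findall(r"py{1,3}", text)
print(greedy)  # ['py', 'pyy', 'pyyy', 'pyyy']

# Lazy {1,3}? takes as few 'y' characters as possible (here, one).
lazy = re.findall(r"py{1,3}?", text)
print(lazy)  # ['py', 'py', 'py', 'py']

# \b word boundaries plus {3}: only the standalone word 'pyyy' matches.
exact = re.findall(r"\bpy{3}\b", text)
print(exact)  # ['pyyy']

# Lookahead A(?=B): match 'active' only when followed by 'state'.
lookahead = re.findall(r"active(?=state)", "activestate is active")
print(lookahead)  # ['active']

# Lookbehind (?<=B)A: match digits only when preceded by a literal '$'.
lookbehind = re.findall(r"(?<=\$)\d+", "cost: $42, qty: 7")
print(lookbehind)  # ['42']

# Inline flag (?i): ignore case for the whole pattern.
ignore_case = re.findall(r"(?i)python", "Python PYTHON python")
print(ignore_case)  # ['Python', 'PYTHON', 'python']

# Inline flag (?m): ^ matches at the start of every line.
multiline = re.findall(r"(?m)^\w+", "first line\nsecond line")
print(multiline)  # ['first', 'second']
```

Raw strings (the r"..." prefix) are used so that backslashes such as \b and \d reach the re module unchanged.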
re Module Functions
Besides enabling the above functionality, the ‘re’ module also features a number of popular functions:
re.findall(A, B) match all occurrences of expression A in string B
re.search(A, B) match the first occurrence of expression A in string B
re.split(A, B) split string B into a list using the delimiter A
re.sub(A, B, C) replace A with B in string C
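As a rough illustration of the four functions listed above, the sketch below runs each one against the same string. The log line and variable names are invented for the example; only the re calls themselves come from the standard library.

```python
import re

# Hypothetical log line used only for illustration.
log = "2024-01-15 ERROR disk full; 2024-01-16 WARN cpu high"

# re.findall returns every non-overlapping match as a list of strings.
dates = re.findall(r"\d{4}-\d{2}-\d{2}", log)
print(dates)  # ['2024-01-15', '2024-01-16']

# re.search returns a match object for the first match, or None if there is none.
first = re.search(r"(ERROR|WARN) (\w+)", log)
level = first.group(1) if first else None
print(level)  # ERROR

# re.split cuts the string wherever the delimiter pattern matches.
parts = re.split(r";\s*", log)
print(parts)  # ['2024-01-15 ERROR disk full', '2024-01-16 WARN cpu high']

# re.sub replaces each match; \g<n> in the replacement refers to captured group n.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\g<3>/\g<2>/\g<1>", log)
print(swapped)  # 15/01/2024 ERROR disk full; 16/01/2024 WARN cpu high
```

Because re.search can return None, checking the result before calling .group() avoids an AttributeError when nothing matches.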
Need more help?
Read the documentation here: https://docs.python.org/3/library/re.html
www.activestate.com