In this tutorial, I will go over two functions, re.findall() and re.search(), that I find useful for searching through text.

1. Create your regular expression pattern

  • I usually use regex101.com to check if my regular expression pattern matches the text I want

  • Some common tokens I use

    • . : any single character (except a newline, by default)
    • \s : any whitespace character
    • \d : any digit
    • \b : a word boundary
    • \w : any word character (letter, digit, or underscore)
    • a|b : either a or b
    • (...) : capture the ... part as a group
    • a? : zero or one of a
    • a* : zero or more of a
    • a+ : one or more of a
    • a{8} : exactly 8 of a
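
To see a few of these tokens in action, here is a small sketch. The sample strings and variable names are made up for illustration and are not part of the original tutorial; the only library call used is re.findall() from the standard library.

```python
import re

# Hypothetical sample sentence, chosen only to exercise the tokens above.
sample = "Order 66 shipped on 2021-03-04 to cat or dog owners"

# \d+ : one or more digits in a row
numbers = re.findall(r"\d+", sample)

# \b\d{4}\b : exactly four digits, bounded by word boundaries
years = re.findall(r"\b\d{4}\b", sample)

# cat|dog : either alternative
pets = re.findall(r"cat|dog", sample)

# colou?r : the "u" may appear zero or one time
spellings = re.findall(r"colou?r", "color and colour")

# ab* : an "a" followed by zero or more "b"
runs = re.findall(r"ab*", "a ab abb")

print(numbers)    # ['66', '2021', '03', '04']
print(years)      # ['2021']
print(pets)       # ['cat', 'dog']
print(spellings)  # ['color', 'colour']
print(runs)       # ['a', 'ab', 'abb']
```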

2. Use re.findall() and re.search() to find your matches

  • re is a module in the Python standard library that provides regular expression matching operations
  • Two of the functions I use most often are re.findall() and re.search()

import re



re.findall()

  • I use this function when I need to find every match in the text
  • If the pattern contains multiple groups, it returns a list of tuples, each holding the strings matched by the groups

Example:

text = 'The result depends on the number of capturing groups in the pattern. If there are no groups, return a list of strings matching the whole pattern.'
matches = re.findall(r'\b(\w+)\b\s(groups)', text)
matches
[('capturing', 'groups'), ('no', 'groups')]
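
The shape of the result depends on how many capturing groups the pattern has. A minimal sketch of the three cases, using a shortened sample string of my own rather than the text above:

```python
import re

# Hypothetical short sample, used only for this comparison.
sample = "no groups, capturing groups"

# No groups: a list of strings, each the whole match.
no_group = re.findall(r"\w+\sgroups", sample)

# One group: a list of strings, each the text captured by that group.
one_group = re.findall(r"(\w+)\sgroups", sample)

# Two groups: a list of tuples, one string per group.
two_groups = re.findall(r"(\w+)\s(groups)", sample)

print(no_group)    # ['no groups', 'capturing groups']
print(one_group)   # ['no', 'capturing']
print(two_groups)  # [('no', 'groups'), ('capturing', 'groups')]
```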



re.search()

  • This function returns a Match object for the first match found in the text, or None if there is no match
  • I use this function when I only need to check whether the text contains a match

Example:

match = re.search(r'\b(\w+)\b\s(groups)', text)
match
<re.Match object; span=(36, 52), match='capturing groups'>

If you want to check the matching groups of re.search(), you can do:

match.group(0), match.group(1), match.group(2)
('capturing groups', 'capturing', 'groups')
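
Because re.search() returns None when nothing matches, calling .group() on the result without checking first raises an AttributeError. One way to guard against that is sketched below; the helper name word_before and its arguments are my own invention for illustration, not something from the re module.

```python
import re

def word_before(text, word):
    """Return the word that directly precedes `word` in `text`, or None.

    Hypothetical helper for illustration only.
    """
    # re.escape() keeps any special characters in `word` literal.
    pattern = r"\b(\w+)\s" + re.escape(word) + r"\b"
    match = re.search(pattern, text)
    if match is None:
        # No match: return None instead of calling .group() on None.
        return None
    return match.group(1)

found = word_before("If there are no groups, return a list", "groups")
missing = word_before("nothing to see here", "groups")

print(found)    # no
print(missing)  # None
```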



If you're interested in more background reading on regular expressions, here's an excellent chapter by Dan Jurafsky and James H. Martin.

Reference