Robert Elder Software Inc.
  • Home
  • Store
  • Blog
  • Contact
  • Home
  • Store
  • Blog
  • Contact
  • #linux
  • |
  • #commandline
  • |
  • #softwareengineering
  • |
  • #embeddedsystems
  • |
  • #compilers
  • ...
  • View All >>

How Regular Expression Alternation Works

2020-08-18 - By Robert Elder

     This article is part of Series On Regular Expressions.

What is 'Alternation'?

     Here is an example of a regular expression that uses an alternation.

cat|dog

     This will match match a piece of text that contains either the word 'cat' or the word 'dog'.  You could say that 'alternation' is used whenever there is more than one alternative for the string you want to match.

A Pet Food Example

     Here is a diagram that shows the control flow graph of how a regular expression engine could interpret this regex.  In this diagram, the regular expression matcher starts at the 'BEGIN' node and the moves to the 'SPLIT' node where it effectively splits control into two possible paths.  First it tries to match the path with the word 'cat'.  If this fails, it tries the other path with the word 'dog'.  If neither path has a match, control moves to the 'FAILURE' node and the search will either re-start and continue at next character, or fail completely if it's already reached the end of the text being searched.

     You can also specify more than two possible alternatives by separating them all with the vertical line or 'pipe' symbol:

cat|dog|mouse|snake|bird

     Since alternation can be thought of as a list of alternatives, it's worth comparing it to the character class.  As we saw in the first part of this series, the character class can be thought of as a list of possible characters that can appear at a specific position in our string.  Regardless of what you put inside a character class, the entire class can only match a single character in your search string.  With alternation, each possible alternative on either side of the pipe symbol can be any number of characters.

     Because the alternatives in an alternation can be a variable number of characters, it's common to use parentheses to define the start and end position for the list of alternatives.  For example, this regular expression:

For sale: (cat|dog|mouse|snake|bird) food

     will match any of the following strings:

"For sale: cat food"
"For sale: dog food"
"For sale: mouse food"
"For sale: snake food"
"For sale: bird food"

     However, if we didn't use the parentheses and wrote this regex instead:

For sale: cat|dog|mouse|snake|bird food

     The first and last choices in the alternation would have included the very start and the very end of the regular expression, which isn't what we want here.  The presence of spaces does not break up the alternation since they're treated as just another part of the pattern that will be searched for.  Here is a list of strings that would match this alternation:

"For sale: cat"
"dog"
"mouse"
"snake"
"bird food"

     Any part of a regular expression that's inside parentheses is commonly referred to as a sub-expression or also as a group.  Sub-expressions also have a number of other important features and uses that will not be discussed at this time.

     A very common use case for alternation is to match suffixes or prefixes in words.  For example, this regular expression will match several different variations of the word 'employ'.

employ(er|ee|ment|ing|able)

Watch Out For Prefixes

     At its core, alternation is actually quite simple.  In fact, it's probably correct to say that it's easier to understand alternation that it is to understand character classes.  There is however, one caveat that you should be aware of in order to avoid surprises.  If one of the choices in your alternation is a prefix another, your regex may not always match the same thing in every regex engine.

     For example, consider this alternation:

cat|dog|mouse|snake|bird|caterpillar

     It contains both the word 'cat' and the word 'caterpillar'.  Since 'cat' is prefix of the word 'caterpillar' there can be ambiguity about which alternative to match.

     In most modern regular expression engines, the part of the alternation that will match is simply the first choice that matches, starting on the left and moving towards the right.  This means that the order in which you specify options in the alternation may affect which option will be chosen.  For example, if we pass the text "The cat likes to play with the caterpillar." with the regex shown above through grep using Perl-compatible regular expression mode, it will only match the word 'cat' twice:

echo "The cat likes to play with the caterpillar" | grep -Po 'cat|dog|mouse|snake|bird|caterpillar'

     outputs the following:

cat
cat

     However, if we re-arrange the ordering of the alternation, so that 'caterpillar' is the first option we'll be able to match the word 'cat' and the word 'caterpillar':

echo "The cat likes to play with the caterpillar" | grep -Po 'caterpillar|dog|mouse|snake|bird|cat'

     outputs the following:

cat
caterpillar

POSIX Alternation Gives Different Results

     Now, if we change from using Perl-compatible mode to POSIX-based extended regular expression mode (by changing grep's command-line flag from -P to -E), the result is the same no matter what the ordering is of the choices in the alternation:

echo "The cat likes to play with the caterpillar" | grep -Eo 'cat|dog|mouse|snake|bird|caterpillar'

     outputs the following:

cat
caterpillar
echo "The cat likes to play with the caterpillar" | grep -Eo 'caterpillar|dog|mouse|snake|bird|cat'

     outputs the following:

cat
caterpillar

     This is because POSIX based regular expression engines are required to always try all possible options in an alternation and pick the longest match when searching from the leftmost position in the search text.

     Since all of these details can be confusing and hard to remember, if you simply make sure that the longest possible alternation choices are placed first in the alternation, then you should be able to avoid this ambiguity even when you don't exactly know whether you're using a Perl-like or POSIX-line regular expression engine.  If possible, I would also recommend avoiding POSIX based regular expression engines when you can.  The POSIX regular expression standards are much older and less well-defined than newer Perl based ones.

     This article is part 3 of a Series On Regular Expressions.  You can read Part 4 on quantifiers here.

The Regular Expression Visualizer, Simulator & Cross-Compiler Tool
The Regular Expression Visualizer, Simulator & Cross-Compiler Tool
Published 2020-07-09
Regular Expression Laptop Stickers
$20.00 CAD
Regular Expression Laptop Stickers
Regular Expression Character Escaping
Regular Expression Character Escaping
Published 2020-11-20
How Do Regular Expression Quantifier Work?
How Do Regular Expression Quantifier Work?
Published 2020-08-18
Interesting Regular Expression Test Cases
Interesting Regular Expression Test Cases
Published 2020-07-09
Character Ranges & Class Negation in Regular Expressions
Character Ranges & Class Negation in Regular Expressions
Published 2020-05-31
Guide To Regular Expressions
Guide To Regular Expressions
Published 2020-07-09
Character Classes in Regular Expressions - A Gentle Introduction
Character Classes in Regular Expressions - A Gentle Introduction
Published 2020-05-10
Join My Mailing List
Privacy Policy
Why Bother Subscribing?
  • Free Software/Engineering Content. I publish all of my educational content publicly for free so everybody can make use of it.  Why bother signing up for a paid 'course', when you can just sign up for this email list?
  • Read about cool new products that I'm building. How do I make money? Glad you asked!  You'll get some emails with examples of things that I sell.  You might even get some business ideas of your own :)
  • People actually like this email list. I know that sounds crazy, because who actually subscribes to email lists these days, right?  Well, some do, and if you end up not liking it, I give you permission to unsubscribe and mark it as spam.
© 2025 Robert Elder Software Inc.
SocialSocialSocialSocialSocialSocialSocial
Privacy Policy      Store Policies      Terms of Use