Robert Elder Software Inc.

Writing A Grep Clone In 9 Lines Of Python

2020-11-02 - By Robert Elder

Introduction

     The purpose of this article is to show how you can build a basic 'grep' clone in 9 or fewer lines of Python source code.  This simple clone of grep won't contain nearly as many features as the real version, and its output may differ in a few ways that are documented below.  The goal here is simply to provide insight into how grep works rather than to re-create a fully functional replacement.

Simple 'grep -P' Clone In 8 Lines Of Python

     The following 8 lines of Python code show how you could write a simple clone of the 'grep' command that is close to the behaviour of grep when run with the '-P' flag:

#  Put this in a file called '1.py'
import sys
import re

for line in sys.stdin:
	regex_pattern = sys.argv[1]
	pattern = re.compile(regex_pattern)
	if pattern.search(line):
		sys.stdout.write(line)

     The code above reads data from stdin one line at a time and performs a regex search on each line.  The 'sys.argv[1]' part references the first argument that is passed to the Python script, which is then treated as a regular expression.  Here is an example of how you could use this Python script to search a file for a regex pattern like '[A-Z]N0':

cat example_data.csv | python 1.py "[A-Z]N0"

     Which should output the following:

Sneakers, MN009, 49.99, 1.11
Shirt, MN089, 8.99, 1.44
Sneakers, KN09, 49.99, 1.11
Shoes, BN009, 449.22, 4.31

     Which is the same output that we get from using grep:

cat example_data.csv | grep -P "[A-Z]N0"

Simple 'grep -Po' Clone In 9 Lines Of Python

     The 'grep' command also has a feature that lets you extract only the matched part of the text.  The following 9 lines of Python code show how you could write a simple clone of the 'grep' command that is close to the behaviour of grep when run with the '-P' and '-o' flags together ('-Po'):

#  Put this in a file called '2.py'
import sys
import re

for line in sys.stdin:
	regex_pattern = "(" + sys.argv[1] + ")"
	pattern = re.compile(regex_pattern)
	m = pattern.search(line)
	if m:
		sys.stdout.write(m.group(1) + "\n")

     The code above works similarly to the first example, except this time the regex is enclosed in parentheses.  The parentheses create a capture group that lets us extract whatever text matched the regex using 'm.group(1)'.  Here is an example of how you could use this Python script to extract all strings that match the regex pattern '[A-Z]N0[^,]*':

cat example_data.csv | python 2.py "[A-Z]N0[^,]*"

     Which should output the following:

MN009
MN089
KN09
BN009

     Which is the same output that we get from using grep:

cat example_data.csv | grep -Po "[A-Z]N0[^,]*"
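
     One difference does not show up with this data: grep's '-o' flag prints every non-overlapping, non-empty match on a line, each on its own output line, while '2.py' prints only the first match per line.  The following is a minimal sketch of how that behaviour could be approximated with 'finditer'.  The function name 'only_matching' and the argument-count guard are illustrative assumptions, not part of the original scripts:

```python
import re
import sys


def only_matching(regex_pattern, lines):
    """Yield every non-empty, non-overlapping match of the pattern, line by line."""
    pattern = re.compile(regex_pattern)
    for line in lines:
        # Drop the trailing newline so a pattern like '[^,]*' cannot swallow it.
        for m in pattern.finditer(line.rstrip("\n")):
            # group(0) is the whole match, so no extra parentheses are needed.
            if m.group(0):
                yield m.group(0)


# Only act as a command-line filter when a pattern argument was supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    for match_text in only_matching(sys.argv[1], sys.stdin):
        sys.stdout.write(match_text + "\n")
```

     Using 'm.group(0)' instead of wrapping the pattern in parentheses also leaves the numbering of any capture groups inside the user's pattern unchanged.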

Caveats

     In the examples above, grep was used with the '-P' flag, which causes grep to use 'Perl-Compatible Regular Expressions' (PCRE).  PCRE syntax is slightly different from 'Basic Regular Expressions' (the default regex mode used by grep) and 'Extended Regular Expressions' (used when grep is run with the -E flag).  Also, not all versions of grep support the -P flag, and even when it is supported, there may be slight differences between grep's implementation and Python's implementation.
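
     As one concrete illustration of such a difference, PCRE supports the '\K' escape (which resets the start of the reported match), while Python's 're' module rejects it as an unknown escape.  The helper below is a small sketch for checking whether a given pattern is accepted by Python.  The name 'compiles_in_python' is hypothetical:

```python
import re


def compiles_in_python(regex_pattern):
    """Return True if Python's 're' module accepts the pattern."""
    try:
        re.compile(regex_pattern)
    except re.error:
        return False
    return True


# Accepted by Python's 're' module.
print(compiles_in_python(r"[A-Z]N0"))
# '\K' is a PCRE feature that Python's 're' module does not support.
print(compiles_in_python(r"[A-Z]\KN0"))
```

     A pattern that fails this check may still work with 'grep -P', so it would need to be rewritten before being passed to the scripts in this article.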

     Both of the examples above call 're.compile' once for every line of input, which isn't necessary.  Python's 're' module caches recently compiled patterns, so the cost is smaller than it first appears, but the repeated call is still redundant work.  This could easily be fixed by moving the regex compilation outside the for loop, but I've kept it the same for consistency with the video.
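
     A minimal sketch of that change, with the pattern compiled once before the loop.  The function name 'grep_lines' and the argument-count guard are illustrative assumptions, not part of the original script:

```python
import re
import sys


def grep_lines(regex_pattern, lines):
    """Yield each line that contains a match for the pattern."""
    # Compile once, before the loop, instead of once per line.
    pattern = re.compile(regex_pattern)
    for line in lines:
        if pattern.search(line):
            yield line


# Only act as a command-line filter when a pattern argument was supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    for matching_line in grep_lines(sys.argv[1], sys.stdin):
        sys.stdout.write(matching_line)
```

     Saved under a name of your choice, it is invoked the same way as '1.py', with the pattern as the first argument and the data on stdin.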

     Another difference is that grep doesn't always just read from stdin.  You can also use one or more files as arguments to grep.  The example scripts shown in this article don't support this feature (but it wouldn't be that hard to add).
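
     One possible way to add file arguments is the standard library's 'fileinput' module, which reads the named files in order and falls back to stdin when the list of files is empty.  This is only a sketch: the function name 'grep_files' is an assumption, and unlike real grep it does not prefix each output line with the file name when several files are given:

```python
import fileinput
import re
import sys


def grep_files(regex_pattern, filenames):
    """Yield matching lines from the named files, or from stdin if none are given."""
    pattern = re.compile(regex_pattern)
    # An empty list of file names makes fileinput read from stdin instead.
    with fileinput.input(files=filenames) as lines:
        for line in lines:
            if pattern.search(line):
                yield line


# Only act as a command-line tool when a pattern argument was supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Every argument after the pattern is treated as a file name.
    for matching_line in grep_files(sys.argv[1], sys.argv[2:]):
        sys.stdout.write(matching_line)
```

     With this version, the pattern is still the first argument, and any further arguments are treated as files to search.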

     Finally, grep supports many other flags and features (too many to list here) and none of these cases are covered by the examples in this article.

References

     Here is the 'example_data.csv' used in the examples above:

item, modelnumber, price, tax
Sneakers, MN009, 49.99, 1.11
Sneakers, MTG09, 139.99, 4.11
Shirt, MN089, 8.99, 1.44
Pants, N09, 39.99, 1.11
Sneakers, KN09, 49.99, 1.11
Shoes, BN009, 449.22, 4.31
Sneakers, dN099, 9.99, 1.22
Bananas, GG009, 4.99, 1.11
© 2025 Robert Elder Software Inc.