Use The 'head' Command To Get Ahead In Business
2021-04-08 - By Robert Elder
Introduction
This article is about how to use the 'head' command. Here is an overview of how you can use the 'head' and 'tail' commands to output different parts of a file:
Head Command | Tail Command
Output the first 3 lines of the file/stream. For the 13-line 'business-tips.txt', lines 1 to 3 are printed. | Output the last 3 lines of the file/stream. For 'business-tips.txt', lines 11 to 13 are printed.
Output the 'head' or beginning of the file/stream with the last 3 lines omitted. For 'business-tips.txt', lines 1 to 10 are printed. | Output the 'tail' end of the file/stream, starting with the 3rd line and continuing until the end. For 'business-tips.txt', lines 3 to 13 are printed.
The table above shows the different ways that you can use the 'head' and 'tail' commands to output the beginning and end of a file. You can get different results by passing a positive or negative number to these commands, although not all combinations are POSIX compliant, so they may not be supported in every implementation of head/tail. The full contents of 'business-tips.txt' are listed in the next section.
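As a rough sketch, the four cases in the table correspond to the invocations below. This assumes GNU versions of 'head' and 'tail' (the negative count passed to 'head' is not POSIX), and it uses a throwaway 13-line file of numbers as a stand-in for 'business-tips.txt':

```shell
# A 13-line stand-in for business-tips.txt (hypothetical sample data).
sample=$(mktemp)
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 13 > "$sample"

first_three=$(head -n 3 "$sample")      # first 3 lines
last_three=$(tail -n 3 "$sample")       # last 3 lines
all_but_last=$(head -n -3 "$sample")    # GNU only: omit the last 3 lines
from_third=$(tail -n +3 "$sample")      # line 3 through the end

# Print each result on one line for easy comparison.
for result in "$first_three" "$last_three" "$all_but_last" "$from_third"; do
    printf '%s\n' "$result" | tr '\n' ' '
    echo
done

rm -f "$sample"
```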
The Simplest Use Of The 'head' Command
You can use the 'head' command to print out the 'head', or beginning of a file or stream. If we start with the following file 'business-tips.txt':
1) Buy low, sell high.
2) Invest with a plan.
3) Follow your heart.
4) Find your passion.
5) Believe in yourself.
6) Take risks, be bold.
7) Play it safe, and be considerate.
8) Realize that there are no limits.
9) Realize that you have limits.
10) You can rest when you're dead.
11) Take time to care for yourself.
12) Ignore what negative people say.
13) Listen to constructive criticism.
and run the head command like this:
head business-tips.txt
We'll get the following output:
1) Buy low, sell high.
2) Invest with a plan.
3) Follow your heart.
4) Find your passion.
5) Believe in yourself.
6) Take risks, be bold.
7) Play it safe, and be considerate.
8) Realize that there are no limits.
9) Realize that you have limits.
10) You can rest when you're dead.
which shows the first 10 lines in the file. The default number of lines to display is 10, but you can use the '-n' flag to specify a different number. For example, this command:
head -n 5 business-tips.txt
will output the following:
1) Buy low, sell high.
2) Invest with a plan.
3) Follow your heart.
4) Find your passion.
5) Believe in yourself.
which shows the first 5 lines in the file.
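Since 'head' reads standard input when no file is given, the same flag also works on the output of another command. A small sketch, where the letters produced by 'printf' are just made-up sample input:

```shell
# With no file operand, head reads from standard input.
first_five=$(printf '%s\n' a b c d e f g h | head -n 5)
echo "$first_five"

# Without -n, up to 10 lines are printed; shorter input is passed through whole.
short_input=$(printf '%s\n' a b c | head)
echo "$short_input"
```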
Using 'head' With Bytes Instead Of Lines (non-POSIX)
The 'tail' command includes support for the '-c' flag, which lets you specify the number of bytes rather than the number of lines to print out. However, the official POSIX specification for the 'head' command does not include support for this flag. Having said this, the GNU implementation of the 'head' command does include support for this flag. Here is an example that will print out the first 5 bytes of the file:
head -c 5 business-tips.txt
which produces the following output:
1)
If we pipe this output into xxd, we can see each individual byte:
head -c 5 business-tips.txt | xxd
which produces this output:
00000000: 2031 2920 20 1)
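If your 'head' does not accept '-c', one possible workaround is 'dd', which is part of POSIX. This is only a sketch: the temporary file below is a stand-in with a made-up first line, 'od' is used in place of 'xxd' to show the bytes, and reading one byte at a time with 'bs=1' is slow for large counts.

```shell
sample=$(mktemp)
printf ' 1)  Buy low, sell high.\n' > "$sample"

# First 5 bytes using the non-POSIX -c flag (supported by GNU head).
with_head=$(head -c 5 "$sample" | od -An -tx1)

# The same 5 bytes using dd: a block size of 1 byte, 5 blocks.
with_dd=$(dd if="$sample" bs=1 count=5 2>/dev/null | od -An -tx1)

echo "head -c 5: $with_head"
echo "dd:        $with_dd"

rm -f "$sample"
```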
Generate Random Data For Testing
A common requirement is to be able to create a file with 'random' data of some fixed length, usually for testing. The head command can help you with this task using the '-c' flag (if it's supported on your system). For example, if you want to generate a file with exactly 123 bytes of random data, you can use this command:
head -c 123 /dev/urandom > random_data.dat
and the resulting file 'random_data.dat' will contain 123 bytes of random binary data. Here's an example of what the data looks like if we output it through xxd:
xxd random_data.dat
provides the following output (it will be different for you):
00000000: cfb7 6994 9bca a382 2fb7 e398 9caa d353 ..i...../......S
00000010: 85e3 e2aa ecf6 b8f1 678b 8d8b 4445 1e79 ........g...DE.y
00000020: 7583 45e1 c817 fb9f 97bb 3150 906c 967a u.E.......1P.l.z
00000030: de95 660a 6c5b c2ae 999a fcfa 1ff0 5acf ..f.l[........Z.
00000040: 982d 0d11 7438 d001 bdca fca7 d548 22d4 .-..t8.......H".
00000050: 49c0 c610 af20 1912 c891 e0bf 414c 9059 I.... ......AL.Y
00000060: 427f 9d99 3e14 c6df 307d d487 2a35 d4b2 B...>...0}..*5..
00000070: b824 c44a 6f4a 642c 8ff2 ae .$.JoJd,...
Each time you read from '/dev/urandom', you'll get a different sequence of bytes.
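A quick way to double-check the result is to count the bytes with 'wc -c'. The sketch below also shows a printable variant that renders random bytes as hex with 'od'. It assumes a 'head' that supports '-c', and the variable names are made up for illustration:

```shell
out=$(mktemp)

# Write exactly 123 random bytes, then confirm the size with wc -c.
head -c 123 /dev/urandom > "$out"
size=$(wc -c < "$out" | tr -d ' ')
echo "bytes written: $size"

# Printable variant: 16 random bytes rendered as 32 hex characters.
token=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "hex token: $token"

rm -f "$out"
```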
Counting From The End Instead Of The Beginning (non-POSIX)
You may encounter situations where you want to extract the 'head' of a file where you don't know in advance how long the 'head' will be. The 'head' command can still help you in these situations because you can specify an offset from the end where you'd like to end the output. For example, let's consider the file 'some_lines.txt' with the following contents:
My Favourite Stonks:
GME
AMC
RKT
PLTR
We can use the 'cat' command with the '-n' flag to add some line numbers for clarity:
cat -n some_lines.txt
which produces this output:
1 My Favourite Stonks:
2 GME
3 AMC
4 RKT
5 PLTR
If we only wanted the first two lines, we would use the head command like this:
cat -n some_lines.txt | head -n 2
which provides this output:
1 My Favourite Stonks:
2 GME
But, if we add a minus sign in front of the two, we get a different interpretation. This use of the 'head' command is not POSIX compatible, but it does work with GNU head:
cat -n some_lines.txt | head -n -2
this prints out the 'head' of the file starting at the first line, continuing until two lines before the end (regardless of the total number of lines):
1 My Favourite Stonks:
2 GME
3 AMC
If we then add some more data to this file:
echo "FOO1" >> some_lines.txt
echo "FOO2" >> some_lines.txt
echo "FOO3" >> some_lines.txt
those lines will now be considered too when we issue the same command:
cat -n some_lines.txt | head -n -2
will output:
1 My Favourite Stonks:
2 GME
3 AMC
4 RKT
5 PLTR
6 FOO1
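Combining this with 'tail -n +K' (which is POSIX) gives a way to trim both ends of a stream. A minimal sketch, assuming GNU 'head' and a temporary copy of the original five-line 'some_lines.txt':

```shell
lines=$(mktemp)
printf '%s\n' 'My Favourite Stonks:' GME AMC RKT PLTR > "$lines"

# tail -n +2 starts output at line 2 (dropping the title line);
# head -n -2 (GNU only) then drops the last two lines of what remains.
middle=$(tail -n +2 "$lines" | head -n -2)
echo "$middle"

rm -f "$lines"
```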
Finding The 3 Oldest Books
In the article on the sort command, we reviewed an example of how to find the 3 newest books from the following unsorted list:
Tropic of Cancer,Henry Miller,1934
Housekeeping,Marilynne Robinson,1981
Deliverance,James Dickey,1970
The Sun Also Rises,Ernest Hemingway,1926
The Great Gatsby,F. Scott Fitzgerald,1925
The Corrections,Jonathan Franzen,2001
The Berlin Stories,Christopher Isherwood,1946
Call It Sleep,Henry Roth,1935
Slaughterhouse-Five,Kurt Vonnegut,1969
Light in August,William Faulkner,1932
We showed how you can use the sort command to sort each line in the file according to the three different columns:
sort -t ',' -k 3,3n -k 2,2 -k 1,1 -s books.txt
which produces this output:
The Great Gatsby,F. Scott Fitzgerald,1925
The Sun Also Rises,Ernest Hemingway,1926
Light in August,William Faulkner,1932
Tropic of Cancer,Henry Miller,1934
Call It Sleep,Henry Roth,1935
The Berlin Stories,Christopher Isherwood,1946
Slaughterhouse-Five,Kurt Vonnegut,1969
Deliverance,James Dickey,1970
Housekeeping,Marilynne Robinson,1981
The Corrections,Jonathan Franzen,2001
We can then pipe this directly into the 'head' command and tell it to only print out the first 3 lines:
sort -t ',' -k 3,3n -k 2,2 -k 1,1 -s books.txt | head -n 3
which produces this output:
The Great Gatsby,F. Scott Fitzgerald,1925
The Sun Also Rises,Ernest Hemingway,1926
Light in August,William Faulkner,1932
As you can see above, we've found the 3 oldest books from an unsorted list using a combination of the 'sort' and 'head' commands.
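For comparison, keeping the same ascending sort and swapping 'head' for 'tail' picks out the other end of the list. This is just a sketch that recreates the book list in a temporary file:

```shell
books=$(mktemp)
cat > "$books" <<'EOF'
Tropic of Cancer,Henry Miller,1934
Housekeeping,Marilynne Robinson,1981
Deliverance,James Dickey,1970
The Sun Also Rises,Ernest Hemingway,1926
The Great Gatsby,F. Scott Fitzgerald,1925
The Corrections,Jonathan Franzen,2001
The Berlin Stories,Christopher Isherwood,1946
Call It Sleep,Henry Roth,1935
Slaughterhouse-Five,Kurt Vonnegut,1969
Light in August,William Faulkner,1932
EOF

# Same sort as above; tail -n 3 keeps the last three lines (the newest books).
newest=$(sort -t ',' -k 3,3n -k 2,2 -k 1,1 -s "$books" | tail -n 3)
echo "$newest"

rm -f "$books"
```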
Finding The Least Popular Name
In the article on the 'uniq' command, we reviewed an example of how to find the most popular name from a list of names:
Verity Jayda
Verity Jayda
Verity Jayda
Verity Jayda
Justy Kaiden
Christopher Rene
Christopher Rene
Christopher Rene
Branden McKenna
Branden McKenna
For this example, we'll do the same thing, but this time we'll find the least popular name. The first step is to make sure the list of names is sorted (as a requirement of the 'uniq' command):
cat names.txt | sort
then, the list of names is piped into the 'uniq' command with the '-c' flag which provides a count of the number of occurrences for each name:
cat names.txt | sort | uniq -c
and the output of this is:
2 Branden McKenna
3 Christopher Rene
1 Justy Kaiden
4 Verity Jayda
Now, we can use the 'sort' command again, but this time use numeric sorting to order the list according to which name has the most occurrences:
cat names.txt | sort | uniq -c | sort -n
which provides this output:
1 Justy Kaiden
2 Branden McKenna
3 Christopher Rene
4 Verity Jayda
Now, we just need to use the 'head' command to pick out the first line in the output:
cat names.txt | sort | uniq -c | sort -n | head -n 1
and the output of this command is:
1 Justy Kaiden
And the above name is the least popular name in the list. This process lets us go from an unsorted list of many names to a single record that tells us what the least popular name is.
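If you only want the name without its count, one option is to strip the leading number with 'sed'. A hedged sketch using a temporary copy of the list. Note that it makes no attempt to handle ties: if several names share the lowest count, only the first one after sorting is reported.

```shell
names=$(mktemp)
printf '%s\n' \
    'Verity Jayda' 'Verity Jayda' 'Verity Jayda' 'Verity Jayda' \
    'Justy Kaiden' \
    'Christopher Rene' 'Christopher Rene' 'Christopher Rene' \
    'Branden McKenna' 'Branden McKenna' > "$names"

# Count, sort by count, keep the first line, then strip the leading count.
least_popular=$(sort "$names" | uniq -c | sort -n | head -n 1 | sed 's/^ *[0-9]* //')
echo "$least_popular"

rm -f "$names"
```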
The Head Command & The SIGPIPE Signal
Something that you may occasionally encounter with the head command (and be confused by) is the SIGPIPE signal. SIGPIPE isn't specific to the 'head' command, since you can encounter it with any application. Your operating system will send a SIGPIPE signal to a process whenever that process tries to write to a pipe after all readers of that pipe have closed their end. The 'head' command is a fairly obvious way to trigger this signal, since what it does by definition is read the first few lines of its input, then exit without reading the rest.
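You can watch this happen from the shell alone. In the sketch below, 'yes' writes "y" forever while 'head' reads a single line and exits. On a typical Linux system the recorded exit status is 141, which is 128 plus 13 (the number of SIGPIPE). If SIGPIPE happens to be ignored in your environment, 'yes' will report a write error and exit with a different non-zero status instead. The temporary status file is only there to carry the exit status out of the pipeline.

```shell
status_file=$(mktemp)

# Once head has read one line and exited, the next write by 'yes'
# goes to a pipe that no longer has a reader.
first_line=$( { yes; echo "$?" > "$status_file"; } | head -n 1 )
writer_status=$(cat "$status_file")

echo "head printed: $first_line"
echo "exit status of yes: $writer_status"

rm -f "$status_file"
```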
To illustrate this in practice, let's try writing the following simple Python script, 'sigpipe_test.py':
import sys

for i in range(0,int(sys.argv[1])):
    print(str(i))
This program simply prints out the numbers 0 to N-1, where N is the number that you pass to it. For example, this:
python3 sigpipe_test.py 5
will have the following output:
0
1
2
3
4
Now, when I run this command and pipe the results into the 'head' command:
python3 sigpipe_test.py 5 | head
I get this result:
0
1
2
3
4
So far so good, let's run this again with a longer output:
python3 sigpipe_test.py 50 | head
the result is this:
0
1
2
3
4
5
6
7
8
9
No problems so far, let's run it again with a much larger number of outputs:
python3 sigpipe_test.py 50000 | head
And now the output that I get is this:
0
1
2
3
4
5
6
7
8
9
Traceback (most recent call last):
File "sigpipe_test.py", line 4, in <module>
print(str(i))
BrokenPipeError: [Errno 32] Broken pipe
Now that the output from the Python script is substantially bigger, we encounter a condition where the Python script is trying to write information to the output fifo, but the process which reads from that same fifo (the head command in this case) has already 'closed' its association to that fifo. The result is that a 'SIGPIPE' signal is sent to the writing process (in this case our Python script).
The root cause of the issue seen above involves a discussion of how Python deals with system signals which is beyond the scope of this article. However, if you're not concerned with properly handling SIGPIPE signals from other sources (such as sockets), you can update the script like this to make the issue go away:
import signal
import sys
# https://docs.python.org/3/library/signal.html#signal.SIGPIPE
# https://docs.python.org/3/library/signal.html#signal.SIG_DFL
# This statement may cause issues if you're also writing to
# sockets that may trigger SIGPIPE as well:
signal.signal(signal.SIGPIPE, signal.SIG_DFL)
for i in range(0,int(sys.argv[1])):
    print(str(i))
And now testing this again:
python3 sigpipe_test_updated.py 50000 | head
we get a much more graceful result:
0
1
2
3
4
5
6
7
8
9
Potentially Unbounded Memory Use
The head command can potentially experience unbounded memory requirements when using the non-POSIX feature of specifying a negative value for the '-n' argument. The reason why becomes obvious when you consider the fact that the 'head' command must be capable of working with streams where the length of the input is not known in advance. If you want to output the entire stream except for the last N lines, you need to be able to store at least N lines in memory so that once you finally detect the end of the stream you'll be able to output what came before it.
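To make the buffering requirement concrete, here is a rough sketch of the same idea in 'awk'. It is not how GNU 'head' is implemented, and 'drop_last' is a hypothetical helper name. It keeps the most recent N lines in a circular buffer and only prints a line once N newer lines have arrived, so its memory use grows with N rather than with the length of the stream:

```shell
# drop_last N: copy stdin to stdout, omitting the final N lines.
drop_last() {
    awk -v n="$1" '
        n == 0 { print; next }          # nothing to hold back
        NR > n { print buf[NR % n] }    # the line from n lines ago is now safe to print
        { buf[NR % n] = $0 }            # reuse that slot for the current line
    '
}

kept=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | drop_last 3)
echo "$kept"
```

Because it only relies on POSIX 'awk', a helper like this could also serve as a fallback on systems where 'head -n -N' is not available.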
In the interest of giving some numbers, here's a contrived example that reads a bunch of lines from /dev/urandom. We then use 'head' to grab the first 10,000,000 lines and pipe them into 'head' again but with a negative value passed to '-n'. This entire pipe is run as a sub-shell through /usr/bin/time so we can get some performance numbers:
/usr/bin/time -v sh -c 'xxd /dev/urandom | head -n 10000000 | head -n -1 | wc -l'
which gives some output that includes these lines:
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:25.90
Maximum resident set size (kbytes): 2152
If we start increasing the number of lines passed to the second head command:
/usr/bin/time -v sh -c 'xxd /dev/urandom | head -n 10000000 | head -n -5000000 | wc -l'
the max memory usage starts to increase:
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:26.25
Maximum resident set size (kbytes): 337176
And increase...
/usr/bin/time -v sh -c 'xxd /dev/urandom | head -n 10000000 | head -n -10000000 | wc -l'
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:26.21
Maximum resident set size (kbytes): 670024
From the above output, we can see that a maximum of 670024KiB or 654.320MiB was required at one time during the running of the above head command. Obviously, 10,000,000 lines is quite a lot, but you should keep this in mind for cases where the argument to the 'head' command might be a variable in a script that could be arbitrarily large. A similar problem can also occur with the tail command that lets you specify values to '-n' with a '+' sign.
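When the input is a regular file rather than a stream, one way to sidestep the buffering is to count the lines first and then ask 'head' for a positive number of lines. This is only a sketch under that assumption: it reads the file twice, 'head_except_last' is a made-up helper name, and it assumes the file ends with a newline and is not modified between the two passes.

```shell
# head_except_last N FILE: print FILE without its last N lines,
# using only POSIX options (wc -l, then head with a positive count).
head_except_last() {
    total=$(wc -l < "$2")
    keep=$((total - $1))
    if [ "$keep" -gt 0 ]; then
        head -n "$keep" "$2"
    fi
}

demo=$(mktemp)
printf '%s\n' one two three four five > "$demo"
trimmed=$(head_except_last 2 "$demo")
echo "$trimmed"
rm -f "$demo"
```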
And that's why the 'head' command is my favourite Linux command.