Intro To 'csplit' Command In Linux
2023-11-15 - By Robert Elder
I use the 'csplit' command to split a text file into pieces wherever a given pattern or line number occurs. For example, in this file called 'example.txt':
1
2
SEPARATOR
3
4
Running the following 'csplit' command will split up this file into two separate files named 'xx00' and 'xx01':
csplit example.txt '/SEPARATOR/'
The 'xx00' file will contain this text:
1
2
And the 'xx01' file will contain this text:
SEPARATOR
3
4
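To try this without cluttering a real directory, here is a minimal sketch that rebuilds 'example.txt' in a scratch directory and runs the same split. The 'workdir' variable and the use of 'mktemp' are my own additions for illustration, and the '-s' flag only silences the byte counts that 'csplit' prints by default:

```shell
# Scratch directory (hypothetical name) so the xx* files land somewhere disposable.
workdir=$(mktemp -d)
cd "$workdir"

# Recreate example.txt from the listing above.
printf '%s\n' 1 2 SEPARATOR 3 4 > example.txt

# Split just before the first line that matches SEPARATOR.
# -s suppresses the per-file byte counts.
csplit -s example.txt '/SEPARATOR/'

# Print each piece under a "==> name <==" header.
tail -n +1 xx00 xx01
```

The scratch directory is left in place so the pieces can be inspected; delete it when done.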
Splitting A Text File Using 'csplit'
To illustrate a more practical use case for the 'csplit' command, we'll use an example of splitting up a log file. Here, I have a log file named 'productivity.txt' where I keep track of where my time goes, so I can stay productive:
--Monday--
- Client meeting, discuss deliverables for next Monday.
- Play video games (11 hours)
- Watch YouTube videos (4 hours)
--Tuesday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (7 hours)
- Play video games (3 hours)
- Watch YouTube videos (6 hours)
--Wednesday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (16 hours)
- Do Laundry (20 min)
--Thursday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (6 hours)
- Watch YouTube videos (6 hours)
--Friday--
- Review notes from client project meeting. (5 minutes)
- Play video games (11 hours)
- Watch my favourite Twitch E-Girl streamers (2 hours)
--Saturday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (14 hours)
--Sunday--
- Work continuously to meet client deadline (23 hours).
I can use this 'csplit' command to split up this log file into multiple files, one for each day of the week:
csplit productivity.txt '/.*day--/' {*}
The argument '/.*day--/' specifies a regular expression that matches the header for each day of the week, and the '{*}' argument tells the 'csplit' command to try and match the regular expression pattern as many times as possible.
After running the above command, you can see 8 new files that are produced:
ls
productivity.txt xx00 xx01 xx02 xx03 xx04 xx05 xx06 xx07
The contents of these files are as follows:
cat xx00
(no output since the file starts with a separator)
cat xx01
--Monday--
- Client meeting, discuss deliverables for next Monday.
- Play video games (11 hours)
- Watch YouTube videos (4 hours)
cat xx02
--Tuesday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (7 hours)
- Play video games (3 hours)
- Watch YouTube videos (6 hours)
cat xx03
--Wednesday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (16 hours)
- Do Laundry (20 min)
cat xx04
--Thursday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (6 hours)
- Watch YouTube videos (6 hours)
cat xx05
--Friday--
- Review notes from client project meeting. (5 minutes)
- Play video games (11 hours)
- Watch my favourite Twitch E-Girl streamers (2 hours)
cat xx06
--Saturday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (14 hours)
cat xx07
--Sunday--
- Work continuously to meet client deadline (23 hours).
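The empty 'xx00' file can be avoided if your 'csplit' is the GNU coreutils version, which has a '-z' ('--elide-empty-files') flag. The sketch below assumes GNU 'csplit' and uses an abbreviated three-day stand-in for 'productivity.txt' in a scratch directory (both the shortened contents and the 'workdir' name are made up for illustration):

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Abbreviated stand-in for productivity.txt: three day headers, one entry each.
printf '%s\n' \
  '--Monday--' '- Client meeting' \
  '--Tuesday--' '- Review notes' \
  '--Wednesday--' '- Do laundry' > productivity.txt

# -z drops zero-length output files (GNU extension).
# '{*}' is quoted so the shell can never treat it as a glob pattern.
csplit -s -z productivity.txt '/.*day--/' '{*}'

ls
# With -z the numbering still starts at 00, so Monday lands in xx00.
head -n 1 xx00
```

Quoting '{*}' is optional in most shells, but it removes any chance of the shell expanding it against a file name.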
Specify A Custom Prefix For Split Files
You can also use the '-f' flag to change the prefix of the resulting split files:
csplit -f 'day-' productivity.txt '/.*day--/' {*}
ls
day-00 day-01 day-02 day-03 day-04 day-05 day-06 day-07 productivity.txt
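The numeric suffix can be adjusted as well. As a rough sketch: '-n' sets the number of digits, and GNU 'csplit' also accepts '-b' with a printf-style format for the suffix. The two-day input file and the 'workdir' scratch directory below are assumptions made for the example, not part of the original log:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two-day stand-in for productivity.txt.
printf '%s\n' \
  '--Monday--' '- Client meeting' \
  '--Tuesday--' '- Review notes' > productivity.txt

# -n 3 widens the suffix to three digits: day-000, day-001, day-002.
csplit -s -f 'day-' -n 3 productivity.txt '/.*day--/' '{*}'

# -b replaces the default %02d suffix with a custom format (GNU extension),
# here adding a .txt extension: day-00.txt, day-01.txt, day-02.txt.
csplit -s -f 'day-' -b '%02d.txt' productivity.txt '/.*day--/' '{*}'

ls
```

Both runs write into the same directory here only because their file names cannot collide.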
Limit Number Of Pattern Matches
If I change the '{*}' to '{3}', the pattern is repeated 3 more times after its first use, so only the first 4 pattern matches will result in file splits:
csplit -f 'day-' productivity.txt '/.*day--/' {3}
ls
day-00 day-01 day-02 day-03 day-04 productivity.txt
The last file, 'day-04' will now contain all text from the 4th day until the last:
cat day-04
--Thursday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (6 hours)
- Watch YouTube videos (6 hours)
--Friday--
- Review notes from client project meeting. (5 minutes)
- Play video games (11 hours)
- Watch my favourite Twitch E-Girl streamers (2 hours)
--Saturday--
- Review notes from client project meeting. (5 minutes)
- Watch my favourite Twitch E-Girl streamers (14 hours)
--Sunday--
- Work continuously to meet client deadline (23 hours).
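The counting is easy to get wrong, so here is a small sketch that checks it. It builds a seven-day file with one placeholder entry per day (invented contents, in a scratch directory named by the hypothetical 'workdir' variable) and counts how many day headers end up in the last piece:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Seven headers with one entry each, standing in for productivity.txt.
for day in Mon Tues Wednes Thurs Fri Satur Sun; do
  printf '%s\n' "--${day}day--" '- placeholder entry'
done > productivity.txt

# The pattern is used once, then '{3}' repeats it three more times.
csplit -s -f 'day-' productivity.txt '/.*day--/' '{3}'

ls day-*
# Count the day headers left together in the final file.
grep -c 'day--' day-04
```

The splits happen before Monday, Tuesday, Wednesday and Thursday, which leaves Thursday through Sunday together in 'day-04'.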
Split Based On Line Number
Here, I have a file called 'some-lines.txt':
This is line 1.
This is line 2.
This is line 3.
This is line 4.
This is line 5.
This is line 6.
This is line 7.
This is line 8.
This is line 9.
This is line 10.
This is line 11.
This is line 12.
This is line 13.
This command will split the file based on line numbers instead of a regex pattern:
csplit -k -f 'part-' some-lines.txt '4' {*}
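Running the above command produces the following output. Each number is the size in bytes of one output file, and the error appears because '{*}' keeps repeating the split until the next line number (16) is past the end of the file. The '-k' flag keeps the files that were already created despite that error: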
48
64
66
csplit: ‘4’: line number out of range on repetition 3
34
And the contents of the split files are as follows:
tail -n +1 part-0*
==> part-00 <==
This is line 1.
This is line 2.
This is line 3.
==> part-01 <==
This is line 4.
This is line 5.
This is line 6.
This is line 7.
==> part-02 <==
This is line 8.
This is line 9.
This is line 10.
This is line 11.
==> part-03 <==
This is line 12.
This is line 13.
As you can see from the above output, the file has been split just before every line number that is divisible by 4.
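If the number of splits is known in advance, an explicit repeat count should avoid the "out of range" error, so '-k' is no longer needed. The following is a sketch under that assumption: it regenerates the 13-line file with 'seq' and 'sed' (my choice of tools) in a scratch directory and asks for the split at line 4 plus two repeats:

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the 13-line some-lines.txt.
seq 1 13 | sed 's/.*/This is line &./' > some-lines.txt

# Split before lines 4, 8 and 12: the line number itself, then two repeats.
csplit -s -f 'part-' some-lines.txt 4 '{2}'
status=$?

# Line counts per piece, followed by csplit's exit status.
wc -l part-*
echo "exit status: $status"
```

The last piece simply receives whatever lines remain after the final split.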
Notes On 'csplit' Regular Expressions
According to the POSIX specification, the 'csplit' command uses 'basic' regular expressions, which are quite limited in practice:
/rexp/[offset]
A file shall be created using the content of the lines from the current
line up to, but not including, the line that results from the evaluation
of the regular expression with offset, if any, applied. The regular
expression rexp shall follow the rules for basic regular expressions
described in XBD Basic Regular Expressions.
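The optional offset in '/rexp/[offset]' shifts the split point relative to the matching line. As a small sketch, reusing the 'example.txt' contents from the start of this article in a scratch directory (the 'workdir' name is hypothetical), an offset of '+1' keeps the separator at the end of the first file:

```shell
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 1 2 SEPARATOR 3 4 > example.txt

# '+1' moves the split point one line past the match, so the SEPARATOR
# line ends the first file instead of starting the second.
# The ^ and $ anchors are plain basic-regex syntax.
csplit -s example.txt '/^SEPARATOR$/+1'

tail -n +1 xx00 xx01
```

Sticking to anchors, bracket expressions and '*' keeps a pattern within basic regular expression syntax, which matters here because operators such as '+' and '|' are not special in that dialect.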
And that's why the 'csplit' command is my favourite Linux command.