In this article, we discuss Linux filters, which process text data to produce useful information. These include commands such as cat, tac, od, wc, head, tail, sort and cut.
Text filtering is the process of taking an input stream of text and performing conversions on it before sending it to the output stream.
A filter is a program that reads standard input (or a file), performs an operation on it, and writes the result to standard output.
These filters are small programs that each perform a single task. We can view them as building blocks that can be combined to build more complex tools.
A stream is a sequence of bytes that can be read or written using functions which hide the details of the underlying device from the application. The three standard streams are stdin, stdout and stderr, which represent standard input, standard output and standard error respectively.
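To make the three streams concrete, here is a minimal sketch. The emit function, the scratch directory and the file names are hypothetical, introduced only for this illustration.

```shell
# Hypothetical helper: writes one line to stdout and one to stderr.
emit() {
    echo "normal output"        # file descriptor 1 (stdout)
    echo "error output" >&2     # file descriptor 2 (stderr)
}

workdir=$(mktemp -d)            # scratch directory for the demo files

# Redirect each output stream to its own file.
emit > "$workdir/out.txt" 2> "$workdir/err.txt"

# A pipe connects the stdout of one program to the stdin of the next.
# stderr is discarded here, so wc only sees "normal output".
words=$(emit 2>/dev/null | wc -w)
echo "words on stdout: $words"
```

The pipe in the last step is what lets filters be chained: each command only needs to read stdin and write stdout.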
Throughout this article we shall use the following text file, saved as test.txt, so be sure to create it,
id|firstname|lastname|email|age|profession
100|Alameda|Ricarda|Alameda.Ricarda@yopmail.com|36|police officer
101|Nanete|Nadia|Nanete.Nadia@yopmail.com|35|developer
102|Kore|Malanie|Kore.Malanie@yopmail.com|59|driver
103|Lynde|Anton|Lynde.Anton@yopmail.com|51|developer
104|Neila|Gwenore|Neila.Gwenore@yopmail.com|44|doctor
105|Hyacinthe|Ahab|Hyacinthe.Ahab@yopmail.com|53|developer
106|Christian|Emmy|Christian.Emmy@yopmail.com|35|firefighter
107|Selma|Friede|Selma.Friede@yopmail.com|27|police officer
108|Helena|Lewes|Helena.Lewes@yopmail.com|39|driver
109|Kristan|Donell|Kristan.Donell@yopmail.com|32|doctor
110|Nonnah|Neva|Nonnah.Neva@yopmail.com|26|police officer
111|Leontine|Pauly|Leontine.Pauly@yopmail.com|41|doctor
112|Gui|Ovid|Gui.Ovid@yopmail.com|40|firefighter
113|Marita|Ventre|Marita.Ventre@yopmail.com|27|driver
114|Gisela|Redmond|Gisela.Redmond@yopmail.com|46|police officer
cat is short for concatenate. This command prints the contents of one or more files to standard output, so we can view a file without opening it in an editor.
To view the contents of test.txt we write,
cat test.txt
When no file is specified, cat reads from stdin. For example, run cat with no arguments and type some input: each line is echoed back as soon as you press Enter.
You can also write to a file without opening it in an editor by writing,
cat > test2.txt
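After executing the command, type some text. cat reads it from stdin and the shell redirects the output into test2.txt, replacing anything the file already contained.
To finish, press Ctrl+D on an empty line to signal end of input. Ctrl+C also stops cat, but it interrupts the command rather than ending the input cleanly.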
To concatenate two files we can write,
cat test.txt test2.txt
We can number the lines of the concatenated output by using the -n option,
cat -n test.txt test2.txt
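The cat usages above can be tried non-interactively with a sketch like the following. It assumes a scratch directory and two hypothetical files, a.txt and b.txt, and uses a here-document in place of typed input.

```shell
workdir=$(mktemp -d)

# A here-document feeds cat's stdin, so no typing is needed.
cat > "$workdir/a.txt" <<'EOF'
first file
EOF
cat > "$workdir/b.txt" <<'EOF'
second file
EOF

# Files are concatenated in the order they are listed.
joined=$(cat "$workdir/a.txt" "$workdir/b.txt")
echo "$joined"

# Redirect the result to save it as a new file.
cat "$workdir/a.txt" "$workdir/b.txt" > "$workdir/both.txt"

# -n numbers the output lines continuously across both inputs.
cat -n "$workdir/both.txt"
```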
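The tac command (cat spelled backwards) works like cat but prints the lines of a file in reverse order, last line first,
tac test.txt
With the -r (regex) and -s (separator) options, tac can also reverse a file character by character instead of line by line.
To reverse every character in a file we write,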
tac -r -s 'x\|[^x]' test.txt
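As a small illustration of the default, line-based behaviour, the sketch below pipes three lines through tac. It assumes GNU tac is installed; the sample lines are made up for the example.

```shell
# tac prints the last line first.
reversed=$(printf 'one\ntwo\nthree\n' | tac)
echo "$reversed"

# Applying tac twice restores the original order.
restored=$(printf 'one\ntwo\nthree\n' | tac | tac)
echo "$restored"
```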
od stands for octal dump. This command displays the contents of a file in different formats such as octal, hexadecimal, ASCII characters and decimal.
To display a file in octal format we write,
od -b test.txt
To display a file in hexadecimal format (grouped in two-byte units) we write,
od -x test.txt
To display a file in ASCII characters we write,
od -c test.txt
To display a file in decimal format (again in two-byte units), we write,
od -d test.txt
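A short input makes the od formats easier to read. The sketch below dumps the three bytes A, B and newline; the -A n option (standard, but not used elsewhere in this article) only hides the offset column.

```shell
# -b prints each byte as a three-digit octal number.
octal=$(printf 'AB\n' | od -A n -b)

# -c prints each byte as a character, with escapes such as \n.
chars=$(printf 'AB\n' | od -A n -c)

printf 'octal bytes:%s\n' "$octal"
printf 'characters: %s\n' "$chars"
```

Here 101 and 102 are the octal codes for A and B, and 012 is the newline that -c shows as \n.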
wc stands for word count. This command counts the lines, words and bytes in a file.
To get the counts for test.txt we write,
wc test.txt
By default this command prints all three counts (lines, words and bytes, in that order) followed by the file name. We can use options to select a single count.
To get the number of lines we write,
wc -l test.txt
For the number of words we use the -w option. The -c option counts bytes and the -m option counts characters; the two only differ when the file contains multibyte characters.
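The counts are easy to verify on a tiny input. This sketch uses a made-up two-line sample, and pipes through tr -d ' ' only to strip the padding that some wc implementations add.

```shell
sample='one two
three'

# printf adds the final newline, so the input is exactly two lines.
lines=$(printf '%s\n' "$sample" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$sample" | wc -w | tr -d ' ')
bytes=$(printf '%s\n' "$sample" | wc -c | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes"
```

The byte count includes the two newline characters: 7 + 1 for the first line and 5 + 1 for the second.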
The head command displays the first n lines of a text file. The default is 10 lines if n is not specified.
To display the first 10 lines we write,
head test.txt
To display the first 3 lines we write (head -3 is an older shorthand for the same thing),
head -n 3 test.txt
The tail command works like the head command but from the other end, that is, it displays the last lines of a text file.
To display the last 10 lines (the default), we write,
tail test.txt
To display the last 4 lines of the text file we write (tail -4 is an older shorthand for the same thing),
tail -n 4 test.txt
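head and tail are often combined to pick out a range of lines. The sketch below uses six numbered lines as stand-in data; the variable names are only for illustration.

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6)

first_two=$(echo "$numbers" | head -n 2)             # lines 1-2
last_two=$(echo "$numbers" | tail -n 2)              # lines 5-6

# Keep the first four lines, then the last two of those: lines 3-4.
middle=$(echo "$numbers" | head -n 4 | tail -n 2)

# tail -n +2 starts output at line 2, which skips a header row.
no_header=$(echo "$numbers" | tail -n +2)

echo "$middle"
```

The tail -n +2 form is handy with test.txt, since it drops the header line before further processing.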
We can also sort lines alphabetically using the sort command.
To sort the file test.txt we write,
sort test.txt
We can also sort on a specific column by writing,
sed 's/|/ /g' test.txt | sort -k2
We have used the sed command to replace the | characters with spaces, then piped the output to sort, which orders the lines starting from the second column (the first names).
This is a small example of how small commands can be combined to achieve useful results.
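As an alternative sketch, sort can split on '|' directly with its -t option, which avoids the sed step. The example below assumes a hypothetical sample.txt holding the header and the first three rows of the data file, and sets LC_ALL=C so the ordering does not depend on the locale.

```shell
workdir=$(mktemp -d)
cat > "$workdir/sample.txt" <<'EOF'
id|firstname|lastname|email|age|profession
100|Alameda|Ricarda|Alameda.Ricarda@yopmail.com|36|police officer
101|Nanete|Nadia|Nanete.Nadia@yopmail.com|35|developer
102|Kore|Malanie|Kore.Malanie@yopmail.com|59|driver
EOF

# Skip the header, then sort on field 2 only (-k2,2) with | as separator.
by_first_name=$(tail -n +2 "$workdir/sample.txt" | LC_ALL=C sort -t '|' -k2,2)
echo "$by_first_name"

# Sort on age (field 5) numerically and in reverse: oldest first.
by_age=$(tail -n +2 "$workdir/sample.txt" | sort -t '|' -k5,5nr)
echo "$by_age"
```

Writing the key as -k2,2 limits the comparison to that one field, whereas -k2 compares from field 2 to the end of the line.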
Given a file of numbers we can sort it numerically using the -n option,
sort -n numericalFile.txt
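The difference between the default ordering and -n shows up as soon as the numbers have different lengths. A minimal sketch with made-up values:

```shell
values=$(printf '%s\n' 10 9 100 25)

# Default sort compares character by character, so "100" comes before "25".
lexical=$(echo "$values" | LC_ALL=C sort)

# -n compares by numeric value.
numeric=$(echo "$values" | sort -n)

# -r reverses the result: largest first.
descending=$(echo "$values" | sort -n -r)

echo "$numeric"
```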
The cut command selects specific columns from each line of a file, assuming the text is organised in columns.
It cuts out the specified sections by byte position, character position or field, and writes them to the stdout stream.
We specify a delimiter with the -d option to tell the command how the fields are separated; in our case the test.txt columns are separated by '|'.
To get all professions (column 6) we write,
cut -d '|' -f 6 test.txt
We can also cut by using a space as the delimiter,
sed 's/|/ /g' test.txt | cut -d ' ' -f 4
Since our file is separated by '|', we use sed to replace each one with a space and then apply the cut command. The output is all the email addresses from the fourth column.
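The -f option also accepts lists and ranges of fields. The sketch below applies a few selections to the first data row, and shows one caveat of the space-delimiter approach: a value that itself contains a space, such as "police officer", gets split.

```shell
row='100|Alameda|Ricarda|Alameda.Ricarda@yopmail.com|36|police officer'

email=$(echo "$row" | cut -d '|' -f 4)           # one field
name_and_job=$(echo "$row" | cut -d '|' -f 2,6)  # a list of fields
first_three=$(echo "$row" | cut -d '|' -f 1-3)   # a range of fields

# After replacing | with spaces, field 6 is only the first word.
split_job=$(echo "$row" | sed 's/|/ /g' | cut -d ' ' -f 6)

echo "$name_and_job"
echo "$split_job"
```

Selected fields are printed with the same delimiter between them, so the second result still contains a '|'.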
We can also cut by byte by using the -b option. For example, if we want only the first initial of each person's first name from test.txt, which is the fifth byte because every id is three digits long, we can write,
cut -b 5 test.txt
Can you sort the output from the above command?
We can also cut by character by using the -c option as follows,
To cut the first three characters of the first name (characters 5 to 7 of each line) we write,
cut -c 5-7 test.txt
Or to cut the fifth and ninth characters we can write,
cut -c 5,9 test.txt
We can also comma-separate the positions to cut as many characters as we need, e.g.
-c 5,10,15 cuts the fifth, tenth and fifteenth characters.
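Ranges and comma-separated positions can be mixed in one list. A minimal sketch on a ten-letter string, chosen so that each position is easy to check by eye:

```shell
letters='abcdefghij'

range=$(echo "$letters" | cut -c 5-7)        # characters 5 to 7
picked=$(echo "$letters" | cut -c 5,9)       # characters 5 and 9
mixed=$(echo "$letters" | cut -c 1,3-4,8-)   # 1, 3 to 4, and 8 to the end

echo "$range $picked $mixed"
```

An open-ended range such as 8- runs to the end of the line, and -3 would mean from the start up to the third character.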
We can also invert the selection with the --complement option (a GNU extension), which prints everything except the selected part.
To remove the third character of every line in test.txt we can write,
cut --complement -c 3 test.txt
And, to remove the first four characters (the id and '|') so as to get a cleaner output, we can write,
cut --complement -c 1-4 test.txt
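The sketch below runs the complement on a shortened row and compares it with an open-ended range, which gives the same result without the GNU-only option. It assumes GNU cut is available for the --complement lines.

```shell
row='100|Alameda|Ricarda'

# Everything except characters 1-4 (GNU cut only).
without_id=$(echo "$row" | cut --complement -c 1-4)

# The same result with a plain range: character 5 to the end.
from_fifth=$(echo "$row" | cut -c 5-)

# Everything except the third character.
without_third=$(echo "$row" | cut --complement -c 3)

# --complement also works with fields: drop field 1, keep the rest.
without_first_field=$(echo "$row" | cut -d '|' --complement -f 1)

echo "$without_id"
```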
Filters let us restructure and modify text so that useful information can be extracted from raw output, and they become more powerful when chained together with pipes.
Note that most of these commands can also be run in Git Bash, which can be installed on Windows.
- For each of the commands you can type command --help for reference.