Awk command in Linux

In this article, we discuss awk, a powerful scripting language for advanced text processing.

Table of contents.

  1. Introduction.
  2. Syntax.
  3. Processing file contents.
  4. Processing columns.
  5. Processing column lines.
  6. Processing lines with specific patterns.
  7. Regex matching with awk.
  8. Arithmetic with awk.
  9. Filtering with awk.
  10. Comparisons with awk.
  11. Loops in awk.
  12. Summary.
  13. References.

Introduction.

Awk can be viewed as a general-purpose scripting language built for advanced text processing.
It is data-driven, meaning that we define a set of actions that are performed on each record of a data set.
Given a set of data, we can write scripts to extract certain columns, rows and fields, and to search for and replace patterns in text.

The sample text file used throughout this article (test.txt):

id,firstname,lastname,email,email2,profession,metric1,metric2,metric2
100,Theodora,Lubin,Theodora.Lubin@yopmail.com,Theodora.Lubin@gmail.com,worker,135,32,582
101,Merci,Celestine,Merci.Celestine@yopmail.com,Merci.Celestine@gmail.com,worker,288,50,636
102,Layla,Braun,Layla.Braun@yopmail.com,Layla.Braun@gmail.com,worker,486,28,998
103,Elise,Laverne,Elise.Laverne@yopmail.com,Elise.Laverne@gmail.com,worker,279,27,971
104,Viki,Suk,Viki.Suk@yopmail.com,Viki.Suk@gmail.com,worker,343,42,573
105,Isa,Lubin,Isa.Lubin@yopmail.com,Isa.Lubin@gmail.com,doctor,202,33,586
106,Chloris,Sinegold,Chloris.Sinegold@yopmail.com,Chloris.Sinegold@gmail.com,worker,288,2,828
107,Kassey,Amadas,Kassey.Amadas@yopmail.com,Kassey.Amadas@gmail.com,doctor,460,25,663
108,Almeta,Colleen,Almeta.Colleen@yopmail.com,Almeta.Colleen@gmail.com,firefighter,485,17,643
109,Calla,Doig,Calla.Doig@yopmail.com,Calla.Doig@gmail.com,worker,278,23,543
110,Gerianna,Orelee,Gerianna.Orelee@yopmail.com,Gerianna.Orelee@gmail.com,developer,294,2,762

Syntax.

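awk [options] 'pattern {action}' file.txt

awk reads its input one line (record) at a time and runs the action on every line that matches the pattern. If the pattern is omitted, the action runs on every line. If the action is omitted, matching lines are printed.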
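The following minimal sketch shows the pattern {action} structure together with the special BEGIN and END patterns. It writes two data rows copied from test.txt into a temporary file. mktemp and the variable names f and result are illustrative assumptions, not part of the original article.

```shell
#!/bin/sh
# Assumption: a temporary file holding the header and two rows copied from test.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id,firstname,lastname,email,email2,profession,metric1,metric2,metric2
100,Theodora,Lubin,Theodora.Lubin@yopmail.com,Theodora.Lubin@gmail.com,worker,135,32,582
105,Isa,Lubin,Isa.Lubin@yopmail.com,Isa.Lubin@gmail.com,doctor,202,33,586
EOF

# BEGIN runs once before any input is read,
# /worker/ {n++} runs on every line containing "worker",
# END runs once after the last line has been read.
result=$(awk 'BEGIN {n = 0} /worker/ {n++} END {print n " worker line(s)"}' "$f")
echo "$result"
```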

Processing file contents.

To print every line of the text file, we use the command below.

awk '{print $0}' test.txt

$0 holds the entire current line, so this command works the same as the cat command.
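Here is a small sketch that checks the cat equivalence on a throwaway two-line file created with mktemp, which is an assumption made for illustration. It also shows two shorter forms: print with no arguments prints $0, and a pattern with no action (here the always-true 1) prints every matching line.

```shell
#!/bin/sh
# Assumption: a throwaway two-line file for the comparison.
f=$(mktemp)
printf 'first line\nsecond line\n' > "$f"

a=$(awk '{print $0}' "$f")   # explicit $0
b=$(awk '{print}' "$f")      # print with no arguments prints $0
c=$(awk '1' "$f")            # true pattern, default action prints the line
d=$(cat "$f")

[ "$a" = "$d" ] && [ "$b" = "$d" ] && [ "$c" = "$d" ] && echo "all identical to cat"
```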

We can also number the lines by using the built-in NR variable, which holds the number of the current record (line).

awk '{print NR, $0}' test.txt

The output is now the file with every line numbered. NR is also useful for counting lines: when dealing with thousands of records, we may not want to print every line but only count them.

For example, to get the total number of lines in two files, file1.txt and file2.txt, we write:

awk 'END {print NR}' file1.txt file2.txt
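NR keeps counting across all input files. The related built-in FNR restarts at 1 for each file, and FILENAME names the file currently being read. The sketch below creates hypothetical two-line and three-line versions of file1.txt and file2.txt in a temporary directory to show the difference.

```shell
#!/bin/sh
# Assumption: work in a throwaway directory so the example files do not clash with real ones.
cd "$(mktemp -d)" || exit 1
printf 'a\nb\n' > file1.txt       # 2 lines
printf 'c\nd\ne\n' > file2.txt    # 3 lines

# NR is cumulative, so END sees the grand total of lines in both files.
total=$(awk 'END {print NR}' file1.txt file2.txt)

# FNR restarts for each file, NR does not.
listing=$(awk '{print FILENAME, FNR, NR, $0}' file1.txt file2.txt)

echo "$total"
echo "$listing"
```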

We can print the file while ignoring the first n characters of each line, in this case 4. Because awk counts string positions from 1, the substring must start at position n + 1 = 5:

awk '{print substr($0, 5)}' test.txt
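The following sketch shows the off-by-one on a shortened line taken from test.txt. Starting at position 5 drops exactly four characters. Starting at position 4 drops only three, so the comma is kept.

```shell
#!/bin/sh
# Assumption: a shortened line taken from test.txt.
line='100,Theodora,Lubin'

# Positions start at 1, so substr($0, 5) drops the first four characters ("100,").
skip4=$(echo "$line" | awk '{print substr($0, 5)}')
# substr($0, 4) drops only three characters and keeps the comma.
skip3=$(echo "$line" | awk '{print substr($0, 4)}')

echo "$skip4"
echo "$skip3"
```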

We can also print a specific range of lines, here lines 4 through 8, by writing:

awk 'NR==4, NR==8 {print NR " " $0}' test.txt 
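NR==4, NR==8 is a range pattern. It starts matching at the first line where NR==4 is true and stops after the line where NR==8 is true. For line numbers this gives the same result as an explicit condition, as the sketch below checks on ten generated lines. The generated file is an assumption made for illustration.

```shell
#!/bin/sh
# Assumption: ten generated lines ("line 1" .. "line 10") in a temporary file.
f=$(mktemp)
awk 'BEGIN {for (i = 1; i <= 10; i++) print "line " i}' > "$f"

# Range pattern from the article.
range=$(awk 'NR==4, NR==8 {print NR " " $0}' "$f")
# Equivalent explicit condition.
cond=$(awk 'NR >= 4 && NR <= 8 {print NR " " $0}' "$f")

[ "$range" = "$cond" ] && echo "same lines"
echo "$range"
```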

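Other built-in variables include NF (the number of fields in the current line), FS (the input field separator), OFS (the output field separator) and ORS (the output record separator).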
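This sketch uses all four variables on the comma-separated data. It sets them in a BEGIN block, and the sample rows are copied from test.txt. FS splits each input line on commas, OFS joins the printed values with " | ", ORS ends each output line with ";" plus a newline, and NF reports how many fields each line has.

```shell
#!/bin/sh
# Assumption: the header and two rows copied from test.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id,firstname,lastname,email,email2,profession,metric1,metric2,metric2
100,Theodora,Lubin,Theodora.Lubin@yopmail.com,Theodora.Lubin@gmail.com,worker,135,32,582
105,Isa,Lubin,Isa.Lubin@yopmail.com,Isa.Lubin@gmail.com,doctor,202,33,586
EOF

# NR > 1 skips the header line.
out=$(awk 'BEGIN {FS = ","; OFS = " | "; ORS = ";\n"} NR > 1 {print NF, $2, $6}' "$f")
echo "$out"
```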

Processing columns.

Our file test.txt is in CSV format. We could process it as it is with awk (by setting the field separator to a comma with -F','), but for readability we convert it into a layout of space-separated rows and columns. The command below replaces every comma with a space and redirects the output into another file, output.txt.

awk '{gsub(/,/," ");print}' test.txt > output.txt
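As an alternative to gsub, awk can split the CSV directly with -F','. Assigning $1 = $1 forces awk to rebuild the line using OFS (a single space by default), which produces the same text as the gsub command. The sketch below checks this on rows copied from test.txt and then reads the gmail column straight from the CSV.

```shell
#!/bin/sh
# Assumption: the header and two rows copied from test.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id,firstname,lastname,email,email2,profession,metric1,metric2,metric2
100,Theodora,Lubin,Theodora.Lubin@yopmail.com,Theodora.Lubin@gmail.com,worker,135,32,582
105,Isa,Lubin,Isa.Lubin@yopmail.com,Isa.Lubin@gmail.com,doctor,202,33,586
EOF

# Approach from the article: replace every comma with a space.
a=$(awk '{gsub(/,/, " "); print}' "$f")
# Alternative: split on commas and rejoin the fields with OFS.
b=$(awk -F',' '{$1 = $1; print}' "$f")
[ "$a" = "$b" ] && echo "identical output"

# With -F',' the CSV can be queried directly (NR > 1 skips the header).
gmail=$(awk -F',' 'NR > 1 {print $5}' "$f")
echo "$gmail"
```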

output.txt

id firstname lastname email email2 profession metric1 metric2 metric2
100 Theodora Lubin Theodora.Lubin@yopmail.com Theodora.Lubin@gmail.com worker 135 32 582
101 Merci Celestine Merci.Celestine@yopmail.com Merci.Celestine@gmail.com worker 288 50 636
102 Layla Braun Layla.Braun@yopmail.com Layla.Braun@gmail.com worker 486 28 998
103 Elise Laverne Elise.Laverne@yopmail.com Elise.Laverne@gmail.com worker 279 27 971
104 Viki Suk Viki.Suk@yopmail.com Viki.Suk@gmail.com worker 343 42 573
105 Isa Lubin Isa.Lubin@yopmail.com Isa.Lubin@gmail.com doctor 202 33 586
106 Chloris Sinegold Chloris.Sinegold@yopmail.com Chloris.Sinegold@gmail.com worker 288 2 828
107 Kassey Amadas Kassey.Amadas@yopmail.com Kassey.Amadas@gmail.com doctor 460 25 663
108 Almeta Colleen Almeta.Colleen@yopmail.com Almeta.Colleen@gmail.com firefighter 485 17 643
109 Calla Doig Calla.Doig@yopmail.com Calla.Doig@gmail.com worker 278 23 543
110 Gerianna Orelee Gerianna.Orelee@yopmail.com Gerianna.Orelee@gmail.com developer 294 2 762

Now we can use awk to extract specific columns, for example all the gmail email addresses:

awk '{print $5}' output.txt 

$5 refers to the fifth field, email2, which is where all the gmail addresses are located. The header line is printed too, so the first line of the output is email2.

We can also print multiple columns. For example, to get each person's first name and profession, we can use the command below, where $2 is the second column and $6 is the sixth. The comma between them makes awk separate the two values with a space.

awk '{print $2, $6}' output.txt
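For aligned columns, printf offers more control than print. The sketch below compares the two on rows copied from output.txt. The format %-10s, chosen here for illustration, left-aligns the first name in a 10-character column.

```shell
#!/bin/sh
# Assumption: the header and two rows copied from output.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id firstname lastname email email2 profession metric1 metric2 metric2
102 Layla Braun Layla.Braun@yopmail.com Layla.Braun@gmail.com worker 486 28 998
108 Almeta Colleen Almeta.Colleen@yopmail.com Almeta.Colleen@gmail.com firefighter 485 17 643
EOF

# print separates the values with OFS (a single space by default).
plain=$(awk 'NR > 1 {print $2, $6}' "$f")
# printf pads the first name to 10 characters, then prints the profession.
aligned=$(awk 'NR > 1 {printf "%-10s %s\n", $2, $6}' "$f")

echo "$plain"
echo "$aligned"
```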

Processing column lines.

In awk we can also select specific lines of a specific column.

An example
Suppose we want to print the first five lines of the profession column. We can write the following.

awk '{print $6}' output.txt | head -5

The first part of the command, awk '{print $6}', prints the sixth column, which holds the profession. We then pipe (|) that output to the head command, whose argument -5 tells it to keep only the first five lines.
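awk can also do this without head by printing column 6 only while NR <= 5. A second variant calls exit after line 5, so awk stops reading the rest of the file, which may help with very large inputs. The sketch below generates an eight-line sample file with a hypothetical job value in column 6 and checks that all three forms agree. It uses head -n 5, the portable spelling of head -5.

```shell
#!/bin/sh
# Assumption: eight generated rows whose sixth field is job1 .. job8.
f=$(mktemp)
awk 'BEGIN {for (i = 1; i <= 8; i++) print "row" i, "a", "b", "c", "d", "job" i}' > "$f"

# Pipeline from the article (head -n 5 is the portable form of head -5).
piped=$(awk '{print $6}' "$f" | head -n 5)
# awk alone: only print while the line number is at most 5.
awk_only=$(awk 'NR <= 5 {print $6}' "$f")
# Print, then stop reading input once line 5 has been handled.
early_exit=$(awk '{print $6} NR == 5 {exit}' "$f")

echo "$awk_only"
```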

Processing lines with specific patterns.

We can use awk to print lines with specific patterns.

An example
To print all people who work as developers, we can use the following command.

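awk '$6 ~ /developer$/' output.txt

The ~ operator tests the sixth field against the pattern, and the $ character after the pattern anchors the match to the end of that field. Matching the whole line with awk '/developer$/' would print nothing here, because every line of output.txt ends in a number.

Another example
To print all people who are not doctors, we can write,

awk '!($6 ~ /doctor$/)' output.txt

The ! character is used as a negation: it selects all lines whose sixth field DOESN'T end in the string doctor. The header line is included too, since its sixth field is profession.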
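The following sketch shows why it matters which string a $-anchored pattern is tested against. It uses rows copied from output.txt. $6 !~ /doctor$/ is an equivalent way to write the negated test.

```shell
#!/bin/sh
# Assumption: the header and two rows copied from output.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id firstname lastname email email2 profession metric1 metric2 metric2
105 Isa Lubin Isa.Lubin@yopmail.com Isa.Lubin@gmail.com doctor 202 33 586
110 Gerianna Orelee Gerianna.Orelee@yopmail.com Gerianna.Orelee@gmail.com developer 294 2 762
EOF

# Whole-line test: every line ends in a number, so nothing matches.
whole=$(awk '/developer$/' "$f")
# Field test: $ now anchors to the end of field 6.
field=$(awk '$6 ~ /developer$/' "$f")
# Negated field test (header included, since its field 6 is "profession").
not_doctor=$(awk '!($6 ~ /doctor$/)' "$f")

echo "$field"
echo "$not_doctor"
```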

Another example
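We can also use the built-in match() function to find where a pattern occurs in a line:

awk 'match($0, /d/) {print $0 " CHARACTER d FOUND at " RSTART}' output.txt

match() returns the position where the pattern first matches and stores it in the built-in variable RSTART (positions start at 1). It also sets RLENGTH to the length of the match. If there is no match, RSTART is 0 and RLENGTH is -1, so the line is not printed.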
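RSTART and RLENGTH can be combined with substr() to extract the matched text or the text around it. The following sketch uses a line copied from output.txt. It pulls out the domain of the email in field 4, and it also shows the no-match values.

```shell
#!/bin/sh
# Assumption: a single line copied from output.txt.
line='100 Theodora Lubin Theodora.Lubin@yopmail.com Theodora.Lubin@gmail.com worker 135 32 582'

# /@.*/ matches from the "@" to the end of field 4.
# substr($4, RSTART + 1) then returns everything after the "@".
info=$(echo "$line" | awk 'match($4, /@.*/) {print RSTART, RLENGTH, substr($4, RSTART + 1)}')

# No match: match() returns 0, RSTART is 0 and RLENGTH is -1.
none=$(echo "abc" | awk '{m = match($0, /z/); print m, RSTART, RLENGTH}')

echo "$info"
echo "$none"
```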

Regex matching with awk.

Awk can also be used to get strings/text matching a specified pattern.

An example
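To get all people whose first names start with the letter J, we test the second field with the ^ anchor. A plain /J/ would match a J anywhere on the line, such as in a surname or an email address. In our sample data no first name starts with J, so this prints nothing, while /^C/ would return Chloris and Calla.

awk '$2 ~ /^J/ {print $0}' output.txt

We can also print their first names and professions by writing,

awk '$2 ~ /^J/ {print $2, $6}' output.txt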
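The following sketch shows the difference between an unanchored whole-line match and an anchored field match, using the letter C and rows copied from output.txt. Almeta's surname Colleen contains a C, so only the anchored field test excludes her.

```shell
#!/bin/sh
# Assumption: the header and three rows copied from output.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id firstname lastname email email2 profession metric1 metric2 metric2
106 Chloris Sinegold Chloris.Sinegold@yopmail.com Chloris.Sinegold@gmail.com worker 288 2 828
108 Almeta Colleen Almeta.Colleen@yopmail.com Almeta.Colleen@gmail.com firefighter 485 17 643
109 Calla Doig Calla.Doig@yopmail.com Calla.Doig@gmail.com worker 278 23 543
EOF

# /C/ matches a capital C anywhere on the line.
anywhere=$(awk '/C/ {print $2}' "$f")
# $2 ~ /^C/ only matches first names that start with C.
starts_c=$(awk '$2 ~ /^C/ {print $2, $6}' "$f")

echo "$anywhere"
echo "$starts_c"
```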

Characters.

The following characters are commonly used for regular expression matching.

Character  Description
*          Matches zero or more occurrences of the preceding character or group.
.          Matches any single character (in awk this includes a newline).
$          Matches at the end of the string being tested: the whole line, or a single field when used with ~.
^          Matches at the beginning of the string being tested.
!          Not a regex character but awk's negation operator: !/pattern/ selects the lines that do NOT match the pattern.
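This short sketch combines the characters above on a list of first names taken from the sample file. The list itself is an illustrative assumption.

```shell
#!/bin/sh
# Assumption: a few first names taken from the sample data, one per line.
names='Chloris
Calla
Gerianna
Isa'

# ^C.*a$ : starts with C, then any characters (.*), and ends with a.
c_a=$(echo "$names" | awk '/^C.*a$/')
# ^..a$ : exactly three characters, the last one being a.
three=$(echo "$names" | awk '/^..a$/')
# ! negates the pattern: names that do NOT start with C.
not_c=$(echo "$names" | awk '!/^C/')

echo "$c_a"
echo "$three"
echo "$not_c"
```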

Arithmetic with awk.

We can use awk to calculate the sums of columns whose values are numbers.

An example
To calculate the sum of all values in column 8 (metric2), we could write,

awk '{x += $8} END {print x}' output.txt 
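Non-numeric text such as the header value metric2 counts as 0 in arithmetic, so the article's command is not thrown off by the header. The sketch below checks that on rows copied from output.txt. It then skips the header with NR > 1 and counts the data rows, so it can also print an average.

```shell
#!/bin/sh
# Assumption: the header and three rows copied from output.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id firstname lastname email email2 profession metric1 metric2 metric2
100 Theodora Lubin Theodora.Lubin@yopmail.com Theodora.Lubin@gmail.com worker 135 32 582
105 Isa Lubin Isa.Lubin@yopmail.com Isa.Lubin@gmail.com doctor 202 33 586
110 Gerianna Orelee Gerianna.Orelee@yopmail.com Gerianna.Orelee@gmail.com developer 294 2 762
EOF

# Article's command: the header's "metric2" adds 0.
x=$(awk '{x += $8} END {print x}' "$f")
# Skip the header, count rows, and guard against an empty file.
stats=$(awk 'NR > 1 {sum += $8; n++} END {if (n > 0) printf "sum=%d avg=%.2f\n", sum, sum / n}' "$f")

echo "$x"
echo "$stats"
```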

Filtering with awk.

We could filter text based on a condition with awk.
An example
To filter all people whose first name is shorter than five characters, we could write,

awk 'length($2) < 5' output.txt
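Conditions can be combined with && (and) and || (or). The sketch below keeps only short first names that also belong to workers. It uses rows copied from output.txt.

```shell
#!/bin/sh
# Assumption: the header and three rows copied from output.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id firstname lastname email email2 profession metric1 metric2 metric2
104 Viki Suk Viki.Suk@yopmail.com Viki.Suk@gmail.com worker 343 42 573
105 Isa Lubin Isa.Lubin@yopmail.com Isa.Lubin@gmail.com doctor 202 33 586
109 Calla Doig Calla.Doig@yopmail.com Calla.Doig@gmail.com worker 278 23 543
EOF

# First names shorter than five characters.
short=$(awk 'length($2) < 5 {print $2}' "$f")
# Same condition, restricted to workers.
short_workers=$(awk 'length($2) < 5 && $6 == "worker" {print $2}' "$f")

echo "$short"
echo "$short_workers"
```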

Comparisons with awk.

We can use the comparison operators ==, !=, <, <=, > and >= to select lines.
An example
To print people whose metric2 value (column 8) is greater than, for example, 20, we could write the following.

awk '$8 > 20 {print $0}' output.txt
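There is a subtlety with the header line. Its eighth field, metric2, does not look like a number, so awk compares it with 20 as a string. Because "m" sorts after "2", the header passes the test and is printed. The sketch below shows this on rows copied from output.txt, and adds NR > 1 to keep only data rows.

```shell
#!/bin/sh
# Assumption: the header and two rows copied from output.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id firstname lastname email email2 profession metric1 metric2 metric2
100 Theodora Lubin Theodora.Lubin@yopmail.com Theodora.Lubin@gmail.com worker 135 32 582
106 Chloris Sinegold Chloris.Sinegold@yopmail.com Chloris.Sinegold@gmail.com worker 288 2 828
EOF

# The header's "metric2" is compared as a string and passes the test.
all=$(awk '$8 > 20 {print $2}' "$f")
# Skipping line 1 leaves only numeric comparisons.
data_only=$(awk 'NR > 1 && $8 > 20 {print $2}' "$f")

echo "$all"
echo "$data_only"
```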

Another example
To get all people whose profession is developer using an if statement, we could write,

awk '{if($6 == "developer") print $0}' output.txt
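An if statement can also have an else branch. The sketch below counts developers and everyone else on rows copied from output.txt. Adding 0 when printing makes awk print 0 rather than an empty string for a counter that was never incremented.

```shell
#!/bin/sh
# Assumption: the header and three rows copied from output.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id firstname lastname email email2 profession metric1 metric2 metric2
100 Theodora Lubin Theodora.Lubin@yopmail.com Theodora.Lubin@gmail.com worker 135 32 582
105 Isa Lubin Isa.Lubin@yopmail.com Isa.Lubin@gmail.com doctor 202 33 586
110 Gerianna Orelee Gerianna.Orelee@yopmail.com Gerianna.Orelee@gmail.com developer 294 2 762
EOF

# if/else on field 6, skipping the header; prints "developers others".
counts=$(awk 'NR > 1 {if ($6 == "developer") dev++; else other++} END {print dev + 0, other + 0}' "$f")
echo "$counts"
```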

Loops in awk.

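Like most programming languages, awk supports loops (for, while and do-while).
An example
To print the squares of the numbers from 10 to 20, we could write the following. Because the code is in a BEGIN block, awk runs it without reading any input.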

awk 'BEGIN {for(i=10; i<=20; i++) print i "*" i " is " i*i;}'
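Loops are also handy for walking over the fields of each line. The following sketch uses rows copied from output.txt and sums the three metric columns (fields 7 to NF) for each person. It also rewrites the squares example as a while loop, limited to 10 through 12 to keep the output short.

```shell
#!/bin/sh
# Assumption: the header and two rows copied from output.txt.
f=$(mktemp)
cat > "$f" <<'EOF'
id firstname lastname email email2 profession metric1 metric2 metric2
100 Theodora Lubin Theodora.Lubin@yopmail.com Theodora.Lubin@gmail.com worker 135 32 582
105 Isa Lubin Isa.Lubin@yopmail.com Isa.Lubin@gmail.com doctor 202 33 586
EOF

# for loop over fields 7..NF, resetting the total for every line.
totals=$(awk 'NR > 1 {t = 0; for (i = 7; i <= NF; i++) t += $i; print $2, t}' "$f")

# while-loop version of the squares example.
squares=$(awk 'BEGIN {i = 10; while (i <= 12) {print i "*" i " is " i*i; i++}}')

echo "$totals"
echo "$squares"
```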

Summary.

We have covered the basics of awk, but this is not the end: there is a lot more that can be done with this command.
Awk is a powerful text-processing tool, and when it is combined with other tools and languages, your imagination is the only limit.

References.

  1. Run the command man awk on a Linux system to read the awk manual page.