Awk by Example
This post is part of a multi-part series. While each of these posts was designed to be self-contained, you might be interested in reading about Grep and Sed, the other two tools covered previously.
Of the three utilities that we're exploring, Awk is by far the most powerful and complicated of the bunch. Rather than attempting to document every facet of this tool, we'll be examining a handful of examples, hopefully piquing your curiosity in the process.
Firstly, let's review what we've covered in the previous parts:
- Grep searches for patterns, printing lines that match
- Sed performs find and replace operations
Awk is a programming language which focuses on text processing and manipulation. As we'll see in the examples, Awk can do everything that Grep and Sed can do (and more!).
As a reminder, whenever an example references "rhyme.txt", please assume it has the following contents:
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
-- "Hickory, Dickory, Dock" (public domain)
Example 1: Searching for a Given String
Back in part 1, we saw how Grep can find all occurrences of the word "mouse" in "rhyme.txt," using this command:
grep 'mouse' rhyme.txt
However, the same thing can be accomplished with Awk:
awk '/mouse/ {print}' rhyme.txt
The general syntax for an Awk command is:
awk 'pattern { action }' file
Because the default action is "print", our command can be shortened to a form that closely resembles the grep command:
awk '/mouse/' rhyme.txt
resulting in:
The mouse ran up the clock
The mouse ran down
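To make the pattern/action split a little more concrete, here is a minimal sketch that runs the same pattern with and without an explicit action. The built-in variables NR (the current line number) and $0 (the whole current line) are not used elsewhere in this post; they are introduced here only for illustration, as are the shell variable names.

```shell
# Recreate the sample file so the sketch is self-contained.
printf '%s\n' \
  'Hickory dickory dock' \
  'The mouse ran up the clock' \
  'The clock struck one' \
  'The mouse ran down' \
  'Hickory dickory dock' > rhyme.txt

# Pattern only: the default action prints each matching line.
pattern_only=$(awk '/mouse/' rhyme.txt)

# Pattern plus action: prefix each match with its line number (NR).
numbered=$(awk '/mouse/ {print NR ": " $0}' rhyme.txt)

printf '%s\n' "$pattern_only" "$numbered"
```

The first command behaves like grep, while the second shows that the action block can print something other than the unmodified line.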
Example 2: Find and Replace
We also saw in part 2 that if we wanted to change the rhyme to be about cats, we could use the following sed command:
sed 's/mouse/cat/' rhyme.txt
Which would produce the following output:
Hickory dickory dock
The cat ran up the clock
The clock struck one
The cat ran down
Hickory dickory dock
Using Awk, we could use this command:
awk '// {sub(/mouse/, "cat"); print}' rhyme.txt
Let's break that down:
- "//": Sets an empty pattern so every line will match
- "sub(/mouse/, "cat");": Searches the line for "mouse" and, if it is found, replaces the first occurrence with "cat"
- "print": Prints the results to the screen
However, if no pattern is provided, Awk already matches every line by default. This means that our command could be shortened to:
awk '{sub(/mouse/, "cat"); print}' rhyme.txt
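One detail worth sketching: "sub" only replaces the first match on each line, much like sed's "s" command without the "g" flag. Awk's "gsub" function replaces every match instead. The example below is an illustration on the first line of the rhyme; the "ck" pattern and the shell variable names are chosen here for demonstration and are not part of the original examples.

```shell
# sub: only the first "ck" on the line is replaced.
first_only=$(echo 'Hickory dickory dock' | awk '{sub(/ck/, "CK"); print}')

# gsub: every "ck" on the line is replaced (like sed 's/ck/CK/g').
every_match=$(echo 'Hickory dickory dock' | awk '{gsub(/ck/, "CK"); print}')

echo "$first_only"    # HiCKory dickory dock
echo "$every_match"   # HiCKory diCKory doCK
```

Since "mouse" appears at most once per line in rhyme.txt, "sub" and "gsub" would give the same result for the cat example above.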
Example 3: Counting Lines and Words
We've demonstrated how to use Awk to emulate grep and sed. Now let's try something a little more difficult. Suppose that we want to know the total number of lines in our rhyme. To accomplish this, we could use the following command:
awk 'BEGIN{count=0} //{count++} END{print "Total:",count,"lines"}' rhyme.txt
Resulting in:
Total: 5 lines
How does this work?
- "BEGIN{count=0}": Initializes our counter to 0. The "BEGIN" section is executed by Awk before any input is processed
- "//{count++}": Matches every line and increments the counter by 1 (as we saw in the previous example, this could also be written simply as "{count++}")
- "END{print "Total:",count,"lines"}": Prints the result to the screen. The "END" section runs after the file has been completely processed
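As a possible shortcut, Awk already tracks the number of records it has read in the built-in variable NR, and that value is still available in the "END" section. The sketch below compares a hand-rolled counter with NR; NR is not used in the original command and is shown here only as an alternative, and the shell variable names are illustrative.

```shell
# Recreate the sample file so the sketch is self-contained.
printf '%s\n' \
  'Hickory dickory dock' \
  'The mouse ran up the clock' \
  'The clock struck one' \
  'The mouse ran down' \
  'Hickory dickory dock' > rhyme.txt

# Hand-rolled counter, incremented once per line.
counted=$(awk 'BEGIN {count = 0} {count++} END {print "Total:", count, "lines"}' rhyme.txt)

# Built-in record counter: no pattern or per-line action needed.
builtin_total=$(awk 'END {print "Total:", NR, "lines"}' rhyme.txt)

echo "$counted"
echo "$builtin_total"
```

Both commands should report the same total; the explicit counter becomes more useful once the pattern is something narrower than "every line".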
Now, how do we change this command to count words instead of lines?
awk 'BEGIN{count=0} //{count++} END{print "Total:",count,"words"}' RS='[[:space:]]' rhyme.txt
Which outputs:
Total: 20 words
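Notice that the awk program is essentially the same; the only addition is the "RS" setting. Awk processes its input record by record, using a separator to define each record's boundary. Setting Awk's Record Separator (RS) to "[[:space:]]" (any single white-space character) causes Awk to process word-by-word instead of line-by-line. Note that treating a multi-character RS as a regular expression is an extension found in implementations such as gawk and mawk; POSIX only defines the behavior for a single-character RS.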
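For comparison, here is a sketch of a word count that does not rely on changing RS. It assumes the default field splitting, where Awk breaks each line into white-space-separated fields and stores the number of fields in the built-in variable NF. NF is not part of the original example; it is used here as an alternative approach, and the shell variable name is illustrative.

```shell
# Recreate the sample file so the sketch is self-contained.
printf '%s\n' \
  'Hickory dickory dock' \
  'The mouse ran up the clock' \
  'The clock struck one' \
  'The mouse ran down' \
  'Hickory dickory dock' > rhyme.txt

# Add the number of fields (words) on each line to a running total.
word_total=$(awk '{count += NF} END {print "Total:", count, "words"}' rhyme.txt)

echo "$word_total"
```

Summing NF keeps the line-by-line processing model and only changes what is added to the counter for each line.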
Conclusion
Awk is an extremely powerful tool and we have only explored a tiny portion of its capabilities. Hopefully, your curiosity has been piqued, and if you want to learn more, check out the references listed below.