AWK by Example
This post is part of a multi-part series. While each of these posts was designed to be self-contained, you might be interested in reading about Grep and Sed, the other two tools covered previously.
In part 1, we explored Grep, which allowed us to search text. In part 2, we explored Sed, which allowed us to modify text. Now, we'll explore AWK, which allows us to process text in more sophisticated ways.
AWK is a versatile programming language designed for text processing and typically used as a data extraction and reporting tool. In this post, we'll work through several examples.
As before, whenever "rhyme.txt" is referenced, assume it contains the following content:
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
-- "Hickory, Dickory, Dock" (public domain)
Example 1: Print Specific Fields
By default, AWK splits each line into fields based on whitespace. The fields can be referenced using $1, $2, etc., and $0 refers to the entire line. For example, to print the second word of each line:
awk '{print $2}' rhyme.txt
Which would output:
dickory
mouse
clock
mouse
dickory
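Fields can also be combined and reordered. The following is a minimal sketch that recreates rhyme.txt so it can be run on its own, then prints the second word followed by the first. The `out` variable is only there for illustration; it is not part of AWK.

```shell
# Recreate the sample file so the snippet is self-contained.
printf 'Hickory dickory dock\nThe mouse ran up the clock\nThe clock struck one\nThe mouse ran down\nHickory dickory dock\n' > rhyme.txt

# A comma between print arguments inserts the output separator (a space by default).
out=$(awk '{print $2, $1}' rhyme.txt)
printf '%s\n' "$out"
```

Which should output:
dickory Hickory
mouse The
clock The
mouse The
dickory Hickory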
Example 2: Print Lines Matching a Pattern
Like Grep, AWK can search for patterns. When a pattern has no action, AWK prints every matching line:
awk '/mouse/' rhyme.txt
Which would output:
The mouse ran up the clock
The mouse ran down
Example 3: Field Separators
You can specify a different field separator using the -F option. For example, to print the second column of a comma-separated file:
awk -F',' '{print $2}' data.csv
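The post doesn't show the contents of data.csv, so here is a sketch with made-up sample data (the names, ages, and cities are purely hypothetical) to show what the command does:

```shell
# Hypothetical sample data; the real data.csv is not shown in this post.
printf 'name,age,city\nalice,30,Paris\nbob,25,Oslo\n' > data.csv

# Split each line on commas and print the second column.
out=$(awk -F',' '{print $2}' data.csv)
printf '%s\n' "$out"
```

Which should output:
age
30
25

Note that splitting on a bare comma does not understand quoting, so a quoted value that itself contains a comma would be split in two.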
Example 4: Conditional Actions
A pattern doesn't have to be a regular expression. Any expression can act as a condition, and AWK prints the lines for which it is true:
awk 'length($0) > 20' rhyme.txt
This would print lines longer than 20 characters. Only one line in rhyme.txt qualifies, since the rest are 20 characters or shorter:
The mouse ran up the clock
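Conditions can also compare fields and be combined with && and ||. As a small sketch (again recreating rhyme.txt so it runs standalone), this selects lines whose first word is "The" and that are at most 20 characters long:

```shell
# Recreate the sample file so the snippet is self-contained.
printf 'Hickory dickory dock\nThe mouse ran up the clock\nThe clock struck one\nThe mouse ran down\nHickory dickory dock\n' > rhyme.txt

# Both tests must be true for a line to be printed.
out=$(awk '$1 == "The" && length($0) <= 20' rhyme.txt)
printf '%s\n' "$out"
```

Which should output:
The clock struck one
The mouse ran down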
Example 5: Built-in Variables
AWK provides several built-in variables:
- NR: Current record (line) number, counted across all input
- NF: Number of fields in current line
- FS: Field separator (what -F sets)
- RS: Record separator (a newline by default)
For example, to print line numbers:
awk '{print NR ": " $0}' rhyme.txt
Which would output:
1: Hickory dickory dock
2: The mouse ran up the clock
3: The clock struck one
4: The mouse ran down
5: Hickory dickory dock
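NF is handy in two ways: as a count, and as an index, since $NF refers to the last field. A minimal sketch, recreating rhyme.txt so it runs on its own:

```shell
# Recreate the sample file so the snippet is self-contained.
printf 'Hickory dickory dock\nThe mouse ran up the clock\nThe clock struck one\nThe mouse ran down\nHickory dickory dock\n' > rhyme.txt

# Print the line number, the number of words, and the last word of each line.
out=$(awk '{print NR, NF, $NF}' rhyme.txt)
printf '%s\n' "$out"
```

Which should output:
1 3 dock
2 6 clock
3 4 one
4 4 down
5 3 dock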
Example 6: Begin and End Blocks
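AWK provides two special blocks: BEGIN runs once before any input is read, and END runs once after the last line has been processed. The following prints "Start", then every line of the file, then "Done":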
awk 'BEGIN {print "Start"} {print $0} END {print "Done"}' rhyme.txt
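A common use of these blocks is to set something up in BEGIN and report a result in END. The sketch below counts the lines that mention "mouse"; the variable name `count` is just an illustrative choice, and the explicit initialization is optional because AWK variables start out empty (treated as 0).

```shell
# Recreate the sample file so the snippet is self-contained.
printf 'Hickory dickory dock\nThe mouse ran up the clock\nThe clock struck one\nThe mouse ran down\nHickory dickory dock\n' > rhyme.txt

# BEGIN initializes, the pattern rule counts, END reports.
out=$(awk 'BEGIN {count = 0} /mouse/ {count++} END {print count, "lines mention mouse"}' rhyme.txt)
printf '%s\n' "$out"
```

Which should output:
2 lines mention mouse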
Example 7: Arithmetic Operations
AWK can perform calculations:
awk '{sum += NF} END {print "Average words per line:", sum/NR}' rhyme.txt
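This sums the number of fields (NF) on every line and, at the end, divides by the total number of lines (NR). For rhyme.txt, that is 20 words over 5 lines, so it prints "Average words per line: 4".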
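The same pattern works on numeric columns. Here is a sketch using made-up input (the fruit names and counts are hypothetical) that totals the second column and averages it:

```shell
# Hypothetical input: a label and a count on each line.
out=$(printf 'apples 3\npears 5\nplums 4\n' |
  awk '{total += $2} END {print "Total:", total, "Average:", total / NR}')
printf '%s\n' "$out"
```

Which should output:
Total: 12 Average: 4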
Example 8: Multiple Commands
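You can run multiple statements in one action by separating them with semicolons. Each print writes its own line, so the following prints the first and second word of every line on separate lines: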
awk '{print $1; print $2}' rhyme.txt
Example 9: Regular Expressions
AWK patterns are regular expressions, so anchors such as ^ work, and a pattern can be paired with an action:
awk '/^The/ {print "Found:", $0}' rhyme.txt
This would print lines starting with "The":
Found: The mouse ran up the clock
Found: The clock struck one
Found: The mouse ran down
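A regular expression can also be tested against a single field with the ~ operator instead of the whole line. As a sketch (recreating rhyme.txt so it runs standalone), this finds lines whose last word ends in "ock":

```shell
# Recreate the sample file so the snippet is self-contained.
printf 'Hickory dickory dock\nThe mouse ran up the clock\nThe clock struck one\nThe mouse ran down\nHickory dickory dock\n' > rhyme.txt

# $NF ~ /ock$/ is true when the last field ends in "ock".
out=$(awk '$NF ~ /ock$/ {print NR ": " $0}' rhyme.txt)
printf '%s\n' "$out"
```

Which should output:
1: Hickory dickory dock
2: The mouse ran up the clock
5: Hickory dickory dock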
Example 10: Writing Functions
AWK allows you to define functions:
awk '
function capitalize(str) {
return toupper(substr(str,1,1)) substr(str,2)
}
{
print capitalize($1)
}' rhyme.txt
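This capitalizes the first letter of the first word of each line and prints that word. Because every line in rhyme.txt already starts with a capital letter, the output is just the first word of each line, unchanged.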
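To see the function make a visible difference, we can feed it lowercase text and apply it to every word. This is a sketch with made-up lowercase input, reusing the capitalize function from above; the loop over fields is an addition for illustration.

```shell
out=$(printf 'hickory dickory dock\nthe mouse ran up the clock\n' | awk '
function capitalize(str) {
  return toupper(substr(str, 1, 1)) substr(str, 2)
}
{
  # Assigning to a field rebuilds the line, joined by single spaces.
  for (i = 1; i <= NF; i++) $i = capitalize($i)
  print
}')
printf '%s\n' "$out"
```

Which should output:
Hickory Dickory Dock
The Mouse Ran Up The Clock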
Conclusion
These examples demonstrate some of AWK's capabilities. While Grep is great for searching and Sed is great for substitutions, AWK provides a complete programming language for text processing. This makes it particularly well-suited for more complex text processing tasks.
Further Reading:
- http://man7.org/linux/man-pages/man1/gawk.1.html
- http://www.grymoire.com/Unix/AwkRef.html