AWK by Example

Saturday, October 15, 2022

This post is part of a multi-part series. While each of these posts was designed to be self-contained, you might be interested in reading about Grep and Sed, the other two tools covered previously.

We are continuing our multi-part series on command-line text processing tools. In part 1, we explored Grep, which lets us search text. In part 2, we explored Sed, which lets us modify text. Now we'll explore AWK, which lets us process text in more sophisticated ways.

AWK is a versatile programming language designed for text processing and typically used as a data extraction and reporting tool. In this post, we'll work through several examples.

As before, whenever "rhyme.txt" is referenced, assume it contains the following five lines, with no blank lines between them:

Hickory dickory dock

The mouse ran up the clock

The clock struck one

The mouse ran down

Hickory dickory dock

-- "Hickory, Dickory, Dock" (public domain)

Example 1: Print Specific Fields

By default, AWK splits each line into fields based on whitespace. The fields can be referenced using $1, $2, etc., while $0 refers to the whole line. For example, to print the second word of each line:

awk '{print $2}' rhyme.txt

Which would output:

dickory

mouse

clock

mouse

dickory
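Fields can also be combined in a single print. The following is a minimal sketch rather than part of the original example: it recreates rhyme.txt so it can be run on its own, then prints the first and last word of each line. It relies on the built-in variable NF (the number of fields in the line), so $NF is the last field.

```shell
# Recreate the sample file used throughout this post.
cat > rhyme.txt <<'EOF'
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
EOF

# $1 is the first field; $NF is the last one (NF = number of fields).
# The comma in print inserts a single space between the two values.
awk '{print $1, $NF}' rhyme.txt
# Hickory dock
# The clock
# The one
# The down
# Hickory dock
```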

Example 2: Print Lines Matching a Pattern

Like Grep, AWK can search for patterns. When a pattern is given without an action, AWK prints every line that matches:

awk '/mouse/' rhyme.txt

Which would output:

The mouse ran up the clock

The mouse ran down

Example 3: Field Separators

You can specify a different field separator using the -F option. For example, to print the second column of a comma-separated file:

awk -F',' '{print $2}' data.csv
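The post does not show the contents of data.csv, so here is a sketch with a hypothetical three-column file (the names and values are made up for illustration). It also shows one way to skip a header row. Note that splitting on a plain comma does not understand quoted fields that themselves contain commas, so this approach is best suited to simple files.

```shell
# Hypothetical sample data; the original post does not show data.csv.
cat > data.csv <<'EOF'
name,role,city
Ada,engineer,London
Linus,maintainer,Portland
EOF

# Print the second comma-separated field of every line.
awk -F',' '{print $2}' data.csv
# role
# engineer
# maintainer

# Skip the header by acting only on records after the first one.
awk -F',' 'NR > 1 {print $2}' data.csv
# engineer
# maintainer
```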

Example 4: Conditional Actions

AWK allows you to specify conditions:

awk 'length($0) > 20' rhyme.txt

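This would print only lines longer than 20 characters. "Hickory dickory dock" and "The clock struck one" are exactly 20 characters long, so just one line qualifies: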

The mouse ran up the clock
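A condition can also live inside an action as an if/else statement, which is handy when every line should produce some output. The sketch below is an illustration, not part of the original example; the "long:" and "short:" labels are arbitrary, and the file is recreated so the snippet runs on its own.

```shell
# Recreate the sample file used throughout this post.
cat > rhyme.txt <<'EOF'
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
EOF

# Label each line according to its length instead of filtering it out.
awk '{
  if (length($0) > 20) {
    print "long:  " $0
  } else {
    print "short: " $0
  }
}' rhyme.txt
# short: Hickory dickory dock
# long:  The mouse ran up the clock
# short: The clock struck one
# short: The mouse ran down
# short: Hickory dickory dock
```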

Example 5: Built-in Variables

AWK provides several built-in variables:

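  • NR: Number of the current record (by default, the line number), counted across all input files
  • NF: Number of fields in the current line
  • FS: Input field separator (the value that -F sets)
  • RS: Input record separator (a newline by default)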

For example, to print line numbers:

awk '{print NR ": " $0}' rhyme.txt

Would output:

1: Hickory dickory dock

2: The mouse ran up the clock

3: The clock struck one

4: The mouse ran down

5: Hickory dickory dock
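The other variables in the list can be tried the same way. This is a small sketch, assuming nothing beyond standard AWK: the first command reports NF for each line of rhyme.txt (recreated here so the snippet is self-contained), and the second sets RS in a BEGIN block so that a semicolon, rather than a newline, ends each record. The semicolon-separated input is made up for illustration.

```shell
# Recreate the sample file used throughout this post.
cat > rhyme.txt <<'EOF'
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
EOF

# NF holds the number of whitespace-separated fields on the current line.
awk '{print "line " NR " has " NF " fields"}' rhyme.txt
# line 1 has 3 fields
# line 2 has 6 fields
# line 3 has 4 fields
# line 4 has 4 fields
# line 5 has 3 fields

# RS changes what counts as a "line": here records are split on ";".
printf 'one;two;three' | awk 'BEGIN {RS = ";"} {print NR ": " $0}'
# 1: one
# 2: two
# 3: three
```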

Example 6: Begin and End Blocks

AWK provides the special BEGIN and END blocks, which run once before the first line is read and once after the last line has been processed:

awk 'BEGIN {print "Start"} {print $0} END {print "Done"}' rhyme.txt
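A common use of these blocks is to set up a counter and report it at the end. The following sketch is an illustration rather than part of the original example: the variable name count and the wording of the report are arbitrary choices, and rhyme.txt is recreated so the snippet runs on its own.

```shell
# Recreate the sample file used throughout this post.
cat > rhyme.txt <<'EOF'
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
EOF

# BEGIN runs once before any input, the middle rule runs for each
# matching line, and END runs once after the last line.
awk '
BEGIN   { count = 0 }
/mouse/ { count++ }
END     { print "Lines mentioning the mouse:", count }
' rhyme.txt
# Lines mentioning the mouse: 2
```

Initializing count in BEGIN is optional, since AWK treats unset variables as zero in arithmetic, but it makes the intent explicit.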

Example 7: Arithmetic Operations

AWK can perform calculations:

awk '{sum += NF} END {print "Average words per line:", sum/NR}' rhyme.txt

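This sums the field count (NF) of every line and, once all input has been read, divides the total by the number of lines (NR). For rhyme.txt that is 20 words over 5 lines, so it prints "Average words per line: 4".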
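One caveat with this calculation is that NR is 0 when the input is empty, and dividing by zero is an error in AWK. A hedged variant that guards against this might look as follows; the "No input lines" message is an arbitrary choice, and /dev/null stands in for an empty file.

```shell
# Recreate the sample file used throughout this post.
cat > rhyme.txt <<'EOF'
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
EOF

# Same average as before, but only divide when at least one line was read.
awk '
{ sum += NF }
END {
  if (NR > 0) {
    print "Average words per line:", sum / NR
  } else {
    print "No input lines"
  }
}' rhyme.txt
# Average words per line: 4

# With empty input, the guard avoids a division by zero.
awk '
{ sum += NF }
END {
  if (NR > 0) {
    print "Average words per line:", sum / NR
  } else {
    print "No input lines"
  }
}' /dev/null
# No input lines
```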

Example 8: Multiple Commands

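You can run multiple statements within a single action by separating them with semicolons. Here, the first and second word of each line are printed on separate lines: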

awk '{print $1; print $2}' rhyme.txt

Example 9: Regular Expressions

AWK supports regular expressions for pattern matching:

awk '/^The/ {print "Found:", $0}' rhyme.txt

This would print lines starting with "The":

Found: The mouse ran up the clock

Found: The clock struck one

Found: The mouse ran down
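A regular expression can also be tested against a single field instead of the whole line by using the ~ operator. As a small sketch (not part of the original example), the command below prints the lines whose last word ends in "ock"; rhyme.txt is recreated so the snippet runs on its own.

```shell
# Recreate the sample file used throughout this post.
cat > rhyme.txt <<'EOF'
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
EOF

# "$NF ~ /ock$/" is true when the last field matches the regex ock$.
awk '$NF ~ /ock$/' rhyme.txt
# Hickory dickory dock
# The mouse ran up the clock
# Hickory dickory dock
```

The line "The clock struck one" is skipped even though it contains "clock", because only the last field is tested. Using !~ instead of ~ would invert the match.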

Example 10: Writing Functions

AWK allows you to define functions:

awk '
function capitalize(str) {
  return toupper(substr(str,1,1)) substr(str,2)
}
{
  print capitalize($1)
}' rhyme.txt

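This would print the first word of each line with its first letter converted to uppercase. Because every line in rhyme.txt already starts with a capital letter, the output is simply the first word of each line: Hickory, The, The, The, Hickory.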
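To see the function change something, it can be applied to every field rather than just the first. This is a sketch building on the capitalize function above; the loop over fields is an addition for illustration, and rhyme.txt is recreated so the snippet runs on its own.

```shell
# Recreate the sample file used throughout this post.
cat > rhyme.txt <<'EOF'
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
EOF

# Capitalize every word: assigning to $i rebuilds the line ($0),
# joining the fields with single spaces, and a bare print outputs it.
awk '
function capitalize(str) {
  return toupper(substr(str, 1, 1)) substr(str, 2)
}
{
  for (i = 1; i <= NF; i++) {
    $i = capitalize($i)
  }
  print
}' rhyme.txt
# Hickory Dickory Dock
# The Mouse Ran Up The Clock
# The Clock Struck One
# The Mouse Ran Down
# Hickory Dickory Dock
```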

Conclusion

These examples demonstrate some of AWK's capabilities. While Grep is great for searching and Sed is great for substitutions, AWK provides a complete programming language for text processing. This makes it particularly well-suited for more complex text processing tasks.

Further Reading:

  • http://man7.org/linux/man-pages/man1/gawk.1.html
  • http://www.grymoire.com/Unix/AwkRef.html