Sed by Example
We are in the middle of a multipart series. Each post focuses on one member of the command-line text-processing trifecta: Grep, Sed and Awk. In part 1, we introduced Grep, which allowed us to search and select text. Now, we will explore Sed.
Sed stands for stream editor. As its name implies, it manipulates and edits data files and streams that are piped into the program. In this post, we'll work through several examples.
As before, whenever "rhyme.txt" is referenced, assume it contains the following content:
Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock
-- "Hickory, Dickory, Dock" (public domain)
Example 1: Find and Replace
If we wanted to change the rhyme to be about cats, we could run the following command:
sed 's/mouse/cat/' rhyme.txt
Which would produce the following output:
Hickory dickory dock
The cat ran up the clock
The clock struck one
The cat ran down
Hickory dickory dock
Note that, by default, all lines are returned, even if they don't match the pattern.
As stated before, Sed is a stream editor. To have it operate on a data stream instead of a file, you can stream the file into Sed using one of the following commands:
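If you only want the lines that actually changed, one option is to combine Sed's "-n" option (which suppresses the automatic printing of every line) with the "p" flag on the substitution. The sketch below is only an illustration: it rebuilds the rhyme in a shell variable (the names "rhyme" and "only_changed" are made up for this example) so it can run without a rhyme.txt file on disk.

```shell
# The rhyme, held in a variable so the example is self-contained.
rhyme='Hickory dickory dock
The mouse ran up the clock
The clock struck one
The mouse ran down
Hickory dickory dock'

# -n turns off automatic printing; the trailing "p" prints a line
# only when the substitution succeeded on it.
only_changed=$(printf '%s\n' "$rhyme" | sed -n 's/mouse/cat/p')
printf '%s\n' "$only_changed"
```

This should print just the two lines that mention the cat.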
sed 's/mouse/cat/' < rhyme.txt
or
cat rhyme.txt | sed 's/mouse/cat/'
Both commands will produce the same output.
Example 2: Adding Commas
Let's say that we wanted to add a comma to the end of every line. Using Sed with regular expressions allows us to do this easily:
sed 's/$/,/' rhyme.txt
Remember that '$' in regular expressions matches the end of the line. In other words, we are replacing the end of the line with a comma, which is how you append content to lines in Sed.
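The same trick works at the other end of the line: "^" matches the start of a line, so replacing it prepends text. As a small sketch (the "> " prefix and the variable name "quoted" are arbitrary choices for this illustration), here is how two lines of the rhyme could be turned into an email-style quote:

```shell
# "^" anchors at the start of each line, so the replacement text
# is inserted in front of the existing content.
quoted=$(printf 'Hickory dickory dock\nThe mouse ran up the clock\n' | sed 's/^/> /')
printf '%s\n' "$quoted"
```

Each output line should now begin with "> ".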
Example 3: Adding Dashes
If we were interested in adding dashes between each word, we might be tempted to use the following regular expression:
sed 's/\s*/-/g' rhyme.txt
We would be correct in thinking that "\s" matches any white-space character (which is what we want). However, since "*" means match zero or more occurrences, the pattern also matches the empty string at every position between characters. Sed inserts a dash at each of those positions too, which produces the following output:
-H-i-c-k-o-r-y-d-i-c-k-o-r-y-d-o-c-k-
-T-h-e-m-o-u-s-e-r-a-n-u-p-t-h-e-c-l-o-c-k-
-T-h-e-c-l-o-c-k-s-t-r-u-c-k-o-n-e-
-T-h-e-m-o-u-s-e-r-a-n-d-o-w-n-
-H-i-c-k-o-r-y-d-i-c-k-o-r-y-d-o-c-k-
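The zero-length matches are easier to see on a tiny input. The sketch below uses a literal space instead of "\s" so that it does not depend on any particular Sed flavor; the variable names are invented for the example. The second command writes "one or more spaces" as a space followed by " *", which is the basic-syntax way to avoid the empty match.

```shell
# " *" can match nothing at all, so a dash lands at every position.
zero_or_more=$(printf 'a b\n' | sed 's/ */-/g')
printf '%s\n' "$zero_or_more"   # -a-b-

# A space followed by " *" needs at least one space to match.
one_or_more=$(printf 'a b\n' | sed 's/  */-/g')
printf '%s\n' "$one_or_more"    # a-b
```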
We actually want to use "+" to match one or more occurrences (versus "*", which matches zero or more):
sed -r 's/\s+/-/g' rhyme.txt
Which correctly produces:
Hickory-dickory-dock
The-mouse-ran-up-the-clock
The-clock-struck-one
The-mouse-ran-down
Hickory-dickory-dock
Please note:
- You must run Sed with the "-r" flag to enable the extended regular expression syntax; otherwise "+" is treated as a literal plus sign.
- By default, Sed only replaces the first match on each line. Since we want to replace every match on a line, we have to append the "g" (global) flag to the end of the command.
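To make the effect of the "g" flag concrete, the sketch below runs the same substitution three ways on a made-up line: with no flag, with "g", and with a number, which tells Sed to replace only that occurrence. The input text and variable names are assumptions chosen for the illustration.

```shell
line='dock dock dock'

# No flag: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/dock/clock/')
printf '%s\n' "$first_only"    # clock dock dock

# "g": every match on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/dock/clock/g')
printf '%s\n' "$every"         # clock clock clock

# A number N: only the Nth match is replaced.
second_only=$(printf '%s\n' "$line" | sed 's/dock/clock/2')
printf '%s\n' "$second_only"   # dock clock dock
```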
Example 4: Grouping Word Pairs
Let's take the previous example one step further. Instead of simply adding dashes, we want to use dashes to group every other word together. For example, instead of producing:
The-mouse-ran-up-the-clock
We want Sed to output:
The-mouse ran-up the-clock
To do this, we have to use the following Sed command:
sed -r 's/(\w+)\s+(\w+)/\1-\2/g' rhyme.txt
Which will produce:
Hickory-dickory dock
The-mouse ran-up the-clock
The-clock struck-one
The-mouse ran-down
Hickory-dickory dock
There are two new concepts introduced by our Sed command:
- We define a sub-pattern group using parentheses. This allows us to refer back to part of the matched pattern without having to refer to the entire match.
- To reference a sub-pattern group, use "\N" where "N" is "1" for the first group, "2" for the second group, etc.
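Group references can also appear in a different order than the groups were matched, and "&" stands for the entire match. The sketch below shows both on made-up input. It uses "-E", which GNU Sed accepts as another spelling of "-r", and "[a-z]+" in place of "\w+" so that it does not rely on GNU-only escapes; the variable names are hypothetical.

```shell
# Swap the first two lowercase words by emitting group 2 before group 1.
swapped=$(printf 'dickory hickory dock\n' | sed -E 's/([a-z]+) ([a-z]+)/\2 \1/')
printf '%s\n' "$swapped"     # hickory dickory dock

# "&" inserts the whole matched text, here wrapped in brackets.
bracketed=$(printf 'The clock struck one\n' | sed 's/clock/[&]/')
printf '%s\n' "$bracketed"   # The [clock] struck one
```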
Keep Learning!
Sed is insanely powerful! See below for sites where you can learn more.