Linux Sysadmin
Using the sed Editor
By Emmett Dulaney
The sed editor is among the most useful assets in the Linux sysadmin's toolbox, so it pays to understand its applications thoroughly.
One of the best things about the Linux operating
system is that it is crammed full of utilities. There are so many
different utilities, in fact, that it is next to impossible to know and
understand all of them. One utility that can simplify life in key
situations is sed. It is one of the most powerful tools in any
administrator's toolkit and can prove itself invaluable in a crunch.
The sed utility is an "editor," but it is unlike
most others. In addition to not being screen-oriented, it is also
noninteractive. This means you have to insert commands to be executed
on the data at the command line or in a script to be processed. When
you visualize it, forget any ability to interactively edit files as you
would do with Microsoft Word or most other editors. sed accepts a
series of commands and executes them on a file (or set of files)
noninteractively and unquestioningly. As such, it flows through text as
water would through a stream, and thus sed fittingly stands for stream editor.
It can be used to change all occurrences of "Mr. Smyth" to "Mr. Smith"
or "tiger cub" to "wolf cub." The stream editor is ideally suited to
performing repetitive edits that would take considerable time if done
manually. The parameters can be as limited as those needed for a
one-time use of a simple operation, or as complex as a script file
filled with thousands of lines of editing changes to be made. With very
little argument, sed is one of the most useful tools in the Linux and
UNIX tool chest.
How sed Works
The sed utility works by sequentially reading a
file, line by line, into memory. It then performs all actions specified
for the line and places the line back in memory to dump to the terminal
with the requested changes made. After all actions have taken place to
this one line, it reads the next line of the file and repeats the
process until it is finished with the file. As mentioned, the default
output is to display the contents of each line on the screen. Two
important factors come into play here—first, the output can be
redirected to another file to save the changes; second, the original
file, by default, is left unchanged. The default is for sed to read the
entire file and make changes to each line within it. It can, however,
be restricted to specified lines as needed.
The syntax for the utility is:
sed [options] '{command}' [filename]
In this article, we'll walk through the most
commonly used commands and options and illustrate how they work and
where they would be appropriate for use.
The Substitute Command
One of the most common uses of the sed utility,
and any similar editor, is to substitute one value for another. To
accomplish this, the syntax for the command portion of the operation is:
's/{old value}/{new value}/'
Thus, the following illustrates how "tiger" can be changed to "wolf" very simply:
$ echo The tiger cubs will meet on Tuesday after school | sed 's/tiger/wolf/'
The wolf cubs will meet on Tuesday after school
$
Notice that it is not necessary to specify a
filename if input is being derived from the output of a preceding
command. The same is true for awk, sort, and most other Linux/UNIX
command-line utility programs.
Multiple Changes
If multiple changes need to be made to the same
file or line, there are three methods by which this can be
accomplished. The first is to use the "-e" option, which informs the
program that more than one editing command is being used. For example:
$ echo The tiger cubs will meet on Tuesday after school | sed -e 's/tiger/wolf/' -e 's/after/before/'
The wolf cubs will meet on Tuesday before school
$
This is pretty much the long way of going about
it, and the "-e" option is not used a great deal. A
preferable way is to separate the commands with semicolons:
$ echo The tiger cubs will meet on Tuesday after school | sed 's/tiger/wolf/; s/after/before/'
The wolf cubs will meet on Tuesday before school
$
Notice that the semicolon is the next
character following the slash. Some sed implementations treat a space
between the two as an error and return an error message, although GNU
sed accepts it. These two methods are well and good, but there is one more
method that many administrators prefer. The key thing to note is that
everything between the two apostrophes (' ') is interpreted as sed
commands. The shell program reading in the commands will not assume you
are finished entering until the second apostrophe is entered. This
means that the command can be entered on multiple lines, with the shell
changing the prompt from PS1 to the continuation prompt PS2 (usually
">"), until the second apostrophe is entered. As soon as it is
entered, and Enter pressed, the processing will take place and the same
results will be generated, as the following illustrates:
$ echo The tiger cubs will meet on Tuesday after school | sed '
> s/tiger/wolf/
> s/after/before/'
The wolf cubs will meet on Tuesday before school
$
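As a quick check that the three methods really are interchangeable, the sketch below runs each form non-interactively and stores the result. The variable names (line, with_e, with_semicolon, with_newline) are only illustrative and are not part of sed.

```shell
# Three equivalent ways to apply two substitutions to the same line.
line='The tiger cubs will meet on Tuesday after school'

# 1. One -e option per editing command
with_e=$(printf '%s\n' "$line" | sed -e 's/tiger/wolf/' -e 's/after/before/')

# 2. Commands separated by a semicolon
with_semicolon=$(printf '%s\n' "$line" | sed 's/tiger/wolf/;s/after/before/')

# 3. One command per line inside a single pair of apostrophes
with_newline=$(printf '%s\n' "$line" | sed '
s/tiger/wolf/
s/after/before/')

printf '%s\n' "$with_e" "$with_semicolon" "$with_newline"
```

All three lines of output should be identical.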
Global Changes
Let's begin with a deceptively simple edit.
Suppose the message that is to be changed contains more than one
occurrence of the item to be changed. By default, the result can be
different than what was expected, as the following illustrates:
$ echo The tiger cubs will meet this Tuesday at the same time as the meeting last Tuesday | sed 's/Tuesday/Thursday/'
The tiger cubs will meet this Thursday at the same time as the meeting last Tuesday
$
Instead of changing every occurrence of
"Tuesday" to "Thursday," the sed editor makes the first change it finds
on a line and then moves on, leaving the rest of that line untouched.
By default, the substitute command acts only on the first occurrence of
the chosen sequence in each line. In order
for every occurrence to be substituted, in the event that more than one
occurrence appears in the same line, you must specify for the action to
take place globally by adding the "g" flag:
$ echo The tiger cubs will meet this Tuesday at the same time as the meeting last Tuesday | sed 's/Tuesday/Thursday/g'
The tiger cubs will meet this Thursday at the same time as the meeting last Thursday
$
Bear in mind that this need for globalization is
true whether the sequence you are looking for consists of only one
character or a phrase.
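Besides "g," the substitute command also accepts a number as its flag, which selects a single occurrence on the line. The sketch below compares the default, the numeric flag, and the global flag on a shortened version of the sentence; the variable names are illustrative only.

```shell
line='meet this Tuesday at the same time as last Tuesday'

# Default: only the first match on the line is replaced
first_only=$(printf '%s\n' "$line" | sed 's/Tuesday/Thursday/')

# Numeric flag: only the second match on the line is replaced
second_only=$(printf '%s\n' "$line" | sed 's/Tuesday/Thursday/2')

# g flag: every match on the line is replaced
every=$(printf '%s\n' "$line" | sed 's/Tuesday/Thursday/g')

printf '%s\n' "$first_only" "$second_only" "$every"
```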
sed can also be used to change record field
delimiters from one to another. For example, the following will change
all tabs to spaces:
sed 's/ / /g'
where the entry between the first set of slashes
is a tab, while the entry between the second set is a space. As a
general rule, sed can be used to change any printable character to any
other printable character. Unprintable characters, such as a bell, are
harder to type into a sed command; tr is often the more convenient tool
for translating or deleting them, although tr maps single characters
and cannot expand one into a whole word such as "bell."
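Because a literal tab is hard to see and easy to lose when a command is copied, one option is to keep the tab in a shell variable and let the shell place it into the sed command. This is a minimal sketch: the variable names are hypothetical, and double quotes are used around the sed command so that the shell expands $tab. The last command shows tr deleting a bell character.

```shell
# Hold a literal tab in a variable so it is visible by name
tab=$(printf '\t')

# Change every tab in the input to a single space
spaced=$(printf 'one\t1\ttwo\n' | sed "s/$tab/ /g")
printf '%s\n' "$spaced"

# tr works on single characters: here it deletes the bell (\a)
cleaned=$(printf 'ding\a\n' | tr -d '\a')
printf '%s\n' "$cleaned"
```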
Sometimes, you don't want to change every
occurrence that appears in a file. At times, you only want to make a
change if certain conditions are met—for example, following a match of
some other data. To illustrate, consider the following text file:
$ cat sample_one
one 1
two 1
three 1
one 1
two 1
two 1
three 1
$
Suppose that it would be desirable for "1" to be
substituted with "2," but only after the word "two" and not throughout
every line. This can be accomplished by specifying that a match is to
be found before giving the substitute command:
$ sed '/two/ s/1/2/' sample_one
one 1
two 2
three 1
one 1
two 2
two 2
three 1
$
And now, to make it even more accurate:
$ sed '
> /two/ s/1/2/
> /three/ s/1/3/' sample_one
one 1
two 2
three 3
one 1
two 2
two 2
three 3
$
Bear in mind once again that the only thing
changed is the display. If you look at the original file, it is the
same as it always was. You must save the output to another file to
create permanence. It is worth repeating that the fact that changes are
not made to the original file is a true blessing in disguise—it lets
you experiment with the file without causing any real harm, until you
get the right commands working exactly the way you expect and want them
to.
The following saves the changed output to a new file:
$ sed '
> /two/ s/1/2/
> /three/ s/1/3/' sample_one > sample_two
The output file has all the changes incorporated
in it that would normally appear on the screen. It can now be viewed
with head, cat, or any other similar utility.
Script Files
The sed tool allows you to create a script file
containing commands that are processed from the file, rather than at
the command line, and is referenced via the "-f" option. By creating a
script file, you have the ability to run the same operations over and
over again, and to specify far more detailed operations than what you
would want to try to tackle from the command line each time.
Consider the following script file:
$ cat sedlist
/two/ s/1/2/
/three/ s/1/3/
$
It can now be used on the data file to obtain the same results we saw earlier:
$ sed -f sedlist sample_one
one 1
two 2
three 3
one 1
two 2
two 2
three 3
$
Notice that apostrophes are not used inside the
source file, or from the command line when the "-f" option is invoked.
Script files, also known as source files, are invaluable for operations
that you intend to repeat more than once and for complicated commands
where there is a possibility that you may make an error at the command
line. It is far easier to edit the source file and change one character
than to retype a multiple-line entry at the command line.
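When a script file needs to be created from within another shell script, a here-document is one way to do it. The sketch below rebuilds sample_one and sedlist in a temporary directory (the directory and the variable names are assumptions for illustration) and then runs sed with "-f".

```shell
workdir=$(mktemp -d)

# Recreate the sample data file used throughout the article
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' 'two 1' 'two 1' 'three 1' \
  > "$workdir/sample_one"

# Write the sed commands to a script file; no apostrophes are needed inside it
cat > "$workdir/sedlist" <<'EOF'
/two/ s/1/2/
/three/ s/1/3/
EOF

result=$(sed -f "$workdir/sedlist" "$workdir/sample_one")
printf '%s\n' "$result"

rm -rf "$workdir"
```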
Restricting Lines
The default is for the editor to look at, and
for editing to take place on, every line that is input to the stream
editor. This can be changed by specifying restrictions preceding the
command. For example, to substitute "1" with "2" only in the fifth and
sixth lines of the sample file's output, the command would be:
$ sed '5,6 s/1/2/' sample_one
one 1
two 1
three 1
one 1
two 2
two 2
three 1
$
In this case, since the lines to change were
specified by number, no pattern match was needed in front of the
substitute command. Thus you
have the flexibility of choosing which lines to change (essentially,
restricting the changes) based upon matching criteria that can be
either line numbers or a matched pattern.
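Line numbers and patterns can also be mixed within one restriction. The sketch below shows two further forms: "$" on its own as the address of the last line, and a range that starts at a line number and ends at the next line matching a pattern. The temporary directory and variable names are illustrative assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' 'two 1' 'two 1' 'three 1' \
  > "$workdir/sample_one"

# Substitute on the last line only
last_only=$(sed '$ s/1/2/' "$workdir/sample_one")

# Substitute from line 2 through the next line that contains "three"
to_pattern=$(sed '2,/three/ s/1/2/' "$workdir/sample_one")

printf '%s\n' "$last_only"
printf '%s\n' '--'
printf '%s\n' "$to_pattern"

rm -rf "$workdir"
```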
Prohibiting the Display
The default is for sed to display on the screen
(or to a file, if so redirected) every line from the original file,
whether it is affected by an edit operation or not; the "-n" parameter
overrides this action. "-n" overrides all printing and displays no
lines whatsoever, whether they were changed by the edit or not. For
example:
$ sed -n -f sedlist sample_one
$
$ sed -n -f sedlist sample_one > sample_two
$ cat sample_two
$
In the first example, nothing is displayed on
the screen. In the second example, nothing is printed, and thus nothing
is written to the new file, which ends up empty. Doesn't this negate
the whole purpose of the edit? Why is this useful? It is useful only
because the "-n" option can be overridden by the print
command (p), given on its own or as a flag on a substitution. To illustrate, suppose the script file were modified to
now resemble the following:
$ cat sedlist
/two/ s/1/2/p
/three/ s/1/3/p
$
Then this would be the result of running it:
$ sed -n -f sedlist sample_one
two 2
three 3
two 2
two 2
three 3
$
Lines that stay the same as they were are not
displayed at all. Only the lines affected by the edit are displayed. In
this manner, it is possible to pull those lines only, make the changes,
and place them in a separate file:
$ sed -n -f sedlist sample_one > sample_two
$
$ cat sample_two
two 2
three 3
two 2
two 2
three 3
$
Another method of utilizing this is to print
only a set number of lines. For example, to print only lines two
through six while making no other editing changes:
$ sed -n '2,6p' sample_one
two 1
three 1
one 1
two 1
two 1
$
All other lines are ignored, and only lines two
through six are printed as output. This is awkward to do with
head or tail alone: head will print the top of
a file, and tail will print the bottom, but sed allows you to pull
anything you want to from anywhere.
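For comparison, the same range can be pulled out by chaining head and tail, at the cost of working out how many lines tail must keep. The sketch below runs both approaches on the sample data; the temporary directory and variable names are illustrative assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' 'two 1' 'two 1' 'three 1' \
  > "$workdir/sample_one"

# sed: name the first and last line directly
with_sed=$(sed -n '2,6p' "$workdir/sample_one")

# head and tail: keep the first 6 lines, then the last 5 of those (6 - 2 + 1)
with_head_tail=$(head -n 6 "$workdir/sample_one" | tail -n 5)

printf '%s\n' "$with_sed"

rm -rf "$workdir"
```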
Deleting Lines
Substituting one value for another is far from
the only function that can be performed with a stream editor. There are
many more possibilities, and the second-most-used function in my
opinion is delete. Delete works in the same manner as substitute, only
it removes the specified lines (if you want to remove a word and not a
line, don't think of deleting, but think of substituting it for nothing—s/cat//).
The syntax for the command is:
'{what to find} d'
To remove all of the lines containing "two" from the sample_one file:
$ sed '/two/ d' sample_one
one 1
three 1
one 1
three 1
$
To remove the first three lines from the display, regardless of what they are:
$ sed '1,3 d' sample_one
one 1
two 1
two 1
three 1
$
Only the remaining lines are shown, and the
first three cease to exist in the display. There are several things to
keep in mind with the stream editor as they relate to regular
expressions in general, and as they apply to deletions in particular:
- The caret (^) signifies the beginning of a line, thus
sed '/^two/ d' sample_one
would only delete the line if "two" were the first three characters of the line.
- The dollar sign ($) represents the end of a line within a pattern (used on its own as an address, it means the last line of the file), thus
sed '/two$/ d' sample_one
would delete the line only if "two" were the last three characters of the line.
The result of putting these two together:
sed '/^$/ d' {filename}
deletes all blank lines from a file. For
example, the following replaces "1" with "2" as well as "1" with "3"
and removes any blank lines in the file:
$ sed '/two/ s/1/2/; /three/ s/1/3/; /^$/ d' sample_one
one 1
two 2
three 3
one 1
two 2
two 2
three 3
$
A common use for this is to delete a header. The
following command will delete all lines in a file, from the first line
through to the first blank line:
sed '1,/^$/ d' {filename}
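To see the header deletion at work, the sketch below feeds sed a made-up six-line message (two header lines, a blank line, and a body that itself contains a blank line). The message text and variable names are invented for illustration.

```shell
# A hypothetical message: header, blank separator, then the body
message=$(printf '%s\n' 'From: sender' 'Subject: test' '' 'body line 1' '' 'body line 2')

# Delete from line 1 through the first blank line; later blank lines survive
body=$(printf '%s\n' "$message" | sed '1,/^$/ d')

printf '%s\n' "$body"
```

Because the range starts only at line 1, the blank line inside the body does not trigger a second deletion.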
Appending and Inserting Text
Text can be appended to the end of a file by using sed with the "a" (append) command. This is done in the following manner:
$ sed '$a\
> This is where we stop\
> the test' sample_one
one 1
two 1
three 1
one 1
two 1
two 1
three 1
This is where we stop
the test
$
Within the command, the dollar sign ($)
signifies that the text is to be appended to the end of the file. The
backslashes (\) are necessary to signify that a newline is
coming. If they are left out, an error will result proclaiming that the
command is garbled; anywhere that a newline is to be entered within the
command, you must use the backslash.
To append the lines into the fourth and fifth positions instead of at the end, the command becomes:
$ sed '3a\
> This is where we stop\
> the test' sample_one
one 1
two 1
three 1
This is where we stop
the test
one 1
two 1
two 1
three 1
$
This appends the text after the third line. As
with almost any editor, you can choose to insert rather than append if
you so desire. The difference between the two is that append places the
text after the line specified, and insert places it before that line. When
using insert instead of append, just replace the "a" with an "i," as
shown below:
$ sed '3i\
> This is where we stop\
> the test' sample_one
one 1
two 1
This is where we stop
the test
three 1
one 1
two 1
two 1
three 1
$
The new text appears in the middle of the output, and processing resumes normally after the specified operation is carried out.
Reading and Writing Files
The ability to redirect the output has already
been illustrated, but it needs to be pointed out that files can be read
in and written out to simultaneously during operation of the editing
commands. For example, to perform the substitution and write
lines one through three to a file called sample_three:
$ sed '
> /two/ s/1/2/
> /three/ s/1/3/
> 1,3 w sample_three' sample_one
one 1
two 2
three 3
one 1
two 2
two 2
three 3
$
$ cat sample_three
one 1
two 2
three 3
$
Only the lines specified are written to the new
file, thanks to the "1,3" specification given to the w (write) command.
Regardless of those written, all lines are displayed in the default
output.
The Change Command
In addition to substituting entries, it is
possible to change the lines from one value to another. The thing to
keep in mind is that substitute works on a character-for-character
basis, whereas change functions like delete in that it affects the
entire line:
$ sed '/two/ c\
> We are no longer using two' sample_one
one 1
We are no longer using two
three 1
one 1
We are no longer using two
We are no longer using two
three 1
$
Working much like substitute, the change command
is greater in scale—completely replacing the one entry for another,
regardless of character content, or context. At the risk of overstating
the obvious, when substitute was used, then only the character "1" was
replaced with "2," while when using change, the entire original line
was modified. In both situations, the match to look for was simply the
"two."
Change All but...
With most sed commands, the functions are
spelled out as to what changes are to take place. Using the exclamation
mark, it is possible to have the changes take place everywhere but
those specified—completely reversing the default operation.
For example, to delete all lines that contain the phrase "two," the operation is:
$ sed '/two/ d' sample_one
one 1
three 1
one 1
three 1
$
And to delete all lines except those that contain the phrase "two," the syntax becomes:
$ sed '/two/ !d' sample_one
two 1
two 1
two 1
$
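Deleting everything except the matching lines gives the same result as printing only the matching lines, so the "!d" form can be checked against "-n" with "p" and against grep. The sketch below does that on the sample data; the temporary directory and variable names are illustrative assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' 'two 1' 'two 1' 'three 1' \
  > "$workdir/sample_one"

# Delete every line that does NOT contain "two"
keep_with_not_d=$(sed '/two/!d' "$workdir/sample_one")

# Suppress default output and print only lines that contain "two"
keep_with_p=$(sed -n '/two/p' "$workdir/sample_one")

# The same selection with grep
keep_with_grep=$(grep 'two' "$workdir/sample_one")

printf '%s\n' "$keep_with_not_d"

rm -rf "$workdir"
```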
If you have a file that contains a list of items
and want to perform an operation on each of the items in the file, then
it is important that you first do an intelligent scan of those entries
and think about what you are doing. To make matters easier, you can do
so by combining sed with any iteration routine (for, while, until).
As an example, assume you have a text file named "animals" with the following entries:
pig
horse
elephant
cow
dog
cat
And you want to run the following routine:
#mcd.ksh
for I in $*
do
echo Old McDonald had a $I
echo E-I, E-I-O
done
The result will be that each animal is printed at
the end of "Old McDonald had a." While this is correct for the majority
of the entries, it is grammatically incorrect for the "elephant" entry,
as the result should be "an elephant" rather than "a elephant." Using
sed, you can scan the output from your shell file for such grammatical
errors and correct them on the fly, by first creating a file of
commands:
#sublist
/ a a/ s/ a / an /
/ a e/ s/ a / an /
/ a i/ s/ a / an /
/ a o/ s/ a / an /
/ a u/ s/ a / an /
and then executing the process as follows:
$ sh mcd.ksh `cat animals` | sed -f sublist
Now, after the mcd script has been run, sed will
scan the output for anywhere that the single letter a (space, "a,"
space) is followed by a vowel. If such exists, it will change the
sequence to space, "an," space. This corrects the problem before it
ever prints on the screen and ensures that editors everywhere sleep
easier at night. The result is:
Old McDonald had a pig
E-I, E-I-O
Old McDonald had a horse
E-I, E-I-O
Old McDonald had an elephant
E-I, E-I-O
Old McDonald had a cow
E-I, E-I-O
Old McDonald had a dog
E-I, E-I-O
Old McDonald had a cat
E-I, E-I-O
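The five rules can also be folded into one by matching any vowel with a bracket expression and putting the matched vowel back with \1. This is an alternative sketch rather than the method used above; the loop stands in for the mcd.ksh script, and the variable names are illustrative.

```shell
# Generate the verse lines, then fix "a" before a vowel with a single rule
fixed=$(for animal in pig horse elephant cow
do
  echo "Old McDonald had a $animal"
done | sed 's/ a \([aeiou]\)/ an \1/')

printf '%s\n' "$fixed"
```

The parentheses capture whichever vowel was found, so it is not lost when " a " becomes " an ".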
Quitting Early
The default is for sed to read through an entire
file and stop only when the end is reached. You can stop processing
early, however, by using the quit command. Processing continues until
the condition attached to the quit command is satisfied; sed then
prints the current line and exits.
For example, to perform substitution only on the first five lines of a file and then quit:
$ sed '
> /two/ s/1/2/
> /three/ s/1/3/
> 5q' sample_one
one 1
two 2
three 3
one 1
two 2
$
The entry preceding the quit command can be a line number, as shown, or a find/matching command like the following:
$ sed '
> /two/ s/1/2/
> /three/ s/1/3/
> /three/q' sample_one
one 1
two 2
three 3
$
You can also use the quit command to do the job of
head. The head command allows you to specify how many of
the first lines of a file you want to see; the default number is ten.
Where a version of head restricts how large that number can be, sed
offers a way around the limit. Current versions of head accept larger
counts (head -n 110), so there the two approaches are interchangeable.
To see the first 110 lines of a file with sed:
sed 110q filename
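The same idea on a smaller scale: the sketch below takes the first three lines of the sample data with sed and with head and stores both results for comparison. The temporary directory and variable names are illustrative assumptions.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'one 1' 'two 1' 'three 1' 'one 1' 'two 1' 'two 1' 'three 1' \
  > "$workdir/sample_one"

# Print lines as usual, then quit after line 3
with_sed=$(sed 3q "$workdir/sample_one")

# The equivalent head invocation
with_head=$(head -n 3 "$workdir/sample_one")

printf '%s\n' "$with_sed"

rm -rf "$workdir"
```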
Handling Problems
The main thing to keep in mind when dealing with
sed is how it works. It works by reading one line in, performing all
the tasks it knows to perform on that one line, and then moving on to
the next line. Each line is subjected to every editing command given.
This can be troublesome if the order of your
operations is not thoroughly thought out. For example, suppose you need
to change all "two" entries to "three" and all "three" to "four":
$ sed '
> /two/ s/two/three/
> /three/ s/three/four/' sample_one
one 1
four 1
four 1
one 1
four 1
four 1
four 1
$
The very first "two" read was changed to
"three." It then meets the criteria established for the next edit and
becomes "four." The end result is not what was wanted—there are now no
entries but "four" where there should be "three" and "four."
When performing such an operation, you must pay
diligent attention to the manner in which the operations are specified
and arrange them in an order in which one will not clobber another. For
example:
$ sed '
> /three/ s/three/four/
> /two/ s/two/three/' sample_one
one 1
three 1
four 1
one 1
three 1
three 1
four 1
$
This works perfectly, since the "three" value is changed prior to "two" becoming "three."
Labels and Comments
Labels and comments can be placed inside sed script files: labels to
control the order of processing, and comments to make it easier to
explain what is transpiring once the files begin to
grow in size. The relevant commands include:
- : The colon signifies a label name. For example:
:HERE
Labels beginning with the colon can be addressed by "b" and "t" commands.
- b {label} Works as a "goto" statement, sending processing to the label preceded by a colon. For example,
b HERE
sends processing to the line
:HERE
If no label is specified following the b, processing goes to the end of the script file.
- t {label} Branches to
the label only if substitutions have been made since the last input
line or execution of a "t" command. As with "b," if a label name is not
given, processing moves to the end of the script file.
- # The pound sign as the first
character of a line causes the entire line to be treated as a comment.
Comment lines are different from labels and cannot be branched to with
b or t commands.
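The commands above can be combined into a small loop. The following sketch is one possible illustration, not something taken from the article's examples: a script file (the name squeeze.sed and the label "again" are hypothetical) that replaces two spaces with one and uses "t" to branch back to the label for as long as a substitution keeps succeeding.

```shell
workdir=$(mktemp -d)

# Script file with a comment, a label, and a conditional branch
cat > "$workdir/squeeze.sed" <<'EOF'
# Collapse each run of spaces into a single space.
:again
s/  / /
t again
EOF

squeezed=$(printf 'a    b  c\n' | sed -f "$workdir/squeeze.sed")
printf '%s\n' "$squeezed"

rm -rf "$workdir"
```

Once no pair of spaces is left, the substitution fails, "t" does not branch, and sed prints the line and moves on to the next one.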
Further Investigations
The sed utility is one of the most powerful and
flexible tools that a Linux administrator has. While this article has
covered a lot of ground, it has only scratched the surface of this
versatile tool. For more information, one of the best sources is Dale
Dougherty and Arnold Robbins' book sed & awk, now in its
second edition from O'Reilly and Associates (see "Next Steps"). The
same publisher also puts out a pocket reference that you can carry with
you.
Emmett Dulaney (edulaney@iquest.net)
has earned 18 vendor certifications. Emmett has written several books
on Linux, UNIX, and certification study, has spoken at a number of
conferences, and is a former partner in Mercury Technical Solutions.