The character after the s is the delimiter. It is conventionally a slash, because this is what ed, more, and vi use. It can be anything you want, however. If you want to change a pathname that contains a slash - say /usr/local/bin to /common/bin - you could use the backslash to quote the slash:
sed 's/\/usr\/local\/bin/\/common\/bin/' <old >new
Gulp. Some call this a 'Picket Fence' and it's ugly. It is easier to read if you use an underline instead of a slash as a delimiter:
sed 's_/usr/local/bin_/common/bin_' <old >new
Some people use colons:
sed 's:/usr/local/bin:/common/bin:' <old >new
Others use the "|" character.
sed 's|/usr/local/bin|/common/bin|' <old >new
Pick one you like. As long as it's not in the string you are looking for, anything goes. And remember that you need three delimiters. If you get an "Unterminated `s' command" error, it's because you are missing one of them.
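To convince yourself that the delimiter really is just a matter of taste, here is a small sketch that runs the same substitution twice. The sample line is made up for illustration; it stands in for whatever is in your "old" file.

```shell
# Hypothetical sample line; it stands in for the contents of "old".
sample='PATH=/usr/local/bin:/bin'

# Slash delimiter: every slash in the pathname has to be escaped.
with_slash=$(printf '%s\n' "$sample" | sed 's/\/usr\/local\/bin/\/common\/bin/')

# Vertical-bar delimiter: nothing needs escaping.
with_bar=$(printf '%s\n' "$sample" | sed 's|/usr/local/bin|/common/bin|')

# Both lines printed here should be identical.
printf '%s\n' "$with_slash" "$with_bar"
```

Both commands should print PATH=/common/bin:/bin.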
Using & as the matched string
Sometimes you want to search for a pattern and add some characters, like parentheses, around or near the pattern you found. It is easy to do this if you are looking for a particular string:
sed 's/abc/(abc)/' <old >new
This won't work if you don't know exactly what you will find. How can you put the string you found in the replacement string if you don't know what it is?
The solution requires the special character "&". It corresponds to the pattern found.
sed 's/[a-z]*/(&)/' <old >new
You can have any number of "&" in the replacement string. You could also double a pattern, e.g. the first number of a line:
% echo "123 abc" | sed 's/[0-9]*/& &/'
123 123 abc
Let me slightly amend this example. Sed will match the first string, and make it as greedy as possible. The first match for '[0-9]*' is at the very start of the line, because this pattern matches zero or more digits. So if the input was "abc 123" the output would be unchanged (well, except for a space before the letters). A better way to duplicate the number is to make sure it matches a number:
% echo "123 abc" | sed 's/[0-9][0-9]*/& &/'
123 123 abc
The string "abc" is unchanged, because it was not matched by the regular expression. If you wanted to eliminate "abc" from the output, you must expand the regular expression to match the rest of the line and explicitly exclude part of the expression using "\(", "\)" and "\1", which is the next topic.
A quick comment. The original sed did not support the "+" metacharacter. GNU sed does if you use the "-r" command line option, which enables extended regular expressions. The "+" means "one or more matches". So the above could also be written using
% echo "123 abc" | sed -r 's/[0-9]+/& &/'
123 123 abc
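The empty-match problem described above is easy to miss, so here is a short sketch that feeds "abc 123" to both patterns. The brackets in the output are only there to make the leading space visible.

```shell
# The line starts with letters, so [0-9]* matches the empty string at
# the start of the line and "& &" turns into a single space.
empty_match=$(echo "abc 123" | sed 's/[0-9]*/& &/')

# Requiring at least one digit makes sed skip ahead to the real number.
real_match=$(echo "abc 123" | sed 's/[0-9][0-9]*/& &/')

# Brackets make the leading space visible.
printf '[%s]\n' "$empty_match" "$real_match"
```

The first line printed should be "[ abc 123]" and the second "[abc 123 123]".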
Using \1 to keep part of the pattern
I have already described the use of "\(", "\)" and "\1" in my tutorial on regular expressions. To review, the escaped parentheses (that is, parentheses with backslashes before them) remember portions of the regular expression. You can use this to exclude part of the regular expression. The "\1" is the first remembered pattern, and the "\2" is the second remembered pattern. Sed has up to nine remembered patterns.
If you wanted to keep the first word of a line, and delete the rest of the line, mark the important part with the parentheses:
sed 's/\([a-z]*\).*/\1/'
I should elaborate on this. Regular expressions are greedy, and try to match as much as possible. "[a-z]*" matches zero or more lower case letters, and tries to be as big as possible. The ".*" matches zero or more characters after the first match. Since the first one grabs all of the lower case letters, the second matches anything else. Therefore if you type
echo abcd123 | sed 's/\([a-z]*\).*/\1/'
sed will output "abcd" and delete the numbers.
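One consequence of "[a-z]*" matching zero letters is worth seeing for yourself. In this sketch the second input is my own addition: it starts with digits, so the remembered pattern is empty and the whole line is deleted.

```shell
# Letters first: \1 remembers "abcd" and .* swallows the digits.
letters_first=$(echo abcd123 | sed 's/\([a-z]*\).*/\1/')

# Digits first: [a-z]* matches nothing at the start of the line,
# .* swallows everything, and \1 is empty.
digits_first=$(echo 123abcd | sed 's/\([a-z]*\).*/\1/')

printf '[%s]\n' "$letters_first" "$digits_first"
```

This should print "[abcd]" followed by "[]".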
If you want to switch two words around, you can remember two patterns and change the order around:
sed 's/\([a-z]*\) \([a-z]*\)/\2 \1/'
Note the space between the two remembered patterns. This is used to make sure two words are found. However, this will do nothing if only a single word is found, or on lines with no letters. You may want to insist that words have at least one letter by using
sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/'
or by using extended regular expressions
sed -r 's/([a-z]+) ([a-z]+)/\2 \1/' # Using GNU sed
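Here is a quick sketch of the word swap using the basic regular expression form, which does not need any GNU options. The input lines are made up for illustration.

```shell
# Two words: they trade places.
swapped=$(echo "hello world" | sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/')

# Three words: only the first pair is swapped, because there is no "g" flag.
first_pair=$(echo "one two three" | sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/')

# One word: the pattern needs a space and a second word, so nothing changes.
single=$(echo "hello" | sed 's/\([a-z][a-z]*\) \([a-z][a-z]*\)/\2 \1/')

printf '%s\n' "$swapped" "$first_pair" "$single"
```

The three lines printed should be "world hello", "two one three" and "hello".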
The "\1" doesn't have to be in the replacement string (in the right hand side). It can be in the pattern you are searching for (in the left hand side). If you want to eliminate duplicated words, you can try:
sed 's/\([a-z]*\) \1/\1/'
If you want to detect duplicated words, you can use
sed -n '/\([a-z][a-z]*\) \1/p'
or with extended regular expressions
sed -rn '/([a-z]+) \1/p' # Using GNU sed
This, when used as a filter, will print lines with duplicated words.
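A small sketch of both uses follows, with made-up input lines. It also shows a limitation you should be aware of: the pattern is not tied to word boundaries, so a line like "this is fine" is flagged because "is" appears at the end of one word and at the start of the next.

```shell
# Detect: only the line with the doubled word is printed.
flagged=$(printf '%s\n' 'paris in the the spring' 'nothing repeated here' |
    sed -n '/\([a-z][a-z]*\) \1/p')

# Eliminate: the doubled word collapses to one copy.
fixed=$(echo 'paris in the the spring' | sed 's/\([a-z][a-z]*\) \1/\1/')

# Caveat: "this is" matches too, since "is" is followed by " is".
false_hit=$(echo 'this is fine' | sed -n '/\([a-z][a-z]*\) \1/p')

printf '%s\n' "$flagged" "$fixed" "$false_hit"
```

The output should be "paris in the the spring", "paris in the spring" and "this is fine".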
The numeric value can have up to nine values: "\1" through "\9". If you wanted to reverse the first three characters on a line, you can use
sed 's/^\(.\)\(.\)\(.\)/\3\2\1/'
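To see the reversal in action, try it on a short string. The second input in this sketch is my own addition, to show that a line with fewer than three characters does not match and passes through untouched.

```shell
# The first three characters are reversed; the rest of the line is kept.
reversed=$(echo "abcdef" | sed 's/^\(.\)\(.\)\(.\)/\3\2\1/')

# Fewer than three characters: no match, so the line is unchanged.
too_short=$(echo "ab" | sed 's/^\(.\)\(.\)\(.\)/\3\2\1/')

printf '%s\n' "$reversed" "$too_short"
```

This should print "cbadef" and then "ab".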
Sed Pattern Flags
You can add additional flags after the last delimiter. You might have noticed I used a 'p' at the end of the earlier command that detects duplicated words. I also added the '-n' option. Let me first cover the 'p' and other pattern flags. These flags can specify what happens when a match is found. Let me describe them.
/g - Global replacement
Most UNIX utilities work on files, reading a line at a time. Sed, by default, is the same way. If you tell it to change a word, it will only change the first occurrence of the word on a line. You may want to make the change on every word on the line instead of just the first. For an example, let's place parentheses around words on a line. Instead of using a pattern like "[A-Za-z]*", which won't match words like "won't", we will use a pattern, "[^ ]*", that matches everything except a space. Well, this will also match the empty string, because "*" means zero or more. The current version of Solaris's sed (as I wrote this) can get unhappy with patterns like this, and generate errors like "Output line too long" or even run forever. I consider this a bug, and have reported this to Sun. As a work-around, you must avoid matching the null string when using the "g" flag to sed. A work-around example is "[^ ][^ ]*". The following will put parentheses around the first word:
sed 's/[^ ]*/(&)/' <old >new
If you want it to make changes for every word, add a "g" after the last delimiter and use the work-around:
sed 's/[^ ][^ ]*/(&)/g' <old >new
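Here is a sketch of the difference the "g" flag makes, using a made-up line that contains an apostrophe so you can see why "[^ ]" is a better choice than "[A-Za-z]" here.

```shell
line="don't stop now"

# No flag: only the first word gets parentheses.
first_only=$(printf '%s\n' "$line" | sed 's/[^ ][^ ]*/(&)/')

# With g: every word on the line gets parentheses.
every_word=$(printf '%s\n' "$line" | sed 's/[^ ][^ ]*/(&)/g')

printf '%s\n' "$first_only" "$every_word"
```

The first line printed should be "(don't) stop now" and the second "(don't) (stop) (now)".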
Is sed recursive?
Sed only operates on patterns found in the incoming data. That is, the input line is read, and when a pattern is matched, the modified output is generated, and the rest of the input line is scanned. The "s" command will not scan the newly created output. That is, you don't have to worry about expressions like:
sed 's/loop/loop the loop/g' <old >new
This will not cause an infinite loop. If a second "s" command is executed, it could modify the results of a previous command. I will show you how to execute multiple commands later.
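You can check both claims with a sketch like the following. It uses the "-e" option to give sed two commands, and the second substitution (to upper case) is my own choice for illustration.

```shell
# One command: the replacement text is not rescanned, so this terminates.
once=$(echo "loop" | sed 's/loop/loop the loop/g')

# Two commands: the second one does see the output of the first.
twice=$(echo "loop" | sed -e 's/loop/loop the loop/' -e 's/loop/LOOP/g')

printf '%s\n' "$once" "$twice"
```

This should print "loop the loop" and then "LOOP the LOOP".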
/1, /2, etc. Specifying which occurrence
With no flags, the first pattern is changed. With the "g" option, all patterns are changed. If you want to modify a particular pattern that is not the first one on the line, you could use "\(" and "\)" to mark each pattern, and use "\1" to put the first pattern back unchanged. This next example keeps the first word on the line but deletes the second:
sed 's/\([a-zA-Z]*\) \([a-zA-Z]*\) /\1 /' <old >new
Yuck. There is an easier way to do this. You can add a number after the substitution command to indicate you only want to match that particular pattern. Example:
sed 's/[a-zA-Z]* //2' <old >new
You can combine a number with the g (global) flag. For instance, if you want to leave the first word alone, but change the second, third, etc. to be DELETED instead, use /2g:
sed 's/[a-zA-Z]* /DELETED /2g' <old >new
Don't get /2 and \2 confused. The /2 is used at the end. The \2 is used inside the replacement field.
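The following sketch compares the parenthesis version with the number flag on a made-up line. The last command combines a number with "g"; I believe POSIX leaves that combination unspecified, so treat it as something to try with GNU sed rather than a portable idiom.

```shell
words="one two three four"

# The hard way: remember the first word and drop the second.
paren_form=$(printf '%s\n' "$words" | sed 's/\([a-zA-Z]*\) \([a-zA-Z]*\) /\1 /')

# The easy way: act only on the second match.
second_gone=$(printf '%s\n' "$words" | sed 's/[a-zA-Z]* //2')

printf '%s\n' "$paren_form" "$second_gone"

# Number plus g: change the second match and every one after it.
# Assumes GNU sed; other versions may reject or ignore this.
printf '%s\n' "$words" | sed 's/[a-zA-Z]* /DELETED /2g'
```

The first two lines printed should both be "one three four".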
Note the space after the "*" character. Without the space, sed will run a long, long time. (Note: this bug is probably fixed by now.) This is because the number flag and the "g" flag have the same bug. You should also be able to use the pattern
sed 's/[^ ]*//2' <old >new
but this also eats CPU. If this works on your computer, and it does on some UNIX systems, you could remove the encrypted password from the password file:
sed 's/[^:]*//2' </etc/passwd >/etc/password.new
But this didn't work for me at the time I wrote this. Using "[^:][^:]*" as a work-around doesn't help because it won't match a nonexistent password, and instead deletes the third field, which is the user ID! Instead you have to use the ugly parentheses:
sed 's/^\([^:]*\):[^:]*:/\1::/' </etc/passwd >/etc/password.new
You could also add a character to the first pattern so that it no longer matches the null pattern:
sed 's/[^:]*:/:/2' </etc/passwd >/etc/password.new
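Rather than experiment on the real password file, you can try this on a couple of made-up entries. The names and fields below are hypothetical, and the parenthesis version uses "[^:]*" for the password field so that it matches a password of any length, including an empty one.

```shell
# Hypothetical passwd-style entries, one with a password and one without.
with_pw='alice:Xy7abc:1001:1001:Alice:/home/alice:/bin/sh'
no_pw='bob::1002:1002:Bob:/home/bob:/bin/sh'

# Number flag: the second "field plus colon" is replaced by a bare colon.
strip_a=$(printf '%s\n' "$with_pw" | sed 's/[^:]*:/:/2')
strip_b=$(printf '%s\n' "$no_pw" | sed 's/[^:]*:/:/2')

# Parenthesis form: keep the first field and empty the second.
paren_a=$(printf '%s\n' "$with_pw" | sed 's/^\([^:]*\):[^:]*:/\1::/')

printf '%s\n' "$strip_a" "$strip_b" "$paren_a"
```

The password should vanish from the first entry, the second entry should come through unchanged with its user ID intact, and the third line should match the first.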
The number flag is not restricted to a single digit. It can be any number from 1 to 512. If you wanted to add a colon after the 80th character in each line, you could type:
sed 's/./&:/80' <file >new
You can also do it the hard way by using 80 dots:
sed 's/^................................................................................/&:/' <file >new
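Eighty characters is a lot to check by eye, so here is the same idea sketched with a smaller number. The input string and the choice of 3 are mine, purely for illustration.

```shell
# Number flag: put a colon after the third character.
easy_way=$(echo "abcdefgh" | sed 's/./&:/3')

# The hard way: three dots anchored at the start of the line.
hard_way=$(echo "abcdefgh" | sed 's/^.../&:/')

printf '%s\n' "$easy_way" "$hard_way"
```

Both commands should print "abc:defgh".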
/p - print
By default, sed prints every line. If it makes a substitution, the new text is printed instead of the old one. If you use an optional argument to sed, "sed -n", it will not, by default, print any lines. I'll cover this and other options later. When the "-n" option is used, the "p" flag will cause the modified line to be printed. Here is one way to duplicate the function of grep with sed:
sed -n 's/pattern/&/p' <file
But a simpler version is described later.
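As a sanity check, this sketch runs the sed version next to grep on a made-up list of words. The pattern "an" is arbitrary.

```shell
fruit=$(printf '%s\n' apple banana cherry)

# -n suppresses the default output; the p flag prints only lines
# where the substitution matched.
with_sed=$(printf '%s\n' "$fruit" | sed -n 's/an/&/p')

# grep for comparison.
with_grep=$(printf '%s\n' "$fruit" | grep 'an')

printf '%s\n' "$with_sed" "$with_grep"
```

Both should print "banana".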
Write to a file with /w filename
There is one more flag that can follow the third delimiter. With it, you can specify a file that will receive the modified data. An example is the following, which will write all lines that start with an even number, followed by a space, to the file even:
sed -n 's/^[0-9]*[02468] /&/w even' <file
In this example, the output file isn't needed, as the input was not modified. You must have exactly one space between the w and the filename. You can also have ten files open with one instance of sed. This allows you to split up a stream of data into separate files. Using the previous example combined with multiple substitution commands described later, you could split a file into ten pieces depending on the last digit of the first number. You could also use this method to log error or debugging information to a special file.
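Here is a sketch of the "w" flag that you can run without leaving files lying around. It works in a temporary directory created with mktemp, and the three input lines are invented for the example.

```shell
# Work in a scratch directory so the file "even" does not clutter anything.
workdir=$(mktemp -d)
printf '%s\n' '12 twelve' '7 seven' '40 forty' >"$workdir/input"

# -n keeps sed quiet on standard output; matching lines go to "even".
( cd "$workdir" && sed -n 's/^[0-9]*[02468] /&/w even' <input )

# Collect the result, then clean up.
even_lines=$(cat "$workdir/even")
rm -rf "$workdir"

printf '%s\n' "$even_lines"
```

Only the lines that start with 12 and 40 should end up in the file.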
/I - Ignore Case
GNU has added another pattern flag: /I
This flag makes the pattern match case insensitive. This will match abc, aBc, ABC, AbC, etc.: