Keeping more than one line in the hold buffer

The "H" command allows you to combine several lines in the hold buffer. It acts like the "N" command as lines are appended to the buffer, with a "\n" between the lines. You can save several lines in the hold buffer, and print them only if a particular pattern is found later.
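To see the embedded "\n" characters that "H" adds, here is a minimal sketch. The function name join_lines is my own, chosen for illustration. It appends every line to the hold buffer and, at the last line, swaps the buffer back and turns each embedded newline into a comma:

```shell
# join_lines: hypothetical helper, not part of the original scripts
join_lines() {
    sed -n '
        # append each line to the hold buffer (adds "\n" + line)
        H
        # on the last line, fetch everything that was saved
        $ {
            x
            # the hold buffer started empty, so drop the leading newline
            s/^\n//
            # make the remaining newlines visible as commas
            s/\n/,/g
            p
        }
    '
}

printf 'a\nb\nc\n' | join_lines
```

Because the hold buffer starts out empty, the first "H" leaves a newline at the front of the buffer, which is why the sketch removes it before printing.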

As an example, take a file that uses a space as the first character of a line to mark a continuation line. The files /etc/termcap, /etc/printcap, makefile and mail messages use spaces or tabs to indicate the continuation of an entry. If you wanted to print the entry that comes before a line containing a particular word, you could use this script. I use a "^I" to indicate an actual tab character:

#!/bin/sh
# print previous entry
sed -n '
/^[ ^I]/!{
    # line does not start with a space or tab,
    # does it have the pattern we are interested in?
    '/"$1"/' {
        # yes it does. print three dashes
        i\
---
        # get hold buffer, save current line
        x
        # now print what was in the hold buffer
        p
        # get the original line back
        x
    }
    # store it in the hold buffer
    h
}
# what about lines that start
# with a space or tab?
/^[ ^I]/ {
    # append it to the hold buffer
    H
}'

Click here to get file: grep_previous.sh

You can also use the "H" command to extend the context grep. In this example, the program prints the two lines before the pattern, instead of a single line. To limit the saved context to two lines, the script uses the "s" command to keep the last two lines (and the one newline between them) and delete any extra lines. I call it grep4:

#!/bin/sh
# grep4: prints out 4 lines around pattern
# if there is only one argument, exit
case $# in
    1);;
    *) echo "Usage: $0 pattern";exit;;
esac;

sed -n '
'/"$1"/' !{
    # does not match - add this line to the hold space
    H
    # bring it back into the pattern space
    x
    # Two lines would look like .*\n.*
    # Three lines look like .*\n.*\n.*
    # Delete extra lines - keep two
    s/^.*\n\(.*\n.*\)$/\1/
    # now put the two lines (at most) into
    # the hold buffer again
    x
}
'/"$1"/' {
    # matches - append the current line
    H
    # get the next line
    n
    # append that one also
    H
    # bring it back, but keep the current line in
    # the hold buffer. This is the line after the pattern,
    # and we want to place it in hold in case the next line
    # has the desired pattern
    x
    # print the 4 lines
    p
    # add the mark
    a\
---
}'

Click here to get file: grep4.sh

You can modify this to print any number of lines around a pattern. As you can see, you must remember what is in the hold space, and what is in the pattern space. There are other ways to write the same routine.
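The trimming step is the part that is easiest to get wrong, so here it is in isolation. This is only a sketch: the function name last_two is hypothetical, and the script does nothing but keep the two most recent lines in the hold space and print them at end of input.

```shell
# last_two: hypothetical helper that isolates the "keep two lines" step
last_two() {
    sed -n '
        # append the current line to the saved context
        H
        # bring the saved context into the pattern space
        x
        # three or more lines? keep only the last two
        s/^.*\n\(.*\n.*\)$/\1/
        # store the trimmed context again, get the current line back
        x
        # at end of input, print what was saved
        $ {
            x
            p
        }
    '
}

printf 'a\nb\nc\nd\n' | last_two
```

To keep three lines instead of two, the saved part of the expression would need one more ".*\n" inside the parentheses. Note that with a one-line input the hold space still has the leading newline that "H" added, so an empty line is printed first.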

Get with g or G

Instead of exchanging the hold space with the pattern space, you can copy the hold space to the pattern space with the "g" command. This overwrites whatever was in the pattern space. If you want to append to the pattern space instead, use the "G" command. It adds a newline to the pattern space, and copies the hold space after that newline.
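A small side-by-side sketch may make the difference clearer. Both functions below (the names copy_hold and append_hold are mine, for illustration only) save line 1 in the hold space and then act on line 2:

```shell
# copy_hold: on line 2, "g" replaces the pattern space with the hold space
copy_hold() {
    sed -n '
        1 h
        2 {
            g
            p
        }
    '
}

# append_hold: on line 2, "G" appends a newline plus the hold space
append_hold() {
    sed -n '
        1 h
        2 {
            G
            p
        }
    '
}

printf 'first\nsecond\n' | copy_hold     # prints: first
printf 'first\nsecond\n' | append_hold   # prints: second, then first
```

With "g" the second line is lost. With "G" it is still there, followed by the saved line.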

Here is another version of the "grep3" command. It works just like the previous one, but is implemented differently. This illustrates that sed has more than one way to solve many problems. What is important is you understand your problem, and document your solution:

#!/bin/sh
# grep3 version c: use 'G' instead of H
# if there is only one argument, exit
case $# in
    1);;
    *) echo "Usage: $0 pattern";exit;;
esac;

# again - I hope the argument doesn't contain a /
sed -n '
'/"$1"/' !{
    # put the non-matching line in the hold buffer
    h
}
'/"$1"/' {
    # found a line that matches
    # add the next line to the pattern space
    N
    # exchange the previous line with the
    # 2 in pattern space
    x
    # now add the two lines back
    G
    # and print it.
    p
    # add the three hyphens as a marker
    a\
---
    # remove first 2 lines
    s/.*\n.*\n\(.*\)$/\1/
    # and place in the hold buffer for next time
    h
}'

Click here to get file: grep3c.sh

The "G" command makes it easy to have two copies of a line. Suppose you wanted to convert the first hexadecimal number to uppercase, and didn't want to use the script I described in an earlier column:

#!/bin/sh
# change the first hex number to upper case format
# uses sed twice
# used as a filter
# convert2uc <in >out
sed '
s/ /\
/' | \
sed ' {
    y/abcdef/ABCDEF/
    N
    s/\n/ /
}'

Click here to get file: convert2uc.sh

Here is a solution that does not require two invocations of sed:

#!/bin/sh
# convert2uc version b
# change the first hex number to upper case format
# uses sed once
# used as a filter
# convert2uc <in >out
sed '
{
    # remember the line
    h
    #change the current line to upper case
    y/abcdef/ABCDEF/
    # add the old line back
    G
    # Keep the first word of the first line,
    # and second word of the second line
    # with one humongous regular expression
    s/^\([^ ]*\) .*\n[^ ]* \(.*\)/\1 \2/
}'

Click here to get file: convert2uc1.sh

Carl Henrik Lunde suggested a way to make this simpler. I was working too hard.

#!/bin/sh
# convert2uc version b
# change the first hex number to upper case format
# uses sed once
# used as a filter
# convert2uc <in >out
sed '
{
    # remember the line
    h
    #change the current line to upper case
    y/abcdef/ABCDEF/
    # add the old line back
    G
    # Keep the first word of the first line,
    # and second word of the second line
    s/ .* / / # delete all but the first and last word
}'

Click here to get file: convert2uc2.sh

This example only converts the letters "a" through "f" to upper case. This was chosen to make the script easier to print in these narrow columns. You can easily modify the script to convert all letters to uppercase, or to change the first letter, second word, etc.
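As one possible modification, here is a sketch that converts every letter of the first word, not just "a" through "f". The function name upcase_first_word is hypothetical, and the extra "s" command at the end is my own addition to cope with lines that contain only one word:

```shell
# upcase_first_word: hypothetical variation on convert2uc
upcase_first_word() {
    sed '
        # remember the original line
        h
        # upper-case every letter of the working copy
        y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
        # append the original line after a newline
        G
        # keep word 1 of the upper-case copy and the rest of the original
        s/^\([^ ]*\) .*\n[^ ]* \(.*\)/\1 \2/
        # a one-word line has no space to match; drop the leftover copy
        s/\n.*//
    '
}

printf 'deadbeef is a hex number\n' | upcase_first_word
```

This should print "DEADBEEF is a hex number". The same save, modify, append, and select pattern works for other choices: only the final "s" expression decides which pieces of the two copies survive.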

Flow Control

As you learn about sed you realize that it has its own programming language. It is true that it's a very specialized and simple language. What language would be complete without a method of changing the flow of control? There are three commands sed uses for this. You can specify a label with a text string preceded by a colon. The "b" command branches to a label; the label name follows the command. If no label is given, it branches to the end of the script. The "t" command is used to test conditions. Before I discuss the "t" command, I will show you an example using the "b" command.

This example remembers paragraphs, and if a paragraph contains the pattern (specified by an argument), the script prints out the entire paragraph.

#!/bin/sh
sed -n '
# if an empty line, check the paragraph
/^$/ b para
# else add it to the hold buffer
H
# at end of file, check paragraph
$ b para
# now branch to end of script
b
# this is where a paragraph is checked for the pattern
:para
# return the entire paragraph
# into the pattern space
x
# look for the pattern, if there - print
/'"$1"'/ p
'

Click here to get file: grep_paragraph.sh
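The script above uses both forms of the branch: "b para" jumps to a label, and a bare "b" jumps to the end of the script. The bare form is handy on its own as a way to say "leave this line alone". A minimal sketch, with the hypothetical name skip_comments and a made-up substitution:

```shell
# skip_comments: hypothetical example of "b" with no label
skip_comments() {
    sed '
        # lines starting with "#" branch to the end of the script,
        # so none of the commands below are applied to them
        /^#/ b
        # every other line gets the substitution
        s/foo/bar/
    '
}

printf '# foo stays\nfoo changes\n' | skip_comments
```

The first line is printed unchanged, because the branch skips the substitution. The second line comes out as "bar changes".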

Testing with t

You can execute a branch if a pattern is found. You may want to execute a branch only if a substitution is made. The command "t label" will branch to the label if the last substitute command modified the pattern space.
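Here is a minimal sketch of a "t" loop, assuming the input contains plain integers. The function name add_commas is hypothetical. Each pass inserts one comma, and "t again" repeats the substitution until a pass changes nothing:

```shell
# add_commas: hypothetical helper that groups digits in threes
add_commas() {
    sed '
        :again
        # put a comma before the last run of three digits
        # that still has a digit in front of it
        s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
        # if the substitution changed something, try again
        t again
    '
}

echo 1234567 | add_commas
```

This should print "1,234,567". The loop stops by itself: once no digit is followed directly by three more digits, the substitution fails and "t" does not branch. The sketch makes no attempt to handle digits after a decimal point.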

One use for this is recursive patterns. Suppose you wanted to remove parentheses that contain nothing but white space. These parentheses might be nested. That is, you would want to delete a string that looked like "( ( ( ())) )". The sed expression

sed 's/([ ^I]*)//g'

would only remove the innermost set. You would have to pipe the data through the script four times to remove each set of parentheses. You could use the regular expression

sed 's/([ ^I()]*)//g'

but that would delete non-matching sets of parentheses. The "t" command would solve this:

#!/bin/sh
sed '
:again
    s/([ ^I]*)//
    t again
'

An earlier version had a 'g' after the 's' expression. This is not needed.

Click here to get file: delete_nested_parens.sh
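To compare the single-pass expression with the loop, here is a sketch that runs both on the nested string from the text. The function names are mine, and the tab character is generated with printf so that the script does not depend on a literal tab surviving in the file:

```shell
# a real tab character, so the bracket expression matches space or tab
tab=$(printf '\t')

# one_pass: hypothetical name; a single global substitution
one_pass() {
    sed 's/([ '"$tab"']*)//g'
}

# all_passes: hypothetical name; repeat until nothing changes
all_passes() {
    sed '
        :again
        s/([ '"$tab"']*)//
        t again
    '
}

echo 'a( ( ( ())) )b' | one_pass     # only the innermost () is removed
echo 'a( ( ( ())) )b' | all_passes   # every nested set is removed
```

The first command leaves "a( ( ( )) )b", and the second leaves "ab". Each pass through the loop removes one set and exposes the next one.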

Debugging with l

The 'l' command will print the pattern space in an unambiguous form. Non-printing characters are printed in a C-style escaped format.

This can be useful when debugging a complex multi-line sed script.
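A short sketch of what this looks like, using a hypothetical function name. It joins two lines with "N" and then lists the pattern space, so the embedded newline and the tab become visible:

```shell
# show_pattern_space: hypothetical helper for inspecting a two-line buffer
show_pattern_space() {
    sed -n '
        # append the next input line to the pattern space
        N
        # list the pattern space with escapes; "$" marks the end
        l
    '
}

printf 'one\ttab\ntwo\n' | show_pattern_space
```

With the sed versions I am familiar with, this prints one\ttab\ntwo$ on a single line. Long pattern spaces are folded across several output lines, and the exact escapes used for unusual characters may differ between implementations.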

Date: 2016-01-14; view: 893
