Multi-Line Patterns

Most UNIX utilities are line oriented. Regular expressions are line oriented. Searching for patterns that cover more than one line is not an easy task. (Hint: It will be very shortly.)

Sed reads in a line of text, performs commands which may modify the line, and outputs the modified line if desired. The main loop of a sed script looks like this:

1. The next line is read from the input file and placed in the pattern space. If the end of file is found, and if there are additional files to read, the current file is closed, the next file is opened, and the first line of the new file is placed into the pattern space.
2. The line count is incremented by one. Opening a new file does not reset this number.
3. Each sed command is examined. If there is a restriction placed on the command, and the current line in the pattern space meets that restriction, the command is executed. Some commands, like "d," cause sed to go to the top of the loop. The "n" command outputs the pattern space (unless "-n" is used), reads the next line, and carries on with the next command. The "q" command causes sed to stop. Otherwise the next command is examined.
4. After all of the commands are examined, the pattern space is output unless sed has the optional "-n" argument.
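
Two of the points above, the running line count and the automatic output that "-n" turns off, can be checked with a short experiment. This is only a sketch: the scratch directory and the file names "one" and "two" are made up for the illustration.

```sh
#!/bin/sh
# Hypothetical scratch files, created only for this demonstration.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\nd\n' > "$dir/two"

# The line count keeps running across files, so address 3
# selects the first line of the second file.
third=$(sed -n '3 p' "$dir/one" "$dir/two")
echo "$third"

# Without -n the pattern space is output at the end of every cycle,
# so an explicit p prints each line a second time.
doubled=$(sed 'p' "$dir/one")
echo "$doubled"

# With -n only the explicit p produces output.
single=$(sed -n 'p' "$dir/one")
echo "$single"

rm -rf "$dir"
```

The first command prints "c", the second prints each line of the first file twice, and the third prints each line once.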

The restriction before the command determines if the command is executed. If the restriction is a pattern, and the operation is the delete command, then the following will delete all lines that have the pattern:

/PATTERN/ d

If the restriction is a pair of numbers, then the deletion will happen if the line number is equal to the first number or greater than the first number and less than or equal to the last number:

10,20 d

If the restriction is a pair of patterns, there is a variable that is kept for each of these pairs. If the variable is false and the first pattern is found, the variable is made true. If the variable is true, the command is executed. If the variable is true, and the last pattern is on the line, after the command is executed the variable is turned off:

/begin/,/end/ d
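
Here is a small sketch of the pattern-pair case. The sample text is invented for the example; it has two "begin"/"end" blocks, to show that the variable can be turned on again after it has been turned off.

```sh
#!/bin/sh
# Invented sample input with two begin/end blocks.
input='keep 1
begin
drop a
end
keep 2
begin
drop b
end
keep 3'

# Every line from a "begin" through the next "end" is deleted.
# After an "end" the range is off until another "begin" is seen.
result=$(printf '%s\n' "$input" | sed '/begin/,/end/ d')
echo "$result"
```

Only the three "keep" lines survive.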

Whew! That was a mouthful. If you have read carefully up to here, you should have breezed through this. You may want to refer back, because I covered several subtle points. My choice of words was deliberate. It covers some unusual cases, like:

# what happens if the second number
# is less than the first number?
sed -n '20,1 p' file

and

# generate a 10 line file with line numbers
# and see what happens when two patterns overlap
yes | head -10 | cat -n | \
sed -n -e '/1/,/7/ p' -e '/5/,/9/ p'
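
If you want to check your answers, the following sketch runs both experiments on ten numbered lines. It is a variation, not the exact commands above: the lines are generated with awk so no input file is needed, and the first test uses "5,1" instead of "20,1" because there are only ten lines.

```sh
#!/bin/sh
# Ten lines containing the numbers 1 through 10.
numbers=$(awk 'BEGIN { for (i = 1; i <= 10; i++) print i }')

# Second number smaller than the first: the line matching the
# first number is still selected, and the range ends right there.
backwards=$(printf '%s\n' "$numbers" | sed -n '5,1 p')
echo "$backwards"

# Two overlapping ranges: each p works on its own, so lines inside
# both ranges are printed twice. /1/ also matches "10", which
# starts the first range a second time.
overlap=$(printf '%s\n' "$numbers" | sed -n -e '/1/,/7/ p' -e '/5/,/9/ p')
echo "$overlap"
```

The first command prints just "5". The second prints 1 through 4 once, 5 through 7 twice, then 8, 9 and 10 once.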

Enough mental punishment. Here is another review, this time in a table format. Assume the input file contains the following lines:

AB
CD
EF

When sed starts up, the first line is placed in the pattern space. The next line is "CD." The operations of the "n," "d," and "p" commands can be summarized as:

Pattern Space	Next Input	Command	Output	New Pattern Space	New Text Input
AB	CD	n	<default>	CD	EF
AB	CD	d	-	CD	EF
AB	CD	p	AB	CD	EF

The "n" command may or may not generate output depending upon the existence of the "-n" flag.
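
The same three commands can be tried on the three-line sample. This is a rough sketch; the variable names are arbitrary and the input is fed through a pipe instead of a file.

```sh
#!/bin/sh
# The three-line sample from the table.
sample='AB
CD
EF'

# n without -n: the pattern space is printed before the next line
# is read, so every line still appears once.
n_default=$(printf '%s\n' "$sample" | sed 'n')

# n with -n: nothing is printed at all.
n_quiet=$(printf '%s\n' "$sample" | sed -n 'n')

# n followed by p, with -n: p sees the line that n just read.
n_then_p=$(printf '%s\n' "$sample" | sed -n -e 'n' -e 'p')

# d: the pattern space is thrown away and the next cycle starts.
d_out=$(printf '%s\n' "$sample" | sed 'd')

printf 'n: [%s]\n-n n: [%s]\n-n n p: [%s]\nd: [%s]\n' \
    "$n_default" "$n_quiet" "$n_then_p" "$d_out"
```

The "n" followed by "p" case prints only "CD": on the first cycle "n" replaces "AB" with "CD" before "p" runs, and on the second cycle there is no line after "EF" for "n" to read, so sed stops before reaching "p".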

That review is a little easier to follow, isn't it? Before I jump into multi-line patterns, I wanted to cover three more commands:

Print line number with =

The "=" command prints the current line number to standard output. One way to find out the line numbers that contain a pattern is to use:

# add line numbers first,
# then use grep,
# then just print the number
cat -n file | grep 'PATTERN' | awk '{print $1}'

The sed solution is:

sed -n '/PATTERN/ =' file
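
A quick way to convince yourself that the two approaches agree is to run both on the same input. The four sample lines below are invented for this sketch.

```sh
#!/bin/sh
# Invented sample: PATTERN appears on lines 2 and 4.
sample='first line
a PATTERN here
third line
PATTERN again'

# sed prints only the line numbers of the matching lines.
with_sed=$(printf '%s\n' "$sample" | sed -n '/PATTERN/ =')

# The cat/grep/awk pipeline should give the same numbers.
with_pipeline=$(printf '%s\n' "$sample" | cat -n | grep 'PATTERN' | awk '{print $1}')

echo "$with_sed"
echo "$with_pipeline"
```

Both commands print 2 and 4, one number per line.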

Earlier I used the following to find the number of lines in a file:

#!/bin/sh
lines=`wc -l file | awk '{print $1}'`

Using the "=" command can simplify this:

#!/bin/sh
lines=`sed -n '$=' file`
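
One difference from wc is worth knowing before relying on this. The sketch below wraps the command in a helper called count_lines (a made-up name) and tries it on a three-line file and on an empty one.

```sh
#!/bin/sh
# count_lines is a hypothetical helper name used only in this sketch.
count_lines() {
    sed -n '$=' "$1"
}

file=$(mktemp)
printf 'one\ntwo\nthree\n' > "$file"
lines=$(count_lines "$file")
echo "$lines"

# An empty file has no last line, so sed prints nothing at all,
# where wc -l would have printed 0.
: > "$file"
empty=$(count_lines "$file")
echo "[$empty]"

rm -f "$file"
```

The first call prints 3. The second leaves the variable empty, so a script that does arithmetic on the result should allow for that case.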

The "=" command only accepts one address, so if you want to print the number for a range of lines, you must use the curly braces:

#!/bin/sh
# Just print the line numbers
sed -n '/begin/,/end/ {
=
}' file

Since the "=" command only prints to standard output, you cannot print the line number on the same line as the pattern. You need to edit multi-line patterns to do this.
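
To see the limitation in practice, here is a small sketch that uses "=" inside curly braces, first alone and then together with "p". The five-line sample is invented for the example.

```sh
#!/bin/sh
# Invented sample with one begin/end block on lines 2 through 4.
sample='before
begin
middle
end
after'

# Line numbers only, for every line in the range.
numbers=$(printf '%s\n' "$sample" | sed -n '/begin/,/end/ {
=
}')
echo "$numbers"

# Number and text together: the number still lands on a line
# of its own, ahead of the line it belongs to.
both=$(printf '%s\n' "$sample" | sed -n '/begin/,/end/ {
=
p
}')
echo "$both"
```

The second command prints six lines: 2, "begin", 3, "middle", 4, "end".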

Transform with y

If you wanted to change a word from lower case to upper case, you could write 26 character substitutions, converting "a" to "A," etc. Sed has a command that operates like the tr program. It is called the "y" command. For instance, to change the letters "a" through "f" into their upper case form, use:

sed 'y/abcdef/ABCDEF/' file
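
As a quick check, the sketch below runs the same "y" command and the equivalent tr command on an invented phrase that contains letters both inside and outside the "a" through "f" range.

```sh
#!/bin/sh
text='a faded zebra'

# y maps each character in the first list to the character in the
# same position of the second list; other characters are untouched.
with_y=$(printf '%s\n' "$text" | sed 'y/abcdef/ABCDEF/')

# tr does the same job as a separate program.
with_tr=$(printf '%s\n' "$text" | tr 'abcdef' 'ABCDEF')

echo "$with_y"
echo "$with_tr"
```

Both print "A FADED zEBrA": the "z" and "r" are not in the list, so they stay in lower case.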

I could have used an example that converted all 26 letters into upper case, and while this column covers a broad range of topics, the "column" prefers a narrower format.

If you wanted to convert a line that contained a hexadecimal number (e.g. 0x1aff) to upper case (0x1AFF), you could use:

sed '/0x[0-9a-fA-F]*/ y/abcdef/ABCDEF/' file

This works fine if there are only numbers in the file, because "y" converts every "a" through "f" on a matching line, not just the letters in the number. If you wanted to change the second word in a line to upper case, you are out of luck - unless you use multi-line editing. (Hey - I think there is some sort of theme here!)
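
The catch is easy to demonstrate. In this sketch the "y" command is restricted to lines containing a hexadecimal number, and it is written with its closing delimiter. The two sample lines are invented.

```sh
#!/bin/sh
# A line that mixes ordinary words with a hexadecimal number.
mixed=$(printf '%s\n' 'value 0x1aff added' |
    sed '/0x[0-9a-fA-F]*/ y/abcdef/ABCDEF/')
echo "$mixed"

# A line without "0x" does not meet the restriction and is left alone.
plain=$(printf '%s\n' 'plain decade' |
    sed '/0x[0-9a-fA-F]*/ y/abcdef/ABCDEF/')
echo "$plain"
```

The first line comes out as "vAluE 0x1AFF ADDED": the number is converted, but so is every other "a" through "f" on the line. The second line is unchanged.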

Date: 2016-01-14
