The Concept of 'Hold space' and 'Pattern space' in sed

linux sed

I'm confused by the two concepts in sed: hold space and pattern space. Can someone help explain them?

Here's a snippet of the manual:

h H    Copy/append pattern space to hold space.
g G    Copy/append hold space to pattern space.
n N    Read/append the next line of input into the pattern space.

These six commands really confuse me.

Try it yourself: echo $'1\n2\n3\n4' | sed -n '1~2h;2~2{p;x;p}'
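For anyone who would rather see the result before running it, here is a minimal walk-through of the command above. It assumes GNU sed, since the `first~step` address form is a GNU extension and may not exist in other sed implementations. The variable name `swapped` is only for illustration.

```shell
# Assumes GNU sed: 1~2 matches lines 1,3,5,... and 2~2 matches lines 2,4,6,...
# Odd lines:  h  copies the line into the hold space (nothing is printed because of -n).
# Even lines: p  prints the even line,
#             x  swaps pattern and hold space, so the pattern space now has the odd line,
#             p  prints that odd line.
swapped=$(printf '1\n2\n3\n4\n' | sed -n '1~2h;2~2{p;x;p}')
printf '%s\n' "$swapped"
# prints 2, 1, 4, 3 on separate lines: each pair of lines is swapped
```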

Do not be confused, just do not use them. For anything other than simple substitutions on a single line you should be using awk, not sed. Hold spaces, pattern spaces, and 95% of the sed language constructs were invented before awk when there was no better alternative. They became obsolete as soon as awk was invented in the late 1970s and are only kept alive today by people who enjoy solving problems using sed's arcane syntax rather than doing it simply and clearly in awk. If you are using more than s, g, and p (with -n) in sed then you are almost certainly using the wrong tool.

@Ed Morton: awk works with structured data (each line has the same structure). Sed is meant to work with raw, unstructured data. So you can't simply use awk instead of sed.

I strongly recommend reading info sed. It is much more detailed than the bare man page.

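If you want to examine a complicated expression, you can use the l command. It prints the current pattern space in an unambiguous form (embedded newlines appear as \n and the end of the pattern space is marked with $).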

January

When sed reads a file line by line, the line that has just been read is placed into the pattern buffer (pattern space). The pattern buffer is a temporary scratchpad where the current line is stored. When you tell sed to print, it prints the pattern buffer.

Hold buffer / hold space is like a long-term storage, such that you can catch something, store it and reuse it later when sed is processing another line. You do not directly process the hold space, instead, you need to copy it or append to the pattern space if you want to do something with it. For example, the print command p prints the pattern space only. Likewise, s operates on the pattern space.
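A minimal sketch of that idea, with made-up input (`alpha`, `beta`, `gamma`) and a variable name (`labelled`) that are purely illustrative: the first line is parked in the hold space, and since s and p only see the pattern space, x is used to bring the held line back before editing and printing it.

```shell
# h on line 1 stores "alpha" in the hold space; the lines in between are ignored.
# On the last line ($): x brings the held line into the pattern space so that
# s and p can act on it, then a second x swaps back to handle the last line itself.
labelled=$(printf 'alpha\nbeta\ngamma\n' |
  sed -n '1h;${x;s/^/first: /;p;x;s/^/last: /;p;}')
printf '%s\n' "$labelled"
# prints:
# first: alpha
# last: gamma
```

The same result could be had with G instead of x; x is used here only because it makes the round trip between the two buffers explicit.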

Here is an example:

sed -n '1!G;h;$p'

(the -n option suppresses automatic printing of lines)

There are three commands here: 1!G, h and $p. 1!G has an address, 1 (first line), but the ! means that the command will be executed everywhere but on the first line. $p on the other hand will only be executed on the last line. So what happens is this:

1. first line is read and inserted automatically into the pattern space
2. on the first line, first command is not executed; h copies the first line into the hold space.
3. now the second line replaces whatever was in the pattern space
4. on the second line, first we execute G, appending the contents of the hold buffer to the pattern buffer, separating it by a newline. The pattern space now contains the second line, a newline, and the first line.
5. Then, h command inserts the concatenated contents of the pattern buffer into the hold space, which now holds the reversed lines two and one.
6. We proceed to line number three -- go to the point (3) above.

Finally, after the last line has been read and the hold space (containing all the previous lines in reverse order) has been appended to the pattern space, the pattern space is printed with p. As you have guessed, the above does exactly what the tac command does -- prints the file in reverse.

Do the G and h commands work like "cut and append"? It doesn't look like a "copy and append" operation.
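One way to check this is to dump the pattern space with l after each step. The sketch below suggests that both commands copy rather than move: the source buffer still has its contents afterwards. The one-line input and the variable name `trace` are only for illustration.

```shell
# h;l  -> after h the pattern space still holds "x" (copied, not moved)
# G;l  -> pattern space is now "x", a newline, and "x" from the hold space
# x;l  -> x swaps the buffers; the hold space still held the single "x" after G
trace=$(printf 'x\n' | sed -n 'h;l;G;l;x;l')
printf '%s\n' "$trace"
# prints:
# x$
# x\nx$
# x$
```

The only command in this family that moves text both ways is x, which exchanges the two buffers.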

What happens to the pattern and hold space when nested commands (curly braces) are used? '195,210{/add/p}'… is it possible to extract the last line of a group of lines involved in a pattern?

Marinos An

@Ed Morton: I disagree with you here. I found sed very useful and simple (once you grok the concept of the pattern and hold buffers) to come up with an elegant way to do multiline grepping.

For example, let's take a text file that has hostnames and some information about each host, with lots of junk in between that I don't care about.

Host: foo1
some junk, doesnt matter
some junk, doesnt matter
Info: about foo1 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter
Info: a second line about foo1 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter
Host: foo2
some junk, doesnt matter
Info: about foo2 that I really care about!!
some junk, doesnt matter
some junk, doesnt matter

To me, an awk script to just get the lines with the hostname and the corresponding info line would take a bit more than what I'm able to do with sed:

sed -n '/Host:/{h}; /Info/{x;p;x;p;}' myfile.txt

output looks like:

Host: foo1
Info: about foo1 that I really care about!!
Host: foo1
Info: a second line about foo1 that I really care about!!
Host: foo2
Info: about foo2 that I really care about!!

(Note that Host: foo1 appears twice in the output.)

Explanation:

-n disables output unless explicitly printed.
First match: finds and puts the Host: line into the hold buffer (h).
Second match: finds the next Info: line, but first exchanges (x) the current line in the pattern buffer with the hold buffer and prints (p) the Host: line, then re-exchanges (x) and prints (p) the Info: line.

Yes, this is a simplistic example, but I suspect this is a common issue that was quickly dealt with by a simple sed one-liner. For much more complex tasks, such as ones in which you cannot rely on a given, predictable sequence, awk may be better suited.

In this case though you could just use grep: grep 'Host\|Info'

If there are two Info lines after a given Host, then @JensJenson wants both Info lines to be preceded by the Host line. I think I'll edit the answer accordingly. @Pithikos, grep will not suffice then.

@JensJenson, the awk equivalent of your sed code is pretty short too: awk '/Host:/{hold=$0}; /Info/{print hold; print;}' myfile.txt
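To compare the two side by side, here is a small sketch that runs both one-liners on a shortened copy of the sample file. The variable names (`input`, `sed_out`, `awk_out`) are illustrative, and the input is trimmed from the example above rather than read from myfile.txt.

```shell
# Sample input in the same shape as the example above (shortened).
input='Host: foo1
some junk, doesnt matter
Info: about foo1 that I really care about!!
some junk, doesnt matter
Info: a second line about foo1 that I really care about!!
Host: foo2
some junk, doesnt matter
Info: about foo2 that I really care about!!'

# sed: remember the latest Host line in the hold space, print it before each Info line.
sed_out=$(printf '%s\n' "$input" | sed -n '/Host:/{h;}; /Info/{x;p;x;p;}')

# awk: the variable "hold" plays the role of the hold space.
awk_out=$(printf '%s\n' "$input" | awk '/Host:/{hold=$0} /Info/{print hold; print}')

printf '%s\n' "$sed_out"
[ "$sed_out" = "$awk_out" ] && echo "sed and awk agree"
```

In both versions the Host line is stored once and printed before every following Info line, which is why Host: foo1 shows up twice.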

Julian Mehnle

Although @January's answer and the example are nice, the explanation was not enough for me. I had to search and learn a lot until I managed to understand how exactly sed -n '1!G;h;$p' works. So I'd like to elaborate on the command for someone like me.

First of all, let's see what the command does.

$ echo {a..d} | tr ' ' '\n' # Prints from 'a' to 'd' in each line
a
b
c
d
$ echo {a..d} | tr ' ' '\n' | sed -n '1!G;h;$p'
d
c
b
a

It reverses the input like the tac command does.

sed reads line by line, so let's see what happens to the pattern space and the hold space at each line. Because the h command copies the contents of the pattern space to the hold space, both spaces contain the same text once each line has been processed.

Read line    Pattern Space / Hold Space    Command executed
-----------------------------------------------------------
a            a$                            h
b            b\na$                         1!G;h
c            c\nb\na$                      1!G;h
d            d\nc\nb\na$                   1!G;h;$p

At the last line, $p prints d\nc\nb\na$ which is formatted to

d
c
b
a

If you want to see the pattern space for each line, you can add an l command.

$ echo {a..d} | tr ' ' '\n' | sed -n '1!G;h;l;$p'
a$
b\na$
c\nb\na$
d\nc\nb\na$
d
c
b
a

I found it very helpful to watch this video tutorial, Understanding how sed works, as the guy shows how each space is used step by step. The hold space is covered in the 4th tutorial, but I recommend watching all the videos if you are not familiar with sed.

Also, the GNU sed documentation and Bruce Barnett's Sed tutorial are very good references.

I think it will also be helpful to mention that hold space for all practical purposes is empty unless we add something to it.
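A quick way to see this, as a small sketch (the variable name `spaced` is just for illustration): with nothing ever stored, G appends a newline followed by the empty hold space, so each line is followed by a blank line.

```shell
# The hold space starts out empty, so G appends a newline followed by nothing:
# every input line comes out followed by a blank line.
spaced=$(printf 'a\nb\n' | sed 'G')
printf '%s\n' "$spaced"
# prints a, an empty line, then b
# (the final empty line is dropped by the command substitution)
```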
