Wednesday, May 7, 2008

Bash Tip: Finding a Line and the One Following it

I was recently asked the following question:
I need to identify lines containing a string in a file and extract that line and
the next line from the file. There might be multiple occurrences in the file.

In this example file I need to scan for "gottoget" and then extract lines 3
and 4 as well as lines 6 and 7.

Example file:
line 1
line 2
gottoget line 3
want this line as well line 4
line 5
gottoget line 6
ok this one must come with line 7
line 8
line 9
line 10

Hope you can help.


I think this puzzle is easily solved through the use of grep's -A flag.

According to the man page for grep (man grep), the -A NUM flag prints NUM lines of trailing context after each matching line. It sounds like grep -A 1 gottoget examplefile should do the trick. This command grabs each line containing the string we're looking for ("gottoget" in your example) plus the one line immediately after it. If we set grep up this way, we get the following:


gottoget line 3
want this line as well line 4
--
gottoget line 6
ok this one must come with line 7


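The -- line is grep's group separator. It is printed between groups of output lines (a match plus its context) that aren't adjacent in the original file. If you don't want it, the separators are easily removed with another grep filter ( | grep -v '^--$'). That filter says to show everything except lines consisting of exactly two hyphens. Anchoring both ends matters: a bracket expression such as ^[--] matches any line that begins with a single hyphen, so it would also throw away legitimate data lines that start with -. If some of your data lines are themselves exactly --, grep -v can't tell them apart from the separators, and you may need to play around a bit. (GNU grep can also suppress the separator directly with --no-group-separator.)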
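To see why anchoring the pattern matters, here's a small sketch. It assumes bash and a grep that supports -A (GNU and BSD grep both do). It also uses a made-up file, sample.txt, whose second line legitimately starts with a hyphen. Inside brackets, [--] matches just one character, a hyphen, so ^[--] drops every line that begins with -. The pattern ^--$ drops only lines that are exactly --.

```shell
#!/usr/bin/env bash
# Compare two ways of stripping grep's "--" group separator.
# sample.txt is a made-up file for illustration only.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Line 2 is real data that happens to start with a hyphen.
printf '%s\n' \
  'gottoget line 1' \
  '-x this option line belongs to line 1' \
  'line 3' \
  'line 4' \
  'gottoget line 5' \
  'line 6' > "$tmp/sample.txt"

echo "Filtering with '^[--]' (drops every line starting with a hyphen):"
grep -A 1 gottoget "$tmp/sample.txt" | grep -v '^[--]'

echo "Filtering with '^--\$' (drops only the separator):"
grep -A 1 gottoget "$tmp/sample.txt" | grep -v '^--$'
```

With the first filter, the -x line disappears along with the separator. With the second, only the separator goes.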

Putting it all together, we get:


grep -A 1 gottoget examplefile | grep -v '^--$'


Giving us the cleaned up output of:


gottoget line 3
want this line as well line 4
gottoget line 6
ok this one must come with line 7
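To try this end to end, here's a minimal sketch that recreates the example file in a temporary directory and runs the pipeline. It assumes bash and a grep with -A (GNU and BSD grep both have it). The filter uses the anchored pattern '^--$', which removes only lines that are exactly --.

```shell
#!/usr/bin/env bash
# Recreate the example file from the question and run the pipeline.
# Assumes a grep that supports -A (GNU or BSD grep).
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Same ten lines as the reader's example file.
printf '%s\n' \
  'line 1' \
  'line 2' \
  'gottoget line 3' \
  'want this line as well line 4' \
  'line 5' \
  'gottoget line 6' \
  'ok this one must come with line 7' \
  'line 8' \
  'line 9' \
  'line 10' > examplefile

# Each match plus the line after it, minus the "--" separators.
grep -A 1 gottoget examplefile | grep -v '^--$'
```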


And that's it. A simple application of some grep statements provides the answer.
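Your data may contain lines that are exactly --, or your grep may lack -A (it's a common extension, not part of POSIX grep). In either case, one alternative is a small awk program that never prints a separator at all. This is a different technique from the grep pipeline, offered as a sketch for comparison. The helper name after_match is hypothetical, and the pattern is treated as an awk (extended) regular expression. Back-to-back matches behave like grep -A 1: every match is printed, plus one line after the last of them.

```shell
#!/usr/bin/env bash
# Print each line matching a pattern plus the line after it, using awk
# instead of grep -A, so no "--" separator is ever emitted.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'line 1' 'line 2' 'gottoget line 3' 'want this line as well line 4' \
  'line 5' 'gottoget line 6' 'ok this one must come with line 7' \
  'line 8' 'line 9' 'line 10' > "$tmp/examplefile"

# after_match PATTERN FILE  (hypothetical helper, for illustration)
# On a match, start a countdown of 2 (the match itself plus one more line).
# "c && c--" prints the current line while the countdown is still positive.
after_match() {
  awk -v pat="$1" '$0 ~ pat { c = 2 } c && c--' "$2"
}

after_match gottoget "$tmp/examplefile"
```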

I highly recommend Unix Shell Programming (3rd edition) by Stephen Kochan and Patrick Wood if you are interested in improving your understanding of shell scripting. Kochan and Wood do a very thorough job (using plenty of code examples) exploring various aspects of essential shell scripting tools and techniques.

Post a comment if you have a different or better way of handling this puzzle.

Take care.
