Tuesday, April 29, 2008

Bash Tip: Extracting a Range of Lines with sed

A little puzzle came across my desk the other day. I was asked how we could pull a range of lines from a file.

Say the file has ten lines:


Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10


We want to extract only lines five through eight from the file.

This isn't a hard problem, and there are probably a thousand ways to do this. However, I chose to use a simple sed command.

By typing the following we can get the result we want.


sed -n '5,8p;8q' filename


This command tells sed to suppress its default output (-n), print lines 5 through 8 (5,8p), and quit as soon as it reaches line 8 (8q). Because sed stops at line 8, the rest of the file is never read, so the construct works well on very large files, and it handles wider ranges just as easily. Your output will look something like this:


Line 5
Line 6
Line 7
Line 8
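
If the range changes often, the same command can be wrapped in a small shell function. The following is only a sketch: the function name extract_lines and its argument order are made up for illustration, and the sample file is generated with awk so the example is self-contained.

```shell
# Hypothetical helper (not from the original post):
# print lines START through END of FILE.
extract_lines() {
    start=$1
    end=$2
    file=$3
    # -n suppresses default output, "START,ENDp" prints the range,
    # and "ENDq" quits so the rest of the file is not read.
    sed -n "${start},${end}p;${end}q" "$file"
}

# Build the ten-line sample file in a temporary location.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "Line " i }' > "$sample"

# Prints "Line 5" through "Line 8", one per line.
extract_lines 5 8 "$sample"

rm -f "$sample"
```

The script is in double quotes here so the shell can substitute the line numbers; the sketch assumes both arguments are plain positive integers.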


It also works for extracting single lines if you change the line to look something like this:


sed -n '5p;5q' filename


The above line prints the single line (in this case "Line 5") and then stops processing the file.
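
The address 5 appears twice in that command. As an alternative sketch (a variation, not the form used above), braces can group the print and quit commands under a single address. The variable names below are placeholders.

```shell
# Build a ten-line sample file in a temporary location.
sample=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 10; i++) print "Line " i }' > "$sample"

# Braces group "p" (print) and "q" (quit) under the single address 5.
single=$(sed -n '5{p;q;}' "$sample")
echo "$single"   # prints "Line 5"

rm -f "$sample"
```

The semicolon before the closing brace keeps the script portable to sed implementations that require it.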

If you would like additional resources for learning sed, I recommend Classic Shell Scripting by Arnold Robbins and Nelson H.F. Beebe and O'Reilly's sed & awk (second edition) by Dale Dougherty and Arnold Robbins.

How do you extract lines from a file? What tools and techniques work for you?

3 comments:

peter_k said...

Thanks :)

cbuckley said...

Just thought I'd add something to this, as it's prominent in a Google search.

sed is a really powerful tool but these examples are a little slow if you're dealing with large files. The quit command is great if you want lines 1-10 of a million-line file, but if you want lines 900001-900010 then sed has to read all the previous 900000 lines first.

In this instance a combination of tail and head was found to be much faster:

tail -n +900001 largefile | head -n 10
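
For an arbitrary range, the count passed to head is the end line minus the start line, plus one. A rough sketch of that arithmetic follows; the function name extract_lines_tail and the demo file are invented for illustration.

```shell
# Hypothetical helper: print lines START through END of FILE
# using tail and head instead of sed.
extract_lines_tail() {
    start=$1
    end=$2
    file=$3
    # Number of lines in the inclusive range START..END.
    count=$((end - start + 1))
    # "tail -n +N" begins output at line N; "head -n COUNT" stops after COUNT lines.
    tail -n "+${start}" "$file" | head -n "$count"
}

# Small demo file: 1000 numbered lines in a temporary location.
demo=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "Line " i }' > "$demo"

# Prints "Line 901" through "Line 910", one per line.
extract_lines_tail 901 910 "$demo"

rm -f "$demo"
```

How much faster this is than the sed version depends on the file and the system, so it is worth timing both on your own data.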

Jim said...

Thanks, cbuckley.

In my quick test, that looks like it works faster as well.