Tuesday, April 29, 2008

Bash Tip: Reverse Sorting Lists in the Shell

Every once in a while I check my site logs and find a common search phrase in referrals from search engines. Often the visitors appear to leave immediately, presumably because the page they landed on didn't answer their question.

A phrase that has been appearing quite frequently lately is "bash reverse sort list".

I can't tell exactly what they mean by their search query, so we'll take a couple of cracks at it.

My first thought is that they might be looking to reverse the output of the command line tool ls.

Say we have a directory and we see the following files when we run ls:


a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij


To get a simple reverse listing of those files, use the -r switch for ls. Typing ls -r in the same directory yields:


ij iii ii i hijk hij hi h defg de d cw cqw c be bd b ab aa a

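To follow along, you can recreate the example directory with touch. This is a sketch that assumes a throwaway directory from mktemp; the file names are the ones listed above. One caveat: when ls writes to a pipe or a file instead of a terminal, it prints one name per line, so running this as a script shows a vertical list rather than the horizontal one above.

```shell
#!/usr/bin/env bash
# Sketch: recreate the example files in a throwaway directory.
set -euo pipefail

demo_dir=$(mktemp -d)
cd "$demo_dir"

# touch creates empty files, like the 0-byte files in the long listing.
touch a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij

ls      # normal order
ls -r   # reverse order
```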

Having your file list reversed in a horizontal line isn't always useful when you are looking for a vertical list. It just takes a little bit of extra work to turn your list on its side if that's what you need.

First, we'll use the -l switch of ls to show the long listing for the files. Typing ls -lr gives us a reverse listing of our files in a vertical format.


total 0
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ij
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 iii
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ii
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 i
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hijk
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hij
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hi
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 h
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 defg
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 de
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 d
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 cw
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 cqw
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 c
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 be
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 bd
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 b
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ab
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 aa
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 a


That's a bit more verbose than what I expect we're looking for, so we'll need to employ a couple more tools to trim away the fat.

All we want is the last column of information (column 8, if you consider the columns delimited by spaces; if your ls shows dates in another format, such as "May 7 21:23", the file name lands in a different column). This is where the cut command comes in handy. It does exactly what the name implies, slicing and dicing data in several handy ways.

By default, the cut command treats data as fields separated by tabs. By sending the output of our ls -lr command to cut and changing the delimiter to a space with the -d switch, we can filter out all but the 8th field. This works because each column in this listing is separated by exactly one space; cut treats every single space as a delimiter, so padded columns would throw the count off. So far our command looks like this, ls -lr | cut -d" " -f8, and our output looks like this:



ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a


Almost perfect. However, you'll notice one small problem at the top of the list: there's an extra blank line. If you look at the original output of ls -lr, it quickly becomes clear where the blank line came from. The total 0 line had only two fields, total and 0, leaving nothing but a blank when cut went looking for the eighth field.
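Feeding cut the two kinds of line by hand shows the difference. This sketch uses two lines copied from the ls -lr output above, so it runs anywhere without the example directory.

```shell
#!/usr/bin/env bash
# Sketch: run cut on two lines taken from the ls -lr output above.
set -euo pipefail

# A file line: field 8 (space-delimited) is the name, so this prints "ij".
printf '%s\n' '-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ij' | cut -d" " -f8

# The "total 0" line has only two fields, so this prints an empty line.
printf '%s\n' 'total 0' | cut -d" " -f8
```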

It's not too difficult a job to clean this up with a little creative application of the grep command. We'll use grep's -v, or inverse match, switch (otherwise known as "show me everything but") with a pattern that matches a line with only a beginning, represented by the caret (^), and an end, represented by the dollar sign ($), with nothing in between: -v ^$.
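Here is the grep filter on its own, applied to a small made-up sample that starts with an empty line, just like the cut output does. The sketch quotes the pattern as '^$', which is a safe habit for regular expressions on the command line.

```shell
#!/usr/bin/env bash
# Sketch: drop empty lines from sample input that mimics the cut output.
set -euo pipefail

# The leading \n produces an empty first line; grep -v '^$' removes it,
# printing ij, iii, and ii.
printf '\nij\niii\nii\n' | grep -v '^$'
```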

Putting it all together, ls -lr | cut -d" " -f8 | grep -v ^$ removes the blank line from our vertical reverse sorted list of files.


ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a

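Here is the whole pipeline as a script you can run in a scratch directory. It is a sketch with two assumptions: it uses GNU ls, whose --time-style=long-iso option reproduces the "2007-05-07 21:23" date format shown above (many systems default to a three-field date like "May 7 21:23", which moves the file name to field 9), and it adds tr -s ' ' to squeeze runs of spaces so that padded columns don't create empty fields.

```shell
#!/usr/bin/env bash
# Sketch (assumes GNU ls): rerun the pipeline in a scratch directory.
set -euo pipefail

demo_dir=$(mktemp -d)
cd "$demo_dir"
touch a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij

# --time-style=long-iso prints dates as "YYYY-MM-DD HH:MM" (two fields),
# which keeps the file name in field 8 as in the text.
# tr -s ' ' squeezes runs of spaces so padded columns don't add empty
# fields; grep -v '^$' drops the blank line left by "total 0".
ls -lr --time-style=long-iso | tr -s ' ' | cut -d" " -f8 | grep -v '^$'
```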

Another list you might like to sort is one contained in a file. ls isn't going to help us with this one, but the sort command is here to help.

By default, sort compares whole lines rather than a single field (the -k switch selects a field to sort on), so a file with one item per line is sorted item by item. Take an example file (sort.txt) containing the following:


a
b
bd
hij
be
aa
cqw
ab
c
cw
d
de
iii
defg
h
hi
hijk
i
ii
ij

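If you want to try the sort examples yourself, this sketch writes the list above to sort.txt in the current directory using a here-document.

```shell
#!/usr/bin/env bash
# Sketch: create sort.txt with the unsorted example list.
set -euo pipefail

cat > sort.txt <<'EOF'
a
b
bd
hij
be
aa
cqw
ab
c
cw
d
de
iii
defg
h
hi
hijk
i
ii
ij
EOF

# Confirm the file holds all 20 items.
wc -l sort.txt
```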

So, running sort against sort.txt results in:


a
aa
ab
b
bd
be
c
cqw
cw
d
de
defg
h
hi
hij
hijk
i
ii
iii
ij


The sort command also offers a reverse sort option through the -r switch. Running sort -r against sort.txt (sort -r sort.txt) results in:


ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a

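A sketch of the same switch on inline data, so it runs without sort.txt. The second command goes slightly beyond the example file: for lines with more than one column, -k limits the sort key to one field (here field 2, on made-up "name count" lines), and -r still reverses the result.

```shell
#!/usr/bin/env bash
# Sketch: reverse sorting on inline sample data.
set -euo pipefail

# Whole-line reverse sort: prints c, b, ab, aa.
printf '%s\n' b aa c ab | sort -r

# Reverse sort on field 2 only: prints "aa 3", "be 2", "cw 1".
printf '%s\n' 'be 2' 'aa 3' 'cw 1' | sort -r -k2,2
```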

I hope this answers some of the basic questions about reverse sorting lists. For more information, check out the manual pages for ls and sort (man ls and man sort).

However, you might just want your list flipped on its head, with no sorting whatsoever. Say you have the list:


a
d
c
b


You want it to look like this:


b
c
d
a


As it turns out, there is a command for just that purpose called tac. Where cat writes the contents of a file to the screen (standard output), tac does the same but prints the lines in reverse order, last line first.
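Applied to the four-item list above, tac flips the order without sorting anything. tac is part of GNU coreutils; on BSD and macOS, tail -r reverses lines instead. The reverse_lines function below is a hypothetical wrapper (not a standard command) that picks whichever is available.

```shell
#!/usr/bin/env bash
# Sketch: reverse line order without sorting.
set -euo pipefail

# Hypothetical helper: use tac (GNU coreutils) if present, otherwise
# fall back to tail -r (BSD and macOS).
reverse_lines() {
  if command -v tac >/dev/null 2>&1; then
    tac "$@"
  else
    tail -r "$@"
  fi
}

# Prints b, c, d, a: the input order flipped, not sorted.
printf '%s\n' a d c b | reverse_lines
```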

Take the text of the 1st Amendment to the US Constitution, for example.


Congress shall make no law respecting an establishment of religion,
or prohibiting the free exercise thereof;
or abridging the freedom of speech,
or of the press;
or the right of the people peaceably to assemble,
and to petition the Government for a redress of grievances.


Running tac against these lines completely reverses their order:


and to petition the Government for a redress of grievances.
or the right of the people peaceably to assemble,
or of the press;
or abridging the freedom of speech,
or prohibiting the free exercise thereof;
Congress shall make no law respecting an establishment of religion,


If we had used sort instead, the lines would be rearranged alphabetically rather than simply flipped:


and to petition the Government for a redress of grievances.
Congress shall make no law respecting an establishment of religion,
or abridging the freedom of speech,
or of the press;
or prohibiting the free exercise thereof;
or the right of the people peaceably to assemble,


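The sort output above puts the "and ..." line ahead of the "Congress ..." line, which suggests it came from a locale whose collation does not put capitals first, such as a typical English UTF-8 locale. Because sort order depends on the locale, your result may differ. This sketch shows the byte-order result you get in the C locale.

```shell
#!/usr/bin/env bash
# Sketch: LC_ALL=C makes sort compare raw bytes, and uppercase ASCII
# letters ('C' is 0x43) sort before lowercase ones ('a' is 0x61).
set -euo pipefail

printf '%s\n' \
  'or of the press;' \
  'and to petition the Government for a redress of grievances.' \
  'Congress shall make no law respecting an establishment of religion,' |
  LC_ALL=C sort
# Byte order puts the "Congress ..." line first, then "and ...", then "or ...".
```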

If your list isn't vertical, with items separated by newlines, you can use tac's -s switch (similar to cut's -d switch) to specify a different separator.
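For example, to reverse a comma-separated list with GNU tac, pass the comma to -s. tac treats the separator as the end of each record, so this sketch ends the input with a comma; without it, the last item would run into the next one in the output.

```shell
#!/usr/bin/env bash
# Sketch (GNU tac): reverse comma-separated items.
set -euo pipefail

# Records are "a,", "b,", "c,"; tac prints them last first: c,b,a,
printf 'a,b,c,' | tac -s ,
echo   # the output has no trailing newline, so add one
```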

Update 1: A helpful reader pointed out that the ls examples could be much shorter with the -1 switch, which tells the standard ls command to print one file per line. Combined with the reverse (-r) switch, it gives us a reverse list of files in a vertical rather than the standard horizontal layout.

In the end, just typing


ls -1r


will result in this list of files


a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij


being printed like this


ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a

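To check this yourself, the sketch below runs ls -1r in a scratch directory of empty files (the mktemp location is an assumption). It also compares the result with ls -r piped through cat: when its output is not a terminal, ls already prints one name per line, so -1 mostly matters when you are typing at an interactive prompt.

```shell
#!/usr/bin/env bash
# Sketch: compare ls -1r with ls -r writing into a pipe.
set -euo pipefail

demo_dir=$(mktemp -d)
cd "$demo_dir"
touch a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij

ls -1r

# Piped through cat, ls -r is already one name per line, so the two match.
diff <(ls -1r) <(ls -r | cat) && echo "same output"
```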

Update 2: It turns out that I didn't cover how to reverse sort a horizontal line. Since the solution is a little long, you can find it in a separate post, Bash Tip: Reverse Sorting Lists Revisited; Reversing a Horizontal List.

--

I hope these tips help everyone out. If you want to improve your understanding of shell scripting, I highly recommend Unix Shell Programming (3rd edition) by Stephen Kochan and Patrick Wood. Kochan and Wood do a very thorough job (using plenty of code examples) exploring various aspects of essential shell scripting tools and techniques.

Bash Tip: Extracting a Range of Lines with sed

A little puzzle came across my desk the other day. I was asked how we could pull a range of lines from a file.

Say the file has ten lines:


Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10


We want to extract only lines five through eight from the file.

This isn't a hard problem, and there are probably a thousand ways to do this. However, I chose to use a simple sed command.

By typing the following we can get the result we want.


sed -n '5,8p;8q' filename


This command tells sed not to print lines by default (-n), to print lines 5 through 8 (5,8p), and then to quit at line 8 (8q). Because sed stops reading as soon as it reaches line 8, this simple construct works equally well with very large files and wider ranges of lines. Your output will look something like this:


Line 5
Line 6
Line 7
Line 8

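To reproduce the example, this sketch generates the ten-line file (the name lines.txt is a placeholder) and extracts lines five through eight. For comparison it also shows one of those thousand other ways, a head and tail pipeline.

```shell
#!/usr/bin/env bash
# Sketch: build the ten-line sample file, then print lines 5-8 two ways.
# lines.txt is a placeholder name.
set -euo pipefail

for n in {1..10}; do
  echo "Line $n"
done > lines.txt

# -n turns off automatic printing, 5,8p prints lines 5 through 8,
# and 8q quits after line 8 so sed never reads the rest of the file.
sed -n '5,8p;8q' lines.txt

# For comparison: keep the first 8 lines, then the last 4 of those.
head -n 8 lines.txt | tail -n 4
```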

It also works for extracting a single line if you change the command to look something like this:


sed -n '5p;5q' filename


The above line prints the single line (in this case "Line 5") and then stops processing the file.
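If you extract ranges often, a small shell function can take the line numbers as arguments. The extract_lines function below is a hypothetical helper, not a standard command. Note the double quotes around the sed script, which let the shell substitute the variables (single quotes would pass ${start} to sed literally). It does no checking that the arguments are numbers or that the start line is no greater than the end line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a standard command): print lines START through
# END of FILE. Double quotes let the shell expand the variables before sed
# sees the script. No validation is done on the arguments.
set -euo pipefail

extract_lines() {
  local start=$1 end=$2 file=$3
  sed -n "${start},${end}p;${end}q" "$file"
}

printf 'Line %s\n' 1 2 3 4 5 6 7 8 9 10 > lines.txt

extract_lines 5 8 lines.txt   # lines 5 through 8
extract_lines 5 5 lines.txt   # just line 5
```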

I recommend the books Classic Shell Scripting by Arnold Robbins and Nelson H.F. Beebe and O'Reilly's Sed and Awk (second edition) by Dale Dougherty and Arnold Robbins if you would like additional resources for learning how to effectively leverage the sed utility.

How do you extract lines from a file? What tools and techniques work for you?