Tuesday, September 23, 2008

Bash Tip: Reverse Sorting Lists Revisited; Reversing a Horizontal List

A while back, I noticed that I was getting a lot of hits to my site from search engines like Google for queries such as "bash reverse sort list." So I decided to write a post about various ways to reverse a list using bash's tools. If my hit counter is any indicator, it is the most popular post I have written yet, with thirty of the last one hundred hits entering on that page.

Recently, I was messing around with a data problem and realized that I needed to reverse a horizontal list. "No problem," I thought, "I'll just use one of those techniques I talked about before." Unfortunately, I didn't cover that case.

On Linux, this is pretty simple, I found, if you have access to the rev command.

Say we have input in the form of the variable named testingHZsort that looks something like this:


bash $ testingHZsort="a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij"
bash $
bash $ echo ${testingHZsort}
a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij
bash $


Now, we can't just hand the variable to rev as an argument. rev treats its arguments as file names, so once the shell expands the variable, rev tries to open a file named after each element of testingHZsort and prints a "No such file or directory" error for every one. To get around this problem, we echo the variable and pipe the result to rev, which then reads it from standard input.


bash $ echo ${testingHZsort} | rev
ji iii ii i kjih jih ih h gfed ed d wc wqc c eb db b ba aa a


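Works like a charm, with one caveat: rev reverses every character on the line, so each element also comes out spelled backward (ij becomes ji, hijk becomes kjih). Only palindromic elements like a, aa, and iii survive unchanged.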
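
If the order of the elements should flip while each element stays intact, one option is a small loop in plain shell that needs no external utilities. This is only a minimal sketch: the function name reverse_words is made up for illustration, and it assumes the elements are separated by whitespace.

```bash
# Hypothetical helper: reverse the order of whitespace-separated words.
reverse_words() (
    # The body runs in a subshell, so set -f (no filename globbing) and the
    # variables below do not leak into the calling shell.
    set -f
    reversed=""
    for word in $1; do    # unquoted on purpose: split the list on whitespace
        # Prepend each word; add a space only once something is already there.
        reversed="${word}${reversed:+ }${reversed}"
    done
    printf '%s\n' "${reversed}"
)

testingHZsort="a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij"
reverse_words "${testingHZsort}"
```

Because each word is prepended to the result, the last element ends up first and nothing is spelled backward.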

The only problem is that you have to have access to the rev utility, which isn't available on the versions of Solaris that I have access to, so I had to go back to the drawing board.

On Solaris, I do have access to the sort utility, which works fine on vertical lists and includes a reverse (-r) option. Since my list is already in ascending order, a reverse sort gives the same result as reversing it. All I have to do is convert my horizontal data into vertical data, run the reverse sort, and then convert the vertical output back into horizontal output. Piping a couple of common utilities together should produce the desired result.

The first thing to do is convert the horizontal list into a vertical list. We can do this with the tr, or translate, utility by converting the spaces (' ') between testingHZsort's elements into newlines ('\n').


bash $ echo ${testingHZsort}
a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij
bash $
bash $ echo ${testingHZsort} | tr ' ' '\n'
a
aa
ab
b
bd
be
c
cqw
cw
d
de
defg
h
hi
hij
hijk
i
ii
iii
ij
bash $


Now we can pipe that output through sort with the reverse (-r) option.


bash $ echo ${testingHZsort} | tr ' ' '\n' | sort -r
ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a
bash $
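
Keep in mind that sort -r puts the lines in descending order rather than flipping whatever order they arrive in, so an unsorted list comes out sorted, not reversed. If the goal is only to flip the order and neither rev nor tac is available, a rough sketch with awk might look like the following. The function name flip_lines and the unsortedList variable are made up for this example, and LC_ALL=C is added only so the sort order does not depend on the locale.

```bash
# Hypothetical helper: print the lines from stdin last to first, without sorting.
flip_lines() {
    # Buffer every line by its line number, then walk the buffer backward.
    awk '{ line[NR] = $0 } END { for (i = NR; i > 0; i--) print line[i] }'
}

unsortedList="c a b"
echo ${unsortedList} | tr ' ' '\n' | LC_ALL=C sort -r   # descending sort: c, b, a (one per line)
echo ${unsortedList} | tr ' ' '\n' | flip_lines         # order flipped:   b, a, c (one per line)
```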


Then we reverse the previous translation and replace all the newlines (\n) with spaces (' ').


bash $ echo ${testingHZsort} | tr ' ' '\n' | sort -r | tr '\n' ' '
ij iii ii i hijk hij hi h defg de d cw cqw c be bd b ab aa a bash $


That leaves us with a little problem: the final newline has been converted into a space too, so our bash prompt (bash $) ends up on the same line as the output instead of returning to its proper position. We can fix that by appending a final newline to the output with printf.


bash $ echo ${testingHZsort} | tr ' ' '\n' | sort -r | tr '\n' ' '; printf "\n"
ij iii ii i hijk hij hi h defg de d cw cqw c be bd b ab aa a
bash $
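
The printf fix restores the newline, but the output still ends with a stray space before it. One way around that, sketched here under the assumption that the elements contain no spaces or glob characters such as * and ?, is to let the shell do the joining: an unquoted command substitution is split into words, and echo prints those words separated by single spaces followed by one newline. The function name reverse_sort_line is hypothetical, and LC_ALL=C is added only to pin the sort order.

```bash
# Hypothetical helper: reverse-sort a space-separated list onto one line.
reverse_sort_line() {
    # Both expansions are unquoted on purpose: the shell splits the sorted
    # lines into words, and echo joins them again with single spaces.
    echo $(echo $1 | tr ' ' '\n' | LC_ALL=C sort -r)
}

testingHZsort="a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij"
reverse_sort_line "${testingHZsort}"
```

This drops the second tr and the printf entirely, at the cost of relying on word splitting.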


What about horizontal lines of data that are separated by a delimiter other than a space? What about a comma-separated list?

That's easy, actually. The only difference from the earlier examples is that instead of translating spaces into newlines, we translate the new delimiter into newlines.

So, let's say that testingHZsort looks something like this:


bash $ testingHZsort="a,aa,ab,b,bd,be,c,cqw,cw,d,de,defg,h,hi,hij,hijk,i,ii,iii,ij"
bash $ echo ${testingHZsort}
a,aa,ab,b,bd,be,c,cqw,cw,d,de,defg,h,hi,hij,hijk,i,ii,iii,ij
bash $


We'll do the same sorting that we did before, but this time we'll replace all the commas (,) with newlines (\n).


bash $ echo ${testingHZsort} | tr ',' '\n' | sort -r | tr '\n' ','
ij,iii,ii,i,hijk,hij,hi,h,defg,de,d,cw,cqw,c,be,bd,b,ab,aa,a,bash $


We've lost our last newline again and we have an extra comma to deal with.

That's simple to clean up with a sed substitution that turns only the trailing comma into a newline character. We'll use the end-of-line anchor ($) to do that: we match a comma at the end of the line (,$) and replace it with a newline (\n).


bash $ echo ${testingHZsort} | tr ',' '\n' | sort -r | tr '\n' ',' | sed -e 's/,$/\n/'
ij,iii,ii,i,hijk,hij,hi,h,defg,de,d,cw,cqw,c,be,bd,b,ab,aa,a
bash $
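
One caution: the \n escape in a sed replacement is not portable. GNU sed accepts it, but other implementations may print a literal n instead. A sketch that sidesteps sed altogether is to let paste do the joining, since paste -s puts the delimiter only between the lines and ends its output with a newline. The function name reverse_sort_delimited is made up for this example, it assumes a single-character delimiter, and LC_ALL=C is added only so the sort order does not depend on the locale.

```bash
# Hypothetical helper: reverse-sort a list separated by a one-character delimiter.
#   $1 = delimiter (a single character, e.g. ',' or ' ')
#   $2 = the horizontal list
reverse_sort_delimited() {
    printf '%s\n' "$2" |
        tr "$1" '\n' |          # horizontal to vertical
        LC_ALL=C sort -r |      # reverse sort
        paste -s -d "$1" -      # vertical back to horizontal, no trailing delimiter
}

reverse_sort_delimited ',' "a,aa,ab,b,bd,be,c,cqw,cw,d,de,defg,h,hi,hij,hijk,i,ii,iii,ij"
```

The same function handles the space-separated case when called with ' ' as the delimiter.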


That's all there is to it.

If you would like to improve your bash scripting skills, you might want to consider picking up a copy of the Bash Cookbook. I highly recommend it. You can read my full review of it here.

cc photo credit: Man vyi

1 comment:

Anonymous said...

I'm a fan of using the IFS variable in combination with for loops to do this sort of thing. Eg:

# (IFS=" "; for i in $testingHZsort; do echo $i; done) | sort -r

And then put it back together however you like.