You should read Mike MacCana’s post "The 5 lines that mystified O’Reilly: how to use a spreadsheet in bash". NOTE: I took down the link as it is now parked. Venture Cake appears to be gone.
Given that I just reviewed the Bash Cookbook mentioned in Mike's post, I was really happy to find this elegant solution, one that doesn't appear in its pages, to the problem of processing comma-separated value (CSV) files with a simple bash script.
It isn't an all-purpose solution by any means, but it is a useful one if you need to expand on it. Some discussion of its finer points and failings can be found on the Hacker News post that alerted me to it in the first place.
Thursday, September 25, 2008
Tuesday, September 23, 2008
Bash Tip: Reverse Sorting Lists Revisited; Reversing a Horizontal List
A while back, I noticed that I was getting a lot of hits to my site from search engines like Google with queries containing the terms "bash reverse sort list." So I decided to write a post about various ways to reverse a list using bash's tools. If my hit counter is any indicator, it is the most popular post I have written yet, with thirty of the last one hundred hits entering on that page.
Recently, I was messing around with a data problem and realized that I needed to reverse a horizontal list. "No problem," I thought, "I'll just use one of those techniques I talked about before." Unfortunately, I didn't cover that case.
On Linux, this is pretty simple, I found, if you have access to the rev command.
Say we have input in the form of the variable named testingHZsort that looks something like this:
bash $ testingHZsort="a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij"
bash $
bash $ echo ${testingHZsort}
a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij
bash $
Now, we can't call rev on the variable directly; rev treats its arguments as filenames, so we'd get a bunch of "No such file or directory" errors as it tried to open files named after the individual elements of testingHZsort. To get around this problem, we just have to expand the variable with echo and pipe the result to rev.
bash $ echo ${testingHZsort} | rev
ji iii ii i kjih jih ih h gfed ed d wc wqc c eb db b ba aa a
Works like a charm, with one caveat: rev reverses characters, not words, so each element itself comes out backwards (ij becomes ji, hijk becomes kjih). If you need the elements intact, the sort-based approach below handles that, too.
The only problem is that you need the rev utility, which isn't available on the versions of Solaris I work with, so I had to go back to the drawing board.
On Solaris, I do have access to the sort utility, which works fine on vertical lists and includes a reverse (-r) option. All I have to do is convert my horizontal data into vertical data, execute the reverse sort, and then convert the vertical output back into horizontal output. Piping a couple of common utilities together should produce the desired result.
The first thing to do is convert the horizontal list into a vertical list. We can do this using the tr, or translate, utility to convert the spaces (' ') between testingHZsort's elements into newlines ('\n').
bash $ echo ${testingHZsort}
a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij
bash $
bash $ echo ${testingHZsort} | tr ' ' '\n'
a
aa
ab
b
bd
be
c
cqw
cw
d
de
defg
h
hi
hij
hijk
i
ii
iii
ij
bash $
Now we can pipe that output through sort with the reverse (-r) option.
bash $ echo ${testingHZsort} | tr ' ' '\n' | sort -r
ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a
bash $
Then we reverse the previous translation and replace all the newlines (\n) with spaces (' ').
bash $ echo ${testingHZsort} | tr ' ' '\n' | sort -r | tr '\n' ' '
ij iii ii i hijk hij hi h defg de d cw cqw c be bd b ab aa a bash $
That leaves us with the little problem of a final newline being converted into a space, so our bash prompt (bash $) doesn't return to its proper position. We can fix that by appending a final newline to the output with printf.
bash $ echo ${testingHZsort} | tr ' ' '\n' | sort -r | tr '\n' ' '; printf "\n"
ij iii ii i hijk hij hi h defg de d cw cqw c be bd b ab aa a
bash $
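As an aside, if your bash supports arrays and here-strings (recent Linux versions do; the older Solaris shells may not), the first tr can be skipped entirely. This is just a sketch of the same idea, and the array name words is my own:
bash $ read -ra words <<< "${testingHZsort}"
bash $ printf "%s\n" "${words[@]}" | sort -r | tr '\n' ' '; printf "\n"
ij iii ii i hijk hij hi h defg de d cw cqw c be bd b ab aa a
bash $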
What about horizontal lines of data that are separated by some other delimiter than a space? What about a comma separated list?
That's easy, actually. It is just a little different from the earlier examples: instead of translating spaces into newlines, we translate the new delimiter into newlines.
So, let's say that testingHZsort looks something like this:
bash $ testingHZsort="a,aa,ab,b,bd,be,c,cqw,cw,d,de,defg,h,hi,hij,hijk,i,ii,iii,ij"
bash $ echo ${testingHZsort}
a,aa,ab,b,bd,be,c,cqw,cw,d,de,defg,h,hi,hij,hijk,i,ii,iii,ij
bash $
We'll do the same sorting that we did before, but this time we'll replace all the commas (,) with newlines (\n).
bash $ echo ${testingHZsort} | tr ',' '\n' | sort -r | tr '\n' ','
ij,iii,ii,i,hijk,hij,hi,h,defg,de,d,cw,cqw,c,be,bd,b,ab,aa,a,bash $
We've lost our last newline again and we have an extra comma to deal with.
That's simple to clean up with a sed substitution that turns only the trailing comma into a newline character. We'll use the end-of-line anchor ($) to do that. So we match a comma at the end of the line (,$) and replace it with a newline (\n).
bash $ echo ${testingHZsort} | tr ',' '\n' | sort -r | tr '\n' ',' | sed -e 's/,$/\n/'
ij,iii,ii,i,hijk,hij,hi,h,defg,de,d,cw,cqw,c,be,bd,b,ab,aa,a
bash $
That's all there is to it.
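If you do this often, the whole pipeline wraps up nicely in a little function. This is my own sketch, not something from the Bash Cookbook; the name reverseList is made up, and it assumes the delimiter isn't a character that's special in a regular expression:
reverseList () {
    # Reverse sort a one-line list read from stdin.
    # $1 is the delimiter between elements; defaults to a space.
    local delim="${1:- }"
    tr "$delim" '\n' | sort -r | tr '\n' "$delim" | sed -e "s/$delim\$/\n/"
}
bash $ echo ${testingHZsort} | reverseList ','
ij,iii,ii,i,hijk,hij,hi,h,defg,de,d,cw,cqw,c,be,bd,b,ab,aa,a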
If you would like to improve your bash scripting skills, you might want to consider picking up a copy of the Bash Cookbook. I highly recommend it. You can read my full review of it here.
cc photo credit: Man vyi
Friday, August 15, 2008
Book Review: The Bash Cookbook
The Elevator Pitch
The Bash Cookbook is a must-own book for anyone who uses Unix and Linux for fun or profit. Bash is a powerful shell environment available in everything from Mac OS X to commercial Unix offerings like Solaris. Being comfortable and productive with this shell is going to make your life a helluva lot easier. The Bash Cookbook serves as a digestible tutor to this powerful shell while maintaining a depth that makes it a valuable reference for solutions to many of the common problems that command line power users face.
The Full Review
I've been a Unix user since my first days studying Computer Science at college in the early nineties. While coming from using MS-DOS in high school and being plunked in front of a terminal with a dollar ($) prompt probably wasn't as disorienting as a move from Windows might have been, it was still pretty confusing. I struggled through the first few years until I took a systems programming class and finally started to understand the big picture of Unix. Still, it wasn't until almost a decade later that I decided to really try to wrap my mind around the Unix command line, and more specifically the bash shell.
As a Unix administrator, I have now been using the shell environment professionally for over five years. Bash is my shell of choice and I use it to do everything from processing various system logs, to running assorted backups, to creating system monitors, to wrapping more complex commands into usable interfaces, to transforming data into more usable formats. To get to that point, I spent a lot of time reading books like Learning the Bash Shell, hanging out on the shell scripting forums at Unix.com and reading various sysadmin blogs. All that is to say that I think I have a good grasp of the Unix/Linux command line in general and the bash shell in particular.
Recently, I had the opportunity to read the Bash Cookbook. Of all the technical books that I read for personal and professional gain, I prefer the formats of both O'Reilly's Hacks series and its Cookbooks for how they cover common problems and solutions in various technical subjects. I find them easy to digest, as both formats generally break large technical topics into bite sized chunks that present problems and solutions in very thorough, but approachable, ways. I find myself flying through these books. After reading a couple of pages that cover a single hack or recipe, I generally feel like I have learned something versus having to slog through twenty or so chapter pages in a typical tech book.
Thankfully, The Bash Cookbook stands up with its predecessors. The authors, Carl Albing, JP Vossen and Cameron Newham (also an author of the aforementioned Learning the Bash Shell) have backgrounds ranging from general technologists and authors to software developers for the Cray supercomputer company. This multifaceted experience set serves them well as they tackle various bash scripting topics from the mundane to the puzzling to the downright arcane. All of this is done with an approachable style and format that first identifies a problem, then offers a generalized solution, and finally follows up with a detailed discussion of the problem and solutions. This approach helps identify both the reasoning behind their solutions and the corner cases that will either further inform your own implementations or warn you that here be dragons.
The Bash Cookbook is divided into nineteen chapters and five appendices, a few of which (most notably "Appendix D: Revision Control") could have served as full-on chapters by themselves. Topics include getting started with bash on various platforms (chapter 1); dealing with the intricacies of standard input and output redirection (chapters 2 and 3); job control (chapter 4); shell variables and arithmetic (chapters 5 and 6); finding and manipulating data (chapters 7, 8, and 9); working with functions and trapping conditions (chapter 10); manipulating dates and time (chapter 11); wrapping complex tasks (chapter 12); parsing files (chapter 13); writing scripts securely (chapter 14); bash corner cases (chapter 15); customizing the bash environment (chapter 16); common system administration tasks (chapter 17); bash tips to be more productive (chapter 18); and, finally, common traps and workarounds for novice bash scripters (chapter 19). As you can see, there is a wealth of information to be had between the covers of this book.
I found useful information from the beginning chapters (which are often throw away generalized instructions for getting up to speed in most tech books) all the way to the appendices themselves. Some standout recipes from the book include:
- 3.7 Selecting from a List of Options
- 5.2 Embedding Documentation in Shell Scripts
- 5.17 Giving an Error Message for Unset Parameters
- 5.19 Using Array Variables
- 7.15 Showing Data As a Quick and Easy Histogram
- 8.3 Sorting IP Addresses
- 9.9 Finding Files by Content
- 10.6 Trapping Interrupts
- 13.4 Parsing Output into an Array
- 13.12 Isolating Specific Fields in Data
- 15.10 Finding My IP Address
- 15.13 Working Around "argument list too long" Errors
- 15.15 Sending Email from Your Script
- 16.4 Change your $PATH Temporarily
- 17.1 Renaming Many Files
- 17.8 Capturing File Metadata for Recovery
- 17.13 Prepending Data to a File
- 17.16 Finding Lines in One File But Not in the Other
- 17.17 Keeping the Most Recent N Objects
- 19.11 Seeing Odd Behavior from printf
The writing in the Cookbook is clear and to the point and incredibly consistent given that it was written by three writers. This is either a testament to the writing team and their ability to assimilate each other's styles or to O'Reilly's editorial staff's ability to tie the whole thing together (or, I assume, both). I particularly enjoyed the in depth discussion that many recipes received. It had the feel of looking over the shoulder of a veteran Unix admin and having the chance to pick his brain about why he was making the choices he was and why he was going about his business in a particular way. That is the book's greatest strength. As someone who has had to pick up Unix and Linux skills largely on his own, I found this approach invaluable. If you aren't surrounded by a Unix culture, it can be hard to pick up some of the more useful, but more complex, tricks of the trade. Think of The Bash Cookbook as your grey beard Unix hacker mentor on a shelf.
The book and its Table of Contents and Index are so comprehensive with regard to the common types of tasks one generally performs while writing shell scripts that it has become, in the short time that I have had it, my first (and usually last) go-to reference. If I forget how to search for keywords in files across directories, for instance, a quick scan of the Index turns up a very good, working answer. I use this book so much that I am considering buying a second copy to keep at home so I don't have to haul my dog-eared version back and forth to work. It is that useful.
Really.
Some Nits to Pick
As with any large project such as a book, there are bound to be a few things that slip through the cracks. The Bash Cookbook is no different. For instance, recipe 6.6 talks about the different ways to check for equality in bash, including the use of the single equals (=) or double equals (==) signs. Functionally these two constructs are exactly the same, but using the single equals is more portable as it follows the POSIX standard. That's fine, and very good to know. However, the use of these constructs isn't consistent in the book, which could lead to confusion, as the explanation that it really doesn't matter doesn't come until page sixty-four. Even worse, much earlier in the book, recipe 3.7 isn't even consistent within a single script: the variable $directory is checked for equality with the string "Finished" on one line with the double equals construct (==) and on another with the single equals construct (=). From a script maintainability perspective, this kind of inconsistency is a bad idea.
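For illustration (this snippet is mine, not the book's), both tests below behave identically in bash, but only the first follows POSIX:
[ "$directory" = "Finished" ] && echo "done"    # portable POSIX form
[ "$directory" == "Finished" ] && echo "done"   # bash-only form, same result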
Another problem is the thin treatment of arrays in bash. Most people unfamiliar with bash don't even realize that simple single-dimensional arrays are available in the environment, so I was happy to see some recipes that covered this topic. However, some of the more powerful array manipulation techniques, such as finding the number of elements in an array with the simple ${#array_name[@]} construct or the length of an individual array item with the ${#array_name[index number]} construct (covered in the discussion of recipe 13.4, "Parsing Output into an Array"), are buried in other recipes and hard to find even with the index. This could be helped if the See Also sections of each recipe pointed to other recipes in the book that deal with similar subjects. Recipe 5.19, "Using Array Variables," only points to a section of the O'Reilly book Learning the Bash Shell. Other recipes in the book do a fine job of pointing out external sources of information as well as other recipes, so I think this is just a matter of editorial consistency that needs to be beefed up for the next edition.
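For the record, a quick illustration of those two constructs (my own example, not a recipe from the book):
bash $ fruits=(apple banana cherry)
bash $ echo ${#fruits[@]}   # number of elements in the array
3
bash $ echo ${#fruits[1]}   # length of the element at index 1, "banana"
6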
The authors make a conscious effort to stick with core bash tools throughout the text; as they note in the "Preface," Perl is covered elsewhere. They do say they are okay with using the right tool for the job, and sometimes they tell you when it is best to use something else... much better than having the reader beat their head against a wall, in my opinion. This is a book about bash, after all, and it would be maddening if the solutions switched to non-bash approaches whenever something wasn't readily solvable with the bash tool set. Unfortunately, there are times when they overlook common utilities in favor of heavier tools like sed and awk. The prime example is recipe 7.10, "Keeping Some Output, Discarding the Rest," where they use awk to solve the problem. Sure, awk works, but I would have preferred that they at least mention the cut utility in this context, if only for comparison's sake. They should have at least linked to recipe 8.4, "Cutting Out Parts of Your Output," and recipe 13.12, "Isolating Specific Fields in Data." Again, this is just a matter of editorial consistency and one of only a few places where the fact that the book was written by multiple authors becomes mildly apparent.
Some other editorial issues revolve around typos and similar minor errata. On page 84, "Thought" in the third Discussion paragraph should be "Though." On page 64, the comment (after the #) at the end of the script states "end of while not finished," which can be confusing as the loop construct is actually an until statement. Page 207 should read "what" instead of "hat" in the first full sentence of the page, and similarly "fpllowing" on page 233 should be "following." For a book this long (622 pages), that's not bad at all. There may be others, but they weren't obvious during my reading. For a technical book in its first edition, I was happy with the overall quality of the material.
My final suggestion is the inclusion of sample input and output for the scripts. Many recipes give these types of examples, which make it endlessly easier to understand exactly what the scripts are doing, but this isn't consistent throughout the book, and I am not sure what editorial decision led to leaving them out of the scripts that don't have them. My personal opinion is that there should be input and output examples for every recipe in the Bash Cookbook. I liken it to one of my favorite cooking guides, Cook's Illustrated, whose pictures often clear up any confusion about preparations that the text of the recipe may have missed. I think the same holds for sample input and output for the tech recipes of the Bash Cookbook. Every recipe, in my opinion, should have these examples, even if they are only available from O'Reilly's website.
Conclusion
Nitpicks and suggestions aside, this is a great bash scripting resource and should find a good home on any scripter's bookshelf. It provides enough instruction to help a new-ish user understand the deeper power of bash scripting while having enough breadth and depth to serve as an invaluable resource for the experienced scripting guru.
Book Information
Title: Bash Cookbook
Authors: Carl Albing, JP Vossen & Cameron Newham
Paperback: 622 pages
Publisher: O'Reilly Media, Inc., 1st edition (May 24, 2007)
ISBN: 0596526784
Tuesday, May 20, 2008
Python Tip: Checking to see if your Python installation has support for SSL
I was trying to figure out whether my installation of Python was compiled with SSL support and found the answer to be non-obvious if you didn't compile Python yourself.
So, to check if you have SSL support configured with your installation of Python, go to your command prompt and type:
python
and you'll get the Python interactive shell (that will look something like this):
Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
At the >>> prompt, type import socket:
Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
Then check for the ssl attribute, by typing hasattr(socket, "ssl") at the >>> prompt and look for a True or False response:
Python 2.5.2 (r252:60911, Apr 21 2008, 11:12:42)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> hasattr(socket, "ssl")
True
>>>
A True response means that SSL support is compiled into your Python installation.
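The same check also works as a shell one-liner, assuming the python binary is on your PATH (this is my own shortcut, equivalent to the interactive session above):
python -c 'import socket; print hasattr(socket, "ssl")'
True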
Good luck.
If you want some good books on learning to use Python, I highly recommend Beginning Python: From Novice to Professional by Magnus Hetland and Learning Python by Mark Lutz and David Ascher. I am currently using both books to get up to speed on the Python language and I am really enjoying working with both of them.
Labels:
configuration,
programming,
python,
scripting,
ssl,
tips
Wednesday, May 7, 2008
Bash Tip: Proving a Negative with grep and diff
I stumbled across an interesting problem a while back: given a set of data, how do you identify the correlated data that isn't there?
I was given a file containing a list of names and some other lines that indicated a successful condition. If a name on one line was followed by a success statement, then the condition was successful for the previous line. However, if a name was followed by another name, then the condition had failed for the first name.
Confused already? Let's look at an example and see if that clears things up. Say we have a file, tally.txt, whose contents look something like this:
Doug
Jim
voteSuccess
Diana
voteSuccess
Thomas
voteSuccess
Drew
Elizabeth
Chris
Adrienne
voteSuccess
Nicholas
voteSuccess
Anita
Greg
Jacob
Trudy
voteSuccess
Alex
voteSuccess
Richard
voteSuccess
Donald
Sam
Steve
Bob
Nathan
voteSuccess
Penelope
voteSuccess
Bishop
voteSuccess
Dustin
voteSuccess
Ron
George
Henry
voteSuccess
Arthur
Reggie
voteSuccess
Here is the same file with line numbers added to make things clearer:
1 Doug
2 Jim
3 voteSuccess
4 Diana
5 voteSuccess
6 Thomas
7 voteSuccess
8 Drew
9 Elizabeth
10 Chris
11 Adrienne
12 voteSuccess
13 Nicholas
14 voteSuccess
15 Anita
16 Greg
17 Jacob
18 Trudy
19 voteSuccess
20 Alex
21 voteSuccess
22 Richard
23 voteSuccess
24 Donald
25 Sam
26 Steve
27 Bob
28 Nathan
29 voteSuccess
30 Penelope
31 voteSuccess
32 Bishop
33 voteSuccess
34 Dustin
35 voteSuccess
36 Ron
37 George
38 Henry
39 voteSuccess
40 Arthur
41 Reggie
42 voteSuccess
Lines 3, 5, 7, 12, 14, 19, 21, 23, 29, 31, 33, 35, 39, and 42 all indicate a success condition (voteSuccess). By the conventions of the file, that means that the people on the preceding lines actually had the success (lines 2, 4, 6, 11, 13, 18, 20, 22, 28, 30, 32, 34, 38, and 41, respectively). The problem is that we want to find out who wasn't able to successfully vote. We need to find some way to extract those that voted successfully from the file and only leave those that weren't able to vote.
It should be noted that I simplified this example quite a bit. The success condition string (voteSuccess) could actually be one of a host of things, so it is not just one known string that we can work against, but it is good enough for this exercise.
Of course, this whole situation would be a lot easier if the program that created this file placed some sort of failure indication after the names of the people who didn't succeed in voting. Unfortunately, we're often stuck with the formats we're given and have to find a way to make them work.
After a little thought, I came up with a pseudo-algorithm that I thought might solve the problem:
- Correlate all the success conditions with the appropriate people.
- Strip these into a file of successful voters sorted alphabetically.
- Strip out all the status messages, leaving only voters, and sort them alphabetically into another file of all voters.
- Check the difference between the files. As the successful voters will be in both files, only those that failed will be different.
For step one, we'll use a somewhat current version of the grep utility (the one I used was 2.5.1; you can find your version by typing grep -V) to print all the lines of tally.txt that contain voteSuccess along with the line above each of them. The -B switch tells grep to print the given number of lines before each match. Typing:
grep -B 1 voteSuccess tally.txt
You'll notice that the -B switch prints -- between separate groups of matches.
Jim
voteSuccess
Diana
voteSuccess
Thomas
voteSuccess
--
Adrienne
voteSuccess
Nicholas
voteSuccess
--
Trudy
voteSuccess
Alex
voteSuccess
Richard
voteSuccess
--
Nathan
voteSuccess
Penelope
voteSuccess
Bishop
voteSuccess
Dustin
voteSuccess
--
Henry
voteSuccess
--
Reggie
voteSuccess
We'll clean those out by piping the output through an inverted grep match.
grep -B 1 voteSuccess tally.txt | grep -v ^[--]
Now we'll clean out the voteSuccess condition statements and sort the output.
grep -B 1 voteSuccess tally.txt | grep -v ^[--] | grep -v voteSuccess | sort
Our output from the first command sequence looks like this:
Adrienne
Alex
Bishop
Diana
Dustin
Henry
Jim
Nathan
Nicholas
Penelope
Reggie
Richard
Thomas
Trudy
Now that we have a list of the successful voters, let's redirect it to the file, successfulvoters.txt, that we'll later use to ferret out the failed voters.
grep -B 1 voteSuccess tally.txt | grep -v ^[--] | grep -v voteSuccess | sort > successfulvoters.txt
Next, we need to pull together a sorted list of all voters. This is pretty easy: all we have to do is an inverted search for the term voteSuccess. The only things left will be the names of all the voters, which we can sort and redirect into the file allvoters.txt.
grep -v voteSuccess tally.txt | sort > allvoters.txt
Finally, we'll compare the successfulvoters.txt and allvoters.txt files using the diff utility. As diff can be verbose, we'll ask it to output an ed (line editor) script by employing the -e switch. These script instructions will highlight what needs to happen to the successfulvoters.txt file in order to make it look like the allvoters.txt file... which is to add back all the failed voters.
The very failed voters that we were trying to figure out how to isolate.
Let's compare files:
diff -e successfulvoters.txt allvoters.txt
That gives us the following:
12a
Ron
Sam
Steve
.
6a
Jacob
.
5a
Elizabeth
George
Greg
.
4a
Donald
Doug
Drew
.
3a
Bob
Chris
.
2a
Anita
Arthur
.
If you look at this output upside down, you can follow along in the successfulvoters.txt file and see where these names would be inserted in order to make a complete list of voters. If you can't do it mentally, I've flipped the output here:
.
Arthur
Anita
2a
.
Chris
Bob
3a
.
Drew
Doug
Donald
4a
.
Greg
George
Elizabeth
5a
.
Jacob
6a
.
Steve
Sam
Ron
12a
However, we just need the names and not the commands for the ed utility. If we get rid of every line that starts with a number (our voter names don't start with numbers) and every line containing a period (.), and then run the result through sort, we should have an alphabetical list of the people who couldn't successfully vote.
To get rid of any line that begins with a number, we'll do an inverted search on the output of the diff command. The expression ^[[:digit:]] uses the caret (^) character to anchor the match to the start of the line and the shorthand expression [[:digit:]] to match any digit. That just leaves us to contend with the periods, which we can remove by piping this output into yet another inverted grep search that returns every line that doesn't contain one, [.]. Then we sort the output to make it more usable.
diff -e successfulvoters.txt allvoters.txt | grep -v ^[[:digit:]] | grep -v [.] | sort
And that's that!
We can put it all together in a quick and dirty bash script that will parse out the failed voters given a file name to process.
#!/bin/bash
# Take the filename from the command line and stuff it into a variable
TALLY="$1"
# Find the successful voters
# (the patterns are quoted so the shell can't glob them against filenames)
grep -B 1 voteSuccess "$TALLY" | grep -v '^[--]' | grep -v voteSuccess | sort > successfulvoters.txt
# Find all the voters
grep -v voteSuccess "$TALLY" | sort > allvoters.txt
# Find the difference between the successful voters
# and all the possible voters (ie the failed voters)
diff -e successfulvoters.txt allvoters.txt | grep -v '^[[:digit:]]' | grep -v '[.]' | sort
Does anyone have other ideas how to tackle this problem? While the test case was relatively small, the actual data set contained tens of thousands of entries.
I see a lot of redundancy in this solution with its two passes over the tally file. On the other hand, I had a working solution in about 15 minutes.
I have tested this a couple of times and it looks to work on all my data sets. Perhaps there's a problem that I am not seeing and, if so, please speak up and let me know.
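For comparison, here's a single-pass sketch in awk. It's my own take, only checked against the sample tally above, and it assumes the success marker is the one known string we've been using:
# Print only the names that are NOT followed by a voteSuccess line
awk '{
    if ($0 == "voteSuccess") { prev = "" }           # previous name succeeded, so forget it
    else { if (prev != "") print prev; prev = $0 }   # previous name never got a success line
}
END { if (prev != "") print prev }' tally.txt | sort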
How would you solve this problem?
Do you have any file processing war stories? Tricks of the trade you'd like to share?
Also, I have a couple of texts that I have used to flesh out my understanding of scripting and the bash shell. The first is the O'Reilly book Learning the bash Shell by Cameron Newham. It is a step-by-step introduction to bash shell concepts and includes a good overview of many standard shell tools and techniques.
I also really like the more general book by Stephen Kochan and Patrick Wood titled Unix Shell Programming (3rd ed). Kochan and Wood write to the POSIX standard for shells, which should help in writing maintainable and portable scripts; however, they also make an effort to point out how each shell differs in its approach. It has its faults, as most books do, but it is solid nonetheless.
Labels:
bash,
diff,
file management,
file processing,
grep,
linux,
mac os,
scripting,
text processing,
tips,
unix
Bash Tip: Finding a Line and the One Following it
I was recently asked the following question:
I need to identify lines containing a string in a file and extract that line and
the next line from the file. There might be multiple occurrences in the file.
In this example file I need to scan for "gottoget" and then extract line 3
and 4 as well as lines 6 and 7
Example file:
line 1
line 2
gottoget line 3
want this line as well line 4
line 5
gottoget line 6
ok this one must come with line 7
line 8
line 9
line 10
Hope you can help.
I think this puzzle is easily solved through the use of grep's -A flag.
According to the man page for grep (man grep), the -A flag prints the specified number of lines after each matching line. It sounds like grep -A 1 gottoget examplefile should do the trick. This line will grab the line containing the string we're looking for ("gottoget" in your example) and the first line after that matching line. If we set grep up this way, we get the following:
gottoget line 3
want this line as well line 4
--
gottoget line 6
ok this one must come with line 7
The -- line separates groups of matches that aren't adjacent in the file. If you don't want that, the lines are easily removed with another grep filter ( | grep -v ^[--]) which says to show everything but lines that begin with the -- characters. If you have legitimate -- characters at the beginning of some lines in your data, you may need to play around a bit to filter out only the unnecessary ones.
Putting it all together, we get:
grep -A 1 gottoget examplefile | grep -v ^[--]
Giving us the cleaned up output of:
gottoget line 3
want this line as well line 4
gottoget line 6
ok this one must come with line 7
And that's it. A simple application of some grep statements provides the answer.
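If your grep doesn't support -A (it's a GNU extension), GNU sed can do the same job with its N command, which pulls the following line into the pattern space before printing. This is just a sketch, with the caveat that a match on the very last line of the file behaves differently across sed implementations:
sed -n '/gottoget/{N;p;}' examplefile
As a bonus, this version produces no -- separators to clean up.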
I highly recommend Unix Shell Programming (3rd edition) by Stephen Kochan and Patrick Wood if you are interested in improving your understanding of shell scripting. Kochan and Wood do a very thorough job (using plenty of code examples) of exploring various aspects of essential shell scripting tools and techniques.
Post a comment if you have a different or better way of handling this puzzle.
Take care.
Tuesday, April 29, 2008
Bash Tip: Reverse Sorting Lists in the Shell
Every once in a while I check my site logs and find a common search phrase in referrals from search engines. Often the visitors appear to leave immediately, presumably because the page they landed on didn't answer their question.
A phrase that has been appearing quite frequently lately is "bash reverse sort list".
I can't tell exactly what they mean by their search query, so we'll take a couple of cracks at it.
My first thought is that they might be looking to reverse the output of the command line tool ls.
Say we have a directory and we see the following files when we run ls:
a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij
To get a simple reverse listing of those files, we can use the -r switch for ls. Typing ls -r in the same directory yields:
ij iii ii i hijk hij hi h defg de d cw cqw c be bd b ab aa a
Having your file list reversed in a horizontal line isn't always useful when you are looking for a vertical list. It just takes a little bit of extra work to turn your list on its side if that's what you need.
First, we'll use the -l switch of ls to show the long listing for the files. Typing ls -lr gives us a reverse listing of our files in a vertical format.
total 0
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ij
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 iii
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ii
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 i
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hijk
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hij
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hi
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 h
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 defg
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 de
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 d
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 cw
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 cqw
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 c
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 be
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 bd
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 b
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ab
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 aa
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 a
That's a bit more verbose than what I expect we're looking for, so we'll need to employ a couple more tools to trim away the fat.
All we want is the last column of information (column 8, if you consider the columns delimited by spaces). This is where the cut command comes in handy. It does exactly what the name implies by slicing and dicing data in multiple handy ways.
By default, the cut command treats data as fields separated by tabs. By sending the output of our ls -lr command as input to cut while changing the default delimiter character to a space with the -d switch, we can filter out all but the 8th column. (This works here because ls separates these particular columns with single spaces; cut counts every space as a field boundary, so multi-space padding would throw the field numbers off.) So far our command looks like this, ls -lr | cut -d" " -f8, and our output looks like this:

ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a
Almost perfect. However, you'll notice one small problem at the top of the list: there's an extra blank line. If you look at the original output of ls -lr, it quickly becomes clear where the blank line came from. The total 0 line in the original output had only two fields, total and 0, leaving nothing but a blank when cut went looking for the eighth field.
It's not too difficult a job to clean this up with a little creative application of the grep command. We'll use the -v, or invert match, switch of grep (otherwise known as "show me everything but") to drop any line consisting of only a beginning, represented by the caret (^) symbol, and an end, represented by the dollar sign ($), with nothing in between: -v ^$.
Putting it all together as ls -lr | cut -d" " -f8 | grep -v ^$ successfully removes the blank line from our vertical reverse sorted list of files.
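As an aside, awk can stand in for both the cut and the grep steps, since $NF always refers to the last field no matter how many columns a line has. A sketch (NR > 1 simply skips the total 0 line, and like cut it assumes no spaces in the filenames):
ls -lr | awk 'NR > 1 {print $NF}'
This prints the same vertical, reverse-sorted column of filenames without any blank-line cleanup.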
Another list you might like to sort is one contained in a file. ls isn't going to help us with this one, but the sort command will.
By default, sort orders the lines of a file by the first field, as delimited by whitespace, and it offers a reverse sort through the -r switch.
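For example, given a stand-in sort.txt (the contents below are my own illustration):
bash $ cat sort.txt
delta
alpha
charlie
bravo
bash $ sort sort.txt
alpha
bravo
charlie
delta
bash $ sort -r sort.txt
delta
charlie
bravo
alpha
bash $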
I hope this answers some of the basic questions about reverse sorting lists. For more information check out the manual pages for ls and sort (man ls and man sort).
However, you might just want your list flipped on its head, with no sorting whatsoever. Say you have the list:
You want it like to look like this:
As it turns out, there is a command just for that purpose called tac. Where cat will concatenate the contents of a file to the screen (standard output), tac will do the same after reversing the contents of a file.
Take the text of the 1st Amendment to the US Constitution, for example.
Running tac against these lines compeletely reverses them:
Whereas if we had used sort, the output would look slightly different:
If your list isn't vertical with items separated by a newline, you can use tac's -s switch, similar to cut's -d switch, to identify a different separator.
Update 1: A helpful reader pointed out that the ls examples could be a lot smaller with the application of the -1 switch to the ls command. This switch tells the standard ls command to print one file per line. When combined with the reverse, -r, switch, we get a reverse list of files in a vertical as opposed to the standard horizontal layout.
In the end, just typing
will result in this list of files
being printed like this
Update 2: It turns out that I didn't cover how to reverse sort a horizontal line. Since it is a little long, you can check out my solution in this post, Bash Tip: Reverse Sorting Lists Revisted; Reversing a Horizontal List.
--
I hope these tips help everyone out. If you want more resources on shell scripting, I highly recommend Unix Shell Programming (3rd edition) by Stephen Kochan and Patrick Wood if you are interested improving your understanding of shell scripting. Kochan and Wood do a very thorough job (using plenty of code examples) exploring various aspects of essential shell scripting tools and techniques
A phrase that has been appearing quite frequently lately is "bash reverse sort list".
I can't tell exactly what they mean by their search query, so we'll take a couple of cracks at it.
My first thought is that they might be looking to reverse the output of the command line tool ls.
Say we have a directory and we see the following files when we run ls:
a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij
To get a simple reverse listing of those files, we can use the -r switch for ls. Typing ls -r in the same directory yields:
ij iii ii i hijk hij hi h defg de d cw cqw c be bd b ab aa a
Having your file list reversed on a single horizontal line isn't much use when you are looking for a vertical list. It just takes a little bit of extra work to turn your list on its side if that's what you need.
First, we'll use the -l switch of ls to show the long listing for the files. Typing ls -lr gives us a reverse listing of our files in a vertical format.
total 0
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ij
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 iii
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ii
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 i
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hijk
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hij
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 hi
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 h
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 defg
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 de
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 d
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 cw
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 cqw
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 c
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 be
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 bd
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 b
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 ab
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 aa
-rw-r--r-- 1 jjones jjones 0 2007-05-07 21:23 a
That's a bit more verbose than what I expect we're looking for, so we'll need to employ a couple more tools to trim away the fat.
All we want is the last column of information (the 8th, if you consider the columns to be delimited by spaces). This is where the cut command comes in handy. It does exactly what the name implies by slicing and dicing data in multiple handy ways.
By default, the cut command treats data as fields separated by tabs. By sending the output of our ls -lr command as input to the cut command while changing the default delimiter character with the -d switch, we can filter out all but the 8th column. So far our command looks like this, ls -lr | cut -d" " -f8, and our output looks like this:
ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a
Almost perfect. However, you'll notice one small problem at the top of the list: there's an extra blank line. If you look at the original output of ls -lr, it quickly becomes clear where the blank line came from. The total 0 line in the original output had only two fields, total and 0, leaving nothing but a blank when cut went looking for the eighth field.
It's not too difficult a job to clean this up with a little creative application of the grep command. We'll use grep's -v, or inverted match, switch (otherwise known as "show me everything but") to drop any line with only a beginning, represented by the caret (^) symbol, and an end, represented by the dollar sign ($), with nothing in between: -v ^$.
Putting it all together as ls -lr | cut -d" " -f8 | grep -v ^$ successfully removes the blank line from our vertical reverse sorted list of files.
ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a
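As an aside, awk can do the cutting and the blank-line cleanup in one step. This is just a small sketch for comparison, not part of the original recipe: it prints the last field of any line with more than two fields, which skips the total 0 line automatically and doesn't care how many columns your ls -l happens to print.

ls -lr | awk 'NF > 2 { print $NF }'

Like the cut version, this will still mangle filenames that contain spaces.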
Another list you might like to sort is one contained in a file. ls isn't going to help us with this one, but the sort command will.
By default, sort orders the lines of a file lexicographically, comparing each line from its first character. Taking an example file (sort.txt) containing the following:
a
b
bd
hij
be
aa
cqw
ab
c
cw
d
de
iii
defg
h
hi
hijk
i
ii
ij
So, running sort against sort.txt results in:
a
aa
ab
b
bd
be
c
cqw
cw
d
de
defg
h
hi
hij
hijk
i
ii
iii
ij
The sort command also offers a reverse sort option through the -r switch. Running sort -r against sort.txt (sort -r sort.txt) results in:
ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a
I hope this answers some of the basic questions about reverse sorting lists. For more information check out the manual pages for ls and sort (man ls and man sort).
However, you might just want your list flipped on its head, with no sorting whatsoever. Say you have the list:
a
d
c
b
You want it to look like this:
b
c
d
a
As it turns out, there is a command just for that purpose called tac. Where cat will concatenate the contents of a file to the screen (standard output), tac will do the same after reversing the order of the file's lines.
Take the text of the 1st Amendment to the US Constitution, for example.
Congress shall make no law respecting an establishment of religion,
or prohibiting the free exercise thereof;
or abridging the freedom of speech,
or of the press;
or the right of the people peaceably to assemble,
and to petition the Government for a redress of grievances.
Running tac against these lines completely reverses them:
and to petition the Government for a redress of grievances.
or the right of the people peaceably to assemble,
or of the press;
or abridging the freedom of speech,
or prohibiting the free exercise thereof;
Congress shall make no law respecting an establishment of religion,
Whereas if we had used sort, the output would look slightly different:
and to petition the Government for a redress of grievances.
Congress shall make no law respecting an establishment of religion,
or abridging the freedom of speech,
or of the press;
or prohibiting the free exercise thereof;
or the right of the people peaceably to assemble,
If your list isn't vertical with items separated by a newline, you can use tac's -s switch, similar to cut's -d switch, to identify a different separator.
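For example, reversing a comma separated list might look something like this (a small sketch, assuming GNU tac, which treats the separator as coming after each record, so a trailing comma keeps the output tidy):

printf 'a,b,c,d,' | tac -s ,

which prints:

d,c,b,a,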
Update 1: A helpful reader pointed out that the ls examples could be a lot smaller with the application of the -1 switch to the ls command. This switch tells the standard ls command to print one file per line. When combined with the reverse, -r, switch, we get a reverse list of files in a vertical layout as opposed to the standard horizontal one.
In the end, just typing
ls -1r
will result in this list of files
a aa ab b bd be c cqw cw d de defg h hi hij hijk i ii iii ij
being printed like this
ij
iii
ii
i
hijk
hij
hi
h
defg
de
d
cw
cqw
c
be
bd
b
ab
aa
a
Update 2: It turns out that I didn't cover how to reverse sort a horizontal line. Since it is a little long, you can check out my solution in this post, Bash Tip: Reverse Sorting Lists Revisted; Reversing a Horizontal List.
--
I hope these tips help everyone out. If you want more resources on shell scripting, I highly recommend Unix Shell Programming (3rd edition) by Stephen Kochan and Patrick Wood. Kochan and Wood do a very thorough job (using plenty of code examples) exploring various aspects of essential shell scripting tools and techniques.
Bash Tip: Extracting a Range of Lines with sed
A little puzzle came across my desk the other day. I was asked how we could pull a range of lines from a file.
Say the file has ten lines:
Line 1
Line 2
Line 3
Line 4
Line 5
Line 6
Line 7
Line 8
Line 9
Line 10
We want to extract only lines five through eight from the file.
This isn't a hard problem, and there are probably a thousand ways to do this. However, I chose to use a simple sed command.
By typing the following we can get the result we want.
sed -n '5,8p;8q' filename
This line tells sed to print lines 5 through 8 (the 5,8p part) and then quit once it reaches line 8 (the 8q part), so the rest of the file is never read; the -n switch suppresses sed's default behavior of printing every input line. This simple construct works equally well with very large files and wider ranges of lines. Your output will look something like this:
Line 5
Line 6
Line 7
Line 8
It also works for extracting single lines if you change the line to look something like this:
sed -n '5p;5q' filename
The above line prints the single line (in this case "Line 5") and then stops processing the file.
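Since there really are a thousand ways to do this, here are a couple of other standard approaches that pull the same range; these are just sketches for comparison, not anything the sed solution depends on:

awk 'NR >= 5 && NR <= 8' filename
head -8 filename | tail -4

The awk version reads the whole file unless you add an exit rule, while the head/tail version takes the first eight lines and then keeps the last four of those.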
I recommend the books Classic Shell Scripting by Arnold Robbins and Nelson H.F. Beebe and O'Reilly's Sed and Awk (second edition) by Dale Dougherty and Arnold Robbins if you would like additional resources for learning how to effectively leverage the sed utility.
How do you extract lines from a file? What tools and techniques work for you?
Monday, January 28, 2008
Komodo Tip: Launching a Python Shell and Running Code from Komodo Edit 4.2 on Linux
I have been teaching myself Python lately, and as part of that I have been trying out a few editors including vim, geany, and (most recently) Komodo Edit 4.2 from the folks at ActiveState (makers of fine cross-platform scripting tools and language ports).
Komodo Edit appears to be a stripped-down version of ActiveState's more full-featured Komodo IDE. Even though it is stripped down, it still has some nice features.
While Komodo Edit doesn't have all the built-in features of the full-fledged IDE, the inclusion of the extensible Toolbox allows you to approximate some tools and integration that turn out to be quite handy.
Launching a Python Shell
The Komodo IDE has a nice integrated shell for Python that makes trying out code quite easy. This is one of the features that was left out of Komodo Edit, but we can make launching a shell a little easier by creating a Toolbox entry that you can bind to a key combination and launch from within the editor.
To get the entry:
- Launch Komodo Edit
- Click the Toolbox drop down menu
- Choose the Add menu item
- Then, click the New Command item
- In the Add Command window, type Python Shell in the first field
- Then type gnome-terminal -x python (or your terminal emulator of choice with the appropriate execute switch)
- Then choose No Console (GUI Application) from the Run In: field
- Next, click the Key Binding tab at the top and assign a keyboard combo that you can use to easily launch a Python shell.
Running a Python File
Similarly, you can create a Run Command that uses Python to execute the current file you are working with.
To get the entry:
- Launch Komodo Edit
- Click the Toolbox drop down menu
- Choose the Add menu item
- Then, click the New Command item
- In the Add Command window, type Run Python File in the first field
- Then type %(python) %F
- Then choose Command Output Tab from the Run In: field
- Next, click the Key Binding tab at the top and assign a keyboard combo that you can use to easily run the current Python file.
I will post more Komodo Edit tricks as I come across them. If you have any tips or tricks, please post them to the comments.
CC photo credit: Fred Hsu