Hi guys and girls, this is the first guest post on my blog. It's written by Waldner from #awk on the FreeNode IRC network. He works as a sysadmin and does shell scripting as a hobby. Waldner will be happy to take any questions about the article. You can ask them in the comments of this post or on IRC.
This article takes a look at ten tips, tricks and pitfalls in the Awk programming language. They are mostly taken from discussions in the #awk IRC channel. Here they are:
Update: Mr. Waldner just notified me that he has improved the tips on being idiomatic. See "Idiomatic Awk" on his website!
In this section, we give some hints on how to write more idiomatic (and usually shorter and more efficient) awk programs. Many awk programs you're likely to encounter, especially short ones, make heavy use of these notions.
Suppose one wants to print all the lines in a file that match some pattern (a kind of awk-grep, if you like). A reasonable first shot is usually something like
awk '{if ($0 ~ /pattern/) print $0}'
That works, but there are a number of things to note.
The first thing to note is that it is not structured according to awk's definition of a program, which is
condition { actions }
Our program can clearly be rewritten using this form, since both the condition and the action are very clear here:
awk '$0 ~ /pattern/ {print $0}'
Our next step in the perfect awk-ification of this program is to note that /pattern/ is the same as $0 ~ /pattern/. That is, when awk sees a single regular expression used as an expression, it implicitly applies it to $0, and returns success if there is a match. Then we have:
awk '/pattern/ {print $0}'
Now, let's turn our attention to the action part (what's inside the braces). print $0 is redundant, since print alone prints $0 by default.
awk '/pattern/ {print}'
But now we note that, when a condition is true and there is no associated action, awk performs a default action that is (you guessed it) print (which we already know is equivalent to print $0). Thus we can do this:
awk '/pattern/'
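As a quick sanity check, the five forms from this walkthrough can be run against the same input and compared. This is only a sketch: the three sample lines are made up for the illustration, and the variable names a through e are arbitrary.

```shell
# Sample input, invented for this illustration.
input='foo pattern bar
no match here
another pattern line'

# The five forms from the walkthrough, from most verbose to most idiomatic.
a=$(printf '%s\n' "$input" | awk '{if ($0 ~ /pattern/) print $0}')
b=$(printf '%s\n' "$input" | awk '$0 ~ /pattern/ {print $0}')
c=$(printf '%s\n' "$input" | awk '/pattern/ {print $0}')
d=$(printf '%s\n' "$input" | awk '/pattern/ {print}')
e=$(printf '%s\n' "$input" | awk '/pattern/')

# All five variables hold the same two matching lines.
printf '%s\n' "$e"
```

Each variable should end up holding the first and third input lines, which is what the step-by-step reduction predicts.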
Now we have reduced the initial program to its simplest (and most idiomatic) form. In many cases, if all you want to do is print some lines according to a condition, you can write awk programs composed only of a condition (possibly a complex one):
awk '(NR%2 && /pattern/) || (!(NR%2) && /anotherpattern/)'
That prints odd lines that match /pattern/, or even lines that match /anotherpattern/. Naturally, if you don't want to print $0 but instead do something else, then you'll have to manually add a specific action to do what you want.
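To see a condition-only program of this shape in action, here is a small sketch. The patterns /foo/ and /bar/ are stand-ins for /pattern/ and /anotherpattern/, and the four input lines are invented for the example.

```shell
# Four made-up lines; NR counts them starting from 1.
input='foo one
foo two
bar three
bar four'

# Condition only: odd lines matching /foo/, or even lines matching /bar/.
picked=$(printf '%s\n' "$input" |
  awk '(NR%2 && /foo/) || (!(NR%2) && /bar/)')

# Same idea with an explicit action: print the line number
# in front of the line instead of the bare line.
numbered=$(printf '%s\n' "$input" |
  awk 'NR%2 && /foo/ {print NR ": " $0}')

printf '%s\n' "$picked" "$numbered"
```

Line 2 matches /foo/ but is even, and line 3 matches /bar/ but is odd, so neither is selected by the first program.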
From the above, it follows that
awk 1
awk '"a"' # single quotes are important!
are both awk programs that just print their input unchanged. Sometimes, you want to operate only on some lines of the input (according to some condition), but also want to print all the lines, regardless of whether they were affected by your operation or not. A typical example is a program like this:
awk '{sub(/pattern/, "foobar")}1'
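The following sketch, run on two made-up input lines, shows the always-true conditions and the trailing 1 at work. It also illustrates why the single quotes matter: without them the shell removes the double quotes, and awk receives the program a, which is an unset variable and therefore false.

```shell
# Two invented lines: one that the substitution changes, one it leaves alone.
input='a pattern here
nothing to change'

# Always-true conditions with no action: every line is printed unchanged.
same1=$(printf '%s\n' "$input" | awk 1)
same2=$(printf '%s\n' "$input" | awk '"a"')

# Without the single quotes awk sees the program a (an unset variable),
# which evaluates to false, so nothing is printed.
none=$(printf '%s\n' "$input" | awk "a")

# Substitute where possible, then let the trailing 1 print every line.
changed=$(printf '%s\n' "$input" | awk '{sub(/pattern/, "foobar")}1')

# Without the trailing 1 the substitution still happens, but nothing prints.
silent=$(printf '%s\n' "$input" | awk '{sub(/pattern/, "foobar")}')

printf '%s\n' "$changed"
```

Here the first line comes out as "a foobar here" and the second line is printed untouched.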