In Chapter 16, Filtering Text Using Regular Expressions, you looked at the sed command and used regular expressions to filter text. In this chapter you will look at another powerful text filtering command called awk.
The awk command is a complete programming language that enables you to search many files for patterns and conditionally modify files without having to worry about opening files, reading lines, or closing files. It is found on all UNIX systems and is quite fast, easy to learn, and extremely flexible.
This chapter concentrates on the awk elements that are most commonly used in shell scripts. Specifically these features are
The awk command is a programming language that enables you to search through files and modify records within these files based on patterns. The name awk comes from the last names of its creators, Alfred Aho, Peter Weinberger, and Brian Kernighan. It has been a part of UNIX since 1978, when it was added to UNIX Version 7.
Currently three main versions are available: the original awk, nawk (new awk), and gawk (the GNU version of awk).
The original awk has remained almost the same since its first introduction to UNIX in 1978. Originally it was intended to be a small programming language for filtering text and producing reports.
By the mid-1980s, people were using awk for large programs, so in 1985 its authors decided to extend it. This version, called nawk (as in new awk), was released to the public in 1987 and became a part of SunOS 4.1.x. Its developers intended for nawk to replace awk eventually. This has yet to happen. Most commercial UNIX versions such as HP-UX and Solaris still ship with both awk and nawk.
In 1992 the Institute of Electrical and Electronics Engineers (IEEE) standardized awk as part of its Portable Operating System Interface standard (POSIX). gawk, the GNU version of awk, is based on this standard. Most Linux systems ship with gawk.
The examples in this chapter work with any version of awk.
The basic syntax of an awk command is
awk 'script' files
Here files is a list of one or more files, and script is one or more commands of the form:
/pattern/ { actions }
Here pattern is a regular expression, and actions is one or more of the commands that are covered later in this chapter. If pattern is omitted, awk performs the specified actions for each input line.
Look at the simplest task in awk, displaying all the input lines from a file. In this case you use a modified version of the file fruit_prices.txt from the previous chapter:
$ awk '{ print ; }' fruit_prices.txt
Fruit       Price/lbs   Quantity
Banana      $0.89       100
Peach       $0.79       65
Kiwi        $1.50       22
Pineapple   $1.29       35
Apple       $0.99       78
Here you use the awk command print to print each line of the input. When the print command is given without arguments, it prints the input line exactly as it was read.
Notice that there is a semicolon (;) after the print command. In awk the semicolon separates one command from the next, so it is optional when an action contains a single command, as it does here. It is still good practice to include it, because the script stays correct when you add more commands to the action.
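The /pattern/ { actions } form described earlier can be tried with the print command. The following is a minimal sketch: the sample file is recreated inline so the commands are self-contained, and its single-space column layout is an assumption rather than the exact layout of the chapter's file.

```shell
# Recreate a small sample file (spacing between columns is assumed).
cat > fruit_prices.txt <<'EOF'
Fruit Price/lbs Quantity
Banana $0.89 100
Peach $0.79 65
Kiwi $1.50 22
Pineapple $1.29 35
Apple $0.99 78
EOF

# With a pattern: print runs only for lines that match the regular expression.
awk '/Peach/ { print ; }' fruit_prices.txt

# Without a pattern: print runs for every input line.
awk '{ print ; }' fruit_prices.txt
```

The first command should display only the Peach line, and the second should display the whole file.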
One of the nicest features available in awk is that it automatically divides input lines into fields. A field is a sequence of characters delimited by one or more field separator characters. The default field separator characters are tab and space.
When a line is read, awk places the fields that it has parsed into the variable 1 for the first field, 2 for the second field, and so on. To access a field, use the field operator, $. Thus, the first field is $1.
Note:
The use of the $ in awk is slightly different from its use in the shell. The $ is required only when accessing the value of a field variable; it is not required when accessing the values of other variables. I will explain creating and using variables in awk in depth later in this chapter.
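The difference between the shell's $ and awk's field operator can be seen in a short sketch. The sample file is recreated inline with an assumed single-space layout, and price is a hypothetical variable name chosen only for illustration.

```shell
# Recreate a small sample file (spacing between columns is assumed).
cat > fruit_prices.txt <<'EOF'
Fruit Price/lbs Quantity
Banana $0.89 100
Peach $0.79 65
Kiwi $1.50 22
Pineapple $1.29 35
Apple $0.99 78
EOF

# Single quotes keep the shell from expanding $2, so awk sees it
# and treats it as the second field (the price column).
awk '{ print $2 ; }' fruit_prices.txt

# price is a hypothetical awk variable: it is assigned from a field
# and then read back without a $.
awk '{ price = $2 ; print price ; }' fruit_prices.txt
```

Both commands should print the same price column. If the script were placed in double quotes instead, the shell would try to expand $2 before awk ever saw it.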
As an example of using fields, you can print only the name of a fruit and its quantity using the following awk command:
$ awk '{ print $1 $3 ; }' fruit_prices.txt
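Here you use awk to print two fields from every input line: the first field ($1), which holds the name of the fruit, and the third field ($3), which holds the quantity.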
The output looks like the following:
FruitQuantity
Banana100
Peach65
Kiwi22
Pineapple35
Apple78
Notice that in the output there is no separation between the fields. This is the default behavior of the print command: values written next to each other are joined with nothing between them. To print a space between fields, separate them with the , operator as follows:
$ awk '{ print $1 , $3 ; }' fruit_prices.txt
Fruit Quantity
Banana 100
Peach 65
Kiwi 22
Pineapple 35
Apple 78
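Because print joins adjacent values directly and the , operator places a space between them, a quoted string between two fields can serve as a custom separator. The following sketch assumes the same sample data, recreated inline with single spaces between columns.

```shell
# Recreate a small sample file (spacing between columns is assumed).
cat > fruit_prices.txt <<'EOF'
Fruit Price/lbs Quantity
Banana $0.89 100
Peach $0.79 65
Kiwi $1.50 22
Pineapple $1.29 35
Apple $0.99 78
EOF

# The comma puts a single space between the two fields.
awk '{ print $1 , $3 ; }' fruit_prices.txt

# A string literal placed between the fields is joined to them,
# so ": " acts as a custom separator.
awk '{ print $1 ": " $3 ; }' fruit_prices.txt
```

The second command should produce lines such as Banana: 100 instead of Banana 100.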
You can format the output by using the awk printf command instead of the print command as follows:
$ awk '{ printf "%-15s %s\n" , $1 , $3 ; }' fruit_prices.txt
Fruit           Quantity
Banana          100
Peach           65
Kiwi            22
Pineapple       35
Apple           78
All the features of the printf command discussed in Chapter 13, Input/Output, are available in the awk command printf.
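As a further sketch, printf can be combined with a pattern so that only the data lines are formatted. The pattern /\$/ (a literal dollar sign) is one assumed way to skip the header line, and the sample file is again recreated inline with an assumed single-space layout.

```shell
# Recreate a small sample file (spacing between columns is assumed).
cat > fruit_prices.txt <<'EOF'
Fruit Price/lbs Quantity
Banana $0.89 100
Peach $0.79 65
Kiwi $1.50 22
Pineapple $1.29 35
Apple $0.99 78
EOF

# /\$/ matches only lines containing a dollar sign, which skips the header.
# %-10s left-justifies the name in 10 columns; %3d right-justifies the
# quantity as an integer in 3 columns.
awk '/\$/ { printf "%-10s %3d\n" , $1 , $3 ; }' fruit_prices.txt
```

Because %3d right-justifies the quantity, the two-digit and three-digit values should line up on their last digit.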