Sams Teach Yourself Shell Programming in 24 Hours: Filtering Text with awk

Sams Teach Yourself Shell Programming in 24 Hours
(Publisher: Macmillan Computer Publishing)
Author(s): Sriranga Veeraraghavan
ISBN: 0672314819
Publication Date: 01/01/99

Built-in Variables

In addition to the variables that you can define, awk predefines several variables that are available for your use. The complete list of these variables is given in Table 17.4. Unless otherwise noted, you can safely change the values of any of these variables.

**Table 17.4** Built-in Variables in `awk`

Variables	Description

`FILENAME`	The name of the current input file. You should not change the value of this variable.
`NR`	The number of the current input line or record in the input file. You should not change the value of this variable.
`NF`	The number of fields in the current line or record. You should not change the value of this variable.
`OFS`	The output field separator (default is space).
`FS`	The input field separator (default is space and tab).
`ORS`	The output record separator (default is newline).
`RS`	The input record separator (default is newline).
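
As a rough illustration of how several of these variables interact, the following sketch runs awk over two made-up colon-separated records. The sample data and the shell variable names (sample, result) are assumptions chosen for this example only.

```shell
#!/bin/sh
# Hypothetical sample data: two colon-separated records.
sample='alice:100:staff
bob:200:admin'

# FS splits each input line on colons; OFS joins the printed fields with a dash.
# NR is the record number and NF is the number of fields in that record.
result=$(printf '%s\n' "$sample" |
    awk 'BEGIN { FS=":" ; OFS="-" ; } { print NR, NF, $1 ; }')

printf '%s\n' "$result"
```

Each output line holds the record number, the field count, and the first field, joined by the value of OFS.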

Using FILENAME and NR

In the previous example you used the shell to print the name of the input file. By using the variable FILENAME, you can do this all in awk. Because awk sets FILENAME only once it starts reading input, its value is not reliable in a BEGIN pattern; print it from the END pattern instead.

While you’re at it, change the previous script to print the percentage of lines in the file that were blank. To accomplish this, you need to use the following expression in the END pattern:

100*(x/NR)

Because awk does all its numeric computation in floating point, the division is not truncated and you get the correct percentage. Here you are using the variable NR, which stores the current record or line number.

In the END pattern, the value of NR is the line number of the last line that was processed, which is the same as the total number of lines processed.

With these changes, the script is

#!/bin/sh
# Report the name, blank-line count, and blank-line percentage of each file.
for i in "$@" ;
do
    if [ -f "$i" ] ; then
        # x counts blank lines; the NR test avoids dividing by zero
        # when the file is empty.
        awk '/^ *$/ { x+=1 ; }
            END { if (NR > 0) ave=100*(x/NR) ;
                  printf "%s\t %d\t%3.1f\n",FILENAME,x,ave; }
        ' "$i"
    else
        echo "ERROR: $i not a file." >&2
    fi
done

The new output looks like

urls.txt         4      36.4
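
To check the behavior of FILENAME and NR without depending on urls.txt, a minimal sketch along the following lines can be used. The scratch file sample.txt and its contents are hypothetical, and the sketch assumes the mktemp command is available, which is common but not guaranteed on every system.

```shell
#!/bin/sh
# Hypothetical scratch file: four lines, one of them blank.
# Assumes mktemp -d is available on this system.
tmpdir=$(mktemp -d) || exit 1
printf 'one\n\nthree\nfour\n' > "$tmpdir/sample.txt"

# FILENAME is read in END, after awk has opened the file;
# NR then holds the total number of lines read.
report=$(cd "$tmpdir" && awk '
    /^ *$/ { x += 1 ; }
    END { printf "%s\t%d\t%3.1f\n", FILENAME, x, 100*(x/NR) ; }
' sample.txt)

printf '%s\n' "$report"
rm -rf "$tmpdir"
```

Reading FILENAME in the END pattern is the portable choice here, because POSIX leaves its value undefined inside BEGIN.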

Changing the Input Field Separator

The input field separator, FS, controls how awk breaks up fields in an input line. The default value for FS is space and tab. Because most commands, such as ls or ps, use spaces or tabs to separate columns, this default value enables you to easily manipulate their output using awk.

You can manually set FS to any other character in order to change how awk breaks up an input line. Usually, FS is changed when you look through system databases, such as /etc/passwd. The two mechanisms available for changing FS are

• Manually resetting FS in a BEGIN pattern

• Specifying the -F option to awk

As an example, set FS to a colon (:). You can use the following BEGIN pattern

BEGIN { FS=":" ; }

or the following awk invocation:

awk -F: '{ ... }'

The major difference between the two is that the -F option enables you to use a shell variable to specify the field separator dynamically, as follows

$ MYFS=: ; awk -F"$MYFS" '{ ... }'

whereas the BEGIN block hard-codes the value of the field separator in the awk script.

The following simple example demonstrates the effect of changing FS:

$ awk 'BEGIN { FS=":" ; } { print $1 , $6 ; }' /etc/passwd

This command prints each user’s username and home directory. It can also be written as follows:

$ awk -F: '{ print $1, $6 ; }' /etc/passwd

A short excerpt of the output on my system is as follows:

root /
daemon /
bin /usr/bin
sys /
adm /var/adm
ranga /home/ranga
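
The same extraction can be tried without reading the real /etc/passwd. The sketch below feeds awk two invented passwd-style records and takes the separator from a shell variable through -F; the records and the variable names (records, homes) are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical passwd-style records; real entries live in /etc/passwd.
records='root:x:0:0:Super-User:/:/sbin/sh
ranga:x:500:100:Ranga:/home/ranga:/bin/ksh'

# The separator is held in a shell variable and handed to awk with -F.
# Quoting the expansion keeps the shell from altering the separator.
MYFS=:
homes=$(printf '%s\n' "$records" | awk -F"$MYFS" '{ print $1, $6 ; }')

printf '%s\n' "$homes"
```

Changing MYFS is enough to reuse the same awk program on data with a different delimiter.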

Allowing awk to Use Shell Variables

Older versions of awk have no direct way of accessing the values of variables that are set in the shell. (POSIX-conforming versions expose exported variables through the ENVIRON array.) The most portable way for awk to use these variables is to convert them to awk variables on the command line.

The basic syntax for setting variables on the command line is

awk 'script' awkvar1=value awkvar2=value ... files

Here, script is the awk script that you want to execute. The variables awkvar1, awkvar2, and so on are the names of awk variables that you want to set. As usual, files is a list of files.

Say that you want to generate a list of all the fruit in fruit_prices.txt whose quantity (the third field) is less than or equal to some number x, where x is supplied by the user. To make this possible, you need to pass to awk the value of x that the user gives your script.

Assuming that the user-supplied value for x is given to your script as $1, you can make the following changes:

#!/bin/sh
NUMFRUIT="$1"
if [ -z "$NUMFRUIT" ] ; then NUMFRUIT=75 ; fi

awk '
    $3 <= numfruit  { print ; }
' numfruit="$NUMFRUIT" fruit_prices.txt
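
To see the command-line assignment at work without the real fruit_prices.txt, a sketch such as the following can be run. The three records stand in for the actual file and are invented, the threshold is fixed at 75 rather than read from $1 so that the result is repeatable, and mktemp is assumed to be available.

```shell
#!/bin/sh
# Hypothetical stand-in for fruit_prices.txt: name, price, quantity.
# Assumes mktemp is available on this system.
tmpfile=$(mktemp) || exit 1
cat > "$tmpfile" <<'EOF'
Banana $0.89 100
Peach $0.79 65
Kiwi $1.50 22
EOF

# In a real script this value would normally come from "$1".
NUMFRUIT=75

# awk processes the numfruit=... operand before it opens the file that
# follows it, so the variable is set by the time the first line is read.
low=$(awk '$3 <= numfruit { print $1 ; }' numfruit="$NUMFRUIT" "$tmpfile")

printf '%s\n' "$low"
rm -f "$tmpfile"
```

Because the assignment is an ordinary operand, it takes effect only when awk reaches it in the argument list, so the value is not yet set inside a BEGIN pattern.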
