Sams Teach Yourself Shell Programming in 24 Hours:Debugging

Sams Teach Yourself Shell Programming in 24 Hours
(Publisher: Macmillan Computer Publishing)
Author(s): Sriranga Veeraraghavan
ISBN: 0672314819
Publication Date: 01/01/99


Shell Tracing

There are many instances when syntax checking gives your script a clean bill of health, but bugs are still lurking in it. Running syntax checking on a shell script is similar to running a spelling checker on a text document: it might find most of the misspellings, but it can’t catch problems like spelling read as red.

To find and fix all the errors in a text document, you need to proofread it. To find and fix the equivalent problems in shell scripts, you need to use shell tracing.

In shell tracing mode the shell prints each command in the exact form in which it executes. For this reason, you sometimes see the shell tracing mode referred to as the execution tracing mode.

The shell tracing or execution tracing mode is enabled by using the -x option (x as in execution). For a complete script, it is enabled as follows:

$ /bin/sh -x script arg1 arg2 … argN

As was mentioned before, it can also be enabled using the set command:

set -x

To get an idea of what the output of shell tracing looks like, try the following command:

$ set -x ; ls *.sh ; set +x

The output is similar to the following:

+ ls buggy.sh buggy1.sh buggy2.sh buggy3.sh buggy4.sh
buggy.sh   buggy1.sh  buggy2.sh  buggy3.sh  buggy4.sh
+ set +x

In the output, the lines preceded by the plus (+) character are the commands that the shell executes. The other lines are the output from those commands. As you can see from the output, the shell prints the exact ls command it executes. This is extremely useful in debugging because it enables you to determine whether all the substitutions were performed correctly.
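To see how the trace exposes substitutions, the following minimal sketch traces a command that uses a variable. The variable NAME and its value are assumptions made for illustration. The traced command runs in a subshell so that tracing ends when the subshell exits, and the trace, which the shell writes to standard error, is captured in TRACE.

```shell
#!/bin/sh

# NAME is a hypothetical variable used only for this illustration.
NAME="report"

# Trace one command inside a subshell. The trace goes to standard error,
# so 2>&1 folds it into the captured text; the command's own output is
# discarded to keep the capture limited to the trace.
TRACE=$( ( set -x ; echo "$NAME.txt" > /dev/null ) 2>&1 )

echo "$TRACE"
```

The captured trace line shows the echo command with report.txt in place of $NAME.txt, which confirms that the variable was substituted before the command ran.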

Finding Syntax Bugs Using Shell Tracing

In the preceding example, you used the script buggy2.sh. One of the problems with this script is that it deleted the old backup before asking whether you wanted to make a new backup. To solve this problem, the script is rewritten as follows:

#!/bin/sh

Failed() {
    if [ $1 -ne 0 ] ; then
        echo "Failed. Exiting." ; exit 1 ;
    fi
    echo "Done."
}

YesNo() {
    echo "$1 (y/n)? \c"
    read RESPONSE
    case $RESPONSE in
        [yY]|[Yy][Ee][Ss]) RESPONSE=y ;;
        [nN]|[Nn][Oo]) RESPONSE=n ;;
    esac
}

YesNo "Make backup"
if [ $RESPONSE = "y" ] ; then

    echo "Deleting old backups, please wait… \c"
    rm -fr backup > /dev/null 2>&1
    Failed $?

    echo "Making new backups, please wait… \c"
    cp -r docs backup
    Failed

fi

There are at least three syntax bugs in this script and at least one logical oversight. See if you can find them.

Assuming that the script is called buggy3.sh, first check its syntax as follows:

$ /bin/sh -n ./buggy3.sh

Because there is no output, execute it:

$ /bin/sh ./buggy3.sh

The script first prompts you as follows:

Make backup (y/n)?

Answering y to this prompt produces output similar to the following:

Deleting old backups, please wait… Done.
Making new backups, please wait… buggy3.sh: test: argument expected

On Linux systems, the output might vary slightly. In any case, an error message is generated. Because the message doesn’t state which line of the script caused the error, you need to track it down manually.

From the output you know that the old backup was deleted successfully; therefore, the error is probably in the following part of the script:

    echo "Making new backups, please wait… \c"
    cp -r docs backup
    Failed

Enable shell tracing for this section as follows:

    set -x
    echo "Making new backups, please wait… \c"
    cp -r docs backup
    Failed
    set +x

The output changes as follows (assuming you answer y to the question):

Make backup (y/n)? y
Deleting old backups, please wait… Done.
+ echo Making new backups, please wait… \c
Making new backups, please wait… + cp -r docs backup
+ Failed
+ [ -ne 0 ]
buggy3.sh: test: argument expected

The execution trace varies slightly on Linux systems.

From this output you can see that the problem occurred in the following statement:

[ -ne 0 ]

From Chapter 10, “Flow Control”, you know that the form of a numerical test command is

[ num1 operator num2 ]

Here it looks like num1 does not exist. The trace also shows that the error occurred inside the Failed function, immediately after it was invoked. Looking at the function

Failed() {
    if [ $1 -ne 0 ] ; then
        echo "Failed. Exiting." ; exit 1 ;
    fi
    echo "Done."
}

you find that there is only one numerical test. This test checks whether $1, the first argument to the function, is not equal to 0. Now the problem should be obvious. When you invoked the Failed function

    echo "Making new backups, please wait… \c"
    cp -r docs backup
    Failed

you forgot to give it an argument. With $1 empty and unquoted, the test collapsed to [ -ne 0 ] and failed. There are two possible fixes to this bug. The first is to fix the code that calls the function:

    echo "Making new backups, please wait… \c"
    cp -r docs backup
    Failed $?
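The first fix only works if $? still holds the exit status of cp when Failed is called, because every command that runs replaces the value. The following sketch illustrates this with the standard false command; the variable names SAVED and LATE are assumptions chosen for the example.

```shell
#!/bin/sh

# Capture the exit status immediately after the command of interest.
false
SAVED=$?

# Here another command runs first, so $? describes that command instead.
false
echo "some other command" > /dev/null
LATE=$?

echo "saved=$SAVED late=$LATE"
```

SAVED holds the non-zero status of false, while LATE holds the status of the echo that ran in between. For this reason the call to Failed $? has to follow directly after the command being checked.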

The second is to fix the function itself by quoting the first argument, "$1":

Failed() {
    if [ "$1" -ne 0 ] ; then
        echo "Failed. Exiting." ; exit 1 ;
    fi
    echo "Done."
}

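By quoting the first argument, "$1", the shell passes the null or empty string to the test when the function is called without any arguments. In this case the test receives both num1 and num2, so the "argument expected" error no longer occurs. Some shells, bash among them, do not accept an empty string as a number and report a different error here, which is one more reason to fix the calling code as well.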
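The effect of the quotes can be seen by counting the words that the shell hands to a command. This is a minimal sketch; CountWords and EMPTY are hypothetical names introduced only for the illustration.

```shell
#!/bin/sh

# CountWords is a hypothetical helper that reports how many
# arguments it received.
CountWords() {
    echo $#
}

EMPTY=""

# Unquoted: the empty expansion disappears, leaving only -ne and 0.
UNQUOTED=$(CountWords $EMPTY -ne 0)

# Quoted: the empty string survives as an argument of its own.
QUOTED=$(CountWords "$EMPTY" -ne 0)

echo "unquoted=$UNQUOTED quoted=$QUOTED"
```

The unquoted call receives two arguments and the quoted call receives three. The same thing happens inside the brackets of the test: without quotes num1 vanishes, and with quotes it is present but empty.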

The best idea is to perform both fixes. With only the second fix applied, so that Failed is still called without an argument, the shell tracing output is similar to the following:

Make backup (y/n)? y
Deleting old backups, please wait… Done.
+ echo Making new backups, please wait… \c
Making new backups, please wait… + cp -r docs backup
+ Failed
+ [  -ne 0 ]
+ echo Done.
Done.
+ set +x
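A function can also guard against a missing argument explicitly instead of relying on how the test treats an empty string. The following is a sketch of that idea, not the script's own code: FailedStrict is a hypothetical variant of Failed, and it uses return rather than exit so that the caller decides what happens next.

```shell
#!/bin/sh

# FailedStrict is a hypothetical variant of Failed, shown for illustration.
FailedStrict() {
    # Refuse to run the numeric test unless exactly one argument was given.
    if [ $# -ne 1 ] ; then
        echo "FailedStrict: exit status argument required" >&2
        return 2
    fi
    if [ "$1" -ne 0 ] ; then
        echo "Failed."
        return 1
    fi
    echo "Done."
}

# Usage: pass the exit status of the command that has just run.
true
FailedStrict $?
```

Called this way, a forgotten argument produces a message that names the function, which makes the mistake easier to locate than an error from the test command.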
