Sams Teach Yourself Shell Programming in 24 Hours
(Publisher: Macmillan Computer Publishing)
Author(s): Sriranga Veeraraghavan
ISBN: 0672314819
Publication Date: 01/01/99



Hour 15
Text Filters

Shell scripts are often called on to manipulate and reformat the output from commands that they execute. Sometimes this task is as simple as displaying only part of the output by filtering out certain lines. In most instances, the processing required is much more sophisticated.

In this chapter, you will look at several commands that are used heavily as text filters in shell scripts. These commands include

  head
  tail
  grep
  sort
  uniq
  tr

I will also show you how to combine these commands to filter text in extremely powerful ways.

The head and tail Commands

In Chapter 3, “Working with Files,” you looked at viewing the contents of a file using the cat command. This command enables you to view an entire file, but often you need more control over which lines are displayed. The head and tail commands provide some of this control.

The head Command

The basic syntax for the head command is

head [-n lines] files

Here files is the list of files you want the head command to process. Without the -n lines option, the head command shows the first 10 lines of each file, or of its standard input if no files are given. With this option, it shows the specified number of lines instead.
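As a quick check of this behavior, the following sketch builds a 12-line scratch file and compares head with and without the -n option. The file and its contents are made up purely for illustration:

```shell
#!/bin/sh
# Build a scratch file with 12 numbered lines (illustrative data only).
sample=$(mktemp)
i=1
while [ "$i" -le 12 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$sample"

# Without -n, head prints the first 10 lines.
# tr strips the padding that some versions of wc add around the count.
default_count=$(head "$sample" | wc -l | tr -d ' ')

# With -n 3, head prints only the first 3 lines.
first_three=$(head -n 3 "$sample")

echo "lines shown by default: $default_count"
echo "$first_three"

rm -f "$sample"
```

The default run reports 10 lines even though the file holds 12, and the -n 3 run stops after "line 3".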

Although this command is useful for viewing the tops of large README files, its real power shows in everyday tasks. Consider the following problem. I need to generate a list of the five most recently accessed files in my public HTML files directory. What is the easiest solution?

It’s easy to devise a solution by breaking the problem down. First, I generate a list of my public HTML files using the following command:

$ ls -1 /home/ranga/public_html

In my case, this generates the following list of files and directories:

RCS
cgi-bin
downloads
humor
images
index.html
misc
projects
resume
school

Next, I need to sort the list by the date of the last access. I can do this by specifying the -ut (sort by last accessed time) option to the ls command:

$ ls -1ut /home/ranga/public_html

The output now changes as follows:

index.html
RCS
humor
misc
downloads
images
resume
projects
school
cgi-bin

To retrieve a list of the five most recently accessed files, I can pipe the output of the ls command into a head command:

$ ls -1ut /home/ranga/public_html | head -n 5

This produces the following list:

index.html
RCS
humor
misc
downloads
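The same pipeline can be wrapped in a small shell function so that the directory and the count become parameters. This is only a sketch: the function name recent_files and the scratch files are hypothetical, and the access times are set by hand with touch -a so that the result is predictable.

```shell
#!/bin/sh
# recent_files DIR COUNT
# Print the COUNT most recently accessed entries in DIR.
# (Hypothetical helper; it simply wraps the ls | head pipeline.)
recent_files() {
    ls -1ut "$1" | head -n "$2"
}

# Scratch directory with three empty files.
dir=$(mktemp -d)
: > "$dir/old.html"
: > "$dir/mid.html"
: > "$dir/new.html"

# Set only the access time of each file (format: CCYYMMDDhhmm).
touch -a -t 202001010000 "$dir/old.html"
touch -a -t 202101010000 "$dir/mid.html"
touch -a -t 202201010000 "$dir/new.html"

top_two=$(recent_files "$dir" 2)
echo "$top_two"

rm -rf "$dir"
```

Because ls -ut puts the most recently accessed entry first, the two names printed are new.html followed by mid.html.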

The tail Command

The basic syntax for the tail command is similar to that of the head command:

tail [-n lines] files

Here files is the list of files the tail command should process. Without the -n lines option, the tail command shows the last 10 lines of each file, or of its standard input if no files are given. With this option, it shows the specified number of lines instead.
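A short sketch makes the default and the -n option concrete. As before, the 12-line scratch file is invented for the example:

```shell
#!/bin/sh
# Build a scratch file with 12 numbered lines (illustrative data only).
sample=$(mktemp)
i=1
while [ "$i" -le 12 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$sample"

# Without -n, tail prints the last 10 lines, so its output starts at line 3.
default_first=$(tail "$sample" | head -n 1)

# With -n 2, tail prints only the last 2 lines.
last_two=$(tail -n 2 "$sample")

echo "default output starts with: $default_first"
echo "$last_two"

rm -f "$sample"
```

Piping tail into head -n 1 is used here only to pick out the first line that tail printed.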

To illustrate the use of the tail command, consider the problem of generating a list of the five oldest mail spools on my system.

I can start with the ls -1 command again, but this time I’ll use the -t (sort by last modified time) option instead:

$ ls -1t /var/spool/mail

To get the bottom five, I’ll use tail instead of head:

$ ls -1t /var/spool/mail | tail -n 5

On my system the following list is generated:

anna
root
amma
vathsa
ranga

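In this list, the files are listed from newest to oldest. To list them from oldest to newest instead, I can add the -r option to the ls command, which reverses the sort order. Because the oldest files now come first in the listing, I use head rather than tail:

$ ls -1rt /var/spool/mail | head -n 5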

On my system, I get this list:

ranga
vathsa
amma
root
anna
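To see which end of the listing holds the oldest files, it helps to try both sort orders on files with known timestamps. The sketch below assumes a scratch directory with four made-up files whose modification times are set explicitly with touch -t:

```shell
#!/bin/sh
# Scratch directory with four files of known age (illustrative names).
dir=$(mktemp -d)
touch -t 202001010000 "$dir/oldest"
touch -t 202101010000 "$dir/older"
touch -t 202201010000 "$dir/newer"
touch -t 202301010000 "$dir/newest"

# ls -t lists newest first, so the two oldest files are at the bottom.
newest_first=$(ls -1t "$dir" | tail -n 2)

# ls -rt lists oldest first, so the two oldest files are at the top.
oldest_first=$(ls -1rt "$dir" | head -n 2)

echo "with -t and tail:"
echo "$newest_first"
echo "with -rt and head:"
echo "$oldest_first"

rm -rf "$dir"
```

Both pipelines select the same two files. Only the order in which they are printed differs.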

The follow Option

An extremely useful feature of the tail command is the -f (f as in follow) option:

tail -f file

Specifying the -f option enables you to examine the specified file while programs are writing to it.

Often I have to look at the log files generated by programs that I am debugging, and I don’t want to wait for the program to finish. Instead, I start the program and then run tail -f on its log file.

Some Web administrators use a command such as the following to watch the HTTP requests made for their system:

$ tail -f /var/log/httpd/access_log
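The effect of -f can be reproduced in a script with a stand-in for the long-running program. In this sketch the background loop, the log file, and the one-second pacing are all assumptions made for the example. At the prompt you would simply run tail -f and press Ctrl-C when finished, but a script has to stop tail itself:

```shell
#!/bin/sh
log=$(mktemp)    # file the stand-in program writes to
seen=$(mktemp)   # captures what tail -f prints

# Stand-in for a long-running program: appends one line per second.
(
    for i in 1 2 3; do
        echo "step $i" >> "$log"
        sleep 1
    done
) &
writer=$!

# Follow the log while the writer is still running.
tail -f "$log" > "$seen" &
follower=$!

wait "$writer"
sleep 2                                # give tail time to pick up the last line
kill "$follower"                       # tail -f never exits on its own
wait "$follower" 2>/dev/null || true   # reap it and ignore its exit status

followed=$(cat "$seen")
echo "$followed"

rm -f "$log" "$seen"
```

Each line appears in tail's output shortly after the writer appends it, without waiting for the writer to finish.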

Using grep

The grep command lets you locate the lines in a file that contain a particular word or phrase. The word grep stands for global regular expression print. The command is derived from a feature of the original UNIX text editor, ed. To print every line containing a word in ed, the following command was used:

g/word/p

Here word is a regular expression. For those readers who are not familiar with regular expressions, Chapter 16, “Filtering Text Using Regular Expressions,” discusses them in detail.

This particular ed command was used so widely in shell scripts that it was factored into its own command, called grep. In this section, you will look at the grep command and some of its most commonly used options.
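In its simplest form, grep takes the word and a file name and prints the matching lines, much as g/word/p does inside ed. The following sketch uses a small scratch file whose contents are invented for the example:

```shell
#!/bin/sh
# Scratch file with four lines, two of which contain the word "shell".
notes=$(mktemp)
cat > "$notes" <<'EOF'
the shell reads commands
grep prints matching lines
ed is a line editor
the shell runs pipelines
EOF

# Rough command-line counterpart of the ed command g/shell/p.
matches=$(grep 'shell' "$notes")
echo "$matches"

rm -f "$notes"
```

Only the first and last lines of the file are printed, because they are the only ones that contain the word.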

