Sams Teach Yourself Shell Programming in 24 Hours
(Publisher: Macmillan Computer Publishing)
Author(s): Sriranga Veeraraghavan
ISBN: 0672314819
Publication Date: 01/01/99

Using Character Classes with tr

Take a look at the output of the command:

$ tr '!?":;\[\]{}(),.\t\n' ' ' < /home/ranga/docs/ch15.doc |
tr 'A-Z' 'a-z' | tr -s ' '  | tr ' ' '\n' | sort | uniq -c |
sort -rn | head
389 the
164 to
127 of
115 is
115 and
111 a
 80 files
 70 file
 69 in
 65 '

You might have noticed that the tenth most common word in this chapter is the single quote character. You might be wondering what’s going on because I said you took care of the punctuation with the very first tr command.

Well, I lied (sort of). You took care of all the characters that would fit between quotes, and a single quote won’t fit.

So why not backslash escape that sucker? Well, inside single quotes the shell takes a backslash literally, so the escape has no effect there.

So what’s the solution?

The solution is to use the predefined character sets in tr. The tr command knows several character classes, and the punctuation class is one of them. Table 15.1 gives a complete list of the character class names.

Table 15.1 Character Classes Understood by the tr Command

Class     Description

alnum     Letters and digits
alpha     Letters
blank     Horizontal whitespace
cntrl     Control characters
digit     Digits
graph     Printable characters, not including spaces
lower     Lowercase letters
print     Printable characters, including spaces
punct     Punctuation
space     Horizontal or vertical whitespace
upper     Uppercase letters
xdigit    Hexadecimal digits

The way to invoke tr with one of these character classes is

tr '[:classname:]' 'set2'

Here classname is the name of one of the classes given in Table 15.1, and set2 is the set of characters you want the characters in classname to be transliterated to.
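
As a quick illustration of this form, the following sketch replaces every punctuation character in one sample line with a space. The function name strip_punct and the sample sentence are made up for this example; they do not come from the chapter.

```shell
# strip_punct is a hypothetical helper: it reads standard input and
# turns each punctuation character into a single space.
strip_punct() {
    tr '[:punct:]' ' '
}

# The sample text contains apostrophes, a colon, and an exclamation point.
printf '%s\n' "Don't panic: it's only a test!" | strip_punct
```

Each apostrophe, the colon, and the exclamation point should come out as a space, which is why the single quote no longer needs to appear inside the quoted set.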

For example, to get rid of punctuation and spaces, you use the punct and space classes:

$ tr '[:punct:]' ' ' < /home/ranga/docs/ch15.doc |
tr '[:space:]' ' ' | tr 'A-Z' 'a-z' | tr -s ' '  | tr ' ' '\n' |
sort | uniq -c | sort -rn | head

Here’s some of the new output:

405 the
170 to
136 a
134 of
122 and
119 is
 80 files
 74 file
 72 in
 67 or

The numbers are different for some of the words because I was still writing the chapter between the two runs, so the file had changed.
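
To try the class-based pipeline on input small enough to check by hand, here is a minimal sketch that wraps the same commands in a function and feeds it two short lines. The name word_freq and the sample text are assumptions made for this illustration only.

```shell
# word_freq is a hypothetical wrapper around the pipeline shown above.
# It reads text on standard input and prints up to ten "count word" lines.
word_freq() {
    tr '[:punct:]' ' ' |      # punctuation, including the single quote, becomes a space
    tr '[:space:]' ' ' |      # tabs and newlines become spaces as well
    tr 'A-Z' 'a-z' |          # fold uppercase to lowercase
    tr -s ' ' |               # squeeze runs of spaces into one
    tr ' ' '\n' |             # put one word on each line
    sort | uniq -c |          # count identical words
    sort -rn | head           # most frequent first, at most ten lines
}

printf '%s\n' "The cat's hat; the cat sat." "The end." | word_freq
```

The first line of output should report three occurrences of the. Note that cat's is split into cat and s, because the apostrophe is replaced by a space like any other punctuation character.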

I could also have replaced 'A-Z' and 'a-z' with the upper and lower classes, but there is no real advantage to using the classes. In most cases the ranges are much more illustrative of your intentions.
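
For comparison, this small sketch shows both spellings side by side. The function names to_lower_range and to_lower_class are hypothetical, and the claim that the two agree is only meant for plain ASCII input like the sample line.

```shell
# Lowercase conversion written with explicit ranges, as in the pipeline above.
to_lower_range() {
    tr 'A-Z' 'a-z'
}

# The same conversion written with the upper and lower character classes.
to_lower_class() {
    tr '[:upper:]' '[:lower:]'
}

printf 'The FILE and the Files\n' | to_lower_range
printf 'The FILE and the Files\n' | to_lower_class
```

Both commands should print the same lowercase line.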

Summary

In this chapter you looked at some of the commands that are heavily used for filtering text in scripts. These commands include:

  head
  tail
  grep
  sort
  uniq
  tr

I also covered how to combine these commands to solve problems such as counting the number of times a word is repeated in a text file. In Chapter 16 I will introduce two more text filtering commands, awk and sed, that give you much more control over editing lines and printing specific columns of your output.

Questions

1.  Given the following shell function
lspids() { /bin/ps -ef | grep "$1"| grep -v grep ; }

make the necessary changes so that when the function is executed as follows
$ lspids -h ssh

the output looks like this:
UID   PID  PPID  C    STIME TTY       TIME COMMAND
root  2121     1  0  Nov 16  ?         0:14 /opt/bin/sshd

Also, when the function executes as
$ lspids ssh

the output looks like this:
root  2121     1  0  Nov 16  ?         0:14 /opt/bin/sshd

Here you are using ssh as the word specified to grep, but your function should be able to use any word as an argument.
Also, validate that you have enough arguments before executing the ps command.
If you are using a Linux or FreeBSD-based system, please use the following version of the function lspids as a starting point instead of the version given previously:
lspids() { /bin/ps -auwx 2> /dev/null | grep "$1"| grep -v grep ; }

(HINT: The header that you are using is the first line in the output from the /bin/ps -ef command.)
2.  Take the function you wrote in question 1 and add a -s option that sorts the output of the ps command by process ID. The process IDs, or pids, do not have to be arranged from largest to smallest.
If you are using a Linux or FreeBSD system, you need to sort on column 1. On other systems you need to sort on column 2.

