Take a look at the output of the command:
$ tr '!?":;\[\]{}(),.\t\n' ' ' < /home/ranga/docs/ch15.doc | tr 'A-Z' 'a-z' | tr -s ' ' | tr ' ' '\n' | sort | uniq -c | sort -rn | head
389 the
164 to
127 of
115 is
115 and
111 a
80 files
70 file
69 in
65 '
You might have noticed that the tenth most common word in this chapter is the single quote character. You might be wondering what's going on, because I said you took care of the punctuation with the very first tr command.
Well, I lied (sort of). You took care of all the punctuation characters that can appear inside single quotes, and a single quote can't.
So why not backslash-escape that sucker? Well, inside single quotes the shell treats a backslash as an ordinary character, so \' does not escape the quote there.
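If you do want to list the single quote explicitly, one workaround that relies only on standard POSIX shell quoting is to close the single-quoted string, add a backslash-escaped quote outside the quotes, and then continue. The sketch below assumes a hypothetical function name, blank_quote_punct, and uses only a small subset of the punctuation from the command above.

```shell
# blank_quote_punct (hypothetical name): turn . , ; : ! ? " and ' into spaces.
# Inside single quotes a backslash is literal, so the set is written as
# '.,;:!?"' immediately followed by \' (an escaped quote outside the quotes).
# The shell joins the two pieces into the single argument .,;:!?"'
blank_quote_punct() {
  tr '.,;:!?"'\' ' '
}
```

For example, printf '%s\n' "it's: done?" | blank_quote_punct prints "it s  done ". Putting the whole set in double quotes, as in ".,;:!?\"'", is another option.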
So what's the solution?
The solution is to use the predefined character sets in tr. The tr command knows several character classes, and the punctuation class is one of them. Table 15.1 gives a complete list of the character class names.
Class | Description |
---|---|
alnum | Letters and digits |
alpha | Letters |
blank | Horizontal whitespace |
cntrl | Control characters |
digit | Digits |
graph | Printable characters, not including spaces |
lower | Lowercase letters |
print | Printable characters, including spaces |
punct | Punctuation |
space | Horizontal or vertical whitespace |
upper | Uppercase letters |
xdigit | Hexadecimal digits |
The way to invoke tr with one of these character classes is
tr '[:classname:]' 'set2'
Here classname is the name of one of the classes given in Table 15.1, and set2 is the set of characters you want the characters in classname to be transliterated to.
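As a small illustration of that form, the sketch below wraps the punct class in a hypothetical function, blank_punct. It relies on tr padding set2 by repeating its last character when set2 is shorter than set1; GNU and BSD tr both do this, and the commands in this chapter depend on the same behavior.

```shell
# blank_punct (hypothetical name): replace every character in the punct
# class, the single quote included, with a space. set2 is one space, which
# tr repeats to match the length of the class.
blank_punct() {
  tr '[:punct:]' ' '
}
```

For example, printf '%s\n' "don't stop: go?" | blank_punct prints "don t stop  go ".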
For example, to turn punctuation and all whitespace (tabs and newlines included) into spaces, you use the punct and space classes:
$ tr '[:punct:]' ' ' < /home/ranga/docs/ch15.doc | tr '[:space:]' ' ' | tr 'A-Z' 'a-z' | tr -s ' ' | tr ' ' '\n' | sort | uniq -c | sort -rn | head
Here's some of the new output:
405 the
170 to
136 a
134 of
122 and
119 is
80 files
74 file
72 in
67 or
The numbers are different for some of the words because I ran the commands and wrote the chapter at the same time.
I could also have replaced 'A-Z' and 'a-z' with the upper and lower classes, but there is no real advantage to using the classes. In most cases the ranges are much more illustrative of your intentions.
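For comparison, here is a minimal sketch of the same pipeline wrapped in a shell function, with the upper and lower classes used in place of the 'A-Z' and 'a-z' ranges. The function name wordfreq is hypothetical, and it reads the file named by its first argument instead of a fixed path.

```shell
# wordfreq FILE (hypothetical name): print the ten most frequent words in
# FILE, most frequent first, each preceded by its count.
wordfreq() {
  tr '[:punct:]' ' ' < "$1" |      # punctuation becomes spaces
    tr '[:space:]' ' ' |           # tabs and newlines become spaces
    tr '[:upper:]' '[:lower:]' |   # fold case with the classes
    tr -s ' ' |                    # squeeze runs of spaces into one
    tr ' ' '\n' |                  # one word per line
    sort | uniq -c | sort -rn | head
}
```

Running wordfreq /home/ranga/docs/ch15.doc should produce the same kind of listing as the command above.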
In this chapter you looked at some of the commands that are heavily used for filtering text in scripts, including head, sort, tr, and uniq.
I also covered how to combine these commands to solve problems such as counting the number of times each word appears in a text file. In Chapter 16 I will introduce two more text filtering commands, awk and sed, that give you much more control over editing lines and printing specific columns of your output.
lspids() { /bin/ps -ef | grep "$1" | grep -v grep ; }
$ lspids -h ssh
UID PID PPID C STIME TTY TIME COMMAND
root 2121 1 0 Nov 16 ? 0:14 /opt/bin/sshd
$ lspids ssh
root 2121 1 0 Nov 16 ? 0:14 /opt/bin/sshd
lspids() { /bin/ps -auwx 2> /dev/null | grep "$1" | grep -v grep ; }
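The definitions above do not handle the -h option used in the sample runs. A minimal sketch of a version that does might look like the following; it assumes -h simply means "print the ps header line first," and it calls ps from the PATH rather than /bin/ps. On BSD-style systems you would swap -ef for -auwx, as in the second definition.

```shell
# lspids [-h] pattern: list processes whose ps -ef line matches pattern.
# With -h (an assumed meaning), print the ps column header first.
lspids() {
  if [ "$1" = "-h" ]; then
    shift
    # the first line of ps -ef output is the column header
    ps -ef | head -n 1
  fi
  if [ -z "$1" ]; then
    echo "usage: lspids [-h] pattern" >&2
    return 1
  fi
  # -e protects patterns that start with a dash; grep -v grep drops the
  # grep process itself, as in the original function
  ps -ef | grep -e "$1" | grep -v grep
}
```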