Sams Teach Yourself Shell Programming in 24 Hours
(Publisher: Macmillan Computer Publishing)
Author(s): Sriranga Veeraraghavan
ISBN: 0672314819
Publication Date: 01/01/99



You can combine the ^ and $ metacharacters with sets of characters and other metacharacters to match lines against an expression. For example, the following expression

/^Chapter [1-9]*[0-9]$/

matches lines such as

Chapter 1
Chapter 20

but it does not match lines such as

Chapter 00 Introduction
Chapter 101
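A quick way to check the expression is to pipe the four sample lines through sed. The sketch below assumes a POSIX shell with the standard printf and sed utilities. It uses sed's -n option and p command, which are covered later in this section, and the variable name matches is only illustrative.

```shell
#!/bin/sh
# Send the four sample lines through sed. -n turns off sed's automatic
# printing, and p prints only the lines that match the anchored expression.
matches=$(printf '%s\n' \
    'Chapter 1' \
    'Chapter 20' \
    'Chapter 00 Introduction' \
    'Chapter 101' |
    sed -n '/^Chapter [1-9]*[0-9]$/p')

printf '%s\n' "$matches"
```

Only Chapter 1 and Chapter 20 are printed. The $ rejects the trailing Introduction, and 101 cannot be split into a run of the digits 1 through 9 followed by one final digit.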

Because the ^ and $ metacharacters anchor the expression to the beginning and end of a line, you can match empty lines as follows:

/^$/
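As a sketch, the /^$/ expression can count the empty lines in some input or, combined with the d action from Table 16.4, remove them. The sample text and the helper function name sample_input are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical sample input: three lines of text separated by two empty lines.
sample_input() {
    printf 'first\n\nsecond\n\nthird\n'
}

# Print only the empty lines and count them. tr strips the leading
# spaces that some wc implementations add to the number.
blank_count=$(sample_input | sed -n '/^$/p' | wc -l | tr -d ' ')
echo "empty lines: $blank_count"

# Delete the empty lines instead; all other lines pass through unchanged.
compacted=$(sample_input | sed '/^$/d')
printf '%s\n' "$compacted"
```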

Escaping Metacharacters

Often you need to match strings such as

Peaches $0.89/lbs
Oil $15.10/barrel

These strings contain three characters with special meanings in awk and sed patterns:

  The dollar character, $
  The decimal point character, .
  The slash (per) character, /

If you use an expression that does not account for these characters, such as the following

/$[0-9].[0-9][0-9]/[a-zA-Z]*/

you are unable to match any string because the expression is garbled. The two main problems are:

  The first character in this expression is the $ character. Because $ matches the end of the line, this expression tries to look for characters after the end of the line, which is an impossible pattern to match. (awk always treats $ this way; a POSIX sed treats a $ that is not the last character of the pattern as an ordinary character, but escaping it makes your intent clear in both tools.)
  The expression contains three slashes. awk and sed treat the first two as the delimiters of the pattern, so the [a-zA-Z]*/ that follows the pattern confuses them.

The third problem is related to the . character. Because this metacharacter matches any single character, the expression can match strings such as the following in addition to the strings you want:

0x00
12345
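You can see the third problem by running the digit-dot-digit-digit part of the expression against the two sample strings and a real price. This sketch anticipates the backslash escape described next; the helper function name candidates is hypothetical.

```shell
#!/bin/sh
# The real price plus the two unwanted strings from the text.
candidates() {
    printf '%s\n' '0.89' '0x00' '12345'
}

# Unescaped: . matches any single character, so all three lines match.
loose=$(candidates | sed -n '/[0-9].[0-9][0-9]/p')

# Escaped: \. matches only a literal decimal point.
strict=$(candidates | sed -n '/[0-9]\.[0-9][0-9]/p')

printf 'unescaped:\n%s\nescaped:\n%s\n' "$loose" "$strict"
```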


You can solve all these problems using the backslash metacharacter (\). When the backslash precedes a metacharacter, the special meaning of that metacharacter is “deactivated,” and the character is treated literally. Avoid putting a backslash in front of an ordinary character, however: the result is not portable, because many tools give such sequences special meanings. In awk, for example, \n stands for a newline and \a for the alert (bell) character, not for a lowercase n or a.


The process of using the backslash to deactivate a metacharacter is called escaping it. For example,

$

matches the end of a line, but

\$

matches the dollar sign ($) literally.
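The difference between $ and \$ shows up clearly with sed's s action (listed in Table 16.4). In this sketch the sample line comes from the text, while the replacement strings " (sale)" and "USD " are arbitrary choices for illustration.

```shell
#!/bin/sh
line='Peaches $0.89/lbs'   # single quotes keep the shell from expanding $0

# Unescaped $ is the end-of-line anchor, so the text is appended at the end.
at_end=$(printf '%s\n' "$line" | sed 's/$/ (sale)/')

# Escaped \$ is a literal dollar sign, so the $ in the price is replaced.
literal=$(printf '%s\n' "$line" | sed 's/\$/USD /')

printf '%s\n%s\n' "$at_end" "$literal"
```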


By escaping the metacharacters, you can rewrite the expression to solve all three problems:

/\$[0-9]*\.[0-9][0-9]\/[a-zA-Z]*/
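To check the escaped expression, you can run it against the two price strings from the text plus two lines that contain no dollar amount. The two decoy lines are made-up test data.

```shell
#!/bin/sh
# Print only the lines that contain a dollar amount followed by /unit.
prices=$(printf '%s\n' \
    'Peaches $0.89/lbs' \
    'Oil $15.10/barrel' \
    'Code 0x00' \
    'Total 12345' |
    sed -n '/\$[0-9]*\.[0-9][0-9]\/[a-zA-Z]*/p')

printf '%s\n' "$prices"
```

The [0-9]* before \. allows any number of digits before the decimal point, which is why $15.10 matches as well as $0.89.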

To match a backslash literally, escape it with another backslash: \\.
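A small sketch of matching a literal backslash with sed. The two sample paths are invented for illustration. Two layers of escaping are involved: printf turns \\ in its format string into one backslash, and inside the single-quoted sed script \\ matches one backslash.

```shell
#!/bin/sh
# printf converts \\ into a single backslash, so the first line is C:\temp.
# In the sed pattern, \\ matches one literal backslash.
with_backslash=$(printf 'C:\\temp\nC:/temp\n' | sed -n '/\\/p')

printf '%s\n' "$with_backslash"
```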

Sometimes the process of escaping a metacharacter with the backslash is called backslash escaping.

Useful Regular Expressions

Table 16.3 provides some useful regular expressions.

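Table 16.3 Some Useful Regular Expressions

String Type                 Expression

Blank lines                 /^$/
An entire line              /^.*$/
One or more spaces          /  */
HTML (or XML) tags          /<[^>][^>]*>/
URLs                        /[a-zA-Z][a-zA-Z]*:\/\/[a-zA-Z0-9][a-zA-Z0-9.]*.*/
Formatted dollar amounts    /\$[0-9]*\.[0-9][0-9]/

In the expression for one or more spaces, the first space matches a single space, and the second space followed by * matches any additional spaces.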
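Here is a sketch that applies three of the table's expressions with sed. The HTML fragment is made up, and the g flag at the end of an s command (which makes sed replace every match on the line rather than only the first) is a standard sed feature that Table 16.4 does not list.

```shell
#!/bin/sh
# Made-up sample input: an HTML fragment with a link and a price.
page='<p>See <a href="http://example.com/fruit">our list</a> for $0.89 deals.</p>'

# Strip every HTML tag on the line (g = all matches, not just the first).
text=$(printf '%s\n' "$page" | sed 's/<[^>][^>]*>//g')

# Keep only the lines that contain a URL.
urls=$(printf '%s\n' "$page" 'no link here' |
    sed -n '/[a-zA-Z][a-zA-Z]*:\/\/[a-zA-Z0-9][a-zA-Z0-9.]*.*/p')

# Squeeze each run of one or more spaces down to a single space.
squeezed=$(printf '%s\n' 'Banana      0.89' | sed 's/  */ /g')

printf '%s\n%s\n%s\n' "$text" "$urls" "$squeezed"
```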

Using sed

sed is a stream editor that you can use as a filter. It reads each line of input and then performs a set of requested actions. The basic syntax of a sed command is

sed 'script' files

Here files is a list of one or more files, and script is one or more commands of the form:

/pattern/ action

Here, pattern is a regular expression, and action is one of the commands given in Table 16.4. If pattern is omitted, action is performed for every line.
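When you omit files, sed reads standard input, which is what lets you use it as a filter in a pipeline. The sketch below, with made-up input, runs the same s action with and without a pattern to show that an omitted pattern applies the action to every line.

```shell
#!/bin/sh
# No pattern: the action (insert "> " at the start of the line) runs on
# every line.
all=$(printf '%s\n' 'apple' 'banana' 'cherry' | sed 's/^/> /')

# With /an/ in front, the same action runs only on lines containing "an".
some=$(printf '%s\n' 'apple' 'banana' 'cherry' | sed '/an/s/^/> /')

printf '%s\n---\n%s\n' "$all" "$some"
```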

Table 16.4 Some of the Actions Available in sed

Action                  Description

p                       Prints the line.
d                       Deletes the line.
s/pattern1/pattern2/    Replaces the first occurrence of pattern1 on the line with pattern2.
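A sketch of the three actions on a small made-up input. Each sed command reads the same three lines; the variable names are only illustrative.

```shell
#!/bin/sh
input='Fruit Price
Banana 0.89
Mango 2.20'

# p: print matching lines. -n suppresses the automatic printing of every
# line, so only the output of p remains.
printed=$(printf '%s\n' "$input" | sed -n '/Banana/p')

# d: delete matching lines; all other lines are printed as usual.
deleted=$(printf '%s\n' "$input" | sed '/^Fruit/d')

# s: replace only the first "a" on each line with "A".
substituted=$(printf '%s\n' "$input" | sed 's/a/A/')

printf '%s\n---\n%s\n---\n%s\n' "$printed" "$deleted" "$substituted"
```

The s output shows BAnana rather than BAnAnA, because without the g flag only the first match on each line is replaced.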

Printing Lines

Start with the simplest feature available in sed—printing a line that matches a pattern.

Consider the price list for a small fruit market. The list is stored in the file fruit_prices.txt:

$ cat fruit_prices.txt
Fruit           Price/lbs
Banana          0.89
Peach           0.79
Kiwi            1.50
Pineapple       1.29
Apple           0.99
Mango           2.20

Each line lists the name of a fruit and its price per pound.

Say you want to print a list of the fruits that cost less than $1 per pound. For this you use the sed command p (p as in print):

/pattern/p

Here pattern is a regular expression.

Try the following sed command:

$ sed '/0\.[0-9][0-9]$/p' fruit_prices.txt

Here you tell sed to print all the lines that match the pattern:

/0\.[0-9][0-9]$/

This means that lines ending in prices such as 0.89 and 0.99 should be printed, but lines ending in prices such as 2.20 should not. (A price such as 10.50 would also match, because it ends in 0.50; the file above contains no such prices.)

Now, look at the output:

Fruit           Price/lbs
Banana          0.89
Banana          0.89
Peach           0.79
Peach           0.79
Kiwi            1.50
Pineapple       1.29
Apple           0.99
Apple           0.99
Mango           2.20

The lines for fruit priced under a dollar are printed twice, whereas all the other lines are printed only once.

This demonstrates the default behavior of sed: it prints every input line to the output, and the p command adds a second copy of each matching line. To turn off the automatic printing, specify the -n option to sed as follows:

$ sed -n '/0\.[0-9][0-9]$/p' fruit_prices.txt

Now only the lines printed by p appear in the output:

Banana          0.89
Peach           0.79
Apple           0.99
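To reproduce this walkthrough, the sketch below recreates fruit_prices.txt in a temporary directory and runs both commands. It assumes a mktemp that supports -d (GNU and BSD systems provide one); the variable names are illustrative.

```shell
#!/bin/sh
# Work in a scratch directory so no existing file is overwritten.
workdir=$(mktemp -d) || exit 1
file="$workdir/fruit_prices.txt"

# Recreate the sample price list.
cat > "$file" <<'EOF'
Fruit           Price/lbs
Banana          0.89
Peach           0.79
Kiwi            1.50
Pineapple       1.29
Apple           0.99
Mango           2.20
EOF

# Without -n: 7 automatically printed lines + 3 extra copies from p = 10.
default_lines=$(sed '/0\.[0-9][0-9]$/p' "$file" | wc -l | tr -d ' ')

# With -n: only the 3 lines that p prints.
cheap=$(sed -n '/0\.[0-9][0-9]$/p' "$file")

echo "lines without -n: $default_lines"
printf '%s\n' "$cheap"

rm -r "$workdir"   # clean up the scratch directory
```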

