You can combine the ^ and the $ metacharacters along with sets of characters and the other metacharacters to match lines according to an expression. For example, the following expression
/^Chapter [1-9]*[0-9]$/
matches lines such as
Chapter 1
Chapter 20
but it does not match lines such as
Chapter 00
Introduction
Chapter 101
Because the ^ and $ metacharacters anchor the expression to the beginning and end of a line, you can match empty lines with the following expression:
/^$/
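One way to try these anchored expressions is to pipe a few sample lines into sed -n with the p command (both covered later in this chapter), which prints only the matching lines. The sample lines are the ones listed above. For the empty-line case, this sketch uses grep -c, which prints the number of matching lines; grep takes the same expression without the surrounding slashes. The input strings are made up for illustration.

```shell
# Print only the lines matching /^Chapter [1-9]*[0-9]$/.
# Of these five lines, only "Chapter 1" and "Chapter 20" are printed.
printf '%s\n' 'Chapter 1' 'Chapter 20' 'Chapter 00' 'Introduction' 'Chapter 101' |
  sed -n '/^Chapter [1-9]*[0-9]$/p'

# Count the empty lines with ^$. The input has two empty lines,
# so grep -c prints 2.
printf 'first\n\nsecond\n\n' | grep -c '^$'
```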
Escaping Metacharacters
Many times you need to match strings such as
Peaches $0.89/lbs
Oil $15.10/barrel
These strings contain three characters that have special meanings in regular expressions: $, ., and /.
If you use an ordinary expression such as the following
/$[0-9].[0-9][0-9]/[a-zA-Z]*/
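you are unable to match any string because the expression is garbled. The two main problems are that the $ anchors the expression to the end of a line instead of matching a dollar sign, and that the / after [0-9][0-9] is taken as the end of the expression, leaving [a-zA-Z]*/ stranded after it.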
The third problem is related to the . metacharacter. Because it matches any single character, it can match the following strings in addition to the ones you want:
0x00
12345
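You can see this over-matching by running only the digit part of the expression, [0-9].[0-9][0-9], through sed -n with p, which prints only the matching lines. The three input strings are the wanted 0.89 and the two unwanted strings above.

```shell
# The unescaped . matches any single character, so all three lines print:
# 0.89 (the period), 0x00 (x matches .), and 12345 (2 matches .).
printf '%s\n' '0.89' '0x00' '12345' | sed -n '/[0-9].[0-9][0-9]/p'
```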
You can solve all these problems using the backslash metacharacter (\). When the backslash precedes a metacharacter, the special meaning of that metacharacter is deactivated and the character is treated literally. Avoid putting a backslash in front of an ordinary character: in many tools it has no effect, but POSIX leaves the result undefined, and some tools give such sequences special meanings (GNU sed, for example, treats \n as a newline and \t as a tab).
The process of using the backslash to deactivate a metacharacter is called escaping it. For example, $ matches the end of a line, but \$ matches the dollar sign ($) literally.
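A quick check of the difference, sketched with sed -n and p on two made-up lines:

```shell
# $ on its own anchors to the end of the line. Every line has an end,
# so both lines print.
printf '%s\n' 'Peaches $0.89/lbs' 'no price here' | sed -n '/$/p'

# \$ matches a literal dollar sign, so only the first line prints.
printf '%s\n' 'Peaches $0.89/lbs' 'no price here' | sed -n '/\$/p'
```

The single quotes around each sed script keep the shell from treating $ as the start of a variable name.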
By escaping the $, ., and / characters, you can solve all three problems with the following expression:
/\$[0-9]*\.[0-9][0-9]\/[a-zA-Z]*/
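To confirm that the escaped expression keeps the strings you want and rejects the others, this sketch feeds both price strings and both unwanted strings through sed -n with p:

```shell
# \$ is a literal dollar sign, \. a literal period, and \/ a literal
# slash inside the /.../ delimiters. Only the two price strings print;
# 0x00 and 12345 are rejected because neither contains a dollar sign.
printf '%s\n' 'Peaches $0.89/lbs' 'Oil $15.10/barrel' '0x00' '12345' |
  sed -n '/\$[0-9]*\.[0-9][0-9]\/[a-zA-Z]*/p'
```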
To match a backslash literally, escape it with another backslash: \\.
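For example, this sketch uses \\ to pick out the line that contains a backslash. The path-like sample strings are made up for illustration.

```shell
# printf '%s\n' prints its arguments unchanged, so the first line
# contains a real backslash. \\ matches one literal backslash,
# so only "C:\temp" prints.
printf '%s\n' 'C:\temp' 'C:/temp' | sed -n '/\\/p'
```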
Sometimes the process of escaping a metacharacter with the backslash is called backslash escaping.
Useful Regular Expressions
Table 16.3 provides some useful regular expressions.
String Type | Expression |
---|---|
Blank lines | /^$/ |
An entire line | /^.*$/ |
One or more spaces | /  */ (two spaces before the *) |
HTML (or XML) Tags | /<[^>][^>]*>/ |
Valid URLs | /[a-zA-Z][a-zA-Z]*:\/\/[a-zA-Z0-9][a-zA-Z0-9\.]*.*/ |
Formatted dollar amounts | /\$[0-9]*\.[0-9][0-9]/ |
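A few rows of Table 16.3 can be tried the same way, with sed -n and p printing only the matching lines. Each pair of sample lines below is made up: one line should match and one should not.

```shell
# One or more spaces: the pattern is a space followed by " *".
# Prints "two  spaces" only.
printf '%s\n' 'two  spaces' 'nospace' | sed -n '/  */p'

# HTML tags: a <, at least one character other than >, then >.
# Prints "<b>bold</b> text" only; "3 < 4" has no closing >.
printf '%s\n' '<b>bold</b> text' 'if 3 < 4' | sed -n '/<[^>][^>]*>/p'

# Formatted dollar amounts: prints "Total: $12.50" only.
printf '%s\n' 'Total: $12.50' 'Total: 12.50' | sed -n '/\$[0-9]*\.[0-9][0-9]/p'
```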
sed is a stream editor that you can use as a filter. It reads its input one line at a time, performs the requested actions on each line, and writes the result to standard output. The basic syntax of a sed command is
sed 'script' files
Here files is a list of one or more files, and script is one or more commands of the form:
/pattern/ action
Here, pattern is a regular expression, and action is one of the commands given in Table 16.4. If pattern is omitted, action is performed for every line.
Action | Description |
---|---|
p | Prints the line |
d | Deletes the line |
s/pattern1/pattern2/ | Replaces the first occurrence of pattern1 on the line with pattern2. |
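A short sketch of the d and s actions, using made-up input lines. Because sed prints every line by default, lines that are not deleted appear in the output. The last command omits the pattern, so the substitution is attempted on every line, and it replaces only the first match on each line.

```shell
# d: delete lines matching /banana/. Prints "apple pie" and "apple tart".
printf '%s\n' 'apple pie' 'banana split' 'apple tart' | sed '/banana/d'

# /pattern/ followed by s: substitute only on lines matching /tart/.
# Prints "apple pie" and "pear tart".
printf '%s\n' 'apple pie' 'apple tart' | sed '/tart/s/apple/pear/'

# No pattern: the action applies to every line, first match only.
# Prints "pear apple" and "pear pie".
printf '%s\n' 'apple apple' 'apple pie' | sed 's/apple/pear/'
```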
Start with the simplest feature available in sed: printing a line that matches a pattern.
Consider the price list for a small fruit market. The list is stored in the file fruit_prices.txt:
$ cat fruit_prices.txt
Fruit Price/lbs
Banana 0.89
Paech 0.79
Kiwi 1.50
Pineapple 1.29
Apple 0.99
Mango 2.20
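To follow along, you can create fruit_prices.txt with a here document. This sketch assumes the name and price on each line are separated by a single space; the spacing of the original file is not shown.

```shell
# Quoting 'EOF' stops the shell from expanding anything inside
# the here document, so the lines are written exactly as shown.
cat > fruit_prices.txt <<'EOF'
Fruit Price/lbs
Banana 0.89
Paech 0.79
Kiwi 1.50
Pineapple 1.29
Apple 0.99
Mango 2.20
EOF
```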
Each line lists the name of a fruit and its price per pound.
Say you want to print a list of the fruits that cost less than $1 per pound. To do this, use the sed command p (p as in print):
/pattern/p
Here pattern is a regular expression.
Try the following sed command:
$ sed '/0\.[0-9][0-9]$/p' fruit_prices.txt
Here you tell sed to print all the lines that match the pattern:
/0\.[0-9][0-9]$/
This means that lines ending in prices such as 0.89 and 0.99 should be printed, but lines ending in prices such as 2.20 should not.
Now, look at the output:
Fruit Price/lbs
Banana 0.89
Banana 0.89
Paech 0.79
Paech 0.79
Kiwi 1.50
Pineapple 1.29
Apple 0.99
Apple 0.99
Mango 2.20
You find that the lines for fruit with prices less than a dollar are printed twice, whereas lines for fruit with prices greater than a dollar are printed only once.
This demonstrates the default behavior of sed: it prints every input line to the output. To suppress this behavior, specify the -n option to sed as follows:
$ sed -n '/0\.[0-9][0-9]$/p' fruit_prices.txt
This changes the output as follows:
Banana 0.89
Paech 0.79
Apple 0.99
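One caveat: the pattern is not anchored on the left, so a hypothetical line such as Durian 10.50 would also print, because it ends in 0.50. One possible fix, assuming the price is always preceded by a non-digit character (a space in fruit_prices.txt), is to require a non-digit before the 0 with [^0-9]:

```shell
# The pattern from the text: prints "Banana 0.89" and "Durian 10.50".
printf '%s\n' 'Banana 0.89' 'Durian 10.50' 'Mango 2.20' | sed -n '/0\.[0-9][0-9]$/p'

# Requiring a non-digit before the 0: prints "Banana 0.89" only.
printf '%s\n' 'Banana 0.89' 'Durian 10.50' 'Mango 2.20' | sed -n '/[^0-9]0\.[0-9][0-9]$/p'
```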