SYNTAX FOR EXPRESSING BINARY CHARACTERS

A number of programs allow binary characters to be specified through a special escape syntax. The syntax is largely similar from program to program but may differ in small but significant details. Here is the relevant official documentation for several such programs:

  - bash
  - sed
  - echo (the binary executable one)
  - echo (the shell builtin one)
  - tr
  - gawk

>>> bash (from bash man page)

Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:

       \a     alert (bell)
       \b     backspace
       \e
       \E     an escape character
       \f     form feed
       \n     new line
       \r     carriage return
       \t     horizontal tab
       \v     vertical tab
       \\     backslash
       \'     single quote
       \"     double quote
       \nnn   the eight-bit character whose value is the octal value nnn (one to three digits)
       \xHH   the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
       \uHHHH the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)
       \UHHHHHHHH
              the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)
       \cx    a control-x character

The expanded result is single-quoted, as if the dollar sign had not been present.

>>> sed (from GNU sed user's manual)

3.9 GNU Extensions for Escapes in Regular Expressions
=====================================================

Until this chapter, we have only encountered escapes of the form `\^', which tell `sed' not to interpret the circumflex as a special character, but rather to take it literally. For example, `\*' matches a single asterisk rather than zero or more backslashes.

This chapter introduces another kind of escape--that is, escapes that are applied to a character or sequence of characters that ordinarily are taken literally, and that `sed' replaces with a special character.
This provides a way of encoding non-printable characters in patterns in a visible manner. There is no restriction on the appearance of non-printing characters in a `sed' script but when a script is being prepared in the shell or by text editing, it is usually easier to use one of the following escape sequences than the binary character it represents:

The list of these escapes is:

`\a'
     Produces or matches a BEL character, that is an "alert" (ASCII 7).

`\f'
     Produces or matches a form feed (ASCII 12).

`\n'
     Produces or matches a newline (ASCII 10).

`\r'
     Produces or matches a carriage return (ASCII 13).

`\t'
     Produces or matches a horizontal tab (ASCII 9).

`\v'
     Produces or matches a so called "vertical tab" (ASCII 11).

`\cX'
     Produces or matches `CONTROL-X', where X is any character. The precise effect of `\cX' is as follows: if X is a lower case letter, it is converted to upper case. Then bit 6 of the character (hex 40) is inverted. Thus `\cz' becomes hex 1A, but `\c{' becomes hex 3B, while `\c;' becomes hex 7B.

`\dXXX'
     Produces or matches a character whose decimal ASCII value is XXX.

`\oXXX'
     Produces or matches a character whose octal ASCII value is XXX.

`\xXX'
     Produces or matches a character whose hexadecimal ASCII value is XX.

>>> echo - the executable one (from echo "info" page)

The program accepts the following options....
'-e'
     Enable interpretation of the following backslash-escaped characters in each STRING:

     '\a'     alert (bell)
     '\b'     backspace
     '\c'     produce no further output
     '\e'     escape
     '\f'     form feed
     '\n'     newline
     '\r'     carriage return
     '\t'     horizontal tab
     '\v'     vertical tab
     '\\'     backslash
     '\0NNN'  the eight-bit value that is the octal number NNN (zero to three octal digits); if NNN is a nine-bit value, the ninth bit is ignored
     '\NNN'   the eight-bit value that is the octal number NNN (one to three octal digits); if NNN is a nine-bit value, the ninth bit is ignored
     '\xHH'   the eight-bit value that is the hexadecimal number HH (one or two hexadecimal digits)

>>> echo - the shell builtin one (from builtin man page)

echo [-neE] [arg ...]
       Output the args, separated by spaces, followed by a newline. The return status is always 0. If -n is specified, the trailing newline is suppressed. If the -e option is given, interpretation of the following backslash-escaped characters is enabled. The -E option disables the interpretation of these escape characters, even on systems where they are interpreted by default. The xpg_echo shell option may be used to dynamically determine whether or not echo expands these escape characters by default. echo does not interpret -- to mean the end of options.
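The option rules in the builtin paragraph above (-n, -e, -E, the xpg_echo default, and no special meaning for --) can be modelled as a small Python sketch. The function name parse_echo_args is hypothetical, and two details the man page does not state are assumptions: which words count as options, and that a later -e or -E overrides an earlier one.

```python
import re

# Hypothetical model of the option rules quoted above; not bash's source code.
# ASSUMPTION: a word counts as an option only if it is "-" followed by one or
# more of n, e, E, and a later letter overrides an earlier one (-e -E => off).
OPTION_WORD = re.compile(r"-[neE]+")


def parse_echo_args(args: list[str], xpg_echo: bool = False) -> tuple[bool, bool, list[str]]:
    """Return (print_trailing_newline, interpret_escapes, words_to_print)."""
    newline = True
    escapes = xpg_echo  # the xpg_echo shell option supplies the default
    i = 0
    while i < len(args) and OPTION_WORD.fullmatch(args[i]):
        for flag in args[i][1:]:
            if flag == "n":
                newline = False
            elif flag == "e":
                escapes = True
            else:  # flag == "E"
                escapes = False
        i += 1
    # "--" is not an end-of-options marker, so it is printed like any other word.
    return newline, escapes, args[i:]


print(parse_echo_args(["-n", "--", "-e", "x"]))  # (False, False, ['--', '-e', 'x'])
```

Because "--" stops option scanning without being consumed, the "-e" after it is printed literally rather than enabling escapes.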
echo interprets the following escape sequences:

       \a     alert (bell)
       \b     backspace
       \c     suppress further output
       \e
       \E     an escape character
       \f     form feed
       \n     new line
       \r     carriage return
       \t     horizontal tab
       \v     vertical tab
       \\     backslash
       \0nnn  the eight-bit character whose value is the octal value nnn (zero to three octal digits)
       \xHH   the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
       \uHHHH the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)
       \UHHHHHHHH
              the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)

>>> tr - the tr info manual

9.1.1 Specifying sets of characters
-----------------------------------

The format of the SET1 and SET2 arguments resembles the format of regular expressions; however, they are not regular expressions, only lists of characters. Most characters simply represent themselves in these strings, but the strings can contain the shorthands listed below, for convenience. Some of them can be used only in SET1 or SET2, as noted below.

Backslash escapes
     The following backslash escape sequences are recognized:

     `\a'     Control-G.
     `\b'     Control-H.
     `\f'     Control-L.
     `\n'     Control-J.
     `\r'     Control-M.
     `\t'     Control-I.
     `\v'     Control-K.
     `\OOO'   The character with the value given by OOO, which is 1 to 3 octal digits.
     `\\'     A backslash.

     While a backslash followed by a character not listed above is interpreted as that character, the backslash also effectively removes any special significance, so it is useful to escape `[', `]', `*', and `-'.

>>> gawk - the gawk man page

Octal and Hexadecimal Constants
       Starting with version 3.1 of gawk, you may use C-style octal and hexadecimal constants in your AWK program source code. For example, the octal value 011 is equal to decimal 9, and the hexadecimal value 0x11 is equal to decimal 17.

String Constants
       String constants in AWK are sequences of characters enclosed between double quotes (").
       Within strings, certain escape sequences are recognized, as in C. These are:

       \\     A literal backslash.
       \a     The "alert" character; usually the ASCII BEL character.
       \b     backspace.
       \f     form-feed.
       \n     newline.
       \r     carriage return.
       \t     horizontal tab.
       \v     vertical tab.
       \xhex digits
              The character represented by the string of hexadecimal digits following the \x. As in ANSI C, all following hexadecimal digits are considered part of the escape sequence. (This feature should tell us something about language design by committee.) E.g., "\x1B" is the ASCII ESC (escape) character.
       \ddd   The character represented by the 1-, 2-, or 3-digit sequence of octal digits. E.g., "\033" is the ASCII ESC (escape) character.
       \c     The literal character c.

       The escape sequences may also be used inside constant regular expressions (e.g., /[ \t\f\n\r\v]/ matches whitespace characters).

       In compatibility mode, the characters represented by octal and hexadecimal escape sequences are treated literally when used in regular expression constants. Thus, /a\52b/ is equivalent to /a\*b/.

(see also sed-and-awk-table.xls)
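To make the "small but significant details" concrete, the Python sketch below computes the byte value for several spellings of ESC (hex 1B, octal 033, decimal 27) under the rules quoted above, including sed's \cX bit flip, and then contrasts the two-digit \x rule of bash and echo with the greedy ANSI C rule described in the gawk man page. The helper names sed_control and hex_escape are hypothetical; they model only the quoted text, not any tool's source code. The quoted gawk text does not say how a value above 255 becomes a character, so the sketch reports only the integer.

```python
# Hypothetical helpers modelling only the rules quoted above; not taken from
# the source code of bash, sed, echo, tr or gawk.
HEX_DIGITS = "0123456789abcdefABCDEF"
ESC = 0x1B  # the ESC byte: hex 1B, octal 033, decimal 27


def sed_control(x: str) -> int:
    """sed's \\cX: upper-case a lower-case letter, then invert bit 6 (hex 40)."""
    if "a" <= x <= "z":
        x = x.upper()
    return ord(x) ^ 0x40


def hex_escape(digits: str, greedy: bool) -> tuple[int, int]:
    """Read the hex digits that follow a \\x escape.

    greedy=False: at most two digits (bash $'...' and both echo variants).
    greedy=True:  every following hex digit, "as in ANSI C" (gawk, as quoted).
    Returns (value, number of digits consumed).
    """
    limit = len(digits) if greedy else 2
    n = 0
    while n < min(limit, len(digits)) and digits[n] in HEX_DIGITS:
        n += 1
    if n == 0:
        raise ValueError("\\x must be followed by at least one hex digit")
    return int(digits[:n], 16), n


# Each spelling of ESC, paired with the byte value its rule yields.
ESC_SPELLINGS = {
    r"bash $'\033'": int("033", 8),
    r"bash $'\x1b'": hex_escape("1b", greedy=False)[0],
    r"sed \d027": int("027", 10),
    r"sed \o033": int("033", 8),
    r"sed \c[": sed_control("["),
    r"echo -e '\0033'": int("033", 8),
    r"tr '\033'": int("033", 8),
    r'gawk "\x1B"': hex_escape("1B", greedy=True)[0],
}

for spelling, value in ESC_SPELLINGS.items():
    assert value == ESC, spelling

# Where the rules diverge: "\x1Bc", where "c" is itself a hex digit.
print(hex_escape("1Bc", greedy=False))  # (27, 2): ESC, then a literal "c"
print(hex_escape("1Bc", greedy=True))   # (444, 3): all three digits consumed
```

The last two lines show the practical consequence: under the greedy rule quoted for gawk, a hex digit written immediately after \x1B is absorbed into the escape, whereas bash and echo stop after two digits and leave it as a literal character.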