Suppose we are looking for a numeric digit. The regular expression to search for is [0-9]. The brackets indicate that the character being compared should match any one of the characters enclosed within them. The dash (-) between 0 and 9 indicates a range from 0 to 9. Therefore, this regular expression matches any single character from 0 through 9, that is, any digit. To search for a special character literally, precede it with a backslash. For example, the single-character regular expression \* matches a single asterisk. The table below briefly describes the special characters.
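The digit class and the backslash escape can be tried out in a short sketch. Python's re module is used here only as an illustration and is an assumption of this example; the sample strings are made up, and other regexp engines may differ in details.

```python
import re

# [0-9] matches any single digit; search() returns the first match found.
digit = re.search(r"[0-9]", "room 42")
print(digit.group())   # 4

# \* matches a literal asterisk instead of acting as a repetition operator.
star = re.search(r"\*", "2 * 3")
print(star.start())    # 2 (index of the asterisk in the string)
```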
Table 4-1. Regexp Control Characters
Character | Description |
---|---|
^ | The caret (^) matches the beginning of the string. The expression ^A will match an A only at the beginning of the string. |
^ | A caret (^) immediately following the left bracket ([) has a different meaning. It excludes the remaining characters within the brackets from matching the target string. The expression [^0-9] indicates that the target character should not be a digit. |
$ | The dollar sign ($) matches the end of the string. The expression abc$ will match the sub-string abc only if it is at the end of the string. |
| | The alternation character (|) allows the expression on either side of it to match the target string. The expression a|b will match a as well as b. |
. | The dot (.) matches any character. |
* | The asterisk (*) indicates that the character to its left in the expression should match 0 or more times. |
+ | The plus (+) is similar to the asterisk, but the character to its left in the expression must match at least once. |
? | The question mark (?) matches the character to its left 0 or 1 times. |
() | Parentheses affect the order of pattern evaluation. |
[ ] | Brackets ([ and ]) enclosing a set of characters indicate that any one of the enclosed characters may match the target character. |
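As a rough check of the control characters in Table 4-1, the sketch below runs one small pattern per entry. It assumes Python's re module; the helper name first_match and the sample strings are hypothetical and chosen only for illustration. Note that in Python the dot does not match a newline unless a flag is set, which may differ from other engines.

```python
import re

def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Each entry: (pattern, text, expected first match, or None for no match).
cases = [
    (r"^A", "ABC", "A"),          # ^ anchors at the beginning of the string
    (r"^A", "BAC", None),         # the A is not at the beginning
    (r"[^0-9]", "12a", "a"),      # negated class: first non-digit
    (r"abc$", "xyzabc", "abc"),   # $ anchors at the end of the string
    (r"a|b", "xb", "b"),          # alternation: either side may match
    (r".", "xyz", "x"),           # dot matches any character
    (r"ab*", "abbbc", "abbb"),    # b repeated 0 or more times
    (r"ab+", "ac", None),         # + requires at least one b
    (r"ab?", "abb", "ab"),        # b repeated 0 or 1 times
    (r"(ab)+", "ababc", "abab"),  # parentheses group ab before + applies
    (r"[xyz]", "say", "y"),       # any one of the enclosed characters
]

for pattern, text, expected in cases:
    assert first_match(pattern, text) == expected
```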
Parentheses, besides affecting the evaluation order of the regular expression, also serve as tagged expressions, which act like a temporary memory. This memory can be used when we want to replace the matched text with a replace expression. In the replace expression, the & character represents the sub-string that was found. So, if the sub-string that matched the regular expression is abcd, then a replace expression of xyz&xyz will change it to xyzabcdxyz. The replace expression can also be written as xyz\0xyz. The \0 is a tagged expression representing the entire sub-string that was matched. Similarly, other tagged expressions are represented by \1, \2, and so on. Note that although tagged expression 0 is always defined, tagged expressions 1, 2, and so on are defined only if the regular expression used in the search has enough sets of parentheses. Here are a few examples:
Table 4-2. Regexp Examples
String | Search | Replace | Result |
---|---|---|---|
Mr. | (Mr)(\.) | \1s\2 | Mrs. |
abc | (a)b(c) | &-\1-\2 | abc-a-c |
bcd | (a|b)c*d | &-\1 | bcd-b |
abcde | (.*)c(.*) | &-\1-\2 | abcde-ab-de |
cde | (ab|cd)e | &-\1 | cde-cd |
foo bar STOP: lkasdfkjakjlf | ([0-9,A-Z,a-z,\ ]*)(STOP:)([0-9,A-Z,a-z,\ ]*) | \1\2 | foo bar STOP: |
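The rows of Table 4-2 can be reproduced with a small sketch. It assumes Python's re module, which does not treat & or \0 in a replacement string as the whole match; the hypothetical helper to_python_template therefore rewrites both to Python's \g<0> before calling re.sub. The helper names are illustrative and not part of the tool described here.

```python
import re

def to_python_template(replace_expr):
    # Rewrite '&' and backslash-zero (whole match) as Python's \g<0>.
    # Tagged expressions \1, \2, ... already work unchanged in re.sub.
    return replace_expr.replace("&", r"\g<0>").replace(r"\0", r"\g<0>")

def substitute(string, search, replace_expr):
    # Replace every match of search in string using the replace expression.
    return re.sub(search, to_python_template(replace_expr), string)

# (String, Search, Replace, Result) rows taken from Table 4-2.
rows = [
    ("Mr.", r"(Mr)(\.)", r"\1s\2", "Mrs."),
    ("abc", r"(a)b(c)", r"&-\1-\2", "abc-a-c"),
    ("bcd", r"(a|b)c*d", r"&-\1", "bcd-b"),
    ("abcde", r"(.*)c(.*)", r"&-\1-\2", "abcde-ab-de"),
    ("cde", r"(ab|cd)e", r"&-\1", "cde-cd"),
    ("foo bar STOP: lkasdfkjakjlf",
     r"([0-9,A-Z,a-z,\ ]*)(STOP:)([0-9,A-Z,a-z,\ ]*)",
     r"\1\2", "foo bar STOP:"),
]

for string, search, replace_expr, result in rows:
    assert substitute(string, search, replace_expr) == result

# The & and \0 forms from the text give the same result.
print(substitute("abcd", r"abcd", r"xyz&xyz"))   # xyzabcdxyz
print(substitute("abcd", r"abcd", r"xyz\0xyz"))  # xyzabcdxyz
```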