The Perl You Need to Know: Basic Regular Expression Syntax

| Symbol | Meaning | Description |
| --- | --- | --- |
| . | any single character | The dot (.) is a placeholder for any one character (except a newline, by default). Examples: "do." would match "dog", "dot", "doe", etc. "d..r" would match "door" and "deer". |
| * | zero or more of the previous character | The asterisk (*) specifies that zero or more instances of the previous character should appear in sequence. Examples: "do.*" would match "dog", "done", "doppelganger", etc. (why? "d-o- followed by zero or more of any character") "to*" would match "t", "to", and "too" (why? "t- followed by zero or more o's") "fre*.." would match "frat", "free", "from" (why? "f-r- followed by zero or more e's followed by any two characters") |
| + | one or more of the previous character | The plus sign (+) demands at least one of the previous character in sequence; it works like (*) but does not allow zero. Examples: "fre+.." would match "freak", "freeze", "fresh" (why? "f-r- followed by one or more e's followed by any two characters"), but not "frat" or "from", which have no e after "fr". |
| ? | zero or one of the previous character | The question mark (?) says that the previous character may appear once or not at all, but never more than once. Unlike (*) and (+), it caps the count at one. Examples: "ton?e" would match "toe" and "tone", but not "tonne" (why? "t-o- followed by zero or one n followed by e") |
| ( ) | grouping | The parentheses ( ) group part of a pattern into a single unit, for instance to combine alternatives separated by the vertical bar (\|), which means "or". Example: "(dog\|cat)" would match "dog" and "cat" (why? "dog or cat"). A quantifier after a group applies to the whole group, so "(ha)+" matches "ha", "haha", "hahaha". In Perl, the text matched by each group is also captured into $1, $2, and so on. |
| [] | any character from the set | The square brackets ([]) are a placeholder for a single character that can be any one of the characters listed inside them. Examples: "ta[pb]" would match "tap" and "tab" (why? "t-a- followed by one character from the set of p and b") "r[aeiou]t" would match "rat", "ret", "rit", "rot", "rut" (why? "r- followed by one character from the set of vowels followed by t") "r[aeiou]+t" would match "rat" (plus all of the above), "riot", "root", etc. (why? "r- followed by one or more vowels followed by t"). A hyphen between two characters denotes a range, so "[a-z]" is any lowercase letter and "[0-9]" is any digit. |
| [^] | any character not from the set | Placing a caret (^) immediately after the opening square bracket negates the set: the pattern then matches any single character that is not in the set. This is a handy way to specify a large set of characters. For instance, "[^aeiou]" means "not a lowercase vowel", which covers the consonants but also digits, spaces, punctuation, and uppercase letters. Example: "t[^aeiou]+.*s" matches "thanks", "this", "trappings", etc. (why? "t- followed by one or more characters that are not vowels, followed by zero or more of any character, followed by an s") |
| {min,max} | range of occurrences | The curly braces ({}) require that the preceding character or set occur a certain number of times. Examples: "[a-z]{3}" requires exactly three consecutive lowercase letters (not necessarily the same letter). "[0-9]{3,}" requires three or more consecutive digits. "[A-Z]{2,5}" requires between two and five consecutive uppercase letters. |
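The examples in the table above can be checked directly. The following is a minimal Perl sketch, assuming any Perl 5; the `matches` helper and the extra sample words (such as "tonne" and "toes") are hypothetical additions for illustration. Patterns are left unanchored, as in the table, so a pattern only has to occur somewhere inside the word.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $pattern matches anywhere in $word.
# No ^ or $ anchors are added, so, as in the table, the match may occur
# anywhere inside the word.
sub matches {
    my ($pattern, $word) = @_;
    return $word =~ /$pattern/ ? 1 : 0;
}

# Patterns from the table, each paired with a few sample words to try.
my @cases = (
    [ 'do.',           [qw(dog dot doe do)] ],
    [ 'd..r',          [qw(door deer dr)] ],
    [ 'to*',           [qw(t to too)] ],
    [ 'fre*..',        [qw(frat free from fr)] ],
    [ 'fre+..',        [qw(freak freeze fresh frat)] ],
    [ 'ton?e',         [qw(toe tone tonne)] ],
    [ '(dog|cat)',     [qw(dog cat cow)] ],
    [ 'r[aeiou]+t',    [qw(rat riot root rt)] ],
    [ 't[^aeiou]+.*s', [qw(thanks this trappings toes)] ],
    [ '[0-9]{3,}',     [qw(12 123 12345)] ],
);

# Print one line per pattern/word pair: the pattern, the word, the result.
for my $case (@cases) {
    my ($pattern, $words) = @$case;
    for my $word (@$words) {
        printf "%-14s %-10s %s\n", $pattern, $word,
            matches($pattern, $word) ? 'match' : 'no match';
    }
}
```

Running it shows, for example, that "tonne" fails "ton?e" because ? allows at most one n, and that "frat" fails "fre+.." because + requires at least one e.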
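| Character Classes | Matches |
| --- | --- |
| \d | Any digit [0-9] |
| \D | Any non-digit [^0-9] |
| \w | Any word character: letter, digit, or underscore [a-zA-Z0-9_] |
| \W | Any non-word character [^a-zA-Z0-9_] |
| \s | Any whitespace character [ \t\n\r\f] |
| \S | Any non-whitespace character [^ \t\n\r\f] |

| Anchor Sequences | Matches |
| --- | --- |
| ^ | Beginning of the string |
| $ | End of the string (or just before a newline at the very end) |
| \b | A word boundary (a position between a word character and a non-word character, or between a word character and either end of the string) |
| \B | Any position that is not a word boundary |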
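A short Perl sketch exercising the character classes and anchors listed above. The sample strings are made up for illustration, and the `\z` anchor used in the last comparison is standard Perl but does not appear in the table; it is included only to contrast with `$`.

```perl
use strict;
use warnings;

my $text = "Order 66 shipped to cat_lover on 2024-05-01";

# \d+ : each run of digits. With /g in list context, every match is returned.
my @numbers = $text =~ /(\d+)/g;        # 66, 2024, 05, 01

# \w+ : each run of word characters; the underscore keeps cat_lover whole.
my @words = $text =~ /(\w+)/g;

# \s+ : split the string on any run of whitespace.
my @fields = split /\s+/, $text;        # last field is 2024-05-01

# \b and \B : "cat" as a whole word versus "cat" inside another word.
my $whole_cat  = ("the cat sat" =~ /\bcat\b/) ? 1 : 0;   # 1
my $inner_cat  = ("concatenate" =~ /\bcat\b/) ? 1 : 0;   # 0
my $inside_cat = ("concatenate" =~ /\Bcat\B/) ? 1 : 0;   # 1

# ^ and $ : anchor both ends, so the whole string must be digits.
my $all_digits = ("12345"  =~ /^\d+$/) ? 1 : 0;   # 1
my $mixed      = ("123abc" =~ /^\d+$/) ? 1 : 0;   # 0

# $ also matches just before a trailing newline; \z is the absolute end.
my $dollar_nl = ("123\n" =~ /^\d+$/)  ? 1 : 0;    # 1
my $z_nl      = ("123\n" =~ /^\d+\z/) ? 1 : 0;    # 0

print "numbers: @numbers\n";
print "words:   @words\n";
print "fields:  ", scalar(@fields), "\n";
```

The `$` versus `\z` pair is worth remembering when validating input read line by line: `/^\d+$/` still accepts a value that carries its trailing newline.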
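| Escape Sequences | Meaning |
| --- | --- |
| \n | Newline character, aka linefeed. This is the usual end-of-line character on Unix-like systems. |
| \r | Carriage return character. Windows text files end each line with \r\n. |
| \t | Tab character. |
| \e | Escape character (ASCII 27, hex 1B). |
| \xFF | The character with hexadecimal code FF; replace FF with any two hex digits (for example, \x41 is "A"). |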
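The escape sequences above can be combined in a small Perl sketch. The tab-separated sample data and the terminal color code are made-up inputs for illustration, and the color-stripping pattern only handles simple codes of the form ESC, "[", digits or semicolons, then "m"; it is not a full ANSI parser.

```perl
use strict;
use warnings;

# Made-up sample: tab-separated fields with Windows-style \r\n line endings.
my $data = "name\tage\r\nalice\t30\r\nbob\t25\r\n";

# \r?\n : split on a newline, with or without a carriage return before it.
my @lines = split /\r?\n/, $data;

# \t : split each line into its tab-separated fields.
my @rows = map { [ split /\t/ ] } @lines;

# \xHH : \x41 is the character with hex code 41, i.e. "A".
my $is_a = ("A" =~ /^\x41$/) ? 1 : 0;

# \e : the escape character (hex 1B), which starts terminal color codes
# such as "\e[31m". Remove codes made of ESC, "[", digits or ";", then "m".
my $colored = "\e[31mred\e[0m";
(my $plain = $colored) =~ s/\e\[[0-9;]*m//g;

for my $row (@rows) {
    print join(" | ", @$row), "\n";
}
print "plain text: $plain\n";
```

Writing `\r?\n` rather than plain `\n` lets the same split work on files with either Unix or Windows line endings.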