Regex Basics: The Syntax
Regular expressions can look complicated as they get longer, but the individual pieces are simple; there are just a lot of them. Below is a table of the more common escape sequences. All of the available escape sequences are documented on php.net.
Syntax | Meaning |
---|---|
\d | any decimal digit: 0-9 |
\D | any character that is not a decimal digit |
\w | any word character: a-z, A-Z, 0-9, and _ |
\W | any non-word character |
\s | any whitespace character, such as a space, tab, newline, or carriage return |
\S | any non-whitespace character |
\b | a word boundary |
\B | a non-word boundary |
One thing you’ll notice is that each escape sequence with a capital letter is the opposite, or complement (in the mathematical sense), of its lowercase version. For example, within the digits 0-9, if you think of 0-3 as a set of digits, then 4-9 is its complement.
The other set of characters you should know is the metacharacters.
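To make the escape sequences concrete, here is a minimal sketch. It assumes Python's built-in `re` module as a stand-in for PHP's PCRE functions (these particular escapes behave the same way on plain ASCII input), and the sample string is made up for illustration.

```python
import re

# Hypothetical sample text, ASCII only
text = "Order 66 shipped_on 2024-05-01"

digits = re.findall(r"\d", text)       # every decimal digit
non_digits = re.findall(r"\D", text)   # the complement: everything else

# \d and \D are complements, so together they cover each character once
covers_everything = len(digits) + len(non_digits) == len(text)  # True

# \w+ grabs runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", text)
# ['Order', '66', 'shipped_on', '2024', '05', '01']

# \s+ matches runs of whitespace, handy for splitting
fields = re.split(r"\s+", text)
# ['Order', '66', 'shipped_on', '2024-05-01']

# \b only matches at the edge of a word, so "ship" inside "shipped_on" is skipped
partial = re.findall(r"\bship\b", text)        # []
whole = re.findall(r"\bshipped_on\b", text)    # ['shipped_on']
```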
Syntax | Meaning |
---|---|
\ | general escape character |
[ | start a character class definition |
] | end a character class definition |
( | start a sub-pattern match |
) | end a sub-pattern match |
{ | start min/max quantifier |
} | end min/max quantifier |
^ | assert start of subject (or start of line, in multi-line mode) |
$ | assert end of subject or before a terminating newline (or end of line, in multi-line mode) |
. | match any character except newline (by default) |
? | 0 or 1 quantifier |
* | 0 or more quantifier |
+ | 1 or more quantifier |
- | indicates a character range inside a character class |
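
The anchors, the dot, and the escape character are easiest to understand by trying them. The following is a rough sketch, again assuming Python's `re` module in place of PHP's PCRE functions; the subjects are invented examples, and `re.MULTILINE` plays the role of multi-line mode.

```python
import re

# ^ and $ pin the pattern to the start and end of the subject
anchored = re.search(r"^cat$", "cat") is not None        # True
unanchored = re.search(r"^cat$", "concat") is not None   # False

# In multi-line mode, ^ and $ also match at the start and end of each line
default_mode = re.findall(r"^\w+$", "one\ntwo")                  # []
multi_line = re.findall(r"^\w+$", "one\ntwo", re.MULTILINE)      # ['one', 'two']

# . matches any character except a newline (by default)
dot_matches = re.findall(r"c.t", "cat cot c\nt cut")     # ['cat', 'cot', 'cut']

# A backslash turns a metacharacter into a literal character
literal_dot = re.findall(r"3\.14", "3.14 3x14")          # ['3.14']
```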
The [] allows you to define a custom set of characters to be matched. Individual characters can be listed, or you may define a range. For example, [3456] and [3-6] both represent the characters 3, 4, 5, and 6, any one of which can be matched against a value.
The () is used to define a sub-pattern. When part of a value matches the expression within the (), that part is captured, or stored into a variable.
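As a quick check that a list and a range describe the same set, here is a small sketch. It assumes Python's `re` module rather than PHP, and the subject strings are made up.

```python
import re

subject = "1234567"

explicit = re.findall(r"[3456]", subject)   # each character listed individually
ranged = re.findall(r"[3-6]", subject)      # the same set written as a range

same_result = explicit == ranged            # True: both give ['3', '4', '5', '6']

# Ranges and single characters can be mixed in one class
hex_runs = re.findall(r"[0-9a-f]+", "ff00 zz 7e")   # ['ff00', '7e']
```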
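A capture might look like the sketch below. It assumes Python's `re` module; the date pattern and the subject are hypothetical. In PHP, `preg_match()` fills its third argument with the same pieces, with index 0 holding the whole match and index 1 onward holding each sub-pattern.

```python
import re

# Three sub-patterns: year, month, and day
pattern = r"(\d{4})-(\d{2})-(\d{2})"
match = re.search(pattern, "Released on 2024-05-01.")

whole = match.group(0)              # '2024-05-01' (the entire match)
year, month, day = match.groups()   # ('2024', '05', '01'), one entry per ()
```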
The {} sets the number of times the preceding item should be repeated. One or two numbers can be specified within the {}. When written as {4}, the item must match exactly 4 times. When written as {4,7}, the item can match 4, 5, 6, or 7 times. When the second of the two numbers is not specified, as in {4,}, there is no upper bound, and it will match as many times as available. To match 0, 1, 2, or 3 times, write {0,3}. Some engines also accept {,3} and assume the missing first number is zero, but the PHP manual notes that older PCRE versions treat {,3} as a literal string rather than a quantifier, so spelling out the zero is safer.
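The repetition counts can be verified by testing strings of increasing length. This is only a sketch, assuming Python's `re` module; `matches_fully` is a hypothetical helper, not part of any library.

```python
import re

def matches_fully(pattern: str, subject: str) -> bool:
    """Return True when the whole subject matches the pattern."""
    return re.fullmatch(pattern, subject) is not None

def accepted_lengths(pattern: str) -> list[int]:
    """Lengths n in 0..9 for which 'a' repeated n times matches fully."""
    return [n for n in range(10) if matches_fully(pattern, "a" * n)]

exactly_four = accepted_lengths(r"a{4}")     # [4]
four_to_seven = accepted_lengths(r"a{4,7}")  # [4, 5, 6, 7]
up_to_three = accepted_lengths(r"a{0,3}")    # [0, 1, 2, 3]
at_least_two = accepted_lengths(r"a{2,}")    # [2, 3, 4, 5, 6, 7, 8, 9]
```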
The ? makes something optional. It can match 0 or 1 times. It’s equivalent to {0,1}.
The * means something can match 0 or more times. It’s equivalent to {0,}.
The + means something can match 1 or more times. It’s equivalent to {1,}.
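The equivalences between the shorthand quantifiers and their {} forms can be checked side by side. A minimal sketch, assuming Python's `re` module; the `accepted` helper and the sample strings are made up for illustration.

```python
import re

samples = ["", "a", "aa", "aaa"]

# Each shorthand quantifier paired with its {} spelling
equivalents = {r"a?": r"a{0,1}", r"a*": r"a{0,}", r"a+": r"a{1,}"}

def accepted(pattern: str) -> list[str]:
    """Samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

# For every pair, both spellings accept exactly the same samples
results = {short: (accepted(short), accepted(braced))
           for short, braced in equivalents.items()}

# A typical use of ?: an optional letter
spellings = re.findall(r"colou?r", "color colour colr")   # ['color', 'colour']
```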