Regex Basics: The Syntax

Regular expressions can get complicated as they grow longer. However, the individual pieces are simple; there are just a lot of them. Below is a table of the more common escape sequences. All the available escape sequences are documented on php.net.

Syntax Meaning
 \d  any decimal digit: 0-9
 \D  any character that is not a decimal digit
 \w  any word character: a-z, A-Z, 0-9, and _
 \W  any non-word character
 \s  any whitespace character:

  • a tab “\t”
  • a vertical tab “\v”
  • a space
  • a linefeed “\n”
  • a carriage return “\r”
  • a form feed “\f”
 \S  any non-whitespace character
 \b  a word boundary: a position where a word character meets a non-word character or the start/end of the subject
 \B  a non-word boundary: any position that is not a word boundary
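To see a few of these escape sequences at work, here is a small sketch. It uses Python's `re` module as a stand-in for PHP's `preg_*` functions; the escapes in the table behave the same way there. One assumption: Python's `\w` and `\d` are Unicode-aware by default, so the `re.ASCII` flag restricts them to the a-z, A-Z, 0-9 and _ ranges listed above. The sample text is made up for illustration.

```python
import re

# Hypothetical sample text mixing digits, words and several kinds of whitespace.
text = "Order 66 shipped\tto Bay_12\non 2024"

digits = re.findall(r"\d", text, flags=re.ASCII)  # every single decimal digit
words = re.findall(r"\w+", text, flags=re.ASCII)  # runs of a-z, A-Z, 0-9 and _
spaces = re.findall(r"\s", text)                  # space, tab, newline, ...

# \b is a position, not a character: "to" only matches as a whole word here.
whole_words = re.findall(r"\bto\b", "to toast into to")

print(digits)       # ['6', '6', '1', '2', '2', '0', '2', '4']
print(words)        # ['Order', '66', 'shipped', 'to', 'Bay_12', 'on', '2024']
print(spaces)       # [' ', ' ', '\t', ' ', '\n', ' ']
print(whole_words)  # ['to', 'to']
```

Note that "toast" and "into" are skipped. In both words, one side of "to" is another word character, so there is no word boundary there.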

One thing you’ll notice is that each escape sequence with a capital letter is the opposite, or complement (in the mathematical sense), of its lowercase version: \D matches exactly the characters that \d does not. For example, if you think of the digits 0-3 as a set, then 4-9 would be its complement within the set of all digits.
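The complement rule can be checked mechanically. The sketch below again uses Python's `re` module as a stand-in for PHP's PCRE functions. It tests that every character in a made-up sample matches exactly one of each lowercase/uppercase pair. The helper `is_complement` is a hypothetical name used only for this illustration.

```python
import re

# Hypothetical sample covering letters, digits, underscore, punctuation and whitespace.
sample = "a1 _Z9\t-"

def is_complement(lower: str, upper: str, text: str) -> bool:
    """True if every character in text matches exactly one of the two classes."""
    for ch in text:
        in_lower = re.fullmatch(lower, ch) is not None
        in_upper = re.fullmatch(upper, ch) is not None
        if in_lower == in_upper:  # both or neither would break the complement rule
            return False
    return True

for lower, upper in [(r"\d", r"\D"), (r"\w", r"\W"), (r"\s", r"\S")]:
    print(lower, upper, is_complement(lower, upper, sample))  # True for each pair

# Together, \d and \D account for every character exactly once.
digits = re.findall(r"\d", sample)      # ['1', '9']
non_digits = re.findall(r"\D", sample)  # ['a', ' ', '_', 'Z', '\t', '-']
```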

The other set of characters you should know is the metacharacters, which have a special meaning inside a pattern.

Syntax Meaning
 \  general escape character
 [  start a character class definition
 ]  end a character class definition
 (  start a sub-pattern
 )  end a sub-pattern
 {  start a min/max quantifier
 }  end a min/max quantifier
 ^
  • outside of []: assert start of subject (or line, in multi-line mode)
  • inside of [], as the first character: negate the character class
 $  assert end of subject or before a terminating newline (or end of line, in multi-line mode)
 .  match any character except newline (by default)
 ?
  • extends the meaning of ( (for example, (?: starts a non-capturing sub-pattern)
  • 0 or 1 quantifier
  • placed after a quantifier, makes it lazy instead of greedy
 *  0 or more quantifier
 +  1 or more quantifier
 -  indicates a character range inside []
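A few of these metacharacters are easiest to understand side by side. This sketch shows ^ and $ with and without multi-line mode, and an unescaped versus an escaped ".". It uses Python's `re` module as a stand-in: there, `re.MULTILINE` plays the role of PHP's `m` pattern modifier. The log lines are made-up sample data.

```python
import re

# Hypothetical three-line log.
log = "error: disk full\nwarning: low memory\nerror: timeout"

# Without multi-line mode, ^ only asserts the very start of the subject.
first_only = re.findall(r"^error", log)  # ['error']

# With multi-line mode, ^ and $ also match at the start and end of each line.
messages = re.findall(r"^error: (.*)$", log, flags=re.MULTILINE)
# ['disk full', 'timeout']

# "." matches any character except newline; escape it with \ to match a literal dot.
any_char = re.findall(r"1.5", "1.5 1x5 1-5")      # ['1.5', '1x5', '1-5']
literal_dot = re.findall(r"1\.5", "1.5 1x5 1-5")  # ['1.5']
```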

The [] let you define a custom set of characters to match. You can list individual characters or define a range. For example, [3456] and [3-6] both match a single character that is 3, 4, 5, or 6.
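Here is a quick check that the listed and ranged forms are equivalent, plus the negated form using ^ as the first character inside []. This is a sketch in Python's `re` module rather than PHP, but character classes work the same way in PCRE.

```python
import re

subject = "0123456789"

listed = re.findall(r"[3456]", subject)   # individual characters
ranged = re.findall(r"[3-6]", subject)    # the same set written as a range
negated = re.findall(r"[^3-6]", subject)  # ^ first inside [] negates the class

print(listed)   # ['3', '4', '5', '6']
print(ranged)   # ['3', '4', '5', '6']
print(negated)  # ['0', '1', '2', '7', '8', '9']
```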

The () defines a sub-pattern. When the subject matches, the part of it matched by the expression inside the () is captured so you can use it afterwards; with preg_match(), for example, the captured text is stored in the $matches array.
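Here is a minimal capturing example. In PHP, preg_match() would put the whole match in $matches[0] and each sub-pattern in $matches[1], $matches[2], and so on. The sketch below uses Python's `re.search()` and `match.groups()` as stand-ins, with a made-up subject string.

```python
import re

# Hypothetical subject; each () captures one part of the date.
subject = "Posted on 2013-07-21 by admin"
match = re.search(r"(\d{4})-(\d{2})-(\d{2})", subject)
assert match is not None

whole = match.group(0)             # the entire match: '2013-07-21'
year, month, day = match.groups()  # one string per (): '2013', '07', '21'
```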

The {} sets how many times the preceding item should repeat. One or two numbers can be given inside the {}. Written as {4}, the item must match exactly 4 times. Written as {4,7}, it can match 4, 5, 6, or 7 times. When the second number is left out, as in {4,}, there is no upper bound: it will match as many times as it can, with a minimum of 4. Some engines also accept {,3} to mean 0, 1, 2, or 3 times, with the missing first number assumed to be zero. Older PCRE versions, which PHP used for many years, do not treat {,3} as a quantifier and match it as literal text instead, so {0,3} is the safe way to write it.
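The quantifier forms above can be compared with whole-string matches. This sketch uses Python's `re.fullmatch()` in place of an anchored PHP pattern. The helper `full` is a hypothetical name for this illustration. Python happens to accept {,3}, but the sketch writes {0,3} for portability.

```python
import re

def full(pattern: str, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(full(r"\d{4}", "2024"), full(r"\d{4}", "202"))            # exactly 4: True False
print(full(r"\d{4,7}", "12345"), full(r"\d{4,7}", "12345678"))  # 4 to 7: True False
print(full(r"\d{0,3}", ""), full(r"\d{0,3}", "1234"))           # 0 to 3: True False
print(full(r"\d{4,}", "1234567890"), full(r"\d{4,}", "123"))    # 4 or more: True False
```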

The ? makes the preceding item optional: it can match 0 or 1 times. It’s equivalent to {0,1}.

The * means the preceding item can match 0 or more times. It’s equivalent to {0,}.

The + means the preceding item can match 1 or more times. It’s equivalent to {1,}.
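The shorthand quantifiers can be checked against their {} equivalents. The same sketch also shows the lazy form mentioned in the metacharacter table, where ? follows another quantifier. Python's `re` module again stands in for PHP. The words and HTML snippet are made-up samples, and `full` is a hypothetical helper.

```python
import re

def full(pattern: str, text: str) -> bool:
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

words = ["color", "colour", "colouur"]

# ? means 0 or 1, the same as {0,1}
question = [w for w in words if full(r"colou?r", w)]      # ['color', 'colour']
braces01 = [w for w in words if full(r"colou{0,1}r", w)]  # same result

# * means 0 or more ({0,}); + means 1 or more ({1,})
star = [w for w in words if full(r"colou*r", w)]  # all three
plus = [w for w in words if full(r"colou+r", w)]  # ['colour', 'colouur']

# A ? after a quantifier makes it lazy: it matches as little as possible.
html = "<b>bold</b>"
greedy = re.search(r"<.+>", html).group()  # '<b>bold</b>'
lazy = re.search(r"<.+?>", html).group()   # '<b>'
```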
