RegEx Basics: Numbers
Regular expressions, or RegEx for short, are a powerful tool to have in your skill set. For all their power, they have a single purpose: to describe what data looks like. They have a reputation for being difficult, and rightfully so. The syntax used to describe data can be quite complicated. There are a few different flavors of regular expressions, but the best and most widely used come from Perl. In fact, most major programming languages support a Perl-style syntax. In PHP, we call them PCRE (Perl Compatible Regular Expressions).
In this post, we’ll break down a regular expression that validates whether a value is a whole number between 0 and 255. Why this number range? It has a couple of uses. First, it could be used in a larger regex to validate an IPv4 address. Second, it could also be used in a regex that matches CSS colors of the form rgb(207, 254, 224). It also happens to be a great expression for people learning about regular expressions: conceptually simple, but complex enough to teach a couple of regex concepts.
The Numbers 0-255 RegEx
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])
This regular expression checks if a string matches any whole number between 0 and 255, inclusive. Let’s break down this regular expression into pieces.
The Parentheses
In a regular expression, parentheses are used to group things together. They also capture the text they match, which is automatically stored in a numbered variable: $1 for the first group, $2 for the second, and so on. The zero variable, $0, stores the entire value that matched.
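To make capturing concrete, here is a minimal sketch in PHP. With preg_match, the captured values land in the array passed as the third argument rather than in $0 and $1 variables: index 0 holds the entire match and index 1 holds the first group. The pattern and the sample string are made up for this illustration.

```php
<?php
// Hypothetical sample: capture the number that follows "width: ".
$pattern = '/width: ([0-9]+)px/';
$subject = 'width: 120px';

$matches = [];
preg_match($pattern, $subject, $matches);

// Index 0 is the entire matched text; index 1 is the first group.
$whole = $matches[0];
$first = $matches[1];
```

Here $whole contains the full matched text and $first contains only the digits inside the parentheses.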
The Pipes
The pipe character ‘|’, sometimes called the vertical bar, means ‘or’ in regular expression syntax. It lets you match one of several alternatives: this or that. In our regex, we’re checking if a value falls into one of four ranges of numbers.
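A small sketch of alternation on its own may help before we apply it to numbers. The words and the function name below are invented for this example.

```php
<?php
// Hypothetical example: does the text mention "cat" or "dog"?
function mentionsCatOrDog(string $text): bool
{
    // preg_match returns 1 when either alternative is found, 0 otherwise.
    return preg_match('/cat|dog/', $text) === 1;
}
```

Calling mentionsCatOrDog('I have a dog') succeeds because the second alternative matches, while a string containing neither word fails.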
The Ranges of Numbers
Remember, the objective of the regex above is to validate whether a number is between 0 and 255. In a regular expression, the \d shorthand is used to represent a single digit, 0-9. Sometimes we want to pick which characters we want to look for. For this, we can specify any characters we want between square brackets, e.g. [0-9], which you’ll notice is equivalent to the \d we mentioned earlier. Let’s look at our number ranges and their associated expressions:
- 250-255 : 25[0-5]
- 200-249 : 2[0-4][0-9]
- 100-199 : 1[0-9][0-9]
- 0-99 : [1-9]?[0-9]
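One way to convince yourself that each expression covers its range is to try them one at a time. The sketch below assumes each sub-expression is anchored with ^ and $ so that it has to match the whole string; the anchors, the $ranges array, and the findRange helper are additions for this illustration and are not part of the original expression.

```php
<?php
// Each sub-expression from the list above, anchored to the whole string.
$ranges = [
    '250-255' => '/^25[0-5]$/',
    '200-249' => '/^2[0-4][0-9]$/',
    '100-199' => '/^1[0-9][0-9]$/',
    '0-99'    => '/^[1-9]?[0-9]$/',
];

// Return the label of the first range whose pattern matches, or null.
function findRange(string $value, array $ranges): ?string
{
    foreach ($ranges as $label => $pattern) {
        if (preg_match($pattern, $value) === 1) {
            return $label;
        }
    }
    return null;
}
```

For example, findRange('249', $ranges) lands in the 200-249 range, and findRange('256', $ranges) returns null because no range accepts it.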
While I could’ve used \d instead of the [0-9] ranges in the above, I felt that this way was more readable in this particular expression. Based on what we previously discussed, seeing the number ranges next to their associated expressions seems pretty self-explanatory. Well, maybe not the 0-99 expression. That needs some explanation. The numbers 0-99 make up 10 blocks of 10 numbers, but only 9 of those blocks have 2 digits. The numbers in the first block, 0-9, only have 1 digit. The ? in the expression [1-9]?[0-9] means the digit in the tens place is optional. Without that question mark, only the numbers 10-99 would be valid. The numbers 0-9 would never be matched! So we need it there.
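The effect of the question mark is easy to see side by side. In this sketch, both patterns are anchored with ^ and $ (an addition for the comparison) so that they must match the whole string; the variable names are made up.

```php
<?php
$withOptional    = '/^[1-9]?[0-9]$/'; // tens digit optional
$withoutOptional = '/^[1-9][0-9]$/';  // tens digit required

// preg_match returns 1 for a match and 0 for no match.
$singleWith    = preg_match($withOptional, '7');
$singleWithout = preg_match($withoutOptional, '7');
$doubleWithout = preg_match($withoutOptional, '42');
$leadingZero   = preg_match($withOptional, '07');
```

The single digit '7' only matches the pattern with the question mark. Note that '07' matches neither, because the tens place, when present, has to be 1-9.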
In Summary
Here’s what’s happening in the regular expression. Within the parentheses, the value is compared against four number ranges in order: first 250-255, then 200-249, then 100-199, and finally 0-99. If the value matches any of the ranges, the matched text is captured in the first group, the equivalent of the $1 variable.
Here’s the corresponding PHP code:
function isBetween0and255($subject)
{
    $pattern = '/(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])/';
    $matches = [];
    preg_match($pattern, $subject, $matches);

    // The pattern is not anchored, so it can match part of a longer string.
    // Accept only when the captured group equals the whole (string) subject.
    return isset($matches[1]) && $matches[1] === $subject;
}
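The function above guards against partial matches by comparing the captured group with the subject. An alternative sketch, not the approach used in this post, is to anchor the pattern so that it can only match the entire string. The function name isBetween0and255Anchored is made up for this example, and it assumes the value arrives as a string.

```php
<?php
// Alternative sketch: anchor the pattern instead of comparing $matches[1].
// ^ and $ tie the match to the start and end of the subject. The D modifier
// makes $ match only at the very end, so a trailing newline is rejected.
function isBetween0and255Anchored(string $subject): bool
{
    $pattern = '/^(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$/D';
    return preg_match($pattern, $subject) === 1;
}
```

With the anchors in place, '255' is accepted while '256', '1234', and '01' are rejected.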
I hope this helps you. Stay tuned for future articles on regular expressions.