Introduction to Regular Expressions using JavaScript - Part 3
Published on 10th of September 2008. Copyright Tavs Dokkedahl.
Counting and repeating characters
This part of the tutorial will teach you to count characters and use repeating patterns.
In part 1 and part 2 we learned about matching single characters, groups of characters and a little about character position in the string (start and end).
With repetition and counting you should be able to see just how powerful regexes can be.
Repetition
There are several ways to specify how many of a kind you would like to match. An asterisk (*) specifies zero or more, a question mark (?) specifies zero or one, and a plus sign (+) specifies one or more.
    // Match the word 'color' or 'colour'
    var rgx = /colou?r/;
    // Will match
    //   'color'
    //   'colour'
This way we can match both the British and the American spelling of the word. The question mark specifies that the 'u' is optional: there can be zero or one 'u'. The *, ? and + always apply to the character (or character class) immediately before them.
Now if we want to match a string like '15 colours' but allow any number in place of 15 we can do
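A quick way to get a feel for the three quantifiers is to run a few patterns through `test()`. The following is only a small sketch; the variable names and the 'gogle' sample patterns are made up for illustration and are not part of the tutorial.

```javascript
// Hypothetical sample patterns, one per quantifier.
const optionalU = /colou?r/; // zero or one 'u'
const anyO = /go*gle/;       // zero or more 'o'
const someO = /go+gle/;      // one or more 'o'

console.log(optionalU.test('color'));   // true
console.log(optionalU.test('colour'));  // true
console.log(optionalU.test('colouur')); // false, two 'u' is one too many

console.log(anyO.test('ggle'));     // true, zero 'o' is allowed
console.log(someO.test('ggle'));    // false, at least one 'o' is required
console.log(someO.test('gooogle')); // true
```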
    // Match at least 1 digit followed by a space followed
    // by the word 'color' or 'colour'
    var rgx = /\d+\scolou?r/;
    // Will match
    //   '13 color'
    //   '1235234 colour'
    // but not
    //   '34   color' (too many spaces)
    //   '99colour' (space is missing)
This reads as: the character class \d (any digit 0 to 9) one or more times, followed by a single whitespace character (\s), followed by either spelling of color.
If you are matching the above but don't care how many spaces there are between the number and the word color you can do this
    // Match at least 1 digit followed by any number of spaces
    // (or no space) followed by the word 'color' or 'colour'
    var rgx = /\d+\s*colou?r/;
    // Will match
    //   '13 color'
    //   '1235234 colour'
    //   '34   color'
    //   '99colour'
The asterisk is often useful for leading and trailing spaces. If you want to make sure a form field only contains digits but allow the user to have spaces before and after the digits you can do
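    // Match any number of spaces followed by at least one digit
    // followed by any number of spaces. The ^ and $ anchors tie the
    // pattern to the start and end of the string so nothing else is allowed.
    var rgx = /^\s*\d+\s*$/;
    // Will match
    //   '13'
    //   '  12'
    //   '34  '
    //   '  99  '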
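A pattern like this only guarantees "digits and nothing else" when it is tied to the start and end of the string with the ^ and $ anchors from the earlier parts. The sketch below shows the difference; the helper name `isPaddedNumber` is hypothetical.

```javascript
// Hypothetical helper: true when the field holds only digits,
// optionally surrounded by whitespace.
function isPaddedNumber(value) {
  return /^\s*\d+\s*$/.test(value);
}

console.log(isPaddedNumber(' 99 '));   // true
console.log(isPaddedNumber('13'));     // true
console.log(isPaddedNumber('abc 12')); // false, letters are not allowed
console.log(isPaddedNumber(''));       // false, at least one digit is required

// Without the anchors the pattern only has to match somewhere in the string.
console.log(/\s*\d+\s*/.test('abc 12')); // true
```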
The *, ? and + are really just shorthand for specific cases of the general repetition syntax. Counting the number of repetitions is done using { and } (curly brackets). You can use these in three ways
| Syntax | Number of matches |
|---|---|
| {x,y} | Match at least x and at most y times |
| {x,} | Match at least x times |
| {x} | Match exactly x times |
From the table we get that
| Shorthand | Equivalent |
|---|---|
| ? | Is the same as {0,1} |
| * | Is the same as {0,} |
| + | Is the same as {1,} |
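The equivalences in the table can be checked by running each shorthand and its curly-bracket form against the same strings. This is just a sketch: the 'abc' patterns, the sample strings and the helper `agreeOnSamples` are assumptions made up for the illustration.

```javascript
// Each pair holds a shorthand pattern and its curly-bracket form.
const pairs = [
  [/^ab?c$/, /^ab{0,1}c$/],
  [/^ab*c$/, /^ab{0,}c$/],
  [/^ab+c$/, /^ab{1,}c$/]
];
const samples = ['ac', 'abc', 'abbc', 'abbbc'];

// Returns true when both patterns agree on every sample string.
function agreeOnSamples(shorthand, general) {
  return samples.every(function (s) {
    return shorthand.test(s) === general.test(s);
  });
}

// Prints both pattern sources followed by true for each pair.
pairs.forEach(function (pair) {
  console.log(pair[0].source, pair[1].source, agreeOnSamples(pair[0], pair[1]));
});
```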
To match exactly 8 digits we simply write
    // Match exactly 8 digits (^ and $ make sure there is nothing else in the string)
    var rgx = /^\d{8}$/;
    // Will match
    //   '12345678'
    //   '45678912'
    // but not
    //   '34'
    //   '99123637687687'
To match at most 32 characters in the range a-z
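    // Match at most 32 characters in the range a-z
    // (^ and $ make sure there is nothing else in the string)
    var rgx = /^[a-z]{0,32}$/;
    // Will match
    //   'smalltown'
    //   ''
    // but not
    //   'The small town' (the uppercase letter and the spaces are outside a-z)
    //   'asentencewithtoomanycharactersinit' (more than 32 characters)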
Matching a string that starts with between 7 and 9 digits, followed by 1 or more spaces and then by any character exactly 4 times, is done with
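    // ^ and $ anchor the pattern to the whole string
    var rgx = /^\d{7,9}\s+.{4}$/;
    // Will match
    //   '12345678 Beta'
    //   '4567891 City'
    // but not
    //   '34 snakes' (too few digits)
    //   '12345678 Alpha' (5 characters after the space)
    //   '4567891 crimson' (7 characters after the space)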
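To see which part of a string a counted pattern actually consumes, `match()` can be used to return the matched text. The sketch below compares the pattern with and without the ^ and $ anchors; the variable names are only for illustration.

```javascript
const unanchored = /\d{7,9}\s+.{4}/;
const anchored = /^\d{7,9}\s+.{4}$/;

// match() returns an array whose first element is the matched text, or null.
const found = '4567891 crimson'.match(unanchored);
console.log(found[0]); // 4567891 crim

// With anchors the whole string has to fit the pattern.
console.log(anchored.test('4567891 crimson')); // false
console.log(anchored.test('4567891 City'));    // true
console.log(anchored.test('34 snakes'));       // false
```

Without anchors the regex is satisfied as soon as some part of the string fits, which is why the first call finds a match inside a string that is too long.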
To test whether the 4th character is a 'u'
    // Match a string which contains a 'u' as the 4th character
    var rgx = /^.{3}u/;
    // Will match
    //   'Columbia'
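The same idea works for any position: skip n - 1 characters with a counted dot and then name the character you expect. A minimal sketch, assuming a hypothetical helper `hasLetterAt` that builds the pattern with the standard `RegExp` constructor:

```javascript
// Hypothetical helper: true when the character at position n (1-based) is `letter`.
// Assumes `letter` is a plain letter or digit with no special regex meaning.
function hasLetterAt(text, n, letter) {
  const rgx = new RegExp('^.{' + (n - 1) + '}' + letter);
  return rgx.test(text);
}

console.log(hasLetterAt('Columbia', 4, 'u')); // true
console.log(hasLetterAt('Colombia', 4, 'u')); // false, the 4th character is 'o'
console.log(hasLetterAt('Col', 4, 'u'));      // false, the string is too short
```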
User form example
Now that we can count, we can greatly improve our form validation.
The regexes are not complete yet but they are certainly better.
Names are unlikely to be shorter than 2 letters and town names are probably at least 4 letters long. There is room for more than one space between first and last name, etc. We still can't provide an optional prefix for the phone number, as prefixes usually start with a plus sign and we can't use the plus sign yet because it has a special meaning. The same is the case with the email address, which requires a single dot.
The regexes in the form translate to
| Field | Regex | Translation |
|---|---|---|
| Name | /[a-zA-Z]{2,}\s+[a-zA-Z]{2,}/ | 2 or more letters in the range a-z (both lowercase and uppercase) followed by 1 or more spaces followed by the same range again. |
| Address | /[a-zA-Z]{2,}\s+\d+/ | 2 or more letters in the range a-z (both lowercase and uppercase) followed by 1 or more spaces followed by 1 or more digits. |
| Zip | /\d{4,8}/ | Between 4 and 8 digits. |
| City | /[a-zA-Z]{4,32}/ | Between 4 and 32 letters in the range a-z (both lowercase and uppercase). |
| Phone | /\d{8,16}/ | Between 8 and 16 digits. |
| Email | /[a-zA-Z]+@[a-zA-Z]+/ | 1 or more letters in the range a-z (both lowercase and uppercase) followed by a single @ followed by the same range again. |
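As a rough sketch of how these patterns could be applied to a submitted form, the code below runs each field through its regex and reports the ones that fail. The patterns are copied from the table; the object layout, the field keys and the helper `invalidFields` are assumptions made for this example.

```javascript
// Patterns copied from the table; the field keys are hypothetical.
const patterns = {
  name: /[a-zA-Z]{2,}\s+[a-zA-Z]{2,}/,
  address: /[a-zA-Z]{2,}\s+\d+/,
  zip: /\d{4,8}/,
  city: /[a-zA-Z]{4,32}/,
  phone: /\d{8,16}/,
  email: /[a-zA-Z]+@[a-zA-Z]+/
};

// Returns the keys of the fields whose value does not match its pattern.
function invalidFields(form) {
  return Object.keys(patterns).filter(function (field) {
    return !patterns[field].test(form[field] || '');
  });
}

const form = {
  name: 'Jane  Doe',
  address: 'Mainstreet 12',
  zip: '123',
  city: 'Rome',
  phone: '12345678',
  email: 'jane@example'
};

console.log(invalidFields(form)); // [ 'zip' ]
```

Because the patterns are not anchored, a value passes as soon as some part of it fits. A zip of '123456789', for example, is accepted since it contains a run of 8 digits.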
Moving on we will work with special characters and grouping.
