This blog post describes when and how to use regular expressions whose flag
/g is set and what can go wrong.
(If you want to read a more general introduction to regular expressions, consult
[1].)
The flag /g of regular expressions
Sometimes, a regular expression should match the same string multiple times.
Then the regular expression object needs to be created with the flag
/g set (either via a regular expression literal or via the constructor
RegExp). As a result, the property
global of the regular expression object is true and several operations behave differently.
> var regex = /x/g;
> regex.global
true
The property
lastIndex is used to keep track of where in the string matching should continue, as we shall see in a moment.
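Both ways of setting the flag mentioned above lead to the same kind of object. A minimal sketch (the variable names are made up for illustration):

```javascript
// Two ways of creating a regular expression whose flag /g is set
var literalRegex = /x/g;
var constructedRegex = new RegExp('x', 'g');

console.log(literalRegex.global);     // true
console.log(constructedRegex.global); // true
console.log(/x/.global);              // false

// A freshly created regular expression starts with lastIndex 0
console.log(constructedRegex.lastIndex); // 0
```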
RegExp.prototype.test(): determining whether there is a match
Regular expressions have the method
RegExp.prototype.test(str)
Without the flag
/g, the method
test() of regular expressions simply checks whether there is a match somewhere in
str:
> var str = '_x_x';
> /x/.test(str)
true
With the flag
/g set, each call to
test() resumes where the previous match ended, so it returns
true as many times as there are matches in the string. After each successful call,
lastIndex contains the index after the most recent match.
> var regex = /x/g;
> regex.lastIndex
0
> regex.test(str)
true
> regex.lastIndex
2
> regex.test(str)
true
> regex.lastIndex
4
> regex.test(str)
false
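One detail that the interaction above does not show: a failed match resets lastIndex to 0, so the next call starts over at the beginning of the string. A small sketch that records the result of test() together with lastIndex (the names log and matched are made up for illustration):

```javascript
var str = '_x_x';
var regex = /x/g;
var log = [];
for (var i = 0; i < 4; i++) {
    var matched = regex.test(str);
    // Record the result and where the next call will start
    log.push(matched + '@' + regex.lastIndex);
}
console.log(log.join(', ')); // true@2, true@4, false@0, true@2
```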
String.prototype.search(): finding the index of a match
Strings have the method
String.prototype.search(regex)
This method ignores the properties
global and
lastIndex of
regex. It returns the index of the first match of
regex, or -1 if there is no match.
> '_x_x'.search(/x/)
1
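To see that search() really ignores the flag /g and lastIndex, one can hand it a regular expression whose lastIndex has been moved. A short sketch (variable names are made up for illustration):

```javascript
var regex = /x/g;
regex.lastIndex = 2;

// search() still reports the first match, not the first match after index 2
var firstIndex = '_x_x'.search(regex);
console.log(firstIndex); // 1

// lastIndex is the same afterwards
console.log(regex.lastIndex); // 2

// No match is signaled via -1
var missingIndex = 'abc'.search(regex);
console.log(missingIndex); // -1
```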
RegExp.prototype.exec(): capturing groups, optionally repeatedly
Regular expressions have the method
RegExp.prototype.exec(str)
If the flag
/g is not set then this method always returns the match object
[1] for the first match:
> var str = '_x_x';
> var regex1 = /x/;
> regex1.exec(str)
[ 'x', index: 1, input: '_x_x' ]
> regex1.exec(str)
[ 'x', index: 1, input: '_x_x' ]
If the flag
/g is set, then all matches are returned – the first one on the first invocation, the second one on the second invocation, etc.
> var regex2 = /x/g;
> regex2.exec(str)
[ 'x', index: 1, input: '_x_x' ]
> regex2.exec(str)
[ 'x', index: 3, input: '_x_x' ]
> regex2.exec(str)
null
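The repeated invocations are usually written as a loop that runs until exec() returns null. The following sketch uses a regular expression with a capturing group to collect the index and group 1 of every match (the pattern and the variable names are made up for illustration; the loop assumes that the regular expression never matches the empty string, because otherwise lastIndex would not advance):

```javascript
var str = '_x_x';
var regex = /_(x)/g;
var found = [];
var match;
// exec() returns one match object per call and null when there are no more
while ((match = regex.exec(str)) !== null) {
    found.push(match.index + ':' + match[1]);
}
console.log(found); // [ '0:x', '2:x' ]

// The failed last call has reset lastIndex
console.log(regex.lastIndex); // 0
```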
String.prototype.match(): capturing groups or all matching substrings
Strings have the method
String.prototype.match(regex)
If the flag
/g of
regex is not set then this method behaves like
RegExp.prototype.exec(). If the flag
/g is set then this method returns all matching substrings of the string (every group 0). If there is no match then
null is returned.
> var regex = /x/g;
> '_x_x'.match(regex)
[ 'x', 'x' ]
> 'abc'.match(regex)
null
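The difference between the two modes is easiest to see with a capturing group. A small sketch (the pattern and the variable names are made up for illustration):

```javascript
// Without /g: a match object for the first match, including groups and index
var withoutG = '_x_x'.match(/_(x)/);
console.log(withoutG[0]);    // '_x'
console.log(withoutG[1]);    // 'x'
console.log(withoutG.index); // 0

// With /g: only group 0 of every match; the capturing group is not reported
var withG = '_x_x'.match(/_(x)/g);
console.log(withG); // [ '_x', '_x' ]
```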
String.prototype.replace(): search and replace
Strings have the method
String.prototype.replace(search, replacement)
If
search is either a string or a regular expression whose flag
/g is not set, then only the first match is replaced.
If the flag
/g is set, then all matches are replaced.
> '_x_x'.replace(/x/, 'y')
'_y_x'
> '_x_x'.replace(/x/g, 'y')
'_y_y'
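Two more cases, sketched under the assumption that replace() with a /g regular expression always starts at the beginning of the string, wherever lastIndex currently points (variable names are made up for illustration):

```javascript
// A string as the first argument: only the first occurrence is replaced
var viaString = '_x_x'.replace('x', 'y');
console.log(viaString); // '_y_x'

// A /g regular expression: all occurrences are replaced,
// even if lastIndex was moved beforehand
var regex = /x/g;
regex.lastIndex = 3;
var viaRegex = '_x_x'.replace(regex, 'y');
console.log(viaRegex); // '_y_y'
```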
The problem with the /g flag
Regular expressions whose
/g flag is set are problematic if a method working with them must be invoked multiple times to return all results. That’s the case for two methods:
- RegExp.prototype.test()
- RegExp.prototype.exec()
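Then JavaScript abuses the regular expression as an iterator, as a pointer into the sequence of results. That causes problems, because the position of that pointer (lastIndex) is stored in the regular expression object itself and is therefore shared by all code that uses the same object.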
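A short sketch of how this state can leak from one use of a regular expression into the next (the strings and the variable names are made up for illustration):

```javascript
var regex = /x/g;

// The first call succeeds and leaves lastIndex at 2
var first = regex.test('_x');
console.log(first, regex.lastIndex); // true 2

// A different string, but matching starts at index 2, so the 'x' at index 0 is missed
var second = regex.test('x_');
console.log(second, regex.lastIndex); // false 0

// The failed call reset lastIndex, so the same call now succeeds
var third = regex.test('x_');
console.log(third, regex.lastIndex); // true 1
```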
The following example illustrates what can go wrong when a function receives such a regular expression as a parameter.
Example: counting occurrences
The following is a naive implementation of a function that counts how many matches there are for the regular expression
regex in the string
str.
// Naive implementation
function countOccurrences(regex, str) {
    var count = 0;
    while (regex.test(str)) count++;
    return count;
}
An example of using this function:
> countOccurrences(/x/g, '_x_x')
2
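The first problem is that this function goes into an infinite loop if the regular expression’s
/g flag is not set, because test() then restarts at the beginning of the string on every call and never returns false if there is a match, e.g.: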
countOccurrences(/x/, '_x_x')
The second problem is that the function doesn’t work correctly if
regex.lastIndex isn’t 0, because matching then starts in the middle of the string and earlier matches are skipped. For example:
> var regex = /x/g;
> regex.lastIndex = 2;
2
> countOccurrences(regex, '_x_x')
1
The following implementation fixes the two problems:
function countOccurrences(regex, str) {
    // Without /g, test() would never return false for a matching string
    if (! regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    var origLastIndex = regex.lastIndex;  // store
    regex.lastIndex = 0;  // start matching at the beginning of str
    var count = 0;
    while (regex.test(str)) count++;
    regex.lastIndex = origLastIndex;  // restore
    return count;
}
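Another option, which is not from the implementation above but a hypothetical variant, is to never touch the caller's regular expression and to count with a private copy instead. The sketch below assumes an ES2015+ engine (for the property flags) and the name countOccurrencesViaCopy is made up. Like the versions above, it assumes that the regular expression never matches the empty string, because otherwise the loop would not terminate.

```javascript
// Hypothetical variant: count with a private copy of the regular expression
function countOccurrencesViaCopy(regex, str) {
    // Make sure the copy has /g set (regex.flags requires ES2015+)
    var flags = regex.global ? regex.flags : regex.flags + 'g';
    // A new regular expression always starts with lastIndex 0
    var copy = new RegExp(regex.source, flags);
    var count = 0;
    while (copy.test(str)) count++;
    return count;
}

// Works without /g
console.log(countOccurrencesViaCopy(/x/, '_x_x')); // 2

// Ignores and preserves the lastIndex of the original
var shared = /x/g;
shared.lastIndex = 2;
console.log(countOccurrencesViaCopy(shared, '_x_x')); // 2
console.log(shared.lastIndex); // 2
```

The price is one extra object per call, and the function silently accepts regular expressions without /g instead of reporting them.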
Using match() to count occurrences
A simpler alternative is to use
match():
function countOccurrences(regex, str) {
    if (! regex.global) {
        throw new Error('Please set flag /g of regex');
    }
    return (str.match(regex) || []).length;
}
One possible pitfall:
str.match() returns
null if the
/g flag is set and there are no matches (solved above by accessing
length of
[] if the result of
match() isn’t truthy).
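The match() version needs no code for storing and resetting lastIndex, because match() with a /g regular expression always starts at the beginning of the string. It does not restore the previous value, though. A small sketch of both points and of the null pitfall (variable names are made up for illustration):

```javascript
var regex = /x/g;
regex.lastIndex = 2;

// All matches are found, even though lastIndex was 2
var matches = '_x_x'.match(regex);
console.log(matches.length); // 2

// The previous lastIndex is not restored
console.log(regex.lastIndex); // 0

// No matches: null instead of an empty array
var noMatches = 'abc'.match(regex);
console.log(noMatches); // null
console.log((noMatches || []).length); // 0
```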
Performance considerations
Juan Ignacio Dopazo compared the performance of the two implementations of counting occurrences and found out that using
test() is faster, presumably because it doesn’t collect the results in an array.
Acknowledgements
Mathias Bynens and
Juan Ignacio Dopazo pointed me to
match() and
test().
Šime Vidas warned me to be careful with
match() if there are no matches.
Reference
[1] JavaScript: an overview of the regular expression API