/g, /y, and .lastIndex
In this blog post, we examine how the RegExp flags /g and /y work and how they depend on the RegExp property .lastIndex. We’ll also discover an interesting use case for .lastIndex that you may not have considered yet.
/g and /y
These flags can be summarized as follows:
/g (.global) activates multi-match modes for several regular expression operations.
/y (.sticky) is similar to /g, but there can’t be gaps between matches.
The following two regular expression operations completely ignore /g and /y:
String.prototype.search(regExp)
String.prototype.split(regExp)
All other operations are affected by them in some way.
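To illustrate the two operations that ignore the flags, here is a minimal sketch. The variable names are made up for this example. It sets .lastIndex to a non-zero value on a /g regular expression and checks that neither operation uses or changes it:

```javascript
const re = /#/g;
re.lastIndex = 3;

// .search() always starts at index 0 and reports the first match.
const firstIndex = '##-#'.search(re);
console.log(firstIndex); // 0

// .split() splits at every match, with or without /g.
const withG = 'a#b#c'.split(re);
const withoutG = 'a#b#c'.split(/#/);
console.log(withG, withoutG); // [ 'a', 'b', 'c' ] [ 'a', 'b', 'c' ]

// Neither call changed .lastIndex.
console.log(re.lastIndex); // 3
```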
/g (.global)
Let’s see what the multi-match modes look like.
.exec() and /g
Without /g, .exec() always returns a match object for the first match:
> const re = /#/;
> re.exec('##-#')
{ 0: '#', index: 0, input: '##-#' }
> re.exec('##-#')
{ 0: '#', index: 0, input: '##-#' }
With /g, it returns one new match per invocation and null when there are no more matches:
> const re = /#/g;
> re.exec('##-#')
{ 0: '#', index: 0, input: '##-#' }
> re.exec('##-#')
{ 0: '#', index: 1, input: '##-#' }
> re.exec('##-#')
{ 0: '#', index: 3, input: '##-#' }
> re.exec('##-#')
null
.replace() and /g
Without /g, .replace() only replaces the first match:
> '##-#'.replace(/#/, 'x')
'x#-#'
With /g, .replace() replaces all matches:
> '##-#'.replace(/#/g, 'x')
'xx-x'
.matchAll() and /g
.matchAll() only works if /g is set. It returns all match objects:
> const re = /#/g;
> [...'##-#'.matchAll(re)]
[
{ 0: '#', index: 0, input: '##-#' },
{ 0: '#', index: 1, input: '##-#' },
{ 0: '#', index: 3, input: '##-#' },
]
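The following sketch shows both halves of that statement: iterating over the result of .matchAll() directly, and what happens in current engines when /g is missing. The variable names are illustrative only.

```javascript
// Iterating directly avoids creating an intermediate array.
const indices = [];
for (const match of '##-#'.matchAll(/#/g)) {
  indices.push(match.index);
}
console.log(indices); // [ 0, 1, 3 ]

// Without /g, current engines reject the regular expression.
let errorName = 'none';
try {
  '##-#'.matchAll(/#/);
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // TypeError
```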
/y (.sticky)
We will use /y together with /g for now (think “/g without gaps”). To understand /y on its own, we’ll need to learn about the RegExp property .lastIndex, which we’ll get to soon.
.exec() and /gy
With /gy, each match returned by .exec() must immediately follow the previous match. That’s why it only returns two matches in the following example:
> const re = /#/gy;
> re.exec('##-#')
{ 0: '#', index: 0, input: '##-#' }
> re.exec('##-#')
{ 0: '#', index: 1, input: '##-#' }
> re.exec('##-#')
null
.replace() and /gy
With /gy, .replace() replaces all matches, as long as there are no gaps between them:
> '##-#'.replace(/#/gy, 'x')
'xx-#'
.matchAll() and /gy
With /gy, .matchAll() returns match objects for adjacent matches only:
> const re = /#/gy;
> [...'##-#'.matchAll(re)]
[
{ 0: '#', index: 0, input: '##-#' },
{ 0: '#', index: 1, input: '##-#' },
]
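A side-by-side comparison may make “/g without gaps” more concrete. This is only a sketch; indicesOf is a helper invented for this example.

```javascript
const str = '##-#';

// Hypothetical helper: the indices of all matches that .matchAll() reports.
const indicesOf = (regExp) => Array.from(str.matchAll(regExp), (m) => m.index);

console.log(indicesOf(/#/g));  // [ 0, 1, 3 ]
console.log(indicesOf(/#/gy)); // [ 0, 1 ]

// A gap at the very start of the input means that /gy matches nothing.
console.log('-##'.replace(/#/g, 'x'));  // '-xx'
console.log('-##'.replace(/#/gy, 'x')); // '-##'
```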
.lastIndex
The regular expression property .lastIndex only has an effect if at least one of the flags /g and /y is used.
For regular expression operations that are affected by it, it controls where matching starts.
.lastIndex and /g
For example, .exec() uses .lastIndex to remember where it currently is in the input string:
> const re = /[a-z]/g;
> re.lastIndex
0
> [re.exec('a-b'), re.lastIndex]
[{ 0: 'a', index: 0, input: 'a-b' }, 1]
> [re.exec('a-b'), re.lastIndex]
[{ 0: 'b', index: 2, input: 'a-b' }, 3]
> [re.exec('a-b'), re.lastIndex]
[ null, 0 ]
.matchAll() honors .lastIndex but does not change it:
> const re = /#/g; re.lastIndex = 1;
> [...'##-#'.matchAll(re)]
[
{ 0: '#', index: 1, input: '##-#' },
{ 0: '#', index: 3, input: '##-#' },
]
> re.lastIndex
1
.replace() ignores .lastIndex and sets it to zero:
> const re = /#/g; re.lastIndex = 1;
> '##-#'.replace(re, 'x')
'xx-x'
> re.lastIndex
0
To summarize, for several operations, /g means: Match at .lastIndex or later.
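As a small sketch of “at .lastIndex or later”, we can set .lastIndex by hand before calling .exec(). The variable names are illustrative.

```javascript
const re = /#/g;
const str = '##-#';

// There is no '#' at index 2, so .exec() keeps searching and finds index 3.
re.lastIndex = 2;
const match = re.exec(str);
console.log(match.index);  // 3
console.log(re.lastIndex); // 4

// Nothing is left to find, so .exec() returns null and resets .lastIndex.
const noMatch = re.exec(str);
console.log(noMatch);      // null
console.log(re.lastIndex); // 0
```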
.lastIndex and /y
For /y, .lastIndex means: Match at exactly .lastIndex. It works as if the beginning of the regular expression were anchored to .lastIndex.
Note that ^ and $ continue to work as usual: They anchor matches to the beginning or end of the input string, unless .multiline is set. Then they anchor to the beginnings or ends of lines.
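The interaction between ^ and /y can be sketched as follows. The regular expressions and inputs are made up for this example.

```javascript
// ^ still means "start of input", so nothing can match at index 1.
const anchored = /^#/y;
anchored.lastIndex = 1;
const failed = anchored.exec('##-#');
console.log(failed); // null

// With /m, ^ also matches at the start of a line (here: index 2).
const multiline = /^#/my;
multiline.lastIndex = 2;
const found = multiline.exec('a\n#');
console.log(found.index); // 2
```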
.exec() matches multiple times if /y is set (even if /g is not set):
> const re = /#/y; re.lastIndex = 1;
> [re.exec('##-#'), re.lastIndex]
[{0: '#', index: 1, input: '##-#'}, 2]
> [re.exec('##-#'), re.lastIndex]
[ null, 0 ]
If /y is used without /g, then .replace() replaces the first occurrence that is found (directly) at .lastIndex. It updates .lastIndex.
> const re = /#/y; re.lastIndex = 1;
> ['##-#'.replace(re, 'x'), re.lastIndex]
[ '#x-#', 2 ]
> ['##-#'.replace(re, 'x'), re.lastIndex] // no match
[ '##-#', 0 ]
> ['##-#'.replace(re, 'x'), re.lastIndex]
[ 'x#-#', 1 ]
Pitfalls of /g and /y
A regular expression with /g or /y can’t be inlined
A regular expression with /g can’t be inlined. For example, in the following while loop, the regular expression is created fresh every time the condition is checked. Therefore, its .lastIndex is always zero and the loop never terminates.
let matchObj;
// Infinite loop: the regular expression literal is re-created on every check
while (matchObj = /a+/g.exec('bbbaabaaa')) {
  console.log(matchObj[0]);
}
With /y, the problem is the same.
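For comparison, here is a sketch of the same loop with the regular expression created once, outside the loop, so that its .lastIndex survives between iterations. The name found is invented for this example.

```javascript
const regExp = /a+/g; // created once, so .lastIndex is preserved
const found = [];
let matchObj;
while ((matchObj = regExp.exec('bbbaabaaa')) !== null) {
  found.push(matchObj[0]);
}
console.log(found); // [ 'aa', 'aaa' ]
```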
Removing /g or /y can break code
If code expects a regular expression with /g and has a loop over the results of .exec() or .test(), then a regular expression without /g can cause an infinite loop:
function collectMatches(regExp, str) {
  const matches = [];
  let matchObj;
  // Infinite loop
  while (matchObj = regExp.exec(str)) {
    matches.push(matchObj[0]);
  }
  return matches;
}
collectMatches(/a+/, 'bbbaabaaa'); // Missing: flag /g
Why is there an infinite loop? Because .exec() always returns the first result, a match object, and never null.
With /y, the problem is the same.
Adding /g or /y can break code
With .test(), there is another caveat: It is affected by .lastIndex. Therefore, if we want to check exactly once whether a regular expression matches a string, then the regular expression must not have /g. Otherwise, we generally get a different result every time we call .test():
> const regExp = /^X/g;
> [regExp.test('Xa'), regExp.lastIndex]
[ true, 1 ]
> [regExp.test('Xa'), regExp.lastIndex]
[ false, 0 ]
> [regExp.test('Xa'), regExp.lastIndex]
[ true, 1 ]
The first invocation produces a match and updates .lastIndex. The second invocation does not find a match and resets .lastIndex to zero.
If we create a regular expression specifically for .test(), then we probably won’t add /g. However, the likelihood of encountering /g increases if we use the same regular expression for replacing and for testing.
Once again, this problem also exists with /y:
> const regExp = /^X/y;
> regExp.test('Xa')
true
> regExp.test('Xa')
false
> regExp.test('Xa')
true
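If a regular expression with /g or /y has to be reused for testing, one possible workaround is to reset .lastIndex around each call. The helper testOnce below is hypothetical, not a built-in. It is a sketch that assumes temporarily mutating the regular expression is acceptable.

```javascript
// Hypothetical helper: always tests from index 0, then restores .lastIndex.
function testOnce(regExp, str) {
  const savedLastIndex = regExp.lastIndex;
  regExp.lastIndex = 0;
  try {
    return regExp.test(str);
  } finally {
    regExp.lastIndex = savedLastIndex;
  }
}

const regExp = /^X/g;
const results = [
  testOnce(regExp, 'Xa'),
  testOnce(regExp, 'Xa'),
  testOnce(regExp, 'Xa'),
];
console.log(results);          // [ true, true, true ]
console.log(regExp.lastIndex); // 0
```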
Code can break if .lastIndex isn’t zero
Given all the regular expression operations that are affected by .lastIndex, we must make sure, for many algorithms, that .lastIndex is zero at the beginning. Otherwise, we may get unexpected results:
import assert from 'node:assert/strict';
function countMatches(regExp, str) {
  let count = 0;
  // Each successful .test() advances regExp.lastIndex
  while (regExp.test(str)) {
    count++;
  }
  return count;
}
const myRegExp = /a/g;
myRegExp.lastIndex = 4;
assert.equal(
  countMatches(myRegExp, 'babaa'), 1); // should be 3
Normally, .lastIndex is zero in newly created regular expressions and we won’t change it explicitly like we did in the example. But .lastIndex can still end up not being zero if we use the regular expression multiple times.
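A minimal sketch of how reuse alone can leave .lastIndex at a non-zero value (the inputs are made up for this example):

```javascript
const regExp = /a/g;

// The first call matches 'a' at index 1 and leaves .lastIndex at 2.
const first = regExp.test('ba');
const lastIndexAfterFirst = regExp.lastIndex;
console.log(first, lastIndexAfterFirst); // true 2

// The second call starts searching at index 2 and therefore misses the 'a' at index 0.
const second = regExp.test('ab');
console.log(second, regExp.lastIndex); // false 0
```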
Avoiding the pitfalls of /g, /y, and .lastIndex
As an example of dealing with /g and .lastIndex, we revisit countMatches() from the previous example. How do we prevent a wrong regular expression from breaking our code? Let’s look at three approaches.
First, we can throw an exception if /g isn’t set or .lastIndex isn’t zero:
function countMatches(regExp, str) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  if (regExp.lastIndex !== 0) {
    throw new Error('regExp.lastIndex must be zero');
  }
  let count = 0;
  while (regExp.test(str)) {
    count++;
  }
  return count;
}
Second, we can clone the parameter. That has the added benefit that regExp won’t be changed.
function countMatches(regExp, str) {
  // Make sure that the clone has /g; its .lastIndex starts at zero
  const cloneFlags = regExp.flags + (regExp.global ? '' : 'g');
  const clone = new RegExp(regExp, cloneFlags);
  let count = 0;
  while (clone.test(str)) {
    count++;
  }
  return count;
}
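To see the cloning approach handle both problem cases, the sketch below repeats the function under the illustrative name countMatchesWithClone (so that the block is self-contained) and calls it with a regular expression that lacks /g and with one whose .lastIndex isn’t zero:

```javascript
// Same logic as the cloning version above, renamed for this example.
function countMatchesWithClone(regExp, str) {
  const cloneFlags = regExp.flags + (regExp.global ? '' : 'g');
  const clone = new RegExp(regExp, cloneFlags);
  let count = 0;
  while (clone.test(str)) {
    count++;
  }
  return count;
}

// Missing /g: no infinite loop, because the clone has /g.
console.log(countMatchesWithClone(/a/, 'babaa')); // 3

// Non-zero .lastIndex: ignored, and the original is left untouched.
const shifted = /a/g;
shifted.lastIndex = 4;
console.log(countMatchesWithClone(shifted, 'babaa')); // 3
console.log(shifted.lastIndex); // 4
```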
Using operations that aren’t affected by .lastIndex or flags
Third, several regular expression operations are not affected by .lastIndex or by flags, so we can use one of them. For example, .match() ignores .lastIndex if /g is present:
import assert from 'node:assert/strict';
function countMatches(regExp, str) {
  if (!regExp.global) {
    throw new Error('Flag /g of regExp must be set');
  }
  // .match() returns null if there are no matches
  return (str.match(regExp) || []).length;
}
const myRegExp = /a/g;
myRegExp.lastIndex = 4;
assert.equal(countMatches(myRegExp, 'babaa'), 3); // OK!
Here, countMatches() works even though we didn’t check or fix .lastIndex.
Use case for .lastIndex: starting matching at a given index
Apart from storing state, .lastIndex can also be used to start matching at a given index. This section describes how.
Given that .test() is affected by /y and .lastIndex, we can use it to check if a regular expression regExp matches a string str at a given index:
import assert from 'node:assert/strict';
function matchesStringAt(regExp, str, index) {
  if (!regExp.sticky) {
    throw new Error('Flag /y of regExp must be set');
  }
  // With /y, matching happens exactly at .lastIndex
  regExp.lastIndex = index;
  return regExp.test(str);
}
assert.equal(
  matchesStringAt(/x+/y, 'aaxxx', 0), false);
assert.equal(
  matchesStringAt(/x+/y, 'aaxxx', 2), true);
regExp is anchored to .lastIndex due to /y.
Note that we must not use the assertion ^, which would anchor regExp to the beginning of the input string.
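The effect of an accidental ^ can be sketched like this. The function is repeated from above so that the block runs on its own.

```javascript
function matchesStringAt(regExp, str, index) {
  if (!regExp.sticky) {
    throw new Error('Flag /y of regExp must be set');
  }
  regExp.lastIndex = index;
  return regExp.test(str);
}

// Without ^, the check at index 2 succeeds.
console.log(matchesStringAt(/x+/y, 'aaxxx', 2));  // true

// With ^, a match is only possible at index 0 of the input.
console.log(matchesStringAt(/^x+/y, 'aaxxx', 2)); // false
console.log(matchesStringAt(/^x+/y, 'xxx', 0));   // true
```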
.search() lets us find the location where a regular expression matches:
> '#--#'.search(/#/)
0
Alas, we can’t change where .search() starts looking for matches. As a workaround, we can use .exec() for searching:
import assert from 'node:assert/strict';
function searchAt(regExp, str, index) {
  if (!regExp.global && !regExp.sticky) {
    throw new Error('Either flag /g or flag /y of regExp must be set');
  }
  // /g searches at index or later, /y matches exactly at index
  regExp.lastIndex = index;
  const match = regExp.exec(str);
  if (match) {
    return match.index;
  } else {
    return -1;
  }
}
assert.equal(
  searchAt(/#/g, '#--#', 0), 0);
assert.equal(
  searchAt(/#/g, '#--#', 1), 3);
When used without /g and with /y, .replace() makes one replacement if there is a match at .lastIndex:
import assert from 'node:assert/strict';
function replaceOnceAt(str, regExp, replacement, index) {
  if (!(regExp.sticky && !regExp.global)) {
    throw new Error('Flag /y must be set, flag /g must not be set');
  }
  regExp.lastIndex = index;
  return str.replace(regExp, replacement);
}
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 0), 'X aaaa a');
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 3), 'aa X a');
assert.equal(
  replaceOnceAt('aa aaaa a', /a+/y, 'X', 8), 'aa aaaa X');
.global (/g) and .sticky (/y)
The following two methods are completely unaffected by /g and /y:
String.prototype.search()
String.prototype.split()
Flag /g | # | Matching | .lI | Flag /yg |
---|---|---|---|---|
.exec(): null¦MObj | 0+ | at .lI or later | ✓ upd. | same as /y |
.test(): boolean | 0+ | at .lI or later | ✓ upd. | same as /y |
.replace(): string | 1 | all occurrences | ✗ reset | /g w/o gaps |
.replaceAll(): string | 1 | (same as .replace()) | ✗ reset | /g w/o gaps |
.match(): null¦Array | 1 | all occurrences | ✗ reset | /g w/o gaps |
.matchAll(): Iterable<MObj> | 1 | at .lI or later | ✓ unch. | /g w/o gaps |
Legend:
.lI means .lastIndex
MObj means MatchObject
Is the operation affected by .lastIndex?
✓ Yes: .lastIndex is either updated or unchanged.
✗ No: .lastIndex isn’t used for matching, but several operations reset it to zero.
Flag /y | # | Result | Matching | .lI |
---|---|---|---|---|
.exec() | 0+ | null¦MObj | at .lI | ✓ updated |
.test() | 0+ | boolean | at .lI | ✓ updated |
.replace() | 1 | string | occurrence at .lI | ✓ updated |
.replaceAll() | | TypeError | | |
.match() | 0+ | null¦MObj | (same as .exec()) | ✓ updated |
.matchAll() | | TypeError | | |
I have written a small Node.js script that prints the following result table:
const s='##-#';
const r=/#/g; r.lastIndex=1;
r.exec(s) .index=1 .lastIndex updated
r.test(s) true .lastIndex updated
s.replace(r, 'x') "xx-x" .lastIndex reset
s.replaceAll(r, 'x') "xx-x" .lastIndex reset
s.match(r) ["#","#","#"] .lastIndex reset
s.matchAll(r) [["#"],["#"]] .lastIndex unchanged
const r=/#/y; r.lastIndex=1;
r.exec(s) .index=1 .lastIndex updated
r.test(s) true .lastIndex updated
s.replace(r, 'x') "#x-#" .lastIndex updated
s.replaceAll(r, 'x') TypeError
s.match(r) .index=1 .lastIndex updated
s.matchAll(r) TypeError
const r=/#/yg; r.lastIndex=1;
r.exec(s) .index=1 .lastIndex updated
r.test(s) true .lastIndex updated
s.replace(r, 'x') "xx-#" .lastIndex reset
s.replaceAll(r, 'x') "xx-#" .lastIndex reset
s.match(r) ["#","#"] .lastIndex reset
s.matchAll(r) [["#"]] .lastIndex unchanged
(Older versions of .matchAll() don’t throw a TypeError if /g is missing.)
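The script itself isn’t reproduced here. As a rough sketch of how such checks could be automated, the hypothetical helper probeLastIndex below runs one operation against a fresh regular expression whose .lastIndex is 1 and classifies what happened to .lastIndex. It assumes that a final value of 0 means “reset”, which holds for the input used here but would also be the outcome of a failed match.

```javascript
// Hypothetical helper, not the script mentioned above.
function probeLastIndex(flags, operation) {
  const r = new RegExp('#', flags);
  r.lastIndex = 1;
  try {
    operation(r, '##-#');
  } catch (err) {
    return err.name;
  }
  if (r.lastIndex === 1) return 'unchanged';
  return r.lastIndex === 0 ? 'reset' : 'updated';
}

console.log(probeLastIndex('g', (r, s) => r.exec(s)));            // updated
console.log(probeLastIndex('g', (r, s) => s.replace(r, 'x')));    // reset
console.log(probeLastIndex('g', (r, s) => [...s.matchAll(r)]));   // unchanged
console.log(probeLastIndex('y', (r, s) => s.replace(r, 'x')));    // updated
console.log(probeLastIndex('y', (r, s) => s.replaceAll(r, 'x'))); // TypeError
```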
The regular expression property .lastIndex has two significant downsides:
It makes regular expressions stateful, so we have to be mindful of how we share and reuse them.
Support for .lastIndex is inconsistent among regular expression operations.
On the upside, .lastIndex also gives us additional useful functionality: We can dictate where matching should begin (for some operations).