The proposal “s (dotAll) flag for regular expressions” by Mathias Bynens and Brian Terlson is currently at stage 3. This blog post explains how it works.
Currently, the dot (.) in regular expressions doesn’t match line terminator characters.
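A minimal JavaScript sketch of that behavior: without any flag, the anchored pattern /^.$/ accepts an ordinary character but rejects a string consisting of a single line feed. The variable name is only illustrative.

```javascript
// Without flags, the dot matches any single character except a line terminator.
const singleChar = /^.$/;

console.log(singleChar.test('a'));  // true
console.log(singleChar.test('\n')); // false: LF is a line terminator
```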
The proposal specifies the regular expression flag /s, which changes that.
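The same test with the flag added, as a sketch that assumes an engine which already supports /s (current Node.js does). The flag-less pattern is repeated for comparison.

```javascript
// The /s (dotAll) flag lets the dot match line terminators as well.
const singleCharDotAll = /^.$/s;

console.log(singleCharDotAll.test('\n')); // true
console.log(/^.$/.test('\n'));            // false (no flag, for comparison)
```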
Limitations of the dot (.) in regular expressions
The dot (.) in regular expressions has two limitations.
First, it doesn’t match astral (non-BMP) characters such as emoji.
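A short sketch of why this happens: JavaScript strings are sequences of UTF-16 code units, and an astral character such as U+1F600 occupies two of them (a surrogate pair). Without a flag, the dot consumes exactly one code unit. The emoji is written as an escape here only to keep the source ASCII.

```javascript
// U+1F600 lies outside the BMP, so it is stored as two UTF-16 code units.
const astral = '\u{1F600}';
console.log(astral.length);       // 2

// Without /u, one dot matches one code unit, so it cannot cover the whole emoji...
console.log(/^.$/.test(astral));  // false
// ...but two dots can, one per surrogate.
console.log(/^..$/.test(astral)); // true
```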
This can be fixed via the /u (unicode) flag.
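A minimal sketch of that fix, assuming a standard ES2015+ engine: with /u the regular expression works on code points instead of code units, so a single dot covers the entire surrogate pair.

```javascript
const smiley = '\u{1F600}';

// With the /u flag, the dot matches a whole code point, surrogate pair included.
console.log(/^.$/u.test(smiley)); // true
console.log(/^.$/.test(smiley));  // false (no flag, for comparison)
```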
Second, the dot does not match line terminator characters.
That can currently only be fixed by replacing the dot with workarounds such as [^] (“all characters except no character”) or [\s\S] (“either whitespace or not whitespace”).
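To check that both workarounds really cover line terminators, here is a small sketch that runs each character class against an ordinary letter and three line terminators. The variable names are illustrative only.

```javascript
// Pre-/s workarounds that match any single UTF-16 code unit.
const emptyNegatedClass = /^[^]$/;       // [^]: every character except none
const whitespaceOrNot = /^[\s\S]$/;      // [\s\S]: whitespace or non-whitespace

const samples = ['a', '\n', '\r', '\u2028'];

// Both workarounds accept every sample, line terminators included.
console.log(samples.every(ch => emptyNegatedClass.test(ch))); // true
console.log(samples.every(ch => whitespaceOrNot.test(ch)));   // true
```

Note that [^] is specific to JavaScript; many other regex flavors reject an empty character class, which is why [\s\S] is the more portable of the two.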
Line terminators recognized by ECMAScript
Line terminators in ECMAScript affect:
- The dot, in all regular expressions that don’t have the flag /s.
- The anchors ^ and $ if the flag /m (multiline) is used.
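A brief sketch of the second point: without /m, ^ and $ only match at the very start and end of the input; with /m they also match next to a line terminator. The sample string is arbitrary.

```javascript
const multiLineText = 'first\nsecond';

// Without /m, ^ and $ anchor to the start and end of the whole input.
console.log(/^second$/.test(multiLineText));  // false
// With /m, they also match right after and right before a line terminator.
console.log(/^second$/m.test(multiLineText)); // true
console.log(/^first$/m.test(multiLineText));  // true
```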
The following four characters are considered line terminators by ECMAScript:
- U+000A LINE FEED (LF) (\n)
- U+000D CARRIAGE RETURN (CR) (\r)
- U+2028 LINE SEPARATOR
- U+2029 PARAGRAPH SEPARATOR
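The list above can be checked directly. This sketch writes the four line terminators as string escapes and tests each against a flag-less dot.

```javascript
// The four ECMAScript line terminators: LF, CR, LINE SEPARATOR, PARAGRAPH SEPARATOR.
const lineTerminators = ['\n', '\r', '\u2028', '\u2029'];

// Without /s, the dot rejects every one of them.
console.log(lineTerminators.map(ch => /^.$/.test(ch))); // all four entries are false
```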
There are additionally some newline-ish characters that are not considered line terminators by ECMAScript:
- U+000B VERTICAL TAB (\v)
- U+000C FORM FEED (\f)
- U+0085 NEXT LINE
Those three characters are matched by the dot without a flag.
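A quick sketch confirming this: vertical tab, form feed and NEXT LINE are all accepted by a dot without any flag.

```javascript
// VT, FF and NEL look like line breaks but are not ECMAScript line terminators.
const newlineLike = ['\v', '\f', '\u0085'];

console.log(newlineLike.map(ch => /^.$/.test(ch))); // all three entries are true
```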
The proposal introduces the regular expression flag /s (short for “singleline”), which leads to the dot matching line terminators.
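A minimal sketch, assuming an engine with /s support: with the flag, the dot accepts all four line terminators, and ordinary characters are unaffected.

```javascript
const terminators = ['\n', '\r', '\u2028', '\u2029'];

// With /s (dotAll), the dot matches all four line terminators.
console.log(terminators.map(ch => /^.$/s.test(ch))); // all four entries are true
// The flag changes nothing for ordinary characters.
console.log(/^.$/s.test('a')); // true
```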
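The long name of /s is dotAll, which is also the name of the RegExp property that reports whether the flag is set: new RegExp('.', 's').dotAll is true.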
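A short sketch of how the flag can be inspected at runtime: the dotAll property reports it as a boolean, and the flags string lists it under its short name.

```javascript
// dotAll is a read-only property that reflects whether /s is set.
console.log(/./s.dotAll);                 // true
console.log(/./.dotAll);                  // false
console.log(new RegExp('.', 's').dotAll); // true

// The flags string uses the short name.
console.log(/./s.flags);                  // s
```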
dotAll only affects the dot.
multiline only affects the anchors ^ and $.
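A sketch showing that the two flags are independent, using an arbitrary two-line string: /s changes what the dot matches, /m changes where ^ and $ match, and combining them applies both effects.

```javascript
const twoLines = 'a\nb';

// /s alone: the dot crosses the newline, but ^ and $ still anchor to the whole string.
console.log(/^a.b$/s.test(twoLines));  // true
console.log(/^a$/s.test(twoLines));    // false

// /m alone: ^ and $ match at line boundaries, but the dot still rejects '\n'.
console.log(/^a$/m.test(twoLines));    // true
console.log(/^a.b$/m.test(twoLines));  // false

// Both flags together.
console.log(/^a.b$/ms.test(twoLines)); // true
```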
Why is the flag named /s?
dotAll is a good description of what the flag does, so, arguably, /d would have been a better name. However, /s is already an established name (Perl, Python, Java, C#, ...).
Trying it out
V8 5.9+ implements the proposal, but you need the command-line flag --harmony-regexp-dotall to switch it on.
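When trying it out, it can help to check whether the current engine accepts the flag at all. The helper below is a hypothetical sketch, not part of the proposal: it builds the regular expression from a string, so an engine without support should throw a catchable SyntaxError at runtime instead of rejecting a /s literal while parsing the script.

```javascript
// Hypothetical helper: reports whether the engine accepts the s (dotAll) flag.
function supportsDotAll() {
  try {
    // The RegExp constructor throws a SyntaxError for flags it does not know.
    return new RegExp('.', 's').dotAll === true;
  } catch (err) {
    return false;
  }
}

console.log(supportsDotAll());
```

Under the assumption above, V8 5.9 run without --harmony-regexp-dotall would be expected to print false, and an engine that ships the feature prints true.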