Tips for making regular expressions easier to use in JavaScript

[2025-06-24] dev, javascript

In this blog post, we explore ways in which we can make regular expressions easier to use.

The running example  

We’ll use the following regular expression as an example:

const RE_API_SIGNATURE =
  /^(new |get )?([A-Za-z0-9_.\[\]]+)/;

Right now, it is still fairly cryptic. It will be much easier to understand once we get to the tip about “insignificant whitespace”.

Tip: flag /v  

If we add flag /v to our regular expression, we get fewer quirks and more features:

const RE_API_SIGNATURE =
  /^(new |get )?([A-Za-z0-9_.\[\]]+)/v;

/v doesn’t change anything in this particular case, but it helps if we want to match grapheme clusters that consist of more than one code point (such as some emojis) or if we want features such as set operations in character classes.

Tip: Order flags alphabetically  

If there is more than one flag, we should order the flags alphabetically – e.g.:

/pattern/giv

That makes ordering consistent and is also how JavaScript displays regular expressions:

> String(/pattern/vgi)
'/pattern/giv'

Tip: named capture groups  

Our regular expression contains two positional capture groups. If we name them, they describe their purposes and we need less external documentation:

const RE_API_SIGNATURE =
  /^(?<prefix>new |get )?(?<name>[A-Za-z0-9_.\[\]]+)/v;

Tip: insignificant whitespace and line comments via #  

So far, the regular expression is still fairly hard to read. We can change that by adding spaces and line breaks. Since the built-in regular expression literals don’t allow us to do that, we use the library Regex+ which provides us with the template tag regex:

import {regex} from 'regex';

const RE_API_SIGNATURE = regex`
  ^
  (?<prefix>
    new \x20  # constructor
    |
    get \x20  # getter
  )?
  (?<name>
    # Square brackets are needed for symbol keys
    [
      A-Z a-z 0-9 _
      .
      \[ \]
    ]+
  )
`;

That’s much easier to read, right?

The feature of ignoring whitespace in regular expression patterns is called insignificant whitespace. Additionally, we used a feature called line comments – which are started by hash symbols (#).

Two observations:

  • Since all spaces are removed, we use the hex escape \x20 to express that there is a space after new and after get.
  • Alas, line comments are not allowed inside character classes. That’s why the comment about square brackets comes before the character class.

In the future, JavaScript may get built-in support for insignificant whitespace via a flag /x (ECMAScript proposal).

With the regex template tag, the following flags are always active:

  • Flag /v
  • Flag /x (emulated) enables insignificant whitespace and line comments via #.
  • Flag /n (emulated) enables named capture only mode, which prevents numbered groups from capturing. In other words: (pattern) is treated like (?:pattern).

Tip: Write tests for your regular expression  

To make sure that a regular expression works as intended, we can write tests for it. These are tests for RE_API_SIGNATURE:

assert.deepEqual(
  getCaptures(`get Map.prototype.size`),
  {
    prefix: 'get ',
    name: 'Map.prototype.size',
  }
);
assert.deepEqual(
  getCaptures(`new Array(len = 0)`),
  {
    prefix: 'new ',
    name: 'Array',
  }
);
assert.deepEqual(
  getCaptures(`Array.prototype.push(...items)`),
  {
    prefix: undefined,
    name: 'Array.prototype.push',
  }
);
assert.deepEqual(
  getCaptures(`Map.prototype[Symbol.iterator]()`),
  {
    prefix: undefined,
    name: 'Map.prototype[Symbol.iterator]',
  }
);

function getCaptures(apiSignature) {
  const match = RE_API_SIGNATURE.exec(apiSignature);
  // Spread so that the result does not have a null prototype
  // and is easier to compare.
  return {...match.groups};
}

Tip: Mention examples in your documentation  

Seeing strings that match helps with understanding what a regular expression is supposed to do:

/**
 * Matches API signatures – e.g.:
 * ```
 * `get Map.prototype.size`
 * `new Array(len = 0)`
 * `Array.prototype.push(...items)`
 * `Map.prototype[Symbol.iterator]()`
 * ```
 */
const RE_API_SIGNATURE = regex`
  ···
`;

Some documentation tools let us refer to unit tests in doc comments and show their code in the documentation. That’s a good alternative to what we have done above.

Bonus tip: reusing patterns via interpolation  

The Regex+ library lets us interpolate regular expression fragments (“patterns”), which helps with reuse. The following example defines a simple markup syntax that is reminiscent of HTML:

import { pattern, regex } from 'regex';

const LABEL = pattern`[a-z\-]+`;
const ARGS = pattern`
  (?<args>
    \x20+
    ${LABEL}
  )*
`;
const NAME = pattern`
  (?<name> ${LABEL} )
`;

const TAG = regex`
  (?<openingTag>
    \[
    \x20*
    ${NAME}
    ${ARGS}
    \x20*
    \]
  )
  |
  (?<singletonTag>
    \[
    \x20*
    ${NAME}
    ${ARGS}
    \x20*
    / \]
  )
`;

assert.deepEqual(
  TAG.exec('[pre js line-numbers]').groups,
  {
    openingTag: '[pre js line-numbers]',
    name: 'pre',
    args: ' line-numbers',
    singletonTag: undefined,
    __proto__: null,
  }
);

assert.deepEqual(
  TAG.exec('[hr /]').groups,
  {
    openingTag: undefined,
    name: 'hr',
    args: undefined,
    singletonTag: '[hr /]',
    __proto__: null,
  }
);

The regular expression TAG uses the regular expression fragments NAME and ARGS twice – which reduces redundancy.

Bonus tip: insignificant whitespace without a library  

With the following trick, we don’t need a library to write a regular expression with insignificant whitespace:

const RE_API_SIGNATURE = new RegExp(
  String.raw`
    ^
    (?<prefix>
      new \x20
      |
      get \x20
    )?
    (?<name>
      [
        A-Z a-z 0-9 _
        .
        \[ \]
      ]+
    )
  `.replaceAll(/\s+/g, ''), // (A)
  'v'
);
assert.equal(
  String(RE_API_SIGNATURE),
  String.raw`/^(?<prefix>new\x20|get\x20)?(?<name>[A-Za-z0-9_.\[\]]+)/v`
);

How does this code work?

  • String.raw means we don’t have to escape regular expression backslashes inside the literal. Due to the backticks (vs. single straight quotes or double straight quotes), the regular expression can span multiple lines.
  • .replaceAll() removes all whitespace (spaces, tabs, line breaks, etc.) so that the end result looks almost like the initial version of the regular expression. There is one difference, though: Since literal spaces are removed, we have to find a different way to specify that there is a space after new and after get. One option is the hex escape \x20: hexadecimal 20 (decimal 32) is the code point SPACE.
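The following sketch shows both ingredients in isolation. The variable names are only there for illustration:

```javascript
// With String.raw, the backslash stays a backslash
const raw = String.raw`\x20`;
const rawLength = raw.length; // 4 (backslash, x, 2, 0)

// In a normal template literal, \x20 is interpreted immediately
const cooked = `\x20`;
const cookedIsSpace = cooked === ' '; // true

// The raw hex escape survives the removal of whitespace...
const strippedRaw = String.raw`
  new \x20
`.replaceAll(/\s+/g, ''); // 'new' followed by backslash, x, 2, 0

// ...but an actual space does not
const strippedCooked = `
  new \x20
`.replaceAll(/\s+/g, ''); // 'new'
```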

We can even emulate inline comments like this:

// Template tag function
const cmt = () => '';
const RE = new RegExp(
  String.raw`
    a+ ${cmt`one or more as`}
  `.replaceAll(/\s+/g, ''),
  'v'
);
assert.equal(
  String(RE), '/a+/v'
);

Alas, it’s more syntactically noisy than I’d like.

Conclusion: This is how regular expressions are meant to be written  

One reason why many people don’t like regular expressions is that they find them difficult to read. However, that is much less of a problem with insignificant whitespace and comments. I’d argue that this is the proper way of writing regular expressions: Think of what JavaScript code would look like if we had to write it without whitespace and comments.

This blog post is an excerpt from “Exploring JavaScript” (free online)  

This blog post is a section in the chapter on regular expressions in my book “Exploring JavaScript” – which is free to read online.