Template strings: embedded DSLs in ECMAScript 6

[2011-09-20] dsl, esnext, dev, template literals, javascript

In ECMAScript 6, template strings [1] are a syntactic construct that facilitates the implementation of embedded domain-specific languages (DSLs) in JavaScript. They were originally called “quasi-literals”. This blog post explains how they work.

Warning: This blog post is slightly outdated. The terminology has changed:

  • Template literal (was: template string): `abc`
  • Tagged template (was: tagged template string): func`abc`
  • Tag function (was: template handler): func from previous item

Introduction  

The idea is as follows: A template string (short: a template) is similar to a string literal and a regular expression literal in that it provides a simple syntax for creating data. The following is an example.

templateHandler`Hello ${firstName} ${lastName}!`

This is just a compact way of writing (roughly) the following function call:

templateHandler(['Hello ', ' ', '!'], firstName, lastName)

Thus, the name before the content in backquotes is the name of a function to call, the template handler. The handler receives two different kinds of data:

  • Literal sections such as 'Hello '.
  • Substitutions such as firstName (delimited by a dollar sign and braces). A substitution can be any expression.

Literal sections are known statically, substitutions are only known at runtime.
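To make the two kinds of data concrete, here is a minimal sketch of a handler. It is only an illustration (the bracket marking is an assumption, not something the proposal prescribes): it reassembles the template and wraps each substitution in square brackets, so you can see which parts arrived as literal sections and which as substitutions.

```javascript
// Illustrative handler: reassembles the template and marks each substitution.
function templateHandler(literalSections, ...substitutions) {
    let result = literalSections[0];
    substitutions.forEach((value, i) => {
        // Substitution i sits between literal sections i and i + 1.
        result += '[' + value + ']' + literalSections[i + 1];
    });
    return result;
}

const firstName = 'Jane';
const lastName = 'Doe';
const greeting = templateHandler`Hello ${firstName} ${lastName}!`;
console.log(greeting); // Hello [Jane] [Doe]!
```

Calling templateHandler(['Hello ', ' ', '!'], firstName, lastName) directly produces the same string.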

Examples  

Template strings are quite versatile, because they become function calls and because the text that the function receives is structured. Therefore, you only need to write a new function to support a new domain-specific language. The following examples are taken from [2] (which you can consult for details):

Raw strings  

Raw strings are string literals with multiple lines of text and no interpretation of escaped characters.

let str = String.raw`This is a text
with multiple lines.
Escapes are not interpreted,
\n is not a newline.`;
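The difference between a raw string and an ordinary template is easy to check by comparing lengths. The variable names in this small sketch are made up for illustration.

```javascript
// Raw: the backslash and the letter n stay two separate characters.
const rawText = String.raw`a\nb`;
// Cooked (untagged template): \n becomes a single newline character.
const cookedText = `a\nb`;

console.log(rawText.length);    // 4
console.log(cookedText.length); // 3

// A real line break inside the backquotes is preserved in both cases.
const twoLines = String.raw`line 1
line 2`;
console.log(twoLines.split('\n').length); // 2
```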

Parameterized regular expression literals  

There are two ways of creating regular expression instances.

  • Statically, via a regular expression literal.
  • Dynamically, via the RegExp constructor.

If you use the latter way, it is because you have to wait until runtime so that all necessary ingredients are available: You are usually concatenating regular expression fragments and text that is to be matched verbatim. The latter has to be escaped properly (dots, square brackets, etc.). A regular expression handler re can help with this task:

re`\d+(${localeSpecificDecimalPoint}\d+)?`
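One possible way to write such a handler is sketched below. It is an assumption about how re could work, not the implementation from the proposal: the literal sections are taken raw (so that \d reaches the RegExp constructor unchanged) and every substitution is escaped so that it is matched verbatim. The helper escapeRegExp is hypothetical, and the anchors ^ and $ were added to the example only to make the effect of the escaping visible.

```javascript
// Hypothetical helper: escape characters that are special in regular expressions.
function escapeRegExp(text) {
    return String(text).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Sketch of a handler: raw literal sections, escaped substitutions.
function re(literalSections, ...substitutions) {
    const raw = literalSections.raw;
    let source = raw[0];
    substitutions.forEach((value, i) => {
        source += escapeRegExp(value) + raw[i + 1];
    });
    return new RegExp(source);
}

const localeSpecificDecimalPoint = '.';
const decimal = re`^\d+(${localeSpecificDecimalPoint}\d+)?$`;

console.log(decimal.source);       // ^\d+(\.\d+)?$
console.log(decimal.test('3.14')); // true
console.log(decimal.test('3x14')); // false
```

Without the escaping, the dot would match any character and '3x14' would be accepted.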

Query languages  

Example:

$`a.${className}[href=~'//${domain}/']`

This is a DOM query that looks for all <a> tags whose CSS class is className and whose target is a URL with the given domain. The template handler $ ensures that the arguments are correctly escaped, making this approach safer than manual string concatenation.

Text localization (L10N)  

There are two components to L10N: first the language and second the locale (how to format numbers, time, etc.). Consider the following message.

alert(msg`Welcome to ${siteName}, you are visitor
          number ${visitorNumber}:d!`);

The handler msg would work as follows.

First, the literal parts are concatenated to form a string that can be used to look up a translation in a table. An example for a lookup string is:

'Welcome to {0}, you are visitor number {1}!'

An example for a translation to German is:

'Besucher Nr. {1}, willkommen bei {0}!'

The English “translation” would be the same as the lookup string.

Second, the result from the lookup is used to display the substitutions. Because a lookup result includes indices, it can rearrange the order of the substitutions. That has been done in German, where the visitor number comes before the site name. How the substitutions are formatted can be influenced via annotations such as :d. This annotation means that a locale-specific decimal separator should be used for visitorNumber. Thus, a possible English result is:

Welcome to ACME Corp., you are visitor number 1,300!

In German, we have results such as:

Besucher Nr. 1.300, willkommen bei ACME Corp.!
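The two steps can be sketched roughly as follows. Everything here is an assumption made for illustration: the factory createMsg, the table translations, the map thousandsSeparators and the helper formatInteger are hypothetical names, and the :d annotation is reduced to grouping integer digits with a language-specific separator. A real implementation would use proper locale data.

```javascript
// Hypothetical translation table, keyed by the lookup string.
const translations = {
    de: {
        'Welcome to {0}, you are visitor number {1}!':
            'Besucher Nr. {1}, willkommen bei {0}!',
    },
};

// Hypothetical locale data: only the digit group separator is modeled.
const thousandsSeparators = { en: ',', de: '.' };

function formatInteger(value, lang) {
    // Insert a separator before each trailing group of three digits.
    return String(value).replace(/\B(?=(\d{3})+$)/g, thousandsSeparators[lang]);
}

function createMsg(lang) {
    return function msg(literalSections, ...substitutions) {
        const annotations = [];
        let key = literalSections[0];
        substitutions.forEach((_, i) => {
            // An annotation such as :d directly follows its substitution.
            const next = literalSections[i + 1];
            const match = /^:(\w+)/.exec(next);
            annotations.push(match ? match[1] : null);
            key += '{' + i + '}' + (match ? next.slice(match[0].length) : next);
        });
        // Step 1: look up the translation (fall back to the lookup string).
        const pattern = (translations[lang] || {})[key] || key;
        // Step 2: fill in the substitutions, formatted per annotation.
        return pattern.replace(/\{(\d+)\}/g, (_, index) => {
            const value = substitutions[index];
            return annotations[index] === 'd'
                ? formatInteger(value, lang)
                : String(value);
        });
    };
}

const siteName = 'ACME Corp.';
const visitorNumber = 1300;
const msgEn = createMsg('en');
const msgDe = createMsg('de');
const english = msgEn`Welcome to ${siteName}, you are visitor number ${visitorNumber}:d!`;
const german = msgDe`Welcome to ${siteName}, you are visitor number ${visitorNumber}:d!`;
console.log(english); // Welcome to ACME Corp., you are visitor number 1,300!
console.log(german);  // Besucher Nr. 1.300, willkommen bei ACME Corp.!
```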

Secure content generation  

With template strings, one can make a distinction between trusted content coming from the program and untrusted content coming from a user. For example:

safehtml`<a href="${url}">${text}</a>`

The literal sections come from the program, the substitutions url and text come from a user. The template handler safehtml can ensure that no malicious code is injected via the substitutions. For HTML, the ability to nest template strings is useful:

const rows = [['Unicorns', 'Sunbeams', 'Puppies'],
              ['<3', '<3', '<3']];
safehtml`<table>${
    rows.map(function(row) {
        return safehtml`<tr>${
            row.map(function(cell) {
                return safehtml`<td>${cell}</td>`
            })
        }</tr>`
    })
}</table>`

Explanation: The rows of the table are produced by an expression: the invocation of the method rows.map(). The result of that invocation is an array of strings that are produced by recursively invoking a template string. safehtml concatenates those strings and inserts them into the given frame. The cells for each row are produced in the same manner.
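For nesting to work, the handler has to tell apart content it produced itself from untrusted text, otherwise the inner <td> tags would be escaped again. The following sketch is one possible approach and rests on assumptions: the wrapper class SafeHtml and the helpers escapeHtml and toHtml are hypothetical, results are wrapper objects rather than plain strings, and only generic HTML escaping is done (a real handler would also have to treat contexts such as URLs in href attributes).

```javascript
// Hypothetical wrapper marking content that is already safe to insert.
class SafeHtml {
    constructor(html) { this.html = html; }
    toString() { return this.html; }
}

function escapeHtml(text) {
    return String(text)
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&#39;');
}

function toHtml(value) {
    if (value instanceof SafeHtml) return value.html;             // nested template result
    if (Array.isArray(value)) return value.map(toHtml).join('');  // e.g. result of map()
    return escapeHtml(value);                                     // untrusted content
}

function safehtml(literalSections, ...substitutions) {
    let html = literalSections[0];
    substitutions.forEach((value, i) => {
        html += toHtml(value) + literalSections[i + 1];
    });
    return new SafeHtml(html);
}

const rows = [['Unicorns', 'Sunbeams', 'Puppies'], ['<3', '<3', '<3']];
const table = safehtml`<table>${
    rows.map(row => safehtml`<tr>${
        row.map(cell => safehtml`<td>${cell}</td>`)
    }</tr>`)
}</table>`;
console.log(String(table));
```

In the output, each '<3' cell appears as &lt;3, while the tags that came from literal sections are left intact.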

Implementing a handler  

The following is a template string:

handlerName`lit1\n${subst1} lit2 ${subst2}`

This is transformed internally to a function call (adapted from [2]):

// Hoisted: call site ID
// “cooked”, newline interpreted
const callSiteId1234 = ['lit1\n', ' lit2 ', ''];
// “raw”, newline verbatim
callSiteId1234.raw = ['lit1\\n', ' lit2 ', ''];

// In-situ: handler invocation
handlerName(callSiteId1234, subst1, subst2)

The parameters of the handler are split into two categories:

  1. The call site ID (callSiteId1234 in the example above), where you get the literal parts both with escapes such as \n interpreted (“cooked”) and uninterpreted (“raw”). The number of literal parts is always one plus the number of substitutions. If a substitution comes first in a template, it is preceded by an empty literal part. If a substitution comes last, it is followed by an empty literal part (as in the example above).

  2. The substitutions, whose values become trailing parameters.
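You can observe both categories with a handler that simply returns what it receives. The handler name inspect and the sample values are made up for this sketch.

```javascript
// Illustrative handler: returns copies of everything it is given.
function inspect(callSite, ...substitutions) {
    return {
        cooked: Array.from(callSite),   // escapes interpreted
        raw: Array.from(callSite.raw),  // escapes verbatim
        substitutions,
    };
}

const subst1 = 1;
const subst2 = 2;

const parts = inspect`lit1\n${subst1} lit2 ${subst2}`;
console.log(parts.cooked);        // [ 'lit1\n', ' lit2 ', '' ]
console.log(parts.raw);           // [ 'lit1\\n', ' lit2 ', '' ]
console.log(parts.substitutions); // [ 1, 2 ]

// A template consisting of a single substitution still has two literal parts.
const single = inspect`${subst1}`;
console.log(single.cooked);       // [ '', '' ]
```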

The idea is that the same literal might be executed multiple times (e.g. in a loop); with the call site ID, the handler can cache data from previous invocations. (1) is potentially cacheable data, (2) changes with each invocation.
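A minimal sketch of such caching, assuming a WeakMap keyed by the call site object: the handler does its (here trivial) preparation of the literal parts only once per template and reuses the result on later invocations. The names cachingHandler, preparations and greet are hypothetical, and upper-casing merely stands in for expensive work such as parsing.

```javascript
const cache = new WeakMap();
let preparations = 0; // counts how often the literal parts were processed

function cachingHandler(callSite, ...substitutions) {
    let prepared = cache.get(callSite);
    if (prepared === undefined) {
        preparations++;
        // Stand-in for expensive work on the static literal parts.
        prepared = callSite.map(part => part.toUpperCase());
        cache.set(callSite, prepared);
    }
    let result = prepared[0];
    substitutions.forEach((value, i) => {
        result += value + prepared[i + 1];
    });
    return result;
}

function greet(name) {
    // The same template is evaluated on every call of greet().
    return cachingHandler`Hello ${name}!`;
}

const results = [];
for (const name of ['Jane', 'John', 'Joe']) {
    results.push(greet(name));
}
console.log(results);      // [ 'HELLO Jane!', 'HELLO John!', 'HELLO Joe!' ]
console.log(preparations); // 1
```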

Conclusion  

As you can see, there are many applications for template strings. You might wonder why ECMAScript 6 does not introduce a full-blown macro system. That is because it is quite difficult to create a macro system for a language whose syntax is as complex as JavaScript’s. Work on macros is ongoing (see Mozilla’s sweet.js), but will take time. With luck, we will see macros in ECMAScript 8 [3] (which could arrive as early as 2018).

Acknowledgement. Thanks to Brendan Eich, Mark S. Miller, Mike Samuel, and Allen Wirfs-Brock for answering my template-string-related questions on the es-discuss mailing list.

References  


  1. ECMAScript 6 specification: “Template Literals”

  2. ECMAScript Quasi-Literals [proposal for ECMAScript 6]

  3. A first look at what might be in ECMAScript 7 and 8