JavaScript needs more helper functions for iteration (map, filter, etc.) – where should we put them?

[2021-08-09] dev, javascript, iteration

Iteration is a standard that connects operations with data containers: Each operation that follows this standard can be applied to each data container that implements this standard.

In this blog post:

  • We first explore three questions:
    • How does JavaScript’s iteration work?
    • What are its quirks?
    • What do helper functions for iteration look like? Examples include iteration versions of the Array methods .map(), .filter(), and .forEach().
  • Next, we examine the pros and cons of several approaches for implementing helper functions: As methods of data containers? As functions? Etc. Two of these approaches are supported by concrete proposals.
  • The post concludes by explaining additional benefits of iteration and iteration-based helpers.

JavaScript iteration and its quirks  

What is iteration?  

Iteration was added to JavaScript in ECMAScript 6. There are two sides to this protocol (interfaces and rules for using them):

  • A data producer (such as a data structure) can implement the iteration protocol and expose its output (or content) through it.
  • A data consumer (such as an algorithm) can retrieve its input via the iteration protocol.

A data producer that implements the iteration protocol is called an iterable. That term is also used as an adjective: “an iterable data structure”.

A key benefit of iteration is that each data consumer that uses iteration can be used with each iterable data producer.

The JavaScript standard library already has several iteration-based data producers and data consumers – for example:

  • Data producers:
    • Arrays, Maps, Sets, strings
    • Results of array.keys() (iterable objects that are not Arrays)
    • Results of map.entries() (iterable objects that are not Arrays)
  • Data consumers
    • for-of
    • Array.from()
    • Spreading into Arrays ([...input])
    • Spreading into function calls (func(...input))
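
For example, the same consumers can be applied to different producers. The following snippet uses Node.js-style assertions, like the other examples in this post:

const map = new Map([['x', 1], ['y', 2]]);
const set = new Set(['a', 'b']);

// Spreading consumes the iterator returned by map.keys():
assert.deepEqual([...map.keys()], ['x', 'y']);
// Array.from() accepts any iterable, here a Set and a string:
assert.deepEqual(Array.from(set), ['a', 'b']);
assert.deepEqual(Array.from('ab'), ['a', 'b']);
// for-of accepts any iterable, here a Map, whose values are [key, value] pairs:
for (const [key, value] of map) {
  console.log(key, value);
}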

Alas, JavaScript does not yet support many iteration-based algorithms. These are three examples of helper functions that would be useful:

  • map: returns an iterable over the results of invoking a callback on each value of an iterable.
  • filter: returns an iterable over all values of an iterable for which a callback returns true.
  • forEach: invokes a callback with each value of an iterable.

Both input and output of map and filter are iterable, which means that we can chain these operations.
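
To give a first impression, here is a minimal generator-based sketch of map and filter (the approaches for actually providing such helpers are examined later in this post). Because both take and produce iterables, the calls can be nested:

// Minimal sketches of iteration-based map() and filter()
function* map(mapFn, iterable) {
  for (const value of iterable) {
    yield mapFn(value);
  }
}
function* filter(filterFn, iterable) {
  for (const value of iterable) {
    if (filterFn(value)) {
      yield value;
    }
  }
}

// Input and output are both iterable, so we can chain the operations:
assert.deepEqual(
  [...map(x => x * 2, filter(x => x >= 0, [-1, 2, -3, 4]))],
  [4, 8]
);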

Core iteration entities: iterables and iterators  

The two most important entities in the iteration protocol are:

  • Iterable: This entity is the container that holds data. It exposes that data by being a factory for iterators.
  • Iterator: This entity returns each value contained in an iterable, one at a time (think cursor in a database).

An object obj becomes iterable by implementing one method:

  • obj[Symbol.iterator](): This method returns iterators.

An iterator iter is an object that delivers values via one method:

  • iter.next(): This method returns objects with two properties:
    • .value: contains the current value
    • .done: is false as long as there are still values and true afterwards.

This is what using iteration looks like in practice:

> const iterable = ['a', 'b'];
> const iterator = iterable[Symbol.iterator]();
> iterator.next()
{ value: 'a', done: false }
> iterator.next()
{ value: 'b', done: false }
> iterator.next()
{ value: undefined, done: true }
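
A data consumer such as for-of builds on this protocol. The following function is a rough sketch of what such a consumer does internally (collectValues() is a made-up name, not a built-in function):

// A rough sketch of how a consumer such as for-of uses the protocol
function collectValues(iterable) {
  const iterator = iterable[Symbol.iterator]();
  const values = [];
  let result;
  while (!(result = iterator.next()).done) {
    values.push(result.value);
  }
  return values;
}

assert.deepEqual(collectValues(['a', 'b']), ['a', 'b']);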

Iterators that are also iterable  

When implementing an iterable, a common technique is to make that iterable also an iterator:

function iterArgs(...args) {
  let index = 0;
  const iterable = {
    [Symbol.iterator]() {
      return this; // (A)
    },
    next() {
      if (index >= args.length) {
        return {done: true};
      }
      const value = args[index];
      index++;
      return {value, done: false};
    }
  };
  return iterable;
}

const iterable1 = iterArgs('a', 'b', 'c');
assert.deepEqual([...iterable1], ['a', 'b', 'c']);

In line A, we don’t return a new object; we return this.

This technique has three upsides:

  1. Our code becomes simpler.
  2. We can iterate over iterators.
  3. Generator functions and methods can be used to implement both iterables and iterators.

We’ll first examine upside #2 and then upside #3.

We can iterate over iterable iterators  

In line B in the following code, we can continue the iteration that we started in line A.

const iterable2 = iterArgs('a', 'b', 'c');
const iterator2 = iterable2[Symbol.iterator]();

const firstItem = iterator2.next().value; // (A)
assert.equal(firstItem, 'a');

const remainingItems = [...iterator2]; // (B)
assert.deepEqual(
  remainingItems, ['b', 'c']);

Generators return iterable iterators and can implement both iterables and iterators  

All iterators created by the JavaScript standard library are iterable. The objects returned by generators are also both iterators and iterables.

Therefore, we can use generators to implement iterables:

function* iterArgs(...args) {
  for (const arg of args) {
    yield arg;
  }
}
const iterable = iterArgs('red', 'green', 'blue');
assert.deepEqual(
  Array.from(iterable),
  ['red', 'green', 'blue']
);

But we can also use them to implement iterators (line A):

class ValueContainer {
  #values;
  constructor(...values) {
    this.#values = values;
  }
  * [Symbol.iterator]() { // (A)
    for (const value of this.#values) {
      yield value;
    }
  }
}

const iterable = new ValueContainer(1, 2, 3);
assert.deepEqual(
  Array.from(iterable),
  [1, 2, 3]
);

A downside of iterable iterators  

With iterable iterators, we now have two kinds of iterables.

On one hand, there are iterables over which we can iterate as often as we want:

function iterateTwice(iterable) {
  return [...iterable, ...iterable];
}

const iterable1 = ['a', 'b'];
assert.deepEqual(
  iterateTwice(iterable1),
  ['a', 'b', 'a', 'b']);

On the other hand, there are iterables over which we can only iterate once:

// Array.prototype.values() returns an iterable iterator (unlike Object.values(), which returns an Array)
const iterable2 = ['a', 'b'].values();
assert.deepEqual(
  iterateTwice(iterable2),
  ['a', 'b']);

function* gen() {
  yield 'a';
  yield 'b';
}
const iterable3 = gen();
assert.deepEqual(
  iterateTwice(iterable3),
  ['a', 'b']);

Conceptually, things have become more confusing:

  • On one hand, for-of, spreading, etc. only accept iterables.
  • On the other hand, constructs such as generators, array.keys(), and map.entries() return iterators that just happen to also be iterable.

%IteratorPrototype%: the prototype of all iterators in the standard library  

In the ECMAScript specification, internal objects are enclosed in percent signs. One such object is %IteratorPrototype%:

// We omit the percent signs so that it’s legal JavaScript
const IteratorPrototype = {
  [Symbol.iterator]() {
    return this;
  },
};

This method is inherited by all objects that have %IteratorPrototype% as a prototype and makes them iterable – if this is also an iterator.

Even though %IteratorPrototype% is not directly accessible from JavaScript, we can access it indirectly:

const IteratorPrototype = Object.getPrototypeOf(
  Object.getPrototypeOf(
    [][Symbol.iterator]()
  )
);

%IteratorPrototype% is in the prototype chain of all iterators that are created by the standard library:

> IteratorPrototype.isPrototypeOf('abc'[Symbol.iterator]())
true
> IteratorPrototype.isPrototypeOf([].keys())
true
> IteratorPrototype.isPrototypeOf(new Map().entries())
true
> IteratorPrototype.isPrototypeOf((function* () {})())
true
> IteratorPrototype.isPrototypeOf('aaa'.matchAll(/a/g))
true
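
The inherited method [Symbol.iterator]() is what makes all of these iterators iterable. As an illustration (countToTwo is made up for this example), an iterator whose prototype is %IteratorPrototype% can immediately be used with iterable-based constructs such as spreading:

// An iterator that inherits from %IteratorPrototype% is automatically iterable
let counter = 0;
const countToTwo = Object.create(IteratorPrototype);
countToTwo.next = function () {
  counter++;
  return counter <= 2
    ? {value: counter, done: false}
    : {value: undefined, done: true};
};

// Spreading works because [Symbol.iterator]() is inherited:
assert.deepEqual([...countToTwo], [1, 2]);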

One of the proposals for iteration helpers depends on the fact that %IteratorPrototype% is a prototype of many iterators. We’ll get into the details when we look at that proposal.

Where to put helper functions for iterables?  

Approach: methods of iterables  

If we want to be able to method-chain iteration helpers like we can with Arrays, the conceptually cleanest way is to make those helpers methods of iterable objects. That could look as follows.

class Iterable {
  * map(mapFn) {
    for (const item of this) {
      yield mapFn(item);
    }
  }
  * filter(filterFn) {
    for (const item of this) {
      if (filterFn(item)) {
        yield item;
      }
    }
  }
  toArray() {
    return [...this];
  }
}
class Set2 extends Iterable {
  #elements;
  constructor(elements) {
    super();
    // The real Set eliminates duplicates here
    this.#elements = elements;
  }
  [Symbol.iterator]() {
    return this.#elements[Symbol.iterator]();
  }
  // ···
}

Now we can do:

const arr = new Set2([0, -1, 3, -4, 8])
  .filter(x => x >= 0)
  .map(x => x * 2)
  .toArray()
;
assert.deepEqual(
  arr, [0, 6, 16]
);

Upsides and downsides of methods of iterables  

If we want our helpers to be methods, then iterables are the right location for them. Apart from chaining, another benefit of methods is that specific classes can override a default implementation of an operation if they can implement it more efficiently (via the specific features of the class).

Alas, we are facing a fatal obstacle: We can’t change the inheritance hierarchies of existing classes (especially not of Array). That makes it impossible to use this approach.

Approach: wrapping iterables  

Another way of enabling method chains is a technique that the libraries jQuery and Underscore made popular in the JavaScript world: We wrap the objects that we want to operate on and add methods to them that way:

function iter(data) {
  return {
    filter(filterFn) {
      function* internalFilter(data) {
        for (const item of data) {
          if (filterFn(item)) {
            yield item;
          }
        }
      }
      return iter(internalFilter(data));
    },
    map(mapFn) {
      function* internalMap(data) {
        for (const item of data) {
          yield mapFn(item);
        }
      }
      return iter(internalMap(data));
    },
    toArray() {
      return [...data];
    },
  };
}

const arr = iter(new Set([0, -1, 3, -4, 8]))
  .filter(x => x >= 0)
  .map(x => x * 2)
  .toArray()
;
assert.deepEqual(
  arr, [0, 6, 16]
);

  • Upside of this approach: It’s conceptually clean. No work-arounds are required; we don’t need to change iteration in any way.
  • Another upside: We can method-chain helper invocations.
  • Downside of this approach: Wrapping introduces extra overhead.

Approach: introducing a superclass for iterators  

Yet another way of getting method chaining is by adding helper methods to iterators. There is an ECMAScript language proposal for this approach.

How does that work? As we have seen, all iterators created by the standard library already have the object %IteratorPrototype% in their prototype chains. We can turn this prototype object into a full class:

const IteratorPrototype = Object.getPrototypeOf(
  Object.getPrototypeOf(
    [][Symbol.iterator]()
  )
);

// “Class” based on IteratorPrototype
function Iterator() {}
Iterator.prototype = IteratorPrototype;

Iterator.prototype.filter = function* (filterFn) {
  for (const item of this) {
    if (filterFn(item)) {
      yield item;
    }
  }
};

Iterator.prototype.map = function* (mapFn) {
  for (const item of this) {
    yield mapFn(item);
  }
};

Iterator.prototype.toArray = function () {
  return [...this];
};

Interestingly, due to how instanceof works and due to Iterator.prototype being a prototype of each iterator in the standard library, many objects are already instances of Iterator:

> 'abc'[Symbol.iterator]() instanceof Iterator
true
> [].values() instanceof Iterator
true
> new Map().entries() instanceof Iterator
true
> (function* () {})() instanceof Iterator
true
> 'aaa'.matchAll(/a/g) instanceof Iterator
true

This is what using the new approach looks like:

const arr = new Set([0, -1, 3, -4, 8])
  .values() // get iterator
  .filter(x => x >= 0)
  .map(x => x * 2)
  .toArray()
;

assert.deepEqual(
  arr, [0, 6, 16]
);

Handling the different kinds of iterables  

With this approach, we have to distinguish three kinds of iterables:

  • Iterables that are not iterators (Arrays, Sets, Maps, results of Object.keys(), etc.)
  • Iterable iterators that extend Iterator (array.keys(), map.entries(), etc.)
  • Iterable iterators that don’t extend Iterator (in existing code).

The proposal provides the tool function Iterator.from() to handle all these cases in the same manner (a rough sketch of its behavior follows this list):

  • If an iterable is not an iterator, it returns its iterator.
  • If an iterable is an iterator and extends Iterator, it returns it as is.
  • If an iterable is an iterator and does not extend Iterator, it wraps that iterable so that it has the methods of Iterator.
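
The following is a rough sketch of that behavior, not the proposal’s actual algorithm. The function is called iteratorFrom() here so that it doesn’t clash with the proposed Iterator.from(); Iterator refers to the class that we derived from %IteratorPrototype% earlier:

// Rough sketch of the behavior of Iterator.from()
function iteratorFrom(iterable) {
  // For a non-iterator iterable, this retrieves its iterator;
  // for an iterable iterator, this usually returns the iterator itself.
  const iterator = iterable[Symbol.iterator]();
  if (iterator instanceof Iterator) {
    return iterator; // already has the helper methods
  }
  // Wrap the iterator so that it inherits the methods of Iterator
  const wrapped = Object.create(Iterator.prototype);
  wrapped.next = (...args) => iterator.next(...args);
  return wrapped;
}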

Upsides of iterator methods  

  • As with the previous two approaches, we can chain methods.
  • Classes can override default helper implementations with more efficient ones.

Downsides of iterator methods  

  • Requiring all iterators to extend Iterator is a significant change of the original iteration protocol.
    • Not all existing code will be updated to meet this requirement.
    • Some iterators may not be able to meet this requirement (e.g. subclasses).
  • The location of the new methods is not ideal. When we want to apply an operation to an iterable, we have to distinguish: non-iterator iterables, iterable iterators that extend Iterator, and iterable iterators that don’t extend Iterator.
    • This can be handled via Iterator.from(), but then we get an API that is like the wrapping approach.
  • If helper methods are intended for iterables then putting them in iterators is a conceptual mismatch.
    • This becomes especially obvious when using Iterator.from(): It converts an iterable to an iterator, then we invoke iterator methods, then we use a construct such as for-of to process the result. That construct only accepts iterables, but instances of Iterator also happen to be iterable.
    • If an operation accepts more than one operand (such as zip()), these operands would be iterables, while this would be an instance of Iterator.
    • A practical consequence of the mismatch is that we have to create an iterator before we can apply a helper to a non-iterator iterable.
    • I agree that the status quo w.r.t. iterables vs. iterators is already slightly confusing. I’d prefer not to make things worse.
  • The set of built-in operations can’t be extended. If a library wants to provide more iterator helpers, they cannot be added to Iterator and would probably be functions.
  • Different from prior art: Established libraries such as JavaScript’s Underscore/Lodash library and Python’s itertools are based on functions.

Approach: functions  

The established way of providing helpers for iterables is actually not via methods but via functions. Examples include Python’s itertools and JavaScript’s Underscore/Lodash.

The library @rauschma/iterable prototypes what that would look like. It is implemented roughly like this:

const Iterable = {
  * filter(filterFn, iterable) {
    for (const item of iterable) {
      if (filterFn(item)) {
        yield item;
      }
    }
  },

  * map(mapFn, iterable) {
    for (const item of iterable) {
      yield mapFn(item);
    }
  },

  toArray(iterable) {
    return [...iterable];
  },

  // ···
};

const {filter, map, toArray} = Iterable;

const set = new Set([0, -1, 3, -4, 8]);
const filtered = filter(x => x >= 0, set);
const mapped = map(x => x * 2, filtered);
const arr = toArray(mapped);

assert.deepEqual(
  arr, [0, 6, 16]
);

Note that Iterable isn’t really an object but a namespace/pseudo-module for the functions, similar to JSON and Math.

Should JavaScript get a pipeline operator, we could even use these functions as if they were methods:

const {filter, map, toArray} = Iterable;

const arr = new Set([0, -1, 3, -4, 8])
  |> filter(x => x >= 0, ?)
  |> map(x => x * 2, ?)
  |> toArray(?)
;

Upsides of functions  

  • Helpers as functions follow the tradition established by Underscore/Lodash and others.
  • Conceptually, functions work well as iterable helpers: operands and results are always iterables.
  • The set of helpers can be extended by anyone, because any function will be in the same category as the built-in helpers.
    • Especially functions that create iterables (such as range(start, end)) will be important additions to this category (a sketch of such a function follows this list).
  • The current iteration protocol does not have to change.
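
As a sketch of such an addition, range() below is a user-defined generator function (it is not built in) and works seamlessly with the helpers of the Iterable namespace:

// A user-defined producer of iterables (not built in)
function* range(start, end) {
  for (let i = start; i < end; i++) {
    yield i;
  }
}

const {map, toArray} = Iterable;
assert.deepEqual(
  toArray(map(x => x * x, range(1, 4))),
  [1, 4, 9]
);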

Downsides of functions  

Functions won’t allow method chaining. However:

  • In my experience, chains are rarely long. And even with long chains, I don’t mind introducing intermediate variables (as was done in the example before the previous one).
  • We may get a pipeline operator in JavaScript and then can chain helper functions (as in the previous example). Note that the pipeline operator is nice to have for functions, but not a requirement. Their other benefits stand on their own.

If we are not chaining, functions are convenient:

const filtered = filter(x => x, iterable);

In contrast, this is what using iterator methods looks like:

const filtered1 = iterable.values().filter(x => x);
const filtered2 = Iterator.from(iterable).filter(x => x);

More benefits of iteration and iteration helpers  

In this blog post, we have only seen synchronous iteration: We immediately get a new item when we request it from an iterator.

But there is also asynchronous iteration where code pauses until the next item is available.

One important use case for asynchronous iteration is processing streams of data (e.g. web streams and Node.js streams).

Even if we apply multiple operations to an asynchronous iterable, the input data is processed one item at a time. That enables us to work with very large datasets because we don’t have to keep all of them in memory. And we see output much more quickly (compared to first applying the first operation to the complete dataset, then the second operation to the result, etc.).
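
As a brief sketch (asyncLines() is a made-up data source), this is what consuming an asynchronous iterable looks like. The for-await-of loop pauses until the next item is available:

// Sketch: an asynchronous iterable, consumed one item at a time
async function* asyncLines() {
  yield 'first line';
  yield 'second line';
}

async function logLines() {
  for await (const line of asyncLines()) {
    // Each line is logged as soon as it becomes available
    console.log(line);
  }
}
logLines();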

Conclusion  

We have explored iteration and four approaches for implementing iteration helpers:

  1. Methods of iterables
  2. Wrapping iterables
  3. Introducing a superclass for iterators (proposal)
  4. Functions (proposal)

I expect (at most) one of these approaches to be added to JavaScript. How should we choose?

Given all the constraints we are facing, my favorite is (4):

  • We don’t have to change the current iteration protocol.
  • Operations on iterables are conceptually cleaner and easier to use.
  • The built-in set of operations can be extended.

JavaScript may eventually get a pipeline operator. That would even enable us to chain functions. But the pipeline operator is not required to make functions an appealing solution (see “Downsides of functions” above).

If methods are preferred over functions  

If methods are preferred over functions, I’d argue in favor of wrapping (as done by jQuery and supported by Underscore/Lodash). Compare:

// Iterator methods
const arr1 = new Set([0, -1, 3, -4, 8])
  .values() // get iterator
  .filter(x => x >= 0)
  .map(x => x * 2)
  .toArray()
;

// Wrapping
const arr2 = iter(new Set([0, -1, 3, -4, 8]))
  .filter(x => x >= 0)
  .map(x => x * 2)
  .toArray()
;

A module-based standard library?  

Using global variables as namespaces for functions is a common pattern in JavaScript. Examples include JSON, Math, and the namespace object Iterable proposed earlier in this post.

There is a proposal for built-in modules in JavaScript (i.e., a module-based standard library). Built-in modules would provide functionality such as what is listed above and could also contain function-based iteration helpers:

import {map, filter, toArray} from 'std:Iterable';

Further reading on iteration  

Chapters in my book “JavaScript for impatient programmers” (which is free to read online):