ReasonML: external and internal iteration

This blog post explains patterns and techniques for external and internal iteration in ReasonML.

The GitHub repository `reasonml-demo-iterators` has the source code of the examples.

Iteration  #

Many algorithms operate on series of values. We can apply such algorithms to any data structure that lets us iterate over its elements (visit them sequentially, one element at a time). Examples of such algorithms:

• Counting the number of items in a series.
• Computing the average of all numbers in a series.

Two ways of implementing iteration are: externally and internally.

(In this blog post, I’m using the term item for an iterated value and the term element for a value stored in a data structure.)

External iteration  #

With external iteration, we ask for each element of the data structure (pull). In OOP languages such as JavaScript, that looks as follows.

``````> const arr = ["a", "b"];
> const iter = arr[Symbol.iterator]();

> iter.next()
{ value: 'a', done: false }
> iter.next()
{ value: 'b', done: false }
> iter.next()
{ value: undefined, done: true }
``````

We first create an iterator `iter` for the data structure `arr`. Then we invoke `iter.next()` repeatedly until the result signals us that all values were visited, via `done: true`.

Internal iteration  #

With internal iteration, the data structure feeds us its elements (push). In OOP languages such as JavaScript, that looks as follows.

``````> const arr = ["a", "b"];
> arr.forEach(x => console.log(x));
a
b
``````

The parameter of `arr.forEach()` is a callback that successively receives each element and logs it to the console.

Abstractions for iterations  #

The next step for us will be to introduce abstractions for iteration:

• We have already seen an iterator, an abstraction for external iteration. However, for ReasonML, we want an abstraction that follows the style of functional programming (not object-oriented programming).
• It is not yet clear what an abstraction for internal iteration should look like. But, as we will see, there is one.

Abstractions for iteration have two benefits:

• Algorithms that operate on series of values can use such abstractions as their inputs. Then they automatically work with all data structures that support the abstractions (instead of having to be implemented for each data structure separately).
• Combinators are operations consuming and/or producing iteration abstractions. A combinator works with any data source that supports its iteration abstraction. It also computes on demand: To process the lines in a large text file without some kind of iteration abstraction (think streams), we need to read them all into a data structure (e.g. a list or an array) and then work with that data structure. If the lines are handled via nested combinators, processing can start much earlier – as soon as the first line is available. Combinators for iteration abstractions are similar to combinators for lists (which are also linear) such as `List.length`, `List.map()` and `List.filter()`.

It is interesting that iteration abstractions (such as iterators and streams) are based on mutable state at their core: the same entity produces different results when you access it in the same manner several times. However, they also feel quite functional, due to combinators.

Abstracting external iteration in ReasonML  #

In OOP, the abstraction for external iteration is an object, the so-called iterator.

In ReasonML, we also call the abstraction an iterator, but we use a nullary function that we call repeatedly to produce the iterated values. As an example, let’s use the conversion function `ofList()` (which is explained soon) to create an iterator `gen()` for a list `myList`:

``````# let myList = ["a", "b", "c"];
# let gen = ofList(myList);

# gen();
Some("a")
# gen();
Some("b")
# gen();
Some("c")
# gen();
None
``````

`gen()` returns values of type `option(string)`, a variant with two constructors:

• Elements `x` of `myList` are returned as `Some(x)`.
• The end of the iteration is indicated via `None`.

The following type is for iterators that produce values of type `'a`:

``````type gen('a) = unit => option('a);
``````

Within the ReasonML/OCaml ecosystem, iterators are also called enumerations and generators (due to them being producers of data, muddling the difference between interface and implementation).

To better understand how generators work, let’s look at code that produces and consumes them.

Converting lists to generators  #

This is how `ofList()` can be implemented:

``````let ofList = (l: list('a)): gen('a) => {
let current = ref(l);
() => { /* A */
switch (current^) {
| [] => None
current := tail;
}
};
};
``````

The last thing `ofList()` does (in line A), is return the generator, a nullary function. That function uses the ref cell `current` (think mutable variable) to store the next iteration value. If the list, pointed to by `current`, is not yet empty, the generator returns `Some(head)`. Otherwise, it returns `None`.

Converting generators to lists  #

The inverse process, collecting iteration values in a list, works as follows.

``````let toList = (g: gen('a)): list('a) => {
let rec aux = (acc) =>
switch (g()) {
| None => acc
| Some(x) => aux([x, ...acc])
};
List.rev(aux([]));
};
``````

We repeatedly call `g()` and collect its results in the accumulator parameter `acc` that we return once `aux()` is finished. The resulting list is in the wrong order, which is why we reverse it via `List.rev()`.

The following code is a test for `toList()`.

``````let count = ref(0);
let myGen = () =>
switch (count^) {
| 0 =>
count := count^ + 1;
Some("hello");
| 1 =>
count := count^ + 1;
Some("world");
| 2 => None
};

expect(toList(myGen)) |> toBe(["hello", "world"]);
``````

`myGen` is a hand-written generator that uses the ref cell `count` to produce a series of values. The last line checks if `toList()` properly converts `myGen` to a list. The assertion is based on the `bs-jest` API.

Generator combinator: `length()`#

Let’s implement our first combinator: A function `length` whose input is a generator and whose output is the number of items in that generator:

``````let length = (g: gen('a)): int => {
let rec aux = (len) =>
switch (g()) {
| None => len
| Some(_) => aux(len+1)
};
aux(0);
};

expect(length(ofList([1, 2, 3]))) |> toBe(3);
``````

We use the helper function `aux()` to loop over all elements in the generator `g`. While we are doing so, we increment the counter `len`. Once the generator is finished, we return `len`.

Generator combinator: `map()`#

The generator combinator `map` has as input a function `f` and a generator `g` and returns a new generator. Each item of the result is derived from an item of `g` by applying `f` to it:

``````let map = (~f: 'a => 'b, in_: gen('a)): gen('b) => {
let out = () =>
switch (in_()) {
| None => None /* A */
| Some(x) => Some(f(x)) /* B */
};
out;
};
``````

This is a test for `map()`:

``````let genIn = ofList([1, 2, 3]);
let genOut = map(~f = x=>2*x, genIn);
expect(toList(genOut)) |> toEqual([2, 4, 6]);
``````

The result is the function `out()`. It derives its results from `g()`: If `g()` is done then so are we (line A). Otherwise, we need to apply `f()` to the iterated value `x` (line B).

Generator combinator: `filter()`#

The combinator `filter()` copies only those elements of its input generator to its output generator for which its callback parameter `f` returns `true`:

``````let filter = (~f: 'a => bool, in_: gen('a)): gen('a) => {
let rec out = () =>
switch (in_()) {
| None => None
| Some(x) when f(x) => Some(x) /* A */
| _ => out() /* B */
};
out;
};
``````

This is a test for `filter()`:

``````let genIn = ofList([1, -2, 3, -4]);
let genOut = filter(~f = x=>x >= 0, genIn);
expect(toList(genOut)) |> toEqual([1, 3]);
``````

With `filter()`, you pass on whatever the input generator `in_()` produces (line A), with one exception: If `f()` returns `false` for an iteration value, you immediately advance to the next result of `in_()` (line B).

Abstracting internal iteration in ReasonML  #

Let’s briefly review what internal iteration looks like. The following code is an implementation of `ListLabels.iter()`.

``````let iter = (~f: 'a => unit, theList: list('a)) => {
let rec aux = (l) =>
switch (l) {
| [] => ()
aux(tail);
};
aux(theList);
};
``````

`iter()` uses the helper function `aux()` to loop over the elements of `theList` and apply `f` to them.

This is what using `iter()` looks like:

``````let result = ref([]);
let f = (x) => result := [x, ...result^]; /* A */
iter(~f, ["a", "b", "c"]);
expect(result^) |> toEqual(["c", "b", "a"]);
``````

Given that `f()` does not return anything, it must do its work via side effects.

Each step prepends a new item `x` in front of the existing items in `result` (line A). That’s why the final result is the original list in reverse.

`iter()` is elegant (if you need or want to use side effects), but it has one disadvantage: It is not an abstraction that can be returned by combinators.

An abstraction for internal iteration  #

So what do we want from an abstraction for internal iteration?

• It should work much like `iter()`.
• It should be a type of value that data structures can be mapped to and that combinators can work with.

The solution is as follows. A function that creates an internal iterator for a data structure simply returns the function `aux` that performs internal iteration (a partially applied `iter()`, if you will). That gives us the following type for sequences (internal iterators):

``````type sequence('a) = ('a => unit) => unit;
``````

A sequence is a function that takes a callback of type `'a => unit` and feeds it values of type `'a`.

Converting lists to sequences  #

Let’s examine how the type `sequence` works, by implementing a function that returns a sequence for a list:

``````let ofList = (theList: list('a)): sequence('a) =>
(f: 'a => unit) => {
let rec aux = (l) =>
switch (l) {
| [] => ()
aux(tail);
};
aux(theList);
};
``````

As you can see, this is a curried version of `iter()`. The following code tests `ofList()`.

``````let result = ref([]);

let seq = ofList(["a", "b", "c"]);
seq(x => result := [x, ...result^]);

expect(result^) |> toBe(["c", "b", "a"]);
``````

Combinators for sequences  #

Let’s take a brief look at how combinators are implemented for internal iterators.

On one hand, computing results is more difficult than with external iterators – you need to rely on side effects to compute and return the result:

``````let length = (seq: sequence('a)) => {
let result = ref(0);
seq((_) => { result := result^ + 1 });
result^;
};
``````

On the other hand, producing an output iterator is simpler, because you decide when you send data to it:

``````let filter = (~f: 'a => bool, seqIn: sequence('a)): sequence('a) => {
let seqOut = (out: 'b => unit) =>
seqIn((in_: 'a) =>
switch (f(in_)) {
| true => out(in_); /* send value to output */
| false => (); /* skip value */
}
);
seqOut;
};
``````

Libraries for iteration  #

These are three OCaml libraries supporting iteration:

• `Gen` supports generators (external iterators).
• Tip for reading the docs: check out included modules and sub-modules, they provide much interesting functionality.
• `BatEnum` supports enumerations (external iterators).
• `Sequence` supports sequences (internal iterators).

In the following section, we’ll use the `Gen` library to demonstrate more iteration concepts.

Folding iterators  #

One important operation for processing iterators is folding. We’ll use the `Gen` library to explore how that works. The following function `len()` counts the number of items produced by a generator `gen`:

``````let len = (gen: Gen.t('a)) => {
let foldFunc = (state: int, _x: 'a) =>
state + 1;
Gen.fold(foldFunc, 0, gen);
};
``````

This is a test for a generator with zero items:

``````let gen = Gen.of_list([]);
expect(len(gen)) |> toBe(0);
``````

This is a test for a generator with two items:

``````let gen = Gen.of_list(["a", "b"]);
expect(len(gen)) |> toBe(2);
``````

`Gen.fold()` works much like `List.fold()`:

• The first parameter is a function that maps the current state and an iteration value to the next state. States are arbitrary values whose type depends on the algorithm. In this case, the states are ints, because the last computed state is also the result of `Gen.fold()`.
• The second parameter is the first state.
• The third parameter is the generator that `Gen.fold()` operates on.

As you can see, we are not even interested in the current iteration value `_x`. We only need to count how often `foldFunc` is called and use the states to do so.

Implementing iterators for trees  #

So far, we have only looked at iterating over linear data structures. Iterating over non-linear data structures is more challenging.

Take, for example, a tree (the same data structure we have used in the blog post on variants):

``````type tree('a) =
| Empty
| Node('a, tree('a), tree('a));
``````

Writing an external iterator for trees is not trivial. You could, for example, manually manage a stack of nodes to visit. Here, we explore an alternative technique that we will develop in three steps. We start by implementing internal iteration over trees.

Iterating internally over trees  #

The following function is similar, in spirit, to `ListLabels.iter()`:

``````let rec iterTree = (~f: ('a) => unit, tree: tree('a)) => {
switch (tree) {
| Empty => ()
| Node(x, left, right) =>
f(x);
iterTree(~f, left);
iterTree(~f, right);
}
};
``````

The code is straightforward – we use recursion to visit all nodes in the tree.

Note that `iterTree` is also how we’d implement internal iteration. The only change would be to nest two functions:

• The parameter of the first one is `tree`.
• The parameter of the second one is `f`.

Iterating, continuation passing style  #

The next step is to convert the previous version to continuation-passing style (CPS): Instead of returning the results, we pass them on to callbacks. These callbacks are named continuations, because they determine how computation continues.

``````let rec iterTreeCps = (~f: ('a, unit=>unit) => unit,
tree: tree('a), cont: unit => unit) => {
switch (tree) {
| Empty => cont()
| Node(x, left, right) =>
f(x, () => { /* A */
iterTreeCps(~f, left, () => {
iterTreeCps(~f, right, cont);
});
});
}
};
``````

Line A is a good example of how CPS works: `f()` “returns” to us via its second parameter, a continuation. In CPS, the callee can delay (or avoid) returning a value to the caller. That’s why this style is used for asynchronous computations (e.g. by Node.js).

External iteration for trees  #

For external iteration, we want the following:

• Use the algorithm of `iterTreeCps`.
• But instead of calling `f()` in line A, we want to pause and return a value. When the iterator is called again, it should continue its execution where it previously paused.

This is how `yield` and generators work in, for example, JavaScript, Python and C#. We can simulate this behavior by storing the continuation in a ref cell `next`:

``````let ofTree = (tr: tree('a)): Gen.t('a) => {
let next = ref(None);

let rec visitTree = (t: tree('a), cont: unit => option('a)) => {
switch (t) {
| Empty => cont()
| Node(x, left, right) =>
next := Some(() => {
visitTree(left, () => {
visitTree(right, cont);
});
});
Some(x);
}
};

/* Continuing... */
``````

The all that remains to be done is to define how everything starts and to return the actual external iterator.

``````/* Continued */

next := Some(() => visitTree(tr, () => None));

() =>
switch (next^) {
| Some(func) =>
next := None;
func();
/* After the last node of the tree, */
/* we always return None */
| None => None
};
``````

This is how you use `ofTree()`:

``````let myStrTree =
Node("a",
Node("b", Empty, Empty),
Node("c",
Node("d", Empty, Empty),
Empty
)
);

let treeGen = TreeIteration.ofTree(myStrTree);
expect(Gen.to_list(treeGen))
|> toEqual(["a", "b", "c", "d"]);
``````

Infinite iterators  #

Iterators don’t have to be finite. For example:

``````let naturalNumbers = (): Gen.t(int) => {
let n = ref(0);
() => {
let cur = n^;
n := n^ + 1;
Some(cur);
};
};
``````

You obviously would not want to iterate over the items of this iterator (it would take a very long time). Infinite iterator are, however, useful for a few things.

First, you can use them to produce finite iterations of arbitrary length, via `Gen.take()`. `Gen.take(n, gen)` returns a generator with the first `n` items of generator `gen`.

``````let take5 = Gen.take(5, naturalNumbers());
expect(Gen.to_list(take5))
|> toEqual([0, 1, 2, 3, 4]);
``````

Second, `Gen.zip()` combines two iterations into a single iteration of pairs. And an infinite iterator can fill in one component of each pair:

``````let abc = Gen.of_list(["a", "b", "c"]);
let pairs = Gen.zip(naturalNumbers(), abc);
expect(Gen.to_list(pairs))
|> toEqual([
(0, "a"),
(1, "b"),
(2, "c")
]);
``````

This works, because `zip()` stops as soon as one of the two iterators is finished.

You can also add indices to iteration values via `Gen.zip_index()`.

Detecting the first or last iterated value  #

Occasionally, when working with series of values, you need to distinguish the first or the last value. Alas, that is normally not possible: all iterated values are treated the same.

Let’s examine this issue by implementing `join()`, a function that concatenates a series of strings into a single string. Optionally, it inserts separator strings between adjacent iteration values.

Detecting the first element by indexing iteration values  #

The first solution is to index each iteration value `str` by mapping it to a pair `(index, str)`. Then it is easy to determine which is the first element – it has the index 0:

``````let joinViaIndex = (~sep: string = "", gen: Gen.t(string)) => {
let foldFunc = (state, (index, str)) =>
if (index == 0) {
/* No separator before the first element */
state ++ str;
} else {
state ++ sep ++ str;
};
Gen.fold(foldFunc, "", Gen.zip_index(gen));
};
``````

`joinViaIndex()` works like this:

``````let gen = Gen.of_list(["a", "b", "c"]);
expect(joinViaIndex(~sep="|", gen))
|> toBe("a|b|c");
``````

We are using `Gen.fold()` (which we have already encountered in this blog post) to process the generator `gen`.

You may wonder if we could have simply checked whether the intermediate result `state` was empty and only added a separator if it wasn’t. But that would mean that we wouldn’t properly add a separator after an initial empty string (or after and between several initial empty strings).

Detecting the last element via `Gen.peek()`#

The `Gen` library has a clever way of supporting look-ahead for generators, via its function `peek()`. That function maps each iteration value `x` to a pair:

• The first component of the pair is `x`.
• The second component is the next iteration value, still wrapped in `option()`.

That means that the last iteration value `x_last` is mapped to the following pair:

``````(x_last, None)
``````

Let’s apply `Gen.peek()` to an iterator with two strings:

``````let gen = Gen.of_list(["a", "b"]);
let peekGen = Gen.peek(gen);
expect(Gen.to_list(peekGen))
|> toEqual([
("a", Some("b")), /* A */
("b", None), /* B */
]);
``````

The first pair in line A looks ahead to the second value, `"b"`. The second pair enables you to determine that `"b"` is the last iteration value.

With this capability, we can again implement `join()` via `Gen.fold()`:

``````let joinViaLookAhead = (~sep: string = "", gen: Gen.t(string)) => {
let foldFunc = (state, (str, lookAhead)) =>
/* No separator after the last element */
| None => state ++ str
| _ => state ++ str ++ sep
};
Gen.fold(foldFunc, "", Gen.peek(gen));
};
``````

A practical example  #

As a practical example, consider the following task. Given this text in Markdown syntax:

``````## Intro
## Details
These are the details
``````

We’d like to split this text into chunks, records with two fields. Each section in the text produces one:

• The field `title` contains the heading of the section.
• The field `body` is a list with the lines of the section (including the line with the heading).

The result looks as follows.

``````[
{ title: "Intro", body: [
"## Intro",
]},
{ title: "Details", body: [
"## Details",
"These are the details",
]},
]
``````

To implement this functionality, we use a more powerful variant of `fold()`: `unfold_scan()`. The callback `unfoldScanFunc` of this function works similarly to `fold`’s callback:

• Its parameters are also the current state and the current iteration value.
• Its result, however, is a pair `(nextState, output)`:
• `nextState` is the next state (the only value returned by `fold`’s callback).
• `output` is the iteration value added to the output iterator.

That means that we can use the states to assemble chunks. As long as a chunk isn’t ready yet, our output is `None`. Once the chunk in `state` is finished, our output is `Some(state)`.

``````let unfoldScanFunc = (state: chunk, line: string) => {
switch (extractTitle(line)) {
| None =>
let nextState = appendLine(line, state);
let output = None;
(nextState, output)
| Some(title) =>
let nextState = {title, body: [line]};
let output = toOptChunk(state); /* wrap with option() */
(nextState, output);
};
};
``````

You can check out the complete source code in the file `ParseChunks.re` in the repository. It also includes an imperative version of the chunking algorithm that is less elegant.

Conclusion  #

This is a summary of iteration as we have explored it in this blog post:

External Internal
Access Pull Push
Who is in control? Client Data source
Abstraction (iterator) Generator Sequence

Iterators are incredibly useful when processing series of values. It is fascinating that using them feels functional, even though they are an abstraction based on mutable state.

It also noteworthy that once you work with combinators, there is not much of a difference between external and internal iterators. For example, the following two test files are the same (apart from the the modules to be tested).