Table of contents for this series of posts: “What is ReasonML?”
This blog post explains patterns and techniques for external and internal iteration in ReasonML.
The GitHub repository reasonml-demo-iterators
has the source code of the examples.
Many algorithms operate on series of values. We can apply such algorithms to any data structure that lets us iterate over its elements (visit them sequentially, one element at a time). Examples of such algorithms:
Two ways of implementing iteration are: externally and internally.
(In this blog post, I’m using the term item for an iterated value and the term element for a value stored in a data structure.)
With external iteration, we ask for each element of the data structure (pull). In OOP languages such as JavaScript, that looks as follows.
> const arr = ["a", "b"];
> const iter = arr[Symbol.iterator]();
> iter.next()
{ value: 'a', done: false }
> iter.next()
{ value: 'b', done: false }
> iter.next()
{ value: undefined, done: true }
We first create an iterator iter
for the data structure arr
. Then we invoke iter.next()
repeatedly until the result signals us that all values were visited, via done: true
.
With internal iteration, the data structure feeds us its elements (push). In OOP languages such as JavaScript, that looks as follows.
> const arr = ["a", "b"];
> arr.forEach(x => console.log(x));
a
b
The parameter of arr.forEach()
is a callback that successively receives each element and logs it to the console.
The next step for us will be to introduce abstractions for iteration:
Abstractions for iteration have two benefits:
List.length
, List.map()
and List.filter()
.It is interesting that iteration abstractions (such as iterators and streams) are based on mutable state at their core: the same entity produces different results when you access it in the same manner several times. However, they also feel quite functional, due to combinators.
In OOP, the abstraction for external iteration is an object, the so-called iterator.
In ReasonML, we also call the abstraction an iterator, but we use a nullary function that we call repeatedly to produce the iterated values. As an example, let’s use the conversion function ofList()
(which is explained soon) to create an iterator gen()
for a list myList
:
# let myList = ["a", "b", "c"];
# let gen = ofList(myList);
# gen();
Some("a")
# gen();
Some("b")
# gen();
Some("c")
# gen();
None
gen()
returns values of type option(string)
, a variant with two constructors:
x
of myList
are returned as Some(x)
.None
.The following type is for iterators that produce values of type 'a
:
type gen('a) = unit => option('a);
Within the ReasonML/OCaml ecosystem, iterators are also called enumerations and generators (due to them being producers of data, muddling the difference between interface and implementation).
To better understand how generators work, let’s look at code that produces and consumes them.
This is how ofList()
can be implemented:
let ofList = (l: list('a)): gen('a) => {
let current = ref(l);
() => { /* A */
switch (current^) {
| [] => None
| [head, ...tail] =>
current := tail;
Some(head)
}
};
};
The last thing ofList()
does (in line A), is return the generator, a nullary function. That function uses the ref cell current
(think mutable variable) to store the next iteration value. If the list, pointed to by current
, is not yet empty, the generator returns Some(head)
. Otherwise, it returns None
.
The inverse process, collecting iteration values in a list, works as follows.
let toList = (g: gen('a)): list('a) => {
let rec aux = (acc) =>
switch (g()) {
| None => acc
| Some(x) => aux([x, ...acc])
};
List.rev(aux([]));
};
We repeatedly call g()
and collect its results in the accumulator parameter acc
that we return once aux()
is finished. The resulting list is in the wrong order, which is why we reverse it via List.rev()
.
The following code is a test for toList()
.
let count = ref(0);
let myGen = () =>
switch (count^) {
| 0 =>
count := count^ + 1;
Some("hello");
| 1 =>
count := count^ + 1;
Some("world");
| 2 => None
};
expect(toList(myGen)) |> toBe(["hello", "world"]);
myGen
is a hand-written generator that uses the ref cell count
to produce a series of values. The last line checks if toList()
properly converts myGen
to a list. The assertion is based on the bs-jest
API.
length()
Let’s implement our first combinator: A function length
whose input is a generator and whose output is the number of items in that generator:
let length = (g: gen('a)): int => {
let rec aux = (len) =>
switch (g()) {
| None => len
| Some(_) => aux(len+1)
};
aux(0);
};
expect(length(ofList([1, 2, 3]))) |> toBe(3);
We use the helper function aux()
to loop over all elements in the generator g
. While we are doing so, we increment the counter len
. Once the generator is finished, we return len
.
map()
The generator combinator map
has as input a function f
and a generator g
and returns a new generator. Each item of the result is derived from an item of g
by applying f
to it:
let map = (~f: 'a => 'b, in_: gen('a)): gen('b) => {
let out = () =>
switch (in_()) {
| None => None /* A */
| Some(x) => Some(f(x)) /* B */
};
out;
};
This is a test for map()
:
let genIn = ofList([1, 2, 3]);
let genOut = map(~f = x=>2*x, genIn);
expect(toList(genOut)) |> toEqual([2, 4, 6]);
The result is the function out()
. It derives its results from g()
: If g()
is done then so are we (line A). Otherwise, we need to apply f()
to the iterated value x
(line B).
filter()
The combinator filter()
copies only those elements of its input generator to its output generator for which its callback parameter f
returns true
:
let filter = (~f: 'a => bool, in_: gen('a)): gen('a) => {
let rec out = () =>
switch (in_()) {
| None => None
| Some(x) when f(x) => Some(x) /* A */
| _ => out() /* B */
};
out;
};
This is a test for filter()
:
let genIn = ofList([1, -2, 3, -4]);
let genOut = filter(~f = x=>x >= 0, genIn);
expect(toList(genOut)) |> toEqual([1, 3]);
With filter()
, you pass on whatever the input generator in_()
produces (line A), with one exception: If f()
returns false
for an iteration value, you immediately advance to the next result of in_()
(line B).
Let’s briefly review what internal iteration looks like. The following code is an implementation of ListLabels.iter()
.
let iter = (~f: 'a => unit, theList: list('a)) => {
let rec aux = (l) =>
switch (l) {
| [] => ()
| [head, ...tail] =>
f(head);
aux(tail);
};
aux(theList);
};
iter()
uses the helper function aux()
to loop over the elements of theList
and apply f
to them.
This is what using iter()
looks like:
let result = ref([]);
let f = (x) => result := [x, ...result^]; /* A */
iter(~f, ["a", "b", "c"]);
expect(result^) |> toEqual(["c", "b", "a"]);
Given that f()
does not return anything, it must do its work via side effects.
Each step prepends a new item x
in front of the existing items in result
(line A). That’s why the final result is the original list in reverse.
iter()
is elegant (if you need or want to use side effects), but it has one disadvantage: It is not an abstraction that can be returned by combinators.
So what do we want from an abstraction for internal iteration?
iter()
.The solution is as follows. A function that creates an internal iterator for a data structure simply returns the function aux
that performs internal iteration (a partially applied iter()
, if you will). That gives us the following type for sequences (internal iterators):
type sequence('a) = ('a => unit) => unit;
A sequence is a function that takes a callback of type 'a => unit
and feeds it values of type 'a
.
Let’s examine how the type sequence
works, by implementing a function that returns a sequence for a list:
let ofList = (theList: list('a)): sequence('a) =>
(f: 'a => unit) => {
let rec aux = (l) =>
switch (l) {
| [] => ()
| [head, ...tail] =>
f(head);
aux(tail);
};
aux(theList);
};
As you can see, this is a curried version of iter()
. The following code tests ofList()
.
let result = ref([]);
let seq = ofList(["a", "b", "c"]);
seq(x => result := [x, ...result^]);
expect(result^) |> toBe(["c", "b", "a"]);
Let’s take a brief look at how combinators are implemented for internal iterators.
On one hand, computing results is more difficult than with external iterators – you need to rely on side effects to compute and return the result:
let length = (seq: sequence('a)) => {
let result = ref(0);
seq((_) => { result := result^ + 1 });
result^;
};
On the other hand, producing an output iterator is simpler, because you decide when you send data to it:
let filter = (~f: 'a => bool, seqIn: sequence('a)): sequence('a) => {
let seqOut = (out: 'b => unit) =>
seqIn((in_: 'a) =>
switch (f(in_)) {
| true => out(in_); /* send value to output */
| false => (); /* skip value */
}
);
seqOut;
};
These are three OCaml libraries supporting iteration:
Gen
supports generators (external iterators).
BatEnum
supports enumerations (external iterators).Sequence
supports sequences (internal iterators).In the following section, we’ll use the Gen
library to demonstrate more iteration concepts.
One important operation for processing iterators is folding. We’ll use the Gen
library to explore how that works. The following function len()
counts the number of items produced by a generator gen
:
let len = (gen: Gen.t('a)) => {
let foldFunc = (state: int, _x: 'a) =>
state + 1;
Gen.fold(foldFunc, 0, gen);
};
This is a test for a generator with zero items:
let gen = Gen.of_list([]);
expect(len(gen)) |> toBe(0);
This is a test for a generator with two items:
let gen = Gen.of_list(["a", "b"]);
expect(len(gen)) |> toBe(2);
Gen.fold()
works much like List.fold()
:
Gen.fold()
.Gen.fold()
operates on.As you can see, we are not even interested in the current iteration value _x
. We only need to count how often foldFunc
is called and use the states to do so.
So far, we have only looked at iterating over linear data structures. Iterating over non-linear data structures is more challenging.
Take, for example, a tree (the same data structure we have used in the blog post on variants):
type tree('a) =
| Empty
| Node('a, tree('a), tree('a));
Writing an external iterator for trees is not trivial. You could, for example, manually manage a stack of nodes to visit. Here, we explore an alternative technique that we will develop in three steps. We start by implementing internal iteration over trees.
The following function is similar, in spirit, to ListLabels.iter()
:
let rec iterTree = (~f: ('a) => unit, tree: tree('a)) => {
switch (tree) {
| Empty => ()
| Node(x, left, right) =>
f(x);
iterTree(~f, left);
iterTree(~f, right);
}
};
The code is straightforward – we use recursion to visit all nodes in the tree.
Note that iterTree
is also how we’d implement internal iteration. The only change would be to nest two functions:
tree
.f
.The next step is to convert the previous version to continuation-passing style (CPS): Instead of returning the results, we pass them on to callbacks. These callbacks are named continuations, because they determine how computation continues.
let rec iterTreeCps = (~f: ('a, unit=>unit) => unit,
tree: tree('a), cont: unit => unit) => {
switch (tree) {
| Empty => cont()
| Node(x, left, right) =>
f(x, () => { /* A */
iterTreeCps(~f, left, () => {
iterTreeCps(~f, right, cont);
});
});
}
};
Line A is a good example of how CPS works: f()
“returns” to us via its second parameter, a continuation. In CPS, the callee can delay (or avoid) returning a value to the caller. That’s why this style is used for asynchronous computations (e.g. by Node.js).
For external iteration, we want the following:
iterTreeCps
.f()
in line A, we want to pause and return a value. When the iterator is called again, it should continue its execution where it previously paused.This is how yield
and generators work in, for example, JavaScript, Python and C#. We can simulate this behavior by storing the continuation in a ref cell next
:
let ofTree = (tr: tree('a)): Gen.t('a) => {
let next = ref(None);
let rec visitTree = (t: tree('a), cont: unit => option('a)) => {
switch (t) {
| Empty => cont()
| Node(x, left, right) =>
next := Some(() => {
visitTree(left, () => {
visitTree(right, cont);
});
});
Some(x);
}
};
/* Continuing... */
The all that remains to be done is to define how everything starts and to return the actual external iterator.
/* Continued */
next := Some(() => visitTree(tr, () => None));
() =>
switch (next^) {
| Some(func) =>
next := None;
func();
/* After the last node of the tree, */
/* we always return None */
| None => None
};
This is how you use ofTree()
:
let myStrTree =
Node("a",
Node("b", Empty, Empty),
Node("c",
Node("d", Empty, Empty),
Empty
)
);
let treeGen = TreeIteration.ofTree(myStrTree);
expect(Gen.to_list(treeGen))
|> toEqual(["a", "b", "c", "d"]);
Iterators don’t have to be finite. For example:
let naturalNumbers = (): Gen.t(int) => {
let n = ref(0);
() => {
let cur = n^;
n := n^ + 1;
Some(cur);
};
};
You obviously would not want to iterate over the items of this iterator (it would take a very long time). Infinite iterator are, however, useful for a few things.
First, you can use them to produce finite iterations of arbitrary length, via Gen.take()
. Gen.take(n, gen)
returns a generator with the first n
items of generator gen
.
let take5 = Gen.take(5, naturalNumbers());
expect(Gen.to_list(take5))
|> toEqual([0, 1, 2, 3, 4]);
Second, Gen.zip()
combines two iterations into a single iteration of pairs. And an infinite iterator can fill in one component of each pair:
let abc = Gen.of_list(["a", "b", "c"]);
let pairs = Gen.zip(naturalNumbers(), abc);
expect(Gen.to_list(pairs))
|> toEqual([
(0, "a"),
(1, "b"),
(2, "c")
]);
This works, because zip()
stops as soon as one of the two iterators is finished.
You can also add indices to iteration values via Gen.zip_index()
.
Occasionally, when working with series of values, you need to distinguish the first or the last value. Alas, that is normally not possible: all iterated values are treated the same.
Let’s examine this issue by implementing join()
, a function that concatenates a series of strings into a single string. Optionally, it inserts separator strings between adjacent iteration values.
The first solution is to index each iteration value str
by mapping it to a pair (index, str)
. Then it is easy to determine which is the first element – it has the index 0:
let joinViaIndex = (~sep: string = "", gen: Gen.t(string)) => {
let foldFunc = (state, (index, str)) =>
if (index == 0) {
/* No separator before the first element */
state ++ str;
} else {
state ++ sep ++ str;
};
Gen.fold(foldFunc, "", Gen.zip_index(gen));
};
joinViaIndex()
works like this:
let gen = Gen.of_list(["a", "b", "c"]);
expect(joinViaIndex(~sep="|", gen))
|> toBe("a|b|c");
We are using Gen.fold()
(which we have already encountered in this blog post) to process the generator gen
.
You may wonder if we could have simply checked whether the intermediate result state
was empty and only added a separator if it wasn’t. But that would mean that we wouldn’t properly add a separator after an initial empty string (or after and between several initial empty strings).
Gen.peek()
The Gen
library has a clever way of supporting look-ahead for generators, via its function peek()
. That function maps each iteration value x
to a pair:
x
.option()
.That means that the last iteration value x_last
is mapped to the following pair:
(x_last, None)
Let’s apply Gen.peek()
to an iterator with two strings:
let gen = Gen.of_list(["a", "b"]);
let peekGen = Gen.peek(gen);
expect(Gen.to_list(peekGen))
|> toEqual([
("a", Some("b")), /* A */
("b", None), /* B */
]);
The first pair in line A looks ahead to the second value, "b"
. The second pair enables you to determine that "b"
is the last iteration value.
With this capability, we can again implement join()
via Gen.fold()
:
let joinViaLookAhead = (~sep: string = "", gen: Gen.t(string)) => {
let foldFunc = (state, (str, lookAhead)) =>
switch lookAhead {
/* No separator after the last element */
| None => state ++ str
| _ => state ++ str ++ sep
};
Gen.fold(foldFunc, "", Gen.peek(gen));
};
As a practical example, consider the following task. Given this text in Markdown syntax:
## Intro
This article is
about something.
## Details
These are the details
about something.
We’d like to split this text into chunks, records with two fields. Each section in the text produces one:
title
contains the heading of the section.body
is a list with the lines of the section (including the line with the heading).The result looks as follows.
[
{ title: "Intro", body: [
"## Intro",
"This article is",
"about something.",
]},
{ title: "Details", body: [
"## Details",
"These are the details",
"about something.",
]},
]
To implement this functionality, we use a more powerful variant of fold()
: unfold_scan()
. The callback unfoldScanFunc
of this function works similarly to fold
’s callback:
(nextState, output)
:
nextState
is the next state (the only value returned by fold
’s callback).output
is the iteration value added to the output iterator.That means that we can use the states to assemble chunks. As long as a chunk isn’t ready yet, our output is None
. Once the chunk in state
is finished, our output is Some(state)
.
let unfoldScanFunc = (state: chunk, line: string) => {
switch (extractTitle(line)) {
| None =>
let nextState = appendLine(line, state);
let output = None;
(nextState, output)
| Some(title) =>
let nextState = {title, body: [line]};
let output = toOptChunk(state); /* wrap with option() */
(nextState, output);
};
};
You can check out the complete source code in the file ParseChunks.re
in the repository. It also includes an imperative version of the chunking algorithm that is less elegant.
This is a summary of iteration as we have explored it in this blog post:
External | Internal | |
---|---|---|
Access | Pull | Push |
Who is in control? | Client | Data source |
Abstraction (iterator) | Generator | Sequence |
Iterators are incredibly useful when processing series of values. It is fascinating that using them feels functional, even though they are an abstraction based on mutable state.
It also noteworthy that once you work with combinators, there is not much of a difference between external and internal iterators. For example, the following two test files are the same (apart from the the modules to be tested).