ReasonML: variant types

[2017-12-24] dev, reasonml
(Ad, please don’t block)

Table of contents for this series of posts: “What is ReasonML?


Variant types (short: variants) are a data type supported by many functional programming languages. They are an important ingredient in ReasonML that is not available in C-style languages (C, C++, Java, C#, etc.). This blog post explains how they work.

Variants as sets of symbols (enums)  

Variants let you define sets of symbols. When used like this, they are similar to enums in C-style languages. For example, the following type color defines symbols for six colors.

type color = Red | Orange | Yellow | Green | Blue | Purple;

There are two elements in this type definition:

  • The name of the type, color, which must start with a lowercase letter.
  • The names of constructors (Red, Orange, ...), which must start with uppercase letters. Why constructors are called constructors will become clear, once we use variants as data structures.

The names of constructors must be unique within the current scope. That enables ReasonML to easily deduce their types:

# Purple;
- : color = Purple

Variants can be processed via switch and pattern matching:

let invert = (c: color) =>
  switch c {
  | Red => Green
  | Orange => Blue
  | Yellow => Purple
  | Green => Red
  | Blue => Orange
  | Purple => Yellow
  };

Here, constructors are used both as patterns (left-hand sides of =>) and values (right-hand sides of =>). This is invert() in action:

# invert(Red);
- : color = Green
# invert(Yellow);
- : color = Purple

Tip: replacing booleans with variants  

In ReasonML, variants are often a better choice than booleans. Take for example, this function definition. (Remember that in ReasonML, the main parameter goes at the end, to enable currying.)

let stringOfContact(includeDetails: bool, c: contact) => ···;

This is how stringOfContact is invoked:

let str = stringOfContact(true, myContact);

It’s not clear what the boolean at the end does. You can improve this function via a labeled parameter.

let stringOfContact(~includeDetails: bool, c: contact) => ···;
let str = stringOfContact(~includeDetails=true, myContact);

Even more self-descriptive is to introduce a variant for the value of ~includeDetails:

type includeDetails = ShowEverything | HideDetails;
let stringOfContact(~levelOfDetail: includeDetails, c: contact) => ···;
let str = stringOfContact(~levelOfDetail=ShowEverything, myContact);

Using the variant includeDetails has two advantages:

  • It is immediately clear what “not showing details” means.
  • It is easy to add more modes later on.

Associating variant values with data  

Sometimes, you want to use variant values as keys for looking up data. One way of doing so is via a function that maps variant values to data:

type color = Red | Orange | Yellow | Green | Blue | Purple;
let stringOfColor(c: color) =>
  switch c {
  | Red => "Red"
  | Orange => "Orange"
  | Yellow => "Yellow"
  | Green => "Green"
  | Blue => "Blue"
  | Purple => "Purple"
  };

This technique has one disadvantage: it leads to redundancies, especially if you want to associate multiple pieces of data with the same variant value. We’ll explore alternatives in a future blog post.

Variants as data structures  

Each constructor can also hold one or more values. These values are identified by position. That is, individual constructors are similar to tuples. The following code demonstrates this feature.

type point = Point(float, float);
type shape =
  | Rectangle(point, point)
  | Circle(point, float);

Type point is a variant with a single constructor. It holds two floating point numbers. A shape is another variant. It is either:

  • a Rectangle defined by two corner points or
  • a Circle defined by a center and a radius.

With multiple constructor parameters, them being positional and not labeled becomes a problem – we have to describe elsewhere what their roles are. Records are an alternative in this case (to be described in a future blog post).

This is how you use the constructors:

# let bottomLeft = Point(-1.0, -2.0);
let bottomLeft: point = Point(-1., -2.);
# let topRight = Point(7.0, 6.0);
let topRight: point = Point(7., 6.);
# let circ = Circle(topRight, 5.0);
let circ: shape = Circle(Point(7., 6.), 5.);
# let rect = Rectangle(bottomLeft, topRight);
let rect: shape = Rectangle(Point(-1., -2.), Point(7., 6.));

Due to each constructor name being unique, ReasonML can easily infer the types.

If constructors hold data, pattern matching via switch is even more convenient, because it also lets you access that data:

let pi = 4.0 *. atan(1.0);

let computeArea = (s: shape) =>
  switch s {
  | Rectangle(Point(x1, y1), Point(x2, y2)) =>
    let width = abs_float(x2 -. x1);
    let height = abs_float(y2 -. y1);
    width *. height;
  | Circle(_, radius) => pi *. (radius ** 2.0)
  };

Let’s use computeArea, continuing our previous interactive rtop session:

# computeArea(circ);
- : float = 78.5398163397448315
# computeArea(rect);
- : float = 64.

Self-recursive data structures via variants  

You can also define recursive data structures via variants. For example, binary trees whose nodes contain integers:

type intTree =
  | Empty
  | Node(int, intTree, intTree);

intTree values are constructed like this:

let myIntTree = Node(1,
  Node(2, Empty, Empty),
  Node(3,
    Node(4, Empty, Empty),
    Empty
  )
);

myIntTree looks as follows: 1 has the two child nodes 2 and 3. 2 has two empty child nodes. Etc.

1
  2
    X
    X
  3
    4
      X
      X
    X

Processing self-recursive data structures via recursion  

To demonstrate processing self-recursive data structures, let’s implement a function computeSum, which computes the sum of the integers stored in the nodes.

let rec computeSum = (t: intTree) =>
  switch t {
  | Empty => 0
  | Node(i, leftTree, rightTree) =>
    i + computeSum(leftTree) + computeSum(rightTree)
  };

computeSum(myIntTree); /* 10 */

This kind of recursion is typical when working with variant types:

  1. A limited set of constructors is used to create data. In this case: Empty and Node().
  2. The same constructors are used as patterns to process the data.

That ensures that we handle whatever data is passed to us properly, as long as it is of type intTree. ReasonML helps by warning us if switch doesn’t cover intTree exhaustively. That protects us from forgetting cases that we should consider. To illustrate, let’s assume we forgot Empty and wrote computeSum like this:

let rec computeSum = (t: intTree) =>
  switch t {
  /* Missing: Empty */
  | Node(i, leftTree, rightTree) =>
    i + computeSum(leftTree) + computeSum(rightTree)
  };

Then we get the following warning.

Warning: this pattern-matching is not exhaustive.
Here is an example of a value that is not matched:
Empty

As mentioned in the blog post on functions, introducing catch-all cases means that you lose this protection. That’s why you should avoid them if you can.

Mutually recursive data structures via variants  

Recall that with let, we had to use let rec whenever recursion was involved:

  • A single self-recursive definition was done via let rec.
  • Multiple mutually recursive definitions were done via let rec and connected via and.

type is implicitly rec. That allowed us to do self-recursive definitions such as intTree. For mutually recursive definitions, we also need to connect those definitions via and. The following example again defines int trees, but this time with a separate type for nodes.

type intTree =
  | Empty
  | IntTreeNode(intNode)
and intNode =
  | IntNode(int, intTree, intTree);

intTree and intNode are mutually recursive, which is why they need to be defined within the same type declaration, separated via and.

Parameterized variants  

Let’s recall our original definition of int trees:

type intTree =
  | Empty
  | Node(int, intTree, intTree);

How can we turn this definition into a generic definition for trees whose nodes can contain any type of value? To do so, we have to introduce a variable for the type of a Node’s content. Type variables are prefixed with apostrophes in ReasonML. For example: 'a. Therefore, a generic tree looks as follows:

type tree('a) =
  | Empty
  | Node('a, tree('a), tree('a));

Two things are noteworthy. First, the content of a Node, which previously had the type int, now has the type 'a. Second, the type variable 'a has become a parameter of the type tree. Node passes that parameter on to its subtrees. That is, we can choose a different node value type for each tree, but within a tree, all node values must have the same type.

We can now define a type for int trees via a type alias, by providing tree’s type parameter:

type intTree = tree(int);

Let’s use tree to create a tree of strings:

let myStrTree = Node("a",
  Node("b", Empty, Empty),
  Node("c",
    Node("d", Empty, Empty),
    Empty
  )
);

Due to type inference, you do not need to provide a type parameter. ReasonML automatically infers that myStrTree has the type tree(string). The following generic function prints any kind of tree:

/**
 * @param ~indent How much to indent the current (sub)tree.
 * @param ~stringOfValue Converts node values to strings.
 * @param t The tree to convert to a string.
 */
let rec stringOfTree = (~indent=0, ~stringOfValue: 'a => string, t: tree('a)) => {
  let indentStr = String.make(indent*2, ' ');
  switch t {
  | Empty => indentStr ++ "X" ++ "\n"
  | Node(x, leftTree, rightTree) =>
    indentStr ++ stringOfValue(x) ++ "\n" ++
    stringOfTree(~indent=indent+1, ~stringOfValue, leftTree) ++
    stringOfTree(~indent=indent+1, ~stringOfValue, rightTree)
  };
};

This function uses recursion to iterate over the nodes of its parameter t. Given that stringOfTree works with arbitrary types 'a, we need a type-specific function to convert values of type 'a to strings. That is what parameter ~stringOfValue is for.

This is how we can print our previously defined myStrTree:

# print_string(stringOfTree(~stringOfValue=x=>x, myStrTree));
a
  b
    X
    X
  c
    d
      X
      X
    X

Useful standard variants  

I will briefly show two commonly used standard variants. A future blog post will give tips for using them.

Type option('a) for optional values  

In many object-oriented languages, a variable having type string means that the variable can be either null or a string value. Types that include null are called nullable. Nullable types are problematic in that it’s easy to work with their values while forgetting to handle null. If – unexpectedly – a null appears, you get the infamous null pointer exceptions.

In ReasonML, types are never nullable. Instead, potentially missing values are handled via the following parameterized variant:

type option('a) =
  | None
  | Some('a);

option forces you to always consider the None case.

ReasonML’s support for option is minimal. The definition of this variant is part of the language, but the core standard library has no utility functions for working with optional values, yet. Until they are, you can use BuckleScript’s Js.Option.

Type result('a) for error handling  

result is another standard variant for error-handling in OCaml:

type result('good, 'bad) =
  | Ok('good)
  | Error('bad);

Until ReasonML’s core library supports it, you can use BuckleScript’s Js.Result.

Example: evaluating integer expressions  

Working with trees is one of the strengths of ML-style languages. That’s why they are often used for programs involving syntax trees (interpreters, compilers, etc.). For example, the syntax checker Flow by Facebook is written in OCaml.

Therefore, as a concluding example, let’s implement an evaluator for simple integer expressions.

The following is a data structure for integer expressions.

type expression =
  | Plus(expression, expression)
  | Minus(expression, expression)
  | Times(expression, expression)
  | DividedBy(expression, expression)
  | Literal(int);

This is what an expression encoded with this variant looks like:

/* (3 - (16 / (6 + 2)) */
let expr =
  Minus(
    Literal(3),
    DividedBy(
      Literal(16),
      Plus(
        Literal(6),
        Literal(2)
      )
    )
  );

And finally, this is the function the evaluates integer expressions.

let rec eval(e: expression) =
  switch e {
  | Plus(e1, e2) => eval(e1) + eval(e2)
  | Minus(e1, e2) => eval(e1) - eval(e2)
  | Times(e1, e2) => eval(e1) * eval(e2)
  | DividedBy(e1, e2) => eval(e1) / eval(e2)
  | Literal(i) => i
  };

eval(expr); /* 1 */