ReasonML: polymorphic variant types

[2018-01-07] dev, reasonml

Table of contents for this series of posts: “What is ReasonML?”


In this blog post, we look at polymorphic variants, which are a more flexible version of normal variants. But that flexibility also makes them more complicated.

[Aside: this blog post was challenging to write and is based on various resources on the web. Corrections and tips welcome!]

What are polymorphic variants?  

Polymorphic variants are similar to normal variants. The biggest difference is that constructors are not tied to types anymore; they exist independently. That makes polymorphic variants more versatile and leads to interesting consequences for the type system. Some of those consequences are negative – they are the price you pay for the versatility.

Let’s start with the following normal variant and then turn it into a polymorphic variant.

type rgb = Red | Green | Blue;

The polymorphic version of rgb is defined like this:

type rgb = [`Red | `Green | `Blue];

A polymorphic variant type is written in square brackets, and the name of each constructor must start with a backtick, followed by either a lowercase or an uppercase letter:

# `red;
- : [> `red ] = `red
# `Red;
- : [> `Red ] = `Red

As usual, types such as rgb must start with lowercase letters.

Polymorphic constructors exist on their own  

You can only use a non-polymorphic constructor if it is part of a non-polymorphic variant:

# Int(123);
Error: Unbound constructor Int

# type data = Int(int) | Str(string);
type data = Int(int) | Str(string);
# Int(123);
- : data = Int(123)

In contrast, you can just use polymorphic constructors. There is no need to define them beforehand:

# `Int(123);
- : [> `Int(int) ] = `Int(123)

Note that `Int(123) has an interesting type: [> `Int(int) ]. We’ll examine what that means later.

Because polymorphic constructors exist on their own, the same constructor can be used in more than one variant:

type color = [`Red | `Orange | `Yellow | `Green | `Blue | `Purple];
type rgb = [`Red | `Green | `Blue];

In contrast, with normal variants, you should avoid multiple variants within the same scope having the same constructor.

You can even use the same polymorphic constructor with different arities and/or type parameters:

# `Int("abc", true);
- : [> `Int((string, bool)) ] = `Int(("abc", true))
# `Int(1.0, 2.0, 3.0);
- : [> `Int((float, float, float)) ] = `Int((1., 2., 3.))

Extending polymorphic variants  

When defining a polymorphic variant, you can extend an existing variant. In the following example, color extends rgb:

type rgb = [`Red | `Green | `Blue];
type color = [rgb | `Orange | `Yellow | `Purple];

As you can see, case matters here: rgb being lowercased indicates that its constructors are inserted.

Extending multiple variants is also possible:

type red = [ `Red ];
type green = [ `Green ];
type blue = [ `Blue ];
type rgb = [ red | green | blue ];

The namespace of polymorphic constructors is global  

The names of non-polymorphic constructors are part of their scopes (e.g. their modules), whereas the namespace of polymorphic constructors is global. You can see that in the following example:

module M = {
  type data = [`Int(int) | `Str(string)];
  let stringOfData = (x: data) =>
    switch x {
    | `Int(i) => string_of_int(i)
    | `Str(s) => s
    };
};
M.stringOfData(`Int(123));

In the last line, the parameter of M.stringOfData() is created via a constructor that is not from M’s scope. Due to the global namespace of polymorphic constructors, the parameter is compatible with the type data.

Type compatibility  

How does ReasonML determine if the type of an actual parameter (function call) is compatible with the type of the formal parameter (function definition)? Let’s look at examples. We start by creating a type and a function whose single parameter is of that type.

# type rgb = [`Red | `Green | `Blue];
type rgb = [ `Blue | `Green | `Red ];
# let id = (x: rgb) => x;
let id: (rgb) => rgb = <fun>;

Next, we create a new type rgb2 that has the same polymorphic constructors as rgb and call id() with a value whose type is rgb2:

# type rgb2 = [`Red | `Green | `Blue];
type rgb2 = [ `Blue | `Green | `Red ];
# let arg: rgb2 = `Blue;
let arg: rgb2 = `Blue;
# id(arg);
- : rgb = `Blue

Normally, the type of the formal parameter x and the actual parameter arg have to be the same. But with polymorphic variants, it’s enough if your types have the same constructors.

If, however, the type of an actual parameter arg2 has fewer constructors, then ReasonML won’t let you do the function call:

# type rg = [`Red | `Green];
type rg = [ `Green | `Red ];
# let arg2: rg = `Red;
let arg2: rg = `Red;
# id(arg2);
Error: This expression has type rg but an expression was expected of type rgb
The first variant type does not allow tag(s) `Blue

The reason is that id makes precise demands on what it needs: exactly the constructors `Red, `Green and `Blue. Therefore, the type rg is not enough.
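If you do need to pass a value of type rg to id(), one option is an explicit upcast with the coercion operator :>. The following is only a minimal sketch and not part of the original example; the names asRgb and result are made up for illustration:

```reason
type rgb = [`Red | `Green | `Blue];
type rg = [`Red | `Green];

let id = (x: rgb) => x;

let arg2: rg = `Red;

/* Explicit coercion: every constructor of rg also exists in rgb,
   so the upcast from rg to rgb is accepted by the type checker. */
let asRgb = (arg2 :> rgb);
let result = id(asRgb);
```

The coercion has to be written out; ReasonML does not insert it automatically at the call site.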

From concrete types to type constraints  

Interestingly, the following does work:

# type rgb = [`Red | `Green | `Blue];
type rgb = [ `Blue | `Green | `Red ];
# let id = (x: rgb) => x;
let id: (rgb) => rgb = <fun>;

# let arg3 = `Red;
let arg3: [> `Red ] = `Red;
# id(arg3);
- : rgb = `Red

Why? ReasonML didn’t infer a fixed type for arg3. Instead, it inferred the type constraint [> `Red ]. This constraint is a so-called lower bound and means: “all types that have at least the constructor `Red”. This constraint is compatible with rgb, the type of x.

Type constraints for parameters  

We can also use constraints as the types of parameters. For example, the following call to id() fails, because the parameter’s type, rgbw, has too many constructors:

# type rgbw = [`Red | `Green | `Blue | `White ];
type rgbw = [ `Blue | `Green | `Red | `White ];
# id(`Red: rgbw);
Error: This expression has type rgbw but an expression was expected of type rgb
The second variant type does not allow tag(s) `White

This time, we call id() and specify the type rgbw of the argument directly (no intermediate let binding). We can make this exact call work by changing only the definition of id():

# let id = (x: [> `Red | `Green | `Blue]) => x;
let id: (([> `Blue | `Green | `Red ] as 'a)) => 'a = <fun>;
# id(`Red: rgbw); /* same as before */
- : rgbw = `Red

Now x has a lower bound and accepts all types that have at least the given three constructors.

You can also define constraints by referring to variants. For example, the following definition of id() is basically equivalent to the previous one.

let id = (x: [> rgb]) => x;

We’ll take a deeper look at type constraints later on.
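One practical consequence of a lower bound is that the parameter may carry constructors beyond the listed ones, so a switch over it needs a catch-all case to be exhaustive. The following sketch illustrates that; the function name colorName and the bindings white and red are hypothetical:

```reason
type rgbw = [`Red | `Green | `Blue | `White];

let colorName = (x: [> `Red | `Green | `Blue]) =>
  switch x {
  | `Red => "red"
  | `Green => "green"
  | `Blue => "blue"
  | _ => "other" /* catch-all: x may belong to any wider type */
  };

let white: rgbw = `White;
let red: rgbw = `Red;

let whiteName = colorName(white);
let redName = colorName(red);
```

With a fixed parameter type such as rgb, the catch-all case would be unnecessary, because the compiler knows all possible constructors.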

Writing extensible code with polymorphic variants  

One key benefit of polymorphic variants is that code becomes more extensible.

The type and code we want to extend  

As an example, take the following type definitions for shapes. They are polymorphic versions of the example in the preceding blog post.

type point = [ `Point(float, float) ];
type shape = [
  | `Rectangle(point, point)
  | `Circle(point, float)
];

Based on these type definitions, we can write a function that computes the area of a shape:

let pi = 4.0 *. atan(1.0);
let computeArea = (s: shape) =>
  switch s {
  | `Rectangle(`Point(x1, y1), `Point(x2, y2)) =>
    let width = abs_float(x2 -. x1);
    let height = abs_float(y2 -. y1);
    width *. height;
  | `Circle(_, radius) => pi *. (radius ** 2.0)
  };

Extending shape: a failing first attempt  

Let’s say we want to extend shape with one more shape – triangles. How would we do that? We can simply define a new type shapePlus that reuses the existing polymorphic constructors `Rectangle and `Circle, and adds the constructor `Triangle:

type shapePlus = [
  | `Rectangle(point, point)
  | `Circle(point, float)
  | `Triangle(point, point, point)
];

Now we also need to extend computeArea(). The following function is our first attempt at writing that extension:

let shoelaceFormula = (`Point(x1, y1), `Point(x2, y2), `Point(x3, y3)) =>
  0.5 *. abs_float(x1*.y2 -. x3*.y2 +. x3*.y1 -. x1*.y3 +. x2*.y3 -. x2*.y1);
let computeAreaPlus = (sp: shapePlus) =>
  switch sp {
  | `Triangle(p1, p2, p3) => shoelaceFormula(p1, p2, p3)
  | `Rectangle(_, _) => computeArea(sp) /* A */
  | `Circle(_, _) => computeArea(sp) /* B */
  };

Alas, this code doesn’t work: In lines A and B, sp’s type shapePlus is not compatible with the type shape of computeArea’s parameter. We get the following error message:

Error: This expression has type shapePlus
but an expression was expected of type shape
The second variant type does not allow tag(s) `Triangle

Fixing computeAreaPlus via an as clause  

Thankfully, we can fix the problem by using the as clause for the last two cases:

let computeAreaPlus = (sp: shapePlus) =>
  switch sp {
  | `Triangle(p1, p2, p3) => shoelaceFormula(p1, p2, p3)
  | `Rectangle(_, _) as r => computeArea(r)
  | `Circle(_, _) as c => computeArea(c)
  };

How does as help us? With polymorphic variants, it picks the most general type possible. That is:

  • r has the type [> `Rectangle(point, point)]
  • c has the type [> `Circle(point, float)]

Both of these types are compatible with shape, the type of computeArea’s parameter.

The final solution  

There is one more improvement we can make. Given the following variant:

type myvariant = [`C1(t1) | `C2(t2)];

Then the following two patterns are equivalent:

#myvariant
(`C1(_: t1) | `C2(_: t2))

If we use the hash for the type shape, we get:

let computeAreaPlus = (sp: shapePlus) =>
  switch sp {
  | `Triangle(p1, p2, p3) => shoelaceFormula(p1, p2, p3)
  | #shape as s => computeArea(s)
  };

Let’s use computeAreaPlus() with two shapes:

let top = `Point(3.0, 5.0);
let left = `Point(0.0, 0.0);
let right = `Point(3.0, 0.0);

let circ = `Circle(top, 3.0);
let tri = `Triangle(top, left, right);

computeAreaPlus(circ); /* 28.274333882308138 */
computeAreaPlus(tri); /* 7.5 */

We have therefore successfully extended both the type shape and the function computeArea() operating on it.
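As a variation, shapePlus could also be written as an extension of shape, using the syntax from the section on extending polymorphic variants. The following self-contained sketch repeats the definitions from this section in condensed form and sums the areas of a list of shapes; the names shapes and totalArea are assumptions made for this example:

```reason
type point = [ `Point(float, float) ];
type shape = [ `Rectangle(point, point) | `Circle(point, float) ];
/* Same constructors as the hand-written shapePlus,
   expressed as an extension of shape */
type shapePlus = [ shape | `Triangle(point, point, point) ];

let pi = 4.0 *. atan(1.0);
let computeArea = (s: shape) =>
  switch s {
  | `Rectangle(`Point(x1, y1), `Point(x2, y2)) =>
    abs_float(x2 -. x1) *. abs_float(y2 -. y1)
  | `Circle(_, radius) => pi *. (radius ** 2.0)
  };

let shoelaceFormula = (`Point(x1, y1), `Point(x2, y2), `Point(x3, y3)) =>
  0.5 *. abs_float(x1*.y2 -. x3*.y2 +. x3*.y1 -. x1*.y3 +. x2*.y3 -. x2*.y1);

let computeAreaPlus = (sp: shapePlus) =>
  switch sp {
  | `Triangle(p1, p2, p3) => shoelaceFormula(p1, p2, p3)
  | #shape as s => computeArea(s)
  };

/* A 2 x 3 rectangle (area 6.0) and the triangle used above (area 7.5) */
let shapes: list(shapePlus) = [
  `Rectangle(`Point(0.0, 0.0), `Point(2.0, 3.0)),
  `Triangle(`Point(3.0, 5.0), `Point(0.0, 0.0), `Point(3.0, 0.0))
];

let totalArea =
  List.fold_left((acc, s) => acc +. computeAreaPlus(s), 0.0, shapes);
```

Because the list is annotated as list(shapePlus), old and new constructors can be mixed freely in it.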

Best practices: normal variants vs. polymorphic variants  

In general, you should prefer normal variants over polymorphic variants, because they are slightly more efficient and enable stricter type checking. Quoting the OCaml manual:

[...] polymorphic variants, while being type-safe, result in a weaker type discipline. That is, core language variants do actually much more than ensuring type-safety, they also check that you use only declared constructors, that all constructors present in a data-structure are compatible, and they enforce typing constraints to their parameters.

However, polymorphic variants have a few clear strengths. If any of those matter in a given situation, you should use polymorphic variants:

  • Reuse: A constructor (possibly along with code processing it) is useful for more than one variant. Constructors for colors fall into this category, for example.

  • Decoupling: A constructor is used in multiple locations, but you don’t want those locations to depend on a single module where the constructor is defined. Instead, you can simply use the constructor without defining it.

  • Extensibility: You expect a variant to be extended later on, similar to how shapePlus was an extension of shape earlier in this post.

  • Conciseness: Due to the global namespace of polymorphic constructors, you can use them without qualifying them or opening their modules (see next subsection).

  • Use constructors without prior definitions: You can use polymorphic constructors without defining them beforehand via variants. That is occasionally convenient for throw-away types that are only used in single locations.

Thankfully, it is relatively easy to move to a polymorphic variant from a normal variant if the need arises.

Conciseness: normal variants vs. polymorphic variants  

Let’s compare the conciseness of normal variants and polymorphic variants.

For the normal variant bwNormal, you need to qualify Black in line A (or open its module):

module MyModule = {
  type bwNormal = Black | White;

  let getNameNormal = (bw: bwNormal) =>
    switch bw {
    | Black => "Black"
    | White => "White"
    };
};

print_string(MyModule.getNameNormal(MyModule.Black)); /* A */
  /* "Black" */

For the polymorphic variant bwPoly, no qualification is necessary for `Black in line A:

module MyModule = {
  type bwPoly = [ `Black | `White ];

  let getNamePoly = (bw: bwPoly) =>
    switch bw {
    | `Black => "Black"
    | `White => "White"
    };
};

print_string(MyModule.getNamePoly(`Black)); /* A */
  /* "Black" */

Not having to qualify didn’t really have much of an effect in this example, but it matters if you use the constructors many times.

Preventing typos with polymorphic variants  

One issue with polymorphic variants is that you get fewer warnings about typos, because you can use constructors without defining them. For example, in the following code, `Green is misspelled as `Gren in line A:

type rgb = [`Red | `Green | `Blue];
let f = (x) =>
  switch x {
  | `Red => "Red"
  | `Gren => "Green" /* A */
  | `Blue => "Blue"
  };
/* let f: ([< `Blue | `Gren | `Red ]) => string = <fun>; */

You do get an error if you add a type annotation for the parameter x:

let f = (x: rgb) =>
  switch x {
  | `Red => "Red"
  | `Gren => "Green"
  | `Blue => "Blue"
  };
/*
Error: This pattern matches values of type [? `Gren ]
but a pattern was expected which matches values of type rgb
The second variant type does not allow tag(s) `Gren
*/

If you return polymorphic variant values from a function, you can specify the return type of that function, too. But that also adds weight to your code, so be sure you really benefit. If you type all parameters, many if not most problems should be caught.
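As a sketch of the return-type approach, consider a function that produces polymorphic variant values from strings. The function parseColor is a hypothetical example, not taken from the code above:

```reason
type rgb = [`Red | `Green | `Blue];

/* The return annotation option(rgb) pins down the allowed constructors.
   Writing Some(`Gren) in any branch would be rejected by the type checker. */
let parseColor = (s: string): option(rgb) =>
  switch s {
  | "red" => Some(`Red)
  | "green" => Some(`Green)
  | "blue" => Some(`Blue)
  | _ => None
  };
```

Without the annotation, a misspelled constructor in one of the branches would simply become part of the inferred result type.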

(Advanced)  

All of the following sections cover advanced topics.

Type constraints for type variables  

Before we can take a closer look at constraints for polymorphic variants, we first need to understand general constraints for type variables (of which the former is a special case).

At the end of a type definition, there can be one or more type constraints. These have the following syntax:

constraint «typeexpr» = «typeexpr»

These constraints are used to refine the type variables in the preceding type definition. This is a simple example:

type t('a) = 'a constraint 'a=int;

How does ReasonML handle constraints? Before we can look into that, let’s first understand unification and how it builds on pattern matching.

Pattern matching goes in one direction: One term without variables is used to fill in the variables in another term. In the following example, `Int(123) is the term without variables and `Int(x) is the term with variables:

switch(`Int(123)) {
| `Int(x) => print_int(x)
}

Unification is pattern matching that works in both directions: both terms can have variables, and variables on both sides are filled in. As an example, consider:

# type t('a, 'b) = ('a, 'b) constraint ('a, int) = (bool, 'b);
type t('a, 'b) = ('a, 'b) constraint 'a = bool constraint 'b = int;

ReasonML simplified as much as possible: the original complex constraint with variables on both sides of the equals sign was converted to two simple constraints with variables only on the left-hand sides.

This is an example where things can be simplified so much that no constraint is needed anymore:

# type t('a, 'b) = 'c constraint 'c = ('a, 'b);
type t('a, 'b) = ('a, 'b);

Type constraints for polymorphic variants  

The type constraints we have seen earlier in this blog post are actually just type constraints that are specific to polymorphic variants.

For example, the following two expressions are equivalent:

let x: [> `Red ] = `Red;
let x: [> `Red] as 'a = `Red;

Similarly, the following two type expressions are equivalent (but you can’t use constraint in let bindings and parameter definitions):

[> `Red] as 'a
'a constraint 'a = [> `Red ]

That is, with all the polymorphic variant constraints that we have used so far, there was always an implicit (hidden) type variable. We can see that if we try to use such a constraint to define a type t:

# type t = [> `Red ];
Error: A type variable is unbound in this type declaration.
In type [> `Red ] as 'a the variable 'a is unbound

We fix this as follows. Note the final type computed by ReasonML.

# type t('a) = [> `Red] as 'a;
type t('a) = 'a constraint 'a = [> `Red ];

Upper and lower bounds for polymorphic variants  

For the remainder of this blog post, we refer to type constraints for polymorphic variants as simply type constraints or constraints.

Type constraints consist of either or both of the following:

  • A lower bound: indicates what elements a type must contain at least. For example: [> `Red | `Green] accepts all types that include the constructors `Red and `Green. In other words: taken as a set, the constraint accepts all types that are supersets of it.

  • An upper bound: indicates what elements a type must contain at most. For example: [< `Red | `Green] accepts the following types: [`Red | `Green], [`Red], [`Green].

You can use type constraints for:

  • Type definitions
  • let bindings
  • Parameter definitions

For the latter two, you must use the short form (without constraint).

What do type constraints match?  

Lower bounds  

Let’s examine how the lower bound [> `Red | `Green] works by using it as the type of a function parameter x:

let lower = (x: [> `Red | `Green]) => true;

Values of type [`Red | `Green | `Blue] and [`Red | `Green] are accepted:

# lower(`Red: [`Red | `Green | `Blue]);
- : bool = true
# lower(`Red: [`Red | `Green]);
- : bool = true

However, values of type [`Red] are not accepted, because that type doesn’t contain both of the constructors of the constraint.

# lower(`Red: [`Red]);
Error: This expression has type [ `Red ]
but an expression was expected of type [> `Green | `Red ]
The first variant type does not allow tag(s) `Green

Upper bounds  

The following interaction experiments with the upper bound [< `Red | `Green]:

# let upper = (x: [< `Red | `Green]) => true;
let upper: ([< `Green | `Red ]) => bool = <fun>;

# upper(`Red: [`Red | `Green]); /* OK */
- : bool = true
# upper(`Red: [`Red]); /* OK */
- : bool = true

# upper(`Red: [`Red | `Green | `Blue]);
Error: This expression has type [ `Blue | `Green | `Red ]
but an expression was expected of type [< `Green | `Red ]
The second variant type does not allow tag(s) `Blue

Inferred type constraints  

If you use polymorphic constructors, ReasonML infers type constraints for you. Let’s look at a few examples.

Lower bounds  

# `Int(3);
- : [> `Int(int) ] = `Int(3)

The value `Int(3) has the inferred type [> `Int(int) ] which is compatible with all types that have at least the constructor `Int(int).

With a tuple, you get two separate inferred type constraints:

# (`Red, `Green);
- : ([> `Red ], [> `Green ]) = (`Red, `Green)

On the other hand, the elements of a list must all have the same type, which is why the two inferred constraints are merged:

# [`Red, `Green];
- : list([> `Green | `Red ]) = [`Red, `Green]

This list is accepted whenever the expected type is a list with elements whose type includes at least the constructors `Red and `Green.

If you try to use the same constructor with parameters of different types, you get an error, because ReasonML can’t merge the two inferred types:

# [`Int(3), `Int("abc")];
Error: This expression has type string but
an expression was expected of type int

Upper bounds  

So far, we have only seen lower bounds being inferred. In the following example, ReasonML infers an upper bound for the parameter x:

let f = (x) =>
  switch x {
  | `Red => 1
  | `Green => 2
  };
/* let f: ([< `Green | `Red ]) => int = <fun>; */

Due to the switch expression, f can handle at most the two constructors `Red and `Green.

The inferred type becomes more complex if f returns its parameter:

let f = (x) =>
  switch x {
  | `Red => x
  | `Green => x
  };
/* let f: (([< `Green | `Red ] as 'a)) => 'a = <fun>; */

The type parameter 'a is used to express that the type of the parameter and the type of the result are the same.

Things change if we use as clauses, because they decouple the input type from the output type:

let f = (x) =>
  switch x {
  | `Red as r => r
  | `Green as g => g
  };
/* let f: ([< `Green | `Red ]) => [> `Green | `Red ] = <fun>; */
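This decoupling can be put to practical use. A minimal sketch, assuming two types rg and rgb as defined earlier in this post (the function name widen is made up): as clauses let a function accept the smaller type and return the larger one.

```reason
type rg = [`Red | `Green];
type rgb = [`Red | `Green | `Blue];

/* Returning x directly would fail, because x has the fixed type rg.
   The aliases r and g get fresh lower-bound types, which are
   compatible with the annotated result type rgb. */
let widen = (x: rg): rgb =>
  switch x {
  | `Red as r => r
  | `Green as g => g
  };

let widened = widen(`Green);
```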

The limits of ReasonML’s type system  

Some things go beyond the capabilities of ReasonML’s type system:

let f = (x) =>
  switch x {
  | `Red => x
  | `Green => `Blue
  };
/*
Error: This expression has type [> `Blue ]
but an expression was expected of type [< `Green | `Red ]
The second variant type does not allow tag(s) `Blue
*/

The return type of f is: “the type of x or the type [> `Blue]”. However, there is no way to express that via a constraint.
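If the link between the input type and the output type is not actually needed, an as clause is one possible workaround. In the following sketch, f2 is a hypothetical variant of f; by analogy with the as example in the previous section, its result type should be a lower bound that contains both `Red and `Blue:

```reason
let f2 = (x) =>
  switch x {
  | `Red as r => r /* r has its own type, independent of x */
  | `Green => `Blue
  };

let fromRed = f2(`Red);
let fromGreen = f2(`Green);
```

The price is that the result type no longer says anything about the type of x.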

More complex constraints  

Type constraints can become quite complex. Let’s start with two simple functions:

let even1 = (x) =>
  switch x {
  | `Data(n) => (n mod 2) == 0
  };
/* let even1: ([< `Data(int) ]) => bool = <fun>; */

let even2 = (x) =>
  switch x {
  | `Data(s) => (String.length(s) mod 2) == 0
  };
/* let even2: ([< `Data(string) ]) => bool = <fun>; */

Both inferred types include upper bounds, caused by the switch expressions.

Let’s use the same variable x as a parameter for both functions:

let even = (x) => even1(x) && even2(x);
/* let even: ([< `Data(string & int) ]) => bool = <fun>; */

To type x, ReasonML merges the following two types:

[< `Data(int) ]
[< `Data(string) ]

The result contains the following constructor.

`Data(string & int)

That stands for “a constructor `Data whose type parameter has both type string and type int”. No such value exists, meaning that you can’t call even(): there is no value that is compatible with the type of its parameter.

Type inference uses unification (which is bi-directional)  

When ReasonML computes types via inference, and determines that two types t1, t2 must be equal (e.g. the type of an actual parameter and the type of a formal parameter), it uses unification to solve the equation t1 = t2.

I’ll demonstrate that via several functions. Each of those functions returns its only parameter. The type tp it specifies for the parameter is different from the type tr it specifies for the result. ReasonML will try to unify tp and tr, allowing us to observe that unification is bi-directional.

Two constraints  

If both parameter type and result type are constraints, ReasonML tries to merge the constraints.

# let f = (x: [>`Red]): [< `Red | `Green] => x;
let f: (([< `Green | `Red > `Red ] as 'a)) => 'a = <fun>;
# let f = (x: [>`Red]): [> `Red | `Green] => x;
let f: (([> `Green | `Red ] as 'a)) => 'a = <fun>;

The unified type remains polymorphic – it contains the type variable 'a. The type variable is used to express: “whatever the type of x eventually is, the result has the same type”.

One monomorphic type, one constraint  

If one of the two types is monomorphic (has no type variables) and the other one is a constraint, then the constraint is only used to check the monomorphic type. Due to unification, both types end up being monomorphic.

# let f = (x: [>`Red]): [`Red | `Green] => x;
let f: ([ `Green | `Red ]) => [ `Green | `Red ] = <fun>;
# let f = (x: [`Red]): [< `Red | `Green] => x;
let f: ([ `Red ]) => [ `Red ] = <fun>;

Two monomorphic types  

If both types are monomorphic then they must have the same constructors.

# let f = (x: [`Red]): [`Red | `Green] => x;
Error: This expression has type [ `Red ]
but an expression was expected of type [ `Green | `Red ]
The first variant type does not allow tag(s) `Green

# let f = (x: [`Red | `Green]): [`Red | `Green] => x;
let f: ([ `Green | `Red ]) => [ `Green | `Red ] = <fun>;

Monomorphic type vs. constraint  

One more demonstration of the difference between monomorphic type and constraint. Consider the following two functions:

type rgb = [`Red | `Green | `Blue];

let id1 = (x: rgb) => x;
  /* let id1: (rgb) => rgb = <fun>; */

let id2 = (x:[>rgb]) => x;
  /* let id2: (([> rgb ] as 'a)) => 'a = <fun>; */

Let’s compare the two functions:

  • id1() has a parameter whose type rgb is monomorphic (it has no type variables).
  • id2() has a parameter whose type [>rgb] is a constraint. The definition itself doesn’t look polymorphic, but the computed signature does – there is now a type variable, 'a.

Let’s see what happens if we call these functions with arguments whose type is a new polymorphic variant that has the same constructors as rgb:

type c3 = [ `Blue | `Green | `Red ];

With id1(), the type of x, and therefore the result, is fixed. That is, it stays rgb:

# id1(`Red: c3);
- : rgb = `Red

We were only able to call id1(), because rgb and c3 are the same.

In contrast, with id2(), the type of x is polymorphic and more flexible. During unification, 'a is bound to c3. The result of the function call has type c3 (the value of 'a).

# id2(`Red: c3);
- : c3 = `Red
