Pattern matching in ReasonML: destructuring, switch, if expressions

[2017-12-12] dev, reasonml

Table of contents for this series of posts: “What is ReasonML?”

Update 2017-12-13: Complete rewrite of how patterns are introduced.


In this blog post, we look at three features that are all related to pattern matching: destructuring, switch, and if expressions.

Digression: tuples  

To illustrate patterns and pattern matching, we’ll use tuples. Tuples are basically records whose parts are identified by position (and not by name). The parts of a tuple are called components.

Let’s create a tuple in rtop:

# (true, "abc");
- : (bool, string) = (true, "abc")

The first component of this tuple is the boolean true, the second component is the string "abc". Accordingly, the tuple’s type is (bool, string).

Let’s create one more tuple:

# (1.8, 5, ('a', 'b'));
- : (float, int, (char, char)) = (1.8, 5, ('a', 'b'))

Pattern matching  

Before we can examine destructuring, switch and if, we need to learn their foundation: pattern matching.

Patterns are a programming mechanism that helps with processing data. They serve two purposes:

  • Check what structure data has.
  • Extract parts of data.

This is done by matching patterns against data. Syntactically, patterns work as follows:

  • ReasonML has syntax for creating data. For example: tuples are created by separating data with commas and putting the result in parentheses.
  • ReasonML has syntax for processing data. The syntax of patterns mirrors the syntax for creating data.

Let’s start with simple patterns that support tuples. They have the following syntax:

  • A variable name is a pattern.
    • Examples: x, y, foo
  • A literal is a pattern.
    • Examples: 123, "abc", true
  • A tuple of patterns is a pattern.
    • Examples: (8,x), (3.2,"abc",true), (1, (9, foo))

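Within a single pattern, the same variable name cannot be used in two different locations. That is, the following pattern is illegal: (x, x). The only exception is alternatives (covered below), where every subpattern must bind the same variable names.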

Equality checks  

The simplest patterns don’t have any variables. Matching such patterns is basically the same as an equality check. Let’s look at a few examples:

Pattern            Data              Matches?
3                  3                 yes
1                  3                 no
(true, 12, 'p')    (true, 12, 'p')   yes
(false, 12, 'p')   (true, 12, 'p')   no

So far, we have used the pattern to ensure that the data has the expected structure. As a next step, we introduce variable names. Those make the structural checks more flexible and let us extract data.

Variable names in patterns  

A variable name matches any data at its position and leads to the creation of a variable that is bound to that data.

Pattern   Data     Matches?   Variable bindings
x         3        yes        x = 3
(x, y)    (1, 4)   yes        x = 1, y = 4
(1, y)    (1, 4)   yes        y = 4
(2, y)    (1, 4)   no

The special variable name _ does not create variable bindings and can be used multiple times:

Pattern   Data     Matches?   Variable bindings
(x, _)    (1, 4)   yes        x = 1
(1, _)    (1, 4)   yes
(_, _)    (1, 4)   yes

Alternatives in patterns  

Let’s examine another pattern feature: two or more subpatterns separated by vertical bars form an alternative. Such a pattern matches if one of the subpatterns matches. If a variable name exists in one subpattern, it must exist in all subpatterns.

Examples:

Pattern           Data     Matches?   Variable bindings
1|2|3             1        yes
1|2|3             2        yes
1|2|3             3        yes
1|2|3             4        no
(1|2|3, 4)        (1, 4)   yes
(1|2|3, 4)        (2, 4)   yes
(1|2|3, 4)        (3, 4)   yes
(x, 0) | (0, x)   (1, 0)   yes        x = 1

The as operator: bind and match at the same time  

Until now, you had to decide whether you wanted to bind a piece of data to a variable or to match it via a subpattern. The as operator lets you do both: its left-hand side is a subpattern to match, its right-hand side is the name of a variable that the current data will be bound to.

Pattern           Data          Matches?   Variable bindings
7 as x            7             yes        x = 7
(8, x) as y       (8, 5)        yes        x = 5, y = (8, 5)
((1,x) as y, 3)   ((1,2), 3)    yes        x = 2, y = (1, 2)

There are many more ways of creating patterns  

ReasonML supports more complex data types than just tuples. For example: lists and records. Many of those data types are also supported via pattern matching. More on that in upcoming blog posts.
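
As a small preview, here is a hedged sketch of what patterns for records and lists can look like. The record type point and the function describe are made up for this illustration; the details are left to the later posts.

```reason
/* point is a hypothetical record type, used only for this sketch. */
type point = {
  x: int,
  y: int,
};

/* Record pattern: binds px to the field x and py to the field y. */
let {x: px, y: py} = {x: 3, y: 4};
/* px == 3, py == 4 */

/* List patterns: [] matches the empty list,
   [head, ...rest] matches any non-empty list. */
let describe = (numbers: list(int)) =>
  switch numbers {
  | [] => "empty"
  | [head, ...rest] =>
    string_of_int(head)
    ++ " and "
    ++ string_of_int(List.length(rest))
    ++ " more"
  };
```

As with tuples, the syntax of these patterns mirrors the syntax for creating the data.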

Pattern matching via let (destructuring)  

You can do pattern matching via let. As an example, let’s start by creating a tuple:

# let tuple = (7, 4);
let tuple: (int, int) = (7, 4);

We can use pattern matching to create the variables x and y and bind them to 7 and 4, respectively:

# let (x, y) = tuple;
let x: int = 7;
let y: int = 4;

The variable name _ also works and does not create variables:

# let (_, y) = tuple;
let y: int = 4;
# let (_, _) = tuple;
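
Nested patterns and the as operator work with let, too. The following is a minimal sketch; the names nested, first, pair and last are only chosen for this example.

```reason
let nested = ((1, 2), 3);

/* The pattern mirrors the shape of the data:
   first is bound to 1, pair to the inner tuple (1, 2), last to 3. */
let ((first, _) as pair, last) = nested;
/* first == 1, pair == (1, 2), last == 3 */
```

Because this pattern consists only of variables, underscores and tuples, it matches every value of the type ((int, int), int).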

If a pattern doesn’t match, you get an exception:

# let (1, x) = (5, 5);
Warning: this pattern-matching is not exhaustive.
Exception: Match_failure.

We get two kinds of feedback from ReasonML:

  • At compile time: A warning that there are (int, int) tuples that the pattern doesn’t cover. We’ll look at what that means when we learn switch expressions.
  • At runtime: An exception that matching failed.

Single-branch pattern matching via let is called destructuring. Destructuring can also be used with function parameters (as we’ll see in an upcoming blog post).
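
To give a rough idea of destructuring in function parameters, here is a minimal sketch. addPair is a hypothetical function, not something defined elsewhere in this series.

```reason
/* addPair is a hypothetical helper. Its single parameter is a tuple
   that is destructured into x and y right in the parameter list. */
let addPair = ((x, y)) => x + y;

/* The doubled parentheses pass one argument: the tuple (3, 4). */
let sum = addPair((3, 4));
/* sum == 7 */
```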

switch  

let matched a single pattern against data. With a switch expression, we can try multiple patterns. The first match determines the result of the expression. That looks as follows:

switch «value» {
| «pattern1» => «result1»
| «pattern2» => «result2»
···
}

switch goes through the branches sequentially: the first pattern that matches value leads to the associated expression becoming the result of the switch expression. Let’s look at an example where pattern matching is simple:

let value = 1;
let result = switch value {
| 1 => "one"
| 2 => "two"
};
/* result == "one" */

If the switch value is more than a single entity (variable name, qualified variable name, literal, etc.), it needs to be in parentheses:

let result = switch (1 + 1) {
| 1 => "one"
| 2 => "two"
};
/* result == "two" */

Warnings about exhaustiveness  

When you compile the previous example or enter it in rtop, you get the following compile-time warning:

Warning: this pattern-matching is not exhaustive.

That means: The operand 1 has the type int and the branches do not cover all elements of that type. This warning is very useful, because it tells us that there are cases that we may have missed. That is, we are warned about potential trouble ahead. If there is no warning, switch will always succeed.

If you don’t fix this issue, ReasonML throws a runtime exception when an operand doesn’t have a matching branch:

let result = switch 3 {
| 1 => "one"
| 2 => "two"
};
/* Exception: Match_failure */

One way to make this warning go away is to handle all elements of a type. I’ll briefly sketch how to do that for recursively defined types. These are defined via:

  • One or more (non-recursive) base cases.
  • One or more recursive cases.

For example, for natural numbers, the base case is zero, the recursive case is one plus a natural number. You can cover natural numbers exhaustively with switch via two branches, one for each case. How exactly this works will be described in an upcoming blog post.
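
Until then, the following sketch may give a first impression. It assumes a self-defined type nat (ReasonML’s built-in int is not defined this way) and uses a type definition syntax that this post has not introduced yet.

```reason
/* Hypothetical type for illustration: a natural number is either
   zero (base case) or the successor of a natural number (recursive case). */
type nat =
  | Zero
  | Succ(nat);

/* One branch per case covers every possible value of nat,
   so the compiler has no reason to warn about exhaustiveness. */
let rec toInt = (n: nat): int =>
  switch n {
  | Zero => 0
  | Succ(m) => 1 + toInt(m)
  };

let three = toInt(Succ(Succ(Succ(Zero))));
/* three == 3 */
```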

For now, it’s enough to know that, whenever you can, you should do exhaustive coverage. Then the compiler warns you if you miss a case, preventing a whole category of errors.

If exhaustive coverage is not an option, you can introduce a catch-all branch. The next section shows how to do that.

Variables as patterns  

The warning about exhaustiveness goes away if you add a branch whose pattern is a variable:

let result = switch 3 {
| 1 => "one"
| 2 => "two"
| x => "unknown: " ++ string_of_int(x)
};
/* result == "unknown: 3" */

We have created the new variable x by matching it against the switch value. That new variable can be used in the expression of the branch.

This kind of branch is called “catch-all”: it comes last and is evaluated if all other branches fail. It always succeeds and matches everything. In C-style languages, catch-all branches are called default.

If you just want to match everything and don’t care what is matched, you can use an underscore:

let result = switch 3 {
| 1 => "one"
| 2 => "two"
| _ => "unknown"
};
/* result == "unknown" */

Patterns for tuples  

Let’s implement logical And (&&) via a switch expression:

let tuple = (true, true);

let result = switch tuple {
| (false, false) => false
| (false, true) => false
| (true, false) => false
| (true, true) => true
};
/* result == true */

This code can be simplified by using an underscore and a variable:

let result = switch tuple {
| (false, _) => false
| (true, x) => x
};
/* result == true */

The as operator  

The as operator also works in switch patterns:

let tuple = (8, (5, 9));
let result = switch tuple {
| (0, _) => (0, (0, 0))
| (_, (x, _) as t) => (x, t)
};
/* result == (5, (5, 9)) */

Alternatives in patterns  

Using alternatives in subpatterns looks as follows.

switch someTuple {
| (0, 1 | 2 | 3) => "first branch"
| _ => "second branch"
};

Alternatives can also be used at the top level:

switch "Monday" {
| "Monday"
| "Tuesday"
| "Wednesday"
| "Thursday"
| "Friday" => "weekday"
| "Saturday"
| "Sunday" => "weekend"
| day => "Illegal value: " ++ day
};
/* Result: "weekday" */

Guards for branches  

Guards (conditions) for branches are a switch-specific feature: they come after patterns and are preceded by the keyword when. Let’s look at an example:

let tuple = (3, 4);
let max = switch tuple {
| (x, y) when x > y => x
| (_, y) => y
};
/* max == 4 */

The first branch is only evaluated if the guard x > y is true.
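
If a guard is false, switch simply moves on to the next branch. A minimal sketch of that behavior, using a hypothetical function sign:

```reason
/* sign is a hypothetical helper for this sketch. */
let sign = (n: int) =>
  switch n {
  | 0 => "zero"
  /* x matches any int, but the branch is only taken if the guard holds. */
  | x when x < 0 => "negative"
  /* Reached for every value the previous branches rejected. */
  | _ => "positive"
  };

let result = sign(-5);
/* result == "negative" */
```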

if expressions  

ReasonML’s if expressions look as follows (else can be omitted):

if («bool») «thenExpr» else «elseExpr»;

For example:

# let bool = true;
let bool: bool = true;
# let boolStr = if (bool) "true" else "false";
let boolStr: string = "true";

Given that scope blocks are also expressions, the following two if expressions are equivalent:

if (bool) "true" else "false"
if (bool) {"true"} else {"false"}

In fact, refmt pretty-prints the former expression as the latter.
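
Blocks become more useful once a branch needs intermediate steps. The following sketch assumes a made-up function describeTemp; the value of each block is its last expression.

```reason
/* describeTemp is a hypothetical helper for this sketch. */
let describeTemp = (celsius: float) =>
  if (celsius < 0.0) {
    /* A local binding that is only visible inside this block. */
    let kind = "freezing";
    kind ++ "!"
  } else {
    "not freezing"
  };

let description = describeTemp(-3.5);
/* description == "freezing!" */
```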

The then expression and the else expression must have the same type.

# if (true) 123 else "abc";
Error: This expression has type string
but an expression was expected of type int

Omitting the else branch  

You can omit the else branch – the following two expressions are equivalent.

if (b) expr else ()
if (b) expr

Given that both branches must have the same type, expr must have the type unit (whose only element is ()).

For example, print_string() evaluates to () and the following code works:

# if (true) print_string("hello\n");
hello
- : unit = ()

In contrast, this doesn’t work:

# if (true) "abc";
Error: This expression has type string
but an expression was expected of type unit

The ternary operator (_?_:_)  

ReasonML also gives you the ternary operator as an alternative to if expressions. The following two expressions are equivalent.

if (b) expr1 else expr2
b ? expr1 : expr2

The following two expressions are equivalent, too. refmt even pretty-prints the former as the latter.

switch (b) {
| true => expr1
| false => expr2
};

b ? expr1 : expr2;

I don’t find the ternary operator very useful in ReasonML: its purpose in languages with C syntax is to have an expression version of the if statement. But if is already an expression in ReasonML.