TypeScript: validating external data

[2020-06-09] dev, javascript, typescript

Data validation means ensuring that data has the desired structure and content.

With TypeScript, validation becomes relevant when we receive external data such as:

  • Data parsed from JSON files
  • Data received from web services

In these cases, we expect the data to fit static types we have, but we can’t be sure. Contrast that with data we create ourselves, where TypeScript continuously checks that everything is correct.

This blog post explains how to validate external data in TypeScript.

JSON schema  

Before we can explore approaches for data validation in TypeScript, we need to take a look at JSON schema because several of the approaches are based on it.

The idea behind JSON schema is to express the schema (structure and content, think static type) of JSON data in JSON. That is, metadata is expressed in the same format as data.

The use cases for JSON schema are:

  • Validating JSON data: If we have a schema definition for data, we can use tools to check that the data is correct. One issue with data can also be fixed automatically: We can specify default values that can be used to add properties that are missing.

  • Documenting JSON data formats: On one hand, the core schema definitions can be considered documentation. But JSON schema additionally supports descriptions, deprecation notes, comments, examples, and more. These mechanisms are called annotations. They are not used for validation, but for documentation.

  • IDE support for editing data: For example, Visual Studio Code supports JSON schema. If there is a schema for a JSON file, we gain several editing features: auto-completion, highlighting of errors, etc. Notably, VS Code’s support for package.json files is completely based on a JSON schema.

An example JSON schema  

This example is taken from the json-schema.org website:

{
  "$id": "https://example.com/geographical-location.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Longitude and Latitude Values",
  "description": "A geographical coordinate.",
  "required": [ "latitude", "longitude" ],
  "type": "object",
  "properties": {
    "latitude": {
      "type": "number",
      "minimum": -90,
      "maximum": 90
    },
    "longitude": {
      "type": "number",
      "minimum": -180,
      "maximum": 180
    }
  }
}

The following JSON data is valid w.r.t. this schema:

{
  "latitude": 48.858093,
  "longitude": 2.294694
}
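
To make concrete what "valid w.r.t. this schema" means, here is a minimal hand-written sketch of the same constraints in TypeScript. It is not a JSON schema validator, only an illustration of the checks such a tool would derive from the schema above. The names GeoLocation, isNumberInRange and isGeoLocation are hypothetical and do not come from any library.

```typescript
// Hypothetical static type that corresponds to the example schema.
type GeoLocation = { latitude: number; longitude: number };

// "type": "number" plus "minimum" and "maximum"
function isNumberInRange(value: unknown, min: number, max: number): value is number {
  return typeof value === 'number' && value >= min && value <= max;
}

// "type": "object" plus "required" and "properties"
function isGeoLocation(data: unknown): data is GeoLocation {
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    return false;
  }
  const record = data as Record<string, unknown>;
  return (
    isNumberInRange(record['latitude'], -90, 90) &&
    isNumberInRange(record['longitude'], -180, 180)
  );
}

const parsed: unknown = JSON.parse('{"latitude": 48.858093, "longitude": 2.294694}');
console.log(isGeoLocation(parsed)); // true
console.log(isGeoLocation({ latitude: 120, longitude: 0 })); // false
```

Writing such checks by hand quickly becomes tedious and error-prone, which is why the approaches discussed next rely on libraries.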

Approaches for data validation in TypeScript  

This section provides a brief overview of various approaches for validating data in TypeScript. For each approach, I list one or more libraries that support it. The lists of libraries are not meant to be comprehensive because things change quickly in this space.

Approaches not using JSON schema  

Approaches using JSON schema  

Picking a library  

Which approach (and therefore which library) to use depends on what we need:

  • If we are starting with TypeScript types and want to ensure that data (coming from configuration files, etc.) fits those types, then builder APIs that support static types are a good choice.

  • If our starting point is a JSON schema, then we should consider one of the libraries that support JSON schema.

  • If we are handling data that is messier (e.g. submitted via forms), we may need a more flexible approach where static types play less of a role.

Example: validating data via the library Zod  

Defining a “schema” via Zod’s builder API  

Zod has a builder API that produces both types and validation functions. That API is used as follows:

import * as z from 'zod';

const FileEntryInputSchema = z.union([
  z.string(),
  z.tuple([z.string(), z.string(), z.array(z.string())]),
  z.object({
    file: z.string(),
    author: z.string().optional(),
    tags: z.array(z.string()).optional(),
  }),
]);

For larger schemas, it can make sense to break things up into multiple const declarations.
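
As a sketch of what that could look like, the same schema can be assembled from smaller parts. The names FileEntryTupleSchema and FileEntryObjectSchema are made up for this illustration. The example assumes that the zod package is installed.

```typescript
import * as z from 'zod';

// Hypothetical name: the tuple variant on its own
const FileEntryTupleSchema = z.tuple([
  z.string(),
  z.string(),
  z.array(z.string()),
]);

// Hypothetical name: the object variant on its own
const FileEntryObjectSchema = z.object({
  file: z.string(),
  author: z.string().optional(),
  tags: z.array(z.string()).optional(),
});

// The union is built from the named parts
const FileEntryInputSchema = z.union([
  z.string(),
  FileEntryTupleSchema,
  FileEntryObjectSchema,
]);
```

A side benefit of this style is that each part can be reused or tested separately.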

Zod can produce a static type from FileEntryInputSchema, but I decided to (redundantly!) manually maintain the static type FileEntryInput:

type FileEntryInput =
  | string
  | [string, string, string[]]
  | {file: string, author?: string, tags?: string[]}
  ;

Why the redundancy?

  • It’s easier to read.
  • It helps with migrating to a different validation library or approach, should I ever have to.

Zod’s generated type is still helpful because we can check if it’s assignable to FileEntryInput. That will warn us about most problems related to the two getting out of sync.
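
One possible way of performing such a check purely at the type level is sketched below. The helper AssertAssignable is hypothetical. So that the sketch compiles without Zod, the type ZodInferred is a stand-in copied from the type that Zod infers for FileEntryInputSchema; in real code, it would be z.infer<typeof FileEntryInputSchema>.

```typescript
type FileEntryInput =
  | string
  | [string, string, string[]]
  | { file: string; author?: string; tags?: string[] };

// Stand-in (assumption): in real code, use
// z.infer<typeof FileEntryInputSchema> instead.
type ZodInferred =
  | string
  | [string, string, string[]]
  | { author?: string | undefined; tags?: string[] | undefined; file: string };

// Hypothetical helper: the constraint `Source extends Target` only
// type-checks if Source is assignable to Target.
type AssertAssignable<Target, Source extends Target> = true;

// If the two types drift apart in an incompatible way,
// this line becomes a compile-time error.
const typesInSync: AssertAssignable<FileEntryInput, ZodInferred> = true;
```

The check only goes in one direction: it does not complain if FileEntryInput permits values that the schema rejects.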

Validating data  

The following function checks if the parameter data conforms to FileEntryInputSchema:

import * as assert from 'assert';

function validateData(data: unknown): FileEntryInput {
  return FileEntryInputSchema.parse(data); // may throw an exception
}

validateData(['iceland.txt', 'me', ['vacation', 'family']]); // OK

// The tuple variant requires three elements, so this input is rejected:
assert.throws(
  () => validateData(['iceland.txt', 'me']));

The static type of the result of FileEntryInputSchema.parse() is what Zod derived from FileEntryInputSchema. By making FileEntryInput the return type of validateData(), we ensure that the former type is assignable to the latter.

Type guards  

FileEntryInputSchema.check() is a type guard:

function func(data: unknown) {
  if (FileEntryInputSchema.check(data)) {
    // %inferred-type: string
    // | [string, string, string[]]
    // | { author?: string | undefined; tags?: string[] | undefined; file: string; }
    data;
  }
}

It can make sense to define a custom type guard that supports FileEntryInput instead of what Zod infers:

function isValidData(data: unknown): data is FileEntryInput {
  return FileEntryInputSchema.check(data);
}
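
Zod 3 removed the type guard .check(). Assuming Zod 3 or later is installed, the same custom type guard can be sketched with .safeParse(), which returns a result object instead of throwing an exception:

```typescript
import * as z from 'zod';

const FileEntryInputSchema = z.union([
  z.string(),
  z.tuple([z.string(), z.string(), z.array(z.string())]),
  z.object({
    file: z.string(),
    author: z.string().optional(),
    tags: z.array(z.string()).optional(),
  }),
]);

type FileEntryInput =
  | string
  | [string, string, string[]]
  | { file: string; author?: string; tags?: string[] };

function isValidData(data: unknown): data is FileEntryInput {
  // .success is true if and only if data conforms to the schema
  return FileEntryInputSchema.safeParse(data).success;
}

function describe(data: unknown): string {
  if (isValidData(data)) {
    // Here, data has the static type FileEntryInput
    return typeof data === 'string' ? 'path' : 'structured entry';
  }
  return 'invalid';
}
```

Note that this guard only answers yes or no. If we need the error details, we have to inspect the result of .safeParse() directly.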

Deriving a static type from a Zod schema  

The parameterized type z.infer<Schema> can be used to derive a type from a schema:

// %inferred-type: string
// | [string, string, string[]]
// | { author?: string | undefined; tags?: string[] | undefined; file: string; }
type FileEntryInputStatic = z.infer<typeof FileEntryInputSchema>;

External vs. internal representation of data  

When working with external data, it’s often useful to distinguish two types.

On one hand, there is the type that describes the input data. Its structure is optimized for being easy to author:

type FileEntryInput =
  | string
  | [string, string, string[]]
  | {file: string, author?: string, tags?: string[]}
  ;

On the other hand, there is the type that is used in the program. Its structure is optimized for being easy to use in code:

type FileEntry = {
  file: string,
  author: null|string,
  tags: string[],
};

After we have used Zod to ensure that the input data conforms to FileEntryInput, we use a conversion function that converts the data to a value of type FileEntry.
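
Such a conversion function could look as follows. This is only a sketch: the name toFileEntry is hypothetical, and I am assuming that the tuple elements are, in order, file, author and tags, and that a missing author becomes null and missing tags become an empty Array.

```typescript
type FileEntryInput =
  | string
  | [string, string, string[]]
  | { file: string; author?: string; tags?: string[] };

type FileEntry = {
  file: string;
  author: null | string;
  tags: string[];
};

// Hypothetical helper: normalizes all three input variants
function toFileEntry(input: FileEntryInput): FileEntry {
  if (typeof input === 'string') {
    // Only a file name was provided
    return { file: input, author: null, tags: [] };
  }
  if (Array.isArray(input)) {
    // Assumed tuple order: [file, author, tags]
    const [file, author, tags] = input;
    return { file, author, tags };
  }
  // Object variant: fill in defaults for missing properties
  return {
    file: input.file,
    author: input.author ?? null,
    tags: input.tags ?? [],
  };
}

console.log(toFileEntry('iceland.txt'));
// { file: 'iceland.txt', author: null, tags: [] }
```

The rest of the program then only has to deal with FileEntry and never needs to distinguish between the input variants.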

Conclusion  

My use case for a data validation library was making sure that data matched a given TypeScript type. Therefore, I would have preferred to directly compile the type to a validation function. So far, only the Babel macro typecheck.macro does that, and its dependence on Babel ruled it out for me. I think I would also be OK with a tool that compiles a TypeScript type to a separate module with a validation function. But that approach also has usability downsides.

Therefore, Zod currently is a good solution for me and I haven’t had any regrets.

For libraries that have a builder API, I’d like to have tools that compile TypeScript types to builder API invocations (online and via a command line). This would help in two ways:

  • The tools can be used to explore how the APIs work.
  • We have the option of producing API code via the tools.