Parsing command line arguments with util.parseArgs() in Node.js

[2022-08-04] dev, javascript, nodejs
Warning: This blog post is outdated. Instead, read chapter “Parsing command line arguments with util.parseArgs()” in “Shell scripting with Node.js”.

In this blog post, we explore how to use the Node.js function parseArgs() from module node:util to parse command line arguments.

Imports that are implied in this blog post  

The following two imports are implied in every example in this post:

import * as assert from 'node:assert/strict';
import {parseArgs} from 'node:util';

The first import is for test assertions we use to check values. The second import is for function parseArgs() that is the topic of this post.

The steps involved in processing command line arguments  

The following steps are involved in processing command line arguments:

  1. The user inputs a text string.
  2. The shell parses the string into a sequence of words and operators.
  3. If a command is called, it gets zero or more words as arguments.
  4. Our Node.js code receives the words via an Array stored in process.argv. process is a global variable on Node.js.
  5. We use parseArgs() to turn that Array into something that is more convenient to work with.

Let’s use the following shell script args.mjs with Node.js code to see what process.argv looks like:

#!/usr/bin/env node
console.log(process.argv);

We start with a simple command:

% ./args.mjs one two
[ '/usr/bin/node', '/home/john/args.mjs', 'one', 'two' ]

If we install the command via npm on Windows, the same command produces the following result on the Windows Command shell:

[
  'C:\\Program Files\\nodejs\\node.exe',
  'C:\\Users\\jane\\args.mjs',
  'one',
  'two'
]

No matter how we invoke a shell script, process.argv always starts with the path of the Node.js binary that is used to run our code. Next is the path of our script. The Array ends with the actual arguments that were passed to the script. In other words: The arguments of a script always start at index 2.

Therefore, we change our script so that it looks like this:

#!/usr/bin/env node
console.log(process.argv.slice(2));

Let’s try more complicated arguments:

% ./args.mjs --str abc --bool home.html main.js
[ '--str', 'abc', '--bool', 'home.html', 'main.js' ]

These arguments consist of:

  • Option --str whose value is the text abc. Such an option is called a string option.
  • Option --bool which has no associated value – it’s a flag that’s either there or not. Such an option is called a boolean option.
  • Two so-called positional arguments which have no names: home.html and main.js.

Two styles of using arguments are common:

  • The main arguments are positional, options provide additional – often optional – information.
  • Only options are used.

Written as a JavaScript function call, the previous example would look like this (in JavaScript, options usually come last):

argsMjs('home.html', 'main.js', {str: 'abc', bool: true});

Parsing command line arguments  

The basics  

If we want parseArgs() to parse an Array with arguments, we first need to tell it how our options work. Let’s assume our script has:

  • A boolean option --verbose
  • An option --times that receives non-negative integers. parseArgs() has no special support for numbers, so we have to make it a string option.
  • A string option --color

We describe these options to parseArgs() as follows:

const options = {
  'verbose': {
    type: 'boolean',
    short: 'v',
  },
  'color': {
    type: 'string',
    short: 'c',
  },
  'times': {
    type: 'string',
    short: 't',
  },
};

As long as a property key of options is a valid JavaScript identifier, it is up to you if you want to quote it or not. Both have pros and cons. In this blog post, they are always quoted. That way, options with non-identifier names such as my-new-option look the same as those with identifier names.

Each entry in options can have the following properties (as defined via a TypeScript type):

type Options = {
  type: 'boolean' | 'string', // required
  short?: string, // optional
  multiple?: boolean, // optional, default `false`
};

  • .type specifies if an option is boolean or string.
  • .short defines the short version of an option. It must be a single character. We’ll see soon how to use short versions.
  • .multiple indicates if an option can be used at most once or zero or more times. We’ll see later what that means.

The following code uses parseArgs() and options to parse an Array with arguments:

assert.deepEqual(
  parseArgs({options, args: [
    '--verbose', '--color', 'green', '--times', '5'
  ]}),
  {
    values: {__proto__:null,
      verbose: true,
      color: 'green',
      times: '5'
    },
    positionals: []
  }
);

The prototype of the object stored in .values is null. That means that we can use the in operator to check if a property exists, without having to worry about inherited properties such as .toString.

As mentioned before, the number 5 that is the value of --times, is processed as a string.

The object we pass to parseArgs() has the following TypeScript type:

type ParseArgsProps = {
  options?: {[key: string]: Options}, // optional, default: {}
  args?: Array<string>, // optional
    // default: process.argv.slice(2)
  strict?: boolean, // optional, default `true`
  allowPositionals?: boolean, // optional, default `false`
};

  • .args: The arguments to parse. If we omit this property, parseArgs() uses process.argv, starting with the element at index 2.
  • .strict: If true, an exception is thrown if args isn’t correct. More on that later.
  • .allowPositionals: Can args contain positional arguments?

This is the type of the result of parseArgs():

type ParseArgsResult = {
  values: {[key: string]: ValuesValue}, // an object
  positionals: Array<string>, // always an Array
};
type ValuesValue = boolean | string | Array<boolean|string>;

  • .values contains the optional arguments. We have already seen strings and booleans as property values. We’ll see Array-valued properties when we explore option definitions where .multiple is true.
  • .positionals contains the positional arguments.

Two hyphens are used to refer to the long version of an option. One hyphen is used to refer to the short version:

assert.deepEqual(
  parseArgs({options, args: ['-v', '-c', 'green']}),
  {
    values: {__proto__:null,
      verbose: true,
      color: 'green',
    },
    positionals: []
  }
);

Note that .values contains the long names of the options.

We conclude this subsection by parsing positional arguments that are mixed with optional arguments:

assert.deepEqual(
  parseArgs({
    options,
    allowPositionals: true,
    args: [
      'home.html', '--verbose', 'main.js', '--color', 'red', 'post.md'
    ]
  }),
  {
    values: {__proto__:null,
      verbose: true,
      color: 'red',
    },
    positionals: [
      'home.html', 'main.js', 'post.md'
    ]
  }
);

Using options multiple times  

If we use an option multiple times, the default is that only the last time counts. It overrides all previous occurrences:

const options = {
  'bool': {
    type: 'boolean',
  },
  'str': {
    type: 'string',
  },
};

assert.deepEqual(
  parseArgs({
    options, args: [
      '--bool', '--bool', '--str', 'yes', '--str', 'no'
    ]
  }),
  {
    values: {__proto__:null,
      bool: true,
      str: 'no'
    },
    positionals: []
  }
);

If, however, we set .multiple to true in the definition of an option, parseArgs() gives us all option values in an Array:

const options = {
  'bool': {
    type: 'boolean',
    multiple: true,
  },
  'str': {
    type: 'string',
    multiple: true,
  },
};

assert.deepEqual(
  parseArgs({
    options, args: [
      '--bool', '--bool', '--str', 'yes', '--str', 'no'
    ]
  }),
  {
    values: {__proto__:null,
      bool: [ true, true ],
      str: [ 'yes', 'no' ]
    },
    positionals: []
  }
);

More ways of using long and short options  

Consider the following options:

const options = {
  'verbose': {
    type: 'boolean',
    short: 'v',
  },
  'silent': {
    type: 'boolean',
    short: 's',
  },
  'color': {
    type: 'string',
    short: 'c',
  },
};

The following is a compact way of using multiple boolean options:

assert.deepEqual(
  parseArgs({options, args: ['-vs']}),
  {
    values: {__proto__:null,
      verbose: true,
      silent: true,
    },
    positionals: []
  }
);

We can directly attach the value of a long string option via an equals sign. That is called an inline value.

assert.deepEqual(
  parseArgs({options, args: ['--color=green']}),
  {
    values: {__proto__:null,
      color: 'green'
    },
    positionals: []
  }
);

Short options can’t have inline values.

Quoting values  

So far, all option values and positional values were single words. If we want to use values that contain spaces, we need to quote them – with double quotes or single quotes. The latter is not supported by all shells, however.

How shells parse quoted values  

To examine how shells parse quoted values, we again use the script args.mjs:

#!/usr/bin/env node
console.log(process.argv.slice(2));

On Unix, these are the differences between double quotes and single quotes:

  • Double quotes: we can escape quotes with backslashes (which are otherwise passed on verbatim) and variables are interpolated:

    % ./args.mjs "say \"hi\"" "\t\n" "$USER"
    [ 'say "hi"', '\\t\\n', 'rauschma' ]
    
  • Single quotes: all content is passed on verbatim and we can’t escape quotes:

    % ./args.mjs 'back slash\' '\t\n' '$USER' 
    [ 'back slash\\', '\\t\\n', '$USER' ]
    

The following interaction demonstrates option values that are double-quoted and single-quoted:

% ./args.mjs --str "two words" --str 'two words'
[ '--str', 'two words', '--str', 'two words' ]

% ./args.mjs --str="two words" --str='two words'
[ '--str=two words', '--str=two words' ]

% ./args.mjs -s "two words" -s 'two words'
[ '-s', 'two words', '-s', 'two words' ]

In the Windows Command shell, single quotes are not special in any way:

>node args.mjs "say \"hi\"" "\t\n" "%USERNAME%"
[ 'say "hi"', '\\t\\n', 'jane' ]

>node args.mjs 'back slash\' '\t\n' '%USERNAME%'
[ "'back", "slash\\'", "'\\t\\n'", "'jane'" ]

Quoted option values in the Windows Command shell:

>node args.mjs --str 'two words' --str "two words"
[ '--str', "'two", "words'", '--str', 'two words' ]

>node args.mjs --str='two words' --str="two words"
[ "--str='two", "words'", '--str=two words' ]

>node args.mjs -s "two words" -s 'two words'
[ '-s', 'two words', '-s', "'two", "words'" ]

In Windows PowerShell, we can quote with single quotes, variable names are not interpolated inside quotes and single quotes can’t be escaped:

> node args.mjs "say `"hi`"" "\t\n" "%USERNAME%"
[ 'say hi', '\\t\\n', '%USERNAME%' ]
> node args.mjs 'backtick`' '\t\n' '%USERNAME%'
[ 'backtick`', '\\t\\n', '%USERNAME%' ]

How parseArgs() handles quoted values  

This is how parseArgs() handles quoted values:

const options = {
  'times': {
    type: 'string',
    short: 't',
  },
  'color': {
    type: 'string',
    short: 'c',
  },
};

// Quoted external option values
assert.deepEqual(
  parseArgs({
    options,
    args: ['-t', '5 times', '--color', 'light green']
  }),
  {
    values: {__proto__:null,
      times: '5 times',
      color: 'light green',
    },
    positionals: []
  }
);

// Quoted inline option values
assert.deepEqual(
  parseArgs({
    options,
    args: ['--color=light green']
  }),
  {
    values: {__proto__:null,
      color: 'light green',
    },
    positionals: []
  }
);

// Quoted positional values
assert.deepEqual(
  parseArgs({
    options, allowPositionals: true,
    args: ['two words', 'more words']
  }),
  {
    values: {__proto__:null,
    },
    positionals: [ 'two words', 'more words' ]
  }
);

Option terminators  

parseArgs() supports so-called option terminators: If one of the elements of args is a double hyphen (--), then the remaining arguments are all treated as positional.

Where are option terminators needed? Some executables invoke other executables, e.g. the node executable. Then an option terminator can be used to separate the caller’s arguments from the callee’s arguments.

This is how parseArgs() handles option terminators:

const options = {
  'verbose': {
    type: 'boolean',
  },
  'count': {
    type: 'string',
  },
};

assert.deepEqual(
  parseArgs({options, allowPositionals: true,
    args: [
      'how', '--verbose', 'are', '--', '--count', '5', 'you'
    ]
  }),
  {
    values: {__proto__:null,
      verbose: true
    },
    positionals: [ 'how', 'are', '--count', '5', 'you' ]
  }
);

Strict parseArgs()  

If the option .strict is true (which is the default), then parseArgs() throws an exception if one of the following things happens:

  • The name of an option used in args is not in options.
  • An option in args has the wrong type. Currently that only happens if a string option is missing an argument.
  • There are positional arguments in args even though .allowPositionals is false (which is the default).

The following code demonstrates each of these cases:

const options = {
  'str': {
    type: 'string',
  },
};

// Unknown option name
assert.throws(
  () => parseArgs({
      options,
      args: ['--unknown']
    }),
  {
    name: 'TypeError',
    message: "Unknown option '--unknown'",
  }
);

// Wrong option type (missing value)
assert.throws(
  () => parseArgs({
      options,
      args: ['--str']
    }),
  {
    name: 'TypeError',
    message: "Option '--str <value>' argument missing",
  }
);

// Unallowed positional
assert.throws(
  () => parseArgs({
      options,
      allowPositionals: false, // (the default)
      args: ['posarg']
    }),
  {
    name: 'TypeError',
    message: "Unexpected argument 'posarg'. " +
      "This command does not take positional arguments",
  }
);

parseArgs tokens  

parseArgs() processes the args Array in two phases:

  • Phase 1: It parses args into an Array of tokens: These tokens are mostly the elements of args annotated with type information: Is it an option? Is it a positional? Etc. However, if an option has a value then the token stores both option name and option value and therefore contains the data of two args elements.
  • Phase 2: It assembles the tokens into the object that is returned via the result property .values.

We can get access to the tokens if we set config.tokens to true. Then the object returned by parseArgs() contains a property .tokens with the tokens.

These are the properties of tokens:

type Token = OptionToken | PositionalToken | OptionTerminatorToken;

interface CommonTokenProperties {
  /** Where in `args` does the token start? */
  index: number;
}

interface OptionToken extends CommonTokenProperties {
  kind: 'option';

  /** Long name of option */
  name: string;

  /** The option name as mentioned in `args` */
  rawName: string;

  /** The option’s value. `undefined` for boolean options. */
  value: string | undefined;

  /** Is the option value specified inline (e.g. --level=5)? */
  inlineValue: boolean | undefined;
}

interface PositionalToken extends CommonTokenProperties {
  kind: 'positional';

  /** The value of the positional, args[token.index] */
  value: string;
}

interface OptionTerminatorToken extends CommonTokenProperties {
  kind: 'option-terminator';
}

Examples of tokens  

As an example, consider the following options:

const options = {
  'bool': {
    type: 'boolean',
    short: 'b',
  },
  'flag': {
    type: 'boolean',
    short: 'f',
  },
  'str': {
    type: 'string',
    short: 's',
  },
};

The tokens for boolean options look like this:

assert.deepEqual(
  parseArgs({
    options, tokens: true,
    args: [
      '--bool', '-b', '-bf',
    ]
  }),
  {
    values: {__proto__:null,
      bool: true,
      flag: true,
    },
    positionals: [],
    tokens: [
      {
        kind: 'option',
        name: 'bool',
        rawName: '--bool',
        index: 0,
        value: undefined,
        inlineValue: undefined
      },
      {
        kind: 'option',
        name: 'bool',
        rawName: '-b',
        index: 1,
        value: undefined,
        inlineValue: undefined
      },
      {
        kind: 'option',
        name: 'bool',
        rawName: '-b',
        index: 2,
        value: undefined,
        inlineValue: undefined
      },
      {
        kind: 'option',
        name: 'flag',
        rawName: '-f',
        index: 2,
        value: undefined,
        inlineValue: undefined
      },
    ]
  }
);

Note that there are three tokens for option bool because it is mentioned three times in args. However, due to phase 2 of parsing, there is only one property for bool in .values.

In the next example, we parse string options into tokens. .inlineValue has boolean values now (it is always undefined for boolean options):

assert.deepEqual(
  parseArgs({
    options, tokens: true,
    args: [
      '--str', 'yes', '--str=yes', '-s', 'yes',
    ]
  }),
  {
    values: {__proto__:null,
      str: 'yes',
    },
    positionals: [],
    tokens: [
      {
        kind: 'option',
        name: 'str',
        rawName: '--str',
        index: 0,
        value: 'yes',
        inlineValue: false
      },
      {
        kind: 'option',
        name: 'str',
        rawName: '--str',
        index: 2,
        value: 'yes',
        inlineValue: true
      },
      {
        kind: 'option',
        name: 'str',
        rawName: '-s',
        index: 3,
        value: 'yes',
        inlineValue: false
      }
    ]
  }
);

Lastly, this is an example of parsing positional arguments and an option terminator:

assert.deepEqual(
  parseArgs({
    options, allowPositionals: true, tokens: true,
    args: [
      'command', '--', '--str', 'yes', '--str=yes'
    ]
  }),
  {
    values: {__proto__:null,
    },
    positionals: [ 'command', '--str', 'yes', '--str=yes' ],
    tokens: [
      { kind: 'positional', index: 0, value: 'command' },
      { kind: 'option-terminator', index: 1 },
      { kind: 'positional', index: 2, value: '--str' },
      { kind: 'positional', index: 3, value: 'yes' },
      { kind: 'positional', index: 4, value: '--str=yes' }
    ]
  }
);

Using tokens to implement subcommands  

By default, parseArgs() does not support subcommands such as git clone or npm install. However, it is relatively easy to implement this functionality via tokens.

This is the implementation:

function parseSubcommand(config) {
  // The subcommand is a positional, allow them
  const {tokens} = parseArgs({
    ...config, tokens: true, allowPositionals: true
  });
  const firstPosToken = tokens.find(({kind}) => kind==='positional');
  if (!firstPosToken) {
    throw new Error('Command name is missing: ' + config.args);
  }

  //----- Command options

  const cmdArgs = config.args.slice(0, firstPosToken.index);
  // Override `config.args`
  const commandResult = parseArgs({
    ...config, args: cmdArgs, tokens: false, allowPositionals: false
  });

  //----- Subcommand

  const subcommandName = firstPosToken.value;

  const subcmdArgs = config.args.slice(firstPosToken.index+1);
  // Override `config.args`
  const subcommandResult = parseArgs({
    ...config, args: subcmdArgs, tokens: false
  });

  return {
    commandResult,
    subcommandName,
    subcommandResult,
  };
}

This is parseSubcommand() in action:

const options = {
  'log': {
    type: 'string',
  },
  color: {
    type: 'boolean',
  }
};
const args = ['--log', 'all', 'print', '--color', 'file.txt'];
const result = parseSubcommand({options, allowPositionals: true, args});

const pn = obj => Object.setPrototypeOf(obj, null);
assert.deepEqual(
  result,
  {
    commandResult: {
      values: pn({'log': 'all'}),
      positionals: []
    },
    subcommandName: 'print',
    subcommandResult: {
      values: pn({color: true}),
      positionals: ['file.txt']
    }
  }
);