The new 2ality blog setup: statically generated via isomorphic React, hosted on Amazon S3

[2017-03-21] dev, javascript, nodejs, static site generation, 2ality

The new setup for the 2ality blog was literally years in the making: First, I experimented with various approaches. Then fine-tuning took a while, too. In this blog post I explain the details.

Why static site generation?  

For years, using Google’s Blogger as the host of my blog worked well, but I wanted a simpler design and more control. I could have gone with a server-side blogging app such as Ghost. But I decided in favor of static site generation for several reasons:

  • Hosting content via Amazon Web Services is relatively cheap.
  • Much safer w.r.t. hacking. If you host the blogging app yourself, missing a security-critical update is a constant risk.
  • More robust (unless AWS goes down ;-)

Now I’m fully in control of my content, which enables future improvements and helps with little things such as batch-editing files.

Why write your own static site generator?  

I tried several static site generators, but there was always something missing. I found that customizing them until I got what I wanted was more work than writing my own generator. Let me make it clear that the impulse to roll your own is usually not a good one, but when it comes to static site generation, you can rely on so many libraries that a generation framework does not bring that much to the table.

Whatever the solution, it had to be tweakable in JavaScript, because that’s the language I’m currently most comfortable with.

An abandoned first approach  

I first experimented with assembling the site via nested Handlebars templates. I quickly encountered two problems:

  1. I didn’t like having so much logic encoded in custom syntax in external files.
  2. More importantly, however, there were pockets of interactive content that I wrote in React and that had to be maintained and built separately and then inserted into the static content.

Static site generation via isomorphic React  

Static site generation via isomorphic React involves the following steps:

  1. Every blog post is represented as a JSON(-compatible) object containing the content as HTML plus metadata such as the title and tags.
  2. At generation time, one uses React to render that data as HTML, which is then written to disk. So far, we are still in traditional static site generation territory. This approach differs in that the HTML contains annotations. These annotations allow React to install itself into the page on the client, turning it into a dynamic web app. React needs the JSON data mentioned in the first step, which is why this data is included in the HTML file.
  3. The HTML file is uploaded to a static web server.
  4. When a user goes to the post’s web page, they immediately see content (due to the static HTML). The React-based dynamic web app starts up in the background. But even if JavaScript is switched off, most of the blog still works.

This approach neatly fixes the aforementioned problems related to static site generation:

  1. You can use familiar React syntax when writing your templates (again, not that big of a deal, but nice if you know React).
  2. It is very easy to make any part of a page interactive, in a way that is integrated well into the page-as-an-app. Additionally, graceful degradation is automatic.

Let’s take a look at code. For each page, you need the following parts:

  • SoloPage.js: one specific kind of page (and page layout), implemented as a React component that renders a JSON object as HTML. Invoked both at generation time and at runtime (on the client).
  • PageFrame.js: a site-global frame with links etc. that is wrapped around each page.
  • buildSoloPage.js: generates the page and writes it to disk.
  • page-template.ejs: is used by buildSoloPage.js. Blanks to be filled in include the static React-generated content and the JSON data.

PageFrame.js: site-global frame around pages  

export default class PageFrame extends React.Component {
  render() {
    return <div>
      <TopRow pageData={this.props.pageData} />
      <div id="bottom_row">
        {this.props.children}
        <RightColumn pageData={this.props.pageData} />
      </div>
    </div>;
  }
}

I’ve omitted the React components TopRow and RightColumn. PageFrame receives the aforementioned JSON object via the property pageData and passes it on to both. RightColumn shows a widget with the ten most popular posts of the last 30 days; that widget renders its contents from data stored in pageData.

SoloPage.js: React component for the page content  

The SoloPage component wraps the PageFrame around the core of the page and passes on the JSON pageData:

···
import PageFrame from '../PageFrame';
···

export default class SoloPage extends React.Component {
  render() {
    return <PageFrame pageData={this.props.pageData}>
      <SoloPageCore pageData={this.props.pageData} />
    </PageFrame>;
  }
}

The core of the page displays the HTML:

class SoloPageCore extends React.Component {
  render() {
    return <div className="number-headings" id="pageCore">
      <h1 dangerouslySetInnerHTML={{ __html: this.props.pageData.titleHtml }} />
      <div dangerouslySetInnerHTML={{ __html: this.props.pageData.soloHtml }} />
    </div>;
  }
}

The last part of the file exists because SoloPage.js does double duty:

  1. At generation time, SoloPage is used to render HTML.
  2. In the browser, the file is run as a script and renders SoloPage into a <div> with the ID 'reactHtml'.

Step 2 is performed by the following code:

if (typeof window !== 'undefined') {
  render((
    <SoloPage pageData={pageData} />
  ), document.getElementById('reactHtml'));
}

The JSON object pageData comes from a global variable. How it gets there is shown in the subsection on page-template.ejs.

buildSoloPage.js: writing the file to disk  

The following function is only executed at build time:

···
import {renderToString} from 'react-dom/server';
import SoloPage from './SoloPage';
···

export default function buildSoloPage(pageData, targetPath) {
    const reactHtml = renderToString(<SoloPage pageData={pageData} />);
    const fileText = skeletonTemplate({
        pageData,
        pageBundle: 'SoloPage.bundle.js',
        reactHtml,
    });
    fs.writeFileSync(targetPath, fileText, { encoding: 'utf8' });
}

skeletonTemplate() applies page-template.ejs to its parameter and returns the resulting string.

page-template.ejs: template for HTML page  

EJS templates can be visually a bit jarring, but are convenient in that the templating logic is expressed in JavaScript. I’ve written a blog post about how you can improve their syntax, but I’m not using that technique here.

<!doctype html>
<html>

<head>
  <meta charset="utf-8">
  <title>
    <%=pageData.titleText%>
  </title>
</head>

<body>
  <div id="reactHtml"><%-reactHtml%></div>
  <script>
    var pageData = <%-JSON.stringify(pageData)%>;
  </script>
  <script src="<%=pageBundle%>"></script>
</body>

</html>

There are four blanks to be filled in:

  • <%=pageData.titleText%>: escapes and displays the title of the page.
  • <%-reactHtml%>: displays the unescaped HTML generated statically via React.
  • <%-JSON.stringify(pageData)%>: inserts the JSON data into the page (unescaped).
  • <%=pageBundle%> points to the webpack-built bundle whose entry point is SoloPage.js. Each kind of page has its own bundle.
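
One caveat with inserting JSON unescaped into a <script> element: if any string in pageData contains "</script>", the browser ends the script early. The following is a minimal sketch of a common precaution. It is not part of the setup described here, and the helper name serializeForInlineScript is hypothetical.

```javascript
// Hypothetical helper (an assumption, not from the original setup):
// serializes data so that it is safe inside an inline <script> element.
function serializeForInlineScript(data) {
  return JSON.stringify(data)
    // "<" could start "</script>" or "<!--". Inside a JSON string,
    // "\u003c" denotes the same character.
    .replace(/</g, '\\u003c')
    // U+2028 and U+2029 are legal in JSON strings, but were line
    // terminators in JavaScript string literals before ES2019.
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}
```

If the helper were passed to the template along with the other data, the blank could become <%-serializeForInlineScript(pageData)%>. The result is still valid JSON, so the client-side code would not need to change.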

The full generation algorithm  

Basic idea:

  • All of the data of a website is stored in a project directory proj/
  • The actual content is stored in Markdown files in proj/content/.
  • The output is written to a directory target/.

To produce the output, one iterates over all files in proj/content/:

  • Content files are translated to HTML. For now, I only have content stored in Markdown files that is translated to HTML and wrapped in React components. In the future, I may support other content, e.g. JSON data.
  • All other files are copied verbatim.
  • Files starting with a dot or an underscore are ignored.

How a Markdown file is translated to HTML is determined via its path, not via metadata stored inside it. Paths are described via globs, patterns with wildcard characters.

  • Blog posts: show up in index pages, the archive and the Atom feed.
    • Input: proj/content/YYYY/MM/file-name.md
    • Output: target/YYYY/MM/file-name.html
    • Path glob: [0-9][0-9][0-9][0-9]/[0-9][0-9]/+([-_a-z0-9]).md
  • Solo pages: are independent web pages.
    • Input: proj/content/p/file-name.md
    • Output: target/p/file-name.html
    • Path glob: p/*.md
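
The path-based rules above can be sketched as a single function. This is an approximation under stated assumptions: the regular expressions are stand-ins for the two globs (the real generator matches globs via minimatch), planFile is a hypothetical name, and "starting with a dot or an underscore" is taken to refer to the file name, not to parent directories.

```javascript
// Regex stand-ins for the two path globs (assumption: approximate only).
const BLOG_POST = /^[0-9]{4}\/[0-9]{2}\/[-_a-z0-9]+\.md$/;
const SOLO_PAGE = /^p\/[^/]+\.md$/;

// relPath is relative to proj/content/ and uses forward slashes.
function planFile(relPath) {
  const baseName = relPath.slice(relPath.lastIndexOf('/') + 1);
  if (baseName.startsWith('.') || baseName.startsWith('_')) {
    return { action: 'ignore' };
  }
  const kind = BLOG_POST.test(relPath) ? 'blogPost'
    : SOLO_PAGE.test(relPath) ? 'soloPage'
    : null;
  if (kind === null) {
    // Not a content file: copied verbatim
    return { action: 'copy', target: 'target/' + relPath };
  }
  return {
    action: 'render',
    kind,
    target: 'target/' + relPath.replace(/\.md$/, '.html'),
  };
}
```

For example, planFile('2017/02/babel-preset-env.md') yields a 'render' action of kind 'blogPost' with the target target/2017/02/babel-preset-env.html.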

So far, we have only looked at pages, where output is produced from single files. Additionally, there are so-called summaries that produce HTML files via input collected from multiple pages. Summaries include index pages (index.html etc.) and the Atom feed.

In-file metadata  

In order to minimize in-file metadata, I decided against using a markup language (JSON, YAML, etc.). This is what the preamble of the blog post 2ality_com_proj/content/2017/02/babel-preset-env.md looks like:

<!--
created: "2017-02-22"
tags: ["esnext", "dev", "javascript", "babel"]
-->

# `babel-preset-env`: a preset that configures Babel for you

I wanted the files to look nice in Markdown preview, which is why the metadata is wrapped in a comment and the title of the post is a normal Markdown heading.
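
A preamble like this is easy to parse. The following is a minimal sketch, not the actual generator code: parsePost is a hypothetical name, and I assume that every metadata line has the form key: value, where the value is valid JSON (as in the example above), and that lines end with \n.

```javascript
// Hypothetical parser for the preamble format shown above.
// Assumption: each metadata line is `key: value` with a JSON value.
function parsePost(markdown) {
  const match = /^<!--\n([\s\S]*?)\n-->\n+# (.*)\n?/.exec(markdown);
  if (match === null) {
    throw new Error('Missing metadata comment or title heading');
  }
  const [whole, metaText, titleMarkdown] = match;
  const meta = {};
  for (const line of metaText.split('\n')) {
    const colon = line.indexOf(':');
    if (colon < 0) continue; // skip blank or malformed lines
    meta[line.slice(0, colon).trim()] = JSON.parse(line.slice(colon + 1));
  }
  // Everything after the title heading is the body of the post
  return { meta, titleMarkdown, bodyMarkdown: markdown.slice(whole.length) };
}
```

Applied to the file above, meta.created would be the string "2017-02-22" and meta.tags an Array with four strings. The title is still Markdown at this point and would have to be rendered separately.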

Summaries  

During generation, I collect the metadata of all blog posts in a big Array in RAM. That data is then used to generate so-called summaries:

  • Index files show the pages in reverse chronological order and allow you to browse through the site: index.html, i/index2.html, etc.
  • An Atom feed can be used to subscribe to site updates: feeds/feed.atom, feeds/feed-2016.atom, etc.
    • The current year is in feed.atom. The remaining feed entries are grouped by year, to make the file structure stable and help with caching.
  • The Archive page lets you interactively browse the blog’s contents. A JSON file with all of the blog’s metadata is still reasonably small, enabling me to use static generation for this feature, too.
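
How the collected metadata maps to summary files can be sketched as follows. The function names are hypothetical and I assume that each metadata object has a property created in the format YYYY-MM-DD. The file names are the ones listed above.

```javascript
// Assumption: post.created is a string such as "2017-02-22".
function feedFileFor(post, currentYear) {
  const year = Number(post.created.slice(0, 4));
  return year === currentYear
    ? 'feeds/feed.atom'
    : `feeds/feed-${year}.atom`;
}

// Groups posts by feed file, newest posts first within each group.
function groupByFeedFile(posts, currentYear) {
  // ISO dates sort correctly when compared as plain strings
  const sorted = [...posts].sort((a, b) =>
    a.created < b.created ? 1 : a.created > b.created ? -1 : 0);
  const groups = new Map();
  for (const post of sorted) {
    const file = feedFileFor(post, currentYear);
    if (!groups.has(file)) groups.set(file, []);
    groups.get(file).push(post);
  }
  return groups;
}

// Page 1 is index.html, later pages are i/index2.html, i/index3.html, ...
function indexFileFor(pageNumber) {
  return pageNumber === 1 ? 'index.html' : `i/index${pageNumber}.html`;
}
```

Because only feed.atom receives new entries, the files for past years stay byte-identical between runs, which is what helps with caching.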

Other features  

  • Top 10 blog posts: I show a widget in a sidebar that displays the top 10 most popular blog posts of the last 30 days. I collect the data for the top 10 at generation time, from Google Analytics. How I do that is described in a separate blog post. I download the data at most once a day and cache it in between.

  • Comments are handled by Disqus. Migrating Disqus from Blogger to my static site was remarkably easy, because Disqus lets you specify a canonical URL for each page via JavaScript. Blogger was hosted at www.2ality.com; my new site is hosted at 2ality.com. Thus, the new site tells Disqus to use the domain www.2ality.com and all existing comments are where you’d expect them to be.

  • I’m using Google Custom Search for content-related searches. For now, I’m simply linking to an external page. I may customize and embed it in the future.
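
The Disqus part of the list above can be sketched like this. Disqus reads its settings from a global function disqus_config, which it calls with `this` bound to a configuration object; this.page.url is the canonical URL. Everything else is an assumption for illustration: the helper names, the "http" scheme and that the old Blogger paths are identical to the new ones.

```javascript
// Assumptions: scheme and identical paths on the old and the new site.
const LEGACY_ORIGIN = 'http://www.2ality.com';

function legacyUrlFor(pagePath) {
  return LEGACY_ORIGIN + pagePath;
}

// Returns a function suitable for the global variable disqus_config.
// Disqus invokes it with `this` referring to its configuration object.
function makeDisqusConfig(pagePath) {
  return function () {
    this.page.url = legacyUrlFor(pagePath);
  };
}
```

In the browser, one would assign the result before loading the Disqus embed script, e.g. window.disqus_config = makeDisqusConfig(location.pathname).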

CSS: tips and techniques  

Making sites responsive is hard and involves lots of trial and error. Safari’s and Chrome’s responsive design modes helped. But they came with their own challenges. For example, what you see in Desktop Safari for the iPad screen size is not what you see on an actual iPad (the layout is different).

Tips and techniques:

  • Google Fonts is a great resource with lots of web fonts. There are articles on the web that provide recommendations for font pairing – Google Fonts that go together well for headings and bodies.

  • Flexbox: helps a lot with responsive layout. Alas, if you want to responsively rearrange items across axes, you are out of luck. I’m looking forward to CSS grid in this regard.

  • Displaying code: is challenging in the context of responsive design, because code becomes nearly unreadable if its lines are wrapped. But you need those lines to get narrower for some screen sizes. The solution is to scroll horizontally:

    pre {
        overflow-x: auto;
        ···
    }
    
  • Then I still had the occasional long word (e.g. a camel-cased JavaScript identifier) wreck my layout. Here, the solution was to wrap more aggressively:

    overflow-wrap: break-word;
    
  • Giving elements the width 100% occasionally caused problems, too, which I fixed via:

    width: calc(100% - «padding»)
    

Libraries I used  

So much help comes from the rich ecosystem of npm packages. These are the most important ones I used:

  • I built via npm scripts and webpack.
  • I’m using mocha for unit tests.
  • I’m planning to use the headless browser library Nightmare and a manual checker (“no comments in Markdown should appear in the output”) to make sure that the statically generated output is OK. Things are easier to check with static generation, but I still want to make sure I don’t accidentally break anything.
  • ejs: for templating. I like its simplicity.
  • fs-extra: for file system operations like recursively copying or removing directories.
  • minimatch: for matching globs against file system paths.
  • denodeify: for promisifying callback-based Node.js functions.
  • markdown-it: for parsing Markdown and rendering it to HTML. I especially liked how easy it was to add features I was missing via plugins:
    • markdown-it-footnote
    • markdown-it-anchor
    • markdown-it-attrs
  • highlight.js: to statically syntax-highlight code snippets inside Markdown content. markdown-it makes it easy to plug in the syntax highlighter of your choice.

Deployment  

For deploying the content to Amazon S3, I used s3cmd. This tool makes synching directories with S3 easy. As a plus, it only uploads files that changed, which it determines not by date, but by looking at the actual content of a file. That is tremendously helpful for static site generation, where files are often written to disk even though nothing changes.
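
Conceptually, content-based syncing boils down to comparing digests instead of timestamps. The following sketch only illustrates that idea; it is not how s3cmd is implemented. The function name is hypothetical, as is the assumption that digests are available as Maps from path to a content digest such as a hash in hex notation.

```javascript
// localDigests, remoteDigests: Maps from file path to content digest.
// Returns the paths that are new or whose content differs remotely.
function filesToUpload(localDigests, remoteDigests) {
  const result = [];
  for (const [path, digest] of localDigests) {
    if (remoteDigests.get(path) !== digest) {
      result.push(path);
    }
  }
  return result;
}
```

A file that was rewritten with identical content has the same digest as before and is therefore skipped, no matter what its modification date says.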

Conclusions  

So far, I’m very happy with the new approach. But there are still a few challenges:

  • Even with over 1100 blog posts, static generation is still reasonably fast. What takes time is uploading the generated files.
  • Everything would be fine if files were only regenerated when I change the contents of a blog post. But they are also all regenerated at least once a day, when I retrieve new “top 10 blog posts” data. It would be nice if such frequently changing content could be factored out. Maybe that will be possible if HTML imports become more widely supported (I’d still want my HTML to work statically). I wrote down my thoughts on this topic in the blog post “Modular HTML pages”.
  • At the moment, I’m still duplicating the HTML data: for each page, it exists once in JSON and once statically embedded.

Plans for the future:

  • I enjoy how easy it is to change something: If I want a different page frame, I simply edit PageFrame.js. A next step will be to separate the parts that can be reused between projects from those that can’t. It won’t be easy to do so while keeping the current ease of use.

  • I’ll also probably write command line tools for managing tags (searching, summarizing, renaming, merging, ...).
