About static site generators


I’ve been looking for a decent static site generator to build a simple, 10-page-or-so documentation site, and I’m failing. Here are some notes on my journey, to serve as a warning sign to future travellers, and thoughts on what static site generators could do better.

What is a static site generator, really?

I’d like to try and describe static site generators as if we hadn’t encountered this class of software yet. Please do not skip this section, even if you already (think you) know the answer.

Static site generators (SSG) are a class of software programs that help with creating websites. They’re often written by developers — professionals or hobbyists — to power their own small website or blog, and as such they tend to have a limited feature set.

SSG are close relatives of scripts: short programs (a few hundred lines of code), which can be used from a command prompt to achieve specific tasks. They’re often built as a hobby by one developer, sometimes getting help from a few others months or years later. They almost never offer a graphical user interface.

One core principle of those scripts is that they work with a source collection of plain text files (aka “content”), process those files to transform them to the HTML format, and wrap them in more HTML. This is often called a “build” step. Before running the build command, you would have a file tree that looks like this:

My Cool Website
└── content
    ├── 2018-09-07-cool-blog-post.txt
    └── 2018-10-01-other-blog-post.txt

And after typing some text to run a command in a “terminal” or “command prompt” application, you would get something like:

My Cool Website
├── content
│   ├── 2018-09-07-cool-blog-post.txt
│   └── 2018-10-01-other-blog-post.txt
└── output
    ├── cool-blog-post.html
    ├── index.html
    ├── other-blog-post.html
    └── rss.xml

With that setup, you get a simple blog with one HTML page for each blog post, an index (or home) page that lists published posts, and an RSS feed. You can write your own styles if you know CSS, or (with some SSG, but not all) use a “theme” or an example project as a starting point.
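To make the “build” step concrete, here is a minimal sketch in Python of what such a script might do. It is not how any particular generator works: the `YYYY-MM-DD-slug.txt` naming rule, the `build` and `render_post` functions, and the bare-bones HTML wrapper are all assumptions made for illustration, and the RSS feed is left out to keep it short.

```python
import html
import re
from pathlib import Path

# Assumed convention: posts are named YYYY-MM-DD-slug.txt
NAME_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})-(.+)\.txt$")

# Deliberately tiny page wrapper (hypothetical, no styles or navigation).
PAGE = "<!doctype html>\n<title>{title}</title>\n<body>\n{body}\n</body>\n"


def render_post(text: str) -> str:
    """Turn blank-line-separated plain text into escaped <p> elements."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def build(site: Path) -> list[str]:
    """Read site/content/*.txt and write site/output/*.html plus an index.

    Returns the names of the files written, in order.
    """
    output = site / "output"
    output.mkdir(exist_ok=True)
    links, written = [], []
    for source in sorted((site / "content").glob("*.txt")):
        match = NAME_PATTERN.match(source.name)
        if match is None:
            continue  # not named like a post, so it is skipped silently
        date, slug = match.groups()
        body = render_post(source.read_text(encoding="utf-8"))
        page = PAGE.format(title=html.escape(slug), body=body)
        (output / f"{slug}.html").write_text(page, encoding="utf-8")
        links.append(f'<li><a href="{slug}.html">{date}: {slug}</a></li>')
        written.append(f"{slug}.html")
    index_body = "<ul>\n" + "\n".join(links) + "\n</ul>"
    index = PAGE.format(title="Index", body=index_body)
    (output / "index.html").write_text(index, encoding="utf-8")
    written.append("index.html")
    return written
```

Calling `build(Path("My Cool Website"))` on the tree shown earlier would create the `output` folder with one page per post and an `index.html`. Note that the sketch skips files that don’t match the naming rule without saying anything, which is exactly the kind of silent behavior that makes these tools hard to learn.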

Users of static site generators do this work on their own computer, and after the “output” folder has been generated they are expected to know how to put those files online to publish them. This can require software such as an FTP client, or more complex setups involving one or two additional command line tools.

There are hundreds of static site generators, with many listed on StaticGen.com. Some of the most popular ones are Jekyll, Hugo and Hexo.

When it comes to editing features, there are not many to speak of. Users are expected to know what formats like HTML and Markdown are, how to prepare images themselves (in Photoshop, Gimp or similar software), and how to insert images inside a page’s content by using code that may look like this:

![My cat at 4 months](sniffles-4months.jpg)

Or even something more complex:

![My cat at 4 months]({{< imgproc sniffles-4months.jpg Resize "600x" />}})

At this point we must recognize that static site generators are tools made by developers for developers. They require a ton of previous knowledge: HTML, Markdown, command line interfaces, FTP or some other way to publish HTML files online, and of course you would need some kind of web hosting to put those files on. Users will also need to dive into technical documentation for their tool of choice when they want to customize their site’s pages, work in “templates”, learn a templating language, and probably write some CSS.

The most popular static site generator, Jekyll, may have the best documentation pages of the lot. And yet, it still puts a bunch of technical language front-and-center, on the project’s home page:

A series of 4 terminal commands, starting with: gem install bundler jekyll…
Jekyll’s core message, “Get up and running in seconds”

This is developer-talk, because static site generators are tools made by developers for themselves and their peers.

On the other hand, “content management” software such as WordPress, Drupal or my personal favorite, Kirby, target a dual audience: content editors (who write and manage content), and developers (who set up the CMS and implement the site’s design and features).

I’m making this point because, if we own up to who our software’s audience is, then we can consider what we can do for that audience and hopefully design less terrible software.

A developer’s idea of elegance

I sometimes talk with developers who praise static site generators as simple, which is hogwash. I’ve seen a few projects where content editors were handed a site based on an SSG (especially in startups where devs may dominate the conversation), and could not manage any change without a lot of training and assistance. Nothing simple about that.

What those developers probably mean is that they find static site generators elegant. A programmer tends to find a solution elegant when it builds on their current knowledge to achieve something with less repetition and fewer tools.

Static site generators are an elegant solution for developers because:

  1. They rest on our previous knowledge of: file and code formats, the command line, the plain-files-and-scripts philosophy, git and ssh or scp and other deployment strategies, etc.
  2. Using this base knowledge, they offer solutions to boring, repetitive tasks. For example, Markdown syntax lets us avoid repeating <p>…</p> tags in our content; templates let us avoid tasks like manually repeating HTML structure and updating menus and content lists.
  3. Finally, using the filesystem to store content lets us work in a single place (avoiding database software and languages, which would be overkill for limited datasets).

I’m all for elegant solutions that fit my mindset (aka just “elegant solutions”), and I’ve used static site generators in the past, including Pelican, Hyde, Jekyll and Metalsmith.

Here’s the thing, though: it’s been a terrible experience. It turns out that static site generators are terrible at handling content. Which is too bad, because that’s one of their very few features to begin with.

Static site generators are elegant in principle, but are not designed to deal with content more complex than a handful of pages and a list of blog posts. And I’m not talking about the speed of building hundreds of pages or more (Jekyll is notoriously slow, Hugo is fast), but the sheer ability to use content however you want.

Static site generators are bad at content

Let’s back up a bit. Most static site generators were built — or at least are advertised — as alternatives to database-driven CMS software. We’ve already shown that they may only be a valid alternative for developers and technical users with a lot of prior knowledge. But they are missing another core capability of CMS software.

In your average PHP or Python CMS, as in many Web frameworks, querying content from a relational database is always possible (and more or less straightforward). Linked objects and/or hierarchical content structures are built-in features. Building home pages, landing pages, lists, modular pages, product pages with many sections, etc., out of content hierarchies or relations is standard practice; CMS software that was too limited to do that either evolved or was replaced by its competition.

When I’ve used static site generators in the past ten years, there were a few pain points like lacking documentation and strange and incompatible conventions. But the nail in the coffin was always that it’s either impossible or way too hard to build a single page from several pieces of content.

Recently, I’ve looked at static site generators again, because I wanted to convert Kaliop’s Frontend Style Guide, originally built with Kirby CMS, to something that could be built to static HTML in fewer steps by developers (e.g. npm install && npm run build).

This site is made of a handful of long “chapter” pages, with a top-level table of contents which lists all chapters and their sections:

Screenshot showing a full table of contents with 3 levels of titles

For easier content management and reordering of sections, I had broken each chapter into different section files, so that the index page is generated from three levels of content. Looking at the “General Rules” branch only, the content hierarchy looks like:

2.0
└─ general
   ├─ 1-readme
   ├─ 2-editorconfig
   ├─ 3-dry
   └─ 4-clean

When generating the full table of contents, I expect to be able to write (in pseudo-code):

for chapter in page.children:
    print chapter.title
    for section in chapter.children:
        print section.title

Similarly, when generating a chapter page, I expect that I can easily print the full chapter content:

for section in page.children:
    print section.title
    print section.content

Actual templates would be a bit more complex, since we need to wrap this content in HTML tags, and maybe pass it to template partials or components. But retrieving content from a hierarchy of local files should be straightforward.
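For what it’s worth, here is a rough Python sketch of the kind of content model those two loops assume. Everything in it is hypothetical: the `Page` class, the `load` and `table_of_contents` functions, and the idea that each folder is a page whose text lives in an optional `content.md` file. It only shows that walking a hierarchy of local files takes very little code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Page:
    """One folder in the content tree (hypothetical model)."""

    path: Path
    children: list[Page] = field(default_factory=list)

    @property
    def title(self) -> str:
        # Drop an ordering prefix such as "1-" from "1-readme".
        return re.sub(r"^\d+-", "", self.path.name)

    @property
    def content(self) -> str:
        # Assumption: a page's text lives in an optional "content.md" file.
        source = self.path / "content.md"
        return source.read_text(encoding="utf-8") if source.is_file() else ""


def load(path: Path) -> Page:
    """Build the page tree recursively from sub-folders.

    Children are sorted by path, which is a lexical sort: "10-x" would
    come before "2-x" unless prefixes are zero-padded.
    """
    subfolders = sorted(p for p in path.iterdir() if p.is_dir())
    return Page(path, [load(p) for p in subfolders])


def table_of_contents(page: Page, depth: int = 0) -> list[str]:
    """Return the titles of every descendant of `page`, indented by level."""
    lines = []
    for child in page.children:
        lines.append("  " * depth + child.title)
        lines.extend(table_of_contents(child, depth + 1))
    return lines
```

With the “General Rules” branch above, `table_of_contents(load(Path("2.0")))` would list `general` followed by its four indented sections, and a chapter template could loop over `page.children` to print each section’s `title` and `content`.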

I haven’t found a single static site generator that handles this use case gracefully. Instead, after reviewing the documentation of three dozen tools (listed on StaticGen), and actively trying to build my use case with a handful of them, I’ve seen a range of limitations and awkward workarounds:

  1. Sometimes it’s just not possible to work with content hierarchies and partial content. (Most JavaScript static site generators fall into that category.)
  2. Or you have to write some configuration code that populates “Collections” ahead of build time. (Too indirect, and breaks if you change your information architecture a bit. We need something a bit closer to direct manipulation!)
  3. Finally, a short list of tools allow listing a page’s children or “resources”, but have bugs when dealing with more than one level (Gutenberg) or impose arbitrary restrictions on what page “types” can list different types of content (Hugo).

When you can list content from children, grandchildren or other locations, it’s often not possible to process this content as Markdown or access its metadata.

A few words of warning about Hugo

The Hugo static site generator, praised by many for its speed, is more capable than most. There are surely many good things to say about Hugo, but in the spirit of warning folks about the terribleness of software: I wouldn’t recommend it for anyone who wants a smooth learning experience.

For one thing, it requires learning about way too many specific concepts: Sections, Lists, Taxonomies, Page Bundles, Leaf Bundles, Template lookup order, and the perfectly unintuitive differences between an index.md page and an _index.md page. (Or are those Sections, or Bundles? Is one a Section and the other a Page? Are Sections and Bundles “pages” too? I have no idea.)

I particularly dislike the way Hugo matches a page to a dozen possible template files or more — taking a page from Drupal’s book of idiosyncratic awfulness. Now, I understand that this is meant to make it possible to use different “themes”, and at worst your pages will still render with the theme’s layouts/_default/list.html or layouts/_default/single.html templates. But if you want to write your own HTML and CSS, the template lookup stuff is a nuisance (much like it is in the Drupal world when you want any kind of control over the HTML output).

On top of the complexity of Hugo’s design, its documentation is often subpar. For instance, the “Content Management > Page Resources” page does not explain much about, well, content, or content management. I expected a description showing content examples (some markdown or a file structure maybe), then information about how to use it in templates and other contexts. Instead, it’s a kind of API documentation listing “Properties” and “Methods”.

The second issue is that while Hugo builds pages fast, it often builds the wrong thing fast. For instance, if I rename an index.md file to _index.md, I need to run the hugo command two or three times to get it to render with the correct template. Using hugo server seems even less reliable.

Hugo is also unhelpful when a source Markdown file doesn’t render anything (there is no log or warning) or outputs a cryptic <pre></pre>. This doesn’t help the learning curve.

That being said, I heard that Hugo is an interesting choice if you’re building hundreds or thousands of pages (because it builds these pages really quickly), in which case it might be worth spending a couple weeks learning to use it.

What could we do differently?

There are thousands of static site generators out there (counting a bunch of scripts with 5 stars on GitHub) because thousands of developers looked at existing solutions and said “eh, I’ll just build my own”.

As a thought experiment, here are the core topics I would look at if I wanted to build my own tool from scratch.

1. Content naming conventions

Conventions for naming content files are necessary to allow users to create content files quickly and get predictable results. Those conventions must be short, clearly defined, clearly explained (you need great docs here), and intuitive (when possible, follow existing conventions rather than roll your own).

Major issues to solve regarding source content:

Some of those concerns can be addressed in templates or in configuration or other code, but there’s value in using elegant conventions to let users directly manipulate or tweak content to get a desired result.
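As an illustration of a short, explainable convention, here is a sketch that reads the two naming patterns already seen in this post: a date prefix (`2018-09-07-cool-blog-post.txt`) and an ordering prefix (`1-readme`). The `ParsedName` type, the `parse_name` function and the exact rules are my own assumptions, not an existing tool’s behavior.

```python
import re
from datetime import date
from pathlib import Path
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    """What a file or folder name tells us (hypothetical structure)."""

    slug: str
    published: Optional[date] = None  # taken from a "YYYY-MM-DD-" prefix
    order: Optional[int] = None       # taken from an "N-" prefix


DATE_PREFIX = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)$")
ORDER_PREFIX = re.compile(r"^(\d+)-(.+)$")


def parse_name(filename: str) -> ParsedName:
    """Split a content file name into slug, publication date and order."""
    stem = Path(filename).stem  # drop the extension, if any
    match = DATE_PREFIX.match(stem)
    if match:
        year, month, day, slug = match.groups()
        try:
            return ParsedName(slug, published=date(int(year), int(month), int(day)))
        except ValueError:
            # Looks like a date but is not one (e.g. month 13):
            # keep the whole stem as the slug rather than guess.
            return ParsedName(stem)
    match = ORDER_PREFIX.match(stem)
    if match:
        return ParsedName(match.group(2), order=int(match.group(1)))
    return ParsedName(stem)
```

The whole rule fits in three sentences of documentation: a date prefix sets the publication date, a number prefix sets the order, and anything else is used as-is.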

2. Give feedback on content-to-output mapping

The generator should provide, by default, feedback on which files are picked up, which are ignored, and why. When a file is picked up and processed in some way, it should say so too.
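A command-line version of that feedback could be as small as the following sketch. The rules it applies (content is `.md` or `.txt`, anything starting with `.` or `_` is ignored, everything else is copied as-is) are made up for the example, as are the `classify` and `report` functions; the point is only that every file gets a status and a reason.

```python
from pathlib import PurePosixPath

# Hypothetical rules; a real generator would document its own.
CONTENT_EXTENSIONS = {".md", ".txt"}


def classify(relative_path: str) -> tuple[str, str]:
    """Return a (status, reason) pair for one source file path."""
    path = PurePosixPath(relative_path)
    if any(part.startswith((".", "_")) for part in path.parts):
        return "ignored", "a path segment starts with '.' or '_'"
    if path.suffix not in CONTENT_EXTENSIONS:
        return "copied", f"'{path.suffix}' is not a content extension"
    return "rendered", f"-> {path.with_suffix('.html')}"


def report(paths: list[str]) -> list[str]:
    """Format one human-readable line per source file."""
    lines = []
    for relative_path in paths:
        status, reason = classify(relative_path)
        lines.append(f"{status:<8} {relative_path}  ({reason})")
    return lines
```

Printing `report([...])` at the end of every build would already answer the question “why didn’t my page show up?” without a trip to the documentation.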

To surface this information efficiently, the generator may need to come with a GUI (a web UI could work). I would be looking at tools like Fractal for inspiration.

3. More code should happen in templates

Most generators use restrictive template engines, and feed a restricted context (aka a set of data and functions) to those templates. This seems to come from a misguided need to keep templates “simple”, “logic-less”, and other kinds of baloney. Look, in a big MVC application built by people with different roles and technical knowledge, it might be useful to have your very responsible backend developers write controllers and let designers or frontend developers write HTML in a sandbox. Well, that’s a questionable approach too.

Anyway, as we said we’re making a tool for technically-minded people, and should give them power to actually do stuff. And since we’re talking about websites built ahead-of-time, the risks of logic in templates don’t apply as much.

Another consequence of restricted templates is that SSG documentation will then tell you to work in a third place: not in content, not in templates, but in “config code”. This is a problem because A) it’s probably one place too many and B) it’s often unclear at what time this config code is running or how often: once for every page generation or template run, or just once at the start of a build?

So, do more work in templates. We’re making somewhat simple websites here, it’s going to be okay. It’s cool if you can refactor your code to avoid duplication or separate concerns or whatever, but this shouldn’t be a requirement; users will only jump through hoops if they’re working on a big enough project to justify it.

4. A full content API

There should be a full content traversal API, which could be modelled after Kirby’s API (itself inspired by jQuery). Other sources of inspiration:

One concern with offering a full content API in templates is that if you run a template for a thousand pages, and do the same work (including querying the filesystem) every time, you’re going to have performance issues. I have a few possible optimizations in mind (fragmenting content queries and memoization), but I’ll admit that as a UI developer this is not my forte.
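To give an idea of what memoization could look like, here is a small Python sketch using the standard library’s `functools.lru_cache`. The function names (`children`, `read_text`, `start_build`) are hypothetical, and this only covers the simplest case: caching filesystem reads for the duration of one build.

```python
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def children(folder: Path) -> tuple[Path, ...]:
    """List sub-folders; repeated calls for the same folder hit the cache."""
    return tuple(sorted(p for p in folder.iterdir() if p.is_dir()))


@lru_cache(maxsize=None)
def read_text(source: Path) -> str:
    """Read a content file; repeated calls for the same file hit the cache."""
    return source.read_text(encoding="utf-8")


def start_build() -> None:
    """Clear both caches so a new build sees files changed since the last one."""
    children.cache_clear()
    read_text.cache_clear()
```

If a thousand page templates all ask for the same table of contents, the folders are listed once and the other 999 calls are dictionary lookups. The price is that the caches must be cleared between builds, which matters for a long-running “watch” or “server” mode.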

On top of traversing and retrieving content, templates should be able to transform formats (Markdown to HTML, parsing JSON and maybe YAML) and process images.

5. Markdown plus front-matter is too limiting

Originating in simple blog engines, static site generators treat pages as a single content chunk (often in Markdown) with some metadata sprinkled on top. If you need several long chunks of content (say, a product short description, long description, technical specs, and a list of vendors), you’re out of luck.

Theoretically, you can put any kind of text content in YAML “front-matter”, but editing that content without breaking the YAML syntax is a pain.

One workaround is using separate files in the same folder or in a subfolder, then using the content API to retrieve them. Each such “resource” can have a body and metadata.

Kirby does things a bit differently (though it’s always possible to use a page’s files or child pages too), using a custom format with arbitrary fields:

Title: My Title
----
Date: 2018-12-25
----
Desc: Short description for list views and SEO.
----
Body:
## A field contains arbitrary text
So we can use a field value and parse it as Markdown or whatever.
…
----
Notes: …
----
References: …
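A format like this is also cheap to read. The sketch below is not Kirby’s actual parser, just a Python approximation that assumes fields are separated by a line of four or more dashes and that the field name is everything before the first colon. The `parse_fields` name and the choice to lowercase field names are mine.

```python
import re


def parse_fields(text: str) -> dict[str, str]:
    """Split text into named fields.

    Fields are separated by lines made only of four or more dashes;
    within a field, the name is whatever precedes the first colon.
    """
    fields = {}
    for block in re.split(r"^-{4,}[ \t]*$", text, flags=re.MULTILINE):
        name, separator, value = block.strip().partition(":")
        if separator and name.strip():
            # Lowercased so templates can use one spelling (an assumption).
            fields[name.strip().lower()] = value.strip()
    return fields
```

One limitation of this naive approach: a Markdown body that contains its own line of four or more dashes (a horizontal rule, for instance) would be cut in two, so a real implementation would need an escaping rule.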

Other sources of inspiration:

I’m not sure what would be the “best” solution, but it should be possible to unlock more power without completely breaking with convention.

6. No theming system

A theming infrastructure imposes a lot of technical restrictions and indirection, such as template lookup orders (see: Hugo), having to specify a handful of default template names (Hugo, again), and having to specify strange content conventions for mapping content to those default template names (still Hugo). Frankly, if a user really wants a template inheritance and theme inheritance mechanism, they’ll probably bite the bullet and work with WordPress or Drupal.

In the lightweight CMS world, having a theming system is the main reason why Grav is less straightforward than Kirby (which inspired it): themes and plugins rely on metadata in pages, so you have to put detailed config-as-metadata in every page to enable and configure theme and plugin features, which often feels like randomly pressing buttons and hoping to get a result.