Making a multilingual website with Jekyll collections

There are many guides for making Jekyll websites with translated pages, but none of them gave me a setup as foolproof and flexible as I needed. After trying several approaches, I found that collections are the best way to set things up. In fact, making a website multilingual this way turned out to be pretty easy!

The basics

Most guides I found seem to be based on the principle laid out in Sylvain Durand’s Making Jekyll multilingual. If you’re getting started with translations, I recommend you read that first. I use the same basic approach. Each page and post gets a translation_reference property in its front matter: a string that is the same for all translations of that page. That string can be used in Liquid tags to look up translations.
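As a minimal sketch, this is roughly what the front matter of two translations of the same page could look like. The titles and the reference string about are made up for illustration; only the translation_reference property itself comes from the approach described above. The comment lines are labels for this example and not part of the files.

```yaml
# German version of the page (hypothetical)
---
title: Über uns
translation_reference: about
---

# English version of the page (hypothetical)
---
title: About
translation_reference: about
---
```

Because both files share the same string, a lookup on translation_reference finds one from the other, regardless of file names or slugs.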

My requirements

The examples I found didn’t solve all my problems: they were either prone to errors in front matter or too rigid. It turned out I had a few more requirements than I had thought. Here they are:

  1. There’s a limited number of supported languages: languages for which there is a homepage as well as header and footer navigation.
  2. Pages may be added in an unsupported language: one-off pages for events in a specific locale for instance. These pages should have menus in the default language.
  3. Any page in a supported language can have any number of translations in supported languages.
  4. All content for a supported language should be stored in its own directory.
  5. The URL structure for pages in supported languages should follow the pattern /[language code]/[page slug]
  6. For supported languages, the only translation-related info stored in front matter should be the translation_reference; only pages in an unsupported language get a language property.
  7. All pages can get a translation menu. In that menu, for each supported language there is either a link to a translated page or, in case there is no translation, the homepage.
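To illustrate requirements 2 and 6: a one-off page in an unsupported language might carry front matter like the sketch below. The title and the language code fr are assumptions for the sake of the example; the language property is the only translation-related field such a page needs.

```yaml
---
# Hypothetical one-off event page in a language without its own collection
title: Soirée climat à Paris
language: fr
---
```

Pages in a supported language leave out language entirely, since their collection already tells us which language they are in.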

Using Jekyll collections for translations

A site’s _config.yml allows defaults for both categories and collections. That means you can set a URL pattern with a slug there and don’t have to define a permalink for each page. Nice!

But should we now use categories or collections for this? In my view, categories are useful mainly to organize posts. To be honest, I don’t understand the purpose of collections completely, but the docs say:

Collections are a great way to group related content like members of a team or talks at a conference.

Since grouping related content (by language) is exactly what I want to do, and because it’s likely that we’re going to need categories to organize posts at some stage, I decided to use collections to organize translations.

Now my file structure looks like this:

content
└─── _de
     └───  pages
     └───  posts
└─── _en
     └───  pages
     └───  posts

This means all pages for supported languages are in the content directory.

Note that for the language codes, I use ISO 639. That way, the collection names can be used directly to set the page language in the lang attribute. That’s essential for accessibility and SEO.
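A minimal sketch of how the collection name could end up in the markup. The variable name html_language is my own, and the fallback to site.language_default assumes a language_default setting in the site config; one-off pages are assumed to carry a language property in their front matter.

```liquid
{% comment %}
  Collection name for supported languages, front matter for one-off pages,
  and the site default as a last resort.
{% endcomment %}
{% assign html_language = page.collection | default: page.language | default: site.language_default %}
<html lang="{{ html_language }}">
```

This goes in the base layout in place of the existing opening html tag, so every page inherits it.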

In _config.yml I’ve set up the following:

language_default: de

collections_dir: content
collections:
  de:
    output: true
    permalink: /:collection/:slug
  en:
    output: true
    permalink: /:collection/:slug

defaults:
  -
    scope:
      path: ""
      type: "pages"
    values:
      layout: "page"
  -
    scope:
      path: ""
      type: "de"
    values:
      layout: "page"
  -
    scope:
      path: "_de/posts"
    values:
      layout: "post"
      permalink: /de/post/:slug
  -
    scope:
      path: ""
      type: "en"
    values:
      layout: "page"
  -
    scope:
      path: "_en/posts"
    values:
      layout: "post"
      permalink: /en/post/:slug

This is all you need to organize content for a multilingual Jekyll website.
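A quick way to sanity-check this configuration is to dump every document with its collection and URL on a scratch page. This is only a debugging sketch and not part of the final site. With a hypothetical file content/_en/pages/about.md, the list should show it under the en collection with the URL /en/about.

```liquid
<ul>
  {% for document in site.documents %}
    {% comment %} document.path is the source file, document.url the generated URL {% endcomment %}
    <li>{{ document.collection }}: {{ document.path }} → {{ document.url }}</li>
  {% endfor %}
</ul>
```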

Finding a page’s translations

With posts and pages in place, we can now access them in various ways. Perhaps the most important way to navigate multilingual content is showing a specific page’s translations in a layout template. To do that, we first get the language of the current page. For pages in a supported language, that is the name of their collection. If the page is in an unsupported language, its language code should be defined in its front matter. If we were sloppy and no language is defined, we fall back to the placeholder "none" so we don’t break anything:

{% assign page_language = page.collection | default: page.language | default: "none" %}

Now we can look up all pages with the same translation reference:

{% if page.translation_reference != nil %}
  {% assign page_translations = site.documents | where: "translation_reference", page.translation_reference | where_exp: "item", "item.collection != page_language" | where_exp: "item", "item.published != false" %}
{% endif %}

Creating a translations menu

For the translations menu, we’re using the page_translations variable we declared above. Then we create an array with all supported languages. Note that Jekyll automatically adds a collection for posts, so we need to ignore that one:

{% assign languages = site.collections | where_exp: "item", "item.label != 'posts' and item.label != page_language " %}

And now it’s a matter of iterating through the languages, getting a translated page’s URL or using the homepage for the link instead:

{% for language in languages %}
  {% assign menu_item_url = '' %}

  {% if page_translations %}
    {% assign translation = page_translations | where: "translation_reference", page.translation_reference | where_exp: "item", "item.collection == language.label" | where_exp: "item", "item.published != false" | first %}
    {% if translation.url %}
      {% assign menu_item_url = translation.url %}
    {% endif %}
  {% endif %}

  {% if page_translations.size < 1 or menu_item_url.size < 1 %}
    {% assign homepage = site.documents | where: "translation_reference", "home" | where_exp: "item", "item.collection == language.label" | where_exp: "item", "item.published != false" | first %}
    {% if language.label == site.language_default %}
      {% assign menu_item_url = "/" %}
    {% elsif homepage.url %}
      {% assign menu_item_url = homepage.url %}
    {% else %}
      {% assign menu_item_url = null %}
    {% endif %}
  {% endif %}

  {% if menu_item_url %}
    <li>
      {% assign item_language_info = site.data.translations | where: "code", language.label | first %}
      <a
        lang="{{ language.label }}"
        href="{{ menu_item_url }}"
      >{% comment %}Label for the language that you pull from your data file.{% endcomment %}</a>
    </li>
  {% endif %}

{% endfor %}

I use the same approach to create language-specific RSS feeds and translation links (the <link rel="alternate"> tags in the <head>). Except that those don’t have the other language’s homepage as a fallback, of course.
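For the translation links, a minimal sketch could look like this. It assumes that url is set in the site config so Jekyll’s absolute_url filter can build full URLs; the variable names alternates and alternate are my own.

```liquid
{% if page.translation_reference %}
  {% assign alternates = site.documents | where: "translation_reference", page.translation_reference %}
  {% for alternate in alternates %}
    {% comment %} Skip the page we are currently on {% endcomment %}
    {% unless alternate.url == page.url %}
      <link rel="alternate" hreflang="{{ alternate.collection }}" href="{{ alternate.url | absolute_url }}">
    {% endunless %}
  {% endfor %}
{% endif %}
```

Since there is no homepage fallback here, a page without translations simply gets no alternate links.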

Finding a translated string

It’s a good habit to put UI copy like the names of menu items in YAML files in the _data directory (and not in the layout files themselves). When managing translations, this is essential. Here’s a simplified example of such a data file for a menu:

menu_title:
  de: Primäre Navigation
  en: Primary navigation

menu_items:
  de:
    - label: So funktioniert's
      path: /de/so-funktioniert-es
    - label: Über uns
      path: /de/ueber-fortomorrow
    - label: Blog
      path: /de/blog
    - label: Mitmachen
      path: /de/klima-abos
  en:
    - label: How it works
      path: /en/how-it-works
    - label: About
      path: /en/about
    - label: Join now
      path: /en/climate-subscriptions

Alternatively, you could use the languages as the top-level identifier. I found that the approach I show here works best, because the strings for the different languages appear together. That doesn’t just help with understanding the meaning of the strings, but also makes it easy to add, edit and remove them. The important thing is to organize strings across multiple files using the same approach. That way you can copy code snippets between templates without having to adjust them for different data structures.

With that file saved as _data/header.yml, we use the menu data defined above like this:

{% assign nav_language = page.collection | default: site.language_default %}
{% assign menu_items = site.data.header.menu_items[nav_language] %}
{% unless menu_items %}
  {% assign menu_items = site.data.header.menu_items[site.language_default] %}
{% endunless %}

And we can now iterate through the menu items like you would with any Liquid array:

{% for item in menu_items %}
  <li>
    <a href="{{ item.path }}">{{ item.label }}</a>
  </li>
{% endfor %}
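The menu_title string from the same data file can be looked up in the same way. Here is a sketch, assuming the data lives in _data/header.yml and that the title is used as an accessible label on the nav element (that use is my assumption; the variable name menu_title is my own too).

```liquid
{% assign nav_language = page.collection | default: site.language_default %}
{% comment %} Fall back to the default language if there is no title for this language {% endcomment %}
{% assign menu_title = site.data.header.menu_title[nav_language] | default: site.data.header.menu_title[site.language_default] %}
<nav aria-label="{{ menu_title }}">
  {% comment %} Menu items go here {% endcomment %}
</nav>
```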

Language-specific blog feeds

Because we store all content for one language in a collection, we can’t use the posts collection that Jekyll adds by default. To create a blog archive for a language, we assign all documents in that page’s language to a variable documents, then iterate through that array.

1. Dynamically accessing all documents in a collection

I’ve done countless internet searches like ‘jekyll how to dynamically get documents in collection’. For some reason my memory fails to accept that this is the right syntax. It’s straightforward, though—this is how:

{% assign documents = site[page.collection] %}

2. Iterating through the documents and only showing posts

With all the documents in the current archive page’s language stored in the documents variable, we can now iterate through that array. We only have to filter for the type of page; we only want the posts and not the pages. We could filter for the page’s layout, but there may be multiple post layouts in use. So instead, we simply filter for a string in the document paths:

{% for document in documents %}
  {% if document.path contains "/posts/" %}
    {% comment %} Post preview stuff here {% endcomment %}
  {% endif %}
{% endfor %}
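The two steps can also be combined into a single filter chain that sorts the posts newest first. This is a sketch under the assumption that the archive page itself lives in a language collection (otherwise site[page.collection] is empty and the sort fails); the limit of 10 and the markup are arbitrary choices.

```liquid
{% assign posts = site[page.collection] | where_exp: "doc", "doc.path contains '/posts/'" | sort: "date" | reverse %}
{% for post in posts limit: 10 %}
  <article>
    <h2><a href="{{ post.url }}">{{ post.title }}</a></h2>
    <time datetime="{{ post.date | date_to_xmlschema }}">{{ post.date | date: "%Y-%m-%d" }}</time>
  </article>
{% endfor %}
```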

Multilingual 404 Page

With Jekyll we can only define one 404 page for all languages. Because of that, the translations for that page work a bit differently. We add copy in all supported languages to the 404 page’s front matter. Then, using JavaScript and the browser’s locale, we guess which language to show. The <noscript> fallback simply shows the text in all languages.
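A rough sketch of such a 404.html, not my exact implementation. The front matter key messages, the element id not-found and the copy are made up for this example; the fallback uses the language_default setting from the site config.

```liquid
---
permalink: /404.html
messages:
  de: Diese Seite wurde nicht gefunden.
  en: This page could not be found.
---
<div id="not-found">
  {% for message in page.messages %}
    {% comment %} message[0] is the language code, message[1] the text {% endcomment %}
    <p lang="{{ message[0] }}" hidden>{{ message[1] }}</p>
  {% endfor %}
</div>
<noscript>
  {% for message in page.messages %}
    <p lang="{{ message[0] }}">{{ message[1] }}</p>
  {% endfor %}
</noscript>
<script>
  // Show the paragraph matching the browser locale, else the default language.
  var preferred = (navigator.language || "").slice(0, 2).toLowerCase();
  var container = document.getElementById("not-found");
  var match = container.querySelector('[lang="' + preferred + '"]') ||
              container.querySelector('[lang="{{ site.language_default }}"]');
  if (match) { match.hidden = false; }
</script>
```

All paragraphs start out hidden, so visitors with JavaScript see exactly one message, while everyone else gets the full list from the noscript block.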

Drafts

With collections, Jekyll requires you to save drafts in a _drafts directory inside the collections directory. But we want the content for a language in its own collection directory! So we’re not using _drafts at all and instead put published: false in the front matter, which is the flag that the item.published != false filters above check for.

Conclusion

Using collections to add translation support to Jekyll works great for me. The solution above meets all my requirements and adds no dependencies. The only issue I can see is that I can’t use collections for other purposes anymore. To keep my content organized by language, I would need nested collections, and I don’t think those are supported. That said, I still don’t fully understand the purpose of Jekyll collections and therefore don’t mind not being able to apply them in a different way. After all, the quintessential example use of collections (collecting all pages written by one author) can easily be handled by adding an author property to the front matter and filtering documents based on that.

Finally, a little warning: although the approach I present here is pretty easy, it can still be a lot of work to convert an existing site. After all, every piece of text appearing on the site needs to go through the language filter. If you’re building a new site that will need translations at a later stage, I recommend building that in from the start.