The magic of static
2016-04-24
How do static site generators work? That’s the question I had when I began building my own static site generator for learning purposes.
With over 100 static site generators listed on staticgen.com, there must be something interesting going on there.
This post looks at 2 popular static site generators, Jekyll and Hakyll, and digs into their inner workings.
Jekyll
Build a Configuration, create a Site, call Reader, run Generator, run Converter, write to disk.
Jekyll is really easy to use: you install the gem, run Jekyll from the command line, and it generates your site into the _site folder.
$ jekyll build
The build command is found in build.rb. It has 3 top-level steps:
- Build a Configuration
- Initialize a Site with the configuration
- Call process on the Site object
In this single command we see the 2 most important classes in Jekyll: Configuration and Site.
A Configuration specifies where to find your files, plugins, layouts, data, where to write output, etc.
A Site is the orchestrator object that knows how to call other classes to build a site. The main method of Site is process, and it clearly shows the steps taken:
reset
read
generate
render
cleanup
write
print_stats
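The orchestration idea can be sketched in a few lines of plain Ruby. This is a toy model and not Jekyll's source: ToySite and its log array are made-up names, and each step only records its own name so that the fixed order is visible.

```ruby
# Toy model of an orchestrator object. ToySite is hypothetical, not Jekyll::Site.
class ToySite
  STEPS = %i[reset read generate render cleanup write print_stats].freeze

  attr_reader :config, :log

  def initialize(config)
    @config = config
    @log = []
  end

  # Run every build step in a fixed order and return the site itself.
  def process
    STEPS.each { |step| send(step) }
    self
  end

  # Each step only records its name here; a real generator does actual work.
  STEPS.each do |step|
    define_method(step) { @log << step }
  end
end

# Mirrors the three top-level steps: build a configuration (a plain Hash in
# this sketch), initialize a site with it, then call process.
site = ToySite.new("source" => ".", "destination" => "_site")
site.process
puts site.log.join(" -> ")
```

Keeping the step list in one place is what makes a method like process easy to read: the whole build is visible at a glance.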
Let’s dive deeper into some of the more interesting methods.
read
Jekyll knows about different kinds of files, such as layouts, data, collections, posts, and pages. The class that does this work is Reader. Each kind of file has a specialized class that takes care of finding and reading it, for example PostReader, DataReader, PageReader, etc. They can be found in the readers directory. File contents are read into attributes on the Site object.
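A rough sketch of that division of labour, assuming an in-memory hash of path to contents instead of a real directory. All class names here (ToyReader, ToyPostReader and so on) are invented for illustration and are much simpler than Jekyll's readers.

```ruby
# Toy readers. The class names and the in-memory "files" hash are hypothetical.
ToyReadSite = Struct.new(:posts, :pages, :data)

class ToyPostReader
  # Posts live under _posts/ in this sketch.
  def read(files)
    files.select { |path, _| path.start_with?("_posts/") }
  end
end

class ToyDataReader
  # Data files live under _data/ in this sketch.
  def read(files)
    files.select { |path, _| path.start_with?("_data/") }
  end
end

class ToyPageReader
  # Anything not under an underscore-prefixed folder is treated as a page.
  def read(files)
    files.reject { |path, _| path.start_with?("_") }
  end
end

# The top-level reader delegates to the specialized readers and stores
# what they return as attributes on the site object.
class ToyReader
  def initialize(site)
    @site = site
  end

  def read(files)
    @site.posts = ToyPostReader.new.read(files)
    @site.data = ToyDataReader.new.read(files)
    @site.pages = ToyPageReader.new.read(files)
    @site
  end
end

files = {
  "_posts/2016-04-24-hello.md" => "# Hello",
  "_data/authors.yml" => "name: someone",
  "about.md" => "About me"
}
site = ToyReader.new(ToyReadSite.new).read(files)
puts site.posts.keys.inspect
```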
render
Renderer takes the file contents read in by the readers and runs them through converters. Converters can do things like convert Markdown to HTML, or render templates using the Liquid renderer.
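To make the converter idea concrete, here is a minimal sketch. The method names matches, output_ext and convert follow the ones Jekyll's converter plugins implement, but the classes themselves and the toy_render helper are hypothetical, and the "Markdown" handling only covers a single heading syntax.

```ruby
# Toy converters. Both classes are hypothetical and far simpler than real ones.
class ToyHeadingConverter
  def matches(ext)
    ext == ".md"
  end

  def output_ext(_ext)
    ".html"
  end

  # Only handles lines starting with "# "; real Markdown needs a real parser.
  def convert(content)
    content.gsub(/^# (.+)$/) { "<h1>#{Regexp.last_match(1)}</h1>" }
  end
end

# Fallback that leaves both the extension and the content untouched.
class ToyIdentityConverter
  def matches(_ext)
    true
  end

  def output_ext(ext)
    ext
  end

  def convert(content)
    content
  end
end

# Pick the first converter that accepts the extension and apply it.
# Returns the output path and the converted content.
def toy_render(path, content, converters)
  ext = File.extname(path)
  converter = converters.find { |c| c.matches(ext) }
  new_path = path.sub(/#{Regexp.escape(ext)}\z/, converter.output_ext(ext))
  [new_path, converter.convert(content)]
end

converters = [ToyHeadingConverter.new, ToyIdentityConverter.new]
p toy_render("about.md", "# About", converters)
```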
Plugins
PluginManager uses Ruby’s require feature to load plugins from _plugins, from gems specified in _config.yml, or from gems specified in the Gemfile. Plugins are initialized when the Site is initialized.
To write a plugin, you extend one of Jekyll’s base classes, such as Generator or Converter, and implement some methods.
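As an illustration of the shape such a plugin takes, here is a sketch that runs without the jekyll gem. ToyGenerator is a stand-in base class I made up; in a real plugin you would subclass Jekyll::Generator and receive a real Site rather than the plain hash assumed here.

```ruby
# Stand-in base class so the sketch runs without the jekyll gem installed.
# With Jekyll, a plugin would subclass Jekyll::Generator instead.
class ToyGenerator
  def generate(_site)
    raise NotImplementedError, "subclasses implement generate"
  end
end

# Hypothetical plugin: adds a page that lists all post titles.
# The site is modelled as a plain Hash, which is an assumption of this sketch.
class ToyArchiveGenerator < ToyGenerator
  def generate(site)
    titles = site[:posts].map { |post| post[:title] }
    site[:pages] << { name: "archive.html", content: titles.join("\n") }
  end
end

site = { posts: [{ title: "The magic of static" }], pages: [] }
ToyArchiveGenerator.new.generate(site)
puts site[:pages].first[:content]
```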
Hakyll
Hakyll relies heavily on the Haskell type system. If you are not familiar with Haskell at all, this might be a difficult read. I try to describe things without talking too much about types, so I hope it is still understandable.
Hakyll takes a different approach to building your site.
You first install Hakyll using stack, and use hakyll to initialize a site, which will generate a simple site.hs.
This file is the entry point for configuring your build. You write Haskell, calling functions provided by the Hakyll library, but Hakyll provides a DSL that makes it look like you’re writing configuration. Finally, to build your site, you run that configuration using stack.
By default, files are read from the current directory and written to _site/.
Again, we’ll take a top-down approach.
In Haskell, the main function is called main, and in this case it calls hakyllWith with a Configuration and Rules.
If you are unfamiliar with Haskell, the $ basically means “evaluate everything to my right first”.
If we trace hakyllWith, we find that it eventually calls Runtime.run with a Configuration, Logger, and Rules.
We can think of Rules as a set of rules, each made up of:
- Match pattern (match),
- Route (route), and
- Compiler (compile)
A match pattern specifies glob expressions to match file paths.
A route specifies where to write output to.
A compiler specifies what to do to the input before writing it out.
These 3 functions are defined in Core/Rules.hs.
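The pattern, route, and compiler triple can be modelled outside Haskell too. The sketch below is a language-neutral toy written in Ruby, not Hakyll code: ToyRule and toy_build are invented names, routes and compilers are plain lambdas, and glob matching is done with Ruby's File.fnmatch.

```ruby
# Toy rule: a glob pattern, a route, and a compiler. All names are hypothetical.
ToyRule = Struct.new(:pattern, :route, :compiler)

# Apply the first matching rule to each input file.
# Returns a hash of output path => compiled content; unmatched files are skipped.
def toy_build(rules, inputs)
  inputs.each_with_object({}) do |(path, content), out|
    rule = rules.find { |r| File.fnmatch(r.pattern, path) }
    next unless rule

    out[rule.route.call(path)] = rule.compiler.call(content)
  end
end

rules = [
  # Posts: change the extension and wrap the body.
  ToyRule.new("posts/*.md",
              ->(path) { path.sub(/\.md\z/, ".html") },
              ->(body) { "<article>#{body}</article>" }),
  # Stylesheets: copy through unchanged.
  ToyRule.new("css/*", ->(path) { path }, ->(body) { body })
]

inputs = { "posts/hello.md" => "Hello", "css/site.css" => "body {}" }
toy_build(rules, inputs).each { |path, body| puts "#{path}: #{body}" }
```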
If we look at the types of these 3 functions, they all return Rules (). You can find the definition of Rules in Core/Rules/Internal.hs.
The definition looks strange, but I think of Rules as an abstract data type that contains computation.
All computations can read from a shared environment, output to a shared space, and share state.
In that way, Rules is a composite data structure, made up of 3 other types:
- RulesRead - shared environment that values can be read from
- RuleSet - place where computation can write output
- RulesState - the current status of computation
route and compile set the state (RulesState), setting the current route and compiler to use.
match fills up the RuleSet structure, which is made up of routes, compilers, resources and pattern, then clears the state (RulesState). A list of matched Identifiers is retrieved from the environment. The route that route set up is retrieved from RulesState and added to the RuleSet. The compiler that compile set up is retrieved from RulesState and assigned to each matched Identifier. This mapping is set on RuleSet.
The multiple match expressions all return Rules, and I like to think that all the Rules are squashed into a single Rules, like how many Ints can be summed into a single Int.
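The "squashing" intuition can be shown with a toy in Ruby. This is only an analogy and not how Hakyll is implemented: ToyRuleSet is a made-up structure with two fields, and its + method merges two sets the same way integer addition merges two numbers, with an empty set playing the role of zero.

```ruby
# Toy rule set with a "+" that merges two sets, the way Integer#+ merges numbers.
# ToyRuleSet and its fields are hypothetical, chosen only for this analogy.
ToyRuleSet = Struct.new(:routes, :compilers) do
  # The "zero" of rule sets: merging with it changes nothing.
  def self.empty
    new([], [])
  end

  def +(other)
    ToyRuleSet.new(routes + other.routes, compilers + other.compilers)
  end
end

posts = ToyRuleSet.new(["posts/*"], [["posts/hello.md", :markdown]])
css = ToyRuleSet.new(["css/*"], [["css/site.css", :copy]])

# Many rule sets fold into one, just like many integers fold into one sum.
combined = [posts, css].reduce(ToyRuleSet.empty, :+)
total = [1, 2, 3].reduce(0, :+)

puts combined.routes.inspect
puts total
```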
Finally, after all this explanation, we return to the run function. At this point we have a RuleSet to work with, and using that we build a Runtime.
Runtime is similar to Rules in that it has 3 components:
- RuntimeRead - shared environment
- () - place to output to; in this case this means that we don’t save the output
- RuntimeState - current status of computation
run eventually calls build, which calls pickAndChase.
chase will pick from a list of Identifiers that represent todos, and try to run the compiler on each one.
A compilation can be successful, in which case we can write routes, save to the store (cache), and update state. If the compilation requires some dependency, then the dependencies are run first.
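The "run the dependencies first" behaviour can be sketched as a small recursive scheduler. This is a simplified toy in Ruby under one big assumption: dependencies are known up front in a hash, whereas a compiler in Hakyll reports what it needs while it runs. The name toy_chase and the deps hash are invented for this example.

```ruby
# Toy scheduler: process each todo item, running its dependencies first.
# deps maps an identifier to the identifiers it needs (assumed known up front).
def toy_chase(todo, deps, done = [], in_progress = [])
  todo.each do |id|
    next if done.include?(id)
    raise ArgumentError, "dependency cycle at #{id}" if in_progress.include?(id)

    # Run whatever this item depends on before the item itself.
    toy_chase(deps.fetch(id, []), deps, done, in_progress + [id])

    # Stand-in for: run the compiler, write the route, save to the cache.
    done << id
  end
  done
end

deps = {
  "index.html" => ["posts/a.md", "posts/b.md"],
  "posts/b.md" => ["templates/post.html"]
}
order = toy_chase(["index.html", "posts/a.md", "templates/post.html"], deps)
puts order.inspect
```

Items already compiled are skipped when they come up again in the todo list, which is why each identifier appears only once in the resulting order.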
When all this completes running, you’ll have your input files transformed by the specified compilers, and written to paths specified by your route!