How to make your own static site generator

22 points by abhin4v 15 hours ago on lobsters | 15 comments

A few notes.

Interesting decision to take published/updated dates from Git. Among static site generators, it’s far more common to use a frontmatter field. For updated, I think using the commit date is reasonable, but if you ever commit before publishing (which most people probably should) you’ll need to do at least minor history rewriting, such as with git commit --amend --reset-author --no-edit, or choose some other way of designating which commit published an item. I’m in the process of replacing my site with something completely new (and archiving rather than migrating old content), and I’ve been leaning toward doing it this way, and also automatically generating a changelog from added commits. (Part of my reason for doing so is that I’m choosing to make it a server rather than a static site generator, and would like to produce perfect Last-Modified headers, rather than just something like server start or content reload time.)
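A rough Python sketch of the Git-dates idea, for illustration only. It reads author timestamps (%at, the date that --reset-author refreshes) and treats the earliest as published and the latest as updated, which only holds if the first commit touching the file is the one that published it. The helper names dates_from_log, git_dates and last_modified are made up here.

```python
import subprocess
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Tuple


def dates_from_log(log_output: str) -> Tuple[datetime, datetime]:
    """Return (published, updated) from `git log --format=%at` output.

    Each line is a Unix timestamp; the earliest is taken as the publish date
    and the latest as the update date (an assumption, see the note above).
    """
    stamps = [
        datetime.fromtimestamp(int(line), tz=timezone.utc)
        for line in log_output.split()
    ]
    if not stamps:
        raise ValueError("no commits found for this file")
    return min(stamps), max(stamps)


def git_dates(path: str) -> Tuple[datetime, datetime]:
    """Ask Git for the author timestamps of every commit touching `path`."""
    out = subprocess.run(
        ["git", "log", "--follow", "--format=%at", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return dates_from_log(out)


def last_modified(updated: datetime) -> str:
    """Format a UTC datetime as an HTTP Last-Modified header value."""
    return format_datetime(updated, usegmt=True)
```

A server could call git_dates once per file at load time and send last_modified(updated) with each response.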
If you’re using Atom for your feeds (which you should, unless you’re dealing with podcasts where Apple poisoned everything), please don’t call it RSS. Let RSS die. It should have twenty years ago. Atom is uniformly better, in ways that actually do matter in a few places.
I see no reason to use UUIDs for feed IDs: just use your URLs.

The only reason not to is if they’re subject to change. The way they’ve been constructed gives what I think is a false impression of unchangingness. They’re not tied to the domain name, so you could shift to a different domain name without guaranteed disturbance (a good idea, incidentally—why tie yourself to GitHub?); but they are still tied to your article file path, so you presumably can’t change a slug without changing its ID, though you could make changes like removing trailing .html across the board.

For me, I changed my URL style from /blog/slug.html to /blog/slug/ in 2019, so I just put an if page.year < 2019 branch in my feed template to avoid changing the old <id> values. I’ve taken that approach with at least one company’s site migration before, and in at least one other company’s site migration put a frontmatter field for the old ID.
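Both tricks (the year branch and the frontmatter override) fit in one small function. This is a sketch under assumptions, not anyone's actual template: Page, old_id, feed_id and the example.com origin are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

BASE = "https://example.com"  # assumed site origin, purely illustrative


@dataclass
class Page:
    slug: str
    year: int
    old_id: Optional[str] = None  # hypothetical frontmatter field for a legacy ID


def feed_id(page: Page) -> str:
    """Return a feed <id> that stays stable across a URL-style change."""
    if page.old_id is not None:
        # An explicit legacy ID from frontmatter always wins.
        return page.old_id
    if page.year < 2019:
        # Items published under the old URL style keep their old ID.
        return f"{BASE}/blog/{page.slug}.html"
    return f"{BASE}/blog/{page.slug}/"
```

The point is only that the ID is computed from what the item was first published as, not from where it lives now.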
On drafts: I think there can be value in being able to publish a build with drafts included and marked. I have a subdomain that I use for such purposes, so I can share drafts with others while having the content still clearly marked as draft. But there’s a lot of cargo culting in static site generators, and the frontmatter field and subdirectory approaches are frequently used when you really don’t need such a thing. For my new site, I’ll be heading in quite a different direction.
And the JavaScript side is also very short:

[22 lines of code]

Y’know, it can comfortably be reduced to a one-liner, only shedding the console logging (which seems superfluous to me). Just sayin’.
```
location.origin.includes("github")||(new EventSource("/blog/live-reload").onmessage=()=>location.reload())
```

: lilac | 4 hours ago

and in at least one other company’s site migration put a frontmatter field for the old ID.

my blog is currently doing something entirely jankier than this. excited to steal this idea later!

Lately, I've been thinking about and studying different architectures for small sites, and I can't think of anything better than a simple pipeline: move the request through middleware until it reaches the handler, then move the response through more middleware until the end. It's easily visualized as (serve (-> (handler routes) (csrf-session-parse-fancy-stuff) (log) (404) (layout))), which is what some web frameworks use. This works dynamically, but adding a (memoize) middleware gives you an SSG just as well. (Mapping over the articles when initializing the server would trigger memoizing; you could also map over them directly and memoize the results to generate the HTML to throw onto some other server.) As much as I think about it, I can't come up with anything better, but I'm curious about better or funner alternatives. Everything's a pipeline; that's too obvious.
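The pipeline-plus-memoize idea fits in a few lines of Python. This is only a sketch: a request is reduced to a bare path string, and ARTICLES, pipeline, not_found, layout, memoize and build_site are invented names, not from any framework.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]  # request path -> response body

ARTICLES = {"/hello": "Hello, world"}  # hypothetical content store


def handler(path: str) -> str:
    return ARTICLES[path]  # raises KeyError for unknown paths


def not_found(inner: Handler) -> Handler:
    def wrapped(path: str) -> str:
        try:
            return inner(path)
        except KeyError:
            return "404 Not Found"
    return wrapped


def layout(inner: Handler) -> Handler:
    def wrapped(path: str) -> str:
        return f"<html><body>{inner(path)}</body></html>"
    return wrapped


def memoize(inner: Handler) -> Handler:
    cache: Dict[str, str] = {}

    def wrapped(path: str) -> str:
        if path not in cache:
            cache[path] = inner(path)  # render once, reuse afterwards
        return cache[path]
    return wrapped


def pipeline(base: Handler, *middleware: Callable[[Handler], Handler]) -> Handler:
    """Thread the handler through each middleware, innermost first."""
    for wrap in middleware:
        base = wrap(base)
    return base


app = pipeline(handler, not_found, layout, memoize)


def build_site() -> Dict[str, str]:
    """Mapping over every article is the whole 'static generation' step."""
    return {path: app(path) for path in ARTICLES}
```

Serving app directly gives the dynamic site; writing out the result of build_site() gives the static one.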

Code in a DB instead of source files sometimes interests people. What about keeping every paragraph and citation in a DB and then somehow turning their contents into facts that the renderer can use to prehydrate minimal inline CSS into the HTML? Maybe making articles/pages into actors that poll services to transform their contents into HTML would be interesting. I'm not sure how to make it more Rube Goldberg, help! IDK how enlive works, so... maybe that.

: juuso | 11 hours ago
I use Nix for SSG, which has the upside that you can make any folder into a flake and hence "freeze" the page in time with respect to CSS and so on. The build time is like 10 minutes though. I also use my own templating program, which handles a template as a lens view of the files that it imports. It would be interesting to experiment more with bidirectionality in SSGs -- one could have a join of views but also compose that up in the stack. It would be great to have an editor view of the composed files and edit in the joined view such that the changes would propagate back to the original template files; the lens laws would guarantee this would work.
: zetashift | 7 hours ago

Code in DB

Ha! did somebody say more complex than necessary? I write my dev logs and wannabe literate programs using Unison doc format. There is a library that creates a blog from those. The doc format is basically a flavor of markdown.

Ideally I'd use org-mode for everything, but it's been hard to get my Emacs config just right for it (which is a me problem tbh).

matklad | 9 hours ago

I always wanted to add live-reloading

This one I now consider a mistake. I also used to add live-reloading to stuff I write, by adding my own reloading code, or using a VS Code live-reload extension.

What I realized is that this is poor factoring --- live reloading should be in a different process, a feature of the static file server. With this setup you can "add" live reloading anywhere, without any changes. In other words, if you run python3 -m http.server -d ./out/ to start a normal server, you can run live-server ./out/ instead, and have live-reloading working orthogonally to the rest of your infrastructure.

Sadly, I don't know, off the top of my head, a go-to implementation here (like python3 -m http.server is the go-to for the non-reloading one). I've been using https://github.com/lomirus/live-server, it works great, probably deserves to be better known.

Compare with static sites --- there's no such thing as a static site, you don't "literally" shove your file system into the user's browser. Rather, there's a convention, followed by many programs, that when you receive GET /assets/style.css HTTP/1.1 on a TCP socket, you send back the contents of ./assets/style.css with an appropriate HTTP preamble. This is an implicit interface that allows many different static site generators, and many different web servers, to meet. Exactly the same convention works for live-reloading too; it's just that the space of live-reloading web servers is thinner.
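That convention is small enough to sketch in Python. The code below is illustrative only: resolve and respond are hypothetical names, and it ignores query strings, percent-encoding and everything else a real server would handle.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple


def resolve(root: Path, request_path: str) -> Optional[Path]:
    """Map a URL path like /assets/style.css to a file under root, or None."""
    root = root.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    if candidate != root and root not in candidate.parents:
        return None  # refuse paths that escape root, e.g. /../secret
    if candidate.is_dir():
        candidate = candidate / "index.html"  # common directory-index convention
    return candidate if candidate.is_file() else None


def respond(root: Path, request_path: str) -> Tuple[int, str, bytes]:
    """Return (status, content type, body) for a GET of request_path."""
    file = resolve(root, request_path)
    if file is None:
        return 404, "text/plain", b"not found"
    content_type = mimetypes.guess_type(file.name)[0] or "application/octet-stream"
    return 200, content_type, file.read_bytes()
```

Any generator that writes files into a directory and any server that implements roughly this mapping can be paired without knowing about each other.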

: zimpenfish | 9 hours ago

What I realized is that this is poor factoring --- live reloading should be in a different process, a feature of the static file server.

Sounds similar(ish) to hugo - hugo server launches a local server with live reloading / in-memory building for development / testing whilst hugo without the server argument rebuilds the static site.
: polywolf | 7 hours ago
I agree, it certainly seems like dedicated live-reloading servers could do a better job. I'd use https://github.com/yandeu/five-server (which is just live-server kept up-to-date with the npm dependency treadmill), weird there aren't other ones though. Guess it's because it needs to introspect/modify the data going out (to inject the live reloading code), which is a niche place to be when typically web frameworks focus on "serve stuff from other sources really fast" or "generate everything from scratch"?
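The "modify the data going out" part can be quite small. Here is a hedged Python sketch of the injection step only; the /__live-reload endpoint and the names RELOAD_SNIPPET and inject_reload are assumptions for illustration, not how live-server or five-server actually do it.

```python
import re

# Hypothetical client snippet; the endpoint path is an assumption.
RELOAD_SNIPPET = (
    '<script>new EventSource("/__live-reload").onmessage = '
    "() => location.reload()</script>"
)

_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)


def inject_reload(html: str, snippet: str = RELOAD_SNIPPET) -> str:
    """Insert the snippet before the last </body>, or append it if there is none."""
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        return html + snippet
    index = matches[-1].start()
    return html[:index] + snippet + html[index:]
```

A reloading server would run this over HTML responses only and pass every other file through untouched.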
: jaredkrinke | 5 hours ago
In general, I agree, but one upside to having an integrated live-reload server is speed. You’ll know which pages to rebuild (the ones waiting on events), and the web server won’t have to wait on noisy file system events.

Usually not a big deal, but I was able to get my live rebuild time under 100ms on an old computer, with a naive JavaScript SSG, by doing this. Having the page update immediately was nice for iteration.
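A minimal Python sketch of that bookkeeping, under the assumption that each open page holds one event stream: the server records which pages are connected and rebuilds only those when a source changes. LiveRebuilder, connect, disconnect and build_page are hypothetical names, not taken from any particular SSG.

```python
from typing import Callable, Dict, Set


class LiveRebuilder:
    """Tracks pages with an open live-reload connection and rebuilds only those."""

    def __init__(self, build_page: Callable[[str], str]) -> None:
        self.build_page = build_page
        self.watching: Set[str] = set()

    def connect(self, page: str) -> None:
        # Called when a browser opens the event stream for `page`.
        self.watching.add(page)

    def disconnect(self, page: str) -> None:
        # Called when that event stream closes.
        self.watching.discard(page)

    def on_source_change(self) -> Dict[str, str]:
        # Rebuild only the pages someone is currently looking at.
        return {page: self.build_page(page) for page in sorted(self.watching)}
```

Everything else can be rebuilt lazily or on the next full build.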

[OP] abhin4v | 10 hours ago

I wrote a post about writing an SSG using Shake in Haskell. I see that it implements many ideas from this post, at least in a rudimentary way. My actual SSG (also built on Shake) is way more featureful, but this article (and mine) are good starting points.

fanf | 6 hours ago

For search I use pagefind which is really easy to add to a static site.

: carlana | 3 hours ago
Yes, I had a very good experience integrating it into a project.

sigmonsez | 2 hours ago

It's super easy to throw together an SSG using Go with goldmark for markdown handling, templ for html layout and templating and yaml for frontmatter. If anyone is interested i'll do a blog post on it. been meaning to for a while but never find the time.

: iamnearlythere | 15 minutes ago
We’ve got similar taste: https://iamnearlythere.com/replacing-Jekyll-with-go/ The Makefile of that repo shows the reload strategy: using entr

einacio | 11 hours ago

I wonder if it makes sense to use a Markdown AST instead of the DOM of the HTML. On the Markdown side you'd have pre-linting and source code references, but things like finding one-element lists or doing transformations are simple to do with the standard API.