fanf | 21 hours ago
Sadly we never find out why systemd-resolved is dropping NXDOMAIN responses.
tuxes | 21 hours ago
It was a well written blog post which had built up suspense. Disappointing to not see the root cause. I'd say it's not even established that it's systemd-resolved that is broken.
intelfx | 21 hours ago
I noticed resolved dropping NXDOMAINs multiple times already, but never bothered to investigate. Might this be the final push?
jamesog | 16 hours ago
I've seen systemd-resolved do weird things with DNSSEC-enabled domains before. Perhaps the circumstances I saw weirdness matched this, but I don't have the notes from debugging it before. I've learned not to trust systemd-resolved (or dnsmasq) at all and always replace it with good old Unbound.
fanf | 16 hours ago
This domain isn’t signed and the article says systemd-resolved’s DNSSEC validator was turned off.
But I seem to have found a bug: takeonme.org is hosted by Cloudflare, and although the authoritative servers return NXDOMAIN for most query types, they return NODATA for DNSKEY. But I would be surprised if that’s relevant to this article’s issue.
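For anyone unfamiliar with the distinction, here is a minimal sketch of how the two outcomes differ on the wire. It assumes you already have the raw response bytes and looks only at the 12-byte DNS header; `classify_response` is a hypothetical helper name, and treating every NOERROR response with an empty answer section as NODATA is a simplification (a referral looks the same at this level).

```python
import struct

def classify_response(packet: bytes) -> str:
    """Classify a raw DNS response using header fields only (simplified)."""
    if len(packet) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # Header layout: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each)
    _id, flags, _qdcount, ancount, _nscount, _arcount = struct.unpack("!6H", packet[:12])
    rcode = flags & 0x000F  # low 4 bits of the flags word
    if rcode == 3:
        return "NXDOMAIN"   # the name does not exist at all
    if rcode == 0 and ancount == 0:
        return "NODATA"     # the name exists, but has no records of this type
    if rcode == 0:
        return "ANSWER"
    return f"RCODE {rcode}"
```

So the same name answering NXDOMAIN for one query type and NODATA for another is contradictory: the first says the name does not exist, the second says it does.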
jamesog | 16 hours ago
Ah, I misread then, I thought I read that DNSSEC was in play.
I still don't trust systemd-resolved. :-)
Garbi | 7 hours ago
There is now a follow-up project on my whiteboard
I learned I need a whiteboard in my home lab.
oliverpool | 20 hours ago
Regarding the staging fallback: Caddy will not use a certificate retrieved on staging, it is only used as a way to check if the challenge is solvable, without being hindered by the rate-limiting of LE prod. Once staging is successful, Caddy retries against prod immediately.
Regarding the monitoring: a soon-to-expire certificate should trigger an Uptime-Kuma alert if configured correctly ([ ] Certificate Expiry Notification).
white-star | 14 hours ago
I started removing systemd-resolved from my linux machines. Too much troubleshooting complexity. I don't need a third or fourth way to cache DNS between my ISP, router, and apps. What is the point of it? Didn't ask for it.
heavyrain266 | 5 hours ago
Why would you want an init system to handle DNS resolution? That thing is a huge pile of junk, it even tries to replace sudo through the run0 gimmick.
yaxley_peaks | 2 hours ago
systemd-resolved is not part of the init system though. It is a DNS resolver daemon that just happens to be developed by and be a part of the systemd suite of software.