The Linux kernel is just a program

82 points by wofo a day ago on lobsters | 22 comments

Amazing read! Would love to get another article on how the write(1... call turns into text on the screen. Basically, "how does the kernel talk to the device so that I see the result".

jtolio | a day ago

Sorry, I know that this is obnoxiously tangential to the actual point of the article, but is it a good idea to run a Go program as PID 1? I think it's not. PID 1 is special, and has special signal handling and special responsibilities, that I imagine are made more challenging by the expectations of the Go runtime. Understood that this blog post is trying to demystify the kernel more than anything else, but I think it's either worth a comment about the unusual responsibilities of PID 1, or switching to a more fit-for-purpose language for a small PID 1 example program?

denz | a day ago

What's the problem? This isn't novel: see, for example, https://gokrazy.org/.

jtolio | 23 hours ago

GoKrazy deals with this specific problem though! From https://gokrazy.org/development/process-interface/

gokrazy’s init process (pid 1) supervises all the binaries the user specified via gokr-packer flags.

Containers have this issue, and so many ship either https://github.com/krallin/tini or https://github.com/Yelp/dumb-init or similar, which are very small programs that just do the minimal PID 1 responsibilities and then get out of the way.

The Go runtime is much more heavyweight than is traditionally needed for a simple PID 1, and makes use of many things that might go awry in PID 1 space. I don't mean to say it's impossible to write a PID 1 handler in Go, I'm sure you can do it, but I feel much less confident about it given the Go runtime, and decisions the runtime might make in the future. Go is certainly targeting a traditional application environment (it expects to have a functional OS running around it), and a working PID 1 is typically part of that.
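For readers wondering what "the minimal PID 1 responsibilities" can look like in Go, here is a rough sketch. It is not taken from tini, dumb-init, or gokrazy. It assumes Linux and uses only the standard syscall and os/signal packages, and the name reapAll is made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// reapAll (hypothetical helper) collects every child that has already
// exited, without blocking, and returns the PIDs it reaped.
func reapAll() []int {
	var reaped []int
	for {
		var status syscall.WaitStatus
		// -1 means "any child"; WNOHANG makes the call return immediately.
		pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
		if err == syscall.EINTR {
			continue // interrupted by a signal, try again
		}
		if err != nil || pid <= 0 {
			// ECHILD: there are no children at all.
			// pid == 0: children exist, but none has exited yet.
			return reaped
		}
		reaped = append(reaped, pid)
	}
}

func main() {
	if os.Getpid() != 1 {
		fmt.Println("not running as PID 1; nothing to do")
		return
	}
	sigs := make(chan os.Signal, 16)
	signal.Notify(sigs, syscall.SIGCHLD)
	// Several exits can be coalesced into one SIGCHLD, which is why
	// reapAll loops until no exited child is left.
	for range sigs {
		reapAll()
	}
}
```

A real init would also forward signals such as SIGTERM to its children and decide what to do when the main service exits; both are left out here.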

: stapelberg | 22 hours ago
[gokrazy author here]

The gokrazy implementation is pretty straight-forward, and the Go runtime doesn’t seem to get in the way: https://github.com/gokrazy/gokrazy/blob/9c06b898c109609336b51b5de48fc80ded4d8514/supervise.go#L400

(The packer builds pid 1 here: https://github.com/gokrazy/tools/blob/main/internal/packer/buildinit.go — as you can see, it just does setup and then calls into gokrazy.SuperviseServices.)
: donio | 22 hours ago

The Go runtime is much more heavyweight than is traditionally needed for a simple PID 1 [...]

Unfortunately that ship has sailed with systemd. On my laptop the PID 1 systemd has an 8MB RSS, which is more than many Go programs use.
: viraptor | 31 minutes ago

and makes use of many things that might go awry in PID 1

What things specifically?

denz | 4 hours ago

This article https://lobste.rs/s/aipma8/cpu_cpu_command_go_inspired_by_plan_9_cpu reminded me that even u-root provides an init in Go: https://github.com/u-root/u-root

madhadron | 21 hours ago

It's not really a problem. Any runtime you can use to write other Unix programs that deal with signal handling works fine for PID 1. It would be fine to use Scheme or Prolog, too, so long as the implementation you wrote your program in had Unix signal handling figured out.

zknd | 9 hours ago

Author of the post here. First of all, thanks for sharing the post here, and for raising this question.

You’re absolutely right that PID 1 is special and has unusual responsibilities that need to be discussed. I deliberately chose Go for the examples instead of C to keep the series approachable for a broader audience of developers, while still being close enough to the OS to show what's really going on under the hood. The goal isn't to present Go as a "good" or feature complete init process in production, but to use a familiar language to reason about kernel behavior without getting lost in incidental complexity.

This is a work-in-progress blog post series, and I’m intentionally focusing on introducing one concept at a time. Your point about PID 1's responsibilities is completely fair; there will be a dedicated follow-up post at some point, where we complete the init process and discuss those responsibilities explicitly.

ema-pe | 8 hours ago

This is a very interesting article! I also found it very accessible.

zknd | 7 hours ago

Glad you liked it. There is a second post in case you are interested: https://serversfor.dev/linux-inside-out/system-calls-how-programs-talk-to-the-linux-kernel/

I am curious what you think about it.

: ema-pe | 5 hours ago
Yes, I've read the second article and I'm now following the blog via RSS. The second article is also well done. I would only suggest showing a simple syscall invocation from the program in userspace.

andyc | 19 hours ago

Another possible issue is that Wait() on a process will burn an OS thread per process

OK well I looked up a comment I wrote about this (12 years ago!) - https://news.ycombinator.com/item?id=7924165

I'd be interested in any updates ... as far as I know, this is a fundamental issue with portable Go

This is the same reason that writing a Unix shell in Go can be awkward -- its concurrency is based around goroutines, which are implemented with threads

And threads do not compose well with processes

I also recently looked at what Docker/containerd does. I didn't get that far, but it seems like a hairball of abstraction

e.g. https://github.com/moby/moby/issues/31487 seems to be an issue related to the fact that processes and goroutines don't compose well

There is Docker/Moby, containerd, some kind of container shim, and then also crun/runc ...

It seems at least 10x more complex than it needs to be

I think a C implementation (crun, which podman uses) makes a lot more sense than Go (runc, which Docker uses) for this application

mxey | 10 hours ago

Is there a reason why the Go runtime doesn’t use the event loop for waiting on processes? epoll and so on also support that, right?

rzhikharevich | 4 hours ago

This would require using the self-pipe trick on non-Linux Unix platforms (Linux has signalfd) which, I guess, is awkward because it involves installing a signal handler (that the user might want to install themselves).

A relevant comment from a Go contributor: https://github.com/golang/go/issues/60481#issuecomment-1567596259

: andyc | 3 hours ago
That's very related, though I think there's nothing stopping the Go runtime from using the self-pipe trick (like the Python asyncio runtime does, and presumably node.js)

I think the real problem is that Go only supports fork-plus-exec (syscall.ForkExec()) -- there is no standalone Fork().

A standalone Fork() would be unsafe because there are also threads involved (the ones that implement goroutines, and then ones that do blocking I/O).

Hm well that's a problem for a SHELL - it has to fork() without exec

But maybe a process supervisor can use syscall.ForkExec()? and non-blocking syscall.Wait4() instead of Wait()?

Someone else can probably unpack that a bit more ...

But yeah given the 2023 response from the Go team, I would still personally be wary of using Go for process management. You can probably do it, but it won't be pretty or portable.

The os package is intended to be somewhat OS independent. Some aspects of this only seem to make sense on Unix systems.

It's not yet obvious to me why we need new API. Programs that need this level of control over subprocesses are likely system dependent and can use the syscall API. I don't see a reason to worry about conflicts between the syscall API and the os API; if you need the syscall API, use only the syscall API. Then you can take advantage of whatever facilities the operating system provides, and can produce the same channel-based API that you are describing here.

The whole point is that processes are different on Unix and Windows! So making a portable os package means you are limited to a tiny least common denominator.

I think Go is trying too hard to be an OS, rather than a language ... same issue with the Dial() stuff over socket APIs

runxiyu | 20 hours ago

I find signal handling to be fine. My concern with using Go on PID 1 is Go's inability to properly handle OOM or any allocation failures in general

mxey | 12 hours ago

What special responsibilities does PID 1 have except reaping all children?

acatton | 6 hours ago

I think this is highly subjective. It depends where you put the cursor. You've put the cursor at reaping all children, but IMHO one could argue PID 1 doesn't have to reap its children either. It could totally let zombie processes occupy the process table until it's full and the system crashes.

On the other hand, one could put the cursor at the opposite side of the scale, and say that it's supposed to:

mount the fstab
mount virtual filesystems (/proc, /sys, /dev, ...)
pivot_root
run fsck if necessary
load driver modules
set the runtime kernel parameters (aka "sysctl")
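To make one item of that list concrete, mounting the virtual filesystems from a Go init could be sketched as below. This assumes Linux and the standard syscall.Mount call; the earlyMounts table, the mountFunc type, and mountAll are hypothetical names, and the mount call is injected so the logic can be exercised without root.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mountPoint describes one early mount: source, target directory, type.
type mountPoint struct {
	source, target, fstype string
}

// earlyMounts is an illustrative table, not an exhaustive one.
var earlyMounts = []mountPoint{
	{"proc", "/proc", "proc"},
	{"sysfs", "/sys", "sysfs"},
	{"devtmpfs", "/dev", "devtmpfs"},
}

// mountFunc has the same signature as syscall.Mount.
type mountFunc func(source, target, fstype string, flags uintptr, data string) error

// mountAll mounts every entry in earlyMounts using the given function.
// The target directories are assumed to exist already.
func mountAll(mount mountFunc) error {
	for _, m := range earlyMounts {
		if err := mount(m.source, m.target, m.fstype, 0, ""); err != nil {
			return fmt.Errorf("mount %s on %s: %w", m.fstype, m.target, err)
		}
	}
	return nil
}

func main() {
	if os.Getpid() != 1 {
		fmt.Println("not running as PID 1; skipping mounts")
		return
	}
	if err := mountAll(syscall.Mount); err != nil {
		fmt.Println(err)
	}
}
```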

I know that some folks will say this can be done by subprocesses, like sysv init does it with services, or systemd with units. But IMHO, this is subjective: the init process was still in charge of starting the processes that performed this early system configuration.

I feel that what the author of the story was trying to show is that you don't need any of this to run Linux. He mounted nothing, and ran his entire PID 1 from its initramfs root. Heck, his PID 1 wasn't even reaping processes.

zie | 5 hours ago

I seem to remember I used to have PID 1 as a shell script. Like other comments have mentioned, Go isn't the most ideal perhaps, but it's good enough.

kana | an hour ago

From the title I thought it was going to talk about user-mode Linux, where the kernel is run directly as a user-space program. (Maybe this series will get to it later?) But it turned out to be way more interesting than I expected!