Moving beyond fork() + exec()

31 points by hongminhee 19 hours ago on lobsters | 14 comments

drmorr | 18 hours ago

I don't fully understand why "fork+exec" exists in the first place. If I, knowing relatively little about kernel development, were going to design a set of syscalls for process management, I feel like naively I would assume that the "normal" case is "spawn a new pristine child process" and the exceptional case is "make a copy of the currently running process". The latter would just be used by things like, idk, webservers where the entire model is lots of copies of identical processes.

Does anyone have any history on why fork is designed the way that it is in the first place?

edit: I suck at typing on my phone

spc476 | 15 hours ago

There's a thread on the Orange Site that goes into the history of fork()+exec().

drmorr | 5 hours ago

Thanks, this was actually an interesting orange site thread. (the tldr for folks who don't want to click through is, "fork was a requirement for machines that had tiny amounts of memory fifty years ago")

lcamtuf | 17 hours ago

Depends on how you define "normal", I guess. On many systems, creation of subprocesses and threads may outnumber execve() calls.

I don't know the ancient history - the semantics long predate Linux - but a logical guess is that you needed fork() anyway, so building on top of that was a simpler and more composable design than having a separate exec_in_a_new_process() API that duplicates some of the process-creation logic.

The split also allowed you to use execve() w/o fork() to replace the current process, which can be a worthwhile optimization in some cases. Basically, if you're done and want to pass the baton to another program, you can do that and not hog entries in the process table or memory.

Note that in performance-critical applications, fork() or execve() alone may already be too slow, so a streamlined API might not solve all your woes. This is why you see optimizations such as pre-staged Zygote processes on Android, pre-forked webserver workers, etc. These are just a context switch away.
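
To make the pre-fork idea concrete, here is a minimal Python sketch of workers that are forked before any work arrives and then sit blocked on a pipe. The names `prefork_workers` and `dispatch` and the uppercase "work" are made up for illustration; real pre-forking servers typically share a listening socket instead.

```python
import os

def prefork_workers(n):
    """Fork n workers up front; each blocks on its own pipe until given work."""
    workers = []
    for _ in range(n):
        task_r, task_w = os.pipe()      # parent -> worker
        result_r, result_w = os.pipe()  # worker -> parent
        pid = os.fork()
        if pid == 0:
            # Worker: drop fds that belong to the parent side or earlier workers.
            for _pid, w, r in workers:
                os.close(w)
                os.close(r)
            os.close(task_w)
            os.close(result_r)
            try:
                data = os.read(task_r, 1024)      # blocks until work arrives
                os.write(result_w, data.upper())  # placeholder "work"
            finally:
                os._exit(0)
        os.close(task_r)
        os.close(result_w)
        workers.append((pid, task_w, result_r))
    return workers

def dispatch(worker, payload):
    """Hand one payload to an already-running worker and collect its reply."""
    pid, task_w, result_r = worker
    os.write(task_w, payload)
    reply = os.read(result_r, 1024)
    os.close(task_w)
    os.close(result_r)
    os.waitpid(pid, 0)
    return reply
```

By the time `dispatch(workers[0], b"hello")` is called, the cost of fork() has already been paid; the request only costs a pipe write and a wakeup.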

natkr | 7 hours ago

"The split also allowed you to use execve() w/o fork() to replace the current process, which can be a worthwhile optimization in some cases."

I think treating it as an optimization is underselling the case. Often, when wrapping a program, you don't want other programs to be able to distinguish between the wrapper and the wrappee. You want Ctrl+C, wait(), kill(), etc. to transfer over transparently.
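
A rough Python sketch of why that works: exec replaces the program image but keeps the PID, so whoever is holding that PID cannot tell the wrapper from the wrappee. The helper name `run_wrapped` is hypothetical, and the forked child here only stands in for a wrapper script.

```python
import os
import sys

def run_wrapped(argv):
    """Fork a stand-in 'wrapper' that execs argv in place.

    Returns (pid the parent sees, text the exec'd program wrote to stdout).
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Wrapper: point stdout at the pipe, then hand the process over.
        try:
            os.close(r)
            os.dup2(w, 1)
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            os._exit(127)
    os.close(w)
    out = b""
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        out += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return pid, out.decode().strip()
```

Running `run_wrapped([sys.executable, "-c", "import os; print(os.getpid())"])` should report the same PID from both sides, which is the property that lets signals and wait() pass straight through.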

In fact exec() came first: in very early unix there was exactly one process per terminal, and exit() re-executed the shell in place – cd did not need to be a shell builtin!

https://read.seas.harvard.edu/~kohler/class/aosref/ritchie84evolution.pdf

Ritchie wrote that Thompson got the idea for fork() from the Berkeley timesharing system citing https://bitsavers.org/pdf/sds/9xx/940/ucbProjectGenie/R-21_Time-Sharing_System_Reference_Oct68.pdf

byroot | 14 hours ago

Not certain if that was the reason or just a happy accident, but the advantage of starting from a copy is that before calling exec you can do all sorts of mutations (change environment, group, user, file descriptors, signal masks, etc.).
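
As an illustration, a small Python sketch of that window between fork and exec, where ordinary code running in the child adjusts its own state before the new program starts. `spawn_with_setup` is a made-up helper; the specific mutations (working directory, environment, stdout) are just examples.

```python
import os
import sys

def spawn_with_setup(argv, cwd, extra_env):
    """fork, mutate the child's own state with plain code, then exec argv."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(r)
            os.chdir(cwd)                 # change working directory
            os.environ.update(extra_env)  # change environment
            os.dup2(w, 1)                 # redirect stdout into the pipe
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)                 # reached only if setup or exec failed
    os.close(w)
    out = b""
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        out += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return out.decode().splitlines()
```

Any other call that changes process state could be dropped between `fork()` and `execvp()` without the spawning API needing to know about it.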

If you look at posix_spawn, which is the modern "spawn a new pristine child process" API, you'll see it has dozens upon dozens of flags and options, and is much harder to extend.
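
For comparison, a sketch of the same kind of setup through Python's `os.posix_spawn` wrapper, where the preparation has to be spelled out as data (a list of file actions) rather than code. `spawn_declaratively` is a hypothetical helper name; the file-action tuples are the ones the `os` module documents.

```python
import os
import sys

def spawn_declaratively(argv, env):
    """Spawn argv with stdout redirected, described as posix_spawn file actions."""
    r, w = os.pipe()
    file_actions = [
        (os.POSIX_SPAWN_DUP2, w, 1),  # child's stdout becomes the pipe
        (os.POSIX_SPAWN_CLOSE, r),    # child does not need the read end
    ]
    # argv[0] must be a full path here; posix_spawn does not search PATH.
    pid = os.posix_spawn(argv[0], argv, env, file_actions=file_actions)
    os.close(w)
    out = b""
    while True:
        chunk = os.read(r, 4096)
        if not chunk:
            break
        out += chunk
    os.close(r)
    os.waitpid(pid, 0)
    return out.decode().strip()
```

Anything the API has no action or flag for cannot be expressed this way, which is the extensibility problem in a nutshell.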

lcapaldo | 9 hours ago

It would be hard to retrofit now, but the “obvious” solution is to have syscalls take the process to operate on as a parameter as a rule. Windows has some of these, e.g. CreateRemoteThread(Ex) and DuplicateHandle (the equivalent of dup/dup2 and pidfd_getfd), but certainly not everything exists in this form there either.

adrien | 11 hours ago

The kernel proposal also appears to dedicate a fair amount of work to these same preparations.

I dislike that we are "stuck" with fork+exec but the approach of encoding preparations in data structures has the same issues as YAML programming.

On top of that, dealing with subprocesses and redirections is a staple of UNIX which makes fork+exec not that bad from userland programmers' point of view.

pclouds | 8 hours ago

One example of "fork only" is a subshell. When a subshell is created from a parent shell, it has the exact same environment, so basically fork(). When I ported ash to Windows, I had to serialize all the data to a new subshell process.
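
A toy Python sketch of the subshell case: the child starts with a full copy of the parent's in-memory state for free, and whatever it changes stays in the child. `run_in_subshell` and the dict of "shell variables" are invented for illustration and are not how ash is structured.

```python
import os

def run_in_subshell(state, body):
    """Run body(state) in a forked copy of this process; return its exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: sees the parent's state as of the fork, with no serialization.
        code = 0
        try:
            body(state)
        except BaseException:
            code = 1
        os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Without fork(), everything `body` might read would have to be packed up and shipped to a fresh process, which is the serialization work described above.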

icefox | 6 hours ago

Maybe wait a week or two before posting LWN content? They deserve their subscribers and their subscribers deserve nice things.

sammko | 2 hours ago

When an LWN post is linked on lobsters, are there more people who currently have or are considering an LWN subscription and will decide not to pay for it anymore, or more people who will find the content interesting and consider subscribing to LWN?

One advantage of fork() is that it allows processes to grow features without every program needing changes in its process creation code. Think for instance chroot (processes grow a root directory feature), job control (processes grow a controlling terminal feature), network namespaces, etc. With fork it’s natural that a new process inherits any new feature from its parent and the new feature works uniformly for old programs. Only programs that manipulate the new feature need to know about it.

If new processes are created really empty and everything has to be set up with something like pidfd_config(), then every program that spawns another needs to be updated to set up that new feature correctly.

This makes me think that in this kind of non-unix system, maybe the process environment (in the broad sense, including things like working directory, namespaces, owner and groups, …) should be a concrete thing separate from processes. But then I wonder where to draw the line between environment state and process state. Should an environment be a shared container for multiple processes, not the inherited part of process state like on unix? hmm…

It seems like you can do better than all of these with clone/exec; the bulk of fork's problems are that it unshares memory, which works poorly when you add threads and large numbers of page table entries to the model.

Exec'ing in a shared memory thread solves a lot of this. I guess I have to go put some experiments together.