Just daydreaming, here. The systemd brouhaha is not dying down, of course. But proponents keep saying, "If you don't like it, fork it."
I don't particularly want to fork systemd. The so-called sysv-init was good enough until Red Hat decided that Linux had to be "management friendly", or some such really strange goal, and abandoned proper engineering practices to achieve it.
Proper engineering practice for bringing a change as drastic as systemd into an OS is to split development. (In the open source world, that's called a "fork".) Develop the stable, existing OS and the new one in parallel, let users experiment with the new one and use the stable, existing one for real work.
And keeping the stable one means you have something reasonable to compare the new one against when looking for the root causes of difficult bugs in the new system.
No, Red Hat Enterprise is not "good enough" as a stable OS. That's their product. They needed a stable Fedora and a systemd Fedora, separate from their product. A product should not participate at that level of development. (And Red Hat really hasn't used it as such.)
But that's not what this rant is about. systemd is a bad design for any and all of the functions it performs. One of the biggest problems is that it tries to do way too much at process id 1.
What does systemd do? It is not just a replacement for the venerable sysv-init. In addition, it tries to enforce all the things that are "supposed to be happening" in a Linux OS. What the kernel does is apparently not enough. What the various processes in a traditional Linux OS do is apparently not enough.
Well, okay, management does not want to be entirely at the mercy of their IT department, and that's a reasonable desire.
If they were willing to learn enough information science to do their job of managing people correctly, they'd know enough to not be at the mercy of their IT departments, too. But in the real world, too many managers are scared of actually knowing what's happening. So they need bells and whistles on their computers so they can think they know what's happening.
There is a right way to do this, even though it's not the best goal. And it fits within the traditional way of doing things in Linux or Unix.
The right way to do it is to develop machine functions to support the desired management and reporting functions. In the *nix world, these support functions are called daemons.
Not demons, daemons.
Daemons in mythology are unseen servants, doing small tasks behind the scenes to keep the world running. (Okay, that's an oversimplification, but it's the meaning that was intended when early computer scientists decided to use the word.)
In computers, we could have called them "angels", I suppose, but the word "angel" focuses on the role of messenger more than general laborer. The word "service" is also not quite right because what the daemons do is the work to make it possible to provide the service. And the word "servant" would also cause false expectations, I think. Servants are human, and too functional.
In engineering, there is a principle of simplicity. Make your machines as simple as possible. Complexities invite error and malfunction. Too much complexity and your machine will fail, usually at very inopportune moments.
Computers are often seen as a magic way to overcome this requirement of simplicity, but the way they work is to manage the complexity in small, simple chunks. Inside a well-designed operating system, complex tasks are handled by dividing them into small pieces and using simple programs to deal with the pieces, and simple programs to manage the processes. This is the reason computer scientists wanted to use a word that could, in English, be used to evoke simple processes.
This is the right way to do things in computer operating systems. Break the problem into simple pieces, use simple programs to work on the simple pieces, and use simple programs to manage the processes.
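As a toy illustration of this divide-and-solve idea (my own hypothetical example, not anything from an actual init system): a "complex" task, reporting the most common words in a text, broken into single-purpose pieces, with a coordinator that does no real work itself and only sequences the steps.

```python
# Toy illustration of divide-and-solve: each function does one simple,
# easily verified job; the "manager" function only chains them together.
from collections import Counter

def tokenize(text):
    """One simple job: split text into lowercase words, stripping punctuation."""
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

def count_words(words):
    """One simple job: tally the words."""
    return Counter(words)

def top_n(counts, n):
    """One simple job: pick the n most common entries."""
    return counts.most_common(n)

def report_top_words(text, n=3):
    """The manager: does no real work itself, only sequences the simple steps."""
    return top_n(count_words(tokenize(text)), n)

print(report_top_words("the cat sat on the mat, and the cat slept"))
```

Each piece can be tested and reasoned about on its own, which is exactly the property the paragraph above is after.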
One place where the advantage of this divide-and-solve approach shows clearly is in the traditional Unix and Linux "init" process. This process starts other processes and then does little more than make sure those processes are kept running until their jobs are done.
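That core responsibility can be sketched in a few lines. This is illustrative only (a real init is written in C and runs as pid 1; this sketch runs as an ordinary POSIX process): spawn some children, then sit in a loop waiting for children to exit and collecting their status so they don't linger as zombies.

```python
# Minimal sketch of the classic init loop (illustrative only, POSIX-only):
# start child processes, then do little besides wait() for them --
# including, for a real pid 1, orphans reparented to it -- and collect
# their exit status so no zombies accumulate.
import os

def spawn(exit_code):
    """Fork a trivial child that exits immediately with the given code."""
    pid = os.fork()
    if pid == 0:                 # child side of the fork
        os._exit(exit_code)
    return pid                   # parent side: remember the child's pid

def init_loop(n_children=3):
    started = {spawn(i) for i in range(n_children)}
    reaped = {}
    while started:
        pid, status = os.wait()  # blocks until some child exits
        if pid in started:
            started.discard(pid)
            reaped[pid] = os.waitstatus_to_exitcode(status)
    return reaped                # every child accounted for, none left as zombies

if __name__ == "__main__":
    print(sorted(init_loop().values()))   # the three exit codes, 0..2
```

The point is how little is in the loop: wait, record, repeat. Everything interesting happens in the children.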
So, when we think we are solving hard and complicated problems with computers, we should understand that the computer, when programmed correctly, is usually seeing those problems as a simple, organized set of simple problems and using simple techniques to solve them one at a time, very quickly.
(There are some exceptions to this rule, problems that programmers have not figured out how to divide up into simpler problems, but when we fail to use the divide-and-solve approach in our programs, our programs tend to fail, and fail at very inopportune moments.)
Now, if I were designing an ideal process structure for an operating system, here's what I would do:
Process id 1, the parent to the whole thing outside the kernel, would:
- Call the watchdog process startup routine.
- Call the interrupt manager process startup routine.
- Call the process resource recycling process startup routine.
- Call the general process manager process startup routine.
- Enter a loop in which it
  - monitors these four processes and keeps them running (who watches the watchdog? pid 1 does, using status and maintenance routines for each);
  - collects orphaned processes and passes their process records to the resource recycler process;
  - and checks whether the system is being taken down, exiting the loop if it is.
- Call the general process manager process shutdown routine.
- Call the process resource recycling process shutdown routine.
- Call the interrupt manager process shutdown routine.
- Call the watchdog process shutdown routine.
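Here is my attempt to render that outline as a skeleton. The names are hypothetical and the bodies are stubs, not working init code; the point is the shape: startup in a fixed order, a small supervision loop in the middle, shutdown in exact reverse order.

```python
# Skeleton of the pid 1 outline above. Hypothetical names, stubbed bodies.
# The shape is what matters: start helpers in order, supervise until
# shutdown is requested, then stop the helpers in exact reverse order.
calls = []  # records call order so the structure can be inspected

def start(name):      calls.append(("start", name))
def stop(name):       calls.append(("stop", name))
def keep_alive(name): calls.append(("check", name))   # restart if dead (stubbed)
def reap_orphans():   calls.append(("reap", "orphans"))  # pass records to recycler

HELPERS = ["watchdog", "interrupts", "recycler", "process-manager"]

def pid1_main(shutdown_requested):
    for h in HELPERS:                      # startup, fixed order
        start(h)
    while not shutdown_requested():        # the supervision loop
        for h in HELPERS:
            keep_alive(h)                  # who watches the watchdog? pid 1 does
        reap_orphans()
    for h in reversed(HELPERS):            # shutdown, exact reverse of startup
        stop(h)

# Run one loop iteration, then "take the system down".
ticks = iter([False, True])
pid1_main(lambda: next(ticks))
print(calls[0], calls[-1])
```

Note the symmetry: the watchdog is the first thing up and the last thing down, and everything pid 1 touches directly is one of these four small routines.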
systemd and traditional init systems manage ordinary processes directly. I'm pretty sure it's more robust to have them managed separately from pid 1. Thus the separate process manager process.
There are a few more possible candidates for being managed directly by pid 1, but you really don't want anything managed directly by pid 1 that doesn't absolutely have to be. Possible candidates that I need to look at more closely:
- A process/daemon I call the rollcall daemon, that tracks daemon states and dependencies at runtime.
- A process/daemon to manage signals, semaphores, and counting monitors. But non-scalar interprocess communication managers definitely should not be handled by pid 1.
- A special support process for certain kinds of device managers that have to work closely with hardware.
- A special support process for maintaining system resource checksums.
And some things that definitely should not be managed directly by pid 1:
- Socket, pipe, and other non-scalar interprocess communication managers,
- Error logging system managers,
- Authentication and login managers,
- Most device manager processes (including the worker processes supported by the special support process I might have the pid 1 process manage),
- Processes checking and maintaining system resource checksums.
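To make the rollcall daemon idea a little more concrete, here is a hypothetical sketch of its bookkeeping (names, table layout, and structure are my invention, not a settled design): a table of daemon states plus declared dependencies, and a helper that computes a start order in which every daemon comes up after the daemons it depends on.

```python
# Hypothetical sketch of "rollcall daemon" bookkeeping: daemon states,
# declared dependencies, and a dependency-respecting start order.
# Names and structure are invented for illustration only.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# daemon -> daemons it depends on (which must already be running)
DEPENDS_ON = {
    "network": [],
    "logger": [],
    "time-sync": ["network"],
    "mailer": ["network", "logger"],
}

states = {name: "stopped" for name in DEPENDS_ON}

def start_order():
    """Dependencies first; TopologicalSorter does the ordering work."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())

def bring_up():
    for name in start_order():
        states[name] = "running"   # a real rollcall daemon would launch it here

bring_up()
print(start_order(), states["mailer"])
```

Tracking this at runtime, rather than baking dependency logic into pid 1 itself, is exactly the kind of job that belongs in a small separate daemon.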
Well, this is daydreaming. I am not in a position where I have the time or resources to code it up.
So, to you who tell me to shut up about systemd and make my own Linux distribution, I'd love to. You wanna pay my wages and buy me the hardware so I can?