Amok Time
I administer a couple of small number crunchers at work. That usually takes very little effort, as all of them run Debian Squeeze and an occasional 'wajig dailyupgrade' is all that's needed to keep the systems up to date. Furthermore, all users of these systems are knowledgeable, disciplined and cooperative. To err, of course, is human. And shit happens. 😉
One of these knowledgeable, disciplined and cooperative users sent me an e-mail last night, saying he was afraid that a program he had written and started could cause ... ahem ... problems ...
Indeed. It was eating up the available memory at an alarming rate. Of course, I've set a limit for the memory a single process can request. But at the same time, this program caused the kernel to spit out a lot of oopses and other warnings, resulting in an even more alarming growth of the files in /var/log. Somehow, I didn't like the looks of it, so I tried to kill the associated processes.
Well, my user stated in his mail that he had tried that too, and hadn't succeeded despite being the owner. Being root, I couldn't kill his processes either. Mind you: the status of these processes was shown as R, not D or Z. The parent was init, by the way. Hmpf.
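For reference, the process states mentioned above (R for running, D for uninterruptible sleep, Z for zombie) can be read straight from /proc or from ps. The following is only a minimal sketch, assuming a Linux system with procps; the function names proc_state and list_d_state are made up for illustration and are not what I actually typed that night.

```shell
# Print the one-letter state (R, S, D, Z, T, ...) of a given PID,
# taken from the "State:" line of /proc/<pid>/status (Linux only).
proc_state() {
    awk '/^State:/ { print $2; exit }' "/proc/$1/status"
}

# List the header line plus every process whose state starts with D,
# i.e. everything currently stuck in uninterruptible sleep.
list_d_state() {
    ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^D/'
}
```

Calling proc_state with a PID prints a single letter, and list_d_state shows at a glance how much of the system is already stuck.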
Well, then, I thought, let's reboot. 'shutdown -r now' will teach this process who's the boss around here!
Nothing happened, though, and on closer inspection it became clear that my shutdown command had gone straight into 'uninterruptible sleep' (status D). Almost all commands did, in fact. 'sync', for example. 😄
The system was waiting so faithfully and naively for something to happen that nothing worked anymore. Except for my user's processes, which were causing a load above 60 ... and rapidly diminishing memory and hard disk space.
Fortunately, one can use the magic SysRq key even without physical access to Alt+SysRq+B (Alt+Druck+B on a German keyboard):
echo b > /proc/sysrq-trigger
Replace the b with an o when you want to power the system off instead. 😉
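A slightly gentler variant sends s (emergency sync) and u (remount all filesystems read-only) before the final b, so the kernel gets a chance to flush its buffers first. The following is only a sketch: the function name sysrq_send and the variables SYSRQ_TRIGGER and SYSRQ_DELAY are hypothetical names chosen for illustration, and the two-second pause is a guess rather than a tested value.

```shell
# Send one or more SysRq command letters to the kernel, one at a time.
# SYSRQ_TRIGGER (hypothetical) can point at an ordinary file for a dry run;
# SYSRQ_DELAY (hypothetical) is the pause in seconds after each letter.
sysrq_send() {
    for key in "$@"; do
        printf '%s' "$key" > "${SYSRQ_TRIGGER:-/proc/sysrq-trigger}" || return 1
        sleep "${SYSRQ_DELAY:-2}"
    done
}

# Usage, as root: sync, remount read-only, then reboot.
# sysrq_send s u b
```

Note that the pause relies on the external sleep binary. On a machine as wedged as mine was, even that might end up in state D, in which case the plain echo b is all that is left.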