Brent Dax (brentdax) wrote,

Load average: 314.1, 213.2, 106.7

Every once in a while, the server goes nuts.

Usually it happens when I try to turn something on and don't realize how much load it will cause.  The system seems to be running fine; then a few minutes later I notice the load average creeping up.

For those of you who are not geeks, the load average tells you how many programs are trying to use the CPU at once.  Usually one-, five-, and fifteen-minute load averages are shown.  A system's load average should never be greater than the number of CPUs in the machine for more than a short time; if it is, that means that the server can't keep up with the demand.  I only have one CPU in the server, so the load average should never exceed 1.  A typical load average on Navi is 0.33.
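If you'd rather check this on your own machine than take my word for it, here is a minimal Python sketch of the rule above. It assumes a Unix-like system (os.getloadavg() isn't available on Windows), and the helper name is_overloaded is made up for illustration.

```python
import os


def is_overloaded(load, cpus):
    """Return True when a load average exceeds the number of CPUs."""
    return load > cpus


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages.
    one, five, fifteen = os.getloadavg()
    # os.cpu_count() can return None, so fall back to a single CPU.
    cpus = os.cpu_count() or 1
    for label, load in (("1 min", one), ("5 min", five), ("15 min", fifteen)):
        status = "can't keep up" if is_overloaded(load, cpus) else "ok"
        print(f"{label}: {load:.2f} on {cpus} CPU(s): {status}")
```

With one CPU, is_overloaded(0.33, 1) is False and is_overloaded(1.13, 1) is True. A brief spike over the line is fine; it's staying there that hurts.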

This new feature adds, say, 0.8 to the load average.  That makes the load average 1.13.  Which is a big problem, because the server will never catch up with a load average of 1.13; it has entered a slow slide into oblivion.
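To see why 1.13 is a slide rather than a cliff, here is a toy model in Python. It is not how the kernel computes load average; it just assumes demand and capacity stay constant and counts the work left waiting each minute. The function name backlog_after is made up for illustration.

```python
def backlog_after(minutes, demand, capacity=1.0):
    """Work left waiting after `minutes`, assuming constant demand and capacity.

    Toy model: each minute `demand` units of work arrive and the server
    clears at most `capacity` units. The backlog can never go below zero.
    """
    backlog = 0.0
    for _ in range(minutes):
        backlog = max(0.0, backlog + demand - capacity)
    return backlog


if __name__ == "__main__":
    for demand in (0.33, 0.33 + 0.8):
        print(f"demand {demand:.2f}: backlog after 10 minutes = "
              f"{backlog_after(10, demand):.2f}")
```

At 0.33 the backlog stays at zero. At 1.13 it grows by 0.13 every minute and never shrinks, and in real life the pile-up makes everything else slower too, which is where the slide speeds up.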

Anyway, a few minutes after I activate whatever feature, I notice the one-minute average is around 10, and the five-minute average is maybe 4 or so.  If I look at the case, the hard drive lights are on solid.  I try to log in to correct this, and find that the console's reacting very sluggishly.  Pulling up top takes a minute or two, during which time the one-minute average goes up to 15.

The top display is chaotic; sometimes five Apaches will appear above ten spamds, because the five Apaches are each taking up more CPU than any one of the spamds.  This isn't a big problem normally--I'm clever enough to realize which process is the real problem when I can see all of them.  But by now there are thirty Apaches, twenty MySQLs, and fifty spamds off the bottom of the screen.

So I quit (load average is 50 now) and try using ps.  After a minute or two, this reveals that the problem is spamd, so I run killall spamd.  But qmail immediately loads up more copies of spamd.

The load average is now around 100.

I try to kill off qmail, but by this point letters appear one at a time as I type, so I make three or four typos and have to slowly backspace back to fix them.  Just trying to shut off qmail takes five more minutes.  The load average is now 200.

And the server's fucked.  I have to reboot and let it rebuild the filesystems from their journals.  (Thank you, ext3.)  Then I have to get into single-user mode, check all the databases for damage, and fix the configuration to disable the feature that started the whole sequence.  Finally, I can boot up again.

This happened in real life twice today.  The 0.33 was the normal low-level stress I have to deal with, some of which was my own crap and some of which was others' crap.  The first time, the 0.8 was Mom screaming at me to fix something I had no control over; the second, a friend was screaming at me about a dispute between one of her friends and one of mine.  So I took reboots--first in the form of a nap, then in the form of a shower.

Christ, has this day sucked.

Yeah, you know you got to help me out
Yeah, oh don't you put me on the back burner
You know you got to help me out
You're gonna bring yourself down
Yeah, you're gonna bring yourself down
Yeah, you're gonna bring yourself down
