« Thankful.
» Lunch and Dinner Here

General

Failure: You Lose

11.29.07 | 1 Comment

image.jpg

They say problems happen in threes. Hopefully I had my third today. We have a client who requires dedicated hardware for their websites because of the nature of their business, so when the head node of their cluster started hard-locking (see the screenshot above), I was called in to move its duties to another node.

In the midst of this move, which was supposed to be at most five minutes of scheduled downtime, very odd things happened.

  • The head node crashed twice while I was pulling its data across.
  • The new head node decided that it did not want to boot once I had the data across.
  • The head node IP was inaccessible on the new MAC address (datacenter’s fault).

Needless to say, it took right at an hour to repair, but luckily the downtime was scheduled, so it was no biggie (at least from where I sit).

Bizarre server issues must be going around. Josh knows what I am talking about — losing two drives in a RAID 5… nuts.

We’ll be shipping off our head node to the hardware vendor tomorrow for an autopsy and revitalization (it is down to either bad CPUs or bad memory). Until then it sits in my foyer unplugged from the world in punishment.

So anyways, here is to backups, quick thinking and getting it up and keeping it up when it counts most. ;)
