This is the Message Centre for Icy North
Dilemma
Icy North Started conversation Mar 1, 2013
I had a difficult decision to make yesterday.
The client's main financial reporting system was down - it had been down for a couple of days. We'd only taken over supporting it in January and the previous support team weren't around any more to help.
The software vendor was particularly unhelpful, either getting us to try things which we all knew wouldn't work, or telling us we had to upgrade as they didn't support that version of software any more. Upgrading such a critical system just can't be done there and then - it would take many days, what with all the customised code and testing we'd have to do.
There was a disaster-recovery system we could switch to, but the last time we did that it was too slow to be of use, and we were in the long process of upgrading it all.
There was a final option - just switching the live servers off and on again. We tried it on the development servers - it failed. We tried it again - it failed with different errors. Then someone told us that the live servers hadn't been rebooted for over 2 years. To do it now would be incredibly risky.
Then someone reminded us that financial month end was coming up and we had to do something now.
What would you do?
1. Tell the client they can't have their system for a week while we upgrade.
2. Fire up the disaster recovery system in the hope that it doesn't grind to an immediate halt.
3. Reboot the live system which hasn't been shut down for 2 years.
What did I do?
Dilemma
Icy North Posted Mar 1, 2013
Option 2 would take many hours to do, as we have to migrate a massive database. Does that change your decision?
Dilemma
bobstafford Posted Mar 1, 2013
Sounds like a late night. If the database is unusable without a danger of data loss, what choice have you, unless you risk the data?
Dilemma
Bluebottle Posted Mar 1, 2013
Is this a trick question where the answer isn't one of the three options?
<BB<
Dilemma
Icy North Posted Mar 1, 2013
Drinks? This was a Hamlet cigar moment.
Feel free to suggest another option, BB. I couldn't think of one.
Dilemma
Geggs Posted Mar 1, 2013
I'm guessing Option 1 will take far too long for the customer's liking.
And Option 2 will be a bit too long also.
So, it's got to be the incredibly dangerous Option 3. It's quicker, and you've just got to hope that it'll work.
Geggs
Dilemma
Beatrice Posted Mar 1, 2013
If Option 3 doesn't work - what will the situation be? Are any of the other options then ruled out?
Dilemma
Gnomon - time to move on Posted Mar 1, 2013
What sort of a company doesn't reboot a server for 2 years? You should at least reboot them every month when you apply the operating system patches.
You haven't been applying the patches? Oh dear..
Dilemma
Icy North Posted Mar 1, 2013
If it were so easy...
Oh, I forgot to tell you the outcome. By divine providence it actually restarted in one piece. Not only that, but it runs jobs a lot faster than it has for the last 2 years.
There's a moral to this story, but I'm not sure what it is. I'd still feel very uncomfortable if I had to make that decision again.
Dilemma
Milla, h2g2 Operations Posted Mar 1, 2013
Regular reboots, regular backups, and restore tests. (Yes, tests!)
A mirrored server, that you can switch to as a failover.
Documented procedures for all these.
A disaster recovery plan in place. Documented, and tested.
A business continuity plan in place. Documented, and tested.
All signed and approved by the client.
To begin with, at least.
Dilemma
Dmitri Gheorgheni, Post Editor Posted Mar 1, 2013
Boy, you have nerves of steel, Icy. I'd never have guessed you'd take Option 3.
Glad the Force was with you.
Dilemma
Icy North Posted Mar 1, 2013
Thanks for the best practice, Milla.
Sadly, the real world only works like that when the client stumps up the cash.
Dilemma
Milla, h2g2 Operations Posted Mar 1, 2013
Isn't reality precious....
Yeah, I know it doesn't happen unless disaster has struck once, and then only maybe....
Dilemma
Baron Grim Posted Mar 1, 2013
I'm with Gnomon; I'm aghast that they haven't been performing system software patches.
But I wasn't at all surprised that you went with option 3.
Just last night I was rewatching episodes of The IT Crowd.
"Have you tried turning it off and back on again?" That's always step 1.