DerelictStudios Forums
Count von Phoib

Downtime Of The Past 2 Days


Edit: The backup generator failed, so we had another 20 16 (Hey, don't be mean - Sovjohn) hours of downtime. We're just back, so please hope with me that the server won't go offline again. If that happens, Sovjohn and I are bound to go Spartan on TP's asses...

 

----------------------

 

As some of you may have heard already, the datacenter where Derelict Studios is hosted experienced a major malfunction on June the 1st, 0:00 GMT.

There was a fire and subsequent explosion in the power room of our server facility, forcing us, and 9,000 other servers, offline.

 

Due to the extent of the damage, our server could not be powered up right away. Luckily, nobody was injured, and there was no direct server damage.

We apologize for the downtime, and hope our service will remain undisturbed from now on.

 

However, since we are currently operating on backup power, further downtime is possible in the coming week. We will continue to post updates on any such downtime in this thread.

 

At our service provider, a log can be accessed to see what exactly happened. I have posted the main parts of it in the next post, so click here for more details.

This evening at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room. Thankfully, no one was injured. In addition, no customer servers were damaged or lost.

 

We have just been allowed into the building to physically inspect the damage. Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our backup generator plan based on instructions from the fire department.

 

This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday. Rest assured we are working around the clock.

 

We are in the process of communicating with all affected customers. We are planning to post updates every hour via our forum and in our customer portal. Our interactive voice response system is updating customers as well.

 

There is no impact in any of our other five data centers.

 

I am sorry that this accident has occurred and I apologize for the impact.

 

As previously committed, I would like to provide an update on where we stand following yesterday's explosion in our H1 data center. First, I would like to extend my sincere thanks for your patience during the past 28 hours. We are acutely aware that uptime is critical to your business, and you have my personal commitment that The Planet team will continue to work around the clock to restore your service.

 

As you have read, we have begun receiving some of the equipment required to start repairs. While no customer servers have been damaged or lost, we have new information that damage to our H1 data center is worse than initially expected. Three walls of the electrical equipment room on the first floor blew several feet from their original position, and the underground cabling that powers the first floor of H1 was destroyed.

 

There is some good news, however. We have found a way to get power to Phase 2 (upstairs, second floor) of the data center and to restore network connectivity. We will be powering up the air conditioning system and other necessary equipment within the next few hours. Once these systems are tested, we will begin bringing the 6,000 servers online. It will take four to five hours to get them all running.

 

We have brought in additional support from Dallas to have more hands and eyes on site to help with any servers that may experience problems. The call center has also brought in double staff to handle the increase in tickets we're expecting. Hopefully by sunrise tomorrow Phase 2 will be well on its way to full production.

 

Let me next address Phase 1 (first floor) of the data center and the affected 3,000 servers. The news is not as good, and we were not as lucky. The damage there was far more extensive, and we have a bigger challenge that will require a two-step process. For the first step, we have designed a temporary method that we believe will bring power back to those servers sometime tomorrow evening, but the solution will be temporary. We will use a generator to supply power through next weekend when the necessary gear will be delivered to permanently restore normal utility power and our battery backup system. During the upcoming week, we will be working with those customers to resolve issues.

 

We know this may not be a satisfactory solution for you and your business but at this time, it is the best we can do.

 

We understand that you will be due service credits based on our Service Level Agreement. We will proactively begin providing those following the restoration of service, which is our number one priority, so please bear with us until this has been completed.

 

I recognize that this is not all good news. I can only assure you we will continue to utilize every means possible to fully restore service.

 

I plan to have an audio update tomorrow evening.

 

Until then,

 

Douglas J. Erwin

Chairman & Chief Executive Officer

 

Additional details can be found here.


I wouldn't have posted anything anyway, as usual, so you won't have missed out on that measure :P


That's the best reason for a server outage ever.

 

Glad nobody got hurt and that most of the intellectual property of server owners wasn't damaged.

Fire and explosion? Fuck yeah, that's an awesome server outage.

The only thing that could have made it cooler is if some rabbits were in there and caught fire; then they would be running out on fire, screaming like in Family Guy.


As Phoib (and whoever else has my MSN) can confirm, while being physically unable to do anything to shorten the downtime, I had my finger on F5 on the status update page ALL DAY today.

 

He can fill you in on the curses we agreed to throw at TP staff together.

 

Buenas noches from me.

 

-Sovjohn out


Ah well, accidents happen, at least we got back up promptly...

 

Raptor, that's horrible... children would be different, but not poor bunniez!

Fire and explosion? Fuck yeah, that's an awesome server outage.

It's also the geekiest explosion possible.
