  #1  
Old 01-02-2005, 06:51 AM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default supra 1/2/05 major outage

Here is the ongoing ticket we have with the datacenter....

Previous kernel won't boot.
--------------------------------------
(c12862host-01/02/05-04:14):This isn't telling me anything! What exactly is happening on the console? What "previous" kernel? Which kernel is not booting? What is going on with the box?
--------------------------------------
(c12862host-01/02/05-04:25):Please update this ticket with descriptive messages. I need to know what is happening on the console now.
------------------------------------------

(bwalker-01/02/05-04:36):
I looked into this. During the boot sequence it runs fsck.

The fsck errors out and says that a manual fsck is needed. When it attempts to load the password file to present a manual fsck login the system errors out stating that the system can not access the password database. This error then forces the system to automatically reboot.

I have attempted to boot into single user mode with both "single" and "init=1" with no success. I attempted to boot with the last stable redhat kernel I saw which I believe was 2.4.8-smp and was presented the same errors.

I am currently booting this machine with a knoppix CD to see if I can access anything that way. If you would like, I can run the fsck manually from knoppix to see if this error is correctable.

--------------------------------------
(c12862host-01/02/05-04:39):yeah, please manually fsck the partition it's complaining about..
--------------------------------------
(c12862host-01/02/05-04:52):Are you working on this? We are extremely tired and appreciate your help.
------------------------------------------

(bwalker-01/02/05-05:02):
After rebooting and attempting to boot the knoppix CD this system displayed the error message that it could not find a boot device. The raid array is now listing the drives as incomplete and not installing the raid bios. I am discussing with my team lead how to best approach this problem now.

I apologize for the delay.
--------------------------------------
(c12862host-01/02/05-05:05):yeah, please figure out how you're going to proceed next quickly. we need to plan on our end for any additional downtime there may be.
------------------------------------------

(bwalker-01/02/05-05:10):
Do you know if this system was set up with raid 0, 1 or 5? This raid card is not listing the raid anymore and I can't find what it was set up as.
--------------------------------------
(c12862host-01/02/05-05:12):raid1
--------------------------------------
(c12862host-01/02/05-05:13):Perhaps the raid controller was the problem from the beginning? Why don't you try replacing the raid controller?
--------------------------------------
(c12862host-01/02/05-05:14):are you able to boot from any one of the single disks?

We've also had multiple phone calls with the DC. We are currently having the raid controller swapped, as we believe it failed and caused many of the issues we're seeing, and it could be the cause of past problems as well. We will update with our next step when we find out how the replacement goes.
__________________
Gators love marshmallows.
  #2  
Old 01-02-2005, 07:05 AM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: supra 1/2/05 major outage

all the raid cards are under lock and no one has access....

I called my contact who just got off the phone with one of the employees ordering them to climb a 16 ft fence to get what we need. (no joke)
Hopefully he's a good climber and we'll have supra as good as new.
__________________
Gators love marshmallows.
  #3  
Old 01-02-2005, 07:37 AM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: supra 1/2/05 major outage

Raid controller is being swapped as I post this. Hopefully the data isn't corrupted and everyone will be back up in a few minutes.

If this doesn’t work we might be out of options and have to pull from weekly offsite backup. The backups look to be less than 24 hours old and contain 90-95% of sites. The rest would be a week old. Of course this is the last thing we want to do…

I’ll update everyone shortly.


And yes, a slowly failing raid controller could cause all kinds of problems.
__________________
Gators love marshmallows.
  #4  
Old 01-02-2005, 07:40 AM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: supra 1/2/05 major outage

(bwalker-01/02/05-07:31):
I apologize. This raid controller is not the same model as the one currently in the machine. We're going to look and find the right one now.

Not a fun night / day.
__________________
Gators love marshmallows.
  #5  
Old 01-02-2005, 07:50 AM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: supra 1/2/05 major outage

I was told it should be up in 35-40 minutes. I'll be extremely impressed if they can pull that off though.
__________________
Gators love marshmallows.
  #6  
Old 01-02-2005, 08:47 AM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: supra 1/2/05 major outage

I figured there would be complications...

(bwalker-01/02/05-08:01):
Just a quick update on this machine. I've located and replaced this controller twice with the same results. We're going to attempt to swap out one of the drives now.
------------------------------------------

(bwalker-01/02/05-08:24):
We've replaced one of the drives and the raid is currently rebuilding the data on the mirror. Once this is complete I will update this ticket.
__________________
Gators love marshmallows.
  #7  
Old 01-02-2005, 10:45 AM
airkat airkat is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 75
Default Re: supra 1/2/05 major outage

Hope it's back up soon as well.. It was only like 8 or 9 days ago it was down
  #8  
Old 01-02-2005, 12:55 PM
Trinity31 Trinity31 is offline
Baby Croc
 
Join Date: Mar 2004
Location: Canada
Posts: 78
Unhappy Re: supra 1/2/05 major outage

Wow this one is taking long!
I know everyone is working hard to resolve this, so thanks!
But do keep us updated!
__________________
Regards,
Andrew Ellsworth
Trinity31.Net
  #9  
Old 01-02-2005, 01:43 PM
airkat airkat is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 75
Default Re: supra 1/2/05 major outage

Whoa, your name is marcus too?

/offtopic

Come on supra! Figures... after I launch a new site yesterday...
  #10  
Old 01-02-2005, 03:02 PM
Denoha Denoha is offline
Hatchling Croc
 
Join Date: May 2004
Posts: 30
Default Re: supra 1/2/05 major outage

Any updates on Supra?
  #11  
Old 01-02-2005, 03:30 PM
airkat airkat is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 75
Default Re: supra 1/2/05 major outage

Well, looks like it's been down for approx. 9 hours since brent first posted here.

31 days in this month... that makes 744 hours in this month.

9 / 744 * 100 ≈ 1.2% downtime, if my calculations are correct. Which also means their 99.9% uptime guarantee is rubbish this month (98.8%)... and it's only the 2nd day.
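
For anyone who wants to redo this arithmetic with other numbers, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the guarantee is measured over the calendar month's total hours, which the thread does not confirm.

```python
def uptime_percent(downtime_hours: float, days_in_month: int) -> float:
    """Uptime as a percentage of all hours in the month."""
    total_hours = days_in_month * 24
    return 100.0 * (1.0 - downtime_hours / total_hours)


def allowed_downtime_hours(sla_percent: float, days_in_month: int) -> float:
    """Hours of downtime a given uptime guarantee leaves room for in one month."""
    total_hours = days_in_month * 24
    return total_hours * (1.0 - sla_percent / 100.0)


if __name__ == "__main__":
    # 9 hours down in a 31-day month (744 hours), as in the post above.
    print(round(uptime_percent(9, 31), 1))             # 98.8
    # Assumption: the 99.9% guarantee applies to the full calendar month.
    print(round(allowed_downtime_hours(99.9, 31), 3))  # 0.744
```

Under that assumption, a 99.9% guarantee leaves room for roughly 45 minutes of downtime in a 31-day month, so 9 hours is well past it.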
  #12  
Old 01-02-2005, 05:08 PM
ghumphrey ghumphrey is offline
Hatchling Croc
 
Join Date: Sep 2004
Posts: 26
Unhappy Re: supra 1/2/05 major outage

Of course, I have my customers breathing down my neck...

I hope it gets back online soon!

I appreciate all of the work that you guys are doing (and thanks to GatorSupport3 for the assistance as well!)

I would like to request a full status report when this issue is resolved (such as specific time the server went down and when it came back online, as well as the problem and resolution.) as this will help me deal with my customers.

Thanks.
  #13  
Old 01-02-2005, 05:15 PM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: supra 1/2/05 major outage

There was a drive/raid controller failure. The raid array is rebuilding and we are actively working on it. It should be completed soon but there is no ETA as of yet.

You know how long it takes when you reformat your Windows PC? It's like that, except it's rebuilding a drive with all of the data that was on it. I believe there's around 45 gigs of disk usage on the drives, so it is taking a while.


There's no gauge on it, so it could be done in minutes or hours.
There's nothing more that can be done other than waiting for the drives to do their job.
__________________
Gators love marshmallows.
  #14  
Old 01-02-2005, 05:53 PM
ghumphrey ghumphrey is offline
Hatchling Croc
 
Join Date: Sep 2004
Posts: 26
Default Re: supra 1/2/05 major outage

Brent - Thanks for the update!
  #15  
Old 01-02-2005, 06:59 PM
Trinity31 Trinity31 is offline
Baby Croc
 
Join Date: Mar 2004
Location: Canada
Posts: 78
Unhappy Re: supra 1/2/05 major outage

A few more hours and it looks like this will have hit a total of 24 hours offline!
Surely the DC guys can do better than this!
__________________
Regards,
Andrew Ellsworth
Trinity31.Net
  #16  
Old 01-02-2005, 07:22 PM
airkat airkat is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 75
Default Re: supra 1/2/05 major outage

I'm not TOO technical when it comes to the servers, but isn't there a way to put up a "technical difficulties, be back later" type of page?
  #17  
Old 01-02-2005, 07:28 PM
orbitant orbitant is offline
Hatchling Croc
 
Join Date: Apr 2004
Location: Singapore
Posts: 24
Default Re: supra 1/2/05 major outage

What a good way to start a new year, with customers chasing after me

Please get it up soon.
  #18  
Old 01-02-2005, 09:06 PM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: supra 1/2/05 major outage

I just got off the phone; it is still syncing. The admin I spoke with said he has seen these take up to 16 hours to sync. We are at about 12 hours now, so hopefully it'll be done any minute.

He believes there is a 60% chance it can be fixed from this. If this fails, we will have to go to our offsite backups and begin transferring the data over which would take another day.

Whenever you are moving large amounts of data it is pretty much on autopilot, and there is no way to speed it up. It's going to transfer the data as fast as it can and there is no magic formula to speed it up.

The servers have two hard drives that are mirrored with raid1. It looks like the raid controller failed, and as it was failing it corrupted data, leaving the two drives no longer mirrored. So we have the second drive rebuilding to match the first drive, which may make both drives accessible again.
__________________
Gators love marshmallows.
  #19  
Old 01-02-2005, 09:21 PM
Nutter Nutter is offline
Baby Croc
 
Join Date: May 2004
Location: Houston, Texas
Posts: 94
Default Re: supra 1/2/05 major outage

Any word on how old these off-site backups are? I have my own set, but obviously I would rather not have to use them if y'all are restoring a new enough set.

I want to add a thank you for getting us through this with as much professionalism as you have.

- Ryan
  #20  
Old 01-02-2005, 10:03 PM
Kevin Moreland Kevin Moreland is offline
Hatchling Croc
 
Join Date: Jun 2004
Posts: 15
Exclamation Re: supra 1/2/05 major outage

Sheese...tomorrow is quickly approaching here...the *biggest* jam day for office users surfing after back from Christmas & New Year's vacation. I sincerely hope everything is back to normal in a couple of hours. Keep us informed, dig?
  #21  
Old 01-02-2005, 10:16 PM
onsite onsite is offline
Hatchling Croc
 
Join Date: Apr 2004
Posts: 41
Thumbs down Re: supra 1/2/05 major outage

This is my greatest concern as well. I hope that my sites are back up soon or there is going to be hell to pay. My customers are hopping mad at me. I don't blame them 1 bit.


Quote:
Originally Posted by Kevin Moreland
Sheese...tomorrow is quickly approaching here...the *biggest* jam day for office users surfing after back from Christmas & New Year's vacation. I sincerely hope everything is back to normal in a couple of hours. Keep us informed, dig?
  #22  
Old 01-02-2005, 10:26 PM
airkat airkat is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 75
Default Re: supra 1/2/05 major outage

Agreed. Sad to say, but I've already begun getting quotes from other hosts. These problems have been rampant lately and it's getting ridiculous. Although this last one can't be helped, it doesn't change the fact..
  #23  
Old 01-02-2005, 11:04 PM
ghumphrey ghumphrey is offline
Hatchling Croc
 
Join Date: Sep 2004
Posts: 26
Default Re: supra 1/2/05 major outage

Please forgive my ignorance, but wouldn't it be a good idea to try to bring up another duplicate server with the backup while waiting for this to rebuild as to try to reduce the downtime?

I have already had to refund numerous customers...buh.
  #24  
Old 01-03-2005, 12:37 AM
airkat airkat is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 75
Default Re: supra 1/2/05 major outage

I already suggested that. Apparently not...

*edit* he said around 16 hours.. so we're on hour 15 now of the raid rebuild... which HOPEFULLY is successful.

Last edited by airkat; 01-03-2005 at 12:41 AM.
  #25  
Old 01-03-2005, 01:01 AM
Denoha Denoha is offline
Hatchling Croc
 
Join Date: May 2004
Posts: 30
Default Re: supra 1/2/05 major outage

I hope this is resolved soon. I'm not looking forward to Monday morning phone calls of "Where did my site and e-mail go.... again?"
All times are GMT -6.