|
#1
|
||||
|
||||
|
Here is the ongoing ticket we have with the datacenter....
Previous kernel won't boot.
--------------------------------------
(c12862host-01/02/05-04:14): This isn't telling me anything! What exactly is happening on the console? What "previous" kernel? Which kernel is not booting? What is going on with the box?
--------------------------------------
(c12862host-01/02/05-04:25): Please update this ticket with descriptive messages. I need to know what is happening on the console now.
------------------------------------------
(bwalker-01/02/05-04:36): I looked into this. During the boot sequence it runs fsck. The fsck errors out and says that a manual fsck is needed. When it attempts to load the password file to present a manual fsck login, the system errors out stating that it cannot access the password database. This error then forces the system to automatically reboot. I have attempted to boot into single user mode with both "single" and "init=1" with no success. I attempted to boot with the last stable redhat kernel I saw, which I believe was 2.4.8-smp, and was presented the same errors. I am currently booting this machine with a knoppix CD to see if I can access anything that way. If you would like, I can run the fsck manually from knoppix to see if this error is correctable.
--------------------------------------
(c12862host-01/02/05-04:39): yeah, please manually fsck the partition it's complaining about..
--------------------------------------
(c12862host-01/02/05-04:52): Are you working on this? We are extremely tired and appreciate your help.
------------------------------------------
(bwalker-01/02/05-05:02): After rebooting and attempting to boot the knoppix CD, this system displayed the error message that it could not find a boot device. The raid array is now listing the drives as incomplete and not installing the raid bios. I am discussing with my team lead how to best approach this problem now. I apologize for the delay.
--------------------------------------
(c12862host-01/02/05-05:05): yeah, please figure out how you're going to proceed next quickly. We need to plan on our end for any additional downtime there may be.
------------------------------------------
(bwalker-01/02/05-05:10): Do you know if this system was set up with raid 0, 1 or 5? This raid card is not listing the raid anymore and I can't find what it was set up as.
--------------------------------------
(c12862host-01/02/05-05:12): raid1
--------------------------------------
(c12862host-01/02/05-05:13): Perhaps the raid controller was the problem from the beginning? Why don't you try replacing the raid controller?
--------------------------------------
(c12862host-01/02/05-05:14): Are you able to boot from any one of the single disks?

We've also had multiple phone calls with the DC. We are currently having the raid controller swapped, as we believe it failed and caused many of the issues we're seeing, and it could be the cause of past problems as well. We will update with our next step when we find out how the replacement goes.
__________________
Gators love marshmallows. |
|
#2
|
||||
|
||||
|
all the raid cards are under lock and no one has access....
I called my contact who just got off the phone with one of the employees ordering them to climb a 16 ft fence to get what we need. (no joke) Hopefully he's a good climber and we'll have supra as good as new.
__________________
Gators love marshmallows. |
|
#3
|
||||
|
||||
|
Raid controller is being swapped as I post this. Hopefully the data isn't corrupted and everyone will be back up in a few minutes.
If this doesn’t work we might be out of options and have to pull from the weekly offsite backup. The backups look to be less than 24 hours old and contain 90-95% of sites. The rest would be a week old. Of course this is the last thing we want to do… I’ll update everyone shortly. And yes, a raid controller slowly failing could cause all kinds of problems.
__________________
Gators love marshmallows. |
|
#4
|
||||
|
||||
|
(bwalker-01/02/05-07:31):
I apologize. This raid controller is not the same model as the one currently in the machine. We're going to look and find the right one now.
Not a fun night / day.
__________________
Gators love marshmallows. |
|
#5
|
||||
|
||||
|
I was told it should be up in 35-40 minutes. I'll be extremely impressed if they can pull that off though.
__________________
Gators love marshmallows. |
|
#6
|
||||
|
||||
|
I figured there would be complications...
(bwalker-01/02/05-08:01): Just a quick update on this machine. I've located and replaced this controller twice with the same results. We're going to attempt to swap out one of the drives now.
------------------------------------------
(bwalker-01/02/05-08:24): We've replaced one of the drives and the raid is currently rebuilding the data on the mirror. Once this is complete I will update this ticket.
__________________
Gators love marshmallows. |
|
#7
|
|||
|
|||
|
Hope it's back up soon as well. It was only like 8 or 9 days ago that it was down.
|
|
#8
|
|||
|
|||
|
Wow this one is taking long!
I know everyone is working hard to resolve this, so thanks! But do keep us updated!
__________________
Regards, Andrew Ellsworth Trinity31.Net |
|
#9
|
|||
|
|||
|
Whoa, your name is Marcus too?
/offtopic Come on supra! Figures... after I launch a new site yesterday...
|
#10
|
|||
|
|||
|
Any updates on Supra?
|
|
#11
|
|||
|
|||
|
Well, looks like it's been down for approx. 9 hours since Brent first posted here.
There are 31 days in this month, which makes 744 hours. 9 / 744 * 100 ≈ 1.2% downtime, if my calculations are correct. That also means their 99.9% uptime guarantee is rubbish this month (98.8%)... and it's only the 2nd day.
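For anyone who wants to redo that arithmetic for their own numbers, here is a minimal Python sketch. The function names are made up for illustration, and it assumes the guarantee is measured per calendar month, which is a guess and not something confirmed in this thread.

```python
def downtime_percent(hours_down: float, days_in_month: int) -> float:
    """Share of the month spent offline, as a percentage."""
    hours_in_month = days_in_month * 24
    return hours_down / hours_in_month * 100


def allowed_downtime_hours(uptime_guarantee_percent: float, days_in_month: int) -> float:
    """Hours of downtime a given uptime guarantee leaves room for in one month."""
    hours_in_month = days_in_month * 24
    return (100 - uptime_guarantee_percent) / 100 * hours_in_month


# Figures from the post above: about 9 hours down in a 31-day month.
pct = downtime_percent(9, 31)
budget = allowed_downtime_hours(99.9, 31)

print(f"downtime so far: {pct:.1f}%, uptime so far: {100 - pct:.1f}%")
print(f"a 99.9% guarantee allows about {budget * 60:.0f} minutes per month")
```

With a 744-hour month, a 99.9% guarantee leaves well under an hour of slack, so 9 hours of downtime is far past it.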
|
#12
|
|||
|
|||
|
Of course, I have my customers breathing down my neck...
I hope it gets back online soon! I appreciate all of the work that you guys are doing (and thanks to GatorSupport3 for the assistance as well!) I would like to request a full status report when this issue is resolved (such as specific time the server went down and when it came back online, as well as the problem and resolution.) as this will help me deal with my customers. Thanks. |
|
#13
|
||||
|
||||
|
There was a drive/raid controller failure. The raid array is rebuilding and we are actively working on it. It should be completed soon but there is no ETA as of yet.
You know how long it takes when you reformat your Windows PC? It's like that, except it's rebuilding a drive with all of the data that was on it. I believe there's around 45 gigs of disk usage on the drives, so it is taking a while. There's no gauge on it, so it could be done in minutes or hours. There's nothing more that can be done other than waiting for the drives to do their job.
__________________
Gators love marshmallows. |
|
#14
|
|||
|
|||
|
Brent - Thanks for the update!
|
|
#15
|
|||
|
|||
|
A few more hours and it looks like this will have hit a total of 24 hours offline!
Surely the DC guys can do better than this!
__________________
Regards, Andrew Ellsworth Trinity31.Net |
|
#16
|
|||
|
|||
|
I'm not TOO technical when it comes to the servers, but isn't there a way to put up a "technical difficulties, be back later" type thing?
|
|
#17
|
|||
|
|||
|
What a good way to start a new year, with customers chasing after me
Please get it up soon. |
|
#18
|
||||
|
||||
|
I just got off the phone; it is still syncing. The admin I spoke with said he has seen these take up to 16 hours to sync. We are at about 12 hours now, so hopefully it'll be done any minute.
He believes there is a 60% chance it can be fixed from this. If this fails, we will have to go to our offsite backups and begin transferring the data over, which would take another day. Whenever you are moving large amounts of data it is pretty much on autopilot: it transfers the data as fast as it can and there is no magic formula to speed it up. The servers have two hard drives that are mirrored in raid1. What looks like happened is that the raid controller failed, and as it was failing it caused data corruption that left the two drives no longer mirrored. So... we have the second drive rebuilding to match the first drive, which can possibly make both drives accessible again.
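To picture what "rebuilding to match the first drive" means, here is a toy Python sketch of a mirror resync. It is only an illustration under simplifying assumptions: the drives are modeled as in-memory byte arrays, `resync_mirror` is a made-up name, and a real controller works on the whole physical disk and may simply copy every block instead of comparing first.

```python
def resync_mirror(source: bytearray, target: bytearray, block_size: int = 4) -> int:
    """Make target identical to source, block by block.

    Returns the number of blocks that had to be rewritten.
    """
    if len(source) != len(target):
        raise ValueError("mirror members must be the same size")

    rewritten = 0
    for start in range(0, len(source), block_size):
        end = start + block_size
        # Only rewrite blocks where the two copies disagree.
        if source[start:end] != target[start:end]:
            target[start:end] = source[start:end]
            rewritten += 1
    return rewritten


# Hypothetical contents: the first drive is treated as the good copy.
first_drive = bytearray(b"AAAABBBBCCCCDDDD")
second_drive = bytearray(b"AAAAxxxxCCCCyyyy")

changed = resync_mirror(first_drive, second_drive)
print(f"{changed} blocks rewritten, mirror in sync: {first_drive == second_drive}")
```

Note that a resync trusts the source drive completely. If the first drive already holds corrupted data, the second drive ends up with a faithful copy of that corruption, which is one reason a rebuild is not guaranteed to fix things.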
__________________
Gators love marshmallows. |
|
#19
|
|||
|
|||
|
Any word on how old these off-site backups are? I have my own set, but obviously I would rather not have to use them if y'all are restoring a new enough set.
I want to add a thank you for getting us through this with as much professionalism as you have. - Ryan |
|
#20
|
|||
|
|||
|
Sheese...tomorrow is quickly approaching here...the *biggest* jam day for office users surfing after back from Christmas & New Year's vacation. I sincerely hope everything is back to normal in a couple of hours. Keep us informed, dig?
|
|
#21
|
|||
|
|||
|
This is my greatest concern as well. I hope that my sites are back up soon or there is going to be hell to pay. My customers are hopping mad at me. I don't blame them 1 bit.
|
|
#22
|
|||
|
|||
|
Agreed. Sad to say, but I've already begun getting quotes from other hosts. These problems have been rampant lately and it's getting ridiculous. Although this last one can't be helped, it doesn't change the fact..
|
|
#23
|
|||
|
|||
|
Please forgive my ignorance, but wouldn't it be a good idea to bring up a duplicate server from the backup while waiting for this to rebuild, to try to reduce the downtime?
I have already had to refund numerous customers...buh. |
|
#24
|
|||
|
|||
|
I already suggested that. Apparently not...
*edit* he said around 16 hours.. so we're on hour 15 now of the raid rebuild... which HOPEFULLY is successful.
Last edited by airkat; 01-03-2005 at 12:41 AM. |
|
#25
|
|||
|
|||
|
I hope this is resolved soon. I'm not looking forward to Monday morning phone calls of "Where did my site and e-mail go.... again?"
|