  #76  
Old 05-02-2005, 12:14 AM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: mustang what really happened

It must be an offsite solution: 50 gigs of data daily.
__________________
Gators love marshmallows.
  #77  
Old 05-02-2005, 12:54 AM
VPC VPC is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 58
Default Re: mustang what really happened

The 50G is nothing, but offsite? Ouch! Does the datacenter not have space to place a system alongside all of the colo servers you have with them? Is this setup the downfall?
  #78  
Old 05-02-2005, 07:17 AM
amartins amartins is offline
Royal Croc
 
Join Date: Dec 2003
Location: Europe
Posts: 415
Default Re: mustang what really happened

Hi,

Quote:
Originally Posted by GatorBrent
We were able to figure out exactly what happened!
The server was up and running fine up until four days ago when one of the drives in the raid failed. We had it scheduled to be replaced over the weekend when traffic was at its lowest.
Ouch... Murphy's law... having the machine on 1 drive was (in hindsight) a bad choice. Would taking it down to swap the drive ASAP have caused much downtime?

Anyway, I still think having RAID1 is an acceptable solution for our price range, but 2 drives giving up the ghost at more or less the same time is just too strange; I would have pointed the finger at the controller. Maybe something happened physically to the server to cause the problem.

Antonio.
  #79  
Old 05-02-2005, 09:19 AM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: mustang what really happened

We have found that the only solution that works is scanning for files that have changed and then backing up only those files. Anything that involves tarring KILLS the boxes. If a single account with even a medium-size site logs into its control panel and runs a tar, the server load will go to 20 or so, causing everything to slow down and usually some services to fail, such as mysql, pop, and imap. Scanning at a lower priority does work without affecting server speed too much. We run these scans only on the weekend so users do not see the effect on a grand scale. The scans take a while to complete; I believe it is 12-24 hrs depending on the server.
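
For readers who want to picture the "scan for changed files, then back up only those" approach, here is a minimal sketch in Python. It is not HostGator's actual tooling: the function names (`find_changed_files`, `backup_changed`, `lower_priority`) and the use of file modification times are assumptions made for illustration, and the sketch assumes the destination directory is not inside the directory being scanned.

```python
import os
import shutil


def find_changed_files(root, since):
    """Yield paths under root whose mtime is newer than `since` (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > since:
                    yield path
            except OSError:
                # File vanished or is unreadable; skip it rather than abort the scan.
                continue


def backup_changed(root, dest, since):
    """Copy files changed after `since` from root into dest, keeping relative paths.

    Hypothetical helper: assumes dest is NOT located inside root.
    """
    copied = []
    for path in find_changed_files(root, since):
        rel = os.path.relpath(path, root)
        target = os.path.join(dest, rel)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(rel)
    return copied


def lower_priority(increment=19):
    """Ask the OS to schedule this process at a lower priority (Unix only)."""
    if hasattr(os, "nice"):
        os.nice(increment)
```

A scan like this would typically call `lower_priority()` once at startup and then `backup_changed(root, dest, since)` with the timestamp of the previous run. Unlike a tar of the whole account, only files that changed get read and copied, which is presumably why it is lighter on the server.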

We hate to see any type of data loss, but in a shared environment only so much can be done without affecting user performance significantly.

RAID1 is the front line for what should be instant backup; disksync is the next line of defense in case the raid goes with it. DS is weekly, so when the files go bad determines how old your backups are: 1-7 days old.

The chances of both drives going bad within a few days' time are extremely small, but everything comes down to numbers...

When you have hundreds of servers, rare things happen.
It is the same with humanity...
When you have billions of people, some will die from a paper cut.
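
The "comes down to numbers" point can be put as a small worked calculation. This is only a sketch: the function name and any numbers you feed it are hypothetical, not figures from HostGator, and it assumes the two drives in a mirror fail independently. If a shared cause such as a bad chassis or controller is killing drives, the real odds are higher than this model gives.

```python
def p_any_double_failure(servers, p_drive):
    """Probability that at least one server in the fleet loses both mirror drives.

    p_drive: assumed chance that a single drive fails within the time window
             of interest (a made-up input, not a measured value).
    Assumes drive failures are independent, which a common cause would break.
    """
    p_pair = p_drive ** 2  # both drives of one RAID1 pair fail in the window
    return 1.0 - (1.0 - p_pair) ** servers


# Example with made-up inputs: a fleet of 300 servers, 1% per-drive failure chance.
fleet_risk = p_any_double_failure(servers=300, p_drive=0.01)
single_risk = p_any_double_failure(servers=1, p_drive=0.01)
print(f"one server: {single_risk:.6f}, whole fleet: {fleet_risk:.6f}")
```

The per-server chance stays tiny, but the fleet-wide chance grows with every server added, which is the sense in which rare things happen at scale.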

Companies with billions of dollars suffer data loss all the time. Even with the best solutions in place nothing is ever foolproof.

We are not happy with what happened, but hardware will go bad. Considering what took place, the backups could have been older...

Here are a few things to consider...

Say the failing drive was replaced before the second drive failure. It is very possible both drives could have been knocked out by the data loss on this drive, making both completely unusable. Instead we ended up with a drive that was taken off the raid days earlier and that we were able to salvage data from before it completely died.

There was a lot of downtime, yes...
but if we had to resort to the disksync, the restore would have taken much longer. I would say at least 24 hrs, possibly even longer due to this being a reseller server (it is an automatic process we cannot speed up).

This could have been a blessing in disguise for all we know...
RAID1 is a mirrored drive, meaning it is very possible for a failing drive to cause data loss on both at the same time.
__________________
Gators love marshmallows.
  #80  
Old 05-02-2005, 10:41 AM
gnugioh gnugioh is offline
Hatchling Croc
 
Join Date: May 2005
Posts: 1
Exclamation Re: mustang what really happened

Quote:
Originally Posted by GatorBrent
We were able to figure out exactly what happened!

The server was up and running fine up until four days ago when one of the drives in the raid failed. We had it scheduled to be replaced over the weekend when traffic was at its lowest. By coincidence the working drive also completely failed the same day it was scheduled to have the other drive replaced.

When they took it offline they replaced the one drive, and that's when we were waiting the entire day for the fsck. That drive was completely bad and could not be salvaged. The technician who was working on it believed they had replaced the bad drive with the same drive. But in fact they did not screw up! Both drives were bad!! One drive was completely unusable; the other was in the process of failing. That is the one that went down today and was replaced. It had enough life left in it to continue on and give us a full backup on another drive. However, it was the one erroring today, which we replaced.


Nobody screwed up, just no one caught on either that both drives had failed.

This is extremely confusing so I'm not sure if anyone here will understand what I wrote but here's a quick break down....

two drives were operational 0, and 1
drive 0 failed and the server was running on drive 1
drive 1 failed leaving the server with 0,1 failed.

They replaced one bad drive during the first outage, and then the other during the next, when we shut down the server.

We have only had a couple of drive failures out of the hundreds of servers we have to date. I have no idea what the odds are of 2 drives on the same server failing within days of one another!!! Crazy!

I have closed the other two threads; if you have any questions please post in this one.
I work a lot with hardware and I am really curious about what kind of raid controller we are talking about. Every controller that is worth using can handle an "online swap". Is this a "software aided" controller? I might point out that if this controller is single channel, losing one drive can knock the other drive offline (because it may share that channel). This happens when one drive is going bad: it saturates the channel with 'noise' and the other drive can no longer communicate. The controller then assumes that the other drives on that channel are not available. It will essentially knock the "raid group" or "container" offline. Sometimes it can corrupt the data on the 'good' disk. Dual channel or quad channel controllers are available and reasonably priced (in my opinion). They do provide added protection against this type of issue, which is known as "fault propagation". Having two drives go bad at one time is not a 'coincidence', as it seems to be taken. It is an engineering flaw, and a system can be designed to avoid it. On a system that hosts this many websites I would think it a must.
  #81  
Old 05-02-2005, 02:24 PM
123456qwerty 123456qwerty is offline
Hatchling Croc
 
Join Date: Apr 2005
Posts: 5
Default Requested Solutions

Here are some things I would like to see:

1. Daily Back-ups and a way for us to back-up an entire reseller account and associated files on the server to a home computer or separate space (in one click of the mouse).

2. Please give plenty of notice to hosting resellers prior to any maintenance on the server.

3. Please inform us via alternate email addresses (i.e. hotmail, shaw, aol) that there are server problems, etc.

*If we don't get daily back-ups then we cannot plausibly look our clients in the eyes and say we offer secure, reliable servers, which is the main selling point for many of us reselling. I have customers that moved services to me because their other web host had too many problems. Now I look really bad because of that DATA LOSS. This has cost me a lot of money. Much more for others, I suspect.
  #82  
Old 05-03-2005, 12:17 PM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default mustang getting brand new chassis

We aren't taking any chances; we are doing a complete chassis swap on everything in mustang now as a precaution.
__________________
Gators love marshmallows.
  #83  
Old 05-03-2005, 12:19 PM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: mustang getting brand new chassis

Should be 30 minutes of downtime or so, but something is killing drives so we aren't waiting around for it to strike again.
__________________
Gators love marshmallows.
  #84  
Old 05-03-2005, 12:43 PM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: mustang getting brand new chassis

I was told 20-60 minutes. Swapping chassis isn't a big deal, so I don't see any complications from it. The risk of not doing so is too great.

This should be taking place in 20 minutes or so.
__________________
Gators love marshmallows.
  #85  
Old 05-03-2005, 01:23 PM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: mustang getting brand new chassis

The server was shut down; chassis swap going on now.
__________________
Gators love marshmallows.
  #86  
Old 05-03-2005, 01:27 PM
bonzihostin bonzihostin is offline
Hatchling Croc
 
Join Date: Oct 2004
Location: mi
Posts: 34
Default Re: mustang getting brand new chassis

Thanks for the post! I was wondering why sites I had on the server were not loading.
  #87  
Old 05-03-2005, 01:32 PM
garymchu garymchu is offline
Hatchling Croc
 
Join Date: Apr 2005
Posts: 9
Default Re: mustang getting brand new chassis

INCREDIBLE!

Already lost my biggest client; we'll see how many go now. Last straw, guys; can't run a professional business without professional backup.
  #88  
Old 05-03-2005, 01:40 PM
VPC VPC is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 58
Default Re: mustang getting brand new chassis

There has got to be a way to send out notifications about this. Why does this question go unanswered? How hard is it to send email notifications to the reseller email used at sign-up? Whether people get it or not, at least some would. I know I would since my main HG domain is not even hosted here anymore.
  #89  
Old 05-03-2005, 01:55 PM
GatorBrent GatorBrent is offline
HostGator Staff
 
Join Date: Oct 2002
Location: houston, texas
Posts: 2,977
Default Re: mustang getting brand new chassis

It has been completed and was a success. Everything is brand-new in this server.
__________________
Gators love marshmallows.
  #90  
Old 05-03-2005, 01:58 PM
garymchu garymchu is offline
Hatchling Croc
 
Join Date: Apr 2005
Posts: 9
Default Re: mustang getting brand new chassis

Then why are my clients' sites not live????

http://www.stonesatthemoment.com/
  #91  
Old 05-03-2005, 02:20 PM
GatorJay GatorJay is offline
HostGator Staff
 
Join Date: Apr 2005
Location: Texas
Posts: 705
Default Re: mustang getting brand new chassis

Quote:
Originally Posted by garymchu
Then why are my clients' sites not live????

http://www.stonesatthemoment.com/
Site is up now.
__________________
Affiliate Director
Check out the HostGator Blog!
Follow us on Twitter!
  #92  
Old 05-03-2005, 06:39 PM
mystry mystry is offline
Hatchling Croc
 
Join Date: Apr 2005
Posts: 1
Default Re: mustang getting brand new chassis

This is great... you guys said everything is fine now... but now look at the server. It's down every half hour and is running very slow. At least when you say it's fixed, please be sure you're right about it.
  #93  
Old 05-03-2005, 06:41 PM
VPC VPC is offline
Baby Croc
 
Join Date: Sep 2004
Posts: 58
Default Re: mustang getting brand new chassis

yup, down again. Apache failed. HG racking up the points.
  #94  
Old 05-03-2005, 06:41 PM
macfive macfive is offline
Hatchling Croc
 
Join Date: Oct 2004
Posts: 4
Default Re: mustang what really happened

While I can't speak to the hardware issues because I have limited experience in these sorts of things, I will say that for 1 year I have had gatorhost pretty much trouble free.

So if you guys can solve this problem I am going to stick with you if you can give me another year of trouble free service.
  #95  
Old 05-03-2005, 06:43 PM
macfive macfive is offline
Hatchling Croc
 
Join Date: Oct 2004
Posts: 4
Default Re: mustang what really happened

Wow, you're back up! Ok then - thanks for the quick response!



Quote:
Originally Posted by macfive
While I can't speak to the hardware issues because I have limited experience in these sorts of things, I will say that for 1 year I have had gatorhost pretty much trouble free.

So if you guys can solve this problem I am going to stick with you if you can give me another year of trouble free service.
  #96  
Old 05-03-2005, 07:07 PM
Pacepub Pacepub is offline
Hatchling Croc
 
Join Date: Apr 2005
Posts: 1
Default Re: mustang getting brand new chassis

I have kept my frustration to myself through this nightmare but now I can't help myself... I really thought it was over now.

I was under the impression that finally "everything is brand new" - but still we see our sites going up and down. But:

Apache failed - mysql failed....

Will this yo-yo hosting continue?
Why is this still happening when everything supposedly is fixed?
When will this server be stable again?

It is a pain for everyone involved - hostgator, resellers and end customers - but please at least let us know if there STILL are continuing problems to expect, so we can decide whether to move the most important sites to other servers.

Len
  #97  
Old 05-03-2005, 08:52 PM
tranquil tranquil is offline
Hatchling Croc
 
Join Date: May 2005
Posts: 2
Default Re: mustang getting brand new chassis

I also have been patient with this problem but I have to say that, although I have been satisfied with HG up until now, I am not impressed with how this issue has been handled. If the outage had happened during business hours rather than the weekend I'd be out of biz. I quit my previous host as a reseller for this same problem...continuing outages. Brent...this <must> get fixed and stay fixed. We need:

1) Better notification of when the server is being taken down for maintenance/repair

2) Faster (and accurate) identification of problems

3) Faster reestablishment of service after said problems are identified

4) A detailed outline of your backup procedures so that we know what to expect when an outage occurs

I am still having slowness issues in accessing the server...mail, mysql, etc. are all slow. Please fix or I will have to find another host...my customers leaving will leave me no choice.
  #98  
Old 05-04-2005, 07:27 PM
eNonProfits eNonProfits is offline
Hatchling Croc
 
Join Date: Apr 2005
Posts: 1
Default Re: mustang getting brand new chassis

Please respond to the question of refunds/credit.

Our sites have been up and down since this whole thing began a week ago. Email is sporadic at best. Is there a realistic timeframe for a resolution and a response to the $ issue?

VERY FRUSTRATED RESELLER
  #99  
Old 05-05-2005, 04:38 AM
amartins amartins is offline
Royal Croc
 
Join Date: Dec 2003
Location: Europe
Posts: 415
Default Re: mustang getting brand new chassis

Hello!

Quote:
Originally Posted by eNonProfits
Please respond to the question of refunds/credit.
Our sites have been up and down since this whole thing began a week ago. Email is sporadic at best. Is there a realistic timeframe for a resolution and a response to the $ issue? VERY FRUSTRATED RESELLER
I had my 1st refund experience last month (in almost a year and a half at HG). You email sales@hostgator.com, tell them your domain and the server you are on, and request a refund; then your next monthly payment will be credited. It was pretty fast, efficient and painless.

HTH

Antonio.
All times are GMT -6.