#1
Let it be known that HG does not have ANY sort of disaster recovery plan for their shared hosting servers, even at the Business Plan level. What this means is that it is entirely possible for them to lose every client's data if they have a complete RAID array failure. Or, if they "eke one out" as they just did with their gator1143, you could end up with incompetent fools running hardware beyond its service life, never checking (or, as they should be doing, constantly polling) the S.M.A.R.T. data of the drives in the array(s), which would have alerted them well in advance that failures were imminent, and then a 50+ hour 100% outage of services. For the vast majority of that time (if not more) the entire server was useless; your domains/sites and other services will be down and your settings unrestored or wrong.

For the only backups they apparently keep are those you have of your own data on the very same RAID array and physical machine that hosts your services - you know, the ones you'll find in your jailed root account on their server. Anyone in this industry knows backing up data on the same volume is utter lunacy and a waste of time. cPanel's systems allow them to do this easily, but it is only good for issues arising from a client error, an occasional hiccup in updating a server's installed components (like a bad PHP module update that breaks a common CMS, or some other conflict), an exploit, be it from an internal, external, client or host based source, or any number of things that could take down a client's account; something usually confined to a client's domain(s) and web content (and perhaps DBs). Besides, that is all that is backed up per account separately...
Now "Gators" (IT staff at HOSTGATOR), don't misunderstand; I make this statement without knowing "really beyond a shadow of a doubt" what your procedures are, and I will post links to your own thread(s) below regarding this topic/fiasco and attach .pdf copies of your original posts so they are not modified to save face or distort reality. If you plan on refuting this and have any sort of proof to back up the notion that my assertions are incorrect, I warn you to tread with caution. For if you DO have ANY backup snapshots of your shared hosts (of the entire server, as any Enterprise should...) and are not simply relying on the fault tolerance of the RAID arrays on each physical machine, you could only say your response to this crisis was even more incompetent for not simply restoring the most recent backup of the entire shared hosting server (in this case gator1143) with the failed hardware (in this instance multiple HDDs in an array...), if for nothing more than as an interim measure to alleviate the long-term 100% downtime experienced by clients hosted on that server. I say only for the "interim" in case your backup snapshots are irresponsibly out of date. If they were done properly, you'd have daily snapshots, the data loss would be very, very minimal, and you could pose it to the clients: would they like to have a few hours of downtime and a restore from a backup snapshot taken a maximum of 12 hours ago, or would they rather deal with the potential of days of 100% downtime of all their contracted services? I can guess what the response would be from your clients...

The other alternative, which was a lapse in judgement, is that before the array became critical, on the first failure, you could have shut down access temporarily and made a backup BEFORE ever attempting to rebuild the array, as we all know the rebuilding process is the very most stressful thing one can do to a RAID array's hard disks - especially when you have another drive or more indicating it is failing.
Once you had that backup, restore it to any number of idle servers that should be at the datacenter on standby for these sorts of circumstances. Once the original 1143 has had all of its drives pulled and the server has been re-validated as 100%, migrate back to the original server and put your standby machine back in stasis, ready for the next emergency.

Also, keep in mind what this might mean to you as a new client of HG. If this were to happen, let's say, within your 45 day refund period, HG would be unable to even supply you with your own data to migrate to another server/hosting company, for they simply will not have it until everything has been fully restored anyway.

Brent, you should be ashamed of this level of incompetence and arrogance within both your staff and organization as a whole as far as your disaster recovery planning, procedures and execution, and of the lack of any system in place to poll the array's S.M.A.R.T. data to alert IT admins at HG that failure of not just one, but multiple HDDs in an array was imminent. If such a system is in place, incompetent associates ignored it, and that would only point to pure incompetence of the IT administration team(s) - thinking it wise to try and rebuild an array with other drives indicating failure was imminent, as RAID rebuilding is by far, absolutely the most stress one can place on each drive in an array. Even the poor resources at Wikipedia can tell the average "Joe-Blow" all about this, let alone a company with staff members oblivious to the realities I and everyone else in the Enterprise IT World have known since the advent of RAID technology, now proven over a decade and a half and around even longer...

So, if data integrity, disaster recovery and uptime are a concern to you as a prospective HOSTGATOR customer, I cannot at all recommend Brent's organization (HOSTGATOR) any further to the few clients we deal with these days, nor publicly to anyone, now that we know the extent of their negligence both by staff and, just as importantly, at higher management level(s) - those that decided not to have ANY disaster recovery backup servers taking snapshots of entire shared hosting servers. Even if HG was doing such a thing, it obviously was deemed worthless, as it would have had to be so far outdated that their risk management decision was to have about a 50 hour outage rather than restore a backup that was out of date - a backup which I contend could not exist at all, and if it did/does, might as well not, for it is of no use to them or their clients. Hence the 50 hour rebuilding fiasco of an array that began with likely more than just one HDD failure and stretched until it became a "nail biter" for IT. What they described put them one more drive failure away from losing parity data, with the likely end result of losing everyone's data and the entire shared hosting server. And throughout all of this, communication and actual relevant details of what was really going on were not disclosed until after the outage and well into day two.
Communication is horrible and divisive from HG in these instances, for no one likes to bare their behind and face the music of their own incompetence. Had it not been for that incompetence, this disaster likely could have been mitigated entirely, or at least confined to the very short disruption required to restore the snapshot(s) they apparently do not have, or that are so far outdated that 50 hours of lost services seemed a better plan. I see this as indefensible and every reason not to solicit HG's services, at least in shared hosting. AND - make no mistake, if this is how they handle disaster recovery with shared hosting, I would implore any future clientele looking at VPS or leasing a dedicated server for hosting to: DIRECTLY INQUIRE WITH REGARD TO THEIR DISASTER RECOVERY PLANNING AND WHAT IS IN PLACE FOR THOSE SERVICES. ASK IF THEY ARE TAKING DAILY BACKUPS OF THE ENTIRE SERVER ON AN ENTIRELY DIFFERENT PHYSICAL MACHINE TO ENSURE THAT, IF YOUR SERVER FAILS DUE TO MAINTENANCE ISSUES, OS OR FILE STRUCTURE CORRUPTION, HARDWARE FAILURES, OR INTERNAL OR EXTERNAL EXPLOITS, THEY WILL BE ABLE TO BRING UP A CURRENT BACKUP OF THE SERVER YOU ARE PAYING BIG BUCKS TO LEASE FROM THEM, AND IF SO, HOW LONG SUCH A RECOVERY WOULD TAKE.

If the answer is no and you conduct business with them anyhow, you've been forewarned and have no reason to complain, unless of course they guarantee such services and then cannot provide them. I have no idea of their practices in that regard for dedicated or VPS plans, but if they cannot make good on it, it is entirely possible they have been flying by the seat of their pants on that guarantee, are not backing up anything at all to separate physical machines, and are simply relying on fault tolerant RAID arrays on the dedicated or VPS servers you are leasing as well - which, given enough time and their past performance, guarantees a failure.
Lastly, HG had in the past, and up till now, a great uptime record on our shared hosting server account. Support has always been lacking and has never once taken culpability for anything, even when the root cause was unilaterally traced to their end. In fairness, this is pretty common in the industry: morally, ethically and legally incorrect, but sadly today's norm with most of your alternative providers as well. BUT, remember your High School mathematics class here. 99.9% uptime on their advertised sites means that at any one time you could expect roughly 5,000 sites, accounts, or whatever their uptime metric is based upon (0.1% of the five million domains they advertise) to be in failure mode. One day it will be everyone's turn, and if they do not make changes, eventually you may be the one that loses services for even more than 50 hours, only to be informed later that, after they had over-stressed an array with a crop of end-of-lifecycle failing HDDs by rebuilding and rebuilding - overtaxing those drives with the rest of the parity data on them - they have experienced 100% data loss. They'd likely give you a full refund, but... I don't know about anyone else, but the value of the content of one's domains, to say nothing of what is lost by downtime, far exceeds the money they'll return for ruining you, your business, clientele and your data. I'd liken that to an auto shop installing new tires and forgetting to tighten the lug nuts on your wheels, after which you have a serious accident. Then they offer to replace the wheels, tires and rims, and this time promise to torque the lug nuts - this would satisfy few that might experience such a thing...

Brent, sounds like you need to have HR looking for some new people with Enterprise disaster recovery experience...

Here is their Network Status Thread regarding the disaster/fiasco. Attached are copies of said thread to "keep 'em honest..."
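For anyone who wants to check the 99.9% arithmetic above, here is a minimal sketch in Python. It assumes uptime is measured per site and averaged over a 365-day year, and it takes the five-million-domain figure from HG's own advertising as quoted later in this thread. The function names are made up for illustration, and HG's actual uptime metric is not known.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_budget_hours(uptime_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime allowed in a period at a given uptime percentage."""
    return period_hours * (100 - uptime_pct) / 100


def expected_sites_down(total_sites, uptime_pct):
    """Average number of sites down at any moment (assumes a per-site metric)."""
    return total_sites * (100 - uptime_pct) / 100


def achieved_uptime_pct(outage_hours, period_hours=HOURS_PER_YEAR):
    """Uptime percentage over a period that contained one outage."""
    return 100 * (1 - outage_hours / period_hours)


if __name__ == "__main__":
    print(round(downtime_budget_hours(99.9), 2), "hours/year allowed at 99.9%")
    print(round(expected_sites_down(5_000_000, 99.9)), "sites down on average")
    print(round(achieved_uptime_pct(50), 2), "% uptime for a year with one 50 hour outage")
```

Under those assumptions, 99.9% works out to about 8.76 hours of allowed downtime per year and roughly 5,000 of 5,000,000 sites down on average at any moment, so a single 50 hour outage uses up several years' worth of that budget for the accounts on the affected server.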
Prospective clients, you've been given the information to make an informed decision, which you'll need to do. I can only make those decisions for my own interests and no one else's, so I hope you take pause to consider the risk/reward ratio inherent to HG's procedures as they stand today and make the decision that best fits your needs.

Warm regards, everyone.
Douglas

Attachments: 2 .pdf duplicate transcripts of HG's communication during the 50 hour disruption of services, to keep this on the level.

Link to the sort of communication/miscommunication clients were provided during the incident: http://forums.hostgator.com/gator114...e-t140911.html

Last edited by DA_MAN; 09-21-2011 at 09:10 AM. Reason: [spelling]
#2
Greetings,
While I can understand the frustrations this entire situation has likely caused users on this server, a large portion of the information presented in this forum post is not factual. I'd like to offer clarification, as well as factual information regarding many of the points you've made about our procedures and what actually occurred.

First and foremost, the notion that there is no disaster recovery plan in place is completely wrong. Given the importance and generally sensitive nature of this plan we're not able to go into full details, though again, stating that there is no plan in place is an outright lie. This is simply not factual, and I'd be curious to know how you concluded that with no insight into our procedures or systems.

There was mention as well of us not using S.M.A.R.T. data to monitor the arrays. Again, this is not the case. S.M.A.R.T. data and RAID status are checked every 30 minutes, covering current status as well as any and all errors that may appear.

The review then went on to state that we're backing up data to the same volume it's served from. That's also not the case, as our backups are kept remotely on separate servers. The assumption that the failing drives caused the loss of those backups is false. To clarify the concern regarding the backups not being available: the unfortunate fact is that, prior to any of these issues occurring with gator1143, we were in the process of re-allocating the backup server to allow more resources for backups for the particular subset of shared servers using that backup server. Gator1143 was unfortunately in that subset, which means that during that time we did not have backups. That said, we very clearly state in our backup policy that backups are kept as a convenience and are absolutely not guaranteed. This has always been our policy, and I feel we make every effort possible to be up front about this. This is even addressed in our Terms of Service, which we require you to acknowledge and agree to when signing up.

You noted as well that new clients should inquire about backups being taken daily. The backup policy again clearly states that backups are weekly; if there was anything we did to imply otherwise, I'd definitely like to get more information about that so I can make sure it's addressed.

Beyond the monitoring of hardware and our backups, I'd like to clarify the issue with the two drives failing and why it took so long to restore the data to the server. While there were in fact no backups during this time due to the independent re-allocation of our backup server for that subset, every effort was made to bring the data back online as quickly as possible. To elaborate on the series of events: the first drive failed, and we replaced it via a hot-swap. The normal procedure in this situation is to allow the drive to auto-rebuild. We do not typically just restore backups in a case like this. At that time there were no alerts regarding the second hard drive. However, during the rebuild the second hard drive did in fact end up failing. Due to this unforeseen issue, and not being able to keep the second drive mounted for extended periods, the data retrieval was extremely slow. We repeatedly had to re-import the drive into the array. The fortunate fact, though this did take an extended period, is that all the data was retrieved and nothing was ultimately lost.

Finally, there appears to be great concern with the flow of information regarding what was occurring. We're simply not going to provide inaccurate information to our clients. It's unfortunate that this took the amount of time that it did, but we're not going to alarm clients by stating there is an issue with their data unless we're certain there is. I apologize if that seems like we withheld information, though that's not the case. Looking back on the matter, and given the fact that the data was retrieved, it would have been far more stressful for clients on the server to have been told on Sunday, "We've had hard drive failures, your data may or may not be there still" when in reality the data was ultimately retrieved. Again, I apologize if you disagree, though the handling and disclosure of information given the situation was appropriate.

Moving forward, I'm definitely available to discuss this all further with you. I do want to ensure your concerns are addressed fully and, most importantly, to your satisfaction. Please feel free to contact me directly here on the forums via PM so we can continue communication regarding this matter.

Best Regards,
__________________
Joshua Martin Customer Service Manager Hostgator.com LLC. http://support.hostgator.com/ @HGSupport @Hostgator |
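To make the 30-minute S.M.A.R.T. check mentioned in the reply above a bit more concrete, here is a rough sketch in Python of what such a check could look at. It parses attribute-table text in the layout printed by `smartctl -A` (from smartmontools) and flags a few attributes that are often treated as early-warning signs. The sample table, the choice of watched attributes and the zero threshold are all assumptions made for illustration; nothing here reflects HG's actual monitoring.

```python
# Attributes often treated as early-warning signs; this selection is an assumption.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style table text."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit check.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[parts[1]] = int(parts[9])
    return attrs


def warnings_for(text):
    """Names of watched attributes whose raw value is above zero, sorted."""
    attrs = parse_smart_attributes(text)
    return sorted(name for name in WATCHED if attrs.get(name, 0) > 0)


# Made-up sample in the smartctl attribute-table layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       24
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       33710
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    for name in warnings_for(SAMPLE):
        print("warning:", name, "is non-zero")
```

In practice the text would come from running smartctl against each member drive on a schedule, and a non-empty warning list on any drive would be a reason to take a backup before starting a rebuild.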
#3
Good Afternoon,
I've unfortunately still not heard from you regarding this matter. I'm more than happy to discuss this all with you at your convenience. I do certainly understand the concerns you've stated and would like to make sure those are addressed completely for you. I'll be sending you another PM here on the forums to see if we can get in touch. Best Regards,
__________________
Joshua Martin Customer Service Manager Hostgator.com LLC. http://support.hostgator.com/ @HGSupport @Hostgator |
#4
Again, you can have a disaster recovery plan, but your admission of defeating it makes it worthless and not in place to serve your or your clientele's best interests. S.M.A.R.T. data, yeah, I know it's polled there; neither I nor others have any idea what anyone does internally, except that your attempt here at damage control makes HG appear more as if you were running a farm out of a dorm room than an Enterprise hosting company with "Over Five Million Domains..." You go on to add that backups (NO, not the client backups of one's own domains - I am talking about the snapshots of the entirety of gator1143 that you took offline, which could have reduced the downtime from the absurd 60 hour event that took place) are clearly stated as a service and are in no way guaranteed by the TOS. Then why have them? - Oh, you don't always; that is likely why someone was goofing off rebuilding and rebuilding gator1143 for nearly three days, which as we all know is the very most stress one can put on an array EVER...

Yeah, there is a concern "about the flow of information", as it has always been divisive and poor; yet there is even more concern about the level of inexperience and incompetence shown in the day to day decision making of associates at HG and in corporate policy, and about the notion that reasonable procedures, as we can all agree, were not followed, and if they were, they are terribly flawed. It's simple: you only take a backup server out of service if there is one to replace it during that time. It did not take the 30+ years of experience I have in this industry to learn such things. If you'd like to defend your position publicly here further, might I suggest against it? Will you re-read what your post really reveals about HG, their policies and procedures, and still think you should be doing damage control?
Yet another example of a poor decision; both RM and the legal department would have been the appropriate filters for your comments. Instead, in a nutshell, you said: "We have a disaster recovery plan - you are wrong, Sir! We don't mind defeating said plan and making it useless. We defeated that plan in this case and very easily could have lost everything or had an even more lengthy outage. Said outage could have been lessened greatly by simply swapping out a few drives, re-striping and restoring to gator1143 the entire server image we did not have on our backup server; yet all of this is moot, for as we clearly state, no backups are guaranteed within our TOS, and we accordingly treated systems with the same sort of disregard for what the consequences might be. So, why are you so upset???"

We keep backups - like the ones you need not run on our very empty domains. What we do not have a "backup" of is a server - you don't either, or you'd have used it (A SERVER), let alone the (full server) image(s) someone decided to ditch (or make completely unavailable) while migrating an array. Even in a scenario that made that necessary, it is something you do not do until you can put another machine in its place while it is out of service. If we had a "backup server" we'd not pay for hosting at all. We'd simply host our own. Only our locale and the cost of running fiber halfway across the World have prevented it at this time...

I could go on and on; I'd advise you not to publicly, unless you'd like to talk about how much more we know about exactly what happened, how this scenario played out and what associates did, based on the factual data we collected as the server was coming up. If you'd like to go further, we could discuss how HG is parsing outgoing mail and trapping it. No biggie, huh? Yeah, but what happens when lax HG employees allow a breach of that new system, or perhaps a disgruntled employee would like to get their hands on some data?
I know we have serious industrial espionage and copyright infringement issues that are not a bit calmed by HG's systems reading every outbound email. I can turn SpamAssassin off; there is no way for a customer to "turn off" HG's parsing/reading of outbound email; in fact they've not disclosed publicly that they are even doing so. While I know your legal department will point at the clause any such company would have in its TOS, "to secure the network", I'm sorry, but that's one to be tested in Federal Court if HG does not discontinue it. As clients, we are not employees there; this would be different inside an organization, but not with a public web hosting company. Anything that would blacklist a client's email address is already a violation of the TOS; reading outbound email is beyond the scope of your TOS, and even if it were amended, a TOS does not make legal what Federal, State and local laws prohibit. There are a number of Federal Laws I am aware of that would apply here; ask the legal department...

Should I go on? I'd bet not. Should you reply here? If you are smart you'll stop before really getting me fired up. What is it that they say? If you can't beat them, join them? I've been short on work these days. Considering it looks like HG is in dire need of knowledgeable, experienced IT staff members, perhaps you should refer me to HR?
#5
Good Afternoon,
While it's evident this unfortunately will not be resolved through communication over the forums, I do certainly want to discuss this with you. I believe through further communication over the phone I can address your concerns with the hope of moving forward beyond this all. I've sent you an additional PM this afternoon requesting a call back phone number where I can reach you, as well as an email. I'll look forward to speaking with you. Best Regards,
__________________
Joshua Martin Customer Service Manager Hostgator.com LLC. http://support.hostgator.com/ @HGSupport @Hostgator |
#6
Greetings,
I've unfortunately still not been able to contact you regarding this. I'd very much like to discuss this all further with you. I realize you have concerns and would like to ensure these are addressed as soon as possible. However, I've not been able to reach you at the phone number you provided. I have also sent an additional email requesting that we schedule a time convenient for you to discuss this. I do look forward to speaking with you so we can address this! Best Regards,
__________________
Joshua Martin Customer Service Manager Hostgator.com LLC. http://support.hostgator.com/ @HGSupport @Hostgator |