The Site’s Tale (A foray into the tech abyss)

Dear Internet,
And so, here we are.
What started out last summer as a support ticket to my web host provider for site slowness has become a farce of epic proportions. Here is the timeline:

In the summer of 2012, I open a ticket for site slowness.
- Web host provider claims that other sites living on the same server as my domains, plus the random DDoS attacks hitting that server at the same time, are the reason for my sites' slowness. Web host provider suggests I get a VPS to clear up the problem. (Note they did not offer to move me to another server cluster, which would have more than likely fixed the problem.)
Domains are moved to the VPS. VPS configured with Apache, MySQL, PHP (Web host provider’s default setup.)
- Less than a month after the move, my sites become almost inaccessible. We discover Apache is randomly spiking the memory and CPU. The randomness doesn’t give any clues, nor is there anything in the logs.
- To clear the spikes, we have to restart the VPS. Sites become accessible for a few hours and then the spikes start all over again.
Inbetweenest the spiking, my main domain is infected with an injection hack. Three separate times. After each hack, I use WordPress best practices for site security as well as web provider’s list of security recommendations. Yet, I was hacked two more times after the first.
- When I opened a ticket with the provider about the hacking, noting that I had followed their security best practices AND WordPress’, they told me there was “nothing they could do.”
Web host provider cannot diagnose the problem or provide a fix for the Apache issues. All they tell me is to move from Apache to Nginx, another web server they offer.
- The changeover to Nginx is seamless and the spiking immediately stops. As do the injection hack attempts.
End of December 2012, all of my domains start throwing up 502/504 errors. I open a ticket with the provider and the errors almost immediately stop. I’m told to clear my browser cache and the DNS cache on my computer in the future.
- 502/504 errors come and go for most of January 2013 and into February. Sometimes they get so bad (like hours without access) that I have to restart the VPS to get it going again.
- I open a ticket in mid-to-late February because my sites have been inaccessible and no matter what I do (restart the VPS, clear caches, unload plugins), nothing works. While I care that my sites work, it’s becoming imperative that I have a working portfolio to hand in to my boss in April.
  - No one responds to my ticket for 8 days.
When I get a response, the person responding has used http://www.downforeveryoneorjustme.com/ to verify site connectivity and wants to close the ticket. The site was apparently live when they checked, but by the time I got their response, the sites were dead again.
When I respond that the sites are down again, and have been down, the support person’s response is that we need supervisord installed and running to kill any PHP processes that are just hanging. Supervisord was installed but not configured, and the only way to configure it was to add a user with root access to the VPS and configure it ourselves, which we did per web host provider’s instructions.
TheHusband installs supervisord and gets it configured, and the 502/504 errors do not end. We respond to the ticket and are told the only way web host provider will look into this is if we disable supervisord and remove the root user; then they will reconfigure and manage supervisord themselves.
We do as they request and in the interim, I am told the SQL server is in the wrong cluster. This is surely the problem and why we are getting all the 502/504 errors.
- My SQL server is moved and nothing changes.
- Web host provider configures supervisord and nothing changes.
Support then recommends I turn on PHP XCache Support in our domain control panel and install W3 Total Cache in WordPress to help.
- W3 Total Cache had a vulnerability discovered in December 2012 that was immediately fixed and updated. My domain had the updated version but was hijacked by script kiddies through the same vulnerability (so the plugin has another hole in it) within hours of installation. Google Webmaster notified me, within hours of my installing the plugin, that my site was no longer secure. After turning the plugin off and cleaning up, I opened a ticket with the provider so their security team could verify my site was clean. The support person(s) verified the site was clean and ventured a guess that the whole problem with the 502/504 errors was the W3 Total Cache plugin, so they suggested that, in addition to uninstalling the plugin, I clear my cache and everything would be right as rain. I pointed out this was an ongoing issue that had been known for months and had nothing to do with W3 Total Cache, which I had installed for the first time only a few days before. They transferred me back to regular support.
Original ticket for the 502/504 errors (not including previous slowness history) was opened on February 12. As of March 30, there has been no resolution or solution.
- During that period, nearly 70 emails were exchanged between me and support, most of them either reminding them they had already told me X solution and it didn’t work, or providing them with data to back up my problem. Every time a tech suggested I “just clear my browser cache” as there was “nothing in the logs,” they got an email from me with data from users around the globe who were getting the same 502/504 error. Every time they suggested I go to Apache, I pointed to their own ticket telling us to move from Apache to Nginx because they couldn’t fix the spiking errors when I was on Apache.
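
For anyone who ends up needing to collect the same kind of evidence, here is a minimal sketch of how reports like the ones I kept emailing could be tallied. The `(location, status_code)` record format and the `summarize_checks` name are assumptions for illustration, not anything the provider or I actually used, and the sample records are made up.

```python
from collections import Counter

# 502 = Bad Gateway, 504 = Gateway Timeout
GATEWAY_ERRORS = {502, 504}


def summarize_checks(checks):
    """Tally gateway errors from (location, status_code) records.

    The record format is a hypothetical one chosen for this sketch.
    """
    failures_by_location = Counter()
    total = 0
    failures = 0
    for location, status in checks:
        total += 1
        if status in GATEWAY_ERRORS:
            failures += 1
            failures_by_location[location] += 1
    failure_rate = failures / total if total else 0.0
    return {
        "total": total,
        "failures": failures,
        "failure_rate": failure_rate,
        "by_location": dict(failures_by_location),
    }


if __name__ == "__main__":
    # Made-up sample records, purely for illustration.
    sample = [("London", 502), ("Tokyo", 200), ("Sydney", 504), ("London", 504)]
    print(summarize_checks(sample))
```

A per-location breakdown like this is the sort of thing that makes “just clear your browser cache” hard to sustain as an answer, since the failures are not coming from one machine.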

During all of this, TheHusband noticed a couple of things:

The provider never offered to move me to a new server or cluster; their default response was for me to move to a VPS and, once on the VPS, up my memory (aka, to get more money from me).
The VPS is crippled. You have zero control to update any software (Apache, Nginx, PHP, whatever), nor can you do any configuration that goes outside of what the provider allows (which you don’t find out until you try to do the thing).
Even having root access on the VPS, which should give you full control, doesn’t. That too is crippled and some functionality is stripped.
CPU and memory usage should be minimal on a site like mine, but even when the 502/504 errors were not going on, they were spiking. Running top on the VPS showed nearly 50% CPU utilization with web services turned off, so nothing should have been running and yet there was 50% CPU utilization. Since that is all controlled by the web provider, we could not clean it up or turn off unnecessary services that were eating away at my CPU/memory usage.

Fed up with my caterwauling, TheHusband set up a near identical site on a new provider’s VPS, migrated the content, updated the DNS, and got EPbaB running fairly quickly. TheHusband was also able to update and optimize PHP, Nginx, and WordPress, as that was not allowed at Dreamhost, and swapped us from MySQL to MariaDB, an open source fork of MySQL, for the database. After we got everything up and configured, he ran structured packet queries against both sites. The result? The old host provider had 93% packet loss while the new provider had 0% packet loss.
TheHusband also calculated the current provider could not handle more than 1 connection a second, whereas the new provider can handle 5 times the load, on the exact same setup.
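
For the curious, a packet loss figure like the ones above is just the share of sent packets that never came back. The sketch below shows only that arithmetic; the `packet_loss_percent` name and the packet counts in the usage example are assumptions for illustration, since I don’t have TheHusband’s actual counts to hand.

```python
def packet_loss_percent(sent, received):
    """Return the percentage of sent packets that got no reply."""
    if sent <= 0:
        raise ValueError("sent must be a positive number of packets")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


if __name__ == "__main__":
    # Hypothetical counts chosen to match the percentages in the post.
    print(packet_loss_percent(100, 7))    # old provider: 93 of 100 lost
    print(packet_loss_percent(100, 100))  # new provider: none lost
```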
We’re moving the remaining three domains over in the upcoming weeks, then I’m canceling service.
The last email I got from the provider, on March 30th, was a long spiel of apologies and promises to make this right. It was similar to an email I had received a few days earlier from someone else at support, who told me, “502/504 errors are normal.” Both suggested I move to Apache, which would solve all my problems.
Apparently neither of them read the ticket history, though they both claimed to have done so.
There are so many levels of frustration going on, it’s hard to figure out what to fight and what to let go. I have been with this web host provider since April 2003. A decade of service and loyalty, so much so that my referral kickbacks meant my monthly bill was pennies. Shit just worked. Up until say, 2010, I never had to open a damned ticket with them. Then it got progressively worse.
I would have held on out of loyalty and for the years of great service from this provider, but the final straw was the combination of constantly defending or reiterating what we did (95% of the time on specific directives from the provider), the miscommunication of the support team, the technical negligence, the often patronizing tone of the emails (“It’s not a problem with us, it’s a problem with you.”), and never knowing which line of support we were with or even who we were talking to. My sites don’t generate a lot of hits, and I am okay with that, but they should work when I need them to work.
And so, here we are. New provider. Snappier site. Everything working. I have an awesome husband who not only got the site up and running, but was able to fix all the gaping security issues he couldn’t fix on the older provider because it was locked down.
Now, the world is starting to look better again.
xoxo,
Lisa
