Diversification as Risk Management, or, Disaster with AWS

Posted: 2023-07-08 | Tags: Amazon, AWS, blog, Digital Ocean, Gandi, Hugo, linux, Render, Vercel, website

That title sounds a bit Rocky and Bullwinkle to me. But it is appropriate to the situation.

I have migrated an older WordPress post about AWS backup to this blog; it provides some context for, and evidence of, my history with AWS.

Let me try to tell this story as concisely as I can. For the TL;DR crowd, the takeaways are labeled “this whole episode made me realize”.

I had been using AWS to serve all of my various websites for a long time, at least since 2011 for the first of them. The S3 static website option was appealing, as was the integration of certificates, http->https redirects, CloudFront, and so on. And the widespread adoption of AWS, with its associated documentation and help, aided this process.

But I became complacent. Having developed an unchanging routine for website development, with sites built locally in Hugo and delivered to AWS S3, and with CloudFront and Route 53 serving as the glue to deliver the final product to the web, I kept using that “stack” in an unquestioning manner. And AWS was reliable during this time.

But, just as with bananas, a monoculture is vulnerable.

Somehow, AWS had flagged my account for unauthorized activity in late June. Their opaque messages warning me of this ended up in my spam folders, perhaps because they follow spam styling conventions. All I saw was a prompt to change my password once I logged in, which struck me as routine, since sites often force this on their users once they require them to adopt longer passwords or ones requiring special characters.

After I changed my password, I received additional e-mail that did get through to my active inbox about the issues with my account. I went into my AWS console and “closed” the issue, since I had not noticed any unauthorized activity. Although I have no actual explanation from Amazon about what happened, I believe that “closing” the issue was the wrong thing to do, equivalent to marking my “unauthorized use” as “not resolved”. After I did that, I soon noticed the problems. My websites served through Amazon went down.

Yes, all of my sites were offline for hours!!!!!!!!! So much for Amazon’s market-leading reliability!

This whole episode made me realize just how opaque and unfriendly the Amazon diktats that one receives are. There was literally no guidance on how to resolve this issue from the big A, or on what had caused it.

After a couple of hours of searching online for solutions, lost in a maze of pages about IAM users and policy authorizations with overlapping and contradictory information of unclear currency, I contacted Amazon’s chat help in the middle of the night. They did collect my information and submit it for verification. A few hours after that - not sure when, but while I was sleeping - access was restored, and I woke up to working websites.

This whole episode made me realize that the static website space has evolved a lot since I started out with AWS as a default ten years ago. There are many providers, each of which offers its own combination of strong points for different use cases. Services like Netlify, Cloudflare, and others clearly are serving their users well. In the case of Netlify, I did not really need their quick build features or heavy-duty production process, and I wanted to avoid the potential costs. In the case of Cloudflare, their preference for use of their own DNS (although there appear to be partial workarounds for that) was the only thing keeping me away. If I am happy with anything, I am very happy with Gandi DNS (see below).

Now, I have no information from Amazon about what might have caused this issue, but it really shook me up. I can speculate that it is because I use a VPN and edit from multiple machines, each of which might temporarily be seen as operating from a different country. If AWS is using that as a screen for unauthorized activity, I can only say that it seems extremely naive (although Rutgers once locked my account for this exact reason). I am not going to dumb down my computing security practices solely in order not to inconvenience Amazon.

Also, this whole episode made me realize that I was overly dependent on a behemoth corporation to serve up material that ostensibly represents my academic career and other interests. Amazon has not been getting better lately; it has been acting in malevolent ways. There is no reason to believe that AWS is an exception to their corporate culture of dehumanizing pressure to perform to metrics and to automate out the human component of any interaction. If their e-commerce site is now notorious for selling counterfeit items and fly-by-night ersatz products, and Amazon’s corporate mission was to destroy–ahem, disrupt–the market, why wouldn’t AWS also be moving in that direction?

So, this whole episode made me realize that I had better investigate substitute ways of serving up my web content, pronto.

In my next post I will talk a bit more about the solutions I found, involving Render, DigitalOcean, Netlify, and Vercel.

Let me also give a shoutout to Gandi for providing 100% reliable, unquestioned domain registration, DNS, and webmail services to me for more than 20 years! And doing it with simplicity and user-friendliness. I have never had any doubts, qualms, or suspicions about Gandi or its future, unlike AWS. So when I could, within the space of minutes, delete my AWS Route 53 DNS and edit Gandi DNS to point to Render, I was so delighted with this solution to my AWS problems.

Again, this whole episode made me realize that I had better diversify and have a few methods of serving up web content at my fingertips in case of other issues with any provider. So I am now actively hosting my sites on a couple of new providers, and I am hanging onto my S3 buckets and methods for serving from AWS, just in case. But I sincerely hope that new AI or whatever alerting systems will not flag me as a problem user, no matter what service I am on. However, the onus is on the individual to protect themselves and manage their exposure and risk resulting from any one service provider. I hope I have now moved into that new, more diverse and protected space, thanks to the cuckoo AWS nudging me out of the cozy nest they had once created!
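For readers who want to keep an eye on several hosts at once, here is a minimal sketch in Python of the idea of having more than one way to serve the same content. It walks a list of mirror URLs in order of preference and reports the first one that answers. The function names (`http_ok`, `first_healthy`) and the example URLs are hypothetical and chosen only for illustration; nothing here is specific to any provider's API.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, Optional


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to the URL succeeds with a 2xx/3xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and HTTP 4xx/5xx
        # (urlopen raises HTTPError, a subclass of URLError, for those).
        return False


def first_healthy(
    urls: Iterable[str], check: Callable[[str], bool] = http_ok
) -> Optional[str]:
    """Return the first URL for which the check passes, or None if all fail."""
    for url in urls:
        if check(url):
            return url
    return None


# Example usage (hypothetical mirrors, listed in order of preference):
# mirrors = ["https://www.example.com", "https://example-site.example.net"]
# print(first_healthy(mirrors))
```

The health check is passed in as a parameter so that the selection logic can be tried out without touching the network.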

Anyway, this is my first blog post since migrating the hosting of ryanwomack.com to Vercel. Thanks Vercel for supporting me in a more secure and controllable life!

Was my title hyperbolic? No, I don’t think so. Having one’s core identity websites down for hours, without knowing why, is a crisis. When I diagnosed the crisis, it seemed that my AWS account was working, my S3 was working, and my static S3 websites were still being served via their long S3 URLs, but Route 53 was blocking all requests for my domains and CloudFront was also disabled. Why not freeze my account but leave services running? Or why not label my CloudFront endpoints as “temporarily unavailable”? To me, having an unresolved DNS record was basically the worst imaginable outcome.

What do you usually think when a hostname comes back as “we couldn’t find that site”? You think it is gone forever from the web. Maybe the owner took down the server, or forgot to renew the domain name, or anything like that. But the implication is clearly “dead and gone”. And that is what AWS did to me. I do not understand the rationale for this. If an account is blocked, even for legitimate reasons, why would the DNS be disabled in this heinous manner?
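The diagnosis described above, where the custom domain stops resolving while the long S3 URL keeps working, can be sketched as a small check in Python. This is only an illustration under stated assumptions: the function names (`dns_resolves`, `diagnose`), the result labels, and the hostnames in the usage comment are hypothetical, and the check looks only at name resolution, not at whether CloudFront or S3 actually serves a page.

```python
import socket
from typing import Callable


def dns_resolves(hostname: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(resolver(hostname, 443)) > 0
    except socket.gaierror:
        # socket.getaddrinfo raises gaierror when the name cannot be resolved.
        return False


def diagnose(
    domain: str, origin_host: str, resolver: Callable = socket.getaddrinfo
) -> str:
    """Classify an outage by comparing the custom domain with the origin host.

    Returns one of three illustrative labels:
      "ok"       - the custom domain resolves (look further down the stack)
      "dns_only" - the custom domain fails but the origin host still resolves
      "all_down" - neither name resolves (possibly a local network problem)
    """
    if dns_resolves(domain, resolver):
        return "ok"
    if dns_resolves(origin_host, resolver):
        return "dns_only"
    return "all_down"


# Example usage (hypothetical names; substitute your own domain and bucket URL):
# print(diagnose("www.example.com",
#                "example-bucket.s3-website-us-east-1.amazonaws.com"))
```

A result of `"dns_only"` corresponds to the situation in this post: the content is still there, but visitors using the domain name see “we couldn’t find that site”.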

This whole episode made me realize that I don’t trust AWS or Amazon anymore, anyway, anyhow, anywhere. I hope I am able to fully free myself from dependence on them.