> How many would notice if we hijacked runtime calls and wrote to remote blob storage instead of disks?
We replaced all file access with calls to S3 storage a few months ago (the goal was to make the service completely stateless, for other technical reasons), and just yesterday we had yet another connectivity problem to S3. Disks break too, but it feels like connectivity issues are much more frequent, at least in our country.
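Roughly, the swap looks like this (a sketch, not the actual code: the bucket and function names are made up, and it assumes boto3 with credentials already configured):

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-service-state"  # hypothetical bucket name

    def write_blob(key: str, data: bytes) -> None:
        # was: open(f"/var/lib/svc/{key}", "wb").write(data)
        s3.put_object(Bucket=BUCKET, Key=key, Body=data)

    def read_blob(key: str) -> bytes:
        # was: open(f"/var/lib/svc/{key}", "rb").read()
        return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

Every caller keeps the same interface; only the backing store changes, which is also why a single connectivity blip now touches every read and write.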
Which is innately more fragile: a communication system you rely on for data, running dozens to hundreds of miles through a medium you don't control, or disks?
> Which is innately more fragile: a communication system […]
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." — https://en.wikiquote.org/wiki/Leslie_Lamport
An alternative perspective: which is innately more fragile, a communication system with support and incident management by professionals whose sole job is to keep the system running 24x7 and bring it back online in the rare event it fails; or you, whose main job is something else entirely, and who would rather not have to think about disks at all?
For: I've seen a company very nearly destroyed by not having the skills to deal with a single disk failure in a RAID array.
Against: A Large Hosting Company we used couldn't read my simple instructions and lost a backup.
BTW, read your provider's SLAs and my guess is you'll agree with what a lawyer who worked for us said: shite. Basically, while the service was down our provider wouldn't charge us for it. The end. What does yours do? If it's similar, what motivation do they have to fix breaks? And is the provider responsible for loss of connectivity between them and you?
Put another way, who gets hurt more in downtime, the provider or you?
"SLA compensation doesn’t even scratch the surface of these losses. If a single virtual machine goes down for less than 7 hours, 18 minutes (99% monthly availability), AWS will pay 10% of the monthly cost of that virtual machine. Considering the price of a small instance (a ‘t4g.nano’) in the large US-East-1 region (in Northern Virginia, US) is around $3 per month, total compensation for this outage would be 30 cents.
If a virtual machine goes down for less than 36 hours (95% availability in a month), the compensation is just 30% — just under a dollar. The user only receives a full refund for the month if the resource is down for more than one day, 12 hours and 31 minutes in total."
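To make the arithmetic concrete, a rough sketch of those tiers (percentages and thresholds taken from the quoted figures only, simplified; the actual SLA has more conditions and a no-credit tier above these):

    # Back-of-the-envelope credit per the quoted EC2 instance-level tiers.
    HOURS_PER_MONTH = 730  # ~30.4-day month

    def sla_credit(monthly_cost: float, downtime_hours: float) -> float:
        uptime = 1 - downtime_hours / HOURS_PER_MONTH
        if uptime >= 0.99:
            rate = 0.10   # down up to ~7h18m: 10% of that instance's bill
        elif uptime >= 0.95:
            rate = 0.30   # down up to ~36h30m: 30%
        else:
            rate = 1.00   # beyond that: full refund for the instance
        return monthly_cost * rate

    print(sla_credit(3.00, 7.0))   # ~$0.30 for a $3/month t4g.nano
    print(sla_credit(3.00, 36.0))  # ~$0.90, "just under a dollar"

Note the credit is a percentage of that one resource's bill, not of anything you lost while it was down.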
yep, that sounds about right.
Edit for context:
"In the 2021 Uptime Institute data center survey, the average cost of respondents’ most significant recent downtime incident was $973,000. This average does not include the 2% of respondents who estimate they lost more than $40M for their most recent worst downtime incident."
The AWS SLA compensation is also very much rigged against you, beyond the percent-based outage durations.
For example, a couple of months ago AWS had an outage that caused all of our customer-facing domains in us-west-2 to go down. Going to example.com simply wasn't resolving to our site, due to a confirmed AWS outage.
For a few hours all of our RDS instances, EC2 instances, etc. were still being charged for while providing $0 of value, since the entire org's sites were down. All revenue halted because the site wasn't accessible. When I contacted AWS support about the outage, they said we only qualified for some microscopic amount because the outage wasn't directly related to RDS, EC2, VPC, etc.
> Against: A Large Hosting Company we used couldn't read my simple instructions and lost a backup.
One of our previous incidents happened because an employee at a large hosting company misunderstood the ticket and manually shut down our entire live server without warning.
This is only feasible for applications where latency is not a concern. The overhead of just the HTTP call to an S3 bucket (not to mention all the other bucket-access overhead) is much higher than that of a disk read request. Try performing 1000 random file accesses against a bucket and 1000 random accesses against a disk; the performance won't even be close.
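If you want to see the gap for yourself, something like this rough benchmark will show it (bucket, keys, and local paths are made up; assumes boto3, credentials, and that the objects/files already exist):

    import time
    import random
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "some-bucket"
    KEYS = [f"objects/{i}" for i in range(1000)]             # hypothetical small objects
    FILES = [f"/var/data/objects/{i}" for i in range(1000)]  # same data on local disk

    def bench(read_one, items):
        start = time.perf_counter()
        for item in random.sample(items, len(items)):  # random access order
            read_one(item)
        return time.perf_counter() - start

    s3_secs = bench(lambda k: s3.get_object(Bucket=BUCKET, Key=k)["Body"].read(), KEYS)
    disk_secs = bench(lambda p: open(p, "rb").read(), FILES)

    # Each S3 GET is a full HTTPS round trip (typically tens of milliseconds),
    # while a local read is microseconds to a millisecond, so the totals
    # usually differ by one or two orders of magnitude.
    print(f"S3: {s3_secs:.2f}s  disk: {disk_secs:.2f}s")

Batching, caching, or range requests can narrow the gap, but the per-request latency is the part you can't make go away.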