
Once, one of our devops engineers was testing a script with nohup yes > output.

The /home directory was mounted to an autoexpanding EFS on AWS.

23.4 TB and 2 months later we noticed the bill :)
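Back-of-the-envelope (rough sketch, taking "2 months" as roughly 60 days), that's only a few MB/s of sustained writes - which is exactly why nobody noticed:

  # rough arithmetic only; "2 months" assumed to be ~60 days
  total_bytes = 23.4e12                 # 23.4 TB of "y\n"
  seconds = 60 * 24 * 3600              # ~60 days
  print(total_bytes / seconds / 1e6)    # ~4.5 MB/s sustained - easy for one stray process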




Dare I ask: how much?


Not too much, like $4k extra a month I think - noticeable, but not the end of the world.

I assume a smaller place would just beg AWS for forgiveness and probably get it.


> Not too much, like $4k extra a month I think - noticeable, but not the end of the world.

It's nice that it's manageable and also a learning experience, but over here that would be like 2 months' take-home salary for a software engineer.

Kind of why development/test environments shouldn't have autoexpanding or scalable anything, in my experience.


I assume larger companies can deal with temporary mistakes on the order of one engineer’s salary.


Oh, they certainly can, some better than others. Though personally I'd still want to avoid such situations.

First, to retain an air of "vaguely knows what they're doing" about me, even though everyone makes mistakes and that should be treated as something that's okay - especially if you can limit the impact of mistakes, like with automated spending limits.

Secondly, because I wouldn't want to risk doing something like that in a personal project, given that my wallet is likely to be much thinner than those of organizations.


As in, paying for that engineer was a mistake all along?


No. It doesn't really impact the company's bottom line if your software engineering org is 100 people making $20k a month and someone accidentally wastes $4k of EBS disk. It's nice if you don't waste it, of course, but "oops, filled up the disk with 'y' output" is better than "yeah actually all of those files are pretty important, I think team X is using them" because you can instantly delete it, rather than doing a multi-month project to see if team X really is using the files.


Everything should be made to be bounded - is there no max that could have been set? Not using expandable storage in test risks deviations from prod (and then you get the prod-only bugs that are difficult to keep fixed). I wonder if there was a failure to do pre-prod monitoring as well - it's super handy having dashboards telling you disk usage.


> Everything should be made to be bounded - is there no max that could have been set?

I'd expect that you'd reach the maximum once your card is rejected. :)

But truthfully, many platforms out there will let you set up spending alerts, yet won't outright set limits, because then you get into a bunch of difficult questions - should further data just be piped to /dev/null? Should you as the service provider instead limit IOPS in some way, or throttle network connectivity once the allowed amount of egress data is exceeded? What about managed databases - slow them down, or throw the data out altogether?

I've talked about those in detail with some of the people here ages ago and there are actually companies that take the "graceful degradation" approach, like Time4VPS who host almost all of my cloud VPSes for now: https://www.time4vps.com/?affid=5294 (affiliate link, feel free to remove affid if you'd prefer not to have it)

  What happens if I run out of bandwidth?
  We reduce your VPS server’s port speed 10 times until the new month starts. No worries, we won’t charge any extra fees or suspend your services.
Honestly, that's a really cool idea for handling resources. One has to understand, though, that storage is a bit different: if you've built your entire platform around the concept of scalability and dynamically allotting more resources, you might choose to live with the occasional story of a large bill (some of which you'll probably forgive for good PR), as opposed to more frequent stories about things going down because people forgot to pay you, plus plenty of enraged individuals complaining that their data got deleted and that it's supposedly your fault.

So in a way it's also a business choice to be made, though one can certainly imagine hard spend limits being feasible to implement.

> Not using expandable storage in test risks deviations from prod (and then you get the prod-only bugs that are difficult to keep fixed).

I concede that this is an excellent point - you should also be able to test automatic scaling when necessary, etc.

Though the difference is probably in being able to test it, while not leaving it without (more conservative) limits when you're not looking at it.


I wonder what the quickest way to rack up the highest bill on AWS is.


Private CA + Dedicated CloudFront IPs are the fastest ways to do it in one line item, but most commonly it's massive DB instances. Why create an index when you can double the number of cores? Wait, cores don't help improve queries? But more memory is better, right? Elastic MapReduce with oversized instances used to be pretty common, but RDS is a perpetual winner for most companies I've worked with.

But the typical worst-case practices come from the small companies who think avoiding vendor lock-in is a thing that matters at their scale. Look, you're never going to change cloud providers - and if you do, that will be a problem you can solve then. You're never going to go multi-cloud - if you do, that's a problem you can solve then. Preemptively DIY'ing everything from database copies to security to encryption is going to break you: you're now engineered on a brittle substrate of hacks with no support, all so that one day you could maybe consider saving 10% on your cloud bill by moving to a different provider. The day that migration will save you more than one engineer per month, consider it. Until then, you're just making your own costs worse.

/rant


And in real life I've seen vendor lock-in cause exactly those worst fears, and worse.

Vendor lock-in is only tolerable when the service is so easily swapped out that it's not actually vendor lock-in.

There is no valid argument for not worrying about that before it happens and not bending pretty far to avoid it. No matter how hard you work to stay as portable as possible, early and at each daily step along the way, it's 10x or 1000x less effort than dealing with it later.

If you're just talking about a pluggable service, well then by definition that's not really lock-in.


I believe you, I do, but for every company afraid to use the features of a service they pay for because of vendor lock-in, I can show you five-, six-, and seven-figure bills attributable to DIY. Nothing is drop-in if it's value-added - and if it's not value-added, why are you using a vendor at all?

Portability is a huge myth that eats engineers' hours like a snack every single day and rarely pays off.


I believe that you believe me, so I'll rest there, since I don't want to get into the actual products and timescales and business sizes and business types required to make the claim more solid.


> the small companies who think avoiding vendor lock-in is a thing that matters at their scale

This. If AWS were ever to even consider dramatically increasing their prices, there are players whose AWS bills are two or three or more orders of magnitude larger than yours, who will howl and gnash their teeth, and whose potential departure from AWS does far more to protect you than you could ever do to protect yourself.

Similar stupidity includes spending engineering time on issues like how to deal with an S3 outage. If S3 is down, your competitors are down too. Nobody cares.


The people with huge AWS bills are already paying a different rate than you are. They don't care what happens to your rate.


One lock-in scenario to consider is if you may ever want to offer an on-premise version of your SaaS product. This is what my company is doing now. It's a huge pain in the butt, but it does bring in a lot of revenue.


Lambda functions triggered by a put event on S3 that also put an object _into_ an S3 bucket are so common that it's called out on the AWS documentation page [0].

[0] https://docs.aws.amazon.com/lambda/latest/operatorguide/recu...
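The usual way out (and roughly what that page recommends, if memory serves) is to never write back to the prefix you trigger on - a rough sketch of the idea, with the bucket layout and prefix names made up:

  import boto3

  s3 = boto3.client("s3")

  def handler(event, context):
      for record in event["Records"]:
          bucket = record["s3"]["bucket"]["name"]
          key = record["s3"]["object"]["key"]
          # Guard: never process our own output, or every write re-triggers the function.
          # (Better yet, exclude "processed/" in the event notification's prefix filter.)
          if key.startswith("processed/"):          # prefix name is made up
              continue
          body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
          s3.put_object(
              Bucket=bucket,
              Key="processed/" + key,               # write to a prefix the trigger ignores
              Body=body.upper(),                    # stand-in for real processing
          )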


The Lambda functions should put at least two objects into the bucket; otherwise you don't get that nice exponential growth.


Ouch!


Leak your AWS keys and a nice support team will take care of it by mining cryptocurrencies with your credit card.


Plus make sure your contact details are NOT up to date so you miss the AWS warnings...


A fork bomb that fires off GPU-enabled VMs for every instance of the fork?


That will hit provisioning limits nearly instantly


That's ok ;)


For reference, launching a single u-12tb1.112xlarge and leaving it up for a month would be around $80k.
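Rough math, if memory serves on the on-demand rate (roughly $109/hr in us-east-1 - treat it as approximate):

  hourly_rate = 109.20                   # approximate us-east-1 on-demand price; check current pricing
  hours_per_month = 730
  print(hourly_rate * hours_per_month)   # ~ $79,700 before storage, support, etc.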


A DynamoDB table provisioned at 40,000 RCU/WCU? You have been warned!
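Rough math again, using the classic provisioned-capacity rates from memory (approximate, us-east-1):

  wcu_per_hour = 0.00065                 # $/WCU-hour, approximate
  rcu_per_hour = 0.00013                 # $/RCU-hour, approximate
  hours = 730
  print(40_000 * (wcu_per_hour + rcu_per_hour) * hours)   # ~ $22,800/month before storage and traffic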


And this is why you need Grafana dashboards of your storage!
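Even a dumb cron'd check beats nothing - a minimal sketch (the path and threshold are made up, and the alerting side is left as a print):

  import shutil

  usage = shutil.disk_usage("/home")            # the mount from the story
  used_tb = usage.used / 1e12
  if used_tb > 1.0:                             # arbitrary threshold; EFS reports an effectively unbounded total
      print(f"/home is using {used_tb:.1f} TB") # swap the print for email/Slack/PagerDuty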


Nobody ignores the billing statement :P


Why didn't you have an alarm set?

If you are going to use ANY cloud provider, learn about alarms, or you will get screwed.
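For AWS specifically, even a single estimated-charges alarm goes a long way - a minimal sketch with boto3 (the threshold and SNS topic ARN are placeholders, and billing metrics have to be enabled on the account):

  import boto3

  # Billing metrics only exist in us-east-1
  cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
  cloudwatch.put_metric_alarm(
      AlarmName="estimated-charges-too-high",
      Namespace="AWS/Billing",
      MetricName="EstimatedCharges",
      Dimensions=[{"Name": "Currency", "Value": "USD"}],
      Statistic="Maximum",
      Period=21600,                              # billing metrics update slowly; 6h is plenty
      EvaluationPeriods=1,
      Threshold=1000.0,                          # pick something above your normal monthly spend
      ComparisonOperator="GreaterThanThreshold",
      AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # placeholder topic
  )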



