We've had this discussion already. Get your business off the ground by staying simple and not optimizing; if there ever comes a time when scalability becomes necessary to the health of the company, you should have the means to introduce it at that point. If you don't, it means your business model doesn't work as well as you thought it did.
(Of course this doesn't mean you should charge ahead without thinking about the future at all. If you can make something scalable at a very low cost of implementation/maintenance, go ahead, but growth should take precedence.)
Of course it always depends, but in my experience people reach for distributed systems when they could simply run a single node.
If you can run a single node, and it meets all your business requirements, then I’d argue that is simpler than having to manage the very challenging problems inherent in distributed systems.
A simple web app can handle 100k+ connections per second with latency in the 100s of microseconds. That's two orders of magnitude more than most apps need.
If you want durability you can use a service like RDS which takes care of replication and backups for you.
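For instance, a rough boto3 sketch (the instance name, class, and sizes below are made up, not recommendations) where MultiAZ gives you a synchronous standby replica and BackupRetentionPeriod turns on automated backups with point-in-time restore:

```python
# Rough sketch: provision a Postgres RDS instance that handles replication
# and backups for you. Identifier, instance class, sizes, and credentials
# are placeholders.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="app-db",        # made-up name
    Engine="postgres",
    DBInstanceClass="db.m6g.large",       # whatever fits your load
    AllocatedStorage=100,                 # GiB
    MasterUsername="appuser",
    MasterUserPassword="change-me",       # use a secrets manager in practice
    MultiAZ=True,                         # synchronous standby in another AZ
    BackupRetentionPeriod=7,              # automated backups + point-in-time restore
)
```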
What deployment orchestration tool would one use in such a case? Is it just plain systemd/docker compose and a shell script?
I surely want zero downtime for my deployments. Do I then need multiple instances of the app running on the server and do something like blue/green? In that case I also need some load balancer config management during my deployment.
How do you guys do it? Curious, because I only ever lived in the k8s/cloud native world.
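In my head the deploy script would be roughly the sketch below (the ports, paths, compose project names, and /healthz endpoint are all made up): start the idle color, health-check it, rewrite the nginx upstream and reload, then stop the old color. Is that roughly the shape of it, or is there a more standard tool people reach for?

```python
# Rough single-host blue/green sketch. Assumes two compose projects
# ("app_blue" on :8001, "app_green" on :8002), an nginx include holding the
# active upstream, and a /healthz endpoint -- all hypothetical names.
import subprocess
import time
import urllib.request

PORTS = {"blue": 8001, "green": 8002}
UPSTREAM_CONF = "/etc/nginx/conf.d/app_upstream.conf"  # included from nginx.conf

def healthy(port: int, tries: int = 30) -> bool:
    """Poll the freshly started instance until its health check answers."""
    for _ in range(tries):
        try:
            with urllib.request.urlopen(f"http://127.0.0.1:{port}/healthz", timeout=2) as r:
                if r.status == 200:
                    return True
        except OSError:
            pass
        time.sleep(1)
    return False

def deploy(new: str, old: str) -> None:
    # 1. Bring up the new color next to the old one.
    subprocess.run(["docker", "compose", "-p", f"app_{new}", "up", "-d"], check=True)
    # 2. Only switch traffic once it is actually serving.
    if not healthy(PORTS[new]):
        raise RuntimeError(f"{new} never became healthy, keeping {old} live")
    # 3. Point nginx at the new color; a reload keeps existing connections alive.
    with open(UPSTREAM_CONF, "w") as f:
        f.write(f"upstream app {{ server 127.0.0.1:{PORTS[new]}; }}\n")
    subprocess.run(["nginx", "-s", "reload"], check=True)
    # 4. Give in-flight requests a moment to drain, then stop the old color.
    time.sleep(10)
    subprocess.run(["docker", "compose", "-p", f"app_{old}", "down"], check=True)

if __name__ == "__main__":
    deploy("green", "blue")
```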
I drop in some "standard" parts like a load balancer, ingress controller, monitoring, storage automation (both for persistent volumes and databases with CNPG), Let's Encrypt, and DNS auto-registration.
On top of that I can easily deploy project/customer/workload-specific things while caring less about how common requirements are implemented, like "I need the application's port 80 to show on the public internet as app.foo.quux/API and I need an SQL database for it".
Usually there's also SSO and a few "dev happiness" touches, like a simple dashboard for reaching the relevant applications if you stumble on the root domain.
It might sound like a lot of work, but it's mostly a one-time investment (especially for the bits you haven't done before) that requires remarkably little interaction later, and it definitely takes less time than fiddling with Terraform or the like only to end up with manual shell scripts on the server (or ones so custom that only one person in the world knows them).
Depends on what your SLA is. If it's offline data processing and you can survive a 1-2h outage (while the machine restarts), why not?
I had a chance to see how such a system works, and it was great: one powerful machine and a few services that process the data. Even if it fails (which happened about twice in 5 years), you just wait until it restarts.
Also, most of the work happened during the night, and a few hours of lag was okay because everything was ready to be served before 6AM for the upcoming working hours.
For small applications, which are 99% of the applications out there: one main node and a couple of replicas for redundancy. Would that count as a distributed system?
Then there's also one more option: let it fail. Lots of things in the world fail occasionally or go down without real consequences. We seem to have forgotten that part.
I think calling out Durability is a bit of a straw man. Most services get their durability from S3 or some other managed database service. So they're really only making the "do it on a beefy machine" argument for the stateless portion of their service.
I agree with the other points for production services with the caveat that many workloads don't need all of those. Internal workloads or batch data processing use cases often don't need 4 9's of availability and can be done more simply and cheaply on a chonky EC2 instance.
The last point is part of our thesis for https://rowzero.io. You can vertically scale data analysis workloads way further than most people expect.
Another often-misunderstood fact is that if you are trying to go fast enough, i.e. at the limits of a machine, then even one machine is a distributed system. While cache coherency and a bunch of other mechanisms try to pretend it isn't, you really do need to be aware of NUMA and PCIe lane configuration if you want to get everything out of a box.
The very same techniques that make networked distributed systems work well are also useful in threaded or multi-process single-image workloads. Sure, it doesn't matter at less than ~5 Gbit line rate, but past that point it certainly does.
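To make that concrete, here's a minimal Linux-only sketch (the worker body is a placeholder) of treating each NUMA node like its own little shard of the box: read the topology from sysfs and pin one worker process per node, so that with Linux's default first-touch policy each worker's memory stays node-local.

```python
# Minimal sketch (Linux-only): one worker process per NUMA node.
# Reads node topology from sysfs and pins each worker to one node's CPUs,
# so memory it allocates tends to stay on the local node. The worker body
# itself is a placeholder.
import os
import multiprocessing as mp

def numa_nodes() -> dict[int, set[int]]:
    """Map NUMA node id -> set of CPU ids, read from /sys."""
    nodes = {}
    base = "/sys/devices/system/node"
    for entry in os.listdir(base):
        if entry.startswith("node") and entry[4:].isdigit():
            with open(f"{base}/{entry}/cpulist") as f:
                cpus = set()
                for part in f.read().strip().split(","):
                    if "-" in part:
                        lo, hi = part.split("-")
                        cpus.update(range(int(lo), int(hi) + 1))
                    else:
                        cpus.add(int(part))
                nodes[int(entry[4:])] = cpus
    return nodes

def worker(node: int, cpus: set[int]) -> None:
    os.sched_setaffinity(0, cpus)  # pin this process to one node's CPUs
    # ... do the actual work here; buffers allocated now are node-local ...
    print(f"worker on node {node} using CPUs {sorted(cpus)}")

if __name__ == "__main__":
    procs = [mp.Process(target=worker, args=(n, c)) for n, c in numa_nodes().items()]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```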
Yes, one machine is all you need in most cases, if you really think about it. If you need more, it will be pretty obvious; then you should absolutely set up more machines.
Then there is the managed-services route, where you don't even know where exactly you are deploying stuff and use Vercel, Supabase, serverless of some kind, etc. This is also valid, but then you rely on 3rd parties and their lava lamps.
> Distributed systems achieve exponentially better availability at linear cost
This seems like the opposite of my experience? Adding another nine of availability (a ~10x reduction in downtime) seems to require at least 10x more effort, so the cost is superlinear. Especially once you start reaching for 4+ nines.
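To be fair, the math behind the quoted claim is straightforward if you assume independent failures: with N replicas each available with probability a, the system is only down when all N are down, so unavailability shrinks exponentially while the replica count grows linearly. The catch is that real failures are rarely independent, and making them even approximately independent is where the superlinear effort goes. A back-of-the-envelope sketch, assuming 99% per-node availability:

```python
# Back-of-the-envelope for "exponential availability at linear cost":
# with N independently failing replicas, availability = 1 - (1 - a)**N.
# Assumes a = 99% per node and truly independent failures, which real
# systems rarely get for free.
a = 0.99

for n in range(1, 5):
    availability = 1 - (1 - a) ** n
    print(f"{n} replica(s): {availability:.8f}")

# Output:
# 1 replica(s): 0.99000000   (two nines)
# 2 replica(s): 0.99990000   (four nines)
# 3 replica(s): 0.99999900   (six nines)
# 4 replica(s): 0.99999999   (eight nines)
```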
Yeah, but you don't have to completely forgo resilience with "single machine". There's quite a few steps between "distributed" and "single machine". Why not just have 3 identical machines with a stateless monolith?
If you think distributed systems are somehow, some way, simpler, need less coordination, and can have less latency, I don't know what to tell you.
It seems to me that the author is sold on distributed systems, or it's their whole identity, and they are trying to come up with reasons to feel better or something.
It's not that you don't ever need distributed systems, but needing them is the exception, not something very frequent.
I agree with all the arguments made in the article from a certain point of view, and that is the view of a large org with several teams; his point about reducing coordination between teams is crucial in the trade-off calculation.
If you have a small org with a small number of teams then you are right, but what qualifies as "small" or "large" is not really clear, and people seem to jump into distributed-systems architecture way too soon.
I think the biggest issue with the post is the use of the term "distributed systems".
It makes the whole thing murkier, because it's unclear to me what exactly they advocate for. Multiple servers? Microservices? Just running multiple services without going too crazy on the micro? (Many "focused" small monoliths handling cross-cutting concerns might be considered microservices, but are not as micro as some people make them, etc.) Maybe running multiple servers with some distributed-systems technology, but without actually making distributed systems part of the product (k8s, some easy-to-distribute clustering databases, etc.)? None of that comes to mind for me when I see "distributed systems".
One could just as well read it as arguing for actually going deep into distributed-systems theory and practice, integrating those ideas into your product, and designing your software to use distributed persistence systems instead of a single "coherent view" SQL or other database, etc.
Literally all of them. Availability might have been a good point, but their other blog post somehow reaches 100%, so I don't believe that either. Also, not every business needs 99.9999% availability; many can give it up for good reasons (like cost and complexity).
The author is an engineer at AWS so he has an obvious bridge to sell.
That said, I do agree with the general point of needing more than one computer. But most people probably just need two, one for backup, far from “distributed system” in the traditional sense.