Does each microservice really need its own database? I have recently proposed that my team not do this initially, and I'm wondering if I'm creating a huge problem.
Isolated datastores are really the thing that differentiates microservice architecture (datastore meant in the broadest sense possible: queues, caches, RDBMSs, NoSQL catalogs, S3 buckets, whatever).
If you share a datastore across multiple services, you have a service-oriented architecture, but it is not a microservice architecture.
Note that I'm saying this without any judgement as to the validity of either architectural choice, just making a definitional point. A non-microservice architecture might be valid for your use case, but there is no such thing as 'microservices with a shared database'.
It's like, if you're making a cupcake recipe, saying 'but does each cake actually need its own tin? I was planning on just putting all the batter in one large cake tin'.
It's fine, that's a perfectly valid way to make a cake, but... you're not making cupcakes any more.
If they won't have it then they're not microservices.
The main premise is independent deployability. You need to be able to work on a microservice independently of the rest and deploy it independently, and it has to support partial rollouts (i.e. half of the replicas on version X and half on version Y), rollbacks including partial rollbacks, etc.
You could stretch it, in some quasi form, to have separate schemas within a single database, where each microservice would be responsible for managing migrations of its own schema and you'd employ some kind of isolation policy. You pretty much wouldn't be able to use anything from the other schemas, as that would almost always violate those principles, making the whole thing unnecessary complexity at best. Overall it would be a stretch, and a weird one.
Of course it implies that what used to be a simple few-liner in SQL with transaction isolation/atomicity now becomes a complex, PhD-level distributed problem: sagas, two-phase commits, do+undo actions, complex error handling because comms can break at arbitrary places, performance concerns, ordering of events. You don't have immediate consistency anymore; you have to switch to eventual consistency, very likely do some form of event sourcing, duplicate data in multiple places, think a lot about forward and backward compatibility (e.g. of event schemas), take care of APIs and their compatibility contracts, choose well between orchestration and choreography, etc.
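To make the do+undo point concrete, here's a minimal saga-style sketch (the step names and order flow are hypothetical, not any particular framework): each step pairs an action with a compensating action, and if a later step fails, the completed steps are compensated in reverse order.

```python
# Minimal saga sketch: each step pairs a "do" action with an "undo"
# (compensating) action. If any step fails, previously completed steps
# are compensated in reverse order. The service calls are placeholders.

class SagaStep:
    def __init__(self, name, do, undo):
        self.name, self.do, self.undo = name, do, undo

def run_saga(steps, ctx):
    completed = []
    for step in steps:
        try:
            step.do(ctx)              # e.g. an HTTP call to another service
            completed.append(step)
        except Exception:
            for done in reversed(completed):
                try:
                    done.undo(ctx)    # compensate, newest first
                except Exception:
                    pass              # real systems need retries / dead letters here
            raise

# Hypothetical order flow spanning three services, each owning its own datastore.
steps = [
    SagaStep("reserve_stock",
             do=lambda c: c.update(stock="reserved"),
             undo=lambda c: c.update(stock="released")),
    SagaStep("charge_payment",
             do=lambda c: c.update(payment="charged"),
             undo=lambda c: c.update(payment="refunded")),
    SagaStep("create_shipment",
             do=lambda c: c.update(shipment="created"),
             undo=lambda c: c.update(shipment="cancelled")),
]

ctx = {}
run_saga(steps, ctx)
print(ctx)  # {'stock': 'reserved', 'payment': 'charged', 'shipment': 'created'}
```

Compare that with a single BEGIN/COMMIT against one shared database and the complexity being described here becomes obvious.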
You want to employ those kinds of techniques not for fun but because you simply have to, you have no other choice - e.g. you have hundreds or thousands of developers, scale across hundreds or thousands of servers, etc.
It's also worth mentioning that you can have independent deployability with services/platforms as well - if they're conceptually distinct and have a relatively small API surface, they are potentially extractable, you can form a dedicated team around them, etc.
independent deployability, independent scalability, ease of refactoring, reduced blast radius, code ownership and maintenance, rapid iteration, language diversity (ie an ml service in python and a rest api in nodejs), clear domains (payments, user management, data repository and search), just to name a few. if two or more services need none of the above, have to hit the same database, or are too complex to communicate with each other without a shared db (ie queue nightmare, shared cache or files), those are usually signs that the two should be merged, as they probably belong to the same domain. at least that's some of the logic i follow when architecting them.
Agree and disagree - it really depends on why you are going to microservices. Is it because you have too many people trying to work on the same thing and your architecture is just a reflection of your organisation? Or is it to decouple some services that need to scale in different ways but still need to sit on top of the same data? Or is it some other reason?
I think the dogmatic “you always need a separate database for each micro service” ignores a lot of subtleties - and cost…
> Or is it to decouple some services that need to scale in different ways
This is really oversold. You could allocate another instance to a specific service to give it more CPU, or you can allocate another instance to your whole monolith to give it more CPU.
Maybe if the services use disproportionately different types of resources - such as GPU vs CPU vs memory vs disk. But if your resources are fungible across services, it generally doesn't matter if you can independently scale them.
Compute for most projects is the easiest thing to scale out. The database is the hard part.
> Compute for most projects is the easiest thing to scale out. The database is the hard part.
If you did things the "right" way to begin with. You have to keep in mind that many people in the industry need to solve problems of their own making, and this then doesn't translate or make sense to other people.
Scaling is a great example.
It's common to see web applications written in horrendously inefficient front-end languages. Developers often forget to turn "debug" builds off, or spend 90% of the CPU cycles logging to text files. Single-threaded web servers were actually fairly common until recently.
Then of course the web tier will have performance issues, which developers can paper over by scaling out to rack after rack of hardware. Single threaded web server? No worries! Just spin up 500 tiny virtual machines behind a load balancer.
Meanwhile, the database engine itself is probably some COTS product. It was probably written in C++ and is likely well-optimised, scalable, etc...
So in that scenario, the database is the "easy" part that developers can just ignore, and scaling the web servers is the "hard" part.
Meanwhile, if you write your front-end properly, then its CPU usage relative to the database engine will be roughly one-to-one. Then, scaling means scaling both the front-end and database tiers.
Yes, but when doing so seems silly, it's a good sign that they should not be separate services. Keep things that change at the same time in the same place. When your schema changes, the code that relies on it changes.
Need depends on your needs. You can share the DB but you lose the isolation. The tradeoff is up to you.
There are also different ways to share. Are we talking about different DBs on the same hardware? Different schemas, different users, different tables?
If you want to be so integrated that services are joining across everything and there is no concept of ownership between service and data, then you're going to have a very tough time untangling that.
If it's just reusing hardware at lower scale but the data is isolated then it won't be so bad.
I'm agreeing with your other replies, but with one caveat.
Each service needs its own isolated place to store data. This programming and integration layer concern is very important. What's less important is having those data stores physically isolated from each other, which becomes a performance and cost concern.
If your database has the ability to isolate schemas/namespaces, then you can share the physical DB as long as the data is only used by a single service. I've seen a lot of microservices laid out with separate write-side and read-side concerns. This is often due to scaling, as the read side and write side often have very different scaling needs. It causes data coupling between those two services, but together they form the facade of a single-purpose service, like any other microservice, to outside parties.
Additionally, you can probably get by having low-criticality reports fed through direct DB access as well. If you can afford to have them broken for a time after an update, it's probably easier than needing to run those queries through the API.
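As a rough sketch of what that schema-level isolation might look like on Postgres (the service names, roles, and connection string here are made up for illustration), each service gets its own schema and login role, plus a read-only role for the kind of low-criticality reporting mentioned above:

```python
# Sketch of schema-per-service isolation inside one shared Postgres instance,
# assuming psycopg2 and made-up service/role names.
import psycopg2

DDL = """
CREATE SCHEMA IF NOT EXISTS billing;
CREATE SCHEMA IF NOT EXISTS catalog;

-- One login role per service; each can only touch its own schema.
CREATE ROLE billing_svc LOGIN PASSWORD 'change-me';
CREATE ROLE catalog_svc LOGIN PASSWORD 'change-me';
GRANT USAGE, CREATE ON SCHEMA billing TO billing_svc;
GRANT USAGE, CREATE ON SCHEMA catalog TO catalog_svc;

-- Read-only role for low-criticality reporting queries.
CREATE ROLE reporting LOGIN PASSWORD 'change-me';
GRANT USAGE ON SCHEMA billing, catalog TO reporting;
ALTER DEFAULT PRIVILEGES IN SCHEMA billing GRANT SELECT ON TABLES TO reporting;
ALTER DEFAULT PRIVILEGES IN SCHEMA catalog GRANT SELECT ON TABLES TO reporting;
"""

with psycopg2.connect("dbname=shared_db user=admin") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```

Each service then runs its own migrations inside its own schema, so the sharing is purely physical rather than a shared data model.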
There are two ways to interpret this question, and I'm not sure which you're asking. You should not have two microservices sharing a single database (therein lie race conditions and schema nightmares), but it is totally fine for some microservices to not have any database at all.
I like microservices owning their databases. It allows you to choose the correct database for the job and for the team. Sharing state across these microservices is often a bad sign for how you’ve split your services. Often a simple orchestrator can aggregate the relevant data that it needs.
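A minimal sketch of that kind of aggregation, assuming the requests library and made-up service URLs: the orchestrator composes each service's API instead of reaching into its tables.

```python
# Sketch of an orchestrator aggregating data from two services via their
# APIs rather than querying their databases directly. URLs are hypothetical.
import requests

def get_order_summary(order_id: str) -> dict:
    order = requests.get(f"http://orders-svc/orders/{order_id}", timeout=2).json()
    customer = requests.get(
        f"http://customers-svc/customers/{order['customer_id']}", timeout=2
    ).json()
    # Combine only the fields the caller needs; each service keeps owning its data.
    return {
        "order_id": order["id"],
        "status": order["status"],
        "customer_name": customer["name"],
    }
```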
Are you talking about different DBs, or just different tables? If it's just different tables, they can operate sufficiently independently if you design them that way, so you can change the schema on one table without messing up the others.