I am not a MongoDB fan (I am a user, but not a fan), but what you are implying is FUD; MongoDB is unlikely to lose your data, but it is not as idiot-proof as traditional DBs. Anyone who bothers to read the documentation would know how to avoid data loss. Having said that, I am not happy about Mongo's DB locking, but it hasn't affected my applications' throughput yet.
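For what it's worth, the part the documentation covers mostly comes down to setting a write concern instead of relying on the old fire-and-forget default. A rough sketch with current pymongo (the database, collection, and document here are just placeholders):

    from pymongo import MongoClient
    from pymongo.errors import WriteConcernError
    from pymongo.write_concern import WriteConcern

    client = MongoClient("mongodb://localhost:27017")

    # Ask the server to acknowledge the write, replicate it to a majority
    # of the replica set, and journal it before returning.
    events = client["appdb"]["events"].with_options(
        write_concern=WriteConcern(w="majority", j=True, wtimeout=5000)
    )

    try:
        events.insert_one({"type": "signup", "user_id": 42})
    except WriteConcernError:
        # The write was applied on the primary, but the requested
        # durability guarantee could not be satisfied in time.
        raise

That covers the documented, avoidable failure modes.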
Any particular advice on how to not lose data when map-reducing from a sharded collection into another sharded collection? Because we had to migrate away from MongoDB in a hurry when ~20-40% of our data (basically 1-2 shards' worth) just never got written in that scenario. And by "in a hurry", I mean we spent a significant amount of time debugging, troubleshooting, and tracking down what was causing the issue (we were on 2.2 at the time; not sure if it's been fixed since...), and realized there was no way to fix the bug unless we were willing to delve into how Mongo did the writes from map-reduces. So once we found the underlying issue, we quickly migrated away.
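For reference, the shape of the job was roughly this; a pymongo sketch with placeholder names and a trivial count map/reduce (not our actual functions), run through a mongos against a sharded source collection with sharded output:

    from bson.code import Code
    from pymongo import MongoClient

    client = MongoClient("mongodb://mongos-host:27017")  # connect via mongos
    db = client["appdb"]

    # Trivial placeholder job: count events per user.
    mapper = Code("function () { emit(this.user_id, 1); }")
    reducer = Code("function (key, values) { return Array.sum(values); }")

    # Source collection "events" is sharded; the "sharded" output option
    # writes the results into another sharded collection.
    db.command(
        "mapReduce",
        "events",
        map=mapper,
        reduce=reducer,
        out={"replace": "events_per_user", "sharded": True},
    )

Each shard runs the job over its own chunks and then writes its results out to the shards that own the output collection's chunks, and that last step is where our data went missing.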
The number of conditions where it will silently lose data and you have no control over the write consistency is absurd. However, every time it's brought up, people shout it down because they assume you're talking about the known write issues (the 32-bit build, dataset size vs. RAM size, not setting the write concern to confirm the write, etc.), and not the completely different ones that aren't resolvable.
@ismarc, I have never encountered your particular use case. Is there a bug report with details for reproducing this? Like I said, I am not a fan of Mongo, and were I to encounter an issue like yours in my use cases, I would bite the bullet and migrate to something else.
There may still be. The first two I opened were closed pointing to docs on how to set the write concern. The reproduction is pretty easy: if a shard tries to write its results to a shard that is write-locked, none of that shard's map-reduced data after that point is written to any shard. The more evenly distributed your data, and the more shards you have, the more likely you are to hit the condition. Combined with the fact that all unsharded collections always go to the same shard, the whole system becomes useless unless you can fit your entire dataset, across all collections, in RAM on a single box.
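And the way it shows up is easy to check for: the output collection just comes up short, with no error reported back to the client. Roughly, with the trivial count job sketched above (placeholder names again):

    # With a map that emits 1 per source document and a reducer that sums,
    # the per-key totals in the output should add up to the source count.
    source_count = db.events.count_documents({})

    out_total = sum(doc["value"] for doc in db.events_per_user.find())

    # On the broken runs this came up ~20-40% short (one or two shards'
    # worth of data), without the job reporting any failure.
    print(source_count, out_total, source_count - out_total)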
No, MongoDB's persistence issues are more than just the fact that drivers previously did not default to telling the server to fsync. Here's a list of open questions I have about MongoDB in production: