Filesystems are close to being obsolete in the cloud. We still need them to get applications "started", but after that, applications shouldn't really be using a filesystem. I/O should take place through a remote (or local) service that handles the many moving parts behind what the application really wants (look up some data, read some data, write some data, delete some data) in a large distributed system (logging, tracing, authn+z, storing/retrieving, sharing, archiving, etc.). How it gets stored and on what media is pretty inconsequential to the app, and in many cases things are just more difficult because of the inherent limitations of storing data in files in filesystems on disks.
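To make that concrete, here's a minimal sketch of what I mean (every name here is hypothetical, not any real API): the app codes against a tiny get/put/delete interface, and everything behind it is the service's problem.

```python
# Hypothetical sketch: the app talks to a small storage-service interface
# instead of open()/read()/write() on a filesystem. Logging, tracing,
# authn+z, tiering, archiving, etc. live behind it, not in the app.
from abc import ABC, abstractmethod

class StateService(ABC):
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...
    @abstractmethod
    def delete(self, key: str) -> None: ...

class InMemoryService(StateService):
    """Toy backend; a real one might front S3, a database, or a cache tier."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
    def get(self, key: str) -> bytes:
        return self._data[key]
    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
    def delete(self, key: str) -> None:
        del self._data[key]

# The app never asks "which disk, which path, which filesystem?":
svc: StateService = InMemoryService()
svc.put("users/42/profile", b'{"name": "Ada"}')
print(svc.get("users/42/profile"))
```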
The other thing is, once disks are as fast as RAM, there's really no point in having RAM. It only exists because running programs would have been really slow if we had to wait for the disk to seek 100 times to print "Hello World".
Combine these two scenarios and you land smack into one new reality: all applications should be doing I/O in a giant pool of state that is both data storage and computational memory, as a global, distributed, virtual storage memory manager. In effect, clusters of computers should be sharing CPU and storage, and applications should run across all of those resources, like one big computer cut into 100 pieces. This idea used to be called SSI (Single System Image), but it can actually become reality a lot more easily if we can make the abstractions so simple that we don't have to code around ugly problems like threaded shared-memory apps.
Basically we need to make a new "layer" that will allow us to delete a couple of old ones, and that simplicity will enable whole new designs and make currently-hard problems easy.
There is sadly still a huge performance gap between most filesystems and other storage APIs.
If you tried to boot Windows with all system files stored on S3, it would take forever.
It turns out the overhead of HTTP, encoding, extra memory copies, splitting into TCP/UDP packets, etc., is huge compared to a DMA transfer straight from an SSD into an application's memory space.
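A crude way to get a feel for the gap yourself (very much a sketch: the file path and URL below are just placeholders, a cached local read isn't a true DMA transfer, and the numbers vary wildly by machine and network, but the per-request overhead shows up regardless):

```python
# Rough micro-benchmark: average latency of a local file read vs. an
# HTTP GET. The local read is served from the page cache, which if
# anything flatters the "local" side even less than raw DMA would.
import time
import urllib.request

def time_local_read(path: str, n: int = 100) -> float:
    start = time.perf_counter()
    for _ in range(n):
        with open(path, "rb") as f:
            f.read()
    return (time.perf_counter() - start) / n

def time_http_get(url: str, n: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(n):
        with urllib.request.urlopen(url) as resp:
            resp.read()
    return (time.perf_counter() - start) / n

# Placeholder path and URL; substitute whatever exists on your system.
print(f"local read: {time_local_read('/etc/hostname') * 1e6:8.0f} us/op")
print(f"http get:   {time_http_get('http://example.com/') * 1e6:8.0f} us/op")
```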
> all applications should be doing I/O in a giant pool of state
I'm somewhat infamous as one of the last people to have given up on distributed filesystems, but even I have always believed the "giant pool of state" approach is a better model for applications. Storage should be designed for applications, not the other way around. Hide the tiers and locality and layouts behind an abstraction layer as much as possible, let 90% of the code "think" in objects or rows or arrays or whatever makes sense for it. Yes, the abstraction will leak. You'll have to deal with it when performance tuning, and probably add some other user-visible concept of consistency/durability points, but all of that should be minimized. In addition to lessening the cognitive load for most programmers, having such a layer makes it easier to adapt to new technologies in a rapidly changing landscape.
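As a toy illustration of what "hide the tiers" means (the names are made up, and a real layer would handle eviction, invalidation, and failure): the application calls get()/put() and never learns which tier actually holds the bytes.

```python
# Hypothetical tiered store: the app sees get()/put(); placement across a
# hot tier (say RAM/SSD) and a cold tier (say an object store) is this
# layer's problem, not the application's.
class DictStore:
    """Toy tier backed by a dict; stands in for RAM, SSD, or object store."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
    def get(self, key: str) -> bytes:
        return self._data[key]
    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

class TieredStore:
    def __init__(self, fast: DictStore, slow: DictStore) -> None:
        self._fast, self._slow = fast, slow
    def get(self, key: str) -> bytes:
        try:
            return self._fast.get(key)   # hot-tier hit
        except KeyError:
            value = self._slow.get(key)  # fall through to the cold tier
            self._fast.put(key, value)   # promote on access
            return value
    def put(self, key: str, value: bytes) -> None:
        self._fast.put(key, value)       # write-through to both tiers
        self._slow.put(key, value)

store = TieredStore(fast=DictStore(), slow=DictStore())
store.put("rows/17", b"...")
assert store.get("rows/17") == b"..."
```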
> The idea of this was called SSI before, but it can actually become reality
This is where we diverge a bit. SSI vs. explicit distribution is really orthogonal to storage models and abstractions. Most SSI attempts failed because the coordination/coherency cost was too high even for the compute/memory parts. Also, the semantics around things like signals, file descriptors, process termination and so on were always a mess. POSIX contains too many things that only really work on a single system (barely even a shared-memory multiprocessor, let alone a more loosely coupled kind of system), and that goes well beyond the storage parts. (Yes, young 'uns, there's more to POSIX than the filesystem part.) Tanenbaum's and Deutsch's critiques of RPC mostly apply to any kind of SSI, especially with respect to handling partial failure. While abstracting away the storage part makes sense, I don't think abstracting away a system's distributed nature is a good idea (or even possible) for most domains.
The irony is that this circles back to the early days of Java Application Servers and the idea that everything the EAR depends on (besides configuration resources) should live on the data layers, configured via JNDI.