Filesystems are close to being obsolete in the cloud. We still need them to get applications "started", but after that, applications shouldn't really be using a filesystem. I/O should take place through a remote (or local) service that handles the many moving parts behind what the application really wants (look up some data, read some data, write some data, delete some data) in a large distributed system (logging, tracing, authn+z, storing/retrieving, sharing, archiving, etc.). How it gets stored and on what media is pretty inconsequential to the app, and in many cases things are just more difficult because of the inherent limitations of storing data in files in filesystems on disks.
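To make that concrete, here's a minimal sketch of what I mean (every name here is hypothetical, not any real API): the app codes against a tiny get/put/delete interface, and everything behind it is the service's problem.

```python
# Hypothetical sketch: the app talks to a small storage-service interface
# instead of open()/read()/write() on a filesystem. Logging, tracing,
# authn+z, tiering, archiving, etc. live behind it, not in the app.
from abc import ABC, abstractmethod

class StateService(ABC):
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def put(self, key: str, value: bytes) -> None: ...
    @abstractmethod
    def delete(self, key: str) -> None: ...

class InMemoryService(StateService):
    """Toy backend; a real one might front S3, a database, or a cache tier."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
    def get(self, key: str) -> bytes:
        return self._data[key]
    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
    def delete(self, key: str) -> None:
        del self._data[key]

# The app never asks "which disk, which path, which filesystem?":
svc: StateService = InMemoryService()
svc.put("users/42/profile", b'{"name": "Ada"}')
print(svc.get("users/42/profile"))
```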
The other thing is, once disks are as fast as RAM, there's really no point in having RAM. It only exists because running programs would have been really slow if we had to wait for the disk to seek 100 times to print "Hello World".
Combine these two scenarios and you land smack into one new reality: all applications should be doing I/O in a giant pool of state that is both data storage and computational memory, as a global, distributed, virtual storage memory manager. In effect, clusters of computers should be sharing CPU and storage, and applications should run across all of those resources, like one big computer cut into 100 pieces. This idea used to be called SSI (Single System Image), but it can actually become reality a lot more easily if we can make the abstractions so simple that we don't have to code around ugly problems like threaded shared-memory apps.
Basically we need to make a new "layer" that will allow us to delete a couple of old ones, and that simplicity will enable whole new designs and make currently-hard problems easy.
There is sadly still a huge performance gap between most filesystems and other storage APIs.
If you tried to boot Windows with all system files stored on S3, it would take forever.
It turns out the overhead of HTTP, encoding, extra memory copies, splitting into TCP/UDP packets, etc., is huge compared to a DMA transfer straight from an SSD into an application's memory space.
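A crude way to get a feel for the gap yourself (very much a sketch: the file path and URL below are just placeholders, a cached local read isn't a true DMA transfer, and the numbers vary wildly by machine and network, but the per-request overhead shows up regardless):

```python
# Rough micro-benchmark: average latency of a local file read vs. an
# HTTP GET. The local read is served from the page cache, which if
# anything flatters the "local" side even less than raw DMA would.
import time
import urllib.request

def time_local_read(path: str, n: int = 100) -> float:
    start = time.perf_counter()
    for _ in range(n):
        with open(path, "rb") as f:
            f.read()
    return (time.perf_counter() - start) / n

def time_http_get(url: str, n: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(n):
        with urllib.request.urlopen(url) as resp:
            resp.read()
    return (time.perf_counter() - start) / n

# Placeholder path and URL; substitute whatever exists on your system.
print(f"local read: {time_local_read('/etc/hostname') * 1e6:8.0f} us/op")
print(f"http get:   {time_http_get('http://example.com/') * 1e6:8.0f} us/op")
```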
> all applications should be doing I/O in a giant pool of state
I'm somewhat infamous as one of the last people to have given up on distributed filesystems, but even I have always believed the "giant pool of state" approach is a better model for applications. Storage should be designed for applications, not the other way around. Hide the tiers and locality and layouts behind an abstraction layer as much as possible, let 90% of the code "think" in objects or rows or arrays or whatever makes sense for it. Yes, the abstraction will leak. You'll have to deal with it when performance tuning, and probably add some other user-visible concept of consistency/durability points, but all of that should be minimized. In addition to lessening the cognitive load for most programmers, having such a layer makes it easier to adapt to new technologies in a rapidly changing landscape.
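As a toy illustration of what "hide the tiers" means (the names are made up, and a real layer would handle eviction, invalidation, and failure): the application calls get()/put() and never learns which tier actually holds the bytes.

```python
# Hypothetical tiered store: the app sees get()/put(); placement across a
# hot tier (say RAM/SSD) and a cold tier (say an object store) is this
# layer's problem, not the application's.
class DictStore:
    """Toy tier backed by a dict; stands in for RAM, SSD, or object store."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
    def get(self, key: str) -> bytes:
        return self._data[key]
    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

class TieredStore:
    def __init__(self, fast: DictStore, slow: DictStore) -> None:
        self._fast, self._slow = fast, slow
    def get(self, key: str) -> bytes:
        try:
            return self._fast.get(key)   # hot-tier hit
        except KeyError:
            value = self._slow.get(key)  # fall through to the cold tier
            self._fast.put(key, value)   # promote on access
            return value
    def put(self, key: str, value: bytes) -> None:
        self._fast.put(key, value)       # write-through to both tiers
        self._slow.put(key, value)

store = TieredStore(fast=DictStore(), slow=DictStore())
store.put("rows/17", b"...")
assert store.get("rows/17") == b"..."
```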
> The idea of this was called SSI before, but it can actually become reality
This is where we diverge a bit. SSI vs. explicit distribution is really orthogonal to storage models and abstractions. Most SSI attempts failed because the coordination/coherency cost was too high even for the compute/memory parts. Also, the semantics around things like signals, file descriptors, process termination and so on were always a mess. POSIX contains too many things that only really work on a single system (barely even a shared-memory multiprocessor, let alone a more loosely coupled kind of system), and that goes well beyond the storage parts. (Yes, young 'uns, there's more to POSIX than the filesystem part.) Tanenbaum's and Deutsch's critiques of RPC mostly apply to any kind of SSI, especially with respect to handling partial failure. While abstracting away the storage part makes sense, I don't think abstracting away a system's distributed nature is a good idea (or even possible) for most domains.
The irony is that this circles back to the early days of Java Application Servers and the idea that everything the EAR depends on (besides configuration resources) should live on the data layers, configured via JNDI.