What's wrong with PCIe? The devices are ubiquitous. It's point-to-point, allowing device-to-device connectivity. We have external PCIe enclosures and cables. We have a healthy set of PCIe switches.
This is similar. It's also point-to-point, and it has an external story. Skimming the spec, they've even thought about higher-latency (200 ns) links, and optical. Even with those advantages, I'm unsure how it'll work out, given how little IP is available for it compared to PCIe.
Interesting, though. It will be fun to see it play out. It does bum me out a little that POWER isn't easily available yet.
PCIe is still rather slow: a single-packet transaction on PCIe costs about 120 ns.
The lack of IP is the reason P8/CAPI stuck with PCIe. In the original design, CAPI was simply going to reuse the PCIe link layer, not the transaction layer.
With a cache-coherent system built on PCIe, especially when coherency sits at the L3 level as it does on POWER8, you're looking at ~500 ns of latency for a single cache line. That kind of latency is just too much for many applications.
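For intuition, here's a rough back-of-envelope sketch in Python using the figures above; the four-step breakdown of a coherent read is my own assumption, not something from the CAPI or POWER8 docs:

```python
# Back-of-envelope only: the step count is an assumption; the 120 ns figure is
# the single-packet PCIe transaction cost quoted above.
PCIE_TXN_NS = 120      # one single-packet PCIe transaction
ASSUMED_STEPS = 4      # e.g. request out, L3/directory lookup, data fetch, completion back

coherent_read_ns = PCIE_TXN_NS * ASSUMED_STEPS
print(f"~{coherent_read_ns} ns per coherent cache-line read")  # ~480 ns, in line with the ~500 ns above

# Versus a local DRAM hit at roughly 80-100 ns, that's about a 5x penalty,
# which is why it's "just too much" for latency-sensitive workloads.
```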
It depends on how low a latency they mean by 'low latency'. If it can't drive fat gaming GPUs to full utilization, PCIe will be around for a while yet. Also, Intel isn't joining up, so PCIe is absolutely sticking around.
Xilinx offers 25 Gbps single lanes that can bond up to 4x to get IEEE 802.3-2012 spec compliance for free* with their suite. Sure, you're going to need to control those trace impedances and your board won't be something coming out of OSH Park, but those are definitely attainable speeds for the consumer (e.g., in the single thousands of dollars; not $800k Cisco VXR tier-1 infrastructure).
You can configure it as CAUI-10 (10 lanes x 10.3125 Gb/s) or CAUI-4 (4 lanes x 25.78125 Gb/s); either way, it's been production-ready for quite some time now. (The docs have numbers, but trust me, you can get full throughput within that 200 ns.)
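If you want to sanity-check the throughput claim, here's a quick calc; the 64b/66b encoding overhead is the standard one at these line rates, and treating 200 ns as the window is my reading of the figure quoted upthread:

```python
# Line-rate sanity check for the two CAUI configurations mentioned above.
CONFIGS = {
    "CAUI-10": (10, 10.3125),   # lanes, Gb/s per lane
    "CAUI-4":  (4, 25.78125),
}

for name, (lanes, per_lane_gbps) in CONFIGS.items():
    raw_gbps = lanes * per_lane_gbps          # 103.125 Gb/s raw either way
    payload_gbps = raw_gbps * 64 / 66         # 64b/66b encoding -> 100 Gb/s of payload
    bytes_per_200ns = payload_gbps * 1e9 / 8 * 200e-9
    print(f"{name}: {raw_gbps:.3f} Gb/s raw, {payload_gbps:.1f} Gb/s payload, "
          f"~{bytes_per_200ns:.0f} B per 200 ns (~{bytes_per_200ns / 64:.0f} cache lines)")
```

Both configurations land at 100 Gb/s of payload, i.e. roughly 2.5 kB (~39 cache lines) can be on the wire within a single 200 ns window, so the link rate itself isn't the bottleneck there.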
There's even production Agilent off-the-shelf test equipment out there that can fully sample at those speeds (none of that over-sampling tomfoolery, we're talking live, Bill O'Reilly style).
Back in 1989, Sun's SPARCstations had similar facilities (SBus) to push 100 Mbit between machines, so I mean, not too insane comparatively.
* Free with purchase of a Virtex® UltraScale™ or Kintex® UltraScale™ FPGA, haha.
I'd like to search the term and understand how they can achieve 200 ns latency. Am I the only one who reads a 200 ns latency figure as completion latency?