Correct, though it depends on whether you are CPU-bound or I/O-bound. We see the latter a lot more often for large data sets.
Columnar storage for PostgreSQL is especially relevant in cloud environments. Most database servers in the cloud use managed, network-attached disks because of durability, availability, and encryption-at-rest requirements. However, those do come with a performance penalty compared to local SSDs. The VMs also have IOPS and bandwidth limits, partly to manage capacity within the IaaS platform.
If you can reduce the data size by 10x, then you are effectively increasing your disk bandwidth by that much as well. Moreover, you can keep more data in memory, so you will read much less data from disk, plus you'll only read the columns used by the query. Hence, you're likely to see speedups of more than 10x for some queries, even without column-oriented execution.
That's exactly what we've seen. I don't know exactly how Citus stores data, but we found a difference of 30x between gzipped Parquet and a "generic Oracle table".
There's a huge difference between an analytical query scanning the full 30 GB and one scanning 1 GB (or maybe even half that or less if you only need to scan some of the columns).
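To put rough numbers on that, here is a back-of-the-envelope sketch for a purely I/O-bound scan. The 30 GB and 1 GB sizes are the ones mentioned above; the 250 MB/s disk throughput limit and the `scan_seconds` helper are assumptions made up for illustration, not measurements from any real system.

```python
def scan_seconds(size_gb: float, bandwidth_mb_s: float,
                 column_fraction: float = 1.0) -> float:
    """Estimate the time to read a table from disk, assuming the scan is
    limited only by disk bandwidth and reads column_fraction of the data."""
    return size_gb * 1024 * column_fraction / bandwidth_mb_s


# Assumed throughput cap on a network-attached disk (hypothetical value).
BANDWIDTH_MB_S = 250.0

row_store = scan_seconds(30, BANDWIDTH_MB_S)      # uncompressed row table
columnar = scan_seconds(1, BANDWIDTH_MB_S)        # compressed, all columns
columnar_half = scan_seconds(1, BANDWIDTH_MB_S,   # compressed, half the columns
                             column_fraction=0.5)

print(f"row store:          {row_store:.1f} s")
print(f"columnar:           {columnar:.1f} s ({row_store / columnar:.0f}x faster)")
print(f"columnar, half cols: {columnar_half:.1f} s ({row_store / columnar_half:.0f}x faster)")
```

This ignores decompression CPU cost and caching, so treat it as an upper bound on the I/O side only. Once the compressed data fits in memory, the disk term can drop out entirely.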
Wouldn't it be possible to create a new type of index in Postgres (or maybe it already exists) which would take all the data of a column and simply lay it out in columnar format in memory, with all the benefits of compression?