Things I hate about PostgreSQL (2020) (rbranson.medium.com)
439 points by latch on April 6, 2021 | 272 comments



Another recent Postgres-complaint post from one of the best engineers I've worked with: https://blog.nelhage.com/post/some-opinionated-sql-takes/

Quoting his conclusion:

> As for Postgres, I have enormous respect for it and its engineering and capabilities, but, for me, it’s just too damn operationally scary. In my experience it’s much worse than MySQL for operational footguns and performance cliffs, where using it slightly wrong can utterly tank your performance or availability. … Postgres is a fine choice, especially if you already have expertise using it on your team, but I’ve personally been burned too many times.

He wrote that shortly after chasing down a gnarly bug caused by an obscure Django/Postgres crossover: https://buttondown.email/nelhage/archive/22ab771c-25b4-4cd9-...

Personally, I'd still opt for Postgres every time – the featureset is incredible, and while it may have scary footguns, it's better to have footguns than bugs – at least you can do something about them.

Still, I absolutely wish the official Postgres docs did a better job outlining How Things Can Go Wrong, both in general and on the docs page for each given feature.


It's interesting how personal scars can entrench one's perspective. After MySQL 8's renaming-table-will-crash-server bug, I'm reluctant to use it for new projects.


In any sufficiently large dev team, the set of acceptable technology choices trends toward zero, because everyone has their "I will not work with this tech after dealing with its nasty bug."


Indeed. And products can get better and worse over time. If there's a consistent pattern of critical bugs, at least with FOSS there is the possibility of forks and third-party patches.


Agreed! My personal bias against a system depends on whether it failed at its primary purpose or not. I've had a lot of things fail in a lot of ways and I'm generally pretty forgiving, but there are some things that are just inexcusable. Some examples:

* A popular (at the time) version control system started to silently overwrite files when it ran out of disk space. We discovered this when we retrieved source code that had parts of another file in it.

* A UPS vendor had a product that tested the battery by turning off the power. When the battery ultimately failed our servers were powered off automatically. This happened randomly about every 3 weeks (usually on a weekend evening). It took us months to find the cause.

I can't come back from these problems. So, if the system is a bit slow or crashes sometimes or has other weird problems, I'm OK with that. If it bites me specifically where it's supposed to protect me, it's out, forever.


I'm personally guilty of this mindset, but it's something I'm working on. After you get burned by a system, you know of the bug, and you can fix it. But the instinct is to switch to a new system, or to rewrite the system. That does get rid of all the bugs in the old system! But, in the process you've replaced them with brand new bugs that nobody has seen or heard from, until they decide to crawl into your mouth while you're asleep and you wake up in a panic. It's bugs all the way down, folks... this is software we're dealing with!

I think sendmail is the classic example of a program with so many bugs it had to be rewritten from scratch, and many people did. All of those alternatives, even qmail (widely debated: https://www.qualys.com/2020/05/19/cve-2005-1513/remote-code-...), ended up with a bug or security problem too. And they seem to have even fixed sendmail. It's still around and it doesn't take down the whole Internet every three weeks anymore. Wow! Sometimes there are just a million bugs, and you fix all one million of them, and then there aren't any more bugs.


In reality, the smart engineering choice is to know exactly what you are expecting from a specific system, and test it. Bugs, mistakes and even cosmic rays happen, so you must specifically test that it works. If you have a backup power system, test it.


sendmail is not a good example of a bad rewrite: first, qmail is no more a rewritten sendmail than Linux is a rewritten Unix - qmail/exim/postfix are just different software for the same use case. Second, many other MTAs (e.g. Postfix) managed to maintain a much lower number of security vulnerabilities than sendmail.


> It's still around and it doesn't take down the whole Internet every three weeks anymore.

Maybe that's because the vast majority of email no longer goes through sendmail.


As long as MySQL can't run DDL statements in a transaction, it's worthless as far as I'm concerned.

Also the thing where they (used to?) silently truncate your data when it wouldn't fit in a column is absolutely insane. I'll take operational footguns over losing half my data every damn time.
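For anyone who hasn't run into it, a rough sketch of what transactional DDL buys you in Postgres (table and column names here are made up for illustration):

  BEGIN;
  -- schema change and backfill run as one atomic unit
  ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';
  UPDATE orders SET status = 'legacy' WHERE created_at < '2020-01-01';
  -- any error (or an explicit ROLLBACK) leaves the schema exactly as it was
  COMMIT;

In MySQL each DDL statement implicitly commits, so a migration that dies halfway leaves you with a half-applied schema to clean up by hand.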


Until v8.0.16, MySQL used to accept and then just ignore CHECK constraints.

I've never been so offended by a technology as the day I discovered that; it's not a misfeature and it's not a bug -- only pure malice could have driven such a decision.


Don't forget the 3-byte encoding they invented and called 'utf8'.


I remember reading the MySQL Gotchas page back in the 2000s and that leaping out as a particularly egregious issue. It fit in with their whole ethos of "databases don't need transactions and users don't need errors" around that time though, which put me off for life.


I'm definitely no fan of MySQL. I have, like many others, been scarred by its misfeatures. However, not having DDL in transactions isn't really a barrier to being useful. Oracle doesn't have transactional DDL either, and say what you will about the company, the product has proven itself.


But Firebird does have it, too. I find it funny how supposedly top products sometimes lack the most basic features.


The lack of ddl in a transaction is what scared me away too. Having to manually clean up after a failed migration just felt like something I shouldn't be thinking about.


As nice as having DDL in a transaction is, once you get to scale this isn't used, as you'll be doing index creation concurrently, which cannot be done inside a transaction block.
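For reference, the concurrent variant manages its own transactions internally, which is exactly why it refuses to run inside one (index and table names below are illustrative):

  -- builds the index without holding a long write lock on the table,
  -- but cannot be wrapped in BEGIN ... COMMIT
  CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);

Trying it inside a transaction block fails with "CREATE INDEX CONCURRENTLY cannot run inside a transaction block".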


They don't truncate data anymore, unless you enable it in the configuration (it's disabled by default). Invalid data (0000-00-00) is also not accepted anymore.


MySQL is as advanced as Oracle on this topic (DDL in transaction), unless Oracle has changed in the recent years.


Which means equally useless. I agree with him about transactional DDL. Having worked both with it and without it, I would never want to go back to MySQL.


I never said it was useful :o I work with SQL Server, and I have always been amazed that DDL isn't transactional in Oracle. And it's supposed to be a "serious" database. That, and the empty string being equal to NULL, but I think they did a 180 on that point in recent years.


They’re so right about performance gotchas. I worked on a large Java project a few years back that was transitioning from MySQL to Postgres, and after the switch performance was abysmal. I then spent the next 5 months optimizing queries. A lot of the issues were inner joins and how MySQL and Postgres handle lookups in inner joins differently. I would still pick Postgres over MySQL because the tools and features around it are very good.


Devil’s advocate: could it simply be that someone spent 5 months optimizing queries for MySQL before switching to Postgres? Such that Postgres performance isn’t “worse”, it just doesn’t plan queries in the same way that MySQL does.


It was definitely that. The queries were built to take advantage of MySQL features. I joined just after they switched and was told to optimize queries. It was a pretty toxic job; I was hired to work on streaming media systems (like Hulu, ESPN, etc.) but instead they had me doing query optimizations. I quit after 5 months. I was never able to talk to my boss once; he avoided me and was always too busy.


That's the kind of job I enjoy doing.

It is just not sexy and some startup rejected me when I said that's what I do.


The purpose of a query engine is so I don't have to think about this stuff. Computers are supposed to do work for me, not the other way around, dangit!


My experience was the opposite - a Java app running on MySQL that had painfully slow joins, that immediately got much faster on porting to PostgreSQL!


> it’s much worse than MySQL for operational footguns and performance cliffs

As wikipedians would say, [citation needed].

The post you link to concludes with:

> Operating PostgreSQL at scale requires deep expertise

and

> I hate performance cliffs

However, both of these statements are true for _any_ major SQL-based DB engine available, including MySQL.

As the post itself shows, Postgres is at least doing its job in guaranteeing consistency of the data, and has tools to figure out what is going on, which is absolutely crucial when 'operating at scale'.

In other words, yeah, you need deep expertise. However, no, it's not 'much worse' than MySQL for operational footguns. MySQL has a ton of footguns just the same.


Does anyone know of a quality, comprehensive book that enumerates all the things to watch out for and problems to proactively prevent when operating Postgres at scale?


It's not a book, but Christophe Pettus' blog (https://thebuild.com/blog) has a lot of really good information. In particular, his talk "Breaking PostgreSQL at Scale" goes through the problems you run into as you hit different levels of scale (https://thebuild.com/presentations/2019-fosdem-broken.pdf)


Thanks! Very helpful, that talk looks like a (very) condensed version of what I was looking for.

It looks like the video for that talk is here: https://www.youtube.com/watch?v=XUkTUMZRBE8


FYI, for anyone interested Pettus does consulting work as the founder of https://pgexperts.com/. Highly recommend working with him if you need a postgres DBA.


Thanks for that! Been doing a lot of Postgres work but first time seeing that slide deck.


Sorry for the long, rambling comment. After I wrote it I wasn't sure it added much, but since I invested so much time writing it I figured someone might find something in it useful so in that off chance I am posting it.

---

Those were really interesting reads, and it's obvious to me that the author is well experienced even if I find myself at odds with some of the points and ultimate conclusion. To be explicit, there _are_ points which resonated strongly with me.

I am by no means an expert, and fairly middling in experience by any nominal measure, but I _have_ spent a significant portion of my professional experience scaling PostgreSQL so I thought I would throw out my $0.02. I have seen many of the common issues:

- Checkpoint bloat

- Autovacuum deficiencies

- Lock contention

- Write amplification

and even some less widely known (maybe even esoteric) issues like:

- Index miss resulting in seq scan (see "random_page_cost" https://www.postgresql.org/docs/13/runtime-config-query.html)

I originally scaled out Postgres 9.4 for a SaaS monitoring and analytics platform, which I can only describe as a very "hands-on", manual process, mostly because many performance-oriented features like:

- Parallel execution (9.6+) (originally limited in 9.6 and expanded in later releases)

- Vacuum and other parallelization/performance improvements (9.6+)

- Declarative partitioning (10.0) (Hash based partitions added in 11.0)

- Optional JIT compiling of some SQL to speed up expression evaluation (11.0)

- (and more added in 12 and 13)

simply didn't exist yet. But even without all of that we were able to scale our PostgreSQL deployment to handle a few terabytes of data ingest a day by the time I left the project. The team was small, between 4 and 7 (average 5) full-time team members over 3 years, including product and QA. I think that it was possible--somewhat surprisingly--then, and it has been getting steadily easier/better ever since.

I think the general belief that it is difficult to scale or requires a high level of specialization is at odds with my personal experience. I doubt anyone would consider me a specialist; I personally see myself as an average DB _user_ who has had the good fortune (or misfortune) to deal with data sets large enough to expose some less common challenges. Ultimately, I think most engineers would have come up with similar (if not the same) solutions after reading the same documentation we did. Another way to say this is that I don't think there is much magic to scaling Postgres, and it is actually more straightforward than common belief suggests; I believe there is a disproportionate amount of fear of the unknown rather than PostgreSQL being intrinsically more difficult to scale than other RDBMSs.

The size and scope of the PostgreSQL feature set can make it somewhat difficult to figure out where to start, but I think this is a challenge for any feature-rich, mature tool and the quality of the PostgreSQL documentation is a huge help to actually figuring out a solution in my experience.

Also, with the relatively recent (last 5 years or so) rise of PostgreSQL horizontal-scale projects like Citus and TimescaleDB, I think it is even easier to scale PostgreSQL. Most recently, I used Citus to implement a single (sharded) storage/warehouse for my current project. I have been _very_ pleasantly surprised by how easy it was to create a hybrid data model which handles everything from OLTP single-node data to auto-partitioned time series tables. There are some gotchas and lessons learned, but that's probably a blog post in its own right, so I'll just leave it as a qualification that it's not a magic bullet that completely abstracts the nuances of how to scale PostgreSQL (but it does a darned lot).

TL;DR: I think scaling PostgreSQL is easier than most believe and have done it with small teams (< 5) without deep PostgreSQL expertise. New features in PostgreSQL core and tangential projects like Citus and TimescaleDB have made it even easier.


Don't knock yourself. Doing this work of scaling increases your expertise substantially, and the journey and the hurdles you cross along the way move you several standard deviations beyond the crowd. Specialist is a different term; it's more exclusionary than it is necessarily denoting of expertise. You can specialize in small applications without gaining expertise in scale.


I greatly appreciate your positivity!

Experience is a great expertise builder for sure, although I find the more experience I get the more technical expertise I realize I don't have. A bit ironic now that I am thinking about it in those terms.

But I hope my comment made scaling PostgreSQL feel approachable for others who consider themselves non-experts in the area. The message I hoped to convey was that non-experts can be successful, without trivializing the effort. Which can be a somewhat difficult line to walk.

But thank you regardless.


Thanks :) FWIW I find this level of detail useful.

What do you mean by "Index miss" – index cache miss (ie not in RAM)?


Oh, that is a self-coined term... sorry! Basically (as I understand it as a layman) the query planner can "choose" to perform a sequential scan instead of using an index when executing a query. For "normal"-sized workloads on "traditional" hardware (like 5400 rpm disk drives) this is a performance gain, but for large data sets it can cripple performance and even lock up a DB.


can you avoid that by tweaking the cost factors in postgres config?


Exactly! Reduce "random_page_cost" (the default is 4) and this becomes a non-issue. It's just one of the less well known things and somewhat difficult to diagnose. What I mean is that your database's returned-rows metric will rapidly climb while rows fetched holds steady (or decreases). Based on that alone it is a bit of a leap (or at least it was for me at the time) to conclude that the query planner was not using an index on a table that was clearly indexed.

Once you understand what is going on, it makes perfect sense. It's just getting there that is the trick.
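A minimal sketch of how you might confirm and fix this, using a hypothetical table of your own:

  EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM events WHERE account_id = 42;   -- shows a Seq Scan despite an index

  -- the default of 4 models spinning disks; on SSDs random I/O is much cheaper
  SET random_page_cost = 1.1;

  EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM events WHERE account_id = 42;   -- planner now picks the index

SET only affects the current session; use ALTER SYSTEM or postgresql.conf to make it stick.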


Out of curiosity, do these optimizations / calculations find their way into scripts like postgresqltuner [1] or do most DBA's here just apply their own learned optimizations manually?

[1] - https://github.com/jfcoz/postgresqltuner


This is the type of thing that I hope makes it into a tool like postgresqltuner, although I must confess this is the first time I have heard of it (great-looking utility though!). I am more of a generalist who happens to have spent time scaling Postgres, so I can't comment much on what a "true" DBA would do, but I find myself applying sensible settings in the Postgres config for parameters I have used before. This would be more on the "manual" side of your question. To be fair, I think the relatively straightforward nature of PostgreSQL and the great configuration documentation make this easy once you know a configuration parameter exists.

If I were to skew more of my development cycles to dedicated database management, I think incorporating analytic tools like postgresqltuner would be a must. Although I would probably cross-reference any suggestions with my past experience and dig into Postgres docs on the specific parameters it highlights. Regardless I suspect it would be a valuable source of additional context.


I think it’s worth mentioning that most of these problems only occur at a scale that only top 1% of companies will reach. I’ve been using PostgreSQL for over a decade without reaching any of the mentioned scaling-related problems.

PostgreSQL is still the best general purpose database in my opinion, and you can then consider using something else for parts of your application if you have special needs. I’ve used Cassandra alongside PostgreSQL for massive write loads with great success.


PostgreSQL is great, but I don't think your statement is particularly true.

Process per connection is pretty easy to accidentally run into, even at small scale. So now you need to manage another piece of infrastructure to deal with it.

Downtime for upgrades impacts everyone. Just because you're small scale doesn't mean your users don't expect (possibly contractually) availability.

Replication: see point above.

General performance: Query complexity is the other part of the performance equation, and it has nothing to do with scale. Small data (data that fits in RAM) can still be attacked with complex queries that would benefit from things such as clustered indexes and hints.


> Downtime for upgrades impacts everyone. Just because you're small scale doesn't mean your users don't expect (possibly contractually) availability.

I don't understand this mindset. Every tiny startup thinks they need zero downtime migrations.

At the same time, major banks and government institutions just announce maintenance windows. They just pick a time when few people use the service and then shut the whole system off for a few hours.

Sure, it's nice if your service is never down. But I'm also pretty sure that most customers prefer paying for new features rather than preparing for zero downtime migrations.

Also, considering how long PostgreSQL versions are supported, you only need to do major version upgrades every five years or so.


We're a small transport company, trying to get employees to and from work. No matter the time of day, there are always people commuting (or planning to commute). When's a good time for outage?

Imagine it's 3:30am, you just got off a shift, you can't afford a cab and the nearest subway is 10KM away. How fun is it that the transport app you rely on is down for maintenance?

Maybe that helps you understand the mindset?


I'm pretty sure I could deal with my commuting app being down for planned maintenance once a year for 3 hours. Especially if I got prior notification.

What's a good time for outage? I don't know anything about your app. You could do a database query and look for 3 hour intervals where less than 10 people are using your app. If those happen at regular times, those would be good candidate times for planned maintenance.

If you can't find a time slot like that, because you always have a significant number of people using your app at any time of day every day of the year, and the impact of a planned maintenance window would be significant to your customers, then you are probably at a scale where it makes sense to think about zero downtime migrations.

But to be honest, I think that 90% of startups don't fall into that category. I've seen founders that wasted time on multi master replication and automatic scaling just because it was fun to think about, before they even had any data or customers...


> Process per connection is pretty easy to accidentally run into, even at small scale. So now you need to manage another piece of infrastructure to deal with it.

Most places I saw this as an issue are where developers think that tweaking the number of connections will give them a linear boost in performance. Those are the same people who think adding more writers to an RWLock will improve write performance.

I agree that it's easy to run into, and it's a pretty silly concurrency pattern for today's needs. At the same time, it's just a thing you need to be aware of when using PostgreSQL and design your service with that in mind.


> I think it’s worth mentioning that most of these problems only occur at a scale that only top 1% of companies will reach

I'll echo what another commenter said. Tons of data != tons of profit.

Tons of data just means tons of data.

Source: Worked on an industrial operations workflow application that handled literally _billions_ of records in the database. Sure, the companies using the software were highly profitable, but I wouldn't have called the company I worked with 'top 1%' considering it was a startup.


It's really not that hard to have billions of rows in a modern data ingest situation, especially if you allow soft-deleting/versioning.

Honestly, anything that fits on one hard drive shouldn't be called "tons of data."


> handled literally _billions_ of records in the database

Classic example of Medium Data


I've hit many query performance regression problems with ~10 million rows, which required rewriting with CTEs and other techniques to tweak the planner. This isn't a large scale at all.


Not true - I work at a company of 400 people, and we ran into the Process-Per-Connection / pgbouncer issue.


I guess that’s very dependent on what kind of framework you’re using. The only PostgreSQL driver I’ve seen that does not have connection pooling built in is the PHP one (since PHP’s runtime model does not work in a way where that would be easily possible).


> not have connection pooling

Local connection pooling only goes a very small way to mitigate this issue. If you have enough servers hitting PG, you're going to need to add in something like PgBouncer sooner or later.
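One cheap way to see whether you're heading toward that wall (just a monitoring sketch):

  -- how many of max_connections are in use, and what they're doing
  SELECT state, count(*)
  FROM pg_stat_activity
  GROUP BY state
  ORDER BY count(*) DESC;

A big pile of 'idle' connections from many app nodes is usually the sign that a pooler like PgBouncer is overdue.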


It's still fairly easy to hit problems even when you're using an application level connection pool, simply because it's so damn easy to scale up application nodes.


Company of 20 here. Same.


>I think it’s worth mentioning that most of these problems only occur at a scale that only top 1% of companies will reach.

If you're talking about 1% of all software companies, then it's not true. You don't need to be a B2C company with XXX millions of users to have a lot of data.

>PostgreSQL is still the best general purpose database in my opinion, and you can then consider using something else for parts of your application if you have special needs.

Well, yes, you're already talking about one mitigation strategy to avoid hitting these scaling problems.


I don't think this is necessarily true. Say you have 100 sensors sampling at 1kHz for a year, you'd have ~3 trillion rows in your database and plenty of potential for scaling issues at a very reasonable price.


In that specific case, you probably want to roll up that time-series data as it gets older, while keeping the full dataset in a flat file system for data science etc if you need it.

You probably never need a millisecond-granularity data point from 6 months ago in your database.


They probably shouldn't be rows at all. They are effectively low-frequency sound files. I'd probably store them in Parquet and use an FDW in Postgres.


I have time series of 2d 64x64 sensor data, resulting in a few billion values that I'm trying to cram into some custom parquet format. I'm often surprised that it's 2021 and we're still stuck in tabular data, with n-dimensional arrays often not even considered.


Thermal or depth map camera? Really depends on how you want to process and query it. At 16k per frame, I'd store each one sequentially. Do you need to only look at a single pixel across a million frames? Or do you process groups of 20-100 frames at a time?


Most of the time, people just decide on the aggregates they want from those sensors in advance, and discard the raw data.

I worked at a company that had some IOT devices logging "still alive and working fine" every couple of minutes. There was no point to holding onto that data. You only needed to know when the status changed or it stopped reporting in, as that's all anyone cared about.


I'm starting a project in this realm right now, though only three sensors to begin with. Generally I'm leaning towards "everything in Postgres", but I think I'm going to store the raw sensor data in the filesystem.


I did a set of benchmarks recently for multi-dimensional scientific sensor data. You most definitely don't want row per measurement in PostgreSQL, but you can get surprisingly good results where you store a block of results per row in an array. For even better results TimescaleDB and ClickHouse achieved approximately 2-6 bytes per float32 timestamped measurement, depending on the dataset and shape.
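A sketch of that block-per-row layout, with a hypothetical schema:

  CREATE TABLE sensor_blocks (
      sensor_id   int         NOT NULL,
      block_start timestamptz NOT NULL,
      samples     real[]      NOT NULL,  -- e.g. one second of 1 kHz readings
      PRIMARY KEY (sensor_id, block_start)
  );

You give up per-sample indexing, but cut the row count (and per-row overhead) by roughly the block size, which is usually the right trade for raw waveforms.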


I intend to use TimescaleDB (+PostGIS) for all other data and last year researched this approach for the raw sensor data. I think they have improved compression since then.

For my use, the raw sensor data should only be retrieved 1) to perform a windowed analysis (the results of which will be stored in PostgreSQL) or 2) to display for the user. I'm planning to keep the raw sensor data in files for archival, so I think it's easier to just jam the data into 15-minute netCDF files and call it a day. Will definitely keep an open mind.


This is what I'm doing right now for my home sensor network.

3 ESP8266's with temperature, humidity, and light sensors sending a reading every second to a python app that writes a row to postgres on a raspberry pi 3.

So far the hardest bit has been getting all the services to restart on pi restart. Postgres works just fine.


I'm going to be using an RPI as well and have been messing with systemd. I know people have opinions, but I don't and it was straightforward to configure a Python script to run as a service.


How frequently do they sample?


500Hz


> 100 sensors sampling at 1kHz for a year, you'd have ~3 trillion rows

PosgreSQL is a great OLTP DB, but this looks like a good fit for ClickHouse or some time series DB.


I think a lot of issues that people complain about PostgreSQL come from the fact that the default config is not very good if one wants to run it in production, even for relatively small workloads. Things like process per connection can kick one in the foot if one is not aware of how PG works.


Every time I see comments that praise PostgreSQL over MySQL without any explanation, I tend to think they come more from a desire to bury a product from Oracle than from a real need for one over the other.


Maybe some younger developers, but I'd imagine a lot of us grew to dislike MySQL years before Oracle bought it (in 2010). I'd switched to Postgres already by then.


The problem is that you want to build something that can scale in the future.


ffs, this attitude causes massively more problems than it solves.

1. You can always change later. Uber switched from Postgres to MySQL when they had already achieved massive scale.

2. You don't know what scaling problems you're going to get until you've scaled.

3. Systems designed to scale properly sacrifice other abilities in order to do that. You're actively hurting your velocity with this attitude.

4. Every single expert in the field who has done this says to start with a monolith and break it out into microservices as the product matures. Yet every startup starts out on K8s because "we'll need it when we hit scale so we might as well start with it"

5. Twitter's Fail Whale - the problems that failing to scale properly bring are less than the problems of not being flexible enough in the early stages.

Build it simple, and adapt it as you go. Messing up your architecture and slowing down your development now to cope with a problem you don't have is crazy.


> You don't know what scaling problems you're going to get until you've scaled.

This is the point I keep repeating.

If you find yourself needing to scale, the way you scale likely does not match what anyone else is doing. The way Netflix scaled does not look anything like the way WhatsApp scaled. The application dictates the architecture. Not the other way around. Netflix started as a DVD service. Their primary scaling concerns were probably keeping a LAMP stack running and how the hell to organize, ship, and receive thousands of DVDs a day. These scaling problems have little in common with their current, streaming, scaling problems.

It's a weird thing that developers love to discuss and hype up scale and scaling technology and then turn around and warn against the dangers of premature optimization in code. If you ask me, the mother of all premature optimization is scaling out your architecture to multiple servers, sharding when you don't need to, dealing with load balancing, multiple security layers, availability, redundancy, data consistency, containers, container orchestration, etc. All for a system that could, realistically, run quite adequately on an off-the-shelf Best Buy laptop. We have gigabit ethernet and USB 3 on a Raspberry Pi today and people are still shocked you could run a site like HN off a single server. We've all been lobotomized by the cloud hype of the 2010s that we can't even function without AWS holding our hand.


I am partial to the "don't solve problems you don't have" argument which holds true in a lot of cases.

That said, the database is the one part of the system that is very tricky to evolve after the fact. Data migrations are hard. It's worth investing a little bit of time upfront to get it right.


> Data migrations are hard.

Yes, which is exactly why you shouldn't go with a highly scalable database solution. All of the solutions for really big scale involve storing data in non-normalised form, which means frequent, painful data migrations while developing features.

Best to avoid this until you have to.


Agree entirely. You're going to have to migrate anyway. May as well migrate from a database that's easy to work with.


Don't do anything obviously complex with your RDBMS and migrations are free. If all you need is a few views, tables and FKs, then migration between RDBMSs should be low effort if you have a decent RSM or ORM to plug behind it. And even with more involved things, I've written low-effort migrations from and to various RDBMSs; it's not black magic.

The little time upfront is "use pgsql unless there is a good reason not to" as your first choice.


if you don't change the schema dramatically, then it doesn't make much sense to migrate to another RDBMS, because most engines have pretty similar query planners (if you're not doing "anything obviously complex").

if you do migrate due to scaling issues, then the schema must evolve, for example: add an in-memory DB for caching, DB sharding/partitioning, table partitioning, hot/cold data split, OLTP/OLAP split, etc.


Scaling issues can present themselves in numerous ways which may not require an in-memory DB, sharding/partitioning, hot/cold splits or the like to be added; those may even already be present.

In a lot of cases, these can be used and added without locking you out of migration, since parts of them live at the application level or purely on the DB side. The query planner isn't the end-all of performance; there are plenty of differences between MySQL and PgSQL performance behaviour that might force you to switch even though the query planner won't drastically change things.


I have not seen comments about technical debt. I think you are right: It is good to take shortcuts to ship faster. When you do that, you accumulate technical debt. I think it is important to identify it and to remain aware of this debt. I've seen too many people in denial who resist change.


It's not even tech debt. It's like a "tech short" - assuming you'll have this specific scaling problem in the future, and paying the cost now.


"Tech short" - I love it. I'm going to use that.


ffs, this attitude causes massively more problems than it solves.

I don't think that it causes so many problems to just use MySQL instead of Postgres from the very beginning of a project. I like using Postgres and I understand that I shouldn't care about scaling, but if I make a good decision from the very beginning it can't hurt.


I would rather use Postgres and have an RDBMS that is quite strict and migrate data later, instead of having an RDBMS that just does what it likes sometimes.

For example, query your table "picture" with a first column "uuid" (varchar) with the following query:

SELECT * FROM picture WHERE uuid = 123;

I don't know what you expect; I expect the query to fail because a number is not a string. MySQL thinks otherwise.
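Roughly what happens in each (paraphrasing from memory, exact messages vary by version):

  -- PostgreSQL refuses the comparison outright:
  --   ERROR:  operator does not exist: character varying = integer
  SELECT * FROM picture WHERE uuid = 123;

  -- MySQL implicitly casts the varchar to a number instead, so 'abc' compares
  -- equal to 0 and '123abc' compares equal to 123, happily returning rows.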


In Oracle it will fail, but only if uuid has characters that can't be parsed as numbers...


Does that make it better? IMHO, it's even worse.


Uber switched because of a very specific problem they had with the internals of Postgres, that was handled differently in MySQL (which I believe is now "solved" anyway).

It's not that MySQL scales better than Postgres, but that Uber hit a particular specific scaling problem that they could solve by switching to MySQL.

You could well use MySQL "because it scales better" and then hit a particular specific problem that would be solved by switching to Postgres.


Is MySQL a general solution to scaling? What if your scaling problem is with writes?


That's why Vitess is so awesome - you can scale writes infinitely. There's not a truly comparable option for Postgres


It's better to work on getting all those users before planning what color the Ferrari will be.


While the executives are dreaming of exotic cars, the engineers are dreaming of exotic architectures. The difference is that when the CEO says, "It's crucial that I have this Ferrari BEFORE the business takes off," nobody takes them seriously.


The really funny part is that the engineers don't just dream of those architectures, they implement them. That's how you get an app that adds two numbers that runs on K8s, requires four databases, a queuing system, a deploy pipeline, a Grafana/Prometheus cluster, some ad-hoc Rust kernel driver and a devops team.


Exactly. That's why it would be good to have a system which is prepared for scaling in the future.


What's that system? MySQL? Are there any other OSS RDBMSes which are comparable and scale better?


I would only have thought of MySQL.


MySQL isn't a general solution to problems of scale, because you don't know what problems you're going to have until you have them. So for example if your scaling problem is ACID compliant database updates - say you're the next fintech - then I was under the impression that MySQL would be the last database you'd want to be using. Have I missed something?


I'm no expert and can't answer that. It was just my impression, and I might be wrong, that for scaling purposes MySQL is better suited. Currently I'm working on a SaaS product and the test instance that runs on DigitalOcean sometimes causes connection limit issues (even with a connection pool). Sure, my code is maybe not perfectly utilizing connections, but I'm really afraid that this will happen in production and I don't know how to fix it. On my test environment I just restart everything, but on a production environment I can't do that all the time.


The default limit on Postgres is 100 connections, so you need to ask yourself why you're exhausting all of them. The issue isn't the DB, it's the code making the connections. Advice: don't fret scaling issues, get your fundamentals right.


I think I've heard a saying about this, something about premature optimisation...


Sure you shouldn't care about scaling at the beginning. But why should you start using a system that you already know won't scale in the future?


> Sure you shouldn't care about scaling at the beginning. But why should you start using a system that you already know won't scale in the future?

Because it's well supported and solid otherwise? There's a wealth of documentation, resources of many kinds, software built around it (debugging, tracing, UIs, etc.). Because there's a solid community available that can help you with your problems?

What alternative technology is there that scales better? I guess MySQL could be it, but doesn't MySQL also come with a ton of its own footguns?


I use Postgres at the moment and I'm happy except for the process per connection part and the upgrade part. Knowing what I know now I think MySQL would have made me happier. On the other hand, it may have caused other issues I don't have with Postgres. I just hope the Postgres team maintains its roadmap based on posts like this.


Only if I _know_ I'm creating something that will definitely have huge numbers of concurrent users and someone pays me to make it scale from the start.

For a hobby project that might take off or might not, there's really no point in making everything "webscale"[0] just in case.

[0] https://youtu.be/b2F-DItXtZs


But you have to get to that future first! If you lose your customers because you can't deliver something on time due to complexity of your 'scaling-proof' system or because you can't accommodate changes requested by clients because they would compromise your architecture, scaling will be last of your worries.


Because the hyperscalable databases are much more difficult to set up, use and administer. It's not a "free" upgrade, it'll slow down everything else you do.


There are a lot of dimensions to scaling. It's hard to predict where you really will have to scale up.


My single biggest beef about PG is the lack of query planner hints.

Unplanned query plan changes as data distribution shifts can and do cause queries to perform orders of magnitude worse. Queries that used to execute in milliseconds can start taking minutes without warning.

Even the ability to freeze query plans would be useful, independent of query hints. In practice, I've used CTEs to force query evaluation order. I've considered implementing a query interceptor which converts comments into before/after per-connection settings tweaks, like turning off sequential scan (a big culprit for performance regressions, when PG decides to do a sequential scan of a big table rather than believe an inner join is actually sparse and will be a more effective filter).
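A sketch of the CTE trick (table names are illustrative); note that since PG 12 you need MATERIALIZED, because plain CTEs are now inlined by the planner:

  WITH sparse AS MATERIALIZED (
      -- force this filter to be evaluated on its own, first
      SELECT id FROM accounts WHERE plan = 'enterprise'
  )
  SELECT e.*
  FROM events e
  JOIN sparse s ON s.id = e.account_id;

Before PG 12 every CTE was an optimization fence by default, which is exactly why people leaned on them for this.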


Take a look at this Postgres Extension: http://pghintplan.osdn.jp/pg_hint_plan.html

I am even using this with AWS RDS since it comes in the set of default extensions that can be activated.
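For anyone curious, pg_hint_plan reads hints from a comment placed at the head of the query, roughly like this (table and index names are made up):

  /*+ IndexScan(events events_account_id_idx) */
  SELECT * FROM events WHERE account_id = 42;

  /*+ HashJoin(e a) Leading(a e) */
  SELECT * FROM events e JOIN accounts a ON a.id = e.account_id;

Check the project docs for the full hint list and placement rules; the comment has to precede the statement for the extension to pick it up.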


The pg_hint_plan is now being developed on github: https://github.com/ossc-db/pg_hint_plan

I recently put up a PR for a README, you can read it here: https://github.com/ossc-db/pg_hint_plan/blob/8a00e70c387fc07...


This looks very interesting. I had real difficulty where I needed both a btree and gin(pg_trgm) index on the same column. When using `like` postgres would consistently choose the btree index which resulted in performance that was something like 15secs as opposed to the 200ms or so I'd see if the gin index were used. In the end I added two separate columns, one for each index so that I could force the correct one to be used for a particular query.


ClickHouse is the opposite: it has no optimizer, so your SQL must be structured the way you want it to run: deeply nested subqueries with one JOIN per SELECT. But at least you can be sure your query runs the way you intended.


An interesting approach and I'm not sure if I'd prefer it (I happen to like my queries being optimized automatically for my very tiny databases). But wouldn't it be possible to modify PostgreSQL to work this way too? It's unclear why you'd want to switch to a whole new DBMS for this.


Well, you're better off not doing joins at all in ClickHouse, beyond small dimension tables. Don't do joins between two or more big tables at all, is generally the rule in analytics databases; instead, pre-join your data at insert time.

CH supports optimizations for low-cardinality columns, so you can efficiently store things like enums directly as strings, rather than needing a separate table for them.


PostgreSQL offers a config where you can control join order to match the query text:

https://www.postgresql.org/docs/13/runtime-config-query.html...


> My single biggest beef about PG is the lack of query planner hints.

Same here.

I did evaluate if to use PG for my stuff, but not having any hint available at all makes dealing with problems super-hard and potential bad situations become super-risky (esp. for PROD environments where you'll need an immediate fix if things go wrong for any reason, and especially involving 3rd party software which might not allow you to change the SQLs that it executes).

Not saying that it should be as hardcore as Oracle (hundreds of hints available, at the same time a quite stubborn optimizer), but not having anything that can be used is the other bad extreme.

I'd like as well to add that using hints doesn't always have to be the result of something implemented in a bad way - many times I as a human just knew better than the DB how many rows would be accessed/why/how/when/etc... (e.g. maybe just the previous "update" SQL changed the data distribution in one of the tables, but the statistics would not immediately reflect that change) and not being able to force the execution to be done in a certain way (by using a hint) just left me without any options.

MariaDB's optimizer can often be a "dummy" even with simple queries, but at least it provides some way (hints) to steer it in the right direction => in this case I feel like I have more options, without having to rethink & reimplement the whole DB approach each time some SQL doesn't perform.


"many times I as a human just knew better than the DB about how many rows would be accessed/why/how/when/etc..."

Would you say the primary problem that you have with the planner is a misestimate of the number of rows input/output from a subplan? Or are you encountering other problems, too?


(not the OP but...) I have had 3 cases in the last year where a postgres instance with less than millions of rows per table has decided to join with fancy hash algorithms that result in tens of seconds per query instead of the 5ms that it would take when it uses nested loops (i.e. literally start with the table in the from clause, apply some where clause, join to next table, apply more where clause, join to next table, and so on)

I do believe the planner was coming up with vast mis-estimates in some of those cases. 2 of the 3 were cases where the fully joined query would have been massive, but we were displaying it in a paged interface and only wanted 100 rows at a time.

One was a case where I was running a “value IN (select ...)” subquery where the subquery was very fast and returned a very small number of rows, but postgres decided to be clever and merge that subquery into the parent. I fixed that one by running two separate queries, plugging the result of the first into the second.

For one of the others, we actually had to re-structure the table and use a different primary key that matched the auto-inc id column of its peer instead of using the symbolic identifier (which was equally indexed). In that case we were basically just throwing stuff at the wall to see what sticks.

I have no idea what we’d do if one of these problems just showed up suddenly in production, which is kind of scary.

I’m sure the postgres optimizer is doing nice things for us in places of the system that we don’t even realize, but I’m sorely tempted to just find some way to disable it entirely and live with whatever performance we get from nested loops. Our data is already structured in a way that matches our access patterns.

The most frustrating part of it all is how much time we can waste fighting the query planner when the solution is so obvious that even sqlite could handle it faster.

For context, I’ve only been using postgres professionally for about a year, having come from mysql, sql server, and sqlite, and I’m certainly still on the learning curve to figure out how the planner works and how to live with it. Meanwhile, postgres feature set is so much better than mysql or sql server I’d never consider going back.


The feature set from an application perspective is killer. I love window functions especially; all sorts of clever things can be done in a single query which would otherwise require painful self-joins or multiple iterated queries and application-side joins in less sophisticated dialects.


My favorite killer feature is jsonb_agg / jsonb_object_agg, which let me pull trees of data in a single query without the exponential waste you'd get from Cartesian products, and even deliver it to the frontend without needing to assemble the JSON myself.
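A small sketch of that pattern, with hypothetical tables:

  -- one row per author, its posts folded into a single JSON array
  SELECT a.id,
         a.name,
         jsonb_agg(jsonb_build_object('id', p.id, 'title', p.title)) AS posts
  FROM authors a
  JOIN posts p ON p.author_id = a.id
  GROUP BY a.id, a.name;

The posts column can go straight to the frontend as JSON, no application-side assembly needed.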


The biggest problem I see is the planner not knowing the cardinality of columns in a big table after a join or predicate has been applied. You see this especially with aggregate queries rather than point queries.

That is, it decides that a sequential scan would be just peachy even though there's an inner join in the mix which in practice reduces the set of responsive rows, if it just constructed the join graph that way. The quickest route out of this is disabling sequential scan, but there's no hint to do that on a per-query basis. The longer route is hiding bits of the query in CTEs so the optimizer can't rewrite too much (CTEs which need MATERIALIZED nowadays since PG got smarter).

High total cardinality but low dependent cardinality - dependent on data in other tables, or with predicates applied to other tables - seems hard to capture without dynamic monitoring of query patterns and data access. I don't think PG does that; if it did, I think they'd sell it hard. It comes up with application-level constraints which relate to the data distribution across multiple tables.
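The closest thing to a per-query switch that I'm aware of is scoping it to a transaction (a sketch, with an illustrative query):

  BEGIN;
  SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
  SELECT * FROM events WHERE account_id = 42;  -- the problem query
  COMMIT;

It's coarse (it discourages seq scans for every table the query touches), but at least it doesn't leak into the rest of the session.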


Can you enumerate some use cases you've run into? Sometimes looking at the individual use cases leads to better features than trying to generalize too quickly. For instance, controlling join order might suggest a different solution than a cardinality misestimate or a costing problem.

Query plan freezing seems like an independently useful feature.


Isn't the optimizer fooled by some inadequately set parameter, for example "effective_cache_size"?

The planner may be fooled due to a too small data sample, you may try: ALTER TABLE table_name ALTER COLUMN column_name SET STATISTICS 10000;

Can't you use the autovacuumer in order to kick off an ANALYZE whenever there is a risk of data distribution shift? ALTER TABLE table_name SET (autovacuum_analyze_scale_factor = X, autovacuum_analyze_threshold = Y);


It's pretty hard to fix a complex black-box query planner with an estimate of when another black box analyze command will fix it.

That said, if you have good monitoring, you can hopefully find out when a query gets hosed and at least have a chance to fix it.. it's not terribly often.


Don’t blindly set stats to 10000, an intermediate value between the default of 100 and the max of 10000 may give you the best plans; experiment to find out.


I don't understand. At ANALYZE time, isn't, all other parameters being equal and adequate (costs, GEQO at max, effective_cache_size ...), the probability of obtaining a representative sample of column values better with a larger number of randomly-selected values? Then at planning time, isn't the devised plan of better quality?

Adding sampled values may be bad performance-wise, for example if the planner cannot take everything into account due to some margin/interval effect, and therefore produces the same plan using a bigger set of values. The random selection process may also, sometimes, select less-representative data in the biggest analyzed set. But how may it never lead to the best plans (which may be produced using a smaller analyzed set) or lead to (on average) worse plans?


First, your point about planning time is important, thanks for adding that.

Regarding my point, it's possible that the planner may provide a better (on average, faster executed) plan for a given key, if that key is not found in stats, and that keys for which this is true may fit a pattern within the middle of the stats distribution. It all depends on the database schema and stats distributions.


I understand and neglected this case, thank you!


I think this is a good list, one needs to know potential pitfalls and plan accordingly.

As for point #7, if your upgrade requires hours, you are holding it wrong, try pg_upgrade --link: https://www.endpoint.com/blog/2015/07/01/how-fast-is-pgupgra...

(as usual, before letting pg_upgrade mess with on disk data, make proper backups with pg_basebackup based tools such as barman).


My only complaint with pg_upgrade (with or without --link) is that for some reason it does not move the statistics over, so you have to rebuild them, or have horrible performance for a while until each table hits its auto-analyze thresholds.

I'm doing some testing now for my DB, and the rebuilding all stats takes far, far longer than the upgrade. The upgrade takes seconds, and it takes a while to analyze multi-TB sized tables, even on SSDs.


Yeah, there's some kludgey workaround for this that is definitely 80/20 kind of material...pg_upgrade will generate a script that does progressively more accurate re-ANALYZE so you're not flying your early queries totally blind. Maybe look into running that.


I will add one minor point to this list:

The name.

To this day I am convinced that the haphazard UpperCASE usage is what has granted us:

- A database called PostgreSQL

- A library called libpostgres

- An app folder called postgres

- An executable called psql

- A host of client libraries which chose to call themselves Pg or a variation.


Let's not forget the column casing issue: you either get columns to match your app's casing and have to quote them everywhere, or live with them being lower-cased automatically. https://dev.to/lefebvre/dont-get-bit-by-postgresql-case-sens...
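A quick illustration of how that bites (column names hypothetical):

  CREATE TABLE users (userId int, "createdAt" timestamptz);

  SELECT userid FROM users;       -- works: unquoted identifiers fold to lower case
  SELECT "createdAt" FROM users;  -- works, but you must quote it everywhere, forever
  SELECT createdAt FROM users;    -- ERROR: column "createdat" does not exist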


Don’t forget “libpq”


Fun fact: PQ is short for toilet paper in French, so libpq always cracks me up. But then again there's a theorem prover called Coq (which is indeed pronounced as you imagine, and means rooster) and it was named by French researchers at INRIA!


libcaca, also french: http://caca.zoy.org/


Ok, let's talk about this... pico, the editor and SI prefix, means "dick" in at least some Spanish-speaking countries... Source of endless nerd jokes.


Not to mention the (un)surprisingly low sales in spanish speaking countries of the car known as the Vauxhall Nova ...


As an American, I feel like I have to deliberately mis-pronounce 'coq' the theorem prover like 'coke' the soda.


Same. It really doesn't help that their logo is skin-colored and the shape that it is...


PostgreSQL used to be called Postgres. They renamed it when they added SQL support.


This is true. Many people get confused by the name. I've met several developers who refer to it as "Postgray" or some variation.


Recently I saw "post grayskull" on twitter, that's now my favourite. ;)


I've never heard that before, but I kind of like it. Sounds a bit nicer and is easier to say than 'Post Gress'. I'm still not sure how to pronounce the proper name, when it has 'SQL' as part of it. I think it's 'Post Gress Queue Ell', but it feels… bad.


Renaming would be even worse!


Seems to me that they could safely rename to Postgres without much downside.


Among other potential issues, this would make it much harder to search for information related to the database. Starting out, it'd always make sense to google for eg "postgres ilike", but for new features you'd have to search for eg "NewNameSQL kindalike" (assuming a new ILIKE replacement called KINDALIKE comes along in pg15 aka newname3).

Even years in to the rename, newcomers to NewNameSQL would need to be told that it used to be called Postgres and that they should look for things related to that too.

Tools and code that refer to Postgres would all have to change their names, including those developed internally, open-source, closed-source, and no-longer-maintained. Not all would, and some would change the name and functionality at the same time.

It'd be chaos.


I meant change the name from PostgreSQL to Postgres.


Ah! So sorry for misunderstanding. Yes that sounds like a straightforward, good idea!


I think this is a reasonable list of weaknesses, with a few quibbles. I guess since I've built parts of Heroku Postgres, and Citus Cloud, and now Crunchy Bridge...maybe I'd know.

On the other hand...on the whole...maintaining Postgres is probably among the cheapest pieces of software on which I have to do so, which is why the cloud business model works. Something less stable (in all senses of the word) would chew up too much time per customer.


I'd be very curious to hear your quibbles!


I don't think the post informs on Physical and Logical replication that well.

Most database systems of adequate budget and maturity implement both, for various reasons.


Interesting, thanks. Yeah I was surprised to hear his skepticism of logical replication, but I've never operated it in production before. Curious for resources on that.


You mean physical, re: skepticism. Just different things. Bulky for "CREATE INDEX" or "VACUUM", but also faster for a lot of things (no decoding) and able to more naturally deal with incomplete transactions. A good way to get a feel for that is to read how people compare using either one for proprietary databases that have both.


Only thing I really hate about PostgreSQL (probably not specific to it) is the lack of visibility into triggers. Give me jaeger style tracing for each trigger with per statement durations and I would be a very happy dev.


Your statement intrigued me, so I fired up the ol' googles and started looking to see if anyone had tried this. And within the first page of results I found a comment from you a few months ago saying the same thing! :)

This seems really interesting - at least for debugging (I worry that it would tank performance under load). Have you considered trying to work on it? My googling suggest that you seem rather interested in the idea! The postgres community is overall really welcoming to contributions (as is the OpenTelemetry community, hint hint).


I keep posting it on HN hoping a Postgres dev hears my plea :)

I've never programmed in C for anything serious, so I'm not sure where I'd even start. I _think_, based on my limited knowledge of postgres extensions, you'd have to bake the jaeger sampling into PG proper--I don't think extensions can intercept/inspect triggers.


I think the solution would be to add triggers to the dtrace probes.

https://www.postgresql.org/docs/current/dynamic-trace.html


Gosh I remember when Postgres didn't have any streaming replication. That was a huge pain point. You had to manually ship the WAL files to the standby and use a trigger for fail-over... and pray that your standby is actually up to date.

The code in Postgres is written in a pragmatic, no-nonsense style and overall I'm quite happy with it. I've been bitten at times by run-away toast table bloat and the odd query plan misfire. But over all it's been a really solid database to work with.


I'm surprised nobody is complaining about the complexity of the permission system.

I'm a generally smart guy, but setting up default permissions so that new tables created by a service user are owned by the application user... is shockingly complicated.

(I love using Postgres overall, and have no intention of going back to MySQL.)


Yes! Postgres permissions are a huge pain to manage! You have to worry about table ownership, grants on existing objects, and default grants on new objects.
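
For reference, it ends up being three separate layers, each with its own syntax; a minimal sketch (role, schema, and table names are hypothetical):

    -- 1. Ownership: whoever creates the table owns it, unless you change it.
    ALTER TABLE app.orders OWNER TO app_owner;

    -- 2. Grants on objects that already exist.
    GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA app TO app_rw;

    -- 3. Default grants for objects a specific role creates in the future.
    ALTER DEFAULT PRIVILEGES FOR ROLE service_user IN SCHEMA app
        GRANT SELECT, INSERT, UPDATE ON TABLES TO app_rw;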


My only complaint about PostgreSQL is COUNT() being quite slow compared with MySQL.

Everything else is pretty good, MySQL has compressed tables, but in PostgreSQL the same amount of data already takes less space by default.

Pghero/pg_stat_statements are also very handy.

But "hate"? No, no hate here :)


just so you're aware, COUNT() on mysql can lie.

Basically it's fetching metadata on the table, which in some cases hasn't been updated (yet), whereas in pg it actually counts entries in the index.


Does it really count entries in the index? For example, in Firebird, it has to fetch rows because of row versioning (which happens in data pages, not in indices), and since PostgreSQL does versioning, too, I would have assumed that it's subject to the same limitation if it wants to return a correct answer for the current transaction.


Index Only Scans are a thing in PostgreSQL; however, they may still need to visit the heap if the visibility map bit for the heap page indicates that not all tuples on the heap page are visible to all transactions. When a high percentage of pages are marked as "allvisible", Index Only Scans can give a good boost to performance.


So this "visibility map" is a little bit like Netfrastructure/Falcon in-memory versioning, then? I see.


The visibility map is just 1 bit per page. Vacuum sets these bits to "1" when it sees that all tuples on the page are visible to all transactions, i.e. all tuple xmins are <= the oldest running transaction and none of the tuples have been marked as deleted by any transaction yet. The visibility map bit will be unset when a new tuple is added to the page or an existing one is "deleted", or more accurately, has its xmax set to the deleting transaction's ID.

The visibility map is stored on-disk as a different fork of the filenode for the table. Two bits are actually stored per page: one for visibility and another to mark whether the page contains only frozen tuples. The frozen bit helps reduce the cost of vacuuming the table for transaction wraparound, which is also mentioned in the blog post.

The query planner does not count these bits to decide between an Index Only Scan and an Index Scan; instead it uses an approximate value stored in pg_class.relallvisible.
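
If you're curious how a given table looks, you can read those numbers straight out of the catalog (table name is made up):

    -- Fraction of pages marked all-visible; higher means cheaper index-only scans.
    SELECT relname,
           relallvisible,
           relpages,
           round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
    FROM pg_class
    WHERE relname = 'orders';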


Ha, snap. Makes sense that it looks at the dirtiness of the visibility_map while planning.


I believe it can speed it up by using index only scans along with the visibility_map which effectively tells it which entries are “current” in more broad strokes.


Doesn't this happen only when using sql_calc_found_rows?


That's only for MyISAM which sees very little use today. The InnoDB engine on MySQL does a full row count and is also relatively slow.


You may be interested in this technique, or some of the others in the article: https://www.citusdata.com/blog/2016/10/12/count-performance/...

EDIT: I'm also curious what version of Postgres you've experienced this on? Sounds like there may have been improvements to COUNT(DISTINCT) in v11+


If you're not making a lot of writes, that may be a good approach, since AFAIK the counts will only be updated after the table is ANALYZE'd
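
One of the cheap tricks (whether or not it's the exact one in the article) is just reading the planner's own estimate out of the catalog, which ANALYZE/autovacuum keeps roughly current; table name is hypothetical:

    -- Approximate row count; cheap but only as fresh as the last ANALYZE.
    SELECT reltuples::bigint AS approx_rows
    FROM pg_class
    WHERE oid = 'public.orders'::regclass;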


> #1: Disastrous XID Wraparound

> Pretty much any non-trivial PostgreSQL install that isn’t staffed with a top expert will run into it eventually.

I agree that this landmine is particularly nasty - and I think it needs to be fixed upstream somehow. But I do think it is fairly well known at this point. Or at least, people outside of "top expert[s]" have heard of it and are at least aware of the problem by now.


In normal use XID wraparound is not particularly problematic. For it to be an issue you either have an application that for some reason assumes XIDs monotonically increase (for example in its implementation of optimistic locking), or you have a significantly larger issue with totally unmanaged MVCC bloat (caused either by not running vacuum at all or by having a ridiculous number of ridiculously long-running transactions active at once).

But then there is an interesting related issue in some client libraries: the XID is a 32-bit unsigned value, and some libraries that transparently implement optimistic locking (eg. ODBC) interpret it as a 32-bit signed value. I somewhat suspect that most people who had a "significant production outage caused by XID wraparound" were in fact bitten by this or something similar.
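
Either way, it's cheap to keep an eye on how close each database is to wraparound, something like:

    -- Age (in transactions) of the oldest unfrozen XID per database.
    -- autovacuum_freeze_max_age defaults to 200 million; alert well before that.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;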


Maybe there are some core PostgreSQL hackers here:

I know this probably sounds silly but for the transaction ID thing, it does seem like a big deal, is it really insurmountable to make it a 64 bit value? It would probably push this problem up to a level where only very, very few companies would ever hit it and from a (huge) distance the change shouldn't be a huge problem.


There have been several discussions about this, and if I recall correctly the main issue is that this would bloat the tuple size even more (PostgreSQL already has a large per-tuple overhead). The most promising proposal I have seen is to have 64-bit XIDs but only store the lower 32 bits per tuple, with a per-page epoch for the upper bits.


store it (something) like protobuf does - the smaller the number, the fewer bytes it takes?


One thing I hate about such articles is this "((use)) a managed database service" hint. Many if not most readers' data are confidential and storing them on a machine managed by unknown people seems foolish to me. Am I paranoid?


> Am I paranoid?

Yes, because letting someone who knows what they are doing run your database is in most cases a better idea / more secure than doing it yourself if that's not your main business. If you pick a reputable provider there's not really an incentive for them to not keep your data confidential.

Example: all the open MongoDB instances out there, exposed to the internet by their owners through simple configuration mistakes.


Yes, but no.

I'm a staunch believer that multi-tenant hardware and managed services are _obvious_ no-gos for privacy reasons.

But, having done B2B where I had to deal with security procedures/questionnaires/documentation/checklists from large customers, no one else agrees.


From the point of view of those large customers, would you really trust people working in company X more than AWS/Azure/GCP? Especially since those customers already use other SaaS providers, which probably use at least one of the big cloud providers.

There definitely are companies that employ great engineers, follow best practices, and can be on par with big cloud providers, but generally you shouldn't really expect that. In such cases, I'd rather see they leverage managed services, instead of deploying their own servers.


Both SaaS and multi-tenant hardware have massive surface area.

For multi-tenancy, it isn't about trusting AWS/Azure/GCP, it's about trusting everyone you're sharing hardware with.

Cloud products are difficult to set up (AWS in particular). If you can't set up PostgreSQL properly, why are we assuming you can set up AWS properly? Look at the recent Endgame pen testing tool (1)

(1) - https://news.ycombinator.com/item?id=26154038



You are, unless you have a very good reason to treat your cloud provider as a likely malicious actor, in which case good luck setting up your own data center.


I don't see how it's any different than using any hosting provider. It's probably worth encrypting your databases, but if you don't trust your hosting provider you're hosed -- managed service or not.

If your paranoia is justified (which it may be, depending on your needs), you need to host the machines in your own datacenter


> If your paranoia is justified (which it may be, depending on your needs), you need to host the machines in your own datacenter

Indeed! That's the reason why the author shouldn't write "((use)) a managed database service", à la "whatever you have in hand, screws or nails, use a hammer!"


>While much of this praise is certainly well-deserved, the lack of meaningful dissent left me a bit bothered.

Had the same feeling when I was reading that thread. And I have for quite some time now, whenever the hype gets over the top.

The problem is that tech is often a cult. On HN, mentioning that MySQL is better at certain things and hoping Postgres improves will draw out the Oracle haters and Postgres apologists. Or, as they are titled in Silicon Valley, evangelists.

And reading through all the blog posts from the author, this [1] caught my attention. Part of it is relevant to the discussion because AWS RDS solves most of those shortcomings. What I didn't realise was the 78% premium over EC2.

[1] RDS Pricing Has More Than Doubled

https://rbranson.medium.com/rds-pricing-has-more-than-double...


I have a kneejerk reaction against "there is something, anything at all, wrong with PostgreSQL" posts. I don't think it's because I'm in a cult.

I think it's because, despite real flaws, PostgreSQL is still the best all-round option, and still the thing I would most like to find when I move to a new company. Every post pointing out a flaw with PostgreSQL is potentially ammunition for an energetic but misguided early-stage employee of that company to say "no, let's not use PostgreSQL, let's use ${some_random_database_you_will_regret} instead".

I suppose the root of this is that I basically don't trust other programmers to make good decisions.


That is true as well. I guess my point is I want balanced views. I don't want one-sided opinion pieces.


Me neither! As long as those balanced views are only posted on the secret internet, where mature and sophisticated programmers such as the two of us can read them.


One thing that I miss from PostgreSQL is transparent encryption. Some information systems require encryption of personal data by law. It's trivially implemented with commercial databases, so you can enable it and check a box. Not so much with Postgres.


This seems better done at the storage layer. Doing it in the database layer is a good idea if you're monetizing your database by CPU core, of course.


This blog post answered a lot of questions related to the internals, allowing me to make a better (real) comparison between SQL Server and Postgres.

For all of these issues the author pointed out, SQL Server simply does things differently and suffers none of the stated pitfalls. Then again, you can't get the source code, and it is not free.


Yeah, and SQL Server has its own set of warts and tradeoffs, as with any design. Just the nature of these things. Lots of these issues are getting focus of some sort from the Postgres hackers.


> In terms of relational databases, Galera Cluster’s group replication is also imperfect, but closer to the ideal.

As a longtime Galera user, I have to admit this "closer to the ideal" has nothing in common with reality. It fails, it loses data, quorum kills healthy nodes, transactions add significant latency. The more nodes you have, the lower the performance and fault tolerance. A single MySQL node could literally endure triple the load that would be deadly for a Galera cluster of 3 nodes. Also, it rolls back transactions silently.


Is the process-per-connection issue the reason why Digital Ocean etc. have such low limits on their concurrent connection settings? Even on my test database I sometimes run out of connections.


Set up a droplet with a load balancer


Need to investigate this, thanks.


Yes


> many of the issues brought up in this post can be reduced or eliminated by using a managed database service like Heroku PostgreSQL, ...

They come with their own issues though. I was unable to change a ludicrously small variable (which I think was temp_buffers) on Heroku's largest Postgres option. There was no solution. I just had to accept that the cloud provider wouldn't let me use this tool and code around what otherwise would have worked.

That said, at least backups and monitoring are easy.


Discussed at the time:

PostgreSQL's Imperfections - https://news.ycombinator.com/item?id=22775330 - April 2020 (134 comments)

Other things someone else hated:

Things I Hate About PostgreSQL (2013) - https://news.ycombinator.com/item?id=12467904 - Sept 2016 (114 comments)


Does anyone know if there is a way to monitor sudden changes in planning behavior for a given query? For example, I'd like to monitor/alert on this kind of situation: I'm executing the same query for many weeks/months, some table involved in the query is slowly growing over time, at some point the query planner responds to the growth by changing the plan.


The connection issue is the most important one.


I agree. PgBouncer (or something better) should be baked into Postgres.
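
Until then, at least the bolt-on is small; a minimal pgbouncer.ini sketch (values are illustrative, not recommendations):

    [databases]
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; transaction pooling gives the biggest connection savings, but doesn't
    ; play well with session state (SET, LISTEN/NOTIFY, session-level prepares)
    pool_mode = transaction
    max_client_conn = 2000
    default_pool_size = 20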


For the ones who know: is MySQL (without Percona) affected by the same issues the author is talking about?


It's a mixed bag; the author does make a few direct comparisons to MySQL.

In my experience MySQL replication and cluster defaults are generally a lot more robust, whether you're using Galera or PXC or even just master-slave replication. Other pain points the author discussed, like Postgres effectively being major-version incompatible between different replicas, are less of an issue in MySQL, especially since it is most typically configured with statement-based replication (ie, SQL statements sent over the wire instead of a block format for data on disk). With statement-based replication, as long as the same SQL statements are supported on all members of the cluster / replicas, the differences between server versions are usually negligible. You wouldn't want to make a practice out of replicating between different MySQL server versions, but you could, and this is sometimes very useful for online upgrades.

MySQL absolutely fares better on the thread-per-connection front, a single MySQL server can manage a far larger pool of connections than a single equivalent Postgres server can without extra tooling.

InnoDB also uses index-organized tables, which tend to be more space-efficient in some scenarios, and has native compression too - but both MySQL and Postgres can benefit from running on compressed volumes in ZFS.

Honestly, I think most of the hate MySQL gets is perhaps rightfully justified for far earlier versions of the database or specifically for the MyISAM storage engine. But if you're using MySQL 5.7ish or later, InnoDB, and some of the other Percona tooling for things like online DDLs you've got an extremely robust DBMS to work with. My current company uses Postgres on RDS, but I've maintained complex MySQL setups in the past on bare metal, and either approach has been perfectly serviceable for long term production use.


> But if you're using MySQL 5.7ish or later, InnoDB, and some of the other Percona tooling for things like online DDLs you've got an extremely robust DBMS to work with.

Would you say that Percona is a must when using MySQL in production? I have some experience with Postgres, and the fact that pgbouncer is needed in production environments makes me think "why doesn't Postgres come with batteries included?".


The landscape is already a bit weird in terms of "out of the box" features for MySQL because there are at least a few major distributions between Oracle MySQL, Percona XtraDB, MariaDB, etc. They're generally all compatible. They have some different defaults, and very rarely different features. But particularly for XtraDB you tend to get access to better diagnostic info and some better defaults for InnoDB (or at least you used to, my info may be a year or two out of date at this point).

While it is true that MySQL support for online DDLs has gotten much better over the years, I think tools like pt-online-schema-change are still extremely valuable - there are still certain kinds of changes that you can't make with an online DDL in MySQL, or sometimes you specifically don't want to take that approach. But I'd think of the Percona Toolkit stuff as more a nice set of tools to have in your DBA toolkit, rather than an essential part of your DBMS for anyone running it in production. It's not like the pgbouncer situation. Everybody wants to avoid process-per-connection, but plenty of people can get by without complex online schema migrations.


Most of these are Postgres specific, though MySQL is going to have its own list of issues.


Ah I remember when Mandrill was hit with the XID thing, painful day.


1. Can't define variables as easily as in SQL.

2. Weak support for functions/SPs as a result.

3. Naming conventions inconsistent with SQL, which makes ORMs a pain.


What are you referring to as "SQL"? As far as I know, ISO-standard SQL doesn't have variables outside of its procedural language. I'm also not aware of any SQL-wide naming conventions. But I haven't read any of the standards, so I could be missing something.


"T-SQL" - in reality, everyone just refers to it as SQL Server even though it's obviously not a language.

I find this syntax super helpful:

    DECLARE @someName AS VARCHAR(100);
    SET @someName = 'Bob';

When coupled with SPs and Functions, you can write some fairly readable logic in SQL. I am completely against this for most applications, but when you need it, it helps.


Oh, alright. I know T-SQL and have actually missed variables in PostgreSQL before. Just please don't shorten it to "SQL", it's only one dialect of many.


> have actually missed variables in PostgreSQL before

You can just use a DO-block with plpgsql (or any other installed procedural language) code to make variables available in Postgres. T-SQL just somewhat elides the distinction between declarative and procedural code.

https://www.postgresql.org/docs/current/sql-do.html
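
For example, a minimal sketch:

    DO $$
    DECLARE
        some_name text := 'Bob';
    BEGIN
        RAISE NOTICE 'Hello, %', some_name;
    END
    $$;

The main limitation is that a DO block can't return a result set the way an ad-hoc T-SQL batch can.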


imho all of these are easily offset by the fact that unlike in sqlite or mysql, `cast('1,031' as int)` will throw an error instead of returning 1.


Schemas are also a lot more usable than namespaces in SQL, as a random note.


Rather than a query planner, an interesting approach would be to expose the more stable part of the internals with a new language and let people roll their own query plans. Then Postgres can be NoSQL too and we can all be happy.

I'm not hopeful that it would be technically feasible, but it isn't obvious that Postgres needs to only support SQL as an interface. The SQL language is so horrible I assume it is already translated into some intermediate representation.


> Rather than a query planner, an interesting approach would be to expose the more stable part of the internals with a new language and let people roll their own query plans.

Those that are ignorant of history are doomed to repeat it.

Go read Stonebraker's "What Goes Around Comes Around"

https://15721.courses.cs.cmu.edu/spring2020/papers/01-intro/...


Thank you!

This paper (only started on it) looks fantastically interesting. And yes, I'm old enough to actually have worked on hierarchical databases.


> expose the more stable part of the internals with a new language and let people roll their own query plans.

You’re basically talking about ISAM style access at this point. Even IBM started discouraging that on IBM i and is pushing developers to use embedded SQL instead.


I would love it if PostgreSQL had packages.

I have tons of PL/SQL that I would like to move from Oracle to PostgreSQL.


I used to miss packages but I mainly use schemas now to organize. Not the same I know but as good as it gets. I also add https://github.com/okbob/plpgsql_check so that I can find bugs earlier since plpgsql is not compiled like pl/sql.


That is the best way, I think. And it even has some plusses: you could have local tables/views.

But, if you already have a lot of packages and a lot of schemas, separating your packages into schemas seems a bit daunting. Even more so if they have a lot of dependencies.


For a moment there I thought this was gonna be written by Richard Branson.


Are some of these problems solved by CitusData?


Microsoft is working hard to fix a lot of the problems e.g. connection scalability [1].

[1] https://techcommunity.microsoft.com/t5/azure-database-for-po...


That's great. Connection scalability is my biggest issue with Postgres currently. It sounds like they work on it by committing directly to Postgres and not only to CitusData.


They're investing rather a lot into Postgres; it's great.


I find it hard to understand that despite long term withering criticism of the XID Wraparound issue, a definitive solution does not seem to have been prioritized.


Hate is a strong feeling to have towards a database.


Postgres still doesn't support native global replication :( and this is kind of a deal breaker for me.


Like many developers, I've used postgresql unquestioningly for many years, say 10. But if you ask me, I've rarely had to face stringent scaling or availability requirements.

Many deployments were a variation of AWS RDS or a similarly managed offering for HA. And they weren't exactly flawless.

So this article is good food for thought. Why are we using postgresql? Because of the features, developer-friendliness, and so on.

Does that align with actual customer requirements? Not necessarily.

This problem framing aligns well with other phenomena one can observe. i.e. many complexities in our industry are completely self-inflicted (some personal 'favorites': OOP, microservices, SPAs, graphql).

What if postgresql could be added to that list?

Note that I'm not necessarily questioning postgresql as a project, but instead, the hype process that can lead communities to unquestioningly adopt this or that product without making an actually informed assessment.


I think Postgres is great for many of the same reasons that people often (wrongly) tout NoSQL systems for. It's flexible, featureful, simple and quick to get started. And unlike most NoSQL systems, it has full ACID compliance and can scale well past MVP stage to the point that most businesses will never hit its limitations.

If you do hit really huge scale then you will need to start looking beyond Postgres to solutions like Cassandra, Scylla, etc. But hopefully by that point you have a large dev team capable of handling the extra complexity.


PostgreSQL is better at being MongoDB than Mongo is.

You can just add a JSON column and do queries on the content, index the table on individual values etc.
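
A minimal sketch of that (table and field names made up):

    CREATE TABLE events (
        id   bigserial PRIMARY KEY,
        body jsonb NOT NULL
    );

    -- A GIN index supports containment (@>) and key-existence queries.
    CREATE INDEX events_body_gin ON events USING gin (body);

    -- Find documents containing a given key/value pair.
    SELECT id, body->>'user' AS username
    FROM events
    WHERE body @> '{"type": "signup"}';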


Postgres is a fantastic relational database, but for storage, indexing and querying of JSON documents there is no comparison to MongoDB. MongoDB is built from the ground up as a JSON store (actually a binary encoding of JSON called BSON). I would encourage you to try it out; I think you would be surprised by what is in the box.


One of the benefits of using Mongo is its horizontal scalability, not necessarily its ability to store documents.


That is somewhat missing the point of MongoDB. We start with easy to use document storage, querying, indexing, and dynamic schema. We end with horizontal scalability.


I wouldn't say PostgreSQL is a self-inflicted problem, like e.g. some argue microservices are; it is rather that choosing PostgreSQL optimizes for developer experience over things like flawless HA and zero-downtime upgrades. So the tradeoff is being able to build products faster and cheaper while sacrificing some reliability.


You have to put your data somewhere... are commercial database vendors or nosql solutions that much better?


I cannot make an expert assessment myself but for one thing mysql is occasionally picked precisely because of its HA story.

That was the case in a couple of past jobs of mine; back then I didn't particularly appreciate it, but now I might see it with different eyes.

Also commercial databases do have a better HA story - at least that's their reputation.

The current status quo is that everything should be free, but obviously that has a fundamental contradiction with being a professional software developer in the first place.


What do you make of Amazon Aurora?


Do you have any good examples of Graphql increasing complexity (as opposed to REST)? A dev at my work is pushing for it, and I am against it because it's another tech that isn't really solving any problems that we currently have (too much complexity is our number one problem). It would be good to have some examples to show them.


I like to think of GraphQL as a specific tool to address a specific set of challenges. Is it convenient to be able to batch up a bunch of requests in one go for a frontend JS client? Would it be useful to be able to specify a subset of the fields to be returned in a query? If the answers to these questions and others are "yes," GraphQL might be a good fit for what one is doing.

I've personally had good experiences with it. It does bring in additional complexity. Is the complexity worth taking on in a specific dev team? This is something that will depend on the team.

It is usually the case that a tool is not useful outside of the set of challenges it is tailored to address. And in some cases a tool is not well designed or implemented. It's always good to use the right tool for the job, at the level of complexity that a team is in a good position to handle.


I keep hearing that you can specify only the fields that you need, but to be honest that isn't a problem we have. I have only taken a cursory glance, but adding a schema at the request level, which then needs to be translated to a Mongo query, just seems to be adding complexity rather than removing it. And URLs are going to be ugly if you are using GET requests, one of the things I am pushing for with a REST API. (Currently our API is not RESTful and most requests go over POST, giving less information when debugging.)


You can use GraphJin; it automatically compiles GraphQL queries into Postgres SQL. The core idea is to simplify frontend development: https://github.com/dosco/graphjin


The first paragraph,

> Over the last few years, the software development community’s love affair with the popular open-source relational database has reached a bit of a fever pitch. This Hacker News thread covering a piece titled “PostgreSQL is the worlds’ best database”, busting at the seams with fawning sycophants lavishing unconditional praise, is a perfect example of this phenomenon.

is exactly the kind of gratuitous, over-the-top statement that makes me immediately lose respect for the author.

And skimming the rest of the article, most of the "points" are just complaints about the tradeoffs made in various design decisions like MVCC, heap tables with indexes on the side, etc. The author is basically complaining "it's not MySQL".

Don't waste your time with this article.


This comment is very far from the mark: it would be closer to the truth to say the article is mostly complaining about the lack of tools to help ameliorate problems caused by the architecture.

The author clearly likes Postgres and ends the piece by saying he expects all the problems he talks about to be solved in time.


Ok, what do you say about this one?

> #9: Ridiculous No-Planner-Hints Dogma

One of these "query shifts" that the author mentions happened with a production database where I work. It was down for two days. The query planner used to like using index X but at some point decided it didn't want to use that and decided it wanted to do a table scan inside a loop instead. Meaning: one day a certain query was working fine, the next day the same query never finishes. In my opinion this is unacceptable.

What's your take?


(I'm not very familiar with Postgres, but this is common among RDBMSs.) What changed is size and/or statistics. Also, if the query optimizer supports something like parameter sniffing, that may be what happened. Unacceptable? Not really. Annoying? Very much so.


I feel that is the least fair of the complaints (I agree with several of them and have some of my own too). Not because query hints are not desirable, but because who is going to pay for maintaining them? It is not really dogma (I, with some help, managed to convince them to merge one very specific query hint: MATERIALIZED for CTEs), but rather that they do not want to expose more of the innards of the query planner than necessary, so as not to slow down planner development. Planner development is hard enough as is; with query hints it would become at least twice as hard due to even more fear of breaking applications.


STRAIGHT_JOIN is probably my favourite feature of MySQL in terms of planner hints; but there's actually a deeper inconsistency behind the philosophy.

Usually, when you get a bad query plan, it's because the join order isn't right. Outside the start table and hash joins, indexes need to match up with both predicates and the join keys. Get the wrong join order and then your indexes aren't used.

Since you need to specify which indexes to build and maintain, and such indexes are generally predicated on the query plan, why not ensure that the query is using the expected indexes?

If one really wants to go down the route of no optimizer hints, then the planner should start making decisions about what indexes to build and update. Go all in.
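
For anyone who hasn't used it: STRAIGHT_JOIN just pins the join order to the order the tables appear in the FROM clause, which is often all the control you need. A sketch with made-up tables:

    -- MySQL: read `orders` first and join `customers` to it,
    -- instead of letting the optimizer pick the join order.
    SELECT STRAIGHT_JOIN o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at >= '2021-01-01';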


> why not ensure that the query is using the expected indexes?

How should your database know this unless you explicitly named the indexes in your query? Just because there exists an index on a particular predicate does not mean that using the index would result in a faster query.


Right, db developers decide what indexes to add but aren't allowed to decide if and how they are used.

Join order / type and which indexes to use would go a long way; that's pretty much all I need to do on MSSQL server if the planner is not cooperating.


> Join order

Had to fight this a few times, planner thought it was smart to scan an index for a few million rows, then throw almost all of them away in a join further up, ending up with a few hundred rows.

Caused the query to take almost a minute. Once the join order was inverted (think I ended up with nesting queries) the thing took a second or two.


I wouldn't paint all hints with the same brush - it's not like the very fact of having a hint exposes query planner internals for arbitrary usage. Some hints may be more useful than others, and some may be less complicated to maintain - why not try to investigate if there's an intersection of these two sets that would be a valuable addition to Postgres?


Maybe. The traditional query hinting implementations like MySQL's are all about exposing planner internals, but maybe if someone proposed a form of query hints which is less intrusive then there might be fruitful discussion about it. I think a huge issue is that as soon as someone mentions query hints, people immediately think of MySQL's and similar solutions.


We've had a few cases like that at work with SQLAnywhere, where it suddenly switches to table scans of some critical table.

In almost all cases simply recalculating statistics fixes it. We had one or two cases where we needed to drop and recreate some of the indexes, which was much more annoying.

Doesn't happen often, but really annoying when it does.


i think you need to provide more details for a good reply. what changed between the time index was used and when it wasn’t? I also had to “convince” postgresql to use my index but that lead to a much better design


> i think you need to provide more details for a good reply. what changed between the time index was used and when it wasn’t? I also had to “convince” postgresql to use my index but that lead to a much better design

I disagree: given that nothing changed, I don't think any details need to be provided.

The question is NOT "Is postgresql's choice better than mine?" The question is "A certain design was working and suddenly broke because one day the query planner decided to start choosing a different (and unusable) plan - is this ever acceptable?" and the answer is obviously No, regardless of the details.


I guarantee you that something changed. Maybe the row count passed a certain threshold. Maybe you upgraded the database version.

If you don't want the query planner to pull arbitrary execution behaviour out of its ass, why are you using an SQL database in the first place? The whole point of SQL is that you declare your queries and leave it up to the planner to decide, and for that to be at all workable the planner needs to be free to decide arbitrarily based on its own heuristics, which will sometimes be wrong.


Thing is, MySQL, with judicious use of STRAIGHT_JOIN, won't do the same thing. And generally MySQL is much more predictable because it's much less sophisticated: it only has a couple of join strategies (pre 8.0, only nested loop join) and quite limited query rewriting, so you can - with practice - expect a query plan as you write the SQL. And in practice, there's usually only two or three really big tables involved in performance-sensitive queries, tables where you need to pay attention that you don't end up doing scans. The rest of the tables you can leave up to the planner.


> And in practice, there's usually only two or three really big tables involved in performance-sensitive queries, tables which you need to pay attention that you don't end up doing scans on.

The problem is that those few big tables are often critical to most of the queries and so each query has to carefully use the right hints or query order if the planner isn't doing much.

It almost makes me wonder if indexes themselves should get hints or at least priorities to help the planner order operations.


I honestly think that for live operations an SQL database is more trouble than it's worth - sooner or later you need more control than it gives you, so you're better off using a datastore that gives you lower-level access to construct and use your own indices explicitly. SQL makes sense for reporting-type use cases where you don't know exactly what queries and aggregations you'll be doing ahead of time (but have a rough idea of which columns you might need to index on), but that's all.


The problem is that something changed at a random time in a production db, on a weekend in the middle of the night. What changed, and is that logged somewhere?

Other databases show that you can let the planner decide if you don't specify, but with some simple hints you can override it, because I as the developer am in charge, not the planner.


> I disagree: given that nothing changed, I don't think any details need to be provided.

You sound like a typical enterprise customer. "The whole system stopped working!!!!" "What did you change?" "Nothing!!!" "Are you sure?" "Yes!!!" .. searching around, looking into logs, and so on .. "Could it be that someone did x? The logs say x has happened and had to be done manually." "Oh yes. x was done by me."

But, obviously, nothing has changed.


You sound like you have to deal with idiots all the time and resent that.

You also sound like you don't know much about PostgreSQL if you can't immediately see what happened.



