A few things I really like about pgloader (which to be clear, uses the COPY protocol):
- Handles MySQL -> pg, MSSQL -> pg, and pg -> pg copies
- Very fast (lisp/"real threading")
- Syncs schemas effectively (unlike the data-only approach described above)
- Has its own DSL for batch jobs: specify tables to include/exclude, rename them on the fly in the destination, and cast data types between source and destination if needed (see the sketch after this list)
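To give a feel for the DSL, here's a rough sketch of a pgloader command file. The connection strings, table names, and the rename are made-up placeholders; the WITH, CAST, MATCHING, and ALTER clauses follow the syntax in the pgloader docs, but double-check against the current reference:

    LOAD DATABASE
         FROM mysql://user:pass@mysql-host/sourcedb
         INTO postgresql://user:pass@pg-host/targetdb

     WITH include drop, create tables, create indexes, reset sequences

     CAST type datetime to timestamptz drop default drop not null
              using zero-dates-to-null,
          type tinyint to boolean using tinyint-to-boolean

     INCLUDING ONLY TABLE NAMES MATCHING ~/^orders/, 'customers'

     ALTER TABLE NAMES MATCHING 'customers' RENAME TO 'clients';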
I tried migrating a smallish MySQL database (~10 GB) to Postgres and it always crashed with a weird runtime memory error. Reducing the number of threads or migrating table by table didn't help.
I found that compiling it with Clozure CL instead of SBCL leads to far better memory allocation/usage.
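If you want to try that, the build goes roughly like this. The CL variable is what the pgloader README documents for picking the Lisp implementation; verify against the current repo before relying on it:

    # Build pgloader with Clozure CL instead of the default SBCL.
    git clone https://github.com/dimitri/pgloader.git
    cd pgloader
    make CL=ccl pgloader
    ./build/bin/pgloader --version
    # If staying on SBCL, bumping its heap (e.g. make DYNSIZE=4096)
    # can also help with heap-exhaustion crashes.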
It's worth getting it going: it's one of the best tools for migrating things into and out of pgsql I've ever used. If you have pgsql in your pipeline, get it working.
I used this and Debezium to make my ETL pipeline absolutely bulletproof.
"Debezium is a set of distributed services that capture row-level changes in your databases so that your applications can see and respond to those changes."
It has a lot of required components, but I ran most of them in a single-node setup (e.g. one ZooKeeper server and one Kafka broker, using only their provided containers, as sketched below) and got extremely far (think a billion rows a day ingested, then shipped to pgsql and S3) with headroom to spare.
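For reference, a minimal single-node Debezium stack looks roughly like this. The image tag and topic names are placeholders of mine; the debezium/* images and the env vars follow the Debezium tutorial:

    # Single-node ZooKeeper + Kafka + Kafka Connect from Debezium's images.
    docker run -d --name zookeeper -p 2181:2181 debezium/zookeeper:2.5
    docker run -d --name kafka -p 9092:9092 --link zookeeper:zookeeper debezium/kafka:2.5
    docker run -d --name connect -p 8083:8083 \
      -e GROUP_ID=1 \
      -e CONFIG_STORAGE_TOPIC=connect_configs \
      -e OFFSET_STORAGE_TOPIC=connect_offsets \
      -e STATUS_STORAGE_TOPIC=connect_statuses \
      --link kafka:kafka \
      debezium/connect:2.5
    # Source/sink connectors are then registered via Connect's REST API on :8083.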
Just offering an anecdotal experience to counter, not saying you or your experience is wrong...
I've used pgloader multiple times for that exact use case without issue, because I'm a huge Postgres evangelist. Honestly, it's a favorite tool in my toolbox.