This is an engineering conversation about pg_lake, a new open-source Postgres extension that lets you query and manage Iceberg tables directly from Postgres.
Marco Slot, who has EXTENSIVE experience, shares various engineering internals with us, like:
• how pg_lake makes analytics (literally) 100x faster
• why Postgres is architecturally terrible at analytical queries (and how vectorized execution fixes this)
• how (and why) pg_lake intercepts query plans and delegates parts of the query tree to DuckDB
• Marco's hard-won experience through a decade+ career in Postgres
• versatility as the real moat of Postgres
• the practical engineering differences between OLTP and OLAP
• and a lot more
TIMELINE
0:02 What is pg_lake?
2:23 Postgres' 100x-slower problem and the columnar storage experiments they ran to make Postgres fast for analytics
6:00 practical examples and internals
16:20 perf internals: vectorized execution & CPU optimization
23:00 pg_lake architecture (why DuckDB isn't embedded) and the connection-per-process issue
29:16 how pg_lake intercepts the query plan tree and delegates parts to DuckDB
41:09 Iceberg catalogs
48:24 postgres to iceberg ingestion patterns (and pg_incremental)
53:40 Marco's (long) career: early AWS, Citus, Microsoft, Crunchy Data & Snowflake
1:04:20 Marco's observations on the merging of OLTP and OLAP (and the subtle dev differences there)
1:15:30 reverse ETL
1:33:08 Iceberg as the TCP/IP for tables
1:35:00 Marco's thoughts on the "Just Use Postgres" fever
Marco
You can find Marco on:
LinkedIn: https://www.linkedin.com/in/marcoslot/
GitHub: https://github.com/marcoslot
Transcript
Feed this into your favorite AI for summarization, or to ask it specific questions:
https://gist.githubusercontent.com/stanislavkozlovski/65c037a8963e49d8121b25003ec94715/raw/4f51f5dcd562b42e8d511b8bc58f0fff6ad5302e/foo.md
OTHER PLATFORMS
Watch on Spotify here
General RSS here




