
Tag: Databases
2 entries found

DuckDB and httpfs behind a proxy: the secret nobody tells you
The problem: httpfs ignores your environment variables
If you work with DuckDB and the httpfs extension to read remote Parquet files, CSVs from S3, or any HTTP resource, you probably assume that the HTTP_PROXY and HTTPS_PROXY environment variables work just like every other tool. Curl respects them. wget respects them. Python requests respects them. Node.js respects them.
DuckDB does not.
I ran into this while working in a corporate environment with a mandatory proxy. I had a script reading Parquet files from Google Cloud Storage using httpfs, and it simply would not work. No clear error, no descriptive timeout, just silence. Meanwhile, a curl to the same resource with the same environment variables returned data without issue.

How PostgreSQL Estimates Your Queries (And Why It Sometimes Gets It Wrong)
Every query starts with a plan. Every slow query probably starts with a bad one. And more often than not, the statistics are to blame. But how does it really work?
PostgreSQL doesn’t run the query to find out — it estimates the cost. It reads pre-computed data from pg_class and pg_statistic and does the maths to figure out the cheapest path to your data.
In the ideal scenario, the numbers read are accurate, and you get the plan you expect. But when they’re stale, the situation gets out of control. The planner estimates 500 rows, plans a nested loop, and hits 25,000. What seemed like an optimal plan turns into a cascading failure.




