
Tag: Duckdb
2 entries found

DuckDB and httpfs behind a proxy: the secret nobody tells you
The problem: httpfs ignores your environment variables
If you work with DuckDB and the httpfs extension to read remote Parquet files, CSVs from S3, or any other HTTP resource, you probably assume that the HTTP_PROXY and HTTPS_PROXY environment variables work just as they do with every other tool. curl respects them. wget respects them. Python requests respects them. Node.js respects them.
DuckDB does not.
I ran into this while working in a corporate environment with a mandatory proxy. I had a script reading Parquet files from Google Cloud Storage through httpfs, and it simply would not work: no clear error, no descriptive timeout, just silence. Meanwhile, curl fetched the same resource with the same environment variables without any issue.
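One workaround, sketched here under assumptions rather than taken from the post, is to read the proxy variables yourself and hand them to DuckDB explicitly. The sketch assumes DuckDB's `http_proxy` setting accepts a `host:port` value and that credentials go into the separate `http_proxy_username` and `http_proxy_password` settings; check the httpfs documentation for your DuckDB version. The function name `duckdb_proxy_statements` is hypothetical, and it only builds SQL strings, so it runs without DuckDB installed.

```python
import os
from typing import List, Mapping, Optional
from urllib.parse import unquote, urlsplit


def _sql_literal(value: str) -> str:
    # Quote a value as a SQL string literal, doubling embedded single quotes.
    return "'" + value.replace("'", "''") + "'"


def duckdb_proxy_statements(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Translate HTTPS_PROXY / HTTP_PROXY into DuckDB SET statements.

    Assumption: DuckDB's http_proxy takes "host:port", with credentials
    passed via http_proxy_username and http_proxy_password.
    """
    env = os.environ if env is None else env

    # Prefer the HTTPS variable, since most remote Parquet/S3/GCS URLs are HTTPS.
    raw = None
    for name in ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"):
        if env.get(name):
            raw = env[name]
            break
    if raw is None:
        return []

    # urlsplit only fills in hostname/port when a scheme is present.
    parts = urlsplit(raw if "://" in raw else "http://" + raw)
    if parts.hostname is None:
        return []

    host = parts.hostname
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    statements = [f"SET http_proxy = {_sql_literal(host)};"]
    # Credentials embedded in the URL are percent-encoded; decode them first.
    if parts.username:
        statements.append(
            f"SET http_proxy_username = {_sql_literal(unquote(parts.username))};"
        )
    if parts.password:
        statements.append(
            f"SET http_proxy_password = {_sql_literal(unquote(parts.password))};"
        )
    return statements
```

With the `duckdb` Python package, you would run each returned statement through `con.execute(stmt)` on your connection before the first `read_parquet` call against the remote file.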

DuckDB: File Formats and Performance Optimizations
Lately I’ve been working quite a bit with DuckDB, and one of the things that interests me most is understanding how to optimize performance according to the file format we’re using.
Working with Parquet is not the same as working with compressed or uncompressed CSV, and the performance differences can be dramatic.
Let’s review the key optimizations to keep in mind when working with different file formats in DuckDB.
Parquet: Direct Query or Load First?
DuckDB has advanced Parquet support, including the ability to query Parquet files directly without loading them into the database. So when should you query a file in place, and when should you load it first?




