DuckDB and httpfs behind a proxy: the secret nobody tells you

The problem: httpfs ignores your environment variables

If you use DuckDB with the httpfs extension to read remote Parquet files, CSVs from S3, or any other HTTP resource, you probably assume that the HTTP_PROXY and HTTPS_PROXY environment variables work the way they do in other command-line tools and HTTP clients. curl respects them. wget respects them. Python requests respects them.

DuckDB does not.

I ran into this while working in a corporate environment with a mandatory proxy. I had a script reading Parquet files from Google Cloud Storage using httpfs, and it simply would not work. No clear error, no descriptive timeout, just silence. Meanwhile, a curl request to the same resource with the same environment variables returned data without issue.

Why this happens

DuckDB’s httpfs extension uses cpp-httplib internally for HTTP connections. Unlike curl or libcurl, this library does not automatically read system environment variables for proxy configuration. It is a design decision (or a gap, depending on how you look at it).

This has been reported in multiple GitHub issues:

  • Issue #3836: http_proxy and https_proxy env vars are ignored when installing extensions
  • Issue #6064: Request for proxy and custom SSL cert support in httpfs
  • Discussion #5944: Community discussion on the same topic

The situation has partially improved. Since version 0.10.x, DuckDB supports explicit proxy configuration. But the key word is explicit. It does not do it automatically by reading the environment.

The solution: CREATE SECRET or SET

DuckDB provides two mechanisms to configure the proxy manually.

Option 1: CREATE SECRET

CREATE SECRET http_proxy (
    TYPE http,
    HTTP_PROXY 'http://proxy.company.com:8080'
);

If your proxy requires authentication:

CREATE SECRET http_proxy (
    TYPE http,
    HTTP_PROXY 'http://proxy.company.com:8080',
    HTTP_PROXY_USERNAME 'username',
    HTTP_PROXY_PASSWORD 'password'
);

Option 2: SET (configuration options)

SET http_proxy = 'http://proxy.company.com:8080';
SET http_proxy_username = 'username';
SET http_proxy_password = 'password';

Both options work, but CREATE SECRET is cleaner because it groups all configuration in a single statement and is consistent with how DuckDB manages S3 and GCS credentials.
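When the statement is assembled in application code instead of typed by hand, the values need to be quoted as SQL string literals. Here is a minimal sketch in Python. The helper names sql_literal and build_proxy_secret_sql are hypothetical (they are not part of DuckDB), and CREATE OR REPLACE is used on the assumption that the statement may run more than once on the same connection:

```python
def sql_literal(value):
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_proxy_secret_sql(proxy, username=None, password=None):
    """Build the CREATE SECRET statement for an HTTP proxy (hypothetical helper)."""
    options = ["TYPE http", f"HTTP_PROXY {sql_literal(proxy)}"]
    # Credentials are only added when both parts are present.
    if username and password:
        options.append(f"HTTP_PROXY_USERNAME {sql_literal(username)}")
        options.append(f"HTTP_PROXY_PASSWORD {sql_literal(password)}")
    return "CREATE OR REPLACE SECRET http_proxy (" + ", ".join(options) + ");"
```

The returned string can be passed straight to con.execute(...) on an open DuckDB connection.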

Automating it: reading environment variables

What you really need in a production environment is for your application to read environment variables and configure DuckDB automatically. Here is a Node.js example using @duckdb/node-api:

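// Assumes `connection` is an open @duckdb/node-api connection.
// Prefer the HTTPS variable, then fall back to HTTP (upper- and lowercase).
const proxy = process.env.HTTPS_PROXY
  || process.env.https_proxy
  || process.env.HTTP_PROXY
  || process.env.http_proxy;

// Double single quotes so a value stays valid inside a SQL string literal.
const sqlQuote = (value) => `'${value.replace(/'/g, "''")}'`;

if (proxy) {
  const proxyUrl = new URL(proxy);
  // URL.port is empty when the URL uses the scheme's default port,
  // so fall back to that default instead of a hard-coded proxy port.
  const defaultPort = proxyUrl.protocol === 'https:' ? 443 : 80;
  const proxyBase = `${proxyUrl.protocol}//${proxyUrl.hostname}:${proxyUrl.port || defaultPort}`;

  let secretSQL = `CREATE OR REPLACE SECRET http_proxy (TYPE http, HTTP_PROXY ${sqlQuote(proxyBase)}`;

  if (proxyUrl.username && proxyUrl.password) {
    // Decode percent-encoded credentials (for example %40 back to @).
    secretSQL += `, HTTP_PROXY_USERNAME ${sqlQuote(decodeURIComponent(proxyUrl.username))}`;
    secretSQL += `, HTTP_PROXY_PASSWORD ${sqlQuote(decodeURIComponent(proxyUrl.password))}`;
  }
  secretSQL += ');';

  await connection.run(secretSQL);
}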

And in Python:

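import os
from urllib.parse import unquote, urlsplit

import duckdb

con = duckdb.connect()

# Configure the proxy first: behind a proxy, INSTALL needs it as well.
names = ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy")
proxy = next((os.environ[n] for n in names if os.environ.get(n)), None)
if proxy:
    parts = urlsplit(proxy)
    port = f":{parts.port}" if parts.port else ""
    settings = {"http_proxy": f"{parts.scheme}://{parts.hostname}{port}"}
    if parts.username and parts.password:
        settings["http_proxy_username"] = unquote(parts.username)
        settings["http_proxy_password"] = unquote(parts.password)
    for name, value in settings.items():
        escaped = value.replace("'", "''")  # keep the SQL string literal valid
        con.execute(f"SET {name} = '{escaped}'")

con.install_extension("httpfs")
con.load_extension("httpfs")

# Remote reads now go through the proxy
con.execute("SELECT * FROM read_parquet('https://example.com/data.parquet')")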

Watch out for special characters

There is an additional gotcha documented in Issue #14279: if your proxy password contains special characters that are URL-encoded in the environment variable (for example @ as %40), DuckDB passes them as-is without decoding, and authentication fails.

The solution is to decode before passing them to DuckDB (as I do in the Node.js example with decodeURIComponent) or to use passwords without special characters.
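To make the decoding step concrete, here is a small Python sketch that splits a proxy URL into the three values DuckDB expects and percent-decodes the credentials. The function name split_proxy_url is hypothetical, and the example URL is made up for illustration:

```python
from urllib.parse import unquote, urlsplit


def split_proxy_url(proxy_url):
    """Return (proxy_without_credentials, username, password).

    Username and password are percent-decoded; either is None when absent.
    """
    parts = urlsplit(proxy_url)
    port = f":{parts.port}" if parts.port else ""
    base = f"{parts.scheme}://{parts.hostname}{port}"
    username = unquote(parts.username) if parts.username else None
    password = unquote(parts.password) if parts.password else None
    return base, username, password
```

For a variable such as http://user:p%40ss@proxy.company.com:8080, the password comes back as p@ss, which is the form to hand to HTTP_PROXY_PASSWORD or http_proxy_password.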

It also affects extension installation

This problem does not only affect HTTP queries. It also affects INSTALL httpfs and all other extensions. If you are behind a proxy and cannot install extensions, the solution is the same: configure the proxy before attempting the installation.

SET http_proxy = 'http://proxy.company.com:8080';
INSTALL httpfs;
LOAD httpfs;

Alternatively, download the extension files manually and load them from local disk.
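A rough sketch of the manual route in Python follows. It assumes the extension repository layout described in the DuckDB documentation (http://extensions.duckdb.org/<version>/<platform>/<name>.duckdb_extension.gz). The function names are hypothetical, and the version and platform strings in the usage note are examples only; check yours with PRAGMA version and PRAGMA platform. Python's urllib picks up the proxy environment variables on its own, which is what makes this workaround possible:

```python
import gzip
import urllib.request


def extension_url(name, version, platform):
    """URL of the gzipped extension binary (assumed repository layout)."""
    return f"http://extensions.duckdb.org/{version}/{platform}/{name}.duckdb_extension.gz"


def download_extension(name, version, platform, dest_path):
    """Download and decompress an extension to dest_path.

    urllib.request reads the proxy environment variables, so this request
    goes through the corporate proxy without extra configuration.
    """
    with urllib.request.urlopen(extension_url(name, version, platform)) as response:
        data = gzip.decompress(response.read())
    with open(dest_path, "wb") as out:
        out.write(data)
    return dest_path
```

After something like download_extension("httpfs", "v1.1.0", "linux_amd64", "/tmp/httpfs.duckdb_extension"), the file can be installed with INSTALL '/tmp/httpfs.duckdb_extension'; followed by LOAD httpfs;.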

Kubernetes and Docker environments

In corporate Kubernetes deployments, it is common for pods to inherit HTTP_PROXY and HTTPS_PROXY variables from cluster configuration. Every application inside the pod respects them… except DuckDB.

In my case, the solution was to add automatic proxy detection in the service’s DuckDB client, right after loading extensions and before executing any query that accesses remote resources. It is a simple but necessary pattern you should implement if you use DuckDB in proxy environments.
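As a sketch of that pattern, the Python below builds the list of statements a service could run on every fresh connection. It places the proxy setting before INSTALL so that extension downloads are covered too. The names PROXY_VARIABLES, proxy_from_env, and startup_statements are hypothetical, and the sketch assumes the proxy URL carries no embedded credentials:

```python
import os

# Lookup order: HTTPS first, then HTTP, upper- and lowercase.
PROXY_VARIABLES = ("HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy")


def proxy_from_env(environ=os.environ):
    """Return the first non-empty proxy variable, or None."""
    for name in PROXY_VARIABLES:
        value = environ.get(name)
        if value:
            return value
    return None


def startup_statements(environ=os.environ, extensions=("httpfs",)):
    """List the SQL to run on a new connection, in execution order."""
    statements = []
    proxy = proxy_from_env(environ)
    if proxy:
        # Assumption: no user:password@ part in the URL.
        escaped = proxy.replace("'", "''")
        statements.append(f"SET http_proxy = '{escaped}';")
    for ext in extensions:
        statements.append(f"INSTALL {ext};")
        statements.append(f"LOAD {ext};")
    return statements
```

A client wrapper would then run `for sql in startup_statements(): con.execute(sql)` once per connection, before any query touches a remote resource.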

Final thoughts

DuckDB is an extraordinary tool. The speed at which you can analyze Parquet, CSV, or even remote PostgreSQL databases is impressive. But this proxy detail is one of those things that makes you lose hours until you find the root cause.

The frustrating part is not that it does not support proxies – it does. It is that it does not follow the universal convention of reading environment variables that the entire Unix ecosystem has respected for decades. This is a solved problem in virtually every HTTP library on the planet, and DuckDB has chosen its own path.

That said, once you know where the problem lies, the solution is straightforward. A CREATE SECRET and back to work.