AgentHouse: When databases start speaking our language

A few months ago, when Anthropic launched their MCP (Model Context Protocol), I knew we’d see interesting integrations between LLMs and databases. What I didn’t expect was to see something as polished and functional as ClickHouse’s AgentHouse so soon.

I’m planning to test this demo soon, but just reading about it, the idea of being able to ask a database questions like “What are the most popular GitHub repositories this month?” and getting not just an answer, but automatic visualizations, seems fascinating.

What is AgentHouse?

AgentHouse is basically a playground where you can talk to databases using natural language. It’s a demo that ClickHouse has made available at llm.clickhouse.com to show how their database can integrate with LLMs through the MCP protocol.

The idea originated internally at ClickHouse when the integration team created a small demo connecting Claude Sonnet with a ClickHouse database. What began as a quick proof of concept became “Dwaine” (Data Warehouse AI Natural Expert), an internal assistant that helps sales, operations, product, and finance teams get insights without needing to write SQL.

The technical components

The AgentHouse architecture is interesting because it elegantly combines several technologies:

Claude Sonnet as the brain

They use Anthropic’s Claude Sonnet, which proves especially good at understanding complex contexts and reasoning about structured data. From what I’ve seen, Sonnet seems to be one of the best options for generating SQL and interpreting query results.

LibreChat as the interface

For the UI, they chose LibreChat, an open-source project that provides a clean interface for working with LLMs. It’s a smart choice because it allows natural conversations and the creation of visual artifacts (charts, tables) directly in the interface.

ClickHouse MCP Server: the secret sauce

The most interesting component is the specific MCP server they’ve developed for ClickHouse. This server acts as a bridge between the database and the LLM, providing:

  • Efficient data transfer between ClickHouse and LLMs
  • Intelligent optimization of SQL queries generated by the LLM
  • Context management for stateful data conversations
  • Secure and controlled access to database resources
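The bridge pattern is easier to see in code. Here's a minimal, self-contained sketch of the idea: the server exposes a small set of named tools, and the LLM calls them with structured arguments instead of touching the database directly. All names here (`run_select_query`, the stand-in results) are hypothetical, not ClickHouse's actual implementation:

```python
# Minimal sketch of the MCP "bridge" idea: the server exposes named tools,
# and the LLM invokes them with structured arguments. It never talks to the
# database directly. All names and results here are illustrative stand-ins.

def execute_readonly_sql(sql: str):
    # A real server would forward this to ClickHouse; here we only show the
    # kind of guardrail an MCP server adds: read-only access.
    if not sql.lstrip().lower().startswith("select"):
        raise PermissionError("only SELECT statements are allowed")
    return [{"repo": "torvalds/linux", "stars": 180_000}]  # stand-in result

TOOLS = {
    "list_tables": lambda args: ["github", "pypi", "hackernews"],
    "run_select_query": lambda args: execute_readonly_sql(args["sql"]),
}

def handle_tool_call(name: str, arguments: dict):
    """Dispatch a tool call coming from the LLM to the matching tool."""
    return TOOLS[name](arguments)

print(handle_tool_call("list_tables", {}))
print(handle_tool_call("run_select_query", {"sql": "SELECT 1"}))
```

The point of the pattern is that security and optimization live in the server, not in the model: the LLM only ever sees tool names and results.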

Available datasets

One of the things that catches my attention most is the variety of datasets they’ve included. They have 37 different datasets covering very diverse use cases:

# Some examples available:
- github: GitHub activity data, updated every hour
- pypi: Python package downloads - over 1.3 trillion rows
- hackernews: Hacker News posts and comments
- stackoverflow: Stack Overflow questions and answers
- nyc_taxi: NYC taxi trip data
- opensky: OpenSky Network aviation data

What the experience promises

According to the documentation and demos I’ve seen, the behavior seems quite consistent across different types of queries:

Simple query: “What are the most popular programming languages on GitHub?”

Complex query: “Show me the evolution of Python package downloads related to machine learning in the last 6 months”

Query with visualization: “Create a chart showing the distribution of property prices in London by district”

What seems most impressive is that it not only generates the correct SQL but also interprets the results and creates appropriate visualizations automatically.
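To make the first case concrete, this is the kind of SQL I'd expect the assistant to generate for a question like "What are the most popular GitHub repositories this month?". The table and column names follow ClickHouse's public github_events dataset; treat them as an assumption on my part, not output captured from the demo:

```python
# The kind of SQL the assistant might generate for "most popular GitHub
# repositories this month". Schema names (github_events, WatchEvent, etc.)
# are assumptions based on ClickHouse's public GitHub dataset.

sql = """
SELECT repo_name, count() AS stars
FROM github_events
WHERE event_type = 'WatchEvent'
  AND created_at >= now() - INTERVAL 1 MONTH
GROUP BY repo_name
ORDER BY stars DESC
LIMIT 10
"""
print(sql)
```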

The MCP protocol in action

From a technical perspective, the most interesting thing about AgentHouse is seeing the MCP protocol working in a real environment. MCP allows LLMs to interact with external resources in a safe and structured way, in this case, ClickHouse databases.

The implementation handles several critical aspects:

  • Context management: The LLM maintains context about the database schema and previous queries
  • Query optimization: The MCP server can optimize the SQL queries generated by the LLM
  • Security: Controlled access to data with appropriate permissions
  • Error handling: Intelligent interpretation of SQL errors and correction suggestions
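The error-handling loop in particular is simple to sketch: run the model's SQL, and if the database rejects it, feed the error text back so the model can revise the query and try again. A simplified, self-contained version (the fake database, fake model, and error messages are all illustrative, not from the real implementation):

```python
# Simplified error-correction loop: execute the LLM's SQL and, on failure,
# return the error message to the "model" so it can revise the query.
# Every component here is a stand-in for illustration.

def fake_database(sql: str):
    if "event_time" in sql:
        raise ValueError("Unknown column 'event_time'; did you mean 'created_at'?")
    return [("torvalds/linux", 180_000)]

def fake_model_revise(sql: str, error: str) -> str:
    # A real system would send `error` back to the LLM; here we just apply
    # the hint contained in the message.
    return sql.replace("event_time", "created_at")

def run_with_retries(sql: str, max_attempts: int = 3):
    for _ in range(max_attempts):
        try:
            return fake_database(sql)
        except ValueError as err:
            sql = fake_model_revise(sql, str(err))
    raise RuntimeError("query still failing after retries")

print(run_with_retries("SELECT repo, count() FROM github GROUP BY event_time"))
```

Capping the number of attempts matters in practice: without it, a model that keeps producing invalid SQL would loop forever and burn tokens.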

Reflections on the future

AgentHouse represents what I believe will be the future of data interaction. The idea of having to learn SQL, understand complex schemas, and manually build dashboards is starting to seem… outdated.

In my experience working with non-technical teams, one of the biggest barriers to efficient data use has always been technical complexity. Tools like AgentHouse could eliminate that barrier completely.

Practical use cases

I see several scenarios where this could be especially useful:

For product teams: “How has user engagement evolved in recent weeks?”

For marketing: “Show me the conversion funnel by acquisition channel”

For operations: “What are the most common errors in our application logs?”

For exploratory analysis: “Search for anomalous patterns in last month’s transaction data”

Limitations and considerations

Although impressive, there are some things to keep in mind:

  • Accuracy: Although Claude Sonnet is very good, it can still generate incorrect queries for complex questions or schemas
  • Security: In a real environment, you’d need to implement more granular access controls
  • Performance: For very large datasets, LLM-generated queries may not be the most efficient
  • Context: The LLM can lose context in very long conversations

How to try it?

If you want to experiment with AgentHouse:

  1. Go to llm.clickhouse.com
  2. Log in with your Google account
  3. Ask “What datasets do you have available?” to get started
  4. Experiment with natural language queries

My recommendation would be to start with simple questions and gradually increase complexity to understand the system’s capabilities.

AgentHouse is a perfect example of how emerging technologies can be combined to create genuinely useful experiences. It’s not just an impressive technical demo, but a vision of the future of how we’ll interact with our data. It’s definitely on my list of things to try soon.

NOTE: If you’re thinking about implementing something similar in your organization, the ClickHouse MCP server code is available on GitHub, which is a great starting point.
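If you go that route, the setup follows the usual MCP client configuration pattern. As a rough sketch, a Claude Desktop entry for ClickHouse's MCP server might look like the following; the package name, command, and environment variable names are my assumptions, so check the repository's README for the exact ones:

```json
{
  "mcpServers": {
    "clickhouse": {
      "command": "uv",
      "args": ["run", "--with", "mcp-clickhouse", "mcp-clickhouse"],
      "env": {
        "CLICKHOUSE_HOST": "your-host.clickhouse.cloud",
        "CLICKHOUSE_USER": "default",
        "CLICKHOUSE_PASSWORD": "***"
      }
    }
  }
}
```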
