Local AI on Raspberry Pi 5 with Ollama: Your private AI server at home

A few months ago I came across something that really caught my attention: the possibility of having my own “ChatGPT” running at home, without sending data anywhere, using only a Raspberry Pi 5. Sounds too good to be true, right?

Well, it turns out that with Ollama and a Pi 5 it’s perfectly possible to set up a local AI server that works surprisingly well. Let me tell you my experience and how you can do it too.

What is Ollama and why did I like it so much?

Ollama is an open-source tool that lets you run large language models (LLMs) directly on your own machine, without depending on external services. What I like most is that all your data stays at home: no sensitive information ever gets sent to remote servers.

The Raspberry Pi 5, especially the 8GB RAM version, turns out to be the perfect companion for this type of project. It consumes little energy, is inexpensive, and on top of that you can leave it running 24/7 without problems.

The advantages I value most

  • Total privacy: Everything is processed locally
  • No internet dependency: Once configured, it works offline
  • Minimal cost: No subscriptions or usage fees
  • Complete personalization: You can choose exactly which models to use

What you need to get started

The setup is quite simple:

  • A Raspberry Pi 5 (I strongly recommend the 8GB version)
  • Sufficient storage - individual models take up anywhere from under 1 GB to several GB
  • Raspberry Pi OS Bookworm 64-bit
  • Internet connection for initial installation
  • A little patience for the initial configuration

Important: Make sure to use the 64-bit version of the operating system. Ollama only ships 64-bit ARM builds, so it won’t run on the 32-bit edition of Raspberry Pi OS.
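If you’re not sure which image you flashed, you can check from a terminal; on a 64-bit system `uname -m` reports `aarch64`:

```shell
# Print the kernel architecture: "aarch64" means 64-bit, "armv7l" means 32-bit
uname -m

# Double-check the userland word size (prints 64 on a 64-bit OS)
getconf LONG_BIT
```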

Step-by-step installation

The installation is much simpler than I expected. Ollama provides a script that automates the entire process:

# Update the system
sudo apt update && sudo apt upgrade

# Install curl if you don't have it
sudo apt install curl

# Download and install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Verify the installation
ollama --version

And that’s it. Seriously, it’s that simple.

Choosing the right model

Here comes the interesting part: choosing what “brain” you want for your AI. I’ve tested several and let me tell you my experience:

TinyLlama - The sprinter

ollama run tinyllama

It’s the lightest (1.1B parameters) and fastest. Perfect for initial tests and basic chatbots. The responses aren’t the most elaborate, but the speed is impressive.

Phi3 - The balanced one

ollama run phi3

Developed by Microsoft, it offers a good balance between speed and response quality (the default phi3 tag pulls the 3.8B-parameter “mini” variant). It’s my favorite option for daily use on the Pi 5.

Llama3 - The brainiac

ollama run llama3

It’s the most advanced, but also the most demanding: the default llama3 tag is the 8B-parameter model, a download of roughly 4.7 GB in its quantized form. The responses are excellent, but you need patience. Only recommended if you have the 8GB version and don’t mind waiting a bit longer.

Deepseek-R1 - The specialist

ollama run deepseek-r1:1.5b

It comes in different sizes. The 1.5B version, a small model distilled from DeepSeek-R1, works well on the Pi 5 and is quite competent at reasoning-style tasks.

My recommendation: start with Phi3. It’s the best compromise between functionality and performance.
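Whichever model you pick, it’s worth checking you actually have room for it before pulling. Here’s a rough pre-flight sketch; the 2300 MB figure is my approximation of phi3’s quantized download, so adjust it for other models:

```shell
#!/bin/sh
# Rough pre-flight check before running `ollama pull phi3`
# NEEDED_MB is an approximate download size; adjust per model
NEEDED_MB=2300

# Available megabytes on the root filesystem
AVAIL_MB=$(df -Pm / | awk 'NR==2 {print $4}')

if [ "$AVAIL_MB" -ge "$NEEDED_MB" ]; then
    echo "OK: ${AVAIL_MB} MB free"
else
    echo "Low on space: only ${AVAIL_MB} MB free, need about ${NEEDED_MB} MB"
fi
```

After pulling, `ollama list` shows the real on-disk size of each model, and `ollama rm <model>` frees the space again.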

Beyond the terminal

Once you have Ollama running, you can take it to the next level by installing a web interface. There are several options available, but personally I like using Docker to keep everything organized:

# If you don't have Docker installed
curl -sSL https://get.docker.com | sh
sudo usermod -aG docker $USER

# After logging out and back in, you can run a WebUI container
# (several projects on GitHub work well on the Pi 5)

With a web interface, you can access your AI from any device on your local network. It’s much more comfortable.
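As a concrete example, Open WebUI is one of the popular interfaces and talks to Ollama’s API out of the box. This is only a sketch of a docker-compose file, with the image name, port, and environment variable taken from the Open WebUI docs; adjust it to your setup:

```yaml
# docker-compose.yml - sketch for running Open WebUI alongside Ollama
# (assumes Ollama is listening on the host's default port, 11434)
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"            # web interface on http://<pi-ip>:3000
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui:/app/backend/data   # persist chats and settings
    restart: unless-stopped
volumes:
  open-webui:
```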

The API that opens a world of possibilities

What really excited me about Ollama is its integrated HTTP API. You can make queries programmatically:

curl http://localhost:11434/api/generate \
  -d '{
    "model": "phi3",
    "prompt": "What is the capital of Australia?",
    "stream": false
  }'

This opens a bunch of possibilities: automation, integration with other systems, creating custom bots… The options are endless.
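For anything beyond one-off queries, it helps to build the JSON body in a script instead of typing it inline. A minimal sketch (printf-based JSON breaks if the prompt contains quotes, so reach for jq in anything serious):

```shell
#!/bin/sh
# Build a request body for /api/generate and print it
MODEL="phi3"
PROMPT="What is the capital of Australia?"

# Assemble the JSON payload from the variables above
PAYLOAD=$(printf '{"model": "%s", "prompt": "%s", "stream": false}' "$MODEL" "$PROMPT")
echo "$PAYLOAD"

# Send it once Ollama is running:
#   curl -s http://localhost:11434/api/generate -d "$PAYLOAD"
```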

Real use cases I’ve tried

Offline personal assistant

Perfect for quick queries without sending data out of the house.

Document analysis

You can process and analyze texts locally, ideal for sensitive information.

Task automation

Combined with scripts, you can automate email responses, text classification, etc.

Educational experiments

Excellent for learning about AI without additional costs.

Practical optimization tips

Monitor RAM usage: If you notice slowness, try smaller models.

Use fast storage: A good microSD or, better yet, an external SSD makes a difference.

Control temperature: The Pi 5 can heat up with heavy models. A fan doesn’t hurt.

Update regularly: Both Ollama and the models update frequently with improvements.
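To put the RAM and temperature tips into practice, a couple of commands are enough (vcgencmd is Pi-specific, so I’ve left it commented out here):

```shell
# Check how much RAM is free before loading a bigger model
free -h

# Watch usage live while a model is generating:
#   top    (or htop, if installed)

# Pi-specific: read the SoC temperature (throttling kicks in around 80-85 C):
#   vcgencmd measure_temp
```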

Common problems I encountered

The system runs out of memory

Solution: Switch to a smaller model or close other applications.

Very slow responses

Solution: It’s normal with large models; either be patient or switch to a lighter model.

Architecture error

Solution: Verify you’re using Raspberry Pi OS 64-bit.

My experience after several months

I’ve been using this setup for several months and I’m genuinely impressed. No, it’s not as fast as ChatGPT, but for many use cases it’s perfectly valid. And the peace of mind of knowing my data doesn’t leave home is priceless.

The energy consumption is minimal, so I have it running 24/7. When I need to make a quick query or analyze a document, I simply open the web interface from any device in the house.

Is it worth it?

For me, absolutely yes. If you value privacy, like experimenting with technology, or just want to have your own AI server without depending on third parties, this combination is perfect.

Don’t expect miracles in terms of speed, but you can count on a solid and very satisfying experience. And best of all: it’s yours, completely.

Next steps

Once you have everything running, I recommend exploring:

  • Integration with LangChain for more complex workflows
  • Creating custom bots using the API
  • Home task automation
  • Experimenting with different models based on your needs

The Ollama community is very active, and new models and improvements appear constantly. It’s an exciting time to experiment with local AI.

Do you dare to set up your own AI server? If you do, I’d love to know how it goes. And if you have doubts, you know where to find me.


Have you tried Ollama on your Raspberry Pi? Which models work best for you? Share your experience in the comments.
