Issue 54 - Build "YouGPT" with Open WebUI

        June 30, 2026, 9:22 p.m.

    Issue 54 - Build "YouGPT" with Open WebUI
    Keep your content and your thoughts to yourself as you configure a knowledge base and query it with Open WebUI and local models.

        Code, Content, and Career with Brian Hogan

        The June issue is only going to cover a single topic: setting up a local LLM to surface your own work. Load up your blog posts, tutorials, notes, and other works into a pipeline you control, without having to hand things over to for-profit models. Keep your content and your thoughts to yourself.
But first, here are a couple of other topics you might find interesting.
Things To Explore

mq is a command-line tool that processes Markdown using a syntax similar to jq.
Auto Router from OpenRouter automatically selects the best model for your prompt. If you're using OpenRouter, this can be a good way to ensure that you don't use expensive models for trivial tasks.

Query Your Own Writing with Open WebUI Knowledge Bases
In Issue 51, you set up Ollama and ran models from the command line. In Issue 52, you connected a model to OpenCode and ran your own local coding agent. In this issue, you'll set up Open WebUI, a self-hosted, browser-based interface to your models. It gives you conversation history, a model picker, and saved system prompts, all running on your own hardware. But you're going to set it up for a specific reason: to create a model that uses your own writing.
Open WebUI has a Knowledge feature, a built-in Retrieval-Augmented Generation (RAG) pipeline. This lets a model answer questions using your own writing instead of being limited to its training data. 
Building a RAG system yourself means putting together a pipeline that chunks your text, generates embeddings, stores them in a vector database, and handles retrieval. Open WebUI does all this for you once you configure a handful of settings and upload your content.
In this tutorial, you'll install Open WebUI with Docker, connect it to Ollama models, and configure a knowledge base from your own writing. Then you'll create your own model that searches your knowledge base and cites the sources it used.
This tutorial assumes you're using Ollama already, which Issue 51 covers in detail. It also assumes you have Docker available, since that's the easiest way to set up Open WebUI. If you're on macOS, follow my guide to setting up Docker with Colima. 
Finally, you'll need a machine powerful enough to run larger models. A Mac with 32GB of unified memory will be enough, but 48GB would be better. The more you can load into video memory, the better performance you'll get.
Install a model that can call tools and summarize results
Open WebUI uses agentic tool calling for knowledge bases. It gives the model a set of tools to search your writing for answers. This means the model you use has to support tool calling and be capable enough to use those tools reliably.
In this tutorial, you'll use Qwen 3.5 35B. This is a mixture-of-experts (MoE) model. It has 35 billion parameters in total but activates only about 3 billion for each token, so it's relatively fast. The tradeoff is that its reasoning is lighter than a dense model of the same total size. For a knowledge base, you shouldn't notice the tradeoff because retrieval finds the right content and the model mostly summarizes and cites what it's handed.
Open a Terminal and use ollama pull to download the model.
On Apple Silicon, pull the MLX build, which runs faster on Metal:
$ ollama pull qwen3.5:35b-mlx

On other hardware, pull qwen3.5:35b.
Plan for at least 32 GB of memory for the model, with 48 GB or more giving comfortable headroom. The full model loads into memory even though only part of it runs per token. Qwen 3.5:35b is around 24 GB at a 4-bit quantization. You'll also need room for the context window, the embedding model you'll run when you import your data, and your operating system itself.
Install Open WebUI
You can install Open WebUI using Docker or through Python's package managers. You'll use Docker to run Open WebUI because it's the quickest and most consistent method. The Docker image bundles Open WebUI with everything it needs.
With Docker running, install and start Open WebUI with the following command:
$ docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Here's what the options mean:

-p 3000:8080 makes Open WebUI at http://localhost:3000.
--add-host=host.docker.internal:host-gateway lets the container reach the Ollama service on your host. This connects Open WebUI to the models you pulled.
-v open-webui:/app/backend/data configures a Docker volume which saves your chat history, settings, and knowledge bases. Leave it out, and you'll lose everything when the container restarts.
--restart always ensures Open WebUI restarts if it crashes.
The :main image is the standard one. There's also an :ollama image that bundles Ollama, but you already run Ollama natively, so use :main.

Docker pulls the image on the first run. Once it's finished downloading, open http://localhost:3000 in your browser and create your first account. The first account you create becomes the administrator. You can configure Open WebUI for multiple users, each with their own accounts and chat histories.
Open WebUI finds your Ollama setup automatically. Click the model dropdown at the top of a new chat, and you'll see the models you have installed previously, including the qwen3.5:35b model: 

You can try chatting with your model to ensure it works, but it'll only know about information in its training data. 
Now that Open WebUI is running, configure it to upload your content.
Configure document chunking, embedding, and retrieval
Before you can upload and configure your knowledge base, you'll configure how Open WebUI splits your writing into chunks, turns those chunks into searchable data, and decides how many chunks to return for each search. These settings apply to every knowledge base you create. 
In the Admin Panel, open Settings and choose Documents. You'll start configuring the General settings:

Leave the Content Extraction Engine on the default setting. The built-in extractor reads text and Markdown locally, which covers most writing. If your archive is mostly PDFs, the local Apache Tika or Docling engines extract them more cleanly. Avoid cloud-based extractors if you want to keep everything private, as those extractors send your files out.

Set the Text Splitter to Token, so the chunk size is counted in tokens. 

Set Chunk Size to 1500 and Chunk Overlap to 200.  These options control how Open WebUI slices your writing before embedding. These values are good for prose. The overlap keeps sentences from getting cut off at chunk boundaries. 

Now configure the embedding model:

Set the Embedding Model Engine to Ollama. 
Choosing Ollama reveals two more fields: an Ollama base URL, already pointed at your Ollama service (the same host.docker.internal address from the install step), and an API key field. Leave the URL as it is, and leave the API key blank, since local Ollama doesn't use one.

Type nomic-embed-text into the Embedding Model field. The embedding model gets baked into your index when you upload, so if you change it later, you have to re-embed everything. 

Open a Terminal window and pull the model nomic-embed-text, the embedding model Open WebUI's own documentation recommends for an Ollama setup.
bash
$ ollama pull nomic-embed-text

Finally, configure the Retrieval section:

Turn on Hybrid Search. By default, Open WebUI searches by meaning, which works well for finding related ideas, but it can miss exact terms like names, product titles, and the vocabulary or terms you've invented. Hybrid search adds keyword matching alongside the semantic search, so it's likely to surface more.

Turning on Hybrid Search reveals a Reranking Model field. Set it to bge-reranker-v2-m3. The reranker reads through the chunks that came back and reorders them so the most relevant ones land at the top, where the model pays the most attention. It runs on the CPU, which is fine for a personal archive. 

Set Top K to 10. Top K is the number of chunks each search returns to the model, and the default is low. This Top K controls retrieval. It is not the same as Ollama's top_k sampling parameter, which controls how the model picks words as it writes. Leave the relevance threshold at 0 for now.

You'll also see a RAG Template field on this page. Leave it on its default. It shapes the older path where Open WebUI injects retrieved text directly into the prompt, not the tool-based search your model will use. Save your settings. You're now ready to upload your content.
Create the knowledge base
Now import your writing. Go to Workspace, then Knowledge, and click the New Knowledge button.
Fill in the What are you working on field with a name for the knowledge base, and enter details about what you want to accomplish. "My newsletter issues, blog posts, and tutorials" works better than "stuff."

Open the knowledge base you created and use the Add Content menu to import your writing. Choose one of the following methods to add some content:

Upload File handles individual documents. 
Upload Directory imports a whole folder at once, which helps if your posts are stored as Markdown files. Open WebUI embeds each file as you add it, using the model and chunk settings you configured a moment ago. Large imports take a while, and the embedding runs in your browser session, so don't navigate away from the page while it's working.
Sync Directory uploads a directory and keeps track of what changed between your directory and the knowledge base. When you choose this option, Open WebUI adds or removes files from the knowledge base. 

You can create directories within your knowledge base if you want to organize its contents, but the Sync Directory setting doesn't support this. It instead replaces everything with the contents of the source directory you specify.
When the files show as processed, your archive is searchable. Now you can attach your knowledge base to a model and configure a prompt.
Create a model and attach your knowledge
A model in Open WebUI is a wrapper around a base model. It lets you add your own system prompt, a context size, tool settings, and knowledge bases. This model then shows up in the model selection when you start new chats.
Go to Workspace, then Models, and click Add New Model.
Enter a name for the model and select the qwen-3.5-35b model you pulled at the start as the base model.

Open the Advanced Params section. 

Set Function Calling to Native. Native is what lets the model call the knowledge tools at all. You can also set Native once for every model under Admin Panel, then Settings, then Models, but setting it here keeps it scoped to this model only. 

Set num_ctx to 32768. Open WebUI sends num_ctx to Ollama on each request, so you don't need to create a custom model variant or worry about a global setting. You don't need a larger context than 32k: with native retrieval, the model pulls in one search's worth of results at a time, not your whole archive, and 32k leaves room for several rounds of searching plus the answer.

Find the Knowledge section and select the knowledge base you created. This scopes the model's knowledge tools to your writing.
Confirm the model can reach its knowledge tools. Under Capabilities, make sure the Builtin Tools capability is on and the Knowledge Base tool category is enabled. These should be set for you by default, but double-check that they are before moving on.
Now set the system prompt. Your prompt needs to tell the model to search your knowledge base every time and tell it what tools to use to retrieve your content. Here's one to start with:
You are a research assistant with access to my own writing: newsletter issues, blog posts, book drafts, and personal notes, stored in an attached knowledge base.

Always search the knowledge base before you answer. For every question, without exception, first call `list_knowledge` to see what is attached, then call `query_knowledge_files` to search it. Do this even when the question looks like general knowledge and even when you are confident you already know the answer. I have written about many topics, including general technical ones, and I want my own writing surfaced whenever it exists. Do not decide on your own that a question is too general to need a search.

After searching, ground your answer in the results. Synthesize across all the relevant sources rather than relying on a single excerpt, and name which source each point comes from. Keep my own terminology and phrasing intact, and point out when something is a rough note rather than published writing. If the search returns nothing relevant, tell me the knowledge base had nothing on this, then answer from general knowledge.

The backticks around list_knowledge and query_knowledge_files mark those words as the tool names, which helps the model call them instead of treating them as prose. The model can decide that a question is too general to search your knowledge, and it might try to answer from its training data, so telling it to search every time, even when it thinks it knows the answer, is what makes it use your writing first.
Save the model. Now you can test it out.
Ask your model some questions
Start a new chat, select your model, and ask it something you know is in your archive but might have phrased differently than you'd search for. For example, if you have information in your knowledge base about how to prepare for interviews, you can ask it how to prepare:

You'll see the model call its knowledge tools before it answers, and a good answer lists the sources it used so you can expand each one and confirm the model isn't making things up. 
If you're not seeing the results you want, try one or more of the following fixes:

If the model answers without searching, and you've set Function Calling  to Native and the Knowledge Base Tool category is enabled, the model may not be calling the tools effectively. Switch out the model for a different one and try again. Some models have issues with reliably calling tools, or they send back invalid responses from those tools. You can edit your model's definition and replace the underlying model any time you want without having to delete the model and start over.
If answers come out thin, try switching to a dense model around 30 billion parameters, like gemma4:31B. Dense models can reason better, but they run a lot slower than the mixture-of-experts model you used.

If the model searches but misses things you know are there, go to Admin Panel, select Settings, select Documents. Increase Top K and confirm that hybrid search and the reranker are on. The search tool uses them when they're enabled, and semantic search alone is the usual reason exact terms get missed. You can change these freely without re-importing, since they apply at query time, not at upload. Only the embedding model and chunk settings bake into the index.

Once you're seeing positive results, add new documents to your knowledge base and keep asking it questions. Add work notes, drafts, ideas, and more.
Parting Thoughts
Before the next issue, explore the following things:

Explore different models and see how they perform. There are many variables, including the source of the models and the hardware you used.
Uploading files using the web interface works for short-term work, but as your files change, you'll want a better solution, so look at the oikb command-line tool. You can use this official tool to ingest content, keep things in sync, and update your knowledge bases on a schedule.
Explore other content extraction engines and see how they perform against PDFs. 

See you next issue. As always, thanks for reading.

            I'd love to talk with you about this issue on BlueSky, Mastodon, Twitter, or LinkedIn. Let's connect! 
Please support this newsletter and my work by encouraging others to subscribe and by buying a friend a copy of Write Better with Vale, tmux 3, Exercises for Programmers, Small, Sharp Software Tools, or any of my other books.

                            You just read issue #54 of Code, Content, and Career with Brian Hogan.

                            You can also browse the full archives of this newsletter.

            Email address (required)

                Share this email:

                                Share on Twitter

                                Share on LinkedIn

                                Share on Hacker News

                                Share on Reddit

                                Share via email

                                Share on Mastodon

                                Share on Bluesky

                    Older →

                Issue 53 - Run Your Own Git Remote and Rethink Audience Segments for Docs