Ruborag — Ask the Rust Book

January 04, 2026 | 02:35 PM

A CLI tool that lets you query The Rust Programming Language book using semantic search and retrieval-augmented generation, helping you find the right chapters for the concepts you want to learn.

Tags: Go, RAG, Gemini

While learning Rust, I often knew what concept I wanted to understand, but not where in The Rust Programming Language book it was explained clearly. Topics like borrowing, lifetimes, or shadowing are well covered in the book, but locating the most relevant chapter or section is not always obvious.

Ruborag is a command-line tool built to solve this problem.

Querying the Rust Book using ruborag - semantic retrieval and RAG-based answers in the terminal, compared with a traditional keyword search in the browser.

It implements a simple Retrieval-Augmented Generation (RAG) pipeline over the Rust Book. Given a query, it semantically searches the book content and uses the retrieved context to generate an answer grounded strictly in the source material.

The project is written in Go and is designed as a learning exercise to understand RAG systems from first principles rather than relying on hosted frameworks.

How it works

The workflow is intentionally explicit and broken into steps:

1. HTML chapters from the Rust Book are parsed and cleaned into plain text.
2. The cleaned text is split into chunks and converted into vector embeddings.
3. Embeddings are stored locally in a SQLite database.
4. Queries are embedded and matched against the corpus using cosine similarity.
5. The most relevant sections are provided as context to an LLM to generate answers.

Each step is exposed as a separate CLI command, making the system easy to inspect and reason about.

Usage

Clone the project and make sure the Go compiler is installed. Build the CLI tool, then set your Gemini API key:

export GEMINI_API_KEY=<api_key>

Parse and clean Rust Book chapters:

ruborag parse chapter1.html chapter2.html

The corpus I used for the project is available in the GitHub repo.

Generate embeddings for parsed text:

ruborag embed -w <path_to_files>
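Before text can be embedded it has to be split into chunks. The sketch below shows one common approach, fixed-size word windows with overlap. The chunkWords function and its size and overlap parameters are hypothetical, and ruborag's actual chunking strategy may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into chunks of at most size words, where
// consecutive chunks share overlap words. It returns nil for invalid
// parameters (size <= 0, overlap < 0, or overlap >= size).
func chunkWords(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	words := strings.Fields(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // the final window already reached the end of the text
		}
	}
	return chunks
}

func main() {
	// Placeholder text, not a quotation from the Rust Book.
	text := "one two three four five six seven eight nine ten"
	for i, c := range chunkWords(text, 4, 1) {
		fmt.Printf("chunk %d: %s\n", i, c)
	}
}
```

The overlap means a sentence that straddles a chunk boundary still appears mostly intact in at least one chunk, at the cost of embedding some words twice.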

Search the corpus semantically:

ruborag search "What is shadowing in Rust?"

Ask a question using retrieval-augmented generation:

ruborag ask "What is borrowing in Rust?"

Performance note

While implementing the parsing stage, I compared buffered and unbuffered file I/O. Buffered I/O made parsing approximately 3.5× faster. This was a small but useful reminder that low-level implementation details matter, even in tooling projects.

Footnote

ruborag is not intended to be a production-ready RAG system. It is a focused learning project built to understand how document parsing, embeddings, semantic search, and retrieval-augmented answering fit together in practice.

The source code is available at: https://github.com/MahendraDani/ruborag