Flagship project

Keyboard Manual Assistant

A grounded RAG system that answers questions about keyboard synthesizer manuals using retrieved evidence, refusal behaviour, evals, and Langfuse observability.

Read write-up Watch demo ↗GitHub ↗

Problem

Professional keyboard players often request a specific board for a gig and arrive to find something different on stage. The manual may be hundreds of pages long, but the player has minutes to answer practical questions: how to split the keyboard, layer sounds, save a performance, or change controller behaviour.

Why it matters in real gigs

This comes from lived experience, not a synthetic demo. If you have twenty minutes of soundcheck, generic search results are too slow and generic LLM answers are too risky. The assistant needs to answer from the official manual, show enough context to trust the answer, and refuse when the manual does not support the question.

Architecture

Manual PDF upload
    ↓
Extract text and split into chunks
    ↓
Embed chunks with sentence-transformers
    ↓
Store vectors in Qdrant
    ↓
User asks a question
    ↓
Embed question and retrieve top matching chunks
    ↓
Build constrained prompt from retrieved manual excerpts
    ↓
Generate answer with Llama 3.1 via LocalAI
    ↓
Trace retrieval, prompt, answer, timings, and failures in Langfuse

Stack

FastAPI

Backend API for PDF upload, retrieval, answer generation, and streaming responses.

Qdrant

Vector database for storing and retrieving embedded manual chunks.

sentence-transformers

Local embedding model for turning questions and manual chunks into vectors.

LocalAI + Llama 3.1 8B

Local answer generation without relying on a hosted model for every request.

Langfuse

Tracing for retrieval, generation, token usage, latency, and debugging failed answers.

Docker Compose

Portable local stack for the API, embedding service, vector DB, LLM, and frontend.

What makes it production-minded

Grounded answers constrained to retrieved manual excerpts rather than general model knowledge.
Refusal behaviour for questions the manual does not answer.
Eval harness with positive cases and refusal cases so changes can be measured instead of guessed.
Langfuse traces showing retrieved chunks, generation inputs, timings, and token usage.
Failure categorisation for retrieval misses, wrong-document retrieval, LLM timeouts, and weak answers.

Demo

The demo shows PDF upload, manual-grounded answers, Langfuse traces, retrieved chunks, and an example of where the system needs tighter evaluation.

Watch the Loom demo ↗

Code

The source includes the FastAPI backend, embedding service, frontend, Docker Compose setup, Langfuse tracing, and a basic eval harness.

View on GitHub ↗

Next steps

The next useful version would run in the cloud on demand, keep the manual index updated from trusted manufacturer sources, and expand the eval set across more keyboards and failure cases.