Add RAG MCP Server with document search and management UI #384

Open
crab182 wants to merge 6 commits into ratfactor:main from crab182:claude/setup-llm-rag-mcp-ChHrC
@crab182 crab182 commented Apr 6, 2026

Summary

This PR introduces a complete self-hosted RAG (Retrieval-Augmented Generation) system with MCP (Model Context Protocol) server support, designed for deployment on Unraid. The system lets cloud-based LLMs search local documents by semantic similarity without exposing the documents to external services.

Key Changes

Backend (FastAPI + ChromaDB)

  • RAG Engine (backend/app/services/rag_engine.py): Semantic document search using sentence-transformers and ChromaDB with configurable chunking and embedding
  • Document Management (backend/app/routers/documents.py): Upload, query, and manage documents across multiple collections with support for PDF, DOCX, XLSX, and text formats
  • SMB Browser (backend/app/routers/smb.py, backend/app/services/smb_browser.py): Browse and ingest documents from LAN SMB shares with recursive directory support
  • Authentication (backend/app/services/auth.py): API key generation, hashing, and validation for secure access control
  • Document Parser (backend/app/services/document_parser.py): Multi-format file parsing (PDF, DOCX, XLSX, code, config files, etc.)

MCP Server (SSE Transport)

  • MCP Protocol Implementation (mcp_server/server.py): Server-Sent Events (SSE) based MCP server providing tools for:
    • search_documents: Semantic search with relevance scoring
    • list_collections: Browse available document collections
    • list_documents: List documents in a collection
    • get_server_status: Server health and statistics
  • API key authentication via Authorization header
  • Session management for SSE connections

Frontend (React + Vite)

  • Dashboard (frontend/src/pages/Dashboard.jsx): System overview with document counts and collection statistics
  • Documents (frontend/src/pages/Documents.jsx): Upload files, manage collections, reindex documents
  • Search (frontend/src/pages/Search.jsx): Semantic search interface with result display
  • SMB Browser (frontend/src/pages/SMBBrowser.jsx): Browse SMB shares and ingest documents with path navigation
  • API Keys (frontend/src/pages/APIKeys.jsx): Create, revoke, and manage API keys for MCP access
  • MCP Config (frontend/src/pages/MCPConfig.jsx): Enable/disable MCP server and view connection details
  • Styling (frontend/src/index.css): Dark theme with comprehensive component styles (cards, forms, tables, badges, etc.)

Infrastructure

  • Docker Compose (docker-compose.yml): Three-service orchestration (backend, MCP server, frontend)
  • Deployment Script (deploy.sh): Automated setup with health checks and service validation
  • Configuration (backend/app/config.py): Environment-based settings with persistent JSON config storage
  • Nginx (frontend/nginx.conf): Reverse proxy for API requests and SPA routing

Notable Implementation Details

  • Semantic Search: Uses all-MiniLM-L6-v2 embedding model by default with configurable alternatives
  • Document Chunking: Configurable chunk size (512 tokens) with overlap (64 tokens) for context preservation
  • Multi-Collection Support: Organize documents into separate collections with independent indexing
  • SMB Integration: Native SMB/CIFS support for accessing Windows shares on LAN without copying files
  • API Key Security: Keys are hashed with SHA-256 and stored in persistent config; raw keys shown only once
  • SSE Transport: MCP server uses Server-Sent Events for streaming responses, compatible with cloud LLM platforms
  • Persistent Storage: ChromaDB vector store and document metadata persisted to /app/data/ volumes
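
The chunking described in the list above (512-token chunks with a 64-token overlap) amounts to a sliding window. The following is a minimal sketch only, not the code in rag_engine.py: the function name chunk_tokens is hypothetical, and the caller is assumed to supply an already tokenized list.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 64) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens.

    Hypothetical helper for illustration; defaults mirror the values
    quoted in the PR description.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input,
        # so no trailing chunk is fully contained in the previous one.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With these defaults each window advances by 448 tokens, so the last 64 tokens of one chunk reappear at the start of the next. That repetition is what preserves context across chunk boundaries.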

Deployment

The system is containerized and ready for Unraid deployment:

  • Backend API: Port 8900
  • MCP Server: Port 8901
  • Web UI: Port 8902

All services include health checks and automatic restart policies.

https://claude.ai/code/session_01ByAJeYptU8ZosBaBVw1rbd

claude added 3 commits April 2, 2026 05:24
Complete self-hosted system for document RAG with MCP server access:
- FastAPI backend with ChromaDB vector store and sentence-transformers
- MCP server (SSE + Streamable HTTP) with API key authentication
- React web UI for document management, search, SMB browsing, and config
- Docker Compose orchestration for Unraid deployment
- SMB share browsing and document ingestion from LAN shares

https://claude.ai/code/session_01ByAJeYptU8ZosBaBVw1rbd
- ChromaDB list_collections: handle 0.5.x API returning strings
- MCP SSE: replace dict-based response with asyncio.Queue (race condition)
- MCP streamable: return 204 for notifications (JSON-RPC compliance)
- SMB browser: remove unused low-level imports, improve list_shares error
- Frontend: fix header merge bug in request(), clipboard fallback for HTTP
- Remove unused deps (react-router-dom, lucide-react, markdown, etc.)
- Switch PyPDF2 to pypdf to avoid deprecation warnings
- Remove unused imports (os, shutil) from routers

https://claude.ai/code/session_01ByAJeYptU8ZosBaBVw1rbd
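
The SSE fix in the commit above (replacing a dict-based response with asyncio.Queue) can be illustrated with one queue per session. This is a sketch under assumptions, not the code in mcp_server/server.py: the names sessions, sse_events, and post_message are hypothetical, and a None sentinel is assumed as the close signal.

```python
import asyncio
from typing import AsyncIterator, Dict

# Hypothetical registry: one queue per SSE session id.
sessions: Dict[str, asyncio.Queue] = {}


async def sse_events(session_id: str) -> AsyncIterator[str]:
    """Yield SSE-formatted messages for a session until a None sentinel arrives."""
    queue = sessions.setdefault(session_id, asyncio.Queue())
    try:
        while True:
            message = await queue.get()  # waits without polling
            if message is None:
                break
            yield f"event: message\ndata: {message}\n\n"
    finally:
        # Drop the session when the stream ends or the client disconnects.
        sessions.pop(session_id, None)


async def post_message(session_id: str, payload: str) -> bool:
    """Enqueue a response for delivery on the session's SSE stream."""
    queue = sessions.get(session_id)
    if queue is None:
        return False  # unknown or already closed session
    await queue.put(payload)
    return True
```

A queue keeps every pending message in arrival order. A single per-session dict slot, by contrast, can be overwritten when two responses arrive before the stream reads the first, which is one plausible form of the race the commit mentions.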
- Move .dockerignore to per-service build contexts (backend, mcp, frontend)
- Add .gitkeep for data/chromadb/ and data/config/ dirs
- Fix .gitignore to track .gitkeep files in data dirs
- Fix nginx.conf: proxy /sse and /messages for full MCP access via frontend
- Fix SMB list_shares: use request body model (password was in URL query)
- Refactor SMB ingest: extract recursive logic into helper function
- Remove unused imports (Optional, struct, uuid, smbprotocol internals)
- Add deployment README with architecture, quick start, and API docs

https://claude.ai/code/session_01ByAJeYptU8ZosBaBVw1rbd
@crab182 crab182 marked this pull request as draft April 6, 2026 13:12
@crab182 crab182 marked this pull request as ready for review April 6, 2026 13:12
claude added 3 commits April 18, 2026 04:03
Modern Compose v2 emits a warning when the top-level 'version' key is
present. Removing it silences the warning with no behavior change.

https://claude.ai/code/session_01ByAJeYptU8ZosBaBVw1rbd
Pre-install torch from the PyTorch CPU index before sentence-transformers
pulls it in. Avoids ~3GB of unnecessary NVIDIA CUDA packages
(torch+nvidia-cublas+nvidia-cudnn+triton+...) that were exhausting the
default 50GB Unraid docker.img during build.

The all-MiniLM-L6-v2 embedding model runs on CPU with no perf concern
for this workload.

https://claude.ai/code/session_01ByAJeYptU8ZosBaBVw1rbd
Authentication & authorization
- All /api/ routes now require a valid API key via Depends()
- Two-tier keys: admin (manage keys/collections/SMB) vs read-only (query only)
- Bootstrap flow: first admin key can be created without auth; all subsequent
  operations require authentication
- hmac.compare_digest used throughout for constant-time key comparison
  (both backend auth.py and mcp_server/server.py)
- MCP server forwards the client's bearer token to the backend, so the
  same key controls access end-to-end
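
The key handling described above (SHA-256 hashing of stored keys and constant-time comparison with hmac.compare_digest) could look roughly like the following. It is a sketch using only the standard library; the function names are hypothetical and the real auth.py may be structured differently.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Tuple


def hash_api_key(raw_key: str) -> str:
    """Return the SHA-256 hex digest that would be persisted instead of the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key() -> Tuple[str, str]:
    """Create a random key; the raw value is shown to the user once, the hash is stored."""
    raw = secrets.token_urlsafe(32)
    return raw, hash_api_key(raw)


def is_valid_key(presented: str, stored_hashes: Iterable[str]) -> bool:
    """Check a presented key against every stored hash in constant time per comparison."""
    candidate = hash_api_key(presented)
    valid = False
    for stored in stored_hashes:
        # compare_digest does not short-circuit on the first differing character.
        if hmac.compare_digest(candidate, stored):
            valid = True
    return valid
```

The loop deliberately visits every stored hash rather than returning on the first match, so the time taken does not reveal which entry matched.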

Input validation
- Collection names restricted to [A-Za-z0-9_-]{1,64} via Pydantic validators
- Filenames sanitized through safe_filename() (NFKC normalize, regex allowlist,
  reject dotfiles and path separators)
- safe_join() enforces uploads/deletes stay within the documents base dir,
  closing a path-traversal vulnerability on /upload and /delete
- File uploads capped at 100 MB; empty files rejected
- Query n_results bounded 1..50; query string length bounded
- APIKeyCreate name restricted with regex and length
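
The validation rules listed above can be sketched with the standard library alone. safe_filename and safe_join are named in the commit message, but the bodies below are illustrative guesses rather than the PR's code. In particular, FILENAME_RE is an assumed allowlist, and is_valid_collection is a hypothetical stand-in for the Pydantic validator.

```python
import re
import unicodedata
from pathlib import Path

COLLECTION_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")
# Assumed allowlist; the PR does not state the exact filename pattern.
FILENAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._ -]{0,254}")


def is_valid_collection(name: str) -> bool:
    """Accept only names matching [A-Za-z0-9_-]{1,64} in full."""
    return COLLECTION_RE.fullmatch(name) is not None


def safe_filename(name: str) -> str:
    """Normalize to NFKC first, then reject separators, dotfiles, and odd characters."""
    normalized = unicodedata.normalize("NFKC", name)
    if "/" in normalized or "\\" in normalized:
        raise ValueError("path separators are not allowed")
    if normalized.startswith("."):
        raise ValueError("dotfiles are not allowed")
    if FILENAME_RE.fullmatch(normalized) is None:
        raise ValueError("filename contains disallowed characters")
    return normalized


def safe_join(base: Path, *parts: str) -> Path:
    """Join parts onto base and refuse any result that is not strictly inside base."""
    base_resolved = base.resolve()
    candidate = base_resolved.joinpath(*parts).resolve()
    if base_resolved not in candidate.parents:
        raise ValueError("path escapes the base directory")
    return candidate
```

Normalizing before checking matters because NFKC folds look-alike characters, such as a fullwidth solidus, into their ASCII forms. A check run on the raw string could miss them.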

Transport & CORS
- CORS allowlist via CORS_ALLOWED_ORIGINS env var (default: 192.168.1.52:8902)
- Methods/headers restricted to those actually used
- Origin-header check on MCP endpoints (DNS-rebinding / CSRF defense);
  non-browser clients (no Origin header) still allowed
- Security headers (X-Content-Type-Options, X-Frame-Options, Referrer-Policy,
  CSP, Permissions-Policy) added in nginx and via middleware on both services
- nginx server_tokens off
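
The Origin-header check described above reduces to a small predicate. This sketch assumes CORS_ALLOWED_ORIGINS is a comma-separated list and that entries carry a scheme, as browser Origin headers do. Both function names are hypothetical.

```python
from typing import Iterable, Optional, Set


def parse_allowed_origins(raw: str) -> Set[str]:
    """Parse a comma-separated origin list (assumed format) into a normalized set."""
    return {item.strip().rstrip("/").lower() for item in raw.split(",") if item.strip()}


def is_origin_allowed(origin: Optional[str], allowed: Iterable[str]) -> bool:
    """Allow requests without an Origin header; otherwise require an allowlist match."""
    if origin is None:
        # Non-browser clients (for example, MCP clients) send no Origin header.
        return True
    return origin.rstrip("/").lower() in set(allowed)
```

For example, with the allowlist parsed from "http://192.168.1.52:8902" (an assumed value that adds a scheme to the default quoted above), a request carrying Origin: http://evil.example would be rejected, while a request with no Origin header would pass through to API key authentication.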

Error handling
- Generic 500 handler returns "Internal server error"; full exceptions logged
  server-side only
- SMB routes no longer leak exception strings to clients
- MCP tool call failures return a neutral message

Container hardening
- Backend and MCP containers now run as non-root (uid 10001 / 10002)
- Backend purges build-essential after pip install (smaller surface)
- security_opt: no-new-privileges on all three services
- Memory limits set per-service
- MCP config volume mounted read-only (it only needs to validate keys)

Rate limiting
- slowapi added with a 120/minute default on the backend

Frontend
- API client reads Bearer token from localStorage and attaches it to every
  request and upload
- App.jsx gates the UI: bootstrap screen when no keys exist, login screen
  otherwise; Sign-out button clears the stored key
- API Keys page now lets the operator mark a key as admin or read-only

https://claude.ai/code/session_01ByAJeYptU8ZosBaBVw1rbd