Ruby knowledge graph

An event-sourced knowledge graph engine written in Ruby on Rails. It takes unstructured text (conference transcripts, meeting notes, sales calls, book-club discussions, …) and turns it into a navigable graph of typed nodes and relations, using an LLM as the extractor.

Every run of an LLM extraction is a first-class event, so the entire graph can be rebuilt from the audit log at any time. Individual nodes and their relations are derived read-models.
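As a rough illustration of what "rebuilt from the audit log" means, the sketch below replays an ordered list of events into a fresh graph. It is plain Ruby and does not use the project's classes: the `Event` struct, the `rebuild_graph` helper, the `key` field and the sample data are all hypothetical, and only the event names come from this README.

```ruby
# Hypothetical event shape; the real project stores events in Rails Event Store.
Event = Struct.new(:type, :data, keyword_init: true)

# Rebuilds nodes and edges purely from the ordered event log.
def rebuild_graph(events)
  graph = { nodes: {}, edges: [] }
  events.each do |event|
    # Only extraction results contribute to the graph in this sketch.
    next unless event.type == "KnowledgeExtracted"

    event.data.fetch(:nodes, []).each do |node|
      # A later extraction merges into an existing node with the same key.
      existing = graph[:nodes][node[:key]] || {}
      graph[:nodes][node[:key]] = existing.merge(node)
    end
    event.data.fetch(:edges, []).each do |edge|
      # Skip edges that were already recorded by an earlier event.
      graph[:edges] << edge unless graph[:edges].include?(edge)
    end
  end
  graph
end

log = [
  Event.new(type: "TranscriptIngested", data: { title: "Example talk" }),
  Event.new(type: "KnowledgeExtracted", data: {
    nodes: [
      { key: "person:example", kind: "person", name: "Example Speaker" },
      { key: "talk:example", kind: "talk", name: "Example talk" }
    ],
    edges: [{ from: "person:example", to: "talk:example", type: "presented" }]
  })
]

graph = rebuild_graph(log)
puts graph[:nodes].keys.inspect
```

Because the graph is a pure function of the log, dropping the read-models and replaying the same events gives the same nodes and edges.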

The live demo at https://wrocloverb.rubygraph.dev is configured for the wroc_love.rb conference — talks, speakers, tools, events, takeaways, etc. The engine itself is domain-agnostic: by swapping the ontology, prompts and content sources you can point it at any other corpus.

Stack

Ruby 3.4, Rails 8.1
Postgres 17 with pgvector for semantic search on nodes
Rails Event Store (RES) for the event log and handlers
Solid Queue (running inside Puma in production) for background jobs
RubyLLM against Anthropic (Claude Opus 4.7 in prod, Haiku 4.5 in dev) for extraction; Ollama (qwen3-embedding:4b) for local embeddings
MCP server exposing the graph to AI assistants (/mcp)
Thruster + Puma in production, Kamal 2 for deployment

Architecture

TranscriptIngested ──► BuildIngestion ──► Ingestion read-model
                   └─► RequestExtraction
ExtractionRequested ──► BuildExtraction ──► Extraction read-model
                    └─► ExtractKnowledge (job, calls Claude with tool-use)
KnowledgeExtracted ──► BuildKnowledgeGraph ──► Node / Edge read-models
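
The fan-out in the diagram above can be sketched with a tiny in-memory publish/subscribe bus. This is only a stand-in to show the shape of the flow: `TinyBus` and `wire_pipeline` are hypothetical names, the handlers are simplified to blocks, and the real project uses Rails Event Store and a background job for the extraction step.

```ruby
# Minimal in-memory stand-in for an event bus (not Rails Event Store).
class TinyBus
  attr_reader :log

  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
    @log = []
  end

  def subscribe(event_type, &handler)
    @handlers[event_type] << handler
  end

  # Records the event, then calls every subscribed handler synchronously.
  def publish(event_type, data)
    @log << [event_type, data]
    @handlers[event_type].each { |handler| handler.call(data) }
  end
end

# Wires handlers that mirror the diagram and returns the read-model store.
def wire_pipeline(bus)
  read_models = { ingestions: [], extractions: [], nodes: [] }

  # TranscriptIngested: build the read-model and request an extraction.
  bus.subscribe(:TranscriptIngested) { |d| read_models[:ingestions] << d[:id] }
  bus.subscribe(:TranscriptIngested) do |d|
    bus.publish(:ExtractionRequested, { ingestion_id: d[:id] })
  end

  # ExtractionRequested: build the read-model and run the extraction.
  bus.subscribe(:ExtractionRequested) { |d| read_models[:extractions] << d[:ingestion_id] }
  bus.subscribe(:ExtractionRequested) do |d|
    # Stand-in for the ExtractKnowledge job; a real run would call the LLM here.
    bus.publish(:KnowledgeExtracted, { ingestion_id: d[:ingestion_id], nodes: ["talk:example"] })
  end

  # KnowledgeExtracted: update the graph read-models.
  bus.subscribe(:KnowledgeExtracted) { |d| read_models[:nodes].concat(d[:nodes]) }

  read_models
end

bus = TinyBus.new
read_models = wire_pipeline(bus)
bus.publish(:TranscriptIngested, { id: 1 })
puts bus.log.map(&:first).inspect
```

Each event has one handler that maintains a read-model and one that moves the pipeline forward, which is why a single ingested transcript ends up producing three events in the log.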

The extraction prompt is composed from a shared template plus a per-kind slice (talk, panel, lightning-talks, …) and a per-format slice. The ontology (node kinds, relation types, attribute schemas) lives in config/ontology.yml.
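
A minimal sketch of that composition, assuming the slices are simply concatenated in the order shared template, kind slice, format slice. The `compose_prompt` helper and the in-memory hashes are hypothetical; in the repository the slices are markdown files, and the actual joining logic may differ.

```ruby
# Hypothetical helper: joins the shared template with one kind slice and one format slice.
def compose_prompt(shared:, kinds:, formats:, kind:, format:)
  kind_slice = kinds.fetch(kind) { raise ArgumentError, "unknown kind: #{kind}" }
  format_slice = formats.fetch(format) { raise ArgumentError, "unknown format: #{format}" }

  # Strip each part so the result has exactly one blank line between sections.
  [shared, kind_slice, format_slice].map(&:strip).join("\n\n")
end

# In-memory stand-ins for the prompt files.
kinds = { "talk" => "The input is a single-speaker talk.\n" }
formats = { "transcript" => "The input is a raw transcript.\n" }

prompt = compose_prompt(
  shared: "Extract typed nodes and relations.\n",
  kinds: kinds,
  formats: formats,
  kind: "talk",
  format: "transcript"
)
puts prompt
```

Failing loudly on an unknown kind or format is a design choice in this sketch: it surfaces a missing slice file before any LLM call is made.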

Adapting to your own domain

The current live instance is configured for a Ruby conference, but everything that makes it domain-specific lives in a handful of files. To point the engine at a different corpus:

| What | Where | What to change |
| --- | --- | --- |
| Node kinds, relation types, attrs | `config/ontology.yml` | Replace `person/talk/event/tool/...` with kinds that fit your domain (e.g. `customer/ticket/product/feature/...` for support tickets). Relations must also be updated, since they declare which source kinds may connect to which target kinds. |
| Per-kind prompt slices | `app/lib/prompts/kinds/*.md` | Each input's `kind` field selects one of these. Add a new file (`kinds/<your-kind>.md`) describing the structure of that kind of document. |
| Per-format prompt slices | `app/lib/prompts/formats/*.md` | Same idea for input format (`transcript`, `markdown`, …). |
| Seed domain data / starter nodes | `db/seeds.rb` + `NODES`/`EDGES` arrays | Replace hardcoded events/people/etc. with your own or empty the seed. |

Everything else (event sourcing, read-model builders, the extraction pipeline, MCP server, web UI) is generic and should not need changes.
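
To make the point about relations concrete, here is a sketch of how a relation that declares allowed source and target kinds could be checked. The YAML layout (`kinds`, `relations`, `from`, `to`), the relation names and the `edge_allowed?` helper are assumptions for illustration; the real schema of `config/ontology.yml` may look different.

```ruby
require "yaml"

# Assumed ontology shape for a support-ticket domain; not the project's actual file.
ONTOLOGY = YAML.safe_load(<<~ONTOLOGY_YAML)
  kinds: [customer, ticket, product]
  relations:
    opened:
      from: [customer]
      to: [ticket]
    concerns:
      from: [ticket]
      to: [product]
ONTOLOGY_YAML

# Returns true when the relation exists and permits this source/target pair.
def edge_allowed?(ontology, relation, source_kind, target_kind)
  rule = ontology.fetch("relations")[relation]
  return false if rule.nil?

  rule["from"].include?(source_kind) && rule["to"].include?(target_kind)
end

puts edge_allowed?(ONTOLOGY, "opened", "customer", "ticket")
puts edge_allowed?(ONTOLOGY, "opened", "ticket", "customer")
```

Under this assumed shape, renaming a kind without updating every relation that mentions it would silently reject edges, which is why kinds and relations need to be changed together.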

Getting started

Extraction requires ANTHROPIC_API_KEY in the environment. Set it in your shell before starting the app; for Dev Containers, put it in .env or export it before opening the project.

Dev Containers (recommended)

Open the project in VS Code or any IDE with Dev Containers support. The .devcontainer spins up the Rails app, Postgres (with pgvector), and Ollama; models are pulled on first boot.

# inside the container
bin/setup          # bundle + db:prepare + db:seed
bin/dev            # starts Rails

Visit http://localhost:3000.

Locally without Dev Containers

You need Ruby 3.4.7, Postgres 17 with pgvector, and an Ollama instance with the embedding model pulled, reachable via OLLAMA_URL:

ollama pull qwen3-embedding:4b

Then:

bundle install
bin/rails db:prepare
bin/rails db:seed    # ingests files from transcripts/, skips auto-extraction
bin/rails server

Seeding only ingests the raw transcripts; it does not kick off LLM extractions. The initializer detects db:seed and disables the auto-extraction subscription, so you can trigger runs manually afterwards.
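
One possible way to express that switch is sketched below. How the project's initializer actually detects `db:seed` is not shown in this README, so the `subscriptions_for` helper and its task-list argument are hypothetical; in a Rails process the list could come from `Rake.application.top_level_tasks`, which is an assumption here. Only the handler names are taken from the architecture diagram.

```ruby
# Hypothetical helper: returns the handlers to subscribe for each event type.
# top_level_tasks is the list of rake task names the process was started with.
def subscriptions_for(top_level_tasks)
  subscriptions = { "TranscriptIngested" => ["BuildIngestion"] }

  # Skip the handler that triggers LLM extraction while seeding.
  unless top_level_tasks.include?("db:seed")
    subscriptions["TranscriptIngested"] << "RequestExtraction"
  end

  subscriptions
end

puts subscriptions_for(["db:seed"]).inspect
puts subscriptions_for([]).inspect
```

The read-model builder stays subscribed in both cases, so seeded transcripts still show up as ingestions; only the step that would call the LLM is left out.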

Testing

bundle exec rspec

External network calls are stubbed with WebMock.

MCP

The MCP endpoint is mounted at /mcp. OAuth discovery is disabled on the public demo (.well-known/oauth-authorization-server is commented out), so clients can connect without credentials. Re-enable it in config/routes.rb and app/lib/mcp_rack_app.rb for private deployments.
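
For a quick manual check of the endpoint, the sketch below builds (without sending) a JSON-RPC `initialize` request of the kind an MCP client would POST. It assumes the server speaks the MCP streamable HTTP transport; the `build_mcp_initialize` helper, the protocol version string and the client name are illustrative assumptions, not values taken from this repository.

```ruby
require "json"
require "net/http"
require "uri"

# Builds, but does not send, an MCP "initialize" request for the /mcp endpoint.
def build_mcp_initialize(base_url)
  uri = URI.join(base_url, "/mcp")

  payload = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26", # assumed; use the version your client supports
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.1.0" } # hypothetical client
    }
  }

  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  # Streamable HTTP clients advertise that they accept both JSON and SSE responses.
  request["Accept"] = "application/json, text/event-stream"
  request.body = JSON.generate(payload)
  request
end

request = build_mcp_initialize("http://localhost:3000")
puts request.path
```

To actually send it against a locally running app, pass the request to `Net::HTTP.start("localhost", 3000) { |http| http.request(request) }`.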

License

The code in this repository is licensed under the MIT License. Transcripts under transcripts/ belong to their respective speakers; they are included under fair use for research and conference archival purposes.
