Original Layered Memory Algorithm · Precision Attention Control · Commander-Level Prompts · Multi-AI Engine · Bot · IDE
A cross-generational multi-AI engine collaboration platform
Make AI truly remember you.
This entire project — design, architecture, and development — was completed independently by a university student using AI-assisted programming, drawing on algorithm design, biomimicry principles, framework architecture, and logical thinking.
Chat interface with fine-tuned controls, adaptable to various beautification styles
Whether it's AI coding tools (Cursor, Copilot), AI chat applications (ChatGPT, Claude), or AI roleplay platforms (SillyTavern), they all face the same underlying limitations:
| Problem | Current State | Consequence |
|---|---|---|
| Limited context window | Even 128K-1M tokens overflow in long conversations | Early messages get truncated; AI loses critical information |
| Attention degradation | The longer the context, the less the model focuses on each segment | Even if information exists in context, AI may "overlook" it |
| No persistent memory | Closing a conversation = forgetting everything | Every new session starts from zero |
beilu-always accompany breaks through all three limitations at the fundamental level — not by working around them, but by solving them algorithmically.
Modeled on the memory-formation mechanism of the human hippocampus and the Ebbinghaus forgetting curve, it achieves theoretically unlimited AI memory. No external database or vector storage dependency — pure JSON files, purely prompt-driven.
🔥 Hot Memory Layer — Injected every turn
User profile / Permanent memories Top-100 / Pending tasks / Recent memories about user
🌤️ Warm Memory Layer — On-demand retrieval, last 30 days
Daily summaries / Archived temporary memories / Monthly index
❄️ Cold Memory Layer — Deep retrieval, beyond 30 days
Monthly summaries / Historical daily summaries / Yearly index
Additionally, an L0 Memory Table Layer (10 highly customizable tables, fully injected every turn as CSV) provides structured immediate context.
| Metric | Value |
|---|---|
| Hot layer injection per turn | ~7,000-11,000 tokens (only 5-9% of a 128K window) |
| Retrieval AI context | <5,000 tokens (100% attention focused on retrieval) |
| P1 retrieval efficiency | Max 3 rounds to hit target (BM25 pre-filtering + regex exact match) |
| Retrieval tech stack | BM25 + Regex Search (dual-engine collaboration, zero external deps) |
| Storage cost | Zero (pure JSON files, no database dependency) |
| Single-character sustained operation | 12+ years (at 5,000 files) |
| Theoretical duration | 260+ years (at 100,000 files; NTFS/ext4 support far exceeds this) |
score = weight × (1 / (1 + days_since_triggered × 0.1))
Inspired by the Ebbinghaus forgetting curve: important and recently triggered memories are prioritized for injection, rather than being injected in simple chronological order.
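The decay formula above can be sketched in plain JavaScript (the field names `weight` and `days` are illustrative — the actual memory file format may differ):

```javascript
// Priority score from the formula above: weight decays with idle days.
function memoryScore(weight, daysSinceTriggered) {
  return weight * (1 / (1 + daysSinceTriggered * 0.1));
}

// Rank memories for hot-layer injection: highest score first.
function rankMemories(memories) {
  return [...memories].sort(
    (a, b) => memoryScore(b.weight, b.days) - memoryScore(a.weight, a.days)
  );
}
```

With the 0.1 decay factor, a memory's score halves after 10 idle days, so a heavy but stale memory can still be outranked by a lighter, recently triggered one.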
The most critical design feature of the memory system: all memory injection, retrieval, archival, and summarization operations are performed by AI through prompts, not traditional hardcoded logic.
This means:
- Table meanings and purposes can be changed anytime: Simply modify the prompt descriptions for tables, and the AI will interpret and operate them accordingly — no code changes needed
- Archival strategies are instantly adjustable: P2-P6 behaviors are entirely defined by prompts; modifying prompts changes archival rules, summary formats, and retrieval strategies
- Zero technical barrier for migration: Users can edit prompts themselves to adapt to different scenarios (roleplay / coding assistant / game NPC) without programming skills
- Naturally avoids technical debt: No complex parsers or state machines to maintain — the AI itself is the most flexible "parser"
This is the project's most critical design advantage. Traditional approaches stuff all memories into a single AI context, causing attention to degrade rapidly as context grows. We solve this completely through AI role separation + dual-engine pre-filtering:
Traditional: [All historical memory + current chat] → Single AI → Attention scattered
↓
Our approach: [Index] → Retrieval AI (focused on finding) → [Selected memory + current chat] → Reply AI (focused on quality)
| AI Role | Context Content | Context Length | Attention Distribution |
|---|---|---|---|
| Retrieval AI (Gemini 2.0/2.5 Flash) | User message + index files | <5K tokens | 100% focused on finding relevant memory |
| Reply AI (user's choice) | Selected memory (~10K) + chat | As needed | 100% focused on reply quality |
The Reply AI only sees precisely filtered memory fragments from the Retrieval AI. The context is clean, the signal-to-noise ratio is extremely high, and attention never degrades.
- BM25 coarse filtering: TF-IDF-based statistical algorithm that quickly filters the most relevant candidates from massive memory files. Pure JS implementation, zero external dependencies
- Regex precise targeting: After BM25 coarse filtering, regex provides exact matching — supports pattern matching, keyword combinations, fuzzy search
- Dual-engine collaboration effect: P1 retrieval optimized from 5 rounds down to max 3 — 40% faster, 40% cheaper on API costs
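A minimal sketch of the two-stage pipeline — BM25 coarse filtering followed by a regex pass — in dependency-free JavaScript. The `k1`/`b` defaults and the `topN` cutoff are common choices, not values taken from the project source:

```javascript
// Stage 1: BM25 coarse filter — score documents against query terms.
function bm25Rank(docs, queryTerms, { k1 = 1.5, b = 0.75, topN = 5 } = {}) {
  const tokenized = docs.map(d => d.toLowerCase().split(/\s+/));
  const avgLen = tokenized.reduce((s, t) => s + t.length, 0) / docs.length;
  const N = docs.length;
  // Document frequency per term.
  const df = new Map();
  for (const terms of tokenized) {
    for (const t of new Set(terms)) df.set(t, (df.get(t) || 0) + 1);
  }
  const scored = tokenized.map((terms, i) => {
    let score = 0;
    for (const q of queryTerms) {
      const n = df.get(q) || 0;
      if (!n) continue;
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      const tf = terms.filter(t => t === q).length;
      score += idf * (tf * (k1 + 1)) /
        (tf + k1 * (1 - b + b * terms.length / avgLen));
    }
    return { i, score };
  });
  return scored.sort((a, b) => b.score - a.score).slice(0, topN).map(s => docs[s.i]);
}

// Stage 2: regex exact match, applied only to the BM25 candidates.
function dualEngineSearch(docs, queryTerms, pattern) {
  return bm25Rank(docs, queryTerms).filter(d => pattern.test(d));
}
```

The key cost saving is that the expensive exact-match pass runs over a handful of candidates instead of the whole memory store.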
The preset engine takes full control of all module outputs through the TweakPrompt three-round mechanism, dictating the overall direction of AI replies. This isn't simple System Prompt stacking — it's a complete message orchestration engine:
[beforeChat] — Preset header (system presets, defining AI core behavior)
[injectionAbove] — @D≥1 injection (world book + memory + plugin data)
[chatHistory] — Actual conversation history
[injectionBelow] — @D=0 injection (P1 retrieval results + real-time data)
[afterChat] — Preset footer (final instructions, e.g. jailbreak)
Round 1 (dl=2): Collect — Gather content from all modules, clear raw data, preset takes full control
Round 2 (dl=1): Rebuild — Construct 5-segment message structure, process world book/memory injection, macro replacement
Round 3 (dl=0): Snapshot — Capture debug snapshot for prompt viewer
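The Round-2 rebuild can be pictured as assembling one message array from the five segments above (the function shape and the `slot` field are illustrative — only the five segment names come from the text):

```javascript
// Build the 5-segment message structure from already-collected content.
function assembleMessages({ presetHeader, injectionAbove, chatHistory, injectionBelow, presetFooter }) {
  return [
    { role: 'system', slot: 'beforeChat', content: presetHeader },
    { role: 'system', slot: 'injectionAbove', content: injectionAbove }, // @D>=1: world book + memory
    ...chatHistory.map(m => ({ ...m, slot: 'chatHistory' })),
    { role: 'system', slot: 'injectionBelow', content: injectionBelow }, // @D=0: P1 results
    { role: 'system', slot: 'afterChat', content: presetFooter },
  ].filter(m => m.content); // empty segments are simply dropped
}
```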
P1 Retrieval AI doesn't just retrieve memories — it analyzes conversation intent in real-time and automatically switches to the most suitable prompt preset:
- Multi-mode adaptation: Casual chat, roleplay, coding, prompt engineering… the AI automatically switches to the optimal preset based on conversation content, with prompts and COT (Chain of Thought) changing accordingly
- Seamless experience: No manual intervention needed — just say "help me write code" and the AI's behavior mode adjusts in real-time
- Cooldown anti-oscillation: Built-in cooldown counter (default 5 turns) prevents rapid repeated switching
- Fully customizable: Switching logic is guided by COT in prompts; users can define their own switching conditions and strategies
This means AI is no longer "one preset fits all" — it dynamically adapts to the optimal behavior mode based on the current context, making it a truly multi-mode intelligent agent.
{{char}} / {{user}} / {{tableData}} / {{hotMemory}} / {{chat_history}} / {{lastUserMessage}} / {{presetList}} / {{current_date}} / {{time}} / {{date}} / {{weekday}} / {{idle_duration}} / {{lasttime}} + custom macro variables
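A macro expander for placeholders in this style can be sketched in a few lines (leaving unknown macros untouched is an assumption here — the project's actual resolution rules may differ):

```javascript
// Replace {{name}} placeholders with values from a lookup table.
function expandMacros(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in vars ? String(vars[name]) : match); // unknown macros pass through
}
```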
📊 Highly Customizable Memory Tables — Solving the AI RP God's-Eye Problem + Universal Scenario Adaptation
10 fully customizable structured tables, injected every turn as CSV. Table meanings and purposes are entirely defined by prompts — change the prompt descriptions and the same table system serves completely different scenarios:
| Scenario | Example Table Usage |
|---|---|
| AI Roleplay | Space-time settings / Character status / Social relations / Quest progress / Inventory — AI only knows what's recorded in the tables, perfectly solving the god's-eye problem |
| Programming | Architecture decisions / Code conventions / Module dependencies / Bug tracking / TODO lists |
| Work Management | Project progress / Meeting notes / Contacts / To-do items / Knowledge accumulation |
| Gaming | Character attributes / Equipment list / Skill trees / World state / NPC relationships |
Core value for AI RP: Tables achieve information isolation — the AI can only see information explicitly recorded in the tables. Information the character doesn't know won't appear in the tables, fundamentally eliminating the "god's-eye view" problem.
Table contents are automatically maintained by the Chat AI via <tableEdit> tags — the AI autonomously decides when to update which data during conversation, with no manual intervention needed.
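As a sketch, extracting and stripping `<tableEdit>` blocks from a reply might look like this (the JSON payload inside the tags is a hypothetical example — the project's actual command format is not shown here):

```javascript
// Collect the raw content of every <tableEdit> block in an AI reply.
function extractTableEdits(reply) {
  const edits = [];
  const re = /<tableEdit>([\s\S]*?)<\/tableEdit>/g;
  let m;
  while ((m = re.exec(reply)) !== null) edits.push(m[1].trim());
  return edits;
}

// What the user sees: the reply with maintenance tags removed.
function stripTableEdits(reply) {
  return reply.replace(/<tableEdit>[\s\S]*?<\/tableEdit>/g, '').trim();
}
```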
World book entries support 3 activation modes — the dynamic mode can read real-time data from memory tables to decide whether to inject a world setting:
| Mode | Trigger Condition | Use Case |
|---|---|---|
| Always-on | Unconditional activation, injected every turn | Core worldview, immutable settings |
| Regex | Keyword matching against chat history | Triggered when specific places/people mentioned |
| Dynamic | Reads values/states from memory tables, conditional | Affection thresholds, quest progress, item effects |
Dynamic injection examples:
→ When affection > 80 in Table #2 "Social Relations", inject "Special dialogue unlocked" setting
→ When main quest = "Chapter 3" in Table #3 "Quest Progress", inject corresponding world description
→ When a specific item exists in Table #5 "Inventory", inject the item's usage effects
This table-driven dynamic world-building mechanism turns world settings from static text into a living system that evolves with conversation progress and character state. Easy to learn — modifying table data instantly affects AI behavior.
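The three activation modes could be evaluated roughly like this (the entry shape, the `check` predicate, and the table addressing are assumptions for illustration only):

```javascript
// Decide whether a world book entry should be injected this turn.
function shouldInject(entry, tables, chatText) {
  switch (entry.mode) {
    case 'always':
      return true; // unconditional, injected every turn
    case 'regex':
      return new RegExp(entry.pattern).test(chatText); // keyword match on chat
    case 'dynamic': {
      // Read a cell from the memory tables and test a condition on it.
      const cell = tables[entry.table]?.[entry.row]?.[entry.column];
      return entry.check(cell); // e.g. v => Number(v) > 80
    }
    default:
      return false;
  }
}
```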
Built on the core technologies above, we've constructed a complete integrated platform ecosystem:
The system has 7 built-in AI roles, each with a dedicated responsibility. Each conversation calls only 2 AIs (Retrieval AI + Chat AI); the rest trigger on demand, so API usage stays low:
| AI | Role | Trigger | Usage Notes |
|---|---|---|---|
| Chat AI | Conversation with users, file operations | User sends a message | Called every conversation |
| P1 Retrieval AI | Search relevant history from memory (up to 3 rounds) + Smart Preset Switching | Automatic per turn | Called every turn, can use free AI |
| P2 Archive AI | Summarize and archive when temporary memories exceed threshold | ~50 conversations per trigger | Extremely infrequent |
| P3 Daily Summary AI | Generate detailed daily summary | Manual trigger | Only when user clicks |
| P4 Hot→Warm AI | Move expired hot-layer memories to warm layer | Manual trigger | Only when user clicks |
| P5 Monthly Summary AI | Warm→Cold archival, generate monthly summaries | Manual trigger | Only when user clicks |
| P6 Repair AI | Check and fix memory file format issues | Manual trigger | Only when user clicks |
Key insight: The Retrieval AI (P1) only needs to "find memories" — it doesn't require a high-intelligence model. It can run entirely on free or ultra-low-cost AI (e.g., Gemini 2.0/2.5 Flash free tier).
AI calls per conversation:
① P1 Retrieval AI — Find memories + determine preset switching (can use free AI like Gemini Flash)
② Chat AI — Generate reply based on selected memories (use any model you prefer)
Infrequent AI:
③ P2 Archive — Triggers roughly once per 50 conversations, barely noticeable
④ P3-P6 — All manual trigger, zero usage unless you click
Bottom line: If P1 uses free AI (Gemini Flash free tier is more than enough), then the actual cost per conversation = only one Chat AI call. The memory system runs at virtually zero cost.
- Left panel: Preset management / World book binding / Persona selection / Character editing
- Center panel: Chat / File editor / Memory management — three-tab switching
- Right panel: AI settings / Feature toggles / Memory AI operation panel
The IDE includes built-in BM25 + Regex Search dual-engine file retrieval for quick project-wide file and content search — enter keywords or regular expressions for instant results. The AI can directly read and write user project files via the beilu-files plugin.
Deploy your AI to Discord channels and chat with your AI anytime, anywhere:
- Full memory access: The Discord Bot shares the local memory system — your AI remembers your history even on Discord
- Visual management panel: Bot Token, Owner, message depth and other common settings displayed as form controls — no manual JSON editing needed
- Real-time message log: View Bot's sent/received messages in real-time on the management interface (user messages / AI replies / error logs)
- Multi-channel support: Bot can work in multiple Discord channels simultaneously, maintaining independent context per channel
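Per-channel isolation can be sketched as a small context store keyed by channel ID (the class and entry shape are illustrative; `maxDepth` stands in for the "message depth" setting mentioned above):

```javascript
// Independent conversation context per Discord channel.
class ChannelContexts {
  constructor(maxDepth = 20) {
    this.maxDepth = maxDepth;
    this.byChannel = new Map();
  }
  push(channelId, role, content) {
    if (!this.byChannel.has(channelId)) this.byChannel.set(channelId, []);
    const log = this.byChannel.get(channelId);
    log.push({ role, content });
    if (log.length > this.maxDepth) log.shift(); // keep only the newest turns
  }
  history(channelId) {
    return this.byChannel.get(channelId) ?? [];
  }
}
```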
- Direct import of SillyTavern format character cards, presets, and world books
- Support for Risu formats (ccv3 / charx / rpack)
- MVU variable system + EJS template rendering (SillyTavern helper script compatibility layer)
- 14 AI service generators (proxy / gemini / claude / claude-api / ollama / grok, etc.)
Preset engine / Memory system / File operations / Desktop screenshot / Browser awareness / Logger / Feature toggles / Multi-AI collaboration / Regex beautification / World book / Web search / System info / MVU variables / EJS templates / GraphRAG knowledge graph / Vector database / User-level plugin host / Plugin container
Management home page supports 4 languages (Chinese / English / Japanese / Traditional Chinese) via a "translation overlay" approach.
- 12-module diagnostic logs: Enable/disable per module independently, zero performance overhead when disabled
- Console interception: Automatically captures all `console.log/warn/error/info` calls from both frontend and backend into a 500-entry ring buffer
- One-click export: Click "📦 One-Click Pack Logs" to generate a single JSON file — just attach it when reporting issues
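The console interception described above can be sketched as wrapping the console methods and keeping only the newest entries in a bounded buffer (the entry shape is illustrative; the 500-entry default matches the text):

```javascript
// Wrap console methods and record the last `capacity` entries.
function interceptConsole(capacity = 500) {
  const buffer = [];
  for (const level of ['log', 'warn', 'error', 'info']) {
    const original = console[level].bind(console);
    console[level] = (...args) => {
      buffer.push({ level, time: Date.now(), args: args.map(String) });
      if (buffer.length > capacity) buffer.shift(); // ring-buffer behavior
      original(...args); // still print normally
    };
  }
  return { snapshot: () => [...buffer] };
}
```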
| Dimension | ChatGPT etc. | beilu-always accompany |
|---|---|---|
| Memory | Simple summaries / conversation history | Three-layer graded + BM25/Regex dual-engine retrieval + multi-AI collaboration, theoretically unlimited |
| Attention | Degrades as context grows | Retrieval AI pre-filters; Reply AI attention stays focused |
| Customization | Limited System Prompt | Full preset system + 10 customizable memory tables + dynamic world book injection |
| Data ownership | Server-side storage | Local JSON files, fully self-owned |
| Cross-platform | Official clients only | Web + Discord Bot, AI serves you on multiple platforms |
| Dimension | Cursor etc. | beilu-always accompany |
|---|---|---|
| Project memory | Based on current file context | Cross-session persistent memory (architecture decisions, code conventions, historical discussions) |
| Multi-AI collaboration | Single model | 7 AIs with dedicated roles; retrieval/summary/reply separated |
| Memory cost | Relies on large context windows | ~10K tokens covers the hot layer |
| File search | IDE built-in | BM25 + Regex Search dual engine + IDE file tree |
| Dimension | SillyTavern | beilu-always accompany |
|---|---|---|
| Memory | No built-in memory system | Original three-layer memory + BM25/Regex dual-engine retrieval + 6 auxiliary AIs |
| God's-eye problem | No solution | Memory table information isolation — AI only knows what's in the tables |
| File operations | None | Built-in IDE file management + AI file operations |
| Desktop capability | None | beilu-eye desktop screenshot → AI recognition |
| Cross-platform | Web only | Web + Discord Bot |
| Preset compatibility | Native | Fully compatible with ST presets/character cards/world books |
Even when context windows expand to 10M+ tokens, layered memory remains valuable:
- Attention problems won't disappear: No matter how large the window, model attention on massive text will still degrade. Pre-filtering + precise injection will always outperform "stuff everything in."
- Cost efficiency: Larger windows = higher costs. Replacing 100K+ tokens of full history with ~10K tokens of selected memory cuts API token costs to roughly a tenth or less.
- Structured > Unstructured: Tabular memory is easier for AI to accurately read and update than information scattered across conversations.
Layered memory is not a temporary workaround for limited context windows — it is a superior paradigm for information organization.
- Original three-layer memory algorithm (pure prompt-driven) — Permanent memory, theoretically unlimited
- Precision attention control — BM25 + Regex Search dual-engine pre-filtering
- Commander-level prompt system — 5-segment message structure + TweakPrompt three-round mechanism
- Smart Preset Switching System — P1 real-time context analysis with auto preset switching, multi-mode adaptive COT
- Highly customizable memory tables (10 tables) — Solving the god's-eye problem + universal scenario adaptation
- World book dynamic injection — 3 activation modes (always-on / regex / dynamic), table-driven living world
- Multi-AI collaboration engine (7 AI roles, P1 can use free AI)
- 🆕 Smart Retrieval System — BM25 + Regex Search dual engine, P1 retrieval from 5 rounds down to max 3 (40% faster, 40% cheaper)
- 🆕 Discord Bot — Cross-platform AI service with visual management panel + real-time message log
- 🆕 Browser Page Awareness — Passive + on-demand architecture, userscript monitors DOM changes and reports page snapshots
- 🆕 Claude API Full Adaptation — Native Anthropic format, prefill modes, Extended Thinking toggle
- 🆕 User-Level Plugin Host (M1) — Scans user directory, supports Python/Node/executable child processes, receives external plugin push data via HTTP routes
- IDE-style interface with file operations (including BM25 + Regex Search dual-engine file retrieval)
- Desktop screenshot system (beilu-eye)
- Rendering engine (SillyTavern helper script compatibility layer)
- Management home page i18n (Chinese / English / Japanese / Traditional Chinese)
- 18 feature plugins
- Full-stack diagnostic framework with one-click log export
- More platform Bot integrations
- Plugin ecosystem (Workshop-style high extensibility)
- Live2D integration + AI-controlled models
- AI game engine (chat interface = game interface, code-compatible, userscript-friendly)
- TTS / Text-to-image integration
- VSCode extension compatibility
- Highly extensible core architecture
- Deno runtime
- Modern browser (Chrome / Edge / Firefox)
- At least one AI API key (Gemini API recommended — free tier available)
```bash
# Clone the project
git clone https://github.com/beilusaiying/always-accompany.git
cd always-accompany

# Launch (Windows)
run.bat

# Launch (Linux/macOS)
chmod +x run.sh
./run.sh
```

After launch, open your browser and navigate to http://localhost:1314
- Configure AI source: Home → System Settings → Add AI service source (proxy / gemini / claude, etc.)
- Import character card: Home → Usage → Import (supports SillyTavern PNG/JSON format)
- Configure memory presets: Home → Memory Presets → Set up API for P1-P6 (recommend P1 using Gemini Flash free tier)
- Start chatting: Click a character card to enter the chat interface
- Automatic operation: Memory tables are automatically maintained by the Chat AI (via `<tableEdit>` tags); the Retrieval AI (P1) triggers automatically each turn
- Manual operations: Chat interface right panel → Memory AI Operations → P2-P6 manual buttons
- Daily archival: At the end of each day, click the "End Today" button to trigger the 9-step daily archival process
- Memory browsing: Chat interface → Memory Tab → Browse/edit/import/export memory files
- Create a Bot application on Discord Developer Portal
- In beilu-chat interface → Bot tab at the top → Enter Bot Token and Owner username
- Click "Start Bot" → @your Bot in a Discord channel to start chatting
| Component | Technology |
|---|---|
| Runtime | fount (based on Deno) |
| Backend | Node.js compatibility layer + Express-style routing |
| Frontend | Vanilla JavaScript (ESM modules) |
| AI integration | 14 ServiceGenerators (OpenAI / Claude / Gemini / DeepSeek / Ollama, etc.) |
| Smart retrieval | BM25 + Regex Search dual engine (pure JS, zero deps) |
| Desktop screenshot | Python (mss + tkinter + pystray) |
| Cross-platform | discord.js v14 |
| Storage | Pure JSON file system |
Discussion, resource sharing, prompt exchange, bug reports — come join us!
The project includes a carefully crafted P1-P6 Memory AI prompt preset, ready to use out of the box:
beilu-presets_2026-02-23.json — Complete prompt configurations for P1 Retrieval AI, P2 Archive AI, P3 Daily Summary AI, P4 Hot→Warm AI, P5 Monthly Summary AI, and P6 Repair AI
How to use: Home → Memory Presets → Click "Import" → Select this JSON file to import all presets in one click.
We welcome everyone to participate in building this project! You can:
- 🃏 Share character cards — Create and publish your character cards to enrich the community
- 📝 Publish prompt presets — Share your tuned memory presets and chat presets to help others
- 🌍 Contribute world books — Build world settings for other users to import
- 🐛 Report bugs — Use the one-click log export feature and attach the diagnostic report
- 💡 Suggest features — Feature requests, UI improvements, plugin ideas — all welcome
- 🔧 Contribute code — Fork & PR, let's build together
The community has many more great prompts and character card resources — feel free to explore and share!
This project would not be possible without the contributions of the following open-source projects and communities:
- fount — The foundational framework providing AI message handling, service source management, module loading, and other core infrastructure, saving significant development time on low-level implementation
- SillyTavern — The pioneering project in AI roleplay, whose preset format, character card specification, and world book system have become community standards. This project is fully compatible with its ecosystem
- SillyTavern Plugin Community — Thanks to all open-source plugin authors for their exploration and sharing. Their work on rendering engines, memory enhancement, and feature extensions provided valuable references and inspiration for this project's design
🖥️ IDE AI Editor — VSCode-inspired, easy to get started
IDE-style AI coding and file editing interface, inspired by VSCode for a familiar experience. Plugin integration and management coming soon.
If you're unfamiliar with AI coding or are a beginner, please use the designated sandbox space for AI file capabilities: 📖 Read / ✏️ Write / 🗑️ Delete / 🔄 Retry / 🔌 MCP / ❓ Questions / 📋 Todo. You can disable write and delete for safety.
🧠 Memory Files — View and edit memory data in real-time
Manually edit content anytime, observe memory AI operations in real-time. You can also make requests to the memory AI directly.
🎨 Regex Editor — Sandbox & Free modes
Manage regex rules at different levels, modify conversations, with Sandbox and Free modes. Protects against potentially malicious scripts from unknown character cards.
⚠️ We cannot guarantee effectiveness against all malicious scripts. Please review character card code for malicious content before use. We are not responsible for any damages.
📋 Commander-Level Prompts — Full control over all sent content
Commander-level prompts that control all sent content, maximizing prompt effectiveness.
🧠 Memory Presets P1-P6 — Fully prompt-driven, zero technical barrier
P2-P6 behaviors can all be modified through prompts — no coding required, highly adaptable.
📖 System Guide — Detailed documentation for quick onboarding
Detailed system documentation to help you get started quickly.
🔬 System Diagnostics — One-click log export for rapid troubleshooting
Comprehensive system self-diagnosis with one-click log packaging. Captures both browser console and server logs into a single JSON file — just attach it when reporting issues.
This project is built on the fount framework, with direct authorization from the original author.