Releases: LessUp/tiny-llm
Tiny-LLM v2.0.1 — Bug Fixes
Release Date: April 16, 2026
🔵 Fixed
Critical: Scale Dimension Calculation Error
Severity: Critical
Impact: Test utility only
File: tests/test_integration.cu
The createRandomWeight function computed the scale tensor dimensions incorrectly:

```cpp
// ❌ INCORRECT (rows and cols swapped)
int num_groups = (cols + group_size - 1) / group_size;
w.scales = randomDeviceFP16(rows * num_groups, ...);

// ✅ CORRECT
int num_groups = (rows + group_size - 1) / group_size;
w.scales = randomDeviceFP16(num_groups * cols, ...);
```

Why this matters: the W8A16 matmul indexes scales as [rows/group_size, cols], so the scale tensor must hold ceil(rows/group_size) * cols elements.
Code Cleanup: Removed 12 lines of unused q_reg array loading code in kernels/attention.cu.
✅ Verification

```shell
$ ctest --output-on-failure
100% tests passed, 0 tests failed
```
Installation
```shell
git clone https://github.com/LessUp/tiny-llm.git
cd tiny-llm
git checkout v2.0.1
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```
Tiny-LLM v2.0.0 — Major Refactoring
Release Date: March 9, 2026
⚠️ Breaking Changes
KVCache API Redesign: The previous appendKV() implementation had fragile layer-order dependencies that could lead to incorrect cache writes if layers were called in different orders.
Solution: New stateless design with explicit length advancement.
```cpp
// After (v2.0+)
for (int i = 0; i < num_layers; i++) {
    layers[i]->forward(hidden_states, kv_cache, seq_id, position, stream);
}
// Explicitly advance the length once after all layers have run
kv_cache.advanceSeqLen(seq_id, num_tokens);
```

🟢 Added
- GitHub Actions workflow for continuous integration
- Automated `clang-format` checking
- CMake modernization with target exports (`tiny_llm::tiny_llm`)
- Improved compiler warning flags
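With the exported target, a consumer project can link against the library by name. This is a hypothetical sketch assuming tiny-llm installs a package config file discoverable by `find_package`; the project name `my_app` is illustrative:

```cmake
cmake_minimum_required(VERSION 3.18)
project(my_app LANGUAGES CXX CUDA)

# Assumes tiny-llm was installed to a prefix on CMAKE_PREFIX_PATH.
find_package(tiny_llm REQUIRED)

add_executable(my_app main.cu)
# The imported target carries include paths and link flags transitively.
target_link_libraries(my_app PRIVATE tiny_llm::tiny_llm)
```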
🟡 Changed
- Minimum CMake version: 3.18
- CUDA architecture auto-detection with fallback to common architectures
📊 Performance
| Metric | v1.0.0 | v2.0.0 | Change |
|---|---|---|---|
| Build time | 45s | 38s | -15% |
| Test runtime | 2.1s | 1.8s | -14% |
Installation
```shell
git clone https://github.com/LessUp/tiny-llm.git
cd tiny-llm
git checkout v2.0.0
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
```