
Releases: LessUp/tiny-llm

Tiny-LLM v2.0.1

21 Apr 19:41


Tiny-LLM v2.0.1 — Bug Fixes

Release Date: April 16, 2026



🔵 Fixed

Critical: Scale Dimension Calculation Error

Severity: Critical
Impact: Test utility only
File: tests/test_integration.cu

The createRandomWeight function had an incorrect scale tensor dimension calculation:

// ❌ INCORRECT (rows and cols swapped)
int num_groups = (cols + group_size - 1) / group_size;
w.scales = randomDeviceFP16(rows * num_groups, ...);

// ✅ CORRECT  
int num_groups = (rows + group_size - 1) / group_size;
w.scales = randomDeviceFP16(num_groups * cols, ...);

Why this matters: the W8A16 matmul indexes the scale tensor as [rows/group_size, cols], so it requires ceil(rows/group_size) * cols elements — the swapped formula allocates the wrong size whenever rows != cols.

Code Cleanup: Removed 12 lines of unused q_reg array loading code in kernels/attention.cu.

✅ Verification

$ ctest --output-on-failure
100% tests passed, 0 tests failed


Installation

git clone https://github.com/LessUp/tiny-llm.git
cd tiny-llm
git checkout v2.0.1
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

Documentation | API Reference

Tiny-LLM v2.0.0

21 Apr 19:41


Tiny-LLM v2.0.0 — Major Refactoring

Release Date: March 9, 2026



⚠️ Breaking Changes

KVCache API Redesign: The previous appendKV() implementation had fragile layer-order dependencies: calling layers in a different order could produce incorrect cache writes.

Solution: New stateless design with explicit length advancement.

// After (v2.0+)
for (int i = 0; i < num_layers; i++) {
    layers[i]->forward(hidden_states, kv_cache, seq_id, position, stream);
}
// Explicitly advance length once after all layers
kv_cache.advanceSeqLen(seq_id, num_tokens);

🟢 Added

  • GitHub Actions workflow for continuous integration
  • Automated clang-format checking
  • CMake modernization with target exports (tiny_llm::tiny_llm)
  • Improved compiler warning flags

🟡 Changed

  • Minimum CMake version: 3.18
  • CUDA architecture auto-detection, with fallback to a set of common architectures

📊 Performance

| Metric | v1.0.0 | v2.0.0 | Change |
| --- | --- | --- | --- |
| Build time | 45s | 38s | -15% |
| Test runtime | 2.1s | 1.8s | -14% |


Installation

git clone https://github.com/LessUp/tiny-llm.git
cd tiny-llm
git checkout v2.0.0
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

Documentation | API Reference