From Zero to Hero: Systematic CUDA High-Performance Kernel Development
A structured CUDA learning repository covering matrix multiplication, reusable kernels, advanced optimization techniques, and a lightweight inference engine. Master GPU programming from SGEMM basics to Tensor Core optimization.
| Feature | Description |
|---|---|
| Progressive Learning Path | 4 interconnected modules from basics to production |
| Performance-Focused | Real benchmarks against cuBLAS, not toy examples |
| Modern C++ | Leverages C++17/20 features for clean, safe GPU code |
| Production Patterns | Header-only library design, memory pools, stream management |
| Multi-Architecture | Supports Volta (sm_70) through Hopper (sm_90) |
| # | Project | Focus | Build |
|---|---|---|---|
| 01 | SGEMM Tutorial | Progressive SGEMM optimization | Standalone Makefile |
| 02 | TensorCraft Core | Header-only kernel library | CMake |
| 03 | HPC Advanced | Advanced CUDA/HPC techniques | CMake |
| 04 | Inference Engine | Lightweight DL inference engine | CMake |
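Project 01 is intentionally standalone and does not go through the root CMake build. A minimal sketch of building it (this assumes its `Makefile` has a usable default target — check the tutorial's own README for the exact targets):

```shell
# Project 01 is standalone; it uses its own Makefile, not the root CMake build.
cd 01-sgemm-tutorial
make    # assumption: the default target builds the tutorial kernels
```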
```
01-SGEMM Tutorial (1-2 weeks)
    ↓ Master shared memory, bank conflicts, WMMA
02-TensorCraft Core (2-3 weeks)
    ↓ Build reusable kernels, API design
03-HPC Advanced (3-4 weeks)
    ↓ CUDA 13 features, FlashAttention
04-Inference Engine (2-3 weeks)
    ↓ Complete inference framework
```
Prerequisites: C/C++ basics, linear algebra fundamentals. CUDA experience helpful but not required.
```bash
git clone https://github.com/LessUp/cuda-kernel-academy.git
cd cuda-kernel-academy
cmake --preset default
cmake --build --preset default
ctest --preset default
```

List available presets with:

```bash
cmake --list-presets
```

- The root CMake build covers `02-tensorcraft-core`, `03-hpc-advanced`, `04-inference-engine`, `common`, and `examples`. `01-sgemm-tutorial` is intentionally standalone and uses its own `Makefile`.
- GitHub Actions currently runs CPU-safe checks (formatting, docs, links, and preset validation). Full CUDA builds and tests should be run on a local machine with a GPU.
| Option | Default | Description |
|---|---|---|
| `BUILD_TENSORCRAFT` | ON | Build TensorCraft Core |
| `BUILD_HPC_ADVANCED` | ON | Build HPC Advanced |
| `BUILD_INFERENCE_ENGINE` | ON | Build Inference Engine |
| `BUILD_EXAMPLES` | ON | Build examples |
| `BUILD_TESTS` | ON | Build tests |
| `BUILD_BENCHMARKS` | ON | Build benchmarks |
| `BUILD_PYTHON_BINDINGS` | OFF | Build optional Python bindings |
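Options are passed as `-D` flags at configure time. For example, to skip the benchmarks and enable the optional Python bindings (a sketch using the `default` preset from the quick start):

```shell
# Reconfigure with benchmarks off and Python bindings on
cmake --preset default \
  -DBUILD_BENCHMARKS=OFF \
  -DBUILD_PYTHON_BINDINGS=ON
cmake --build --preset default
```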
| Component | Minimum | Recommended |
|---|---|---|
| CUDA Toolkit | 11.0 | 12.x |
| CMake | 3.20 | 3.24+ |
| Compiler | GCC 9 / Clang 10 | GCC 11+ |
| GPU | Volta (sm_70) | Ampere/Ada (sm_80+) |
Supported Architectures:
| Arch | sm | GPUs |
|---|---|---|
| Volta | 70 | V100 |
| Turing | 75 | RTX 2080, T4 |
| Ampere | 80, 86 | A100, RTX 3090 |
| Ada | 89 | RTX 4090, L40 |
| Hopper | 90 | H100 |
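To compile only for a subset of these architectures (which shortens build times), the standard CMake variable `CMAKE_CUDA_ARCHITECTURES` can be set at configure time — a sketch, assuming the project does not hard-code its own architecture list:

```shell
# Build device code for Ampere (sm_80) and Hopper (sm_90) only
cmake --preset default -DCMAKE_CUDA_ARCHITECTURES="80;90"
cmake --build --preset default
```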
- CUDA C++ Programming Guide
- CUTLASS - CUDA Templates for Linear Algebra
- Simon Boehm's GEMM Tutorial - Excellent optimization walkthrough
- NVIDIA Developer Blog - Latest techniques and best practices
If you find this project helpful in your research or work:
```bibtex
@misc{cuda-kernel-academy,
  author    = {CUDA Kernel Academy Contributors},
  title     = {CUDA Kernel Academy: A Comprehensive Learning Path for High-Performance CUDA Kernel Development},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/LessUp/cuda-kernel-academy}
}
```

MIT License