#cuda-optimization

Here are 6 public repositories matching this topic...

jetson-orin-matmul-analysis

CUDA matrix multiplication benchmarking on Jetson Orin Nano. Four implementations, three power modes, five matrix sizes. 99.5% mathematical validation. C++/CUDA and Python.

  • Updated Apr 2, 2026
  • Python

🎓 CUDA HPC Kernel Optimization Lab: Progressive GEMM, FlashAttention, Tensor Core & CUDA 13 Features | A CUDA high-performance operator optimization lab, from naive kernels to Tensor Cores

  • Updated Apr 22, 2026
  • Cuda

🔍 Analyze CUDA matrix multiplication performance and power consumption on NVIDIA Jetson Orin Nano across multiple implementations and settings.

  • Updated Apr 22, 2026
  • Python
