
[Feature] Integrate Lightning SM90 Implementation via CuTe DSL #76

@zheyang0825


Background

FlashInfer has a production-ready Lightning Attention SM90 (Hopper) implementation. For API completeness, we propose porting it to cuLA using CuTe DSL, together with matching benchmarks and unit tests.

Upstream PR: flashinfer-ai/flashinfer#2276 by guangyunh-nv

Note: Credit for the kernel design and algorithm belongs to the original authors (guangyunh-nv). This issue tracks a porting effort, not an original implementation.

Upstream Implementation Overview

FlashInfer PR #2276 implements Lightning Attention prefill on the Hopper architecture:

  • SM90-optimized: TMA-based, warp-specialized design with asynchronous copies and warpgroup scheduling
  • Optional gating and final-state output
  • High-level Python API with chunked prefill support
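To pin down the correctness target that the unit tests and chunked prefill must satisfy, here is a minimal pure-Python sketch of a token-by-token reference. It assumes the Lightning Attention formulation with a per-head scalar decay λ (as in Lightning Attention-2), i.e. S_t = λ·S_{t-1} + k_tᵀ v_t and o_t = q_t·S_t; how the upstream kernel applies its optional gating is not confirmed here. The names `lightning_attn_reference` and `chunked_prefill` are hypothetical and are not cuLA or FlashInfer APIs. The sketch also shows the invariant chunked prefill relies on: feeding each chunk's final state in as the next chunk's initial state reproduces the full-sequence output.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def lightning_attn_reference(
    q: Matrix,  # [T, D_k] queries for a single head
    k: Matrix,  # [T, D_k] keys
    v: Matrix,  # [T, D_v] values
    decay: float = 1.0,  # ASSUMED per-head scalar decay lambda; 1.0 means no decay
    initial_state: Optional[Matrix] = None,  # [D_k, D_v] state carried from earlier tokens
) -> Tuple[Matrix, Matrix]:
    """Recurrence: S_t = decay * S_{t-1} + k_t^T v_t, o_t = q_t @ S_t. Returns (outputs, final state)."""
    d_k, d_v = len(k[0]), len(v[0])
    # Copy the incoming state so the caller's matrix is never mutated.
    if initial_state is None:
        state = [[0.0] * d_v for _ in range(d_k)]
    else:
        state = [row[:] for row in initial_state]
    out: Matrix = []
    for q_t, k_t, v_t in zip(q, k, v):
        # Decay the running state, then add the rank-1 update k_t^T v_t.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = decay * state[i][j] + k_t[i] * v_t[j]
        # Read out o_t = q_t @ S_t.
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out, state


def chunked_prefill(q: Matrix, k: Matrix, v: Matrix, decay: float, chunk_size: int):
    """Hypothetical chunked driver: run each chunk, carrying the final state forward."""
    out: Matrix = []
    state: Optional[Matrix] = None
    for start in range(0, len(q), chunk_size):
        end = start + chunk_size
        o_chunk, state = lightning_attn_reference(
            q[start:end], k[start:end], v[start:end], decay, state
        )
        out.extend(o_chunk)
    return out, state


def rand_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D = 10, 4
q, k, v = rand_matrix(rng, T, D), rand_matrix(rng, T, D), rand_matrix(rng, T, D)
full_o, full_s = lightning_attn_reference(q, k, v, decay=0.9)
chunk_o, chunk_s = chunked_prefill(q, k, v, decay=0.9, chunk_size=3)
max_err = max(abs(a - b) for ra, rb in zip(full_o, chunk_o) for a, b in zip(ra, rb))
print(f"max |full - chunked| = {max_err}")
```

This reference costs O(T·D²) per head and is only meant as a ground truth; the Hopper kernel itself is expected to use a blockwise (intra-chunk plus inter-chunk) schedule, and a real test would compare its bf16 output against a reference like this within a tolerance.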

Task Checklist

  • Port kernels via CuTe DSL: Rewrite the upstream C++ Hopper Lightning Attention kernels using CuTe DSL, following cuLA's build system and coding conventions
  • Adapt Python interface: Align with cuLA's existing API style and expose the Lightning SM90 Python bindings
  • Add benchmarks: Add performance benchmarks under cuLA's framework, covering the same settings as the KDA benchmarks:
    • Fixed-length: B={1, 2}, T={512, 1024, 4096, 8192, 16384}, H=64, D=128, dtype=bf16
    • Varlen: num_seqs={10, 20}, total_len={4096, 8192, 16384}, distributions: uniform / random / skewed
  • Performance validation: Verify that the CuTe DSL version achieves performance comparable to the upstream C++ version across all of the benchmark settings above
  • Add unit tests: Port the reference implementation and test cases from upstream tests/gdn/ to ensure correctness
  • Update documentation: Document the newly added Lightning SM90 support
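The benchmark settings in the checklist are small enough to enumerate explicitly. Below is a minimal sketch of one way to generate them. This issue does not define the uniform / random / skewed distributions used by the KDA benchmarks, so the definitions here (equal split, random cut points, one sequence holding half the tokens) are assumptions. The helpers `varlen_seqlens` and `to_cu_seqlens` are hypothetical, and the sketch assumes the varlen interface takes cumulative offsets (`cu_seqlens`), as flash-linear-attention-style APIs commonly do.

```python
import itertools
import random
from typing import Dict, List

DTYPE = "bf16"  # label only; a real harness would map this to a framework dtype

# Fixed-length grid: B x T with H=64, D=128.
FIXED: List[Dict] = [
    {"B": b, "T": t, "H": 64, "D": 128, "dtype": DTYPE}
    for b, t in itertools.product([1, 2], [512, 1024, 4096, 8192, 16384])
]


def varlen_seqlens(num_seqs: int, total_len: int, dist: str, seed: int = 0) -> List[int]:
    """Split total_len tokens into num_seqs positive lengths (ASSUMED distribution definitions)."""
    if num_seqs < 1 or total_len < num_seqs:
        raise ValueError("need 1 <= num_seqs <= total_len")
    if num_seqs == 1:
        return [total_len]
    if dist == "uniform":
        # Equal split; the remainder adds one token to each of the first sequences.
        base, rem = divmod(total_len, num_seqs)
        return [base + (1 if i < rem else 0) for i in range(num_seqs)]
    if dist == "random":
        # num_seqs - 1 distinct cut points in [1, total_len - 1] keep every length >= 1.
        rng = random.Random(seed)
        cuts = sorted(rng.sample(range(1, total_len), num_seqs - 1))
        bounds = [0] + cuts + [total_len]
        return [hi - lo for lo, hi in zip(bounds, bounds[1:])]
    if dist == "skewed":
        # One long sequence holds half the tokens; the rest share the remainder evenly.
        long_len = total_len // 2
        return [long_len] + varlen_seqlens(num_seqs - 1, total_len - long_len, "uniform")
    raise ValueError(f"unknown distribution: {dist}")


def to_cu_seqlens(lengths: List[int]) -> List[int]:
    """Cumulative offsets [0, l0, l0 + l1, ...] delimiting each packed sequence."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


VARLEN: List[Dict] = [
    {"num_seqs": n, "total_len": t, "dist": d,
     "cu_seqlens": to_cu_seqlens(varlen_seqlens(n, t, d))}
    for n, t, d in itertools.product([10, 20], [4096, 8192, 16384], ["uniform", "random", "skewed"])
]

print(len(FIXED), "fixed-length configs,", len(VARLEN), "varlen configs")  # 10 and 18
```

Seeding the random split keeps the varlen layouts identical across runs, so the CuTe DSL and upstream C++ numbers are measured on the same inputs.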
