Introduction to Subprojects

DashInfer implements a variety of highly optimized CUDA kernels, among which some are provided as self-contained subprojects. The two subprojects are:

HIE-DNN: HIE-DNN is an operator library for high-performance inference of deep neural network (DNN), mostly complying with ONNX open format.
SpanAttention: SpanAttention is a high-performance decode-phase attention implementation with paged KV cache for LLM inference on CUDA-enabled devices.

For detailed information about the subprojects, please refer to the following links: