DashInfer
2.0.0

Getting Started

  • Installation Guide
  • Quick Start Guide for Python API
  • Quick Start Guide for OpenAI API Chat Server
  • Environment Variable Usage

Models

  • Supported Models

LLM Deployment

  • Offline Inference with Python API
  • Engine Runtime Config
  • Guided Decoding
  • Prefix Caching
  • LoRA Support

MultiModal LM (MMLM) Deployment

  • VLM Support

Developer Guide

  • Source Code Build
  • Profiling
  • Coding Style

Quantization

  • Weight Quantization
  • KV Cache Quantization

Subprojects

  • Introduction to Subprojects
  • HIE-DNN
  • SpanAttention

FAQ

  • FAQ

© Copyright 2024, Alibaba.inc.