DashInfer 2.0.0

Index

A

  • add_multimedia_content() (MultiMediaInfo method)

G

  • GeneratedElements (built-in class)
  • GeneratedLength() (ResultQueue method)
  • GenerateStatus() (ResultQueue method)
  • Get() (ResultQueue method)
  • GetNoWait() (ResultQueue method)
  • GetWithTimeout() (ResultQueue method)

I

  • ids_from_generate (GeneratedElements attribute)

L

  • log_probs_list (GeneratedElements attribute)

M

  • MultiMediaInfo (built-in class)

P

  • prefix_cache_len (GeneratedElements attribute)
  • prefix_len_cpu (GeneratedElements attribute)
  • prefix_len_gpu (GeneratedElements attribute)

R

  • RequestStatInfo() (ResultQueue method)
  • ResultQueue (built-in class)
  • ResultQueue.GenerateRequestStatus.ContextFinished (built-in variable)
  • ResultQueue.GenerateRequestStatus.GenerateFinished (built-in variable)
  • ResultQueue.GenerateRequestStatus.GenerateInterrupted (built-in variable)
  • ResultQueue.GenerateRequestStatus.Generating (built-in variable)
  • ResultQueue.GenerateRequestStatus.Init (built-in variable)

T

  • tensors_from_model_inference (GeneratedElements attribute)
  • token_logprobs_list (GeneratedElements attribute)

© Copyright 2024, Alibaba.inc.
