DashInfer 2.0.0

Supported Models

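+---------------------+-------------+-------------------------------------+-------------------------------------+-----------------------------------+
| Architecture        | Models      | Hugging Face Models                 | ModelScope Models                   | Quantization Models               |
+=====================+=============+=====================================+=====================================+===================================+
| QWenLMHeadModel     | Qwen1       | Qwen/Qwen-1_8B-Chat                 | qwen/Qwen-1_8B-Chat                 |                                   |
|                     |             | Qwen/Qwen-7B-Chat                   | qwen/Qwen-7B-Chat                   |                                   |
|                     |             | Qwen/Qwen-14B-Chat                  | qwen/Qwen-14B-Chat                  |                                   |
+---------------------+-------------+-------------------------------------+-------------------------------------+-----------------------------------+
| Qwen2ForCausalLM    | Qwen1.5     | Qwen/Qwen1.5-7B-Chat                | qwen/Qwen1.5-7B-Chat                | qwen/Qwen2-72B-Instruct-GPTQ-Int4 |
|                     | Qwen2       | Qwen/Qwen1.5-14B-Chat               | qwen/Qwen1.5-14B-Chat               | qwen/Qwen2-72B-Instruct-GPTQ-Int8 |
|                     |             |                                     | qwen/Qwen2-72B-Instruct             |                                   |
+---------------------+-------------+-------------------------------------+-------------------------------------+-----------------------------------+
| Qwen2MoeForCausalLM | Qwen1.5 MoE |                                     | qwen/Qwen1.5-MoE-A2.7B-Chat         |                                   |
|                     | Qwen2 MoE   |                                     |                                     |                                   |
+---------------------+-------------+-------------------------------------+-------------------------------------+-----------------------------------+
| LlamaForCausalLM    | LLaMA-2     | meta-llama/Llama-2-7b-chat-hf       | modelscope/Llama-2-7b-chat-ms       |                                   |
|                     | LLaMA-3     | meta-llama/Llama-2-13b-chat-hf      | modelscope/Llama-2-13b-chat-ms      |                                   |
|                     |             | meta-llama/Meta-Llama-3-8B-Instruct | modelscope/Meta-Llama-3-8B-Instruct |                                   |
+---------------------+-------------+-------------------------------------+-------------------------------------+-----------------------------------+
| ChatGLMModel        | ChatGLM     | THUDM/glm-4-9b-chat                 | ZhipuAI/glm-4-9b-chat               |                                   |
+---------------------+-------------+-------------------------------------+-------------------------------------+-----------------------------------+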
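
The Architecture names listed above match the values that Hugging Face style checkpoints store in the `architectures` field of `config.json`, so a local checkpoint can be checked against this list before loading it. The following is a minimal sketch under that assumption. The helper `supported_families` and the dictionary `SUPPORTED_ARCHITECTURES` are hypothetical names introduced here for illustration. They are not part of the DashInfer API, and the dictionary only mirrors this page.

```python
import json
import tempfile
from pathlib import Path

# Architecture names and model families copied from the list on this page.
# Hypothetical lookup table for illustration, not a DashInfer data structure.
SUPPORTED_ARCHITECTURES = {
    "QWenLMHeadModel": ["Qwen1"],
    "Qwen2ForCausalLM": ["Qwen1.5", "Qwen2"],
    "Qwen2MoeForCausalLM": ["Qwen1.5 MoE", "Qwen2 MoE"],
    "LlamaForCausalLM": ["LLaMA-2", "LLaMA-3"],
    "ChatGLMModel": ["ChatGLM"],
}


def supported_families(model_dir):
    """Return the listed model families for a local checkpoint directory.

    Assumes the directory contains a Hugging Face style config.json with an
    "architectures" list. Returns an empty list when nothing matches.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    families = []
    for arch in config.get("architectures") or []:
        families.extend(SUPPORTED_ARCHITECTURES.get(arch, []))
    return families


if __name__ == "__main__":
    # Demonstration with a stand-in config.json instead of a real download.
    with tempfile.TemporaryDirectory() as tmp:
        fake_config = {"architectures": ["Qwen2ForCausalLM"]}
        (Path(tmp) / "config.json").write_text(
            json.dumps(fake_config), encoding="utf-8"
        )
        print(supported_families(tmp))
```

An empty result only means the architecture does not appear on this page. It says nothing about whether a given model size or quantized variant will load.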

© Copyright 2024, Alibaba.inc.