## Quick Start Guide for OpenAI API Chat Server

### Test the OpenAI API Server (fastapi)

Install the dependencies:

`pip install fastapi uvicorn openai`

Start the server:

`python ./dash-infer/examples/api_server/fastapi/fastapi-server.py`

To see the parameters you can change, run `fastapi-server.py -h`.

After the server starts, it prints log output similar to:

```
INFO:     Started server process [4898]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

Test with the OpenAI client:

`python ./dash-infer/examples/api_server/fastapi/openai-client.py`

This script calls the server through the OpenAI client in both streaming and blocking modes.

### Start OpenAI Server with Docker (fastchat)

We provide a Docker image for starting the OpenAI server. This example demonstrates how to use Docker to run DashInfer as an inference engine that exposes OpenAI API endpoints.

```shell
docker run \
    --shm-size=8g \
    --network=host \
    --ipc=host \
    --gpus=all \
    -v=<host_model_path>:<container_model_path> \
    dashinfer/fschat_ubuntu_cuda:v2.0 \
    -h <host> -p <port> -m -- --model-path <container_model_path> --device-list <device_list>
```

- `<host_model_path>`: Path to your model on the host
- `<container_model_path>`: Path where the model is mounted in the container
- `<device_list>`: List of devices supported by the model, e.g., `0,1`
- `-h`: The listening address of the API server
- `-p`: The listening port of the API server
- `-m`: Use Modelscope to download the model
- `--model-path`: Path for loading or downloading the model
- `--device-list`: List of CUDA devices used to run the model

For example:

```shell
docker run \
    --shm-size=8g \
    --network=host \
    --ipc=host \
    --gpus=all \
    -v=/mnt/models/modelscope/hub/qwen/Qwen2.5-7B-Instruct:/models/qwen/Qwen2.5-7B-Instruct \
    dashinfer/fschat_ubuntu_cuda:v2.0 \
    -h 127.0.0.1 -p 8088 -m -- --model-path /models/qwen/Qwen2.5-7B-Instruct --device-list 0
```

You can also build your own fastchat Docker image by modifying the Dockerfile `scripts/docker/fschat_ubuntu_cuda.Dockerfile`.
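To make the relationship between the Docker arguments explicit, here is a minimal Python sketch that assembles the same `docker run` command from its parts. The helper `build_docker_command` and its parameter names are hypothetical and not part of DashInfer; the flags and image name are copied from the command in this section. Note that the container path appears twice: once as the mount target and once as `--model-path`.

```python
import shlex


def build_docker_command(host_model_path, container_model_path,
                         host="127.0.0.1", port=8088, device_list="0",
                         image="dashinfer/fschat_ubuntu_cuda:v2.0"):
    """Assemble the docker run argument list (hypothetical helper)."""
    return [
        "docker", "run",
        "--shm-size=8g",
        "--network=host",
        "--ipc=host",
        "--gpus=all",
        # Mount the model directory from the host into the container.
        f"-v={host_model_path}:{container_model_path}",
        image,
        # Arguments after the image name are passed to the container entrypoint.
        "-h", host,
        "-p", str(port),
        "-m",
        "--",
        # The server loads the model from the path inside the container.
        "--model-path", container_model_path,
        "--device-list", device_list,
    ]


command = build_docker_command(
    host_model_path="/mnt/models/modelscope/hub/qwen/Qwen2.5-7B-Instruct",
    container_model_path="/models/qwen/Qwen2.5-7B-Instruct",
)
# Print a copy-pasteable, shell-quoted version of the command.
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command)` if you prefer to launch the container from a script.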
### Testing the OpenAI API Server (fastchat)

#### Testing with OpenAI SDK

`examples/api_server/fschat/openai-client.py` uses the official OpenAI SDK to test the API server.

```shell
python examples/api_server/fschat/openai-client.py
```

#### Testing with curl

Assuming the OpenAI server has been started and is listening on port `8088`, you can use the following command:

```shell
curl http://127.0.0.1:8088/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen2-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello! What is your name?"}]
    }'
```
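The same request can be sent from Python without extra dependencies. The sketch below mirrors the curl command using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the reply handling in the usage comment assumes the server returns the standard OpenAI chat completion format. Set `model` to the name of the model your server is actually serving.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    base_url="http://127.0.0.1:8088",
    model="Qwen2-7B-Instruct",
    user_message="Hello! What is your name?",
)

# With the server running, send the request and print the reply:
# reply = send_chat_request(request)
# print(reply["choices"][0]["message"]["content"])
```

Building the request separately from sending it lets you inspect the URL and payload before any network call is made.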