Qwen2-local-api

Qwen2 vllm OpenAI api & gradio front-end

Start

1. Install Dependencies

conda env:

Install Anaconda dependencies listed in environment.yml, you may want to use a command like this:

conda env create -f environment.yml -n qwen2

Auto-GPTQ:

unzip AutoGPTQ_v0.7.1.zip
cd AutoGPTQ_v0.7.1 && pip install -e .

NCCL:

# Extract source code (you may want to download your own nccl files at nvidia.com)
tar -xvf nccl_2.22.3-1+cuda12.2_x86_64.txz
# Set NCCL path to the .so file
export VLLM_NCCL_SO_PATH=~/Qwen/nccl_2.22.3-1+cuda12.2_x86_64/lib/libnccl.so

Note that: this is for x86_64 machine and CUDA driver >= 12.0

You can download your version of NCCL Library at NVIDIA Collective Communications Library (NCCL) | NVIDIA Developer

2. Download Model

python model_download.py

3. Start Server

bash vllm_init.sh

Now an OpenAI API is supposed to be running on http://0.0.0.0:8008

4. Call API

for chatbot web demo, run

# start web UI on local IP port 8000 (by default)
python qw2_web_openai.py

for python API calling, run

# get whole response after processing
python openai_test.py
# get steaming resonse
python openai_test_streaming.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Qwen2-local-api

Start

1. Install Dependencies

conda env:

Auto-GPTQ:

NCCL:

2. Download Model

3. Start Server

4. Call API

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
AutoGPTQ_v0.7.1.zip		AutoGPTQ_v0.7.1.zip
README.md		README.md
environment.yml		environment.yml
model_download.py		model_download.py
openai_test.py		openai_test.py
openai_test_streaming.py		openai_test_streaming.py
qw2_web_openai.py		qw2_web_openai.py
qw2_web_openai_psychat.py		qw2_web_openai_psychat.py
vllm_init.sh		vllm_init.sh

Folders and files

Latest commit

History

Repository files navigation

Qwen2-local-api

Start

1. Install Dependencies

conda env:

Auto-GPTQ:

NCCL:

2. Download Model

3. Start Server

4. Call API

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages