Wrong output from DeepSeek-Distill-Qwen-7b when running vllm serving with OMP_NUM_THREADS=16 #491

When serving the DeepSeek-Distill-Qwen-7b model through vllm with 16 threads (OMP_NUM_THREADS=16), the response to the prompt below is wrong: the generated content consists only of "!" characters.

Versions:
xfastertransformer         1.8.2
vllm-xft                   0.5.5.0

The result is correct when running with 12 threads (OMP_NUM_THREADS=12).

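The request and the incorrect response are shown below. The prompt is in Chinese and asks: "Please help me generate a Gomoku game in HTML, with all code saved in a single HTML file."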

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "deepseek-qwen-7b-xft",
    "messages": [{"role": "user", "content": "**请帮我用 HTML 生成一个五子棋游戏，所有代码都保存在一个 HTML 中**。"}],
    "max_tokens": 256,
    "temperature": 0.7
  }'
{"id":"chat-9dc50d6d9c8b499f9b4e13c0f9cd7644","object":"chat.completion","created":1739864206,"model":"deepseek-qwen-7b-xft","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!","tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":23,"total_tokens":279,"completion_tokens":256},"prompt_logprobs":null}(base)
