Got a wrong result with DeepSeek-Distill-Qwen-7b while running vllm serving with OMP_NUM_THREADS=16 #491

@shanzhou2186

When serving the model DeepSeek-Distill-Qwen-7b with vLLM using 16 threads (OMP_NUM_THREADS=16), the response to the prompt below is wrong: the completion consists entirely of "!" characters.

Versions:
xfastertransformer 1.8.2
vllm-xft 0.5.5.0

The result is correct when running with 12 threads (OMP_NUM_THREADS=12).

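The request and the wrong response are shown below. The prompt is in Chinese; in English it reads "Please use HTML to generate a Gomoku game for me, with all the code saved in a single HTML file."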

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-qwen-7b-xft",
"messages": [{"role": "user", "content": "请帮我用 HTML 生成一个五子棋游戏,所有代码都保存在一个 HTML 中。"}],
"max_tokens": 256,
"temperature": 0.7
}'
{"id":"chat-9dc50d6d9c8b499f9b4e13c0f9cd7644","object":"chat.completion","created":1739864206,"model":"deepseek-qwen-7b-xft","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!","tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":23,"total_tokens":279,"completion_tokens":256},"prompt_logprobs":null}(base)
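
To make the failure easier to spot when comparing thread counts, the response can be checked mechanically. The following is a minimal sketch, not part of xFasterTransformer or vLLM: `is_degenerate` is a hypothetical helper name, and it assumes the response is compact JSON as in the output above, with no whitespace after `"content":`.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 when the "content" field of a chat-completion
# response read from stdin consists only of "!" characters, as in the
# wrong response reported above. Assumes compact JSON ("content":"...").
is_degenerate() {
    grep -Eq '"content":"!+"'
}

# Example against an abbreviated, saved response body (no server needed).
if printf '%s' '{"message":{"role":"assistant","content":"!!!!"}}' | is_degenerate; then
    echo "degenerate output"
fi
```

With the server running, the output of the curl command above could be piped into `is_degenerate` once with OMP_NUM_THREADS=16 and once with OMP_NUM_THREADS=12 to confirm that only the 16-thread run is affected.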
