[Bug]: EMBEDDING_TOKEN_LIMIT not working #2952

@mcr-ksh

Description

Do you need to file an issue?

  • I have searched the existing issues and this bug is not already filed.
  • I believe this is a legitimate bug, not just a question or feature request.

Describe the bug

Embedding requests fail whenever the input exceeds 2048 tokens, even though EMBEDDING_TOKEN_LIMIT is set.

Steps to reproduce

See the config settings below. I have also tried setting the limit to 2047, but the same error occurs. The setting does not seem to be honored.

Expected Behavior

Input sent to the embedding endpoint should be capped at EMBEDDING_TOKEN_LIMIT tokens.
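A minimal sketch of that expected behavior: cap each payload at EMBEDDING_TOKEN_LIMIT tokens before it reaches the embeddings endpoint. This is illustrative only, not LightRAG's actual implementation; whitespace splitting stands in for the model's real tokenizer, and a real fix would have to count tokens with the same tokenizer the vLLM server uses.

```python
import os

def cap_tokens(text: str, limit: int) -> str:
    """Return text truncated to at most `limit` tokens.

    Whitespace splitting is only a placeholder for the model's
    tokenizer (here, the Gemma tokenizer on the vLLM side).
    """
    tokens = text.split()
    return text if len(tokens) <= limit else " ".join(tokens[:limit])

limit = int(os.environ.get("EMBEDDING_TOKEN_LIMIT", "2048"))
capped = cap_tokens("word " * 5000, limit)  # at most `limit` tokens
```

If LightRAG applied a guard like this before every /v1/embeddings call, the 400 errors below could not occur.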

Paste your config here

EMBEDDING_BINDING=openai
EMBEDDING_MODEL=google/embeddinggemma-300m
EMBEDDING_DIM=768
EMBEDDING_TOKEN_LIMIT=2048
EMBEDDING_BINDING_HOST=http://irsai.xxx.de:8000/v1

Logs and screenshots

Custom OpenAI-compatible endpoint serving embeddinggemma on vLLM.

(APIServer pid=2810086) INFO:     192.168.1.17:15008 - "POST /v1/embeddings HTTP/1.1" 400 Bad Request
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] Error in preprocessing prompt inputs
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] Traceback (most recent call last):
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/pooling/embed/serving.py", line 98, in _preprocess
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     ctx.engine_prompts = await self._preprocess_completion(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/engine/serving.py", line 927, in _preprocess_completion
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     return await self._preprocess_cmpl(request, prompts)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/engine/serving.py", line 947, in _preprocess_cmpl
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     return await renderer.render_cmpl_async(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 695, in render_cmpl_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     tok_prompts = await self.tokenize_prompts_async(dict_prompts, tok_params)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 448, in tokenize_prompts_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     return await asyncio.gather(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 441, in tokenize_prompt_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     return await self._tokenize_singleton_prompt_async(prompt, params)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 374, in _tokenize_singleton_prompt_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     return params.apply_post_tokenization(self.tokenizer, prompt)  # type: ignore[arg-type]
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 373, in apply_post_tokenization
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     prompt["prompt_token_ids"] = self._validate_tokens(  # type: ignore[typeddict-unknown-key]
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 357, in _validate_tokens
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     tokens = validator(tokenizer, tokens)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]   File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 337, in _token_len_check
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108]     raise VLLMValidationError(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] vllm.exceptions.VLLMValidationError: You passed 2049 input tokens and requested 0 output tokens. However, the model's context length is only 2048 tokens, resulting in a maximum input length of 2048 tokens. Please reduce the length of the input prompt. (parameter=input_tokens, value=2049)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/lightrag/operate.py", line 2603, in _locked_process_entity_name
    entity_data = await _merge_nodes_then_upsert(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/lightrag/operate.py", line 1932, in _merge_nodes_then_upsert
    await safe_vdb_operation_with_exception(
  File "/app/lightrag/utils.py", line 168, in safe_vdb_operation_with_exception
    raise Exception(error_msg) from e
Exception: VDB entity_upsert failed for Zinsen after 3 attempts: Error code: 400 - {'error': {'message': "You passed 2049 input tokens and requested 0 output tokens. However, the model's context length is only 2048 tokens, resulting in a maximum input length of 2048 tokens. Please reduce the length of the input prompt. (parameter=input_tokens, value=2049)", 'type': 'BadRequestError', 'param': None, 'code': 400}, 'model': 'google/embeddinggemma-300m'}
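The arithmetic in the vLLM error above can be sketched as follows. This is a hypothetical reconstruction of the check the error message describes, not vLLM's actual code; the function name and signature are illustrative.

```python
def check_token_len(input_tokens: int, output_tokens: int, context_len: int) -> None:
    """Reject requests whose input + requested output exceed the
    model's context length, mirroring the error text in the log."""
    max_input = context_len - output_tokens
    if input_tokens > max_input:
        raise ValueError(
            f"You passed {input_tokens} input tokens and requested "
            f"{output_tokens} output tokens. However, the model's context "
            f"length is only {context_len} tokens, resulting in a maximum "
            f"input length of {max_input} tokens."
        )

check_token_len(2048, 0, 2048)  # exactly at the limit: accepted
```

With output_tokens=0 and context_len=2048, an input of 2049 tokens fails, which is exactly the 2049-vs-2048 rejection in the log. So LightRAG is sending at least one token more than the configured limit.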

Additional Information

  • LightRAG Version: docker
  • Operating System: docker
  • Python Version: docker
  • Related Issues: docker

    Labels

    bug (Something isn't working)
