Do you need to file an issue?
Describe the bug
Embedding fails when the input context exceeds 2048 tokens.
Steps to reproduce
See the config settings below; a minimal reproduction sketch follows them.
I've also tried setting the limit to 2047, but the same error occurs, so the setting does not seem to be honored.
Expected Behavior
The embedding input should be capped at the value of EMBEDDING_TOKEN_LIMIT, so requests never exceed the model's context length.
Paste your config here
EMBEDDING_BINDING=openai
EMBEDDING_MODEL=google/embeddinggemma-300m
EMBEDDING_DIM=768
EMBEDDING_TOKEN_LIMIT=2048
EMBEDDING_BINDING_HOST=http://irsai.xxx.de:8000/v1
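For reference, the failure can be reproduced directly against the vLLM endpoint. This is only a sketch based on the config above; the dummy API key and the repeated test string are illustrative assumptions, not taken from the actual deployment.

from openai import OpenAI

# Points at the OpenAI-compatible vLLM server from the config above.
client = OpenAI(base_url="http://irsai.xxx.de:8000/v1", api_key="dummy")

# Any input that tokenizes to more than 2048 tokens triggers the 400 shown in the logs.
long_text = "Zinsen " * 3000

try:
    client.embeddings.create(model="google/embeddinggemma-300m", input=long_text)
except Exception as exc:
    print(exc)  # Error code: 400 - "You passed N input tokens ..."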
Logs and screenshots
Custom OpenAI-compatible endpoint running embeddinggemma on vLLM.
(APIServer pid=2810086) INFO: 192.168.1.17:15008 - "POST /v1/embeddings HTTP/1.1" 400 Bad Request
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] Error in preprocessing prompt inputs
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] Traceback (most recent call last):
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/pooling/embed/serving.py", line 98, in _preprocess
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] ctx.engine_prompts = await self._preprocess_completion(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/engine/serving.py", line 927, in _preprocess_completion
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return await self._preprocess_cmpl(request, prompts)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/engine/serving.py", line 947, in _preprocess_cmpl
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return await renderer.render_cmpl_async(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 695, in render_cmpl_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] tok_prompts = await self.tokenize_prompts_async(dict_prompts, tok_params)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 448, in tokenize_prompts_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return await asyncio.gather(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 441, in tokenize_prompt_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return await self._tokenize_singleton_prompt_async(prompt, params)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 374, in _tokenize_singleton_prompt_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return params.apply_post_tokenization(self.tokenizer, prompt) # type: ignore[arg-type]
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 373, in apply_post_tokenization
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] prompt["prompt_token_ids"] = self._validate_tokens( # type: ignore[typeddict-unknown-key]
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 357, in _validate_tokens
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] tokens = validator(tokenizer, tokens)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 337, in _token_len_check
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] raise VLLMValidationError(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] vllm.exceptions.VLLMValidationError: You passed 2049 input tokens and requested 0 output tokens. However, the model's context length is only 2048 tokens, resulting in a maximum input length of 2048 tokens. Please reduce the length of the input prompt. (parameter=input_tokens, value=2049)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/lightrag/operate.py", line 2603, in _locked_process_entity_name
entity_data = await _merge_nodes_then_upsert(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/operate.py", line 1932, in _merge_nodes_then_upsert
await safe_vdb_operation_with_exception(
File "/app/lightrag/utils.py", line 168, in safe_vdb_operation_with_exception
raise Exception(error_msg) from e
Exception: VDB entity_upsert failed for Zinsen after 3 attempts: Error code: 400 - {'error': {'message': "You passed 2049 input tokens and requested 0 output tokens. However, the model's context length is only 2048 tokens, resulting in a maximum input length of 2048 tokens. Please reduce the length of the input prompt. (parameter=input_tokens, value=2049)", 'type': 'BadRequestError', 'param': None, 'code': 400}, 'model': 'google/embeddinggemma-300m'}
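As a hedged workaround sketch (not LightRAG's actual code path), the text could be truncated to the token limit with the model's own tokenizer before the request is sent. truncate_to_limit is a hypothetical helper, and the small margin is an assumed allowance for special tokens the server may add.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/embeddinggemma-300m")
TOKEN_LIMIT = 2048
MARGIN = 8  # assumed headroom for special tokens added server-side

def truncate_to_limit(text: str) -> str:
    ids = tokenizer.encode(text, add_special_tokens=False)
    if len(ids) <= TOKEN_LIMIT - MARGIN:
        return text
    return tokenizer.decode(ids[: TOKEN_LIMIT - MARGIN])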
Additional Information
- LightRAG Version: docker
- Operating System: docker
- Python Version: docker
- Related Issues: docker