Do you need to file an issue?
Describe the bug
Embedding fails when the input context exceeds 2048 tokens.
Steps to reproduce
See the config settings below; a minimal reproduction sketch follows them.
I've also tried setting the limit to 2047, but the same error occurs, so the setting does not seem to be honored.
Expected Behavior
The embedding input should be capped at the value of EMBEDDING_TOKEN_LIMIT, so requests never exceed the model's context length.
Paste your config here
EMBEDDING_BINDING=openai
EMBEDDING_MODEL=google/embeddinggemma-300m
EMBEDDING_DIM=768
EMBEDDING_TOKEN_LIMIT=2048
EMBEDDING_BINDING_HOST=http://irsai.xxx.de:8000/v1
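For reference, the failure can be reproduced directly against the vLLM endpoint. This is only a sketch based on the config above; the dummy API key and the repeated test string are illustrative assumptions, not taken from the actual deployment.

from openai import OpenAI

# Points at the OpenAI-compatible vLLM server from the config above.
client = OpenAI(base_url="http://irsai.xxx.de:8000/v1", api_key="dummy")

# Any input that tokenizes to more than 2048 tokens triggers the 400 shown in the logs.
long_text = "Zinsen " * 3000

try:
    client.embeddings.create(model="google/embeddinggemma-300m", input=long_text)
except Exception as exc:
    print(exc)  # Error code: 400 - "You passed N input tokens ..."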
Logs and screenshots
Custom OpenAI-compatible endpoint running embeddinggemma on vLLM.
(APIServer pid=2810086) INFO: 192.168.1.17:15008 - "POST /v1/embeddings HTTP/1.1" 400 Bad Request
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] Error in preprocessing prompt inputs
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] Traceback (most recent call last):
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/pooling/embed/serving.py", line 98, in _preprocess
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] ctx.engine_prompts = await self._preprocess_completion(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/engine/serving.py", line 927, in _preprocess_completion
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return await self._preprocess_cmpl(request, prompts)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/engine/serving.py", line 947, in _preprocess_cmpl
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return await renderer.render_cmpl_async(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 695, in render_cmpl_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] tok_prompts = await self.tokenize_prompts_async(dict_prompts, tok_params)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 448, in tokenize_prompts_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return await asyncio.gather(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 441, in tokenize_prompt_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return await self._tokenize_singleton_prompt_async(prompt, params)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/base.py", line 374, in _tokenize_singleton_prompt_async
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] return params.apply_post_tokenization(self.tokenizer, prompt) # type: ignore[arg-type]
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 373, in apply_post_tokenization
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] prompt["prompt_token_ids"] = self._validate_tokens( # type: ignore[typeddict-unknown-key]
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 357, in _validate_tokens
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] tokens = validator(tokenizer, tokens)
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/renderers/params.py", line 337, in _token_len_check
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] raise VLLMValidationError(
(APIServer pid=2810086) ERROR 04-17 17:59:34 [serving.py:108] vllm.exceptions.VLLMValidationError: You passed 2049 input tokens and requested 0 output tokens. However, the model's context length is only 2048 tokens, resulting in a maximum input length of 2048 tokens. Please reduce the length of the input prompt. (parameter=input_tokens, value=2049)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/lightrag/operate.py", line 2603, in _locked_process_entity_name
entity_data = await _merge_nodes_then_upsert(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lightrag/operate.py", line 1932, in _merge_nodes_then_upsert
await safe_vdb_operation_with_exception(
File "/app/lightrag/utils.py", line 168, in safe_vdb_operation_with_exception
raise Exception(error_msg) from e
Exception: VDB entity_upsert failed for Zinsen after 3 attempts: Error code: 400 - {'error': {'message': "You passed 2049 input tokens and requested 0 output tokens. However, the model's context length is only 2048 tokens, resulting in a maximum input length of 2048 tokens. Please reduce the length of the input prompt. (parameter=input_tokens, value=2049)", 'type': 'BadRequestError', 'param': None, 'code': 400}, 'model': 'google/embeddinggemma-300m'}
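As a hedged workaround sketch (not LightRAG's actual code path), the text could be truncated to the token limit with the model's own tokenizer before the request is sent. truncate_to_limit is a hypothetical helper, and the small margin is an assumed allowance for special tokens the server may add.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/embeddinggemma-300m")
TOKEN_LIMIT = 2048
MARGIN = 8  # assumed headroom for special tokens added server-side

def truncate_to_limit(text: str) -> str:
    ids = tokenizer.encode(text, add_special_tokens=False)
    if len(ids) <= TOKEN_LIMIT - MARGIN:
        return text
    return tokenizer.decode(ids[: TOKEN_LIMIT - MARGIN])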
Additional Information
- LightRAG Version: docker
- Operating System: docker
- Python Version: docker
- Related Issues: docker