[Bug] Vertex AI plugin calculates massive token usage for PDF/DOCUMENT types #2896

@Zhouchuanwen

Description

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched the existing issues in Dify issues & Dify Official Plugins, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please submit issues in English, or they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

1.14.0-rc1

Plugin version

0.0.49

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. The Root Cause: Only IMAGE types are excluded, while DOCUMENT types (PDFs) slip through
    In the text concatenation logic within _convert_one_message_to_text (around line 1011):
if isinstance(content, list):
    content = "".join((c.data for c in content if c.type != PromptMessageContentType.IMAGE))

Here, the developer correctly realized that concatenating image data into the text used for token calculation would cause problems, so they deliberately added if c.type != PromptMessageContentType.IMAGE to exclude images.

However, multimodal inputs consist of more than just images. When a user uploads a file of type DOCUMENT (such as a PDF), this condition evaluates to True (since the type is not IMAGE), so the code proceeds to concatenate the PDF's c.data into the text.
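A minimal sketch of one possible fix: allow-list TEXT content instead of deny-listing IMAGE, so no binary payload type can slip through. The enum members and classes below are simplified stand-ins for Dify's real types (which live in the plugin's model runtime entities), used here only for illustration:

```python
from enum import Enum


# Simplified stand-ins for Dify's content types (assumption: the real
# enum exposes at least TEXT, IMAGE, and DOCUMENT members).
class PromptMessageContentType(Enum):
    TEXT = "text"
    IMAGE = "image"
    DOCUMENT = "document"
    AUDIO = "audio"
    VIDEO = "video"


class PromptMessageContent:
    def __init__(self, type: PromptMessageContentType, data: str):
        self.type = type
        self.data = data


def concat_text_parts(content):
    # Keep only TEXT parts: an allow-list cannot be defeated by a new
    # binary content type (DOCUMENT, AUDIO, VIDEO, ...) the way the
    # original `!= IMAGE` deny-list was.
    if isinstance(content, list):
        content = "".join(
            c.data for c in content
            if c.type == PromptMessageContentType.TEXT
        )
    return content
```

With this filter, a message containing a TEXT part and a DOCUMENT part contributes only the TEXT part's characters to the token estimate; the PDF's Base64 payload is dropped entirely.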

  2. The Impact: Massive Base64 strings are appended to the prompt
    In Dify's data structures, c.data for file types (including images, PDFs, audio, etc.) typically stores the file's Base64-encoded string (e.g., data:application/pdf;base64,JVBERi0xLjMKJc...).

For a small 2 MB PDF file, Base64 encoding inflates the size by about 33%, yielding roughly 2.7 million characters.

This 2.7-million-character Base64 string is then blindly concatenated into the prompt text. When that text is fed into the local GPT-2 tokenizer (_get_num_tokens_by_gpt2), it produces wildly inflated and inaccurate token counts (millions of tokens for a single small PDF).
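The size arithmetic can be checked directly. This is a standalone sketch with an illustrative byte count, unrelated to the plugin's code:

```python
import base64
import math

pdf_bytes = 2 * 1024 * 1024  # a "2 MB" PDF, as in the example above

# Base64 encodes every 3 input bytes as 4 output characters (~33% inflation).
predicted_len = math.ceil(pdf_bytes / 3) * 4

# Cross-check against an actual encoding of a same-sized dummy payload.
actual_len = len(base64.b64encode(b"\x00" * pdf_bytes))

print(predicted_len, actual_len)  # 2796204 2796204 -> ~2.7 million characters
```

Any tokenizer run over a string of that length will report a token count on the order of millions, regardless of the PDF's actual textual content.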

✔️ Error log

No response
