[Bug] Vertex AI plugin calculates massive token usage for PDF/DOCUMENT types

### Self Checks

- [x] This is only for bug report, if you would like to ask a question, please head to [Discussions](https://github.com/langgenius/dify/discussions/categories/general).
- [x] I have searched for existing issues [Dify issues](https://github.com/langgenius/dify/issues) & [Dify Official Plugins](https://github.com/langgenius/dify-official-plugins/issues), including closed ones.
- [x] I confirm that I am using English to submit this report (I have read and agree to the [Language Policy](https://github.com/langgenius/dify/issues/1542)).
- [x] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
- [x] Please do not modify this template :) and fill in all the required fields.

### Dify version

1.14.0-rc1

### Plugin version

0.0.49

### Cloud or Self Hosted

Self Hosted (Docker)

### Steps to reproduce

1. Root cause: only IMAGE content is excluded, while DOCUMENT content (PDFs) is not
In the text concatenation logic within `_convert_one_message_to_text` (around line 1011):

```
if isinstance(content, list):
    content = "".join((c.data for c in content if c.type != PromptMessageContentType.IMAGE))
```
The `c.type != PromptMessageContentType.IMAGE` filter was evidently added on purpose, so that image payloads are not treated as text when counting tokens.

However, multimodal input is not limited to images. When a user uploads a file of type DOCUMENT (such as a PDF), the filter lets it through because it is not an IMAGE, so the PDF's `c.data` is joined into the text.
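A possible direction for a fix, as a minimal sketch: keep only TEXT items instead of excluding IMAGE, so that DOCUMENT and any other binary content types are skipped as well. `ContentType`, `Content`, and `text_for_token_count` below are hypothetical stand-ins for the plugin's entities so that the snippet runs on its own. The real `PromptMessageContentType` members and the surrounding function should be checked against the plugin source.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    # Hypothetical stand-in for PromptMessageContentType.
    TEXT = "text"
    IMAGE = "image"
    DOCUMENT = "document"


@dataclass
class Content:
    # Hypothetical stand-in for a prompt message content item.
    type: ContentType
    data: str


def text_for_token_count(content):
    """Return only the textual part of a message for local token counting."""
    if isinstance(content, list):
        # Allow-list TEXT rather than deny-list IMAGE, so Base64 payloads
        # of documents, audio, etc. never reach the tokenizer.
        return "".join(c.data for c in content if c.type == ContentType.TEXT)
    return content


parts = [
    Content(ContentType.TEXT, "Summarize this file."),
    Content(ContentType.DOCUMENT, "data:application/pdf;base64,JVBERi0x"),
]
print(text_for_token_count(parts))
```

With this filter the file payload contributes nothing to the local estimate. Whether the tokens the model actually bills for the PDF should be estimated some other way is a separate question that this sketch does not address.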

2. Impact: a huge Base64 string is appended to the prompt text
In Dify's data structures, `c.data` for file types (including images, PDFs, audio, etc.) typically stores the file's Base64-encoded string (e.g., `data:application/pdf;base64,JVBERi0xLjMKJc...`).

For a small 2 MB PDF, Base64 encoding inflates the size by about 33%, giving roughly 2.7 million characters.

Consequently, this 2.7-million-character string is concatenated into the prompt text. When that text is passed to the local GPT-2 tokenizer (`_get_num_tokens_by_gpt2`), the resulting token count is wildly inflated and inaccurate (e.g., millions of tokens for a single small PDF).
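The size estimate can be checked with the standard library. This sketch uses 2,000,000 zero bytes as a placeholder for the file (not a real PDF). Base64 output length depends only on the input length, so the content does not matter here.

```python
import base64

# Placeholder payload: 2 MB of zero bytes standing in for a PDF.
pdf_bytes = bytes(2_000_000)

encoded = base64.b64encode(pdf_bytes).decode("ascii")
data_uri = "data:application/pdf;base64," + encoded

# Base64 emits 4 characters for every 3 input bytes (rounded up).
print(len(encoded))                   # number of Base64 characters
print(len(encoded) / len(pdf_bytes))  # inflation factor, about 1.33
print(len(data_uri))                  # length of the string joined into the prompt
```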

### ✔️ Error log

_No response_

[Bug] Vertex AI plugin calculates massive token usage for PDF/DOCUMENT types #2896
