[Flink][Blob phase 1] Allow FlinkSQL reads to materialize OOL blobs

### Task Description

Users should be able to run FlinkSQL queries with `read_blob` against fields that are OOL blobs. As with SparkSQL, the implementation should ideally not send a DFS pread request for each field/row. Instead, it should group requests into batches and issue one API call per run of consecutive byte ranges. Specifically, `hoodie.blob.batching.max.gap.bytes` and `hoodie.blob.batching.lookahead.size` should still be honored if set by the user.
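To illustrate the range grouping described above, here is a minimal sketch of how nearby byte ranges could be merged before reading. `BlobRangeCoalescer`, `ByteRange`, and `coalesce` are hypothetical names for illustration, not existing Hudi classes, and treating `hoodie.blob.batching.max.gap.bytes` as "merge when the gap is at most this many bytes" is an assumption about its semantics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: merges byte ranges of one file that lie close together. */
public class BlobRangeCoalescer {

    /** A byte range [offset, offset + length) within a single file. */
    public static final class ByteRange {
        public final long offset;
        public final long length;

        public ByteRange(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        public long end() {
            return offset + length;
        }
    }

    /**
     * Sorts ranges by offset and merges a range into the previous one when the
     * gap between them is at most maxGapBytes (assumed meaning of
     * hoodie.blob.batching.max.gap.bytes). Overlapping ranges are merged too.
     */
    public static List<ByteRange> coalesce(List<ByteRange> ranges, long maxGapBytes) {
        List<ByteRange> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((ByteRange r) -> r.offset));

        List<ByteRange> merged = new ArrayList<>();
        for (ByteRange r : sorted) {
            if (!merged.isEmpty()) {
                ByteRange last = merged.get(merged.size() - 1);
                if (r.offset - last.end() <= maxGapBytes) {
                    // Extend the previous range so one read covers both.
                    long end = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1,
                        new ByteRange(last.offset, end - last.offset));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }
}
```

With a gap limit of 64 bytes, the ranges `[0, 100)`, `[110, 150)` and `[4096, 4146)` would collapse into two reads: one covering `[0, 150)` and one covering `[4096, 4146)`. The cost is reading the 10 unused bytes between the first two blobs in exchange for one fewer request.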

### Approach

Instead of adding a `read_blob` UDF:
- Create a `ClosableIterator` wrapper that, when `next` is invoked, loads a batch of `hoodie.blob.batching.lookahead.size` records and, similar to Spark, sorts them, issues pread calls, and materializes the blob data in the records
- For FlinkV2: call it in `HoodieSplitReaderFunction` for each record batch
- For Legacy Flink: call it in `MergeOnReadInputFormat.initIterator`
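The wrapper described in the list above could look roughly like the sketch below. It is written against plain `java.util.Iterator` and `AutoCloseable` so that it stands alone; `BlobBatchingIterator`, `Row`, and `RangeReader` are hypothetical stand-ins, not Hudi or Flink APIs, and the real implementation would operate on Flink row data and a Hudi storage reader. Emitting rows in their original order after sorting only for the reads is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch: materializes out-of-line blobs one lookahead batch at a time. */
public class BlobBatchingIterator implements Iterator<BlobBatchingIterator.Row>, AutoCloseable {

    /** Hypothetical row: a blob reference plus a slot for the materialized bytes. */
    public static final class Row {
        public final String path;
        public final long offset;
        public final int length;
        public byte[] data;

        public Row(String path, long offset, int length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Hypothetical positional-read abstraction over the DFS. */
    public interface RangeReader {
        byte[] pread(String path, long offset, int length);
    }

    private final Iterator<Row> source;
    private final RangeReader reader;
    private final int lookahead;   // assumed: hoodie.blob.batching.lookahead.size
    private final long maxGap;     // assumed: hoodie.blob.batching.max.gap.bytes
    private final ArrayDeque<Row> ready = new ArrayDeque<>();

    public BlobBatchingIterator(Iterator<Row> source, RangeReader reader, int lookahead, long maxGap) {
        this.source = source;
        this.reader = reader;
        this.lookahead = lookahead;
        this.maxGap = maxGap;
    }

    @Override
    public boolean hasNext() {
        return !ready.isEmpty() || source.hasNext();
    }

    @Override
    public Row next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (ready.isEmpty()) {
            fillBatch();
        }
        return ready.poll();
    }

    private void fillBatch() {
        List<Row> batch = new ArrayList<>();
        while (batch.size() < lookahead && source.hasNext()) {
            batch.add(source.next());
        }
        // Sort a copy by (path, offset) so neighbouring blobs become adjacent.
        List<Row> sorted = new ArrayList<>(batch);
        sorted.sort(Comparator.comparing((Row r) -> r.path).thenComparingLong(r -> r.offset));

        int i = 0;
        while (i < sorted.size()) {
            Row first = sorted.get(i);
            long end = first.offset + first.length;
            int j = i + 1;
            // Grow the run while the next blob is in the same file and within maxGap.
            while (j < sorted.size() && sorted.get(j).path.equals(first.path)
                    && sorted.get(j).offset - end <= maxGap) {
                end = Math.max(end, sorted.get(j).offset + sorted.get(j).length);
                j++;
            }
            // One positional read covers the whole run; slice it per row.
            byte[] chunk = reader.pread(first.path, first.offset, (int) (end - first.offset));
            for (int k = i; k < j; k++) {
                Row r = sorted.get(k);
                int from = (int) (r.offset - first.offset);
                r.data = Arrays.copyOfRange(chunk, from, from + r.length);
            }
            i = j;
        }
        ready.addAll(batch); // emit in the original record order
    }

    @Override
    public void close() throws Exception {
        if (source instanceof AutoCloseable) {
            ((AutoCloseable) source).close();
        }
    }
}
```

In this sketch the number of DFS calls per batch is bounded by the number of merged runs rather than the number of rows, while memory use is bounded by the lookahead size times the blob sizes in the batch.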

For Hudi to infer which blob fields should be "materialized", we can require users to pass the blob-related configs described in https://github.com/apache/hudi/pull/18958#issuecomment-4725262176 in the Flink DDL.

### Related Issues

**Parent feature issue:** (if applicable )
**Related issues:**
NOTE: Use `Relationships` button to add parent/blocking issues after issue is created.

[Flink][Blob phase 1] Allow FlinkSQL reads to materialize OOL blobs #19032