RL training: Base64 image decode errors (Incorrect padding / Invalid base64-encoded string) #31

@Ryann-Ran

Hi, thanks for your great work!

I’m trying to reproduce the RL training and I’m consistently seeing errors like:

  • [ERROR code] Failed to decode image 0: Incorrect padding
  • [ERROR code] Failed to decode image 0: Invalid base64-encoded string: number of data characters (336877) cannot be 1 more than a multiple of 4
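For context, both messages are what Python's standard `base64` module reports when a base64 string is cut short, which would be consistent with a payload truncated in transit. The sketch below is only an illustration under that assumption: `try_decode` is a hypothetical helper, and the payload is dummy bytes rather than a real image from the sandbox.

```python
import base64
import binascii


def try_decode(data: str) -> str:
    """Return 'ok' or the error message raised by base64 decoding."""
    try:
        base64.b64decode(data)
        return "ok"
    except binascii.Error as exc:
        return str(exc)


# 300 dummy bytes encode to exactly 400 base64 characters with no padding.
payload = base64.b64encode(bytes(range(100)) * 3).decode("ascii")

complete = try_decode(payload)        # full string decodes fine
cut_by_1 = try_decode(payload[:399])  # 399 % 4 == 3: padding error
cut_by_3 = try_decode(payload[:397])  # 397 % 4 == 1: "1 more than a multiple of 4"

# The length reported in the second error above has the same remainder.
reported_remainder = 336877 % 4       # 1

if __name__ == "__main__":
    for label, result in [("complete", complete),
                          ("cut by 1", cut_by_1),
                          ("cut by 3", cut_by_3)]:
        print(f"{label}: {result}")
```

A valid base64 string never has a length that leaves remainder 1 when divided by 4, so the 336877-character string cannot be a complete encoding, whatever the cause of the missing characters.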

My main question is whether these errors correspond to the issue described in the README (i.e., caused by network pressure):

"During RL training, transmitting a large number of high-resolution images to a single node in a short period may saturate the bandwidth, leading to retransmissions and timeouts."

I tried launching 3 code servers, each with 4 processes, on a single machine with 8× A800 (80GB), but that did not resolve the errors. I used commands like:

IMAGE=chenshawn6915/multimodal-ipython-sandbox:oss-v2
docker run -p 18901:18901 -p 18902:18902 -p 18903:18903 -p 18904:18904 $IMAGE
docker run -p 28901:18901 -p 28902:18902 -p 28903:18903 -p 28904:18904 $IMAGE
docker run -p 38901:18901 -p 38902:18902 -p 38903:18903 -p 38904:18904 $IMAGE
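To rule out a port-mapping problem separately from bandwidth, one option is to check that each of the 12 mapped host ports accepts a TCP connection. This is a minimal sketch that assumes the containers run on the local machine with the port layout above; `endpoints` and `port_open` are hypothetical helper names, and a successful connection only shows that something is listening, not that the code server is healthy.

```python
import socket


def endpoints(host="127.0.0.1", bases=(18901, 28901, 38901), per_container=4):
    """List the (host, port) pairs implied by the three docker run commands."""
    return [(host, base + i) for base in bases for i in range(per_container)]


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for host, port in endpoints():
        status = "open" if port_open(host, port) else "unreachable"
        print(f"{host}:{port} {status}")
```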
