Thank you for sharing this excellent work and making the code available.
While reading the paper, I noticed that Section 3.3 describes two fusion strategies:
- Merged Tokens (channel‑wise sum, Eq. 7–8)
- Separate Tokens (sequence‑wise concatenation, Eq. 9–10), which preserves modality‑specific information at the cost of increased computation.
However, in the released repository I only see the merged‑tokens implementation. Could you clarify:
- Was the separate‑tokens variant implemented internally for the experiments but not released here?
- If so, do you plan to share that code, or could you provide guidance on how to adapt the current implementation to support it?
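To make the question concrete, here is a minimal sketch of how I currently understand the two strategies. It is based only on the descriptions in Section 3.3, not on the repository code: the function names `fuse_merged` and `fuse_separate`, the `(batch, tokens, dim)` layout, and the use of NumPy are my own assumptions, and I am assuming both modalities are already projected to the same token count and embedding size. Please correct me if Eq. 7–10 involve anything beyond this (for example modality embeddings).

```python
import numpy as np

def fuse_merged(tokens_a, tokens_b):
    """Merged tokens (my reading): element-wise sum of the two token sets.

    Both inputs have shape (batch, n_tokens, dim); the output keeps that shape.
    """
    return tokens_a + tokens_b

def fuse_separate(tokens_a, tokens_b):
    """Separate tokens (my reading): concatenate along the sequence axis.

    The output has shape (batch, 2 * n_tokens, dim), so each modality keeps
    its own tokens and the encoder sees a sequence twice as long.
    """
    return np.concatenate([tokens_a, tokens_b], axis=1)

# Hypothetical sizes, chosen only for illustration.
batch, n_tokens, dim = 2, 16, 32
rng = np.random.default_rng(0)
tokens_a = rng.standard_normal((batch, n_tokens, dim))
tokens_b = rng.standard_normal((batch, n_tokens, dim))

merged = fuse_merged(tokens_a, tokens_b)
separate = fuse_separate(tokens_a, tokens_b)
print(merged.shape)    # (2, 16, 32)
print(separate.shape)  # (2, 32, 32)
```

If this reading is right, the change would mostly be replacing the sum with a concatenation before the encoder, plus handling the doubled sequence length downstream (self-attention cost grows roughly fourfold). I would appreciate confirmation of whether that matches what was used in the experiments.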