This benchmark is extremely useful, and I love it for that. However, the models shown in the results table are ancient at this point, and therefore irrelevant. Could you guys please update the table with newer models? There are many modern ones (Deepseek v3.2 in particular) that haven't gone through any context-length tests or benchmarks, which is extremely frustrating, since I need to see what their true context length is. My use cases are very context-heavy, so I need to know which models perform best on very long contexts.