Benchmark: So far, only the datasets for GAIA, MATH-500, MGSM, MMLU, SRDD, and S1K have been collected, along with evaluation for MATH-500 and SRDD.