Hello
I was looking at some examples from the dataset, in particular the Spanish ones, and I noticed that some numbers are written as digits in the context text but as words in the answers.
For example:
Context
[...] Josh Norman [...] consiguió 4 intercepciones [...]
Question
¿Cuántos balones interceptó Josh Norman?
Answer
cuatro
Throughout the context text, the number of interceptions by Josh Norman is written only as "4", never as "cuatro". The model therefore cannot possibly find a span that matches the answer. The normalize_text function doesn't handle this case either.
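To illustrate the kind of normalization that would be needed, here is a minimal sketch. It is not the dataset's actual normalize_text; the function names and the small word-to-digit table are hypothetical and only cover 0 to 10. It lowercases, strips ASCII punctuation, and maps Spanish number words to digits before checking whether the answer occurs in the context.

```python
import re
import string

# Hypothetical lookup table, for illustration only (covers 0-10).
SPANISH_NUMBER_WORDS = {
    "cero": "0", "uno": "1", "dos": "2", "tres": "3", "cuatro": "4",
    "cinco": "5", "seis": "6", "siete": "7", "ocho": "8", "nueve": "9",
    "diez": "10",
}


def normalize_text_with_numbers(text):
    """Lowercase, drop ASCII punctuation, and map number words to digits."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    tokens = [SPANISH_NUMBER_WORDS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def answer_in_context(answer, context):
    """Return True if the normalized answer appears as whole words in the normalized context."""
    norm_answer = normalize_text_with_numbers(answer)
    norm_context = normalize_text_with_numbers(context)
    pattern = r"\b" + re.escape(norm_answer) + r"\b"
    return re.search(pattern, norm_context) is not None


context = "[...] Josh Norman [...] consiguió 4 intercepciones [...]"
found = answer_in_context("cuatro", context)
```

With this mapping, "cuatro" and "4" normalize to the same token, so the answer can be matched against the context. A real fix would need a fuller number parser, and words such as "uno" are ambiguous because they are not always numerals.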