Hi. Here is an issue I'm getting using some French pipelines (fr_core_news_lg or fr_dep_news_trf).
As the examples below show, lemmatization works in some cases but returns the wrong lemma in others.
So far I've only been able to reproduce the issue with verbs from a single group (the 'first group', whose infinitives end in "er"). Not all of them are affected, though, as example 2 shows.
The verbs are detected properly, even with the right tense, but in many cases the lemma is missing the trailing "r".
A quick lookup against a verb dictionary could work around the issue, but I would rather help fix the root cause here :)
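For reference, here is a minimal sketch of what that dictionary workaround might look like. It is only an illustration: `VERB_INFINITIVES` and `fix_verb_lemmas` are hypothetical names, the two-entry dictionary stands in for a real French verb lexicon, and the POS tags in the sample are assumed. The function works on plain `(text, pos, lemma)` tuples, which could be built from `t.text`, `t.pos_` and `t.lemma_` for each token.

```python
# Hypothetical mini-dictionary mapping inflected forms to infinitives.
# A real workaround would load a full French verb lexicon instead.
VERB_INFINITIVES = {
    "monte": "monter",
    "saute": "sauter",
}


def fix_verb_lemmas(tokens, lexicon=VERB_INFINITIVES):
    """Return the lemmas, replacing verb lemmas found in the lexicon.

    tokens: iterable of (text, pos, lemma) tuples.
    Non-verbs and verbs missing from the lexicon keep their original lemma.
    """
    fixed = []
    for text, pos, lemma in tokens:
        if pos == "VERB":
            lemma = lexicon.get(text.lower(), lemma)
        fixed.append(lemma)
    return fixed


# Sample mirroring example 3 below (POS tags assumed for illustration).
tokens = [
    ("le", "DET", "le"),
    ("chat", "NOUN", "chat"),
    ("monte", "VERB", "monte"),
    ("les", "DET", "le"),
    ("escaliers", "NOUN", "escalier"),
]
print(*fix_verb_lemmas(tokens))
```

This only patches the output after the fact and depends entirely on the coverage of the lexicon, which is why I'd prefer a fix in the pipeline itself.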
Thank you.
How to reproduce the behaviour
import spacy
import fr_dep_news_trf

nlp = fr_dep_news_trf.load(exclude=["ner"])

# 1
doc = nlp("le chat dort dans son lit")
print(*[t.lemma_ for t in doc])  # Correct
# Output: le chat dormir dans son lit

# 2
doc = nlp("le chat mange des souris")
print(*[t.lemma_ for t in doc])  # Correct
# Output: le chat manger un souris

# 3
doc = nlp("le chat monte les escaliers")
print(*[t.lemma_ for t in doc])  # Incorrect
# Output: le chat monte le escalier
# Should be: le chat monter le escalier

# 4
doc = nlp("le chat saute haut")
print(*[t.lemma_ for t in doc])  # Incorrect
# Output: le chat saute haut
# Should be: le chat sauter haut
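To find other affected verbs more quickly, a rough check like the one below might help. It is a sketch under assumptions, not part of spaCy: `suspicious_verb_lemmas` and `INFINITIVE_ENDINGS` are hypothetical names, and the heuristic simply flags any token tagged as a verb whose lemma does not end like a French infinitive. The sample tuples mirror example 4, with POS tags assumed for illustration.

```python
# French infinitives end in -er, -ir (including -oir), -ïr or -re.
INFINITIVE_ENDINGS = ("er", "ir", "ïr", "re")


def suspicious_verb_lemmas(tokens):
    """Return (text, lemma) pairs for verbs whose lemma is not infinitive-like.

    tokens: iterable of (text, pos, lemma) tuples, e.g. built from
    (t.text, t.pos_, t.lemma_) for each token in a doc.
    """
    return [
        (text, lemma)
        for text, pos, lemma in tokens
        if pos == "VERB" and not lemma.endswith(INFINITIVE_ENDINGS)
    ]


# Sample mirroring example 4 (POS tags assumed for illustration).
sample = [
    ("le", "DET", "le"),
    ("chat", "NOUN", "chat"),
    ("saute", "VERB", "saute"),
    ("haut", "ADV", "haut"),
]
print(suspicious_verb_lemmas(sample))
```

The check cannot tell whether a flagged lemma is actually wrong, and it will miss wrong lemmas that happen to end in one of these endings, so it is only meant for collecting candidates to inspect.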
Info about spaCy
- spaCy version: 3.0.3
- Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.10
- Pipelines: fr_core_news_lg (3.0.0), fr_dep_news_trf (3.0.0)