Lemmatizer in French not getting the right lemma for some Verbs. #7320

@ioExpander

Hi. I'm running into an issue with some French pipelines (fr_core_news_lg or fr_dep_news_trf).
As the examples below show, the lemmatizer works in some cases but returns the wrong lemma in others.
So far I've only been able to reproduce the issue with verbs that all belong to the same group (the 'first group', whose infinitives end in "er"). Not all of them are affected, though, as example 2 shows.
The verbs are detected properly, even with the right tense, but in many cases the lemma is missing the trailing "r".

A quick lookup against a verb dictionary could work around the issue, but I would rather help fix the root cause here :)

Thank you.

How to reproduce the behaviour

import spacy
import fr_dep_news_trf

# Load the French transformer pipeline; NER is not needed for lemmatization.
nlp = fr_dep_news_trf.load(exclude=["ner"])

# 1
doc = nlp("le chat dort dans son lit")
print(*[t.lemma_ for t in doc])  # Correct
# Output: le chat dormir dans son lit

# 2
doc = nlp("le chat mange des souris")
print(*[t.lemma_ for t in doc])  # Correct
# Output: le chat manger un souris

# 3
doc = nlp("le chat monte les escaliers")
print(*[t.lemma_ for t in doc])  # Incorrect
# Output: le chat monte le escalier
# Should be: le chat monter le escalier

# 4
doc = nlp("le chat saute haut")
print(*[t.lemma_ for t in doc])  # Incorrect
# Output: le chat saute haut
# Should be: le chat sauter haut
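
For reference, the dictionary-lookup workaround mentioned in the description could look roughly like the sketch below. It is only an illustration, not a fix for the root cause: the `VERB_LEMMAS` table and the `fix_verb_lemma` helper are hypothetical names, the table is a tiny hand-written stand-in for a real French verb dictionary, and the `(text, pos, lemma)` triples are assumed to mirror what the pipeline returns for example 3 (with the verb tagged `VERB`, as described above).

```python
# Hypothetical, hand-written table mapping inflected forms to infinitives.
# A real workaround would load a full French verb dictionary instead.
VERB_LEMMAS = {
    "dort": "dormir",
    "mange": "manger",
    "monte": "monter",
    "saute": "sauter",
}


def fix_verb_lemma(text, pos, lemma, table=VERB_LEMMAS):
    """Return the dictionary lemma for a verb, else keep the pipeline's lemma."""
    if pos != "VERB":
        return lemma
    # Fall back to the original lemma when the form is not in the table.
    return table.get(text.lower(), lemma)


# Assumed (text, coarse POS, lemma) triples for example 3.
tokens = [
    ("le", "DET", "le"),
    ("chat", "NOUN", "chat"),
    ("monte", "VERB", "monte"),
    ("les", "DET", "le"),
    ("escaliers", "NOUN", "escalier"),
]

fixed = [fix_verb_lemma(text, pos, lemma) for text, pos, lemma in tokens]
print(*fixed)
# Output: le chat monter le escalier
```

With a real pipeline, the same check would presumably be applied to each token's `text`, `pos_` and `lemma_` attributes. Restricting the lookup to tokens tagged as verbs keeps nouns such as "monte" (in other contexts) from being rewritten.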

Info about spaCy

  • spaCy version: 3.0.3
  • Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.7.10
  • Pipelines: fr_core_news_lg (3.0.0), fr_dep_news_trf (3.0.0)

Labels: feat / lemmatizer (Feature: Rule-based and lookup lemmatization), help wanted (Contributions welcome!), lang / fr (French language data and models), perf / accuracy (Performance: accuracy)