[kaz] Kazakh "small" data is mis-sized #2

@kylebgorman

Hello from the future @jkodner05 and team. I noticed that the file part1/suprise_languages/kaz_small.train is the same size (7,000 exemplars) as kaz_large.train in that directory.

  1. I assume this is an error. Can you confirm?
  2. Given this, what is the best way to evaluate and compare against prior results in the lower-resource ("small") setting? Should I just use kaz_small.train as the training data in that setting, even though, unlike, say, hye_small.train, it is not actually "small" (700 examples)?
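
For anyone who wants to reproduce the size check, here is a minimal Python sketch. It assumes each .train file holds one example per non-empty line, which I have not confirmed against the data documentation; the helper names `read_examples` and `compare_splits` are hypothetical and not part of the repository.

```python
from pathlib import Path


def read_examples(path):
    """Return the non-empty lines of a file (assumed: one example per line)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


def compare_splits(small_path, large_path):
    """Report the example counts of two splits and whether they are identical."""
    small = read_examples(small_path)
    large = read_examples(large_path)
    return {
        "small_count": len(small),
        "large_count": len(large),
        "identical": small == large,
    }


if __name__ == "__main__":
    # Paths as given in this report; run from the repository root.
    base = Path("part1/suprise_languages")
    small, large = base / "kaz_small.train", base / "kaz_large.train"
    if small.exists() and large.exists():
        print(compare_splits(small, large))
```

If `identical` comes back true, the "small" file is presumably just a copy of the "large" one. If only the counts match, the two files may be different samples of the same size.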
