Preserve categorical dtype with map in get() #448

@hagenw

When requesting a table column with defined categories, a series with dtype category is returned:

>>> import audb
>>> db = audb.load("emodb", version="1.4.1", only_metadata=True, full_path=False)
>>> db["files"]["transcription"].get().head()
file
wav/03a01Fa.wav    a01
wav/03a01Nc.wav    a01
wav/03a01Wa.wav    a01
wav/03a02Fc.wav    a02
wav/03a02Nc.wav    a02
Name: transcription, dtype: category
Categories (10, object): ['a01', 'a02', 'a04', 'a05', ..., 'b02', 'b03', 'b09', 'b10']

But when using map to map to the labels defined by the scheme, we lose the category dtype:

>>> db["files"]["transcription"].get(map=True).head()
file
wav/03a01Fa.wav    Der Lappen liegt auf dem Eisschrank.
wav/03a01Nc.wav    Der Lappen liegt auf dem Eisschrank.
wav/03a01Wa.wav    Der Lappen liegt auf dem Eisschrank.
wav/03a02Fc.wav       Das will sie am Mittwoch abgeben.
wav/03a02Nc.wav       Das will sie am Mittwoch abgeben.
Name: True, dtype: string

I'm not sure yet if this is a feature or a bug, but it feels strange to me that we lose the category dtype.


The same holds true when the scheme labels are given by a misc table:

>>> db["files"]["speaker"].get().head()
file
wav/03a01Fa.wav    3
wav/03a01Nc.wav    3
wav/03a01Wa.wav    3
wav/03a02Fc.wav    3
wav/03a02Nc.wav    3
Name: speaker, dtype: category
Categories (10, int64): [3, 8, 9, 10, ..., 13, 14, 15, 16]

>>> db["files"]["speaker"].get(map="age").head()
file
wav/03a01Fa.wav    31
wav/03a01Nc.wav    31
wav/03a01Wa.wav    31
wav/03a02Fc.wav    31
wav/03a02Nc.wav    31
Name: age, dtype: Int64
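
For discussion, here is a minimal pandas-only sketch of what preserving the categorical dtype after mapping could look like. It does not use audb or audformat and is not how get() is implemented. The helper map_keep_categorical, the toy series, and the speaker-to-age values are assumptions made up for illustration. The new categories are derived from the mapped old categories, with duplicates dropped, since several speakers may share the same age.

```python
import pandas as pd


def map_keep_categorical(y: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper: map categorical values and keep a categorical dtype."""
    # Map the values as plain objects; the result is no longer categorical
    mapped = y.astype(object).map(mapping)
    # Derive the new categories from the old ones,
    # dropping duplicates while preserving their order
    categories = list(
        dict.fromkeys(mapping[c] for c in y.cat.categories if c in mapping)
    )
    return mapped.astype(pd.CategoricalDtype(categories=categories))


# Toy stand-in for a column whose scheme maps short codes to sentences
labels = {
    "a01": "Der Lappen liegt auf dem Eisschrank.",
    "a02": "Das will sie am Mittwoch abgeben.",
}
transcription = pd.Series(
    ["a01", "a01", "a02"],
    name="transcription",
    dtype=pd.CategoricalDtype(["a01", "a02"]),
)
transcription_mapped = map_keep_categorical(transcription, labels)

# Toy stand-in for a misc table lookup (speaker ID -> age, values made up)
ages = {3: 31, 8: 34, 9: 31}
speaker = pd.Series(
    [3, 3, 8],
    name="speaker",
    dtype=pd.CategoricalDtype([3, 8, 9]),
)
age_mapped = map_keep_categorical(speaker, ages)

print(transcription_mapped.dtype)
print(age_mapped.dtype)
```

Whether the mapped categories should follow the order of the original categories, as assumed here, or be sorted is an open design question.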

Labels: question (further information is requested)