Use a grouper instead of unique_id #23

@david-waterworth

In the main feature-extraction loop, tsfeatures groups by the hard-coded unique_id column and then applies the transforms to the grouped data.

ts_features = pool.starmap(partial_get_feats, ts.groupby('unique_id'))
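
For context, iterating over a pandas groupby yields (key, sub-frame) pairs, which is why starmap can hand each group to a two-argument function. Below is a minimal sketch of that pattern, not the library's actual code: `get_feats` is a made-up stand-in for `partial_get_feats`, the sample data is invented, and `itertools.starmap` is used in place of the multiprocessing pool to keep the example small.

```python
from itertools import starmap

import pandas as pd

# Invented sample data in the long format the loop expects.
ts = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b", "b"],
    "y": [1.0, 3.0, 2.0, 4.0, 6.0],
})


def get_feats(key, group):
    # Hypothetical stand-in for partial_get_feats: one feature row per series.
    return {"unique_id": key, "mean": group["y"].mean(), "length": len(group)}


# Each groupby item is a (key, sub-frame) pair; starmap unpacks it into two arguments.
ts_features = pd.DataFrame(list(starmap(get_feats, ts.groupby("unique_id"))))
```

Because the function only sees a key and a sub-frame, nothing in this pattern depends on the key being a single column named unique_id.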

It would be more generic if you could pass in a Grouper to perform the grouping. At the moment I have to group my data myself and then create a flat column from the multi-index (i.e. a column of tuples):

import pandas as pd

# group by id and day (df has an 'id' column and a datetime 'time' column)
grouper = [pd.Grouper(key='id'), pd.Grouper(key='time', freq='1D')]
grouped_data = df.groupby(grouper, group_keys=True)

# join groups, use grouper key as new index
grouped_data = grouped_data.apply(lambda x: x.drop(columns=['id']))
grouped_data = grouped_data.droplevel(-1)

# flatten index to tuples
grouped_data.index = grouped_data.index.to_flat_index()
grouped_data.index.name = 'id'
grouped_data = grouped_data.reset_index()

The issue I've had with that is that I've been experimenting with Dask, and data formats like Parquet don't seem to support this column type: you can create a Dask dataframe from a pandas dataframe that contains tuple columns, but so far I've been unable to persist them. I know tsfeatures doesn't support Dask at this stage, but I guess it might be on the roadmap?
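
One possible way around the tuple column, sketched here as an assumption rather than anything tsfeatures provides, is to build a plain string key from the id and the calendar day. String columns are a type Parquet handles. The sample data and the `_` separator are invented for illustration, and flooring to the day is assumed to match the daily bins that `pd.Grouper(freq='1D')` produces.

```python
import pandas as pd

# Invented sample data with the same 'id' and 'time' columns as above.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b"],
    "time": pd.to_datetime([
        "2021-01-01 00:00",
        "2021-01-01 12:00",
        "2021-01-02 06:00",
        "2021-01-01 03:00",
    ]),
    "y": [1.0, 2.0, 3.0, 4.0],
})

# Build a plain string key from id and calendar day instead of a tuple.
# The "_" separator is an arbitrary choice; pick one that cannot occur in the ids.
day = df["time"].dt.floor("D").dt.strftime("%Y-%m-%d")
df["unique_id"] = df["id"].astype(str) + "_" + day
```

This still flattens the grouping into a single column, so it is a workaround rather than a substitute for passing a Grouper directly.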
