Offer methods for Dataset to cover the most common mechanisms for moving data between partitions #245

@karlhigley

The proposed methods would be shuffle_by_keys, sort_by_keys, and group_by_keys. Right now, we only have shuffle_by_keys.

@rjzamora says:

exposing a clear space for documentation is probably the best reason to add it. That documentation should also clarify that these global operations (requiring inter-partition data movement) should be avoided unless absolutely necessary 🙂
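To make the distinction between the three proposed methods concrete, here is a toy sketch in plain Python. It is not the library's implementation: partitions are modelled as lists of row dicts, and the signatures (including the `npartitions` and `value` parameters) are assumptions made up for illustration. It only shows why each operation needs rows to cross partition boundaries.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, List, Sequence, Tuple

# Toy model (assumption): a partition is a list of rows, a row is a dict.
Row = Dict[str, Any]
Partitions = List[List[Row]]


def _key_of(row: Row, keys: Sequence[str]) -> Tuple[Hashable, ...]:
    """Build the tuple of key values used to route or order a row."""
    return tuple(row[k] for k in keys)


def shuffle_by_keys(parts: Partitions, keys: Sequence[str], npartitions: int) -> Partitions:
    """Hash-partition rows so equal keys always land in the same partition."""
    out: Partitions = [[] for _ in range(npartitions)]
    for part in parts:
        for row in part:
            out[hash(_key_of(row, keys)) % npartitions].append(row)
    return out


def sort_by_keys(parts: Partitions, keys: Sequence[str], npartitions: int) -> Partitions:
    """Globally sort rows, then split them into contiguous, ordered partitions."""
    rows = sorted((r for p in parts for r in p), key=lambda r: _key_of(r, keys))
    size = -(-len(rows) // npartitions)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(npartitions)]


def group_by_keys(
    parts: Partitions, keys: Sequence[str], value: str, npartitions: int
) -> List[Dict[Tuple[Hashable, ...], Any]]:
    """Sum `value` per key: shuffle first, then aggregate inside each partition."""
    out = []
    for part in shuffle_by_keys(parts, keys, npartitions):
        totals: Dict[Tuple[Hashable, ...], Any] = defaultdict(int)
        for row in part:
            totals[_key_of(row, keys)] += row[value]
        out.append(dict(totals))
    return out


parts = [
    [{"user": 1, "clicks": 2}, {"user": 2, "clicks": 1}],
    [{"user": 1, "clicks": 3}, {"user": 3, "clicks": 5}],
]
shuffled = shuffle_by_keys(parts, ["user"], npartitions=2)
sorted_parts = sort_by_keys(parts, ["user"], npartitions=2)
grouped = group_by_keys(parts, ["user"], "clicks", npartitions=2)
```

In all three functions every input partition may contribute rows to every output partition, which is the inter-partition data movement the comment above warns about.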

Labels

api (Changes or tweaks to the Core API), chore (Maintenance for the repository), clean up
