[Milestone 4] Optimize Spark SQL performance #18115

@the-other-tim-brown

Task Description

What needs to be done:
Analyze the Spark SQL query plan to find where the data is deserialized, and make sure that deserialization happens after any filtering on the structured data columns and after any joins or other shuffle steps.
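
To illustrate why the ordering matters, here is a minimal sketch in plain Python (not Spark code). It models rows with structured columns plus a serialized payload and counts how many payloads get deserialized under each ordering. The row layout, the `OWNERS` lookup table, and all function names are hypothetical and chosen only for this illustration; `json.loads` stands in for whatever deserializer the real job uses.

```python
import json

# Hypothetical rows: structured columns (id, region) plus a serialized payload.
ROWS = [
    {
        "id": i,
        "region": "eu" if i % 4 == 0 else "us",
        "payload": json.dumps({"text": f"doc-{i}"}),
    }
    for i in range(100)
]

# Hypothetical lookup table for the join; only ids 0..9 have a match.
OWNERS = {i: f"owner-{i}" for i in range(10)}


class CountingDeserializer:
    """Wraps json.loads and counts how many payloads were deserialized."""

    def __init__(self):
        self.calls = 0

    def __call__(self, payload):
        self.calls += 1
        return json.loads(payload)


def deserialize_first(rows, owners, deserialize):
    # Eager ordering: deserialize every row, then filter and join.
    decoded = [{**r, "payload": deserialize(r["payload"])} for r in rows]
    filtered = [r for r in decoded if r["region"] == "eu"]
    return [{**r, "owner": owners[r["id"]]} for r in filtered if r["id"] in owners]


def deserialize_last(rows, owners, deserialize):
    # Deferred ordering: filter and join on the structured columns only,
    # then deserialize the rows that survived both steps.
    filtered = [r for r in rows if r["region"] == "eu"]
    joined = [{**r, "owner": owners[r["id"]]} for r in filtered if r["id"] in owners]
    return [{**r, "payload": deserialize(r["payload"])} for r in joined]


eager, deferred = CountingDeserializer(), CountingDeserializer()
eager_result = deserialize_first(ROWS, OWNERS, eager)
deferred_result = deserialize_last(ROWS, OWNERS, deferred)

print(eager.calls, deferred.calls)
```

Both orderings return the same rows, but the deferred one only calls the deserializer on rows that pass the filter and the join. In an actual Spark job, the position of the deserialization step relative to filters and exchanges can be inspected with `DataFrame.explain()`.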

Why this task is needed:
This will help reduce the cost of jobs that deal with unstructured data, since rows that are dropped by a filter or a join are never deserialized.

Task Type

Code improvement/refactoring

Related Issues

Parent feature issue: (if applicable)
Related issues:
NOTE: Use Relationships button to add parent/blocking issues after issue is created.

Metadata

Labels

type:devtask (Development tasks and maintenance work)
