
RFC-0025 Derived column #61

Open
ScrapCodes wants to merge 1 commit into prestodb:main from ScrapCodes:derived-column-spec


@ScrapCodes
Contributor

@ScrapCodes ScrapCodes commented Apr 24, 2026

What is a derived column?

A column created by applying a SQL expression or a UDF to an existing column in a table.

Why do we need it, when we can always apply a UDF to a column during a project, filter, or join?

A derived column does consume O(N) extra storage, where N is the number of rows in the table. We still need derived columns because their performance benefits outweigh the cost of that extra storage. Consider the following use case:

A compute engine like Presto can easily push down a filter predicate such as SELECT col1, col2 FROM T1 WHERE col1 = 'constant_value'. Applying the filter col1 = 'constant_value' at the TableScan prunes the rows that have to be read. This no longer holds when a UDF is involved in the filter predicate, for example SELECT col1, col2 FROM T1 WHERE lower(col1) = 'constant_value'. The optimizer can still push the predicate down, but the predicate cannot be checked against lower- and upper-bound metrics such as Iceberg manifest statistics and Parquet row group statistics, because those bounds describe col1 rather than lower(col1). As a result, we end up scanning a large number of rows.

By supporting pushdown of certain predicates (those with UDFs in them) and reducing the amount of data scanned, derived columns bring large performance improvements. Derived columns have already been proven in RDBMS systems, e.g. DB2 [1], and we now intend to bring them to Presto.
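The pruning argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RFC's design or Presto's implementation: `RowGroupStats`, `build_stats`, `in_bounds`, and `must_scan_lower_eq` are hypothetical names, the min/max fields stand in for Parquet row group or Iceberg manifest bounds, and strings are compared by code point.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowGroupStats:
    """Hypothetical min/max bounds kept for one row group."""
    col1_min: str
    col1_max: str
    # Bounds for a persisted derived column lower(col1); None if it does not exist.
    lower_col1_min: Optional[str] = None
    lower_col1_max: Optional[str] = None


def build_stats(values: List[str], with_derived: bool) -> RowGroupStats:
    """Compute bounds for col1 and, optionally, for the derived column lower(col1)."""
    lowered = [v.lower() for v in values]
    return RowGroupStats(
        col1_min=min(values),
        col1_max=max(values),
        lower_col1_min=min(lowered) if with_derived else None,
        lower_col1_max=max(lowered) if with_derived else None,
    )


def in_bounds(lo: str, hi: str, value: str) -> bool:
    """A row group can contain `value` only if lo <= value <= hi."""
    return lo <= value <= hi


def must_scan_lower_eq(stats: RowGroupStats, value: str) -> bool:
    """Decide whether a row group must be read for lower(col1) = value."""
    if stats.lower_col1_min is None or stats.lower_col1_max is None:
        # Only col1 bounds exist; they do not bound lower(col1), so nothing can be pruned.
        return True
    return in_bounds(stats.lower_col1_min, stats.lower_col1_max, value)


row_groups = [["Apple", "Banana"], ["Mango", "Peach"], ["Xigua", "Zucchini"]]
without_derived = [build_stats(g, with_derived=False) for g in row_groups]
with_derived = [build_stats(g, with_derived=True) for g in row_groups]

# Row groups that must be scanned for lower(col1) = 'mango'.
print([must_scan_lower_eq(s, "mango") for s in without_derived])
print([must_scan_lower_eq(s, "mango") for s in with_derived])
# Checking the literal against col1's own bounds would be incorrect.
print(in_bounds("Mango", "Peach", "mango"))
```

With this toy data, every row group must be scanned when only col1 bounds exist, while bounds on the derived column leave only the second row group. The last line shows why col1's bounds cannot simply be reused: 'mango' falls outside ['Mango', 'Peach'], so that shortcut would wrongly skip the row group that holds the matching row.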

@prestodb-ci prestodb-ci added the from:IBM PRs from IBM label Apr 24, 2026
@prestodb-ci prestodb-ci requested review from a team, BryanCutler and infvg and removed request for a team April 24, 2026 07:35
@ScrapCodes ScrapCodes marked this pull request as draft April 24, 2026 07:35
@ScrapCodes ScrapCodes force-pushed the derived-column-spec branch 4 times, most recently from 474cf06 to 221e8c0 Compare April 24, 2026 11:37
@ScrapCodes ScrapCodes force-pushed the derived-column-spec branch from 221e8c0 to 8690c4e Compare May 4, 2026 09:02
@ScrapCodes ScrapCodes changed the title [WIP] RFC-0025 Derived column RFC-0025 Derived column May 4, 2026
@ScrapCodes ScrapCodes marked this pull request as ready for review May 4, 2026 09:58
@prestodb-ci prestodb-ci requested review from a team, infvg and wanglinsong and removed request for a team May 4, 2026 09:58
@ScrapCodes ScrapCodes force-pushed the derived-column-spec branch 3 times, most recently from 3bdd1ef to 2448a60 Compare May 4, 2026 12:23
@jja725 jja725 self-requested a review May 5, 2026 18:25
@ScrapCodes ScrapCodes force-pushed the derived-column-spec branch from 2448a60 to 8c6c4a4 Compare May 6, 2026 16:39

@jja725 jja725 left a comment


Agreed that how writes work would be the main concern here, given the need for compatibility with all the engines.

@ScrapCodes ScrapCodes force-pushed the derived-column-spec branch from 8c6c4a4 to 03935be Compare May 7, 2026 16:11
@ScrapCodes
Contributor Author

@tdcmeehan has volunteered to be a co-author! Yay!

@ScrapCodes ScrapCodes requested a review from tdcmeehan May 19, 2026 06:47
@ScrapCodes ScrapCodes force-pushed the derived-column-spec branch from 82feae1 to 334810b Compare May 19, 2026 10:35
Comment thread RFC-0025-derived-column-support.md
{
"udfSpecList" : [ {
"derivedColumnType" : "PERSISTENT",
"derivedColumnExpression" : "SQL expression",
Contributor


Can you give more info about the SQL dialect of this expression? It seems you want at least Presto and Spark to understand it.

Contributor


To be clear, deriving a common subset of expressions that both Spark and Presto can interpret is hard and likely outside the scope of this RFC. I think the most straightforward approach is to treat them like views, which defer the question of cross-platform interpretability and force any consumer of the view SQL to understand Presto's dialect. Cross-platform expressions can be considered an orthogonal yet important task.

@ScrapCodes ScrapCodes force-pushed the derived-column-spec branch from e9533da to 7b6ee54 Compare May 25, 2026 14:00
@ScrapCodes ScrapCodes force-pushed the derived-column-spec branch from 7b6ee54 to 73e85c3 Compare May 26, 2026 06:11
@ScrapCodes ScrapCodes requested a review from aditi-pandit May 26, 2026 06:14

5 participants