[SPARK-55347][SQL] Pass Generated Column as Expression to DSV2 #54126

Open
szehon-ho wants to merge 2 commits into apache:master from szehon-ho:analyze_generated_column

@szehon-ho
Member

What changes were proposed in this pull request?

This change passes the analyzed generation expression of a generated column to the DSV2 API as a DSV2 Expression.

It also cleans up the code. Previously, the generation expression was analyzed by a separate, independent analyzer. Now it is analyzed inline, similar to how constraint expressions are analyzed. This makes it easier to run optimizer rules that constant-fold the expression before it is passed to DSV2.
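
To illustrate what constant folding before hand-off means, here is a minimal, self-contained sketch. It does not use Spark's Catalyst or DSV2 classes; `Expr`, `Literal`, `ColumnRef`, `Add`, `fold`, and `describe` are hypothetical names made up for this example, and the real optimizer rules are considerably more general.

```java
public class GeneratedColumnFoldSketch {
    // Hypothetical, simplified expression tree; NOT Spark's Catalyst or DSV2 types.
    interface Expr {}
    record Literal(long value) implements Expr {}
    record ColumnRef(String name) implements Expr {}
    record Add(Expr left, Expr right) implements Expr {}

    // Recursively replace any Add whose operands are both literals with one literal.
    static Expr fold(Expr e) {
        if (e instanceof Add add) {
            Expr l = fold(add.left());
            Expr r = fold(add.right());
            if (l instanceof Literal a && r instanceof Literal b) {
                return new Literal(a.value() + b.value());
            }
            return new Add(l, r);
        }
        // Literals and column references are already in their simplest form.
        return e;
    }

    // Render the tree as text, standing in for what a connector would receive.
    static String describe(Expr e) {
        if (e instanceof Literal lit) return Long.toString(lit.value());
        if (e instanceof ColumnRef c) return c.name();
        Add add = (Add) e;
        return "(" + describe(add.left()) + " + " + describe(add.right()) + ")";
    }

    public static void main(String[] args) {
        // Generation expression: id + (1 + 2)
        Expr generation =
            new Add(new ColumnRef("id"), new Add(new Literal(1), new Literal(2)));
        System.out.println(describe(fold(generation))); // prints "(id + 3)"
    }
}
```

In this toy model, the data source sees `(id + 3)` rather than `(id + (1 + 2))`, which is the kind of simplification the inline analysis is meant to make easier.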

Why are the changes needed?

Code cleanup, and to let DSV2 data sources work with generated columns through the DSV2 Expression API.

Does this PR introduce any user-facing change?

Some error messages may change; otherwise, behavior should be the same.

How was this patch tested?

Added new tests in DataSourceV2DataFrameSuite, and changed error case expectations in existing tests.

Was this patch authored or co-authored using generative AI tooling?

Partly: some of it was generated with Claude 4.5 Opus, then hand-tuned.

@github-actions bot added the SQL label on Feb 4, 2026
@github-actions
github-actions bot commented Feb 4, 2026

JIRA Issue Information

=== Improvement SPARK-55347 ===
Summary: Pass Generated Column as Expression to DSV2
Assignee: None
Status: Open
Affected: ["4.1.1"]


This comment was automatically generated by GitHub Actions

@szehon-ho force-pushed the analyze_generated_column branch from 5c1f4ff to 190499f on February 4, 2026 01:35