Spark 3.4: Set data file sort_order_id in manifest for writes from Spark #16308

Merged
kevinjqliu merged 1 commit into apache:main from kevinjqliu:spark-3.4-sort-order-id
May 13, 2026
@kevinjqliu
Contributor

Backport of #15832 to spark/v3.4.

Adds the output-sort-order-id write option (and SparkWriteConf.outputSortOrderId) and threads the resolved sort-order id through SparkWrite and SparkPositionDeltaWrite so written data files record the sort order in their manifest entry. SparkShufflingFileRewriteRunner resolves the matching table sort order via SortOrderUtil.findTableSortOrder and logs a warning when no match exists.
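The resolution flow described above can be sketched roughly as follows. This is a simplified stand-in model, not the code from this patch: the class `SortOrderIdSketch`, its method names, and the representation of a sort order as a list of strings are all hypothetical. The fallback to the unsorted order (id 0, which the Iceberg spec reserves for "unsorted") when the option is absent or no table sort order matches, and the rejection of unknown ids, are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Simplified model of sort-order-id resolution; none of these types are Iceberg classes. */
public class SortOrderIdSketch {
  // The Iceberg spec reserves order id 0 for the unsorted order.
  static final int UNSORTED_ID = 0;

  /**
   * Hypothetical resolution of the "output-sort-order-id" write option.
   * Assumption: an unset option means unsorted, and an unknown id is rejected.
   */
  static int resolveOutputSortOrderId(
      Map<String, String> writeOptions, Map<Integer, List<String>> tableSortOrders) {
    String raw = writeOptions.get("output-sort-order-id");
    if (raw == null) {
      return UNSORTED_ID;
    }
    int id = Integer.parseInt(raw);
    if (id != UNSORTED_ID && !tableSortOrders.containsKey(id)) {
      throw new IllegalArgumentException("Unknown sort order id: " + id);
    }
    return id;
  }

  /** Returns the id of the first table sort order whose fields equal the requested fields. */
  static Optional<Integer> findMatchingSortOrderId(
      List<String> requested, Map<Integer, List<String>> tableSortOrders) {
    for (Map.Entry<Integer, List<String>> entry : tableSortOrders.entrySet()) {
      if (entry.getValue().equals(requested)) {
        return Optional.of(entry.getKey());
      }
    }
    return Optional.empty();
  }

  /**
   * Hypothetical rewrite path: use the matching table sort order id, or warn
   * and fall back to unsorted (the fallback value is an assumption).
   */
  static int sortOrderIdForRewrite(
      List<String> rewriteOrder, Map<Integer, List<String>> tableSortOrders) {
    Optional<Integer> match = findMatchingSortOrderId(rewriteOrder, tableSortOrders);
    if (!match.isPresent()) {
      System.err.println("WARN: no table sort order matches " + rewriteOrder);
      return UNSORTED_ID;
    }
    return match.get();
  }

  public static void main(String[] args) {
    Map<Integer, List<String>> orders = new LinkedHashMap<>();
    orders.put(1, List.of("event_ts ASC"));
    orders.put(2, List.of("region ASC", "event_ts DESC"));

    // Explicit option set on the write.
    System.out.println(resolveOutputSortOrderId(Map.of("output-sort-order-id", "2"), orders));
    // Rewrite with an order the table knows about.
    System.out.println(sortOrderIdForRewrite(List.of("event_ts ASC"), orders));
    // Rewrite with an order the table does not know about: warns, then falls back.
    System.out.println(sortOrderIdForRewrite(List.of("user_id ASC"), orders));
  }
}
```

In the real patch the resolved id is passed through SparkWrite and SparkPositionDeltaWrite to the file writers; the sketch only covers the lookup step.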

Adaptation note

v3.4 SparkWrite still names the parameter partitionedFanoutEnabled (v3.5 renamed it to useFanoutWriter). I kept the v3.4 name and added the new sortOrderId parameter alongside it. All other files match the v3.5 patch.

Validation

  • ./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "*TestSparkWriteConf.testSortOrder*" (new tests pass)
  • ./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "org.apache.iceberg.spark.source.TestSparkDataWrite" (passes)
  • spark-extensions tests compile

Backport of apache#15832 to spark/v3.4.

Adds the output-sort-order-id write option and threads the resolved
sort-order id through SparkWrite and SparkPositionDeltaWrite so written
data files record the sort order in their manifest entry.

Adaptation: v3.4 SparkWrite still uses 'partitionedFanoutEnabled' (not
renamed to 'useFanoutWriter' as in v3.5). Kept the v3.4 name and added
the new 'sortOrderId' parameter alongside it.
@github-actions bot added the spark label on May 13, 2026
@manuzhang requested a review from huaxingao on May 13, 2026, 01:51
@kevinjqliu
Contributor Author

Thanks for the review! Since this is a backport and only for Spark 3.4, I'm comfortable merging it as is 😄

@kevinjqliu merged commit bdbb375 into apache:main on May 13, 2026
27 checks passed
@kevinjqliu deleted the spark-3.4-sort-order-id branch on May 13, 2026, 02:04