Spark 3.4: Set data file sort_order_id in manifest for writes from Spark by kevinjqliu · Pull Request #16308 · apache/iceberg

kevinjqliu · 2026-05-13T00:27:16Z

Backport of #15832 to spark/v3.4.

Adds the output-sort-order-id write option (and SparkWriteConf.outputSortOrderId) and threads the resolved sort-order id through SparkWrite and SparkPositionDeltaWrite so written data files record the sort order in their manifest entry. SparkShufflingFileRewriteRunner resolves the matching table sort order via SortOrderUtil.findTableSortOrder and logs a warning when no match exists.
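The resolution described above can be modeled roughly as follows. This is a self-contained sketch, not the code in this patch: the class and method names (`SortOrderIdSketch`, `resolveOutputSortOrderId`, `findTableSortOrderId`, `sortOrderIdForRewrite`) are hypothetical, and sort orders are represented as plain lists of strings rather than Iceberg `SortOrder` objects. Only the option name `output-sort-order-id` comes from the description. Defaulting to the unsorted order (id 0, which the Iceberg spec reserves for it) when the option is absent or no table order matches, and rejecting unknown ids, are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class SortOrderIdSketch {
  // Write option name as given in the PR description.
  static final String OUTPUT_SORT_ORDER_ID = "output-sort-order-id";
  // The Iceberg table spec reserves sort order id 0 for the unsorted order.
  static final int UNSORTED_ORDER_ID = 0;

  // Hypothetical model of resolving the id recorded on written data files.
  // Assumption: an absent option falls back to unsorted; an unknown id is rejected.
  static int resolveOutputSortOrderId(
      Map<String, String> writeOptions, Map<Integer, List<String>> tableSortOrders) {
    String raw = writeOptions.get(OUTPUT_SORT_ORDER_ID);
    if (raw == null) {
      return UNSORTED_ORDER_ID;
    }
    int id = Integer.parseInt(raw.trim());
    if (id != UNSORTED_ORDER_ID && !tableSortOrders.containsKey(id)) {
      throw new IllegalArgumentException("Table has no sort order with id " + id);
    }
    return id;
  }

  // Hypothetical stand-in for the lookup the rewrite runner performs:
  // find a table sort order whose fields equal the order used by the rewrite.
  static Optional<Integer> findTableSortOrderId(
      Map<Integer, List<String>> tableSortOrders, List<String> rewriteOrder) {
    return tableSortOrders.entrySet().stream()
        .filter(entry -> entry.getValue().equals(rewriteOrder))
        .map(Map.Entry::getKey)
        .findFirst();
  }

  // Warn when nothing matches. Falling back to unsorted here is an assumption.
  static int sortOrderIdForRewrite(
      Map<Integer, List<String>> tableSortOrders, List<String> rewriteOrder) {
    Optional<Integer> match = findTableSortOrderId(tableSortOrders, rewriteOrder);
    if (match.isEmpty()) {
      System.err.println("WARN: no table sort order matches " + rewriteOrder);
      return UNSORTED_ORDER_ID;
    }
    return match.get();
  }

  public static void main(String[] args) {
    Map<Integer, List<String>> orders =
        Map.of(1, List.of("ts ASC"), 2, List.of("id ASC", "ts DESC"));
    System.out.println(resolveOutputSortOrderId(Map.of(OUTPUT_SORT_ORDER_ID, "2"), orders));
    System.out.println(resolveOutputSortOrderId(Map.of(), orders));
    System.out.println(sortOrderIdForRewrite(orders, List.of("ts ASC")));
    System.out.println(sortOrderIdForRewrite(orders, List.of("name ASC")));
  }
}
```

In the actual patch the resolved id is passed through `SparkWrite` and `SparkPositionDeltaWrite` so that each data file's manifest entry carries it. The sketch stops at resolving the id.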

Adaptation note

v3.4 SparkWrite still names the parameter partitionedFanoutEnabled (v3.5 renamed it to useFanoutWriter). I kept the v3.4 name and added the new sortOrderId parameter alongside it. All other files match the v3.5 patch.

Validation

./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "*TestSparkWriteConf.testSortOrder*" (new tests pass)
./gradlew -DsparkVersions=3.4 :iceberg-spark:iceberg-spark-3.4_2.12:test --tests "org.apache.iceberg.spark.source.TestSparkDataWrite" (passes)
spark-extensions tests compile

Backport of apache#15832 to spark/v3.4. Adds the output-sort-order-id write option and threads the resolved sort-order id through SparkWrite and SparkPositionDeltaWrite so written data files record the sort order in their manifest entry. Adaptation: v3.4 SparkWrite still uses 'partitionedFanoutEnabled' (not renamed to 'useFanoutWriter' as in v3.5). Kept the v3.4 name and added the new 'sortOrderId' parameter alongside it.

kevinjqliu · 2026-05-13T02:03:57Z

Thanks for the review! Since this is a backport and only for Spark 3.4, I'm comfortable merging it as is 😄

github-actions Bot added the spark label May 13, 2026

manuzhang requested a review from huaxingao May 13, 2026 01:51

huaxingao approved these changes May 13, 2026


kevinjqliu merged commit bdbb375 into apache:main May 13, 2026
27 checks passed

kevinjqliu deleted the spark-3.4-sort-order-id branch May 13, 2026 02:04
