Spark: Add session configs for adaptive split sizing and read parallelism#16088
Spark: Add session configs for adaptive split sizing and read parallelism#16088karuppayya wants to merge 2 commits into
Conversation
|
Tagging reviewers from the PR that introduced the change @rdblue @ConeyLiu @aokolnychyi |
|
|
||
| public Integer splitParallelism() { | ||
| Integer parallelism = | ||
| confParser.intConf().sessionConf(SparkSQLProperties.READ_SPLIT_PARALLELISM).parseOptional(); |
There was a problem hiding this comment.
Why only session conf? ADAPTIVE_SPLIT_SIZE get's table versions?
There was a problem hiding this comment.
I think this is runtime property(for Spark it depends on cores+memory of the application)
Do we want to make it part of table property?
| public static final String READ_ADAPTIVE_SPLIT_SIZE_ENABLED = | ||
| "spark.sql.iceberg.read.adaptive-split-size.enabled"; | ||
|
|
||
| // Controls the parallelism used for adaptive split sizing |
There was a problem hiding this comment.
Currently this masks the fact that this parameter overrides parallelism(), keeping the default here I think would let you have a clearer doc too
| assertThat(description).contains("endSnapshotId=" + endSnapshotId); | ||
| }); | ||
| } | ||
|
|
There was a problem hiding this comment.
A few notes on the tests here,
These don't seem to be in the right place. This class is for testing the scan object and neither of these tests actually touch the spark scan, they are both essentially just parsing checkings. It may be time to start a TestSparkReadConf file for these sorts of tests
or
Actually invoke the scan here and show how the properties are changing things.
We should break out assertions, currently there are mutliple parse cases being handled in the same test.
There should probably be something like
Test Adapative enabled
test Invalid parallelism
Test ...
- Make sure we cover both the positive and negative cases. If we say a value is invalid we should have a test throwing an error when that value is passed.
There was a problem hiding this comment.
- Created a seperate test file
2 and 3 handled
Summary
Changes
Test plan