feat(writer): Make LocationGenerator partition-aware#1625
Conversation
liurenjie1024
left a comment
There was a problem hiding this comment.
Thanks @CTTY for this pr!
| } | ||
|
|
||
| /// Checks if a partition key is effectively none. | ||
| pub fn partition_key_is_none(partition_key: Option<&PartitionKey>) -> bool { |
There was a problem hiding this comment.
Why not put it in PartitionKey?
There was a problem hiding this comment.
Does this have to be pub?
There was a problem hiding this comment.
I think putting this in PartitionKey is a good idea. Currently I have added a PartitionKeyExt trait to have this function work on Option<&PartitionKey> directly.
partition_key_is_none may be used in custom implementations of location generators or used along with some custom writers, so I think it's better to leave it pub
| let location = default_location_gen.generate_location(Some(partition_key), file_name); | ||
| assert_eq!( | ||
| location, | ||
| "s3://data.db/table/data/id=42/name=\"alice\"/data-00000.parquet" |
There was a problem hiding this comment.
THis is incorrect, it should be url encoded.
There was a problem hiding this comment.
Good point! I didn't notice that the implementation of Transform::to_human_string is different from Java. It turns out the Display implementation for Datum escaped the value for string via double quotes here, and partition_path_to_path uses Transform::to_human_string -> Datum.to_string() to get a human readable string.
Do you happen to know why this is the case?
My current solution is to add a new helper to_human_string under Datum to explicitly not add double quotes
There was a problem hiding this comment.
Do you happen to know why this is the case?
I'm not that clear about it, maybe it's an arbitrary decision.
| } | ||
|
|
||
| /// Extension to help check if a partition key is effectively none. | ||
| pub trait PartitionKeyExt { |
There was a problem hiding this comment.
This is a little over design. What I mean was just like following:
impl PartitionKey {
fn is_effectively_none(input: Option<&PartitionKey>) -> bool {
...
}
}
| impl LocationGenerator for MockLocationGenerator { | ||
| fn generate_location(&self, file_name: &str) -> String { | ||
| format!("{}/{}", self.root, file_name) | ||
| fn generate_location(&self, partition: Option<&PartitionKey>, file_name: &str) -> String { |
There was a problem hiding this comment.
Do we really need this MockLocationGenerator? Why not just add a method for DefaultLocationGenerator such as with_data_location so that all the tests cover using it?
There was a problem hiding this comment.
I had the same thought as well, not sure why we even need a mock location gen. let me give it a try
There was a problem hiding this comment.
It turns out there are 20+ usages of Mock loc gen across the code base, and I think it may be better to have a separate PR, I can work on it after this
created an issue: #1645
… in location generator
290a530 to
724aeeb
Compare
liurenjie1024
left a comment
There was a problem hiding this comment.
Thanks @CTTY for this pr!
Which issue does this PR close?
What changes are included in this PR?
PartitionKeyto hold necessary partition-related info for path generationLocationGenerator::generate_locationto accept an optional partition keyParquetWritertake in anOption<PartitionKey>Are these changes tested?
Added ut