Skip to content

Commit a13c23d

Browse files
authored
Remove CastColumnExpr and custom_file_casts example; unify on field-aware CastExpr (#21563)
## Which issue does this PR close? * Closes #20164 --- ## Rationale for this change This PR completes the cast unification effort by removing the remaining public and example surface of `CastColumnExpr`. Keeping both `CastExpr` and `CastColumnExpr` introduced duplication in casting logic, increased maintenance overhead, and made it easier for downstream code to reintroduce divergent behavior. With the introduction of field-aware `CastExpr` (via `new_with_target_field`), all functionality previously covered by `CastColumnExpr` can now be expressed through a single, consistent abstraction. This simplifies the mental model for casting, ensures consistent behavior across execution paths, and aligns documentation and examples with the unified design. Additionally, this change ensures that test coverage and examples reflect the final architecture, reducing confusion for contributors and users. --- ## What changes are included in this PR? * **Removal of `CastColumnExpr`** * Deleted `datafusion/physical-expr/src/expressions/cast_column.rs`. * Removed module declaration and public re-export from `expressions/mod.rs`. * **Migration to field-aware `CastExpr`** * Updated code paths to rely exclusively on `CastExpr`. * Replaced usages of `data_type(&schema)` with `target_field().data_type()` where appropriate. * **Example updates** * Updated `custom_file_casts` example to only handle `CastExpr`. * Removed branching logic that handled both `CastExpr` and `CastColumnExpr`. * Updated CLI usage string to reflect current examples. * **Test consolidation and improvements** * Migrated struct, nested struct, and scalar struct cast tests into `cast.rs`. * Added comprehensive field-aware cast tests covering: * Target field metadata preservation * Nullability semantics * Struct casting with missing fields * Nested struct casting * Struct scalar casting * **Refactoring and cleanup** * Renamed helper functions for clarity (e.g., `assert_cast_column` → `assert_cast_input_column`). * Simplified adapter logic to operate on a single cast expression type. * **Documentation updates** * Added upgrade note documenting removal of `CastColumnExpr` and migration guidance. --- ## Are these changes tested? Yes. * Existing tests were updated to reflect the unified cast model. * Coverage from `cast_column.rs` was preserved and migrated into `cast.rs`. * New tests validate: * Field-aware `CastExpr` semantics * Struct and nested struct casting behavior * Scalar struct casting * Metadata and nullability propagation * Adapter and schema rewriting tests were updated and remain green. --- ## Are there any user-facing changes? Yes. * **Breaking change**: `CastColumnExpr` has been removed from the public API. * Users must migrate to `CastExpr`: * Use `CastExpr::new_with_target_field(...)` for field-aware casting. * Access output metadata via `target_field()`. * Access input expressions via `expr()`. * Documentation has been updated with migration guidance. --- ## LLM-generated code disclosure This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested. ---
1 parent 41dd942 commit a13c23d

7 files changed

Lines changed: 211 additions & 507 deletions

File tree

datafusion-examples/examples/custom_data_source/custom_file_casts.rs

Lines changed: 14 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -33,7 +33,7 @@ use datafusion::execution::context::SessionContext;
3333
use datafusion::execution::object_store::ObjectStoreUrl;
3434
use datafusion::parquet::arrow::ArrowWriter;
3535
use datafusion::physical_expr::PhysicalExpr;
36-
use datafusion::physical_expr::expressions::{CastColumnExpr, CastExpr};
36+
use datafusion::physical_expr::expressions::CastExpr;
3737
use datafusion::prelude::SessionConfig;
3838
use datafusion_physical_expr_adapter::{
3939
DefaultPhysicalExprAdapterFactory, PhysicalExprAdapter, PhysicalExprAdapterFactory,
@@ -43,9 +43,10 @@ use object_store::path::Path;
4343
use object_store::{ObjectStore, ObjectStoreExt, PutPayload};
4444

4545
// Example showing how to implement custom casting rules to adapt file schemas.
46-
// This example enforces that casts must be strictly widening: if the file type is Int64 and the table type is Int32, it will error
47-
// before even reading the data.
48-
// Without this custom cast rule DataFusion would happily do the narrowing cast, potentially erroring only if it found a row with data it could not cast.
46+
// This example enforces strictly widening casts: if the file type is Int64 and
47+
// the table type is Int32, it errors before reading the data. Without this
48+
// custom cast rule DataFusion would apply the narrowing cast and might only
49+
// error after reading a row that it could not cast.
4950
pub async fn custom_file_casts() -> Result<()> {
5051
println!("=== Creating example data ===");
5152

@@ -139,7 +140,7 @@ async fn write_data(
139140
Ok(())
140141
}
141142

142-
/// Factory for creating DefaultValuePhysicalExprAdapter instances
143+
/// Factory for creating custom cast physical expression adapters
143144
#[derive(Debug)]
144145
struct CustomCastPhysicalExprAdapterFactory {
145146
inner: Arc<dyn PhysicalExprAdapterFactory>,
@@ -167,8 +168,8 @@ impl PhysicalExprAdapterFactory for CustomCastPhysicalExprAdapterFactory {
167168
}
168169
}
169170

170-
/// Custom PhysicalExprAdapter that handles missing columns with default values from metadata
171-
/// and wraps DefaultPhysicalExprAdapter for standard schema adaptation
171+
/// Custom `PhysicalExprAdapter` that wraps the default adapter and rejects
172+
/// narrowing file-schema casts.
172173
#[derive(Debug, Clone)]
173174
struct CustomCastsPhysicalExprAdapter {
174175
physical_file_schema: SchemaRef,
@@ -177,34 +178,23 @@ struct CustomCastsPhysicalExprAdapter {
177178

178179
impl PhysicalExprAdapter for CustomCastsPhysicalExprAdapter {
179180
fn rewrite(&self, mut expr: Arc<dyn PhysicalExpr>) -> Result<Arc<dyn PhysicalExpr>> {
180-
// First delegate to the inner adapter to handle missing columns and discover any necessary casts
181+
// First delegate to the inner adapter to handle standard schema adaptation
182+
// and discover any necessary casts.
181183
expr = self.inner.rewrite(expr)?;
182-
// Now we can apply custom casting rules or even swap out all CastExprs for a custom cast kernel / expression
183-
// For example, [DataFusion Comet](https://github.com/apache/datafusion-comet) has a [custom cast kernel](https://github.com/apache/datafusion-comet/blob/b4ac876ab420ed403ac7fc8e1b29f42f1f442566/native/spark-expr/src/conversion_funcs/cast.rs#L133-L138).
184+
// Now apply custom casting rules or swap CastExprs for a custom cast
185+
// kernel / expression. For example, DataFusion Comet has a custom cast
186+
// kernel in its native Spark expression implementation.
184187
expr.transform(|expr| {
185188
if let Some(cast) = expr.as_any().downcast_ref::<CastExpr>() {
186189
let input_data_type =
187190
cast.expr().data_type(&self.physical_file_schema)?;
188-
let output_data_type = cast.data_type(&self.physical_file_schema)?;
191+
let output_data_type = cast.target_field().data_type();
189192
if !cast.is_bigger_cast(&input_data_type) {
190193
return not_impl_err!(
191194
"Unsupported CAST from {input_data_type} to {output_data_type}"
192195
);
193196
}
194197
}
195-
if let Some(cast) = expr.as_any().downcast_ref::<CastColumnExpr>() {
196-
let input_data_type =
197-
cast.expr().data_type(&self.physical_file_schema)?;
198-
let output_data_type = cast.data_type(&self.physical_file_schema)?;
199-
if !CastExpr::check_bigger_cast(
200-
cast.target_field().data_type(),
201-
&input_data_type,
202-
) {
203-
return not_impl_err!(
204-
"Unsupported CAST from {input_data_type} to {output_data_type}"
205-
);
206-
}
207-
}
208198
Ok(Transformed::no(expr))
209199
})
210200
.data()

datafusion-examples/examples/custom_data_source/main.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@
2121
//!
2222
//! ## Usage
2323
//! ```bash
24-
//! cargo run --example custom_data_source -- [all|csv_json_opener|csv_sql_streaming|custom_datasource|custom_file_casts|custom_file_format|default_column_values|file_stream_provider]
24+
//! cargo run --example custom_data_source -- [all|adapter_serialization|csv_json_opener|csv_sql_streaming|custom_datasource|custom_file_casts|custom_file_format|default_column_values|file_stream_provider]
2525
//! ```
2626
//!
2727
//! Each subcommand runs a corresponding example:

datafusion/physical-expr-adapter/src/schema_rewriter.rs

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -644,7 +644,7 @@ mod tests {
644644
.expect("Expected CastExpr")
645645
}
646646

647-
fn assert_cast_column(cast_expr: &CastExpr, name: &str, index: usize) {
647+
fn assert_cast_input_column(cast_expr: &CastExpr, name: &str, index: usize) {
648648
let inner_col = cast_expr
649649
.expr()
650650
.as_any()
@@ -771,7 +771,7 @@ mod tests {
771771

772772
let left_cast = assert_cast_expr(left.left());
773773
assert_eq!(left_cast.target_field().data_type(), &DataType::Int64);
774-
assert_cast_column(left_cast, "a", 0);
774+
assert_cast_input_column(left_cast, "a", 0);
775775

776776
let right = outer
777777
.right()
@@ -1672,7 +1672,7 @@ mod tests {
16721672
let cast_expr = assert_cast_expr(&result);
16731673

16741674
// Verify the inner column points to the correct physical index (1)
1675-
assert_cast_column(cast_expr, "a", 1);
1675+
assert_cast_input_column(cast_expr, "a", 1);
16761676

16771677
// Verify cast types
16781678
assert_eq!(
@@ -1692,7 +1692,7 @@ mod tests {
16921692
// Regression: this must still resolve against physical field `a` by name.
16931693
let rewritten = adapter.rewrite(Arc::new(Column::new("a", 0))).unwrap();
16941694
let cast_expr = assert_cast_expr(&rewritten);
1695-
assert_cast_column(cast_expr, "a", 1);
1695+
assert_cast_input_column(cast_expr, "a", 1);
16961696
assert_eq!(cast_expr.target_field().data_type(), &DataType::Int64);
16971697
}
16981698
}

0 commit comments

Comments
 (0)