Update datafusion dependency to latest in preparation for DF54 #1532
Draft
timsaucer wants to merge 12 commits into
Bump workspace deps to apache/datafusion@3d06bedc (git pin) in preparation for the 54.0.0 release. The workspace package version moves to 54.0.0 to track the upstream major-version convention.
Compile fixes:
- Drop as_any impls (the trait now has Any as a supertrait) and use the upstream-provided downcast_ref helper on dyn trait objects.
- Reconcile FFI provider From conversions to drop the redundant `+ Send` on Arc<dyn ...> bounds.
- Cast/TryCast: data_type → field.data_type() (FieldRef rename).
- Stub match arms for the new Expr::HigherOrderFunction / Lambda / LambdaVariable and ScalarValue::ListView / LargeListView variants; proper exposure is deferred to the PR 3 audit.
- DatasetExec: partition_statistics returns Arc<Statistics>; add the required apply_expressions trait method.
- Suppress the TableFunctionImpl::call deprecation pending a call_with_args refactor that needs Session plumbing.
User-facing test updates for upstream behavior changes:
- median / approx_median / approx_percentile_cont now return Float64.
- String functions (concat_ws, lower, upper, repeat, reverse, split_part, translate) return StringView when given StringView.
- overlay appends past end-of-string rather than replacing the input.
- arrays_zip / list_zip struct field names "c0"/"c1" → "1"/"2".
- Filter on mismatched cast types now errors (was 0 matches).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to the upstream DataFusion 53 → main bump. The check-upstream audit (PR 3 of dev/release/upstream-sync.md) surfaced a small set of trivial wins; this commit ships them.
Trivial wins:
- DataFrame.alias(name): wraps the logical plan in a SubqueryAlias.
- functions.__all__: add `instr` and `position` (both were defined as public defs but missing from `__all__`, so they didn't show up in `from datafusion.functions import *` or in the generated docs).
- Top-level `datafusion.__all__`: re-export `TableProviderFactory` and `TableProviderFactoryExportable` (previously only reachable via the `datafusion.catalog` submodule).
Non-trivial gaps surfaced by the audit (DataFrame.registry, into_*/task_ctx, SessionContext extensibility surface, distinct-aware aggregate variants, TableFunctionImpl::call_with_args migration, FFI Protocol pipeline gaps) are deferred; each warrants its own design and PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
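Why a public def can be invisible to `from datafusion.functions import *`: when a module defines `__all__`, a star import binds only the names listed there. The sketch below reproduces that with a throwaway in-memory module; the module name `demo_functions` and the placeholder function bodies are hypothetical and stand in for the real `datafusion.functions`.

```python
import sys
import types

# Hypothetical stand-in module that mimics the pre-fix state: instr and
# position are public defs, but only lower is listed in __all__.
mod = types.ModuleType("demo_functions")
exec(
    "def lower(*args): pass\n"
    "def instr(*args): pass\n"
    "def position(*args): pass\n"
    "__all__ = ['lower']\n",
    mod.__dict__,
)
# Register it so the import statement can find it without a file on disk.
sys.modules["demo_functions"] = mod


def star_import_names(module_name):
    """Return the names that `from <module_name> import *` would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    # exec() inserts __builtins__ into the namespace; ignore it.
    return sorted(name for name in namespace if name != "__builtins__")


before = star_import_names("demo_functions")  # only what __all__ lists

# The fix: list the public functions in __all__ as well.
mod.__all__ = ["lower", "instr", "position"]
after = star_import_names("demo_functions")
```

Here `before` contains only `lower`, while `after` also includes `instr` and `position`. Attribute access (`demo_functions.instr`) works in both cases; only star imports and tools that read `__all__` are affected.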
Prior example called alias("t") then to_pydict(), which did not show
the qualifier effect. Replace with a self-join that uses col("l.val")
and col("r.val") so the disambiguation behavior is visible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DataFusion 54 introduces Expr::HigherOrderFunction, Expr::Lambda, and Expr::LambdaVariable. PyExpr::to_variant previously errored on each with py_unsupported_variant_err. Add PyHigherOrderFunction, PyLambda, and PyLambdaVariable wrappers, register them in the expr pymodule and re-export from python/datafusion/expr.py, and dispatch to_variant to the new wrappers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Map HigherOrderFunction and Lambda to RexType::Call; LambdaVariable to RexType::Reference. In rex_call_operands return the args for HigherOrderFunction, the body for Lambda, and self for LambdaVariable (mirroring Column). In rex_call_operator return the underlying UDF name for HigherOrderFunction and the literal "lambda" for Lambda. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…arrow These ScalarValue variants all wrap Arc<...Array>, exposing the outer DataType via Array::data_type(), so we can mirror the existing ScalarValue::List arm instead of returning PyNotImplementedError. This makes Expr.types() work for plans that round-trip through SQL or proto where these scalar variants surface. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DataFusion 53.0.0 deprecated TableFunctionImpl::call in favor of call_with_args(args: TableFunctionArgs), which threads a Session reference alongside the exprs. Implement call_with_args on PyTableFunction (delegating to the FFI variant's call_with_args, or ignoring the session for the pure-Python variant which doesn't use it) and have __call__ build a TableFunctionArgs from the global session. Drops both #[allow(deprecated)] attributes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…atch.crates-io] The workspace version was prematurely bumped to 54.0.0 in the DF53→pre-54 upgrade. Restore it to 53.0.0 until we are actually ready to cut the 54 release. The same change had moved every datafusion-* dependency from a crates.io version constraint to a direct git dep in [workspace.dependencies]. Switch them back to `version = "53"` and move the git rev overrides into [patch.crates-io] so the published manifest will be patch-free. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Multi-partition `collect()` returns batches in execution-scheduling order, which is non-deterministic and differs between local and CI runners. Sort by the first value of column 0 (unique per partition in each affected test) so the expected/actual comparison is stable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
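The stabilization trick is just a sort keyed on each batch's first value in column 0. A pure-Python sketch, assuming batches are modeled as lists of columns (each column a list of values) rather than real pyarrow `RecordBatch` objects; with pyarrow the key would likely be `batch.column(0)[0].as_py()` instead of `batch[0][0]`.

```python
def sort_batches(batches):
    """Order batches by the first value of their first column.

    Each batch is modeled as a list of columns, each column a list of values.
    The result is only deterministic when that first value is unique per
    batch, which is the property the affected tests rely on.
    """
    return sorted(batches, key=lambda batch: batch[0][0])


# Two partitions may finish in either order on different runners.
run_a = [[[4, 5, 6], ["d", "e", "f"]], [[1, 2, 3], ["a", "b", "c"]]]
run_b = list(reversed(run_a))
```

Both `sort_batches(run_a)` and `sort_batches(run_b)` yield the batch starting with `1` first, so an expected/actual comparison no longer depends on scheduling order.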
Which issue does this PR close?
Related to #1533
Rationale for this change
We are updating the upstream DataFusion dependency now so that we can cut the 54 release soon after upstream DataFusion 54 is published.
What changes are included in this PR?
Updated all datafusion dependencies to `main` at commit `3d06bedcc9afbd65781ac1de28741c36140d2cbb`.
Python test fixes (23 expectations) for upstream behavior changes:
- `median`/`approx_median`/`approx_percentile_cont` return `Float64` (was matching input type).
- String functions (`concat_ws`, `lower`, `upper`, `repeat`, `reverse`, `split_part`, `translate`) return `StringView` for `StringView` input (was `String`).
- `overlay` appends past end-of-string rather than replacing.
- `arrays_zip`/`list_zip` struct field names changed from `c0`/`c1` to `"1"`/`"2"`.
`check-upstream` audit trivial wins:
- `DataFrame.alias(name)`: wraps the logical plan in a `SubqueryAlias` for self-joins and qualifier-style references.
- `functions.__all__`: add `instr` and `position` (both already defined as public defs but missing from `__all__`).
- `datafusion.__all__`: re-export `TableProviderFactory` and `TableProviderFactoryExportable` (previously reachable only via the `datafusion.catalog` submodule).
Are there any user-facing changes?
Yes. Several behavior changes are inherited from upstream DataFusion 54 (warrants the `api change` label):
- `median`/`approx_median`/`approx_percentile_cont` now return `Float64` rather than matching the input type.
- String functions return `StringView` when fed `StringView` input (`concat_ws`, `lower`, `upper`, `repeat`, `reverse`, `split_part`, `translate`).
- `overlay` semantics: passing a start position past the end of a string now appends the replacement, e.g. `overlay("!", "--", 2) → "!--"` (was `"--"`).
- `arrays_zip`/`list_zip` field names changed: `c0`/`c1` → `"1"`/`"2"`.
- Filtering on mismatched cast types now raises a `Cannot cast string` error, where previously it silently produced zero matches.
Additionally, `DataFrame.alias(name)` is new, `instr` and `position` now appear under `from datafusion.functions import *`, and `TableProviderFactory` and `TableProviderFactoryExportable` are now reachable from the top-level `datafusion` namespace.
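The new `overlay` result lines up with the usual SQL definition of `OVERLAY(string PLACING substring FROM start [FOR length])`: keep everything before `start`, insert the substring, then keep everything after the replaced span. A pure-Python model of that definition follows; it is a sketch for reasoning about the `overlay("!", "--", 2)` example, not DataFusion's implementation, and it deliberately rejects edge cases (start below 1, negative length) instead of modeling them.

```python
def overlay(string, substring, start, length=None):
    """Model of SQL OVERLAY(string PLACING substring FROM start [FOR length]).

    start is 1-based; length defaults to len(substring).
    """
    if start < 1:
        raise ValueError("start must be >= 1 in this model")
    if length is None:
        length = len(substring)
    if length < 0:
        raise ValueError("length must be >= 0 in this model")
    prefix = string[: start - 1]           # characters before the start position
    suffix = string[start - 1 + length :]  # characters after the replaced span
    return prefix + substring + suffix


print(overlay("!", "--", 2))             # start is past the end, so it appends
print(overlay("Txxxxas", "hom", 2, 4))   # replaces the four "x" characters
```

Because the prefix keeps the whole input when `start` is past the end and the suffix slice is then empty, the replacement is appended; under this definition there is no way to get the old `"--"` result.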