Harden BigQuery ML tools against SQL injection#5251
Open
petrmarinec wants to merge 1 commit intogoogle:mainfrom
Open
Harden BigQuery ML tools against SQL injection#5251petrmarinec wants to merge 1 commit intogoogle:mainfrom
petrmarinec wants to merge 1 commit intogoogle:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
89c2b7c to
86db909
Compare
86db909 to
f928c91
Compare
Collaborator
|
Response from ADK Triaging Agent - Security Review Complete. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This change hardens the BigQuery ML helper tools exposed by
BigQueryToolsetagainst SQL injection.forecast,analyze_contribution, anddetect_anomaliespreviously constructed SQL with direct string interpolation of tool inputs. Because these helpers are exposed through the toolset, attacker-influenced inputs could be propagated into generated SQL, including inORDER BYclauses and model option values.What changed
forecast,analyze_contribution, anddetect_anomaliesinstead of interpolating them into SQL strings.history_data,input_data, andtarget_datato valid BigQuery table IDs instead of accepting raw SQL statements.detect_anomaliesbefore placing them inORDER BY.Why this approach
BigQuery supports parameterized queries natively, which is the correct defense for literal values controlled by tool inputs.
Table references and
ORDER BYidentifiers cannot be passed as query parameters, so those inputs are now validated as BigQuery table IDs or field paths before use.Compatibility
This is an intentional hardening change: these ML helper tools no longer accept raw SQL statements as data sources.
Callers that need arbitrary SQL should use
execute_sqldirectly or materialize a view or table first, then pass that table ID to the ML helper.Validation
Verified in a fresh Linux environment with test dependencies installed via
pip install -e '.[test]':pytest tests/unittests/tools/bigquery/test_bigquery_query_tool.py -q95 passedAlso re-ran the offline regression scenario used to confirm the original issue and verified that the generated SQL no longer contains the previously attacker-controlled fragments.
This PR also adds unit regressions for those cases so future changes do not reintroduce the same issue.