
feat(bigquery): add typed handling for AI scalar functions#7479

Merged
georgesittas merged 1 commit into tobymao:main from RedZapdos123:enhancement/bigquery-ai-scalar-typed-handling on Apr 15, 2026


RedZapdos123 (Contributor) commented on Apr 9, 2026

Description:

  • add typed expression nodes for BigQuery AI scalar calls:
    • AI.EMBED -> AIEmbed
    • AI.SIMILARITY -> AISimilarity
    • AI.GENERATE -> AIGenerate
  • keep SQL output backward compatible through explicit SQL names (EMBED, SIMILARITY, GENERATE)
  • preserve generic behavior for unqualified scalar names; the typed conversion applies only to dotted AI.* calls, in the BigQuery parser's column-operator handling
  • add BigQuery dialect tests asserting both identity round-trip and typed AST nodes
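The parsing rule described above (typed nodes only for dotted AI.* calls, generic nodes otherwise, SQL text unchanged) can be modeled in a few lines. This is a hypothetical, stdlib-only sketch, not sqlglot's parser: `Func`, `build_call`, and `AI_SCALARS` are illustrative names, and only the three node class names come from the PR.

```python
# Hypothetical, stdlib-only model of the behavior described above; it is not
# sqlglot's parser. Only dotted AI.<name> calls become typed nodes, and each
# node remembers the name as written so the SQL round-trips unchanged.
from dataclasses import dataclass, field


@dataclass
class Func:
    """Generic (untyped) function call node."""

    name: str
    args: list = field(default_factory=list)

    def sql(self) -> str:
        return f"{self.name}({', '.join(self.args)})"


class AIEmbed(Func):
    pass


class AISimilarity(Func):
    pass


class AIGenerate(Func):
    pass


# Assumed lookup table: unqualified AI.* name -> typed node class.
AI_SCALARS = {"EMBED": AIEmbed, "SIMILARITY": AISimilarity, "GENERATE": AIGenerate}


def build_call(qualifier, name, args):
    """Return a typed node for AI.<name>; any other call stays generic."""
    written = f"{qualifier}.{name}" if qualifier else name
    if qualifier and qualifier.upper() == "AI":
        node_cls = AI_SCALARS.get(name.upper())
        if node_cls is not None:
            return node_cls(written, list(args))
    return Func(written, list(args))


typed = build_call("AI", "EMBED", ["'hello'", "endpoint => 'my_endpoint'"])
plain = build_call(None, "EMBED", ["'hello'"])
print(type(typed).__name__, typed.sql())  # AIEmbed AI.EMBED('hello', endpoint => 'my_endpoint')
print(type(plain).__name__, plain.sql())  # Func EMBED('hello')
```

Keeping the written name on the node is what lets the typed AST and the identity round-trip coexist in this toy.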

Closes #7478.

Validation:

  • python -m pytest -q tests/dialects/test_bigquery.py -k ml_functions -vv
  • python -m pytest -q tests/dialects/test_bigquery.py -vv
  • python -m pytest -q

Results:

  • focused BigQuery ml_functions: passed.
  • full BigQuery dialect module: passed (58 passed).
  • full suite: passed (1076 passed).

Validation screenshots of the local test runs on WSL were attached to the PR.

Additionally, I verified that parsing AI.* dot calls directly now produces typed function nodes while preserving the round-trip SQL.

Signed-off-by: Mridankan Mandal <xerontitan90@gmail.com>
georgesittas self-assigned this on Apr 14, 2026


class AIEmbed(Expression, Func):
    arg_types = {"expressions": False}
Collaborator review comment:

I think this needs to be True, as at least one argument is expected. The same holds for AISimilarity and AIGenerate.
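To illustrate why the flag matters, here is a simplified, hypothetical model of an `arg_types` table in which `True` marks a required argument and `False` an optional one. It is not sqlglot's `Expression` class, and the error text and the two class names are made up for illustration.

```python
# Simplified, hypothetical model of an arg_types table: True marks a required
# argument, False an optional one. It is not sqlglot's Expression class, and
# the error text is made up for illustration.
class Func:
    arg_types: dict = {}

    def __init__(self, **args) -> None:
        self.args = args

    def error_messages(self) -> list:
        """List required arguments that are missing or empty."""
        return [
            f"Required keyword: '{key}' missing for {type(self).__name__}"
            for key, required in self.arg_types.items()
            if required and not self.args.get(key)
        ]


class AIEmbedAsSubmitted(Func):
    # As in the PR: an AI.EMBED() call with no arguments would validate.
    arg_types = {"expressions": False}


class AIEmbedAsSuggested(Func):
    # As suggested in review: at least one argument is expected.
    arg_types = {"expressions": True}


print(AIEmbedAsSubmitted(expressions=[]).error_messages())  # []
print(AIEmbedAsSuggested(expressions=[]).error_messages())  # ["Required keyword: 'expressions' missing for AIEmbedAsSuggested"]
print(AIEmbedAsSuggested(expressions=["'hello'"]).error_messages())  # []
```

With the optional flag, an argument-less call slips through validation; marking it required surfaces the mistake.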

class AIEmbed(Expression, Func):
    arg_types = {"expressions": False}
    is_var_len_args = True
    _sql_names = ["EMBED"]
Collaborator review comment:

Why did you add _sql_names here and in the other two AST nodes? I don't think we want these. You should instead override the nodes' generators in BigQuery and map them to different names using rename_func.
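A minimal sketch of the suggested pattern, assuming a generator that dispatches on node class through a transforms table: the nodes carry no SQL name, and the BigQuery generator maps each class to a renaming transform. This stdlib-only toy mirrors the shape of a `TRANSFORMS` table plus `rename_func`, but it is not sqlglot's implementation. It also assumes the typed node stands for the whole dotted call, so the transform emits the `AI.` prefix itself; in the real parser that depends on how the tree is built.

```python
# Hypothetical, stdlib-only model of the review suggestion: the nodes carry no
# SQL name; instead the dialect generator maps each node class to a renaming
# transform. This mirrors the shape of a TRANSFORMS table plus rename_func,
# but it is not sqlglot's implementation.
from typing import Callable


class Node:
    def __init__(self, *args: str) -> None:
        self.args = list(args)


class AIEmbed(Node):
    pass


class AISimilarity(Node):
    pass


class AIGenerate(Node):
    pass


def rename_func(name: str) -> Callable[..., str]:
    """Return a transform that renders a node as name(arg, ...)."""

    def _render(generator: "Generator", node: Node) -> str:
        return f"{name}({', '.join(node.args)})"

    return _render


class Generator:
    # Base dialect: no overrides, so names are derived from the class name.
    TRANSFORMS: dict = {}

    def sql(self, node: Node) -> str:
        transform = self.TRANSFORMS.get(type(node))
        if transform is not None:
            return transform(self, node)
        return f"{type(node).__name__.upper()}({', '.join(node.args)})"


class BigQueryGenerator(Generator):
    # Assumption: the typed node stands for the whole dotted call, so the
    # transform emits the AI. prefix itself.
    TRANSFORMS = {
        **Generator.TRANSFORMS,
        AIEmbed: rename_func("AI.EMBED"),
        AISimilarity: rename_func("AI.SIMILARITY"),
        AIGenerate: rename_func("AI.GENERATE"),
    }


node = AIEmbed("'hello'", "endpoint => 'my_endpoint'")
print(BigQueryGenerator().sql(node))  # AI.EMBED('hello', endpoint => 'my_endpoint')
print(Generator().sql(node))  # AIEMBED('hello', endpoint => 'my_endpoint')
```

The design point is that the dialect-specific spelling lives in the dialect's generator, so other dialects are unaffected by BigQuery's naming.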

Comment on lines +734 to +738
ai_scalars: dict[str, type[exp.Func]] = {
    "EMBED": exp.AIEmbed,
    "SIMILARITY": exp.AISimilarity,
    "GENERATE": exp.AIGenerate,
}
Collaborator review comment:

Is this necessary if you have proper function parsers (e.g., using FUNCTIONS) for EMBED et al? Check if there's an overlap between these and non-AI-prefixed functions in BigQuery and if that's the case, then try my suggestion to see if we can simplify this.

As a side-note, we generally don't define constants like this mapping inline, but instead "bubble them up" in the parser class.
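The two suggestions in this comment can be sketched together: keep the mapping as a class-level constant on the parser and register it in a `FUNCTIONS`-style table, after checking that no name collides with an existing function. This is a hypothetical, stdlib-only toy. `Node`, `Anonymous`, `Upper`, `Parser`, `BigQueryParser`, and `build_function` are illustrative stand-ins, not sqlglot's classes.

```python
# Hypothetical, stdlib-only sketch of the review suggestion: keep the AI.*
# mapping as a class-level constant ("bubbled up") and register it in the
# parser's FUNCTIONS table, instead of building a dict inline at parse time.
# Class names mirror the PR; everything else is illustrative, not sqlglot.


class Node:
    def __init__(self, name: str, args: list) -> None:
        self.name = name
        self.args = list(args)


class Anonymous(Node):
    """Fallback for functions the parser has no typed node for."""


class Upper(Node):
    pass


class AIEmbed(Node):
    pass


class AISimilarity(Node):
    pass


class AIGenerate(Node):
    pass


class Parser:
    # Base table: upper-cased function name -> node class.
    FUNCTIONS: dict = {"UPPER": Upper}

    def build_function(self, name: str, args: list) -> Node:
        node_cls = self.FUNCTIONS.get(name.upper(), Anonymous)
        return node_cls(name, args)


class BigQueryParser(Parser):
    # Defined once on the class, so subclasses can extend or override it.
    AI_SCALAR_FUNCTIONS = {
        "EMBED": AIEmbed,
        "SIMILARITY": AISimilarity,
        "GENERATE": AIGenerate,
    }
    FUNCTIONS = {**Parser.FUNCTIONS, **AI_SCALAR_FUNCTIONS}


# The FUNCTIONS route is only safe if no AI.* name collides with an
# existing (non-AI) function in the dialect.
overlap = BigQueryParser.AI_SCALAR_FUNCTIONS.keys() & Parser.FUNCTIONS.keys()
print(sorted(overlap))  # []
print(type(BigQueryParser().build_function("embed", ["'hello'"])).__name__)  # AIEmbed
```

One trade-off to weigh: registering the names this way also types unqualified `EMBED(...)` calls, whereas the PR description aims to keep unqualified names generic.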

Collaborator review comment:

We should increase the test coverage a bit by including more validate_identity tests that cover more args. For example, the EMBED function has the following syntax:

AI.EMBED(
  [ content => ] 'content',
  { endpoint => 'endpoint' | model => 'model' }
  [, task_type => 'task_type']
  [, title => 'title']
  [, model_params => model_params]
  [, connection_id => 'connection']
)

Ideally, we want some representative tests with more arguments.
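One way to produce such representative cases is to enumerate argument combinations from the quoted signature. This is a hypothetical helper: the argument values are placeholders, `model_params` is left out because its value type is not shown in the quoted syntax, and it assumes the generator reproduces these strings verbatim.

```python
# Hypothetical helper that enumerates representative AI.EMBED calls for
# identity tests, following the signature quoted above. All argument values
# are placeholders; model_params is left out because its value type is not
# shown in the quoted syntax.
from itertools import combinations

OPTIONAL_ARGS = [
    ("task_type", "'my_task_type'"),
    ("title", "'my_title'"),
    ("connection_id", "'my_connection'"),
]


def embed_call(source: str, extras=(), named_content: bool = True) -> str:
    """Build one AI.EMBED call: content, exactly one of endpoint/model, extras."""
    content = "content => 'hello'" if named_content else "'hello'"
    parts = [content, source] + [f"{key} => {value}" for key, value in extras]
    return f"SELECT AI.EMBED({', '.join(parts)})"


def representative_cases() -> list:
    cases = []
    for source in ("endpoint => 'my_endpoint'", "model => 'my_model'"):
        cases.append(embed_call(source))  # minimal form
        cases.append(embed_call(source, OPTIONAL_ARGS))  # every optional arg
    for subset in combinations(OPTIONAL_ARGS, 2):  # partial subsets
        cases.append(embed_call("endpoint => 'my_endpoint'", subset))
    cases.append(embed_call("model => 'my_model'", named_content=False))  # positional content
    return cases


for sql in representative_cases():
    print(sql)
    # Inside sqlglot's BigQuery test class, each string would instead be passed to
    # self.validate_identity(sql), assuming the generator reproduces it verbatim.
```

The endpoint/model alternatives are kept mutually exclusive in every case, matching the `{ endpoint => ... | model => ... }` choice in the signature.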

georgesittas (Collaborator) commented:

I'll get this in and take it to the finish line, thanks.

georgesittas merged commit ec516f2 into tobymao:main on Apr 15, 2026
12 checks passed
georgesittas (Collaborator) commented:

Check out a1cea51.


Linked issue: Enhancement: richer typed handling for BigQuery AI.* scalar functions
