feat(bigquery): add typed handling for AI scalar functions #7479
Signed-off-by: Mridankan Mandal <xerontitan90@gmail.com>
class AIEmbed(Expression, Func):
    arg_types = {"expressions": False}
I think this needs to be True, as at least one argument is expected. The same holds for AISimilarity and AIGenerate.
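To illustrate the point above, here is a minimal sketch of why the flag matters, assuming that a True value in arg_types marks an argument as required during validation. `ToyFunc`, `OptionalArgs`, and `RequiredArgs` are hypothetical stand-ins written for this example, not sqlglot classes.

```python
# Simplified stand-in for an AST node; this is NOT sqlglot's Expression class.
class ToyFunc:
    # Maps argument name -> whether it is required (True) or optional (False).
    arg_types = {"expressions": False}

    def __init__(self, **args):
        self.args = args

    def missing_required(self):
        """Return the names of required arguments that are absent or empty."""
        missing = []
        for name, required in self.arg_types.items():
            value = self.args.get(name)
            is_empty = value is None or (isinstance(value, list) and not value)
            if required and is_empty:
                missing.append(name)
        return missing


class OptionalArgs(ToyFunc):
    arg_types = {"expressions": False}


class RequiredArgs(ToyFunc):
    arg_types = {"expressions": True}


# With False, a call that carries no arguments is not flagged.
print(OptionalArgs(expressions=[]).missing_required())
# With True, the same empty call is reported as missing "expressions".
print(RequiredArgs(expressions=[]).missing_required())
```

Under that assumption, leaving the flag as False would let a zero-argument call through validation without any error.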
class AIEmbed(Expression, Func):
    arg_types = {"expressions": False}
    is_var_len_args = True
    _sql_names = ["EMBED"]
Why did you add _sql_names here and in the other two AST nodes? I don't think we want these. You should instead override the nodes' generators in BigQuery and map them to different names using rename_func.
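A rough sketch of the pattern being suggested, where the dialect's generator rather than the AST node decides the output name. Everything here is a toy reimplementation: only the name `rename_func` comes from the comment above, its signature is simplified, and the target name "EMBED" is an assumption taken from the `_sql_names` value in the diff.

```python
# Toy stand-ins; these are NOT sqlglot classes.
class AIEmbed:
    def __init__(self, *expressions):
        self.expressions = list(expressions)


def rename_func(name):
    """Return a callback that renders a node's arguments under the given SQL name."""
    def _render(node):
        return f"{name}({', '.join(node.expressions)})"
    return _render


class ToyBigQueryGenerator:
    # The dialect owns the naming; the node itself carries no SQL name.
    TRANSFORMS = {AIEmbed: rename_func("EMBED")}

    def sql(self, node):
        # Dispatch on the node type to find the dialect-specific renderer.
        return self.TRANSFORMS[type(node)](node)


print(ToyBigQueryGenerator().sql(AIEmbed("'hello'", "'model'")))
```

Keeping the name in the generator means another dialect could render the same node differently without changing the node class.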
ai_scalars: dict[str, type[exp.Func]] = {
    "EMBED": exp.AIEmbed,
    "SIMILARITY": exp.AISimilarity,
    "GENERATE": exp.AIGenerate,
}
Is this necessary if you have proper function parsers (e.g., using FUNCTIONS) for EMBED et al? Check if there's an overlap between these and non-AI-prefixed functions in BigQuery and if that's the case, then try my suggestion to see if we can simplify this.
As a side-note, we generally don't define constants like this mapping inline, but instead "bubble them up" in the parser class.
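As a sketch of what "bubbling up" could look like, the mapping moves from a local variable inside a method to a class-level constant. `ToyParser`, `ToyDialectParser`, `AI_SCALAR_FUNCTIONS`, and the "CUSTOM" entry are hypothetical names for illustration, and the node classes are replaced by plain strings to keep the example self-contained.

```python
class ToyParser:
    # Class-level constant: defined once and overridable by subclasses.
    AI_SCALAR_FUNCTIONS = {
        "EMBED": "AIEmbed",
        "SIMILARITY": "AISimilarity",
        "GENERATE": "AIGenerate",
    }

    def node_name_for(self, function_name):
        """Return the typed node name for a function, or None if it is not mapped."""
        return self.AI_SCALAR_FUNCTIONS.get(function_name.upper())


class ToyDialectParser(ToyParser):
    # A subclass can extend the mapping without touching the lookup method.
    # "CUSTOM" is a made-up entry used only to show the override.
    AI_SCALAR_FUNCTIONS = {**ToyParser.AI_SCALAR_FUNCTIONS, "CUSTOM": "AICustom"}


print(ToyParser().node_name_for("embed"))
print(ToyDialectParser().node_name_for("custom"))
```

An inline mapping, by contrast, is rebuilt on every call and cannot be adjusted by a subclass.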
We should increase the test coverage a bit by including more validate_identity tests that cover more args. For example, the EMBED function has the following syntax:
AI.EMBED(
[ content => ] 'content',
{ endpoint => 'endpoint' | model => 'model' }
[, task_type => 'task_type']
[, title => 'title']
[, model_params => model_params]
[, connection_id => 'connection']
)
Ideally, we want some representative tests with more arguments.
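One way to assemble such inputs is sketched below: a small helper that builds AI.EMBED calls from the syntax quoted above, from the minimal form up to several optional arguments. `build_ai_embed` and `CANDIDATE_SQL` are hypothetical names, and whether each string round-trips unchanged would still need to be confirmed by passing it to validate_identity.

```python
def build_ai_embed(content, **named_args):
    """Render an AI.EMBED call with a positional content argument and named arguments."""
    parts = [content] + [f"{name} => {value}" for name, value in named_args.items()]
    return f"AI.EMBED({', '.join(parts)})"


# Candidate test inputs; keyword order is preserved in the rendered SQL.
CANDIDATE_SQL = [
    build_ai_embed("'content'", endpoint="'endpoint'"),
    build_ai_embed("'content'", model="'model'", task_type="'task_type'"),
    build_ai_embed(
        "'content'",
        endpoint="'endpoint'",
        task_type="'task_type'",
        title="'title'",
        connection_id="'connection'",
    ),
]

for sql in CANDIDATE_SQL:
    print(f"SELECT {sql}")
```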
I'll get this in and take it to the finish line, thanks.
Check out a1cea51.
Description:
Closes #7478.
Validation:
Results:
The tests were run locally on WSL, with screenshots of the results attached.
Additionally, I verified that direct parses of AI.* dot calls now produce typed function nodes while preserving round-trip SQL.