gemini-cli-extensions · averikitsch · Apr 16, 2026
@@ -23,3 +23,15 @@ This section covers connecting to BigQuery.
  * If an operation fails due to permissions, identify the type of operation and recommend the appropriate role. You can provide these links for assistance:
  * Granting Roles: https://cloud.google.com/iam/docs/grant-role-console
  * BigQuery Permissions: https://cloud.google.com/iam/docs/roles-permissions/bigquery
+
+### 2. BigQuery AI/ML Skills
+These skills leverage BigQuery's built-in AI functions (`AI.*`) for tasks like text generation, classification, and semantic search.
+
+**Important**: Standard SQL-based `AI.*` functions (executed via `execute_sql()`) are preferred over dedicated BigQuery tools for tasks like Forecasting and Anomaly Detection.
+
+1. **Prerequisites**:
+ * Ensure your BigQuery project has the **Vertex AI API** enabled.
+ * A [Cloud Resource Connection](https://docs.cloud.google.com/bigquery/docs/create-cloud-resource-connection) must be established in BigQuery to use `AI.*` functions.
+
+2. **Handle Permission Errors**:
+ * The service account associated with the BigQuery connection requires the **Vertex AI User** (`roles/aiplatform.user`) and **BigQuery Connection User** (`roles/bigquery.connectionUser`) roles.
@@ -47,6 +47,11 @@ Before you begin, ensure you have the following:
 - Ensure [Application Default Credentials](https://cloud.google.com/docs/authentication/gcloud) are available in your environment.
 - IAM Permissions:
     - BigQuery User (`roles/bigquery.user`)
+- (Optional) To use BigQuery AI/ML skills:
+  - Ensure that the Vertex AI API is enabled
+  - IAM permissions:
+    - BigQuery Connection User (`roles/bigquery.connectionUser`)
+    - Vertex AI User (`roles/aiplatform.user`)
 
 ## Getting Started
 
@@ -235,8 +240,9 @@ Interact with BigQuery using natural language right from your IDE:
 
 This extension provides a comprehensive set of skills:
 
-* `bigquery-data`: Use these skills when you need to handle large-scale data exploration and dataset management. Use when users need to find data assets or run SQL at scale. Provides metadata discovery and query execution across the data warehouse.
-* `bigquery-analytics`: Use these skills when you need to handle advanced data intelligence and predictive tasks. Use when a user asks "why" data changed or needs future projections. Provides automated insight generation and time-series forecasting.
+* [bigquery-data](./skills/bigquery-data/SKILL.md): Use these skills when you need to handle large-scale data exploration and dataset management. Use when users need to find data assets or run SQL at scale. Provides metadata discovery and query execution across the data warehouse.
+* [bigquery-analytics](./skills/bigquery-analytics/SKILL.md): Use these skills when you need to handle advanced data intelligence and predictive tasks. Use when a user asks "why" data changed or needs future projections. Provides automated insight generation and time-series forecasting.
+* [bigquery-ai-ml](./skills/bigquery-ai-ml/SKILL.md): Use these skills for BigQuery AI and Machine Learning queries using standard SQL and `AI.*` functions. Provides capabilities for text generation, classification, semantic search, and forecasting using pre-trained models without needing to manage custom models.
 
 ## Additional Extensions
 

@@ -0,0 +1,47 @@
+---
+name: bigquery-ai-ml
+license: Apache-2.0
+metadata:
+  author: google-adk
+  version: "1.0"
+description: |
+  Skill for BigQuery AI and Machine Learning queries using standard SQL
+  and `AI.*` functions (preferred over dedicated tools).
+
+---
+
+# Skill: bigquery-ai-ml
+
+This skill defines the usage and rules for BigQuery AI/ML functions,
+preferring SQL-based Skills over dedicated BigQuery tools.
+
+## 1. Skill vs Tool Preference (BigQuery AI/ML)
+
+Agents should **prefer using the Skill (SQL via `execute_sql()`)** over
+dedicated BigQuery tools for functionalities like Forecasting and Anomaly
+Detection.
+
+Use `execute_sql()` with the standard BigQuery `AI.*` functions for these tasks
+instead of the corresponding high-level tools.
+
+## 2. Mandatory Reference Routing
+
+This skill file does not contain the syntax for these functions. You **MUST**
+read the associated reference file before generating SQL.
+
+**CRITICAL**: DO NOT GUESS filenames. You MUST only use the exact paths
+provided below.
+
+| Function | Description | Required Reference File to Retrieve |
+| :--- | :--- | :--- |
+| **AI.FORECAST** | Time-series forecasting via the pre-trained TimesFM model | `references/bigquery_ai_forecast.md` |
+| **AI.CLASSIFY** | Categorize unstructured data into predefined labels | `references/bigquery_ai_classify.md` |
+| **AI.DETECT_ANOMALIES** | Identify deviations in time-series data via the pre-trained TimesFM model | `references/bigquery_ai_detect_anomalies.md` |
+| **AI.GENERATE** | General-purpose text and content generation | `references/bigquery_ai_generate.md` |
+| **AI.GENERATE_BOOL** | Generate a boolean value (TRUE/FALSE) based on a prompt | `references/bigquery_ai_generate_bool.md` |
+| **AI.GENERATE_DOUBLE** | Generate a floating-point number based on a prompt | `references/bigquery_ai_generate_double.md` |
+| **AI.GENERATE_INT** | Generate an integer value based on a prompt | `references/bigquery_ai_generate_int.md` |
+| **AI.IF** | Evaluate a natural-language boolean condition | `references/bigquery_ai_if.md` |
+| **AI.SCORE** | Rank items by semantic relevance (use with ORDER BY) | `references/bigquery_ai_score.md` |
+| **AI.SIMILARITY** | Compute cosine similarity between two inputs | `references/bigquery_ai_similarity.md` |
+| **AI.SEARCH** | Semantic search on tables with autonomous embedding generation | `references/bigquery_ai_search.md` |
@@ -0,0 +1,92 @@
+# BigQuery AI.CLASSIFY
+
+`AI.CLASSIFY` categorizes unstructured data into a predefined set of labels.
+
+## Syntax Reference
+
+```sql
+AI.CLASSIFY(
+  [ input => ] 'INPUT',
+  [ categories => ] 'CATEGORIES'
+  [, connection_id => 'CONNECTION_ID' ]
+  [, endpoint => 'ENDPOINT' ]
+  [, output_mode => 'OUTPUT_MODE' ]
+)
+```
+
+### Input Arguments
+
+| Argument | Requirement | Type | Description |
+| :--- | :--- | :--- | :--- |
+| **`input`** | **Required** | String | The text content to classify. |
+| **`categories`** | **Required** | Array<String> | A list of target categories/labels. Can be `ARRAY<STRING>` or `ARRAY<STRUCT<STRING, STRING>>` (label, description). |
+| **`connection_id`** | Optional | String | The connection ID to use for the LLM. |
+| **`endpoint`** | Optional | String | The model name, e.g., `'gemini-2.5-flash'`. |
+| **`output_mode`** | Optional | String | `'single'` or `'multi'`. Determines the output type. When omitted (`NULL`), a single `STRING` is returned. |
+
+### Output Schema
+
+The output type depends on the `output_mode` argument:
+
+| Output Mode | `output_mode` Value | Type | Description |
+| :--- | :--- | :--- | :--- |
+| **Single Label** | `NULL` (Default) | `STRING` | The single category that best fits the input. |
+| **Single Label (Explicit)** | `'single'` | `ARRAY<STRING>` | An array containing exactly one category string. |
+| **Multi Label** | `'multi'` | `ARRAY<STRING>` | An array containing zero or more matching categories. |
+
+## Examples
+
+### Classify text into categories
+
+```sql
+SELECT
+  content,
+  AI.CLASSIFY(
+    content,
+    categories => ['Spam', 'Not Spam', 'Urgent'],
+    connection_id => 'my-project.us.my-connection'
+  ) as classification
+FROM `dataset.emails`;
+```
+
+### Classify text into multiple topics
+
+```sql
+SELECT
+  title,
+  body,
+  AI.CLASSIFY(
+    body,
+    categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
+    output_mode => 'multi') AS categories
+FROM
+  `bigquery-public-data.bbc_news.fulltext`
+LIMIT 100;
+```
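+
+Because `output_mode => 'multi'` returns an `ARRAY<STRING>`, downstream queries often need one row per label. The following is a minimal sketch, assuming the same public table as above; the aliases `classified` and `category` are illustrative and not part of the function's interface.
+
+```sql
+SELECT
+  title,
+  category
+FROM (
+  SELECT
+    title,
+    AI.CLASSIFY(
+      body,
+      categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
+      output_mode => 'multi') AS categories
+  FROM
+    `bigquery-public-data.bbc_news.fulltext`
+  LIMIT 100
+) AS classified
+-- Flatten the label array so each (title, label) pair becomes its own row.
+CROSS JOIN UNNEST(classified.categories) AS category;
+```
+
+Rows whose array is empty drop out of the cross join. Using `LEFT JOIN UNNEST(classified.categories) AS category` instead would keep them with a `NULL` label.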
+
+### Classify reviews by sentiment
+
+```sql
+SELECT
+  AI.CLASSIFY(
+    ('Classify the review by sentiment: ', review),
+    categories => [
+      ('green', 'The review is positive.'),
+      ('yellow', 'The review is neutral.'),
+      ('red', 'The review is negative.')]) AS ai_review_rating,
+  reviewer_rating AS human_provided_rating,
+  review
+FROM `bigquery-public-data.imdb.reviews`
+WHERE title = 'The English Patient';
+```
@@ -0,0 +1,110 @@
+# BigQuery AI.DETECT_ANOMALIES
+
+`AI.DETECT_ANOMALIES` uses the pre-trained **TimesFM** model to identify
+deviations in time series data without needing to train a custom model.
+
+## Syntax Reference
+
+This function compares a target dataset against a historical dataset to identify
+anomalies.
+
+```sql
+SELECT *
+FROM AI.DETECT_ANOMALIES(
+  { TABLE `project.dataset.history_table` | (SELECT * FROM history_query) },
+  { TABLE `project.dataset.target_table` | (SELECT * FROM target_query) },
+  data_col => 'DATA_COL',
+  timestamp_col => 'TIMESTAMP_COL'
+  [, model => 'MODEL']
+  [, id_cols => ID_COLS]
+  [, anomaly_prob_threshold => ANOMALY_PROB_THRESHOLD]
+)
+
+```
+
+### Input Arguments
+
+Argument                     | Requirement  | Type          | Description
+:--------------------------- | :----------- | :------------ | :----------
+**`historical_data`**        | **Required** | Table/Query   | The source table or subquery containing historical data for training context.
+**`target_data`**            | **Required** | Table/Query   | The source table or subquery containing data to analyze for anomalies.
+**`data_col`**               | **Required** | String        | The numeric column to analyze.
+**`timestamp_col`**          | **Required** | String        | The column containing dates/timestamps.
+**`id_cols`**                | Optional     | Array<String> | Grouping columns for multiple series (e.g., `['store_id']`).
+**`anomaly_prob_threshold`** | Optional     | Float64       | Threshold for anomaly detection (0 to 1). Defaults to 0.95.
+**`model`**                  | Optional     | String        | Model version. Defaults to `'TimesFM 2.0'`.
+
+### Output Schema
+
+| Column | Type | Description |
+| :--- | :--- | :--- |
+| **`id_cols`** | (As Input) | Original identifiers for the series. |
+| **`time_series_timestamp`** | TIMESTAMP | Timestamp for the analyzed points. |
+| **`time_series_data`** | FLOAT64 | The original data value. |
+| **`is_anomaly`** | BOOL | TRUE if the point is identified as an anomaly. |
+| **`lower_bound`** | FLOAT64 | Lower bound of the expected range. |
+| **`upper_bound`** | FLOAT64 | Upper bound of the expected range. |
+| **`anomaly_probability`** | FLOAT64 | Probability that the point is an anomaly. |
+| **`ai_detect_anomalies_status`** | STRING | Error messages or empty string on success. A minimum of 3 data points is required. |
+
+## Examples
+
+### Basic Anomaly Detection
+
+Detect anomalies in daily bike trips for a specific 2-month window based on
+prior history.
+
+```sql
+WITH bike_trips AS (
+  SELECT EXTRACT(DATE FROM starttime) AS date, COUNT(*) AS num_trips
+  FROM `bigquery-public-data.new_york.citibike_trips`
+  GROUP BY date
+)
+SELECT *
+FROM AI.DETECT_ANOMALIES(
+  -- Historical context (Training data equivalent)
+  (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+  -- Target range (Data to inspect for anomalies)
+  (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+  data_col => 'num_trips',
+  timestamp_col => 'date'
+);
+
+```
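+
+The function returns every target point, flagged or not. A sketch of narrowing the result to flagged points only, reusing the query above and the column names from the Output Schema table; the selected columns and the ordering are illustrative choices:
+
+```sql
+WITH bike_trips AS (
+  SELECT EXTRACT(DATE FROM starttime) AS date, COUNT(*) AS num_trips
+  FROM `bigquery-public-data.new_york.citibike_trips`
+  GROUP BY date
+)
+SELECT
+  time_series_timestamp,
+  time_series_data,
+  lower_bound,
+  upper_bound,
+  anomaly_probability
+FROM AI.DETECT_ANOMALIES(
+  (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+  (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+  data_col => 'num_trips',
+  timestamp_col => 'date'
+)
+-- Keep only the points flagged as anomalous; NULL flags are excluded too.
+WHERE is_anomaly
+ORDER BY anomaly_probability DESC;
+```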
+
+### Detection Across Multiple Series
+
+Use `id_cols` to detect anomalies separately for each series in the same query.
+Here, each combination of `usertype` (e.g., Subscriber vs. Customer) and `gender` forms its own series.
+
+```sql
+WITH bike_trips AS (
+    SELECT
+      EXTRACT(DATE FROM starttime) AS date, usertype, gender,
+      COUNT(*) AS num_trips
+    FROM `bigquery-public-data.new_york.citibike_trips`
+    GROUP BY date, usertype, gender
+  )
+SELECT *
+FROM
+  AI.DETECT_ANOMALIES(
+    # Historical data from a query
+    (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+    # Target data from a query
+    (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+    data_col => 'num_trips',
+    timestamp_col => 'date',
+    id_cols => ['usertype', 'gender'],
+    model => "TimesFM 2.5",
+    anomaly_prob_threshold => 0.8);
+
+```
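+
+With several series in one result, a per-series summary can be easier to read than the raw rows. The sketch below assumes the `id_cols` columns appear in the output under their input names (`usertype`, `gender`), as the Output Schema table suggests; the aliases `points_checked`, `anomalies_found`, and `points_with_errors` are illustrative.
+
+```sql
+WITH bike_trips AS (
+  SELECT
+    EXTRACT(DATE FROM starttime) AS date, usertype, gender,
+    COUNT(*) AS num_trips
+  FROM `bigquery-public-data.new_york.citibike_trips`
+  GROUP BY date, usertype, gender
+)
+SELECT
+  usertype,
+  gender,
+  COUNT(*) AS points_checked,
+  COUNTIF(is_anomaly) AS anomalies_found,
+  -- A non-empty status string signals a point that could not be evaluated.
+  COUNTIF(ai_detect_anomalies_status != '') AS points_with_errors
+FROM AI.DETECT_ANOMALIES(
+  (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+  (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+  data_col => 'num_trips',
+  timestamp_col => 'date',
+  id_cols => ['usertype', 'gender'])
+GROUP BY usertype, gender
+ORDER BY anomalies_found DESC;
+```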