gemini-cli-extensions
diff --git a/skills/bigquery-ai-ml/SKILL.md b/skills/bigquery-ai-ml/SKILL.md
Lines changed: 47 additions & 0 deletions
diff --git a/skills/bigquery-ai-ml/references/bigquery_ai_classify.md b/skills/bigquery-ai-ml/references/bigquery_ai_classify.md
Lines changed: 92 additions & 0 deletions
diff --git a/skills/bigquery-ai-ml/references/bigquery_ai_detect_anomalies.md b/skills/bigquery-ai-ml/references/bigquery_ai_detect_anomalies.md
Lines changed: 110 additions & 0 deletions
diff --git a/skills/bigquery-ai-ml/references/bigquery_ai_forecast.md b/skills/bigquery-ai-ml/references/bigquery_ai_forecast.md
Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,47 @@
+---
+name: bigquery-ai-ml
+license: Apache-2.0
+metadata:
+  author: google-adk
+  version: "1.0"
+description: |
+  Skill for BigQuery AI and Machine Learning queries using standard SQL
+  and `AI.*` functions (preferred over dedicated tools).
+
+---
+
+# Skill: bigquery-ai-ml
+
+This skill defines the usage and rules for BigQuery AI/ML functions,
+preferring SQL-based Skills over dedicated BigQuery tools.
+
+## 1. Skill vs Tool Preference (BigQuery AI/ML)
+
+Agents should **prefer the Skill (SQL via `execute_sql()`)** over dedicated
+BigQuery tools for tasks such as forecasting and anomaly detection: run the
+standard BigQuery `AI.*` functions through `execute_sql()` instead of calling
+the corresponding high-level tools.
+
+## 2. Mandatory Reference Routing
+
+This skill file does not contain the syntax for these functions. You **MUST**
+read the associated reference file before generating SQL.
+
+**CRITICAL**: DO NOT GUESS filenames. You MUST only use the exact paths
+provided below.
+
+| Function | Description | Required Reference File to Retrieve |
+| :--- | :--- | :--- |
+| **AI.FORECAST** | Time-series forecasting via the pre-trained TimesFM model | `references/bigquery_ai_forecast.md` |
+| **AI.CLASSIFY** | Categorize unstructured data into predefined labels | `references/bigquery_ai_classify.md` |
+| **AI.DETECT_ANOMALIES** | Identify deviations in time-series data via the pre-trained TimesFM model | `references/bigquery_ai_detect_anomalies.md` |
+| **AI.GENERATE** | General-purpose text and content generation | `references/bigquery_ai_generate.md` |
+| **AI.GENERATE_BOOL** | Generate a boolean value (TRUE/FALSE) based on a prompt | `references/bigquery_ai_generate_bool.md` |
+| **AI.GENERATE_DOUBLE** | Generate a floating-point number based on a prompt | `references/bigquery_ai_generate_double.md` |
+| **AI.GENERATE_INT** | Generate an integer value based on a prompt | `references/bigquery_ai_generate_int.md` |
+| **AI.IF** | Evaluate a natural-language boolean condition | `references/bigquery_ai_if.md` |
+| **AI.SCORE** | Rank items by semantic relevance (use with ORDER BY) | `references/bigquery_ai_score.md` |
+| **AI.SIMILARITY** | Compute cosine similarity between two inputs | `references/bigquery_ai_similarity.md` |
+| **AI.SEARCH** | Semantic search on tables with autonomous embedding generation | `references/bigquery_ai_search.md` |
@@ -0,0 +1,92 @@
+# BigQuery AI.Classify
+
+`AI.CLASSIFY` categorizes unstructured data into a predefined set of labels.
+
+## Syntax Reference
+
+```sql
+AI.CLASSIFY(
+  [ input => ] 'INPUT',
+  [ categories => ] 'CATEGORIES'
+  [, connection_id => 'CONNECTION_ID' ]
+  [, endpoint => 'ENDPOINT' ]
+  [, output_mode => 'OUTPUT_MODE' ]
+)
+```
+
+### Input Arguments
+
+| Argument            | Requirement  | Type          | Description           |
+| :------------------ | :----------- | :------------ | :-------------------- |
+| **`input`**         | **Required** | String        | The text content to   |
+:                     :              :               : classify.             :
+| **`categories`**    | **Required** | Array<String> | A list of target      |
+:                     :              :               : categories/labels.    :
+:                     :              :               : Can be                :
+:                     :              :               : `ARRAY<STRING>` or    :
+:                     :              :               : `ARRAY<STRUCT<STRING, :
+:                     :              :               : STRING>>` (label,     :
+:                     :              :               : description).         :
+| **`connection_id`** | Optional     | String        | The connection ID to  |
+:                     :              :               : use for the LLM.      :
+| **`endpoint`**      | Optional     | String        | The model name, e.g., |
+:                     :              :               : `'gemini-2.5-flash'`. :
+| **`output_mode`**   | Optional     | String        | `'single'` or         |
+:                     :              :               : `'multi'`. If omitted :
+:                     :              :               : (`NULL`), one label   :
+:                     :              :               : is returned as a      :
+:                     :              :               : `STRING`. See Output  :
+:                     :              :               : Schema.               :
+
+### Output Schema
+
+The output type depends on the `output_mode` argument:
+
+| Output Mode      | output_mode Value | Type            | Description         |
+| :--------------- | :---------------- | :-------------- | :------------------ |
+| **Single Label** | `NULL` (Default)  | `STRING`        | The single category |
+:                  :                   :                 : that best fits the  :
+:                  :                   :                 : input.              :
+| **Single Label   | `'single'`        | `ARRAY<STRING>` | An array containing |
+: (Explicit)**     :                   :                 : exactly one         :
+:                  :                   :                 : category string.    :
+| **Multi Label**  | `'multi'`         | `ARRAY<STRING>` | An array containing |
+:                  :                   :                 : zero or more        :
+:                  :                   :                 : matching            :
+:                  :                   :                 : categories.         :
+
+## Examples
+
+### Classify text into categories
+
+```sql
+SELECT
+  content,
+  AI.CLASSIFY(
+    content,
+    categories => ['Spam', 'Not Spam', 'Urgent'],
+    connection_id => 'my-project.us.my-connection'
+  ) as classification
+FROM `dataset.emails`;
+```
+
+### Classify text into multiple topics
+
+```sql
+SELECT
+  title,
+  body,
+  AI.CLASSIFY(
+    body,
+    categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
+    output_mode => 'multi') AS categories
+FROM
+  `bigquery-public-data.bbc_news.fulltext`
+LIMIT 100;
+```
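+
+### Count articles per topic
+
+Because `'multi'` mode returns an `ARRAY<STRING>`, the result can be flattened
+with `UNNEST` for aggregation. The following is a minimal sketch, not a
+verified query: the alias `classified` and the column name `num_articles` are
+illustrative assumptions, and it reuses the table and categories from the
+example above.
+
+```sql
+SELECT
+  category,
+  COUNT(*) AS num_articles
+FROM (
+  -- Classify a sample of articles; each row yields zero or more labels.
+  SELECT
+    AI.CLASSIFY(
+      body,
+      categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
+      output_mode => 'multi') AS categories
+  FROM `bigquery-public-data.bbc_news.fulltext`
+  LIMIT 100
+) AS classified,
+-- One output row per (article, label) pair.
+UNNEST(classified.categories) AS category
+GROUP BY category
+ORDER BY num_articles DESC;
+```
+
+Articles that match no category contribute no rows, so the counts may sum to
+fewer or more than the number of sampled articles.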
+
+### Classify reviews by sentiment
+
+```sql
+SELECT
+  AI.CLASSIFY(
+    ('Classify the review by sentiment: ', review),
+    categories => [
+      ('green', 'The review is positive.'),
+      ('yellow', 'The review is neutral.'),
+      ('red', 'The review is negative.')]) AS ai_review_rating,
+  reviewer_rating AS human_provided_rating,
+  review
+FROM `bigquery-public-data.imdb.reviews`
+WHERE title = 'The English Patient';
+```
@@ -0,0 +1,110 @@
+# BigQuery AI.Detect_Anomalies
+
+`AI.DETECT_ANOMALIES` uses the pre-trained **TimesFM** model to identify
+deviations in time series data without needing to train a custom model.
+
+## Syntax Reference
+
+This function compares a target dataset against a historical dataset to identify
+anomalies.
+
+```sql
+SELECT *
+FROM AI.DETECT_ANOMALIES(
+  { TABLE `project.dataset.history_table` | (SELECT * FROM history_query) },
+  { TABLE `project.dataset.target_table` | (SELECT * FROM target_query) },
+  data_col => 'DATA_COL',
+  timestamp_col => 'TIMESTAMP_COL'
+  [, model => 'MODEL']
+  [, id_cols => ID_COLS]
+  [, anomaly_prob_threshold => ANOMALY_PROB_THRESHOLD]
+)
+
+```
+
+### Input Arguments
+
+Argument                     | Requirement  | Type          | Description
+:--------------------------- | :----------- | :------------ | :----------
+**`historical_data`**        | **Required** | Table/Query   | The source table or subquery containing historical data for training context.
+**`target_data`**            | **Required** | Table/Query   | The source table or subquery containing data to analyze for anomalies.
+**`data_col`**               | **Required** | String        | The numeric column to analyze.
+**`timestamp_col`**          | **Required** | String        | The column containing dates/timestamps.
+**`id_cols`**                | Optional     | Array<String> | Grouping columns for multiple series (e.g., `['store_id']`).
+**`anomaly_prob_threshold`** | Optional     | Float64       | Threshold for anomaly detection (0 to 1). Defaults to 0.95.
+**`model`**                  | Optional     | String        | Model version. Defaults to `'TimesFM 2.0'`.
+
+### Output Schema
+
+| Column                           | Type       | Description                  |
+| :------------------------------- | :--------- | :--------------------------- |
+| **`id_cols`**                    | (As Input) | Original identifiers for the |
+:                                  :            : series.                      :
+| **`time_series_timestamp`**      | TIMESTAMP  | Timestamp for the analyzed   |
+:                                  :            : points.                      :
+| **`time_series_data`**           | FLOAT64    | The original data value.     |
+| **`is_anomaly`**                 | BOOL       | TRUE if the point is         |
+:                                  :            : identified as an anomaly.    :
+| **`lower_bound`**                | FLOAT64    | Lower bound of the expected  |
+:                                  :            : range.                       :
+| **`upper_bound`**                | FLOAT64    | Upper bound of the expected  |
+:                                  :            : range.                       :
+| **`anomaly_probability`**        | FLOAT64    | Probability that the point   |
+:                                  :            : is an anomaly.               :
+| **`ai_detect_anomalies_status`** | STRING     | Error messages or empty      |
+:                                  :            : string on success. A minimum :
+:                                  :            : of 3 data points is          :
+:                                  :            : required.                    :
+
+## Examples
+
+### Basic Anomaly Detection
+
+Detect anomalies in daily bike trips for a specific 2-month window based on
+prior history.
+
+```sql
+WITH bike_trips AS (
+  SELECT EXTRACT(DATE FROM starttime) AS date, COUNT(*) AS num_trips
+  FROM `bigquery-public-data.new_york.citibike_trips`
+  GROUP BY date
+)
+SELECT *
+FROM AI.DETECT_ANOMALIES(
+  -- Historical context (Training data equivalent)
+  (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+  -- Target range (Data to inspect for anomalies)
+  (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+  data_col => 'num_trips',
+  timestamp_col => 'date'
+);
+
+```
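+
+### Return Only Flagged Points
+
+The output schema above suggests a common follow-up: keep only the rows marked
+as anomalies and skip rows that report an error. This is a hedged sketch that
+reuses the query from the basic example; the choice of selected columns and
+the sort order are illustrative assumptions, not a prescribed pattern.
+
+```sql
+WITH bike_trips AS (
+  SELECT EXTRACT(DATE FROM starttime) AS date, COUNT(*) AS num_trips
+  FROM `bigquery-public-data.new_york.citibike_trips`
+  GROUP BY date
+)
+SELECT
+  time_series_timestamp,
+  time_series_data,
+  lower_bound,
+  upper_bound,
+  anomaly_probability
+FROM AI.DETECT_ANOMALIES(
+  (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+  (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+  data_col => 'num_trips',
+  timestamp_col => 'date'
+)
+WHERE is_anomaly
+  -- An empty status string indicates success, per the output schema.
+  AND ai_detect_anomalies_status = ''
+ORDER BY anomaly_probability DESC;
+```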
+
+### Multiple Time Series Detection
+
+Use `id_cols` to detect anomalies separately for each series in the same query,
+here one series per combination of user type and gender.
+
+```sql
+WITH bike_trips AS (
+    SELECT
+      EXTRACT(DATE FROM starttime) AS date, usertype, gender,
+      COUNT(*) AS num_trips
+    FROM `bigquery-public-data.new_york.citibike_trips`
+    GROUP BY date, usertype, gender
+  )
+SELECT *
+FROM
+  AI.DETECT_ANOMALIES(
+    # Historical data from a query
+    (SELECT * FROM bike_trips WHERE date <= DATE('2016-06-30')),
+    # Target data from a query
+    (SELECT * FROM bike_trips WHERE date BETWEEN '2016-07-01' AND '2016-09-01'),
+    data_col => 'num_trips',
+    timestamp_col => 'date',
+    id_cols => ['usertype', 'gender'],
+    model => "TimesFM 2.5",
+    anomaly_prob_threshold => 0.8);
+
+```
@@ -0,0 +1,106 @@
+# BigQuery AI.Forecast
+
+`AI.FORECAST` leverages the pre-trained **TimesFM** foundation model to generate
+forecasts without the need to train and manage custom models.
+
+## Syntax Reference
+
+```sql
+SELECT
+  *
+FROM
+  AI.FORECAST(
+    { TABLE `project.dataset.table` | (QUERY_STATEMENT) },
+    data_col => 'DATA_COL',
+    timestamp_col => 'TIMESTAMP_COL'
+    [, model => 'MODEL']
+    [, id_cols => ID_COLS]
+    [, horizon => HORIZON]
+    [, confidence_level => CONFIDENCE_LEVEL]
+    [, output_historical_time_series => OUTPUT_HISTORICAL_TIME_SERIES]
+    [, context_window => CONTEXT_WINDOW]
+  )
+```
+
+### Input Arguments
+
+| Argument               | Requirement  | Type          | Description       |
+| :--------------------- | :----------- | :------------ | :---------------- |
+| **`input_data`**       | **Required** | Table/Query   | The source table  |
+:                        :              :               : or subquery       :
+:                        :              :               : containing        :
+:                        :              :               : historical data.  :
+| **`data_col`**         | **Required** | String        | The numeric       |
+:                        :              :               : column to         :
+:                        :              :               : predict.          :
+| **`timestamp_col`**    | **Required** | String        | The column        |
+:                        :              :               : containing        :
+:                        :              :               : dates/timestamps. :
+| **`id_cols`**          | Optional     | Array<String> | Grouping columns  |
+:                        :              :               : for multiple      :
+:                        :              :               : series (e.g.,     :
+:                        :              :               : `['store_id']`).  :
+| **`horizon`**          | Optional     | Int64         | Number of future  |
+:                        :              :               : points to         :
+:                        :              :               : predict. Defaults :
+:                        :              :               : to 10. The valid  :
+:                        :              :               : input range is    :
+:                        :              :               : [1, 10,000]       :
+| **`confidence_level`** | Optional     | Float64       | Confidence level  |
+:                        :              :               : for the           :
+:                        :              :               : prediction        :
+:                        :              :               : interval (0 to    :
+:                        :              :               : 1). Defaults to   :
+:                        :              :               : 0.95.             :
+| **`model`**            | Optional     | String        | Model version.    |
+:                        :              :               : Defaults to       :
+:                        :              :               : `'TimesFM 2.0'`.  :
+| **`context_window`**   | Optional     | Int64         | The number of     |
+:                        :              :               : historical data   :
+:                        :              :               : points the model  :
+:                        :              :               : uses to forecast. :
+:                        :              :               : The min value is  :
+:                        :              :               : 64 and the max    :
+:                        :              :               : value is 2048 for :
+:                        :              :               : `'TimesFM 2.0'`.  :
+:                        :              :               : If not set, the   :
+:                        :              :               : model determines  :
+:                        :              :               : this              :
+:                        :              :               : automatically.    :
+
+### Output Schema
+
+The schema adjusts based on the `output_historical_time_series` flag.
+
+Column                                | Type       | Included if output_historical_time_series=FALSE | Included if output_historical_time_series=TRUE | Description
+:------------------------------------ | :--------- | :---------------------------------------------- | :--------------------------------------------- | :----------
+**`id_cols`**                         | (As Input) | Yes                                             | Yes                                            | Original identifiers for the series.
+**`forecast_timestamp`**              | TIMESTAMP  | **Yes**                                         | No                                             | Timestamp for predicted points.
+**`forecast_value`**                  | FLOAT64    | **Yes**                                         | No                                             | The 50% quantile (median) prediction.
+**`time_series_timestamp`**           | TIMESTAMP  | No                                              | **Yes**                                        | Uniform timestamp column for both history and forecast.
+**`time_series_data`**                | FLOAT64    | No                                              | **Yes**                                        | Merged column: actual values for history, median for forecast.
+**`time_series_type`**                | STRING     | No                                              | **Yes**                                        | Label: `'history'` or `'forecast'`.
+**`prediction_interval_lower_bound`** | FLOAT64    | Yes                                             | Yes                                            | Lower bound (NULL for historical rows).
+**`prediction_interval_upper_bound`** | FLOAT64    | Yes                                             | Yes                                            | Upper bound (NULL for historical rows).
+**`confidence_level`**                | FLOAT64    | Yes                                             | Yes                                            | The constant confidence level used.
+**`ai_forecast_status`**              | STRING     | Yes                                             | Yes                                            | Error messages or empty string on success. A minimum of 3 data points is required.
+
+## Examples
+
+### Forecasting with History
+
+```sql
+WITH
+  citibike_trips AS (
+    SELECT EXTRACT(DATE FROM starttime) AS date, usertype, COUNT(*) AS num_trips
+    FROM `bigquery-public-data.new_york.citibike_trips`
+    GROUP BY date, usertype
+  )
+SELECT *
+FROM
+  AI.FORECAST(
+    TABLE citibike_trips,
+    data_col => 'num_trips',
+    timestamp_col => 'date',
+    id_cols => ['usertype'],
+    horizon => 30,
+    output_historical_time_series => true);
+```
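+
+### Forecast Without History
+
+For contrast with the example above, the sketch below sets
+`output_historical_time_series => false`, so the result exposes the
+`forecast_timestamp` and `forecast_value` columns listed in the output schema.
+The 14-point horizon and 0.9 confidence level are arbitrary illustrative
+values, and the query has not been run against a live project.
+
+```sql
+WITH
+  citibike_trips AS (
+    SELECT EXTRACT(DATE FROM starttime) AS date, COUNT(*) AS num_trips
+    FROM `bigquery-public-data.new_york.citibike_trips`
+    GROUP BY date
+  )
+SELECT
+  forecast_timestamp,
+  forecast_value,
+  prediction_interval_lower_bound,
+  prediction_interval_upper_bound
+FROM
+  AI.FORECAST(
+    TABLE citibike_trips,
+    data_col => 'num_trips',
+    timestamp_col => 'date',
+    horizon => 14,
+    confidence_level => 0.9,
+    output_historical_time_series => false)
+-- An empty status string indicates success, per the output schema.
+WHERE ai_forecast_status = ''
+ORDER BY forecast_timestamp;
+```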