From 0bf28954ed9332109af6f0d78cf7382793764f22 Mon Sep 17 00:00:00 2001
From: Simon Rozsival <simon@rozsival.com>
Date: Tue, 12 May 2026 07:42:48 +0200
Subject: [PATCH] [ci-status skill] Enumerate all test runs, not just
 failed-job runs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The ci-status skill was missing real test failures in two scenarios:

1. `az devops invoke --area test --resource runs` incorrectly routes
   to `/test/Runs/Statistics` on the `devdiv` org and returns 404,
   so the skill silently skipped test enumeration entirely.

2. The filter `runStatistics[?outcome=='Failed']` only matched runs
   with at least one `Failed` result. It missed runs where the test
   APK process crashed (recorded as `unanalyzedTests > 0` /
   `NotExecuted`), and the "only iterate over failed timeline jobs"
   pattern missed runs published under `succeededWithIssues` jobs.

Together these produced false-green reports: GitHub showed the check
as RED, the skill found nothing actionable in the timeline, and
reported CI as essentially green.

This change:

* Replaces the broken `az devops invoke` call with a direct REST
  call (`curl` + AAD bearer token) that works on both
  `dev.azure.com/dnceng-public` and `devdiv.visualstudio.com`.
* Widens the failing-run filter to `totalTests > passedTests` so it
  catches Failed, NotExecuted, Inconclusive, and unanalyzed buckets.
* Adds an explicit `NotExecuted` follow-up query for crashed
  instrumentation runs (e.g. CoreCLRTrimmable SIGSEGV cases).
* Tightens the "report green and stop" gate so it only fires when
  every test run has `passedTests == totalTests` AND no required
  GitHub check is RED — preventing the false-green path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .github/skills/ci-status/SKILL.md | 56 +++++++++++++++++++------------
 1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/.github/skills/ci-status/SKILL.md b/.github/skills/ci-status/SKILL.md
index 49d08331a6d..d25302fb680 100644
--- a/.github/skills/ci-status/SKILL.md
+++ b/.github/skills/ci-status/SKILL.md
@@ -159,40 +159,47 @@ If the build already completed, skip the ETA and show the actual duration instea
 
 #### Step 3b — Check for failed tests (always do this, especially when the build is still running)
 
-**This step is critical when the build is in progress.** Test results are published as jobs complete, so failures may already be visible before the build finishes. Surfacing these early lets the user start fixing them immediately.
+**This step is critical** and must run for every build — even when no timeline records are marked `failed`, and even when the build's status is `succeeded` or `succeededWithIssues`. Many failing test runs live under timeline jobs that are themselves `succeededWithIssues` (the wrapper script swallows a non-zero exit), and crashed instrumentations show up as `unanalyzedTests > 0` rather than `Failed`. The previous narrow filter missed both cases.
 
-Query test runs for this build:
+**Always enumerate ALL test runs for the build, not just the ones whose containing job is `failed`.**
+
+> ⚠️ Do **not** use `az devops invoke --area test --resource runs` — on `devdiv` it incorrectly routes to `/test/Runs/Statistics` and returns 404. Use the REST API directly with an AAD bearer token (works on both `devdiv` and `dnceng-public`):
 
 ```bash
-az devops invoke --area test --resource runs \
-  --route-parameters project=$PROJECT \
-  --org $ORG_URL \
-  --query-parameters "buildUri=vstfs:///Build/Build/$BUILD_ID" \
-  --query "value[?runStatistics[?outcome=='Failed']] | [].{id:id, name:name, totalTests:totalTests, state:state, stats:runStatistics}" \
-  --output json 2>&1
+# 499b84ac-1321-427f-aa17-267ca6975798 is the well-known AAD resource ID for Azure DevOps.
+TOKEN=$(az account get-access-token --resource 499b84ac-1321-427f-aa17-267ca6975798 --query accessToken -o tsv)
+
+curl -sL -u ":$TOKEN" \
+  "$ORG_URL/$PROJECT/_apis/test/runs?buildUri=vstfs:///Build/Build/$BUILD_ID&api-version=7.0" \
+  | jq '[.value[] | {id, name, totalTests, passedTests, unanalyzedTests, state}
+                    | select((.totalTests // 0) > (.passedTests // 0))]'
 ```
 
-For each test run that has failures, fetch the failed test results:
+The `select` filter is **`totalTests > passedTests`** on purpose: it catches failed tests, not-executed tests, inconclusive tests, and crashed/unanalyzed tests in one go. Do NOT filter on `runStatistics[?outcome=='Failed']` — that misses `Unanalyzed` (crashes) and several other buckets.
+
+For each failing run, fetch the individual results — first `outcomes=Failed`, then `outcomes=NotExecuted` if the run had `unanalyzedTests > 0`:
 
 ```bash
-az devops invoke --area test --resource results \
-  --route-parameters project=$PROJECT runId=$RUN_ID \
-  --org $ORG_URL \
-  --query-parameters "outcomes=Failed&\$top=20" \
-  --query "value[].{testName:testCaseTitle, outcome:outcome, errorMessage:errorMessage, durationMs:durationInMs}" \
-  --output json 2>&1
+curl -sL -u ":$TOKEN" \
+  "$ORG_URL/$PROJECT/_apis/test/runs/$RUN_ID/results?outcomes=Failed&\$top=50&api-version=7.0" \
+  | jq '[.value[] | {testName: .testCaseTitle, outcome, error: .errorMessage}]'
+
+# Only run this if the run has unanalyzedTests > 0 (e.g. a crashed instrumentation run).
+curl -sL -u ":$TOKEN" \
+  "$ORG_URL/$PROJECT/_apis/test/runs/$RUN_ID/results?outcomes=NotExecuted&\$top=50&api-version=7.0" \
+  | jq '[.value[] | {testName: .testCaseTitle, outcome, error: .errorMessage}]'
 ```
 
-If the `errorMessage` is truncated or absent, you can fetch a single test result's full details:
+If `errorMessage` is truncated or missing, fetch the full result (errorMessage + stackTrace) for a specific test:
 
 ```bash
-az devops invoke --area test --resource results \
-  --route-parameters project=$PROJECT runId=$RUN_ID testId=$TEST_ID \
-  --org $ORG_URL \
-  --query "{testName:testCaseTitle, errorMessage:errorMessage, stackTrace:stackTrace}" \
-  --output json 2>&1
+curl -sL -u ":$TOKEN" \
+  "$ORG_URL/$PROJECT/_apis/test/runs/$RUN_ID/results/$TEST_ID?api-version=7.0" \
+  | jq '{name: .testCaseTitle, error: .errorMessage, stack: .stackTrace}'
 ```
 
+**Heuristic for diagnosing a crashed instrumentation:** if a run has `totalTests=1, passedTests=0, unanalyzedTests=1` and the lone result's title is "Possible Crash / …", the test APK process crashed before NUnit/xUnit could write results. Download the matching `Test Results - APKs …` build artifact and inspect `logcat-*.txt` for `Fatal signal` / `FATAL` / tombstone.
+
 #### Step 4 — Present summary
 
 Use this format — **one section per AZDO build**, each with its own progress and ETA:
@@ -257,7 +264,12 @@ Use this format — **one section per AZDO build**, each with its own progress a
 
 > ⚠️ Build still in progress, but **N tests have already failed** — you can start investigating these now.
 
-**If no failures found anywhere**, report CI as green and stop.
+**If no failures found anywhere**, report CI as green and stop. "No failures found anywhere" means **all three** of:
+1. No timeline records with `result == 'failed'`.
+2. For every test run returned by Step 3b, `passedTests == totalTests` (no Failed, NotExecuted, Inconclusive, or unanalyzed buckets).
+3. All required GitHub checks are `SUCCESS`.
+
+If a GitHub check is RED but the timeline shows no failures, **do not** report green — go back to Step 3b and enumerate test runs, since the failure is almost certainly hiding in a `succeededWithIssues` job's test results.
 
 ### Phase 2: Deep Investigation (only if user requests)