
fix analytics queries #1141

Merged

BilalG1 merged 15 commits into dev from fix-analytics-queries on Jan 29, 2026

Conversation

Collaborator

@BilalG1 BilalG1 commented Jan 28, 2026

Summary by CodeRabbit

  • Improvements

    • Added automatic retry with stricter result validation and clearer error handling for query profiling to improve reliability.
  • Chores

    • CI workflows updated to use larger runner instances (upgraded runner size across relevant jobs).



vercel Bot commented Jan 28, 2026

The latest updates on your projects.

Project | Deployment | Review | Updated (UTC)
stack-backend | Ready | Preview, Comment | Jan 29, 2026 5:51pm
stack-dashboard | Ready | Preview, Comment | Jan 29, 2026 5:51pm
stack-demo | Ready | Preview, Comment | Jan 29, 2026 5:51pm
stack-docs | Ready | Preview, Comment | Jan 29, 2026 5:51pm

Contributor

coderabbitai Bot commented Jan 28, 2026

📝 Walkthrough

Add a retry wrapper around ClickHouse profiling queries to enforce exactly one result with backoff retries; update several GitHub Actions workflows to use ubicloud-standard-16 runners (runner image bump only).

Changes

  • ClickHouse Query Resilience (apps/backend/src/lib/clickhouse.tsx): Extracted the query into an internal queryProfile function and added a retry loop with delays of [75, 150, 300, 600, 1200] ms. The loop enforces a single-row result (returning the row), throws on multiple rows, and throws if there are zero results after all retries. Explicit result typing and JSON extraction were added.

  • CI Workflows, runner bump (.github/workflows/restart-dev-and-test.yaml, .github/workflows/restart-dev-and-test-with-custom-base-port.yaml, .github/workflows/setup-tests.yaml, .github/workflows/setup-tests-with-custom-base-port.yaml): Changed the job runner from ubicloud-standard-8 to ubicloud-standard-16 in each workflow; no other workflow logic or steps were modified.
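The retry behavior summarized above can be sketched generically. This is a hypothetical helper, not the PR's actual code: the function name and the stand-in runQuery callback are invented for illustration, while the delay sequence and the exactly-one-row rule come from the walkthrough.

```typescript
// Sketch of the retry-with-backoff pattern: re-run a query until it yields
// exactly one row, waiting 75/150/300/600/1200 ms between attempts.
// `runQuery` is a stand-in for the real ClickHouse profiling call.
const RETRY_DELAYS_MS = [75, 150, 300, 600, 1200];

async function queryExactlyOne<T>(runQuery: () => Promise<T[]>): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const rows = await runQuery();
    if (rows.length > 1) {
      // More than one row indicates a logic error, so fail immediately.
      throw new Error(`expected exactly one row, got ${rows.length}`);
    }
    if (rows.length === 1) return rows[0];
    // Zero rows: back off and retry, or give up once delays are exhausted.
    if (attempt >= RETRY_DELAYS_MS.length) {
      throw new Error("no result after all retries");
    }
    await new Promise((resolve) => setTimeout(resolve, RETRY_DELAYS_MS[attempt]));
  }
}
```

The real queryProfile additionally types the result rows and extracts them from the ClickHouse JSON response; this sketch only shows the control flow.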

Sequence Diagram(s)

(Skipped)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibbled code beneath the moon,
Tossed retries in a careful tune,
One row to fetch, no noisy throng,
Backoffs tapped a patient song,
CI woke up on a newer rune.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Description check (⚠️ Warning): The pull request description is empty beyond the contribution template comment and explains neither the analytics query changes, the retry logic, nor the rationale for the infrastructure updates. Resolution: add a description covering the analytics query fix (retry mechanism, error handling), the reason for upgrading runner images, and the impact of these changes.

  • Title check (❓ Inconclusive): The title "fix analytics queries" only partially matches the changeset: the backend analytics query logic gained a retry mechanism, but most of the changes update GitHub Actions runner images, which the title does not reflect. Resolution: broaden the title to cover both the analytics query fix and the infrastructure changes, or confirm that the runner upgrades are secondary to the analytics fix.

✅ Passed checks (1 passed)

  • Docstring Coverage (✅ Passed): No functions were found in the changed files, so the docstring coverage check was skipped.


Contributor

greptile-apps Bot commented Jan 28, 2026

Greptile Summary

Modified the getQueryTimingStats function to query across all ClickHouse cluster replicas using clusterAllReplicas('default', system.query_log) instead of directly querying system.query_log.

Key Changes:

  • Query now uses clusterAllReplicas('default', system.query_log) to fetch query timing statistics across all replicas in a clustered ClickHouse setup

Concerns:

  • The change assumes a cluster named 'default' exists, which may not be true in all environments
  • The dev docker-compose configuration runs ClickHouse as a single instance without clustering
  • No tests were added to verify this works in both clustered and non-clustered environments
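The first concern could be probed with a cheap pre-flight query against ClickHouse's built-in system.clusters table before relying on clusterAllReplicas. This is an illustrative sketch, not part of the PR; the helper name is invented.

```typescript
// Illustrative guard for the "cluster named 'default' may not exist" concern:
// build a query that counts matching rows in ClickHouse's system.clusters
// table. A result of 0 means the cluster is not configured, so the caller
// should fall back to querying system.query_log directly on the single node.
function clusterExistsQuery(clusterName: string): string {
  return `SELECT count() AS n FROM system.clusters WHERE cluster = '${clusterName}'`;
}
```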

Confidence Score: 2/5

  • This PR may break analytics functionality in non-clustered ClickHouse environments
  • The change assumes ClickHouse is running in cluster mode with a cluster named 'default', but the dev environment and potentially other deployments run ClickHouse as a single instance. This could cause query failures. No tests verify compatibility across different deployment configurations.
  • apps/backend/src/lib/clickhouse.tsx requires verification that the cluster query works in all environments

Important Files Changed

  • apps/backend/src/lib/clickhouse.tsx: Changed the query from system.query_log to clusterAllReplicas('default', system.query_log) for distributed querying; this may break in non-clustered environments.

Sequence Diagram

sequenceDiagram
    participant Client
    participant getQueryTimingStats
    participant ClickHouse as ClickHouse Cluster
    participant QueryLog as system.query_log

    Client->>getQueryTimingStats: Request timing stats for queryId
    getQueryTimingStats->>ClickHouse: SYSTEM FLUSH LOGS (admin auth)
    ClickHouse-->>getQueryTimingStats: Logs flushed
    getQueryTimingStats->>ClickHouse: Query clusterAllReplicas('default', system.query_log)
    ClickHouse->>QueryLog: Fetch from all replicas
    QueryLog-->>ClickHouse: ProfileEvents data
    ClickHouse-->>getQueryTimingStats: cpu_time_ms, wall_clock_time_ms
    alt Query log entry found
        getQueryTimingStats-->>Client: Return timing stats
    else No entry found
        getQueryTimingStats-->>Client: Throw StackAssertionError
    end

Contributor

@greptile-apps greptile-apps Bot left a comment


1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

Comment thread apps/backend/src/lib/clickhouse.tsx Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/backend/src/lib/clickhouse.tsx (1)

52-58: Make cluster name configurable and align log flush with clustered query.

Hardcoding clusterAllReplicas('default', system.query_log) will fail on deployments with non-default cluster names, and the local SYSTEM FLUSH LOGS doesn't flush logs from other replicas, potentially missing query statistics. Configure the cluster name via environment variable and use SYSTEM FLUSH LOGS ON CLUSTER when clustering is enabled.

🔧 Proposed fix (configurable cluster + cluster-aware flush)
 const clickhouseExternalPassword = getEnvVariable("STACK_CLICKHOUSE_EXTERNAL_PASSWORD", "");
 const clickhouseDefaultDatabase = getEnvVariable("STACK_CLICKHOUSE_DATABASE", "default");
+const clickhouseCluster = getEnvVariable("STACK_CLICKHOUSE_CLUSTER", "");
 const HAS_CLICKHOUSE = !!clickhouseUrl && !!clickhouseAdminPassword && !!clickhouseExternalPassword;
   await client.exec({
-    query: "SYSTEM FLUSH LOGS",
+    query: clickhouseCluster
+      ? `SYSTEM FLUSH LOGS ON CLUSTER '${clickhouseCluster}'`
+      : "SYSTEM FLUSH LOGS",
     auth: {
       username: clickhouseAdminUser,
       password: clickhouseAdminPassword,
     },
   });
+  const queryLogSource = clickhouseCluster
+    ? `clusterAllReplicas('${clickhouseCluster}', system.query_log)`
+    : "system.query_log";
   const profile = await client.query({
     query: `
     SELECT
       ProfileEvents['CPUTimeMicroseconds'] / 1000 AS cpu_time_ms,
       ProfileEvents['RealTimeMicroseconds'] / 1000 AS wall_clock_time_ms
-    FROM clusterAllReplicas('default', system.query_log)
+    FROM ${queryLogSource}
     WHERE query_id = {query_id:String} AND type = 'QueryFinish'
     ORDER BY event_time DESC
     LIMIT 1
   `,

Comment thread apps/backend/src/lib/clickhouse.tsx
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@apps/backend/src/lib/clickhouse.tsx`:
- Around lines 51-67: The queryProfile function's ClickHouse query hard-codes clusterAllReplicas('default', ...) and risks per-replica LIMIT pushdown. Update the client.query call to use the configured cluster name (the existing cluster variable, or an env var such as CLICKHOUSE_CLUSTER) instead of 'default', and append "SETTINGS distributed_push_down_limit = 0" to the SQL so the LIMIT is applied on the initiator rather than on each replica.
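A minimal sketch of what that suggestion could look like as a query builder. The function name and the undefined-means-single-node convention are assumptions for illustration; the SQL itself follows the PR's query and the bot's suggested SETTINGS clause.

```typescript
// Build the profiling query against a configurable cluster. When a cluster
// name is provided, read system.query_log from all replicas and keep the
// LIMIT on the initiator via distributed_push_down_limit = 0; otherwise
// query the local table directly (single-node deployments).
function buildProfileQuery(clusterName: string | undefined): string {
  const source = clusterName
    ? `clusterAllReplicas('${clusterName}', system.query_log)`
    : "system.query_log";
  return `
    SELECT
      ProfileEvents['CPUTimeMicroseconds'] / 1000 AS cpu_time_ms,
      ProfileEvents['RealTimeMicroseconds'] / 1000 AS wall_clock_time_ms
    FROM ${source}
    WHERE query_id = {query_id:String} AND type = 'QueryFinish'
    ORDER BY event_time DESC
    LIMIT 1
    ${clusterName ? "SETTINGS distributed_push_down_limit = 0" : ""}
  `;
}
```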

Comment thread apps/backend/src/lib/clickhouse.tsx
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/e2e/tests/backend/endpoints/api/v1/analytics-events.test.ts (1)

64-70: Fix inconsistent team_id expectation in the second test.

The backend explicitly sets team_id: null for token refresh events (see apps/backend/src/lib/events.tsx:255), yet the "cannot read events from other projects" test expects team_id: "" (line 55). The first test correctly expects null. Update the snapshot in the second test to match the backend behavior:

Snapshot change required
{
  "branch_id": "main",
  "event_type": "$token-refresh",
  "project_id": "<stripped UUID>",
- "team_id": "",
+ "team_id": null,
  "user_id": "<stripped UUID>",
},
🧹 Nitpick comments (1)
apps/e2e/tests/backend/endpoints/api/v1/analytics-events.test.ts (1)

141-152: Consider typing the row parameter instead of using any.

Lines 143 and 152 use (row: any) without explanation. Per coding guidelines, avoid any or leave a comment explaining why the type system fails. A simple inline type would improve type safety:

type AnalyticsRow = { user_id: string; event_type: string; project_id: string; branch_id: string; team_id: string | null };
♻️ Suggested improvement
+type AnalyticsRow = { user_id: string; event_type: string; project_id: string; branch_id: string; team_id: string | null };
+
 // Then use:
-  expect(userAResults.every((row: any) => row.user_id === userA)).toBe(true);
+  expect(userAResults.every((row: AnalyticsRow) => row.user_id === userA)).toBe(true);

As per coding guidelines: "Avoid the any type. When necessary, leave a comment explaining why it's being used."

@BilalG1 BilalG1 requested a review from N2D4 January 29, 2026 00:11
@BilalG1 BilalG1 assigned N2D4 and unassigned BilalG1 Jan 29, 2026
Comment thread apps/e2e/tests/backend/endpoints/api/v1/analytics-events.test.ts Outdated
Comment thread apps/backend/src/lib/clickhouse.tsx Outdated
Comment thread apps/e2e/tests/backend/endpoints/api/v1/analytics-events.test.ts Outdated
@github-actions github-actions Bot assigned BilalG1 and unassigned N2D4 Jan 29, 2026
Comment thread .github/workflows/setup-tests.yaml Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/workflows/setup-tests.yaml:
- Line 46: The workflow step runs "pnpm test run --reporter=verbose", which passes an extra "run" token to the test script. Change it to "pnpm test --reporter=verbose" (or "pnpm run test --reporter=verbose") so the package script is invoked correctly.

Comment thread .github/workflows/setup-tests.yaml Outdated
@BilalG1 BilalG1 merged commit 7b5cf4f into dev Jan 29, 2026
27 of 29 checks passed
@BilalG1 BilalG1 deleted the fix-analytics-queries branch January 29, 2026 18:22
@coderabbitai coderabbitai Bot mentioned this pull request Feb 2, 2026