Skip to content

feat: add /api/overview endpoint for public usage statistics#3

Open
tym83 wants to merge 6 commits intocozystack:mainfrom
tym83:feat/overview-api
Open

feat: add /api/overview endpoint for public usage statistics#3
tym83 wants to merge 6 commits intocozystack:mainfrom
tym83:feat/overview-api

Conversation

@tym83
Copy link
Copy Markdown

@tym83 tym83 commented Apr 5, 2026

Summary

  • Add GET /api/overview endpoint that returns aggregated telemetry as JSON (clusters, nodes, tenants, per-app usage)
  • Monthly snapshot collection from VictoriaMetrics at 00:01 Pacific Time on the 1st of each month, plus initial collection on first startup
  • Dynamic app list fetched from cozystack/cozystack packages/apps to always include newly added applications
  • Three time periods: last month, last quarter (3 months), last 12 months
  • File-based snapshot storage backed by a PersistentVolumeClaim
  • Updated README with API documentation and configuration reference

Changes

  • overview.go — new file with snapshot collection, VM querying, GitHub app list fetching, aggregation, and HTTP handler
  • main.go — wire in new --vmselect-url and --snapshot-dir flags, register /api/overview route
  • charts/cozy-telemetry/templates/deployment.yaml — add volume mount, new CLI args, single replica with Recreate strategy (PVC requires RWO)
  • charts/cozy-telemetry/templates/pvc.yaml — new PVC for snapshot storage
  • charts/cozy-telemetry/values.yaml — add vmSelectURL, snapshotDir, and snapshot.storage config
  • README.md — document the new endpoint, response format, and all configuration flags

Test plan

  • Build Docker image and verify it starts without errors
  • Verify /api/overview returns valid JSON (or 503 if no snapshots yet)
  • Deploy to staging cluster and confirm snapshot collection works against VictoriaMetrics
  • Verify monthly cron scheduling logic by checking logs
  • Confirm GitHub API app list fetching works (and fallback to default list on rate limit)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a new /api/overview endpoint returning aggregated usage stats for last month, quarter, and 12 months with automatic monthly snapshots.
  • Documentation

    • README updated with API response shape, snapshot schedule/behavior, app-list retrieval, and configuration options (forwarding URL, listen address, VM select URL, snapshot directory).
  • Chores

    • Chart values and deployment updated to expose snapshot directory and VM URL; replica count adjusted.

Add monthly snapshot collection from VictoriaMetrics and a public
GET /api/overview JSON endpoint that aggregates telemetry into three
time periods (last month, last quarter, last 12 months).

- Query VM for cluster count, node count, tenant count, and per-app usage
- Fetch app list dynamically from GitHub cozystack/cozystack packages/apps
- Store monthly snapshots to disk (PVC-backed) on the 1st of each month
- Collect initial snapshot on first startup
- Add Helm chart PVC, vmselect URL, and snapshot directory config

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7163d0ca-1734-4245-8568-7970b0086581

📥 Commits

Reviewing files that changed from the base of the PR and between 7108344 and 9353f8d.

📒 Files selected for processing (2)
  • main.go
  • overview.go
✅ Files skipped from review due to trivial changes (1)
  • overview.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • main.go

📝 Walkthrough

Walkthrough

Adds a new /api/overview HTTP endpoint that returns aggregated telemetry snapshots, implements monthly snapshot creation/caching (disk + in-memory), queries VictoriaMetrics for metrics, fetches app kinds from GitHub, and exposes configuration via CLI flags and Helm chart values for vmselect URL and snapshot directory.

Changes

Cohort / File(s) Summary
Documentation
README.md
Added docs for the new GET /api/overview endpoint, monthly snapshot workflow (1st-of-month snapshot + initial startup snapshot), response schema (generated_at, periods with month/quarter/year), and configuration flags for forwarding, listen address, vmselect URL, and snapshot directory.
Helm chart
charts/cozy-telemetry/values.yaml, charts/cozy-telemetry/templates/deployment.yaml
Added config.vmSelectURL and config.snapshotDir, set replicaCount: 1, passed --vmselect-url and --snapshot-dir to the container, added a volumeMount for snapshots and an emptyDir volumes entry.
Application code
main.go, overview.go
Added CLI flags -vmselect-url and -snapshot-dir, created OverviewManager via NewOverviewManager(vmSelectURL, snapshotDir), registered /api/overview handler wrapped with http.TimeoutHandler (55s) to avoid inheriting the server WriteTimeout (server WriteTimeout remains 10s). New overview.go implements Snapshot/Period/Overview types, snapshot lifecycle (disk + in-memory), inflight coalescing, VictoriaMetrics PromQL queries, GitHub app-kind fetching (with fallback list), aggregation for month/quarter/year, and atomic snapshot persistence.

Sequence Diagram

sequenceDiagram
    participant Client
    participant API as "HTTP Server"
    participant OM as "OverviewManager"
    participant Cache as "InMemoryCache"
    participant FS as "FileSystem"
    participant VM as "VictoriaMetrics (vmselect)"
    participant GH as "GitHub (cozystack)"

    Client->>API: GET /api/overview?year=YYYY&month=MM
    API->>OM: HandleOverview(request)
    OM->>Cache: check snapshot for month
    alt snapshot in memory
        Cache-->>OM: snapshot
    else
        OM->>FS: check <snapshotDir>/<month>.json
        alt file exists
            FS-->>OM: read snapshot
        else
            OM->>VM: promql queries (clusters, nodes, tenants, apps by kind)
            OM->>GH: fetch packages/apps directory (app kinds)
            VM-->>OM: metric vectors/scalars
            GH-->>OM: app kind list (or fallback)
            OM->>FS: write snapshot atomically (tmp -> rename)
            OM->>Cache: store snapshot
        end
    end
    OM->>OM: aggregate snapshots (month/quarter/year)
    OM-->>API: JSON OverviewResponse
    API-->>Client: 200 OK + body
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibble at snapshots, byte by byte,
VictoriaMetrics hums through the night,
GitHub lends kinds, cache keeps them near,
Months roll to quarters and then to the year—
Telemetry carrots gleam in my sight! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately summarizes the primary change: adding a new public API endpoint for usage statistics.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a Public Statistics API to the telemetry server, enabling the collection and serving of aggregated usage statistics. Key additions include an OverviewManager for handling monthly snapshots, a persistent storage mechanism via PVC, and the /api/overview endpoint. Feedback focuses on correcting Go time formatting strings to ensure accurate date reporting, addressing potential timezone loading issues in minimal runtime environments, and optimizing HTTP client usage for better performance.

overview.go Outdated

if len(snapshots) == 1 {
stats.Label = latestDate.Format("January 2006")
stats.Start = latestDate.Format("2006-01-01")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The time format string 2006-01-01 is incorrect. In Go's time formatting, 01 is the placeholder for the month and 02 is for the day. Using 2006-01-01 will result in the month being repeated (e.g., 2024-03-03 for March). Since latestDate is already the first of the month, you should use 2006-01-02 to correctly output YYYY-MM-01.

Suggested change
stats.Start = latestDate.Format("2006-01-01")
stats.Start = latestDate.Format("2006-01-02")

Copy link
Copy Markdown
Author

@tym83 tym83 Apr 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 328eed6 — both format strings corrected to "2006-01-02".

overview.go Outdated
stats.Label = fmt.Sprintf("%s \u2014 %s",
oldestDate.Format("January 2006"),
latestDate.Format("January 2006"))
stats.Start = oldestDate.Format("2006-01-01")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The time format string 2006-01-01 is incorrect. It should be 2006-01-02 to correctly represent the first day of the month.

Suggested change
stats.Start = oldestDate.Format("2006-01-01")
stats.Start = oldestDate.Format("2006-01-02")

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 328eed6.

overview.go Outdated
Comment on lines +150 to +155
loc, err := time.LoadLocation("America/Los_Angeles")
if err != nil {
log.Printf("Warning: cannot load Pacific timezone, using UTC: %v", err)
return time.UTC
}
return loc
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

time.LoadLocation depends on the timezone database being present in the runtime environment. In minimal Docker images (like scratch or alpine without tzdata), this will fail and fallback to UTC. This would result in snapshots being collected at the wrong time relative to the Pacific Time mentioned in the documentation. Consider importing time/tzdata to embed the database in the binary.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 8bad2f4. Added import _ "time/tzdata" to embed the timezone database in the binary.

Comment on lines +68 to +73
type OverviewManager struct {
vmSelectURL string
snapshotDir string
mu sync.RWMutex
snapshots []Snapshot
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Creating a new http.Client for every request (as seen in queryVector and fetchAppList) is inefficient because it prevents connection reuse. It is recommended to share a single http.Client instance by adding it as a field to the OverviewManager struct and initializing it in the constructor.

Suggested change
type OverviewManager struct {
vmSelectURL string
snapshotDir string
mu sync.RWMutex
snapshots []Snapshot
}
type OverviewManager struct {
vmSelectURL string
snapshotDir string
httpClient *http.Client
mu sync.RWMutex
snapshots []Snapshot
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 8bad2f4httpClient is now a field on OverviewManager, initialized once in the constructor.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (5)
overview.go (3)

440-445: Unchecked error from JSON encoding.

The error from json.NewEncoder(w).Encode(overview) is ignored. While encoding a simple struct is unlikely to fail, checking the error is good practice.

♻️ Proposed fix
 	w.Header().Set("Content-Type", "application/json")
 	w.Header().Set("Access-Control-Allow-Origin", "*")
-	json.NewEncoder(w).Encode(overview)
+	if err := json.NewEncoder(w).Encode(overview); err != nil {
+		log.Printf("Error encoding overview response: %v", err)
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@overview.go` around lines 440 - 445, The JSON encoder error from
json.NewEncoder(w).Encode(overview) is ignored; capture its returned error and
handle it (e.g., log the error and return an HTTP 500). Locate the handler that
calls m.buildOverview(snapshots) to produce overview, replace the bare Encode
call with code that checks err := json.NewEncoder(w).Encode(overview) and on
non-nil err uses the existing logger (or http.Error) to report the failure and
set an appropriate status code before returning.

295-339: GitHub API rate limit consideration.

The anonymous GitHub API is limited to 60 requests/hour. While the fallback to defaultAppList() handles rate limiting gracefully, consider adding a User-Agent header (GitHub recommends this) and potentially caching the app list for longer periods.

♻️ Add User-Agent header (GitHub best practice)
 	req, err := http.NewRequest("GET", apiURL, nil)
 	if err != nil {
 		log.Printf("Error creating GitHub request: %v", err)
 		return defaultAppList()
 	}
 	req.Header.Set("Accept", "application/vnd.github.v3+json")
+	req.Header.Set("User-Agent", "cozystack-telemetry-server")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@overview.go` around lines 295 - 339, The fetchAppList function uses anonymous
GitHub API calls and lacks a User-Agent header and caching; update fetchAppList
to set a User-Agent header on req (e.g., req.Header.Set("User-Agent",
"<app-name-or-service>")) and implement a simple in-memory cache inside
OverviewManager (fields like cachedApps []string and cacheExpiry time.Time) so
subsequent calls return cachedApps until cacheExpiry, falling back to
defaultAppList() on error; keep existing error handling and sort/logging logic
and ensure cache is refreshed on successful fetch.

559-564: Remove unnecessary custom min function—Go 1.23+ provides a built-in.

The project targets Go 1.23.4, which includes the built-in min function introduced in Go 1.21. The custom definition at lines 559–564 shadows the built-in unnecessarily. Since both have the same signature for integers, replace calls to the custom min with the built-in function for clarity and reduced confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@overview.go` around lines 559 - 564, A custom min(a, b int) function is
shadowing the new built-in min; delete the custom func min definition in
overview.go and update the code to call the language-provided min(a, b) directly
everywhere (ensure no other local identifiers named min conflict). Remove the
function declaration for min and run `go vet`/build to confirm there are no
remaining references or shadowing collisions.
main.go (1)

156-158: Consider adding graceful shutdown for the OverviewManager.

The OverviewManager.Start() spawns background goroutines but there's no mechanism to stop them gracefully. When the server terminates, any in-progress snapshot collection could be interrupted. While os.WriteFile is atomic, adding a Stop() method with context cancellation would be cleaner.

♻️ Sketch for graceful shutdown

In overview.go, add a context and stop channel:

type OverviewManager struct {
    // ... existing fields
    ctx    context.Context
    cancel context.CancelFunc
}

func (m *OverviewManager) Stop() {
    m.cancel()
}

In main.go, handle signals:

overview := NewOverviewManager(*vmSelectURL, *snapshotDir)
overview.Start()
defer overview.Stop()

// Handle graceful shutdown
sigChan := make(chan os.Signal, 1)
signal.Notify(sigChan, syscall.SIGINT, syscall.SIGTERM)
go func() {
    <-sigChan
    overview.Stop()
    // shutdown server...
}()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@main.go` around lines 156 - 158, The OverviewManager starts background
goroutines via OverviewManager.Start() but lacks a cancellation mechanism; add
context cancellation to OverviewManager (fields ctx context.Context and cancel
context.CancelFunc set in NewOverviewManager), implement a Stop() method that
calls cancel, have goroutines select on ctx.Done() to exit cleanly, and in
main.go call overview.Stop() (and defer it) and wire a signal handler
(SIGINT/SIGTERM) to call overview.Stop() during shutdown so in-progress snapshot
work is canceled gracefully.
charts/cozy-telemetry/templates/pvc.yaml (1)

1-12: Consider making storageClassName configurable.

The PVC uses the cluster's default storage class. For production flexibility, consider allowing users to specify a storage class via values.

♻️ Optional: Add configurable storageClassName
 spec:
   accessModes:
     - ReadWriteOnce
+  {{- if .Values.snapshot.storageClassName }}
+  storageClassName: {{ .Values.snapshot.storageClassName }}
+  {{- end }}
   resources:
     requests:
       storage: {{ .Values.snapshot.storage }}

And in values.yaml:

snapshot:
  storage: 1Gi
  storageClassName: ""  # Uses cluster default if empty
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@charts/cozy-telemetry/templates/pvc.yaml` around lines 1 - 12, The
PersistentVolumeClaim "cozy-telemetry-snapshots" currently omits
storageClassName and thus always uses the cluster default; add a
spec.storageClassName field that sources its value from
.Values.snapshot.storageClassName (leave empty/null to use the cluster default)
alongside the existing .Values.snapshot.storage request, and update values.yaml
to include snapshot.storageClassName so operators can configure the PVC storage
class; ensure the new key name matches .Values.snapshot.storageClassName used in
the chart templates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@charts/cozy-telemetry/templates/deployment.yaml`:
- Around line 32-34: The deployment template hardcodes the mountPath
"/data/snapshots" under volumeMounts (name: snapshots) instead of using the
chart value; update the deployment.yaml volumeMounts entry for mountPath to
reference the Helm value config.snapshotDir (from values.yaml) so the pod mount
follows the configured snapshotDir (ensure the same Helm template variable is
used wherever snapshotDir is referenced to keep volume and application paths
consistent).
- Around line 8-10: The deployment hardcodes replicas: 1 while values.yaml
defines replicaCount, making the Helm value unused; either remove replicaCount
from charts/cozy-telemetry/values.yaml or update
charts/cozy-telemetry/templates/deployment.yaml to use the templated value
(e.g., replace the hardcoded replicas with .Values.replicaCount) and add a brief
comment above explaining the RWO storage constraint that requires single-replica
deployments when applicable; update whichever file contains
replicaCount/replicas to keep them consistent and document the reason.

In `@overview.go`:
- Around line 545-552: The parseMonth function currently swallows parse errors
and returns time.Now(), which can produce misleading labels; change
parseMonth(month string) to return (time.Time, error) instead of silently
falling back, call time.Parse and return the parsed time or the parse error, and
update every caller of parseMonth to handle the error (log a warning or
propagate it to the API response) rather than using time.Now(); this ensures
invalid input is surfaced and avoids incorrect date ranges in responses.
- Around line 158-227: collectSnapshot currently saves a snapshot even when all
metric queries fail (all-zero), which can overwrite valid prior data; modify
collectSnapshot to detect when no meaningful data was collected (e.g., track a
boolean like anySuccess set to true whenever
queryScalar/queryVector/fetchAppList returns usable results or snapshot fields
are non-zero/non-empty) and only call m.saveSnapshot(snapshot) if anySuccess (or
snapshot contains at least one non-zero field or apps). If no meaningful data
was collected, log a warning and skip saving to avoid overwriting previous
snapshots; use the existing functions queryScalar, queryVector, fetchAppList and
the Snapshot struct to determine success.

---

Nitpick comments:
In `@charts/cozy-telemetry/templates/pvc.yaml`:
- Around line 1-12: The PersistentVolumeClaim "cozy-telemetry-snapshots"
currently omits storageClassName and thus always uses the cluster default; add a
spec.storageClassName field that sources its value from
.Values.snapshot.storageClassName (leave empty/null to use the cluster default)
alongside the existing .Values.snapshot.storage request, and update values.yaml
to include snapshot.storageClassName so operators can configure the PVC storage
class; ensure the new key name matches .Values.snapshot.storageClassName used in
the chart templates.

In `@main.go`:
- Around line 156-158: The OverviewManager starts background goroutines via
OverviewManager.Start() but lacks a cancellation mechanism; add context
cancellation to OverviewManager (fields ctx context.Context and cancel
context.CancelFunc set in NewOverviewManager), implement a Stop() method that
calls cancel, have goroutines select on ctx.Done() to exit cleanly, and in
main.go call overview.Stop() (and defer it) and wire a signal handler
(SIGINT/SIGTERM) to call overview.Stop() during shutdown so in-progress snapshot
work is canceled gracefully.

In `@overview.go`:
- Around line 440-445: The JSON encoder error from
json.NewEncoder(w).Encode(overview) is ignored; capture its returned error and
handle it (e.g., log the error and return an HTTP 500). Locate the handler that
calls m.buildOverview(snapshots) to produce overview, replace the bare Encode
call with code that checks err := json.NewEncoder(w).Encode(overview) and on
non-nil err uses the existing logger (or http.Error) to report the failure and
set an appropriate status code before returning.
- Around line 295-339: The fetchAppList function uses anonymous GitHub API calls
and lacks a User-Agent header and caching; update fetchAppList to set a
User-Agent header on req (e.g., req.Header.Set("User-Agent",
"<app-name-or-service>")) and implement a simple in-memory cache inside
OverviewManager (fields like cachedApps []string and cacheExpiry time.Time) so
subsequent calls return cachedApps until cacheExpiry, falling back to
defaultAppList() on error; keep existing error handling and sort/logging logic
and ensure cache is refreshed on successful fetch.
- Around line 559-564: A custom min(a, b int) function is shadowing the new
built-in min; delete the custom func min definition in overview.go and update
the code to call the language-provided min(a, b) directly everywhere (ensure no
other local identifiers named min conflict). Remove the function declaration for
min and run `go vet`/build to confirm there are no remaining references or
shadowing collisions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1ec9abc8-3fab-4378-9765-8710241897ee

📥 Commits

Reviewing files that changed from the base of the PR and between d32e2b7 and 449475b.

📒 Files selected for processing (6)
  • README.md
  • charts/cozy-telemetry/templates/deployment.yaml
  • charts/cozy-telemetry/templates/pvc.yaml
  • charts/cozy-telemetry/values.yaml
  • main.go
  • overview.go

Go's time format uses 01 for month, 02 for day. The Start field format
was "2006-01-01" (year-month-month) instead of "2006-01-02" (year-month-day).
Also fix parseMonth to parse with explicit day suffix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
overview.go (2)

83-100: Consider adding graceful shutdown support for background goroutines.

The Start() method spawns two long-running goroutines with no mechanism to stop them. When the process receives SIGTERM, these goroutines are abandoned without cleanup. While this may be acceptable for a simple scheduler, it's better practice to support graceful shutdown.

♻️ Proposed refactor: Add context-based shutdown
 type OverviewManager struct {
 	vmSelectURL string
 	snapshotDir string
 	mu          sync.RWMutex
 	snapshots   []Snapshot
+	stopCh      chan struct{}
 }

 func NewOverviewManager(vmSelectURL, snapshotDir string) *OverviewManager {
 	return &OverviewManager{
 		vmSelectURL: vmSelectURL,
 		snapshotDir: snapshotDir,
+		stopCh:      make(chan struct{}),
 	}
 }

-func (m *OverviewManager) Start() {
+func (m *OverviewManager) Start() {
 	// ... existing setup ...
 	go func() {
-		time.Sleep(30 * time.Second)
+		select {
+		case <-time.After(30 * time.Second):
+		case <-m.stopCh:
+			return
+		}
 		m.collectIfNeeded()
 	}()
 	go m.scheduleMonthly()
 }
+
+func (m *OverviewManager) Stop() {
+	close(m.stopCh)
+}

Then wire up signal handling in main.go to call overview.Stop() on SIGTERM/SIGINT.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@overview.go` around lines 83 - 100, Start() currently launches background
goroutines (scheduleMonthly and the collectIfNeeded launcher) with no way to
stop them; add context-based shutdown by giving OverviewManager a
context.CancelFunc (or context.Context + cancel) and a sync.WaitGroup, update
Start to create a cancellable context and increment the WaitGroup for each
goroutine, pass the context into scheduleMonthly and collectIfNeeded (or make
them check ctx.Done()) so they exit when canceled, implement a Stop() method
that calls cancel and waits on the WaitGroup to ensure graceful shutdown, and
wire main.go to call overview.Stop() on SIGINT/SIGTERM.

458-463: Consider using built-in min function instead of custom implementation.

The project uses Go 1.23.4, which supports the built-in min function introduced in Go 1.21. The custom min function defined at lines 559-562 can be removed, and calls to min() at lines 461 and 464 (and corresponding lines in the similar logic) can use Go's built-in instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@overview.go` around lines 458 - 463, The code defines a custom min function
but the project uses Go 1.23 which already provides a built-in min; remove the
custom min implementation (the function around lines 559-562) and change calls
like quarterCount := min(3, len(snapshots)) and yearCount := min(12,
len(snapshots)) (used when preparing resp.Periods and in similar logic that
calls aggregateSnapshots) to use the language-provided min directly (no extra
import needed); run go build/tests to verify no references to the removed custom
min remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@overview.go`:
- Around line 442-445: The JSON encoding call
json.NewEncoder(w).Encode(overview) currently ignores its returned error after
headers are sent; update the code to capture that error, e.g. err :=
json.NewEncoder(w).Encode(overview), and if err != nil log the failure
(including err and context like the overview payload or request info) using the
project's logger (or log.Printf) so encoding/writing failures are recorded for
debugging while leaving headers/status unchanged.

---

Nitpick comments:
In `@overview.go`:
- Around line 83-100: Start() currently launches background goroutines
(scheduleMonthly and the collectIfNeeded launcher) with no way to stop them; add
context-based shutdown by giving OverviewManager a context.CancelFunc (or
context.Context + cancel) and a sync.WaitGroup, update Start to create a
cancellable context and increment the WaitGroup for each goroutine, pass the
context into scheduleMonthly and collectIfNeeded (or make them check ctx.Done())
so they exit when canceled, implement a Stop() method that calls cancel and
waits on the WaitGroup to ensure graceful shutdown, and wire main.go to call
overview.Stop() on SIGINT/SIGTERM.
- Around line 458-463: The code defines a custom min function but the project
uses Go 1.23 which already provides a built-in min; remove the custom min
implementation (the function around lines 559-562) and change calls like
quarterCount := min(3, len(snapshots)) and yearCount := min(12, len(snapshots))
(used when preparing resp.Periods and in similar logic that calls
aggregateSnapshots) to use the language-provided min directly (no extra import
needed); run go build/tests to verify no references to the removed custom min
remain.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 807114c4-7b3c-44f5-83aa-66d87a0ae2dd

📥 Commits

Reviewing files that changed from the base of the PR and between 449475b and 328eed6.

📒 Files selected for processing (1)
  • overview.go

- Import time/tzdata to embed timezone database for scratch/alpine images
- Share single http.Client across OverviewManager instead of per-request
- Skip saving snapshot when all values are zero (VM unreachable)
- Log parseMonth errors and json.Encode failures
- Remove unused replicaCount from values.yaml
- Template volumeMount path from config.snapshotDir

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@overview.go`:
- Around line 127-149: nextFirstOfMonth currently always jumps to the following
month which causes a startup race where a process that starts on the 1st just
before 00:01 PT will miss the prior month's snapshot; update
nextFirstOfMonth(now) to compute the candidate for the current month's 1st at
00:01 (time.Date(year, month, 1, 0, 1, 0, 0, now.Location())) and return that
candidate if now is before it, otherwise return the candidate for the next
month, and also modify scheduleMonthly() so that when the process starts on day
1 and now is before the candidate (i.e., before 00:01) it immediately calls
m.collectSnapshot(prevMonth) for the just-ended month and then advances next to
the following month's 00:01 to avoid double-collecting.
- Around line 359-368: In saveSnapshot, avoid truncating the target month file
in place; instead write atomically by creating a temp file in m.snapshotDir
(e.g. using os.CreateTemp), write the JSON data to it, fsync/close the temp
file, then rename it to filepath.Join(m.snapshotDir, s.Month+".json") with
os.Rename so the replacement is atomic; also fsync the directory if possible.
Reference: saveSnapshot, m.snapshotDir, s.Month, and ensure loadSnapshots
continues to read the final file.
- Around line 463-472: The current logic uses simple slice lengths to build
periods (resp.Periods["month"/"quarter"/"year"]) which selects the newest N
snapshots rather than the last N calendar months; implement a calendar-aware
filter (e.g. filterSnapshotsByMonths(snapshots []Snapshot, months int) that
parses Snapshot.Month into time.Time, computes a cutoff from the latest
snapshot, and returns only snapshots whose month >= cutoff) and replace the
slice-based calls (aggregateSnapshots(snapshots[:1], ...),
aggregateSnapshots(snapshots[:quarterCount], ...),
aggregateSnapshots(snapshots[:yearCount], ...)) with
aggregateSnapshots(filterSnapshotsByMonths(snapshots, 1/3/12), ...) so
quarter/year cover the actual last 3/12 calendar months even when months are
missing in the data.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3d5a85f-dc8b-4998-ba52-738113fbeccc

📥 Commits

Reviewing files that changed from the base of the PR and between 328eed6 and 8bad2f4.

📒 Files selected for processing (3)
  • charts/cozy-telemetry/templates/deployment.yaml
  • charts/cozy-telemetry/values.yaml
  • overview.go

- nextFirstOfMonth: if pod starts on 1st before 00:01, return current
  month's run time instead of skipping to next month
- saveSnapshot: atomic write via temp file + fsync + rename to prevent
  partial files on crash/eviction
- buildOverview: use calendar-range filter for quarter/year instead of
  slice length, so gaps don't produce wrong date ranges

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
app: cozy-telemetry
spec:
replicas: {{ .Values.replicaCount }}
replicas: 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and PVC will make telepetry-server unable to run in multiple replicas, if you want to cache the state, it might be worth of using emptyDir: {} storage.

- Remove PVC and Recreate strategy; use emptyDir so the server can run
  with multiple replicas
- Restore replicaCount to values.yaml
- /api/overview now requires ?year=YYYY&month=MM params; requests
  without them are rejected with 400 to prevent public endpoint abuse
- Snapshots are generated on the fly from VictoriaMetrics at the end of
  the requested month and cached in emptyDir for subsequent requests
- Remove background scheduler and startup auto-collection; all data
  fetching is driven by incoming requests
- Add &time= param to VM queries to support historical month lookups
- Increase server WriteTimeout to 60s to accommodate on-demand VM queries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
-e
Signed-off-by: tym83 <6355522@gmail.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
overview.go (1)

279-322: Consider adding a User-Agent header for GitHub API requests.

GitHub's API documentation recommends setting a User-Agent header. While the fallback to defaultAppList() handles rate-limit or error scenarios gracefully, adding a descriptive User-Agent can help avoid issues with unauthenticated requests.

🔧 Proposed fix
 	req.Header.Set("Accept", "application/vnd.github.v3+json")
+	req.Header.Set("User-Agent", "cozystack-telemetry-server")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@overview.go` around lines 279 - 322, The fetchAppList method in
OverviewManager should set a User-Agent header on the GitHub API request to
comply with GitHub recommendations and reduce chances of being rejected; after
creating the request in fetchAppList (the req from http.NewRequest) add a
descriptive User-Agent (e.g., include your service name and version like
"cozystack-overview/<version>" or similar) in addition to the existing Accept
header before calling m.httpClient.Do, ensuring the header is consistently
applied for all GitHub calls from this function.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@overview.go`:
- Around line 279-322: The fetchAppList method in OverviewManager should set a
User-Agent header on the GitHub API request to comply with GitHub
recommendations and reduce chances of being rejected; after creating the request
in fetchAppList (the req from http.NewRequest) add a descriptive User-Agent
(e.g., include your service name and version like "cozystack-overview/<version>"
or similar) in addition to the existing Accept header before calling
m.httpClient.Do, ensuring the header is consistently applied for all GitHub
calls from this function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9a36397b-3fd3-4996-b4de-c33c683ee2d9

📥 Commits

Reviewing files that changed from the base of the PR and between 8bad2f4 and 7108344.

📒 Files selected for processing (4)
  • charts/cozy-telemetry/templates/deployment.yaml
  • charts/cozy-telemetry/values.yaml
  • main.go
  • overview.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • main.go
  • charts/cozy-telemetry/values.yaml

- Add per-month singleflight via WaitGroup map to prevent thundering
  herd: concurrent requests for the same uncached month now coalesce
  into a single VM query instead of N parallel ones
- Cap VictoriaMetrics response body at 10 MB with io.LimitReader to
  prevent OOM on malformed/hostile responses
- Revert global WriteTimeout to 10s; wrap only the overview handler
  with http.TimeoutHandler(55s) so slow VM queries don't weaken the
  slow-client protection on the telemetry ingestion endpoint

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
-e
Signed-off-by: tym83 <6355522@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants