ClickHouse usage analytics: events/gauges tables with daily MV#3
ClickHouse usage analytics: events/gauges tables with daily MV#3lohanidamodar wants to merge 94 commits intomainfrom
Conversation
- Database adapter - ClickHouse adapter
- Removed hardcoded column definitions in Usage class, replacing with dynamic schema derived from SQL adapter. - Introduced new Query class for building ClickHouse queries with fluent interface. - Added support for advanced query operations including find and count methods. - Enhanced error handling and SQL injection prevention mechanisms. - Created comprehensive usage guide for ClickHouse adapter. - Added unit tests for Query class to ensure functionality and robustness. - Maintained backward compatibility with existing methods while improving overall architecture.
…metric logging with deterministic IDs
…ed tags in ClickHouse and Database adapters
…pdate tests for new behavior
- Refactor Metric class from mutable ArrayObject to immutable DTO using constructor property promotion with readonly properties (review #5) - Add Metric::fromArray() factory for constructing from associative arrays - Store extra dimensions (path, method, status, etc.) in extras map - Remove setUseFinal option; always use FINAL in ClickHouse queries (review #6) - Remove unused Metric import from ClickHouse adapter - Update all Metric tests to use new constructor and fromArray() factory Review items already addressed in prior commits: projectId->tenant (#1), region removal (#2), no purgeExpired on facade (#3), no HTTP log methods on Metric (#4), composer test script (#7). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… aggregation - Replace SummingMergeTree + ReplacingMergeTree with single MergeTree table - Remove period column and fan-out; add type column (event/gauge) - Raw append with UUID IDs instead of deterministic dedup - Add SummingMergeTree materialized view for billing (events only) - Query-time aggregation: SUM for events, argMax for gauges - New methods: addBatch, getTimeSeries, getTotal, getTotalBatch - Remove: increment, set, incrementBatch, setBatch, all *ByPeriod* methods - Single collect() method with type parameter replaces collect/collectSet Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add Usage::TYPE_EVENT and Usage::TYPE_GAUGE constants - Use constants in validation and type comparisons - Add SummingMergeTree daily aggregation MV for events (toStartOfDay) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3b32515 to
15112a0
Compare
- Change tenant type from ?int to ?string everywhere (Adapter, Usage, ClickHouse, Database, Metric) - ClickHouse tenant column: Nullable(String) instead of Nullable(UInt64) - Fix tenant key mismatch: validateMetricsBatch now checks '$tenant' matching resolveTenantFromMetric - Fix MV GROUP BY: conditional on sharedTables (no tenant column when sharedTables=false) - Fix billing/daily target tables: conditional tenant column and ORDER BY - Add collect() validation: empty metric name and negative value checks - Fix ltrim() misuse in buildWhereClause: getTenantFilter now returns bare condition without ' AND ' prefix - Fix SQL.php: replace 'Audit' references with 'Usage', remove unused Database import - Fix parseQueries: use Int64 param type for value attribute instead of String - Rewrite all tests (UsageBase, ClickHouseTest, MetricTest) for new API: replace increment/set/period-based methods with addBatch/collect/ getTotal/getTotalBatch/getTimeSeries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove $useFinal property and setUseFinal() (MergeTree doesn't support FINAL) - Remove buildTableReference $useFinal param - Fix resolveTenantFromMetric mixed type handling - Remove unreachable branch in Database::getTotal() - Remove always-true count check in addBatch Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove separate billing table and MV (monthly aggregation) - Daily MV uses same column definitions as source table - Billing queries use daily table (SUM over daily aggregated rows) - Only events are pre-aggregated; gauges query raw table Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Split the single MergeTree table into two separate tables: - Events table with dedicated columns for path, method, status, resource, resourceId - Gauges table with simple metric/value/time/tags schema Event-specific columns are automatically extracted from tags during addBatch. The daily SummingMergeTree MV now aggregates by metric, resource, resourceId. All read methods accept an optional $type parameter to target specific tables, with null querying both tables transparently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Query the pre-aggregated daily SummingMergeTree table for fast billing/analytics instead of scanning raw events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- country: LowCardinality(Nullable(String)) for efficient low-cardinality storage - userAgent: Nullable(String) with bloom filter index - Both extracted from tags into dedicated columns like other event fields - Added getCountry() and getUserAgent() getters on Metric Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Daily table only has metric, value, time, resource, resourceId, tenant. No path/status/userAgent/country/tags — those don't aggregate meaningfully. MV groups by metric, resource, resourceId, tenant, day. ORDER BY includes resource and resourceId for efficient billing queries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use Metric::EVENT_COLUMNS to extract all event columns from tags instead of hardcoding the list. Now country and userAgent are properly stored in dedicated columns instead of being left in tags JSON. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Daily table now only has metric, value, time, tenant. One row per metric per project per day — optimal for billing. Resource-level breakdown queries the raw events table directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single query with GROUP BY metric for summing multiple metrics from the daily table. Returns array<string, int>. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| if ($eventTotal > 0 && $gaugeTotal > 0) { | ||
| // A metric shouldn't be in both tables; return whichever is nonzero | ||
| // In practice, callers specify type for ambiguous cases | ||
| return $eventTotal + $gaugeTotal; | ||
| } | ||
|
|
||
| return $eventTotal > 0 ? $eventTotal : $gaugeTotal; |
There was a problem hiding this comment.
getTotal() sums semantically incompatible aggregations
The inline comment says "return whichever is nonzero" but the code actually returns $eventTotal + $gaugeTotal. These two values are computed with fundamentally different aggregation semantics — SUM(value) for events and argMax(value, time) for gauges — so adding them together produces a number with no valid meaning.
In the common case where a metric name only exists in one table this is harmless (one of the two is 0). But if a metric name were ever present in both tables (e.g. during a data migration or a misconfigured adapter), callers would silently receive a corrupted total without any warning.
Consider throwing (or at least returning the dominant value) instead:
if ($eventTotal > 0 && $gaugeTotal > 0) {
// Metric exists in both tables — this is unexpected; callers should specify $type.
throw new \RuntimeException(
"Metric '{$metric}' found in both events and gauges tables. Specify \$type explicitly."
);
}Or, if silent fallback is preferred, at least fix the comment to match what the code does.
| foreach ($json['data'] as $row) { | ||
| $metricName = $row['metric'] ?? ''; | ||
| $bucketTime = $row['bucket'] ?? ''; | ||
| $value = (int) ($row['agg_value'] ?? 0); |
There was a problem hiding this comment.
getTimeSeriesFromTable() truncates float bucket values to int
$value = (int) ($row['agg_value'] ?? 0) will silently truncate any fractional part returned by ClickHouse. For gauge metrics the aggregation expression is argMax(value, time), and for event metrics it is SUM(value). Large event sums can exceed the range of a PHP int on 32-bit hosts, and gauge values can legitimately be fractional (e.g. storage in fractional GB) if the schema ever changes to Float64.
The declared return type in the docblock is array{total: int, data: array<array{value: int, date: string}>}, which locks in int — but the abstract Adapter interface at the getTimeSeries() level does not restrict callers to integer-only metrics. Consider casting to (float) internally and widening the return type, or at minimum documenting the truncation explicitly so future schema changes surface the discrepancy immediately.
Updated for events/gauges split, event-specific columns, daily MV, query-time aggregation, billing methods, and complete API reference with examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…country/userAgent - Remove stale 'type' key from addBatch() @param array shape in Usage.php, Adapter.php, Database.php - Fix mixed-to-string cast in ClickHouse.php event column extraction with type-safe checks - Reduce path size from 1024 to 255 and userAgent size from 512 to 255 in Metric::getEventSchema() to stay within MySQL 768-byte index limit - Update MetricTest assertions: 11 attributes, 9 indexes, 7 EVENT_COLUMNS - Update ClickHouseTest: userAgent/country are now event columns, not tags Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add UsageQuery class extending Query with a custom groupByInterval method that enables time-bucketed aggregated queries. When present in the queries array, the ClickHouse adapter switches from raw row returns to aggregated results grouped by time bucket (SUM for events, argMax for gauges). Supported intervals: 1m, 5m, 15m, 1h, 1d, 1w, 1M. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| $parsed = $this->parseQueries($queries, Usage::TYPE_EVENT); | ||
| $whereData = $this->buildWhereClause($parsed['filters'], $parsed['params']); |
There was a problem hiding this comment.
Event-specific column filters silently break
findDaily/sumDaily/sumDailyBatch
parseQueries($queries, Usage::TYPE_EVENT) validates query attributes against the full event schema (which includes path, method, status, resource, resourceId, userAgent), so no exception is thrown client-side. However the daily table only contains metric, value, time, and optionally tenant. A caller passing Query::equal('path', '/v1/...') produces WHERE \path` = {param_0:String}` against the daily table, causing a ClickHouse "No such column 'path'" server error at runtime.
The same issue affects sumDaily() (line 1761) and sumDailyBatch() (line 1802). A lightweight fix is to validate the attribute name against the daily table's actual column set before building the clause:
private const DAILY_TABLE_COLUMNS = ['metric', 'value', 'time', 'tenant'];
// in findDaily/sumDaily/sumDailyBatch, before parseQueries:
foreach ($queries as $query) {
$attr = $query->getAttribute();
if ($attr !== '' && !in_array($attr, self::DAILY_TABLE_COLUMNS, true)) {
throw new \InvalidArgumentException(
"Column '{$attr}' does not exist in the daily table. Allowed: "
. implode(', ', self::DAILY_TABLE_COLUMNS)
);
}
}| if ($type === 'event') { | ||
| // Additive: sum values for the same metric | ||
| if (isset($this->buffer[$key])) { | ||
| $this->buffer[$key]['value'] += $value; | ||
| } else { | ||
| $this->buffer[$key] = [ | ||
| 'metric' => $metric, | ||
| 'value' => $value, | ||
| 'type' => $type, | ||
| 'tags' => $tags, | ||
| ]; | ||
| } |
There was a problem hiding this comment.
collect() silently drops event tags on repeated same-metric calls, corrupting path/method attribution
The buffer key is $metric . ':' . $type. When two events for the same metric name arrive with different tags — e.g. collect('bandwidth', 100, 'event', ['path' => '/v1/files']) followed by collect('bandwidth', 200, 'event', ['path' => '/v1/functions']) — only the value is summed; the second call's tags are silently discarded. The single ClickHouse row written at flush time carries path='/v1/files' and value=300, misattributing 200 bytes to the wrong endpoint.
Given that the system has dedicated path, method, status, resource, and resourceId columns specifically for per-path analytics, this aggregation strategy loses the metadata granularity that the schema was designed to preserve. Either:
- Use
addBatch()directly (bypassing the buffer) when per-event tag fidelity matters, or - Change the buffer key to include a hash of the tags so that events with distinct contexts remain separate entries.
Query::parse() uses static::isMethod() which allows UsageQuery
to extend the valid method list. Without this override, parsing
'groupByInterval("time","1h")' throws "Invalid query method".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Complete rewrite of the usage analytics library with a two-table architecture optimized for both real-time analytics and billing.
Architecture
Key Changes
API
Write
Read
Billing (Daily MV)
Test Plan