---
title: "Who ate my ClickHouse® memory?"
linkTitle: "Who ate my memory?"
weight: 100
description: >
  *"It was here a few minutes ago..."* — finding what is using ClickHouse® RAM.
keywords:
- clickhouse memory
- clickhouse memory usage
- MemoryTracking
- jemalloc
---

When `MemoryTracking` is high or the OS shows ClickHouse® RSS far above what
queries account for, the goal is to attribute the memory to one of its
consumers. The major ones are:

- **Caches** — mark cache, uncompressed cache, mmap cache, query cache,
filesystem cache, compiled-expression cache, primary-index cache.
- **Primary keys** loaded in memory for every active part (`system.parts.primary_key_bytes_in_memory_allocated`).
- **Dictionaries** (`system.dictionaries.bytes_allocated`).
- **Running queries** (`system.processes.memory_usage`) and **merges/mutations**
(`system.merges.memory_usage`).
- **In-memory engines** — `Memory`, `Set`, `Join`, `Buffer` tables.
- **Async inserts** waiting to flush (`system.asynchronous_inserts`).
- **In-memory parts** (`part_type = 'InMemory'`).
- **Virtual / accounting overhead** — file read buffers, thread stacks, mmaps.
- **jemalloc retained pages** — memory the allocator has freed but not yet
returned to the OS. This inflates RSS without being attributable to any
ClickHouse subsystem; running `SYSTEM JEMALLOC PURGE` first makes the rest
of the numbers comparable.

For the related limits and overcommit behavior, see
[memory configuration settings](/altinity-kb-setup-and-maintenance/altinity-kb-memory-configuration-settings/),
[Memory Overcommiter](/altinity-kb-setup-and-maintenance/altinity-kb-memory-overcommit/), and
[Configure ClickHouse for low memory environments](/altinity-kb-setup-and-maintenance/configure_clickhouse_for_low_mem_envs/).
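
Before drilling down, it is worth confirming how large the gap actually is between tracked allocations and OS-visible RSS. A minimal first check, assuming your version exposes the `MemoryResident` asynchronous metric:

```sql
-- Tracked allocations vs. OS-visible RSS; a large gap often points at
-- jemalloc retained pages or untracked overhead.
SELECT
    (SELECT formatReadableSize(value) FROM system.metrics
     WHERE metric = 'MemoryTracking') AS tracked,
    (SELECT formatReadableSize(value) FROM system.asynchronous_metrics
     WHERE metric = 'MemoryResident') AS rss;
```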

## All-in-one breakdown

Run this first. It purges jemalloc retained pages, then unions every memory
consumer into one `(group, name, val)` result so you can compare them at a
glance.

```sql
SYSTEM JEMALLOC PURGE;

-- … (several earlier branches of this UNION ALL are elided in this excerpt) …
SELECT 'AsyncInserts' as group, 'db:'||database as name, toInt64(sum(total_bytes)) FROM system.asynchronous_inserts GROUP BY database
UNION ALL
SELECT 'FileBuffersVirtual' as group, metric as name, toInt64(value * 2*1024*1024) FROM system.metrics WHERE metric like 'OpenFileFor%'
UNION ALL
SELECT 'ThreadStacksVirtual' as group, metric as name, toInt64(value * 8*1024*1024) FROM system.metrics WHERE metric = 'GlobalThread'
UNION ALL
SELECT 'UserMemoryTracking' as group, user as name, toInt64(memory_usage) FROM system.user_processes
UNION ALL
select 'QueryCacheBytes' as group, '' as name, toInt64(sum(result_size)) FROM system.query_cache
UNION ALL
SELECT 'MemoryTracking' as group, 'total' as name, toInt64(value) FROM system.metrics WHERE metric = 'MemoryTracking'
```

> Note: `FileBuffersVirtual` and `ThreadStacksVirtual` are *upper bounds* on
> reservation, not committed RSS — useful for spotting runaway thread/file-handle
> counts but not directly comparable to `MemoryTracking`.
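
For example, the thread-stack figure is simply the live thread count multiplied by an assumed 8 MiB default stack reservation (the same arithmetic the all-in-one query uses):

```sql
-- Upper-bound estimate: threads × 8 MiB assumed stack reservation.
SELECT
    value AS global_threads,
    formatReadableSize(value * 8 * 1024 * 1024) AS stack_reservation_upper_bound
FROM system.metrics
WHERE metric = 'GlobalThread';
```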

## Drill down by subsystem

Once the all-in-one query points at a suspect group, use these to inspect it.

```sql
-- Live memory/cache metrics
SELECT *, formatReadableSize(value)
FROM system.metrics
WHERE (metric ilike '%Cach%' or metric ilike '%Mem%') and value != 0
order by metric format PrettyCompactMonoBlock;

SELECT *, formatReadableSize(value)
FROM system.asynchronous_metrics
WHERE metric like '%Cach%' or metric like '%Mem%'
order by metric format PrettyCompactMonoBlock;

-- Last 10 minutes of memory/cache metrics from the async log
SELECT event_time, metric, value, formatReadableSize(value)
FROM system.asynchronous_metric_log
WHERE event_time > now() - 600 and (metric like '%Cach%' or metric like '%Mem%') and value <> 0
order by metric, event_time format PrettyCompactMonoBlock;

-- Dictionaries
SELECT formatReadableSize(sum(bytes_allocated)) FROM system.dictionaries;

-- In-memory engines
SELECT
database,
name,
formatReadableSize(total_bytes)
FROM system.tables
WHERE engine IN ('Memory','Set','Join');

-- Primary keys + in-memory parts
SELECT
sumIf(data_uncompressed_bytes, part_type = 'InMemory') as memory_parts,
formatReadableSize(sum(primary_key_bytes_in_memory)) AS primary_key_bytes_in_memory,
formatReadableSize(sum(primary_key_bytes_in_memory_allocated)) AS primary_key_bytes_in_memory_allocated
FROM system.parts;

-- Merges, in-flight queries, query cache
SELECT formatReadableSize(sum(memory_usage)) FROM system.merges;

SELECT formatReadableSize(sum(memory_usage)) FROM system.processes;

select formatReadableSize(sum(result_size)) FROM system.query_cache;

-- Top current queries by peak memory
SELECT
initial_query_id,
elapsed,
formatReadableSize(memory_usage),
formatReadableSize(peak_memory_usage),
query
FROM system.processes
ORDER BY peak_memory_usage DESC
LIMIT 10;

-- Top recent (last 2h) finished queries by memory
SELECT
type,
event_time,
initial_query_id,
formatReadableSize(memory_usage),
query
FROM system.query_log
WHERE (event_date >= today()) AND (event_time >= (now() - 7200))
ORDER BY memory_usage DESC
LIMIT 10;

```
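
If a cache is the suspect, `system.parts` can at least attribute mark data to tables. This is a sketch, not an exact accounting: `marks_bytes` is the on-disk size of marks, so it is an upper bound on what each table can occupy in the mark cache.

```sql
-- Tables whose marks could occupy the most mark-cache space.
SELECT
    database,
    table,
    formatReadableSize(sum(marks_bytes)) AS marks
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(marks_bytes) DESC
LIMIT 10;
```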

## Polling memory live

When the suspect is moving (an active merge or query), poll the relevant
tables in a loop. Run from the host shell.

Just merges + processes:

```bash
for i in `seq 1 600`; do clickhouse-client --empty_result_for_aggregation_by_empty_set=0 -q "select (select 'Merges: \
'||formatReadableSize(sum(memory_usage)) from system.merges), (select \
'Processes: '||formatReadableSize(sum(memory_usage)) from system.processes)";\
sleep 3; done

Merges: 96.57 MiB Processes: 41.98 MiB
Merges: 82.24 MiB Processes: 41.91 MiB
Merges: 66.49 MiB Processes: 37.13 MiB
Merges: 67.78 MiB Processes: 37.13 MiB
```

Wider view — merges, queries, primary keys, in-memory tables, dictionaries:

```bash
echo " Merges Processes PrimaryK TempTabs Dicts"; \
for i in `seq 1 600`; do clickhouse-client --empty_result_for_aggregation_by_empty_set=0 -q "select \
(select leftPad(formatReadableSize(sum(memory_usage)),15, ' ') from system.merges)|| \
(select leftPad(formatReadableSize(sum(memory_usage)),15, ' ') from system.processes)|| \
(select leftPad(formatReadableSize(sum(primary_key_bytes_in_memory_allocated)),15, ' ') from system.parts)|| \
(select leftPad(formatReadableSize(sum(total_bytes)),15, ' ') from system.tables \
WHERE engine IN ('Memory','Set','Join'))||
(select leftPad(formatReadableSize(sum(bytes_allocated)),15, ' ') FROM system.dictionaries)
"; sleep 3; done

Merges Processes PrimaryK TempTabs Dicts
0.00 B 0.00 B 21.36 MiB 1.58 GiB 911.07 MiB

```

## Retrospective: peaks from query_log + part_log

If the spike already happened, reconstruct the timeline by replaying
allocations and releases from `query_log`, `part_log`, and `query_views_log`.
Each row contributes `+peak_memory_usage` at start and `-peak_memory_usage`
at end; the running sum approximates concurrent RAM use, broken out by event
type.

```sql
WITH
-- … (body of the query elided in this excerpt) …
ORDER BY timeframe
FORMAT PrettyCompactMonoBlock;
```
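
If the full query is unwieldy, the core replay idea can be sketched with `query_log` alone. This simplified illustration skips `part_log` and `query_views_log`, so it undercounts merge memory:

```sql
-- Each finished query contributes +peak at its start time and -peak at its
-- end time; the running sum approximates concurrent RAM use.
SELECT
    t,
    formatReadableSize(sum(delta) OVER (ORDER BY t ASC)) AS approx_ram_in_use
FROM
(
    SELECT query_start_time AS t, toInt64(memory_usage) AS delta
    FROM system.query_log
    WHERE type = 'QueryFinish' AND event_date = today()
    UNION ALL
    SELECT event_time AS t, -toInt64(memory_usage) AS delta
    FROM system.query_log
    WHERE type = 'QueryFinish' AND event_date = today()
)
ORDER BY t ASC;
```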

## Retrospective: trace_log

`trace_log` records `MemoryPeak` samples per query. This bucketizes them in
5-minute windows and surfaces the worst offender per bucket — useful when
the query is gone but the trace is still on disk.

```sql
WITH
-- … (body of the query elided in this excerpt) …
ORDER BY t ASC;
-- later on you can check particular query_ids in query_log
```
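
The same idea in miniature, assuming `MemoryPeak` samples are being collected into `trace_log` on your server:

```sql
-- Worst memory peak, and the query that hit it, per 5-minute bucket.
SELECT
    toStartOfFiveMinutes(event_time) AS t,
    formatReadableSize(max(size)) AS worst_peak,
    argMax(query_id, size) AS query_id
FROM system.trace_log
WHERE trace_type = 'MemoryPeak' AND event_date >= today() - 1
GROUP BY t
ORDER BY t ASC;
```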

## Server text logs

`MemoryTracker` lines in the server log show every time a query, merge, or
global tracker hits a peak or a limit — handy when system tables have already
rolled over.

```bash
grep MemoryTracker /var/log/clickhouse-server.log
zgrep MemoryTracker /var/log/clickhouse-server.log.*.gz
```

## Next steps

Once you've identified the culprit:

- Hitting `Memory limit (for query) exceeded` — see
[memory configuration settings](/altinity-kb-setup-and-maintenance/altinity-kb-memory-configuration-settings/)
and consider enabling the
[Memory Overcommiter](/altinity-kb-setup-and-maintenance/altinity-kb-memory-overcommit/).
- Caches/dictionaries dominating on a small box — see
[Configure ClickHouse for low memory environments](/altinity-kb-setup-and-maintenance/configure_clickhouse_for_low_mem_envs/).
- System tables themselves are large — see
[system tables eat my disk](/altinity-kb-setup-and-maintenance/altinity-kb-system-tables-eat-my-disk/).