Commit 19ffea1
Exit JVM and dump heap on OutOfMemoryError for server containers

Add -XX:+ExitOnOutOfMemoryError, -XX:+HeapDumpOnOutOfMemoryError, and
-XX:HeapDumpPath=/dump to the long-running server JVMs (data, api,
submit, sched, db, rest). On OOM the JVM now writes a heap dump to
/dump and exits immediately, so K8s recreates the container instead
of leaving the JVM limping along in an undefined state.
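
One way to confirm that the three flags actually reached a running server JVM is to read them back through the HotSpot diagnostic MXBean. The sketch below is illustrative and not part of this commit: the class name OomFlagCheck is hypothetical, and it assumes a HotSpot-based JDK (8u92 or later, where ExitOnOutOfMemoryError exists).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of this commit.
class OomFlagCheck {
    static final String[] FLAGS = {
        "ExitOnOutOfMemoryError", "HeapDumpOnOutOfMemoryError", "HeapDumpPath"
    };

    // Reads the current value of each OOM-related flag from the running JVM.
    static Map<String, String> readOomFlags() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map<String, String> values = new LinkedHashMap<>();
        for (String flag : FLAGS) {
            values.put(flag, bean.getVMOption(flag).getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        readOomFlags().forEach((name, value) ->
            System.out.println(name + " = " + value));
    }
}
```

Run inside a container started with the new flags, the two boolean options should read true and HeapDumpPath should read /dump.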
Motivated by the 2026-05-04 / 2026-05-06 prod data-pod wedges. Both
were precipitated by Java heap space errors which corrupted the
JVM-wide static InactivityMonitor Timer (a TimerTask hit OOM mid-run
and the TimerThread silently terminated). With these flags, the JVM
aborts on the first OOM before the InactivityMonitor TimerThread can
be touched, eliminating the OOM-driven path to the failover-wedge
condition entirely. The companion JmsFailoverWatchdog (in PR #1681)
remains as defense-in-depth for non-OOM wedge causes.
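
The failure mode described above follows from documented java.util.Timer behavior: if a TimerTask throws an unchecked exception or Error, the timer's single thread terminates and later schedule calls fail with IllegalStateException. A minimal sketch, assuming InactivityMonitor is built on java.util.Timer as the TimerTask/TimerThread wording implies. The class name TimerDeathDemo is hypothetical, and the OutOfMemoryError is thrown by hand as a stand-in for a real allocation failure.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical demo class, not part of this commit.
class TimerDeathDemo {
    // Returns true once the Timer rejects new tasks after one task threw an Error.
    static boolean timerDiesAfterError() {
        Timer timer = new Timer("demo-timer", true); // daemon thread
        timer.schedule(new TimerTask() {
            @Override public void run() {
                // Stand-in for a real "Java heap space" failure inside a task.
                throw new OutOfMemoryError("simulated");
            }
        }, 0);
        // Poll for up to about 5 s: once the timer thread has died,
        // schedule() throws IllegalStateException.
        for (int i = 0; i < 50; i++) {
            try {
                timer.schedule(new TimerTask() {
                    @Override public void run() { /* no-op probe */ }
                }, 60_000);
            } catch (IllegalStateException e) {
                return true; // timer is unusable for every later caller
            }
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        timer.cancel();
        return false;
    }

    public static void main(String[] args) {
        System.out.println("timer dead after Error: " + timerDiesAfterError());
    }
}
```

Because the Timer in question is static and JVM-wide, every component sharing it is affected once its thread is gone. Exiting on the first genuine OOM means no later code runs against the dead timer.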
Targeted scope:
- 5 swarm-style server Dockerfiles (data, api, submit, sched, db):
  add the three flags right after -XX:MaxRAMPercentage=80.
- 2 Quarkus rest Dockerfiles (Dockerfile.jvm, Dockerfile.legacy-jar):
  append to JAVA_OPTS.

Skipped intentionally:
- Dockerfile-batch-dev (per-job SLURM lifecycle, no /dump volume).
- Dockerfile-clientgen-dev / docker/build/admin (build/utility tools).
- Dockerfile.native and Dockerfile.native-micro (Quarkus native image;
  -XX: flags don't apply to GraalVM native builds).

Companion change required in vcell-fluxcd: add a /dump emptyDir volume
mount to each affected Deployment so the JVM has somewhere to write
the dump. Without that mount the dump silently fails and the JVM still
exits; the log signal still works, but there is no postmortem artifact.
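
Because a missing /dump mount only costs the postmortem artifact, it can be worth logging that condition at startup. The sketch below is an illustration under stated assumptions, not part of this commit: the class DumpDirCheck and the idea of calling it during server startup are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical startup check, not part of this commit.
class DumpDirCheck {
    // True only if dir exists, is a directory, and is writable by this process.
    static boolean canWriteDump(String dir) {
        Path path = Paths.get(dir);
        return Files.isDirectory(path) && Files.isWritable(path);
    }

    public static void main(String[] args) {
        String dir = args.length > 0 ? args[0] : "/dump";
        if (canWriteDump(dir)) {
            System.out.println(dir + " is a writable directory");
        } else {
            System.err.println("WARN: " + dir
                + " is not a writable directory; heap dumps on OOM may be lost");
        }
    }
}
```

A warning like this surfaces a missing volume mount at deploy time rather than after the first OOM.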
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent d07328a commit 19ffea1
7 files changed
Lines changed: 20 additions & 2 deletions
File tree
- docker/build
- vcell-rest/src/main/docker
[Diff bodies and file names were not captured. Hunks in page order:
3 lines added at 87-89; 3 at 79-81; 3 at 59-61; 3 at 90-92; 3 at
120-122; line 98 replaced by 4 lines (98-101); line 90 replaced by
1 line.]