
Commit 19ffea1

jcschaff and claude committed
Exit JVM and dump heap on OutOfMemoryError for server containers
Add -XX:+ExitOnOutOfMemoryError, -XX:+HeapDumpOnOutOfMemoryError, and -XX:HeapDumpPath=/dump to the long-running server JVMs (data, api, submit, sched, db, rest). On OOM the JVM now writes a heap dump to /dump and exits immediately, so Kubernetes restarts the container instead of letting it limp along in an undefined state.

Motivated by the 2026-05-04 and 2026-05-06 prod data-pod wedges. Both were precipitated by "Java heap space" OutOfMemoryErrors, which corrupted the JVM-wide static InactivityMonitor Timer: a TimerTask hit OOM mid-run and the TimerThread silently terminated. With these flags the JVM exits at the first OOM, before the error can propagate into (and kill) the InactivityMonitor TimerThread, which eliminates the OOM-driven path to the failover-wedge condition entirely. The companion JmsFailoverWatchdog (in PR #1681) remains as defense-in-depth for non-OOM wedge causes.

Targeted scope:
- 5 swarm-style server Dockerfiles (data, api, submit, sched, db): add the three flags right after -XX:MaxRAMPercentage=80.
- 2 Quarkus rest Dockerfiles (Dockerfile.jvm, Dockerfile.legacy-jar): append the flags to JAVA_OPTS.

Skipped intentionally:
- Dockerfile-batch-dev (per-job SLURM lifecycle, no /dump volume).
- Dockerfile-clientgen-dev and docker/build/admin (build/utility tools).
- Dockerfile.native and Dockerfile.native-micro (Quarkus native image; these HotSpot -XX: flags don't apply to GraalVM native builds).

Companion change required in vcell-fluxcd: add a /dump emptyDir volume mount to each affected Deployment so the JVM has somewhere to write the dump. Without that mount the dump write fails and there is no postmortem artifact, but the JVM still exits, so the log signal still works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
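The TimerThread failure mode described above matches documented java.util.Timer behavior: if any TimerTask lets a Throwable escape, the Timer's single background thread terminates and every later schedule(...) call throws IllegalStateException. The sketch below reproduces that in isolation. It assumes the InactivityMonitor timer is a java.util.Timer (as the TimerThread name suggests); TimerDeathDemo and its methods are hypothetical names, not VCell code, and a hand-thrown OutOfMemoryError stands in for a real heap-space failure.

```java
import java.util.Timer;
import java.util.TimerTask;

public class TimerDeathDemo {
    /** True if the Timer rejects new work, meaning its background thread has exited. */
    static boolean timerIsDead(Timer timer) {
        try {
            // Far-future no-op: a live Timer simply queues it and returns.
            timer.schedule(new TimerTask() {
                @Override public void run() { }
            }, 3_600_000L);
            return false;
        } catch (IllegalStateException e) {
            return true; // java.util.Timer reports "Timer already cancelled."
        }
    }

    /** Schedules one task that throws, then polls (up to 5 s) until the Timer is unusable. */
    public static boolean demonstrate() {
        Timer timer = new Timer("inactivity-monitor-sketch", true); // daemon thread
        timer.schedule(new TimerTask() {
            @Override public void run() {
                // Hand-thrown stand-in for a JVM-raised "Java heap space" error mid-task.
                throw new OutOfMemoryError("simulated: Java heap space");
            }
        }, 0L);
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (System.nanoTime() < deadline) {
            if (timerIsDead(timer)) {
                return true;
            }
            try {
                Thread.sleep(10L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        timer.cancel();
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Timer unusable after task threw: " + demonstrate());
    }
}
```

Nothing in the JVM restarts the dead Timer, which is why a process that survives such an error can stay up while its monitoring is gone; exiting on the first OOM sidesteps that state.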
1 parent d07328a commit 19ffea1

7 files changed

Lines changed: 20 additions & 2 deletions


docker/build/Dockerfile-api-dev

Lines changed: 3 additions & 0 deletions
@@ -84,6 +84,9 @@ EXPOSE 8000
 ENTRYPOINT java \
 -Xdebug -agentlib:jdwp=transport=dt_socket,address=*:8000,server=y,suspend=n \
 -XX:MaxRAMPercentage=80 \
+-XX:+ExitOnOutOfMemoryError \
+-XX:+HeapDumpOnOutOfMemoryError \
+-XX:HeapDumpPath=/dump \
 # -XX:+PrintFlagsFinal -XshowSettings:vm \
 -Dvcell.softwareVersion="${softwareVersion}" \
 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \

docker/build/Dockerfile-data-dev

Lines changed: 3 additions & 0 deletions
@@ -76,6 +76,9 @@ EXPOSE 8000
 ENTRYPOINT java \
 -Xdebug -agentlib:jdwp=transport=dt_socket,address=*:8000,server=y,suspend=n \
 -XX:MaxRAMPercentage=80 \
+-XX:+ExitOnOutOfMemoryError \
+-XX:+HeapDumpOnOutOfMemoryError \
+-XX:HeapDumpPath=/dump \
 # -XX:+PrintFlagsFinal -XshowSettings:vm \
 -Djava.awt.headless=true \
 -Dvcell.softwareVersion="${softwareVersion}" \

docker/build/Dockerfile-db-dev

Lines changed: 3 additions & 0 deletions
@@ -56,6 +56,9 @@ EXPOSE 8000
 ENTRYPOINT java \
 -Xdebug -agentlib:jdwp=transport=dt_socket,address=*:8000,server=y,suspend=n \
 -XX:MaxRAMPercentage=80 \
+-XX:+ExitOnOutOfMemoryError \
+-XX:+HeapDumpOnOutOfMemoryError \
+-XX:HeapDumpPath=/dump \
 # -XX:+PrintFlagsFinal -XshowSettings:vm \
 -Djava.awt.headless=true \
 -Dvcell.softwareVersion="${softwareVersion}" \

docker/build/Dockerfile-sched-dev

Lines changed: 3 additions & 0 deletions
@@ -87,6 +87,9 @@ VOLUME /htclogs
 ENTRYPOINT java \
 -Xdebug -agentlib:jdwp=transport=dt_socket,address=*:8000,server=y,suspend=n \
 -XX:MaxRAMPercentage=80 \
+-XX:+ExitOnOutOfMemoryError \
+-XX:+HeapDumpOnOutOfMemoryError \
+-XX:HeapDumpPath=/dump \
 # -XX:+PrintFlagsFinal -XshowSettings:vm \
 -Djava.awt.headless=true \
 -Dvcell.softwareVersion="${softwareVersion}" \

docker/build/Dockerfile-submit-dev

Lines changed: 3 additions & 0 deletions
@@ -117,6 +117,9 @@ EXPOSE 8000
 ENTRYPOINT java \
 -Xdebug -agentlib:jdwp=transport=dt_socket,address=*:8000,server=y,suspend=n \
 -XX:MaxRAMPercentage=80 \
+-XX:+ExitOnOutOfMemoryError \
+-XX:+HeapDumpOnOutOfMemoryError \
+-XX:HeapDumpPath=/dump \
 # -XX:+PrintFlagsFinal -XshowSettings:vm \
 -Djava.awt.headless=true \
 -Dvcell.softwareVersion="${softwareVersion}" \

vcell-rest/src/main/docker/Dockerfile.jvm

Lines changed: 4 additions & 1 deletion
@@ -95,7 +95,10 @@ ENV JAVA_OPTS="\
 -Dquarkus.http.host=0.0.0.0 \
 -Djava.util.logging.manager=org.jboss.logmanager.LogManager \
 -Dvcellapi.privateKey.file=/run/secrets/jwt-secret/apiprivkey \
--Dvcellapi.publicKey.file=/run/secrets/jwt-secret/apipubkey"
+-Dvcellapi.publicKey.file=/run/secrets/jwt-secret/apipubkey \
+-XX:+ExitOnOutOfMemoryError \
+-XX:+HeapDumpOnOutOfMemoryError \
+-XX:HeapDumpPath=/dump"
 ENV JAVA_APP_JAR="/deployments/quarkus-run.jar"
 
 ENTRYPOINT [ "/opt/jboss/container/java/run/run-java.sh" ]

vcell-rest/src/main/docker/Dockerfile.legacy-jar

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ COPY target/*-runner.jar /deployments/quarkus-run.jar
 
 EXPOSE 8080
 USER 185
-ENV JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -D"
+ENV JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -XX:+ExitOnOutOfMemoryError -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/dump -D"
 ENV JAVA_APP_JAR="/deployments/quarkus-run.jar"
 
 ENTRYPOINT [ "/opt/jboss/container/java/run/run-java.sh" ]
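Because the flags reach the JVM by two different routes (the ENTRYPOINT line in the swarm images, JAVA_OPTS and run-java.sh in the Quarkus images), and because the dump only survives if the vcell-fluxcd /dump mount exists, it can help to check both at runtime. The sketch below is not part of this commit: OomFlagCheck is a hypothetical class name, and it assumes a HotSpot JVM exposing com.sun.management.HotSpotDiagnosticMXBean. It reports the three flag values, whether HeapDumpPath is a writable directory, and, if so, the java_pid<pid>.hprof file name HotSpot uses inside a dump directory.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.lang.management.ManagementFactory;
import java.util.Objects;

public class OomFlagCheck {
    private static final String[] FLAGS = {
        "ExitOnOutOfMemoryError", "HeapDumpOnOutOfMemoryError", "HeapDumpPath"
    };

    /** Builds a one-line-per-flag report plus a check on the heap dump directory. */
    public static String report() {
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        StringBuilder sb = new StringBuilder();
        for (String name : FLAGS) {
            // getVMOption throws IllegalArgumentException for unknown flag names.
            sb.append(name).append('=')
              .append(hotspot.getVMOption(name).getValue()).append('\n');
        }
        // Treat an unset HeapDumpPath as empty.
        String dumpPath = Objects.requireNonNullElse(
            hotspot.getVMOption("HeapDumpPath").getValue(), "");
        File dir = new File(dumpPath);
        boolean usable = !dumpPath.isEmpty() && dir.isDirectory() && dir.canWrite();
        sb.append("heap dump directory usable: ").append(usable);
        if (usable) {
            // When HeapDumpPath is a directory, HotSpot writes java_pid<pid>.hprof inside it.
            long pid = ProcessHandle.current().pid();
            sb.append('\n').append("expected dump file: ")
              .append(new File(dir, "java_pid" + pid + ".hprof").getPath());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Logging this once at startup would surface a missing /dump emptyDir before any OOM occurs, instead of only in the exit-time log.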
