A Java Heap Dump is a snapshot of all the objects that are in memory in the JVM at a certain moment. Typically, a heap dump is created at the moment a Java application crashes because it runs out of memory. The heap dump can then show you what the application was doing in its dying moments, providing insight into potential memory leaks.
By Joep Weijers | TOPdesk
You can also extract a heap dump from a running JVM. This is a useful technique to peek under the hood of a service that is running with abnormal memory usage, but that is not running out of memory (yet). For us at TOPdesk, this is a very valuable tool to investigate performance issues.
The JVM supplies the Attach API to allow external tools to attach to the JVM. Profiling tools like jmap, jcmd and JVisualVM use that API to monitor and troubleshoot the Java process inside the JVM.
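As a rough illustration of what such tools do with the Attach API, the sketch below lists the JVMs that the API can discover from the current process. It uses the com.sun.tools.attach classes from the jdk.attach module, which is part of the JDK; whether a given JRE image includes it is something to verify. The names ListAttachableJvms and attachableJvms are made up for this example.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, not part of the article's example service.
public class ListAttachableJvms {

    /** Returns one "id displayName" line per JVM the Attach API can discover. */
    static List<String> attachableJvms() {
        List<VirtualMachineDescriptor> descriptors = VirtualMachine.list();
        return descriptors.stream()
                .map(d -> d.id() + " " + d.displayName())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The result depends on which JVMs are discoverable from this process.
        attachableJvms().forEach(System.out::println);
    }
}
```

An empty list does not necessarily mean that no JVM is running; it only means that none was discoverable from where the code ran.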
Unfortunately, we can’t use any of these tools in our containers. JVisualVM is the most user-friendly, as it provides a simple graphical user interface, but it is not usable from within our headless Docker containers. jmap is not an officially supported tool and is not part of the OpenJ9 distribution. And jcmd is only available in the JDK variant of OpenJ9, not in the JRE.
So we need a way to make the jcmd tool available in an already running OpenJ9 JRE container.
Attempt 1: Using a Kubernetes Ephemeral Debug Container
Kubernetes 1.16 introduces an alpha feature called Ephemeral Containers: a special type of container that runs temporarily in an existing Pod to accomplish user-initiated actions such as troubleshooting.
Could we use such an ephemeral container containing the OpenJ9 JDK to attach to a container running a service on OpenJ9 JRE?
In this example we will use a Docker container running a simple Hello World webserver: topdesk/example-openj9-web-service:1.0.0. You can find the code of this example service in the GitHub repository: https://github.com/TOPdesk/example-openj9-web-service.
Starting the web service container
Let’s start by firing up a minikube cluster. Since ephemeral containers are still in alpha, the EphemeralContainers feature gate has to be enabled:
- minikube start --feature-gates=EphemeralContainers=true
Now we are going to start a pod with our example-openj9-web-service:
- kubectl run example-openj9-web-service --image=topdesk/example-openj9-web-service
If we exec into the pod, we can verify that the Java version is indeed OpenJ9 JRE. We also see that jcmd is not present in this image:
- kubectl exec -it example-openj9-web-service -- sh
- $ java -version
- openjdk version "11.0.9" 2020-10-20
- OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9+11)
- Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.23.0, JRE 11 Linux amd64-64-Bit Compressed References 20201022_810 (JIT enabled, AOT enabled)
- $ which java
- /opt/java/openjdk/bin/java
- $ ls /opt/java/openjdk/bin/
- java jitserver jjs jrunscript keytool pack200 rmid rmiregistry unpack200
- $ jcmd
- sh: 4: jcmd: not found
Starting the ephemeral debug container
Now we are going to attach an ephemeral container to the example-openj9-web-service container. We use the OpenJ9 JDK image, target the process namespace of the other container, and start the sh command:
- kubectl alpha debug -it example-openj9-web-service --image=adoptopenjdk:11.0.9_11-jdk-openj9-0.23.0 --target=example-openj9-web-service -- sh
We can use ps to list all running processes and see the web service of the example-openj9-web-service container running:
- $ ps -ef
- UID PID PPID C STIME TTY TIME CMD
- 999 1 0 0 16:34 ? 00:00:00 /bin/sh -c java $JAVA_OPTS -jar /opt/webservice/webservice.jar
- 999 7 1 0 16:34 ? 00:00:00 java -jar /opt/webservice/webservice.jar
- root 251 0 0 17:08 pts/0 00:00:00 sh
- root 280 251 0 17:08 pts/0 00:00:00 ps -ef
The additional tools of the JDK are now at our disposal. Let’s use jcmd to create a heap dump of our Java process with PID 7:
- $ jcmd 7 GC.heap_dump /tmp/heapdump
- Error getting data from 7: Exception connecting to 7
It seems we can’t connect to the JVM. Let’s use jps to list all the running JVMs:
- $ jps -l
- 327 jdk.jcmd/openj9.tools.attach.diagnostics.tools.Jps
That is odd: we only see the tool itself running. So we seem unable to reach the JVM in the container we are attached to. Are we doing something wrong here?
Attempt 2: Switching user in the ephemeral debug container
According to the jps documentation, “the tool shows information for every Java process that is owned by the current user ID on the current host”. We are currently running as root, but the Java process is running as user 999. Maybe it works if we create a user 999 and run jps under that user?
- $ apt-get update
- $ apt-get install -y sudo
- $ useradd --no-create-home --uid 999 debuguser
- $ sudo su debuguser
- $ /opt/java/openjdk/bin/jps -l
- 613 jdk.jcmd/openj9.tools.attach.diagnostics.tools.Jps
Unfortunately, we still can’t connect to the JVM running in the example-openj9-web-service container. I am not sure whether that is caused by something in Kubernetes’ process sharing between the main container and debug container, or maybe the JDK tooling can’t handle this situation.
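To double-check the ownership assumption behind this attempt, it can help to compare the current user with the owner of each process that is visible from the debug container. The following is a minimal sketch using the standard ProcessHandle API (Java 9 and later). The class name ProcessOwners is hypothetical, and the sketch does not touch the Attach API, so it says nothing about whether attaching would succeed.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for inspecting process ownership; not an Attach API client.
public class ProcessOwners {

    /** Returns one "pid owner command" line per process visible to this JVM. */
    static List<String> describeProcesses() {
        return ProcessHandle.allProcesses()
                .map(p -> p.pid() + " "
                        + p.info().user().orElse("<unknown>") + " "
                        + p.info().command().orElse("<unknown>"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The user this JVM runs as, for comparison with the owners listed below.
        System.out.println("Current user: " + System.getProperty("user.name"));
        describeProcesses().forEach(System.out::println);
    }
}
```

If the Java process of the web service shows up with the same owner as the current user while jps still does not list it, ownership alone is probably not the whole story.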
Attempt 3: Downloading the JDK into the container
A completely different approach is to copy the debug tools into your container, for instance by downloading the JDK into the running web service container. After downloading and unpacking it, you can use jps and jcmd directly:
- $ cd /tmp
- $ curl -L https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.9%2B11_openj9-0.23.0/OpenJDK11U-jdk_x64_linux_openj9_11.0.9_11_openj9-0.23.0.tar.gz --output jdk.tgz
- $ tar -zxvf jdk.tgz
- $ jdk-11.0.9+11/bin/jps -l
- 687 jdk.jcmd/openj9.tools.attach.diagnostics.tools.Jps
- 7 /opt/webservice/webservice.jar
- $ jdk-11.0.9+11/bin/jcmd 7 GC.heap_dump /tmp/heapdump
- Dump written to /tmp/heapdump
Success! We have our heap dump. However, we also polluted our container with a JDK. We can of course remove the JDK files afterwards, but the concept of ephemeral debug containers, which disappear when you are done debugging, is much more appealing. If you know of a way to do heap dumps on running containers using ephemeral debug containers, please reach out!