Tuesday, June 16, 2009

2009-06-16 Tuesday - Java JVM Garbage Collector Tuning

A client recently engaged my services to help their development team refactor the C# .NET code base for a major systems integration interface.

Their initial execution time was approximately 38 minutes. The latest version of the refactored code is now running in approximately 8 minutes.


One layer of the interface invokes a 3rd party vendor's Java Web Services API. During load testing under nominal load, one of the Java Web Services throws an OutOfMemoryError as the size of the batch file is increased. Thus far, they have relied on increasing the JVM heap size parameters. They are running JDK 1.5 due to a 3rd party library dependency.

A bit of investigating turned up what appears to be a similar reported problem:

This discussion thread indicates that this might be resolved in JDK 1.6:
http://www.nabble.com/What-the-...--%22java.lang.OutOfMemoryError%3A-GC-overhead-limit-exceeded%22-tt12058809.html

This summary description seems to be in line with what I suspect to be the root cause:

Excessive GC Time and OutOfMemoryError
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#par_gc.oom

"The parallel collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line."

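To get a rough feel for how close the vendor's service is to that threshold, the JVM's own management beans can report accumulated GC time. The following is a minimal sketch, not the collector's actual bookkeeping: the class name GcOverheadProbe and its methods are my own, and it computes a lifetime average over JVM uptime rather than the recent-interval figure the parallel collector uses. It relies only on java.lang.management, which is available from Java 5 onward.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: estimates the share of JVM uptime spent in GC.
class GcOverheadProbe {

    /** Sums the accumulated collection time (ms) reported by all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Percentage (0..100) of the given uptime that was spent in GC. */
    static double gcPercent(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return 100.0 * gcMillis / uptimeMillis;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        long gc = totalGcMillis();
        System.out.println("GC time: " + gc + " ms of " + uptime
                + " ms uptime (" + gcPercent(gc, uptime) + "%)");
    }
}
```

A figure that climbs steadily toward the high nineties under load would be consistent with the "GC overhead limit exceeded" explanation; a low figure would point elsewhere.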

In preparing to do further diagnosis, I've assembled the following links to share with the client's technical staff:


2009 JavaOne Technical Session: Garbage Collection Tuning in the Java HotSpot Virtual Machine TS-4887
http://72.5.124.65/learning/javaoneonline/j1sessn.jsp?sessn=TS-4887&yr=2009&track=javase

Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html


Frequently Asked Questions about Garbage Collection in the HotSpot[tm] Java[tm] Virtual Machine (1.4.2)
http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

Java theory and practice: Garbage collection in the HotSpot JVM (2003)
http://www.ibm.com/developerworks/java/library/j-jtp11253/

Garbage collection tuning in Java 5.0 is a breath of fresh air
http://articles.techrepublic.com.com/5100-10878_11-6108296.html

4 Easy Ways to do Java Garbage Collection Tuning
http://developer.amd.com/documentation/articles/pages/4EasyWaystodoJavaGarbageCollectionTuning.aspx

Java Garbage Collection Tuning
http://www.jivesoftware.com/jivespace/docs/DOC-1486



http://www.caucho.com/resin-3.0/performance/jvm-tuning.xtp

There are essentially two GC threads running. One is a very lightweight thread which does "little" collections primarily on the Eden (a.k.a. Young) generation of the heap. The other is the Full GC thread which traverses the entire heap when there is not enough memory left to allocate space for objects which get promoted from the Eden to the older generation(s).

If there is a memory leak or inadequate heap allocated, eventually the older generation will start to run out of room, causing the Full GC thread to run (nearly) continuously. Since this process "stops the world", {your application} won't be able to respond to requests and they'll start to back up.

The amount allocated for the Eden generation is the value specified with -Xmn. The amount allocated for the older generation is the value of -Xmx minus the -Xmn. Generally, you don't want the Eden to be too big or it will take too long for the GC to look through it for space that can be reclaimed.
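
To check how a given -Xmn / -Xmx combination actually divides the heap, the running JVM can list its heap memory pools. This is a small sketch using java.lang.management (Java 5 and later); the class name HeapPoolReport is my own, and the pool names it prints vary by JVM vendor, version, and the collector in use, so treat the output as descriptive only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the heap pools (young/old generation spaces) of this JVM.
class HeapPoolReport {

    /** Returns one line per heap pool with its used and maximum size in kilobytes. */
    static List<String> describeHeapPools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as the code cache
            }
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
            lines.add(pool.getName() + ": used=" + (usage.getUsed() / 1024) + "K, max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeHeapPools()) {
            System.out.println(line);
        }
    }
}
```

Running it once with and once without an explicit -Xmn value should show the young-generation pools changing size while the overall maximum stays at -Xmx.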


Troubleshooting/FAQ
http://www.caucho.com/resin-3.0/troubleshoot/technique.xtp#garbage-collector

java.lang.OutOfMemoryError exception, application runs out of memory
http://www.caucho.com/resin-3.0/troubleshoot/symptoms.xtp#memory-leaks

"An OutOfMemoryError exception is usually an indication that heap memory is being used up. Often this is from application code keeping references to objects that are no longer needed, and the garbage collector does not free them"

CPU spikes, excessive CPU usage: Obtain a thread dump and check for threads that are caught in tight loops. Check for garbage collection issues.


Out of Memory and Garbage Collection
http://www.caucho.com/resin-3.0/troubleshoot/technique.xtp#out-of-memory

Most memory problems are due to memory leaks in the application program. For example, a cache or a vector that fills up with stale data, or a singleton or static variable which doesn't properly detect a web-app reload. Some more exotic memory issues relate to running out of heap memory or virtual memory when using a large number of threads (> 256).

The steps to track down a memory problem are:

1. Enable -J-verbosegc with the httpd.sh start or httpd -install. The -verbosegc flag logs the garbage collection of the heap, which will let you know if you're running out of heap memory (the most common case).

2. Get a heap profiler or use the heap dump in the JVM. JProfiler is an inexpensive commercial heap profiler. Although the JVM's heap dump is not very user friendly, it's always available. You should be using a heap profiler as part of your development process and certainly use one before any production launch.

3. With the heap profiler, find the 2-3 top memory users and fix those memory leaks.


4. Common application errors include:
+ ThreadLocal variables that are not properly cleared at the end of each request.
+ Singleton or static hash maps and caches; especially check that they are cleared on web-app restarts.
+ Spawned threads which are not stopped on web-app restart.
+ web-app variables (like the "application" variable), stored in a static variable.

5. If the heap is clean, i.e. -verbosegc shows a steady heap, you'll need to look at non-heap memory:
+ Thread stack usage (-Xss1m). Each thread takes up some non-heap memory. The default on some systems is 8 meg. Because of 32-bit virtual memory limits of about 2G on some systems, even 256 threads with 8 meg stacks can chew up the virtual memory. You can drop the stack size with the -Xss directive.
+ JNI memory. If you're using JNI libraries or drivers that use JNI, it's possible that the JNI can allocate more memory than is available.
+ fork/exec and OS limits. If the OS does not have enough available swap space, for example, the OS might refuse a "jikes" compilation.
+ NIO, memory mapping, and .jar files. The JDK memory-maps .jar files. In some cases with very large numbers of jars, this can result in running out of virtual memory. The same issue can appear for NIO memory mapping.

6. If all of these are eliminated, it might be an {application} bug. However, you will need to have located the memory leak as {application}-related before any memory-related bug report, i.e. it's necessary to go through the above steps before reporting a bug. Bug reports that complain about OutOfMemory errors, but have not even gotten a JDK memory dump are most likely application errors. You must provide a heap dump when reporting any potential Resin memory problems.
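
The first of the common application errors in step 4 (ThreadLocal variables that are not cleared at the end of each request) is easy to illustrate. The sketch below is hypothetical: RequestContext, handle, and currentUser are names I made up, not part of Resin or the vendor's API. It shows the usual pattern of clearing the ThreadLocal in a finally block so that a pooled worker thread does not keep the value alive after the request completes. ThreadLocal.remove() is available from Java 5 onward.

```java
// Hypothetical request handler illustrating ThreadLocal cleanup.
class RequestContext {

    // Per-thread state; on a pooled thread this outlives the request unless removed.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<String>();

    /** Handles one request and always clears the per-thread state afterwards. */
    static String handle(String user) {
        CURRENT_USER.set(user);
        try {
            return "handled request for " + CURRENT_USER.get();
        } finally {
            CURRENT_USER.remove(); // without this, the value stays reachable from the thread
        }
    }

    /** Returns the value bound to the calling thread, or null if none is set. */
    static String currentUser() {
        return CURRENT_USER.get();
    }

    public static void main(String[] args) {
        System.out.println(handle("alice"));
        System.out.println("after request: " + currentUser());
    }
}
```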



Tuning the Java Runtime System
http://docs.sun.com/source/817-2180-10/pt_chap5.html

Tuning JVM switches for performance
http://performance.netbeans.org/howto/jvmswitches/index.html

More exotic switches
-XX:+UseAdaptiveSizePolicy - this switch may help improve garbage collector throughput and memory footprint. It is part of garbage collector ergonomics implemented in JDK5.0.



JDK 1.5 Garbage Collector Ergonomics
http://java.sun.com/j2se/1.5.0/docs/guide/vm/gc-ergonomics.html

(…)
On server-class machines running the server VM, the garbage collector (GC) has changed from the previous serial collector (-XX:+UseSerialGC) to a parallel collector (-XX:+UseParallelGC). You can override this default by using the -XX:+UseSerialGC command-line option to the java command.

(…)
The parallel garbage collector (UseParallelGC) throws an out-of-memory exception if an excessive amount of time is being spent collecting a small amount of the heap. To avoid this exception, you can increase the size of the heap. You can also set the parameters -XX:GCTimeLimit=time-limit and -XX:GCHeapFreeLimit=space-limit

where:

time-limit:

The upper limit on the amount of time spent in garbage collection in percent of total time (default is 98).

space-limit:

The lower limit on the amount of space freed during a garbage collection in percent of the maximum heap (default is 2).
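
Read literally, the documented rule combines the two limits with a logical AND. The following is only a simplified sketch of that rule as worded above, not HotSpot's actual implementation (which I have not inspected); the class and method names are my own.

```java
// Simplified model of the documented GC overhead rule; names are hypothetical.
class GcOverheadLimit {

    static final int DEFAULT_TIME_LIMIT = 98;  // documented default for -XX:GCTimeLimit
    static final int DEFAULT_SPACE_LIMIT = 2;  // documented default for -XX:GCHeapFreeLimit

    /**
     * True when more than timeLimit percent of total time went to GC
     * and less than spaceLimit percent of the maximum heap was freed.
     */
    static boolean exceeded(double gcTimePercent, double freedPercent,
                            int timeLimit, int spaceLimit) {
        return gcTimePercent > timeLimit && freedPercent < spaceLimit;
    }

    public static void main(String[] args) {
        System.out.println(exceeded(99.0, 1.0, DEFAULT_TIME_LIMIT, DEFAULT_SPACE_LIMIT));
        System.out.println(exceeded(99.0, 10.0, DEFAULT_TIME_LIMIT, DEFAULT_SPACE_LIMIT));
    }
}
```

Under this reading, lowering the time limit or raising the space limit makes the error fire earlier, while a larger heap makes it harder to hit both conditions at once.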

(…)
-XX:GCTimeRatio=nnn

A hint to the virtual machine that it's desirable that not more than 1 / (1 + nnn) of the application execution time be spent in the collector.

For example -XX:GCTimeRatio=19 sets a goal of 5% of the total time for GC and throughput goal of 95%. That is, the application should get 19 times as much time as the collector.

By default the value is 99, meaning the application should get at least 99 times as much time as the collector. That is, the collector should run for not more than 1% of the total time. This was selected as a good choice for server applications. A value that is too high will cause the size of the heap to grow to its maximum.
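
The arithmetic behind -XX:GCTimeRatio is simple enough to check by hand, but a tiny helper makes it easy to try other values. This sketch just evaluates the 1 / (1 + nnn) formula quoted above; the class name GcTimeRatio and its method are my own.

```java
// Evaluates the GC time goal implied by -XX:GCTimeRatio=nnn; names are hypothetical.
class GcTimeRatio {

    /** Fraction of total time the collector may use: 1 / (1 + nnn). */
    static double gcFraction(int ratio) {
        return 1.0 / (1 + ratio);
    }

    public static void main(String[] args) {
        int[] ratios = {19, 99};
        for (int i = 0; i < ratios.length; i++) {
            System.out.println("-XX:GCTimeRatio=" + ratios[i]
                    + " -> GC goal " + (100.0 * gcFraction(ratios[i])) + "% of total time");
        }
    }
}
```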



The client environment includes running JBoss Application Server:

JBoss Enterprise Portal Platform 4.3 Tuning Guide
http://www.redhat.com/docs/en-US/Enterprise_Portal_Platform/4.3/pdf/Tuning_Guide.pdf


6.2. Garbage Collection (GC) Tuning
"Depending on nature of your application, adding -XX:+UseConcMarkSweepGC -XX:+UseParNewGC may optimize GC collection behavior."

JBoss 4.2 - TuneVMGarbageCollection
http://www.jboss.org/community/wiki/TuneVMGarbageCollection



JBoss Enterprise Application Platform Tuning (JBoss World 2008)
http://www.jbossworld.com/2008/downloads/pdf/thursday/JBOSS_10-1050am_JBoss_Enterprise_Application_Platform_Tuning_v2_Andy_Miller.pdf



Scaling Up the JBoss Application Server (LinuxWorld Open Solutions Summit 2007)
http://www.linuxworld.com/events/2007/slideshows/A2-johnson.pdf

Set the heap sizes
– Set min and max sizes to same value
– Set young generation to 1/3 size of heap
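
As a worked example of those two rules of thumb, the sketch below turns a total heap size into the corresponding -Xms, -Xmx, and -Xmn switches. The class name HeapFlags and the use of whole megabytes with integer division are my own assumptions; the slides give only the ratios.

```java
// Builds heap-sizing switches from the two rules of thumb above; names are hypothetical.
class HeapFlags {

    /** Same min and max heap, young generation at one third of the heap (rounded down). */
    static String forHeapMegabytes(int heapMb) {
        int youngMb = heapMb / 3; // integer division: rounds down to a whole megabyte
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m -Xmn" + youngMb + "m";
    }

    public static void main(String[] args) {
        System.out.println(forHeapMegabytes(1536));
    }
}
```

For a 1536 MB heap this yields a 512 MB young generation, leaving 1024 MB for the older generation.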
