Comprehensive Tuning Case Studies
Tuning Methodology
JVM tuning is not a matter of blindly adjusting parameters; it follows a repeatable, measurement-driven method:
Tuning Principles
- Optimize code first, then tune the JVM: Most performance issues stem from code, not JVM configuration
- Tune with data, not guesses: Base decisions on GC logs and monitoring data
- Change only one parameter at a time: Avoid multiple variables that make it impossible to determine causation
- Establish a baseline before tuning: Record pre-optimization metrics for comparison
- Verify tuning results under realistic load: numbers from an idle test environment do not carry over to production
Tuning Process
1. Define goals (throughput? latency? memory footprint?)
2. Establish baseline (collect current metrics)
3. Analyze bottlenecks (GC logs / monitoring / thread dumps)
4. Formulate plan (parameter adjustments / code optimization)
5. Implement and verify (A/B testing / canary deployment)
6. Compare results (against baseline)
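Steps 2 and 6 of the process can be sketched with the JDK's own management API. The helper below is a minimal illustration, not a replacement for GC logs or a monitoring stack; the class name `GcBaseline` and its methods are hypothetical names introduced here. It snapshots the cumulative GC counters of the running JVM and computes the relative change of a metric against its baseline.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: snapshots cumulative GC counters so two runs can be compared. */
final class GcBaseline {

    /** Returns collector name -> {collection count, total collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> result = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            result.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return result;
    }

    /** Relative change of a metric against its baseline, in percent (negative = reduction). */
    static double percentChange(double baseline, double current) {
        return (current - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, v) ->
            System.out.printf("%s: count=%d, timeMs=%d%n", name, v[0], v[1]));
    }
}
```

Taking one snapshot before and one after a fixed load test, then feeding the differences into `percentChange`, gives a like-for-like comparison against the baseline.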
Case 1: Spring Boot Web Service GC Tuning
Background
A Spring Boot e-commerce backend service, 4C8G container, JDK 17, G1GC.
Problem
- API P99 latency > 500ms, with occasional spikes > 2s
- Young GC frequency approximately every 3 seconds
- Sporadic Full GC causing service pauses
Analysis
# 1. Examine GC logs
# Young GC pause times
[GC pause (G1 Evacuation Pause) (young), 0.045 secs] # 45ms, acceptable
[GC pause (G1 Evacuation Pause) (young), 0.080 secs] # 80ms, on the high side
# Full GC
[Full GC (Allocation Failure) 4096M->2800M(4096M), 1.200 secs] # 1.2 seconds!
# 2. jstat analysis
jstat -gcutil <pid> 1000 10
# O (old generation utilization) consistently at 75-85%, triggering frequent concurrent marking
# FGC count 2-3 times per hour
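Reading pause times by eye does not scale beyond a few lines. As a rough sketch, the parser below extracts pause durations from lines shaped like the excerpt above. It assumes that exact `N.NNN secs]` suffix; unified `-Xlog:gc*` output uses a different layout and would need a different pattern. `GcPauseParser` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for log lines that end in "<seconds> secs]". */
final class GcPauseParser {
    // Captures the decimal number directly before " secs]".
    private static final Pattern PAUSE = Pattern.compile("([0-9]+\\.[0-9]+) secs\\]");

    /** Returns the pause of every matching line, in milliseconds; other lines are skipped. */
    static List<Long> pausesMillis(List<String> lines) {
        List<Long> out = new ArrayList<>();
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                out.add(Math.round(Double.parseDouble(m.group(1)) * 1000.0));
            }
        }
        return out;
    }

    /** Longest pause in milliseconds, or 0 for an empty list. */
    static long maxMillis(List<Long> pauses) {
        long max = 0;
        for (long p : pauses) {
            max = Math.max(max, p);
        }
        return max;
    }
}
```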
Diagnosis
- Root cause: The 4G heap is too small for this workload. Old generation utilization stays high, Mixed GC cannot reclaim space as fast as objects are promoted, and G1 falls back to Full GC
- Contributing factor: Large objects (serialized order-list JSON) are allocated directly in Humongous regions, which triggers GC cycles earlier than necessary
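Whether an allocation counts as humongous depends on the region size. The sketch below works through the arithmetic under two assumptions that should be checked against the JDK version in use: that G1 derives its default region size as heap / 2048, rounded down to a power of two and clamped to 1 MB..32 MB, and that an object is treated as humongous once it reaches half a region. `G1RegionMath` is a hypothetical name.

```java
/** Hypothetical helper illustrating G1 region-size arithmetic under stated assumptions. */
final class G1RegionMath {
    static final long MB = 1024L * 1024L;

    /** Assumed heuristic: heap / 2048, rounded down to a power of two, clamped to [1 MB, 32 MB]. */
    static long defaultRegionSize(long heapBytes) {
        long target = Math.max(heapBytes / 2048, 1);
        long powerOfTwo = Long.highestOneBit(target);
        return Math.min(Math.max(powerOfTwo, MB), 32 * MB);
    }

    /** Assumption: an allocation is humongous once it reaches half a region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes >= regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = defaultRegionSize(4096 * MB);
        System.out.println("Default region for a 4G heap: " + region / MB + " MB");
        System.out.println("3 MB object humongous at default size: " + isHumongous(3 * MB, region));
        System.out.println("3 MB object humongous at 8 MB regions: " + isHumongous(3 * MB, 8 * MB));
    }
}
```

Under these assumptions a 4G heap gets 2 MB regions, so a 3 MB JSON string is humongous, while with 8 MB regions it is allocated like any other object.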
Optimization Plan
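# Before
java -Xms4g -Xmx4g -XX:+UseG1GC -jar app.jar
# After
# - MaxGCPauseMillis=100: lower the pause target (200→100)
# - G1HeapRegionSize=8m: larger regions, so objects below about 4 MB are no longer humongous
# - InitiatingHeapOccupancyPercent=40: start concurrent marking earlier (45→40)
# - G1MixedGCCountTarget=12: spread old-region reclamation over more Mixed GCs (8→12)
# - G1MixedGCLiveThresholdPercent=80: only collect old regions below 80% live data (85→80);
#   experimental flag, requires -XX:+UnlockExperimentalVMOptions
# - SurvivorRatio=6: larger Survivor space (8→6)
java -Xms4g -Xmx4g \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=100 \
-XX:G1HeapRegionSize=8m \
-XX:InitiatingHeapOccupancyPercent=40 \
-XX:G1MixedGCCountTarget=12 \
-XX:+UnlockExperimentalVMOptions \
-XX:G1MixedGCLiveThresholdPercent=80 \
-XX:SurvivorRatio=6 \
-Xlog:gc*:file=/var/log/gc.log:time,uptime:filecount=5,filesize=20m \
-jar app.jar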
Code Optimization
// Before: large object serialization
String json = objectMapper.writeValueAsString(orders); // Large string when order count is high
// After: streaming write
objectMapper.writeValue(outputStream, orders);
// Before: unbounded cache
Map<String, Order> cache = new HashMap<>();
// After: bounded cache
Cache<String, Order> cache = Caffeine.newBuilder()
.maximumSize(10000)
.expireAfterWrite(Duration.ofMinutes(30))
.build();
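The Caffeine snippet needs the library on the classpath. To show only the bounding idea with the standard library, the sketch below swaps in a plain `LinkedHashMap` in access order with `removeEldestEntry`. This is a stand-in, not Caffeine's eviction policy: it has no time-based expiry and is not thread-safe. `BoundedLruCache` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stdlib stand-in for a size-bounded cache (LRU, single-threaded use only). */
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // true = iterate in access order, least recently used first
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every insertion; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }
}
```

Unlike the original `HashMap`, the number of retained entries, and with it the old-generation footprint of the cache, has a fixed upper bound.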
Results
| Metric | Before | After |
|---|---|---|
| P99 latency | 520ms | 85ms |
| Young GC pause | 45-80ms | 20-35ms |
| Full GC frequency | 2-3/hour | 0 |
| Old generation utilization | 75-85% | 50-65% |
Case 2: Spark Job Off-Heap Memory Tuning
Background
A Spark ETL job processing 500GB of data, JDK 11, running on YARN with 8G Executor memory.
Problem
- OOM after running for a while: java.lang.OutOfMemoryError: Direct buffer memory
- Severe GC pauses during shuffle stage
Analysis
# Executor log
ERROR Executor: Exception in task 123.0 in stage 45
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:175)
# GC log: long Mixed GC pauses
[GC pause (G1 Evacuation Pause) (mixed), 0.800 secs]
Diagnosis
- Root cause 1: Spark network transport uses Netty, which heavily uses off-heap DirectByteBuffer, exceeding -XX:MaxDirectMemorySize
- Root cause 2: Too many on-heap cached objects, poor Mixed GC reclaim efficiency
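Direct buffer usage is invisible in heap metrics, but the JVM exposes it through `BufferPoolMXBean`. The sketch below is a minimal probe that could be logged periodically inside an executor; `DirectMemoryProbe` is a hypothetical name, and how Spark or Netty should be instrumented in a given deployment is left open.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

/** Hypothetical probe reporting the bytes currently held by direct ByteBuffers. */
final class DirectMemoryProbe {

    /** Returns memory used by the "direct" buffer pool in bytes, or -1 if the pool is not found. */
    static long directBytesUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024); // 16 MB off-heap
        long after = directBytesUsed();
        System.out.printf("direct pool: %d -> %d bytes (buffer capacity %d)%n",
                before, after, buffer.capacity());
    }
}
```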
Optimization Plan
# Before
--conf spark.executor.memory=8g
--conf spark.executor.memoryOverhead=2g
# After
--conf spark.executor.memory=6g # Reduce heap memory (8→6G)
--conf spark.executor.memoryOverhead=4g # Increase off-heap memory (2→4G)
--conf spark.executor.extraJavaOptions="-XX:+UseG1GC -XX:MaxDirectMemorySize=3g -XX:MaxGCPauseMillis=100"
--conf spark.memory.fraction=0.5 # Reduce the unified execution/storage share of the heap (0.6→0.5)
--conf spark.memory.storageFraction=0.3 # Reduce the storage share within that region (0.5→0.3)
--conf spark.sql.shuffle.partitions=400 # Increase shuffle partition count
// Code optimization: reduce broadcast variable size
// Before
broadcast(bigLookupTable) // 2GB lookup table
// After
broadcast(filteredLookupTable) // Filtered to 200MB
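The effect of the memory settings is easier to judge as numbers. The sketch below assumes the sizing rule from Spark's tuning documentation: the unified region is (heap − 300 MiB reserved) × `spark.memory.fraction`, and `spark.memory.storageFraction` is the part of that region protected for storage. The heap Spark actually sees is slightly smaller than `spark.executor.memory`, so treat the results as approximations. `SparkMemoryMath` is a hypothetical name.

```java
/** Hypothetical helper for Spark unified-memory arithmetic (all sizes in MiB). */
final class SparkMemoryMath {
    static final double RESERVED_MIB = 300.0; // assumed fixed reservation

    /** Execution + storage region: (heap - reserved) * spark.memory.fraction. */
    static double unifiedMiB(double heapMiB, double memoryFraction) {
        return (heapMiB - RESERVED_MIB) * memoryFraction;
    }

    /** Storage share of the unified region: unified * spark.memory.storageFraction. */
    static double storageMiB(double heapMiB, double memoryFraction, double storageFraction) {
        return unifiedMiB(heapMiB, memoryFraction) * storageFraction;
    }

    public static void main(String[] args) {
        System.out.printf("Before: unified=%.1f MiB%n", unifiedMiB(8192, 0.6));
        System.out.printf("After:  unified=%.1f MiB, storage=%.1f MiB%n",
                unifiedMiB(6144, 0.5), storageMiB(6144, 0.5, 0.3));
    }
}
```

With a 6G heap and a fraction of 0.5 the unified region shrinks from roughly 4.6 GiB to roughly 2.9 GiB, which leaves more heap for user objects, while the container total (heap plus overhead) stays at 10G.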
Results
| Metric | Before | After |
|---|---|---|
| OOM frequency | Every run | Eliminated |
| Job duration | 3.5 hours | 2.1 hours |
| Total GC time | 45 minutes | 18 minutes |
Case 3: Containerized Microservice JVM Limits and Tuning
Background
50+ microservices running in a Kubernetes cluster, container limits 2C4G, JDK 17.
Problem
- Multiple services OOMKilled (container killed, not Java OOM)
- GC logs show heap utilization only 60%
- Container actual memory usage close to limit
Analysis
# View container memory usage
kubectl top pod <pod-name>
# NAME CPU MEMORY
# my-service 800m 3900Mi (limit 4Gi, 95% used)
# Java process memory breakdown (Native Memory Tracking)
jcmd <pid> VM.native_memory summary
# Findings:
# - Java Heap: 2.0G (-Xmx2g)
# - Class: 0.5G (metaspace)
# - Thread: 0.6G (300 threads × 2M/Xss)
# - Internal: 0.3G (Direct ByteBuffer)
# - Code: 0.2G (JIT code cache)
# - GC: 0.2G (GC data structures)
# Total: 3.8G → approaching 4G limit
Diagnosis
- Root cause: JVM off-heap memory + native memory + container overhead exceeds container limit
- -Xmx2g only limits the heap; off-heap memory is uncontrolled
- Thread stack Xss defaults to 1M, 300 threads = 300M
Optimization Plan
# Before
java -Xmx2g -jar app.jar
# After: container-aware parameters
# - UseContainerSupport: already enabled by default on JDK 17, listed for clarity
# - MaxRAMPercentage=50.0: heap = 50% × 4G = 2G
# - MaxMetaspaceSize=256m: limit metaspace
# - MaxDirectMemorySize=256m: limit direct memory
# - ReservedCodeCacheSize=128m: reduce code cache
# - Xss512k: reduce thread stack (1M→512K)
java -XX:+UseContainerSupport \
-XX:MaxRAMPercentage=50.0 \
-XX:InitialRAMPercentage=50.0 \
-XX:MaxMetaspaceSize=256m \
-XX:MaxDirectMemorySize=256m \
-XX:ReservedCodeCacheSize=128m \
-Xss512k \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=100 \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/tmp/heapdump.hprof \
-Xlog:gc*:file=/tmp/gc.log:time:filecount=3,filesize=10m \
-jar app.jar
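A quick budget check shows how much of the container limit the explicitly capped areas can claim. The sketch below adds up only the areas bounded by the flags above and uses the 300-thread figure from the analysis; GC data structures, allocator overhead, and other native allocations are not capped and must fit into the remaining headroom. `ContainerMemoryBudget` is a hypothetical name.

```java
/** Hypothetical helper: worst-case sum of the explicitly capped JVM memory areas, in MiB. */
final class ContainerMemoryBudget {

    static long cappedMiB(long limitMiB, double maxRamPercentage, long metaspaceMiB,
                          long directMiB, long codeCacheMiB, int threads, long stackKiB) {
        long heapMiB = (long) (limitMiB * maxRamPercentage / 100.0); // -XX:MaxRAMPercentage
        long stacksMiB = threads * stackKiB / 1024;                  // threads x -Xss
        return heapMiB + metaspaceMiB + directMiB + codeCacheMiB + stacksMiB;
    }

    public static void main(String[] args) {
        long limit = 4096;
        long capped = cappedMiB(limit, 50.0, 256, 256, 128, 300, 512);
        System.out.println("Capped areas: " + capped + " MiB, headroom: " + (limit - capped) + " MiB");
    }
}
```

With the values above the capped areas sum to 2838 MiB, which leaves 1258 MiB of the 4 GiB limit for everything the flags do not bound.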
Kubernetes Configuration
resources:
  requests:
    memory: "3Gi"
    cpu: "1.5"
  limits:
    memory: "4Gi"
    cpu: "2"
# Increase container memory headroom
# JVM heap 2G + off-heap ~1.5G + OS ~0.5G = 4G
Results
| Metric | Before | After |
|---|---|---|
| OOMKilled | 3-5 per week | 0 |
| Container memory usage | 3.9G/4G | 3.0G/4G |
| GC P99 pause | 120ms | 45ms |
Full-Stack Tuning Case Study
Scenario
An online education platform where users report slow video loading, API P99 > 3s.
Step 1: Monitoring and Detection
# Prometheus alerts
# - API service P99 > 3s
# - GC pause P99 > 500ms
# Grafana Dashboard
# - Heap utilization consistently at 85%
# - Young GC frequency every 2 seconds
# - Old generation steadily growing
Step 2: Analyze GC Logs
# Upload GC logs to GCEasy
# Analysis results:
# - Throughput: 92% (target > 98%)
# - Avg Young GC pause: 65ms
# - Max Young GC pause: 320ms
# - Full GC: 5 times in 2 hours
# - Object promotion rate: 200MB/s
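GC throughput is the share of wall-clock time not spent in GC pauses, so the percentages translate directly into lost seconds. The sketch below does that conversion, assuming the usual definition throughput = 1 − (GC time / elapsed time); `GcThroughput` is a hypothetical name.

```java
/** Hypothetical helper converting between GC throughput and GC time. */
final class GcThroughput {

    /** Throughput in percent for a given elapsed time and total GC pause time. */
    static double throughputPercent(double elapsedSeconds, double gcSeconds) {
        return (1.0 - gcSeconds / elapsedSeconds) * 100.0;
    }

    /** Total GC pause time implied by a throughput percentage over an elapsed time. */
    static double gcSeconds(double elapsedSeconds, double throughputPercent) {
        return elapsedSeconds * (1.0 - throughputPercent / 100.0);
    }

    public static void main(String[] args) {
        double twoHours = 2 * 3600;
        System.out.printf("92%% over 2h: %.0f s in GC%n", gcSeconds(twoHours, 92.0));
        System.out.printf("98%% over 2h: %.0f s in GC%n", gcSeconds(twoHours, 98.0));
    }
}
```

Over the two-hour window, 92% throughput corresponds to about 576 seconds spent in GC, against about 144 seconds at the 98% target.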
Step 3: Thread Analysis
# jstack reveals numerous BLOCKED threads
"pool-3-thread-15" #45 prio=5 os_prio=0 tid=0x... nid=0x... waiting for monitor entry
java.lang.Thread.State: BLOCKED (on object monitor)
at com.example.VideoService.getVideoInfo(VideoService.java:123)
- waiting to lock <0x...> (a java.lang.Object)
at com.example.VideoService.process(VideoService.java:89)
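The same signal that jstack shows can be counted from inside the JVM, which makes it usable as a metric. The sketch below uses `ThreadMXBean.dumpAllThreads` and counts threads in state `BLOCKED`; the `demo` method reproduces a single contended monitor so the count can be observed. `BlockedThreadCounter` is a hypothetical name, and exporting the value to a monitoring system is left out.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

/** Hypothetical helper counting threads that wait to enter a monitor. */
final class BlockedThreadCounter {

    /** Number of live threads currently in state BLOCKED. */
    static long countBlocked() {
        long blocked = 0;
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked++;
            }
        }
        return blocked;
    }

    /** Holds a lock while a second thread tries to take it, then reports the blocked count. */
    static long demo() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                // Nothing to do; the thread only needs to contend for the monitor.
            }
        }, "demo-waiter");
        long blocked;
        try {
            synchronized (lock) {
                waiter.start();
                while (waiter.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(1); // wait until the waiter is parked on the monitor
                }
                blocked = countBlocked();
            }
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked;
    }
}
```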
Step 4: Code Investigation
// Problematic code: synchronized lock + unbounded cache
public class VideoService {
    private Map<String, VideoInfo> cache = new HashMap<>();

    public synchronized VideoInfo getVideoInfo(String id) {
        VideoInfo info = cache.get(id);
        if (info == null) {
            info = loadFromDB(id); // Slow DB query, executed while holding the lock
            cache.put(id, info); // Cache grows unbounded
        }
        return info;
    }
}
Step 5: Implement Optimization
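// After optimization: async loading + bounded cache
import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

public class VideoService {
    // buildAsync returns an AsyncLoadingCache, whose get() yields a CompletableFuture
    private final AsyncLoadingCache<String, VideoInfo> cache = Caffeine.newBuilder()
            .maximumSize(50000)
            .expireAfterWrite(Duration.ofMinutes(30))
            .refreshAfterWrite(Duration.ofMinutes(10))
            .buildAsync(this::loadFromDB); // Loads and refreshes run off the caller's thread

    public CompletableFuture<VideoInfo> getVideoInfo(String id) {
        return cache.get(id);
    }
}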
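The key change in the Caffeine-based version is that callers no longer queue behind one global lock while the database is queried. To isolate that idea with the standard library only, the sketch below stores one `CompletableFuture` per key in a `ConcurrentHashMap`. It is a stand-in for illustration: it is unbounded, never expires or refreshes entries, and keeps failed loads cached, so it does not replace the bounded cache. `PerKeyLoader` is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical stdlib sketch: at most one load per key, no global lock. */
final class PerKeyLoader<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> futures = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    PerKeyLoader(Function<K, V> loader) {
        this.loader = loader;
    }

    CompletableFuture<V> get(K key) {
        // computeIfAbsent only registers the future; the slow load itself runs
        // asynchronously on the common pool, so other keys are never blocked by it.
        return futures.computeIfAbsent(key,
                k -> CompletableFuture.supplyAsync(() -> loader.apply(k)));
    }
}
```

Concurrent requests for the same id share one future, and requests for different ids proceed independently, which is what removes the BLOCKED threads seen in the thread dump.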
# JVM parameter adjustments
# Increase heap + optimize G1
java -Xms4g -Xmx4g \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=100 \
-XX:G1HeapRegionSize=8m \
-XX:InitiatingHeapOccupancyPercent=40 \
-jar app.jar
Step 6: Verify Results
| Metric | Before | After |
|---|---|---|
| API P99 latency | 3200ms | 180ms |
| GC throughput | 92% | 99.2% |
| Full GC | 5 times/2 hours | 0 |
| BLOCKED threads | 30+ | 0 |
Tuning Best Practices Summary
- Always enable GC logging and HeapDump — this is the foundation for tuning and troubleshooting
- Set -Xms = -Xmx to avoid jitter from dynamic heap expansion/contraction
- Use -XX:MaxRAMPercentage in container environments instead of fixed -Xmx
- Optimize code before tuning JVM: code issues cannot be solved with parameters
- Avoid explicit System.gc(): use -XX:+DisableExplicitGC to disable it
- Choose the right collector: G1/ZGC for latency-sensitive workloads, Parallel for throughput
- Monitor off-heap memory: Native Memory Tracking helps with investigation
- Tune incrementally: change one parameter at a time, compare against baseline
- Use canary deployments: validate tuning parameters on a subset of instances first
- Monitor continuously: tuning is not a one-time task — keep watching metric changes
Summary
This chapter demonstrated the full JVM tuning process through four case studies: from monitoring to detect issues, to analyzing GC logs and thread dumps to identify root causes, to formulating and implementing optimization plans, and finally verifying results. The core of tuning is not memorizing parameters, but mastering the “Monitor → Locate → Analyze → Optimize → Verify” methodology.