Object Creation & Memory Layout
Object Creation Process
When the JVM encounters the new bytecode instruction, object creation goes through the following steps:
new instruction
│
▼
① Class loading check ──→ Class not loaded? ──→ Execute class loading
│ Already loaded
▼
② Allocate memory
│
├── Space is contiguous? ──→ Bump the Pointer
│ └── Use CAS or TLAB for thread safety
│
└── Space is fragmented? ──→ Free List
└── Use CAS or TLAB for thread safety
│
▼
③ Initialize memory to zero
│
▼
④ Set object header (Mark Word + Klass Pointer)
│
▼
⑤ Execute <init> method (constructor)
1. Class Loading Check
The JVM first checks whether the new instruction’s parameter can locate a symbolic reference to the class in the constant pool, and whether the class has been loaded, resolved, and initialized. If not loaded, the class loading process must be executed first.
2. Memory Allocation
After class loading, the JVM allocates memory in the heap based on the object size. The allocation method depends on whether the heap memory is contiguous:
- Bump the Pointer: When heap memory is perfectly contiguous, used and free memory are on each side with a pointer in between. Allocating memory means moving the pointer toward the free side by the object size
- Free List: When heap memory is fragmented, the JVM maintains a list recording which memory blocks are available, and finds a large enough space from the list during allocation
Whether the heap stays contiguous depends in turn on whether the garbage collector compacts memory:
- Serial, ParNew, G1, ZGC (with compaction) → Bump the Pointer
- CMS (mark-sweep) → Free List
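The two allocation strategies can be sketched in a few lines of Java. This is a toy model, not HotSpot code: the heap is just a range of long offsets, and the class names (AllocationSketch, BumpAllocator, FreeListAllocator) and the first-fit search policy are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy allocators over an imaginary heap addressed by plain long offsets (hypothetical names). */
class AllocationSketch {

    /** Bump the pointer: everything below 'top' is used, everything above is free. */
    static final class BumpAllocator {
        private long top;
        private final long end;

        BumpAllocator(long start, long end) {
            this.top = start;
            this.end = end;
        }

        /** Returns the start address of the new object, or -1 if the space is full. */
        long allocate(long size) {
            if (top + size > end) {
                return -1;
            }
            long address = top;
            top += size; // the whole allocation is a single pointer move
            return address;
        }
    }

    /** Free list: first-fit search over the recorded free blocks. */
    static final class FreeListAllocator {
        // Each entry is {address, length}.
        private final List<long[]> freeBlocks = new ArrayList<>();

        void addFreeBlock(long address, long length) {
            freeBlocks.add(new long[] {address, length});
        }

        /** Returns the start address of the new object, or -1 if no block is large enough. */
        long allocate(long size) {
            for (int i = 0; i < freeBlocks.size(); i++) {
                long[] block = freeBlocks.get(i);
                if (block[1] >= size) {
                    long address = block[0];
                    block[0] += size; // carve the object off the front of the block
                    block[1] -= size;
                    if (block[1] == 0) {
                        freeBlocks.remove(i);
                    }
                    return address;
                }
            }
            return -1;
        }
    }
}
```

The bump allocator does constant work per allocation, while the free list has to search, which is one reason collectors that compact the heap can allocate faster.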
3. Concurrent Safety
Object creation is very frequent in the JVM, requiring thread safety:
- CAS + retry on failure: Atomic operation for allocation actions
- TLAB (Thread Local Allocation Buffer): Each thread pre-allocates a small private buffer in Eden and allocates inside it without synchronization; only when the TLAB is exhausted does the thread use CAS to claim a new one
# TLAB-related parameters
-XX:+UseTLAB # Enable TLAB (default: on)
-XX:TLABSize=256k # TLAB size
-XX:+PrintTLAB # Print TLAB information (JDK 8 and earlier; on JDK 9+ use -Xlog:gc+tlab=trace)
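As a rough sketch of how the two mechanisms fit together, the model below advances a shared Eden pointer with CAS and lets each thread carve its own buffer out of it. The names TlabSketch, SharedEden, and Tlab are hypothetical, and the refill policy (discard the remainder of the old buffer, send oversized requests straight to Eden) is a simplifying assumption rather than HotSpot's actual heuristics.

```java
import java.util.concurrent.atomic.AtomicLong;

class TlabSketch {

    /** Shared Eden: the top pointer is advanced with CAS and retried on contention. */
    static final class SharedEden {
        private final AtomicLong top = new AtomicLong(0);
        private final long end;

        SharedEden(long capacity) {
            this.end = capacity;
        }

        long allocate(long size) {
            while (true) {
                long oldTop = top.get();
                long newTop = oldTop + size;
                if (newTop > end) {
                    return -1; // Eden exhausted
                }
                if (top.compareAndSet(oldTop, newTop)) {
                    return oldTop;
                }
                // Another thread moved the pointer first: loop and retry.
            }
        }
    }

    /** Per-thread buffer: allocation inside it needs no synchronization. */
    static final class Tlab {
        private final SharedEden eden;
        private final long tlabSize;
        private long top; // both start at 0, so the first request triggers a refill
        private long end;

        Tlab(SharedEden eden, long tlabSize) {
            this.eden = eden;
            this.tlabSize = tlabSize;
        }

        long allocate(long size) {
            if (size > tlabSize) {
                return eden.allocate(size); // too large for a TLAB: allocate in shared Eden
            }
            if (top + size > end) {
                long start = eden.allocate(tlabSize); // the only step that needs CAS
                if (start < 0) {
                    return -1;
                }
                top = start;
                end = start + tlabSize;
            }
            long address = top;
            top += size;
            return address;
        }
    }
}
```

Each Tlab instance is meant to be owned by a single thread, so the common path is a plain pointer bump and contention is limited to the occasional refill.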
4. Memory Initialization to Zero
After memory allocation, the JVM initializes the allocated memory space to zero (excluding the object header). This ensures that object instance fields can be used without explicit initialization:
// Instance fields read as their type's zero value until they are assigned
int count; // Automatically initialized to 0
boolean flag; // Automatically initialized to false
Object ref; // Automatically initialized to null
5. Set Object Header
After zero initialization, the JVM sets the object header information, including:
- Which class the object is an instance of
- How to find the class metadata
- The object’s GC age
- Lock state information
6. Execute Constructor
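From the JVM’s perspective, a new object now exists, but the <init> method (constructor) has not run yet and every field still holds its zero value. From the programmer’s perspective, initialization only begins at this point: the invokespecial instruction that follows new calls <init>, which sets the fields as the constructor specifies. Only after that is the object fully usable.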
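The gap between zeroing and constructor execution can be observed from plain Java. In the sketch below (class names are made up for illustration), the superclass constructor calls an overridden method before the subclass's field initializers have run, so it sees the zeroed memory.

```java
class InitOrderDemo {

    static class Base {
        final String seenDuringSuperConstructor;

        Base() {
            // Runs before Derived's field initializers; virtual dispatch
            // still reaches Derived.describe().
            seenDuringSuperConstructor = describe();
        }

        String describe() {
            return "base";
        }
    }

    static class Derived extends Base {
        int count = 42;
        String name = "ready";

        @Override
        String describe() {
            return count + "/" + name;
        }
    }

    public static void main(String[] args) {
        Derived d = new Derived();
        System.out.println(d.seenDuringSuperConstructor); // prints 0/null
        System.out.println(d.describe());                 // prints 42/ready
    }
}
```

The first line shows the state left by step 4 (zeroed fields); the second shows the state after <init> has finished.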
Object Memory Layout
In HotSpot, the object’s memory layout is divided into three areas: Header, Instance Data, and Padding.
┌─────────────────────────────────────┐
│ Object Header │
│ ┌────────────────────────────────┐ │
│ │ Mark Word (32/64 bit) │ │
│ │ - Hash code, GC age, lock │ │
│ ├────────────────────────────────┤ │
│ │ Klass Pointer (32/64 bit) │ │
│ │ - Pointer to class metadata │ │
│ ├────────────────────────────────┤ │
│ │ Array Length (optional) │ │
│ │ - Only for array objects │ │
│ └────────────────────────────────┘ │
├─────────────────────────────────────┤
│ Instance Data │
│ - Fields inherited from parent │
│ - Fields defined in this class │
├─────────────────────────────────────┤
│ Padding │
│ - Ensure object size is multiple │
│ of 8 bytes │
└─────────────────────────────────────┘
Mark Word
The Mark Word stores the object’s own runtime data, with lengths of 32 bits and 64 bits in 32-bit and 64-bit JVMs respectively. It is key to implementing lightweight locks and biased locking.
64-bit JVM Mark Word Layout:
| Storage Content | Biased Bit | Flag Bits | State |
|---|---|---|---|
| Object hash code, GC generational age | 0 | 01 | Unlocked |
| Biased thread ID, epoch (bias timestamp), GC generational age | 1 | 01 | Biased lock |
| Pointer to lock record | - | 00 | Lightweight lock |
| Pointer to heavyweight lock (monitor) | - | 10 | Heavyweight lock (mutex) |
| Empty | - | 11 | GC mark |
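To make the table concrete, here is a small decoder for a 64-bit Mark Word value. It assumes the classic HotSpot bit positions (lock flag in bits 0-1, biased bit in bit 2, GC age in bits 3-6, identity hash in bits 8-38); these positions are an assumption about the JVM version in use, and the class MarkWordSketch is purely illustrative. Java code cannot read a real Mark Word this way without a tool such as JOL.

```java
class MarkWordSketch {

    /** Classifies a mark word by its two flag bits plus the biased bit. */
    static String lockState(long mark) {
        int flagBits = (int) (mark & 0b11);
        boolean biasedBit = ((mark >>> 2) & 1) == 1;
        switch (flagBits) {
            case 0b01:
                return biasedBit ? "biased" : "unlocked";
            case 0b00:
                return "lightweight-locked";
            case 0b10:
                return "heavyweight-locked";
            default:
                return "gc-marked";
        }
    }

    /** GC generational age: 4 bits, so at most 15. Meaningful only when the flag bits are 01. */
    static int age(long mark) {
        return (int) ((mark >>> 3) & 0xF);
    }

    /** Identity hash code: 31 bits. Meaningful only in the unlocked state. */
    static int identityHash(long mark) {
        return (int) ((mark >>> 8) & 0x7FFF_FFFFL);
    }

    public static void main(String[] args) {
        // Hypothetical unlocked mark word: hash 0x1234, age 5, biased bit 0, flag bits 01.
        long mark = (0x1234L << 8) | (5L << 3) | 0b001;
        System.out.println(lockState(mark));                      // prints unlocked
        System.out.println(age(mark));                            // prints 5
        System.out.println(Integer.toHexString(identityHash(mark))); // prints 1234
    }
}
```

The 4-bit age field is why the maximum tenuring threshold is 15.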
Klass Pointer
The Klass Pointer points to the object’s type metadata (InstanceKlass). The JVM uses this pointer to determine which class the object belongs to. On a 64-bit JVM it occupies 4 bytes with compressed class pointers enabled and 8 bytes otherwise.
Array Length
Only array objects have this field, which records the array length. The JVM can determine the size of a regular object from its class metadata alone, but an array’s size also depends on its length, which the metadata does not contain.
Instance Data
Instance data is the actual effective information stored in the object — the various field contents defined in the code, including those inherited from parent classes and defined in the current class.
Field storage order is influenced by the -XX:FieldsAllocationStyle parameter. Default strategy:
- First store primitive types (long/double > int/float > short/char > byte/boolean)
- Then store reference types
Parent class fields are laid out before child class fields. With -XX:+CompactFields (enabled by default), narrower child class fields may be placed into gaps left among the parent class fields to save space.
Padding
HotSpot requires every object’s size to be a multiple of 8 bytes. On a 64-bit JVM the object header is 12 bytes with compressed class pointers enabled (16 bytes without), so whenever header plus instance data does not add up to a multiple of 8, padding fills the remainder.
Compressed Oops
Why Compressed Oops Are Needed
In a 64-bit JVM, an uncompressed object reference occupies 8 bytes instead of the 4 bytes used on a 32-bit JVM, so the same application can need roughly 1.5 times as much heap. More memory means:
- Increased GC workload
- Reduced cache hit rate
- Increased memory bandwidth pressure
Compression Principle
Compressed Oops compress 64-bit object references into 32 bits:
On store: reference = (actual address - heap base address) >> 3
On use: actual address = heap base address + (reference << 3)
Leveraging the 8-byte alignment property of objects, the lower 3 bits are always 0 and don’t need to be stored. Therefore, 32-bit references can address 2^32 × 8 = 32GB of heap space.
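The store and use formulas translate directly into shift arithmetic. The sketch below is a model under stated assumptions: heapBase is a hypothetical value passed in by the caller, the shift is fixed at 3 for 8-byte alignment, and the special handling a real JVM needs for null references is left out.

```java
class CompressedOopSketch {

    static final int SHIFT = 3; // log2 of the 8-byte object alignment

    /** On store: drop the heap base and the three always-zero low bits. */
    static int encode(long address, long heapBase) {
        return (int) ((address - heapBase) >>> SHIFT);
    }

    /** On use: restore the low bits and add the heap base back. */
    static long decode(int narrow, long heapBase) {
        // Mask so the 32 stored bits are treated as unsigned.
        return heapBase + ((narrow & 0xFFFF_FFFFL) << SHIFT);
    }

    /** 2^32 distinct references, each naming an 8-byte-aligned slot. */
    static long maxAddressableBytes() {
        return (1L << 32) << SHIFT;
    }

    public static void main(String[] args) {
        long heapBase = 0x8_0000_0000L;           // hypothetical heap base
        long address = heapBase + 24;             // an 8-byte-aligned object address
        int narrow = encode(address, heapBase);
        System.out.println(narrow);                               // prints 3
        System.out.println(decode(narrow, heapBase) == address);  // prints true
        System.out.println(maxAddressableBytes() >> 30);          // prints 32 (GB)
    }
}
```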
Enabling Conditions
- Enabled by default when heap size < 32GB
- Automatically disabled when heap size ≥ 32GB (32-bit references cannot address the entire heap)
- Can be explicitly enabled via -XX:+UseCompressedOops (on by default)
- Combined with -XX:+UseCompressedClassPointers to compress the Klass Pointer
# Check compressed pointer status
java -XX:+PrintFlagsFinal -version | grep Compressed
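The same flags can also be read from inside a running program. The sketch below uses the HotSpotDiagnosticMXBean from the com.sun.management package, which is available on HotSpot-based JDKs; on other JVMs, or for flags the JVM does not know, the lookup fails, so the helper (a hypothetical name, flagValue) falls back to "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class CompressedFlagsCheck {

    /** Returns the current value of a HotSpot VM flag, or "unavailable" if it cannot be read. */
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Non-HotSpot JVM, 32-bit JVM, or unknown flag name.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + flagValue("UseCompressedOops"));
        System.out.println("UseCompressedClassPointers = " + flagValue("UseCompressedClassPointers"));
    }
}
```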
Object Size Calculation
On a 64-bit JVM with compressed pointers enabled:
// A simple object
class Simple {
int id; // 4 bytes
}
// Object header: Mark Word(8) + Klass Pointer(4) = 12 bytes
// Instance data: int(4) = 4 bytes
// Padding: 0 bytes (12 + 4 = 16, already aligned)
// Total: 16 bytes
// Object with reference
class WithRef {
int id; // 4 bytes
Object ref; // 4 bytes (compressed pointer)
}
// Object header: 12 bytes
// Instance data: 4 + 4 = 8 bytes
// Padding: 4 bytes (12 + 8 = 20, rounded up to the next multiple of 8)
// Total: 24 bytes
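The arithmetic above can be wrapped in a small estimator. This is only an approximation under the stated assumptions (64-bit JVM, compressed class pointers, 8-byte alignment, non-array objects): it sums field sizes and ignores gaps the JVM may leave to align individual fields, so for exact numbers a tool such as OpenJDK's JOL is the better choice. The class and method names are hypothetical.

```java
class ObjectSizeSketch {

    static final int MARK_WORD = 8;         // bytes
    static final int COMPRESSED_KLASS = 4;  // bytes, with compressed class pointers
    static final int ALIGNMENT = 8;         // HotSpot default object alignment

    /** Rounds size up to the next multiple of ALIGNMENT. */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) & ~(ALIGNMENT - 1L);
    }

    /** Estimated shallow size: header + sum of field sizes, padded to the alignment. */
    static long shallowSize(int... fieldSizes) {
        long size = MARK_WORD + COMPRESSED_KLASS;
        for (int fieldSize : fieldSizes) {
            size += fieldSize;
        }
        return alignUp(size);
    }

    public static void main(String[] args) {
        System.out.println(shallowSize(4));     // Simple  { int id; }             prints 16
        System.out.println(shallowSize(4, 4));  // WithRef { int id; Object ref; } prints 24
        System.out.println(shallowSize());      // no fields: 12-byte header       prints 16
    }
}
```

Note that an object with no fields at all still costs 16 bytes, because the 12-byte header is padded by 4.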
Object Access
Java programs operate on objects in the heap through reference data on the stack. The JVM specification defines the reference type only as a reference to an object; it does not prescribe how that reference locates the object, so the access method is up to the implementation. There are two mainstream approaches:
Handle Access
reference → ┌──────────────┐
│ Handle Pool │
│ ┌──────────┐ │ ┌────────────────┐
│ │Instance ├─┼────→│ Heap Object │
│ │data ptr │ │ │ Instance │
│ ├──────────┤ │ └────────────────┘
│ │Type data ├─┼─┐ ┌────────────────┐
│ │pointer │ │ └──→│ Method Area │
│ └──────────┘ │ │ Type Data │
└──────────────┘ └────────────────┘
- Advantage: The reference stores a stable handle address; when objects are moved (during GC), only the instance data pointer in the handle needs to be updated
- Disadvantage: One additional level of indirection overhead
Direct Pointer Access
reference → ┌────────────────┐ ┌────────────────┐
│ Heap Object │────→│ Method Area │
│ Instance │ │ Type Data │
└────────────────┘ └────────────────┘
- Advantage: Fast, one less level of indirection
- Disadvantage: When objects are moved, the reference itself needs to be updated
HotSpot uses direct pointer access because object access is extremely frequent, and eliminating one level of indirection is significant. The GC is responsible for updating references when objects are moved.
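The trade-off between the two access methods can be modeled with two maps. Everything here is a toy: addresses are plain long keys, the "heap" is a HashMap, and ObjectAccessSketch with its methods is a hypothetical construction meant only to show which side has to be patched when a compacting GC moves an object.

```java
import java.util.HashMap;
import java.util.Map;

class ObjectAccessSketch {

    final Map<Long, String> heap = new HashMap<>();        // address -> instance data
    final Map<Integer, Long> handlePool = new HashMap<>(); // handle -> current address

    /** Handle access: two lookups per read (handle -> address -> data). */
    String readViaHandle(int handle) {
        return heap.get(handlePool.get(handle));
    }

    /** Direct pointer access: the reference is the address, one lookup per read. */
    String readDirect(long reference) {
        return heap.get(reference);
    }

    /** Simulates a compacting GC moving one object; returns the new address. */
    long move(long from, long to) {
        heap.put(to, heap.remove(from));
        // Handle fix-up: one pool entry per object, however many references exist.
        handlePool.replaceAll((handle, address) -> address == from ? to : address);
        return to;
    }

    public static void main(String[] args) {
        ObjectAccessSketch vm = new ObjectAccessSketch();
        vm.heap.put(0x1000L, "point(1,2)");
        vm.handlePool.put(7, 0x1000L);
        long[] directRefs = {0x1000L, 0x1000L}; // two stack slots pointing at the object

        long newAddress = vm.move(0x1000L, 0x2000L);

        System.out.println(vm.readViaHandle(7));          // handle 7 is still valid
        System.out.println(vm.readDirect(directRefs[0])); // stale: prints null
        for (int i = 0; i < directRefs.length; i++) {
            directRefs[i] = newAddress;                   // the GC must patch every reference
        }
        System.out.println(vm.readDirect(directRefs[0])); // valid again
    }
}
```

With handles the move touches one pool entry; with direct pointers it touches every reference, which is the cost HotSpot accepts in exchange for faster reads.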
Summary
This chapter analyzed object creation and memory layout in the JVM. Understanding the Mark Word structure is the foundation for mastering Java lock mechanisms; understanding compressed oops helps optimize memory usage; understanding object access methods helps understand GC’s impact on applications. The next chapter will cover garbage collection algorithms.