Tuesday, August 11, 2009

Java: Garbage Collection

Context:
I was asked how garbage collection works in Java, so I started gathering details by searching online. The notes compiled here are based on that research. Please note that the question I was asked was very generic, and the details below touch upon the topic only at a very high level.


What is Garbage Collection?
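Garbage collection is the mechanism by which the Java Virtual Machine (JVM) automatically reclaims memory held by objects that the program can no longer use. The classic approach, and the one described below, is mark and sweep. Garbage collection relieves programmers from writing code to free the memory allocated to objects.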

The mechanism is also part of Java's safety strategy. Because a Java program cannot free memory explicitly, it cannot crash the JVM instance through memory hazards such as using a dangling pointer or freeing the same memory twice.

This service does not come for free: the collector uses CPU time and may pause the application's threads while it works. Keep in mind that the scale of the application, in particular how many objects it allocates and how large its heap is, determines the magnitude of this overhead.
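A rough way to observe this overhead on a running JVM is the standard `java.lang.management` API: each `GarbageCollectorMXBean` reports how many collections it has run and the approximate accumulated time spent in them. The class name `GcOverheadProbe` and its allocation loop are hypothetical, and the figures printed will vary with the JVM version, the selected collector and the heap size.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverheadProbe {

    // Hypothetical sink: each assignment makes the previously stored array garbage.
    static volatile byte[] sink;

    /** Allocates short-lived garbage, then sums the collection time each collector reports. */
    static long totalCollectionMillis() {
        for (int i = 0; i < 200_000; i++) {
            sink = new byte[1024]; // roughly 200 MB of short-lived allocations in total
        }

        long totalMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the JVM does not report it
            long millis = gc.getCollectionTime(); // approximate accumulated time in ms, or -1
            System.out.printf("%s: %d collections, %d ms%n", gc.getName(), count, millis);
            if (millis > 0) {
                totalMillis += millis;
            }
        }
        return totalMillis;
    }

    public static void main(String[] args) {
        System.out.println("Total GC time: " + totalCollectionMillis() + " ms");
    }
}
```

Running the same probe against a version of the program that allocates fewer temporary objects is a simple way to see how allocation volume drives collector activity.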

How does it work?

In the mark phase, the garbage collector traverses the objects on the heap, starting from the roots, and marks every object it can reach. The objects left unmarked are eligible for garbage collection. In the sweep phase, the garbage collector reclaims the heap space occupied by the unmarked objects and makes it available to the executing program again.
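The two phases can be sketched with a toy model in which the "heap" is just a list of Java objects that record their outgoing references, and the roots are a separate list. `ToyHeap`, `Obj`, `mark` and `sweep` are hypothetical names for illustration; a real JVM works on raw memory and uses far more sophisticated marking, but the basic idea of mark-then-sweep is the same.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical toy heap: a list of objects plus a list of root references. */
class ToyHeap {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;                           // set during the mark phase

        Obj(String name) { this.name = name; }
    }

    final List<Obj> objects = new ArrayList<>(); // every allocated object
    final List<Obj> roots = new ArrayList<>();   // stand-ins for locals and static fields

    Obj allocate(String name) {
        Obj o = new Obj(name);
        objects.add(o);
        return o;
    }

    /** Mark phase: flag every object reachable from a root. */
    void mark() {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue; // already visited, which also handles cycles
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    /** Sweep phase: drop unmarked objects, reset marks on survivors, return how many were freed. */
    int sweep() {
        int before = objects.size();
        objects.removeIf(o -> !o.marked);
        for (Obj o : objects) o.marked = false; // ready for the next collection
        return before - objects.size();
    }

    int collect() {
        mark();
        return sweep();
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        Obj c = heap.allocate("c");
        heap.allocate("d"); // never referenced: garbage
        a.refs.add(b);      // root -> a -> b
        c.refs.add(c);      // c refers only to itself: still garbage
        heap.roots.add(a);
        System.out.println("freed " + heap.collect() + " objects"); // freed 2 objects
    }
}
```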

The garbage collector determines whether an object is eligible by measuring its "reachability" from a set of roots, such as local variables in active method frames and static fields. Any object referenced by a root is a live object. Any object referenced by a live object is also live. Objects that are not reachable are dead, and their heap space will be reclaimed.
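Reachability can also be observed, within limits, on a real JVM using `java.lang.ref.WeakReference`, which does not keep its referent alive. In the sketch below (class and method names are hypothetical), an object reachable from a static field survives a collection, while two objects that refer only to each other form an unreachable cycle. Because `System.gc()` is only a request, the JVM is not obliged to clear the unreachable cycle on that particular call; only the survival of the reachable object is guaranteed.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

class ReachabilityDemo {

    // Static fields are GC roots: whatever they reference stays live.
    static final List<Object> ROOTS = new ArrayList<>();

    static final class Node {
        Node next; // a reference to another heap object
    }

    /** Returns {reachableSurvived, unreachableCleared}. */
    static boolean[] run() {
        // root -> a -> b: b is live because it is reachable through a.
        Node a = new Node();
        a.next = new Node();
        ROOTS.add(a);
        WeakReference<Node> reachable = new WeakReference<>(a.next);

        // c <-> d: a cycle that no root leads to, so both are dead
        // even though each is still referenced by the other.
        Node c = new Node();
        Node d = new Node();
        c.next = d;
        d.next = c;
        WeakReference<Node> unreachable = new WeakReference<>(c);
        c = null;
        d = null;

        System.gc(); // only a request; the JVM may ignore it

        boolean reachableSurvived = reachable.get() != null;    // guaranteed
        boolean unreachableCleared = unreachable.get() == null; // likely, not guaranteed
        return new boolean[] {reachableSurvived, unreachableCleared};
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("reachable object survived: " + result[0]);
        System.out.println("unreachable cycle cleared: " + result[1]);
    }
}
```

The cycle illustrates why tracing from roots matters: each object in it is still referenced by the other, so a naive reference count would never drop to zero, yet neither can be reached from a root.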

Garbage collectors use various algorithms to mark "reachable" objects, but those would be a subject in themselves.

Garbage Collection Strategies

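Garbage collectors may use compacting or copying strategies to deal with heap fragmentation. After a sweep, free memory is scattered in small holes between live objects; moving the live objects together leaves one large contiguous free region for new allocations.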
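As an illustration of the compacting strategy, here is a toy model of sliding compaction in which the heap is an array of slots and `null` marks free space. `ToyCompactor` and `compact` are hypothetical names. The forwarding table it returns stands in for the step where a real collector rewrites every reference to a moved object. A copying collector reaches a similar end state by evacuating live objects into a separate, empty region instead of sliding them in place.

```java
import java.util.Arrays;

/** Hypothetical toy model of sliding compaction over an array of heap slots. */
class ToyCompactor {

    /**
     * Slides live (non-null) slots to the front, preserving their order.
     * Returns the forwarding table: forward[oldIndex] = newIndex, or -1 for free slots.
     */
    static int[] compact(Object[] heap) {
        int[] forward = new int[heap.length];
        int free = 0; // next slot to move a live object into
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] == null) {
                forward[i] = -1;
            } else {
                forward[i] = free;
                heap[free++] = heap[i];
            }
        }
        // Everything after the last live object is now one contiguous free block.
        Arrays.fill(heap, free, heap.length, null);
        return forward;
    }

    public static void main(String[] args) {
        Object[] heap = {"A", null, "B", null, null, "C"};
        int[] forward = compact(heap);
        System.out.println(Arrays.toString(heap));    // [A, B, C, null, null, null]
        System.out.println(Arrays.toString(forward)); // [0, -1, 1, -1, -1, 2]
    }
}
```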

What are generations?

Generations group objects by their life span. Most objects die young: an object referenced only by a local variable, for example, typically becomes unreachable as soon as the method returns, so it can be identified and reclaimed early. New objects are allocated in the young generation. The old (tenured) generation holds objects that have survived multiple collections. The young and old generations are separate memory pools in which such objects are stored.

When the young generation fills up, a minor collection occurs; when the tenured generation fills up, a major collection occurs. Note that the latter is a much slower process, because it involves all live objects.
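A toy model makes the minor/major distinction concrete. In this sketch, objects start in the young generation, each minor collection ages the survivors, and an object that survives `tenuringThreshold` minor collections is promoted to the old generation. All names are hypothetical, and liveness is passed in as a set rather than computed by tracing, to keep the focus on how objects move between generations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical toy model of a generational heap with a tenuring threshold. */
class ToyGenerationalHeap {

    static final class Obj {
        final String name;
        int age; // number of minor collections survived
        Obj(String name) { this.name = name; }
    }

    final int tenuringThreshold;
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    ToyGenerationalHeap(int tenuringThreshold) {
        this.tenuringThreshold = tenuringThreshold;
    }

    Obj allocate(String name) {
        Obj o = new Obj(name);
        young.add(o); // new objects start in the young generation
        return o;
    }

    /** Minor collection: only the young generation is examined. */
    void minorCollect(Set<Obj> live) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!live.contains(o)) continue; // dead young object: reclaimed
            o.age++;
            if (o.age >= tenuringThreshold) {
                old.add(o);                  // survived long enough: promote
            } else {
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    /** Major collection: the old generation is examined as well. */
    void majorCollect(Set<Obj> live) {
        minorCollect(live);
        old.removeIf(o -> !live.contains(o));
    }

    public static void main(String[] args) {
        ToyGenerationalHeap heap = new ToyGenerationalHeap(2);
        Obj cache = heap.allocate("cache"); // long-lived
        heap.allocate("temp1");             // dies young
        heap.minorCollect(Collections.singleton(cache));
        heap.allocate("temp2");             // dies young
        heap.minorCollect(Collections.singleton(cache));
        System.out.println(heap.young.size() + " young, " + heap.old.size() + " old"); // 0 young, 1 old
    }
}
```

With a threshold of 2, `cache` survives two minor collections and is promoted, while `temp1` and `temp2` are each reclaimed by the minor collection that follows their allocation. Only `majorCollect` ever looks at the old generation, which mirrors why major collections cost more.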

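A Java program's objects are allocated on the heap, which all threads share. Every thread has its own private stack of method frames, and local variables are part of those frames. A local variable of a reference type holds only a reference to an object on the heap, which is why local variables act as roots for the garbage collector.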
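The split between private stacks and the shared heap can be seen with two threads: each thread counts in a local variable that lives in its own stack frame, while both increment the same `AtomicInteger` object on the heap. `StackVsHeap` and `run` are hypothetical names for this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

class StackVsHeap {

    /** Returns {thread 0's local count, thread 1's local count, shared heap count}. */
    static int[] run(int iterations) {
        AtomicInteger shared = new AtomicInteger(); // one object on the shared heap
        int[] perThread = new int[2];               // heap array used only to report results

        Thread[] threads = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                int local = 0; // lives in this thread's own stack frame
                for (int i = 0; i < iterations; i++) {
                    local++;
                    shared.incrementAndGet(); // both threads update the same heap object
                }
                perThread[id] = local;
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join(); // join also makes each thread's writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return new int[] {perThread[0], perThread[1], shared.get()};
    }

    public static void main(String[] args) {
        int[] r = run(10_000);
        System.out.println("locals: " + r[0] + ", " + r[1] + "; shared: " + r[2]);
        // locals: 10000, 10000; shared: 20000
    }
}
```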

Conclusion:

Although the JVM hides memory management under the hood, knowing how it works helps developers write code more consciously, with scalability in mind.

More (Technical) resources:

  1. Sun describes the command-line options for improving the JVM's performance here: http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp
  2. Details on tuning garbage collection performance in Java 1.5 are given here: http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

1 comment:

  1. We also faced a lot of trouble due to GC. This post is nice and informative, and does cover GC in detail.

    One thing missing is, how to write programs that can work efficiently, with minimum overhead, even after handling gigantic scale.

    If I write an article on it, I'll surely leave a link here.

    Regards,
    Anurag.