Damodar's Musings

web development and miscellany

Now that the basics of garbage collection are behind us, in this post I’ll discuss the garbage collectors that are available for us to choose from.

Before we go there, let’s first look at two key measures that will help us evaluate the available garbage collectors. The time spent by an executing application can be expressed as the sum of the time spent by the application in actually doing useful work, and the time that the application was forced to wait for garbage collection activities to complete.

This can be expressed as:

Total Execution Time = Useful Time + Paused Time

This brings us to our definitions:

  1. Throughput:
    This is defined as the percentage of total time that is not spent in garbage collection, i.e., [Useful Time] / [Total Execution Time].
  2. Latency:
    This is defined as the average of all the [Paused Time] values over the execution of the application. (Both measures are illustrated in the sketch following this list.)
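To make these two measures concrete, here is a minimal sketch (my own illustration – the class and the numbers are hypothetical) that computes both from a set of observed pause times:

public class GcMetrics {

    // Throughput = [Useful Time] / [Total Execution Time]
    static double throughput(double totalMs, double[] pauseMs) {
        double paused = 0;
        for (double p : pauseMs) paused += p;
        return (totalMs - paused) / totalMs;
    }

    // Latency = the average of the individual [Paused Time] values
    static double latency(double[] pauseMs) {
        double paused = 0;
        for (double p : pauseMs) paused += p;
        return paused / pauseMs.length;
    }

    public static void main(String[] args) {
        double[] pauses = {120, 80, 200};               // observed GC pauses, in ms
        System.out.println(throughput(60000, pauses));  // 0.993... for a 60 second run
        System.out.println(latency(pauses));            // 133.3... ms
    }
}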

Latency is usually a major concern for highly interactive or real time applications, where a delay in processing is noticeable and potentially significant. On the other hand, for server side web applications, a bigger concern is throughput, since the latency introduced by garbage collection may be dwarfed by the latencies introduced by other contributors such as database or network access.

It is important to note that it is hard for a garbage collector to maximize both measures. For instance, throughput can be enhanced by using a very generously sized young generation (which reduces the frequency at which GC cycles are run). However, this can adversely impact latency when garbage collection does occur, as the time per garbage collection cycle is directly proportional to the size of the area of the heap being managed.

Garbage Collectors

The following garbage collectors are available for our use, and can be configured using the appropriate JVM switches.

Each collector below is listed with the generational area it manages:

  1. Serial (Young):
    A Stop-the-World copying collector that uses a single GC thread (works as described in the previous article).
  2. Serial Old (MSC) (Old):
    A Stop-the-World Mark-Sweep-Compact (MSC) collector that uses a single GC thread.
  3. Parallel Scavenge (Young):
    A Stop-the-World copying collector that uses multiple GC threads. It provides higher throughput by executing GC tasks in parallel with each other (but not with the application). It cannot run during the concurrent phases of CMS.
  4. Parallel New (Young):
    As Parallel Scavenge, but it can run during the concurrent phases of CMS.
  5. Parallel Old / Parallel Compacting (Old):
    Similar to Parallel Scavenge, but operates on the old generation; it uses multiple GC threads to speed up the work of Serial Old (MSC). It is a Stop-the-World collector, but provides higher throughput for old generation collections.
  6. Concurrent Mark-Sweep (CMS) (Old):
    Breaks up its work into phases, and executes most of its phases concurrently with the application threads – resulting in low latency. However, it introduces substantial management overhead and leaves the heap fragmented.

Selecting a Garbage Collector

It is important to note that you can select a different collector to manage each generation. For instance, to use the Parallel Scavenge collector for the Young Generation and the Serial Old collector for the Old Generation, you would use the following switch:

java -XX:+UseParallelGC

Each switch selects the collectors used for the young and old generations:

  1. -XX:+UseSerialGC: Serial (young) with Serial Old (MSC) (old)
  2. -XX:+UseParNewGC: ParNew (young) with Serial Old (MSC) (old)
  3. -XX:+UseConcMarkSweepGC: ParNew (young) with CMS (old) – the mostly used combination; Serial Old is used when a concurrent mode failure occurs
  4. -XX:+UseParallelGC: Parallel Scavenge (young) with Serial Old (old)
  5. -XX:+UseParallelOldGC: Parallel Scavenge (young) with Parallel Old (old)
  6. -XX:+UseConcMarkSweepGC -XX:-UseParNewGC: Serial (young) with CMS (old); Serial Old as the fallback
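For instance, here are two complete (hypothetical) invocations that combine heap sizing with collector selection – the class name MyApp is just a placeholder:

java -Xms512m -Xmx512m -XX:+UseParallelOldGC MyApp
java -Xms512m -Xmx512m -XX:+UseConcMarkSweepGC MyApp

The first selects Parallel Scavenge for the Young Generation and Parallel Old for the Old Generation; the second selects ParNew for the Young Generation and CMS for the Old Generation.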

Performance Tuning Considerations

While tuning is largely trial-and-error, and is highly dependent on your particular environment and application needs, there are a few guiding principles that might be of help.

  1. An insufficient heap is the leading cause of excessive garbage collection. This is particularly a problem for server side JVMs, especially at high loads. Hence, devote as much space to the heap as possible. Try allocating between 50% and 70% of the physical memory on the server to the JVM and see if it makes a difference.
  2. Set the initial and maximum heap sizes to the same value. You’re likely going to end up at your maximum value anyway – so why not make it easier on the JVM and avoid having to gradually grow your heap? This eliminates the CPU cycles required to grow the heap.
  3. Set the Young Generation size appropriately. It has to be small enough to avoid lengthy GC pauses, but big enough to accommodate a large number of transitory objects. Use the -XX:NewSize and -XX:MaxNewSize parameters wisely, and set the young generation to about 25% of the total heap. You can also use -XX:NewRatio to set the size of the young generation relative to the old generation.
  4. The Young Generation area must be set to less than half the total heap (see Reference [1] for details on the Young Generation guarantee).
  5. Use the default Garbage Collectors and attempt some of the more complex options only if the situation warrants it.
  6. Ensure that you clear out references that are no longer needed. Pay particular attention to collections (such as maps) that may continue to hold obsolete references to objects long after their usefulness has ended (see the sketch following this list).
  7. Use JVM options like -verbose:gc and -XX:+PrintGCDetails to monitor GC performance (a sample invocation follows this list). Ideally, you want to avoid a sawtooth pattern, with large amounts of memory being freed up after each collection.
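As a sample invocation covering items 1 through 3 and 7 (MyApp is again a placeholder), the following pins the heap at 1GB, sizes the young generation at roughly 25% of it, and enables GC logging:

java -Xms1024m -Xmx1024m -XX:NewSize=256m -XX:MaxNewSize=256m -verbose:gc -XX:+PrintGCDetails MyApp

And here is the sketch promised in item 6 – a hypothetical cache that pins objects in memory until they are explicitly removed:

import java.util.HashMap;
import java.util.Map;

public class SessionCache {
    // Every value in this map remains reachable (and hence un-collectable)
    // for as long as its entry stays in the map.
    private final Map<String, byte[]> cache = new HashMap<String, byte[]>();

    public void put(String key, byte[] value) {
        cache.put(key, value);
    }

    // Without a call like this, obsolete entries are never reclaimed.
    public void expire(String key) {
        cache.remove(key);
    }
}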

With this, I’ve come to the end of the story I set out to tell about garbage collection in Java. This series was prompted by a question from an attendee at my presentation at SuperValu, Inc. in Chanhassen.

Do add a comment if you find anything here that merits correction.

References:

  1. http://java.sun.com/docs/hotspot/gc1.4.2/
  2. http://blogs.sun.com/jonthecollector/entry/our_collectors
  3. http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp

The image below demonstrates what the JVM heap might look like after an application has been running for some time.

As we saw in the last article, the JVM’s heap is divided into 3 main areas – the permanent generation (purple), the old generation (yellow), and the young generation. The young generation area is further subdivided into Eden (green), and two survivor spaces (blue and orange).

The root set of an application is the set of highest-level references available to the application. The most common members of this set are any static references available to the application, as well as any local variable references to objects that the application has allocated.

An object on the heap is considered to be “live” if a path can be traced from a root set member to that given object; otherwise the object is considered “dead”. Live objects are said to have passed the “reachability” test, whereas dead objects have failed that test. Live objects are important and must be preserved, whereas dead objects are considered garbage, and must be reclaimed during the next garbage collection cycle.
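Here is a minimal sketch of this idea (the class is purely illustrative): rootRef is reachable from the root set via a static reference, so it is live; the object first assigned to temp becomes unreachable – and therefore dead – as soon as temp is pointed away from it.

public class Reachability {

    static Object rootRef = new Object();  // live: reachable via a static (root set) reference

    public static void main(String[] args) {
        Object temp = new Object();  // live: reachable via a local variable
        temp = null;                 // that object is now dead, and will be
                                     // reclaimed in the next GC cycle
    }
}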

In the image below, live objects are indicated by a bright brick pattern, while dead objects are represented using a washed out wavy pattern.

We will use this diagram for further exploration of the GC mechanism.

Garbage collectors can typically be described using these dimensions:

  1. Tracing garbage collectors
    This describes a collector that relies on the reachability test to determine which objects should be preserved, and which ones are garbage and so need to be reclaimed.
  2. Stop the World or Concurrent collectors
    Collectors often rely on the guarantee that no mutations of the heap occur once the garbage collector has begun its operation. A Stop the World collector freezes all activity in the JVM until it has completed its cycle. A concurrent collector, on the other hand, attempts to run the garbage collection process concurrently with the running application. Allowing concurrent operation ratchets up the complexity of the implementation significantly. In addition, such collectors may not be as aggressive at cleaning up garbage as Stop the World collectors.
  3. Compacting collectors
    As memory allocations and deallocations occur over the life of an application, the heap tends to get rather fragmented. This degrades the ability of the JVM to satisfy requests for large contiguous blocks of memory. Compacting collectors address this by defragmenting the heap – by moving live objects so that they are adjacent to each other, maximizing the contiguous free space available on the heap. This results in more efficient memory allocation.

    An alternative to compacting is to allow an object to span non-contiguous blocks of memory. The tradeoff is increased implementation complexity and memory management overhead.

  4. Generational Collectors
    Most objects allocated by an application are short lived. Usually, these objects survive only for the duration of a single method call, before becoming eligible for reclamation (see the sketch following this list).

    Further, the effort required to manage memory is directly proportional to the size of the memory block that needs to be managed. In other words, the smaller the area to be managed, the faster the collection will complete, thereby increasing the amount of CPU time available to the application itself.

    A generational collector takes advantage of both these principles.

    First, it divides the heap into multiple areas – to reduce the area that must be combed during a garbage collection cycle.

    Second, it ensures that each area holds objects of a given generation. The permanent generation holds objects that are relatively permanent; the old generation holds objects that are considered elderly; and the young generation holds objects that are relative newcomers (either newborn or middle aged). This ensures that most of the collection activity can be focused on the small area of the heap that contains the volatile young generation, with old generation collections being required only rarely.
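Here is the promised sketch of a typical short lived allocation (the class is purely illustrative): the StringBuilder lives only for the duration of a single method call before becoming garbage.

public class Transitory {

    static String label(int id) {
        StringBuilder sb = new StringBuilder();  // a newborn, transitory object
        sb.append("order-").append(id);
        return sb.toString();                    // sb becomes garbage once this method returns
    }

    public static void main(String[] args) {
        System.out.println(label(42));           // only the returned String may live on
    }
}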

Garbage collectors in the Java world combine one or more of the above characteristics. For instance, the standard serial collector is a generational, tracing, Stop-the-World, compacting collector.

The Young Generation area is sub divided further into Eden and two Survivor spaces. At any given time, there is only one “active” survivor space (labeled “From”). The inactive space is labeled “To”. After each successful garbage collection cycle within the Young Generation, the active and inactive survivor spaces switch roles.

Let us assume that 2 new allocations (the green checkerboard pattern) are being requested by the application. If there is enough memory available in Eden, the allocation is made in Eden (as shown), and the application proceeds with no latency being experienced.

The diagram shows that the first allocation succeeded and now resides on the heap.

However, the second allocation request was larger than the available space in Eden! The JVM is unable to fulfill the request, and so it awakens the garbage collector to come to its aid. This results in a Young Generation collection cycle.

The Young Generation collector begins by performing the reachability test on all objects in Eden and in the currently active Survivor space (labeled From) to identify the live objects. Next, it copies the live objects out into either the inactive Survivor space (if the object is fairly new) or the Old Generation area (if the object has survived a number of previous young generation collections). This results in the heap as shown:

A few things to note here:

  1. The newly allocated object is created in Eden (as we’d expect)
  2. The active and inactive Survivor spaces switch roles (and are re-labeled) at the end of the garbage collection cycle.
  3. The previous contents of the Young Generation [Eden + previous From] have now been copied over either to the inactive Survivor space or to the Old Generation area. Here, 2 middle-aged live objects (blue) from the previous From, and 2 newborn live objects (green) from Eden have been copied over to the inactive Survivor space; and 1 elderly live object (blue) from the previous From has been copied over to the Old Generation.
  4. The requested object allocation (checkerboard green) has now been satisfied from Eden, where it now resides.
  5. Both Eden and the previously inactive survivor space (To) are nicely defragmented, with live objects on one side and free space on the other. (I’ve not shown the reclaimed garbage in these areas.)

This is a fairly quick and painless collection operation, and even if the collector were to stop the world, the pause would be barely noticeable. Hence, a Young Generation collection is also termed a “Minor Collection”.

So, what happens when the Old Generation itself becomes full? Let’s reimagine our heap as shown below, with a new object allocation waiting to be satisfied.

In this case, a Young Generation collection is called for because there isn’t enough space available in Eden to satisfy this allocation request. However, unlike the earlier scenario, the promotion of the middle-aged object from the active Survivor space to being an elderly object in the old generation cannot proceed, because there is no free space available in the Old Generation. So the Young Generation collection is put on hold, while an Old Generation collection cycle is spun up to free up space in the Old Generation.

This results in what is termed a Major Collection, since this activity is expensive (given the size of the entire heap), and the pause suffered by the application (for Stop The World collectors) is noticeable.

However, in the end, the resulting state of the heap should be exactly as shown for the earlier Young Generation collection.

This ends part 2 of this series.

Continue on to part 3 >

Garbage collection in the JVM is often treated as a dark art. We know that we’re supposed to be thankful to the JVM for freeing us from worrying about the intricacies of memory management. At the same time, we’d like to retain some amount of control over this process as well. The challenge for most developers is that we’re not sure how.

In this series of posts, I’m going to discuss the garbage collection mechanisms available in Java 6, and how we might train it to do our bidding.

Languages such as C required the programmer to be keenly aware of the memory requirements of a program. You knew exactly how many bytes you needed for a data structure, as well as how to request those bytes from the operating system. With that kind of power came the responsibility of ensuring that you freed this memory once you were done with it. Without an explicit free, the data structure remained locked in memory, unavailable both to your program and to the operating system. It is easy to imagine how a poorly written program could starve itself through its profligate use of memory.

A new world order arrived with Java and the managed memory model it introduced. No longer did the application control the allocation and freeing of memory – all that was now the purview of the virtual machine. All that an application did was use the new operator to allocate an object, and all the magic happened behind the scenes. Not only was the memory allocated automatically from the heap, but it was also freed automatically whenever that object was no longer needed. This article describes the magic behind this mechanism.

The Java heap is an area in memory that is allocated to the virtual machine, and is used to meet the memory needs of a running application. This block of memory is specified using the -Xms and -Xmx VM parameters, as shown:

java -Xms256m -Xmx512m …

This informs the Java interpreter that we are requesting a starting heap size of 256MB. If the application’s memory needs exceed that limit, the JVM may request additional memory from the operating system, until the maximum limit of 512MB is reached. If an application requires memory beyond this maximum limit, an OutOfMemoryError will result. An optimization is to set both these values to the same number. This ensures that the maximum allowable memory is allocated in one shot, and no further dynamic expansion of the heap has to happen.
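As a quick (and purely illustrative) sketch, the standard Runtime API lets you observe these settings from inside a running application:

public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Current size of the heap – starts at roughly the -Xms value
        System.out.println("current heap: " + rt.totalMemory() / (1024 * 1024) + " MB");
        // Upper bound on the heap – roughly the -Xmx value
        System.out.println("maximum heap: " + rt.maxMemory() / (1024 * 1024) + " MB");
    }
}

Run with java -Xms256m -Xmx512m HeapInfo, the reported maximum should be approximately 512MB.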

Heap Structure

The JVM’s heap is not simply a linear byte array. Instead, it comprises the following three areas:

Permanent Area

This area contains Class and Method objects for the classes required by the application. This area is not constrained by the limit imposed by the -Xmx parameter.

The size of this area is managed using the -XX:PermSize and -XX:MaxPermSize JVM parameters. The former sets the initial size, while the latter sets the maximum size of this area. Again, in order to prevent the dynamic growing of this space, and the resulting slowdown as the garbage collector kicks in, you could set both these parameters to the same value.

Further, make sure you set this space large enough to hold all the classes needed by your application – else your application will fail with an error that indicates that you are out of PermGen space, even though your heap may have plenty of headroom available.

This area was called “permanent” because older JVMs (prior to 1.4) would never garbage collect this area. Objects loaded into here were locked in place until the JVM exited. Newer VMs provide the -Xnoclassgc parameter that lets you tune this behavior. If this parameter is not set, the JVM will garbage collect within this area if it needs memory, especially during a Full Collection cycle (we’ll see more about this in a bit).
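For instance, a hypothetical invocation that fixes the permanent generation at 128MB (MyApp is a placeholder) would be:

java -XX:PermSize=128m -XX:MaxPermSize=128m MyApp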

Young Generation Area

Most objects created by an application are ephemeral – they only live for a very short period of time. It makes sense therefore for such objects to be confined to a fairly small sandbox that can be combed through on a very frequent basis. This improves the efficiency of the collection operation for two reasons – first, collections tend to be much faster when the area to comb is small; and second, the results of a collection tend to be much more productive as most of the objects created here are short lived and so their space can be readily reclaimed.

Old Generation Area

This area generally contains objects that are fairly long lived, i.e., those that have survived multiple collection cycles within the young generation area. The idea behind promoting long lived objects here is to avoid the overhead of continually managing them in the young generation area, i.e., it helps young generation collections avoid having to bother with objects that are known to be long lived. This area is often much larger than the young generation area, and so garbage collection here is much more involved.

<End of part 1>

Continue on to part 2 >

Here are a few reviews that discuss my book:

http://www.amazon.com/review/R6B6I68PW7FPD/ref=cm_cr_rdp_perm

http://blog.bielu.com/2010/01/tomcat-6-developers-guide-book-review.html

http://blog.bielu.com/2010/01/tomcat-6-developers-guide-book-review.html#links

http://java.dzone.com/articles/tomcat-6-developer’s-guide

I had a chance to discuss Tomcat (and my book) at the Object Technology User Group (OTUG) meeting at the University of St Thomas.

I’ve read that people are often more scared of public speaking than of death. And, there’s nothing more terrifying than crashing and burning in front of a live audience. In many ways, I consider these nothing less than interviews, with the entire set of attendees comprising the panel!

So I treat every presentation with the greatest respect.

Devi had attended my previous presentation at the Tek Systems Java Users Group back in December and had given me some nice pointers on how to spruce up my presentation. I incorporated all of them into my delivery this time round.

I follow a simple three point strategy for all my presentations.

First, I go heavy on diagrams and light on text. I like to leave an unadorned diagram on the screen, and then paint word pictures to fill in the gaps. The upside is that everyone remains focused on me, while I retain the flexibility of tailoring the discussion to the interests and experience profile of the group.

I’ve seen too many presentations where the presenter is simply reading off of the slides. My solution is to simply not have any text on them :)

Second, I divide my presentation into core and optional areas. I have a single core track that absolutely needs to be covered. Then I have two optional tracks that I can pick from, depending on the time available or on audience interest.

At OTUG, we had almost 2.5 hours, so I covered the core aspects of chapters 1 through 3. Once that was done, I continued on with my optional track of Java class loading mechanics. Time permitting, I would have been able to go over my other optional track (which went unused) – a live demonstration of Tomcat and a sample web application being dissected within Eclipse.

(I have a third optional track that I’m developing … but it’s not ready for sharing just yet.)

Finally, I’m an animated presenter and I get very excited when talking about Tomcat and advanced Java concepts. So my third strategy is to play to my strengths.

The best presenters I’ve seen are masters at closing the distance between the audience and themselves – for instance, by using humor, or through their infectious enthusiasm for a topic. I try hard to convey my excitement by engaging the audience in a discussion on Tomcat, and by staying agile around the podium. It’s not uncommon to see me gesticulating animatedly at various parts of the screen, or walking towards the audience to eliminate the physical distance between us.

The audience was very receptive and very gracious – and so the talk went on for almost a half hour longer than expected. We started at around 6:00 and were done at 8:30. There were interesting conversations that followed, and I finally made my way home only around 10pm!

I had hoped to put the kids to bed, but they were already fast asleep by the time I got in.

So, two down, four more to go!

Dec 8, 2009 – Tek Systems JUG
Feb 16, 2010 – OTUG

My upcoming events:

Mar 8, 2010 – Twin Cities Java Users Group
Mar 30, 2010 – Madison JUG
Mar 31, 2010 – Milwaukee JUG
Apr 20, 2010 – Chicago JUG

Folks who know me are aware that I’m easily distracted. Turns out all my good intentions of reviewing Roxio Creator 2010 were put on hold when I went home on Friday to find a book on video editing waiting at my front door. Soon after I opened the package, I was hooked, and Roxio was soon a distant memory.

The book was interesting enough that I read it over every free moment I had (even while at a basketball game) until it was done.

I assess every book that I read using a very simple yardstick – “Was it worth the investment of my time in reading this book?”. I’m willing to overlook minor issues as long as this basic test is met – and this book passed that test with flying colors.

So what did I learn?

First – make sure that you shoot only as much video as you think is absolutely necessary. With how cheap it is to shoot video, I’m often tempted to “overshoot” – and this is a major hassle in post production and archival. On a recent vacation, I ended up with over 2 hours of video that I’d like to edit down to 20 minutes. Even after I eliminate all the obvious shaky-camera and bad-exposure bits, I’m still left with too much video to work with.

Second – focus on telling a story. With this vacation, I probably should have shot establishing shots of us arriving at the airport, at the resort, at major events during the stay, the kids having fun, and the final departure. Instead, I had a hodgepodge of sequences that is a challenge to edit into any form of story. At least Mr Cameron has no competition from me in this regard.

The subtitle of this book is “Storytelling with HD cameras”, and sure enough the author demonstrates obvious knowledge and depth of understanding of all things High Def.

For instance, I’ve always been quite disenchanted with 24p video, and couldn’t quite figure out why I wasn’t as excited about the “film” look as everyone else seemed to be. I found the strobing at 24p so annoying that I almost never use that mode. This book finally explains why 1080 video shot at 60i can seem smoother than at 24p.

There’s a lot of good stuff here – right from how to pick your first camcorder (no, resolution is not that critical; and yes, auto mode is the devil), to picking your accessories (an entire section on tripods!). In my opinion, this should be a must-buy before you get yourself a new HD camcorder.

This is the kind of book that will need repeated visits in order to internalize its advice.

For a more detailed review of this book – check out my review on Amazon.com.

As of January 20, 2010 Comcast is providing its subscribers with free access to the Norton Security Suite. This is huge news for those who are Symantec fans, as well as for those who can barely tolerate McAfee.

As you can tell, I’m excited by this move, and immediately installed Norton – replacing the Kaspersky software that I had running on one of my notebooks. And I love it!

I can’t wait to replace the McAfee installs on my other computers.

The free license is valid for up to 7 workstations for residential subscribers.

Check out this press release from Symantec.

Then head on out here to grab the install from Comcast.

Roxio Creator 2010

I’m a sucker for video editing tools – there’s something very satisfying about assembling a “watchable” video out of a mess of home video footage.

Fortunately, I had over 160 minutes of HD footage from a recent vacation that was screaming out for treatment. So, I decided to see for myself how this software would fare under fairly adverse conditions. I installed it on a laptop that has decidedly seen better days, and … so far so good.

After a fairly painless installation sequence, during which it fetched SP1, I’ve yet to experience a crash. Fingers crossed. I’ve imported about 60 minutes of footage, and have been playing with scene detection (VideoWave), automatic movie creation (CineMagic – I hate it!), and audio/video capture from the web.

It’s the old tale of the cobbler’s children, I guess, but my wife says my web site looks like it was done by a 10 year old.

So, I went looking for options to pretty it up without taking up any more of that resource that I already have a deficit of … Time.

The options were quickly narrowed to using Joomla, Drupal, or WordPress.

Given that most of my content takes the form of blog posts, I chose to give WordPress a whirl first.

And, I’m glad that I did. What a marvellous piece of software this has turned out to be.

But, since this is not a review of the software – that’s all I’ll say about it for now.

I needed a book that would get me started quickly, and show me the ropes, without bogging me down in the details. There’ll be time for that once I decide that this is the way forward.

So, one weekend with the book, my hosting provider, and the software – and I’m already knowledgeable enough to be really dangerous!

I really enjoyed the book – it provided clear explanations of all the various options available within WordPress, without the unnecessarily cutesy graphics and vapid attempts at humor. I particularly appreciated its no-nonsense, practical, workbook style.

I’m now seriously wondering whether I even need to try out the others.

WordPress seems to have everything that I need, and with the tremendous supply of customized themes, widgets, and plugins, the web site should now at least look like it’s been done by a “competent 10-year old” :0)

Click here for a full review of this book on Amazon.

iDrive

I used to be rather proud of my generational backup system, and my techniques for off site storage, until I accidentally formatted my backup drive, proving that no backup solution is idiot-proof.

After spending almost $150 to purchase hard drive recovery software (File Recovery from Seagate is also highly recommended), I figured that spending $50/month was not such a bad investment after all.

As an inveterate comparison shopper, I began my quest by running a Google search for reviews on all available online backup solutions.

Note that I wasn’t merely looking for an online file storage solution – I wanted to be assured that my system was going to be backed up regularly – whether or not it was a conscious action on my part.

Google apprised me of the obvious choices – there were Mozy, Carbonite, and a bevy of others.

However, the one that caught my eye, and which I decided to go with was iDrive. While there were not too many comments/reviews for this service, there were a few features that simply made it a no-brainer.

First, iDrive is one of the rare services that let you back up an external drive. 

Second, iDrive does not delete any files from your online account, unless you explicitly do so by forcing a sync with your computer/external storage.

Third, iDrive is almost a version control system, as it stores multiple versions of your files. In addition, only the changes between versions are stored, so the full file size doesn’t count against your storage limits. Needless to say, you can pick the version of the file to restore.

It’s hard to overstate the importance of these features. Carbonite Regular, for instance, automatically deletes files from its servers after 30 days if it no longer finds them on your computer. This means that if you were to run out of space on your computer, you have 30 days to upgrade your hard drive before the files cease to exist on the Carbonite servers. You’d have to upgrade to Carbonite Pro to get service that matches iDrive. I hated the fact that this critical piece of information was not clearly called out by most services.

I shoot in Camera RAW, and my hard drives are always bulging at the seams – so having all those files on my computer forever was simply not an option. iDrive seemed ideal for my situation.

In addition, the system uses two levels of protection. First, there’s the login that is required to access the account, and the use of the secure HTTPS protocol to transmit your files over to iDrive. Second, there’s an additional password, which iDrive claims is not stored anywhere on its servers, that is used to further encrypt your files with 256-bit AES encryption.

There’s also very convenient access to your files using a web browser interface.

I’ve been using iDrive for about 6 months now, and I must say that I’m relieved not to have to actively think about storage and backup.