Thursday, February 3, 2011

Chapter 21: Garbage Collection

This chapter is probably the most important, or rather the most confusing, chapter from the exam perspective. You can expect at least a few questions from it, so pay close attention to the topics it covers.

Overview of Memory Management and Garbage Collection

Memory management is a crucial element in many types of applications. Consider a program that reads in large amounts of data, say from somewhere else on a network, and then writes that data into a database on a hard drive. A typical design would be to read the data into some sort of collection in memory, perform some operations on the data, and then write the data into the database. After the data is written into the database, the collection that stored the data temporarily must be emptied of old data or deleted and recreated before processing the next batch. This operation might be performed thousands of times, and in languages like C or C++ that do not offer automatic garbage collection, a small flaw in the logic that manually empties or deletes the collection data structures can allow small amounts of memory to be improperly reclaimed or lost forever. These small losses are called memory leaks, and over many thousands of iterations they can make enough memory inaccessible that programs will eventually crash. Creating code that performs manual memory management cleanly and thoroughly is a complex task.
Java’s garbage collector provides an automatic solution to memory management. In most cases it frees you from having to add any memory management logic to your application. The downside to automatic garbage collection is that you can’t completely control when it runs and when it doesn’t.

Overview of Java’s Garbage Collector

Garbage collection is the phrase used to describe automatic memory management in Java. Whenever a software program executes (in any programming language for that matter), it uses memory in several different ways. We’re not going to get into Computer Science 101 here, but it’s typical for memory to be used to create a stack, a heap, in Java’s case constant pools, and method areas. The heap is that part of memory where Java objects live, and it’s the one and only part of memory that is in any way involved in the garbage collection process.

If you remember, we took a look at the heap a few chapters ago. For the exam, it’s important to know that you can call it the heap or the garbage collectible heap; both names refer to the same area of memory.

So, all of garbage collection revolves around making sure that the heap has as much free space as possible. For the purpose of the exam, what this boils down to is deleting any objects that are no longer reachable by the Java program running. When the garbage collector runs, its purpose is to find and delete objects that cannot be reached. If you think of a Java program as being in a constant cycle of creating the objects it needs (which occupy space on the heap), and then discarding them when they’re no longer needed, creating new objects, discarding them, and so on, the missing piece of the puzzle is the garbage collector. When it runs, it looks for those discarded objects and deletes them from memory so that the cycle of using memory and releasing it can continue.

When Does the Garbage Collector Run?

The garbage collector is under the control of the JVM, and the JVM decides when to run it. From within your Java program you can ask the JVM to run the garbage collector, but there are no guarantees, under any circumstances, that the JVM will comply. Left to its own devices, the JVM will typically run the garbage collector when it senses that memory is running low. Experience indicates that when your Java program requests garbage collection, the JVM will usually grant the request in short order, but again, there are no guarantees. I repeat: the JVM does not guarantee that the garbage collector will run when you ask for it. It may run it, or it may ignore your request entirely; that is simply how it works, and there is nothing we can do about it.

How Does the Garbage Collector Work?

You just can’t be sure. You might hear that the garbage collector uses a mark and sweep algorithm, and for any given Java implementation that might be true, but the Java specification doesn’t guarantee any particular implementation. You might hear that the garbage collector uses reference counting; once again, maybe yes, maybe no. The important concept to understand for the exam is: when does an object become eligible for garbage collection? To answer this question fully, we have to jump ahead a little and talk about threads. (Don’t worry, we will take a detailed look at threads in a future chapter.) In a nutshell, every Java program has one or more threads, and each thread has its own little execution stack. Normally, the programmer causes at least one thread to run in a Java program: the one with the main() method at the bottom of the stack. However, there are many really cool reasons to launch additional threads from your initial thread. In addition to having its own execution stack, each thread has its own lifecycle. For now, all we need to know is that threads can be alive or dead. With this background information, we can now say with stunning clarity and resolve that an object is eligible for garbage collection when no live thread can access it.

Based on that definition, the garbage collector does some magical, unknown operations, and when it discovers an object that can’t be reached by any live thread, it will consider that object as eligible for deletion, and it might even delete it at some point. When we talk about reaching an object, we’re really talking about having a reachable reference variable that refers to the object in question. If our Java program has a reference variable that refers to an object, and that reference variable is available to a live thread, then that object is considered reachable. We’ll talk more about how objects can become unreachable in the following section.
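To make "reachable" a little more concrete, here is a toy model of the marking idea. It is only a sketch under obvious assumptions, not how any particular JVM works: heap objects are plain nodes holding references to other nodes, a list of roots stands in for the reference variables that live threads can see, and a depth-first walk marks everything reachable from those roots. The names ToyHeap, Obj and reachable() are made up for this illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of reachability. ToyHeap, Obj and reachable() are hypothetical
// names for illustration; this is not how a real JVM is implemented.
public class ToyHeap {

    // A "heap object" that may hold references to other heap objects.
    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();

        Obj(String name) {
            this.name = name;
        }
    }

    // Marks every object reachable from the roots using a depth-first walk.
    static Set<Obj> reachable(List<Obj> roots) {
        Set<Obj> marked = new LinkedHashSet<>();
        Deque<Obj> toVisit = new ArrayDeque<>(roots);
        while (!toVisit.isEmpty()) {
            Obj o = toVisit.pop();
            if (marked.add(o)) {           // false if already marked, so cycles terminate
                for (Obj ref : o.refs) {
                    toVisit.push(ref);
                }
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        Obj c = new Obj("c");
        a.refs.add(b);                     // a -> b; nothing refers to c

        // Pretend only 'a' is held by a reference variable on a live thread's stack.
        Set<Obj> live = reachable(List.of(a));

        for (Obj o : List.of(a, b, c)) {
            System.out.println(o.name + (live.contains(o) ? ": reachable" : ": eligible for collection"));
        }
        // prints:
        // a: reachable
        // b: reachable
        // c: eligible for collection
    }
}
```

Anything the walk never reaches (c here) is what the text calls eligible for garbage collection. A real collector may arrive at the same answer by a completely different route.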

Can a Java application run out of memory? Yes. The garbage collection system attempts to remove objects from memory when they are not used. However, if you maintain too many live objects (objects referenced from other live objects), the system can run out of memory. Garbage collection cannot ensure that there is enough memory, only that the memory that is available will be managed as efficiently as possible.
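Here is a small sketch of the situation described above, where objects stay alive only because another live object still refers to them. The class Registry and its static cache field are hypothetical names for this illustration. A static field stays reachable for as long as its class is loaded, so everything stored in it stays reachable too, even after the local variable that created it is nulled. Removing the entries is what makes them eligible again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example of objects kept alive by a long-lived reference.
public class Registry {

    // A static field is reachable as long as the class is loaded,
    // so every object stored in this list stays reachable as well.
    static final List<byte[]> cache = new ArrayList<>();

    static void handleRequest() {
        byte[] buffer = new byte[1024];   // referenced by a local variable
        cache.add(buffer);                // now also referenced from a long-lived object
        buffer = null;                    // NOT enough: cache still refers to the array
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            handleRequest();
        }
        // About 1 MB of arrays is still reachable through 'cache'.
        System.out.println("Retained buffers: " + cache.size());   // prints 1000

        cache.clear();                    // drop the references; the arrays are now eligible
        System.out.println("Retained buffers: " + cache.size());   // prints 0
    }
}
```

If handleRequest() were called forever without the clear(), the list would keep growing until the JVM ran out of heap, no matter how often the garbage collector ran.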

Writing Code That Explicitly Makes Objects Eligible for Collection

In the preceding section, we learned the theories behind Java garbage collection. In this section, we show how to make objects eligible for garbage collection using actual code. We also discuss how to attempt to force garbage collection if it is necessary, and how you can perform additional cleanup on objects before they are removed from memory.

Nulling a Reference

As we discussed earlier, an object becomes eligible for garbage collection when there are no more reachable references to it. Obviously, if there are no reachable references, it doesn’t matter what happens to the object. For our purposes it is just floating in space, unused, inaccessible, and no longer needed.

The first way to remove a reference to an object is to set the reference variable that refers to the object to null. Examine the following code:

1. public class TestGarbageCollection {
2.     public static void main(String [] args) {
3.         StringBuffer sb = new StringBuffer("hello");
4.         System.out.println(sb);
5.         // The StringBuffer object is not eligible for collection
6.         sb = null;
7.         // Now the StringBuffer object is eligible for collection
8.     }
9. }

The StringBuffer object with the value hello is assigned to the reference variable sb in the third line. To make the object eligible for GC, we set the reference variable sb to null, which removes the single reference that existed to the StringBuffer object. Once line 6 has run, our happy little hello StringBuffer object is doomed, eligible for garbage collection.

Reassigning a Reference Variable

We can also decouple a reference variable from an object by setting the reference variable to refer to another object. Examine the following code:

class TestGarbageCollection {
    public static void main(String [] args) {
        StringBuffer s1 = new StringBuffer("hello");
        StringBuffer s2 = new StringBuffer("goodbye");
        System.out.println(s1);
        // At this point the StringBuffer "hello" is not eligible
        s1 = s2; // Redirects s1 to refer to the "goodbye" object
        // Now the StringBuffer "hello" is eligible for collection
    }
}

Objects that are created in a method also need to be considered. When a method is invoked, any local variables it declares exist only for the duration of that method. Once the method has returned, the objects created in it are eligible for garbage collection. There is an obvious exception, however: if an object is returned from the method, its reference might be assigned to a reference variable in the calling method, in which case it will not be eligible for collection. For example:

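import java.util.Date;
public class TestGarbageColl {
    public static void main(String [] args) {
        Date d = getDate();
        doSomething();
        System.out.println("d = " + d);
    }

    public static Date getDate() {
        Date d2 = new Date();
        StringBuffer now = new StringBuffer(d2.toString());
        System.out.println(now);
        return d2;
    }

    // Placeholder for other work. While it runs, the "now" StringBuffer
    // created in getDate() is already eligible for collection.
    public static void doSomething() {
        // intentionally empty
    }
}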

In the preceding example, we created a method called getDate() that returns a Date object. This method creates two objects: a Date and a StringBuffer containing the date information. Since the method returns the Date object, it will not be eligible for collection even after the method has completed. The StringBuffer object, though, will be eligible, even though we didn’t explicitly set the now variable to null.

Isolating a Reference

There is another way in which objects can become eligible for garbage collection, even if they still have valid references! We call this scenario “Islands of isolation.”

A simple example is a class that has an instance variable that is a reference variable to another instance of the same class. Now imagine that two such instances exist and that they refer to each other. If all other references to these two objects are removed, then even though each object still has a valid reference, there will be no way for any live thread to access either object. When the garbage collector runs, it can usually discover any such islands of objects and remove them. As you can imagine, such Islands can become quite large, theoretically containing hundreds of objects. Examine the following code:

public class IslandTest {
    IslandTest i;
    public static void main(String [] args) {

        IslandTest i2 = new IslandTest();
        IslandTest i3 = new IslandTest();
        IslandTest i4 = new IslandTest();

        i2.i = i3; // i2 refers to i3
        i3.i = i4; // i3 refers to i4
        i4.i = i2; // i4 refers to i2

        i2 = null;
        i3 = null;
        i4 = null;

        // lots more code
    }
}

When the code reaches // lots more code, the three IslandTest objects (previously known as i2, i3, and i4) have instance variables so that they refer to each other, but their links to the outside world (i2, i3, and i4) have been nulled. These three objects are eligible for garbage collection.
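The island is also a neat illustration of why it matters that the specification doesn't pin down the algorithm. The toy sketch below (IslandModel, Node, refCount() and reachable() are hypothetical names, not real JVM code) rebuilds the same three-object cycle and compares two strategies: naive reference counting, which still sees a count of 1 on every object and so would never free them, and a reachability check starting from the local variables, which finds nothing reachable once all three locals are null.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy comparison of reference counting vs. tracing; hypothetical names throughout.
public class IslandModel {

    // Mirrors IslandTest: each object holds one reference to another.
    static class Node {
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    // Naive reference counting: how many heap objects point at 'target'.
    static int refCount(Node target, List<Node> heap) {
        int count = 0;
        for (Node n : heap) {
            if (n.next == target) {
                count++;
            }
        }
        return count;
    }

    // Tracing: follow references starting from the non-null roots (the locals).
    static Set<Node> reachable(Node[] roots) {
        Set<Node> seen = new HashSet<>();
        for (Node root : roots) {
            Node n = root;
            while (n != null && seen.add(n)) {   // stop at null or an already-seen node
                n = n.next;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Node[] locals = { new Node("i2"), new Node("i3"), new Node("i4") };
        locals[0].next = locals[1];   // i2 refers to i3
        locals[1].next = locals[2];   // i3 refers to i4
        locals[2].next = locals[0];   // i4 refers to i2

        // The model's record of allocated objects; it is not treated as a root.
        List<Node> heap = List.of(locals[0], locals[1], locals[2]);

        Arrays.fill(locals, null);    // like i2 = null; i3 = null; i4 = null;

        Set<Node> live = reachable(locals);
        for (Node n : heap) {
            System.out.println(n.name + ": refCount=" + refCount(n, heap)
                    + ", reachable=" + live.contains(n));
        }
        // prints refCount=1, reachable=false for i2, i3 and i4
    }
}
```

A pure reference-counting collector would leak this island forever, which is one reason the exam only asks about eligibility (reachability from live threads) rather than any particular algorithm.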

This covers everything you will need to know about making objects eligible for garbage collection.

Forcing Garbage Collection

First and foremost, contrary to what this section’s title suggests, garbage collection cannot be forced. However, Java provides some methods that allow you to request that the JVM perform garbage collection.
In reality, it is possible only to suggest to the JVM that it perform garbage collection. However, there are no guarantees the JVM will actually remove all of the unused objects from memory (even if garbage collection is run). It is essential that you understand this concept for the exam.

The garbage collection routines that Java provides are members of the Runtime class. The Runtime class is a special class that has a single object (a Singleton) for each main program. The Runtime object provides a mechanism for communicating directly with the virtual machine. To get the Runtime instance, you can use the method Runtime.getRuntime(), which returns the Singleton. Once you have the Singleton you can invoke the garbage collector using the gc() method. Alternatively, you can call the same method on the System class, which has static methods that can do the work of obtaining the Singleton for you. The simplest way to ask for garbage collection (remember—just a request) is

System.gc();

Theoretically, after calling System.gc(), you will have as much free memory as possible. We say theoretically because this routine does not always work that way. First, your JVM may not have implemented this routine; the language specification allows this routine to do nothing at all. Second, another thread might grab lots of memory right after you run the garbage collector.
This is not to say that System.gc() is a useless method—it’s much better than nothing. You just can’t rely on System.gc() to free up enough memory so that you don’t have to worry about running out of memory. The Certification Exam is interested in guaranteed behavior, not probable behavior.
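Going back to the Runtime Singleton described above, the sketch below checks that Runtime.getRuntime() keeps handing back the same object and shows the two ways of asking for a collection. The class name RuntimeSingletonDemo is just a label for this example; Runtime.getRuntime(), gc(), freeMemory() and System.gc() are the standard java.lang methods.

```java
// Sketch: the Runtime Singleton and the two equivalent ways to request GC.
public class RuntimeSingletonDemo {

    // True if two calls to getRuntime() return the very same object.
    static boolean isSingleton() {
        return Runtime.getRuntime() == Runtime.getRuntime();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Same Runtime instance: " + isSingleton());   // prints true

        // Both calls are only requests; the JVM is free to ignore either one.
        rt.gc();        // ask through the Runtime Singleton
        System.gc();    // the Javadoc describes this as effectively equivalent
                        // to Runtime.getRuntime().gc()

        // The exact value depends on the JVM and on whether a collection ran.
        System.out.println("Free memory (bytes): " + rt.freeMemory());
    }
}
```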

Now that we are somewhat familiar with how this works, let’s do a little experiment to see if we can see the effects of garbage collection. The following program lets us know how much total memory the JVM has available to it and how much free memory it has. It then creates 10,000 Date objects. After this, it tells us how much memory is left and then calls the garbage collector (which, if it decides to run, should halt the program until all unused objects are removed). The final free memory result should indicate whether it has run. Let’s look at the program:

import java.util.Date;
public class TestGCBehavior {
    public static void main(String [] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Total JVM memory: " + rt.totalMemory());
        System.out.println("Before Memory = " + rt.freeMemory());
        Date d = null;
        for (int i = 0; i < 10000; i++) {
            d = new Date();
            d = null;
        }
        System.out.println("After Memory = " + rt.freeMemory());
        rt.gc(); // an alternate to System.gc()
        System.out.println("After GC Memory = " + rt.freeMemory());
    }
}

Now, let’s run the program and check the results:

Total JVM memory: 1048568
Before Memory = 703008
After Memory = 458048
After GC Memory = 818272


Note: The numbers above may vary based on your system and JVM configuration.


As we can see, the JVM actually did decide to garbage collect (i.e., delete) the eligible objects. In the preceding example, we suggested to the JVM that it perform garbage collection when 458,048 bytes of memory remained free, and it honored our request. This program has only one user thread running, so there was nothing else going on when we called rt.gc(). Keep in mind that the behavior when gc() is called may be different for different JVMs, so there is no guarantee that the unused objects will be removed from memory. About the only thing you can count on is that if you are running very low on memory, the garbage collector will run before the JVM throws an OutOfMemoryError.

Cleaning up Before Garbage Collection—the finalize() Method

Java provides you a mechanism to run some code just before your object is deleted by the garbage collector. This code is located in a method named finalize() that all classes inherit from class Object. On the surface this sounds like a great idea; maybe your object opened up some resources, and you’d like to close them before your object is deleted. The problem is that, as you may have gathered by now, you can’t count on the garbage collector to ever delete an object. So, any code that you put into your class’s overridden finalize() method is not guaranteed to run. The finalize() method for any given object might run, but you can’t count on it, so don’t put any essential code into your finalize() method. In fact, we recommend that in general you don’t override finalize() at all.

There are a couple of concepts concerning finalize() that you need to remember.
• For any given object, finalize() will be called only once by the garbage collector.
• Calling finalize() can actually result in saving an object from deletion.
Let’s look into these statements a little further. First of all, remember that any code that you can put into a normal method you can put into finalize(). For example, in the finalize() method you could write code that passes a reference to the object in question back to another object, effectively uneligiblizing the object for garbage collection. If at some point later on this same object becomes eligible for garbage collection again, the garbage collector can still process this object and delete it. The garbage collector, however, will remember that, for this object, finalize() already ran, and it will not run finalize() again.
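Here is a minimal sketch of the resurrection trick described above, using a hypothetical Phoenix class whose finalize() stores this in a static field. Whether the collector ever calls finalize() in this program is not guaranteed, so main() only reports what happened rather than promising an outcome. Object.finalize() has also been deprecated since Java 9, so a current compiler will warn about overriding it; treat this strictly as an exam illustration.

```java
// Hypothetical class showing how finalize() can make an object reachable again.
public class Phoenix {

    // A static field stays reachable while the class is loaded, so storing
    // 'this' here makes the object reachable (not eligible) again.
    static volatile Phoenix rescued;

    private final String name;

    Phoenix(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Overridden without 'throws Throwable', which an override is allowed to drop.
    @Override
    protected void finalize() {
        rescued = this;   // resurrection
        // The GC calls finalize() at most once per object: if 'rescued' is later
        // set to null, this object can be collected without another call.
    }

    public static void main(String[] args) throws InterruptedException {
        Phoenix p = new Phoenix("phoenix");
        p = null;             // the only reference is gone; the object is eligible
        System.gc();          // just a request
        Thread.sleep(200);    // give the finalizer thread, if any, a chance to run

        if (rescued != null) {
            System.out.println(rescued.name() + " was resurrected by finalize()");
        } else {
            System.out.println("finalize() has not run (yet), which is allowed");
        }
    }
}
```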

© 2013 by www.inheritingjava.blogspot.com. All rights reserved. No part of this blog or its contents may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the Author.
