Friday, February 18, 2011

Chapter 35: Serialization

Serialization is an important and widely used feature of Java. Though you may not get many exam questions on this topic, you can definitely expect a few, and learning the concept is useful from a Java developer’s point of view as well.

So, let’s get started!!!

Imagine you want to save the state of one or more objects. If Java didn’t have serialization, you’d have to use one of the I/O classes to write out the state of the instance variables of all the objects you want to save. The worst part would be trying to reconstruct new objects that were virtually identical to the objects you were trying to save. You’d need your own protocol for the way in which you wrote and restored the state of each object, or you could end up setting variables with the wrong values. For example, imagine you stored an object that has instance variables for height and weight. At the time you save the state of the object, you could write out the height and weight as two ints in a file, but the order in which you write them is crucial. It would be all too easy to re-create the object but mix up the height and weight values—using the saved height as the value for the new object’s weight and vice versa.
The purpose of serialization is to handle this whole complicated scenario for us: Java saves an object’s state and restores it correctly, without any hand-written protocol.
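To make the problem concrete, here is a minimal sketch of the hand-written approach, using DataOutputStream and DataInputStream. The Person class and its height and weight fields are hypothetical, borrowed from the example above. The bytes carry no labels, so the only thing keeping the values straight is reading them in the same order they were written; restoreSwapped() shows how easily that goes wrong.

```java
import java.io.*;

class ManualProtocolDemo {
    // Hypothetical class from the height/weight example above
    static class Person {
        int height;
        int weight;
        Person(int height, int weight) { this.height = height; this.weight = weight; }
    }

    // Our own "protocol": height first, then weight
    static byte[] save(Person p) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(p.height);
            out.writeInt(p.weight);
        }
        return bytes.toByteArray();
    }

    // Correct restore: reads in the same order save() wrote
    static Person restore(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int height = in.readInt();
            int weight = in.readInt();
            return new Person(height, weight);
        }
    }

    // Buggy restore: reads the two ints in the wrong order and silently swaps them
    static Person restoreSwapped(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int weight = in.readInt(); // actually the saved height!
            int height = in.readInt();
            return new Person(height, weight);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = save(new Person(180, 75));
        Person ok = restore(data);
        Person bad = restoreSwapped(data);
        System.out.println("correct: " + ok.height + " " + ok.weight);
        System.out.println("swapped: " + bad.height + " " + bad.weight);
    }
}
```

Running main() prints `correct: 180 75` and then `swapped: 75 180`: no exception, just silently wrong data.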

Working with ObjectOutputStream and ObjectInputStream

The magic of basic serialization happens with just two methods: one to serialize objects and write them to a stream, and a second to read from the stream and deserialize the object.

ObjectOutputStream.writeObject() - serialize and write

ObjectInputStream.readObject() - read and deserialize

The java.io.ObjectOutputStream and java.io.ObjectInputStream classes are considered higher-level classes in the java.io package, and, as we learned in the previous chapter, that means you’ll wrap them around lower-level classes, such as java.io.FileOutputStream and java.io.FileInputStream. Here’s a small program that creates an object, serializes it, and then deserializes it:

import java.io.*;

class Car implements Serializable { } // 1

public class SerializeCar {
    public static void main(String[] args) {
        Car c = new Car(); // 2
        try {
            FileOutputStream fs = new FileOutputStream("testSer.ser");
            ObjectOutputStream os = new ObjectOutputStream(fs);
            os.writeObject(c); // 3
            os.close();
        } catch (Exception e) { e.printStackTrace(); }

        try {
            FileInputStream fis = new FileInputStream("testSer.ser");
            ObjectInputStream ois = new ObjectInputStream(fis);
            c = (Car) ois.readObject(); // 4
            ois.close();
        } catch (Exception e) { e.printStackTrace(); }
    }
}

Let’s take a look at the key points in this example:

1. We declare that the Car class implements the Serializable interface. Serializable is a marker interface; it has no methods to implement.
2. We make a new Car object, which as we know is serializable.
3. We serialize the Car object c by invoking the writeObject() method. First, we had to put all of our I/O-related code in a try/catch block. Next we had to create a FileOutputStream to write the object to. Then we wrapped the FileOutputStream in an ObjectOutputStream, which is the class that has the magic serialization method that we need. Remember that the invocation of writeObject() performs two tasks: it serializes the object, and then it writes the serialized object to the underlying stream (here, a file).
4. We deserialize the Car object by invoking the readObject() method. The readObject() method returns an Object, so we have to cast the deserialized object back to a Car. Again, we had to go through the typical I/O hoops to set this up.

This is a simple example of serialization in action. Let us now look at more complicated examples of Serialization.
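The same round trip can also be written with try-with-resources (Java 7 or later), which closes the streams even when an exception is thrown. This sketch serializes into a byte array instead of testSer.ser, so it leaves no file behind; the helper name deepCopy is our own, not a JDK method. It also shows that readObject() hands back a new object with the same state, not the original instance.

```java
import java.io.*;

class RoundTripDemo {
    static class Car implements Serializable {
        int size;
        Car(int size) { this.size = size; }
    }

    // Hypothetical helper: serialize obj into memory, then deserialize a copy
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T obj)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(obj);                   // serialize and write
        }
        try (ObjectInputStream is = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) is.readObject();            // read and deserialize
        }
    }

    public static void main(String[] args) throws Exception {
        Car original = new Car(5);
        Car copy = deepCopy(original);
        System.out.println(copy.size);             // same state: 5
        System.out.println(copy == original);      // different object: false
    }
}
```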

Serializing Objects

What does it really mean to save an object? If the instance variables are all primitive types, it’s pretty straightforward. But what if the instance variables are themselves references to objects? What gets saved? Clearly in Java it wouldn’t make any sense to save the actual value of a reference variable, because the value of a Java reference has meaning only within the context of a single instance of a JVM. In other words, if you tried to restore the object in another instance of the JVM, even running on the same computer on which the object was originally serialized, the reference would be useless.

But what about the object that the reference refers to? Look at these classes:
class Car {
    private Engine theEngine;
    private int carSize;
    public Car(Engine engine, int size) {
        theEngine = engine;
        carSize = size;
    }
    public Engine getEngine() { return theEngine; }
}
class Engine {
    private int engineSize;
    public Engine(int size) { engineSize = size; }
    public int getEngineSize() { return engineSize; }
}

Now make a Car. First, make an Engine for the Car:
Engine e = new Engine(3);

Then make a new Car, passing it the Engine:
Car c = new Car(e, 8);

Now what happens if you save the Car? If the goal is to save and then restore a Car, and the restored Car is to be an exact duplicate of the Car that was saved, then the Car needs an Engine that is an exact duplicate of the Car’s Engine at the time the Car was saved. That means both the Car and the Engine should be saved.

And what if the Engine itself had references to other objects, like perhaps a Color object? This gets complicated very quickly. If it were up to the programmer to know the internal structure of each object the Car referred to, so that the programmer could be sure to save all the state of all those objects, that would be a nightmare with even the simplest of objects.

Fortunately, the Java serialization mechanism takes care of all of this. When you serialize an object, Java serialization saves that object’s entire “object graph”: a deep copy of everything the saved object needs in order to be restored. For example, if you serialize a Car object, the Engine will be serialized automatically. And if the Engine class contained a reference to another object, THAT object would also be serialized, and so on. The only object you have to worry about saving and restoring is the Car; the other objects required to fully reconstruct that Car are saved (and restored) automatically through serialization.

Remember, you do have to make a conscious choice to create objects that are serializable, by implementing the Serializable interface. If we want to save Car objects, for example, we’ll have to modify the Car class as follows:
class Car implements Serializable {
    // the rest of the code as before
    // Serializable has no methods to implement
}
And now we can save the Car with the following code:
import java.io.*;
public class SerializeCar {
    public static void main(String[] args) {
        Engine e = new Engine(3);
        Car c = new Car(e, 8);
        try {
            FileOutputStream fs = new FileOutputStream("testSer.ser");
            ObjectOutputStream os = new ObjectOutputStream(fs);
            os.writeObject(c);
            os.close();
        } catch (Exception ex) { ex.printStackTrace(); }
    }
}

But when we run this code, we get an exception at runtime, printed something like this:
java.io.NotSerializableException: Engine

What did we forget? The Engine class must ALSO be Serializable. If we modify the Engine class and make it serializable, then there’s no problem:

class Engine implements Serializable {
    // same
}

Here’s the complete code:
import java.io.*;
public class SerializeCar {
    public static void main(String[] args) {
        Engine e = new Engine(3);
        Car c = new Car(e, 5);
        System.out.println("before: Engine size is "
                + c.getEngine().getEngineSize());
        try {
            FileOutputStream fs = new FileOutputStream("testSer.ser");
            ObjectOutputStream os = new ObjectOutputStream(fs);
            os.writeObject(c);
            os.close();
        } catch (Exception ex) { ex.printStackTrace(); }
        try {
            FileInputStream fis = new FileInputStream("testSer.ser");
            ObjectInputStream ois = new ObjectInputStream(fis);
            c = (Car) ois.readObject();
            ois.close();
        } catch (Exception ex) { ex.printStackTrace(); }

        System.out.println("after: Engine size is "
                + c.getEngine().getEngineSize());
    }
}
class Car implements Serializable {
    private Engine theEngine;
    private int carSize;
    public Car(Engine engine, int size) {
        theEngine = engine;
        carSize = size;
    }
    public Engine getEngine() { return theEngine; }
}
class Engine implements Serializable {
    private int engineSize;
    public Engine(int size) { engineSize = size; }
    public int getEngineSize() { return engineSize; }
}

This produces the output:
before: Engine size is 3
after: Engine size is 3
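Because Java saves the whole object graph, it also remembers when two references point to the same object. In this sketch (the two-car setup and the demo class names are illustrative), two Car objects share one Engine. After both cars are written with a single writeObject() call and read back, they still share one restored Engine rather than each getting its own copy.

```java
import java.io.*;

class SharedGraphDemo {
    static class Engine implements Serializable {
        int engineSize;
        Engine(int size) { engineSize = size; }
    }

    static class Car implements Serializable {
        Engine theEngine;
        Car(Engine engine) { theEngine = engine; }
    }

    // Writes both cars as one array (one object graph), then reads the array back
    static Car[] roundTrip(Car[] cars) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(cars);
        }
        try (ObjectInputStream is = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Car[]) is.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Engine shared = new Engine(3);
        Car[] restored = roundTrip(new Car[] { new Car(shared), new Car(shared) });
        // Both restored cars point to the same restored Engine
        System.out.println(restored[0].theEngine == restored[1].theEngine); // true
        System.out.println(restored[0].theEngine.engineSize);               // 3
    }
}
```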

But what would happen if we didn’t have access to the Engine class source code? In other words, what if making the Engine class serializable was not an option? Are we stuck with a non-serializable Car?

Obviously we could subclass the Engine class, mark the subclass as Serializable, and then use the Engine subclass instead of the Engine class. But that isn’t always an option either, for several possible reasons:

1. The Engine class might be final, preventing subclassing.
OR
2. The Engine class might itself refer to other non-serializable objects, and without knowing the internal structure of Engine, you aren’t able to make all these fixes (assuming you even wanted to TRY to go down that road).
OR
3. Subclassing is not an option for other reasons related to your design.
So...THEN what do you do if you want to save a Car?
That’s where the transient modifier comes in. If you mark the Car’s Engine instance variable with transient, then serialization will simply skip the Engine during serialization:
class Car implements Serializable {
    private transient Engine theEngine;
    // the rest of the class as before
}

class Engine {
    // same code
}

Now we have a Serializable Car with a non-serializable Engine, but the Car has marked the Engine transient; the output is:
before: Engine size is 3
Exception in thread "main" java.lang.NullPointerException

This NullPointerException is not thrown by deserialization itself, which succeeds. Because theEngine was marked transient, it was never written to the stream, so the deserialized Car comes back with a null Engine. The exception is thrown afterwards, when main() calls getEngineSize() on that null reference.

On its own, transient leaves us with that null Engine: either we live with it, or we work around it using the technique described next.
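A minimal sketch of that behavior, using an in-memory round trip (the demo class and the describe() helper are hypothetical): the transient Engine is skipped on the way out, so it is null on the way back in, while the non-transient carSize survives. The null check in describe() only avoids the exception; it does not bring the Engine back.

```java
import java.io.*;

class TransientDemo {
    static class Engine {                           // NOT Serializable
        int engineSize;
        Engine(int size) { engineSize = size; }
    }

    static class Car implements Serializable {
        private transient Engine theEngine;         // skipped by serialization
        private int carSize;                        // saved normally
        Car(Engine engine, int size) { theEngine = engine; carSize = size; }
        Engine getEngine() { return theEngine; }
        int getCarSize() { return carSize; }
    }

    static Car roundTrip(Car car) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(car);                    // no NotSerializableException now
        }
        try (ObjectInputStream is = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Car) is.readObject();
        }
    }

    // Hypothetical helper: checks for null instead of calling getEngineSize() blindly
    static String describe(Car car) {
        Engine e = car.getEngine();
        return e == null ? "no engine" : "engine size " + e.engineSize;
    }

    public static void main(String[] args) throws Exception {
        Car before = new Car(new Engine(3), 5);
        Car after = roundTrip(before);
        System.out.println(describe(before));       // engine size 3
        System.out.println(describe(after));        // no engine
        System.out.println(after.getCarSize());     // 5
    }
}
```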

Using writeObject and readObject

Consider the problem: we have a Car object we want to save. The Car has an Engine, and the Engine has state that should also be saved as part of the Car’s state. But the Engine is not Serializable, so we must mark it transient. That means when the Car is deserialized, it comes back with a null Engine. What can we do to make sure that when the Car is deserialized, it gets a new Engine that matches the one the Car had when the Car was saved?

Java serialization has a special mechanism just for this: a set of private methods you can implement in your class that, if present, will be invoked automatically during serialization and deserialization. It’s almost as if the methods were defined in the Serializable interface, except they aren’t. They are part of a special callback contract the serialization system offers you that basically says, “If you have a pair of methods matching this exact signature, these methods will be called during the serialization/deserialization process.”

These methods let you step into the middle of serialization and deserialization. So they’re perfect for letting you solve the Car/Engine problem: when a Car is being saved, you can step into the middle of serialization and say, “By the way, I’d like to add the state of the Engine’s variable (an int) to the stream when the Car is serialized.” You’ve manually added the state of the Engine to the Car’s serialized representation, even though the Engine itself is not saved.
Of course, you’ll need to restore the Engine during deserialization by stepping into the middle and saying, “I’ll read that extra int I saved to the Car stream, and use it to create a new Engine, and then assign that new Engine to the Car that’s being deserialized.” The two special methods you define must have signatures that look EXACTLY like this:

private void writeObject(ObjectOutputStream os) throws IOException {
    // your code for saving the Engine variables
}

private void readObject(ObjectInputStream is)
        throws IOException, ClassNotFoundException {
    // your code to read the Engine state, create a new Engine,
    // and assign it to the Car
}

Yes, we’re going to write methods that have the same name as the ones we’ve been calling! Where do these methods go? Let’s change the Car class:
class Car implements Serializable {
    private transient Engine theEngine;
    private int carSize;
    public Car(Engine engine, int size) {
        theEngine = engine;
        carSize = size;
    }
    public Engine getEngine() { return theEngine; }
    private void writeObject(ObjectOutputStream os) { // 1
        try {
            os.defaultWriteObject(); // 2
            os.writeInt(theEngine.getEngineSize()); // 3
        } catch (Exception e) { e.printStackTrace(); }
    }
    private void readObject(ObjectInputStream is) { // 4
        try {
            is.defaultReadObject(); // 5
            theEngine = new Engine(is.readInt()); // 6
        } catch (Exception e) { e.printStackTrace(); }
    }
}
Let’s take a look at the preceding code.

In our scenario we’ve agreed that, for whatever real-world reason, we can’t serialize an Engine object, but we want to serialize a Car. To do this we’re going to implement writeObject() and readObject(). By implementing these two methods you’re saying to the serialization mechanism (not the compiler): “If anyone invokes writeObject() or readObject() concerning a Car object, use this code as part of the write and read.”

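1. Like most I/O-related methods, writeObject() can throw exceptions. You can declare them or handle them; the serialization mechanism finds the method either way, because it does not check the throws clause. This example handles them for simplicity, but declaring them is often the safer choice: a failure then reaches the caller of ObjectOutputStream.writeObject() instead of being printed and ignored.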
2. When you invoke defaultWriteObject() from within writeObject() you’re telling the JVM to do the normal serialization process for this object. When implementing writeObject(), you will typically request the normal serialization process, and do some custom writing and reading too.
3. In this case we decided to write an extra int (the Engine size) to the stream that’s creating the serialized Car. The serialization specification expects defaultWriteObject() to be called before you write any extra data, as it is here. And when you read the data back in, you have to read the extra stuff in the same order you wrote it.
4. Again, we chose to handle rather than declare the exceptions.
5. When it’s time to deserialize, defaultReadObject() handles the normal deserialization you’d get if you didn’t implement a readObject() method.
6. Finally we build a new Engine object for the Car using the Engine size that we manually serialized. (We had to invoke readInt() after we invoked defaultReadObject() or the streamed data would be out of sync!)

Remember, the most common reason to implement writeObject() and readObject() is when you have to save some part of an object’s state manually. If you choose, you can write and read ALL of the state yourself, but that’s very rare. So, when you want to do only a part of the serialization/deserialization yourself, you MUST invoke the defaultReadObject() and defaultWriteObject() methods to do the rest.
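Putting the pieces together, here is a self-contained sketch of the Car/Engine fix. Unlike the listing above, it declares the exceptions instead of catching them, so a failure in writeObject() or readObject() propagates to the caller rather than leaving a half-restored Car. The class names follow the text; the wrapper class and the in-memory roundTrip() helper are our own additions, and writeObject() assumes the Car always has a non-null Engine.

```java
import java.io.*;

class CustomSerializationDemo {
    static class Engine {                            // still NOT Serializable
        private final int engineSize;
        Engine(int size) { engineSize = size; }
        int getEngineSize() { return engineSize; }
    }

    static class Car implements Serializable {
        private transient Engine theEngine;
        private int carSize;

        Car(Engine engine, int size) { theEngine = engine; carSize = size; }
        Engine getEngine() { return theEngine; }
        int getCarSize() { return carSize; }

        private void writeObject(ObjectOutputStream os) throws IOException {
            os.defaultWriteObject();                 // normal serialization: carSize
            os.writeInt(theEngine.getEngineSize());  // extra data: the Engine's state
        }

        private void readObject(ObjectInputStream is)
                throws IOException, ClassNotFoundException {
            is.defaultReadObject();                  // normal deserialization: carSize
            theEngine = new Engine(is.readInt());    // rebuild the Engine, same order as written
        }
    }

    static Car roundTrip(Car car) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(car);
        }
        try (ObjectInputStream is = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Car) is.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Car c = roundTrip(new Car(new Engine(3), 5));
        System.out.println("after: Engine size is " + c.getEngine().getEngineSize());
    }
}
```

Running main() prints `after: Engine size is 3`, this time without a NullPointerException.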

Which brings up another question—why wouldn’t all Java classes be serializable? Why isn’t class Object serializable? There are some things in Java that simply cannot be serialized because they are runtime specific. Things like streams, threads, runtime, etc. and even some GUI classes cannot be serialized. What is and is not serializable in the Java API is NOT part of the exam, but you’ll need to keep them in mind if you’re serializing complex objects.

How Inheritance Affects Serialization

Serialization is very cool, but in order to apply it effectively you’re going to have to understand how your class’s superclasses affect serialization.

Exam Tip: If a superclass is Serializable, then according to normal Java interface rules, all subclasses of that class automatically implement Serializable implicitly. In other words, a subclass of a class marked Serializable passes the IS-A test for Serializable, and thus can be saved without having to explicitly mark the subclass as Serializable. You simply cannot tell whether a class is or is not Serializable UNLESS you can see the class inheritance tree to see if any other superclasses implement Serializable. If the class does not explicitly extend any other class, and does not implement Serializable, then you know for CERTAIN that the class is not Serializable, because class Object does NOT implement Serializable.
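A quick way to see the IS-A rule in action: in this sketch (class names are illustrative), Sedan never mentions Serializable, yet it inherits the interface from Car, while a class that extends only Object is not Serializable, and neither is Object itself.

```java
import java.io.Serializable;

class InheritedSerializableDemo {
    static class Car implements Serializable { }
    static class Sedan extends Car { }           // no "implements Serializable" here
    static class Bicycle { }                     // extends Object only

    public static void main(String[] args) {
        System.out.println(new Sedan() instanceof Serializable);                 // true
        System.out.println(Serializable.class.isAssignableFrom(Bicycle.class));  // false
        System.out.println(Serializable.class.isAssignableFrom(Object.class));   // false
    }
}
```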

That brings up another key issue with serialization...what happens if a superclass is not marked Serializable, but the subclass is? Can the subclass still be serialized even if its superclass does not implement Serializable? Imagine this:

class Automobile { }
class Car extends Automobile implements Serializable {
    // the rest of the Car code
}

Now you have a Serializable Car class, with a non-Serializable superclass. This works! But there are potentially serious implications. To fully understand those implications, let’s step back and look at the difference between an object that comes from deserialization vs. an object created using new. Remember, when an object is constructed using new (as opposed to being deserialized), the following things happen (in this order):

1. All instance variables are assigned default values.
2. The constructor is invoked, which immediately invokes the superclass constructor (or another overloaded constructor, until one of the overloaded constructors invokes the superclass constructor).
3. All superclass constructors complete.
4. Instance variables that are initialized as part of their declaration are assigned their initial value (as opposed to the default values they’re given prior to the superclass constructors completing).
5. The constructor completes.

But these things do NOT happen when an object is deserialized. When an instance of a serializable class is deserialized, the constructor does not run, and instance variables are NOT given their initially assigned values! Think about it—if the constructor were invoked, and/or instance variables were assigned the values given in their declarations, the object you’re trying to restore would revert back to its original state, rather than coming back reflecting the changes in its state that happened sometime after it was created. For example, imagine you have a class that declares an instance variable and assigns it the int value 3, and includes a method that changes the instance variable value to 10:

class DoSomething implements Serializable {
    int num = 3;
    void changeNum() { num = 10; }
}

Obviously if you serialize a DoSomething instance after the changeNum() method runs, the value of the num variable should be 10. When the DoSomething instance is deserialized, you want the num variable to still be 10! You obviously don’t want the initialization (in this case, the assignment of the value 3 to the variable num) to happen. Think of constructors and instance variable assignments together as part of one complete object initialization process. The point is, when an object is deserialized we do NOT want any of the normal initialization to happen. We don’t want the constructor to run, and we don’t want the explicitly declared values to be assigned. We want only the values saved as part of the serialized state of the object to be reassigned.

Of course if you have variables marked transient, they will not be restored to their original state, but will instead be given the default value for that data type. In other words, even if you say

class DoAnotherThing implements Serializable {
    transient int x = 42;
}

when the DoAnotherThing instance is deserialized, the variable x will be set to a value of 0. Object references marked transient will always be reset to null, regardless of whether they were initialized at the time of declaration in the class.
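The following sketch runs both classes from the text through an in-memory round trip: the changed num comes back as 10 because the initializer does not run again, and the transient x comes back as 0, not 42. The wrapper class and the roundTrip() helper are our own.

```java
import java.io.*;

class InitializerDemo {
    static class DoSomething implements Serializable {
        int num = 3;
        void changeNum() { num = 10; }
    }

    static class DoAnotherThing implements Serializable {
        transient int x = 42;
    }

    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(obj);
        }
        try (ObjectInputStream is = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) is.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DoSomething ds = new DoSomething();
        ds.changeNum();
        System.out.println(roundTrip(ds).num);                  // 10, not 3
        System.out.println(roundTrip(new DoAnotherThing()).x);  // 0, not 42
    }
}
```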

So, that’s what happens when the object is deserialized, and the class of the serialized object directly extends Object, or has ONLY serializable classes in its inheritance tree. It gets a little trickier when the serializable class has one or more non-serializable superclasses. Getting back to our non-serializable Automobile class with a serializable Car subclass example:

class Automobile {
    public String name;
}
class Car extends Automobile implements Serializable {
    // the rest of the Car code
}

Because Automobile is NOT serializable, any state maintained in the Automobile class, even though the state variable is inherited by the Car, isn’t going to be restored with the Car when it’s deserialized! The reason is, the (unserialized) Automobile part of the Car is going to be reinitialized just as it would be if you were making a new Car (as opposed to deserializing one). That means all the things that happen to an object during construction, will happen—but only to the Automobile parts of a Car. In other words, the instance variables from the Car’s class will be serialized and deserialized correctly, but the inherited variables from the non-serializable Automobile superclass will come back with their default/initially assigned values rather than the values they had at the time of serialization.

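If you are a serializable class, but your superclass is NOT serializable, then any instance variables you INHERIT from that superclass are not restored from the stream. Instead they get whatever values the superclass’s no-arg constructor gives them (their default or initially assigned values), because that non-serializable class’s constructor WILL run! For the same reason, the first non-serializable superclass must have a no-arg constructor that the subclass can access; otherwise deserialization fails with an InvalidClassException.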

In fact, every constructor ABOVE the first non-serializable class constructor will also run, no matter what, because once that first non-serializable superclass constructor is invoked during deserialization, it invokes its own superclass constructor, and so on up the inheritance tree.
For the exam, you’ll need to be able to recognize which variables will and will not be restored with the appropriate values when an object is deserialized, so be sure to study the following code example and the output:

import java.io.*;
class SerializationWithInheritance {
    public static void main(String[] args) {

        Car c = new Car(35, "Ferrari");
        System.out.println("before: " + c.name + " "
                + c.weight);
        try {
            FileOutputStream fs = new FileOutputStream("testSer.ser");
            ObjectOutputStream os = new ObjectOutputStream(fs);
            os.writeObject(c);
            os.close();
        } catch (Exception e) { e.printStackTrace(); }
        try {
            FileInputStream fis = new FileInputStream("testSer.ser");
            ObjectInputStream ois = new ObjectInputStream(fis);
            c = (Car) ois.readObject();
            ois.close();
        } catch (Exception e) { e.printStackTrace(); }

        System.out.println("after: " + c.name + " "
                + c.weight);
    }
}
class Car extends Automobile implements Serializable {
    String name;
    Car(int w, String n) {
        weight = w; // inherited
        name = n;   // not inherited
    }
}
class Automobile { // not serializable !
    int weight = 42;
}

which produces the output:
before: Ferrari 35
after: Ferrari 42

The key here is that because Automobile is not serializable, when the Car was deserialized, the Automobile constructor ran and reset the Car’s inherited weight variable.
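To make the constructor behavior visible, this variation adds two static counters to the constructors (the counters are hypothetical, for illustration only). Creating the Car with new runs both constructors, but deserializing it runs only the no-arg constructor of the non-serializable Automobile, which is also the moment weight is set back to 42.

```java
import java.io.*;

class ConstructorCallDemo {
    static int automobileCtorCalls = 0;          // hypothetical counters
    static int carCtorCalls = 0;

    static class Automobile {                    // NOT Serializable
        int weight = 42;
        Automobile() { automobileCtorCalls++; }  // runs on new AND on deserialization
    }

    static class Car extends Automobile implements Serializable {
        String name;
        Car(int w, String n) { carCtorCalls++; weight = w; name = n; } // runs on new only
    }

    static Car roundTrip(Car car) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(car);
        }
        try (ObjectInputStream is = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Car) is.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Car c = new Car(35, "Ferrari");          // Automobile() and Car(...) each run once
        Car restored = roundTrip(c);             // only Automobile() runs again
        System.out.println(automobileCtorCalls + " " + carCtorCalls);  // 2 1
        System.out.println(restored.name + " " + restored.weight);    // Ferrari 42
    }
}
```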
Exam Tip: If you serialize a collection or an array, every element must be serializable! A single non-serializable element will cause serialization to fail. Note also that while the collection interfaces are not serializable, the concrete collection classes in the Java API are.
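A small sketch of that tip: an ArrayList of Strings serializes fine, but adding a single element of a non-serializable (hypothetical) Engine class makes writeObject() throw NotSerializableException. The canSerialize() helper is our own.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class CollectionDemo {
    static class Engine { }                      // NOT Serializable

    // Hypothetical helper: true if obj can be written to an ObjectOutputStream
    static boolean canSerialize(Object obj) throws IOException {
        try (ObjectOutputStream os = new ObjectOutputStream(new ByteArrayOutputStream())) {
            os.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        List<Object> items = new ArrayList<>();
        items.add("Ferrari");
        items.add("Porsche");
        System.out.println(canSerialize(items)); // true
        items.add(new Engine());
        System.out.println(canSerialize(items)); // false: one bad element is enough
    }
}
```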

Serialization Is Not for Static Elements

Finally, you might notice that we’ve talked ONLY about instance variables, not static variables. Should static variables be saved as part of the object’s state? Isn’t the state of a static variable at the time an object was serialized important? Yes and no. It might be important, but it isn’t part of the instance’s state at all. Remember, you should think of static variables purely as CLASS variables. They have nothing to do with individual instances. But serialization applies only to OBJECTS. And what happens if you deserialize three different Car instances, all of which were serialized at different times, and all of which were saved when the value of a static variable in class Car was different. Which instance’s static value would be used to replace the one currently in the one and only Car class that’s currently loaded? See the problem?

Static variables are NEVER saved as part of the object’s state...because they do not belong to the object!
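To see that a static field is not part of the saved state, this sketch (the static count field is hypothetical) serializes a Car while count is 1, changes count to 99, and then deserializes. Afterwards Car.count is still 99, the value held by the one loaded Car class, because the 1 was never written to the stream; only the instance field size is restored.

```java
import java.io.*;

class StaticFieldDemo {
    static class Car implements Serializable {
        static int count = 0;                    // class-level state: not serialized
        int size;                                // instance state: serialized
        Car(int size) { this.size = size; }
    }

    static byte[] save(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream os = new ObjectOutputStream(bytes)) {
            os.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream is = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return is.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Car.count = 1;
        byte[] saved = save(new Car(5));
        Car.count = 99;                          // changed after saving
        Car restored = (Car) load(saved);
        System.out.println(restored.size + " " + Car.count);  // 5 99
    }
}
```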

Tip:
As simple as serialization code is to write, versioning problems can occur in the real world. If you save a Car object using one version of the class, but attempt to deserialize it using a newer, different version of the class, deserialization might fail. See the Java API for details about versioning issues and solutions.
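One common safeguard is to declare a serialVersionUID explicitly, as in this sketch. If you omit it, the JVM computes one from the class’s structure, so even a change such as adding a public method can alter the computed value and make older serialized data fail to load with an InvalidClassException. The value 1L is an arbitrary choice; ObjectStreamClass.lookup() reports the UID the serialization mechanism will actually use.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class VersionDemo {
    static class Car implements Serializable {
        // Explicit version; change it only when a new class version is truly incompatible
        private static final long serialVersionUID = 1L;
        private int carSize;
    }

    public static void main(String[] args) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(Car.class);
        System.out.println(desc.getSerialVersionUID()); // 1
    }
}
```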

Previous Chapter: Chapter 34 - File Navigation & I/O

Next Chapter: Chapter 36: Using Dates, Numbers and Currency


© 2013 by www.inheritingjava.blogspot.com. All rights reserved. No part of this blog or its contents may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the Author.
