Monday, August 5, 2013

ObjectOutputStream - reset() - OutOfMemory Issues

Have you ever faced a heap memory issue with ObjectOutputStream? Then this article should be useful for understanding the internals and resolving the issue.

When you construct an ObjectOutputStream and an ObjectInputStream, they each contain a cache of objects that have already been sent across this stream. The cache relies on object identity, rather than the traditional hashing function. It is more similar to a java.util.IdentityHashMap than a normal java.util.HashMap. So, if you resend the same object, only a pointer to the object is sent across the network. This is very clever, and saves network bandwidth. However, the ObjectOutputStream cannot detect whether your object was changed internally, resulting in the Receiver just seeing the same object over and over again.
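The behaviour described above can be reproduced without a network. The following is a minimal sketch, assuming in-memory byte array streams stand in for the socket; the class name StaleCacheDemo, its cut-down Person, and the roundTrip() helper are hypothetical names used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative only: StaleCacheDemo and this cut-down Person are hypothetical.
class StaleCacheDemo {
  static class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    int age;
    Person(int age) { this.age = age; }
  }

  // Writes the same Person twice, changing the age in between, and
  // returns the two objects read back from the stream.
  static Person[] roundTrip() throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
      Person p = new Person(0);
      oos.writeObject(p);  // first write: the full object is serialized
      p.age = 42;          // the stream cannot see this change
      oos.writeObject(p);  // second write: only a back-reference is written
    }
    try (ObjectInputStream ois = new ObjectInputStream(
        new ByteArrayInputStream(bytes.toByteArray()))) {
      Person first = (Person) ois.readObject();
      Person second = (Person) ois.readObject();
      return new Person[] { first, second };
    }
  }

  public static void main(String[] args) throws Exception {
    Person[] read = roundTrip();
    System.out.println(read[0].age + " " + read[1].age); // prints "0 0"
    System.out.println(read[0] == read[1]);              // prints "true"
  }
}
```

Both reads return the very same instance, carrying the state it had at the first write, which is why the Receiver never observes the later change.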

Conversely, if you send a different object every time, each one is added to the cache, which keeps growing. Because the cache is never cleared on either end, it can eventually become large enough to exhaust the heap.

To avoid this kind of issue, we need to call reset() on the ObjectOutputStream, which clears the cache on both ends. However, this is a costly operation, so it should be done under some constraints, which are described below.

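  // s is assumed to be an already-connected java.net.Socket
  ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
  oos.writeObject("my object data");
  oos.reset();  // clear the cache of objects already written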

This will discard the state of any objects already written to the stream. The state is reset to be the same as a new ObjectOutputStream.

Please read the JavaSpecialists article below on this topic. Since it might be modified or removed from their site, I have copied the entire content here for my reference.
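As a rough illustration of that effect, the sketch below writes the same object twice, with and without a reset() in between, and reads the values back. It assumes in-memory streams instead of a socket; ResetDemo, its cut-down Person, and agesReadBack() are hypothetical names, not part of the article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative only: ResetDemo and this cut-down Person are hypothetical.
class ResetDemo {
  static class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    int age;
    Person(int age) { this.age = age; }
  }

  // Writes one Person with age 0, optionally resets the stream, changes
  // the age to 42, writes it again, and returns the two ages read back.
  static int[] agesReadBack(boolean resetBetweenWrites)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
      Person p = new Person(0);
      oos.writeObject(p);
      if (resetBetweenWrites) {
        oos.reset();  // forget everything written so far
      }
      p.age = 42;
      oos.writeObject(p);
    }
    try (ObjectInputStream ois = new ObjectInputStream(
        new ByteArrayInputStream(bytes.toByteArray()))) {
      int first = ((Person) ois.readObject()).age;
      int second = ((Person) ois.readObject()).age;
      return new int[] { first, second };
    }
  }

  public static void main(String[] args) throws Exception {
    int[] withoutReset = agesReadBack(false);
    int[] withReset = agesReadBack(true);
    System.out.println(withoutReset[0] + " " + withoutReset[1]); // prints "0 0"
    System.out.println(withReset[0] + " " + withReset[1]);       // prints "0 42"
  }
}
```

The reader needs no special handling: the reset marker travels in the stream, and ObjectInputStream clears its own cache when it encounters it.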

Article @ http://www.javaspecialists.eu/archive/Issue088.html

Same article content below.

Resetting ObjectOutputStream

A class with many mysteries is java.io.ObjectOutputStream. For instance, when and why should you reset the stream?
Let's look at an example. First we have class Person, which is the class that we want to send over the network:
public class Person implements java.io.Serializable {
  private final String firstName;
  private final String surname;
  private int age;

  public Person(String firstName, String surname, int age) {
    this.firstName = firstName;
    this.surname = surname;
    this.age = age;
  }

  public String toString() {
    return firstName + " " + surname + ", " + age;
  }

  public void setAge(int age) {
    this.age = age;
  }
}
  
Next we have the code that Receives lots of Person objects and code that Sends them:
import java.net.*;
import java.io.*;

public class Receiver {
  public static void main(String[] args) throws Exception {
    ServerSocket ss = new ServerSocket(7000);
    Socket socket = ss.accept();
    ObjectInputStream ois = new ObjectInputStream(
        socket.getInputStream());
    int count=0;
    while(true) {
      Person p = (Person) ois.readObject();
      if (count++ % 1000 == 0) {
        System.out.println(p);
      }
    }
  }
}


import java.net.Socket;
import java.io.*;

public class Sender {
  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Socket s = new Socket("localhost", 7000);
    ObjectOutputStream oos = new ObjectOutputStream(
        s.getOutputStream());
    Person p = new Person("Heinz", "Kabutz", 0);
    for (int age=0; age < 1500 * 1000; age++) {
      p.setAge(age);
      oos.writeObject(p);
    }
    long end = System.currentTimeMillis();
    System.out.println("That took " + (end-start) + "ms");
  }
}
The output was:
java Receiver:
  *snip*
  Heinz Kabutz, 0
  Heinz Kabutz, 0
  Heinz Kabutz, 0
  Heinz Kabutz, 0
  Heinz Kabutz, 0
  Heinz Kabutz, 0

java Sender:
  That took 19548ms
  
When we run this, we will see lots of People objects on the Receiver side, but all the age values will be 0, even though we changed the age on the Sender side. Why is this?
When you construct an ObjectOutputStream and an ObjectInputStream, they each contain a cache of objects that have already been sent across this stream. The cache relies on object identity, rather than the traditional hashing function. It is more similar to a java.util.IdentityHashMap than a normal java.util.HashMap. So, if you resend the same object, only a pointer to the object is sent across the network. This is very clever, and saves network bandwidth. However, the ObjectOutputStream cannot detect whether your object was changed internally, resulting in the Receiver just seeing the same object over and over again. You will notice that this was quite fast. We sent 1'500'000 objects in 19548ms (on my machine). (well, we only sent one object, and 1'499'999 pointers to that object).
There seemed to be some problem with sending the same Person object many times, especially if the contents of that Person changed. Due to the optimisation in ObjectOutputStream, only the pointer to the Person would be sent each time. So, what would happen if we simply sent a new Person each time? Let's try it out...
import java.net.Socket;
import java.io.*;

public class Sender2 {
  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Socket s = new Socket("localhost", 7000);
    ObjectOutputStream oos = new ObjectOutputStream(
        s.getOutputStream());
    for (int age=0; age < 1500 * 1000; age++) {
      oos.writeObject(new Person("Heinz", "Kabutz", age));
    }
    long end = System.currentTimeMillis();
    System.out.println("That took " + (end-start) + "ms");
  }
}
This seems to run fine for a while, until we all of a sudden see an OutOfMemory error on both the Receiver and the Sender2. Someone once challenged me regarding the pathetic speed of Java. They claimed that Java was so slow that the Garbage Collector could not even keep up with objects that were being read over the network. It sounded strange to me that Java should run out of memory, so after some questioning, we traced the problem to the object cache growing in the Receiver and never being cleared. Since the Person objects are always distinct, they are put into the cache on both sides of the ObjectOutputStream. The Receiver's side cannot clear entries from the table, since it does not know which entries the Sender might send again. It then keeps on growing until the JVM runs out of memory.

Resetting ObjectOutputStream

One hack^H^H^H^Hsolution to the OutOfMemory problem is to also reset the cache on both sides every time you send an object. Let's try out what that does to our performance:
import java.net.Socket;
import java.io.*;

public class Sender3 {
  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Socket s = new Socket("localhost", 7000);
    ObjectOutputStream oos = new ObjectOutputStream(
        s.getOutputStream());
    for (int age=0; age < 1500 * 1000; age++) {
      oos.writeObject(new Person("Heinz", "Kabutz", age));
      oos.reset();
    }
    long end = System.currentTimeMillis();
    System.out.println("That took " + (end-start) + "ms");
  }
}
When I ran that, it worked without causing any OutOfMemory Errors, so I should be happy. But am I happy? I am old, after having to wait for 314242ms for it to complete, i.e. 16 times longer than with Sender. Sender was fast, but incorrect. Sender2 ran out of memory. Sender3 was correct, but slow. Is there no better way?
The problem with reset() is that it clears the cache of ALL objects, even constants such as the Strings "Heinz" and "Kabutz". So, we end up sending these constants over the network time and time again! Unfortunately the reset() is an all-or-nothing approach, so the entire cache will be lost. But perhaps, if we don't clear it all the time, we can get the advantage of speed and correctness? Let's try that out:
import java.net.Socket;
import java.io.*;

public class Sender4 {
  public static void main(String[] args) throws IOException {
    long start = System.currentTimeMillis();
    Socket s = new Socket("localhost", 7000);
    ObjectOutputStream oos = new ObjectOutputStream(
        s.getOutputStream());
    for (int age=0; age < 1500 * 1000; age++) {
      oos.writeObject(new Person("Heinz", "Kabutz", age));
      if (age % 1000 == 0) oos.reset();
    }
    long end = System.currentTimeMillis();
    System.out.println("That took " + (end-start) + "ms");
  }
}
Because I don't reset the cache on every call, Sender4 avoids sending the Strings "Heinz" and "Kabutz" over the network 1'500'000 times, and completes in just 66015ms. In fact, it only has to send these Strings 1'500 times. If we reset the ObjectOutputStream too frequently, we will increase the network bandwidth, and if we do not reset it often enough, we will increase the burden on our Garbage Collector. Like all things in Java Performance Tuning, you have to set it to the correct number, not too big and not too little.
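The bandwidth side of this trade-off can be made visible by counting the bytes the stream produces for different reset intervals. This is only a sketch under stated assumptions: it writes to an in-memory buffer rather than a socket, uses far fewer objects than the senders above, and the names ResetIntervalDemo and bytesWritten() are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative only: ResetIntervalDemo and bytesWritten are hypothetical names.
class ResetIntervalDemo {
  static class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    final String firstName;
    final String surname;
    final int age;
    Person(String firstName, String surname, int age) {
      this.firstName = firstName;
      this.surname = surname;
      this.age = age;
    }
  }

  // Serializes `count` distinct Person objects and returns the number of
  // bytes produced. The stream is reset whenever age % resetEvery == 0,
  // as in Sender4; resetEvery <= 0 means the stream is never reset.
  static int bytesWritten(int count, int resetEvery) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
      for (int age = 0; age < count; age++) {
        oos.writeObject(new Person("Heinz", "Kabutz", age));
        if (resetEvery > 0 && age % resetEvery == 0) {
          oos.reset();
        }
      }
    }
    return bytes.size();
  }

  public static void main(String[] args) throws IOException {
    int count = 10_000;
    System.out.println("never reset:      " + bytesWritten(count, 0));
    System.out.println("reset every 1000: " + bytesWritten(count, 1000));
    System.out.println("reset every time: " + bytesWritten(count, 1));
  }
}
```

Resetting on every write should produce by far the most bytes, since the class description and both Strings are written again each time, while a reset every 1000 objects stays close to the never-reset size and still lets both caches be emptied regularly.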

