java

18 Oct

Java Serialization Size of Lists

By Nguyễn Bình Vĩ

What has a larger serializable size, ArrayList or LinkedList? In this newsletter we examine what the difference is and also why Vector is a poor candidate for a list in a serializable class.

Serialization Size of Lists

Hopefully we all have some idea of how the ArrayList and LinkedList
are structured internally. The ArrayList holds a pointer to a
single Object array, which grows as the number of elements exceed
the size of the array. The ArrayList’s underlying Object array
grows by about 50% whenever we run out of space. The LinkedList
has pointers to the link nodes at the front and end of the list.
Each link node is an object and has pointers forward and backwards
and to the object that is being held in the list. The memory
requirements of the LinkedList is about 4x larger than an equivalently
sized ArrayList. In addition, the ArrayList allows random access
into the middle, whereas the LinkedList has a lookup complexity of
O(n).

One of the questions I enjoy asking in my Java Specialist Master Course
is this: Which list uses more bytes when serialized? LinkedList or
ArrayList? Think about this for a bit before reading on,
considering the internal structure of each list.

Before I show the answer, I’d like to go back about 8.5
years, when I described memory usage
in Java
. I had chosen an experimental approach
to figuring out how many bytes are occupied by Java objects.
At the time, one of my readers told me he measured object
size by serializing Java objects to a byte array. I argued
that the result would differ completely from the bytes in
memory. This newsletter shows an example of how different
the results may be.

Most programmers guess that the LinkedList would have a
larger serialized size than the ArrayList. However, that
is never the case:

import java.io.*;
import java.util.*;

public class ListWritingSize {
  public static void main(String[] args) throws IOException {
    test(new LinkedList<String>());
    test(new ArrayList<String>());
  }

  public static void test(List<String> list) throws IOException {
    for (int i = 0; i < 10; i++) {
      list.add("hello world");
    }
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(baos);
    out.writeObject(list);
    out.close();
    System.out.println(list.getClass().getSimpleName() +
        " used " + baos.toByteArray().length + " bytes");
  }
}

When we run this, we see that the LinkedList uses 107 bytes,
whilst the ArrayList uses 117 bytes. Instead of just naively
writing the contents of the collections to the stream,
the lists both contain writeObject() and
readObject() methods. These write the contents
of the lists to the stream.

Why the 10 byte difference?

The ArrayList and LinkedList work similarly, in that they
write out the number of elements and the actual objects
contained in the list. However, ArrayList also writes out
the size of the underlying array, used to recreate an
identical ArrayList to what was serialized.

This additional int is what makes up
the 10 byte difference in the serialization size.

At this point I need to also point out that the String
“hello world” is only serialized once, after that only a
reference number to the object is written.

What about Vector?

The old Vector class implements serialization in a naive way.
They simply do the default serialization, which writes the
entire Object[] as-is into the stream. Thus if
we insert a bunch of elements into the List, then clear it,
the difference between Vector and ArrayList is enormous.

import java.util.*;
import java.io.*;

public class VectorWritingSize {
  public static void main(String[] args) throws IOException {
    test(new LinkedList<String>());
    test(new ArrayList<String>());
    test(new Vector<String>());
  }

  public static void test(List<String> list) throws IOException {
    insertJunk(list);
    for (int i = 0; i < 10; i++) {
      list.add("hello world");
    }
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(baos);
    out.writeObject(list);
    out.close();
    System.out.println(list.getClass().getSimpleName() +
        " used " + baos.toByteArray().length + " bytes");
  }

  private static void insertJunk(List<String> list) {
    for(int i = 0; i<1000 * 1000; i++) {
      list.add("junk");
    }
    list.clear();
  }
}

When we run this code, we get the following output:

    LinkedList used 107 bytes
    ArrayList used 117 bytes
    Vector used 1310926 bytes

Vector can use a staggering amount of bytes when being
serialized. The lesson here? Don’t ever use Vector as Lists
in objects that are Serializable. The potential for
disaster is too great.

Source from : http://www.javaspecialists.eu/archive/Issue183.html