This blog post compares the memory footprint of several ways to store a pair of integers in Scala.

Reminder on the JVM

Every object on the JVM has a header made of a mark word and a klass pointer. On a 64-bit JVM with compressed oops (the default for heaps smaller than 32 GB), the header is 12 bytes: 8 bytes for the mark word and 4 bytes for the compressed klass pointer.

Java objects are 8-byte aligned by default. This can be changed with the ObjectAlignmentInBytes JVM flag.

java -XX:+PrintFlagsFinal -version | grep ObjectAlignmentInBytes
     intx ObjectAlignmentInBytes                    = 8                                   {lp64_product}

So on a modern 64-bit JDK the minimum object size is 16 bytes: the 12-byte header, padded to a multiple of 8 bytes.

The size of an object is the object header + the size of its fields + the alignment padding.

A primitive int takes 4 bytes.

A java.lang.Integer takes 16 bytes: 12 bytes for the header and 4 bytes for the field (yes, this is a huge overhead).
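
The rules above boil down to a bit of arithmetic. Here is a minimal sketch of it, assuming a 12-byte header (compressed oops) and the default 8-byte alignment. The names ObjectLayout, alignUp and shallowSize are made up for this post and do not come from any library.

```scala
object ObjectLayout {
  val HeaderBytes = 12    // assumption: 64-bit JVM with compressed oops
  val AlignmentBytes = 8  // assumption: default ObjectAlignmentInBytes

  // Round a size up to the next multiple of the alignment.
  def alignUp(size: Int, alignment: Int = AlignmentBytes): Int =
    (size + alignment - 1) / alignment * alignment

  // Header + fields, padded to the alignment.
  def shallowSize(fieldBytes: Int): Int = alignUp(HeaderBytes + fieldBytes)

  def main(args: Array[String]): Unit = {
    println(shallowSize(0)) // an object without fields: 16
    println(shallowSize(4)) // java.lang.Integer, one int field: 16
  }
}
```

An object with no fields and an Integer both end up at 16 bytes: the 4 bytes of padding in the first case are simply used by the int field in the second.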

Tools

Apache Spark, a tool for large-scale data processing, ships a utility to estimate the size of an object: the estimate method of the SizeEstimator object.

I use Ammonite, a better Scala REPL that can load dependencies dynamically.

Let’s check the size of an Integer

@ import $ivy.`org.apache.spark::spark-sql:2.4.0`
import $ivy.$

@ import org.apache.spark.util.SizeEstimator._
import org.apache.spark.util.SizeEstimator._

@ estimate(new Integer(1))
res2: Long = 16L

So far, so good!

Comparing different ways to store 2 ints

Let’s create a simple wrapper to store 2 ints

case class IntInt(i: Int, j: Int)

The size is

  • object header : 12 bytes
  • 2 int fields : 2 * 4 = 8 bytes
  • total = 20 bytes, so 24 bytes with alignment

But it will be awkward to create classes for all combinations of types, so let’s create a generic pair object.

@ class Pair[A, B](val a: A, val b: B)
defined class Pair

@ estimate(new Pair(1, 2))
res4: Long = 56L

On the JVM, generics and primitive types don’t play well together. Here primitive types passed to the constructor need to be boxed to java.lang.Integer objects.
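
The boxing can be observed with plain reflection. The sketch below is only an illustration: BoxingDemo, fieldTypes and storedClasses are hypothetical names, and the Pair class is redefined inside the object so that the example is self-contained.

```scala
object BoxingDemo {
  class Pair[A, B](val a: A, val b: B)

  // Declared field types of the generic class, after erasure.
  def fieldTypes: List[Class[_]] =
    classOf[Pair[_, _]].getDeclaredFields.toList.map(_.getType)

  // Runtime classes of the values actually stored in the fields of an instance.
  def storedClasses(p: Pair[_, _]): List[Class[_]] =
    p.getClass.getDeclaredFields.toList.map { f =>
      f.setAccessible(true) // the generated fields are private
      f.get(p).getClass
    }

  def main(args: Array[String]): Unit = {
    println(fieldTypes)                    // two java.lang.Object fields
    println(storedClasses(new Pair(1, 2))) // two java.lang.Integer values
  }
}
```

Both fields are declared as java.lang.Object, and the values they hold for new Pair(1, 2) are java.lang.Integer instances, each a separate object on the heap.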

Let’s check the number by hand. Alignment applies to each object separately, so the size is

  • Pair object : 12 bytes of header + 2 reference fields of 4 bytes = 20 bytes, so 24 bytes with alignment
  • 2 Integers : 2 * 16 = 32 bytes
  • total = 24 + 32 = 56 bytes
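
The same computation can be written down as code, padding each object on its own before summing. This is a minimal sketch under the assumptions of this post (12-byte header, 4-byte compressed references, 8-byte alignment); PairFootprint and its members are hypothetical names.

```scala
object PairFootprint {
  private val Header = 12    // assumption: compressed oops
  private val Reference = 4  // assumption: compressed references
  private val IntField = 4

  // Round up to the next multiple of 8 bytes.
  private def alignUp(size: Int): Int = (size + 7) / 8 * 8

  // Simple wrapper: two int fields stored inline.
  val intInt: Int = alignUp(Header + 2 * IntField)

  // Generic pair: two references, each pointing to a boxed Integer.
  val boxedInteger: Int = alignUp(Header + IntField)
  val pairShallow: Int = alignUp(Header + 2 * Reference)
  val pairDeep: Int = pairShallow + 2 * boxedInteger

  def main(args: Array[String]): Unit = {
    println(s"IntInt: $intInt bytes")
    println(s"Pair (shallow): $pairShallow bytes")
    println(s"Pair (with boxed Integers): $pairDeep bytes")
  }
}
```

The pair object itself is no bigger than the simple wrapper; the extra 32 bytes come entirely from the two boxed Integers.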

To reduce the overhead due to boxing, Scala has a feature called specialization, enabled with the @specialized annotation.

In this case, the compiler will generate specialized classes for each combination of specialized argument types.

@ case class SpPair[@specialized(Int) A, @specialized(Int) B](a: A, b: B)
defined class SpPair

@ estimate(SpPair(1, 2))
res7: Long = 32L

Surprisingly, since specialization is supposed to remove the boxing overhead, the result is still 8 bytes larger than the simple wrapper.

Let’s dig into that.

@ SpPair(1, 2).getClass
res8: Class[?0] = class ammonite.$sess.cmd6$SpPair$mcII$sp

The class name is SpPair$mcII$sp, not SpPair: it’s the class generated by the compiler.

@ SpPair(1, 2).getClass.getDeclaredFields.map(_.getType)
res9: Array[Class[?0] forSome { type ?0 }] = Array(int, int)

And this class has 2 fields of type int. So the size should be the same as that of the simple wrapper.

Let’s see if there is a superclass …

@ SpPair(1, 2).getClass.getSuperclass
res10: Class[?0] = class ammonite.$sess.cmd6$SpPair

There is a superclass: SpPair. Let’s look at its fields

@ SpPair(1, 2).getClass.getSuperclass.getDeclaredFields.map(_.getType)
res11: Array[Class[?0] forSome { type ?0 }] = Array(class java.lang.Object, class java.lang.Object)

Unfortunately, the superclass has 2 fields of type Object, which are always null in a specialized instance. On the JVM, a reference field holding null still takes the same space as a non-null reference.

So the size is

  • object header : 12 bytes
  • 2 reference fields : 2 * 4 = 8 bytes
  • 2 ints : 2 * 4 = 8 bytes
  • total = 28 bytes, so 32 bytes with alignment. (Q.E.D.)

We can double-check with jol, another useful tool for understanding memory consumption. Our SpPair class uses exactly the same mechanism as the Tuple2 class from the Scala standard library, so here is the output of jol on scala.Tuple2$mcII$sp, the specialized class for pairs of ints.

java -jar jol-cli-0.9-full.jar internals -cp /usr/local/Cellar/scala/2.13.1/libexec/lib/scala-library.jar 'scala.Tuple2$mcII$sp'                                                   

Instantiated the sample instance via public scala.Tuple2$mcII$sp(int,int)

scala.Tuple2$mcII$sp object internals:
 OFFSET  SIZE               TYPE DESCRIPTION                               VALUE
      0     4                    (object header)                           05 00 00 00 (00000101 00000000 00000000 00000000) (5)
      4     4                    (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
      8     4                    (object header)                           85 0a 02 f8 (10000101 00001010 00000010 11111000) (-134083963)
     12     4   java.lang.Object Tuple2._1                                 null
     16     4   java.lang.Object Tuple2._2                                 null
     20     4                int Tuple2$mcII$sp._1$mcI$sp                  0
     24     4                int Tuple2$mcII$sp._2$mcI$sp                  0
     28     4                    (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Reassuringly, the instance size computed by jol is also 32 bytes. It shows the java.lang.Object fields _1 and _2 inherited from the superclass, and the int fields _1$mcI$sp and _2$mcI$sp declared by the subclass.
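
Without jol at hand, the field layout (though not the sizes) can also be listed with reflection by walking up the class hierarchy. The following is a sketch only: TupleLayout and fieldsByClass are hypothetical names, and it relies on the Scala 2 compiler instantiating the specialized Tuple2 subclass for a pair of ints.

```scala
import java.lang.reflect.Modifier

object TupleLayout {
  // Instance fields declared by each class in the hierarchy,
  // from the runtime class up to (but excluding) java.lang.Object.
  def fieldsByClass(x: AnyRef): List[(String, List[String])] = {
    def loop(c: Class[_]): List[(String, List[String])] =
      if (c == null || c == classOf[Object]) Nil
      else {
        val fields = c.getDeclaredFields.toList
          .filterNot(f => Modifier.isStatic(f.getModifiers))
          .map(f => s"${f.getName}: ${f.getType.getName}")
        (c.getName, fields) :: loop(c.getSuperclass)
      }
    loop(x.getClass)
  }

  def main(args: Array[String]): Unit = {
    val t: (Int, Int) = (1, 2)
    fieldsByClass(t).foreach { case (cls, fields) =>
      println(s"$cls -> ${fields.mkString(", ")}")
    }
  }
}
```

For a pair of ints this lists two int fields on the specialized subclass and two java.lang.Object fields on scala.Tuple2, which matches the four field rows of the jol output above.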

Finally, let’s look at a last solution: an array.

@ estimate(Array(1, 2))
res15: Long = 24L

The size is

  • object header : 12 bytes
  • length field : 4 bytes
  • 2 ints : 2 * 4 = 8 bytes
  • total = 24 bytes, already a multiple of 8, so no padding is needed
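
The same arithmetic extends to other lengths. Here is a minimal sketch, assuming the 12-byte header, the 4-byte length field and the 8-byte alignment used throughout this post; IntArrayFootprint and sizeOfIntArray are hypothetical names.

```scala
object IntArrayFootprint {
  // Assumptions: 12-byte header, 4-byte length field, 8-byte alignment.
  def sizeOfIntArray(length: Int): Int = {
    val raw = 12 + 4 + 4 * length // header + length + elements
    (raw + 7) / 8 * 8             // round up to a multiple of 8
  }

  def main(args: Array[String]): Unit =
    (0 to 4).foreach { n =>
      println(s"$n ints -> ${sizeOfIntArray(n)} bytes")
    }
}
```

With these assumptions an empty array already costs 16 bytes, and the size then grows by 8 bytes for every two additional ints.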

Conclusion

Structure           Size (bytes)
Simple wrapper      24
Generic pair        56
Specialized pair    32
Array               24

If you instantiate a lot of objects, it’s worth knowing the tradeoffs of each solution.