Sizeof in Scala
This blog post will show the impact of different ways to store a couple of integers in Scala on the memory footprint of a program.
Reminder on the JVM
Every object on the JVM has a header. It consists of a mark word and a klass pointer. On 64-bit architectures with a heap < 32G (i.e. with compressed oops), the header has a size of 12 bytes.
Java objects are 8-byte aligned by default. This can be changed with a JVM flag.
java -XX:+PrintFlagsFinal -version | grep ObjectAlignmentInBytes
intx ObjectAlignmentInBytes = 8 {lp64_product}
So the minimum object size is 16 bytes on a modern 64-bit JDK: an object has a 12-byte header, which is padded to a multiple of 8 bytes.
The size of an object is the object header + the size of its fields + the overhead of alignment
A primitive int takes 4 bytes
A java.lang.Integer takes 16 bytes : 12 bytes for the header and 4 bytes for the field (yes, this is a huge overhead)
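The rule above can be written down as a small helper. This is only a sketch: the names HeaderBytes, AlignmentBytes, alignUp and shallowSize are made up for this post, the constants assume a 64-bit JVM with compressed oops and the default alignment, and the padding the JVM may insert between fields (for example before a long) is ignored.

```scala
// Assumed constants: 64-bit JVM, compressed oops, default ObjectAlignmentInBytes.
val HeaderBytes = 12
val AlignmentBytes = 8

// Rounds n up to the next multiple of AlignmentBytes.
def alignUp(n: Int): Int =
  (n + AlignmentBytes - 1) / AlignmentBytes * AlignmentBytes

// Shallow size of an object whose fields occupy fieldBytes in total.
def shallowSize(fieldBytes: Int): Int = alignUp(HeaderBytes + fieldBytes)

println(shallowSize(0)) // 16: an object without fields
println(shallowSize(4)) // 16: one int field, like java.lang.Integer
println(shallowSize(8)) // 24: two int fields
```

The helper only gives the shallow size of one object. Anything reachable through a reference field has to be added separately.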
Tools
Apache Spark, a tool for large-scale data processing, has a utility to estimate the size of an object : SizeEstimator.estimate
I use ammonite, a better Scala REPL that allows dependencies to be loaded dynamically
Let’s check the size of an Integer
@ import $ivy.`org.apache.spark::spark-sql:2.4.0`
import $ivy.$
@ import org.apache.spark.util.SizeEstimator._
import org.apache.spark.util.SizeEstimator._
@ estimate(new Integer(1))
res2: Long = 16L
So far, so good !
Comparing different ways to store 2 ints
Let’s create a simple wrapper to store 2 ints
case class IntInt(i: Int, j: Int)
The size is
- object header : 12 bytes
- 2 int fields : 2 * 4 = 8 bytes
- total = 20 bytes, so 24 bytes with alignment
But it will be awkward to create classes for all combinations of types, so let’s create a generic pair object.
@ class Pair[A, B](val a: A, val b: B)
defined class Pair
@ estimate(new Pair(1, 2))
res4: Long = 56L
On the JVM, generics and primitive types don’t play well together. Here the primitive values passed to the constructor need to be boxed to java.lang.Integer objects.
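The boxing can be observed with plain Java reflection. The sketch below redefines the same Pair class so that it can be pasted on its own into a REPL or run as a script. The names p, fieldA and stored are only illustrative.

```scala
class Pair[A, B](val a: A, val b: B)

val p = new Pair(1, 2)

// After erasure, the field backing `a` is declared as java.lang.Object.
val fieldA = classOf[Pair[_, _]].getDeclaredField("a")
println(fieldA.getType.getName) // java.lang.Object

// Reading the field reflectively returns the stored object without unboxing it.
fieldA.setAccessible(true)
val stored = fieldA.get(p)
println(stored.getClass.getName) // java.lang.Integer
```

Reading the field through reflection rather than through p.a matters here: the accessor call would be unboxed and boxed again by the compiler, which would hide what is actually stored.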
Let’s check the number by hand. Alignment applies to each object separately, so the size is
- Pair object : 12 bytes of header + 2 reference fields (2 * 4 = 8 bytes) = 20 bytes, so 24 bytes with alignment
- 2 Integers : 2 * 16 = 32 bytes
- total = 24 + 32 = 56 bytes
In order to reduce the overhead due to boxing, Scala has an annotation called @specialized.
With it, the compiler generates a specialized class for each combination of specialized type arguments.
@ case class SpPair[@specialized(Int) A, @specialized(Int) B](a: A, b: B)
defined class SpPair
@ estimate(SpPair(1, 2))
res7: Long = 32L
Surprisingly, since specialization is supposed to remove the boxing overhead, the size is still larger than that of the simple wrapper (32 bytes instead of 24).
Let’s dig into that
@ SpPair(1, 2).getClass
res8: Class[?0] = class ammonite.$sess.cmd6$SpPair$mcII$sp
The class name is SpPair$mcII$sp and not SpPair. It’s the class generated by the compiler.
@ SpPair(1, 2).getClass.getDeclaredFields.map(_.getType)
res9: Array[Class[?0] forSome { type ?0 }] = Array(int, int)
And this class has 2 fields of type int. So the size should be the same as that of the simple wrapper.
Let’s see if there is a superclass …
@ SpPair(1, 2).getClass.getSuperclass
res10: Class[?0] = class ammonite.$sess.cmd6$SpPair
There is a superclass, it’s SpPair. Let’s have a look at the fields of the superclass
@ SpPair(1, 2).getClass.getSuperclass.getDeclaredFields.map(_.getType)
res11: Array[Class[?0] forSome { type ?0 }] = Array(class java.lang.Object, class java.lang.Object)
Unfortunately, the superclass has 2 fields of type Object, which will always be null for a specialized instance. Nevertheless, a reference field with a null value on the JVM takes the same space as a non-null reference.
So the size is
- object header : 12 bytes
- 2 reference fields : 2 * 4 = 8 bytes
- 2 ints : 2 * 4 = 8 bytes
- total = 28 bytes, so 32 bytes with alignment. (Q.E.D.)
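The two reflection calls above can be folded into one helper that walks up the class hierarchy and lists every instance field an object carries. This is a sketch: instanceFields is a made-up helper name, and Base and Derived are hypothetical classes written by hand to mimic the shape of a specialized class and its generic superclass, so the example does not depend on @specialized.

```scala
import java.lang.reflect.{Field, Modifier}

// Collects the non-static fields declared by a class and by all of its superclasses.
def instanceFields(cls: Class[_]): List[Field] =
  if (cls == null) Nil
  else {
    val own = cls.getDeclaredFields.toList.filterNot(f => Modifier.isStatic(f.getModifiers))
    own ++ instanceFields(cls.getSuperclass)
  }

// Hypothetical hierarchy: reference fields in the parent, int fields in the child.
class Base(val ref1: AnyRef, val ref2: AnyRef)
class Derived(val i: Int, val j: Int) extends Base(null, null)

val fields = instanceFields(classOf[Derived])
fields.foreach { f =>
  println(s"${f.getDeclaringClass.getName}.${f.getName}: ${f.getType.getName}")
}
```

Calling instanceFields(SpPair(1, 2).getClass) in the Ammonite session should list the fields of both SpPair$mcII$sp and SpPair in the same way.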
We can double-check with jol, another interesting tool when it comes to understanding memory consumption. Our SpPair class uses exactly the same mechanism as the Tuple2 class from the Scala standard library. So here is the result of jol on scala.Tuple2$mcII$sp, the specialized class for a pair of ints.
java -jar jol-cli-0.9-full.jar internals -cp /usr/local/Cellar/scala/2.13.1/libexec/lib/scala-library.jar 'scala.Tuple2$mcII$sp'
Instantiated the sample instance via public scala.Tuple2$mcII$sp(int,int)
scala.Tuple2$mcII$sp object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 05 00 00 00 (00000101 00000000 00000000 00000000) (5)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 85 0a 02 f8 (10000101 00001010 00000010 11111000) (-134083963)
12 4 java.lang.Object Tuple2._1 null
16 4 java.lang.Object Tuple2._2 null
20 4 int Tuple2$mcII$sp._1$mcI$sp 0
24 4 int Tuple2$mcII$sp._2$mcI$sp 0
28 4 (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Reassuringly, the instance size computed by jol is also 32 bytes. It shows the java.lang.Object fields _1 and _2 from the superclass and the int fields _1$mcI$sp and _2$mcI$sp from the subclass.
Let’s look at one last solution: an array.
@ SizeEstimator.estimate(Array(1,2))
res15: Long = 24L
The size is
- object header : 12 bytes
- length field : 4 bytes
- 2 ints : 2 * 4 = 8 bytes
- total = 24 bytes
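The same arithmetic generalizes to an int array of any length. This is a sketch under the same assumptions as before (12-byte header, 4-byte length field, 8-byte alignment). The names alignUp and intArraySize are made up for illustration.

```scala
// Assumed constants: 64-bit JVM, compressed oops, default alignment.
val HeaderBytes = 12
val LengthBytes = 4
val IntBytes = 4
val AlignmentBytes = 8

// Rounds n up to the next multiple of AlignmentBytes.
def alignUp(n: Long): Long =
  (n + AlignmentBytes - 1) / AlignmentBytes * AlignmentBytes

// Size in bytes of an Array[Int] holding n elements.
def intArraySize(n: Int): Long =
  alignUp(HeaderBytes + LengthBytes + IntBytes.toLong * n)

println(intArraySize(0)) // 16
println(intArraySize(2)) // 24
println(intArraySize(3)) // 32
```

For 2 ints the array costs the same as the simple wrapper, because the 4-byte length field fits into the space that the wrapper loses to alignment.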
Conclusion
Structure | Size (bytes) |
---|---|
Simple Wrapper | 24 |
Generic Pair | 56 |
Specialized Pair | 32 |
Array | 24 |
If you instantiate a lot of objects, it’s worth knowing the tradeoffs of each solution.
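To get a feel for the scale, the per-instance sizes from the table can be multiplied by a number of instances. The figure of one million pairs is a hypothetical workload, not a measurement, and the collection holding the references is not counted.

```scala
// Sizes in bytes per instance, taken from the table above.
val bytesPerInstance = List(
  "Simple Wrapper"   -> 24L,
  "Generic Pair"     -> 56L,
  "Specialized Pair" -> 32L,
  "Array"            -> 24L
)

// Hypothetical workload: one million pairs.
val instances = 1000000L

// Total bytes per structure for that many instances.
val totals = bytesPerInstance.map { case (name, bytes) => name -> bytes * instances }

totals.foreach { case (name, total) =>
  println(s"$name: ${total / 1000000} MB")
}
```

With these numbers the generic pair needs 56 MB where the simple wrapper needs 24 MB.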