Saturday, September 02, 2017

Go Interfaces and Immediate Integers

One of the unique things about the Go language is how it handles "dynamic" references compared to a language like Java.

In Java we have:


Whereas in Go we have:


In Go an interface is a "fat pointer". Your first thought might be, six of one, half a dozen of the other. What difference does it make where you put the type. However, there are significant tradeoffs.

Next, you might think, I could have multiple references to the same value, in which case Go will use more space because it will duplicate the type. But if we have a large array of values (in which case we're unlikely to have references to all of them), then Go would save space because it wouldn't need the type on each value.

The other big saving is if you have statically (compile time) typed references in Go they will look like:


And again, you save space because there's no need to store the type at runtime.

Given that background, the problem I was thinking about was how to handle integers in a Go version of Suneido. In Java you have to used boxed (on the heap) integers for generic references.

Older versions of Go were perfect for this purpose since they stored integers right in the interface in the same place as the pointer.



But sadly (for me) when they improved the garbage collector they removed immediate values:
The implementation of interface values has been modified. In earlier releases, the interface contained a word that was either a pointer or a one-word scalar value, depending on the type of the concrete object stored. This implementation was problematical for the garbage collector, so as of 1.4 interface values always hold a pointer. In running programs, most interface values were pointers anyway, so the effect is minimal, but programs that store integers (for example) in interfaces will see more allocations.
So now if you store an integer in an interface it gets allocated on the heap, effectively "boxing" it like Java does, although with less overhead.


For something like implementing Suneido, this really sucks. It adds a huge amount of overhead (relatively) on simple integer operations.

You could make an "extra fat" value that had an interface and an integer, but that's ugly too.

In C++ you can use a union to handle immediate integers and that's what cSuneido does. But Go doesn't have unions in the C/C++ sense.

But the union in cSuneido basically reserves a portion of the address space for use by immediate integers. i.e. a pointer falling in that range of addresses is assumed to be and is treated as an integer.

So why not do something similar in Go. Here's my proof of concept:

var space [1024 * 1024 * 1024]int8
var base = uintptr(unsafe.Pointer(&space[0]))

func ExampleSmi() {
   n := 123456789   
   x := &space[n]
   i := uintptr(unsafe.Pointer(x)) - base
   fmt.Println(i)
   // Output:   
   // 123456789
}

It does use the unsafe package, but only in a read-only, relatively safe way. (IMO) Pointers will always point into valid memory so the garbage collector will be happy. And it should be fast since the unsafe functions are primarily a compile time feature, they don't do much at runtime.

This example handles 30 bit integers (1 gb). (Unsigned here, but easily modified to handle signed.) I tried to allocate a 4 gb array to handle 32 bit integers but the compiler appears to have a limit of 2 gb. (Note: 4gb is an insignificant portion of a 64 bit address space.) That's too bad because I could probably ignore overflow from math on 32 bit integers, but not 30 bit integers.

We want to declare the "space" statically so that "base" is a constant. One question I had was whether this would add 1gb to the executable. With Go 1.9 on OS X it doesn't. (It appears to go in SNOPTRBSS which I translate as "no pointer uninitialized/zero memory") My other question is whether it's smart enough to allocate zeroed virtual memory and avoid actually zeroing the memory. I'm not sure how to check this, but there doesn't seem to be any lag in startup. Also, looking at the process in the OS X Activity Monitor is shows real memory usage under 2mb.  Presumably if the virtual memory is never modified it will never use any real memory. Also, importantly, the array is of type int8 so it won't need to be scanned by the garbage collector.

Even better, we can make our own type based on int8 (e.g, type Smi int8) and then define methods on this type. As long as we don't make the Smi type public and are careful to only make instances within the array then it will work like any other type. We basically have the old behavior of Go, albeit with a smaller range of integers.

It's a bit of kludge but it should work quite well and it's all standard (albeit "unsafe") Go. And it's no worse than tagged pointers  or nan boxing.