Updates the first list of supported types that can be encoded using a Tuple so that it matches the types described in later comments within the tuple package.
Getting a warning at the end of the script leaves the status of the
operation unclear: did it complete? Partially complete? What was done? It
also wastes time, since the repo download comes first.
On Linux, the library may be in /usr/lib64 in many cases, so check for
the library in that path too.
Memory profiling a FoundationDB layer implemented in Go shows high
memory pressure and increased GC times when performing highly-concurrent
multi-key transactions on the database. Further digging displays that
the source of the memory pressure happens when packing the keys for the
transaction into byte slices: the most salient issue is that memory
during the packing process is allocated based on the number of elements
to pack and not on the total size of the resulting byte slice.
This commit attempts to reduce the amount of memory allocated when
calling `Tuple.Pack` for most (all?) usage patterns, both in number of
allocations and in total allocated size.
The following optimizations have been implemented:
- Remove `bytes.Buffer` usage in `encodeTuple`: the `Buffer` struct is
quite expensive for the key sizes we're looking to generate, both
allocation and performance-wise. A `packer` struct has been implemented
that builds the keys "naively" by using `append` on a slice. Slice
growth in Go is also amortized just like in `bytes.Buffer`.
- Do not use `bytes.Replace` in `encodeBytes`: this function is
particularly expensive because it always allocates a copy of the byte
slice, even when it doesn't contain nil bytes. Instead, the replacement
step has been implemented manually in `packer.putbytesNil`, where it can
perform the replacement optimally into the output byte slice without
allocating memory. By having this local function we also allow the
compiler to not duplicate any input `string`s when casting them to
`[]byte`; previously, a copy of every string to pack was always being
allocated because the compiler couldn't prove that `bytes.Replace`
wouldn't modify the slice.
- Use stack space in `encode[Float|Double|Int]`: all the numerical
packing functions were allocating huge amounts of memory because of the
usage of temporary `bytes.Buffer` objects and `binary.Write` calls. The
sizes for all the packed data are always known (either 4 or 8 bytes
depending on type), so the big endian packing can be performed directly
on the stack with `binary.BigEndian.PutUint[32|64]`, which doesn't
require the `interface{}` conversion for the `binary.Write` API and in
x64 compiles to a `mov + bswap` instruction pair.
As a result of these optimizations, the "average" case of key packing
can now create a key with a single allocation. More complex key packing
operations, even those that contain strings/byte slices with nil bytes,
now allocate memory in an amortized fashion: allocations grow with the
size of the output buffer rather than with the number of Tuple elements
to pack.
Additionally, the reduction of memory allocations and the better usage
of the `binary` APIs produce a very significant reduction in runtime for
key packing: between 2x and 6x faster for all packing operations.
Before/after benchmarks are as follows:
benchmark                                  old ns/op     new ns/op     delta
BenchmarkTuplePacking/Simple-4             310           76.4          -75.35%
BenchmarkTuplePacking/Namespaces-4         495           137           -72.32%
BenchmarkTuplePacking/ManyStrings-4        960           255           -73.44%
BenchmarkTuplePacking/ManyStringsNil-4     1090          392           -64.04%
BenchmarkTuplePacking/ManyBytes-4          1409          399           -71.68%
BenchmarkTuplePacking/ManyBytesNil-4       1364          533           -60.92%
BenchmarkTuplePacking/LargeBytes-4         319           107           -66.46%
BenchmarkTuplePacking/LargeBytesNil-4      638           306           -52.04%
BenchmarkTuplePacking/Integers-4           2764          455           -83.54%
BenchmarkTuplePacking/Floats-4             3478          482           -86.14%
BenchmarkTuplePacking/Doubles-4            3654          575           -84.26%
BenchmarkTuplePacking/UUIDs-4              366           211           -42.35%

benchmark                                  old allocs    new allocs    delta
BenchmarkTuplePacking/Simple-4             6             1             -83.33%
BenchmarkTuplePacking/Namespaces-4         11            1             -90.91%
BenchmarkTuplePacking/ManyStrings-4        18            2             -88.89%
BenchmarkTuplePacking/ManyStringsNil-4     18            2             -88.89%
BenchmarkTuplePacking/ManyBytes-4          23            3             -86.96%
BenchmarkTuplePacking/ManyBytesNil-4       22            2             -90.91%
BenchmarkTuplePacking/LargeBytes-4         3             2             -33.33%
BenchmarkTuplePacking/LargeBytesNil-4      3             2             -33.33%
BenchmarkTuplePacking/Integers-4           63            3             -95.24%
BenchmarkTuplePacking/Floats-4             62            2             -96.77%
BenchmarkTuplePacking/Doubles-4            63            3             -95.24%
BenchmarkTuplePacking/UUIDs-4              2             2             +0.00%

benchmark                                  old bytes     new bytes     delta
BenchmarkTuplePacking/Simple-4             272           64            -76.47%
BenchmarkTuplePacking/Namespaces-4         208           64            -69.23%
BenchmarkTuplePacking/ManyStrings-4        512           192           -62.50%
BenchmarkTuplePacking/ManyStringsNil-4     512           192           -62.50%
BenchmarkTuplePacking/ManyBytes-4          864           448           -48.15%
BenchmarkTuplePacking/ManyBytesNil-4       336           192           -42.86%
BenchmarkTuplePacking/LargeBytes-4         400           192           -52.00%
BenchmarkTuplePacking/LargeBytesNil-4      400           192           -52.00%
BenchmarkTuplePacking/Integers-4           3104          448           -85.57%
BenchmarkTuplePacking/Floats-4             2656          192           -92.77%
BenchmarkTuplePacking/Doubles-4            3104          448           -85.57%
BenchmarkTuplePacking/UUIDs-4              256           192           -25.00%
Although the Go bindings to FoundationDB are thoroughly tested as part
of the `bindingtester` operation, this commit implements a more-or-less
complete test case using golden files for the serialized output of
`Tuple.Pack` operations. This will make implementing optimizations and
refactoring the packing operation much simpler.
The same test cases used to verify correctness are also used as a
benchmark suite to measure the amount of memory allocated in the
different operations.
Explicitly adds the generated Go file to GO_SRC in the Makefile to make the dependency relationships clearer.
Adds the standard Go header to our generated Go file.