Commit Graph

11 Commits

Author SHA1 Message Date
NAKAMURA Takumi 9aa1fbf314 unittests/ADT/HashingTest.cpp: Temporarily disable a new test introduced in r151891, to appease msvc.
llvm-svn: 151970
2012-03-03 07:00:58 +00:00
Chandler Carruth 627e862389 Simplify the pair optimization. Rather than using complex type traits,
just ensure that the number of bytes in the pair is the sum of the bytes
in each side of the pair. As long as thats true, there are no extra
bytes that might be padding.

Also add a few tests that previously would have slipped through the
checking. The more accurate checking mechanism catches these and ensures
they are handled conservatively correctly.

Thanks to Duncan for prodding me to do this right and more simply.

llvm-svn: 151891
2012-03-02 10:56:40 +00:00
Chandler Carruth becfda3e40 Add a golden data test that I missed somehow the first time around.
llvm-svn: 151886
2012-03-02 10:01:29 +00:00
Chandler Carruth b101eed332 Fix bad indenting that was left over from cut/paste of the golden values
for 32-bit builds in here.

llvm-svn: 151885
2012-03-02 10:01:27 +00:00
Chandler Carruth 40119fb9c6 We really want to hash pairs of directly-hashable data as directly
hashable data. This matters when we have pair<T*, U*> as a key, which is
quite common in DenseMap, etc. To that end, we need to detect when this
is safe. The requirements on a generic std::pair<T, U> are:

1) Both T and U must satisfy the existing is_hashable_data trait. Note
   that this includes the requirement that T and U have no internal
   padding bits or other bits not contributing directly to equality.
2) The alignment constraints of std::pair<T, U> do not require padding
   between consecutive objects.
3) The alignment constraints of U and the size of T do not conspire to
   require padding between the first and second elements.

Grow two somewhat magical traits to detect this by forming a pod
structure and inspecting offset artifacts on it. Hopefully this won't
cause any compilers to panic.

Added and adjusted tests now that pairs, even nested pairs, are treated
as just sequences of data.

Thanks to Jeffrey Yasskin for helping me sort through this and reviewing
the somewhat subtle traits.

llvm-svn: 151883
2012-03-02 09:26:36 +00:00
Chandler Carruth 4718430a5a Add support for hashing pairs by delegating to each sub-object. There is
an open question of whether we can do better than this by treating pairs
as boring data containers and directly hashing the two subobjects. This
at least makes the API reasonable.

In order to make this change, I reorganized the header a bit. I lifted
the declarations of the hash_value functions up to the top of the header
with their doxygen comments as these are intended for users to interact
with. They shouldn't have to wade through implementation details. I then
defined them at the very end so that they could be defined in terms of
hash_combine or any other hashing infrastructure.

Added various pair-hashing unittests.

llvm-svn: 151882
2012-03-02 08:32:29 +00:00
Chandler Carruth 0d7c2788e4 Remove the misguided extension here that reserved two special values in
the hash_code. I'm not sure what I was thinking here, the use cases for
special values are in the *keys*, not in the hashes of those keys.

We can always resurrect this if needed, or clients can accomplish the
same goal themselves. This makes the general case somewhat faster (~5
cycles faster on my machine) and smaller with less branching.

llvm-svn: 151865
2012-03-02 00:48:38 +00:00
Chandler Carruth 396260c484 Re-disable the debug output. The comment is there explaining why we want
to keep this around -- updating golden tests is annoying otherwise.

Thanks to Benjamin for pointing this omission out on IRC.

llvm-svn: 151860
2012-03-01 23:20:45 +00:00
Chandler Carruth 3da579832f Provide the 32-bit variant of the golden tests. Not sure how I forgot to
do this initially, sorry.

llvm-svn: 151857
2012-03-01 23:06:19 +00:00
Chandler Carruth 1d03a3b6b1 Rewrite LLVM's generalized support library for hashing to follow the API
of the proposed standard hashing interfaces (N3333), and to use
a modified and tuned version of the CityHash algorithm.

Some of the highlights of this change:
 -- Significantly higher quality hashing algorithm with very well
    distributed results, and extremely few collisions. Should be close to
    a checksum for up to 64-bit keys. Very little clustering or clumping of
    hash codes, to better distribute load on probed hash tables.
 -- Built-in support for reserved values.
 -- Simplified API that composes cleanly with other C++ idioms and APIs.
 -- Better scaling performance as keys grow. This is the fastest
    algorithm I've found and measured for moderately sized keys (such as
    show up in some of the uniquing and folding use cases)
 -- Support for enabling per-execution seeds to prevent table ordering
    or other artifacts of hashing algorithms to impact the output of
    LLVM. The seeding would make each run different and highlight these
    problems during bootstrap.

This implementation was tested extensively using the SMHasher test
suite, and pased with flying colors, doing better than the original
CityHash algorithm even.

I've included a unittest, although it is somewhat minimal at the moment.
I've also added (or refactored into the proper location) type traits
necessary to implement this, and converted users of GeneralHash over.

My only immediate concerns with this implementation is the performance
of hashing small keys. I've already started working to improve this, and
will continue to do so. Currently, the only algorithms faster produce
lower quality results, but it is likely there is a better compromise
than the current one.

Many thanks to Jeffrey Yasskin who did most of the work on the N3333
paper, pair-programmed some of this code, and reviewed much of it. Many
thanks also go to Geoff Pike Pike and Jyrki Alakuijala, the original
authors of CityHash on which this is heavily based, and Austin Appleby
who created MurmurHash and the SMHasher test suite.

Also thanks to Nadav, Tobias, Howard, Jay, Nick, Ahmed, and Duncan for
all of the review comments! If there are further comments or concerns,
please let me know and I'll jump on 'em.

llvm-svn: 151822
2012-03-01 18:55:25 +00:00
Talin f2291c908b Hashing.h - utilities for hashing various data types.
llvm-svn: 150890
2012-02-18 21:00:49 +00:00