Commit Graph

5 Commits

Author SHA1 Message Date
Rui Ueyama e403c862cc Make llvm::djbHash an inline function.
Differential Revision: https://reviews.llvm.org/D43644

llvm-svn: 326625
2018-03-02 22:00:38 +00:00
Pavel Labath 3b17b84b9c Resubmit r325107 (case folding DJB hash)
The issue was that the has function was generating different results depending
on the signedness of char on the host platform. This commit fixes the issue by
explicitly using an unsigned char type to prevent sign extension and
adds some extra tests.

The original commit message was:

This patch implements a variant of the DJB hash function which folds the
input according to the algorithm in the Dwarf 5 specification (Section
6.1.1.4.5), which in turn references the Unicode Standard (Section 5.18,
"Case Mappings").

To achieve this, I have added a llvm::sys::unicode::foldCharSimple
function, which performs this mapping. The implementation of this
function was generated from the CaseMatching.txt file from the Unicode
spec using a python script (which is also included in this patch). The
script tries to optimize the function by coalescing adjecant mappings
with the same shift and stride (terms I made up). Theoretically, it
could be made a bit smarter and merge adjecant blocks that were
interrupted by only one or two characters with exceptional mapping, but
this would save only a couple of branches, while it would greatly
complicate the implementation, so I deemed it was not worth it.

Since we assume that the vast majority of the input characters will be
US-ASCII, the folding hash function has a fast-path for handling these,
and only whips out the full decode+fold+encode logic if we encounter a
character outside of this range. It might be possible to implement the
folding directly on utf8 sequences, but this would also bring a lot of
complexity for the few cases where we will actually need to process
non-ascii characters.

Reviewers: JDevlieghere, aprantl, probinson, dblaikie

Subscribers: mgorny, hintonda, echristo, clayborg, vleschuk, llvm-commits

Differential Revision: https://reviews.llvm.org/D42740

llvm-svn: 325732
2018-02-21 22:36:31 +00:00
Pavel Labath 918f60056a Revert r325107 (case folding DJB hash) and subsequent build fix
The "knownValuesUnicode" test in the patch fails on ppc64 and arm64
bots. Reverting while I investigate.

llvm-svn: 325115
2018-02-14 11:06:39 +00:00
Pavel Labath f1440978a1 Implement a case-folding version of DJB hash
Summary:
This patch implements a variant of the DJB hash function which folds the
input according to the algorithm in the Dwarf 5 specification (Section
6.1.1.4.5), which in turn references the Unicode Standard (Section 5.18,
"Case Mappings").

To achieve this, I have added a llvm::sys::unicode::foldCharSimple
function, which performs this mapping. The implementation of this
function was generated from the CaseMatching.txt file from the Unicode
spec using a python script (which is also included in this patch). The
script tries to optimize the function by coalescing adjecant mappings
with the same shift and stride (terms I made up). Theoretically, it
could be made a bit smarter and merge adjecant blocks that were
interrupted by only one or two characters with exceptional mapping, but
this would save only a couple of branches, while it would greatly
complicate the implementation, so I deemed it was not worth it.

Since we assume that the vast majority of the input characters will be
US-ASCII, the folding hash function has a fast-path for handling these,
and only whips out the full decode+fold+encode logic if we encounter a
character outside of this range. It might be possible to implement the
folding directly on utf8 sequences, but this would also bring a lot of
complexity for the few cases where we will actually need to process
non-ascii characters.

Reviewers: JDevlieghere, aprantl, probinson, dblaikie

Subscribers: mgorny, hintonda, echristo, clayborg, vleschuk, llvm-commits

Differential Revision: https://reviews.llvm.org/D42740

llvm-svn: 325107
2018-02-14 10:05:09 +00:00
Jonas Devlieghere 92ac9d3e1b [Support] Move DJB hash to support. NFC
This patch moves the DJB hash to support. This is consistent with other
hashing algorithms living there. The hash is used by the DWARF
accelerator tables. We're doing this now because the hashing function is
needed by dsymutil and we don't want to link against libBinaryFormat.

Differential revision: https://reviews.llvm.org/D42594

llvm-svn: 323616
2018-01-28 11:05:10 +00:00