llvm-project/llvm/utils
Pavel Labath 3b17b84b9c Resubmit r325107 (case folding DJB hash)
The issue was that the has function was generating different results depending
on the signedness of char on the host platform. This commit fixes the issue by
explicitly using an unsigned char type to prevent sign extension and
adds some extra tests.

The original commit message was:

This patch implements a variant of the DJB hash function which folds the
input according to the algorithm in the Dwarf 5 specification (Section
6.1.1.4.5), which in turn references the Unicode Standard (Section 5.18,
"Case Mappings").

To achieve this, I have added a llvm::sys::unicode::foldCharSimple
function, which performs this mapping. The implementation of this
function was generated from the CaseMatching.txt file from the Unicode
spec using a python script (which is also included in this patch). The
script tries to optimize the function by coalescing adjecant mappings
with the same shift and stride (terms I made up). Theoretically, it
could be made a bit smarter and merge adjecant blocks that were
interrupted by only one or two characters with exceptional mapping, but
this would save only a couple of branches, while it would greatly
complicate the implementation, so I deemed it was not worth it.

Since we assume that the vast majority of the input characters will be
US-ASCII, the folding hash function has a fast-path for handling these,
and only whips out the full decode+fold+encode logic if we encounter a
character outside of this range. It might be possible to implement the
folding directly on utf8 sequences, but this would also bring a lot of
complexity for the few cases where we will actually need to process
non-ascii characters.

Reviewers: JDevlieghere, aprantl, probinson, dblaikie

Subscribers: mgorny, hintonda, echristo, clayborg, vleschuk, llvm-commits

Differential Revision: https://reviews.llvm.org/D42740

llvm-svn: 325732
2018-02-21 22:36:31 +00:00
..
FileCheck [FileCheck] - Fix possible buffer out of bounds access when parsing --check-prefix. 2018-01-16 08:09:24 +00:00
KillTheDoctor
LLVMVisualizers
Misc
PerfectShuffle
TableGen Fix signed/unsigned comparison warning in AsmGenMatcher generated code. NFCI. 2018-02-17 12:29:47 +00:00
Target/ARM
UpdateTestChecks [utils] Refactor utils/update_{,llc_}test_checks.py to share more code 2018-02-10 05:01:33 +00:00
bugpoint
count
crosstool
docker Fix typos of occurred and occurrence 2018-01-24 10:33:39 +00:00
emacs Fix some regular expressions in llvm-mode.el. 2018-01-29 22:56:41 +00:00
fpcmp
gdb-scripts Make the Twine pretty-printer work with GDB 7.11 2017-06-04 03:27:12 +00:00
git
git-svn [git-llvm] Handle files ignored by svn correctly 2017-12-22 21:19:13 +00:00
jedit
kate
lint
lit Remove "--full-shutdown" and instead use an environment variable LLD_IN_TEST. 2018-02-16 23:41:48 +00:00
llvm-build
llvm-lit [lit] Actually do normalize the case of files in the config map. 2017-09-21 21:27:11 +00:00
not [CMake] Use PRIVATE in target_link_libraries for executables 2017-12-05 21:49:56 +00:00
release Update build_llvm_package.bat 2018-01-25 14:43:10 +00:00
sanitizers Add libstd++-4.8 exceptions to ubsan_blacklist.txt 2017-11-29 20:10:14 +00:00
testgen
textmate
unittest [gtest] Support raw_ostream printing functions more comprehensively. 2018-02-12 10:20:09 +00:00
valgrind
vim [vim] Recognize more FileCheck comments 2018-02-20 17:27:44 +00:00
vscode Adding VSCode syntax colorizer to utils (generated from textmate colorizer). 2017-05-09 17:13:37 +00:00
yaml-bench [CMake] Use PRIVATE in target_link_libraries for executables 2017-12-05 21:49:56 +00:00
DSAclean.py
DSAextract.py
GenLibDeps.pl
GetRepositoryPath
GetSourceVersion
LLVMBuild.txt
UpdateCMakeLists.pl
abtest.py AsmPrinter: mark the beginning and the end of a function in verbose mode 2017-05-23 21:22:16 +00:00
bisect
bisect-skip-count
bugpoint_gisel_reducer.py Add a utility to reduce GlobalISel tests 2018-01-23 19:47:10 +00:00
check-each-file
clang-parse-diagnostics-file
codegen-diff
countloc.sh
create_ladder_graph.py
extract_symbols.py
findmisopt
findoptdiff
findsym.pl
getsrcs.sh
lldbDataFormatters.py
llvm-compilers-check
llvm-gisel-cov.py [globalisel][tablegen] Generate rule coverage and use it to identify untested rules 2017-11-16 00:46:35 +00:00
llvm-native-gxx
llvm.grm
llvmdo
llvmgrep
makellvm
prepare-code-coverage-artifact.py
schedcover.py
shuffle_fuzz.py
shuffle_select_fuzz_tester.py Adding a shufflevector and select LLVM IR instructions fuzz tool 2017-10-31 11:39:31 +00:00
sort_includes.py
unicode-case-fold.py Resubmit r325107 (case folding DJB hash) 2018-02-21 22:36:31 +00:00
update_llc_test_checks.py [utils] Refactor utils/update_{,llc_}test_checks.py to share more code 2018-02-10 05:01:33 +00:00
update_mir_test_checks.py update_mir_test_checks: Accept "." in function names 2018-01-26 22:56:31 +00:00
update_test_checks.py [utils] Refactor utils/update_{,llc_}test_checks.py to share more code 2018-02-10 05:01:33 +00:00
wciia.py