llvm-project/llvm/lib/Support
Pavel Labath 3b17b84b9c Resubmit r325107 (case folding DJB hash)
The issue was that the hash function was generating different results depending
on the signedness of char on the host platform. This commit fixes the issue by
explicitly using an unsigned char type to prevent sign extension and
adds some extra tests.
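
To make the signedness problem concrete, here is a minimal sketch of a
DJB-style hash (h = h * 33 + c, seeded with 5381). The function names are
hypothetical and this is not the code from the patch; the second function
only exists to show what happens when a byte is widened as a signed value.

```cpp
#include <cstdint>
#include <string>

// Sketch of a DJB-style hash. Each byte is cast to unsigned char first, so
// bytes >= 0x80 are never sign-extended, whatever the signedness of plain
// char is on the host.
inline uint32_t djbHashSketch(const std::string &Buffer, uint32_t H = 5381) {
  for (char C : Buffer)
    H = (H << 5) + H + static_cast<unsigned char>(C);
  return H;
}

// For contrast only: the same loop, but widening the byte as a signed value.
// This mimics what a host with signed char would compute without the cast.
inline uint32_t djbHashSignedSketch(const std::string &Buffer,
                                    uint32_t H = 5381) {
  for (char C : Buffer)
    H = (H << 5) + H + static_cast<uint32_t>(static_cast<signed char>(C));
  return H;
}
```

The two variants agree on pure US-ASCII input and diverge as soon as a byte
with the high bit set appears, which is the kind of host-dependent result
described above.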

The original commit message was:

This patch implements a variant of the DJB hash function which folds the
input according to the algorithm in the DWARF 5 specification (Section
6.1.1.4.5), which in turn references the Unicode Standard (Section 5.18,
"Case Mappings").

To achieve this, I have added an llvm::sys::unicode::foldCharSimple
function, which performs this mapping. The implementation of this
function was generated from the CaseFolding.txt file from the Unicode
spec using a python script (which is also included in this patch). The
script tries to optimize the function by coalescing adjacent mappings
with the same shift and stride (terms I made up). Theoretically, it
could be made a bit smarter and merge adjacent blocks that were
interrupted by only one or two characters with exceptional mapping, but
this would save only a couple of branches, while it would greatly
complicate the implementation, so I deemed it was not worth it.
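
One way to picture the shift/stride coalescing is the sketch below. It is
written in C++ rather than Python, and the Block struct, coalesce and
foldWithBlocks are hypothetical names, not the script's actual code. It
assumes the input mappings are sorted by source code point with no
duplicates, and it merges greedily.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// A run of code points First, First+Stride, ..., Last that all fold by
// adding the same Shift.
struct Block {
  uint32_t First;
  uint32_t Last;
  int32_t Shift;
  uint32_t Stride;
};

// Greedily merge (from, to) mappings into blocks of equal shift and stride.
std::vector<Block>
coalesce(const std::vector<std::pair<uint32_t, uint32_t>> &Mappings) {
  std::vector<Block> Blocks;
  for (const auto &M : Mappings) {
    int32_t Shift =
        static_cast<int32_t>(M.second) - static_cast<int32_t>(M.first);
    if (!Blocks.empty()) {
      Block &B = Blocks.back();
      assert(M.first > B.Last && "mappings must be sorted and unique");
      uint32_t Gap = M.first - B.Last;
      if (Shift == B.Shift) {
        if (B.First == B.Last) { // one entry so far: this gap sets the stride
          B.Stride = Gap;
          B.Last = M.first;
          continue;
        }
        if (Gap == B.Stride) { // continues the existing pattern
          B.Last = M.first;
          continue;
        }
      }
    }
    Blocks.push_back({M.first, M.first, Shift, 1});
  }
  return Blocks;
}

// Fold one code point by scanning the blocks; unmapped code points are
// returned unchanged.
uint32_t foldWithBlocks(const std::vector<Block> &Blocks, uint32_t C) {
  for (const Block &B : Blocks)
    if (C >= B.First && C <= B.Last && (C - B.First) % B.Stride == 0)
      return C + static_cast<uint32_t>(B.Shift);
  return C;
}
```

With this representation, a run such as U+0041..U+005A collapses into a
single block with shift 32 and stride 1, and the Latin Extended-A pairs
starting at U+0100 collapse into a block with shift 1 and stride 2.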

Since we assume that the vast majority of the input characters will be
US-ASCII, the folding hash function has a fast-path for handling these,
and only whips out the full decode+fold+encode logic if we encounter a
character outside of this range. It might be possible to implement the
folding directly on UTF-8 sequences, but this would also bring a lot of
complexity for the few cases where we will actually need to process
non-ASCII characters.
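
The fast-path idea can be sketched roughly as follows. This is a simplified
illustration and not the code from the patch: all names are hypothetical,
foldCodePointSketch only knows a small Latin-1 range instead of the full
table, UTF-8 validation is lax (no overlong or surrogate checks), and
hashing malformed bytes unchanged is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// One DJB step over a single byte value.
static void hashByte(uint32_t &H, uint32_t B) { H = (H << 5) + H + (B & 0xFF); }

// Hypothetical stand-in for the full folding table: U+00C0..U+00DE fold by
// +32, except U+00D7 (the multiplication sign).
static uint32_t foldCodePointSketch(uint32_t C) {
  return (C >= 0xC0 && C <= 0xDE && C != 0xD7) ? C + 32 : C;
}

// Decode one multi-byte UTF-8 sequence starting at S[I]. Returns the number
// of bytes consumed, or 0 if the sequence is malformed or truncated.
static size_t decodeUtf8(const std::string &S, size_t I, uint32_t &CP) {
  unsigned char B0 = S[I];
  size_t Len = B0 >= 0xF0 ? 4 : B0 >= 0xE0 ? 3 : B0 >= 0xC0 ? 2 : 0;
  if (Len == 0 || I + Len > S.size())
    return 0;
  CP = B0 & (0xFF >> (Len + 1));
  for (size_t K = 1; K < Len; ++K) {
    unsigned char B = S[I + K];
    if ((B & 0xC0) != 0x80)
      return 0;
    CP = (CP << 6) | (B & 0x3F);
  }
  return Len;
}

// Re-encode a code point as UTF-8 and feed the bytes to the hash.
static void hashEncoded(uint32_t &H, uint32_t CP) {
  if (CP < 0x80) {
    hashByte(H, CP);
  } else if (CP < 0x800) {
    hashByte(H, 0xC0 | (CP >> 6));
    hashByte(H, 0x80 | (CP & 0x3F));
  } else if (CP < 0x10000) {
    hashByte(H, 0xE0 | (CP >> 12));
    hashByte(H, 0x80 | ((CP >> 6) & 0x3F));
    hashByte(H, 0x80 | (CP & 0x3F));
  } else {
    hashByte(H, 0xF0 | (CP >> 18));
    hashByte(H, 0x80 | ((CP >> 12) & 0x3F));
    hashByte(H, 0x80 | ((CP >> 6) & 0x3F));
    hashByte(H, 0x80 | (CP & 0x3F));
  }
}

uint32_t caseFoldingDjbHashSketch(const std::string &S, uint32_t H = 5381) {
  size_t I = 0;
  while (I < S.size()) {
    unsigned char C = S[I];
    if (C <= 0x7F) { // fast path: US-ASCII is a single byte in UTF-8
      if (C >= 'A' && C <= 'Z')
        C = static_cast<unsigned char>(C - 'A' + 'a');
      hashByte(H, C);
      ++I;
      continue;
    }
    uint32_t CP = 0; // slow path: decode, fold, re-encode
    size_t Len = decodeUtf8(S, I, CP);
    if (Len == 0) { // assumption: hash a malformed byte as-is
      hashByte(H, C);
      ++I;
      continue;
    }
    hashEncoded(H, foldCodePointSketch(CP));
    I += Len;
  }
  return H;
}
```

For example, "FOO::Bar" and "foo::bar" hash to the same value without ever
leaving the ASCII branch, while "\xC3\x89" (U+00C9) only matches "\xC3\xA9"
(U+00E9) by going through the decode+fold+encode branch.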

Reviewers: JDevlieghere, aprantl, probinson, dblaikie

Subscribers: mgorny, hintonda, echristo, clayborg, vleschuk, llvm-commits

Differential Revision: https://reviews.llvm.org/D42740

llvm-svn: 325732
2018-02-21 22:36:31 +00:00
..
Unix Report fatal error in the case of out of memory 2018-02-20 05:41:26 +00:00
Windows Report fatal error in the case of out of memory 2018-02-20 05:41:26 +00:00
AMDGPUMetadata.cpp AMDGPU: Add num spilled s/vgprs to metadata 2017-11-28 17:51:08 +00:00
APFloat.cpp Fix APFloat from string conversion for Inf 2017-12-19 04:27:39 +00:00
APInt.cpp [APInt] Fix extractBits to correctly handle Result.isSingleWord() case. 2018-02-16 01:44:36 +00:00
APSInt.cpp
ARMAttributeParser.cpp Avoid int to string conversion in Twine or raw_ostream contexts. 2017-12-28 16:58:54 +00:00
ARMBuildAttrs.cpp Sort the remaining #include lines in include/... and lib/.... 2017-06-06 11:49:48 +00:00
ARMWinEH.cpp
Allocator.cpp Recover some overzealously removed includes. 2017-12-13 22:21:02 +00:00
Atomic.cpp Fix llvm-for-windows-on-linux build after LLVM r272701. 2017-08-03 20:10:47 +00:00
BinaryStreamError.cpp [Support] Move Stream library from MSF -> Support. 2017-03-02 20:52:51 +00:00
BinaryStreamReader.cpp Add a BinarySubstreamRef, and a method to read one. 2017-06-23 16:38:40 +00:00
BinaryStreamRef.cpp [BinaryStream] Support growable streams. 2017-11-27 18:48:37 +00:00
BinaryStreamWriter.cpp [BinaryStream] Support growable streams. 2017-11-27 18:48:37 +00:00
BlockFrequency.cpp Remove redundant includes from lib/Support. 2017-12-13 21:30:58 +00:00
BranchProbability.cpp Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people. 2017-10-15 14:32:27 +00:00
CMakeLists.txt Resubmit r325107 (case folding DJB hash) 2018-02-21 22:36:31 +00:00
COM.cpp
COPYRIGHT.regex
CachePruning.cpp [ThinLTO][CachePruning] explicitly disable pruning 2017-12-22 18:32:15 +00:00
Chrono.cpp [Support][Chrono] Use explicit cast of text output of time values. 2017-11-06 23:01:46 +00:00
CodeGenCoverage.cpp Fix use of config.h in public headers. 2017-11-18 22:42:26 +00:00
CommandLine.cpp Revert r322595: Specify inline for isWhitespace in CommandLine.cpp 2018-01-22 23:27:50 +00:00
Compression.cpp
ConvertUTF.cpp Sort the remaining #include lines in include/... and lib/.... 2017-06-06 11:49:48 +00:00
ConvertUTFWrapper.cpp Sort the remaining #include lines in include/... and lib/.... 2017-06-06 11:49:48 +00:00
CrashRecoveryContext.cpp Re-land r303274: "[CrashRecovery] Use SEH __try instead of VEH when available" 2017-05-17 18:16:17 +00:00
DAGDeltaAlgorithm.cpp
DJB.cpp Resubmit r325107 (case folding DJB hash) 2018-02-21 22:36:31 +00:00
DataExtractor.cpp [DWARF] Support for DW_FORM_strx3 and complete support for DW_FORM_strx{1,2,4} 2017-06-21 19:37:44 +00:00
Debug.cpp
DebugCounter.cpp Hide dbgs() stream for when built with -fmodules. 2017-06-14 19:16:22 +00:00
DeltaAlgorithm.cpp
DynamicLibrary.cpp Allow clients to specify search order of DynamicLibraries. 2017-07-12 21:22:45 +00:00
Errno.cpp Sort the remaining #include lines in include/... and lib/.... 2017-06-06 11:49:48 +00:00
Error.cpp [Support] Make llvm::Error and Expected faster. 2017-11-09 19:31:52 +00:00
ErrorHandling.cpp Report fatal error in the case of out of memory 2018-02-17 10:21:33 +00:00
FileOutputBuffer.cpp Make helpers static. NFC. 2017-11-24 14:55:41 +00:00
FileUtilities.cpp
FoldingSet.cpp Revert r325224 "Report fatal error in the case of out of memory" 2018-02-15 09:45:59 +00:00
FormatVariadic.cpp Remove unused variables. No functionality change. 2017-10-08 19:11:02 +00:00
FormattedStream.cpp Sort the remaining #include lines in include/... and lib/.... 2017-06-06 11:49:48 +00:00
GlobPattern.cpp [Support/GlobPattern] - Do not crash when pattern has characters with int value < 0. 2017-07-31 09:26:50 +00:00
GraphWriter.cpp Convenience/safety fix for llvm::sys::Execute(And|No)Wait 2017-09-13 17:03:37 +00:00
Hashing.cpp
Host.cpp [X86] Add 'sahf' to getHostCPUFeatures so -march=native will pick it up correctly. 2018-02-17 16:52:49 +00:00
IntEqClasses.cpp
IntervalMap.cpp
JamCRC.cpp
KnownBits.cpp [KnownBits][ValueTracking] Move the math for calculating known bits for add/sub into a static method in KnownBits object 2017-08-08 16:29:35 +00:00
LEB128.cpp
LLVMBuild.txt
LineIterator.cpp
Locale.cpp
LockFileManager.cpp [Support] Replace hand-written scope_exit with make_scope_exit. 2018-02-18 16:05:40 +00:00
LowLevelType.cpp [GlobalISel] Enable legalizing non-power-of-2 sized types. 2017-11-07 10:34:34 +00:00
MD5.cpp Fix warnings. [-Wdocumentation] 2017-10-12 09:42:14 +00:00
ManagedStatic.cpp Revamp llvm::once_flag to be closer to std::once_flag 2017-02-05 21:13:06 +00:00
MathExtras.cpp
Memory.cpp
MemoryBuffer.cpp [Support] Remove MemoryBuffer::getNewMemBuffer 2018-01-15 11:03:30 +00:00
Mutex.cpp [Support] - Add bad alloc error handler for handling allocation malfunctions 2017-07-11 16:45:30 +00:00
NativeFormatting.cpp Support: Add missing #include. 2018-01-18 20:49:33 +00:00
Options.cpp
Parallel.cpp Bring r314809 back. 2017-10-04 20:27:01 +00:00
Path.cpp Delete temp file if rename fails. 2017-12-05 16:40:56 +00:00
PluginLoader.cpp
PrettyStackTrace.cpp Add more initializers to quiet a clang warning 2018-01-30 16:02:32 +00:00
Process.cpp [llvm-rc] Use proper search algorithm for finding resources. 2017-10-11 20:12:09 +00:00
Program.cpp Convenience/safety fix for llvm::sys::Execute(And|No)Wait 2017-09-13 17:03:37 +00:00
README.txt.system
RWMutex.cpp Report fatal error in the case of out of memory 2018-02-20 05:41:26 +00:00
RandomNumberGenerator.cpp Mark all library options as hidden. 2017-12-01 00:53:10 +00:00
Regex.cpp Add const to a const method. NFC 2017-04-18 01:04:05 +00:00
SHA1.cpp Sort the remaining #include lines in include/... and lib/.... 2017-06-06 11:49:48 +00:00
ScaledNumber.cpp
ScopedPrinter.cpp Remove redundant includes from lib/Support. 2017-12-13 21:30:58 +00:00
Signals.cpp Convenience/safety fix for llvm::sys::Execute(And|No)Wait 2017-09-13 17:03:37 +00:00
SmallPtrSet.cpp [SmallPtrSet] Add iterator epoch tracking. 2017-10-13 20:37:52 +00:00
SmallVector.cpp Support, IR, ADT: Check nullptr after allocation with malloc/realloc or calloc 2017-07-20 01:30:39 +00:00
SourceMgr.cpp Add DK_Remark to SMDiagnostic 2017-10-12 23:56:02 +00:00
SpecialCaseList.cpp Extend SpecialCaseList to allow users to blame matches on entries in the file. 2017-11-07 21:16:46 +00:00
Statistic.cpp [ADT] Replace sys::MemoryFence with standard atomics. 2018-02-01 20:28:33 +00:00
StringExtras.cpp [Support] Move PrintEscapedString into the library its declaration is in 2018-01-26 20:21:02 +00:00
StringMap.cpp Report fatal error in the case of out of memory 2018-02-20 05:41:26 +00:00
StringPool.cpp
StringRef.cpp Fix APFloat from string conversion for Inf 2017-12-19 04:27:39 +00:00
StringSaver.cpp
SystemUtils.cpp
TarWriter.cpp [Support/TarWriter] - Don't allow TarWriter to add the same file more than once. 2017-12-05 10:09:59 +00:00
TargetParser.cpp [ARM] Add 'fillValidCPUArchList' to ARM targets 2018-02-08 16:48:54 +00:00
TargetRegistry.cpp Add backend name to Target to enable runtime info to be fed back into TableGen 2017-11-15 23:55:44 +00:00
ThreadLocal.cpp Sort the remaining #include lines in include/... and lib/.... 2017-06-06 11:49:48 +00:00
ThreadPool.cpp Speculative build fix for lld on Linux after Michael's #include removals 2017-12-13 22:12:57 +00:00
Threading.cpp Bring r314809 back. 2017-10-04 20:27:01 +00:00
Timer.cpp Make LLVM timer reprintable: that is, make more than one print action on the same timer feasible 2018-02-10 00:38:21 +00:00
ToolOutputFile.cpp [Support] Rename tool_output_file to ToolOutputFile, NFC 2017-09-23 01:03:17 +00:00
TrigramIndex.cpp Sort the remaining #include lines in include/... and lib/.... 2017-06-06 11:49:48 +00:00
Triple.cpp [WebAssembly] Switch to *-wasm as the default target triple. 2018-01-23 16:55:44 +00:00
Twine.cpp Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people. 2017-10-15 14:32:27 +00:00
Unicode.cpp
UnicodeCaseFold.cpp Resubmit r325107 (case folding DJB hash) 2018-02-21 22:36:31 +00:00
Valgrind.cpp
Watchdog.cpp
YAMLParser.cpp [ProfileData, Support] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC). 2017-06-21 23:19:47 +00:00
YAMLTraits.cpp [YAML] Fix UTF-8 handling 2017-12-21 17:14:09 +00:00
circular_raw_ostream.cpp
raw_os_ostream.cpp
raw_ostream.cpp [Support] Make the default chunk size of raw_fd_ostream to 1 GiB. 2017-10-31 17:37:20 +00:00
regcomp.c Fix llvm-special-case-list-fuzzer regexp exception 2017-10-27 19:15:13 +00:00
regengine.inc
regerror.c
regex2.h Support/reg*.h: Make headers include their dependencies 2017-10-26 20:23:11 +00:00
regex_impl.h
regexec.c
regfree.c
regstrlcpy.c
regutils.h
xxhash.cpp Revert r301487: Replace HashString algorithm with xxHash64 2017-04-26 23:15:10 +00:00

README.txt.system

Design Of lib/System
====================

The software in this directory is designed to completely shield LLVM from any
and all operating system specific functionality. It is not intended to be a
complete operating system wrapper (such as ACE), but only to provide the
functionality necessary to support LLVM.

The software located here, of necessity, has very specific and stringent design
rules. Violation of these rules means that cracks in the shield could form and
the primary goal of the library is defeated. By consistently using this library,
LLVM becomes more easily ported to new platforms since the only thing requiring
porting is this library.

Complete documentation for the library can be found in the file:
  llvm/docs/SystemLibrary.html
or at this URL:
  http://llvm.org/docs/SystemLibrary.html

While we recommend that you read the more detailed documentation, for the
impatient, here's a high level summary of the library's requirements.

 1. No system header files are to be exposed through the interface.
 2. Std C++ and Std C header files are okay to be exposed through the interface.
 3. No exposed system-specific functions.
 4. No exposed system-specific data.
 5. Data in lib/System classes must use only simple C++ intrinsic types.
 6. Errors are handled by returning "true" and setting an optional std::string.
 7. Library must not throw any exceptions, period.
 8. Interface functions must not have throw() specifications.
 9. No duplicate function implementations are permitted within an operating
    system class.
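
As an illustration of requirements 6 and 7, a function in this style might
look like the sketch below. The name removeFileSketch and the message text
are hypothetical and do not come from the library; the only dependency is a
Std C header.

```cpp
#include <cstdio>
#include <string>

// Returns true on error and false on success. If the caller passed a
// non-null ErrMsg, it receives a description of the failure. The function
// reports failure through its return value instead of throwing.
bool removeFileSketch(const std::string &Path, std::string *ErrMsg = nullptr) {
  if (std::remove(Path.c_str()) != 0) {
    if (ErrMsg)
      *ErrMsg = "could not remove '" + Path + "'";
    return true;
  }
  return false;
}
```

A caller that does not care about the message simply omits the second
argument and checks the boolean result.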

To accomplish these requirements, the library has numerous design criteria that
must be satisfied. Here's a high level summary of the library's design criteria:

 1. No unused functionality (only what LLVM needs)
 2. High-Level Interfaces
 3. Use Opaque Classes
 4. Common Implementations
 5. Multiple Implementations
 6. Minimize Memory Allocation
 7. No Virtual Methods
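
A rough sketch of how criteria 2, 5 and 7 can fit together: one
non-virtual, high-level interface whose body is chosen per platform at
compile time. PathStyleSketch is a hypothetical class made up for this
illustration and is not part of the library.

```cpp
#include <string>

// Interface part: what a header would expose. No system headers, no
// virtual methods, same signatures on every platform.
class PathStyleSketch {
public:
  static char separator();
  static bool isAbsolute(const std::string &Path);
};

// Implementation part: what a per-platform source file would provide.
#if defined(_WIN32)
char PathStyleSketch::separator() { return '\\'; }
bool PathStyleSketch::isAbsolute(const std::string &Path) {
  // Simplified: only recognizes the "C:\..." and "C:/..." forms.
  return Path.size() >= 3 && Path[1] == ':' &&
         (Path[2] == '\\' || Path[2] == '/');
}
#else
char PathStyleSketch::separator() { return '/'; }
bool PathStyleSketch::isAbsolute(const std::string &Path) {
  return !Path.empty() && Path[0] == '/';
}
#endif
```

Client code calls PathStyleSketch::isAbsolute() the same way everywhere and
never sees which implementation was compiled in.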