forked from OSchip/llvm-project
3b17b84b9c
The issue was that the hash function was generating different results depending on the signedness of char on the host platform. This commit fixes the issue by explicitly using an unsigned char type to prevent sign extension and adds some extra tests.

The original commit message was:

This patch implements a variant of the DJB hash function which folds the input according to the algorithm in the DWARF 5 specification (Section 6.1.1.4.5), which in turn references the Unicode Standard (Section 5.18, "Case Mappings").

To achieve this, I have added a llvm::sys::unicode::foldCharSimple function, which performs this mapping. The implementation of this function was generated from the CaseMatching.txt file from the Unicode spec using a Python script (which is also included in this patch). The script tries to optimize the function by coalescing adjacent mappings with the same shift and stride (terms I made up). Theoretically, it could be made a bit smarter and merge adjacent blocks that were interrupted by only one or two characters with exceptional mapping, but this would save only a couple of branches, while it would greatly complicate the implementation, so I deemed it was not worth it.

Since we assume that the vast majority of the input characters will be US-ASCII, the folding hash function has a fast path for handling these, and only whips out the full decode+fold+encode logic if we encounter a character outside of this range. It might be possible to implement the folding directly on UTF-8 sequences, but this would also bring a lot of complexity for the few cases where we will actually need to process non-ASCII characters.

Reviewers: JDevlieghere, aprantl, probinson, dblaikie
Subscribers: mgorny, hintonda, echristo, clayborg, vleschuk, llvm-commits
Differential Revision: https://reviews.llvm.org/D42740
llvm-svn: 325732
Unix
Windows
AMDGPUMetadata.cpp
APFloat.cpp
APInt.cpp
APSInt.cpp
ARMAttributeParser.cpp
ARMBuildAttrs.cpp
ARMWinEH.cpp
Allocator.cpp
Atomic.cpp
BinaryStreamError.cpp
BinaryStreamReader.cpp
BinaryStreamRef.cpp
BinaryStreamWriter.cpp
BlockFrequency.cpp
BranchProbability.cpp
CMakeLists.txt
COM.cpp
COPYRIGHT.regex
CachePruning.cpp
Chrono.cpp
CodeGenCoverage.cpp
CommandLine.cpp
Compression.cpp
ConvertUTF.cpp
ConvertUTFWrapper.cpp
CrashRecoveryContext.cpp
DAGDeltaAlgorithm.cpp
DJB.cpp
DataExtractor.cpp
Debug.cpp
DebugCounter.cpp
DeltaAlgorithm.cpp
DynamicLibrary.cpp
Errno.cpp
Error.cpp
ErrorHandling.cpp
FileOutputBuffer.cpp
FileUtilities.cpp
FoldingSet.cpp
FormatVariadic.cpp
FormattedStream.cpp
GlobPattern.cpp
GraphWriter.cpp
Hashing.cpp
Host.cpp
IntEqClasses.cpp
IntervalMap.cpp
JamCRC.cpp
KnownBits.cpp
LEB128.cpp
LLVMBuild.txt
LineIterator.cpp
Locale.cpp
LockFileManager.cpp
LowLevelType.cpp
MD5.cpp
ManagedStatic.cpp
MathExtras.cpp
Memory.cpp
MemoryBuffer.cpp
Mutex.cpp
NativeFormatting.cpp
Options.cpp
Parallel.cpp
Path.cpp
PluginLoader.cpp
PrettyStackTrace.cpp
Process.cpp
Program.cpp
README.txt.system
RWMutex.cpp
RandomNumberGenerator.cpp
Regex.cpp
SHA1.cpp
ScaledNumber.cpp
ScopedPrinter.cpp
Signals.cpp
SmallPtrSet.cpp
SmallVector.cpp
SourceMgr.cpp
SpecialCaseList.cpp
Statistic.cpp
StringExtras.cpp
StringMap.cpp
StringPool.cpp
StringRef.cpp
StringSaver.cpp
SystemUtils.cpp
TarWriter.cpp
TargetParser.cpp
TargetRegistry.cpp
ThreadLocal.cpp
ThreadPool.cpp
Threading.cpp
Timer.cpp
ToolOutputFile.cpp
TrigramIndex.cpp
Triple.cpp
Twine.cpp
Unicode.cpp
UnicodeCaseFold.cpp
Valgrind.cpp
Watchdog.cpp
YAMLParser.cpp
YAMLTraits.cpp
circular_raw_ostream.cpp
raw_os_ostream.cpp
raw_ostream.cpp
regcomp.c
regengine.inc
regerror.c
regex2.h
regex_impl.h
regexec.c
regfree.c
regstrlcpy.c
regutils.h
xxhash.cpp
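
The commit message for this directory describes two ideas that a small example may make clearer: hashing bytes as unsigned char so the result does not depend on the host's char signedness, and a fast path that folds US-ASCII characters directly. The following is a minimal sketch under stated assumptions, not the code in DJB.cpp. The function names djbHashSignedChar and djbHashAsciiFold are hypothetical, and the sketch hashes non-ASCII bytes unchanged instead of performing the full decode+fold+encode step the commit message mentions.

```cpp
#include <cstdint>
#include <string>

// Classic DJB step: H = H * 33 + byte, starting from the usual seed 5381.
// Iterating as unsigned char keeps bytes >= 0x80 in the range 128..255.
inline uint32_t djbHash(const std::string &S, uint32_t H = 5381) {
  for (unsigned char C : S)
    H = (H << 5) + H + C;
  return H;
}

// Hypothetical variant that models a host where plain char is signed:
// bytes >= 0x80 become negative and are sign-extended before the add,
// so the hash differs from djbHash for non-ASCII input.
inline uint32_t djbHashSignedChar(const std::string &S, uint32_t H = 5381) {
  for (char Raw : S) {
    signed char C = static_cast<signed char>(Raw);
    H = (H << 5) + H + static_cast<uint32_t>(static_cast<int32_t>(C));
  }
  return H;
}

// Hypothetical ASCII-only folding hash. Only the fast path is shown:
// 'A'..'Z' are lowered in place. A full implementation would decode
// non-ASCII sequences, fold them, and re-encode them; this sketch
// deliberately leaves such bytes as they are.
inline uint32_t djbHashAsciiFold(const std::string &S, uint32_t H = 5381) {
  for (unsigned char C : S) {
    if (C >= 'A' && C <= 'Z')
      C = static_cast<unsigned char>(C - 'A' + 'a');
    H = (H << 5) + H + C;
  }
  return H;
}
```

With these definitions, djbHashAsciiFold("MAIN") and djbHash("main") produce the same value, while djbHashSignedChar and djbHash agree on pure ASCII input and diverge as soon as a byte at or above 0x80 appears.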
README.txt.system
Design Of lib/System
====================

The software in this directory is designed to completely shield LLVM from any
and all operating system specific functionality. It is not intended to be a
complete operating system wrapper (such as ACE), but only to provide the
functionality necessary to support LLVM.

The software located here, of necessity, has very specific and stringent design
rules. Violation of these rules means that cracks in the shield could form and
the primary goal of the library is defeated. By consistently using this library,
LLVM becomes more easily ported to new platforms since the only thing requiring
porting is this library.

Complete documentation for the library can be found in the file:
  llvm/docs/SystemLibrary.html
or at this URL:
  http://llvm.org/docs/SystemLibrary.html

While we recommend that you read the more detailed documentation, for the
impatient, here's a high level summary of the library's requirements.

 1. No system header files are to be exposed through the interface.
 2. Std C++ and Std C header files are okay to be exposed through the interface.
 3. No exposed system-specific functions.
 4. No exposed system-specific data.
 5. Data in lib/System classes must use only simple C++ intrinsic types.
 6. Errors are handled by returning "true" and setting an optional std::string.
 7. Library must not throw any exceptions, period.
 8. Interface functions must not have throw() specifications.
 9. No duplicate function implementations are permitted within an operating
    system class.

To accomplish these requirements, the library has numerous design criteria that
must be satisfied. Here's a high level summary of the library's design criteria:

 1. No unused functionality (only what LLVM needs)
 2. High-Level Interfaces
 3. Use Opaque Classes
 4. Common Implementations
 5. Multiple Implementations
 6. Minimize Memory Allocation
 7. No Virtual Methods
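
As an illustration of requirements 1, 6, 7 and 8 above, here is a minimal
sketch of what an interface function following these rules might look like.
The function name getEnvVar and its signature are hypothetical and are not
taken from the library; only standard C and C++ headers are used.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical interface function. Only standard headers appear in the
// signature (requirements 1 and 2). Failure is reported by returning
// "true" and filling in the optional ErrMsg string (requirement 6).
// The function has no throw() specification (requirement 8) and
// contains no throw statement of its own (requirement 7).
bool getEnvVar(const std::string &Name, std::string &Value,
               std::string *ErrMsg = nullptr) {
  const char *Raw = std::getenv(Name.c_str());
  if (!Raw) {
    // The caller may pass nullptr when it does not need the message.
    if (ErrMsg)
      *ErrMsg = "environment variable '" + Name + "' is not set";
    return true; // true means an error occurred
  }
  Value = Raw;
  return false; // false means success
}
```

A caller would typically write "if (getEnvVar(Name, Value, &Err))" and treat
the true branch as the error path, which is the inverse of the more common
"true means success" convention and is worth keeping in mind when reading
code written against these rules.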