Commit Graph

5512 Commits

Author SHA1 Message Date
Alexandre Ganea 8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Alexandre Ganea d110c3a9f5 [ADT] Support BitVector as a key in DenseSet/Map
This patch adds DenseMapInfo<> support for BitVector and SmallBitVector.

This is part of https://reviews.llvm.org/D71775, where a BitVector is used as a thread affinity mask.
2020-02-14 10:24:22 -05:00
James Henderson a55dec7d64 [test][DebugInfo] Fix signed/unsigned comparison problem in test
This caused build bot failures:
http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/8568/
2020-02-14 13:40:44 +00:00
James Henderson fe6983a75a [DebugInfo] Error if unsupported address size detected in line table
Prior to this patch, if a DW_LNE_set_address opcode was parsed with an
address size (i.e. with a length after the opcode) of anything other 1,
2, 4, or 8, an llvm_unreachable would be hit, as the data extractor does
not support other values. This patch introduces a new error check that
verifies the address size is one of the supported sizes, in common with
other places within the DWARF parsing.

This patch also fixes calculation of a generated line table's size in
unit tests. One of the tests in this patch highlighted a bug introduced
in 1271cde474, when non-byte operands were used as arguments for
extended or standard opcodes.

Reviewed by: dblaikie

Differential Revision: https://reviews.llvm.org/D73962
2020-02-14 11:08:12 +00:00
Vedant Kumar 8e77b33b3c [Local] Do not move around dbg.declares during replaceDbgDeclare
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.

Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.

However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.

Testing: stage2 RelWithDebInfo sanitized build, check-llvm

rdar://59397340

Differential Revision: https://reviews.llvm.org/D74517
2020-02-13 14:35:02 -08:00
Fangrui Song 0bc77a0f0d [AsmPrinter] De-capitalize some AsmPrinter::Emit* functions
Similar to rL328848.
2020-02-13 13:38:33 -08:00
Jonas Devlieghere 5810ed5186 [llvm][TextAPI/MachO] Extract common code into unittest helper (NFC)
This extract common code between the 4 TBD formats in a header that can
be shared.

Differential revision: https://reviews.llvm.org/D73332
2020-02-13 12:53:24 -08:00
Jonas Devlieghere c6e8bfe7c9 [llvm][TextAPI/MachO] Extend TBD_V4 unittest to verify writing
Same as D73328 but for TBD_V4. One notable tidbit is that the swift abi
version for swift 1 & 2 is emitted as a float which is considered
invalid input.

Differential revision: https://reviews.llvm.org/D73330
2020-02-13 12:53:24 -08:00
Greg Clayton 19602b7194 Add a DWARF transformer class that converts DWARF to GSYM.
Summary:
The DWARF transformer is added as a class so it can be unit tested fully.

The DWARF is converted to GSYM format and handles many special cases for functions:
- omit functions in compile units with 4 byte addresses whose address is UINT32_MAX (dead stripped)
- omit functions in compile units with 8 byte addresses whose address is UINT64_MAX (dead stripped)
- omit any functions whose high PC is <= low PC (dead stripped)
- StringTable builder doesn't copy strings, so we need to make backing copies of strings but only when needed. Many strings come from sections in object files and won't need to have backing copies, but some do.
- When a function doesn't have a mangled name, store the fully qualified name by creating a string by traversing the parent decl context DIEs and then. If we don't do this, we end up having cases where some function might appear in the GSYM as "erase" instead of "std::vector<int>::erase".
- omit any functions whose address isn't in the optional TextRanges member variable of DwarfTransformer. This allows object file to register address ranges that are known valid code ranges and can help omit functions that should have been dead stripped, but just had their low PC values set to zero. In this case we have many functions that all appear at address zero and can omit these functions by making sure they fall into good address ranges on the object file. Many compilers do this when the DWARF has a DW_AT_low_pc with a DW_FORM_addr, and a DW_AT_high_pc with a DW_FORM_data4 as the offset from the low PC. In this case the linker can't write the same address to both the high and low PC since there is only a relocation for the DW_AT_low_pc, so many linkers tend to just zero it out.

Reviewers: aprantl, dblaikie, probinson

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74450
2020-02-13 10:48:37 -08:00
Johannes Doerfert 3f3ec9c40b [OpenMP][FIX] Collect blocks to be outlined after finalization
Finalization can introduce new blocks we need to outline as well so it
makes sense to identify the blocks that need to be outlined after
finalization happened. There was also a minor unit test adjustment to
account for the fact that we have a single outlined exit block now.
2020-02-13 00:42:22 -06:00
Johannes Doerfert 70cac41a2b Reapply "[OpenMP][IRBuilder] Perform finalization (incl. outlining) late"
Reapply 8a56d64d76 with minor fixes.

The problem was that cancellation can cause new edges to the parallel
region exit block which is not outlined. The CodeExtractor will encode
the information which "exit" was taken as a return value. The fix is to
ensure we do not return any value from the outlined function, to prevent
control to value conversion we ensure a single exit block for the
outlined region.

This reverts commit 3aac953afa.
2020-02-12 22:29:07 -06:00
Johannes Doerfert 3aac953afa Revert "[OpenMP][IRBuilder] Perform finalization (incl. outlining) late"
This reverts commit 8a56d64d76.

Will be recommitted once the clang test problem is addressed.
2020-02-12 18:50:43 -06:00
Johannes Doerfert 8a56d64d76 [OpenMP][IRBuilder] Perform finalization (incl. outlining) late
In order to fix PR44560 and to prepare for loop transformations we now
finalize a function late, which will also do the outlining late. The
logic is as before but the actual outlining step happens now after the
function was fully constructed. Once we have loop transformations we
can apply them in the finalize step before the outlining.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D74372
2020-02-12 17:55:01 -06:00
Roman Lebedev 687bbf85de
[llvm-exegesis] CombinationGenerator: don't store function_ref
function_ref is non-owning, so if we get it as a parameter in constructor,
our reference goes out-of-scope as soon as constructor returns.
Instead, let's just take it as a parameter to the actual `generate()` call
2020-02-12 23:33:23 +03:00
Roman Lebedev 6030fe01f4
[llvm-exegesis] Exploring X86::OperandType::OPERAND_COND_CODE
Summary:
Currently, we only have nice exploration for LEA instruction,
while for the rest, we rely on `randomizeUnsetVariables()`
to sometimes generate something interesting.
While that works, it isn't very reliable in coverage :)

Here, i'm making an assumption that while we may want to explore
multi-instruction configs, we are most interested in the
characteristics of the main instruction we were asked about.

Which we can do, by taking the existing `randomizeMCOperand()`,
and turning it on it's head - instead of relying on it to randomly fill
one of the interesting values, let's pregenerate all the possible interesting
values for the variable, and then generate as much `InstructionTemplate`
combinations of these possible values for variables as needed/possible.

Of course, that requires invasive changes to no longer pass just the
naked `Instruction`, but sometimes partially filled `InstructionTemplate`.

As it can be seen from the test, this allows us to explore
`X86::OperandType::OPERAND_COND_CODE` for instructions
that take such an operand.
I'm hoping this will greatly simplify exploration.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: orodley, mgorny, sdardis, tschuett, jrtc27, atanasyan, mstojanovic, andreadb, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74156
2020-02-12 21:33:52 +03:00
James Henderson bf4d8f2952 [DebugInfo] Add checks for v2 directory and file name table terminators
The DWARFv2-4 specification for the line table header states that the
include directories and file name tables both end with a single null
byte. Prior to this change, the parser did not detect if this byte was
missing, because it also stopped reading the tables once it reached the
prologue end, as claimed by the header_length field. This change adds a
check that the terminator has been seen at the end of each table.

Reviewed by: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D74413
2020-02-12 14:49:22 +00:00
James Henderson 1da62b51a5 [DebugInfo] Print version in error message in decimal
Also remove some test duplication and add a test case that shows the
maximum version is rejected (this also shows that the value in the error
message is actually in decimal, and not just missing an 0x prefix).

Reviewed by: dblaikie

Differential Revision: https://reviews.llvm.org/D74403
2020-02-12 14:49:22 +00:00
Ehud Katz 167c428490 [unittests] Fix TargetLibraryInfoTest.ValidProto 2020-02-12 14:13:14 +02:00
Ehud Katz 9d0956ebd4 [APFloat] Fix FP remainder operation
Reimplement IEEEFloat::remainder() function.

Fix PR3359.

Differential Revision: https://reviews.llvm.org/D69776
2020-02-12 10:42:55 +02:00
Justin Lebar 1bd6123b78 Use std::foo_t rather than std::foo in LLVM.
Summary: C++14 migration. No functional change.

Reviewers: bkramer, JDevlieghere, lebedev.ri

Subscribers: MatzeB, hiraditya, jkorous, dexonsmith, arphaman, kadircet, lebedev.ri, usaxena95, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D74384
2020-02-11 15:12:51 -08:00
Sterling Augustine 257e412762 Update test for windows. 2020-02-11 12:35:46 -08:00
Sterling Augustine 417375d785 Allow retrieving source files relative to the compilation directory.
Summary:
Dwarf stores source-file names the three parts:
<compilation_directory><include_directory><filename>

Prior to this change, the code only allowed retrieving either all
three as the absolute path, or just the filename.  But many
compile-command lines--especially those in hermetic build systems
don't specify an absolute path, nor just the filename, but rather the
path relative to the compilation directory. This features allows
retrieving them in that style.

Add tests for path printing styles.

Modify createBasicPrologue to handle include directories.

Subscribers: aprantl, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73383
2020-02-11 11:46:20 -08:00
Cyndy Ishida 8c3d0d6a5f [llvm][TextAPI] add simulators to output
Summary:
* for <= tbd_v3, simulator platforms appear the same as the real
platform and we distinct the difference from the architecture.

fixes: rdar://problem/59161559

Reviewers: ributzka, steven_wu

Reviewed By: ributzka

Subscribers: hiraditya, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74416
2020-02-11 10:37:37 -08:00
Justin Lebar fb45968e62 Use C++14-style return type deduction in LLVM.
Summary:
Simplifies the C++11-style "-> decltype(...)" return-type deduction.

Note that you have to be careful about whether the function return type
is `auto` or `decltype(auto)`.  The difference is that bare `auto`
strips const and reference, just like lambda return type deduction.  In
some cases that's what we want (or more likely, we know that the return
type is a value type), but whenever we're wrapping a templated function
which might return a reference, we need to be sure that the return type
is decltype(auto).

No functional change.

Subscribers: dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74383
2020-02-11 07:38:42 -08:00
Hiroshi Yamauchi bb383ae612 [CallPromotionUtils] Add tryPromoteCall.
Summary: It attempts to devirtualize a call on alloca through vtable loads.

Reviewers: davidxl

Subscribers: mgorny, Prazek, hiraditya, llvm-commits

Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71308
2020-02-10 13:43:16 -08:00
Martin Storsjö bc8e442188 [test] Disable the Passes/PluginsTest cases on windows with BUILD_SHARED_LIBS
The plugin expects to have undefined references to symbols exported
by the loading process, which isn't supported by shared libraries
on windows.

Differential Revision: https://reviews.llvm.org/D74042
2020-02-10 22:50:36 +02:00
Sterling Augustine 0bd48c3d4e [DebugInfo] Support file-level include directories when generating DWARFv5 LineTable prologues.
Differential Revision: https://reviews.llvm.org/D74249
2020-02-10 12:24:46 -08:00
James Henderson eea9040f42 [DebugInfo][test] Fix host endian test issue
The test previously assumed that the host was little endian, which broke
the big endian build bots.
2020-02-10 16:23:30 +00:00
Kadir Cetinkaya 5731b6672d
Revert "[OpenMP] Fix unused variable"
This breaks under asan, see http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/38597/steps/check-clang%20asan/logs/stdio

This reverts commit bb50454295.

Revert "[FIX] Ordering problem accidentally introduced with D72304"

This reverts commit 08c0a06d8f.

Revert "[OpenMP][OMPIRBuilder] Add Directives (master and critical) to OMPBuilder."

This reverts commit e8a436c5ea.
2020-02-10 16:34:59 +01:00
James Henderson fddacd00fc [DebugInfo][test] Fix(?) build bots due to incorrect type usage 2020-02-10 15:11:49 +00:00
Bill Wendling c55cf4afa9 Revert "Remove redundant "std::move"s in return statements"
The build failed with

  error: call to deleted constructor of 'llvm::Error'

errors.

This reverts commit 1c2241a793.
2020-02-10 07:07:40 -08:00
James Henderson b1c7bfe6da [DebugInfo] Reject line tables of version > 5
If a debug line section with version of greater than 5 is encountered,
prior to this change the parser would accept it and treat it as version
5. This might work to some extent, but then it might not at all, as it
really depends on the format of the unspecified future version, which
will be different (otherwise there would be no point in changing the
version number). Any information we could provide has a good chance of
being invalid, so we should just refuse to parse such tables.

Reviewed by: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D74204
2020-02-10 14:43:10 +00:00
James Henderson cd37f0ad64 [NFC] Fix line endings 2020-02-10 14:41:46 +00:00
Bill Wendling 1c2241a793 Remove redundant "std::move"s in return statements 2020-02-10 06:39:44 -08:00
James Henderson 1dc62d0358 [DebugInfo][test] Replace pre-canned binary test
The DebugInfo/dwarfdump-invalid-line-table test used a pre-canned binary
generated by a fuzzer to demonstrate a bug fix. Unfortunately, the
binary is rigid and requires hand-editing if we change behaviour, such
as rejecting certain properties within it (as I plan on doing in another
change).

Rather than hand-edit the binary, I have replaced it with two tests. The
first tests the high-level code path from the debug line parser that
produces the same error as this test previously did, and the second is a
set of unit test cases that comprehensively cover the
FormValue::skipValue method, which in turn covers the area that the
original bug fix touched.

Reviewed by: MaskRay, dblaikie

Differential Revision: https://reviews.llvm.org/D74202
2020-02-10 13:54:40 +00:00
Matt Arsenault 6135f5eda4 GlobalISel: Fix narrowing of G_CTLZ/G_CTTZ
The result type is separate from the source type.
2020-02-09 18:11:43 -05:00
Johannes Doerfert c057d1d3af [FIX] Fix warning in LazyCallGraphTest caused by D70927 2020-02-08 18:58:16 -06:00
fady e8a436c5ea [OpenMP][OMPIRBuilder] Add Directives (master and critical) to OMPBuilder.
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.

Also this patch modifies clang to use the new directives when  `-fopenmp-enable-irbuilder` commandline option is passed.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D72304
2020-02-08 18:55:48 -06:00
Johannes Doerfert 72277ecd62 Introduce a CallGraph updater helper class
The CallGraphUpdater is a helper that simplifies the process of updating
the call graph, both old and new style, while running an CGSCC pass.

The uses are contained in different commits, e.g. D70767.

More functionality is added as we need it.

Reviewed By: modocache, hfinkel

Differential Revision: https://reviews.llvm.org/D70927
2020-02-08 14:16:48 -06:00
George Burgess IV f8c9ceb1ce [SimplifyLibCalls] Add __strlen_chk.
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.

Differential Revision: https://reviews.llvm.org/D74079
2020-02-08 11:51:00 -08:00
Petar Avramovic 7df5fc9e03 [GlobalISel] Add buildMerge with SrcOp initializer list
Allows more flexible use of buildMerge in places where
use operands are available as SrcOp since it does not
require explicit conversion to Register.
Simplify code with new buildMerge.

Differential Revision: https://reviews.llvm.org/D74223
2020-02-07 18:43:45 +01:00
Matt Arsenault 8de2dad9e0 GlobalISel: Fix lowering of G_CTLZ/G_CTTZ
The type passed to lower was invalid, so I'm not sure how this was
even working before. The source and destination type also do not have
to match, so make sure to use the right ones.
2020-02-07 06:54:12 -08:00
Konstantin Schwarz 76986bdc46 [GlobalISel] Legalize more G_FP(EXT|TRUNC) libcalls.
This adds a new helper function for retrieving the
floating point type corresponding to the specified
bit-width.
2020-02-06 11:41:34 -08:00
Matt Arsenault 9087ef0765 GlobalISel: Allow CSE of G_IMPLICIT_DEF
The legalizer produces a lot of these, and they make reading legalized
MIR annoying. For some reason, this does seem to sometimes introduce
copies of implicit def, which is dumb.
2020-02-05 17:47:21 -05:00
Hans Wennborg 6c4a8bc0a9 Make llvm::crc32() work also for input sizes larger than 32 bits.
The problem was noticed by the Chrome OS toolchain folks
(crbug.com/1048445) because llvm-objcopy --add-gnu-debuglink would
insert the wrong checksum when processing a binary larger than 4 GB.
That use case regressed in 1e1e3ba252 when we started using
llvm::crc32() in more places.

Differential revision: https://reviews.llvm.org/D74039
2020-02-05 21:32:11 +01:00
Sanjay Patel 686a038ed8 [Analysis] add query to get splat value from array of ints
I was debug stepping through an x86 shuffle lowering and
noticed we were doing an N^2 search for splat index. I
didn't find the equivalent functionality anywhere else in
LLVM, so here's a helper that takes an array of int and
returns a splatted index while ignoring undefs (any
negative value).

This might also be used inside existing
ShuffleVectorInst/ShuffleVectorSDNode functions and/or
help with D72467.

Differential Revision: https://reviews.llvm.org/D74064
2020-02-05 14:55:02 -05:00
Adrian McCarthy da45bd2321 [VFS] More consistent support for Windows
Removed some #ifdefs specific to Windows handling of VFS paths.  This
eliminates most of the differences between the Windows and non-Windows
code paths.

Making this work required some changes to account for the fact that VFS
file paths can be Posix style or Windows style, so you cannot just assume
that they use the host's native path style.  In one case, this means
implementing our own version of make_absolute, since the filesystem code
in Support doesn't have styles in the sense that the path code does.

Differential Review: https://reviews.llvm.org/D71092
2020-02-05 11:38:20 -08:00
Momchil Velikov 3627c91ead [ARM][TargetParser] Improve handling of dependencies between target features
The patch at https://reviews.llvm.org/D64048 added "negative"
dependency handling in `ARM::appendArchExtFeatures`: feature "noX"
removes all features, which imply "X".

This patch adds the "positive" handling: feature "X" adds all the
feature strings implied by "X".

(This patch also comes from the suggestion here
https://reviews.llvm.org/D72633#inline-658582)

Differential Revision: https://reviews.llvm.org/D72762
2020-02-05 16:07:51 +00:00
Martin Storsjö 2f1ca30f99 Partially revert c1c9819ef9
Revert the part of that change that broke the
test Passes/./PluginsTests/PluginsTests.LoadPlugin.
2020-02-05 13:29:48 +02:00
Martin Storsjö c1c9819ef9 [CMake] Add missing component dependencies, to fix building for mingw with BUILD_SHARED_LIBS
Differential Revision: https://reviews.llvm.org/D73840
2020-02-05 13:10:27 +02:00