Commit Graph

191904 Commits

Author SHA1 Message Date
Matt Arsenault 34d9a16e54 AMDGPU: Add option to expand 64-bit integer division in IR
I didn't realize we were already expanding 24/32-bit division here
already. Use the available IntegerDivision utilities. This uses loops,
so produces significantly smaller code than the inline DAG expansion.

This now requires width reductions of 64-bit divisions before
introducing the expanded loops.

This helps work around missing legalization in GlobalISel for
division, which are the only remaining core instructions that didn't
work at all.

I think this is plausibly a better implementation than exists in the
DAG, although turning it on by default misses out on the constant
value optimizations and also needs benchmarking.
2020-02-14 11:16:08 -08:00
Craig Topper 391cc4dd41 [X86] Use ZERO_EXTEND instead of SIGN_EXTEND in the fast isel handling of convert_from_fp16. 2020-02-14 10:57:12 -08:00
Craig Topper fc0c72b2df [X86] Add AVX512 support to the fast isel code for Intrinsic::convert_from_fp16/convert_to_fp16. 2020-02-14 10:57:11 -08:00
Alina Sbirlea 1326a5a4cf [LoopRotate] Get and update MSSA only if available in legacy pass manager.
Summary:
Potential fix for: https://bugs.llvm.org/show_bug.cgi?id=44889 and https://bugs.llvm.org/show_bug.cgi?id=44408

In the legacy pass manager, loop rotate need not compute MemorySSA when not being in the same loop pass manager with other loop passes.
There isn't currently a way to differentiate between the two cases, so this attempts to limit the usage in LoopRotate to only update MemorySSA when the analysis is already available.
The side-effect of this is that it will split the Loop pipeline.

This issue does not apply to the new pass manager, where we have a flag specifying if all loop passes in that loop pass manager preserve MemorySSA.

Reviewers: dmgreen, fedor.sergeev, nikic

Subscribers: Prazek, hiraditya, george.burgess.iv, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74574
2020-02-14 10:47:26 -08:00
Matt Arsenault bfbfa18591 GlobalISel: Lower s64->s16 G_FPTRUNC
This is more or less directly ported from the AMDGPU custom lowering
for FP_TO_FP16. I made a few minor fixups (using G_UNMERGE_VALUES
instead of creating shift/trunc to extract the two halves, and zexting
an inverted compare instead of select_cc).

This also does not include the fast math expansion the DAG which
converts to f32 and then to f16. I think that belongs in a
pre-legalize combine instead.
2020-02-14 10:46:58 -08:00
Volkan Keles 187686a22f [GlobalISel] LegalizationArtifactCombiner: Fix a bug in tryCombineMerges
Like COPY instructions explained in D70616, we don't check the constraints
when combining G_UNMERGE_VALUES. Use the same logic used in D70616 to check
if registers can be replaced, or a COPY instruction needs to be built.

https://reviews.llvm.org/D70564
2020-02-14 10:45:58 -08:00
Brian Cain bf3b86bc2f [Hexagon] v67+ HVX register pairs should support either direction
Assembler now permits pairs like 'v0:1', which are encoded
differently from the odd-first pairs like 'v1:0'.

The compiler will require more work to leverage these new register
pairs.
2020-02-14 12:43:43 -06:00
Simon Pilgrim f0181cc7ba [APInt] Add some basic APInt::byteSwap unit tests
As noted on D74621 we currently have no test coverage
2020-02-14 18:15:13 +00:00
Matt Arsenault b38940dfb9 TTI: Fix vectorization cost for bswap 2020-02-14 10:14:07 -08:00
Matt Arsenault 8c2c0b3637 AMDGPU: Improve i16/v2i16 bswap 2020-02-14 09:53:22 -08:00
Craig Topper 7badb38918 [X86] Fix copy/paste mistake in comment. NFC 2020-02-14 09:47:50 -08:00
Matt Arsenault e0fd2d6d62 AMDGPU: Add baseline tests for 16-bit bswap 2020-02-14 09:34:13 -08:00
Matt Arsenault a257bde420 AMDGPU/GlobalISel: Handle G_BSWAP 2020-02-14 09:09:44 -08:00
Alexandre Ganea 0d2ba6577d Fix compilation breakage introduced by 8404aeb56a.
Also fix BitVector unittest failure when DLLVM_ENABLE_ASSERTIONS are OFF, introduced by d110c3a9f5.
2020-02-14 11:17:18 -05:00
Evgeniy Brevnov cae643d596 Reverting D73027 [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses(PR42151). 2020-02-14 22:57:23 +07:00
Alexandre Ganea 8404aeb56a [Support] On Windows, ensure hardware_concurrency() extends to all CPU sockets and all NUMA groups
The goal of this patch is to maximize CPU utilization on multi-socket or high core count systems, so that parallel computations such as LLD/ThinLTO can use all hardware threads in the system. Before this patch, on Windows, a maximum of 64 hardware threads could be used at most, in some cases dispatched only on one CPU socket.

== Background ==
Windows doesn't have a flat cpu_set_t like Linux. Instead, it projects hardware CPUs (or NUMA nodes) to applications through a concept of "processor groups". A "processor" is the smallest unit of execution on a CPU, that is, an hyper-thread if SMT is active; a core otherwise. There's a limit of 32-bit processors on older 32-bit versions of Windows, which later was raised to 64-processors with 64-bit versions of Windows. This limit comes from the affinity mask, which historically is represented by the sizeof(void*). Consequently, the concept of "processor groups" was introduced for dealing with systems with more than 64 hyper-threads.

By default, the Windows OS assigns only one "processor group" to each starting application, in a round-robin manner. If the application wants to use more processors, it needs to programmatically enable it, by assigning threads to other "processor groups". This also means that affinity cannot cross "processor group" boundaries; one can only specify a "preferred" group on start-up, but the application is free to allocate more groups if it wants to.

This creates a peculiar situation, where newer CPUs like the AMD EPYC 7702P (64-cores, 128-hyperthreads) are projected by the OS as two (2) "processor groups". This means that by default, an application can only use half of the cores. This situation could only get worse in the years to come, as dies with more cores will appear on the market.

== The problem ==
The heavyweight_hardware_concurrency() API was introduced so that only *one hardware thread per core* was used. Once that API returns, that original intention is lost, only the number of threads is retained. Consider a situation, on Windows, where the system has 2 CPU sockets, 18 cores each, each core having 2 hyper-threads, for a total of 72 hyper-threads. Both heavyweight_hardware_concurrency() and hardware_concurrency() currently return 36, because on Windows they are simply wrappers over std:🧵:hardware_concurrency() -- which can only return processors from the current "processor group".

== The changes in this patch ==
To solve this situation, we capture (and retain) the initial intention until the point of usage, through a new ThreadPoolStrategy class. The number of threads to use is deferred as late as possible, until the moment where the std::threads are created (ThreadPool in the case of ThinLTO).

When using hardware_concurrency(), setting ThreadCount to 0 now means to use all the possible hardware CPU (SMT) threads. Providing a ThreadCount above to the maximum number of threads will have no effect, the maximum will be used instead.
The heavyweight_hardware_concurrency() is similar to hardware_concurrency(), except that only one thread per hardware *core* will be used.

When LLVM_ENABLE_THREADS is OFF, the threading APIs will always return 1, to ensure any caller loops will be exercised at least once.

Differential Revision: https://reviews.llvm.org/D71775
2020-02-14 10:24:22 -05:00
Alexandre Ganea d110c3a9f5 [ADT] Support BitVector as a key in DenseSet/Map
This patch adds DenseMapInfo<> support for BitVector and SmallBitVector.

This is part of https://reviews.llvm.org/D71775, where a BitVector is used as a thread affinity mask.
2020-02-14 10:24:22 -05:00
Alex Richardson c29310707e Fix line endings produced by update_cc_test_checks.py
Use the same appraoch as update_llc_test_checks.py to always write \n
line endings. This should fix the Windows buildbots.
2020-02-14 15:17:27 +00:00
Alex Richardson 61dd0603bd Move update_cc_test_checks.py tests to clang
Having tests that depend on clang inside llvm/ are not a good idea since
it can break incremental `ninja check-llvm`.

Fixes https://llvm.org/PR44798

Reviewed By: lebedev.ri, MaskRay, rsmith
Differential Revision: https://reviews.llvm.org/D74051
2020-02-14 14:39:55 +00:00
Teresa Johnson 2102ef8aad Reenable "Always import constants" after compile time fixes
Summary:
Reenables importing of constants by default, which was disabled in
D73724 due to excessive thin link times. These inefficiencies were
fixed in D73851.

I re-measured thin link times for a number of binaries that had compile
time explosions with importing of constants previously and confirmed
they no longer have any notable increases with it enabled.

Reviewers: wmi, evgeny777

Subscribers: hiraditya, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74512
2020-02-14 06:37:14 -08:00
Pavel Iliin b6a9fe2099 [AArch64] Add BIT/BIF support.
This patch added generation of SIMD bitwise insert BIT/BIF instructions.
In the absence of GCC-like functionality for optimal constraints satisfaction
during register allocation the bitwise insert and select patterns are matched
by pseudo bitwise select BSP instruction with not tied def.
It is expanded later after register allocation with def tied
to BSL/BIT/BIF depending on operands registers.
This allows to get rid of redundant moves.

Reviewers: t.p.northover, samparker, dmgreen

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D74147
2020-02-14 14:19:39 +00:00
James Henderson a55dec7d64 [test][DebugInfo] Fix signed/unsigned comparison problem in test
This caused build bot failures:
http://lab.llvm.org:8011/builders/ppc64le-lld-multistage-test/builds/8568/
2020-02-14 13:40:44 +00:00
Simon Pilgrim 2492075add [X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets
Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.

REAPPLIED: Original commit rG11c16e71598d was reverted at rGde1d90299b16 as it wasn't accounting for later lowering. This version emits ROTLI or the OR(VSHLI/VSRLI) directly to avoid the issue.
2020-02-14 11:55:18 +00:00
Luís Marques de1c2877a9 llvm/cmake/config.guess: add support for riscv32 and riscv64
Summary: LLVM configuration fails with 'unable to guess system type' on riscv64.
Add support for detecting riscv32 and riscv64 systems.

Patch by Gokturk Yuksek (gokturk)
Reviewers: erichkeane, rengolin, mgorny, aaron.ballman, beanz, luismarques
Reviewed By: luismarques
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68899
2020-02-14 11:53:51 +00:00
Roger Ferrer Ibanez 2bef1c0e56 [OpenMP] Lower taskyield using OpenMP IR Builder
This is similar to D69828.

Special codegen for enclosing untied tasks is still done in clang.

Differential Revision: https://reviews.llvm.org/D70799
2020-02-14 11:35:17 +00:00
Andrew Ng 430fc538e6 [llvm-ar] Simplify Windows comparePaths NFCI
Replace use of widenPath in comparePaths with UTF8ToUTF16. widenPath
does a lot more than just conversion from UTF-8 to UTF-16. This is not
necessary for CompareStringOrdinal and could possibly even cause
problems.

Differential Revision: https://reviews.llvm.org/D74477
2020-02-14 11:20:17 +00:00
James Henderson fe6983a75a [DebugInfo] Error if unsupported address size detected in line table
Prior to this patch, if a DW_LNE_set_address opcode was parsed with an
address size (i.e. with a length after the opcode) of anything other 1,
2, 4, or 8, an llvm_unreachable would be hit, as the data extractor does
not support other values. This patch introduces a new error check that
verifies the address size is one of the supported sizes, in common with
other places within the DWARF parsing.

This patch also fixes calculation of a generated line table's size in
unit tests. One of the tests in this patch highlighted a bug introduced
in 1271cde474, when non-byte operands were used as arguments for
extended or standard opcodes.

Reviewed by: dblaikie

Differential Revision: https://reviews.llvm.org/D73962
2020-02-14 11:08:12 +00:00
Roger Ferrer Ibanez a82f35e176 [OpenMP] Lower taskwait using OpenMP IR Builder
The code generation is exactly the same as it was.

But not that the special handling of untied tasks is still handled by
emitUntiedSwitch in clang.

Differential Revision: https://reviews.llvm.org/D69828
2020-02-14 09:53:02 +00:00
James Henderson 4e1c49cf4d [doc] Clarify responsibility for fixing experimental target problems
Experimental targets are meant to be maintained by the community behind
the target. They are not monitored by the primary build bots. This
change clarifies that it is this communities responsibility for things
like test fixes related to the target caused by changes unrelated to
that target.

See http://lists.llvm.org/pipermail/llvm-dev/2020-February/139115.html
for a full discussion.

Reviewed by: rupprecht, lattner, MaskRay

Differential Revision: https://reviews.llvm.org/D74538
2020-02-14 09:50:18 +00:00
Kazushi (Jam) Marukawa 60431bd728 [VE] Support for PIC (global data and calls)
Summary: Support for PIC with tests for global variables and function calls.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D74536
2020-02-14 09:50:02 +01:00
Kadir Cetinkaya 1674f772b4
[VecotrCombine] Fix unused variable for assertion disabled builds 2020-02-14 09:30:29 +01:00
Sam Parker fd01b2f4a6 [NFC][ARM] Convert some pointers to references. 2020-02-14 08:29:01 +00:00
Fangrui Song bcd24b2d43 [AsmPrinter][MCStreamer] De-capitalize EmitInstruction and EmitCFI* 2020-02-13 22:08:55 -08:00
Evgeniy Brevnov 5573abceab [DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses(PR42151).
Summary:
This is second attempt to fix the problem with incorrect dependencies reported in presence of invariant load. Initial fix (https://reviews.llvm.org/D64405) was reverted due to a regression reported in https://reviews.llvm.org/D70516.

The original fix changed caching behavior for invariant loads. Namely such loads are not put into the second level cache (NonLocalDepInfo). The problem with that fix is the first level cache  (CachedNonLocalPointerInfo) still works as if invariant loads were in the second level cache. The solution is in addition to not putting dependence results into the second level cache avoid putting info about invariant loads into the first level cache as well.

Reviewers: jdoerfert, reames, hfinkel, efriedma

Reviewed By: jdoerfert

Subscribers: DaniilSuchkov, hiraditya, bmahjour, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73027
2020-02-14 12:18:31 +07:00
Jonas Devlieghere 5feb80e748 [dsymutil] Fix double relocation of DW_AT_call_return_pc
When the DW_AT_call_return_pc matches a relocation, the call return pc
would get relocated twice, once because of the relocation in the object
file and once because of dsymutil. The same problem exists for the low
and high PC and the fix is the same. We remember the low, high and
return pc of the original DIE and relocate that, rather than the
potentially already relocated value.

Reviewed offline by Fred Riss.
2020-02-13 17:42:48 -08:00
Liu, Chen3 ec89335c47 [X86] Fix the bug that _mm_mask_cvtsepi64_epi32 generates result without
zero the upper 64bit.

Differential Revision : https://reviews.llvm.org/D74552
2020-02-14 09:26:06 +08:00
Eric Christopher e635e48020 Reinstate llvm-go to test the go bindings.
This partially reverts commit 102814b4d3.
2020-02-13 17:24:55 -08:00
Fangrui Song 1d49eb00d9 [AsmPrinter] De-capitalize all AsmPrinter::Emit* but EmitInstruction
Similar to rL328848.
2020-02-13 17:06:24 -08:00
Thomas Lively 918e90559b [WebAssembly] Make stack pointer args inhibit tail calls
Summary:
Also make return calls terminator instructions so epilogues are
inserted before them rather than after them. Together, these changes
make WebAssembly's tail call optimization more stack-safe.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D73943
2020-02-13 16:43:53 -08:00
Pavel Iliin b23ec43973 [AArch64][NFC] Update test checks.
This NFC commit updates several llc tests checks by automatically generated ones.
2020-02-14 00:13:15 +00:00
Vedant Kumar 3091049446 Add dbgs() output to help track down missing DW_AT_location bugs, NFC 2020-02-13 14:38:44 -08:00
Vedant Kumar 8e77b33b3c [Local] Do not move around dbg.declares during replaceDbgDeclare
replaceDbgDeclare is used to update the descriptions of stack variables
when they are moved (e.g. by ASan or SafeStack). A side effect of
replaceDbgDeclare is that it moves dbg.declares around in the
instruction stream (typically by hoisting them into the entry block).
This behavior was introduced in llvm/r227544 to fix an assertion failure
(llvm.org/PR22386), but no longer appears to be necessary.

Hoisting a dbg.declare generally does not create problems. Usually,
dbg.declare either describes an argument or an alloca in the entry
block, and backends have special handling to emit locations for these.
In optimized builds, LowerDbgDeclare places dbg.values in the right
spots regardless of where the dbg.declare is. And no one uses
replaceDbgDeclare to handle things like VLAs.

However, there doesn't seem to be a positive case for moving
dbg.declares around anymore, and this reordering can get in the way of
understanding other bugs. I propose getting rid of it.

Testing: stage2 RelWithDebInfo sanitized build, check-llvm

rdar://59397340

Differential Revision: https://reviews.llvm.org/D74517
2020-02-13 14:35:02 -08:00
Sanjay Patel 19b62b79db [VectorCombine] try to form vector binop to eliminate an extract element
binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C

This is a transform that has been considered for canonicalization (instcombine)
in the past because it reduces instruction count. But as shown in the x86 tests,
it's impossible to know if it's profitable without a cost model. There are many
potential target constraints to consider.

We have implemented similar transforms in the backend (DAGCombiner and
target-specific), but I don't think we have this exact fold there either (and if
we did it in SDAG, it wouldn't work across blocks).

Note: this patch was intended to handle the more general case where the extract
indexes do not match, but it got too big, so I scaled it back to this pattern
for now.

Differential Revision: https://reviews.llvm.org/D74495
2020-02-13 17:23:27 -05:00
Francesco Petrogalli fe36127982 [build] Fix shared lib builds. 2020-02-13 22:09:52 +00:00
Fangrui Song 0bc77a0f0d [AsmPrinter] De-capitalize some AsmPrinter::Emit* functions
Similar to rL328848.
2020-02-13 13:38:33 -08:00
Craig Topper c2e8a421ac [X86] Don't widen 128/256-bit strict compares with vXi1 result to 512-bits on KNL.
If we widen the compare we might trigger a spurious exception from
the garbage data.

We have two choices here. Explicitly force the upper bits to zero.
Or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM
result to mask register.

I've chosen to go with the second option. I'm not sure which is
really best. In some cases we could get rid of the zeroing since
the producing instruction probably already zeroed it. But we lose
the ability to fold a load. So which is best is dependent on
surrounding code.

Differential Revision: https://reviews.llvm.org/D74522
2020-02-13 13:26:40 -08:00
Fangrui Song 0dce409cee [AsmPrinter] De-capitalize Emit{Function,BasicBlock]* and Emit{Start,End}OfAsmFile 2020-02-13 13:22:49 -08:00
Thomas Lively e252293d06 [WebAssembly] Add cbrt function signatures
Summary:
Fixes a crash in the backend where optimizations produce calls to the
cbrt runtime functions. Fixes PR 44227.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74259
2020-02-13 13:18:42 -08:00
Jonas Devlieghere 5810ed5186 [llvm][TextAPI/MachO] Extract common code into unittest helper (NFC)
This extract common code between the 4 TBD formats in a header that can
be shared.

Differential revision: https://reviews.llvm.org/D73332
2020-02-13 12:53:24 -08:00
Jonas Devlieghere c6e8bfe7c9 [llvm][TextAPI/MachO] Extend TBD_V4 unittest to verify writing
Same as D73328 but for TBD_V4. One notable tidbit is that the swift abi
version for swift 1 & 2 is emitted as a float which is considered
invalid input.

Differential revision: https://reviews.llvm.org/D73330
2020-02-13 12:53:24 -08:00