Commit Graph

68634 Commits

Author SHA1 Message Date
Matt Arsenault 312a9d1b83 GlobalISel: Fix narrowScalar for G_{CTLZ|CTTZ}_ZERO_UNDEF
Narrow these for 64-bit VALU for AMDGPU.
2020-02-09 19:02:38 -05:00
Matt Arsenault c437f6c687 AMDGPU/GlobalISel: Split 64-bit G_CTPOP in RegBankSelect 2020-02-09 18:39:33 -05:00
Matt Arsenault 6135f5eda4 GlobalISel: Fix narrowing of G_CTLZ/G_CTTZ
The result type is separate from the source type.
2020-02-09 18:11:43 -05:00
Matt Arsenault 2126c70e3a AMDGPU/GlobalISel: Don't mis-select vector index on a constant
Vector indexing with a constant index should be folded out in the
legalizer, but this was accidentally falling through. This would
produce the indexing operation with $noreg. Handle this case as a
dynamic index just in case a bug like this happens again in the
future.
2020-02-09 18:02:37 -05:00
Matt Arsenault f4a38c114e AMDGPU/GlobalISel: Look through casts when legalizing vector indexing
We were failing to find constants that were casted. I feel like the
artifact combiner should have folded the constant in the trunc before
the custom lowering, but that doesn't happen.
2020-02-09 18:02:10 -05:00
Matt Arsenault 6e1770821f AMDGPU: Fix SI_IF lowering when the save exec reg has terminator uses
Reverts part of 6524a7a2b9. Since that
commit, the expansion was ignoring the actual save exec register
produced by the instruction, and looking at other instructions. I do
not understand why it was looking at other instructions, but relying
on this scan was wrong.

Fixes verifier errors after SI_IF is tail duplicated, which should be
correct to do. The results were fed into a phi, which was lowered to
the S_MOV_B64_term instructions.
2020-02-09 17:59:19 -05:00
Simon Pilgrim 29e646fe65 [X86] combineConcatVectorOps - combine VROTLI/VROTRI ops
Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-09 21:50:10 +00:00
Craig Topper 656d66f5fc [X86] Use custom isel for (X86sbb_flag 0, 0) so we can use 32-bit SBB for i8/i16.
We were using MOV32r0 and an extract_subreg as an input. By using
custom isel we can move the extract_subreg to after the SBB instead
of on the input.
2020-02-09 13:19:35 -08:00
Simon Pilgrim e82e17d4d4 [X86] Add lowerShuffleAsBitRotate (PR44379)
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.

This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.

There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.

Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
2020-02-09 21:15:03 +00:00
Simon Pilgrim 0ae119f835 [X86][XOP] Add XOP target to vXi16/vXi8 shuffle tests
Helps with bit rotation test coverage for PR44379
2020-02-09 18:35:51 +00:00
Simon Pilgrim 2278073125 [X86][SSE] Add more tests showing failure to lower shuffles as bit rotations 2020-02-09 18:35:51 +00:00
Sanjay Patel a17f03bd93 [VectorCombine] new IR transform pass for partial vector ops
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.

There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:

    InstCombine - can't happen without TTI, but we don't want target-specific
                  folds there.
    SDAG - too late to assist other vectorization passes; TLI is not equipped
           for these kind of cost queries; limited to a single basic block.
    CGP - too late to assist other vectorization passes; would need to re-implement
          basic cleanups like CSE/instcombine.
    SLP - doesn't fit with existing transforms; limited to a single basic block.

This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.

We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.

Differential Revision: https://reviews.llvm.org/D73480
2020-02-09 10:04:41 -05:00
Simon Pilgrim 644d56b432 [X86] Recognise ROTLI/ROTRI rotations as faux shuffles
Allows us to combine rotations with shuffles.

One of many things necessary to fix PR44379 (lowering shuffles to rotations)
2020-02-09 12:25:49 +00:00
Ehud Katz 3b70ee27a5 [LoopExtractor] Convert LoopExtractor from LoopPass to ModulePass
The LoopExtractor created new functions (by definition), which violates
the restrictions of a LoopPass.
The correct implementation of this pass should be as a ModulePass.
Includes reverting rL82990 implications on the LoopExtractor.

Fixes PR3082 and PR8929.

Differential Revision: https://reviews.llvm.org/D69069
2020-02-09 12:25:21 +02:00
Ayman Musa 10c7b7708b [AggressiveInstCombine] Add test with baseline CHECKs for aggressive inst combine for SELECT. 2020-02-09 12:07:25 +02:00
serge_sans_paille e67cbac812 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-09 10:42:45 +01:00
serge-sans-paille 4546211600 Revert "Support -fstack-clash-protection for x86"
This reverts commit 0fd51a4554.

Failures:

http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/4354
2020-02-09 10:06:31 +01:00
serge_sans_paille 0fd51a4554 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.

Differential Revision: https://reviews.llvm.org/D68720
2020-02-09 09:35:42 +01:00
Johannes Doerfert b0c77c36d2 [Attributor] Add an Attributor CGSCC pass and run it
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.

The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.

Reviewed By: hfinkel

Differential Revision: https://reviews.llvm.org/D70767
2020-02-08 21:27:34 -06:00
Johannes Doerfert e565db49c6 [OpenMP][Opt] Delete terminating and read-only parallel regions
Parallel regions known to be read-only, e.g., after we removed all dead
write accesses, and terminating (`willreturn`) can be removed.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69954
2020-02-08 18:52:04 -06:00
Simon Pilgrim 835c81923e Fix test name typo 2020-02-08 21:28:46 +00:00
Johannes Doerfert 98e8eb8be0 [FIX] Update PM tests after D69930 landed 2020-02-08 15:22:40 -06:00
Simon Pilgrim f9c28dc9a5 [X86][SSE] Add test cases from PR44379 2020-02-08 21:03:03 +00:00
Simon Pilgrim 4b4fbae24a [X86] Test showing inability to combine ROTLI/ROTRI rotations into shuffles
One of many things necessary to fix PR44379 (lowering shuffles to rotations)
2020-02-08 21:03:02 +00:00
Johannes Doerfert 9548b74a83 [OpenMP] Introduce the OpenMPOpt transformation pass
The OpenMPOpt pass is a CGSCC pass in which OpenMP specific
optimizations can reside.

The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime
calls and their uses. This allows targeted transformations and eases
their implementation.

This initial patch deduplicates `__kmpc_global_thread_num` and
`omp_get_thread_num` calls. We can also identify arguments that are
equivalent to such a call result and use it instead. Later we can
determine "gtid" arguments based on the use in kernel functions etc.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D69930
2020-02-08 14:47:03 -06:00
George Burgess IV f8c9ceb1ce [SimplifyLibCalls] Add __strlen_chk.
Bionic has had `__strlen_chk` for a while. Optimizing that into a
constant is quite profitable, when possible.

Differential Revision: https://reviews.llvm.org/D74079
2020-02-08 11:51:00 -08:00
Nikita Popov a148b9e990 [InstCombine] Fix infinite min/max canonicalization loop (PR44541)
While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541,
it does so in a more roundabout manner and there might be other
loopholes to trigger the same issue. This is a more direct fix,
that prevents the transform if the min/max is based on a
non-canonical sub X, 0 instruction.

Differential Revision: https://reviews.llvm.org/D73849
2020-02-08 20:42:17 +01:00
Simon Pilgrim 4aa7b9cc96 [X86] X86InstComments - add FMA4 comments
These typically match the FMA3 equivalents, although the multiply operands sometimes get flipped due to the FMA3 permute variants.
2020-02-08 17:02:00 +00:00
Simon Pilgrim 10417ad2e4 [X86] Standardize BROADCAST enum names (PR31079)
Tweak EVEX implementation names so it matches the other variants by adding the 'r' prefix. Oddly some of the subvec broadcast ops already matched.
2020-02-08 16:55:00 +00:00
Nikita Popov d4627b90a0 [InstCombine] Avoid modifying instructions in-place
As discussed on D73919, this replaces a few cases where we were
modifying multiple operands of instructions in-place with the
creation of a new instruction, which we generally prefer nowadays.

This tends to be more readable and less prone to worklist management
bugs.

Test changes are only superficial (instruction naming and order).
2020-02-08 17:05:56 +01:00
Nikita Popov 23db9724d0 [InstCombine] Fix infinite loop in min/max load/store bitcast combine (PR44835)
Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform
if it wouldn't actually do anything (apart from removing and reinserting
the same instructions).

Note that the test case doesn't loop on current master anymore, only
on the LLVM 10 release branch. The issue is already mitigated on master
due to worklist order fixes, but we should fix the root cause there as well.

As a side note, we should probably assert in combineLoadToNewType()
that it does not combine to the same type. Not doing this here, because
this assertion would also be triggered in another place right now.

Differential Revision: https://reviews.llvm.org/D74278
2020-02-08 16:55:22 +01:00
Simon Pilgrim c8bc89a933 Regenerate FMA tests 2020-02-08 15:23:40 +00:00
Simon Pilgrim 2398752f37 Add missing encoding comments from fma scalar folded intrinsics tests 2020-02-08 15:23:39 +00:00
Simon Pilgrim 0ed79e9b8f [X86] Standardize VPSLLDQ/VPSRLDQ enum names (PR31079)
Tweak EVEX implementation names so it matches the other variants
2020-02-08 14:54:44 +00:00
serge-sans-paille 658495e6ec Revert "Support -fstack-clash-protection for x86"
This reverts commit e229017732.

Failures:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/2604
http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/4308
2020-02-08 14:26:22 +01:00
Victor Campos af2a384581 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 60e0120c91.
2020-02-08 13:18:45 +00:00
Igor Kudrin 1ea99a2ebc [DebugInfo] Allow reading an address table with a mismatched address.
This case does not look as an unrecoverable error.

Differential Revision: https://reviews.llvm.org/D74194
2020-02-08 20:00:03 +07:00
serge_sans_paille e229017732 Support -fstack-clash-protection for x86
Implement protection against the stack clash attack [0] through inline stack
probing.

Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].

This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.

Only implemented for x86.

[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html

This a recommit of 39f50da2a3 with better option
handling and more portable testing

Differential Revision: https://reviews.llvm.org/D68720
2020-02-08 13:31:52 +01:00
Simon Pilgrim ed92ac73af Add missing encoding comments from fma4 folded intrinsics tests 2020-02-08 11:24:22 +00:00
Simon Pilgrim 7f5b3fa73c [X86][SSE] Add X86ISD::FRCP handling to isNegatibleForFree
Peek through X86ISD::FRCP nodes to see if there is a negatible input.
2020-02-08 10:56:27 +00:00
Simon Pilgrim 63e338be2c [X86][SSE] Show isNegatibleForFree inability to peek through X86ISD::FRCP
We can safely negate the input of RCP but we can't peek through it.
2020-02-08 10:40:49 +00:00
Craig Topper 2af1640f9a [LegalizeDAG][X86][AMDGPU] Use ANY_EXTEND instead of ZERO_EXTEND when promoting ISD::CTTZ/CTTZ_ZERO_UNDEF.
Summary:
For CTTZ we place a set bit just past where the non-promoted type
stopped so the extended bits won't be used for the count. For
CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in
the original type and we end up counting into the extended bits.
So we can just use ANY_EXTEND for both cases.

This matches what is done in type legalization for these operations.
We make no effort to force the upper bits to zero.

Differential Revision: https://reviews.llvm.org/D74111
2020-02-07 22:25:56 -08:00
Akira Hatanaka 4dcc029edb [ObjC][ARC] Keep track of phis that have been discovered to avoid an
infinite loop

This fixes a bug introduced in 6770fbb314.

rdar://problem/59137105
2020-02-07 20:33:11 -08:00
Sam Clegg caeb6cfbc2 [WebAssembly] Fix signature of __powitf2 libcall
Add tests for @llvm.powi.f64/f128.

See: https://llvm.org/docs/LangRef.html#llvm-powi-intrinsic

Differential Revision: https://reviews.llvm.org/D74274
2020-02-07 20:30:47 -08:00
Heejin Ahn 5b5cbfe135 [WebAssembly] Add debug info to insts in Emscripten SjLj
Summary:
This makes sure all newly create instructions in Emscripten SjLj has
appropriate debug info attached. Fixes
https://github.com/emscripten-core/emscripten/issues/9797.

Reviewers: kripken

Subscribers: dschuff, aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D74269
2020-02-07 19:08:39 -08:00
David Blaikie 84eeee6580 Linker/module-max-warn.ll: Fix test to be compatible with Windows file separators 2020-02-07 17:14:05 -08:00
Akira Hatanaka 6770fbb314 [ObjC][ARC] Delete ARC runtime calls that take inert phi values
This improves on the following patch, which removed ARC runtime calls
taking inert global variables:

https://reviews.llvm.org/D62433

rdar://problem/59137105
2020-02-07 16:31:36 -08:00
David Blaikie ba9cae58bb IR Linking: Support merging Warning+Max module metadata flags
Summary:
Debug Info Version was changed to use "Max" instead of "Warning" per the
original design intent - but this maxes old/new IR unlinkable, since
mismatched merge styles are a linking failure.

It seems possible/maybe reasonable to actually support the combination
of these two flags: Warn, but then use the maximum value rather than the
first value/earlier module's value.

Reviewers: tejohnson

Differential Revision: https://reviews.llvm.org/D74257
2020-02-07 16:29:58 -08:00
Amara Emerson 35c63d66aa [GlobalISel][CallLowering] Look through bitcasts from constant function pointers.
Calls to ObjC's objc_msgSend function are done by bitcasting the function global
to the required function type signature. This patch looks through this bitcast
so that we can do a direct call with bl on arm64 instead of using an indirect blr.

Differential Revision: https://reviews.llvm.org/D74241
2020-02-07 15:32:54 -08:00
Craig Topper bb717d3f46 [X86] Correct the implementation of the avx512 masked fmsubadd autoupgrade code to not leave the negate unconnected.
This was causing us to generate fmaddsub instead of fmsubadd if
rounding control is not 4.
2020-02-07 15:27:05 -08:00