Commit Graph

394978 Commits

Author SHA1 Message Date
Fangrui Song a45bcde05f [LangRef] Reorder two paragraphs for comdat
so that IMAGE_COMDAT_SELECT_LARGEST refers to the correct example.
2021-07-25 12:53:14 -07:00
Simon Pilgrim 1cfecf4fc4 [X86][AVX] Add getBROADCAST_LOAD helper function. NFCI.
Begin replacing individual getMemIntrinsicNode calls and setup (for X86ISD::VBROADCAST_LOAD + X86ISD::SUBV_BROADCAST_LOAD opcodes) with this getBROADCAST_LOAD helper.
2021-07-25 20:37:58 +01:00
Jon Chesterfield e30b3b23a4 [libomptarget] Build amdgpu plugin without hsa
Default to building the amdgpu plugin to use dlopen when hsa is
not found instead of disabling it.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106600
2021-07-25 19:33:36 +01:00
Joseph Huber 58725c12bb [OpenMP] Introduce RAII to protect certain RTL calls from DCE
This patch introduces a new RAII struct that will temporarily make an OpenMP
RTL function have external linkage. This is done before the attributor is
invoked to prevent it from incorrectly removing some function definitions that
we will use later. For example, if we determine all calls to one function are
dead, then because it has internal linkage it can safely be removed. Later, when we
try to get an instance of that function to modify the source using
`getOrCreateRuntimeFunction` we will then get an empty declaration for that
function that won't be defined anywhere. This patch prevents this from
occurring.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106707
2021-07-25 14:15:47 -04:00
Roman Lebedev 9ebd0dbf0f [NFC][Codegen][X86] Improve test coverage for insertions into XMM vector
2021-07-25 21:08:03 +03:00
Kyungwoo Lee 6530ea4095 [AArch64] Fix Local Deallocation for Homogeneous Prolog/Epilog
The stack adjustment for local deallocation was incorrectly ported.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D106760
2021-07-25 10:51:11 -07:00
Joachim Protze c46ccb8538 [OpenMP][tests][NFC] Update test status for gcc 11 and 12
gcc 11 introduced support for the depend clause, but the gomp interface of libomp
does not yet handle the information.
Also remove -fopenmp-version=50, which is no longer needed for clang, but not
supported by gcc.
2021-07-25 18:56:36 +02:00
Simon Pilgrim b95f66ad78 [X86][SSE] LowerRotate - perform modulo on the amount splat source directly.
If the rotation amount is a known splat, perform the modulo on the splat source, and then perform the splat. That way the amount-extension performed later by LowerScalarVariableShift can fold the splats away without any multiple-use issues.
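The reorder is valid because reducing a scalar amount and then splatting it gives every lane the same value as splatting first and reducing in each lane. A scalar C model of that equivalence, assuming 32-bit lanes and modulo-by-lane-width done as `& 31`; the function names are hypothetical and this is not the X86 lowering code itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate one 32-bit lane left by n, where n has already been reduced to 0..31. */
static uint32_t rotl32(uint32_t x, uint32_t n) {
    assert(n < 32);
    /* (32 - n) & 31 keeps the right shift in range when n == 0. */
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Splat the raw amount into every lane, then reduce it modulo 32 per lane. */
static void rotate_splat_then_mod(uint32_t *v, size_t lanes, uint32_t amt) {
    for (size_t i = 0; i < lanes; ++i) {
        uint32_t lane_amt = amt;              /* splat */
        v[i] = rotl32(v[i], lane_amt & 31u);  /* modulo in each lane */
    }
}

/* Reduce the scalar splat source once, then splat the reduced amount. */
static void rotate_mod_then_splat(uint32_t *v, size_t lanes, uint32_t amt) {
    uint32_t reduced = amt & 31u;             /* modulo on the splat source */
    for (size_t i = 0; i < lanes; ++i)
        v[i] = rotl32(v[i], reduced);         /* splat of the reduced amount */
}
```

Both functions produce identical lanes for any amount, e.g. an amount of 35 rotates every lane by 3.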

Fixes one of the concerns raised on D104156
2021-07-25 17:30:32 +01:00
Nikita Popov 087a8eea35 [Attributes] Clean up handling of UB implying attributes (NFC)
Rather than adding methods for dropping these attributes in
various places, add a function that returns an AttrBuilder with
these attributes, which can then be used with existing methods
for dropping attributes. This is with an eye on D104641, which
also needs to drop them from returns, not just parameters.

Also be more explicit about the semantics of the method in the
documentation. Refer to UB rather than Undef, which is what this
is actually about.
2021-07-25 18:21:13 +02:00
Nikita Popov 99f869c8f0 [Attributes] Remove nonnull from UB-implying attributes
From LangRef:

> if the parameter or return pointer is null, poison value is
> returned or passed instead. The nonnull attribute should be
> combined with the noundef attribute to ensure a pointer is not
> null or otherwise the behavior is undefined.

Dropping noundef is sufficient to prevent UB. Including nonnull
in this method just muddies the semantics.
2021-07-25 18:07:31 +02:00
Simon Pilgrim 34dc4f24f2 Revert rG939291041bb35b8088e3b61be2b8b3bc950f64a7 "[AMDGPU] Regenerate wave32.ll test checks"
This still breaks buildbots
2021-07-25 15:59:26 +01:00
Nico Weber 75077f46e7 [JITLink][RISCV] Run new test from 0ad562b48 only if the RISCV backend is enabled
2021-07-25 10:47:26 -04:00
Krishna Kariya 7bd361200a [InstCombine] Fix PR47960 - Incorrect transformation of fabs with nnan flag
Bug Fix for PR: https://llvm.org/PR47960

This patch makes sure that the fast math flag used in the 'select'
instruction is the same as the 'fabs' instruction after the transformation.

Differential Revision: https://reviews.llvm.org/D101727
2021-07-25 10:43:33 -04:00
Shilei Tian f1b8fa55d0 [OpenMP][NVPTX] Disable OpenMPOpt when building deviceRTLs
We build `deviceRTLs` with `-O1` by default, which also triggers OpenMPOpt. When
the info cache is created, some attributes are removed. As a result, although we
mark a few functions `noinline`, they are still inlined when the bitcode library
is generated. This can cause an issue in middle end optimization.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106710
2021-07-25 10:38:27 -04:00
Roman Lebedev fa0910e6de [NFC][Codegen][X86] Improve test coverage for repeated insertions of the same scalar into different elements
2021-07-25 17:37:04 +03:00
Simon Pilgrim 939291041b [AMDGPU] Regenerate wave32.ll test checks
To simplify diff in future patch
2021-07-25 15:13:09 +01:00
Simon Pilgrim 54e5ced7e6 [AMDGPU] Regenerate mul24 test checks
To simplify diffs in future patch
2021-07-25 15:13:09 +01:00
Sanjay Patel 1ce05ad619 [x86] improve CMOV codegen by pushing add into operands, part 2
This is a minimum extension of D106607 to allow folding for
2 non-zero constants that can be materialized as immediates.

In the reduced test examples, we save 1 instruction by rolling
the constants into LEA/ADD. In the motivating test from the bullet
benchmark, we absorb both of the constant moves into add ops via
LEA magic, so we reduce by 2 instructions.
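The fold relies on the identity `x + (c ? C1 : C2) == (c ? x + C1 : x + C2)`. A minimal C sketch of the two forms, with 3 and 7 as hypothetical constants and unsigned arithmetic so that wrap-around is well defined; it shows only the algebra, not what the x86 backend emits:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Before the fold: select between two constants, then add x
   (roughly a CMOV of two immediates feeding an ADD). */
static uint32_t add_of_select(bool c, uint32_t x) {
    uint32_t k = c ? 3u : 7u;   /* illustrative non-zero constants */
    return x + k;
}

/* After the fold: the add is pushed into both select operands, so each
   arm is x + constant, which can be formed with LEA/ADD immediates. */
static uint32_t select_of_adds(bool c, uint32_t x) {
    return c ? x + 3u : x + 7u;
}

/* Unsigned arithmetic wraps, so the two forms agree for every input. */
static void check_fold(bool c, uint32_t x) {
    assert(add_of_select(c, x) == select_of_adds(c, x));
}
```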

Differential Revision: https://reviews.llvm.org/D106684
2021-07-25 10:05:41 -04:00
Kazu Hirata 0fc5534ac7 [GlobalISel] Remove FlagsOp (NFC)
The class was introduced without a use on Dec 11, 2018 in commit
cef44a2342.
2021-07-25 07:05:07 -07:00
Kazu Hirata 4e288a8528 [Inline] Fix a warning by removing an explicit copy constructor
This patch fixes the warning:

  llvm/include/llvm/Analysis/InlineCost.h:62:3: error: definition of
  implicit copy assignment operator for 'CostBenefitPair' is
  deprecated because it has a user-declared copy constructor
  [-Werror,-Wdeprecated-copy]

by removing the explicit copy constructor.
2021-07-25 06:56:47 -07:00
Simon Pilgrim 15b883f457 [X86][AVX] Adjust AllowBWIVPERMV3 tolerance to account for VariableCrossLaneShuffleDepth
As noticed on D105390 - we were hardwiring the depth limit for combining to VPERMI2W/VPERMI2B instructions. Not only had we made the limit too low, we hadn't accounted for slow/fast shuffles via the VariableCrossLaneShuffleDepth control.
2021-07-25 14:05:11 +01:00
Simon Pilgrim 9591abd74e [AMDGPU] Regenerate global-load-saddr-to-vaddr test checks
To simplify diff in future patch
2021-07-25 14:05:10 +01:00
Simon Pilgrim 00e37c1cd4 [AMDGPU] Regenerate ctpop16 test checks
To simplify diff in future patch
2021-07-25 14:05:09 +01:00
Simon Pilgrim 249ef1fa82 [AMDGPU] Regenerate half test checks
To simplify diff in future patch
2021-07-25 14:05:08 +01:00
Simon Pilgrim 97d2277b37 [AMDGPU] Regenerate anyext test checks
To simplify diff in future patch
2021-07-25 14:05:08 +01:00
Liqiang Tao 4bdfea2c51 [llvm][Inline] Add interface to return cost-benefit stuff
Return cost-benefit stuff which is computed by cost-benefit analysis.

Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D105349
2021-07-25 20:18:19 +08:00
Amara Emerson acbc0c5f0e [AArch64][GlobalISel] Widen non-pow-2 types for shifts before clamping.
For types like s96, we don't want to clamp to s64; we want to first widen to
s128 and then narrow it. Otherwise we end up with impossible-to-legalize types.
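The width arithmetic behind this can be sketched in C: rounding 96 up to the next power of two gives 128, which divides evenly into 64-bit parts, whereas 96 does not. The helper names are hypothetical and this models only the sizes, not the GlobalISel legalizer rules:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round a scalar bit width up to the next power of two (s96 -> s128). */
static uint32_t widen_to_pow2(uint32_t bits) {
    assert(bits <= (1u << 31));   /* keeps the doubling below from overflowing */
    uint32_t p = 1;
    while (p < bits)
        p <<= 1;
    return p;
}

/* True if a type of `bits` can be narrowed into whole `part`-bit pieces. */
static bool splits_evenly(uint32_t bits, uint32_t part) {
    return part != 0 && bits % part == 0;
}
```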
2021-07-24 15:50:43 -07:00
Eugene Zhulenev de7a4e53a2 [mlir] Async: lower SCF operations into CFG inside coroutines
Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D106747
2021-07-24 14:36:26 -07:00
Craig Topper c63dbd8501 [RISCV] Custom lower (i32 (fptoui/fptosi X)).
I stumbled onto a case where our (sext_inreg (assertzexti32 (fptoui X)), i32)
isel pattern can cause an fcvt.wu and an fcvt.lu to be emitted if
the assertzexti32 has an additional user. If we add a one-use check,
it would just cause an fcvt.lu followed by a sext.w when we only need
an fcvt.wu to satisfy both users.

To mitigate this I've added custom isel and new ISD opcodes for
fcvt.wu. This lets us know that it started life as a conversion
to i32 without needing to match multiple nodes. ComputeNumSignBits
has been taught that these new nodes produce 33 sign bits. To
prevent regressions when we need to zero extend the result of an
(i32 (fptoui X)), I've added a DAG combine to convert it to an
(i64 (fptoui X)) before type legalization. In most cases this would
happen in InstCombine, but a zero_extend can be created for function
returns or arguments.
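The "33 sign bits" figure follows from RV64 sign-extending the 32-bit result of the W-form conversions: bits 63..31 of the register all equal bit 31. A C model of that count, with hypothetical helper names; it mirrors what ComputeNumSignBits reports but is not LLVM's implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Count the leading bits of v that equal its sign bit, including the sign
   bit itself (the quantity ComputeNumSignBits reports). Always >= 1. */
static unsigned num_sign_bits64(int64_t v) {
    uint64_t u = (uint64_t)v;
    uint64_t sign = u >> 63;
    unsigned n = 1;
    for (int bit = 62; bit >= 0; --bit) {
        if (((u >> bit) & 1u) != sign)
            break;
        ++n;
    }
    return n;
}

/* Model of an RV64 W-form conversion result: 32 bits, sign-extended to 64. */
static int64_t sext_i32_to_i64(int32_t r) {
    return (int64_t)r;
}

/* Any sign-extended i32 has bits 63..31 equal, i.e. at least 33 sign bits. */
static void check_sign_bits(int32_t r) {
    assert(num_sign_bits64(sext_i32_to_i64(r)) >= 33);
}
```

The minimum of exactly 33 is reached at INT32_MIN and INT32_MAX.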

To keep everything consistent I've added new nodes for fptosi as well.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D106346
2021-07-24 10:50:43 -07:00
Nikita Popov c7e69e46c8 [Tests] Add additional tests for incorrect willreturn handling (NFC)
Highlight a few of the places that don't handle non-willreturn
calls correctly right now.
2021-07-24 17:27:29 +02:00
Nikita Popov baa51a0cef [Tests] Add missing willreturn attributes (NFC)
To retain the spirit of these tests after an upcoming change
to mayHaveSideEffect(), add willreturn attributes to a number
of functions.
2021-07-24 17:17:48 +02:00
Nikita Popov 0339fcc728 [LICM] Extract debugify test (NFC)
Only one of the tests in the file wants to check debug info, so
move it into a separate file. This allows update_test_checks to
work.
2021-07-24 17:04:42 +02:00
Kazu Hirata 4ccfb1076f [ADT] Remove WrappedPairNodeDataIterator (NFC)
The last use was removed on Jul 16, 2020 in commit
f1d4db4f0c.
2021-07-24 08:02:57 -07:00
Simon Pilgrim f8191ee32b [X86] Add additional div-mod-pair negative test coverage
As suggested on D106745
2021-07-24 15:21:46 +01:00
Benjamin Kramer e27c700b9a [mlir] Restore markUnknownOpDynamicallyLegal to call isDynamicallyLegal by default
Looks like an oversight from b7a4649899

This should probably have a test case ...
2021-07-24 15:54:42 +02:00
Sander de Smalen c3277a8828 [BasicTTI] Set scalarization cost of scalable vector casts to Invalid.
When BasicTTIImpl::getCastInstrCost can't determine the cost of a
vector cast operation when the types need legalization, it falls
back to calculating scalarization costs. Instead of crashing on
`cast<FixedVectorType>(DstVTy)` when the type is a scalable vector,
return an Invalid cost.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D106655
2021-07-24 14:13:21 +01:00
Simon Pilgrim 01f20581dd [X86] Add i128 div-mod-pair test coverage
2021-07-24 14:00:53 +01:00
Paul Walker e697a542ca [SVE][NFC] Cleanup fixed length code gen tests to make them more resilient.
Many of the tests have used NEXT when DAG is more appropriate. In
some cases single DAG lines have been used. Note that these are
manual tests because they're too complex for update_llc_test_checks.py
and so it's worth not relying too much on the ordered output.

I've also made the CHECK lines more uniform when it comes to the
ordering of things like LO/HI.
2021-07-24 13:14:42 +01:00
Simon Pilgrim 478b22d95a [CGP] despeculateCountZeros - Don't create is-zero branch if cttz/ctlz source is known non-zero
If value tracking can confirm that the cttz/ctlz source is known non-zero then we don't need to create a branch (which DAG will struggle to recover from).
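At the source level the idea looks like this: a count-trailing-zeros with a defined result for 0 needs a zero guard, but the guard is redundant when the input is provably non-zero. A C sketch using the GCC/Clang builtin `__builtin_ctz` (undefined for 0); the function names are hypothetical and this does not reproduce CodeGenPrepare's transformation:

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zeros with a defined result for 0. __builtin_ctz(0) is
   undefined in GCC/Clang, so the general case needs the x == 0 guard,
   mirroring the is-zero branch described above. */
static unsigned cttz_guarded(uint32_t x) {
    if (x == 0)
        return 32;
    return (unsigned)__builtin_ctz(x);
}

/* Here the source is non-zero by construction (bit 31 is forced on),
   so the guard would be redundant and is omitted. */
static unsigned cttz_known_nonzero(uint32_t x) {
    uint32_t v = x | 0x80000000u;
    assert(v != 0);
    return (unsigned)__builtin_ctz(v);
}
```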

Differential Revision: https://reviews.llvm.org/D106685
2021-07-24 13:11:49 +01:00
LLVM GN Syncbot fcb3bb581b [gn build] Port 6aa9e746eb
2021-07-24 12:03:50 +00:00
Ayke van Laethem 4d7f5c0a85 [AVR] Only support sp, r0 and r1 in llvm.read_register
Most other registers are allocatable and therefore cannot be used.

This issue was flagged by the machine verifier, because reading other
registers is considered reading from an undefined register.

Differential Revision: https://reviews.llvm.org/D96969
2021-07-24 14:03:27 +02:00
Ayke van Laethem 41f905b211 [AVR] Fix rotate instructions
This patch fixes some issues with the RORB pseudo instruction.

  - A minor issue in which the instructions were said to use the SREG,
    which is not true.
  - An issue with the BLD instruction, which did not have an output operand.
  - A major issue in which invalid instructions were generated. The fix
    also reduces RORB from 4 to 3 instructions, so it's also a small
    optimization.
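A three-instruction rotate-right-by-one can be built from BST (save bit 0 in T), ROR (shift right through carry) and BLD (copy T into bit 7). The message does not list the new expansion, so that exact sequence is an assumption here; the C model below only checks that such a sequence yields an 8-bit rotate regardless of the incoming carry:

```c
#include <assert.h>
#include <stdint.h>

/* Reference semantics: rotate an 8-bit value right by one bit. */
static uint8_t rotr8_ref(uint8_t x) {
    return (uint8_t)((x >> 1) | (x << 7));
}

/* Step-by-step model of an assumed three-instruction expansion
   BST Rd,0 / ROR Rd / BLD Rd,7. carry_in is whatever the C flag held. */
static uint8_t rotr8_bst_ror_bld(uint8_t x, int carry_in) {
    int t = x & 1;                                           /* BST: T <- bit 0 */
    uint8_t r = (uint8_t)((x >> 1) | (carry_in ? 0x80 : 0)); /* ROR through carry */
    r = (uint8_t)((r & 0x7F) | (t << 7));                    /* BLD: bit 7 <- T */
    return r;
}

/* The result must not depend on the incoming carry flag. */
static void check_all_bytes(void) {
    for (int v = 0; v < 256; ++v) {
        assert(rotr8_bst_ror_bld((uint8_t)v, 0) == rotr8_ref((uint8_t)v));
        assert(rotr8_bst_ror_bld((uint8_t)v, 1) == rotr8_ref((uint8_t)v));
    }
}
```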

These issues were flagged by the machine verifier.

Differential Revision: https://reviews.llvm.org/D96957
2021-07-24 14:03:26 +02:00
Ayke van Laethem 6aa9e746eb [AVR] Expand large shifts early in IR
This patch makes sure shift instructions such as this one:

    %result = shl i32 %n, %amount

are expanded into a loop just before the IR-to-SelectionDAG conversion, so
that calls to non-existent library functions such as __ashlsi3 are
avoided. The generated code is currently pretty bad but there's a lot of
room for improvement: the shift itself can be done in just four
instructions.
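What such a loop computes can be modelled in C as a shift by one bit per iteration. This is a sketch of the semantics only; the IR the pass actually emits may be structured differently, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Loop-based 32-bit left shift: one bit per iteration, so no runtime
   helper such as __ashlsi3 is required. Amounts >= 32 are poison for
   `shl` in LLVM IR; this sketch simply returns 0 for them. */
static uint32_t shl32_loop(uint32_t n, uint8_t amount) {
    while (amount > 0) {
        n <<= 1;
        --amount;
    }
    return n;
}

/* Compare against the native shift for every in-range amount. */
static void check_against_native(uint32_t n) {
    for (uint8_t a = 0; a < 32; ++a)
        assert(shl32_loop(n, a) == (n << a));
}
```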

Differential Revision: https://reviews.llvm.org/D96677
2021-07-24 14:03:26 +02:00
Ayke van Laethem 431a941465 [AVR] Improve 8/16 bit atomic operations
There were some serious issues with atomic operations. This patch should
fix the biggest issues.

For details on the issue take a look at this Compiler Explorer sample:
https://godbolt.org/z/n3ndhn

Code:

    void atomicadd(_Atomic char *val) {
        *val += 5;
    }

Output:

    atomicadd:
        movw    r26, r24
        ldi     r24, 5     ; 'operand' register
        in      r0, 63
        cli
        ld      r24, X     ; load value
        add     r24, r26   ; value += X
        st      X, r24     ; store value back
        out     63, r0
        ret                ; return the wrong value (in r24)

There are various problems with this.

 - The value to add (5) is stored in r24. However, the value to add to
   is loaded in the same register: r24.
 - The `add` instruction adds half of the pointer to the loaded value,
   instead of (attempting to) add the operand with value 5.
 - The output value of the cmpxchg instruction (which is not used in
   this code sample) is the new value with 5 added, not the old value.
   The LangRef specifies that it has to be the old value, before the
   operation.
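The expected return value can be illustrated with C11's `atomic_fetch_add`, which likewise returns the value held before the addition. A minimal sketch with hypothetical helper names; it shows the required semantics, not the AVR expansion:

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically add `operand` to *val and return the value held *before*
   the addition, as C11 atomic_fetch_add does. */
static char atomic_add_return_old(_Atomic char *val, char operand) {
    return atomic_fetch_add(val, operand);
}

/* Usage: the returned value is the old one; the stored value is the new one. */
static void demo_fetch_add(void) {
    _Atomic char v = 10;
    char old = atomic_add_return_old(&v, 5);
    assert(old == 10);
    assert(atomic_load(&v) == 15);
}
```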

This patch fixes the first two and leaves the third problem to be fixed
at a later date. I believe atomics were mostly broken before this patch;
with this patch they should become usable as long as you ignore the
output of the atomic operation. In particular it fixes the following
things:

 - It sets the earlyclobber flag for the input ('$operand' operand) so
   that the register allocator puts it in a different register than the
   output value.
 - It fixes a number of issues with the pseudo op expansion pass, for
   example now it adds the $operand field instead of the pointer. This
   fixes most machine instruction verifier issues (other flagged issues
   are unrelated to atomics).

Differential Revision: https://reviews.llvm.org/D97127
2021-07-24 14:03:26 +02:00
Ayke van Laethem 8544ce80f8 [AVR] Set R31R30 as clobbered after ADJCALLSTACKDOWN
In most cases, using R31R30 is fine because the call (which always
precedes ADJCALLSTACKDOWN) will clobber R31R30 anyway. However, in some
rare cases the register allocator might insert an instruction between
the call and the ADJCALLSTACKDOWN instruction and expect the register
pair to be live afterwards. I think this happens as a result of
rematerialization. Therefore, to fix this, the instruction needs to have
Defs set to R31R30.

Setting the Defs field does have the effect of making the instruction
look dead, which it certainly is not. This is fixed by setting
hasSideEffects to true.

Differential Revision: https://reviews.llvm.org/D97745
2021-07-24 14:03:26 +02:00
Ayke van Laethem feda08b70a [AVR] Do not chain stores in call frame setup
Previously, AVRTargetLowering::LowerCall attempted to keep stack stores
in order with chains. Perhaps this worked in the past, but it does not
work now: it appears that the SelectionDAG legalization phase removes
these chains. Therefore, I've removed these chains entirely to match
X86 (which, similar to AVR, also prefers to use push instructions over
stack-relative stores to set up a call frame). With this change, all the
stack stores are in a somewhat reasonable order.

Differential Revision: https://reviews.llvm.org/D97853
2021-07-24 14:03:26 +02:00
Ayke van Laethem 13ca0c87ed [lld][WebAssembly] Align __heap_base
__heap_base was not aligned. In practice, it will often be aligned
simply because it follows the stack, but when the stack is placed at the
beginning (with the --stack-first option), the __heap_base might be
unaligned. It could even be byte-aligned.

At least wasi-libc appears to expect that __heap_base is aligned:
659ff41456/dlmalloc/src/malloc.c (L5224)

While WebAssembly itself does not appear to require any alignment for
memory accesses, it is sometimes required when sharing a pointer
externally. For example, WASI might expect alignment up to 8:
https://github.com/WebAssembly/WASI/blob/main/phases/snapshot/docs.md#-timestamp-u64

This issue got introduced with the addition of the --stack-first flag:
https://reviews.llvm.org/D46141
I suspect the lack of alignment wasn't intentional here.
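Aligning an address such as __heap_base is the usual round-up-to-a-power-of-two computation. The alignment wasm-ld actually applies is not stated in this message, so the 16 used below is only an example value, and `align_up` is a hypothetical helper rather than lld code:

```c
#include <assert.h>
#include <stdint.h>

/* Round a wasm32 address up to the next multiple of `align`, which must be
   a power of two. Wrap-around near UINT32_MAX is not handled. */
static uint32_t align_up(uint32_t addr, uint32_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For example, an unaligned end of data at 1025 would move __heap_base to 1040 with 16-byte alignment.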

Differential Revision: https://reviews.llvm.org/D106499
2021-07-24 14:03:26 +02:00
Butygin b7a4649899 [mlir] ConversionTarget legality callbacks refactoring
* Get rid of Optional<std::function> as std::function already has a null state
* Add private setLegalityCallback function to set legality callback for unknown ops
* Get rid of unknownOpsDynamicallyLegal flag, use unknownLegalityFn state instead. This causes a behavior change when the user first calls markUnknownOpDynamicallyLegal with a callback and then without one, but I am not sure whether the original behavior was really a 'feature' or just an oversight in the original implementation.

Differential Revision: https://reviews.llvm.org/D105496
2021-07-24 14:59:36 +03:00
Melanie Blower 05ae303555 [clang][patch] Remove test artifact before running test for consistent results
Fix non-deterministic test behavior by removing previously-created
test directory, see comments in D95159
2021-07-24 07:55:10 -04:00
Simon Pilgrim c261a06b7a [DAG] Add initial SelectionDAG::isGuaranteedNotToBeUndefOrPoison framework (PR51129)
I've set up the basic framework for the isGuaranteedNotToBeUndefOrPoison call and updated DAGCombiner::visitFREEZE to use it; further opcodes can be handled when we have test coverage.

I'm not aware of any vector test freeze coverage so the DemandedElts (and the Depth) args are not being used yet - but they are in place.

SelectionDAG::isGuaranteedNotToBePoison wrappers have also been added.

Differential Revision: https://reviews.llvm.org/D106668
2021-07-24 11:36:35 +01:00