Commit Graph

59779 Commits

Author SHA1 Message Date
Jessica Paquette 19dc9c9780 [AArch64][GlobalISel] Move imm adjustment for G_ICMP to post-legalizer lowering
Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.

This

- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.

- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`; this may allow us to hit some more
TST cases).

- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.

- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.

Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple of cases where
having one of these would be handy, so we might as well add it here. There are
a couple of functions in the selector that can probably be factored out into it.

Differential Revision: https://reviews.llvm.org/D89823
2020-10-22 15:27:36 -07:00
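A minimal sketch of the kind of rewrite the commit above moves into the lowering combiner (illustrative only; the enum, struct, and helper below are not the backend's real names): an unsigned "greater than N" compare can be canonicalized to "greater than or equal to N + 1" when the adjusted immediate is still encodable, so the selector only ever sees one form.

```cpp
#include <cstdint>
#include <optional>

enum class Pred { UGT, UGE }; // hypothetical stand-ins for G_ICMP predicates

struct AdjustedCmp {
  Pred P;
  uint64_t Imm;
};

// Rewrite "x u> Imm" as "x u>= Imm + 1" when the new immediate is still encodable.
std::optional<AdjustedCmp> adjustICmpImm(Pred P, uint64_t Imm,
                                         uint64_t MaxEncodableImm) {
  if (P == Pred::UGT && Imm < MaxEncodableImm)
    return AdjustedCmp{Pred::UGE, Imm + 1};
  return std::nullopt; // leave the compare unchanged
}
```

Running this kind of canonicalization in AArch64PostLegalizerLowering means the selector can assume it has already happened, which is what removes the per-opcode checks mentioned above.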
Jessica Paquette 147b9497e7 [AArch64][GlobalISel] Split post-legalizer combiner to allow for lowering at -O0
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector (e.g. matching G_ZIP and
other shuffle vector pseudos).

It still makes sense to select these instructions at -O0.

Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.

This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.

Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.

Also add an 'r' which somehow got dropped from a bunch of function names.

https://reviews.llvm.org/D89820
2020-10-22 14:43:25 -07:00
Tim Corringham 3c1273d737 [AMDGPU] Add amdgpu specific loop threshold metadata
Add new loop metadata, amdgpu.loop.unroll.threshold, to allow the initial AMDGPU-specific
unroll threshold value to be specified on a loop-by-loop basis.

The intention is to allow more nuanced hints, e.g. specifying a
low threshold value to indicate that a loop may be unrolled if it is cheap enough,
rather than using the all-or-nothing llvm.loop.unroll.disable metadata.

Differential Revision: https://reviews.llvm.org/D84779
2020-10-22 17:21:32 +01:00
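A sketch of how such a per-loop hint interacts with the target default (the function and parameter names here are illustrative, not LLVM's API): the metadata value, when present on the loop, simply overrides the AMDGPU-wide starting threshold.

```cpp
#include <optional>

// Hypothetical helper: LoopHint would carry the value of a loop's
// amdgpu.loop.unroll.threshold metadata, if any; TargetDefault is the
// backend's usual starting threshold.
unsigned initialUnrollThreshold(std::optional<unsigned> LoopHint,
                                unsigned TargetDefault) {
  return LoopHint.value_or(TargetDefault);
}
```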
Mircea Trofin e24537d48f [NFC][MC] Use MCRegister for ReachingDefAnalysis APIs
Also updated the users of the APIs, plus a small drive-by change to
RDFRegister.cpp.

Differential Revision: https://reviews.llvm.org/D89912
2020-10-22 08:47:35 -07:00
Piotr Sobczak 7ae0033ca8 [AMDGPU] Fix expansion of i16 MULH
This commit marks i16 MULH as expand in the AMDGPU backend,
which is necessary after the refactoring in D80485.

Differential Revision: https://reviews.llvm.org/D89965
2020-10-22 17:05:06 +02:00
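For reference, a sketch of the semantics being expanded here (what "expand" means for a 16-bit multiply-high, not the exact nodes the legalizer emits): widen both operands, multiply, and keep the upper 16 bits.

```cpp
#include <cstdint>

// Unsigned i16 multiply-high: widen to 32 bits, multiply, keep the top half.
uint16_t mulhu16(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// Signed i16 multiply-high (assumes the usual arithmetic right shift on
// negative values, which mainstream compilers provide).
int16_t mulhs16(int16_t a, int16_t b) {
  return static_cast<int16_t>((static_cast<int32_t>(a) * static_cast<int32_t>(b)) >> 16);
}
```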
Evgeny Leviant ed6a91f456 [ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89939
2020-10-22 18:03:01 +03:00
Simon Pilgrim 2692978050 [X86] X86AsmParser - make methods const where possible. NFCI.
Reported by cppcheck
2020-10-22 15:55:06 +01:00
Simon Pilgrim 091b18ba81 [X86] Return const& in IntelExprStateMachine::getIdentifierInfo(). NFCI.
Avoid unnecessary copy in X86AsmParser::ParseIntelOperand
2020-10-22 15:55:06 +01:00
Matt Arsenault d5c0561667 AMDGPU: Fix not always reserving VGPRs used for SGPR spilling
The VGPRs used for SGPR spills need to be reserved, even if we aren't
speculatively reserving one.

This was broken by 117e5609e9.
2020-10-22 10:19:19 -04:00
Matt Arsenault d3bcfe2a36 AMDGPU: Implement getNoPreservedMask
We don't support funclets for exception handling and I hit this when
manually reducing MIR.
2020-10-22 10:17:31 -04:00
Simon Pilgrim 794dc7ad26 [CodeGen] Split MVT::changeTypeToInteger() functionality from EVT::changeTypeToInteger().
Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger.

All the SimpleVT code already exists inside the EVT equivalents, but splitting it out lets us use these helpers directly on MVT types without converting to/from EVT.
2020-10-22 14:27:42 +01:00
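A small usage sketch, assuming an LLVM source tree from around this commit (header locations have shifted between releases; at the time the type lived in llvm/Support/MachineValueType.h): the MVT-level helper avoids the EVT round trip.

```cpp
#include "llvm/Support/MachineValueType.h" // header location around this commit

// e.g. v4f32 -> v4i32, f64 -> i64, without wrapping the type in an EVT first.
llvm::MVT toIntegerVT(llvm::MVT VT) {
  return VT.changeTypeToInteger();
}
```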
Jeremy Morse d73275993b [DebugInstrRef] Substitute debug value numbers to handle optimizations
This patch touches two optimizations, TwoAddressInstruction and X86's
FixupLEAs pass, both of which optimize by re-creating instructions. For
FixupLEAs, various bits of arithmetic are better represented as LEAs on X86,
while TwoAddressInstruction sometimes converts instructions into three-address
instructions when profitable.

For debug instruction referencing, both of these require substitutions to
be created -- the old instruction number must be mapped to the new
instruction number, as illustrated in the added test. If this isn't done,
any variable locations based on the optimized instruction are
conservatively dropped.

Differential Revision: https://reviews.llvm.org/D85756
2020-10-22 13:01:03 +01:00
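A conceptual sketch of the substitution table described above (standalone types, not MachineFunction's real interface): when a pass recreates an instruction, the old debug instruction number is recorded as pointing to the new one, so later lookups follow the redirection instead of dropping the variable location.

```cpp
#include <map>
#include <utility>

// (instruction number, operand index) pair, used as both key and value.
using InstrOperandPair = std::pair<unsigned, unsigned>;

struct DebugSubstitutionTable {
  std::map<InstrOperandPair, InstrOperandPair> Map;

  // Called by an optimization that replaces an instruction: Old now means New.
  void recordSubstitution(InstrOperandPair Old, InstrOperandPair New) {
    Map[Old] = New;
  }

  // Follow one level of redirection; a real implementation may need to chain.
  InstrOperandPair resolve(InstrOperandPair Key) const {
    auto It = Map.find(Key);
    return It == Map.end() ? Key : It->second;
  }
};
```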
Tianqing Wang be39a6fe6f [X86] Add User Interrupts (UINTR) instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D89301
2020-10-22 17:33:07 +08:00
Tony 8e8cc587a5 [NFC][AMDGPU] Reorder SIMemoryLegalizer functions to be consistent
- Make the SIMemoryLegalizer insertAcquire functions appear in the same
  order for each target, for consistency.

Differential Revision: https://reviews.llvm.org/D89880
2020-10-22 05:39:18 +00:00
Xiang1 Zhang 7c3fea7721 [X86] Support customizing stack protector guard
Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D88631
2020-10-22 10:08:14 +08:00
Craig Topper 9e884169a2 [FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.

There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.

Fixes part of PR47393

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87115
2020-10-21 18:12:54 -07:00
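To see where the -0.0 comes from, here is one well-known u64-to-f64 expansion (a sketch for illustration, not necessarily the exact node sequence these backends emitted): each 32-bit half is OR-ed into the mantissa of a large power of two, the bias is subtracted back out, and the halves are added.

```cpp
#include <cstdint>
#include <cstring>

static double bitsToDouble(uint64_t Bits) {
  double D;
  std::memcpy(&D, &Bits, sizeof D); // reinterpret the bit pattern as an IEEE double
  return D;
}

double u64ToF64(uint64_t X) {
  // 0x433... is 2^52 with the low 32 bits of X in its mantissa; 0x453... is
  // 2^84 carrying the high 32 bits. Subtracting the biases recovers each half
  // exactly.
  double Lo = bitsToDouble(0x4330000000000000ULL | (X & 0xFFFFFFFFULL)) - 0x1p52;
  double Hi = bitsToDouble(0x4530000000000000ULL | (X >> 32)) - 0x1p84;
  // For X == 0 both subtractions have the form v - v, which IEEE 754 defines as
  // -0.0 under round-toward-negative, so the final sum is -0.0 rather than +0.0.
  return Hi + Lo;
}
```

That is the behaviour the strict-FP path now avoids by choosing a different algorithm.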
Gaurav Jain 4634ad6c0b [NFC] Set return type of getStackPointerRegisterToSaveRestore to Register
Differential Revision: https://reviews.llvm.org/D89858
2020-10-21 16:19:38 -07:00
Evgeny Leviant bf9edcb6fd [ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89876
2020-10-21 20:49:10 +03:00
Stanislav Mekhanoshin 611959f004 [AMDGPU] Fixed v_swap_b32 match
1. Fixed liveness issue with implicit kills.
2. Fixed potential problem with an indirect mov.

Fixes: SWDEV-256848

Differential Revision: https://reviews.llvm.org/D89599
2020-10-21 10:14:24 -07:00
Joe Nash f6d7832f4c [AMDGPU] Refactor SOPC & SOPP .td for extension
We use the Real vs Pseudo instruction abstraction for other
types of instructions to facilitate changes in opcode
between GPU generations.
This patch introduces that abstraction to SOPC and SOPP.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89738

Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138
2020-10-21 12:35:52 -04:00
Matt Arsenault 1ed4caff1d AMDGPU: Lower the threshold reported for maximum stack size exceeded
Check the actual maximum supported stack size for a kernel.
2020-10-21 12:06:27 -04:00
Matt Arsenault 53c43431bc AMDGPU: Propagate amdgpu-flat-work-group-size attributes
Fixes being overly conservative with the register counts in called
functions. This should try to do a conservative range merge, but for
now it just clones the attribute.

Also fix the pass not being functional when run standalone.
2020-10-21 12:06:24 -04:00
Paul C. Anagnostopoulos dfd6b69e01 [ARM] [TableGen] Clean up !if(!eq(boolean, 1)) and related booleans
Differential Revision: https://reviews.llvm.org/D89822
2020-10-21 09:52:45 -04:00
Jonas Paulsson 1606755da0 [SystemZ] Mark unsaved argument R6 as live throughout function.
For historical reasons, the R6 register is a callee-saved argument
register. This means that if it is used to pass an argument to a function
that does not clobber it, it is live throughout the function.

This patch makes sure that, in this special case, any kill flags on it are
removed.

Review: Ulrich Weigand, Eli Friedman

Differential Revision: https://reviews.llvm.org/D89451
2020-10-21 14:38:59 +02:00
Nicholas Guy 9a2d2bedb7 Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate
Some instructions may be removable through processes such as IfConversion;
however, DefinesPredicate cannot be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
IfConversion.

Renames DefinesPredicate to ClobbersPredicate, to better reflect its purpose.

Differential Revision: https://reviews.llvm.org/D88494
2020-10-21 11:52:47 +01:00
Sebastian Neubauer 5290f50e44 [AMDGPU] Fix off by one in assert
D89217 did not subtract one when accessing SubRegFromChannelTable in one
place.

Differential Revision: https://reviews.llvm.org/D89804
2020-10-21 12:37:52 +02:00
Jay Foad f6a5699c6c [AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC. 2020-10-21 09:56:43 +01:00
Craig Topper d4d0b41a82 [X86] Remove period from end of error message in assembler
Addresses post-commit feedback from D89837.
2020-10-21 00:43:23 -07:00
Craig Topper 79a69f558f [X86] Error on using h-registers with REX prefix in the assembler instead of leaving it to a fatal error in the encoder.
Using a fatal error is bad for user experience.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D89837
2020-10-20 21:35:44 -07:00
Carl Ritson 324a15cead [AMDGPU][NFC] Fix missing size in comment 2020-10-21 11:38:21 +09:00
Austin Kerbow ebdcef20ce [AMDGPU] Avoid inserting noops during scheduling
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt, which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards, but noops will
not be inserted until the dedicated hazard recognizer pass is run.

Depends on D89753.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89754
2020-10-20 17:11:36 -07:00
Austin Kerbow 37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop, allow emitting such
instructions directly.

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Tony 1bc7bfffdb [AMDGPU] Optimize waitcnt insertion for flat memory operations
Change waitcnt insertion to check the memory operand tokens to see whether
flat memory operations access VMEM, in the same way it already checks whether
they access LDS. This avoids adding waitcnts for counters of address
spaces that are not accessed.

In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.

This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.

Differential Revision: https://reviews.llvm.org/D89618
2020-10-20 22:55:12 +00:00
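An illustrative sketch of the decision above (hypothetical types; the pass's real data structures differ): each flat operation's memory operands say which address spaces it may touch, and only the corresponding counters need a wait.

```cpp
// What the memory operands of a flat instruction reveal about it.
struct FlatMemOp {
  bool MayAccessVMEM; // global/scratch side
  bool MayAccessLDS;  // local (LDS) side
};

struct Waits {
  bool WaitVMCnt = false;
  bool WaitLGKMCnt = false;
};

Waits waitsNeededFor(const FlatMemOp &Op) {
  Waits W;
  W.WaitVMCnt = Op.MayAccessVMEM;
  W.WaitLGKMCnt = Op.MayAccessLDS;
  // The pessimistic "wait for everything" case only remains when the operands
  // cannot rule out either side, e.g. a truly generic flat access.
  return W;
}
```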
Craig Topper 1298252f80 [X86] Move 'int $3' -> 'int3' handling in the assembler to processInstruction.
Instead of handling it before parsing, just fix it up after parsing.
2020-10-20 15:22:00 -07:00
Craig Topper 702aae368a [X86] Move 's{hr,ar,hl} $1, <op>' to 'shift <op>' optimization in the assembler into processInstruction.
Instead of detecting the mnemonic and hacking the operands before
parsing, just fix it up after parsing.
2020-10-20 15:20:46 -07:00
Paul C. Anagnostopoulos 332ff48dee [AMDGPU] [TableGen] Clean up !if(!eq(boolean, 1)) and related booleans
Differential Revision: https://reviews.llvm.org/D89796
2020-10-20 16:23:15 -04:00
vnalamot 89f7ccea6f [AMDGPU] Remove getAllVGPR32() which cannot handle Accum VGPRs properly
Remove the getAllVGPR32() interface and update the SGPR spill code to use
a proper method to get the relevant list of VGPR registers.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89806
2020-10-20 23:15:24 +05:30
Jay Foad 4913e3627c [AMDGPU] Remove unused declaration. NFC.
The implementation of this method was removed in D89706.
2020-10-20 16:31:42 +01:00
Michael Liao 2a0e4d1c01 [amdgpu] Enhance AMDGPU AA.
- In general, a generic pointer may alias pointers in all other address
  spaces. However, for certain cases enforced by the programming model,
  we may find that a generic pointer won't alias pointers to local objects.
  * When a generic pointer is loaded from the constant address space, it
    can only be a pointer to the GLOBAL or CONSTANT address space.
    Thus, it won't alias pointers to the PRIVATE or LOCAL address
    space.
  * When a generic pointer is passed as a kernel argument, it also can
    only be a pointer to the GLOBAL or CONSTANT address space. Thus, it
    also won't alias pointers to the PRIVATE or LOCAL address space.

Differential Revision: https://reviews.llvm.org/D89525
2020-10-20 09:54:12 -04:00
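A conceptual sketch of the new rule (illustrative enum and function, not the AA implementation itself): once a generic pointer is known to come from a constant-address-space load or a kernel argument, it can be assumed not to alias PRIVATE or LOCAL objects.

```cpp
enum class AddrSpace { Global, Constant, Private, Local, Generic };

// GenericFromConstLoadOrKernArg models the two cases listed above.
bool genericMayAlias(bool GenericFromConstLoadOrKernArg, AddrSpace Other) {
  if (GenericFromConstLoadOrKernArg &&
      (Other == AddrSpace::Private || Other == AddrSpace::Local))
    return false; // such a generic pointer can only point to GLOBAL or CONSTANT
  return true;    // otherwise stay conservative: may alias
}
```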
Carl Ritson be2afbd019 [AMDGPU] Remove fix up operand from SI_ELSE
Remove the immediate operand from SI_ELSE which indicates whether EXEC has
been modified. Instead, always emit code that handles EXEC and
remove unnecessary instructions during pre-RA optimisation.

This facilitates passes (e.g. SIWholeQuadMode) adding exec mask
manipulation after control flow lowering, and means passes before
control flow lowering do not need to be aware of SI_ELSE handling.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D89644
2020-10-20 19:15:21 +09:00
Carl Ritson 6aabbeadae [AMDGPU][NFC] Tidy SIOptimizeExecMaskingPreRA for extensibility
Remove duplicate code and move things around to make it easier to
add additional optimisations to the pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89619
2020-10-20 17:22:43 +09:00
Ulrich Weigand c299f3555d [SystemZ] Fix disassembler crashes
The "Size" value returned by SystemZDisassembler::getInstruction is
used by common code even in the case where the routine returns
failure.  If that Size value exceeds the number of bytes remaining
in the section, that could cause disassembler crashes.

Fixed by never returning more than the number of bytes remaining.
2020-10-20 10:21:42 +02:00
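A sketch of the guard being described (names are illustrative): whatever size the decoder wanted to report, the value handed back to common code never exceeds the bytes actually left in the section, even on a failed decode.

```cpp
#include <algorithm>
#include <cstdint>

uint64_t clampReportedSize(uint64_t DecodedSize, uint64_t BytesRemaining) {
  return std::min(DecodedSize, BytesRemaining);
}
```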
Evgeny Leviant 991e86156c [ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89460
2020-10-20 11:14:21 +03:00
David Green 6dcbc323fd Revert "[ARM][LowOverheadLoops] Adjust Start insertion."
This reverts commit 38f625d0d1.

This commit contains some holes in its logic and has been causing
issues since it was committed. The idea sounds OK, but some cases were not
handled correctly. Instead of trying to fix that up later, it is probably
simpler to revert it and work to reimplement it in a more reliable way.
2020-10-20 08:55:21 +01:00
Luqman Aden 51892a42da [COFF][ARM] Fix CodeView for Windows on 32-bit ARM targets.
Create the LLVM / CodeView register mappings for the 32-bit ARM Windows targets.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D89622
2020-10-19 22:16:16 -07:00
Wang, Pengfei 3a85472af2 [X86] Fix assert fail when element type is i1.
extract_vector_elt will turn type vxi1 into i8, which triggers the assertion failure.
Since we don't really handle vxi1 cases in the code below, we can just return early here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89096
2020-10-20 09:26:32 +08:00
Stanislav Mekhanoshin 6ddadf9901 [AMDGPU] flat scratch ST addressing mode on gfx10
GFX10 enables a third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only the swizzled offset is used in addition to the flat_scratch base.

Differential Revision: https://reviews.llvm.org/D89501
2020-10-19 15:29:52 -07:00
Sergei Trofimovich 1eb812e06d [VE] Fix initializer visibility
Before this change, an attempt to link libLTO.so against the shared
LLVM library failed as follows:

```
[ 76%] Linking CXX shared library ../../lib/libLTO.so
... /usr/bin/cmake -E cmake_link_script CMakeFiles/LTO.dir/link.txt --verbose=1
c++ -o ...libLTO.so.12git ...ibLLVM-12git.so
ld: CMakeFiles/LTO.dir/lto.cpp.o: in function `llvm::InitializeAllTargetInfos()':
include/llvm/Config/Targets.def:31: undefined reference to `LLVMInitializeVETargetInfo'
```

It happens because, on Linux, the LLVM build system sets the default
symbol visibility to "hidden". The fix is to set the visibility
back to "default" for exported APIs with LLVM_EXTERNAL_VISIBILITY.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47847

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89633
2020-10-19 22:54:41 +01:00
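For context, a sketch of the shape of such an initializer (LLVM_EXTERNAL_VISIBILITY is the real macro the fix applies; the body is reduced to a comment since the actual registration code lives in the VE target's TargetInfo library):

```cpp
#include "llvm/Support/Compiler.h" // provides LLVM_EXTERNAL_VISIBILITY

// With -fvisibility=hidden as the build default, the macro restores "default"
// visibility so the symbol is exported from the shared LLVM library and
// libLTO's InitializeAllTargetInfos() can bind to it.
extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeVETargetInfo() {
  // ... register the VE target with the TargetRegistry ...
}
```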
Evgenii Stepanov 188a7d6710 Add alloca size threshold for StackTagging initializer merging.
Summary:
Initializer merging generates pretty inefficient code for large allocas
that also happens to trigger an exponential algorithm somewhere in the
Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867.

This change adds an upper limit for the alloca size. The default limit
is selected such that the worst-case size of memtag-generated code is
similar to non-memtag code (but, because of ISA quirks, this worst case is
reached at a different alloca size; e.g. memset inlining triggers at sizes
below 512, but stack tagging instructions are 2x shorter, so the limit is
approximately 256).

We could try harder to emit more compact code with initializer merging,
but that would only affect large, sparsely initialized allocas, and
those are doing fine already.

Reviewers: vitalybuka, pcc

Subscribers: llvm-commits
2020-10-19 13:44:07 -07:00
Craig Topper e28376ec28 [X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table.
We have pseudo instructions we use for bitcasts between these types.
We have them in the load folding table, but not the store folding
table. This adds them there so they can be used for stack spills.

I added an exact size check so that we don't fold when the stack slot
is larger than the GPR. Otherwise the upper bits in the stack slot
would be garbage. That would be fine for Eli's test case in PR47874,
but I'm not sure it's safe in general.

A step towards fixing PR47874. The next steps are to change the ADDSSrr_Int
pseudo instructions to use FR32 as the second source register class
instead of VR128. That will keep the coalescer from promoting the
register class of the bitcast instruction, which will make the stack
slot 4 bytes instead of 16 bytes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89656
2020-10-19 12:53:14 -07:00