Commit Graph

59968 Commits

Author SHA1 Message Date
Christopher Tetreault 900ec97bbe [UBSan] Cannot negate smallest negative signed integer
Silence warning Undefined Behavior Sanitzer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D90710
2020-11-04 10:07:52 -08:00
Craig Topper 3701e33a22 [RISCV] Remove custom isel for (srl (shl val, 32), imm). Use pattern instead. NFCI
We don't need custom matching, we just a need a predicate to check
the immediate is greater than 32. We can use the existing ImmSub32
to adjust the immediate.

I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.

Differential Revision: https://reviews.llvm.org/D90546
2020-11-04 09:59:14 -08:00
Joe Nash 58adab34c4 [AMDGPU] Resolve pseudo registers at encoding uses
Pseudo-registers allow different register encodings
between gpu generations. Make sure we resolve the
pseudo regs to real regs whenever we get their
hardware encoding.
Using the correct encodings revealed a register
bank conflict and an unnecessary write dependency.
Tests have been updated to match.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90721

Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca
2020-11-04 12:52:32 -05:00
Sebastian Neubauer 31a0b2834f [AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.

This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for
divergent calls in branches before.

Differential Revision: https://reviews.llvm.org/D90596
2020-11-04 18:43:19 +01:00
Paul C. Anagnostopoulos d56cd4291e [TableGen] Add !interleave operator to concatenate a list of values with delimiters
Add a test. Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D90469
2020-11-04 09:23:54 -05:00
Simon Moll 351c10cc72 [VE] Add +vpu attribute
`+vpu` controls whether VEISelLowering adds any vregs.  This defaults to
`-vpu` to have scalar code generation out of the box.  We bring up
vector isel under the `+vpu` flag. Once vector isel is stable we switch
to `+vpu` and advertise vregs and vops in TTI.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90465
2020-11-04 12:42:00 +01:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Sebastian Neubauer 1124bf4ab7 [AMDGPU] Set rsrc1 flags for graphics shaders
Before they were only set for compute kernels and compute shaders but
not for other shaders.

Differential Revision: https://reviews.llvm.org/D89399
2020-11-04 12:25:41 +01:00
Sebastian Neubauer 76313288cd [AMDGPU] Fix ieee mode default value
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.

This commit changes the default to be
- on for compute kernels,
- off for shaders.

This aligns the default value with the settings that are actually in
use.  To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.

Differential Revision: https://reviews.llvm.org/D89388
2020-11-04 12:25:38 +01:00
David Green eb611930b6 [ARM] Remove unused variable. NFC 2020-11-04 09:00:03 +00:00
Sander de Smalen 73b6cb67dc [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Amara Emerson 393b55380a [AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD.
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.

Differential Revision: https://reviews.llvm.org/D90699
2020-11-03 17:25:14 -08:00
Julien Jorge 0fca651711 [WebAssembly] Don't fold frame offset for global addresses
When machine instructions are in the form of
```
%0 = CONST_I32 @str
%1 = ADD_I32 %stack.0, %0
%2 = LOAD 0, 0, %1
```

In the `ADD_I32` instruction, it is possible to fold it if `%0` is a
`CONST_I32` from an immediate number. But in this case it is a global
address, so we shouldn't do that. But we haven't checked if the operand
of `ADD` is an immediate so far. This fixes the problem. (The case
applies the same for `ADD_I64` and `CONST_I64` instructions.)

Fixes https://bugs.llvm.org/show_bug.cgi?id=47944.

Patch by Julien Jorge (jjorge@quarkslab.com)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D90577
2020-11-03 14:56:25 -08:00
Sanjay Patel c40126e740 [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop)).

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Jordan Rupprecht 980bf1d5d1 [NFC] Inline wasm assertion-only variable 2020-11-03 13:06:59 -08:00
Andy Wingo 107c3a12d6 [WebAssembly] Implement ref.null
This patch adds a new "heap type" operand kind to the WebAssembly MC
layer, used by ref.null. Currently the possible values are "extern" and
"func"; when typed function references come, though, this operand may be
a type index.

Note that the "heap type" production is still known as "refedtype" in
the draft proposal; changing its name in the spec is
ongoing (https://github.com/WebAssembly/reference-types/issues/123).

The register form of ref.null is still untested.

Differential Revision: https://reviews.llvm.org/D90608
2020-11-03 10:46:23 -08:00
Craig Topper 00eff96e1d [RISCV] Add missing patterns for rotr with immediate for Zbb/Zbp extensions.
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.

Remove the custom matcher for rotl to RORI and just use a SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 versions of grevi over rori since an explicit immediate
is more precise than any immediate. I also added rotr patterns for
rev32/rev16. And removed the (or (shl), (shr)) patterns that should be
combined to rotl by DAG combine.

There is at least one other grev pattern that probably needs a
another rotr pattern, but we need more test coverage first.

Differential Revision: https://reviews.llvm.org/D90575
2020-11-03 10:04:52 -08:00
Esme-Yi 5053eab890 Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA."
This reverts commit 119ab2181e.
2020-11-03 16:34:02 +00:00
Tim Renouf 89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Jay Foad 040c50278c [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
Jameson Nash a0ad066ce4 make the AsmPrinterHandler array public
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publically.

Replaces https://reviews.llvm.org/D74158

Differential Revision: https://reviews.llvm.org/D89613
2020-11-03 10:02:09 -05:00
Sanjay Patel 9af561ec99 [x86] update cost table comments for maxnum; NFC
Follow-up suggested in D90613.
2020-11-03 08:09:59 -05:00
David Green bd32386410 [ARM] Remove unused variable. NFC 2020-11-03 12:58:10 +00:00
David Green e474499402 [ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops
If an instruction will be lowered to a call there is no advantage of
using a low overhead loop as the LR register will need to be spilled and
reloaded around the call, and the low overhead will end up being
reverted. This teaches our hardware loop lowering that these memory
intrinsics will be calls under certain situations.

Differential Revision: https://reviews.llvm.org/D90439
2020-11-03 11:53:09 +00:00
Nicholas Guy 54d8627852 [AArch64] Redundant masks in downcast long multiply
Adds patterns to catch masks preceeding a long multiply,
and generating a single umull/smull instruction instead.

Differential revision: https://reviews.llvm.org/D89956
2020-11-03 10:12:28 +00:00
Petar Avramovic 0031418dce AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner
Change match/apply functions into methods of new target specific combiner
helper class. Use reference to MachineIRBuilder from helper instead of
constructing new MachineIRBuilder each time new instruction needs to made.
Allows correct tracking of newly created instructions.

Differential Revision: https://reviews.llvm.org/D90623
2020-11-03 09:24:50 +01:00
Esme-Yi 119ab2181e [PowerPC] Extend folding RLWINM + RLWINM to post-RA.
Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.

Reviewed By: shchenz, steven.zhang

Differential Revision: https://reviews.llvm.org/D89855
2020-11-03 07:44:11 +00:00
Craig Topper 46e91f6701 [RISCV] Remove isel patterns for fshl/fshr with same inputs. NFC
These were being selected to ROL/ROR, but DAG combine should
canonicalize fshl/fshr with same inputs to rotl/rotr which we
also have patterns for.
2020-11-02 23:12:18 -08:00
Esme-Yi b969dfe26f [NFC][PowerPC] Move the folding RLWINMs from ppc-mi-peephole to PPCInstrInfo.
Summary: We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example D88274. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.
This is a NFC patch to move the folding patterns to PPCInstrInfo, and the follow-up works will be calling it in pre-emit-peephole and expand the patterns to handle more cases.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D89846
2020-11-03 06:28:56 +00:00
Jessica Clarke 7601a21738 [RISCV] Only return DestSourcePair from isCopyInstrImpl for registers
ADDI often has a frameindex in operand 1, but consumers of this
interface, such as MachineSink, tend to call getReg() on the Destination
and Source operands, leading to the following crash when building
FreeBSD after this implementation was added in 8cf6778d30:

```
clang: llvm/include/llvm/CodeGen/MachineOperand.h:359: llvm::Register llvm::MachineOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
 #0 0x00007f4286f9b4d0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) llvm/lib/Support/Unix/Signals.inc:563:0
 #1 0x00007f4286f9b587 PrintStackTraceSignalHandler(void*) llvm/lib/Support/Unix/Signals.inc:630:0
 #2 0x00007f4286f9926b llvm::sys::RunSignalHandlers() llvm/lib/Support/Signals.cpp:71:0
 #3 0x00007f4286f9ae52 SignalHandler(int) llvm/lib/Support/Unix/Signals.inc:405:0
 #4 0x00007f428646ffd0 (/lib/x86_64-linux-gnu/libc.so.6+0x3efd0)
 #5 0x00007f428646ff47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f42864718b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f428646142a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f42864614a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2)
 #9 0x00007f428d4078e2 llvm::MachineOperand::getReg() const llvm/include/llvm/CodeGen/MachineOperand.h:359:0
#10 0x00007f428d8260e7 attemptDebugCopyProp(llvm::MachineInstr&, llvm::MachineInstr&) llvm/lib/CodeGen/MachineSink.cpp:862:0
#11 0x00007f428d826442 performSink(llvm::MachineInstr&, llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::SmallVectorImpl<llvm::MachineInstr*>&) llvm/lib/CodeGen/MachineSink.cpp:918:0
#12 0x00007f428d826e27 (anonymous namespace)::MachineSinking::SinkInstruction(llvm::MachineInstr&, bool&, std::map<llvm::MachineBasicBlock*, llvm::SmallVector<llvm::MachineBasicBlock*, 4u>, std::less<llvm::MachineBasicBlock*>, std::allocator<std::pair<llvm::MachineBasicBlock* const, llvm::SmallVector<llvm::MachineBasicBlock*, 4u> > > >&) llvm/lib/CodeGen/MachineSink.cpp:1073:0
#13 0x00007f428d824a2c (anonymous namespace)::MachineSinking::ProcessBlock(llvm::MachineBasicBlock&) llvm/lib/CodeGen/MachineSink.cpp:410:0
#14 0x00007f428d824513 (anonymous namespace)::MachineSinking::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/MachineSink.cpp:340:0
```

Thus, check that operand 1 is also a register in the condition.

Reviewed By: arichardson, luismarques

Differential Revision: https://reviews.llvm.org/D89090
2020-11-03 03:55:47 +00:00
Qiu Chaofan d14e51806b [PowerPC] Skip IEEE 128-bit FP type in FastISel
Vector types, quadword integers and f128 currently cannot be handled in
FastISel. We did not skip f128 type in lowering arguments, which causes
a crash. This patch will fix it.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D90206
2020-11-03 11:17:11 +08:00
Qiu Chaofan 3204ffeade [PowerPC] [NFC] Rename VCMPo to VCMP_rec
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D90581
2020-11-03 11:10:59 +08:00
Fangrui Song ca01a6b3ac [PowerPC] Parse and ignore .machine ppc64
In the wild, kexec-tools purgatory/arch/ppc64/v2wrap.S and hvcall.S
use this directive.
2020-11-02 16:49:57 -08:00
Krzysztof Parzyszek b26a2755dc [Hexagon] Move isTypeForHVX from Hexagon TTI to HexagonSubtarget, NFC
It's useful outside of Hexagon TTI, and with how TTI is implemented,
it is not accessible outside of TTI.
2020-11-02 14:00:45 -06:00
Stanislav Mekhanoshin c9d6fe6f7d [AMDGPU] Improve FLAT scratch detection
We were useing too broad check for isFLATScratch() which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Craig Topper 9ac2910093 [RISCV] Make SelectRORIW handle the commutability of OR.
The SHL and SRL could be in opposite order so account for that.

Differential Revision: https://reviews.llvm.org/D90586
2020-11-02 09:32:54 -08:00
Sanjay Patel 35fa3c474f [x86] add AVX2 cost model entries for maxnum of 256-bit vectors
As noticed in D90554 ,
the AVX2 costs for 256-bit vectors did not include FMAXNUM entries,
so we fell back to AVX1 which assumes those ops will be split into
128-bit halves or something close to that.

Differential Revision: https://reviews.llvm.org/D90613
2020-11-02 12:20:17 -05:00
Craig Topper 7142ec3aaf [RISCV] When matching RORIW, make sure the same input is given to both shifts.
The code is looking for (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))).
We need to ensure X and Y are the same.

Differential Revision: https://reviews.llvm.org/D90580
2020-11-02 09:12:40 -08:00
Momchil Velikov 7360d6d921 [ARM][MachineOutliner] Do not overestimate LR liveness in return block
The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers
callee-saved registers to be alive at the point after the return
instruction in a block. In the ARM backend, the `LR` register is
classified as callee-saved, which is not really correct (from an ARM
eABI or just common sense point of view).  These two conditions cause
the `MachineOutliner` to overestimate the liveness of `LR`, which
results in unnecessary saves/restores of `LR` around calls to outlined
sequences.  It also causes the `MachineVerifer` to crash in some
cases, because the save instruction reads a dead `LR`, for example
when the following program:

int h(int, int);

int f(int a, int b, int c, int d) {
  a = h(a + 1, b - 1);
  b = b + c;
  return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}

int g(int a, int b, int c, int d) {
  a = h(a - 1, b + 1);
  b = b + c;
  return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}

is compiled with `-target arm-eabi -march=armv7-m -Oz`.

This patch computes the liveness of `LR` in return blocks only, while
taking into account the few ARM instructions, which read `LR`, but
nevertheless the register is not mentioned (explicitly or implicitly)
in the instruction operands.

Differential Revision: https://reviews.llvm.org/D89189
2020-11-02 16:47:22 +00:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Matt Arsenault 86b8f6919b AMDGPU: Reorder checks 2020-11-02 10:21:48 -05:00
Evgeny Leviant cc96a82291 [TableGen][SchedModels] Fix read/write variant substitution
Patch fixes case when sched class has write and read variants belonging
to different processor models.

Differential revision: https://reviews.llvm.org/D89777
2020-11-02 17:39:04 +03:00
Jay Foad 0892d2a311 Revert "Fix ds_read2/write2 unaligned offsets"
This reverts commit 2e7e898c8f.

It was committed by mistake.
2020-11-02 14:01:33 +00:00
Jay Foad 2e7e898c8f Fix ds_read2/write2 unaligned offsets 2020-11-02 13:57:13 +00:00
Simon Pilgrim 36920d5f9d [RISCV] Avoid std::pair<> in FPReg StringSwitch to avoid MSVC compile failures. NFCI.
As discussed on D90322, some MSVC builds are failing with is_trivially_copyable static asserts (see D86126) - we can avoid this by not using the std::pair<unsigned,unsigned> which held both the FP+DP Registers, just handle the FP register and convert to DP on the fly.
2020-11-02 11:30:57 +00:00
Caroline Concatto 71038788ce Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register."
This reverts commit 8b281bfaf3.
2020-11-02 08:15:50 +00:00
Caroline Concatto 8b281bfaf3 [AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register.
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.

Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?

Differential Revision: https://reviews.llvm.org/D90153
2020-11-02 07:57:05 +00:00
Qiu Chaofan 2762e6734f [PowerPC] Fix a crash in POWER 9 setb peephole
Variable InnerIsSel references FalseRes, while FalseRes might be
zext/sext. So InnerIsSel should reference SetOrSelCC, otherwise a crash
will happen.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D90142
2020-11-02 14:29:43 +08:00
Craig Topper e57237f198 Recommit "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h. NFCI"
This reverts 781917254d and recommits
781917254d.

I've changed getRegForInlineAsmConstraint to not use a std::pair
of Register in a previous commit. Hopefully that fixes the reported
issue with expensive checks on Windows. I'm still not sure exactly
why this commit removing an include affected a different file.

Original message:

RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library
is intended to be shared with the MC layer so shouldn't use files
from the CodeGen layer.

The register enum names are already available from
RISCVMCTargetDesc.h. It appears what was coming from this include
was a transitive include of the Register class which I've replaced
with MCRegister. Register has a constructor from MCRegister so it
should be convertible.
2020-11-01 10:35:37 -08:00