Commit Graph

60174 Commits

Author SHA1 Message Date
Cameron McInally c126eb7529 [SelectionDAG] Add legalizations for VECREDUCE_SEQ_FMUL
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.

Differential Revision: https://reviews.llvm.org/D90644
2020-11-04 14:20:31 -06:00
Mircea Trofin 5dc47541f9 [NFC] Use Register/MCRegister
Differential Revision: https://reviews.llvm.org/D90724
2020-11-04 12:20:17 -08:00
Craig Topper cc3bf27077 [RISCV] Remove assertsexti32 from fslw/fsrw isel patterns.
The operations in these patterns shouldn't be effected by sign
bits. And the pattern is starting from a sign_extend_inreg so
we aren't expecting sign bits to be passed through either.

Differential Revision: https://reviews.llvm.org/D90739
2020-11-04 11:37:58 -08:00
Craig Topper d47300f503 [RISCV] Correct the operand order for fshl/fshr to fsl/fsr instructions.
fsl/fsr take their shift amount in $rs2 or an immediate. The
sources are $rs1 and $rs3.

fshl/fshr ISD opcodes both concatenate operand 0 in the high bits and
operand 1 in the lower bits. fshl returns the high bits after
shifting and fshr returns the low bits. So a shift amount of 0
returns operand 0 for fshl and operand 1 for fshr.

fsl/fsr concatenate their operands in different orders such that
$rs1 will be returned for a shift amount of 0. So $rs1 needs to
come from operand 0 of fshl and operand 1 of fshr.

Differential Revision: https://reviews.llvm.org/D90735
2020-11-04 11:13:25 -08:00
Craig Topper 0122a4ea66 [RISCV] Remove assertsexti32 from inputs to riscv_sllw/srlw nodes in B extension isel patterns.
riscv_sllw/srlw only reads the lower 32 bits of the first operand.
And the lower 5 bits of the second operands. Whether the upper
32 bits of the input are sign bits or not doesn't matter.

Also use ineg and not to shorten the patterns.

Differential Revision: https://reviews.llvm.org/D90668
2020-11-04 10:35:05 -08:00
Craig Topper 857563eaf0 [RISCV] Check all 64-bits of the mask in SelectRORIW.
We need to ensure the upper 32 bits of the mask are zero.
So that the srl shifts zeroes into the lower 32 bits.

Differential Revision: https://reviews.llvm.org/D90585
2020-11-04 10:15:30 -08:00
Christopher Tetreault 900ec97bbe [UBSan] Cannot negate smallest negative signed integer
Silence warning Undefined Behavior Sanitzer warning:
runtime error: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D90710
2020-11-04 10:07:52 -08:00
Craig Topper 3701e33a22 [RISCV] Remove custom isel for (srl (shl val, 32), imm). Use pattern instead. NFCI
We don't need custom matching, we just a need a predicate to check
the immediate is greater than 32. We can use the existing ImmSub32
to adjust the immediate.

I've also used the new predicate in the other location that used
ImmSub32. I tried to create a test case where we would break without
the greater than 32 check on that pattern, but DAG combine defeated me.
Still seemed safer to have it.

Differential Revision: https://reviews.llvm.org/D90546
2020-11-04 09:59:14 -08:00
Joe Nash 58adab34c4 [AMDGPU] Resolve pseudo registers at encoding uses
Pseudo-registers allow different register encodings
between gpu generations. Make sure we resolve the
pseudo regs to real regs whenever we get their
hardware encoding.
Using the correct encodings revealed a register
bank conflict and an unnecessary write dependency.
Tests have been updated to match.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90721

Change-Id: I73c154cd24aecc820993b50bebaf4df97a5710ca
2020-11-04 12:52:32 -05:00
Sebastian Neubauer 31a0b2834f [AMDGPU] Fix iterating in SIFixSGPRCopies
The insertion of waterfall loops splits the current basic block into
three blocks. So the basic block that we iterate over must be updated.

This failed assert(!NodePtr->isKnownSentinel()) in ilist_iterator for
divergent calls in branches before.

Differential Revision: https://reviews.llvm.org/D90596
2020-11-04 18:43:19 +01:00
Paul C. Anagnostopoulos d56cd4291e [TableGen] Add !interleave operator to concatenate a list of values with delimiters
Add a test. Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D90469
2020-11-04 09:23:54 -05:00
Simon Moll 351c10cc72 [VE] Add +vpu attribute
`+vpu` controls whether VEISelLowering adds any vregs.  This defaults to
`-vpu` to have scalar code generation out of the box.  We bring up
vector isel under the `+vpu` flag. Once vector isel is stable we switch
to `+vpu` and advertise vregs and vops in TTI.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90465
2020-11-04 12:42:00 +01:00
Kerry McLaughlin f2412d372d [SVE][CodeGen] Lower scalable integer vector reductions
This patch uses the existing LowerFixedLengthReductionToSVE function to also lower
scalable vector reductions. A separate function has been added to lower VECREDUCE_AND
& VECREDUCE_OR operations with predicate types using ptest.

Lowering scalable floating-point reductions will be addressed in a follow up patch,
for now these will hit the assertion added to expandVecReduce() in TargetLowering.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D89382
2020-11-04 11:38:49 +00:00
Sebastian Neubauer 1124bf4ab7 [AMDGPU] Set rsrc1 flags for graphics shaders
Before they were only set for compute kernels and compute shaders but
not for other shaders.

Differential Revision: https://reviews.llvm.org/D89399
2020-11-04 12:25:41 +01:00
Sebastian Neubauer 76313288cd [AMDGPU] Fix ieee mode default value
Previously, the default value for ieee mode was
- on for compute kernels and compute shaders,
- off for all shaders except compute shaders.

This commit changes the default to be
- on for compute kernels,
- off for shaders.

This aligns the default value with the settings that are actually in
use.  To my knowledge, all users of shader calling conventions (mesa and
llpc) disable the ieee mode by default.

Differential Revision: https://reviews.llvm.org/D89388
2020-11-04 12:25:38 +01:00
David Green eb611930b6 [ARM] Remove unused variable. NFC 2020-11-04 09:00:03 +00:00
Sander de Smalen 73b6cb67dc [NFCI] Replace AArch64StackOffset by StackOffset.
This patch replaces the AArch64StackOffset class by the generic one
defined in TypeSize.h.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D88983
2020-11-04 08:49:00 +00:00
Amara Emerson 393b55380a [AArch64][GlobalISel] Add combine for G_EXTRACT_VECTOR_ELT to allow selection of pairwise FADD.
For the <2 x float> case, instead of adding another combine or legalization to
get it into a <4 x float> form, I'm just adding a GISel specific selection
pattern to cover it.

Differential Revision: https://reviews.llvm.org/D90699
2020-11-03 17:25:14 -08:00
Julien Jorge 0fca651711 [WebAssembly] Don't fold frame offset for global addresses
When machine instructions are in the form of
```
%0 = CONST_I32 @str
%1 = ADD_I32 %stack.0, %0
%2 = LOAD 0, 0, %1
```

In the `ADD_I32` instruction, it is possible to fold it if `%0` is a
`CONST_I32` from an immediate number. But in this case it is a global
address, so we shouldn't do that. But we haven't checked if the operand
of `ADD` is an immediate so far. This fixes the problem. (The case
applies the same for `ADD_I64` and `CONST_I64` instructions.)

Fixes https://bugs.llvm.org/show_bug.cgi?id=47944.

Patch by Julien Jorge (jjorge@quarkslab.com)

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D90577
2020-11-03 14:56:25 -08:00
Sanjay Patel c40126e740 [ARM] remove cost-kind predicate for most math op costs
This is based on the same idea that I am using for the basic model implementation
and what I have partly already done for x86: throughput cost is number of
instructions/uops, so size/blended costs are identical except in special cases
(for example, fdiv or other known-expensive machine instructions or things like
MVE that may require cracking into >1 uop)).

Differential Revision: https://reviews.llvm.org/D90692
2020-11-03 17:23:46 -05:00
Jordan Rupprecht 980bf1d5d1 [NFC] Inline wasm assertion-only variable 2020-11-03 13:06:59 -08:00
Andy Wingo 107c3a12d6 [WebAssembly] Implement ref.null
This patch adds a new "heap type" operand kind to the WebAssembly MC
layer, used by ref.null. Currently the possible values are "extern" and
"func"; when typed function references come, though, this operand may be
a type index.

Note that the "heap type" production is still known as "refedtype" in
the draft proposal; changing its name in the spec is
ongoing (https://github.com/WebAssembly/reference-types/issues/123).

The register form of ref.null is still untested.

Differential Revision: https://reviews.llvm.org/D90608
2020-11-03 10:46:23 -08:00
Craig Topper 00eff96e1d [RISCV] Add missing patterns for rotr with immediate for Zbb/Zbp extensions.
DAGCombine doesn't canonicalize rotl/rotr with immediate so we
need patterns for both.

Remove the custom matcher for rotl to RORI and just use a SDNodeXForm
to convert the immediate instead. Doing this gives priority to the
rev32/rev16 versions of grevi over rori since an explicit immediate
is more precise than any immediate. I also added rotr patterns for
rev32/rev16. And removed the (or (shl), (shr)) patterns that should be
combined to rotl by DAG combine.

There is at least one other grev pattern that probably needs a
another rotr pattern, but we need more test coverage first.

Differential Revision: https://reviews.llvm.org/D90575
2020-11-03 10:04:52 -08:00
Esme-Yi 5053eab890 Revert "[PowerPC] Extend folding RLWINM + RLWINM to post-RA."
This reverts commit 119ab2181e.
2020-11-03 16:34:02 +00:00
Tim Renouf 89d41f3a2b [AMDGPU] Add gfx1033 target
Differential Revision: https://reviews.llvm.org/D90447

Change-Id: If2650fc7f31bbdd49c76e74a9ca8e3734d769761
2020-11-03 16:27:48 +00:00
Tim Renouf ee3e642627 [AMDGPU] Add gfx90c target
This differentiates the Ryzen 4000/4300/4500/4700 series APUs that were
previously included in gfx909.

Differential Revision: https://reviews.llvm.org/D90419

Change-Id: Ia901a7157eb2f73ccd9f25dbacec38427312377d
2020-11-03 16:27:43 +00:00
Jay Foad 040c50278c [AMDGPU] Fix ds_read2/write2 with unaligned offsets
These instructions use a scaled offset. We were wrongly selecting them
even when the required offset was not a multiple of the scale factor.

Differential Revision: https://reviews.llvm.org/D90607
2020-11-03 15:16:10 +00:00
Jameson Nash a0ad066ce4 make the AsmPrinterHandler array public
This lets external consumers customize the output, similar to how
AssemblyAnnotationWriter lets the caller define callbacks when printing
IR. The array of handlers already existed, this just cleans up the code
so that it can be exposed publically.

Replaces https://reviews.llvm.org/D74158

Differential Revision: https://reviews.llvm.org/D89613
2020-11-03 10:02:09 -05:00
Sanjay Patel 9af561ec99 [x86] update cost table comments for maxnum; NFC
Follow-up suggested in D90613.
2020-11-03 08:09:59 -05:00
David Green bd32386410 [ARM] Remove unused variable. NFC 2020-11-03 12:58:10 +00:00
David Green e474499402 [ARM] Treat memcpy/memset/memmove as call instructions for low overhead loops
If an instruction will be lowered to a call there is no advantage of
using a low overhead loop as the LR register will need to be spilled and
reloaded around the call, and the low overhead will end up being
reverted. This teaches our hardware loop lowering that these memory
intrinsics will be calls under certain situations.

Differential Revision: https://reviews.llvm.org/D90439
2020-11-03 11:53:09 +00:00
Nicholas Guy 54d8627852 [AArch64] Redundant masks in downcast long multiply
Adds patterns to catch masks preceeding a long multiply,
and generating a single umull/smull instruction instead.

Differential revision: https://reviews.llvm.org/D89956
2020-11-03 10:12:28 +00:00
Petar Avramovic 0031418dce AMDGPU/GlobalISel: Use same builder/observer in post-legalizer-combiner
Change match/apply functions into methods of new target specific combiner
helper class. Use reference to MachineIRBuilder from helper instead of
constructing new MachineIRBuilder each time new instruction needs to made.
Allows correct tracking of newly created instructions.

Differential Revision: https://reviews.llvm.org/D90623
2020-11-03 09:24:50 +01:00
Esme-Yi 119ab2181e [PowerPC] Extend folding RLWINM + RLWINM to post-RA.
Summary: This patch depends on D89846. We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example rGc4690b007743. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.

Reviewed By: shchenz, steven.zhang

Differential Revision: https://reviews.llvm.org/D89855
2020-11-03 07:44:11 +00:00
Craig Topper 46e91f6701 [RISCV] Remove isel patterns for fshl/fshr with same inputs. NFC
These were being selected to ROL/ROR, but DAG combine should
canonicalize fshl/fshr with same inputs to rotl/rotr which we
also have patterns for.
2020-11-02 23:12:18 -08:00
Esme-Yi b969dfe26f [NFC][PowerPC] Move the folding RLWINMs from ppc-mi-peephole to PPCInstrInfo.
Summary: We have the patterns to fold 2 RLWINMs in ppc-mi-peephole, while some RLWINM will be generated after RA, for example D88274. If the RLWINM generated after RA followed by another RLWINM, we expect to perform the optimization after RA, too.
This is a NFC patch to move the folding patterns to PPCInstrInfo, and the follow-up works will be calling it in pre-emit-peephole and expand the patterns to handle more cases.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D89846
2020-11-03 06:28:56 +00:00
Jessica Clarke 7601a21738 [RISCV] Only return DestSourcePair from isCopyInstrImpl for registers
ADDI often has a frameindex in operand 1, but consumers of this
interface, such as MachineSink, tend to call getReg() on the Destination
and Source operands, leading to the following crash when building
FreeBSD after this implementation was added in 8cf6778d30:

```
clang: llvm/include/llvm/CodeGen/MachineOperand.h:359: llvm::Register llvm::MachineOperand::getReg() const: Assertion `isReg() && "This is not a register operand!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
 #0 0x00007f4286f9b4d0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) llvm/lib/Support/Unix/Signals.inc:563:0
 #1 0x00007f4286f9b587 PrintStackTraceSignalHandler(void*) llvm/lib/Support/Unix/Signals.inc:630:0
 #2 0x00007f4286f9926b llvm::sys::RunSignalHandlers() llvm/lib/Support/Signals.cpp:71:0
 #3 0x00007f4286f9ae52 SignalHandler(int) llvm/lib/Support/Unix/Signals.inc:405:0
 #4 0x00007f428646ffd0 (/lib/x86_64-linux-gnu/libc.so.6+0x3efd0)
 #5 0x00007f428646ff47 raise /build/glibc-2ORdQG/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #6 0x00007f42864718b1 abort /build/glibc-2ORdQG/glibc-2.27/stdlib/abort.c:81:0
 #7 0x00007f428646142a __assert_fail_base /build/glibc-2ORdQG/glibc-2.27/assert/assert.c:89:0
 #8 0x00007f42864614a2 (/lib/x86_64-linux-gnu/libc.so.6+0x304a2)
 #9 0x00007f428d4078e2 llvm::MachineOperand::getReg() const llvm/include/llvm/CodeGen/MachineOperand.h:359:0
#10 0x00007f428d8260e7 attemptDebugCopyProp(llvm::MachineInstr&, llvm::MachineInstr&) llvm/lib/CodeGen/MachineSink.cpp:862:0
#11 0x00007f428d826442 performSink(llvm::MachineInstr&, llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::SmallVectorImpl<llvm::MachineInstr*>&) llvm/lib/CodeGen/MachineSink.cpp:918:0
#12 0x00007f428d826e27 (anonymous namespace)::MachineSinking::SinkInstruction(llvm::MachineInstr&, bool&, std::map<llvm::MachineBasicBlock*, llvm::SmallVector<llvm::MachineBasicBlock*, 4u>, std::less<llvm::MachineBasicBlock*>, std::allocator<std::pair<llvm::MachineBasicBlock* const, llvm::SmallVector<llvm::MachineBasicBlock*, 4u> > > >&) llvm/lib/CodeGen/MachineSink.cpp:1073:0
#13 0x00007f428d824a2c (anonymous namespace)::MachineSinking::ProcessBlock(llvm::MachineBasicBlock&) llvm/lib/CodeGen/MachineSink.cpp:410:0
#14 0x00007f428d824513 (anonymous namespace)::MachineSinking::runOnMachineFunction(llvm::MachineFunction&) llvm/lib/CodeGen/MachineSink.cpp:340:0
```

Thus, check that operand 1 is also a register in the condition.

Reviewed By: arichardson, luismarques

Differential Revision: https://reviews.llvm.org/D89090
2020-11-03 03:55:47 +00:00
Qiu Chaofan d14e51806b [PowerPC] Skip IEEE 128-bit FP type in FastISel
Vector types, quadword integers and f128 currently cannot be handled in
FastISel. We did not skip f128 type in lowering arguments, which causes
a crash. This patch will fix it.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D90206
2020-11-03 11:17:11 +08:00
Qiu Chaofan 3204ffeade [PowerPC] [NFC] Rename VCMPo to VCMP_rec
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D90581
2020-11-03 11:10:59 +08:00
Fangrui Song ca01a6b3ac [PowerPC] Parse and ignore .machine ppc64
In the wild, kexec-tools purgatory/arch/ppc64/v2wrap.S and hvcall.S
use this directive.
2020-11-02 16:49:57 -08:00
Krzysztof Parzyszek b26a2755dc [Hexagon] Move isTypeForHVX from Hexagon TTI to HexagonSubtarget, NFC
It's useful outside of Hexagon TTI, and with how TTI is implemented,
it is not accessible outside of TTI.
2020-11-02 14:00:45 -06:00
Stanislav Mekhanoshin c9d6fe6f7d [AMDGPU] Improve FLAT scratch detection
We were useing too broad check for isFLATScratch() which also
includes FLAT global.

Differential Revision: https://reviews.llvm.org/D90505
2020-11-02 11:37:33 -08:00
Craig Topper 9ac2910093 [RISCV] Make SelectRORIW handle the commutability of OR.
The SHL and SRL could be in opposite order so account for that.

Differential Revision: https://reviews.llvm.org/D90586
2020-11-02 09:32:54 -08:00
Sanjay Patel 35fa3c474f [x86] add AVX2 cost model entries for maxnum of 256-bit vectors
As noticed in D90554 ,
the AVX2 costs for 256-bit vectors did not include FMAXNUM entries,
so we fell back to AVX1 which assumes those ops will be split into
128-bit halves or something close to that.

Differential Revision: https://reviews.llvm.org/D90613
2020-11-02 12:20:17 -05:00
Craig Topper 7142ec3aaf [RISCV] When matching RORIW, make sure the same input is given to both shifts.
The code is looking for (sext_inreg (or (shl X, C2), (shr (and Y, C3), C1))).
We need to ensure X and Y are the same.

Differential Revision: https://reviews.llvm.org/D90580
2020-11-02 09:12:40 -08:00
Momchil Velikov 7360d6d921 [ARM][MachineOutliner] Do not overestimate LR liveness in return block
The `LiveRegUnits` utility (as well as `LivePhysRegs`) considers
callee-saved registers to be alive at the point after the return
instruction in a block. In the ARM backend, the `LR` register is
classified as callee-saved, which is not really correct (from an ARM
eABI or just common sense point of view).  These two conditions cause
the `MachineOutliner` to overestimate the liveness of `LR`, which
results in unnecessary saves/restores of `LR` around calls to outlined
sequences.  It also causes the `MachineVerifer` to crash in some
cases, because the save instruction reads a dead `LR`, for example
when the following program:

int h(int, int);

int f(int a, int b, int c, int d) {
  a = h(a + 1, b - 1);
  b = b + c;
  return 1 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}

int g(int a, int b, int c, int d) {
  a = h(a - 1, b + 1);
  b = b + c;
  return 2 + (2 * a + b) * (c - d) / (a - b) * (c + d);
}

is compiled with `-target arm-eabi -march=armv7-m -Oz`.

This patch computes the liveness of `LR` in return blocks only, while
taking into account the few ARM instructions, which read `LR`, but
nevertheless the register is not mentioned (explicitly or implicitly)
in the instruction operands.

Differential Revision: https://reviews.llvm.org/D89189
2020-11-02 16:47:22 +00:00
Florian Hahn b3b993a7ad Reland "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts the revert commit 408c4408fa.

This version of the patch includes a fix for a crash caused by
treating ICmp/FCmp constant expressions as instructions.

Original message:

On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.
2020-11-02 15:39:29 +00:00
Matt Arsenault 86b8f6919b AMDGPU: Reorder checks 2020-11-02 10:21:48 -05:00
Evgeny Leviant cc96a82291 [TableGen][SchedModels] Fix read/write variant substitution
Patch fixes case when sched class has write and read variants belonging
to different processor models.

Differential revision: https://reviews.llvm.org/D89777
2020-11-02 17:39:04 +03:00
Jay Foad 0892d2a311 Revert "Fix ds_read2/write2 unaligned offsets"
This reverts commit 2e7e898c8f.

It was committed by mistake.
2020-11-02 14:01:33 +00:00
Jay Foad 2e7e898c8f Fix ds_read2/write2 unaligned offsets 2020-11-02 13:57:13 +00:00
Simon Pilgrim 36920d5f9d [RISCV] Avoid std::pair<> in FPReg StringSwitch to avoid MSVC compile failures. NFCI.
As discussed on D90322, some MSVC builds are failing with is_trivially_copyable static asserts (see D86126) - we can avoid this by not using the std::pair<unsigned,unsigned> which held both the FP+DP Registers, just handle the FP register and convert to DP on the fly.
2020-11-02 11:30:57 +00:00
Caroline Concatto 71038788ce Revert "[AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register."
This reverts commit 8b281bfaf3.
2020-11-02 08:15:50 +00:00
Caroline Concatto 8b281bfaf3 [AArch64][AsmParser] Remove 'x31' alias for 'sp/xzr' register.
Only the aliases 'xzr' and 'sp' exist for the physical register x31.
The reason for wanting to remove the alias 'x31' is because it allows users
to write invalid asm that is not accepted by the GNU assembler.

Is there any objection to removing this alias? Or do we want to keep
this for compatibility with existing code that uses w31/x31?

Differential Revision: https://reviews.llvm.org/D90153
2020-11-02 07:57:05 +00:00
Qiu Chaofan 2762e6734f [PowerPC] Fix a crash in POWER 9 setb peephole
Variable InnerIsSel references FalseRes, while FalseRes might be
zext/sext. So InnerIsSel should reference SetOrSelCC, otherwise a crash
will happen.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D90142
2020-11-02 14:29:43 +08:00
Craig Topper e57237f198 Recommit "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h. NFCI"
This reverts 781917254d and recommits
781917254d.

I've changed getRegForInlineAsmConstraint to not use a std::pair
of Register in a previous commit. Hopefully that fixes the reported
issue with expensive checks on Windows. I'm still not sure exactly
why this commit removing an include affected a different file.

Original message:

RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library
is intended to be shared with the MC layer so shouldn't use files
from the CodeGen layer.

The register enum names are already available from
RISCVMCTargetDesc.h. It appears what was coming from this include
was a transitive include of the Register class which I've replaced
with MCRegister. Register has a constructor from MCRegister so it
should be convertible.
2020-11-01 10:35:37 -08:00
Craig Topper a76cd10fcd [RISCV] Use 'unsigned' instead of Register in getRegForInlineAsmConstraint. NFC
The return value of this interface still uses an 'unsigned' on all
targets. So we convert Register back to unsigned at the end.

I'm hoping this will prevent the issue that caused the revert of
D90322.
2020-11-01 10:16:52 -08:00
Christudasan Devadasan d6aa4aa29a [AMDGPU] Some refactoring after D90404. NFC. 2020-11-01 13:18:53 +05:30
Christudasan Devadasan 9bb2b4f0aa [AMDGPU] Add alignment check for v3 to v4 load type promotion
It should be enabled only when the load alignment is at least 8-byte.

Fixes: SWDEV-256824

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D90404
2020-11-01 12:05:34 +05:30
Ayke van Laethem e03ba2198d
[AVR] Improve inline rotate/shift expansions
These expansions were rather inefficient and were done with more code
than necessary. This change optimizes them to use expansions more
similar to GCC. The code size is the same (when optimizing for code
size) but somehow LLVM reorders blocks in a non-optimal way. Still, this
should be an improvement with a reduction in code size of around 0.12%
(when building compiler-rt).

Differential Revision: https://reviews.llvm.org/D86418
2020-10-31 23:15:49 +01:00
Paul C. Anagnostopoulos ef6f6d1c1a [TableGen] Eliminate uses of true and false in .td files.
They occurred in one NVPTX file and some test files.

Differential Revision: https://reviews.llvm.org/D90513
2020-10-31 10:54:33 -04:00
David Green 30ad742644 [ARM] Fix crash for gather of pointer costs.
If the elt size is unknown due to it being a pointer, a comparison
against 0 will cause an assert. Make sure the elt size is large enough
before comparing and for the moment just return the scalar cost.
2020-10-31 13:10:14 +00:00
Simon Pilgrim 9e406ee808 [X86] Make some basic VarArgsLoweringHelper helper methods const. NFCI.
Fixes a number of cppcheck remarks.
2020-10-31 12:16:49 +00:00
Simon Pilgrim e0cbcf96ce [X86] Make the X86FrameSortingComparator operator const. NFCI.
Fixes a cppcheck remark.
2020-10-31 12:16:49 +00:00
Simon Pilgrim 55dbb7d823 [X86] X86MCTargetDesc - ensure the declaration/definition variable names match. NFCI.
Silences cppcheck mismatch warnings.
2020-10-31 11:50:00 +00:00
Simon Pilgrim 30a1d91127 [X86] Reduce scope of DestReg and use specific Register type not unsigned. NFCI. 2020-10-31 11:46:07 +00:00
Simon Pilgrim ae80ac6db2 [X86] printAsmMRegister - make the X86AsmPrinter arg a const reference. NFC.
Fixes cppcheck warning.
2020-10-31 11:41:14 +00:00
Simon Pilgrim 39f77b3224 [X86] assignValueToReg - fix Wshadow warning. NFCI.
X86OutgoingValueHandler already has a MIB member
2020-10-31 11:39:26 +00:00
Simon Pilgrim 33e20008d1 [X86] printAsmVRegister - remove unused argument. NFC. 2020-10-31 11:34:28 +00:00
Simon Pilgrim ec547a7517 [X86] X86AsmPrinter - ensure the declaration/definition variable names match. NFCI.
Silences cppcheck mismatch warnings.
2020-10-31 11:31:46 +00:00
Simon Pilgrim 5eec049689 [X86] No need to determine pointer when the type is already a MachineInstr*. NFCI.
Caught by cppcheck - appears to be a copy+paste typo as the other var is an iterator that does need the &* pointer operation.
2020-10-31 11:26:25 +00:00
Liu, Chen3 756f597841 [X86] Support Intel avxvnni
This patch mainly made the following changes:

1. Support AVX-VNNI instructions;
2. Introduce ExplicitVEXPrefix flag so that vpdpbusd/vpdpbusds/vpdpbusds/vpdpbusds instructions only use vex-encoding when user explicity add {vex} prefix.

Differential Revision: https://reviews.llvm.org/D89105
2020-10-31 12:39:51 +08:00
Thomas Lively a787e09779 [WebAssembly] Prototype i64x2.bitmask
As proposed in https://github.com/WebAssembly/simd/pull/368.

Differential Revision: https://reviews.llvm.org/D90514
2020-10-30 17:23:30 -07:00
Wouter van Oortmerssen 86cd2332ce [WebAssembly] Fixed DWARF DW_AT_low_pc encoded as 64-bit in wasm64
Also added general wasm64 DWARF test
Also added asserts for unsupported reloc combinations that triggered this bug.

Differential Revision: https://reviews.llvm.org/D90503
2020-10-30 16:42:48 -07:00
Thomas Lively 0a512a555a [WebAssembly] Prototype i64x2.eq
As proposed in https://github.com/WebAssembly/simd/pull/381. Since it is still
in the prototyping phase, it is only accessible via a target builtin function
and a target intrinsic.

Depends on D90504.

Differential Revision: https://reviews.llvm.org/D90508
2020-10-30 16:38:15 -07:00
Thomas Lively 1cb0b56607 [WebAssembly] Prototype i64x2.widen_{low,high}_i32x4_{s,u}
As proposed in https://github.com/WebAssembly/simd/pull/290. As usual, these
instructions are available only via builtin functions and intrinsics while they
are in the prototyping stage.

Differential Revision: https://reviews.llvm.org/D90504
2020-10-30 15:44:04 -07:00
Florian Hahn 408c4408fa Revert "[TTI] Add VecPred argument to getCmpSelInstrCost."
This reverts commit 73f01e3df5.

This appears to break
http://lab.llvm.org:8011/#/builders/85/builds/383.
2020-10-30 21:26:14 +00:00
Peter Collingbourne 3d049bce98 hwasan: Support for outlined checks in the Linux kernel.
Add support for match-all tags and GOT-free runtime calls, which
are both required for the kernel to be able to support outlined
checks. This requires extending the access info to let the backend
know when to enable these features. To make the code easier to maintain
introduce an enum with the bit field positions for the access info.

Allow outlined checks to be enabled with -mllvm
-hwasan-inline-all-checks=0. Kernels that contain runtime support for
outlined checks may pass this flag. Kernels lacking runtime support
will continue to link because they do not pass the flag. Old versions
of LLVM will ignore the flag and continue to use inline checks.

With a separate kernel patch [1] I measured the code size of defconfig
+ tag-based KASAN, as well as boot time (i.e. time to init launch)
on a DragonBoard 845c with an Android arm64 GKI kernel. The results
are below:

         code size    boot time
before    92824064      6.18s
after     38822400      6.65s

[1] https://linux-review.googlesource.com/id/I1a30036c70ab3c3ee78d75ed9b87ef7cdc3fdb76

Depends on D90425

Differential Revision: https://reviews.llvm.org/D90426
2020-10-30 14:25:40 -07:00
Cameron McInally dda1e74b58 [Legalize] Add legalizations for VECREDUCE_SEQ_FADD
Add Legalization support for VECREDUCE_SEQ_FADD, so that we don't need to depend on ExpandReductionsPass.

Differential Revision: https://reviews.llvm.org/D90247
2020-10-30 16:02:55 -05:00
Peter Collingbourne c9b1a2b41d AArch64: Use SBFX instead of UBFX to extract address granule in outlined HWASan checks.
In a kernel (or in general in environments where bit 55 of the address
is set) the shadow base needs to point to the end of the shadow region,
not the beginning. Bit 55 needs to be sign extended into bits 52-63
of the shadow base offset, otherwise we end up loading from an invalid
address. We can do this by using SBFX instead of UBFX.

Using SBFX should have no effect in the userspace case where bit 55
of the address is clear so we do so unconditionally. I don't think
we need a ABI version bump for this (but one will come anyway when
we switch to x20 for the shadow base register).

Differential Revision: https://reviews.llvm.org/D90424
2020-10-30 12:53:15 -07:00
Peter Collingbourne 3859fc653f AArch64: Switch to x20 as the shadow base register for outlined HWASan checks.
From a code size perspective it turns out to be better to use a
callee-saved register to pass the shadow base. For non-leaf functions
it avoids the need to reload the shadow base into x9 after each
function call, at the cost of an additional stack slot to save the
caller's x20. But with x9 there is also a stack size cost, either
as a result of copying x9 to a callee-saved register across calls or
by spilling it to stack, so for the non-leaf functions the change to
stack usage is largely neutral.

It is also code size (and stack size) neutral for many leaf functions.
Although they now need to save/restore x20 this can typically be
combined via LDP/STP into the x30 save/restore. In the case where
the function needs callee-saved registers or stack spills we end up
needing, on average, 8 more bytes of stack and 1 more instruction
but given the improvements to other functions this seems like the
right tradeoff.

Unfortunately we cannot change the register for the v1 (non short
granules) check because the runtime assumes that the shadow base
register is stored in x9, so the v1 check still uses x9.

Aside from that there is no change to the ABI because the choice
of shadow base register is a contract between the caller and the
outlined check function, both of which are compiler generated. We do
need to rename the v2 check functions though because the functions
are deduplicated based on their names, not on their contents, and we
need to make sure that when object files from old and new compilers
are linked together we don't end up with a function that uses x9
calling an outlined check that uses x20 or vice versa.

With this change code size of /system/lib64/*.so in an Android build
with HWASan goes from 200066976 bytes to 194085912 bytes, or a 3%
decrease.

Differential Revision: https://reviews.llvm.org/D90422
2020-10-30 12:51:30 -07:00
Craig Topper 6915c76e10 [RISCV] Don't use DCI.CombineTo to replace a single result. NFCI
Just return the new node, which is the standard practice.

I also noticed what appeared to be an unnecessary attempt at
creating an ANY_EXTEND where the type should already be correct.
I replace with an assert to verify the type.

Differential Revision: https://reviews.llvm.org/D90444
2020-10-30 10:46:32 -07:00
Sanjay Patel 251dd7c0f9 [x86] add cost overrides for mul with overflow
I'm assuming the standard size integer instructions for this end up as something like:
mulq %rsi
seto %al

And the 'mul' generally has reciprocal throughput of 1 on typical implementations
(higher latency, but that's not handled here).
The default costs may end up much higher than that, and that's what we see in the test diffs.

Vector types are left as a 'TODO'.

Differential Revision: https://reviews.llvm.org/D90431
2020-10-30 12:38:16 -04:00
Simon Moll 4474d4d49c [VE][NFC] Split up lowering init
Split up the monolithic VETargetLowering ctor into three initialization phases:
1. initRegisterClasses()
2. initSPUActions()
3. // TODO initVPUActions()

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D90463
2020-10-30 16:18:27 +01:00
Matt Arsenault 790f5771fd AMDGPU: Fix missing writelane cases to skip with exec=0 2020-10-30 11:15:11 -04:00
serge-sans-paille 0f60bcc36c [stack-clash] Fix probing of dynamic alloca
- Perform the probing in the correct direction.
  Related to https://github.com/rust-lang/rust/pull/77885#issuecomment-711062924

- The first touch on a dynamic alloca cannot use a mov because it clobbers
  existing space. Use a xor 0 instead

Differential Revision: https://reviews.llvm.org/D90216
2020-10-30 15:34:00 +01:00
Simon Pilgrim 0ff1ab42f2 Use cast<> instead of dyn_cast<> as we dereference the pointer immediately. NFCI.
Fix clang static analyzer warning - we know that the arg should be ConstantInt and we're better off relying on cast<> asserting on failure rather than a null dereference crash.
2020-10-30 14:33:20 +00:00
Florian Hahn 73f01e3df5 [TTI] Add VecPred argument to getCmpSelInstrCost.
On some targets, like AArch64, vector selects can be efficiently lowered
if the vector condition is a compare with a supported predicate.

This patch adds a new argument to getCmpSelInstrCost, to indicate the
predicate of the feeding select condition. Note that it is not
sufficient to use the context instruction when querying the cost of a
vector select starting from a scalar one, because the condition of the
vector select could be composed of compares with different predicates.

This change greatly improves modeling the costs of certain
compare/select patterns on AArch64.

I am also planning on putting up patches to make use of the new argument in
SLPVectorizer & LV.

Reviewed By: dmgreen, RKSimon

Differential Revision: https://reviews.llvm.org/D90070
2020-10-30 13:49:08 +00:00
David Sherwood cea69fa4dc [SVE] Add fatal error for unnamed SVE variadic arguments
We don't currently support passing unnamed variadic SVE arguments
so I've added a fatal error if we hit such cases to prevent any
silent ABI issues in future.

Differential Revision: https://reviews.llvm.org/D90230
2020-10-30 13:35:47 +00:00
David Green d14db8c8dc [ARM] Match MVE vqdmulh
This adds ISel matching for a form of VQDMULH. There are several ir
patterns that we could match to that instruction, this one is for:

min(ashr(mul(sext(a), sext(b)), 7), 127)

Which is what llvm will optimize to once it has removed the max that
usually makes up the min/max saturate pattern, as in this case the
compare will always be false. The additional complication to match i32
patterns (which extend into an i64) is that the min will be a
vselect/setcc, as vmin is not supported for i64 vectors. Tablegen
patterns have also been updated to attempt to reuse the MVE_TwoOpPattern
patterns.

Differential Revision: https://reviews.llvm.org/D90096
2020-10-30 13:34:27 +00:00
Simon Pilgrim 781917254d Revert rG22c383763456 "[RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h"
This reverts commit 22c3837634.

This is causing a build failure with MSVC - reported on D90322
2020-10-30 11:59:37 +00:00
alex-t a4f7e4264c [AMDGPU] SILowerControlFlow::removeMBBifRedundant. Refactoring plus fix for the null MBB pointer in MF->splice
Detailed description: This change addresses the refactoring adviced by foad. It also contain the fix for the case when getNextNode is null if the successor block is the last in MachineFunction.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D90314
2020-10-30 14:46:08 +03:00
Michael Roe fc0892c1f9 [mips] Implement add.ps, mul.ps and sub.ps
Differential revision: https://reviews.llvm.org/D90321
2020-10-30 10:59:15 +03:00
Krzysztof Parzyszek db60e64036 [Hexagon] Handle additional shuffles that can be made perfect 2020-10-29 19:09:00 -05:00
Craig Topper 74b078294f [RISCV] Improve worklist management in the DAG combine for SLLW/SRLW/SRAW
This combine makes two calls to SimplifyDemandedBits, one for the LHS and one
for the RHS. If the LHS call returns true, we don't make the RHS call. When
SimplifyDemandedBits makes a change, it will add the nodes around the change to
the DAG combiner worklist. If the simplification happens on the first recursion
step, the N will get added to the worklist. But if the simplification happens
deeper in the recursion, then N will not be revisited until the next time the
DAG combiner runs.

This patch explicitly addes N to the worklist anytime a Simplification is made.
Without this we might miss additional simplifications on the LHS or never
simplify the RHS. Special care also needs to be taken to not add N if it has
been CSEd by the simplification. There are similar examples in DAGCombiner and
the X86 target, but I don't have a test for it for RISC-V. I've also returned
SDValue(N, 0) instead of SDValue() so DAGCombiner knows a change was made and
will update its Statistic variable.

The test here was constructed so that 2 simplifications happen to the LHS.
Without this fix one happens in the post type legalization DAG combine and the
other happens after LegalizeDAG. This prevents the RHS from ever being
simplified causing the left and right shift to clear the upper 32 bits of the
RHS to be left behind.

Differential Revision: https://reviews.llvm.org/D90339
2020-10-29 14:52:53 -07:00
Craig Topper 22c3837634 [RISCV] Remove include of RISCVRegisterInfo.h from RISCVBaseInfo.h
RISCVRegisterInfo.h is part of the CodeGen layer. The Utils library
is intended to be shared with the MC layer so shouldn't use files
from the CodeGen layer.

The register enum names are already available from
RISCVMCTargetDesc.h. It appears what was coming from this include
was a transitive include of the Register class which I've replaced
with MCRegister. Register has a constructor from MCRegister so it
should be convertible.
2020-10-29 11:39:19 -07:00
Thomas Lively be6f50798e [WebAssembly] Implement SIMD signselect instructions
As proposed in https://github.com/WebAssembly/simd/pull/124, using the opcodes
adopted by V8 in
https://chromium-review.googlesource.com/c/v8/v8/+/2486235/2/src/wasm/wasm-opcodes.h.
Uses new builtin functions and a new target intrinsic exclusively to ensure that
the new instructions are only emitted when a user explicitly opts in to using
them since they are still in the prototyping and evaluation phase.

Differential Revision: https://reviews.llvm.org/D90357
2020-10-29 11:06:20 -07:00
Jay Foad 9cee87d72a [AMDGPU] Fix double space in disassembly of ds_gws_sema_* with gds
By setting up the AsmStrings correctly we can remove some special cases
from AMDGPUInstPrinter::printOffset.

Differential Revision: https://reviews.llvm.org/D90307
2020-10-29 17:31:59 +00:00
Jay Foad 58de4b2053 [AMDGPU] Use pseudo instructions for readlane/writelane
This reverts r227987 "R600/SI: Determine target-specific encoding of READLANE and WRITELANE early v2".

All the codegen changes are caused by the post-RA scheduler no longer
treating readlane/writelane as scheduling barriers due to having
unmodelled side effects. (The pseudos are hasSideEffects = 0, but the
real instructions are hasSideEffects = ? which TableGen conservatively
treats as 1.)

Differential Revision: https://reviews.llvm.org/D90401
2020-10-29 16:00:53 +00:00
Nicholas Guy eb9fe24eaf [ARM] Fix IT block generation after Thumb2SizeReduce with -Oz
Fixes a regression caused by D82439, in which IT blocks were no longer being generated when -Oz is present.

Differential Revision: https://reviews.llvm.org/D88496
2020-10-29 15:17:31 +00:00
Jay Foad 7a79921edd [AMDGPU] Remove gds operand from ds_gws_* MachineInstrs
The operand value was always 1 (except in some bad MIR tests) so it was
redundant.

Differential Revision: https://reviews.llvm.org/D90378
2020-10-29 15:04:23 +00:00
Jay Foad a442fad911 [AMDGPU] Fix double space in disassembly of s_set_gpr_idx_mode
Differential Revision: https://reviews.llvm.org/D90374
2020-10-29 14:54:33 +00:00
Jay Foad e9dd2c4fe2 [AMDGPU] Fix double space in disassembly of some DPP instructions
Differential Revision: https://reviews.llvm.org/D90373
2020-10-29 14:54:33 +00:00
Kazushi (Jam) Marukawa 58a6b7bcde [VE] Add missing BCR format
Add missing "BCR %sy, 0, target" format instruction and a regression
test for this format.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90387
2020-10-29 23:30:49 +09:00
Kazushi (Jam) Marukawa 07d1996601 [VE] Support register aliases in llvm-mc
Support register aliases in MC layer to compile existing assembly
files with clang and integrated assembler.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90383
2020-10-29 23:28:32 +09:00
Jay Foad 69f5105f5c [AMDGPU] Simplify insertNoops functions. NFC. 2020-10-29 10:55:20 +00:00
Kazushi (Jam) Marukawa 9c82944b2d [VE] Add vector control instructions
Add LVL/SVL/SMVL/LVIX isntructions.  Add regression tests too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90355
2020-10-29 19:24:31 +09:00
Ben Shi 076a8d915b [NFC][AVR] Improve device list
Reviewed By: dylanmckay

https://reviews.llvm.org/D87968
2020-10-29 10:54:17 +08:00
Kazushi (Jam) Marukawa 7942960199 [VE] Add vector mask operation instructions
Add VFMK/VFMS/VFMF/ANDM/ORM/XORM/EQVM/NNDM/NEGM/PCVM/LZVM/TOVM
isntructions.  Add regression tests too.  Also add new patterns
to parse VFMK/VFMS/VFMF mnemonics.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90297
2020-10-29 08:42:41 +09:00
Austin Kerbow de51867343 [AMDGPU] Add Reset function to GCNHazardRecognizer
Reset the tracked emitted instructions when starting scheduling on a new
region.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90347
2020-10-28 16:32:32 -07:00
Jay Foad 5b91a6a88b [AMDGPU] Allow some modifiers on VOP3B instructions
V_DIV_SCALE_F32/F64 are VOP3B encoded so they can't use the ABS src
modifier, but they can still use NEG and the usual output modifiers.

This partially reverts 3b99f12a4e "AMDGPU: Remove modifiers from v_div_scale_*".

Differential Revision: https://reviews.llvm.org/D90296
2020-10-28 21:54:14 +00:00
Jay Foad 50ee22d791 [AMDGPU] Fix double space in disassembly of SDWA instructions with vcc
Differential Revision: https://reviews.llvm.org/D90317
2020-10-28 21:39:39 +00:00
Florian Hahn 772aaa6023 [AArch64] Improve lowering of insert_vector_elt with 0.0 consts.
When moving +0.0 into a float vector, we can use to vi*gpr variants of
INS.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D90176
2020-10-28 21:35:33 +00:00
Austin Kerbow 8b127a8661 [AMDGPU] Fix inserting combined s_nop in bundles
Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90334
2020-10-28 14:34:04 -07:00
Florian Hahn ba78cae20f [AArch64] Use DUP for BUILD_VECTOR with few different elements.
If most elements of BUILD_VECTOR are the same, with a few different
elements, it is better to use DUP  for the common elements and
INSERT_VECTOR_ELT for the different elements.

Currently this transform is guarded quite restrictively to only trigger
in clearly beneficial cases.

With D90176, the lowering for patterns originating from code like
` float32x4_t y = {a,a,a,0};` (common in 3D apps) are lowered even
better (unnecessary fmov is removed).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D90233
2020-10-28 19:48:20 +00:00
Sanjay Patel 7c395f31a6 [CostModel][x86] remove cost-kind predicate for intrinsic costs
We model cost as number of instructions / uops, so it does not
make sense to treat size/blended costs any differently than
throughput.
2020-10-28 14:33:37 -04:00
Thomas Lively 31e944556f [WebAssembly] Prototype extending multiplication SIMD instructions
As proposed in https://github.com/WebAssembly/simd/pull/376. This commit
implements new builtin functions and intrinsics for these instructions, but does
not yet add them to wasm_simd128.h because they have not yet been merged to the
proposal. These are the first instructions with opcodes greater than 0xff, so
this commit updates the MC layer and disassembler to handle that correctly.

Differential Revision: https://reviews.llvm.org/D90253
2020-10-28 09:38:59 -07:00
Paul C. Anagnostopoulos 9d72065cf6 [TableGen] [AMDGPU] Add !sub operator for subtraction
Use it in the AMDGPU target to eliminate !add(value1, !mul(value2, -1))

Differential Revision: https://reviews.llvm.org/D90107
2020-10-28 12:27:53 -04:00
Jay Foad 9e634bc22f [AMDGPU] Omit needless string concatenations. NFC. 2020-10-28 12:56:52 +00:00
Kazushi (Jam) Marukawa cbdee7df06 [VE] Add vector merger operation instructions
Add VMRG/VSHF/VCP/VEX isntructions.  Add regression tests too.
Also add new patterns to parse new UImm4 oeprand.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90292
2020-10-28 19:57:10 +09:00
Kazushi (Jam) Marukawa 7ce2b93cbe [VE] Add vector iterative operation instructions
Add VFIA/VFIS/VFIM/VFIAM/VFISM/VFIMA/VFIMS isntructions.
Add regression tests too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90252
2020-10-28 19:06:46 +09:00
Kazushi (Jam) Marukawa 15f6250bed [VE][NFC] Fix typo in comment 2020-10-28 18:51:07 +09:00
Kazushi (Jam) Marukawa b22e32a9c8 [VE] Specify to expand BRIND and BR_JT
BRIND and BR_JT are not implmented yet, so expand them atm.
Add regression tests too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90283
2020-10-28 18:50:20 +09:00
David Green 066737fdbc [AArch64] Remove AArch64ISD::NOT, use vnot instead
vnot (xor -1) should be equivalent to the AArch64 specific AArch64ISD::NOT
node, but allow more folding thanks to all the target independent
optimizations. Specifically this allows select(icmp ne, x, y) to
become "cmeq; bsl y, x" as opposed to needing to convert the predicate
with "cmeq; mvn; bsl x, y"

Unfortunately there is a regression in a cmtst test, but the code it
selected from was already non-canonical, with instcombine preferring to
use an eq predicate instead. Plus the more common case of icmp ne is
improved.

Differential Revision: https://reviews.llvm.org/D90126
2020-10-28 08:15:37 +00:00
Carl Ritson 057934a6d7 [AMDGPU] Fix insert of SIPreAllocateWWMRegs in FastRegAlloc
SIPreAllocateWWMRegs was being inserted after RegisterCoalescer
but this pass does not exist during FastAlloc so pre-allocation
pass was never being run.
Insert pre-allocation after TwoAddressInstructionPass instead.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D90236
2020-10-28 12:15:15 +09:00
Nemanja Ivanovic 5459d08795 [PowerPC] Fix single-use check and update chain users for ld-splat
When converting a BUILD_VECTOR or VECTOR_SHUFFLE to a splatting load
as of 1461fb6e78, we inaccurately check
for a single user of the load and neglect to update the users of the
output chain of the original load. As a result, we can emit a new
load when the original load is kept and the new load can be reordered
after a dependent store. This patch fixes those two issues.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47891
2020-10-27 16:49:38 -05:00
Stanislav Mekhanoshin 78ae1f6c90 [AMDGPU] Change predicate for fma/fmac legacy
I do not exactly like the use of a negative predicate to
enable instructions' support. Change HasNoMadMacF32Insts
with HasFmaLegacy32.

Differential Revision: https://reviews.llvm.org/D90250
2020-10-27 12:03:52 -07:00
Victor Huang 2e1a737f46 [PowerPC][PCRelative] Turn on TLS support for PCRel by default
Turn on TLS support for PCRel by default and update the test cases.

Differential Revision: https://reviews.llvm.org/D88738
Reviewed by: stefanp, kamaub
2020-10-27 13:58:44 -05:00
Michael Liao 46c3d5cb05 [amdgpu] Add the late codegen preparation pass.
Summary:
- Teach that pass to widen naturally aligned but not DWORD aligned
  sub-DWORD loads.

Reviewers: rampitec, arsenm

Subscribers:

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80364
2020-10-27 14:07:59 -04:00
Kazushi (Jam) Marukawa a65883a78a [VE] Add vector reduction instructions
Add VSUMS/VSUMX/VFSUM/VMAXS/VMAXX/VFMAX/VRAND/VROR/VRXOR isntructions.
Add regression tests too.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90227
2020-10-28 02:33:21 +09:00
Michael Liao 0d092303b4 [amdgpu] Enable use of AA during codegen.
- Add an internal option `-amdgpu-use-aa-in-codegen` to enable or
  disable this feature. By Default, it's enabled.

Differential Revision: https://reviews.llvm.org/D89320
2020-10-27 09:46:23 -04:00
Benjamin Kramer 35f7cbf9df [X86] Don't crash on CVTPS2PH with wide vector inputs. 2020-10-27 14:42:02 +01:00
Kazushi (Jam) Marukawa c5fa6bae12 [VE] Add vector float instructions
Add VFAD/VFSB/VFMP/VFDV/VFSQRT/VFCP/VFCM/VFMAD/VFMSB/VFNMAD/VFNMSB/
VRCP/VRSQRT/VRSQRTNEX/VFIX/VFIXX/VFLT/VFLTX/VCVS/VCVD instructions.
Add regression tests too.  Also add additional AsmParser for VFIX
and VFIXX instructions to parse their mnemonic.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90166
2020-10-27 20:42:24 +09:00
Jay Foad 6539ebe97d [AMDGPU] Use DPP instead of Ext in a couple of class names. NFC. 2020-10-27 10:22:30 +00:00
Craig Topper f385823e04 [X86] Alternate implementation of D88194.
This uses PreprocessISelDAG to replace the constant before
instruction selection instead of matching opcodes after.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D89178
2020-10-27 00:20:03 -07:00
Wei Wang d602e79a81 [X86] Encode global address in small code model
In small code model, program and its symbols are linked in the lower 2 GB of
the address space. Try encoding global address even when the range is unknown
in such case.

Differential Revision: https://reviews.llvm.org/D89341
2020-10-26 23:14:06 -07:00
Bing1 Yu 2c08f1b4b6 [CostModel][X86] teach TTI calculate cost of chain of vector inserts/extracts more precisely and correctly:In each 128-lane, if there is at least one index is demanded and not all indices are demanded...
In each 128-lane, if there is at least one index is demanded and not all
indices are demanded and this 128-lane is not the first 128-lane of the
legalized-vector, then this 128-lane needs a extracti128;
If in each 128-lane, there is at least one index is demanded, this 128-lane
needs a inserti128.

The following cases will help you build a better understanding:
Assume we insert several elements into a v8i32 vector in avx2,
Case#1: inserting into 1th index needs vpinsrd + inserti128
Case#2: inserting into 5th index needs extracti128 + vpinsrd +
inserti128
Case#3: inserting into 4,5,6,7 index needs 4*vpinsrd + inserti128.

Reviewed By: pengfei, RKSimon

Differential Revision: https://reviews.llvm.org/D89767
2020-10-27 11:21:13 +08:00
Chen Zheng 00e573cadb [LSR] fix typo in comments and rename for a new added hook. 2020-10-26 22:29:22 -04:00
Carl Ritson 7a880ab388 [AMDGPU] Move WQM Pass after MI Scheduler
Exec mask manipulation inserted by SIWholeQuadMode barriers to
instruction scheduling.  Move the entire pass after the machine
instruction scheduler and make changes so pass is correct for
non-SSA operation.  These changes should leave the pass still
usable pre-scheduler, although tests have be updated to reflect
post-scheduler results.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D88081
2020-10-27 10:25:53 +09:00
Amy Kwan 803cc3aff2 [PowerPC] Implement Set Boolean Condition Instructions
This patch implements the set boolean condition instructions introduced in
POWER10.

The set boolean condition instructions (set[n]bc[r]) are used during
the following situations:
- sign/zero/any extending i1 to an i32 or i64,
- reg+reg, reg+imm or floating point comparisons being sign/zero extended to i32 or i64,
- spilling CR bits (using the setnbc instruction)

Differential Revision: https://reviews.llvm.org/D87705
2020-10-26 18:42:51 -05:00
Stanislav Mekhanoshin d176e13ca5 Fixed release build after D89170 2020-10-26 16:00:57 -07:00
Stanislav Mekhanoshin 038d884a50 [AMDGPU] Use flat scratch instructions where available
The support is disabled by default. So far there is instruction
selection, spilling, and frame elimination. It also changes SP
from unswizzled to swizzled as used by flat scratch instructions,
so it cannot be mixed with MUBUF stack access.

At the very least missing:

- GlobalISel;
- Some optimizations in frame elimination in between vector
  and scalar ALU;
- It shall finally allow to always materialize frame index
  as an SGPR, but that is not implemented and frame elimination
  cannot handle it yet;
- Unaligned and/or multidword flat scratch shall work, but it
  is legalized now for MUBUF;
- Operand folding cannot optimize FI like with MUBUF yet;
- It will need scaling the value of the SP/FP in the DWARF
  expression to recover the unswizzled scratch address;

Differential Revision: https://reviews.llvm.org/D89170
2020-10-26 14:40:42 -07:00
Evgeny Leviant a28388f95b [ARM][SchedModels] Move IsLDMBaseRegInListPred to ARMSchedule.td. NFC
This predicate is not specific to cortex-a57 and can be used in other processor
models as well.
2020-10-26 22:31:41 +03:00
Stanislav Mekhanoshin ad8131bb03 [AMDGPU] Fix VC warning about singed/unsigned comparison. NFC.
This is the warning reported in https://reviews.llvm.org/D89599
2020-10-26 11:55:57 -07:00
Evgeny Leviant e74f66125e [ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D90150
2020-10-26 20:22:41 +03:00
Evgeny Leviant a877bda397 Fix issue in cortex-a57 sched model
Differential revision: https://reviews.llvm.org/D90152
2020-10-26 20:16:40 +03:00
Benjamin Kramer b777d30496 [AMDGPU] Avoid unused variable warning in Release builds. NFC.
SIRegisterInfo.cpp:480:19: error: unused variable 'SOffset'
2020-10-26 18:11:57 +01:00
Kazushi (Jam) Marukawa 9d0db405b5 [VE] Add vector shift instructions
Add VSLL/VSLD/VSRL/VSLA/VSLAX/VSRA/VSRAX/VSFA instructionss.  Add
additonal AsmParser for VSLD special operand.  Also add regression
tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90143
2020-10-27 00:30:27 +09:00
Kazushi (Jam) Marukawa 83cb423c6e [VE] Add vector logical instructions
Add VAND/VOR/VXOE/VEQV/VLDZ/VPCNT/VBRV/VSEQ instrucitons and regression
tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90141
2020-10-27 00:29:33 +09:00
Kazushi (Jam) Marukawa cfefef50c1 [VE] Support atomic store
Support atomic store instructions and add a regression test.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90137
2020-10-27 00:28:11 +09:00
Jay Foad 0ca4124798 [AMDGPU] Make more use of printNamedBit in AMDGPUInstPrinter. NFC. 2020-10-26 14:03:35 +00:00
Kazushi (Jam) Marukawa 8aa60f67dc [VE] Add vector comparison and min/max
Add VCMP/VCPS/VCPX/VCMS/VCMX vector instructions.  Also add regression
tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89643
2020-10-26 18:32:04 +09:00
Kazushi (Jam) Marukawa 0acf700243 [VE] Add integer arithmetic vector instructions
Add VADD/VADS/VADX/VSUB/VSBS/VSBX/VMPY/VMPS/VMPX/VMPD/VDIV/VDVS/VDVX
instructions.  Also add regression tests.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89642
2020-10-26 18:30:11 +09:00
Sebastian Neubauer a094b4fa4b [AMDGPU] Emit new pal metadata by default
If no pal metadata is given, default to the msgpack format instead of
the legacy metadata. This makes tests better readable.

Differential Revision: https://reviews.llvm.org/D90035
2020-10-26 10:16:17 +01:00
Evgeny Leviant a95ce5f65f [ARM][SchedModels] Rename and generalize predicate. NFC 2020-10-26 12:14:55 +03:00
Kazushi (Jam) Marukawa f32992ad24 [VE] Support atomic load
Support atomic load instruction and add a regression test.
VE uses release consitency, so need to insert fence around
atomic instructions.  This patch enable AtomicExpandPass
and use emitLeadingFence and emitTrailingFence mechanism
for such purpose.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90135
2020-10-26 18:02:45 +09:00
Evgeny Leviant 99b2756517 [ARM][SchedModels] Get rid of IsLdrAm2ScaledPred
Differential revision: https://reviews.llvm.org/D90024
2020-10-26 12:01:39 +03:00
Evgeny Leviant a4fc18e641 [ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D90029
2020-10-26 11:54:08 +03:00
Evgeny Leviant d613e39d52 [ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D90045
2020-10-26 11:43:02 +03:00
David Green 61bc18de0b [Schedule] Add a MultiHazardRecognizer
This adds a MultiHazardRecognizer and starts to make use of it in the
ARM backend. The idea of the class is to allow multiple independent
hazard recognizers to be added to a single base MultiHazardRecognizer,
allowing them to all work in parallel without requiring them to be
chained into subclasses. They can then be added or not based on cpu or
subtarget features, which will become useful in the ARM backend once
more hazard recognizers are being used for various things.

This also renames ARMHazardRecognizer to ARMHazardRecognizerFPMLx in the
process, to more clearly explain what that recognizer is designed for.

Differential Revision: https://reviews.llvm.org/D72939
2020-10-26 08:06:17 +00:00
Kazushi (Jam) Marukawa 52f03fe115 [VE] Support atomic fence
Support atomic fence instruction and add a regression test.
Add MEMBARRIER pseudo insturction also to use it as a barrier
against to the compiler optimizations.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D90112
2020-10-26 17:03:09 +09:00
Christudasan Devadasan 5a061041ec [AMDGPU] Avoid offset register in MUBUF for direct stack object accesses
We use an absolute address for stack objects and
it would be necessary to have a constant 0 for soffset field.

Fixes: SWDEV-228562

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89234
2020-10-26 11:08:37 +05:30
Craig Topper 82974e0114 [X86] Don't disassemble wbinvd with 0xf2 or 0x66 prefix.
The 0xf3 prefix has been defined as wbnoinvd on Icelake Server. So
the prefix isn't ignored by the CPU. AMD documentation suggests that
wbnoinvd is treated as wbinvd on older processors. Intel documentation
is not clear. Perhaps 0xf2 and 0x66 are treated the same, but its
not documented.

This patch changes TB to PS in the td file so 0xf2 and 0x66 will
be treated as errors. This matches versions of objdump after
wbnoinvd was added.
2020-10-25 20:56:01 -07:00
Liu, Chen3 180548c5c7 [X86] VEX/EVEX prefix doesn't work for inline assembly.
For now, we lost the encoding information if we using inline assembly.
The encoding for the inline assembly will keep default even if we add
the vex/evex prefix.

Differential Revision: https://reviews.llvm.org/D90009
2020-10-26 08:37:45 +08:00
Craig Topper 63ba82ed00 [X86] Use TargetConstant for immediates for VASTART_SAVE_XMM_REGS. 2020-10-25 12:52:56 -07:00
Craig Topper 2ed16aa66f [X86] Use TargetConstant instead of Constant for operands to X86vaarg64. 2020-10-25 12:24:59 -07:00
Craig Topper a222d832d5 [X86] Use TargetConstant for FPDiff with X86::TC_RETURN.
It's required to be a constant and can never be in a register so
make it explicit.
2020-10-25 00:29:11 -07:00
Fangrui Song f04d92af94 [X86] Produce R_X86_64_GOTPCRELX for test/binop instructions (MOV32rm/TEST32rm/...) when -Wa,-mrelax-relocations=yes is enabled
We have been producing R_X86_64_REX_GOTPCRELX (MOV64rm/TEST64rm/...) and
R_X86_64_GOTPCRELX for CALL64m/JMP64m without the REX prefix since 2016 (to be
consistent with GNU as), but not for MOV32rm/TEST32rm/...
2020-10-24 15:14:17 -07:00
Fangrui Song d5adadb3a5 [AArch64][GlobalISel] Fix -Wunused-variable. NFC 2020-10-24 12:47:11 -07:00
Benjamin Kramer 39a0d6889d [X86] Add a stub for Intel's alderlake.
No scheduling, no autodetection.
2020-10-24 19:01:22 +02:00
Benjamin Kramer bd2cf96c09 [X86] Add a stub for znver3 based on the little public information there is in AMD's manuals
No scheduling, no autodetection. Just enough so -march=znver3 works.
2020-10-24 19:01:22 +02:00
dfukalov 9068c20965 [AMDGPU][CostModel] Refine cost model for half- and quarter-rate instructions.
1. Throughput and codesize costs estimations was separated and updated.
2. Updated fdiv cost estimation for different cases.
3. Added scalarization processing for types that are treated as !isSimple() to
improve codesize estimation in getArithmeticInstrCost() and
getArithmeticInstrCost(). The code was borrowed from TCK_RecipThroughput path
of base implementation.

Next step is unify scalarization part in base class that is currently works for
TCK_RecipThroughput path only.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89973
2020-10-24 19:53:08 +03:00
David Green 92205bf122 [ARM] Remove some dead code. NFC 2020-10-24 17:22:49 +01:00
Simon Pilgrim ce356e1546 [DAG] Add BuildVectorSDNode::getRepeatedSequence helper to recognise multi-element splat patterns
Replace the X86 specific isSplatZeroExtended helper with a generic BuildVectorSDNode method.

I've just used this to simplify the X86ISD::BROADCASTM lowering so far (and remove isSplatZeroExtended), but we should be able to use this in more places to lower to complex broadcast patterns.

Differential Revision: https://reviews.llvm.org/D87930
2020-10-24 12:23:09 +01:00
Jonas Paulsson 7c026a83ee [SystemZ] Define MaxInstLength to have the value of 6.
This value had the default value of 4 which caused branch relaxation to fail.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D90065
2020-10-24 09:19:34 +02:00
Krzysztof Parzyszek 1b5baa42bc [Hexagon] Handle selection between HVX vector predicates
Make sure that (select i1 q0 q1) is handled properly.
2020-10-23 18:22:03 -05:00
Evandro Menezes fe9a7d9627 [RISCV] Use the commercial name for scheduling model (NFC)
Use the commercial name for the scheduling model for the SiFive 7 Series.
2020-10-23 16:33:27 -05:00
Cameron McInally a1cc274cb3 [SVE] Lower fixed length VECREDUCE_SEQ_FADD operation
Differential Revision: https://reviews.llvm.org/D89162
2020-10-23 16:24:02 -05:00
Stanislav Mekhanoshin 2e64ad9494 [AMDGPU] Fixed isLegalRegOperand() with physregs
This does not change anything at the moment, but needed for
D89170. In that change I am probing a physical SGPR to see if
it is legal. RC is SReg_32, but DRC for scratch instructions
is SReg_32_XEXEC_HI and test fails.

That is sufficient just to check if DRC contains a register
here in case of physreg. Physregs also do not use subregs
so the subreg handling below is irrelevant for these.

Differential Revision: https://reviews.llvm.org/D90064
2020-10-23 11:33:34 -07:00
Baptiste Saleil edb27912a3 [PowerPC] Add intrinsics for MMA
This patch adds support for MMA intrinsics.

Authored by: Baptiste Saleil

Reviewed By: #powerpc, bsaleil, amyk

Differential Revision: https://reviews.llvm.org/D89345
2020-10-23 13:16:02 -05:00
Amara Emerson 0f0fd383b4 [AArch64][GlobalISel] Introduce a new post-isel optimization pass.
There are two optimizations here:

1. Consider the following code:
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel1:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
 %sub:gpr32 = SUBSWrr %_, %_, implicit-def $nzcv
 FCMPSrr %0, %1, implicit-def $nzcv
 %sel2:gpr32 = CSELWr %_, %_, 12, implicit $nzcv
This kind of code where we have 2 FCMPs each feeding a CSEL can happen
when we have a single IR fcmp being used by two selects. During selection,
to ensure that there can be no clobbering of nzcv between the fcmp and the
csel, we have to generate an fcmp immediately before each csel is
selected.

However, often we can essentially CSE these together later in MachineCSE.
This doesn't work though if there are unrelated flag-setting instructions
in between the two FCMPs. In this case, the SUBS defines NZCV
but it doesn't have any users, being overwritten by the second FCMP.

Our solution here is to try to convert flag setting operations between
a interval of identical FCMPs, so that CSE will be able to eliminate one.

2. SelectionDAG imported patterns for arithmetic ops currently select the
flag-setting ops for CSE reasons, and add the implicit-def $nzcv operand
to those instructions. However if those impdef operands are not marked as
dead, the peephole optimizations are not able to optimize them into non-flag
setting variants. The optimization here is to find these dead imp-defs and
mark them as such.

This pass is only enabled when optimizations are enabled.

Differential Revision: https://reviews.llvm.org/D89415
2020-10-23 10:18:36 -07:00
Huihui Zhang 1e113c078a [AArch64][SVE] Fix umin/umax lowering to handle out of range imm.
Immediate must be in an integer range [0,255] for umin/umax instruction.
Extend pattern matching helper SelectSVEArithImm() to take in value type
bitwidth when checking immediate value is in range or not.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D89831
2020-10-23 09:42:56 -07:00
Victor Huang 7a74bb899a [PowerPC] Fix the Predicates for enabling pcrelative-memops and PLXVP/PSTXVP definitions
In this patch, Predicates fix added for the following:
* disable prefix-instrs will disable pcrelative-memops
* set two predicates PairedVectorMemops and PrefixInstrs for PLXVP/PSTXVP definitions

Differential Revision: https://reviews.llvm.org/D89727
Reviewed by: amyk, steven.zhang
2020-10-23 11:33:20 -05:00
vpykhtin 00255f4192 [AMDGPU] Fix access beyond the end of the basic block in execMayBeModifiedBeforeAnyUse.
I was wrong in thinking that MRI.use_instructions return unique instructions and mislead Jay in his previous patch D64393.

First loop counted more instructions than it was in reality and the second loop went beyond the basic block with that counter.

I used Jay's previous code that relied on MRI.use_operands to constrain the number of instructions to check among.
modifiesRegister is inlined to reduce the number of passes over instruction operands and added assert on BB end boundary.

Differential Revision: https://reviews.llvm.org/D89386
2020-10-23 19:17:48 +03:00
Paulo Matos 69e2797eae [WebAssembly] Implementation of (most) table instructions
Implementation of instructions table.get, table.set, table.grow,
table.size, table.fill, table.copy.

Missing instructions are table.init and elem.drop as they deal with
element sections which are not yet implemented.

Added more tests to tables.s

Differential Revision: https://reviews.llvm.org/D89797
2020-10-23 08:42:54 -07:00
Jay Foad 958130dfda [AMDGPU] Add simplification/combines for llvm.amdgcn.fma.legacy
This follows on from D89558 which added the new intrinsic and D88955
which added similar combines for llvm.amdgcn.fmul.legacy.

Differential Revision: https://reviews.llvm.org/D90028
2020-10-23 16:16:13 +01:00
Paul C. Anagnostopoulos 876af264c1 [TableGen] Change !getop and !setop to !getdagop and !setdagop.
Differential Revision: https://reviews.llvm.org/D89814
2020-10-23 10:36:05 -04:00
Matt Arsenault 8a59d4b654 AMDGPU: Don't query for TII in TII 2020-10-23 10:34:24 -04:00
Matt Arsenault d61996473d AMDGPU: Increase branch size estimate with offset bug
This will be relaxed to insert a nop if the offset hits the bad value,
so over estimate branch instruction sizes.
2020-10-23 10:34:24 -04:00
Chen Zheng 1e0b6c1df0 [LSR] ignore profitable chain when reg num is not major cost.
Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D89665
2020-10-23 09:35:48 -04:00
Simon Pilgrim 936ef89ebe [X86] lowerShuffleWithPERMV - use MVT::changeTypeToInteger helper. NFCI. 2020-10-23 12:35:27 +01:00
Evgeny Leviant cb86522c94 [ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC
Differential revision: https://reviews.llvm.org/D90017
2020-10-23 14:27:49 +03:00
Florian Hahn 0fcc6f7a76 [AArch64] Implement getIntrinsicInstrCost, handle min/max intrinsics.
This patch adds a specialized implementation of getIntrinsicInstrCost
and add initial cost-modeling for min/max vector intrinsics.

AArch64 NEON support umin/smin/umax/smax for vectors
<8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>.
Notably, it does not support vectors with i64 elements.

This change by itself should have very little impact on codegen, but in
follow-up patches I plan to teach the vectorizers to consider using
those intrinsics on platforms where it is profitable, e.g. because there
is no general 'select'-like instruction.

The current cost returned should be better for throughput, latency and size.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D89953
2020-10-23 11:32:42 +01:00
Jay Foad 86a480e9ce [AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy
Differential Revision: https://reviews.llvm.org/D88955
2020-10-23 09:31:00 +01:00
Evgeny Leviant 7a78073be7 [ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89957
2020-10-23 10:33:20 +03:00
Jessica Paquette 19dc9c9780 [AArch64][GlobalISel] Move imm adjustment for G_ICMP to post-legalizer lowering
Move the code which adjusts the immediate/predicate on a G_ICMP to
AArch64PostLegalizerLowering.

This

- Reduces the number of places we need to test for optimized compares in the
selector. We know that the compare should have been simplified by the time it
hits the selector, so we can avoid testing this in selects, brconds, etc.

- Allows us to potentially fold more compares (previously, this optimization
was only done after calling `tryFoldCompare`, this may allow us to hit some more
TST cases)

- Simplifies the selection code in `emitIntegerCompare` significantly; we can
just use an emitSUBS function.

- Allows us to avoid checking that the predicate has been updated after
`emitIntegerCompare`.

Also add a utility header file for things that may be useful in the selector
and various combiners. No need for an implementation file at this point, since
it's just one constexpr function for now. I've run into a couple cases where
having one of these would be handy, so might as well add it here. There are
a couple functions in the selector that can probably be factored out into
here.

Differential Revision: https://reviews.llvm.org/D89823
2020-10-22 15:27:36 -07:00
Jessica Paquette 147b9497e7 [AArch64][GlobalISel] Split post-legalizer combiner to allow for lowering at -O0
There are a lot of combines in AArch64PostLegalizerCombiner which exist to
facilitate instruction matching in the selector. (E.g. matching for G_ZIP and
other shuffle vector pseudos)

It still makes sense to select these instructions at -O0.

Matching earlier in a combiner can reduce complexity in the selector
significantly. For example, a good portion of our selection code for compares
would be a lot easier to represent in a combine.

This patch moves matching combines into a "AArch64PostLegalizerLowering"
combiner which runs at all optimization levels.

Also, while we're here, improve the documentation for the
AArch64PostLegalizerCombiner, and fix up the filepath in its file comment.

And also add a 'r' which somehow got dropped from a bunch of function names.

https://reviews.llvm.org/D89820
2020-10-22 14:43:25 -07:00
Tim Corringham 3c1273d737 [AMDGPU] Add amdgpu specific loop threshold metadata
Add new loop metadata amdgpu.loop.unroll.threshold to allow the initial AMDGPU
specific unroll threshold value to be specified on a loop by loop basis.

The intention is to be able to to allow more nuanced hints, e.g. specifying a
low threshold value to indicate that a loop may be unrolled if cheap enough
rather than using the all or nothing llvm.loop.unroll.disable metadata.

Differential Revision: https://reviews.llvm.org/D84779
2020-10-22 17:21:32 +01:00
Mircea Trofin e24537d48f [NFC][MC] Use MCRegister for ReachingDefAnalysis APIs
Also updated the users of the APIs; and a drive-by small change to
RDFRegister.cpp

Differential Revision: https://reviews.llvm.org/D89912
2020-10-22 08:47:35 -07:00
Piotr Sobczak 7ae0033ca8 [AMDGPU] Fix expansion of i16 MULH
This commit marks i16 MULH as expand in AMDGPU backend,
which is necessary after the refactoring in D80485.

Differential Revision: https://reviews.llvm.org/D89965
2020-10-22 17:05:06 +02:00
Evgeny Leviant ed6a91f456 [ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89939
2020-10-22 18:03:01 +03:00
Simon Pilgrim 2692978050 [X86] X86AsmParser - make methods const where possible. NFCI.
Reported by cppcheck
2020-10-22 15:55:06 +01:00
Simon Pilgrim 091b18ba81 [X86] Return const& in IntelExprStateMachine::getIdentifierInfo(). NFCI.
Avoid unnecessary copy in X86AsmParser::ParseIntelOperand
2020-10-22 15:55:06 +01:00
Matt Arsenault d5c0561667 AMDGPU: Fix not always reserving VGPRs used for SGPR spilling
The VGPRs used for SGPR spills need to be reserved, even if we aren't
speculatively reserving one.

This was broken by 117e5609e9.
2020-10-22 10:19:19 -04:00
Matt Arsenault d3bcfe2a36 AMDGPU: Implement getNoPreservedMask
We don't support funclets for exception handling and I hit this when
manually reducing MIR.
2020-10-22 10:17:31 -04:00
Simon Pilgrim 794dc7ad26 [CodeGen] Split MVT::changeTypeToInteger() functionality from EVT::changeTypeToInteger().
Add the MVT equivalent handling for EVT changeTypeToInteger/changeVectorElementType/changeVectorElementTypeToInteger.

All the SimpleVT code already exists inside the EVT equivalents, but by splitting this out we can use these directly inside MVT types without converting to/from EVT.
2020-10-22 14:27:42 +01:00
Jeremy Morse d73275993b [DebugInstrRef] Substitute debug value numbers to handle optimizations
This patch touches two optimizations, TwoAddressInstruction and X86's
FixupLEAs pass, both of which optimize by re-creating instructions. For
LEAs, various bits of arithmetic are better represented as LEAs on X86,
while TwoAddressInstruction sometimes converts instrs into three address
instructions if it's profitable.

For debug instruction referencing, both of these require substitutions to
be created -- the old instruction number must be pointed to the new
instruction number, as illustrated in the added test. If this isn't done,
any variable locations based on the optimized instruction are
conservatively dropped.

Differential Revision: https://reviews.llvm.org/D85756
2020-10-22 13:01:03 +01:00
Tianqing Wang be39a6fe6f [X86] Add User Interrupts(UINTR) instructions
For more details about these instructions, please refer to the latest
ISE document:
https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D89301
2020-10-22 17:33:07 +08:00
Tony 8e8cc587a5 [NFC][AMDGPU] Reorder SIMemoryLegalizer functions to be consistent
- Make the SIMemoryLegalizer insertAcquire function be in the same
  order for each target to be consistent.

Differential Revision: https://reviews.llvm.org/D89880
2020-10-22 05:39:18 +00:00
Xiang1 Zhang 7c3fea7721 [X86] Support customizing stack protector guard
Reviewed By: nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D88631
2020-10-22 10:08:14 +08:00
Craig Topper 9e884169a2 [FPEnv][X86][SystemZ] Use different algorithms for i64->double uint_to_fp under strictfp to avoid producing -0.0 when rounding toward negative infinity
Some of our conversion algorithms produce -0.0 when converting unsigned i64 to double when the rounding mode is round toward negative. This switches them to other algorithms that don't have this problem. Since it is undefined behavior to change rounding mode with the non-strict nodes, this patch only changes the behavior for strict nodes.

There are still problems with unsigned i32 conversions too which I'll try to fix in another patch.

Fixes part of PR47393

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87115
2020-10-21 18:12:54 -07:00
Gaurav Jain 4634ad6c0b [NFC] Set return type of getStackPointerRegisterToSaveRestore to Register
Differential Revision: https://reviews.llvm.org/D89858
2020-10-21 16:19:38 -07:00
Evgeny Leviant bf9edcb6fd [ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89876
2020-10-21 20:49:10 +03:00
Stanislav Mekhanoshin 611959f004 [AMDGPU] Fixed v_swap_b32 match
1. Fixed liveness issue with implicit kills.
2. Fixed potential problem with an indirect mov.

Fixes: SWDEV-256848

Differential Revision: https://reviews.llvm.org/D89599
2020-10-21 10:14:24 -07:00
Joe Nash f6d7832f4c [AMDGPU] Refactor SOPC & SOPP .td for extension
We use the Real vs Pseudo instruction abstraction for other
types of instructions to facilitate changes in opcode
between gpu generations.
This patch introduces that abstraction to SOPC and SOPP.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89738

Change-Id: I59d53c2c7058b49d05b60350f4062a9b542d3138
2020-10-21 12:35:52 -04:00
Matt Arsenault 1ed4caff1d AMDGPU: Lower the threshold reported for maximum stack size exceeded
Check the actual maximum supported stack size for a kernel.
2020-10-21 12:06:27 -04:00
Matt Arsenault 53c43431bc AMDGPU: Propagate amdgpu-flat-work-group-size attributes
Fixes being overly conservative with the register counts in called
functions. This should try to do a conservative range merge, but for
now just clone.

Also fix not being able to functionally run the pass standalone.
2020-10-21 12:06:24 -04:00
Paul C. Anagnostopoulos dfd6b69e01 [ARM] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans
Differential Revision: https://reviews.llvm.org/D89822
2020-10-21 09:52:45 -04:00
Jonas Paulsson 1606755da0 [SystemZ] Mark unsaved argument R6 as live throughout function.
For historical reasons, the R6 register is a callee-saved argument
register. This means that if it is used to pass an argument to a function
that does not clobber it, it is live throughout the function.

This patch makes sure that in this special case any kill flags of it are
removed.

Review: Ulrich Weigand, Eli Friedman

Differential Revision: https://reviews.llvm.org/D89451
2020-10-21 14:38:59 +02:00
Nicholas Guy 9a2d2bedb7 Add "SkipDead" parameter to TargetInstrInfo::DefinesPredicate
Some instructions may be removable through processes such as IfConversion,
however DefinesPredicate can not be made aware of when this should be considered.
This parameter allows DefinesPredicate to distinguish these removable instructions
on a per-call basis, allowing for more fine-grained control from processes like
ifConversion.

Renames DefinesPredicate to ClobbersPredicate, to better reflect it's purpose

Differential Revision: https://reviews.llvm.org/D88494
2020-10-21 11:52:47 +01:00
Sebastian Neubauer 5290f50e44 [AMDGPU] Fix off by one in assert
D89217 did not subtract one when accessing SubRegFromChannelTable in one
place.

Differential Revision: https://reviews.llvm.org/D89804
2020-10-21 12:37:52 +02:00
Jay Foad f6a5699c6c [AMDGPU][TableGen] Make more use of !ne !not !and !or. NFC. 2020-10-21 09:56:43 +01:00
Craig Topper d4d0b41a82 [X86] Remove period from end of error message in assembler
Addresses post-commit feedback from D89837.
2020-10-21 00:43:23 -07:00
Craig Topper 79a69f558f [X86] Error on using h-registers with REX prefix in the assembler instead of leaving it to a fatal error in the encoder.
Using a fatal error is bad for user experience.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D89837
2020-10-20 21:35:44 -07:00
Carl Ritson 324a15cead [AMDGPU][NFC] Fix missing size in comment 2020-10-21 11:38:21 +09:00
Austin Kerbow ebdcef20ce [AMDGPU] Avoid inserting noops during scheduling
Passes that are run after the post-RA scheduler may insert instructions like
waitcnt which eliminate the need for certain noops. After this patch the
scheduler is still aware of possible latency from hazards but noops will
not be inserted until the dedicated hazard recognizer pass is run.

Depends on D89753.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D89754
2020-10-20 17:11:36 -07:00
Austin Kerbow 37d907899f [HazardRec] Allow inserting multiple wait-states simultaneously
If a target can encode multiple wait-states into a noop allow emitting such
instructions directly.

Reviewed By: rampitec, dmgreen

Differential Revision: https://reviews.llvm.org/D89753
2020-10-20 17:03:47 -07:00
Tony 1bc7bfffdb [AMDGPU] Optimize waitcnt insertion for flat memory operations
Change waitcnt insertion to check the memory operand tokens to see if
flat memory operations access VMEM in the same way it does to check if
accessing LDS. This avoids adding waitcnt for counters for address
spaces that are not accessed.

In addition, only generate the pessimistic waitcnt 0 if a flat memory
operation appears to access both VMEM and LDS.

This benefits flat memory operations that explicitly specify the
address space as GLOBAL or LOCAL.

Differential Revision: https://reviews.llvm.org/D89618
2020-10-20 22:55:12 +00:00
Craig Topper 1298252f80 [X86] Move 'int $3' -> 'int3' handling in the assembler to processInstruction.
Instead of handling before parsing, just fix it after parsing.
2020-10-20 15:22:00 -07:00
Craig Topper 702aae368a [X86] Move 's{hr,ar,hl} , <op>' to 'shift <op>' optimization in the assembler into processInstruction.
Instead of detecting the mnemonic and hacking the operands before
parsing. Just fix it up after parsing.
2020-10-20 15:20:46 -07:00
Paul C. Anagnostopoulos 332ff48dee [AMDGPU] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans
Differential Revision: https://reviews.llvm.org/D89796
2020-10-20 16:23:15 -04:00
vnalamot 89f7ccea6f [AMDGPU] Remove getAllVGPR32() which cannot handle Accum VGPRs properly
Remove getAllVGPR32() interface and update the SGPR spill code to use
a proper method to get the relevant VGPR registers list.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D89806
2020-10-20 23:15:24 +05:30
Jay Foad 4913e3627c [AMDGPU] Remove unused declaration. NFC.
The implementation of this method was removed in D89706.
2020-10-20 16:31:42 +01:00
Michael Liao 2a0e4d1c01 [amdgpu] Enhance AMDGPU AA.
- In general, a generic point may alias to pointers in all other address
  spaces. However, for certain cases enforced by the programming model,
  we may found a generic point won't alias to pointers to local objects.
  * When a generic pointer is loaded from the constant address space, it
    could only be a pointer to the GLOBAL or CONSTANT address space.
    Thus, it won't alias to pointers to the PRIVATE or LOCAL address
    space.
  * When a generic pointer is passed as a kernel argument, it also could
    only be a pointer to the GLOBAL or CONSTANT address space. Thus, it
    also won't alias to pointers to the PRIVATE or LOCAL address space.

Differential Revision: https://reviews.llvm.org/D89525
2020-10-20 09:54:12 -04:00
Carl Ritson be2afbd019 [AMDGPU] Remove fix up operand from SI_ELSE
Remove immediate operand from SI_ELSE which indicates if EXEC has
been modified.  Instead always emit code that handles EXEC and
remove unnecessary instructions during pre-RA optimisation.

This facilitates passes (i.e. SIWholeQuadMode) adding exec mask
manipulation post control flow lowering, and pre control flow
lower passes do not need to be aware of SI_ELSE handling.

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D89644
2020-10-20 19:15:21 +09:00
Carl Ritson 6aabbeadae [AMDGPU][NFC] Tidy SIOptimizeExecMaskingPreRA for extensibility
Remove duplicate code and move things around to make it easier to
add additional optimisations to the pass.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D89619
2020-10-20 17:22:43 +09:00
Ulrich Weigand c299f3555d [SystemZ] Fix disassembler crashes
The "Size" value returned by SystemZDisassembler::getInstruction is
used by common code even in the case where the routine returns
failure.  If that Size value exceeds the number of bytes remaining
in the section, that could cause disassembler crashes.

Fixed by never returning more than the number of bytes remaining.
2020-10-20 10:21:42 +02:00
Evgeny Leviant 991e86156c [ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate
Differential revision: https://reviews.llvm.org/D89460
2020-10-20 11:14:21 +03:00
David Green 6dcbc323fd Revert "[ARM][LowOverheadLoops] Adjust Start insertion."
This reverts commit 38f625d0d1.

This commit contains some holes in its logic and has been causing
issues since it was commited. The idea sounds OK but some cases were not
handled correctly. Instead of trying to fix that up later it is probably
simpler to revert it and work to reimplement it in a more reliable way.
2020-10-20 08:55:21 +01:00
Luqman Aden 51892a42da [COFF][ARM] Fix CodeView for Windows on 32bit ARM targets.
Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets.

Reviewed By: compnerd

Differential Revision: https://reviews.llvm.org/D89622
2020-10-19 22:16:16 -07:00
Wang, Pengfei 3a85472af2 [X86] Fix assert fail when element type is i1.
extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail.
Since we don't really handle vxi1 cases in below code, we can just return from here.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89096
2020-10-20 09:26:32 +08:00
Stanislav Mekhanoshin 6ddadf9901 [AMDGPU] flat scratch ST addressing mode on gfx10
GFX10 enables third addressing mode for flat scratch instructions,
an ST mode. In that mode both register operands are omitted and
only swizzled offset is used in addition to flat_scratch base.

Differential Revision: https://reviews.llvm.org/D89501
2020-10-19 15:29:52 -07:00
Sergei Trofimovich 1eb812e06d [VE] Fix initializer visibility
Before the change attempt to link libLTO.so against shared
LLVM library failed as:

```
[ 76%] Linking CXX shared library ../../lib/libLTO.so
... /usr/bin/cmake -E cmake_link_script CMakeFiles/LTO.dir/link.txt --verbose=1
c++ -o ...libLTO.so.12git ...ibLLVM-12git.so
ld: CMakeFiles/LTO.dir/lto.cpp.o: in function `llvm::InitializeAllTargetInfos()':
include/llvm/Config/Targets.def:31: undefined reference to `LLVMInitializeVETargetInfo'
```

It happens because on linux llvm build system sets default
symbol visibility to "hidden". The fix is to set visibility
back to "default" for exported APIs with LLVM_EXTERNAL_VISIBILITY.

Bug: https://bugs.llvm.org/show_bug.cgi?id=47847

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D89633
2020-10-19 22:54:41 +01:00
Evgenii Stepanov 188a7d6710 Add alloca size threshold for StackTagging initializer merging.
Summary:
Initializer merging generates pretty inefficient code for large allocas
that also happens to trigger an exponential algorithm somewhere in
Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867.

This change adds an upper limit for the alloca size. The default limit
is selected such that worst case size of memtag-generated code is
similar to non-memtag (but because of the ISA quirks, this case is
realized at the different value of alloca size, ex. memset inlining
triggers at sizes below 512, but stack tagging instructions are 2x
shorter, so limit is approx. 256).

We could try harder to emit more compact code with initializer merging,
but that would only affect large, sparsely initialized allocas, and
those are doing fine already.

Reviewers: vitalybuka, pcc

Subscribers: llvm-commits
2020-10-19 13:44:07 -07:00
Craig Topper e28376ec28 [X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table.
We have pseudo instructions we use for bitcasts between these types.
We have them in the load folding table, but not the store folding
table. This adds them there so they can be used for stack spills.

I added an exact size check so that we don't fold when the stack slot
is larger than the GPR. Otherwise the upper bits in the stack slot
would be garbage. That would be fine for Eli's test case in PR47874,
but I'm not sure its safe in general.

A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int
pseudo instructions to use FR32 as the second source register class
instead of VR128. That will keep the coalescer from promoting the
register class of the bitcast instruction which will make the stack
slot 4 bytes instead of 16 bytes.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D89656
2020-10-19 12:53:14 -07:00
Tony 6be9c7d2dc [AMDGPU] Correct comment typo in SIMemoryLegaliizer.cpp 2020-10-19 18:50:28 +00:00
Jay Foad 56f6bf1a8d [AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24
These were introduced in r279902 on the grounds that using separate
MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple
uses of the operands, which would prevent SimplifyDemandedBits from
simplifying the operands.

This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations"

No functional change intended. At least it has no effect on lit tests.

Differential Revision: https://reviews.llvm.org/D89706
2020-10-19 19:15:34 +01:00
Amy Kwan 6a946fd06f [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead.
MULH is often expanded on targets.
This patch removes the isMulhCheaperThanMulShift hook and uses
isOperationLegalOrCustom instead.

Differential Revision: https://reviews.llvm.org/D80485
2020-10-19 12:23:04 -05:00
Tony 151e297034 [AMDGPU] Simplify cumode handling in SIMemoryLegalizer
Differential Revision: https://reviews.llvm.org/D89663
2020-10-19 17:13:45 +00:00
Paul C. Anagnostopoulos 2871c6c93f [Aarch64] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans.
Differential Revision: https://reviews.llvm.org/D89551
2020-10-19 10:33:55 -04:00