Commit Graph

34165 Commits

Amara Emerson 19ff00dab8 [AArch64] Fix CollectLOH creating an AdrpAdd LOH when there's a live used reg
between the two instructions.

If there's a pattern like:
$xA = ADRP foo @PAGE
[some killing use of reg $xB]
$xB = ADDXri $xA, 0, @PAGEOFF

CollectLOH would create an AdrpAdd LOH that resulted in the linker optimizing
this sequence into:
$xB = ADR foo
[some killing use of reg $xB]
... and therefore clobbers the live $xB register that was used by the
instruction in between.

This was discovered by GlobalISel patch D78465, which broke up global variable
accesses into two pseudos, which in some cases could be moved apart.
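
As an illustration, a minimal C example (hypothetical, assuming a Darwin
AArch64 target and a locally defined global) that lowers to the adrp/add
pair an AdrpAdd LOH describes:

    static int g;

    int *addr_of_g(void) {
      /* lowers to: adrp x0, _g@PAGE ; add x0, x0, _g@PAGEOFF */
      return &g;
    }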

Differential Revision: https://reviews.llvm.org/D80834
2020-06-01 16:00:55 -07:00
Matt Arsenault 20793b2aef AMDGPU: Fix test in code directory 2020-06-01 13:26:51 -04:00
Matt Arsenault 7ad36491ca AMDGPU: Fix alignment for dynamic allocas
The alignment value also needs to be scaled by the wave size.
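
For illustration, a hypothetical C sketch of a dynamically sized,
over-aligned stack allocation that exercises this path:

    void use(char *);

    void f(int n) {
      /* dynamic alloca; the alignment argument is in bits (16 bytes here) */
      char *p = (char *)__builtin_alloca_with_align(n, 128);
      use(p);
    }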
2020-06-01 13:06:37 -04:00
Igor Kudrin cbec419b3e [DebugInfo] Separate fields with commas in headers of type units (3/3).
For most tables, we already use commas in headers. This set of patches
unifies dumping the remaining ones.

Differential Revision: https://reviews.llvm.org/D80806
2020-06-01 17:40:28 +07:00
Igor Kudrin 2a7af30482 [DebugInfo] Separate fields with commas in headers of compile units (2/3).
For most tables, we already use commas in headers. This set of patches
unifies dumping the remaining ones.

Differential Revision: https://reviews.llvm.org/D80806
2020-06-01 17:40:24 +07:00
Tim Northover dace8224f3 AArch64: materialize large stack offset into xzr correctly.
When a stack offset was too big to materialize in a single instruction, we were
trying to do it in stages:

    adds xD, sp, #imm
    adds xD, xD, #imm

Unfortunately, if xD is xzr then the second instruction doesn't exist and
wouldn't do what was needed if it did. Instead we can use a temporary register
for all but the last addition.
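
For context, an AArch64 add immediate encodes 12 bits (optionally shifted
left by 12), so a hypothetical C frame like the one below is large enough
to need a staged offset:

    void touch(volatile char *);

    void f(void) {
      volatile char big[1 << 25]; /* ~32 MB: offsets exceed one add imm */
      touch(&big[0]);
    }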
2020-06-01 09:30:05 +01:00
Li Rong Yi 3101601b54 [PowerPC] Exploit vabsd on P9
Summary: Exploit vabsd* for absolute difference of vectors on P9,
for example:
void foo (char *restrict p, char *restrict q, char *restrict t)
{
  for (int i = 0; i < 16; i++)
     t[i] = abs (p[i] - q[i]);
}
This case should be matched to the HW instruction vabsdub.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D80271
2020-06-01 02:30:27 +00:00
Craig Topper 8abe830093 [X86] Rewrite how X86PartialReduction finds candidates to consider optimizing.
Previously we walked the users of any vector binop looking for
more binops with the same opcode or phis that eventually ended up
in a reduction. While this is simple it also means visiting the
same nodes many times since we'll do a forward walk for each
BinaryOperator in the chain. It was also far more general than what
we have tests for or expect to see.

This patch replaces the algorithm with a new method that starts at
extract elements looking for a horizontal reduction. Once we find
a reduction we walk backwards through phis and adds to
collect leaves that we can consider for rewriting.

We only consider single-use adds and phis, except for a special
case where the Add is used by a phi that forms a loop back to the
Add; other single-use Adds are also included to support unrolled loops.

Ultimately, I want to narrow the Adds, Phis, and final reduction
based on the partial reduction we're doing. I still haven't
figured out exactly what that looks like yet. But restricting
the types of graphs we expect to handle seemed like a good first
step. As does having all the leaves and the reduction at once.
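
As an illustration, a hedged C sketch of the kind of horizontal reduction
this pass looks for: a SAD-style loop, which after vectorization ends in
an add chain feeding the extract elements mentioned above:

    #include <stdlib.h>

    int sad16(const unsigned char *a, const unsigned char *b) {
      int sum = 0;
      for (int i = 0; i < 16; i++)
        sum += abs(a[i] - b[i]); /* candidate for a psadbw-style reduction */
      return sum;
    }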

Differential Revision: https://reviews.llvm.org/D79971
2020-05-31 12:53:01 -07:00
Simon Pilgrim 22e50833e9 [X86][AVX] Reduce unary target shuffles width if the upper elements aren't demanded. 2020-05-31 20:19:24 +01:00
Simon Pilgrim 8f2f613a6e [X86][AVX] combineX86ShufflesRecursively - peekThroughOneUseBitcasts subvector before widening.
This matches what we do for the full sized vector ops at the start of combineX86ShufflesRecursively, and helps getFauxShuffleMask extract more INSERT_SUBVECTOR patterns.
2020-05-31 19:58:33 +01:00
Matt Arsenault 95f65a7c6c AArch64/GlobalISel: Fix incorrect ptrmask usage for alignment
I inverted the mask when I ported to the new form of G_PTRMASK in
8bc03d2168.

I don't think this really broke anything, since G_VASTART isn't
handled for types with an alignment higher than the stack alignment.
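
To illustrate the mask polarity at stake, a small C sketch of align-down
masking (names hypothetical):

    #include <stdint.h>

    void *align_down(void *p, uintptr_t align) { /* align: power of two */
      /* the correct mask keeps the high bits; the inverted form,
         (uintptr_t)p & (align - 1), yields only the misalignment */
      return (void *)((uintptr_t)p & ~(align - 1));
    }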
2020-05-31 10:56:55 -04:00
Simon Pilgrim 4a2673d79f [X86][AVX] Add SimplifyMultipleUseDemandedBits VBROADCAST handling to SimplifyDemandedVectorElts.
As suggested on D79987.
2020-05-31 14:20:15 +01:00
Simon Pilgrim 15b281d780 [X86][AVX] Add test case described in D79987 2020-05-31 13:51:00 +01:00
Kang Zhang bfdf9ef009 Revert "[NFC][PowerPC] Add a new case to test phi-node-elimination pass"
This case will fail on some machines which enable expensive-checks.

This reverts commit af3abbf7bd.
2020-05-31 09:24:21 +00:00
Kang Zhang af3abbf7bd [NFC][PowerPC] Add a new case to test phi-node-elimination pass 2020-05-31 08:05:27 +00:00
Jay Foad 2768edfff1 [AMDGPU] Propagate fast-math flags when lowering FSIN and FCOS
Differential Revision: https://reviews.llvm.org/D80813
2020-05-31 05:21:55 +01:00
Jay Foad d4751f3556 [AMDGPU] Precommit tests for D80813 2020-05-31 05:21:55 +01:00
Changpeng Fang 234eba90f4 AMDGPU: Add setTruncStoreAction for vector i64 types made legal recently
Reviewers:
  rampitec, arsenm

Differential Revision:
  https://reviews.llvm.org/D80853
2020-05-30 20:45:27 -07:00
Craig Topper af1accdd86 [X86] Teach computeKnownBitsForTargetNode that the upper half of X86ISD::MOVQ2DQ is all zero. 2020-05-30 19:47:07 -07:00
Craig Topper efc5857b0b [X86] Autogenerate complete checks. NFC 2020-05-30 19:47:07 -07:00
Craig Topper 07e8a780d8 [X86] Add pseudo instructions to use MULX with a single destination when the low result isn't used.
The instruction is defined to only produce the high result if both
destinations are the same. We can exploit this to avoid
unnecessarily clobbering a register.

In order to hide this from register allocation we use a pseudo
instruction and expand the result during MCInst creation.
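
A C sketch of the kind of code where only the high half of a widening
multiply is demanded, leaving the low MULX result dead (assumes a 64-bit
target with __int128 support):

    #include <stdint.h>

    uint64_t mulhi64(uint64_t a, uint64_t b) {
      /* only the high 64 bits are used; the low half is dead */
      return (uint64_t)(((unsigned __int128)a * b) >> 64);
    }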

Differential Revision: https://reviews.llvm.org/D80500
2020-05-30 16:01:01 -07:00
Philip Reames dfa82f8af4 [Tests] Convert last statepoint lowering tests to bundle format 2020-05-30 12:59:34 -07:00
Craig Topper c65c1d7893 [X86] Autogenerate complete checks. NFC 2020-05-29 23:45:04 -07:00
Martin Storsjö 51089db6d7 [test] Regenerate checks in aarch64_win64cc_vararg.ll with update_llc_test_checks.py. NFC. 2020-05-30 09:22:09 +03:00
Martin Storsjö cf97e0ec42 [AArch64] Treat x18 as callee-saved in functions with windows calling convention on non-windows OSes
Treat it as callee-saved, and always back it up. When windows code calls
entry points in unix code, marked with the windows calling convention,
that unix code can call other functions that isn't compiled with
-ffixed-x18 which may clobber x18 freely. By backing it up and restoring
it on return, we preserve the register across the function call,
fulfilling this part of the windows calling convention on another OS.

This isn't enough to make sure that x18 is preserved when non-windows
code does a callback to windows code, but is a clear improvement over
the current status quo. Additionally, wine is nowadays building many
modules as PE DLLs, which avoids the callback issue altogether for those
DLLs.

Differential Revision: https://reviews.llvm.org/D61892
2020-05-30 09:22:09 +03:00
Carl Ritson d04147789f [AMDGPU] Remove assertion on S1024 SGPR to VGPR spill
Summary:
Replace an assertion that blocks S1024 SGPR to VGPR spill.
The assertion pre-dates S1024 and is not wave size dependent.

Reviewers: arsenm, sameerds, rampitec

Reviewed By: arsenm

Subscribers: qcolombet, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80783
2020-05-30 11:16:19 +09:00
Matt Arsenault 0892a96a05 AMDGPU: Optimize s_setreg_b32 to s_denorm_mode/s_round_mode
This is a custom inserter because it was less work than teaching
tablegen a way to indicate that it is sometimes OK to have a
no-side-effect instruction in the output of a side-effecting pattern.

The asm is needed to look like a read of the mode register to prevent
it from being deleted. However, there seems to be a bug where the mode
register def instructions are moved across the asm sideeffect by the
post-RA scheduler.

Another oddity is the immediate is formatted differently between
s_denorm_mode and s_round_mode.
2020-05-29 21:11:36 -04:00
Matt Arsenault 4f300d4996 AMDGPU: Add new baseline tests for setreg handling
Most of these should be identical and use a common prefix, but
update_llc_test_checks is failing to generate shared checks for some
reason.
2020-05-29 21:00:30 -04:00
Matt Arsenault f012c58abd AMDGPU: Move MIMG MMO check to verifier 2020-05-29 20:58:23 -04:00
Matt Arsenault 2484109378 AMDGPU/GlobalISel: Add boilerplate for inline asm lowering
The test is mostly derived from the AArch64 one, with minor adjustments.
2020-05-29 16:49:23 -04:00
Zequan Wu 80e107ccd0 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognize the nomerge attribute and disable branch folding when the attribute is given.
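
A hedged C example of the attribute's intent, using the [[clang::nomerge]]
statement spelling (assumes a compiler recent enough to accept it):

    void bar(void);

    void f(int a, int b) {
      if (a)
        [[clang::nomerge]] bar(); /* keep this call site... */
      if (b)
        [[clang::nomerge]] bar(); /* ...distinct from this one */
    }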

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
Matt Arsenault 2d2627d47a AMDGPU: Remove fp-exceptions feature
This was never used, and the only thing it changed was removed in
284472be6d. The floating point mode is
also not a property of the subtarget.
2020-05-29 15:19:59 -04:00
Ehud Katz f881c7967d [tests] Fix AMDGPU test
Fix naming issue in test due to change D80399.
2020-05-29 22:15:26 +03:00
Stanislav Mekhanoshin a520294913 [AMDGPU] Regenerated urem/udiv global isel tests. NFC. 2020-05-29 12:08:47 -07:00
Craig Topper 5c7aca6a4c [X86] Ignore large code model in X86FastISel::X86MaterializeFP in 32-bit mode
Large code model doesn't mean anything in 32-bit mode, but nothing
prevents it from being set. Ignore it to avoid generating 64-bit mode
only instructions.

Differential Revision: https://reviews.llvm.org/D80768
2020-05-29 10:39:08 -07:00
Craig Topper 87e4ad4d5c [X86] Remove isel pattern for MMX_X86movdq2q+simple_load. Replace with DAG combine to loadmmx.
Only 64 bits will be loaded, not the whole 128 bits. We can
just combine it to plain mmx load. This has the side effect of
enabling isel load folding for it.

This is part of my desire to get rid of isel patterns that shrink loads.
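
A small intrinsics sketch (hypothetical) of the load-plus-movdq2q pattern
this combine targets:

    #include <emmintrin.h>

    __m64 low_half(const __m128i *p) {
      /* only the low 64 bits of *p are needed, so a plain mmx load will do */
      return _mm_movepi64_pi64(*p);
    }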
2020-05-29 10:20:03 -07:00
Xiangling Liao 26604d06b6 [AIX] Emit AvailableExternally Linkage on AIX
Since on AIX, our strategy is to not use -u to suppress any undefined
symbols, we need to emit .extern for the symbols with AvailableExternally
linkage.
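
For context, a small C example that produces this linkage: a C99 inline
definition without a matching extern declaration is emitted as an
available_externally function in IR (an out-of-line definition is assumed
to exist in some other TU):

    inline int twice(int x) { return x * 2; }

    int use(int x) {
      return twice(x); /* body available for inlining; no strong definition */
    }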

Differential Revision: https://reviews.llvm.org/D80642
2020-05-29 13:12:59 -04:00
Guozhi Wei 40c08367e4 [DAGCombiner] Add command line options to guard store width reduction
optimizations

As discussed in the thread http://lists.llvm.org/pipermail/llvm-dev/2020-May/141838.html,
the width of some bit-field accesses can be reduced by
ReduceLoadOpStoreWidth while others can't. If two accesses are very close
and the width of the first access is reduced but not the second, then the
wide load of the second access will be stalled for a long time.

This patch adds command line options to guard ReduceLoadOpStoreWidth and
ShrinkLoadReplaceStoreWithStore, so users can use them to disable these
store width reduction optimizations.
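
A hypothetical C sketch of the bit-field pattern discussed in the thread,
where narrowing the first store can stall the wider access that follows:

    struct S {
      unsigned a : 8;
      unsigned b : 24;
    };

    void set(struct S *s) {
      s->a = 1; /* ReduceLoadOpStoreWidth may narrow this to a byte store */
      s->b = 2; /* ...then the wide load for b can stall on that store */
    }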

Differential Revision: https://reviews.llvm.org/D80745
2020-05-29 09:41:41 -07:00
Kevin P. Neal cd74ccc965 [X86] Fix errors in use of strictfp attribute.
Errors spotted with use of: https://reviews.llvm.org/D68233
2020-05-29 12:31:55 -04:00
Stanislav Mekhanoshin f6a6de288b GlobalISel: fix CombinerHelper::matchEqualDefs()
This matcher was always returning true for the different
results of the same instruction.

Differential Revision:
2020-05-29 09:30:02 -07:00
Kevin P. Neal c21a4f84b0 Fix errors in use of strictfp attribute.
Errors spotted with use of: https://reviews.llvm.org/D68233
2020-05-29 12:28:14 -04:00
Kevin P. Neal 66d1899e2f Fix errors in use of strictfp attribute.
Errors spotted with use of: https://reviews.llvm.org/D68233
2020-05-29 12:25:13 -04:00
Kevin P. Neal a38788201e Fix errors in use of strictfp attribute.
Errors spotted with use of: https://reviews.llvm.org/D68233
2020-05-29 12:22:21 -04:00
Jay Foad 9e0b52e2e6 [AMDGPU] Remove duplicate test cases
The two "2sin" test cases were identical to the "sin_2x" test cases just
above.
2020-05-29 16:36:36 +01:00
David Green 747c574b94 [ARM] Extra MVE VMLAV reduction patterns
These patterns for i8 and i16 VMLA's were missing. They end up from
legalized vector.reduce.add.v8i16 and vector.reduce.add.v16i8, and
although the instruction works differently (the mul and add are
performed in a higher precision), I believe it is OK because only an
i8/i16 is demanded from them, and so the results will be the same. At
least, they pass any testing I can think of to run on them.

There are some tests that end up looking worse, but are quite artificial
due to passing half vector types through a call boundary. I would not
expect the vmull to realistically come up like that, and a vmlava is
likely better a lot of the time.
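
A hedged C sketch (assuming the loop vectorizes) of a reduction that ends
up as vector.reduce.add.v16i8 of an i8 multiply and can now select VMLAV:

    signed char mla16(const signed char *a, const signed char *b) {
      signed char sum = 0;
      for (int i = 0; i < 16; i++)
        sum += a[i] * b[i]; /* i8 mul feeding an i8 add reduction */
      return sum;
    }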

Differential Revision: https://reviews.llvm.org/D80524
2020-05-29 16:23:24 +01:00
Sanjay Patel 912502e8ef [AArch64][x86] add tests for FMA combines; NFC 2020-05-29 08:58:37 -04:00
Florian Hahn d20a3d35e1 [DAGComb] Do not turn insert_elt into shuffle for single elt vectors.
Currently combineInsertEltToShuffle turns insert_vector_elt into a
vector_shuffle, even if the inserted element is a vector with a single
element. In this case, it should be unlikely that the additional shuffle
would be more efficient than an insert_vector_elt.

Additionally, this fixes an infinite cycle in DAGCombine, where
combineInsertEltToShuffle turns an insert_vector_elt into a shuffle,
which gets turned back into an insert_vector_elt/extract_vector_elt by
a custom AArch64 lowering (in visitVECTOR_SHUFFLE).

Such insert_vector_elt and extract_vector_elt combinations can be
lowered efficiently using mov on AArch64.

There are 2 test changes in arm64-neon-copy.ll: we now use one or two
mov instructions instead of a single zip1. The reason that we need a
second mov in ins1f2 is that we have to move the result to the result
register, and it is not really related to the DAGCombine fold, I think.
But in any case, on most uarchs, mov should be cheaper than zip1. On a
Cortex-A75 for example, zip1 is twice as expensive as mov
(https://developer.arm.com/docs/101398/latest/arm-cortex-a75-software-optimization-guide-v20)

Reviewers: spatel, efriedma, dmgreen, RKSimon

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D80710
2020-05-29 13:21:13 +01:00
Simon Pilgrim b9826c1086 [CGP] Ensure address scaled offset is representable as int64_t
AddressingModeMatcher::matchScaledValue was calling getSExtValue for a constant before ensuring that we can actually represent the value as int64_t.

Fixes OSSFuzz#22723 which is a followup to rGc479052a74b2 (PR46004 / OSSFuzz#22357)
2020-05-29 12:25:43 +01:00
David Sherwood 9c0ef044be [SVE] Fix warnings in SelectInst::areInvalidOperands
We should be comparing the element counts rather than the
numbers of elements.

Differential Revision: https://reviews.llvm.org/D80634
2020-05-29 07:50:47 +01:00
David Sherwood b147b88c84 [CodeGen] Add support for extracting elements of scalable vectors
I have tried to ensure that SelectionDAG and DAGCombiner do
sensible things for scalable vectors, and added support for a
limited number of simple folds. Codegen support for the vector
extract patterns has also been added to the AArch64 backend.

New vector extract tests have been added here:

  CodeGen/AArch64/sve-extract-element.ll

and I have also added new folds using inserts and extracts here:

  CodeGen/AArch64/sve-insert-element.ll

Differential Revision: https://reviews.llvm.org/D80208
2020-05-29 07:49:43 +01:00