Commit Graph

45911 Commits

Author SHA1 Message Date
Sanjay Patel d7bed12192 [AArch64] remove bogus comment; NFC
I added this comment with D42323, but as discussed in D42806, the architecture
does the right thing for denorms. We don't even need the select on 0.0 here?

llvm-svn: 323996
2018-02-01 19:59:33 +00:00
Changpeng Fang 29fcf883fb AMDGPU/SI: Adjust the encoding family for D16 buffer instructions when the target has UnpackedD16VMem feature.
Reviewers:
  Matt and Brian

Differential Revision:
  https://reviews.llvm.org/D42548

llvm-svn: 323988
2018-02-01 18:41:33 +00:00
Simon Pilgrim 1a8cefc328 [X86][SSE] LowerBUILD_VECTORAsVariablePermute - add support for scaling index vectors
This allows us to use PSHUFB for v8i16/v4i32 and VPERMD/PERMPS for v4i64/v4f64 variable shuffles.

Differential Revision: https://reviews.llvm.org/D42487

llvm-svn: 323987
2018-02-01 18:10:30 +00:00
Craig Topper a8a24232ee [X86] Remove custom lowering vXi1 extending loads and truncating stores.
Summary: Now that v2i1/v4i1 are legal without VLX. And v32i1 is legalized by splitting rather than widening. And isVectorLoadExtDesirable returns false for vXi1. It appears this handling is dead because the operations simply don't exist.

Reviewers: RKSimon, zvi, guyblank, delena, spatel

Reviewed By: delena

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D42781

llvm-svn: 323983
2018-02-01 17:08:41 +00:00
Craig Topper 7e910a9e85 [X86] Turn X86ISD::AND nodes that have no flag users back into ISD::AND just before isel to enable test instruction matching
Summary:
EmitTest sometimes creates X86ISD::AND specifically to hide the AND from DAG combine. But this prevents isel patterns that look for (cmp (and X, Y), 0) from being able to see it. So we end up with an AND and a TEST. The TEST gets removed by compare instruction optimization during the peephole pass.

This patch attempts to fix this by converting X86ISD::AND with no flag users back into ISD::AND during the DAG preprocessing just before isel.

In order to do this correctly I had to make the X86ISD::AND node created by EmitTest in this case really have a flag output. Which arguably it should have had anyway so that the number of operands would be consistent for the opcode in all cases. Then I had to modify the ReplaceAllUsesWith to understand that we might be looking at an instruction with 2 outputs. Though in this case there are no uses to replace since we just created the node, but that's what the code did before so I just made it keep working.

Reviewers: spatel, RKSimon, niravd, deadalnix

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42764

llvm-svn: 323982
2018-02-01 17:08:39 +00:00
Sanjay Patel 657e5d8d41 [DAGCombiner] filter out denorm inputs when calculating sqrt estimate (PR34994)
As shown in the example in PR34994:
https://bugs.llvm.org/show_bug.cgi?id=34994
...we can return a very wrong answer (inf instead of 0.0) for square root when 
using a reciprocal square root estimate instruction.

Here, I've conditionalized the filtering out of denorms based on the function 
having "denormal-fp-math"="ieee" in its attributes. The other options for this 
attribute are 'preserve-sign' and 'positive-zero'.

So we don't generate this extra code by default with just '-ffast-math' (because 
then there's no denormal attribute string at all), but it works if you specify 
'-ffast-math -fdenormal-fp-math=ieee' from clang. 

As noted in the review, there may be other problems in clang that affect the 
results depending on platform (Linux x86 at least), but this should allow 
creating the desired codegen.

Differential Revision: https://reviews.llvm.org/D42323

llvm-svn: 323981
2018-02-01 16:57:18 +00:00
Sjoerd Meijer 9d9a86535e [ARM] FullFP16 LowerReturn Fix
Commit r323512 introduced an optimisation in LowerReturn for half-precision
return values. A missing check caused a crash when the return value is "undef"
(i.e. a node that has no operands).

Differential Revision: https://reviews.llvm.org/D42743

llvm-svn: 323968
2018-02-01 13:48:40 +00:00
Aleksandar Beserminji a330c208f2 [mips] Include EVA instructions in Std2MicroMips mapping tables
This patch includes EVA instructions in the Std2MicroMips mapping
tables, which is required for direct object emission.

Differential Revision: https://reviews.llvm.org/D41771

llvm-svn: 323958
2018-02-01 12:53:26 +00:00
Clement Courbet ea8d07eb76 [AArch64][NFC] Make all ProcResource definitions include their SchedModel.
This makes targets ExynosM1,ExynosM3,ThunderX2T99 consistent with all
other targets.

llvm-svn: 323955
2018-02-01 12:12:01 +00:00
Yvan Roux 490e9e6761 [ARM] Add support for unpredictable MVN instructions.
This fixes bugzilla 33011
https://bugs.llvm.org/show_bug.cgi?id=33011

Defines bits {19-16} as zero or unpredictable as specified by the ARM ARM in
sections A8.8.116 and A8.8.117.

It fixes also the usage of PC register as destination register for MVN
register-shifted register version as specified in A8.8.117.

Differential Revision: https://reviews.llvm.org/D41905

llvm-svn: 323954
2018-02-01 12:06:57 +00:00
Yvan Roux 705e26a243 Test commit: Fix a comment.
llvm-svn: 323947
2018-02-01 08:39:58 +00:00
Dean Michael Berris cdca0730be [XRay][compiler-rt+llvm] Update XRay register stashing semantics
Summary:
This change expands the amount of registers stashed by the entry and
`__xray_CustomEvent` trampolines.

We've found that since the `__xray_CustomEvent` trampoline calls can show up in
situations where the scratch registers are being used, and since we don't
typically want to affect the code-gen around the disabled
`__xray_customevent(...)` intrinsic calls, that we need to save and restore the
state of even the scratch registers in the handling of these custom events.

Reviewers: pcc, pelikan, dblaikie, eizan, kpw, echristo, chandlerc

Reviewed By: echristo

Subscribers: chandlerc, echristo, hiraditya, davide, dblaikie, llvm-commits

Differential Revision: https://reviews.llvm.org/D40894

llvm-svn: 323940
2018-02-01 02:21:54 +00:00
Evgeniy Stepanov 7746899f48 Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations"
Miscompiles code. Testcase pending.

This reverts commit r323869.

llvm-svn: 323929
2018-01-31 22:55:19 +00:00
Matt Arsenault af88f0eb44 AMDGPU: Fix missing SCC def from s_xor_b64_term
llvm-svn: 323927
2018-01-31 22:54:27 +00:00
Craig Topper e44faf53c7 [X86] Make the type checks in detectAVX512USatPattern more robust
This code currently uses isSimple and getSizeInBits in an attempt to prune types. But isSimple will return true for any type that any target supports natively. I don't think that's a good way to prune types. I also don't think the dest element type checks are very robust since we didn't do an isSimple check on the dest type.

This patch adds a check for the input type being legal to the one caller that didn't already check that. Then we explicitly check the element types for the destination are i8, i16, or i32

Differential Revision: https://reviews.llvm.org/D42706

llvm-svn: 323924
2018-01-31 22:26:31 +00:00
Krzysztof Parzyszek 15efa98f63 [Hexagon] Rename HexagonISelLowering::getNode to getInstr, NFC
llvm-svn: 323916
2018-01-31 21:17:03 +00:00
Chandler Carruth 0dcee4fe7a [x86] Make the retpoline thunk insertion a machine function pass.
Summary:
This removes the need for a machine module pass using some deeply
questionable hacks. This should address PR36123 which is a case where in
full LTO the memory usage of a machine module pass actually ended up
being significant.

We should revert this on trunk as soon as we understand and fix the
memory usage issue, but we should include this in any backports of
retpolines themselves.

Reviewers: echristo, MatzeB

Subscribers: sanjoy, mcrosier, mehdi_amini, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D42726

llvm-svn: 323915
2018-01-31 20:56:37 +00:00
Krzysztof Parzyszek 1108ee2496 [Hexagon] Implement HVX codegen for vector shifts
llvm-svn: 323914
2018-01-31 20:49:24 +00:00
Krzysztof Parzyszek 9eb085e6cf [Hexagon] Handle ANY_EXTEND_VECTOR_INREG in lowering
llvm-svn: 323912
2018-01-31 20:48:11 +00:00
Krzysztof Parzyszek b843f75179 [Hexagon] Handle SETCC on vector pairs in lowering
llvm-svn: 323911
2018-01-31 20:46:55 +00:00
Marek Olsak d4bb329d0e AMDGPU: Fold inline offset for loads properly in moveToVALU on GFX9
Summary:
This enables load merging into x2, x4, which is driven by inline offsets.

6500 shaders are affected:
Code Size in affected shaders: -15.14 %

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D42078

llvm-svn: 323909
2018-01-31 20:18:11 +00:00
Marek Olsak 13e4741275 AMDGPU: Add intrinsics llvm.amdgcn.cvt.{pknorm.i16, pknorm.u16, pk.i16, pk.u16}
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D41663

llvm-svn: 323908
2018-01-31 20:18:04 +00:00
Sam Clegg 6e7f1826c5 [WebAssembly] MC: Remove unused code for handling of wasm globals
For now, we are not using wasm globals, except for modeling of
the stack points.

Alos, factor out common struct WasmGlobalType, which matches the
name for that tuple in the Wasm spec and rename methods
to "isBindingGlobal", "isTypeGlobal" to avoid ambiguity.

Patch by Nicholas Wilson!

Differential Revision: https://reviews.llvm.org/D42750

llvm-svn: 323901
2018-01-31 19:50:14 +00:00
Amaury Sechet f9a9e9a251 [X86] Generate testl instruction through truncates.
Summary:
This was introduced in D42646 but ended up being reverted because the original implementation was buggy.

Depends on D42646

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42741

llvm-svn: 323899
2018-01-31 19:20:06 +00:00
Krzysztof Parzyszek 82a83391d3 [Hexagon] Handle BUILD_VECTOR from undef values in buildHvxVectorReg
llvm-svn: 323889
2018-01-31 16:52:15 +00:00
Amaury Sechet f89f188ddb [X86] Avoid using high register trick for test instruction
Summary:
It seems it's main effect is to create addition copies when values are inr register that do not support this trick, which increase register pressure and makes the code bigger.

Reviewers: craig.topper, niravd, spatel, hfinkel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D42646

llvm-svn: 323888
2018-01-31 16:48:54 +00:00
Krzysztof Parzyszek 8cc636c592 [Hexagon] Only process bitcasts of vsplats when selecting const vectors
Selecting of constant HVX vectors involves some "manual processing",
which mishandled an unrelated BITCAST operation causing a selection
error.

llvm-svn: 323887
2018-01-31 16:48:20 +00:00
Diana Picus 12ed95e3e7 Fix formatting for r323876. NFC
llvm-svn: 323878
2018-01-31 15:16:17 +00:00
Diana Picus 1d4421f6a6 [ARM GlobalISel] Modernize LegalizerInfo. NFCI
Start using the new LegalizerInfo API introduced in r323681.

Keep the old API for opcodes that need Lowering in some circumstances
(G_FNEG and G_UREM/G_SREM).

llvm-svn: 323876
2018-01-31 14:55:07 +00:00
Pablo Barrio 2e442a7831 [ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations
Summary:
Expressions of the form x < 0 ? 0 :  x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves

In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.

Patch by Marten Svanfeldt.

Reviewers: fhahn, pbarrio

Reviewed By: pbarrio

Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D42574

llvm-svn: 323869
2018-01-31 13:20:10 +00:00
Jonas Paulsson cc5fe73669 [SystemZ] Check the bitwidth before calling isInt/isUInt.
Since these methods will assert if the integer does not fit into 64 bits,
it is necessary to do this check before calling them in
supportedAddressingMode().

Review: Ulrich Weigand.
llvm-svn: 323866
2018-01-31 12:41:25 +00:00
Sjoerd Meijer 98d5359ea2 [ARM] Armv8.2-A FP16 code generation (part 2/3)
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.

Differential Revision: https://reviews.llvm.org/D42580

llvm-svn: 323861
2018-01-31 10:18:29 +00:00
Jonas Paulsson e6a8329e9f [PowerPC] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Nemanja Ivanovic
llvm-svn: 323858
2018-01-31 09:26:51 +00:00
Roger Ferrer Ibanez aea4208720 [ARM] Allow the scheduler to clone a node with glue to avoid a copy CPSR ↔ GPR.
In Thumb 1, with the new ADDCARRY / SUBCARRY the scheduler may need to do
copies CPSR ↔ GPR but not all Thumb1 targets implement them.

The schedule can attempt, before attempting a copy, to clone the instructions
but it does not currently do that for nodes with input glue. In this patch we
introduce a target-hook to let the hook decide if a glued machinenode is still
eligible for copying. In this case these are ARM::tADCS and ARM::tSBCS .

As a follow-up of this change we should actually implement the copies for the
Thumb1 targets that do implement them and restrict the hook to the targets that
can't really do such copy as these clones are not ideal.

This change fixes PR35836.

Differential Revision: https://reviews.llvm.org/D42051

llvm-svn: 323857
2018-01-31 09:23:43 +00:00
Krzysztof Parzyszek 119856430e [RDF] Clear the renamable flag when copy propagating reserved registers
llvm-svn: 323831
2018-01-30 23:19:44 +00:00
Krzysztof Parzyszek 5d9844f48a [Hexagon] Handle truncates in polynomial multiply idiom recognition
This is in anticipation of https://reviews.llvm.org/D42424, which would
otherwise break one of the pmpy testcases.

llvm-svn: 323824
2018-01-30 22:03:59 +00:00
Craig Topper d759f476e8 [X86] Remove redundant check for hasAVX512 before calling hasBWI. NFC
hasBWI implies hasAVX512.

llvm-svn: 323823
2018-01-30 21:53:35 +00:00
Martin Storsjo 708498a164 [AArch64] Properly handle dllimport of variables when using fast-isel
Differential Revision: https://reviews.llvm.org/D42567

llvm-svn: 323810
2018-01-30 19:50:51 +00:00
Krzysztof Parzyszek 39a9842f3c [Hexagon] Handle non-aligned offsets in globals in extender optimization
Instructions like memd(r0+##global+1) are legal as long as the entire
address is properly aligned. Assuming that "global" is aligned at an
8-byte boundary, the expression "global+1" appears to be misaligned.
Handle such cases in HexagonConstExtenders, and make sure that any non-
extended offsets generated are still aligned accordingly.

llvm-svn: 323799
2018-01-30 18:12:37 +00:00
Krzysztof Parzyszek 96a284114e Revert: [Hexagon] Make sure that offset on globals matches alignment requirements
This reverts r323562, since it wasn't actually necessary. Constant-
extended offsets do not need to be aligned, as long as the effective
address is aligned.

Keep the testcase, with a modification which checks that such offsets
are not unnecessarily avoided.

llvm-svn: 323798
2018-01-30 18:10:27 +00:00
Simon Pilgrim 073f089c6e [X86][XOP] Update isVectorShiftByScalarCheap with cases covered by XOP
Similar to D42437, XOP supports variable shift for v16i8/v8i16/v4i32/v2i64 types.

Differential Revision: https://reviews.llvm.org/D42526

llvm-svn: 323797
2018-01-30 18:10:21 +00:00
Geoff Berry 1d53101387 [AMDGPU] isRenamable fixes to support copy forwarding
Mark more opcodes as hasExtraSrcRegAllocReq so that their operands will
be marked as not renamable, to avoid copy forwarding violating the
constraint that only one operand may use the constant bus.

These changes fix a few mis-compiles when copy forwarding is enabled in
MachineCopyPropagation by D41835 (and were reviewed as part of that change).

llvm-svn: 323794
2018-01-30 17:37:39 +00:00
Mark Searles 94ae3b2f9b [AMDGPU] Revert "[AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output."
Patch caused a buildbot failure; arg; http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/17373/s\
teps/build_Lld/logs/stdio :
        /Users/buildslave/as-bldslv9/lld-x86_64-darwin13/llvm.src/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1563:18: error: unused variable 'InstCnt' [-Werror,-Wunused-variable]
          static int32_t InstCnt = 0;
                                              "
This reverts commit 4f4a7d61e306b67044d9f16bc2016fee806bc2cc.

llvm-svn: 323791
2018-01-30 17:17:06 +00:00
Mark Searles d6d5a2571f [AMDGPU] Add options for waitcnt pass debugging; add instr count in debug output.
-amdgpu-waitcnt-forcezero={1|0}  Force all waitcnt instrs to be emitted as s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-amdgpu-waitcnt-forceexp=<n>  Force emit a s_waitcnt expcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcelgkm=<n> Force emit a s_waitcnt lgkmcnt(0) before the first <n> instrs
-amdgpu-waitcnt-forcevm=<n>   Force emit a s_waitcnt vmcnt(0) before the first <n> instrs

This patch was pushed ( abb190fd51cd2f9a9eef08c024e109f7f7e909fc ), which caused a buildbot failure, reverted ( 6227480d74da507cf8e1b4bcaffbdb9fb875b4b8 ), and then updated to fix buildbot failures (this patch).

Differential Revision: https://reviews.llvm.org/D40091

llvm-svn: 323788
2018-01-30 16:49:38 +00:00
Changpeng Fang 0905870f93 AMDGPU/SI: Add decoding in the GFX80_UNPACKED decoding namespace.
Reviewer:
  Dmitry (dp).

Differential Revision:
  https://reviews.llvm.org/D42596

llvm-svn: 323785
2018-01-30 16:42:40 +00:00
Evandro Menezes f1d01645a7 [AArch64] Add new target feature to fuse address generation with load or store
This feature enables the fusion of the address generation and a
corresponding load or store together.

Differential revision: https://reviews.llvm.org/D42393

llvm-svn: 323782
2018-01-30 16:28:01 +00:00
Simon Dardis daaeaba665 [mips] Fix incorrect sign extension for fpowi libcall
PR36061 showed that during the expansion of ISD::FPOWI, that there
was an incorrect zero extension of the integer argument which for
MIPS64 would then give incorrect results. Address this with the
existing mechanism for correcting sign extensions.

This resolves PR36061.

Thanks to James Cowgill for reporting the issue!

Reviewers: atanasyan, hfinkel

Differential Revision: https://reviews.llvm.org/D42537

llvm-svn: 323781
2018-01-30 16:24:10 +00:00
Zaara Syeda 1f59ae311b Re-commit : [PowerPC] Add handling for ColdCC calling convention and a pass to mark
candidates with coldcc attribute.

This recommits r322721 reverted due to sanitizer memory leak build bot failures.

Original commit message:
This patch adds support for the coldcc calling convention for Power.
This changes the set of non-volatile registers. It includes a pass to stress
test the implementation by marking all static directly called functions with
the coldcc attribute through the option -enable-coldcc-stress-test. It also
includes an option, -ppc-enable-coldcc, to add the coldcc attribute to
functions which are cold at all call sites based on BlockFrequencyInfo when
the containing function does not call any non cold functions.

Differential Revision: https://reviews.llvm.org/D38413

llvm-svn: 323778
2018-01-30 16:17:22 +00:00
Evandro Menezes 07c78eeeef [AArch64] Add new target feature to handle cheap as move for Exynos
This feature enables special handling of cheap as move in the existing
custom handling specifically for Exynos processors.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323774
2018-01-30 15:40:22 +00:00
Evandro Menezes 9f9daa1f14 [AArch64] Add pipeline model for Exynos M3
Add the scheduling and cost model for Exynos M3.

Differential revision: https://reviews.llvm.org/D42387

llvm-svn: 323773
2018-01-30 15:40:16 +00:00