Commit Graph

24115 Commits

Author SHA1 Message Date
Zaara Syeda 6535993625 Re-commit: [MachineLICM] Add functions to MachineLICM to hoist invariant stores
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.

Differential Revision: https://reviews.llvm.org/D40196

llvm-svn: 328326
2018-03-23 15:28:15 +00:00
John Brawn e3b44f9de6 [AArch64] Don't reduce the width of loads if it prevents combining a shift
Loads and stores can only shift the offset register by the size of the value
being loaded, but currently the DAGCombiner will reduce the width of the load
if it's followed by a trunc making it impossible to later combine the shift.

Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and
make it prevent the width reduction if this is what would happen, though do
allow it if reducing the load width will let us eliminate a later sign or zero
extend.

Differential Revision: https://reviews.llvm.org/D44794

llvm-svn: 328321
2018-03-23 14:47:07 +00:00
Simon Pilgrim 8619962c73 [X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units
Fixes throughput to match Agner/Fam16h-SoG as well.

llvm-svn: 328318
2018-03-23 14:27:26 +00:00
Christof Douma 4a025cc79d [ARM] Support float literals under XO
When targeting execute-only and fp-armv8, float constants in a compare
resulted in instruction selection failures. This is now fixed by using
vmov.f32 where possible, otherwise the floating point constant is
lowered into a integer constant that is moved into a floating point
register.

This patch also restores using fpcmp with immediate 0 under fp-armv8.

Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443
llvm-svn: 328313
2018-03-23 13:02:03 +00:00
Amara Emerson f542355942 [GlobalISel] Fix legalizer combine to not use illegal input G_EXTRACT.
This was being masked because GISel is enabled by default for -O0 and
the abort was disabled. Modified test to explicitly enable abort.

llvm-svn: 328311
2018-03-23 12:48:57 +00:00
Simon Pilgrim 2755893834 [X86][SandyBridge] Fix missing comma that was causing string concatenation of 2 instregex entries
Found while updating D44687

llvm-svn: 328308
2018-03-23 11:56:38 +00:00
Martin Storsjo db75aa96d3 Revert "[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))"
This reverts commit r328252. This change broke building a number
of projects when targeting ARM and AArch64, see PR36873.

llvm-svn: 328297
2018-03-23 08:36:47 +00:00
Craig Topper 4787b7f434 [X86] Correct the latencies of SNB integer vector multiplies based on Agner's data. Add missing MMX multiplies.
llvm-svn: 328295
2018-03-23 06:41:43 +00:00
Craig Topper 7580a7997d [X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version.
llvm-svn: 328293
2018-03-23 06:41:40 +00:00
Craig Topper 7f142b8bf1 [X86] Merge VMOVMSKBrr and MOVMSKBrr in the SNB sheduler model.
The VMOVMSKBrr was in a separate InstRW with a lower latency, but I assume they should be the same and the higher latency matches Agners table so I'm going with that.

llvm-svn: 328291
2018-03-23 06:41:38 +00:00
Craig Topper fae4173b47 [X86] Add VEXTRB/W/D/Q to Zen scheduler model.
The SSE versions were present, but not the VEX version.

llvm-svn: 328290
2018-03-23 06:41:36 +00:00
Michael Zolotukhin fab7a676c2 State that CFG is preserved in 'Falkor HW Prefetch Fix Late Phase'.
That removes some redundant recomputations from the passes pipeline.

llvm-svn: 328272
2018-03-22 23:44:40 +00:00
Michael Zolotukhin 3520331f93 Reapply "[test] Add tests for llc passes pipelines." with a fix for bots with expensive checks on.
llvm-svn: 328267
2018-03-22 23:02:48 +00:00
Craig Topper adb173314d [X86] Correct the VROUND regular expressions in Znver1 scheduler model to account for r328254
llvm-svn: 328260
2018-03-22 22:17:11 +00:00
Craig Topper 40d3b32e12 [X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD
This makes the Y position consistent with other instructions.

This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary.

llvm-svn: 328254
2018-03-22 21:55:20 +00:00
Guozhi Wei 17ff975eb1 [DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))
In our real world application, we found the following optimization is missed in DAGCombiner

(zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst))

If the user of original zext is an add, it may enable further lea optimization on x86.

This patch add a new function CombineZExtLogicopShiftLoad to do this optimization.

Differential Revision: https://reviews.llvm.org/D44402

llvm-svn: 328252
2018-03-22 21:47:25 +00:00
Craig Topper 58afb4ea58 [X86][SkylakeClient] Fix a bunch of instructions that were incorrectly assigned Port015 instead of Port01.
The VEC ADD and VEC MUL units aren't present on port 5 on SkylakeClient.

llvm-svn: 328241
2018-03-22 21:10:07 +00:00
Jun Bum Lim 2ecb7ba4c6 [CodeGen] Add a new pass for PostRA sink
Summary:
This pass sinks COPY instructions into a successor block, if the COPY is not
used in the current block and the COPY is live-in to a single successor
(i.e., doesn't require the COPY to be duplicated).  This avoids executing the
the copy on paths where their results aren't needed.  This also exposes
additional opportunites for dead copy elimination and shrink wrapping.

These copies were either not handled by or are inserted after the MachineSink
pass. As an example of the former case, the MachineSink pass cannot sink
COPY instructions with allocatable source registers; for AArch64 these type
of copy instructions are frequently used to move function parameters (PhyReg)
into virtual registers in the entry block..

For the machine IR below, this pass will sink %w19 in the entry into its
successor (%bb.1) because %w19 is only live-in in %bb.1.

```
   %bb.0:
      %wzr = SUBSWri %w1, 1
      %w19 = COPY %w0
      Bcc 11, %bb.2
    %bb.1:
      Live Ins: %w19
      BL @fun
      %w0 = ADDWrr %w0, %w19
      RET %w0
    %bb.2:
      %w0 = COPY %wzr
      RET %w0
```
As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be
able to see %bb.0 as a candidate.

With this change I observed 12% more shrink-wrapping candidate and 13% more dead copies deleted  in spec2000/2006/2017 on AArch64.

Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz

Reviewed By: sebpop

Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D41463

llvm-svn: 328237
2018-03-22 20:06:47 +00:00
Nirav Dave 8c5f47ac40 [DAG, X86] Fix ISel-time node insertion ids
As in SystemZ backend, correctly propagate node ids when inserting new
unselected nodes into the DAG during instruction Seleciton for X86
target.

Fixes PR36865.

Reviewers: jyknight, craig.topper

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D44797

llvm-svn: 328233
2018-03-22 19:32:07 +00:00
Craig Topper 4a3be6e578 [X86] Correct the scheduling data for some of the 32 and 64 bit multiplies to as best as I understand how they are implemented.
llvm-svn: 328231
2018-03-22 19:22:51 +00:00
Aditya Nandakumar b3297ef051 [GISel]: Fix incorrect IRTranslation while translating null pointer types
https://reviews.llvm.org/D44762

Currently IRTranslator produces
%vreg17<def>(p0) = G_CONSTANT 0;

instead we should build
%vreg16(s64) = G_CONSTANT 0
%vreg17(p0) = G_INTTOPTR %vreg16

reviewed by @aemerson.

llvm-svn: 328218
2018-03-22 17:31:38 +00:00
Jonas Devlieghere 7e69dd02bb Revert "[test] Add tests for llc passes pipelines."
This reverts r328159 because the two AArch64 tests fail on GreenDragon:
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/11030/

llvm-svn: 328188
2018-03-22 10:34:06 +00:00
Michael Zolotukhin 7e6fa1d6ae [test] Add tests for llc passes pipelines.
This is basically an extension of existing test
test/CodeGen/X86/O0-pipeline.ll introduced in r302608.

llvm-svn: 328159
2018-03-21 22:17:13 +00:00
Artem Belevich 30512869ff [NVPTX] Make tensor shape part of WMMA intrinsic's name.
This is needed for the upcoming implementation of the
new 8x32x16 and 32x8x16 variants of WMMA instructions
introduced in CUDA 9.1.

Differential Revision: https://reviews.llvm.org/D44719

llvm-svn: 328158
2018-03-21 21:55:02 +00:00
Lei Huang efd6f1c8e2 [POWER9][NFC] update testcase check statements
llvm-svn: 328147
2018-03-21 20:59:45 +00:00
Sanjay Patel e235942a1e [InstSimplify] fp_binop X, NaN --> NaN
We propagate the existing NaN value when possible.

Differential Revision: https://reviews.llvm.org/D44521

llvm-svn: 328140
2018-03-21 19:31:53 +00:00
Krzysztof Parzyszek c715a5d2b8 [Hexagon] Eliminate subregisters from PHI nodes before pipelining
The pipeliner needs to remove instructions from the SlotIndexes
structure when they are deleted. Otherwise, the SlotIndexes map
has stale data, and an assert will occur when adding new
instructions.

This patch also changes the pipeliner to make the back-edge of
a loop carried dependence 1 cycle. The 1 cycle latency is added
to the anti-dependence that represents the back-edge. This
changes eliminates a couple of hacks added to the pipeliner to
handle the latency of the back-edge. It is needed to correctly
pipeline the test case for the sub-register elimination pass.

llvm-svn: 328113
2018-03-21 16:39:11 +00:00
Alex Bradbury 65d6ea5e68 [RISCV] Codegen support for RV32F floating point comparison operations
This patch also includes extensive tests targeted at select and br+fcmp IR
inputs. A sequence of br+fcmp required support for FPR32 registers to be added
to RISCVInstrInfo::storeRegToStackSlot and
RISCVInstrInfo::loadRegFromStackSlot.

llvm-svn: 328104
2018-03-21 15:11:02 +00:00
Alex Bradbury 77d5927a1c [RISCV] Add tests missed from r327979
llvm-svn: 328102
2018-03-21 14:50:27 +00:00
Craig Topper 137a4dd84d [X86] Fix the SchedRW for XOP vpcom register form instructions to not be marked as loads.
llvm-svn: 328071
2018-03-21 03:41:33 +00:00
Craig Topper d25f1acf67 [X86] Change PMULLD to 10 cycles on Skylake per Agner's tables and llvm-exegesis.
Also restrict to port 0 and 1 for SkylakeClient. It looks like the scheduler models don't account for client not having a full vector ALU on port 5 like server.

Fixes PR36808.

llvm-svn: 328061
2018-03-20 23:39:48 +00:00
Derek Schuff 39b5367cba [WebAssembly] Strip threadlocal attribute from globals in single thread mode
The default thread model for wasm is single, and in this mode thread-local
global variables can be lowered identically to non-thread-local variables.

Differential Revision: https://reviews.llvm.org/D44703

llvm-svn: 328049
2018-03-20 22:01:32 +00:00
Martin Storsjo 07589fc496 [X86] Don't use the MSVC stack protector names on mingw
Mingw uses the same stack protector functions as GCC provides
on other platforms as well.

Patch by Valentin Churavy!

Differential Revision: https://reviews.llvm.org/D27296

llvm-svn: 328039
2018-03-20 20:37:51 +00:00
Abderrazek Zaafrani 4c60c222e4 [AArch64] Add vmulxh_lane fp16 vector intrinsic
https://reviews.llvm.org/D44591

llvm-svn: 328035
2018-03-20 20:25:40 +00:00
Krzysztof Parzyszek 4094ab73cc [Hexagon] Add a few more lit tests, NFC
llvm-svn: 328023
2018-03-20 19:35:09 +00:00
Krzysztof Parzyszek 65059ee284 [Hexagon] Add heuristic to exclude critical path cost for scheduling
Patch by Brendon Cahoon.

llvm-svn: 328022
2018-03-20 19:26:27 +00:00
Craig Topper c2dbd677bd [PowerPC][LegalizeFloatTypes] Move the PPC hacks for (i32 fp_to_sint/fp_to_uint (ppcf128 X)) out of LegalizeFloatTypes and into PPC specific code
I'm not entirely sure these hacks are still needed. If you remove the hacks completely, the name of the library call that gets generated doesn't match the grep the test previously had. So the test wasn't really checking anything.

If the hack is still needed it belongs in PPC specific code. I believe the FP_TO_SINT code here is the only place in the tree where a FP_ROUND_INREG node is created today. And I don't think its even being used correctly because the legalization returned a BUILD_PAIR with the same value twice. That doesn't seem right to me. By moving the code entirely to PPC we can avoid creating the FP_ROUND_INREG at all.

I replaced the grep in the existing test with full checks generated by hacking update_llc_test_check.py to support ppc32 just long enough to generate it.

Differential Revision: https://reviews.llvm.org/D44061

llvm-svn: 328017
2018-03-20 18:49:28 +00:00
Krzysztof Parzyszek eb0c510ecd [X86] Add phony registers for high halves of regs with low halves
Registers E[A-D]X, E[SD]I, E[BS]P, and EIP have 16-bit subregisters
that cover the low halves of these registers. This change adds artificial
subregisters for the high halves in order to differentiate (in terms of
register units) between the 32- and the low 16-bit registers.

This patch contains parts that aim to preserve the calculated register
pressure. This is in order to preserve the current codegen (minimize the
impact of this patch). The approach of having artificial subregisters
could be used to fix PR23423, but the pressure calculation would need
to be changed.

Differential Revision: https://reviews.llvm.org/D43353

llvm-svn: 328016
2018-03-20 18:46:55 +00:00
Artem Belevich 914d4babec [NVPTX] Make tensor load/store intrinsics overloaded.
This way we can support address-space specific variants without explicitly
encoding the space in the name of the intrinsic. Less intrinsics to deal with ->
less boilerplate.

Added a bit of tablegen magic to match/replace an intrinsics with a pointer
argument in particular address space with the space-specific instruction
variant.

Updated tests to use non-default address spaces.

Differential Revision: https://reviews.llvm.org/D43268

llvm-svn: 328006
2018-03-20 17:18:59 +00:00
Krzysztof Parzyszek 4c6b65f685 [Hexagon] Correct the computation of TopReadyCycle and BotReadyCycle of SU
TopReadyCycle and BotReadyCycle were off by one cycle when an SU is either
the first instruction or the last instruction in a packet.

Patch by Ikhlas Ajbar.

llvm-svn: 328000
2018-03-20 17:03:27 +00:00
Michael Zolotukhin fb3f509e01 [XRay] Lazily compute MachineLoopInfo instead of requiring it.
Summary:
Currently X-Ray Instrumentation pass has a dependency on MachineLoopInfo
(and thus on MachineDominatorTree as well) and we have to compute them
even if X-Ray is not used. This patch changes it to a lazy computation
to save compile time by avoiding these redundant computations.

Reviewers: dberris, kubamracek

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D44666

llvm-svn: 327999
2018-03-20 17:02:29 +00:00
Sanjay Patel 5a9210e651 [AArch64] add fabs tests for PR36600; NFC
llvm-svn: 327995
2018-03-20 16:08:47 +00:00
Simon Pilgrim 62690e9d0e [X86][Haswell][Znver1] Fix typo in fldl instregexs
Missing comma was casing 2 instregex entries to be concatenated together by mistake.

Found while investigating PR35548

llvm-svn: 327992
2018-03-20 15:44:47 +00:00
Alex Bradbury 80c8eb7696 [RISCV] Add codegen for RV32F floating point load/store
As part of this, add support for load/store from the constant pool. This is
used to materialise f32 constants.

llvm-svn: 327979
2018-03-20 13:26:12 +00:00
Alex Bradbury 76c29ee815 [RISCV] Add codegen for RV32F arithmetic and conversion operations
Currently, only a soft floating point ABI is supported.

llvm-svn: 327976
2018-03-20 12:45:35 +00:00
Krzysztof Parzyszek dca383123f [Hexagon] Improve scheduling based on register pressure
Patch by Brendon Cahoon.

llvm-svn: 327975
2018-03-20 12:28:43 +00:00
Dylan McKay 212841b7ad [AVR] Add a regression test for struct return lowering
The test is taken from
https://github.com/avr-rust/rust/issues/57

The originally implementation of struct return lowering was made in
r325474.

Patch by Peter Nimmervoll

llvm-svn: 327967
2018-03-20 11:23:03 +00:00
Jonas Paulsson 8ad035d8e5 [SystemZ] Add "REQUIRES: asserts" to test case to fix build bots.
llvm-svn: 327958
2018-03-20 08:29:19 +00:00
Martin Storsjo 802b434156 [X86] Properly implement the calling convention for f80 for mingw/x86_64
In these cases, both parameters and return values are passed
as a pointer to a stack allocation.

MSVC doesn't use the f80 data type at all, while it is used
for long doubles on mingw.

Normally, this part of the calling convention is handled
within clang, but for intrinsics that are lowered to libcalls,
it may need to be handled within llvm as well.

Differential Revision: https://reviews.llvm.org/D44592

llvm-svn: 327957
2018-03-20 06:19:38 +00:00
Craig Topper 4778fa7e8a [X86] Fix the SchedRW for memory forms of CMP and TEST.
They were incorrectly marked as RMW operations. Some of the CMP instrucions worked, but the ones that use a similar encoding as RMW form of ADD ended up marked as RMW.

TEST used the same tablegen class as some of the CMPs.

llvm-svn: 327947
2018-03-20 03:55:17 +00:00
Craig Topper 3e9462607e [X86] Add TEST16mi/TEST32mi/TEST64mi32 to the Sandybridge/Haswell/Broadwell/Skylake scheduler models.
Move it from a load+store group on SNB to a load only group, the same group as CMP.

llvm-svn: 327944
2018-03-20 03:02:03 +00:00
Craig Topper 7c90e29cf8 [X86] Add ROR/ROL/SHR/SAR by 1 instructions to the Sandy Bridge scheduler model.
I assume these match the generic immediate version like they do in the other models.

llvm-svn: 327943
2018-03-20 03:01:59 +00:00
Quentin Colombet 508f68233d [ShrinkWrap] Take into account landing pad
When scanning the function for CSRs uses and defs, also check if
the basic block are landing pads.
Consider that landing pads needs the CSRs to be properly set.
That way we force the prologue/epilogue to always be pushed out
of the problematic "throw" region. The "throw" region is
problematic because the jumps are not properly modeled.

Fixes PR36513

llvm-svn: 327942
2018-03-20 02:44:40 +00:00
Shiva Chen cbd498ac10 [RISCV] Preserve stack space for outgoing arguments when the function contain variable size objects
E.g.

bar (int x)
{
  char p[x];

  push outgoing variables for foo.
  call foo
}

We need to generate stack adjustment instructions for outgoing arguments by
eliminateCallFramePseudoInstr when the function contains variable size
objects to avoid outgoing variables corrupt the variable size object.

Default hasReservedCallFrame will return !hasFP().
We don't want to generate extra sp adjustment instructions when hasFP()
return true, So We override hasReservedCallFrame as !hasVarSizedObjects().

Differential Revision: https://reviews.llvm.org/D43752

llvm-svn: 327938
2018-03-20 01:39:17 +00:00
Craig Topper 2330d6cd55 [X86] Fix the SNB scheduler for BLENDVB.
PBLENDVBrr0 was with the memory version of VBLENDVB and PBLENDVBrm0 was missing.

llvm-svn: 327937
2018-03-20 01:30:21 +00:00
Rafael Espindola d947977e87 Run dos2unix on a test. NFC.
llvm-svn: 327934
2018-03-20 01:06:29 +00:00
Aaron Smith 6738960588 [SelectionDAG] Transfer DbgValues when integer operations are promoted
Summary:
DbgValue nodes were not transferred when integer DAG nodes were promoted. For example, if an i32 add node was promoted to an i64 add node by DAGTypeLegalizer::PromoteIntegerResult(), its DbgValue node was not transferred to the new node. The simple fix is to update SetPromotedInteger() to transfer DbgValues. 

Add AArch64/dbg-value-i8.ll to test this change and fix ARM/debug-info-d16-reg.ll which had the wrong DILocalVariable nodes with arg numbers even though they are not for function parameters.

Patch by Se Jong Oh!

Reviewers: vsk, JDevlieghere, aprantl

Reviewed By: JDevlieghere

Subscribers: javed.absar, kristof.beyls, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D44546

llvm-svn: 327919
2018-03-19 22:58:50 +00:00
Jessica Paquette 563548d8f3 [MachineOutliner] AArch64: Emit CFI instructions when outlining calls
When outlining calls, the outliner needs to update CFI to ensure that, say,
exception handling works. This commit adds that functionality and adds a test
just for call outlining.

Call outlining stuff in machine-outliner.mir should be moved into
machine-outliner-calls.mir in a later commit.

llvm-svn: 327917
2018-03-19 22:48:40 +00:00
Krzysztof Parzyszek c76fefc6de [Hexagon] Add REQUIRES: asserts to test/CodeGen/Hexagon/v6vec_inc1.ll
llvm-svn: 327907
2018-03-19 21:05:21 +00:00
Nirav Dave 3264c1bdf6 [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying node id
invariant traversal and correcting typo.

llvm-svn: 327898
2018-03-19 20:19:46 +00:00
Martin Storsjo 9a55c1b0dc [ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes
This extends the use of this attribute on ARM and AArch64 from
SVN r325900 (where it was only checked for fixed stack
allocations on ARM/AArch64, but for all stack allocations on X86).

This also adds a testcase for the existing use of disabling the
fixed stack probe with the attribute on ARM and AArch64.

Differential Revision: https://reviews.llvm.org/D44291

llvm-svn: 327897
2018-03-19 20:06:50 +00:00
Sanjay Patel f2d85e78df [AMDGPU] change test to avoid NaN math
llvm-svn: 327891
2018-03-19 19:26:22 +00:00
Sanjay Patel dad3d13b99 [AMDGPU] adjust tests to be nan-free
As suggested in D44521 - bitcast to integer for the math,
so we preserve the intent of these tests when NaN math
gets folded away.

llvm-svn: 327890
2018-03-19 19:23:53 +00:00
Lei Huang ecfede94a7 [Power9]Legalize and emit code for quad-precision copySign/abs/nabs/neg/sqrt
Legalize and emit code for quad-precision floating point operations:

  * xscpsgnqp
  * xsabsqp
  * xsnabsqp
  * xsnegqp
  * xssqrtqp

Differential Revision: https://reviews.llvm.org/D44530

llvm-svn: 327889
2018-03-19 19:22:52 +00:00
Krzysztof Parzyszek 461e6691eb [Hexagon] Add a few more lit tests
llvm-svn: 327884
2018-03-19 19:03:18 +00:00
Craig Topper 5e65996fac [X86] Remove OUT32rr/OUT8rr/OUT32ri/OUT8ri from Sandybridge scheduler model.
PR35590 was already filed for this information being wrong. It's probably better to default to WriteSystem behavior instead of using something completely wrong.

llvm-svn: 327882
2018-03-19 19:00:35 +00:00
Craig Topper b4c7873f8c [X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models.
JRCXZ was already present, but not the others.

We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output.

llvm-svn: 327881
2018-03-19 19:00:32 +00:00
Craig Topper afabf36505 [X86] Correct regular expression in Zen scheduler model that was excluding JECXZ instruction.
The regex was looking for JECXZ_32 or JECXZ_64, but their is just one instruction called JECXZ. They used to exist as separate instructions, but were merged over 3 years ago.

llvm-svn: 327880
2018-03-19 19:00:29 +00:00
Lei Huang 6d1596a98c [PowerPC][Power9]Legalize and emit code for quad-precision add/div/mul/sub
Legalize and emit code for quad-precision floating point operations:

  * xsaddqp
  * xssubqp
  * xsdivqp
  * xsmulqp

Differential Revision: https://reviews.llvm.org/D44506

llvm-svn: 327878
2018-03-19 18:52:20 +00:00
Nemanja Ivanovic d9d5bd3067 [PowerPC] Make AddrSpaceCast noop
PowerPC targets do not use address spaces. As a result, we can get selection
failures with address space casts. This patch makes those casts noops.

Patch by Valentin Churavy.

Differential revision: https://reviews.llvm.org/D43781

llvm-svn: 327877
2018-03-19 18:50:02 +00:00
Craig Topper 259eaa6e7c [X86] Remove sse41 specific code from lowering v16i8 multiply
With the SRAs removed from the SSE2 code in D44267, then there doesn't appear to be any advantage to the sse41 code. The punpcklbw instruction and pmovsx seem to have the same latency and throughput on most CPUs. And the SSE41 code requires moving the upper 64-bits into the lower 64-bit before the sign extend can be done. The unpckhbw in sse2 code can do better than that.

llvm-svn: 327869
2018-03-19 17:31:41 +00:00
Craig Topper 5ccd87233f [X86] Make the multiply and divide itineraries more consistent.
Sometimes we used the same itinerary for MEM and REG forms, but that seems inconsistent with our usual usage.

We also used the MUL8 itinerary for MULX32/64 which was also weird.

The test changes are because we were using IIC_IMUL32_RR and IIC_IMUL64_RR instead of IIC_IMUL32_REG/IIC_IMUL64_REG for the 32 and 64 bit multiplies that produce double width result.

llvm-svn: 327866
2018-03-19 16:38:33 +00:00
Zaara Syeda 01f414baaa Revert [MachineLICM] This reverts commit rL327856
Failing build bots. Revert the commit now.

llvm-svn: 327864
2018-03-19 16:19:44 +00:00
Matt Davis 4b54e5fc38 [CodeGen] Avoid handling DBG_VALUE in the LivePhysRegs (addUses,removeDefs,stepForward)
Summary:
This patch prevents DBG_VALUE instructions from influencing
LivePhysRegs::stepBackwards and stepForwards.  In at least one case,
specifically branch folding, the stepBackwards logic was having an
influence on code generation.  The result was that certain code
compiled with '-g -O2' would differ from that compiled with '-O2'
alone. It seems that the original logic, accounting for DBG_VALUE,
was influencing the placement of an IMPLICIT_DEF which had a later
impact on how blocks were processed in branch folding.

Reviewers: kparzysz, MatzeB

Reviewed By: kparzysz

Subscribers: bjope, llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D43850

llvm-svn: 327862
2018-03-19 16:06:40 +00:00
Zaara Syeda ff05e2b0e6 [MachineLICM] Add functions to MachineLICM to hoist invariant stores
This patch adds functions to allow MachineLICM to hoist invariant stores.
Currently, MachineLICM does not hoist any store instructions, however
when storing the same value to a constant spot on the stack, the store
instruction should be considered invariant and be hoisted. The function
isInvariantStore iterates each operand of the store instruction and checks
that each register operand satisfies isCallerPreservedPhysReg. The store
may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore.
This patch also adds the PowerPC changes needed to consider the stack
register as caller preserved.

Differential Revision: https://reviews.llvm.org/D40196

llvm-svn: 327856
2018-03-19 14:52:25 +00:00
Simon Pilgrim 30c38c3849 [X86] Generalize schedule classes to support multiple stages
Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overriden from their defaults.

This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases.

I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific.

Differential Revision: https://reviews.llvm.org/D44612

llvm-svn: 327855
2018-03-19 14:46:07 +00:00
Sanjay Patel 05daae75ad [x86] put nops into the WriteNop class and customize for Jaguar
1. Given that we already have a classification bucket with 'nop' in the name, 
   that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'.
2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably 
   llvm-mca) how to model the resource requirements better even though a nop has no 
   dependencies.

Differential Revision: https://reviews.llvm.org/D44608

llvm-svn: 327853
2018-03-19 14:26:50 +00:00
Matt Arsenault fed0a45036 AMDGPU/GlobalISel: RegBankSelect for basic int ops
llvm-svn: 327843
2018-03-19 14:07:23 +00:00
Matt Arsenault 69932e4d69 AMDGPU: Don't leave dead illegal VGPR->SGPR copies
Normally DCE kills these, but at -O0 these get left behind
leaving suspicious looking illegal copies.

Replace with IMPLICIT_DEF to avoid iterator issues.

llvm-svn: 327842
2018-03-19 14:07:15 +00:00
Clement Courbet 6d047b70a4 [MergeICmps] Re-land 324317 "Enable the MergeICmps Pass by default."
Now that PR36557 is fixed.

llvm-svn: 327840
2018-03-19 13:37:04 +00:00
Sjoerd Meijer d16037d9bb [ARM] Support for v4f16 and v8f16 vectors
This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which
uses v4f16 and v8f16 vector operands and return values. All the moving parts
are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16
intrinsic. In a follow-up patch the rest of the intrinsics and tests will be
added.

Differential Revision: https://reviews.llvm.org/D44538

llvm-svn: 327839
2018-03-19 13:35:25 +00:00
Jonas Paulsson a6216ec4cc [SystemZ] Bugfix of CC liveness in emitMemMemWrapper (CLC).
If DoneMBB becomes empty it must have CC added to its live-in list, since it
will fall-through into EndMBB. This happens when the CLC loop does the
complete range.

Review: Ulrich Weigand
llvm-svn: 327834
2018-03-19 13:05:22 +00:00
Alex Bradbury 0171a9f4ec [RISCV] Peephole optimisation for load/store of global values or constant addresses
(load (add base, off), 0) -> (load base, off)
(store val, (add base, off)) -> (store val, base, off)

This is similar to an equivalent peephole optimisation in PPCISelDAGToDAG.

llvm-svn: 327831
2018-03-19 11:54:28 +00:00
Craig Topper d10ceffa5f [X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to match ADD8i8.
Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since its the 8i8 is just the register forced to AL instead of coming from modrm.

llvm-svn: 327820
2018-03-19 04:21:40 +00:00
Dylan McKay a35ee70641 [AVR] Lower i128 divisions to runtime library calls
This patch adds i128 division support by instruction LLVM to lower
128-bit divisions to the __udivmodti4 and __divmodti4 rtlib functions.

This also adds test for 64-bit division and 128-bit division.

Patch by Peter Nimmervoll.

llvm-svn: 327814
2018-03-19 00:55:50 +00:00
Simon Pilgrim 203876f104 [X86][Btver2] Fix crc32 schedule costs
The default is currently FAdd for some reason

llvm-svn: 327807
2018-03-18 19:54:42 +00:00
Craig Topper 2d451e73f9 [X86] Fix a bunch of overlapping regular expressions in the scheduler models.
llvm-svn: 327787
2018-03-18 08:38:06 +00:00
Craig Topper 89dcda3e90 [X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models.
The information was so wildly inaccurate and incomplete its better to just remove it.

MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information.

MMX_MASKMOVQ and MASKMOVDQU were completely missing.

MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction.

Filed PR36780 to track fixing this right.

llvm-svn: 327783
2018-03-18 03:24:42 +00:00
Nirav Dave 5f0ab71b62 Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172""
as it times out building test-suite on PPC.

llvm-svn: 327778
2018-03-17 19:24:54 +00:00
Nirav Dave 982d3a56ea [DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"
Reland ISel cycle checking improvements after simplifying and reducing
node id invariant traversal.

llvm-svn: 327777
2018-03-17 17:42:10 +00:00
Matt Arsenault abdc4f2dc7 AMDGPU/GlobalISel: Cleanup constant legality
llvm-svn: 327774
2018-03-17 15:17:48 +00:00
Matt Arsenault 685d1e8157 AMDGPU/GlobalISel: Basic G_GEP legality
llvm-svn: 327773
2018-03-17 15:17:45 +00:00
Matt Arsenault 85803366d6 AMDGPU/GlobalISel: Basic legality for load/store
llvm-svn: 327772
2018-03-17 15:17:41 +00:00
Oren Ben Simhon fdd72fd522 [X86] Added support for nocf_check attribute for indirect Branch Tracking
X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET).
IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp.
The `nocf_check` attribute has two roles in the context of X86 IBT technology:
	1. Appertains to a function - do not add ENDBR instruction at the beginning of the function.
	2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction.

This patch implements `nocf_check` context for Indirect Branch Tracking.
It also auto generates `nocf_check` prefixes before indirect branchs to jump tables that are guarded by range checks.

Differential Revision: https://reviews.llvm.org/D41879

llvm-svn: 327767
2018-03-17 13:29:46 +00:00
Jonas Paulsson dbcf1bf503 [SystemZ] Add 'REQUIRES: asserts' to test case using debug output.
llvm-svn: 327766
2018-03-17 09:15:13 +00:00
Jonas Paulsson 138960770c [SystemZ] computeKnownBitsForTargetNode() / ComputeNumSignBitsForTargetNode()
Improve/implement these methods to improve DAG combining. This mainly
concerns intrinsics.

Some constant operands to SystemZISD nodes have been marked Opaque to avoid
transforming back and forth between generic and target nodes infinitely.

Review: Ulrich Weigand
llvm-svn: 327765
2018-03-17 08:32:12 +00:00
Jonas Paulsson e9f7fa83d5 [SelectionDAG] Handle big endian target BITCAST in computeKnownBits()
The BITCAST handling in computeKnownBits() previously only worked for little
endian.

This patch reverses the iteration over elements for a big endian target which
allows this to work in this case also.

SystemZ test case.

Review: Eli Friedman
https://reviews.llvm.org/D44249

llvm-svn: 327764
2018-03-17 08:04:00 +00:00
Jessica Paquette b3e7dc9144 [MachineOutliner] Make KILLs invisible
At the point the outliner runs, KILLs don't impact anything, but they're still
considered unique instructions. This commit makes them invisible like
DebugValues so that they can still be outlined without impacting outlining
decisions.

llvm-svn: 327760
2018-03-16 22:53:34 +00:00
Krzysztof Parzyszek f81a8d03c1 [Hexagon] Avoid bank conflicts in post-RA scheduler
Avoid scheduling two loads in such a way that they would end up in the
same packet. If there is a load in a packet, try to schedule a non-load
next.

Patch by Brendon Cahoon.

llvm-svn: 327742
2018-03-16 20:55:49 +00:00
Krzysztof Parzyszek 889cbcacbc [Hexagon] Add lit testcases for atomic intrinsics
Patch by Ben Craig.

llvm-svn: 327737
2018-03-16 20:21:43 +00:00
Reid Kleckner f8b51c5f90 [IR] Avoid the need to prefix MS C++ symbols with '\01'
Now the Windows mangling modes ('w' and 'x') do not do any mangling for
symbols starting with '?'. This means that clang can stop adding the
hideous '\01' leading escape. This means LLVM debug logs are less likely
to contain ASCII escape characters and it will be easier to copy and
paste MS symbol names from IR.

Finally.

For non-Windows platforms, names starting with '?' still get IR
mangling, so once clang stops escaping MS C++ names, we will get extra
'_' prefixing on MachO. That's fine, since it is currently impossible to
construct a triple that uses the MS C++ ABI in clang and emits macho
object files.

Differential Revision: https://reviews.llvm.org/D7775

llvm-svn: 327734
2018-03-16 20:13:32 +00:00
Farhana Aleen c6c9dc8773 [AMDGPU] Supported ds_write_b128 generation.
Summary: This is a follow-on patch of https://reviews.llvm.org/D44210

Author: FarhanaAleen

Reviewed By: msearles

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44319

llvm-svn: 327726
2018-03-16 18:12:00 +00:00
Craig Topper e6913ec340 [X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits.
We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations.

So instead of doing that, this patch adds a post processing step that detects when the moves are unnecesssary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one.

Differential Revision: https://reviews.llvm.org/D44289

llvm-svn: 327724
2018-03-16 17:13:42 +00:00
Dmitry Preobrazhensky 4c8f4234b6 [AMDGPU][MC][GFX8][GFX9][DISASSEMBLER] Added "_e32" suffix to 32-bit VINTRP opcodes
See bug 36751: https://bugs.llvm.org/show_bug.cgi?id=36751

Differential Revision: https://reviews.llvm.org/D44529

Reviewers: artem.tamazov, arsenm
llvm-svn: 327723
2018-03-16 16:38:04 +00:00
Krzysztof Parzyszek 9915291ab8 [Hexagon] Fix zero-extending non-HVX bool vectors
llvm-svn: 327712
2018-03-16 15:03:37 +00:00
Simon Pilgrim 23578e7d3c [X86][Btver2] Add correct mul/imul schedule costs
Integer multiply is performed on the JMul function unit and i64 requires double pumping

llvm-svn: 327707
2018-03-16 14:01:01 +00:00
Simon Pilgrim 8d28ae6aec [X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costs
Don't use WriteIMul defaults

llvm-svn: 327706
2018-03-16 13:43:55 +00:00
Sjoerd Meijer d391a1a985 [ARM] FP16 codegen support for VSEL
This implements lowering of SELECT_CC for f16s, which enables
codegen of VSEL with f16 types.

Differential Revision: https://reviews.llvm.org/D44518

llvm-svn: 327695
2018-03-16 08:06:25 +00:00
Craig Topper 1b8cf49704 [SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job picking the result type for the setcc.
Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type.

But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. If for example the result type was promoted to v8i16 from v8i1, but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86.

This legality check also caused us reject the getSetccResultType when the input type needed to be widened or split. Even though that result wouldn't have caused legalization to get stuck.

This patch tries to fix this by detecting the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try a ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to default type to promote to.

For any other illegal values we might get back from the initial call to getSetccResultType we just keep and allow it to be re-legalized later via splitting or widening or scalarizing.

llvm-svn: 327683
2018-03-15 23:04:11 +00:00
Craig Topper c3983c34cd [X86] Make sure we use FSUB instruction as the reference for operand order in isAddSubOrSubAdd when recognizing subadd
The FADD part of the addsub/subadd pattern can have its operands commuted, but when checking for fsubadd we were using the fadd as reference and commuting the fsub node.

llvm-svn: 327660
2018-03-15 20:30:54 +00:00
Craig Topper 46502fa2ef [X86] Add test case showing bad fmsubadd creation due to bad commuting.
The code that creates fmsubadd from shuffle vector has some code to allow commuting the operands of the fadd node. This code was originally created when we only recognized fmaddsub. When fmsubadd support was added this code was not updated and is now commuting the fsub operands instead.

llvm-svn: 327659
2018-03-15 20:30:51 +00:00
Guozhi Wei 9c916584ba [PPC] Avoid non-simple MVT in STBRX optimization
PR35402 triggered this case. It bswap and stores a 48bit value, current STBRX optimization transforms it into STBRX. Unfortunately 48bit is not a simple MVT, there is no PPC instruction to support it, and it can't be automatically expanded by llvm, so caused a crash.

This patch detects the non-simple MVT and returns early.

Differential Revision: https://reviews.llvm.org/D44500

llvm-svn: 327651
2018-03-15 17:49:12 +00:00
Zaara Syeda 1110c4d336 [PowerPC] Optimize TLS initial-exec sequence to use X-Form loads/stores
This patch adds new load/store instructions for integer scalar types
which can be used for X-Form when fed by add with an @tls relocation.

Differential Revision: https://reviews.llvm.org/D43315

llvm-svn: 327635
2018-03-15 15:34:41 +00:00
Simon Pilgrim d30df5769e [X86][Btver2] Remove JAny resource, and map system/microcoded instructions to JALU pipes
Simplifies throughput to the issue width (1/2) instead of permitting any pipe (1/6)

llvm-svn: 327632
2018-03-15 15:12:12 +00:00
Simon Pilgrim fb7aa57bf1 [X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and Writetore scheduler classes
As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types.

I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428.

Differential Revision: https://reviews.llvm.org/D44471

llvm-svn: 327630
2018-03-15 14:45:30 +00:00
Simon Pilgrim 69a4132f63 [X86] Regenerate schedule tests with zero latency comments
llvm-svn: 327628
2018-03-15 14:30:59 +00:00
Sjoerd Meijer dfc7eb490a [AArch64] Codegen tests for the Armv8.2-A FP16 intrinsics
This is a follow up of the AArch64 FP16 intrinsics work;
the codegen tests had not been added yet.

Differential Revision: https://reviews.llvm.org/D44510

llvm-svn: 327624
2018-03-15 13:42:28 +00:00
Craig Topper ff6e82c9d0 [X86] Add test cases for 512-bit addsub from build_vector.
There is no 512 bit addsub instruction, but we partially match it handle fmaddsub matching. We explicitly bail out for 512 bit vectors after failing the fmaddsub match, but we had no test coverage for that bail out.

We might want to consider splitting and using 256 bit instructions instead of the long sequence seen here.

llvm-svn: 327605
2018-03-15 06:49:01 +00:00
Craig Topper 26a3a80c87 [X86] Add support for matching FMSUBADD from build_vector.
llvm-svn: 327604
2018-03-15 06:14:55 +00:00
Mark Searles c3c02bde73 [AMDGPU] Waitcnt pass: Modify the waitcnt pass to propagate info in the case of a single basic block loop. mergeInputScoreBrackets() does this for us; update it so that it processes the single bb's score bracket when processing the single bb's preds. It is, after all, a pred of itself, so it's score bracket is needed.
Differential Revision: https://reviews.llvm.org/D44434

llvm-svn: 327583
2018-03-14 22:04:32 +00:00
Reid Kleckner 3a7a2e4a0a [FastISel] Sink local value materializations to first use
Summary:
Local values are constants, global addresses, and stack addresses that
can't be folded into the instruction that uses them. For example, when
storing the address of a global variable into memory, we need to
materialize that address into a register.

FastISel doesn't want to materialize any given local value more than
once, so it generates all local value materialization code at
EmitStartPt, which always dominates the current insertion point. This
allows it to maintain a map of local value registers, and it knows that
the local value area will always dominate the current insertion point.

The downside is that local value instructions are always emitted without
a source location. This is done to prevent jumpy line tables, but it
means that the local value area will be considered part of the previous
statement. Consider this C code:
  call1();      // line 1
  ++global;     // line 2
  ++global;     // line 3
  call2(&global, &local); // line 4

Today we end up with assembly and line tables like this:
  .loc 1 1
  callq call1
  leaq global(%rip), %rdi
  leaq local(%rsp), %rsi
  .loc 1 2
  addq $1, global(%rip)
  .loc 1 3
  addq $1, global(%rip)
  .loc 1 4
  callq call2

The LEA instructions in the local value area have no source location and
are treated as being on line 1. Stepping through the code in a debugger
and correlating it with the assembly won't make much sense, because
these materializations are only required for line 4.

This is actually problematic for the VS debugger "set next statement"
feature, which effectively assumes that there are no registers live
across statement boundaries. By sinking the local value code into the
statement and fixing up the source location, we can make that feature
work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and
https://crbug.com/793819.

This change is obviously not enough to make this feature work reliably
in all cases, but I felt that it was worth doing anyway because it
usually generates smaller, more comprehensible -O0 code. I measured a
0.12% regression in code generation time with LLC on the sqlite3
amalgamation, so I think this is worth doing.

There are some special cases worth calling out in the commit message:
1. local values materialized for phis
2. local values used by no-op casts
3. dead local value code

Local values can be materialized for phis, and this does not show up as
a vreg use in MachineRegisterInfo. In this case, if there are no other
uses, this patch sinks the value to the first terminator, EH label, or
the end of the BB if nothing else exists.

Local values may also be used by no-op casts, which adds the register to
the RegFixups table. Without reversing the RegFixups map direction, we
don't have enough information to sink these instructions.

Lastly, if the local value register has no other uses, we can delete it.
This comes up when fastisel tries two instruction selection approaches
and the first materializes the value but fails and the second succeeds
without using the local value.

Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo

Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D43093

llvm-svn: 327581
2018-03-14 21:54:21 +00:00
Francis Visoiu Mistrih e85b06d65f [CodeGen] Use MIR syntax for MachineMemOperand printing
Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)".

rdar://38163529

Differential Revision: https://reviews.llvm.org/D42377

llvm-svn: 327580
2018-03-14 21:52:13 +00:00
Simon Pilgrim adf72e8549 [X86] Add haswell testing for PR35635 as well.
To improve complete model testing for schedulers for instructions with multiple results.

llvm-svn: 327572
2018-03-14 21:03:09 +00:00
Francis Visoiu Mistrih 164560bd74 [AArch64] Emit CSR loads in the same order as stores
Optionally allow the order of restoring the callee-saved registers in the
epilogue to be reversed.

The flag -reverse-csr-restore-seq generates the following code:

```
stp     x26, x25, [sp, #-64]!
stp     x24, x23, [sp, #16]
stp     x22, x21, [sp, #32]
stp     x20, x19, [sp, #48]

; [..]

ldp     x24, x23, [sp, #16]
ldp     x22, x21, [sp, #32]
ldp     x20, x19, [sp, #48]
ldp     x26, x25, [sp], #64
ret
```

Note how the CSRs are restored in the same order as they are saved.

One exception to this rule is the last `ldp`, which allows us to merge
the stack adjustment and the ldp into a post-index ldp. This is done by
first generating:

ldp x26, x27, [sp]
add sp, sp, #64

which gets merged by the arm64 load store optimizer into

ldp x26, x25, [sp], #64

The flag is disabled by default.

llvm-svn: 327569
2018-03-14 20:34:03 +00:00
Craig Topper 9c098ed819 [X86] Add back fast-isel code for handling i8 shifts.
I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds.

This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86 that shift amounts are always masked to 5-bits(except 64-bits). So if the masked value is still out bounds the result will be 0.

Fixes PR36731.

llvm-svn: 327540
2018-03-14 17:57:19 +00:00
Francis Visoiu Mistrih 084e7d8770 [AArch64] Keep track of MIFlags in the LoadStoreOptimizer
Merging:

* $x26, $x25 = frame-setup LDPXi $sp, 0
* $sp = frame-destroy ADDXri $sp, 64, 0

into an LDPXpost should preserve the flags from both instructions as
following:

* frame-setup frame-destroy LDPXpost

Differential Revision: https://reviews.llvm.org/D44446

llvm-svn: 327533
2018-03-14 17:10:58 +00:00
Craig Topper b36cb20ef9 [X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps created an and mask that can be matched as a zero extend.
I had to modify the bswap recognition to allow unshrunk masks to make this work.

Fixes PR36689.

Differential Revision: https://reviews.llvm.org/D44442

llvm-svn: 327530
2018-03-14 16:55:15 +00:00
Simon Pilgrim d1c3c995c0 [X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327524
2018-03-14 15:47:08 +00:00
Arnold Schwaighofer bf1638daa8 SjLjEHPrepare: Don't reg-to-mem swifterror values
swifterror llvm values model the swifterror register as memory at the
LLVM IR level. ISel will perform adhoc mem-to-reg on them. swifterror
values are constraint in how they can be used. Spilling them to memory
is not allowed.

SjLjEHPrepare tried to lower swifterror values to memory which is
unecessary since the back-end will spill and reload the register as
neccessary (as long as clobbering calls are marked as such which is the
case here) and further leads to invalid IR because swifterror values
can't be stored to memory.

rdar://38164004

llvm-svn: 327521
2018-03-14 15:44:07 +00:00
Alexander Ivchenko 86ef9ab28f [GlobalIsel][X86] Support for G_SDIV instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44430

llvm-svn: 327520
2018-03-14 15:41:11 +00:00
Simon Pilgrim d594942928 [X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs
Account for ymm double pumping and add proper pshufb/permutevar support

llvm-svn: 327510
2018-03-14 14:05:19 +00:00
Simon Pilgrim de995e6e37 [X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions
They shouldn't be treated as pure loads.

Found while investigating D44428

llvm-svn: 327505
2018-03-14 13:22:56 +00:00
Martin Storsjo bde677289a [AArch64] Don't produce R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC
Support for this relocation is missing in both LLD and GNU binutils
at the moment.

This reverts the ELF parts of SVN r327316.

llvm-svn: 327503
2018-03-14 13:09:10 +00:00
Alexander Ivchenko 0bd4d8c901 [GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHL
Support G_LSHR/G_ASHR/G_SHL. We have 3 variance for
shift instructions : shift gpr, shift imm, shift 1.
Currently GlobalIsel TableGen generate patterns for
shift imm and shift 1, but with shiftCount i8.
In G_LSHR/G_ASHR/G_SHL like LLVM-IR both arguments
has the same type, so for now only shift i8 can use
auto generated TableGen patterns.

The support of G_SHL/G_ASHR enables tryCombineSExt
from LegalizationArtifactCombiner.h to hit, which
results in different legalization for the following tests:
    LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll
    LLVM :: CodeGen/X86/GlobalISel/gep.ll
    LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir

-; X64-NEXT:    movsbl %dil, %eax
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    shll %cl, %edi
+; X64-NEXT:    movl $24, %ecx
+; X64-NEXT:    # kill: def $cl killed $ecx
+; X64-NEXT:    sarl %cl, %edi
+; X64-NEXT:    movl %edi, %eax

..which is not optimal and should be addressed later.

Rework of the patch by igorb

Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44395

llvm-svn: 327499
2018-03-14 11:23:57 +00:00
Alexander Ivchenko 327de80529 [GlobalIsel][X86] Support for G_ZEXT instruction
Reviewed By: igorb

Differential Revision: https://reviews.llvm.org/D44378

llvm-svn: 327482
2018-03-14 09:11:23 +00:00
Craig Topper 9ca7e67c4c [X86] Re-generate test to get proper capitalization of its CHECK lines. NFC
llvm-svn: 327462
2018-03-13 23:31:48 +00:00
Craig Topper cc060e921b [X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats.
This better able to detect undef and zeros pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs.

This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best.

We probably want to merge this with the vXi1 concat code since they are very similar.

llvm-svn: 327454
2018-03-13 22:05:25 +00:00
Craig Topper 4aeec51986 [DAGCombiner] Allow visitEXTRACT_SUBVECTOR to combine with BUILD_VECTORS between LegalizeVectorOps and LegalizeDAG.
BUILD_VECTORs aren't themselves legalized until LegalizeDAG so we should still be able to create an "illegal" one before that. This helps combine with BUILD_VECTORS that are introduced during LegalizeVectorOps due to unrolling.

llvm-svn: 327446
2018-03-13 20:36:28 +00:00
Francis Visoiu Mistrih 3abf05739f [MIR] Allow frame-setup and frame-destroy on the same instruction
Nothing prevents us from having both frame-setup and frame-destroy on
the same instruction.

When merging:
* frame-setup OPCODE1
* frame-destroy OPCODE2
into
* frame-setup frame-destroy OPCODE3

we want to be able to print and parse both flags.

llvm-svn: 327442
2018-03-13 19:53:16 +00:00
Sanjay Patel bb45cc126d [x86] add test for WriteZero sched class instructions; NFC
Nops should have zero latency because there is no result.
Idioms like 'xorps xmm0, xmm0' may have zero latency because 
they are handled without using an execution unit.

llvm-svn: 327435
2018-03-13 19:20:01 +00:00
Simon Pilgrim 9855b39380 [DAGCombine] visitREM - Don't assume that one divrem isn't driving another
Under some circumstances the divrems won't have been combined together before getting to this code.

So replace the assertion with a if() guard to not expand to X-((X/C)*C) to give the other combine chance to happen.

Reduced from OSS-Fuzz #6883
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883

llvm-svn: 327424
2018-03-13 17:17:15 +00:00
Simon Pilgrim 3d4c86d399 [X86][Btver2] Split i8/i16/i32/i64 div/idiv costs
We were assuming a mixture of 32/64 division costs.

llvm-svn: 327407
2018-03-13 15:22:24 +00:00
Simon Dardis 476ed8f26e [mips] Fix the definitions of the EVA instructions
Correct their availability to their respective ISAs.

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D44209

llvm-svn: 327403
2018-03-13 14:39:44 +00:00
Simon Dardis 9d7e9032f1 [mips] Don't create nested CALLSEQ_START..CALLSEQ_END nodes.
For the MIPS O32 ABI, the current call lowering logic naively lowers each
call, creating the reserved argument area to hold the argument spill areas for
$a0..$a3 and the outgoing parameter area if one is required at each call site.

In the case of a sufficently large byval argument, a call to memcpy is used
to write the start+16..end of the argument into the outgoing parameter area.
This is done within the CALLSEQ_START..CALLSEQ_END of the callee. The CALLSEQ
nodes are responsible for performing the necessary stack adjustments.

Since the O32/N32/N64 MIPS ABIs do not have a red-zone and writing below the
stack pointer and reading the values back is unpredictable, the call to memcpy
cannot be hoisted out of the callee's CALLSEQ nodes.

However, for the O32 ABI requires the reserved argument area for functions
which have parameters. The naive lowering of calls will then create nested
CALLSEQ sequences. For N32 and N64 these nodes are also created, but with
zero stack adjustments as those ABIs do not have a reserved argument area.

This patch addresses the correctness issue by recognizing the special case
of lowering a byval argument that uses memcpy. By recognizing that the
incoming chain already has a CALLSEQ_START node on it when calling memcpy,
the CALLSEQ nodes are not created. For the N32 and N64 ABIs, this is not an
issue, as no stack adjustment has to be performed.

For the O32 ABI, the correctness reasoning is different. In the case of a
sufficently large byval argument, registers a0..a3 are going to be used for
the callee's arguments, mandating the creation of the reserved argument area.
The call to memcpy in the naive case will also create its own reserved
argument area. However, since the reserved argument area consists of undefined
values, both calls can use the same reserved argument area.

Reviewers: abeserminji, atanasyan

Differential Revision: https://reviews.llvm.org/D44296

llvm-svn: 327388
2018-03-13 12:50:03 +00:00
Simon Pilgrim 93bd7187f4 [X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them.
llvm-svn: 327385
2018-03-13 12:22:58 +00:00
Jonas Paulsson 5612bb292c [CodeGenPrepare] Respect endianness in splitMergedValStore.
splitMergedValStore will split a store into two if target prefers this, or if
-force-split-store is passed.

This patch adds the missing handling for endianness in this function along
with a test case.

Review: Eli Friedman
https://reviews.llvm.org/D44396

llvm-svn: 327375
2018-03-13 08:36:20 +00:00
Yonghong Song c88bcdec43 bpf: Extends zero extension elimination beyond comparison instructions
The current zero extension elimination was restricted to operands of
comparison. It actually could be extended to more cases.

For example:

  int *inc_p (int *p, unsigned a)
  {
    return p + a;
  }

'a' will be promoted to i64 during addition, and the zero extension could
be eliminated as well.

For the elimination optimization, it should be much better to start
recognizing the candidate sequence from the SRL instruction instead of J*
instructions.

This patch makes it an generic zero extension elimination pass instead of
one restricted with comparison.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327367
2018-03-13 06:47:03 +00:00
Yonghong Song 905d13c123 bpf: J*_RR should check both operands
There is a mistake in current code that we "break" out the optimization
when the first operand of J*_RR doesn't qualify the elimination. This
caused some elimination opportunities missed, for example the one in the
testcase.

The code should just fall through to handle the second operand.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327366
2018-03-13 06:47:02 +00:00
Yonghong Song 89e47ac671 bpf: Tighten subregister definition check
The current subregister definition check stops after the MOV_32_64
instruction.

This means we are thinking all the following instruction sequences
are safe to be eliminated:

  MOV_32_64 rB, wA
  SLL_ri    rB, rB, 32
  SRL_ri    rB, rB, 32

However, this is *not* true. The source subregister wA of MOV_32_64 could
come from a implicit truncation of 64-bit register in which case the high
bits of the 64-bit register is not zeroed, therefore we can't eliminate
above sequence.

For example, for i32_val, we shouldn't do the elimination:

  long long bar ();

  int foo (int b, int c)
  {
    unsigned int i32_val = (unsigned int) bar();

    if (i32_val < 10)
      return b;
    else
      return c;
  }

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327365
2018-03-13 06:47:00 +00:00
Yonghong Song fddb9f4e28 bpf: Add more check directives in peephole testcase
Improve the test accuracy by adding more check directives.

Shifts are expected to be eliminated for zero extension but not for signed
extension.

Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 327364
2018-03-13 06:46:59 +00:00
Craig Topper 80058e30cc [LegalizeTypes] In SplitVecOp_TruncateHelper, use GetSplitVector on the input instead of creating new extract_subvectors.
llvm-svn: 327355
2018-03-13 01:17:40 +00:00
Martin Storsjo 7bc64bd889 [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str
Differential Revision: https://reviews.llvm.org/D44355

llvm-svn: 327316
2018-03-12 18:47:43 +00:00
Krzysztof Parzyszek 36ca823f76 [Hexagon] Fix typo in testcase
llvm-svn: 327310
2018-03-12 18:29:47 +00:00
Krzysztof Parzyszek 2d08f2ebf8 [Hexagon] Counting leading/trailing bits is cheap
llvm-svn: 327308
2018-03-12 18:18:23 +00:00
Krzysztof Parzyszek 5d41cc19bd [Hexagon] Subtarget feature to emit one instruction per packet
This adds two features: "packets", and "nvj".

Enabling "packets" allows the compiler to generate instruction packets,
while disabling it will prevent it and disable all optimizations that
generate them. This feature is enabled by default on all subtargets.
The feature "nvj" allows the compiler to generate new-value jumps and it
implies "packets". It is enabled on all subtargets.

The exception is made for packets with endloop instructions, since they
require a certain minimum number of instructions in the packets to which
they apply. Disabling "packets" will not prevent hardware loops from
being generated.

llvm-svn: 327302
2018-03-12 17:47:46 +00:00
Yaxun Liu a99e7d8e44 [AMDGPU] Fix lowering enqueue kernel when kernel has no name
Since the enqueued kernels have internal linkage, their names may be dropped.
In this case, give them unique names __amdgpu_enqueued_kernel or
__amdgpu_enqueued_kernel.n where n is a sequential number starting from 1.

Differential Revision: https://reviews.llvm.org/D44322

llvm-svn: 327291
2018-03-12 16:34:06 +00:00
Krzysztof Parzyszek ea2324f882 [Hexagon] Add REQUIRES: asserts to testcases that use -stats
llvm-svn: 327281
2018-03-12 15:20:36 +00:00
Krzysztof Parzyszek 5b39f7cfef [Hexagon] Add REQUIRES: asserts to testcases that use -debug-only
llvm-svn: 327279
2018-03-12 15:11:16 +00:00
Dmitry Preobrazhensky da4a7c01bf [AMDGPU][MC] Corrected GATHER4 opcodes
See bug 36252: https://bugs.llvm.org/show_bug.cgi?id=36252

Differential Revision: https://reviews.llvm.org/D43874

Reviewers: artem.tamazov, arsenm
llvm-svn: 327278
2018-03-12 15:03:34 +00:00
Krzysztof Parzyszek 046090db53 [Hexagon] Add more lit tests
llvm-svn: 327271
2018-03-12 14:01:28 +00:00
Matt Arsenault 7b9ed89dcf AMDGPU/GlobalISel: Legality and RegBankInfo for G_{INSERT|EXTRACT}_VECTOR_ELT
llvm-svn: 327269
2018-03-12 13:35:53 +00:00
Matt Arsenault c0aefd561e AMDGPU/GlobalISel: InstrMapping for G_MERGE_VALUES
llvm-svn: 327268
2018-03-12 13:35:49 +00:00
Matt Arsenault 503afda95f AMDGPU/GlobalISel: Make some G_MERGE_VALUEs legal
llvm-svn: 327267
2018-03-12 13:35:43 +00:00
Simon Pilgrim 6618e2a09c [X86][SSE] createVariablePermute - PSHUFB requires SSSE3 not just SSE3
llvm-svn: 327259
2018-03-12 12:30:04 +00:00
Simon Pilgrim d09cc9c62c [X86][MMX] Support MMX build vectors to avoid SSE usage (PR29222)
64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type.

This patch creates a MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics.

We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover.

Differential Revision: https://reviews.llvm.org/D43618

llvm-svn: 327247
2018-03-11 19:22:13 +00:00
Simon Pilgrim 55ed3dc676 [X86][AVX512] Added more non-VLX test cases
Cleaned up check prefixes so that they actually share a bit more

llvm-svn: 327246
2018-03-11 18:28:37 +00:00
Simon Pilgrim 30f74c14ff [X86][AVX] createVariablePermute - scale v16i16 variable permutes to use v32i8 codegen
XOP was already doing this, and now AVX performs v32i8 variable permutes as well.

llvm-svn: 327245
2018-03-11 17:23:54 +00:00
Simon Pilgrim b306501796 [X86][AVX] createVariablePermute - widen permutes for cases where the source vector is wider than the destination type
llvm-svn: 327244
2018-03-11 17:00:46 +00:00
Simon Pilgrim 9a5d0c7540 [X86][AVX] createVariablePermute - use PSHUFB+PCMPGT+SELECT for v32i8 variable permutes
Same as the VPERMILPS/VPERMILPD approach for v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. The select between the lo/hi permute results based on the index size.

llvm-svn: 327242
2018-03-11 16:28:11 +00:00
Simon Pilgrim f9cc80d218 [X86][AVX] createVariablePermute - use 2xVPERMIL+PCMPGT+SELECT for v8i32/v8f32 and v4i64/v4f64 variable permutes
As VPERMILPS/VPERMILPD only selects elements based on the bits[1:0]/bit[1] then we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for for lo/hi.

For v4i64/v4f64 this avoids some rather nasty v4i64 multiples on the AVX2 implementation, which seems to be worse than the extra port5 pressure from the additional shuffles/blends.

llvm-svn: 327239
2018-03-11 11:52:26 +00:00
Simon Pilgrim 2565bd421e [X86][AVX512] createVariablePermute - Non-VLX targets can widen v4i64/v8f64 variable permutes to v8i64/v8f64
Permutes in the upper elements will be undefined, but they will be discarded anyway.

llvm-svn: 327238
2018-03-11 11:19:19 +00:00
George Burgess IV 2282cc2a09 Revert r327199: "Clean up a temp file on the buildbots"
"I'll revert this tomorrow," I said yesterday. This should've reached
all the bots it can by now.

llvm-svn: 327230
2018-03-10 23:22:46 +00:00
Craig Topper d88204fe1b [X86] Add comments to the end of FMA3 instructions to make the operation clear
Summary:
There are 3 different operand orders for FMA instructions so figuring out the exact operation being performed requires a lot of thought.

This patch adds a comment to the end of the assembly line to print the exact operation.

I think I've got all the instructions in here except the ones with builtin rounding.

I didn't update all tests, but I assume we can get them as we regenerate tests in the future.

Reviewers: spatel, v_klochkov, RKSimon

Reviewed By: spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D44345

llvm-svn: 327225
2018-03-10 21:30:46 +00:00
Simon Pilgrim de7f3f0f91 [X86][XOP] createVariablePermute - use VPERMIL2 for v8i32/v4i64 variable permutes
llvm-svn: 327222
2018-03-10 19:49:59 +00:00
Martin Storsjo cc24096d4d [AArch64] Implement native TLS for Windows
Differential Revision: https://reviews.llvm.org/D43971

llvm-svn: 327220
2018-03-10 19:05:21 +00:00
Simon Pilgrim ff1248f82f [X86][XOP] createVariablePermute - use VPPERM for v16i16 variable permutes
llvm-svn: 327218
2018-03-10 18:33:29 +00:00
Simon Pilgrim 8224241f75 [X86][XOP] createVariablePermute - use VPPERM for v32i8 variable permutes
llvm-svn: 327213
2018-03-10 16:51:45 +00:00
Sanjay Patel 3b36bb0362 [AMDGPU] fix tests to be independent of FP undef
llvm-svn: 327211
2018-03-10 16:39:59 +00:00
Sanjay Patel 99926101c2 [PowerPC] fix tests to be independent of FP undef
llvm-svn: 327210
2018-03-10 16:14:05 +00:00
Matt Arsenault cbda7ff4ae AMDGPU: Fix crash when constant folding with physreg operand
llvm-svn: 327209
2018-03-10 16:05:35 +00:00
Craig Topper 9804c67d21 [X86] Rewrite printMasking code in X86InstComments to use TSFlags to determine whether the instruction is masked.
This should have been NFC, but it looks like we were missing PUNPCKLHQDQ/PUNPCKLQDQ instructions in there.

llvm-svn: 327200
2018-03-10 03:12:00 +00:00
George Burgess IV 088809def1 Clean up a temp file on the buildbots.
r327100 made us stop producing vecreduce-propagate-sd-flags.s, but it's
still sticking around on some bots. This makes the bots unhappy.

I'll revert this tomorrow.

llvm-svn: 327199
2018-03-10 02:51:10 +00:00
Rafael Espindola 63c378d343 Go back to sometimes assuming intristics are local.
This fixes pr36674.

While it is valid for shouldAssumeDSOLocal to return false anytime,
always returning false for intrinsics is not optimal on i386 and also
hits a bug in the backend.

To use a plt, the caller must first setup ebx to handle the case of
that file being linked into a PIE executable or shared library. In
those cases the generated PLT uses ebx.

Currently we can produce "calll expf@plt" without setting ebx. We
could fix that by correctly setting ebx, but this would produce worse
code for the case where the runtime library is statically linked. It
would also required other tools to handle R_386_PLT32.

llvm-svn: 327198
2018-03-10 02:42:14 +00:00
Nirav Dave 042678bd55 Revert: r327172 "Correct load-op-store cycle detection analysis"
r327171 "Improve Dependency analysis when doing multi-node Instruction Selection"
        r328170 "[DAG] Enforce stricter NodeId invariant during Instruction selection"

Reverting patch as NodeId invariant change is causing pathological
increases in compile time on PPC

llvm-svn: 327197
2018-03-10 02:16:15 +00:00
Craig Topper f6ff51fc62 [TwoAddressInstructionPass] Improve tryInstructionCommute of X86 FMA and vpternlog instructions
These instructions have 3 operands that can be commuted. The first commute we find may not be the best. So we should keep searching if we performed an aggressive commute. There may still be an operand that is killed or a physical register constraint that might be better.

Differential Revision: https://reviews.llvm.org/D44324

llvm-svn: 327188
2018-03-09 23:36:58 +00:00
Nirav Dave 0fab41782d Correct load-op-store cycle detection analysis
Add missing cycle dependency checks in load-op-store fusion.

Fixes PR36274.

Reviewers: craig.topper, bogner

Subscribers: hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D43154

llvm-svn: 327172
2018-03-09 20:58:07 +00:00
Nirav Dave d668f69ee7 Improve Dependency analysis when doing multi-node Instruction Selection
Relanding after fixing NodeId Invariant.

Cleanup cycle/validity checks in ISel (IsLegalToFold,
HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full
search for cycles / dependencies pruning the search when topological
property of NodeId allows.

As part of this propogate the NodeId-based cutoffs to narrow
hasPreprocessorHelper searches.

Reviewers: craig.topper, bogner

Subscribers: llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D41293

llvm-svn: 327171
2018-03-09 20:57:42 +00:00
Nirav Dave 071699bf82 [DAG] Enforce stricter NodeId invariant during Instruction selection
Instruction Selection makes use of the topological ordering of nodes
by node id (a node's operands have smaller node id than it) when doing
cycle detection.  During selection we may violate this property as a
selection of multiple nodes may induce a use dependence (and thus a
node id restriction) between two unrelated nodes. If a selected node
has an unselected successor this may allow us to miss a cycle in
detection an invalid selection.

This patch fixes this by marking all unselected successors of a
selected node have negated node id.  We avoid pruning on such negative
ids but still can reconstruct the original id for pruning.

In-tree targets have been updated to replace DAG-level replacements
with ISel-level ones which enforce this property.

This preemptively fixes PR36312 before triggering commit r324359 relands

Reviewers: craig.topper, bogner, jyknight

Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya

Differential Revision: https://reviews.llvm.org/D43198

llvm-svn: 327170
2018-03-09 20:57:15 +00:00
Peter Collingbourne 2974856ad4 Use branch funnels for virtual calls when retpoline mitigation is enabled.
The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the
branch predictor, and as a result it can lead to a measurable loss of
performance. We can reduce the performance impact of retpolined virtual
calls by replacing them with a special construct known as a branch
funnel, which is an instruction sequence that implements virtual calls
to a set of known targets using a binary tree of direct branches. This
allows the processor to speculately execute valid implementations of the
virtual function without allowing for speculative execution of of calls
to arbitrary addresses.

This patch extends the whole-program devirtualization pass to replace
certain virtual calls with calls to branch funnels, which are
represented using a new llvm.icall.jumptable intrinsic. It also extends
the LowerTypeTests pass to recognize the new intrinsic, generate code
for the branch funnels (x86_64 only for now) and lay out virtual tables
as required for each branch funnel.

The implementation supports full LTO as well as ThinLTO, and extends the
ThinLTO summary format used for whole-program devirtualization to
support branch funnels.

For more details see RFC:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html

Differential Revision: https://reviews.llvm.org/D42453

llvm-svn: 327163
2018-03-09 19:11:44 +00:00
Simon Pilgrim 2cd489feb2 [X86][AVX] createVariablePermute - fix v2i64/v2f64 VPERMILPD index creation.
The input indices vector will put the index in bit0, but VPERMILPD actually selects off bit1 - so we need to scale accordingly.

llvm-svn: 327159
2018-03-09 18:37:56 +00:00
Farhana Aleen a7cb31123c [AMDGPU] Supported ds_read_b128 generation; Widened vector length for local address-space.
Summary: Starting from GCN 2nd generation, ISA supports ds_read_b128 on top of ds_read_b64.
         This patch supports ds_read_b128 instruction pattern and generation of this instruction.
         In the vectorizer, this patch also widen the vector length so that vectorizer generates
         128 bit loads for local address-space which gets translated to ds_read_b128.
         Since the performance benefit is not clear; compiler generates ds_read_b128 under -amdgpu-ds128.

Author: FarhanaAleen

Reviewed By: rampitec, arsenm

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44210

llvm-svn: 327153
2018-03-09 17:41:39 +00:00
Sanjay Patel 56d59c1f0f [AMDGPU] fix test to be independent of FP undef
llvm-svn: 327147
2018-03-09 16:33:34 +00:00
Stefan Pintilie 7f879a8467 Revert "[PowerPC] Move test to correct location."
Revert part of the LSR tune commit.

llvm-svn: 327142
2018-03-09 16:08:48 +00:00
Sebastian Pop b4bd0a404f [x86][aarch64] ask the backend whether it has a vector blend instruction
The code to match and produce more x86 vector blends was enabled for all
architectures even though the transform may pessimize the code for other
architectures that do not provide a vector blend instruction.

Added an aarch64 testcase to check that a VZIP instruction is generated instead
of byte movs.

Differential Revision: https://reviews.llvm.org/D44118

llvm-svn: 327132
2018-03-09 14:29:21 +00:00
Martin Storsjo ccac66da83 [AArch64] Fix use of a regex in the win-alloca.ll test. NFC.
Check that the variable actually is the same as the one previously
matched.

llvm-svn: 327107
2018-03-09 09:45:37 +00:00
Stanislav Mekhanoshin c8127fc674 [AMDGPU] Fixed V_DIV_FIXUP_F16 selection on GFX9
GFX9 should select opsel version.

Differential Revision: https://reviews.llvm.org/D44279

llvm-svn: 327106
2018-03-09 07:21:43 +00:00
Matt Morehouse 5a91fc6f74 Attempt to fix vecreduce-propagate-sd-flags.ll test.
Buildbots have been failing since r327079.

llvm-svn: 327100
2018-03-09 02:04:30 +00:00
Craig Topper 784f1bbf5e [X86] Remove SRAs from v16i8 multiply lowering on sse2 targets
Previously we unpacked the even bytes of each input into the high byte of 16-bit elements then did an v8i16 arithmetic shift right by 8 bits to fill the upper bits of each word with sign bits. Then we did the v8i16 multiply and then masked to zero the upper 8-bits of each result. The similar was done for all the odd bytes. The results are then packed together with packuswb

Since we are masking each multiply result element to 8-bits, and those 8-bits are determined only by the lower 8-bits of each of the inputs, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. This is what gcc also does.

Differential Revision: https://reviews.llvm.org/D44267

llvm-svn: 327093
2018-03-09 01:22:31 +00:00
Sameer AbuAsal 986865c090 Propagate flags to SDValue in SplitVecOp_VECREDUCE
This patch is a fix for PR36642.

 While legalizing long vector types, make sure the smaller types get the
 flags of the wider type.

 bugzilla link: https://bugs.llvm.org/show_bug.cgi?id=36642

Change-Id: I0c2829639f094c862c10a6b51b342d4c2563e1fa
llvm-svn: 327079
2018-03-08 23:41:40 +00:00
Sanjay Patel 672ad3269b [AMDGPU] fix test to survive more FP undef constant folding
llvm-svn: 327066
2018-03-08 21:30:56 +00:00
Krzysztof Parzyszek 480ab2bbc4 [Hexagon] Ignore indexed loads when handling unaligned loads
llvm-svn: 327037
2018-03-08 18:15:13 +00:00
Sanjay Patel 7325d12f58 [AMDGPU] fix test to survive the most basic undef constant folding
This will likely need to be changed again for anything more than:
fmul undef, undef -> undef

llvm-svn: 327034
2018-03-08 17:34:25 +00:00
Sanjay Patel 0cdccf5f37 [x86] fix test to be independent of FP undef
llvm-svn: 327030
2018-03-08 17:24:30 +00:00
Sanjay Patel af2c4185a2 [x86] regenerate checks; NFC
This test will fail if we fix FP undef constant folding.

llvm-svn: 327026
2018-03-08 16:56:49 +00:00
Craig Topper a406796f5f [X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32.
This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX.

I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that.

I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that.

llvm-svn: 326991
2018-03-08 08:02:52 +00:00
Craig Topper 7ff9779768 [X86] Fix some isel patterns that used aligned vector load instructions with unaligned predicates.
These patterns weren't checking the alignment of the load, but were using the aligned instructions. This will cause a GP fault if the data isn't aligned.

I believe these were introduced in r312450.

llvm-svn: 326967
2018-03-08 00:21:17 +00:00
Sebastian Pop 33bdb3e0e6 [AArch64] add missing pattern for insert_subvector undef
The attached testcase started failing after the patch to define
isExtractSubvectorCheap with the following pattern mismatch:

ISEL: Starting pattern match
  Initial Opcode index to 85068
    Match failed at index 85076
    LLVM ERROR: Cannot select: t47: v8i16 = insert_subvector undef:v8i16, t43, Constant:i64<0>

The code generated from llvm/lib/Target/AArch64/AArch64InstrInfo.td

def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i32 0)),
          (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>;

is in ninja/lib/Target/AArch64/AArch64GenDAGISel.inc
At the location of the error it is:
/* 85076*/    OPC_CheckChild2Type, MVT::i32,

And it failed to match the type of operand 2.
Adding another def-pat for i64 fixes the failed def-pat error:

def : Pat<(insert_subvector undef, (v4i16 FPR64:$src), (i64 0)),
          (INSERT_SUBREG (v8i16 (IMPLICIT_DEF)), FPR64:$src, dsub)>;

llvm-svn: 326949
2018-03-07 22:07:13 +00:00
Simon Pilgrim dc1a0385ee [X86][SSE] Regenerate float maxnum/minnum tests
llvm-svn: 326930
2018-03-07 19:14:05 +00:00
Roorda, Jan-Willem 4b8bcf007b [Pipeliner] Fixed node order issue related to zero latency edges
Summary:
A desired property of the node order in Swing Modulo Scheduling is
that for nodes outside circuits the following holds: none of them is
scheduled after both a successor and a predecessor. We call
node orders that meet this property valid.

Although invalid node orders do not lead to the generation of incorrect
code, they can cause the pipeliner not being able to find a pipelined schedule
for arbitrary II. The reason is that after scheduling the successor and the
predecessor of a node, no room may be left to schedule the node itself.

For data flow graphs with 0-latency edges, the node ordering algorithm
of Swing Modulo Scheduling can generate such undesired invalid node orders.
This patch fixes that.

In the remainder of this commit message, I will give an example
demonstrating the issue, explain the fix, and explain how the the fix is tested.

Consider, as an example, the following data flow graph with all
edge latencies 0 and all edges pointing downward.

```
   n0
  /  \
n1    n3
  \  /
   n2
    |
   n4
```

Consider the implemented node order algorithm in top-down mode. In that mode,
the algorithm orders the nodes based on greatest Height and in case of equal
Height on lowest Movability. Finally, in case of equal Height and
Movability, given two nodes with an edge between them, the algorithm prefers
the source-node.

In the graph, for every node, the Height and Movability are equal to 0.
As will be explained below, the algorithm can generate the order n0, n1, n2, n3, n4.
So, node n3 is scheduled after its predecessor n0 and after its successor n2.

The reason that the algorithm can put node n2 in the order before node n3,
even though they have an edge between them in which node n3 is the source,
is the following: Suppose the algorithm has constructed the partial node
order n0, n1. Then, the nodes left to be ordered are nodes n2, n3, and n4. Suppose
that the while-loop in the implemented algorithm considers the nodes in
the order n4, n3, n2. The algorithm will start with node n4, and look for
more preferable nodes. First, node n4 will be compared with node n3. As the nodes
have equal Height and Movability and have no edge between them, the algorithm
will stick with node n4. Then node n4 is compared with node n2. Again the
Height and Movability are equal. But, this time, there is an edge between
the two nodes, and the algorithm will prefer the source node n2.
As there are no nodes left to compare, the algorithm will add node n2 to
the node order, yielding the partial node order n0, n1, n2. In this way node n2
arrives in the node-order before node n3.

To solve this, this patch introduces the ZeroLatencyHeight (ZLH) property
for nodes. It is defined as the maximum unweighted length of a path from the
given node to an arbitrary node in which each edge has latency 0.
So, ZLH(n0)=3, ZLH(n1)=ZLH(n3)=2, ZLH(n2)=1, and ZLH(n4)=0

In this patch, the preference for a greater ZeroLatencyHeight
is added in the top-down mode of the node ordering algorithm, after the
preference for a greater Height, and before the preference for a
lower Movability.

Therefore, the two allowed node-orders are n0, n1, n3, n2, n4 and n0, n3, n1, n2, n4.
Both of them are valid node orders.

In the same way, the bottom-up mode of the node ordering algorithm is adapted
by introducing the ZeroLatencyDepth property for nodes.

The patch is tested by adding extra checks to the following existing
lit-tests:
test/CodeGen/Hexagon/SUnit-boundary-prob.ll
test/CodeGen/Hexagon/frame-offset-overflow.ll
test/CodeGen/Hexagon/vect/vect-shuffle.ll

Before this patch, the pipeliner failed to pipeline the loops in these tests
due to invalid node-orders. After the patch, the pipeliner successfully
pipelines all these loops.

Reviewers: bcahoon

Reviewed By: bcahoon

Subscribers: Ayal, mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D43620

llvm-svn: 326925
2018-03-07 18:53:36 +00:00
Stefan Pintilie f8c2dce236 [PowerPC] Move test to correct location.
Test was added in r326906 to an incorrect location.
Moving the test to PPC CodeGen directory as the test is PPC specific.

llvm-svn: 326923
2018-03-07 18:27:10 +00:00
Craig Topper c3c15dd640 [X86] Make the MUL->VPMADDWD work before op legalization on AVX1 targets. Simplify feature checks by using isTypeLegal.
The v8i32 conversion on AVX1 targets was only working after LowerMUL splits 256-bit vectors.

While I was there I've also made it so we don't have to check for AVX2 and BWI directly and instead just ask if the type is legal.

Differential Revision: https://reviews.llvm.org/D44190

llvm-svn: 326917
2018-03-07 17:53:18 +00:00
Krzysztof Parzyszek 2c3edf0567 [Hexagon] Rewrite non-HVX unaligned loads as pairs of aligned ones
This is a follow-up to r325169, this time for all types, not just HVX
vector types.

Disable this by default, since it's not always safe. 

llvm-svn: 326915
2018-03-07 17:27:18 +00:00
Farhana Aleen 89196642f7 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache;
         loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D44179

llvm-svn: 326910
2018-03-07 17:09:18 +00:00
Farhana Aleen 347d12b4ce Revert "[AMDGPU] Widened vector length for global/constant address space."
This reverts commit ce988cc100dc65e7c6c727aff31ceb99231cab03.

llvm-svn: 326907
2018-03-07 16:55:27 +00:00
Farhana Aleen 0d03d0588d [AMDGPU] Widened vector length for global/constant address space.
llvm-svn: 326904
2018-03-07 16:29:05 +00:00
Alexander Kornienko e12a48bcc0 Revert "Reapply "[DWARFv5] Emit file 0 to the line table.""
This reverts commit r326839.

r326839 breaks assembly file parsing:

$ cat q.c
void g() {}
$ clang -S q.c -g
$ clang -g -c q.s
q.s:9:2: error: file number already allocated
     .file   1 "/tmp/test" "q.c"
     ^

llvm-svn: 326902
2018-03-07 16:27:44 +00:00
Simon Pilgrim eab108ba39 [X86][X87] Add X87 fp80 conversion tests
llvm-svn: 326897
2018-03-07 14:13:14 +00:00
Sjoerd Meijer af30f06d5c [ARM] Fix for PR36577
Don't PerformSHLSimplify if the given node is used by a node that also uses a
constant because we may get stuck in an infinite combine loop.

bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577

Patch by Sam Parker.

Differential Revision: https://reviews.llvm.org/D44097

llvm-svn: 326882
2018-03-07 09:10:44 +00:00
Paul Robinson 4428e90efa Reapply "[DWARFv5] Emit file 0 to the line table."
Fixes the bug found by asan. Also XFAIL the new test for Darwin,
which is stuck on DWARF v2, and fix up other tests so they stop
failing on Windows.

llvm-svn: 326839
2018-03-06 22:37:45 +00:00
Simon Pilgrim ca38c762e4 [TargetLowering] Add vector BITCAST support to SimplifyDemandedVectorElts
Notably helps cleanup after legalization of vector types

Differential Revision: https://reviews.llvm.org/D43674

llvm-svn: 326838
2018-03-06 22:32:01 +00:00
Krzysztof Parzyszek 18484de34b [Hexagon] Update more testcases
llvm-svn: 326830
2018-03-06 19:15:58 +00:00
Krzysztof Parzyszek c9f797fdd0 [Hexagon] Remove {{ *}} from testcases
The spaces in the instructions are now consistent.

llvm-svn: 326829
2018-03-06 19:07:21 +00:00
Craig Topper 274e08dd81 [X86] Reject registers that require a REX prefix in inline asm constraints in 32-bit mode
We don't currently reject r8-r15 or xmm8-32 or bpl/spl/sil/dil in 32-bit mode.

Differential Revision: https://reviews.llvm.org/D44031

llvm-svn: 326826
2018-03-06 18:56:33 +00:00
Sebastian Pop 41073e8046 [AArch64] define isExtractSubvectorCheap
Following the ARM-neon backend, define isExtractSubvectorCheap to return true
when extracting low and high part of a neon register.

The patch disables a test in llvm/test/CodeGen/AArch64/arm64-ext.ll This
testcase is fragile in the sense that it requires a BUILD_VECTOR to "survive"
all DAG transforms until ISelLowering. The testcase is supposed to check that
AArch64TargetLowering::ReconstructShuffle() works, and for that we need a
BUILD_VECTOR in ISelLowering. As we now transform the BUILD_VECTOR earlier into
an VEXT + vector_shuffle, we don't have the BUILD_VECTOR pattern when we get to
ISelLowering. As there is no way to disable the combiner to only exercise the
code in ISelLowering, the patch disables the testcase.

Differential revision: https://reviews.llvm.org/D43973

llvm-svn: 326811
2018-03-06 16:54:55 +00:00
Yaxun Liu 46439e8d4a [AMDGPU] Fix lowering OpenCL enqueue_kernel
One addrspacecast disappeared in clang emitted IR for
block invoke function due to adoption of the new
addr space mapping.

Differential Revision: https://reviews.llvm.org/D43785

llvm-svn: 326806
2018-03-06 16:04:39 +00:00
Dylan McKay 8f46486c65 [AVR] Remove the earlyclobber flag from LDDWRdYQ
Before I started maintaining the AVR backend, this instruction
never originally used to have an earlyclobber flag.

Some time afterwards (years ago), I must've added it back in, not realising that it
was left out for a reason.

This pseudo instrction exists solely to work around a long standing bug
in the register allocator.

Before this commit, the LDDWRdYQ pseudo was not actually working around
any bug. With the earlyclobber flag removed again, the LDDWRdYQ pseudo
now correctly works around PR13375 again.

llvm-svn: 326774
2018-03-06 11:20:25 +00:00
Martin Storsjo a7adc3185b [X86] Handle EAX being live when calling chkstk for x86_64
EAX can turn out to be alive here, when shrink wrapping is done
(which is allowed when using dwarf exceptions, contrary to the
normal case with WinCFI).

This fixes PR36487.

Differential Revision: https://reviews.llvm.org/D43968

llvm-svn: 326764
2018-03-06 06:00:13 +00:00
Paul Robinson 732e443bb9 Revert "[DWARFv5] Emit file 0 to the line table."
Caused an asan failure.

This reverts commit d54883f081186cdcce74e6f98cfc0438579ec019.
aka r326758

llvm-svn: 326762
2018-03-06 03:15:21 +00:00
Paul Robinson d5069ba3da [DWARFv5] Emit file 0 to the line table.
DWARF v5 specifies that the root file (also given in the DW_AT_name
attribute of the compilation unit DIE) should be emitted explicitly to
the line table's list of files.  This makes the line table more
independent of the .debug_info section.

Differential Revision: https://reviews.llvm.org/D44054

llvm-svn: 326758
2018-03-06 01:59:56 +00:00
Volkan Keles 2bc42e90ed GlobalISel: IRTranslate llvm.fabs.* intrinsic
Summary:
Fabs is a common floating-point operation, especially for some expansions. This patch adds
a new generic opcode for llvm.fabs.* intrinsic in order to avoid building/matching this intrinsic.

Reviewers: qcolombet, aditya_nandakumar, dsanders, rovka

Reviewed By: aditya_nandakumar

Subscribers: kristof.beyls, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D43864

llvm-svn: 326749
2018-03-05 22:31:55 +00:00
Dylan McKay 6119b79a88 [AVR] Fix the test suite after r326500.
r326500 subtly changed the way the instructions are printed.

llvm-svn: 326742
2018-03-05 20:56:25 +00:00
Nemanja Ivanovic 6cc31ca814 [PowerPC] Do not emit record-form rotates when record-form andi suffices
Up until Power9, the performance profile for rlwinm., rldicl. and andi. looked
more or less equivalent. However with Power9, the rotates are still 2-way
cracked whereas the and-immediate is not.

This patch just ensures that we don't emit record-form rotates when an andi.
is adequate.

As first pointed out by Carrot in https://bugs.llvm.org/show_bug.cgi?id=30833
(this patch is a fix for that PR).

Differential Revision: https://reviews.llvm.org/D43977

llvm-svn: 326736
2018-03-05 19:27:16 +00:00
Sanjay Patel 77ae82b84a [x86] auto-generate full checks for fabs tests
Also, change the x86-64 test to optimized and remove the 
unnecessary platform specification from the RUN lines..

llvm-svn: 326735
2018-03-05 19:11:20 +00:00
Evandro Menezes f29e35afff [AArch64] Harden test case
NFC

llvm-svn: 326724
2018-03-05 17:42:18 +00:00
Sebastian Pop ac0bfb5938 fix PR36582
The error occurs when reading i16 elements (as in the testcase) from a v8i8
with a pattern of <0,2,4,6>. As all the data in the vector is accessed, the
operation is not a VUZP. The patch stops the pattern recognition of VUZP when
EXTRACT_VECTOR_ELT has a different element type than BUILD_VECTOR.

llvm-svn: 326722
2018-03-05 17:35:49 +00:00
Evandro Menezes cd855f70c5 [AArch64] Improve code generation of constant vectors
Use the whole gammut of constant immediates available to set up a vector.
Instead of using, for example, `mov w0, #0xffff; dup v0.4s, w0`, which
transfers between register files, use the more efficient `movi v0.4s, #-1`
instead.  Not limited to just a few values, but any immediate value that can
be encoded by all the variants of `FMOV`, `MOVI`, `MVNI`, thus eliminating
the need to there be patterns to optimize special cases.

Differential revision: https://reviews.llvm.org/D42133

llvm-svn: 326718
2018-03-05 17:02:47 +00:00
Matt Arsenault e31ab94e97 AMDGPU/GlobalISel: Add InstrMapping for G_EXTRACT
llvm-svn: 326715
2018-03-05 16:25:18 +00:00
Matt Arsenault 71272e6d4e AMDGPU/GlobalISel: Make some G_EXTRACTs legal
As far as I can tell legalization of weird sizes for the
output type isn't implemented.

llvm-svn: 326714
2018-03-05 16:25:15 +00:00
Alexander Timofeev 2e5eeceeb7 Pass Divergence Analysis data to Selection DAG to drive divergence
dependent instruction selection.

Differential revision: https://reviews.llvm.org/D35267

llvm-svn: 326703
2018-03-05 15:12:21 +00:00
Craig Topper f2aae62228 [X86] Add a DAG combine to turn stores of vXi1 constants into scalar stores.
llvm-svn: 326679
2018-03-04 19:33:15 +00:00
Craig Topper 1209eb7d66 [X86] Add a 32-bit mode command line to avx512-mask-op.ll. Add tests for storing v2i1 and v4i1 constants.
llvm-svn: 326678
2018-03-04 19:33:13 +00:00
Craig Topper 4196dd12a2 [DAGCombiner] Add a peekThroughBitcast to MergeStoresOfConstantsOrVecElts to fix a crash if we are storing a bitcast of a constant.
Loading a constant into a k-register in AVX512 requires a bitcast from a scalar constant. In the test case here we have a k-register store that gets split into multiple parts of KNL. MergeConsecutiveStores sees each of these pieces as a consecutive store and looks through the bitcast to find the underly scalar constant. But when we went to create the combined store we didn't look through the same bitcast.

llvm-svn: 326677
2018-03-04 18:51:46 +00:00
Simon Pilgrim 8197b04b9b [X86][X87] Add X87 folded integer arithmetic tests
Add tests for FIADD/FISUB/FISUBR/FIMUL/FIDIV/FIDIVR

Shows we have more FILD stack usage than necessary (arg load, spill, reload to x87)

llvm-svn: 326674
2018-03-04 15:00:19 +00:00
Craig Topper a476026f70 [X86] Combine (store (v1i1 (scalar_to_vector (i8 X)))) -> (store (i8 X)).
llvm-svn: 326670
2018-03-04 01:48:02 +00:00
Craig Topper be31585be8 [X86] Lower v1i1/v2i1/v4i1/v8i1 load/stores to i8 load/store during op legalization if AVX512DQ is not supported.
We were previously doing this with isel patterns. Moving it to op legalization gives us chance to see the required bitcast earlier. And it lets us remove some isel patterns.

llvm-svn: 326669
2018-03-04 01:48:00 +00:00
Craig Topper dbf75c9c79 [LegalizeVectorTypes] When scalarizing the operand of a unary op like TRUNC, use a SCALAR_TO_VECTOR rather than a single element BUILD_VECTOR to convert back to a vector type.
X86 considers v1i1 a legal type under AVX512 and as such a truncate from a v1iX type to v1i1 can be turned into a scalar truncate plus a conversion to v1i1. We would much prefer a v1i1 SCALAR_TO_VECTOR over a one element BUILD_VECTOR.

During lowering we were detecting the v1i1 BUILD_VECTOR as a splat BUILD_VECTOR like we try to do for v2i1/v4i1/etc. In this case we create (select i1 splat_elt, v1i1 all-ones, v1i1 all-zeroes). That goes through some more legalization and we end up with a CMOV choosing between 0 and 1 in scalar and a scalar_to_vector.

Arguably we could detect the v1i1 BUILD_VECTOR and do this better in X86 target code. But just using a SCALAR_TO_VECTOR in legalization is much easier.

llvm-svn: 326637
2018-03-02 23:27:50 +00:00
Krzysztof Parzyszek e3e963236a [Hexagon] Generate valignb for shifting shuffles (instead of vdelta)
llvm-svn: 326627
2018-03-02 22:22:19 +00:00
Ulrich Weigand 1785e244eb [SystemZ] Fix test cases after r326613
I forgot to check in the updated test cases after the r326613 commit.

llvm-svn: 326616
2018-03-02 21:22:42 +00:00
Ulrich Weigand 8b19be46c7 [SystemZ] Add support for anyregcc calling convention
This adds back-end support for the anyregcc calling convention
for use with patchpoints.

Since all registers are considered call-saved with anyregcc
(except for 0 and 1 which may still be clobbered by PLT stubs
and the like), this required adding support for saving and
restoring vector registers in prologue/epilogue code for the
first time.  This is not used by any other calling convention.

llvm-svn: 326612
2018-03-02 20:40:11 +00:00
Ulrich Weigand 5eb64110d2 [SystemZ] Support stackmaps and patchpoints
This adds back-end support for the @llvm.experimental.stackmap and
@llvm.experimental.patchpoint intrinsics.

llvm-svn: 326611
2018-03-02 20:39:30 +00:00