Commit Graph

10741 Commits

Author SHA1 Message Date
Craig Topper 726968d6a2 [X86] Support v32i16/v64i8 CTLZ using lookup table.
Had to tweak the setcc's used by the code to use a vXi1 result type with a sign extend back to vector size.

llvm-svn: 318871
2017-11-22 20:05:57 +00:00
Craig Topper ba150ef60a [X86] Allow vpclmulqdq instructions to be commuted during isel to allow load folding.
The commuting patterns for the AVX version actually still had priority over the new patterns.

llvm-svn: 318800
2017-11-21 21:05:21 +00:00
Coby Tayree 5c7fe5df53 [x86][icelake]BITALG
vpopcnt{b,w}
Differential Revision: https://reviews.llvm.org/D40213

llvm-svn: 318748
2017-11-21 10:32:42 +00:00
Coby Tayree 3880f2a363 [x86][icelake]VNNI
Introducing Vector Neural Network Instructions, consisting of:
vpdpbusd{s}
vpdpwssd{s}
Differential Revision: https://reviews.llvm.org/D40208

llvm-svn: 318746
2017-11-21 10:04:28 +00:00
Coby Tayree 71e37cc9ff [x86][icelake]vbmi2
introducing vbmi2, consisting of
vpcompress{b,w}
vpexpand{b,w}
vpsh{l,r}d{w,d,q}
vpsh{l,r}dv{w,d,q}
Differential Revision: https://reviews.llvm.org/D40206

llvm-svn: 318745
2017-11-21 09:48:44 +00:00
Coby Tayree 7ca5e58736 [x86][icelake]vpclmulqdq introduction
an icelake promotion of pclmulqdq
Differential Revision: https://reviews.llvm.org/D40101

llvm-svn: 318741
2017-11-21 09:30:33 +00:00
Coby Tayree 2a1c02fcbc [x86][icelake]VAES introduction
an icelake promotion of AES
Differential Revision: https://reviews.llvm.org/D40078

llvm-svn: 318740
2017-11-21 09:11:41 +00:00
Craig Topper 67eb30ab60 [SelectionDAG] When promoting the result of a VSELECT, make sure we promote the condition to the SetCC type for the final result type not the original type.
Normally this would be cleaned up by promoting the condition operand next. But in the attached case we promoted the result from v2i48 to v2i64 and the condition from v2i1 to v2i48. Then we tried to "promote" the v2i48 condition back to v2i1 because that's what the SetCC result type for v2i64 is on X86 with VLX. But promote is either a NOP or SIGN_EXTEND and this would need a truncation.

With the change here we now get the SetCC type of v2i1 when we're handling the result promotion and the operand no longer needs to be promoted itself.

Fixes PR35272.

llvm-svn: 318706
2017-11-20 23:08:50 +00:00
Paul Robinson 746edea0ae Revert "Fix out-of-order stepping behavior in programs with sunk instructions."
This reverts commit 30419e150cd940893a13b345e85f96053850208f.
aka r318679.  It caused "sanitizer-windows" bot to fail.

llvm-svn: 318684
2017-11-20 19:07:52 +00:00
Paul Robinson f0b02965e4 Fix out-of-order stepping behavior in programs with sunk instructions.
MachineSink attempts to place instructions near the basic blocks where
they are needed.  Once an instruction has been sunk, its location
relative to other instructions is no longer consistent with the
original source code. In order to ensure correct single-stepping and
profiling, the debug location for sunk instructions is either merged
with the insertion point or erased if the target successor block is
empty.

Patch by Matthew Voss!

Differential Revision: https://reviews.llvm.org/D39933

llvm-svn: 318679
2017-11-20 18:42:17 +00:00
Mohammed Agabaria 115f68ea3e [LV][X86] Support of AVX2 Gathers code generation and update the LV with this
This patch depends on: https://reviews.llvm.org/D35348

Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen)
Update LoopVectorize to generate gathers for AVX2 processors.

Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb

Reviewed By: delena, RKSimon

Differential Revision: https://reviews.llvm.org/D35772

llvm-svn: 318641
2017-11-20 08:18:12 +00:00
Craig Topper 198f7d78d3 [X86] Regenerate a test with broadcast comments. NFC
llvm-svn: 318640
2017-11-20 08:15:04 +00:00
Sanjay Patel c0218923e1 [x86] add sqrt tests for partially-inline-libcalls (PR31455)
llvm-svn: 318630
2017-11-19 17:31:37 +00:00
Craig Topper bece74c694 [X86] Add test cases for rndscaless/sd intrinsics.
Also fix the memop in the ins for these instructions. Not sure what effect this has.

llvm-svn: 318624
2017-11-19 06:24:26 +00:00
Craig Topper 512e9e7f3f [X86] Improve load folding of scalar rcp28 and rsqrt28 instructions using sse_load_f32/f64.
llvm-svn: 318623
2017-11-19 05:42:54 +00:00
Craig Topper 9a94dfc457 [X86] Switch cannonlake to use the SkylakeServer scheduling model instead of Haswell.
Cannonlake comes after skylake and supports avx512 so this is probably a closer model for now.

llvm-svn: 318613
2017-11-19 01:25:30 +00:00
Craig Topper 81037f385e [X86] Add skeleton support for icelake CPU.
There are several patches out for review right now to implement Icelake features. This adds a CPU to collect them under.

llvm-svn: 318612
2017-11-19 01:12:00 +00:00
Simon Pilgrim 41fc45c4e6 [X86][AVX512VL] Add AVX512VL tests to the vselect packss tests.
PR34553 has gone, adding tests to ensure it doesn't come back.

vselect_packss_v16i64 still has some awful codegen on AVX512 targets....

llvm-svn: 318599
2017-11-18 19:47:59 +00:00
Craig Topper 65b8a2c9e1 [X86] Add another gather test with v8i8 sign extended indices.
This requires the indices to be legalized and sign extended.

llvm-svn: 318597
2017-11-18 19:25:35 +00:00
Sanjay Patel c0d1b2ee2e [x86] add tests for unnecessary shuffling; NFC
llvm-svn: 318592
2017-11-18 16:25:38 +00:00
Martin Storsjo 94b59240e2 [X86] Output cfi directives for saved XMM registers even if no GPRs are saved
This makes sure that functions that only clobber xmm registers
(on win64) also get the right cfi directives, if dwarf exceptions
are enabled.

Differential Revision: https://reviews.llvm.org/D40191

llvm-svn: 318591
2017-11-18 06:23:48 +00:00
Simon Pilgrim 7834009726 [X86] Merge scheduling tests for SHLD/SHRD
Reduces spsce used and makes it easier to compare the 2 values for the equivalent instructions

llvm-svn: 318541
2017-11-17 18:35:49 +00:00
Craig Topper 089082378f [X86] Add DAG combine to remove sext i32->i64 from gather/scatter instructions.
Only do this pre-legalize in case we're using the sign extend to legalize for KNL.

This recovers all of the tests that changed when I stopped SelectionDAGBuilder from deleting sign extends.

There's more work that could be done here particularly to fix the i8->i64 test case that experienced split.

llvm-svn: 318468
2017-11-16 23:09:06 +00:00
Craig Topper 1120bb9f59 [X86] Add gather test with index sign extended from i8 type.
Previously SelectionDAGBuilder would remove this sign extend leading to a failure during isel.

The codegen here isn't very nice as we ended up triggering a split.

llvm-svn: 318467
2017-11-16 23:09:03 +00:00
Craig Topper 242374e219 [X86] Don't remove sign extend of gather/scatter indices during SelectionDAGBuilder.
The sign extend might be from an i16 or i8 type and was inserted by InstCombine to match the pointer width. X86 gather legalization isn't currently detecting this to reinsert a sign extend to make things legal.

It's a bit weird for the SelectionDAGBuilder to do this kind of optimization in the first place. With this removed we can at least lean on InstCombine somewhat to ensure the index is i32 or i64.

I'll work on trying to recover some of the test cases by removing sign extends in the backend when its safe to do so with an understanding of the current legalizer capabilities.

This should fix PR30690.

llvm-svn: 318466
2017-11-16 23:08:57 +00:00
Craig Topper e85ff4f732 [X86] Pre-truncate gather/scatter indices that have element sizes larger than 64-bits before Legalize.
The wider element type will normally cause legalize to try to split and scalarize the gather/scatter, but we can't handle that. Instead, truncate the index early so the gather/scatter node is insulated from the legalization.

This really shouldn't happen in practice since InstCombine will normalize index types to the same size as pointers.

llvm-svn: 318452
2017-11-16 20:23:22 +00:00
Simon Pilgrim e8e6acdac9 [X86] Add scheduling tests for SHLD/SHRD
llvm-svn: 318402
2017-11-16 14:13:48 +00:00
Craig Topper 46a5d58b8c [X86] Update TTI to report that v1iX/v1fX types aren't legal for masked gather/scatter/load/store.
The type legalizer will try to scalarize these operations if it sees them, but there is no handling for scalarizing them. This leads to a fatal error. With this change they will now be scalarized by the mem intrinsic scalarizing pass before SelectionDAG.

llvm-svn: 318380
2017-11-16 06:02:05 +00:00
Simon Pilgrim 56415772d6 [X86] Add CBW/CDQ/CDQE/CQO/CWD/CWDE to WriteALU schedule class
Some CPUs are already overriding these sign extension instructions but we should be able to use the WriteALU schedule class by default.

Differential Revision: https://reviews.llvm.org/D39899

llvm-svn: 318308
2017-11-15 17:11:24 +00:00
Hans Wennborg 1403100b6b Fix switch-lower-peel-top-case.ll isel pass is not registered error
The test was doing -stop-after=isel, but that pass is actually the
AMDGPUDAGToDAGISel pass, which might not be built when targeting x86_64.
This changes the test to -stop-after=expand-isel-pseudos instead.

Follow-up to r318202.

llvm-svn: 318220
2017-11-14 23:30:28 +00:00
Aditya Nandakumar e6201c8724 [GISel]: Rework legalization algorithm for better elimination of
artifacts along with DCE

Legalization Artifacts are all those insts that are there to make the
type system happy. Currently, the target needs to say all combinations
of extends and truncs are legal and there's no way of verifying that
post legalization, we only have *truly* legal instructions. This patch
changes roughly the legalization algorithm to process all illegal insts
at one go, and then process all truncs/extends that were added to
satisfy the type constraints separately trying to combine trivial cases
until they converge. This has the added benefit that, the target
legalizerinfo can only say which truncs and extends are okay and the
artifact combiner would combine away other exts and truncs.

Updated legalization algorithm to roughly the following pseudo code.

WorkList Insts, Artifacts;
collect_all_insts_and_artifacts(Insts, Artifacts);

do {
  for (Inst in Insts)
         legalizeInstrStep(Inst, Insts, Artifacts);
  for (Artifact in Artifacts)
         tryCombineArtifact(Artifact, Insts, Artifacts);
} while(!Insts.empty());

Also, wrote a simple wrapper equivalent to SetVector, except for
erasing, it avoids moving all elements over by one and instead just
nulls them out.

llvm-svn: 318210
2017-11-14 22:42:19 +00:00
Rong Xu dc07ae259e [CodeGen] Fix the test case added in r318202
Add the -mtriple option to filter some platforms.

llvm-svn: 318206
2017-11-14 22:08:37 +00:00
Rong Xu 3573d8da36 [CodeGen] Peel off the dominant case in switch statement in lowering
This patch peels off the top case in switch statement into a branch if the
probability exceeds a threshold. This will help the branch prediction and
avoids the extra compares when lowering into chain of branches.

Differential Revision: http://reviews.llvm.org/D39262

llvm-svn: 318202
2017-11-14 21:44:09 +00:00
Hans Wennborg e1ecd61b98 Rename CountingFunctionInserter and use for both mcount and cygprofile calls, before and after inlining
Clang implements the -finstrument-functions flag inherited from GCC, which
inserts calls to __cyg_profile_func_{enter,exit} on function entry and exit.

This is useful for getting a trace of how the functions in a program are
executed. Normally, the calls remain even if a function is inlined into another
function, but it is useful to be able to turn this off for users who are
interested in a lower-level trace, i.e. one that reflects what functions are
called post-inlining. (We use this to generate link order files for Chromium.)

LLVM already has a pass for inserting similar instrumentation calls to
mcount(), which it does after inlining. This patch renames and extends that
pass to handle calls both to mcount and the cygprofile functions, before and/or
after inlining as controlled by function attributes.

Differential Revision: https://reviews.llvm.org/D39287

llvm-svn: 318195
2017-11-14 21:09:45 +00:00
Easwaran Raman 0d55b55bb6 [CodeGenPrepare] Disable div bypass when working set size is huge.
Summary:
Bypass of slow divs based on operand values is currently disabled for
-Os. Do the same when profile summary is available and the working set
size of the application is huge. This is similar to how loop peeling is
guarded by hasHugeWorkingSetSize. In the div bypass case, the generated
extra code (and the extra branch) tendss to outweigh the benefits of the
bypass. This results in noticeable performance improvement on an
internal application.

Reviewers: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39992

llvm-svn: 318179
2017-11-14 19:31:51 +00:00
Simon Pilgrim 600174e740 [X86][AVX] Add scheduling test for vmovntdq 256-bit store
Needs to use inline asm as domain will otherwise be changed to float (vmovntps)

llvm-svn: 318151
2017-11-14 14:03:29 +00:00
Craig Topper c314f461dd [X86] Allow X86ISD::Wrapper to be folded into the base of gather/scatter address
If the base of our gather corresponds to something contained in X86ISD::Wrapper we should be able to fold it into the address.

This patch refactors some of the address matching to more fully use the X86ISelAddressMode struct and the getAddressOperands helper. A new helper function matchVectorAddress is added to call matchWrapper or fall back to matchAddressBase.

We should also be able to support constant offsets from a wrapper, but I'll look into that in a future patch. We may even be able to completely reuse matchAddress here, but I wanted to start simple and work up to it.

Differential Revision: https://reviews.llvm.org/D39927

llvm-svn: 318057
2017-11-13 17:53:59 +00:00
Omer Paparo Bivas 4c679e1435 Inserting a base test for X86 performance nops
Change-Id: I69da08b617d7fae8024c5aee04720eb465f39b81
llvm-svn: 318041
2017-11-13 15:02:39 +00:00
Uriel Korach 2aa707bdaa [X86] test/testn intrinsics lowering to IR. llvm part.
Remove builtins from llvm and add AutoUpgrade support.
Also add fast-isel tests for the TEST and TESTN instructions.

Differential Revision: https://reviews.llvm.org/D38736

llvm-svn: 318036
2017-11-13 12:51:18 +00:00
Jina Nahias 9a7f9f123c [x86][AVX512] Lowering shuffle i/f intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D38672), implements the lowering of X86 shuffle i/f intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38671

Change-Id: I1e7d359a74743e995ec356237a85214ce55d3661
llvm-svn: 318026
2017-11-13 09:16:39 +00:00
Gadi Haber c9f2300652 [X86][SKX] Adding scheduling info of non-intrinsic + commutable SKX opcodes.
Updated the scheduling information of the SKX subtarget  in the file X86SchedSkylakeServer.td under lib/Target/X86 to:
1. add regular opcodes in addition to the suffixed "_Int" opcodes
2. add the (V)MAXCPD/MAXCPS/MAXCSD/MAXCSS/MINCPD/MINCPS/MINCSD/MINCSS
    instructions that are equivalent to their counterparts without the 'C' as they are part of a hack to
    make floating point min/max commutable under fast math.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39833

Change-Id: Ie13702a5ce1b1a08af91ca637a52b6962881e7d6
llvm-svn: 318024
2017-11-13 08:42:07 +00:00
Craig Topper 75d71540f8 [X86] Use sse_load_f32/f64 to improve load folding of scalar vfscalefss/sd, vrcp14ss/sd, rsqrt14ss/sd instructions.
llvm-svn: 318022
2017-11-13 08:07:33 +00:00
Craig Topper c748455e51 [X86] Regenerate test. NFC
llvm-svn: 318021
2017-11-13 08:07:31 +00:00
Craig Topper ca8abedb2a [X86] Use sse_load_f32/f64 to improve load folding for scalar VFPCLASS intrinsics.
llvm-svn: 318019
2017-11-13 06:46:48 +00:00
Craig Topper bf328f263e [X86] Add tests for missed opportunities to fold a 128-bit vector load into vfpclassss and vpfpclasssd.
llvm-svn: 318018
2017-11-13 06:46:46 +00:00
Craig Topper d4f6094091 [X86] Fix SQRTSS/SQRTSD/RCPSS/RCPSD intrinsics to use sse_load_f32/sse_load_f64 to increase load folding opportunities.
llvm-svn: 318016
2017-11-13 05:25:24 +00:00
Craig Topper 24389c6746 [X86] Add tests for full vector loads to fold-load-unops.ll.
We should be able to fold a full vector load into a scalar intrinsic. Since it's legal to narrow a load.

llvm-svn: 318015
2017-11-13 05:25:23 +00:00
Craig Topper a95a1fd42d [X86] Regenerate fold-load-unops.ll and add and avx512f command line.
llvm-svn: 318014
2017-11-13 05:25:21 +00:00
Craig Topper deee24b83c [X86] Use sse_load_f32/f64 in patterns for the memory forms of VRNDSCALESS/SD.
llvm-svn: 318009
2017-11-13 02:03:01 +00:00
Craig Topper 63157c4784 [X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round intrinsics.
The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4-bits of the immediate are 0.

This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space.

We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used.

I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches.

llvm-svn: 318008
2017-11-13 02:03:00 +00:00
Craig Topper b42a23ff8f [X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics.
This fixes a bug where we selected packed instructions for scalar intrinsics.

llvm-svn: 317999
2017-11-12 18:51:09 +00:00
Craig Topper 6b53c4a982 [X86] Add test cases and command lines demonstrating how we accidentally select vrangeps/vrangepd from vrangess/vrangesd instrinsics when the rounding mode is CUR_DIRECTION
llvm-svn: 317998
2017-11-12 18:51:08 +00:00
Craig Topper ac250825c6 [X86] Use vrndscaleps/pd for 128/256 ffloor/ftrunc/fceil/fnearbyint/frint when avx512vl is enabled.
This matches what we do for scalar and 512-bit types.

llvm-svn: 317991
2017-11-11 21:44:51 +00:00
Craig Topper ae9ffa1f5a [X86] Remove avx512-round.ll. The 512-bit rounding tests are now in vec_floor.ll with 128/256 sizes.
llvm-svn: 317990
2017-11-11 21:44:50 +00:00
Craig Topper e44fc7836e [X86] Add avx512vl command line to vec_floor.ll. Add 512-bit test cases.
llvm-svn: 317989
2017-11-11 21:44:49 +00:00
Craig Topper a9f48803d7 [X86] Add avx512f command line to rounding-ops.ll
llvm-svn: 317988
2017-11-11 21:44:48 +00:00
Craig Topper 1a20db2108 [X86] Regenerate rounding-ops.ll with update_llc_test_checks.py
llvm-svn: 317987
2017-11-11 21:44:47 +00:00
Craig Topper 0ccec70ff5 [X86] Add scalar register class versions of VRNDSCALE instructions and rename the existing versions to _Int.
This is consistent with out normal implementation of scalar instructions.

While there disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size which is also our standard practice.

llvm-svn: 317977
2017-11-11 08:24:15 +00:00
Craig Topper 4d80c5dafc [X86] Regenerate avx512-round.ll test.
llvm-svn: 317976
2017-11-11 08:24:13 +00:00
Craig Topper 1a093934a9 [X86] Set the execution domain for vptest instruction to the integer domain.
llvm-svn: 317973
2017-11-11 06:19:12 +00:00
Craig Topper 0eb4a43384 [X86] Correct the execution domain on ROUND/VROUND instructions.
llvm-svn: 317968
2017-11-11 02:26:05 +00:00
Craig Topper ffd48e3c27 [SelectionDAG] Teach SelectionDAGBuilder's getUniformBase for gather/scatter handling to accept GEPs with more than 2 operands if the middle operands are all 0s
Currently we can only get a uniform base from a simple GEP with 2 operands. This causes us to miss address folding opportunities for simple global array accesses as the test case shows.

This patch adds support for larger GEPs if the other indices are 0 since those don't require any additional computations to be inserted.

We may also want to handle constant splats of zero here, but I'm leaving that for future work when I have a real world example.

Differential Revision: https://reviews.llvm.org/D39911

llvm-svn: 317947
2017-11-10 22:50:50 +00:00
Craig Topper cad1c95b31 [X86] Add test case to demonstrate failure to fold the address computation of a simple gather from a global array. NFC
llvm-svn: 317905
2017-11-10 18:48:18 +00:00
Simon Pilgrim c0eef8f6b0 [X86] Add scheduling tests for DAA/DAS
llvm-svn: 317892
2017-11-10 15:49:41 +00:00
Simon Pilgrim f9f1064993 [X86] Test non-i64 shld/shll tests on x86_64 targets as well as i686
llvm-svn: 317888
2017-11-10 13:43:04 +00:00
Simon Pilgrim a9d58fae6a [X86] Add scheduling tests
- CBW etc sign extensions
 - CLC/CLD/CMC flag modifiers
 - CPUID

llvm-svn: 317885
2017-11-10 12:32:34 +00:00
Simon Pilgrim 7ca61e31ac [X86] Added TODO list for missing generic x86 instruction scheduling tests.
Not sure if we want to add the more exotic system instructions (IRET etc.) as well?

llvm-svn: 317882
2017-11-10 12:04:39 +00:00
Craig Topper 1a0da2db5f [X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C)
Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG.

llvm-svn: 317878
2017-11-10 08:22:37 +00:00
Andrew V. Tischenko f8c75b8794 Sched model improving on btver2: JFPU01 resource, vtestp* for xmm.
Differential Revision: https://reviews.llvm.org/D39802

llvm-svn: 317785
2017-11-09 14:19:59 +00:00
Andrew V. Tischenko 3543f0a712 Add -print-schedule scheduling comments to inline asm.
Differential Revision: https://reviews.llvm.org/D39728

llvm-svn: 317782
2017-11-09 12:45:40 +00:00
Craig Topper 7a6e294a6c [X86] Make X86ISD::FMADDS3 isel patterns commutable.
This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND.

llvm-svn: 317769
2017-11-09 06:17:05 +00:00
Craig Topper 93e27d2ecc [X86] Make sure we don't read too many operands from X86ISD::FMADDS1/FMADDS3 nodes when doing FNEG combine.
r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes.

llvm-svn: 317748
2017-11-09 01:06:47 +00:00
Craig Topper 61f81f9637 [X86] Preserve memory refs when folding loads into divides.
This is similar to what we already do for multiplies. Without this we can't unfold and hoist an invariant load.

llvm-svn: 317732
2017-11-08 22:26:39 +00:00
Reid Kleckner 7adb2fdbba Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317579, originally committed as r317100.

There is a design issue with marking CFI instructions duplicatable. Not
all targets support the CFIInstrInserter pass, and targets like Darwin
can't cope with duplicated prologue setup CFI instructions. The compact
unwind info emission fails.

When the following code is compiled for arm64 on Mac at -O3, the CFI
instructions end up getting tail duplicated, which causes compact unwind
info emission to fail:
  int a, c, d, e, f, g, h, i, j, k, l, m;
  void n(int o, int *b) {
    if (g)
      f = 0;
    for (; f < o; f++) {
      m = a;
      if (l > j * k > i)
        j = i = k = d;
      h = b[c] - e;
    }
  }

We get assembly that looks like this:
; BB#1:                                 ; %if.then
Lloh3:
	adrp	x9, _f@GOTPAGE
Lloh4:
	ldr	x9, [x9, _f@GOTPAGEOFF]
	mov	 w8, wzr
Lloh5:
	str		wzr, [x9]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.lt	LBB0_3
	b	LBB0_7
LBB0_2:                                 ; %entry.if.end_crit_edge
Lloh6:
	adrp	x8, _f@GOTPAGE
Lloh7:
	ldr	x8, [x8, _f@GOTPAGEOFF]
Lloh8:
	ldr		w8, [x8]
	stp	x20, x19, [sp, #-16]!   ; 8-byte Folded Spill
	.cfi_def_cfa_offset 16
	.cfi_offset w19, -8
	.cfi_offset w20, -16
	cmp		w8, w0
	b.ge	LBB0_7
LBB0_3:                                 ; %for.body.lr.ph

Note the multiple .cfi_def* directives. Compact unwind info emission
can't handle that.

llvm-svn: 317726
2017-11-08 21:31:14 +00:00
Simon Pilgrim 1c1fd15959 [X86] Add some initial scheduling tests for generic x86 instructions
These will be using inline asm to ensure we have coverage that we're unlikely to get from lowering of basic ir.

Currently waiting for D39728 to land to add support for scheduler comments for inline asm.

llvm-svn: 317698
2017-11-08 16:35:42 +00:00
Craig Topper 65e6d0b758 [X86] Add patterns to fold EVEX store with EVEX encoded vcvtps2ph instructions. Remove bad pattern that had vf432 vcvtps2ph storing 128-bits.
llvm-svn: 317662
2017-11-08 04:00:31 +00:00
Craig Topper b832ee68b4 [X86] Allow legacy vcvtps2ph intrinsics to select EVEX encoded instructions. Rely on EVEX->VEX to convert back.
Missed store folding opportunities will be fixed in a subsequent commit.

llvm-svn: 317661
2017-11-08 04:00:30 +00:00
Sriraman Tallam 056b3fd6fb Attribute nonlazybind should not affect calls to functions with hidden visibility.
Differential Revision: https://reviews.llvm.org/D39625

llvm-svn: 317639
2017-11-08 00:01:05 +00:00
Petar Jovanovic e2a585dddc Reland "Correct dwarf unwind information in function epilogue for X86"
Reland r317100 with minor fix regarding ComputeCommonTailLength function in
BranchFolding.cpp. Skipping top CFI instructions block needs to executed on
several more return points in ComputeCommonTailLength().

Original r317100 message:

"Correct dwarf unwind information in function epilogue for X86"

This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.

Patch by Violeta Vukobrat.

llvm-svn: 317579
2017-11-07 14:40:27 +00:00
Simon Pilgrim 9a6b720f4f [X86] Regenerate select tests
llvm-svn: 317571
2017-11-07 13:21:02 +00:00
Bjorn Steinbrink c02b237e46 [X86] Don't clobber reserved registers with stack adjustments
Summary:
Calls using invoke in funclet based functions are assumed to clobber
all registers, which causes the stack adjustment using pops to consider
all registers not defined by the call to be undefined, which can
unfortunately include the base pointer, if one is needed.

To prevent this (and possibly other hazards), skip reserved registers
when looking for candidate registers.

This fixes issue #45034 in the Rust compiler.

Reviewers: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39636

llvm-svn: 317551
2017-11-07 08:50:21 +00:00
Craig Topper e7fb300226 [X86] Add patterns to fold a 64-bit load into the EVEX vcvtph2ps instructions.
llvm-svn: 317548
2017-11-07 07:13:07 +00:00
Craig Topper 0231b1d445 [X86] Add patterns for folding a v16i8 with the VEX vcvtph2ps intrinsics.
Disable the peephole pass to prove that the pattern is working.

llvm-svn: 317547
2017-11-07 07:13:06 +00:00
Craig Topper 65fc53320b [X86] Add a test for a 128-bit vector load feeding a cvtph2ps intrinsic.
The instruction only loads 64-bits, but we should be able to fold a wider load and let it be narrowed.

llvm-svn: 317546
2017-11-07 07:13:05 +00:00
Craig Topper 8942b33f84 [X86] Remove alignment from a load in the f16c intrinsic test. The alignment shouldn't be required for load folding.
llvm-svn: 317545
2017-11-07 07:13:04 +00:00
Craig Topper cf8e6d0a76 [X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics.
Looks like there's some missed load folding opportunities for i64 loads.

llvm-svn: 317544
2017-11-07 07:13:03 +00:00
Craig Topper 75510dd6f7 [X86] Add AVX512VL command line to f16c intrinsic test to show missed EVEX opportunities for the legacy intrinsics.
llvm-svn: 317543
2017-11-07 07:13:01 +00:00
Craig Topper afc3c8206e [X86] Use IMPLICIT_DEF in VEX/EVEX vcvtss2sd/vcvtsd2ss patterns instead of a COPY_TO_REGCLASS.
ExeDepsFix pass should take care of making the registers match.

llvm-svn: 317542
2017-11-07 04:44:22 +00:00
Craig Topper 428a4e6374 [X86] Make FeatureAVX512 imply FeatureF16C.
The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available.

Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns.

All known CPUs with AVX512 have F16C so this should safe for now.

llvm-svn: 317521
2017-11-06 22:49:04 +00:00
Bjorn Pettersson a42ed3e361 [MIRPrinter] Use %subreg.xxx syntax for subregister index operands
Summary:
Print %subreg.<subregidxname> instead of just the subregister
index when printing immediate operands corresponding to subreg
indices in INSERT_SUBREG, EXTRACT_SUBREG, SUBREG_TO_REG and
REG_SEQUENCE.

Reviewers: qcolombet, MatzeB

Reviewed By: MatzeB

Subscribers: nhaehnle, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D39696

llvm-svn: 317513
2017-11-06 21:46:06 +00:00
Uriel Korach bb86686a8b [X86][AVX512] Improve lowering of AVX512 test intrinsics
Added TESTM and TESTNM to the list of instructions that already zeroing unused upper bits
and does not need the redundant shift left and shift right instructions afterwards.
Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) goes folds into TESTM
and icmp(eq,and(X,Y), 0) goes folds into TESTNM
This commit is a preparation for lowering the test and testn X86 intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38732

llvm-svn: 317465
2017-11-06 09:22:38 +00:00
Zvi Rackover 3122698040 X86 ISel: Basic support for variable-index vector permutations
Summary:
Try to lower a BUILD_VECTOR composed of extract-extract chains that can be
reasoned to be a permutation of a vector by indices in a non-constant vector.

We saw this pattern created by ISPC, which resolts to creating it due to the
requirement that shufflevector's mask operand be a *constant* vector.
I didn't check this but we could possibly use this pattern for lowering the X86 permute
C-instrinsics instead of llvm.x86 instrinsics.

This change can be followed by more improvements:
1. Handle vectors with undef elements.
2. Utilize pshufb and zero-mask-blending to support more effiecient
   construction of vectors with constant-0 elements.
3. Use smaller-element vectors of same width, and "interpolate" the indices,
   when no native operation available.

Reviewers: RKSimon, craig.topper

Reviewed By: RKSimon

Subscribers: chandlerc, DavidKreitzer

Differential Revision: https://reviews.llvm.org/D39126

llvm-svn: 317463
2017-11-06 08:25:46 +00:00
Jina Nahias 7b705f1f91 [x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR
This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR.

Differential Revision: https://reviews.llvm.org/D38684

Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540
llvm-svn: 317458
2017-11-06 07:09:24 +00:00
Craig Topper 70eaeae7f0 [X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible.
llvm-svn: 317454
2017-11-06 05:48:26 +00:00
Craig Topper 25cfa4cb55 [X86] Add avx512vl command line to fma-instrinsics-x86.ll
Some of these demonstrate a missed EVEX to VEX compression because we aren't prefering EVEX instructions during isel.

llvm-svn: 317452
2017-11-06 05:48:24 +00:00
Craig Topper 7e48aa89c7 [X86] Simplify command lines on the fma-instrinsics-x86.ll test and add -show-mc-encoding.
Use feature names instead of CPU names.

A future commit will add avx512vl command lines to demonstrate missed use of EVEX instructions.

llvm-svn: 317451
2017-11-06 05:48:23 +00:00
Craig Topper eff606cc0e [X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics.
Fixes PR35161.

llvm-svn: 317445
2017-11-06 04:04:01 +00:00
Craig Topper 4e2f53511a [X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I missed in r317413.
llvm-svn: 317441
2017-11-05 21:14:05 +00:00
Simon Pilgrim 879c5b15c4 [X86][SSE] Tests for integer min/max horizontal reductions
Matching patterns that vectorizers should have created for us. 

The experimental intrinsics should probably be added as well.

llvm-svn: 317439
2017-11-05 19:48:24 +00:00
Simon Pilgrim f8105cf357 [X86][AVX] Regenerate test. NFCI.
llvm-svn: 317424
2017-11-04 21:18:06 +00:00
Craig Topper 692c8efe30 [X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled.
Summary:
AVX512 added RCP14 and RSQRT instructions which improve accuracy over the legacy RCP and RSQRT instruction, but not enough accuracy to remove the need for a Newton Raphson refinement.

Currently we use these new instructions for the legacy packed SSE instrinics, but not the scalar instrinsics. And we use it for fast math optimization of division and reciprocal sqrt.

I think switching the legacy instrinsics maybe surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed.

As far at the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggest they don't change which instructions they use here.

This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions.

Going forward I think our focus should be on
-Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14.
-Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER
-Supporting double precision.

Reviewers: zvi, DavidKreitzer, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39583

llvm-svn: 317413
2017-11-04 18:26:41 +00:00
Craig Topper be1f219050 [X86] Regenerate a couple more tests that I missed in r317410.
llvm-svn: 317412
2017-11-04 18:26:39 +00:00
Craig Topper e5d44cefea [X86] Teach EVEX->VEX pass to turn SHUFI32X4/SHUFF32X4/SHUFI64X/SHUFF64X2 into VPERM2F128/VPERM2I128.
This recovers some of the tests that were changed by r317403.

llvm-svn: 317410
2017-11-04 18:10:03 +00:00
Craig Topper a96d62b360 [X86] Teach shuffle lowering to use 256-bit SHUF128 when possible.
This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary.

As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out.

llvm-svn: 317403
2017-11-04 06:44:47 +00:00
Craig Topper d21a53f246 [X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to make it possible to fold a load.
llvm-svn: 317382
2017-11-03 22:48:13 +00:00
Andrew V. Tischenko 0916c6b654 Fix for Bug 34475 - LOCK/REP/REPNE prefixes emitted as instruction on their own.
Differential Revision: https://reviews.llvm.org/D39546

llvm-svn: 317330
2017-11-03 15:25:13 +00:00
Clement Courbet 063bed9baf re-land [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."
Fix undefined references: ExpandMemCmp belongs to CodeGen/, not Scalar/.

llvm-svn: 317318
2017-11-03 12:12:27 +00:00
Simon Pilgrim ae1f013495 [X86][SSE] Add PACKUS support to combineVectorTruncation
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.

We have to account for pre-SSE41 targets not supporting PACKUSDW

llvm-svn: 317315
2017-11-03 11:33:48 +00:00
Craig Topper 333897ec31 [X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible.
llvm-svn: 317299
2017-11-03 06:48:02 +00:00
Sriraman Tallam 7cdb10f1aa Avoid PLT for external calls when attribute nonlazybind is used.
Differential Revision: https://reviews.llvm.org/D39065

llvm-svn: 317292
2017-11-03 00:10:19 +00:00
Craig Topper 086c04c8a7 [X86] Give AVX512VL instructions priority over their AVX equivalents.
I thought we had gotten all these priority bugs worked out, but I guess not.

llvm-svn: 317283
2017-11-02 23:23:37 +00:00
Clement Courbet 82bade615b Revert "[ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass."
undefined reference to `llvm::TargetPassConfig::ID' on
clang-ppc64le-linux-multistage

This reverts commit eea333c33fa73ad225ef28607795984829f65688.

llvm-svn: 317213
2017-11-02 15:53:10 +00:00
Clement Courbet 1dc37b9c3b [ExpandMemCmp] Split ExpandMemCmp from CodeGen into its own pass.
Summary:
This is mostly a noop (most of the test diffs are renamed blocks).
There are a few temporary register renames (eax<->ecx) and a few blocks are
shuffled around.

See the discussion in PR33325 for more details.

Reviewers: spatel

Subscribers: mgorny

Differential Revision: https://reviews.llvm.org/D39456

llvm-svn: 317211
2017-11-02 15:02:51 +00:00
Ayman Musa a37d1130d7 [X86] Fix bug in legalize vector types - Split large loads
When splitting a large load to smaller legally-typed loads, the last load should be padded to reach the size of the previous one so a CONCAT_VECTORS node could reunite them again.
The code currently pads the last load to reach the size of the first load (instead of the previous).

Differential Revision: https://reviews.llvm.org/D38495

Change-Id: Ib60b55ed26ce901fabf68108daf52683fbd5013f
llvm-svn: 317206
2017-11-02 13:07:06 +00:00
Michael Zuckerman 0c20b690a2 Adding test for extraxt sub vector load and store avx512
Change-Id: Iefcb0ec6b6aa1b530ce5358081f02e6e522a8e50
llvm-svn: 317202
2017-11-02 12:19:36 +00:00
Andrew V. Tischenko 3c8bf5ec37 The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc.
PR32857 should be closed.
Differential Revision: https://reviews.llvm.org/D39227

llvm-svn: 317196
2017-11-02 10:33:41 +00:00
Steven Wu d0f59f0a5b [X86] Fix fast-isel-int-float-conversion test
Test is failing due to the revert in r317136. Fix the test to make all
the bots happy.

llvm-svn: 317153
2017-11-02 01:34:15 +00:00
Petar Jovanovic bb5c84fb57 Revert "Correct dwarf unwind information in function epilogue for X86"
This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf
buildbot failure (build #15606).

llvm-svn: 317136
2017-11-01 23:05:52 +00:00
Simon Pilgrim e152c2c447 [X86][SSE] Add PACKUS support to LowerTruncate
Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value.

We have to account for pre-SSE41 targets not supporting PACKUSDW

llvm-svn: 317128
2017-11-01 21:52:29 +00:00
Craig Topper 4e56ba271e [X86] Add custom code to EVEX to VEX pass to turn unmasked 128-bit VPALIGND/Q into VPALIGNR if the extended registers aren't being used.
This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch VPALIGNR for the shorter VEX encoding.

Differential Revision: https://reviews.llvm.org/D39401

llvm-svn: 317122
2017-11-01 21:00:59 +00:00
Craig Topper ca1aa83cbe [X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate.
This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses.

We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch.

Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want.

Differential Revision: https://reviews.llvm.org/D39402

llvm-svn: 317112
2017-11-01 18:10:06 +00:00
Craig Topper 56db9d6bec [X86] Regnerate test to attempt to fix build bot failure.
llvm-svn: 317107
2017-11-01 17:44:12 +00:00
Craig Topper 5ae677e102 [X86] Add 64-bit int to float/double conversion with AVX to X86FastISel::X86SelectSIToFP
Summary:
[X86] Teach fast isel to handle i64 sitofp with AVX.

For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX.

Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode.

Reviewers: RKSimon, zvi

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39450

llvm-svn: 317102
2017-11-01 16:23:06 +00:00
Andrew V. Tischenko 3d971e39f8 Update VCVTx, VMOVNTPx and VROUNDYPx instructions scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39059

llvm-svn: 317101
2017-11-01 16:10:20 +00:00
Petar Jovanovic f2faee92aa Correct dwarf unwind information in function epilogue for X86
This patch aims to provide correct dwarf unwind information in function
epilogue for X86.

It consists of two parts. The first part inserts CFI instructions that set
appropriate cfa offset and cfa register in emitEpilogue() in
X86FrameLowering. This part is X86 specific.

The second part is platform independent and ensures that:

- CFI instructions do not affect code generation
- Unwind information remains correct when a function is modified by
  different passes. This is done in a late pass by analyzing information
  about cfa offset and cfa register in BBs and inserting additional CFI
  directives where necessary.

Changed CFI instructions so that they:

- are duplicable
- are not counted as instructions when tail duplicating or tail merging
- can be compared as equal

Added CFIInstrInserter pass:

- analyzes each basic block to determine cfa offset and register valid at
  its entry and exit
- verifies that outgoing cfa offset and register of predecessor blocks match
  incoming values of their successors
- inserts additional CFI directives at basic block beginning to correct the
  rule for calculating CFA

Having CFI instructions in function epilogue can cause incorrect CFA
calculation rule for some basic blocks. This can happen if, due to basic
block reordering, or the existence of multiple epilogue blocks, some of the
blocks have wrong cfa offset and register values set by the epilogue block
above them.

CFIInstrInserter is currently run only on X86, but can be used by any target
that implements support for adding CFI instructions in epilogue.


Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D35844

llvm-svn: 317100
2017-11-01 16:04:11 +00:00
Simon Pilgrim 2b09f39b4d Regenerate PACKUS/TRUNCS test (PR31773)
llvm-svn: 317096
2017-11-01 15:27:23 +00:00
Simon Pilgrim f657ba0cb6 [X86][SSE] Truncate with PACKSS any input with sufficient sign-bits
So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.). When really we can safely use it for any case as long as the number of sign bits reach down to the last 16-bits (or 8-bits if we're truncating to bytes).

The next steps after this is add the equivalent support for PACKUS and to support packing to sub-128 bit vectors for truncating stores etc.

Differential Revision: https://reviews.llvm.org/D39476

llvm-svn: 317086
2017-11-01 11:47:44 +00:00
Simon Pilgrim a5793820fd [X86][AVX512] Regenerate tests to remove retl/retq regex
These are only testing 64-bit targets so we don't need the regex

llvm-svn: 317021
2017-10-31 18:43:24 +00:00
Simon Pilgrim c1927f12db [X86][AVX512] Split AVX512F and AVX512BW bool-vector bitcast tests
llvm-svn: 317020
2017-10-31 18:41:48 +00:00
Craig Topper beed653135 [X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is enabled. Use 128-bit VLX instruction when VLX is enabled.
Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior.

Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used?

llvm-svn: 316978
2017-10-31 06:01:04 +00:00
Craig Topper 9f01f6093c [X86] Add AVX512 support to fast isel's X86ChooseCmpOpcode.
llvm-svn: 316955
2017-10-30 21:09:19 +00:00
Simon Pilgrim 017f896adb [SelectionDAG] Add VSELECT demanded elts support to computeKnownBits
llvm-svn: 316947
2017-10-30 19:31:08 +00:00
Zvi Rackover dfd9e7dda8 X86 Tests: Update the variable-index permute tests with FP types. NFC.
These cases will be addressed in a future update to D39126.

llvm-svn: 316946
2017-10-30 19:29:15 +00:00
Simon Pilgrim 28b6219bd6 [X86][SSE] Add another computeKnownBits test showing missing VSELECT demandedelts support
llvm-svn: 316945
2017-10-30 19:19:58 +00:00
Simon Pilgrim 96a0b9ef54 [SelectionDAG] Add VSELECT support to computeKnownBits
llvm-svn: 316944
2017-10-30 19:08:21 +00:00
Simon Pilgrim b81fbf44c7 [X86][SSE] computeKnownBits tests showing missing VSELECT demandedelts support
llvm-svn: 316940
2017-10-30 18:48:31 +00:00
Simon Pilgrim 10d9e27067 [X86][AVX512] Cleanup scheduler tests - split GENERIC and SKX targets
llvm-svn: 316938
2017-10-30 18:37:27 +00:00
Simon Pilgrim 5da11dfd24 [SelectionDAG] Add SELECT demanded elts support to ComputeNumSignBits
llvm-svn: 316933
2017-10-30 17:53:51 +00:00
Simon Pilgrim ce82f7b694 [X86][SSE] ComputeNumSignBits tests showing missing VSELECT demandedelts support
llvm-svn: 316932
2017-10-30 17:46:50 +00:00
Simon Pilgrim ad62cbe5d9 [X86][AVX] Add missing vcvtpd2dq/vcvtps2dq scheduling tests
llvm-svn: 316926
2017-10-30 17:23:17 +00:00
Simon Pilgrim 76d36c5f0f [X86][SSE] Add clflush scheduling test
llvm-svn: 316925
2017-10-30 17:20:50 +00:00
Jina Nahias 5bf6620b15 [X86][AVX512] Adding a pattern for broadcastm intrinsic.
Differential Revision: https://reviews.llvm.org/D38312

Change-Id: I71c8605a8e4c98013ef25289694afc5cfd46bb0b
llvm-svn: 316921
2017-10-30 16:37:28 +00:00
Craig Topper 4e13d4de52 [X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used.
Summary:
INC/DEC don't update the carry flag so we need to make sure we don't try to use it.

This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested.

The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860.

This should fully fix PR35068 finishing the fix started in r316860.

Reviewers: RKSimon, zvi, spatel

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39411

llvm-svn: 316913
2017-10-30 14:51:37 +00:00
Craig Topper 367cc12fa9 [X86] Remove AVX512 early out from X86FastISel::X86SelectCmp.
This shouldn't be needed anymore since i1 isn't a legal type.

llvm-svn: 316912
2017-10-30 14:50:11 +00:00
Craig Topper 87a148ad0e [X86] Regenerate test using update_llc_test_checks.py
llvm-svn: 316911
2017-10-30 14:50:10 +00:00
Clement Courbet b2c3eb8cf1 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For examples, this is not the
   case for x86(32bit)+sse2.

Fixes PR34887.

llvm-svn: 316905
2017-10-30 14:19:33 +00:00
Jina Nahias e63db55c67 Revert "[X86][AVX512] Adding a pattern for broadcastm intrinsic."
This reverts commit r316890.

Change-Id: I683cceee9848ef309b452293086b1f26a941950d
llvm-svn: 316894
2017-10-30 10:35:53 +00:00
Jina Nahias 70280f9a0d [X86][AVX512] Adding a pattern for broadcastm intrinsic.
Differential Revision: https://reviews.llvm.org/D38312

Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5
llvm-svn: 316890
2017-10-30 09:59:52 +00:00
Simon Pilgrim 601ae238b7 [SelectionDAG] Add SEXT/AND/XOR/Or demanded elts support to ComputeNumSignBits
llvm-svn: 316875
2017-10-29 22:03:37 +00:00
Simon Pilgrim aa65159b0a [X86][SSE] Split ComputeNumSignBits SEXT/AND/XOR/OR demandedelts test
Max depth was being exceeded which could prevent some combines working

llvm-svn: 316871
2017-10-29 21:35:28 +00:00