Commit Graph

22123 Commits

Author SHA1 Message Date
Yaxun Liu c928f2a6d4 [AMDGPU] Emit metadata for hidden arguments for kernel enqueue
Identifies kernels which performs device side kernel enqueues and emit
metadata for the associated hidden kernel arguments. Such kernels are
marked with calls-enqueue-kernel function attribute by
AMDGPUOpenCLEnqueueKernelLowering pass and later on
hidden kernel arguments metadata HiddenDefaultQueue and
HiddenCompletionAction are emitted for them.

Differential Revision: https://reviews.llvm.org/D39255

llvm-svn: 316907
2017-10-30 14:30:28 +00:00
Clement Courbet b2c3eb8cf1 [CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2).
- Targets that want to support memcmp expansions now return the list of
   supported load sizes.
 - Expansion codegen does not assume that all power-of-two load sizes
   smaller than the max load size are valid. For examples, this is not the
   case for x86(32bit)+sse2.

Fixes PR34887.

llvm-svn: 316905
2017-10-30 14:19:33 +00:00
Javed Absar 5cde1ccb29 [GlobalISel|ARM] : Allow legalizing G_FSUB
Adding support for VSUB.
Reviewed by: @rovka
Differential Revision: https://reviews.llvm.org/D39261

llvm-svn: 316902
2017-10-30 13:51:56 +00:00
Diana Picus 6e41e6ac6c [ARM GlobalISel] Fixup r316572. NFC
Just missed a few spots...

llvm-svn: 316897
2017-10-30 11:58:09 +00:00
Jina Nahias e63db55c67 Revert "[X86][AVX512] Adding a pattern for broadcastm intrinsic."
This reverts commit r316890.

Change-Id: I683cceee9848ef309b452293086b1f26a941950d
llvm-svn: 316894
2017-10-30 10:35:53 +00:00
Jina Nahias 70280f9a0d [X86][AVX512] Adding a pattern for broadcastm intrinsic.
Differential Revision: https://reviews.llvm.org/D38312

Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5
llvm-svn: 316890
2017-10-30 09:59:52 +00:00
Simon Pilgrim 601ae238b7 [SelectionDAG] Add SEXT/AND/XOR/Or demanded elts support to ComputeNumSignBits
llvm-svn: 316875
2017-10-29 22:03:37 +00:00
Simon Pilgrim aa65159b0a [X86][SSE] Split ComputeNumSignBits SEXT/AND/XOR/OR demandedelts test
Max depth was being exceeded which could prevent some combines working

llvm-svn: 316871
2017-10-29 21:35:28 +00:00
Simon Pilgrim 61921f779f [X86][SSE] ComputeNumSignBits tests showing missing SEXT/AND/XOR/OR demandedelts support
llvm-svn: 316868
2017-10-29 20:49:27 +00:00
Simon Pilgrim 7613a7b564 [SelectionDAG] Add SRA/SHL demanded elts support to ComputeNumSignBits
Introduce a isConstOrDemandedConstSplat helper function that can recognise a constant splat build vector for at least the demanded elts we care about.

llvm-svn: 316866
2017-10-29 18:19:37 +00:00
Simon Pilgrim b56fb4a2bb [X86][SSE] ComputeNumSignBits tests showing missing SHL/SRA demandedelts support
llvm-svn: 316865
2017-10-29 18:01:31 +00:00
Craig Topper 41d363fea1 [X86] Add a slow-incdec command line to atomic-eflags-reuse.ll
I believe the test_sub_1_cmp_1_setcc_ugt test case is being miscompiled in the fast inc/dec case.

llvm-svn: 316864
2017-10-29 17:15:09 +00:00
Craig Topper 495a1bc893 [X86] Remove combine that turns X86ISD::LSUB into X86ISD::LADD. Update patterns that depended on this.
If the carry flag is being used, this transformation isn't safe.

This does prevent some test cases from using DEC now, but I'll try to look into that separately.

Fixes PR35068.

llvm-svn: 316860
2017-10-29 06:51:04 +00:00
Craig Topper 5f2289a13c [X86] Add AVX512 support to X86FastISel::X86SelectFPExt and X86FastISel::X86SelectFPTrunc.
llvm-svn: 316856
2017-10-29 02:50:31 +00:00
Craig Topper 8373336a22 [X86] Use update_llc_test_checks.py to regenerate fast-isel-int-float-conversion.ll
llvm-svn: 316855
2017-10-29 02:25:48 +00:00
Craig Topper 7d1ed9ec83 [X86] Use update_llc_test_checks.py to regenerate fast-isel-fptrunc-fpext.ll
llvm-svn: 316854
2017-10-29 02:18:43 +00:00
Craig Topper 1e30d783dd [X86] Add AVX512 support to X86FastISel::X86MaterializeFP
llvm-svn: 316853
2017-10-29 02:18:41 +00:00
Simon Pilgrim b37a24e82f [SelectionDAG] Add support for INSERT_SUBVECTOR to computeKnownBits
llvm-svn: 316847
2017-10-28 22:10:40 +00:00
Simon Pilgrim 294f88dfa0 [X86][SSE] Combine 128-bit target shuffles to PACKSS/PACKUS.
llvm-svn: 316845
2017-10-28 20:51:27 +00:00
Sanjay Patel b049173157 [SimplifyCFG] use pass options and remove the latesimplifycfg pass
This is no-functional-change-intended.

This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and 
D35411 (defer folding unconditional branches) with pass parameters rather than a named
"latesimplifycfg" pass. Now that we have individual options to control the functionality,
we could decouple when these fire (but that's an independent patch if desired). 

The next planned step would be to add another option bit to disable the sinking transform
mentioned in D38566. This should also make it clear that the new pass manager needs to
be updated to limit simplifycfg in the same way as the old pass manager.

Differential Revision: https://reviews.llvm.org/D38631

llvm-svn: 316835
2017-10-28 18:43:07 +00:00
Craig Topper abe5dbafff [X86] Correct the alignments on the aligned test cases in fast-isel-vecload.ll to make sure they test selection of aligned loads.
llvm-svn: 316833
2017-10-28 17:37:51 +00:00
Simon Pilgrim d09c1ac20f [SelectionDAG] Support 'bit preserving' floating points bitcasts on computeKnownBits/ComputeNumSignBits
For cases where we know the floating point representations match the bitcasted integer equivalent, allow bitcasting to these types.

This is especially useful for the X86 floating point compare results which return all/zero bits but as a floating point type.

Differential Revision: https://reviews.llvm.org/D39289

llvm-svn: 316831
2017-10-28 14:27:53 +00:00
Craig Topper 39cfdc664d [X86] Add avx command lines to fast-isel-constpool.ll to improve coverage.
llvm-svn: 316829
2017-10-28 06:31:48 +00:00
Craig Topper ea83f85da0 [X86] Use update_llc_test_checks.py to regenerate fast-isel-constpool.ll
llvm-svn: 316828
2017-10-28 06:31:46 +00:00
Craig Topper 8ca5863dd8 [X86] Add a fast-isel test for the i8 pseudo cmov.
llvm-svn: 316827
2017-10-28 06:10:03 +00:00
Craig Topper fd0a35a649 [X86] Add avx command lines to two fast-isel tests to get coverage of selecting vucomiss/vucomisd.
The selection of these shows up as a code coverage hole when looking at the llvm-cov link on llvm.org

llvm-svn: 316823
2017-10-28 02:03:59 +00:00
Craig Topper 4390c61fad [X86] Use update_llc_test_checks.py to regenerate fast-isel-select-cmov2.ll
llvm-svn: 316822
2017-10-28 02:03:58 +00:00
Tom Stellard d0c6cf2e8c AMDGPU/GlobalISel: Mark 32-bit G_FADD as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38439

llvm-svn: 316815
2017-10-27 23:57:41 +00:00
Krzysztof Parzyszek 4dc04e6a70 [Hexagon] Adjust patterns to reflect instruction selection preferences
llvm-svn: 316804
2017-10-27 22:24:49 +00:00
Guozhi Wei 7c67009fe5 [DAGCombine] Don't combine sext with extload if sextload is not supported and extload has multi users
In function DAGCombiner::visitSIGN_EXTEND_INREG, sext can be combined with extload even if sextload is not supported by target, then

  if sext is the only user of extload, there is no big difference, no harm no benefit.
  if extload has more than one user, the combined sextload may block extload from combining with other zext, causes extra zext instructions generated. As demonstrated by the attached test case.

This patch add the constraint that when sextload is not supported by target, sext can only be combined with extload if it is the only user of extload.

Differential Revision: https://reviews.llvm.org/D39108

llvm-svn: 316802
2017-10-27 21:54:24 +00:00
Rafael Espindola 2393c3b4e1 Handle undefined weak hidden symbols on all architectures.
We were handling the non-hidden case in lib/Target/TargetMachine.cpp,
but the hidden case was handled in architecture dependent code and
only X86_64 and AArch64 were covered.

While it is true that some code sequences in some ABIs might be able
to produce the correct value at runtime, that doesn't seem to be the
common case.

I left the AArch64 code in place since it also forces a got access for
non-pic code. It is not clear if that is needed, but it is probably
better to change that in another commit.

llvm-svn: 316799
2017-10-27 21:18:48 +00:00
Craig Topper b904c70005 [X86] Add fast-isel tests for integer shifts. We definitely had no coverage of i16 and i32/i64 are only tested by larger tests.
llvm-svn: 316796
2017-10-27 21:00:56 +00:00
Artur Gainullin af7ba8ff6b Improve clamp recognition in ValueTracking.
Summary:
ValueTracking was recognizing not all variations of clamp. Swapping of
true value and false value of select was added to fix this problem. The
first patch was reverted because it caused miscompile in NVPTX target. 
Added corresponding test cases.

Reviewers: spatel, majnemer, efriedma, reames

Subscribers: llvm-commits, jholewinski

Differential Revision: https://reviews.llvm.org/D39240

llvm-svn: 316795
2017-10-27 20:53:41 +00:00
Craig Topper 58fe564e93 [X86] Add avx512vl command line to fast-isel-nontemporal.ll
llvm-svn: 316789
2017-10-27 20:13:06 +00:00
Krzysztof Parzyszek 92a2635bbd [Hexagon] Fix an incorrect assertion in HexagonConstExtenders.cpp
Making sure that an instruction has fewer operands than required, then
attempting to access one out of range is going to fail.

llvm-svn: 316785
2017-10-27 18:52:28 +00:00
Simon Pilgrim 1bfaa453a3 [X86][SSE] Add tests for inserting all-bits (-1) into a vector
We should be able to do this by re-materializing an all-bits vector and then blending with it

llvm-svn: 316779
2017-10-27 18:14:12 +00:00
Clement Courbet be684eee82 [CodeGen][ExpandMemCmp][NFC] Simplify load sequence generation.
llvm-svn: 316763
2017-10-27 12:34:18 +00:00
Matt Arsenault 878827d93a DAG: Fold fma (fneg x), K, y -> fma x, -K, y
llvm-svn: 316753
2017-10-27 09:06:07 +00:00
Clement Courbet b237f044a4 [CodeGen][ExpandMemcmp][NFC] Make tests more complete.
llvm-svn: 316749
2017-10-27 08:33:51 +00:00
Sean Fertile 57d46b8436 Add subclass data to the FoldingSetNode for MemIntrinsicSDNodes.
Not having the subclass data on an MemIntrinsicSDNodes means it was possible
to try to fold 2 nodes with the same operands but differing MMO flags. This
would trip an assertion when trying to refine the alignment between the 2
MachineMemOperands.

Differential Revision: https://reviews.llvm.org/D38898

llvm-svn: 316737
2017-10-27 04:02:51 +00:00
Eli Friedman d5dfb62de7 [ARM] Honor -mfloat-abi for libcall calling convention
As far as I can tell, this matches gcc: -mfloat-abi determines the
calling convention for all functions except those explicitly defined as
soft-float in the ARM RTABI.

This change only affects cases where the user specifies -mfloat-abi to
override the default calling convention derived from the target triple.

Fixes https://bugs.llvm.org//show_bug.cgi?id=34530.

Differential Revision: https://reviews.llvm.org/D38299

llvm-svn: 316708
2017-10-26 21:42:32 +00:00
Craig Topper b8d7d4d683 [X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support 64-bit extensions.
If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/UDIVREM8_SEXT_HREG operation.

This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case.

This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG.

Differential Revision: https://reviews.llvm.org/D38275

llvm-svn: 316702
2017-10-26 21:12:03 +00:00
Sanjay Patel ac50f3e907 [x86] use an insert op to put one variable element into a constant of vectors
Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it.

Differential Revision: https://reviews.llvm.org/D38756

llvm-svn: 316685
2017-10-26 18:27:55 +00:00
Konstantin Zhuravlyov 6f8e515185 AMDGPU: Commit missing fence-barrier test
This should have been committed with memory model implementation

llvm-svn: 316680
2017-10-26 17:54:09 +00:00
Sean Fertile c70d28bff5 Represent runtime preemption in the IR.
Currently we do not represent runtime preemption in the IR, which has several
drawbacks:

  1) The semantics of GlobalValues differ depending on the object file format
     you are targeting (as well as the relocation-model and -fPIE value).
  2) We have no way of disabling inlining of run time interposable functions,
     since in the IR we only know if a function is link-time interposable.
     Because of this llvm cannot support elf-interposition semantics.
  3) In LTO builds of executables we will have extra knowledge that a symbol
     resolved to a local definition and can't be preemptable, but have no way to
     propagate that knowledge through the compiler.

This patch adds preemptability specifiers to the IR with the following meaning:

dso_local --> means the compiler may assume the symbol will resolve to a
 definition within the current linkage unit and the symbol may be accessed
 directly even if the definition is not within this compilation unit.

dso_preemptable --> means that the compiler must assume the GlobalValue may be
replaced with a definition from outside the current linkage unit at runtime.

To ease transitioning dso_preemptable is treated as a 'default' in that
low-level codegen will still do the same checks it did previously to see if a
symbol should be accessed indirectly. Eventually when IR producers emit the
specifiers on all Globalvalues we can change dso_preemptable to mean 'always
access indirectly', and remove the current logic.

Differential Revision: https://reviews.llvm.org/D20217

llvm-svn: 316668
2017-10-26 15:00:26 +00:00
Marek Olsak 2232243863 AMDGPU: Handle s_buffer_load_dword hazard on SI
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D39171

llvm-svn: 316666
2017-10-26 14:43:02 +00:00
Simon Dardis 13452383cd [mips] Fix PR35071
PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past
debug instructions when removing branches for the control flow optimizer, which
lead to duplicated conditional branches. If the target of the branch was a
removable block, only the conditional branch in the terminating position would
have it's MBB operands updated, leaving the first branch with a dangling MBB
operand. The MIPS long branch pass would then trigger an assertion when
attempting to examine the instruction with dangling MBB operand.

This resolves PR35071.

Thanks to Alex Richardson for reporting the issue!

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39288

llvm-svn: 316654
2017-10-26 10:58:36 +00:00
Hiroshi Inoue b72b1fb0de [PowerPC] Use record-form instruction for Less-or-Equal -1 and Greater-or-Equal 1
Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0.
This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities.

Differential Revision: https://reviews.llvm.org/D38941

llvm-svn: 316647
2017-10-26 09:01:51 +00:00
Alexander Richardson d3aa475988 Fix CodeGen/AMDGPU/fcanonicalize-elimination.ll on FreeBSD 11.0
Summary:
On FreeBSD11.0 the FileCheck NOT string "1.0" will be matched by
`.amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802"` at the end of the
file. Add a CHECK for that directive to avoid failing the test.

Reviewers: rampitec, kzhuravl

Reviewed By: rampitec, kzhuravl

Subscribers: emaste, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, krytarowski

Differential Revision: https://reviews.llvm.org/D39306

llvm-svn: 316616
2017-10-25 21:44:21 +00:00
Krzysztof Parzyszek 27056da9a8 [Hexagon] Account for negative offset when limiting max deviation
In getOffsetRange, Max can be set to 0 to force the extender replacement
to be at or below the original value. This would cause the new offset to
be non-negative, which is preferred for memory instructions (to reduce
the likelihood of it getting constant-extended due to predication). The
problem happens when the range is shifted by an offset (present in the
instruction being examined) and the offset is negative. The entire range
for the allowable deviation will then be strictly negative. This creates
a problem, since 0 is assumed to be a valid deviation.

llvm-svn: 316601
2017-10-25 18:46:40 +00:00
Konstantin Zhuravlyov cff1155035 AMDGPU: Cleanup memory legalizer load/store tests
llvm-svn: 316590
2017-10-25 17:04:46 +00:00
Konstantin Zhuravlyov 538612c885 AMDGPU/NFC: Rename memory legalizer tests:
- memory-legalizer-atomic-load.ll -> memory-legalizer-load.ll
  - memory-legalizer-atomic-store.ll -> memory-legalizer-store.ll

llvm-svn: 316586
2017-10-25 16:45:28 +00:00
Daniil Fukalov 2bfbadcbc1 [inlineasm] Fix crash when number of matched input constraint operands overflows signed char
In a case when number of output constraint operands that has matched input operands
doesn't fit to signed char, TargetLowering::ParseConstraints() can try to access
ConstraintOperands (that is std::vector) with negative index.

Reviewers: rampitec, arsenm

Differential Review: https://reviews.llvm.org/D39125

llvm-svn: 316574
2017-10-25 12:51:32 +00:00
Diana Picus c537795433 [ARM GlobalISel] Remove redundant testcases. NFC
Remove the G_FADD testcases from arm-legalizer.mir, they are covered by
arm-legalizer-fp.mir (I probably forgot to delete them when I created
that test).

llvm-svn: 316573
2017-10-25 12:22:21 +00:00
Diana Picus e7b53c4546 [ARM GlobalISel] Update test after r316479. NFC
No need to check register classes in the register block anymore, since
we can now much more conveniently check them at their def.

llvm-svn: 316572
2017-10-25 12:22:16 +00:00
Diana Picus b35022121d [ARM GlobalISel] Fix call opcodes
We were generating BLX for all the calls, which was incorrect in most
cases. Update ARMCallLowering to generate BL for direct calls, and BLX,
BX_CALL or BMOVPCRX_CALL for indirect calls.

llvm-svn: 316570
2017-10-25 11:42:40 +00:00
Diana Picus eb58db5321 [ARM GlobalISel] Split test into 3. NFC
Separate the test cases that deal with calls from the rest of the IR
Translator tests.

We split into 2 different files, one for testing parameter and result
lowering, and one for testing the various different kinds of calls that
can occur (BL, BLX, BX_CALL etc).

llvm-svn: 316569
2017-10-25 11:21:15 +00:00
Clement Courbet 0c7cd071f7 Re-land "[CodeGen][ExpandMemcmp][NFC] Allow memcmp to expand to vector loads (1)"
Compute the actual decomposition only after deciding whether to expand
of not. Else, it's easy to make the compiler OOM with:
`memcpy(dst, src, 0xffffffffffffffff);`, which typically happens if
someone mistakenly passes a negative value. Add a test.

This reverts commit f8fc02fbd4ab33383c010d33675acf9763d0bd44.

llvm-svn: 316567
2017-10-25 11:02:09 +00:00
Sam Parker ccb209bb97 [ARM] Swap cmp operands for automatic shifts
Swap the compare operands if the lhs is a shift and the rhs isn't,
as in arm and T2 the shift can be performed by the compare for its
second operand.

Differential Revision: https://reviews.llvm.org/D39004

llvm-svn: 316562
2017-10-25 08:33:06 +00:00
Martin Storsjo 373c8efa1e [AArch64] Add support for dllimport of values and functions
Previously, the dllimport attribute did the right thing in terms
of treating it as a pointer to a value, but this makes sure the
names get mangled properly, and calls to such functions load the
function from the __imp_ pointer.

This is based on SVN r212431 and r212430 where the same was
implemented for ARM.

Differential Revision: https://reviews.llvm.org/D38530

llvm-svn: 316555
2017-10-25 07:25:18 +00:00
Matt Arsenault 8a752b77a2 DAG: Fix creating select with wrong condition type
This code added in r297930 assumed that it could create
a select with a condition type that is just an integer
bitcast of the selected type. For AMDGPU any vselect is
going to be scalarized (although the vector types are legal),
and all select conditions must be i1 (the same as getSetCCResultType).

This logic doesn't really make sense to me, but there's
never really been a consistent policy in what the select
condition mask type is supposed to be. Try to extend
the logic for skipping the transform for condition types
that aren't setccs. It doesn't seem quite right to me though,
but checking conditions that seem more sensible (like whether the
vselect is going to be expanded) doesn't work since this
seems to depend on that also.

llvm-svn: 316554
2017-10-25 07:14:07 +00:00
Artem Belevich cb8f6328dc [NVPTX] allow address space inference for volatile loads/stores.
If particular target supports volatile memory access operations, we can
avoid AS casting to generic AS. Currently it's only enabled in NVPTX for
loads and stores that access global & shared AS.

Differential Revision: https://reviews.llvm.org/D39026

llvm-svn: 316495
2017-10-24 20:31:44 +00:00
Gadi Haber 323f2e1715 [X86][Broadwell] Added the instruction scheduling information for the Broadwell CPU.
Adding the scheduling information for the Browadwell (BDW) CPU target.

This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target.
We used the scheduling information retrieved from the Broadwell architects in order to create the file.
The scheduling information includes latency, number of micro-Ops and used ports by each BDW instruction.

The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175.

Performance fluctuations may be expected due to code alignment effects.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D39054

Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01
llvm-svn: 316492
2017-10-24 20:19:47 +00:00
Justin Bogner 6c452834a1 MIR: Print the register class or bank in vreg defs
This updates the MIRPrinter to include the regclass when printing
virtual register defs, which is already valid syntax for the
parser. That is, given 64 bit %0 and %1 in a "gpr" regbank,

  %1(s64) = COPY %0(s64)

would now be written as

  %1:gpr(s64) = COPY %0(s64)

While this change alone introduces a bit of redundancy with the
registers block, it allows us to update the tests to be more concise
and understandable and brings us closer to being able to remove the
registers block completely.

Note: We generally only print the class in defs, but there is one
exception. If there are uses without any defs whatsoever, we'll print
the class on all uses. I'm not completely convinced this comes up in
meaningful machine IR, but for now the MIRParser and MachineVerifier
both accept that kind of stuff, so we don't want to have a situation
where we can print something we can't parse.

llvm-svn: 316479
2017-10-24 18:04:54 +00:00
Stefan Pintilie 8f0c783095 [PowerPC] Try to simplify a Swap if it feeds a Splat
If we have the situation where a Swap feeds a Splat we can sometimes change the
  index on the Splat and then remove the Swap instruction.

Fixed the test case that was failing and recommit after pulling the original
  commit.

  Original revision is here: https://reviews.llvm.org/D39009

llvm-svn: 316478
2017-10-24 17:44:27 +00:00
Simon Pilgrim 5e8c3f328f [X86][AVX] ComputeNumSignBitsForTargetNode - add support for X86ISD::VTRUNC
llvm-svn: 316462
2017-10-24 17:04:57 +00:00
Simon Pilgrim 1bc62f03a5 [SelectionDAG] Add VSELECT support to ComputeNumSignBits
llvm-svn: 316457
2017-10-24 16:38:38 +00:00
Simon Pilgrim 0a12c239b6 [X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of just PACKSSWB.
By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits.

llvm-svn: 316448
2017-10-24 15:38:16 +00:00
Sanjay Patel f762c7b32f [x86] add more vector ISA variants for memcmp expansion; NFC
...because every swiss cheese has different holes.

llvm-svn: 316446
2017-10-24 15:27:47 +00:00
Andrew V. Tischenko f4fbe4a51b Update f16c instruction scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39051

llvm-svn: 316435
2017-10-24 13:38:30 +00:00
Zvi Rackover 31b101a186 X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms
Summary:
r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However,
X86CallFrameOptimization does not recognize these memory accesses so it will not replace them with push's when profitable.

This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms.

An alternative fix would be to prevent the 'store 0/1 idioms' patterns from firing when accessing the stack. This would save
the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire we may result
in cases where neither X86CallFrameOptimization not the patterns for 'store 0/1 idioms' fire.

Fixes pr34863

Reviewers: DavidKreitzer, guyblank, aymanmus

Reviewed By: aymanmus

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38738

llvm-svn: 316431
2017-10-24 12:13:05 +00:00
Marek Olsak ce76ea0394 AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)
Summary:
Kill the thread if operand 0 == false.
llvm.amdgcn.wqm.vote can be applied to the operand.

Also allow kill in all shader stages.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D38544

llvm-svn: 316427
2017-10-24 10:27:13 +00:00
Marek Olsak 2114fc3bcb AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic
Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D38543

llvm-svn: 316426
2017-10-24 10:26:59 +00:00
Zvi Rackover 3c0d385598 X86: Fix X86CallFrameOptimization to search for the COPY StackPointer
SelectionDAG inserts a copy of ESP into a virtual register.
X86CallFrameOptimization assumed that the COPY, if present, is always
right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a
wrong assumption as the COPY can be located anywhere between the call-frame setup
instruction and its first use. If the COPY happened to be located in a different
location than what X86CallFrameOptimization assumed, visiting it while
processing the call chain would lead to a conservative bail-out.

The fix is quite straightfoward, scan ahead for the stack-pointer copy and make note
of it so it can be ignored while processing the call chain.

Fixes pr34903

Differential Revision: https://reviews.llvm.org/D38730

llvm-svn: 316416
2017-10-24 07:38:29 +00:00
Omer Paparo Bivas 2251c79aba [MC] Adding code padding for performance stability - infrastructure. NFC.
Infrastructure designed for padding code with nop instructions in key places such that preformance improvement will be achieved.
The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known.
This patch by itself in a NFC. Future patches will make use of this infrastructure to implement required policies for code padding.

Reviewers:
aaboud
zvi
craig.topper
gadi.haber

Differential revision: https://reviews.llvm.org/D34393

Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e
llvm-svn: 316413
2017-10-24 06:16:03 +00:00
Zvi Rackover c6d0b6c103 X86: Register the X86CallFrameOptimization pass
Summary:
The motivation of this change is to enable .mir testing for this pass.
Added one test case to cover the functionality, this same case will be improved by
a future patch.

Reviewers: igorb, guyblank, DavidKreitzer

Reviewed By: guyblank, DavidKreitzer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38729

llvm-svn: 316412
2017-10-24 05:47:07 +00:00
Jessica Paquette 9df7fde269 [MachineOutliner] Add optimisation remarks for successful outlining
This commit adds optimisation remarks for outlining which fire when a function
is successfully outlined.

To do this, OutlinedFunctions must now contain references to their Candidates.
Since the Candidates must still be sorted and worked on separately, this is
done by working on everything in terms of shared_ptrs to Candidates. This is
good; it means that we can easily move everything to outlining in terms of
the OutlinedFunctions rather than the individual Candidates. This is far more
intuitive than what's currently there!

(Remarks are output when a function is created for some group of Candidates.
In a later commit, all of the outlining logic should be rewritten so that we
loop over OutlinedFunctions rather than over Candidates.)
 

llvm-svn: 316396
2017-10-23 23:36:46 +00:00
Aditya Nandakumar 921f24cef1 [GISel][ARM]: Fix illegal Generic copies in tests
This is in preparation for a verifier check that makes sure
copies are of the same size (when generic virtual registers are involved).

llvm-svn: 316388
2017-10-23 22:53:08 +00:00
Aditya Nandakumar 4dfd2590dc [GISel][AArch64]: Fix illegal Generic copies in tests
This is in preparation for a verifier check that makes sure copies are
of the same size (when generic virtual registers are involved).

llvm-svn: 316387
2017-10-23 22:53:04 +00:00
Simon Pilgrim 321e54f72d [X86][SSE] combineBitcastvxi1 - use PACKSSWB directly to pack v8i16 to v16i8
Avoid difficulties determining the number of sign bits later on in shuffle lowering to lower to PACKSS

llvm-svn: 316383
2017-10-23 22:05:02 +00:00
George Burgess IV 8a0e4bc972 Don't crash when we see unallocatable registers in clobbers
This fixes a bug where we'd crash given code like the test-case from
https://bugs.llvm.org/show_bug.cgi?id=30792 . Instead, we let the
offending clobber silently slide through.

This doesn't fully fix said bug, since the assembler will still complain
the moment it sees a crypto/fp/vector op, and we still don't diagnose
calls that require vector regs.

Differential Revision: https://reviews.llvm.org/D39030

llvm-svn: 316374
2017-10-23 20:46:36 +00:00
Stefan Pintilie 52bbd587ac Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat"
Revert commit r316366.
Previous commit causes p8-scalar_vector_conversions.ll to fail.

This reverts commit 990e764ad8a2eec206ce5dda6aefab059ccd4e92.

llvm-svn: 316371
2017-10-23 20:22:23 +00:00
Krzysztof Parzyszek 6f06b6edff [Hexagon] Return the correct chain edge for i1 function calls
In HexagonISelLowering, there is code to handle the case when
a function returns an i1 type. In this case, we need to generate
extra nodes to copy the result from R0 to a predicate register.

The code was returning the wrong value for the chain edge which
caused an assert "Wrong topological sorting" when converting the
instructions to MIs.

This patch fixes the problem by returning the chain for the final
copy.

Patch by Brendon Cahoon.

llvm-svn: 316367
2017-10-23 19:35:25 +00:00
Stefan Pintilie feafa1d7f0 [PowerPC] Try to simplify a Swap if it feeds a Splat
If we have the situation where a Swap feeds a Splat we can sometimes change the
index on the Splat and then remove the Swap instruction.

Differential Revision: https://reviews.llvm.org/D39009

llvm-svn: 316366
2017-10-23 19:33:31 +00:00
Krzysztof Parzyszek 273678823b [Hexagon] Add extra pattern for S4_addaddi
One combination was missing: add(add(x,y),c).

llvm-svn: 316363
2017-10-23 19:07:50 +00:00
Daniel Sanders d66e0901ae [globalisel][tablegen] Import stores and allow GISel to automatically substitute zero regs like WZR/XZR/$zero.
This patch enables the import of stores. Unfortunately, doing so by itself,
loses an optimization where storing 0 to memory makes use of WZR/XZR.

To mitigate this, this patch also introduces a new feature that allows register
operands to nominate a zero register. When this is done, GlobalISel will
substitute (G_CONSTANT 0) with the nominated register automatically. This
is currently configured to only apply to the stores.

Applying it to GPR32/GPR64 register classes in general will be done after
review see (https://reviews.llvm.org/D39150).

llvm-svn: 316360
2017-10-23 18:19:24 +00:00
Simon Pilgrim 6bda996cab [X86][SSE] Regenerate PACKSS tests on 32 + 64-bit targets
llvm-svn: 316354
2017-10-23 17:50:40 +00:00
Matt Arsenault b791802aef AMDGPU: Fix default range in non-kernel functions
The range should be assumed to be the hardware maximum
if a workitem intrinsic is used in a callable function
which does not know the restricted limit of the calling
kernel.

llvm-svn: 316346
2017-10-23 17:09:35 +00:00
Andrew V. Tischenko 777308b548 Update DPPD/DPPS instruction scheduling on btver2.
Differential Revision: https://reviews.llvm.org/D39046

llvm-svn: 316334
2017-10-23 15:53:30 +00:00
Simon Pilgrim 32da2f9245 [DAGCombine] Permit combining of shuffles of equivalent splat BUILD_VECTORs
combineShuffleOfScalars is very conservative about shuffled BUILD_VECTORs that can be combined together.

This patch adds one additional case - if both BUILD_VECTORs represent splats of the same scalar value but with different UNDEF elements, then we should create a single splat BUILD_VECTOR, sharing only the UNDEF elements defined by the shuffle mask.

Differential Revision: https://reviews.llvm.org/D38696

llvm-svn: 316331
2017-10-23 15:48:08 +00:00
Simon Pilgrim 03c8753924 [X86][SSE] Regenerate bitcast-and-setcc tests
Avoid the retl/retq changes in an upcoming patch

llvm-svn: 316328
2017-10-23 14:47:49 +00:00
Simon Pilgrim e131cb0bd5 [X86][AVX2] Regenerate AVX2 intrinsics tests on 32 + 64-bit targets
llvm-svn: 316326
2017-10-23 14:19:46 +00:00
Simon Pilgrim c680c4742b [X86][AVX] Regenerate AVX intrinsics tests on 32 + 64-bit targets
llvm-svn: 316325
2017-10-23 14:17:59 +00:00
Simon Pilgrim eae6e9dbc5 [X86][F16C] Regenerate F16C schedule tests
llvm-svn: 316324
2017-10-23 14:15:24 +00:00
Ayman Musa 4b2bd5ff5e [X86] Add test for opportunity to use bzhi X86 instruction instead of load+and instructions.
Transformation uploaded for CR in https://reviews.llvm.org/D34141.

llvm-svn: 316320
2017-10-23 10:24:19 +00:00
Marina Yatsina f9371d821f Add logic to greedy reg alloc to avoid bad eviction chains
This fixes bugzilla 26810
https://bugs.llvm.org/show_bug.cgi?id=26810

This is intended to prevent sequences like:
movl %ebp, 8(%esp) # 4-byte Spill
movl %ecx, %ebp
movl %ebx, %ecx
movl %edi, %ebx
movl %edx, %edi
cltd
idivl %esi
movl %edi, %edx
movl %ebx, %edi
movl %ecx, %ebx
movl %ebp, %ecx
movl 16(%esp), %ebp # 4 - byte Reload

Such sequences are created in 2 scenarios:

Scenario #1:
vreg0 is evicted from physreg0 by vreg1
Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from)
Region splitting creates a local interval because of interference with the evictor vreg1 (normally region spliiting creates 2 interval, the "by reg" and "by stack" intervals. Local interval created when interference occurs.)
one of the split intervals ends up evicting vreg2 from physreg1
Evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills

Scenario #2
vreg0 is evicted from physreg0 by vreg1
vreg2 is evicted from physreg2 by vreg3 etc
Evictee vreg0 is intended for region splitting with split candidate physreg1
Region splitting creates a local interval because of interference with the evictor vreg1
one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from)
Another evictee vreg2 is intended for region splitting with split candidate physreg1
one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills

As compile time was a concern, I've added a flag to control weather we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest).

Differential Revision: https://reviews.llvm.org/D35816

Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39
llvm-svn: 316295
2017-10-22 17:59:38 +00:00
Momchil Velikov d6a4ab3d49 [ARM] Dynamic stack alignment for 16-bit Thumb
This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When
targeting processors, which support only the 16-bit Thumb instruction set
the compiler ignores the alignment attributes of automatic variables and may
silently generate incorrect code.

Differential revision: https://reviews.llvm.org/D38143

llvm-svn: 316289
2017-10-22 11:56:35 +00:00
Guy Blank 92d5ce3bd4 [X86] Add a pass to convert instruction chains between domains.
The pass scans the function to find instruction chains that define
registers in the same domain (closures).
It then calculates the cost of converting the closure to another domain.
If found profitable, the instructions are converted to instructions in
the other domain and the register classes are changed accordingly.

This commit adds the pass infrastructure and a simple conversion from
the GPR domain to the Mask domain.

Differential Revision:
https://reviews.llvm.org/D37251

Change-Id: Ic2cf1d76598110401168326d411128ae2580a604
llvm-svn: 316288
2017-10-22 11:43:08 +00:00
Aaron Ballman fc02869c96 Reverting r316270 due to failing build bots.
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/12899
http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7951

llvm-svn: 316276
2017-10-21 20:38:15 +00:00
Simon Pilgrim 3cb024490a [X86][SSE] Add extractps/pextrd equivalence to domain tables
Differential Revision: https://reviews.llvm.org/D39135

llvm-svn: 316274
2017-10-21 20:19:48 +00:00
Fangrui Song c7b749bd06 [PPC CodeGen] Fix the bitreverse.i64 intrinsic.
Summary: The two 32-bit words were swapped.

Subscribers: nemanjai, kbarton

Differential Revision: https://reviews.llvm.org/D38705

llvm-svn: 316270
2017-10-21 16:59:40 +00:00
Simon Pilgrim 7025b07828 [X86][SSE] Add missing extractps scheduling test
llvm-svn: 316262
2017-10-21 14:35:09 +00:00
Craig Topper fcf27188d7 [X86] Do not generate __multi3 for mul i128 on X86
Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function.  This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test.

Patch by Riyaz V Puthiyapurayil

Reviewers: craig.topper, schweitz

Reviewed By: craig.topper, schweitz

Subscribers: RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D38668

llvm-svn: 316254
2017-10-21 02:26:00 +00:00
Krzysztof Parzyszek 9d19c8cac9 [Packetizer] Add function to check for aliasing between instructions
llvm-svn: 316243
2017-10-20 22:08:40 +00:00
Krzysztof Parzyszek 022922b31a [Hexagon] Report error instead of crashing on wrong inline-asm constraints
llvm-svn: 316236
2017-10-20 20:24:44 +00:00
Krzysztof Parzyszek 64e5d7d3ae [Hexagon] Reorganize and update instruction patterns
llvm-svn: 316228
2017-10-20 19:33:12 +00:00
Simon Pilgrim 1311ff1340 [X86][SSE] Add missing _mm_extract_ps fast-isel test
llvm-svn: 316226
2017-10-20 19:29:01 +00:00
Sanjay Patel bb94161fb7 [x86] avoid FileCheck assert duplication with retl/retq regex; NFC
This was suggested in PR35003:
https://bugs.llvm.org/show_bug.cgi?id=35003

32-bit checks may be identical to 64-bit (if we avoid those pesky scalar params!).

I'll check in the script change shortly assuming this doesn't anger any bots.

llvm-svn: 316223
2017-10-20 18:35:32 +00:00
Dave Lee f9b72327b0 Make x86 __ehhandler comdat if parent function is
Summary:
This change comes from using lld for i686-windows-msvc. Before this change, lld
emits an error of:

    error: relocation against symbol in discarded section: .xdata

It's possible that this could be addressed in lld, but I think this change is
reasonable on its own.

At a high level, this is being generated:

    A (.text comdat) -> B (.text) -> C (.xdata comdat)

Where A is a C++ inline function, which references B, an exception handler
thunk, which references C, the exception handling info.

With this structure, lld will error when applying relocations to B if the C it
references has been discarded (some other C has been selected).

This change checks if A is comdat, and if so places the exception registration
thunk (B) in the comdata group of A (and B).

It appears that MSVC makes the __ehhandler function comdat.

Is it possible that duplicate thunks are being emitted into the final binary
with other linkers, or are they stripping the unused thunks?

Reviewers: rnk, majnemer, compnerd, smeenai

Reviewed By: rnk, compnerd

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D38940

llvm-svn: 316219
2017-10-20 17:04:43 +00:00
Krzysztof Parzyszek 3818aeaeb9 [Hexagon] Allow redefinition with immediates for hw loop conversion
Normally, if the registers holding the induction variable's bounds
are redefined inside of the loop's body, the loop cannot be converted
to a hardware loop. However, if the redefining instruction is actually
loading an immediate value into the register, this conversion is both
possible and legal (since the immediate itself will be used in the
loop setup in the preheader).

llvm-svn: 316218
2017-10-20 16:56:33 +00:00
Simon Pilgrim b6b617b7d8 [X86] Check all CPU target names.
We ignore the 32-bit/64-bit triple but I've tried to use i686 triples for CPUs that don't support x86_64

llvm-svn: 316217
2017-10-20 16:55:51 +00:00
Zvi Rackover e95709d54a X86 Tests: Add tests for vector permutes with variable indices. NFC.
Basic tests which are the equivalent of single-source shufflevector with variable mask.

llvm-svn: 316216
2017-10-20 15:32:14 +00:00
Aleksandar Beserminji 143572984d Revert "[mips] Reordering callseq* nodes to be linear"
This reverts commit r314507, because the original patch is causing test
failures.

llvm-svn: 316215
2017-10-20 14:35:41 +00:00
Eugene Leviant 27b226fb65 [ARM] Use post-RA MI scheduler when +use-misched is set
Differential revision: https://reviews.llvm.org/D39100

llvm-svn: 316214
2017-10-20 14:29:17 +00:00
Simon Pilgrim 46b791921f [X86][AVX512] Regenerate regcall tests.
As part of tracking down machine verifier issues (PR27481)

llvm-svn: 316213
2017-10-20 14:13:02 +00:00
Dylan McKay 6670e42402 [AVR] Fix the select-mbb-placement-bug.ll
llvm-svn: 316205
2017-10-20 04:17:14 +00:00
Nemanja Ivanovic 0026c06e11 Disabling the transformation introduced in r315888
The commit at https://reviews.llvm.org/rL315888 is causing some failures
with internal testing. Disabling this code until we can resolve the issues.

llvm-svn: 316199
2017-10-20 00:36:46 +00:00
Alex Bradbury 8971842f43 [RISCV] Initial codegen support for ALU operations
This adds the minimum necessary to support codegen for simple ALU operations
on RV32. Prolog and epilog insertion, support for memory operations etc etc 
follow in future patches.

Leave guessInstructionProperties=1 until https://reviews.llvm.org/D37065 is 
reviewed and lands.

Differential Revision: https://reviews.llvm.org/D29933

llvm-svn: 316188
2017-10-19 21:37:38 +00:00
Simon Pilgrim e8e2c4c0cf [X86][AES] Test AES intrinsics on 32/64-bit targets with/without VEX encoding
Don't just test on 32-bit

llvm-svn: 316176
2017-10-19 19:05:04 +00:00
Krzysztof Parzyszek e4d0e199bf [Hexagon] Fix store conversion from rr to io in optimize addressing modes
llvm-svn: 316170
2017-10-19 16:59:22 +00:00
Simon Pilgrim fdd63d1535 [X86] Replace custom scalar integer absolute matching with ISD::ABS lowering.
x86 has its own copy of integer absolute pattern matching to combine directly to a SUB+CMOV.

This patch removes the x86 combine and adds custom lowering support for ISD::ABS instead, allowing us to use the DAGCombiner version.

Additional test cases are already covered by iabs.ll (rL315706 and rL315711).

Differential Revision: https://reviews.llvm.org/D38895

llvm-svn: 316162
2017-10-19 15:02:24 +00:00
Simon Pilgrim d0649f978f [X86] Add scalar (abs (abs x)) -> (abs x) combine test.
Before landing D38895

llvm-svn: 316160
2017-10-19 14:59:26 +00:00
Diana Picus 7bf71008aa [ARM GlobalISel] Fix liveins in test. NFC
llvm-svn: 316155
2017-10-19 09:28:19 +00:00
Diana Picus a993859335 [ARM GlobalISel] Remove redundant tests
These test cases don't really add anything that isn't covered by other
tests as well, so we can safely remove them.

llvm-svn: 316154
2017-10-19 08:50:28 +00:00
Justin Bogner 876ad287d1 GISel: Canonicalize select tests using update_mir_test_checks
This runs `udpate_mir_test_checks --add-vreg-checks` on the tests taht
are already more or less in the format that generates, so that there
will be less churn in some upcoming changes.

llvm-svn: 316139
2017-10-18 23:33:31 +00:00
Justin Bogner f8dc015bd1 AArch64/GISel: Modernize the localizer test
llvm-svn: 316138
2017-10-18 23:26:24 +00:00
Justin Bogner d45849f703 Canonicalize a large number of mir tests using update_mir_test_checks
This converts a large and somewhat arbitrary set of tests to use
update_mir_test_checks. I ran the script on all of the tests I expect
to need to modify for an upcoming mir syntax change and kept the ones
that obviously didn't change the tests in ways that might make it
harder to understand.

llvm-svn: 316137
2017-10-18 23:18:12 +00:00
Dylan McKay 443695f80a [AVR] Fix the select_mbb_placement_bug.ll test
llvm-svn: 316124
2017-10-18 20:04:57 +00:00
Sumanth Gundapaneni e1983bcf55 [Hexagon] New HVX target features.
This patch lets the llvm tools handle the new HVX target features that
are added by frontend (clang). The target-features are of the form
"hvx-length64b" for 64 Byte HVX mode, "hvx-length128b" for 128 Byte mode HVX.
"hvx-double" is an alias to "hvx-length128b" and is soon will be deprecated.
The hvx version target feature is upgated form "+hvx" to "+hvxv{version_number}.
Eg: "+hvxv62"

For the correct HVX code generation, the user must use the following
target features.
For 64B mode: "+hvxv62" "+hvx-length64b"
For 128B mode: "+hvxv62" "+hvx-length128b"

Clang picks a default length if none is specified. If for some reason,
no hvx-length is specified to llvm, the compilation will bail out.
There is a corresponding clang patch.

Differential Revision: https://reviews.llvm.org/D38851

llvm-svn: 316101
2017-10-18 18:07:07 +00:00
Konstantin Zhuravlyov 8d5e9e110c AMDGPU: Rename MaxFlatWorkgroupSize to MaxFlatWorkGroupSize for consistency
Differential Revision: https://reviews.llvm.org/D38957

llvm-svn: 316097
2017-10-18 17:31:09 +00:00
Justin Bogner 2ac32cc9ce AArch64/GISel: Fix a couple of tests that were testing the wrong thing
Fix a couple of tests that were extending the wrong vreg, and
regenerate their checks with update_mir_test_checks. This looks like
it was a copy-paste or test update error.

llvm-svn: 316087
2017-10-18 15:34:33 +00:00
Simon Dardis 03c2c65b2d [mips] Fix analyzeBranch to handle debug data
In the case where there was a conditional branch followed by a unconditional
branch with debug instruction separating them, MipsInstrInfo::analyzeBranch
would not skip past debug instruction when searching for the second branch
which give erroneous results about the control flow of the block.

This could lead to the branch folder to merge the non-fall through case
into it's predecessor, leaving the conditional branch with a dangling
basic block operand.

This resolves PR34975.

Thanks to Alexander Richardson for reporting the issue!

Reviewers: atanasyan

Differential Revision: https://reviews.llvm.org/D39003

llvm-svn: 316084
2017-10-18 14:35:29 +00:00
Simon Dardis 77bf0fd59c [mips] Move test to correct directory. NFCI
llvm-svn: 316081
2017-10-18 13:59:48 +00:00
Michael Zuckerman 7ba046c784 Adding new test for
bug fix 316067 https://bugs.llvm.org/show_bug.cgi?id=34978

This test checks that the x86-interleaved ends without any
assertion.

Change-Id: I1e970482a4d0404516cbc85517fc091bb21c35a8
llvm-svn: 316080
2017-10-18 13:51:31 +00:00
Hiroshi Inoue 5388e66d3a [PowerPC] Use helper functions to check sign-/zero-extended value
Helper functions to identify sign- and zero-extending machine instruction is introduced in rL315888.
This patch makes PPCInstrInfo::optimizeCompareInstr use the helper functions. It simplifies the code and also makes possible more optimizations since the helper can do more analysis than the original check code; I observed about 5000 more compare instructions are eliminated while building LLVM.

Also, this patch fixes a bug in helpers on ANDIo instruction handling due to the order of checks. This bug causes a failure in an existing test case for optimizeCompareInstr.

Differential Revision: https://reviews.llvm.org/D38988

llvm-svn: 316071
2017-10-18 10:31:19 +00:00
Wei Ding 7ab1f7a421 AMDGPU : Fix an error for the llvm.cttz implementation.
Differential Revision: http://reviews.llvm.org/D39014

llvm-svn: 316037
2017-10-17 21:49:52 +00:00
Tim Northover 350a87eaf1 AArch64: account for possible frame index operand in compares.
If the address of a local is used in a comparison, AArch64 can fold the
address-calculation into the comparison via "adds". Unfortunately, a couple of
places (both hit in this one test) are not ready to deal with that yet and just
assume the first source operand is a register.

llvm-svn: 316035
2017-10-17 21:43:52 +00:00
Simon Pilgrim 7cd4e2c96f [X86][SSE] Tests packuswb/truncation codegen from PR34773
llvm-svn: 316033
2017-10-17 21:14:53 +00:00
Konstantin Zhuravlyov 7dabe9ced7 AMDGPU: Start generating metadata for MaxFlatWorkGroupSize
Differential Revision: https://reviews.llvm.org/D38958

llvm-svn: 316024
2017-10-17 20:03:21 +00:00
Sanjay Patel 94c0eb031c [ARM, AArch64] adjust tests trying to maintain their objective; NFC
A smarter compiler will see that these might be better without a jump table
if we're just using the constant values of the switch.

llvm-svn: 316012
2017-10-17 16:54:56 +00:00
Gadi Haber 85d99b4310 [X86][Broadwell] Added the broadwell cpu to the scheduling regression tests.<NFC>
NFC.
Added the Broadwell cpu and the BROADWELL prefix to all the scheduling regression tests, as part of prepartion for a larger commit of adding all Broadwell scheduiling.

Reviewers: RKSimon, zvi, aaboud
Differential Revision: https://reviews.llvm.org/D38994

Change-Id: I54bc9065168844c107b1729fcdc1d311ce3ea0a9
llvm-svn: 315998
2017-10-17 13:45:39 +00:00
Yichao Yu a18b0b1817 Fix implicit null check with negative offset
Summary:
It seems that negative offset was accidentally allowed in D17967.
AFAICT small negative offset should be valid (always raise segfault) on all archs that I'm aware of (especially x86, which is the only one with this optimization enabled) and such case can be useful when loading hiden metadata from an object.

However, like the positive side, it should only be done within a certain limit.
For now, use the same limit on the positive side for the negative side.
A separate option can be added if needs appear.

Reviewers: mcrosier, skatkov

Reviewed By: skatkov

Subscribers: sanjoy, llvm-commits

Differential Revision: https://reviews.llvm.org/D38925

llvm-svn: 315991
2017-10-17 11:47:36 +00:00
Gadi Haber 3020490aac [X86][Skylake] fixed/updated regression test mmx-schedule.ll which failed after r315978.
Change-Id: I60cd7e03ea6c3d9a3dc661a882458e83feca66e3
llvm-svn: 315985
2017-10-17 10:00:08 +00:00
Gadi Haber 1e0f1f476a [X86][SKL] Updated scheduling information for the SkylakeClient target
Updated the scheduling information for the SkylakeClient target with the following changes:

1. regrouped the instructions after adding load and store latencies.
2. regrouped the instructions after adding identified missing ports in several groups.
The changes were made after revisiting the latencies impact of all the load and store uOps.

Reviewers: zvi, RKSimon, craig.topper
Differential Revision: https://reviews.llvm.org/D38727

Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f
llvm-svn: 315978
2017-10-17 06:47:04 +00:00
Daniel Sanders 3229217620 [globalisel][tablegen] Add a GIM_CheckIsSameOperand test where OtherInsnID and OtherOpIdx differ
llvm-svn: 315972
2017-10-17 05:24:44 +00:00
Craig Topper 341f2ab444 [X86] Add masked palignr tests to vector-shuffle-masked.ll
llvm-svn: 315971
2017-10-17 04:17:56 +00:00
Craig Topper 19f2f49ef1 [X86] Add AVX512BW to the vector-shuffle-masked test to prepare for an upcoming commit.
llvm-svn: 315970
2017-10-17 04:17:55 +00:00
Mark Searles 4e3d6160db Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats).
Differential Revision: https://reviews.llvm.org/D38466

llvm-svn: 315957
2017-10-16 23:38:53 +00:00
Simon Pilgrim a590c74549 [X86][AVX] Add v4x64 vector shuffle test for <0,2,1,3> mask
llvm-svn: 315955
2017-10-16 23:20:16 +00:00
Quentin Colombet 0bd2825517 Re-apply [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY
This reverts commit r315823, thus re-applying r315781.

Also make sure we don't use G_BITCAST mapping for non-generic registers.
Non-generic registers don't have a type but do have a reg bank.
Something the COPY mapping now how to deal with but the G_BITCAST
mapping don't.

-- Original Commit Message --
We use to resort on the generic implementation to get the mappings for
COPYs. The generic implementation resorts on table lookup and
dynamically allocated objects to get the valid mappings.

Given we already know how to map G_BITCAST and have the static mappings
for them, use that code path for COPY as well. This is much more
efficient.

Improve the compile time of RegBankSelect by up to 20%.

Note: When we eventually generate all the mappings via TableGen, we
wouldn't have to do that dance to shave compile time. The intent of this
change was to make sure that moving to static structure really pays off.

NFC.

llvm-svn: 315947
2017-10-16 22:28:40 +00:00