Commit Graph

15568 Commits

Tom Stellard 79a1fd718c AMDGPU: allow specifying a workgroup size that needs to fit in a compute unit
Summary:
For GL_ARB_compute_shader we need to support workgroup sizes of at least 1024. However, if we want to allow large workgroup sizes, we may need to use fewer registers, as we have to run more waves per SIMD.

This patch adds an attribute to specify the maximum work group size the compiled program needs to support. It defaults to 256, as that has no wave restrictions.

Reducing the number of registers available is done similarly to how the registers were reserved for chips with the sgpr init bug.
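As a sketch, such an attribute would be attached to the kernel as an IR function attribute; the attribute name and value below are assumptions for illustration, not necessarily the exact spelling introduced here:

```
; Hypothetical use of the new maximum work group size attribute.
define void @compute_kernel() #0 {
  ret void
}

; The string "amdgpu-max-work-group-size" is assumed here for illustration.
attributes #0 = { "amdgpu-max-work-group-size"="1024" }
```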

Reviewers: mareko, arsenm, tstellarAMD, nhaehnle

Subscribers: FireBurn, kerberizer, llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D18340

Patch By: Bas Nieuwenhuizen

llvm-svn: 266337
2016-04-14 16:27:07 +00:00
Tom Stellard f110f8f9f7 AMDGPU/SI: Use the correct scratch wave offset register for shaders.
Summary:
The code previously always used s1, as it was using the user + system SGPR
information for compute kernels. This is incorrect for Mesa shaders, though.

The register should be the next SGPR after all user and system SGPRs.
We rely on the fact that Mesa adds arguments for all input and system SGPRs and
take the next available SGPR for the scratch wave offset register.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewers: mareko, arsenm, nhaehnle, tstellarAMD

Subscribers: qcolombet, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18941

Patch By: Bas Nieuwenhuizen

llvm-svn: 266336
2016-04-14 16:27:03 +00:00
Simon Dardis 53a3492b71 Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.

This patch was previously committed as r266055 but reverted as it seemed to have caused some spurious
test failures. They did not reappear after further local testing.

llvm-svn: 266301
2016-04-14 13:43:17 +00:00
Vasileios Kalintiris d10ce39254 [mips] Remove duplicate tests and add missing prefixes for *-LABEL checks. NFC.
Summary:
The only difference between the removed tests and the pre-existing
ones, is the materialization of the zero constant, which shouldn't
matter for these cases.

Reviewers: dsanders, sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18693

llvm-svn: 266285
2016-04-14 09:13:13 +00:00
Adam Nemet 7aab648831 Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This reverts commit r266086.

It breaks the LTO build of gcc in SPEC2000.

llvm-svn: 266282
2016-04-14 08:47:17 +00:00
David Majnemer 0f26b0aeb4 [CodeGen] Teach LLVM how to lower @llvm.{min,max}num to {MIN,MAX}NAN
The behavior of {MIN,MAX}NAN differs from that of {MIN,MAX}NUM when only
one of the inputs is NaN: -NUM will return the non-NaN argument while
-NAN would return NaN.

It is desirable to lower @llvm.{min,max}num to -NAN for targets that don't
have a native instruction for -NUM.  Notably, ARMv7 NEON's vmin has the
-NAN semantics.

N.B.  Of course, it is only safe to do this if the intrinsic call is
marked nnan.
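As a sketch (f32 variant), the call must carry the nnan flag for the lowering described above to be legal:

```
declare float @llvm.minnum.f32(float, float)

; With nnan on the call, a backend without a native -NUM instruction may
; lower this to a -NAN style instruction (e.g. ARMv7 NEON vmin), since the
; NaN-propagation difference cannot be observed for non-NaN inputs.
define float @min_nnan(float %a, float %b) {
  %r = call nnan float @llvm.minnum.f32(float %a, float %b)
  ret float %r
}
```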

llvm-svn: 266279
2016-04-14 07:13:24 +00:00
Matt Arsenault 9cd90712f0 AMDGPU: Implement canonicalize
Also add generic DAG node for it.
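For reference, a minimal sketch of the intrinsic being implemented (f32 variant; the wrapper function is made up):

```
declare float @llvm.canonicalize.f32(float)

; Returns the canonical encoding of %x, e.g. quieting signaling NaNs and,
; depending on the target's FP mode, flushing denormals.
define float @canon(float %x) {
  %c = call float @llvm.canonicalize.f32(float %x)
  ret float %c
}
```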

llvm-svn: 266272
2016-04-14 01:42:16 +00:00
Sanjay Patel 748b06514a [ppc] add tests to show potential andc optimization
llvm-svn: 266261
2016-04-13 23:23:30 +00:00
Tim Northover 5c02f9ad28 ARM: override cost function to re-enable ConstantHoisting (& fix it).
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:

  + ConstantHoisting was modifying switch statements. This is simply invalid;
    the cases must remain integer constants no matter the notional cost.
  + ConstantHoisting was mangling alloca instructions in the entry block. These
    should be handled by FrameLowering, so constants actually have a cost of 0.
    Worse, the resulting bitcasts meant they became dynamic allocas.

rdar://25707382

llvm-svn: 266260
2016-04-13 23:08:27 +00:00
Matthias Braun 707e02c273 ARM: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most of a
function, so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers; I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callee's parameter.
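A minimal IR sketch of the tail-call constraint mentioned above (function names are hypothetical): the swiftself value forwarded in a tail call must be the caller's own swiftself parameter, so the callee-save register already holds the right value.

```
declare void @callee(i8* swiftself)

; OK to tail call: the callee receives the caller's own swiftself value,
; so the callee-save register does not need to be reloaded.
define void @caller(i8* swiftself %ctx) {
  tail call void @callee(i8* swiftself %ctx)
  ret void
}
```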

Differential Revision: http://reviews.llvm.org/D18901

llvm-svn: 266253
2016-04-13 21:43:25 +00:00
Matthias Braun 588d1cdad4 X86: Use a callee save register for the swiftself parameter.
It is very likely that the swiftself parameter is alive throughout most of a
function, so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers; I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callee's parameter.

Differential Revision: http://reviews.llvm.org/D18902

llvm-svn: 266252
2016-04-13 21:43:21 +00:00
Matthias Braun 74a0bd319a AArch64: Use a callee save register for swiftself parameters
It is very likely that the swiftself parameter is alive throughout most of a
function, so putting it into a callee save register should
avoid spills for the callers with only a minimum amount of extra spills
in the callees.

Currently the generated code is correct but unnecessarily spills and
reloads arguments passed in callee save registers; I will address this
in upcoming patches.

This also adds a missing check that for tail calls the preserved value
of the caller must be the same as the callee's parameter.

Differential Revision: http://reviews.llvm.org/D19007

llvm-svn: 266251
2016-04-13 21:43:16 +00:00
Sanjay Patel d6c53f8727 [x86] add tests to show potential BMI optimization
llvm-svn: 266243
2016-04-13 20:40:43 +00:00
Evandro Menezes 8d53f88162 [AArch64] Disable LDP/STP for quads
Disable LDP/STP for quads on Exynos M1 as they are not as efficient as pairs
of regular LDR/STR.

Patch by Abderrazek Zaafrani <a.zaafrani@samsung.com>.

llvm-svn: 266223
2016-04-13 18:31:45 +00:00
Nirav Dave 2477491a92 Cleanup Store Merging in UseAA case
This patch fixes a bug (PR26827) when using anti-aliasing in store
merging. This sets the chain users of the component stores to point to
the new store instead of the component stores' chain parent.

Reviewers: jyknight

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18909

llvm-svn: 266217
2016-04-13 17:27:26 +00:00
Tim Northover b8a1ecfc62 AArch64: don't create instructions that write to xzr/wzr twice.
These are unpredictable even on AArch64.

Patch by Yichao Yu.

llvm-svn: 266206
2016-04-13 16:25:39 +00:00
Artem Tamazov eb4d5a9b0b [AMDGPU][llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)
Tests added along with implemented feature.
Note that there is a small leftover unnecessary MI scheduling issue
(more info in the review). CodeGen/AMDGPU/salu-to-valu.ll was updated to fix
the false regression.

TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.

Differential Revision: http://reviews.llvm.org/D17825

llvm-svn: 266205
2016-04-13 16:18:41 +00:00
Zoran Jovanovic 2f6845ba39 [mips] Fix emitAtomicCmpSwapPartword to handle 64 bit pointers correctly
Differential Revision: http://reviews.llvm.org/D18995

llvm-svn: 266204
2016-04-13 16:02:25 +00:00
Vasileios Kalintiris 3751d4114c [mips] Sign-extend i32 values truncated from previously zero-extended i32 values.
Summary:
This is a special case for MIPS64 because the architecture requires
properly 32-bit sign-extended values in the register containers.

Additionally, we merge consecutive trunc + AssertZExt nodes in order
to avoid unnecessary sign-extensions when the extension comes from a
type smaller than i32.

Reviewers: dsanders

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D18893

llvm-svn: 266203
2016-04-13 15:07:45 +00:00
Simon Pilgrim cc901b57d5 [X86][SSE] Regenerated vector integer absolute tests
llvm-svn: 266194
2016-04-13 12:40:22 +00:00
Simon Pilgrim b6ef8b730d Added missing autogeneration note
llvm-svn: 266185
2016-04-13 09:28:44 +00:00
Zlatko Buljan 58d6a959be [mips][microMIPS] Add CodeGen support for DIV, MOD, DIVU, MODU, DDIV, DMOD, DDIVU and DMODU instructions
Differential Revision: http://reviews.llvm.org/D17137

This patch was reverted after the reversion of the dependent patch http://reviews.llvm.org/D17068.
There was a problem with a test-suite failure.
The problem is hopefully solved with the dependent patch, so this patch is committed again.

llvm-svn: 266179
2016-04-13 08:02:26 +00:00
Hrvoje Varga 11dd31df9a [mips][microMIPS] Fix for "Cannot copy registers" assertion
Differential Revision: http://reviews.llvm.org/D17068

This change contains a fix for the failing test-suite, so this patch should hopefully work now.

llvm-svn: 266171
2016-04-13 06:17:21 +00:00
Wei Mi 9a16d655c7 Recommit r265547, and r265610, r265639, r265657 on top of it, plus
two fixes: one for an error reported by verify-regalloc, and
another for the live range update of a phi after rematerialization.

r265547:
Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.

analyzeSiblingValues is an N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes significant compile time issues when N is large, it is also
important for performance since it removes redundant spills and
enables rematerialization.

To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the defining expr of the original register for
rematerialization and keeps it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.

Patches on top of r265547:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"

Differential Revision: http://reviews.llvm.org/D15302
Differential Revision: http://reviews.llvm.org/D18934
Differential Revision: http://reviews.llvm.org/D18935
Differential Revision: http://reviews.llvm.org/D18936

llvm-svn: 266162
2016-04-13 03:08:27 +00:00
Matt Arsenault 887d4767b7 AMDGPU: Add test for m0 initialization in basic loop
Initialization of m0 is emitted for each LDS operation, so
every block with LDS usage ends up with one. MachineLICM
used to fail to hoist this out of the loop, so every loop
iteration with LDS usage in it would re-initialize it.

This seems to be fixed now, so add a test to make sure that
it stays this way.

llvm-svn: 266156
2016-04-13 00:39:52 +00:00
Justin Bogner 263f314ba7 CodeGen: Clear the MFI's save and restore point after PrologEpilogInserter
This state is no longer useful and not guaranteed to be valid in later
codegen passes. For example, see the added test, which would print a
savepoint of %bb.-1 without this change, and crashes with a
use-after-free error under ASan if you apply the recycling allocator
patch from llvm.org/PR26808.

llvm-svn: 266150
2016-04-12 23:21:53 +00:00
Nicolai Haehnle df77c9ada4 AMDGPU: add llvm.amdgcn.buffer.load/store intrinsics
Summary:
They correspond to BUFFER_LOAD/STORE_DWORD[_X2,X3,X4] and mostly behave like
llvm.amdgcn.buffer.load/store.format. They will be used by Mesa for SSBO and
atomic counters at least when robust buffer access behavior is desired.
(These instructions perform no format conversion and do buffer range checking
per component.)

As a side effect of sharing patterns with llvm.amdgcn.buffer.store.format,
it has become trivial to add support for the f32 and v2f32 variants of that
intrinsic, so the patch does so.

Also DAG-ify (and fix) some tests that I noticed intermittent failures in
while developing this patch.

Some tests were (temporarily) adjusted for the required mayLoad/hasSideEffects
changes to the BUFFER_STORE_DWORD* instructions. See also
http://reviews.llvm.org/D18291.

Reviewers: arsenm, tstellarAMD, mareko

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18292

llvm-svn: 266126
2016-04-12 21:18:10 +00:00
Derek Schuff b861ec8734 [WebAssembly] Fix debug info in reg-stackify.ll test
It lacked a CU and thus became invalid with r266102

llvm-svn: 266114
2016-04-12 20:12:05 +00:00
Tom Stellard ab1d3a9d50 AMDGPU/SI: Insert wait states required after v_readfirstlane on SI
Summary:
We will be able to handle this case much better once the hazard recognizer
is finished, but this conservative implementation  fixes a hang with the piglit
test:

spec/arb_arrays_of_arrays/execution/sampler/fs-nested-struct-arrays-nonconst-nested-arra

Reviewers: arsenm, nhaehnle

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18988

llvm-svn: 266105
2016-04-12 18:40:43 +00:00
Matt Arsenault 3b08238f78 AMDGPU: Eliminate half of i64 or if one operand is zero_extend from i32
This helps clean up some of the mess when expanding unaligned 64-bit
loads once they are changed to be promoted to v2i32, and fixes situations
where or x, 0 was emitted after splitting 64-bit ors during moveToVALU.

I think this could be a generic combine but I'm not sure.

llvm-svn: 266104
2016-04-12 18:24:38 +00:00
Nicolai Haehnle 279970c0dc AMDGPU/SI: Fix a mis-compilation of multi-level breaks
Summary:
Under certain circumstances, multi-level breaks (or what is understood by
the control flow passes as such) could be miscompiled in a way that causes
infinite loops, by emitting incorrect control flow intrinsics.

This fixes a hang in
dEQP-GLES3.functional.shaders.loops.while_dynamic_iterations.conditional_continue_vertex

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18967

llvm-svn: 266088
2016-04-12 16:10:38 +00:00
Artur Pilipenko dbe0bc8df4 Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmission of the r263158 change.

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.
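A sketch of the kind of call this enables; the exact mangled intrinsic suffix for the pointer type is an assumption here:

```
; Masked load from a non-default address space (addrspace(1)); previously the
; intrinsic signature only accepted default-addrspace pointers.
declare <4 x float> @llvm.masked.load.v4f32.p1v4f32(<4 x float> addrspace(1)*, i32, <4 x i1>, <4 x float>)

define <4 x float> @masked_load_as1(<4 x float> addrspace(1)* %p, <4 x i1> %m) {
  %v = call <4 x float> @llvm.masked.load.v4f32.p1v4f32(<4 x float> addrspace(1)* %p, i32 4, <4 x i1> %m, <4 x float> undef)
  ret <4 x float> %v
}
```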

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 266086
2016-04-12 15:58:04 +00:00
Geoff Berry c0739d8305 [ScheduleDAGInstrs] Handle instructions with multiple MMOs
Summary:
In getUnderlyingObjectsForInstr(): Don't give up on instructions with
multiple MMOs, instead look through all the MMOs and if they all meet
the conservative criteria previously used for single MMO instructions,
then return all of the underlying objects derived from the MMOs.

The change to ScheduleDAGInstrs::buildSchedGraph() is needed to avoid
the case where multiple underlying objects are present and are related
in such a way that successive iterations of the loop end up adding a
dependency from an instruction to itself.

Reviewers: atrick, hfinkel

Subscribers: MatzeB, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18093

llvm-svn: 266084
2016-04-12 15:50:19 +00:00
Than McIntosh b9049084a0 Test commit, NFC.
Adds a blank line.

llvm-svn: 266082
2016-04-12 15:35:05 +00:00
Matt Arsenault 64fa2f4513 AMDGPU: Implement i64 global atomics
llvm-svn: 266075
2016-04-12 14:05:11 +00:00
Matt Arsenault a9dbdcae04 AMDGPU: Add atomic_inc + atomic_dec intrinsics
These are different than atomicrmw add 1 because they have
an additional input value to clamp the result.
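For contrast, a sketch of the plain atomicrmw form, which has no way to express the clamp value the new intrinsics take:

```
; Plain atomic increment: wraps on overflow, no clamp operand.
define i32 @plain_inc(i32 addrspace(1)* %p) {
  %old = atomicrmw add i32 addrspace(1)* %p, i32 1 seq_cst
  ret i32 %old
}
```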

llvm-svn: 266074
2016-04-12 14:05:04 +00:00
Matt Arsenault 44e5483ada AMDGPU: Add volatile to test loads and stores
When the memory vectorizer is enabled, these tests break.
These tests don't really care about the memory instructions,
and it's easier to write check lines with the unmerged loads.

llvm-svn: 266071
2016-04-12 13:38:18 +00:00
Simon Pilgrim edf100e4e7 [X86] Regenerated avx512 calling convention test checks
llvm-svn: 266070
2016-04-12 13:31:01 +00:00
Simon Dardis ee1590f5f0 Revert "[mips] MIPSR6 Compact branch aliases"
This reverts commit r266055.

ps4-buildslave2 is highlighting a failure.

llvm-svn: 266061
2016-04-12 12:22:45 +00:00
Simon Dardis 703c864fe3 [mips] MIPSR6 Compact branch aliases
Summary:
Alias 'jic $reg, 0' to 'jrc $reg' and 'jialc $reg, 0' to 'jalrc $reg' like
binutils.

Reviewers: dsanders

Differential Revision: http://reviews.llvm.org/D18856

llvm-svn: 266055
2016-04-12 10:41:53 +00:00
Chuang-Yu Cheng 94f58e79ae [PPC64] Mark CR0 Live if PPCInstrInfo::optimizeCompareInstr Creates a Use of CR0
Resolve Bug 27046 (https://llvm.org/bugs/show_bug.cgi?id=27046).
The PPCInstrInfo::optimizeCompareInstr function could create a new use of
CR0, even if CR0 were previously dead. This patch marks CR0 live if a use of
CR0 is created.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D18884

llvm-svn: 266040
2016-04-12 03:10:52 +00:00
Chuang-Yu Cheng 6efde2fb45 [PPC64] Use mfocrf in prologue when we only need to save 1 nonvolatile CR field
In the ELFv2 ABI, we are not required to save all CR fields. If only one
nonvolatile CR field is clobbered, use mfocrf instead of mfcr to
selectively save the field, because mfocrf has shorter latency compared to
mfcr.

Thanks to Nemanja for the invaluable hint!
Reviewers: nemanjai tjablin hfinkel kbarton

http://reviews.llvm.org/D17749

llvm-svn: 266038
2016-04-12 03:04:44 +00:00
Quentin Colombet 5933eb84d3 [AArch64] Add test cases for the repairing of physical registers.
llvm-svn: 266030
2016-04-12 00:43:40 +00:00
Quentin Colombet f43f882f37 [AArch64] Add a test case for the propagation of register banks through
phis.

llvm-svn: 266028
2016-04-12 00:32:55 +00:00
Quentin Colombet 99d19d89e9 [AArch64] Add a test case for the repairing of definitions.
llvm-svn: 266026
2016-04-12 00:25:22 +00:00
Quentin Colombet d52f4128d6 [AArch64] Test that RegBankSelect inserts the proper copies to fix the
register bank assignments.

llvm-svn: 266021
2016-04-12 00:00:42 +00:00
Evgeniy Stepanov f17120a85f [safestack] Add canary to unsafe stack frames
Add StackProtector to SafeStack. This adds limited protection against
data corruption in the caller frame. Current implementation treats
all stack protector levels as -fstack-protector-all.

llvm-svn: 266004
2016-04-11 22:27:48 +00:00
Tim Northover a6dea06fe3 ARM: use r7 as the frame-pointer on all MachO targets.
This is better for a few reasons:
  + It matches the other tooling for iOS.
  + It matches EABI in more cases (i.e. Thumb-mode, and in practice we don't
    use ARM mode).
  + It leads to infinitesimally smaller code (0.2%, yay!).

rdar://25369506

llvm-svn: 266003
2016-04-11 22:27:40 +00:00
Manman Ren 2aa7f23272 swifterror: fix up a testing case.
llvm-svn: 266000
2016-04-11 21:45:33 +00:00
Simon Pilgrim 82e54871d0 [DAGCombiner] Fold xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) anytime before LegalizeVectorOps
xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) was only being combined at the AfterLegalizeTypes stage; this patch permits the combine to occur anytime before then as well.

The main aim of this is to improve the ability to recognise bitmasks that can be converted to shuffles.
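A small IR-level sketch of the pattern (the combine itself operates on the equivalent SelectionDAG nodes):

```
; Logic op performed on bitcast operands ...
define <2 x i64> @xor_of_bitcasts(<4 x i32> %a, <4 x i32> %b) {
  %ba = bitcast <4 x i32> %a to <2 x i64>
  %bb = bitcast <4 x i32> %b to <2 x i64>
  %r = xor <2 x i64> %ba, %bb
  ret <2 x i64> %r
}

; ... is equivalent to a single bitcast of the logic op.
define <2 x i64> @xor_then_bitcast(<4 x i32> %a, <4 x i32> %b) {
  %x = xor <4 x i32> %a, %b
  %r = bitcast <4 x i32> %x to <2 x i64>
  ret <2 x i64> %r
}
```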

I had to modify a number of AVX512 mask tests as the basic bitcast to/from scalar pattern was being stripped out, preventing testing of the mmask bitops. By replacing the bitcasts with loads we can get almost the same result.

Differential Revision: http://reviews.llvm.org/D18944

llvm-svn: 265998
2016-04-11 21:10:33 +00:00
Manman Ren 5751814eda Swift Calling Convention: swifterror target support.
Differential Revision: http://reviews.llvm.org/D18716

llvm-svn: 265997
2016-04-11 21:08:06 +00:00
Tom Stellard 0ffdf65eaa Revert "AMDGPU/SI: Do not generate s_waitcnt after ds_permute/ds_bpermute"
This reverts commit r263720.

Just confirmed that s_waitcnt is required after ds_permute/ds_bpermute.

llvm-svn: 265992
2016-04-11 20:38:40 +00:00
Adrian Prantl b01a4d48ac More upgrading of old- and very-old-style debug info in testcases.
llvm-svn: 265953
2016-04-11 15:53:44 +00:00
Petar Jovanovic e578e970cb [mips] Make Static a default relocation model for MIPS codegen
This change follows the defaults of GCC and Clang, so LLVM does not differ
from them. While a number of test files are touched by this change, they
all keep the old (expected) behaviour with the explicit option:
"-relocation-model=pic"
The tests that have not been touched are insensitive to the relocation model.

Differential Revision: http://reviews.llvm.org/D17995

llvm-svn: 265949
2016-04-11 15:24:23 +00:00
Ulrich Weigand 848a513d0a [SystemZ] Support conditional indirect sibling calls via BCR
This adds a conditional variant of CallBR instruction, CallBCR. Also,
it can be fused with integer comparisons, resulting in one of the new
C*BCall instructions.

In addition to CallBRCL limitations, this has another one: it won't
trigger if the function to call isn't already in %r1 - see f22 in the
test for an example (it's also why the loads in tests are volatile).

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18928

llvm-svn: 265933
2016-04-11 12:12:32 +00:00
Simon Pilgrim ac0cac3b0d [X86] Added extra widening tests for and/xor/or bit operations
Add tests for bitcasting an illegal vector to/from a legal scalar

Additional tests requested for D18944

llvm-svn: 265930
2016-04-11 11:10:36 +00:00
Simon Pilgrim ff434aaa3a [X86] Added extra widening tests for and/xor/or bit operations
To make sure we're dealing with both cases of legal/illegal number of vector elements and legal/illegal vector element types

llvm-svn: 265929
2016-04-11 10:58:52 +00:00
Simon Pilgrim d839fe9cfb [X86] Regenerated sdglue test checks
llvm-svn: 265927
2016-04-11 10:22:05 +00:00
Simon Pilgrim da730769fb [X86] Added widening tests for and/xor/or bit operations
Part of additional tests requested for D18944

llvm-svn: 265925
2016-04-11 10:16:27 +00:00
Simon Pilgrim 2d463b1a0f [X86][AVX512] Add vector integer division by constant tests
Added sdiv/srem and udiv/urem tests cases for 512 bit vectors.

llvm-svn: 265903
2016-04-10 17:14:26 +00:00
Simon Pilgrim d263fdc512 [X86][AVX512BW] Add support for v64i8 multiplies
Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets.

I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL.

Differential Revision: http://reviews.llvm.org/D18937

llvm-svn: 265902
2016-04-10 17:02:48 +00:00
Simon Pilgrim bab9979ee9 [X86][AVX512] Regenerated mask op tests
llvm-svn: 265898
2016-04-10 14:16:03 +00:00
Charles Davis 2f65f35c27 [CodeGen] Don't assume that fixed stack objects are aligned in a stack-realigned function.
Summary:
After we make the adjustment, we can assume that for local allocas, but
not for stack parameters, the return address, or any other fixed stack
object (which has a negative offset and therefore lies prior to the
adjusted SP).

Fixes PR26662.

Reviewers: hfinkel, qcolombet, rnk

Subscribers: rnk, llvm-commits

Differential Revision: http://reviews.llvm.org/D18471

llvm-svn: 265886
2016-04-09 23:34:42 +00:00
Sanjay Patel 4abae4e0fa [x86] use BMI 'andn' for logic + compare ops
With BMI, we can use 'andn' to save an instruction when the result is only used in a compare.
This is related to one of the potential sequences to check 'isfinite' in:
https://llvm.org/bugs/show_bug.cgi?id=27164
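A sketch of the pattern in IR form: an and-not whose only use is a compare, which a single flag-setting BMI andn can now cover:

```
; (x & ~y) == 0, with the and result used only by the compare.
define i1 @andn_cmp(i32 %x, i32 %y) {
  %noty = xor i32 %y, -1
  %and = and i32 %x, %noty
  %cmp = icmp eq i32 %and, 0
  ret i1 %cmp
}
```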

Differential Revision: http://reviews.llvm.org/D18910

llvm-svn: 265875
2016-04-09 16:02:52 +00:00
Simon Pilgrim 1cc5712763 [X86][XOP] Support for VPPERM 2-input shuffle mask decoding
This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle.

The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp.

Differential Revision: http://reviews.llvm.org/D18441

llvm-svn: 265874
2016-04-09 14:51:26 +00:00
Sanjay Patel c0a627524d [x86] show missed opportunities to use andn
llvm-svn: 265850
2016-04-08 21:26:11 +00:00
Sanjay Patel a699ecc55c [x86] regenerate checks for BMI tests
llvm-svn: 265841
2016-04-08 20:29:39 +00:00
James Y Knight fec5c64b63 Add trailing colons to labels in a test.
This will avoid matching on the FILENAME if it happened to contain, say,
"f4" anywhere in the file path.

llvm-svn: 265837
2016-04-08 19:49:03 +00:00
Nirav Dave 66f485f4e2 Fix Load Control Dependence in MemCpy Generation
In Memcpy lowering we had missed a dependence from the load of the
operation to successor operations. This caused us to potentially
construct an initial DAG with a memory dependence not fully
represented in the chain sub-DAG but instead requiring a look at the
entire DAG, breaking alias analysis by allowing incorrect repositioning
of memory operations.

To work around this, r200033 changed DAGCombiner::GatherAllAliases to be
conservative if any such issue could possibly happen. Unfortunately this check
forbade many non-problematic situations as well. For example, it's
common for incoming argument lowering to add a non-aliasing load hanging
off of EntryNode. Then, if GatherAllAliases visited EntryNode, it would
find that other (unvisited) use of the EntryNode chain, and just give up
entirely. Furthermore, the check was incomplete: it would not actually
detect all such potentially problematic DAG constructions, because
GatherAllAliases did not guarantee to visit all chain nodes going up to
the root EntryNode. This is in general fine -- giving up early will just
miss a potential optimization, not generate incorrect results. But, for
this non-chain dependency detection code, it's possible that you could
have a load attached to a higher-up chain node than any which were
visited. If that load aliases your store, but the only dependency is
through the value operand of a non-aliasing store, it would've been
missed by this code, and potentially reordered.

With the dependence added, this check can be removed and Alias Analysis
can be much more aggressive. This fixes code quality regression in the
Consecutive Store Merge cleanup (D14834).

Test Change:

ppc64-align-long-double.ll now may see multiple serializations
of its stores

Differential Revision: http://reviews.llvm.org/D18062

llvm-svn: 265836
2016-04-08 19:44:40 +00:00
Kevin B. Smith e0a6fc3bcc [X86] Fix PR23155 by turning on X86FixupBWInsts by default.
Differential Revision: http://reviews.llvm.org/D18866

llvm-svn: 265830
2016-04-08 18:58:29 +00:00
Colin LeMahieu efe3732883 Revert r265817
lld tests need to be addressed.

llvm-svn: 265822
2016-04-08 18:15:37 +00:00
Colin LeMahieu 4a1975ba8e [llvm-objdump] Printing hex instead of dec by default
Differential Revision: http://reviews.llvm.org/D18770

llvm-svn: 265817
2016-04-08 17:55:03 +00:00
Ulrich Weigand fa2dffbc1a [SystemZ] Support conditional sibling calls via BRCL
This adds a conditional variant of CallJG instruction, CallBRCL.
It can be used for conditional sibling calls. Unfortunately, due
to IfCvt limitations, it only really works well for functions without
arguments.

Author: koriakin
Differential Revision: http://reviews.llvm.org/D18864

llvm-svn: 265814
2016-04-08 17:22:19 +00:00
Quentin Colombet bd19c8a39e [AArch64] Add a test case for the default mapping of RegBankSelect.
llvm-svn: 265811
2016-04-08 17:11:51 +00:00
Quentin Colombet 876ddf8107 [MIR] Teach the parser how to deal with register banks.
llvm-svn: 265802
2016-04-08 16:40:43 +00:00
Sam Parker 2d5126cdf5 [ARM] Enable SMLAW[B|T] and SMULW[B|T] instruction selection
Added ISelDAGToDAG functions to enable selection of the smlawb, smlawt,
smulwb and smulwt instructions for the ARM backend. Also updated the smul
CodeGen test and removed the smulw one.

Differential Revision: http://reviews.llvm.org/D18892

llvm-svn: 265793
2016-04-08 16:02:53 +00:00
Hans Wennborg 5a7723c7a2 Revert r265547 "Recommit r265309 after fixed an invalid memory reference bug happened"
It caused PR27275: "ARM: Bad machine code: Using an undefined physical register"

Also reverting the following commits that were landed on top:
r265610 "Fix the compare-clang diff error introduced by r265547."
r265639 "Fix the sanitizer bootstrap error in r265547."
r265657 "InlineSpiller.cpp: Escap \@ in r265547. [-Wdocumentation]"

llvm-svn: 265790
2016-04-08 15:17:43 +00:00
Simon Pilgrim 872dd6c3fe [X86][SSE] Added 32-bit tests for vector lzcnt/tzcnt tests
v2i64 tests are particularly bad on 32-bit targets.

llvm-svn: 265789
2016-04-08 15:01:31 +00:00
Chuang-Yu Cheng 98c1894755 CXX_FAST_TLS calling convention: performance improvement for PPC64
This is the same change on PPC64 as r255821 on AArch64. I have even borrowed
his commit message.

The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.

We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.

Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by the register allocator. The register allocator will try to spill and
reload them in the initialization block.

We add CSRsViaCopy, it will be explicitly handled during lowering.

1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.

Author: Tom Jablin (tjablin)
Reviewers: hfinkel kbarton cycheng

http://reviews.llvm.org/D17533

llvm-svn: 265781
2016-04-08 12:04:32 +00:00
Zlatko Buljan 53a037f5cc [mips][microMIPS] Add CodeGen support for ADD, ADDIU*, ADDU* and DADD* instructions
Differential Revision: http://reviews.llvm.org/D16454

llvm-svn: 265772
2016-04-08 07:27:26 +00:00
Dmitry Polukhin 24c4f28ac9 [IFUNC] Fix ifunc-asm.ll test
It seems that llc cannot be used in assembler tests, so a test that
checks asm for a particular target needs to be moved to CodeGen.

llvm-svn: 265770
2016-04-08 06:45:19 +00:00
Amaury Sechet c53ad4f3b2 Do not select EhPad BB in MachineBlockPlacement when there is regular BB to schedule
Summary:
EHPad BBs are not entered the classic way and therefore do not need to be placed after their predecessors. This patch makes sure EHPad BBs are not chosen amongst successors to form chains, and are selected as a last resort when selecting the best candidate.

EHPads are scheduled in reverse probability order so that they flow into each other naturally.

Reviewers: chandlerc, majnemer, rafael, MatzeB, escha, silvas

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17625

llvm-svn: 265726
2016-04-07 21:29:39 +00:00
Jan Vesely 43b7b5b846 AMDGPU/SI: Implement atomic load/store for i32 and i64
Standard load/store instructions with GLC bit set.

Reviewers: tstellardAMD, arsenm

Differential Revision: http://reviews.llvm.org/D18760

llvm-svn: 265709
2016-04-07 19:23:11 +00:00
Tom Stellard 9112758077 AMDGPU/SI: Add latency for export instructions
Reviewers: arsenm, nhaehnle

Subscribers: nhaehnle, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D18599

llvm-svn: 265708
2016-04-07 18:30:05 +00:00
Simon Pilgrim fba9352f31 [X86][SSE] Added bitmask pattern shuffle tests
Based on OR(AND(MASK,V0),AND(~MASK,V1)) style patterns

llvm-svn: 265697
2016-04-07 17:23:55 +00:00
Kevin B. Smith 3802c4af59 [X86]: Fix for PR27251.
Differential Revision: http://reviews.llvm.org/D18850

llvm-svn: 265690
2016-04-07 16:15:34 +00:00
Ulrich Weigand 2eb027d21f [SystemZ] Implement conditional returns
Return is now considered a predicable instruction, and is converted
to a newly-added CondReturn (which maps to BCR to %r14) instruction by
the if conversion pass.

Also, fused compare-and-branch transform knows about conditional
returns, emitting the proper fused instructions for them.

This transform triggers on a *lot* of tests, hence the huge diffstat.
The changes are mostly jX to br %r14 -> bXr %r14.

Author: koriakin

Differential Revision: http://reviews.llvm.org/D17339

llvm-svn: 265689
2016-04-07 16:11:44 +00:00
Ehsan Amiri 4701a91e59 [PPC] Enable transformations in PPCPassConfig::addIRPasses at O2
http://reviews.llvm.org/D18562

A large number of testcases have been modified so they pass after this change.
One testcase is deleted, because I realized that even after undoing the original
change that was committed with this testcase, the testcase still passes. So
I removed it. The change to one other testcase (test/CodeGen/PowerPC/pr25802.ll)
is an arbitrary change to keep it passing. Given the original intention of the
testcase, and the fact that fixing it would require some time to change the testcase,
we concluded that this quick change is enough.

llvm-svn: 265683
2016-04-07 15:30:55 +00:00
Simon Pilgrim d54bae6525 [X86][SSE] Add support for VZEXT constant folding
llvm-svn: 265646
2016-04-07 07:52:45 +00:00
Ahmed Bougacha 1cf67fb9cb [X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC.
Re-apply r265450 which caused PR27245 and was reverted in r265559
because of a wrong generalization: the fetch_and_add->add_and_fetch
combine only works in specific, but pretty common, cases:
  (icmp slt x, 0) -> (icmp sle (add x, 1), 0)
  (icmp sge x, 0) -> (icmp sgt (add x, 1), 0)
  (icmp sle x, 0) -> (icmp slt (sub x, 1), 0)
  (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0)
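For illustration, the first listed shape in IR form (a sketch; the function is made up): the atomicrmw result feeds only a signed compare with zero, so the flags of a LOCK ADD can be reused instead of falling back to LXADD.

```
define i1 @inc_was_negative(i32* %p) {
  %old = atomicrmw add i32* %p, i32 1 seq_cst
  ; Per the first combine above, (icmp slt %old, 0) can be rewritten in terms
  ; of the incremented value, so the LOCK ADD's flags are sufficient.
  %neg = icmp slt i32 %old, 0
  ret i1 %neg
}
```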

Original Message:

We only generate LOCKed versions of add/sub when the result is unused.
It often happens that the result is used, but only by a comparison. We
can optimize those out by reusing EFLAGS, which lets us use the proper
instructions, instead of having to fallback to LXADD.

Instead of doing this as an MI peephole (as we do for the other
non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite
tricky later.

This also makes it eventually possible to stop expanding and/or/xor
if the only user is an icmp (also see D18141).

This uses the LOCK ISD opcodes added by r262244.

Differential Revision: http://reviews.llvm.org/D17633

llvm-svn: 265636
2016-04-07 02:07:10 +00:00
Ahmed Bougacha 70bde5445b [X86] Refresh and tweak EFLAGS reuse tests. NFC.
The non-1 and EQ/NE tests were misguided.

llvm-svn: 265635
2016-04-07 02:06:53 +00:00
Hans Wennborg ab16be799c Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"
Third time's the charm? The previous attempt (r265345) caused ASan test
failures on X86, as broken CFI caused stack traces to not work.

This version of the patch makes sure not to merge with stack adjustments
that have CFI, and to not add merged instructions' offests to the CFI
about to be generated.

This is already covered by the lit tests; I just got the expectations
wrong previously.

llvm-svn: 265623
2016-04-07 00:05:49 +00:00
Ehsan Amiri 322eca3849 [PPC] Use VSX/FP Facility integer load when an integer load's only users are conversion to FP
http://reviews.llvm.org/D18405

When the loaded integer value is never used directly as an integer, we should use VSX
or Floating Point Facility integer loads and avoid an extra direct move.
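A sketch of the kind of code this targets (function name is illustrative): the loaded integer has no integer uses, so it can be loaded straight into a VSX/FP register and converted there, avoiding a GPR load plus a direct move.

```
define double @load_then_convert(i32* %p) {
  %i = load i32, i32* %p        ; only use of %i is the conversion below
  %f = sitofp i32 %i to double
  ret double %f
}
```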

llvm-svn: 265593
2016-04-06 20:12:29 +00:00
Nicolai Haehnle df3a20cd80 AMDGPU: Add a shader calling convention
This makes it possible to distinguish between mesa shaders
and other kernels even in the presence of compute shaders.

Patch By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Differential Revision: http://reviews.llvm.org/D18559

llvm-svn: 265589
2016-04-06 19:40:20 +00:00
Hans Wennborg 6849f8f15f Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC."
It caused ASan 32-bit tests to hang (PR27245).

llvm-svn: 265559
2016-04-06 16:44:38 +00:00
Hans Wennborg a7e396b5ef Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)""
It seems to be causing ASan tests to crash, probably due to
miscompiling the run-time somehow.

llvm-svn: 265551
2016-04-06 16:10:20 +00:00
Wei Mi 18293bef4e Recommit r265309 after fixing an invalid memory reference bug that happened
when the DenseMap grew and moved memory. I verified it fixed the bootstrap
problem on x86_64-linux-gnu but I cannot verify whether it fixes
the bootstrap error on clang-ppc64be-linux. I will watch the build-bot
result closely.

Replace analyzeSiblingValues with new algorithm to fix its compile
time issue. The patch is to solve PR17409 and its duplicates.

analyzeSiblingValues is an N x N complexity algorithm where N is
the number of siblings generated by reg splitting. Although it
causes significant compile time issues when N is large, it is also
important for performance since it removes redundant spills and
enables rematerialization.

To solve the compile time issue, the patch removes analyzeSiblingValues
and replaces it with lower cost alternatives containing two parts. The
first part creates a new spill hoisting method in postOptimization of
register allocation. It does spill hoisting at once after all the spills
are generated instead of inside every instance of selectOrSplit. The
second part queries the defining expr of the original register for
rematerialization and keeps it always available during register allocation
even if it is already dead. It deletes those dead instructions only in
postOptimization. With the two parts in the patch, it can remove
analyzeSiblingValues without sacrificing performance.

Differential Revision: http://reviews.llvm.org/D15302

llvm-svn: 265547
2016-04-06 15:41:07 +00:00
Chuang-Yu Cheng 6e1408a891 [ppc64] Temporarily disable sibling call optimization on ppc64 due to breaking test case
r265506 breaks the print-stack-trace.cc test case of compiler-rt in the bootstrap
test.

http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/1708

llvm-svn: 265528
2016-04-06 10:48:36 +00:00
Chuang-Yu Cheng 2e5973ef74 [ppc64] Enable sibling call optimization on ppc64 ELFv1/ELFv2 abi
This patch enables sibling call optimization on the ppc64 ELFv1/ELFv2 ABI, and
adds a couple of test cases. This patch also passed the llvm/clang bootstrap
test, and spec2006 build/run/result validation.

Original issue: https://llvm.org/bugs/show_bug.cgi?id=25617

Great thanks to Tom (tjablin) for his help; he contributed a lot to this patch.
Thanks to Hal and Kit for their invaluable opinions!

Reviewers: hfinkel kbarton

http://reviews.llvm.org/D16315

llvm-svn: 265506
2016-04-06 02:04:38 +00:00
Sanjoy Das 65a60670e8 Lower @llvm.experimental.deoptimize as a noreturn call
While preserving the return value for @llvm.experimental.deoptimize at
the IR level is useful during mid-level optimization, doing so at the
machine instruction level requires generating some extra code and a
return that is non-ideal.  This change has LLVM lower

```
  %val = call @llvm.experimental.deoptimize
  ret %val
```

to effectively

```
  call @__llvm_deoptimize()
  unreachable
```

instead.

llvm-svn: 265502
2016-04-06 01:33:49 +00:00