Commit Graph

16249 Commits

Author SHA1 Message Date
Kevin B. Smith c3c82cdbd0 [X86]: Improve Liveness checking for X86FixupBWInsts.cpp
Differential Revision: http://reviews.llvm.org/D21085

llvm-svn: 272797
2016-06-15 16:03:06 +00:00
Ranjeet Singh 0db7be886e Reverting r272778 because there's an assertion
failure when running the test CodeGen/ARM/intrinsics-coprocessor.ll

llvm-svn: 272791
2016-06-15 14:23:29 +00:00
Simon Dardis 7bdf183ac1 [mips] Missing test case
Add missing testcase from r272666.

llvm-svn: 272784
2016-06-15 13:49:58 +00:00
Ranjeet Singh 351364fe76 [ARM] Add support for mrrc/mrrc2 intrinsics.
Differential Revision: http://reviews.llvm.org/D21178

llvm-svn: 272778
2016-06-15 11:32:24 +00:00
Daniel Sanders df3185d2ea [mips] Removed invalid test from o32_cc.ll
MIPS32R1 cannot implement a 64-bit FPU because this was introduced in MIPS32R2.

llvm-svn: 272769
2016-06-15 09:47:27 +00:00
Daniel Sanders d3bb20821d [mips][msa] Fix register/register-class mismatches in emitINSERT_DF_VIDX().
Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21068

llvm-svn: 272765
2016-06-15 08:43:23 +00:00
Zlatko Buljan d2ed9c6c2c [mips][microMIPS] Add CodeGen support for AND*, OR16, OR*, XOR*, NOT16 and NOR instructions
Differential Revision: http://reviews.llvm.org/D16719

llvm-svn: 272764
2016-06-15 07:46:24 +00:00
Igor Breger 64cfd3a442 [AVX512] Fix BLENDM lowering patterns. Operands should be swapped to match SELECT behavior.
Use BLENDM instead of masked move instruction.

Differential Revision: http://reviews.llvm.org/D21001

llvm-svn: 272763
2016-06-15 07:30:38 +00:00
Nicolai Haehnle a609259832 AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics
Summary:
This fixes two related bugs. First, the generic optimization passes
unfortunately generate negative constant offsets but the hardware treats
SOffset as an unsigned value.

Second, there is a hardware bug on SI and CI, where address clamping in MUBUF
instructions does not work correctly when SOffset is larger than the buffer
size. This patch works around this bug by never using SOffset.

An alternative workaround would be to do the clamping manually when SOffset
is too large, but generating the required code sequence during instruction
selection would be rather involved, and in any case the resulting code would
probably be worse.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360

Reviewers: arsenm, tstellarAMD

Subscribers: arsenm, llvm-commits, kzhuravl

Differential Revision: http://reviews.llvm.org/D21326

llvm-svn: 272761
2016-06-15 07:13:05 +00:00
Sanjoy Das 0272be206a Don't force SP-relative addressing for statepoints
Summary:
...  when the offset is not statically known.

Prioritize addresses relative to the stack pointer in the stackmap, but
fallback gracefully to other modes of addressing if the offset to the
stack pointer is not a known constant.

Patch by Oscar Blumberg!

Reviewers: sanjoy

Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm

Differential Revision: http://reviews.llvm.org/D21259

llvm-svn: 272756
2016-06-15 05:35:14 +00:00
David Majnemer cbf614a93b Remove the ScalarReplAggregates pass
Nearly all the changes to this pass have been done while maintaining and
updating other parts of LLVM.  LLVM has had another pass, SROA, which
has superseded ScalarReplAggregates for quite some time.

Differential Revision: http://reviews.llvm.org/D21316

llvm-svn: 272737
2016-06-15 00:19:09 +00:00
Matt Arsenault f42c69206d AMDGPU: Run pointer optimization passes
llvm-svn: 272736
2016-06-15 00:11:01 +00:00
Xinliang David Li 8052238ac0 Fix a test case to match its intention
llvm-svn: 272733
2016-06-14 23:05:46 +00:00
Dehao Chen 9f2bdfb40f Set machine block placement hot prob threshold for both static and runtime profile.
Summary: With runtime profile, we have more confidence in branch probability, thus during basic block layout, we set a lower hot prob threshold so that blocks can be layouted optimally.

Reviewers: djasper, davidxl

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D20991

llvm-svn: 272729
2016-06-14 22:27:17 +00:00
Sanjay Patel 4c3cb8b6c0 [x86] add current codegen tests for PR27924
llvm-svn: 272714
2016-06-14 21:25:46 +00:00
Peter Collingbourne 96efdd6107 IR: Introduce local_unnamed_addr attribute.
If a local_unnamed_addr attribute is attached to a global, the address
is known to be insignificant within the module. It is distinct from the
existing unnamed_addr attribute in that it only describes a local property
of the module rather than a global property of the symbol.

This attribute is intended to be used by the code generator and LTO to allow
the linker to decide whether the global needs to be in the symbol table. It is
possible to exclude a global from the symbol table if three things are true:
- This attribute is present on every instance of the global (which means that
  the normal rule that the global must have a unique address can be broken without
  being observable by the program by performing comparisons against the global's
  address)
- The global has linkonce_odr linkage (which means that each linkage unit must have
  its own copy of the global if it requires one, and the copy in each linkage unit
  must be the same)
- It is a constant or a function (which means that the program cannot observe that
  the unique-address rule has been broken by writing to the global)

Although this attribute could in principle be computed from the module
contents, LTO clients (i.e. linkers) will normally need to be able to compute
this property as part of symbol resolution, and it would be inefficient to
materialize every module just to compute it.

See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html
for earlier discussion.

Part of the fix for PR27553.

Differential Revision: http://reviews.llvm.org/D20348

llvm-svn: 272709
2016-06-14 21:01:22 +00:00
Wei Mi b799a625f9 [X86] Reduce the width of multiplification when its operands are extended from i8 or i16
For <N x i32> type mul, pmuludq will be used for targets without SSE41, which
often introduces many extra pack and unpack instructions in vectorized loop
body because pmuludq generates <N/2 x i64> type value. However when the operands
of <N x i32> mul are extended from smaller size values like i8 and i16, the type
of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which
generates better code. For targets with SSE41, pmulld is supported so no
shrinking is needed.

Differential Revision: http://reviews.llvm.org/D20931

llvm-svn: 272694
2016-06-14 18:53:20 +00:00
Nirav Dave f8d00d5cac Fix BSS global handling in AsmPrinter
Change EmitGlobalVariable to check final assembler section is in BSS
before using .lcomm/.comm directive. This prevents globals from being
put into .bss erroneously when -data-sections is used.

This fixes PR26570.

Reviewers: echristo, rafael

Subscribers: llvm-commits, mehdi_amini

Differential Revision: http://reviews.llvm.org/D21146

llvm-svn: 272674
2016-06-14 15:09:30 +00:00
Simon Dardis 878c0b1b76 [mips] Optimize stack pointer adjustments.
Instead of always using addu to adjust the stack pointer when the
size out is of the range of an addiu instruction, use subu so that
a smaller constant can be generated.

This can give savings of ~3 instructions whenever a function has a
a stack frame whose size is out of range of an addiu instruction.

This change may break some naive stack unwinders.

Partially resolves PR/26291.

Thanks to David Chisnall for reporting the issue.

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D21321

llvm-svn: 272666
2016-06-14 13:39:43 +00:00
James Molloy 65b6be1d3a [Thumb] Fix off-by-one error in r272007
We can only generate immediates up to #510 with a MOV+ADD, not #511, because there's no such instruction as add #256.

Found by Oliver Stannard and csmith!

llvm-svn: 272665
2016-06-14 13:33:07 +00:00
Simon Dardis 4fbf76f7c3 [mips][atomics] Fix atomic instruction descriptions and uses.
PR27458 highlights that the MIPS backend does not have well formed
MIR for atomic operations (among other errors).

This patch adds expands and corrects the LL/SC descriptions and uses
for MIPS(64).

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D19719

llvm-svn: 272655
2016-06-14 11:29:28 +00:00
Simon Pilgrim cf1165b86e [X86][SSE4A] Added patterns for nontemporal stores of scalar float/doubles using MOVNTSD/MOVNTSS
llvm-svn: 272651
2016-06-14 09:43:38 +00:00
Simon Dardis e661e528db [mips] MIPS32/64 itineraries
Itineraries for some pre MIPSR6 and EVA instructions. Some pseudo expanded
instructions are marked as having no scheduling info.

Reviewers: dsanders, vkalintiris

Differential Review: http://reviews.llvm.org/D20418

llvm-svn: 272648
2016-06-14 09:35:29 +00:00
Daniel Sanders 435a653437 [mips][dsp] Fix use without def on DSPCtrl registers read by rddsp intrinsic.
Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21063

llvm-svn: 272647
2016-06-14 09:29:46 +00:00
Daniel Sanders d2a49ec3ab [mips][msa] copyPhysReg() should not set RegState::Define on result of CTCMSA.
Summary:
The machine verifier reports 'Explicit operand marked as def' when it is
manually specified even though it agrees with the operand info.

Reviewers: sdardis

Subscribers: dsanders, sdardis, llvm-commits

Differential Revision: http://reviews.llvm.org/D21065

llvm-svn: 272646
2016-06-14 09:11:33 +00:00
Diana Picus bae1d89e45 [SelectionDAG] Remove exit-on-error flag from test (PR27765)
The exit-on-error flag in the ARM test is necessary in order to avoid an
unreachable in the DAGTypeLegalizer, when trying to expand a physical register.
We can also avoid this situation by introducing a bitcast early on, where the
invalid scalar-to-vector conversion is detected.

We also add a test for PowerPC, which goes through a similar code path in the
SelectionDAGBuilder.

Fixes PR27765.

Differential Revision: http://reviews.llvm.org/D21061

llvm-svn: 272644
2016-06-14 07:30:20 +00:00
Igor Breger 484bace21b re-generate the tests using the update_llc_test_checks.py script
llvm-svn: 272643
2016-06-14 07:05:10 +00:00
Craig Topper 99e30e6a66 [AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR.
llvm-svn: 272626
2016-06-14 03:13:00 +00:00
Craig Topper ddab395397 [AVX512] Add patterns for zero-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX.
llvm-svn: 272625
2016-06-14 03:12:54 +00:00
Craig Topper cbe54a4bd9 [AVX512] Add tests for zero extending masks that show an unnecessary movzx instruction. A followup patch will remove that instruction, but adding the tests first to make the more obvious.
llvm-svn: 272624
2016-06-14 03:12:48 +00:00
Sanjoy Das 98ac278b86 Move previously added test case to the right location
In rL272580 I accidentally added a test case to test/CodeGen when
test/Transforms/DeadStoreElimination/ is a better place for it.

llvm-svn: 272581
2016-06-13 20:12:07 +00:00
Sanjoy Das d0bdf3e02b Fix AAResults::callCapturesBefore for operand bundles
Summary:
AAResults::callCapturesBefore would previously ignore operand
bundles. It was possible for a later instruction to miss its memory
dependency on a call site that would only access the pointer through a
bundle.

Patch by Oscar Blumberg!

Reviewers: sanjoy

Differential Revision: http://reviews.llvm.org/D21286

llvm-svn: 272580
2016-06-13 19:55:04 +00:00
Simon Pilgrim 582b9ce36e [X86][SSE] Added extract to scalar nontemporal store tests
llvm-svn: 272577
2016-06-13 19:08:28 +00:00
David Majnemer 248190ba69 [X86] Remove llvm.x86.bit.scan.{forward,reverse}.32
The need for these intrinsics has been obviated by r272564 which
reimplements their functionality using generic IR.

llvm-svn: 272566
2016-06-13 17:33:13 +00:00
Marek Olsak e93f6d6923 AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing
Summary:
Mesa and other users must set this to enable coalescing:
- STRIDE = 0
- SWIZZLE_ENABLE = 1

This makes one particular compute shader 8x faster.

Reviewers: tstellarAMD, arsenm

Subscribers: arsenm, kzhuravl

Differential Revision: http://reviews.llvm.org/D21136

llvm-svn: 272556
2016-06-13 16:05:57 +00:00
Ulrich Weigand daae87aa21 [SystemZ] Enable index register memory constraints for inline ASM
This enables use of the 'R' and 'T' memory constraints for inline ASM
operands on SystemZ, which allow an index register as well as an
immediate displacement. This patch includes corresponding documentation
and test case updates.

As with the last patch of this kind, I moved the 'm' constraint to the
most general case, which is now 'T' (base + 20-bit signed displacement +
index register).

Author: colpell
Differential Revision: http://reviews.llvm.org/D21239

llvm-svn: 272547
2016-06-13 14:24:05 +00:00
Ranjeet Singh 933e1aa39f [ARM] Reverting r272544 because clang patch needs
to go in as soon as llvm patch has gone in because
tests will start breaking in Clang.

llvm-svn: 272546
2016-06-13 10:58:24 +00:00
Ranjeet Singh 8feacb330d [ARM] Add mrrc/mrrc2 co-processor intrinsics
MRRC/MRRC2 instruction writes to two registers. The
intrinsic definition returns a single uint64_t to
represent the write, this is a compact way of
representing a write to two 32 bit registers,
the alternative might have been two return a
struct of 2 uint32_t's but this isn't as nice.

Differential Revision: 

llvm-svn: 272544
2016-06-13 10:43:50 +00:00
Strahinja Petrovic f0980e4dc0 This patch fixes handling long double type when it is
constant in soft float mode on PowerPC 32 architecture.

llvm-svn: 272543
2016-06-13 10:29:29 +00:00
Simon Pilgrim 377bc2ea43 [X86][SSE4A] Renamed tests to correspond with the the instruction with being tested
llvm-svn: 272542
2016-06-13 10:14:42 +00:00
Craig Topper 13cf7cac07 [AVX512] Remove maksed pshufd, pshuflw, and phufhw intrinsics and autoupgrade them to selects and shufflevector.
llvm-svn: 272527
2016-06-13 02:36:48 +00:00
Sanjay Patel 977530a8c9 [x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044)
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044

By changing the definition of X86ISD::CMPP to use float types, we allow it to be created 
and pass legalization for an SSE1-only target where v4i32 is not legal.

The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001

and eventually makes this trigger:
http://reviews.llvm.org/D21190

Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM 
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.

Differential Revision: http://reviews.llvm.org/D21235

llvm-svn: 272511
2016-06-12 15:03:25 +00:00
Craig Topper 1067986c5b [X86] Remove sse2 pshufd/pshuflw/pshufhw intrinsics and upgrade them to shufflevector.
llvm-svn: 272510
2016-06-12 14:11:32 +00:00
Simon Pilgrim 9d8bed1796 [X86][BMI] Added fast-isel tests for BMI1 intrinsics
A lot of the codegen is pretty awful for these as they are mostly implemented as generic bit twiddling ops 

llvm-svn: 272508
2016-06-12 09:56:05 +00:00
Craig Topper b7713e413b [X86] Move tests for llvm.x86.avx.vpermil.* intrinsics to a -upgrade test since they are autoupgraded to shufflevector.
llvm-svn: 272494
2016-06-12 01:41:06 +00:00
Simon Pilgrim 2b7c02a04f [X86] Updated test checks script to generalise LCPI symbol refs
The script now replace '.LCPI888_8' style asm symbols with the {{\.LCPI.*}} re pattern - this helps stop hardcoded symbols in 32-bit x86 tests changing with every edit of the file

Refreshed some tests to demonstrate the new check

llvm-svn: 272488
2016-06-11 20:39:21 +00:00
Simon Pilgrim 5b9bade8dd [X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE
PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach.

llvm-svn: 272477
2016-06-11 15:44:13 +00:00
Craig Topper 46f49fb407 [AVX512] Re-generate v8i64 shuffle test now that we use pshufd for some cases.
llvm-svn: 272474
2016-06-11 13:57:08 +00:00
Craig Topper 504fba5c8a [AVX512] Lower v8i64 and v16i32 to pshufd when possible.
llvm-svn: 272473
2016-06-11 13:43:21 +00:00
Simon Pilgrim 6800a45790 [X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type
Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining

llvm-svn: 272471
2016-06-11 13:38:28 +00:00