Commit Graph

39991 Commits

Author SHA1 Message Date
Jonas Paulsson 818431a61a [SystemZ] Fixes in SchedModels for older subtargets.
IssueWidth updated to reflect the capacity of the issue unit correctly.
Correct number of FX and LS units modelled (2, was 1).

Review: Ulrich Weigand
llvm-svn: 286109
2016-11-07 14:47:25 +00:00
James Molloy b03e0879fc [Thumb1] Move padding earlier when synthesizing TBBs off of the PC
When the base register (register pointing to the jump table) is the PC, we expect the jump table to directly follow the jump sequence with no intervening padding.

If there is intervening padding, the calculated offsets will not be correct. One solution would be to account for any padding in the emitted LDRB instruction, but at the moment we don't support emitting MCExprs for the load offset.

In the meantime, it's correct and only a slight amount worse to just move the padding up, from just before the jump table to just before the jump instruction sequence. We can do that by emitting code alignment before the jump sequence, as we know the number of instructions in the sequence is always 4.

llvm-svn: 286107
2016-11-07 13:38:21 +00:00
Dylan McKay c988b334b6 [AVR] Enable the ISel, frame analyzer, and alloca passes
llvm-svn: 286095
2016-11-07 06:02:55 +00:00
Craig Topper b110e04851 [AVX-512] Remove masked pmovzx/pmovsx builtins and autoupgrade them to selects and native zext/sext.
This mostly reuses earlier autoupgrade support for the sse and avx equivalents. Just needed to add the code to add the select.

llvm-svn: 286092
2016-11-07 02:12:57 +00:00
Craig Topper 7e545335d6 [AVX-512] Remove 128/256 masked pshufb intrinsics. Autoupgrade them to legacy intrinsics and a select.
llvm-svn: 286089
2016-11-07 00:13:39 +00:00
Krzysztof Parzyszek 39d14f3bc3 Reapply r286080 with a phony change in Hexagon's CMakeLists.txt
Cmake has not recognized that Hexagon.td has a new dependency in
HexagonPatterns.td. All changes to that file were not visible to
the build bots.

llvm-svn: 286084
2016-11-06 20:55:57 +00:00
Saleem Abdulrasool 804e12eeb5 ARM: lower fpowi appropriately for Windows ARM
This handles the last case of the builtin function calls that we would
generate code which differed from Microsoft's ABI.  Rather than
generating a call to `__pow{d,s}i2` we now promote the parameter to a
float or double and invoke `powf` or `pow` instead.

Addresses PR30825!

llvm-svn: 286082
2016-11-06 19:46:54 +00:00
Krzysztof Parzyszek f8d38d11b9 Revert r286080: it breaks build bots
llvm-svn: 286081
2016-11-06 19:36:09 +00:00
Krzysztof Parzyszek 9e3520c884 [Hexagon] Remove redundant custom selection code
The clr/set/toggle-bit instructions (with the bit index given as an
immediate operand) had both, custom selection code that generated them,
and selection patterns at the same time. The selection patterns were
not used, because the custom selection code was executed first.
This patch removes the custom code in favor of the selection patterns.
The custom code handled 64-bit registers as well with an immediate bit
index, and so new patterns were added to implement that.

It was also the same case for the instruction "Rd += asr(Rs, Rt)",
except that the custom code did not offer any additional functionality,
and was simply removed.

llvm-svn: 286080
2016-11-06 19:03:38 +00:00
Krzysztof Parzyszek c93815ef04 [Hexagon] Round 5 of selection pattern simplifications
Remove unnecessary type casts in patterns.

llvm-svn: 286079
2016-11-06 18:13:14 +00:00
Krzysztof Parzyszek f914278f8b [Hexagon] Round 4 of selection pattern simplifications
Give simpler or more meaningful names to pat frags and xforms.

llvm-svn: 286078
2016-11-06 18:09:56 +00:00
Krzysztof Parzyszek 846597d081 [Hexagon] Round 3 of selection pattern simplifications
Remove unnecessary C++ functions for SDNode transforms. Move more
pat frags to files where they are used.

llvm-svn: 286077
2016-11-06 18:05:14 +00:00
Krzysztof Parzyszek 84755104b4 [Hexagon] Round 2 of selection pattern simplifications
Add pat frags for any-, sign-, and zero-extensions.

llvm-svn: 286076
2016-11-06 17:56:48 +00:00
Craig Topper 46de41330c [AVX-512] Remove intrinsics for 128/256-bit masked variable shift. Instead upgrade them to a select and the older AVX2 intrinsic.
llvm-svn: 286073
2016-11-06 16:29:19 +00:00
Craig Topper af9b3fe752 [AVX-512] Remove intrinsics for 128/256-bit masked shift by immediate. Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
llvm-svn: 286072
2016-11-06 16:29:14 +00:00
Craig Topper c9467ed31e [AVX-512] Remove intrinsics for 128/256-bit masked shift by single element in xmm. Instead upgrade them to a select and the older SSE/AVX2 intrinsic.
llvm-svn: 286070
2016-11-06 16:29:08 +00:00
Simon Pilgrim b3ad5f7ebf [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsElementInsertion. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286067
2016-11-06 14:20:29 +00:00
Craig Topper 5471fc29e4 [AVX-512] Add missing EVEX version of pattern for (v2f64 (extloadv2f32 addr:)) -> VCVTPS2PDZ128rm
llvm-svn: 286059
2016-11-06 04:12:52 +00:00
Craig Topper 1162857ec4 [AVX-512] Lower AVX cvtpd2ps intrinsic to ISD::FP_ROUND so it can use EVEX instruction when available.
llvm-svn: 286057
2016-11-06 04:12:46 +00:00
Craig Topper 9a4a3af5dd [AVX-512] Lower SSE/AVX cvtdq2ps intrinsics directly to ISD::SINT_TO_FP so they can use EVEX instructions when available.
llvm-svn: 286056
2016-11-06 04:12:42 +00:00
Krzysztof Parzyszek 2839b29f4b [Hexagon] Relocate pattern-related bits to proper places
llvm-svn: 286049
2016-11-05 21:44:50 +00:00
Krzysztof Parzyszek 4b4012a5c9 [Hexagon] Round 1 of selection pattern simplifications
Consistently use register class pat frags instead of spelling out
the type and class each time.

llvm-svn: 286048
2016-11-05 21:02:54 +00:00
Simon Pilgrim 4a9f210412 [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBlend. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286045
2016-11-05 18:31:57 +00:00
Simon Pilgrim 725174694a [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsZeroOrAnyExtend. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286044
2016-11-05 18:22:13 +00:00
Simon Pilgrim 9f0afc6ae1 [X86][SSE] Reuse zeroable element mask in SSE4A EXTRQ/INSERTQ vector shuffle lowering. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286043
2016-11-05 18:05:13 +00:00
Simon Pilgrim 3cae21960e [X86][SSE] Reuse zeroable element mask in PSHUFB vector shuffle lowering. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286042
2016-11-05 17:53:27 +00:00
Simon Pilgrim 64a592d0a2 [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsInsertPS. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286040
2016-11-05 17:27:48 +00:00
Simon Pilgrim 009befbd88 [X86][SSE] Reuse zeroable element mask in lowerVectorShuffleAsBitMask. NFCI
Don't regenerate a zeroable element mask with computeZeroableShuffleElements when its already available.

llvm-svn: 286039
2016-11-05 17:12:19 +00:00
Simon Pilgrim 1af0fc1103 [X86][SSE] Reuse zeroable element mask instead of regenerating it. NFCI
We are repeatedly calling computeZeroableShuffleElements in many shuffle lowering calls for the same shuffle mask/inputs.

This is a first step towards reusing the zeroable result, initially just for lowerVectorShuffleAsShift calls.

llvm-svn: 286037
2016-11-05 16:40:20 +00:00
Krzysztof Parzyszek a8d63dc289 [Hexagon] Split all selection patterns into a separate file
This is just the basic separation, without any cleanup. Further changes
will follow.

llvm-svn: 286036
2016-11-05 15:01:38 +00:00
Simon Pilgrim 1b4e1ac966 Strip trailing whitespace. NFCI.
llvm-svn: 286034
2016-11-05 14:43:04 +00:00
Krzysztof Parzyszek b7eb7fc892 [Hexagon] Account for <def,read-undef> when validating moves for predication
llvm-svn: 286009
2016-11-04 20:41:03 +00:00
Zvi Rackover 85bc64c734 [X86] Broadcast from memory intructions aren't unfoldable
Broadcast from memory instructions should be treated as moves. They can't be unfolded.

Fixes pr30693.

llvm-svn: 285998
2016-11-04 15:15:19 +00:00
Tom Stellard 2d2d33f1dc Revert "AMDGPU: Add VI i16 support"
This reverts commit r285939 and r285948.  These broke some conformance tests.

llvm-svn: 285995
2016-11-04 13:06:34 +00:00
Justin Bogner 2c2c6ac7b5 X86: Move a non-null assert to before the pointer is dereferenced
llvm-svn: 285975
2016-11-03 23:55:36 +00:00
Chandler Carruth 651f019297 Sink all of the code relying on the MachO MachineModuleInfo to live
behind the test that the MachineModuleInfo analysis was
actually available and can be used.

While the MachO bits may well be reasonable to assume in the darwin
assembly printer, the analysis isn't constructively guaranteed anywhere
I could find so it seems safest to avoid crashing here.

This issue was found with PVS-Studio. Pretty sure the Clang Static
Anaylzer flags similar issues but we've probably never pointed it at
this code effectively.

llvm-svn: 285972
2016-11-03 23:33:46 +00:00
Weiming Zhao 962eaaea9c [Cortex-M0] Atomic lowering
Summary: ARMv6m supports dmb etc fench instructions but not ldrex/strex etc. So for some atomic load/store, LLVM should inline instructions instead of lowering to __sync_ calls.

Reviewers: rengolin, efriedma, t.p.northover, jmolloy

Subscribers: efriedma, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D26120

llvm-svn: 285969
2016-11-03 21:49:08 +00:00
Tony Jiang 946242b5d2 NFC - Test commit.
Delete an empty line at the end of README.txt file.

llvm-svn: 285964
2016-11-03 20:32:21 +00:00
Tom Stellard cc34983181 AMDGPU/SI: Re add VIInstructions.td to unbreak bots
This file is unused as of r285939, but we need to keep it around
for bots that don't do full rebuilds.   We should be able to delete this
again in a few days.

llvm-svn: 285948
2016-11-03 17:56:46 +00:00
Chandler Carruth 5589aa60c7 Remove a redundant condition found by PVS-Studio.
Filed http://llvm.org/PR30897 to teach Clang to warn on this kind of
stuff.

llvm-svn: 285945
2016-11-03 17:42:02 +00:00
Tom Stellard 2b3379cdff AMDGPU: Add VI i16 support
Patch By: Wei Ding

Differential Revision: https://reviews.llvm.org/D18049

llvm-svn: 285939
2016-11-03 17:13:50 +00:00
Chandler Carruth 30e0029904 Delete a dead store found by PVS-Studio.
Quite sad we still aren't really using aggressive dead code warnings
from Clang that we could potentially use to catch this and so many other
things.

llvm-svn: 285936
2016-11-03 17:01:38 +00:00
Alexander Timofeev f867a40bf6 [AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads.
hange explores the fact that LDS reads may be reordered even if access
the same location.

Prior the change, algorithm immediately stops as soon as any memory
access encountered between loads that are expected to be merged
together. Although, Read-After-Read conflict cannot affect execution
correctness.

Improves hcBLAS CGEMM manually loop-unrolled kernels performance by 44%.
Also improvement expected on any massive sequences of reads from LDS.

Differential Revision: https://reviews.llvm.org/D25944

llvm-svn: 285919
2016-11-03 14:37:13 +00:00
Zvi Rackover a455864fdf Refactor creation of X86ISD::SETCC nodes to a helper function. NFC.
llvm-svn: 285917
2016-11-03 14:25:24 +00:00
James Molloy e7d97368f2 Revert "[Thumb] Teach ISel how to lower compares of AND bitmasks efficiently"
This reverts commit r285893. It caused (probably) http://lab.llvm.org:8011/builders/clang-cmake-thumbv7-a15-full-sh/builds/83 .

llvm-svn: 285912
2016-11-03 14:08:01 +00:00
James Molloy b60d8b1987 [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently
This recommits r281323, which was backed out for two reasons. One, a selfhost failure, and two, it apparently caused Chromium failures. Actually, the latter was a red herring. The log has expired from the former, but I suspect that was a red herring too (actually caused by another problematic patch of mine). Therefore reapplying, and will watch the bots like a hawk.

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do some more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)).

1) If the bitmask touches the LSB, then we can remove all the upper bits and set the flags by doing one LSLS.
2) If the bitmask touches the MSB, then we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

1-3 require only one 16-bit instruction and can elide the CMP. 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so is also a win.

llvm-svn: 285893
2016-11-03 10:18:20 +00:00
Craig Topper 7b9cc1474f [AVX-512] Use 'vnot' instead of 'not' in patterns involving vXi1 vectors.
This fixes selection of KANDN instructions and allows us to remove an extra set of patterns for KNOT and KXNOR.

Reviewers: delena, igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26134

llvm-svn: 285878
2016-11-03 06:04:28 +00:00
Elena Demikhovsky caaceef4b3 Expandload and Compressstore intrinsics
2 new intrinsics covering AVX-512 compress/expand functionality.
This implementation includes syntax, DAG builder, operation lowering and tests.
Does not include: handling of illegal data types, codegen prepare pass and the cost model.

llvm-svn: 285876
2016-11-03 03:23:55 +00:00
Krzysztof Parzyszek ead77016d8 [Hexagon] Remove registers coalesced in expand-condsets from live intervals
llvm-svn: 285846
2016-11-02 17:59:54 +00:00
Nicolai Haehnle 368972c3b3 AMDGPU: Allow additional implicit operands on MOVRELS instructions
Summary:
The post-RA scheduler occasionally uses additional implicit operands when
the vector implicit operand as a whole is killed, but some subregisters
are still live because they are directly referenced later. Unfortunately,
this seems incredibly subtle to reproduce.

Fixes piglit spec/glsl-110/execution/variable-indexing/vs-temp-array-mat2-index-wr.shader_test
and others.

Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D25656

llvm-svn: 285835
2016-11-02 17:03:11 +00:00