Commit Graph

41442 Commits

Dan Gohman 18eafb6c68 [WebAssembly] Add skeleton MC support for the Wasm container format
This just adds the basic skeleton for supporting a new object file format.
All of the actual encoding will be implemented in followup patches.

Differential Revision: https://reviews.llvm.org/D26722

llvm-svn: 295803
2017-02-22 01:23:18 +00:00
Matt Arsenault 1f17c66890 AMDGPU: Add cvt.pkrtz intrinsic
Convert llvm.SI.packf16 test uses

llvm-svn: 295797
2017-02-22 00:27:34 +00:00
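
For background, cvt.pkrtz packs two f32 values into two f16 values using round-toward-zero. A minimal host-side model of those semantics in C++; the x86 F16C intrinsics here are only an illustrative stand-in, nothing below is from the patch:

    #include <cstdint>
    #include <immintrin.h>

    // Model of pack-with-round-toward-zero: convert each f32 to f16
    // with RTZ, then pack both 16-bit results into one 32-bit value.
    uint32_t cvt_pkrtz(float a, float b) {
      uint16_t lo = _cvtss_sh(a, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
      uint16_t hi = _cvtss_sh(b, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
      return uint32_t(lo) | (uint32_t(hi) << 16);
    }
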
Matt Arsenault 9417505f7d AMDGPU: Remove llvm.AMDGPU.clamp intrinsic
llvm-svn: 295789
2017-02-21 23:46:04 +00:00
Matt Arsenault 2fdf2a1a18 AMDGPU: Redefine clamp node as clamp 0.0-1.0
Change the implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use them whether or not denormals are enabled.

Also allow using clamp with f16, and use knowledge
of dx10_clamp.

llvm-svn: 295788
2017-02-21 23:35:48 +00:00
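
A scalar model of the redefined node's semantics, ignoring the NaN and dx10_clamp details that the patch handles separately:

    #include <algorithm>

    // Clamp x to [0.0, 1.0] built from min/max rather than from an add;
    // per the commit, min/max/med3 never flush denormals regardless of
    // the FP mode, so this form is denormal-safe either way.
    float clamp01(float x) {
      return std::min(std::max(x, 0.0f), 1.0f);
    }
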
Artem Belevich 29bbdc1c32 [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.
The original code only used vector loads/stores for explicit vector
arguments. It could also do more loads/stores than necessary (e.g.
v5f32 would touch 8 f32 values). Aggregate types were loaded one
element at a time, even the vectors contained within them.

This change attempts to generalize (and simplify) parameter space
loads/stores so that vector loads/stores can be used more broadly.
The functionality of the patch has been verified by compiling the
Thrust test suite and manually checking the differences between the
PTX generated by LLVM with and without the patch.

General algorithm:
* ComputePTXValueVTs() flattens the input/output arguments into a list
  of scalars to load/store and returns their types and offsets.
* VectorizePTXValueVTs() uses that data to create a vectorization plan,
  returned as an array of flags marking the boundaries of vectorized
  loads/stores (sketched after this entry). Scalars are represented
  as 1-element vectors.
* Code that generates loads/stores implements a simple state machine
  that constructs a vector according to the plan.

Differential Revision: https://reviews.llvm.org/D30011

llvm-svn: 295784
2017-02-21 22:56:05 +00:00
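
A simplified model of the planning step described above; the names and the power-of-two grouping heuristic are assumptions for illustration, not the actual NVPTX code:

    #include <cstddef>
    #include <vector>

    enum Flag { Scalar, VecStart2, VecStart4 };

    // Given the byte offsets of the flattened scalars and a uniform
    // element size, mark where each vectorized load/store begins.
    // Ungrouped scalars are effectively 1-element vectors.
    std::vector<Flag> makePlan(const std::vector<size_t> &Offsets,
                               size_t EltSize) {
      std::vector<Flag> Plan(Offsets.size(), Scalar);
      size_t i = 0;
      while (i < Offsets.size()) {
        auto contiguous = [&](size_t N) {
          if (i + N > Offsets.size() || Offsets[i] % (N * EltSize) != 0)
            return false;               // require natural alignment
          for (size_t K = 1; K < N; ++K)
            if (Offsets[i + K] != Offsets[i] + K * EltSize)
              return false;             // require adjacency
          return true;
        };
        if (contiguous(4))      { Plan[i] = VecStart4; i += 4; }
        else if (contiguous(2)) { Plan[i] = VecStart2; i += 2; }
        else                    { i += 1; }
      }
      return Plan;
    }
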
Matt Arsenault 7d6b71db4f AMDGPU: Formatting fixes
llvm-svn: 295783
2017-02-21 22:50:41 +00:00
Evandro Menezes a8d3301ee1 [AArch64, X86] Add statistics for the MacroFusion pass
llvm-svn: 295777
2017-02-21 22:16:13 +00:00
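
For reference, this is the usual shape of pass statistics in LLVM; the counter name and DEBUG_TYPE below are illustrative, not taken from the patch:

    #include "llvm/ADT/Statistic.h"

    #define DEBUG_TYPE "misched-fusion"

    // Defines a counter that is printed at exit when -stats is passed.
    STATISTIC(NumFused, "Number of instruction pairs fused");

    // ...then the fusion code path simply does:
    //   ++NumFused;
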
Evandro Menezes b9b7f4b8d3 [AArch64, X86] Guard against both instrs being wild cards
If both instructions are wildcards, the result can be a crash.

llvm-svn: 295776
2017-02-21 22:16:11 +00:00
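
A hedged sketch of the kind of guard described, with a hypothetical signature: a null MachineInstr* acts as a wildcard, so a pair of wildcards has nothing concrete to match on and must be rejected rather than dereferenced:

    namespace llvm { class MachineInstr; }
    using llvm::MachineInstr;

    static bool shouldFusePair(const MachineInstr *FirstMI,
                               const MachineInstr *SecondMI) {
      if (!FirstMI && !SecondMI)
        return false;  // both wildcards: bail out instead of crashing
      // ...target-specific opcode checks on the non-null instruction(s)...
      return true;
    }
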
Evgeniy Stepanov 1fd19c6e5d Fix PR31896.
The address of an alias of a global with an offset was incorrectly lowered as the address of the global itself (i.e. ignoring the offset).

llvm-svn: 295762
2017-02-21 20:17:34 +00:00
Matt Arsenault c2a44e4c3c AMDGPU: Remove llvm.AMDGPU.flbit intrinsic
llvm-svn: 295754
2017-02-21 19:27:33 +00:00
Matt Arsenault e0bf7d02f0 AMDGPU: Don't use stack space for SGPR->VGPR spills
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
afterwards.

I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.

The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.

llvm-svn: 295753
2017-02-21 19:12:08 +00:00
Geoff Berry 5d534b6a11 [CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64.  The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact that
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used.  One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

llvm-svn: 295746
2017-02-21 18:53:14 +00:00
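
A sketch of the selection criterion described above, using LLVM's C++ API; this is illustrative, not the actual CodeGenPrepare code:

    #include "llvm/IR/Instructions.h"
    #include "llvm/IR/PatternMatch.h"
    using namespace llvm;

    // An 'and' is a sinking/duplication candidate only if every user
    // is an equality icmp against zero.
    static bool allUsersAreZeroEqualityICmps(const Instruction *AndI) {
      for (const User *U : AndI->users()) {
        const auto *Cmp = dyn_cast<ICmpInst>(U);
        if (!Cmp || !Cmp->isEquality() ||
            !PatternMatch::match(Cmp->getOperand(1),
                                 PatternMatch::m_Zero()))
          return false;
      }
      return true;
    }
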
Simon Pilgrim 8eb515d8c4 [X86] EltsFromConsecutiveLoads SDLoc argument should be const&.
The reference appears never to have been updated, so it can safely be const.

llvm-svn: 295739
2017-02-21 17:42:28 +00:00
Simon Pilgrim 791955819c [X86][AVX2] Fix VPBROADCASTQ folding on 32-bit targets.
As i64 isn't a value type on 32-bit targets, we need to fold the VZEXT_LOAD into VPBROADCASTQ.

llvm-svn: 295733
2017-02-21 16:41:44 +00:00
John Brawn a6e95e1652 [ARM] Correct SP/PC handling in t2MOVr
PC isn't allowed in the source operand of t2MOVr, so change the register class
to one without PC. SP handling is slightly trickier and changes depending on if
we're in ARMv8, so do that in checkTargetMatchPredicate.

Differential Revision: https://reviews.llvm.org/D30199

llvm-svn: 295732
2017-02-21 16:41:29 +00:00
Simon Pilgrim 3546156122 [X86][SSE] Prefer to combine shuffles to VZEXT over VZEXT_MOVL.
This matches what is already done during shuffle lowering and helps prevent the need for a zero-vector in cases where shuffles match both patterns.

llvm-svn: 295723
2017-02-21 15:09:00 +00:00
Igor Breger 812f319794 [AVX512] Fix EXTRACT_VECTOR_ELT for v2i1/v4i1/v32i1/v64i1 with variable index.
Differential Revision: https://reviews.llvm.org/D30189

llvm-svn: 295718
2017-02-21 14:01:25 +00:00
Diana Picus 613b65696a [ARM] GlobalISel: Lower calls to void() functions
For now, we hardcode a BLX instruction, and generate an ADJCALLSTACKDOWN/UP pair
with amount 0.

llvm-svn: 295716
2017-02-21 11:33:59 +00:00
Craig Topper d88389aa7e [X86] Use SHLD with both inputs from the same register to implement rotate on Sandy Bridge and later Intel CPUs
Summary:
Sandy Bridge and later CPUs have better throughput when using SHLD to implement a rotate than when using the normal rotate instructions. Additionally, it saves one uop and avoids a partial flag update dependency.

This patch implements this change on any Sandy Bridge or later processor without BMI2 instructions. With BMI2 we will use RORX as we currently do.

Reviewers: zvi

Reviewed By: zvi

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30181

llvm-svn: 295697
2017-02-21 06:39:13 +00:00
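
The identity the patch relies on, sketched in plain C++: SHLD dst, src, imm computes (dst << imm) | (src >> (width - imm)), so with both inputs being the same register it is exactly a left rotate:

    #include <cstdint>

    // rotl via the SHLD formula with both inputs equal to x.
    uint32_t rotl_via_shld(uint32_t x, unsigned imm) {
      imm &= 31;
      if (imm == 0)
        return x;          // avoid the undefined full-width shift in C++
      return (x << imm) | (x >> (32 - imm));
    }
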
Craig Topper 16d9730b86 [X86] Fix formatting. NFC
llvm-svn: 295695
2017-02-21 06:27:13 +00:00
Craig Topper d9fe664868 [AVX-512] Use sse_load_f32/f64 in place of scalar_to_vector and scalar load in some patterns.
llvm-svn: 295693
2017-02-21 04:26:10 +00:00
Craig Topper d890db6952 [AVX-512] Fix the ExeDomain for vcmpss/vcmpsd.
llvm-svn: 295691
2017-02-21 04:26:04 +00:00
Sanjoy Das 90208720e3 Add a wrapper around copy_if in STLExtras; NFC
I will add one more use for this in a later change.

llvm-svn: 295685
2017-02-21 00:38:44 +00:00
Craig Topper 2012dda9a0 [AVX-512] Add a few more patterns for selecting masked vpternlog with broadcast loads where the passthru operand is not operand 0.
llvm-svn: 295673
2017-02-20 17:44:09 +00:00
Simon Pilgrim 2967ed1c7e [X86] Tidyup combineExtractVectorElt. NFCI.
Pull out repeated code for extraction index operand and source vector value type.

Use isNullConstant helper to check for zero extraction index.

llvm-svn: 295670
2017-02-20 16:09:45 +00:00
Diana Picus 1c33c9f0b0 [ARM] GlobalISel: Don't select atomic loads
There used to be a check in the IRTranslator that prevented us from having to
deal with atomic loads/stores. That check was removed in r294993, and the
AArch64 backend was updated accordingly. This commit does the same thing for the
ARM backend.

In general, in the ARM backend we introduce fences during the atomic expand
pass, so we don't have to worry about atomics, *except* for the 32-bit ARMv8
target, which handles atomics more like AArch64. Since we don't want to worry
about that yet, just bail out of instruction selection if we find any atomic
loads.

llvm-svn: 295662
2017-02-20 14:45:58 +00:00
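
A hedged sketch of the bail-out described above, in the shape of a GlobalISel selector; the function name, headers, and placement are assumptions, not the actual ARM selector code:

    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/MachineMemOperand.h"
    #include "llvm/CodeGen/TargetOpcodes.h"

    // Inside select(): reject atomic loads so they fall back rather
    // than being mis-selected. G_LOAD carries a memory operand whose
    // atomic ordering tells us what kind of load this is.
    bool selectLoad(llvm::MachineInstr &I) {
      using namespace llvm;
      if (I.getOpcode() == TargetOpcode::G_LOAD) {
        const MachineMemOperand *MMO = *I.memoperands_begin();
        if (MMO->getOrdering() != AtomicOrdering::NotAtomic)
          return false;   // bail out of instruction selection
      }
      // ...normal selection would continue here...
      return true;
    }
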
Igor Breger fda32d266a [X86] Fix EXTRACT_VECTOR_ELT with variable index from v32i16 and v64i8 vector.
It's more profitable to go through memory (1 cycle throughput)
than to use a VMOVD + VPERMV/PSHUFB sequence (2/3 cycles throughput) to implement EXTRACT_VECTOR_ELT with a variable index.
The IACA tool was used to get a performance estimate (https://software.intel.com/en-us/articles/intel-architecture-code-analyzer).
For example, for the var_shuffle_v16i8_v16i8_xxxxxxxxxxxxxxxx_i8 test from vector-shuffle-variable-128.ll I get 26 cycles vs 79 cycles.
Also remove the VINSERT node, since we don't need it any more.

Differential Revision: https://reviews.llvm.org/D29690

llvm-svn: 295660
2017-02-20 14:16:29 +00:00
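
What "going through memory" amounts to, modeled in plain C++: store the vector to a stack slot once, then do a single scalar indexed load:

    #include <cstdint>
    #include <cstring>

    // Spill the vector to a stack slot and load one element, instead
    // of a VMOVD + VPERMV/PSHUFB chain to move the element into place.
    int8_t extract_elt_v16i8(const int8_t (&Vec)[16], unsigned Idx) {
      alignas(16) int8_t Slot[16];
      std::memcpy(Slot, Vec, sizeof Slot);  // vector store
      return Slot[Idx & 15];                // scalar load, variable index
    }
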
Simon Pilgrim 5910ebe720 [X86][AVX512] Add support for ASHR v2i64/v4i64 support without VLX
Use v8i64 ASHR instructions if we don't have VLX.

Differential Revision: https://reviews.llvm.org/D28537

llvm-svn: 295656
2017-02-20 12:16:38 +00:00
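
What the fallback amounts to, sketched with intrinsics as an illustration (not the backend code): without VLX the variable 64-bit arithmetic shift exists only as a 512-bit AVX512F instruction, so widen, shift, and take back the low lanes:

    #include <immintrin.h>

    // v2i64 arithmetic shift right without VLX: operate at v8i64 width.
    __m128i ashr_v2i64(__m128i V, __m128i Amt) {
      __m512i WideV   = _mm512_castsi128_si512(V);
      __m512i WideAmt = _mm512_castsi128_si512(Amt);
      return _mm512_castsi512_si128(_mm512_srav_epi64(WideV, WideAmt));
    }
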
Sjoerd Meijer e22a79e898 AArch64AsmParser: tablegen the isBranchTarget helper functions
Use tablegen to autogenerate the isBranchTarget helper functions. This is a cleanup
that removes almost identical functions that differ only in a few constants.

Differential Revision: https://reviews.llvm.org/D30160

llvm-svn: 295649
2017-02-20 10:57:54 +00:00
Ayman Musa 51ffeab8c8 [X86][AVX] Extend hasVEX_WPrefix bit to accept WIG value (W Ignore) + update all AVX instructions with the new value.
Add the WIG value to all AVX instructions that ignore the W-bit in their encoding, instead of giving them the default value of 0.
This patch is needed for follow-up work on the EVEX2VEX pass (replacing EVEX-encoded instructions with their corresponding VEX versions when possible).

Differential Revision: https://reviews.llvm.org/D29876

llvm-svn: 295643
2017-02-20 08:27:54 +00:00
Craig Topper c6c68f5958 [AVX-512] Add more patterns to fold masked VPTERNLOG with load when the passthru isn't operand 0.
llvm-svn: 295640
2017-02-20 07:00:40 +00:00
Craig Topper a5fa2e40f9 [AVX-512] Fix mistake in the immediate swizzle for some of the VPTERNLOG patterns.
llvm-svn: 295638
2017-02-20 07:00:34 +00:00
Craig Topper 5b4e36aafa [AVX-512] Add more VPTERNLOG patterns to enable folding of broadcast loads that aren't in operand 2.
llvm-svn: 295634
2017-02-20 02:47:42 +00:00
Craig Topper c184b671d9 [X86] Use memory form of shift right by 1 when the rotl immediate is one less than the operation size.
An earlier commit already did this for the register form.

llvm-svn: 295626
2017-02-20 00:37:23 +00:00
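
The equivalence behind this, in plain C++: rotating left by width-1 is the same bit permutation as rotating right by 1, and the rotate-by-one form has the compact encoding this commit wants to use:

    #include <cstdint>

    uint32_t rotl31(uint32_t X) { return (X << 31) | (X >> 1); }
    uint32_t rotr1(uint32_t X)  { return (X >> 1) | (X << 31); }
    // rotl31(X) == rotr1(X) for every X, so the shorter form is safe.
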
Craig Topper 63801df251 [AVX-512] Remove AddedComplexity from masked operations. The size of the patterns already increases their priority.
llvm-svn: 295619
2017-02-19 21:44:35 +00:00
Simon Pilgrim 14a7eee0b4 [X86] Use peekThroughOneUseBitcasts helper. NFCI.
llvm-svn: 295618
2017-02-19 21:40:51 +00:00
Davide Italiano 16b476ffcc [X86] Prefer static_cast<> to C-style cast. NFCI.
llvm-svn: 295617
2017-02-19 21:35:41 +00:00
Craig Topper 489057715e [AVX-512] Disable peephole optimizations on the VPTERNLOG commute test. Add new patterns to enable isel to fold the loads on it own.
llvm-svn: 295616
2017-02-19 21:32:15 +00:00
Davide Italiano 1aef59eb44 [AArch64] Prefer static_cast<> to C-style cast. NFCI.
llvm-svn: 295615
2017-02-19 21:31:14 +00:00
Simon Pilgrim d590de2998 [X86][SSE] Use getTargetConstantBitsFromNode to find zeroable shuffle elements.
Replaces the existing approach, which could only search BUILD_VECTOR nodes.

Requires getTargetConstantBitsFromNode to discriminate cases with all/partial UNDEF bits in each element - this should also be useful when we get around to supporting getTargetShuffleMaskIndices with UNDEF elements. 

llvm-svn: 295613
2017-02-19 19:40:31 +00:00
Craig Topper 4e794c71a6 [AVX-512] Add patterns to recognize masked vpternlog when the passthrough operand is not operand 0.
This uses a SDNodeXForm to swizzle the appropriate immediate bits to allow this to be matched.

llvm-svn: 295612
2017-02-19 19:36:58 +00:00
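
A plain-C++ model of the immediate swizzle idea, illustrative rather than the SDNodeXForm itself: vpternlog's imm8 is a truth table indexed by (bitA<<2)|(bitB<<1)|bitC for operands A, B, C, so moving an operand to a different slot means permuting the table's index bits to match:

    #include <cstdint>

    // Rewrite the ternlog truth table for an operand order with A and
    // C exchanged: new_imm[(a,b,c)] must equal old_imm[(c,b,a)].
    uint8_t swizzleSwapAC(uint8_t Imm) {
      uint8_t Out = 0;
      for (int Idx = 0; Idx < 8; ++Idx) {
        int A = (Idx >> 2) & 1, B = (Idx >> 1) & 1, C = Idx & 1;
        int Swapped = (C << 2) | (B << 1) | A;
        if (Imm & (1 << Swapped))
          Out |= 1 << Idx;
      }
      return Out;
    }
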
Simon Pilgrim 4271186f9c [X86][SSE] Enable initial support for domain crossing at high shuffle combine depths.
As discussed on D27692, this permits another domain to be used to combine a shuffle at high depths.

We currently set the required depth at 4 or more combined shuffles; this is probably too high for most targets, but it is a good starting point and already helps avoid a number of costly variable shuffles.

llvm-svn: 295608
2017-02-19 17:19:38 +00:00
Simon Pilgrim 6d07d514de [X86][SSE] Generalize INSERTPS/SHUFPS/SHUFPD combines across domains.
Relax the INSERTPS/SHUFPS/SHUFPD combines to support integer inputs if permitted.

llvm-svn: 295606
2017-02-19 15:15:40 +00:00
Simon Pilgrim b4460cf5a9 [X86][SSE] Add domain crossing support for target shuffle combines.
Add the infrastructure to flag whether float and/or int domains are permissible.

A future patch will enable domain crossing based off shuffle depth and the value types of the source vectors.

llvm-svn: 295604
2017-02-19 14:12:25 +00:00
Craig Topper 218d1a020e [AVX-512] Add broadcast VPTERNLOG instructions to special case commuting switch.
The instructions are marked commutable, but without special handling we don't get the immediate correct.

While here also remove the masked memory forms that aren't commutable.

llvm-svn: 295602
2017-02-19 08:03:26 +00:00
Craig Topper 007c93b2b9 [X86] Remove patterns for MOVSD with v4i32 types. We don't appear to really need them and if we do we should just use a bitcast to a 64-bit element type.
llvm-svn: 295589
2017-02-19 02:08:48 +00:00
Craig Topper 06ae5e821c [X86] Tighten up some of the SDNode type constraints.
llvm-svn: 295588
2017-02-19 01:54:47 +00:00
Simon Pilgrim 599b872ca2 [X86] Fix enumeral/non-enumeral conditional expression warning.
GCC only allows mixing enums and ints if they have the same signedness.

llvm-svn: 295586
2017-02-19 00:04:30 +00:00
Simon Pilgrim ee6ef4d6dd Fix 'variable set but not used' warning when assertions are disabled.
llvm-svn: 295585
2017-02-19 00:03:46 +00:00
Simon Pilgrim 2f2d8dc630 Fix signed/unsigned comparison warning.
llvm-svn: 295580
2017-02-18 22:56:17 +00:00