Commit Graph

41325 Commits

Author SHA1 Message Date
Simon Pilgrim bfd4495512 [X86][SSE] Combine shuffle nodes with multiple uses if all the users are being combined.
Currently we only combine shuffle nodes if they have a single user to prevent us from causing code bloat by splitting the shuffles into several different combines.

We don't take into account that in some cases we will already have combined all the users during recursively calling up the shuffle tree.

This patch keeps a list of all the shuffle nodes that have been combined so far and permits combining of further shuffle nodes if all its users are in that list.

Differential Revision: https://reviews.llvm.org/D29399

llvm-svn: 294183
2017-02-06 13:44:45 +00:00
Simon Dardis 3aa8a90eff [mips] dla expansion without the at register
Previously only the superscalar scheduled expansion of the dla macro for
MIPS64 was implemented. If assembler temporary register is not available
and the optional source register is not the destination register, synthesize
the address using the naive solution of adds and shifts.

This partially resolves PR/30383.

Thanks to Sean Bruno for reporting the issue!

Reviewers: slthakur, seanbruno

Differential Revision: https://reviews.llvm.org/D29328

llvm-svn: 294182
2017-02-06 12:43:46 +00:00
Dylan McKay 0acfafdbd6 [AVR] Use 'print' instead of 'dump'
This should fix an undefined reference on the AVR buildbot.

llvm-svn: 294175
2017-02-06 08:43:30 +00:00
Igor Breger 5c31a4c9a3 [X86][GlobalISel] Add limited ret lowering support to the IRTranslator.
Summary:
Support return lowering for i8/i16/i32/i64/float/double, vector type supported for 64bit platform only.
Support argument lowering for float/double types.

Reviewers: t.p.northover, zvi, ab, rovka

Reviewed By: zvi

Subscribers: dberris, kristof.beyls, delena, llvm-commits

Differential Revision: https://reviews.llvm.org/D29261

llvm-svn: 294173
2017-02-06 08:37:41 +00:00
Craig Topper 5d9ecd23e8 [AVX-512] Add VPSLLDQ/VPSRLDQ to load folding tables.
llvm-svn: 294170
2017-02-06 05:12:14 +00:00
Craig Topper f0eb60a6f3 [AVX-512] Add VPABSB/D/Q/W to load folding tables.
llvm-svn: 294169
2017-02-06 03:18:01 +00:00
Craig Topper 864b1a5376 [AVX-512] Add VSHUFPS/PD to load folding tables.
llvm-svn: 294168
2017-02-06 03:17:58 +00:00
Craig Topper 75218fb6b1 [AVX-512] Add VPMULLD/Q/W instructions to load folding tables.
llvm-svn: 294164
2017-02-06 01:19:26 +00:00
Craig Topper 452a7770e6 [AVX-512] Add all masked and unmasked versions of VPMULDQ and VPMULUDQ to load folding tables.
llvm-svn: 294163
2017-02-05 23:31:48 +00:00
Simon Pilgrim 380ce75687 [X86][SSE] Replace insert_vector_elt(vec, -1, idx) with shuffle
Similar to what we already do for zero elt insertion, we can quickly rematerialize 'allbits' vectors so to avoid a unnecessary gpr value and insertion into a vector

llvm-svn: 294162
2017-02-05 22:50:29 +00:00
Craig Topper 8eb1f315ac [AVX-512] Add scalar masked max/min intrinsic instructions to the load folding tables.
llvm-svn: 294153
2017-02-05 22:25:46 +00:00
Craig Topper cb4bc8be5b [AVX-512] Add scalar masked add/sub/mul/div intrinsic instructions to the load folding tables.
llvm-svn: 294152
2017-02-05 22:25:42 +00:00
Craig Topper 59af67206d [AVX-512] Add masked scalar FMA intrinsics to isNonFoldablePartialRegisterLoad to improve load folding of scalar loads.
llvm-svn: 294151
2017-02-05 22:25:40 +00:00
Dylan McKay ccd819ad94 [AVR] Implement stacksave/stackrestore by expanding (PR31342)
Summary:
Authored by Florian Zeitz.

This implements the missing stacksave/stackrestore intrinsics via expansion.

Output of `llc -O0 -march=avr ~/devel/llvm/test/CodeGen/Generic/stacksave-restore.ll` for sanity checking (comments mine):

```
	.text
	.file	".../llvm/test/CodeGen/Generic/stacksave-restore.ll"
	.globl	test
	.p2align	1
	.type	test,@function
test:                                   ; @test
; BB#0:
	push	r28
	push	r29

	in	r28, 61
	in	r29, 62
	sbiw	r28, 4
	in	r0, 63
	cli
	out	62, r29
	out	63, r0
	out	61, r28

	in	r18, 61
	in	r19, 62

	mov	r20, r22
	mov	r21, r23

	in	r30, 61
	in	r31, 62

	lsl	r22
	rol	r23
	lsl	r22
	rol	r23
	in	r26, 61
	in	r27, 62
	sub	r26, r22
	sbc	r27, r23
	andi	r26, 252
	in	r0, 63
	cli
	out	62, r27
	out	63, r0
	out	61, r26

	in	r0, 63
	cli
	out	62, r31
	out	63, r0
	out	61, r30

	in	r30, 61
	in	r31, 62
	sub	r30, r22
	sbc	r31, r23
	andi	r30, 252
	in	r0, 63
	cli
	out	62, r31
	out	63, r0
	out	61, r30

	std	Y+3, r24                ; 2-byte Folded Spill
	std	Y+4, r25                ; 2-byte Folded Spill

	mov	r24, r26
	mov	r25, r27

	in	r0, 63
	cli
	out	62, r19
	out	63, r0
	out	61, r18

	std	Y+1, r20                ; 2-byte Folded Spill
	std	Y+2, r21                ; 2-byte Folded Spill

	adiw	r28, 4
	in	r0, 63
	cli
	out	62, r29
	out	63, r0
	out	61, r28

	pop	r29
	pop	r28
	ret
.Lfunc_end0:
	.size	test, .Lfunc_end0-test
```

Reviewers: dylanmckay

Reviewed By: dylanmckay

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29553

llvm-svn: 294146
2017-02-05 21:35:45 +00:00
Kamil Rytarowski 5d2bd8dd54 Revamp llvm::once_flag to be closer to std::once_flag
Summary:
Make this interface reusable similarly to std::call_once and std::once_flag interface.

This makes porting LLDB to NetBSD easier as there was in the original approach a portable way to specify a non-static once_flag. With this change translating std::once_flag to llvm::once_flag is mechanical.

Sponsored by <The NetBSD Foundation>

Reviewers: mehdi_amini, labath, joerg

Reviewed By: mehdi_amini

Subscribers: emaste, clayborg

Differential Revision: https://reviews.llvm.org/D29566

llvm-svn: 294143
2017-02-05 21:13:06 +00:00
Craig Topper cac328f25e [X86] Fix printing of sha256rnds2 to include the implicit %xmm0 argument.
llvm-svn: 294132
2017-02-05 18:33:31 +00:00
Craig Topper d7ae9ab1fa [X86] Fix printing of blendvpd/blendvps/pblendvb to include the implicit %xmm0 argument. This makes codegen output more obvious about the %xmm0 usage.
llvm-svn: 294131
2017-02-05 18:33:24 +00:00
Craig Topper 6a35a81fc5 [X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later.
Similar was already done for several other shuffles in this function.

The test changes are because the old code used explicity zeroing for elements that could have been undef.

While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us.

llvm-svn: 294130
2017-02-05 18:33:14 +00:00
Daniel Sanders c3ac566754 [globalisel][arm] Tablegen-erate current Register Bank Information.
Summary:
This patch tablegen-erates the ARM register bank information so that the
static tables added in D27807 no longer need to be maintained.

Depends on D27338

Reviewers: t.p.northover, rovka, ab, qcolombet, aditya_nandakumar

Reviewed By: rovka

Subscribers: aemerson, rengolin, mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D28567

llvm-svn: 294124
2017-02-05 12:07:55 +00:00
Dylan McKay b78f36657e [AVR] Fix a bug where asm operands are printed twice
We would unconditionally call printOperand, even if PrintAsmOperand
already printed the immediate.

llvm-svn: 294121
2017-02-05 10:42:49 +00:00
Dylan McKay 7a3eb290ef [AVR] Support zero-sized arguments in defined methods
It is sufficient to skip emission of these arguments as we have nothing
to actually pass through the function call.

The AVR-GCC reference has nothing to say about zero-sized arguments,
presumably because C/C++ doesn't support them. This means we don't have
to worry about ABI differences.

llvm-svn: 294119
2017-02-05 09:53:45 +00:00
Craig Topper 978fdb75a4 [X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1).
llvm-svn: 294112
2017-02-04 23:26:46 +00:00
Craig Topper 3d95228dbe [X86] Simplify the code that turns INSERT_SUBVECTOR into BLENDI. NFCI
llvm-svn: 294111
2017-02-04 23:26:42 +00:00
Eric Christopher b128abcf7a Remove a bunch of unnecessary casts to a target specific version of TII and TRI as we're working from a target specific STI.
llvm-svn: 294081
2017-02-04 01:52:17 +00:00
Eugene Zelenko 3f37f07c7f [Sparc] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294072
2017-02-04 00:36:49 +00:00
Eugene Zelenko cd8ea02b4a [Mips] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294069
2017-02-03 23:39:33 +00:00
Eugene Zelenko 06869c04f3 [SystemZ] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294068
2017-02-03 23:39:06 +00:00
Eugene Zelenko e894b4dc59 [AMDGPU] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294067
2017-02-03 23:38:40 +00:00
Eugene Zelenko 939f6b0167 [AArch64] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294053
2017-02-03 21:49:13 +00:00
Eugene Zelenko 07dc38f67a [ARM] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294052
2017-02-03 21:48:12 +00:00
Eugene Zelenko c3164b9c2f [XCore] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294051
2017-02-03 21:46:55 +00:00
Matt Arsenault f15da6c419 AMDGPU: AsmParser cleanups
Use typedef, remove unnecessary enum, line wraps.

llvm-svn: 294039
2017-02-03 20:49:51 +00:00
Stanislav Mekhanoshin 81db53109d [AMDGPU] Bump -amdgpu-unroll-threshold-private to 2000
This has quite positive performance impact according to measurements.
Before previous fixes to limit the optimization that was too high
and blowed compile time and scratch usage, but now this is gone and
we can bump the threshold.

Differential Revision: https://reviews.llvm.org/D29505

llvm-svn: 294032
2017-02-03 20:08:29 +00:00
Matt Arsenault 1fa5eacf9d AMDGPU: Set MCAsmInfo::PointerSize
llvm-svn: 294031
2017-02-03 20:02:23 +00:00
Matt Arsenault d9cd736585 AMDGPU: Don't unroll for private with dynamic allocas
This won't be elimnated, so this will just bloat code
if/when these are ever used/supported.

llvm-svn: 294030
2017-02-03 19:36:00 +00:00
Simon Pilgrim 034c1bd32c [X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle.
Correctly flagging upper elements as undef.

llvm-svn: 294020
2017-02-03 17:59:58 +00:00
Simon Dardis 68e9d94055 [mips] Remove absolute size assertion for end directive
The .end <symbol> directive for MIPS marks the end of a symbol and sets the
symbol's size. Previously, the corresponding emitDirective handler asserted
that a function's size could be evaluated to an absolute value at that point
in time.

This cannot be done with when directives like .align have been encountered,
instead set the function's size to the corresponding symbolic expression and
let ELFObjectWriter resolve the expression to an absolute value. This avoids
a redundant call to evaluateAsAbsolute.

llvm-svn: 294012
2017-02-03 15:48:53 +00:00
Justin Lebar e90c468444 [NVPTX] Enable combineRepeatedFPDivisors for NVPTX.
Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D29477

llvm-svn: 294011
2017-02-03 15:13:50 +00:00
Artem Tamazov 43b61561b0 [AMDGPU][mc] Fix AddressSanitizer leftover issue in gfx7_asm_all test
Issue occurs when assembling "ds_ordered_count v0, v0 gds".

llvm-svn: 294004
2017-02-03 12:47:30 +00:00
Sanne Wouda a994185757 [ARM] Change TCReturn to tBL if tailcall optimization fails.
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.

Reviewers: rengolin, jmolloy, rovka, olista01

Reviewed By: olista01

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D29020

llvm-svn: 294000
2017-02-03 11:15:53 +00:00
Stanislav Mekhanoshin f29602df65 [AMDGPU] Unroll preferences improvements
Exit loop analysis early if suitable private access found.
Do not account for GEPs which are invariant to loop induction variable.
Do not account for Allocas which are too big to fit into register file anyway.
Add option for tuning: -amdgpu-unroll-threshold-private.

Differential Revision: https://reviews.llvm.org/D29473

llvm-svn: 293991
2017-02-03 02:20:05 +00:00
Matt Arsenault e1b595306d AMDGPU: Fold fneg into fmin/fmax_legacy
llvm-svn: 293972
2017-02-03 00:51:50 +00:00
Craig Topper bbb2b95ce5 [X86] Mark 256-bit and 512-bit INSERT_SUBVECTOR operations as legal and remove the custom lowering.
llvm-svn: 293969
2017-02-03 00:24:49 +00:00
Matt Arsenault 2511c031de AMDGPU: Fold fneg into fminnum/fmaxnum
llvm-svn: 293968
2017-02-03 00:23:15 +00:00
Matt Arsenault a8fcfadf46 AMDGPU: Check if users of fneg can fold mods
In multi-use cases this can save a few instructions.

llvm-svn: 293962
2017-02-02 23:21:23 +00:00
Eugene Zelenko fbd13c5c12 [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293949
2017-02-02 22:55:55 +00:00
Reid Kleckner 3c467e225e [X86] Avoid sorted order check in release builds
Effectively reverts r290248 and fixes the unused function warning with
ifndef NDEBUG.

llvm-svn: 293945
2017-02-02 22:06:30 +00:00
Craig Topper c45657375b [X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine.
On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.

llvm-svn: 293944
2017-02-02 22:02:57 +00:00
Reid Kleckner c35139ec0d [CodeGen] Remove dead call-or-prologue enum from CCState
This enum has been dead since Olivier Stannard re-implemented ARM byval
handling in r202985 (2014).

llvm-svn: 293943
2017-02-02 21:58:22 +00:00
Rafael Espindola 13a79bbfe5 Change how we handle section symbols on ELF.
On ELF every section can have a corresponding section symbol. When in
an assembly file we have

.quad .text

the '.text' refers to that symbol.

The way we used to handle them is to leave .text an undefined symbol
until the very end when the object writer would map them to the
actual section symbol.

The problem with that is that anything before the end would see an
undefined symbol. This could result in bad diagnostics
(test/MC/AArch64/label-arithmetic-diags-elf.s), or incorrect results
when using the asm streamer (est/MC/Mips/expansion-jal-sym-pic.s).

Fixing this will also allow using the section symbol earlier for
setting sh_link of SHF_METADATA sections.

This patch includes a few hacks to avoid changing our behaviour when
handling conflicts between section symbols and other symbols. I
reported pr31850 to track that.

llvm-svn: 293936
2017-02-02 21:26:06 +00:00
Javed Absar bb8dcc6aec [ARM] Classification Improvements to ARM Sched-Model. NFCI.
This is the second in the series of patches to enable adding
of machine sched-models for ARM processors easier and compact.
This patch focuses on integer instructions and adds missing
sched definitions.

Reviewers: rovka, rengolin
Differential Revision: https://reviews.llvm.org/D29127

llvm-svn: 293935
2017-02-02 21:08:12 +00:00
Krzysztof Parzyszek d0d42f0ec8 [Hexagon] Adding opExtentBits and opExtentAlign to GPrel instructions
Patch by Colin LeMahieu.

llvm-svn: 293933
2017-02-02 20:35:12 +00:00
Michael Kuperstein e6d59fdca5 [X86] Add costs for non-AVX512 single-source permutation integer shuffles
Differential Revision: https://reviews.llvm.org/D29416

llvm-svn: 293932
2017-02-02 20:27:13 +00:00
Krzysztof Parzyszek e17b0bfb24 [Hexagon] Fix relocation kind for extended predicated calls
Patch by Sid Manning.

llvm-svn: 293931
2017-02-02 20:21:56 +00:00
Krzysztof Parzyszek 357b048666 [Hexagon] Remove A4_ext_* pseudo instructions
Patch by Colin LeMahieu.

llvm-svn: 293929
2017-02-02 19:58:22 +00:00
Krzysztof Parzyszek d67ab623f6 [Hexagon] Fix insertBranch for loops with multiple ENDLOOP instructions
llvm-svn: 293925
2017-02-02 19:36:37 +00:00
Dan Gohman b89f2d3d92 [WebAssembly] Add instruction definitions for drop and get/set_global.
llvm-svn: 293922
2017-02-02 19:29:44 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Simon Dardis 08ce5fb66b [mips] Expansion of BEQL and BNEL with immediate operands
Adds support for BEQL and BNEL macros with immediate operands.

Patch by: Srdjan Obucina

Reviewers: dsanders, zoran.jovanovic, vkalintiris, sdardis, obucina, seanbruno

Differential Revision: https://reviews.llvm.org/D17040

llvm-svn: 293905
2017-02-02 16:13:49 +00:00
Jonas Paulsson b7a2ef8375 [SystemZ] Add comment for ISD::FP_TO_UINT expansion.
(Copied from the fp-conv-10.ll test to SystemZISelLowering.cpp)

Review: Ulrich Weigand
llvm-svn: 293900
2017-02-02 15:42:14 +00:00
Krzysztof Parzyszek bc4dc9b4b9 [Hexagon] Emitting individual instructions without copying them
Patch by Colin LeMahieu.

llvm-svn: 293899
2017-02-02 15:32:26 +00:00
Krzysztof Parzyszek f65b8f14f4 [Hexagon] Rename TypeCOMPOUND to TypeCJ
llvm-svn: 293894
2017-02-02 15:03:30 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Nirav Dave e14300e270 [X86,ISEL] Fix X86 increment chain dependence calculation
Merging Load-add-store pattern into a increment op previously dropped
the load's chain from the instructions dependence if the store is
chained to a TokenFactor.

llvm-svn: 293892
2017-02-02 14:39:26 +00:00
Diana Picus 32cd9b434c [ARM] GlobalISel: Lower pointer args and returns
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.

llvm-svn: 293889
2017-02-02 14:01:00 +00:00
Diana Picus 0c11c7b5c7 [ARM] GlobalISel: Error out instead of asserting
Allow unknown types in TLI.getValueType, otherwise we get asserts for certain
types that we do not support yet (instead of returning that we don't support
them and falling through the normal error path).

llvm-svn: 293888
2017-02-02 14:00:54 +00:00
Diana Picus fc19a8ff07 [ARM] GlobalISel: Legalize loading pointers
Make it legal to load pointer values. Also check that pointers are assigned
to the GPR reg bank by default.

llvm-svn: 293886
2017-02-02 13:20:49 +00:00
Simon Pilgrim 20ab6b875a [X86][SSE] Use MOVMSK for all_of/any_of reduction patterns
This is a first attempt at using the MOVMSK instructions to replace all_of/any_of reduction patterns (i.e. an and/or + shuffle chain).

So far this only matches patterns where we are reducing an all/none bits source vector (i.e. a comparison result) but we should be able to expand on this in conjunction with improvements to 'bool vector' handling both in the x86 backend as well as the vectorizers etc.

Differential Revision: https://reviews.llvm.org/D28810

llvm-svn: 293880
2017-02-02 11:52:33 +00:00
Craig Topper 047a8be18a [X86] Remove some unused DAGCombinerInfo parameters. NFC
llvm-svn: 293873
2017-02-02 08:03:23 +00:00
Craig Topper 94ed54b49a [X86] Move some INSERT_SUBVECTOR optimizations from legalize to DAG combine.
This moves creation of SUBV_BROADCAST and merging of adjacent loads that are being inserted together.

This is a step towards removing legalizing of INSERT_SUBVECTOR except for vXi1 cases.

llvm-svn: 293872
2017-02-02 08:03:20 +00:00
Craig Topper b81e6c48f8 [AVX-512] Fix the implicit defs for VZEROALL/VZEROUPPER to include YMM16-YMM31.
llvm-svn: 293862
2017-02-02 04:17:18 +00:00
Matt Arsenault 9dba9bd4cf AMDGPU: Use source modifiers with f16->f32 conversions
The operand types were defined to fit the fp16_to_fp node, which
has the half as an integer type. v_cvt_f32_f16 does support
source modifiers, so change this to have an FP type and modifiers.

For targets without legal f16, this requires recognizing the
bit operations and trying to produce them.

llvm-svn: 293857
2017-02-02 02:27:04 +00:00
Matthias Braun 5b49f95592 AArch64RegisterInfo: Simplify getReservedReg(); NFC
After marking a 32bit register and all its super registers the 64bit
register does not need to be marked again.

llvm-svn: 293855
2017-02-02 02:23:25 +00:00
Matt Arsenault 8e190b2f23 NVPTX: Fix not preserving volatile when expanding memset
llvm-svn: 293851
2017-02-02 01:20:34 +00:00
Peter Collingbourne dc5e583687 X86: Produce @ABS8 symbol modifiers for absolute symbols in range [0,128).
Differential Revision: https://reviews.llvm.org/D28689

llvm-svn: 293844
2017-02-02 00:32:03 +00:00
Stanislav Mekhanoshin 2b913b1f49 [AMDGPU] Account workgroup size in LDS occupancy limits
Functions matching LDS use to occupancy return results for a workgroup
of 64 workitems. The numbers has to be adjusted for bigger workgroups.
For example a workgroup of size 256 already occupies 4 waves just by
itself. Given that all numbers of LDS use in the compiler are per
workgroup, occupancy shall be multiplied by 4 in this case. Each 64
workitems still limited by the same number, but 4 subrgoups 64 workitems
each can afford 4 times more LDS to get the same occupancy.

In addition change initializes LDS size in the subtarget to a real value
for SI+ targets. This is required since LDS size is a variable in these
calculations.

Differential Revision: https://reviews.llvm.org/D29423

llvm-svn: 293837
2017-02-01 22:59:50 +00:00
Eugene Zelenko c5eb8e29d0 [AArch64] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293836
2017-02-01 22:56:06 +00:00
Dehao Chen 0944a8c2ec Change debug-info-for-profiling from a TargetOption to a function attribute.
Summary: LTO requires the debug-info-for-profiling to be a function attribute.

Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl

Reviewed By: mehdi_amini, dblaikie, aprantl

Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29203

llvm-svn: 293833
2017-02-01 22:45:09 +00:00
Matt Arsenault 74f64833bc AMDGPU: Allow clustering flat memory operations
llvm-svn: 293809
2017-02-01 20:22:51 +00:00
Simon Dardis ac9c30c37f [mips] Parse the 'bopt' and 'nobopt' directives in IAS.
The GAS assembler supports the ".set bopt" directive but according
to the sources it doesn't do anything. It's supposed to optimize
branches by filling the delay slot of a branch with it's target.

This patch teaches the MIPS asm parser to accept both and warn in
the case of 'bopt' that the bopt directive is unsupported.

This resolves PR/31841.

Thanks to Sean Bruno for reporting the issue!

llvm-svn: 293798
2017-02-01 18:50:24 +00:00
Matthew Simpson ba5cf9dfee [LV] Move interleaved access helper functions to VectorUtils (NFC)
This patch moves some helper functions related to interleaved access
vectorization out of LoopVectorize.cpp and into VectorUtils.cpp. We would like
to use these functions in a follow-on patch that improves interleaved load and
store lowering in (ARM/AArch64)ISelLowering.cpp. One of the functions was
already duplicated there and has been removed.

Differential Revision: https://reviews.llvm.org/D29398

llvm-svn: 293788
2017-02-01 17:45:46 +00:00
Simon Pilgrim ca931efc21 [X86][SSE] Remove unused argument. NFCI.
llvm-svn: 293777
2017-02-01 16:34:50 +00:00
Matt Arsenault d59e640455 AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops
These were simply preserving the flags of the original operation,
which was too conservative in most cases and incorrect for mul.

nsw/nuw may be needed for some combines to cleanup messes when
intermediate sext_inregs are introduced later.

Tested valid combinations with alive.

llvm-svn: 293776
2017-02-01 16:25:23 +00:00
Simon Dardis 6433d5af6b [mips] Fix an initialization issue with MipsABIInfo in MipsTargetELFStreamer
DebugInfoDWARFTests is the only user so far which initializes the
MCObjectStreamer without initializing the ASMParser. The MIPS backend
relies on the ASMParser to initialize the MipsABIInfo object and to
update the target streamer with it. This should turn the mips buildbots
green.

Reviewers: atanasyan, zoran.jovanovic

Differential Revision: https://reviews.llvm.org/D28025

llvm-svn: 293772
2017-02-01 15:39:23 +00:00
Kit Barton d26978796e [PowerPC] Fix sjlj pseduo instructions to use G8RC_NOX0 register class
The the following instructions:
  - LD/LWZ (expanded from sjLj pseudo-instructions)
  - LXVL/LXVLL vector loads
  - STXVL/STXVLL vector stores
all require G8RC_NO0X class registers for RA.

Differential Revision: https://reviews.llvm.org/D29289

Committed for Lei Huang

llvm-svn: 293769
2017-02-01 14:33:57 +00:00
Simon Pilgrim 55a9c79bd1 [X86][SSE] Merge SSE2 PINSRW lowering with SSE41 PINSRB/PINSRW lowering. NFCI.
These are identical apart from the extra SSE41 guard for PINSRB.

llvm-svn: 293766
2017-02-01 13:32:19 +00:00
Javed Absar e5ad87e939 [ARM] Enable Cortex-M23 and Cortex-M33 support.
Add both cores to the target parser and TableGen. Test that eabi
attributes are set correctly for both cores. Additionally, test the
absence and presence of MOVT in Cortex-M23 and Cortex-M33, respectively.

Committed on behalf of Sanne Wouda.
Reviewers : rengolin, olista01.

Differential Revision: https://reviews.llvm.org/D29073

llvm-svn: 293761
2017-02-01 11:55:03 +00:00
NAKAMURA Takumi 468487d71a *MacroFusion.cpp: Suppress warnings to eliminate \param(s). [-Wdocumentation]
llvm-svn: 293744
2017-02-01 07:30:46 +00:00
Craig Topper 0bcba19cdf [X86] For AVX1/AVX2 isel, don't use FP move instructions for 128-bit loads/stores of integer types.
For SSE we use fp because of the smaller encoding, but that doesn't apply to AVX. So just do the natural thing so we don't have to explain why we aren't. We can't do this for 256-bit loads/stores since integer loads and stores aren't available in AVX1 so we need fallback patterns since the integer types are legal.

This doesn't affect any tests because execution domain fixing freely converts the instructions anyway. Honestly, we could probably rely on it for the SSE size optimization too.

llvm-svn: 293743
2017-02-01 07:17:16 +00:00
Evandro Menezes 455382ea22 [AArch64] Add new target feature to fuse literal generation
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, sections 4.14 and 4.15.

Differential revision: https://reviews.llvm.org/D28698

llvm-svn: 293739
2017-02-01 02:54:42 +00:00
Evandro Menezes b21fb29c26 [AArch64] Add new subtarget feature to fuse AES crypto operations
This feature enables the fusion of such operations on Cortex A57, as
recommended in its Software Optimisation Guide, section 4.13, and on Exynos
M1.

Differential revision: https://reviews.llvm.org/D28491

llvm-svn: 293738
2017-02-01 02:54:39 +00:00
Evandro Menezes 94edf02923 [CodeGen] Move MacroFusion to the target
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.

In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.

Differential revision: https://reviews.llvm.org/D28489

llvm-svn: 293737
2017-02-01 02:54:34 +00:00
Eugene Zelenko 926883e1c2 [Mips] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293729
2017-02-01 01:22:51 +00:00
Matt Arsenault da7a656542 AMDGPU: Cleanup fmin/fmax legacy function
Use a more specific subtarget check and combine hasOneUse checks

llvm-svn: 293726
2017-02-01 00:42:40 +00:00
Matt Arsenault 1575cb893c AMDGPU: Fix warning
llvm-svn: 293717
2017-01-31 23:48:37 +00:00
Justin Lebar 06fcea4cd9 [NVPTX] Compute approx sqrt as 1/rsqrt(x) rather than x*rsqrt(x).
x*rsqrt(x) returns NaN for x == 0, whereas 1/rsqrt(x) returns 0, as
desired.

Verified that the particular nvptx approximate instructions here do in
fact return 0 for x = 0.

llvm-svn: 293713
2017-01-31 23:08:57 +00:00
Michael Kuperstein e18aad39ab Shut up GCC warning about operator precedence. NFC.
Technically, this is actually changes the expression and the original
assert was "wrong", but since the conjunction is with true, it doesn't
matter in this case.

llvm-svn: 293709
2017-01-31 22:48:45 +00:00
Peter Collingbourne d763c4cc85 MC: Introduce the ABS8 symbol modifier.
@ABS8 can be applied to symbols which appear as immediate operands to
instructions that have a 8-bit immediate form for that operand. It causes
the assembler to use the 8-bit form and an 8-bit relocation (e.g. R_386_8
or R_X86_64_8) for the symbol.

Differential Revision: https://reviews.llvm.org/D28688

llvm-svn: 293667
2017-01-31 18:28:44 +00:00
Matt Arsenault d5d78510c7 AMDGPU: Use source mods with fcanonicalize
llvm-svn: 293654
2017-01-31 17:28:40 +00:00
Nirav Dave a7c041d147 [X86] Implement -mfentry
Summary: Insert calls to __fentry__ at function entry.

Reviewers: hfinkel, craig.topper

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D28000

llvm-svn: 293648
2017-01-31 17:00:27 +00:00
Tom Stellard 124f5cc8c2 AMDGPU/SI: Fix inst-select-load-smrd.mir on some builds
Summary:
For some reason instructions are being inserted in the wrong order with some
builds.  I'm not sure why this is happening.

Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D29325

llvm-svn: 293639
2017-01-31 15:24:11 +00:00
Simon Pilgrim 1b39d5db7b [X86][SSE] Add support for combining PINSRB into a target shuffle.
llvm-svn: 293637
2017-01-31 14:59:44 +00:00
Sam Parker 9bf658d5fe [ARM] Avoid using ARM instructions in Thumb mode
The Requires class overrides the target requirements of an instruction,
rather than adding to them, so all ARM instructions need to include the
IsARM predicate when they have overwitten requirements.

This caused the swp and swpb instructions to be allowed in thumb mode
assembly, and the ARM encoding of CDP to be selected in codegen (which
is different for conditional instructions).

Differential Revision: https://reviews.llvm.org/D29283

llvm-svn: 293634
2017-01-31 14:35:01 +00:00
Benjamin Kramer 94a833962c [X86] Silence unused variable warning in Release builds.
llvm-svn: 293631
2017-01-31 14:13:53 +00:00
Simon Pilgrim 4eab18f6b8 [X86][SSE] Detect unary PBLEND shuffles.
These can appear during shuffle combining.

llvm-svn: 293628
2017-01-31 13:58:01 +00:00
Simon Pilgrim c29eab52e8 [X86][SSE] Add support for combining PINSRW into a target shuffle.
Also add the ability to recognise PINSR(Vex, 0, Idx).

Targets shuffle combines won't replace multiple insertions with a bit mask until a depth of 3 or more, so we avoid codesize bloat.

The unnecessary vpblendw in clearupper8xi16a will be fixed in an upcoming patch.

llvm-svn: 293627
2017-01-31 13:51:10 +00:00
Nemanja Ivanovic 2f2a6ab991 [PowerPC][Altivec] Add vmr extended mnemonic
Just adds the vmr (Vector Move Register) mnemonic for the VOR instruction in
the PPC back end.

Committing on behalf of brunoalr (Bruno Rosa).

Differential Revision: https://reviews.llvm.org/D29133

llvm-svn: 293626
2017-01-31 13:43:11 +00:00
Simon Dardis 12850eeac5 [mips] Addition of the immediate cases for the instructions [d]div, [d]divu
Related to http://reviews.llvm.org/D15772

Depends on http://reviews.llvm.org/D16888

Adds support for immediate operand for [D]DIV[U] instructions.

Patch By: Srdjan Obucina

Reviewers: zoran.jovanovic, vkalintiris, dsanders, obucina

Differential Revision: https://reviews.llvm.org/D16889

llvm-svn: 293614
2017-01-31 10:49:24 +00:00
Craig Topper 2cfa2071bd [AVX-512] Don't both looking into the AVX512DQ execution domain fixing tables if AVX512DQ isn't supported since we can't do any conversion anyway.
llvm-svn: 293608
2017-01-31 06:49:55 +00:00
Craig Topper 797e32dd98 [X86] Add AVX and SSE2 version of MOVSDmr to execution domain fixing table. AVX-512 already did this for the EVEX version.
llvm-svn: 293607
2017-01-31 06:49:53 +00:00
Craig Topper 779e4c5bb4 [AVX-512] Fix copy and paste bug in execution domain fixing tables so that we can convert 256-bit movnt instructions.
llvm-svn: 293606
2017-01-31 06:49:50 +00:00
Justin Lebar 1c9692a46f [NVPTX] Implement NVPTXTargetLowering::getSqrtEstimate.
Summary:

This lets us lower to sqrt.approx and rsqrt.approx under more
circumstances.

* Now we emit sqrt.approx and rsqrt.approx for calls to @llvm.sqrt.f32,
  when fast-math is enabled.  Previously, we only would emit it for
  calls to @llvm.nvvm.sqrt.f.  (With this patch we no longer emit
  sqrt.approx for calls to @llvm.nvvm.sqrt.f; we rely on intcombine to
  simplify llvm.nvvm.sqrt.f into llvm.sqrt.f32.)

* Now we emit the ftz version of rsqrt.approx when ftz is enabled.
  Previously, we only emitted rsqrt.approx when ftz was disabled.

Reviewers: hfinkel

Subscribers: llvm-commits, tra, jholewinski

Differential Revision: https://reviews.llvm.org/D28508

llvm-svn: 293605
2017-01-31 05:58:22 +00:00
Craig Topper 06e038c6de [X86] Update the broadcast fallback patterns to use shuffle instructions from the appropriate execution domain.
llvm-svn: 293603
2017-01-31 05:18:29 +00:00
Craig Topper e9e84c8284 [AVX-512] Fix the ExeDomain for VMOVDDUP, VMOVSLDUP, and VMOVSHDUP.
llvm-svn: 293601
2017-01-31 05:18:24 +00:00
Matt Arsenault f84e5d9a27 AMDGPU: Generalize matching of v_med3_f32
I think this is safe as long as no inputs are known to ever
be nans.

Also add an intrinsic for fmed3 to be able to handle all safe
math cases.

llvm-svn: 293598
2017-01-31 03:07:46 +00:00
Craig Topper d064cc93b2 [X86] Remove patterns for X86VPermilpi with integer types. I don't think we've formed these since the shuffle lowering rewrite.
llvm-svn: 293592
2017-01-31 02:09:53 +00:00
Craig Topper 85935f69fb [X86] Remove duplicate patterns for X86VPermilpv that already exist in the instructions themselves.
llvm-svn: 293591
2017-01-31 02:09:51 +00:00
Craig Topper ced68315ce [X86] Remove patterns for selecting PSHUFD with FP types. We don't seem to do this anymore and the AVX case definitely should be using VPERMILPS anyway.
llvm-svn: 293590
2017-01-31 02:09:49 +00:00
Craig Topper b76494e017 [X86] Remove 'else' after 'return'. NFC
llvm-svn: 293589
2017-01-31 02:09:46 +00:00
Craig Topper f9d901f0ea [X86] Use integer broadcast instructions for integer broadcast patterns.
I'm not sure why we were using an FP instruction before and had to have a comment calling attention to it, but not justifying it.

llvm-svn: 293588
2017-01-31 02:09:43 +00:00
Matt Arsenault b6491cc854 AMDGPU: Implement hook for InferAddressSpaces
For now just port some of the existing NVPTX tests
and from an old HSAIL optimization pass which
approximately did the same thing.

Don't enable the pass yet until more testing is done.

llvm-svn: 293580
2017-01-31 01:20:54 +00:00
Matt Arsenault 850657a439 NVPTX: Move InferAddressSpaces to generic code
llvm-svn: 293579
2017-01-31 01:10:58 +00:00
Eugene Zelenko 342257ea92 [ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293578
2017-01-31 00:56:17 +00:00
Matt Arsenault 9f432ec24c NVPTX: Trivial cleanups of NVPTXInferAddressSpaces
- Move DEBUG_TYPE below includes
- Change unknown address space constant to be consistent with other
  passes
- Grammar fixes in debug output

llvm-svn: 293567
2017-01-30 23:27:11 +00:00
Eugene Zelenko dde94e4c4f [Mips] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293565
2017-01-30 23:21:32 +00:00
Matt Arsenault 42b6478344 NVPTX: Refactor NVPTXInferAddressSpaces to check TTI
Add a new TTI hook for getting the generic address space value.

llvm-svn: 293563
2017-01-30 23:02:12 +00:00
Simon Pilgrim 3905e03a47 [X86][SSE] Fix unsigned <= 0 warning in assert. NFCI.
Thanks to @mkuper

llvm-svn: 293561
2017-01-30 22:58:44 +00:00
Simon Pilgrim a80a47afef [X86][SSE] Generalize the number of decoded shuffle inputs. NFCI.
combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs.

This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.

llvm-svn: 293560
2017-01-30 22:48:49 +00:00
Eli Friedman 2345733246 Fix line endings.
llvm-svn: 293554
2017-01-30 22:04:23 +00:00
Tom Stellard 887a2562b7 AMDGPU: Fix release build broken by r293551
llvm-svn: 293553
2017-01-30 22:02:58 +00:00
Tom Stellard ca16621b2a Re-commit AMDGPU/GlobalISel: Add support for simple shaders
Fix build when global-isel is disabled and fix a warning.

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293551
2017-01-30 21:56:46 +00:00
Stanislav Mekhanoshin a3b72798af [AMDGPU] Internalize non-kernel symbols
Since we have no call support and late linking we can produce code
only for used symbols. This saves compilation time, size of the final
executable, and size of any intermediate dumps.

Run Internalize pass early in the opt pipeline followed by global
DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option.

Differential Revision: https://reviews.llvm.org/D29214

llvm-svn: 293549
2017-01-30 21:05:18 +00:00
Matt Arsenault af635240d5 AMDGPU: Undo sub x, c -> add x, -c canonicalization
This is worse if the original constant is an inline immediate.

This should also be done for 64-bit adds, but requires fixing
operand folding bugs first.

llvm-svn: 293540
2017-01-30 19:30:24 +00:00
Krzysztof Parzyszek 3695d06a10 [RDF] Add support for regmasks
llvm-svn: 293538
2017-01-30 19:16:30 +00:00
Matt Arsenault 0c3293844b AMDGPU: Run AMDGPUCodeGenPrepare after inlining
With leaf functions, this makes nonsensical decisions
based on the uniformity of the arguments.

llvm-svn: 293525
2017-01-30 18:40:29 +00:00
Matt Arsenault ee3f0acf20 AMDGPU: Make i32 uaddo/usubo legal
llvm-svn: 293514
2017-01-30 18:11:38 +00:00
Krzysztof Parzyszek 49ffff12e5 [RDF] Extract the physical register information into a separate class
llvm-svn: 293510
2017-01-30 17:46:56 +00:00
Tom Stellard 7a19d56f73 Revert "AMDGPU/GlobalISel: Add support for simple shaders"
This reverts commit r293503.

Revert while I investigate some of the buildbot failures.

llvm-svn: 293509
2017-01-30 17:42:41 +00:00
Matt Arsenault 41c1499504 AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent
llvm-svn: 293504
2017-01-30 17:09:47 +00:00
Tom Stellard e48f60aec8 AMDGPU/GlobalISel: Add support for simple shaders
Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293503
2017-01-30 17:09:15 +00:00
Simon Pilgrim 098998aef0 [X86][SSE] Add support for combining PINSRW+ASSERTZEXT+PEXTRW patterns with target shuffles
llvm-svn: 293500
2017-01-30 16:58:34 +00:00
Krzysztof Parzyszek b561cf953a [RDF] Add phis for entry block live-ins (in addition to function live-ins)
llvm-svn: 293491
2017-01-30 16:20:30 +00:00
Rafael Espindola e0eba3c493 Only print architecture dependent flags for that architecture.
Different architectures can have different meaning for flags in the
SHF_MASKPROC mask, so we should always check what the architecture use
before checking the flag.

NFC for now, but will allow fixing the value of an xmos flag.

llvm-svn: 293484
2017-01-30 15:38:43 +00:00
Benjamin Kramer 73564981fe [Hexagon] Make header self-contained.
llvm-svn: 293482
2017-01-30 14:55:33 +00:00
Asaf Badouh e11d2d73bf [X86][MCU] Minor bug fix for r293469 + test case
llvm-svn: 293478
2017-01-30 13:14:37 +00:00
Marek Olsak e81adb52b1 AMDGPU: Remove a useless VI SMRD pattern
Summary: already covered by complex patterns

Reviewers: arsenm, nhaehnle, tstellarAMD

Subscribers: kzhuravl, wdng, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28995

llvm-svn: 293477
2017-01-30 12:25:14 +00:00
Marek Olsak 8e93529020 AMDGPU: Fix assembler encoding for EXP instructions on VI
Reviewers: arsenm, tstellarAMD

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye

Differential Revision: https://reviews.llvm.org/D28992

llvm-svn: 293476
2017-01-30 12:25:03 +00:00
Kristof Beyls 65a12c012f [GlobalISel] Add support for indirectbr
Differential Revision: https://reviews.llvm.org/D28079

llvm-svn: 293470
2017-01-30 09:13:18 +00:00
Asaf Badouh 53713df0c2 [X86][MCU] replace select with bit manipulation instead of branches
Differential Revision: https://reviews.llvm.org/D28354


 

llvm-svn: 293469
2017-01-30 08:16:59 +00:00
Craig Topper f6df4a6978 [AVX-512] Remove duplicate CodeGenOnly patterns for scalar register broadcast. We can use COPY_TO_REGCLASS like AVX does.
This causes stack spill slots be oversized sometimes, but the same should already be happening with AVX.

llvm-svn: 293464
2017-01-30 06:59:06 +00:00
Craig Topper 0265a39472 [AVX-512] Remove KSET0B/KSET1B in favor of the patterns that select KSET0W/KSET1W for v8i1.
llvm-svn: 293458
2017-01-30 05:37:47 +00:00
Craig Topper 3b7e823f92 [AVX-512] Don't reuse VSHLI/VSRLI for mask register shifts. VSHLI/VSHRI shift within elements while KSHIFT moves whole elements.
llvm-svn: 293448
2017-01-30 00:06:01 +00:00
Chris Ray 30b3fafb94 [X86][Disassembler] Added SALC instruction
Reviewers: joe.abbey, craig.topper

Reviewed By: craig.topper

Subscribers: majnemer, llvm-commits

Differential Revision: https://reviews.llvm.org/D29201

llvm-svn: 293447
2017-01-29 23:02:47 +00:00
Craig Topper db919caf1b [AVX-512] Fix lowering for mask register concatenation with undef in the lower half.
Previously this test case fired an assertion in getNode because we tried to create an insert_subvector with both input types the same size and the index pointing to half the vector width.

llvm-svn: 293446
2017-01-29 22:53:33 +00:00
Chris Ray ba3741cb2b [X86] Fixing flag usage for RCL and RCR
Summary: The RCL and RCR instructions use the carry flag.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29237

llvm-svn: 293441
2017-01-29 20:05:30 +00:00
Simon Pilgrim 76073f8d22 [X86][SSE] Lower scalar_to_vector(0) to zero vector
Replaces an xor+movd/movq with an xorps which will be shorter in codesize, avoid an int-fpu transfer, allow modern cores to fast path the result during decode and helps other combines recognise an all-zero vector.

The only reason I can think of that we'd want to keep scalar_to_vector in this case is to help recognise the upper elts are undef but this doesn't seem to be a problem.

Differential Revision: https://reviews.llvm.org/D29097

llvm-svn: 293438
2017-01-29 18:13:37 +00:00
Saleem Abdulrasool 5282eed06c ARM: support `-mlong-calls` with AEABI TLS on ELF
Support lowering AEABI TLS access (__aeabi_read_tp) with long calls.
This requires adjusting the call sequence to use an indirect call to get
full addressability.

Resolves PR31769!

llvm-svn: 293433
2017-01-29 16:46:22 +00:00
Elena Demikhovsky 17fe27f1f2 [X86 Codegen] Fixed a bug in unsigned saturation
PACKUSWB converts Signed word to Unsigned byte, (the same about DW) and it can't be used for umin+truncate pattern.
AVX-512 VPMOVUS* instructions fit the pattern since they convert Unsigned to Unsigned.

See https://llvm.org/bugs/show_bug.cgi?id=31773

Differential Revision: https://reviews.llvm.org/D29196

llvm-svn: 293431
2017-01-29 13:18:30 +00:00
Igor Breger 9ea154d4ad [X86][GlobalISel] Add limited argument lowering support to the IRTranslator.
Summary:
Add limited (i8/i16/i32/i64)  argument lowering support to the IRTranslator.
Inspired by commit 289940.

Reviewers: t.p.northover, qcolombet, ab, zvi, rovka

Reviewed By: rovka

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D28987

llvm-svn: 293427
2017-01-29 08:35:42 +00:00
Justin Hibbits 10b6147e23 Add some Book-E instructions to the asm parser and printer.
Summary:
Adds the following instructions:
* mfpmr
* mtpmr
* icblc
* icblq
* icbtls

Fix the scheduling for mtspr on e5500, which uses CFX0, instead of
SFX0/SFX1 as on e500mc.

Addresses PR 31538.

Differential Revision: https://reviews.llvm.org/D29002

llvm-svn: 293417
2017-01-29 04:55:57 +00:00
Craig Topper 6533e40e9d [X86] Fix vector ANDN matching to work correctly when both inputs to the AND are XORs.
llvm-svn: 293403
2017-01-28 23:52:09 +00:00
Will Dietz 10294b932c AMDGPU: Add GlobalISel to required_libraries.
llvm-svn: 293387
2017-01-28 18:13:08 +00:00
Arpith Chacko Jacob 2b156edf56 [NVPTX] Add intrinsics to support named barriers.
Support for barrier synchronization between a subset of threads
in a CTA through one of sixteen explicitly specified barriers.
These intrinsics are not directly exposed in CUDA but are
critical for forthcoming support of OpenMP on NVPTX GPUs.

The intrinsics allow the synchronization of an arbitrary
(multiple of 32) number of threads in a CTA at one of 16
distinct barriers. The two intrinsics added are as follows:

call void @llvm.nvvm.barrier.n(i32 10)
waits for all threads in a CTA to arrive at named barrier #10.

call void @llvm.nvvm.barrier(i32 15, i32 992)
waits for 992 threads in a CTA to arrive at barrier #15.

Detailed description of these intrinsics are available in the PTX manual.
http://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions

Reviewers: hfinkel, jlebar
Differential Revision: https://reviews.llvm.org/D17657

llvm-svn: 293384
2017-01-28 16:38:15 +00:00
Matthias Braun 194ded551c Use print() instead of dump() in code
llvm-svn: 293371
2017-01-28 06:53:55 +00:00
Richard Trieu 3de487b2e8 [WebAssembly] Use print instead of dump method.
This fixes non-debug non-assert builds after r293359.

llvm-svn: 293368
2017-01-28 03:23:49 +00:00
Matthias Braun 8c209aa877 Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359
2017-01-28 02:02:38 +00:00
Eugene Zelenko e79c077ef9 [ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293348
2017-01-27 23:58:02 +00:00
Artem Tamazov 33b01e9cfe [AMDGPU][mc] Fix memory corruption uncovered by AddressSanitizer during coverage/smoke Gfx7/8 testing.
Coverage/smoke Gfx7/8 tests were committed r292922 but then reverted
by r292974 due to AddressSanitizer failure, which is fixed by this patch.
Tests to be re-committed soon.

llvm-svn: 293338
2017-01-27 22:19:42 +00:00
Krzysztof Parzyszek 35ce5dac7f [Hexagon] Remove unused variable (and silence a warning)
llvm-svn: 293331
2017-01-27 20:40:14 +00:00
Tom Stellard 08efb7ebf6 AMDGPU/SI: Move some ISel helpers into utils so they can be shared with GISel
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye

Differential Revision: https://reviews.llvm.org/D29068

llvm-svn: 293321
2017-01-27 18:41:14 +00:00
Konstantin Zhuravlyov a304c83608 [AMDGPU] Grab MCSubtargetInfo from TargetMachine instead of constructing it
Differential Revision: https://reviews.llvm.org/D29224

llvm-svn: 293318
2017-01-27 18:32:40 +00:00
Chris Ray 535e7d1547 [X86] Adding FFREEP instruction.
Summary: Small change to get the FREEP instruction to decode properly.

Reviewers: craig.topper

Reviewed By: craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29193

llvm-svn: 293314
2017-01-27 18:02:53 +00:00
Matt Arsenault d8f7ea381f AMDGPU: Enable FeatureFlatForGlobal on Volcanic Islands
Accomplishes what r292982 was supposed to, which ended up
only really making the necessary test changes.

This should be applied to the 4.0 branch.

Patch by Vedran Miletić <vedran@miletic.net>

llvm-svn: 293310
2017-01-27 17:42:26 +00:00
Matt Arsenault 32b9600a7e NVPTX: Make NVPTXInferAddressSpaces preserve CFG
llvm-svn: 293308
2017-01-27 17:30:39 +00:00
Stanislav Mekhanoshin f6c1feb8c3 [AMDGPU] Turn AMDGPUUnifyMetadata back into module pass
With the adjustPassManager interface that is now possible to use
custom early module passes.

Differential Revision: https://reviews.llvm.org/D29189

llvm-svn: 293300
2017-01-27 16:38:10 +00:00
Simon Dardis ca74dd79e9 [mips] Recommit: "N64 static relocation model support"
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

This version corrects a "Conditional jump or move depends on uninitialised
value(s)" error detected by valgrind present in the original commit.

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293279
2017-01-27 11:36:52 +00:00
Saleem Abdulrasool 26c00e3700 ARM: fix vectorized division on WoA
The Windows on ARM target uses custom division for normal division as
the backend needs to insert division-by-zero checks.  However, it is
designed to only handle non-vectorized division.  ARM has custom
lowering for vectorized division as that can avoid loading registers
with the values and invoke a division routine for each one, preferring
to lower using NEON instructions.  Fall back to the custom lowering for
the NEON instructions if we encounter a vectorized division.

Resolves PR31778!

llvm-svn: 293259
2017-01-27 03:41:53 +00:00
NAKAMURA Takumi 0d299191d0 NVPTXCodeGen: Add IPO to libdeps, since r293189.
llvm-svn: 293256
2017-01-27 02:11:10 +00:00
Quentin Colombet 89dbea06f1 [ARM][LegalizerInfo] Specify the type of the opcode.
This is to fix the win7 bot that does not seem to be very
good at infering the type when it gets used in an initiliazer list.

llvm-svn: 293248
2017-01-27 01:30:46 +00:00
Quentin Colombet 24203cf997 [AArch64][LegalizerInfo] Specify the type of the opcode.
This is an attempt to fix the win7 bot that does not seem to be very
good at infering the type when it gets used in an initiliazer list.

llvm-svn: 293246
2017-01-27 01:13:30 +00:00
Quentin Colombet e15e460c05 Revert "[AArch64][LegalizerInfo] Specify the type of the initialization list."
This reverts commit r293238.
Even with that the win7 bot is still failing:
http://lab.llvm.org:8011/builders/lld-x86_64-win7/builds/3862

llvm-svn: 293245
2017-01-27 01:13:25 +00:00
Quentin Colombet 86fc8305ec [AArch64][LegalizerInfo] Specify the type of the initialization list.
This is an attempt to fix the win7 bot that does not seem to be very
good at infering the type.

llvm-svn: 293238
2017-01-27 00:39:03 +00:00
Eugene Zelenko e6cf4374b0 [ARM] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293229
2017-01-26 23:40:06 +00:00
Krzysztof Parzyszek d6c8e3c9ce [Hexagon] Require IPO library in Hexagon build
This should unbreak the Hexagon build bots.

llvm-svn: 293221
2017-01-26 23:03:22 +00:00
Krzysztof Parzyszek c8b943860f [Hexagon] Add Hexagon-specific loop idiom recognition pass
llvm-svn: 293213
2017-01-26 21:41:10 +00:00
Balaram Makam b73d2962ba [AArch64] Refine Kryo Machine Model
Summary: Refine floating point SQRT and DIV with accurate latency information.

Reviewers: mcrosier

Subscribers: aemerson, rengolin, llvm-commits

Differential Revision: https://reviews.llvm.org/D29191

llvm-svn: 293204
2017-01-26 20:10:41 +00:00
Sean Fertile 3c8c385a77 [PPC] cleanup of mayLoad/mayStore flags and memory operands.
1) Explicitly sets mayLoad/mayStore property in the tablegen files on load/store
   instructions.
2) Updated the flags on a number of intrinsics indicating that they write
    memory.
3) Added SDNPMemOperand flags for some target dependent SDNodes so that they
   propagate their memory operand

Review: https://reviews.llvm.org/D28818
llvm-svn: 293200
2017-01-26 18:59:15 +00:00
Stanislav Mekhanoshin 81598117b6 Replace addEarlyAsPossiblePasses callback with adjustPassManager
This change introduces adjustPassManager target callback giving a
target an opportunity to tweak PassManagerBuilder before pass
managers are populated.

This generalizes and replaces addEarlyAsPossiblePasses target
callback. In particular that can be used to add custom passes to
extension points other than EP_EarlyAsPossible.

Differential Revision: https://reviews.llvm.org/D28336

llvm-svn: 293189
2017-01-26 16:49:08 +00:00
Nirav Dave d32a421f75 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293184 which is failing in LTO builds

llvm-svn: 293188
2017-01-26 16:46:13 +00:00
Serge Rogatch e09ba748cf [XRay][Arm32] Reduce the portion of the stub and implement more staging for tail calls - in LLVM
Summary:
This patch provides more staging for tail calls in XRay Arm32 . When the logging part of XRay is ready for tail calls, its support in the core part of XRay Arm32 may be as easy as changing the number passed to the handler from 1 to 2.
Coupled patch:
- https://reviews.llvm.org/D28674

Reviewers: dberris, rengolin

Reviewed By: dberris

Subscribers: llvm-commits, iid_iunknown, aemerson, rengolin, dberris

Differential Revision: https://reviews.llvm.org/D28673

llvm-svn: 293185
2017-01-26 16:17:03 +00:00
Nirav Dave de6516c466 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
* Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293184
2017-01-26 16:02:24 +00:00
Rafael Espindola 82149a1aa9 Use shouldAssumeDSOLocal in classifyGlobalReference.
And teach shouldAssumeDSOLocal that ppc has no copy relocations.

The resulting code handle a few more case than before. For example, it
knows that a weak symbol can be resolved to another .o file, but it
will still be in the main executable.

llvm-svn: 293180
2017-01-26 15:02:31 +00:00
Simon Pilgrim 027bb453d9 [X86][SSE] Add support for combining ANDNP byte masks with target shuffles
llvm-svn: 293178
2017-01-26 14:31:12 +00:00
Simon Pilgrim 3057fd53f9 [X86][SSE] Pull out target shuffle resolve code into helper. NFCI.
Pulled out code that removed unused inputs from a target shuffle mask into a helper function to allow it to be reused in a future commit.

llvm-svn: 293175
2017-01-26 13:06:02 +00:00
Valery Pykhtin 75d1de903f [AMDGPU] Fix typo in GCNSchedStrategy
Differential revision: https://reviews.llvm.org/D28980

llvm-svn: 293171
2017-01-26 10:51:47 +00:00
Simon Dardis 5b67a4f75f Revert "[mips] N64 static relocation model support"
This reverts commit r293164. There are multiple tests failing.

llvm-svn: 293170
2017-01-26 10:46:07 +00:00
Simon Dardis 09e65efd09 [mips] N64 static relocation model support
This patch makes one change to GOT handling and two changes to N64's
relocation model handling. Furthermore, the jumptable encodings have
been corrected for static N64.

Big GOT handling is now done via a new SDNode MipsGotHi - this node is
unconditionally lowered to an lui instruction.

The first change to N64's relocation handling is the lifting of the
restriction that N64 always uses PIC. Now it is possible to target static
environments.

The second change adds support for 64 bit symbols and enables them by
default. Previously N64 had patterns for sym32 mode only. In this mode all
symbols are assumed to have 32 bit addresses. sym32 mode support
is selectable with attribute 'sym32'. A follow on patch for clang will
add the necessary frontend parameter.

This partially resolves PR/23485.

Thanks to Brooks Davis for reporting the issue!

Reviewers: dsanders, seanbruno, zoran.jovanovic, vkalintiris

Differential Revision: https://reviews.llvm.org/D23652

llvm-svn: 293164
2017-01-26 10:19:02 +00:00
Diana Picus 278c722e6d [ARM] GlobalISel: Load i1, i8 and i16 args from stack
Add support for loading i1, i8 and i16 arguments from the stack, with or without
the ABI extension flags.

When the ABI extension flags are present, we load a 4-byte value, otherwise we
preserve the size of the load and let the instruction selector replace it with a
LDRB/LDRH. This generates the same thing as DAGISel.

Differential Revision: https://reviews.llvm.org/D27803

llvm-svn: 293163
2017-01-26 09:20:47 +00:00
Craig Topper bad53cce26 [AVX-512] Move the combine that runs combineBitcastForMaskedOp to the last DAG combine phase where I had originally meant to put it.
llvm-svn: 293157
2017-01-26 07:17:58 +00:00
Craig Topper f0bab7b739 [X86] When bitcasting INSERT_SUBVECTOR/EXTRACT_SUBVECTOR to match masked operations, use the correct type for the immediate operand.
llvm-svn: 293156
2017-01-26 07:17:53 +00:00