Commit Graph

41160 Commits

Author SHA1 Message Date
Craig Topper cac328f25e [X86] Fix printing of sha256rnds2 to include the implicit %xmm0 argument.
llvm-svn: 294132
2017-02-05 18:33:31 +00:00
Craig Topper d7ae9ab1fa [X86] Fix printing of blendvpd/blendvps/pblendvb to include the implicit %xmm0 argument. This makes codegen output more obvious about the %xmm0 usage.
llvm-svn: 294131
2017-02-05 18:33:24 +00:00
Craig Topper 6a35a81fc5 [X86] In LowerTRUNCATE, create an ISD::VECTOR_SHUFFLE instead of explicitly creating a PSHUFB. This will be lowered by regular shuffle lowering to a PSHUFB later.
Similar was already done for several other shuffles in this function.

The test changes are because the old code used explicity zeroing for elements that could have been undef.

While I was here I also changed other shuffle vectors in the same function to use the same input twice instead of creating UNDEF nodes. getVectorShuffle can create the UNDEF for us.

llvm-svn: 294130
2017-02-05 18:33:14 +00:00
Daniel Sanders c3ac566754 [globalisel][arm] Tablegen-erate current Register Bank Information.
Summary:
This patch tablegen-erates the ARM register bank information so that the
static tables added in D27807 no longer need to be maintained.

Depends on D27338

Reviewers: t.p.northover, rovka, ab, qcolombet, aditya_nandakumar

Reviewed By: rovka

Subscribers: aemerson, rengolin, mgorny, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D28567

llvm-svn: 294124
2017-02-05 12:07:55 +00:00
Dylan McKay b78f36657e [AVR] Fix a bug where asm operands are printed twice
We would unconditionally call printOperand, even if PrintAsmOperand
already printed the immediate.

llvm-svn: 294121
2017-02-05 10:42:49 +00:00
Dylan McKay 7a3eb290ef [AVR] Support zero-sized arguments in defined methods
It is sufficient to skip emission of these arguments as we have nothing
to actually pass through the function call.

The AVR-GCC reference has nothing to say about zero-sized arguments,
presumably because C/C++ doesn't support them. This means we don't have
to worry about ABI differences.

llvm-svn: 294119
2017-02-05 09:53:45 +00:00
Craig Topper 978fdb75a4 [X86] Add support for folding (insert_subvector vec1, (extract_subvector vec2, idx1), idx1) -> (blendi vec2, vec1).
llvm-svn: 294112
2017-02-04 23:26:46 +00:00
Craig Topper 3d95228dbe [X86] Simplify the code that turns INSERT_SUBVECTOR into BLENDI. NFCI
llvm-svn: 294111
2017-02-04 23:26:42 +00:00
Eric Christopher b128abcf7a Remove a bunch of unnecessary casts to a target specific version of TII and TRI as we're working from a target specific STI.
llvm-svn: 294081
2017-02-04 01:52:17 +00:00
Eugene Zelenko 3f37f07c7f [Sparc] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294072
2017-02-04 00:36:49 +00:00
Eugene Zelenko cd8ea02b4a [Mips] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294069
2017-02-03 23:39:33 +00:00
Eugene Zelenko 06869c04f3 [SystemZ] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294068
2017-02-03 23:39:06 +00:00
Eugene Zelenko e894b4dc59 [AMDGPU] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294067
2017-02-03 23:38:40 +00:00
Eugene Zelenko 939f6b0167 [AArch64] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294053
2017-02-03 21:49:13 +00:00
Eugene Zelenko 07dc38f67a [ARM] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294052
2017-02-03 21:48:12 +00:00
Eugene Zelenko c3164b9c2f [XCore] Fix some Include What You Use warnings; other minor fixes (NFC).
This is preparation to reduce MCExpr.h dependencies.

llvm-svn: 294051
2017-02-03 21:46:55 +00:00
Matt Arsenault f15da6c419 AMDGPU: AsmParser cleanups
Use typedef, remove unnecessary enum, line wraps.

llvm-svn: 294039
2017-02-03 20:49:51 +00:00
Stanislav Mekhanoshin 81db53109d [AMDGPU] Bump -amdgpu-unroll-threshold-private to 2000
This has quite positive performance impact according to measurements.
Before previous fixes to limit the optimization that was too high
and blowed compile time and scratch usage, but now this is gone and
we can bump the threshold.

Differential Revision: https://reviews.llvm.org/D29505

llvm-svn: 294032
2017-02-03 20:08:29 +00:00
Matt Arsenault 1fa5eacf9d AMDGPU: Set MCAsmInfo::PointerSize
llvm-svn: 294031
2017-02-03 20:02:23 +00:00
Matt Arsenault d9cd736585 AMDGPU: Don't unroll for private with dynamic allocas
This won't be elimnated, so this will just bloat code
if/when these are ever used/supported.

llvm-svn: 294030
2017-02-03 19:36:00 +00:00
Simon Pilgrim 034c1bd32c [X86][SSE] Add support for combining scalar_to_vector(extract_vector_elt) into a target shuffle.
Correctly flagging upper elements as undef.

llvm-svn: 294020
2017-02-03 17:59:58 +00:00
Simon Dardis 68e9d94055 [mips] Remove absolute size assertion for end directive
The .end <symbol> directive for MIPS marks the end of a symbol and sets the
symbol's size. Previously, the corresponding emitDirective handler asserted
that a function's size could be evaluated to an absolute value at that point
in time.

This cannot be done with when directives like .align have been encountered,
instead set the function's size to the corresponding symbolic expression and
let ELFObjectWriter resolve the expression to an absolute value. This avoids
a redundant call to evaluateAsAbsolute.

llvm-svn: 294012
2017-02-03 15:48:53 +00:00
Justin Lebar e90c468444 [NVPTX] Enable combineRepeatedFPDivisors for NVPTX.
Reviewers: tra

Subscribers: jholewinski, llvm-commits

Differential Revision: https://reviews.llvm.org/D29477

llvm-svn: 294011
2017-02-03 15:13:50 +00:00
Artem Tamazov 43b61561b0 [AMDGPU][mc] Fix AddressSanitizer leftover issue in gfx7_asm_all test
Issue occurs when assembling "ds_ordered_count v0, v0 gds".

llvm-svn: 294004
2017-02-03 12:47:30 +00:00
Sanne Wouda a994185757 [ARM] Change TCReturn to tBL if tailcall optimization fails.
Summary:
The tail call optimisation is performed before register allocation, so
at that point we don't know if LR is being spilt or not. If LR was spilt
to the stack, then we cannot do a tail call optimisation. That would
involve popping back into LR which is not possible in Thumb1 code.

Reviewers: rengolin, jmolloy, rovka, olista01

Reviewed By: olista01

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D29020

llvm-svn: 294000
2017-02-03 11:15:53 +00:00
Stanislav Mekhanoshin f29602df65 [AMDGPU] Unroll preferences improvements
Exit loop analysis early if suitable private access found.
Do not account for GEPs which are invariant to loop induction variable.
Do not account for Allocas which are too big to fit into register file anyway.
Add option for tuning: -amdgpu-unroll-threshold-private.

Differential Revision: https://reviews.llvm.org/D29473

llvm-svn: 293991
2017-02-03 02:20:05 +00:00
Matt Arsenault e1b595306d AMDGPU: Fold fneg into fmin/fmax_legacy
llvm-svn: 293972
2017-02-03 00:51:50 +00:00
Craig Topper bbb2b95ce5 [X86] Mark 256-bit and 512-bit INSERT_SUBVECTOR operations as legal and remove the custom lowering.
llvm-svn: 293969
2017-02-03 00:24:49 +00:00
Matt Arsenault 2511c031de AMDGPU: Fold fneg into fminnum/fmaxnum
llvm-svn: 293968
2017-02-03 00:23:15 +00:00
Matt Arsenault a8fcfadf46 AMDGPU: Check if users of fneg can fold mods
In multi-use cases this can save a few instructions.

llvm-svn: 293962
2017-02-02 23:21:23 +00:00
Eugene Zelenko fbd13c5c12 [X86] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293949
2017-02-02 22:55:55 +00:00
Reid Kleckner 3c467e225e [X86] Avoid sorted order check in release builds
Effectively reverts r290248 and fixes the unused function warning with
ifndef NDEBUG.

llvm-svn: 293945
2017-02-02 22:06:30 +00:00
Craig Topper c45657375b [X86] Move turning 256-bit INSERT_SUBVECTORS into BLENDI from legalize to DAG combine.
On one test this seems to have given more chance for DAG combine to do other INSERT_SUBVECTOR/EXTRACT_SUBVECTOR combines before the BLENDI was created. Looks like we can still improve more by teaching DAG combine to optimize INSERT_SUBVECTOR/EXTRACT_SUBVECTOR with BLENDI.

llvm-svn: 293944
2017-02-02 22:02:57 +00:00
Reid Kleckner c35139ec0d [CodeGen] Remove dead call-or-prologue enum from CCState
This enum has been dead since Olivier Stannard re-implemented ARM byval
handling in r202985 (2014).

llvm-svn: 293943
2017-02-02 21:58:22 +00:00
Rafael Espindola 13a79bbfe5 Change how we handle section symbols on ELF.
On ELF every section can have a corresponding section symbol. When in
an assembly file we have

.quad .text

the '.text' refers to that symbol.

The way we used to handle them is to leave .text an undefined symbol
until the very end when the object writer would map them to the
actual section symbol.

The problem with that is that anything before the end would see an
undefined symbol. This could result in bad diagnostics
(test/MC/AArch64/label-arithmetic-diags-elf.s), or incorrect results
when using the asm streamer (est/MC/Mips/expansion-jal-sym-pic.s).

Fixing this will also allow using the section symbol earlier for
setting sh_link of SHF_METADATA sections.

This patch includes a few hacks to avoid changing our behaviour when
handling conflicts between section symbols and other symbols. I
reported pr31850 to track that.

llvm-svn: 293936
2017-02-02 21:26:06 +00:00
Javed Absar bb8dcc6aec [ARM] Classification Improvements to ARM Sched-Model. NFCI.
This is the second in the series of patches to enable adding
of machine sched-models for ARM processors easier and compact.
This patch focuses on integer instructions and adds missing
sched definitions.

Reviewers: rovka, rengolin
Differential Revision: https://reviews.llvm.org/D29127

llvm-svn: 293935
2017-02-02 21:08:12 +00:00
Krzysztof Parzyszek d0d42f0ec8 [Hexagon] Adding opExtentBits and opExtentAlign to GPrel instructions
Patch by Colin LeMahieu.

llvm-svn: 293933
2017-02-02 20:35:12 +00:00
Michael Kuperstein e6d59fdca5 [X86] Add costs for non-AVX512 single-source permutation integer shuffles
Differential Revision: https://reviews.llvm.org/D29416

llvm-svn: 293932
2017-02-02 20:27:13 +00:00
Krzysztof Parzyszek e17b0bfb24 [Hexagon] Fix relocation kind for extended predicated calls
Patch by Sid Manning.

llvm-svn: 293931
2017-02-02 20:21:56 +00:00
Krzysztof Parzyszek 357b048666 [Hexagon] Remove A4_ext_* pseudo instructions
Patch by Colin LeMahieu.

llvm-svn: 293929
2017-02-02 19:58:22 +00:00
Krzysztof Parzyszek d67ab623f6 [Hexagon] Fix insertBranch for loops with multiple ENDLOOP instructions
llvm-svn: 293925
2017-02-02 19:36:37 +00:00
Dan Gohman b89f2d3d92 [WebAssembly] Add instruction definitions for drop and get/set_global.
llvm-svn: 293922
2017-02-02 19:29:44 +00:00
Nirav Dave 93f9d5ce04 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r293893 which is miscompiling lua on ARM and
bootstrapping for x86-windows.

llvm-svn: 293915
2017-02-02 18:24:55 +00:00
Simon Dardis 08ce5fb66b [mips] Expansion of BEQL and BNEL with immediate operands
Adds support for BEQL and BNEL macros with immediate operands.

Patch by: Srdjan Obucina

Reviewers: dsanders, zoran.jovanovic, vkalintiris, sdardis, obucina, seanbruno

Differential Revision: https://reviews.llvm.org/D17040

llvm-svn: 293905
2017-02-02 16:13:49 +00:00
Jonas Paulsson b7a2ef8375 [SystemZ] Add comment for ISD::FP_TO_UINT expansion.
(Copied from the fp-conv-10.ll test to SystemZISelLowering.cpp)

Review: Ulrich Weigand
llvm-svn: 293900
2017-02-02 15:42:14 +00:00
Krzysztof Parzyszek bc4dc9b4b9 [Hexagon] Emitting individual instructions without copying them
Patch by Colin LeMahieu.

llvm-svn: 293899
2017-02-02 15:32:26 +00:00
Krzysztof Parzyszek f65b8f14f4 [Hexagon] Rename TypeCOMPOUND to TypeCJ
llvm-svn: 293894
2017-02-02 15:03:30 +00:00
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Nirav Dave e14300e270 [X86,ISEL] Fix X86 increment chain dependence calculation
Merging Load-add-store pattern into a increment op previously dropped
the load's chain from the instructions dependence if the store is
chained to a TokenFactor.

llvm-svn: 293892
2017-02-02 14:39:26 +00:00
Diana Picus 32cd9b434c [ARM] GlobalISel: Lower pointer args and returns
It is important to change the ArgInfo's type from pointer to integer, otherwise
the CC assign function won't know what to do. Instead of hacking it up, we use
ComputeValueVTs and introduce some of the helpers that we will need later on for
lowering more complex types.

llvm-svn: 293889
2017-02-02 14:01:00 +00:00