Commit Graph

19336 Commits

Author SHA1 Message Date
Craig Topper a87b40051d [X86] Add an additional CHECK prefix to a test. Some of the cases used it, but it wasn't on the FileCheck command lines.
llvm-svn: 296283
2017-02-26 06:45:32 +00:00
David L. Jones d95c34abe1 [X86] Clean up test/CodeGen/X86/2006-03-02-InstrSchedBug.ll
Summary:
Migrated from grep to FileCheck.
Re-indented code, removed boilerplate comments.
Added 'entry' label at beginning of basic block.

Patch by Jorge Gorbe!

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30320

llvm-svn: 296280
2017-02-26 01:32:35 +00:00
Nirav Dave 73cd0194cf Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.

llvm-svn: 296279
2017-02-26 01:27:32 +00:00
Craig Topper 2caa97c891 [AVX-512] Fix the execution domain for scalar FMA instructions.
llvm-svn: 296271
2017-02-25 19:36:28 +00:00
Craig Topper 176f3310b6 [AVX-512] Fix the execution domain on some instructions.
llvm-svn: 296270
2017-02-25 19:18:11 +00:00
Craig Topper 0524035fb4 [AVX-512] Add an additional test case to show the execution domain for vrqsrtsd is wrong.
llvm-svn: 296269
2017-02-25 19:18:08 +00:00
Craig Topper 9b43d459bf [AVX-512] Use update_llc_test_checks.py to regenerate the avx512er intrinsic test.
llvm-svn: 296268
2017-02-25 19:18:04 +00:00
Nirav Dave 4a20711826 reenable accidentally disabled test NFC.
llvm-svn: 296266
2017-02-25 19:11:53 +00:00
Craig Topper 3b8aca2ecf [ExecutionDepsFix] Don't make copies of LiveReg objects when collecting operands for soft instructions
Summary:
While collecting operands we make copies of the LiveReg objects which are stored in the LiveRegs array. If the instruction uses the same register multiple times we end up with multiple copies. Later we iterate through the collected list of LiveReg objects and merge DomainValues. In the process of doing this the merge function can change the contents of the original LiveReg object in the LiveRegs array, but not the copies that have been made. So when we get to the second usage of the register we end up seeing a stale copy of the LiveReg object.

To fix this I've stopped copying and now just store a pointer to the original LiveReg object. Another option might be to avoid adding the same register to the Regs array twice, but this approach seemed simpler.

The included test case exposes this bug due to an AVX-512 masked OR instruction using the same register for the passthru operand and one of the inputs to the OR operation.

Fixes PR30284.

Reviewers: RKSimon, stoklund, MatzeB, spatel, myatsina

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30242

llvm-svn: 296260
2017-02-25 18:12:25 +00:00
Amaury Sechet 09ecd3117e Update various test's codegen. NFC
llvm-svn: 296257
2017-02-25 16:46:47 +00:00
Amaury Sechet 7bda8cdef2 Add test for known bits in uaddo and saddo.
llvm-svn: 296255
2017-02-25 15:58:34 +00:00
Artyom Skrobov 2716910caf The automatic CHECK: to CHECK-LABEL: conversion, back in 2013,
had missed most labels in this test because they didn't end
with a colon.

llvm-svn: 296254
2017-02-25 15:17:16 +00:00
Nirav Dave beabf456df In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 296252
2017-02-25 11:43:58 +00:00
Dan Gohman 82607f56bd [WebAssembly] Add support for using a wasm global for the stack pointer.
This replaces the __stack_pointer variable which was allocated in linear
memory.

llvm-svn: 296201
2017-02-24 23:46:05 +00:00
Krzysztof Parzyszek 0d67b10a3c [Hexagon] Undo shift folding where it could simplify addressing mode
For example, avoid (single shift):
  r0 = and(##536870908,lsr(r0,#3))
  r0 = memw(r1+r0<<#0)

in favor of (two shifts):
  r0 = lsr(r0,#5)
  r0 = memw(r1+r0<<#2)

llvm-svn: 296196
2017-02-24 23:34:24 +00:00
Dan Gohman d934cb8806 [WebAssembly] Basic support for Wasm object file encoding.
With the "wasm32-unknown-unknown-wasm" triple, this allows writing out
simple wasm object files, and is another step in a larger series toward
migrating from ELF to general wasm object support. Note that this code
and the binary format itself is still experimental.

llvm-svn: 296190
2017-02-24 23:18:00 +00:00
Wei Ding 4d3d4ca1b3 AMDGPU : Replace FMAD with FMA when denormals are enabled.
Differential Revision: http://reviews.llvm.org/D29958

llvm-svn: 296186
2017-02-24 23:00:29 +00:00
Stanislav Mekhanoshin 42259cf35e Revert "Correct register pressure calculation in presence of subregs"
This reverts commit r296009. It broke one out of tree target and also
does not account for all partial lines added or removed when calculating
PressureDiff.

llvm-svn: 296182
2017-02-24 21:56:16 +00:00
Evgeniy Stepanov 00400d36c9 Disallow redefinition of section symbols.
Differential Revision: https://reviews.llvm.org/D30235

llvm-svn: 296180
2017-02-24 21:44:58 +00:00
Sanjay Patel ab08bb8da9 [ARM] add tests for alternate forms of select-of-constants; NFC
llvm-svn: 296178
2017-02-24 21:36:34 +00:00
Tim Northover ef29e7284b GlobalISel: check for CImm rather than Imm on G_CONSTANTs.
All G_CONSTANTS created by the MachineIRBuilder have an operand of type CImm
(i.e. a ConstantInt), so that's what the selector needs to look for.

llvm-svn: 296176
2017-02-24 21:21:38 +00:00
Sanjay Patel cd72f156d6 [ARM] auto-generate complete checks; NFC
The affected test may change with a patch I'm looking at for DAGCombiner,
so I want to make sure it's not a regression.

llvm-svn: 296175
2017-02-24 21:19:09 +00:00
Dan Gohman 6999c4fd28 [WebAssembly] Handle f16 in fast-isel.
llvm-svn: 296172
2017-02-24 21:05:35 +00:00
Michael Kuperstein 46b131e3f8 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296149
2017-02-24 18:41:32 +00:00
Nemanja Ivanovic 195c5452d3 [PowerPC] Use subfic instruction for subtract from immediate
Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate.
The corresponding pattern already exists for 32-bit integers.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29387

llvm-svn: 296144
2017-02-24 18:16:06 +00:00
Nemanja Ivanovic 82d53ed492 [PowerPC] Use rldicr instruction for AND with an immediate if possible
Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that
clear bits from the right hand size.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29388

llvm-svn: 296143
2017-02-24 18:03:16 +00:00
Sanjay Patel 832b1622d8 [DAGCombiner] add missing folds for scalar select of {-1,0,1}
The motivation for filling out these select-of-constants cases goes back to D24480, 
where we discussed removing an IR fold from add(zext) --> select. And that goes back to:
https://reviews.llvm.org/rL75531
https://reviews.llvm.org/rL159230

The idea is that we should always canonicalize patterns like this to a select-of-constants 
in IR because that's the smallest IR and the best for value tracking. Note that we currently 
do the opposite in some cases (like the cases in *this* patch). Ie, the proposed folds in 
this patch already exist in InstCombine today:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/InstCombine/InstCombineSelect.cpp#L1151

As this patch shows, most targets generate better machine code for simple ext/add/not ops 
rather than a select of constants. So the follow-up steps to make this less of a patchwork 
of special-case folds and missing IR canonicalization:

1. Have DAGCombiner convert any select of constants into ext/add/not ops.
2  Have InstCombine canonicalize in the other direction (create more selects).

Differential Revision: https://reviews.llvm.org/D30180

llvm-svn: 296137
2017-02-24 17:17:33 +00:00
Simon Dardis ae6f2bcb25 Recommit "[mips] Fix atomic compare and swap at O0."
This time with the missing files.

Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296134
2017-02-24 16:32:18 +00:00
Simon Dardis 3c58c18ff0 Revert "[mips] Fix atomic compare and swap at O0."
This reverts r296132. I forgot to include the tests.

llvm-svn: 296133
2017-02-24 16:30:27 +00:00
Simon Dardis cf0e06d375 [mips] Fix atomic compare and swap at O0.
Similar to PR/25526, fast-regalloc introduces spills at the end of basic
blocks. When this occurs in between an ll and sc, the store can cause the
atomic sequence to fail.

This patch fixes the issue by introducing more pseudos to represent atomic
operations and moving their lowering to after the expansion of postRA
pseudos.

This resolves PR/32020.

Thanks to James Cowgill for reporting the issue!

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30257

llvm-svn: 296132
2017-02-24 16:27:45 +00:00
Daniel Sanders 066ebbfd46 [globalisel] Decouple src pattern operands from dst pattern operands.
Summary:
This isn't testable for AArch64 by itself so this patch also adds
support for constant immediates in the pattern and physical
register uses in the result.

The new IntOperandMatcher matches the constant in patterns such as
'(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold
immediates into an instruction so this is the first rule that will match
across multiple BB's.

The Renderer hierarchy is responsible for adding operands to the result
instruction. Renderers can copy operands (CopyRenderer) or add physical
registers (in particular %wzr and %xzr) to the result instruction
in any order (OperandMatchers now import the operand names from
SelectionDAG to allow renderers to access any operand). This allows us to
emit the result instruction for:
  %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0
  %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0
although the latter is untested since the matcher/importer has not been
taught about commutativity yet.

Added BuildMIAction which can build new instructions and mutate them where
possible. W.r.t the mutation aspect, MatchActions are now told the name of
an instruction they can recycle and BuildMIAction will emit mutation code
when the renderers are appropriate. They are appropriate when all operands
are rendered using CopyRenderer and the indices are the same as the matcher.
This currently assumes that all operands have at least one matcher.

Finally, this change also fixes a crash in
AArch64InstructionSelector::select() caused by an immediate operand
passing isImm() rather than isCImm(). This was uncovered by the other
changes and was detected by existing tests.

Depends on D29711

Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar

Reviewed By: rovka

Subscribers: aemerson, dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D29712

llvm-svn: 296131
2017-02-24 15:43:30 +00:00
Diana Picus 3b99c64ba1 [ARM] GlobalISel: Select G_STORE
Same as selecting G_LOAD.

llvm-svn: 296122
2017-02-24 14:01:27 +00:00
Diana Picus b31a259198 Minor test fix
The test was using a size of 8 for loading/storing pointers. It should be 4.

llvm-svn: 296120
2017-02-24 13:27:55 +00:00
Diana Picus 1f432f995a [ARM] GlobalISel: Add reg bank mappings for stores
Same as the ones for loads.

llvm-svn: 296115
2017-02-24 13:07:25 +00:00
Diana Picus a2b632a353 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

llvm-svn: 296108
2017-02-24 11:28:24 +00:00
Diana Picus c21d1e5d94 Revert "[ARM] GlobalISel: Legalize stores"
This reverts commit r296103 because the test broke on one of the bots. Sorry!

llvm-svn: 296104
2017-02-24 10:35:39 +00:00
Diana Picus a5f1cfd1a7 [ARM] GlobalISel: Legalize stores
Allow the same types that we allow for loads.

llvm-svn: 296103
2017-02-24 10:19:23 +00:00
Craig Topper f2529c188b [AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz intrinsics with select.
Clang has been emitting cltz intrinsics for a while now.

llvm-svn: 296091
2017-02-24 05:35:04 +00:00
Craig Topper dc13344150 [AVX-512] Move lzcnt and conflict intrinsic tests to avx512cd intrinsic test file since that's their feature.
llvm-svn: 296090
2017-02-24 05:34:59 +00:00
Craig Topper 700edc3275 [AVX-512] Use update_llc_test_checks.py to generate a test.
llvm-svn: 296089
2017-02-24 05:34:57 +00:00
Petr Hosek a7d5916308 [Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stack
The Fuchsia ABI defines slots from the thread pointer where the
stack-guard value for stack-protector, and the unsafe stack pointer
for safe-stack, are stored. This parallels the Android ABI support.

Patch by Roland McGrath

Differential Revision: https://reviews.llvm.org/D30237

llvm-svn: 296081
2017-02-24 03:10:10 +00:00
Eli Friedman 7e0ce82c4a Add some testcases for bitfields with illegal widths.
clang will generate IR like this for input using packed bitfields;
very simple semantically, but it's a bit tricky to actually
generate good code.

llvm-svn: 296080
2017-02-24 03:04:11 +00:00
Eli Friedman 8f34746c72 Fix old testcase for dead store to match the original intent.
The x86 backend has a special case for load+xor+store, which isn't really
what this is trying to test.

llvm-svn: 296077
2017-02-24 02:58:49 +00:00
Adam Nemet f373e68bfc [LazyMachineBFI] Add testcase
This is based on Justin's testcase and checking whether BFI is not populated
in case hotness is off.

This is a patch meant on top of Justin's patch to enable Machine opt-remarks
in the
AsmPrinter (http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170130/426595.html)

Differential Revision: https://reviews.llvm.org/D29837

llvm-svn: 296065
2017-02-24 01:22:55 +00:00
Michael Kuperstein 581c9f4b20 Revert r269060 to pacify bots.
llvm-svn: 296064
2017-02-24 01:22:19 +00:00
Justin Bogner d75fd0988d OptDiag: Add test for r296053
Forgot to commit this with the change.

llvm-svn: 296061
2017-02-24 01:13:09 +00:00
Michael Kuperstein 12e79d5002 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296060
2017-02-24 00:56:21 +00:00
Artem Belevich 620db1f3dd [NVPTX] Added support for .f16x2 instructions.
This patch enables support for .f16x2 operations.

Added new register type Float16x2.
Added support for .f16x2 instructions.
Added handling of vectorized loads/stores of v2f16 values.

Differential Revision: https://reviews.llvm.org/D30057
Differential Revision: https://reviews.llvm.org/D30310

llvm-svn: 296032
2017-02-23 22:38:24 +00:00
Tim Northover 063a56e81c ARM: make sure FastISel bails on f64 operations for Cortex-M4.
FastISel wasn't checking the isFPOnlySP subtarget feature before emitting
double-precision operations, so it got completely invalid CodeGen for doubles
on Cortex-M4F.

The normal ISel testing wasn't spectacular either so I added a second RUN line
to improve that while I was in the area.

llvm-svn: 296031
2017-02-23 22:35:00 +00:00
Krzysztof Parzyszek 128e191eac [Hexagon] Handle saturations in Hexagon bit tracker
llvm-svn: 296026
2017-02-23 22:11:52 +00:00