Commit Graph

22 Commits

Author SHA1 Message Date
Nirav Dave a81682aad4 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r284151 which appears to be triggering a LTO
failures on Hexagon

llvm-svn: 284157
2016-10-13 20:23:25 +00:00
Nirav Dave 4b36957243 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after upstream changes.

   Simplify Consecutive Merge Store Candidate Search

   Now that address aliasing is much less conservative, push through
   simplified store merging search which only checks for parallel stores
   through the chain subgraph. This is cleaner as the separation of
   non-interfering loads/stores from the store-merging logic.

   Whem merging stores, search up the chain through a single load, and
   finds all possible stores by looking down from through a load and a
   TokenFactor to all stores visited. This improves the quality of the
   output SelectionDAG and generally the output CodeGen (with some
   exceptions).

   Additional Minor Changes:

       1. Finishes removing unused AliasLoad code
       2. Unifies the the chain aggregation in the merged stores across
       code paths
       3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
       4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

   This finishes the change Matt Arsenault started in r246307 and
   jyknight's original patch.

   Many tests required some changes as memory operations are now
   reorderable. Some tests relying on the order were changed to use
   volatile memory operations

   Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 284151
2016-10-13 19:20:16 +00:00
Nirav Dave e524f50882 Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r282600 due to test failues with MCJIT

llvm-svn: 282604
2016-09-28 16:37:50 +00:00
Nirav Dave e17e055b75 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Simplify Consecutive Merge Store Candidate Search

  Now that address aliasing is much less conservative, push through
  simplified store merging search which only checks for parallel stores
  through the chain subgraph. This is cleaner as the separation of
  non-interfering loads/stores from the store-merging logic.

  Whem merging stores, search up the chain through a single load, and
  finds all possible stores by looking down from through a load and a
  TokenFactor to all stores visited. This improves the quality of the
  output SelectionDAG and generally the output CodeGen (with some
  exceptions).

  Additional Minor Changes:

    1. Finishes removing unused AliasLoad code
    2. Unifies the the chain aggregation in the merged stores across
       code paths
    3. Re-add the Store node to the worklist after calling
       SimplifyDemandedBits.
    4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
       arbitrary, but seemed sufficient to not cause regressions in
       tests.

  This finishes the change Matt Arsenault started in r246307 and
  jyknight's original patch.

  Many tests required some changes as memory operations are now
  reorderable. Some tests relying on the order were changed to use
  volatile memory operations

  Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -
      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

    CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll -
      This test appears to work but no longer exhibits the spill
      behavior.

Reviewers: arsenm, hfinkel, tstellarAMD, nhaehnle, jyknight

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 282600
2016-09-28 15:50:43 +00:00
Simon Pilgrim ab7c46eccf [X86][SSE] Add SSE1 tests to make sure we don't merge loads on illegal types
llvm-svn: 279065
2016-08-18 13:41:26 +00:00
Simon Pilgrim c15d217831 [X86][SSE] Added support for combining target shuffles to (V)PSHUFD/VPERMILPD/VPERMILPS immediate permutes
This patch allows target shuffles to be combined to single input immediate permute instructions - (V)PSHUFD/VPERMILPD/VPERMILPS - allowing more general pattern matching than what we current do and improves the likelihood of memory folding compared to existing patterns which tend to reuse the input in multiple arguments.

Further permute instructions (V)PSHUFLW/(V)PSHUFHW/(V)PERMQ/(V)PERMPD may be added in the future but its proven tricky to create tests cases for them so far. (V)PSHUFLW/(V)PSHUFHW is already handled quite well in combineTargetShuffle so it may be that removing some of that code may allow us to perform more of the combining in one place without duplication.

Differential Revision: http://reviews.llvm.org/D21148

llvm-svn: 273999
2016-06-28 08:08:15 +00:00
Simon Pilgrim c49bd2ede0 [X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros
Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible.

llvm-svn: 264922
2016-03-30 20:52:24 +00:00
Simon Pilgrim 0110890a79 [X86][SSE] Added tests to ensure that consecutive loads including any/all volatiles are not combined
llvm-svn: 264225
2016-03-24 00:14:37 +00:00
Simon Pilgrim e9093adae0 [X86][AVX] Add shuffle masking support for EltsFromConsecutiveLoads
Add support for the case where we have a consecutive load (which must include the first + last elements) with a mixture of undef/zero elements. We load the vector and then apply a shuffle to clear the zero'd elements.

Differential Revision: http://reviews.llvm.org/D17297

llvm-svn: 261490
2016-02-21 19:15:48 +00:00
Simon Pilgrim 63b1ecab7d line endings fix
llvm-svn: 259992
2016-02-06 15:38:25 +00:00
Simon Pilgrim 9e369f2a51 [X86][SSE] Don't replace an existing 32-bit load with its duplicate
If we are already loading a single 32-bit float/integer then just reuse it.

Fix for regression in D16729

llvm-svn: 259991
2016-02-06 15:37:09 +00:00
Simon Pilgrim 7823fd2535 [X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.

This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....

llvm-svn: 259816
2016-02-04 19:27:51 +00:00
Simon Pilgrim 6788f33cf2 [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.

Differential Revision: http://reviews.llvm.org/D16729

llvm-svn: 259796
2016-02-04 16:12:56 +00:00
Simon Pilgrim 528e94e9a2 [X86][SSE] Added i686 target tests to make sure we are correctly loading consecutive entries as 64-bit integers
llvm-svn: 259794
2016-02-04 15:51:55 +00:00
Simon Pilgrim 5be17b6e3e [X86][AVX512] Add support for AVX512 VMOVD (load) shuffle decoding
llvm-svn: 259430
2016-02-01 23:04:05 +00:00
Simon Pilgrim f5c23ad3d7 [X86][AVX512] Add support for AVX512 VMOVSD/VMOVSS shuffle decoding
llvm-svn: 259427
2016-02-01 22:26:28 +00:00
Simon Pilgrim 025a3d857a [X86][AVX512] Add support for AVX512 VINSERTPS shuffle decoding
llvm-svn: 259420
2016-02-01 22:05:50 +00:00
Simon Pilgrim 068e38f7f4 [X86][SSE] Add AVX512 merge consecutive load tests
Add AVX512F/AVX512BW 512-bit tests.

Add AVX512F tests to existing 128/256-bit tests.

llvm-svn: 259410
2016-02-01 21:30:50 +00:00
Simon Pilgrim 46696ef93c [X86][SSE] Add zero element and general 64-bit VZEXT_LOAD support to EltsFromConsecutiveLoads
This patch adds support for trailing zero elements to VZEXT_LOAD loads (and checks that no zero elts occur within the consecutive load).

It also generalizes the 64-bit VZEXT_LOAD load matching to work for loads other than 2x32-bit loads.

After this patch it will also be easier to add support for other basic load patterns like 32-bit VZEXT_LOAD loads, PMOVZX and subvector load insertion.

Differential Revision: http://reviews.llvm.org/D16217

llvm-svn: 258798
2016-01-26 09:30:08 +00:00
Simon Pilgrim e74653b67a [X86][SSE] Add INSERTPS target shuffle combines.
As vector shuffles can only reference two inputs many (V)INSERTPS patterns end up being split over two targets shuffles.

This patch adds combines to attempt to combine (V)INSERTPS nodes with input/output nodes that are just zeroing out these additional vector elements.

Differential Revision: http://reviews.llvm.org/D16072

llvm-svn: 258205
2016-01-19 22:24:12 +00:00
Simon Pilgrim 7cdedc1e02 [X86][SSE] Added extra 'float3' consecutive load tests
llvm-svn: 257998
2016-01-16 19:53:33 +00:00
Simon Pilgrim cff8550121 [X86][SSE] Added more exhaustive merge consecutive load tests
llvm-svn: 257876
2016-01-15 09:52:50 +00:00