Commit Graph

21938 Commits

Author SHA1 Message Date
Nirav Dave 4442667fc5 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting after fixing X86 inc/dec chain bug.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 293893
2017-02-02 14:39:42 +00:00
Matthias Braun 9dc3b5ff89 RegisterCoalescer: Cleanup joinReservedPhysReg(); NFC
- Factor out a common subexpression
- Add some helpful comments
- Fix printing of a register in a debug message

llvm-svn: 293856
2017-02-02 02:23:27 +00:00
Paul Robinson 5362216c36 Remove an assertion that doesn't hold when mixing -g and -gmlt through
LTO.  Replace it with a related assertion, ensuring that abstract
variables appear only in abstract scopes.
Part of PR31437.

Differential Revision: http://reviews.llvm.org/D29430

llvm-svn: 293841
2017-02-01 23:51:56 +00:00
Dehao Chen 0944a8c2ec Change debug-info-for-profiling from a TargetOption to a function attribute.
Summary: LTO requires the debug-info-for-profiling to be a function attribute.

Reviewers: echristo, mehdi_amini, dblaikie, probinson, aprantl

Reviewed By: mehdi_amini, dblaikie, aprantl

Subscribers: aprantl, probinson, ahatanak, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29203

llvm-svn: 293833
2017-02-01 22:45:09 +00:00
Paul Robinson a380e613f1 Remove an assertion that doesn't hold when mixing -g and -gmlt through
LTO.  Part of PR31437.

Differential Revision: http://reviews.llvm.org/D29310

llvm-svn: 293818
2017-02-01 21:54:50 +00:00
Sanjoy Das 08da2e28ee [ImplicitNullCheck] Extend canReorder scope
Summary:
This change allows a re-order of two intructions if their uses
are overlapped.

Patch by Serguei Katkov!

Reviewers: reames, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29120

llvm-svn: 293775
2017-02-01 16:04:21 +00:00
Florian Hahn 7a5ec55fb3 [legalizetypes] Push fp16 -> fp32 extension node to worklist.
Summary:
This way, the type legalization machinery will take care of registering
the result of this node properly.

This patches fixes all failing fp16 test cases  with expensive checks.
(CodeGen/ARM/fp16-promote.ll, CodeGen/ARM/fp16.ll, CodeGen/X86/cvt16.ll
CodeGen/X86/soft-fp.ll) 


Reviewers: t.p.northover, baldrick, olista01, bogner, jmolloy, davidxl, ab, echristo, hfinkel

Reviewed By: hfinkel

Subscribers: mehdi_amini, hfinkel, davide, RKSimon, aemerson, llvm-commits

Differential Revision: https://reviews.llvm.org/D28195

llvm-svn: 293765
2017-02-01 13:01:33 +00:00
Evandro Menezes 94edf02923 [CodeGen] Move MacroFusion to the target
This patch moves the class for scheduling adjacent instructions,
MacroFusion, to the target.

In AArch64, it also expands the fusion to all instructions pairs in a
scheduling block, beyond just among the predecessors of the branch at the
end.

Differential revision: https://reviews.llvm.org/D28489

llvm-svn: 293737
2017-02-01 02:54:34 +00:00
Sanjoy Das 15e50b510e [ImplicitNullCheck] NFC isSuitableMemoryOp cleanup
Summary:
isSuitableMemoryOp method is repsonsible for verification
that instruction is a candidate to use in implicit null check.
Additionally it checks that base register is not re-defined before.
In case base has been re-defined it just returns false and lookup
is continued while any suitable instruction will not succeed this check
as well. This results in redundant further operations.

So when we found that base register has been re-defined we just
stop.

Patch by Serguei Katkov!

Reviewers: reames, sanjoy

Reviewed By: sanjoy

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29119

llvm-svn: 293736
2017-02-01 02:49:25 +00:00
Stanislav Mekhanoshin 70c245e92d Fix regalloc assignment of overlapping registers
SplitEditor::defFromParent() can create a register copy.
If register is a tuple of other registers and not all lanes are used
a copy will be done on a full tuple regardless. Later register unit
for an unused lane will be considered free and another overlapping
register tuple can be assigned to a different value even though first
register is live at that point. That is because interference only look at
liveness info, while full register copy clobbers all lanes, even unused.

This patch fixes copy to only cover used lanes.

Differential Revision: https://reviews.llvm.org/D29105

llvm-svn: 293728
2017-02-01 01:18:36 +00:00
Kyle Butt b15c06677c CodeGen: Allow small copyable blocks to "break" the CFG.
When choosing the best successor for a block, ordinarily we would have preferred
a block that preserves the CFG unless there is a strong probability the other
direction. For small blocks that can be duplicated we now skip that requirement
as well, subject to some simple frequency calculations.

Differential Revision: https://reviews.llvm.org/D28583

llvm-svn: 293716
2017-01-31 23:48:32 +00:00
Tim Northover c6bfa481cf GlobalISel: the translation of an invoke must branch to the good block.
Otherwise bad things happen if the basic block order isn't trivial after an
invoke.

llvm-svn: 293679
2017-01-31 20:12:18 +00:00
Matthias Braun 01fa962226 InterleaveAccessPass: Avoid constructing invalid shuffle masks
Fix a bug where we would construct shufflevector instructions addressing
invalid elements.

Differential Revision: https://reviews.llvm.org/D29313

llvm-svn: 293673
2017-01-31 18:37:53 +00:00
Tim Northover 293f74355b GlobalISel: merge invoke and call translation paths.
Well, sort of. But the lower-level code that invoke used to be using completely
botched the handling of varargs functions, which hopefully won't be possible if
they're using the same code.

llvm-svn: 293670
2017-01-31 18:36:11 +00:00
Nirav Dave a7c041d147 [X86] Implement -mfentry
Summary: Insert calls to __fentry__ at function entry.

Reviewers: hfinkel, craig.topper

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D28000

llvm-svn: 293648
2017-01-31 17:00:27 +00:00
Nicolai Haehnle 8813d5d221 [DAGCombine] require UnsafeFPMath for re-association of addition
Summary:
The affected transforms all implicitly use associativity of addition,
for which we usually require unsafe math to be enabled.

The "Aggressive" flag is only meant to convey information about the
performance of the fused ops relative to a fmul+fadd sequence.

Fixes Bug 31626.

Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD

Subscribers: jholewinski, nemanjai, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D28675

llvm-svn: 293635
2017-01-31 14:35:37 +00:00
Keno Fischer 578cf7aae7 [ExecutionDepsFix] Improve clearance calculation for loops
Summary:
In revision rL278321, ExecutionDepsFix learned how to pick a better
register for undef register reads, e.g. for instructions such as
`vcvtsi2sdq`. While this revision improved performance on a good number
of our benchmarks, it unfortunately also caused significant regressions
(up to 3x) on others. This regression turned out to be caused by loops
such as:

PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
      ^                                  |
      +----------------------------------+

In the previous version of the clearance calculation, we would visit
the blocks in order, remembering for each whether there were any
incoming backedges from blocks that we hadn't processed yet and if
so queuing up the block to be re-processed. However, for loop structures
such as the above, this is clearly insufficient, since the block B
does not have any unknown backedges, so we do not see the false
dependency from the previous interation's Def of xmm registers in B.

To fix this, we need to consider all blocks that are part of the loop
and reprocess them one the correct clearance values are known. As
an optimization, we also want to avoid reprocessing any later blocks
that are not part of the loop.

In summary, the iteration order is as follows:
Before: PH A B C D A'
Corrected (Naive): PH A B C D A' B' C' D'
Corrected (w/ optimization): PH A B C A' B' C' D

To facilitate this optimization we introduce two new counters for each
basic block. The first counts how many of it's predecssors have
completed primary processing. The second counts how many of its
predecessors have completed all processing (we will call such a block
*done*. Now, the criteria to reprocess a block is as follows:
    - All Predecessors have completed primary processing
    - For x the number of predecessors that have completed primary
      processing *at the time of primary processing of this block*,
      the number of predecessors that are done has reached x.

The intuition behind this criterion is as follows:
We need to perform primary processing on all predecessors in order to
find out any direct defs in those predecessors. When predecessors are
done, we also know that we have information about indirect defs (e.g.
in block B though that were inherited through B->C->A->B). However,
we can't wait for all predecessors to be done, since that would
cause cyclic dependencies. However, it is guaranteed that all those
predecessors that are prior to us in reverse postorder will be done
before us. Since we iterate of the basic blocks in reverse postorder,
the number x above, is precisely the count of the number of predecessors
prior to us in reverse postorder.

Reviewers: myatsina
Differential Revision: https://reviews.llvm.org/D28759

llvm-svn: 293571
2017-01-30 23:37:03 +00:00
Tim Northover 2bf8c9d381 GlobalISel: correctly translate invoke when callee is a register.
This should fix the GlobalISel verifier.

llvm-svn: 293550
2017-01-30 21:45:21 +00:00
Tim Northover c944970484 GlobalISel: account for differing exception selector sizes.
For some reason the exception selector register must be a pointer (that's
assumed by SDag); on the other hand, it gets moved into an IR-level type which
might be entirely different (i32 on AArch64). IRTranslator needs to be aware of
this.

llvm-svn: 293546
2017-01-30 20:52:42 +00:00
Tim Northover c94d70336b GlobalISel: tidy up def/use test. NFC.
llvm-svn: 293545
2017-01-30 20:52:37 +00:00
Tim Northover 79f43f195c GlobalISel: translate memset & memmove.
llvm-svn: 293541
2017-01-30 19:33:07 +00:00
Tim Northover 480609d0f3 GlobalISel: permit unused vregs without a register-class after ISel.
This can happen if earlier combining has removed all uses of some VReg, which
is fine and shouldn't flag an error.

llvm-svn: 293537
2017-01-30 19:12:50 +00:00
Simon Pilgrim ffe2535cf6 Use SelectionDAG::getBuildVector helper function where possible. NFCI.
llvm-svn: 293532
2017-01-30 18:53:45 +00:00
Justin Bogner 8f520a73b2 SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced
Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.

Fixes llvm.org/PR31710

llvm-svn: 293522
2017-01-30 18:29:46 +00:00
Simon Pilgrim 0a5ab5c4db Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI.
llvm-svn: 293520
2017-01-30 18:20:42 +00:00
Matt Arsenault 32e6bfa20f DAG: Fold fneg into compare with constant into the constant
fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred)

InstCombine already does this.

llvm-svn: 293512
2017-01-30 17:57:28 +00:00
David Blaikie a66696f210 unique_ptrify some containers in GlobalISel::RegisterBankInfo
To simplify/clarify memory ownership, make leaks (as one was found/fixed
recently) harder to write, etc.

(also, while I was there - removed a duplicate lookup in a container)

llvm-svn: 293506
2017-01-30 17:13:56 +00:00
Matt Arsenault 0c687390fe DAG: Constant fold fp16_to_fp/fp16_to_fp
This fixes emitting conversions of constants on targets
without legal f16 that need to use these for legalization.

llvm-svn: 293499
2017-01-30 16:57:41 +00:00
Kristof Beyls 65a12c012f [GlobalISel] Add support for indirectbr
Differential Revision: https://reviews.llvm.org/D28079

llvm-svn: 293470
2017-01-30 09:13:18 +00:00
Matthias Braun a4976c6166 MachineInstr: Remove parameter from dump()
The primary use of the dump() functions in LLVM is for use in a
debugger. Unfortunately lldb does not seem to handle default arguments
so using `p SomeMI.dump()` fails and you have to type the longer `p
SomeMI.dump(nullptr)`. Remove the paramter to make the most common use
easy. (You can always construct something like `p
SomeMI.print(dbgs(),MyTII)` if you need more features).

Differential Revision: https://reviews.llvm.org/D29241

llvm-svn: 293440
2017-01-29 18:20:42 +00:00
Craig Topper 135da1faf5 [SelectionDAG] Make SDNode::getConstantOperandVal an inline method.
It's operation already exists manually in many places without using the method.

llvm-svn: 293421
2017-01-29 06:08:02 +00:00
Craig Topper 4753736abf [DAGCombiner] Use unsigned for a constant vector index instead of APInt.
The type system requires that the number of vector elements should fit in 32-bits so this should be safe.

llvm-svn: 293414
2017-01-29 04:38:21 +00:00
Craig Topper d15730902b [DAGCombiner] Remove unnecessary check on the size of the type of the index of EXTRACT_SUBVECTOR.
The type system already requires that the number of vector elements must fit in 32-bits so an index should as well. Even if the type of the index were larger all we care about is that the constant index can fit in 64-bits so that we can call getZExtValue.

llvm-svn: 293413
2017-01-29 04:38:19 +00:00
Craig Topper 24cdbe8fa6 [DAGCombiner] Make sure index of EXTRACT_SUBVECTOR is a constant before trying to use getConstantOperandVal.
llvm-svn: 293412
2017-01-29 04:38:16 +00:00
Xinliang David Li fd3f645f9d Add support to dump dot graph block layout after MBP
Differential Revision: https://reviews.llvm.org/D29141

llvm-svn: 293408
2017-01-29 01:57:02 +00:00
Matthias Braun 194ded551c Use print() instead of dump() in code
llvm-svn: 293371
2017-01-28 06:53:55 +00:00
Quentin Colombet 8cf1163c4f [RegisterBankInfo] Emit proper type for remapped registers.
When the OperandsMapper creates virtual registers, it used to just create
plain scalar register with the right size. This may confuse the
instruction selector because we lose the information of the instruction
using those registers what supposed to do. The MachineVerifier complains
about that already.

With this patch, the OperandsMapper still creates plain scalar register,
but the expectation is for the mapping function to remap the type
properly. The default mapping function has been updated to do that.

rdar://problem/30231850

llvm-svn: 293362
2017-01-28 02:23:48 +00:00
Matthias Braun 8c209aa877 Cleanup dump() functions.
We had various variants of defining dump() functions in LLVM. Normalize
them (this should just consistently implement the things discussed in
http://lists.llvm.org/pipermail/cfe-dev/2014-January/034323.html

For reference:
- Public headers should just declare the dump() method but not use
  LLVM_DUMP_METHOD or #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
- The definition of a dump method should look like this:
  #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  LLVM_DUMP_METHOD void MyClass::dump() {
    // print stuff to dbgs()...
  }
  #endif

llvm-svn: 293359
2017-01-28 02:02:38 +00:00
Quentin Colombet 351099022a [RegisterCoalescing] Recommit the patch "Remove partial redundent copy".
In r292621, the recommit fixes a bug related with live interval update
after the partial redundent copy is moved.

This recommit solves an additional bug related to the lack of update of
subranges.

The original patch is to solve the performance problem described in
PR27827. Register coalescing sometimes cannot remove a copy because of
interference. But if we can find a reverse copy in one of the predecessor
block of the copy, the copy is partially redundent and we may remove the
copy partially by moving it to the predecessor block without the
reverse copy.

Differential Revision: https://reviews.llvm.org/D28585

Re-apply r292621

Revert "Revert rL292621. Caused some internal build bot failures in apple."

This reverts commit r292984.

Original patch: Wei Mi <wmi@google.com>
Subrange fix: Mostly Matthias Braun <matze@braunis.de>

llvm-svn: 293353
2017-01-28 01:05:27 +00:00
Evgeniy Stepanov d0852873e5 Fix memory leak in globalisel.
#0 0x89cdeb in operator new[](unsigned long) /code/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:84:37
    #1 0x4ec87c4 in llvm::RegisterBankInfo::ValueMapping const* llvm::RegisterBankInfo::getOperandsMapping<llvm::RegisterBankInfo::ValueMapping const* const*>(llvm::RegisterBankInfo::ValueMapping const* const*, llvm::RegisterBankInfo::ValueMapping const* const*) const /code/llvm/lib/CodeGen/GlobalISel/RegisterBankInfo.cpp:297:9
    #2 0x9327ee in llvm::AArch64RegisterBankInfo::getInstrMapping(llvm::MachineInstr const&) const /code/llvm/lib/Target/AArch64/AArch64RegisterBankInfo.cpp:540:30
    #3 0x4eb8d07 in llvm::RegBankSelect::assignInstr(llvm::MachineInstr&) /code/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp:546:24
    #4 0x4eb9dd2 in llvm::RegBankSelect::runOnMachineFunction(llvm::MachineFunction&) /code/llvm/lib/CodeGen/GlobalISel/RegBankSelect.cpp:624:12
    #5 0x3141875 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /code/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:13
    #6 0x396128d in llvm::FPPassManager::runOnFunction(llvm::Function&) /code/llvm/lib/IR/LegacyPassManager.cpp:1513:27
    #7 0x3961832 in llvm::FPPassManager::runOnModule(llvm::Module&) /code/llvm/lib/IR/LegacyPassManager.cpp:1534:16
    #8 0x3962540 in runOnModule /code/llvm/lib/IR/LegacyPassManager.cpp:1590:27
    #9 0x3962540 in llvm::legacy::PassManagerImpl::run(llvm::Module&) /code/llvm/lib/IR/LegacyPassManager.cpp:1693
    #10 0x8ae368 in compileModule(char**, llvm::LLVMContext&) /code/llvm/tools/llc/llc.cpp:562:8
    #11 0x8a7a1b in main /code/llvm/tools/llc/llc.cpp:316:22

llvm-svn: 293351
2017-01-28 00:46:30 +00:00
Tim Northover 12bd22fbee GlobalISel: don't leak super-entry BB when merging with IR-level one.
We have to delete the block manually or it leaks. That triggers failures in
-fsanitize=leak bots (unsurprisingly), which should be fixed by this patch.

llvm-svn: 293347
2017-01-27 23:54:31 +00:00
Tim Northover d8b85584f2 GlobalISel: set correct regclass for LOAD_STACK_GUARD.
Since it's not actually a generic MI, its register operands need a RegClass,
which is conveniently the target's pointer RegClass.

llvm-svn: 293335
2017-01-27 21:31:24 +00:00
Tim Northover c9bc8a5580 GlobalISel: mark incoming landing-pad registers as live.
Should fix machine verifier failures.

llvm-svn: 293334
2017-01-27 21:31:17 +00:00
Matthias Braun c91e28af4b ScheduleDAGInstrs: Do not try to toggle kill flags on debug uses
Preparation for upcoming changes. No testcase as none of the public
targets bundles early enough and has a post machine scheduler enabled at
the same time. The error is also easily catched by asserts.

llvm-svn: 293324
2017-01-27 18:53:07 +00:00
Matthias Braun 26e8c350f9 ScheduleDAGInstrs: Cleanup toggleKillFlag(); NFC
llvm-svn: 293323
2017-01-27 18:53:05 +00:00
Matthias Braun bd7d91838e ScheduleDAGInstrs: Cleanup; NFC
Comment, doxygen and a bit of whitespace cleanup.

llvm-svn: 293322
2017-01-27 18:53:00 +00:00
Jun Bum Lim b99a06b7c9 [CodeGenPrep]No negative cost in the ExtLd promotion
Summary: This change prevent the signed value of cost from being negative as the value is passed as an unsigned argument.

Reviewers: mcrosier, jmolloy, qcolombet, javed.absar

Reviewed By: mcrosier, qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28871

llvm-svn: 293307
2017-01-27 17:16:37 +00:00
Jonas Paulsson bb0ed3e732 [DAGTypeLegalizer] Handle SIGN/ZERO_EXTEND in WidenVecRes_Convert().
In case of a SIGN/ZERO_EXTEND of an incomplete vector type (using only a
partial number of available vector elements), WidenVecRes_Convert() used to
resort to scalarization.

This patch adds a handling of the (common) case where an input vector can be
found of same width as the widened result vector, by converting the node to
SIGN/ZERO_EXTEND_VECTOR_INREG.

Review: Eli Friedman
llvm-svn: 293268
2017-01-27 07:46:26 +00:00
Tim Northover 09aac4ad2a GlobalISel: support debug intrinsics.
The translation scheme is mostly cribbed from FastISel, and it's not entirely
convincing semantically. But it does seem to work in the common cases and allow
variables to be printed so it can't be all wrong.

llvm-svn: 293228
2017-01-26 23:39:14 +00:00
Andrew Kaylor a0a1164ce4 Add intrinsics for constrained floating point operations
This commit introduces a set of experimental intrinsics intended to prevent
optimizations that make assumptions about the rounding mode and floating point
exception behavior.  These intrinsics will later be extended to specify
flush-to-zero behavior.  More work is also required to model instruction
dependencies in machine code and to generate these instructions from clang
(when required by pragmas and/or command line options that are not currently
supported).

Differential Revision: https://reviews.llvm.org/D27028

llvm-svn: 293226
2017-01-26 23:27:59 +00:00