Commit Graph

707 Commits

Author SHA1 Message Date
Craig Topper 1cb0e5afb0 [Simplify] Add testcase to show that merging conditional stores for triangles is sensitive to the order of the branch targets on the conditional branches. NFC
llvm-svn: 300915
2017-04-20 22:57:36 +00:00
Sanjay Patel 33439f982b [InstCombine] morph an existing instruction instead of creating a new one
One potential way to make InstCombine (very slightly?) faster is to recycle instructions 
when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't
consider this an "InstSimplify". We could, however, make a new layer to house transforms 
like this if that makes InstCombine more manageable (just throwing out an idea; not sure 
how much opportunity is actually here).

Differential Revision: https://reviews.llvm.org/D31863

llvm-svn: 300067
2017-04-12 15:11:33 +00:00
Matt Arsenault f10061ec70 Add address space mangling to lifetime intrinsics
In preparation for allowing allocas to have non-0 addrspace.

llvm-svn: 299876
2017-04-10 20:18:21 +00:00
Sanjay Patel 2824927e8a [SimplifyCFG] auto-generate better checks; NFC
llvm-svn: 299825
2017-04-09 16:16:32 +00:00
Joerg Sonnenberger fa7367428a Split the SimplifyCFG pass into two variants.
The first variant contains all current transformations except
transforming switches into lookup tables. The second variant
contains all current transformations.

The switch-to-lookup-table conversion results in code that is more
difficult to analyze and optimize by other passes. Most importantly,
it can inhibit Dead Code Elimination. As such it is often beneficial to
only apply this transformation very late. A common example is inlining,
which can often result in range restrictions for the switch expression.

Changes in execution time according to LNT:
SingleSource/Benchmarks/Misc/fp-convert +3.03%
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk -11.20%
MultiSource/Benchmarks/Olden/perimeter/perimeter -10.43%
and a couple of smaller changes. For perimeter it also results 2.6%
a smaller binary.

Differential Revision: https://reviews.llvm.org/D30333

llvm-svn: 298799
2017-03-26 06:44:08 +00:00
Sanjay Patel caf369bd03 [SimplifyCFG] move tests for PR31028 from CGP
Hopefully, this will make sense with a forthcoming patch. If not, we can move these back.

llvm-svn: 297660
2017-03-13 19:59:14 +00:00
Peter Collingbourne 10c500ddc0 opt: Rename -default-data-layout flag to -data-layout and make it always override the layout.
There isn't much point in a flag that only works if the data layout is empty.

Differential Revision: https://reviews.llvm.org/D30014

llvm-svn: 295468
2017-02-17 17:36:52 +00:00
Peter Collingbourne 9421c2dc54 AssumptionCache: Disable the verifier by default, move it behind a hidden cl::opt and verify from releaseMemory().
This is a short term solution to the problem that many passes currently fail
to update the assumption cache. In the long term the verifier should not
be controllable with a flag. We should either fix all passes to correctly
update the assumption cache and enable the verifier unconditionally or
somehow arrange for the assumption list to be updated automatically by passes.

Differential Revision: https://reviews.llvm.org/D30003

llvm-svn: 295236
2017-02-15 21:10:09 +00:00
Peter Collingbourne 0609acc10d SimplifyCFG: Register cloned assume intrinsics with assumption cache when creating critical edge.
Differential Revision: https://reviews.llvm.org/D29976

llvm-svn: 295145
2017-02-15 03:01:11 +00:00
Taewook Oh 505a25aec5 [InstCombine] Merge DebugLoc when speculatively hoisting store instruction
Summary: Along with https://reviews.llvm.org/D27804, debug locations need to be merged when hoisting store instructions as well. Not sure if just dropping debug locations would make more sense for this case, but as the branch instruction will have at least different discriminator with the hoisted store instruction, I think there will be no difference in practice.

Reviewers: aprantl, andreadb, danielcdh

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29062

llvm-svn: 293372
2017-01-28 07:05:43 +00:00
Akira Hatanaka 4ec7b20ef6 [SimplifyCFG] Do not sink and merge inline-asm instructions.
Conservatively disable sinking and merging inline-asm instructions as doing so
can potentially create arguments that cannot satisfy the inline-asm constraints.

For example, SimplifyCFG used to do the following transformation:

(before)
if.then:
  %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 8)
  br label %if.end
if.else:
  %1 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 6)
  br label %if.end

(after)
  %.sink = select i1 %tobool, i32 6, i32 8
  %0 = call i32 asm "rorl $2, $0", "=&r,0,n"(i32 %r6, i32 %.sink)

This would result in a crash in the backend since only immediate integer operands
are permitted for constraint "n".

rdar://problem/30110806

Differential Revision: https://reviews.llvm.org/D29111

llvm-svn: 293025
2017-01-25 06:21:51 +00:00
Adrian Prantl 1eadba1c8c Renumber testcase metadata nodes after r290153.
This patch renumbers the metadata nodes in debug info testcases after
https://reviews.llvm.org/D26769. This is a separate patch because it
causes so much churn. This was implemented with a python script that
pipes the testcases through llvm-as - | llvm-dis - and then goes
through the original and new output side-by side to insert all
comments at a close-enough location.

Differential Revision: https://reviews.llvm.org/D27765

llvm-svn: 290292
2016-12-22 00:45:21 +00:00
Adrian Prantl bceaaa9643 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable, that makes upgrading the
old format unambiguous also for variables without DIExpressions.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 290153
2016-12-20 02:09:43 +00:00
Daniel Jasper aec2fa352f Revert @llvm.assume with operator bundles (r289755-r289757)
This creates non-linear behavior in the inliner (see more details in
r289755's commit thread).

llvm-svn: 290086
2016-12-19 08:22:17 +00:00
Adrian Prantl 73ec065604 Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289920 (again).
I forgot to implement a Bitcode upgrade for the case where a DIGlobalVariable
has not DIExpression. Unfortunately it is not possible to safely upgrade
these variables without adding a flag to the bitcode record indicating which
version they are.
My plan of record is to roll the planned follow-up patch that adds a
unit: field to DIGlobalVariable into this patch before recomitting.
This way we only need one Bitcode upgrade for both changes (with a
version flag in the bitcode record to safely distinguish the record
formats).

Sorry for the churn!

llvm-svn: 289982
2016-12-16 19:39:01 +00:00
Adrian Prantl 74a835cda0 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

This reapplies r289902 with additional testcase upgrades.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289920
2016-12-16 04:25:54 +00:00
Adrian Prantl 03c6d31a3b Revert "[IR] Remove the DIExpression field from DIGlobalVariable."
This reverts commit 289902 while investigating bot berakage.

llvm-svn: 289906
2016-12-16 01:00:30 +00:00
Adrian Prantl ce13935776 [IR] Remove the DIExpression field from DIGlobalVariable.
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the
best way to model this:

(1) The DIGlobalVariable should describe the source level variable,
    not how to get to its location.

(2) It makes it unsafe/hard to update the expressions when we call
    replaceExpression on the DIGLobalVariable.

(3) It makes it impossible to represent a global variable that is in
    more than one location (e.g., a variable with multiple
    DW_OP_LLVM_fragment-s).  We also moved away from attaching the
    DIExpression to DILocalVariable for the same reasons.

<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769

llvm-svn: 289902
2016-12-16 00:36:43 +00:00
Hal Finkel cb9f78e1c3 Make processing @llvm.assume more efficient by using operand bundles
There was an efficiency problem with how we processed @llvm.assume in
ValueTracking (and other places). The AssumptionCache tracked all of the
assumptions in a given function. In order to find assumptions relevant to
computing known bits, etc. we searched every assumption in the function. For
ValueTracking, that means that we did O(#assumes * #values) work in InstCombine
and other passes (with a constant factor that can be quite large because we'd
repeat this search at every level of recursion of the analysis).

Several of us discussed this situation at the last developers' meeting, and
this implements the discussed solution: Make the values that an assume might
affect operands of the assume itself. To avoid exposing this detail to
frontends and passes that need not worry about it, I've used the new
operand-bundle feature to add these extra call "operands" in a way that does
not affect the intrinsic's signature. I think this solution is relatively
clean. InstCombine adds these extra operands based on what ValueTracking, LVI,
etc. will need and then those passes need only search the users of the values
under consideration. This should fix the computational-complexity problem.

At this point, no passes depend on the AssumptionCache, and so I'll remove
that as a follow-up change.

Differential Revision: https://reviews.llvm.org/D27259

llvm-svn: 289755
2016-12-15 02:53:42 +00:00
Sanjoy Das 3336f681e3 [Verifier] Add verification for TBAA metadata
Summary:
This change adds some verification in the IR verifier around struct path
TBAA metadata.

Other than some basic sanity checks (e.g. we get constant integers where
we expect constant integers), this checks:

 - That by the time an struct access tuple `(base-type, offset)` is
   "reduced" to a scalar base type, the offset is `0`.  For instance, in
   C++ you can't start from, say `("struct-a", 16)`, and end up with
   `("int", 4)` -- by the time the base type is `"int"`, the offset
   better be zero.  In particular, a variant of this invariant is needed
   for `llvm::getMostGenericTBAA` to be correct.

 - That there are no cycles in a struct path.

 - That struct type nodes have their offsets listed in an ascending
   order.

 - That when generating the struct access path, you eventually reach the
   access type listed in the tbaa tag node.

Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D26438

llvm-svn: 289402
2016-12-11 20:07:15 +00:00
Sanjay Patel 534e270ae5 [SimplifyCFG] auto-generate better checks; NFC
llvm-svn: 287954
2016-11-25 21:12:39 +00:00
Sanjay Patel d1a147f9f4 [SimplifyCFG] auto-generate better checks; NFC
llvm-svn: 287953
2016-11-25 21:07:13 +00:00
Florian Hahn 77382be56b [simplifycfg][loop-simplify] Preserve loop metadata in 2 transformations.
insertUniqueBackedgeBlock in lib/Transforms/Utils/LoopSimplify.cpp now
propagates existing llvm.loop metadata to newly the added backedge.

llvm::TryToSimplifyUncondBranchFromEmptyBlock in lib/Transforms/Utils/Local.cpp
now propagates existing llvm.loop metadata to the branch instructions in the
predecessor blocks of the empty block that is removed.

Differential Revision: https://reviews.llvm.org/D26495

llvm-svn: 287341
2016-11-18 13:12:07 +00:00
Mandeep Singh Grang da99e33ae3 [llvm] Remove redundant --check-prefix=CHECK from tests
Reviewers: MatzeB, mcrosier, rengolin

Differential Revision: https://reviews.llvm.org/D25894

llvm-svn: 285003
2016-10-24 18:57:55 +00:00
Dehao Chen 018a3afa99 Ignore debug info when making optimization decisions in SimplifyCFG.
Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code are the same with or without debug info.

Reviewers: davidxl, mzolotukhin, jmolloy

Subscribers: aprantl, llvm-commits

Differential Revision: https://reviews.llvm.org/D25286

llvm-svn: 284415
2016-10-17 19:28:44 +00:00
Oliver Stannard fe4432b105 [SimplifyCFG] Don't lower complex ConstantExprs to lookup tables
Not all ConstantExprs can be represented by a global variable, for example most
pointer arithmetic other than addition of a constant, so we can't convert these
values from switch statements to lookup tables.

Differential Revision: https://reviews.llvm.org/D25550

llvm-svn: 284379
2016-10-17 12:00:24 +00:00
Sanjoy Das bc357e8fa3 [SimplifyCFG] Don't create PHI nodes for constant bundle operands
Summary:
Constant bundle operands may need to retain their constant-ness for
correctness.  I'll admit that this is slightly odd, but it looks like
SimplifyCFG already does this for things like @llvm.frameaddress and
@llvm.stackmap, so I suppose adding one more case is not a big deal.

It is possible to add a mechanism to denote bundle operands that need to
remain constants, but that's probably too complicated for the time
being.

Reviewers: jmolloy

Subscribers: mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D25502

llvm-svn: 284028
2016-10-12 18:15:33 +00:00
Oliver Stannard 4df1cc0b00 [ARM] Don't convert switches to lookup tables of pointers with ROPI/RWPI
With the ROPI and RWPI relocation models we can't always have pointers
to global data or functions in constant data, so don't try to convert switches
into lookup tables if any value in the lookup table would require a relocation.
We can still safely emit lookup tables of other values, such as simple
constants.

Differential Revision: https://reviews.llvm.org/D24462

llvm-svn: 283530
2016-10-07 08:48:24 +00:00
David Majnemer 8c03c1bade [SimplifyCFG] Correctly test for unconditional branches in GetCaseResults
GetCaseResults assumed that a terminator with one successor was an
unconditional branch.  This is not necessarily the case, it could be a
cleanupret.

Strengthen the check by querying whether or not the terminator is
exceptional.

llvm-svn: 283517
2016-10-07 01:38:35 +00:00
James Molloy 0efb96a8ee [SimplifyCFG] Update (AND) IR flags when CSE'ing instructions
We were updating metadata but not IR flags. Because we pick an arbitrary instruction to be the CSE candidate, it comes down to luck (50% or less chance) if this results in broken codegen or not, which is why PR30373 which is actually not the fault of the commit it was bisected down to.

Fixes PR30373.

llvm-svn: 281889
2016-09-19 08:23:08 +00:00
David Majnemer 08566532ae Add a test for r280191
llvm-svn: 281694
2016-09-16 02:43:36 +00:00
Peter Collingbourne d4135bbc30 DebugInfo: New metadata representation for global variables.
This patch reverses the edge from DIGlobalVariable to GlobalVariable.
This will allow us to more easily preserve debug info metadata when
manipulating global variables.

Fixes PR30362. A program for upgrading test cases is attached to that
bug.

Differential Revision: http://reviews.llvm.org/D20147

llvm-svn: 281284
2016-09-13 01:12:59 +00:00
James Molloy 104370ab37 [SimplifyCFG] Be even more conservative in SinkThenElseCodeToEnd
This should *actually* fix PR30244. This cranks up the workaround for PR30188 so that we never sink loads or stores of allocas.

The idea is that these should be removed by SROA/Mem2Reg, and any movement of them may well confuse SROA or just cause unwanted code churn. It's not ideal that the midend should be crippled like this, but that unwanted churn can really cause significant regressions in important workloads (tsan).

llvm-svn: 281162
2016-09-11 09:00:03 +00:00
James Molloy 18d96e8fa5 [SimplifyCFG] Harden up the profitability heuristic for block splitting during sinking
Exposed by PR30244, we will split a block currently if we think we can sink at least one instruction. However this isn't right - the reason we split predecessors is so that we can sink instructions that otherwise couldn't be sunk because it isn't safe to do so - stores, for example.

So, change the heuristic to only split if it thinks it can sink at least one non-speculatable instruction.

Should fix PR30244.

llvm-svn: 281160
2016-09-11 08:07:30 +00:00
Dehao Chen 87823f8e4d Remove debug info when hoisting instruction from then/else branch.
Summary: The hoisted instruction is executed speculatively. It could affect the debugging experience as user would see gdb go into code that may not be expected to execute. It will also affect sample profile accuracy by assigning incorrect frequency to source within then/else branch.

Reviewers: davidxl, dblaikie, chandlerc, kcc, echristo

Subscribers: mehdi_amini, probinson, eric_niebler, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D24164

llvm-svn: 280995
2016-09-08 21:53:33 +00:00
Hal Finkel ac5803ba91 [SimplifyCFG] Don't try to create metadata-valued PHIs
We can't create metadata-valued PHIs; don't try to do so when sinking.

I created a test case for this using the @llvm.type.test intrinsic, because it
takes a metadata parameter and does not have severe side effects (thus
SimplifyCFG is willing to otherwise sink it).

Previously, running the test case would crash with:

  Invalid use of metadata!
    %.sink = select i1 %flag, metadata <...>, metadata <0x4e45dc0>
  LLVM ERROR: Broken function found, compilation aborted!

llvm-svn: 280866
2016-09-07 21:38:22 +00:00
James Molloy ec905a62ae [SimplifyCFG] Update workaround for PR30188 to also include loads
I should have realised this the first time around, but if we're avoiding sinking stores where the operands come from allocas so they don't create selects, we also have to do the same for loads because SROA will be just as defective looking at loads of selected addresses as stores.

Fixes PR30188 (again).

llvm-svn: 280792
2016-09-07 08:40:20 +00:00
James Molloy bf1837d9c9 [SimplifyCFG] Check PHI uses more accurately
PR30292 showed a case where our PHI checking wasn't correct. We were checking that all values were used by the same PHI before deciding to sink, but we weren't checking that the incoming values for that PHI were what we expected. As a result, we had to bail out after block splitting which caused us to never reach a steady state in SimplifyCFG.

Fixes PR30292.

llvm-svn: 280790
2016-09-07 08:15:54 +00:00
Oliver Stannard ef38d53a7e [SimplifyCFG] Add test for sinking inline asm in if/else
This test code previously caused a failure in the module verifier,
because SimplifyCFG created this invalid instruction, which tries to
take the address of inline asm:
  %.sink = select i1 %1, i64 ()* asm "mov $0, #1", "=r", i64 ()* asm %"mov $0, #2", "=r"

This has been fixed recently, presumably by James Molloy's patches that
re-wrote and changed parts of SimplifyCFG, so this patch just adds a
regression test for it.

Differential Revision: https://reviews.llvm.org/D24231

llvm-svn: 280660
2016-09-05 13:49:26 +00:00
James Molloy f3cf2a494b [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280470
2016-09-02 07:29:00 +00:00
James Molloy 88cad7e5cf [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280364
2016-09-01 12:58:13 +00:00
James Molloy eec6df3193 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280351
2016-09-01 10:44:35 +00:00
James Molloy cacfc16109 Revert "[SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd"
This reverts commit r280216 - it caused buildbot failures.

llvm-svn: 280234
2016-08-31 13:16:52 +00:00
James Molloy 76c9d423a7 Revert "[SimplifyCFG] Handle tail-sinking of more than 2 incoming branches"
This reverts commit r280217. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280233
2016-08-31 13:16:45 +00:00
James Molloy 06a45483a1 Revert "[SimplifyCFG] Add a workaround to fix PR30188"
This reverts commit r280219. r280216 caused buildbot failures - backing out the entire chain.

llvm-svn: 280232
2016-08-31 13:16:36 +00:00
James Molloy 8a66a39cbf Revert "[SimplifyCFG] Fix bootstrap failure after r280220"
This reverts commit r280228. r280216 caused buildbot failures - backing out the entire sequence.

llvm-svn: 280231
2016-08-31 13:16:30 +00:00
James Molloy b7efa6c227 [SimplifyCFG] Fix bootstrap failure after r280220
We check that a sinking candidate is used by only one PHI node during our legality checks. However for instructions that are used by other sinking candidates our heuristic is less conservative. This can result in a candidate actually being illegal when we come to sink it because of how we sunk a predecessor. Do the used-by-only-one-PHI checks again during sinking to ensure we don't crash.

llvm-svn: 280228
2016-08-31 12:33:48 +00:00
James Molloy 171fdac7ce [SimplifyCFG] Add a workaround to fix PR30188
We're sinking stores, which is a good thing, but in the process creating selects for the store address operand, which SROA/Mem2Reg can't look through, which caused serious regressions.

The real fix is in SROA, which I'll be looking into.

llvm-svn: 280219
2016-08-31 10:46:45 +00:00
James Molloy c53b40b509 [SimplifyCFG] Handle tail-sinking of more than 2 incoming branches
This was a real restriction in the original version of SinkIfThenCodeToEnd. Now it's been rewritten, the restriction can be lifted.

As part of this, we handle a very common and useful case where one of the incoming branches is actually conditional. Consider:

   if (a)
     x(1);
   else if (b)
     x(2);

This produces the following CFG:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \    |  /
          [ end ]

[end] has two unconditional predecessor arcs and one conditional. The conditional refers to the implicit empty 'else' arc. This same pattern can also be caused by an empty default block in a switch.

We can't sink the call to x() down to end because no call to x() happens on the third incoming arc (assume that x() has sideeffects for the sake of argument; if something is safe to speculate we could indeed sink nevertheless but this cannot happen in the general case and causes many extra selects).

We are now able to detect this case and split off the unconditional arcs to a common successor:

         [if]
        /    \
      [x(1)] [if]
        |     | \
        |     |  \
        |  [x(2)] |
         \   /    |
     [sink.split] |
           \     /
           [ end ]

Now we can sink the call to x() into %sink.split. This can cause significant code simplification in many testcases.

llvm-svn: 280217
2016-08-31 10:46:33 +00:00
James Molloy 55bd04cd20 [SimplifyCFG] Change the algorithm in SinkThenElseCodeToEnd
r279460 rewrote this function to be able to handle more than two incoming edges and took pains to ensure this didn't regress anything.

This time we change the logic for determining if an instruction should be sunk. Previously we used a single pass greedy algorithm - sink instructions until one requires more than one PHI node or we run out of instructions to sink.

This had the problem that sinking instructions that had non-identical but trivially the same operands needed extra logic so we sunk them aggressively. For example:

    %a = load i32* %b          %d = load i32* %b
    %c = gep i32* %a, i32 0    %e = gep i32* %d, i32 1

Sinking %c and %e would naively require two PHI merges as %a != %d. But the loads are obviously equivalent (and maybe can't be hoisted because there is no common predecessor).

This is why we implemented the fairly complex function areValuesTriviallySame(), to look through trivial differences like this. However it's just not clever enough.

Instead, throw areValuesTriviallySame away, use pointer equality to check equivalence of operands and switch to a two-stage algorithm.

In the "scan" stage, we look at every sinkable instruction in isolation from end of block to front. If it's sinkable, we keep track of all operands that required PHI merging.

In the "sink" stage, we iteratively sink the last non-terminator in the source blocks. But when calculating how many PHIs are actually required to be inserted (to work out if we should stop or not) we remove any values that have already been sunk from the set of PHI-merges required, which allows us to be more aggressive.

This turns an algorithm with potentially recursive lookahead (looking through GEPs, casts, loads and any other instruction potentially not CSE'd) to two linear scans.

llvm-svn: 280216
2016-08-31 10:46:23 +00:00
James Molloy 923e98c232 [SimplifyCFG] Tail-merge calls with sideeffects
This was deliberately disabled during my rewrite of SinkIfThenToEnd to keep behaviour
at least vaguely consistent with the previous version and keep it as close to NFC as
I could.

There's no real reason not to merge sideeffect calls though, so let's do it! Small fixup
along the way to ensure we don't create indirect calls.

Should fix PR28964.

llvm-svn: 280215
2016-08-31 10:46:16 +00:00
James Molloy d13b1239e4 [SimplifyCFG] Properly CSE metadata in SinkThenElseCodeToEnd
This was missing, meaning the metadata in sunk instructions was potentially bogus and could cause miscompiles.

llvm-svn: 280072
2016-08-30 10:56:08 +00:00
David Majnemer e8fd5f9ffd [SimplifyCFG] Hoisting invalidates metadata
We forgot to remove optimization metadata when performing hosting during
FoldTwoEntryPHINode.

This fixes PR29163.

llvm-svn: 279980
2016-08-29 17:14:08 +00:00
Sanjay Patel 25475bcc0c [Constant] remove fdiv and frem from canTrap()
Assuming the default FP env, we should not treat fdiv and frem any differently in terms of
trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env.

This matches how we treat the fdiv/frem in IR with isSafeToSpeculativelyExecute() and in 
the backend after:
https://reviews.llvm.org/rL279970

llvm-svn: 279973
2016-08-29 15:27:17 +00:00
Sanjay Patel 20f02b271b [SimplifyCFG] rename test file, regenerate checks, and add test
The fdiv test shows a problem similar to:
https://reviews.llvm.org/rL279970

llvm-svn: 279972
2016-08-29 14:57:53 +00:00
James Molloy 5bf2114265 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
[Recommitting now an unrelated assertion in SROA is sorted out]

The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279460
2016-08-22 19:07:15 +00:00
James Molloy 475f4a763f Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279443. It caused buildbot failures.

llvm-svn: 279447
2016-08-22 18:13:12 +00:00
James Molloy 353052698a [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

Round 4. This time we should handle all instructions correctly, and not replace any operands that need to be constant with variables.

This was really hard to determine safely, so the helper function should be put into the Instruction API. I'll do that as a followup.

llvm-svn: 279443
2016-08-22 17:40:23 +00:00
Reid Kleckner 98a48afa5d Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r279229. It breaks intrinsic function calls in
diamonds.

llvm-svn: 279313
2016-08-19 20:22:39 +00:00
James Molloy 11a1936b70 [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 279229
2016-08-19 10:10:27 +00:00
Reid Kleckner 70a600b8bb Revert "[SimplifyCFG] Rewrite SinkThenElseCodeToEnd"
This reverts commit r278660.

It causes downstream assertion failure in InstCombine on shuffle
instructions. Comes up in __mm_swizzle_epi32.

llvm-svn: 278672
2016-08-15 15:42:31 +00:00
James Molloy 9a3c82f5cf [SimplifyCFG] Rewrite SinkThenElseCodeToEnd
The new version has several advantages:
  1) IMSHO it's more readable and neater
  2) It handles loads and stores properly
  3) It can handle any number of incoming blocks rather than just two. I'll be taking advantage of this in a followup patch.

With this change we can now finally sink load-modify-store idioms such as:

    if (a)
      return *b += 3;
    else
      return *b += 4;

    =>

    %z = load i32, i32* %y
    %.sink = select i1 %a, i32 5, i32 7
    %b = add i32 %z, %.sink
    store i32 %b, i32* %y
    ret i32 %b

When this works for switches it'll be even more powerful.

llvm-svn: 278660
2016-08-15 08:04:56 +00:00
Benjamin Kramer 000a87d1b0 Actually, r277337 was fine. Just kill the DAGs that made the test allow nondeterminism.
llvm-svn: 277821
2016-08-05 14:58:34 +00:00
Benjamin Kramer aa160c22f7 [SimplifyCFG] Make range reduction code deterministic.
This generated IR based on the order of evaluation, which is different
between GCC and Clang. With that in mind you get bootstrap miscompares
if you compare a Clang built with GCC-built Clang vs. Clang built with
Clang-built Clang. Diagnosing that made my head hurt.

This also reverts commit r277337, which "fixed" the test case.

llvm-svn: 277820
2016-08-05 14:55:02 +00:00
Simon Pilgrim 7fd4ad6849 Fixed test check ordering issue on windows buildbots
llvm-svn: 277337
2016-08-01 10:40:15 +00:00
James Molloy bade86cedc [SimplifyCFG] Fix nasty RAUW bug from r277325
Using RAUW was wrong here; if we have a switch transform such as:
  18 -> 6 then
  6 -> 0

If we use RAUW, while performing the second transform the  *transformed* 6
from the first will be also replaced, so we end up with:
  18 -> 0
  6 -> 0

Found by clang stage2 bootstrap; testcase added.

llvm-svn: 277332
2016-08-01 09:34:48 +00:00
James Molloy 91821bd0b4 [SimplifyCFG] Try and pacify buildbots after r277325
It looks like the two independent parts of the rotate operation (a lshr and shl) are being reordered on some bots. Add CHECK-DAGs to account for this.

llvm-svn: 277329
2016-08-01 08:09:55 +00:00
James Molloy b2e436de42 [SimplifyCFG] Range reduce switches
If a switch is sparse and all the cases (once sorted) are in arithmetic progression, we can extract the common factor out of the switch and create a dense switch. For example:

    switch (i) {
    case 5: ...
    case 9: ...
    case 13: ...
    case 17: ...
    }

can become:

    if ( (i - 5) % 4 ) goto default;
    switch ((i - 5) / 4) {
    case 0: ...
    case 1: ...
    case 2: ...
    case 3: ...
    }

or even better:

   switch ( ROTR(i - 5, 2) {
   case 0: ...
   case 1: ...
   case 2: ...
   case 3: ...
   }

The division and remainder operations could be costly so we only do this if the factor is a power of two, and emit a right-rotate instead of a divide/remainder sequence. Dense switches can be lowered significantly better than sparse switches and can even be transformed into lookup tables.

llvm-svn: 277325
2016-08-01 07:45:11 +00:00
David Majnemer e14e7bc4b8 Revert "[SimplifyCFG] Stop inserting calls to llvm.trap for UB"
This reverts commit r273778, it seems to break UBSan :/

llvm-svn: 273779
2016-06-25 08:19:55 +00:00
David Majnemer d346a37737 [SimplifyCFG] Stop inserting calls to llvm.trap for UB
SimplifyCFG had logic to insert calls to llvm.trap for two very
particular IR patterns: stores and invokes of undef/null.

While InstCombine canonicalizes certain undefined behavior IR patterns
to stores of undef, phase ordering means that this cannot be relied upon
in general.

There are much better tools than llvm.trap: UBSan and ASan.

N.B. I could be argued into reverting this change if a clear argument as
to why it is important that we synthesize llvm.trap for stores, I'd be
hard pressed to see why it'd be useful for invokes...

llvm-svn: 273778
2016-06-25 08:04:19 +00:00
David Majnemer 1fea77c6fc [SimplifyCFG] Replace calls to null/undef with unreachable
Calling null is undefined behavior, a call to undef can be trivially
treated as a call to null.

llvm-svn: 273776
2016-06-25 07:37:27 +00:00
Chuang-Yu Cheng 68f7f1cf00 Teaching SimplifyCFG to recognize the Or-Mask trick that InstCombine uses to
reduce the number of comparisons.

Specifically, InstCombine can turn:
  (i == 5334 || i == 5335)
into:
  ((i | 1) == 5335)

SimplifyCFG was already able to detect the pattern:
  (i == 5334 || i == 5335)
to:
  ((i & -2) == 5334)

This patch supersedes D21315 and resolves PR27555
(https://llvm.org/bugs/show_bug.cgi?id=27555).

Thanks to David and Chandler for the suggestions!

Author: Thomas Jablin (tjablin)
Reviewers: majnemer chandlerc halfdan cycheng

http://reviews.llvm.org/D21397

llvm-svn: 273639
2016-06-24 01:59:00 +00:00
Chuang-Yu Cheng dbe00d51b4 SimplifyCFG is able to detect the pattern:
(i == 5334 || i == 5335)
to:
    ((i & -2) == 5334)

This transformation has some incorrect side conditions. Specifically, the
transformation is only applied when the right-hand side constant (5334 in
the example) is a power of two not equal and not equal to the negated mask.
These side conditions were added in r258904 to fix PR26323. The correct side
condition is that: ((Constant & Mask) == Constant)[(5334 & -2) == 5334].

It's a little bit hard to see why these transformations are correct and what
the side conditions ought to be. Here is a CVC3 program to verify them for
64-bit values:
    ONE  : BITVECTOR(64) = BVZEROEXTEND(0bin1, 63);
    x    : BITVECTOR(64);
    y    : BITVECTOR(64);
    z    : BITVECTOR(64);
    mask : BITVECTOR(64) = BVSHL(ONE, z);
    QUERY( (y & ~mask = y) =>
           ((x & ~mask = y) <=> (x = y OR x = (y |  mask)))
    );

Please note that each pattern must be a dual implication (<--> or iff). One
directional implication can create spurious matches. If the implication is
only one-way, an unsatisfiable condition on the left side can imply a
satisfiable condition on the right side. Dual implication ensures that
satisfiable conditions are transformed to other satisfiable conditions and
unsatisfiable conditions are transformed to other unsatisfiable conditions.

Here is a concrete example of a unsatisfiable condition on the left
implying a satisfiable condition on the right:
    mask = (1 << z)
    (x & ~mask) == y --> (x == y || x == (y | mask))

Substituting y = 3, z = 0 yields:
    (x & -2) == 3 --> (x == 3 || x == 2)

The version of this code before r258904 had no side-conditions and
incorrectly justified itself in comments through one-directional
implication.

Thanks to Chandler for the suggestion!

Author: Thomas Jablin (tjablin)
Reviewers: chandlerc majnemer hfinkel cycheng

http://reviews.llvm.org/D21417

llvm-svn: 272873
2016-06-16 04:44:25 +00:00
David Majnemer 2482e1c017 [SimplifyCFG] Don't kill empty cleanuppads with multiple uses
A basic block could contain:
  %cp = cleanuppad []
  cleanupret from %cp unwind to caller

This basic block is empty and is thus a candidate for removal.  However,
there can be other uses of %cp outside of this basic block.  This is
only possible in unreachable blocks.

Make our transform more correct by checking that the pad has a single
user before removing the BB.

This fixes PR28005.

llvm-svn: 271816
2016-06-04 23:50:03 +00:00
David Majnemer 9f92f4c497 [SimplifyCFG] Remove cleanuppads which are empty except for calls to lifetime.end
A cleanuppad is not cheap, they turn into many instructions and result
in additional spills and fills.  It is not worth keeping a cleanuppad
around if all it does is hold a lifetime.end instruction.

N.B.  We first try to merge the cleanuppad with another cleanuppad to
avoid dropping the lifetime and debug info markers.

llvm-svn: 270314
2016-05-21 05:12:32 +00:00
Sanjay Patel 75892a1543 [SimplifyCFG] eliminate switch cases based on known range of switch condition
This was noted in PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766#c2

We may not know whether the sign bit(s) are zero or one, but we can still
optimize based on knowing that the sign bit is repeated.

Differential Revision: http://reviews.llvm.org/D20275

llvm-svn: 270222
2016-05-20 14:53:09 +00:00
Dehao Chen f16376b505 Follow-up patch of http://reviews.llvm.org/D19948 to handle missing profiles when simplifying CFG.
Summary: Set default branch weight to 1:1 if one of the branch has profile missing when simplifying CFG.

Reviewers: spatel, davidxl

Subscribers: danielcdh, llvm-commits

Differential Revision: http://reviews.llvm.org/D20307

llvm-svn: 269995
2016-05-18 22:41:03 +00:00
Sanjay Patel 399780f088 add test to show missing optimization
llvm-svn: 269601
2016-05-15 18:41:18 +00:00
Sanjay Patel ecdd13d788 regenerate checks
llvm-svn: 269596
2016-05-15 18:05:10 +00:00
Dehao Chen b76e5d948a Propagate branch metadata when some branch probability is missing.
Summary: In sample profile, some branches may have profile missing due to profile inaccuracy. We want existing branch probability still valid after propagation.

Reviewers: hfinkel, davidxl, spatel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19948

llvm-svn: 269137
2016-05-10 23:07:19 +00:00
Sanjay Patel 1cb6241a89 [SimplifyCFG] propagate branch metadata when creating select (retry r268550 / r268751 with possible fix)
Retrying r268550/r268751 which were reverted at r268577/r268765 due a memory sanitizer failure.
I have not been able to reproduce that failure, but I've taken another guess at fixing
the problem in this version of the patch and will watch for another failure.

Original commit message:
Unlike earlier similar fixes, we need to recalculate the branch weights
in this case.

Differential Revision: http://reviews.llvm.org/D19674

llvm-svn: 268767
2016-05-06 18:07:46 +00:00
Sanjay Patel 84a0bf64a8 revert r268751 - caused same failures on msan bot
llvm-svn: 268765
2016-05-06 17:51:37 +00:00
Sanjay Patel 6609510c32 [SimplifyCFG] propagate branch metadata when creating select (retry r268550 with possible fix)
Retrying r268550 which was reverted at r268577 due a memory sanitizer failure.
I have not been able to reproduce that failure, but I've taken a guess at fixing
the problem in this version of the patch and will watch for another failure.

Original commit message:
Unlike earlier similar fixes, we need to recalculate the branch weights
in this case.

Differential Revision: http://reviews.llvm.org/D19674

llvm-svn: 268751
2016-05-06 17:07:47 +00:00
Chad Rosier 4ab37c0037 [SimplifyCFG] Prefer a simplification based on a dominating condition.
Rather than merge two branches with a common destination.
Differential Revision: http://reviews.llvm.org/D19743

llvm-svn: 268735
2016-05-06 14:25:14 +00:00
Chad Rosier 25cfb7dbd6 [ValueTracking] Improve isImpliedCondition for matching LHS and Imm RHSs.
llvm-svn: 268636
2016-05-05 15:39:18 +00:00
Vitaly Buka fdcea9d78a Revert "[SimplifyCFG] propagate branch metadata when creating select"
MemorySanitizer: use-of-uninitialized-value
0x4910e47 in count /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/Support/MathExtras.h:159:12
0x4910e47 in countLeadingZeros<unsigned long> /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/Support/MathExtras.h:183
0x4910e47 in FitWeights /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:855
0x4910e47 in SimplifyCondBranchToCondBranch /mnt/b/sanitizer-buildbot2/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:2895

This reverts commit 609f4dd4bf3bc735c8c047a4d4b0a8e9e4d202e2.

llvm-svn: 268577
2016-05-04 23:59:33 +00:00
Sanjay Patel 7e8c285814 [SimplifyCFG] propagate branch metadata when creating select
Unlike earlier similar fixes, we need to recalculate the branch weights
in this case.

Differential Revision: http://reviews.llvm.org/D19674

llvm-svn: 268550
2016-05-04 20:48:24 +00:00
Hans Wennborg 0c3518e84b [SimplifyCFG] isSafeToSpeculateStore now ignores debug info
This patch fixes PR27615.

@llvm.dbg.value instructions no longer count towards the maximum number of
instructions to look back at in the instruction list when searching for a
store instruction. This should make the output consistent between debug and
non-debug build.

Patch by Henric Karlsson <henric.karlsson@ericsson.com>!

Differential Revision: http://reviews.llvm.org/D19912

llvm-svn: 268512
2016-05-04 15:40:57 +00:00
Reid Kleckner bca59d2a43 Revert "[SimplifyCFG] Extend TryToSimplifyUncondBranchFromEmptyBlock for empty block including lifetime intrinsics"
This reverts commit r268254.

This change causes assertion failures while building Chromium. Reduced
test case coming soon.

llvm-svn: 268288
2016-05-02 19:43:22 +00:00
Hans Wennborg b7599329fc [SimplifyCFG] Extend TryToSimplifyUncondBranchFromEmptyBlock for empty block including lifetime intrinsics
Make it possible that TryToSimplifyUncondBranchFromEmptyBlock merges empty
basic block including lifetime intrinsics as well as phi nodes and
unconditional branch into its successor or predecessor(s).

If successor of empty block has single predecessor, all contents including
lifetime intrinsics are sinked into the successor. Otherwise, they are
hoisted into its predecessor(s) and then merged into the predecessor(s).

Patch by Josh Yoon <josh.yoon@samsung.com>!

Differential Revision: http://reviews.llvm.org/D19257

llvm-svn: 268254
2016-05-02 17:22:54 +00:00
Sanjay Patel bc6fad0bdf add minimal test to show dropped metadata
llvm-svn: 268141
2016-04-30 00:12:54 +00:00
Sanjay Patel 6748ec49e9 remove the metadata added with r267827
We can demonstrate the 'select' bug and fix with a simpler test case.
The merged weight values are already tested in another test.

llvm-svn: 268139
2016-04-30 00:02:36 +00:00
Sanjay Patel 21bd38a07b Update test to use FileCheck
Also, add some metadata to show what that currently looks like.

llvm-svn: 267827
2016-04-28 00:29:27 +00:00
Sanjay Patel 29dea0d230 [SimplifyCFG] propagate branch metadata when creating select
llvm-svn: 267624
2016-04-26 23:15:48 +00:00
Hal Finkel e4c0c1679b [SimplifyCFG] Preserve !llvm.mem.parallel_loop_access when merging
When SimplifyCFG merges identical instructions from both sides of a diamond, it
can preserve !llvm.mem.parallel_loop_access (as it does with most of the other
metadata). There's no real data or control dependency change in this case.

llvm-svn: 267515
2016-04-26 02:06:06 +00:00
Sanjay Patel 82059090d3 Add check for "branch_weights" with prof metadata
While we're here, fix the comment and variable names to make it
clear that these are raw weights, not percentages.

llvm-svn: 267491
2016-04-25 23:15:16 +00:00
Chad Rosier bbabc85031 Fix typo from r267432.
llvm-svn: 267436
2016-04-25 18:20:27 +00:00
Chad Rosier 4c4e3336b8 [ValueTracking] Add an additional test case for r266767 where one operand is a const.
llvm-svn: 267432
2016-04-25 17:41:48 +00:00
Chad Rosier e2cbd13e56 [ValueTracking] Improve isImpliedCondition when the dominating cond is false.
llvm-svn: 267430
2016-04-25 17:23:36 +00:00
Chad Rosier 1a60159064 [SimplifyCFG] Add final missing implications to isImpliedTrueByMatchingCmp.
Summary: eq imply [u|s]ge and [u|s]le are true.

Remove redundant logic by implementing isImpliedFalseByMatchingCmp(Pred1, Pred2)
as isImpliedTrueByMatchingCmp(Pred1, getInversePredicate(Pred2)).

llvm-svn: 267177
2016-04-22 17:57:34 +00:00
Chad Rosier 3456cb5672 [SimplifyCFG] Add missing implications to isImpliedTrueByMatchingCmp.
Summary: [u|s]gt and [u|s]lt imply [u|s]ge and [u|s]le are true, respectively.
I've simplified the existing tests and added additional tests to cover the new
cases mentioned above.  I've also added tests for all the cases where the
first compare doesn't imply anything about the second compare.

llvm-svn: 267171
2016-04-22 17:14:12 +00:00
Chad Rosier 1960d13e29 [SimplifyCFG] Simplify code review by temporarily removing this test file.
A followup commit will replace these tests with simplified and more inclusive
tests.  The diff is unreadable if this were to be done in a single commit.

llvm-svn: 267170
2016-04-22 17:14:08 +00:00
Sanjoy Das 54a3a006ca [SimplifyCFG] Fold `llvm.guard(false)` to unreachable
Summary:
`llvm.guard(false)` always bails out of the current compilation unit, so
we can prune any control flow following it.

Reviewers: hfinkel, pcc, reames

Subscribers: majnemer, reames, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19245

llvm-svn: 266955
2016-04-21 05:09:12 +00:00
Mandeep Singh Grang 029a0567fa [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.
Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests.

Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier

Subscribers: mcrosier, dsanders

Differential Revision: http://reviews.llvm.org/D19279

llvm-svn: 266834
2016-04-19 23:51:52 +00:00
Chad Rosier b7dfbb40a3 [ValueTracking] Improve isImpliedCondition for conditions with matching operands.
This patch improves SimplifyCFG to catch cases like:

  if (a < b) {
    if (a > b) <- known to be false
      unreachable;
  }

Phabricator Revision: http://reviews.llvm.org/D18905

llvm-svn: 266767
2016-04-19 17:19:14 +00:00
Adrian Prantl 75819aedf6 [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.

Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.

Motivation
----------

Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.

We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.

Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.

http://reviews.llvm.org/D19034
<rdar://problem/25256815>

llvm-svn: 266446
2016-04-15 15:57:41 +00:00
Sanjay Patel f11ab05bdb [SimplifyCFG] propagate branch metadata when creating select (PR27344)
This is almost identical to:
http://reviews.llvm.org/rL264527

This doesn't solve PR27344; it just allows the profile weights to survive. 
To solve the bug, we need to use the profile weights in the backend.

llvm-svn: 266442
2016-04-15 15:32:12 +00:00
Sanjay Patel 81433e99b9 [SimplifyCFG] add metadata to show failure to propagate (PR27344)
llvm-svn: 266435
2016-04-15 14:53:35 +00:00
Adrian Prantl b8089516a5 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
llvm-svn: 265081
2016-04-01 00:16:49 +00:00
Adrian Prantl b939a25707 Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.

http://reviews.llvm.org/D18612
<rdar://problem/25427165>

llvm-svn: 265077
2016-03-31 23:56:58 +00:00
Davide Italiano 936a2b09f3 [DebugInfo] Subprograms should belong to a CU.
Start fixing tests accordingly. There are still
about 35 failures before we can enable this check
in the IR verifier.

llvm-svn: 264990
2016-03-31 03:40:07 +00:00
Adrian Prantl 4a09777b37 Upgrade some wildly anachronistic debug info in testcases.
llvm-svn: 264797
2016-03-29 22:34:30 +00:00
Hyojin Sung 4673f10568 [SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being
applied (e.g., transforming nested loops into simplified forms and loop vectorization).
    
The patch augments the existing PHI node-based check by adding a pre-test if the BB actually
belongs to a set of loop headers and not eliminating it if yes.

llvm-svn: 264697
2016-03-29 04:08:57 +00:00
Sanjay Patel 3e9664fd60 regenerate checks
llvm-svn: 264677
2016-03-28 22:12:21 +00:00
Reid Kleckner ba85781f58 Revert "[SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops"
This reverts commit r264596.

It does not compile.

llvm-svn: 264604
2016-03-28 18:07:40 +00:00
Hyojin Sung 0ada5b0d14 [SimlifyCFG] Prevent passes from destroying canonical loop structure, especially for nested loops
When eliminating or merging almost empty basic blocks, the existence of non-trivial PHI nodes
is currently used to recognize potential loops of which the block is the header and keep the block.
However, the current algorithm fails if the loops' exit condition is evaluated only with volatile
values hence no PHI nodes in the header. Especially when such a loop is an outer loop of a nested
loop, the loop is collapsed into a single loop which prevent later optimizations from being 
applied (e.g., transforming nested loops into simplified forms and loop vectorization).

The patch augments the existing PHI node-based check by adding a pre-test if the BB actually 
belongs to a set of loop headers and not eliminating it if yes. 

llvm-svn: 264596
2016-03-28 17:22:25 +00:00
Sanjay Patel 796db35f62 [SimplifyCFG] propagate branch metadata when creating select (PR26636)
llvm-svn: 264527
2016-03-26 23:30:50 +00:00
Sanjay Patel 342f7c7e10 minimize test cases
These are tests for store transforms. 
The loads, adds, and geps were irrelevant.

llvm-svn: 264526
2016-03-26 23:09:25 +00:00
Sanjay Patel 9e23fedaf0 propagate 'unpredictable' metadata on select instructions
This is similar to D18133 where we allowed profile weights on select instructions. 
This extends that change to also allow the 'unpredictable' attribute of branches to apply to selects.

A test to check that 'unpredictable' metadata is preserved when cloning instructions was checked in at:
http://reviews.llvm.org/rL263648

Differential Revision: http://reviews.llvm.org/D18220

llvm-svn: 263716
2016-03-17 15:30:52 +00:00
Sanjay Patel 3b32ebb97b use FileCheck for tighter checking
llvm-svn: 263679
2016-03-16 23:39:37 +00:00
Sanjay Patel b672e792f2 reduce check strings; no need to check IR comments
llvm-svn: 263675
2016-03-16 23:22:01 +00:00
Sanjay Patel 6cec10572f use FileCheck for tighter checking
I'm testing out a script that auto-generates the check lines.
It's 98% copied from utils/update_llc_test_checks.py.
If others think this is useful, please let me know.

llvm-svn: 263668
2016-03-16 22:34:57 +00:00
Sanjay Patel cb775fcf22 use FileCheck for tighter checking
I'm testing out a script that auto-generates the check lines.
It's 98% copied from utils/update_llc_test_checks.py.
If others think this is useful, please let me know.

llvm-svn: 263667
2016-03-16 22:29:07 +00:00
Sanjay Patel ee52b6e77d allow branch weight metadata on select instructions (PR26636)
As noted in:
https://llvm.org/bugs/show_bug.cgi?id=26636

This doesn't accomplish anything on its own. It's the first step towards preserving 
and using branch weights with selects.

The next step would be to make sure we're propagating the info in all of the other
places where we create selects (SimplifyCFG, InstCombine, etc). I don't think there's
an easy fix to make this happen; we have to look at each transform individually to 
determine how to correctly propagate the weights.

Along with that step, we need to then use the weights when making subsequent transform
decisions such as discussed in http://reviews.llvm.org/D16836.

The inliner test is independent but closely related. It verifies that metadata is
preserved when both branches and selects are cloned.

Differential Revision: http://reviews.llvm.org/D18133

llvm-svn: 263482
2016-03-14 20:18:59 +00:00
Sanjay Patel 610da4fbaf update test to use FileCheck
llvm-svn: 263347
2016-03-12 21:09:26 +00:00
David Majnemer ec72e37220 [SimplifyCFG] Do not blindly remove unreachable blocks
DeleteDeadBlock was called indiscriminately, leading to cleanuprets with
undef cleanuppad references.

Instead, try to drain the BB of most of it's instructions if it is
unreachable.  We can then remove the BB if it solely consists of a
terminator (and maybe some phis).

llvm-svn: 261731
2016-02-24 10:02:16 +00:00
David Majnemer 1efa23ddab [SimplifyCFG] Merge together cleanuppads
Cleanuppads may be merged together if one is the only predecessor of the
other in which case a simple transform can be performed: replace the
a cleanupret with a branch and remove an unnecessary cleanuppad.

Differential Revision: http://reviews.llvm.org/D17459

llvm-svn: 261390
2016-02-20 01:07:45 +00:00
Justin Lebar db63949e8d [SimplifyCFG] Don't fold conditional branches that contain calls to convergent functions.
Summary:
Performing this optimization duplicates the call to the convergent
function and adds new control-flow dependencies, which is a no-no.

Reviewers: jingyue

Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17128

llvm-svn: 260730
2016-02-12 21:01:36 +00:00
Gerolf Hoflehner 2432bd0ddd [SimplifyCFG] Fix for "endless" loop after dead code removal (Alternative to
D16251)

Summary:
This is a simpler fix to the problem than the dominator approach in
http://reviews.llvm.org/D16251. It adds only values into the gather() while loop
that have been seen before.

The actual endless loop is in the constant compare gather() routine in
Utils/SimplifyCFG.cpp. The same value ret.0.off0.i is pushed back into the
queue:
%.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i

Here is what happens at the IR level:

for.cond.i:                                       ; preds = %if.end6.i,
%if.end.i54
%ix.0.i = phi i32 [ 0, %if.end.i54 ], [ %inc.i55, %if.end6.i ]
%ret.0.off0.i = phi i1 [false, %if.end.i54], [%.ret.0.off0.i, %if.end6.i] <<<
%cmp2.i = icmp ult i32 %ix.0.i, %11
br i1 %cmp2.i, label %for.body.i, label %LBJ_TmpSimpleNeedExt.exit

if.end6.i:                                        ; preds = %for.body.i
%cmp10.i = icmp ugt i32 %conv.i, %add9.i
%.ret.0.off0.i = or i1 %ret.0.off0.i, %cmp10.i <<<

When if.end.i54 gets eliminated which removes the definition of ret.0.off0.i.
The result is the expression %.ret.0.off0.i = or i1 %.ret.0.off0.i, %cmp10.i
(Note the first ‘or’ operand is now %.ret.0.off0.i, and *NOT* %ret.0.off0.i).
And
now there is use of .ret.0.off0.i before a definition which triggers the
“endless” loop in gather():

while(!DFT.empty()) {

    V = DFT.pop_back_val();   // V is .ret.0.off0.i

    if (Instruction *I = dyn_cast<Instruction>(V)) {
      // If it is a || (or && depending on isEQ), process the operands.
      if (I->getOpcode() == (isEQ ? Instruction::Or : Instruction::And)) {
        DFT.push_back(I->getOperand(1));  // This is now .ret.0.off0.i also
        DFT.push_back(I->getOperand(0));

        continue; // “endless loop” for .ret.0.off0.i
      }

Reviewers: reames, ahatanak

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16839

llvm-svn: 259730
2016-02-03 23:54:25 +00:00
Evgeniy Stepanov e257f0f671 Tweak unnamed label syntax in textual IR for easier matching in tests.
Change the unnamed label comments like
  ; <label>:8  ; preds = %1
to
  ; <label>:8:  ; preds = %1

This way lit tests can match [[LABEL]]: in both asserts and no-asserts builds.

llvm-svn: 258993
2016-01-27 21:53:08 +00:00
Sanjay Patel 5264cc772c [SimplifyCFG] limit recursion depth when speculating instructions (PR26308)
This is a fix for:
https://llvm.org/bugs/show_bug.cgi?id=26308

With the switch to using the TTI cost model in:
http://reviews.llvm.org/rL228826
...it became possible to hit a zero-cost cycle of instructions (gep -> phi -> gep...), 
so we need a cap for the recursion in DominatesMergePoint().

A recursion depth parameter was already added for a different reason in:
http://reviews.llvm.org/rL255660
...so we can just set a limit for it.

I pulled "10" out of the air and made it an independent parameter that we can play with.
It might be higher than it needs to be given the currently low default value of 
PHINodeFoldingThreshold (2). That's the starting cost value that we enter the recursion
with, and most instructions have cost set to TCC_Basic (1), so I don't think we're going
to speculate more than 2 instructions with the current parameters.

As noted in the review and the TODO comment, we can do better than just limiting recursion
depth.

Differential Revision: http://reviews.llvm.org/D16637

llvm-svn: 258971
2016-01-27 19:22:45 +00:00
David Majnemer fccf5c6e01 Revert "Revert "[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)""
This reverts commit r258903 which reverted r255660.  r258903 was an
accidental commit and should not have been committed.

llvm-svn: 258905
2016-01-27 02:59:41 +00:00
David Majnemer c761afd1d1 [SimplifyCFG] Don't mistake icmp of and for a tree of comparisons
SimplifyCFG tries to turn complex branch conditions into a switch.
Some of it's logic attempts to reason about bitwise arithmetic produced
by InstCombine.  InstCombine can turn things like (X == 2) || (X == 3)
into (X & 1) == 2 and so SimplifyCFG tries to detect when this occurs so
that it can produce a switch instruction.

However, the legality checking was not sufficient to determine whether
or not this had occured.  Correctly check this case by requiring that
the right-hand side of the comparison be a power of two.

This fixes PR26323.

llvm-svn: 258904
2016-01-27 02:43:28 +00:00
David Majnemer 47de2140f7 Revert "[SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)"
This reverts commit r255660.

llvm-svn: 258903
2016-01-27 02:43:22 +00:00
Chen Li 1689c2f54b [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
Summary:
This is a fix of D13718. D13718 was committed but then reverted because of the following bug:
https://llvm.org/bugs/show_bug.cgi?id=25299

This patch fixes the issue shown in the bug.

Reviewers: majnemer, reames

Subscribers: jevinskie, llvm-commits

Differential Revision: http://reviews.llvm.org/D14308

llvm-svn: 257277
2016-01-10 05:48:01 +00:00
David Majnemer 59eb733af1 [SimplifyCFG] Further improve our ability to remove redundant catchpads
In r256814, we managed to remove catchpads which were trivially redudant
because they were the same SSA value.  We can do better using the same
algorithm but with a smarter datastructure by hashing the SSA values
within the catchpad and comparing them structurally.

llvm-svn: 256815
2016-01-05 07:42:17 +00:00
David Majnemer 2fa8651a8f [SimplifyCFG] Remove redundant catchpads
Remove duplicate catchpad handlers from a catchswitch.

llvm-svn: 256814
2016-01-05 06:27:50 +00:00
Joseph Tremoulet 0d808888c1 [WinEH] Simplify unreachable catchpads
Summary:
At least for CoreCLR, a catchpad which immediately executes an
`unreachable` instruction indicates that the exception can never have a
matching type, and so such catchpads can be removed, and so can their
catchswitches if the catchswitch becomes empty.

Reviewers: rnk, andrew.w.kaylor, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D15846

llvm-svn: 256809
2016-01-05 02:37:41 +00:00
Chen Li d71999ef1b [gc.statepoint] Change gc.statepoint intrinsic's return type to token type instead of i32 type
Summary: This patch changes gc.statepoint intrinsic's return type to token type instead of i32 type. Using token types could prevent LLVM to merge different gc.statepoint nodes into PHI nodes and cause further problems with gc relocations. The patch also changes the way on how gc.relocate and gc.result look for their corresponding gc.statepoint on unwind path. The current implementation uses the selector value extracted from a { i8*, i32 } landingpad as a hook to find the gc.statepoint, while the patch directly uses a token type landingpad (http://reviews.llvm.org/D15405) to find the gc.statepoint. 

Reviewers: sanjoy, JosephTremoulet, pgavlin, igor-laevsky, mjacob

Subscribers: reames, mjacob, sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D15662

llvm-svn: 256443
2015-12-26 07:54:32 +00:00
James Molloy 3d21dcf3ed [SimplifyCFG] Don't create unnecessary PHIs
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).

This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.

Now with a fix (and fixed tests) for the conformance issue seen in Chromium.

llvm-svn: 255767
2015-12-16 14:12:44 +00:00
Sanjay Patel 38a022623a [SimplifyCFG] allow speculation of exactly one expensive instruction (PR24818)
This is the last general step to allow more IR-level speculation with a safety harness in place in CodeGenPrepare.

The intent is to restore the behavior enabled by:
http://reviews.llvm.org/rL228826

but prevent bad performance such as:
https://llvm.org/bugs/show_bug.cgi?id=24818

Earlier patches in this sequence:
D12882 (disable SimplifyCFG speculation for expensive instructions)
D13297 (have CGP despeculate expensive ops)
D14630 (have CGP despeculate special versions of cttz/ctlz)

As shown in the test cases, we only have two instructions currently affected: ctz for some x86 and fdiv generally. 
Allowing exactly one expensive instruction is a bit of a hack, but it lines up with what is currently implemented
in CGP. If we make the despeculation more general in CGP, we can make the speculation here more liberal.

A follow-up patch will adjust the cost for sqrt and possibly other typically expensive math intrinsics (currently
everything is cheap by default). GPU targets would likely want to override those expensive default costs (just as
they probably should already override the cost of div/rem) because just about any math is cheaper than control-flow
on those targets.

Differential Revision: http://reviews.llvm.org/D15213

llvm-svn: 255660
2015-12-15 17:38:29 +00:00
Reid Kleckner db9a91e324 Revert "Don't create unnecessary PHIs"
This reverts commit r255489.

It causes test failures in Chromium and does not appear to respect the
AlternativeV parameter.

llvm-svn: 255562
2015-12-14 22:36:57 +00:00
David Majnemer bbfc7219ef [IR] Remove terminatepad
It turns out that terminatepad gives little benefit over a cleanuppad
which calls the termination function.  This is not sufficient to
implement fully generic filters but MSVC doesn't support them which
makes terminatepad a little over-designed.

Depends on D15478.

Differential Revision: http://reviews.llvm.org/D15479

llvm-svn: 255522
2015-12-14 18:34:23 +00:00
James Molloy 2b1e101e99 Don't create unnecessary PHIs
In conditional store merging, we were creating PHIs when we didn't
need to. If the value to be predicated isn't defined in the block
we're predicating, then it doesn't need a PHI at all (because we only
deal with triangles and diamonds, any value not in the predicated BB
must dominate the predicated BB).

This fixes a large code size increase in some benchmarks in a popular embedded benchmark suite.

llvm-svn: 255489
2015-12-14 10:57:01 +00:00
David Majnemer 8a1c45d6e8 [IR] Reformulate LLVM's EH funclet IR
While we have successfully implemented a funclet-oriented EH scheme on
top of LLVM IR, our scheme has some notable deficiencies:
- catchendpad and cleanupendpad are necessary in the current design
  but they are difficult to explain to others, even to seasoned LLVM
  experts.
- catchendpad and cleanupendpad are optimization barriers.  They cannot
  be split and force all potentially throwing call-sites to be invokes.
  This has a noticable effect on the quality of our code generation.
- catchpad, while similar in some aspects to invoke, is fairly awkward.
  It is unsplittable, starts a funclet, and has control flow to other
  funclets.
- The nesting relationship between funclets is currently a property of
  control flow edges.  Because of this, we are forced to carefully
  analyze the flow graph to see if there might potentially exist illegal
  nesting among funclets.  While we have logic to clone funclets when
  they are illegally nested, it would be nicer if we had a
  representation which forbade them upfront.

Let's clean this up a bit by doing the following:
- Instead, make catchpad more like cleanuppad and landingpad: no control
  flow, just a bunch of simple operands;  catchpad would be splittable.
- Introduce catchswitch, a control flow instruction designed to model
  the constraints of funclet oriented EH.
- Make funclet scoping explicit by having funclet instructions consume
  the token produced by the funclet which contains them.
- Remove catchendpad and cleanupendpad.  Their presence can be inferred
  implicitly using coloring information.

N.B.  The state numbering code for the CLR has been updated but the
veracity of it's output cannot be spoken for.  An expert should take a
look to make sure the results are reasonable.

Reviewers: rnk, JosephTremoulet, andrew.w.kaylor

Differential Revision: http://reviews.llvm.org/D15139

llvm-svn: 255422
2015-12-12 05:38:55 +00:00
Sanjoy Das 8a954a0553 [OperandBundles] Fix a transform in simplifycfg
Reviewers: pcc, majnemer, reames

Subscribers: reames, llvm-commits

Differential Revision: http://reviews.llvm.org/D15345

llvm-svn: 255062
2015-12-08 22:26:08 +00:00
Igor Laevsky 7310c68e85 Revert "Revert "Strip metadata when speculatively hoisting instructions (r252604)"
Failing clang test is now fixed by the r253458.

llvm-svn: 253459
2015-11-18 14:50:18 +00:00
Sanjay Patel f740129198 [MIPS] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
MIPS32 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any MIPS32
implementation.

The net result of allowing this speculation for the regression tests in this patch is
that we get this code:

ctlz:
  jr  $ra
  clz  $2, $4

cttz:
  addiu  $1, $4, -1
  not  $2, $4
  and  $1, $2, $1
  clz  $1, $1
  addiu  $2, $zero, 32
  jr  $ra
  subu  $2, $2, $1

Instead of:

ctlz:
  beqz  $4, $BB0_2
  addiu  $2, $zero, 32
  clz  $2, $4
$BB0_2:
  jr  $ra
  nop

cttz:
  beqz  $4, $BB1_2
  addiu  $2, $zero, 32
  addiu  $1, $4, -1
  not  $2, $4
  and  $1, $2, $1
  clz  $1, $1
  addiu  $2, $zero, 32
  subu  $2, $2, $1
$BB1_2:
  jr  $ra
  nop

See D14469 for the larger motivation.

Differential Revision: http://reviews.llvm.org/D14500

llvm-svn: 252755
2015-11-11 17:24:56 +00:00
Sanjay Patel af1b48bfdc [ARM] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
ARM V6T2 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any ARM V6T2
implementation.

The net result of allowing this speculation for the regression tests in this patch is
that we get this code:

ctlz:               
  clz  r0, r0
  bx  lr
cttz:              
  rbit  r0, r0
  clz  r0, r0
  bx  lr

Instead of:

ctlz:    
  cmp  r0, #0
  moveq  r0, #32
  clzne  r0, r0
  bx  lr
cttz:     
  cmp   r0, #0
  moveq  r0, #32
  rbitne  r0, r0
  clzne  r0, r0
  bx  lr

This will help solve a general speculation/despeculation problem noted in PR24818:
https://llvm.org/bugs/show_bug.cgi?id=24818

Differential Revision: http://reviews.llvm.org/D14469

llvm-svn: 252639
2015-11-10 19:24:31 +00:00
Sanjay Patel 241c31fb64 [AArch64] add overrides for isCheapToSpeculateCttz() and isCheapToSpeculateCtlz()
AArch64 has instructions for efficient count-leading/trailing-zeros, so this should be
considered a cheap operation (and therefore fair game for speculation) for any AArch64
implementation.

The net result of allowing this speculation for the regression tests in this
patch is that we get this code:

ctlz:
  clz  w0, w0
  ret

cttz:
  rbit  w8, w0
  clz  w0, w8
  ret

Instead of:

ctlz:
  cbz  w0, .LBB0_2
  clz  w0, w0
  ret
.LBB0_2:
  orr  w0, wzr, #0x20
  ret

cttz:
  cbz  w0, .LBB1_2
  rbit  w8, w0
  clz  w0, w8
  ret
.LBB1_2:
  orr  w0, wzr, #0x20
  ret

See D14469 for the larger motivation.

Differential Revision: http://reviews.llvm.org/D14505

llvm-svn: 252625
2015-11-10 18:11:37 +00:00
Renato Golin 0e77d72b0a Revert "Strip metadata when speculatively hoisting instructions"
This reverts commit r252604, as it broke all ARM and AArch64 buildbots, as
well as some x86, et al.

llvm-svn: 252623
2015-11-10 18:01:16 +00:00
Igor Laevsky 01c3692a10 Strip metadata when speculatively hoisting instructions
This is fix for PR24059.

When we are hoisting instruction above some condition it may turn out
that metadata on this instruction was control dependant on the condition.
This metadata becomes invalid and we need to drop it.

This patch should cover most obvious places of speculative execution (which
I have found by greping isSafeToSpeculativelyExecute). I think there are more
cases but at least this change covers the severe ones.

Differential Revision: http://reviews.llvm.org/D14398

llvm-svn: 252604
2015-11-10 14:10:31 +00:00
Peter Collingbourne d4bff30370 DI: Reverse direction of subprogram -> function edge.
Previously, subprograms contained a metadata reference to the function they
described. Because most clients need to get or set a subprogram for a given
function rather than the other way around, this created unneeded inefficiency.

For example, many passes needed to call the function llvm::makeSubprogramMap()
to build a mapping from functions to subprograms, and the IR linker needed to
fix up function references in a way that caused quadratic complexity in the IR
linking phase of LTO.

This change reverses the direction of the edge by storing the subprogram as
function-level metadata and removing DISubprogram's function field.

Since this is an IR change, a bitcode upgrade has been provided.

Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is
attached to the PR.

Differential Revision: http://reviews.llvm.org/D14265

llvm-svn: 252219
2015-11-05 22:03:56 +00:00
James Molloy 4de84ddec9 [SimplifyCFG] Merge conditional stores
We can often end up with conditional stores that cannot be speculated. They can come from fairly simple, idiomatic code:

  if (c & flag1)
    *a = x;
  if (c & flag2)
    *a = y;
  ...

There is no dominating or post-dominating store to a, so it is not legal to move the store unconditionally to the end of the sequence and cache the intermediate result in a register, as we would like to.

It is, however, legal to merge the stores together and do the store once:

  tmp = undef;
  if (c & flag1)
    tmp = x;
  if (c & flag2)
    tmp = y;
  if (c & flag1 || c & flag2)
    *a = tmp;

The real power in this optimization is that it allows arbitrary length ladders such as these to be completely and trivially if-converted. The typical code I'd expect this to trigger on often uses binary-AND with constants as the condition (as in the above example), which means the ending condition can simply be truncated into a single binary-AND too: 'if (c & (flag1|flag2))'. As in the general case there are bitwise operators here, the ladder can often be optimized further too.

This optimization involves potentially increasing register pressure. Even in the simplest case, the lifetime of the first predicate is extended. This can be elided in some cases such as using binary-AND on constants, but not in the general case. Threading 'tmp' through all branches can also increase register pressure.

The optimization as in this patch is enabled by default but kept in a very conservative mode. It will only optimize if it thinks the resultant code should be if-convertable, and additionally if it can thread 'tmp' through at least one existing PHI, so it will only ever in the worst case create one more PHI and extend the lifetime of a predicate.

This doesn't trigger much in LNT, unfortunately, but it does trigger in a big way in a third party test suite.

llvm-svn: 252051
2015-11-04 15:28:04 +00:00
Artur Pilipenko 5c5011d503 Preserve load alignment and dereferenceable metadata during some transformations
Reviewed By: hfinkel

Differential Revision: http://reviews.llvm.org/D13953

llvm-svn: 251809
2015-11-02 17:53:51 +00:00
Philip Reames 846e3e41ed [SimplifyCFG] Constant fold a branch implied by it's incoming edge
The most common use case is when eliminating redundant range checks in an example like the following:
c = a[i+1] + a[i];

Note that all the smarts of the transform (the implication engine) is already in ValueTracking and is tested directly through InstructionSimplify.

Differential Revision: http://reviews.llvm.org/D13040

llvm-svn: 251596
2015-10-29 03:11:49 +00:00
David Majnemer 492937095f [SimplifyCFG] Don't DCE catchret because the successor is unreachable
CatchReturnInst has side-effects: it runs a destructor.  This destructor
could conceivably run forever/call exit/etc. and should not be removed.

llvm-svn: 251461
2015-10-27 22:43:56 +00:00
Chen Li 7009cd3554 Revert rL251061 [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
llvm-svn: 251149
2015-10-23 21:13:01 +00:00
Chen Li c6e28782d8 [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
Summary: Currently SimplifyResume can convert an invoke instruction to a call instruction if its landing pad is trivial. In practice we could have several invoke instructions with trivial landing pads and share a common rethrow block, and in the common rethrow block, all the landing pads join to a phi node. The patch extends SimplifyResume to check the phi of landing pad and their incoming blocks. If any of them is trivial, remove it from the phi node and convert the invoke instruction to a call instruction.  

Reviewers: hfinkel, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13718

llvm-svn: 251061
2015-10-22 20:48:38 +00:00
David Majnemer dc3b67b4ca [SimplifyCFG] Don't use-after-free an SSA value
SimplifyTerminatorOnSelect didn't consider the possibility that the
condition might be related to one of PHI nodes.

This fixes PR25267.

llvm-svn: 250922
2015-10-21 18:22:24 +00:00
Philip Reames a956cc7f08 Revert 250343 and 250344
Turns out this approach is buggy.  In discussion about follow on work, Sanjoy pointed out that we could be subject to circular logic problems.  

Consider:
 if (i u< L) leave()
 if ((i + 1) u< L) leave()
 print(a[i] + a[i+1]) 

If we know that L is less than UINT_MAX, we could possible prove (in a control dependent way) that i + 1 does not overflow.  This gives us:
 if (i u< L) leave()
 if ((i +nuw 1) u< L) leave()
 print(a[i] + a[i+1]) 

If we now do the transform this patch proposed, we end up with:
 if ((i +nuw 1) u< L) leave_appropriately()
 print(a[i] + a[i+1]) 

That would be a miscompile when i==-1.  The problem here is that the control dependent nuw bits got used to prove something about the first condition.  That's obviously invalid.

This won't happen today, but since I plan to enhance LVI/CVP with exactly that transform at some point in the not too distant future...

llvm-svn: 250430
2015-10-15 16:51:00 +00:00
Philip Reames aae328d6eb Test case which should have been part of 250343
llvm-svn: 250344
2015-10-14 22:47:06 +00:00
Kostya Serebryany 5cb86d5a40 [asan] Disabling speculative loads under asan. Patch by Mike Aizatsky
llvm-svn: 250259
2015-10-14 00:21:05 +00:00
Joseph Tremoulet 09af67aba5 [EH] Create removeUnwindEdge utility
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.


Reviewers: andrew.w.kaylor, majnemer, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13152

llvm-svn: 248677
2015-09-27 01:47:46 +00:00
Sanjay Patel 13e8bbc237 set div/rem default values to 'expensive' in TargetTransformInfo's cost model
...because that's what the cost model was intended to do.

As discussed in D12882, this fix has a temporary unintended consequence for
SimplifyCFG: it causes us to not speculate an fdiv. However, two wrongs make
PR24818 right, and two wrongs make PR24343 act right even though it's really
still wrong.

I intend to correct SimplifyCFG and add to CodeGenPrepare to account for this
cost model change and preserve the righteousness for the bug report cases.

https://llvm.org/bugs/show_bug.cgi?id=24818
https://llvm.org/bugs/show_bug.cgi?id=24343

Differential Revision: http://reviews.llvm.org/D12882

llvm-svn: 248439
2015-09-23 22:28:18 +00:00
Reid Kleckner b005d281c3 [WinEH] Pull Adjectives and CatchObj out of the catchpad arg list
Clang now passes the adjectives as an argument to catchpad.

Getting the CatchObj working is simply a matter of threading another
static alloca through codegen, first as an alloca, then as a frame
index, and finally as a frame offset.

llvm-svn: 247844
2015-09-16 20:16:27 +00:00
Reid Kleckner 5dbee7baef [IR] Print the label operands of a catchpad like an invoke
The rest of the EH pads are fine, since they have at most one label and
take fewer operands for the personality.

Old catchpad vs. new:
  %5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)] to label %__except.ret.10 unwind label %catchendblock.9
-----
  %5 = catchpad [i8* bitcast (i32 ()* @"\01?filt$0@0@main@@" to i8*)]
          to label %__except.ret.10 unwind label %catchendblock.9

llvm-svn: 247433
2015-09-11 17:27:52 +00:00
Philip Reames 053701399d [SimplifyCFG] Use known bits to eliminate dead switch defaults
This is a follow up to http://reviews.llvm.org/D11995 implementing the suggestion by Hans.

If we know some of the bits of the value being switched on, we know that the maximum number of unique cases covers the unknown bits. This allows to eliminate switch defaults for large integers (i32) when most bits in the value are known.

Note that I had to make the transform contingent on not having any dead cases. This is conservatively correct with the old code, but required for the new code since we might have a dead case which varies one of the known bits. Counting that towards our number of covering cases would be bad.  If we do have dead cases, we'll eliminate them first, then revisit the possibly dead default.

Differential Revision: http://reviews.llvm.org/D12497

llvm-svn: 247309
2015-09-10 17:44:47 +00:00
Andrew Kaylor a89baa21b8 Fixing bad test syntax.
llvm-svn: 246897
2015-09-04 23:47:34 +00:00
Andrew Kaylor 50e4e86c26 [WinEH] Teach SimplfyCFG to eliminate empty cleanup pads.
Differential Revision: http://reviews.llvm.org/D12434

llvm-svn: 246896
2015-09-04 23:39:40 +00:00
David Majnemer 0a92f86fe6 Revert r246232 and r246304.
This reverts isSafeToSpeculativelyExecute's use of ReadNone until we
split ReadNone into two pieces: one attribute which reasons about how
the function reasons about memory and another attribute which determines
how it may be speculated, CSE'd, trap, etc.

llvm-svn: 246331
2015-08-28 21:13:39 +00:00
Duncan P. N. Exon Smith 814b8e91c7 DI: Require subprogram definitions to be distinct
As a follow-up to r246098, require `DISubprogram` definitions
(`isDefinition: true`) to be 'distinct'.  Specifically, add an assembler
check, a verifier check, and bitcode upgrading logic to combat testcase
bitrot after the `DIBuilder` change.

While working on the testcases, I realized that
test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore.  Its
purpose was to check for a corner case in PR22792 where two subprogram
definitions match exactly and share the same metadata node.  The new
verifier check, requiring that subprogram definitions are 'distinct',
precludes that possibility.

I updated almost all the IR with the following script:

    git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'

Likely some variant of would work for out-of-tree testcases.

llvm-svn: 246327
2015-08-28 20:26:49 +00:00
David Majnemer 0293704be2 [ValueTracking] readnone CallInsts are fair game for speculation
Any call which is side effect free is trivially OK to speculate.  We
already had similar logic in EarlyCSE and GVN but we were missing it
from isSafeToSpeculativelyExecute.

This fixes PR24601.

llvm-svn: 246232
2015-08-27 23:03:01 +00:00
Philip Reames 98a2dabc08 [SimplifyCFG] Prune code from a provably unreachable switch default
As Sanjoy pointed out over in http://reviews.llvm.org/D11819, a switch on an icmp should always be able to become a branch instruction. This patch generalizes that notion slightly to prove that the default case of a switch is unreachable if the cases completely cover all possible bit patterns in the condition. Once that's done, the switch to branch conversion kicks in just fine.

Note: Duplicate case values are disallowed by the LangRef and verifier.

Differential Revision: http://reviews.llvm.org/D11995

llvm-svn: 246125
2015-08-26 23:56:46 +00:00
Adrian Prantl baf90fc265 Fix a bug that caused SimplifyCFG to drop DebugLocs.
Instruction::dropUnknownMetadata(KnownSet) is supposed to preserve all
metadata in KnownSet, but the condition for DebugLocs was inverted.

Most users of dropUnknownMetadata() actually worked around this by not
adding LLVMContext::MD_dbg to their list of KnowIDs.
This is now made explicit.

llvm-svn: 245589
2015-08-20 18:24:02 +00:00
Chen Li eafbc9dc47 [ConstantFoldTerminator] Preserve make.implicit metadata when converting SwitchInst to BranchInst
Summary: llvm::ConstantFoldTerminator function can convert SwitchInst with single case (and default) to a conditional BranchInst. This patch adds support to preserve make.implicit metadata on this conversion.

Reviewers: sanjoy, weimingz, chenli

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D11841

llvm-svn: 244348
2015-08-07 19:30:12 +00:00
Duncan P. N. Exon Smith 55ca964e94 DI: Disallow uniquable DICompileUnits
Since r241097, `DIBuilder` has only created distinct `DICompileUnit`s.
The backend is liable to start relying on that (if it hasn't already),
so make uniquable `DICompileUnit`s illegal and automatically upgrade old
bitcode.  This is a nice cleanup, since we can remove an unnecessary
`DenseSet` (and the associated uniquing info) from `LLVMContextImpl`.

Almost all the testcases were updated with this script:

    git grep -e '= !DICompileUnit' -l -- test |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's,= !DICompileUnit,= distinct !DICompileUnit,'

I imagine something similar should work for out-of-tree testcases.

llvm-svn: 243885
2015-08-03 17:26:41 +00:00
Duncan P. N. Exon Smith ed013cd221 DI: Remove DW_TAG_arg_variable and DW_TAG_auto_variable
Remove the fake `DW_TAG_auto_variable` and `DW_TAG_arg_variable` tags,
using `DW_TAG_variable` in their place Stop exposing the `tag:` field at
all in the assembly format for `DILocalVariable`.

Most of the testcase updates were generated by the following sed script:

    find test/ -name "*.ll" -o -name "*.mir" |
    xargs grep -l 'DILocalVariable' |
    xargs sed -i '' \
      -e 's/tag: DW_TAG_arg_variable, //' \
      -e 's/tag: DW_TAG_auto_variable, //'

There were only a handful of tests in `test/Assembly` that I needed to
update by hand.

(Note: a follow-up could change `DILocalVariable::DILocalVariable()` to
set the tag to `DW_TAG_formal_parameter` instead of `DW_TAG_variable`
(as appropriate), instead of having that logic magically in the backend
in `DbgVariable`.  I've added a FIXME to that effect.)

llvm-svn: 243774
2015-07-31 18:58:39 +00:00
Matt Arsenault 5eb5eb59fc AMDGPU: Fix some places missed in rename
llvm-svn: 240143
2015-06-19 17:39:03 +00:00
David Majnemer 7fddeccb8b Move the personality function from LandingPadInst to Function
The personality routine currently lives in the LandingPadInst.

This isn't desirable because:
- All LandingPadInsts in the same function must have the same
  personality routine.  This means that each LandingPadInst beyond the
  first has an operand which produces no additional information.

- There is ongoing work to introduce EH IR constructs other than
  LandingPadInst.  Moving the personality routine off of any one
  particular Instruction and onto the parent function seems a lot better
  than have N different places a personality function can sneak onto an
  exceptional function.

Differential Revision: http://reviews.llvm.org/D10429

llvm-svn: 239940
2015-06-17 20:52:32 +00:00
Igor Laevsky 965bf6a3ce [Statepoints] Add test case to check that statepoint is marked with Throwable attribute.
Differential Revision: http://reviews.llvm.org/D10215

llvm-svn: 239473
2015-06-10 13:24:00 +00:00
Sunil Srivastava d79dfcbc37 Changed renaming of local symbols by inserting a dot vefore the numeric suffix.
One code change and several test changes to match that
details in http://reviews.llvm.org/D9481

llvm-svn: 237150
2015-05-12 16:47:30 +00:00
Duncan P. N. Exon Smith a9308c49ef IR: Give 'DI' prefix to debug info metadata
Finish off PR23080 by renaming the debug info IR constructs from `MD*`
to `DI*`.  The last of the `DIDescriptor` classes were deleted in
r235356, and the last of the related typedefs removed in r235413, so
this has all baked for about a week.

Note: If you have out-of-tree code (like a frontend), I recommend that
you get everything compiling and tests passing with the *previous*
commit before updating to this one.  It'll be easier to keep track of
what code is using the `DIDescriptor` hierarchy and what you've already
updated, and I think you're extremely unlikely to insert bugs.  YMMV of
course.

Back to *this* commit: I did this using the rename-md-di-nodes.sh
upgrade script I've attached to PR23080 (both code and testcases) and
filtered through clang-format-diff.py.  I edited the tests for
test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns
were off-by-three.  It should work on your out-of-tree testcases (and
code, if you've followed the advice in the previous paragraph).

Some of the tests are in badly named files now (e.g.,
test/Assembler/invalid-mdcompositetype-missing-tag.ll should be
'dicompositetype'); I'll come back and move the files in a follow-up
commit.

llvm-svn: 236120
2015-04-29 16:38:44 +00:00
Hans Wennborg 86ac630585 SimplifyCFG: Correctly handle switch lookup tables which fully cover the input type and use bit tests to check for holes
When using bit tests for hole checks, we call AddPredecessorToBlock to give the
phi node a value from the bit test block. This would break if we've
previously called removePredecessor on the default destination because the
switch is fully covered.

Test case by Mark Lacey.

llvm-svn: 235771
2015-04-24 20:57:56 +00:00
David Blaikie 23af64846f [opaque pointer type] Add textual IR support for explicit type parameter to the call instruction
See r230786 and r230794 for similar changes to gep and load
respectively.

Call is a bit different because it often doesn't have a single explicit
type - usually the type is deduced from the arguments, and just the
return type is explicit. In those cases there's no need to change the
IR.

When that's not the case, the IR usually contains the pointer type of
the first operand - but since typed pointers are going away, that
representation is insufficient so I'm just stripping the "pointerness"
of the explicit type away.

This does make the IR a bit weird - it /sort of/ reads like the type of
the first operand: "call void () %x(" but %x is actually of type "void
()*" and will eventually be just of type "ptr". But this seems not too
bad and I don't think it would benefit from repeating the type
("void (), void () * %x(" and then eventually "void (), ptr %x(") as has
been done with gep and load.

This also has a side benefit: since the explicit type is no longer a
pointer, there's no ambiguity between an explicit type and a function
that returns a function pointer. Previously this case needed an explicit
type (eg: a function returning a void() function was written as
"call void () () * @x(" rather than "call void () * @x(" because of the
ambiguity between a function returning a pointer to a void() function
and a function returning void).

No ambiguity means even function pointer return types can just be
written alone, without writing the whole function's type.

This leaves /only/ the varargs case where the explicit type is required.

Given the special type syntax in call instructions, the regex-fu used
for migration was a bit more involved in its own unique way (as every
one of these is) so here it is. Use it in conjunction with the apply.sh
script and associated find/xargs commands I've provided in rr230786 to
migrate your out of tree tests. Do let me know if any of this doesn't
cover your cases & we can iterate on a more general script/regexes to
help others with out of tree tests.

About 9 test cases couldn't be automatically migrated - half of those
were functions returning function pointers, where I just had to manually
delete the function argument types now that we didn't need an explicit
function type there. The other half were typedefs of function types used
in calls - just had to manually drop the * from those.

import fileinput
import sys
import re

pat = re.compile(r'((?:=|:|^|\s)call\s(?:[^@]*?))(\s*$|\s*(?:(?:\[\[[a-zA-Z0-9_]+\]\]|[@%](?:(")?[\\\?@a-zA-Z0-9_.]*?(?(3)"|)|{{.*}}))(?:\(|$)|undef|inttoptr|bitcast|null|asm).*$)')
addrspace_end = re.compile(r"addrspace\(\d+\)\s*\*$")
func_end = re.compile("(?:void.*|\)\s*)\*$")

def conv(match, line):
  if not match or re.search(addrspace_end, match.group(1)) or not re.search(func_end, match.group(1)):
    return line
  return line[:match.start()] + match.group(1)[:match.group(1).rfind('*')].rstrip() + match.group(2) + line[match.end():]

for line in sys.stdin:
  sys.stdout.write(conv(re.search(pat, line), line))

llvm-svn: 235145
2015-04-16 23:24:18 +00:00
Duncan P. N. Exon Smith 49e6a70fe3 Verifier: Call verifyModule() from llc and opt
Change `llc` and `opt` to run `verifyModule()`.  This ensures that we
check the full module before `FunctionPass::doInitialization()` ever
gets called (I was getting crashes in `DwarfDebug` instead of verifier
failures when testing a WIP patch that checks operands of compile
units).  In `opt`, also move up debug-info-stripping so that it still
runs before verification.

There was a fair bit of broken code that was sitting in tree.
Interestingly, some were cases of a `select` that referred to itself in
`-instcombine` tests (apparently an intermediate result).  I split them
off to `*-noverify.ll` tests with RUN lines like this:

    opt < %s -S -disable-verify -instcombine | opt -S | FileCheck %s

This avoids verifying the input file (so we can get the broken code into
`-instcombine), but still verifies the output with a second call to
`opt` (to verify that `-instcombine` will clean it up like it should).

llvm-svn: 233432
2015-03-27 22:04:28 +00:00
Duncan P. N. Exon Smith 988a7f8b79 DebugInfo: Fix bad debug info for compile units and types
Fix debug info in these tests, which started failing with a WIP patch to
verify compile units and types.  The problems look like they were all
caused by bitrot.  They fell into these categories:

  - Using `!{i32 0}` instead of `!{}`.
  - Using `!{null}` instead of `!{}`.
  - Using `!MDExpression()` instead of `!{}`.
  - Using `!8` instead of `!{!8}`.
  - `file:` references that pointed at `MDCompileUnit`s instead of the
    same `MDFile` as the compile unit.
  - `file:` references that were numerically off-by-one or (off-by-ten).

llvm-svn: 233415
2015-03-27 20:46:33 +00:00
Philip Reames 2b969d7010 Merge empty landing pads in SimplifyCFG
This patch tries to merge duplicate landing pads when they branch to a common shared target.

Given IR that looks like this:
lpad1:
  %exn = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
         cleanup
  br label %shared_resume
lpad2:
  %exn2 = landingpad {i8*, i32} personality i32 (...)* @__gxx_personality_v0
          cleanup
  br label %shared_resume
shared_resume:
  call void @fn()
  ret void
}

We can rewrite the users of both landing pad blocks to use one of them. This will generally allow the shared_resume block to be merged with the common landing pad as well.

Without this change, tail duplication would likely kick in - creating N (2 in this case) copies of the shared_resume basic block.

Differential Revision: http://reviews.llvm.org/D8297

llvm-svn: 233125
2015-03-24 22:28:45 +00:00
Duncan P. N. Exon Smith 166121ad0b Verifier: Check debug info intrinsic arguments
Verify that debug info intrinsic arguments are valid.  (These checks
will not recurse through the full debug info graph, so they don't need
to be cordoned of in `DebugInfoVerifier`.)

With those checks in place, changing the `DbgIntrinsicInst` accessors to
downcast to `MDLocalVariable` and `MDExpression` is natural (added isa
specializations in `Metadata.h` to support this).

Added tests to `test/Verifier` for the new -verify checks, and fixed the
debug info in all the in-tree tests.

If you have out-of-tree testcases that have started to fail to -verify,
hopefully the verify checks are helpful.  The most likely problem is
that the expression argument is `!{}` (instead of `!MDExpression()`).

llvm-svn: 232296
2015-03-15 01:21:30 +00:00
David Blaikie f72d05bc7b [opaque pointer type] Add textual IR support for explicit type parameter to gep operator
Similar to gep (r230786) and load (r230794) changes.

Similar migration script can be used to update test cases, which
successfully migrated all of LLVM and Polly, but about 4 test cases
needed manually changes in Clang.

(this script will read the contents of stdin and massage it into stdout
- wrap it in the 'apply.sh' script shown in previous commits + xargs to
apply it over a large set of test cases)

import fileinput
import sys
import re

rep = re.compile(r"(getelementptr(?:\s+inbounds)?\s*\()((<\d*\s+x\s+)?([^@]*?)(|\s*addrspace\(\d+\))\s*\*(?(3)>)\s*)(?=$|%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|zeroinitializer|<|\[\[[a-zA-Z]|\{\{)", re.MULTILINE | re.DOTALL)

def conv(match):
  line = match.group(1)
  line += match.group(4)
  line += ", "
  line += match.group(2)
  return line

line = sys.stdin.read()
off = 0
for match in re.finditer(rep, line):
  sys.stdout.write(line[off:match.start()])
  sys.stdout.write(conv(match))
  off = match.end()
sys.stdout.write(line[off:])

llvm-svn: 232184
2015-03-13 18:20:45 +00:00
Duncan P. N. Exon Smith e274180f0e DebugInfo: Move new hierarchy into place
Move the specialized metadata nodes for the new debug info hierarchy
into place, finishing off PR22464.  I've done bootstraps (and all that)
and I'm confident this commit is NFC as far as DWARF output is
concerned.  Let me know if I'm wrong :).

The code changes are fairly mechanical:

  - Bumped the "Debug Info Version".
  - `DIBuilder` now creates the appropriate subclass of `MDNode`.
  - Subclasses of DIDescriptor now expect to hold their "MD"
    counterparts (e.g., `DIBasicType` expects `MDBasicType`).
  - Deleted a ton of dead code in `AsmWriter.cpp` and `DebugInfo.cpp`
    for printing comments.
  - Big update to LangRef to describe the nodes in the new hierarchy.
    Feel free to make it better.

Testcase changes are enormous.  There's an accompanying clang commit on
its way.

If you have out-of-tree debug info testcases, I just broke your build.

  - `upgrade-specialized-nodes.sh` is attached to PR22564.  I used it to
    update all the IR testcases.
  - Unfortunately I failed to find way to script the updates to CHECK
    lines, so I updated all of these by hand.  This was fairly painful,
    since the old CHECKs are difficult to reason about.  That's one of
    the benefits of the new hierarchy.

This work isn't quite finished, BTW.  The `DIDescriptor` subclasses are
almost empty wrappers, but not quite: they still have loose casting
checks (see the `RETURN_FROM_RAW()` macro).  Once they're completely
gutted, I'll rename the "MD" classes to "DI" and kill the wrappers.  I
also expect to make a few schema changes now that it's easier to reason
about everything.

llvm-svn: 231082
2015-03-03 17:24:31 +00:00
David Blaikie a79ac14fa6 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie 79e6c74981 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Chad Rosier 543900539f Prevent hoisting fmul from THEN/ELSE to IF if there is fmsub/fmadd opportunity.
This patch adds the isProfitableToHoist API.  For AArch64, we want to prevent a
fmul from being hoisted in cases where it is more profitable to form a
fmsub/fmadd.

Phabricator Review: http://reviews.llvm.org/D7299
Patch by Lawrence Hu <lawrence@codeaurora.org>

llvm-svn: 230241
2015-02-23 19:15:16 +00:00
Andrea Di Biagio b14ae8692d [CodeGenPrepare] Removed duplicate logic. SimplifyCFG already knows how to speculate calls to cttz/ctlz.
SimplifyCFG now knows how to speculate calls to intrinsic cttz/ctlz that are
'cheap' for the target. Therefore, some of the logic in CodeGenPrepare
that was originally added at revision 224899 can now be removed.

This patch is basically a no functional change. It removes the duplicated
logic in CodeGenPrepare and converts all the existing target specific tests
for cttz/ctlz into SimplifyCFG tests.

Differential Revision: http://reviews.llvm.org/D7608

llvm-svn: 229105
2015-02-13 14:15:48 +00:00
James Molloy 5eb75aced4 [SimplifyCFG] Add test for r229099
Add extra test that was accidentally not staged.

llvm-svn: 229101
2015-02-13 11:08:40 +00:00
Andrea Di Biagio b08862c4f0 [TTI] Teach the cost heuristic how to query TLI to check if a zext/trunc is 'free' for the target.
Now that SimplifyCFG uses TTI for the cost heuristic, we can teach BasicTTIImpl
how to query TLI in order to get a more accurate cost for truncates and
zero-extends.

Before this patch, the basic cost heuristic in TargetTransformInfoImplCRTPBase
would have conservatively returned a 'default' TCC_Basic for all zero-extends,
and TCC_Free for truncates on native types.

This patch improves the heuristic so that we query TLI (if available) to get
more accurate answers. If TLI is available, then methods 'isZExtFree' and
'isTruncateFree' can be used to check if a zext/trunc is free for the target.

Added more test cases to SimplifyCFG/X86/speculate-cttz-ctlz.ll.
With this change, SimplifyCFG is now able to speculate a 'cheap' cttz/ctlz
immediately followed by a free zext/trunc.

Differential Revision: http://reviews.llvm.org/D7585

llvm-svn: 228923
2015-02-12 14:17:24 +00:00
Andrea Di Biagio 2a0e435db1 [TTI] Improved cost heuristic for cttz/ctlz calls.
This patch is a follow-up of r228826 (see code-review: D7506).

Now that SimplifyCFG uses TargetTransformInfo for cost analysis, we 
have to fix the cost heuristic for intrinsic calls to cttz/ctlz.

This patch defines method 'getIntrinsicCost' in BasicTTIImpl: now, BasicTTIImpl
queries TLI to check if a call to cttz/ctlz is cheap for the target.

Added test cases in Transforms/SimplifyCFG/X86 to verify that on x86,
SimplifyCFG only speculates a call to cttz/ctlz if it is cheap.

Differential Revision: http://reviews.llvm.org/D7554

llvm-svn: 228829
2015-02-11 14:22:18 +00:00
James Molloy 7c336576a5 [SimplifyCFG] Swap to using TargetTransformInfo for cost
analysis.

We're already using TTI in SimplifyCFG, so remove the hard-baked "cheapness"
heuristic and use TTI directly. Generally NFC intended, but we're using a slightly
different heuristic now so there is a slight test churn.

Test changes:
  * combine-comparisons-by-cse.ll: Removed unneeded branch check.
  * 2014-08-04-muls-it.ll: Test now doesn't branch but emits muleq.
  * coalesce-subregs.ll: Superfluous block check.
  * 2008-01-02-hoist-fp-add.ll: fadd is safe to speculate. Change to udiv.
  * PhiBlockMerge.ll: Superfluous CFG checking code. Main checks still present.
  * select-gep.ll: A variable GEP is not expensive, just TCC_Basic, according to the TTI.

llvm-svn: 228826
2015-02-11 12:15:41 +00:00
Reid Kleckner 96d011315a Don't promote asynch EH invokes of nounwind functions to calls
If the landingpad of the invoke is using a personality function that
catches asynch exceptions, then it can catch a trap.

Also add some landingpads to invalid LLVM IR test cases that lack them.

Over-the-shoulder reviewed by David Majnemer.

llvm-svn: 228782
2015-02-11 01:23:16 +00:00
Chandler Carruth fdffd87d68 [PM] Port SimplifyCFG to the new pass manager.
This should be sufficient to replace the initial (minor) function pass
pipeline in Clang with the new pass manager. I'll probably add an (off
by default) flag to do that just to ensure we can get extra testing.

llvm-svn: 227726
2015-02-01 11:34:21 +00:00
Hans Wennborg b64cb271dc SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable
The range check would get optimized away later, but we might as well not emit
them in the first place.

http://reviews.llvm.org/D6471

llvm-svn: 227126
2015-01-26 19:52:34 +00:00
Hans Wennborg 6800008f04 SimplifyCFG: don't remove unreachable default switch destinations
An unreachable default destination can be exploited by other optimizations and
allows for more efficient lowering. Both the SDag switch lowering and
LowerSwitch can exploit unreachable defaults.

Also make TurnSwitchRangeICmp handle switches with unreachable default.
This is kind of separate change, but it cannot be tested without the change
above, and I don't want to land the change above without this since that would
regress other tests.

Differential Revision: http://reviews.llvm.org/D6471

llvm-svn: 227125
2015-01-26 19:52:32 +00:00
Reid Kleckner f12b33454f Revert "Don't remove a landing pad if the invoke requires a table entry."
This reverts commit r176827.

Björn Steinbrink pointed out that this didn't actually fix the bug
(PR15555) it was attempting to fix.

With this reverted, we can now remove landingpad cleanups that
immediately resume unwinding, converting the invoke to a call.

llvm-svn: 226850
2015-01-22 19:29:46 +00:00
Duncan P. N. Exon Smith 9885469922 IR: Move MDLocation into place
This commit moves `MDLocation`, finishing off PR21433.  There's an
accompanying clang commit for frontend testcases.  I'll attach the
testcase upgrade script I used to PR21433 to help out-of-tree
frontends/backends.

This changes the schema for `DebugLoc` and `DILocation` from:

    !{i32 3, i32 7, !7, !8}

to:

    !MDLocation(line: 3, column: 7, scope: !7, inlinedAt: !8)

Note that empty fields (line/column: 0 and inlinedAt: null) don't get
printed by the assembly writer.

llvm-svn: 226048
2015-01-14 22:27:36 +00:00
Hans Wennborg dcc6e5bc03 SimplifyCFG: check uses of constant-foldable instrs in switch destinations (PR20210)
The previous code assumed that such instructions could not have any uses
outside CaseDest, with the motivation that the instruction could not
dominate CommonDest because CommonDest has phi nodes in it. That simply
isn't true; e.g., CommonDest could have an edge back to itself.

llvm-svn: 225552
2015-01-09 22:13:31 +00:00
Michael Liao 5313da3263 [SimplifyCFG] Revise common code sinking
- Fix the case where more than 1 common instructions derived from the same
  operand cannot be sunk. When a pair of value has more than 1 derived values
  in both branches, only 1 derived value could be sunk.
- Replace BB1 -> (BB2, PN) map with joint value map, i.e.
  map of (BB1, BB2) -> PN, which is more accurate to track common ops.

llvm-svn: 224757
2014-12-23 08:26:55 +00:00
Duncan P. N. Exon Smith be7ea19b58 IR: Make metadata typeless in assembly
Now that `Metadata` is typeless, reflect that in the assembly.  These
are the matching assembly changes for the metadata/value split in
r223802.

  - Only use the `metadata` type when referencing metadata from a call
    intrinsic -- i.e., only when it's used as a `Value`.

  - Stop pretending that `ValueAsMetadata` is wrapped in an `MDNode`
    when referencing it from call intrinsics.

So, assembly like this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata !{i32 %v}, metadata !0)
      call void @llvm.foo(metadata !{i32 7}, metadata !0)
      call void @llvm.foo(metadata !1, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{metadata !3}, metadata !0)
      ret void, !bar !2
    }
    !0 = metadata !{metadata !2}
    !1 = metadata !{i32* @global}
    !2 = metadata !{metadata !3}
    !3 = metadata !{}

turns into this:

    define @foo(i32 %v) {
      call void @llvm.foo(metadata i32 %v, metadata !0)
      call void @llvm.foo(metadata i32 7, metadata !0)
      call void @llvm.foo(metadata i32* @global, metadata !0)
      call void @llvm.foo(metadata !3, metadata !0)
      call void @llvm.foo(metadata !{!3}, metadata !0)
      ret void, !bar !2
    }
    !0 = !{!2}
    !1 = !{i32* @global}
    !2 = !{!3}
    !3 = !{}

I wrote an upgrade script that handled almost all of the tests in llvm
and many of the tests in cfe (even handling many `CHECK` lines).  I've
attached it (or will attach it in a moment if you're speedy) to PR21532
to help everyone update their out-of-tree testcases.

This is part of PR21532.

llvm-svn: 224257
2014-12-15 19:07:53 +00:00
Hans Wennborg 67ead59108 Add some tests for SimplifyCFG's TurnSwitchRangeIntoICmp(). NFC.
llvm-svn: 223396
2014-12-04 22:19:28 +00:00
Hans Wennborg 1096539335 Add some tests for SimplifyCFG's ConstantFoldTerminator(). NFC.
llvm-svn: 223395
2014-12-04 22:19:25 +00:00
Hans Wennborg 5bef5b522b Revert r223049, r223050 and r223051 while investigating test failures.
I didn't foresee affecting the Clang test suite :/

llvm-svn: 223054
2014-12-01 17:36:43 +00:00
Hans Wennborg 269ebb612e SimplifyCFG: Omit range checks for switch lookup tables when default is unreachable
They would get optimized away later, but we might as well not emit them.

llvm-svn: 223051
2014-12-01 17:08:38 +00:00
Hans Wennborg 5a1e5c05d8 SimplifyCFG: don't remove unreachable default switch destinations
An unreachable default destination can be exploited by other optimizations, and
SDag lowering is now prepared to handle them efficiently.

For example, branches to the unreachable destination will be optimized away,
such as in the case of range checks for switch lookup tables.

On 64-bit Linux, this reduces the size of a clang bootstrap by 80 kB (and
Chromium by 30 kB).

llvm-svn: 223050
2014-12-01 17:08:35 +00:00
Erik Eckstein 0d86c7623f reinstate r222872: Peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
Fixed missing dominance check.
Original commit message:

This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
   if (idx < tablesize)
      r = table[idx]; // table does not contain default_value
   else
      r = default_value;
   if (r != default_value)
      ...
Is optimized to:
   cond = idx < tablesize;
   if (cond)
      r = table[idx];
   else
      r = default_value;
   if (cond)
      ...
Jump threading will then eliminate the second if(cond).

llvm-svn: 222891
2014-11-27 15:13:14 +00:00
Erik Eckstein 2190cd9ffa Revert "Peephole optimization in switch table lookup: reuse the guarding table comparison if possible."
It is breaking the clang bootstrag.

llvm-svn: 222877
2014-11-27 10:59:08 +00:00
Erik Eckstein e73e308ab9 Peephole optimization in switch table lookup: reuse the guarding table comparison if possible.
This optimization tries to reuse the generated compare instruction, if there is a comparison against the default value after the switch.
Example:
    if (idx < tablesize)
       r = table[idx]; // table does not contain default_value
    else
       r = default_value;
    if (r != default_value)
       ...
Is optimized to:
    cond = idx < tablesize;
    if (cond)
       r = table[idx];
    else
       r = default_value;
    if (cond)
       ...
\endcode
Jump threading will then eliminate the second if(cond).

llvm-svn: 222872
2014-11-27 08:33:51 +00:00
Hans Wennborg bda193edff Remove useless rdar:// comment from switch_to_lookup_table.ll test.
llvm-svn: 222772
2014-11-25 18:45:23 +00:00
Juergen Ributzka c9591e9bdb [SimplifyCFG] Make the value type of the hole check bitmask a power-of-2.
When converting a switch to a lookup table we might have to generate a bitmaks
to encode and check for holes in the original switch statement.

The type of this mask depends on the number of switch statements, which can
result in illegal types for pretty much all architectures.

To avoid unnecessary type legalization and help FastISel this commit increases
the size of the bitmask to next power-of-2 value when necessary.

This fixes rdar://problem/18984639.

llvm-svn: 222168
2014-11-17 19:39:56 +00:00
Erik Eckstein 105374fe5e Optimize switch lookup tables with linear mapping.
This is a simple optimization for switch table lookup:
It computes the output value directly with an (optional) mul and add if there is a linear mapping between index and output.
Example:

int f1(int x) {
  switch (x) {
    case 0: return 10;
    case 1: return 11;
    case 2: return 12;
    case 3: return 13;
  }
  return 0;
}

generates:

define i32 @f1(i32 %x) #0 {
entry:
  %0 = icmp ult i32 %x, 4
  br i1 %0, label %switch.lookup, label %return

switch.lookup:
  %switch.offset = add i32 %x, 10
  ret i32 %switch.offset

return:
  ret i32 0
}

llvm-svn: 222121
2014-11-17 09:13:57 +00:00
Matt Arsenault d6511b49ac Add minnum / maxnum intrinsics
These are named following the IEEE-754 names for these
functions, rather than the libm fmin / fmax to avoid
possible ambiguities. Some languages may implement something
resembling fmin / fmax which return NaN if either operand is
to propagate errors. These implement the IEEE-754 semantics
of returning the other operand if either is a NaN representing
missing data.

llvm-svn: 220341
2014-10-21 23:00:20 +00:00
Marcello Maggioni 5bbe3df63f Switch to select optimization for two-case switches
This is the same optimization of r219233 with modifications to support PHIs with multiple incoming edges from the same block
and a test to check that this condition is handled.

llvm-svn: 219656
2014-10-14 01:58:26 +00:00
Joerg Sonnenberger 5ca10d0edb Revert r219223, it creates invalid PHI nodes.
llvm-svn: 219587
2014-10-12 17:16:04 +00:00
Arnold Schwaighofer d7d010eb2a SimplifyCFG: Don't convert phis into selects if we could remove undef behavior
instead

We used to transform this:

  define void @test6(i1 %cond, i8* %ptr) {
  entry:
    br i1 %cond, label %bb1, label %bb2

  bb1:
    br label %bb2

  bb2:
    %ptr.2 = phi i8* [ %ptr, %entry ], [ null, %bb1 ]
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

into this:

  define void @test6(i1 %cond, i8* %ptr) {
    %ptr.2 = select i1 %cond, i8* null, i8* %ptr
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

because the simplifycfg transformation into selects would happen to happen
before the simplifycfg transformation that removes unreachable control flow
(We have 'unreachable control flow' due to the store to null which is undefined
behavior).

The existing transformation that removes unreachable control flow in simplifycfg
is:

  /// If BB has an incoming value that will always trigger undefined behavior
  /// (eg. null pointer dereference), remove the branch leading here.
  static bool removeUndefIntroducingPredecessor(BasicBlock *BB)

Now we generate:

  define void @test6(i1 %cond, i8* %ptr) {
    store i8 2, i8* %ptr.2, align 8
    ret void
  }

I did not see any impact on the test-suite + externals.

rdar://18596215

llvm-svn: 219462
2014-10-10 01:27:02 +00:00
Marcello Maggioni 963bc87dbd Two case switch to select optimization
This optimization tries to convert switch instructions that are used to select a value with only 2 unique cases + default block
to a select or a couple of selects (depending if the default block is reachable or not).

The typical case this optimization wants to be able to optimize is this one:

Example:
switch (a) {
  case 10:                %0 = icmp eq i32 %a, 10
    return 10;            %1 = select i1 %0, i32 10, i32 4
  case 20:        ---->   %2 = icmp eq i32 %a, 20
    return 2;             %3 = select i1 %2, i32 2, i32 %1
  default:
    return 4;
}

It also sets the base for further optimizations that are planned and being reviewed.

llvm-svn: 219223
2014-10-07 18:16:44 +00:00
Duncan P. N. Exon Smith 176b691d32 Revert "Revert "DI: Fold constant arguments into a single MDString""
This reverts commit r218918, effectively reapplying r218914 after fixing
an Ocaml bindings test and an Asan crash.  The root cause of the latter
was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a
PR to investigate who requires the loose check (and why).

Original commit message follows.

--

This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 219010
2014-10-03 20:01:09 +00:00
Duncan P. N. Exon Smith 786cd049fc Revert "DI: Fold constant arguments into a single MDString"
This reverts commit r218914 while I investigate some bots.

llvm-svn: 218918
2014-10-02 22:15:31 +00:00
Duncan P. N. Exon Smith 571f97bd90 DI: Fold constant arguments into a single MDString
This patch addresses the first stage of PR17891 by folding constant
arguments together into a single MDString.  Integers are stringified and
a `\0` character is used as a separator.

Part of PR17891.

Note: I've attached my testcases upgrade scripts to the PR.  If I've
just broken your out-of-tree testcases, they might help.

llvm-svn: 218914
2014-10-02 21:56:57 +00:00
Adrian Prantl 87b7eb9d0f Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

Note: I accidentally committed a bogus older version of this patch previously.
llvm-svn: 218787
2014-10-01 18:55:02 +00:00
Adrian Prantl b458dc2eee Revert r218778 while investigating buldbot breakage.
"Move the complex address expression out of DIVariable and into an extra"

llvm-svn: 218782
2014-10-01 18:10:54 +00:00
Adrian Prantl 25a7174e7a Move the complex address expression out of DIVariable and into an extra
argument of the llvm.dbg.declare/llvm.dbg.value intrinsics.

Previously, DIVariable was a variable-length field that has an optional
reference to a Metadata array consisting of a variable number of
complex address expressions. In the case of OpPiece expressions this is
wasting a lot of storage in IR, because when an aggregate type is, e.g.,
SROA'd into all of its n individual members, the IR will contain n copies
of the DIVariable, all alike, only differing in the complex address
reference at the end.

By making the complex address into an extra argument of the
dbg.value/dbg.declare intrinsics, all of the pieces can reference the
same variable and the complex address expressions can be uniqued across
the CU, too.
Down the road, this will allow us to move other flags, such as
"indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

This patch adds a new LLVM-local tag to DIExpressions, so we can detect
and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

This patch does not touch the "Indirect" field in DIVariable; but moving
that into the expression would be a natural next step.

http://reviews.llvm.org/D4919
rdar://problem/17994491

Thanks to dblaikie and dexonsmith for reviewing this patch!

llvm-svn: 218778
2014-10-01 17:55:39 +00:00
Jingyue Wu fc0296704c [SimplifyCFG] threshold for folding branches with common destination
Summary:
This patch adds a threshold that controls the number of bonus instructions
allowed for folding branches with common destination. The original code allows
at most one bonus instruction. With this patch, users can customize the
threshold to allow multiple bonus instructions. The default threshold is still
1, so that the code behaves the same as before when users do not specify this
threshold.

The motivation of this change is that tuning this threshold significantly (up
to 25%) improves the performance of some CUDA programs in our internal code
base. In general, branch instructions are very expensive for GPU programs.
Therefore, it is sometimes worth trading more arithmetic computation for a more
straightened control flow. Here's a reduced example:

  __global__ void foo(int a, int b, int c, int d, int e, int n,
                      const int *input, int *output) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
      sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
    *output = sum;
  }

The select statement in the loop body translates to two branch instructions "if
((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination.
With the default threshold, SimplifyCFG is unable to fold them, because
computing the condition of the second branch "(i | c) ^ d > e" requires two
bonus instructions. With the threshold increased, SimplifyCFG can fold the two
branches so that the loop body contains only one branch, making the code
conceptually look like:

  sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this
particular example. In the configuration where both conditions are guaranteed
to be true, increasing the threshold from 1 to 2 improves the performance by
18.24%. Even in the configuration where the first condition is false and the
second condition is true, which favors shortcuts, increasing the threshold from
1 to 2 still improves the performance by 4.35%.

We are still looking for a good threshold and maybe a better cost model than
just counting the number of bonus instructions. However, according to the above
numbers, we think it is at least worth adding a threshold to enable more
experiments and tuning. Let me know what you think. Thanks!

Test Plan: Added one test case to check the threshold is in effect

Reviewers: nadav, eliben, meheff, resistor, hfinkel

Reviewed By: hfinkel

Subscribers: hfinkel, llvm-commits

Differential Revision: http://reviews.llvm.org/D5529

llvm-svn: 218711
2014-09-30 22:23:38 +00:00
Bjorn Steinbrink 3c33150801 Add a test for hoisting instructions with metadata out of then/else blocks
Test for the bug fixed in r215723.

llvm-svn: 217453
2014-09-09 17:10:21 +00:00
Matt Arsenault 85cbc7e371 Make fabs safe to speculatively execute
llvm-svn: 216736
2014-08-29 16:01:17 +00:00
Manman Ren 062f58d550 [SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion.
When we have a covered lookup table, make sure we don't delete PHINodes that
are cached in PHIs.

rdar://17887153

llvm-svn: 214642
2014-08-02 23:41:54 +00:00
Rafael Espindola d07cf400ab SimplifyCFG: Avoid miscompilations due to removed lifetime intrinsics.
The lifetime intrinsics need some work in order to make it clear which
optimizations are or are not valid.

For now dropping this optimization avoids a miscompilation.

Patch by Björn Steinbrink.

llvm-svn: 214336
2014-07-30 21:04:00 +00:00
Hal Finkel 930469107d Add @llvm.assume, lowering, and some basic properties
This is the first commit in a series that add an @llvm.assume intrinsic which
can be used to provide the optimizer with a condition it may assume to be true
(when the control flow would hit the intrinsic call). Some basic properties are added here:

 - llvm.invariant(true) is dead.
 - llvm.invariant(false) is unreachable (this directly corresponds to the
   documented behavior of MSVC's __assume(0)), so is llvm.invariant(undef).

The intrinsic is tagged as writing arbitrarily, in order to maintain control
dependencies. BasicAA has been updated, however, to return NoModRef for any
particular location-based query so that we don't unnecessarily block code
motion.

llvm-svn: 213973
2014-07-25 21:13:35 +00:00
Manman Ren 29a2005596 Try to fix the bots again by moving test to X86 directory.
llvm-svn: 213884
2014-07-24 17:57:09 +00:00
Manman Ren a8bc9a4c36 Try to fix the bots. If this does not work, I am going to move it to X86 directory.
llvm-svn: 213880
2014-07-24 17:18:33 +00:00
Manman Ren edc60376ed SimplifyCFG: fix a bug in switch to table conversion
We use gep to access the global array "switch.table", and the table index
should be treated as unsigned. When the highest bit is 1, this commit
zero-extends the index to an integer type with larger size.

For a switch on i2, we used to generate:
%switch.tableidx = sub i2 %0, -2
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i2 %switch.tableidx

It is incorrect when %switch.tableidx is 2 or 3. The fix is to generate
%switch.tableidx = sub i2 %0, -2
%switch.tableidx.zext = zext i2 %switch.tableidx to i3
getelementptr inbounds [4 x i64]* @switch.table, i32 0, i3 %switch.tableidx.zext

rdar://17735071

llvm-svn: 213815
2014-07-23 23:13:23 +00:00
Sanjay Patel a932da8f35 Fix for PR17073 ( http://llvm.org/pr17073 ), simplifycfg illegally hoists an operation in a phi node that can trap.
This patch adds to an existing loop over phi nodes in SimplifyCondBranchToCondBranch() to check for trapping ops and bails out of the optimization if we find one of those.

The test cases verify that trapping ops are not hoisted and non-trapping ops are still optimized as expected.

llvm-svn: 212490
2014-07-07 21:19:00 +00:00
Hans Wennborg b03ebfb77e Don't build switch tables for dllimport and TLS variables in GEPs
This is a follow-up to r211331, which failed to notice that we were
returning early from ValidLookupTableConstant for GEPs.

llvm-svn: 211753
2014-06-26 00:30:52 +00:00
Hans Wennborg 4dc895164a Don't build switch lookup tables for dllimport or TLS variables
We would previously put dllimport variables in switch lookup tables, which
doesn't work because the address cannot be used in a constant initializer.
This is basically the same problem that we have in PR19955.

Putting TLS variables in switch tables also desn't work, because the
address of such a variable is not constant.

Differential Revision: http://reviews.llvm.org/D4220

llvm-svn: 211331
2014-06-20 00:38:12 +00:00
Alp Toker d3d017cf00 Reduce verbiage of lit.local.cfg files
We can just split targets_to_build in one place and make it immutable.

llvm-svn: 210496
2014-06-09 22:42:55 +00:00
Matt Arsenault c8fc08c31b Make bitcast, extractelement, and insertelement considered cheap for speculation.
This helps more branches into selects. On R600,
vectors are cheap and anything that helps
remove branches is very good.

llvm-svn: 209914
2014-05-30 18:34:43 +00:00
Louis Gerbarg 1f54b82164 Add ExtractValue instruction to SimplifyCFG's ComputeSpeculationCost
Since ExtractValue is not included in ComputeSpeculationCost CFGs containing
ExtractValueInsts cannot be simplified. In particular this interacts with
InstCombineCompare's tendency to insert add.with.overflow intrinsics for
certain idiomatic math operations, preventing optimization.

This patch adds ExtractValue to the ComputeSpeculationCost. Test case included

rdar://14853450

llvm-svn: 208434
2014-05-09 17:02:46 +00:00
Hans Wennborg b73c0b041d Allow switch-to-lookup table for tables with holes by adding bitmask check
This allows us to generate table lookups for code such as:

  unsigned test(unsigned x) {
    switch (x) {
      case 100: return 0;
      case 101: return 1;
      case 103: return 2;
      case 105: return 3;
      case 107: return 4;
      case 109: return 5;
      case 110: return 6;
      default: return f(x);
    }
  }

Since cases 102, 104, etc. are not constants, the lookup table has holes
in those positions. We therefore guard the table lookup with a bitmask check.

Patch by Jasper Neumann!

llvm-svn: 203694
2014-03-12 18:35:40 +00:00
Tim Northover e94a518a22 IR: add a second ordering operand to cmpxhg for failure
The syntax for "cmpxchg" should now look something like:

	cmpxchg i32* %addr, i32 42, i32 3 acquire monotonic

where the second ordering argument gives the required semantics in the case
that no exchange takes place. It should be no stronger than the first ordering
constraint and cannot be either "release" or "acq_rel" (since no store will
have taken place).

rdar://problem/15996804

llvm-svn: 203559
2014-03-11 10:48:52 +00:00
Matt Arsenault ee364ee729 Allow speculating llvm.sqrt, fma and fmuladd
This doesn't set errno, so this should be OK.
Also update the documentation to explicitly state
that errno are not set.

llvm-svn: 200501
2014-01-31 00:09:00 +00:00
Rafael Espindola ab73c493ea Fix pr14893.
When simplifycfg moves an instruction, it must drop metadata it doesn't know
is still valid with the preconditions changes. In particular, it must drop
the range and tbaa metadata.

The patch implements this with an utility function to drop all metadata not
in a white list.

llvm-svn: 200322
2014-01-28 16:56:46 +00:00
Manman Ren f1cb16e481 PGO branch weight: keep halving the weights until they can fit into
uint32.

When folding branches to common destination, the updated branch weights
can exceed uint32 by more than factor of 2. We should keep halving the
weights until they can fit into uint32.

llvm-svn: 200262
2014-01-27 23:39:03 +00:00