Commit Graph

222 Commits

Author SHA1 Message Date
Jun Bum Lim 4230101def [CodeGenPrep]Restructure promoting Ext to form ExtLoad
Summary:
Instead of just looking for a load which is mergable with Ext to form ExtLoad, trying to promote Exts as long as the cost is acceptable. This change is not a NFC as it continue promoting Exts even after finding a load during promotions; the change in arm64-codegen-prepare-extload.ll described in 2.b might show the case.
This change was motivated from D26524.  Based on this change, I will move the transformation performed in aarch64-type-promotion into CGP.

Reviewers: jmolloy, qcolombet, mcrosier, javed.absar

Reviewed By: qcolombet

Subscribers: rengolin, llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D27853

llvm-svn: 298114
2017-03-17 19:05:21 +00:00
Matt Arsenault 02d915be90 CodeGenPrepare: Sink addressing modes for atomics
llvm-svn: 297903
2017-03-15 22:35:20 +00:00
Michael Kuperstein 13bf8a2684 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296416
2017-02-28 00:11:34 +00:00
Daniel Jasper 3ca4525612 Revert "[CGP] Split some critical edges coming out of indirect branches"
This reverts commit r296149 as it leads to crashes when compiling for
PPC.

llvm-svn: 296295
2017-02-26 11:09:12 +00:00
Eli Friedman c12a5a7595 [CodeGenPrepare] Make -addr-sink-using-gep work with address spaces.
When we construct addressing modes, we use isNoopAddrSpaceCast to ignore
addrspacecast instructions. Make sure we insert the correct addrspacecast
when we reconstruct the addressing mode.

Differential Revision: https://reviews.llvm.org/D30114

llvm-svn: 296167
2017-02-24 20:51:36 +00:00
Michael Kuperstein 46b131e3f8 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296149
2017-02-24 18:41:32 +00:00
Michael Kuperstein 581c9f4b20 Revert r269060 to pacify bots.
llvm-svn: 296064
2017-02-24 01:22:19 +00:00
Michael Kuperstein 12e79d5002 [CGP] Split some critical edges coming out of indirect branches
Splitting critical edges when one of the source edges is an indirectbr
is hard in general (because it requires changing the memory the indirectbr
reads). But if a block only has a single indirectbr predecessor (which is
the common case), we can simulate splitting that edge by splitting
the destination block, and retargeting the *direct* branches.

This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()
ends up using an indirect branch with ~100 successors, and passing a constant to
each of those. Since MachineSink can't break indirect critical edges on demand
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100
defs of registers containing constants, which we in the predecessor block, where
only one of those constants is used in each successor. So, at each computed goto,
we needlessly spill about a 100 constants to stack. The end result is that a
clang-compiled python interpreter can be about ~2.5x slower on a simple python
reduction loop than a gcc-compiled interpreter.

Differential Revision: https://reviews.llvm.org/D29916

llvm-svn: 296060
2017-02-24 00:56:21 +00:00
Geoff Berry 5d534b6a11 [CodeGenPrepare] Sink and duplicate more 'and' instructions.
Summary:
Rework the code that was sinking/duplicating (icmp and, 0) sequences
into blocks where they were being used by conditional branches to form
more tbz instructions on AArch64.  The new code is more general in that
it just looks for 'and's that have all icmp 0's as users, with a target
hook used to select which subset of 'and' instructions to consider.
This change also enables 'and' sinking for X86, where it is more widely
beneficial than on AArch64.

The 'and' sinking/duplicating code is moved into the optimizeInst phase
of CodeGenPrepare, where it can take advantage of the fact the
OptimizeCmpExpression has already sunk/duplicated any icmps into the
blocks where they are used.  One minor complication from this change is
that optimizeLoadExt needed to be updated to always mark 'and's it has
determined should be in the same block as their feeding load in the
InsertedInsts set to avoid an infinite loop of hoisting and sinking the
same 'and'.

This change fixes a regression on X86 in the tsan runtime caused by
moving GVNHoist to a later place in the optimization pipeline (see
PR31382).

Reviewers: t.p.northover, qcolombet, MatzeB

Subscribers: aemerson, mcrosier, sebpop, llvm-commits

Differential Revision: https://reviews.llvm.org/D28813

llvm-svn: 295746
2017-02-21 18:53:14 +00:00
Matt Arsenault 1672b1b86e TargetLowering: Remove AddrSpace parameter from GetAddrModeArguments
It doesn't make any sense to pass in to what is supposed to be parsing
the call, and this can be inferred from the pointer output.

llvm-svn: 294412
2017-02-08 07:09:03 +00:00
Igor Laevsky 3be81bae63 [CodeGenPrepare] Hoist all getSubtargetImpl calls to the beginning of the pass
Differential Revision: https://reviews.llvm.org/D29456

llvm-svn: 294301
2017-02-07 13:27:20 +00:00
Jun Bum Lim b99a06b7c9 [CodeGenPrep]No negative cost in the ExtLd promotion
Summary: This change prevent the signed value of cost from being negative as the value is passed as an unsigned argument.

Reviewers: mcrosier, jmolloy, qcolombet, javed.absar

Reviewed By: mcrosier, qcolombet

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28871

llvm-svn: 293307
2017-01-27 17:16:37 +00:00
Haicheng Wu 8ce2d14356 [CodeGenPrepare] Fix a typo in the comment. NFC.
encode => endcode.

Differential Revision: https://reviews.llvm.org/D28866

llvm-svn: 292438
2017-01-18 21:12:10 +00:00
Wei Mi a2f0b594c2 Redo store splitting in CodeGenPrepare.
This is a succeeding patch of https://reviews.llvm.org/D22840 to address the
issue when a value to be merged into an int64 pair is in a different BB. Redoing
the store splitting in CodeGenPrepare so we can match the pattern across multiple
BBs and move some instructions into the same BB. We still keep the code in dag
combine so that we can catch cases that show up after DAG combining runs.

Differential Revision: https://reviews.llvm.org/D25914

llvm-svn: 290365
2016-12-22 19:44:45 +00:00
George Burgess IV 3f08914e7e [Analysis] Centralize objectsize lowering logic.
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.

llvm-svn: 290214
2016-12-20 23:46:36 +00:00
Jun Bum Lim 90b6b5074a [CodeGenPrep] Skip merging empty case blocks
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block and unit test failures in AVR and WebAssembly :

Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289988
2016-12-16 20:38:39 +00:00
Sanjoy Das 007572706b Inline stripInvariantGroupMetadata out of existence
As a one liner function, I don't think it is pulling its weight in terms
of helping readability.

llvm-svn: 289987
2016-12-16 20:29:39 +00:00
Sanjoy Das 089c699743 Fix CodeGenPrepare::stripInvariantGroupMetadata
`dropUnknownNonDebugMetadata` takes a list of "known" metadata IDs.  The
only reason it worked at all is that `getMetadataID` returns something
unrelated -- it returns the subclass ID of the receiver (which is used
in `dyn_cast` etc.).  That does not numerically match
`LLVMContext::MD_invariant_group` and ends up dropping `invariant_group`
along with every other metadata that does not numerically match
`LLVMContext::MD_invariant_group`.

llvm-svn: 289973
2016-12-16 18:52:33 +00:00
Jun Bum Lim f9416af191 Revert "[CodeGenPrep] Skip merging empty case blocks"
This reverts commit r289951.

llvm-svn: 289960
2016-12-16 17:06:14 +00:00
Jun Bum Lim 85347dde27 [CodeGenPrep] Skip merging empty case blocks
This is recommit of r287553 after fixing the invalid loop info after eliminating an empty block:

Summary: Merging an empty case block into the header block of switch could cause ISel to add COPY instructions in the header of switch, instead of the case block, if the case block is used as an incoming block of a PHI. This could potentially increase dynamic instructions, especially when the switch is in a loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, joerg, davidxl

Subscribers: joerg, qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 289951
2016-12-16 16:03:31 +00:00
Peter Collingbourne ab85225be4 IR: Change the gep_type_iterator API to avoid always exposing the "current" type.
Instead, expose whether the current type is an array or a struct, if an array
what the upper bound is, and if a struct the struct type itself. This is
in preparation for a later change which will make PointerType derive from
Type rather than SequentialType.

Differential Revision: https://reviews.llvm.org/D26594

llvm-svn: 288458
2016-12-02 02:24:42 +00:00
Joerg Sonnenberger caaa82d90d Revert r287553: [CodeGenPrep] Skip merging empty case blocks
It results in assertions in lib/Analysis/BlockFrequencyInfoImpl.cpp line
670 ("Expected irreducible CFG").

llvm-svn: 288052
2016-11-28 18:56:54 +00:00
Justin Lebar 3e50a5be8f [CodeGenPrepare] Don't sink non-cheap addrspacecasts.
Summary:
Previously, CGP would unconditionally sink addrspacecast instructions,
even going so far as to sink them into a loop.

Now we check that the cast is "cheap", as defined by TLI.

We introduce a new "is-cheap" function to TLI rather than using
isNopAddrSpaceCast because some GPU platforms want the ability to ask
for non-nop casts to be sunk.

Reviewers: arsenm, tra

Subscribers: jholewinski, wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26923

llvm-svn: 287591
2016-11-21 22:49:15 +00:00
Justin Lebar 838c7f5a85 [CodeGenPrepare] Rewrite a loop in terms of llvm::none_of. NFC.
Reviewers: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D26924

llvm-svn: 287590
2016-11-21 22:49:11 +00:00
Jun Bum Lim 82f55c5446 [CodeGenPrep] Skip merging empty case blocks
Summary: Merging an empty case block into the header block of switch could cause
ISel to add COPY instructions in the header of switch, instead of the case
block, if the case block is used as an incoming block of a PHI. This could
potentially increase dynamic instructions, especially when the switch is in a
loop. I added a test case which was reduced from the benchmark I was targetting.

Reviewers: t.p.northover, mcrosier, manmanren, wmi, davidxl

Subscribers: qcolombet, danielcdh, hfinkel, mcrosier, llvm-commits

Differential Revision: https://reviews.llvm.org/D22696

llvm-svn: 287553
2016-11-21 16:47:28 +00:00
Simon Pilgrim f2fbf43704 Fix comment typos. NFC.
Identified by Pedro Giffuni in PR27636.

llvm-svn: 287490
2016-11-20 13:47:59 +00:00
Chandler Carruth 0f139b4f71 Hoist check for TLI above all of the attempts to use it (including one
of which that is hidden inside a separate function call) and helpfully
before building expensive transaction infrastructure. This will avoid
crashing when running CGP in a generic mode if we ever managed to hit
this case.

Note that I spent some time looking at alternatives. CGP is actually
used without a TM or TLI in order to do some target-independent testing.
Further, all of the neighboring optimization techniques actually have
some paths that are effective even in the absence of TLI so this seemed
the correct scope at which to check and bypass logic. It still isn't
clear that long-term support for missing TM/TLI is the right
cost/benefit tradeoff for CGP -- we seem to get relatively little for it
and the code is just littered with checks (and assumptions which
I suspect are still missing some checks).

This at least fixes the potential bug in this code spotted by
PVS-Studio, so we've got that going for us. ;]

llvm-svn: 285987
2016-11-04 06:54:00 +00:00
Dehao Chen 302b69c940 Use profile info to set function section prefix to group hot/cold functions.
Summary:
The original implementation is in r261607, which was reverted in r269726 to accomendate the ProfileSummaryInfo analysis pass. The new implementation:
1. add a new metadata for function section prefix
2. query against ProfileSummaryInfo in CGP to set the correct section prefix for each function
3. output the section prefix set by CGP

Reviewers: davidxl, eraman

Subscribers: vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D24989

llvm-svn: 284533
2016-10-18 20:42:47 +00:00
Andrea Di Biagio fa90c692db [CodeGenPrepare] When moving a zext near to its associated load, do not retain the original debug location.
CodeGenPrepare knows how to move a zext of a load into the same basic block
where the load lives. The goal is to help ISel match a zero-extending load
instead of two separated instructions.

CGP attempts to move a zext computation even if it lives in a basic block that
does not post-dominate the load's basic block. That means, the hoisted zext may
be speculated. Preserving the zext location would hurt the debugging experience
and the quality of sample pgo.
With this patch, when moving a zext near to its associated load, CGP no longer
propagates the zext's debug location. Instead, CGP conservatively reuses the
same debug location for the load and the zext.

An alternative approach would be to assign an artificial line-0 location to the
zext. However we don't want to over-use the 'line-0' for this particular case
because it would have a size cost in the line-table section for no additional
benefit.

Differential Revision: https://reviews.llvm.org/D25611

llvm-svn: 284377
2016-10-17 11:32:26 +00:00
Wolfgang Pieb e51bede1d8 Preserve the debug location when CodeGenPrepare sinks a compare instruction into the
basic block of a user.

Patch by Andrea DiBiagio.

Differential Revision: https://reviews.llvm.org/D24632

llvm-svn: 283500
2016-10-06 21:43:45 +00:00
Mehdi Amini 117296c0a0 Use StringRef in Pass/PassManager APIs (NFC)
llvm-svn: 283004
2016-10-01 02:56:57 +00:00
Dehao Chen c32d71253c Fix the bug introduced in r281252.
llvm-svn: 281253
2016-09-12 20:29:54 +00:00
Dehao Chen 9bbb941acf Lower consecutive select instructions correctly.
Summary: If consecutive select instructions are lowered separately in CGP, it will introduce redundant condition check and branches that cannot be removed by later optimization phases. This patch lowers all consecutive select instructions at the same to to avoid inefficent code as demonstrated in https://llvm.org/bugs/show_bug.cgi?id=29095

Reviewers: davidxl

Subscribers: vsk, llvm-commits

Differential Revision: https://reviews.llvm.org/D24147

llvm-svn: 281252
2016-09-12 20:23:28 +00:00
Michael Kuperstein f79af6f8c4 [CGP] Be less conservative about tail-duplicating a ret to allow tail calls
CGP tail-duplicates rets into blocks that end with a call that feed the ret.
This puts the call in tail position, potentially allowing the DAG builder to
lower it as a tail call. To avoid tail duplication in cases where we won't
form the tail call, CGP tried to predict whether this is going to be possible,
and avoids doing it when lowering as a tail call will definitely fail.
However, it was being too conservative by always throwing away calls to
functions with a signext/zeroext attribute on the return type.

Instead, we can use the same logic the builder uses to determine whether the
attributes work out.

Differential Revision: https://reviews.llvm.org/D24315

llvm-svn: 280894
2016-09-08 00:48:37 +00:00
Michael Kuperstein 71321563de Don't reuse a variable name in a nested scope. NFC.
llvm-svn: 280853
2016-09-07 20:29:49 +00:00
Xinliang David Li 241e6c7086 [Profile] preserve branch metadata lowering select in CGP
CGP currently drops select's MD_prof profile data when
generating conditional branch which can lead to bad
code layout. The patch fixes the issue.

Differential Revision: http://reviews.llvm.org/D24169

llvm-svn: 280600
2016-09-03 21:26:36 +00:00
David Majnemer 0d955d0bf5 Use the range variant of find instead of unpacking begin/end
If the result of the find is only used to compare against end(), just
use is_contained instead.

No functionality change is intended.

llvm-svn: 278433
2016-08-11 22:21:41 +00:00
Tim Northover 918f05063c CodeGenPrep: use correct function to determine Global's alignment.
Elsewhere (particularly computeKnownBits) we assume that a global will be
aligned to the value returned by Value::getPointerAlignment. This is used to
boost the alignment on memcpy/memset, so any target-specific request can only
increase that value.

llvm-svn: 275866
2016-07-18 18:28:52 +00:00
Chad Rosier a00df49dc5 Clarify that we match BSwap in InstCombine and BitReverse in CGP. NFC.
Also, rename recognizeBitReverseOrBSwapIdiom to recognizeBSwapOrBitReverseIdiom,
so the ordering of the MatchBSwaps and MatchBitReversals arguments are
consistent with the function name.

llvm-svn: 270715
2016-05-25 16:22:14 +00:00
Jun Bum Lim be11bdc4b0 Rename getLargestLegalIntTypeSize to getLargestLegalIntTypeSizeInBits(). NFC.
Summary: Rename DataLayout::getLargestLegalIntTypeSize to DataLayout::getLargestLegalIntTypeSizeInBits() to prevent similar mistakes  fixed in r269433.

Reviewers: joker.eph, mcrosier

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D20248

llvm-svn: 269456
2016-05-13 18:38:35 +00:00
Sanjay Patel c7b91e65d8 [CGP] avoid crashing from weightlessness
It's possible that we have branch weights with 0 values.
In that case, don't try to create an impossible BranchProbability.

llvm-svn: 268935
2016-05-09 17:31:55 +00:00
David Majnemer 0c80e2eac6 [CodeGenPrepare] Don't sink a cast past its user
The sink cast machinery is supposed to sink casts as close to their user
as possible.  However, an EH pad is the first instruction in it's basic
block.  Don't sink if the user is an EH pad.

This fixes PR27536.

llvm-svn: 267767
2016-04-27 19:36:38 +00:00
Sanjay Patel d66607bd8c [CodeGenPrepare] use branch weight metadata to decide if a select should be turned into a branch
This is part of solving PR27344:
https://llvm.org/bugs/show_bug.cgi?id=27344

CGP should undo the SimplifyCFG transform for the same reason that earlier patches have used this
same mechanism: it's possible that passes between SimplifyCFG and CGP may be able to optimize the
IR further with a select in place.

For the TLI hook default, >99% taken or not taken is chosen as the default threshold for a highly
predictable branch. Even the most limited HW branch predictors will be correct on this branch almost
all the time, so even a massive mispredict penalty perf loss would be overcome by the win from all
the times the branch was predicted correctly.

As a follow-up, we could make the default target hook less conservative by using the SchedMachineModel's
MispredictPenalty. Or we could just let targets override the default by implementing the hook with that
and other target-specific options. Note that trying to statically determine mispredict rates for 
close-to-balanced profile weight data is generally impossible if the HW is sufficiently advanced. Ie, 
50/50 taken/not-taken might still be 100% predictable.

Finally, note that this patch as-is will not solve PR27344 because the current __builtin_unpredictable()
branch weight default values are 4 and 64. A proposal to change that is in D19435.

Differential Revision: http://reviews.llvm.org/D19488

llvm-svn: 267572
2016-04-26 17:11:17 +00:00
Sanjay Patel a31b0c0ece [CodeGenPrepare] don't convert an unpredictable select into control flow
Suggested in the review of D19488:
http://reviews.llvm.org/D19488

llvm-svn: 267504
2016-04-26 00:47:39 +00:00
Sanjay Patel dc88bd6e1f replace duplicated static functions for profile metadata access with BranchInst member function; NFCI
llvm-svn: 267295
2016-04-23 20:01:22 +00:00
Andrew Kaylor aa641a5171 Re-commit optimization bisect support (r267022) without new pass manager support.
The original commit was reverted because of a buildbot problem with LazyCallGraph::SCC handling (not related to the OptBisect handling).

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267231
2016-04-22 22:06:11 +00:00
Vedant Kumar 6013f45f92 Revert "Initial implementation of optimization bisect support."
This reverts commit r267022, due to an ASan failure:

  http://lab.llvm.org:8080/green/job/clang-stage2-cmake-RgSan_check/1549

llvm-svn: 267115
2016-04-22 06:51:37 +00:00
Andrew Kaylor f0f279291c Initial implementation of optimization bisect support.
This patch implements a optimization bisect feature, which will allow optimizations to be selectively disabled at compile time in order to track down test failures that are caused by incorrect optimizations.

The bisection is enabled using a new command line option (-opt-bisect-limit).  Individual passes that may be skipped call the OptBisect object (via an LLVMContext) to see if they should be skipped based on the bisect limit.  A finer level of control (disabling individual transformations) can be managed through an addition OptBisect method, but this is not yet used.

The skip checking in this implementation is based on (and replaces) the skipOptnoneFunction check.  Where that check was being called, a new call has been inserted in its place which checks the bisect limit and the optnone attribute.  A new function call has been added for module and SCC passes that behaves in a similar way.

Differential Revision: http://reviews.llvm.org/D19172

llvm-svn: 267022
2016-04-21 17:58:54 +00:00
Petar Jovanovic 644b8c1a5d Calculate __builtin_object_size when pointer depends on a condition
This patch fixes calculating of builtin_object_size if it depends on a
condition. Before this patch compiler did not know how to calculate the
object size when it finds a condition that cannot be eliminated.
This patch enables calculating of builtin_object_size even in case when
condition cannot be eliminated by choosing minimum or maximum value as a
result from condition. Choosing minimum or maximum value from condition
is based on the second argument of __builtin_object_size function.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D18438

llvm-svn: 266193
2016-04-13 12:25:25 +00:00
Sanjay Patel 892f167aa5 use range-loops; NFCI
llvm-svn: 265985
2016-04-11 20:13:44 +00:00