Commit Graph

6066 Commits

Author SHA1 Message Date
Owen Anderson a8d1c3e74e Fix an issue with the MergeBasicBlockIntoOnlyPred() helper function where it did
not properly handle the case where the predecessor block was the entry block to
the function.  The only in-tree client of this is JumpThreading, which worked
around the issue in its own code.  This patch moves the solution into the helper
so that JumpThreading (and other clients) do not have to replicate the same fix
everywhere.
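
For illustration, a minimal sketch of the tricky case (hypothetical IR, not
from the patch): merging %body into its only predecessor is subtle when that
predecessor is the entry block, because the merged block must remain a valid
entry block with no predecessors.

    define i32 @f(i32 %a) {
    entry:
      br label %body
    body:                                 ; preds = %entry
      %r = add i32 %a, 1
      ret i32 %r
    }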

llvm-svn: 212875
2014-07-12 07:12:47 +00:00
Hal Finkel 511fea7acd Feeding isSafeToSpeculativelyExecute its DataLayout pointer (in Sink)
This is the one remaining place I see where passing
isSafeToSpeculativelyExecute a DataLayout pointer might matter (at least for
loads) -- I think I got the others in r212720. Most of the other remaining
callers of isSafeToSpeculativelyExecute only use it for call sites (or
otherwise exclude loads).

llvm-svn: 212730
2014-07-10 16:07:11 +00:00
Hal Finkel a995f92627 Feeding isSafeToSpeculativelyExecute its DataLayout pointer
isSafeToSpeculativelyExecute can optionally take a DataLayout pointer. In the
past, this was mainly used to make better decisions regarding divisions known
not to trap, and so was not all that important for users concerned with "cheap"
instructions. However, now it also helps look through bitcasts for
dereferenceable loads, and will also be important if/when we add a
dereferenceable pointer attribute.

This is some initial work to feed a DataLayout pointer through to callers of
isSafeToSpeculativelyExecute, generally where one was already available.

llvm-svn: 212720
2014-07-10 14:41:31 +00:00
Hal Finkel 2e42c34d05 Allow isDereferenceablePointer to look through some bitcasts
isDereferenceablePointer should not give up upon encountering any bitcast. If
we're casting from a pointer to a larger type to a pointer to a smaller type, we
can continue by examining the bitcast's operand. This missing capability
was noted in a comment in the function.

In order for this to work, isDereferenceablePointer now takes an optional
DataLayout pointer (essentially all callers already had such a pointer
available). Most code uses isDereferenceablePointer through
isSafeToSpeculativelyExecute (which already took an optional DataLayout
pointer), and to enable the LICM test case, LICM needs to actually provide its DL
pointer to isSafeToSpeculativelyExecute (which it was not doing previously).
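
A minimal sketch of the newly handled pattern (hypothetical IR): the bitcast
goes from a pointer to a larger type to a pointer to a smaller one, so
dereferenceability of the operand carries over.

    @g = global i64 0

    define i32 @speculatable_load() {
      ; i64 (8 bytes) covers i32 (4 bytes), so this load through the
      ; bitcast is still known dereferenceable
      %p = bitcast i64* @g to i32*
      %v = load i32* %p, align 4
      ret i32 %v
    }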

llvm-svn: 212686
2014-07-10 05:27:53 +00:00
Rafael Espindola adf21f2a56 Update the MemoryBuffer API to use ErrorOr.
llvm-svn: 212405
2014-07-06 17:43:13 +00:00
Arnold Schwaighofer ed988fb97d GVN: Preserve invariant.load metadata
If both instructions to be replaced are marked invariant, the resulting
instruction is invariant.
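
A sketch of the intended behaviour (hypothetical IR):

    %a = load i32* %p, align 4, !invariant.load !0
    call void @opaque()
    %b = load i32* %p, align 4, !invariant.load !0
    ; GVN may replace %b with %a; because both loads carry
    ; !invariant.load, the surviving load keeps the metadata

    !0 = metadata !{}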

rdar://13358910

Fix by Erik Eckstein!

llvm-svn: 211801
2014-06-26 19:51:19 +00:00
Eli Bendersky 5d5e18da3e Rename loop unrolling and loop vectorizer metadata to have a common prefix.
[LLVM part]

These patches rename the loop unrolling and loop vectorizer metadata
such that they have a common 'llvm.loop.' prefix.  Metadata name
changes:

llvm.vectorizer.* => llvm.loop.vectorizer.*
llvm.loopunroll.* => llvm.loop.unroll.*
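
For example, a loop hint under the new naming might look like this
(hypothetical IR, 2014-era metadata syntax):

    br i1 %done, label %exit, label %loop, !llvm.loop !0

    !0 = metadata !{metadata !0, metadata !1}
    !1 = metadata !{metadata !"llvm.loop.vectorizer.width", i32 4}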

This was a suggestion from an earlier review
(http://reviews.llvm.org/D4090) which added the loop unrolling
metadata. 

Patch by Mark Heffernan.

llvm-svn: 211710
2014-06-25 15:41:00 +00:00
Evgeniy Stepanov d99cca2c7a Factor out part of LICM::sink into a helper function.
llvm-svn: 211678
2014-06-25 09:17:21 +00:00
Evgeniy Stepanov 10280dac1d [LICM] Don't create more than one copy of an instruction per loop exit block when sinking.
Fixes exponential compilation complexity in PR19835, caused by
LICM::sink not handling the following pattern well:

f = op g
e = op f, g
d = op e
c = op d, e
b = op c
a = op b, c

When an instruction with N uses is sunk, each of its operands gets N
new uses (all of them phi nodes). In the example above, if a had 1
use, c would have 2, e would have 4, and g would have 8.
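
A sketch of the sinking itself (hypothetical IR): an instruction whose only
uses are outside the loop is cloned into the exit block, and with this fix at
most one clone is created per exit block.

    loop:
      %f = mul i32 %g, 2          ; %f is only used outside the loop
      br i1 %cond, label %exit, label %loop
    exit:
      ; after sinking, a single clone lives here, no matter how
      ; many uses %f had:
      %f.sunk = mul i32 %g, 2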

llvm-svn: 211673
2014-06-25 07:54:58 +00:00
Dinesh Dwivedi 8bb5fb0661 Updated comments as suggested by Rafael. Thanks.
llvm-svn: 211268
2014-06-19 14:11:53 +00:00
Dinesh Dwivedi 657105e582 Fixed jump threading going to infinite loop.
This patch adds code to remove unreachable blocks from a function,
as they may cause jump threading to get stuck in an infinite loop.
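
A sketch of the problematic shape (hypothetical IR): blocks unreachable from
the entry block can form cycles that keep presenting jump threading with new
candidate edges.

    dead1:                      ; not reachable from the entry block
      br i1 %c, label %dead2, label %dead1
    dead2:
      br label %dead1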

Differential Revision: http://reviews.llvm.org/D3991

llvm-svn: 211103
2014-06-17 14:34:19 +00:00
Duncan P. N. Exon Smith 73686d305a SROA: Only split loads on byte boundaries
r199771 accidentally broke the logic that makes sure that SROA only splits
loads on byte boundaries.  If such a split happens, some bits get lost
when reassembling loads of wider types, causing data corruption.

Move the width check up to reject such splits early, avoiding the
corruption.  Fixes PR19250.

Patch by: Björn Steinbrink <bsteinbr@gmail.com>

llvm-svn: 211082
2014-06-17 00:19:35 +00:00
Eli Bendersky ff90324599 Teach LoopUnrollPass to respect loop unrolling hints in metadata.
[This is resubmitting r210721, which was reverted due to suspected breakage
which turned out to be unrelated].

Some extra review comments were addressed. See D4090 and D4147 for more details.

The Clang change that produces this metadata was committed in r210667

Patch by Mark Heffernan.

llvm-svn: 211076
2014-06-16 23:53:02 +00:00
Nick Lewycky b06a796051 Remove extra whitespace in function declaration. No functionality change.
llvm-svn: 210965
2014-06-14 03:48:29 +00:00
Jiangning Liu 96e92c1d75 Move GlobalMerge from Transform to CodeGen.
This patch moves the GlobalMerge pass from Transform/Scalar
to CodeGen, because GlobalMerge depends on TargetMachine.
At the same time, the macro INITIALIZE_TM_PASS moves
to CodeGen/Passes.h. With this fix we can avoid making
libScalarOpts depend on libCodeGen.

llvm-svn: 210951
2014-06-13 22:57:59 +00:00
Tim Northover 6bf04e4512 SCCP: update for cmpxchg returning { iN, i1 } now.
I accidentally missed this one since its use looked OK locally.

llvm-svn: 210909
2014-06-13 14:54:09 +00:00
Tim Northover 420a216817 IR: add "cmpxchg weak" variant to support permitted failure.
This commit adds a weak variant of the cmpxchg operation, as described
in C++11. A cmpxchg instruction with this modifier is permitted to
fail to store, even if the comparison indicated it should.

As a result, cmpxchg instructions must return a flag indicating
success in addition to their original iN value loaded. Thus, for
uniformity *all* cmpxchg instructions now return "{ iN, i1 }". The
second flag is 1 when the store succeeded.
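
For example (sketch):

    %pair   = cmpxchg weak i32* %ptr, i32 %old, i32 %new seq_cst seq_cst
    %loaded = extractvalue { i32, i1 } %pair, 0   ; the iN value read
    %ok     = extractvalue { i32, i1 } %pair, 1   ; 1 iff the store happened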

At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been
added as the natural representation for the new cmpxchg instructions.
It is a strong cmpxchg.

By default this gets Expanded to the existing ATOMIC_CMP_SWAP during
Legalization, so existing backends should see no change in behaviour.
If they wish to deal with the enhanced node instead, they can call
setOperationAction on it. Beware: as a node with 2 results, it cannot
be selected from TableGen.

Currently, no use is made of the extra information provided in this
patch. Test updates are almost entirely adapting the input IR to the
new scheme.

Summary for out of tree users:
------------------------------

+ Legacy Bitcode files are upgraded during read.
+ Legacy assembly IR files will be invalid.
+ Front-ends must adapt to different type for "cmpxchg".
+ Backends should be unaffected by default.

llvm-svn: 210903
2014-06-13 14:24:07 +00:00
Rafael Espindola db4ed0bdab Remove 'using std::error_code' from lib.
llvm-svn: 210871
2014-06-13 02:24:39 +00:00
Rafael Espindola 3acea39853 Don't use 'using std::error_code' in include/llvm.
This should make sure that most new uses use the std prefix.

llvm-svn: 210835
2014-06-12 21:46:39 +00:00
Duncan P. N. Exon Smith fd5c553f54 GVN: Enable value forwarding for calloc
Enable value forwarding for loads from `calloc()` without an intervening
store.

This change extends GVN to handle the following case:

    %1 = tail call noalias i8* @calloc(i64 1, i64 4)
    %2 = bitcast i8* %1 to i32*
    ; This load is trivially constant zero
    %3 = load i32* %2, align 4

This is analogous to the handling for `malloc()` in the same places.
`malloc()` returns `undef`; `calloc()` returns a zero value.  Note that
it is correct to return zero even for out-of-bounds GEPs since the
result of such a GEP would be undefined.

Patch by Philip Reames!

llvm-svn: 210828
2014-06-12 21:16:19 +00:00
Eli Bendersky dc6de2ce29 Revert r210721 as it causes breakage in internal builds (and possibly GDB).
llvm-svn: 210807
2014-06-12 18:05:39 +00:00
Eli Bendersky 899bef099f Teach LoopUnrollPass to respect loop unrolling hints in metadata.
See http://reviews.llvm.org/D4090 for more details.

The Clang change that produces this metadata was committed in r210667

Patch by Mark Heffernan.

llvm-svn: 210721
2014-06-11 23:15:35 +00:00
Jiangning Liu d623c528c5 Create macro INITIALIZE_TM_PASS.
Pass initialization requires initializing TargetMachine for back-end
specific passes. This commit creates a new macro INITIALIZE_TM_PASS to
simplify this kind of initialization.

llvm-svn: 210641
2014-06-11 07:04:37 +00:00
Jiangning Liu b2ae37fb67 Global merge for global symbols.
This commit improves the global merge pass to support global symbol merging.
Global symbol merging is not enabled by default. For AArch64, some more
back-end fixes are needed to make it really benefit ADRP CSE.
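
A sketch of the transformation (hypothetical IR; the merged symbol's name is
illustrative):

    @x = internal global i32 0
    @y = internal global i32 0
    ; after merging (sketch): one struct, so one base address
    ; (e.g. a single ADRP on AArch64) serves both globals
    @_MergedGlobals = internal global { i32, i32 } zeroinitializer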

llvm-svn: 210640
2014-06-11 06:44:53 +00:00
Jiangning Liu 3e5b855a51 Rename global-merge to enable-global-merge.
llvm-svn: 210639
2014-06-11 06:35:26 +00:00
Eric Christopher 2af33756c7 We already have a reference to the TargetMachine, use that.
llvm-svn: 210580
2014-06-10 20:39:39 +00:00
Jingyue Wu 5c7b1aed5d [SeparateConstOffsetFromGEP] inbounds zext => sext for better splitting
For each array index that is in the form of zext(a), convert it to sext(a)
if we can prove zext(a) <= max signed value of typeof(a). The conversion
helps to split zext(x + y) into sext(x) + sext(y).
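
A sketch (hypothetical IR): once the sign bit of %i is known clear, the zext
index can be rewritten as a sext.

    %idx = zext i32 %i to i64           ; %i proven non-negative as i32
    %p   = getelementptr inbounds float* %base, i64 %idx
    ; can become (sketch):
    %idx.s = sext i32 %i to i64
    %p2    = getelementptr inbounds float* %base, i64 %idx.s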

Reviewed in http://reviews.llvm.org/D4060

llvm-svn: 210444
2014-06-08 23:49:34 +00:00
Jingyue Wu 01ceeb190d [SeparateConstOffsetFromGEP] Fix an illegitimate optimization on zext
zext(a + b) != zext(a) + zext(b) even if a + b >= 0 && b >= 0.

e.g., a = i4 0b1111, b = i4 0b0001
zext a + b to i8 = zext 0b0000 to i8 = 0b00000000
(zext a to i8) + (zext b to i8) = 0b00001111 + 0b00000001 = 0b00010000

llvm-svn: 210439
2014-06-08 20:19:38 +00:00
Jingyue Wu 48a5abeec0 Refactor canonicalizing array indices to a helper function
No functionality changes.

llvm-svn: 210438
2014-06-08 20:15:45 +00:00
Jingyue Wu 84465473e7 Fixed several correctness issues in SeparateConstOffsetFromGEP
Most issues stem from mishandling s/zext.

Fixes:

1. When rebuilding new indices, s/zext should be distributed to
sub-expressions. e.g., sext(a +nsw (b +nsw 5)) = sext(a) + sext(b) + 5 but not
sext(a + b) + 5. This also affects the logic of recursively looking for a
constant offset: we need to include s/zext in the context of the search.

2. Function find should return the bitwidth of the constant offset instead of
always sign-extending it to i64.

3. Stop shortcutting zext'ed GEP indices. LLVM conceptually sign-extends GEP
indices to pointer-size before computing the address. Therefore, gep base,
zext(a + b) != gep base, a + b

Improvements:

1. Add an optimization for splitting sext(a + b): if a + b is proven
non-negative (e.g., used as an index of an inbounds GEP) and one of a, b is
non-negative, sext(a + b) = sext(a) + sext(b)

2. Function Distributable checks whether both sext and zext can be distributed
to operands of a binary operator. This helps us split zext(sext(a + b)) into
zext(sext(a)) + zext(sext(b)) when a + b does not overflow, signed or unsigned.
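
A sketch of improvement 1 (hypothetical IR):

    ; a + 5 proven non-negative (it feeds an inbounds GEP), 5 >= 0:
    %add = add nsw i32 %a, 5
    %idx = sext i32 %add to i64
    ; can be split into (sketch):
    %a64  = sext i32 %a to i64
    %idx2 = add i64 %a64, 5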

Refactoring:

Merge some common logic of handling add/sub/or in find.

Testing:

Add many tests in split-gep.ll and split-gep-and-gvn.ll to verify the changes
we made.

llvm-svn: 210291
2014-06-05 22:07:33 +00:00
Benjamin Kramer 4968944376 [Reassociate] Similar to "X + -X" -> "0", added code to handle "X + ~X" -> "-1".
Handle "X + ~X" -> "-1" in the function Value *Reassociate::OptimizeAdd(Instruction *I, SmallVectorImpl<ValueEntry> &Ops);
This patch implements:
TODO: We could handle "X + ~X" -> "-1" if we wanted, since "-X = ~X+1".
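
A sketch in IR (hypothetical names): since ~X = -X - 1, the sum X + ~X is
always -1.

    %nx = xor i32 %x, -1          ; ~X
    %r  = add i32 %x, %nx         ; X + ~X
    ; reassociation folds %r to the constant i32 -1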

Patch by Rahul Jain!

Differential Revision: http://reviews.llvm.org/D3835

llvm-svn: 209973
2014-05-31 15:01:54 +00:00
Michael J. Spencer 289067cc3d Add LoadCombine pass.
This pass is disabled by default. Use -combine-loads to enable in -O[1-3]
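
A sketch of the idea (hypothetical IR; the exact patterns matched may
differ): adjacent narrow loads from the same base are combined into one wider
load.

    %p1 = getelementptr i8* %p, i64 1
    %b0 = load i8* %p
    %b1 = load i8* %p1
    ; may become a single wider load (sketch):
    %pw = bitcast i8* %p to i16*
    %w  = load i16* %pw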

Differential revision: http://reviews.llvm.org/D3580

llvm-svn: 209791
2014-05-29 01:55:07 +00:00
Jingyue Wu 80a738dc62 Distribute sext/zext to the operands of and/or/xor
This is an enhancement to SeparateConstOffsetFromGEP. With this patch, we can
extract a constant offset from "s/zext and/or/xor A, B".
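
A sketch of the new case (hypothetical IR): when the or acts like an add
because the operand's low bits are known zero, the constant can be peeled out
through the zext.

    ; %a's low two bits are known zero, so the "or" behaves like "add":
    %or  = or i32 %a, 3
    %idx = zext i32 %or to i64
    %p   = getelementptr inbounds float* %base, i64 %idx
    ; the pass can extract the constant offset 3 from the index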

Added a new test @ext_or to verify this enhancement.

While refactoring the code, I also extracted some common logic into the
function Distributable.

llvm-svn: 209670
2014-05-27 18:00:00 +00:00
Owen Anderson 115aa160e6 Make the LoopRotate pass's maximum header size configurable both programmatically
and via the command line, mirroring similar functionality in LoopUnroll.  In
situations where clients used custom unrolling thresholds, their intent could
previously be foiled by LoopRotate having a hardcoded threshold.

llvm-svn: 209617
2014-05-26 08:58:51 +00:00
Jingyue Wu bbb6e4a885 Add the extracted constant offset using GEP
Fixed a TODO in r207783.

Add the extracted constant offset using GEP instead of ugly
ptrtoint+add+inttoptr. Using GEP simplifies future optimizations and makes IR
easier to understand. 
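
A sketch of the difference (hypothetical IR):

    ; before: the "uglygep" form
    %i  = ptrtoint float* %p to i64
    %i2 = add i64 %i, 32
    %q  = inttoptr i64 %i2 to float*
    ; after: an i8 GEP carrying the byte offset (sketch)
    %c  = bitcast float* %p to i8*
    %c2 = getelementptr i8* %c, i64 32
    %q2 = bitcast i8* %c2 to float*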

Updated all affected tests, and added a new test in split-gep.ll to cover a
corner case where emitting uglygep is necessary.

llvm-svn: 209537
2014-05-23 18:39:40 +00:00
Diego Novillo 7f8af8bf91 Add support for missed and analysis optimization remarks.
Summary:
This adds two new diagnostics: -pass-remarks-missed and
-pass-remarks-analysis. They take the same values as -pass-remarks but
are intended to be triggered in different contexts.

-pass-remarks-missed is used by LLVMContext::emitOptimizationRemarkMissed,
which passes call when they tried to apply a transformation but
couldn't.

-pass-remarks-analysis is used by LLVMContext::emitOptimizationRemarkAnalysis,
which passes call when they want to inform the user about analysis
results.

The patch also:

1- Adds support in the inliner for the two new remarks and a
   test case.

2- Moves emitOptimizationRemark* functions to the llvm namespace.

3- Adds an LLVMContext argument instead of making them member functions
   of LLVMContext.

Reviewers: qcolombet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D3682

llvm-svn: 209442
2014-05-22 14:19:46 +00:00
Quentin Colombet c88baa5c10 [LSR] Canonicalize reg1 + ... + regN into reg1 + ... + 1*regN.
This commit introduces a canonical representation for the formulae.
Basically, as soon as a formula has more than one base register, the scaled
register field is used for one of them. The register put into the scaled
register is preferably a loop variant.
The commit refactors how the formulae are built in order to produce such
representation.
This yields a more accurate, but still perfectible, cost model.

<rdar://problem/16731508>

llvm-svn: 209230
2014-05-20 19:25:04 +00:00
Matt Arsenault 04b67ceeeb Use range-based for loops.
llvm-svn: 209147
2014-05-19 17:52:48 +00:00
Rafael Espindola 5a52b9f139 Revert "Implement global merge optimization for global variables."
This reverts commit r208934.

The patch depends on aliases to GEPs with non-zero offsets. That is not
supported and fairly broken.

The good news is that GlobalAlias is being redesigned and will have support
for offsets, so this patch should be a nice match for it.

llvm-svn: 208978
2014-05-16 13:02:18 +00:00
Jiangning Liu 932e1c3924 Implement global merge optimization for global variables.
This commit implements two command-line switches -global-merge-on-external
and -global-merge-aligned, and both of them are false by default, so this
optimization is disabled by default for all targets.

For ARM64, some back-end behaviors need to be tuned before this optimization
can be enabled more widely.

llvm-svn: 208934
2014-05-15 23:45:42 +00:00
Alp Toker beaca19c7c Fix typos
llvm-svn: 208839
2014-05-15 01:52:21 +00:00
Jay Foad a0653a3e6c Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Benjamin Kramer 3f085ba6d6 GVN: Fix non-determinism in map iteration.
Iterating over a DenseMap is non-deterministic and results in unpredictable IR
output.

Based on a patch by Daniel Reynaud!

llvm-svn: 208728
2014-05-13 21:06:40 +00:00
Benjamin Kramer d97f95e2f2 GVN: rangify a couple of loops.
No functionality change.

llvm-svn: 208727
2014-05-13 21:06:36 +00:00
Nick Lewycky 54a824b758 Improve wording to make it sound more like a change than an analysis.
llvm-svn: 208370
2014-05-08 23:04:46 +00:00
Richard Smith c45f3f7433 Simplify and fix incorrect comment. No functionality change.
llvm-svn: 208272
2014-05-08 01:08:43 +00:00
Nick Lewycky 7185b5d60c Detabify.
llvm-svn: 208019
2014-05-06 00:46:20 +00:00
Nick Lewycky 5ef6bc8815 Improve 'tail' call marking in TRE. A bootstrap of clang goes from 375k calls marked tail in the IR to 470k; however, this improvement does not carry over into an improvement of the call/jmp ratio on x86. The most common pattern is a tail call + br to a block with nothing but a 'ret'.
The number of tail call to loop conversions remains the same (1618 by my count).

The new algorithm does a local scan over the use-def chains to identify local "alloca-derived" values, as well as points where the alloca could escape. Then, a visit over the CFG marks blocks as being before or after the allocas have escaped, and annotates the calls accordingly.
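
A sketch of the common case (hypothetical IR): when the scan proves that no
alloca-derived pointer escapes into the call, the call is marked, even though
the branch-to-ret pattern means x86 still emits a call rather than a jmp.

    %r = tail call i32 @callee(i32 %x)
    br label %out
    out:
      ret i32 %r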

llvm-svn: 208017
2014-05-05 23:59:03 +00:00
Benjamin Kramer 9130cb8547 LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling.
Otherwise we use the same threshold as for complete unrolling, which is
way too high. This made us unroll any loop smaller than 150 instructions
8 times, but only if someone specified -march=core2 or better,
which happens to be the default on darwin.

llvm-svn: 207940
2014-05-04 19:12:38 +00:00
Akira Hatanaka f76388dd7e [GVN] Pass the phi-translated address of a load instead of the untranslated
address to AnalyzeLoadFromClobberingLoad. This fixes a bug in load-PRE where
PRE is applied to a load that is not partially redundant.

<rdar://problem/16638765>.

llvm-svn: 207853
2014-05-02 17:59:17 +00:00