Commit Graph

13971 Commits

Author SHA1 Message Date
Chandler Carruth a8125350ea [FunctionAttrs] Separate another chunk of the logic for functionattrs
from its pass harness by providing a lambda to query for AA results.

This allows the legacy pass to easily provide a lambda that uses the
special helpers to construct function AA results from a legacy CGSCC
pass. With the new pass manager (the next patch) the lambda just
directly wraps the intuitive query API.

llvm-svn: 251715
2015-10-30 16:48:08 +00:00
Dehao Chen 49359bf3d7 Recommit r251680 (also need to update clang test)
Update the discriminator assignment algorithm

* If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
* If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.

original code:
; #1 int foo(int i) {
; #2 if (i == 3 || i == 5) return 100; else return 99;
; #3 }

; i == 3: discriminator 0
; i == 5: discriminator 2
; return 100: discriminator 1
; return 99: discriminator 3

llvm-svn: 251689
2015-10-30 05:07:15 +00:00
Dehao Chen 4d84b9321e Revert r251680:
Update the discriminator assignment algorithm

* If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
* If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.

original code:
; #1 int foo(int i) {
; #2 if (i == 3 || i == 5) return 100; else return 99;
; #3 }

; i == 3: discriminator 0
; i == 5: discriminator 2
; return 100: discriminator 1
; return 99: discriminator 3

llvm-svn: 251685
2015-10-30 04:29:05 +00:00
Dehao Chen 9a5d2b18e0 Update the discriminator assignment algorithm
* If a scope has already been assigned a discriminator, do not reassign a nested discriminator for it.
* If the file and line both match, even if the column does not match, we should assign a new discriminator for the stmt.

original code:
; #1 int foo(int i) {
; #2 if (i == 3 || i == 5) return 100; else return 99;
; #3 }

; i == 3: discriminator 0
; i == 5: discriminator 2
; return 100: discriminator 1
; return 99: discriminator 3

llvm-svn: 251680
2015-10-30 02:38:29 +00:00
Dehao Chen 7ddf7865b4 clang-format lib/Transforms/Utils/AddDiscriminators.cpp
llvm-svn: 251656
2015-10-29 21:25:33 +00:00
Chandler Carruth c518ebddf3 [FunctionAttrs] Provide a single SCC node set to all of the
transformations in FunctionAttrs rather than building a new one each
time.

This isn't trivial because there are different heuristics from different
passes for exactly what set they want. The primary difference is whether
an *overridable* function completely disables the synthesis of
attributes. I've modeled this by directly testing for overridable, and
using the common set that excludes external and opt-none functions.

This does cause some changes by disabling more optimizations in the face
of opt-none. Specifically, we were still optimizing *calls* to opt-none
functions based on their attributes, just not the bodies. It seems
better to be conservative on both fronts given the intended semanticas
here (best effort to not assume or disturb anything). I've not tried to
test this change as it seems complex, brittle, and not important to the
implicit contract of opt-none. Instead, it seems more like a choice that
should be dictated by the simplified implementation and the change to be
acceptable differences within the space of opt-none.

A big benefit here is that these transformations no longer rely on the
legacy pass manager's SCC types, they just work on generic sets of
function pointers. This will make it easy to re-use their logic in the
new pass manager.

I've also made the transforms static functions instead of members where
trivial while I was touching the signatures.

llvm-svn: 251640
2015-10-29 18:29:15 +00:00
Adhemerval Zanella 1edb084919 [sanitizer] [msan] Unify aarch64 mapping
This patch unify the 39-bit and 42-bit mapping for aarch64 to use only
one instrumentation algorithm.  This removes compiler flag 
SANITIZER_AARCH64_VMA requirement for MSAN on aarch64.

The mapping to use now is for 39 and 42-bits:

    0x00000000000ULL-0x01000000000ULL  MappingDesc::INVALID
    0x01000000000ULL-0x02000000000ULL  MappingDesc::SHADOW
    0x02000000000ULL-0x03000000000ULL  MappingDesc::ORIGIN
    0x03000000000ULL-0x04000000000ULL  MappingDesc::SHADOW
    0x04000000000ULL-0x05000000000ULL  MappingDesc::ORIGIN
    0x05000000000ULL-0x06000000000ULL  MappingDesc::APP
    0x06000000000ULL-0x07000000000ULL  MappingDesc::INVALID
    0x07000000000ULL-0x08000000000ULL  MappingDesc::APP

And only for 42-bits:

    0x08000000000ULL-0x09000000000ULL  MappingDesc::INVALID
    0x09000000000ULL-0x0A000000000ULL  MappingDesc::SHADOW
    0x0A000000000ULL-0x0B000000000ULL  MappingDesc::ORIGIN
    0x0B000000000ULL-0x0F000000000ULL  MappingDesc::INVALID
    0x0F000000000ULL-0x10000000000ULL  MappingDesc::APP
    0x10000000000ULL-0x11000000000ULL  MappingDesc::INVALID
    0x11000000000ULL-0x12000000000ULL  MappingDesc::APP
    0x12000000000ULL-0x17000000000ULL  MappingDesc::INVALID
    0x17000000000ULL-0x18000000000ULL  MappingDesc::SHADOW
    0x18000000000ULL-0x19000000000ULL  MappingDesc::ORIGIN
    0x19000000000ULL-0x20000000000ULL  MappingDesc::INVALID
    0x20000000000ULL-0x21000000000ULL  MappingDesc::APP
    0x21000000000ULL-0x26000000000ULL  MappingDesc::INVALID
    0x26000000000ULL-0x27000000000ULL  MappingDesc::SHADOW
    0x27000000000ULL-0x28000000000ULL  MappingDesc::ORIGIN
    0x28000000000ULL-0x29000000000ULL  MappingDesc::SHADOW
    0x29000000000ULL-0x2A000000000ULL  MappingDesc::ORIGIN
    0x2A000000000ULL-0x2B000000000ULL  MappingDesc::APP
    0x2B000000000ULL-0x2C000000000ULL  MappingDesc::INVALID
    0x2C000000000ULL-0x2D000000000ULL  MappingDesc::SHADOW
    0x2D000000000ULL-0x2E000000000ULL  MappingDesc::ORIGIN
    0x2E000000000ULL-0x2F000000000ULL  MappingDesc::APP
    0x2F000000000ULL-0x39000000000ULL  MappingDesc::INVALID
    0x39000000000ULL-0x3A000000000ULL  MappingDesc::SHADOW
    0x3A000000000ULL-0x3B000000000ULL  MappingDesc::ORIGIN
    0x3B000000000ULL-0x3C000000000ULL  MappingDesc::APP
    0x3C000000000ULL-0x3D000000000ULL  MappingDesc::INVALID
    0x3D000000000ULL-0x3E000000000ULL  MappingDesc::SHADOW
    0x3E000000000ULL-0x3F000000000ULL  MappingDesc::ORIGIN
    0x3F000000000ULL-0x40000000000ULL  MappingDesc::APP

And although complex it provides a better memory utilization that
previous one.

llvm-svn: 251624
2015-10-29 13:02:30 +00:00
Daniel Jasper 1de905a667 Fix use-after-free. Thanks ASAN for giving me a detailed report :-).
llvm-svn: 251623
2015-10-29 12:49:37 +00:00
Cong Hou 45bd8ce64c Revert the revision 251592 as it fails a test on some platforms.
llvm-svn: 251617
2015-10-29 05:35:22 +00:00
Xinliang David Li 7a88ad6476 [PGO] Do not emit runtime hook user function for Linux
Clang driver now injects -u<hook_var> flag in the linker 
command line, in which case user function is not needed 
any more.

Differential Revision: http://reviews.llvm.org/D14033

llvm-svn: 251612
2015-10-29 04:08:31 +00:00
Philip Reames eb3e9dad7f [LVI/CVP] Teach LVI about range metadata
Somewhat shockingly for an analysis pass which is computing constant ranges, LVI did not understand the ranges provided by range metadata.

As part of this change, I included a change to CVP primarily because doing so made it much easier to write small self contained test cases. CVP was previously only handling the non-local operand case, but given that LVI can sometimes figure out information about instructions standalone, I don't see any reason to restrict this.  There could possibly be a compile time impact from this, but I suspect it should be minimal.  If anyone has an example which substaintially regresses, please let me know.  I could restrict the block local handling to ICmps feeding Terminator instructions if needed.  

Note that this patch continues a somewhat bad practice in LVI. In many cases, we know facts about values, and separate context sensitive facts about values. LVI makes no effort to distinguish and will frequently cache the same value fact repeatedly for different contexts. I would like to change this, but that's a large enough change that I want it to go in separately with clear documentation of what's changing. Other examples of this include the non-null handling, and arguments.

As a meta comment: the entire motivation of this change was being able to write smaller (aka reasonable sized) test cases for a future patch teaching LVI about select instructions.

Differential Revision: http://reviews.llvm.org/D13543

llvm-svn: 251606
2015-10-29 03:57:17 +00:00
Philip Reames 846e3e41ed [SimplifyCFG] Constant fold a branch implied by it's incoming edge
The most common use case is when eliminating redundant range checks in an example like the following:
c = a[i+1] + a[i];

Note that all the smarts of the transform (the implication engine) is already in ValueTracking and is tested directly through InstructionSimplify.

Differential Revision: http://reviews.llvm.org/D13040

llvm-svn: 251596
2015-10-29 03:11:49 +00:00
Davide Italiano a904e520c2 [SimplifyLibCalls] Factor out common unsafe-math checks.
llvm-svn: 251595
2015-10-29 02:58:44 +00:00
Cong Hou abe042bb3e Add a flag vectorizer-maximize-bandwidth in loop vectorizer to enable using larger vectorization factor.
To be able to maximize the bandwidth during vectorization, this patch provides a new flag vectorizer-maximize-bandwidth. When it is turned on, the vectorizer will determine the vectorization factor (VF) using the smallest instead of widest type in the loop. To avoid increasing register pressure too much, estimates of the register usage for different VFs are calculated so that we only choose a VF when its register usage doesn't exceed the number of available registers.

llvm-svn: 251592
2015-10-29 01:28:44 +00:00
Diego Novillo 748b3ffe3b SamplePGO - Add flag to check sampling coverage.
This adds the flag -mllvm -sample-profile-check-coverage=N to the
SampleProfile pass. N is the percent of input sample records that the
user expects to apply.  If the pass does not use N% (or more) of the
sample records in the input, it emits a warning.

This is useful to detect some forms of stale profiles. If the code has
drifted enough from the original profile, there will be records that do
not match the IR anymore.

This will not detect cases where a sample profile record for line L is
referring to some other instructions that also used to be at line L.

llvm-svn: 251568
2015-10-28 22:30:25 +00:00
Sanjoy Das 13e63a2f21 [JumpThreading] Use dominating conditions to prove implications
Summary:
If P branches to Q conditional on C and Q branches to R conditional on
C' and C => C' then the branch conditional on C' can be folded to an
unconditional branch.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13972

llvm-svn: 251557
2015-10-28 21:27:08 +00:00
Diego Novillo a8a3bd2100 SamplePGO - Clear per-function data after applying a profile.
The pass was keeping around a lot of per-function data (visited blocks,
edges, dominance, etc) that is just taking up memory for no reason. In
fact, from function to function it could potentially confuse the
propagator since some maps are indexed by line offsets which can be
common between functions.

llvm-svn: 251531
2015-10-28 17:40:22 +00:00
Chad Rosier 7142da0ed4 Typo.
llvm-svn: 251521
2015-10-28 15:08:33 +00:00
Chad Rosier 7967614b2b Reapply: [LIR] Add support for creating memsets from loops with a negative stride.
The simple fix is to prevent forming memcpy from loops with a negative stride.

llvm-svn: 251518
2015-10-28 14:38:49 +00:00
James Molloy ef607a2089 [GlobalOpt] Add newlines to DEBUG messages
I think these were affected by a change way back when to stop printing newlines in Value::dump() by default. This change simply allows the debug output to be readable.

NFC.

llvm-svn: 251517
2015-10-28 14:30:53 +00:00
Chad Rosier 8eb2a18a9f Revert "[LIR] Add support for creating memsets from loops with a negative stride."
This reverts commit r251512.  This is causing LNT/chomp to fail.

llvm-svn: 251513
2015-10-28 13:54:09 +00:00
Chad Rosier d6a6bd5501 [LIR] Add support for creating memsets from loops with a negative stride.
http://reviews.llvm.org/D14125

llvm-svn: 251512
2015-10-28 12:55:34 +00:00
Chen Li 8d23a9bbef Revert r251492 "[IndVarSimplify] Rewrite loop exit values with their
initial values from loop preheader", because it broke some bots.

llvm-svn: 251498
2015-10-28 05:15:51 +00:00
Chen Li 032a5d0cea [IndVarSimplify] Rewrite loop exit values with their initial values from loop preheader
Summary:
This patch adds support to check if a loop has loop invariant conditions which lead to loop exits. If so, we know that if the exit path is taken, it is at the first loop iteration. If there is an induction variable used in that exit path whose value has not been updated, it will keep its initial value passing from loop preheader. We can therefore rewrite the exit value with
its initial value. This will help remove phis created by LCSSA and enable other optimizations like loop unswitch.


Reviewers: sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13974

llvm-svn: 251492
2015-10-28 04:45:47 +00:00
David Majnemer 492937095f [SimplifyCFG] Don't DCE catchret because the successor is unreachable
CatchReturnInst has side-effects: it runs a destructor.  This destructor
could conceivably run forever/call exit/etc. and should not be removed.

llvm-svn: 251461
2015-10-27 22:43:56 +00:00
NAKAMURA Takumi 6f49ecc3c4 Whitespace.
llvm-svn: 251437
2015-10-27 19:02:52 +00:00
NAKAMURA Takumi 7ef7293b40 Revert r251291, "Loop Vectorizer - skipping "bitcast" before GEP"
It causes miscompilation of llvm/lib/ExecutionEngine/Interpreter/Execution.cpp.
See also PR25324.

llvm-svn: 251436
2015-10-27 19:02:36 +00:00
Diego Novillo aa55507ff9 Tidy a comment. NFC.
llvm-svn: 251434
2015-10-27 18:41:46 +00:00
Charlie Turner ab3215fa11 [SLP] Be more aggressive about reduction width selection.
Summary:
This change could be way off-piste, I'm looking for any feedback on whether it's an acceptable approach.

It never seems to be a problem to gobble up as many reduction values as can be found, and then to attempt to reduce the resulting tree. Some of the workloads I'm looking at have been aggressively unrolled by hand, and by selecting reduction widths that are not constrained by a vector register size, it becomes possible to profitably vectorize. My test case shows such an unrolling which SLP was not vectorizing (on neither ARM nor X86) before this patch, but with it does vectorize.

I measure no significant compile time impact of this change when combined with D13949 and D14063. There are also no significant performance regressions on ARM/AArch64 in SPEC or LNT.

The more principled approach I thought of was to generate several candidate tree's and use the cost model to pick the cheapest one. That seemed like quite a big design change (the algorithms seem very much one-shot), and would likely be a costly thing for compile time. This seemed to do the job at very little cost, but I'm worried I've misunderstood something!

Reviewers: nadav, jmolloy

Subscribers: mssimpso, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D14116

llvm-svn: 251428
2015-10-27 17:59:03 +00:00
Charlie Turner cd6e8cf8c2 [SLP] Try a bit harder to find reduction PHIs
Summary:
Currently, when the SLP vectorizer considers whether a phi is part of a reduction, it dismisses phi's whose incoming blocks are not the same as the block containing the phi. For the patterns I'm looking at, extending this rule to allow phis whose incoming block is a containing loop latch allows me to vectorize certain workloads.

There is no significant compile-time impact, and combined with D13949, no performance improvement measured in ARM/AArch64 in any of SPEC2000, SPEC2006 or LNT.

Reviewers: jmolloy, mcrosier, nadav

Subscribers: mssimpso, nadav, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D14063

llvm-svn: 251425
2015-10-27 17:54:16 +00:00
Charlie Turner 74c387feb7 [SLP] Treat SelectInsts as reduction values.
Summary:
Certain workloads, in particular sum-of-absdiff loops, can be vectorized using SLP if it can treat select instructions as reduction values.

The test case is a bit awkward. The AArch64 cost model needs some tuning to not be so pessimistic about selects. I've had to tweak the SLP threshold here.

Reviewers: jmolloy, mzolotukhin, spatel, nadav

Subscribers: nadav, mssimpso, aemerson, llvm-commits

Differential Revision: http://reviews.llvm.org/D13949

llvm-svn: 251424
2015-10-27 17:49:11 +00:00
Diego Novillo c04270d2e4 Fix SamplePGO segfault when debug info is missing.
When emitting a remark for a conditional branch annotation, the remark
uses the line location information of the conditional branch in the
message.  In some cases, that information is unavailable and the
optimization would segfaul. I'm still not sure whether this is a bug or
WAI, but the optimizer should not die because of this.

llvm-svn: 251420
2015-10-27 17:37:00 +00:00
Davide Italiano c692688cbd [SimplifyLibCalls] Use range-based loop. No functional change.
llvm-svn: 251383
2015-10-27 04:17:51 +00:00
Chandler Carruth 69798fb5ec [function-attrs] Refactor code to handle shorter code with early exits.
No functionality changed here, but the indentation is substantially
reduced and IMO the code is much easier to read. I've also added some
helpful comments.

This is just a clean-up I wrote while studying the code, and that has
been in my backlog for a while.

llvm-svn: 251381
2015-10-27 01:41:43 +00:00
Diego Novillo e822b63681 Remove unused local variable. NFC.
llvm-svn: 251344
2015-10-26 20:50:26 +00:00
Igor Laevsky 1ef06559f4 [RS4GC] Strip noalias attribute after statepoint rewrite
We should remove noalias along with dereference and dereference_or_null attributes 
because statepoint could potentially touch the entire heap including noalias objects.

Differential Revision: http://reviews.llvm.org/D14032

llvm-svn: 251333
2015-10-26 19:06:01 +00:00
Diego Novillo 7963ea1996 SamplePGO - Add optimization reports.
This adds a couple of optimization remarks to the SamplePGO
transformation. When it decides to inline a hot function (to mimic the
inline stack and repeat useful inline decisions in the original build).

It will also report branch destinations. For instance, given the code
fragment:

     6      if (i < 1000)
     7        sum -= i;
     8      else
     9        sum += -i * rand();

If the 'else' branch is taken most of the time, building this code with
-Rpass=sample-profile will produce:

a.cc:9:14: remark: most popular destination for conditional branches at small.cc:6:9 [-Rpass=sample-profile]
      sum += -i * rand();
             ^

llvm-svn: 251330
2015-10-26 18:52:53 +00:00
David Blaikie 94c83370b5 Move the canonical header to the top of its matching cpp file as per coding convention
This ensures that the header will be verified to be standalone (and
avoid mistakes like the one fixed in r251178)

llvm-svn: 251326
2015-10-26 18:40:56 +00:00
Evgeniy Stepanov d1aad26589 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

The previous iteration of this change was reverted in r250461. This
version leaves the generic, compiler-rt based implementation in
SafeStack.cpp instead of moving it to TargetLoweringBase in order to
allow testing without a TargetMachine.

llvm-svn: 251324
2015-10-26 18:28:25 +00:00
Alexey Samsonov 145b0fd2a0 Refactor: Simplify boolean conditional return statements in lib/Transforms/Instrumentation
Summary: Use clang-tidy to simplify boolean conditional return statements.

Differential Revision: http://reviews.llvm.org/D9996

Patch by Richard (legalize@xmission.com)!

llvm-svn: 251318
2015-10-26 18:06:40 +00:00
Elena Demikhovsky 7a77149391 Loop Vectorizer - skipping "bitcast" before GEP
Vectorization of memory instruction (Load/Store) is possible when the pointer is coming from GEP. The GEP analysis allows to estimate the profit.
In some cases we have a "bitcast" between GEP and memory instruction.
I added code that skips the "bitcast".

http://reviews.llvm.org/D13886

llvm-svn: 251291
2015-10-26 13:42:41 +00:00
Silviu Baranga b892e35520 [InstCombine] Teach instcombine not to create extra PHI nodes when folding GEPs
Summary:
InstCombine tries to transform GEP(PHI(GEP1, GEP2, ..)) into GEP(GEP(PHI(...))
when possible. However, this may leave the old PHI node around. Even if we
do end up folding the GEPs, having an extra PHI node might not be beneficial.

This change makes the transformation more conservative. We now only do this if
the PHI has only one use, and can therefore be removed after the transformation.

Reviewers: jmolloy, majnemer

Subscribers: mcrosier, mssimpso, llvm-commits

Differential Revision: http://reviews.llvm.org/D13887

llvm-svn: 251281
2015-10-26 10:25:05 +00:00
Benjamin Kramer 8ceb323bb4 Convert assert(false) into llvm_unreachable where it makes sense.
llvm-svn: 251266
2015-10-25 22:28:27 +00:00
Sanjoy Das 15c4c4604f [LCSSA] Unbreak build, don't reuse L; NFC
The build broke in r251248.

llvm-svn: 251251
2015-10-25 19:27:17 +00:00
Sanjoy Das 331521c688 [LCSSA] Use range for loops; NFC
llvm-svn: 251248
2015-10-25 19:08:32 +00:00
Michael Zolotukhin 1eeb2da7d4 Refactor: Simplify boolean conditional return statements in lib/Transforms/Vectorize (NFC).
Summary: Use clang-tidy to simplify boolean conditional return statements

Differential Revision: http://reviews.llvm.org/D10003

Patch by Richard<legalize@xmission.com>

llvm-svn: 251206
2015-10-24 20:16:42 +00:00
NAKAMURA Takumi 26c3872666 ScalarReplAggregates.cpp: Try to appease clash of anonymous::SROA in modules build.
llvm-svn: 251181
2015-10-24 06:42:42 +00:00
Igor Laevsky dde0029a25 [RS4GC] Rename stripDereferenceabilityInfo into stripNonValidAttributes.
llvm-svn: 251157
2015-10-23 22:42:44 +00:00
Chen Li 7009cd3554 Revert rL251061 [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
llvm-svn: 251149
2015-10-23 21:13:01 +00:00
Hal Finkel f2199b2178 Handle non-constant shifts in computeKnownBits, and use computeKnownBits for constant folding in InstCombine/Simplify
First, the motivation: LLVM currently does not realize that:

  ((2072 >> (L == 0)) >> 7) & 1 == 0

where L is some arbitrary value. Whether you right-shift 2072 by 7 or by 8, the
lowest-order bit is always zero. There are obviously several ways to go about
fixing this, but the generic solution pursued in this patch is to teach
computeKnownBits something about shifts by a non-constant amount. Previously,
we would give up completely on these. Instead, in cases where we know something
about the low-order bits of the shift-amount operand, we can combine (and
together) the associated restrictions for all shift amounts consistent with
that knowledge. As a further generalization, I refactored all of the logic for
all three kinds of shifts to have this capability. This works well in the above
case, for example, because the dynamic shift amount can only be 0 or 1, and
thus we can say a lot about the known bits of the result.

This brings us to the second part of this change: Even when we know all of the
bits of a value via computeKnownBits, nothing used to constant-fold the result.
This introduces the necessary code into InstCombine and InstSimplify. I've
added it into both because:

  1. InstCombine won't automatically pick up the associated logic in
     InstSimplify (InstCombine uses InstSimplify, but not via the API that
     passes in the original instruction).

  2. Putting the logic in InstCombine allows the resulting simplifications to become
     part of the iterative worklist

  3. Putting the logic in InstSimplify allows the resulting simplifications to be
     used by everywhere else that calls SimplifyInstruction (inlining, unrolling,
     and many others).

And this requires a small change to our definition of an ephemeral value so
that we don't break the rest case from r246696 (where the icmp feeding the
@llvm.assume, is also feeding a br). Under the old definition, the icmp would
not be considered ephemeral (because it is used by the br), but this causes the
assume to remove itself (in addition to simplifying the branch structure), and
it seems more-useful to prevent that from happening.

llvm-svn: 251146
2015-10-23 20:37:08 +00:00
Tim Northover d4f55c0b1b GVN: don't try to replace instruction with itself.
After some look-ahead PRE was added for GEPs, an instruction could end
up in the table of candidates before it was actually inspected. When
this happened the pass might decide it was the best candidate to
replace itself. This didn't go well.

Should fix PR25291

llvm-svn: 251145
2015-10-23 20:30:02 +00:00
Sanjoy Das 0a1bee8a80 [Inliner] Don't inline through callsites with operand bundles
Summary:
This change teaches the LLVM inliner to not inline through callsites
with unknown operand bundles.  Currently all operand bundles are
"unknown" operand bundles but in the near future we will add support for
inlining through some select kinds of operand bundles.

Reviewers: reames, chandlerc, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14001

llvm-svn: 251141
2015-10-23 20:09:55 +00:00
Xinliang David Li 8ee08b0f70 Add more intrumentation/runtime helper interfaces (NFC)
This patch converts the remaining references to literal
strings for names of profile runtime entites (such as
profile runtime hook, runtime hook use function, profile
init method, register function etc).

Also added documentation for all the new interfaces.

llvm-svn: 251093
2015-10-23 04:22:58 +00:00
Mehdi Amini d42ae865b8 SLPVectorizer: AllSameOpcode* starts "true" only for instructions
r251085 wasn't as NFC as intended...

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 251087
2015-10-23 01:04:45 +00:00
Mehdi Amini bf6ee32ca5 SLPVectorizer: refactor reorderInputsAccordingToOpcode (NFC)
This is intended to simplify the changes needed to solve PR25247.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 251085
2015-10-23 00:46:17 +00:00
Justin Bogner 35e46cdd04 LoopPass: Simplify the API for adding a new loop. NFC
The insertLoop() API is only used to add new loops, and has confusing
ownership semantics. Simplify it by replacing it with addLoop().

llvm-svn: 251064
2015-10-22 21:21:32 +00:00
Chen Li c6e28782d8 [SimplifyCFG] Extend SimplifyResume to handle phi of trivial landing pad.
Summary: Currently SimplifyResume can convert an invoke instruction to a call instruction if its landing pad is trivial. In practice we could have several invoke instructions with trivial landing pads and share a common rethrow block, and in the common rethrow block, all the landing pads join to a phi node. The patch extends SimplifyResume to check the phi of landing pad and their incoming blocks. If any of them is trivial, remove it from the phi node and convert the invoke instruction to a call instruction.  

Reviewers: hfinkel, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13718

llvm-svn: 251061
2015-10-22 20:48:38 +00:00
Xinliang David Li 83bc4220ce Add helper functions and remove hard coded references to instProf related name/name-prefixes
This is a clean up patch that defines instr prof section and variable 
name prefixes in a common header with access helper functions. 
clang FE change will be done as a follow up once this patch is in.

Differential Revision: http://reviews.llvm.org/D13919

llvm-svn: 251058
2015-10-22 20:32:12 +00:00
David Majnemer e0675fb8fb [Sink] Don't check BB.empty()
As an invariant, BasicBlocks cannot be empty when passed to a transform.
This is not the case for MachineBasicBlocks and the Sink pass was ported
from the MachineSink pass which would explain the check's existence.

llvm-svn: 251057
2015-10-22 20:29:08 +00:00
Alexey Samsonov f4fb5f500c [ASan] Enable instrumentation of dynamic allocas by default.
llvm-svn: 251056
2015-10-22 20:07:28 +00:00
Alexey Samsonov 8daaf8b09b [ASan] Minor fixes to dynamic allocas handling:
* Don't instrument promotable dynamic allocas:
  We already have a test that checks that promotable dynamic allocas are
  ignored, as well as static promotable allocas. Make sure this test will
  still pass if/when we enable dynamic alloca instrumentation by default.

* Handle lifetime intrinsics before handling dynamic allocas:
  lifetime intrinsics may refer to dynamic allocas, so we need to emit
  instrumentation before these dynamic allocas would be replaced.

Differential Revision: http://reviews.llvm.org/D12704

llvm-svn: 251045
2015-10-22 19:51:59 +00:00
Craig Topper 42526d3372 Use ArrayRef instead of pointer and size. NFC
llvm-svn: 251029
2015-10-22 16:35:56 +00:00
David Majnemer dc3b67b4ca [SimplifyCFG] Don't use-after-free an SSA value
SimplifyTerminatorOnSelect didn't consider the possibility that the
condition might be related to one of PHI nodes.

This fixes PR25267.

llvm-svn: 250922
2015-10-21 18:22:24 +00:00
Dehao Chen 100424124b Tolerate negative offset when matching sample profile.
In some cases (as illustrated in the unittest), lineno can be less than the heade_lineno because the function body are included from some other files. In this case, offset will be negative. This patch makes clang still able to match the profile to IR in this situation.

http://reviews.llvm.org/D13914

llvm-svn: 250873
2015-10-21 01:22:27 +00:00
Igor Laevsky 68688df94c [MemorySanitizer] NFC. Do not use GET_INTRINSIC_MODREF_BEHAVIOR table.
It is now possible to infer intrinsic modref behaviour purely from intrinsic attributes.
This change will allow to completely remove GET_INTRINSIC_MODREF_BEHAVIOR table.

Differential Revision: http://reviews.llvm.org/D13907

llvm-svn: 250860
2015-10-20 21:33:30 +00:00
Keno Fischer a010cfa592 Fix missing INITIALIZE_PASS_DEPENDENCY for AddressSanitizer
Summary: In r231241, TargetLibraryInfoWrapperPass was added to
`getAnalysisUsage` for `AddressSanitizer`, but the corresponding
`INITIALIZE_PASS_DEPENDENCY` was not added.

Reviewers: dvyukov, chandlerc, kcc

Subscribers: kcc, llvm-commits

Differential Revision: http://reviews.llvm.org/D13629

llvm-svn: 250813
2015-10-20 10:13:55 +00:00
Sanjoy Das 3020b1bc8c [RS4GC] Remove a redundant linear search, NFCI
Since LiveVariables is uniqued (we just created it from a `DenseSet`),
`FindIndex(LiveVariables, LiveVariables[i])` is always `i`.

llvm-svn: 250786
2015-10-20 01:06:31 +00:00
Sanjoy Das b1942f14cd [RS4GC] Clean up `find_index`; NFC
- Bring it up to the LLVM Coding Style
 - Sink it inside `CreateGCRelocates`, which is its only user

llvm-svn: 250785
2015-10-20 01:06:28 +00:00
Sanjoy Das 7ad67640e9 [RS4GC] Re-purpose `normalizeForInvokeSafepoint`; NFC.
`normalizeForInvokeSafepoint` in RewriteStatepointsForGC.cpp, as it is
written today, deals with `gc.relocate` and `gc.result` uses of a
statepoint equally well.  This change documents this fact and adds a
test case.

There is no functional change here -- only documentation of existing
functionality.

llvm-svn: 250784
2015-10-20 01:06:24 +00:00
Sanjoy Das ff3dba736a [RS4GC] Minor cleanup to `normalizeForInvokeSafepoint`; NFC
llvm-svn: 250783
2015-10-20 01:06:17 +00:00
Duncan P. N. Exon Smith 1e59a66c69 ObjCARC: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250756
2015-10-19 23:20:14 +00:00
Michael Liao c65d386b81 [InstCombine] Optimize icmp of inc/dec at RHS
Allow LLVM to optimize the sequence like the following:

  %inc = add nsw i32 %i, 1
  %cmp = icmp slt %n, %inc

into:

  %cmp = icmp sle i32 %n, %i

The case is not handled previously due to the complexity of compuation of %n.
Hence, LLVM cannot swap operands of icmp accordingly.

llvm-svn: 250746
2015-10-19 22:08:14 +00:00
Duncan P. N. Exon Smith 6b92a14a28 Vectorize: Remove implicit ilist iterator conversions, NFC
Besides the usual, I finally added an overload to
`BasicBlock::splitBasicBlock()` that accepts an `Instruction*` instead
of `BasicBlock::iterator`.  Someone can go back and remove this overload
later (after updating the callers I'm going to skip going forward), but
the most common call seems to be
`BB->splitBasicBlock(BB->getTerminator(), ...)` and I'm not sure it's
better to add `->getIterator()` to every one than have the overload.
It's pretty hard to get the usage wrong.

llvm-svn: 250745
2015-10-19 22:06:09 +00:00
Elena Demikhovsky 20662e39f1 Removed parameter "Consecutive" from isLegalMaskedLoad() / isLegalMaskedStore().
Originally I planned to use the same interface for masked gather/scatter and set isConsecutive to "false" in this case.

Now I'm implementing masked gather/scatter and see that the interface is inconvenient. I want to add interfaces isLegalMaskedGather() / isLegalMaskedScatter() instead of using the "Consecutive" parameter in the existing interfaces.

Differential Revision: http://reviews.llvm.org/D13850

llvm-svn: 250686
2015-10-19 07:43:38 +00:00
Xinliang David Li aa0592cc70 [PGO] Eliminate prof data register calls on FreeBSD platform
This is a follow up patch of r250199 after verifying the start/stop
section symbols work as spected on FreeBSD.

llvm-svn: 250679
2015-10-19 04:17:10 +00:00
Jakub Staszak f12821a43c Preserve CFG in MergedLoadStoreMotion. This fixes PR24426.
llvm-svn: 250660
2015-10-18 19:34:10 +00:00
Simon Pilgrim 216b1bf5ed [InstCombine] SSE4A constant folding and conversion to shuffles.
This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:

1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'bit index' and 'length' operands are constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling

Differential Revision: http://reviews.llvm.org/D13348

llvm-svn: 250609
2015-10-17 11:40:05 +00:00
Sanjoy Das 58fae7cf6b [RS4GC] Dont' propagate call attrs related to patchable statepoints
The `"statepoint-id"` and `"statepoint-num-patch-bytes"` attributes are
used solely to determine properties of the `gc.statepoint` being
created.  Once the `gc.statepoint` is in place, these should be removed.

llvm-svn: 250491
2015-10-16 02:41:23 +00:00
Sanjoy Das 810a59d037 [RS4GC] Bring legalizeCallAttributes up to LLVM coding style; NFC
llvm-svn: 250490
2015-10-16 02:41:11 +00:00
Sanjoy Das 25ec1a3e60 [RS4GC] Use "deopt" operand bundles
Summary:
This is a step towards using operand bundles to carry deopt state till
RewriteStatepointsForGC.  The change adds a flag to
RewriteStatepointsForGC that teaches it to pick up deopt state from a
`"deopt"` operand bundle attached to the `call` or `invoke` it is
wrapping.

The command line flag added, `-rs4gc-use-deopt-bundles`, will only exist
for a short while.  Once we are able to pipe deopt bundle state through
the full optimization pipeline without problems, we will "constant fold"
`-rs4gc-use-deopt-bundles` to `true`.

Reviewers: swaroop.sridhar, reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13372

llvm-svn: 250489
2015-10-16 02:41:00 +00:00
Sanjoy Das 7360f30852 [IndVars] Rename getExtend; NFC
Rename `IndVarSimplify::getExtend` to `IndVarSimplify::createExtendInst`
to make it obvious that it creates `llvm::Instruction` s.

llvm-svn: 250484
2015-10-16 01:00:50 +00:00
Sanjoy Das 37e87c2023 [IndVars] Have `cloneArithmeticIVUser` guess better
Summary:
`cloneArithmeticIVUser` currently trips over expression like `add %iv,
-1` when `%iv` is being zero extended -- it tries to construct the
widened use as `add %iv.zext, zext(-1)` and (correctly) fails to prove
equivalence to `zext(add %iv, -1)` (here the SCEV for `%iv` is
`{1,+,1}`).

This change teaches `IndVars` to try sign extending the non-IV operand
if that makes the newly constructed IV use equivalent to the widened
narrow IV use.

Reviewers: atrick, hfinkel, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13717

llvm-svn: 250483
2015-10-16 01:00:47 +00:00
Sanjoy Das 472840a3d3 [IndVars] Extract out a few local variables; NFC
llvm-svn: 250482
2015-10-16 01:00:44 +00:00
Sanjoy Das 1fd184e5a2 [IndVars] Split `WidenIV::cloneIVUser`; NFC
Summary:
This NFC splitting is intended to make a later diff easier to follow.
It just tail duplicates `cloneIVUser` into `cloneArithmeticIVUser` and
`cloneBitwiseIVUser`.

Reviewers: atrick, hfinkel, reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13716

llvm-svn: 250481
2015-10-16 01:00:39 +00:00
Evgeniy Stepanov 9addbc9fc1 Revert "[safestack] Fast access to the unsafe stack pointer on AArch64/Android."
Breaks the hexagon buildbot.

llvm-svn: 250461
2015-10-15 21:26:49 +00:00
Evgeniy Stepanov 142947e9f0 [safestack] Fast access to the unsafe stack pointer on AArch64/Android.
Android libc provides a fixed TLS slot for the unsafe stack pointer,
and this change implements direct access to that slot on AArch64 via
__builtin_thread_pointer() + offset.

This change also moves more code into TargetLowering and its
target-specific subclasses to get rid of target-specific codegen
in SafeStackPass.

This change does not touch the ARM backend because ARM lowers
builting_thread_pointer as aeabi_read_tp, which is not available
on Android.

llvm-svn: 250456
2015-10-15 20:50:16 +00:00
Philip Reames a956cc7f08 Revert 250343 and 250344
Turns out this approach is buggy.  In discussion about follow on work, Sanjoy pointed out that we could be subject to circular logic problems.  

Consider:
 if (i u< L) leave()
 if ((i + 1) u< L) leave()
 print(a[i] + a[i+1]) 

If we know that L is less than UINT_MAX, we could possible prove (in a control dependent way) that i + 1 does not overflow.  This gives us:
 if (i u< L) leave()
 if ((i +nuw 1) u< L) leave()
 print(a[i] + a[i+1]) 

If we now do the transform this patch proposed, we end up with:
 if ((i +nuw 1) u< L) leave_appropriately()
 print(a[i] + a[i+1]) 

That would be a miscompile when i==-1.  The problem here is that the control dependent nuw bits got used to prove something about the first condition.  That's obviously invalid.

This won't happen today, but since I plan to enhance LVI/CVP with exactly that transform at some point in the not too distant future...

llvm-svn: 250430
2015-10-15 16:51:00 +00:00
Diego Novillo 38be33302c Sample Profiles - Adjust integer types. Mostly NFC.
This adjusts all integers in the reader/writer to reflect the types
stored on profile files. They should all be unsigned 32-bit or 64-bit
values. Changed all associated internal types to be uint32_t or
uint64_t.

The only place that needed some adjustments is in the sample profile
transformation. Altough the weight read from the profile are 64-bit
values, the internal API for branch weights only accepts 32-bit values.
The pass now saturates weights that overflow uint32_t.

llvm-svn: 250427
2015-10-15 16:36:21 +00:00
Benjamin Kramer 6db3338cb1 [ScalarOpts] Remove dead code.
Does not touch debug dumpers. NFC.

llvm-svn: 250417
2015-10-15 15:08:58 +00:00
Manman Ren 72d44b1b09 Recommit r250345, it was reverted in r250366 to investigate a bot failure.
Our internal bot is still red after r250366.

llvm-svn: 250415
2015-10-15 14:59:40 +00:00
Manman Ren f5499fd9d5 Temporarily revert r250345 to sort out bot failure.
With r250345 and r250343, we start to observe the following failure
when bootstrap clang with lto and pgo:
PHI node entries do not match predecessors!
  %.sroa.029.3.i = phi %"class.llvm::SDNode.13298"* [ null, %30953 ], [ null, %31017 ], [ null, %30998 ], [ null, %_ZN4llvm8dyn_castINS_14ConstantSDNodeENS_7SDValueEEENS_10cast_rettyIT_T0_E8ret_typeERS5_.exit.i.1804 ], [ null, %30975 ], [ null, %30991 ], [ null, %_ZNK4llvm3EVT13getScalarTypeEv.exit.i.1812 ], [ %..sroa.029.0.i, %_ZN4llvm11SmallVectorIiLj8EED1Ev.exit.i.1826 ], !dbg !451895
label %30998
label %_ZNK4llvm3EVTeqES0_.exit19.thread.i
LLVM ERROR: Broken function found, compilation aborted!

I will re-commit this if the bot does not recover.

llvm-svn: 250366
2015-10-15 04:58:24 +00:00
Cong Hou b74d3b3b86 Update the branch weight metadata in JumpThreading pass.
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).

This is the third attempt to submit this patch, while the first two led to failures in some FDO tests. After investigation, it is the edge weight normalization that caused those failures. In this patch the edge weight normalization is fixed so that there is no zero weight in the output and the sum of all weights can fit in 32-bit integer. Several unit tests are added.

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250345
2015-10-14 23:14:17 +00:00
Philip Reames b42db21de8 [SimplifyCFG] Speculatively flatten CFG based on profiling metadata
If we have a series of branches which are all unlikely to fail, we can possibly combine them into a single check on the fastpath combined with a bit of dispatch logic on the slowpath. We don't want to do this unconditionally since it requires speculating instructions past a branch, but if the profiling metadata on the branch indicates profitability, this can reduce the number of checks needed along the fast path.

The canonical example this is trying to handle is removing the second bounds check implied by the Java code: a[i] + a[i+1]. Note that it can currently only do so for really simple conditions and the values of a[i] can't be used anywhere except in the addition. (i.e. the load has to have been sunk already and not prevent speculation.) I plan on extending this transform over the next few days to handle alternate sequences.

Differential Revision: http://reviews.llvm.org/D13070

llvm-svn: 250343
2015-10-14 22:46:19 +00:00
Chen Li 567aa7ab30 [LoopUnswitch] Correct misleading comments.
Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13738

llvm-svn: 250317
2015-10-14 19:47:43 +00:00
Manman Ren 2c8e16d507 Revert r250204 and r250240 due to bot failure. We failed to build PGO-ed clang.
llvm-svn: 250264
2015-10-14 03:04:03 +00:00
Evgeniy Stepanov ebd3f44f93 [msan] Fix crash on multiplication by a non-integer constant.
Fixes PR25160.

llvm-svn: 250260
2015-10-14 00:21:13 +00:00
David Majnemer eba62796cb [InlineFunction] Correctly inline TerminatePadInst
We forgot to append the terminatepad's arguments which resulted in us
treating the old terminatepad as an argument to the new terminatepad
causing us to crash immediately.  Instead, add the old terminatepad's
arguments to the new terminatepad.

This fixes PR25155.

llvm-svn: 250234
2015-10-13 22:08:17 +00:00
Chad Rosier 7f08d80595 Typo.
llvm-svn: 250224
2015-10-13 20:59:16 +00:00
Duncan P. N. Exon Smith be4d8cba1c Scalar: Remove remaining ilist iterator implicit conversions
Remove remaining `ilist_iterator` implicit conversions from
LLVMScalarOpts.

This change exposed some scary behaviour in
lib/Transforms/Scalar/SCCP.cpp around line 1770.  This patch changes a
call from `Function::begin()` to `&Function::front()`, since the return
was immediately being passed into another function that takes a
`Function*`.  `Function::front()` started to assert, since the function
was empty.  Note that `Function::end()` does not point at a legal
`Function*` -- it points at an `ilist_half_node` -- so the other
function was getting garbage before.  (I added the missing check for
`Function::isDeclaration()`.)

Otherwise, no functionality change intended.

llvm-svn: 250211
2015-10-13 19:26:58 +00:00
Cong Hou 7ab123a5cf Update the branch weight metadata in JumpThreading pass.
Currently in JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB).

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250204
2015-10-13 18:43:10 +00:00
Xinliang David Li 3dd8817d84 [PGO]: Eliminate calls to __llvm_profile_register_function for Linux.
On Linux, the profile runtime can use __start_SECTNAME and __stop_SECTNAME
symbols defined by the linker to locate the start and end location of
a named section (with C name). This eliminates the need for instrumented
binary to call __llvm_profile_register_function during start-up time.

llvm-svn: 250199
2015-10-13 18:39:48 +00:00
Duncan P. N. Exon Smith 3a9c9e3dcd Scalar: Remove some implicit ilist iterator conversions, NFC
Remove some of the implicit ilist iterator conversions in
LLVMScalarOpts.  More to go.

llvm-svn: 250197
2015-10-13 18:26:00 +00:00
Duncan P. N. Exon Smith 1732340bfa IPO: Remove implicit ilist iterator conversions, NFC
llvm-svn: 250187
2015-10-13 17:51:03 +00:00
Duncan P. N. Exon Smith e82c286fba Instrumentation: Remove ilist iterator implicit conversions, NFC
llvm-svn: 250186
2015-10-13 17:39:10 +00:00
Duncan P. N. Exon Smith 9f8aaf21ba InstCombine: Remove ilist iterator implicit conversions, NFC
Stop relying on implicit conversions of ilist iterators in
LLVMInstCombine.  No functionality change intended.

llvm-svn: 250183
2015-10-13 16:59:33 +00:00
Simon Pilgrim 3c2b30f8ba [InstCombine][SSE4A] Remove broken INSERTQI range combining optimization
As discussed in D13348 - the INSERTQI range combining code is wrong in that it confuses the insertion bit index with an extraction bit index.

The remaining legal combines are very unlikely (especially once we've converted to shuffles in D13348) so I'm removing the optimization.

llvm-svn: 250160
2015-10-13 14:48:54 +00:00
James Molloy 4015925606 [GlobalsAA] Turn GlobalsAA on again by default
Now that all the known faults with GlobalsAA have been fixed, flip the big switch on -enable-non-lto-gmr again.

Feel free to pester me with any more bugs found, and don't hesitate to flip the switch back off.

llvm-svn: 250157
2015-10-13 10:43:57 +00:00
Sanjoy Das b873cbe5c9 [IndVars] NFC Cleanup.
- Rename methods according to the LLVM Coding Style
 - Merge adjacent anonymous namespace block
 - Use `auto` in two places

llvm-svn: 250152
2015-10-13 07:17:38 +00:00
Manman Ren 9f824dab1d Revert 250089 due to bot failure. It failed when building clang itself with PGO.
llvm-svn: 250145
2015-10-13 03:38:02 +00:00
Duncan P. N. Exon Smith 5b4c837c58 TransformUtils: Remove implicit ilist iterator conversions, NFC
Continuing the work from last week to remove implicit ilist iterator
conversions.  First related commit was probably r249767, with some more
motivation in r249925.  This edition gets LLVMTransformUtils compiling
without the implicit conversions.

No functional change intended.

llvm-svn: 250142
2015-10-13 02:39:05 +00:00
Cong Hou 3320bcd815 Update the branch weight metadata in JumpThreading pass.
In JumpThreading pass, the branch weight metadata is not updated after CFG modification. Consider the jump threading on PredBB, BB, and SuccBB. After jump threading, the weight on BB->SuccBB should be adjusted as some of it is contributed by the edge PredBB->BB, which doesn't exist anymore. This patch tries to update the edge weight in metadata on BB->SuccBB by scaling it by 1 - Freq(PredBB->BB) / Freq(BB->SuccBB). 

Differential revision: http://reviews.llvm.org/D10979

llvm-svn: 250089
2015-10-12 19:44:08 +00:00
Oliver Stannard 939724cd02 GlobalOpt does not treat externally_initialized globals correctly
GlobalOpt currently merges stores into the initialisers of internal,
externally_initialized globals, but should not do so as the value of the global
may change between the initialiser and any code in the module being run.

llvm-svn: 250035
2015-10-12 13:20:52 +00:00
James Molloy 55d633bd60 [LoopVectorize] Shrink integer operations into the smallest type possible
C semantics force sub-int-sized values (e.g. i8, i16) to be promoted to int
type (e.g. i32) whenever arithmetic is performed on them.

For targets with native i8 or i16 operations, usually InstCombine can shrink
the arithmetic type down again. However InstCombine refuses to create illegal
types, so for targets without i8 or i16 registers, the lengthening and
shrinking remains.

Most SIMD ISAs (e.g. NEON) however support vectors of i8 or i16 even when
their scalar equivalents do not, so during vectorization it is important to
remove these lengthens and truncates when deciding the profitability of
vectorization.

The algorithm this uses starts at truncs and icmps, trawling their use-def
chains until they terminate or instructions outside the loop are found (or
unsafe instructions like inttoptr casts are found). If the use-def chains
starting from different root instructions (truncs/icmps) meet, they are
unioned. The demanded bits of each node in the graph are ORed together to form
an overall mask of the demanded bits in the entire graph. The minimum bitwidth
that graph can be truncated to is the bitwidth minus the number of leading
zeroes in the overall mask.

The intention is that this algorithm should "first do no harm", so it will
never insert extra cast instructions. This is why the use-def graphs are
unioned, so that subgraphs with different minimum bitwidths do not need casts
inserted between them.

This algorithm works hard to reduce compile time impact. DemandedBits are only
queried if there are extends of illegal types and if a truncate to an illegal
type is seen. In the general case, this results in a simple linear scan of the
instructions in the loop.

No non-noise compile time impact was seen on a clang bootstrap build.

llvm-svn: 250032
2015-10-12 12:34:45 +00:00
Simon Pilgrim 1d1c56e2df [InstCombine][X86][XOP] Combine XOP integer vector comparisons to native IR
We now have lowering support for XOP PCOM/PCOMU instructions.

llvm-svn: 249977
2015-10-11 14:38:34 +00:00
Sanjoy Das cc16ccc1ab [IndVars] Use `auto`; NFC
llvm-svn: 249944
2015-10-10 06:33:33 +00:00
Owen Anderson 97ca0f3f2c Generalize convergent check to handle invokes as well as calls.
llvm-svn: 249892
2015-10-09 20:17:46 +00:00
Owen Anderson 2c9978b12b Teach LoopUnswitch not to perform non-trivial unswitching on loops containing convergent operations.
Doing so could cause the post-unswitching convergent ops to be
control-dependent on the unswitch condition where they were not before.
This check could be refined to allow unswitching where the convergent
operation was already control-dependent on the unswitch condition.

llvm-svn: 249874
2015-10-09 18:40:20 +00:00
Owen Anderson d95b08a0a7 Refine the definition of convergent to only disallow the addition of new control dependencies.
This covers the common case of operations that cannot be sunk.
Operations that cannot be hoisted should already be handled properly via
the safe-to-speculate rules and mechanisms.

llvm-svn: 249865
2015-10-09 18:06:13 +00:00
Dehao Chen 41dc5a6e86 Make HeaderLineno a local variable.
http://reviews.llvm.org/D13576

As we are using hierarchical profile, there is no need to keep HeaderLineno a member variable. This is because each level of the inline stack will have its own header lineno. One should use the head lineno of its own inline stack level instead of the actual symbol.

llvm-svn: 249848
2015-10-09 16:50:16 +00:00
Andrea Di Biagio 99493df257 [MemCpyOpt] Fix wrong merging adjacent nontemporal stores into memset calls.
Pass MemCpyOpt doesn't check if a store instruction is nontemporal.
As a consequence, adjacent nontemporal stores are always merged into a
memset call.

Example:

;;;
define void @foo(<4 x float>* nocapture %p) {
entry:
  store <4 x float> zeroinitializer, <4 x float>* %p, align 16, !nontemporal !0
  %p1 = getelementptr inbounds <4 x float>, <4 x float>* %dst, i64 1
  store <4 x float> zeroinitializer, <4 x float>* %p1, align 16, !nontemporal !0
  ret void
}

!0 = !{i32 1}
;;;

In this example, the two nontemporal stores are combined to a memset of zero
which does not preserve the nontemporal hint. Later on the backend (tested on a
x86-64 corei7) expands that memset call into a sequence of two normal 16-byte
aligned vector stores.

opt -memcpyopt example.ll -S -o - | llc -mcpu=corei7 -o -

Before:
  xorps  %xmm0, %xmm0
  movaps  %xmm0, 16(%rdi)
  movaps  %xmm0, (%rdi)

With this patch, we no longer merge nontemporal stores into calls to memset.
In this example, llc correctly expands the two stores into two movntps:
  xorps  %xmm0, %xmm0
  movntps %xmm0, 16(%rdi)
  movntps  %xmm0, (%rdi)

In theory, we could extend the usage of !nontemporal metadata to memcpy/memset
calls. However a change like that would only have the effect of forcing the
backend to expand !nontemporal memsets back to sequences of store instructions.
A memset library call would not have exactly the same semantic of a builtin
!nontemporal memset call. So, SelectionDAG will have to conservatively expand
it back to a sequence of !nontemporal stores (effectively undoing the merging).

Differential Revision: http://reviews.llvm.org/D13519

llvm-svn: 249820
2015-10-09 10:53:41 +00:00
Arnaud A. de Grandmaison 859b2ac07d [EarlyCSE] Address post commit review for r249523.
llvm-svn: 249814
2015-10-09 09:23:01 +00:00
Sanjoy Das 3c520a1272 [RS4GC] Refactoring to make a later change easier, NFCI
Summary:
These non-semantic changes will help make a later change adding
support for deopt operand bundles more streamlined.

Reviewers: reames, swaroop.sridhar

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D13491

llvm-svn: 249779
2015-10-08 23:18:38 +00:00
Sanjoy Das c21a05a3a4 [PlaceSafeopints] Extract out `callsGCLeafFunction`, NFC
Summary:
This will be used in a later change to RewriteStatepointsForGC.

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13490

llvm-svn: 249777
2015-10-08 23:18:30 +00:00
Sanjoy Das 1ede5367ba [RS4GC] Don't copy ADT's unneccessarily, NFCI
Summary: Use `const auto &` instead of `auto` in `makeStatepointExplicit`.

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13454

llvm-svn: 249776
2015-10-08 23:18:22 +00:00
Evgeniy Stepanov d12212bc8c New MSan mapping layout (llvm part).
This is an implementation of
https://github.com/google/sanitizers/issues/579

It has a number of advantages over the current mapping:
* Works for non-PIE executables.
* Does not require ASLR; as a consequence, debugging MSan programs in
  gdb no longer requires "set disable-randomization off".
* Supports linux kernels >=4.1.2.
* The code is marginally faster and smaller.

This is an ABI break. We never really promised ABI stability, but
this patch includes a courtesy escape hatch: a compile-time macro
that reverts back to the old mapping layout.

llvm-svn: 249753
2015-10-08 21:35:26 +00:00
Evgeniy Stepanov 5fe279e727 Add Triple::isAndroid().
This is a simple refactoring that replaces Triple.getEnvironment()
checks for Android with Triple.isAndroid().

llvm-svn: 249750
2015-10-08 21:21:24 +00:00
Sanjay Patel f61a08fbf1 [InstCombine] transform masking off of an FP sign bit into a fabs() intrinsic call (PR24886)
This is a partial fix for PR24886:
https://llvm.org/bugs/show_bug.cgi?id=24886

Without this IR transform, the backend (x86 at least) was producing inefficient code.

This patch is making 2 assumptions:

    1. The canonical form of a fabs() operation is, in fact, the LLVM fabs() intrinsic.
    2. The high bit of an FP value is always the sign bit; as noted in the bug report, this isn't specified by the LangRef.

Differential Revision: http://reviews.llvm.org/D13076

llvm-svn: 249702
2015-10-08 17:09:31 +00:00
Sanjoy Das 40bdd041db [RS4GC] Use AssertingVH for RematerializedValueMapTy, NFCI
Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13489

llvm-svn: 249620
2015-10-07 21:32:35 +00:00
Sanjoy Das 0015e5a088 [IndVars] Preserve LCSSA in `eliminateIdentitySCEV`
Summary:
After r249211, SCEV can see through some LCSSA phis.  Add a
`replacementPreservesLCSSAForm` check before replacing uses of these phi
nodes with a simplified use of the induction variable to avoid breaking
LCSSA.

Fixes 25047.

Depends on D13460.

Reviewers: atrick, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13461

llvm-svn: 249575
2015-10-07 17:38:31 +00:00
Arnaud A. de Grandmaison a6178a179d [EarlyCSE] Fix handling of target memory intrinsics for CSE'ing loads.
Summary:
Some target intrinsics can access multiple elements, using the pointer as a
base address (e.g. AArch64 ld4). When trying to CSE such instructions,
it must be checked the available value comes from a compatible instruction
because the pointer is not enough to discriminate whether the value is
correct.

Reviewers: ssijaric

Subscribers: mcrosier, llvm-commits, aemerson

Differential Revision: http://reviews.llvm.org/D13475

llvm-svn: 249523
2015-10-07 07:41:29 +00:00
Sanjoy Das 60bf3db17f [RS4GC] Remove an unnecessary assert & related variables
I don't think this assert adds much value, and removing it and related
variables avoids an "unused variable" warning in release builds.

llvm-svn: 249511
2015-10-07 02:39:27 +00:00
Sanjoy Das b40bd1a93f [RS4GC] Cosmetic cleanup, NFC
Summary:
A series of cosmetic cleanup changes to RewriteStatepointsForGC:

  - Rename variables to LLVM style
  - Remove some redundant asserts
  - Remove an unsued `Pass *` parameter
  - Remove unnecessary variables
  - Use C++11 idioms where applicable
  - Pass CallSite by value, not reference

Reviewers: reames, swaroop.sridhar

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13370

llvm-svn: 249508
2015-10-07 02:39:18 +00:00
Hans Wennborg f1f36517b7 InstCombine: Fold comparisons between unguessable allocas and other pointers
This will allow us to optimize code such as:

  int f(int *p) {
    int x;
    return p == &x;
  }

as well as:

  int *allocate(void);
  int f() {
    int x;
    int *p = allocate();
    return p == &x;
  }

The folding can only be done under certain circumstances. Even though p and &x
cannot alias, the comparison must still return true if the pointer
representations are equal. If a user successfully generates a p that's a
correct guess for &x, comparison should return true even though p is an invalid
pointer.

This patch argues that if the address of the alloca isn't observable outside the
function, the function can act as-if the address is impossible to guess from the
outside. The tricky part is keeping the act consistent: if we fold p == &x to
false in one place, we must make sure to fold any other comparisons based on
those pointers similarly. To ensure that, we only fold when &x is involved
exactly once in comparison instructions.

Differential Revision: http://reviews.llvm.org/D13358

llvm-svn: 249490
2015-10-07 00:20:07 +00:00
Hans Wennborg 083ca9bb32 Fix Clang-tidy modernize-use-nullptr warnings in source directories and generated files; other minor cleanups.
Patch by Eugene Zelenko!

Differential Revision: http://reviews.llvm.org/D13321

llvm-svn: 249482
2015-10-06 23:24:35 +00:00
Sanjoy Das 5c8bead46d [IndVars] Don't break dominance in `eliminateIdentitySCEV`
Summary:
After r249211, `getSCEV(X) == getSCEV(Y)` does not guarantee that X and
Y are related in the dominator tree, even if X is an operand to Y (I've
included a toy example in comments, and a real example as a test case).

This commit changes `SimplifyIndVar` to require a `DominatorTree`.  I
don't think this is a problem because `ScalarEvolution` requires it
anyway.

Fixes PR25051.

Depends on D13459.

Reviewers: atrick, hfinkel

Subscribers: joker.eph, llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D13460

llvm-svn: 249471
2015-10-06 21:44:49 +00:00
Sanjoy Das 088bb0ea9f [IndVars] Extract out eliminateIdentitySCEV, NFC
Summary:
Reflow a comment while at it.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13459

llvm-svn: 249470
2015-10-06 21:44:39 +00:00
Joseph Tremoulet 2afea5438f [WinEH] Recognize CoreCLR personality function
Summary:
 - Add CoreCLR to if/else ladders and switches as appropriate.
 - Rename isMSVCEHPersonality to isFuncletEHPersonality to better
   reflect what it captures.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, AndyAyers, llvm-commits

Differential Revision: http://reviews.llvm.org/D13449

llvm-svn: 249455
2015-10-06 20:28:16 +00:00
Arnaud A. de Grandmaison 6fd488b156 [EarlyCSE] Constify ParseMemoryInst methods (NFC).
llvm-svn: 249400
2015-10-06 13:35:30 +00:00
Andrea Di Biagio 40f59e4466 [InstCombine] Teach SimplifyDemandedVectorElts how to handle ConstantVector select masks with ConstantExpr elements (PR24922)
If the mask of a select instruction is a ConstantVector, method
SimplifyDemandedVectorElts iterates over the mask elements to identify which
values are selected from the select inputs.

Before this patch, method SimplifyDemandedVectorElts always used method
Constant::isNullValue() to check if a value in the mask was zero. Unfortunately
that method always returns false when called on a ConstantExpr.

This patch fixes the problem in SimplifyDemandedVectorElts by adding an explicit
check for ConstantExpr values. Now, if a value in the mask is a ConstantExpr, we
avoid calling isNullValue() on it.

Fixes PR24922.

Differential Revision: http://reviews.llvm.org/D13219

llvm-svn: 249390
2015-10-06 10:34:53 +00:00
Evgeniy Stepanov 670abcfd78 [msan] Correct a typo in poison stack pattern command line description.
Patch by Jon Eyolfson.

llvm-svn: 249331
2015-10-05 18:01:17 +00:00
Arnold Schwaighofer 0591c5d719 MergeFunctions: Clear GlobalNumbers ValueMap
Otherwise, the map will observe changes as long as MergeFunctions is alive. This
is bad because follow-up passes could replace-all-uses-with on the key of an
entry in the map. The value handle callback of ValueMap however asserts that the
key type matches.

rdar://22971893

llvm-svn: 249327
2015-10-05 17:26:36 +00:00
Piotr Padlewski dc9b2cfc50 inariant.group handling in GVN
The most important part required to make clang
devirtualization works ( ͡°͜ʖ ͡°).
The code is able to find non local dependencies, but unfortunatelly
because the caller can only handle local dependencies, I had to add
some restrictions to look for dependencies only in the same BB.

http://reviews.llvm.org/D12992

llvm-svn: 249196
2015-10-02 22:12:22 +00:00
Bruno Cardoso Lopes b491a2d641 [SimplifyLibCalls] Fix instruction misplacement in string/memory libcall optimization
When trying to optimize fortified library functions use the right
location to insert new instructions in order to preserve correct
def-use order.

This fixes an issue where a misplaced instruction definition would
happen to be *after* one of its use after a RAUW, forming invalid IR.
This behavior was introduced by r227250.

Differential Revision: http://reviews.llvm.org/D13301

rdar://problem/22802369

llvm-svn: 249092
2015-10-01 22:43:53 +00:00
Arnaud A. de Grandmaison 849f3bf8c9 [InstCombine] Remove trivially empty lifetime start/end ranges.
Summary:
Some passes may open up opportunities for optimizations, leaving empty
lifetime start/end ranges. For example, with the following code:

    void foo(char *, char *);
    void bar(int Size, bool flag) {
      for (int i = 0; i < Size; ++i) {
        char text[1];
        char buff[1];
        if (flag)
          foo(text, buff); // BBFoo
      }
    }

the loop unswitch pass will create 2 versions of the loop, one with
flag==true, and the other one with flag==false, but always leaving
the BBFoo basic block, with lifetime ranges covering the scope of the for
loop. Simplify CFG will then remove BBFoo in the case where flag==false,
but will leave the lifetime markers.

This patch teaches InstCombine to remove trivially empty lifetime marker
ranges, that is ranges ending right after they were started (ignoring
debug info or other lifetime markers in the range).

This fixes PR24598: excessive compile time after r234581.

Reviewers: reames, chandlerc

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13305

llvm-svn: 249018
2015-10-01 14:54:31 +00:00
Jingyue Wu df1a1b113b [NaryReassociate] SeenExprs records WeakVH
Summary:
The instructions SeenExprs records may be deleted during rewriting.
FindClosestMatchingDominator should ignore these deleted instructions.

Fixes PR24301.

Reviewers: grosser

Subscribers: grosser, llvm-commits

Differential Revision: http://reviews.llvm.org/D13315

llvm-svn: 248983
2015-10-01 03:51:44 +00:00
Dehao Chen 7c41dd6498 Update sample profile propagation algorithm.
http://reviews.llvm.org/D13218

llvm-svn: 248968
2015-10-01 00:26:56 +00:00
Michael Zolotukhin fc783e91e0 [SLP] Don't vectorize loads of non-packed types (like i1, i2).
Summary:
Given an array of i2 elements, 4 consecutive scalar loads will be lowered to
i8-sized loads and thus will access 4 consecutive bytes in memory. If we
vectorize these loads into a single <4 x i2> load, it'll access only 1 byte in
memory. Hence, we should prohibit vectorization in such cases.

PS: Initial patch was proposed by Arnold.

Reviewers: aschwaighofer, nadav, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13277

llvm-svn: 248943
2015-09-30 21:05:43 +00:00
Evgeniy Stepanov f608111d1b Fix debug info with SafeStack.
llvm-svn: 248933
2015-09-30 19:55:43 +00:00
Fiona Glaser b0c6d9174e DeadCodeElimination: rewrite to be faster
Same strategy as simplifyInstructionsInBlock. ~1/3 less time
on my test suite. This pass doesn't have many in-tree users,
but getting rid of an O(N^2) worst case and making it cleaner
should at least make it a viable alternative to ADCE, since
it's now consistently somewhat faster.

llvm-svn: 248927
2015-09-30 17:49:49 +00:00
Erik Eckstein 848c1aa452 SLPVectorizer: limit the scheduling region size per basic block.
Usually large blocks are not a problem. But if a large block (> 10k instructions)
contains many (potential) chains of vector instructions, and those chains are
spread over a wide range of instructions, then scheduling becomes a compile time problem.
This change introduces a limit for the accumulate scheduling region size of a block.
For real-world functions this limit will never be exceeded (it's about 10x larger than
the maximum value seen in the test-suite and external test suite).

llvm-svn: 248917
2015-09-30 17:00:44 +00:00
Andrea Di Biagio 0594e2a1e9 [InstCombine] Teach how to convert SSSE3/AVX2 byte shuffles to builtin shuffles if the shuffle mask is constant.
This patch teaches InstCombiner how to convert a SSSE3/AVX2 byte shuffle to a
builtin shuffle if the mask is constant.

Converting byte shuffle intrinsic calls to builtin shuffles can help finding
more opportunities for combining shuffles later on in selection dag.

We may end up with byte shuffles with constant masks as the result of inlining.

Differential Revision: http://reviews.llvm.org/D13252

llvm-svn: 248913
2015-09-30 16:44:39 +00:00
Dehao Chen 6722688eaa http://reviews.llvm.org/D13145
Support hierarachical sample profile format.

llvm-svn: 248865
2015-09-30 00:42:46 +00:00
Evgeniy Stepanov d3f544f271 [safestack] Fix a stupid mix-up in the direct-tls code path.
llvm-svn: 248863
2015-09-30 00:01:47 +00:00
Dehao Chen 8e7df83e6a http://reviews.llvm.org/D13231
Change lookup functions to const functions.

llvm-svn: 248818
2015-09-29 18:28:15 +00:00
Dehao Chen 028e122ca9 Revert r248810 which breaks tests.
llvm-svn: 248814
2015-09-29 18:18:49 +00:00
Dehao Chen 410a25aa7a http://reviews.llvm.org/D13231
Change lookup functions to const functions.

llvm-svn: 248810
2015-09-29 17:59:58 +00:00
Simon Pilgrim 43f5e0848e [InstCombine] Improve Vector Demanded Bits Through Bitcasts
Currently SimplifyDemandedVectorElts can only peek through bitcasts if the vectors have the same number of elements.

This patch fixes and enables some existing (disabled) code to support bitcasting to vectors with more/fewer elements. It currently only accepts cases when vectors alias cleanly (i.e. number of elements are an exact multiple of the other vector).

This was added to improve the demanded vector elements support for SSE vector shifts which require the __m128i (<2 x i64>) argument type to be bitcast to the vector type for the builtin shift. I've added extra tests for various additional bitcasts.

Differential Revision: http://reviews.llvm.org/D12935

llvm-svn: 248784
2015-09-29 08:19:11 +00:00
Chen Li 9f27fc0599 [LoopUnswitch] Add block frequency analysis to recognize hot/cold regions
Summary: This patch adds block frequency analysis to LoopUnswitch pass to recognize hot/cold regions. For cold regions the pass only performs trivial unswitches since they do not increase code size, and for hot regions everything works as before. This helps to minimize code growth in cold regions and be more aggressive in hot regions. Currently the default cold regions are blocks with frequencies below 20% of function entry frequency, and it can be adjusted via -loop-unswitch-cold-block-frequency flag. The entire feature is controlled via -loop-unswitch-with-block-frequency flag and it is off by default.

Reviewers: broune, silvas, dnovillo, reames

Subscribers: davidxl, llvm-commits

Differential Revision: http://reviews.llvm.org/D11605

llvm-svn: 248777
2015-09-29 05:03:32 +00:00
Evgeniy Stepanov d8b86f7cdc Move dbg.declare intrinsics when merging and replacing allocas.
Place new and update dbg.declare calls immediately after the
corresponding alloca.

Current code in replaceDbgDeclareForAlloca puts the new dbg.declare
at the end of the basic block. LLVM codegen has problems emitting
debug info in a situation when dbg.declare appears after all uses of
the variable. This usually kinda works for inlining and ASan (two
users of this function) but not for SafeStack (see the pending change
in http://reviews.llvm.org/D13178).

llvm-svn: 248769
2015-09-29 00:30:19 +00:00
Sean Silva ace7818ce6 [GlobalOpt] Sort members of llvm.used deterministically
Patch by Jake VanAdrighem!

Summary:
Fix the way we sort the llvm.used and llvm.compiler.used members.

This bug seems to have been introduced in rL183756 through a set of improper casts to GlobalValue*. In subsequent patches this problem was missed and transformed into a getName call on a ConstantExpr.

Reviewers: silvas

Subscribers: silvas, llvm-commits

Differential Revision: http://reviews.llvm.org/D12851

llvm-svn: 248728
2015-09-28 19:02:11 +00:00
Fiona Glaser f74cc40e34 Improve performance of SimplifyInstructionsInBlock
1. Use a worklist, not a recursive approach, to avoid needless
   revisitation and being repeatedly forced to jump back to the
   start of the BB if a handle is invalidated.

2. Only insert operands to the worklist if they become unused
   after a dead instruction is removed, so we don’t have to
   visit them again in most cases.

3. Use a SmallSetVector to track the worklist.

4. Instead of pre-initting the SmallSetVector like in
   DeadCodeEliminationPass, only put things into the worklist
   if they have to be revisited after the first run-through.
   This minimizes how much the actual SmallSetVector gets used,
   which saves a lot of time.

llvm-svn: 248727
2015-09-28 18:56:07 +00:00
Weiming Zhao 310770a90f [LoopReroll] Ignore debug intrinsics
Originally, debug intrinsics and annotation intrinsics may prevent
the loop to be rerolled, now they are ignored.

Differential Revision: http://reviews.llvm.org/D13150

llvm-svn: 248718
2015-09-28 17:03:23 +00:00
Sanjay Patel 9533407566 [InstCombine] fold zexts and constants into a phi (PR24766)
This is one step towards solving PR24766:
https://llvm.org/bugs/show_bug.cgi?id=24766

We were not producing the same IR for these two C functions because the store
to the temp bool causes extra zexts:

#include <stdbool.h>

bool switchy(char x1, char x2, char condition) {
   bool conditionMet = false;
   switch (condition) {
   case 0: conditionMet = (x1 == x2); break;
   case 1: conditionMet = (x1 <= x2); break;
   }
   return conditionMet;
}

bool switchy2(char x1, char x2, char condition) {
   switch (condition) {
   case 0: return (x1 == x2);
   case 1: return (x1 <= x2);
   }
  return false;
}

As noted in the code comments, this test case manages to avoid the more general existing
phi optimizations where there are only 2 phi inputs or where there are no constant phi 
args mixed in with the casts ops. It seems like a corner case, but if we don't catch it, 
then I don't think we can get SimplifyCFG to further optimize towards the canonical form
for this function shown in the bug report.

Differential Revision: http://reviews.llvm.org/D12866

llvm-svn: 248689
2015-09-27 20:34:31 +00:00
Joseph Tremoulet 09af67aba5 [EH] Create removeUnwindEdge utility
Summary:
Factor the code that rewrites invokes to calls and rewrites WinEH
terminators to their "unwind to caller" equivalents into a helper in
Utils/Local, and use it in the three places I'm aware of that need to do
this.


Reviewers: andrew.w.kaylor, majnemer, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13152

llvm-svn: 248677
2015-09-27 01:47:46 +00:00
Sanjay Patel e1b09caaaf [InstCombine] match De Morgan's Law hidden by zext ops (PR22723)
This is a fix for PR22723:
https://llvm.org/bugs/show_bug.cgi?id=22723

My first attempt at this was to change what I thought was the root problem:

xor (zext i1 X to i32), 1 --> zext (xor i1 X, true) to i32

...but we create the opposite pattern in InstCombiner::visitZExt(), so infinite loop!

My next idea was to fix the matchIfNot() implementation in PatternMatch, but that would
mean potentially returning a different size for the match than what was input. I think
this would require all users of m_Not to check the size of the returned match, so I 
abandoned that idea.

I settled on just fixing the exact case presented in the PR. This patch does allow the
2 functions in PR22723 to compile identically (x86):

bool test(bool x, bool y) { return !x | !y; }
bool test(bool x, bool y) { return !x || !y; }
...
andb	%sil, %dil
xorb	$1, %dil
movb	%dil, %al
retq

Differential Revision: http://reviews.llvm.org/D12705

llvm-svn: 248634
2015-09-25 23:21:38 +00:00
Justin Bogner 0638b7ba99 ADCE: Fix typo in file comment. NFC
llvm-svn: 248613
2015-09-25 21:03:46 +00:00
Charlie Turner 2720593ab4 [InstCombine] Recognize another bswap idiom.
Summary:
The byte-swap recognizer can now notice that this

```
uint32_t bswap(uint32_t x)
{
  x = (x & 0x0000FFFF) << 16 | (x & 0xFFFF0000) >> 16;
  x = (x & 0x00FF00FF) << 8 | (x & 0xFF00FF00) >> 8;
  return x;
}
```
    
is a bswap. Fixes PR23863.

Reviewers: nlewycky, hfinkel, hans, jmolloy, rengolin

Subscribers: majnemer, rengolin, llvm-commits

Differential Revision: http://reviews.llvm.org/D12637

llvm-svn: 248482
2015-09-24 10:24:58 +00:00
Michael Zolotukhin 74621cced7 Add CFG Simplification pass after Loop Unswitching.
Loop unswitching produces conditional branches with constant condition,
and it's beneficial for later passes to clean this up with simplify-cfg.
We do this after the second invocation of loop-unswitch, but not after
the first one. Not doing so might cause problem for passes like
LoopUnroll, whose estimate of loop body size would be less accurate.

Reviewers: hfinkel

Differential Revision: http://reviews.llvm.org/D13064

llvm-svn: 248460
2015-09-24 03:50:17 +00:00
Evgeniy Stepanov 8685daf23e [safestack] Fix compiler crash in the presence of stack restores.
A use can be emitted before def in a function with stack restore
points but no static allocas.

llvm-svn: 248455
2015-09-24 01:23:51 +00:00
Michael Zolotukhin d56ee06d1f [Unroll] When completely unrolling the loop, replace conditinal branches with unconditional.
Nothing is expected to change, except we do less redundant work in
clean-up.

Reviewers: hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12951

llvm-svn: 248444
2015-09-23 23:12:43 +00:00
Wei Mi 3cc9204a52 Put profile variables of COMDAT functions to it's own COMDAT group.
In -fprofile-instr-generate compilation, to remove the redundant profile
variables for the COMDAT functions, these variables are placed in the same
COMDAT group as its associated function. This way when the COMDAT function
is not picked by the linker, those profile variables will also not be
output in the final binary. This may cause warning when mix link objects
built w and wo -fprofile-instr-generate.

This patch puts the profile variables for COMDAT functions to its own COMDAT
group to avoid the problem.

Patch by xur.
Differential Revision: http://reviews.llvm.org/D12248

llvm-svn: 248440
2015-09-23 22:40:45 +00:00
Lawrence Hu cac0b89289 Swap loop invariant GEP with loop variant GEP to allow more LICM.
This patch changes the order of GEPs generated by Splitting GEPs
    pass, specially when one of the GEPs has constant and the base is
    loop invariant, then we will generate the GEP with constant first
    when beneficial, to expose more cases for LICM.

    If originally Splitting GEP generate the following:
      do.body.i:
        %idxprom.i = sext i32 %shr.i to i64
        %2 = bitcast %typeD* %s to i8*
        %3 = shl i64 %idxprom.i, 2
        %uglygep = getelementptr i8, i8* %2, i64 %3
        %uglygep7 = getelementptr i8, i8* %uglygep, i64 1032
      ...
    Now it genereates:
      do.body.i:
        %idxprom.i = sext i32 %shr.i to i64
        %2 = bitcast %typeD* %s to i8*
        %3 = shl i64 %idxprom.i, 2
        %uglygep = getelementptr i8, i8* %2, i64 1032
        %uglygep7 = getelementptr i8, i8* %uglygep, i64 %3
      ...

    For no-loop cases, the original way of generating GEPs seems to
    expose more CSE cases, so we don't change the logic for no-loop
    cases, and only limit our change to the specific case we are
    interested in.

llvm-svn: 248420
2015-09-23 19:25:30 +00:00
Akira Hatanaka f6afd11538 [InstCombine] Preserve metadata when merging loads that are phi
arguments.

Make sure InstCombiner::FoldPHIArgLoadIntoPHI doesn't drop the following
metadata:

MD_tbaa
MD_alias_scope
MD_noalias
MD_invariant_load
MD_nonnull
MD_range

rdar://problem/17617709

Differential Revision: http://reviews.llvm.org/D12710

llvm-svn: 248419
2015-09-23 18:40:57 +00:00
Evgeniy Stepanov a2002b08f7 Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

This is a re-commit of a change in r248357 that was reverted in
r248358.

llvm-svn: 248405
2015-09-23 18:07:56 +00:00
Vedant Kumar ff08e926ba [Inline] Use AssumptionCache from the right Function
This changes the behavior of AddAligntmentAssumptions to match its
comment. I.e, prove the asserted alignment in the context of the caller,
not the callee.

Thanks to Mehdi Amini for seeing the issue here! Also to Artur Pilipenko
who also saw a fix for the issue.

rdar://22521387

Differential Revision: http://reviews.llvm.org/D12997

llvm-svn: 248390
2015-09-23 15:49:08 +00:00
David Majnemer fa36bde2f6 [DeadArgElim] Split the invoke successor edge
Invoking a function which returns an aggregate can sometimes be
transformed to return a scalar value.  However, this means that we need
to create an insertvalue instruction(s) to recreate the correct
aggregate type.  We achieved this by inserting an insertvalue
instruction at the invoke's normal successor.  However, this is not
feasible if the normal successor uses the invoke's return value inside a
PHI node.

Instead, split the edge between the invoke and the unwind successor and
create the insertvalue instruction in the new basic block.  The new
basic block's successor will be the old invoke successor which leaves
us with IR which is well behaved.

This fixes PR24906.

llvm-svn: 248387
2015-09-23 15:41:09 +00:00
Igor Laevsky 029bd93c5d [DeadStoreElimination] Remove dead zero store to calloc initialized memory
This change allows dead store elimination to remove zero and null stores into memory freshly allocated with calloc-like function.

Differential Revision: http://reviews.llvm.org/D13021

llvm-svn: 248374
2015-09-23 11:38:44 +00:00
Simon Pilgrim 9cb018b6b6 [X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR
This patches removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead.

LLVM counterpart to D12835

Differential Revision: http://reviews.llvm.org/D13002

llvm-svn: 248368
2015-09-23 08:48:33 +00:00
Sanjoy Das 2aacc0ecca [SCEV] Introduce ScalarEvolution::getOne and getZero.
Summary:
It is fairly common to call SE->getConstant(Ty, 0) or
SE->getConstant(Ty, 1); this change makes such uses a little bit
briefer.

I've refactored the call sites I could find easily to use getZero /
getOne.

Reviewers: hfinkel, majnemer, reames

Subscribers: sanjoy, llvm-commits

Differential Revision: http://reviews.llvm.org/D12947

llvm-svn: 248362
2015-09-23 01:59:04 +00:00
Evgeniy Stepanov 8d0e3011d8 Revert "Android support for SafeStack."
test/Transforms/SafeStack/abi.ll breaks when target is not supported;
needs refactoring.

llvm-svn: 248358
2015-09-23 01:23:22 +00:00
Evgeniy Stepanov ce2e16f00c Android support for SafeStack.
Add two new ways of accessing the unsafe stack pointer:

* At a fixed offset from the thread TLS base. This is very similar to
  StackProtector cookies, but we plan to extend it to other backends
  (ARM in particular) soon. Bionic-side implementation here:
  https://android-review.googlesource.com/170988.
* Via a function call, as a fallback for platforms that provide
  neither a fixed TLS slot, nor a reasonable TLS implementation (i.e.
  not emutls).

llvm-svn: 248357
2015-09-23 01:03:51 +00:00
Michael Zolotukhin deade19630 [Unroll] Do not crash trying to propagate a value to vector load.
llvm-svn: 248333
2015-09-22 22:27:12 +00:00
Michael Zolotukhin 8bb31dd08a [Unroll] Follow-up for r247769: fix a bug in UnrolledInstAnalyzer::visitLoad.
Apart from checking that GlobalVariable is a constant, we should check
that it's not a weak constant, in which case we can't propagate its
value.

llvm-svn: 248327
2015-09-22 21:41:29 +00:00
NAKAMURA Takumi 10c80e7996 Prune trailing whitespaces.
llvm-svn: 248265
2015-09-22 11:19:03 +00:00
NAKAMURA Takumi 0a7d0ad95f Untabify.
llvm-svn: 248264
2015-09-22 11:15:07 +00:00
NAKAMURA Takumi a9cb538a74 Reformat blank lines.
llvm-svn: 248263
2015-09-22 11:14:39 +00:00
NAKAMURA Takumi 84965031a7 Reformat comment lines.
llvm-svn: 248262
2015-09-22 11:14:12 +00:00
NAKAMURA Takumi 70ad98aca4 Reformat.
llvm-svn: 248261
2015-09-22 11:13:55 +00:00
Evgeniy Stepanov 3c9c8338d0 Remove unused TargetTransformInfo dependency from SafeStack pass.
llvm-svn: 248233
2015-09-22 00:44:32 +00:00
Michael Zolotukhin 9f3aea6e1f [LoopUnswitch] Require DominatorTree info.
Summary:
We should either require the DT info to be available, or check if it's
available in every place we use DT (and we already miss such check in
one place, which causes failures in some cases). As other loop passes
preserve DT and it's usually available, it makes sense to just require
it here.

There is no regression test, because the bug only shows up if pass
manager decides to clean DT info right before LoopUnswitch. If
loop-unswitch is run separately, DT is available, so bug isn't exposed.

Reviewers: chandlerc, hfinkel

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D13036

llvm-svn: 248230
2015-09-22 00:22:47 +00:00
Philip Reames 5f99423de9 [LICM] Hoist calls to readonly argmemonly functions even with stores in the loop
We know that an argmemonly function can only access memory pointed to by it's pointer arguments. Rather than needing to consider all possible stores as aliasing (as we do for a readonly function), we can only consider the aliasing of the pointer arguments.

Note that this change only addresses hoisting. I'm thinking about how to address speculation safety as well, but that will be a different change.

FYI, argmemonly disallows accessing memory through non-pointer typed arguments.  

Differential Revision: http://reviews.llvm.org/D12771

llvm-svn: 248220
2015-09-21 22:27:59 +00:00
Mehdi Amini 24e20583d1 Fix UB: can't bind a reference to nullptr (NFC)
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 248213
2015-09-21 21:29:43 +00:00
James Molloy 50a4c27f97 [LoopUtils,LV] Propagate fast-math flags on generated FCmp instructions
We're currently losing any fast-math flags when synthesizing fcmps for
min/max reductions. In LV, make sure we copy over the scalar inst's
flags. In LoopUtils, we know we only ever match patterns with
hasUnsafeAlgebra, so apply that to any synthesized ops.

llvm-svn: 248201
2015-09-21 19:41:19 +00:00
Chandler Carruth 7542d37688 [FunctionAttrs] Extract a helper function for the core logic used to
evaluate whether 'readonly' or 'readnone' apply to a given function.
This both reduces indentation and will make it easy to share the logic
with a new pass manager implementation.

llvm-svn: 248181
2015-09-21 17:39:41 +00:00
Sanjay Patel 55dcd40d3e add ShouldChangeType() variant that takes bitwidths
This is more efficient for cases like D12965 where we already have widths.

llvm-svn: 248170
2015-09-21 16:09:37 +00:00
Sanjay Patel 84dca494b1 don't repeat function names in comments; NFC
llvm-svn: 248166
2015-09-21 15:33:26 +00:00
Sanjoy Das 7cc2cfecd9 [IndVars] Use C++11 style field initialization; NFCI.
llvm-svn: 248131
2015-09-20 18:42:53 +00:00
Sanjoy Das e1e352d5c5 [IndVars] Don't add a level of indentation for namespace {. NFC.
Whitespace-only change.

llvm-svn: 248130
2015-09-20 18:42:50 +00:00
Sanjoy Das 9119bf4c0b [IndVars] Don't repeat function names in comment; NFC.
Only changes comments.

llvm-svn: 248112
2015-09-20 06:58:03 +00:00
Sanjoy Das 428db150d1 [IndVars] Fix a bug in r248045.
Because -indvars widens induction variables through arithmetic,
`NeverNegative` cannot be a property of the `WidenIV` (a `WidenIV`
manages information for all transitive uses of an IV being widened,
including uses of `-1 * IV`).  Instead it must live on `NarrowIVDefUse`
which manages information for a specific def-use edge in the transitive
use list of an induction variable.

This change also adds a test case that demonstrates the problem with
r248045.

llvm-svn: 248107
2015-09-20 01:52:18 +00:00
Simon Pilgrim 996725eb17 [InstCombine] Use SimplifyDemandedVectorEltsLow helper function. NFCI.
Use the SimplifyDemandedVectorEltsLow helper function introduced in D12680.

llvm-svn: 248089
2015-09-19 11:41:53 +00:00
David Majnemer 47ce0b81b0 [InstCombine] FoldICmpCstShrCst failed for ashr when comparing against -1
(icmp eq (ashr C1, %V) -1) may have multiple answers if C1 is not a
power of two and has the sign bit set.

This fixes PR24873.

llvm-svn: 248074
2015-09-19 00:48:31 +00:00
David Majnemer e5977ebecc [InstCombine] FoldICmpCstShrCst didn't handle icmps of -1 in the ashr case correctly
llvm-svn: 248073
2015-09-19 00:48:26 +00:00
Sanjoy Das f69d0e3384 [IndVars] Widen more comparisons for non-negative induction vars
Summary:
If an induction variable is provably non-negative, its sign extension is
equal to its zero extension.  This means narrow uses like

  icmp slt iNarrow %indvar, %rhs

can be widened into

  icmp slt iWide zext(%indvar), sext(%rhs)

Reviewers: atrick, mcrosier, hfinkel

Subscribers: hfinkel, reames, llvm-commits

Differential Revision: http://reviews.llvm.org/D12745

llvm-svn: 248045
2015-09-18 21:21:02 +00:00
Larisse Voufo 532bf7153c Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan. (Complete version of r247497. See D12886)
llvm-svn: 248022
2015-09-18 19:14:35 +00:00
Piotr Padlewski a4d43337d4 gvn small fix
http://reviews.llvm.org/D12928

llvm-svn: 247935
2015-09-17 20:34:22 +00:00
Simon Pilgrim 61116ddc7b [InstCombine] Added vector demanded bits support for SSE4A EXTRQ/INSERTQ instructions
The SSE4A instructions EXTRQ/INSERTQ only use the lower 64-bits (or less) for many of their input vector operands and all of them have undefined upper 64-bits results.

Differential Revision: http://reviews.llvm.org/D12680

llvm-svn: 247934
2015-09-17 20:32:45 +00:00
Sanjoy Das e5f4889ba9 [InstCombine] Optimize icmp slt signum(x), 1 --> icmp slt x, 1
Summary:
`signum(x)` is sometimes implemented as `(x >> 63) | (-x >>> 63)` (for
an `i64` `x`).  This change adds a matcher for that pattern, and an
instcombine rule to optimize `signum(x) s< 1`.

Later, we can also consider optimizing:

  icmp slt signum(x), 0 --> icmp slt x, 0
  icmp sle signum(x), 1 --> true

etc.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12703

llvm-svn: 247846
2015-09-16 20:41:29 +00:00
Sanjay Patel 815adacd22 don't repeat function names in comments; NFC
llvm-svn: 247813
2015-09-16 16:21:08 +00:00
Adhemerval Zanella f0c95bd2ca [sanitizer] Add MSan support for AArch64
This patch adds support for msan on aarch64-linux for both 39 and
42-bit VMA.  The support is enabled by defining the
SANITIZER_AARCH64_VMA compiler flag to either 39 or 42 at build time
for both clang/llvm and compiler-rt.  The default VMA is 39 bits.

llvm-svn: 247807
2015-09-16 15:10:27 +00:00
David L Kreitzer da700ce581 Test commit: Fixed a few typos in the comments.
llvm-svn: 247793
2015-09-16 13:27:30 +00:00
Michael Zolotukhin fc314be0ec [Unroll] Fix a bug in UnrolledInstAnalyzer::visitLoad.
We only checked that a global is initialized with constants, which is
incorrect. We should be checking that GlobalVariable *is* a constant,
not just initialized with it.

llvm-svn: 247769
2015-09-16 03:25:09 +00:00
Sanjoy Das 8a5526e8be [IndVars] Fix PR24783.
In `IndVarSimplify::ExpandSCEVIfNeeded`,
`SCEVExpander::findExistingExpansion` may return an `llvm::Value` that
differs in type from the SCEV it was asked to find an expansion for (but
computes the same value).  In such cases, we fall back on
`expandCodeFor`; and rely on LLVM to CSE the two equivalent
expressions (different only by a no-op cast) into a single computation.

I tried a few other approaches to fixing PR24783, all of which turned
out to be more complex than this current version:

 1. Move the `ExpandSCEVIfNeeded` logic into `expandCodeFor`.  This got
    problematic because currently we do not pass in the `Loop *` into
    `expandCodeFor`.  Changing the interface to do this is a more
    invasive change, and really does not make much semantic sense unless
    the SCEV being passed in is an add recurrence.

    There is also the problem of `expandCodeFor` being used in places
    other than `indvars` -- there may be performance / correctness
    issues elsewhere if `expandCodeFor` is moved from always generating
    IR from scratch to cache-like model.

 2. Have `findExistingExpansion` only return expression with the correct
    type.  This would make `isHighCostExpansionHelper` and thus
    `isHighCostExpansion` more conservative than necessary.

 3. Insert casts on the value returned by `findExistingExpansion` if
    needed using `InsertNoopCastOfTo`.  This is complicated because
    `InsertNoopCastOfTo` depends on internal state of its
    `SCEVExpander` (specifically `Builder.GetInserPoint()`), and this
    may not be set up when `ExpandSCEVIfNeeded` is called.

 4. Manually insert casts on the value returned by
    `findExistingExpansion` if needed using `InsertNoopCastOfTo` via
    `CastInst::Create`.  This is probably workable, but figuring out the
    location where the cast instruction needs to be inserted has enough
    edge cases (arguments, constants, invokes, LCSSA must be preserved)
    makes me feel what I have right now is simplest solution.

llvm-svn: 247749
2015-09-15 23:45:39 +00:00
Sanjoy Das 0ce51a92a8 [IndVars] Rename variable; NFC.
llvm-svn: 247748
2015-09-15 23:45:35 +00:00
Alexey Samsonov c1603b6493 [ASan] Don't instrument globals in .preinit_array/.init_array/.fini_array
These sections contain pointers to function that should be invoked
during startup/shutdown by __libc_csu_init and __libc_csu_fini.
Instrumenting these globals will append redzone to them, which will be
filled with zeroes. This will cause null pointer dereference at runtime.

Merge ASan regression tests for globals that should be ignored by
instrumentation pass.

llvm-svn: 247734
2015-09-15 23:05:48 +00:00
Larisse Voufo 6b867c7254 Revert "Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan." for preliminary community discussion (See. D12886)
llvm-svn: 247716
2015-09-15 19:14:05 +00:00
Arch D. Robison 8ed0854f55 Broaden optimization of fcmp ([us]itofp x, constant) by instcombine.
The patch extends the optimization to cases where the constant's
magnitude is so small or large that the rounding of the conversion
is irrelevant.  The "so small" case includes negative zero.

Differential review: http://reviews.llvm.org/D11210

llvm-svn: 247708
2015-09-15 17:51:59 +00:00
Igor Laevsky bdc1eafe20 [CorrelatedValuePropagation] Infer nonnull attributes
LazuValueInfo can prove that value is nonnull based on the context information. 
Make use of this ability to infer nonnull attributes for the call arguments.

Differential Revision: http://reviews.llvm.org/D12836

llvm-svn: 247707
2015-09-15 17:51:50 +00:00
Marcello Maggioni 454faa84e2 [NaryReassociate] Add support for Mul instructions
This patch extends the current pass by handling
Mul instructions as well.

Patch by: Volkan Keles (vkeles@apple.com)

llvm-svn: 247705
2015-09-15 17:22:52 +00:00
Sanjay Patel f9b776350f more space; NFC
llvm-svn: 247699
2015-09-15 15:24:42 +00:00
James Molloy d5b161a221 [GlobalsAA] Disable globals-aa by default
Several issues have been found with it - disabling in the meantime.

llvm-svn: 247674
2015-09-15 10:44:06 +00:00
Sanjoy Das f75e15e5ac [PlaceSafepoints] Make the width of a counted loop settable.
Summary:
This change lets a `PlaceSafepoints` client change how wide the trip
count of a loop has to be for the loop to be considerd "counted", via
`CountedLoopTripWidth`.  It also removes the boolean `SkipCounted` flag
and the `upperTripBound` constant -- we can get the old behavior of
`SkipCounted` == `false` by setting `CountedLoopTripWidth` to `13` (2 ^
13 == 8192).

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12789

llvm-svn: 247656
2015-09-15 01:42:48 +00:00
David Blaikie 6614d8d230 [opaque pointer types] Switch a few cases of getElementType over, since I had them lying around anyway
llvm-svn: 247610
2015-09-14 20:29:26 +00:00
Chen Li 0d043b52eb [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG. It also adds assertions in isKnownNonNull() and isKnownNonNullFromDominatingCondition() to make sure the value checked is pointer type (as defined in LLVM document). These assertions might trip failures in things which are not  covered under llvm/test, but fixes should be pretty obvious. 

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247587
2015-09-14 18:10:43 +00:00
David Blaikie 16a2f3e302 Revert "[opaque pointer type] Pass GlobalAlias the actual pointer type rather than decomposing it into pointee type + address space"
This was a flawed change - it just caused the getElementType call to be
deferred until later, when we really need to remove it. Now that the IR
for GlobalAliases has been updated, the root cause is addressed that way
instead and this change is no longer needed (and in fact gets in the way
- because we want to pass the pointee type directly down further).

Follow up patches to push this through GlobalValue, bitcode format, etc,
will come along soon.

This reverts commit 236160.

llvm-svn: 247585
2015-09-14 18:01:59 +00:00
JF Bastien 26aca14b15 [MergeFuncs] Fix bug in merging GetElementPointers
GetElementPointers must have the first argument's type compared
for structural equivalence. Previously the code erroneously compared the
pointer's type, but this code was dead because all pointer types (of the
same address space) are the same. The pointee must be compared instead
(using the type stored in the GEP, not from the pointer type which will
be erased anyway).

Author: jrkoenig
Reviewers: dschuff, nlewycky, jfb
Subscribers: nlewycky, llvm-commits
Differential revision: http://reviews.llvm.org/D12820

llvm-svn: 247570
2015-09-14 15:37:48 +00:00
Chandler Carruth 3824f859f5 [FunctionAttrs] Move the malloc-like test to a static helper function
that could be used from a new pass manager. This one makes particular
sense as a static helper as it doesn't even need TLI.

llvm-svn: 247525
2015-09-13 08:23:27 +00:00
Chandler Carruth 8874b78697 [FunctionAttrs] Factor the logic to test for a known non-null return out
of a method and into a re-usable static helper. We can potentially use
this function from the implementation of a new pass manager oriented
version of the pass. Also add some better documentation of exactly what
the semantic model of this routine is (it isn't trivial) and use a more
modern naming convention for it.

llvm-svn: 247524
2015-09-13 08:17:14 +00:00
Chandler Carruth 444d005615 [FunctionAttrs] Make the per-function attribute inference a boring
static function rather than a method. It just needed access to
TargetLibraryInfo, and this way it can be easily reused between the
current FunctionAttrs implementation and any port for the new pass
manager.

llvm-svn: 247522
2015-09-13 08:03:23 +00:00
Chandler Carruth d02452015c [FunctionAttrs] Collect utility functions as static helpers rather than
methods. They don't need anything from the class anyways.

Also, collect the declarations into the private section of the pass.

llvm-svn: 247521
2015-09-13 07:50:43 +00:00
Chandler Carruth a632fb9e86 Clean up doxygen comments in FunctionAttrs, promoting some non-doxygen
comments, deleting duplicate comments, moving comments to consistently
live on the definition since these are all really internal routines,
etc. NFC.

llvm-svn: 247520
2015-09-13 06:57:25 +00:00
Chandler Carruth 63559d7cd4 Do some spring cleaning on FunctionAttrs.cpp with clang-format prior to
other refactorings and cleanups here.

llvm-svn: 247519
2015-09-13 06:47:20 +00:00
Simon Pilgrim 48ffca0f47 Fixed unused variable warning.
llvm-svn: 247505
2015-09-12 14:00:17 +00:00
Simon Pilgrim 20c607b110 [InstCombine] CVTPH2PS Vector Demanded Elements + Constant Folding
Improved InstCombine support for CVTPH2PS (F16C half 2 float conversion):

<4 x float> @llvm.x86.vcvtph2ps.128(<8 x i16>) - only uses the bottom 4 i16 elements for the conversion.

Added constant folding support.

Differential Revision: http://reviews.llvm.org/D12731

llvm-svn: 247504
2015-09-12 13:39:53 +00:00
Chandler Carruth 29a18a4663 [PM] Port SROA to the new pass manager.
In some ways this is a very boring port to the new pass manager as there
are no interesting analyses or dependencies or other oddities.

However, this does introduce the first good example of a transformation
pass with non-trivial state porting to the new pass manager. I've tried
to carve out patterns here to replicate elsewhere, and would appreciate
comments on whether folks like these patterns:

- A common need in the new pass manager is to effectively lift the pass
  class and some of its state into a public header file. Prior to this,
  LLVM used anonymous namespaces to provide "module private" types and
  utilities, but that doesn't scale to cases where a public header file
  is needed and the new pass manager will exacerbate that. The pattern
  I've adopted here is to use the namespace-cased-name of the core pass
  (what would be a module if we had them) as a module-private namespace.
  Then utility and other code can be declared and defined in this
  namespace. At some point in the future, we could even have
  (conditionally compiled) code that used modules features when
  available to do the same basic thing.

- I've split the actual pass run method in two in order to expose
  a private method usable by the old pass manager to wrap the new class
  with a minimum of duplicated code. I actually looked at a bunch of
  ways to automate or generate these, but they are all quite terrible
  IMO. The fundamental need is to extract the set of analyses which need
  to cross this interface boundary, and that will end up being too
  unpredictable to effectively encapsulate IMO. This is also
  a relatively small amount of boiler plate that will live a relatively
  short time, so I'm not too worried about the fact that it is boiler
  plate.

The rest of the patch is totally boring but results in a massive diff
(sorry). It just moves code around and removes or adds qualifiers to
reflect the new name and nesting structure.

Differential Revision: http://reviews.llvm.org/D12773

llvm-svn: 247501
2015-09-12 09:09:14 +00:00
Larisse Voufo f57162b6e7 Clean up: Refactoring the hardcoded value of 6 for FindAvailableLoadedValue()'s parameter MaxInstsToScan.
llvm-svn: 247497
2015-09-12 01:41:55 +00:00
Bruce Mitchener e9ffb45b60 Fix typos.
Summary: This fixes a variety of typos in docs, code and headers.

Subscribers: jholewinski, sanjoy, arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D12626

llvm-svn: 247495
2015-09-12 01:17:08 +00:00
Sanjay Patel 41c739b3fa typo; NFC
llvm-svn: 247454
2015-09-11 19:29:18 +00:00
Mehdi Amini 2bd08527ff Revert "[InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite"
This reverts commit r247356.

Breaks test/Transforms/InstCombine/pr8547.ll with:

Wrong types for attribute: byval inalloca nest noalias nocapture nonnull readnone readonly sret dereferenceable(1) dereferenceable_or_null(1)
  %call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str, i64 0, i64 0), i32 nonnull %conv2) #0
LLVM ERROR: Broken function found, compilation aborted!

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 247371
2015-09-11 01:33:48 +00:00
Chen Li a29c612ddd [InstCombineCalls] Use isKnownNonNullAt() to check nullness of passing arguments at callsite
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of passing arguments at callsite. In this way it can handle cases where the argument does not have nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D12779

llvm-svn: 247356
2015-09-10 23:04:49 +00:00
Chen Li 32a51416e5 [InstCombineCalls] Use isKnownNonNullAt() to check nullness of gc.relocate return value
Summary: This patch replaces isKnownNonNull() with isKnownNonNullAt() when checking nullness of gc.relocate return value. In this way it can handle cases where the relocated value does not have nonnull attribute but has a dominating null check from the CFG.

Reviewers: reames

Subscribers: llvm-commits, sanjoy

Differential Revision: http://reviews.llvm.org/D12772

llvm-svn: 247353
2015-09-10 22:35:41 +00:00
Filipe Cabecinhas 48b090a31f Remove gcc warning when comparing an unsigned var for >= 0
llvm-svn: 247352
2015-09-10 22:34:39 +00:00
Matthew Simpson 29dc0f7075 [LV] Relax Small Size Reduction Type Requirement
This patch enables small size reductions in which the source types are smaller
than the reduction type (e.g., computing an i16 sum from the values in an i8
array). The previous behavior was to only allow small size reductions if the
source types and reduction type were the same. The change accounts for the fact
that the existing sign- and zero-extend instructions in these cases should
still be included in the cost model.

Differential Revision: http://reviews.llvm.org/D12770

llvm-svn: 247337
2015-09-10 21:12:57 +00:00
JF Bastien fa946233b4 [MergeFuncs] Fix callsite attributes in thunk generation
This change correctly sets the attributes on the callsites
generated in thunks. This makes sure things such as sret, sext, etc.
are correctly set, so that the call can be a proper tailcall.

Also, the transfer of attributes in the replaceDirectCallers function
appears to be unnecessary, but until this is confirmed it will remain.

Author: jrkoenig
Reviewers: dschuff, jfb
Subscribers: llvm-commits, nlewycky
Differential revision: http://reviews.llvm.org/D12581

llvm-svn: 247313
2015-09-10 18:08:35 +00:00
Philip Reames 053701399d [SimplifyCFG] Use known bits to eliminate dead switch defaults
This is a follow up to http://reviews.llvm.org/D11995 implementing the suggestion by Hans.

If we know some of the bits of the value being switched on, we know that the maximum number of unique cases covers the unknown bits. This allows to eliminate switch defaults for large integers (i32) when most bits in the value are known.

Note that I had to make the transform contingent on not having any dead cases. This is conservatively correct with the old code, but required for the new code since we might have a dead case which varies one of the known bits. Counting that towards our number of covering cases would be bad.  If we do have dead cases, we'll eliminate them first, then revisit the possibly dead default.

Differential Revision: http://reviews.llvm.org/D12497

llvm-svn: 247309
2015-09-10 17:44:47 +00:00
Hans Wennborg aa15bffa1f Re-commit r247216: "Fix Clang-tidy misc-use-override warnings, other minor fixes"
Except the changes that defined virtual destructors as =default, because that
ran into problems with GCC 4.7 and overriding methods that weren't noexcept.

llvm-svn: 247298
2015-09-10 16:49:58 +00:00
Sanjay Patel 9361d35525 80-cols; NFC
llvm-svn: 247295
2015-09-10 16:31:19 +00:00
Sanjay Patel f4b34b76d4 use range-based for loop; NFCI
llvm-svn: 247294
2015-09-10 16:25:38 +00:00
Sanjay Patel 5e7bd91891 use range-based for loop; NFCI
llvm-svn: 247293
2015-09-10 16:15:21 +00:00
Sanjay Patel 59661459f1 fix typo; NFC
llvm-svn: 247287
2015-09-10 15:14:34 +00:00