Commit Graph

6688 Commits

Author SHA1 Message Date
Simon Pilgrim ca140b17cb [InstCombine][SSE] Added support to VPERMD/VPERMPS to shuffle combine to accept UNDEF elements.
llvm-svn: 268206
2016-05-01 20:43:02 +00:00
Simon Pilgrim c590492075 Dropped FIXME comment
llvm-svn: 268205
2016-05-01 20:33:25 +00:00
Simon Pilgrim eeacc40e27 [InstCombine][SSE] Added support to VPERMILVAR to shuffle combine to accept UNDEF elements.
llvm-svn: 268204
2016-05-01 20:22:42 +00:00
Simon Pilgrim cc7f567b6a [InstCombine][AVX] Fixed PERMILVAR identity tests and added additional decode tests
llvm-svn: 268203
2016-05-01 20:06:47 +00:00
Simon Pilgrim e5e8c2fde0 [InstCombine][SSE] Added support to PSHUFB to shuffle combine to accept UNDEF elements.
llvm-svn: 268202
2016-05-01 19:26:21 +00:00
Simon Pilgrim cae3e70707 [InstCombine][SSE] Regenerate MOVSX/MOVZX tests
llvm-svn: 268201
2016-05-01 18:28:45 +00:00
Simon Pilgrim 8cddf8b3c6 [InstCombine][AVX2] Combine VPERMD/VPERMPS intrinsics with constant masks to shufflevector.
llvm-svn: 268199
2016-05-01 16:41:22 +00:00
Simon Pilgrim c179435055 [InstCombine][AVX2] Added VPERMD/VPERMPS shuffle combining placeholder tests.
For future support for VPERMD/VPERMPS to generic shuffles combines

llvm-svn: 268166
2016-04-30 20:41:52 +00:00
Simon Pilgrim 8e38a5439b [InstCombine][AVX] Split off VPERMILVAR tests and added additional tests for UNDEF mask elements
llvm-svn: 268159
2016-04-30 07:32:19 +00:00
Sanjoy Das 47cf2affbd [LowerGuardIntrinsics] Keep track of !make.implicit metadata
If a guard call being lowered by LowerGuardIntrinsics has the
`!make.implicit` metadata attached, then reattach the metadata to the
branch in the resulting expanded form of the intrinsic.  This allows us
to implement null checks as guards and still get the benefit of implicit
null checks.

llvm-svn: 268148
2016-04-30 00:55:59 +00:00
Lawrence Hu 1befea2bdc Reroll loops with multiple IV and negative step part 3
support multiple induction variables

    This patch enable loop reroll for the following case:
        for(int i=0;  i<N; i += 2) {
           S += *a++;
           S += *a++;
        };

Differential Revision: http://reviews.llvm.org/D16550

llvm-svn: 268147
2016-04-30 00:51:22 +00:00
Sanjoy Das 52c68bb0f5 [LowerGuardIntrinsics] Preserve calling conv when lowering
llvm-svn: 268142
2016-04-30 00:17:47 +00:00
Sanjay Patel bc6fad0bdf add minimal test to show dropped metadata
llvm-svn: 268141
2016-04-30 00:12:54 +00:00
Sanjay Patel 6748ec49e9 remove the metadata added with r267827
We can demonstrate the 'select' bug and fix with a simpler test case.
The merged weight values are already tested in another test.

llvm-svn: 268139
2016-04-30 00:02:36 +00:00
Sanjoy Das 107aefc2fc Mark guards on true as "trivially dead"
This moves some logic added to EarlyCSE in rL268120 into
`llvm::isInstructionTriviallyDead`.  Adds a test case for DCE to
demonstrate that passes other than EarlyCSE can now pick up on the new
information.

llvm-svn: 268126
2016-04-29 22:23:16 +00:00
Sanjoy Das ee81b23fe7 [EarlyCSE] Simplify guard intrinsics
Summary:
This change teaches EarlyCSE some basic properties of guard intrinsics:

 - Guard intrinsics read all memory, but don't write to any memory
 - After a guard has executed, the condition it was guarding on can be
   assumed to be true
 - Guard intrinsics on a constant `true` are no-ops

Reviewers: reames, hfinkel

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19578

llvm-svn: 268120
2016-04-29 21:52:58 +00:00
Chad Rosier cd62bf5821 [InstCombine] Determine the result of a select based on a dominating condition.
Differential Revision: http://reviews.llvm.org/D19550

llvm-svn: 268104
2016-04-29 21:12:31 +00:00
David Majnemer d2a074b1f4 [ValueTracking] matchSelectPattern needs to be more careful around FP
matchSelectPattern attempts to see through casts which mask min/max
patterns from being more obvious.  Under certain circumstances, it would
misidentify a sequence of instructions as a min/max because it assumed
that folding casts would preserve the result.  This is not the case for
floating point <-> integer casts.

This fixes PR27575.

llvm-svn: 268086
2016-04-29 18:40:34 +00:00
Geoff Berry b92cd5293e [BasicAA] Treat llvm.assume as not accessing memory in getModRefBehavior(Function)
Reviewers: dberlin, chandlerc, hfinkel, reames, sanjoy

Subscribers: mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19730

llvm-svn: 268068
2016-04-29 17:18:28 +00:00
Sanjay Patel 362dcf9615 auto-generate checks
llvm-svn: 268061
2016-04-29 16:39:37 +00:00
Simon Pilgrim 07a691c706 [InstCombine][SSE] Added x86 pshufb undef mask tests
FIXME: We currently don't support folding constant pshufb shuffle masks containing undef elements.
llvm-svn: 268016
2016-04-29 09:13:53 +00:00
Simon Pilgrim 5779fb61b0 [InstCombine][SSE] Regenerated x86 pshufb tests
llvm-svn: 268014
2016-04-29 08:53:35 +00:00
David Majnemer 1a5799fe3e [DeadArgumentElimination] Propagate operand bundles to promoted call sites
We neglected to transfer operand bundles when performing argument
promotion.

llvm-svn: 268008
2016-04-29 07:22:36 +00:00
Adam Nemet c60d8f8fd0 [LoopDist] Add missing RUN line in test from r268006
llvm-svn: 268007
2016-04-29 07:16:00 +00:00
Adam Nemet 88ec491830 [LoopDist] Also emit optimization remark on success (-Rpass=)
The option -Rpass=loop-distribute now reports the loops that were
distributed.

llvm-svn: 268006
2016-04-29 07:10:46 +00:00
David Majnemer 13d5526392 [SLPVectorizer] Add operand bundles to vectorized functions
SLPVectorizing a call site should result in further propagation of its
bundles.

llvm-svn: 268004
2016-04-29 07:09:51 +00:00
David Majnemer 50ddc0e1b6 [LoopVectorize] Add operand bundles to vectorized functions
Also, do not crash when calculating a cost model for loop-invariant
token values.

llvm-svn: 268003
2016-04-29 07:09:48 +00:00
Matt Arsenault 7d1b6c81af AMDGPU: Stop reporting an addressing mode for unknown addrspace
This was being treated the same as private, which has an immediate
offset. For unknown, it probably means it's for a computation not
actually being used for accessing memory, so it should not have a
nontrivial addressing mode.

llvm-svn: 268002
2016-04-29 06:25:10 +00:00
David Majnemer cd24bb1d3a [ArgumentPromotion] Propagate operand bundles to promoted call sites
We neglected to transfer operand bundles when performing argument
promotion.

This fixes PR27568.

llvm-svn: 267986
2016-04-29 04:56:12 +00:00
Michael Zolotukhin 1816d03b7d [PR25281] Remove AAResultsWrapper from preserved analyses of loop vectorizer.
We don't preserve AAResults, because, for one, we don't preserve SCEV-AA.
That fixes PR25281.

llvm-svn: 267980
2016-04-29 03:31:25 +00:00
Hal Finkel 1b66f7e3c8 [LoopVectorize] Keep hints from original loop on the vector loop
We need to keep loop hints from the original loop on the new vector loop.
Failure to do this meant that, for example:

  void foo(int *b) {
  #pragma clang loop unroll(disable)
    for (int i = 0; i < 16; ++i)
      b[i] = 1;
  }

this loop would be unrolled. Why? Because we'd vectorize it, thus dropping the
hints that unrolling should be disabled, and then we'd unroll it.

llvm-svn: 267970
2016-04-29 01:27:40 +00:00
Adam Nemet 0ba164bbcb [LoopDist] Emit optimization remarks (-Rpass*)
I closely followed the precedents set by the vectorizer:

* With -Rpass-missed, the loop is reported with further details pointing
to -Rpass--analysis.

* -Rpass-analysis reports the details why distribution has failed.

* Regardless of -Rpass*, when distribution fails for a loop where
distribution was forced with the pragma, a warning is produced according
to -Wpass-failed.  In this case the analysis info is also printed even
without -Rpass-analysis.

llvm-svn: 267952
2016-04-28 23:08:32 +00:00
Hal Finkel 50316d95a9 [Inliner] Preserve llvm.mem.parallel_loop_access metadata
When inlining a call site with llvm.mem.parallel_loop_access metadata, this
metadata needs to be propagated to all cloned memory-accessing instructions.
Otherwise, inlining parts of the loop body will invalidate the annotation.

With this functionality, we now vectorize the following as expected:

  void Body(int *res, int *c, int *d, int *p, int i) {
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  }

  void Test(int *res, int *c, int *d, int *p, int n) {
    int i;

  #pragma clang loop vectorize(assume_safety)
    for (i = 0; i < 1600; i++) {
      Body(res, c, d, p, i);
    }
  }

llvm-svn: 267949
2016-04-28 23:00:04 +00:00
Arch D. Robison 0e61034018 [SLPVectorizer] Extend SLP Vectorizer to deal with aggregates.
The refactoring portion part was done as r267748.

http://reviews.llvm.org/D14185

llvm-svn: 267899
2016-04-28 16:11:45 +00:00
Simon Pilgrim bd4a3be7d2 [InstCombine][SSE] Add MOVMSK support to SimplifyDemandedUseBits
The MOVMSK instructions copies a vector elements' sign bits to the low bits of a scalar register and zeros the high bits.

This patch adds MOVMSK support to SimplifyDemandedUseBits so that its aware that the upper bits are known to be zero. It also removes the call to MOVMSK if none of the lower bits are actually required and just returns zero.

Differential Revision: http://reviews.llvm.org/D19614

llvm-svn: 267873
2016-04-28 12:22:53 +00:00
Sanjay Patel 21bd38a07b Update test to use FileCheck
Also, add some metadata to show what that currently looks like.

llvm-svn: 267827
2016-04-28 00:29:27 +00:00
Rong Xu a4c3f67fe8 more buildbot failure fix to r267792
__llvm_prf_nm length is embedded in llvm_used. Relax llvm_used check.

llvm-svn: 267816
2016-04-27 23:23:53 +00:00
Rong Xu 6e34c490ff [PGO] Promote indirect calls to conditional direct calls with value-profile
This patch implements the transformation that promotes indirect calls to
conditional direct calls when the indirect-call value profile meta-data is
available.

Differential Revision: http://reviews.llvm.org/D17864

llvm-svn: 267815
2016-04-27 23:20:27 +00:00
Rong Xu 4b1dc5d60b Fix buildbot failure due to r267792
Relax the test check as some targets do not have name compression.

llvm-svn: 267803
2016-04-27 22:06:35 +00:00
Rong Xu af5aebaa32 [PGO] Prohibit address recording if the function is both internal and COMDAT
Differential Revision: http://reviews.llvm.org/D19515

llvm-svn: 267792
2016-04-27 21:17:30 +00:00
Simon Pilgrim 3f595aabe2 [InstCombine][AVX2] Add AVX2 per-element vector shift tests
At the moment we don't simplify PSRAV/PSRLV/PSLLV intrinsics to generic IR for constant shift amounts, but we could.

llvm-svn: 267777
2016-04-27 20:25:34 +00:00
David Majnemer 0c80e2eac6 [CodeGenPrepare] Don't sink a cast past its user
The sink cast machinery is supposed to sink casts as close to their user
as possible.  However, an EH pad is the first instruction in it's basic
block.  Don't sink if the user is an EH pad.

This fixes PR27536.

llvm-svn: 267767
2016-04-27 19:36:38 +00:00
Ahmed Bougacha ace97c1f7d [LIR] Set attributes on memset_pattern16.
"inferattrs" will deduce the attribute, but it will be too late for
many optimizations. Set it ourselves when creating the call.

Differential Revision: http://reviews.llvm.org/D17598

llvm-svn: 267762
2016-04-27 19:04:50 +00:00
Ahmed Bougacha 44c19876c7 [InferAttrs] Mark memset_pattern16 params nocapture.
Differential Revision: http://reviews.llvm.org/D19471

llvm-svn: 267760
2016-04-27 19:04:43 +00:00
Matthew Simpson 622b95be7b [LV] Reallow positive-stride interleaved load groups with gaps
We previously disallowed interleaved load groups that may cause us to
speculatively access memory out-of-bounds (r261331). We did this by ensuring
each load group had an access corresponding to the first and last member.
Instead of bailing out for these interleaved groups, this patch enables us to
peel off the last vector iteration, ensuring that we execute at least one
iteration of the scalar remainder loop. This solution was proposed in the
review of the previous patch.

Differential Revision: http://reviews.llvm.org/D19487

llvm-svn: 267751
2016-04-27 18:21:36 +00:00
Gerolf Hoflehner 88017c08a6 [InstCombine] Sharpended test case in pr21210.ll
llvm-svn: 267742
2016-04-27 17:19:54 +00:00
Matthew Simpson e5dfb08fcb [TTI] Add hook for vector extract with extension
This change adds a new hook for estimating the cost of vector extracts followed
by zero- and sign-extensions. The motivating example for this change is the
SMOV and UMOV instructions on AArch64. These instructions move data from vector
to general purpose registers while performing the corresponding extension
(sign-extend for SMOV and zero-extend for UMOV) at the same time. For these
operations, TargetTransformInfo can assume the extensions are free and only
report the cost of the vector extract. The SLP vectorizer has been updated to
make use of the new hook.

Differential Revision: http://reviews.llvm.org/D18523

llvm-svn: 267725
2016-04-27 15:20:21 +00:00
Simon Pilgrim f23aa2a9c9 [InstCombine][SSE] Regenerated vector shift tests
llvm-svn: 267699
2016-04-27 12:04:44 +00:00
Simon Pilgrim d2ea708739 [InstCombine][SSE] Added DemandedBits tests for MOVMSK instructions
MOVMSK zeros the upper bits of the gpr - we should be able to use this.

llvm-svn: 267686
2016-04-27 09:53:09 +00:00
Adam Nemet d2fa414718 [LoopDist] Add llvm.loop.distribute.enable loop metadata
Summary:
D19403 adds a new pragma for loop distribution.  This change adds
support for the corresponding metadata that the pragma is translated to
by the FE.

As part of this I had to rethink the flag -enable-loop-distribute.  My
goal was to be backward compatible with the existing behavior:

  A1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute is specified

  A2. pass is on when invoked directly from opt (e.g. for unit-testing)

The new pragma/metadata overrides these defaults so the new behavior is:

  B1. A1 + enable distribution for individual loop with the pragma/metadata

  B2. A2 + disable distribution for individual loop with the pragma/metadata

The default value whether the pass is on or off comes from the initiator
of the pass.  From the PassManagerBuilder the default is off, from opt
it's on.

I moved -enable-loop-distribute under the pass.  If the flag is
specified it overrides the default from above.

Then the pragma/metadata can further modifies this per loop.

As a side-effect, we can now also use -enable-loop-distribute=0 from opt
to emulate the default from the optimization pipeline.  So to be precise
this is the new behavior:

  C1. pass is off by default from the optimization pipeline
  unless -enable-loop-distribute or the pragma/metadata enables it

  C2. pass is on when invoked directly from opt
  unless -enable-loop-distribute=0 or the pragma/metadata disables it

Reviewers: hfinkel

Subscribers: joker.eph, mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D19431

llvm-svn: 267672
2016-04-27 05:28:18 +00:00
Evgeny Stupachenko 23ce61b663 The patch fixes PR27392.
Summary:
 It is incorrect to compare TripCount (which is BECount + 1)
  with extraiters (or Count) to check if we should enter unrolled
  loop or not, because TripCount can potentially overflow
  (when BECount is max unsigned integer).
 While comparing BECount with (Count - 1) is overflow safe and
  therefore correct.

Reviewer: hfinkel

Differential Revision: http://reviews.llvm.org/D19256

From: Evgeny Stupachenko <evstupac@gmail.com>
llvm-svn: 267662
2016-04-27 03:04:54 +00:00
Philip Reames 3f83dbeed9 [LVI] Reduce compile time by lazily scanning blocks if needed
When encountering a non-local pointer, LVI would eagerly scan the block for dereferences of the given object to prove the pointer to be non null.  That's all well and good, but *then* we'd go recurse through our input blocks.  As a result, we could end up scanning each and every block we traverse, even if the final definition was obviously non null or we found a constant value somewhere up the chain.  The previous code papered over this by using the isKnownNonNull routine from value tracking.  This made the duplication less painful in the common case.

Instead, we know do the block scan only *after* we've gotten the recursive results back.  This lets us stop scanning individual blocks as soon as we've determined it to be non-null in any predecessor block and use our usual merge rules to propagate that information cheaply through successor blocks.  For a pointer which can be found non-null, this does strictly less work and sometimes substaintially so.

Note that the case where we *can't* prove something non-null is still the really expensive case.  We end up scanning each and every block looking for a dereference and never end up finding one.

llvm-svn: 267642
2016-04-27 00:30:55 +00:00
Justin Bogner c2bf63d29d PM: Port Reassociate to the new pass manager
llvm-svn: 267631
2016-04-26 23:39:29 +00:00
Sanjay Patel 29dea0d230 [SimplifyCFG] propagate branch metadata when creating select
llvm-svn: 267624
2016-04-26 23:15:48 +00:00
Philip Reames 053c2a6f25 [LVI] Apply transfer rule for overdefine inputs for binary operators
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules.  Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined.  This greatly impacts the precision of the overall analysis and makes it far more fragile as well.

This patch builds on 267609 which did the same thing for unary casts.

llvm-svn: 267620
2016-04-26 23:10:35 +00:00
Philip Reames e5030e85ea [LVI] A better fix for the assertion error introduced by 267609
Essentially, I was using the wrong size function.  For types which were sized, but not primitive, I wasn't getting a useful size for the operand and failed an assert.  I fixed this, and also added a guard that the input is a sized type.  Test case is for the original mistake.  I'm not sure how to actually exercise the sized type check.

llvm-svn: 267618
2016-04-26 22:52:30 +00:00
Sanjay Patel d2d2aa52cd [LowerExpectIntrinsic] make default likely/unlikely ratio bigger
We need the default ratio to be sufficiently large that it triggers transforms 
based on block frequency info (BFI) and plays well with the recently introduced
BranchProbability used by CGP.

Differential Revision: http://reviews.llvm.org/D19435

llvm-svn: 267615
2016-04-26 22:23:38 +00:00
Philip Reames 38c87c2e50 [LVI] Infer local facts from unary expressions
As pointed out by John Regehr over in http://reviews.llvm.org/D19485, LVI was being incredibly stupid about applying its transfer rules. Rather than gathering local facts from the expression itself, it was simply giving up entirely if one of the inputs was overdefined. This greatly impacts the precision of the overall analysis and makes it far more fragile as well.

This patch implements only the unary operation case. Once this is in, I'll implement the same for the binary operations.

Differential Revision: http://reviews.llvm.org/D19492

llvm-svn: 267609
2016-04-26 21:48:16 +00:00
David Majnemer abb9f55c80 Revert "[SimplifyLibCalls] sprintf doesn't copy null bytes"
The destination buffer that sprintf uses is restrict qualified, we do
not need to worry about derived pointers referenced via format
specifiers.

This reverts commit r267580.

llvm-svn: 267605
2016-04-26 21:04:47 +00:00
Elena Demikhovsky 308a7eb0d2 Masked Store in Loop Vectorizer - bugfix
Fixed a bug in loop vectorization with conditional store.

Differential Revision: http://reviews.llvm.org/D19532

llvm-svn: 267597
2016-04-26 20:18:04 +00:00
Justin Bogner 4563a06cee PM: Port Internalize to the new pass manager
llvm-svn: 267596
2016-04-26 20:15:52 +00:00
David Majnemer 8cd77baebc [SimplifyLibCalls] sprintf doesn't copy null bytes
sprintf doesn't read or copy the terminating null byte from it's string
operands.  sprintf will append it's own after processing all of the
format specifiers.

This fixes PR27526.

llvm-svn: 267580
2016-04-26 18:16:49 +00:00
Dehao Chen 5d6d4841ed Tune basic block annotation algorithm.
Summary:
Instead of using maximum IR weight as the basic block weight, this patch uses the voting algorithm to find the most likely weight for the basic block. This can effectively avoid the cases when some IRs are annotated incorrectly due to code motion of the profiled binary.

This patch also updates propagate.ll unittest to include discriminator in the input file so that it is testing something meaningful.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19301

llvm-svn: 267519
2016-04-26 04:59:11 +00:00
Hal Finkel e4c0c1679b [SimplifyCFG] Preserve !llvm.mem.parallel_loop_access when merging
When SimplifyCFG merges identical instructions from both sides of a diamond, it
can preserve !llvm.mem.parallel_loop_access (as it does with most of the other
metadata). There's no real data or control dependency change in this case.

llvm-svn: 267515
2016-04-26 02:06:06 +00:00
Hal Finkel 411d31ad72 [LoopVectorize] Don't consider conditional-load dereferenceability for marked parallel loops
I really thought we were doing this already, but we were not. Given this input:

void Test(int *res, int *c, int *d, int *p) {
  for (int i = 0; i < 16; i++)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

we did not vectorize the loop. Even with "assume_safety" the check that we
don't if-convert conditionally-executed loads (to protect against
data-dependent deferenceability) was not elided.

One subtlety: As implemented, it will still prefer to use a masked-load
instrinsic (given target support) over the speculated load. The choice here
seems architecture specific; the best option depends on how expensive the
masked load is compared to a regular load. Ideally, using the masked load still
reduces unnecessary memory traffic, and so should be preferred. If we'd rather
do it the other way, flipping the order of the checks is easy.

The LangRef is updated to make explicit that llvm.mem.parallel_loop_access also
implies that if conversion is okay.

Differential Revision: http://reviews.llvm.org/D19512

llvm-svn: 267514
2016-04-26 02:00:36 +00:00
Sanjay Patel a31b0c0ece [CodeGenPrepare] don't convert an unpredictable select into control flow
Suggested in the review of D19488:
http://reviews.llvm.org/D19488

llvm-svn: 267504
2016-04-26 00:47:39 +00:00
Justin Bogner 1a07501379 PM: Port GlobalOpt to the new pass manager
llvm-svn: 267499
2016-04-26 00:28:01 +00:00
Justin Bogner 6f6c5f2a02 GlobalOpt: Convert a bunch of tests from grep to FileCheck
llvm-svn: 267493
2016-04-25 23:36:50 +00:00
Sanjay Patel 82059090d3 Add check for "branch_weights" with prof metadata
While we're here, fix the comment and variable names to make it
clear that these are raw weights, not percentages.

llvm-svn: 267491
2016-04-25 23:15:16 +00:00
Arch D. Robison be0490a6e8 Optimize store of "bitcast" from vector to aggregate.
This patch is what was the "instcombine" portion of D14185, with an additional 
test added (see julia_pseudovec in test/Transforms/InstCombine/insert-val-extract-elem.ll). 
The patch causes instcombine to replace sequences of extractelement-insertvalue-store 
that act essentially like a bitcast followed by a store.

Differential review: http://reviews.llvm.org/D14260

llvm-svn: 267482
2016-04-25 22:22:39 +00:00
Chad Rosier bbabc85031 Fix typo from r267432.
llvm-svn: 267436
2016-04-25 18:20:27 +00:00
Chad Rosier 4c4e3336b8 [ValueTracking] Add an additional test case for r266767 where one operand is a const.
llvm-svn: 267432
2016-04-25 17:41:48 +00:00
Chad Rosier e2cbd13e56 [ValueTracking] Improve isImpliedCondition when the dominating cond is false.
llvm-svn: 267430
2016-04-25 17:23:36 +00:00
James Molloy eb040cc55f [GlobalOpt] Allow constant globals to be SRA'd
The current logic assumes that any constant global will never be SRA'd. I presume this is because normally constant globals can be pushed into their uses and deleted. However, that sometimes can't happen (which is where you really want SRA, so the elements that can be eliminated, are!).

There seems to be no reason why we can't SRA constants too, so let's do it.

llvm-svn: 267393
2016-04-25 10:48:29 +00:00
Simon Pilgrim 4b5462f119 [InstCombine][SSE] Reduce DIVSS/DIVSD to FDIV if only first element is required
As discussed on D19318, if we only demand the first element of a DIVSS/DIVSD intrinsic, then reduce to a FDIV call. This matches the existing FADD/FSUB/FMUL patterns.

llvm-svn: 267359
2016-04-24 18:35:59 +00:00
Simon Pilgrim 83020942d3 [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 2 of 2)
Split from D17490. This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:

1 - demanded vector element support for unary and some extra binary scalar intrinsics (RCP/RSQRT/SQRT/FRCZ and ADD/CMP/DIV/ROUND).

2 - addss/addsd get simplified to a fadd call if we aren't interested in the pass through elements

3 - if we don't need the lowest element of a scalar operation then just use the first argument (the pass through elements) directly

We can add support for propagating demanded elements through any equivalent packed SSE intrinsics in a future patch (these wouldn't use the pass through patterns).

Differential Revision: http://reviews.llvm.org/D19318

llvm-svn: 267357
2016-04-24 18:23:14 +00:00
Simon Pilgrim 424da1637a [InstCombine][SSE] Demanded vector elements for scalar intrinsics (Part 1 of 2)
This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:

1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input)

2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input

Differential Revision: http://reviews.llvm.org/D17490

llvm-svn: 267356
2016-04-24 18:12:42 +00:00
Duncan P. N. Exon Smith a59d3e5af8 DebugInfo: Remove MDString-based type references
Eliminate DITypeIdentifierMap and make DITypeRef a thin wrapper around
DIType*.  It is no longer legal to refer to a DICompositeType by its
'identifier:', and DIBuilder no longer retains all types with an
'identifier:' automatically.

Aside from the bitcode upgrade, this is mainly removing logic to resolve
an MDString-based reference to an actualy DIType.  The commits leading
up to this have made the implicit type map in DICompileUnit's
'retainedTypes:' field superfluous.

This does not remove DITypeRef, DIScopeRef, DINodeRef, and
DITypeRefArray, or stop using them in DI-related metadata.  Although as
of this commit they aren't serving a useful purpose, there are patchces
under review to reuse them for CodeView support.

The tests in LLVM were updated with deref-typerefs.sh, which is attached
to the thread "[RFC] Lazy-loading of debug info metadata":

  http://lists.llvm.org/pipermail/llvm-dev/2016-April/098318.html

llvm-svn: 267296
2016-04-23 21:08:00 +00:00
Nico Weber 0aa9845d15 Revert r267210, it makes clang assert (PR27490).
llvm-svn: 267232
2016-04-22 22:08:42 +00:00
Peter Collingbourne 7dd8dbf486 Introduce llvm.load.relative intrinsic.
This intrinsic takes two arguments, ``%ptr`` and ``%offset``. It loads
a 32-bit value from the address ``%ptr + %offset``, adds ``%ptr`` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form ``i32 trunc(x - %ptr)``,
the intrinsic call is folded to ``x``.

LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if ``x`` is an
``unnamed_addr`` function. However, it does not provide this guarantee for
a constant initializer folded into a function body. This intrinsic can be
used to avoid the possibility of overflows when loading from such a constant.

Differential Revision: http://reviews.llvm.org/D18367

llvm-svn: 267223
2016-04-22 21:18:02 +00:00
Philip Reames 5f0e36947b [unordered] sink unordered stores at end of blocks
The existing code turned out to be completely correct when auditted.  Thus, only minor code changes and adding a couple of tests.

llvm-svn: 267215
2016-04-22 20:53:32 +00:00
Sanjoy Das f97229d6ba Fold compares for distinct allocations
Summary:
We can fold compares to false when two distinct allocations within a
function are compared for equality.

Patch by Anna Thomas!

Reviewers: majnemer, reames, sanjoy

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19390

llvm-svn: 267214
2016-04-22 20:52:25 +00:00
Philip Reames eedef73b63 [unordered] Extend load/store type canonicalization to handle unordered operations
Extend the type canonicalization logic to work for unordered atomic loads and stores.  Note that while this change itself is fairly simple and low risk, there's a reasonable chance this will expose problems in the backends by suddenly generating IR they wouldn't have seen before.  Anything of this nature will be an existing bug in the backend (you could write an atomic float load), but this will definitely change the frequency with which such cases are encountered.  If you see problems, feel free to revert this change, but please make sure you collect a test case.  

llvm-svn: 267210
2016-04-22 20:33:48 +00:00
Justin Bogner b93949089e PM: Port SinkingPass to the new pass manager
llvm-svn: 267199
2016-04-22 19:54:10 +00:00
Jun Bum Lim d29a24e4fd [DeadStoreElimination] Shorten beginning of memset overwritten by later stores
Summary: This change will shorten memset if the beginning of memset is overwritten by later stores.

Reviewers: hfinkel, eeckstein, dberlin, mcrosier

Subscribers: mgrang, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D18906

llvm-svn: 267197
2016-04-22 19:51:29 +00:00
Justin Bogner 395c2127ed PM: Port DCE to the new pass manager
Also add a very basic test, since apparently there aren't any tests
for DCE whatsoever to add the new pass version to.

llvm-svn: 267196
2016-04-22 19:40:41 +00:00
Adam Nemet 54053a518e [LoopVersioningLICM] Add test coverage for llvm.loop.licm_versioning.disable
In the next change, I am generalizing the function
findStringMetadataForLoop and I want to make sure I don't break this.
Looks like there was no coverage for this so far.

llvm-svn: 267182
2016-04-22 18:34:50 +00:00
Chad Rosier 1a60159064 [SimplifyCFG] Add final missing implications to isImpliedTrueByMatchingCmp.
Summary: eq imply [u|s]ge and [u|s]le are true.

Remove redundant logic by implementing isImpliedFalseByMatchingCmp(Pred1, Pred2)
as isImpliedTrueByMatchingCmp(Pred1, getInversePredicate(Pred2)).

llvm-svn: 267177
2016-04-22 17:57:34 +00:00
Chad Rosier 3456cb5672 [SimplifyCFG] Add missing implications to isImpliedTrueByMatchingCmp.
Summary: [u|s]gt and [u|s]lt imply [u|s]ge and [u|s]le are true, respectively.
I've simplified the existing tests and added additional tests to cover the new
cases mentioned above.  I've also added tests for all the cases where the
first compare doesn't imply anything about the second compare.

llvm-svn: 267171
2016-04-22 17:14:12 +00:00
Chad Rosier 1960d13e29 [SimplifyCFG] Simplify code review by temporarily removing this test file.
A followup commit will replace these tests with simplified and more inclusive
tests.  The diff is unreadable if this were to be done in a single commit.

llvm-svn: 267170
2016-04-22 17:14:08 +00:00
David Majnemer bfd695d591 [EarlyCSE] Don't add the overflow flags to the hash
We take the intersection of overflow flags while CSE'ing.
This permits us to consider two instructions with different overflow
behavior to be replaceable.

llvm-svn: 267153
2016-04-22 14:12:50 +00:00
Silviu Baranga e985c76b90 [InstCombine] Preserve fast math flags when combining PHIs
Summary:
When optimizing PHIs which have inputs floating point binary
operators, we preserve all IR flags except the fast math
flags.

This change removes the logic which tracked some of the IR flags
(no wrap, exact) and replaces it by doing an and on the IR flags of
all inputs to the PHI - which will also handle the fast math
flags.

Reviewers: majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19370

llvm-svn: 267139
2016-04-22 11:21:36 +00:00
David Majnemer d0ce8f1485 [GVN] Respect fast-math-flags on fcmps
We assumed that flags were only present on binary operators.  This is
not true, they may also be present on calls and fcmps.

llvm-svn: 267113
2016-04-22 06:37:51 +00:00
David Majnemer 9554c1339c [EarlyCSE] Take the intersection of flags on instructions
EarlyCSE had inconsistent behavior with regards to flag'd instructions:
- In some cases, it would pessimize if the available instruction had
  different flags by not performing CSE.
- In other cases, it would miscompile if it replaced an instruction
  which had no flags with an instruction which has flags.

Fix this by being more consistent with our flag handling by utilizing
andIRFlags.

llvm-svn: 267111
2016-04-22 06:37:45 +00:00
Sanjoy Das a085cfc150 Folding compares with unescaped allocations
Summary:
If we know that the pointer allocated within a function does not escape,
we can fold away comparisons that are done with global pointers

Patch by Anna Thomas!

Reviewers: reames, majnemer, sanjoy

Subscribers: mgrang, mcrosier, majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D19276

llvm-svn: 267035
2016-04-21 19:26:45 +00:00
Philip Reames a98c7ead30 [instcombine][unordered] Extend load(select) transform to handle unordered loads
llvm-svn: 267023
2016-04-21 17:59:40 +00:00
Philip Reames 3ac0718423 [unordered] unordered loads from null are still unreachable
llvm-svn: 267019
2016-04-21 17:45:05 +00:00
Philip Reames ac55090e96 [instcombine][unordered] Implement *-load forwarding for unordered atomics
This builds on 266999 which made FindAvailableValue do the right thing.  Tests included show the newly enabled transforms and those which disabled either due to conservatism or correctness requirements.

llvm-svn: 267006
2016-04-21 17:03:33 +00:00
Philip Reames 92c43699bc [unordered] Add tests and conservative handling in support of future changes [NFCI]
This change adds a couple of test cases to make sure FindAvailableLoadedValue does the right thing.  At the moment, the code added is dead, but separating it makes follow on changes far more obvious.

llvm-svn: 266999
2016-04-21 16:51:08 +00:00
Sanjoy Das 54a3a006ca [SimplifyCFG] Fold `llvm.guard(false)` to unreachable
Summary:
`llvm.guard(false)` always bails out of the current compilation unit, so
we can prune any control flow following it.

Reviewers: hfinkel, pcc, reames

Subscribers: majnemer, reames, mcrosier, llvm-commits

Differential Revision: http://reviews.llvm.org/D19245

llvm-svn: 266955
2016-04-21 05:09:12 +00:00
Mehdi Amini bda3c97c16 ThinLTO/ModuleLinker: add a flag to not always pull-in linkonce when performing importing
Summary:
The function importer already decided what symbols need to be pulled
in. Also these magically added ones will not be in the export list
for the source module, which can confuse the internalizer for
instance.

Reviewers: tejohnson, rafael

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19096

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266948
2016-04-21 01:59:39 +00:00
Nick Lewycky 762f8a8549 Add optimization for 'icmp slt (or A, B), A' and some related idioms based on knowledge of the sign bit for A and B.
No matter what value you OR in to A, the result of (or A, B) is going to be UGE A. When A and B are positive, it's SGE too. If A is negative, OR'ing a value into it can't make it positive, but can increase its value closer to -1, therefore (or A, B) is SGE A. Working through all possible combinations produces this truth table:

```
A is
+, -, +/-
F  F   F   +    B is
T  F   ?   -
?  F   ?   +/-
```

The related optimizations are flipping the 'slt' for 'sge' which always NOTs the result (if the result is known), and swapping the LHS and RHS while swapping the comparison predicate.

There are more idioms left to implement (aren't there always!) but I've stopped here because any more would risk becoming unreasonable for reviewers.

llvm-svn: 266939
2016-04-21 00:53:14 +00:00
Vedant Kumar 932866bfe7 [test/PGOProfile] Make tests independent of the raw profile version (NFC)
Differential Revision: http://reviews.llvm.org/D19290

llvm-svn: 266928
2016-04-20 22:24:01 +00:00
Teresa Johnson b35cc691ea [ThinLTO] Prevent importing of "llvm.used" values
Summary:
This patch prevents importing from (and therefore exporting from) any
module with a "llvm.used" local value. Local values need to be promoted
and renamed when importing, and their presense on the llvm.used variable
indicates that there are opaque uses that won't see the rename. One such
example is a use in inline assembly.

See also the discussion at:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098047.html

As part of this, move collectUsedGlobalVariables out of Transforms/Utils
and into IR/Module so that it can be used more widely. There are several
other places in LLVM that used copies of this code that can be cleaned
up as a follow on NFC patch.

Reviewers: joker.eph

Subscribers: pcc, llvm-commits, joker.eph

Differential Revision: http://reviews.llvm.org/D18986

llvm-svn: 266877
2016-04-20 14:39:45 +00:00
Mandeep Singh Grang 029a0567fa [LLVM] Remove unwanted --check-prefix=CHECK from unit tests. NFC.
Summary: Removed unwanted --check-prefix=CHECK from numerous unit tests.

Reviewers: t.p.northover, dblaikie, uweigand, MatzeB, tstellarAMD, mcrosier

Subscribers: mcrosier, dsanders

Differential Revision: http://reviews.llvm.org/D19279

llvm-svn: 266834
2016-04-19 23:51:52 +00:00
Marcin Koscielnicki 3fdc257d6a [AArch64] [ARM] Make a target-independent llvm.thread.pointer intrinsic.
Both AArch64 and ARM support llvm.<arch>.thread.pointer intrinsics that
just return the thread pointer.  I have a pending patch that does the same
for SystemZ (D19054), and there are many more targets that could benefit
from one.

This patch merges the ARM and AArch64 intrinsics into a single target
independent one that will also be used by subsequent targets.

Differential Revision: http://reviews.llvm.org/D19098

llvm-svn: 266818
2016-04-19 20:51:05 +00:00
Chad Rosier b7dfbb40a3 [ValueTracking] Improve isImpliedCondition for conditions with matching operands.
This patch improves SimplifyCFG to catch cases like:

  if (a < b) {
    if (a > b) <- known to be false
      unreachable;
  }

Phabricator Revision: http://reviews.llvm.org/D18905

llvm-svn: 266767
2016-04-19 17:19:14 +00:00
Simon Pilgrim 998cffa6b9 [InstCombine][X86] Added extra tests introduced for D17490
llvm-svn: 266732
2016-04-19 12:59:52 +00:00
Simon Pilgrim 74b3bfdf71 [InstCombine][X86] Regenerate SSE combine tests as part of setup for D17490
Regenerated with utils/update_test_checks.py 

llvm-svn: 266731
2016-04-19 12:56:46 +00:00
Tim Northover b629c77692 ARM: use a pseudo-instruction for cmpxchg at -O0.
The fast register-allocator cannot cope with inter-block dependencies without
spilling. This is fine for ldrex/strex loops coming from atomicrmw instructions
where any value produced within a block is dead by the end, but not for
cmpxchg. So we lower a cmpxchg at -O0 via a pseudo-inst that gets expanded
after regalloc.

Fortunately this is at -O0 so we don't have to care about performance. This
simplifies the various axes of expansion considerably: we assume a strong
seq_cst operation and ensure ordering via the always-present DMB instructions
rather than v8 acquire/release instructions.

Should fix the 32-bit part of PR25526.

llvm-svn: 266679
2016-04-18 21:48:55 +00:00
Chad Rosier e30fed70e6 [ValueTracking] Correct lit test comments. NFC.
llvm-svn: 266657
2016-04-18 19:11:45 +00:00
Eric Liu d09f15ea6f Revert "Replace the use of MaxFunctionCount module flag"
This reverts commit r266477.

This commit introduces cyclic dependency. This commit has "Analysis" depend on "ProfileData",
while "ProfileData" depends on "Object", which depends on "BitCode", which
depends on "Analysis".

llvm-svn: 266619
2016-04-18 15:31:11 +00:00
Renato Golin 4b18a510a2 [ARM] AArch32 v8 NEON is still not IEEE-754 compliant
llvm-svn: 266603
2016-04-18 12:06:47 +00:00
Sanjoy Das 99042473d0 Fix a typo in rL265762
I accidentally replaced `mayBeOverridden` with `!isInterposable`.
Remove the negation and add a test case that would've caught this.

Many thanks to Håkan Hjort for spotting this!

llvm-svn: 266551
2016-04-17 04:30:43 +00:00
Mehdi Amini 2d28f7aa07 ThinLTO: Make aliases explicit in the summary
To be able to work accurately on the reference graph when taking
decision about internalizing, promoting, renaming, etc. We need
to have the alias information explicit.

Differential Revision: http://reviews.llvm.org/D18836

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266517
2016-04-16 06:56:44 +00:00
Evgeniy Stepanov 40cd1514cf [cfi] Support explicit sections for functions in cfi-icall.
Allow explicit section for indirectly called functions in cfi-icall.
Jumptables for functions in the same type class must be contiguous, so they
always go to the default text section.

Fixes PR25079.

llvm-svn: 266486
2016-04-15 22:55:38 +00:00
Adrian Prantl dc75a6b517 Convert this sample-based-profiling testcase to use a NoDebug CU.
llvm-svn: 266481
2016-04-15 22:05:38 +00:00
Easwaran Raman f53baca686 Replace the use of MaxFunctionCount module flag
Adds an interface to get ProfileSummary for a module and makes InlineCost use ProfileSummary to get max function count.

Differential Revision: http://reviews.llvm.org/D18622

llvm-svn: 266477
2016-04-15 21:39:58 +00:00
Tim Northover 903f81ba18 ARM: don't try to hoist constant RHS out of a division.
Divisions by a constant can be converted into multiplies which are usually
cheaper, but this isn't possible if the constant gets separated (particularly
in loops). Fix this by telling ConstantHoisting that the immediate in a DIV is
cheap.

I considered making the check generic, but neither AArch64 (strangely) nor x86
showed any benefit on the tests I had.

llvm-svn: 266464
2016-04-15 18:17:18 +00:00
David Majnemer 2e02ba78d5 [InstCombine] Don't transform compares of calls to functions named fabs{f,l,}
InstCombine wants to optimize compares of calls to fabs with zero.
However, we didn't have the necessary legality checking to verify that
the function call had the same behavior as fabs.

llvm-svn: 266452
2016-04-15 17:21:03 +00:00
Adrian Prantl 75819aedf6 [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.

Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.

Motivation
----------

Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.

We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.

Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.

http://reviews.llvm.org/D19034
<rdar://problem/25256815>

llvm-svn: 266446
2016-04-15 15:57:41 +00:00
Sanjay Patel f11ab05bdb [SimplifyCFG] propagate branch metadata when creating select (PR27344)
This is almost identical to:
http://reviews.llvm.org/rL264527

This doesn't solve PR27344; it just allows the profile weights to survive. 
To solve the bug, we need to use the profile weights in the backend.

llvm-svn: 266442
2016-04-15 15:32:12 +00:00
Sanjay Patel 81433e99b9 [SimplifyCFG] add metadata to show failure to propagate (PR27344)
llvm-svn: 266435
2016-04-15 14:53:35 +00:00
Justin Lebar f04e678e36 Move divergent-target test into CodeGen/NVPTX because it requires an NVPTX target.
llvm-svn: 266403
2016-04-15 01:20:52 +00:00
Justin Lebar cad81cf6b3 [Speculation] Add a SpeculativeExecution mode where the pass does nothing unless TTI::hasBranchDivergence() is true.
Summary:
This lets us add this pass to the IR pass manager unconditionally; it
will simply not do anything on targets without branch divergence.

Reviewers: tra

Subscribers: llvm-commits, jingyue, rnk, chandlerc

Differential Revision: http://reviews.llvm.org/D18625

llvm-svn: 266398
2016-04-15 00:32:09 +00:00
Vedant Kumar 4960fbf391 [test] Require 'asserts' for a test which uses -debug-only
Without this line, bots which run check-all on Release compilers will
break.

llvm-svn: 266386
2016-04-14 23:32:40 +00:00
Michael Kuperstein 16f13e252b [AliasSetTracker] Correctly handle changing the size of an entry
If the size of an AST entry changes, we also need to make sure we perform
necessary alias set merges, as the new size may overlap pointers in other sets.
We happen to run into this with memset, because memset allows an entry for a
i8* pointer to have a decidedly non-i8 size.

This fixes PR27262.

Differential Revision: http://reviews.llvm.org/D18939

llvm-svn: 266381
2016-04-14 22:00:11 +00:00
Renato Golin 5cb666add7 [ARM] Adding IEEE-754 SIMD detection to loop vectorizer
Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.

This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).

For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.

This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).

As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

Fixes PR16275.

llvm-svn: 266363
2016-04-14 20:42:18 +00:00
Sanjay Patel e998b91d86 [InstCombine] remove constant by inverting compare + logic (PR27105)
https://llvm.org/bugs/show_bug.cgi?id=27105

We can check if all bits outside of a constant mask are set with a 
single constant.

As noted in the bug report, although this form should be considered the
canonical IR, backends may want to transform this into an 'andn' / 'andc' 
comparison against zero because that could be a single machine instruction.

Differential Revision: http://reviews.llvm.org/D18842

llvm-svn: 266362
2016-04-14 20:17:40 +00:00
Dehao Chen 46f8fbbb1b Update discriminator assignment algorithm to handle nested call correctly.
Summary: Add discriminator for nested call correctly.

Reviewers: davidxl, dnovillo

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D19127

llvm-svn: 266354
2016-04-14 18:37:18 +00:00
Adam Nemet 7aab648831 Revert "Support arbitrary addrspace pointers in masked load/store intrinsics"
This reverts commit r266086.

It breaks the LTO build of gcc in SPEC2000.

llvm-svn: 266282
2016-04-14 08:47:17 +00:00
Tim Northover 5c02f9ad28 ARM: override cost function to re-enable ConstantHoisting (& fix it).
At some point, ARM stopped getting any benefit from ConstantHoisting because
the pass called a different variant of getIntImmCost. Reimplementing the
correct variant revealed some problems, however:

  + ConstantHoisting was modifying switch statements. This is simply invalid,
    the cases must remain integer constants no matter the notional cost.
  + ConstantHoisting was mangling alloca instructions in the entry block. These
    should be handled by FrameLowering, so constants actually have a cost of 0.
    Worse, the resulting bitcasts meant they became dynamic allocas.

rdar://25707382

llvm-svn: 266260
2016-04-13 23:08:27 +00:00
Easwaran Raman cbd3989742 Test case for r265852.
llvm-svn: 266237
2016-04-13 19:43:31 +00:00
Betul Buyukkurt bf8554c279 [PGO] Remove redundant VP instrumentation
LLVM optimization passes may reduce a profiled target expression
to a constant. Removing runtime calls at such instrumentation points
would help speedup the runtime of the instrumented program.

llvm-svn: 266229
2016-04-13 18:52:19 +00:00
Mehdi Amini b5b289339b Revert "Make aliases explicit in the summary"
Inadvertently commited...

This reverts commit e618ec93786d99df2ddf280ad2d5e02f5516cecf.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266215
2016-04-13 17:20:07 +00:00
Mehdi Amini ce744a95fd Make aliases explicit in the summary
Summary:
To be able to work accurately on the reference graph when taking decision
about internalizing, promoting, renaming, etc. We need to have the alias
information explicit.

Reviewers: tejohnson

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D18836

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266214
2016-04-13 17:18:42 +00:00
David L Kreitzer 752c1448fe Simplify strlen to a subtraction for certain cases.
Patch by Li Huang (li1.huang@intel.com)

Differential Revision: http://reviews.llvm.org/D18230

llvm-svn: 266200
2016-04-13 14:31:06 +00:00
Petar Jovanovic 644b8c1a5d Calculate __builtin_object_size when pointer depends on a condition
This patch fixes calculating of builtin_object_size if it depends on a
condition. Before this patch compiler did not know how to calculate the
object size when it finds a condition that cannot be eliminated.
This patch enables calculating of builtin_object_size even in case when
condition cannot be eliminated by choosing minimum or maximum value as a
result from condition. Choosing minimum or maximum value from condition
is based on the second argument of __builtin_object_size function.

Patch by Strahinja Petrovic.

Differential Revision: http://reviews.llvm.org/D18438

llvm-svn: 266193
2016-04-13 12:25:25 +00:00
David Majnemer 3ee5f34469 [InstCombine] We folded an fcmp to an i1 instead of a vector of i1
Remove an ad-hoc transform in InstCombine and replace it with more
general machinery (ValueTracking, InstructionSimplify and VectorUtils).

This fixes PR27332.

llvm-svn: 266175
2016-04-13 06:55:52 +00:00
Matt Arsenault b34eea9cb5 AMDGPU: Remove leftover ShaderType attributes in tests
llvm-svn: 266155
2016-04-13 00:39:48 +00:00
Sanjay Patel 5e5056d939 [x86, InstCombine] fix masked load pass-through operand to be a zero vector
This bug was introduced with:
http://reviews.llvm.org/rL262269

AVX masked loads are specified to set vector lanes to zero when the high bit of the mask
element for that lane is zero:
"If the mask is 0, the corresponding data element is set to zero in the load form of these
instructions, and unmodified in the store form." --Intel manual

Differential Revision: http://reviews.llvm.org/D19017

llvm-svn: 266148
2016-04-12 23:16:23 +00:00
Mehdi Amini d5faa267c4 Add a pass to name anonymous/nameless function
Summary:
For correct handling of alias to nameless
function, we need to be able to refer them through a GUID in the summary.
Here we name them using a hash of the non-private global names in the module.

Reviewers: tejohnson

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D18883

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266132
2016-04-12 21:35:28 +00:00
Mehdi Amini 68da426eea Move summary creation out of llvm-as into opt
Summary:
Let keep llvm-as "dumb": it converts textual IR to bitcode. This
commit removes the dependency from llvm-as to libLLVMAnalysis.
We'll add back summary in llvm-as if we get to a textual
representation for it at some point. In the meantime, opt seems
like a better place for that.

Reviewers: tejohnson

Subscribers: joker.eph, llvm-commits

Differential Revision: http://reviews.llvm.org/D19032

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266131
2016-04-12 21:35:18 +00:00
James Y Knight 19f6cce4e3 Add __atomic_* lowering to AtomicExpandPass.
(Recommit of r266002, with r266011, r266016, and not accidentally
including an extra unused/uninitialized element in LibcallRoutineNames)

AtomicExpandPass can now lower atomic load, atomic store, atomicrmw, and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.

This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.

It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

Differential Revision: http://reviews.llvm.org/D18200

llvm-svn: 266115
2016-04-12 20:18:48 +00:00
Artur Pilipenko dbe0bc8df4 Support arbitrary addrspace pointers in masked load/store intrinsics
This is a resubmittion of 263158 change.

This patch fixes the problem which occurs when loop-vectorize tries to use @llvm.masked.load/store intrinsic for a non-default addrspace pointer. It fails with "Calling a function with a bad signature!" assertion in CallInst constructor because it tries to pass a non-default addrspace pointer to the pointer argument which has default addrspace.

The fix is to add pointer type as another overloaded type to @llvm.masked.load/store intrinsics.

Reviewed By: reames

Differential Revision: http://reviews.llvm.org/D17270

llvm-svn: 266086
2016-04-12 15:58:04 +00:00
Rafael Espindola d41b54be11 This reverts commit r266002, r266011 and r266016.
They broke the msan bot.

Original message:

Add __atomic_* lowering to AtomicExpandPass.

AtomicExpandPass can now lower atomic load, atomic store, atomicrmw,and
cmpxchg instructions to __atomic_* library calls, when the target
doesn't support atomics of a given size.

This is the first step towards moving all atomic lowering from clang
into llvm. When all is done, the behavior of __sync_* builtins,
__atomic_* builtins, and C11 atomics will be unified.

Previously LLVM would pass everything through to the ISelLowering
code. There, unsupported atomic instructions would turn into __sync_*
library calls. Because of that behavior, Clang currently avoids emitting
llvm IR atomic instructions when this would happen, and emits __atomic_*
library functions itself, in the frontend.

This change makes LLVM able to emit __atomic_* libcalls, and thus will
eventually allow clang to depend on LLVM to do the right thing.

It is advantageous to do the new lowering to atomic libcalls in
AtomicExpandPass, before ISel time, because it's important that all
atomic operations for a given size either lower to __atomic_*
libcalls (which may use locks), or native instructions which won't. No
mixing and matching.

At the moment, this code is enabled only for SPARC, as a
demonstration. The next commit will expand support to all of the other
targets.

Differential Revision: http://reviews.llvm.org/D18200

llvm-svn: 266062
2016-04-12 12:30:25 +00:00
George Burgess IV 278199f615 Add the allocsize attribute to LLVM.
`allocsize` is a function attribute that allows users to request that
LLVM treat arbitrary functions as allocation functions.

This patch makes LLVM accept the `allocsize` attribute, and makes
`@llvm.objectsize` recognize said attribute.

The review for this was split into two patches for ease of reviewing:
D18974 and D14933. As promised on the revisions, I'm landing both
patches as a single commit.

Differential Revision: http://reviews.llvm.org/D14933

llvm-svn: 266032
2016-04-12 01:05:35 +00:00
JF Bastien 4f43cfd2c2 MergeFunctions: test alloca better
r237193 fix handling of alloca size / align in MergeFunctions, but only tested one and didn't follow FunctionComparator::cmpOperations's usual comparison pattern. It also didn't update Instruction.cpp:haveSameSpecialState which I'll do separately.

llvm-svn: 266022
2016-04-12 00:03:26 +00:00
Mehdi Amini ae280e54a9 ThinLTO renaming: use module hash instead of position in the summary
This is more robust to changes in the link ordering.

Differential Revision: http://reviews.llvm.org/D18946

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 266018
2016-04-11 23:26:46 +00:00
Evgeniy Stepanov f17120a85f [safestack] Add canary to unsafe stack frames
Add StackProtector to SafeStack. This adds limited protection against
data corruption in the caller frame. Current implementation treats
all stack protector levels as -fstack-protector-all.

llvm-svn: 266004
2016-04-11 22:27:48 +00:00