Commit Graph

14468 Commits

Author SHA1 Message Date
Sanjay Patel 2440130437 fix inaccurate comment; NFC
llvm-svn: 261484
2016-02-21 17:33:31 +00:00
Sanjay Patel 368ac5dbf7 [InstCombine] add getNegativeIsTrueBoolVec() helper function; NFC
Originally part of:
http://reviews.llvm.org/D17485

We need this when simplifying masked memory ops too.

llvm-svn: 261483
2016-02-21 17:29:33 +00:00
Sanjoy Das 979a11d5b2 [LoopDeletion] Add an assert that verifies LCSSA
This is inspired by PR24804 -- had this assert been there before,
isolating the root cause for PR24804 would have been far easier.

llvm-svn: 261481
2016-02-21 17:11:59 +00:00
Simon Pilgrim 471efd244a [InstCombine] SSE/SSE2 (u)comiss/(u)comisd comparison intrinsics only use the lowest vector element
llvm-svn: 261460
2016-02-20 23:17:35 +00:00
Benjamin Kramer 7d537ae747 [SimplifyCFG] Use pointer identity to simplify predicate.
No functional change intended.

llvm-svn: 261427
2016-02-20 10:40:42 +00:00
David Majnemer 1efa23ddab [SimplifyCFG] Merge together cleanuppads
Cleanuppads may be merged together if one is the only predecessor of the
other in which case a simple transform can be performed: replace the
a cleanupret with a branch and remove an unnecessary cleanuppad.

Differential Revision: http://reviews.llvm.org/D17459

llvm-svn: 261390
2016-02-20 01:07:45 +00:00
Hans Wennborg a0f7090563 Revert r255691 "[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions."
It caused PR26509.

llvm-svn: 261368
2016-02-19 21:40:12 +00:00
Matthew Simpson 29c997c1a1 [LV] Vectorize first-order recurrences
This patch enables the vectorization of first-order recurrences. A first-order
recurrence is a non-reduction recurrence relation in which the value of the
recurrence in the current loop iteration equals a value defined in the previous
iteration. The load PRE of the GVN pass often creates these recurrences by
hoisting loads from within loops.

In this patch, we add a new recurrence kind for first-order phi nodes and
attempt to vectorize them if possible. Vectorization is performed by shuffling
the values for the current and previous iterations. The vectorization cost
estimate is updated to account for the added shuffle instruction.

Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16197

llvm-svn: 261346
2016-02-19 17:56:08 +00:00
Silviu Baranga ad1dafb2c3 [LV] Fix PR26600: avoid out of bounds loads for interleaved access vectorization
Summary:
If we don't have the first and last access of an interleaved load group,
the first and last wide load in the loop can do an out of bounds
access. Even though we discard results from speculative loads,
this can cause problems, since it can technically generate page faults
(or worse).

We now discard interleaved load groups that don't have the first and
load in the group.

Reviewers: hfinkel, rengolin

Subscribers: rengolin, llvm-commits, mzolotukhin, anemet

Differential Revision: http://reviews.llvm.org/D17332

llvm-svn: 261331
2016-02-19 15:46:10 +00:00
Chandler Carruth 31088a9d58 [LPM] Factor all of the loop analysis usage updates into a common helper
routine.

We were getting this wrong in small ways and generally being very
inconsistent about it across loop passes. Instead, let's have a common
place where we do this. One minor downside is that this will require
some analyses like SCEV in more places than they are strictly needed.
However, this seems benign as these analyses are complete no-ops, and
without this consistency we can in many cases end up with the legacy
pass manager scheduling deciding to split up a loop pass pipeline in
order to run the function analysis half-way through. It is very, very
annoying to fix these without just being very pedantic across the board.

The only loop passes I've not updated here are ones that use
AU.setPreservesAll() such as IVUsers (an analysis) and the pass printer.
They seemed less relevant.

With this patch, almost all of the problems in PR24804 around loop pass
pipelines are fixed. The one remaining issue is that we run simplify-cfg
and instcombine in the middle of the loop pass pipeline. We've recently
added some loop variants of these passes that would seem substantially
cleaner to use, but this at least gets us much closer to the previous
state. Notably, the seven loop pass managers is down to three.

I've not updated the loop passes using LoopAccessAnalysis because that
analysis hasn't been fully wired into LoopSimplify/LCSSA, and it isn't
clear that those transforms want to support those forms anyways. They
all run late anyways, so this is harmless. Similarly, LSR is left alone
because it already carefully manages its forms and doesn't need to get
fused into a single loop pass manager with a bunch of other loop passes.

LoopReroll didn't use loop simplified form previously, and I've updated
the test case to match the trivially different output.

Finally, I've also factored all the pass initialization for the passes
that use this technique as well, so that should be done regularly and
reliably.

Thanks to James for the help reviewing and thinking about this stuff,
and Ben for help thinking about it as well!

Differential Revision: http://reviews.llvm.org/D17435

llvm-svn: 261316
2016-02-19 10:45:18 +00:00
Chandler Carruth ac07270828 [AA] Preserve the AA results wrapper pass as well as BasicAA in a few
more places to prevent gratuitous re-"runs" of these passes.

The passes themselves don't do any work when run, but we keep spending
time scheduling and running these needlessly when we really don't need
to do so.

This is the first patch towards fixing the really horrible loop pass
pipeline fragmentation pointed out by Sanjoy in PR24804.

llvm-svn: 261302
2016-02-19 03:12:14 +00:00
Lawrence Hu 84e6f1dd70 Bug fix: use dyn_cast_or_null instead of dyn_cast
Differential Revision: http://reviews.llvm.org/D17154

llvm-svn: 261299
2016-02-19 02:17:07 +00:00
Richard Trieu 7a08381403 Remove uses of builtin comma operator.
Cleanup for upcoming Clang warning -Wcomma.  No functionality change intended.

llvm-svn: 261270
2016-02-18 22:09:30 +00:00
Adam Nemet 9d9cb274ea [PPCLoopDataPrefetch] Move pass to Transforms/Scalar/LoopDataPrefetch. NFC
This patch is part of the work to make PPCLoopDataPrefetch
target-independent
(http://thread.gmane.org/gmane.comp.compilers.llvm.devel/92758).

Obviously the pass still only used from PPC at this point.  Subsequent
patches will start driving this from ARM64 as well.

Due to the previous patch most lines should show up as moved lines.

llvm-svn: 261265
2016-02-18 21:38:19 +00:00
Matthew Simpson 92821cb4a8 Reapply commit r259357 with a fix for PR26629
Commit r259357 was reverted because it caused PR26629. We were assuming all
roots of a vectorizable tree could be truncated to the same width, which is not
the case in general. This commit reapplies the patch along with a fix and a new
test case to ensure we don't regress because of this issue again. This should
fix PR26629.

llvm-svn: 261212
2016-02-18 14:14:40 +00:00
Chandler Carruth 9c4ed175c2 [PM] Port the PostOrderFunctionAttrs pass to the new pass manager and
convert one test to use this.

This is a particularly significant milestone because it required
a working per-function AA framework which can be queried over each
function from within a CGSCC transform pass (and additionally a module
analysis to be accessible). This is essentially *the* point of the
entire pass manager rewrite. A CGSCC transform is able to query for
multiple different function's analysis results. It works. The whole
thing appears to actually work and accomplish the original goal. While
we were able to hack function attrs and basic-aa to "work" in the old
pass manager, this port doesn't use any of that, it directly leverages
the new fundamental functionality.

For this to work, the CGSCC framework also has to support SCC-based
behavior analysis, etc. The only part of the CGSCC pass infrastructure
not sorted out at this point are the updates in the face of inlining and
running function passes that mutate the call graph.

The changes are pretty boring and boiler-plate. Most of the work was
factored into more focused preperatory patches. But this is what wires
it all together.

llvm-svn: 261203
2016-02-18 11:03:11 +00:00
Junmo Park 80440eb804 Minor code cleanup. NFC.
llvm-svn: 261200
2016-02-18 10:09:20 +00:00
Kostya Serebryany d4590c7304 [sanitizer-coverage] implement -fsanitize-coverage=trace-pc. This is similar to trace-bb, but has a different API. We already use the equivalent flag in GCC for Linux kernel fuzzing. We may be able to use this flag with AFL too
llvm-svn: 261159
2016-02-17 21:34:43 +00:00
Amaury Sechet da71cb7b92 NFC: Fix formating
llvm-svn: 261156
2016-02-17 21:21:29 +00:00
Haicheng Wu 57e1a3e6ee [LIR] Avoid turning non-temporal stores into memset
This is to fix PR26645.

llvm-svn: 261149
2016-02-17 21:00:06 +00:00
Adrian Prantl a5b2a64980 Debug Info: Teach LdStHasDebugValue() (Local.cpp) about DIExpressions.
This function is used to check whether a dbg.value intrinsic has already
been inserted, but without comparing the DIExpression, it would erroneously
fire on split aggregates and only the first scalar would survive.

Found via http://reviews.llvm.org/D16867.
<rdar://problem/24456528>

llvm-svn: 261145
2016-02-17 20:02:25 +00:00
Elena Demikhovsky 88e76cad16 Create masked gather and scatter intrinsics in Loop Vectorizer.
Loop vectorizer now knows to vectorize GEP and create masked gather and scatter intrinsics for random memory access.

The feature is enabled on AVX-512 target.
Differential Revision: http://reviews.llvm.org/D15690

llvm-svn: 261140
2016-02-17 19:23:04 +00:00
Amaury Sechet 61a7d629ec Fix load alignement when unpacking aggregates structs
Summary: Store and loads unpacked by instcombine do not always have the right alignement. This explicitely compute the alignement and set it.

Reviewers: dblaikie, majnemer, reames, hfinkel, joker.eph

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D17326

llvm-svn: 261139
2016-02-17 19:21:28 +00:00
David Majnemer f48bcb2bd9 Revert "Reapply commit r258404 with fix."
This reverts commit r259357, it caused PR26629.

llvm-svn: 261137
2016-02-17 19:02:36 +00:00
Frederic Riss 009d60650d [ObjCARC] Handle ARCInstKind::ClaimRV in OptimizeIndividualCalls.
When support for objc_unsafeClaimAutoreleasedReturnValue has been added to the
ARC optimizer in r258970, one case was missed which would lead the optimizer
to execute an llvm_unreachable. In this case, just handle ClaimRV in the same
way we handle RetainRV.

llvm-svn: 261134
2016-02-17 18:51:27 +00:00
Mehdi Amini 1db10ac6ce Define the ThinLTO Pipeline (experimental)
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel (even if
it is not totally "free": projects like clang are reusing product
from the "compile phase" for multiple link, think about
libLLVMSupport reused for opt, llc, etc.).

This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward on this proposal because the LTO link was far too
long after that. We believe that we can afford it with ThinLTO.

The ThinLTO pipeline integrates in the regular O2/O3 flow:

 - The compile phase perform the inliner with a somehow lighter
   function simplification. (TODO: tune the inliner thresholds here)
   This is intendend to simplify the IR and get rid of obvious things
   like linkonce_odr that will be inlined.
 - The link phase will run the pipeline from the start, extended with
   some specific passes that leverage the augmented knowledge we have
   during LTO. Especially after the inliner is done, a sequence of
   globalDCE/globalOpt is performed, followed by another run of the
   "function simplification" passes. It is not clear if this part
   of the pipeline will stay as is, as the split model of ThinLTO
   does not allow the same benefit as FullLTO without added tricks.

The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO and it will evolve, but this should provide a good starting
point.

Reviewers: tejohnson

Differential Revision: http://reviews.llvm.org/D17115

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261029
2016-02-16 23:02:29 +00:00
Mehdi Amini ec8bee1879 Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()" (NFC)
It is intended to contains the passes run over a function after the
inliner is done with a function and before it moves to its callers.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 261028
2016-02-16 22:54:27 +00:00
Junmo Park 6ebdc14cf1 [SCEVExpander] Make findExistingExpansion smarter
Summary:
Extending findExistingExpansion can use existing value in ExprValueMap.
This patch gives 0.3~0.5% performance improvements on 
benchmarks(test-suite, spec2000, spec2006, commercial benchmark)
   
Reviewers: mzolotukhin, sanjoy, zzheng

Differential Revision: http://reviews.llvm.org/D15559

llvm-svn: 260938
2016-02-16 06:46:58 +00:00
Silviu Baranga ec7063ac77 [LV] Add support for insertelt/extractelt processing during type truncation
Summary:
While shrinking types according to the required bits, we can
encounter insert/extract element instructions. This will cause us to
reach an llvm_unreachable statement.

This change adds support for truncating insert/extract element
operations, and adds a regression test.

Reviewers: jmolloy

Subscribers: mzolotukhin, llvm-commits

Differential Revision: http://reviews.llvm.org/D17078

llvm-svn: 260893
2016-02-15 15:38:17 +00:00
Roman Gareev 036c08874a Tweak the LICM code to reuse the first sub-loop instead of creating a new one
LICM starts with an *empty* AST, and then merges in each sub-loop. While the
add code is appropriate for sub-loop 2 and up, it's utterly unnecessary for
sub-loop 1. If the AST starts off empty, we can just clone/move the contents
of the subloop into the containing AST.

Reviewed-by: Philip Reames <listmail@philipreames.com>

Differential Revision: http://reviews.llvm.org/D16753

llvm-svn: 260892
2016-02-15 14:48:50 +00:00
Benjamin Kramer 8a752e316d Use ArrayRef to hide SmallVector details, kill a useless vector copy along the way.
llvm-svn: 260824
2016-02-13 16:01:12 +00:00
Chandler Carruth 632d208c78 [attrs] Move the norecurse deduction to operate on the node set rather
than the SCC object, and have it scan the instruction stream directly
rather than relying on call records.

This makes the behavior of this routine consistent between libc routines
and LLVM intrinsics for libc routines. We can go and start teaching it
about those being norecurse, but we should behave the same for the
intrinsic and the libc routine rather than differently. I chatted with
James Molloy and the inconsistency doesn't seem intentional and likely
is due to intrinsic calls not being modelled in the call graph analyses.

This also fixes a bug where we would deduce norecurse on optnone
functions, when generally we try to handle optnone functions as-if they
were replaceable and thus unanalyzable.

llvm-svn: 260813
2016-02-13 08:47:51 +00:00
Keno Fischer 7c7c3e3591 [Cloning] Clone every Function's Debug Info
Summary:
Export the CloneDebugInfoMetadata utility, which clones all debug info
associated with a function into the first module. Also use this function
in CloneModule on each function we clone (the CloneFunction entrypoint
already does this).

Without this, cloning a module will lead to DI quality regressions,
especially since r252219 reversed the Function <-> DISubprogram edge
(before we could get lucky and have this edge preserved if the
DISubprogram itself was, e.g. due to location metadata).

This was verified to fix missing debug information in julia and
a unittest to verify the new behavior is included.

Patch by Yichao Yu! Thanks!

Reviewers: loladiro, pcc
Differential Revision: http://reviews.llvm.org/D17165

llvm-svn: 260791
2016-02-13 02:04:29 +00:00
Chad Rosier 81362a8599 [LIR] Allow merging of memsets in negatively strided loops.
Last part of PR25166.

llvm-svn: 260732
2016-02-12 21:03:23 +00:00
Justin Lebar 6086c6a387 Fix typo in comment.
llvm-svn: 260731
2016-02-12 21:01:37 +00:00
Justin Lebar db63949e8d [SimplifyCFG] Don't fold conditional branches that contain calls to convergent functions.
Summary:
Performing this optimization duplicates the call to the convergent
function and adds new control-flow dependencies, which is a no-no.

Reviewers: jingyue

Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D17128

llvm-svn: 260730
2016-02-12 21:01:36 +00:00
Justin Lebar df04d2a1f1 [LoopRotate] Don't perform loop rotation if the loop header calls a convergent function.
Summary:
Calls to convergent functions can be duplicated, but only if the
duplicates are not control-flow dependent on any additional values.
Loop rotation doesn't meet the bar.

Reviewers: jingyue

Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune

Differential Revision: http://reviews.llvm.org/D17127

llvm-svn: 260729
2016-02-12 21:01:33 +00:00
David Majnemer 01674939b2 Remove unused variable
llvm-svn: 260722
2016-02-12 20:33:51 +00:00
Philip Reames 96fccc2d09 [GVN] Common code for local and non-local load availability [NFCI]
The attached patch removes all of the block local code for performing X-load forwarding by reusing the code used in the non-local case.

The motivation here is to remove duplication and in the process increase our test coverage of some fairly tricky code. I have some upcoming changes I'll be proposing in this area and wanted to have the code cleaned up a bit first.

Note: The review for this mostly happened in email which didn't make it to phabricator on the 258882 commit thread.

Differential Revision: http://reviews.llvm.org/D16608

llvm-svn: 260711
2016-02-12 19:24:57 +00:00
Chad Rosier 4acff96646 [LIR] Partially revert r252926(NFC), which introduced a very subtle change.
In short, before r252926 we were comparing an unsigned (StoreSize) against an a
APInt (Stride), which is fine and well.  After we were zero extending the Stride
and then converting to an unsigned, which is not the same thing.  Obviously,
Stides can also be negative.  This commit just restores the original behavior.

AFAICT, it's not possible to write a test case to expose the issue because
the code already has checks to make sure the StoreSize can't overflow an
unsigned (which prevents the Stride from overflowing an unsigned as well).

llvm-svn: 260706
2016-02-12 19:05:27 +00:00
David Majnemer 0f0abc7bc2 [InstCombine] Don't aggressively replace xor with icmp
For some cases, InstCombine replaces the sequence of xor/sub instruction
followed by cmp instruction into a single cmp instruction.

However, this replacement may result suboptimal result especially when
the xor/sub has more than one use, as discussed in
bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465).

This patch make the replacement happen only when xor/sub has only one
use.

Differential Revision: http://reviews.llvm.org/D16915

Patch by Taewook Oh!

llvm-svn: 260695
2016-02-12 18:12:38 +00:00
Chandler Carruth 3937bc70f9 [attrs] Simplify the convergent removal to directly use the pre-built
node set rather than walking the SCC directly.

This directly exposes the functions and has already had null entries
filtered out. We also don't need need to handle optnone as it has
already been handled in the caller -- we never try to remove convergent
when there are optnone functions in the SCC.

With this change, the code for removing convergent should work with the
new pass manager and a different SCC analysis.

llvm-svn: 260668
2016-02-12 09:47:49 +00:00
Chandler Carruth 057df3d423 [attrs] Consolidate the test for a non-SCC, non-convergent function call
with the test for a non-convergent intrinsic call.

While it is possible to use the call records to search for function
calls, we're going to do an instruction scan anyways to find the
intrinsics, we can handle both cases while scanning instructions. This
will also make the logic more amenable to the new pass manager which
doesn't use the same call graph structure.

My next patch will remove use of CallGraphNode entirely and allow this
code to work with both the old and new pass manager. Fortunately, it
should also get strictly simpler without changing functionality.

llvm-svn: 260666
2016-02-12 09:23:53 +00:00
Chandler Carruth bbbbec0b54 [attrs] Run clang-format over a newly added routine in function-attrs
before I update it to be friendly with the new pass manager.

llvm-svn: 260653
2016-02-12 03:07:50 +00:00
Evgeniy Stepanov ba6ca87ffb [msan] Put msan constructor in a comdat.
MSan adds a constructor to each translation unit that calls
__msan_init, and does nothing else. The idea is to run __msan_init
before any instrumented code. This results in multiple constructors
and multiple .init_array entries in the final binary, one per
translation unit. This is absolutely unnecessary; one would be
enough.

This change moves the constructors to a comdat group in order to drop
the extra ones.

llvm-svn: 260632
2016-02-12 00:37:52 +00:00
Matthew Simpson a4e43c5b51 [SLP] Add debug output for extract cost (NFC)
llvm-svn: 260614
2016-02-11 23:06:40 +00:00
Quentin Colombet 490cfbe2a2 Re-apply r238452, the bug was in clang and was fixed in r260567.
Original commit message:
[InstCombine] Fold IntToPtr and PtrToInt into preceding loads.

Currently we only fold a BitCast into a Load when the BitCast is its
only user.

Do the same for any no-op cast.

Patch by Philip Pfaffe!

Differential Revision: http://reviews.llvm.org/D9152

llvm-svn: 260612
2016-02-11 22:30:41 +00:00
Mehdi Amini 9c1c3ac627 Revert "Refactor the PassManagerBuilder: extract a "addFunctionSimplificationPasses()""
This reverts commit r260603.
I didn't intend to push it :(

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260607
2016-02-11 22:09:11 +00:00
Mehdi Amini c5bf5ecc1b Revert "Define the ThinLTO Pipeline"
This reverts commit r260604.
I didn't intend to push this now.

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260606
2016-02-11 22:09:07 +00:00
Mehdi Amini 484470d605 Define the ThinLTO Pipeline
Summary:
On the contrary to Full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker: both phases are parallel.
This pipeline is based on the proposal in D13443 for full LTO. We ]
didn't move forward on this proposal because the link was far too long
after that.

This patch refactor the "function simplification" passes that are part
of the inliner loop in a helper function (this part is NFC and can be
commited separately to simplify the diff). The ThinLTO pipeline
integrates in the regular O2/O3 flow:

 - The compile phase perform the inliner with a somehow lighter
   function simplification. (TODO: tune the inliner thresholds here)
   This is intendend to simplify the IR and get rid of obvious things
   like linkonce_odr that will be inlined.
 - The link phase will run the pipeline from the start, extended with
   some specific passes that leverage the augmented knowledge we have
   during LTO. Especially after the inliner is done, a sequence of
   globalDCE/globalOpt is performed, followed by another run of the
   "function simplification" passes.

The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO but this should provide a good starting point.

Reviewers: tejohnson

Subscribers: joker.eph, llvm-commits, dexonsmith

Differential Revision: http://reviews.llvm.org/D17115

From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260604
2016-02-11 22:00:31 +00:00