21060 Commits

Author SHA1 Message Date
Nikita Popov 649e125451 [BDCE][DemandedBits] Detect dead uses of undead instructions
This (mostly) fixes https://bugs.llvm.org/show_bug.cgi?id=39771.

BDCE currently detects instructions that don't have any demanded bits
and replaces their uses with zero. However, if an instruction has
multiple uses, then some of the uses may be dead (have no demanded bits)
even though the instruction itself is still live. This patch extends
DemandedBits/BDCE to detect such uses and replace them with zero.
While this will not immediately render any instructions dead, it may
lead to simplifications (in the motivating case, by converting a rotate
into a simple shift), break dependencies, etc.
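A minimal C++ sketch of how a dead use of a live value turns a rotate into a shift. It assumes a 32-bit rotate whose result is only consumed through a 0xFFFFFF00 mask; this expression and the helper names are illustrative, not taken from PR39771 or from the DemandedBits code. Here `x` stands in for an instruction with two uses (the shl and the lshr): the lshr's operand has no demanded bits, so that one use can be replaced with zero while `x` itself stays live.

```cpp
#include <cstdint>

// Demanded bits of the operand of a 32-bit "lshr X, Amt", given the bits
// demanded from its result: result bit i comes from operand bit i + Amt.
// Assumes amt < 32; bits shifted past bit 31 are simply dropped.
uint32_t demandedOperandOfLshr(uint32_t demandedResult, unsigned amt) {
  return demandedResult << amt;
}

// Original form: rotate-left by 8, then only the high 24 bits are used.
uint32_t rotateThenMask(uint32_t x) {
  return ((x << 8) | (x >> 24)) & 0xFFFFFF00u;
}

// After replacing the dead use of x in the lshr with zero, (0 >> 24) == 0
// and the rotate degenerates into a plain shift.
uint32_t shiftThenMask(uint32_t x) {
  return (x << 8) & 0xFFFFFF00u;
}
```

With 0xFFFFFF00 demanded from the lshr result, `demandedOperandOfLshr(0xFFFFFF00u, 24)` is 0, which is exactly the "use with no demanded bits" case.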

The implementation tries to strike a balance between analysis power and
complexity/memory usage. Originally I wanted to track demanded bits on
a per-use level, but ultimately we're only really interested in whether
a use is entirely dead or not. I'm using an extra set to track which uses
are dead. However, as initially all uses are dead, I'm not storing uses
whose user is also dead. This case is checked separately instead.

The test case has a couple of cases that are not simplified yet. In
particular, we're only looking at uses of instructions right now. I think
it would make sense to also extend this to arguments. Furthermore,
DemandedBits doesn't yet know some of the tricks that InstCombine does
for the demanded bits of bitwise or/and/xor in combination with known
bits information.

Differential Revision: https://reviews.llvm.org/D55563

llvm-svn: 349674
2018-12-19 19:56:21 +00:00
Anton Afanasyev ce28791e20 Test commit
Fix typos.

llvm-svn: 349644
2018-12-19 17:18:40 +00:00
Vitaly Buka 4e4920694c [asan] Restore ODR-violation detection on vtables
Summary:
unnamed_addr is still useful for detecting ODR violations on vtables.

Still, unnamed_addr with lld and --icf=safe or --icf=all can trigger false
reports, which can be avoided with --icf=none or by using private aliases
with -fsanitize-address-use-odr-indicator.

Reviewers: eugenis

Reviewed By: eugenis

Subscribers: kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55799

llvm-svn: 349555
2018-12-18 22:23:30 +00:00
Kuba Mracek 3760fc9f3d [asan] In llvm.asan.globals, allow entries to be non-GlobalVariable and skip over them
Looks like there are valid reasons why we need to allow bitcasts in llvm.asan.globals, see discussion at https://github.com/apple/swift-llvm/pull/133. Let's look through bitcasts when iterating over entries in the llvm.asan.globals list.

Differential Revision: https://reviews.llvm.org/D55794

llvm-svn: 349544
2018-12-18 21:20:17 +00:00
Pete Cooper be4f571107 Change the objc ARC optimizer to use the new objc.* intrinsics
We're moving ARC optimisation and ARC emission in clang away from runtime methods
and towards intrinsics.  This is the part which actually uses the intrinsics in the ARC
optimizer when both analyzing the existing calls and emitting new ones.

Differential Revision: https://reviews.llvm.org/D55348

Reviewers: ahatanak
llvm-svn: 349534
2018-12-18 20:32:49 +00:00
Nikita Popov 20853a7807 [InstCombine] Simplify cttz/ctlz + icmp eq/ne into mask check
Checking whether a number has a certain number of trailing / leading
zeros means checking whether it is of the form XXXX1000 / 0001XXXX,
which can be done with an and+icmp.
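The mask check can be sketched in C++ for 32-bit values. The helper names are hypothetical, and the sketch only covers counts k < 32 (a count of 32 corresponds to the separate check x == 0): cttz(x) == k holds exactly when the low k+1 bits are a one followed by k zeros, and ctlz(x) == k when the high k+1 bits are k zeros followed by a one.

```cpp
#include <cstdint>

// Reference implementations (return 32 for x == 0).
unsigned cttz32(uint32_t x) {
  unsigned n = 0;
  while (n < 32 && ((x >> n) & 1u) == 0) ++n;
  return n;
}
unsigned ctlz32(uint32_t x) {
  unsigned n = 0;
  while (n < 32 && ((x >> (31 - n)) & 1u) == 0) ++n;
  return n;
}

// cttz(x) == k  <=>  (x & low_(k+1)_bits) == (1 << k), i.e. XXXX1000.
bool cttzEq(uint32_t x, unsigned k) {  // assumes k < 32
  uint32_t mask = (k == 31) ? 0xFFFFFFFFu : ((1u << (k + 1)) - 1u);
  return (x & mask) == (1u << k);
}

// ctlz(x) == k  <=>  (x & high_(k+1)_bits) == (1 << (31 - k)), i.e. 0001XXXX.
bool ctlzEq(uint32_t x, unsigned k) {  // assumes k < 32
  uint32_t bit = 1u << (31 - k);
  uint32_t mask = ~(bit - 1u);  // bits 31 down to 31 - k
  return (x & mask) == bit;
}
```

Each predicate is a single and+icmp against constants, with no bit-counting left at run time.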

Related to https://bugs.llvm.org/show_bug.cgi?id=28668. As a next
step, this can be extended to non-equality predicates.

Differential Revision: https://reviews.llvm.org/D55745

llvm-svn: 349530
2018-12-18 19:59:50 +00:00
Florian Hahn 5c014037b3 [SCCP] Get rid of redundant call for getPredicateInfoFor (NFC).
We can use the result fetched a few lines above.

llvm-svn: 349527
2018-12-18 19:37:07 +00:00
Sanjay Patel e51d5bdb3c [InstCombine] refactor isCheapToScalarize(); NFC
As the FIXME indicates, this has the potential to go
overboard. So I'm not sure if it's even worth keeping 
this vs. iteratively doing simple matches, but we might 
as well clean it up.

llvm-svn: 349523
2018-12-18 19:07:38 +00:00
Michael Kruse d4eb13c880 [LoopVectorize] Rename pass options. NFC.
Rename:
NoUnrolling to InterleaveOnlyWhenForced
and
AlwaysVectorize to !VectorizeOnlyWhenForced

Contrary to what the name 'AlwaysVectorize' suggests, it does not
unconditionally vectorize all loops, but applies a cost model to all
loops to determine whether vectorization is profitable. Hence,
passing false will disable the cost model, except when a loop is marked
with llvm.loop.vectorize.enable. The 'OnlyWhenForced' suffix (suggested
by @hfinkel in D55716) better matches this behavior.

Similarly, 'NoUnrolling' disables the profitability cost model for
interleaving (a term to distinguish it from unrolling by the
LoopUnrollPass); rename it for consistency.
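The "OnlyWhenForced" semantics can be summarized as a small decision function. This is a hypothetical model: the function name and parameters are illustrative and not the LoopVectorize interface, and it ignores legality checks, which still apply to forced loops.

```cpp
// Hypothetical model of the VectorizeOnlyWhenForced / InterleaveOnlyWhenForced
// flags: forced loops are always attempted, other loops only go through the
// profitability cost model when the flag is false.
bool shouldApplyTransform(bool onlyWhenForced, bool forcedByMetadata,
                          bool costModelSaysProfitable) {
  if (forcedByMetadata)  // e.g. llvm.loop.vectorize.enable on the loop
    return true;
  if (onlyWhenForced)    // heuristics disabled: unforced loops are skipped
    return false;
  return costModelSaysProfitable;
}
```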

Differential Revision: https://reviews.llvm.org/D55785

llvm-svn: 349513
2018-12-18 17:46:09 +00:00
Michael Kruse 3284775b70 [LoopUnroll] Honor '#pragma unroll' even with -fno-unroll-loops.
When using clang with `-fno-unroll-loops` (implicitly added with `-O1`),
the LoopUnrollPass is not added to the (legacy) pass pipeline. This
also means that it will not process any loop metadata such as
llvm.loop.unroll.enable (which is generated by #pragma unroll), so
WarnMissedTransformationsPass emits a warning that a forced
transformation has not been applied (see
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610833.html).
Such explicit transformations should take precedence over disabling
heuristics.

This patch unconditionally adds LoopUnrollPass to the optimizing
pipeline (that is, it is still not added with `-O0`), but passes a flag
indicating whether automatic unrolling is dis-/enabled. This is the same
approach as LoopVectorize uses.

The new pass manager's pipeline builder has no option to disable
unrolling, hence the problem does not apply.

Differential Revision: https://reviews.llvm.org/D55716

llvm-svn: 349509
2018-12-18 17:16:05 +00:00
Dylan McKay f920da009e [IPO][AVR] Create new Functions in the default address space specified in the data layout
This modifies the IPO pass so that it respects any explicit function
address space specified in the data layout.

In targets with nonzero program address spaces, all functions should, by
default, be placed into the default program address space.

This is required for Harvard architectures like AVR. Without this, the
functions will be marked as residing in data space, and thus not be
callable.

This has no effect on any in-tree official backends, as none use an
explicit program address space in their data layouts.

Patch by Tim Neumann.

llvm-svn: 349469
2018-12-18 09:52:52 +00:00
Tim Northover 856628f707 SROA: preserve alignment tags on loads and stores.
When splitting up an alloca's uses we were dropping any explicit
alignment tags, which means they default to the ABI-required alignment,
and this can cause miscompiles if the real value was smaller.

Also refactor the TBAA metadata into a parent class since it's shared by
both children anyway.

llvm-svn: 349465
2018-12-18 09:29:39 +00:00
Peter Collingbourne d3a3e4b46d hwasan: Move ctor into a comdat.
Differential Revision: https://reviews.llvm.org/D55733

llvm-svn: 349413
2018-12-17 22:56:34 +00:00
Sanjay Patel 200885e654 [AggressiveInstCombine] convert rotate with guard branch into funnel shift (PR34924)
Now that we have funnel shift intrinsics, it should be safe to convert this form of rotate to it.
In the worst case (a target that doesn't have rotate instructions), we will expand this into a 
branch-less sequence of ALU ops (neg/and/and/lshr/shl/or) in the backend, so it's still very 
likely to be a perf improvement over the original code.
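A C++ sketch of the two forms, assuming 32-bit values and a rotate amount n < 32. The guarded version mirrors the source pattern (the branch avoids the undefined shift by 32 when n == 0); the branch-free version has the neg/and/and/shl/lshr/or shape of a funnel shift fshl(x, x, n) expansion. Function names are illustrative.

```cpp
#include <cstdint>

// Rotate-left guarded by a branch; assumes n < 32.
uint32_t rotlGuarded(uint32_t x, uint32_t n) {
  if (n == 0)
    return x;  // avoids x >> 32, which is undefined in C and C++
  return (x << n) | (x >> (32 - n));
}

// Branch-free equivalent: both shift amounts are masked, so n == 0 is safe
// and yields (x << 0) | (x >> 0) == x.
uint32_t rotlBranchless(uint32_t x, uint32_t n) {
  return (x << (n & 31)) | (x >> ((0u - n) & 31));
}
```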

The motivating source code pattern for this is shown in:
https://bugs.llvm.org/show_bug.cgi?id=34924

Background:
I looked at several different options before deciding where to try this - instcombine, simplifycfg, 
CGP - because it doesn't fit cleanly anywhere AFAIK.

The backend (CGP, SDAG, GlobalIsel?) is too late for what we're trying to accomplish. We want to 
have the IR converted before we reach things like vectorization because the reduced code can make a 
loop much simpler to transform.

Technically, this could be included in instcombine, but it's a large pattern match that includes 
control-flow, so it just felt wrong to stuff into there (although I have a draft of that patch). 
Similarly, this could be part of simplifycfg, but all of this pattern matching is a stretch.

So we're left with our relatively new dumping ground for homeless transforms: aggressive-instcombine. 
This only runs at -O3, but that seems like a reasonable limitation given that source code has many 
options to avoid this pattern (including the recently added clang intrinsics for rotates).

I'm including a PhaseOrdering test because we require the teamwork of 3 passes (aggressive-instcombine, 
instcombine, simplifycfg) to get this into the minimal IR form that we want. That test shows a bug
with the new pass manager that's independent of this change (but it will be masked if we canonicalize
harder to funnel shift intrinsics in instcombine).

Differential Revision: https://reviews.llvm.org/D55604

llvm-svn: 349396
2018-12-17 21:14:51 +00:00
Sanjay Patel 1a6e9ec434 [InstCombine] don't widen an arbitrary sequence of vector ops (PR40032)
The problem is shown specifically for a case with vector multiply here:
https://bugs.llvm.org/show_bug.cgi?id=40032
...and this might mask the original backend bug for ARM shown in:
https://bugs.llvm.org/show_bug.cgi?id=39967

As the test diffs here show, we weren't (and probably still aren't) doing
these kinds of transforms in a principled way. In some cases we are producing
as many or more wide instructions than we started with, so we still
need to restrict/correct other transforms from overstepping.

If there are perf regressions from this change, we can either carve out 
exceptions to the general IR rules, or improve the backend to do these 
transforms when we know the transform is profitable. That's probably 
similar to a change like D55448.

Differential Revision: https://reviews.llvm.org/D55744

llvm-svn: 349389
2018-12-17 20:27:43 +00:00
Davide Italiano e41e1d015f [EarlyCSE] If DI can't be salvaged, mark it as unavailable.
Fixes PR39874.

llvm-svn: 349323
2018-12-17 01:42:39 +00:00
Kamil Rytarowski 21e270a479 Add NetBSD support in needsRuntimeRegistrationOfSectionRange.
Use linker script magic to get data/cnts/name start/end.

llvm-svn: 349277
2018-12-15 16:51:35 +00:00
Kamil Rytarowski 15ae738bc8 Register kASan shadow offset for NetBSD/amd64
The NetBSD x86_64 kernel uses the 0xdfff900000000000 shadow
offset.

llvm-svn: 349276
2018-12-15 16:32:41 +00:00
Florian Hahn c214bc2b8d [NewGVN] Update use counts for SSA copies when replacing them by their operands.
The current code relies on LeaderUseCount to determine if we can remove
an SSA copy, but in that case the LeaderUseCount does not refer to the SSA
copy. If an SSA copy is a dominating leader, we use the operand as dominating
leader instead. This means we removed a user of an SSA copy and we should
decrement its use count, so we can remove the SSA copy once it becomes dead.

Fixes PR38804.

Reviewers: efriedma, davide

Reviewed By: davide

Differential Revision: https://reviews.llvm.org/D51595

llvm-svn: 349217
2018-12-15 00:32:38 +00:00
Vedant Kumar 9d1827331f [Util] Refer to [s|z]exts of args when converting dbg.declares (fix PR35400)
When converting dbg.declares, if the described value is a [s|z]ext,
refer to the ext directly instead of referring to its operand.

This fixes a narrowing bug (the debugger got the sign of a variable
wrong, see llvm.org/PR35400).

The main reason to refer to the ext's operand was that an optimization
may remove the ext itself, leading to a dropped variable. Now that
InstCombine has been taught to use replaceAllDbgUsesWith (r336451), this
is less of a concern. Other passes can/should adopt this API as needed
to fix dropped variable bugs.

Differential Revision: https://reviews.llvm.org/D51813

llvm-svn: 349214
2018-12-15 00:03:33 +00:00
Michael Kruse ea9ef34558 [TransformWarning] Do not warn missed transformations in optnone functions.
Optimization transformations are intentionally disabled by the 'optnone'
function attribute. Therefore do not warn if transformation metadata is
still present.

Using the legacy pass manager structure, the `skipFunction` method takes
care of the optnone attribute (already called before this patch). For
the new pass manager, there is no equivalent, so we check for the
'optnone' attribute manually.

Differential Revision: https://reviews.llvm.org/D55690

llvm-svn: 349184
2018-12-14 19:45:43 +00:00
Michael Kruse 5948b7f30f [Transforms] Preserve metadata when converting invoke to call.
The `changeToCall` function did not preserve the invoke's metadata.
Currently, there is probably no metadata that depends on being applied
on a CallInst or InvokeInst. Therefore we can replace the instruction's
metadata.

This fixes http://llvm.org/PR39994

Suggested-by: Moritz Kreutzer <moritz.kreutzer@siemens.com>

Differential Revision: https://reviews.llvm.org/D55666

llvm-svn: 349170
2018-12-14 18:15:11 +00:00
Evgeniy Stepanov eb238ecf0f Revert "[hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)"
Breaks sanitizer-android buildbot.

This reverts commit af8443a984c3b491c9ca2996b8d126ea31e5ecbe.

llvm-svn: 349092
2018-12-13 23:47:50 +00:00
Wei Mi 66c6c5abea [SampleFDO] handle ProfileSampleAccurate when initializing function entry count
ProfileSampleAccurate is used to indicate that the profile exactly matches the
code to be optimized.

Previously ProfileSampleAccurate was handled in ProfileSummaryInfo::isColdCallSite
and ProfileSummaryInfo::isColdBlock. A better solution is to initialize function
entry count to 0 when ProfileSampleAccurate is true, so we don't have to handle
ProfileSampleAccurate in multiple places.

Differential Revision: https://reviews.llvm.org/D55660

llvm-svn: 349088
2018-12-13 21:51:42 +00:00
Nikita Popov dc73a6edde Reapply "[MemCpyOpt] memset->memcpy forwarding with undef tail"
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.
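The M <= N case can be checked with a small C++ sketch using the standard `memset`/`memcpy`; the function names are illustrative. It assumes nothing writes to `a` between the two calls. The M > N extension is not modeled here, since it relies on the IR-level knowledge that the bytes past N are undef.

```cpp
#include <cstddef>
#include <cstring>

// Original sequence: fill the first n bytes of a, then copy m bytes of a into b.
void memsetThenMemcpy(unsigned char* a, unsigned char* b, int byte,
                      std::size_t n, std::size_t m) {
  std::memset(a, byte, n);
  std::memcpy(b, a, m);
}

// Forwarded form, valid when m <= n: b is filled directly, so it no longer
// depends on a, and the first memset may later become dead.
void memsetThenMemset(unsigned char* a, unsigned char* b, int byte,
                      std::size_t n, std::size_t m) {
  std::memset(a, byte, n);
  std::memset(b, byte, m);
}
```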

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

The previous version of this patch did not perform dependency checking
properly: While the dependency is checked at the position of the memset,
the used size must be that of the memcpy. Previously the size of the
memset was used, which missed modification in the region
MemSetSize..CopySize, resulting in miscompiles. The added tests cover
variations of this issue.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 349078
2018-12-13 20:04:27 +00:00
Easwaran Raman 5a7056fa03 [ThinLTO] Compute synthetic function entry count
Summary:
This patch computes the synthetic function entry count on the whole
program callgraph (based on module summary) and writes the entry counts
to the summary. After function importing, this count gets attached to
the IR as metadata. Since it adds a new field to the summary, this bumps
up the version.

Reviewers: tejohnson

Subscribers: mehdi_amini, inglorion, llvm-commits

Differential Revision: https://reviews.llvm.org/D43521

llvm-svn: 349076
2018-12-13 19:54:27 +00:00
Davide Italiano 9737096bb1 [LoopUtils] Use i32 instead of `void`.
The actual type of the first argument of the @dbg intrinsic
doesn't really matter as we're setting it to `undef`, but the
bitcode reader is picky about `void` types.

llvm-svn: 349069
2018-12-13 18:37:23 +00:00
Vitaly Buka a257639a69 [asan] Don't check ODR violations for particular types of globals
Summary:
private and internal: should not trigger ODR at all.
unnamed_addr: the current ODR checking approach fails and reports false violations if
a linker merges such globals.
linkonce_odr, weak_odr: could cause similar problems and they are already not
instrumented for ELF.

Reviewers: eugenis, kcc

Subscribers: kubamracek, hiraditya, llvm-commits

Differential Revision: https://reviews.llvm.org/D55621

llvm-svn: 349015
2018-12-13 09:47:39 +00:00
David L. Jones 54c01ad6a9 Revert r348645 - "[MemCpyOpt] memset->memcpy forwarding with undef tail"
This revision caused truncated memsets for structs with padding. See:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20181210/610520.html

llvm-svn: 349002
2018-12-13 03:15:11 +00:00
Davide Italiano 8ee59ca653 [LoopUtils] Prefer a set over a map. NFCI.
llvm-svn: 348999
2018-12-13 01:11:52 +00:00
Davide Italiano 744c3c327f [LoopDeletion] Update debug values after loop deletion.
When loops are deleted, we don't keep track of variables modified inside
the loops, so the DI will contain the wrong value for these.

e.g.

    int b() {
      int i;
      for (i = 0; i < 2; i++)
        ;
      patatino();
      return a;
    }

After the loop is deleted, the debugger shows a stale value for i
(after the loop, i is actually 2):

    -> 6     patatino();
       7     return a;
       8   }
       9   int main() { b(); }
    (lldb) frame var i
    (int) i = 0

We instead mark these values as unavailable by inserting an
@llvm.dbg.value(undef, ...) to make sure we don't end up printing an incorrect
value in the debugger. We could consider doing something fancier,
e.g. for constants, in the future.

PR39868.
rdar://problem/46418795

Differential Revision: https://reviews.llvm.org/D55299

llvm-svn: 348988
2018-12-12 23:32:35 +00:00
Nikita Popov 36e03ac6ee [InstCombine] Fix negative GEP offset evaluation for 32-bit pointers
This fixes https://bugs.llvm.org/show_bug.cgi?id=39908.

The evaluateGEPOffsetExpression() function simplifies GEP offsets for
use in comparisons against zero, basically by converting X*Scale+Offset==0
to X+Offset/Scale==0 if Scale divides Offset. However, before this is done,
Offset is masked down to the pointer size. This results in incorrect
results for negative Offsets, because we basically end up dividing the
32-bit offset *zero* extended to 64 bits (rather than sign extended).

Fix this by explicitly sign extending the truncated value.
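A small C++ illustration of the difference, assuming a 32-bit pointer width, an offset of -8 held in 64 bits and a scale of 4 (values chosen for illustration). The `uint32_t` to `int32_t` conversion wraps in two's complement on mainstream compilers and is guaranteed to do so since C++20.

```cpp
#include <cstdint>

// Buggy behavior: mask the offset down to 32 bits and keep it zero-extended.
int64_t truncZext(int64_t offset) {
  return static_cast<int64_t>(static_cast<uint32_t>(offset));
}

// Fixed behavior: truncate to 32 bits, then sign-extend back to 64 bits.
int64_t truncSext(int64_t offset) {
  return static_cast<int64_t>(static_cast<int32_t>(static_cast<uint32_t>(offset)));
}
```

With offset -8, the zero-extended value is 4294967288, and dividing by the scale gives 1073741822 instead of the intended -2.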

Differential Revision: https://reviews.llvm.org/D55449

llvm-svn: 348987
2018-12-12 23:19:03 +00:00
Ryan Prichard e028c818f5 [hwasan] Android: Switch from TLS_SLOT_TSAN(8) to TLS_SLOT_SANITIZER(6)
Summary:
The change is needed to support ELF TLS in Android. See D55581 for the
same change in compiler-rt.

Reviewers: srhines, eugenis

Reviewed By: eugenis

Subscribers: srhines, llvm-commits

Differential Revision: https://reviews.llvm.org/D55592

llvm-svn: 348983
2018-12-12 22:45:06 +00:00
Michael Kruse 7244852557 [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes.
When multiple loop transformations are defined in a loop's metadata, their order of execution is defined by the order of their respective passes in the pass pipeline. For example,

    #pragma clang loop unroll_and_jam(enable)
    #pragma clang loop distribute(enable)

is the same as

    #pragma clang loop distribute(enable)
    #pragma clang loop unroll_and_jam(enable)

and will try to loop-distribute before Unroll-And-Jam because the LoopDistribute pass is scheduled before the UnrollAndJam pass. UnrollAndJamPass only supports one inner loop, i.e. it will necessarily fail after loop distribution. It is not possible to specify another execution order. Also, the order of passes in the pipeline is subject to change between versions of LLVM, optimization options and which pass manager is used.

This patch adds 'followup' attributes to various loop transformation passes. These attributes define which attributes the resulting loop of a transformation should have. For instance,

    !0 = !{!0, !1, !2}
    !1 = !{!"llvm.loop.unroll_and_jam.enable"}
    !2 = !{!"llvm.loop.unroll_and_jam.followup_inner", !3}
    !3 = !{!"llvm.loop.distribute.enable"}

defines a loop ID (!0) to be unrolled-and-jammed (!1) and then the attribute !3 to be added to the jammed inner loop, which contains the instruction to distribute the inner loop.

Currently, in both pass managers, pass execution is in a fixed order and UnrollAndJamPass will not execute again after LoopDistribute. We hope to fix this in the future by allowing pass managers to run passes until a fixpoint is reached, use Polly to perform these transformations, or add a loop transformation pass which takes the order issue into account.

For mandatory/forced transformations (e.g. by having been declared by #pragma omp simd), the user must be notified when a transformation could not be performed. The responsible pass cannot emit such a warning, because the transformation might be 'hidden' in a followup attribute when the pass is executed, or the pass may not be present in the pipeline at all. For this reason, this patch introduces a WarnMissedTransformations pass to warn about orphaned transformations.

Since this changes the user-visible diagnostic message when a transformation is applied, two test cases in the clang repository need to be updated.

To ensure that no other transformation is executed before the intended one, the attribute `llvm.loop.disable_nonforced` can be added which should disable transformation heuristics before the intended transformation is applied. E.g. it would be surprising if a loop is distributed before a #pragma unroll_and_jam is applied.

With more supported code transformations (loop fusion, interchange, stripmining, offloading, etc.), transformations can be used as building blocks for more complex transformations (e.g. stripmining+stripmining+interchange -> tiling).

Reviewed By: hfinkel, dmgreen

Differential Revision: https://reviews.llvm.org/D49281
Differential Revision: https://reviews.llvm.org/D55288

llvm-svn: 348944
2018-12-12 17:32:52 +00:00
Mikael Holmen c06b01cb22 Fix compiler warning about unused variable [NFC]
llvm-svn: 348913
2018-12-12 06:33:45 +00:00
Gor Nishanov 20d833d5e3 [coroutines] Improve suspend point simplification
Summary:
Enable suspend point simplification for cases where:
* coro.save and coro.suspend are in different basic blocks
* there are intervening intrinsics

Reviewers: modocache, tks2103, lewissbaker

Reviewed By: modocache

Subscribers: EricWF, llvm-commits

Differential Revision: https://reviews.llvm.org/D55160

llvm-svn: 348897
2018-12-11 21:23:09 +00:00
Fedor Sergeev a1d95c3fc4 [NewPM] fixing asserts on deleted loop in -print-after-all
IR-printing AfterPass instrumentation might be called on a loop
that has just been invalidated. We should skip printing it to
avoid spurious asserts.

Reviewed By: chandlerc, philip.pfaffe
Differential Revision: https://reviews.llvm.org/D54740

llvm-svn: 348887
2018-12-11 19:05:35 +00:00
Vedant Kumar b3a7cae045 [HotColdSplitting] Disable outlining landingpad instructions (PR39917)
It's currently not safe to outline landingpad instructions (see
llvm.org/PR39917). Like @llvm.eh.typeid.for, the order and content of
previous landingpad instructions in a function alters the lowering of
subsequent landingpads by renumbering type info IDs. Outlining a
landingpad therefore breaks exception handling & unwinding.

llvm-svn: 348870
2018-12-11 18:05:31 +00:00
Sanjay Patel 2aa2dc76c2 [InstCombine] try to convert x86 movmsk intrinsic to generic IR (PR39927)
call iM movmsk(sext <N x i1> X) --> zext (bitcast <N x i1> X to iN) to iM

This has the potential to create less-than-8-bit scalar types as shown in 
some of the test diffs, but it looks like the backend knows how to deal 
with that in these patterns. This is the simple part of the fix suggested in:
https://bugs.llvm.org/show_bug.cgi?id=39927
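The equivalence can be checked with a scalar C++ model of a 4-lane movmsk (as in movmskps: bit i of the result is the sign bit of lane i). The lane-to-bit mapping of the bitcast is assumed to be lane i to bit i, the little-endian layout x86 uses; the helper names are illustrative and this is not the InstCombine code.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of movmsk on 4 x i32: collect the sign bit of each lane.
uint32_t movmsk4(const std::array<int32_t, 4>& lanes) {
  uint32_t mask = 0;
  for (std::size_t i = 0; i < lanes.size(); ++i)
    if (lanes[i] < 0)
      mask |= 1u << i;
  return mask;
}

// sext <4 x i1> to <4 x i32>: true becomes all-ones (-1), false becomes 0.
std::array<int32_t, 4> sextBools(const std::array<bool, 4>& b) {
  std::array<int32_t, 4> out{};
  for (std::size_t i = 0; i < b.size(); ++i)
    out[i] = b[i] ? -1 : 0;
  return out;
}

// bitcast <4 x i1> to i4, then zext to i32 (assuming lane i maps to bit i).
uint32_t bitcastZext(const std::array<bool, 4>& b) {
  uint32_t bits = 0;
  for (std::size_t i = 0; i < b.size(); ++i)
    bits |= static_cast<uint32_t>(b[i]) << i;
  return bits;
}
```

Because a sign-extended i1 lane has its sign bit equal to the original bool, movmsk just repacks the bools, which is what the bitcast expresses.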

Differential Revision: https://reviews.llvm.org/D55529

llvm-svn: 348862
2018-12-11 16:38:03 +00:00
David Stenberg 2474ce5862 [DeadArgElim] Fixes for dbg.values using dead arg/return values
Summary:
When eliminating a dead argument or return value in a function with
local linkage, all uses, including in dbg.value intrinsics, would be
replaced with null constants. This would mean that, for example for an
integer argument, the debug info would incorrectly express that the
value is 0. Instead, replace all uses with undef to indicate that the
argument/return value is optimized out.

Also, make sure that metadata uses of return values are rewritten even
if there are no non-metadata uses of the value.

As a bit of historical curiosity, the code that emitted null constants
was introduced in the initial check-in of the pass in 2003, before
'undef' values even existed in LLVM.

This fixes PR23260.

Reviewers: dblaikie, aprantl, vsk, djtodoro

Reviewed By: aprantl

Subscribers: llvm-commits

Tags: #debug-info

Differential Revision: https://reviews.llvm.org/D55513

llvm-svn: 348837
2018-12-11 10:33:38 +00:00
Davide Italiano 8ec7709f58 [Local] Promote a utility that could be used elsewhere. NFCI.
llvm-svn: 348804
2018-12-10 22:17:04 +00:00
Matt Arsenault 9ccde61f81 InstCombine: Scalarize single use icmp/fcmp
llvm-svn: 348801
2018-12-10 21:50:54 +00:00
Nikita Popov 94b8e2ea4e [MemCpyOpt] memset->memcpy forwarding with undef tail
Currently memcpyopt optimizes cases like

    memset(a, byte, N);
    memcpy(b, a, M);

to

    memset(a, byte, N);
    memset(b, byte, M);

if M <= N. Often this allows further simplifications down the line,
which drop the first memset entirely.

This patch extends this optimization for the case where M > N, but we
know that the bytes a[N..M] are undef due to alloca/lifetime.start.

This situation arises relatively often for Rust code, because Rust does
not initialize trailing structure padding and loves to insert redundant
memcpys. This also fixes https://bugs.llvm.org/show_bug.cgi?id=39844.

For the implementation, I'm reusing a bit of code for a similar existing
optimization (direct memcpy of undef). I've also added memset support to
MemDepAnalysis GetLocation. Instead, getPointerDependencyFrom could be
used, but it seems to make more sense to add this to GetLocation and thus
make the computation cacheable.

Differential Revision: https://reviews.llvm.org/D55120

llvm-svn: 348645
2018-12-07 21:16:58 +00:00
Vedant Kumar 03f9f15b16 [HotColdSplitting] Refine definition of unlikelyExecuted
The splitting pass uses its 'unlikelyExecuted' predicate to statically
decide which blocks are cold.

- Do not treat noreturn calls as if they are cold unless they are actually
  marked cold. This is motivated by functions like exit() and longjmp(), which
  are not beneficial to outline.

- Do not treat inline asm as an outlining barrier. In practice asm("") is
  frequently used to inhibit basic block merging; enabling outlining in this case
  results in substantial memory savings.

- Treat invokes of cold functions as cold.

As a drive-by, remove the 'exceptionHandlingFunctions' predicate, because it's
no longer needed. The pass can identify & outline blocks dominated by EH pads,
so there's no need to special-case __cxa_begin_catch etc.

Differential Revision: https://reviews.llvm.org/D54244

llvm-svn: 348640
2018-12-07 20:24:04 +00:00
Vedant Kumar 03aaa3e2aa [HotColdSplitting] Outline more than once per function
Algorithm: Identify maximal cold regions and put them in a worklist. If
a candidate region overlaps with another, discard it. While the worklist
is non-empty, remove a single-entry sub-region from the worklist and attempt
to outline it. By the non-overlap property, this should not invalidate
parts of the domtree pertaining to other outlining regions.
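A C++ sketch of the worklist discipline, under stated assumptions: a region is modeled as a set of basic-block IDs, overlapping candidates are resolved greedily (the first candidate wins, an interpretation not confirmed by the text), and "outlining" is reduced to counting blocks. None of these names come from the HotColdSplitting pass.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical model: a cold region is the set of block IDs it covers.
using Region = std::set<int>;

bool overlaps(const Region& a, const Region& b) {
  for (int block : a)
    if (b.count(block))
      return true;
  return false;
}

// Keep a candidate only if it shares no block with a region already kept.
std::vector<Region> buildWorklist(const std::vector<Region>& candidates) {
  std::vector<Region> worklist;
  for (const Region& candidate : candidates) {
    bool clash = false;
    for (const Region& kept : worklist)
      if (overlaps(candidate, kept)) { clash = true; break; }
    if (!clash)
      worklist.push_back(candidate);
  }
  return worklist;
}

// Drain the worklist one region at a time. Since the regions are disjoint,
// processing one never touches the blocks of another.
std::size_t outlineAll(std::vector<Region> worklist) {
  std::size_t blocksOutlined = 0;
  while (!worklist.empty()) {
    Region region = worklist.back();
    worklist.pop_back();
    blocksOutlined += region.size();  // a real pass would extract it here
  }
  return blocksOutlined;
}
```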

Testing: LNT results on X86 are clean. With test-suite + externals, llvm
outlines 134KB pre-patch, and 352KB post-patch (+ ~2.6x). The file
483.xalancbmk/src/Constants.cpp stands out as an extreme case where llvm
outlines over 100 times in some functions (mostly EH paths). There was
not a significant performance impact pre vs. post-patch.

Differential Revision: https://reviews.llvm.org/D53887

llvm-svn: 348639
2018-12-07 20:23:52 +00:00
Nikita Popov 110cf05203 Reapply "[DemandedBits][BDCE] Support vectors of integers"
DemandedBits and BDCE currently only support scalar integers. This
patch extends them to also handle vector integer operations. In this
case bits are not tracked for individual vector elements, instead a
bit is demanded if it is demanded for any of the elements. This matches
the behavior of computeKnownBits in ValueTracking and
SimplifyDemandedBits in InstCombine.
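The per-vector rule can be sketched in C++, assuming per-lane demanded masks for a vector of i32 are given as input (a hypothetical interface, not the DemandedBits API). The result is a single scalar-width mask: a bit is demanded if any lane demands it.

```cpp
#include <cstdint>
#include <vector>

// Combine per-lane demanded masks into one 32-bit mask for the whole vector.
uint32_t vectorDemandedMask(const std::vector<uint32_t>& laneMasks) {
  uint32_t demanded = 0;
  for (uint32_t mask : laneMasks)
    demanded |= mask;  // demanded in any lane => demanded for the vector
  return demanded;
}
```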

Unlike the previous iteration of this patch, getDemandedBits() can now
again be called on arbirary (sized) instructions, even if they don't
have integer or vector of integer type. (For vector types the size of the
returned mask will now be the scalar size in bits though.)

The added LoopVectorize test case shows a case which triggered an
assertion failure with the previous attempt, because getDemandedBits()
was called on a pointer-typed instruction.

Differential Revision: https://reviews.llvm.org/D55297

llvm-svn: 348602
2018-12-07 15:38:13 +00:00
Max Kazantsev b9e65cbddf Introduce llvm.experimental.widenable_condition intrinsic
This patch introduces a new intrinsic `@llvm.experimental.widenable.condition`
that allows explicit representation for guards. It is an alternative to using
`@llvm.experimental.guard` intrinsic that does not contain implicit control flow.

We keep finding places where `@llvm.experimental.guard` is not supported or
treated too conservatively, and there are 2 reasons for that:

- `@llvm.experimental.guard` has a memory write side effect to model implicit control flow,
  and this sometimes confuses passes and analyses that work with memory;
- Not all passes and analyses are aware of the semantics of guards. These passes treat them
  as a regular throwing call and have no idea that the condition of a guard may be used to prove
  something. One well-known place which has caused us trouble in the past is explicit loop
  iteration count calculation in SCEV. Another example is new loop unswitching, which is not
  aware of guards. Whenever a new pass appears, we potentially have this problem there.

Rather than go and fix all these places (and commit to keep track of them and add support
in future), it seems more reasonable to leverage the existing optimizer's logic as much as possible.
The only significant difference between guards and regular explicit branches is that guard's condition
can be widened. It means that a guard contains (explicitly or implicitly) a `deopt` block successor,
and it is always legal to go there no matter what the guard condition is. The other successor is
a guarded block, and it is only legal to go there if the condition is true.

This patch introduces a new explicit form of guards alternative to `@llvm.experimental.guard`
intrinsic. Now a widenable guard can be represented in the CFG explicitly like this:


    %widenable_condition = call i1 @llvm.experimental.widenable.condition()
    %new_condition = and i1 %cond, %widenable_condition
    br i1 %new_condition, label %guarded, label %deopt

  guarded:
    ; Guarded instructions

  deopt:
    call type @llvm.experimental.deoptimize(<args...>) [ "deopt"(<deopt_args...>) ]

The new intrinsic `@llvm.experimental.widenable.condition` has semantics of an
`undef`, but the intrinsic prevents the optimizer from folding it early. This form
should exploit all optimization boons provided to the `br` instruction, and it still can be
widened by replacing the result of `@llvm.experimental.widenable.condition()`
with `and` with any arbitrary boolean value (as long as the branch that is taken when
it is `false` has a deopt and has no side-effects).
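Why widening is always legal can be shown with a hypothetical scalar model in C++: `wc` stands for the value returned by `@llvm.experimental.widenable.condition()`, which the optimizer may treat as either true or false, and widening and-s an extra condition onto it. The enum and function names are illustrative only.

```cpp
// Where the explicit guard branch goes.
enum class Target { Guarded, Deopt };

// br (cond & wc), %guarded, %deopt
Target guardBranch(bool cond, bool wc) {
  return (cond && wc) ? Target::Guarded : Target::Deopt;
}

// After widening: wc is replaced by (wc & extra).
Target widenedBranch(bool cond, bool extra, bool wc) {
  return (cond && (wc && extra)) ? Target::Guarded : Target::Deopt;
}
```

Whenever the widened branch reaches the guarded block, the original one does too; widening only adds paths to the deopt block, which is always legal to take.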

For more motivation, please check llvm-dev discussion "[llvm-dev] Giving up using
implicit control flow in guards".

This patch introduces this new intrinsic with respective LangRef changes and a pass
that converts old-style guards (expressed as intrinsics) into the new form.

The naming discussion is still ongoing. Merging this to unblock further items. We can
later change the name of this intrinsic.

Reviewed By: reames, fedor.sergeev, sanjoy
Differential Revision: https://reviews.llvm.org/D51207

llvm-svn: 348593
2018-12-07 14:39:46 +00:00
Markus Lavin 4dc4ebd606 [PM] Port LoadStoreVectorizer to the new pass manager.
Differential Revision: https://reviews.llvm.org/D54848

llvm-svn: 348570
2018-12-07 08:23:37 +00:00
Max Kazantsev a523a21175 [LoopSimplifyCFG] Do not deal with loops with irreducible CFG inside
The current algorithm that collects live/dead/inloop blocks relies on some invariants
related to RPO and PO traversals. In particular, the important fact it requires is that
the loop's only latch is the first block in PO traversal. It also relies on the fact that during
RPO we visit all predecessors of a block before we visit this block (backedges ignored).

If a loop has irreducible non-loop cycle inside, both these assumptions may break.
This patch adds detection for this situation and prohibits the terminator folding
for loops with irreducible CFG.

We can in theory support this later, but this requires some algorithmic changes.
Besides, irreducible CFG is not a frequent situation, so we need not bother.

Thanks @uabelho for finding this!

Differential Revision: https://reviews.llvm.org/D55357
Reviewed By: skatkov

llvm-svn: 348567
2018-12-07 05:44:45 +00:00
Vedant Kumar b2a6f8e505 [CodeExtractor] Store outputs at the first valid insertion point
When CodeExtractor outlines values which are used by the original
function, it must store those values in some in-out parameter. This
store instruction must not be inserted in between a PHI and an EH pad
instruction, as that results in invalid IR.

This fixes the following verifier failure seen while outlining within
ObjC methods with live exit values:

  The unwind destination does not have an exception handling instruction!
    %call35 = invoke i8* bitcast (i8* (i8*, i8*, ...)* @objc_msgSend to i8* (i8*, i8*)*)(i8* %exn.adjusted, i8* %1)
            to label %invoke.cont34 unwind label %lpad33, !dbg !4183
  The unwind destination does not have an exception handling instruction!
    invoke void @objc_exception_throw(i8* %call35) #12
            to label %invoke.cont36 unwind label %lpad33, !dbg !4184
  LandingPadInst not the first non-PHI instruction in the block.
    %3 = landingpad { i8*, i32 }
            catch i8* null, !dbg !1411

rdar://46540815

llvm-svn: 348562
2018-12-07 03:01:54 +00:00