Allowing cycles in Phi traversal increases the scope of optimize memory instruction
in case we are in loop.
The added test shows an example of enabling optimization inside a loop.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35294
llvm-svn: 308419
Summary: Currently, when GVN creates a load and when InstCombine creates a new store for unreachable Load, the DebugLoc info gets lost.
Reviewers: dberlin, davide, aprantl
Reviewed By: aprantl
Subscribers: davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D34639
llvm-svn: 308404
It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads.
Notes:
Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz.
There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare.
We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329:
https://bugs.llvm.org/show_bug.cgi?id=33329#c12
TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion.
Committed on behalf of Sanjay Patel
Differential Revision: https://reviews.llvm.org/D35067
llvm-svn: 308322
using runtime checks
Extend the SCEVPredicateRewriter to work a bit harder when it encounters an
UnknownSCEV for a Phi node; Try to build an AddRecurrence also for Phi nodes
whose update chain involves casts that can be ignored under the proper runtime
overflow test. This is one step towards addressing PR30654.
Differential revision: http://reviews.llvm.org/D30041
llvm-svn: 308299
Summary:
Currently most tests for the loop interchange pass are in
test/Transforms/LoopInterchange/interchange.ll. This patch splits up the
large test file in smaller pieces, which makes debugging test failures
easier.
Reviewers: karthikthecool, blitz.opensource, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, mcrosier, mkuper, mzolotukhin, mssimpso, llvm-commits
Differential Revision: https://reviews.llvm.org/D35488
llvm-svn: 308284
In some particular cases eq/ne conditions can be turned into equivalent
slt/sgt conditions. This patch teaches parseLoopStructure to handle some
of these cases.
Differential Revision: https://reviews.llvm.org/D35010
llvm-svn: 308264
Summary:
Add canary tests to verify that InstCombine currently does nothing with the element atomic memory intrinsics for memmove and memset.
Placeholder tests that will fail once element atomic @llvm.mem[move|set] instrinsics have been added to the MemIntrinsic class hierarchy. These will act as a reminder to verify that inst combine handles these intrinsics properly once they have been added to that class hierarchy.
Reviewers: reames
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35502
llvm-svn: 308247
This reverts commit r308114 (and follow on fixes to test).
There is a linking failure in a ThinLTO bot:
http://green.lab.llvm.org/green/job/clang-stage2-configure-Rthinlto_build/3663/
(and undefined reference). It seems like it must be a second order
effect of the heuristic change I made, and may take some time to try
to reproduce locally and track down. Therefore, reverting for now.
llvm-svn: 308206
Finally figured out that some bots were failing from r308114
with the message:
llvm-lto2: LTO::run failed: No available targets are compatible with this triple.
after adding in some other checking that finally caused this to show up
in the FileCheck output.
Added "REQUIRES: x86-registered-target" which should fix it.
llvm-svn: 308119
This restores r308078/r308079 with a fix for bot non-determinisim (make
sure we run llvm-lto in single threaded mode so the debug output doesn't get
interleaved).
llvm-svn: 308114
Summary:
If one side simplifies to the identity value for inner opcode, we can replace the value with just the operation that can't be simplified.
I've removed a couple now unneeded special cases in visitAnd and visitOr. There are probably other cases I missed.
Reviewers: spatel, majnemer, hfinkel, dberlin
Reviewed By: spatel
Subscribers: grandinj, llvm-commits, spatel
Differential Revision: https://reviews.llvm.org/D35451
llvm-svn: 308111
This fold hit the trifecta:
1. It was untested.
2. It oversteps (multiuse is not checked, so increases instruction count).
3. It is incomplete (doesn't work for vectors).
llvm-svn: 308102
that appears to exhibit non-determinism and is flaking on the bots
pretty consistently.
r308078: [ThinLTO] Ensure we always select the same function copy to import
r308079: Require asserts in new test that uses debug flag
llvm-svn: 308095
Summary:
Check if the first eligible callee is under the instruction threshold.
Checking this on the first eligible callee ensures that we don't end
up selecting different callees to import when we invoke this routine
with different thresholds due to reaching the callee via paths that
are shallower or hotter (when there are multiple copies, i.e. with
weak or linkonce linkage). We don't want to leave the decision of which
copy to import up to the backend.
Reviewers: mehdi_amini
Subscribers: inglorion, fhahn, llvm-commits
Differential Revision: https://reviews.llvm.org/D35436
llvm-svn: 308078
Now, getUserCost() only checks the src and dst types of EXT to decide it is free
or not. This change first checks the types, then calls isExtFreeImpl(), and
check if EXT can form ExtLoad at last. Currently, only AArch64 has customized
implementation of isExtFreeImpl() to check if EXT can be folded into its use.
Differential Revision: https://reviews.llvm.org/D34458
llvm-svn: 308076
Summary:
When checking for memory dependencies between calls using MemorySSA,
handle cases where the calls have no MemoryAccess associated with them
because the AA analysis being used has determined that the call does not
read/write memory.
Fixes PR33756
Reviewers: dberlin, davide
Subscribers: mcrosier, llvm-commits, Prazek
Differential Revision: https://reviews.llvm.org/D35317
llvm-svn: 308051
Add the following pattern to TryToUnfoldSelectInCurrBB()
bb:
%p = phi [0, %bb1], [1, %bb2], [0, %bb3], [1, %bb4], ...
%c = cmp %p, 0
%s = select %c, trueval, falseval
The Select in the above pattern will be unfolded and then jump-threaded. The
current implementation does not allow CMP in the middle of PHI and Select.
Differential Revision: https://reviews.llvm.org/D34762
llvm-svn: 308050
When iterating through loop
for (int i = INT_MAX; i > 0; i--)
We fail to generate the pre-loop for it. It happens because we use the
overflown value in a comparison predicate when identifying whether or not
we need it.
In old logic, we used SLE predicate against Greatest value which exceeds all
seen values of the IV and might be overflown. Now we use the GreatestSeen
value of this IV with SLT predicate.
Also added a test that ensures that a pre-loop is generated for such loops.
Differential Revision: https://reviews.llvm.org/D35347
llvm-svn: 308001
Summary:
When we runtime unroll with multiple exit blocks, we also need to update the
immediate dominators of the immediate successors of the exit blocks.
Reviewers: reames, mkuper, mzolotukhin, apilipenko
Reviewed by: mzolotukhin
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D35304
llvm-svn: 307909
Summary:
Similar to X86, it should be safe to inline callees if their
target-features are a subset of the caller. As some subtarget features
provide different instructions depending on whether they are set or
unset (e.g. ThumbMode and ModeSoftFloat), we use a whitelist of
target-features describing hardware capabilities only.
Reviewers: kristof.beyls, rengolin, t.p.northover, SjoerdMeijer, peter.smith, silviu.baranga, efriedma
Reviewed By: SjoerdMeijer, efriedma
Subscribers: dschuff, efriedma, aemerson, sdardis, javed.absar, arichardson, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D34697
llvm-svn: 307889
When we fail to sink an instruction, we must make sure not to modify
the function; otherwise, we end up in an infinite loop because
CodeGenPrepare iterates until it doesn't make any changes.
Fixes https://bugs.llvm.org/show_bug.cgi?id=33608 .
llvm-svn: 307866
This is an incremental change to the promotion feature.
There are two problems with the current behavior:
1) loops with multiple exiting blocks are totally disabled
2) a counter update can only be promoted one level up in
the loop nest -- which does help much for short trip
count inner loops inside a high trip-count outer loops.
Due to this limitation, we still saw very large profile
count fluctuations from run to run for the affected loops
which are usually very hot.
This patch adds the support for promotion counters iteratively
across the loop nest. It also turns on the promotion for
loops with multiple exiting blocks (with a limit).
For single-threaded applications, the performance impact is flat
on average. For instance, dealII improves, but povray regresses.
llvm-svn: 307863
Refactored the code and separated out a function
`canSafelyUnrollMultiExitLoop` to reduce redundant checks and make it
easier to add profitability heuristics later.
Added tests to runtime unrolling to make sure that unrolling for
multi-exit loops is not done unless the option
-unroll-runtime-multi-exit is true.
llvm-svn: 307843
Summary:
LoopRotate manually updates the DoomTree by iterating over all predecessors of a basic block and computing the Nearest Common Dominator.
When a predecessor happens to be unreachable, `DT.findNearestCommonDominator` returns nullptr.
This patch teaches LoopRotate to handle this case and fixes [[ https://bugs.llvm.org/show_bug.cgi?id=33701 | PR33701 ]].
In the future, LoopRotate should be taught to use the new incremental API for updating the DomTree.
Reviewers: dberlin, davide, uabelho, grosser
Subscribers: efriedma, mzolotukhin
Differential Revision: https://reviews.llvm.org/D35074
llvm-svn: 307828
As promised in D35003.
Uses -codegenprepare instead of -instcombine since we hit the same
buggy path anyway, and CGP lets us keep this test really simple
(instcombine likes turning the alloca T, N into alloca [N x T], which
hides the bug this is testing for).
llvm-svn: 307811
This normally indicates mixed CFI + non-CFI compilation, and will
result in us treating the function in the same way as a function
defined outside of the LTO unit.
Part of PR33752.
Differential Revision: https://reviews.llvm.org/D35281
llvm-svn: 307744
[GlobalOpt] Remove unreachable blocks before optimizing a function.
While the change is presumably correct, it exposes a latent bug
in DI which breaks on of the CFI checks. I'll analyze it further
and try to understand what's going on.
llvm-svn: 307729
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
This is fine as nothing in the code relies on leader and memory
leader being the same for a given congruency class. Ack'ed by
Dan.
Fixes PR33720.
llvm-svn: 307699
The loop structure for the outer loop does not contain the epilog
preheader when we try to unroll inner loop with multiple exits and
epilog code is generated. For now, we just bail out in such cases.
Added a test case that shows the problem. Without this bailout, we would
trip on assert saying LCSSA form is incorrect for outer loop.
llvm-svn: 307676