We already do this for min/max (see the blob above the diff),
so we should do the same for abs/nabs.
A sign-bit check (<s 0) is used as a predicate for other IR
transforms and it's likely the best for codegen.
This might solve the motivating cases for D47037 and D47041,
but I think those patches still make sense. We can't guarantee
this canonicalization if the icmp has more than one use.
Differential Revision: https://reviews.llvm.org/D47076
llvm-svn: 332819
In the patch rL329547, we have lifted the over-restrictive limitation on collected range
checks, allowing to work with range checks with the end of their range not being
provably non-negative. However it appeared that the non-negativity of this value was
assumed in the utility function `ClampedSubtract`. In particular, its reasoning is based
on the fact that `0 <= SINT_MAX - X`, which is not true if `X` is negative.
The function `ClampedSubtract` is only called twice, once with `X = 0` (which is OK)
and the second time with `X = IRC.getEnd()`, where we may now see the problem if
the end is actually a negative value. In this case, we may sometimes miscompile.
This patch is the conservative fix of the miscompile problem. Rather than rejecting
non-provably non-negative `getEnd()` values, we will check it for non-negativity in
runtime. For this, we use function `smax(smin(X, 0), -1) + 1` that is equal to `1` if `X`
is non-negative and is equal to 0 if `X` is negative. If we multiply `Begin, End` of safe
iteration space by this function calculated for `X = IRC.getEnd()`, we will get the original
`[Begin, End)` if `IRC.getEnd()` was non-negative (and, thus, `ClampedSubtract` worked
correctly) and the empty range `[0, 0)` in case if ` IRC.getEnd()` was negative.
So we in fact prohibit execution of the main loop if at least one of range checks was
made against a negative value (and we figured it out in runtime). It is still better than
what we have before (non-negativity had to be proved in compile time) and prevents
us from miscompile, however it is sometiles too restrictive for unsigned range checks
against a negative value (which in fact can be eliminated).
Once we re-implement `ClampedSubtract` in a way that it handles negative `X` correctly,
this limitation can be lifted, too.
Differential Revision: https://reviews.llvm.org/D46860
Reviewed By: samparker
llvm-svn: 332809
The evaluator goes through BB and creates global vars as temporary values to evaluate
results of LLVM instructions. It creates undef for alloca, however it assumes alloca
in addr space 0. If the next instruction is addrspace cast to 0, then we get an invalid
cast instruction.
This patch let the temp global var have an address space matching alloca addr space,
so that the valuation can be done.
Differential Revision: https://reviews.llvm.org/D47081
llvm-svn: 332794
Summary:
Floating point division by zero or even undef does not have undefined
behavior and may occur due to optimizations.
Fixes https://bugs.llvm.org/show_bug.cgi?id=37523.
Reviewers: kcc
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D47085
llvm-svn: 332761
The introduced problem is:
llvm.src/lib/Transforms/Vectorize/VPlanVerifier.cpp:29:13: error: unused function 'hasDuplicates' [-Werror,-Wunused-function]
static bool hasDuplicates(const SmallVectorImpl<VPBlockBase *> &VPBlockVec) {
^
llvm-svn: 332747
Summary:
Fix a case where FoldBranchToCommonDest() would bail out from doing CSE
when encountering a debug intrinsic. Handle that by skipping past the
debug intrinsics.
Also, as a minor refactoring, rename checkCSEInPredecessor() to
tryCSEWithPredecessor() to make it a bit more clear that the function
may remove instructions.
Reviewers: fhahn, craig.topper, dblaikie, xbolva00
Reviewed By: fhahn, xbolva00
Subscribers: vsk, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D46635
llvm-svn: 332698
1. Define Myriad-specific ASan constants.
2. Add code to generate an outer loop that checks that the address is
in DRAM range, and strip the cache bit from the address. The
former is required because Myriad has no memory protection, and it
is up to the instrumentation to range-check before using it to
index into the shadow memory.
3. Do not add an unreachable instruction after the error reporting
function; on Myriad such function may return if the run-time has
not been initialized.
4. Add a test.
Differential Revision: https://reviews.llvm.org/D46451
llvm-svn: 332692
Summary:
- Add wasm personality function
- Re-categorize the existing `isFuncletEHPersonality()` function into
two different functions: `isFuncletEHPersonality()` and
`isScopedEHPersonality(). This becomes necessary as wasm EH uses scoped
EH instructions (catchswitch, catchpad/ret, and cleanuppad/ret) but not
outlined funclets.
- Changed some callsites of `isFuncletEHPersonality()` to
`isScopedEHPersonality()` if they are related to scoped EH IR-level
stuff.
Reviewers: majnemer, dschuff, rnk
Subscribers: jfb, sbc100, jgravelle-google, eraman, JDevlieghere, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D45559
llvm-svn: 332667
Patch #3 from VPlan Outer Loop Vectorization Patch Series #1
(RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html).
Expected to be NFC for the current inner loop vectorization path. It
introduces the basic algorithm to build the VPlan plain CFG (single-level
CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization
path using VPInstructions. It includes:
- VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now).
- VPlanVerifier: Main class with utilities to check the consistency of a H-CFG.
- VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan.
Reviewers: rengolin, fhahn, mkuper, mssimpso, a.elovikov, hfinkel, aprantl.
Differential Revision: https://reviews.llvm.org/D44338
llvm-svn: 332654
According to alive this is valid. I'm hoping to use this to make an assumption that the sign bit is zero after this sequence. The only way it wouldn't be is if the input was INT__MIN, but by preserving the flags we can make doing this to INT_MIN UB.
The nuw flags is weird because it creates such a contradiction that the original number would have to be positive meaning we could remove the select entirely, but we don't get that far.
Differential Revision: https://reviews.llvm.org/D46988
llvm-svn: 332623
entries to reach the target. Since these calls don't require type checks,
we can short-circuit them to their real targets.
Differential Revision: https://reviews.llvm.org/D46326
llvm-svn: 332610
Summary:
The verifier accepts PHI nodes with multiple entries for the
same basic block, as long as the value is the same.
As seen in PR37203, SROA did not handle such PHI nodes properly
when speculating loads over the PHI, since it inserted multiple
loads in the predecessor block and changed the PHI into having
multiple entries for the same basic block, but with different
values.
This patch teaches SROA to reuse the same speculated load for
each PHI duplicate entry in such situations.
Resolves: https://bugs.llvm.org/show_bug.cgi?id=37203
Reviewers: uabelho, chandlerc, hfinkel, bkramer, efriedma
Reviewed By: efriedma
Subscribers: dberlin, efriedma, llvm-commits
Differential Revision: https://reviews.llvm.org/D46426
llvm-svn: 332577
The current integer widening does not support rewriting partial split slices in rewriteIntegerStore (and rewriteIntegerLoad).
This patch adds explicit checks for this case in isIntegerWideningViableForSlice.
Before r322533, splitting is allowed only for the whole-alloca slice and hence the above case is implicitly rejected by another check `if (DL.getTypeStoreSize(ValueTy) > Size)` because whole-alloca slice is larger than the partition.
Differential Revision: https://reviews.llvm.org/D46750
llvm-svn: 332575
r332057 introduced distance() for ranges. Based on post-commit feedback,
this renames distance() to size(). The new size() is also only enabled
when the operation is O(1).
Differential Revision: https://reviews.llvm.org/D46976
llvm-svn: 332551
The canonicalization was restricted to shuffle masks with
a 1-to-1 mapping to the constant vector, but that disqualifies
the common splat pattern. This is part of solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463
llvm-svn: 332479
Summary: If file stream arg is not captured and source is fopen, we could replace IO calls by unlocked IO ("_unlocked" function variants) to gain better speed,
Reviewers: efriedma, RKSimon, spatel, sanjoy, hfinkel, majnemer, lebedev.ri, rja
Reviewed By: rja
Subscribers: rja, srhines, efriedma, lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D45736
llvm-svn: 332452
So that it can be shared with other passes that may end up doing the same
thing.
Differential Revision: https://reviews.llvm.org/D45874
llvm-svn: 332450
A catchswitch must be the only non-phi instruction in its basic block;
attempting to move a retain or release into a catchswitch basic block
will result in invalid IR. Explicitly mark a CFG hazard in this case to
prevent the code motion.
Differential Revision: https://reviews.llvm.org/D46482
llvm-svn: 332430
Author: Samuel Pitoiset
Without this patch, it appears to me that we are selecting
the wrong operand when inverting conditions. In the attached
test, it will select %tmp3 instead of %tmp4. To fix it, just
use 'A' as everywhere.
This fixes a regression introduced by
"[PatternMatch] define m_Not using m_Xor and cst_pred_ty"
https://reviews.llvm.org/D46351
llvm-svn: 332403
When two interposable functions are merged, we cannot replace
uses and have to emit calls to a common internal function. However,
writeThunk() will not actually emit a thunk if the function is too
small. This leaves us in a broken state where mergeTwoFunctions
already rewired the functions, but writeThunk doesn't do anything.
This patch changes the implementation so that:
* writeThunk() does just that.
* The direct replacement of calls is moved into mergeTwoFunctions()
into the non-interposable case only.
* isThunkProfitable() is extracted and will be called for
the non-iterposable case always, and in the interposable case
only if uses are still left after replacement.
This issue has been introduced in https://reviews.llvm.org/D34806,
where the code for checking thunk profitability has been moved.
Differential Revision: https://reviews.llvm.org/D46804
Reviewed By: whitequark
llvm-svn: 332342
Summary:
Part of the InstCombine code for simplifying GEPs looks through
addrspacecasts. However, this was done by updating a variable
also used by the next transformation, for marking GEPs as
inbounds. This led to replacing a GEP with a similar instruction
in a different addrspace, which caused an assertion failure in RAUW.
This caused julia issue https://github.com/JuliaLang/julia/issues/27055
Patch by Jeff Bezanson <jeff@juliacomputing.com>
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D46722
llvm-svn: 332302
The DEBUG() macro is very generic so it might clash with other projects.
The renaming was done as follows:
- git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g'
- git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM
- Manual change to APInt
- Manually chage DOCS as regex doesn't match it.
In the transition period the DEBUG() macro is still present and aliased
to the LLVM_DEBUG() one.
Differential Revision: https://reviews.llvm.org/D43624
llvm-svn: 332240
This is a CodeExtractor improvement which adds support for extracting blocks
which have exception handling constructs if that is legal to do. CodeExtractor
performs validation checks to ensure that extraction is legal when it finds
invoke instructions or EH pads (landingpad, catchswitch, or cleanuppad) in
blocks to be extracted.
I have also added an option to allow extraction of blocks with alloca
instructions, but no validation is done for allocas. CodeExtractor caller has
to validate it himself before allowing alloca instructions to be extracted.
By default allocas are still not allowed in extraction blocks.
Differential Revision: https://reviews.llvm.org/D45904
llvm-svn: 332151
Let separate-const-offset-from-gep pass handle trunc() when it calculates
constant offset relative to base. The pass itself may insert trunc()
instructions when it canonicalises array indices to pointer-size integers
and needs to handle trunc() in order to evaluate the offset.
Differential Revision: https://reviews.llvm.org/D46732
llvm-svn: 332142
Summary:
This change adds handling of the atomic memset intrinsic to the
code path that simplifies the regular memset. In practice this means
that we will now also expand a small constant-length atomic memset
into a single unordered atomic store.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46660
llvm-svn: 332132
Phi nodes can reside in live blocks but one of their incoming
arguments can come from a dead block. Dead blocks and reassociate
don't play nice together. In fact, reassociate performs an RPO
as a first step to avoid processing dead blocks.
The reason why Reassociate might not fixpoint when examining
dead blocks is that the following:
%xor0 = xor i16 %xor1, undef
%xor1 = xor i16 %xor0, undef
is perfectly valid LLVM IR (if it appears in a dead block),
so the worklist algorithm keeps pushing the two instructions for
reexamination. Note that this is not Reassociate fault, at least
not entirely. It's llvm that has a weird definition of dominance.
Fixes PR37390.
llvm-svn: 332100
Summary:
This change reworks the handling of atomic memcpy within the instcombine pass.
Previously, a constant length atomic memcpy would be lowered into loads & stores
as long as no more than 16 load/store pairs are created. This is quite different
from the lowering done for a non-atomic memcpy; which only ever lowers into a single
load/store pair of no more than 8 bytes. Larger constant-sized memcpy calls are
expanded to load/stores in later passes, such as SelectionDAG lowering.
In this change the behaviour for atomic memcpy is unified with non-atomic memcpy;
atomic memcpy is now treated in the same was as non-atomic memcpy has always been.
We leave it to later passes to lower longer-length atomic memcpy calls.
Due to the structure of the pass's handling of memtransfer intrinsics, this change
also gives us handling of atomic memmove that we did not previously have.
Reviewers: apilipenko, skatkov, mkazantsev, anna, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D46658
llvm-svn: 332093
Summary:
https://bugs.llvm.org/show_bug.cgi?id=34897 demonstrates an incorrect
coroutine frame allocation elision in the coro-elide pass. The elision
is performed on the basis that the SSA variables from all llvm.coro.begin
are directly referenced in subsequent llvm.coro.destroy instructions.
However, this ignores the fact that the function may exit through paths
that do not run these destroy instructions. In the sample program from
PR34897, for example, the llvm.coro.destroy instruction is only
executed in exception handling code. When the coroutine function exits
normally, llvm.coro.destroy is not called. Eliding the allocation in
this case causes a subsequent reference to the coroutine handle from
outside of the function to access freed memory.
To fix the issue, when finding an llvm.coro.destroy for each llvm.coro.begin,
only consider llvm.coro.destroy that are executed along non-exceptional paths.
Test Plan:
1. Download the sample program from
https://bugs.llvm.org/show_bug.cgi?id=34897, compile it with
`clang++ -fcoroutines-ts -stdlib=libc++ -std=c++1z -O2`, and run it.
It should print `"run1\ncheck1\nrun2\ncheck2"` and then exit
successfully.
2. Compile https://godbolt.org/g/mCKfnr and confirm it is still
optimized to a single instruction, 'return 1190'.
3. `check-llvm`
Reviewers: rsmith, GorNishanov, eric_niebler
Reviewed By: GorNishanov
Subscribers: andrewrk, lewissbaker, EricWF, llvm-commits
Differential Revision: https://reviews.llvm.org/D43242
llvm-svn: 332077