This is an extension of the handling for unary intrinsics and
follows the logic that we use for binary ops.
We don't canonicalize to min/max intrinsics yet, but this might
help unlock other folds seen in D98152.
This patch updates VPWidenPHI recipes for first-order recurrences to
also track the incoming value from the back-edge. Similar to D99294,
which did the same for reductions.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D104197
This fixes a bug at LibCallSimplifier::optimizeMemChr which does the following transformation:
```
// memchr("\r\n", C, 2) != nullptr -> (1 << C & ((1 << '\r') | (1 << '\n')))
// != 0
// after bounds check.
```
As written above, a bounds check on C (whether it is less than integer bitwidth) is done before doing `1 << C` otherwise 1 << C will overflow.
If the bounds check is false, the result of (1 << C & ...) must not be used at all, otherwise the result of shift (which is poison) will contaminate the whole results.
A correct way to encode this is `select i1 (bounds check), (1 << C & ...), false` because select does not allow the unused operand to contaminate the result.
However, this optimization was introducing `and (bounds check), (1 << C & ...)` which cannot do that.
The bug was found from compilation of this C++ code: https://reviews.llvm.org/rG2fd3037ac615#1007197
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104901
The metadata added in D102361 introduces a module flag that we can check
to determine if the module was compiled with `-fopenmp` enables. We can
now check for the precense of this instead of scanning the call graph
for OpenMP runtime functions.
Depends on D102361
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102423
Types should be defined in function scope instead of a local lexical scope. Field types should be defined inside in its parent type scope.
We were seeing a type defined in a local scope causing trouble to the dwarf emitter where a context is required to be a funciton scope, a namespace or a global scope.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D104937
If we have a umul.with.overflow where the multiply result is not used and one of the operands is a constant, we can perform the overflow check cheaper with a comparison then by performing the multiply and extracting the overflow flag.
(Noticed when looking at the conditions SCEV emits for overflow checks.)
Differential Revision: https://reviews.llvm.org/D104665
Rather than relying on pointer type equality (which, for a change,
is silently incorrect with opaque pointers) check that the GEP
source element types match.
WPD currently assumes that there is a one to one correspondence between
type test assume sequences and virtual calls. However, with
-fstrict-vtable-pointers this may not be true. This ends up causing
crashes when we try to optimize a virtual call more than once (
applyUniformRetValOpt()/applyUniqueRetValOpt()/applyVirtualConstProp()/applySingleImplDevirt()).
applySingleImplDevirt() actually didn't previous crash because it would
replace the devirtualized call with the same direct call. Adding an
assert that the call is indirect causes the corresponding test to crash
with the rest of the patch.
This makes Chrome successfully build with -fstrict-vtable-pointers + WPD.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D104798
The whole transform can be dropped once we have fully transitioned
to opaque pointers (as it's purpose is to remove no-op pointer
casts). For now, make sure that it handles opaque pointers correctly.
- When emitting libcalls, do not only pass the calling convention from the
function prototype but also the attributes.
- Do not pass attributes from e.g. libc memcpy to llvm.memcpy.
Review: Reid Kleckner, Eli Friedman, Arthur Eubanks
Differential Revision: https://reviews.llvm.org/D103992
Similar to what we already do for `ret` terminators.
As noted by @rnk, clang seems to already generate a single `ret`/`resume`,
so this isn't likely to cause widespread changes.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104849
This patch enables the salvaging of debug values that may be calculated
from more than one SSA value, such as with binary operators that do not
use a constant argument. The actual functionality for this behaviour is
added in a previous commit (c7270567), but with the ability to actually
emit the resulting debug values switched off.
The reason for this is that the prior patch has been reverted several
times due to issues discovered downstream, some time after the actual
landing of the patch. The patch in question is rather large and touches
several widely used header files, and all issues discovered are more
related to the handling of variadic debug values as a whole rather than
the details of the patch itself. Therefore, to minimize the build time
impact and risk of conflicts involved in any potential future
revert/reapply of that patch, this significantly smaller patch (that
touches no header files) will instead be used as the capstone to enable
variadic debug value salvaging.
The review linked to this patch is mostly implemented by the previous
commit, c7270567, but also contains the changes in this patch.
Differential Revision: https://reviews.llvm.org/D91722
Based ontop of D104598, which is a NFCI-ish refactoring.
Here, a restriction, that only empty blocks can be merged, is lifted.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104597
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and renable this feature do not risk
causing conflicts.
Differential Revision: https://reviews.llvm.org/D91722
This reverts commit 386b66b2fc.
This enable no_sanitize C++ attribute to exclude globals from hwasan
testing, and automatically excludes other sanitizers' globals (such as
ubsan location descriptors).
Differential Revision: https://reviews.llvm.org/D104825
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.
This relands commit 4474958d3a
with a fix to a use-of-uninitialized-value error that tripped
MemorySanitizer.
Link: https://github.com/ClangBuiltLinux/linux/issues/1354
Reviewed By: nickdesaulniers, pcc
Differential Revision: https://reviews.llvm.org/D104058
This attribute uses Attributor's internal 'optimistic' call graph
information to answer queries about function call reachability.
Functions can become reachable over time as new call edges are
discovered.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D104599
Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.
The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.
Differential Revision: https://reviews.llvm.org/D104784
If a ctlz operation is performed on higher datatype and then
downcasted, then this can be optimized by doing a ctlz operation
on a lower datatype and adding the difference bitsize to the result
of ctlz to provide the same output:
https://alive2.llvm.org/ce/z/8uup9M
The original problem is shown in
https://llvm.org/PR50173
Differential Revision: https://reviews.llvm.org/D103788
This is part of improving floating-point patterns seen in:
https://llvm.org/PR39480
We don't require any FMF because the 2 potential corner cases
(-0.0 and NaN) are correctly handled without FMF:
1. -0.0 is treated as strictly less than +0.0 with
maximum/minimum, so fabs/fneg work as expected.
2. +/- 0.0 with maxnum/minnum is indeterminate, so
transforming to fabs/fneg is more defined.
3. The sign of a NaN may be altered by this transform,
but that is allowed in the default FP environment.
If there are FMF, they are propagated from the min/max call to
one or both new operands which seems to agree with Alive2:
https://alive2.llvm.org/ce/z/bem_xC
This changes the approach taken to tail-merge the blocks
to always create a new block instead of trying to reuse some block,
and generalizes it to support dealing not with just the `ret` in the future.
This effectively lifts the CallBr restriction, although this isn't really intentional.
That is the only non-NFC change here, i'm not sure if it's reasonable/feasible to temporarily retain it.
Other restrictions of the transform remain.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104598
With regards to overrunning, the langref (llvm/docs/LangRef.rst)
specifies:
(llvm.experimental.vector.insert)
Elements ``idx`` through (``idx`` + num_elements(``subvec``) - 1)
must be valid ``vec`` indices. If this condition cannot be determined
statically but is false at runtime, then the result vector is
undefined.
(llvm.experimental.vector.extract)
Elements ``idx`` through (``idx`` + num_elements(result_type) - 1)
must be valid vector indices. If this condition cannot be determined
statically but is false at runtime, then the result vector is
undefined.
For the non-mixed cases (e.g. inserting/extracting a scalable into/from
another scalable, or inserting/extracting a fixed into/from another
fixed), it is possible to statically check whether or not the above
conditions are met. This was previously missing from the verifier, and
if the conditions were found to be false, the result of the
insertion/extraction would be replaced with an undef.
With regards to invalid indices, the langref (llvm/docs/LangRef.rst)
specifies:
(llvm.experimental.vector.insert)
``idx`` represents the starting element number at which ``subvec``
will be inserted. ``idx`` must be a constant multiple of
``subvec``'s known minimum vector length.
(llvm.experimental.vector.extract)
The ``idx`` specifies the starting element number within ``vec``
from which a subvector is extracted. ``idx`` must be a constant
multiple of the known-minimum vector length of the result type.
Similarly, these conditions were not previously enforced in the
verifier. In some circumstances, invalid indices were permitted
silently, and in other circumstances, an undef was spawned where a
verifier error would have been preferred.
This commit adds verifier checks to enforce the constraints above.
Differential Revision: https://reviews.llvm.org/D104468
Follow-up on Roman's idea expressed in D103959.
- If a Phi has undefined inputs from live blocks:
- and no other inputs, assume it is undef itself;
- and exactly one non-undef input, we can assume that all undefs are equal to this input.
Differential Revision: https://reviews.llvm.org/D104618
Reviewed By: lebedev.ri, nikic
Zero factor leads to division by zero and failure of corresponding
assert as shown in PR50765. We should filter out such factors.
Differential Revision: https://reviews.llvm.org/D104702
Reviewed By: huihuiz, reames
This patch makes PriorityInlineOrder lazily updated.
The PriorityInlineOrder would lazily update the desirability of a call site if it's decreasing.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D104654
This patch fixes a problem with the AAExecutionDomain attributor not
checking if it is in a valid state. This can cause it to incorrectly
return that a block is executed in a single threaded context after the
attributor failed for any reason.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D103186
After landing the globalization optimizations, the precense of globalization on
the device that was not put in shared or stack memory is a failed optimization
with performance consequences so it should indicate a missed remark.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D104735
When the load type is changed to ptr, we need the load pointer type
to also be ptr, because it's not allowed to create a pointer to an
opaque pointer. This is achieved by adjusting the getPointerTo() API
to return an opaque pointer for an opaque pointer base type.
Differential Revision: https://reviews.llvm.org/D104718
Right now the Attributor defaults to 32 fixed point iterations unless it is set
explicitly by a command line flag. This patch allows this to be configured when
the attributor instance is created. The maximum is then increased in OpenMPOpt
if the target is a kernel. This is because the globalization analysis can result
in larger iteration counts due to many dependent instances running at once.
Depends on D102444
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D104416
Summary:
This patch adds support for the Attributor to emit remarks on behalf of some
other pass. The attributor can now optionally take a callback function that
returns an OptimizationRemarkEmitter object when given a Function pointer. If
this is availible then a remark will be emitted for the corresponding pass
name.
Depends on D102197
Reviewed By: sstefan1 thegameg
Differential Revision: https://reviews.llvm.org/D102444
Summary:
The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087.
Depends on D97818
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102197
Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D102824
Summary:
The changes introduced in D97680 create a simpler interface to code that needs
to be globalized. This interface is used to simplify the globalization calls in
the middle end. We can check any globalization call that is only called by a
single thread in the team and replace it with a static shared memory buffer.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D97818
This adds support for addrspace casts involving opaque pointers to
InstCombine, as well as the isEliminableCastPair() helper
(otherwise the assertion failure would just move there).
Add PointerType::hasSameElementTypeAs() to hide the element type
details.
Differential Revision: https://reviews.llvm.org/D104668
There was a bug from cost calculation for partially invariant unswitch.
The costs of non-duplicated blocks are substracted from the total LoopCost, so
anything that is duplicated should not be counted.
Differential Revision: https://reviews.llvm.org/D103816
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and __kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.
This patch replaces this functinality with a single allocation command, effectively mimicing an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D97680
Patch was reverted due to a bug that existed before it and was exposed
by it. Returning after the underlying bug has been fixed.
Differential Revision: https://reviews.llvm.org/D103959
Reapplied without changes -- this was reverted together with an
underlying patch.
-----
Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
Relative to the original patch, an InstCombine test has been
added to show a previously missed pattern, and the Coroutine
test that resulted in the revert has been regenerated.
-----
Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. This previously
happened for the isSized() check, which skipped folds like
distributing a bitcast over a select.
As these are no longer passed to UnrollLoop(), there is no need to
modify them in computeUnrollCount(). Make them non-reference parameters.
Differential Revision: https://reviews.llvm.org/D104590
Perform better analysis when trying to vectorize PHIs.
1. Do not try to vectorize vector PHIs.
2. Do deeper analysis for more profitable nodes for the vectorization.
Before we just tried to vectorize the PHIs of the same type. Patch
improves this and tries to vectorize PHIs with incoming values which
come from the same basic block, have the same and/or alternative
opcodes.
It allows to save the compile time and provides better vectorization
results in general.
Part of D57059.
Differential Revision: https://reviews.llvm.org/D103638
Bitcasts having opaque pointer source or result type cannot be
converted into a zero-index GEP, GEP source and result types
always have the same opaque-ness.
Move this into a separate function, to make sure that early
returns do not accidentally skip other transforms. There is
already one isSized() check that could run into this issue,
thus this change is not strictly NFC.
It's not possible to bitcast between different address spaces,
and this is ensured by the IR verifier. As such, this bitcast to
addrspacecast canonicalization can never be hit.
This reverts commit bb1dc876eb.
This patch causes an assertion failure when building an arm64 defconfig
Linux kernel.
See https://reviews.llvm.org/D103959 for a link to the original bug
report and a reduced reproducer.
This is no outwardly-visible-difference-intended,
but it is obviously better to have all transforms
for an intrinsic housed together since we already
have helper functions in place.
It is also potentially more efficient to zap a
simple pattern match before trying to do expensive
computeKnownBits() calls.
Use poison instead of undef for cases dealing with unreachable
code. This still leaves the more interesting case of "load from
uninitialized memory" as undef.
getSpecializationCost was returning INT_MAX for a case when specialisation
shouldn't happen, but this wasn't properly checked if specialisation was
forced.
Differential Revision: https://reviews.llvm.org/D104461
This patch lifts the requirement to have the only incoming live block
for Phis. There can be multiple live blocks if the same value comes to
phi from all of them.
Differential Revision: https://reviews.llvm.org/D103959
Reviewed By: nikic, lebedev.ri
This patch updates InstCombine to use poison constant to represent the resulting value of (either semantically or syntactically) unreachable instrs, or a don't-care value of an unreachable store instruction.
This allows more aggressive folding of unused results, as shown in llvm/test/Transforms/InstCombine/getelementptr.ll .
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104602
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.
The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.
Differential Revision: https://reviews.llvm.org/D102982
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.
The test case this helps is from code like this, which can come up in
certain matrix operations:
for(i=..)
dst[i] = 0;
for(j=..)
dst[i] += src[i*n+j];
After LICM, this becomes:
for(i=..)
dst[i] = 0;
sum = 0;
for(j=..)
sum += src[i*n+j];
dst[i] = sum;
The first store is dead, and with this patch is now removed.
Differntial Revision: https://reviews.llvm.org/D100464
This problem is exposed by D104598, after it tail-merges `ret` in
`@test_inline_constraint_S_label`, the verifier would start complaining
`invalid operand for inline asm constraint 'S'`.
Essentially, taking address of a block is mismodelled in IR.
It should probably be an explicit instruction, a first one in block,
that isn't identical to any other instruction of the same type,
so that it can't be hoisted.
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.
Differential Revision: https://reviews.llvm.org/D104487
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D104028
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:
* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code
The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.
Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103717
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.
I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104477
Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and
debug code. For debug code, print information about all exits.
For optimization remarks, only include the unroll count and the
type of unroll (complete, partial or runtime), but omit detailed
information about exit folding, now that more than one exit may
be folded.
Differential Revision: https://reviews.llvm.org/D104482
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.
Reviewed By: gbalats, stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104481
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topoligical
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach header via the latch, then the backedge can be broken.
This implementation uses InstSimplify. SCEV version was rejected due to high
compile time impact.
Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: nikic
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.
Differential Revision: https://reviews.llvm.org/D104193
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D104028
This looks like not a practical pattern in our codebase (it could fail
in some sandbox environement).
Instead we print it via standard output, and it is controled by the
-attributor-print-call-graph, this follows a similiar pattern of attributor-print-dep.
InstCombine didn't perform (sext bool X) * (sext bool X) --> zext (and X, X) which can result in just (zext X). The patch adds regression tests to check this transformation and adds a check for equality of mul's operands for that case.
Differential Revision: https://reviews.llvm.org/D104193
This reverts commit 76d0747e08.
If a group has `__llvm_prf_vals` due to static value profiler counters
(`NS!=0`), we cannot make `__llvm_prf_data` private, because a prevailing text
section may reference `__llvm_prf_data` and will cause a `relocation refers to a
discarded section` linker error.
Note: while a `__profc_` group is non-prevailing, it may be referenced by a
prevailing text section due to inlining.
```
group section [ 66] `.group' [__profc__ZN5clang20EmitClangDeclContextERN4llvm12RecordKeeperERNS0_11raw_ostreamE] contains 4 sections:
[Index] Name
[ 67] __llvm_prf_cnts
[ 68] __llvm_prf_vals
[ 69] __llvm_prf_data
[ 70] .rela__llvm_prf_data
```
This should fix PR50683. The wrong assumption was that we
could always know what the callee is when we replace a call site
argument with undef. We wanted to know that to remove the `noundef`
that might be attached to the argument. Since no callee means we
did the propagation on the caller site, there is no need to remove
an attribute. It is only needed if we replace all uses and therefore
pass `undef` instead of the value that was passed in otherwise.
To allow outside AAs that simplify values we need to ensure all value
simplification goes through the Attributor, not AAValueSimplify (or any
of the other AAs we have already like AAPotentialValues). This patch
also introduces an interface for the outside AAs to register
simplification callbacks for an IRPosition. To make this work as
expected we have to pass IRPositions instead of Values in
AAValueSimplify, which makes sense by itself.
If we simplify values we sometimes end up with type mismatches. If the
value is a constant we can often cast it though to still allow
propagation. The logic is now put into a helper and it replaces some
ad hoc things we did before.
This also introduces the AA namespace for abstract attribute related
functions and types.
Differential Revision: https://reviews.llvm.org/D103856
If the target stack is not accessible between different running
"threads" we have to make sure not to create allocas for mallocs
that might be used by multiple "threads". The "use check" is
sufficient to prevent this but if we apply the "free check" we have
to make sure the pointer is not communicated to others before
the free is reached.
Differential Revision: https://reviews.llvm.org/D98608
The initial use for AAExecutionDomain was to determine if a single
thread executes a block. While this is sometimes informative most
of the time, and for other reasons, we actually want to know if it
is the "initial thread". Thus, the thread that started execution on
the current device. The deduction needs to be adjusted in a follow
up as the methods we use right not are looking for the OpenMP thread
id which is resets whenever a thread enters a parallel region. What
we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and
equivalent functions.
We invalidated AAReachabilityImpl directly which is not helpful and
confusing as we still used it regardless. We now avoid invalidating it
(not needed anyway) and add checks for the state. This has by itself no
actual effect but prepares for later extensions.
The current naming scheme adds the `dfs$` prefix to all
DFSan-instrumented functions. This breaks mangling and prevents stack
trace printers and other tools from automatically demangling function
names.
This new naming scheme is mangling-compatible, with the `.dfsan`
suffix being a vendor-specific suffix:
https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling-structure
With this fix, demangling utils would work out-of-the-box.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104494
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=48857.
Previous attempts can be found in D104007 and D101980.
A lot of discussions can be found in those two patches.
To summarize the bug:
When Clang emits IR for coroutines, the first thing it does is to make a copy of every argument to the local stack, so that uses of the arguments in the function will all refer to the local copies instead of the arguments directly.
However, in some cases we find that arguments are still directly used:
When Clang emits IR for a function that has pass-by-value arguments, sometimes it emits an argument with byval attribute. A byval attribute is considered to be local to the function (just like alloca) and hence it can be easily determined that it does not alias other values. If in the IR there exists a memcpy from a byval argument to a local alloca, and then from that local alloca to another alloca, MemCpyOpt will optimize out the first memcpy because byval argument's content will not change. This causes issues because after a coroutine suspension, the byval argument may die outside of the function, and latter uses will lead to memory use-after-free.
This is only a problem for arguments with either byval attribute or noalias attribute, because only these two kinds are considered local. Arguments without these two attributes will be considered to alias coro_suspend and hence we won't have this problem. So we need to be able to deal with these two attributes in coroutines properly.
For noalias arguments, since coro_suspend may potentially change the value of any argument outside of the function, we simply shouldn't mark any argument in a coroutiune as noalias. This can be taken care of in CoroEarly pass.
For byval arguments, if such an argument needs to live across suspensions, we will have to copy their value content to the frame, not just the pointer.
Differential Revision: https://reviews.llvm.org/D104184
This attribute computes the optimistic live call edges using the attributor
liveness information. This attribute will be used for deriving a
inter-procedural function reachability attribute.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D104059
For ELF, since all counters/data are in a section group (either `comdat any` or
`comdat noduplicates`), and the signature for `comdat any` is `__profc_`, the
D1003372 optimization prerequisite (linker GC cannot discard data variables
while the text section is retained) is always satisified, we can make __profd_
unconditionally private.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103717
This pass emits a floating point compare and a conditional branch,
but if strictfp is enabled we don't emit a constrained compare
intrinsic.
The backend also won't expand the readonly sqrt call this pass inserts
to a sqrt instruction under strictfp. So we end up with 2 libcalls as
seen here. https://godbolt.org/z/oax5zMEWd
Fix these things by disabling the pass.
Differential Revision: https://reviews.llvm.org/D104479
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.
Reviewed By: gbalats, stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104481
Fold all exits based on known trip count/multiple information from
SCEV. Previously only the latch exit or the single exit were folded.
This doesn't yet eliminate ULO.TripCount and ULO.TripMultiple
entirely: They're still used to a) decide whether runtime unrolling
should be performed and b) for ORE remarks. However, the core
unrolling logic is independent of them now.
Differential Revision: https://reviews.llvm.org/D104203
This really isn't talking about vectors in general,
but only about either fixed or scalable vectors,
and it's pretty confusing to see it state
that there aren't any vectors :)
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered as Eli say when sext(idx) * ElementSize overflows.
```
// assume that GV is an array of 4-byte elements
GEP = gep GV, 0, Idx // this is accessing Idx * 4
L = load GEP
ICI = icmp eq L, value
=>
ICI = icmp eq Idx, NewIdx
```
The foldCmpLoadFromIndexedGlobal function simplifies GEP+load operation to icmp.
And there is a problem because Idx * ElementSize can overflow.
Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.
This problem can be solved by masking off (trailing zeros of ElementSize) bits from Idx.
```
...
=>
Idx' = and Idx, 0x3F..FF
ICI = icmp eq Idx', NewIdx
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D99481
At the moment, we create insertelement instructions directly after
LastInst when inserting scalar values in a vector in
VPTransformState::get.
This results in invalid IR when LastInst is a phi, followed by another
phi. In that case, the new instructions should be inserted just after
the last PHI node in the block.
At the moment, I don't think the problematic case can be triggered, but
it can happen once predicate regions are merged and multiple
VPredInstPHI recipes are in the same block (D100260).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D104188
This can be seen as a follow up to commit 0ee439b705,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seem to have been missing in that patch was to make
sure that the rest of LLVM also handle that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.
Differential Revision: https://reviews.llvm.org/D99439
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.
This variable might be GC'ed by the linker since it is not explicitly
reference. I initially added the var to the use list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted in
the end of pass pipeline.
Here I add the variable to the use list in IR level's AddDiscriminators
pass. The machine level code is still keep in the case IR's
AddDiscriminators is not invoked. If this is the case, this just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the emit.
Differential Revision: https://reviews.llvm.org/D103988
The original implementation calculating UserBonus uses operator ^, which means XOR in C++
language.
At the first glance of reviewing, I thought it should be power, my bad.
It doesn't make sense to use XOR here. So I believe it should be a
carelessness as I made.
Test Plan: check-all
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D104282
Addition of this pass has been botched.
There is no particular reason why it had to be sold as an inseparable part
of new-pm transition. It was added when old-pm was still the default,
and very *very* few users were actually tracking new-pm,
so it's effects weren't measured.
Which means, some of the turnoil of the new-pm transition
are actually likely regressions due to this pass.
Likewise, there has been a number of post-commit feedback
(post new-pm switch), namely
* https://reviews.llvm.org/D37467#2787157 (regresses HW-loops)
* https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before)
* https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata)
and in the half year past, the pass authors (google) still haven't found time to respond to any of that.
Hereby it is proposed to backout the pass from the pipeline,
until someone who cares about it can address the issues reported,
and properly start the process of adding a new pass into the pipeline,
with proper performance evaluation.
Furthermore, neither google nor facebook reports any perf changes
from this change, so i'm dropping the pass completely.
It can always be re-reverted should/if anyone want to pick it up again.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D104099
This adds support for functions outlined by the IR Outliner to be
recognized by the debugger. The expected behavior is that it will
skip over the instructions included in that section. This is due to the
fact that we can not say which of the original locations the
instructions originated from.
These functions will show up in the call stack, but you cannot step
through them.
Reviewers: paquette, vsk, djtodoro
Differential Revision: https://reviews.llvm.org/D87302
Loops with irreducible cycles may loop infinitely. Those cannot be
removed, unless the loop/function is marked as mustprogress.
Also discussed in D103382.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104238
This commit mostly just replaces bad uses of `NDEBUG` with uses of
`LLVM_ENABLE_ABI_BREAKING_CHANGES` - the safe way to include ABI
breaking changes (normally extra struct elements in headers).
Differential Revision: https://reviews.llvm.org/D104216
Currently, Loop strengh reduce is not handling loops with scalable stride very well.
Take loop vectorized with scalable vector type <vscale x 8 x i16> for instance,
(refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added).
Memory accesses are incremented by "16*vscale", while induction variable is incremented
by "8*vscale". The scaling factor "2" needs to be extracted to build candidate formula
i.e., "reg(%in) + 2*reg({0,+,(8 * %vscale)}". So that addrec register reg({0,+,(8*vscale)})
can be reused among Address and ICmpZero LSRUses to enable optimal solution selection.
This patch allow LSR getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y",
and pull out "C1 /s C2" as scaling factor whenever possible. Without this change, LSR
is missing candidate formula with proper scaled factor to leverage target scaled-index
addressing mode.
Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable
vector. But allow simple valid scale to pass through.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D103939
We canonicalized to these select patterns (poison-safe logic)
with D101191, so we need to reduce 'not' ops when possible
as we would with 'and'/'or' instructions.
This is shown in a secondary example in:
https://llvm.org/PR50389https://alive2.llvm.org/ce/z/BvsESh
As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.
This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.
Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99173
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50610.
In computed goto pattern, there are usually a list of basic blocks that are all targets of indirectbr instruction, and each basic block also has address taken and stored in a variable.
CHR pass could potentially clone these basic blocks, which would generate a cloned version of the indirectbr and clonved version of all basic blocks in the list.
However these basic blocks will not have their addresses taken and stored anywhere. So latter SimplifyCFG pass will simply remove all tehse cloned basic blocks, resulting in incorrect code.
To fix this, when searching for scopes, we skip scopes that contains BBs with addresses taken.
Added a few test cases.
Reviewed By: aeubanks, wenlei, hoy
Differential Revision: https://reviews.llvm.org/D103867
Also:
- add driver test (fsanitize-use-after-return.c)
- add basic IR test (asan-use-after-return.cpp)
- (NFC) cleaned up logic for generating table of __asan_stack_malloc
depending on flag.
for issue: https://github.com/google/sanitizers/issues/1394
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D104076
It was found by chance revealing discrepancy between comment (few lines above),
the condition and how re-ordering of instruction is done inside the if statement
it guards. The condition was always evaluated to true.
Differential Revision: https://reviews.llvm.org/D104064
Adds the basic instrumentation needed for stack tagging.
Currently does not support stack short granules or TLS stack histories,
since a different code path is followed for the callback instrumentation
we use.
We may simply wait to support these two features until we switch to
a custom calling convention.
Patch By: xiangzhangllvm, morehouse
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D102901
The problematic code pattern in the test is based on:
https://llvm.org/PR50638
If the IfCond is itself the phi that we are trying to remove,
then the loop around line 2835 can end up with something like:
%cmp = select i1 %cmp, i1 false, i1 true
That can then lead to a use-after-free and assert (although
I'm still not seeing that locally in my release + asserts build).
I think this can only happen with unreachable code.
Differential Revision: https://reviews.llvm.org/D104063
<string> is currently the highest impact header in a clang+llvm build:
https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html
One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.
This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.
Differential Revision: https://reviews.llvm.org/D103888
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).
Differential Revision: https://reviews.llvm.org/D104029
This adds a function specialization pass to LLVM. Constant parameters
like function pointers and constant globals are propagated to the callee by
specializing the function.
This is a first version with a number of limitations:
- The pass is off by default, so needs to be enabled on the command line,
- It does not handle specialization of recursive functions,
- It does not yet handle constants and constant ranges,
- Only 1 argument per function is specialised,
- The cost-model could be further looked into, and perhaps related,
- We are not yet caching analysis results.
This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan.
More recently this was also discussed on the list, see:
https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html.
The motivation for this work is that function specialisation often comes up as
a reason for performance differences of generated code between LLVM and GCC,
which has this enabled by default from optimisation level -O3 and up. And while
this certainly helps a few cpu benchmark cases, this also triggers in real
world codes and is thus a generally useful transformation to have in LLVM.
Function specialisation has great potential to increase compile-times and
code-size. The summary from some investigations with this patch is:
- Compile-time increases for short compile jobs is high relatively, but the
increase in absolute numbers still low.
- For longer compile-jobs, the extra compile time is around 1%, and very much
in line with GCC.
- It is difficult to blame one thing for compile-time increases: it looks like
everywhere a little bit more time is spent processing more functions and
instructions.
- But the function specialisation pass itself is not very expensive; it doesn't
show up very high in the profile of the optimisation passes.
The goal of this work is to reach parity with GCC which means that eventually
we would like to get this enabled by default. But first we would like to address
some of the limitations before that.
Differential Revision: https://reviews.llvm.org/D93838
This fixes the concern in single element store scalarization that the
alignment of new store may be larger than it should be. It calculates
the largest alignment if index is constant, and a safe one if not.
Reviewed By: lebedev.ri, spatel
Differential Revision: https://reviews.llvm.org/D103419
First we refactor the code which does no wrapping add sequences
match: we need to allow different operand orders for
the key add instructions involved in the match.
Then we use the refactored code trying 4 variants of matching operands.
Originally the code relied on the fact that the matching operands
of the two last add instructions of memory index calculations
had the same LHS argument. But which operand is the same
in the two instructions is actually not essential, so now we allow
that to be any of LHS or RHS of each of the two instructions.
This increases the chances of vectorization to happen.
Reviewed By: volkan
Differential Revision: https://reviews.llvm.org/D103912
SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store.
Added a LIT test to catch one case.
Patch by Mark Mendell
Differential Revision: https://reviews.llvm.org/D103254
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`), is not expectable, the documentation for this behavior was since removed from the LangRef again, and can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.
The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.
Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D103907
Essentially, the cover function simply combines the loop level check and the function level scope into one call. This simplifies several callers and is (subjectively) less error prone.
There is no need to schedule insertelement instructions. The compiler
did not schedule them before it started support their vectorization and
it should not do it after. We pre-schedule them manually when finding
a build vector sequence.
Disabling scheduling of insertelement instructions improves compile
time and vectorization of the very large basic blocks by saving
scheduling budget for other instructions.
Differential Revision: https://reviews.llvm.org/D104026
```
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis]
for (const auto VF : VFCandidates) {
^
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying
for (const auto VF : VFCandidates) {
^~~~~~~~~~~~~~~
&
1 warning generated.
```
Differential Revision: https://reviews.llvm.org/D103970
This patch allows folding stepvector + extract to the lane when the lane is
lower than the minimum size of the scalable vector. This fold is possible
because lane X of a stepvector is also X!
For instance, extracting element 3 of a <vscale x 4 x i64>stepvector is 3.
Differential Revision: https://reviews.llvm.org/D103153
Summary:
The current implementation of AANoFreeFloating will incorrectly list floating
point loads and stores as may-free. This prevents other attributor instances
like HeapToStack from pushing some allocations to the stack.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D103975
This allows for using the frame record feature (which uses __hwasan_tls)
independently from however the user wants to access the shadow base, which
prior was only usable if shadow wasn't accessed through the TLS variable or ifuncs.
Frame recording can be explicitly set according to ShadowMapping::WithFrameRecord
in ShadowMapping::init. Currently, it is only enabled on Fuchsia and if TLS is
used, so this should mimic the old behavior.
Added an extra case to prologue.ll that covers this new case.
Differential Revision: https://reviews.llvm.org/D103841
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D99435
1. Better sorting of scalars to be gathered. Trying to insert
constants/arguments/instructions-out-of-loop at first and only then
the instructions which are inside the loop. It improves hoisting of
invariant insertelements instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.
Part of D57059.
Differential Revision: https://reviews.llvm.org/D103458
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D99435
As discussed in the post-commit comments for:
3cdd05e519
It seems to be safe to propagate all flags from the final fneg
except for 'nsz' to the new select:
https://alive2.llvm.org/ce/z/J_APDc
nsz has unique FMF semantics: it is not poison, it is only
"insignificant" in the calculation according to the LangRef.
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.
This reverts commit 36ec97f76a.
The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.
Reverting to unbreak people's builds until that can be addressed.
This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d8.
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.
Also, a crash problem on legacy pass manager is fixed.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D99149
If the `-enable-strict-reductions` flag is set to true, then currently we will
always choose to vectorize the loop with strict in-order reductions. This is
not necessary where we allow the reordering of FP operations, such as
when loop hints are passed via metadata.
This patch moves useOrderedReductions so that we can also check whether
loop hints allow reordering, in which case we should use the default
behaviour of vectorizing with unordered reductions.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D103814
Complete support for fast8:
- amend shadow size and mapping in runtime
- remove fast16 mode and -dfsan-fast-16-labels flag
- remove legacy mode and make fast8 mode the default
- remove dfsan-fast-8-labels flag
- remove functions in dfsan interface only applicable to legacy
- remove legacy-related instrumentation code and tests
- update documentation.
Reviewed By: stephan.yichao.zhao, browneee
Differential Revision: https://reviews.llvm.org/D103745
Needs to be discussed more.
This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
Unrolling with more iterations than MaxTripCount is pointless, as
those iterations can never be executed. As such, we clamp ULO.Count
to MaxTripCount if it is known. This means we no longer need to
consider iterations after MaxTripCount for exit folding, and the
CompletelyUnroll flag becomes independent of ULO.TripCount.
Differential Revision: https://reviews.llvm.org/D103748
This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message.
inttoptr for a non-integral address space is currently ill defined in the LangRef. Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled. Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons.
First, as a simple consistency argument. We're apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them. Having a lack of constant folding trigger a crash during lowering is non-ideal.
Second, and more fundementally, the optimizer is allowed to insert undefined constructs in unreachable code. At the same time, we can't assume that dynamically dead code is always pruned before lowering. As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths. We need the lowering to not crash. The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash.
Differential Revision: https://reviews.llvm.org/D103492
We need to adjust the FMF propagation on at least
one of these transforms as discussed in:
https://llvm.org/PR49654
...so this should make it easier to intersect flags.
The non-DOT printing does not include the successors of VPregionBlocks.
This patch use the same style for printing successors as for
VPBasicBlock.
I think the printing of successors could be a bit improved further, as
at the moment it is hard to ensure a check line matches all successors.
But that can be done as follow-up.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D103515
This patch is an extension of D103421. It allows the InstCombiner to
generate the negated form of integer scalable-vector splats. It can
technically handle fixed-length vectors too but those are completely
covered by the preceding logic.
This enables extra combining opportunities for scalable vector types.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103801
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order,
e.g. use queue or priority queue.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D103315
This pass transforms loops that contain a conditional branch with induction
variable. For example, it transforms left code to right code:
newbound = min(n, c)
while (iv < n) { while(iv < newbound) {
A A
if (iv < c) B
B C
C }
} if (iv != n) {
while (iv < n) {
A
C
}
}
Differential Revision: https://reviews.llvm.org/D102234
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.
If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.
This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.
At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.
Note that this could probably be further improved by using information
from the original IV.
Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g
Part of a set of fixes required for PR50412.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D103255
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order, i.e. use queue or priority queue.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D103315
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to loose some optimization potential.
Loop peeling is currently performed as part of UnrollLoop().
Outside test scenarios, it is always performed with an unroll
count of 1. This means that unrolling doesn't actually do anything
apart from performing post-unroll simplification.
When testing, it's currently possible to specify both an explicit
peel count and an explicit unroll count. This doesn't perform any
sensible operation and may result in miscompiles, see
https://bugs.llvm.org/show_bug.cgi?id=45939.
This patch moves peeling from UnrollLoop() into tryToUnrollLoop(),
so that peeling does not also perform a susequent unroll. We only
run the post-unroll simplifications. Specifying both an explicit
peel count and unroll count is forbidden.
In the future, we may want to support both (non-PGO) peeling a
loop and unrolling it, but this needs to be done by first performing
the peel and then recalculating unrolling heuristics on a now
possibly analyzable loop.
Differential Revision: https://reviews.llvm.org/D103362