This implements support for isKnownNonZero, computeKnownBits when freeze is involved.
```
br (x != 0), BB1, BB2
BB1:
y = freeze x
```
In the above program, we can say that y is non-zero. The reason is as follows:
(1) If x was poison, `br (x != 0)` raised UB
(2) If x was fully undef, the branch again raised UB
(3) If x was non-zero partially undef, say `undef | 1`, `freeze x` will return a nondeterministic value which is also non-zero.
(4) If x was just a concrete value, it is trivial
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D75808
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.
Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.
Fixes: SWDEV-247595
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D87158
We were missing support for the G_ADD_LOW + ADRP folding optimization in the
manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD.
As a result, we were missing cases like this:
```
@foo = external hidden global i32*
define void @baz(i32* %0) {
store i32* %0, i32** @foo
ret void
}
```
https://godbolt.org/z/16r7ad
This functionality already existed in the addressing mode functions for the
importer. So, this patch makes the manual selection code use
`selectAddrModeIndexed` rather than duplicating work.
This is a 0.2% geomean code size improvement for CTMark at -O3.
There is one code size increase (0.1% on lencod) which is likely because
`selectAddrModeIndexed` doesn't look through constants.
Differential Revision: https://reviews.llvm.org/D87397
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.
Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.
Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87386
fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.
This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.
Differential Revision: https://reviews.llvm.org/D87415
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.
This is enabled only with optimizations enabled like SelectionDAG.
Differential Revision: https://reviews.llvm.org/D86824
This is a port of the functionality from SelectionDAG, which tries to find
a tree of conditions from compares that are then combined using OR or AND,
before using that result as the input to a branch. Instead of naively
lowering the code as is, this change converts that into a sequence of
conditional branches on the sub-expressions of the tree.
Like SelectionDAG, we re-use the case block codegen functionality from
the switch lowering utils, which causes us to generate some different code.
The result of which I've tried to mitigate in earlier combine patches.
Differential Revision: https://reviews.llvm.org/D86665
This combine previously tried to take sequences like:
%cond = G_ICMP pred, a, b
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
G_BRCOND %cond, %truebb
G_BR %falsebb
%truebb:
...
%falsebb:
...
and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.
I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.
Differential Revision: https://reviews.llvm.org/D86664
The entry block is split at the first instruction where `shouldKeepInEntry`
returns false. The created basic block has a br jumping to the original entry
block. The new basic block causes the function label line and the other entry
block lines to be covered by different basic blocks, which can affect line
counts with special control flows (fork/exec in the entry block requires
heuristics in llvm-cov gcov to get consistent line counts).
int main() { // BB0
return 0; // BB2 (due to entry block splitting)
}
// BB1 is the exit block (since gcov 4.8)
This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and
inserts an edge from the synthetic entry block to the original entry block. We
can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The
number of basic blocks does not change, but the emitted .gcno files will be
smaller because we can save one GCOV_TAG_LINES tag.
// BB0 is the synthetic entry block with a single edge to BB2
int main() { // BB2
return 0; // BB2
}
// BB1 is the exit block (since gcov 4.8)
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.
Differential Revision: https://reviews.llvm.org/D85091
During the main DAGCombine loop, whenever a node gets replaced, the new
node and all its users are pushed onto the worklist. Omit this if the
new node is the EntryToken (e.g. if a store managed to get optimized
out), because re-visiting the EntryToken and its users will not uncover
any additional opportunities, but there may be a large number of such
users, potentially causing compile time explosion.
This compile time explosion showed up in particular when building the
SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any
platform without SIMD vector support.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86963
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
Since we always generate CopyToRegs for statepoint results,
we must update DAG root after emitting statepoint, so that
these copies are scheduled before any possible local uses.
Note: getControlRoot() flushes all PendingExports, not only
those we generates for relocates. If that'll become a problem,
we can change it to flushing relocate exports only.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D87251
If a function had at most one return block, the pass would return false
regardless if an unified unreachable block was created.
This patch fixes that by refactoring runOnFunction into two separate
helper functions for handling the unreachable blocks respectively the
return blocks, as suggested by @bjope in a review comment.
This was caught using the check introduced by D80916.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D85818
This patch follows D85345 and adds more noundef attributes to return values/arguments of library functions
that are mostly about accessing the file system or processes.
A few functions like `chmod` or `times` use typedef `mode_t` and `clock_t`.
They are neither struct nor union, so they cannot contain undef even if they're lowered to iN in IR. So, it is fine to add noundef to them.
- clock_t's actual type is size_t (C17, 7.27.1.3), so it isn't struct or union.
- For mode_t, either int or long is used in practice because programmers use bit manipulation. So, I think it is okay that it's never aggregate in practice.
After this patch, the remaining library functions are those that eagerly participate in optimizations: they can be removed, reordered, or
introduced by a transformation from primitive IR operations.
For them, a few testings is needed, since it may not be valid to add noundef anymore even if C standard says it's okay.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85894
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.
isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84242
Some constructors of IEEEFloat do not initialize member variable exponent.
Fix it by initializing exponent with the following values:
For NaNs, the `exponent` is `maxExponent+1`.
For Infinities, the `exponent` is `maxExponent+1`.
For Zeroes, the `exponent` is `maxExponent-1`.
Patch by: @nullptr.cpp (Yang Fan)
Differential Revision: https://reviews.llvm.org/D86997
Add subtarget feature check to avoid using ds_read/write_b96/128 with too
low alignment if a bug is present on that specific hardware.
Add this "feature" to GFX 10.1.1 as it is also affected.
Add global-isel test.
The MemorySSAWrapperPass depends on AAResultsWrapperPass and if
MemorySSA is preserved but AAResultsWrapperPass is not, this could lead
to a crash when updating the last user of the MemorySSAWrapperPass.
Alternatively AAResultsWrapperPass could be marked preserved by GVN, but
I am not sure if that would be safe. I am not sure what is required in
order to preserve AAResultsWrapperPass. At the moment, it seems like a
couple of passes that do similar transforms to GVN are preserving it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D87137
Current code in InstEmitter assumes all GC pointers are either
VRegs or stack slots - hence, taking only one operand.
But it is possible to have constant base, in which case it
occupies two machine operands.
Add a convinience function to StackMaps to get index of next
meta argument and use it in InsrEmitter to properly advance to
the next statepoint meta operand.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D87252
We really want to try and avoid spilling P0, which can be difficult
since there's only one register, so try to rematerialize any VCTP
instructions.
Differential Revision: https://reviews.llvm.org/D87280
This commit cleans up the ::initialize method of various AAs in the
following ways:
- If an associated function is required, give up on declarations.
This was discovered as a real problem when lots of llvm.dbg.XXX
call sites were assumed `noreturn` until proven otherwise. That
does not make any sense and caused huge regressions and missed
deductions.
- Require more associated declarations for function interface AAs.
- Use the IRAttribute::initialize to determine if function interface
AAs can be used in IPO, don't replicate the checks (especially
isFunctionIPOAmendable) all over the place. Arguably the function
declaration check should be moved to some central place to.
If we have a callback, call site arguments were already associated with
the callback callee. Now we also associate the function with the
callback callee, thus we know ensure that the following holds true (if
all return nonnull):
`getAssociatedArgument()->getParent() == getAssociatedFunction()`
To test this an early exit from
`AAMemoryBehaviorCallSiteArgument::initialize``
is included as well. Without the change to getAssociatedFunction() this
kind of early exit for declarations would cause callback call site
arguments to miss out.
As we handle callback calls we need to disambiguate the call site
argument number from the callee argument number. While always equal in
non-callback calls, a callback comes with a partial parameter-argument
mapping so there is no implicit correspondence. Here we split
`IRPosition::getArgNo()` into two public functions, `getCallSiteArgNo()`
and `getCalleeArgNo()`. Usages are adjusted to pick the right one for
their purpose. This fixed some problems that would have been exposed as
we more aggressively optimize callbacks.
While operand bundles carry unpredictable semantics, we know some of
them and can therefore "ignore" them. In this case we allow to look at
the declaration of `llvm.assume` when asked for the attributes at a call
site. The assume operand bundles we have do not invalidate the
declaration attributes.
We cannot test this in isolation because the llvm.assume attributes are
determined by the parser. However, a follow up patch will provide test
coverage.
In `MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.cpp` we initialized
attributes until stack frame ~35k caused space to run out. The initial
size 1024 is pretty much random.
For a CFG G=(V,E), Knuth describes that by Kirchoff's circuit law, the minimum
number of counters necessary is |E|-(|V|-1). The emitted edges form a spanning
tree. libgcov emitted .gcda files leverages this optimization while clang
--coverage's doesn't.
Propagate counts by Kirchhoff's circuit law so that llvm-cov gcov can
correctly print line counts of gcc --coverage emitted files and enable
the future improvement of clang --coverage.
Instead, passing in the command line options, initialized to nullptr. In
an upcoming patch, we can then use the parameter to pass actual command
line options.
Differential Revision: https://reviews.llvm.org/D87336
Since a function might have portions of its code coming from multiple
different files, "start line" is ambiguous (it can't just be resolved
relative to the file/line specified). Add start file to disambiguate it.
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.
This required adding a SDNodeFlags to SelectionDAG::getSetCC.
Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.
Differential Revision: https://reviews.llvm.org/D87200
Failing example: v8i8 = truncate v8i32. v8i8 is legal, but v8i32 was
widened to HVX. Make sure that v8i8 does not get altered (even if it's
changed to another legal type).
The get{Return,Unwind,Unreachable}Block functions in
UnifyFunctionExitNodes have not been used for many years,
so just remove them.
Reviewed By: bjope
Differential Revision: https://reviews.llvm.org/D87078
If we know that the abs operand is known negative, we can replace
it with a neg.
To avoid computing known bits twice, I've removed the fold for the
non-negative case from InstSimplify. Both the non-negative and the
negative case are handled by InstCombine now, with one known bits call.
Differential Revision: https://reviews.llvm.org/D87196
D66230 attempted to fix a problem where when there are allocas used before CoroBegin.
It keeps allocas and their uses stay in put if there are no escapse/changes to the data before CoroBegin.
Unfortunately that's incorrect.
Consider this code:
%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin
store ... %1
After this fix, %1 will now stay put, however if a store happens after coro.begin and hence modifies the content, this change will not be reflected in the coroutine frame (and will eventually be DCEed).
To generalize the problem, if any alias ptr is created before coro.begin for an Alloca and that alias ptr is latter written into after coro.begin, it will lead to incorrect behavior.
There are also a few other minor issues, such as incorrect dominate condition check in the ptr visitor, unhandled memory intrinsics and etc.
Ths patch attempts to fix some of these issue, and make it more robust to deal with aliases.
While visiting through the alloca pointer, we also keep track of all aliases created that will be used after CoroBegin. We track the offset of each alias, and then reacreate these aliases after CoroBegin using these offset.
It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design.
This patch makes it more robust and should be a pure win.
In the meantime, we need to think about what how to completely elimiante these issues, likely through the route as @rjmccall mentioned in D66230.
Differential Revision: https://reviews.llvm.org/D86859
When the function return type is non-void and `end` instructions are at
the very end of a function, CFGStackify's `fixEndsAtEndOfFunction`
function fixes the corresponding block/loop/try's type to match the
function's return type. This is applied to consecutive `end` markers at
the end of a function. For example, when the function return type is
`i32`,
```
block i32 ;; return type is fixed to i32
...
loop i32 ;; return type is fixed to i32
...
end_loop
end_block
end_function
```
But try-catch is a little different, because it consists of two parts:
a try part and a catch part, and both parts' return type should satisfy
the function's return type. Which means,
```
try i32 ;; return type is fixed to i32
...
block i32 ;; this should be changed i32 too!
...
end_block
catch
...
end_try
end_function
```
As you can see in this example, it is not sufficient to only `end`
instructions at the end of a function; in case of `try`, we should
check instructions before `catch`es, in case their corresponding `try`'s
type has been fixed.
This changes `fixEndsAtEndOfFunction`'s algorithm to use a worklist
that contains a reverse iterator, each of which is a starting point for
a new backward `end` instruction search.
Fixes https://bugs.llvm.org/show_bug.cgi?id=47413.
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D87207
We only need to include MachineInstrBundle.h, but exposes an implicit dependency in MachineOutliner.h.
Also, remove duplicate includes from LiveRegUnits.cpp + MachineOutliner.cpp.
On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert()
always ended up being scalarized, because the type action of the input is
promotion which was previously an unhandled case in this method.
This fixes https://bugs.llvm.org/show_bug.cgi?id=47132.
Differential Revision: https://reviews.llvm.org/D86268
Patch by Eli Friedman.
Review: Ulrich Weigand
This patch makes the debug_ranges section optional. When we specify an
empty debug_ranges section, yaml2obj only emits the section header.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D87263
count register
After my patch at D86087, code that now uses the mov operand rather than
the vctp operand will no longer remove modifications to the vctp operand
as they should. This patch fixes that by explicitly removing
modifications to the vctp operand rather than the register used as the
element count.
This patch is cherry-picked from 04b0a4e22e3b4549f9d241f8a9f37eebecb62a31, and
amended to prevent an undefined reference to `llvm::EnableABIBreakingChecks'
Commit 3c0b3250 introduced memory cluster under pwr10 target, but a
check for operands was unexpectedly removed. This adds it back to avoid
regression.
Implement AArch64 variant of shouldCoalesce() to detect a known failing case
and prevent the coalescing of a 32-bit copy into a 64-bit sign-extending load.
Do not coalesce in the following case:
COPY where source is bottom 32 bits of a 64-register,
and destination is a 32-bit subregister of a 64-bit register,
ie it causes the rest of the register to be implicitly set to zero.
A mir test has been added.
In the test case, the 32-bit copy implements a 32 to 64 bit zero extension
and relies on the upper 32 bits being zeroed.
Coalescing to the result of the 64-bit load meant overwriting
the upper 32 bits incorrectly when the loaded byte was negative.
Reviewed By: john.brawn
Differential Revision: https://reviews.llvm.org/D85956
Without gcc 7.4 warns with
../lib/Target/PowerPC/PPCInstrInfo.cpp:2284:25: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
BaseOp1.isFI() &&
~~~~~~~~~~~~~~~^~
"Only base registers and frame indices are supported.");
~
In GenerateConstantOffsetsImpl, we may generate non canonical Formula
if BaseRegs of that Formula is updated and includes a recurrent expr reg
related with current loop while its ScaledReg is not.
Patched by: mdchen
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D86939
The CloneFunctionInto has implicit requirements with regards to the
linkage and visibility of the function. We now update these after we did
the CloneFunctionInto on the copy with the same linkage and visibility
as the original.
Deleting or replacing anything is certainly a modification. This caused
a later assertion in IPSCCP when compiling 400.perlbench with the new PM.
I'm not sure how to test this.
On Power10, it's profitable to schedule some stores with adjacent target
address together. This patch implements this feature.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D86754
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.
It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric
So let's just reland this.
Original commit message:
I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.
After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.
Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.
I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.
Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.
Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.
As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S
llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement
The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.
If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
* -14 (-73.68%) loops not rotated due to the header size (yay)
* -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
* -3937 (-64.19%) common instructions hoisted
* +561 (+0.06%) x86 asm instructions
* -2 basic blocks
* +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable {F12360201}
* -36396 (-65.29%) common instructions hoisted
* +1676 (+0.02%) x86 asm instructions
* +662 (+0.06%) basic blocks
* +4395 (+0.04%) IR instructions
It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D84108
This reverts commit 503deec218.
For intrinsics supported by ConstantRange, compute the result range
based on the argument ranges. We do this independently of whether
some or all of the input ranges are full, as we can often still
constrain the result in some way.
Differential Revision: https://reviews.llvm.org/D87183
Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and
XORs to expand ABS. This is the multi-part version of the sequence
we use in LegalizeDAG.
It's also the same as the Custom sequence uses for i64 on 32-bit
and i128 on 64-bit. So we can remove the X86 customization.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D87215
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.
This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.
This is part of improving flags propagation noticed
with PR47430.
This is a follow-up suggested in D86420 - if we have a pair of stores
in inverted order for the target endian, we can rotate the source
bits into place.
The "be_i64_to_i16_order" test shows a limitation of the current
function (which might be avoided if we integrate this function with
the other cases in mergeConsecutiveStores). In the earlier
"be_i64_to_i16" test, we skip the first 2 stores because we do not
match the full set as consecutive or rotate-able, but then we reach
the last 2 stores and see that they are an inverted pair of 16-bit
stores. The "be_i64_to_i16_order" test alters the program order of
the stores, so we miss matching the sub-pattern.
Differential Revision: https://reviews.llvm.org/D87112
MASM allows variables defined by equate statements to be used in expressions.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D86946
MASM aligns fields to the _minimum_ of the STRUCT alignment value and the size of the next field.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D86945
optimizeEndCF removes EXEC restoring instruction case this instruction is the only one except the branch to the single successor and that successor contains EXEC mask restoring instruction that was lowered from END_CF belonging to IF_ELSE.
As a result of such optimization we get the basic block with the only one instruction that is a branch to the single successor.
In case the control flow can reach such an empty block from S_CBRANCH_EXEZ/EXECNZ it might happen that spill/reload instructions that were inserted later by register allocator are placed under exec == 0 condition and never execute.
Removing empty block solves the problem.
This change require further work to re-implement LIS updates. Recently, LIS is always nullptr in this pass. To enable it we need another patch to fix many places across the codegen.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D86634
a warning about clobbering reserved registers (NFC).
Also address some minor inefficiencies and style issues.
Differential Revision: https://reviews.llvm.org/D86088
We already simplify the unsigned comparisons if we've found the operands are non-negative, but we were still calling LowerVSETCCWithSUBUS which resulted in the PR47448 regressions.
Normal dead code elimination ignores assume intrinsics, so we fail to
delete assumes that are not meaningful (and potentially worse if they
cause conflicts with other assumptions).
The motivating example in https://llvm.org/PR47416 suggests that we
might have problems upstream from here (difference between C and C++),
but this should be a cheap way to make sure we remove more dead code.
Differential Revision: https://reviews.llvm.org/D87149
In getMemcpyLoadsAndStores(), a memcpy where the source is a zero constant is expanded to a MemOp::Set instead of a MemOp::Copy, even when the memcpy is volatile.
This is incorrect.
The fix is to add a check for volatile, and expand to MemOp::Copy in the volatile case.
Reviewed By: chill
Differential Revision: https://reviews.llvm.org/D87134
lowerShuffleWithPERMV allows us to use the ZMM variants for 128/256-bit variable shuffles on non-VLX AVX512 targets.
This is another step towards shuffle combining through between vector widths - we still end up with an annoying regression (combine_vpermilvar_vperm2f128_zero_8f32) but we're going in the right direction....
To enable the cost of constants, the helper function has been
reorganised:
- A struct has been introduced to hold SCEV operand information so
that we know the user of the operand, as well as the operand index.
The Worklist now uses instead instead of a bare SCEV.
- The costing of each SCEV, and collection of its operands, is now
performed in a helper function.
Differential Revision: https://reviews.llvm.org/D86050
Modify FoldBranchToCommonDest to consider the cost of inserting
instructions when attempting to combine predicates to fold blocks.
The threshold can be controlled via a new option:
-simplifycfg-branch-fold-threshold which defaults to '2' to allow
the insertion of a not and another logical operator.
Differential Revision: https://reviews.llvm.org/D86526
When optimising for size, make the cost of i1 logical operations
relatively expensive so that optimisations don't try to combine
predicates.
Differential Revision: https://reviews.llvm.org/D86525
This patch makes the debug_addr section optional. When an empty
debug_addr section is specified, yaml2obj only emits a section header
for it.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D87205
log:
BB [7, 8): begin {}, end {}, livein {}, liveout {}
BB [1, 2): begin {}, end {}, livein {}, liveout {}
...
But it is not convenient to know what the basic block is.
So I add the basic block name to it.
Reviewed By: vitalybuka
TestPlan: check-llvm
Differential Revision: https://reviews.llvm.org/D87152
This is the split part of D86269, which add a new ELF machine flag called EM_CSKY and related relocations.
Some target-specific flags and tests for csky can be added in follow-up patches later.
Differential Revision: https://reviews.llvm.org/D86610
Fixes PR47375, in which an assertion was triggering because
WebAssemblyTargetLowering::isVectorLoadExtDesirable was improperly
assuming the use of simple value types.
Differential Revision: https://reviews.llvm.org/D87110
This patch implements the vec_expandm function prototypes in altivec.h in order
to utilize the vector expand with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82727
rGabd33bf5eff2 enabled us to pad 128/256-bit shuffles to 512-bit on non-VLX targets, but wasn't updating binary shuffles to account for the new vector width.
This addresses the remaining issue from D87188. Due to a series of
folds, we may end up with abs-of-abs represented as
x == 0 ? -abs(x) : abs(x). Rather than recognizing this as a special
abs pattern and doing an abs-of-abs fold on it afterwards,
I'm directly folding this to one of the select operands in InstSimplify.
The general pattern falls into the "select with operand replaced"
category, but that fold is not powerful enough to recognize that
both hands of the select are the same for value zero.
Differential Revision: https://reviews.llvm.org/D87197
The post-index matcher, before it queries the target legality, walks uses
of some instructions which in pathological cases can be massive. Since
no targets actually support indexed loads yet, disable this to stop wasting
compile time on something which is going to fail anyway.
During the PEI pass, the dead TargetStackID::SGPRSpill spill slots
are not being removed while spilling the FP/BP to memory.
Fixes: SWDEV-250393
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D87032
This adjusts the description of `llvm.memcpy` to also allow operands
to be equal. This is in line with what Clang currently expects.
This change is intended to be temporary and followed by re-introduce
a variant with the non-overlapping guarantee for cases where we can
actually ensure that property in the front-end.
See the links below for more details:
http://lists.llvm.org/pipermail/cfe-dev/2020-August/066614.html
and PR11763.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D86815
Recognize umin/umax/smin/smax intrinsics and convert them to the
already existing SCEV nodes of the same name.
In the future we'll want SCEVExpander to also produce the intrinsics,
but we're not ready for that yet.
Differential Revision: https://reviews.llvm.org/D87160
Similar to D87168, but for abs. If we have a dominating x >= 0
condition, then we know that abs(x) is x. This fold is in
InstCombine, because we need to create a sub instruction for
the x < 0 case.
Differential Revision: https://reviews.llvm.org/D87184
If we have a dominating condition that x >= y, then umax(x, y) is x,
etc. I'm doing this in InstSimplify as the corresponding transform
for the select form is also done there.
Differential Revision: https://reviews.llvm.org/D87168
Previously SDNodeFlags::instersectWith(Flags) would do nothing if Flags was
in an undefined state, which is very bad given that this is the default when
getNode() is called without passing an explicit SDNodeFlags argument.
This meant that if an already existing and reused node had a flag which the
second caller to getNode() did not set, that flag would remain uncleared.
This was exposed by https://bugs.llvm.org/show_bug.cgi?id=47092, where an NSW
flag was incorrectly set on an add instruction (which did in fact overflow in
one of the two original contexts), so when SystemZElimCompare removed the
compare with 0 trusting that flag, wrong-code resulted.
There is more that needs to be done in this area as discussed here:
Differential Revision: https://reviews.llvm.org/D86871
Review: Ulrich Weigand, Sanjay Patel
When a switch case is folded into default's case, that's an IR change that
should be reported, update ConstantFoldTerminator accordingly.
Differential Revision: https://reviews.llvm.org/D87142
Libcall __gcc_qtou is not available, which breaks some tests needing
it. On PowerPC, we have code to manually expand the operation, this
patch applies it to constrained conversion. To keep it strict-safe,
it's using the algorithm similar to expandFP_TO_UINT.
For constrained operations marking FP exception behavior as 'ignore',
we should set the NoFPExcept flag. However, in some custom lowering
the flag is missed. This should be fixed by future patches.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D86605
This can cause an infinite loop if SimplifiedDemandedElts asks
for the node to replace itself.
A similar protection exists in other places in shuffle combining.
Fixes ISPC https://github.com/ispc/ispc/issues/1864