Previously loading the vtable used in calling a virtual method in a loop
was not hoisted out of the loop. This fixes that.
canSinkOrHoistInst() itself doesn't check that the load operands are
loop invariant, callers also check that separately.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99784
meetBDVState looks pretty difficult to read and follow.
This is purely NFC but doing several things:
1) Combine meet and meetBDVState
2) Move the function to be a member of BDVState
3) Make BDVState be a mutable object
4) Convert switch to sequence of ifs
5) Adds comments.
Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99064
Add explicit type i64 to RV64 only patterns to stop emitting unneeded i32 patterns.
It can reduce the isel table size.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100089
This reverts commit e35afbe535, reapplying
022ccedde8 and
e7ed5c920d.
- The first attempt missed defining `SignpostEmitterImpl`.
- The second attempt missed defining `llvm::SignpostEmitterImpl`.
Not sure how I failed to test both versions locally before; I thought
I'd turned the feature off via rerunning `cmake` but it must have been
stuck in place. This time I confirmed via `clang -E` that I was testing
both build configurations.
Original commit message:
Replace some manual memory management with std::unique_ptr.
Differential Revision: https://reviews.llvm.org/D100151
This reverts commit 078072285d, reapplying
022ccedde8.
I figured out why this was failing in other environments: it's not a
problem with std::unique_ptr, but that SignpostEmitterImpl only has a
forward declaration. Adding an empty definition should do the trick.
Original commit message:
Replace some manual memory management with std::unique_ptr.
Differential Revision: https://reviews.llvm.org/D100151
The destructor for SignPostEmitterImpl::SignpostLog is known statically. Avoid
the unnecessary vtable indirection through std::function in the std::unique_ptr
by turning LogDeleter into a struct. No real functionality change here.
Differential Revision: https://reviews.llvm.org/D100154
This is a DenseMap, which has its own initializer; we don't need to explicitly
call the default constructor here.
Differential Revision: https://reviews.llvm.org/D100152
Add a variant of `fs::resize_file` for use immediately before opening a
file with `mapped_file_region::readwrite`. On Windows, `_chsize`
(`ftruncate`) is slow, but `CreateFileMapping` (`mmap`) automatically
extends the file so the call to `fs::resize_file` can be skipped.
This optimization was added to `FileOutputBuffer` in
da9bc2e56d5a5c6332a9def1a0065eb399182b93; this commit just extracts the
logic out and adds a unit test.
Differential Revision: https://reviews.llvm.org/D95490
Instead of instantiating multiclasses inside multiclasses, just
inherit from them.
We can do the same for the VPseudo* multiclasses, but that may
interfere with the scheduler class work.
Pretty straightforward use of existing infrastructure and port of the attributor inference rules for nosync.
A couple points of interest:
* I deliberately switched from "monotonic or better" to "unordered or better". This is simply me being conservative and is better in line with the rest of the optimizer. We treat monotonic conservatively pretty much everywhere.
* The operand bundle test change is suspicious. It looks like we might have missed something here, but if so, it's an issue with the existing nofree inference as well. I'm going to take a closer look at that separately.
* I needed to keep the previous inference from readnone. This surprised me, but made sense once I realized readonly inference goes to lengths to reason about local vs non-local memory and that writes to local memory are okay. This is fine for the purpose of nosync, but would e.g. prevent us from inferring nofree from readnone - which is slightly surprising.
Differential Revision: https://reviews.llvm.org/D99769
This fixes a "Cached first special instruction is wrong!" assert.
The assert fires because replacing a value with another can cause an
instruction to no longer be "special" to ICF. In this case,
devirtualization happened, turning an indirect call to a
call to a willreturn function which is no longer special.
Reviewed By: nikic, rnk
Differential Revision: https://reviews.llvm.org/D99977
It used to work correctly even with a KILL, but there is
no reason to consider meta instructions since they do not
create real HW uses.
Differential Revision: https://reviews.llvm.org/D100135
After D99249 we use three different loop pass managers for LICM,
LoopRotate and LICM+LoopUnswitch. This happens because LazyBFI
and LazyBPI are not preserved by LoopRotate (note that D74640
is no longer needed). Avoid this by marking them as preserved.
My understanding of D86156 is that it is okay to simply preserve
them (which LoopUnswitch already does for the same reason) and
rely on callbacks to deal with deleted blocks.
Differential Revision: https://reviews.llvm.org/D99843
wasm64 was missing DAG ISEL patterns for external symbol based global.get, but simply adding these analogous to the existing 32-bit versions doesn't work.
This is because we are conflating the 32-bit global index with the pointer represented by the external symbol, which for wasm32 happened to work.
The simplest fix is to pretend we have a 64-bit global index. This sounds incorrect, but is immaterial since once this index is stored as a MachineOperand it becomes 64-bit anyway (and has been all along). As such, the EmitInstrWithCustomInserter based implementation I experimented with become a no-op and no further changes in the C++ code are required.
Differential Revision: https://reviews.llvm.org/D99904
After loop interchange, the (old) outer loop header should not jump to
the `LoopExit`. Note that the old outer loop becomes the new inner loop
after interchange. If we branched to `LoopExit` then after interchange
we would jump directly from the (new) inner loop header to `LoopExit`
without executing the rest of outer loop.
This patch modifies adjustLoopBranches() such that the old outer
loop header (which becomes the new inner loop header) jumps to the
old inner loop latch which becomes the new outer loop latch after
interchange.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D98475
Add InstAlias that allows the last operand to be an imm for following instructions:
1. Zbb or Zbp:
- ror
- rorw (RV64 Only)
2. Zbs
- best
- bclr
- binv
- bext
Reviewed By: craig.topper, jrtc27
Differential Revision: https://reviews.llvm.org/D100083
clang++ uses llvm.compiler.used in certain cases to preserve
symbol which is fully inlined. D96087 has resulted in undefined
symbols in such cases. Set it to false by default to preserve
old behavior but keep the option for specific uses where we
want to ignore these (e.g. to detect a potential indirect call
to a function).
Differential Revision: https://reviews.llvm.org/D99897
During SelectionDAG, we must track the SDNodes that each SDDbgValue depends on
to compute its value. These are ultimately derived from the location operands to
the SDDbgValue, but were stored in a separate vector prior to this patch. This
resulted in cases where one of the lists was updated incorrectly, resulting in
crashes during compilation. This patch fixes the issue by directly recomputing
the dependency list from the SDDbgOperands in getDependencies().
Differential Revision: https://reviews.llvm.org/D99423
Look through copies to find more cases where the two values being
selected are identical. The motivation for this is just to be able to
remove the weird special case where tryFoldCndMask was called from
foldInstOperand, part way through folding a move-immediate into its
users, without regressing any lit tests.
Instead of passing the start value and the defined value to
widenPHIInstruction, pass the VPWidenPHIRecipe directly, which can be
used to get both (and more in future patches).
This allows mapping larger files, delaying OOM failures until too many
pages of them are accessed. This is makes the behavior of the
mapped_file_region in this regard consistent between its "Unix" and
"Windows" implementations.
Guard the code witih #if defined(MAP_NORESERVE), consistent with other
uses of MAP_NORESERVE in llvm-project, because some FreeBSD versions do
not provide this flag.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D96626
ScratchExecCopy needs to be marked as live, we cannot use that register
while EXEC is stored in there.
Marking SGPRForFPSaveRestoreCopy and SGPRForBPSaveRestoreCopy as
available is unnecessary, they should not be live at that point anway.
Differential Revision: https://reviews.llvm.org/D100098
When attempting to truncate a FP vector and store the result out
to memory we crashed because we had no pattern for truncating FP
stores. In fact, we don't support these types of stores and the
correct fix is to stop marking these truncating stores as legal.
Tests have been added here:
CodeGen/AArch64/sve-fptrunc-store.ll
Differential Revision: https://reviews.llvm.org/D100025
During LoopStrengthReduce, some of the SSA values that are used by debug values
may be lost and/or salvaged. After LSR we attempt to recover any undef debug
values, including any that were salvaged but then lost their values afterwards,
by replacing the lost values with any live equal values (plus a possible
constant offset) that have been gathered prior to running LSR. When we do this
we restore the debug value's original DIExpression, to undo any salvaging (as we
have gone back to using the original debug value).
This process can currently produce invalid debug info if the number of operands
has changed by salvaging during LSR. Replacing old values during the
applyEqualValues step does not change the number of location operands, which
means that when we restore the old DIExpression we may have a mismatch between
the number of operands used by the debug value and the number of operands
referenced by the DIExpression. This patch fixes this by restoring the full
original location metadata at the start of the applyEqualValues step, so that
there is no mismatch in operand count between the debug value and its
DIExpression.
Differential Revision: https://reviews.llvm.org/D98644
Without the fix we get
../lib/Target/NVPTX/NVPTXLowerArgs.cpp:236:24: error: lambda capture 'Arg' is not used [-Werror,-Wunused-lambda-capture]
auto IsALoadChain = [Arg](Value *Start) {
^~~
1 error generated.
D99674 stopped the folding of certain select operations into and/or, due
to incorrect folding in the presence of poison. D97360 added some costs
to attempt to account for the change, but only worked at the getUserCost
level, not the getCmpSelInstrCost that the vectorizer will use directly.
This adds similar logic into the vectorizer to handle these logical
and/or selects, treating them like and/or directly.
This fixes 60% performance regressions from code like the attached test
case.
Differential Revision: https://reviews.llvm.org/D99884
This patch adds RVV codegen support for OR/XOR/AND reductions for both
scalable- and fixed-length vector types. There are a few possible
codegen strategies for each -- vmfirst.m, vmsbf.m, and vmsif.m could be
used to some extent -- but the vpopc.m instruction was chosen since it
produces the scalar result in one instruction, after which scalar
instructions can finish off the computation.
The reductions are lowered identically for both scalable- and
fixed-length vectors, although some alternate strategies may be more
optimal on fixed-length vectors since it's cheaper to get the length of
those types.
Other reduction types were not deemed to be relevant for mask vectors.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100030
The GNU AS manual states the following about single-character constants enclosed within single quotes:
> Some backslash escapes apply to characters, \b, \f, \n, \r, \t, and \" with the same meaning as for strings, plus \' for a single quote.
Add two more characters to the switch handling this case to match GAS behaviour, plus a test to make sure nothing regresses.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99609
Combine all collected stats into separate struct RAGreedyStats
with add and report methods.
The motivation is to extend the number of statistics to capture and instead of
adding new parameters, just combine all of them into one structure.
Additionally I plan to use report from different places in future to report data
for function as well.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100012
To save compile time, avoid computation of stats if ORE will not emit it.
The motivation is to add more stats and compute it only if it will dumped.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100010
Summary: Set the default DwarfInlinedStrings as inlined strings for DBX, due to DBX does not support .dwstr section for now.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D99933
If the stack size is larger than 12 bits, we have to use a scratch
register to store the stack size. Before we introduce the scalable stack
offset, we could simplify
%0 = ADDI %stack.0, 0
=>
%scratch = ... # sequence of instructions to move the offset into
%%scratch
%0 = ADD %fp, %scratch
However, if the offset contains scalable part, we need to consider it.
%0 = ADDI %stack.0, 0
=>
%scratch = ... # sequence of instructions to move the offset into
%%scratch
%scratch = ADD %fp, %scratch
%scalable_offset = ... # sequence of instructions for vscaled-offset.
%0 = ADD/SUB %scratch, %scalable_offset
Differential Revision: https://reviews.llvm.org/D100035
This is a (late) follow-up patch of 8871a4b4ca and
c95f39891a to make ConstantStruct::get/ConstantArray::getImpl
correctly return PoisonValue if all elements are poison.
This was found while discussing about the elements of a vector-typed UndefValue (D99853)
Pseudo probes, when scattered in a block, can be chained dependencies of other regular DAG nodes and block DAG combine optimizations. To fix this, scattered probes in a block are grouped and placed at the beginning of the block. This shouldn't affect the profile quality.
Test Plan:
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D100002
New custom DAG nodes were added to represent operations on CSR. These
nodes are lowered to corresponding pseudo instruction. Using the pseudo
instructions allows to specify different scheduling information for
operations on different system registers. It also make possible to
specify dependencies of instructions on specific system registers.
Differential Revision: https://reviews.llvm.org/D98936
After loop interchange, the (old) outer loop header should not jump to
`LoopExit`. Note that the old outer loop becomes the new inner loop
after interchange. If we branched to `LoopExit` then after interchange
we would jump directly from the (new) inner loop header to `LoopExit`
without executing the rest of (new) outer loop.
This patch modifies adjustLoopBranches() such that the old outer
loop header (which becomes the new inner loop header) jumps to the
old inner loop latch which becomes the new outer loop latch after
interchange.
Reviewed By: bmahjour
Differential Revision: https://reviews.llvm.org/D98475
Allow pass to work separately with SGPR, VGPR registers or both.
This is NFC now but will be needed to split RA for separate
SGPR and VGPR passes.
Differential Revision: https://reviews.llvm.org/D100063
If the constants have a difference of 1 we can convert one to
the other by adding or subtracting the condition.
We have a DAG combine for this, but it only runs before type
legalization. If the select is introduced later during type
legalization or op legalization we will miss it.
We don't need a specific condition, but some conditions are
harder to materialize than others on RISCV. I know that SETLT
will be a single instruction and it is what is used by the
motivating pattern from signed saturating add/sub.
Differential Revision: https://reviews.llvm.org/D99021
When using the large code model with FastISel (for example via
clang -O0 which adds the optnone attribute), FP constants could
still be materialized using adrp + ldr. Unconditionally enable
the existing path for MachO to materialize the constant in code.
For testing, restore literal_pools_float.ll to exercise the constant
pool and add two optnone-functions that return a float and a double,
respectively. Consolidate fpimm.ll and add a new fast-isel-fpimm.ll
to check the code paths taken with FastISel.
Differential Revision: https://reviews.llvm.org/D99607
Since we have created a new OF_TextWithCRLF flag, we no longer need to worry about OF_Text flag turning on CRLF translation. I can remove this workaround I added to globally open all ToolOutputFiles as binary on Windows.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D100034
This can't use our normal strategy of splatting the scalar and using
a .vv operation instead of .vx.
Instead this patch bitcasts the vector to the equivalent SEW=32
vector and inserts the scalar parts using two vslide1up/down. We
do that unmasked and apply the mask separately at the end with
a vmerge.
For vslide1up there maybe some other options here like getting
i64 into element 0 and using vslideup.vi with this vector as
vd and the original source as vs1. Masking would still need to
be done afterwards.
That idea doesn't work for vslide1down. We need to slidedown and
then insert a single scalar at vl-1 which we could do with a
vslideup, but that assumes vl > 0 which I don't think we can assume.
The i32 double slide1down implemented here is the best I could come
up with and I just made vslide1up consistent.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D99910
This allows FoldConstantArithmetic to handle SPLAT_VECTOR in
addition to BUILD_VECTOR. This allows it to support scalable
vectors. I'm also allowing fixed length SPLAT_VECTOR which is
used by some targets, but I'm not familiar enough to write tests
for those targets.
I had to block this function from running on CONCAT_VECTORS to
avoid calling getNode for a CONCAT_VECTORS of 2 scalars.
This can happen because the 2 operand getNode calls this
function for any opcode. Previously we were protected because
CONCAT_VECTORs of BUILD_VECTOR is folded to a larger BUILD_VECTOR
before that call. But it's not always possible to fold a CONCAT_VECTORS
of SPLAT_VECTORs, and we don't even try.
This fixes PR49781 where DAG combine thought constant folding
should be possible, but FoldConstantArithmetic couldn't do it.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D99682
-Make sure of the CreateShl/LShr/AShr methods that take a uint64_t
instead of creating a ConstantInt for 1 ourselves.
-Use Builder.getInt1 or ConstantInt::getBool instead of a conditional.
-Pull out repeated calls to getType.
All of the code that handles general constant here (other than the more
restrictive APInt-dealing code) expects that it is an immediate,
because otherwise we won't actually fold the constants, and increase
instruction count. And it isn't obvious why we'd be okay with
increasing the number of constant expressions,
those still will have to be run..
But after 2829094a8e
this could also cause endless combine loops.
So actually properly restrict this code to immediates.
This fixes the examples from
D99674 and
https://llvm.org/PR49878
The matchers succeed on partial undef/poison vector constants,
but the transform creates a full 'not' (-1) constant, so it
would undo a demanded vector elements change triggered by the
extractelement.
Differential Revision: https://reviews.llvm.org/D100044
We see a regression related to low probe factor(0.01) which prevents some callsites being promoted in ICPPass and later cause the missing inline in CGSCC inliner. The root cause is due to redundant(the second) multiplication of the probe factor and this change try to fix it.
`Sum` does multiply a factor right after findCallSamples but later when using as the parameter in setProbeDistributionFactor, it multiplies one again.
This change could get ~2% perf back on mcf benchmark. In mcf, previously the corresponding factor is 1 and it's the recent feature introducing the <1 factor then trigger this bug.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D99787
Use report_fatal_error here since this is an internal error, and not
something the user can/should be trying to fix.
Also distinguish between the symbol being missing and the symbol having
the wrong type.
We have a failure internally where the symbol is missing. Currently
trying to reduce the test case to something we can attach to an llvm
bug.
Differential Revision: https://reviews.llvm.org/D99960
The struct is used for both, callee and caller-save registers now.
The frame index is not set for entrypoints, as we do not need to save
the registers then.
Update the struct name to reflect that.
Differential Revision: https://reviews.llvm.org/D99722
Extend D94856 to handle 'and', 'or' and 'xor' instructions as well
We still fail on many i8/i16 cases as the test and the logic-op are performed on different widths
No need to lookup through and/or try to vectorize operands of the
CmpInst instructions during attempts to find/vectorize min/max
reductions. Compiler implements postanalysis of the CmpInsts so we can
skip extra attempts in tryToVectorizeHorReductionOrInstOperands and save
compile time.
Differential Revision: https://reviews.llvm.org/D99950
The swap of the operands can affect later transforms that
are expecting a constant as operand 1. I don't think we
can trigger a bug with the current code, but I hit that
problem while drafting a new transform for min/max intrinsics.
I do not see any bit-width restriction from the point of the
LLVM Lang Ref - Operand Bundles on the types of the deopt bundle
operands. Statepoint Lowering seems to be able to work with any
types.
This patch relaxes the two related assertions and adds a new test
for this change.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100006
This reverts commit a547b4e26b,
relanding commit 31d219d299,
which was reverted because there was a conflicting inverse transform,
which was causing an endless combine loop, which has now been adjusted.
Original commit message:
https://alive2.llvm.org/ce/z/67w-wQ
We prefer `add`s over `sub`, and this particular xform
allows further folds to happen:
Fixes https://bugs.llvm.org/show_bug.cgi?id=49858
I.e., if any/all of the consants is an expression, don't do it.
Since those constants won't reduce into an immediate,
but would be left as an constant expression, they could cause
endless combine loops after 31d219d299
added an inverse transformation.
A value from reachable block may come to a Phi node as its input from
unreachable block. This may confuse matchSimpleRecurrence which
has no access to DomTree and can falsely recognize something as a recurrency
because of this effect, as the attached test shows.
Patch `ae7b1e` deals with half of this problem, but it only accounts from
the case when an unreachable instruction comes to Phi as an input.
This patch provides a generalization by checking that no Phi block's
predecessor is unreachable (no matter what the input is).
Differential Revision: https://reviews.llvm.org/D99929
Reviewed By: reames
Consider the .debug_pubnames and .debug_pubtypes their own kind of
accelerator and stop emitting them together with the Apple-style
accelerator tables. The only reason we were still emitting both was for
(byte-for-byte) compatibility with dsymutil-classic.
- This patch adds a new accelerator table kind "Pub" which can be
specified with --accelerator=Pub.
- This patch removes the ability to emit both pubnames/types and apple
style accelerator tables. I don't think anyone is relying on that but
it's worth pointing out.
- This patch removes the --minimize option and makes this behavior the
default. Specifying the flag will result in a warning but won't abort
the program.
Differential revision: https://reviews.llvm.org/D99907
Looking at the Doxygen-generated documentation for the llvm namespace
currently shows all sorts of random comments from different parts of the
codebase. These are mostly caused by:
- File doc comments that aren't marked with \file, so they're attached to
the next declaration, which is usually "namespace llvm {".
- Class doc comments placed before the namespace rather than before the
class.
- Code comments before the namespace that (in my opinion) shouldn't be
extracted by doxygen at all.
This commit fixes these comments. The generated doxygen documentation now
has proper docs for several classes and files, and the docs for the llvm
and llvm::detail namespaces are now empty.
Reviewed By: thakis, mizvekov
Differential Revision: https://reviews.llvm.org/D96736
We encountered a hang in our internal code base. I'm having trouble
creating a test case because the test that hit it was testing some
code that is not upstream.
Summary:
The function SplitCriticalEdge (called by SplitEdge) can return a nullptr in
cases where the edge is a critical. SplitEdge uses SplitCriticalEdge assuming it
can always split all critical edges, which is an incorrect assumption.
The three cases where the function SplitCriticalEdge will return a nullptr is:
1. DestBB is an exception block
2. Options.IgnoreUnreachableDests is set to true and
isa(DestBB->getFirstNonPHIOrDbgOrLifetime()) is not equal to a nullptr
3. LoopSimplify form must be preserved (Options.PreserveLoopSimplify is true)
and it cannot be maintained for a loop due to indirect branches
For each of these situations they are handled in the following way:
1. Modified the function ehAwareSplitEdge originally from
llvm/lib/Transforms/Coroutines/CoroFrame.cpp to handle the cases when the DestBB
is an exception block. This function is called directly in SplitEdge.
SplitEdge does not call SplitCriticalEdge in this case
2. Options.IgnoreUnreachableDests is set to false by default, so this situation
does not apply.
3. Return a nullptr in this situation since the SplitCriticalEdge also returned
nullptr. Nothing we can do in this case.
Reviewed By: asbirlea
Differential Revision:https://reviews.llvm.org/D94619
Follow up to a6d2a8d6f5. These were found by simply grepping for "::assume", and are the subset of that result which looked cleaner to me using the isa/dyn_cast patterns.
Follow up to a6d2a8d6f5. This covers all the public interfaces of the bundle related code. I tried to cleanup the internals where the changes were obvious, but there's definitely more room for improvement.
Fixes the ASan RISC-V memory mapping (originally introduced by D87580 and
D87581). This should be an improvement both in terms of first principles
soundness and observed test failures --- test failures would occur
non-deterministically depending on the ASLR random offset.
On RISC-V Linux (64-bit), `TASK_UNMAPPED_BASE` is currently defined as
`PAGE_ALIGN(TASK_SIZE / 3)`. The non-power-of-two divisor makes the result
be the not very round number 0x1555556000. That address had to be further
rounded to ensure page alignment after the shadow scale shifting is applied.
Still, that value explains why the mapping table may look less regular than
expected.
Further cleanups:
- Moved the mapping table comment, to ensure that the two Linux/AArch64
tables stayed together;
- Removed mention of Sv48. Neither the original mapping nor this one are
compatible with an actual Linux Sv48 address space (mainline Linux still
operates Sv48 in Sv39 mode). A future patch can improve this;
- Removed the additional comments, for consistency.
Differential Revision: https://reviews.llvm.org/D97646
Previously, 34-bit constants were materialized in selectI64Imm(), and we relied
on td pattern matching to instead produce a pli. This becomes problematic as
there is no guarantee that the 34-bit constant will reach the td pattern
selection for pli. It is also possible for other transformations (such as complex
bit permutations) to also produce and utilize the 34-bit constant materialized
through selectI64Imm().
This patch instead produces pli on Power10 directly whenever the constant fits
within 34-bits.
Differential Revision: https://reviews.llvm.org/D99906
Add the subclass, update a few places which check for the intrinsic to use idiomatic dyn_cast, and update the public interface of AssumptionCache to use the new class. A follow up change will do the same for the newer assumption query/bundle mechanisms.
performScalarPREInsertion() inserts instructions into blocks that we
need to tell ImplicitControlFlowTracking about, otherwise the ICF cache
may be invalid.
Fixes PR49193.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99909
The current code does not properly handle vector indices unless they are
the first index.
At the moment LangRef gives the impression that the vector index must be
the one and only index (https://llvm.org/docs/LangRef.html#getelementptr-instruction).
But vector indices can appear at any position and according to the
verifier there may be multiple vector indices. If that's the case, the
number of elements must match.
This patch updates SimplifyGEPInst to properly handle those additional
cases.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99961
Many of the operands are handled the same or in the same order
for all these intrinsics. Factor out the code for selecting and
pushing them into the Operands vector.
Differential Revision: https://reviews.llvm.org/D99923
This allows frontend and backend diagnostic files to all go into the
same place. Have it control the Windows (mini-)dump location.
Differential Revision: https://reviews.llvm.org/D99199
This patch adds support for TLS variables to the XCOFF object writer:
- Add TData and TBSS sections
- Add CsectGroups for the mapping classes XCOFF::XMC_TL and XCOFF::XMC_UL
- Add XMC_UL in the enum entry of CsectStorageMapping class to print the string
while reading the symbol properties for TLS variables
- Fix the starting address of TData and TBSS sections
Reviewed by: hubert.reinterpretcast, DiggerLin
Differential Revision: https://reviews.llvm.org/D98946
The key change (4f5e92c) to switch gc.result and gc.relocate to being readnone landed nearly two weeks ago, and we haven't seen any fallout. Time to remove the code added to make reverting easy.
This fixes an oversight in D99747 which moved the IMG init code from
SIAddIMGInit to AdjustInstrPostInstrSelection, but did not set the
hasPostISelHook flag on gather4 instructions.
Differential Revision: https://reviews.llvm.org/D99953
Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
be an input to the horizontal reduction, e.g:
%phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
%load = load <8 x float>
%reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)
This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D98435
Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.
Solution:
This patch adds two new flags
- OF_CRLF which indicates that CRLF translation is used.
- OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.
Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.
So this is the behaviour per platform with my patch:
z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode
Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return
The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.
```
if (Flags & OF_CRLF)
CrtOpenFlags |= _O_TEXT;
```
These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D99426
Changes getRecurrenceIdentity to always return a neutral value of -0.0 for FAdd.
Reviewed By: dmgreen, spatel
Differential Revision: https://reviews.llvm.org/D98963
For VPWidenPHIRecipes that model all incoming values as VPValue
operands, print those operands instead of printing the original PHI.
D99294 updates recipes of reduction PHIs to use the VPValue for the
incoming value from the loop backedge, making use of this new printing.
After rG47321c311bdbe0145b9bf45d822185c37b19fa50 we promote vXi8 reductions to vXi16 to create a much faster PMULLW mul reduction, followed by a (free) truncation. This avoids the high cost of repeated vXi8 multiplications (which extend+multiply+truncate to/from vXi16 types....).
Fixes the missing vXi8 mul reduction vectorization in PR42674 (Comment #20) 'mul16' test case.
This patch enhances hasAddressTaken() to ignore bitcasts as a
callee in callbase instruction. Such bitcast usage doesn't really take
the address in a useful meaningful way.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D98884
It is generally beneficial to prefer "movi d0, #0" over "fmov s0, wzr" as this
is most efficient across all cores; it is recognised as a zeroing idiom. For
newer cores, fmov instructions can also be eliminated early and there is no
difference with movi, but some implementations lack this so is not true for
other/older cores. Thus this standardises on using movi as this should always
gives the same or better performance than the fmov with wzr.
Differential Revision: https://reviews.llvm.org/D99586
This was using the .2d variant which zeros 128 bits, but using the .2s variant
that zeros 64 bits is faster on some cores.
This is a prep step for D99586 to always using movi for zeroing floats.
Differential Revision: https://reviews.llvm.org/D99710
The reason for the NewPM redesign is described in the commit
cba3e783389a: [NewPM] Disable PreservedCFGChecker ...
The checker introduces an internal custom CFG analysis that tracks
current up-to date CFG snapshot. The analysis is invalidated along
any other CFG related analysis (the key is CFGAnalyses). If the CFG
analysis is not invalidated at a functional pass exit then the checker
asserts that the CFG snapshot taken from this analysis is equals to
a snapshot of the current CFG.
Along the way:
- the function CFG::printDiff() is simplified by removing function
name calculation. The name is printed by the caller;
- fixed CFG invalidated condition (see CFG::invalidate());
- StandardInstrumentations::registerCallbacks() gets additional
optional parameter of type FunctionAnalysisManager*, which is
needed by the checker to get the custom CFG analysis;
- several PM related tests updated to explicitly set
-verify-cfg-preserved=1 as they need.
This patch is safe to land as the CFGChecker is left switched off
(the options -verify-cfg-preserved is false by default). It will be
switched on by a separate patch to minimize possible reverts.
Reviewed By: skatkov, kuhar
Differential Revision: https://reviews.llvm.org/D91327
I missed a few intrinsics in 3dd4aa7d09
when I did this for masked loads and masked segment loads/stores.
Found while trying to share more code between these custom isel
functions.
When we are able to SROA an alloca, we know all uses of it, meaning we
don't have to preserve the invariant group intrinsics and metadata.
It's possible that we could lose information regarding redundant
loads/stores, but that's unlikely to have any real impact since right
now the only user is Clang and vtables.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D99760
This is the sibling fix to c590a9880d -
as there, we can't subsitute a vector value the equality
compare replacement that we are trying requires that the
comparison is true for the entire value. Vector select
can be partly true/false.
It's a bit silly, but it allows us to write stricter type
constraints for isel. There's still some extra type checks in
the generated table due to some type interference limitations
around HWMode.
For use in an uncoming patch. Left out the phi case (which could otherwise fit in this framework) as it would cause infinite recursion in said patch. We can probably also leverage this in instcombine to ensure we keep the two sets of related analysis and transforms in sync.
These look like $00A0cf for hex and %001010101 for binary. They are used in Motorola assembly syntax.
Differential Revision: https://reviews.llvm.org/D98519
TextAPI/ELF has moved out into InterfaceStubs, so theres no longer a
need to seperate out TextAPI between formats.
Reviewed By: ributzka, int3, #lld-macho
Differential Revision: https://reviews.llvm.org/D99811
This patch supports bitcasts from scalar types to fixed-length vectors
and vice versa. It custom-lowers and custom-legalizes them to
EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT operations, using a single-element
vectors to hold the scalar where appropriate.
Previously, some of these would fail to select, others would be expanded
through stack loads and stores. Effort was made to ensure the codegen
avoids the stack for both legal and illegal scalar types.
Some of the codegen could be improved, but on first glance it looks like
a general optimization of EXTRACT_VECTOR_ELT when extracting an i64
element on RV32.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99667
As shown in the example based on:
https://llvm.org/PR49832
...and the existing test, we can't substitute
a vector value because the equality compare
replacement that we are attempting requires
that the comparison is true for the entire
value. Vector select can be partly true/false.
In 0dbcb36394, most most target symbols were made hidden by default
with the public ones marked with LLVM_EXTERNAL_VISIBILITY. When the
M68k target was added, this particular change was forgotten so that
external tools cannot make use of the public M68k target functions
in libLLVM.so. Thus, add the missing LLVM_EXTERNAL_VISIBILITY macro
to all public target functions in the M68k backend.
Differential Revision: https://reviews.llvm.org/D99869
Caught in internal testing, these operations are assumed legal by
default, even for scalable vector types. Expand them back into separate
truncations and stores, or loads and extensions.
Also add explicit fixed-length vector tests for these operations, even
though they should have been correct already.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99654
During vectorization better to postpone the vectorization of the CmpInst
instructions till the end of the basic block. Otherwise we may vectorize
it too early and may miss some vectorization patterns, like reductions.
Reworked part of D57059
Differential Revision: https://reviews.llvm.org/D99796
This patch introduces a DIPrinter interface to implement by different output style printer implementations. DIPrinterGNU and DIPrinterLLVM implement the GNU and LLVM output style printing respectively. No functional changes.
This refactoring clarifies and simplifies the code, and makes a new output style addition easier.
Reviewed By: jhenderson, dblaikie
Differential Revision: https://reviews.llvm.org/D98994
The W version of orc.b does not exist in Zbp so we need to use
gorci encoding. If we have Zbp, we can use gorciw which can avoid a
sext.w in some cases.
This is identical to 781d077afb,
but for the other function.
For certain shift amount bit widths, we must first ensure that adding
shift amounts is safe, that the sum won't have an unsigned overflow.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49778
This is discussed in https://llvm.org/PR48999 ,
but it does not solve that request.
The difference in the vector test shows that some
other logic transform is limited to scalar types.
When converting a switch with two cases and a default into a
select, also handle the denegerate case where two cases have the
same value.
Generate this case directly as
%or = or i1 %cmp1, %cmp2
%res = select i1 %or, i32 %val, i32 %default
rather than
%sel1 = select i1 %cmp1, i32 %val, i32 %default
%res = select i1 %cmp2, i32 %val, i32 %sel1
as InstCombine is going to canonicalize to the former anyway.
Even if one of the operands is overdefined, we may still produce
a non-overdefined result, e.g. due to a min/max operation. This
matches our handling elsewhere, e.g. for binary operators.
The slot poisoning comment refers to a much older LVI cache
implementation.
As long as it's a constant we can directly pattern match it
without any problems. It's only when it isn't a constant that
we need to add an AND.
In theory this should allow more target independent optimizations
to remain active.
This patch fixes llvm.org/pr49688 by conditionally folding select i1 into and/or:
```
select cond, cond2, false
->
and cond, cond2
```
This is not safe if cond2 is poison whereas cond isn’t.
Unconditionally disabling this transformation affects later pipelines that depend on and/or i1s.
To minimize its impact, this patch conservatively checks whether cond2 is an instruction that
creates a poison or its operand creates a poison.
This approach is similar to what InstSimplify's SimplifyWithOpReplaced is doing.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99674
This was prompted by D95727, which had the side-effect to break the
'release' mode build bot for ML-driven policies. The problem is that now
the pre-compiled object files don't get transitively carried through as
'source' anymore; that being said, the previous way of consuming them
was problematic, because it was only working for static builds; in
dynamic builds, the whole tf_xla_runtime was linked, which is
undesirable.
The alternative is to treat tf_xla_runtime as an archive, which then
leads to the desired effect.
Differential Revision: https://reviews.llvm.org/D99829
This is a followup to D98145: As far as I know, tracking of kill
flags in FastISel is just a compile-time optimization. However,
I'm not actually seeing any compile-time regression when removing
the tracking. This probably used to be more important in the past,
before FastRA was switched to allocate instructions in reverse
order, which means that it discovers kills as a matter of course.
As such, the kill tracking doesn't really seem to serve a purpose
anymore, and just adds additional complexity and potential for
errors. This patch removes it entirely. The primary changes are
dropping the hasTrivialKill() method and removing the kill
arguments from the emitFast methods. The rest is mechanical fixup.
Differential Revision: https://reviews.llvm.org/D98294
Fixes PR47603
This should probably be transferable to DAGCombine - the main limitation with the existing trunc(logicop) DAG fold is we don't know if legalization has tried to promote truncated logicops already. We might be able to peek through extensions as well.
Use the getTargetShuffleInputs helper for all shuffle decoding
Reapplied (after reversion in rGfa0aff6d6960) with fix+test for subvector splitting - we weren't accounting for peeking through bitcasts changing the vector element count of the shuffle sources.
Started to see build errors like this
../lib/Support/Z3Solver.cpp:19:10: fatal error: 'z3.h' file not found
#include <z3.h>
^~~~~~
1 error generated.
after commit 43ceb74eb1.
The -isystem path to the Z3_INCLUDE_DIR wen't missing in the compile
commands. No idea why target_include_directories stopped working with
that commit, but using include_directories seem to work better.
InstCombine performs simple forwarding from stores to loads, but
currently only handles the case where the load and store have the
same size. This extends it to also handle a store of a constant
with a larger size followed by a load with a smaller size.
This is implemented through ConstantFoldLoadThroughBitcast() which
is fairly primitive (e.g. does not allow storing a large integer
and then loading a small one), but at least can forward the first
element of a vector store. Unfortunately it seems that we currently
don't have a generic helper for "read a constant value as a different
type", it's all tangled up with other logic in either
ConstantFolding or VNCoercion.
Differential Revision: https://reviews.llvm.org/D98114
The AAMDNodes part of the MemoryLocation is not used by the BasicAA
cache, so don't store it. This reduces the size of each cache entry
from 112 bytes to 48 bytes.
BasicAA itself doesn't make use of AA metadata, but passes it
through to recursive queries and makes it part of the cache key.
Aliasing decisions that are based on AA metadata (i.e. TBAA and
ScopedAA) are based *only* on AA metadata, so checking them with
different pointer values or sizes is not useful, the result will
always be the same.
While this change is a mild compile-time improvement by itself,
the actual goal here is to reduce the size of AA cache keys in
a followup change.
Differential Revision: https://reviews.llvm.org/D90098
Define -fatal-warnings to make warnings fatal, and accept /WX as an ML.EXE compatible alias for it.
Also make sure that if Warning() returns true, we always treat it as an error.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92504
Head files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
clmul
clmulh
clmulr
Differential Revision: https://reviews.llvm.org/D99711
Forgot to amend the Author.
Original commit message:
Header files are included in a separate patch in case the name needs to be changed.
RV32 / 64:
orc.b
Differential Revision: https://reviews.llvm.org/D99320
Make variables and text-macro references case-insensitive, to match ml.exe.
Also improve error handling for text-macro expansion.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D92503
Implementation for RISC-V Zbr extension intrinsic.
Header files are included in separate patch in case the name needs to be changed
RV32 / 64:
crc32b
crc32h
crc32w
crc32cb
crc32ch
crc32cw
RV64 Only:
crc32d
crc32cd
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D99009
For positive constants we try shifting left to remove leading zeros
and fill the bottom bits with 1s. We then materialize that constant
shift it right.
This patch adds a new strategy to try filling the bottom bits with
zeros instead. This catches some additional cases.
When run under valgrind, or with a malloc that poisons freed memory,
this can lead to segfaults or other problems.
To avoid modifying the AdditionalUsers DenseMap while still iterating,
save the instructions to be notified in a separate SmallPtrSet, and use
this to later call OperandChangedState on each instruction.
Fixes PR49582.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D98602
This patch moves mapping of IR operands to VPValues out of
tryToCreateWidenRecipe. This allows using existing VPValue operands when
widening recipes directly, which will be introduced in future patches.
The safepoints being inserted exists to free memory, or coordinate with another thread to do so. Thus, we must strip any inferred attributes and reinfer them after the lowering.
I'm not aware of any active miscompiles caused by this, but since I'm working on strengthening inference of both and leveraging them in the optimization decisions, I figured a bit of future proofing was warranted.
Change the definition of G_SBFX and G_UBFX so that the lsb and width
can have different types than the src and dst operands.
Differential Revision: https://reviews.llvm.org/D99739