I'm not sure how I failed to notice this before, but when optimizing
dominant-element BUILD_VECTORs we would lower via the scalable container type,
which lost us the information about the fixed length of the vector types. By
lowering via the fixed-length type we can preserve that information and
eliminate redundant vsetvli instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98938
Returning the scalable-vector container type would present problems when
the fixed-length INSERT_VECTOR_ELT was used by later operations.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98776
Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops.
Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement.
Differential Revision: https://reviews.llvm.org/D98857
Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles.
Differential Revision: https://reviews.llvm.org/D98646
This requires changes to TableGen files and some C++ files due to
incompatible multiclass template arguments that slipped through
before the improved handling.
Replace the unknown operand used for immediate operands for DIV/MUL with a fixed 16-bit immediate.
This is required since the assembly parser generator requires that all operands are typed.
Differential Revision: https://reviews.llvm.org/D98819
This patch adds support for intrinsic overloading on unnamed types.
This fixes PR38117 and PR48340 and will also be needed for the Full Restrict Patches (D68484).
The main problem is that the intrinsic overloading name mangling is using 's_s' for unnamed types.
This can result in identical intrinsic mangled names for different function prototypes.
This patch changes this by adding a '.XXXXX' to the intrinsic mangled name when at least one of the types is based on an unnamed type, ensuring that we get a unique name.
Implementation details:
- The mapping is created on demand and kept in Module.
- It also checks for existing clashes and recycles potentially existing prototypes and declarations.
- Because of extra data in Module, Intrinsic::getName needs an extra Module* argument and, for speed, an optional FunctionType* argument.
- I still kept the original two-argument 'Intrinsic::getName' around which keeps the original behavior (providing the base name).
-- Main reason is that I did not want to change the LLVMIntrinsicGetName version, as I don't know how acceptable such a change is
-- The current situation already has a limitation. So that should not get worse with this patch.
- Intrinsic::getDeclaration and the verifier are now using the new version.
Other notes:
- As far as I see, this should not suffer from stability issues. The count is only added for prototypes depending on at least one anonymous struct
- The initial count starts from 0 for each intrinsic mangled name.
- In case of name clashes, existing prototypes are remembered and reused when that makes sense.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D91250
When a D-Form instruction is fed by an add-immediate, we attempt
to merge the two immediates to form a single displacement so we
can remove the add-immediate.
However, we don't check whether the new displacement fits into
a 16-bit signed immediate field early enough. Namely, we do a
sign-extend from 16 bits first which will discard high bits and
then we check whether the result is a 16-bit signed immediate.
It of course will always be.
Move the check prior to the sign extend to ensure we are checking
the correct value.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49640
This patch consists of the initial changes to help distinguish between text and binary content correctly on z/OS. I would like to get feedback from Windows users on setting OF_None for all ToolOutputFiles. This seems to have been done as an optimization to prevent CRLF translation on Windows in the past.
Reviewed By: zibi
Differential Revision: https://reviews.llvm.org/D97785
This makes it simpler to determine when two registers are actually the
same vs just partially aliasing.
The only real caveat is that it becomes impossible to know which name
was used for the register previously. (i.e. parsing assembly and then
disassembling it can result in the register name changing.)
Differential Revision: https://reviews.llvm.org/D98536
This is required in order to determine during disassembly whether a
Reg bead without associated DA bead is referring to a data register.
Differential Revision: https://reviews.llvm.org/D98534
We can prove more predicates when we have a context when eliminating ICmp.
As first (and very obvious) approximation we can use the ICmp instruction itself,
though in the future we are going to use a common dominator of all its users.
Need some refactoring before that.
Observed ~0.5% negative compile time impact.
Differential Revision: https://reviews.llvm.org/D98697
Reviewed By: lebedev.ri
This changes adds attribute field for metadata of context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in previous build.
This will be used to help estimating inlining and be taken into account when trimming context. Changes for that in llvm-profgen will follow. It will also help tuning.
Differential Revision: https://reviews.llvm.org/D98823
By definition of Implication operator, `false -> true` and `false -> false`. It means that
`false` implies any predicate, no matter true or false. We don't need to go any further
trying to prove the statement we need and just always say that `false` implies it in this case.
In practice it means that we are trying to prove something guarded by `false` condition,
which means that this code is unreachable, and we can safely prove any fact or perform any
transform in this code.
Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
For Zvlsseg, we create several tuple register classes. When spilling for
these tuple register classes, we need to iterate NF times to load/store
these tuple registers.
Differential Revision: https://reviews.llvm.org/D98629
On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`,
`__sancov_bools`, `__sancov_pcs` in section groups (either `comdat any` or
`comdat noduplicates`).
With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage
collect such sections. If all `__sancov_bools` are discarded, LLD will error
`error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar).
```
% cat a.c
void discarded() {}
% clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections
...
ld.lld: error: undefined hidden symbol: __start___sancov_guards
>>> referenced by a.c
>>> /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard)
```
Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the
undefined error.
Differential Revision: https://reviews.llvm.org/D98903
We returned the input chain instead of the output chain from the
new load. This bypasses the load in the chain. I haven't found a
good way to test this yet. IR order prevents my initial attempts
at causing reordering.
This is only adding support to the dfsan instrumentation pass but not
to the runtime.
Added more RUN lines for testing: for each instrumentation test that
had a -dfsan-fast-16-labels invocation, a new invocation was added
using fast8.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98734
This reverts commit 962b73dd0f.
This commit was reverted because of some internal SPEC test failures.
It turns out that this wasn't actually relevant to anything in open source, so
it's safe to recommit this.
This is the alternative approach to D96931.
In LTO, for each module with inlineasm block, prepend directive ".lto_discard <sym>, <sym>*" to the beginning of the inline
asm. ".lto_discard" is both a module inlineasm block marker and (optionally) provides a list of symbols to be discarded.
In MC while emitting for inlineasm, discard symbol binding & symbol
definitions according to ".lto_disard".
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D98762
This reverts commit 6b053c9867.
The build is broken:
ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>> LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
I foresee two uses for this:
1) It's easier to use those in debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
LIT test would become too obscure. I can imagine that we'd want to CHECK
against VPlan dumps after multiple transformations instead. That would be
easier with plain text dumps than with DOT format.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D96628
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.
Depends on D98466.
Differential Revision: https://reviews.llvm.org/D98676
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.
Depends on D98457.
Differential Revision: https://reviews.llvm.org/D98466
Now that the WebAssembly SIMD specification is finalized and engines are
generally up-to-date, there is no need for a separate target feature for gating
SIMD instructions that engines have not implemented. With this change,
v128.const is now enabled by default with the simd128 target feature.
Differential Revision: https://reviews.llvm.org/D98457
value profile annotated after inlining.
In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we
use the magic number -1 in the value profile to avoid repeated indirect call
promotion to the same target for an indirect call. Function updateIDTMetaData
is used to mark an target as being promoted in the value profile with the
magic number. updateIDTMetaData is also used to update the value profile
when an indirect call is inlined and new inline instance profile should be
applied. For the second case, currently updateIDTMetaData mixes up the
existing value profile of the indirect call with the new profile, leading
to the problematic senario that a target count is larger than the total count
in the value profile.
The patch fixes the problem. When updateIDTMetaData is used to update the
value profile after inlining, all the values in the existing value profile
will be dropped except the values with the magic number counts.
Differential Revision: https://reviews.llvm.org/D98835
This reverts commit 11b70b9e3a.
The bot failure was due to ArgumentPromotion deleting functions
without deleting their analyses. This was separately fixed in 4b1c807.
Not doing it here can lead to subtle bugs - the analysis results are
associated by the Function object's address. Nothing stops the memory
allocator from allocating new functions at the same address.
Valgrind is reporting this bogus warning because it doesn't model
pthread_sigmask fully accurately. This is a valgrind bug, but
silencing it has effectively no cost, so just do it.
==73662== Syscall param __pthread_sigmask(set) points to uninitialised byte(s)
==73662== at 0x101E9D4C2: __pthread_sigmask (in /usr/lib/system/libsystem_kernel.dylib)
==73662== by 0x101EFB5EA: pthread_sigmask (in /usr/lib/system/libsystem_pthread.dylib)
==73662== by 0x1000D9F6D: llvm::sys::Process::SafelyCloseFileDescriptor(int) (in /Users/chrisl/Projects/circt/build/bin/firtool)
==73662== by 0x100072795: llvm::ErrorOr<std::__1::unique_ptr<llvm::MemoryBuffer, std::__1::default_delete<llvm::MemoryBuffer> > > getFileAux<llvm::MemoryBuffer>(llvm::Twine const&, long long, unsigned long long, unsigned long long, bool, bool) (in /Users/chrisl/Projects/circt/build/bin/firtool)
==73662== by 0x100072573: llvm::MemoryBuffer::getFileOrSTDIN(llvm::Twine const&, long long, bool) (in /Users/chrisl/Projects/circt/build/bin/firtool)
==73662== by 0x100282C25: mlir::openInputFile(llvm::StringRef, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) (in /Users/chrisl/Projects/circt/build/bin
Differential Revision: https://reviews.llvm.org/D98830
[amdgpu] Update med3 combine to skip i64
Fixes an assumption that a type which is not i32 will be i16. This asserts
when trying to sign/zero extend an i64 to i32.
Test case was cut down from an openmp application. Variations on it are hit by
other combines before reaching the problematic one, e.g. replacing the
immediate values with other function arguments changes the codegen path and
misses this combine.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D98872
Extend this to support ComputeNumSignBits of the (used) source vector elements so that we can handle more than just the case where we're sext_in_reg from the source element signbit.
Noticed while investigating the poor codegen in D98587.
byval requires an implicit copy between the caller and callee such
that the callee may write into the stack area without it modifying the
value in the parent. Previously, this was passing through the raw
pointer value which would break if the callee wrote into it.
Most of the time, this copy can be optimized out (however we don't
have the optimization SelectionDAG does yet).
This will trigger more fallbacks for AMDGPU now, since we don't have
legalization for memcpy yet (although we should stop using byval
anyway).
If SLP vectorizer tries to extend the scheduling region and runs out of
the budget too early, but still extends the region to the new ending
instructions (i.e., it was able to extend the region for the first
instruction in the bundle, but not for the second), the compiler need to
recalculate dependecies in full, just like if the extending was
successfull. Without it, the schedule data chunks may end up with the
wrong number of (unscheduled) dependecies and it may end up with the
incorrect function, where the vectorized instruction does not dominate
on the extractelement instruction.
Differential Revision: https://reviews.llvm.org/D98531
Provides API that allows to check predicate for being true or
false with one call. Current implementation is naive and just
calls isKnownPredicate twice, but further we can rework this
logic trying to use one check to prove both facts.
Add ISD::ABS to the existing unary instructions handling for splat detection
This is similar to D83605, but doesn't appear to need to touch any of the wasm refactoring.
Differential Revision: https://reviews.llvm.org/D98778
This patch adds support for masked scatter intrinsics on scalable vector
types. It is mostly an extension of the earlier masked gather support
introduced in D96263, since the addressing mode legalization is the
same.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96486
This patch supports the masked gather intrinsics in RVV.
The RVV indexed load/store instructions only support the "unsigned unscaled"
addressing mode; indices are implicitly zero-extended or truncated to XLEN and
are treated as byte offsets. This ISA supports the intrinsics directly, but not
the majority of various forms of the MGATHER SDNode that LLVM combines to. Any
signed or scaled indexing is extended to the XLEN value type and scaled
accordingly. This is done during DAG combining as widening the index types to
XLEN may produce illegal vectors that require splitting, e.g.
nxv16i8->nxv16i64.
Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the
stack when lowering split legalized index operands.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D96263
Without this patch, bitcasts of fixed-length mask vectors would go
through the stack.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98779
D93594 depend on the dominate tree and loop information. It increased
the compile time when build with -O0. However this is just to amend the
dominate tree and loop information, so that it is unnecessary to
re-analyze them again. Given the dominate tree of loop information are
absent in this pass, we can avoid amending them.
Differential Revision: https://reviews.llvm.org/D98773
Don't rewrite an add instruction with 2 SET_CC operands into a csel
instruction. The total instruction sequence uses an extra instruction and
register. Preventing this allows us to match a `(add, csel)` pattern and
rewrite this into a `cinc`.
Differential Revision: https://reviews.llvm.org/D98704
This patch changes the operand order of masked vmslt[u]
from (mask, rs1, scalar, maskedoff, vl)
to (maskedoff, rs1, scalar, mask, vl).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98839
The offset in HVX loads/stores is only 4 bits long, so often an
extra register is needed to hold the address. Minimize the number
of such registers by "standardizing" the base addresses and reusing
preexisting base registers when replacing frame indices.
These are pseudos without any users, so DCE was killing them in the combiner.
Marking them as having side effects doesn't seem quite right since they don't.
Gives a nice 0.3% geomean size win on CTMark -Os.
Differential Revision: https://reviews.llvm.org/D98811
Avoid revisiting nodes with the same set of defined lanes by
using a unified visited set which integrates lanes into the key.
This retains the intent of the original code by still revisiting
a subgraph if a different set of lanes is defined and hence
marking might progress differently.
Note: default size of the visited set has been confirmed to
cover >99% of invocations in large array of test shaders.
Reviewed By: piotr
Differential Revision: https://reviews.llvm.org/D98772
Fixed substitution printing not to produce an empty diagnostic for
errors handled elsewhere.
Reviewed By: thopre
Differential Revision: https://reviews.llvm.org/D98088
A more general name might be match-time error propagation. That is,
it's conceivable we'll one day have non-numeric errors that require
the handling fixed by this patch.
Without this patch, FileCheck behaves as follows:
```
$ cat check
CHECK-NOT: [[#0x8000000000000000+0x8000000000000000]]
$ FileCheck -vv -dump-input=never check < input
check:1:54: remark: implicit EOF: expected string found in input
CHECK-NOT: [[#0x8000000000000000+0x8000000000000000]]
^
<stdin>:2:1: note: found here
^
check:1:15: error: unable to substitute variable or numeric expression: overflow error
CHECK-NOT: [[#0x8000000000000000+0x8000000000000000]]
^
$ echo $?
0
```
Notice that the exit status is 0 even though there's an error.
Moreover, FileCheck doesn't print the error diagnostic unless both
`-dump-input=never` and `-vv` are specified.
The same problem occurs when `CHECK-NOT` does have a match but a
capture fails due to overflow: exit status is 0, and no diagnostic is
printed unless both `-dump-input=never` and `-vv` are specified. The
usefulness of capturing from `CHECK-NOT` is questionable, but this
case should certainly produce an error.
With this patch, FileCheck always includes the error diagnostic and
has non-zero exit status for the above examples. It's conceivable
that this change will cause some existing tests to fail, but my
assumption is that they should fail. Moreover, with nearly every
project enabled, this patch didn't produce additional `check-all`
failures for me.
This patch also extends input dumps to include such numeric error
diagnostics for both expected and excluded patterns.
As noted in fixmes in some of the tests added by this patch, this
patch worsens an existing issue with redundant diagnostics. I'll fix
that bug in a subsequent patch.
Reviewed By: thopre, jhenderson
Differential Revision: https://reviews.llvm.org/D98086
The previous technique relied on early-exiting the legalizer predicate
initialization, leaving an empty rule table. That causes a fallback
for most instructions, but some have legacy rules defined like G_ZEXT
which can try continue, but then crash.
We should fall back earlier, in the translator, to avoid this issue.
Differential Revision: https://reviews.llvm.org/D98730
Fix a bug in MachOObjectFile::getSymbolType() that it is not checking if
the iterator is end() before deference the iterator. Instead, return
`Other` type, which aligns with the behavior of `llvm-nm`.
rdar://75291638
Reviewed By: davide, ab
Differential Revision: https://reviews.llvm.org/D98739
This uses the shuffle mask cost from D98206 to give a better cost of MVE
VREV instructions. This helps especially in VectorCombine where the cost
of shuffles is used to reorder bitcasts, which this helps keep the phase
ordering test for fp16 reductions producing optimal code. The isVREVMask
has been moved to a header file to allow it to be used across target
transform and isel lowering.
Differential Revision: https://reviews.llvm.org/D98210
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those.
Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98805
At the moment `getMCInstrBeads` is forward-declared in a few places,
bring this together into a single header file.
This was done as part of the disassembler work, since the disassembler
would otherwise add one more forward declaration.
Differential Revision: https://reviews.llvm.org/D98533
This is required because empty strings are not allowed when generating
the assembly parser tables.
Differential Revision: https://reviews.llvm.org/D98532
Maybe there's a way to make them work, but until I've investigated
if tools can consume large PDBs, erroring out is better than slowly
and silently consuming all available ram due to internal invariants
being violated.
(Patch to make writing larger files work at
https://bugs.chromium.org/p/chromium/issues/detail?id=1179085#c25
but I haven't had time to check if windbg & co can consume these
large PDBs. llvm-pdbutil can't, but we can fix that one at least :) )
Differential Revision: https://reviews.llvm.org/D98788
It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the adhoc fixup implementation in LICM did not.
This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate btw, it also applies to usage of the preallocated family of intrinsics as well.
Differential Revision: https://reviews.llvm.org/D98728
This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.
Differential Revision: https://reviews.llvm.org/D98206
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.
This reverts commit 01ac6d1587.
The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector.
The previous check returned invalid results in some cases where there's some padding between the array elements: eg. a 4-element array of u7 values is considered as compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32.
The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding bits).
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D97465
Given a sextload i16, we can usually generate "ldrsh [rn. rm]". If we
don't naturally have a rn, rm addressing mode, we can either generate
"ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn. rm]".
We currently generate the first, always creating a sxth. They are both
the same number of instructions, but if we generate the second then the
mov #0 will likely be CSE'd or pulled out of a loop, etc.
This adjusts the ISel patterns to do that, creating a mov instead of a
sxth.
Differential Revision: https://reviews.llvm.org/D98693
The D93881 added functionality which preserve ownership for output file
if llvm-objcopy is called under root. That code was added into the place
where output file is created. The llvm-objcopy already has a function which
sets/restores rights/permissions for the output file.
That is the restoreStatOnFile() function. This patch moves code
(preserving ownershipping) into the restoreStatOnFile() function.
Differential Revision: https://reviews.llvm.org/D98511
This caused non-deterministic compiler output; see comment on the
code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427.
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.
Differential Revision: https://reviews.llvm.org/D98487
This adds the cost of an i1 extract and a branch to the cost in
getMemInstScalarizationCost when the instruction is predicated. These
predicated loads/store would generate blocks of something like:
%c1 = extractelement <4 x i1> %C, i32 1
br i1 %c1, label %if, label %else
if:
%sa = extractelement <4 x i32> %a, i32 1
%sb = getelementptr inbounds float, float* %pg, i32 %sa
%sv = extractelement <4 x float> %x, i32 1
store float %sa, float* %sb, align 4
else:
So this increases the cost by the extract and branch. This is probably
still too low in many cases due to the cost of all that branching, but
there is already an existing hack increasing the cost using
useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is
a load or there are more than one store. This patch improves the cost
for when there is only a single store, and hopefully at some point in
the future the hack can be removed.
Differential Revision: https://reviews.llvm.org/D98243
Current SLP pass has this piece of code that inserts a trunc instruction
after the vectorized instruction. In the case that the vectorized instruction
is a phi node and not the last phi node in the BB, the trunc instruction
will be inserted between two phi nodes, which will trigger verify problem
in debug version or unpredictable error in another pass.
This patch changes the algorithm to 'if the last vectorized instruction
is a phi, insert it after the last phi node in current BB' to fix this problem.
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.
In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.
This optimization is disabled when optimizing for size.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98700
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.
Differential Revision: https://reviews.llvm.org/D98717
BasicAA stores a reference to LoopInfo inside. This imposes an implicit
requirement of keeping it up to date whenever we modify the IR (in particular,
whenever we modify terminators of blocks that belong to loops). Failing
to do so leads to incorrect state of the LoopInfo.
Because general AA does not require loop info updates and provides to API to
update it properly, the users of AA reasonably assume that there is no need to
update the loop info. It may be a reason of bugs, as example in PR43276 shows.
This patch drops dependence of BasicAA on LoopInfo to avoid this problem.
This may potentially pessimize the result of queries to BasicAA.
Differential Revision: https://reviews.llvm.org/D98627
Reviewed By: nikic