Add handling of s_andn2 and mask of 0.
This eliminates redundant instructions from uniform control flow.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D83641
This patch handles CFI with basic block sections, which unlike DebugInfo does
not support ranges. The DWARF standard explicitly requires emitting separate
CFI Frame Descriptor Entries for each contiguous fragment of a function. Thus,
the CFI information for all callee-saved registers (possibly including the
frame pointer, if necessary) has to be emitted along with redefining the
Call Frame Address (CFA), viz. where the current frame starts.
CFI directives are emitted in FDEs in the object file with a low_pc, high_pc
specification. So, a single FDE must point to a contiguous code region, unlike
debug info, which supports ranges. This is what complicates CFI for
basic block sections.
Now, what happens when we start placing individual basic blocks in unique
sections:
* Basic block sections allow the linker to reorder basic blocks arbitrarily in
the address space, so a given basic block can become non-contiguous with the
original function.
* The different basic block sections can no longer share the cfi_startproc and
cfi_endproc directives. So, each basic block section should emit these
directives independently.
* Each (cfi_startproc, cfi_endproc) directive will result in a new FDE that
caters to that basic block section.
* Now, this basic block section needs to duplicate the information from the
entry block to compute the CFA as it is an independent entity. It cannot refer
to the FDE of the original function and hence must duplicate all the stuff that
is needed to compute the CFA on its own.
* We are working on a de-duplication patch that can share common information in
FDEs in a CIE (Common Information Entry) and we will present this as a follow up
patch. This can significantly reduce the duplication overhead and is
particularly useful when several basic block sections are created.
* The CFI directives are emitted similarly for registers that are pushed onto
the stack, like callee-saved registers in the prologue. These directives
describe how to retrieve the value of the register as it was at the point where
the push happened. This has to be duplicated too in a basic block that is
floated as a separate section.
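A minimal sketch of the per-section duplication described above, assuming a simplified, hypothetical `FrameState` that records the CFA rule and callee-saved register offsets established by the prologue. It is not the AsmPrinter code: it only shows that every non-entry section gets its own startproc/endproc pair and restates the CFA and register rules, because it cannot refer to the entry block's FDE.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of the frame state set up by the prologue.
struct SavedReg {
  std::string name;
  int cfaOffset; // where the register was saved, relative to the CFA
};
struct FrameState {
  std::string cfaReg;
  int cfaOffset;
  std::vector<SavedReg> savedRegs;
};

// Emits the CFI directives framing one basic block section.
// Section 0 is the entry section: its rules come from the prologue's own
// CFI directives, which this sketch does not model.
std::vector<std::string> emitSectionCFI(const FrameState& fs, int sectionIndex) {
  std::vector<std::string> out;
  out.push_back(".cfi_startproc"); // each section starts a new FDE
  if (sectionIndex != 0) {
    // Restate how to compute the CFA...
    out.push_back(".cfi_def_cfa " + fs.cfaReg + ", " +
                  std::to_string(fs.cfaOffset));
    // ...and where each callee-saved register was pushed.
    for (const SavedReg& r : fs.savedRegs)
      out.push_back(".cfi_offset " + r.name + ", " +
                    std::to_string(r.cfaOffset));
  }
  // The section's instructions would be emitted here.
  out.push_back(".cfi_endproc");
  return out;
}
```

The duplicated `.cfi_def_cfa`/`.cfi_offset` lines are exactly the overhead that the planned CIE-based de-duplication aims to reduce.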
Differential Revision: https://reviews.llvm.org/D79978
Because of the layout of stores (that don't have a destination operand)
this check is exactly the same as the one in
RISCVInstrInfo::isLoadFromStackSlot.
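To illustrate why one check fits both, here is a sketch using a hypothetical, simplified `Operand` type rather than `MachineOperand`. It assumes the usual RISC-V operand orders: a load such as `LW rd, imm(rs1)` has operands (rd, rs1, imm), and a store such as `SW rs2, imm(rs1)` has (rs2, rs1, imm). Since the store has no destination, operand 0 is the stored register and the address operands land at the same indices as for a load.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-in for a machine operand.
struct Operand {
  enum Kind { Reg, FrameIndex, Imm } kind;
  int value; // register number, frame index, or immediate value
};

struct StackSlotAccess {
  int reg;        // loaded (or stored) register
  int frameIndex; // stack slot being accessed
};

// Shared check: operand 1 is a frame index and operand 2 is a zero offset.
// Works for loads and stores alike because of the operand layout above.
std::optional<StackSlotAccess> matchStackSlotAccess(const std::vector<Operand>& ops) {
  if (ops.size() < 3)
    return std::nullopt;
  if (ops[1].kind == Operand::FrameIndex && ops[2].kind == Operand::Imm &&
      ops[2].value == 0)
    return StackSlotAccess{ops[0].value, ops[1].value};
  return std::nullopt;
}
```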
Differential Revision: https://reviews.llvm.org/D81805
ComputeNumSignBits and computeKnownBits both trigger "Scalable flag
may be dropped" warnings when a fixed length vector is extracted
from a scalable vector. This patch assumes nothing about the
demanded elements thus matching the behaviour when extracting a
scalable vector from a scalable vector.
Differential Revision: https://reviews.llvm.org/D83642
Fix two obvious errors in the code and also update the test check.
Also add one test to catch the failure.
Patch by Ruiling Song!
Differential Revision: https://reviews.llvm.org/D83280
The code already supports addressing a fixed-size stack object from
the frame-pointer, by first subtracting sizeof(SVE area) from FP.
Reviewers: efriedma, cameron.mcinally, david-arm, rengolin
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D83125
The hardware spec requires src0 of s_cmpk to be a register. So, we
should not optimize s_cmp to s_cmpk if src0 is not a register.
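A minimal sketch of the shrinking precondition, using a hypothetical `CmpOperands` summary instead of real `MachineInstr` operands. It assumes s_cmpk takes a 16-bit immediate (signed for the `_i32` forms, unsigned for the `_u32` forms) and ignores any operand commuting the real pass may do first; the first check is the one this fix adds.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of an s_cmp instruction.
struct CmpOperands {
  bool src0IsReg;
  bool src1IsImm;
  int64_t src1Imm;
  bool isSigned; // s_cmp_*_i32 rather than s_cmp_*_u32
};

// Returns true if the s_cmp could be rewritten as s_cmpk.
bool canShrinkToCmpK(const CmpOperands& op) {
  if (!op.src0IsReg)
    return false; // the hardware requires src0 of s_cmpk to be a register
  if (!op.src1IsImm)
    return false;
  if (op.isSigned)
    return op.src1Imm >= INT16_MIN && op.src1Imm <= INT16_MAX;
  return op.src1Imm >= 0 && op.src1Imm <= UINT16_MAX;
}
```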
Patch by Ruiling Song!
I have added a new file:
llvm/test/CodeGen/AArch64/README
that describes what to do in the event one of the SVE codegen tests
fails the warnings check. In addition, I've added comments to all
the relevant SVE tests pointing users at the README file.
Differential Revision: https://reviews.llvm.org/D83467
In DAGCombiner::TransformFPLoadStorePair we were dropping the scalable
property of TypeSize when trying to create an integer type of equivalent
size. In fact, this optimisation makes no sense for scalable types
since we don't know the size at compile time. I have changed the code
to bail out when encountering scalable type sizes.
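The bail-out can be sketched with a simplified, hypothetical stand-in for `llvm::TypeSize` (a known minimum size plus a scalable flag). The point is that an equivalent integer width exists only when the size is fixed; for a scalable size, the real size is the minimum times an unknown vscale, so returning the minimum would silently drop the scalable property.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Simplified stand-in for llvm::TypeSize.
struct SizeInfo {
  uint64_t minBits; // exact size if !scalable, else the size for vscale == 1
  bool scalable;
};

// Width of an integer type with the same size as the FP type, or nothing
// when the size is not known at compile time.
std::optional<uint64_t> equivalentIntWidth(SizeInfo fpSize) {
  if (fpSize.scalable)
    return std::nullopt; // bail out instead of using minBits
  return fpSize.minBits;
}
```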
I've added a test to
llvm/test/CodeGen/AArch64/sve-fp.ll
that exercises this code path. The test already emits an error if it
encounters warnings due to implicit TypeSize->uint64_t conversions.
Differential Revision: https://reviews.llvm.org/D83572
Preserve SCC dead flags in SIOptimizeExecMaskingPreRA.
This helps with removing redundant s_andn2 instructions later.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D83637
https://reviews.llvm.org/D78411 introduced test changes which relied on
the ability to strip debugify metadata even if module-level metadata is
missing. This introduces a more targeted test for that ability.
We have this generic transform in IR (instcombine),
but as shown in PR41098:
http://bugs.llvm.org/PR41098
...the pattern may emerge in codegen too.
x86 has a potential refinement/reversal opportunity here,
but that should come later or needs a target hook to
avoid the transform. Converting to bswap is the more
specific form, so we should use it if it is available.
This carves out an exception for a pair of consecutive loads that are
reversed from the consecutive order of a pair of stores. All of the
existing profitability/legality checks for the memops remain between
the 2 altered hunks of code.
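As an illustration of why reversed pairs can still be merged, assume the merged load feeds a rotate by half its width before the wide store (for a byte pair, that rotate is a 16-bit bswap). The sketch below compares the two narrow load/store pairs with the merged form for 16-bit halves. Rotating by half the width swaps the two halves in memory on either endianness.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Original pattern: two consecutive loads stored in reversed order.
void swapHalvesScalar(const uint16_t* src, uint16_t* dst) {
  uint16_t first = src[0];
  uint16_t second = src[1];
  dst[0] = second;
  dst[1] = first;
}

// Rotate left; r must be in 1..31.
static uint32_t rotl32(uint32_t x, unsigned r) {
  return (x << r) | (x >> (32 - r));
}

// Merged pattern: one wide load, a rotate by 16, one wide store.
void swapHalvesMerged(const uint16_t* src, uint16_t* dst) {
  uint32_t wide;
  std::memcpy(&wide, src, sizeof wide);
  wide = rotl32(wide, 16);
  std::memcpy(dst, &wide, sizeof wide);
}
```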
This should give us the same x86 base-case asm that gcc gets in
PR41098 and PR44895:
http://bugs.llvm.org/PR41098
http://bugs.llvm.org/PR44895
I think we are missing a potential subsequent conversion to use "movbe"
if the target supports that. That might be similar to what AArch64
would use to get "rev16".
Differential Revision: https://reviews.llvm.org/D83567
This refactors option -disable-mve-tail-predication to take different arguments
so that we have one option to control tail-predication rather than several
different ones.
This is also a prep step for D82953, in which we want to reject reductions
unless that is requested with this option.
Differential Revision: https://reviews.llvm.org/D83133
Check that input size matches size of destination reg class.
Attempt to extend input size when needed.
Differential Revision: https://reviews.llvm.org/D83384
This patch adds support for constrained int/fp conversion between
signed/unsigned i32 and f32/f64.
Reviewed By: jhibbits
Differential Revision: https://reviews.llvm.org/D82747
Bit 7 of the index controls zeroing; the other bits are ignored when bit 7 is set. Shuffle lowering was using 128 and shuffle combining was using 255. It seems like we should be consistent.
This patch changes shuffle combining to use 128 to match lowering.
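Assuming the index semantics described here are those of the x86 PSHUFB byte shuffle, a small model shows why 128 and 255 are interchangeable as "zero" indices: once bit 7 is set, the remaining bits never reach the lookup. The function name `pshufb` is just a label for this model.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of a 128-bit byte shuffle with PSHUFB-style control bytes.
std::array<uint8_t, 16> pshufb(const std::array<uint8_t, 16>& src,
                               const std::array<uint8_t, 16>& mask) {
  std::array<uint8_t, 16> out{};
  for (int i = 0; i < 16; ++i) {
    uint8_t m = mask[i];
    // Bit 7 set: zero the byte. Otherwise the low 4 bits select a source byte.
    out[i] = (m & 0x80) ? 0 : src[m & 0x0F];
  }
  return out;
}
```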
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D83587
Currently, when llvm sees a global variable in the .maps section,
it requires the variable's type to be a struct type. Then the pointee
types of the structure members are evaluated further.
In normal cases, the pointee type will be skipped.
Although this is what all current bpf programs do, it is
a little bit restrictive. For example, it is legitimate
for users to have:
typedef struct { int key_size; int value_size; } __map_t;
__map_t map __attribute__((section(".maps")));
This patch lifts this restriction so that a typedef of
a struct type is also allowed for .maps section variables.
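The relaxed type check can be sketched as peeling typedef and const wrappers until a struct remains. This uses hypothetical, simplified type nodes, not the BTFDebug data structures, and assumes that any chain of typedef/const over a struct is acceptable.

```cpp
#include <cassert>

// Hypothetical, simplified type nodes.
enum class Kind { Struct, Typedef, Const, Int };
struct TypeNode {
  Kind kind;
  const TypeNode* base; // underlying type for Typedef/Const, else nullptr
};

// Returns the underlying struct if the .maps variable's type is a struct,
// possibly behind typedef/const wrappers; otherwise nullptr.
const TypeNode* resolveMapStruct(const TypeNode* t) {
  while (t && (t->kind == Kind::Typedef || t->kind == Kind::Const))
    t = t->base;
  return (t && t->kind == Kind::Struct) ? t : nullptr;
}
```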
To avoid creating unnecessary fixup entries when traversal
starts with a typedef/struct type, the new implementation
first traverses all map struct members and then traverses
the typedef/struct type. This way, no fixup entries are
generated in the internal BTFDebug implementation.
Two new unit tests are added for typedef and const
struct in .maps section. Also tested with kernel bpf selftests.
Differential Revision: https://reviews.llvm.org/D83638
fadd (fma A, B, (fmul C, D)), E --> fma A, B, (fma C, D, E)
This is only allowed when "reassoc" is present on the fadd.
As discussed in D80801, this transform goes beyond
what is allowed by "contract" FMF (-ffp-contract=fast).
That is because we are fusing the trailing add of 'E' with a
multiply, but without "reassoc", the code mandates that the
products A*B and C*D are added together before adding in 'E'.
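A small numeric example shows why the fold needs "reassoc": the two forms round at different points. With A = 1, B = 1, C = 1e16, D = 1, E = -1e16 in double precision, the original form rounds 1 + 1e16 back to 1e16 (ties-to-even, since adjacent doubles there are 2 apart) before adding E. The folded form cancels C*D against E exactly first.

```cpp
#include <cassert>
#include <cmath>

// Original: fadd (fma A, B, (fmul C, D)), E
double beforeFold(double a, double b, double c, double d, double e) {
  double cd = c * d;             // fmul C, D
  double t = std::fma(a, b, cd); // A*B + C*D, rounded once
  return t + e;                  // trailing add of E, rounded again
}

// Folded: fma A, B, (fma C, D, E)
double afterFold(double a, double b, double c, double d, double e) {
  return std::fma(a, b, std::fma(c, d, e));
}
```

Here `beforeFold(1.0, 1.0, 1e16, 1.0, -1e16)` yields 0.0, while `afterFold` with the same arguments yields 1.0.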
I've added this example to the LangRef to try to clarify the
meaning of "contract". If that seems reasonable, we should
probably do something similar for the clang docs because
there does not appear to be any formal spec for the behavior
of -ffp-contract=fast.
Differential Revision: https://reviews.llvm.org/D82499
These test cases fail to use vpternlog because the AND was converted
to a blend shuffle and then converted back to AND during shuffle lowering.
This results in the AND having a different type than it started with.
This prevents our custom matching logic from seeing the two logic ops.
It is possible that the LowerSwitch pass leaves certain blocks
unreachable from the entry. If not removed, these dead blocks
can cause undefined behavior in the subsequent passes.
It caused a crash in the AMDGPU backend after the instruction
selection when a PHI node has its incoming values coming from
these unreachable blocks.
In the AMDGPU pass flow, the last invocation of UnreachableBlockElim
precedes the point where LowerSwitch is currently placed, so these
blocks missed out on the opportunity to be eliminated.
This patch ensures that the LowerSwitch pass gets inserted earlier
to make use of the existing unreachable block elimination pass.
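For context, this is the kind of analysis unreachable block elimination relies on, sketched generically rather than as UnreachableBlockElim's actual code. The CFG is a hypothetical list of successor indices with block 0 as the entry. An iterative DFS marks every block reachable from the entry, and unmarked blocks are dead even if they still branch into live blocks. This is how such blocks can leave stale PHI incoming values behind.

```cpp
#include <cassert>
#include <vector>

// succs[b] lists the successor block indices of block b; block 0 is the entry.
std::vector<bool> reachableFromEntry(const std::vector<std::vector<int>>& succs) {
  std::vector<bool> seen(succs.size(), false);
  if (succs.empty())
    return seen;
  std::vector<int> worklist{0};
  seen[0] = true;
  while (!worklist.empty()) {
    int b = worklist.back();
    worklist.pop_back();
    for (int s : succs[b]) {
      if (!seen[s]) {
        seen[s] = true;
        worklist.push_back(s);
      }
    }
  }
  return seen; // false entries are blocks that can be deleted
}
```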
Reviewed By: sameerds, arsenm
Differential Revision: https://reviews.llvm.org/D83584
P9 is currently the only one with an InstrSchedModel, but we may have more in
the future, so we should not hardcode this to P9; check hasInstrSchedModel
instead.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D83590
In BUILD_VECTOR lowering, we used to generally prefer using splats
over v128.const instructions because v128.const has a very large
encoding. However, in d5b7a4e2e8 we switched to preferring consts
because they are expected to be more efficient in engines. This patch
updates the ISel patterns to match this current preference.
Differential Revision: https://reviews.llvm.org/D83581
This doesn't appear to be used for anything, and is emitted incorrectly
based on the description. This also depends on the IR type and the
pointee element type.