Summary:
* Remove a bunch of asserts checking for unsupported scalable types and
add some more now that they are supported.
* Propagate the scalable flag where necessary.
* Add another `EVT::getExtendedVectorVT` method that takes an
ElementCount parameter.
* Add `EVT::isExtendedScalableVector` and
`EVT::getExtendedVectorElementCount` - latter is currently unused.
Reviewers: sdesmalen, efriedma, rengolin, craig.topper, huntergr
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75672
Summary:
Related to D75672, this patch adds EVT::isFixedLengthVector to determine
if the underlying vector type is of fixed length.
An assert is introduced in EVT::getVectorNumElements that triggers for
types that aren't fixed length. This is currently guarded by a flag
added D75297 that is off by default and has been renamed to the more
generic ENABLE_STRICT_FIXED_SIZE_VECTORS.
Ideally we want to get rid of getVectorNumElements but a quick grep
shows there are >350 uses in lib/CodeGen and 75 in lib/Target/AArch64
alone. All of these probably aren't EVT::getVectorNumElements (some may
be the MVT equivalent), but there are many places to fixup and having
the assert on by default would make the SVE upstreaming effort
difficult.
Reviewers: sdesmalen, efriedma, ctetreau, huntergr, rengolin
Reviewed By: efriedma
Subscribers: mgorny, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76376
Gather/scatter don't access one memory location, they access multiple disjoint locations. So using a fixed size isn't accurate. But we don't have a way to represent the true behavior so just use UnknownSize.
Previously we "split" the memory VT and use that size for the MMO of each half. But the memory VT is scalar so splitting usually just returned the original scalar VT, but on 32-bit X86 if the scalar VT was i64 it probably returned i32?
Differential Revision: https://reviews.llvm.org/D76388
The existence of the class is more confusing than helpful, I think; the
commonality is mostly just "GEP is legal", which can be queried using
APIs on GetElementPtrInst.
Differential Revision: https://reviews.llvm.org/D75660
SelectionDAG CSEs nodes based on their result type and operands, but not their flags. The flags are expected to be intersected when they are CSEd. In SelectionDAGBuilder, for FP nodes we manage both the fast math flags and the nofpexcept flag after the nodes have already been CSEd when they were created with getNode. The management of the fastmath flags before the constrained nodes prevents the nofpexcept management from working correctly.
This commit moves the FMF handling for constrained intrinsics into their visitor and disables the common FMF handling for these nodes.
Differential Revision: https://reviews.llvm.org/D75224
This patch generates TableGen descriptions for the specified register
banks which contain a list of register sizes corresponding to the
available HwModes. The appropriate size is used during codegen according
to the current HwMode. As this HwMode was not available on generation,
it is set upon construction of the RegisterBankInfo class. Targets
simply need to provide the HwMode argument to the
<target>GenRegisterBankInfo constructor.
The RISC-V RegisterBankInfo constructor has been updated accordingly
(plus an unused argument removed).
Differential Revision: https://reviews.llvm.org/D76007
This ports some combines from DAGCombiner.cpp which perform some trivial
transformations on instructions with undef operands.
Not having these can make it extremely annoying to find out where we differ
from SelectionDAG by looking at existing lit tests. Without them, we tend to
produce pretty bad code generation when we run into instructions which use
undef operands.
Also remove the nonpow2_store_narrowing testcase from arm64-fallback.ll, since
we no longer fall back on the add.
Differential Revision: https://reviews.llvm.org/D76339
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
When optimising for code size at the expense of performance, it is often
worth saving and restoring some of r0-r3, if IPRA will be able to take
advantage of them. This doesn't cost any extra code size if we already
have a PUSH/POP pair, and increases the number of available registers
across any calls to the function.
We already have an optimisation which tries fold the subtract/add of the
SP into the PUSH/POP by using extra registers, which somewhat conflicts
with this. I've made the new optimisation less aggressive in cases where
the existing one is likely to trigger, which gives better results than
either of these optimisations by themselves.
Differential revision: https://reviews.llvm.org/D69936
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76348
Skip debug instructions before calling functions not expecting them.
In particular, LIS.getInstructionIndex(*mi) would fail if mi was a debg instr.
Differential Revision: https://reviews.llvm.org/D76129
If it is a*b-c*d, it could be also folded into fma(a, b, -c*d) or fma(-c, d, a*b).
This patch is trying to respect the uses of a*b and c*d to make the best choice.
Differential Revision: https://reviews.llvm.org/D75982
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
ISD::ROTL/ROTR rotation values are guaranteed to act as a modulo amount, so for power-of-2 bitwidths we only need the lowest bits.
Differential Revision: https://reviews.llvm.org/D76201
When compiling
```
struct S {
float w;
};
void f(long w, long b);
void g(struct S s) {
int w = s.w;
f(w, w*4);
}
```
I get Assertion failed: ((!CombinedExpr || CombinedExpr->isValid()) && "Combined debug expression is invalid").
That's because we combine two epxressions that both end in DW_OP_stack_value:
```
(lldb) p Expr->dump()
!DIExpression(DW_OP_LLVM_convert, 32, DW_ATE_signed, DW_OP_LLVM_convert, 64, DW_ATE_signed, DW_OP_stack_value)
(lldb) p Param.Expr->dump()
!DIExpression(DW_OP_constu, 4, DW_OP_mul, DW_OP_LLVM_convert, 32, DW_ATE_signed, DW_OP_LLVM_convert, 64, DW_ATE_signed, DW_OP_stack_value)
(lldb) p CombinedExpr->isValid()
(bool) $0 = false
(lldb) p CombinedExpr->dump()
!DIExpression(4097, 32, 5, 4097, 64, 5, 16, 4, 30, 4097, 32, 5, 4097, 64, 5, 159, 159)
```
I believe that in this particular case combining two stack values is
safe, but I didn't want to sink the special handling into
DIExpression::append() because I do want everyone to think about what
they are doing.
Patch by Adrian Prantl.
Fixes PR45181.
rdar://problem/60383095
Differential Revision: https://reviews.llvm.org/D76164
RDF is designed to be target agnostic. Therefore it would be useful to have it available for other targets, such as X86.
Based on a previous patch by Krzysztof Parzyszek
Differential Revision: https://reviews.llvm.org/D75932
I believe we were previously calculating a pointer info with the scalar base and an offset of 0. But that's not really where the gather is pointing. The offset is a function of the indices of the GEP we looked through.
Also set the size of the MachineMemOperand to UnknownSize
Differential Revision: https://reviews.llvm.org/D76157
Summary: The following change is to allow the machine outlining can be applied for Nth times, where N is specified by the compiler option. By default the value of N is 1. The motivation is that the repeated machine outlining can further reduce code size. Please refer to the presentation "Improving Swift Binary Size via Link Time Optimization" in LLVM Developers' Meeting in 2019.
Reviewers: aschwaighofer, tellenbach, paquette
Reviewed By: paquette
Subscribers: tellenbach, hiraditya, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71027
Under certain circumstances we'll end up in the position where the negated shift amount will get truncated to the type specified getScalarShiftAmountTy(), so we need to test for a truncated version of the shift amount as well.
This allows us to remove half of the remaining patterns tested for by X86ISelLowering's combineOrShiftToFunnelShift.
MCTargetOptionsCommandFlags.inc and CommandFlags.inc are headers which contain
cl::opt with static storage.
These headers are meant to be incuded by tools to make it easier to parametrize
codegen/mc.
However, these headers are also included in at least two libraries: lldCommon
and handle-llvm. As a result, when creating DYLIB, clang-cpp holds a reference
to the options, and lldCommon holds another reference. Linking the two in a
single executable, as zig does[0], results in a double registration.
This patch explores an other approach: the .inc files are moved to regular
files, and the registration happens on-demand through static declaration of
options in the constructor of a static object.
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1756977#c5
Differential Revision: https://reviews.llvm.org/D75579
With -fstack-protector-strong we check if a non-array variable has its address
taken in a way that could cause a potential out-of-bounds access. However what
we don't catch is when the address is directly used to create an out-of-bounds
memory access.
Fix this by examining the offsets of GEPs that are ultimately derived from
allocas and checking if the resulting address is out-of-bounds, and by checking
that any memory operations using such addresses are not over-large.
Fixes PR43478.
Differential revision: https://reviews.llvm.org/D75695
This is the second patch in a series of patches to enable basic block
sections support.
This patch adds support for:
* Creating direct jumps at the end of basic blocks that have fall
through instructions.
* New pass, bbsections-prepare, that analyzes placement of basic blocks
in sections.
* Actual placing of a basic block in a unique section with special
handling of exception handling blocks.
* Supports placing a subset of basic blocks in a unique section.
* Support for MIR serialization and deserialization with basic block
sections.
Parent patch : D68063
Differential Revision: https://reviews.llvm.org/D73674
I used the implementation for floor instead of round. It also turns
out the OpenCL builtin library wasn't using the round builtin, but
implemented the expanded form.
Summary:
This is a part of the series of efforts for correcting alignment of memory operations.
(Another related bugs: https://bugs.llvm.org/show_bug.cgi?id=44388 , https://bugs.llvm.org/show_bug.cgi?id=44543 )
This fixes https://bugs.llvm.org/show_bug.cgi?id=43880 by giving default alignment of loads to 1.
The test CodeGen/AArch64/bcmp-inline-small.ll should have been changed; it was introduced by https://reviews.llvm.org/D64805 . I talked with @evandro, and confirmed that the test is okay to be changed.
Other two tests from PowerPC needed changes as well, but fixes were straightforward.
Reviewers: courbet
Reviewed By: courbet
Subscribers: nlopes, gchatelet, wuzish, nemanjai, kristof.beyls, hiraditya, steven.zhang, danielkiss, llvm-commits, evandro
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76113
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not frozen, instsimplify or the later pipeline can potentially exploit undefined behavior.
The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).
Reviewers: jdoerfert, spatel, lebedev.ri, efriedma
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76179
Followup to D75114, this patch reuses the existing MatchRotate ROTL/ROTR rotation pattern code to also recognize the more general FSHL/FSHR funnel shift patterns when we have variable shift amounts, matched with MatchFunnelPosNeg which acts in an (almost) equivalent manner to MatchRotatePosNeg.
Summary:
This is a simple fix for CodeGenPrepare that freezes branch condition when transforming select to branch.
If it is not freezed, instsimplify or the later pipeline can potentially exploit undefined behavior.
The diff shows optimized form becase D75859 and D76048 already made a few changes to CodeGenPrepare for optimizing freeze(cmp).
Reviewers: jdoerfert, spatel, lebedev.ri, efriedma
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76179
Summary:
This is a simple patch that expands https://reviews.llvm.org/D75859 to pointer comparison and fcmp
Checked with Alive2
Reviewers: reames, jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76048
LLVM currently supports CSK_MD5 and CSK_SHA1 source file checksums in
debug info. This change adds support for CSK_SHA256 checksums.
The SHA256 checksums are supported by the CodeView debug format.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D75785
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76043
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
* aarch64_sve_ldnt1_gather_index
* aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.
As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.
The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
* GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
* SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1
The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75601
Summary: When narrowing a scalar G_EXTRACT where the destination lines up perfectly with a single result of the emitted G_UNMERGE_VALUES a COPY should be emitted instead of unconditionally trying to emit a G_MERGE_VALUES.
Reviewers: arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75743