Summary:
It can be the case that a vector type is legal but the corresponding
scalar type is not legal for an architecture (i8 vs. v16i8 on AArch64).
If we are running after the type legalizer, check that the scalar type
created when folding
truncate(build_vector(x,y)) -> build_vector(truncate(x),truncate(y))
is legal.
This fixes https://github.com/android/ndk/issues/1207.
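A minimal sketch of the added guard, assuming the usual DAGCombiner
context (names are illustrative, not the exact patch):

  // Inside the truncate(build_vector) fold: VT is the result vector type.
  // Bail out if we are past type legalization and the truncated scalar
  // type is not legal for the target.
  EVT SVT = VT.getScalarType();
  if (LegalTypes && !TLI.isTypeLegal(SVT))
    return SDValue();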
Reviewers: RKSimon, srhines
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76312
Adds/changes some types in the ByVal cc test so that they aren't all
structs of arrays of bytes, and adds testing for passing multiple
ByVal arguments.
The combine tries to put the broadcast in either the integer or
fp domain to match the bitcast domain. But we can only do this
if the broadcast size is 32 bits or larger.
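A rough sketch of the size guard, assuming the usual X86 DAG-combine
shape (illustrative only):

  // Only broadcasts of 32 bits or wider can be moved between the
  // integer and fp domains, so skip the combine otherwise.
  if (VT.getScalarSizeInBits() < 32)
    return SDValue();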
FinishThunk, and the invariant of setting and then unsetting
CurCodeDecl, was added in 7f416cc426 (2015). The invariant didn't
exist when I added this musttail codepath in ab2090d107 (2014).
Recently in 28328c3771, I started using this codepath on non-Windows
platforms, and users reported problems during release testing (PR44987).
The issue was already present for users of EH on i686-windows-msvc, so I
added a test for that case as well.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D76444
As discovered by a downstream user, the CallGraph ignores
callees unless they are defined. This seems foolish, and prevents
combining the report with other reports to create unified reports.
Additionally, declarations contain information that is likely useful to
consumers of the CallGraph.
This patch addresses that by splitting the includeInGraph function into
two versions: the current one, plus one that is for callees only. The
only difference currently is that includeInGraph checks for a body and
then calls includeCalleeInGraph.
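A rough sketch of the resulting split, assuming clang's CallGraph
interfaces (exact signatures may differ):

  // includeInGraph keeps the old body check; the callee-only variant
  // applies the remaining criteria without requiring a definition.
  bool CallGraph::includeInGraph(const Decl *D) {
    if (!D->hasBody())
      return false;
    return includeCalleeInGraph(D);
  }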
Differential Revision: https://reviews.llvm.org/D76435
The sll/srl/sra scalar vector shifts can be replaced with generic shifts if the shift amount is known to be in range.
This also required public DemandedElts variants of llvm::computeKnownBits to be exposed (PR36319).
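For reference, a minimal sketch of calling the newly public overload
(trailing parameters omitted; see ValueTracking.h for the full
signature):

  // Given: const Value *V, unsigned NumElts, const DataLayout &DL.
  // Query known bits for only the demanded lanes of a vector value.
  APInt DemandedElts = APInt::getAllOnesValue(NumElts);
  KnownBits Known = llvm::computeKnownBits(V, DemandedElts, DL);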
Summary:
I've implemented them as target-specific IR intrinsics rather than
using `@llvm.experimental.vector.reduce.add`, on the grounds that the
'experimental' intrinsic doesn't currently have much code generation
benefit, and my replacements encapsulate the sign- or zero-extension
so that you don't expose the illegal MVE vector type (`<4 x i64>`) in
IR.
The machine instructions come in two versions: with and without an
input accumulator. My new IR intrinsics, like the 'experimental' one,
don't take an accumulator parameter: we represent that by just adding
on the input value using an ordinary i32 or i64 add. So if you write
the `vaddvaq` C-language intrinsic with an input accumulator of zero,
it can be optimised to VADDV, and conversely, if you write something
like `x += vaddvq(y)` then that can be combined into VADDVA.
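For example, with the ACLE MVE intrinsics (a sketch; the expected
instruction selection is noted in the comments):

  #include <arm_mve.h>

  int32_t f(int32_t x, int32x4_t y) {
    return x + vaddvq_s32(y);  // add folds in: expect a single VADDVA
  }

  int32_t g(int32x4_t y) {
    return vaddvaq_s32(0, y);  // zero accumulator: optimisable to VADDV
  }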
Most of this is achieved in isel lowering, by converting these IR
intrinsics into the existing `ARMISD::VADDV` family of custom SDNode
types. For the difficult case (64-bit accumulators), isel lowering
already implements the optimization of folding an addition into a
VADDLV to make a VADDLVA; so once we've made a VADDLV, our job is
already done, except that I had to introduce a parallel set of ARMISD
nodes for the //predicated// forms of VADDLV.
For the simpler VADDV, we handle the predicated form by just leaving
the IR intrinsic alone and matching it in an ordinary dag pattern.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76491
Summary:
I've implemented these as target-specific IR intrinsics, because
they're not //quite// enough like @llvm.experimental.vector.reduce.min
(which doesn't take the extra scalar parameter). Also this keeps the
predicated and unpredicated versions looking similar, and the
floating-point minnm/maxnm versions fold into the same schema.
We had a couple of min/max reductions already implemented, from the
initial pathfinding exercise in D67158. Those were done by having
separate IR intrinsic names for the signed and unsigned integer
versions; as part of this commit, I've changed them to use a flag
parameter indicating signedness, which is how we ended up deciding
that the rest of the MVE intrinsics family ought to work. So now
hopefully the whole lot is consistent.
In the new llc test, the output code from the `v8f16` test functions
looks quite unpleasant, but most of it is PCS lowering (you can't pass
a `half` directly in or out of a function). In other circumstances,
where you do something else with your `half` in the same function, it
doesn't look nearly as nasty.
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76490
Summary:
This increases the coverage for things that differ between Linux and Windows, such as `-fdelayed-template-parsing`. This would have prevented the rollback of https://reviews.llvm.org/D76346.
While at it, update -std=c++11 to c++17 for the test.
Reviewers: gribozavr2
Reviewed By: gribozavr2
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76497
The zero-sized structs force creation of a stack object of size 1, align
8, in the locals area, but otherwise have no effect on the calling
convention code, i.e. they consume no registers or stack space in the
parameter save area. The 32-bit codegen has 8 bytes of padding to fit the
new stack object, so the stack size stays the same. The 64-bit codegen has
no padding in the allocated stack frames, so 8 bytes are added, and
because of the 16-byte stack alignment, the stack size increases from 112
bytes to 128 (112 + 8 = 120, rounded up to the next 16-byte boundary).
Summary:
For some reason, the order in which we call getNegatedExpression
for the involved operands, after a call to isCheaperToUseNegatedFPOps,
seems to matter. This patch includes a new test case in
test/CodeGen/X86/fdiv.ll that crashes if we reverse the order of
those calls. Before this patch, that could happen depending on
which compiler was used when building LLVM. With my GCC
version (7.4.0) I got the crash, because it apparently evaluates
the arguments in a different order than Clang does.
All other users of isCheaperToUseNegatedFPOps already used this
pattern with unfolded/ordered calls to getNegatedExpression, so
this patch is aligning visitFDIV with the other use cases.
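A minimal illustration of the two patterns (signature abbreviated; a
sketch, not the exact visitFDIV code):

  // Nested calls: C++ leaves the evaluation order of the two
  // getNegatedExpression calls unspecified:
  //   return DAG.getNode(ISD::FDIV, DL, VT,
  //                      TLI.getNegatedExpression(N0, ...),
  //                      TLI.getNegatedExpression(N1, ...));
  // Unfolded/ordered calls, as visitFDIV now does:
  SDValue NegN0 = TLI.getNegatedExpression(N0, DAG, LegalOps, ForCodeSize);
  SDValue NegN1 = TLI.getNegatedExpression(N1, DAG, LegalOps, ForCodeSize);
  return DAG.getNode(ISD::FDIV, DL, VT, NegN0, NegN1);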
This patch simply deals with the non-determinism for FDIV; the
underlying problem with getNegatedExpression is discussed further in
D76439.
Reviewers: spatel, RKSimon
Reviewed By: spatel
Subscribers: hiraditya, mgrang, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76319
Summary:
We introduced a way to fall back to the immediately larger size class for
the Primary in the event a region was full, but in the event of the
largest size class, we would just fail.
This change allows falling back to the Secondary when the last region of
the Primary is full. We also extend the trick to all platforms, as opposed
to it being Android-only, and update the test to cover the new case.
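A rough sketch of the resulting policy (not Scudo's actual code; names
are illustrative):

  // Try the requested class; if its region is full, fall back to the
  // next larger class, and from the largest class to the Secondary.
  void *allocate(uptr Size, uptr ClassId) {
    if (void *P = Primary.allocate(ClassId))
      return P;
    if (ClassId < LargestClassId)
      return Primary.allocate(ClassId + 1);
    return Secondary.allocate(Size);
  }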
Reviewers: hctim, cferris, eugenis, morehouse, pcc
Subscribers: #sanitizers, llvm-commits
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D76430
Summary:
Currently we custom select add/sub with carry out to the scalar form, relying on replacing them with the vector form later if necessary.
This change enables custom selection code to take the divergence of adde/addc SDNodes into account and select the appropriate form in one step.
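Sketchily, the selection decision can now consult divergence directly
(opcode names here are illustrative, not the exact patch):

  // Pick the SALU or VALU carry-out form based on the node's divergence.
  unsigned Opc = N->isDivergent() ? AMDGPU::V_ADD_CO_U32_e64  // vector
                                  : AMDGPU::S_ADD_U32;        // scalar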
Reviewers: arsenm, vpykhtin, rampitec
Reviewed By: arsenm, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa
Differential Revision: https://reviews.llvm.org/D76371
Summary:
This patch implements the following CDE intrinsics:
int8x16_t __arm_vreinterpretq_s8_u8 (uint8x16_t in);
uint16x8_t __arm_vreinterpretq_u16_u8 (uint8x16_t in);
int16x8_t __arm_vreinterpretq_s16_u8 (uint8x16_t in);
uint32x4_t __arm_vreinterpretq_u32_u8 (uint8x16_t in);
int32x4_t __arm_vreinterpretq_s32_u8 (uint8x16_t in);
uint64x2_t __arm_vreinterpretq_u64_u8 (uint8x16_t in);
int64x2_t __arm_vreinterpretq_s64_u8 (uint8x16_t in);
float16x8_t __arm_vreinterpretq_f16_u8 (uint8x16_t in);
float32x4_t __arm_vreinterpretq_f32_u8 (uint8x16_t in);
These intrinsics are header-only because they reuse the existing
MVE vreinterpret clang built-ins.
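A plausible shape for one of these header-only definitions (illustrative;
not the exact arm_cde.h contents):

  static __inline__ int8x16_t __attribute__((__always_inline__, __nodebug__))
  __arm_vreinterpretq_s8_u8(uint8x16_t __in) {
    return vreinterpretq_s8_u8(__in);  // reuse the MVE built-in
  }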
This set is slightly different from the published specification
(see https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf):
it includes
int8x16_t __arm_vreinterpretq_s8_u8 (uint8x16_t in);
which was unintentionally omitted from the spec, and
does not include
float64x2_t __arm_vreinterpretq_f64_u8 (uint8x16_t in);
The float64x2_t type requires additional implementation
effort, and we are not including it yet.
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76300
Summary:
This patch implements the following intrinsics:
uint8x16_t __arm_vcx1q_u8 (int coproc, uint32_t imm);
T __arm_vcx1qa(int coproc, T acc, uint32_t imm);
T __arm_vcx2q(int coproc, T n, uint32_t imm);
uint8x16_t __arm_vcx2q_u8(int coproc, T n, uint32_t imm);
T __arm_vcx2qa(int coproc, T acc, U n, uint32_t imm);
T __arm_vcx3q(int coproc, T n, U m, uint32_t imm);
uint8x16_t __arm_vcx3q_u8(int coproc, T n, U m, uint32_t imm);
T __arm_vcx3qa(int coproc, T acc, U n, V m, uint32_t imm);
Most of them are polymorphic. Furthermore, some intrinsics are
polymorphic by 2 or 3 parameter types; such polymorphism is not
supported by the existing MVE/CDE TableGen backends, and we don't
really want a combinatorial explosion caused by 1000 different
combinations of 3 vector types. Because of this, some intrinsics are
implemented as macros involving a cast of the polymorphic arguments to
uint8x16_t.
The IR intrinsics are even more restricted in terms of types: all MVE
vectors are cast to v16i8.
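A hedged sketch of the macro approach (the real header also reinterprets
the result back to the first argument's vector type; the details here
are illustrative):

  // Cast the polymorphic vector operands to uint8x16_t so one concrete
  // built-in serves every combination of vector types.
  #define __arm_vcx3q(cp, n, m, imm) \
    __arm_vcx3q_u8((cp), (uint8x16_t)(n), (uint8x16_t)(m), (imm))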
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76299
Summary:
This change implements ACLE CDE intrinsics that translate to
instructions working with general-purpose registers.
The specification is available at
https://static.docs.arm.com/101028/0010/ACLE_2019Q4_release-0010.pdf
Each ACLE intrinsic gets a corresponding LLVM IR intrinsic (because
they have distinct function prototypes). Dual-register operands are
represented as pairs of i32 values. Because of this, the instruction
selection for these intrinsics cannot be represented as TableGen
patterns and requires custom C++ code.
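On the selection side, the two i32 halves of a dual-register operand
plausibly get re-paired along these lines (a sketch; ARMISelDAGToDAG has
a helper of roughly this shape):

  // Given: SDValue Lo, Hi holding the two i32 halves.
  // Recombine them into a GPRPair operand for the CDE instruction.
  SDNode *Pair = createGPRPairNode(MVT::Untyped, Lo, Hi);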
Reviewers: simon_tatham, MarkMurrayARM, dmgreen, ostannard
Reviewed By: MarkMurrayARM
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76296
Prior to this change, for non-relocatable objects llvm-readobj would
assume that all symbols that corresponded to a stack size section's
entries were in the section specified by the section's sh_link field.
In the presence of an output section description combining
SHF_LINK_ORDER sections linking different output sections, this cannot
be respected, since linker script section patterns are "by name" by
nature. Consequently, the sh_link value would not be correct for all
section entries.
This patch changes llvm-readobj to ignore the section of symbols in a
non-relocatable object.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45228.
Reviewed by: grimar, MaskRay
Differential Revision: https://reviews.llvm.org/D76425
Summary:
This increases the coverage for things that differ between Linux and Windows, such as `-fdelayed-template-parsing`. This would have prevented the rollback of https://reviews.llvm.org/D76346.
While at it, update -std=c++11 to c++17 for the test.
Reviewers: gribozavr2
Reviewed By: gribozavr2
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76433
Summary:
This patch improves step over performance for the case when we are
stepping over a call with a next-branch-breakpoint (see
https://reviews.llvm.org/D58678), and we encounter a stop during the
call. Currently, this causes the thread plan to step out of //each
frame// until it reaches the step-over range. This is a regression
introduced by
https://reviews.llvm.org/D58678 (which did improve other things!). Prior
to that change, the step-over plan would always step-out just once.
With this patch, if we find ourselves stopped in a deeper stack frame
and we already have a next branch breakpoint, we simply return from the
step-over plan's ShouldStop handler without pushing the step out plan.
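Roughly, the new early return looks like this (a sketch of the idea,
not the exact lldb code; member names are approximate):

  // In the step-over plan's ShouldStop: if we stopped in a frame younger
  // than the step-over range and the next-branch breakpoint is still
  // set, just resume instead of queueing a per-frame step-out plan.
  if (frame_is_younger && m_next_branch_bp_sp)
    return false;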
In my experiments this improved the time of stepping over a call that
loads 12 DLLs from 14s to 5s. This was in a remote debugging scenario
with a 10ms RTT; the call in question was Vulkan initialization
(vkCreateInstance), which loads various driver DLLs. Loading those DLLs
must stop on the rendezvous breakpoint, causing the perf problem
described above.
Reviewers: clayborg, labath, jingham
Reviewed By: jingham
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D76216
It seems we do not test well how we print relocation addends.
The behavior of the dumpers does not seem ideal here
(and llvm-readelf does not match GNU readelf, as the test case shows).
This patch adds a test case to document the current behavior.
Differential Revision: https://reviews.llvm.org/D75671
Summary:
Changes:
- Handle immediate invocations for constructors.
- Add tests.
After this patch, I believe the implementation of consteval is nearly
standard-compliant, but IR-gen still needs to be taught not to emit
consteval declarations.
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: wchilders
Differential Revision: https://reviews.llvm.org/D74007
This reverts commit e9f22fd429.
When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70
failing tests with this revision, and 29 without this revision.
The MVE VDUP instruction takes a GPR and splats it into every lane of a
vector register. Unlike NEON, we do not have a VDUPLANE equivalent
instruction for doing the same splat from an fp register. Previously a
VDUP to a v4f32/v8f16 would be represented as a (v4f32 VDUP f32), which
would mean the instruction pattern needs to add a COPY_TO_REGCLASS to
the GPR.
Instead, this is now handled earlier, during an ISel DAG combine that
converts (VDUP x) to (VDUP (bitcast x)). This can allow instruction
selection to tell that the input needs to be an i32, which in one of the
testcases allows it to use ldr (or specifically ldm) over (vldr;vmov).
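The combine, roughly (a simplified sketch of the f32 case):

  // Give VDUP an i32 input so instruction selection knows the value
  // must live in a GPR: (VDUP f32 x) -> (VDUP (bitcast x to i32)).
  if (Op.getValueType() == MVT::f32) {
    SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::i32, Op);
    return DAG.getNode(ARMISD::VDUP, DL, N->getValueType(0), Cast);
  }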
Whilst this is simple enough for floats, as the type sizes are the same,
there is no BITCAST equivalent for getting a half into an i32. This uses
a VMOVrh ARMISD node, which doesn't know the same tricks yet.
Differential Revision: https://reviews.llvm.org/D76292
Floating point positive zero can be selected using fmv.w.x / fmv.d.x /
fcvt.d.w and the zero source register.
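For example (a sketch of the expected selection):

  // Returning +0.0 can now select a single move from the zero register:
  //   fmv.w.x  fa0, zero
  float positive_zero(void) { return 0.0f; }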
Differential Revision: https://reviews.llvm.org/D75729
If a call argument has the "returned" attribute, we can simplify
the call to the value of that argument. For example, given
`%r = call i8* @f(i8* returned %p)`, `%r` can be replaced by `%p`.
This was already partially handled by InstSimplify/InstCombine for the
case where the argument is an integer constant, and the result is thus
known via known bits. The non-constant (or non-int) argument cases
weren't handled though.
This previously landed as an InstSimplify transform, but was reverted
due to assertion failures when compiling the Linux kernel. The reason
is that simplifying a call to another call breaks assumptions in
call graph updating during inlining. As the code is not easy to fix,
and there is no particularly strong motivation for having this in
InstSimplify, the transform is only performed in InstCombine instead.
Differential Revision: https://reviews.llvm.org/D75815
This is the same change as D75824, but for two cases where
InstCombine performs the same optimization: Replacing an instruction
whose bits are fully known with a constant. This is not (generally)
legal for musttail calls.
Differential Revision: https://reviews.llvm.org/D76457
Summary:
This patch splits the Basic test into multiple individual tests to allow
simpler filtering and a clearer signal about what's broken when something
breaks.
Reviewers: gribozavr2
Reviewed By: gribozavr2
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D76366
This patch sets the stage for supporting both row and column major
layouts for matrices. It renames ColumnMatrixTy to MatrixTy, adds
booleans indicating the underlying layout to both MatrixTy and ShapeInfo
and generalizes the methods of MatrixTy to support both row and column
major layouts.
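The reworked type plausibly looks like this (field names are
assumptions, not the exact lowering code):

  // MatrixTy now carries its layout instead of assuming column-major.
  struct MatrixTy {
    SmallVector<Value *, 16> Vectors; // the rows or the columns
    bool IsColumnMajor = true;        // layout flag, mirrored in ShapeInfo
  };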
Reviewers: Gerolf, anemet, andrew.w.kaylor, LuoYuanke
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D76324
For MemoryPhis, we have to ensure that the MemoryPhi cannot be executed
before the access we are currently looking at.
To do this we do a post-order numbering of the basic blocks in the
function and bail out once we reach a MemoryPhi with a larger (or equal)
post-order block number than the current MemoryAccess.
This changes the order in which we visit stores for elimination.
This patch also adds support for exploring multiple paths. We keep a worklist (ToCheck) of memory accesses that might be eliminated by our starting MemoryDef or MemoryPhis for further exploration. For MemoryPhis, we add the incoming values to the worklist, for MemoryDefs we add the defining access.
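A sketch of the bail-out (names approximate the ones in
DeadStoreElimination; a sketch, not the exact code):

  // Given: MemoryAccess *Current and the block of the starting access.
  // PostOrderNumbers maps each BasicBlock to its post-order index.
  if (auto *Phi = dyn_cast<MemoryPhi>(Current))
    if (PostOrderNumbers[Phi->getBlock()] >= PostOrderNumbers[StartBlock])
      return None;  // the phi may execute before the starting access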
Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D72148
If there were no free VGPRs we would need two emergency spill slots for register
scavenging during PEI/frame index elimination. Reuse 'ResultReg' for scale
calculation so that only one spill is needed.
Differential Revision: https://reviews.llvm.org/D76387
Apart from the argument registers, set the CostPerUse
value as per the ratio reg_index/allocation_granularity.
This is a pre-commit for introducing the scratch registers
in the ABI. This change should help achieve a more balanced
register allocation.
Differential Revision: https://reviews.llvm.org/D76417
Currently, when the final suspend point can be simplified by
simplifySuspendPoint, handleFinalSuspend is also executed to remove the
last case in the switch instruction. This patch fixes that.
Differential Revision: https://reviews.llvm.org/D76345
Pass the small data limit to RISCVELFTargetObjectFile via a module flag
so the backend can set the small data section threshold accordingly.
Data smaller than the threshold will be placed in the small data
section.
Differential Revision: https://reviews.llvm.org/D57497