Summary:
New `@test13` in `Attributor/align.ll` is the main motivation - `null` pointer
really does not limit our alignment knowledge, in fact it is fully aligned
since it has no bits set.
Here we don't special-case `null` pointer because it is somewhat controversial
to add one more place where we enforce that `null` pointer is zero,
but instead we do the more general thing of trying to perform constant-fold
of pointer constant to an integer, and perform alignment inferrment on that.
Reviewers: jdoerfert, gchatelet, courbet, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, arphaman, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73131
Pointers of unrecognized address spaces shoudl be treated as
global-like pointers. Even if loads and stores of them aren't handled,
dumb operations that just operate on the bits should work.
Summary:
Multivalue calls both take and return an arbitrary number of
arguments, but ISel only supports one or the other in a single
instruction. To get around this, calls are modeled as two pseudo
instructions during ISel. These pseudo instructions, CALL_PARAMS and
CALL_RESULTS, are recombined into a single CALL MachineInstr in a
custom emit hook.
RegStackification and the MC layer will additionally need to be made
aware of multivalue calls before the tests will produce correct
output.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71496
Summary:
WebAssembly is unique among upstream targets in that it does not at
any point use physical registers to store values. Instead, it uses
virtual registers to model positions in its value stack. This means
that some target-independent lowering activities that would use
physical registers need to use virtual registers instead for
WebAssembly and similar downstream targets. This CL generalizes the
existing `usesPhysRegsForPEI` lowering hook to
`usesPhysRegsForValues` in preparation for using it in more places.
One such place is in InstrEmitter for instructions that have variadic
defs. On register machines, it only makes sense for these defs to be
physical registers, but for WebAssembly they must be virtual registers
like any other values. This CL changes InstrEmitter to check the new
target lowering hook to determine whether variadic defs should be
physical or virtual registers.
These changes are necessary to support a generalized CALL instruction
for WebAssembly that is capable of returning an arbitrary number of
arguments. Fully implementing that instruction will require additional
changes that are described in comments here but left for a follow up
commit.
Reviewers: aheejin, dschuff, qcolombet
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71484
Add support for converting Signaling NaN, and a NaN Payload from string.
The NaNs (the string "nan" or "NaN") may be prefixed with 's' or 'S' for defining a Signaling NaN.
A payload for a NaN can be specified as a suffix.
It may be a octal/decimal/hexadecimal number in parentheses or without.
Differential Revision: https://reviews.llvm.org/D69773
These names have been changed from CamelCase to camelCase, but there were
many places (comments mostly) that still used the old names.
This change is NFC.
We removed UseVSXReg flag in https://reviews.llvm.org/D58685
But we did not reclain the bit 6 it was assigned,
this will become confusing and a hole later..
We should reclaim it as early as possible before new bits.
Reviewed By: sfertile
Differential Revision: https://reviews.llvm.org/D72649
This was unconditionally folding this to the source operand, even if the access was out of bounds. Use undef instead of the extract is not the first element.
This helps with some cases where 3-vectors are legalized and avoids processing the 4th component.
Original Patch by: arsenm (Matt Arsenault)
Differential Revision: https://reviews.llvm.org/D51589
Extends the gather/scatter pass in MVEGatherScatterLowering.cpp to
enable the transformation of masked scatters into calls to MVE's masked
scatter intrinsic.
Differential Revision: https://reviews.llvm.org/D72856
There's no reason to introduce a new, unnaturally sized value
here. This has a chance to produce worse code with
legalization. Avoids regression in a future patch.
a785209bc2 switched to using a pseudos instead of manually tying
operands on the regular instruction. The VGPR indexing mode path
should have the same problems that change attempted to avoid, so these
should use the same strategy.
Use a single pseudo for the VGPR indexing mode and movreld paths, and
expand it based on the subtarget later. These have essentially the
same constraints, reading the index from m0.
Switch from using an offset to the subregister index directly, instead
of computing an offset and re-adding it back. Also add missing pseudos
for existing register class sizes.
Summary:
Follow-up from https://reviews.llvm.org/D71733. Also moved an
initialization to the base class, where it belonged in the first place.
Reviewers: eraman, davidxl
Reviewed By: davidxl
Subscribers: hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72949
Document some high level strategies that should be used for register
bank selection. The constant bus restriction section hasn't actually
been implemented yet.
Differential revision: https://reviews.llvm.org/D72701
The patch adds a new option ABI for Hexagon. It primary deals with
the way variable arguments are passed and is use in the Hexagon Linux Musl
environment.
If a callee function has a variable argument list, it must perform the
following operations to set up its function prologue:
1. Determine the number of registers which could have been used for passing
unnamed arguments. This can be calculated by counting the number of
registers used for passing named arguments. For example, if the callee
function is as follows:
int foo(int a, ...){ ... }
... then register R0 is used to access the argument ' a '. The registers
available for passing unnamed arguments are R1, R2, R3, R4, and R5.
2. Determine the number and size of the named arguments on the stack.
3. If the callee has named arguments on the stack, it should copy all of these
arguments to a location below the current position on the stack, and the
difference should be the size of the register-saved area plus padding
(if any is necessary).
The register-saved area constitutes all the registers that could have
been used to pass unnamed arguments. If the number of registers forming
the register-saved area is odd, it requires 4 bytes of padding; if the
number is even, no padding is required. This is done to ensure an 8-byte
alignment on the stack. For example, if the callee is as follows:
int foo(int a, ...){ ... }
... then the named arguments should be copied to the following location:
current_position - 5 (for R1-R5) * 4 (bytes) - 4 (bytes of padding)
If the callee is as follows:
int foo(int a, int b, ...){ ... }
... then the named arguments should be copied to the following location:
current_position - 4 (for R2-R5) * 4 (bytes) - 0 (bytes of padding)
4. After any named arguments have been copied, copy all the registers that
could have been used to pass unnamed arguments on the stack. If the number
of registers is odd, leave 4 bytes of padding and then start copying them
on the stack; if the number is even, no padding is required. This
constitutes the register-saved area. If padding is required, ensure
that the start location of padding is 8-byte aligned. If no padding is
required, ensure that the start location of the on-stack copy of the
first register which might have a variable argument is 8-byte aligned.
5. Decrement the stack pointer by the size of register saved area plus the
padding. For example, if the callee is as follows:
int foo(int a, ...){ ... } ;
... then the decrement value should be the following:
5 (for R1-R5) * 4 (bytes) + 4 (bytes of padding) = 24 bytes
The decrement should be performed before the allocframe instruction.
Increment the stack-pointer back by the same amount before returning
from the function.
This should be the last step needed to solve the problem in the
description of PR44153:
https://bugs.llvm.org/show_bug.cgi?id=44153
If we're casting an FP value to int, testing its signbit, and then
choosing between a value and its negated value, that's a
complicated way of saying "copysign":
(bitcast X) < 0 ? -TC : TC --> copysign(TC, X)
Differential Revision: https://reviews.llvm.org/D72643
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet, nicolasvasilache
Subscribers: hiraditya, jfb, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, csigg, arpith-jacob, mgester, lucyrfox, herhut, liufengdb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73041
As mentioned in D72643, we'd like to be able to assert that any select
of equivalent constants has been removed before we're deep into InstCombine.
But there's a loophole in that assertion for vectors with undef elements
that don't match exactly.
This patch should close that gap. If we have undefs, we can't safely
propagate those unless both constants elements for that lane are undef.
Differential Revision: https://reviews.llvm.org/D72958
Summary:
Loop unroll spends a lot of time in SCEVs processing in case when a function
contains hundreds of simple 'for' loops with a quite complex arrays indexes like
for (int i = 0; i < 8; ++i) {
for (int j = 0; j < 32; ++j) {
C[j*8+i] = B[j*32+i+128] + A[i*64+128];
}
}
for (int i = 0; i < 8; ++i) {
for (int j = 0; j < 8; ++j) {
for (int k = 0; k < 32; ++k) {
D[k*64+i*8+j] = D[k*64+i*8+j] + E[i+16] * C[k*8+j+256];
}
}
}
The patch improves loop unroll speed since isLoopBackedgeGuardedByCond takes
much less time than isLoopEntryGuardedByCond in the edge case.
Reviewers: skatkov, sanjoy, mkazantsev
Reviewed By: sanjoy
Subscribers: fhahn, hiraditya, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72929
Summary:
A recent commit accidentally defined names like `MVE_VMAXAs8` as
instances of the multiclass `MVE_VMINA`, and vice versa. This has no
effect on the test suite, because nothing directly refers to those
instruction names (the isel patterns are generated in Tablegen using
`!cast<Instruction>(NAME)` inside a lower-level multiclass). But it
means that `llvm-mc -show-inst` was listing VMAXA as VMINA, and it
would also affect any further draft code gen patches that use those
instruction ids.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73034
The ACLE distinguishes between the following addressing modes for gather
loads:
* "scalar base, vector offset", and
* "vector base, scalar offset".
For the "vector base, scalar offset" case, the
`int_aarch64_sve_ld1_gather_imm` intrinsic was added in 79f2422d.
Currently, that intrinsic assumes that the scalar offset is passed as an
immediate. As a result, it does not cater for cases where scalar offset
is stored in a register.
In this patch `int_aarch64_sve_ld1_gather_imm` is extended so that all
cases are covered:
* `int_aarch64_sve_ld1_gather_imm` is renamed as
`int_aarch64_sve_ld1_gather_scalar_offset`
* new DAG combine rules are added for GLD1_IMM for scenarios where the
offset is a non-immediate scalar or an out-of-range immediate
* sve-intrinsics-gather-loads-vector-base.ll is renamed as
sve-intrinsics-gather-loads-vector-base-imm-offset.ll
* sve-intrinsics-gather-loads-vector-base-scalar-offset.ll is added to test
file for non-immediate offsets
Similar changes are made for scatter store intrinsics.
Reviewed By: sdesmalen, efriedma
Differential Revision: https://reviews.llvm.org/D71773
Summary: Vectorized loop processes VFxUF number of elements in one iteration thus total number of iterations decreases proportionally. In addition epilog loop may not have more than VFxUF - 1 iterations. This patch updates profile information accordingly.
Reviewers: hsaito, Ayal, fhahn, reames, silvas, dcaballe, SjoerdMeijer, mkuper, DaniilSuchkov
Reviewed By: Ayal, DaniilSuchkov
Subscribers: fedor.sergeev, hiraditya, rkruppe, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D67905
This patch uses helper function rewriteLoopExitValues that is refactored in
D72602 to rematerialise the iteration count in exit blocks, so that we can
clean-up loop update expressions inside the hardware-loops later in
ARMLowOverheadLoops, which is necessary to get actual performance gains for
tail-predicated loops.
Differential Revision: https://reviews.llvm.org/D72714
Summary: Current implementation of getLoopEstimatedTripCount returns 1 iteration less than it should. The reason is that in bottom tested loop first iteration is executed before first back branch is taken. For example for loop with !{!"branch_weights", i32 1 // taken, i32 1 // exit} metadata getLoopEstimatedTripCount gives 1 while actual number of iterations is 2.
Reviewers: Ayal, fhahn
Reviewed By: Ayal
Subscribers: mgorny, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71990
Summary:
This was reverted in 328e0f3dca due to
chromium bot failure. This revision addresses that case.
Original commit message:
Summary:
This patch will provide support for auto return type for the C++ member
functions. Before this return type of the member function is deduced and
stored in the DIE.
This patch includes llvm side implementation of this feature.
Patch by: Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: dblaikie, aprantl, shafik, alok, SouraVX, jini.susan.george
Reviewed by: dblaikie
Differential Revision: https://reviews.llvm.org/D70524
This moves `rewriteLoopExitValues()` from IndVarSimplify to LoopUtils thus
making it a generic loop utility function. This allows to rewrite loop exit
values by just calling this function without running the whole IndVarSimplify
pass.
We use this in D72714 to rematerialise the iteration count in exit blocks, so
that we can clean-up loop update expressions inside the hardware-loops later.
Differential Revision: https://reviews.llvm.org/D72602
The idea is to produce R_X86_64_PLT32 instead of
R_X86_64_PC32 for branches.
It fixes https://bugs.llvm.org/show_bug.cgi?id=44397.
This patch teaches MC to do that for JCC (jump if condition is met)
instructions. The new behavior matches modern GNU as.
It is similar to D43383, which did the same for "call/jmp foo",
but missed JCC cases.
Differential revision: https://reviews.llvm.org/D72831
This adds Post inc variants of the VLD2/4 and VST2/4 instructions in
MVE. It uses the same mechanism/nodes as Neon, transforming the
intrinsic+add pair into a ARMISD::VLD2_UPD, which gets selected to a
post-inc instruction. The code to do that is mostly taken from the
existing Neon code, but simplified as less variants are needed.
It also fills in some getTgtMemIntrinsic for the arm.mve.vld2/4
instrinsics, which allow the nodes to have MMO's, calculated as the full
length to the memory being loaded/stored.
Differential Revision: https://reviews.llvm.org/D71194
We were previously not necessarily favouring postinc for the MVE loads
and stores, leading to extra code prior to the loop to set up the
preinc. MVE in general can benefit from postinc (as we don't have
unrolled loops), and certain instructions like the VLD2's only post-inc
versions are available.
Differential Revision: https://reviews.llvm.org/D70790
StackColoring::remapInstructions() remaps MachineOperand frame index (e.g. %stack.1 -> %stack.0)
but does not remap FixedStackPseudoSourceValue frame index (e.g. store 4 into %stack.1.ap2.i.i)
referenced by MachineMemoryOperand.
This can cause an assertion failure when LiveDebugValues references a dead stack object.
It is difficult to craft a test case. -g, va_copy and stack-coloring are required.
I can only reproduce it on ppc32.
Currently PromoteMaskArithemtic only looks at a single operation to
skip casts. This means we miss cases where we combine multiple masks.
This patch updates PromoteMaskArithemtic to try to recursively promote
AND/XOR/AND nodes that terminate in truncates of the right size or
constant vectors.
Reviewers: craig.topper, RKSimon, spatel
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D72524
Except AMDGPU/R600RegisterInfo (a bunch of MIR tests seem to have
problems), every target overrides it with true. PostMachineScheduler
requires livein information. Not providing it can cause assertion
failures in ScheduleDAGInstrs::addSchedBarrierDeps().
The MaterializationResponsibility::defineMaterializing method allows clients to
add new definitions that are in the process of being materialized to the JIT.
This patch adds support to defineMaterializing for symbols with weak linkage
where the new definitions may be rejected if another materializer concurrently
defines the same symbol. If a weak symbol is rejected it will not be added to
the MaterializationResponsibility's responsibility set. Clients can check for
membership in the responsibility set via the
MaterializationResponsibility::getSymbols() method before resolving any
such weak symbols.
This patch also adds code to RTDyldObjectLinkingLayer to tag COFF comdat symbols
introduced during codegen as weak, on the assumption that these are COFF comdat
constants. This fixes http://llvm.org/PR40074.
This intention is to move patchable-function before aarch64-branch-targets
(configured in AArch64PassConfig::addPreEmitPass) so that we emit BTI before NOPs
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92424).
This also allows addPreEmitPass() passes to know the precise instruction sizes if they want.
Tried x86-64 Debug/Release builds of ccls with -fxray-instrument -fxray-instruction-threshold=1.
No output difference with this commit and the previous commit.
Summary:
I think whatever problem the gluing was fixing has long since been fixed. We don't have any of the restrictions on FP stack stuff that existed back when this was first added.
I had to change which type we use for FILD in BuildFILD when X86 was enabled because most of the isel patterns block f32/f64 instructions when SSE1/SSE2 are enabled. So I needed to use the f80 pattern, but this shouldn't have an effect the generated code since there is only one FILD instruction anyway. We already use f80 explicitly in other other places.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: andrew.w.kaylor, scanon, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72805
The x86-64 General Dynamic TLS code sequence uses prefixes to allow
linker relaxation. Adding segment override prefix or NOPs can break
linker relaxation (ld -pie/-no-pie).
i386 General Dynamic and x86-64 Local Dynamic do not use prefixes, but
for simplicity, just disable auto padding consistently.
Reviewed By: skan, LuoYuanke
Differential Revision: https://reviews.llvm.org/D72878
This makes the SectionLabel handling more resilient - specifically for
future PROPELLER work which will have more CU ranges (rather than just
one per function).
Ultimately it might be nice to make this more general/resilient to
arbitrary labels (rather than relying on the labels being created for CU
ranges & then being reused by ranges, loclists, and possibly other
addresses). It's possible that other (non-rnglist/loclist) uses of
addresses will need the addresses to be in SectionLabels earlier (eg:
move the CU.addRange to be done on function begin, rather than function
end, so during function emission they are already populated for other
use).
This change has 2 components:
Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded. That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.
WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run. Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase
The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.
This is a reland of rG3a05c3969c18 with fixes for the expensive-checks
and Windows builds
Differential Revision: https://reviews.llvm.org/D71681
Currently there are 4 different mechanisms for controlling denormal
flushing behavior, and about as many equivalent frontend controls.
- AMDGPU uses the fp32-denormals and fp64-f16-denormals subtarget features
- NVPTX uses the nvptx-f32ftz attribute
- ARM directly uses the denormal-fp-math attribute
- Other targets indirectly use denormal-fp-math in one DAGCombine
- cl-denorms-are-zero has a corresponding denorms-are-zero attribute
AMDGPU wants a distinct control for f32 flushing from f16/f64, and as
far as I can tell the same is true for NVPTX (based on the attribute
name).
Work on consolidating these into the denormal-fp-math attribute, and a
new type specific denormal-fp-math-f32 variant. Only ARM seems to
support the two different flush modes, so this is overkill for the
other use cases. Ideally we would error on the unsupported
positive-zero mode on other targets from somewhere.
Move the logic for selecting the flush mode into the compiler driver,
instead of handling it in cc1. denormal-fp-math/denormal-fp-math-f32
are now both cc1 flags, but denormal-fp-math-f32 is not yet exposed as
a user flag.
-cl-denorms-are-zero, -fcuda-flush-denormals-to-zero and
-fno-cuda-flush-denormals-to-zero will be mapped to
-fp-denormal-math-f32=ieee or preserve-sign rather than the old
attributes.
Stop emitting the denorms-are-zero attribute for the OpenCL flag. It
has no in-tree users. The meaning would also be target dependent, such
as the AMDGPU choice to treat this as only meaning allow flushing of
f32 and not f16 or f64. The naming is also potentially confusing,
since DAZ in other contexts refers to instructions implicitly treating
input denormals as zero, not necessarily flushing output denormals to
zero.
This also does not attempt to change the behavior for the current
attribute. The LangRef now states that the default is ieee behavior,
but this is inaccurate for the current implementation. The clang
handling is slightly hacky to avoid touching the existing
denormal-fp-math uses. Fixing this will be left for a future patch.
AMDGPU is still using the subtarget feature to control the denormal
mode, but the new attribute are now emitted. A future change will
switch this and remove the subtarget features.
Summary:
Detect a run of memory tagging instructions for adjacent stack frame slots,
and replace them with a shorter instruction sequence
* replace STG + STG with ST2G
* replace STGloop + STGloop with STGloop
This code needs to run when stack slot offsets are already known, but before
FrameIndex operands in STG instructions are eliminated; that's the
reason for the new hook in PrologueEpilogue.
This change modifies STGloop and STZGloop pseudos to take the size as an
immediate integer operand, and adds _untied variants of those pseudos
that are allowed to take the base address as a FI operand. This is needed to
simplify recognizing an STGloop instruction as operating on a stack slot
post-regalloc.
This improves memtag code size by ~0.25%, and it looks like an additional ~0.1%
is possible by rearranging the stack frame such that consecutive STG
instructions reference adjacent slots (patch pending).
Reviewers: pcc, ostannard
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70286
Static method MemoryDependenceResults::getLoadLoadClobberFullWidthSize
does not have or use any info specific to MemoryDependenceResults.
Move it to its only user: VNCoercion.
This is an alternative to the continous mode that was implemented in
D68351. This mode relies on padding and the ability to mmap a file over
the existing mapping which is generally only available on POSIX systems
and isn't suitable for other platforms.
This change instead introduces the ability to relocate counters at
runtime using a level of indirection. On every counter access, we add a
bias to the counter address. This bias is stored in a symbol that's
provided by the profile runtime and is initially set to zero, meaning no
relocation. The runtime can mmap the profile into memory at abitrary
location, and set bias to the offset between the original and the new
counter location, at which point every subsequent counter access will be
to the new location, which allows updating profile directly akin to the
continous mode.
The advantage of this implementation is that doesn't require any special
OS support. The disadvantage is the extra overhead due to additional
instructions required for each counter access (overhead both in terms of
binary size and performance) plus duplication of counters (i.e. one copy
in the binary itself and another copy that's mmapped).
Differential Revision: https://reviews.llvm.org/D69740
As of D70146 lld GCs comdats as a group and no longer considers notes in
comdats to be GC roots, so we need to move the note to a comdat with a GC root
section (.init_array) in order to prevent lld from discarding the note.
Differential Revision: https://reviews.llvm.org/D72936
Extend -fxray-instrumentation-bundle to split function-entry and
function-exit into two separate options, so that it is possible to
instrument only function entry or only function exit. For use cases
that only care about one or the other this will save significant overhead
and code size.
Differential Revision: https://reviews.llvm.org/D72890
XRay allows tuning by minimum function size, but also always instruments
functions with loops in them. If the minimum function size is set to a
large value the loop instrumention ends up causing most functions to be
instrumented anyway. This adds a new flag, xray-ignore-loops, to disable
the loop detection logic.
Differential Revision: https://reviews.llvm.org/D72659
[this re-applies c0176916a4
with the correct commit message and phabricator link]
This addresses point 1 of PR44213.
https://bugs.llvm.org/show_bug.cgi?id=44213
The DW_AT_LLVM_sysroot attribute is used for Clang module debug info,
to allow LLDB to import a Clang module from source. Currently it is
part of each DW_TAG_module, however, it is the same for all modules in
a compile unit. It is more efficient and less ambiguous to store it
once in the DW_TAG_compile_unit.
This should have no effect on DWARF consumers other than LLDB.
Differential Revision: https://reviews.llvm.org/D71732
Summary:
* Pass the Scalability test to VectorType::get in order to be
able to deserialize bitcode that contains scalable vector operations
Change-Id: I37fe5b1c0c237a9153130deefdc1a6d595c7f12e
Reviewers: efriedma, pcc, sdesmalen, apazos, huihuiz, chrisj
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72792
This is a purely cosmetic change that is NFC in terms of the binary
output. I bugs me that I called the attribute DW_AT_LLVM_isysroot
since the "i" is an artifact of GCC command line option syntax
(-isysroot is in the category of -i options) and doesn't carry any
useful information otherwise.
This attribute only appears in Clang module debug info.
Differential Revision: https://reviews.llvm.org/D71722
During the SeparateConstOffsetFromGEP pass, signed extensions are distributed
to the values that feed into them and then later recombined. The recombination
stage is somewhat problematic- it doesn't differ add and sub instructions
from another when matching the sext(a) +/- sext(b) -> sext(a +/- b) pattern
in some instances.
An example- the IR contains:
%unextendedA
%unextendedB
%subuAuB = unextendedA - unextendedB
%extA = extend A
%extB = extend B
%addeAeB = extA + extB
The problematic optimization will transform that into:
%unextendedA
%unextendedB
%subuAuB = unextendedA - unextendedB
%extA = extend A
%extB = extend B
%addeAeB = extend subuAuB ; Obviously not semantically equivalent to the IR input.
This patch fixes that.
Patch by Drew Wock <drew.wock@sas.com>
Differential Revision: https://reviews.llvm.org/D65967
Fixes https://bugs.llvm.org/show_bug.cgi?id=44552. We need to make
sure that the store is reprocessed, because performing DSE may
expose more DSE opportunities.
There is a slight caveat here though: We need to make sure that we
add back the store the worklist first, because that means it will
be processed after the operands of the removed store have been
processed. This is a general bug in InstCombine worklist management
that I hope to address at some point, but for now it means we need
to do this manually rather than just returning the instruction as
changed.
Differential Revision: https://reviews.llvm.org/D72807
There are two related bugs here: First, we don't add the operand
we're replacing to the worklist, which means it may not get DCEd
(see test change). Second, usually this would just get picked up
in the next iteration, but we also do not report the instruction
as changed. This means that we do not get that extra instcombine
iteration, and more importantly, may break the pass pipeline, as
the function is not marked as changed.
Differential Revision: https://reviews.llvm.org/D72864
Currently, there is no way to disable ExpensiveCombines when doing
a standalone opt -instcombine run, as that's the default, and the
opt option can currently only be used to force enable, not to force
disable. The only way to disable expensive combines is via -O1 or -O2,
but that of course also runs the rest of the kitchen sink...
This patch allows using opt -instcombine -expensive-combines=0 to
run InstCombine without ExpensiveCombines.
Differential Revision: https://reviews.llvm.org/D72861
Currently the lowering for i16 image coordinates asserts on gfx10. I'm
somewhat confused by this though. The feature is missing from the
gfx10 feature lists, but the a16 bit appears to be present in the
manual for MIMG instructions.
This is another part of a problem noted in PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024
The AVX2 code may use awkward 256-bit shuffles vs. the AVX code that gets split
into the expected 128-bit unpack instructions. We have to be selective in
matching the types where we try to do this though. Otherwise, we can end up
with more instructions (in the case of v8x32/v4x64).
Differential Revision: https://reviews.llvm.org/D72575
There was a change to trap1 instruction between v62 and v65. This
feature will allow the assembler/disassembler to handle different
variants depending on the CPU version.
Most X87 compare instructions write to the X87 status word, while the SSE (U)COMI compares write to rFLAGS. These are often handled very differently on CPUs (e.g. rFLAGS outputs typically involve a fpu2gpr transfer), and we shouldn't be grouping all these instructions behind a single class - so this patch splits off the SSE compares into a new WriteFComX class (and currently keeps the same behaviours). If there's a need to distinguish between X87 instructions more closely we can investigate that in the future, but as we don't handle any of the X87 side effects at the moment its unlikely to have any notable effect.
Introduce a method to walk through use-def chains to decide whether
it's possible to remove a given instruction and its users. These
instructions are then stored in a set until the end of the transform
when they're erased. This is now used to perform checks on the
iteration count (LoopDec chain), element count (VCTP chain) and the
possibly redundant iteration count.
As well as being able to remove chains of instructions, we know also
check that the sub feeding the vctp is producing the expected value.
Differential Revision: https://reviews.llvm.org/D71837
Blindly following unique-successors chain appeared to be a bad idea.
In a degenerate case when block jumps to itself that goes into endless loop.
Discovered this problem when playing with additional changes,
managed to reproduce it on existing LoopPredication code.
Fix by checking a "visited" set while iterating through unique successors.
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72908
Add DemandedElts handling to ISD::ANY_EXTEND and add missing ISD::ANY_EXTEND_VECTOR_INREG handling. Despite the lack of test changes this code IS being used - its just that the ANY_EXTEND ops are legalized later on (typically to ZERO_EXTEND equivalents) so we typically manage to combine later on.
GCC will accept any case for assembler directives.
For example ".abort" and ".ABORT" (even ".aBoRt")
are equivalent.
https://sourceware.org/binutils/docs/as/Pseudo-Ops.html#Pseudo-Ops
"The names are case insensitive for most targets,
and usually written in lower case."
Change llvm-mc to accept any case for generic directives
or aliases of those directives.
This for Bugzilla #39527.
Differential Revision: https://reviews.llvm.org/D72686
Summary:
Several SVE intrinsics with immediate arguments (including those
added by D70253 & D70437) do not use the ImmArg property.
This patch adds ImmArg<Op> where required and changes
the appropriate patterns which match the immediates.
Reviewers: efriedma, sdesmalen, andwar, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72612
This include file was created in October and has a "using namespace llvm". This seems to get exposed to other include files and finally onto cpp files. While this somewhat okay for llvm itself, its bad for other projects that use llvm as a library and includes a header file that picks this up. This was found by ISPC which has some class names at gloal scope with the same names as LLVM.
It looks like RISCV accidentally became dependent on this. I fixed it by reordering some includes in the RISCV code, but maybe we want to change the TableGenEmitter to put "namespace llvm {" in the generated file instead? But we probably want to do the simplest thing first so we can merge it to 10.0.
Differential Revision: https://reviews.llvm.org/D72895
This is (more?) usable by GDB pretty printers and seems nicer to write.
There's one tricky caveat that in C++14 (LLVM's codebase today) the
static constexpr member declaration is not a definition - so odr use of
this constant requires an out of line definition, which won't be
provided (that'd make all these trait classes more annoyidng/expensive
to maintain). But the use of this constant in the library implementation
is/should always be in a non-odr context - only two unit tests needed to
be touched to cope with this/avoid odr using these constants.
Based on/expanded from D72590 by Christian Sigg.
Given the following situation:
x = G_FCONSTANT (something that can't be materialized)
G_STORE x, some_addr
We know that x must be materialized as at least a single mov. However, at the
time of selection, the G_STORE will have been regbankselected to a FPR store.
So, as a result, you'll get an unnecessary fmov into the G_STORE.
Storing a constant value in a GPR and a constant value in a FPR are the same.
So, whenever you see a G_FCONSTANT that feeds into only G_STORES, so might as
well make it a G_CONSTANT.
This adds a target-specific combine which changes G_FCONSTANTs feeding into
G_STOREs into G_CONSTANTs.
Differential Revision: https://reviews.llvm.org/D72814
This case can be handled as a regular selection pattern, so move it
out of the weird post-isel folding code which doesn't have an exactly
equivalent place in GlobalISel.
I think it doesn't make much sense to do this optimization here
though, and it would be more useful in instcombine. There's not really
any new information that will be gained during lowering since these
inputs were known from the beginning.
This change has 2 components:
Target-independent: add a method getDwarfFrameBase to TargetFrameLowering. It
describes how the Dwarf frame base will be encoded. That can be a register (the
default), the CFA (which replaces NVPTX-specific logic in DwarfCompileUnit), or
a DW_OP_WASM_location descriptr.
WebAssembly: Allow WebAssemblyFunctionInfo::getFrameRegister to return the
correct virtual register instead of FP32/SP32 after WebAssemblyReplacePhysRegs
has run. Make WebAssemblyExplicitLocals store the local it allocates for the
frame register. Use this local information to implement getDwarfFrameBase
The result is that the DW_AT_frame_base attribute is correctly encoded for each
subprogram, and each param and local variable has a correct DW_AT_location that
uses DW_OP_fbreg to refer to the frame base.
Differential Revision: https://reviews.llvm.org/D71681
We lifted this code from InstCombine for general usage in:
rL369842
...but it's not safe as-is. There are no existing users that can
trigger this bug, but I discovered it via crashing several
regression tests when trying to use it for select folding in
InstSimplify.
ICmp requires (vector) integer types, so give up on anything that's
not integer or FP (pointers and ?) then bitcast the constants
before trying the match. That matches the definition of "equal or
undef" that I was looking for. If someone wants an FP-aware version
of equality (deal with NaN, -0.0), that could be a different mode
or different function.
Differential Revision: https://reviews.llvm.org/D72784
Introduce parsing, add a few instances of parameter use into GVN-PRE tests.
Reviewers: skatkov, asbirlea
Reviewed By: skatkov
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72752
This was assuming the narrow target was the source type. Respect the
requested type when these don't match by using intermediate
merges. This avoids producing very wide, illegal shift expansions.
The algorithm here only works if the sint_to_fp doesn't do any
rounding. Otherwise it can round before the offset fixup is
applied. Add an assert to protect this.
To avoid breaking the one test in tree that tested this code
with a set of types that fail the assert, I've enabled i32->f32
to use the i64->f32 algorithm. This only occurs when f64 isn't
a legal type. If f64 is legal then we do i32->f64->f32 instead.
Differential Revision: https://reviews.llvm.org/D72794
This was dropping the invariant metadata on dead argument loads, so
they weren't deleted.
Atomics still need to be fixed the same way. Also, apparently store
was never preserving dereferencable which should also be fixed.
This reverts commit 3f3017e because there's a failure on peel-loop-nests.ll
with LLVM_ENABLE_EXPENSIVE_CHECKS on.
Differential Revision: https://reviews.llvm.org/D70304
Summary:
The `llc` tool currently defaults to Static relocation model and generates non-relocatable code for 32-bit Power.
This is not desirable on AIX where we always generate Position Independent Code (PIC). This patch makes PIC the default relocation model for AIX.
Reviewers: daltenty, hubert.reinterpretcast, DiggerLin, Xiangling_L, sfertile
Reviewed By: hubert.reinterpretcast
Subscribers: mgorny, wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72479
There are a few global (cl::opt) controls that enable optional
behavior in GVN. Introduce GVNOptions that provide corresponding
per-pass instance controls.
That will allow to use GVN multiple times in pipeline each time
with different settings.
Reviewers: asbirlea, rnk, reames, skatkov, fhahn
Reviewed By: fhahn
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72732
Summary:
The old pass manager separated speed optimization and size optimization
levels into two unsigned values. Coallescing both in an enum in the new
pass manager may lead to unintentional casts and comparisons.
In particular, taking a look at how the loop unroll passes were constructed
previously, the Os/Oz are now (==new pass manager) treated just like O3,
likely unintentionally.
This change disallows raw comparisons between optimization levels, to
avoid such unintended effects. As an effect, the O{s|z} behavior changes
for loop unrolling and loop unroll and jam, matching O2 rather than O3.
The change also parameterizes the threshold values used for loop
unrolling, primarily to aid testing.
Reviewers: tejohnson, davidxl
Reviewed By: tejohnson
Subscribers: zzheng, ychen, mehdi_amini, hiraditya, steven_wu, dexonsmith, dang, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72547
Recommitting e93e0d413f after reverting due to test failures, which
will hopefully now be fixed. Original commit message:
After expanding the pseudo instructions, update the liveness info.
We do this in a post-order traversal of the loop, including its
exit blocks and preheader(s).
Differential Revision: https://reviews.llvm.org/D72131