Rework getConstantstVRegValWithLookThrough in order to make it clear if we
are matching integer/float constant only or any constant(default).
Add helper functions that get DefVReg and APInt/APFloat from constant instr
getIConstantVRegValWithLookThrough: integer constant, only G_CONSTANT
getFConstantVRegValWithLookThrough: float constant, only G_FCONSTANT
getAnyConstantVRegValWithLookThrough: either G_CONSTANT or G_FCONSTANT
Rename getConstantVRegVal and getConstantVRegSExtVal to getIConstantVRegVal
and getIConstantVRegSExtVal. These now only match G_CONSTANT as described
in comment.
Relevant matchers now return both DefVReg and APInt/APFloat.
Replace existing uses of getConstantstVRegValWithLookThrough and
getConstantVRegVal with new helper functions. Any constant match is
only required in:
ConstantFoldBinOp: for constant argument that was bit-cast of float to int
getAArch64VectorSplat: AArch64::G_DUP operands can be any constant
amdgpu select for G_BUILD_VECTOR_TRUNC: operands can be any constant
In other places use integer only constant match.
Differential Revision: https://reviews.llvm.org/D104409
getMetadata() currently uses a weird API where it populates a
structure passed to it, and optionally merges into it. Instead,
we can return the AAMDNodes and provide a separate merge() API.
This makes usages more compact.
Differential Revision: https://reviews.llvm.org/D109852
Introduce a new command-line flag `-swift-async-fp={auto|always|never}`
that controls how code generation sets the Swift extended async frame
info bit. There are three possibilities:
* `auto`: which determines how to set the bit based on deployment target, either
statically or dynamically via `swift_async_extendedFramePointerFlags`.
* `always`: the default, always set the bit statically, regardless of deployment
target.
* `never`: never set the bit, regardless of deployment target.
Patch by Doug Gregor <dgregor@apple.com>
Reviewed By: doug.gregor
Differential Revision: https://reviews.llvm.org/D109392
If a loop count was initially represented by a 32b unsigned int in C
then the hardware-loop pass can recognise the loop guard and insert
the llvm.test.set.loop.iterations intrinsic. If this was instead a
unsigned short/char then clang inserts a zext instruction to expand
the loop count to an i32. This patch adds the necessary pattern
matching to enable the use of lvm.test.set.loop.iterations in those
cases.
Patch by: sherwin-dc
Differential Revision: https://reviews.llvm.org/D109631
New field `elements` is added to '!DIImportedEntity', representing
list of aliased entities.
This is needed to dump optimized debugging information where all names
in a module are imported, but a few names are imported with overriding
aliases.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D109343
The doc comment for isPredecessor says:
Returns true if \p DefMI precedes \p UseMI or they are the same
instruction.
And dominates relies on that behavior for its own:
Returns true if \p DefMI dominates \p UseMI. By definition an
instruction dominates itself.
Make both statements correct by fixing isPredecessor.
Found by inspection.
The fmul is a canonicalizing operation, and fneg is not so this would
break denormals that need flushing and also would not quiet signaling
nans. Fold to fsub instead, which is also canonicalizing.
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.
There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.
The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.
Ignore dbg instructions when collecting stack slot markers. This is
to make sure the coloring is invariant regarding presence of dbg
instructions (even in cases when the dbg instructions might be
badly placed in the input).
Differential Revision: https://reviews.llvm.org/D109758
APInt is used to describe a bit mask in a variety of value tracking and demanded bits/elts functions.
When traversing through dst/src operands, we have a number of places where these masks need to widened/narrowed to translate through bitcasts, reductions etc. to a different type.
This patch add a APIntOps::ScaleBitMask common helper, adds unit test coverage, and updates a number of cases to use the the helper instead of their own implementation.
This came up on D109065 where we currently have to add yet another implementation of the same code.
Differential Revision: https://reviews.llvm.org/D109683
This extends the custom lowering for extending loads on
fixed length vectors in SVE to support masked extending loads.
The existing tests for correct behaviour of masked extending loads
exhibit bad code generation due to the legalistaion of i1 vectors.
They have been left as-is and new tests have been added that do not
exhibit this behaviour.
Differential Revision: https://reviews.llvm.org/D108200
Rather than inspecting the pointer element type, use the access
type of the load/store/atomicrmw/cmpxchg.
In the process of doing this, simplify the logic by storing the
address + type in MemoryUses, rather than an Instruction + Operand
pair (which was then used to fetch the address).
Always use the byval/inalloca/preallocated type (which is required
nowadays), don't fall back on the pointer element type.
This requires adding Function::getParamPreallocatedType() to
mirror the CallBase API, so that the templated code can work with
both.
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type requires widening.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D109509
This patch implements legalization of EXTRACT_SUBVECTOR for the case
where the result needs promoting, and the input type is either legal
or requires splitting.
The idea is that the operation is broken down into simpler steps,
by first extracting a smaller subvector until the input vector
becomes legal or requires promotion.
Reviewed By: CarolineConcatto
Differential Revision: https://reviews.llvm.org/D109313
Soft deprecrate isNullValue/isAllOnesValue and update in tree
callers. This matches the changes to the APInt interface from
D109483.
Reviewed By: lattner
Differential Revision: https://reviews.llvm.org/D109535
Follow up to suggestions in D109103 via hans:
I think UnreachableDefault (or UnreachableFallthrough) would be a
better name now, since it doesn't just omit the range check, it also
omits the last bit test.
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D109455
This renames the primary methods for creating a zero value to `getZero`
instead of `getNullValue` and renames predicates like `isAllOnesValue`
to simply `isAllOnes`. This achieves two things:
1) This starts standardizing predicates across the LLVM codebase,
following (in this case) ConstantInt. The word "Value" doesn't
convey anything of merit, and is missing in some of the other things.
2) Calling an integer "null" doesn't make any sense. The original sin
here is mine and I've regretted it for years. This moves us to calling
it "zero" instead, which is correct!
APInt is widely used and I don't think anyone is keen to take massive source
breakage on anything so core, at least not all in one go. As such, this
doesn't actually delete any entrypoints, it "soft deprecates" them with a
comment.
Included in this patch are changes to a bunch of the codebase, but there are
more. We should normalize SelectionDAG and other APIs as well, which would
make the API change more mechanical.
Differential Revision: https://reviews.llvm.org/D109483
This moves one mid-size function out of line, inlines the
trivial tcAnd/tcOr/tcXor/tcComplement methods into their only
caller, and moves the magic/umagic functions into SelectionDAG
since they are implementation details of its algorithm. This
also removes the unit tests for magic, but these are already
tested in the divide lowering logic for various targets.
This also upgrades some C style comments to C++.
Differential Revision: https://reviews.llvm.org/D109476
This is a port of a combine which matches a pattern where a wide type scalar
value is stored by several narrow stores. It folds it into a single store or
a BSWAP and a store if the targets supports it.
Assuming little endian target:
i8 *p = ...
i32 val = ...
p[0] = (val >> 0) & 0xFF;
p[1] = (val >> 8) & 0xFF;
p[2] = (val >> 16) & 0xFF;
p[3] = (val >> 24) & 0xFF;
=>
*((i32)p) = val;
On CTMark AArch64 -Os this results in a good amount of savings:
Program before after diff
SPASS 412792 412788 -0.0%
kc 432528 432512 -0.0%
lencod 430112 430096 -0.0%
consumer-typeset 419156 419128 -0.0%
bullet 475840 475752 -0.0%
tramp3d-v4 367760 367628 -0.0%
clamscan 383388 383204 -0.0%
pairlocalalign 249764 249476 -0.1%
7zip-benchmark 570100 568860 -0.2%
sqlite3 287628 286920 -0.2%
Geomean difference -0.1%
Differential Revision: https://reviews.llvm.org/D109419
Otherwise we end up with an extra conditional jump, following by an
unconditional jump off the end of a function. ie.
bb.0:
BT32rr ..
JCC_1 %bb.4 ...
bb.1:
BT32rr ..
JCC_1 %bb.2 ...
JMP_1 %bb.3
bb.2:
...
bb.3.unreachable:
bb.4:
...
Should be equivalent to:
bb.0:
BT32rr ..
JCC_1 %bb.4 ...
JMP_1 %bb.2
bb.1:
bb.2:
...
bb.3.unreachable:
bb.4:
...
This can occur since at the higher level IR (Instruction) SwitchInsts
are required to have BBs for default destinations, even when it can be
deduced that such BBs are unreachable.
For most programs, this isn't an issue, just wasted instructions since the
unreachable has been statically proven.
The x86_64 Linux kernel when built with CONFIG_LTO_CLANG_THIN=y fails to
boot though once D106056 is re-applied. D106056 makes it more likely
that correlation-propagation (CVP) can deduce that the default case of
SwitchInsts are unreachable. The x86_64 kernel uses a binary post
processor called objtool, which emits this warning:
vmlinux.o: warning: objtool: cfg80211_edmg_chandef_valid()+0x169: can't
find jump dest instruction at .text.cfg80211_edmg_chandef_valid+0x17b
I haven't debugged precisely why this causes a failure at boot time, but
fixing this very obvious jump off the end of the function fixes the
warning and boot problem.
Link: https://bugs.llvm.org/show_bug.cgi?id=50080
Fixes: https://github.com/ClangBuiltLinux/linux/issues/679
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1440
Reviewed By: hans
Differential Revision: https://reviews.llvm.org/D109103
When handling register spill for indirect debug value LiveDebugValues pass doesn't add
DW_OP_deref operator which may in some cases cause debugger to return value address, instead
of value while machine register holding that address is spilled.
Differential revision: https://reviews.llvm.org/D109142
This patch extends the preliminary support for vector-predicated (VP)
operation legalization to include promotion of illegal integer vector
types.
Integer promotion of binary VP operations is relatively simple and
piggy-backs on the non-VP logic, but passing the two extra mask and VP
operands through to the promoted operation.
Tests have been added to the RISC-V target to cover the basic scenarios
for integer promotion for both fixed- and scalable-vector types.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D108288
In preparation for passing the MCSubtargetInfo (STI) through to writeNops
so that it can use the STI in operation at the time, we need to record the
STI in operation when a MCAlignFragment may write nops as padding. The
STI is currently unused, a further patch will pass it through to
writeNops.
There are many places that can create an MCAlignFragment, in most cases
we can find out the STI in operation at the time. In a few places this
isn't possible as we are in initialisation or finalisation, or are
emitting constant pools. When possible I've tried to find the most
appropriate existing fragment to obtain the STI from, when none is
available use the per module STI.
For constant pools we don't actually need to use EmitCodeAlign as the
constant pools are data anyway so falling through into it via an
executable NOP is no better than falling through into data padding.
This is a prerequisite for D45962 which uses the STI to emit the
appropriate NOP for the STI. Which can differ per fragment.
Note that involves an interface change to InitSections. It is now
called initSections and requires a SubtargetInfo as a parameter.
Differential Revision: https://reviews.llvm.org/D45961
Legalizing G_MUL for non-standard types (like i33) generated an error. Putting
minScalar and maxScalar instead of clampScalar. Also using new rule, instead
of widening to the next power of 2, widen to the next multiple of the passed
argument (32 in this case), so instead of widening i65 to i128, we widen it to
i96.
Patch by: Mateja Marjanovic
Differential Revision: https://reviews.llvm.org/D109228
Add implementation for the legalization of G_ROTL and G_ROTR machine
instructions. They are very similar to funnel shift instructions, the only
difference is funnel shifts have 3 operands, whereas rotate instructions have
two operands, the first being the register that is being rotated and the second
being the number of shifts. The legalization of G_ROTL/G_ROTR is just lowering
them into funnel shift instructions if they are legal.
Patch by: Mateja Marjanovic
Differential Revision: https://reviews.llvm.org/D105347
Legalize G_MEMCPY, G_MEMMOVE, G_MEMSET and G_MEMCPY_INLINE.
Corresponding intrinsics are replaced by a loop that uses loads/stores in
AMDGPULowerIntrinsics pass unless their length is a constant lower then
MemIntrinsicExpandSizeThresholdOpt (default 1024). Any G_MEM* instruction that
reaches legalizer should have a const length argument and should be expanded
into appropriate number of loads + stores.
Differential Revision: https://reviews.llvm.org/D108357
This patch adds support for the vector-predicated `VP_STORE` and
`VP_LOAD` nodes. We do this in the same way we lower `MSTORE` and
`MLOAD`: to regular load/store instructions via intrinsics.
One necessary change was made to `SelectionDAGLegalize` so that
`VP_STORE` nodes' operation actions are taken from the stored "value"
operands, in the same vein as `STORE` or `MSTORE`.
Reviewed By: craig.topper, rogfer01
Differential Revision: https://reviews.llvm.org/D108999
This patch adds support for the `VP_SCATTER` and `VP_GATHER` nodes by
lowering them to RVV's `vsox`/`vlux` instructions, respectively. This
process is almost identical to the existing `MSCATTER`/`MGATHER` support.
One extra change was made to `SelectionDAGLegalize` so that
`VP_SCATTER`'s operation action is derived from its stored "value"
operand rather than its return type (which is always the chain).
Reviewed By: craig.topper, rogfer01
Differential Revision: https://reviews.llvm.org/D108987
In case of a virtual register tied to a phys-def, the register class needs to
be computed. Make sure that this works generally also with fast regalloc by
using TLI.getRegClassFor() whenever possible, and make only the case of
'Untyped' use getMinimalPhysRegClass().
Fixes https://bugs.llvm.org/show_bug.cgi?id=51699.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D109291
Given a select_cc producing a constant and a invertion of the constant
for a comparison more than zero, we can produce an xor with ashr
instead, which produces smaller code. The ashr either sets all bits or
clear all bits depending on if the value is negative. This is then xor'd
with the constant to optionally negate the value.
https://alive2.llvm.org/ce/z/DTFaBZ
This includes a OneUseCheck on the Cmp, which seems to make thinks a
little worse and will be removed in a followup.
Differential Revision: https://reviews.llvm.org/D109149
Pulled out of D109149, this folds set_cc seteq (ashr X, BW-1), -1 ->
set_cc setlt X, 0 to prevent some regressions later on when folding
select_cc setgt X, -1, C, ~C -> xor (ashr X, BW-1), C
Differential Revision: https://reviews.llvm.org/D109214
The temporary object was used as a workaround when the target parser may
change STI. D14346 made the MCSubtargetInfo argument to
createMCAsmParser const, so we no longer need the temporary object.
This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D109161
This adds the following combines:
```
x = ... 0 or 1
c = icmp eq x, 1
->
c = x
```
and
```
x = ... 0 or 1
c = icmp ne x, 0
->
c = x
```
When the target's true value for the relevant types is 1.
This showed up in the following situation:
https://godbolt.org/z/M5jKexWTW
SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.
This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)
Differential Revision: https://reviews.llvm.org/D109130
This add support for SjLj using Wasm exception handling instructions:
https://github.com/WebAssembly/exception-handling/blob/master/proposals/exception-handling/Exceptions.md
This does not yet support the mixed use of EH and SjLj within a
function. It will be added in a follow-up CL.
This currently passes all SjLj Emscripten tests for wasm0/1/2/3/s,
except for the below:
- `test_longjmp_standalone`: Uses Node
- `test_dlfcn_longjmp`: Uses NodeRAWFS
- `test_longjmp_throw`: Mixes EH and SjLj
- `test_exceptions_longjmp1`: Mixes EH and SjLj
- `test_exceptions_longjmp2`: Mixes EH and SjLj
- `test_exceptions_longjmp3`: Mixes EH and SjLj
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D108960
The semantics of tail predication loops means that the value of LR as an
instruction is executed determines the predicate. In other words:
mov r3, #3
DLSTP lr, r3 // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0,1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only first lane is updated.
This means that the value of lr cannot be spilled and re-used in tail
predication regions without potentially altering the behaviour of the
program. More lanes than required could be stored, for example, and in
the case of a gather those lanes might not have been setup, leading to
alignment exceptions.
This patch adds a new lr predicate operand to MVE instructions in order
to keep a reference to the lr that they use as a tail predicate. It will
usually hold the zeroreg meaning not predicated, being set to the LR phi
value in the MVETPAndVPTOptimisationsPass. This will prevent it from
being spilled anywhere that it needs to be used.
A lot of tests needed updating.
Differential Revision: https://reviews.llvm.org/D107638
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)
TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.
While the end result of discussion may lead back to the current design,
it may also not lead to the current design.
Therefore i take it upon myself
to revert the tree back to last known good state.
This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
This patch extends D107904's introduction of vector-predicated (VP)
operation legalization to include vector splitting.
When the result of a binary VP operation needs splitting, all of its
operands are split in kind. The two operands and the mask are split as
usual, and the vector-length parameter EVL is "split" such that the low
and high halves each execute the correct number of elements.
Tests have been added to the RISC-V target to show splitting several
scenarios for fixed- and scalable-vector types. Without support for
`umax` (e.g. in the `B` extension) the generated code starts to branch.
Ideally a cost model would prevent their insertion in the first place.
Through these tests many opportunities for better codegen can be seen:
combining known-undef VP operations and for constant-folding operations
on `ISD::VSCALE`, to name but a few.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D107957
I believe, the profitability reasoning here is correct
"sub"reg is already located within the 0'th subreg of wider reg,
so if we have suvector insertion at index 0 into undef,
then it's always free do to.
After this, D109065 finally avoids the regression in D108382.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D109074
A new LLVM specific TAG DW_TAG_LLVM_annotation is added.
The name is suggested by Paul Robinson ([1]).
Currently, this tag is used to output __attribute__((btf_tag("string")))
annotations in dwarf. The following is an example for a global
variable with two btf_tag attributes:
0x0000002a: DW_TAG_variable
DW_AT_name ("g1")
DW_AT_type (0x00000052 "int")
DW_AT_external (true)
DW_AT_decl_file ("/tmp/home/yhs/work/tests/llvm/btf_tag/t.c")
DW_AT_decl_line (8)
DW_AT_location (DW_OP_addr 0x0)
0x0000003f: DW_TAG_LLVM_annotation
DW_AT_name ("btf_tag")
DW_AT_const_value ("tag1")
0x00000048: DW_TAG_LLVM_annotation
DW_AT_name ("btf_tag")
DW_AT_const_value ("tag2")
0x00000051: NULL
In the future, DW_TAG_LLVM_annotation may encode other type
of non-string const value.
[1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151250.html
Differential Revision: https://reviews.llvm.org/D106621
When RA eliminated a dead def it can either immediately delete
the instruction itself or replace it with KILL to defer the
actual removal. If this instruction has a virtual register use
killing the register it will shrink the LI of the use. However,
if the LI covers the instruction and extends beyond it the
shrink will not happen. In fact that is impossible to shrink
such use because of the KILL still using it.
If later the LI of the use will be split at the KILL and the
KILL itself is eliminated after that point the new live segment
ends up at an invalid slot index.
This extremely rare condition was hit after D106408 which has
enabled rematerialization of such instructions. The replacement
with KILL is only done for rematerialized defs which became dead
and such rematerialization did not generally happen before.
The patch deletes an instruction immediately if it is a result
of rematerialization and has such use. An alternative would be
to prohibit a split at a KILL instruction, but it looks like it
is better to split a live range rather then keeping a killed
instruction just in case it can be rematerialized further.
Fixes PR51655.
Differential Revision: https://reviews.llvm.org/D108951
As noted in the comments in D108227, using G_FPTOSI produces wrong results for
G_ISNAN. Drop the G_FPTOSI and perform the operation on integer types.
Elsewhere in LLVM, a bitcast would be the appropriate choice (as it is in SDAG).
GlobalISel does not distinguish between integer and FP types, so a bitcast would
be meaningless here.
Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D105871
Instead of splitting off the fp16 to float conversion and generating
a libcall, we should split the operation into fp16 to float and float
to integer operations. This will allow the float to integer conversion
to go through any custom handling the target has. If the target doesn't
have custom handling then we should come back to ExpandIntRes_FP_TO_SINT/
ExpandIntRes_FP_TO_UINT automatically to create the libcall.
This avoids generating libcalls on 32-bit X86. These library functions may
not exist in 32-bit libgcc. At least for LLVM, we never generate them when
hardware floating point instructions are available.
Differential Revision: https://reviews.llvm.org/D108933
When expanding a SMULFIXSAT ISD node (usually originating from
a smul.fix.sat intrinsic) we've applied some optimizations for
the special case when the scale is zero. The idea has been that
it would be cheaper to use an SMULO instruction (if legal) to
perform the multiplication and at the same time detect any overflow.
And in case of overflow we could use some SELECT:s to replace the
result with the saturated min/max value. The only tricky part
is to know if we overflowed on the min or max value, i.e. if the
product is positive or negative. Unfortunately the implementation
has been incorrect as it has looked at the product returned by the
SMULO to determine the sign of the product. In case of overflow that
product is truncated and won't give us the correct sign bit.
This patch is adding an extra XOR of the multiplication operands,
which is used to determine the sign of the non truncated product.
This patch fixes PR51677.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D108938
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger 32-bit. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.
This is a followup to D108076.
Differential Revision: https://reviews.llvm.org/D108875
Md5 hashing is expansive. Using a hash map to look up already computed GUID for dwarf names. Saw a 2% build time improvement on an internal large application.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D108722
The check for whether a rotate is possible occurs before the
memory legality checks for the integer type. So it's possible we
decide we can use a rotate, but then fail the legality checks. If
that happens we should not fall back to a vector type. This triggers
an assertion in the rotate handling when it finds a vector type
instead of an integer type.
In theory we could use a shufflevector in place of the rotate, but
right now I'd just like to fix the crash.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D108839
If we encounter a new debug value, describing the same parameter,
we should stop tracking the parameter's Entry Value. At that point,
in some cases, the Transfer which uses the parameter's Entry Value,
is already emitted. Thanks to the RemoveRedundantDebugValues pass,
many problems with incorrect instruction order and number of DBG_VALUEs
are fixed. However, we still cannot rely on the rule that each new
debug value is set by the previous non-debug instruction in Machine
Basic Block.
When new parameter debug value triggers removal of Backup Entry Value
for the same parameter, do the cleanup of Transfers emitted from Backup
Entry Values. Get the Transfer Instruction which created the new debug
value and search for debug values already emitted from the to-be-deleted
Backup Entry Value and attached to the Transfer Instruction. If found,
delete the Transfer and remove "primary" Entry Value Var Loc from
OpenRanges.
This patch fixes PR47628.
Patch by Nikola Tesic.
Differential revision: https://reviews.llvm.org/D106856
This patch emits DW_TAG_namelist and DW_TAG_namelist_item for fortran
namelist variables. DICompositeType is extended to support this fortran
feature.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D108553
Without this change only the preferred fusion opcode is tested
when attempting to combine FMA operations.
If both FMA and FMAD are available then FMA ops formed prior to
legalization will not be merged post legalization as FMAD becomes
the preferred fusion opcode.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108619
We can halve the number of mask constants by masking before shl
and after srl.
This can reduce the number of mov immediate or constant
materializations. Or reduce the number of constant pool loads
for X86 vectors.
I think we might be able to do something similar for bswap. I'll
look at it next.
Differential Revision: https://reviews.llvm.org/D108738
In the combination of addressing modes, when replacing the matched phi nodes,
sometimes the phi node to be replaced has been modified. For example,
there’s matcher set [A, B] and [C, A], which will have cyclic dependency:
A is replaced by B and C will be replaced by A. Because we tried to match new phi node
to another new phi node, we should ignore new phi nodes when mapping new phi node to old one.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D108635
Emscripten SjLj and (soon-to-be-added) Wasm SjLj transformation share
many steps:
1. Initialize `setjmpTable` and `setjmpTableSize` in the entry BB
2. Handle `setjmp` callsites
3. Handle `longjmp` callsites
4. Cleanup and update SSA
1, 3, and 4 are identical for Emscripten SjLj and Wasm SjLj. Only the
step 2 is different. This CL extracts the current Emscripten SjLj's
longjmp callsites handling into a function. The reason to make this a
separate CL is, without this, the diff tool cannot compare things well
in the presence of moved code and added code in the followup Wasm SjLj
CL, and it ends up mixing them together, making the diff unreadable.
Also fixes some typos and variable names. So far we've been calling the
buffer argument to `setjmp` and `longjmp` `jmpbuf`, but the name used in
the man page for those functions is `env`, so updated them to be
consistent.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D108728
The plan was to use `wasm.catch.exn` intrinsic to catch exceptions and
add `wasm.catch.longjmp` intrinsic, that returns two values (setjmp
buffer and return value), later to catch longjmps. But because we
decided not to use multivalue support at the moment, we are going to use
one intrinsic that returns a single value for both exceptions and
longjmps. And even if it's not for that, I now think the naming of
`wasm.catch.exn` is a little weird, because the intrinsic can still take
a tag immediate, which means it can be used for anything, not only
exceptions, as long as that returns a single value.
This partially reverts D107405.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D108683
This is another bug exposed by https://llvm.org/PR51612
(and the one that triggered the initial assertion) in the report.
That example was suppressed with:
985b48f183
...but these would still crash because we created nodes
like UADDO without the expected 2 output values.
There are 2 bugs here:
1. We were not checking uses of operand 2 (the false value of the select).
2. We were not checking for multiple uses of nodes that produce >1 result.
Correcting those is enough to avoid the crash in the reduced test based on:
https://llvm.org/PR51612
The additional use check on operand 0 (the condition value of the select)
should not strictly be necessary because we are only replacing one use
with another (whether it makes performance sense to do the transform with
that pattern is not clear). But as noted in the TODO, changing that
uncovers another bug.
Note: there's at least one more bug here - we aren't propagating EVTs
correctly, but I plan to fix that in another patch.
Add support for the GNU C style __attribute__((error(""))) and
__attribute__((warning(""))). These attributes are meant to be put on
declarations of functions whom should not be called.
They are frequently used to provide compile time diagnostics similar to
_Static_assert, but which may rely on non-ICE conditions (ie. relying on
compiler optimizations). This is also similar to diagnose_if function
attribute, but can diagnose after optimizations have been run.
While users may instead simply call undefined functions in such cases to
get a linkage failure from the linker, these provide a much more
ergonomic and actionable diagnostic to users and do so at compile time
rather than at link time. Users instead may be able use inline asm .err
directives.
These are used throughout the Linux kernel in its implementation of
BUILD_BUG and BUILD_BUG_ON macros. These macros generally cannot be
converted to use _Static_assert because many of the parameters are not
ICEs. The Linux kernel still needs to be modified to make use of these
when building with Clang; I have a patch that does so I will send once
this feature is landed.
To do so, we create a new IR level Function attribute, "dontcall" (both
error and warning boil down to one IR Fn Attr). Then, similar to calls
to inline asm, we attach a !srcloc Metadata node to call sites of such
attributed callees.
The backend diagnoses these during instruction selection, while we still
know that a call is a call (vs say a JMP that's a tail call) in an arch
agnostic manner.
The frontend then reconstructs the SourceLocation from that Metadata,
and determines whether to emit an error or warning based on the callee's
attribute.
Link: https://bugs.llvm.org/show_bug.cgi?id=16428
Link: https://github.com/ClangBuiltLinux/linux/issues/1173
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D106030
InstrRefBasedLDV is marginally slower than VarlocBasedLDV when analysing
optimised code -- however, it's much slower when analysing code compiled
-O0.
To avoid this: don't use instruction referencing for -O0 functions. In the
"pure" case of unoptimised code, this won't really harm the debugging
experience because most variables won't have been promoted off the stack,
so can't go missing. It becomes more complicated when optimised code is
inlined into functions marked optnone; however these are rare, and as -O0
doesn't run many optimisations there should be little damage to the debug
experience as a result.
I've taken the opportunity to refactor testing for instruction-referencing
into a MachineFunction method, which seems the most appropriate place to
put it.
Differential Revision: https://reviews.llvm.org/D108585
For ISD::EXTRACT_SUBVECTOR, its second operand must be a constant
multiple of the known-minimum vector length of the result type.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D107795
Stack slot colouring adds "weight" to slots if a non-dbg-value instruction
refers to it. This, unfortunately, means that DBG_PHI instructions can have
an effect on codegen. The fix is very simple, replace isDebugValue with
isDebugInstr.
The regression test contains a scenario that reproduces this problem; I've
represented both normal-debug mode and instr-ref debug mode instructions
in comment lines prefixed with AAAAAA and BBBBBB, and un-comment them with
sed to test that the two different modes produce the same behaviour.
Differential Revision: https://reviews.llvm.org/D108627
The sext_inreg_of_load combine did not have the isLegalOrBeforeLegalizer check,
leading to the generation of potentially illegal G_SEXTLOADs when run after legalization.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D108626
On some AMDGPU subtargets, copying to and from AGPR registers using another
AGPR register is not possible. A intermediate VGPR register is needed for AGPR
to AGPR copy. This is an issue when machine copy propagation forwards a
COPY $agpr, replacing a COPY $vgpr which results in $agpr = COPY $agpr. It is
removing a cross class copy that may have been optimized by previous passes and
potentially creating an unoptimized cross class copy later on.
To avoid this issue, check CrossCopyRegClass if a different register class will
be needed for the copy. If so then avoid forwarding the copy when the
destination does not match the desired register class and if the original copy
already matches the desired register class.
Issue seen while attempting to optimize another AGPR to AGPR issue:
Live-ins: $agpr0
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $vgpr0
$agpr3 = COPY $vgpr0
$agpr4 = COPY $vgpr0
After machine-cp:
$vgpr0 = COPY $agpr0
$agpr1 = V_ACCVGPR_WRITE_B32 $vgpr0
$agpr2 = COPY $agpr0
$agpr3 = COPY $agpr0
$agpr4 = COPY $agpr0
Machine-cp propagated COPY $agpr0 to replace $vgpr0 creating 3 AGPR to AGPR
copys. Later this creates a cross-register copy from AGPR->VGPR->AGPR for each
copy when the prior VGPR->AGPR copy was already optimal.
Reviewed By: lkail, rampitec
Differential Revision: https://reviews.llvm.org/D108011
Currently isReallyTriviallyReMaterializableGeneric() implementation
prevents rematerialization on any virtual register use on the grounds
that is not a trivial rematerialization and that we do not want to
extend liveranges.
It appears that LRE logic does not attempt to extend a liverange of
a source register for rematerialization so that is not an issue.
That is checked in the LiveRangeEdit::allUsesAvailableAt().
The only non-trivial aspect of it is accounting for tied-defs which
normally represent a read-modify-write operation and not rematerializable.
The test for a tied-def situation already exists in the
/CodeGen/AMDGPU/remat-vop.mir,
test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve.
The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets
where I more or less understand the asm it seems to reduce spilling
(as expected) or be neutral. However, it needs a review by all targets'
specialists.
Differential Revision: https://reviews.llvm.org/D106408
One of the cases identified in PR45116 - we don't need to limit load combines to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for fp->int load/store copies) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
Differential Revision: https://reviews.llvm.org/D108318
One of the cases identified in PR45116 - we don't need to limit load combines (in this case for ISD::BUILD_PAIR) to ABI alignment, we can use allowsMemoryAccess - which tests using getABITypeAlign, but also checks if a target permits (fast) misaligned memory loads by checking allowsMisalignedMemoryAccesses as a fallback.
This helps in particular for 32-bit X86 cases loading 64-bit size data, reducing codegen diffs vs x86_64.
Differential Revision: https://reviews.llvm.org/D108307
D106408 enables rematerialization of instructions with virtual
register uses. That has uncovered the bug in the allUsesAvailableAt
implementation: https://bugs.llvm.org/show_bug.cgi?id=51516.
In the majority of cases canRematerializeAt() called to check if
an instruction can be rematerialized before the given UseIdx.
However, SplitEditor::enterIntvAtEnd() calls it to rematerialize
an instruction at the end of a block passing LIS.getMBBEndIdx()
into the check. In the testcase from the bug it has attempted to
rematerialize ADDXri after STRXui in bb.17. The use operand %55
of the ADD is killed by the STRX but that is undetected by the check
because it adjusts passed UseIdx to the reg slot, before the kill.
The value is dead at the index passed to the check however.
This change uses a later of passed UseIdx and its reg slot. This
shall be correct because if are checking an availability of operands
before an instruction that instruction cannot be the one defining
these operands. If we are checking for late rematerialization we
are really interested if operands live past the instruction.
The bug is not exploitable without D106408 but needed to reland
reverted D106408.
Differential Revision: https://reviews.llvm.org/D108475