Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.
This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.
This will allow us to, for example, describe instructions which produce
more than one parameter's value; e.g. Hexagon's various combine
instructions.
This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register should be used.
This also allows us to handle cases like this:
$ebx = [...]
$rdi = MOVSX64rr32 $ebx
$esi = MOV32rr $edi
CALL64pcrel32 @call
The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.
This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.
I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
This caused "Too many bits for uint64_t" asserts when building Chromium. See
https://crbug.com/1031978#c2 for a reproducer. I'll follow up on the
llvm-commits thread with a creduced version.
> ARMCodeGenPrepare has already been generalized and renamed to
> TypePromotion. We've had it enabled and tested downstream for a
> while, so enable it by default.
>
> Differential Revision: https://reviews.llvm.org/D70998
This adds support for constrained floating-point comparison intrinsics.
Specifically, we add:
declare <ty2>
@llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
declare <ty2>
@llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
metadata <condition code>,
metadata <exception behavior>)
The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).
The condition code is implemented as a metadata string. The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).
These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations. Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes. The patch includes support
for the common legalization operations for those nodes.
The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instructions to
fully implement strict semantics (scalar and vector).
Differential Revision: https://reviews.llvm.org/D69281
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:
// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)
Instead, I'd suggest using the following sequence:
// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).
In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.
There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)
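As a rough illustration of the revised sequence, here is a minimal C++ sketch for the f64 to 64-bit case. The function name `fpToUintViaSint` is hypothetical, and the code only models the arithmetic of the sequence above on host doubles; it is not the SelectionDAG expansion itself and it assumes the input is in range for FP_TO_UINT.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: models the new FP_TO_UINT expansion for a
// double source and a 64-bit unsigned result.
uint64_t fpToUintViaSint(double Src) {
  // Assumption of this sketch: Src is in range for FP_TO_UINT.
  assert(Src > -1.0 && Src < 18446744073709551616.0 && "out of range");

  const double Two63 = 9223372036854775808.0; // 2^63, exact in double

  // Sel = Src < 0x8000000000000000
  const bool Sel = Src < Two63;
  // FltOfs = select Sel, 0, 0x8000000000000000
  const double FltOfs = Sel ? 0.0 : Two63;
  // IntOfs = select Sel, 0, 0x8000000000000000
  const uint64_t IntOfs = Sel ? 0 : 0x8000000000000000ULL;

  // The subtraction is either Src - 0 or clears the 2^63 bit, so it is
  // exact in both cases.
  const int64_t Signed = static_cast<int64_t>(Src - FltOfs);

  // Result = fp_to_sint(Src - FltOfs) ^ IntOfs
  return static_cast<uint64_t>(Signed) ^ IntOfs;
}
```

For small inputs the offsets are both zero and the function degenerates to a plain signed conversion; for inputs at or above 2^63 the high bit is removed before the conversion and restored by the XOR.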
Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper.
Differential Revision: https://reviews.llvm.org/D67105
Summary:
Split off of D67120.
Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71072
The tail duplication currently integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases where duplication doesn't increase fallthrough, or where it brings too much size overhead.
To overcome these two issues, I add two checks in function canTailDuplicateUnplacedPreds:
- Make sure there is at least one duplication in the current work set.
- The number of duplications should not exceed the number of successors.
The modification in hasBetterLayoutPredecessor fixes a bug: a potential predecessor must be at the bottom of a chain.
Differential Revision: https://reviews.llvm.org/D64376
One of CodeGenPrepare's optimizations is to duplicate address calculations
into basic blocks, so that as much information as possible can be folded
into memory addressing operands. This is great -- but the dbg.value
variable location intrinsics are not updated in the same way. This can lead
to dbg.values referring to address computations in other blocks that will
never be encoded into the DAG, while duplicate address computations are
performed locally that could be used by the dbg.value. Some of these (such
as non-constant-offset GEPs) can't be salvaged past.
Fix this by, whenever we duplicate an address computation into a block,
looking for dbg.value users of the original memory address in the same
block, and redirecting those to the local computation.
Differential Revision: https://reviews.llvm.org/D58403
This patch implements the following changes:
1) SelectionDAGBuilder::visitConstrainedFPIntrinsic currently treats
each constrained intrinsic like a global barrier (e.g. a function call)
and fully serializes all pending chains. This is actually not required;
it is allowed for constrained intrinsics to be reordered w.r.t. one
another or (nonvolatile) memory accesses. The MI-level scheduler already
allows for that flexibility, so it makes sense to allow it at the DAG
level as well.
This patch therefore changes the way chains for constrained intrinsics
are created, and handles them basically like load operations are handled.
This has the effect that constrained intrinsics are no longer serialized
against one another or (nonvolatile) loads. They are still serialized
against stores, but that seems hard to change with the current DAG chain
setup, and it also doesn't seem to be a big problem preventing DAG
2) The OPC_CheckFoldableChainNode check requires that each of the
intermediate nodes in a multi-node pattern match only has a single use.
This check tends to fail if those intermediate nodes are strict operations
as those have a chain output that typically indeed has another use.
However, we don't really need to consider chains here at all, since they
will all be rewritten anyway by UpdateChains later. Other parts of the
matcher therefore already ignore chains, but this hasOneUse check doesn't.
This patch replaces hasOneUse by a custom test that verifies there is no
more than one use of any non-chain output value.
In theory, this change could affect code unrelated to strict FP nodes,
but at least on SystemZ I could not find any single instance of that
happening.
3) The SystemZ back-end currently does not allow matching multiply-and-
extend operations (32x32 -> 64bit or 64x64 -> 128bit FP multiply) for
strict FP operations. This was not possible in the past due to the
problems described under 1) and 2) above.
With those issues fixed, it is now possible to fully support those
instructions in strict mode as well, and this patch does so.
Differential Revision: https://reviews.llvm.org/D70913
This refactoring moves NonRelocatableStringpool into the common CodeGen
folder, so that it can also be used outside of dsymutil.
Differential Revision: https://reviews.llvm.org/D71068
This is for the case where -gmlt -gsplit-dwarf -fsplit-dwarf-inlining
are used together in some but not all units during LTO (or, in the
reduced case, even without LTO) - ensuring that no split dwarf is used
(because split-dwarf-inlining puts the same data in the .o file, so
there's no need to duplicate it into the .dwo file)
* Context *
During register coalescing, we use rematerialization when coalescing is not
possible. That means we may rematerialize a super register when only a smaller
register is actually used.
E.g.,
0B v1 = ldimm 0xFF
1B v2 = COPY v1.low8bits
2B = v2
=>
0B v1 = ldimm 0xFF
1B v2 = ldimm 0xFF
2B = v2.low8bits
Where xB are the slot indexes.
Here v2 grew from an 8-bit register to a 16-bit register.
When that happens and subregister liveness is enabled, we create subranges for
the newly created value.
E.g., before remat, the live range of v2 looked like:
main range: [1r, 2r)
(Reads: v2 is defined at the register slot of index 1 and used before the
register slot of index 2.)
After remat, it should look like:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 1d) <-- dead def
I.e., the unused lanes of v2 should be marked as a dead definition.
* The Problem *
Prior to this patch, the live-ranges from the previous example would have the
full live-range for all subranges:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long
* The Fix *
Technically, the code that this patch changes is not wrong:
When we create the subranges for the newly rematerialized value, we create only
one subrange for the whole bit mask.
In other words, at this point v2's live-range looks like this:
main range: [1r, 2r)
low & high: [1r, 2r)
Then, it goes wrong when we call LiveInterval::refineSubRanges on low 8 bits:
main range: [1r, 2r)
low 8 bits: [1r, 2r)
high 8 bits: [1r, 2r) <-- too long
Ideally, we would like LiveInterval::refineSubRanges to be able to do the right
thing and mark the dead lanes as such. However, this is not possible, because by
the time we update / refine the live ranges, the IR hasn't been updated yet,
therefore we actually don't have enough information to do the right thing.
Another option to fix the problem would have been to call
LiveIntervals::shrinkToUses after the IR is updated. This is not desirable as
this may have a noticeable impact on compile time.
Instead, what this patch does is when we create the subranges for the
rematerialized value, we explicitly create one subrange for the lanes that were
used before rematerialization and one for the lanes that were not used. The used
one inherits the live range of the main range and the unused one is just created
empty. The existing rematerialization code then detects that the unused one is
not live and correctly sets a dead def interval for it.
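The split described above can be modeled with a small, self-contained C++ sketch. Everything here (`ToySubRange`, `createRematSubRanges`, plain integer lane masks and slot indexes) is a hypothetical simplification and not the LiveInterval API: the used lanes inherit the main range, and the unused lanes get an empty subrange that later code can turn into a dead def.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy half-open segment [Start, End) over slot indexes.
using Segment = std::pair<int, int>;

// Toy stand-in for a subrange: a lane mask plus its live segments.
struct ToySubRange {
  uint32_t LaneMask;
  std::vector<Segment> Segments;
};

// Create the subranges for a rematerialized value: one for the lanes that
// were used before rematerialization and one, left empty, for the rest.
std::vector<ToySubRange>
createRematSubRanges(uint32_t FullMask, uint32_t UsedMask,
                     const std::vector<Segment> &MainRange) {
  assert((UsedMask & ~FullMask) == 0 && "used lanes must belong to the reg");

  std::vector<ToySubRange> Result;
  if (UsedMask != 0)
    Result.push_back({UsedMask, MainRange}); // inherits the main range

  const uint32_t UnusedMask = FullMask & ~UsedMask;
  if (UnusedMask != 0)
    Result.push_back({UnusedMask, {}}); // created empty, i.e. not live

  return Result;
}
```

With the example above (low 8 bits used, high 8 bits unused), the low lanes get `[1, 2)` and the high lanes get no segments at all, rather than a second copy of the full range.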
https://llvm.org/PR41372
The loclists_table_base was being overwritten for each CU even though
only one loclists contribution is made so everything but the last CU
would have a label that was never defined and fail to assemble.
Summary:
Previously, it was not possible to skip running the localizer pass
conditionally. This patch adds an input function to the pass which
decides if the pass should run on the given MachineFunction or not.
No test case, as no upstream target needs this functionality.
Reviewers: qcolombet
Reviewed By: qcolombet
Subscribers: rovka, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71038
This patch addresses a performance problem reported in PR43855, and
present in the reapplication in 001574938e5. It turns out that
MachineSink will (often) move instructions to the first block that
post-dominates the current block, and then try to sink further. This
means if we have a lot of conditionals, we can needlessly create large
numbers of DBG_VALUEs, one in each block the sunk instruction passes
through.
To fix this, rather than immediately sinking DBG_VALUEs, record them in
a pass structure. When sinking is complete and instructions won't be
sunk any further, new DBG_VALUEs are added, avoiding lots of
intermediate DBG_VALUE $noregs being created.
Differential revision: https://reviews.llvm.org/D70676
Fix part of PR43855, resolving a problem that comes from the reapplication
in 001574938e5. If we have two DBG_VALUE insts in a block that specify
the location of the same variable, for example:
%0 = someinst
DBG_VALUE %0, !123, !DIExpression()
%1 = anotherinst
DBG_VALUE %1, !123, !DIExpression()
if %0 were to sink, the corresponding DBG_VALUE would sink too, past the
next DBG_VALUE, effectively re-ordering assignments. To fix this, I've
added a SeenDbgVars set recording what variable locations have been seen in
a block already (working bottom up), and now flag DBG_VALUEs that would
pass a later DBG_VALUE for the same variable.
NB, this only works for repeated DBG_VALUEs in the same basic block; the
general case involving control flow is much harder, which I've written
up in PR44117.
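A minimal sketch of the bottom-up scan, using toy types rather than MachineInstr: `ToyInst`, `VarId` and `findDbgValuesWithLaterAssignment` are hypothetical names, and a variable is reduced to a plain integer here, which is a simplification of whatever the real set is keyed on.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Toy instruction: either a DBG_VALUE for variable VarId or anything else.
struct ToyInst {
  bool IsDbgValue;
  int VarId; // only meaningful when IsDbgValue is true
};

// Walk the block bottom-up and flag every DBG_VALUE that is followed,
// later in the same block, by another DBG_VALUE for the same variable.
// Sinking a flagged DBG_VALUE could re-order assignments.
std::vector<bool>
findDbgValuesWithLaterAssignment(const std::vector<ToyInst> &Block) {
  std::vector<bool> Flagged(Block.size(), false);
  std::set<int> SeenDbgVars;

  for (std::size_t I = Block.size(); I-- > 0;) {
    const ToyInst &MI = Block[I];
    if (!MI.IsDbgValue)
      continue;
    // insert() reports false if the variable was already seen below us.
    if (!SeenDbgVars.insert(MI.VarId).second)
      Flagged[I] = true;
  }
  return Flagged;
}
```

For the four-instruction example above, only the first DBG_VALUE (the one using %0) is flagged, since the second one for the same variable sits below it.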
Differential revision: https://reviews.llvm.org/D70672
These were:
* D58386 / f5e1b718a6 / reverted in d382a8a768
* D58238 / ee50590e16 / reverted in a8db456b53
Of which the latter has a performance regression tracked in PR43855,
fixed by D70672 / D70676, which will be committed atomically with this
reapplication.
Contains a minor difference to account for a change in the IsCopyInstr
signature.
ARMCodeGenPrepare has already been generalized and renamed to
TypePromotion. We've had it enabled and tested downstream for a
while, so enable it by default.
Differential Revision: https://reviews.llvm.org/D70998
Summary:
If a call is bundled then the code that looks for instructions that
produce parameter values would break when reaching the call's bundle
header, due to the `isCall(/*AnyInBundle*/)` invocation returning true.
It is not enough to simply ignore bundle headers in the `isCall()`
invocation, as the bundle header may have defines of parameter registers
due to the call, meaning that such registers would incorrectly be
removed from the worklist. Therefore, do not look at bundle headers at
all.
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: aprantl, vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D71024
This patch adds forward iterators mc_difflist_iterator,
mc_subreg_iterator and mc_superreg_iterator, based on the existing
DiffListIterator. Those are used to provide iterator ranges over
sub- and super-registers from TRI, which are slightly more convenient
than the existing MCSubRegIterator/MCSuperRegIterator. Unfortunately,
it duplicates a bit of functionality, but the new iterators are a bit
more convenient (and can be used with various existing iterator
utilities) and should probably replace the old iterators in the future.
This patch updates some existing users.
Reviewers: evandro, qcolombet, paquette, MatzeB, arsenm
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D70565
This patch turns MachineOperandIteratorBase into a regular forward
iterator, which can be used with iterator_range.
It also adds mi_bundle_ops and const_mi_bundle_ops that return iterator
ranges over all operands in a bundle and updates a use of the old
iterator.
Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D70561
Fix assertion error
```
bool llvm::MachineOperand::isRenamable() const: Assertion `Register::isPhysicalRegister(getReg()) && "isRenamable should only be checked on physical registers"' failed.
```
by checking if the register is 0 before invoking `isRenamable`.
Summary:
This patch mainly performs the following transformation:
```
$R0 = OP ...
... // No read/clobber of $R0 and $R1
$R1 = COPY $R0 // $R0 is killed
```
After replacing $R0 with $R1 and removing the COPY, we have:
```
$R1 = OP ...
```
This transformation can also expose more opportunities for existing
copy elimination in MCP.
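The rewrite can be sketched on a toy instruction list. `ToyMI` and `backwardPropagateCopy` are hypothetical names, registers are plain integers, and the sketch only models the "no read/clobber in between" and "source is killed" conditions spelled out above; any further legality checks of the real pass are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction with at most one def.
struct ToyMI {
  int Def;               // defined register, or -1 if none
  std::vector<int> Uses; // registers read
  bool IsCopy;           // Def = COPY Uses[0]
  bool KillsSrc;         // for copies: the source is dead afterwards
};

static bool touches(const ToyMI &MI, int Reg) {
  return MI.Def == Reg ||
         std::find(MI.Uses.begin(), MI.Uses.end(), Reg) != MI.Uses.end();
}

// Try to fold the copy at CopyIdx into the instruction defining its source.
// Returns true and erases the copy on success.
bool backwardPropagateCopy(std::vector<ToyMI> &Block, std::size_t CopyIdx) {
  assert(CopyIdx < Block.size() && "copy index out of range");
  const ToyMI Copy = Block[CopyIdx];
  if (!Copy.IsCopy || !Copy.KillsSrc || Copy.Uses.size() != 1)
    return false;
  const int Src = Copy.Uses[0];
  const int Dst = Copy.Def;

  for (std::size_t I = CopyIdx; I-- > 0;) {
    ToyMI &MI = Block[I];
    if (MI.Def == Src) {
      MI.Def = Dst;                          // $R1 = OP ...
      Block.erase(Block.begin() + CopyIdx);  // remove the COPY
      return true;
    }
    // Any read or clobber of either register in between blocks the rewrite.
    if (touches(MI, Src) || touches(MI, Dst))
      return false;
  }
  return false; // the def of Src is not in this block
}
```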
Differential Revision: https://reviews.llvm.org/D67794
An interplay of code from D70210, along with code from the
Value-Numbering-esque hash-based namer from D70210, as well as some
crusty code from the original MIR-Canon code led to multiple causes of
failure when canonicalizing or renaming vregs for MIR with multiple
basic blocks. This patch fixes those issues while deleting some no
longer needed code and adding a nice diamond test case to boot.
Differential Revision: https://reviews.llvm.org/D70478
This patch fixes the "incompatible compilation unit type (DW_UT_skeleton) and root DIE (DW_TAG_compile_unit)" error.
cat split-dwarf.cpp
int main()
{
int a = 1;
return 0;
}
clang++ -O -g -gsplit-dwarf -gdwarf-5 split-dwarf.cpp; llvm-dwarfdump --verify ./a.out | grep skeleton
error: Compilation unit type (DW_UT_skeleton) and root DIE (DW_TAG_compile_unit) do not match.
The fix is to change DW_TAG_compile_unit into DW_TAG_skeleton_unit when a skeleton file is generated.
Differential Revision: https://reviews.llvm.org/D70880
Summary:
This follows a previous patch that changes the X86 datalayout to represent
mixed size pointers (32-bit sext, 32-bit zext, and 64-bit) with address spaces
(https://reviews.llvm.org/D64931)
This patch implements the address space cast lowering to the corresponding
sign extension, zero extension, or truncate instructions.
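As an illustration of the three cast directions, here is a small C++ sketch on plain integers. The names `Ptr32Kind`, `castPtr32To64` and `castPtr64To32` are hypothetical, and the sketch says nothing about which address space number maps to which kind; it only shows the extension and truncation arithmetic.

```cpp
#include <cstdint>

// Hypothetical tag for the two flavours of 32-bit pointer.
enum class Ptr32Kind { SignExtended, ZeroExtended };

// 32-bit -> 64-bit: sign extend or zero extend, depending on the kind.
constexpr uint64_t castPtr32To64(uint32_t P, Ptr32Kind Kind) {
  return Kind == Ptr32Kind::SignExtended
             ? static_cast<uint64_t>(
                   static_cast<int64_t>(static_cast<int32_t>(P)))
             : static_cast<uint64_t>(P);
}

// 64-bit -> 32-bit: truncate, keeping the low 32 bits.
constexpr uint32_t castPtr64To32(uint64_t P) {
  return static_cast<uint32_t>(P);
}
```

The two extensions only differ for pointers with the top bit set: 0x80000000 becomes 0xFFFFFFFF80000000 when sign extended and 0x0000000080000000 when zero extended.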
Related to https://bugs.llvm.org/show_bug.cgi?id=42359
Reviewers: rnk, craig.topper, RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69639
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
Summary:
The default case handles the majority of MVTs so most of the individual
cases can be removed. Also added a case for floating point types.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D70955
InstCombine may synthesize FMINNUM/FMAXNUM nodes from fcmp+select
sequences (where the fcmp is marked nnan). Currently, if the
target does not otherwise handle these nodes, they'll get expanded
to libcalls to fmin/fmax. However, these functions may reside in
libm, which may introduce a library dependency that was not originally
present in the source code, potentially resulting in link failures.
To fix this problem, add code to TargetLowering::expandFMINNUM_FMAXNUM
to expand FMINNUM/FMAXNUM to a compare+select sequence instead of the
libcall. This is done only if the node is marked as "nnan"; in this case,
the expansion to compare+select is always correct. This also suffices to
catch all cases where FMINNUM/FMAXNUM was synthesized as above.
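In scalar C++ terms, the expansion amounts to the following sketch. The function names are hypothetical; the assert stands in for the "nnan" flag, which is what makes the plain compare+select form valid.

```cpp
#include <cassert>
#include <cmath>

// FMINNUM expanded as compare+select; only valid when neither input is NaN.
float expandFMinNumNoNaNs(float A, float B) {
  assert(!std::isnan(A) && !std::isnan(B) && "sketch models nnan only");
  return A < B ? A : B;
}

// FMAXNUM expanded the same way with the comparison flipped.
float expandFMaxNumNoNaNs(float A, float B) {
  assert(!std::isnan(A) && !std::isnan(B) && "sketch models nnan only");
  return A > B ? A : B;
}
```

Neither function calls fmin/fmax, so no libm dependency is introduced.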
Differential Revision: https://reviews.llvm.org/D70965
This is the example:
int foo(int a, int b, int c, int d) {
return a + b + c + d;
}
And this is the Dependency Graph:
+------+ +------+ +------+ +------+
| A | | B | | C | | D |
+--+--++ +---+--+ +--+---+ +--+---+
^ ^ ^ ^ ^ ^
| | | | | |
| | | |New1 +--------------+
| | | | |
| | | | +--+---+
| |New2 | +-------+ ADD1 |
| | | +--+---+
| | | Fuse ^
| | +-------------+
| +------------+
| |
| Fuse +--+---+
+----------->+ ADD2 |
| +------+
+--+---+
| ADD3 |
+------+
We also need to create an artificial edge from ADD1 to A if
https://reviews.llvm.org/D69998 has landed. That forces node A to be scheduled
before ADD1 and ADD2. But in fact, it is OK to schedule node A
in between ADD3 and ADD2, as ADD3 and ADD2 are NOT a fusion pair because
ADD2 has been matched to ADD1. We are creating these unnecessary dependency
edges that override the heuristics.
Differential Revision: https://reviews.llvm.org/D70066
This is an alternative to D64662 that shares more code between
strict and non-strict nodes. It's modeled after the implementation
that I did for softening.
Differential Revision: https://reviews.llvm.org/D70867
https://reviews.llvm.org/D70922
This adds a hook to allow targets to define exactly what extension
operation should be performed for widening constants. This handles cases
like widening i1 true, which would otherwise end up becoming -1 and hurt
code quality during combines.
Additionally, in order to stay consistent with how DAG is promoting
constants, we now sign extend for byte sized types and zero extend
otherwise (by default). Targets can of course override this if
necessary.
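A minimal sketch of that default rule on raw bit patterns. `widenConstantDefault` is a hypothetical name, and "byte sized" is taken here to mean a bit width that is a multiple of 8, which is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Widen the low Width bits of Bits to 64 bits: sign extend for byte sized
// widths, zero extend otherwise.
int64_t widenConstantDefault(uint64_t Bits, unsigned Width) {
  assert(Width >= 1 && Width < 64 && "toy model handles 1..63-bit sources");

  const uint64_t Mask = (uint64_t{1} << Width) - 1;
  uint64_t Value = Bits & Mask;

  const bool ByteSized = (Width % 8) == 0; // assumption, see lead-in
  const bool SignBitSet = ((Value >> (Width - 1)) & 1) != 0;
  if (ByteSized && SignBitSet)
    Value |= ~Mask; // replicate the sign bit into the upper bits

  return static_cast<int64_t>(Value);
}
```

Under this rule i1 true widens to 1 rather than -1, while an i8 constant 0xFF still widens to -1.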
As can be seen from the accompanying cleanup, it is not unheard of
to write `~Known.Zero` meaning "what maximal value can this KnownBits
produce". But I think `~Known.Zero` isn't *that* self-explanatory
compared to a method with a name.
Note that not all `~Known.Zero` places were cleaned up,
only those where this arguably improves things.
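To show why a named method reads better, here is a toy 64-bit model. `ToyKnownBits`, `maxValue` and `minValue` are hypothetical names for this sketch and are not meant to describe the actual KnownBits interface.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of known bits for a 64-bit value.
struct ToyKnownBits {
  uint64_t Zero; // bits known to be 0
  uint64_t One;  // bits known to be 1

  // Largest possible value: every bit not known to be zero is set.
  uint64_t maxValue() const {
    assert((Zero & One) == 0 && "conflicting known bits");
    return ~Zero;
  }

  // Smallest possible value: only the bits known to be one are set.
  uint64_t minValue() const {
    assert((Zero & One) == 0 && "conflicting known bits");
    return One;
  }
};
```

A call such as `Known.maxValue()` states the intent directly, whereas `~Known.Zero` requires the reader to work out the same fact.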
The DebugVariable class is a class declared in LiveDebugValues.cpp which
is used to uniquely identify a single variable, using its source
variable, inline location, and fragment info to do so. This patch moves
this class into DebugInfoMetadata.h, making it available in a much
broader scope.
Convert ARMCodeGenPrepare into a generic type promotion pass by:
- Removing the insertion of arm specific intrinsics to handle narrow
types as we weren't using this.
- Removing ARMSubtarget references.
- Now query a generic TLI object to know which types should be
promoted and what they should be promoted to.
- Move all codegen tests into Transforms folder and testing using opt
and not llc, which is how they should have been written in the
first place...
The pass searches up from icmp operands in an attempt to safely
promote types so we can avoid generating unnecessary unsigned extends
during DAG ISel.
Differential Revision: https://reviews.llvm.org/D69556
The idea is to remove front-end analysis for the parameter's value
modification and leave it to the value tracking system. In some cases the
front-end marks a parameter as modified even if the line of code that
modifies the parameter gets optimized out, which implies that this will cover
even more entry values. In addition, extending the support for modified
parameters will be easier with this approach.
Since the goal is to recognize if a parameter’s value has changed, the idea
at a very high level is: if we encounter a DBG_VALUE other than the entry
value one describing the same variable (parameter), we can assume that the
variable’s value has changed and we should not track its entry value any
more. That would be the ideal scenario, but due to various LLVM optimizations,
a variable’s value could be just moved around from one register to another
(and there will be additional DBG_VALUEs describing the same variable), so
we have to recognize such situations (otherwise, we will lose a lot of entry
values) and salvage the debug entry value.
Differential Revision: https://reviews.llvm.org/D68209
While working with a patch for instruction selection, the splitting of a
large immediate ended up being treated incorrectly by the backend. Where a
register operand should have been created, it instead became an immediate. To
my surprise the machine verifier failed to report this, which at the time
would have been helpful.
This patch improves the verifier so that it will report this type of error.
This patch XFAILs CodeGen/SPARC/fp128.ll, which has been reported at
https://bugs.llvm.org/show_bug.cgi?id=44091
Review: thegameg, arsenm, fhahn
https://reviews.llvm.org/D63973
These nodes have a FIXME that they only get here because a Custom
handler returned SDValue() instead of the original Op.
Even though we aren't expanding them, we should return true here to
prevent ConvertNodeToLibcall from also trying to process them until
the FIXME has been addressed.
I'm hoping to add checking to ConvertNodeToLibcall to make sure
we don't give it nodes it doesn't have support for.
The code that processes the Results vector also calls ReplaceNode
and makes ExpandNode return true.
If we don't add it to the Results vector, we end up returning false
from ExpandNode. This causes ConvertNodeToLibcall to be called next.
But ConvertNodeToLibcall doesn't do anything for shifts so they
just pass through unmodified. Except for printing a debug message.
Ultimately, I'd like to add more checks to ExpandNode and
ConvertNodeToLibcall to make sure we don't have nodes marked as
Expand that don't have any Expand or libcall handling.
This revision is revised to update Go-bindings and Release Notes.
The original commit message follows.
This patch adds support for emitting the DW_AT_alignment [DWARF5] attribute
with a typedef DIE when explicit alignment is specified.
Patch by Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: aprantl, dblaikie, jini.susan.george, SouraVX, alok,
deadalinx
Differential Revision: https://reviews.llvm.org/D70111
This patch adds support for debug_macinfo.dwo section[pre-standardized]
to llvm and llvm-dwarfdump.
Reviewers: probinson, dblaikie, aprantl, jini.susan.george, alok
Differential Revision: https://reviews.llvm.org/D70705
Tags: #debug-info #llvm
Summary:
In case of a need to distinguish different query sites for gradual commit or
debugging of PGSO. NFC.
Reviewers: davidxl
Subscribers: hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70510
analyzePhysReg does not really fit into the iterator and moving it
makes it easier to change the base iterator.
Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm, qcolombet
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D70559
Summary:
When combining COPY instructions, we were replacing the destination registers
with the source register without checking register constraints. This patch adds
a simple logic to check if the constraints match before replacing registers.
Reviewers: qcolombet, aditya_nandakumar, aemerson, paquette, dsanders, Petar.Avramovic
Reviewed By: aditya_nandakumar
Subscribers: rovka, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70616
analyzeVirtReg does not really fit into the iterator and moving it
makes it easier to change the base iterator.
Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm, qcolombet
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D70558
These will be needed for ARM fp-intrinsics.ll which is currently
XFAILed.
One of the getOperand calls in SoftenFloatRes_FP_EXTEND was not
taking strict FP into account. It only affected the call
to setTypeListBeforeSoften which only has an effect on some targets.
We would previously fall back if the type wasn't f32/f64/f128. But
I don't think any of the other floating point types ever go through
the softening code anyway. So this code is dead.
Summary: This combine showed up as needed when exploring the regression when processing the DAG in topological order.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68195
X86 has some calling conventions where bits 127:0 of a vector register are callee saved, but the upper bits aren't. Previously we could detect that the full ymm register was clobbered when the xmm portion was really preserved. This patch checks the subregisters to make sure they aren't preserved.
Fixes PR44140
Differential Revision: https://reviews.llvm.org/D70699
This is based on what's required for softening fp128 operations on 32-bit X86 assuming f32/f64/f80 are legal. So there could be some things missing.
Differential Revision: https://reviews.llvm.org/D70654
Summary: This will be enhanced in a follow up to add strict fp support
Reviewers: efriedma
Reviewed By: efriedma
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70751
This has been factored out of D70654 which will add strict FP support to these functions. By making the helpers we avoid repeating even more code.
Differential Revision: https://reviews.llvm.org/D70736
MVE has a basic symmetry between its normal load/store operations and
the masked variants. This means that masked loads and stores can use
pre-inc and post-inc addressing modes, just like the standard loads and
stores already do.
To enable that, this patch adds all the relevant infrastructure for
treating masked loads/stores addressing modes in the same way as normal
loads/stores.
This involves:
- Adding an AddressingMode to MaskedLoadStoreSDNode, along with an extra
Offset operand that is added after the PtrBase.
- Extending the IndexedModeActions from 8bits to 16bits to store the
legality of masked operations as well as normal ones. This array is
fairly small, so doubling the size still won't make it very large.
Offset masked loads can then be controlled with
setIndexedMaskedLoadAction, similar to standard loads.
- The same methods that combine to indexed loads, such as
CombineToPostIndexedLoadStore, are adjusted to handle masked loads in
the same way.
- The ARM backend is then adjusted to make use of these indexed masked
loads/stores.
- The X86 backend is adjusted, hopefully with no functional changes.
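The widened legality table can be pictured as four 4-bit action fields packed into one 16-bit entry. The following is an illustrative Python model only, not LLVM's code: the field order, the shift values and the action numbering are assumptions made for the example.

```python
# Illustrative model of packing four 4-bit legality actions into one
# 16-bit entry. Field order and action numbering are assumptions for
# this sketch, not LLVM's actual layout.
ACTIONS = {"Legal": 0, "Promote": 1, "Expand": 2, "LibCall": 3, "Custom": 4}
SHIFTS = {"load": 0, "store": 4, "masked_load": 8, "masked_store": 12}


def set_action(entry, kind, action):
    """Return entry with the 4-bit field for `kind` set to `action`."""
    shift = SHIFTS[kind]
    entry &= ~(0xF << shift) & 0xFFFF  # clear the old field
    entry |= ACTIONS[action] << shift  # write the new one
    return entry


def get_action(entry, kind):
    """Decode the 4-bit field for `kind` back to an action name."""
    code = (entry >> SHIFTS[kind]) & 0xF
    return next(name for name, value in ACTIONS.items() if value == code)


entry = 0
entry = set_action(entry, "load", "Legal")
entry = set_action(entry, "masked_load", "Custom")
entry = set_action(entry, "masked_store", "Expand")
```

With only 8 bits there would be room for two of these fields; the masked fields in this model live in the upper byte, which is why the entry no longer fits in a byte.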
Differential Revision: https://reviews.llvm.org/D70176
Add some more helper functions to ReachingDefs to query the uses of
a given MachineInstr and also to query whether two MachineInstrs use
the same def of a register.
For Arm, while tail-predicating, these helpers are used in the
low-overhead loops to remove the dead code that calculates the number
of loop iterations.
Differential Revision: https://reviews.llvm.org/D70240
Add several new methods to ReachingDefAnalysis:
- getReachingMIDef, instead of returning an integer, return the
MachineInstr that produces the def.
- getInstFromId, return the MachineInstr that the given integer
corresponds to.
- hasSameReachingDef, return whether two MachineInstr use the same
def of a register.
- isRegUsedAfter, return whether a register is used after a given
MachineInstr.
These methods have been used in ARMLowOverhead to replace searching
for uses/defs.
Differential Revision: https://reviews.llvm.org/D70009
There seems to have been a misunderstanding of what ISD::FTRUNC
represents. ISD::FTRUNC is equivalent to llvm.trunc which takes
a floating point value, truncates it without changing the size
of the value and returns it.
Despite its similar name, it's different from the fptrunc instruction
in IR which changes a floating point value to a smaller floating
point value. fptrunc is represented by ISD::FP_ROUND in SelectionDAG.
Since the ISD::FP_TO_FP16 node takes a floating point value and
converts it to f16, it's more similar to ISD::FP_ROUND. In fact there
is identical code to what is being removed here in SoftenFloatRes_FP_ROUND.
I assume this bug was never encountered because it would require
f16 to be legalized by softening rather than the default of
promoting.
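The difference between the two operations can be shown with plain floating-point values. This is a minimal Python sketch, not LLVM code: `ftrunc` and `fp_round_to_f16` are hypothetical helper names, and Python's `struct` half-precision format stands in for f16.

```python
import math
import struct


def ftrunc(x: float) -> float:
    """Model of ISD::FTRUNC / llvm.trunc: drop the fraction, keep the type."""
    return float(math.trunc(x))


def fp_round_to_f16(x: float) -> float:
    """Model of fptrunc / ISD::FP_ROUND to half: keep the value as closely
    as possible, but round it to the precision of a smaller type."""
    return struct.unpack("<e", struct.pack("<e", x))[0]
```

For an input of 2.7, `ftrunc` yields 2.0 while `fp_round_to_f16` yields the nearest half-precision value, so a conversion to f16 behaves like the second helper, not the first.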
We already have this simplification at node-creation-time, but
the test from:
https://bugs.llvm.org/show_bug.cgi?id=44139
...shows that we can combine our way to an assert/crash too.
I need to be able to drop an operand for STRICT_FP_ROUND handling on X86. Merging these functions gives me the ArrayRef interface that passes the return type, operands, and debugloc instead of the Node.
Differential Revision: https://reviews.llvm.org/D70503
This is a re-land of D56151 / r364515 with a completely new implementation.
Once MIR code leaves SSA form and the liveness of a vreg is considered,
DBG_VALUE insts are able to refer to non-live vregs, because their
debug-uses do not contribute to liveness. This non-liveness becomes
problematic for optimizations like register coalescing, as they can't
``see'' the debug uses in the liveness analyses.
As a result registers get coalesced regardless of debug uses, and that can
lead to invalid variable locations containing unexpected values. In the
added test case, the first vreg operand of ADD32rr is merged with various
copies of the vreg (great for performance), but a DBG_VALUE of the
unmodified operand is blindly updated to the modified operand. This changes
what value the variable will appear to have in a debugger.
Fix this by changing any DBG_VALUE whose operand will be resurrected by
register coalescing to be a $noreg DBG_VALUE, i.e. give the variable no
location. This is an overapproximation as some coalesced locations are safe
(others are not) -- an extra domination analysis would be required to work
out which, and it would be better if we just didn't generate non-live
DBG_VALUEs.
Differential Revision: https://reviews.llvm.org/D64630
Fix two problems that popped up after my last patch. One is that the
stitching of prologue/epilogue can be wrong when reading a value from a
previous stage. Also changed how we duplicate phi instructions to avoid
generating extra phis that we delete later.
Differential Revision: https://reviews.llvm.org/D70213
The original commit message follows.
This patch adds support for debug_loclists.dwo section in llvm and llvm-dwarfdump.
Also Fixes PR43622, PR43623.
Reviewers: dblaikie, probinson, labath, aprantl, jini.susan.george
Differential Revision: https://reviews.llvm.org/D69462
Summary:
This is a preparatory cleanup before I add more
of this fold to deal with comparisons with non-zero.
In essence, the current lowering is:
```
Name: (X % C1) == 0 -> X * C3 <= C4
Pre: (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, 0
=>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%r = icmp ule i8 %n3, %C4
```
https://rise4fun.com/Alive/oqd
It kinda just works, really no weird edge-cases.
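The fold can also be checked exhaustively for i8 outside of Alive. The sketch below is a Python model under stated assumptions: `C3` is taken to be the multiplicative inverse (mod 2^8) of the odd part of `C1`, which is what the precondition requires, and the helper names are made up for this example. It needs Python 3.8+ for `pow` with a negative exponent.

```python
BITS = 8
MASK = (1 << BITS) - 1


def ctz(c):
    """Count trailing zero bits of a nonzero integer."""
    return (c & -c).bit_length() - 1


def rotr(v, k):
    """Rotate an i8 value right by k bits."""
    k %= BITS
    return ((v >> k) | (v << (BITS - k))) & MASK


def fold_constants(c1):
    """Return (shift, C3, C4) for a nonzero i8 divisor C1."""
    k = ctz(c1)
    c3 = pow(c1 >> k, -1, 1 << BITS)  # (C1 u>> ctz(C1)) * C3 == 1 (mod 2^8)
    c4 = MASK // c1                   # -1 /u C1
    return k, c3, c4


def urem_is_zero(x, c1):
    """Evaluate (X % C1) == 0 as rotr(X * C3, ctz(C1)) u<= C4."""
    k, c3, c4 = fold_constants(c1)
    return rotr((x * c3) & MASK, k) <= c4
```

For example, with `C1 = 3` the model uses `C3 = 171` and `C4 = 85`, and `urem_is_zero(6, 3)` evaluates `6 * 171 mod 256 = 2 <= 85`.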
But it isn't all that great for when comparing with non-zero.
In particular, given `(X % C1) == C2`, there will be problems
in the always-false tautological case where `C2 u>= C1`:
https://rise4fun.com/Alive/pH3
That case is tautological, always-false:
```
Name: (X % Y) u>= Y
%o0 = urem i8 %x, %y
%r = icmp uge i8 %o0, %y
=>
%r = false
```
https://rise4fun.com/Alive/ofu
While we can't/shouldn't get such tautological case normally,
we do deal with non-splat vectors, so unless we want to give up
in this case, we need to fixup/short-circuit such lanes.
There are two lowering variants:
1. We can blend between whatever computed result and the correct tautological result
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
=>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%res = icmp ule i8 %n3, %C4
%r = select i1 %is_tautologically_false, i1 0, i1 %res
```
https://rise4fun.com/Alive/PjT5
https://rise4fun.com/Alive/1KV
2. We can invert the comparison result
```
Name: (X % C1) == C2 -> X * C3 <= C4 || false
Pre: (C2 == 0 || C1 u<= C2) && (C1 u>> countTrailingZeros(C1)) * C3 == 1
%zz = and i8 C3, 0 ; trick alive into making C3 available in precondition
%o0 = urem i8 %x, C1
%r = icmp eq i8 %o0, C2
=>
%zz = and i8 C3, 0 ; and silence it from complaining about said reg
%C4 = -1 /u C1
%n0 = mul i8 %x, C3
%n1 = lshr i8 %n0, countTrailingZeros(C1) ; rotate right
%n2 = shl i8 %n0, ((8-countTrailingZeros(C1)) %u 8) ; rotate right
%n3 = or i8 %n1, %n2 ; rotate right
%is_tautologically_false = icmp ule i8 C1, C2
%C4_fixed = select i1 %is_tautologically_false, i8 -1, i8 %C4
%res = icmp ule i8 %n3, %C4_fixed
%r = xor i1 %res, %is_tautologically_false
```
https://rise4fun.com/Alive/2xC
https://rise4fun.com/Alive/jpb5
3. We can expand into `and`/`or`:
https://rise4fun.com/Alive/WGn
https://rise4fun.com/Alive/lcb5
Blend-one is likely better since we avoid having to load the
replacement from constant pool. `xor` is second best since
it's still pretty general. I'm not adding `and`/`or` variants.
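The blend and `xor` variants can be modelled per lane to see that both force tautological lanes to false. This is a Python sketch for i8 lanes, not the SelectionDAG code: the function names are hypothetical, and it only covers lanes that satisfy the precondition (`C2 == 0` or `C1 u<= C2`). It needs Python 3.8+ for `pow` with a negative exponent.

```python
BITS = 8
MASK = (1 << BITS) - 1


def rotated_product(x, c1):
    """Return (n3, C4) as computed by the lowerings above for one i8 lane."""
    k = (c1 & -c1).bit_length() - 1            # countTrailingZeros(C1)
    c3 = pow(c1 >> k, -1, 1 << BITS)           # inverse of the odd part of C1
    n0 = (x * c3) & MASK
    n3 = ((n0 >> k) | (n0 << ((BITS - k) % BITS))) & MASK  # rotate right
    return n3, MASK // c1


def lane_blend(x, c1, c2):
    """Variant 1: select false for tautological lanes."""
    n3, c4 = rotated_product(x, c1)
    is_tautologically_false = c1 <= c2
    return False if is_tautologically_false else n3 <= c4


def lane_xor(x, c1, c2):
    """Variant 2: force the compare to true, then invert it with xor."""
    n3, c4 = rotated_product(x, c1)
    is_tautologically_false = c1 <= c2
    c4_fixed = MASK if is_tautologically_false else c4
    return (n3 <= c4_fixed) != is_tautologically_false


def lower_vector(xs, c1s, c2s, lane=lane_blend):
    """Apply one of the lane lowerings to a non-splat vector compare."""
    return [lane(x, c1, c2) for x, c1, c2 in zip(xs, c1s, c2s)]
```

For instance, `lower_vector([6, 7, 9], [3, 3, 5], [0, 0, 7])` has two ordinary lanes and one tautological lane (`5 u<= 7`), and both variants give the same per-lane answers as evaluating `(X % C1) == C2` directly.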
Reviewers: RKSimon, craig.topper, spatel
Reviewed By: RKSimon
Subscribers: nick, hiraditya, xbolva00, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70051
float node
This patch adds an option 'disable-strictnode-mutation' to prevent strict
nodes from mutating into normal nodes, so that we can make sure that the
patch which sets strict nodes as legal works correctly.
Patch by Chen Liu(LiuChen3)
Differential Revision: https://reviews.llvm.org/D70226
Summary:
Most libraries are defined in the lib/ directory but there are also a
few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining
"Component Libraries" as libraries defined in lib/ that may be included in
libLLVM.so. Explicitly marking the libraries in lib/ as component
libraries allows us to remove some fragile checks that attempt to
differentiate between lib/ libraries and tools/ libraries:
1. In tools/llvm-shlib, because
llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of
all libraries defined in the whole project, there was custom code
needed to filter out libraries defined in tools/, none of which should
be included in libLLVM.so. This code assumed that any library
defined as static was from lib/ and everything else should be
excluded.
With this change, llvm_map_components_to_libnames(LIB_NAMES "all")
only returns libraries that have been added to the LLVM_COMPONENT_LIBS
global cmake property, so this custom filtering logic can be removed.
Doing this also fixes the build with BUILD_SHARED_LIBS=ON
and LLVM_BUILD_LLVM_DYLIB=ON.
2. There was some code in llvm_add_library that assumed that
libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or
ARG_LINK_COMPONENTS set. This is only true because libraries
defined in lib/ use LLVMBuild.txt and don't set these values.
This code has been fixed now to check if the library has been
explicitly marked as a component library, which should now make it
easier to remove LLVMBuild at some point in the future.
I have tested this patch on Windows, MacOS and Linux with release builds
and the following combinations of CMake options:
- "" (No options)
- -DLLVM_BUILD_LLVM_DYLIB=ON
- -DLLVM_LINK_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON
- -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON
Reviewers: beanz, smeenai, compnerd, phosek
Reviewed By: beanz
Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70179
Summary:
The fix in BranchFolder related to non debug invariant problems
done in commit ec32dff0b0 actually introduced some new
problems with debug invariance.
Before that patch ComputeCommonTailLength would move iterators
back, past debug instructions, in order to make ProfitableToMerge
give consistent answers "when one block differs from the other
only by whether debugging pseudos are present at the beginning".
But the changes in ec32dff0b0 undid that by moving the iterators
forward again.
This patch refactors ComputeCommonTailLength. The function was
really complex, considering that the SkipTopCFIAndReturn part
always moved the iterators forward to the first "real" instruction
in the found tail after ec32dff0b0.
The patch also restores the logic to "back past possible debugging
pseudos at beginning of block" to make sure ProfitableToMerge
gives consistent answers independent of DBG_VALUE instructions
before the tail. That is now done by ProfitableToMerge instead of
being hidden as a side-effect in ComputeCommonTailLength.
Reviewers: probinson, yechunliang, jmorse
Reviewed By: jmorse
Subscribers: Orlando, mehdi_amini, dexonsmith, aprantl, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70091
DwarfExpression::addMachineReg() knows how to build a larger register
that isn't expressible in DWARF by combining multiple
subregisters. However, if the entire value fits into just one
subregister, it would still emit the other subregisters, leading to
all sorts of inconsistencies down the line.
This patch fixes that by moving an already existing(!) check whether
the subregister's offset is before the end of the value to the right
place.
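The effect of the moved check can be sketched as a filter over subregister pieces. This is a simplified Python model, not the DwarfExpression code: the register names, the `(name, offset, size)` tuples and the function name are hypothetical.

```python
def pieces_for_value(subregs, value_bits):
    """subregs: list of (name, offset_bits, size_bits) covering a register.

    Returns the (name, bits) pieces needed to describe the low
    `value_bits` bits. Subregisters that start at or past the end of the
    value are skipped instead of being emitted.
    """
    pieces = []
    for name, offset, size in sorted(subregs, key=lambda s: s[1]):
        if offset >= value_bits:
            continue  # subregister starts past the end of the value
        pieces.append((name, min(size, value_bits - offset)))
    return pieces


# Hypothetical 64-bit register made of two 32-bit halves.
SUBREGS = [("lo", 0, 32), ("hi", 32, 32)]
```

In this model a 32-bit value is described by `lo` alone, whereas emitting `hi` as well would describe more bits than the value has.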
rdar://problem/57294211
Differential Revision: https://reviews.llvm.org/D70508
This allows operations that are marked Custom, but have some type
combinations that are legal to get past this code.
Add custom mutation code to X86's Select function for the nodes
that don't have isel patterns yet.
This patch lowers jump tables, constant pools and block addresses in assembly.
1. On AIX, jump table index is always relative;
2. Put CPI and JTI into ReadOnlySection until we support unique data sections;
3. Create the temp symbol for block address symbol;
4. Update MIR testcases and add related assembly part;
Differential Revision: https://reviews.llvm.org/D70243
Summary:
Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR.
Work remaining: match ADD, etc.
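The reason OR and XOR are interchangeable here is that the two carry flags can never both be set. The sketch below is a small Python model on 4-bit values, with hypothetical helper names, that checks the equivalence exhaustively.

```python
import operator

BITS = 4
MASK = (1 << BITS) - 1


def uaddo(a, b):
    """Unsigned add with overflow: returns (wrapped sum, carry out)."""
    total = a + b
    return total & MASK, total > MASK


def addcarry(a, b, carry_in):
    """Add with carry in: returns (wrapped sum, carry out)."""
    total = a + b + int(carry_in)
    return total & MASK, total > MASK


def chained(a, b, carry_in, merge):
    """(uaddo (uaddo a, b), carry_in) with the two carries merged."""
    partial, c0 = uaddo(a, b)
    result, c1 = uaddo(partial, int(carry_in))
    return result, merge(c0, c1)
```

If the first `uaddo` overflows, the partial sum is at most `2^N - 2`, so adding a carry of 1 cannot overflow again; that is why `operator.or_` and `operator.xor` give the same merged carry.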
Reviewers: craig.topper, RKSimon, spatel, niravd, jonpa, uweigand, deadalnix, nikic, lebedev.ri, dmgreen, chfast
Reviewed By: lebedev.ri
Subscribers: chfast, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70079
This is recommit of commit e6584b2b7b, which was reverted in
30e7ee3c4b together with af57dbf12e.
Original message is below.
Enumerations that describe rounding mode and exception behavior were
defined inside ConstrainedFPIntrinsic. It makes sense to use the same
definitions to represent the same properties in other cases, not only
in constrained intrinsics. It was, however, inconvenient, as it required
including constrained intrinsics definitions even if they were not needed.
Also, using a long scope prefix reduced readability.
This change moves these definitions to the namespace llvm::fp.
No functional changes.
Differential Revision: https://reviews.llvm.org/D69552
Summary:
In several places we need to enumerate all constrained intrinsics or IR
nodes that should be represented by them. It is easy to miss some of
the cases. To make working with these intrinsics more convenient and
robust, this change introduces a file containing definitions of all
constrained intrinsics and some of their properties. This file can be
included to generate constrained intrinsics processing code.
Reviewers: kpn, andrew.w.kaylor, cameron.mcinally, uweigand
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69887
A call site parameter description of a memory operand needs to
unambiguously convey the size of the operand to prevent incorrect entry
value evaluation.
Thanks to David Stenberg for pointing this issue out!
Cleanup handling of the denormal-fp-math attribute. Consolidate places
checking the allowed names in one place.
This is in preparation for introducing FP type specific variants of
the denormal-fp-math attribute. AMDGPU will switch to using this in
place of the current hacky use of subtarget features for the denormal
mode.
Introduce a new header for dealing with FP modes. The constrained
intrinsic classes define related enums that should also be moved into
this header for uses in other contexts.
The verifier could use a check to make sure the denormal-fp-math
attribute is sane, but there currently isn't one.
Currently, DAGCombiner incorrectly assumes non-IEEE behavior by
default in the one current user. Clang must be taught to start
emitting this attribute by default to avoid regressions when this is
switched to assume IEEE behavior if the attribute isn't present.
AMDGPU needs to know the FP mode for the function to answer this
correctly when this is removed from the subtarget.
AArch64 had to make this more complicated by using this from an IR
hook, so add an IR typed overload.
Summary:
Assert in getFunctionLocalOffsetAfterInsn() fails when processing a call
MachineInstr inside a bundle and compiling with debug info. This is
because labels are added by DwarfDebug::beginInstruction() which is
called for each top-level MI by EmitFunctionBody()'s for-loop iteration
but constructCallSiteEntryDIEs() which calls
getFunctionLocalOffsetAfterInsn() iterates over all MIs.
This commit modifies constructCallSiteEntryDIEs() to get the associated
bundle MI for call MIs inside a bundle and use that when calling
getFunctionLocalOffsetAfterInsn() and getLabelAfterInsn(). It also skips
loop iterations for bundle MIs since the loop statements are concerned
with debug info for each physical instruction and bundles represent a
group of instructions. It also fixes the comment about PCAddr since the
code is getting the return address and not the call address.
Reviewers: dstenb, vsk, aprantl, djtodoro, dblaikie, NikolaPrica
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70293
Previously we mutated the node and then converted it to a libcall. But this loses the chain information.
This patch keeps the chain, but unfortunately breaks tail call optimization as the functions involved in deciding if a node is in tail call position can't handle the chain. But getting the ordering right seems more important.
Somehow the SystemZ tests improved. I looked at one of them and it seemed that we're handling the split vector elements in a different order and that made the copies work better.
Differential Revision: https://reviews.llvm.org/D70334
and a follow-up NFC rearrangement as it's causing a crash on valid. Testcase is on the original review thread.
This reverts commits af57dbf12e and e6584b2b7b
* Implements scalable size queries for MVTs, split out from D53137.
* Contains a fix for FindMemType to avoid using scalable vector type
to contain non-scalable types.
* Explicit casts for several places where implicit integer sign
changes or promotion from 32 to 64 bits caused problems.
* CodeGenDAGPatterns will treat scalable and non-scalable vector types
as different.
Reviewers: greened, cameron.mcinally, sdesmalen, rovka
Reviewed By: rovka
Differential Revision: https://reviews.llvm.org/D66871
These were both recently added. While the call to GetSoftenedFloat
is a little more optimal, we don't do it in the expand for
FP_TO_SINT/UINT so there's no real reason to do it here. This
avoids a FIXME for strict fp.
This doesn't handle softening the input type, but we don't handle
softening any of the strict nodes yet. Skipping that made it easy
to reuse an existing function for creating a libcall from a node
with a chain.
Before this we were emitting a bitcast to integer from the lowering
code that itself will need to be legalized. By calling
GetSoftenedFloat we get the integer conversion in one step without
needing to relegalize a bitcast.
This code isn't exercised, and was in the wrong place. If we need
this, we would need to promote the type before figuring out which
libcall to use.
I'm choosing to remove it rather than fixing since we don't
support PromoteFloat for LRINT/LROUND/LLRINT/LLROUND when the
result type is legal so I don't see much reason to support it
for the case where the result type isn't legal.
These two functions were the same except for which libcall gets
emitted. Just merge them into one.
This is prep work for some other work including strict fp support.
This patch adds support for the DW_AT_alignment (DWARF5) attribute, to be emitted with a typedef DIE
when explicit alignment is specified.
Patch by Awanish Pandey <Awanish.Pandey@amd.com>
Reviewers: aprantl, dblaikie, jini.susan.george, SouraVX, alok,
deadalinx
Differential Revision: https://reviews.llvm.org/D70111
This only implements the non-dwo part, but loclistx is necessary to use
location lists in DWARFv5, so it's a precursor to that work - and
generally reduces relocations (only using one reloc, then
indexes/relative offsets for all location list references) in non-split
DWARF.
LLVM IR 1-element vectors get lowered into scalars in GISel. As a
result, shuffle vector may also produce a scalar.
This patch teaches the shuffle combiner how to deal with scalars when
they are in the destination type of a shuffle vector.
For now, we just support the easy case where this can be lowered to
a plain copy. For other cases, we leave the shuffle vector as is.
This type of IR is seen in O0 pipelines, e.g., as produced with
SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c.
rdar://problem/57198904
https://reviews.llvm.org/D70210
Previously:
Due to sensitivity of the algorithm with gaps, and extra instructions,
when diffing, often we see naming being off by a few. Makes the diff
unreadable even for tests with 7 and 8 instructions respectively.
Naming can change depending on candidates (and order of picking
candidates). Suddenly if there's one extra instruction somewhere, the
entire subtree would be named completely differently.
No consistent naming of similar instructions which occur in different
functions. If we try to do something like count the frequency
distribution of various differences across suite, then the above
sensitivity issues are going to result in poor results.
Instead:
Name instructions based on the semantics of the instruction (hash of the
opcode and operands). Essentially, a given instruction that occurs in
any module/function will be named similarly (i.e. semantically). This has
some nice properties:
Can easily look at many instructions and just check the hash, and if
they're named similarly, then it's the same instruction. Makes it very
easy to spot the same instruction both multiple times and across
many functions (useful for frequency distribution).
Independent of traversal/candidates/depth of graph. No need to keep
track of last index/gaps/skip count etc.
No off-by-a-few issues with diffs. I've tried the old vs new
implementation in files ranging from 30 to 700 instructions. In both
cases with the old algorithm, diffs are a sea of red, whereas for the
semantic version, in both cases, the diffs line up beautifully.
Simplified implementation of the main loop (simple iteration), no need
to keep track of what's visited and not.
Handle collision just by incrementing a counter. Roughly
bb[N]_hash_[CollisionCount].
Additionally with the new implementation, we can probably avoid doing
the hoisting of instructions to various places, as they'll likely be
named the same, resulting in differences only based on collision (i.e.
regardless of whether the instruction is hoisted or not/close to use or
not, it'll be named the same hash which should result in use of the
instruction be identical with the only change being the collision count)
which is very easy to spot visually.
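The naming scheme can be illustrated with a short Python sketch. It is not the renamer's implementation: the truncated SHA-1 digest, the tuple representation of instructions and the exact name layout are assumptions, loosely following the `bb[N]_hash_[CollisionCount]` shape mentioned above.

```python
import hashlib
from collections import defaultdict


def semantic_names(block_index, instructions):
    """Name each (opcode, operands) tuple in a block by a hash of its content.

    Identical instructions share a hash and are told apart by a collision
    counter, so a name does not depend on the surrounding instructions.
    """
    seen = defaultdict(int)
    names = []
    for opcode, operands in instructions:
        key = repr((opcode, tuple(operands))).encode()
        digest = hashlib.sha1(key).hexdigest()[:8]  # assumed hash choice
        names.append(f"bb{block_index}_{digest}_{seen[digest]}")
        seen[digest] += 1
    return names


base = [("ADD", ("imm:1", "imm:2")), ("MUL", ("imm:3", "imm:4"))]
with_extra = [base[0], ("SUB", ("imm:5",)), base[1]]
```

Naming `base` and `with_extra` gives the ADD and MUL the same names in both lists, even though an extra SUB sits between them in the second one; that is the property that keeps diffs aligned.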
The SmallVector reserve() call in
MachineInstrExpressionTrait::getHashValue accounted for over 3% of all
calls to malloc() when I compiled a bunch of graphics shaders for the
AMDGPU target. Its initial size was only enough for machine instructions
with up to 7 operands, but for AMDGPU 8 and 10 operands are very common.
Here's a histogram of number of operands for each call to getHashValue,
gathered from the same collection of shaders:
1 13503
2 254273
3 135781
4 422508
5 614997
6 194953
7 287248
8 1517255
9 31218
10 1191269
11 70731
12 24
13 77
15 84
17 4692
27 16
33 705
49 6
Typical instructions with 8 and 10 operands are floating point
arithmetic and multiply-accumulate instructions like:
%83:vgpr_32 = V_MUL_F32_e64 0, killed %82:vgpr_32, 0, killed %81:vgpr_32, 0, 0, implicit $exec
%330:vgpr_32 = V_MAC_F32_e64 0, killed %327:vgpr_32, 0, killed %329:sgpr_32, 0, %328:vgpr_32(tied-def 0), 0, 0, implicit $exec
Differential Revision: https://reviews.llvm.org/D70301
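The histogram can be summarized with a few lines of Python. This is only a sketch over the numbers quoted above: it treats an operand count of up to 7 as fitting the initial inline storage, as the message states, and the helper names are made up for the example.

```python
# Operand count -> number of getHashValue calls, copied from the histogram.
HISTOGRAM = {
    1: 13503, 2: 254273, 3: 135781, 4: 422508, 5: 614997, 6: 194953,
    7: 287248, 8: 1517255, 9: 31218, 10: 1191269, 11: 70731, 12: 24,
    13: 77, 15: 84, 17: 4692, 27: 16, 33: 705, 49: 6,
}


def fraction_exceeding(max_operands):
    """Share of calls whose operand count is larger than max_operands."""
    total = sum(HISTOGRAM.values())
    over = sum(n for ops, n in HISTOGRAM.items() if ops > max_operands)
    return over / total


def operands_covering(share):
    """Smallest operand count that covers at least `share` of all calls."""
    total = sum(HISTOGRAM.values())
    running = 0
    for ops in sorted(HISTOGRAM):
        running += HISTOGRAM[ops]
        if running >= share * total:
            return ops
    return max(HISTOGRAM)
```

On this data `fraction_exceeding(7)` is roughly 0.59, i.e. more than half of the calls had more than 7 operands, which is consistent with the reserve() call showing up so prominently in the malloc() profile.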
Allow call site parameter descriptions to reference spill slots. Spill
slots are not visible to high-level LLVM IR, so they can safely be
referenced during entry value evaluation (as they cannot be clobbered by
some other function).
This gives a 5% increase in the number of call site parameter DIEs in an
LTO x86_64 build of the xnu kernel.
This reverts commit eb4c98ca3d (
[DebugInfo] Exclude memory location values as parameter entry values),
effectively reintroducing the portion of D60716 which dealt with memory
locations (authored by Djordje, Nikola, Ananth, and Ivan).
This partially addresses llvm.org/PR43343. However, not all memory
operands forwarded to callees live in spill slots. In the xnu build, it
may be possible to use an escape analysis to increase the number of call
site parameters by another 15% (more details in PR43343).
Differential Revision: https://reviews.llvm.org/D70254
We were previously pushing all intrinsics used in a function to the
worklist. This is wasteful for memory in a function with a lot of
intrinsics.
We also ask TTI if we should expand every intrinsic, but we only
have expansion support for the reduction intrinsics. This just
wastes time for the non-reduction intrinsics.
This patch only pushes reduction intrinsics into the worklist and
skips other intrinsics.
Differential Revision: https://reviews.llvm.org/D69470
I reviewed the diff hunks of 05da2fe521 that don't contain
'#include' lines, and found two unintended changes. I deleted a header
banner inadvertently while inserting a header, and changed the
indentation of a constructor in an odd way. Add back the banner, and
reformat the constructor.
Avoids the need to include TargetMachine.h from various places just for
an enum. Various other enums live here, such as the optimization level,
TLS model, etc. Data suggests that this change probably doesn't matter,
but it seems nice to have anyway.
This file lists every pass in LLVM, and is included by Pass.h, which is
very popular. Every time we add, remove, or rename a pass in LLVM, it
causes lots of recompilation.
I found this fact by looking at this table, which is sorted by the
number of times a file was changed over the last 100,000 git commits
multiplied by the number of object files that depend on it in the
current checkout:
recompiles  touches  affected_files  header
    342380       95            3604  llvm/include/llvm/ADT/STLExtras.h
    314730      234            1345  llvm/include/llvm/InitializePasses.h
    307036      118            2602  llvm/include/llvm/ADT/APInt.h
    213049       59            3611  llvm/include/llvm/Support/MathExtras.h
    170422       47            3626  llvm/include/llvm/Support/Compiler.h
    162225       45            3605  llvm/include/llvm/ADT/Optional.h
    158319       63            2513  llvm/include/llvm/ADT/Triple.h
    140322       39            3598  llvm/include/llvm/ADT/StringRef.h
    137647       59            2333  llvm/include/llvm/Support/Error.h
    131619       73            1803  llvm/include/llvm/Support/FileSystem.h
Before this change, touching InitializePasses.h would cause 1345 files
to recompile. After this change, touching it only causes 550 compiles in
an incremental rebuild.
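The `recompiles` column is the product of the other two, so the effect of the change on the ranking can be estimated directly. The Python sketch below recomputes the metric from the table rows; it assumes the number of touches stays the same and only the affected-file count for InitializePasses.h drops to 550, and the helper names are made up for the example.

```python
# (header, touches, affected_files) rows copied from the table.
ROWS = [
    ("llvm/include/llvm/ADT/STLExtras.h", 95, 3604),
    ("llvm/include/llvm/InitializePasses.h", 234, 1345),
    ("llvm/include/llvm/ADT/APInt.h", 118, 2602),
    ("llvm/include/llvm/Support/MathExtras.h", 59, 3611),
    ("llvm/include/llvm/Support/Compiler.h", 47, 3626),
    ("llvm/include/llvm/ADT/Optional.h", 45, 3605),
    ("llvm/include/llvm/ADT/Triple.h", 63, 2513),
    ("llvm/include/llvm/ADT/StringRef.h", 39, 3598),
    ("llvm/include/llvm/Support/Error.h", 59, 2333),
    ("llvm/include/llvm/Support/FileSystem.h", 73, 1803),
]


def recompiles(touches, affected_files):
    """Estimated recompiles: commits touching the header times dependents."""
    return touches * affected_files


def ranking(rows):
    """Sort rows by estimated recompiles, largest first."""
    return sorted(rows, key=lambda r: recompiles(r[1], r[2]), reverse=True)


def with_affected(rows, header, affected_files):
    """Copy of rows with one header's affected-file count replaced."""
    return [(h, t, affected_files if h == header else a) for h, t, a in rows]


PASSES = "llvm/include/llvm/InitializePasses.h"
before = ranking(ROWS)
after = ranking(with_affected(ROWS, PASSES, 550))
```

Under that assumption the estimate for InitializePasses.h falls from 314730 to `234 * 550 = 128700`, which moves it from second place to the bottom of these ten rows.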
Reviewers: bkramer, asbirlea, bollu, jdoerfert
Differential Revision: https://reviews.llvm.org/D70211
This method is private and only called from this file and doesn't need
to be inline. Saves a TargetMachine.h include in MachineFunction.h, a
popular header. The include was introduced in 98603a8153 despite the
forward decl of LLVMTargetMachine.
v256i1 on X86 without avx512 breaks down to 256 i8 values when passed between basic blocks. But the NumRegistersForVT was sized at a byte for each VT. This results in 256 being stored as 0.
This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored.
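The wraparound is easy to reproduce with plain integer masking. This is a minimal Python sketch of the storage behavior only; the function names are hypothetical and do not correspond to LLVM's code.

```python
def store_u8(count):
    """Model storing a register count in an 8-bit table entry."""
    return count & 0xFF


def store_u16_checked(count):
    """Model the widened 16-bit entry with a check that nothing is lost."""
    stored = count & 0xFFFF
    assert stored == count, "register count does not fit in 16 bits"
    return stored
```

Here `store_u8(256)` yields 0, which is the bug described above, while `store_u16_checked(256)` keeps the value and would assert if a count ever exceeded 16 bits.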
Differential Revision: https://reviews.llvm.org/D70138
During register coalescing, we update the live-intervals on-the-fly.
To do that we are in this strange mode where the live-intervals can
be slightly out-of-sync (more precisely they are forward looking)
compared to what the IR actually represents.
This happens because the register coalescer only updates the IR when
it is done with updating the live-intervals and it has to do it this
way because updating the IR on-the-fly would actually clobber some
information on what the live-ranges that are being updated look like.
This is problematic for updates that rely on the IR to accurately
represent the state of the live-ranges. Right now, we have only
one of those: stripValuesNotDefiningMask.
To reconcile this need of out-of-sync IR, this patch introduces a
new argument to LiveInterval::refineSubRanges that allows the code
doing the live range updates to reason about what the code should
look like after the coalescer has rewritten the registers.
Essentially this captures how a subregister index will be offset
to match its position in a new register class.
E.g., let's say we want to merge:
V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
Put differently we align V2's sub3 with V1's sub1:
V2: sub0 sub1 sub2 sub3
V1: <offset> sub0 sub1
This offset will look like a composed subregidx in the class:
V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
=> V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
Now if we didn't rewrite the uses and def of V1, all the checks for V1
need to account for this offset to match what the live intervals intend
to capture.
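The index arithmetic in the example can be written out explicitly. This is a toy Python model with plain integer lane indices and hypothetical helper names; real subregister index composition in LLVM is table-driven and not simple addition in general.

```python
def align_offset(narrow_idx, wide_idx):
    """Offset that lines narrow sub<narrow_idx> up with wide sub<wide_idx>."""
    return wide_idx - narrow_idx


def to_wide_index(narrow_idx, offset):
    """Position of a narrow register's lane inside the wide register class."""
    return narrow_idx + offset


# V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>  gives  "offset + 1 == 3".
offset = align_offset(narrow_idx=1, wide_idx=3)
mapping = {f"sub{i}": f"sub{to_wide_index(i, offset)}" for i in range(2)}
```

In this model V1's `sub0` and `sub1` land on `sub2` and `sub3` of the wide class, which is the adjustment every check on V1 has to apply until its uses and defs are actually rewritten.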
Prior to this patch, we would fail to recognize the uses and def of V1
and would end up with machine verifier errors: No live segment at def.
This could lead to miscompiles as we would drop some live-ranges and
thus, miss some interferences.
For this problem to trigger, we need to reach stripValuesNotDefiningMask
while having a mismatch between the IR and the live-ranges (i.e.,
we have to apply a subreg offset to the IR.)
This requires the following three conditions:
1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
2. An update with Tuple registers with a possibility to coalesce the
subreg index: e.g., v1.dsub_1 == v2.dsub_3
3. Subreg liveness enabled.
looking at the IR to decide what is alive and what is not, i.e., calling
stripValuesNotDefiningMask.
coalescer maintains for the live-ranges information.
None of the targets that currently use subreg liveness (i.e., the targets
that fulfill #3, Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1
and #2, so this patch also artificially enables subreg liveness for ARM,
so that a nice test case can be attached.
Summary:
Entry values are considered for parameters that have register-described
DBG_VALUEs in the entry block (along with other conditions).
If a parameter's value has been propagated from the caller to the
callee, then the parameter's DBG_VALUE in the entry block may be
described using a register defined by some instruction, and entry values
should not be emitted for the parameter, which can currently occur.
One such case was seen in the attached test case, in which the second
parameter, which is described by a redefinition of the first parameter's
register, would incorrectly get an entry value using the first
parameter's register. This commit intends to solve such cases by keeping
track of register defines, and ignoring DBG_VALUEs in the entry block
that are described by such registers.
In a RelWithDebInfo build of clang-8, the average size of the set was
27, and in a RelWithDebInfo+ASan build it was 30.
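The bookkeeping described above amounts to a single forward walk over the entry block. The sketch below is a simplified Python model, not the LiveDebugValues code: instructions are plain dicts, register aliases (sub- and super-registers) are ignored, and the other conditions for emitting entry values are left out.

```python
def entry_value_candidates(entry_block):
    """Return variables whose entry-block DBG_VALUE register is still unmodified.

    entry_block is a list of dicts, either
      {"kind": "inst", "defs": [registers defined]} or
      {"kind": "dbg_value", "var": name, "reg": register}.
    """
    defined = set()   # registers defined so far in the entry block
    candidates = []
    for mi in entry_block:
        if mi["kind"] == "dbg_value":
            if mi["reg"] not in defined:
                candidates.append(mi["var"])
        else:
            defined.update(mi["defs"])
    return candidates


# Second parameter described by a redefinition of the first one's register.
BLOCK = [
    {"kind": "dbg_value", "var": "a", "reg": "$edi"},
    {"kind": "inst", "defs": ["$edi"]},
    {"kind": "dbg_value", "var": "b", "reg": "$edi"},
]
```

For `BLOCK`, only `a` remains a candidate: the DBG_VALUE for `b` refers to `$edi` after it has been redefined, so giving `b` an entry value based on `$edi` would be wrong.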
Reviewers: djtodoro, NikolaPrica, aprantl, vsk
Reviewed By: djtodoro, vsk
Subscribers: hiraditya, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D69889
Summary:
The conditions that are used to determine if entry values should be
emitted for a parameter are numerous, and will grow slightly
in a follow-up commit, so move those to a helper function, as was
suggested in the code review for D69889.
Reviewers: djtodoro, NikolaPrica
Reviewed By: djtodoro
Subscribers: probinson, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69955
This patch adds a target interface to set the StackID for a given type,
which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a
'sve-vec' StackID, so it is allocated in the SVE area of the stack frame.
Reviewers: ostannard, efriedma, rengolin, cameron.mcinally
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D70080
Summary:
Replaces
```
unsigned getShiftAmountThreshold(EVT VT)
```
by
```
bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
```
thus giving more flexibility for targets to decide whether particular shift amounts must be considered expensive or not.
Updates the MSP430 target with a custom implementation.
This continues D69116, D69120, D69326 and updates them, so all of them must be committed before this.
Existing tests apply, a few more have been added.
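To illustrate why a predicate is more flexible than a single threshold,
here is a minimal sketch of a target override. It is a toy model under
stated assumptions: ToyLowering and ToySmallTarget are hypothetical
stand-ins, a plain bit width replaces EVT, and the cost rule is invented
for illustration and is not the MSP430 implementation.

```cpp
// Toy stand-in for the lowering interface (hypothetical, not LLVM's class).
struct ToyLowering {
  virtual ~ToyLowering() = default;
  // Assumed default for this sketch: no shift amount is considered expensive.
  virtual bool shouldAvoidTransformToShift(unsigned /*bitWidth*/,
                                           unsigned /*amount*/) const {
    return false;
  }
};

// Hypothetical target with an invented cost model: shifts by 1 or 2 are
// cheap, a shift by exactly 8 is cheap (imagined byte-swap trick), and
// everything else on types wider than 8 bits is expensive.
struct ToySmallTarget : ToyLowering {
  bool shouldAvoidTransformToShift(unsigned bitWidth,
                                   unsigned amount) const override {
    if (bitWidth <= 8)
      return false; // assumed: narrow types are always cheap to shift
    return amount > 2 && amount != 8;
  }
};
```

A rule such as "8 is cheap but 5 is not" cannot be expressed with a single
threshold value, which is the kind of decision the predicate form leaves to
the target.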
Reviewers: asl, spatel
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70042
In MachineCopyPropagation, when propagating the source of a copy into
the operand of a later instruction, bail if a destination overlaps
(partly defines) the copy source. If the instruction where the
substitution is happening is also a copy, allowing the propagation
confuses the tracking mechanism.
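The bail-out condition can be sketched as below. This is a simplified
model and not the code in the pass: ToyReg, overlaps and
mayForwardCopySource are hypothetical names, and a register is modelled
as a bitmask of register units so that overlap is a shared bit.

```cpp
#include <cstdint>
#include <vector>

// Toy register: a bitmask of register units (hypothetical model). Two
// registers overlap when they share at least one unit.
struct ToyReg {
  std::uint32_t units;
};

inline bool overlaps(ToyReg a, ToyReg b) { return (a.units & b.units) != 0; }

// Decides whether the source of an earlier copy may be substituted into an
// operand of a later instruction whose destinations are `dests`. Returns
// false (bail) if any destination overlaps the copy source.
bool mayForwardCopySource(ToyReg copySrc, const std::vector<ToyReg> &dests) {
  for (ToyReg dest : dests)
    if (overlaps(dest, copySrc))
      return false;
  return true;
}
```

For example, with a full register covering units 0x3 and a sub-register
covering 0x1, an instruction that defines the sub-register blocks
forwarding of the full register as a copy source.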
Differential Revision: https://reviews.llvm.org/D69953
Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
Summary:
This patch redefines the freeze instruction from being a UnaryOperator to a subclass of UnaryInstruction.
ConstantExpr freeze is removed, as discussed in the previous review.
FreezeOperator is not added because there's no ConstantExpr freeze.
A `freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed.
InstVisitor now has visitFreeze because freeze is no longer a unary operator.
Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri
Reviewed By: craig.topper, lebedev.ri
Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69932