For instructions that don't need any special handling, use
ConstantFoldInstOperands(), rather than re-implementing individual
cases.
This is probably not NFC because it can handle cases the previous
code missed (e.g. vector operations).
Support compares in ConstantFoldInstOperands(), instead of
forcing the use of ConstantFoldCompareInstOperands(). Also handle
insertvalue (extractvalue was already handled).
This removes a footgun, where many uses of ConstantFoldInstOperands()
need a separate check for compares beforehand. It's particularly
insidious if called on a constant expression, because it doesn't
fail in that case, but will just not do DL-dependent folding.
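As a hedged illustration of the caller-side pattern this removes (the
surrounding names I, Ops, and DL below are placeholders, not code from this
patch):
```
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Analysis/ConstantFolding.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Before: callers had to special-case compares themselves, or silently
// miss DL-dependent folding when handed a compare constant expression.
static Constant *foldBefore(Instruction *I, ArrayRef<Constant *> Ops,
                            const DataLayout &DL) {
  if (auto *Cmp = dyn_cast<CmpInst>(I))
    return ConstantFoldCompareInstOperands(Cmp->getPredicate(), Ops[0],
                                           Ops[1], DL);
  return ConstantFoldInstOperands(I, Ops, DL);
}

// After: a single call is expected to cover compares (and insertvalue) too.
static Constant *foldAfter(Instruction *I, ArrayRef<Constant *> Ops,
                           const DataLayout &DL) {
  return ConstantFoldInstOperands(I, Ops, DL);
}
```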
This test checks one of the problematic cases outlined in D128006, which
led to the patch's reversal. I thought it best to add a test just in case
this sort of optimization is attempted again in the future in some
fashion.
Even though the array is declared with '*' upper bounds, it has an
initial value that has a statically known shape. Use the shape from
the type of the initializer when the declared size is '*'.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128889
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128888
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
In some cases, there may be widened users of inductions even though the
plan includes the scalar VF. In those cases, make sure we still replace
the VPWidenIntOrFpInductionRecipe with scalar steps, as otherwise we may
try to execute a VPWidenIntOrFpInductionRecipe with a scalar VF.
Alternatively the patch could also split the range if needed.
This fixes a crash exposed by D123720.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D128755
The names of CHARACTER strings were being truncated, leading to spurious
name collisions and other failures. This change makes sure to use the
entire string as the seed for the unique name.
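Purely as a hedged illustration of the failure mode (llvm::hash_value and
the 32-character cutoff below are illustrative; they are not necessarily
what the name uniquer actually uses):
```
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/StringRef.h"

// Hashing only a fixed-length prefix makes long literals that share that
// prefix collide; hashing the entire string, as this change does, keeps
// the generated unique names distinct.
llvm::hash_code truncatedSeed(llvm::StringRef S) {
  return llvm::hash_value(S.take_front(32)); // collides for long strings
}
llvm::hash_code fullSeed(llvm::StringRef S) {
  return llvm::hash_value(S); // whole string used as the seed
}
```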
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: jeanPerier
Differential Revision: https://reviews.llvm.org/D128884
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Generate INSTRINFO_OPERAND_TYPE table in X86GenInstrInfo.inc.
This diff adds support for instructions that were previously reported as having
memory access size 0. It replaces the heuristic of looking at instruction
register width to determine memory access width by instead checking the memory
operand type using tablegen-provided tables.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D126116
Currently, we only remove dead blocks and non-feasible edges in
IPSCCP, but not in SCCP. I'm not aware of any strong reason for
that difference, so this patch updates SCCP to perform the CFG
cleanup as well.
Compile-time impact seems to be pretty minimal, in the 0.05%
geomean range on CTMark.
For the test case from https://reviews.llvm.org/D126962#3611579
the result after -sccp now looks like this:
```
define void @test(i1 %c) {
entry:
  br i1 %c, label %unreachable, label %next

next:
  unreachable

unreachable:
  call void @bar()
  unreachable
}
```
-jump-threading does nothing on this, but -simplifycfg will produce
the optimal result.
Differential Revision: https://reviews.llvm.org/D128796
Here is a character SELECT CASE construct that requires a temp to hold the
result of the TRIM intrinsic call:
```
module m
character(len=6) :: s
contains
subroutine sc
n = 0
if (lge(s,'00')) then
select case(trim(s))
case('11')
n = 1
case default
continue
case('22')
n = 2
case('33')
n = 3
case('44':'55','66':'77','88':)
n = 4
end select
end if
print*, n
end subroutine
end module m
```
This SELECT CASE construct is implemented as an IF/ELSE-IF/ELSE comparison
sequence. The temp must be retained until some comparison is successful.
At that point the temp may be freed. Generalize statement context processing
to allow multiple finalize calls to do this, such that the program always
executes exactly one freemem call.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler, vdonaldson
Differential Revision: https://reviews.llvm.org/D128852
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Follow-up to D127894: the new liveness update code needs to handle
the case where the S_ANDN2 input must be extended through loops when
V_CNDMASK_B32 has been hoisted.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D128800
Linker optimization hints mark a sequence of instructions used for
synthesizing an address, like ADRP+ADD. If the referenced symbol ends up
close enough, it can be replaced by a faster sequence of instructions
like ADR+NOP.
This commit adds support for 2 of the 7 defined ARM64 optimization
hints:
- LOH_ARM64_ADRP_ADD, which transforms a pair of ADRP+ADD into ADR+NOP
if the referenced address is within +/- 1 MiB
- LOH_ARM64_ADRP_ADRP, which transforms two ADRP instructions into
ADR+NOP if they reference the same page
These two kinds already cover more than 50% of all LOHs in
chromium_framework.
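For reference, here is a hedged sketch (not the actual LLD code) of the
reachability check implied by the +/- 1 MiB constraint; ADR encodes a
signed 21-bit byte offset relative to the instruction itself:
```
#include <cstdint>

// Illustrative helper: can `targetAddr` be reached by a single ADR emitted
// at `adrAddr`? ADR's immediate is a signed 21-bit byte offset, i.e. a
// range of roughly +/- 1 MiB.
bool fitsInAdr(uint64_t adrAddr, uint64_t targetAddr) {
  int64_t delta =
      static_cast<int64_t>(targetAddr) - static_cast<int64_t>(adrAddr);
  return delta >= -(1LL << 20) && delta < (1LL << 20);
}
```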
Differential Revision: https://reviews.llvm.org/D128093
`mov x0, 1024u` is permitted in binutils but rejected by the integrated
assembler. Support the case. This is especially important when using the C
pre-processor with the assembler: some shared code between C and assembler may
use lower-cased suffixes.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D128871
With libgcc, we follow the behavior of GCC for backwards compatibility,
only using --as-needed in the non-C++ mode.
With libunwind, there are no backward compatibility requirements so we
can always use --as-needed on all supported platforms.
Differential Revision: https://reviews.llvm.org/D128841
Don't do symmetric transfer for coroutines when tail calls are not enabled
C++20 coroutines couldn't be compiled to WebAssembly because an
optimization named symmetric transfer requires support for musttail
calls, which WebAssembly doesn't support yet.
This patch tries to fix the problem by adding a supportsTailCalls
method to TargetTransformImpl to skip the symmetric transfer when
the tail-call feature is not supported.
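A minimal sketch of how a caller could gate on such a hook, assuming a
TargetTransformInfo-level supportsTailCalls() query (the helper and its
surrounding code are illustrative, not the actual coroutine-splitting
code):
```
#include "llvm/Analysis/TargetTransformInfo.h"
using namespace llvm;

// Illustrative only: decide whether the symmetric-transfer lowering,
// which relies on musttail calls, may be attempted at all.
static bool mayUseSymmetricTransfer(const TargetTransformInfo &TTI) {
  // On targets without tail-call support (e.g. WebAssembly today),
  // skip the optimization instead of emitting an unsupported musttail call.
  return TTI.supportsTailCalls();
}
```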
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D128794
In X86 we split greedy register allocation into 2 passes. The 1st pass
allocates tile registers, and the 2nd pass allocates the rest of the
virtual registers. In most cases there are no tile registers, so the 1st
pass is unnecessary. To improve compile time, we check whether any
register needs to be allocated by invoking the callback
`ShouldAllocateClass`. If there is no register to be allocated, just
return false in the pass. This improves the 1st greedy RA pass for
normal cases.
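A hedged sketch of that early-exit check; only the ShouldAllocateClass
callback name comes from the description above, and the loop itself is
illustrative rather than the actual RAGreedy code:
```
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/RegAllocCommon.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"
using namespace llvm;

// Return true if at least one virtual register passes the pass's
// register-class filter (e.g. an AMX tile register for the 1st pass).
static bool hasVRegsToAllocate(const MachineRegisterInfo &MRI,
                               const TargetRegisterInfo &TRI,
                               const RegClassFilterFunc &ShouldAllocateClass) {
  for (unsigned I = 0, E = MRI.getNumVirtRegs(); I != E; ++I) {
    Register Reg = Register::index2VirtReg(I);
    if (MRI.reg_nodbg_empty(Reg))
      continue;
    if (ShouldAllocateClass(TRI, *MRI.getRegClass(Reg)))
      return true;
  }
  return false; // nothing for this RA pass to do; it can bail out early
}
```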
Differential Revision: https://reviews.llvm.org/D128804
This violates a coding guideline in the LLVM coding standard [1], because the initialization order of static constructors is nondeterministic and they might increase the launch time of programs.
However, these variables are only used to cache query results, so we can move them into the functions that use them, which resolves both issues: 1. they are initialized in a deterministic order, and 2. they are initialized only when first used.
[1] https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors
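A minimal sketch of the resulting pattern (the variable and function names
are illustrative, not the actual ones touched by this patch):
```
// Function-local statics are initialized on first use, in a well-defined
// order, so they avoid the static-constructor ordering problem while still
// caching the (possibly expensive) query result.
static bool isFeatureSupported() {
  static const bool Cached = [] {
    // ... perform the query once, the first time this function is called ...
    return true;
  }();
  return Cached;
}
```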
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D128726
D126838 added support for the TYPE_MATCH compile-once run-everywhere
relocation to LLVM proper. On the clang side no changes are necessary,
other than the adjustment of a comment to mention this relocation as well.
This change takes care of that.
Differential Revision: https://reviews.llvm.org/D126839
When the iteration graph is cyclic (even after several attempts using fewer and fewer constraints), the current sparse compiler bails out, and no rewriting happens. However, this revision adds some new logic where the sparse compiler tries to find a single input sparse tensor that breaks the cycle, and then adds a proper sparse conversion operation. This way, more incoming kernels can be handled!
Note that the resulting code is not optimal (although it keeps more or less proper "sparse" complexity), and more improvements should be added (especially when the kernel directly yields without computation, such as the transpose example). However, handling is better than not handling ;-)
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128847
Among others, BPF currently supports the type-exists CO-RE relocation
(e.g., see D83878 & D83242). Its intention, as the name tries to convey,
is to be used for checking existence of a type in a target.
While that check is useful and has its place, we would also like to
be able to perform stricter type queries: instead of just checking mere
existence, we want to make sure that members match up in composite
types, that enum variants are present, etc. We refer to this as "type
match".
This change proposes the addition of a new relocation variant/value that
we intend to use for establishing this match relation.
Differential Revision: https://reviews.llvm.org/D126838
llvm::codeview::visitMemberRecordStream expects to receive an ArrayRef that is the FieldListRecord's data, not a CVType's data, which has 4 more bytes preceding it: the first 2 bytes indicate the size of the FieldListRecord, and the following 2 bytes are always 0x1203. Inside llvm::codeview::visitMemberRecordStream, it iterates over the data, checking whether the first two bytes match some type record kind. If the size coincidentally matches one type kind, it will start parsing from there, causing a crash.
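As a hedged illustration of that layout (the helper below is illustrative,
and the actual fix may look different), the caller has to skip the 4-byte
record prefix before visiting members:
```
#include "llvm/DebugInfo/CodeView/CVRecord.h"
#include "llvm/DebugInfo/CodeView/CVTypeVisitor.h"
#include "llvm/DebugInfo/CodeView/TypeVisitorCallbacks.h"
using namespace llvm;
using namespace llvm::codeview;

// CVType layout for a field list:
//   [2-byte length][2-byte kind = 0x1203 (LF_FIELDLIST)][member records...]
// visitMemberRecordStream() wants only the member records.
Error visitFieldListMembers(const CVType &FieldList,
                            TypeVisitorCallbacks &Callbacks) {
  ArrayRef<uint8_t> MemberData = FieldList.data().drop_front(4);
  return visitMemberRecordStream(MemberData, Callbacks);
}
```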
There's no reason that arguments to e.g. lambda calls should be treated
differently than those to "real" functions.
Reviewed By: nridge
Differential Revision: https://reviews.llvm.org/D128329
Don't dump dot CFG graph for functions that should not be printed.
Reviewed By: rafauler, maksfb
Differential Revision: https://reviews.llvm.org/D128699
Otherwise a header may be erroneously marked as having a header macro guard and won't get re-included.
Differential Revision: https://reviews.llvm.org/D128772
The copy statements inserted by the matrix-multiplication optimization
introduce new dependencies between the copy statements and other
statements. As a result, the DependenceInfo must be recomputed.
Not recomputing it caused IslAstInfo to deduce that some loops are
parallel, even though running them in parallel causes race conditions
when accessing the packed arrays.
As a result, matrix-matrix multiplication currently cannot be
parallelized.
Also see discussion at https://reviews.llvm.org/D125202
TestVSCode_breakpointEvents.py is failing on macOS Ventura because we
receive 3 breakpoint events instead of one. This is likely the result of
dyld moving into the shared cache.
Completes D127777 by adding LLVM-side tests for emitting the index and imports files from in-process ThinLTO.
Differential Revision: https://reviews.llvm.org/D128771
When SplitFunctions pass adds a trampoline code for exception landing
pads (limited to shared objects), it may increase the size of the hot
fragment making it larger than the whole function pre-split. When this
happens, the pass reverts the splitting action by restoring the original
block order and marking all blocks hot.
However, if createEHTrampolines() added new blocks to the CFG and
modified invoke instructions, simply restoring the original block layout
will not suffice as the new CFG has more blocks.
For proper backout of the split, modify the original layout by merging
in trampoline blocks immediately before their matching targets. As a
result, the number of blocks increases, but the number of instructions
and the function size remain the same as pre-split.
Add an assertion for the number of blocks when updating a function
layout.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128696