Generate INSTRINFO_OPERAND_TYPE table in X86GenInstrInfo.inc.
This diff adds support for instructions that were previously reported as having
memory access size 0. It replaces the heuristic of looking at instruction
register width to determine memory access width by instead checking the memory
operand type using tablegen-provided tables.
Reviewed By: skan
Differential Revision: https://reviews.llvm.org/D126116
Currently, we only remove dead blocks and non-feasible edges in
IPSCCP, but not in SCCP. I'm not aware of any strong reason for
that difference, so this patch updates SCCP to perform the CFG
cleanup as well.
Compile-time impact seems to be pretty minimal, in the 0.05%
geomean range on CTMark.
For the test case from https://reviews.llvm.org/D126962#3611579
the result after -sccp now looks like this:
define void @test(i1 %c) {
entry:
br i1 %c, label %unreachable, label %next
next:
unreachable
unreachable:
call void @bar()
unreachable
}
-jump-threading does nothing on this, but -simplifycfg will produce
the optimal result.
Differential Revision: https://reviews.llvm.org/D128796
Here is a character SELECT CASE construct that requires a temp to hold the
result of the TRIM intrinsic call:
```
module m
character(len=6) :: s
contains
subroutine sc
n = 0
if (lge(s,'00')) then
select case(trim(s))
case('11')
n = 1
case default
continue
case('22')
n = 2
case('33')
n = 3
case('44':'55','66':'77','88':)
n = 4
end select
end if
print*, n
end subroutine
end module m
```
This SELECT CASE construct is implemented as an IF/ELSE-IF/ELSE comparison
sequence. The temp must be retained until some comparison is successful.
At that point the temp may be freed. Generalize statement context processing
to allow multiple finalize calls to do this, such that the program always
executes exactly one freemem call.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: klausler, vdonaldson
Differential Revision: https://reviews.llvm.org/D128852
Co-authored-by: V Donaldson <vdonaldson@nvidia.com>
Follow up to D127894, new liveness update code needs to handle
the case where S_ANDN2 input must be extended through loops when
V_CNDMASK_B32 has been hoisted.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D128800
Linker optimization hints mark a sequence of instructions used for
synthesizing an address, like ADRP+ADD. If the referenced symbol ends up
close enough, it can be replaced by a faster sequence of instructions
like ADR+NOP.
This commit adds support for 2 of the 7 defined ARM64 optimization
hints:
- LOH_ARM64_ADRP_ADD, which transforms a pair of ADRP+ADD into ADR+NOP
if the referenced address is within +/- 1 MiB
- LOH_ARM64_ADRP_ADRP, which transforms two ADRP instructions into
ADR+NOP if they reference the same page
These two kinds already cover more than 50% of all LOHs in
chromium_framework.
Differential Review: https://reviews.llvm.org/D128093
`mov x0, 1024u` is permitted in binutils but rejected by the integrated
assembler. Support the case. This is especially important when using the C
pre-processor with the assembler: some shared code between C and assembler may
use lower-cased suffices.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D128871
With libgcc, we follow the behavior of GCC for backwards compatibility,
only using --as-needed in the non-C++ mode.
With libunwind, there are no backward compatibility requirements so we
can always use --as-needed on all supported platforms.
Differential Revision: https://reviews.llvm.org/D128841
enabled
The C++20 Coroutines couldn't be compiled to WebAssembly due to an
optimization named symmetric transfer requires the support for musttail
calls but WebAssembly doesn't support it yet.
This patch tries to fix the problem by adding a supportsTailCalls
method to TargetTransformImpl to skip the symmetric transfer when
tail-call feature is not supported.
Reviewed By: tlively
Differential Revision: https://reviews.llvm.org/D128794
In X86 we split greddy register allocation into 2 passes. The 1st pass
is to allocate tile register, and the 2nd pass is to allocate the rest
of virtual register. In most cases there is no tile register, so the 1st
pass is unnecessary. To improve the compiling time, we check if there is
any register need to be allocated by invoking callback
`ShouldAllocateClass`. If there is no register to be allocated, just
return false in the pass. This would improve the 1st greed RA pass for
normal cases.
Differential Revision: https://reviews.llvm.org/D128804
It's violate coding guideline in LLVM coding standard[1], because the the initialization order is nondeterministic and that might increase the launch time of programs.
However these variables are only used to cache query result, so we can move these variables into the function,, that which resolve both issue: 1. initialized in deterministic order, 2. Initialized that when the first time used.
[1] https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D128726
D126838 added support for the TYPE_MATCH compile-once run-everywhere
relocation to LLVM proper. On the clang side no changes are necessary,
other than the adjustment of a comment to mention this relocation as well.
This change takes care of that.
Differential Revision: https://reviews.llvm.org/D126839
When the iteration graph is cyclic (even after several attempts using less and less constraints), the current sparse compiler bails out, and no rewriting hapens. However, this revision adds some new logic where the sparse compiler tries to find a single input sparse tensor that breaks the cycle, and then adds a proper sparse conversion operation. This way, more incoming kernels can be handled!
Note, the resulting code is not optimal (although it keeps more or less proper "sparse" complexity), and more improvements should be added (especially when the kernel directly yields without computation, such as the transpose example). However, handling is better than not handling ;-)
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D128847
Among others, BPF currently supports the type-exists CO-RE relocation
(e.g., see D83878 & D83242). Its intention, as the name tries to convey,
is to be used for checking existence of a type in a target.
While that check is useful and has its place, we would also like to
be able to perform stricter type queries: instead of just checking mere
existence, we want to make sure that members match up in composite
types, that enum variants are present, etc. We refer to this as "type
match".
This change proposes the addition of a new relocation variant/value that
we intend to use for establishing this match relation.
Differential Revision: https://reviews.llvm.org/D126838
llvm::codeview::visitMemberRecordStream expects to receive an array ref that's FieldListRecord's Data not a CVType's data which has 4 more bytes preceeding. The first 2 bytes indicate the size of the FieldListRecord, and following 2 bytes is always 0x1203. Inside llvm::codeview::visitMemberRecordStream, it iterates to the data to check if first two bytes matching some type record kinds. If the size coincidentally matches one type kind, it will start parsing from there and causing crash.
There's no reason that arguments to e.g. lambda calls should be treated
differently than those to "real" functions.
Reviewed By: nridge
Differential Revision: https://reviews.llvm.org/D128329
Don't dump dot CFG graph for functions that should not be printed.
Reviewed By: rafauler, maksfb
Differential Revision: https://reviews.llvm.org/D128699
Otherwise a header may be erroneously marked as having a header macro guard and won't get re-included.
Differential Revision: https://reviews.llvm.org/D128772
The copy statements inserted by the matrix-multiplication optimization
introduce new dependencies between the copy statements and other
statements. As a result, the DependenceInfo must be recomputed.
Not recomputing them caused IslAstInfo to deduce that some loops are
parallel but cause race conditions when accessing the packed arrays.
As a result, matrix-matrix multiplication currently cannot be
parallelized.
Also see discussion at https://reviews.llvm.org/D125202
TestVSCode_breakpointEvents.py is failing on macOS Ventura because we
receive 3 breakpoint events instead of one. This is likely the result of
dyld moving into the shared cache.
Completes D127777 by adding llvm-side tests for emitting index and imports files from in-process ThinLTO
Differential Revision: https://reviews.llvm.org/D128771
When SplitFunctions pass adds a trampoline code for exception landing
pads (limited to shared objects), it may increase the size of the hot
fragment making it larger than the whole function pre-split. When this
happens, the pass reverts the splitting action by restoring the original
block order and marking all blocks hot.
However, if createEHTrampolines() added new blocks to the CFG and
modified invoke instructions, simply restoring the original block layout
will not suffice as the new CFG has more blocks.
For proper backout of the split, modify the original layout by merging
in trampoline blocks immediately before their matching targets. As a
result, the number of blocks increases, but the number of instructions
and the function size remains the same as pre-split.
Add an assertion for the number of blocks when updating a function
layout.
Reviewed By: rafauler
Differential Revision: https://reviews.llvm.org/D128696
This feature is tested by unit test since not many places in the codebase
use SubElementTypeInterface.
Differential Revision: https://reviews.llvm.org/D127539
Since only mutable types and attributes can go into infinite recursion
inside SubElementInterface::walkSubElement, and there are only a few of
them (mutable types and attributes), we introduce new traits for Type
and Attribute: TypeTrait::IsMutable and AttributeTrait::IsMutable,
respectively. They indicate whether a type or attribute is mutable.
Such traits are required if the ImplType defines a `mutate` function.
Then, inside SubElementInterface, we use a set to record visited mutable
types and attributes that have been visited before.
Differential Revision: https://reviews.llvm.org/D127537
- when printing a shared node for the second time, don't print its children
(This keeps output proportional to the size of the structure)
- when printing a shared node for the second time, print its type only, not rule
(for consistency with above: don't dump details of nodes twice)
- don't abbreviate shared nodes, to ensure we can prune the tree there
Differential Revision: https://reviews.llvm.org/D128805
The filter clause in the landingpad may not have a GlobalVariable operand.
It may instead have a ConstantArray of operands and each operand within this
ConstantArray should also be checked to see if it is a GlobalVariable.
This patch add the check for the ConstantArray as well as a debug message that
outputs the contents of MustKeepGlobalVariables.
Reviewed By: lei, amyk, scui
Differential Revision: https://reviews.llvm.org/D128287
With libgcc, we follow the behavior of GCC for backwards compatibility,
only using --as-needed in the non-C++ mode.
With libunwind, there are no backward compatibility requirements so we
can always use --as-needed on all supported platforms.
Differential Revision: https://reviews.llvm.org/D128841
An ADD_ADDR rebase opcode's argument can be encoded as an immediate if
the offset is less than 15 * word size. This change reduces the size of
chromium_framework by 100+ KiB.
Differential Revision: https://reviews.llvm.org/D128798
It helps to avoid copy-paste mistakes and makes custom code paths more
noticeable.
Not funnelling all diagnostic through `ODRDiagDeclError` because plan to
break down `err_module_odr_violation_mismatch_decl_diff` into smaller
pieces instead of making it bigger and bigger.
Differential Revision: https://reviews.llvm.org/D128487