SimplifyCFG does some common code hoisting, which is limited to hoisting a
sequence of identical instruction in identical order and stops at the first
non-identical instruction.
This patch allows hoisting instruction pairs over same-length sequences of
non-matching instructions. The linear asymptotic complexity of the algorithm
stays the same, there's an extra parameter `simplifycfg-hoist-common-skip-limit`
serving to limit compilation time and/or the size of the hoisted live ranges.
The patch improves SPECv6/525.x264_r by about 10%.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D129370
This patch adds constant folder for ExpM1Op which only supports single and double precision floating-point.
Differential Revision: https://reviews.llvm.org/D130567
Linux Standard Base Core Specification says that CIE/FDE is padded to an
addressing unit size boundary, but in practice GNU assembler/LLVM integrated
assembler pad FDE/CIE to 4 and the last FDE to 8 on 64-bit systems.
In addition, GNU ld doesn't pad to 8, so let's drop excess padding, too.
If the assembler provides aligned pieces, the output will be aligned.
Noticed .eh_frame size reduction for 3 executables: 0.3% (chrome), 4.7% (clang),
7.6% (an internal program).
After fffabd5348, the v2 and v3
versions produce poison instead of undef. Also adjust the v1
version, as well as the test expectations, to make the example
pass again.
getModRefInfo() queries currently track whether the result is a
MustAlias on a best-effort basis. The only user of this functionality
is the optimized memory access type in MemorySSA -- which in turn
has no users. Given that this functionality has not found a user
since it was introduced five years ago (in D38862), I think we
should drop it again.
The context is that I'm working to separate FunctionModRefBehavior
to track mod/ref for different location kinds (like argmem or
inaccessiblemem) separately, and the fact that ModRefInfo also has
an unrelated Must flag makes this quite awkward, especially as this
means that NoModRef is not a zero value. If we want to retain the
functionality, I would probably split getModRefInfo() results into
a part that just contains the ModRef information, and a separate
part containing a (best-effort) AliasResult.
Differential Revision: https://reviews.llvm.org/D130713
This is successor for D125291. This revision would try to use
@llvm.threadlocal.address in clang to access TLS variable. The reason
why the OpenMP tests contains a lot of change is that they uses
utils/update_cc_test_checks.py to update their tests.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D129833
This belongs to a series of patches which try to solve the thread
identification problem in coroutines. See
https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015
for a full background.
The problem consists of two concrete problems: TLS variable and readnone
functions. This patch tries to convert the TLS problem to readnone
problem by converting the access of TLS variable to an intrinsic which
is marked as readnone.
The readnone problem would be addressed in following patches.
Reviewed By: nikic, jyknight, nhaehnle, ychen
Differential Revision: https://reviews.llvm.org/D125291
Previously we only supporting using the system pointer size (aka the
`absptr` encoding) because `llvm-mc`'s CFI directives always generate EH
frames with that encoding. But libffi uses 4-byte-encoded, hand-rolled
EH frames, so this patch adds support for it.
Fixes#56576.
Reviewed By: #lld-macho, oontvoo
Differential Revision: https://reviews.llvm.org/D130804
This patch removes the `type` field from `Attribute` along with the
`Attribute::getType` accessor.
Going forward, this means that attributes in MLIR will no longer have
types as a first-class concept. This patch lays the groundwork to
incrementally remove or refactor code that relies on generic attributes
being typed. The immediate impact will be on attributes that rely on
`Attribute` containing a type, such as `IntegerAttr`,
`DenseElementsAttr`, and `ml_program::ExternAttr`, which will now need
to define a type parameter on their storage classes. This will save
memory as all other attribute kinds will no longer contain a type.
Moreover, it will not be possible to generically query the type of an
attribute directly. This patch provides an attribute interface
`TypedAttr` that implements only one method, `getType`, which can be
used to generically query the types of attributes that implement the
interface. This interface can be used to retain the concept of a "typed
attribute". The ODS-generated accessor for a `type` parameter
automatically implements this method.
Next steps will be to refactor the assembly formats of certain operations
that rely on `parseAttribute(type)` and `printAttributeWithoutType` to
remove special handling of type elision until `type` can be removed from
the dialect parsing hook entirely; and incrementally remove uses of
`TypedAttr`.
Reviewed By: lattner, rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D130092
* Inline getReloc
* Fold the UINT32_MAX length check into the section size check.
This transformation is valid because we don't support .eh_frame input sections
larger than 32-bit (unrealistic even for large code models).
This simplifies code, removes a read32 (for id==0 check), and makes it feasible
to combine some operations in EhInputSection::split and EhFrameSection::addRecords.
Mostly NFC, but fixes "Relocation not in any piece" assertion failure in an
erroneous case when a relocation offset precedes all CIE/FDE pices.
Expand load address pseudo-instructions earlier (pre-ra) to allow follow-up
patches to fold the addi of PseudoLLA instructions into the immediate
operand of load/store instructions.
Differential Revision: https://reviews.llvm.org/D123264
If we change
CieRecord *&rec = cieMap[{cie.data(), personality}];
to
CieRecord *&rec = cieMap[{cie.data(), nullptr}];
The new test can catch the failure.
When dead-code analysis is run at the scope of a function, call ops to
other functions at the same level were being marked as unreachable,
since the analysis optimistically assumes the call op to have no known
predecessors and that all predecessors are known, but the callee would
never get visited.
This patch fixes the bug by checking if a referenced function is above
the top-level op of the analysis, and is thus considered an external
callable.
Fixes#56830
Reviewed By: zero9178
Differential Revision: https://reviews.llvm.org/D130829
inputSections temporarily contains EhInputSection objects mainly for
combineEhSections. Place EhInputSection objects into a new vector
ehInputSections instead of inputSections.
issue #56775
I rearranged the Thumb2 codegen test to avoid simplifying the chain
of rounding instructions. I'm assuming the intent of the test is
to verify lowering of each of those intrinsics.
Without this patch, clang-repl incorrectly pass some tests when there's
error occured.
Signed-off-by: Jun Zhang <jun@junz.org>
Differential Revision: https://reviews.llvm.org/D130422
Only PACKSS/PACKUS faux shuffles make use of the demanded elts at the moment, but this at least improves the handling of a couple of truncation patterns.
CalculateSmallVectorDefaultInlinedElements<..>::value is already used as default value for second template parameter in SmallVector class declaration.
There is no need to pass it explicitly in to_vector.
Extracted from: https://reviews.llvm.org/D129781
Differential Revision: https://reviews.llvm.org/D130774