This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
This patch addresses @paulwalker-arm's comment on D117900 to
only update/write the by-ref operands iff the function returns
true. It also handles a few more cases where a series of added
offsets can be folded into the base pointer, rather than just looking
at a single offset.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D119728
These are failing on our silent bot:
https://lab.llvm.org/staging/#/builders/162/builds/358
$ <run cmd>
main
foo
bar
baz
SanitizerCoverage: ./sanitizer_coverage_trace_pc_guard-dso.cpp.tmp.2122517.sancov: 2 PCs written
SanitizerCoverage: ./sanitizer_coverage_trace_pc_guard-dso.cpp.tmp_2.so.2122517.sancov: 1 PCs written
SanitizerCoverage: ./sanitizer_coverage_trace_pc_guard-dso.cpp.tmp_1.so.2122517.sancov: 1 PCs written
$ <sancov cmd>
ERROR: Coverage points in binary and .sancov file do not match.
Also reproduces if you build for Thumb on v8 hardware.
Doesn't fail when built with Arm only code so I guess the Thumb mode bit
in the PCs might be the issue.
Improve the LinalgOp verification to ensure the iterator types is known. Previously, unknown iterator types have been ignored without warning, which can lead to confusing bugs.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D120649
Exit values of vector inductions are generated completely independent of
the induction recipes. Consider them for removal, if they are not used
in loop.
This fixes a crash exposed by 49b23f451c.
The Cortex-A55 scheduling model is used for -mcpu=generic, meaning it
can have a wider effect than just the A55. The changes to the A55
scheduling model seems to have caused performance regressions on
Cortex-A510 device which have latencies closer to the original and
different forwarding paths.
This partially reverts the changes from D117003, at least until we can
do something to improve Cortex-A510. According to my results, this
improves the A510 results without altering the A55 very much.
Summary: This is an initial implementation of lvm-objcopy for XCOFF32.
Currently only supports simple copying, op-passthrough to follow.
Reviewed By: jhenderson, shchenz
Differential Revision: https://reviews.llvm.org/D97656
Only call it for intrinsic min/max. The moved implementation is
unchanged apart from the one-use check: It is now hardcoded to
one-use, without the two-use special case for SPF.
This patch removes the builders for `omp.wsloop` operation that aren't
specifically needed anywhere. We can add them later if the need arises.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D120533
When we are building modules, there are cases where the only way to determine
validity of access is by comparing primary interface names. This is because we need
to be able to associate a primary interface name with an imported partition, but
before the primary interface module is complete - so that textual comparison is
necessary.
If this turns out to be needed many times, we could cache the result, but it seems
unlikely to be significant (at this time); cases with very many imported partitions
would seem unusual.
Differential Revision: https://reviews.llvm.org/D118598
SetValueFromCString and SetData methods return false if register can't
be written but they don't set a error message. It sometimes confuses
callers of these methods because they try to get the error message in case of
failure but Status::AsCString returns nullptr.
For example, lldb-vscode crashes due to this bug if some register can't
be written. It invokes SBError::GetCString in case of error and doesn't
check whether the result is nullptr (see request_setVariable implementation in
lldb-vscode.cpp for more info).
Reviewed By: labath, clayborg
Differential Revision: https://reviews.llvm.org/D120319
Not only some AMO instructions but also other instructions need to
process (${gpr}) or 0(${gpr}), where the 0 is be silently ignored.
This patch does some changes for general usage.
Signed-off-by: Eric Tang <eric.tang@starfivetech.com>
Differential Revision: https://reviews.llvm.org/D120017
Construct LLVM Support module about CSKY target parser and attribute parser.
It refers CSKY ABIv2 and implementation of GNU binutils and GCC.
https://github.com/c-sky/csky-doc/blob/master/C-SKY_V2_CPU_Applications_Binary_Interface_Standards_Manual.pdf
Now we only support CSKY 800 series cpus and newer cpus in the future undering CSKYv2 ABI specification.
There are 11 archs including ck801, ck802, ck803, ck803s, ck804, ck805, ck807, ck810, ck810v, ck860, ck860v.
Every arch has base extensions, the cpus of that arch family have more extended extensions than base extensions.
We need specify extended extensions for every cpu. Every extension has its enum value, name and related llvm feature string with +/-.
Every enum value represents a bit of uint64_t integer.
Differential Revision: https://reviews.llvm.org/D119917
In this patch, we add a more narrower exclusion for
zeroext (srl x) -> srli (slli x), so that it provides an opportunity
for the selection of sraiw.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D120467
Let `archive-aix-libatomic` accept additional argument to customize name of output atomic library.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D120534
By failing to lex the token we end up both parsing it as a binary
operator ourselves and parsing it as a unary operator when calling
parseExpression on the RHS. For plus this is harmless but for minus this
parses "foo - 4" as "foo - -4", effectively treating a top-level minus
as a plus.
Fixes https://github.com/llvm/llvm-project/issues/54105
Reviewed By: asb, MaskRay
Differential Revision: https://reviews.llvm.org/D120635
Windows UCRT has got a bug in older versions (present in CI), where
it successfully does set a locale named
`for_sure_this_is_not_an_existing_locale`. By adjusting the tested
locale name to `forsurethisisnotanexistinglocale`, that test works
as expected, failing to set the locale.
The bug is reported upstream at
https://developercommunity.visualstudio.com/t/setlocale-succeeds-for-bogus-locale-names-in-older/1652241,
but as it already is working correctly in newer versions, no action
was prompted there.
We could of course add a bug detection in features.py like other
existing `broken-*` features, but that would seem kinda
pointless as it would be doing exactly what this test does.
Instead just adjust the tested dummy locale name.
This bit was approved to be committed on its own, in
https://reviews.llvm.org/D120546 (which is left open to follow up on
review of the rest of that patch).
The zh_CN.UTF-8 locale on Glibc has got `n_sign_posn == 4` (which means
having the negative sign just after the currency symbol), but has
`int_n_sign_posn == 1` (which means before the string).
On Windows, there's no separate `int_n_sign_posn` field, so the same
`n_sign_posn` (which is 4 there too) is used for international currency
formatting too. This makes the ordering for the international case on
Windows be the same as for the national one right above it.
On Apple platforms, the fr_FR.UTF-8 locale has got `n_sign_posn == 2`
but `p_sign_posn == 1`, giving a different order for the French locale
for the negative format.
On Apple platforms for the zh_CN.UTF-8 locale, both `n_sign_posn` and
`int_n_sign_posn` are 4, but `p_sign_posn` and `int_p_sign_posn` are 1.
Differential Revision: https://reviews.llvm.org/D120550
The SPARC and MIPS branching operations have a branch delay slot, 4 more bytes occupied.
Depends on D120381
Reviewed By: ro, MaskRay
Differential Revision: https://reviews.llvm.org/D120451
SLP makes very heavy use of aliasing queries to construct pointer dependencies for scheduling purposes. AA internally usings pointerMayBeCaptured to prove some noalias results. In a local profile, we were spending about 4% of total O2 time in capture tracking. By using BatchAA interface - which caches capture results - this drops to 2%.
Note that there is no invalidation of BatchAA here. This assumes that no transformation done by SLP invalidates alias or capture results. This is the same assumption made by the existing AliasCache, so this is not a new assumption in the code.
This patch adds a new VPScalarIVStepsRecipe to handle building scalar
steps.
In the first patch, it only handles the case where there is no vector
induction variable needed.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D115953