Stop using the _term variants of the mov to save the initial exec
value before the waterfall loop. This cannot be glued to the bottom of
the block because we may need to spill the result register. Just use a
regular mov, like the loops produced on the DAG path. Fixes some
verification errors with regalloc fast.
This was inserting the new G_CONSTANT after the use, and the later
block scan would run off the end. Fix calling SkipPHIsAndLabels for no
apparent reason.
Fix https://github.com/llvm/llvm-project/issues/53073
In case of a relocation error, GNU ld's link map includes
the archive member extraction information but not output sections.
Our -Map and --why-extract= are currently no-op in case of an error.
This change makes the two options work.
Reviewed By: ikudrin, peter.smith
Differential Revision: https://reviews.llvm.org/D116838
to prepare for D116838, otherwise for linkerscript/discard-section-err.s,
there will be a null pointer dereference in `part.relrDyn->getParent()->size`
in `finalizeSynthetic(part.relrDyn.get())`.
I'm attempting to debug an issue that I can only get to happen on
godbolt, where the cpu-dispatch resolver for an out of line member
function is generated with the wrong name, causing a link failure.
This makes the mapping between iOS & tvOS/watchOS versions more accurate. For example, iOS 9.3 now gets correctly mapped into tvOS 9.2 and not tvOS 9.3.
Before this change, the incorrect mapping could cause excessive or missing warnings for code that specifies availability for iOS, but not for tvOS/watchOS.
rdar://81491680
Differential Revision: https://reviews.llvm.org/D116822
Use it to remove explicit string compares from unrolling preferences.
I'm of two minds on this. Ideally, we would define things in terms
of architectural or microarchitectural features, but it's hard to
do that with things like unrolling preferences without just ending up
with FeatureSiFive7UnrollingPreferences.
Having a proc enum is consistent with ARM and AArch64. X86 only has
a few and is trying to move away from it.
Reviewed By: asb, mcberg2021
Differential Revision: https://reviews.llvm.org/D117060
-Wdeclaration-after-statement currently only outputs an diagnostic if the user is compiling in C versions older than C99, even if the warning was explicitly requested by the user.
This patch makes the warning also available in later C versions. If the C version is C99 or later it is simply a normal warning that is disabled by default (as it is valid C99) and has to be enabled by users. In older versions it remains an extension warning, and therefore affected by -pedantic.
The above behaviour also matches GCCs behaviour.
Fixes https://bugs.llvm.org/show_bug.cgi?id=51931
Differential Revision: https://reviews.llvm.org/D114787
Enable constant folding of ops within the math dialect, and introduce constant folders for ceil and log2
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D117085
Fma combine assumes that MRI.getVRegDef(Reg)->getOperand(0).getReg() = Reg
which is not true when Reg is defined by instruction with multiple defs
e.g. G_UNMERGE_VALUES.
Fix is to keep register and the instruction that defines register in
DefinitionAndSourceRegister and use when needed.
Differential Revision: https://reviews.llvm.org/D117032
On Apple platforms, arc4random is faster than /dev/urandom, and it is
the recommended user-space RNG according to Apple's own OS folks.
This commit adds an ABI switch to guard ABI-break-protections in
std::random_device, and starts using arc4random instead of /dev/urandom
to implement std::random_device on Apple platforms.
Note that previously, `std::random_device` would allow passing a custom
token to its constructor, and that token would be interpreted as the name
of a file to read entropy from. This was implementation-defined and
undocumented. After this change, Apple platforms will be using arc4random()
instead, and any custom token passed to the constructor will be ignored.
This behavioral change will also impact other platforms that use the
arc4random() implementation, such as OpenBSD. This should be fine since
that is effectively a relaxation of the constructor's requirements.
rdar://86638350
Differential Revision: https://reviews.llvm.org/D116045
Depends on D112160
This adds the new options `--call-graph-profile-sort` (default),
`--no-call-graph-profile-sort` and `--print-symbol-order=`. If call graph
profile sorting is enabled, reads `__LLVM,__cg_profile` sections from object
files and uses the resulting graph to put callees and callers close to each
other in the final binary via the C3 clustering heuristic.
Differential Revision: https://reviews.llvm.org/D112164
Set the current thread ID to the thread where an event happened.
As a result, e.g. when a signal is delivered to a thread other than
the first one, the respective T packet refers to the signaled thread
rather than the first thread (with no stop reason). While this doesn't
strictly make a difference to the LLDB client, it is the expected
behavior.
Differential Revision: https://reviews.llvm.org/D117103
std::remove from algorithm is a lot more common than the overload from
the cstdio (which deletes files). This patch introduces a set of symbols
for which we should prefer the overloaded versions.
Differential Revision: https://reviews.llvm.org/D114724
This ports the `.cg_profile` assembly directive and call graph profile section
generation to MachO from COFF/ELF. Due to MachO section naming rules, the
section is called `__LLVM,__cg_profile` rather than `.llvm.call-graph-profile`
as in COFF/ELF. Support for llvm-readobj is included to facilitate testing.
Corresponding LLD change is D112164
Differential Revision: https://reviews.llvm.org/D112160
When `-ftrivial-auto-var-init=` is enabled, allocas unconditionally
receive auto-initialization since [1].
In certain cases, it turns out, this is causing problems. For example,
when using alloca to add a random stack offset, as the Linux kernel does
on syscall entry [2]. In this case, none of the alloca'd stack memory is
ever used, and initializing it should be controllable; furthermore, it
is not always possible to safely call memset (see [2]).
Introduce `__builtin_alloca_uninitialized()` (and
`__builtin_alloca_with_align_uninitialized`), which never performs
initialization when `-ftrivial-auto-var-init=` is enabled.
[1] https://reviews.llvm.org/D60548
[2] https://lkml.kernel.org/r/YbHTKUjEejZCLyhX@elver.google.com
Reviewed By: glider
Differential Revision: https://reviews.llvm.org/D115440
This patch adds a new BranchOnCount VPInstruction opcode with 2
operands. It first compares its 2 operands (increment of canonical
induction and vector trip count), followed by a branch to either the
exit block or back to the vector header.
It must be the last recipe in the exit block of the topmost vector loop
region.
This extracts parts from D113224 and was discussed in D113223.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D116479
This is required to query the legality more precisely in the LoopVectorizer.
This adds another TTI function named 'forceScalarizeMaskedGather/Scatter'
function to work around the hack introduced for MVE, where
isLegalMaskedGather/Scatter would return an answer by second-guessing
where the function was called from, based on the Type passed in (vector
vs scalar). The new interface makes this explicit. It is also used by
X86 to check for vector widths where gather/scatters aren't profitable
(or don't exist) for certain subtargets.
Differential Revision: https://reviews.llvm.org/D115329
All kernels can be called from the host as per the SPIR_KERNEL calling
convention. As such, all kernels should have external linkage, but
block enqueue kernels were created with internal linkage.
Reported-by: Pedro Olsen Ferreira
Differential Revision: https://reviews.llvm.org/D115523
This feature was previously controlled by a TargetOptions flag, and I
figured that codegen::InitTargetOptionsFromCodeGenFlags would default it
to "on" for all frontends. Enabling by default was discussed here:
https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html
and originally supposed to happen in 3c04507088, but it didn't actually
take effect, as it turns out frontends initialize TargetOptions themselves.
This patch moves the flag from a TargetOptions flag to a global flag to
CodeGen, where it isn't immediately affected by the frontend being used.
Hopefully this will actually cause instr-ref to be on by default on x86_64
now!
This patch is easily reverted, and chances of turbulence are moderately
high. If you need to revert, please consider instead commenting out the
'return true' part of llvm::debuginfoShouldUseDebugInstrRef to turn the
feature off, and dropping me an email.
Differential Revision: https://reviews.llvm.org/D116821
The test checks that an array of Obj-C literal integers (e.g. `@1`) gets a UBSan
warning when cast to an NSString, however the actual concrete Obj-C class of
literal integers doesn't always need to be __NSCFNumber. Let's relax the test
expectations to allow NSConstantIntegerNumber. Which exact subclass of NSNumber
is used is not actually important for the test (the test is just checking that
the invalid cast warning is thrown).
llvm.vp.merge interprets the %evl operand differently than the other vp
intrinsics: all lanes at positions greater or equal than the %evl
operand are passed through from the second vector input. Otherwise it
behaves like llvm.vp.select.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D116725
Due to a missing cast the << 60 always resulted in zero leaving
the top nibble empty. So we weren't actually testing that lldb
ignores those bits in addition to the tag bits.
Correct that and also set the top nibbles to ascending values
so that we can catch if lldb only removes one of the tag bits
and top nibble, but not both.
In future the tag manager will likely only remove the tag bits
and leave non-address bits to the ABI plugin but for now make
sure we're testing what we claim to implement.