If we're multiplying all elements of a vector by '0' or '1' then we can more efficiently perform this as a clearing mask (that is likely to further simplify to a shuffle blend).
This was noticed when reviewing D87502 but seems to help idiv/irem by constant cases even more as '0'/'1' values are often used for 'passthrough' cases.
Differential Revision: https://reviews.llvm.org/D88225
When looking for memory defs killed by memory terminators the code
currently incorrectly ignores the size argument of llvm.lifetime.end.
This patch updates the code to use isMemTerminator and updates
isMemTerminator to use isOverwrite() to make sure locations that are
outside the range marked as dead by llvm.lifetime.end are not
considered. Note that isOverwrite is only used for llvm.lifetime.end,
because free-like functions make the whole underlying object dead.
llvm.lifetime.end accepts a size parameters to limit the size of the
location marked as dead. Add a few tests with stores to locations after
the part that has been marked as dead.
The change implements evaluation of constant floating point expressions
under non-default rounding modes. The main objective was to support
evaluation of global variable initializers, where constant rounding mode
may be specified by `#pragma STDC FENV_ROUND`.
Differential Revision: https://reviews.llvm.org/D87822
This is a fix for PR47630. The regression is caused by the D78011. After
this change the code starts to call the `emitGlobalConstantLargeInt` even
for constants which requires eight bytes to store.
Differential revision: https://reviews.llvm.org/D88261
This is like FastMathFlagGuard in IR. Since we use SDAG instance to get
values, it's with SelectionDAG. By creating a FlagInserter in current
scope, all values created by getNode will get the flags if no Flags
argument provided.
In this patch, I applied it to floating point operations folding part in
DAG combiner, and removed Flags passing to getNode to show its effect.
Other places in DAG combiner and other helper methods similar to getNode
also need this. They can be done in follow-up patches.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D87361
Previously for nowait target, CG emitted a function call to `__tgt_target_nowait`, etc. However, in OpenMP RTL, these functions just directly call the no-nowait version, which means nowait is not working as expected.
OpenMP specification says a target is acutally a target task, which is an untied and detachable task. It is natural to go to the direction that generates a task for a nowait target. However, OpenMP task has a problem that it must be within to a parallel region; otherwise the task will be executed immediately. As a result, if we directly wrap to a regular task, the `target nowait` outside of a parallel region is still a synchronous version.
In D77609, I added the support for unshackled task in OpenMP RTL. Basically, unshackled task is a task that is not bound to any parallel region. So all nowait target will be tranformed into an unshackled task. In order to distinguish from regular task, a new flag bit is set for unshackled task. This flag will be used by RTL for later process.
Since all target tasks are allocated via `__kmpc_omp_target_task_alloc`, and in current `libomptarget`, `__kmpc_omp_target_task_alloc` just calls `__kmpc_omp_task_alloc`. Therefore, we can modify the flag in `__kmpc_omp_target_task_alloc` so that we don't need to modify the FE too much. If users choose to opt out the feature, they just need to use a RTL w/o support of unshackled threads.
As a result, in this patch, the `target nowait` region is simply wrapped into a regular task. Later once we have RTL support for unshackled tasks, the wrapped tasks can be executed by unshackled threads w/o changes in the FE.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D78075
Even if the type is s8/s16, assigning to gpr is preferable with constants because
worst case we can select via a constant pool load, and without cross-bank copies
to the FPR bank more patterns can be imported later.
This matches the legacy PM pass by having one constructor use command
line flags, and the other use parameters to the pass.
This fixes all tests under Transforms/LowerTypeTests using NPM.
Reviewed By: ychen, pcc
Differential Revision: https://reviews.llvm.org/D87845
This patch performs a minor cleanup of the class Slice:
static methods and constructors which take a pointer but assume that
it's not null now take the argument by reference.
NFC.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D88320
This is a patch to LangRef that clarifies the behavior of load/store/memset/memcpy/memmove when the pointers or sizes are not well-defined
as well.
MSan detects a case when e.g., only lower bits of address are garbage when `-msan-check-access-address` is enabled, and it does not directly conflict with this patch because a C program should not use a pointer with undef bits and reasonable optimizations do not convert a well-defined pointer into a pointer with undef bits.
This patch contains a definition of a well-defined value as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D87994
This commit fixes a regression (from LLVM 10 to LLVM 11 RC3) in the LLVM
C API.
Previously, commit 1ee6ec2bf removed the mask operand from the
ShuffleVector instruction, storing the mask data separately in the
instruction instead; this reduced the number of operands of
ShuffleVector from 3 to 2. AFAICT, this change unintentionally caused
a regression in the LLVM C API. Specifically, it is no longer possible
to get the mask of a ShuffleVector instruction through the C API. This
patch introduces new functions which together allow a C API user to get
the mask of a ShuffleVector instruction, restoring the functionality
which was previously available through LLVMGetOperand().
This patch also adds tests for this change to the llvm-c-test
executable, which involved adding support for InsertElement,
ExtractElement, and ShuffleVector itself (as well as constant vectors)
to echo.cpp. Previously, vector operations weren't tested at all in
echo.ll.
I also fixed some typos in comments and help-text nearby these changes,
which I happened to spot while developing this patch. Since the typo
fixes are technically unrelated other than being in the same files, I'm
happy to take them out if you'd rather they not be included in the patch.
Differential Revision: https://reviews.llvm.org/D88190
The intrinsics don't have any pointer arguments, so "argmemonly" makes
optimizations think they don't write to memory at all.
Differential Revision: https://reviews.llvm.org/D88186
This attribute allows declarations to be restricted to the framework
itself, enabling Swift to remove the declarations when importing
libraries. This is useful in the case that the functions can be
implemented in a more natural way for Swift.
This is based on the work of the original changes in
8afaf3aad2
Differential Revision: https://reviews.llvm.org/D87720
Reviewed By: Aaron Ballman
When a Mach-O corefile has an LC_NOTE "main bin spec" for a
standalone binary / firmware, with only a UUID and no load
address, try to locate the binary and dSYM by UUID and if
found, load it at offset 0 for the user.
Add a test case that tests a firmware/standalone corefile
with both the "kern ver str" and "main bin spec" LC_NOTEs.
<rdar://problem/68193804>
Differential Revision: https://reviews.llvm.org/D88282
This commit adds an interceptor for the pthread_detach function,
calling into ThreadRegistry::DetachThread, allowing for thread contexts
to be reused.
Without this change, programs may fail when they create more than 8K
threads.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47389
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D88184
This patch adds handling of rotation patterns with constant shift amounts - the next bit will be how we want to support non-uniform constant vectors.
Differential Revision: https://reviews.llvm.org/D87452
This is a similarity visualization tool that accepts a Module and
passes it to the IRSimilarityIdentifier. The resulting SimilarityGroups
are output in a JSON file.
Tests are found in test/tools/llvm-sim and check for the file not found,
a bad module, and that the JSON is created correctly.
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D86974
Pulled from D87452, this is a fixed version of the collectBitParts fshl/fshr handling which as @nikic noticed wasn't checking for different providers or had correct bit ordering (which was hid by only testing shift amounts of bitwidth/2).
Differential Revision: https://reviews.llvm.org/D88292
attachments. They would crash the backend, which expects all
DISubprograms that are not part of the type system to have a unit field.
Clang right before https://reviews.llvm.org/D79967 would generate this
kind of broken IR.
rdar://problem/69534688
Thanks to Fangrui for fixing an assembler test I had missed!
https://reviews.llvm.org/D88270
Every call to the protected SBAddress constructor and the SetAddress
method takes the address of a valid object which means we might as well
pass it as a const reference instead of a pointer and drop the null
check.
Differential revision: https://reviews.llvm.org/D88249
1c5a3c4d38 updated the variables inserted by Emscripten SjLj lowering to be
thread-local, depending on the CoalesceFeaturesAndStripAtomics pass to downgrade
them to normal globals if the target features did not support TLS. However, this
had the unintended side effect of preventing all non-TLS-supporting objects from
being linked into modules with shared memory, because stripping TLS marks an
object as thread-unsafe. This patch fixes the problem by only making the SjLj
lowering variables thread-local if the target machine supports TLS so that it
never introduces new usage of TLS that will be stripped. Since SjLj lowering
works on Modules instead of Functions, this required that the
WebAssemblyTargetMachine have its feature string updated to reflect the
coalesced features collected from all the functions so that a
WebAssemblySubtarget can be created without using any particular function.
Differential Revision: https://reviews.llvm.org/D88323
Issue Details:
In order to support coroutine splitting, any multi-value PHI node in a coroutine is split into multiple blocks with single-value PHI Nodes, which then allows a subsequent transform to generate `reload` instructions as required (i.e., to reload the value if required if the coroutine has been resumed). This causes issues with EH pads (`catchswitch` and `catchpad`) as all pads within a `catchswitch` must have the same unwind destination, but the coroutine splitting logic may modify them to each have a unique unwind destination if there is a PHI node in the unwind `cleanuppad` that is set from values in the `catchswitch` and `cleanuppad` blocks.
Fix Details:
During splitting, if such a PHI node is detected, then create a "dispatcher" `cleanuppad` as well as the blocks with single-value PHI Nodes: thus the "dispatcher" is the unwind destination and it will detect which predecessor called it and then branch to the appropriate single-value PHI node block, which will then branch back to the original `cleanuppad` block.
Reviewed By: GorNishanov, lxfind
Differential Revision: https://reviews.llvm.org/D88059
They operate like Defined symbols but with no associated InputSection.
Note that `ld64` seems to treat the weak definition flag like a no-op for
absolute symbols, so I have replicated that behavior.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D87909
Apparently this is used in real programs. I've handled this by reusing
the logic we already have for branch (function call) relocations.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D87852
Not 100% sure but it appears that bundles are almost identical to
dylibs, aside from the fact that they do not contain `LC_ID_DYLIB`. ld64's code
seems to treat bundles and dylibs identically in most places.
Supporting bundles allows us to run e.g. XCTests, as all test suites are
compiled into bundles which get dynamically loaded by the `xctest` test runner.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D87856