This patch adds OpenMPIRBuilder support for the simdlen clause of the
simd directive. Clang uses the OpenMPIRBuilder's simdlen support when
the OpenMPIRBuilder is enabled. The OpenMPIRBuilder lowers simdlen by
generating the loop.vectorize.width loop metadata.
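As a rough sketch (illustrative, not the actual OpenMPIRBuilder code; `LatchBr` and `Simdlen` are assumed inputs), the lowering amounts to attaching !llvm.loop metadata carrying the vectorize-width hint to the loop's latch branch:
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Metadata.h"
#include <cstdint>

void applySimdlen(llvm::BranchInst *LatchBr, uint64_t Simdlen) {
  llvm::LLVMContext &Ctx = LatchBr->getContext();
  llvm::Metadata *WidthOps[] = {
      llvm::MDString::get(Ctx, "llvm.loop.vectorize.width"),
      llvm::ConstantAsMetadata::get(llvm::ConstantInt::get(
          llvm::Type::getInt32Ty(Ctx), Simdlen))};
  llvm::MDNode *Width = llvm::MDNode::get(Ctx, WidthOps);
  // Loop metadata is a distinct node whose first operand refers to itself.
  llvm::MDNode *LoopID = llvm::MDNode::getDistinct(Ctx, {nullptr, Width});
  LoopID->replaceOperandWith(0, LoopID);
  LatchBr->setMetadata(llvm::LLVMContext::MD_loop, LoopID);
}
```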
Reviewed By: jdoerfert, Meinersbur
Differential Revision: https://reviews.llvm.org/D129149
The Tosa-to-Linalg conversion crashes when the input tensor has a float type other than fp32,
because tosa.clamp and tosa.reluN carry fp32 min/max attributes that were converted to arith.constant with the attribute's type.
This commit fixes the crash by taking the float constant type from the input tensor instead.
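A hedged sketch of the idea (an illustrative helper, not the exact TosaToLinalg code; the arith header path and accessors are assumptions): materialize the bound constant with the input tensor's element type rather than the attribute's f32 type.
```
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/PatternMatch.h"

static mlir::Value materializeClampBound(mlir::PatternRewriter &rewriter,
                                         mlir::Operation *op,
                                         mlir::FloatAttr boundAttr) {
  auto elementTy = op->getOperand(0)
                       .getType()
                       .cast<mlir::ShapedType>()
                       .getElementType()
                       .cast<mlir::FloatType>();
  // Convert the fp32 attribute value into the input's float semantics.
  llvm::APFloat value = boundAttr.getValue();
  bool losesInfo = false;
  value.convert(elementTy.getFloatSemantics(),
                llvm::APFloat::rmNearestTiesToEven, &losesInfo);
  return rewriter.create<mlir::arith::ConstantOp>(
      op->getLoc(), mlir::FloatAttr::get(elementTy, value));
}
```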
Reviewed By: eric-k256
Differential Revision: https://reviews.llvm.org/D128630
This restores the old behavior from before D129402 when
enableUnalignedScalarMem is false. This fixes a regression spotted
by @asb.
To fix this correctly, we need to consider the alignment of the load
we'd be replacing, but that's not possible with the current interface.
As suggested in the post-commit feedback for D128123,
we can ease the mask constraint to ignore the MSB
(and make the code easier to read by adjusting the check).
https://alive2.llvm.org/ce/z/bbvqWv
There is a post-processing step in ext-tsp block reordering that merges some blocks
into chains. This makes it possible to maintain the original block order in the absence of
profile data and can be beneficial for code size (when fallthroughs are merged).
In the earlier version we could merge hot and cold (zero execution count)
chains, which were later split by SplitFunction.cpp (when split-all-cold=1). The
diff eliminates the redundant merging.
It is unlikely the change will affect the performance of a binary in a
measurable way, as it mostly operates on cold basic blocks. However, after
the diff the impact of split-all-cold is almost negligible and we can avoid the
extra function splitting.
Measured on the clang binary (negative is good, positive is a regression):
**clang12**
benchmark1: `0.0253`
benchmark2: `-0.1843`
benchmark3: `0.3234`
benchmark4: `0.0333`
**clang10**
benchmark1: `-0.2517`
benchmark2: `-0.3703`
benchmark3: `-0.1186`
benchmark4: `-0.3822`
**clang7**
benchmark1: `0.2526`
benchmark2: `0.0500`
benchmark3: `0.3024`
benchmark4: `-0.0489`
**Overall**: `-0.0671 ± 0.1172` (insignificant)
Reviewed By: maksfb
Differential Revision: https://reviews.llvm.org/D129397
Summary:
Introduce NeverAlign fragment type.
The intended usage of this fragment is to insert it before a pair of
macro-op fusion eligible instructions. The NeverAlign fragment ensures that
the next fragment (the first instruction in the pair) does not end at a
given alignment boundary, by emitting a minimal-size nop if necessary.
In effect, it ensures that a pair of macro-fusible instructions is not
split by a given alignment boundary, which is a precondition for
macro-op fusion in modern Intel Cores (64B = cache line size, see Intel
Architecture Optimization Reference Manual, 2.3.2.1 Legacy Decode
Pipeline: Macro-Fusion).
This patch introduces functionality used by BOLT when emitting code with
MacroFusion alignment already in place.
The use case is different from BoundaryAlign and instruction bundling:
- BoundaryAlign can be extended to perform the desired alignment for the
first instruction in the macro-op fusion pair (D101817). However, this
approach has higher overhead because it relies on relaxation, which
BoundaryAlign requires in the general case - see
https://reviews.llvm.org/D97982#2710638.
- Instruction bundling: the intent of the NeverAlign fragment is to prevent
the first instruction in a pair from ending at a given alignment boundary, by
inserting at most one minimum-size nop. It's OK if either instruction
crosses the cache line. Padding both instructions using bundles to not
cross the alignment boundary would result in excessive padding. There's
no straightforward way to request instruction bundling to avoid a given
end alignment for the first instruction in the bundle.
LLVM: https://reviews.llvm.org/D97982
Manual rebase conflict history:
https://phabricator.intern.facebook.com/D30142613
Test Plan: sandcastle
Reviewers: #llvm-bolt
Subscribers: phabricatorlinter
Differential Revision: https://phabricator.intern.facebook.com/D31361547
Summary:
This is an essential piece of infrastructure for us to be
continuously testing debug info with BOLT. We can't make these changes only
in a test repo because we need to change the debuginfo tests themselves to
call BOLT; hence, this diff needs to sit in our open-source repo. But when
upstreaming to LLVM, this should be kept BOLT-only and outside of LLVM. When
upstreaming, we need to git diff and check all folders that are modified by
our commits and discard this one (leaving it as an internal diff).
To test BOLT in debuginfo tests, configure it with -DLLVM_TEST_BOLT=ON.
Then run check-lldb and check-debuginfo.
Manual rebase conflict history:
https://phabricator.intern.facebook.com/D29205224
https://phabricator.intern.facebook.com/D29564078
https://phabricator.intern.facebook.com/D33289118
https://phabricator.intern.facebook.com/D34957174
Test Plan:
tested locally
Configured with:
-DLLVM_ENABLE_PROJECTS="clang;lld;lldb;compiler-rt;bolt;debuginfo-tests"
-DLLVM_TEST_BOLT=ON
Ran test suite with:
ninja check-debuginfo
ninja check-lldb
Reviewers: #llvm-bolt
Subscribers: ayermolo, phabricatorlinter
Differential Revision: https://phabricator.intern.facebook.com/D35317341
Tasks: T92898286
Create an interface for writing SARIF documents from within clang:
The primary intent of this change is to introduce the interface
clang::SarifDocumentWriter, which allows incrementally adding
diagnostic data to a JSON backed document. The proposed interface is
not yet connected to the compiler internals, which will be covered in
future work. As such this change will not change the input/output
interface of clang.
This change also introduces the clang::FullSourceRange type, which is
modeled after clang::SourceRange + clang::FullSourceLoc; it is useful
for packaging a pair of clang::SourceLocation objects with their
corresponding SourceManagers.
Previous discussions:
RFC for this change: https://lists.llvm.org/pipermail/cfe-dev/2021-March/067907.html
https://lists.llvm.org/pipermail/cfe-dev/2021-July/068480.html
SARIF Standard (2.1.0):
https://docs.oasis-open.org/sarif/sarif/v2.1.0/os/sarif-v2.1.0-os.html
Differential Revision: https://reviews.llvm.org/D109701
Perform a major refactoring of vCont-threads tests in order to attempt
to improve their stability and performance.
Split test_vCont_run_subset_of_threads() into smaller test cases,
and split the whole suite into two files: one for the signal-related tests
and one for the running-subset-of tests.
Eliminate output_match checks entirely, as they are fragile to
fragmentation of output. Instead, for the initial thread-list capture,
raise an explicit SIGINT from inside the test program; for the remaining
output, let the test program run until exit and check all the captured
output afterwards.
For the resume tests, capture LLDB's thread view before and after
starting new threads in order to determine the IDs corresponding
to subthreads, rather than relying on program output for that.
Add a mutex for output to guarantee serialization. A barrier is used
to guarantee that all threads start before SIGINT, and an atomic bool
is used to delay prints from happening until after SIGINT.
Call std::this_thread::yield() to reduce the risk of one of the threads
not being run.
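A minimal sketch of that synchronization pattern (illustrative only, not the actual lldb test source; a spin-wait counter stands in for the barrier):
```
#include <atomic>
#include <csignal>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

static std::atomic<int> started{0};
static std::atomic<bool> print_allowed{false};
static std::mutex output_mutex;

static void worker(int id) {
  ++started;                                   // report arrival (barrier substitute)
  while (!print_allowed)                       // delay prints until after SIGINT
    std::this_thread::yield();
  std::lock_guard<std::mutex> lock(output_mutex);  // serialize output
  std::printf("thread %d running\n", id);
}

int main() {
  constexpr int kThreads = 4;
  std::vector<std::thread> threads;
  for (int i = 0; i < kThreads; ++i)
    threads.emplace_back(worker, i);
  while (started < kThreads)                   // wait for all threads to start
    std::this_thread::yield();
  std::raise(SIGINT);                          // lets the test capture the thread list
  print_allowed = true;                        // release the delayed prints
  for (auto &t : threads)
    t.join();
  return 0;
}
```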
This fixes the test hangs on FreeBSD. Hopefully, it will also fix all
the flakiness on buildbots.
Thanks to Pavel Labath for figuring out why the original version did not
work on Debian.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.llvm.org/D129012
* GNU objcopy supports --set-section-flags src=... --rename-section src=tst and --set-section-flags runs first.
* GNU objcopy processes --update-section before --rename-section.
To match the two behaviors, postpone --rename-section and allow its use together
with --set-section-flags.
As a side effect, --rename-section=.foo1=.foo2 --add-section=.foo1=/dev/null
leads to .foo2 while GNU objcopy surprisingly produces .foo1 (so
--set-section-flags --add-section --rename-section do not form a total order).
I think the deviation is fine as a total order makes more sense.
Rename set-section-flags-and-rename.test to
set-section-attr-and-rename.test and additionally test --set-section-alignment.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D129336
Prevent creating multiple outputs for the same Value when distributing
operations out of WarpExecuteOnLane0Op. This avoids a combinatorial
explosion of outputs.
Differential Revision: https://reviews.llvm.org/D129465
This uses an int64_t-based fastpath for the common case and falls back to
SlowMPInt to handle the rare cases where larger numbers occur.
It uses `__builtin_*` for performance through the support in LLVM MathExtras.
Using this in the Presburger library results in a minor performance
*improvement* over any commit hash before the sequence of patches
starting at d5e31cf38a.
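A minimal sketch of the fastpath idea (not the actual MPInt implementation): llvm::AddOverflow in MathExtras wraps `__builtin_add_overflow`, so the common case stays a single checked 64-bit add and only an overflowing result sends the caller to the slow path.
```
// Hedged sketch, not the real MPInt code: report whether the 64-bit fast
// path suffices; on overflow the caller would switch to SlowMPInt.
#include "llvm/Support/MathExtras.h"
#include <cstdint>

bool tryAddFast(int64_t Lhs, int64_t Rhs, int64_t &Result) {
  // llvm::AddOverflow returns a truthy value iff the addition overflowed.
  return !llvm::AddOverflow(Lhs, Rhs, Result);
}
```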
This was previously reverted in 1e10d35ea9 due
to a build failure; relanding now with an attempted fix.
Reviewed By: Groverkss, ftynse
Differential Revision: https://reviews.llvm.org/D128811
As pointed out in Issue #56432, the current reference models may not be
quite friendly to open source projects. Their purpose is only
illustrative - the expectation is that projects would train their own.
To avoid unintentionally pulling such a model, this change makes the URL
cmake setting require an explicit user setting.
Differential Revision: https://reviews.llvm.org/D129342
If the combined shuffle mask requires zero elements, we don't currently have much chance of matching them against the expected source vector. This patch uses the SelectionDAG::MaskedVectorIsZero wrapper to attempt to determine whether the expected element we want to use is already known to be zero.
I've also tightened up the ExpectedMask assertion to always be in range - we never give it a target shuffle mask that contains sentinels at all - allowing us to remove some of the confusing bounds checks.
This attempts to address some of the regressions uncovered by D129150, where we more aggressively fold shuffles as AND / 'clear' masks, which results in more combined shuffles using SM_SentinelZero.
Differential Revision: https://reviews.llvm.org/D129207
The requirements for "thread until <line number>" are:
a) If any code contributed by <line number>, or by the nearest subsequent line, is executed before leaving the function, stop.
b) If you end up leaving the function without triggering (a), then stop.
In case of (a), since <line number> may have multiple entries in the line table, the compiler might have scheduled/moved the relevant code around, and lldb does not know the control flow, set breakpoints on all the line-table entries of the best match of <line number>, i.e. the exact line or the nearest subsequent one.
Along with the above, CommandObjectThreadUntil currently also sets breakpoints on all the subsequent line numbers after the best match, and this latter part is wrong.
This issue is discussed at http://lists.llvm.org/pipermail/lldb-dev/2018-August/013979.html.
In fact, `TestStepUntil.py` does not currently test step-until scenarios, and the `test_missing_one` test fails without this patch when the tests are made to run. Fixed the test as well.
Reviewed By: jingham
Differential Revision: https://reviews.llvm.org/D50304
If the add has more than one use then applying the transformation
won't cause it to be removed, so we can end up applying it again
causing an infinite loop.
Differential Revision: https://reviews.llvm.org/D129361
When the bounds of a workshare loop use firstprivate variables, those uses are currently not updated to refer to the created clones; they still use the shared variables. This patch fixes that.
Reviewed By: peixin
Differential Revision: https://reviews.llvm.org/D127137
Currently, for vectorised loops that use the get.active.lane.mask
intrinsic we only use the mask for predicated vector operations,
such as masked loads and stores, etc. The loop itself is still
controlled by comparing the canonical induction variable with the
trip count. However, for some targets this is inefficient when it's
cheap to use the mask itself to control the loop.
This patch adds support for using the active lane mask for control
flow by:
1. Generating the active lane mask for the next iteration of the
vector loop, rather than the current one. If there are still any
remaining iterations then at least the first bit of the mask will
be set.
2. Extracting the first bit of this mask and using that bit for the
conditional branch.
I did this by creating a new VPActiveLaneMaskPHIRecipe that sets
up the initial PHI values in the vector loop pre-header. I've also
made use of the new BranchOnCond VPInstruction for the final
instruction in the loop region.
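A hedged scalar sketch of the new control flow (illustrative only, mimicking what the vector loop does rather than showing generated IR): the mask for the next iteration is computed inside the loop, and its first lane, not an induction-variable compare against the trip count, drives the back-branch.
```
#include <algorithm>
#include <cstdint>
#include <vector>

void maskedLoop(std::vector<int> &data, int64_t tripCount, int64_t VF) {
  // Initial "mask": lane 0 is active iff there is at least one iteration.
  bool firstLaneActive = tripCount > 0;
  for (int64_t iv = 0; firstLaneActive; iv += VF) {
    int64_t activeLanes = std::min<int64_t>(VF, tripCount - iv);
    for (int64_t lane = 0; lane < activeLanes; ++lane)
      data[iv + lane] += 1;                    // predicated vector body
    // get.active.lane.mask(iv + VF, tripCount) for the *next* iteration:
    // its first bit is set iff any iterations remain.
    firstLaneActive = (iv + VF) < tripCount;
  }
}
```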
Differential Revision: https://reviews.llvm.org/D125301
Currently, an error exists when InstrRefBasedLDV observes transfers of
variables across copies, which causes it to lose track of variables
under certain circumstances, resulting in shorter lifetimes for those
variables as LDV gives up searching for live locations for them. This
patch fixes this issue by storing the currently tracked values in
the destination first, then updating them manually later without
clobbering or assigning them the wrong value.
Differential Revision: https://reviews.llvm.org/D128101
The following commit https://reviews.llvm.org/D125998 added a static_assert which was triggered on z/OS because bitfields are always aligned to 1 regardless of type.
```
error: static_assert failed due to requirement 'alignof(llvm::SmallVector<llvm::MDOperand, 0>) <= alignof(llvm::MDNode::Header)' "LargeStorageVector too strongly aligned"
```
The solution was to force the alignment to that of size_t.
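A hedged illustration of the failure mode and the fix (the struct and field names below are illustrative, not the actual MDNode::Header definition):
```
// On z/OS a struct containing only bitfields can end up with alignment 1,
// which breaks alignment-based static_asserts; forcing the alignment with
// alignas(size_t) restores the expected invariant.
#include <cstddef>

struct alignas(alignof(std::size_t)) Header {
  unsigned SmallSize : 15;
  unsigned IsLarge : 1;
};
static_assert(alignof(Header) >= alignof(std::size_t),
              "Header must be at least as aligned as size_t");
```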
Reviewed By: wolfgangp
Differential Revision: https://reviews.llvm.org/D129369
This uses an int64_t-based fastpath for the common case and falls back to
SlowMPInt to handle the rare cases where larger numbers occur.
It uses `__builtin_*` for performance through the support in LLVM MathExtras.
Using this in the Presburger library results in a minor performance
*improvement* over any commit hash before the sequence of patches
starting at d5e31cf38a.
Reviewed By: Groverkss, ftynse
Differential Revision: https://reviews.llvm.org/D128811
This patch is a simple piece of refactoring that now permits users
to create VPInstructions and specify the name of the value being
generated. This is useful for creating more readable/meaningful
names in IR.
Differential Revision: https://reviews.llvm.org/D128982
This test was failing on our 32 bit build bot:
https://lab.llvm.org/buildbot/#/builders/178/builds/2463
This happened because in UnwindInfoSectionImpl::finalize
a decision is made whether to write out regular or compressed
unwind info.
One check in this does:
```
if (cuPtr->functionAddress >= functionAddressMax) {
  break;
}
```
Here cuPtr->functionAddress is a uint64_t while functionAddressMax
was a uintptr_t, which is 4 bytes on a 32-bit system. Presumably the
check fired because, at only 4 bytes, the max is much lower than we
expect. We're targeting 64 bit, though, so the size of the max should
match the size of the addresses. Using uint64_t for functionAddressMax
fixes this problem.
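A hedged illustration of why the comparison misfires on a 32-bit host (not the actual lld code):
```
#include <cstdint>

bool exceedsMax(uint64_t functionAddress, uintptr_t functionAddressMax) {
  // On a 32-bit host uintptr_t is 4 bytes, so functionAddressMax caps out at
  // 0xFFFFFFFF and this fires for any 64-bit address at or above 4 GiB;
  // declaring the max as uint64_t keeps the comparison in full address width.
  return functionAddress >= functionAddressMax;
}
```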
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D129363
This is a simple switch from a unittest to an integration test. It is
being done as a preparatory step to adding TLS support to thread
creation. TLS setup and initialization is tightly coupled with the
loader and hence all thread-related tests should be integration tests.
Using SIGSTOP means that if anything goes wrong in the test, the process
can end up in the stopped state, where it is not running, but still
taking up resources. Eventually, these "zombies" can make the machine
completely unusable. Instead, use a signal whose default action is to
kill the processes.
Including the following opcodes:
Select_FPR16_Using_CC_GPR
Select_FPR32_Using_CC_GPR
Select_FPR64_Using_CC_GPR
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D127871