This patch adjusts ThreadPool::async to return futures that wrap
the result type of the passed in callable.
To do so, ThreadPool::asyncImpl first creates a shared promise. The
result of the promise is set in a new callable that first executes the
task. The callable is added to the task queue.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D114183
This basically reverts 1778831a3d, which split them.
Since they were split 9 years ago, EmitGCCInlineAsmStr() grew a bunch of
features that usually weren't added to EmitMSInlineAsmStr(), and
that was usually a mistake. D71677, D113932, D114167 are all examples
of where things were backported to EmitMSInlineAsmStr().
The names were also not great. EmitMSInlineAsmStr() used to be called for `asm
inteldialect`, which clang produces for Microsoft-style __asm { ... } blocks as
well for GCC-style __asm__ / asm statements with -masm=intel. On the other hand,
EmitGCCInlineAsmStr() used to be called for `asm`, whic clang produces for
GCC-style __asm__ / asm statements with -masm=att (the default).
It's also less code (23 insertions, 188 deletions).
No behavior change.
Differential Revision: https://reviews.llvm.org/D114330
This makes a line in llvm/test/CodeGen/X86/asm-block-labels.ll pass
with `asm inteldialect` too.
I don't know if this is something one can hit in practice with inline
asm. The test is from 2007 (4646aa3e33) but in 2009 blockaddr was
introduced and e.g. `__asm__ __volatile__("brl %0" :: "X"(&&foo) : "memory");`
compiles to
call void asm sideeffect "brl $0", "X,..."(i8* blockaddress(@func, %1))
nowadays (thanks to jrtc27 for that example!).
(6c4d255bf3 switched clang to blockaddress on an opt-in basis,
e4801f7844 added docs for it, 31b132c0b7 added IR support.)
I half-heartedly tried to build clang 2.8 locally, but it didn't
just build. And 2.8 didn't have a prebuilt clang binary yet.
The motivation is to make EmitGCCInlineAsmStr() and EmitMSInlineAsmStr()
more alike, and maybe we should delete this code form EmitGCCInlineAsmStr()
instead. But since it's just 3 lines and it's reachable from LLVM IR,
let's do the safer thing for now.
Differential Revision: https://reviews.llvm.org/D114329
Most changes are rewording comments but there are some assertions that I rephrased.
Reviewed By: kparzysz
Differential Revision: https://reviews.llvm.org/D114132
Currently, we restore the return address register as the last restoring
instruction in the epilog. The next instruction is `ret` usually. It is
a use of return address register. In some microarchitectures, there is
load-to-use data hazard. To avoid the load-to-use data hazard, we could
separate the load instruction from its use as far as possible. In this
patch, we reverse the order of restoring callee-saved registers to
increase the distance of `load ra` and `ret` in the epilog.
Differential Revision: https://reviews.llvm.org/D113967
This adds validation for consistency of ValueExprMap and
ExprValueMap, and fixes identified issues:
* Addrec construction directly wrote to ValueExprMap in a few places,
without updating ExprValueMap. Add a helper to ensures they stay
consistent. The adjustment in forgetSymbolicName() explicitly
drops the old value from the map, so that we don't rely on it
being overwritten.
* forgetMemoizedResultsImpl() was dropping the SCEV from
ExprValueMap, but not dropping the corresponding entries from
ValueExprMap.
Differential Revision: https://reviews.llvm.org/D113349
Update the reference publication for the SyncDependenceAnalysis and Divergence Analysis. Fix phrasing, formatting. Add comments on reducible loop limitation.
Reviewed By: sameerds
Differential Revision: https://reviews.llvm.org/D114146
Apparently my methodology was suboptimal, and not only did miss all the +VL tuples,
i also missed some plain tuples. I believe, this adds everything missing.
Indeed, these manual costmodels are just not okay long-term.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114334
Much like the VPMOVM2[BW] / VPMOV[BW]2M from AVX512BW,
these either sign-extent the mask register into a vector,
or pack the mask from vector register.
Apparently, we didn't even have MCA tests for these,
added in rG2f364f6f0d3a2420ca78cbd80abb186657180e05,
so i'm just guessing that their perf characteristics
are optimal.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D114314
This patch fixes PR52111. The problem is that LV propagates poison-generating flags (`nuw`/`nsw`, `exact`
and `inbounds`) in instructions that contribute to the address computation of widen loads/stores that are
guarded by a condition. It may happen that when the code is vectorized and the control flow within the loop
is linearized, these flags may lead to generating a poison value that is effectively used as the base address
of the widen load/store. The fix drops all the integer poison-generating flags from instructions that
contribute to the address computation of a widen load/store whose original instruction was in a basic block
that needed predication and is not predicated after vectorization.
Reviewed By: fhahn, spatel, nlopes
Differential Revision: https://reviews.llvm.org/D111846
This adds and uses look-up tables for non-loop branch probabilities, which have
have probabilities directly encoded into the tables for the different condition
codes. Compared to having this logic inlined in different functions, as it used
to be the case, I think this is compacter and thus also easier to check/cross
reference. This also adds a test for pointer heuristics that was missing.
Differential Revision: https://reviews.llvm.org/D114009
This teaches AArch64TargetLowering::shouldSinkOperands to sink splat
shuffles to certain neon intrinsics, so that they can make use of the
lane variants of the instructions that are available.
Differential Revision: https://reviews.llvm.org/D112994
Add an alias of `addi [x], zero, imm` to generate pseudo
instruction li, which makes assembly mush more readable.
For existed tests, users can update them by running script
`llvm/utils/update_llc_test_checks.py`.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D112692
The functionality of hasAnalyzableMemoryWrite() is effectively
subsumed by getLocForWriteEx(), which will return None if the
instruction is not analyzable. The implementations don't match
exactly (e.g. getLocForWriteEx() does not limit non-calls to
stores), but in conjunction with the isRemovable() check, it ends
up being the same.
If we're looking only at the lower bound, the actual modulus
doesn't matter. This is a leftover from when I wanted to consider
the upper bound as well, where the modulus does matter.
If (X urem M) >= C we know that X >= C. Make use of this fact
when computing the implied condition range.
In some cases we could also establish an upper bound, but that's
both tricker and not interesting in practice.
Alive: https://alive2.llvm.org/ce/z/R5ZGSW
A first step towards modeling preheader and exit blocks in VPlan as well.
Keeping the vector loop in a region allows for changing the VF as we
traverse region boundaries.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D113182
This is a generalization/extension of the existing and/or
folds noted with TODO comments. Those have a one-use
constraint that is not necessary.
Potential follow-ups are noted by the TODO comments in
the new function. We can also call this function from
other binop visit* functions, but we need to add tests
first.
This solves:
https://llvm.org/PR52543https://alive2.llvm.org/ce/z/NWuCR5
We have transform that tries turn a pmovmskb into movmskps/pd or
movmskps to movmskpd. This transform isn't valid if the truncate
discarded bits that might be set by the original movmsk.
We could fix this by inserting an AND after the new movmsk to discard
the equivalent of the truncated bits, but I've left that for later
patch.
Fixes PR52567.
Differential Revision: https://reviews.llvm.org/D114306
The compiler was generating symbols in the final code object for local
branch target labels. This bloats the code object, slows down the loader,
and is only used to simplify disassembly.
Use '--symbolize-operands' with llvm-objdump to improve readability of the
branch target operands in disassembly.
Fixes: SWDEV-312223
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D114273
The if-check above deleted part guarantees that StoreOffset <= LoadOffset
and that StoreOffset + StoreSize >= LoadOffset + LoadSize, and given that
LoadOffset + LoadSize > LoadOffset when LoadSize > 0. Thus, this shows
StoreOffset + StoreSize > LoadOffset is guaranteed given LoadSize > 0,
while it could be meaningless to have a type with nonpositive size, so that
the check could be removed. The values are converted to signed types to
avoid unsigned operation with negative offsets.
Part of revision D100179
Reapply commit c35e8185d8 with fixing problem
reported by mstorsjo
The `llvm.instrprof.increment` intrinsic uses `i32` for the index. We should use this same type for the index into the GEP instructions.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D114268
[NFC] As part of using inclusive language within the llvm project, this patch
replaces master with main in `Thumb2SizeReduction.cpp`.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D114196
NFCI. Previously the implicit use was added to V_MOV_B32_indirect_read
when building the instruction. V_MOV_B32_indirect_write didn't have an
implicit use of M0 at all, but apparently it did not cause any problems.
Differential Revision: https://reviews.llvm.org/D114239
Revert "[SCEV] Defer all work from ea12c2cb as late as possible"
Revert "[SCEV] Defer loop property checks from ea12c2cb as late as possible"
This reverts commit 734abbad79 and 1a5666acb2.
Both of these changes were speculative attempts to address a compile time regression. Neither worked, and both complicated the code in undesirable ways.
On RISC-V, icmp is not sunk (as the following snippet shows) which
generates the following suboptimal branch pattern:
```
core_list_find:
lh a2, 2(a1)
seqz a3, a0 <<
bltz a2, .LBB0_5
bnez a3, .LBB0_9 << should sink the seqz
[...]
j .LBB0_9
.LBB0_5:
bnez a3, .LBB0_9 << should sink the seqz
lh a1, 0(a1)
[...]
```
due to an icmp not being sunk.
The blocks after `codegenprepare` look as follows:
```
define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
%idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
%0 = load i16, i16* %idx, align 2, !tbaa !4
%cmp = icmp sgt i16 %0, -1
%tobool.not37 = icmp eq %struct.list_head_s* %list, null
br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader
while.cond9.preheader: ; preds = %entry
br i1 %tobool.not37, label %return, label %land.rhs11.lr.ph
```
where the `%tobool.not37` is the result of the icmp that is not sunk.
Note that it is computed in the basic-block up until what becomes the
`bltz` instruction and the `bnez` is a basic-block of its own.
Compare this to what happens on AArch64 (where the icmp is correctly sunk):
```
define dso_local %struct.list_head_s* @core_list_find(%struct.list_head_s* readonly %list, %struct.list_data_s* nocapture readonly %info) local_unnamed_addr #0 {
entry:
%idx = getelementptr inbounds %struct.list_data_s, %struct.list_data_s* %info, i64 0, i32 1
%0 = load i16, i16* %idx, align 2, !tbaa !6
%cmp = icmp sgt i16 %0, -1
br i1 %cmp, label %while.cond.preheader, label %while.cond9.preheader
while.cond9.preheader: ; preds = %entry
%1 = icmp eq %struct.list_head_s* %list, null
br i1 %1, label %return, label %land.rhs11.lr.ph
```
This is caused by sinkCmpExpression() being skipped, if multiple
condition registers are supported.
Given that the check for multiple condition registers affect only
sinkCmpExpression() and shouldNormalizeToSelectSequence(), this change
adjusts the RISC-V target as follows:
* we no longer signal multiple condition registers (thus changing
the behaviour of sinkCmpExpression() back to sinking the icmp)
* we override shouldNormalizeToSelectSequence() to let always select
the preferred normalisation strategy for our backend
With both changes, the test results remain unchanged. Note that without
the target-specific override to shouldNormalizeToSelectSequence(), there
is worse code (more branches) generated for select-and.ll and select-or.ll.
The original test case changes as expected:
```
core_list_find:
lh a2, 2(a1)
bltz a2, .LBB0_5
beqz a0, .LBB0_9 <<
[...]
j .LBB0_9
.LBB0_5:
beqz a0, .LBB0_9 <<
lh a1, 0(a1)
[...]
```
Differential Revision: https://reviews.llvm.org/D98932
We were adding all defined weak symbols to the materialization
responsibility, but local symbols will not be in the symbol table, so it
failed to materialize due to the "missing" symbol.
Local weak symbols come up in practice when using `ld -r` with a hidden
weak symbol.
rdar://85574696
For tagged-globals, we only need to disable relaxation for globals that
we actually tag. With this patch function pointer relocations, which
we do not instrument, can be relaxed.
This patch also makes tagged-globals work properly with LTO, as
-Wa,-mrelax-relocations=no doesn't work with LTO.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D113220
This makes the following program build with -masm=intel:
int foo(int count) {
asm goto ("dec %0; jb %l[stop]" : "+r" (count) : : : stop);
return count;
stop:
return 0;
}
It's also is another step towards merging EmitGCCInlineAsmStr() and
EmitMSInlineAsmStr().
Differential Revision: https://reviews.llvm.org/D114167
No intended behavior change.
EmitGCCInlineAsmStr() used to explicitly check for modifier 'l'
after handling block address and machine basic block operands.
This prevented passing a MachineOperand with 'l' modifier to
PrintAsmMemoryOperand(). Conceptually that seems kind of nice,
but in practice the overrides of PrintAsmMemoryOperand() in all (*)
AsmPrinter subclasses already reject modifiers they don't know about,
and none of them don't know about 'l'. So removing this doesn't have
a behavior difference, is less code, and it makes EmitGCCInlineAsmStr()
and EmitMSInlineAsmStr() more similar, to prepare for merging them later.
(Why not _add_ the branch to EmitMSInlineAsmStr() instead? Because that
always works with X86AsmPrinter I think, and
X86AsmPrinter::PrintAsmMemoryOperand() very decisively rejects the 'l'
modifier, so it's hard to motivate adding that branch.)
*: The one exception was AVRAsmPrinter, which had an llvm_unreachable instead
of returning true. So this commit changes that, so that the AVR target keeps
emitting an error instead of crashing when passing a mem operand with a :l
modifier to it. All the other targets already don't crash on this.
Differential Revision: https://reviews.llvm.org/D114216
Introduce V_MOV_B32_indirect_read for indexed vgpr reads
(and rename the old V_MOV_B32_indirect to
V_MOV_B32_indirect_write) so they can be unambiguously
distinguished from regular V_MOV_B32_e32. Previously they
were distinguished by looking for extra implicit operands
but this is fragile because regular moves sometimes have
extra implicit operands too:
- either by accident, when instructions end up with
duplicate implicit operands (see e.g. D100939)
- or by design, when SIInstrInfo::copyPhysReg breaks a
multi-dword copy into individual subreg mov instructions
and adds implicit operands for the super-register.
The effect of this is that SIInstrInfo::isFoldableCopy can
be simplified and identifies more foldable copies. The test
diffs show that more immediate 0 values have been folded as
inline operands.
SIInstrInfo::isReallyTriviallyReMaterializable could
probably be simplified too but that is not part of this
patch.
Differential Revision: https://reviews.llvm.org/D114230