This patch is part of supporting `-fstack-clash-protection`. Compared to
the existing `lowerDynamicAlloc`, it mainly does the following:
- Added a new pseudo instruction PPC::PREPARE_PROBED_ALLOC to get the
actual frame pointer and the final stack pointer.
- Synthesize a loop to probe by blocks.
- Use DYNAREAOFFSET to get MaxCallFrameSize, which is calculated during
prologue/epilogue insertion.
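As a rough illustration, the synthesized probe loop behaves like the
following C++ sketch (names such as probeSize and negsize are illustrative;
the actual lowering emits PPC machine instructions, not C++):
```
// Hedged sketch of probing by blocks; negsize is the negative allocation
// size, and the volatile stores stand in for the emitted probe stores.
void probedAlloc(char *&sp, long negsize, long probeSize) {
  char *finalSp = sp + negsize;
  while (sp - probeSize > finalSp) {   // synthesized loop: one probe per block
    sp -= probeSize;
    *(volatile char *)sp = 0;          // touch the block so a guard page traps
  }
  sp = finalSp;                        // residual allocation
  *(volatile char *)sp = 0;
}
```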
Differential Revision: https://reviews.llvm.org/D81358
I think this mostly looks OK. The only weird thing I noticed was that
a couple of rotate vXi8 tests picked up an extra logic op where we have
(and (or (and), (andn)), X). Previously we matched the (or (and), (andn))
to vpternlog, but now we match the (and (or), X) and leave the and/andn
unmatched.
Exit early if the exec mask is zero at the end of control flow.
Mark the ends of control flow during control flow lowering and
convert these to exits during the insert skips pass.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D82737
Previously, we only supported binding dysyms to the GOT. This
diff adds support for binding them to any arbitrary section. C++
programs appear to use this, I believe for vtables and type_info.
This diff also makes our bind opcode encoding a bit smarter -- we now
encode just the differences between bindings, which will make things
more compact.
I was initially concerned about the performance overhead of iterating
over these relocations, but it turns out that the number of such
relocations is small. A quick analysis of my llvm-project build
directory showed that < 1.3% of its ~7M relocations are RELOC_UNSIGNED
bindings to symbols (including both dynamic and static symbols).
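A hedged sketch of the delta-encoding idea follows; the opcode values are
from the Mach-O bind format, but the state struct and helper are invented
for illustration and are not lld's actual encoder:
```
// Emit only the fields that changed since the previous binding.
#include <cstdint>
#include <vector>

constexpr uint8_t BIND_OPCODE_SET_DYLIB_ORDINAL_IMM = 0x10;
constexpr uint8_t BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB = 0x70;
constexpr uint8_t BIND_OPCODE_DO_BIND = 0x90;

struct BindState {
  int ordinal = -1;
  uint8_t segment = 0xff;
  uint64_t offset = 0;
};

void appendULEB(std::vector<uint8_t> &out, uint64_t v) {
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    out.push_back(byte | (v ? 0x80 : 0));  // continuation bit if more follows
  } while (v);
}

void encodeBinding(const BindState &cur, BindState &last,
                   std::vector<uint8_t> &out) {
  if (cur.ordinal != last.ordinal)       // emit only what differs
    out.push_back(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | (cur.ordinal & 0xf));
  if (cur.segment != last.segment || cur.offset != last.offset) {
    out.push_back(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | cur.segment);
    appendULEB(out, cur.offset);
  }
  out.push_back(BIND_OPCODE_DO_BIND);
  last = cur;
}
```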
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D83103
When performing dynamic stack allocation, calculation of frame pointer
and actual negsize can be separated. This patch refactors
`lowerDynamicAlloc` in preparation of supporting
`-fstack-clash-protection` which also has to calculate actual frame
pointer and negsize.
Differential Revision: https://reviews.llvm.org/D81354
Generate a single early exit block out-of-line and branch to this
if all lanes are killed. This avoids branching if lanes are active.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D82641
Clean up the input editing path so external input works better
when combined with further changes. List-directed input needed
to allow for advancement to subsequent records.
Reviewed By: tskeith, sscalpone
Differential Revision: https://reviews.llvm.org/D83104
We consider v32i16/v64i8 to be legal types on avx512f, but we
don't have most operations until avx512bw. But we can use
and/or/xor operations. So try those before splitting.
This is especially helpful since we turn some ands with constant
masks into shuffles in early DAG combines. So we should make sure
we recover those back to AND.
Add an isFixedRecordLength flag member to Connection to
disambiguate the state of "record has known variable length"
from "record has fixed length". Code that sets and tests this
flag will appear in later patches. Rearrange data members to
reduce storage requirements, since Connection might indirectly
end up on a program stack frame. Add a utility member function
BeginRecord(); use it in internal I/O processing.
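A hedged sketch of the resulting shape of Connection; only
isFixedRecordLength and BeginRecord() come from the description above,
the other members are assumed for illustration:
```
#include <cstdint>
#include <optional>

struct Connection {
  void BeginRecord() {           // reset per-record state in one place
    positionInRecord = 0;
    furthestPositionInRecord = 0;
  }
  // Wider members first to reduce padding, since Connection may end up
  // on a program stack frame.
  std::optional<std::int64_t> recordLength;
  std::int64_t positionInRecord{0};
  std::int64_t furthestPositionInRecord{0};
  bool isFixedRecordLength{false};  // "fixed" vs. "known variable" length
};
```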
Reviewed By: tskeith, sscalpone
Differential Revision: https://reviews.llvm.org/D83098
SelectionDAGBuilder converts logic-of-compares into multiple branches based
on a boolean TLI setting in isJumpExpensive(). But that probably never
considered the pattern of extracted bools from a vector compare - it seems
unlikely that we would want to turn vector logic into control-flow.
The motivating x86 reduction case is shown in PR44565:
https://bugs.llvm.org/show_bug.cgi?id=44565
...and that test shows the expected improvement from using pmovmsk codegen.
For AArch64, I modified the test to include an extra op because the simpler
test gets transformed by a codegen invocation of SimplifyCFG.
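For intuition, a hedged, reduced illustration of the kind of source that
produces this pattern (not the exact test from PR44565):
```
// Logic over bools extracted from what codegen sees as a vector compare.
// Splitting each && into a branch defeats the compare-mask (pmovmsk)
// style lowering this change enables.
bool allPositive(const float (&v)[4]) {
  bool b0 = v[0] > 0.f, b1 = v[1] > 0.f;
  bool b2 = v[2] > 0.f, b3 = v[3] > 0.f;
  return (b0 && b1) && (b2 && b3);  // better kept as vector logic
}
```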
Differential Revision: https://reviews.llvm.org/D82602
This test spawns 32 child processes which race to update counters on
shared memory pages. On some Apple-internal machines, two processes race
to perform an update in approximately 0.5% of the test runs, leading to
dropped counter updates. Deflake the test by using atomic increments.
Tested with:
```
$ for I in $(seq 1 1000); do echo ":: Test run $I..."; ./bin/llvm-lit projects/compiler-rt/test/profile/Profile-x86_64h/ContinuousSyncMode/online-merging.c -av || break; done
```
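The fix, in a hedged sketch (illustrative code, not the profile runtime's
actual counter implementation):
```
#include <atomic>
#include <cstdint>

std::atomic<std::uint64_t> counter{0};  // lives on the shared page

void bump() {
  // was: a plain increment, whose load/add/store can interleave across
  // processes and drop updates
  counter.fetch_add(1, std::memory_order_relaxed);
}
```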
rdar://64956774
Default vector.contract lowering essentially yields a series of sdot/ddot
operations. However, for some layouts a series of saxpy/daxpy operations,
chained through fma, is more efficient. This CL introduces a choice between
the two lowering paths. A default heuristic is left to a follow-up change.
Some preliminary avx2 performance numbers for matrix-times-vector.
Here, dot performs best for 64x64 A x b and saxpy for 64x64 A^T x b.
```
------------------------------------------------------------
               A x b                     A^T x b
------------------------------------------------------------
GFLOPS    sdot (reassoc)  saxpy    sdot (reassoc)  saxpy
------------------------------------------------------------
1x1            0.6          0.9         0.6          0.9
2x2            2.5          3.2         2.4          3.5
4x4            6.4          8.4         4.9         11.8
8x8           11.7          6.1         5.0         29.6
16x16         20.7         10.8         7.3         43.3
32x32         29.3          7.9         6.4         51.8
64x64         38.9                                  79.3
128x128       32.4                                  40.7
------------------------------------------------------------
```
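For intuition, a hedged scalar C++ analogue of the two lowering styles
(the real paths emit vector ops; N is illustrative):
```
constexpr int N = 64;

// dot style: one reduction (sdot) per output element.
void matvecDot(const float A[N][N], const float x[N], float y[N]) {
  for (int i = 0; i < N; ++i) {
    float acc = 0.0f;
    for (int j = 0; j < N; ++j)
      acc += A[i][j] * x[j];
    y[i] = acc;
  }
}

// saxpy style: scale column j by x[j] and accumulate, fma-chained.
void matvecSaxpy(const float A[N][N], const float x[N], float y[N]) {
  for (int i = 0; i < N; ++i)
    y[i] = 0.0f;
  for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
      y[i] += A[i][j] * x[j];
}
```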
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D83012
The comments here indicate that we prefer to promote the shifts
instead of allowing rotate to be pattern matched. But we weren't
taking into account whether 512-bit registers are enabled or
whether we have vpsllvw/vpsrlvw instructions.
splatvar_rotate_v32i8 is a slight regression, but the other cases
are neutral or improved.
We currently don't mark ROTL as custom when avx512bw is enabled
under the assumption we'll be able to promote the shifts in the
rotate idiom. But if we don't have 512-bit registers enabled we
can't promote.
There were dependencies on LLVM libraries in the Fortran
runtime support library binaries due to some indirect #includes
of llvm/Support/raw_ostream.h, which caused some kind of internal
ABI version consistency checking to get pulled in. Fixed by
cleaning up some includes.
Reviewed By: tskeith, PeteSteinfeld, sscalpone
Differential Revision: https://reviews.llvm.org/D83060
The arguments have been moved out of the analyzer so we can't get the
expected number there. Instead use the argument count from the newly
built callee.
Differential Revision: https://reviews.llvm.org/D83063
Summary:
The byte swap fix for big endian hosts in 9782c922cb (for D81570)
swaps based on the host endianness, but for cross-targeting builds (i.e.
big endian host targeting little endian) the host endianness won't
necessarily match the generated DWARF. This change updates the test
to use symmetrical constants so the results aren't endian dependent.
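A hedged illustration of why byte-symmetric constants make the check
endian-independent (constants invented here, not those from the test):
```
// A byte-symmetric value is its own byte swap, so the expected output is
// the same whether or not the producer swapped.
#include <cstdint>
static_assert(__builtin_bswap32(0x11223344u) == 0x44332211u,
              "asymmetric: result depends on whether a swap happened");
static_assert(__builtin_bswap32(0x44333344u) == 0x44333344u,
              "symmetric: invariant under byte swap");
```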
Reviewers: jhenderson, hubert.reinterpretcast, stevewan, ikudrin
Reviewed By: ikudrin
Subscribers: ikudrin, aprantl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82827
In general there is no way to get to the ASTContext from most AST nodes
(Decls are one of the exceptions). This will be a problem when implementing
the rest of APValue::dump since we need the ASTContext to dump some kinds of
APValues.
The ASTContext* in ASTDumper and TextNodeDumper is not always
non-null. This is because we still want to be able to use the various
dump() functions in a debugger.
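A hedged, simplified sketch of the null-tolerant pattern this implies
(not the actual clang interface):
```
#include <iostream>

struct ASTContext;  // opaque here

class TextNodeDumperSketch {
  const ASTContext *Context;  // may be null when invoked from a debugger

public:
  explicit TextNodeDumperSketch(const ASTContext *Ctx) : Context(Ctx) {}

  void dumpValue() {
    if (!Context) {
      std::cout << "<value: context not available>\n";  // degrade, don't crash
      return;
    }
    // ... printing that genuinely needs the ASTContext ...
  }
};
```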
No functional changes intended.
The legacy pass manager implicitly adds BasicAA, but the new PM does
not. This causes pr33196.ll to fail under NPM.
There are almost certainly lots of other failures like this; I wanted to
get some input on whether adding -basic-aa to tests makes sense at scale.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D82915
This patch introduces conversion pattern for `spv.constant` with scalar
and vector types. There is a special case when the constant value is a
signed/unsigned integer (or a vector of such integers). Since the LLVM
dialect does not have signedness semantics, the types had to be converted
to signless ints.
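A hedged illustration of the signless reinterpretation in plain C++ (not
the MLIR pattern itself):
```
// Dropping signedness keeps the bit pattern; only the interpretation changes.
#include <cstdint>
constexpr std::uint32_t toSignlessBits(std::int32_t v) {
  return static_cast<std::uint32_t>(v);  // si32 -5 -> i32 bits 0xFFFFFFFB
}
static_assert(toSignlessBits(-5) == 0xFFFFFFFBu, "same bits, signless type");
```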
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D82936
The legacy pass was called "loop-reduce".
This lowers the number of check-llvm failures under NPM by 83.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D82925
When configuring in-tree, the correct names are LLVM_VERSION_MAJOR
and LLVM_VERSION_MINOR. This has been wrong since the code was added
in commits fc473dee98 and 821649229e.
As of 1fed131660, we have code that
changes shuffle masks so that we can put the shuffle in a canonical
form that can be matched to a single instruction. However, it
does not properly account for undef elements in the BUILD_VECTOR
that is the RHS splat so we can end up with undefs where they
shouldn't be. This patch converts the splat input with undefs to
one without.
This commit augments spv.CopyMemory's implementation to support two memory
access operands, thus following the spec more closely. The following
changes are introduced:
- Customize logic for spv.CopyMemory serialization and deserialization.
- Add 2 additional attributes for source memory access operand.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D82710
Added a conversion pattern for the SPIR-V `FunctionCallOp`. Based on the
specification, it returns either no results or a single result, so it
can be mapped directly to the LLVM dialect's `llvm.call`.
Reviewed By: antiagainst, ftynse
Differential Revision: https://reviews.llvm.org/D83030
This patch introduces a conversion pattern for the `spv.BitFieldInsert` op,
as well as some utility functions to facilitate code reading.
Since `spv.BitFieldInsert` may take both vector and integer operands,
this case was specifically handled by broadcasting values (`count`
and `offset` here) to vectors. Moreover, the types had to be converted
to same bitwidth in order to conform with LLVM dialect rules.
This was done with `zext` when extending (Note that `count` and
`offset` are treated as unsigned) and `trunc` in the opposite case.
For the latter, truncation is safe since the op is defined only when
`count`, `offset`, and their sum are all less than the bitwidth of the
result. This bounds the values by 64, so they can always be expressed
in an `i8`.
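A hedged C++ model of the op's semantics on a 32-bit result (not the
generated LLVM dialect code):
```
// count and offset are unsigned and count + offset < 32 by the op's
// definition, so the shifts below are well defined.
#include <cstdint>
std::uint32_t bitFieldInsert(std::uint32_t base, std::uint32_t insert,
                             std::uint8_t offset, std::uint8_t count) {
  std::uint32_t mask = ((1u << count) - 1u) << offset;  // count bits at offset
  return (base & ~mask) | ((insert << offset) & mask);
}
```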
Reviewed By: antiagainst, ftynse
Differential Revision: https://reviews.llvm.org/D82639
Summary:
The Scalar class claims to follow the C type conversion rules. This is
true for the Promote function, but it is not true for the implicit
conversions done in the getter methods.
These functions had a subtle bug: when extending the type, they used the
signedness of the *target* type in order to determine whether to do
sign-extension or zero-extension. This is not how things work in C,
which uses the signedness of the *source* type. I.e., C does
(sign-)extension before it does signed->unsigned conversion, and not the
other way around.
This means that (unsigned long)(int)-1 is equal to
(unsigned long)0xffffffffffffffff, and not (unsigned long)0x00000000ffffffff.
Unsurprisingly, we have accumulated code which depended on this
inconsistent behavior. It mainly manifested itself as code calling
"ULongLong/SLongLong" as a way to get the value of the Scalar object in
a primitive type that is "large enough". Previously, the ULongLong
conversion did not do sign-extension, but now it does.
This patch makes the Scalar getters consistent with the declared
semantics, and fixes the couple of call sites that were using it
incorrectly.
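A hedged demonstration of the rule in plain C++ (not the Scalar API itself):
```
// Extension uses the signedness of the *source* type; the signed value is
// sign-extended first and only then treated as unsigned.
#include <cassert>
#include <cstdint>

int main() {
  std::int32_t src = -1;
  auto widened = static_cast<std::uint64_t>(src);  // sign-extend, then convert
  assert(widened == 0xffffffffffffffffULL);        // not 0x00000000ffffffff
  return 0;
}
```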
Reviewers: teemperor, JDevlieghere
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D82772