Vector indexing with a constant index should be folded out in the
legalizer, but this was accidentally falling through. This would
produce the indexing operation with $noreg. Handle this case as a
dynamic index just in case a bug like this happens again in the
future.
We were failing to find constants that were casted. I feel like the
artifact combiner should have folded the constant in the trunc before
the custom lowering, but that doesn't happen.
Reverts part of 6524a7a2b9. Since that
commit, the expansion was ignoring the actual save exec register
produced by the instruction, and looking at other instructions. I do
not understand why it was looking at other instructions, but relying
on this scan was wrong.
Fixes verifier errors after SI_IF is tail duplicated, which should be
correct to do. The results were fed into a phi, which was lowered to
the S_MOV_B64_term instructions.
Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
The flag isn't used, but I believe this matches the MOV32r0 that
would be created by the table emitter. This should allow this node
to be CSEed with any others created by the table.
As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets.
This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway.
There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch.
Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).
A vselect+strictfp node is not equivalent to a masked operation.
The exceptions of the strictfp node are not masked by a vselect
after it so we can't match it to a masked operation.
We already had a hack in IsLegalToFold to prevent these patterns from
matching. This patch removes that hack and removes the patterns.
We have several bug reports that could be characterized as "reducing scalarization",
and this topic was also raised on llvm-dev recently:
http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html
...so I'm proposing that we deal with these patterns in a new, lightweight IR vector
pass that runs before/after other vectorization passes.
There are 4 alternate options that I can think of to deal with this kind of problem
(and we've seen various attempts at all of these), but they all have flaws:
InstCombine - can't happen without TTI, but we don't want target-specific
folds there.
SDAG - too late to assist other vectorization passes; TLI is not equipped
for these kind of cost queries; limited to a single basic block.
CGP - too late to assist other vectorization passes; would need to re-implement
basic cleanups like CSE/instcombine.
SLP - doesn't fit with existing transforms; limited to a single basic block.
This initial patch/transform is based on existing code in AggressiveInstCombine:
we walk backwards through the function looking for a pattern match. But we diverge
from that cost-independent IR canonicalization pass by using TTI to decide if the
vector alternative is profitable.
We probably have at least 10 similar bug reports/patterns (binops, constants,
inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements.
It's possible that we could iterate on a worklist to fix-point like InstCombine does,
but it's safer to start with a most basic case and evolve from there, so I didn't
try to do anything fancy with this initial implementation.
Differential Revision: https://reviews.llvm.org/D73480
When I have symlinked builddir on Fedora 31 x86_64 I get:
FAIL: test_libraries_svr4_libs_present (TestGdbRemoteLibrariesSvr4Support.TestGdbRemoteLibrariesSvr4Support)
----------------------------------------------------------------------
...
File "lldb/packages/Python/lldbsuite/test/tools/lldb-server/libraries-svr4/TestGdbRemoteLibrariesSvr4Support.py", line 106, in
libraries_svr4_libs_present
self.assertIn(self.getBuildDir() + "/" + lib, libraries_svr4_names)
AssertionError:
'/home/jkratoch/redhat/llvm-monorepo-clangassertsymlink/lldb-test-build.noindex/tools/lldb-server/libraries-svr4/TestGdbRemoteLibrariesSvr4Support.test_libraries_svr4_libs_present/libsvr4lib_a.so' not found in ['/home/jkratoch/redhat/llvm-monorepo/lldb/packages/Python/lldbsuite/test/tools/lldb-server/libraries-svr4/linux-vdso.so.1', '/quad/home/jkratoch/redhat/llvm-monorepo-clangassertsymlink/lldb-test-build.noindex/tools/lldb-server/libraries-svr4/TestGdbRemoteLibrariesSvr4Support.test_libraries_svr4_libs_present/libsvr4lib_a.so', '/quad/home/jkratoch/redhat/llvm-monorepo-clangassertsymlink/lldb-test-build.noindex/tools/lldb-server/libraries-svr4/TestGdbRemoteLibrariesSvr4Support.test_libraries_svr4_libs_present/libsvr4lib_b".so', '/usr/lib64/libdl-2.30.so', '/usr/lib64/libstdc++.so.6.0.27', '/usr/lib64/libm-2.30.so', '/usr/lib64/libgcc_s-9-20190827.so.1', '/usr/lib64/libc-2.30.so', '/usr/lib64/ld-2.30.so']
Config=x86_64-/quad/home/jkratoch/redhat/llvm-monorepo-clangassertsymlink/bin/clang-11
----------------------------------------------------------------------
Differential Revision: https://reviews.llvm.org/D74295
The LoopExtractor created new functions (by definition), which violates
the restrictions of a LoopPass.
The correct implementation of this pass should be as a ModulePass.
Includes reverting rL82990 implications on the LoopExtractor.
Fixes PR3082 and PR8929.
Differential Revision: https://reviews.llvm.org/D69069
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
Implement protection against the stack clash attack [0] through inline stack
probing.
Probe stack allocation every PAGE_SIZE during frame lowering or dynamic
allocation to make sure the page guard, if any, is touched when touching the
stack, in a similar manner to GCC[1].
This extends the existing `probe-stack' mechanism with a special value `inline-asm'.
Technically the former uses function call before stack allocation while this
patch provides inlined stack probes and chunk allocation.
Only implemented for x86.
[0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt
[1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html
This a recommit of 39f50da2a3 with proper LiveIn
declaration, better option handling and more portable testing.
Differential Revision: https://reviews.llvm.org/D68720
The existing lowering of gpu.block_dim added a global variable with
the WorkGroupSize decoration. This raises an error within
Vulkan/SPIR-V validation since Vulkan requires this to have a constant
initializer. This is not yet supported in SPIR-V dialect. Changing the
lowering to return the workgroup size as a constant value instead,
obtained from spv.entry_point_abi attribute gets around the issue for
now. The validation goes through since the workgroup size is specified
using spv.execution_mode operation.
In addition to the module pass, this patch introduces a CGSCC pass that
runs the Attributor on a strongly connected component of the call graph
(both old and new PM). The Attributor was always design to be used on a
subset of functions which makes this patch mostly mechanical.
The one change is that we give up `norecurse` deduction in the module
pass in favor of doing it during the CGSCC pass. This makes the
interfaces simpler but can be revisited if needed.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D70767
Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well.
Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D72304
Parallel regions known to be read-only, e.g., after we removed all dead
write accesses, and terminating (`willreturn`) can be removed.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69954
This adds ~27 more runtime calls to the OpenMPKinds.def file, all with
attributes. We deduplicate 16 of those automatically in function =
thread scope. And we annotate all of them automatically during the
OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to
track annotation coverage is included.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D69984
Making sure not to use them with patterns for masked instructions.
Also fix FMA patterns that were matching strict_fma+x86selects to
masked instructions.
This revision adds support in the declarative assembly form for printing attributes with buildable types without the type, and moves several more parsers over to the declarative form.
Differential Revision: https://reviews.llvm.org/D74276