This builds on the restricted after initial revert form of D93906, and adds back support for breaking backedges of inner loops. It turns out the original invalidation logic wasn't quite right, specifically around the handling of LCSSA.
When breaking the backedge of an inner loop, we can cause blocks which were in the outer loop only because they were also included in a sub-loop to be removed from both loops. This results in the exit block set for our original parent loop changing, and thus a need for new LCSSA phi nodes.
This case happens when the inner loop has an exit block which is also an exit block of the parent, and there's a block in the child which reaches an exit to said block without also reaching an exit to the parent loop.
(I'm describing this in terms of the immediate parent, but the problem is general for any transitive parent in the nest.)
The approach implemented here involves a potentially expensive LCSSA rebuild. Perf testing during review didn't show anything concerning, but we may end up needing to revert this if anyone encounters a practical compile time issue.
Differential Revision: https://reviews.llvm.org/D94378
Rather than reimplement, use a `using` declaration to bring in
`SmallVectorImpl<char>`'s assign and append implementations in
`SmallString`.
The `SmallString` versions were missing reference invalidation
assertions from `SmallVector`. This patch also fixes a bug in
`llvm::FileCollector::addFileImpl`, which was a copy/paste from
`clang::ModuleDependencyCollector::copyToRoot`, both caught by the
no-longer-skipped assertions.
As a drive-by, this also sinks the `const SmallVectorImpl&` versions of
these methods down into `SmallVectorImpl`, since I imagine they'd be
useful elsewhere.
Differential Revision: https://reviews.llvm.org/D95202
[libomptarget] Build cuda plugin without cuda installed locally
Compiles a new file, `plugins/cuda/dynamic_cuda/cuda.cpp`, to an object file that exposes the same symbols that the plugin presently uses from libcuda. The object file contains dlopen of libcuda and cached dlsym calls. Also provides a cuda.h containing the subset that is used.
This lets the cmake file choose between the system cuda and a dlopen shim, with no changes to rtl.cpp.
The corresponding change to amdgpu is postponed until after a refactor of the plugin to reduce the size of the hsa.h stub required
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95155
Having this 4MB buffer with a compile-time initialized string forced it
into the DATA section and it took up 4MB of space in the binary, which
accounts for like 80% of debugserver's footprint on disk. Change it to
BSS and strcpy in the initial value at runtime instead.
<rdar://problem/73503892>
The only caller of this function is in the LocalStackSlotAllocation
and it creates base register of class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different reg class
here so let materializeFrameBaseRegister to just create and return
whatever it wants.
Differential Revision: https://reviews.llvm.org/D95268
default arguments.
When a function is declared with a qualified name, its eventual semantic
DeclContext may differ from the scope specified by the qualifier if it
redeclares a function in an inline namespace. In this case, we need to
update the DeclContext to be that of the previous declaration, and we
need to do so before we decide whether to inherit default arguments from
that previous declaration, because we only inherit default arguments
from declarations in the same scope.
Add code pattersn for c++ `range for` loops and objective c `for...in` loops.
Reviewed By: kadircet
Differential Revision: https://reviews.llvm.org/D95131
Similar to our free standing setcc patterns, we can use ADDI to
subtract the immediate from the other operand. Then the cmov
can check if the result is zero or non-zero.
Reviewed By: mundaym
Differential Revision: https://reviews.llvm.org/D95169
Similar to binary operators like fadd/fmul/fsub, propagate shape info
through unary operators (fneg is the only one?).
Differential Revision: https://reviews.llvm.org/D95252
I have previously tried doing that in
b33fbbaa34 / d38205144f,
but eventually it was pointed out that the approach taken there
was just broken wrt how the uses of bonus instructions are updated
to account for the fact that they should now use either bonus instruction
or the cloned bonus instruction. In particluar, all that manual handling
of PHI nodes in successors was just wrong.
But, the fix is actually much much simpler than my initial approach:
just tell SSAUpdate about both instances of bonus instruction,
and let it deal with all the PHI handling.
Alive2 confirms that the reproducers from the original bugs (@pr48450*)
are now handled correctly.
This effectively reverts commit 59560e8589,
effectively relanding b33fbbaa34.
NewBonusInst just took name from BonusInst, so BonusInst has no name,
so BonusInst.getName() makes no sense.
So we need to ask NewBonusInst for the name.
This patch addresses inconsistencies in the way fallthrough is handled
in the RedirectingFileSystem. Rather than trying to change the working
directory of the external filesystem, the RedirectingFileSystem will
canonicalize every path before handing it down. This guarantees that
relative paths are resolved relative to the RedirectingFileSystem's
working directory.
This allows us to have a strictly virtual working directory, and still
fallthrough for absolute paths, but not for relative paths that would
get resolved incorrectly at the lower layer (for example, in case of the
RealFileSystem, because the strictly virtual path does not exist).
Differential revision: https://reviews.llvm.org/D95188
The widenScalar implementation for signed and unsigned overflowing
operations were very similar: both are checked by truncating the result
and then re-sign/zero-extending it and checking that it matches the
computed operation.
Using a truncate + zero-extend for the unsigned case instead of manually
producing the AND instruction like before leads to an extra copy
instruction during legalization, but this should be harmless.
Differential Revision: https://reviews.llvm.org/D95035
Add tests to make sure common instructions are accepted in RV64
and not just RV32.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D95150
Just getting rid of some logspew as I test LLD under existing build
systems.
Reviewed By: #lld-macho, smeenai
Differential Revision: https://reviews.llvm.org/D95213
This adds an initial set of patterns for these instructions. Its
more complicated that I would like for the sh*add.uw instructions
because there is no guaranteed canonicalization for shl/and with
constants.
Reviewed By: asb
Differential Revision: https://reviews.llvm.org/D95106
This is to support the memory routines vec_malloc, vec_calloc, vec_realloc, and vec_free. These routines manage memory that is 16-byte aligned. And they are only available on AIX.
Differential Revision: https://reviews.llvm.org/D94710
Fusion of generic/indexed_generic operations with tensor_reshape by
expansion when the latter just adds/removes unit-dimensions is
disabled since it just adds unit-trip count loops.
Differential Revision: https://reviews.llvm.org/D94626
These instructions use a portion of the encodings for grevi and
gorci. The full encodings are only supported with Zbp. Note,
rev8 has a different encoding between rv32 and rv64.
Zbb is closer to being finalized that Zbp which has motivated
some decisions in this patch.
I'm treating rev8 and orc.b as separate instructions when
either Zbb or Zbp is enabled. This allows us to print to suggest
that either feature needs to be enabled to support these mnemonics.
I had tried to put HasStdExtZbbAndNotZbp on the Zbb instructions,
but that caused a diagnostic that said Zbp is required if neither
feature is enabled. We should really mention Zbb since its closer
to final.
This does require extra isel patterns for the different cases so
that bswap will always print as rev8 in assembly listing since
we can't use an InstAlias.
llvm-objdump disassembling should always pick the rev8 or orc.b
instructions. llvm-mc parsing and printing text will not convert
the grevi/gorci spellings to rev8/gorc.b. We could probably fix
this with a special case in processInstruction in the assembly
parser if it its important.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94944
zext.h uses the same encoding as pack rd, rs, x0 in rv32 and
packw rd, rs, x0 in rv64. Encodings without x0 as the second source
are not valid in Zbb.
I've added two new instructions with these specific encodings with
predicates that enable them when either Zbb or Zbp is enabled.
The pack spelling will only be accepted with Zbp. The disassembler
will use the zext.h instruction when either feature is enabled.
Using the pack spelling will print as pack when llvm-mc is
emitting text. We could fix this with some custom code in
processInstruction if this is important, but I'm not sure it is.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94818
Zext.h will need to come back to Zbb, but that only uses specific
encodings of pack.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94742
This didn't make it into the published 0.93 spec, but it was the
intention.
But it is in the tex source as of this commit
d172f029c0
This means zext.w now requires Zba. Not sure if we should still use
pack if Zbp is enabled and Zba isn't. I'll leave that for the future
when pack is closer to being final.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94736
The 0.93 spec has this implementation for add.uw
uint_xlen_t adduw(uint_xlen_t rs1, uint_xlen_t rs2) {
uint_xlen_t rs1u = (uint32_t)rs1;
return rs1u + rs2;
}
The 0.92 spec had the usages of rs1 and rs2 swapped.
Reviewed By: frasercrmck, asb
Differential Revision: https://reviews.llvm.org/D95090
Also renamed Zbe instructions to resolve name conflict even though
that change is in the 0.94 draft.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94653
It's not really clear in the spec that these are in Zbp now, but
that's what I've gather from previous commits to the spec. I've
file an issue to get it documented properly.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94652
This is the first of multiple patches to bring our 0.92
implementation up to 0.93.
Reviewed By: asb, frasercrmck
Differential Revision: https://reviews.llvm.org/D94568
The legacy PM's EP_CGSCCOptimizerLate was only used under not-O0.
Fixes clang/test/CodeGenCXX/cxx0x-initializer-stdinitializerlist.cpp under the new PM.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D95250
It has the low-level bit fiddling operations from bit. It eliminates a cyclic dependency between __bit_reference, bits, and vector. I want to exploit this in later patches.
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D94908