Summary:
There is still room for improvement in the handling of multivalue
nodes in both passes, but the current algorithm is at least correct
and optimizes some simpler cases. In order to make future
optimizations of these passes easier and build confidence that the
current algorithms are correct, this CL also adds a script that
automatically and exhaustively generates interesting multivalue test
cases.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72902
https://reviews.llvm.org/D67133
While investigating some non determinism (CSE doesn't produce wrong
code, it just doesn't CSE some times) in GISel CSE on an out of tree
target, I realized that the core issue was that there were lots of code
that mutates (setReg, setRegClass etc), but doesn't notify observers
(CSE in this case but this could be any other observer). In order to
make the Observer be available in various parts of code and to avoid
having to thread it through various API, the MachineFunction now has the
observer as field. This allows it to be easily used in helper functions
such as constrainOperandRegClass.
Also added some invariant verification method in CSEInfo which can
catch these issues (when CSE is enabled).
Essentially, fold OrderedBasicBlock into BasicBlock, and make it
auto-invalidate the instruction ordering when new instructions are
added. Notably, we don't need to invalidate it when removing
instructions, which is helpful when a pass mostly delete dead
instructions rather than transforming them.
The downside is that Instruction grows from 56 bytes to 64 bytes. The
resulting LLVM code is substantially simpler and automatically handles
invalidation, which makes me think that this is the right speed and size
tradeoff.
The important change is in SymbolTableTraitsImpl.h, where the numbering
is invalidated. Everything else should be straightforward.
We probably want to implement a fancier re-numbering scheme so that
local updates don't invalidate the ordering, but I plan for that to be
future work, maybe for someone else.
Reviewed By: lattner, vsk, fhahn, dexonsmith
Differential Revision: https://reviews.llvm.org/D51664
There is prior art for this in the code base itself, and a recent
example of this here: c45f8d4989
This came up in discussion on this review where @maskray was going the
opposite direction:
https://reviews.llvm.org/D68772
Given that there is disagreement, we should make a choice and document
it.
Thanks to John McCall for the precise wording.
Reviewed By: MaskRay, rjmccall
Differential Revision: https://reviews.llvm.org/D74515
Summary:
Unlike normal calls, call_indirects have immediate arguments that
caused a MachineVerifier failure without a small tweak to loosen the
verifier's requirements for variadicOpsAreDefs instructions.
One nice thing about the new call_indirects is that they do not need
to participate in the PCALL_INDIRECT mechanism because their post-isel
hook handles moving the function pointer argument and adding the flags
and typeindex arguments itself.
Reviewers: aheejin
Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74191
This reverts commit 649aba93a2, now that
the approach started there has been shown to be workable in the patch
series culminating in https://reviews.llvm.org/D74192.
The WASM and Hexagon plugin check the ArchType rather than the OSType,
so explicitly reject those in the DynamicLoaderStatic.
Differential revision: https://reviews.llvm.org/D74780
Fixing a bug where using a zero-rank shaped type operand to
linalg.generic ops hit an unrelated assert. This also meant that
lowering the operation to loops was not supported. Adding roundtrip
tests and lowering to loops test for zero-rank shaped type operand
with fixes to make the test pass.
Differential Revision: https://reviews.llvm.org/D74638
-ffunction-sections and -fdata-sections are well supported by many
object file formats:
- ELF
- COFF
- XCOFF
- wasm
Only MachO ignores this flag.
While here, remove it from -funique-section-names. Wasm honors this
option.
Addresses PR44910.
Reviewed By: hans, aaron.ballman
Differential Revision: https://reviews.llvm.org/D74634
parseInstructions() doesn't always process the whole set of DWARF
instructions for a frame. It will stop once the target PC is reached, or
if malformed instructions are found. So, for example, if we have an
instruction sequence like this:
```
<start>
...
DW_CFA_remember_state
...
DW_CFA_advance_loc past the location we're unwinding at (pcoffset in parseInstructions() main loop)
...
DW_CFA_restore_state
<end>
```
... the saved state will never be freed, even though the
DW_CFA_remember_state opcode has a matching DW_CFA_restore_state later
in the sequence.
This change adds code to free whatever is left on rememberStack after
parsing the CIE and the FDE instructions.
Differential Revision: https://reviews.llvm.org/D66904
Summary: This class wraps around the various different ways to construct a range of Type, without forcing the materialization of that range into a contiguous vector.
Differential Revision: https://reviews.llvm.org/D74646
Generate the LLDB_PLUGIN_DECLARE macros with CMake and a def file. I'm
landing D73067 in pieces so I can bisect what exactly is breaking the
Windows bot.
Fixes https://bugs.llvm.org/show_bug.cgi?id=44922 (caused by 4698bf145d)
ThreadThroughTwoBasicBlocks assumes PredBBBranch is conditional. The following code can segfault.
AddPHINodeEntriesForMappedBlock(PredBBBranch->getSuccessor(1), PredBB, NewBB,
ValueMapping);
We can also allow unconditional PredBB, but the produced code is not
better.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D74747
Other plugins depend on DynamicLoaderDarwinKernel and which means we
cannot conditionally enable/build this plugin based on the target
platform. This means that it will be past of the list of plugins
initialized once that's autogenerated.
Summary:
Most of our larger data is dynamically allocated (via `map`) but it
became an hindrance with regard to init time, for a cost to benefit
ratio that is not great. So change the `TSD`s, `RegionInfo`, `ByteMap`
to be static.
Additionally, for reclaiming, we used mapped & unmapped a buffer each
time, which is costly. It turns out that we can have a static buffer,
and that there isn't much contention on it.
One of the other things changed here, is that we hard set the number
of TSDs on Android to the maximum number, as there could be a
situation where cores are put to sleep and we could miss some.
Subscribers: mgorny, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74696
Summary:
This patch adds a new MVE intrinsics family, `vbrsrq`: vector bit
reverse and shift right. The intrinsics are compiled into the VBRSR
instruction. Two new LLVM IR intrinsics were also added: arm.mve.vbrsr
and arm.mve.vbrsr.predicated.
Reviewers: simon_tatham, dmgreen, ostannard, MarkMurrayARM
Reviewed By: simon_tatham
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74721
The index of an ExtractElementInst is not guaranteed to be a
ConstantInt. It can be any integer value. Check explicitly for
ConstantInts.
The new test cases illustrate scenarios where we crash without
this patch. I've also added another test case to check the matching
of extractelement vector ops works.
Reviewers: RKSimon, ABataev, dtemirbulatov, vporpo
Reviewed By: ABataev
Differential Revision: https://reviews.llvm.org/D74758
When simplifying demanded bits, we currently only report the
instruction on which SimplifyDemandedBits was called as changed.
However, this is a recursive call, and the actually modified
instruction will usually be further up the chain. Additionally,
all the intermediate instructions should also be revisited,
as additional combines may be possible after the demanded bits
simplification. We fix this by explicitly adding them back to the
worklist.
Differential Revision: https://reviews.llvm.org/D72944
The select-of-cttz transform can currently duplicate cttz intrinsics
and zext/trunc ops. The cause is that it unnecessarily duplicates
the intrinsic and the zext/trunc when setting the "undef_on_zero"
flag to false. However, it's always legal to set the flag from true
to false, so we can make this replacement even if there are extra users.
Differential Revision: https://reviews.llvm.org/D74685