most of the inliner test cases.
The inliner involves a bunch of interesting code and tends to be where
most of the issues I've seen experimenting with the new PM lie. All of
these test cases pass, but I'd like to keep some more thorough coverage
here so doing a fairly blanket enabling.
There are a handful of interesting tests I've not enabled yet because
they're focused on the always inliner, or on functionality that doesn't
(yet) exist in the inliner.
llvm-svn: 290592
skipping indirectly recursive inline chains.
To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.
This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.
llvm-svn: 290590
Nothing really interesting here, but I had to improve the test to use
variables rather than hard coding value names as we happen to end up
with different value names in the new PM.
llvm-svn: 290589
We currently ignore the `allocsize` attribute on functions calls with
the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.
llvm-svn: 290588
This also makes us no longer check for `allocsize` on intrinsic calls.
This shouldn't matter, since intrinsics should provide the information
we get from `allocsize` on their own.
llvm-svn: 290585
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
This builds on r290554 which added supported for 128 and 256-bit.
llvm-svn: 290582
constant expression and to correctly form function reference edges
through them without crashing because one of the operands (the
`BasicBlock` isn't actually a constant despite being an operand of
a constant).
llvm-svn: 290581
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.
llvm-svn: 290573
This mostly involved converting from grep to FileCheck and tidying up
the IR used.
In one case (invoke_test-3.ll) the test had become completely pointless
as we use 'resume' rather than 'unwind' now, and even then it did not
occur at the end of the line.
llvm-svn: 290570
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.
llvm-svn: 290566
These particular sequences will be generated after a future change to teach InstCombine to turn masked scalar arithmetic intrinsics into native IR.
llvm-svn: 290563
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.
Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.
I've also added a short circuit to the scan for call sites based on what
other code is doing.
With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.
I'll make my way through most of the other inliner test cases
just to get some easy coverage next.
llvm-svn: 290562
removing fully-dead comdats without removing dead entries in comdats
with live members.
This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.
I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).
llvm-svn: 290557
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
Differential Revision: https://reviews.llvm.org/D28119
llvm-svn: 290554
Mostly use a bit more idiomatic C++ where we can,
so we can combine some things later.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28111
llvm-svn: 290550
The current GVN algorithm folds unconditional branches to, it claims,
expose more PRE oportunities. The folding, if really needed,
(which is not sure, as it's not really proved it improves analysis)
can be done by an earlier cleanup pass instead of GVN itself.
Ack'ed/SGTM'd by Daniel Berlin.
Differential Revision: https://reviews.llvm.org/D28117
llvm-svn: 290546
systematically and document in the test what all is going on.
This replaces the PR-named test that was the only coverage for GlobalDCE
and comdats previously. I wrote this because I wasn't certain how
comdat DCE was supposed to work and wanted to step through what
GlobalDCE did to fully understand it. After talking to folks and reading
the code and really staring at things it all makes sense but it seemed
good to help write down some of this in a more explicit and fully
covering test case.
For example, it seemed like a bug that GlobalDCE didn't consider comdat
participation of ifuncs. Specifically it seemed like an accident because
testing didn't really cover that case. But in fact, ifuncs specifically
cannot participate in a comdat despite having that API. The new test
case covers this and explicitly documents that DCE gets to fire here
even though there are comdats involved.
Also, we didn't have any positive tests for the challenging cases such
as usage cycles between comdat participants that might make them seem
alive except that there is no external edge into the cycle.
llvm-svn: 290537
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.
I'll do the same thing for packed add/sub/mul/div in a future patch.
Reviewers: delena, RKSimon, zvi, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27879
llvm-svn: 290535
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.
We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
Reviewers: zvi, delena, spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27825
llvm-svn: 290530
This recommits r290512 that was reverted when MSVC failed to compile it. Since
then I've played with various approaches using rextester.com (where I was able
to reproduce the failure) and think that I have a solution thanks in part to
the help of Dave Blaikie! It seems MSVC just has a defective `decltype` in this
version. Manually writing out the type seems to do the trick, even though it is
.... quite complicated.
Original commit message:
This allows both defining convenience iterator/range accessors on types
which walk across N different independent ranges within the object, and
more direct and simple usages with range based for loops such as shown
in the unittest. The same facilities are used for both. They end up
quite small and simple as it happens.
I've also switched an iterator on `Module` to use this. I would like to
add another convenience iterator that includes even more sequences as
part of it and seeing this one already present motivated me to actually
abstract it away and introduce a general utility.
Differential Revision: https://reviews.llvm.org/D28093
llvm-svn: 290528
multiple asynchronous RPC calls.
ParallelCallGroup allows multiple asynchronous calls to be dispatched,
and provides a wait method that blocks until all asynchronous calls have
been executed on the remote and all return value handlers run on the
local machine.
This will allow, for example, the JIT client to issue memory allocation calls
for all sections in parallel, then block until all memory has been allocated
on the remote and the allocated addresses registered with the client, at which
point the JIT client can proceed to applying relocations.
llvm-svn: 290523