Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.
We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
Reviewers: zvi, delena, spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27825
llvm-svn: 290530
This recommits r290512 that was reverted when MSVC failed to compile it. Since
then I've played with various approaches using rextester.com (where I was able
to reproduce the failure) and think that I have a solution thanks in part to
the help of Dave Blaikie! It seems MSVC just has a defective `decltype` in this
version. Manually writing out the type seems to do the trick, even though it is
.... quite complicated.
Original commit message:
This allows both defining convenience iterator/range accessors on types
which walk across N different independent ranges within the object, and
more direct and simple usages with range based for loops such as shown
in the unittest. The same facilities are used for both. They end up
quite small and simple as it happens.
I've also switched an iterator on `Module` to use this. I would like to
add another convenience iterator that includes even more sequences as
part of it and seeing this one already present motivated me to actually
abstract it away and introduce a general utility.
Differential Revision: https://reviews.llvm.org/D28093
llvm-svn: 290528
multiple asynchronous RPC calls.
ParallelCallGroup allows multiple asynchronous calls to be dispatched,
and provides a wait method that blocks until all asynchronous calls have
been executed on the remote and all return value handlers run on the
local machine.
This will allow, for example, the JIT client to issue memory allocation calls
for all sections in parallel, then block until all memory has been allocated
on the remote and the allocated addresses registered with the client, at which
point the JIT client can proceed to applying relocations.
llvm-svn: 290523
This code doesn't work on MSVC for reasons that elude me and I've not
yet covinced a workaround to compile cleanly so reverting for now while
I play with it.
llvm-svn: 290513
This allows both defining convenience iterator/range accessors on types
which walk across N different independent ranges within the object, and
more direct and simple usages with range based for loops such as shown
in the unittest. The same facilities are used for both. They end up
quite small and simple as it happens.
I've also switched an iterator on `Module` to use this. I would like to
add another convenience iterator that includes even more sequences as
part of it and seeing this one already present motivated me to actually
abstract it away and introduce a general utility.
Differential Revision: https://reviews.llvm.org/D28093
llvm-svn: 290512
This makes it explicit what is the exact list to handle, and it
looks much more easy to manipulate and understand that the
previous custom tracking of min/max to express the range where
to look for.
Differential Revision: https://reviews.llvm.org/D28089
llvm-svn: 290507
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.
Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.
Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?
Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.
llvm-svn: 290457
Pretty boring and lame as-is but necessary. This is definitely a place
we'll end up with extension hooks longer term. =]
Differential Revision: https://reviews.llvm.org/D28076
llvm-svn: 290449
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this not ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's not ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.
Differential Revision: https://reviews.llvm.org/D25848
llvm-svn: 290427
According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit
(W-unit in AArch64SchedA57.td, the same as cryptography instructions),
not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA).
This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW
instead of A57UnitX. Also, latencies for those instructions are
corrected.
Patch by Andrew Zhogin.
llvm-svn: 290426
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: mkuper, MatzeB, aprantl
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290423