This adds a combine that canonicalizes a chain of inserts which broadcasts
a value into a single insert + a splat shufflevector.
This fixes PR31286.
Differential Revision: https://reviews.llvm.org/D27992
llvm-svn: 290641
There is no need to do this within an analysis. That method shouldn't
even be reached if this predicate holds as the actual useful
optimization is in the analysis manager itself.
llvm-svn: 290614
The effect of the bug was that we would incorrectly create summaries
for global and weak values defined in module asm (since we were
essentially testing for bit 1 which is SF_Undefined, and the
RecordStreamer ignores local undefined references). This would have
resulted in conservatively disabling importing of anything referencing
globals and weaks defined in module asm. Added these cases to the test
which now fails without this bug fix.
Fixes PR31459.
llvm-svn: 290610
The feature allows for conditional assembly, filling the entries
of .amd_kernel_code_t etc.
Symbols are defined with value 0 at the beginning of each kernel scope.
After each register usage, the respective symbol is set to:
value = max( value, ( register index + 1 ) )
Thus, at the end of scope the value represents a count of used registers.
Kernel scopes begin at .amdgpu_hsa_kernel directive, end at the
next .amdgpu_hsa_kernel (or EOF, whichever comes first). There is also
dummy scope that lies from the beginning of source file til the
first .amdgpu_hsa_kernel.
Test added.
Differential Revision: https://reviews.llvm.org/D27859
llvm-svn: 290608
Because operand was not marked as seen it was visited twice.
It doesn't change behavior of optimization, it just saves redudant
visit, so no test changes.
llvm-svn: 290607
This requires custom handling because BasicAA caches handles to other
analyses and so it needs to trigger indirect invalidation.
This fixes one of the common crashes when using the new PM in real
pipelines. I've also tweaked a regression test to check that we are at
least handling the most immediate case.
I'm going to work at re-structuring this test some to both scale better
(rather than all being in one file) and check more invalidation paths in
a follow-up commit, but I wanted to get the basic bug fix in place.
llvm-svn: 290603
not really wired into the loop pass manager in a way that will let us
productively use these passes yet.
This lets the new PM get farther in basic testing which is useful for
establishing a good baseline of "doesn't explode". There are still
plenty of crashers in basic testing though, this just gets rid of some
noise that is well understood and not representing a specific or narrow
bug.
llvm-svn: 290601
inter-analysis dependencies) to use the new invalidation infrastructure.
This teaches it to invalidate itself when any of the peer function
AA results that it uses become invalid. We do this by just tracking the
originating IDs. I've kept it in a somewhat clunky API since some users
of AAResults are outside the new PM right now. We can clean this API up
if/when those users go away.
Secondly, it uses the registration on the outer analysis manager proxy
to trigger deferred invalidation when a module analysis result becomes
invalid.
I've included test cases that specifically try to trigger use-after-free
in both of these cases and they would crash or hang pretty horribly for
me even without ASan. Now they work nicely.
The `InvalidateAnalysis` utility pass required some tweaking to be
useful in this context and it still is pretty garbage. I'd like to
switch it back to the previous implementation and teach the explicit
invalidate method on the AnalysisManager to take care of correctly
triggering indirect invalidation, but I wanted to go ahead and send this
out so folks could see how all of this stuff works together in practice.
And, you know, that it does actually work. =]
Differential Revision: https://reviews.llvm.org/D27205
llvm-svn: 290595
that require deferred invalidation.
This handles the other real-world invalidation scenario that we have
cases of: a function analysis which caches references to a module
analysis. We currently do this in the AA aggregation layer and might
well do this in other places as well.
Since this is relative rare, the technique is somewhat more cumbersome.
Analyses need to register themselves when accessing the outer analysis
manager's proxy. This proxy is already necessarily present to allow
access to the outer IR unit's analyses. By registering here we can track
and trigger invalidation when that outer analysis goes away.
To make this work we need to enhance the PreservedAnalyses
infrastructure to support a (slightly) more explicit model for "sets" of
analyses, and allow abandoning a single specific analyses even when
a set covering that analysis is preserved. That allows us to describe
the scenario of preserving all Function analyses *except* for the one
where deferred invalidation has triggered.
We also need to teach the invalidator API to support direct ID calls
instead of always going through a template to dispatch so that we can
just record the ID mapping.
I've introduced testing of all of this both for simple module<->function
cases as well as for more complex cases involving a CGSCC layer.
Much like the previous patch I've not tried to fully update the loop
pass management layer because that layer is due to be heavily reworked
to use similar techniques to the CGSCC to handle updates. As that
happens, we'll have a better testing basis for adding support like this.
Many thanks to both Justin and Sean for the extensive reviews on this to
help bring the API design and documentation into a better state.
Differential Revision: https://reviews.llvm.org/D27198
llvm-svn: 290594
skipping indirectly recursive inline chains.
To do this, we implicitly build an inline stack for each callsite and
check prior to inlining that doing so would not form a cycle. This uses
the exact same technique and even shares some code with the legacy PM
inliner.
This solution remains deeply unsatisfying to me because it means we
cannot actually iterate the inliner externally. Doing so would not be
able to easily detect and avoid such cycles. Some day I would very much
like to have a solution that works without this internal state to detect
cycles, but this is not that day.
llvm-svn: 290590
We currently ignore the `allocsize` attribute on functions calls with
the `nobuiltin` attribute when trying to lower `@llvm.objectsize`. We
shouldn't care about `nobuiltin` here: `allocsize` is explicitly added
by the user, not inferred based on a function's symbol.
llvm-svn: 290588
This also makes us no longer check for `allocsize` on intrinsic calls.
This shouldn't matter, since intrinsics should provide the information
we get from `allocsize` on their own.
llvm-svn: 290585
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
This builds on r290554 which added supported for 128 and 256-bit.
llvm-svn: 290582
The 128 and 256 bit masked intrinsics are currently unused by clang. The sse and avx2 unmasked intrinsics are used instead. The new 512-bit intrinsic will be used to do the same. Then all masked versions will removed and autoupgraded.
llvm-svn: 290573
An earlier commit added support for unmasked scalar operations. At that time isel wouldn't generate an optimal sequence for masked operations, but that has now been fixed.
llvm-svn: 290566
inside of `InlineFunction`. Prior to this, call instructions are
specifically being rewritten and replaced within the inlined region,
invalidating some of the call sites.
Several of these regions are using the same technique to walk the
inlined region so this seems clearly safe up to this point.
I've also added a short circuit to the scan for call sites based on what
other code is doing.
With this, the most common crash I've found in the new inliner code is
fixed. I've turned it on for another test case that covers this
scenario.
I'll make my way through most of the other inliner test cases
just to get some easy coverage next.
llvm-svn: 290562
removing fully-dead comdats without removing dead entries in comdats
with live members.
This factors the core logic out of the current inliner's internals to
a reusable utility and leverages that in both places. The factored out
code should also be (minorly) more efficient in cases where we have very
few dead functions or dead comdats to consider.
I've added a test case to cover this behavior of the always inliner.
This is the last significant bug in the new PM's always inliner I've
found (so far).
llvm-svn: 290557
PMULDQ/PMULUDQ vXi64 instructions only use the even numbered v2Xi32 input elements which SimplifyDemandedVectorElts should try and use.
Differential Revision: https://reviews.llvm.org/D28119
llvm-svn: 290554
Mostly use a bit more idiomatic C++ where we can,
so we can combine some things later.
Reviewers: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D28111
llvm-svn: 290550
Summary:
I only do this for unmasked cases for now because isel is failing to fold the mask. I'll try to fix that soon.
I'll do the same thing for packed add/sub/mul/div in a future patch.
Reviewers: delena, RKSimon, zvi, craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27879
llvm-svn: 290535
Summary:
This patch adds support for converting the masked vpermv intrinsics into shufflevector instructions if the indices are constants.
We also need to wrap a select instruction around the shuffle to take care of the masking part. InstCombine will take care of optimizing the select if the mask is constant so I didn't bother checking for that.
Reviewers: zvi, delena, spatel, RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27825
llvm-svn: 290530
multiple asynchronous RPC calls.
ParallelCallGroup allows multiple asynchronous calls to be dispatched,
and provides a wait method that blocks until all asynchronous calls have
been executed on the remote and all return value handlers run on the
local machine.
This will allow, for example, the JIT client to issue memory allocation calls
for all sections in parallel, then block until all memory has been allocated
on the remote and the allocated addresses registered with the client, at which
point the JIT client can proceed to applying relocations.
llvm-svn: 290523
This makes it explicit what is the exact list to handle, and it
looks much more easy to manipulate and understand that the
previous custom tracking of min/max to express the range where
to look for.
Differential Revision: https://reviews.llvm.org/D28089
llvm-svn: 290507
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.
Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.
Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?
Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.
llvm-svn: 290457
Pretty boring and lame as-is but necessary. This is definitely a place
we'll end up with extension hooks longer term. =]
Differential Revision: https://reviews.llvm.org/D28076
llvm-svn: 290449
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this not ideal design because calling skipLoop() will result in
this state not being cleaned up at times and an assertion firing
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's not ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor, see the review for a complete discussion.
Differential Revision: https://reviews.llvm.org/D25848
llvm-svn: 290427
According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit
(W-unit in AArch64SchedA57.td, the same as cryptography instructions),
not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA).
This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW
instead of A57UnitX. Also, latencies for those instructions are
corrected.
Patch by Andrew Zhogin.
llvm-svn: 290426
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element, otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
Reviewers: mkuper, MatzeB, aprantl
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290423
1.Fix pessimized case in FIXME.
2.Add tests for it.
3.The canonicalisation on shifts results in different sequence for
tests of machine-licm.Correct some check lines.
Differential Revision: https://reviews.llvm.org/D27916
llvm-svn: 290410
This patch fixes some ASAN unittest failures on FreeBSD. See the
cfe-commits email thread for r290169 for more on those.
According to the LangRef, the allocsize attribute only tells us about
the number of bytes that exist at the memory location pointed to by the
return value of a function. It does not necessarily mean that the
function will only ever allocate. So, we need to be very careful about
treating functions with allocsize as general allocation functions. This
patch makes us fully conservative in this regard, though I suspect that
we have room to be a bit more aggressive if we want.
This has a FIXME that can be fixed by a relatively straightforward
refactor; I just wanted to keep this patch minimal. If this sticks, I'll
come back and fix it in a few days.
llvm-svn: 290397
Summary:
This change rewrites a core component in the ImplicitNullChecks pass for
greater simplicity since the original design was over-complicated for no
good reason. Please review this as essentially a new pass. The change
is almost NFC and I've added a test case for a scenario that this new
code handles that wasn't handled earlier.
The implicit null check pass, at its core, is a code hoisting transform.
It differs from "normal" code transforms in that it speculates
potentially faulting instructions (by design), but a lot of the usual
hazard detection logic (register read-after-write etc.) still applies.
We previously detected hazards by keeping track of registers defined and
used by machine instructions over an instruction range, but that was
unwieldy and did not actually confer any performance benefits. The
intent was to have linear time complexity over the number of machine
instructions considered, but it ended up being N^2 is practice.
This new version is more obviously O(N^2) (with N capped to 8 by
default) in hazard detection. It does not attempt to be clever in
tracking register uses or defs (the previous cleverness here was a
source of bugs).
Once this is checked in, I'll extract out the `IsSuitableMemoryOp` and
`CanHoistLoadInst` lambda into member functions (they're too complicated
to be inline lambdas) and do some other related NFC cleanups.
Reviewers: reames, anna, atrick
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27592
llvm-svn: 290394
This patch adds support for YAML<->DWARF for debug_info sections.
This re-lands r290147, reverted in 290148, re-landed in r290204 after fixing the issue that caused bots to fail (thank you UBSan!), and reverted again in r290209 due to failures on big endian systems.
After adding support for preserving endianness, this should be good now.
llvm-svn: 290386
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.
The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.
This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.
Differential Revision: https://reviews.llvm.org/D28012
llvm-svn: 290384
We used to not check generic vregs, but that is actually a mistake given
nothing in the GlobalISel pipeline is going to fix the constraints on
target specific instructions. Therefore, the target has to have them
right from the start.
llvm-svn: 290380
The InstructionSelect pass will not look at target specific instructions
since they are already selected. As a result, the operands of target
specific instructions must be properly constrained, because it is not
going to fix them.
This fixes invalid register classes on call instruction.
llvm-svn: 290377
When generic virtual registers get constrained, because of a use on a
target specific operation for instance, we end up with regular virtual
registers with a type and that's perfectly fine.
llvm-svn: 290376
This is going to be needed to be able to constraint register class on
target specific instruction while the RegBankSelect pass did not run
yet.
llvm-svn: 290375
Move the logic to constraint register from InstructionSelector to a
utility function. It will be required by other passes in the GlobalISel
pipeline.
llvm-svn: 290374
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.
This seems to usually be better but causes some code size regressions
in some tests.
llvm-svn: 290372