multiple asynchronous RPC calls.
ParallelCallGroup allows multiple asynchronous calls to be dispatched,
and provides a wait method that blocks until all asynchronous calls have
been executed on the remote and all return value handlers run on the
local machine.
This will allow, for example, the JIT client to issue memory allocation calls
for all sections in parallel, then block until all memory has been allocated
on the remote and the allocated addresses registered with the client, at which
point the JIT client can proceed to applying relocations.
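As an illustration of the pattern only (not the actual ORC RPC API), the behavior is analogous to dispatching a batch of futures and then blocking on all of them; everything below is a stand-in:

    #include <cstdint>
    #include <future>
    #include <vector>

    int main() {
      std::vector<std::future<uint64_t>> Allocs;
      // Dispatch several asynchronous "calls" in parallel.
      for (unsigned Section = 0; Section != 4; ++Section)
        Allocs.push_back(std::async(std::launch::async, [Section] {
          return uint64_t(0x1000) * (Section + 1); // stand-in for a remote allocation
        }));
      // The analogue of wait(): block until every call has completed and
      // every result has been handled locally.
      for (auto &F : Allocs)
        (void)F.get();
      // ...safe to apply relocations here.
    }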
llvm-svn: 290523
This makes explicit what the exact list to handle is, and it
is much easier to manipulate and understand than the
previous custom tracking of min/max to express the range
in which to look.
Differential Revision: https://reviews.llvm.org/D28089
llvm-svn: 290507
whether functions are removed, and fix the new PM's always inliner to
actually pass this test.
Without this, the new PM's always inliner leaves all the functions
kicking around which won't work out very well given the semantics of
always inline.
Doing this really highlights how frustrating the current alwaysinline
semantic contract is though -- why can we put it on *external*
functions, etc?
Also I've added a number of tricky and interesting test cases for
removing functions with the always inliner. There is one remaining case
not handled -- fully removing comdats -- and I've left a FIXME about
this.
llvm-svn: 290457
Pretty boring and lame as-is but necessary. This is definitely a place
we'll end up with extension hooks longer term. =]
Differential Revision: https://reviews.llvm.org/D28076
llvm-svn: 290449
The pass creates some state which expects to be cleaned up by
a later instance of the same pass. opt-bisect happens to expose
this less-than-ideal design because calling skipLoop() will at times
leave this state uncleaned, causing an assertion to fire
in `doFinalization()`. Chandler tells me the new pass manager will
give us options to avoid these design traps, but until it's ready,
we need a workaround for the current pass infrastructure. Fix provided
by Andy Kaylor; see the review for a complete discussion.
Differential Revision: https://reviews.llvm.org/D25848
llvm-svn: 290427
According to the Cortex-A57 doc, FDIV/FSQRT instructions should use F0 unit
(W-unit in AArch64SchedA57.td, the same as cryptography instructions),
not F1 unit (X-unit in td, like ASIMD absolute diff accum SABA/UABA).
This patch changes FDIV/FSQRT scheduling declarations to use A57UnitW
instead of A57UnitX. Also, latencies for those instructions are
corrected.
Patch by Andrew Zhogin.
llvm-svn: 290426
Summary:
In mergeSPUpdates, debug values need to be ignored when getting the
previous element; otherwise debug data could have an impact on codegen.
In eliminateCallFramePseudoInstr, debug values after the erased element
could have an impact on codegen and should be skipped.
Closes PR31319 (https://llvm.org/bugs/show_bug.cgi?id=31319)
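A minimal sketch of the kind of scan involved (simplified, not the committed code): when walking backwards for the previous instruction, DBG_VALUEs must be stepped over so their presence cannot change the result:

    #include "llvm/CodeGen/MachineBasicBlock.h"
    #include "llvm/CodeGen/MachineInstr.h"
    using namespace llvm;

    // Step backwards over DBG_VALUEs so they cannot influence which
    // instruction the frame-lowering code looks at.
    static MachineBasicBlock::iterator
    prevRealInst(MachineBasicBlock &MBB, MachineBasicBlock::iterator I) {
      while (I != MBB.begin()) {
        --I;
        if (!I->isDebugValue())
          return I;
      }
      return MBB.end(); // sentinel: no previous non-debug instruction
    }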
Reviewers: mkuper, MatzeB, aprantl
Subscribers: gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D27688
llvm-svn: 290423
1. Fix the pessimized case mentioned in the FIXME.
2. Add tests for it.
3. The canonicalization of shifts results in a different sequence in the
machine-licm tests; correct some check lines.
Differential Revision: https://reviews.llvm.org/D27916
llvm-svn: 290410
This patch fixes some ASAN unittest failures on FreeBSD. See the
cfe-commits email thread for r290169 for more on those.
According to the LangRef, the allocsize attribute only tells us about
the number of bytes that exist at the memory location pointed to by the
return value of a function. It does not necessarily mean that the
function will only ever allocate. So, we need to be very careful about
treating functions with allocsize as general allocation functions. This
patch makes us fully conservative in this regard, though I suspect that
we have room to be a bit more aggressive if we want.
This has a FIXME that can be fixed by a relatively straightforward
refactor; I just wanted to keep this patch minimal. If this sticks, I'll
come back and fix it in a few days.
llvm-svn: 290397
Summary:
This change rewrites a core component in the ImplicitNullChecks pass for
greater simplicity since the original design was over-complicated for no
good reason. Please review this as essentially a new pass. The change
is almost NFC and I've added a test case for a scenario that this new
code handles that wasn't handled earlier.
The implicit null check pass, at its core, is a code hoisting transform.
It differs from "normal" code transforms in that it speculates
potentially faulting instructions (by design), but a lot of the usual
hazard detection logic (register read-after-write etc.) still applies.
We previously detected hazards by keeping track of registers defined and
used by machine instructions over an instruction range, but that was
unwieldy and did not actually confer any performance benefits. The
intent was to have linear time complexity over the number of machine
instructions considered, but it ended up being N^2 in practice.
This new version is more obviously O(N^2) (with N capped to 8 by
default) in hazard detection. It does not attempt to be clever in
tracking register uses or defs (the previous cleverness here was a
source of bugs).
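A toy model of the shape of the new check (made-up types; the real pass works on MachineInstrs and checks more hazard kinds than this):

    #include <cstddef>
    #include <vector>

    struct Insn { unsigned Def; unsigned Use; }; // made-up stand-in type

    // Quadratic scan over a small window: with the candidate count capped
    // at 8 by default, the O(N^2) bound is harmless in practice.
    static bool hasReadAfterWriteHazard(const std::vector<Insn> &Window) {
      for (std::size_t I = 0; I != Window.size(); ++I)  // Window.size() <= 8
        for (std::size_t J = 0; J != I; ++J)
          if (Window[I].Use == Window[J].Def)           // later read of earlier def
            return true;
      return false;
    }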
Once this is checked in, I'll extract the `IsSuitableMemoryOp` and
`CanHoistLoadInst` lambdas into member functions (they're too complicated
to be inline lambdas) and do some other related NFC cleanups.
Reviewers: reames, anna, atrick
Subscribers: mcrosier, llvm-commits
Differential Revision: https://reviews.llvm.org/D27592
llvm-svn: 290394
This patch adds support for YAML<->DWARF for debug_info sections.
This re-lands r290147, reverted in r290148, re-landed in r290204 after fixing the issue that caused bots to fail (thank you UBSan!), and reverted again in r290209 due to failures on big-endian systems.
After adding support for preserving endianness, this should be good now.
llvm-svn: 290386
Use a dummy private function with inline asm calls instead of module
level asm blocks for CFI jumptables.
The main advantage is that now jumptable codegen can be affected by
the function attributes (like target_cpu on ARM). Module level asm
gets the default subtarget based on the target triple, which is often
not good enough.
This change also uses asm constraints/arguments to reference
jumptable targets and aliases directly. We no longer do asm name
mangling in an IR pass.
Differential Revision: https://reviews.llvm.org/D28012
llvm-svn: 290384
We used to not check generic vregs, but that was actually a mistake, given
that nothing in the GlobalISel pipeline is going to fix the constraints on
target specific instructions. Therefore, the target has to get them
right from the start.
llvm-svn: 290380
The InstructionSelect pass will not look at target specific instructions
since they are already selected. As a result, the operands of target
specific instructions must be properly constrained, because it is not
going to fix them.
This fixes invalid register classes on the call instruction.
llvm-svn: 290377
When generic virtual registers get constrained, because of a use on a
target specific operation for instance, we end up with regular virtual
registers with a type and that's perfectly fine.
llvm-svn: 290376
This is going to be needed to be able to constrain the register class on
target specific instructions while the RegBankSelect pass has not run
yet.
llvm-svn: 290375
Move the logic to constrain registers from InstructionSelector to a
utility function. It will be required by other passes in the GlobalISel
pipeline.
llvm-svn: 290374
Canonicalize a select with a constant to the false side. This
enables more instruction shrinking opportunities since an
inline immediate can be used for the false side of v_cndmask_b32_e32.
This seems to usually be better but causes some code size regressions
in some tests.
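A toy model of the rewrite (names made up; the real change lives in the AMDGPU lowering): moving a constant to the false side means inverting the condition and swapping the operands:

    // select cond, C, x  -->  select !cond, x, C
    struct Sel { bool Cond; int TVal; int FVal; };

    static Sel canonicalizeConstToFalse(Sel S, bool TrueIsConst) {
      if (TrueIsConst)
        return {!S.Cond, S.FVal, S.TVal}; // constant now on the false side
      return S;
    }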
llvm-svn: 290372
This is a follow-up patch to https://reviews.llvm.org/D22840 to address the
issue when a value to be merged into an int64 pair is in a different BB. It redoes
the store splitting in CodeGenPrepare so we can match the pattern across multiple
BBs and move some instructions into the same BB. We still keep the code in DAG
combine so that we can catch cases that show up after DAG combining runs.
Differential Revision: https://reviews.llvm.org/D25914
llvm-svn: 290365
This lets splitMergedValStore in DAG combine share the target query interface
with the similar logic in CodeGenPrepare.
Differential Revision: https://reviews.llvm.org/D24707
llvm-svn: 290363
Follow-up to the D27209 fix; this patch now properly handles a single transient
instruction in a basic block.
Patch by Aleksandar Beserminji.
Differential Revision: https://reviews.llvm.org/D27856
llvm-svn: 290361
When the pipeliner is renaming phi values, it may need to iterate through
the phi operands to check for other phis. However, the pipeliner should
stop once it reaches a phi that is outside the pipelined loop.
Also, when the generateExistingPhis code is unable to reuse an existing
phi, the default code that computes the PhiOp2 is only to be used when
the pipeliner is generating the kernel. Otherwise, the phi may be a value
computed earlier in the same epilog.
Patch by Brendon Cahoon.
llvm-svn: 290355
The code has been developed by Daniel Berlin over the years, and
the goal of the new implementation is to address shortcomings of
the current GVN infrastructure: long compile times for large
testcases, lack of phi predication, no load/store value numbering,
etc.
The current code just implements the "core" GVN algorithm, although
other pieces (load coercion, phi handling, predicate system) are
already implemented in a branch out of tree. Once the core is stable,
we'll start adding pieces on top of the base framework.
The tests currently living in test/Transforms/NewGVN are a copy
of the ones in GVN, with proper `XFAIL`s (for features missing in NewGVN).
A flag will be added in a future commit to enable NewGVN, so that
interested parties can exercise this code easily.
Differential Revision: https://reviews.llvm.org/D26224
llvm-svn: 290346
Summary: This is needed for later SDWA support in CodeGen.
Reviewers: vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27412
llvm-svn: 290338
Summary: Real instructions should copy constraints from the corresponding pseudo instruction. This allows the auto-generated disassembler to correctly process tied operands.
Reviewers: nhaustov, vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27847
llvm-svn: 290336
Replacing the memory operand in the ymm version of VPMADDWD from i128mem to i256mem.
Differential Revision: https://reviews.llvm.org/D28024
llvm-svn: 290333
declarations.
We're using a custom class here instead of the helper template; these
bits just didn't get deleted when the other bits did. This
was found by a really nice MSVC warning about explicitly instantiating
a template where some member functions aren't defined and thus can't be
instantiated.
llvm-svn: 290327
from the old pass manager in the new one.
I'm not trying to support (initially) the numerous options that are
currently available to customize the pass pipeline. If we end up really
wanting them, we can add them later, but I suspect many are no longer
interesting. The simplicity of omitting them will help a lot as we sort
out what the pipeline should look like in the new PM.
I've also documented to the best of my ability *why* each pass or group
of passes is used so that reading the pipeline is more helpful. In many
cases I think we have some questionable choices of ordering and I've
left FIXME comments in place so we know what to come back and revisit
going forward. But for now, I've left it as similar to the current
pipeline as I could.
Lastly, I've had to comment out several places where passes are not
ported to the new pass manager or where the loop pass infrastructure is
not yet ready. I did at least fix a few bugs in the loop pass
infrastructure uncovered by running the full pipeline, but I didn't want
to go too far in this patch -- I'll come back and re-enable these as the
infrastructure comes online. But I'd like to keep the comments in place
because I don't want to lose track of which passes need to be enabled
and where they go.
One thing that seemed like a significant API improvement was to require
that we don't build pipelines for O0. It seems to have no real benefit.
I've also switched back to returning pass managers by value as at this
API layer it feels much more natural to me for composition. But if
others disagree, I'm happy to go back to an output parameter.
I'm not 100% happy with the testing strategy currently, but it seems at
least OK. I may come back and try to refactor or otherwise improve this
in subsequent patches but I wanted to at least get a good starting point
in place.
Differential Revision: https://reviews.llvm.org/D28042
llvm-svn: 290325
When DwarfExpression is emitting a fragment that is located in a
register and that fragment is smaller than the register, and the
register must be composed from sub-registers (are you still with me?)
the last DW_OP_piece operation must not be larger than the size of the
fragment itself, since the last piece of the fragment could be smaller
than the last subregister that is being emitted.
rdar://problem/29779065
llvm-svn: 290324
There are helpers for testing for a constant or a constant build_vector,
and for splat ConstantFP vectors, but not for a ConstantFP or a
non-splat ConstantFP vector.
llvm-svn: 290317
This is to put the vector into a well-defined state. Apparently the state of a
vector after being moved from is valid but unspecified. Found with clang-tidy.
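For illustration, the guarantee in question (a generic C++ fact, not code from this patch):

    #include <utility>
    #include <vector>

    int main() {
      std::vector<int> A{1, 2, 3};
      std::vector<int> B = std::move(A);
      // A is valid but unspecified here; clear() puts it back into a
      // well-defined (empty) state before it is used again.
      A.clear();
      A.push_back(42);
    }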
llvm-svn: 290298
This adds a basic tablegen backend that analyzes the SelectionDAG
patterns to find simple ones that are eligible for GlobalISel-emission.
That's similar to FastISel, with one notable difference: we're not fed
ISD opcodes, so we need to map the SDNode operators to generic opcodes.
That's done using GINodeEquiv in TargetGlobalISel.td.
Otherwise, this is mostly boilerplate, and lots of filtering of any kind
of "complicated" pattern. On AArch64, this is sufficient to match G_ADD
up to s64 (to ADDWrr/ADDXrr) and G_BR (to B).
Differential Revision: https://reviews.llvm.org/D26878
llvm-svn: 290284
Each function summary has an attached list of type identifier GUIDs. The
idea is that during the regular LTO phase we would match these GUIDs to type
identifiers defined by the regular LTO module and store the resolutions in
a top-level "type identifier summary" (which will be implemented separately).
Differential Revision: https://reviews.llvm.org/D27967
llvm-svn: 290280
In order for the llvm DWARF parser to be used in LLDB we will need to be able to get the parent of a DIE. This patch adds that functionality by changing the DWARFDebugInfoEntry class to store a depth field instead of a sibling index. Using a depth field allows us to easily calculate the sibling and the parent without increasing the size of DWARFDebugInfoEntry.
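A simplified sketch of the depth-based scheme (hypothetical types, not the actual LLVM API): with DIEs stored flat in pre-order, the parent of an entry is the nearest preceding entry with a smaller depth, so neither a parent nor a sibling link needs to be stored:

    #include <cstdint>
    #include <vector>

    struct DIEInfo { uint32_t Depth; }; // hypothetical stand-in type

    // The parent of entry I is the nearest preceding entry whose depth
    // is smaller than I's depth.
    static int findParent(const std::vector<DIEInfo> &DIEs, int I) {
      for (int J = I - 1; J >= 0; --J)
        if (DIEs[J].Depth < DIEs[I].Depth)
          return J;
      return -1; // the unit DIE has no parent
    }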
I tested llvm-dsymutil on a debug build of clang, where this code fully parses the DWARF in over 1200 .o files, to verify there was no serious performance regression.
Added a full suite of unit tests to test this functionality.
Differential Revision: https://reviews.llvm.org/D27995
llvm-svn: 290274
RTDyldMemoryManager.cpp describes the differing __register_frame
API between libunwind and libgcc, with a mailing list posting URL.
The original link was 404; replace it with what I believe is the
intended post, as well as a reference to the "OS X" implementation in
libunwind.
Differential Revision: https://reviews.llvm.org/D27965
llvm-svn: 290269
The constantexpr parsing was too constrained and rejected legal vector GEPs.
This relaxes it to be similar to the checks used for instruction parsing.
This fixes PR30816.
Differential Revision: https://reviews.llvm.org/D28013
llvm-svn: 290261
For vector GEPs, CastGEPIndices can end up in an infinite recursion, because
we compare the vector type to the scalar pointer type, find them different,
and then try to cast a type to itself.
Differential Revision: https://reviews.llvm.org/D28009
llvm-svn: 290260
I added an API for creating a target specific memory node in the DAG. Today, all memory nodes are common to all targets and their constructors are located in SelectionDAG.cpp.
There are some cases in X86 where we need to create a special node - truncation-with-saturation store, float-to-half store.
In the current patch I added truncation-with-saturation nodes and I'm using them for intrinsics. In the future I plan to implement DAG lowering for the truncation-with-saturation pattern.
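In scalar terms, a truncation-with-saturation store computes something like the following (illustration only, not the new DAG node's implementation):

    #include <algorithm>
    #include <cstdint>

    // Clamp the wide value into the narrow type's range, then store it.
    static void truncSatStoreI16(int32_t V, int16_t *P) {
      *P = int16_t(std::max<int32_t>(INT16_MIN, std::min<int32_t>(V, INT16_MAX)));
    }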
Differential Revision: https://reviews.llvm.org/D27899
llvm-svn: 290250
The vectorcall calling convention specifies that arguments to functions are to be passed in registers, when possible.
vectorcall uses more registers for arguments than fastcall or the default x64 calling convention use.
The vectorcall calling convention is only supported in native code on x86 and x64 processors that include Streaming SIMD Extensions 2 (SSE2) and above.
The current implementation does not handle Homogeneous Vector Aggregates (HVAs) correctly; this review attempts to fix that.
This commit also includes additional lit tests to better cover HVA corner cases.
Differential Revision: https://reviews.llvm.org/D27392
llvm-svn: 290240
In r267672, where the loop distribution pragma was introduced, I tried
hard to keep the old behavior for opt: when opt is invoked
with -loop-distribute, it should distribute the loop (it's off by
default when run via the optimization pipeline).
As MichaelZ has discovered, this has the unintended consequence of
breaking a very common developer workflow for reproducing compilations
using opt: first you print clang's pass pipeline
with -debug-pass=Arguments, then you invoke opt with the returned
arguments.
clang -debug-pass will include -loop-distribute, but the pass is invoked
with default=off so nothing happens unless the loop carries the pragma,
while through opt (default=on) we would try to distribute all loops.
This changes opt's default to off as well to match clang. The tests are
modified to explicitly enable the transformation.
llvm-svn: 290235
we used to print UNKNOWN instructions when the instruction to be printed was not
yet inserted in any BB: in that case the pretty-printer was not able to
compute a TII, as the instruction did not belong to any BB or function yet.
This patch explicitly passes the TII to the pretty-printer.
Differential Revision: https://reviews.llvm.org/D27645
llvm-svn: 290228
No existing client is passing a non-null value here. This will come back
in a slightly different form as part of the type identifier summary work.
Differential Revision: https://reviews.llvm.org/D28006
llvm-svn: 290222
We're currently doing nearly the same thing for @llvm.objectsize in
three different places: two of them are missing checks for overflow,
and one of them could subtly break if InstCombine gets much smarter
about removing alloc sites. Seems like a good idea to not do that.
llvm-svn: 290214
GlobPattern is a class to handle glob pattern matching. Currently
only LLD is using it, but technically the feature is not specific
to linkers, so in this patch I move the file to LLVM.
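A sketch of the intended use (the exact interface lives in llvm/Support/GlobPattern.h; treat the calls below as an approximation):

    #include "llvm/ADT/StringRef.h"
    #include "llvm/Support/Error.h"
    #include "llvm/Support/GlobPattern.h"
    using namespace llvm;

    static bool matchesSharedLib(StringRef Name) {
      Expected<GlobPattern> Pat = GlobPattern::create("lib*.so*");
      if (!Pat) {
        consumeError(Pat.takeError()); // the pattern string was invalid
        return false;
      }
      return Pat->match(Name);
    }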
Differential Revision: https://reviews.llvm.org/D27969
llvm-svn: 290212
Summary:
In getRangeForAffineAR we compute ranges for affine exprs E = A + B*C,
where ranges for A, B, and C are known. To avoid overflow, we need to
operate on a bigger bitwidth, and originally we chose 2*x+1 for this
(x being the original bitwidth). However, it is safe to use just 2*x:
A + B*C <= (2^x - 1) + (2^x - 1)*(2^x - 1)
         = 2^x - 1 + 2^2x - 2^x - 2^x + 1
         = 2^2x - 2^x
        <= 2^2x - 1
Unnecessarily extending the bitwidth results in noticeable slowdowns: ranges
perform arithmetic operations using APInt, and those operations are much
slower when bitwidths are bigger than 64.
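A quick numeric sanity check of the bound for x = 8, carried out in 2x = 16 bits (illustration only):

    #include <cassert>
    #include <cstdint>

    int main() {
      const uint32_t Max8 = 0xFF;            // 2^8 - 1
      const uint32_t V = Max8 + Max8 * Max8; // A + B*C at its maximum
      assert(V == (1u << 16) - (1u << 8));   // equals 2^16 - 2^8
      assert(V <= (1u << 16) - 1);           // fits in 2x = 16 bits
    }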
Reviewers: sanjoy, majnemer, chandlerc
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27795
llvm-svn: 290211
This patch adds support for YAML<->DWARF for debug_info sections.
This re-lands r290147, after fixing the issue that caused bots to fail (thank you UBSan!).
llvm-svn: 290204
Also make the summary ref and call graph vectors immutable. This means
a smaller API surface and fewer places to audit for non-determinism.
Differential Revision: https://reviews.llvm.org/D27875
llvm-svn: 290200
Make it clear that TripCount is the upper bound of the iteration on which
control exits LatchBlock.
Differential Revision: https://reviews.llvm.org/D26675
llvm-svn: 290199
See https://reviews.llvm.org/D6678 for the history of
isExtractSubvectorCheap. Essentially the same considerations apply
to ARM.
This temporarily breaks the formation of vpadd/vpaddl in certain cases;
AddCombineToVPADDL essentially assumes that we won't form VUZP shuffles.
See https://reviews.llvm.org/D27779 for followup fix.
Differential Revision: https://reviews.llvm.org/D27774
llvm-svn: 290198
When the instruction is processed the first time, it may be
deleted, resulting in crashes. While the new test adds the same
user to the worklist twice, this particular case doesn't crash,
but I'm not sure why.
llvm-svn: 290191
I haven't managed to get this to fail yet, but it's technically possible for the AND -> shuffle decomposition to result in illegal types.
llvm-svn: 290183
Summary:
Without a MachineMemOperand, the scheduler was assuming MIMG instructions
were ordered memory references, so no loads or stores could be reordered
across them.
Reviewers: arsenm
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye
Differential Revision: https://reviews.llvm.org/D27536
llvm-svn: 290179
Include of llvm/IR/Verifier.h was removed from HexagonCommonGEP.cpp in r289604
as unused. In fact it is required when expensive checks are enabled, because
it declares the function `verifyFunction`, which is called in a conditionally
compiled part of the file.
llvm-svn: 290170
clear. The current RefSCC can occur in exactly one position so we should
just enforce that and leverage the property rather than checking for it
anywhere.
This addresses review comments made on another patch.
llvm-svn: 290162
This doesn't implement *every* feature of the existing inliner, but
tries to implement the most important ones for building a functional
optimization pipeline and beginning to sort out bugs, regressions, and
other problems.
Notable, but intentional omissions:
- No alloca merging support. Why? Because it isn't clear we want to do
this at all. Active discussion and investigation is going on to remove
it, so for simplicity I omitted it.
- No support for trying to iterate on "internally" devirtualized calls.
Why? Because it adds what I suspect is inappropriate coupling for
little or no benefit. We will have an outer iteration system that
tracks devirtualization including that from function passes and
iterates already. We should improve that rather than approximate it
here.
- Optimization remarks. Why? Purely to make the patch smaller, no other
reason at all.
The last one I'll probably work on almost immediately. But I wanted to
skip it in the initial patch to try to focus the change as much as
possible as there is already a lot of code moving around and both of
these *could* be skipped without really disrupting the core logic.
A summary of the different things happening here:
1) Adding the usual new PM class and rigging.
2) Fixing minor underlying assumptions in the inline cost analysis or
inline logic that don't generally hold in the new PM world.
3) Adding the core pass logic which is in essence a loop over the calls
in the nodes in the call graph. This is a bit duplicated from the old
inliner, but only a handful of lines could realistically be shared.
(I tried at first, and it really didn't help anything.) All told,
this is only about 100 lines of code, and most of that is the
mechanics of wiring up analyses from the new PM world.
4) Updating the LazyCallGraph (in the new PM) based on the *newly
inlined* calls and references. This is very minimal because we cannot
form cycles.
5) When inlining removes the last use of a function, eagerly nuking the
body of the function so that any "one use remaining" inline cost
heuristics are immediately refined, and queuing these functions to be
completely deleted once inlining is complete and the call graph
updated to reflect that they have become dead.
6) After all the inlining for a particular function, updating the
LazyCallGraph and the CGSCC pass manager to reflect the
function-local simplifications that are done immediately and
internally by the inline utilities. These are the exact same
fundamental set of CG updates done by arbitrary function passes.
7) Adding a bunch of test cases to specifically target CGSCC and other
subtle aspects in the new PM world.
Many thanks to the careful review from Easwaran and Sanjoy and others!
Differential Revision: https://reviews.llvm.org/D24226
llvm-svn: 290161
This patch implements PR31013 by introducing a
DIGlobalVariableExpression that holds a pair of DIGlobalVariable and
DIExpression.
Currently, DIGlobalVariable holds a DIExpression. This is not the
best way to model this:
(1) The DIGlobalVariable should describe the source level variable,
not how to get to its location.
(2) It makes it unsafe/hard to update the expressions when we call
replaceExpression on the DIGlobalVariable.
(3) It makes it impossible to represent a global variable that is in
more than one location (e.g., a variable with multiple
DW_OP_LLVM_fragment-s). We also moved away from attaching the
DIExpression to DILocalVariable for the same reasons.
This reapplies r289902 with additional testcase upgrades and a change
to the Bitcode record for DIGlobalVariable that makes upgrading the
old format unambiguous, also for variables without DIExpressions.
<rdar://problem/29250149>
https://llvm.org/bugs/show_bug.cgi?id=31013
Differential Revision: https://reviews.llvm.org/D26769
llvm-svn: 290153
Summary:
The expression for computing the return value of getMemOpBaseRegImmOfs has only
one possible value. The other value would result in a return earlier in the
function. This patch replaces the expression with its only possible value.
Reviewers: sanjoy
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D27437
llvm-svn: 290133
DWARF 4 and later supports encoding the PC as an address or as an offset from the low PC. Clients using DWARFDie should be insulated from how to extract the high PC value. This function takes care of extracting the form value and looking for the correct form.
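The underlying computation is simple once the form is known (sketch, not the actual DWARFDie method):

    #include <cstdint>

    // DW_AT_high_pc is either an absolute address or a constant offset
    // to be added to DW_AT_low_pc, depending on the attribute's form.
    static uint64_t computeHighPC(uint64_t LowPC, bool IsAddressForm,
                                  uint64_t Value) {
      return IsAddressForm ? Value : LowPC + Value;
    }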
Differential Revision: https://reviews.llvm.org/D27885
llvm-svn: 290131
This allows lowering i8 and i16 arguments if they can fit in the registers. Note
that the lowering is incomplete - ABI extensions are handled in a subsequent
patch.
(Last part of)
Differential Revision: https://reviews.llvm.org/D27704
llvm-svn: 290106
Teach the instruction selector and legalizer that it's ok to have adds with
8- or 16-bit integers.
This is the second part of https://reviews.llvm.org/D27704
llvm-svn: 290105