Summary:
Add a hook to the GCMetadataPrinter for emitting stack maps in
custom format. The hook will be called at stack map generation
time. The default stack map format is used if there is no hook.
For this to be useful a few data structures and accessors are
exposed from the StackMaps class, so the custom printer can
access the stack map data.
This patch authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, apilipenko, reames
Reviewed By: reames
Subscribers: reames, apilipenko, nemanjai, javed.absar, kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D53892
llvm-svn: 347584
ParentTy is never used other than an assignment, and since it is a
pointer, there is no side effect. Some versions of GCC notice and warn
on this.
Change-Id: I37dc1a18c7b58040419afb803621de13d8904a8f
llvm-svn: 347581
Summary:
STATEPOINT records its args' locations on stack relative to SP.
If the SP is changed, take that into account.
This patch authored by Cherry Zhang <cherryyz@google.com>.
Reviewers: thanm, reames
Reviewed By: reames
Subscribers: reames, llvm-commits
Differential Revision: https://reviews.llvm.org/D53603
llvm-svn: 347569
This source file has not been needed since r346522 and was triggering diagnostics in MSVC about an object file which exports no public symbols (LNK4221).
llvm-svn: 347565
We can now select CLZ via the TableGen'erated code, so support G_CTLZ
and G_CTLZ_ZERO_UNDEF throughout the pipeline for types <= s32.
Legalizer:
If the CLZ instruction is available, use it for both G_CTLZ and
G_CTLZ_ZERO_UNDEF. Otherwise, use a libcall for G_CTLZ_ZERO_UNDEF and
lower G_CTLZ in terms of it.
In order to achieve this we need to add support to the LegalizerHelper
for the legalization of G_CTLZ_ZERO_UNDEF for s32 as a libcall (__clzsi2).
We also need to allow lowering of G_CTLZ in terms of G_CTLZ_ZERO_UNDEF
if that is supported as a libcall, as opposed to just if it is Legal or
Custom. Due to a minor refactoring of the helper function in charge of
this, we will also allow the same behaviour for G_CTTZ and G_CTPOP.
This is not going to be a problem in practice since we don't yet have
support for treating G_CTTZ and G_CTPOP as libcalls (not even in
DAGISel).
Reg bank select:
Map G_CTLZ to GPR. G_CTLZ_ZERO_UNDEF should not make it to this point.
Instruction select:
Nothing to do.
llvm-svn: 347545
This should likely be adjusted to limit this transform
further, but these diffs should be clear wins.
If we have blendv/conditional move, then we should assume
those are cheap ops. The loads become independent of the
compare, so those can be speculated before we need to use
the values in the blend/mov.
llvm-svn: 347526
rL347502 moved the null sibling, so we should group all of these
together. I'm not sure why these aren't methods of the SDValue
class itself, but that's another patch if that's possible.
llvm-svn: 347523
...and use them to avoid creating obviously undef values as
discussed in the post-commit thread for r347478.
The diffs in vector div/rem show that we were missing real
optimizations by creating bogus shift nodes.
llvm-svn: 347502
This code takes a truncate, fp_to_int, or int_to_fp with a legal result type and an input type that needs to be split and enlarges the elements in the result type before doing the split. Then inserts a follow up truncate or fp_round after concatenating the two halves back together.
But if the input type of the original op is being split on its way to ultimately being scalarized we're just going to end up building a vector from scalars and then truncating or rounding it in the vector register. Seems kind of silly to enlarge the result element type of the operation only to end up with scalar code and then building a vector with large elements only to make the elements smaller again in the vector register. Seems better to just try to get away producing smaller result types in the scalarized code.
The X86 test case that changes is a pretty contrived test case that exists because of a bug we used to have in our AVG matching code. I think the code is better now, but its not realistic anyway.
llvm-svn: 347482
SplitVecOp_TruncateHelper tries to introduce a multilevel truncate to avoid scalarization. But if splitting the result type would still be a legal type we don't need to do that.
The comment block at the top of the function implied that this was already implemented. I looked back through the history and it doesn't look to have ever been checked.
llvm-svn: 347479
We fail to canonicalize IR this way (prefer 'not' ops to arbitrary 'xor'),
but that would not matter without this patch because DAGCombiner was
reversing that transform. I think we need this transform in the backend
regardless of what happens in IR to catch cases where the shift-xor
is formed late from GEP or other ops.
https://rise4fun.com/Alive/NC1
Name: shl
Pre: (-1 << C2) == C1
%shl = shl i8 %x, C2
%r = xor i8 %shl, C1
=>
%not = xor i8 %x, -1
%r = shl i8 %not, C2
Name: shr
Pre: (-1 u>> C2) == C1
%sh = lshr i8 %x, C2
%r = xor i8 %sh, C1
=>
%not = xor i8 %x, -1
%r = lshr i8 %not, C2
https://bugs.llvm.org/show_bug.cgi?id=39657
llvm-svn: 347478
GCC does it this way, and we have to be consistent. This includes
stdcall and fastcall functions with suffixes. I confirmed that a
fastcall function named "foo" ends up in ".text$foo", not
".text$@foo@8".
Based on a patch by Andrew Yohn!
Fixes PR39218.
Differential Revision: https://reviews.llvm.org/D54762
llvm-svn: 347431
This transform needs to be limited.
We are converting to a constant pool load very early, and we
are turning loads that are independent of the select condition
(and therefore speculatable) into a dependent non-speculatable
load.
We may also be transferring a condition code from an FP register
to integer to create that dependent load.
llvm-svn: 347424
This is another step in vector narrowing - a follow-up to D53784
(and hoping to eventually squash potential regressions seen in
D51553).
The x86 test diffs are wins, but the AArch64 diff is probably not.
That problem already exists independent of this patch (see PR39722), but it
went unnoticed in the previous patch because there were no regression tests
that showed the possibility.
The x86 diff in i64-mem-copy.ll is close. Given the frequency throttling
concerns with using wider vector ops, an extra extract to reduce vector
width is the right trade-off at this level of codegen.
Differential Revision: https://reviews.llvm.org/D54392
llvm-svn: 347356
When you have a member function with a ref-qualifier, for example:
struct Foo {
void Func() &;
void Func2() &&;
};
clang-cl was not emitting this information. Doing so is a bit
awkward, because it's not a property of the LF_MFUNCTION type, which
is what you'd expect. Instead, it's a property of the this pointer
which is actually an LF_POINTER. This record has an attributes
bitmask on it, and our handling of this bitmask was all wrong. We
had some parts of the bitmask defined incorrectly, but importantly
for this bug, we didn't know about these extra 2 bits that represent
the ref qualifier at all.
Differential Revision: https://reviews.llvm.org/D54667
llvm-svn: 347354
This is for compatibility with MSVC, which also marks this pointers
as being const-qualified.
Fixes llvm.org/pr36526
Differential Revision: https://reviews.llvm.org/D54736
llvm-svn: 347353
This uncovered an off-by-one typo in SimplifyDemandedVectorElts's INSERT_SUBVECTOR handling as its bounds check was bailing on safe indices.
llvm-svn: 347313
For bitcast nodes from larger element types, add the ability for SimplifyDemandedVectorElts to call SimplifyDemandedBits by merging the elts mask to a bits mask.
I've raised https://bugs.llvm.org/show_bug.cgi?id=39689 to deal with the few places where SimplifyDemandedBits's lack of vector handling is a problem.
Differential Revision: https://reviews.llvm.org/D54679
llvm-svn: 347301
Summary:
We already support this for scalars, but it was explicitly disabled for vectors. In the updated test cases this allows us to see the upper bits are zero to use less multiply instructions to emulate a 64 bit multiply.
This should help with this ispc issue that a coworker pointed me to https://github.com/ispc/ispc/issues/1362
Reviewers: spatel, efriedma, RKSimon, arsenm
Reviewed By: spatel
Subscribers: wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D54725
llvm-svn: 347287
Consistently use (!LegalOperations || isOperationLegalOrCustom) for all node pairs.
Differential Revision: https://reviews.llvm.org/D53478
llvm-svn: 347255
As discussed on D53794, for float types with ranges smaller than the destination integer type, then we should be able to just use a regular FP_TO_SINT opcode.
I thought we'd need to provide MSA test cases for very small integer types as well (fp16 -> i8 etc.), but it turns out that promotion will kick in so they're unnecessary.
Differential Revision: https://reviews.llvm.org/D54703
llvm-svn: 347251
This will hold flags specific to subprograms. In the future
we could potentially free up scarce bits in DIFlags by moving
subprogram-specific flags from there to the new flags word.
This patch does not change IR/bitcode formats, that will be
done in a follow-up.
Differential Revision: https://reviews.llvm.org/D54597
llvm-svn: 347239
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.
The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).
Differential Revision: https://reviews.llvm.org/D52653
llvm-svn: 347208
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI(). I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.
Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively. Most methods use BB as the argument of variable names while
methods usually refer to Basic Blocks as Blocks, instead of BB. For example,
Function::getEntryBlock, Loop:getExitBlock, etc.
I also fixed one of the comments.
Patch by Rodrigo Caetano Rocha!
Differential Revision: https://reviews.llvm.org/D54669
llvm-svn: 347182
Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code,
but copying the folds seems to be the standard method to ensure
that we don't miss these folds.
Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no
way to ensure that we do these kinds of simplifications unless the
code is repeated at node creation time and during combines.
There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167
I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.
This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023
llvm-svn: 347170
For example, on X86 we emit a sign_extend_vector_inreg from LowerLoad and without sse4.1 this node will need further legalization. Previously this sign_extend_vector_inreg was being custom lowered during DAG legalization instead of vector op legalization.
Unfortunately, this doesn't seem to matter for the output of any existing lit tests.
llvm-svn: 347094
Summary:
Experience has shown that the functionality is useful. It makes linking
optimized clang with debug info for me a lot faster, 20s to 13s. The
type merging phase of PDB writing goes from 10s to 3s.
This removes the LLVM cl::opt and replaces it with a metadata flag.
After this change, users can do the following to use ghash:
- add -gcodeview-ghash to compiler flags
- replace /DEBUG with /DEBUG:GHASH in linker flags
Reviewers: zturner, hans, thakis, takuto.ikuta
Subscribers: aprantl, hiraditya, JDevlieghere, llvm-commits
Differential Revision: https://reviews.llvm.org/D54370
llvm-svn: 347072