We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.
This only leaves one user of createTargetShuffleMask which we can hopefully get rid of in a similar manner.
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373641
Register indexing 64-bit elements is possible on the SALU, but not the
VALU. Handle splitting this into two 32-bit indexes. Extend waterfall
loop handling to allow moving a range of instructions.
llvm-svn: 373638
We can still do a waterfall loop over the index if using a VGPR to
index an SGPR. The result will still be a VGPR, but we can avoid the
wide copy of the source register to a VGPR.
llvm-svn: 373637
This allows propagating the include automatically to targets that
depend on one of the libc++ targets such as the benchmarks. Note
that the GoogleBenchmark build itself still needs to manually specify
the -include, since I don't know of any way to have an external project
link against one of the libc++ targets (which would propagate the -include
automatically).
llvm-svn: 373631
Summary:
This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instructions with one list per unique
address.
In the optimization phase instead of scanning through the basic block for mergeable
instructions, we now iterate over the lists generated by the pre-pass.
The decision to re-optimize a block is now made per list, so if we fail to merge any
instructions with the same address, then we do not attempt to optimize them in
future passes over the block. This will help to reduce the time this pass
spends re-optimizing instructions.
In one pathological test case, this change reduces the time spent in the
SILoadStoreOptimizer from 0.2s to 0.03s.
This restructuring will also make it possible to implement further solutions in
this pass, because we can now add less expensive checks to the pre-pass and
filter instructions out early which will avoid the need to do the expensive
scanning during the optimization pass. For example, checking for adjacent
offsets is an inexpensive test we can move to the pre-pass.
Reviewers: arsenm, pendingchaos, rampitec, nhaehnle, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65961
llvm-svn: 373630
The Hexagon code assumes there's no existing terminator when inserting its
trip count condition check.
This causes swp-stages5.ll to break. The generated code looks good to me,
it is likely a permutation. I have disabled the new codegen path to keep
everything green and will investigate along with the other 3-4 tests
that have different codegen.
Fixes expensive-checks build.
llvm-svn: 373629
ARM EHABI unwinding tables only store the start address of each function, so the
last function is assumed to cover the entire address space after it. The test
picks an address on the stack assuming that it's in no function, but because of
the above it's actually resolved to the last function. Fix this by using address
0 instead.
Differential Revision: https://reviews.llvm.org/D68387
llvm-svn: 373628
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use castAs<> directly and if not assert will fire for us.
llvm-svn: 373626
The dsymutil implementation file has a using-directive for the llvm
namespace. This patch just removes redundant namespace qualifiers.
llvm-svn: 373623
This patch reimplements command line option parsing in dsymutil with
Tablegen and libOption. The main motivation for this change is to
prevent clashes with other cl::opt options defined in llvm. Although
it's a bit more heavyweight, it has some nice advantages such as no
global static initializers and better separation between the code and
the option definitions.
I also used this opportunity to improve how dsymutil deals with
incompatible options. Instead of having checks spread across the code,
everything is now grouped together in verifyOptions. The fact that the
options are no longer global means that we need to pass them around a
bit more, but I think it's worth the trade-off.
Differential revision: https://reviews.llvm.org/D68361
llvm-svn: 373622
During studying support for bitfield, I found an issue for
an example like the one in test offset-reloc-middle-chain.ll.
struct t1 { int c; };
struct s1 { struct t1 b; };
struct r1 { struct s1 a; };
#define _(x) __builtin_preserve_access_index(x)
void test1(void *p1, void *p2, void *p3);
void test(struct r1 *arg) {
struct s1 *ps = _(&arg->a);
struct t1 *pt = _(&arg->a.b);
int *pi = _(&arg->a.b.c);
test1(ps, pt, pi);
}
The IR looks like:
%0 = llvm.preserve.struct.access(base, ...)
%1 = llvm.preserve.struct.access(%0, ...)
%2 = llvm.preserve.struct.access(%1, ...)
using %0, %1 and %2
In this case, we need to generate three relocatiions
corresponding to chains: (%0), (%0, %1) and (%0, %1, %2).
After collecting all the chains, the current implementation
process each chain (in a map) with code generation sequentially.
For example, after (%0) is processed, the code may look like:
%0 = base + special_global_variable
// llvm.preserve.struct.access(base, ...) is delisted
// from the instruction stream.
%1 = llvm.preserve.struct.access(%0, ...)
%2 = llvm.preserve.struct.access(%1, ...)
using %0, %1 and %2
When processing chain (%0, %1), the current implementation
tries to visit intrinsic llvm.preserve.struct.access(base, ...)
to get some of its properties and this caused segfault.
This patch fixed the issue by remembering all necessary
information (kind, metadata, access_index, base) during
analysis phase, so in code generation phase there is
no need to examine the intrinsic call instructions.
This also simplifies the code.
Differential Revision: https://reviews.llvm.org/D68389
llvm-svn: 373621
We can point to the target region + emit parent functions names/real var
names if they were not found in host module during device codegen.
llvm-svn: 373620
These old aliases were renamed, but are still used by some projects (eg newlib).
Differential Revision: https://reviews.llvm.org/D68392
llvm-svn: 373618
The static analyzer is warning about potential null dereferences, but in these cases we should be able to use castAs<> directly and if not assert will fire for us.
llvm-svn: 373612
It allows using "Size" with or without "Content" in YAML descriptions of
SHT_LLVM_ADDRSIG sections.
Differential revision: https://reviews.llvm.org/D68334
llvm-svn: 373610
Summary:
Those functions started being mistakenly exported from the libc++abi
shared library after commit r344152 in 2018. Removing these symbols is
technically an ABI break. However, they are not part of the C++ ABI,
they haven't ever been re-exported from libc++, and they are not
declared in any public header, so it's very unlikely that calls to
these functions exist out there. Also, the functions have reserved
names, so any impacted user would have to have tried really hard
being broken by this removal.
Note that avoiding this kind of problem is exactly why we're now
controlling exported symbols explicitly with a textual list.
Also note that applying the hidden visibility attribute is necessary
because the list of exported symbols is only used on Apple platforms
for the time being.
Reviewers: phosek, mclow.lists, EricWF
Subscribers: christof, jkorous, dexonsmith, libcxx-commits
Tags: #libc
Differential Revision: https://reviews.llvm.org/D68357
llvm-svn: 373602
Summary: This PR creates a utility class called ValueProfileCollector that tells PGOInstrumentationGen and PGOInstrumentationUse what to value-profile and where to attach the profile metadata. It then refactors logic scattered in PGOInstrumentation.cpp into two plugins that plug into the ValueProfileCollector.
Authored By: Wael Yehia <wyehia@ca.ibm.com>
Reviewer: davidxl, tejohnson, xur
Reviewed By: davidxl, tejohnson, xur
Subscribers: llvm-commits
Tag: #llvm
Differential Revision: https://reviews.llvm.org/D67920
Patch By Wael Yehia <wyehia@ca.ibm.com>
llvm-svn: 373601
As we have previously estabilished, `sub` is an outcast,
and should be considered non-canonical iff it can be converted to `add`.
It can be converted to `add` if it's second operand can be negated.
So far we mostly only do that for constants and negation itself,
but we should be more through.
llvm-svn: 373597
Having a precompiled binary here is excessive.
I also added a few missing tags.
Differential revision: https://reviews.llvm.org/D68386
llvm-svn: 373594
Summary:
This revision adds three new Stencil combinators:
* `expression`, which idiomatically constructs the source for an expression,
including wrapping the expression's source in parentheses if needed.
* `deref`, which constructs an idiomatic dereferencing expression.
* `addressOf`, which constructs an idiomatic address-taking expression.
Reviewers: gribozavr
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D68315
llvm-svn: 373593
llvm-readobj "non-standard" flags `--mips-plt-got`, `--mips-abi-flags`,
`--mips-reginfo`, and `--mips-options` are superseded by the `--arch-specific`
flag and can be removed now.
llvm-svn: 373590