Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.
llvm-svn: 312450
In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern.
llvm-svn: 312448
Because it is common to treat vector types as an array of their elements, or
even some other type that's not the element type, and thus index into them, we
can't use struct-path TBAA for these accesses. Even though we already treat all
vector types as equivalent to 'char', we were using field-offset information
for them with TBAA, and this renders undefined the intra-value indexing we
intend to allow. Note that, although 'char' is universally aliasing, with path
TBAA, we can still differentiate between access to s.a and s.b in
struct { char a, b; } s;. We can't use this capability as-is for vector types.
Fixes PR33967.
llvm-svn: 312447
Throughout an effort to strongly check the behavior of CodeGen with the IR shufflevector instruction we generated many tests while predicting the best X86 sequence that may be generated.
This is a subset of the generated tests that we think may add value to our X86 set of tests.
Some of the checks are not optimal and will be changed after fixing:
1. PR34394
2. PR34382
3. PR34380
4. PR34359
Differential Revision: https://reviews.llvm.org/D37329
llvm-svn: 312442
Not all targets support vararg (e.g. amdgpu). Instead of using vararg in the emitted functions for enqueue_kernel,
this patch creates a temporary array of size_t, stores the size arguments in the temporary array
and passes it to the emitted functions for enqueue_kernel.
Differential Revision: https://reviews.llvm.org/D36678
llvm-svn: 312441
Summary:
This patch contains performance improvements for the `MinComplexityConstraint`. It reduces the constraint time when running on the SQLite codebase by around 43% (from 0.085s down to 0.049s).
The patch is essentially doing two things:
* It introduces a possibility for the complexity value to early exit when reaching the limit we were checking for. This means that once we noticed that the current clone is larger than the limit the user has set, we instantly exit and no longer traverse the tree or do further expensive lookups in the macro stack.
* It also removes half of the macro stack lookups we do so far. Previously we always checked the start and the end location of a Stmt for macros, which was only a middle way between checking all locations of the Stmt and just checking one location. In practice I rarely found cases where it really matters if we check start/end or just the start of a statement as code with lots of macros that somehow just produce half a statement are very rare.
Reviewers: NoQ
Subscribers: cfe-commits, xazax.hun, v.g.vassilev
Differential Revision: https://reviews.llvm.org/D34361
llvm-svn: 312440
The function combineShuffleToVectorExtend in DAGCombine might generate an illegal typed node after "legalize types" phase, causing assertion on non-simple type to fail afterwards.
Adding a type check in case the combine is running after the type legalize pass.
Differential Revision: https://reviews.llvm.org/D37330
llvm-svn: 312438
Extract the target specific option application. This is a huge switch
which was inlined into the `ConstructJob` option which adds a large
amount of code to the already large function. Extract it to simply
reduce the line count. NFC
llvm-svn: 312436
Out-of-line the logic for selecting the debug information handling.
This is still split across the new function and partially inline in the
job construction. This is needed since the split portion attempts to
record the "-cc1" arguments. This needs to be the very last item to
ensure that all the flags are recorded. NFC.
llvm-svn: 312435
attach by pid worked when running from the directory from which the
target was launched, but failed from a different directory. Use the
kern.proc.pathname sysctl to locate the target, falling back to the
original case of the target's argv[0] if that fails. Based on a patch
from Vignesh Balu.
Differential Revision: https://reviews.llvm.org/D32271
llvm-svn: 312430
Calling grow may result in an error if, for example, this is a callback
manager for a remote target. We need to be able to return this error to the
callee.
llvm-svn: 312429
This change introduces a subcommand to the llvm-xray tool called
"stacks" which allows for analysing XRay traces provided as inputs and
accounting time to stacks instead of just individual functions. This
gives us a more precise view of where in a program the latency is
actually attributed.
The tool uses a trie data structure to keep track of the caller-callee
relationships as we process the XRay traces. In particular, we keep
track of the function call stack as we enter functions. While we're
doing this we're adding nodes in a trie and indicating a "calls"
relatinship between the caller (current top of the stack) and the callee
(the new top of the stack). When we push function ids onto the stack, we
keep track of the timestamp (TSC) for the enter event.
When exiting functions, we are able to account the duration by getting
the difference between the timestamp of the exit event and the
corresponding entry event in the stack. This works even if we somehow
miss the exit events for intermediary functions (i.e. if the exit event
is not cleanly associated with the enter event at the top of the stack).
The output of the tool currently provides just the top N leaf functions
that contribute the most latency, and the top N stacks that have the
most frequency. In the future we can provide more sophisticated query
mechanisms and potentially an export to database feature to make offline
analysis of the stack traces possible with existing tools.
llvm-svn: 312426
FuzzMutate might not be the best place for these, but it makes more
sense than an entirely new library for now. This will make setting up
fuzz targets with consistent CLI handling easier.
llvm-svn: 312425
Summary:
ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort.
We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code.
Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one.
The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed.
Reviewers: spatel, zvi, igorb, guyblank, RKSimon
Reviewed By: guyblank
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37320
llvm-svn: 312422
In addition to removing chunks of duplicated code, we don't
want these to diverge. If there's a fold for one, there
should be a fold of the other via DeMorgan's Laws.
llvm-svn: 312420
Summary:
Move version control macros, find_first_existing_file and
find_first_existing_vc_file to AddLLVM.cmake so they can be reused by sub projects
like clang.
Differential Revision: https://reviews.llvm.org/D36971
llvm-svn: 312419
We had these locals:
Value *Op0RHS = LHS->getOperand(1);
Value *Op1LHS = RHS->getOperand(0);
...so we confusingly transposed the meaning of left/right and op0/op1.
llvm-svn: 312418
This makes it easier to see that they're almost duplicates.
As with the similar icmp functions, there should be identical
folds for both logic ops because those are DeMorganized variants.
llvm-svn: 312415
In case a PHI node follows an error block we can assume that the incoming value
can only come from the node that is not an error block. As a result, conditions
that seemed non-affine before are now in fact affine.
llvm-svn: 312410
The binutils utility dwp has an option "-e"
https://gcc.gnu.org/wiki/DebugFissionDWP
to specify an executable/library to get the list
of *.dwo files from it. This option is particularly useful when
someone runs the tool manually outside of a build system.
This diff adds an implementation of "-e" to llvm-dwp.
Test plan: make check-all
Differential revision: https://reviews.llvm.org/D37371
llvm-svn: 312409
Summary:
xmlDoc needs to be released with xmlFreeDoc.
XML_PARSE_NODICT is needed for safe moving nodes between documents.
Buffer returned from xmlDocDumpFormatMemoryEnc needs xmlFree, but it needs
outlive users of getMergedManifest results.
Reviewers: ecbeckmann, rnk, zturner, ruiu
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D37321
llvm-svn: 312406
We need to use target specific name for all runtimes targets. Target
specific name means the name of target in the LLVM build is different
from the name in runtimes build (in LLVM build, it's suffixed by the
target itself). Previously we have only used target specific names for
check targets collected through SUB_CHECK_TARGETS, but that's not
sufficient, we need to use target specific names for all targets we're
exposing in LLVM build.
Fixes PR34335.
Differential Revision: https://reviews.llvm.org/D37245
llvm-svn: 312405
This was added in CL 1.1
Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in_dual_input
v2: Add half support to shuffle2
Move shuffle2 to misc/
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312404
This was added in CL 1.1
Tested with a Radeon HD 7850 (Pitcairn) using the CL CTS via:
test_conformance/relationals/test_relationals shuffle_built_in
v2: Add half-precision support to shuffle when available.
Move to misc/ and add section 6.12.12 to clc.h
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312403
Uses the same mechanism to enable fp16 as we use for fp64 when
processing clc.h
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
llvm-svn: 312402
Summary:
After a discussion with Rekka, i believe this (or a small variant)
should fix the remaining phi-of-ops problems.
Rekka's algorithm for completeness relies on looking up expressions
that should have no leader, and expecting it to fail (IE looking up
expressions that can't exist in a predecessor, and expecting it to
find nothing).
Unfortunately, sometimes these expressions can be simplified to
constants, but we need the lookup to fail anyway. Additionally, our
simplifier outsmarts this by taking these "not quite right"
expressions, and simplifying them into other expressions or walking
through phis, etc. In the past, we've sometimes been able to find
leaders for these expressions, incorrectly.
This change causes us to not to try to phi of ops such expressions.
We determine safety by seeing if they depend on a phi node in our
block.
This is not perfect, we can do a bit better, but this should be a
"correctness start" that we can then improve. It also requires a
bunch of caching that i'll eventually like to eliminate.
The right solution, longer term, to the simplifier issues, is to make
the query interface for the instruction simplifier/constant folder
have the flags we need, so that we can keep most things going, but
turn off the possibly-invalid parts (threading through phis, etc).
This is an issue in another wrong code bug as well.
Reviewers: davide, mcrosier
Subscribers: sanjoy, llvm-commits
Differential Revision: https://reviews.llvm.org/D37175
llvm-svn: 312401