We currently only support binary instructions in the alternate opcode shuffles.
This patch is an initial attempt at adding cast instructions as well, this raises several issues that we probably want to address as we continue to generalize the alternate mechanism:
1 - Duplication of cost determination - we should probably add scalar/vector costs helper functions and get BoUpSLP::getEntryCost to use them instead of determining costs directly.
2 - Support alternate instructions with the same opcode (e.g. casts with different src types) - alternate vectorization of calls with different IntrinsicIDs will require this.
3 - Allow alternates to be a different instruction type - mixing binary/cast/call etc.
4 - Allow passthrough of unsupported alternate instructions - related to PR30787/D28907 'copyable' elements.
Differential Revision: https://reviews.llvm.org/D49135
llvm-svn: 336804
This mostly brings the P5600 scheduler model to a mostly complete
status. There are a number of instructions which trigger the
`error:'MipsP5600Model' lacks information for` error. These are certain
codegen only instructions relating to MIPS64 which can be addressed by
using the correct predicates for them. That will be done in a full-up
patch.
Patch by Simon Dardis.
Differential revision: https://reviews.llvm.org/D45245
llvm-svn: 336802
Summary:
Sema code complete in the recovery mode is generally useless. For many
cases, sema first completes in recovery context and then recovers to more useful
context, in which it's favorable to ignore results from recovery (as results are
often bad e.g. all builtin symbols and top-level symbols). There is also case
where only sema would fail to recover e.g. completions in excluded #if block.
Sema would try to give results, but the results are often useless (see the updated
excluded #if block test).
Reviewers: sammccall, ilya-biryukov
Subscribers: MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D49175
llvm-svn: 336801
Reuse this function as to test correctness and profitability of
reducing width of either load or store operations.
Reviewsers: samparker
Differential Revision: https://reviews.llvm.org/D48624
llvm-svn: 336800
This makes easier to identify changes in the instruction info flags. It also
helps spotting potential regressions similar to the one recently introduced at
r336728.
Using the same character to mark MayLoad/MayStore/HasSideEffects is problematic
for llvm-lit. When pattern matching substrings, llvm-lit consumes tabs and
spaces. A change in position of the flag marker may not trigger a test failure.
This patch only changes the character used for flag `hasSideEffects`. The reason
why I didn't touch other flags is because I want to avoid spamming the mailing
because of the massive diff due to the numerous tests affected by this change.
In future, each instruction flag should be associated with a different character
in the Instruction Info View.
llvm-svn: 336797
This fixes an issue that we were not properly supporting multiple reduction
stmts in a loop, and not generating SMLADs for these cases. The alias analysis
checks were done too early, making it too conservative.
Differential revision: https://reviews.llvm.org/D49125
llvm-svn: 336795
If we have 2 bitcode inputs for different targets, LLD would
print "<internal>" instead of the name of one of the files.
The patch adds a test and fixes this issue.
llvm-svn: 336794
AT_NAME was being emitted before the directory paths were remapped. This
ensures that all paths are remapped before anything is emitted.
An additional test case has been added.
Note that this only works if the replacement string is an absolute path.
If not, then AT_decl_file believes the new path is a relative path, and
joins that path with the compilation directory. I do not know of a good
way to resolve this.
Patch by: Siddhartha Bagaria (starsid)
Differential revision: https://reviews.llvm.org/D49169
llvm-svn: 336793
Since .gdb_index sections contain all known symbols, they can be very large.
One of my executables has a .gdb_index section of 1350 GiB. Uniquifying
symbols by name takes 3.77 seconds on my machine. This patch parallelize it.
Time to call createSymbols() with 8.4 million unique symbols:
Without this patch: 3773 ms
Parallelism = 1: 4374 ms
Parallelism = 2: 2628 ms
Parallelism = 16: 837 ms
As you can see above, this algorithm is a bit more inefficient
than the non-parallelized version, but even with dual-core, it is
faster than that, so I think it is overall a win.
Differential Revision: https://reviews.llvm.org/D49164
llvm-svn: 336790
The compact instruction shuffles active elements of vector
into lowest numbered elements and sets remaining elements
to zero.
e.g.
compact z0.s, p0, z1.s
llvm-svn: 336789
This workaround is for GCC 5.4.1. Without this workaround, lld will
produce larger .gdb_index sections for object files compiled with the
buggy version of the compiler.
Since it is not for correctness, and it affects only debug builds (since
you are generating .gdb_index sections), perhaps the hack shouldn't have been
added in the first place. At least, I think it is time to remove this hack.
Differential Revision: https://reviews.llvm.org/D49149
llvm-svn: 336788
Summary:
log() is split into four functions:
- elog()/log()/vlog() have different severity levels, allowing filtering
- dlog() is a lazy macro which uses LLVM_DEBUG - it logs to the logger, but
conditionally based on -debug-only flag and is omitted in release builds
All logging functions use formatv-style format strings now, e.g:
log("Could not resolve URI {0}: {1}", URI, Result.takeError());
Existing log sites have been split between elog/log/vlog by best guess.
This includes a workaround for passing Error to formatv that can be
simplified when D49170 or similar lands.
Subscribers: ilya-biryukov, javed.absar, ioeric, MaskRay, jkorous, cfe-commits
Differential Revision: https://reviews.llvm.org/D49008
llvm-svn: 336785
The LASTB and LASTA instructions extract the last active element,
or element after the last active, from the source vector.
The added variants are:
Scalar:
last(a|b) w0, p0, z0.b
last(a|b) w0, p0, z0.h
last(a|b) w0, p0, z0.s
last(a|b) x0, p0, z0.d
SIMD & FP Scalar:
last(a|b) b0, p0, z0.b
last(a|b) h0, p0, z0.h
last(a|b) s0, p0, z0.s
last(a|b) d0, p0, z0.d
The CLASTB and CLASTA conditionally extract the last or element after
the last active element from the source vector.
The added variants are:
Scalar:
clast(a|b) w0, p0, w0, z0.b
clast(a|b) w0, p0, w0, z0.h
clast(a|b) w0, p0, w0, z0.s
clast(a|b) x0, p0, x0, z0.d
SIMD & FP Scalar:
clast(a|b) b0, p0, b0, z0.b
clast(a|b) h0, p0, h0, z0.h
clast(a|b) s0, p0, s0, z0.s
clast(a|b) d0, p0, d0, z0.d
Vector:
clast(a|b) z0.b, p0, z0.b, z1.b
clast(a|b) z0.h, p0, z0.h, z1.h
clast(a|b) z0.s, p0, z0.s, z1.s
clast(a|b) z0.d, p0, z0.d, z1.d
Please refer to the architecture specification for more details on
the semantics of the added instructions.
llvm-svn: 336783
This allows us to use SelectionDAG::isKnownNeverZero in DAGCombiner::visitREM (visitSDIVLike/visitUDIVLike handle the checking for constants).
llvm-svn: 336779
llvm-mca doesn't know that on modern AMD processors, portions of a general
purpose register are not treated independently. So, a partial register write has
a false dependency on the super-register.
The issue with partial register writes will be addressed by a follow-up patch.
llvm-svn: 336778
Summary:
The write buffer contains signed chars, which means the shift operations caused values such as the arc tag value (0x01a10000) to be read incorrectly (0xffa10000).
This fixes a regression from https://reviews.llvm.org/D49132.
Reviewers: uweigand, davidxl
Reviewed By: uweigand
Subscribers: llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D49161
llvm-svn: 336775
First stage in PR38057 - support non-uniform constant vectors in the combine to reuse the division-by-constant logic.
We can definitely do better for srem pow2 remainders (and avoid that extra multiply....) but this at least helps keep everything on the vector unit.
Differential Revision: https://reviews.llvm.org/D48975
llvm-svn: 336774
gcc 4.7 seems to disagree with gcc 5.3 about whether you need to say
'return std::move(thing)' instead of just 'return thing'. All the
json::Arrays and json::Objects that I was implicitly turning into
json::Values by returning them from functions now have explicit
std::move wrappers, so hopefully 4.7 will be happy now.
llvm-svn: 336772
The aim of this backend is to output everything TableGen knows about
the record set, similarly to the default -print-records backend. But
where -print-records produces output in TableGen's input syntax
(convenient for humans to read), this backend produces it as
structured JSON data, which is convenient for loading into standard
scripting languages such as Python, in order to extract information
from the data set in an automated way.
The output data contains a JSON representation of the variable
definitions in output 'def' records, and a few pieces of metadata such
as which of those definitions are tagged with the 'field' prefix and
which defs are derived from which classes. It doesn't dump out
absolutely every piece of knowledge it _could_ produce, such as type
information and complicated arithmetic operator nodes in abstract
superclasses; the main aim is to allow consumers of this JSON dump to
essentially act as new backends, and backends don't generally need to
depend on that kind of data.
The new backend is implemented as an EmitJSON() function similar to
all of llvm-tblgen's other EmitFoo functions, except that it lives in
lib/TableGen instead of utils/TableGen on the basis that I'm expecting
to add it to clang-tblgen too in a future patch.
To test it, I've written a Python script that loads the JSON output
and tests properties of it based on comments in the .td source - more
or less like FileCheck, except that the CHECK: lines have Python
expressions after them instead of textual pattern matches.
Reviewers: nhaehnle
Reviewed By: nhaehnle
Subscribers: arichardson, labath, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D46054
llvm-svn: 336771
Summary: XRayRecords now includes a PID field. Basic handlers fetch pid and tid each time they are called instead of caching the value. Added a testcase that calls fork and checks if the child TID is different from the parent TID to verify that the processes' TID are different in the trace.
Reviewers: dberris, Maknee
Reviewed By: dberris, Maknee
Subscribers: kpw, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D49025
llvm-svn: 336769
Summary:
These changes cover the PR#31399.
Now the ffs(x) function is lowered to (x != 0) ? llvm.cttz(x) + 1 : 0
and it corresponds to the following llvm code:
%cnt = tail call i32 @llvm.cttz.i32(i32 %v, i1 true)
%tobool = icmp eq i32 %v, 0
%.op = add nuw nsw i32 %cnt, 1
%add = select i1 %tobool, i32 0, i32 %.op
and x86 asm code:
bsfl %edi, %ecx
addl $1, %ecx
testl %edi, %edi
movl $0, %eax
cmovnel %ecx, %eax
In this case the 'test' instruction can't be eliminated because
the 'add' instruction modifies the EFLAGS, namely, ZF flag
that is set by the 'bsf' instruction when 'x' is zero.
We now produce the following code:
bsfl %edi, %ecx
movl $-1, %eax
cmovnel %ecx, %eax
addl $1, %eax
Patch by Ivan Kulagin
Reviewers: davide, craig.topper, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D48765
llvm-svn: 336768
These patterns looked for a MOVSS/SD followed by a scalar_to_vector. Or a scalar_to_vector followed by a load.
In both cases we emitted a MOVSS/SD for the MOVSS/SD part, a REG_CLASS for the scalar_to_vector, and a MOVSS/SD for the load.
But we have patterns that do each of those 3 things individually so there's no reason to build large patterns.
Most of the test changes are just reorderings. The one test that had a meaningful change is pr30430.ll and it appears to be a regression. But its doing -O0 so I think it missed a lot of opportunities and was just getting lucky before.
llvm-svn: 336762