rL367544 added @earlyclobbers for the MVE VREV64 instruction. This adds the
same for a number of other 32bit instructions that are similarly unpredictable
if the destination equals the source (due to the cross beat nature of the
instructions).
This includes:
VCADD.f32
VCADD.i32
VCMUL.f32
VHCADD.s32
VMULLT/B.s/u32
VQDMLADH{X}.s32
VQRDMLADH{X}.s32
VQDMLSDH{X}.s32
VQRDMLSDH{X}.s32
VQDMULLT/B.s32 with Qm and Rm
No tests here as this would require intrinsics (or very interesting codegen) to
manifest. The tests will follow naturally as the intrinsics are added.
Differential Revision: https://reviews.llvm.org/D67462
llvm-svn: 371838
This code is not on any performance critical path that would
justify this shortening optimization. It also makes it possible
to turn 'ref' into a function (as this is the only place where
we modify this ArgEntry member).
llvm-svn: 371836
Summary:
This patch introduces the skeleton of the constexpr interpreter,
capable of evaluating a simple constexpr functions consisting of
if statements. The interpreter is described in more detail in the
RFC. Further patches will add more features.
Reviewers: Bigcheese, jfb, rsmith
Subscribers: bruno, uenoku, ldionne, Tyker, thegameg, tschuett, dexonsmith, mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D64146
llvm-svn: 371834
Follow up of rL371321 that added FMA FP16 patterns. This adds more tests
for @llvm.fma.f16. This probably shows we miss one fmsub optimisation
opportunity, which I will look into.
llvm-svn: 371833
This patch adds vecreduce_smax, vecredude_umax, vecreduce_smin, vecreduce_umin and selection for vmaxv and minv.
Differential Revision: https://reviews.llvm.org/D66413
llvm-svn: 371827
Patch by Justice Adams!
Made llvm-objdump --all-headers output match the order of GNU objdump for compatibility reasons.
Old order of the headers output:
* file header
* section header table
* symbol table
* program header table
* dynamic section
New order of the headers output (GNU compatible):
* file header information
* program header table
* dynamic section
* section header table
* symbol table
(Relevant BugZilla Bug: https://bugs.llvm.org/show_bug.cgi?id=41830)
Differential revision: https://reviews.llvm.org/D67357
llvm-svn: 371826
lib/Transform/ScheduleOptimizer.cpp fails to compile on Solaris, both on the 9.x
branch (first noticed when running test-release.sh without -no-polly) and on trunk:
/var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp: In function ‘MicroKernelParamsTy getMicroKernelParams(const llvm::TargetTransformInfo*, polly::MatMulInfoTy)’:
/var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:914:62: error: call of overloaded ‘sqrt(long unsigned int)’ is ambiguous
914 | ceil(sqrt(Nvec * LatencyVectorFma * ThroughputVectorFma) / Nvec) * Nvec;
| ^
In file included from /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/math.h:24,
from /usr/gcc/9/include/c++/9.1.0/cmath:45,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm-c/DataTypes.h:28,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/Support/DataTypes.h:16,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/Hashing.h:47,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/ArrayRef.h:12,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/include/polly/ScheduleOptimizer.h:12,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:48:
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:220:21: note: candidate: ‘long double std::sqrt(long double)’
220 | inline long double sqrt(long double __X) { return __sqrtl(__X); }
| ^~~~
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:186:15:
note: candidate: ‘float std::sqrt(float)’
186 | inline float sqrt(float __X) { return __sqrtf(__X); }
| ^~~~
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:74:15:
note: candidate: ‘double std::sqrt(double)’
74 | extern double sqrt __P((double));
| ^~~~
/var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:915:67:
error: call of overloaded ‘ceil(long unsigned int)’ is ambiguous
915 | int Mr = ceil(Nvec * LatencyVectorFma * ThroughputVectorFma / Nr);
| ^
In file included from /usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/math.h:24,
from /usr/gcc/9/include/c++/9.1.0/cmath:45,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm-c/DataTypes.h:28,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/Support/DataTypes.h:16,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/Hashing.h:47,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/include/llvm/ADT/ArrayRef.h:12,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/include/polly/ScheduleOptimizer.h:12,
from /var/llvm/llvm-9.0.0-rc4/rc4/llvm.src/tools/polly/lib/Transform/ScheduleOptimizer.cpp:48:
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:196:21: note: candidate: ‘long double std::ceil(long double)’
196 | inline long double ceil(long double __X) { return __ceill(__X); }
| ^~~~
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:160:15:
note: candidate: ‘float std::ceil(float)’
160 | inline float ceil(float __X) { return __ceilf(__X); }
| ^~~~
/usr/gcc/9/lib/gcc/x86_64-pc-solaris2.11/9.1.0/include-fixed/iso/math_iso.h:76:15:
note: candidate: ‘double std::ceil(double)’
76 | extern double ceil __P((double));
| ^~~~
Fixed by adding casts to disambiguate, checked that it now compiles on both
amd64-pc-solaris2.11 and sparcv9-sun-solaris2.11 and on x86_64-pc-linux-gnu.
Differential Revision: https://reviews.llvm.org/D67442
llvm-svn: 371825
Summary:
ASTImporter makes now difference between function templates with same
name in different translation units if these are not visible outside.
Reviewers: martong, a.sidorin, shafik, a_sidorin
Reviewed By: a_sidorin
Subscribers: rnkovacs, dkrupp, Szelethus, gamesh411, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D67490
llvm-svn: 371820
Follow-up of rL371321 that added some more FP16 FMA patterns, and an attempt to
reduce the copy-pasting and make this more readable.
Differential Revision: https://reviews.llvm.org/D67403
llvm-svn: 371818
levels:
-- none: no lax vector conversions [new GCC default]
-- integer: only conversions between integer vectors [old GCC default]
-- all: all conversions between same-size vectors [Clang default]
For now, Clang still defaults to "all" mode, but per my proposal on
cfe-dev (2019-04-10) the default will be changed to "integer" as soon as
that doesn't break lots of testcases. (Eventually I'd like to change the
default to "none" to match GCC and general sanity.)
Following GCC's behavior, the driver flag -flax-vector-conversions is
translated to -flax-vector-conversions=integer.
This reinstates r371805, reverted in r371813, with an additional fix for
lldb.
llvm-svn: 371817
This was added to support fp128 on x86-64, but appears to be
unneeded now. This may be because the FR128 register class
added back then was merged with the VR128 register class later.
llvm-svn: 371815
levels:
-- none: no lax vector conversions [new GCC default]
-- integer: only conversions between integer vectors [old GCC default]
-- all: all conversions between same-size vectors [Clang default]
For now, Clang still defaults to "all" mode, but per my proposal on
cfe-dev (2019-04-10) the default will be changed to "integer" as soon as
that doesn't break lots of testcases. (Eventually I'd like to change the
default to "none" to match GCC and general sanity.)
Following GCC's behavior, the driver flag -flax-vector-conversions is
translated to -flax-vector-conversions=integer.
llvm-svn: 371805
Function joinOrderedInstructions merges instructions when a leader is encountered twice.
It also notices that leaders in SeenLeaders may lose their leadership in previous merging,
and tries to handle the case using following code:
Instruction *PrevLeader = UnionFind.getLeaderValue(SeenLeaders.back());
However, this is wrong because it always gets leader for the last element of SeenLeaders,
and I believe it's wrong even we get leader for Prev here. As a result, Statements in cases
like the one in patch aren't merged as expected. After investigation, I believe it's
unnecessary to get leader instruction at all. This is based on fact: Although leaders in
SeenLeaders could lose leadership, they only lose to others in SeenLeaders, in other words,
one existing leader will be chosen as new leader of merged equivalent statements. We can
take advantage of this and simply check if current leader equals to Prev and break merging
if it does.
The patch also adds a new test.
Patch by bin.narwal <bin.narwal@gmail.com>
Differential Revision: https://reviews.llvm.org/D67007
llvm-svn: 371801
Unlike SelectionDAG, treat this as a normally legalizable operation.
In SelectionDAG this is supposed to only ever formed if it's legal,
but I've found that to be restricting. For AMDGPU this is contextually
legal depending on whether denormal flushing is allowed in the use
function.
Technically we currently treat the denormal mode as a subtarget
feature, so custom lowering could be avoided. However I consider this
to be a defect, and this should be contextually dependent on the
controllable rounding mode of the parent function.
llvm-svn: 371800
The result integer does not need to be the same width as the input.
AMDGPU, NVPTX, and Hexagon all have patterns working around the types
matching. GlobalISel defines these as being different type indexes.
llvm-svn: 371797
Summary:
InferiorCall is only ever used in Process, and it is not specific to
POSIX. By moving it to Process, we can remove all dependencies on plugins from
Process. Moving InferiorCall to Process seems to achieve this quite well.
Additionally, the name InferiorCall is a little vague now, so we rename
it something a bit more specific.
Reviewers: JDevlieghere, clayborg, compnerd, labath
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D67472
llvm-svn: 371796
This testcase is invalid, and caught by the verifier. For the verifier
to catch it, the live interval computation needs to complete. Remove
the assert so the verifier catches this, which is less confusing.
In this testcase there is an undefined use of a subregister, and lanes
which aren't used or defined. An equivalent testcase with the
super-register shrunk to have no untouched lanes already hit this
verifier error.
llvm-svn: 371792
This was relying on the SGPR usable for the carry out clobber to also
be used for the input. There was no carry out on gfx9. With no carry
out clobber to worry about, so the literal can just be directly used
with a VOP2 add.
llvm-svn: 371791
With the landing of the previous patch (in particular D66318) there are a lot fewer diffs now. I added an experimental O0 line, and updated all the tests to group experimental and non-experimental O0/O3 together.
Skimming the remaining diffs, there's only a few which are obviously incorrect. There's a large number which are questionable, so more todo.
llvm-svn: 371790
Swiftself uses a callee-saved register. We can tail call when the register used
in the caller and callee is the same.
This behaviour is equivalent to that in `TargetLowering::parametersInCSRMatch`.
Update call-translator-tail-call.ll to verify that we can do this. When we
support inline assembly, we can write a check similar to the one in the
general swiftself.ll. For now, we need to verify that we get the correct COPY
instruction after call lowering.
Differential Revision: https://reviews.llvm.org/D67511
llvm-svn: 371788
This is the first sweep of generic code to add isAtomic bailouts where appropriate. The intention here is to have the switch from AtomicSDNode to LoadSDNode/StoreSDNode be close to NFC; that is, I'm not looking to allow additional optimizations at this time. That will come later. See D66309 for context.
Differential Revision: https://reviews.llvm.org/D66318
llvm-svn: 371786
When using clang as a cross-compiler, we should not use system
headers to do the compilation.
This CL adds support of a new warning flag -Wpoison-system-directories which
emits warnings if --sysroot is set and headers from common host system location
are used.
By default the warning is disabled.
The intention of the warning is to catch bad includes which are usually
generated by third party build system not targeting cross-compilation.
Such cases happen in Chrome OS when someone imports a new package or upgrade
one to a newer version from upstream.
Patch by: denik (Denis Nikitin)
llvm-svn: 371785