(This is the second attemp to commit this patch, after fixing pr26652 & pr26653).
This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as the reduction of them
stay unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.
To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:
1. Reduction with another vector.
2. Reduction on all elements.
in which 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
includes several ShuffleVector and one ExtractElement instructions.
Differential revision: http://reviews.llvm.org/D15250
llvm-svn: 261804
In VisitNonTypeTemplateParamDecl, before SubstExpr with the default argument,
we should create a ConstantEvaluated ExpressionEvaluationContext. Without this,
it is possible to use a PotentiallyEvaluated ExpressionEvaluationContext; and
MaybeODRUseExprs will not be cleared when popping the context, causing
assertion failure.
This is similar to how we handle the context before SubstExpr with the
default argument, in SubstDefaultTemplateArgument.
Part of PR13986.
rdar://24480205
Differential Revision: http://reviews.llvm.org/D17576
llvm-svn: 261803
There are two tests in this file. One which only runs on Windows
and tests that you can set a breakpoint with mismatched case. And
another that only runs on non-Windows and tests that you cannot set
a breakpoint with mismatched case. This latter test is failing on
non Windows platforms for some reason. It could be that the test
is just written incorrectly, as I think the actual functionality
actually works correctly on non-Windows platforms.
llvm-svn: 261800
Summary: Building the sanitizer libraries without rpaths causes all sorts of problems when you try to use them. This simple fix should make it all work.
Reviewers: samsonov, zaks.anna
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17556
llvm-svn: 261797
Summary: This is extracted from D17555
Reviewers: davidxl, reames, sanjoy, MatzeB, pete
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D17579
llvm-svn: 261796
Replace Scop::getStmtForBasicBlock and Scop::getStmtForRegionNode, and
add overloads for llvm::Instruction and llvm::RegionNode.
getStmtFor and overloads become the common interface to get the Stmt
that contains something. Named after LoopInfo::getLoopFor and
RegionInfo::getRegionFor.
llvm-svn: 261791
This is also be caught by the function verifier, but disconnected from
the place that produced it. Catch it already at creation to be able to
reason more directly about the cause.
llvm-svn: 261790
MemoryAccess::addIncoming exists to remember which values come from that
statement in PHI writes, relevant for subregions that have multiple
exiting edges to an exit block. The exit block can be separated from the
exiting block by regions simplifications. It should not be called for
any read accesses.
llvm-svn: 261789
The test style guide defines that opt should get its input from stdin.
(instead by file argument to avoid that the file name appears in its
output)
CHECK-FORCED is not recognized by FileCheck; remove it.
llvm-svn: 261786
This reverts commit r261780. It turns out the original code was just
fine. An overload for ltrim which takes char was added but the Doxygen
docs haven't seemed to pick it up.
llvm-svn: 261784
specializations of a template manages to trigger deserialization of more
specializations of the same template.
No test case provided: this is hard to reliably test due to standard library
differences.
Patch by Vassil Vassilev!
llvm-svn: 261781
Summary:
This is important for e.g. the following case:
void sync() { __syncthreads(); }
void foo() {
do_something();
sync();
do_something_else():
}
Without this change, if the optimizer does not inline sync() (which it
won't because __syncthreads is also marked as noduplicate, for now
anyway), it is free to perform optimizations on sync() that it would not
be able to perform on __syncthreads(), because sync() is not marked as
convergent.
Similarly, we need a notion of convergent calls, since in the case when
we can't statically determine a call's target(s), we need to know
whether it's safe to perform optimizations around the call.
This change is conservative; the optimizer will remove these attrs where
it can, see r260318, r260319.
Reviewers: majnemer
Subscribers: cfe-commits, jhen, echristo, tra
Differential Revision: http://reviews.llvm.org/D17056
llvm-svn: 261779
__global__ functions are present on both host and device side,
so providing __host__ or __device__ overloads is not going to
do anything useful.
llvm-svn: 261778
Summary:
This lets you write, e.g.
uint3 a = threadIdx;
uint3 b = blockIdx;
dim3 c = gridDim;
dim3 d = blockDim;
which is legal in nvcc, but was not legal in clang.
The fact that e.g. the type of threadIdx is not actually uint3 is still
observable, but now you have to try to observe it.
Reviewers: tra
Subscribers: echristo, cfe-commits
Differential Revision: http://reviews.llvm.org/D17561
llvm-svn: 261777
Summary:
curand.h includes curand_mtgp32_kernel.h. In host mode, this header
redefines threadIdx and blockDim, giving them their "proper" types of
uint3 and dim3, respectively.
clang has its own plan for these variables -- their types are magic
builtin classes. So these redefinitions are incompatible.
As a hack, we force-include the offending CUDA header and use #defines
to get the right types for threadIdx and blockDim.
Reviewers: tra
Subscribers: echristo, cfe-commits
Differential Revision: http://reviews.llvm.org/D17562
llvm-svn: 261776
Summary:
(Re-land of r260448, which was reverted in r260522 due to a test failure
in Driver/output-file-cleanup.c that only showed up in fresh builds.)
Previously we attempted to be smart; if one job failed, we'd run all
jobs that didn't depend on the failing job.
Problem is, this doesn't work well for e.g. CUDA compilation without
-save-temps. In this case, the device-side and host-side Assemble
actions (which actually are responsible for preprocess, compile,
backend, and assemble, since we're not saving temps) are necessarily
distinct. So our clever heuristic doesn't help us, and we repeat every
error message once for host and once for each device arch.
The main effect of this change, other than fixing CUDA, is that if you
pass multiple cc files to one instance of clang and you get a compile
error, we'll stop when the first cc1 job fails.
Reviewers: echristo
Subscribers: cfe-commits, jhen, echristo, tra, rafael
Differential Revision: http://reviews.llvm.org/D17217
llvm-svn: 261774
Summary:
It checks that certain files do and exist, so make sure that they don't
exist at the beginning of the test.
This hid a failure in r260448; to see the failure, you had to run the test with
a clean-ish objdir.
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D17216
llvm-svn: 261773
Paths on Windows are not case-sensitive. Because of this, if a file
is called main.cpp, you should be able to set a breakpoint on it
by using the name Main.cpp. In an ideal world, you could just
tell people to match the case, but in practice this can be a real
problem as it requires you to know whether the person who compiled
the program ran "clang++ main.cpp" or "clang++ Main.cpp", both of
which would work, regardless of what the file was actually called.
This fixes http://llvm.org/pr22667
Patch by Petr Hons
Differential Revision: http://reviews.llvm.org/D17492
Reviewed by: zturner
llvm-svn: 261771
r261297 called hasUserProvidedDefaultConstructor() to check if defining a
const object is ok. This is incorrect for this example:
struct X { template<typename ...T> X(T...); int n; };
const X x; // formerly OK, now bogus error
Instead, track if a class has a defaulted default constructor, and disallow
a const object for classes that either have defaulted default constructors or
if they need an implicit constructor.
Bug report and fix approach by Richard Smith, thanks!
llvm-svn: 261770
This patch introduces the -fwhole-program-vtables flag, which enables the
whole-program vtable optimization feature (D16795) in Clang.
Differential Revision: http://reviews.llvm.org/D16821
llvm-svn: 261767
This fixes bugs in copy elimination code in llvm. It slightly changes the
semantics of clearRegisterKills(). This is appropriate because:
- Users in lib/CodeGen/MachineCopyPropagation.cpp and
lib/Target/AArch64RedundantCopyElimination.cpp and
lib/Target/SystemZ/SystemZElimCompare.cpp are incorrect without it
(see included testcase).
- All other users in llvm are unaffected (they pass TRI==nullptr)
- (Kill flags are optional anyway so removing too many shouldn't hurt.)
Differential Revision: http://reviews.llvm.org/D17554
llvm-svn: 261763
"Discarded" section is a marker for discarded sections, and we do not
use the instance except for checking its identity. In that sense, it
is just another type of a "null" pointer for InputSectionBase. So,
it doesn't have to be a real instance of InputSectionBase class.
In this patch, we no longer instantiate Discarded section but instead
use -1 as a pointer value. This eliminates a global variable which
needed initialization at startup.
llvm-svn: 261761