Commit Graph

1277 Commits

Author SHA1 Message Date
Vedant Kumar 2b9f48afdd [ubsan] Use the nicer nullability diagnostic handlers
This is a follow-up to r297700 (Add a nullability sanitizer).

It addresses some FIXME's re: using nullability-specific diagnostic
handlers from compiler-rt, now that the necessary handlers exist.

check-ubsan test updates to follow.

llvm-svn: 297750
2017-03-14 16:48:29 +00:00
Vedant Kumar 42c17ec5ac [ubsan] Add a nullability sanitizer
Teach UBSan to detect when a value with the _Nonnull type annotation
assumes a null value. Call expressions, initializers, assignments, and
return statements are all checked.

Because _Nonnull does not affect IRGen, the new checks are disabled by
default. The new driver flags are:

  -fsanitize=nullability-arg      (_Nonnull violation in call)
  -fsanitize=nullability-assign   (_Nonnull violation in assignment)
  -fsanitize=nullability-return   (_Nonnull violation in return stmt)
  -fsanitize=nullability          (all of the above)

This patch builds on top of UBSan's existing support for detecting
violations of the nonnull attributes ('nonnull' and 'returns_nonnull'),
and relies on the compiler-rt support for those checks. Eventually we
will need to update the diagnostic messages in compiler-rt (there are
FIXME's for this, which will be addressed in a follow-up).

One point of note is that the nullability-return check is only allowed
to kick in if all arguments to the function satisfy their nullability
preconditions. This makes it necessary to emit some null checks in the
function body itself.

Testing: check-clang and check-ubsan. I also built some Apple ObjC
frameworks with an asserts-enabled compiler, and verified that we get
valid reports.

Differential Revision: https://reviews.llvm.org/D30762

llvm-svn: 297700
2017-03-14 01:56:34 +00:00
Vedant Kumar 129edab125 Retry: [ubsan] Detect UB loads from bitfields
It's possible to load out-of-range values from bitfields backed by a
boolean or an enum. Check for UB loads from bitfields.

This is the motivating example:

  struct S {
    BOOL b : 1; // Signed ObjC BOOL.
  };

  S s;
  s.b = 1; // This is actually stored as -1.
  if (s.b == 1) // Evaluates to false, -1 != 1.
    ...

Changes since the original commit:

- Single-bit bools are a special case (see CGF::EmitFromMemory), and we
  can't avoid dealing with them when loading from a bitfield. Don't try to
  insert a check in this case.

Differential Revision: https://reviews.llvm.org/D30423

llvm-svn: 297389
2017-03-09 16:06:27 +00:00
Vedant Kumar 3dea91fec6 Revert "[ubsan] Detect UB loads from bitfields"
This reverts commit r297298. It breaks the self-host on this bot:

  http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/962/steps/build%20clang%2Fubsan/logs/stdio

llvm-svn: 297331
2017-03-09 00:18:53 +00:00
Vedant Kumar 5c13623a69 [ubsan] Detect UB loads from bitfields
It's possible to load out-of-range values from bitfields backed by a
boolean or an enum. Check for UB loads from bitfields.

This is the motivating example:

  struct S {
    BOOL b : 1; // Signed ObjC BOOL.
  };

  S s;
  s.b = 1; // This is actually stored as -1.
  if (s.b == 1) // Evaluates to false, -1 != 1.
    ...

Differential Revision: https://reviews.llvm.org/D30423

llvm-svn: 297298
2017-03-08 17:38:57 +00:00
Reid Kleckner 092d065265 Don't assume cleanup emission preserves dominance in expr evaluation
Summary:
Because of the existence branches out of GNU statement expressions, it
is possible that emitting cleanups for a full expression may cause the
new insertion point to not be dominated by the result of the inner
expression. Consider this example:

  struct Foo { Foo(); ~Foo(); int x; };
  int g(Foo, int);
  int f(bool cond) {
    int n = g(Foo(), ({ if (cond) return 0; 42; }));
    return n;
  }

Before this change, result of the call to 'g' did not dominate its use
in the store to 'n'. The early return exit from the statement expression
branches to a shared cleanup block, which ends in a switch between the
fallthrough destination (the assignment to 'n') or the function exit
block.

This change solves the problem by spilling and reloading expression
evaluation results when any of the active cleanups have branches.

I audited the other call sites of enterFullExpression, and they don't
appear to keep and Values live across the site of the cleanup, except in
ARC code. I wasn't able to create a test case for ARC that exhibits this
problem, though.

Reviewers: rjmccall, rsmith

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D30590

llvm-svn: 297084
2017-03-06 22:18:34 +00:00
Gor Nishanov 90be1213d2 [coroutines] Add co_return statement emission
Summary:
Added co_return statement emission.

Tweaked coro-alloc.cpp test to use co_return to trigger coroutine processing instead of co_await, since this change starts emitting the body of the coroutine and await expression handling has not been upstreamed yet.

Reviewers: rsmith, majnemer, EricWF, aaron.ballman

Reviewed By: rsmith

Subscribers: majnemer, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D29979

llvm-svn: 297076
2017-03-06 21:12:54 +00:00
Vedant Kumar ed00ea084e [ubsan] Extend the nonnull arg check to ObjC
UBSan's nonnull argument check applies when a parameter has the
"nonnull" attribute. The check currently works for FunctionDecls, but
not for ObjCMethodDecls. This patch extends the check to work for ObjC.

Differential Revision: https://reviews.llvm.org/D30599

llvm-svn: 296996
2017-03-06 05:28:22 +00:00
Vedant Kumar 5a97265351 [ubsan] Factor out logic to emit a range check. NFC.
This is a readability improvement, but it will also help prep an
upcoming patch to detect UB loads from bitfields.

llvm-svn: 296374
2017-02-27 19:46:19 +00:00
Vedant Kumar 502bbfafca Retry: [profiling] Fix profile counter increment when emitting selects (PR32019)
2nd attempt: the first was in r296231, but it had a use after lifetime
bug.

Clang has logic to lower certain conditional expressions directly into llvm
select instructions. However, it does not emit the correct profile counter
increment as it does this: it emits an unconditional increment of the counter
for the 'then branch', even if the value selected is from the 'else branch'
(this is PR32019).

That means, given the following snippet, we would report that "0" is selected
twice, and that "1" is never selected:

  int f1(int x) {
    return x ? 0 : 1;
               ^2  ^0
  }

  f1(0);
  f1(1);

Fix the problem by using the instrprof_increment_step intrinsic to do the
proper increment.

llvm-svn: 296245
2017-02-25 06:35:45 +00:00
Vedant Kumar a45f315e2f Revert "[profiling] Fix profile counter increment when emitting selects (PR32019)"
This reverts commit r296231. It causes an assertion failure on 32-bit
machines

clang: /export/users/atombot/llvm/clang-atom-d525-fedora-rel/llvm/lib/IR/Instructions.cpp:263: void llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, const llvm::Twine&): Assertion `(i >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType()) && "Calling a function with a bad signature!"' failed.
llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/export/users/atombot/llvm/clang-atom-d525-fedora-rel/stage1/./bin/clang+0x1c5fbfa)
llvm::sys::RunSignalHandlers() (/export/users/atombot/llvm/clang-atom-d525-fedora-rel/stage1/./bin/clang+0x1c5dc7e)
SignalHandler(int) (/export/users/atombot/llvm/clang-atom-d525-fedora-rel/stage1/./bin/clang+0x1c5dde2)
__restore_rt (/lib64/libpthread.so.0+0x3f1d00efa0)
__GI_raise /home/glibctest/rpmbuild/BUILD/glibc-2.17-c758a686/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
__GI_abort /home/glibctest/rpmbuild/BUILD/glibc-2.17-c758a686/stdlib/abort.c:92:0
__assert_fail_base /home/glibctest/rpmbuild/BUILD/glibc-2.17-c758a686/assert/assert.c:92:0
(/lib64/libc.so.6+0x3f1c82e622)
llvm::CallInst::init(llvm::FunctionType*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::ArrayRef<llvm::OperandBundleDefT<llvm::Value*> >, llvm::Twine const&) (/export/users/atombot/llvm/clang-atom-d525-fedora-rel/stage1/./bin/clang+0x1804e3a)
clang::CodeGen::CodeGenPGO::emitCounterIncrement(clang::CodeGen::CGBuilderTy&, clang::Stmt const*, llvm::Value*) (/export/users/atombot/llvm/clang-atom-d525-fedora-rel/stage1/./bin/clang+0x1ec7891)

llvm-svn: 296234
2017-02-25 02:59:47 +00:00
Vedant Kumar c416e99d42 [profiling] Fix profile counter increment when emitting selects (PR32019)
Clang has logic to lower certain conditional expressions directly into
llvm select instructions. However, it does not emit the correct profile
counter increment as it does this: it emits an unconditional increment
of the counter for the 'then branch', even if the value selected is from
the 'else branch' (this is PR32019).

That means, given the following snippet, we would report that "0" is
selected twice, and that "1" is never selected:

  int f1(int x) {
    return x ? 0 : 1;
               ^2  ^0
  }

  f1(0);
  f1(1);

Fix the problem by using the instrprof_increment_step intrinsic to do
the proper increment.

llvm-svn: 296231
2017-02-25 02:30:03 +00:00
Vedant Kumar 7f809b2fbd [profiling] PR31992: Don't skip interesting non-base constructors
Fix the fact that we don't assign profile counters to constructors in
classes with virtual bases, or constructors with variadic parameters.

Differential Revision: https://reviews.llvm.org/D30131

llvm-svn: 296062
2017-02-24 01:15:19 +00:00
Erik Pilkington 9c42a8d43e [ObjC][CodeGen] CodeGen support for @available.
CodeGens uses of @available into calls to the compiler-rt function
__isOSVersionAtLeast.

This commit is part of a feature that I proposed here:
http://lists.llvm.org/pipermail/cfe-dev/2016-July/049851.html

Differential revision: https://reviews.llvm.org/D27827

llvm-svn: 296015
2017-02-23 21:08:08 +00:00
George Burgess IV 0d6592a899 [CodeGen] Don't reemit expressions for pass_object_size params.
This fixes an assertion failure in cases where we had expression
statements that declared variables nested inside of pass_object_size
args. Since we were emitting the same ExprStmt twice (once for the arg,
once for the @llvm.objectsize call), we were getting issues with
redefining locals.

This also means that we can be more lax about when we emit
@llvm.objectsize for pass_object_size args: since we're reusing the
arg's value itself, we don't have to care so much about side-effects.

llvm-svn: 295935
2017-02-23 05:59:56 +00:00
Vedant Kumar e550d11d34 Rename a helper function, NFC.
llvm-svn: 295918
2017-02-23 01:22:38 +00:00
Vedant Kumar 34b1fd6aaa Retry^2: [ubsan] Reduce null checking of C++ object pointers (PR27581)
This patch teaches ubsan to insert exactly one null check for the 'this'
pointer per method/lambda.

Previously, given a load of a member variable from an instance method
('this->x'), ubsan would insert a null check for 'this', and another
null check for '&this->x', before allowing the load to occur.

Similarly, given a call to a method from another method bound to the
same instance ('this->foo()'), ubsan would a redundant null check for
'this'. There is also a redundant null check in the case where the
object pointer is a reference ('Ref.foo()').

This patch teaches ubsan to remove the redundant null checks identified
above.

Testing: check-clang, check-ubsan, and a stage2 ubsan build.

I also compiled X86FastISel.cpp with -fsanitize=null using
patched/unpatched clangs based on r293572. Here are the number of null
checks emitted:

  -------------------------------------
  | Setup          | # of null checks |
  -------------------------------------
  | unpatched, -O0 |            21767 |
  | patched, -O0   |            10758 |
  -------------------------------------

Changes since the initial commit:
- Don't introduce any unintentional object-size or alignment checks.
- Don't rely on IRGen of C labels in the test.

Differential Revision: https://reviews.llvm.org/D29530

llvm-svn: 295515
2017-02-17 23:22:59 +00:00
Vedant Kumar 18348ea9b9 [ubsan] Pass a set of checks to skip to EmitTypeCheck() (NFC)
CodeGenFunction::EmitTypeCheck accepts a bool flag which controls
whether or not null checks are emitted. Make this a bit more flexible by
changing the bool to a SanitizerSet.

Needed for an upcoming change which deals with a scenario in which we
only want to emit null checks.

llvm-svn: 295514
2017-02-17 23:22:55 +00:00
Vedant Kumar 29ba8d9bfe Revert "Retry: [ubsan] Reduce null checking of C++ object pointers (PR27581)"
This reverts commit r295401. It breaks the ubsan self-host. It inserts
object size checks once per C++ method which fire when the structure is
empty.

llvm-svn: 295494
2017-02-17 20:59:40 +00:00
Vedant Kumar 55875b9955 Retry: [ubsan] Reduce null checking of C++ object pointers (PR27581)
This patch teaches ubsan to insert exactly one null check for the 'this'
pointer per method/lambda.

Previously, given a load of a member variable from an instance method
('this->x'), ubsan would insert a null check for 'this', and another
null check for '&this->x', before allowing the load to occur.

Similarly, given a call to a method from another method bound to the
same instance ('this->foo()'), ubsan would a redundant null check for
'this'. There is also a redundant null check in the case where the
object pointer is a reference ('Ref.foo()').

This patch teaches ubsan to remove the redundant null checks identified
above.

Testing: check-clang and check-ubsan. I also compiled X86FastISel.cpp
with -fsanitize=null using patched/unpatched clangs based on r293572.
Here are the number of null checks emitted:

  -------------------------------------
  | Setup          | # of null checks |
  -------------------------------------
  | unpatched, -O0 |            21767 |
  | patched, -O0   |            10758 |
  -------------------------------------

Changes since the initial commit: don't rely on IRGen of C labels in the
test.

Differential Revision: https://reviews.llvm.org/D29530

llvm-svn: 295401
2017-02-17 02:03:51 +00:00
Vedant Kumar 4f94a94bea Revert "[ubsan] Reduce null checking of C++ object pointers (PR27581)"
This reverts commit r295391. It breaks this bot:

http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/1898

I need to not rely on labels in the IR test.

llvm-svn: 295396
2017-02-17 01:42:36 +00:00
Vedant Kumar 3e5a9a6be8 [ubsan] Reduce null checking of C++ object pointers (PR27581)
This patch teaches ubsan to insert exactly one null check for the 'this'
pointer per method/lambda.

Previously, given a load of a member variable from an instance method
('this->x'), ubsan would insert a null check for 'this', and another
null check for '&this->x', before allowing the load to occur.

Similarly, given a call to a method from another method bound to the
same instance ('this->foo()'), ubsan would a redundant null check for
'this'. There is also a redundant null check in the case where the
object pointer is a reference ('Ref.foo()').

This patch teaches ubsan to remove the redundant null checks identified
above.

Testing: check-clang and check-ubsan. I also compiled X86FastISel.cpp
with -fsanitize=null using patched/unpatched clangs based on r293572.
Here are the number of null checks emitted:

  -------------------------------------
  | Setup          | # of null checks |
  -------------------------------------
  | unpatched, -O0 |            21767 |
  | patched, -O0   |            10758 |
  -------------------------------------

Differential Revision: https://reviews.llvm.org/D29530

llvm-svn: 295391
2017-02-17 01:05:42 +00:00
Arpith Chacko Jacob 101e8fb1f3 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295333
2017-02-16 16:20:16 +00:00
Arpith Chacko Jacob bd6344c0be Revert r295319 while investigating buildbot failure.
llvm-svn: 295323
2017-02-16 14:25:35 +00:00
Arpith Chacko Jacob 8e170fc857 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that uses these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295319
2017-02-16 14:03:36 +00:00
Arpith Chacko Jacob cdda3daa7f [OpenMP][NVPTX][CUDA] Adding support for printf for an NVPTX OpenMP device.
Support for CUDA printf is exploited to support printf for
an NVPTX OpenMP device.

To reflect the support of both programming models, the file
CGCUDABuiltin.cpp has been renamed to CGGPUBuiltin.cpp, and
the call EmitCUDADevicePrintfCallExpr has been renamed to
EmitGPUDevicePrintfCallExpr.

Reviewers: jlebar
Differential Revision: https://reviews.llvm.org/D17890

llvm-svn: 293444
2017-01-29 20:49:31 +00:00
Akira Hatanaka fdcd18b4c9 [CodeGen] Suppress emission of lifetime markers if a label has been seen
in the current lexical scope.

clang currently emits the lifetime.start marker of a variable when the
variable comes into scope even though a variable's lifetime starts at
the entry of the block with which it is associated, according to the C
standard. This normally doesn't cause any problems, but in the rare case
where a goto jumps backwards past the variable declaration to an earlier
point in the block (see the test case added to lifetime2.c), it can
cause mis-compilation.

To prevent such mis-compiles, this commit conservatively disables
emitting lifetime variables when a label has been seen in the current
block.

This problem was discussed on cfe-dev here: 
http://lists.llvm.org/pipermail/cfe-dev/2016-July/050066.html

rdar://problem/30153946

Differential Revision: https://reviews.llvm.org/D27680

llvm-svn: 293106
2017-01-25 22:55:13 +00:00
Arpith Chacko Jacob 99a1e0eba5 [OpenMP] Codegen support for 'target teams' on the host.
This patch adds support for codegen of 'target teams' on the host.
This combined directive has two captured statements, one for the
'teams' region, and the other for the 'parallel'.

This target teams region is offloaded using the __tgt_target_teams()
call. The patch sets the number of teams as an argument to
this call.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29084

llvm-svn: 293005
2017-01-25 02:18:43 +00:00
Arpith Chacko Jacob 86f9e46365 Reverting commit because an NVPTX patch sneaked in. Break up into two
patches.

llvm-svn: 293003
2017-01-25 01:45:59 +00:00
Arpith Chacko Jacob 4dbf368e14 [OpenMP] Codegen support for 'target teams' on the host.
This patch adds support for codegen of 'target teams' on the host.
This combined directive has two captured statements, one for the
'teams' region, and the other for the 'parallel'.

This target teams region is offloaded using the __tgt_target_teams()
call. The patch sets the number of teams as an argument to
this call.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29084

llvm-svn: 293001
2017-01-25 01:38:33 +00:00
Arpith Chacko Jacob 19b911cb75 [OpenMP] Codegen support for 'target parallel' on the host.
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements.  Support for this functionality is included in
the patch.

A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region.  Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp).  For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not.  The patch adds support for handling multiple captured
statements based on the combined directive.

When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753

llvm-svn: 292419
2017-01-18 18:18:53 +00:00
Arpith Chacko Jacob 42793e000a Revert r292374 to debug Windows buildbot failure.
llvm-svn: 292400
2017-01-18 15:36:05 +00:00
Arpith Chacko Jacob 68019578a3 [OpenMP] Codegen support for 'target parallel' on the host.
This patch adds support for codegen of 'target parallel' on the host.
It is also the first combined directive that requires two or more
captured statements.  Support for this functionality is included in
the patch.

A combined directive such as 'target parallel' has two captured
statements, one for the 'target' and the other for the 'parallel'
region.  Two captured statements are required because each has
different implicit parameters (see SemaOpenMP.cpp).  For example,
the 'parallel' has 'global_tid' and 'bound_tid' while the 'target'
does not.  The patch adds support for handling multiple captured
statements based on the combined directive.

When codegen'ing the 'target parallel' directive, the 'target'
outlined function is created using the outer captured statement
and the 'parallel' outlined function is created using the inner
captured statement.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28753

llvm-svn: 292374
2017-01-18 15:14:52 +00:00
Arpith Chacko Jacob 43a8b7bc8c [OpenMP] Refactor code that calls codegen for target regions on the device.
This patch refactors code that calls codegen for target regions.  Currently
the codebase only supports the 'target' directive.  The patch pulls out
common target processing code into a static function that can be called
by codegen for any target directive.

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D28752

llvm-svn: 292134
2017-01-16 15:26:02 +00:00
Kelvin Li da68118729 [OpenMP] Sema and parsing for 'target teams distribute simd’ pragma
This patch is to implement sema and parsing for 'target teams distribute simd’ pragma.
    
Differential Revision: https://reviews.llvm.org/D28252

llvm-svn: 291579
2017-01-10 18:08:18 +00:00
Filipe Cabecinhas fe5e5afd53 [ubsan] Minimize size of data for type_mismatch (Redo of D19667)
Summary:
This patch makes the type_mismatch static data 7 bytes smaller (and it
ends up being 16 bytes smaller due to alignment restrictions, at least
on some x86-64 environments).

It revs up the type_mismatch handler version since we're breaking binary
compatibility. I will soon post a patch for the compiler-rt side.

Reviewers: rsmith, kcc, vitalybuka, pgousseau, gbedwell

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D28242

llvm-svn: 291236
2017-01-06 14:40:12 +00:00
Reid Kleckner d2ad9dfdb9 [Win64] Don't widen integer literal zero arguments to unprototyped function calls
The special case to widen the integer literal zero when passed to
variadic function calls should only apply to variadic functions, not
unprototyped functions. This is consistent with what MSVC does. In this
test case, MSVC uses a 4-byte store to pass the 5th argument to 'kr' and
an 8-byte store to pass the zero to 'v':

  void v(int, ...);
  void kr();
  void f(void) {
    v(1, 2, 3, 4, 0);
    kr(1, 2, 3, 4, 0);
  }

Aaron Ballman discovered this issue in https://reviews.llvm.org/D28166

llvm-svn: 290906
2017-01-03 21:23:35 +00:00
Kelvin Li 1851df563d [OpenMP] Sema and parsing for 'target teams distribute parallel for simd’ pragma
This patch is to implement sema and parsing for 'target teams distribute parallel for simd’ pragma.

Differential Revision: https://reviews.llvm.org/D28202

llvm-svn: 290862
2017-01-03 05:23:48 +00:00
Kelvin Li 80e8f56284 [OpenMP] Sema and parsing for 'target teams distribute parallel for’ pragma
This patch is to implement sema and parsing for 'target teams distribute parallel for’ pragma.

Differential Revision: https://reviews.llvm.org/D28160

llvm-svn: 290725
2016-12-29 22:16:30 +00:00
Kelvin Li 83c451e998 [OpenMP] Sema and parsing for 'target teams distribute' pragma
This patch is to implement sema and parsing for 'target teams distribute' pragma.

Differential Revision: https://reviews.llvm.org/D28015

llvm-svn: 290508
2016-12-25 04:52:54 +00:00
George Burgess IV e37633713d Add the alloc_size attribute to clang, attempt 2.
This is a recommit of r290149, which was reverted in r290169 due to msan
failures. msan was failing because we were calling
`isMostDerivedAnUnsizedArray` on an invalid designator, which caused us
to read uninitialized memory. To fix this, the logic of the caller of
said function was simplified, and we now have a `!Invalid` assert in
`isMostDerivedAnUnsizedArray`, so we can catch this particular bug more
easily in the future.

Fingers crossed that this patch sticks this time. :)

Original commit message:

This patch does three things:
- Gives us the alloc_size attribute in clang, which lets us infer the
  number of bytes handed back to us by malloc/realloc/calloc/any user
  functions that act in a similar manner.
- Teaches our constexpr evaluator that evaluating some `const` variables
  is OK sometimes. This is why we have a change in
  test/SemaCXX/constant-expression-cxx11.cpp and other seemingly
  unrelated tests. Richard Smith okay'ed this idea some time ago in
  person.
- Uniques some Blocks in CodeGen, which was reviewed separately at
  D26410. Lack of uniquing only really shows up as a problem when
  combined with our new eagerness in the face of const.

llvm-svn: 290297
2016-12-22 02:50:20 +00:00
Chandler Carruth d7738fe6ad Revert r290149: Add the alloc_size attribute to clang.
This commit fails MSan when running test/CodeGen/object-size.c in
a confusing way. After some discussion with George, it isn't really
clear what is going on here. We can make the MSan failure go away by
testing for the invalid bit, but *why* things are invalid isn't clear.
And yet, other code in the surrounding area is doing precisely this and
testing for invalid.

George is going to take a closer look at this to better understand the
nature of the failure and recommit it, for now backing it out to clean
up MSan builds.

llvm-svn: 290169
2016-12-20 08:28:19 +00:00
George Burgess IV a747027bc6 Add the alloc_size attribute to clang.
This patch does three things:

- Gives us the alloc_size attribute in clang, which lets us infer the
  number of bytes handed back to us by malloc/realloc/calloc/any user
  functions that act in a similar manner.
- Teaches our constexpr evaluator that evaluating some `const` variables
  is OK sometimes. This is why we have a change in
  test/SemaCXX/constant-expression-cxx11.cpp and other seemingly
  unrelated tests. Richard Smith okay'ed this idea some time ago in
  person.
- Uniques some Blocks in CodeGen, which was reviewed separately at
  D26410. Lack of uniquing only really shows up as a problem when
  combined with our new eagerness in the face of const.

Differential Revision: https://reviews.llvm.org/D14274

llvm-svn: 290149
2016-12-20 01:05:42 +00:00
Kelvin Li bf594a5600 [OpenMP] Sema and parsing for 'target teams' pragma
This patch is to implement sema and parsing for 'target teams' pragma.

Differential Revision: https://reviews.llvm.org/D27818

llvm-svn: 290038
2016-12-17 05:48:59 +00:00
Richard Smith 30e304e2a6 Remove custom handling of array copies in lambda by-value array capture and
copy constructors of classes with array members, instead using
ArrayInitLoopExpr to represent the initialization loop.

This exposed a bug in the static analyzer where it was unable to differentiate
between zero-initialized and unknown array values, which has also been fixed
here.

llvm-svn: 289618
2016-12-14 00:03:17 +00:00
Filipe Cabecinhas 322ecd901b [clang] Version support for UBSan handlers
This adds a way for us to version any UBSan handler by itself.
The patch overrides D21289 for a better implementation (we're able to
rev up a single handler).

After this, then we can land a slight modification of D19667+D19668.

We probably don't want to keep all the versions in compiler-rt (maybe we
want to deprecate on one release and remove the old handler on the next
one?), but with this patch we will loudly fail to compile when mixing
incompatible handler calls, instead of silently compiling and then
providing bad error messages.

Reviewers: kcc, samsonov, rsmith, vsk

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D21695

llvm-svn: 289444
2016-12-12 16:18:40 +00:00
Richard Smith 410306bf6e Add two new AST nodes to represent initialization of an array in terms of
initialization of each array element:

 * ArrayInitLoopExpr is a prvalue of array type with two subexpressions:
   a common expression (an OpaqueValueExpr) that represents the up-front
   computation of the source of the initialization, and a subexpression
   representing a per-element initializer
 * ArrayInitIndexExpr is a prvalue of type size_t representing the current
   position in the loop

This will be used to replace the creation of explicit index variables in lambda
capture of arrays and copy/move construction of classes with array elements,
and also C++17 structured bindings of arrays by value (which inexplicably allow
copying an array by value, unlike all of C++'s other array declarations).

No uses of these nodes are introduced by this change, however.

llvm-svn: 289413
2016-12-12 02:53:20 +00:00
Kelvin Li 7ade93f5e2 [OpenMP] Sema and parsing for 'teams distribute parallel for' pragma
This patch is to implement sema and parsing for 'teams distribute parallel for' pragma.
    
Differential Revision: https://reviews.llvm.org/D27345

llvm-svn: 289179
2016-12-09 03:24:30 +00:00
Kelvin Li 579e41ced2 [OpenMP] Sema and parsing for 'teams distribute parallel for simd' pragma
This patch is to implement sema and parsing for 'teams distribute parallel for simd' pragma.

Differential Revision: https://reviews.llvm.org/D27084

llvm-svn: 288294
2016-11-30 23:51:03 +00:00
Alexey Bataev 957d856e7e [OPENMP] Fixed codegen for 'omp cancel' construct.
If 'omp cancel' construct is used in a worksharing construct it may
cause hanging of the software in case if reduction clause is used. Patch fixes this problem by avoiding extra reduction processing for branches that were canceled.

llvm-svn: 287227
2016-11-17 15:12:05 +00:00
Vitaly Buka 2d15858e40 Revert "[OPENMP] Fixed codegen for 'omp cancel' construct."
Summary:
r286944 introduced bugs detected by ASAN as use-after-return.
r287025 have not fixed them completely.

This reverts commit r286944 and r287025.

Reviewers: ABataev

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D26720

llvm-svn: 287069
2016-11-16 01:01:22 +00:00
Alexey Bataev 473a3e7fed [OPENMP] Fixed codegen for 'omp cancel' construct.
If 'omp cancel' construct is used in a worksharing construct it may cause
hanging of the software in case if reduction clause is used. Patch fixes
this problem by avoiding extra reduction processing for branches that
were canceled.

llvm-svn: 286944
2016-11-15 09:11:50 +00:00
Amara Emerson 652795db16 Add the loop end location to the loop metadata. This additional information
can be used to improve the locations when generating remarks for loops.

Depends on the companion LLVM change r286227.

Patch by Florian Hahn.

Differential Revision: https://reviews.llvm.org/D25764

llvm-svn: 286456
2016-11-10 14:44:30 +00:00
Gor Nishanov 8df64e940d [coroutines] Add allocation and deallocation substatements.
Summary:
SemaCoroutine: Add allocation / deallocation substatements.
CGCoroutine/Test: Emit allocation and deallocation + test.

Reviewers: rsmith

Subscribers: ABataev, EricWF, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D25879

llvm-svn: 285306
2016-10-27 16:28:31 +00:00
John McCall b92ab1afd5 Refactor call emission to package the function pointer together with
abstract information about the callee.  NFC.

The goal here is to make it easier to recognize indirect calls and
trigger additional logic in certain cases.  That logic will come in
a later patch; in the meantime, I felt that this was a significant
improvement to the code.

llvm-svn: 285258
2016-10-26 23:46:34 +00:00
Vitaly Buka 64c80b4e39 [CodeGen] Don't emit lifetime intrinsics for some local variables
Summary:
Current generation of lifetime intrinsics does not handle cases like:

```
  {
    char x;
  l1:
    bar(&x, 1);
  }
  goto l1;

```
We will get code like this:

```
  %x = alloca i8, align 1
  call void @llvm.lifetime.start(i64 1, i8* nonnull %x)
  br label %l1
l1:
  %call = call i32 @bar(i8* nonnull %x, i32 1)
  call void @llvm.lifetime.end(i64 1, i8* nonnull %x)
  br label %l1
```

So the second time bar was called for x which is marked as dead.
Lifetime markers here are misleading so it's better to remove them at all.
This type of bypasses are rare, e.g. code detects just 8 functions building
clang (2329 targets).

PR28267

Reviewers: eugenis

Subscribers: beanz, mgorny, cfe-commits

Differential Revision: https://reviews.llvm.org/D24693

llvm-svn: 285176
2016-10-26 05:42:30 +00:00
Vitaly Buka 1c94332e7a [CodeGen] Move shouldEmitLifetimeMarkers into more convenient place
Summary: D24693 will need access to it from other places

Reviewers: eugenis

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D24695

llvm-svn: 285158
2016-10-26 01:59:57 +00:00
Kelvin Li 4e325f77a9 Re-apply patch r279045.
llvm-svn: 285066
2016-10-25 12:50:55 +00:00
Benjamin Kramer c3f89253ae Retire llvm::alignOf in favor of C++11 alignof.
No functionality change intended.

llvm-svn: 284730
2016-10-20 14:27:22 +00:00
Akira Hatanaka 642f799b0d [CodeGen][ObjC] Do not call objc_storeStrong when initializing a
constexpr variable.

When compiling a constexpr NSString initialized with an objective-c
string literal, CodeGen emits objc_storeStrong on an uninitialized
alloca, which causes a crash.

This patch folds the code in EmitScalarInit into EmitStoreThroughLValue
and fixes the crash by calling objc_retain on the string instead of
using objc_storeStrong.

rdar://problem/28562009

Differential Revision: https://reviews.llvm.org/D25547

llvm-svn: 284516
2016-10-18 19:05:41 +00:00
Albert Gutowski 2a0621e58a Implement MS _BitScan intrinsics
Summary: _BitScan intrinsics (and some others, for example _Interlocked and _bittest) are supposed to work on both ARM and x86. This is an attempt to isolate them, avoiding repeating their code or writing separate function for each builtin.

Reviewers: hans, thakis, rnk, majnemer

Subscribers: RKSimon, cfe-commits, aemerson

Differential Revision: https://reviews.llvm.org/D25264

llvm-svn: 284060
2016-10-12 22:01:05 +00:00
Richard Smith b2f0f05742 Re-commit r283722, reverted in r283750, with a fix for a CUDA-specific use of
past-the-end iterator.

Original commit message:

P0035R4: Semantic analysis and code generation for C++17 overaligned
allocation.

llvm-svn: 283789
2016-10-10 18:54:32 +00:00
Daniel Jasper e9abe64816 Revert "P0035R4: Semantic analysis and code generation for C++17 overaligned allocation."
This reverts commit r283722. Breaks:
  Clang.SemaCUDA.device-var-init.cu
  Clang.CodeGenCUDA.device-var-init.cu

http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-expensive/884/

llvm-svn: 283750
2016-10-10 14:13:55 +00:00
Richard Smith 189e52fcdf P0035R4: Semantic analysis and code generation for C++17 overaligned
allocation.

llvm-svn: 283722
2016-10-10 06:42:31 +00:00
Gor Nishanov 97e3b6d895 [coroutines] Adding builtins for coroutine intrinsics and backendutil support.
Summary:
With this commit simple coroutines can be created in plain C using coroutine builtins.

Reviewers: rnk, EricWF, rsmith

Subscribers: modocache, mgorny, mehdi_amini, beanz, cfe-commits

Differential Revision: https://reviews.llvm.org/D24373

llvm-svn: 283155
2016-10-03 22:44:48 +00:00
Richard Smith a560ccf2af Switch to a different workaround for unimplementability of P0145R3 in MS ABIs.
Instead of ignoring the evaluation order rule, ignore the "destroy parameters
in reverse construction order" rule for the small number of problematic cases.
This only causes incorrect behavior in the rare case where both parameters to
an overloaded operator <<, >>, ->*, &&, ||, or comma are of class type with
non-trivial destructor, and the program is depending on those parameters being
destroyed in reverse construction order.

We could do a little better here by reversing the order of parameter
destruction for those functions (and reversing the argument evaluation order
for all direct calls, not just those with operator syntax), but that is not a
complete solution to the problem, as the same situation can be reached by an
indirect function call.

Approach reviewed off-line by rnk.

llvm-svn: 282777
2016-09-29 21:30:12 +00:00
Richard Smith 762672a73a Re-commit r282556, reverted in r282564, with a fix to CallArgList::addFrom to
function correctly when targeting MS ABIs (this appears to have never mattered
prior to this change).

Update test case to always cover both 32-bit and 64-bit Windows ABIs, since
they behave somewhat differently from each other here.

Update test case to also cover operators , && and ||, which it appears are also
affected by P0145R3 (they're not explicitly called out by the design document,
but this is the emergent behavior of the existing wording).


Original commit message:

P0145R3 (C++17 evaluation order tweaks): evaluate the right-hand side of
assignment and compound-assignment operators before the left-hand side. (Even
if it's an overloaded operator.)

This completes the implementation of P0145R3 + P0400R0 for all targets except
Windows, where the evaluation order guarantees for <<, >>, and ->* are
unimplementable as the ABI requires the function arguments are evaluated from
right to left (because parameter destructors are run from left to right in the
callee).

llvm-svn: 282619
2016-09-28 19:09:10 +00:00
Richard Smith 4499145a5f Revert r282556. This change made several bots unhappy.
llvm-svn: 282564
2016-09-28 02:20:06 +00:00
Richard Smith 97a616d624 P0145R3 (C++17 evaluation order tweaks): evaluate the right-hand side of
assignment and compound-assignment operators before the left-hand side. (Even
if it's an overloaded operator.)

This completes the implementation of P0145R3 + P0400R0 for all targets except
Windows, where the evaluation order guarantees for <<, >>, and ->* are
unimplementable as the ABI requires the function arguments are evaluated from
right to left (because parameter destructors are run from left to right in the
callee).

llvm-svn: 282556
2016-09-27 23:44:22 +00:00
Richard Smith d8e3ac3185 Fix a couple of wrong-code bugs in switch-on-constant optimization:
* recurse through intermediate LabelStmts and AttributedStmts when checking
   whether a statement inside a switch declares a variable
 * if the end of a compound statement is reachable from the chosen case label,
   and the compound statement contains a variable declaration, it's not valid
   to just emit the contents of the compound statement -- we must emit the
   statement itself or we lose the scope (and thus end lifetimes at the wrong
   point)

llvm-svn: 281797
2016-09-16 23:30:39 +00:00
Peter Collingbourne eeb56abe64 Update Clang for D20147 ("DebugInfo: New metadata representation for global variables.")
Differential Revision: http://reviews.llvm.org/D20415

llvm-svn: 281285
2016-09-13 01:13:19 +00:00
Diana Picus 8b44bbc077 Revert "[OpenMP] Sema and parsing for 'teams distribute simd’ pragma"
This reverts commit r279003 as it breaks some of our buildbots (e.g.
clang-cmake-aarch64-quick, clang-x86_64-linux-selfhost-modules).

The error is in OpenMP/teams_distribute_simd_ast_print.cpp:
clang: /home/buildslave/buildslave/clang-cmake-aarch64-quick/llvm/include/llvm/ADT/DenseMap.h:527:
bool llvm::DenseMapBase<DerivedT, KeyT, ValueT, KeyInfoT, BucketT>::LookupBucketFor(const LookupKeyT&, const BucketT*&) const
[with LookupKeyT = clang::Stmt*; DerivedT = llvm::DenseMap<clang::Stmt*, long unsigned int>;
      KeyT = clang::Stmt*; ValueT = long unsigned int;
      KeyInfoT = llvm::DenseMapInfo<clang::Stmt*>;
      BucketT = llvm::detail::DenseMapPair<clang::Stmt*, long unsigned int>]:
Assertion `!KeyInfoT::isEqual(Val, EmptyKey) && !KeyInfoT::isEqual(Val, TombstoneKey) &&
"Empty/Tombstone value shouldn't be inserted into map!"' failed.

llvm-svn: 279045
2016-08-18 09:25:07 +00:00
Kelvin Li 0e3bde8216 [OpenMP] Sema and parsing for 'teams distribute simd’ pragma
This patch is to implement sema and parsing for 'teams distribute simd’ pragma.

This patch is originated by Carlo Bertolli.

Differential Revision: https://reviews.llvm.org/D23528

llvm-svn: 279003
2016-08-17 23:13:03 +00:00
Kelvin Li 0253287633 [OpenMP] Sema and parsing for 'teams distribute' pragma
This patch is to implement sema and parsing for 'teams distribute' pragma.

Differential Revision: https://reviews.llvm.org/D23189

llvm-svn: 277818
2016-08-05 14:37:37 +00:00
Samuel Antao cc10b85789 [OpenMP] Codegen for use_device_ptr clause.
Summary: This patch adds support for the use_device_ptr clause. It includes changes in SEMA that could not be tested without codegen, namely, the use of the first private logic and mappable expressions support.

Reviewers: hfinkel, carlo.bertolli, arpith-jacob, kkwli0, ABataev

Subscribers: caomhin, cfe-commits

Differential Revision: https://reviews.llvm.org/D22691

llvm-svn: 276977
2016-07-28 14:23:26 +00:00
Kelvin Li 986330c190 [OpenMP] Sema and parsing for 'target simd' pragma
This patch is to implement sema and parsing for 'target simd' pragma.

Differential Revision: https://reviews.llvm.org/D22479

llvm-svn: 276203
2016-07-20 22:57:10 +00:00
Kelvin Li a579b9196c [OpenMP] Sema and parsing for 'target parallel for simd' pragma
This patch is to implement sema and parsing for 'target parallel for simd' pragma.

Differential Revision: http://reviews.llvm.org/D22096

llvm-svn: 275365
2016-07-14 02:54:56 +00:00
Aaron Ballman 7d2aecbc76 Add XRay flags to Clang. We implement two flags to control the XRay behaviour:
-fxray-instrument: enables XRay annotation of IR
-fxray-instruction-threshold: configures the threshold for function size (looking at IR instructions), and allow LLVM to decide whether to add the nop sleds later on in the process.

Also implements the related xray_always_instrument and xray_never_instrument function attributes.

Patch by Dean Michael Berris.

llvm-svn: 275330
2016-07-13 22:32:15 +00:00
Kelvin Li 787f3fcc6b [OpenMP] Sema and parsing for 'distribute simd' pragma
Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute simd'.

Differential Revision: http://reviews.llvm.org/D22007

llvm-svn: 274604
2016-07-06 04:45:38 +00:00
Kelvin Li 4a39add05e [OpenMP] Sema and parse for 'distribute parallel for simd'
Summary: This patch is an implementation of sema and parsing for the OpenMP composite pragma 'distribute parallel for simd'.

Differential Revision: http://reviews.llvm.org/D21977

llvm-svn: 274530
2016-07-05 05:00:15 +00:00
Tim Shen 421119fd89 [Temporary, Lifetime] Add lifetime marks for temporaries
With all MaterializeTemporaryExprs coming with a ExprWithCleanups, it's
easy to add correct lifetime.end marks into the right RunCleanupsScope.

Differential Revision: http://reviews.llvm.org/D20499

llvm-svn: 274385
2016-07-01 21:08:47 +00:00
Richard Smith 5179eb7821 P0136R1, DR1573, DR1645, DR1715, DR1736, DR1903, DR1941, DR1959, DR1991:
Replace inheriting constructors implementation with new approach, voted into
C++ last year as a DR against C++11.

Instead of synthesizing a set of derived class constructors for each inherited
base class constructor, we make the constructors of the base class visible to
constructor lookup in the derived class, using the normal rules for
using-declarations.

For constructors, UsingShadowDecl now has a ConstructorUsingShadowDecl derived
class that tracks the requisite additional information. We create shadow
constructors (not found by name lookup) in the derived class to model the
actual initialization, and have a new expression node,
CXXInheritedCtorInitExpr, to model the initialization of a base class from such
a constructor. (This initialization is special because it performs real perfect
forwarding of arguments.)

In cases where argument forwarding is not possible (for inalloca calls,
variadic calls, and calls with callee parameter cleanup), the shadow inheriting
constructor is not emitted and instead we directly emit the initialization code
into the caller of the inherited constructor.

Note that this new model is not perfectly compatible with the old model in some
corner cases. In particular:
 * if B inherits a private constructor from A, and C uses that constructor to
   construct a B, then we previously required that A befriends B and B
   befriends C, but the new rules require A to befriend C directly, and
 * if a derived class has its own constructors (and so its implicit default
   constructor is suppressed), it may still inherit a default constructor from
   a base class

llvm-svn: 274049
2016-06-28 19:03:57 +00:00
Carlo Bertolli 9925f15661 Resubmission of http://reviews.llvm.org/D21564 after fixes.
[OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for'

This patch is an initial implementation for #distribute parallel for.
The main differences that affect other pragmas are:

The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, scheduling for 'for' to threads can use those bounds.
As a consequence of blocking, the stride of 'distribute' is not 1 but it is equal to the blocking size. This is returned by the runtime and sema prepares a DistIncrExpr variable to hold that value.
As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0 ; i < N; i++) this is 'N') but it is the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound, and PrevUB is the team-assigned block upper bound.

llvm-svn: 273884
2016-06-27 14:55:37 +00:00
Peter Collingbourne 0ca0363d05 CodeGen: Start emitting checked loads when both trapping CFI and -fwhole-program-vtables are enabled.
Differential Revision: http://reviews.llvm.org/D21122

llvm-svn: 273757
2016-06-25 00:24:06 +00:00
Peter Collingbourne 8dd14da0dc CodeGen: Update Clang to use the new type metadata.
Differential Revision: http://reviews.llvm.org/D21054

llvm-svn: 273730
2016-06-24 21:21:46 +00:00
Carlo Bertolli b8503d5399 Revert r273705
[OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for'

llvm-svn: 273709
2016-06-24 19:20:02 +00:00
Carlo Bertolli e77d6e0e4d [OpenMP] Initial implementation of parse and sema for composite pragma 'distribute parallel for'
http://reviews.llvm.org/D21564

This patch is an initial implementation for #distribute parallel for.
The main differences that affect other pragmas are:

The implementation of 'distribute parallel for' requires blocking of the associated loop, where blocks are "distributed" to different teams and iterations within each block are scheduled to parallel threads within each team. To implement blocking, sema creates two additional worksharing directive fields that are used to pass the team assigned block lower and upper bounds through the outlined function resulting from 'parallel'. In this way, scheduling for 'for' to threads can use those bounds.
As a consequence of blocking, the stride of 'distribute' is not 1 but it is equal to the blocking size. This is returned by the runtime and sema prepares a DistIncrExpr variable to hold that value.
As a consequence of blocking, the global upper bound (EnsureUpperBound) expression of the 'for' is not the original loop upper bound (e.g. in for(i = 0 ; i < N; i++) this is 'N') but it is the team-assigned block upper bound. Sema creates a new expression holding the calculation of the actual upper bound for 'for' as UB = min(UB, PrevUB), where UB is the loop upper bound, and PrevUB is the team-assigned block upper bound.

llvm-svn: 273705
2016-06-24 18:53:35 +00:00
Richard Smith b130fe7d31 Implement p0292r2 (constexpr if), a likely C++1z feature.
llvm-svn: 273602
2016-06-23 19:16:49 +00:00
Samuel Antao 6d0042642a Re-apply r272900 - [OpenMP] Cast captures by copy when passed to fork call so that they are compatible to what the runtime library expects.
An issue in one of the regression tests was fixed for 32-bit hosts.

llvm-svn: 272931
2016-06-16 18:39:34 +00:00
Samuel Antao b1f9501242 Revert r272900 - [OpenMP] Cast captures by copy when passed to fork call so that they are compatible to what the runtime library expects.
Was causing trouble in one of the regression tests for a 32-bit address space.

llvm-svn: 272908
2016-06-16 16:06:22 +00:00
Samuel Antao 4951617980 [OpenMP] Cast captures by copy when passed to fork call so that they are compatible to what the runtime library expects.
Summary:
This patch fixes an issue detected when firstprivate variables are passed to an OpenMP outlined function vararg list. Currently they are not compatible with what the runtime library expects causing malfunction in some targets.

This patch fixes the issue by moving the casting logic already in place for offloading to the common code that creates the outline function and arguments and updates the regression tests accordingly.

Reviewers: hfinkel, arpith-jacob, carlo.bertolli, kkwli0, ABataev

Subscribers: cfe-commits, caomhin

Differential Revision: http://reviews.llvm.org/D21150

llvm-svn: 272900
2016-06-16 15:09:31 +00:00
Samuel Antao 686c70c3dc [OpenMP] Parsing and sema support for target update directive
Summary:
This patch is to add parsing and sema support for `target update` directive. Support for the `to` and `from` clauses will be added by a different patch.  This patch also adds support for other clauses that are already implemented upstream and apply to `target update`, e.g. `device` and `if`.

This patch is based on the original post by Kelvin Li.

Reviewers: hfinkel, carlo.bertolli, kkwli0, arpith-jacob, ABataev

Subscribers: caomhin, cfe-commits

Differential Revision: http://reviews.llvm.org/D15944

llvm-svn: 270878
2016-05-26 17:30:50 +00:00
David Majnemer a38c9f1fa5 [MS Volatile] Don't make volatile loads/stores to underaligned objects atomic
Underaligned atomic LValues require libcalls which MSVC doesn't have.
MSVC doesn't seem to consider such operations as requiring a barrier
anyway.

This fixes PR27843.

llvm-svn: 270576
2016-05-24 16:09:25 +00:00
Alexey Bataev 7ace49dff1 [OPENMP] Pass scalar firstprivate vars by value.
For better performance and to unify code with offloading part we pass
scalar firstprivate values by value, instead of by reference. It will
remove some extra copying operations.

llvm-svn: 269751
2016-05-17 08:55:33 +00:00
Alexey Bataev 9ebd742748 [OPENMP 4.5] Add codegen support in runtime for '[non]monotonic'
schedule modifiers.

Runtime library expects some additional data in schedule argument for
loop-based directives, that have additional schedule modifiers
'monotonic|nonmonotonic'.

llvm-svn: 269035
2016-05-10 09:57:36 +00:00
Alexey Bataev e7545b33ff Implementation of VlA of GNU C++ extension, by Vladimir Yakovlev.
This enables GNU C++ extension "Variable length array" by default.
Differential Revision: http://reviews.llvm.org/D18823

llvm-svn: 268018
2016-04-29 09:39:50 +00:00
Alexey Bataev 24b5baed27 [OPENMP] Simplified interface for codegen of tasks, NFC.
Reduced number of arguments in member functions of runtime support
library for task-based directives.

llvm-svn: 267863
2016-04-28 09:23:51 +00:00
Alexey Bataev 4ba78a46ff [OPENMP] Fix for codegen of captured variables in inlined directives.
Currently there is a problem with codegen of inlined directives inside
lambdas, it may cause a crash during codegen because of incorrect
capturing of variables. Patch fixes this problem.

llvm-svn: 267677
2016-04-27 07:56:03 +00:00
Alexey Bataev 7292c29bb5 [OPENMP 4.5] Codegen for 'taskloop' directive.
The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using OpenMP tasks. The iterations are distributed across tasks created by the construct and scheduled to be executed.
The next code will be generated for the taskloop directive:
    #pragma omp taskloop num_tasks(N) lastprivate(j)
        for( i=0; i<N*GRAIN*STRIDE-1; i+=STRIDE ) {
          int th = omp_get_thread_num();
          #pragma omp atomic
            counter++;
          #pragma omp atomic
            th_counter[th]++;
          j = i;
    }

Generated code:
task = __kmpc_omp_task_alloc(NULL,gtid,1,sizeof(struct
task),sizeof(struct shar),&task_entry);
psh = task->shareds;
psh->pth_counter = &th_counter;
psh->pcounter = &counter;
psh->pj = &j;
task->lb = 0;
task->ub = N*GRAIN*STRIDE-2;
task->st = STRIDE;
__kmpc_taskloop(
NULL,             // location
gtid,             // gtid
task,             // task structure
1,                // if clause value
&task->lb,        // lower bound
&task->ub,        // upper bound
STRIDE,           // loop increment
0,                // 1 if nogroup specified
2,                // schedule type: 0-none, 1-grainsize, 2-num_tasks
N,                // schedule value (ignored for type 0)
(void*)&__task_dup_entry // tasks duplication routine
);

llvm-svn: 267395
2016-04-25 12:22:29 +00:00
Alexey Bataev 5dff95c04d [OPENMP] Fix for LCV in simd directives in explicit clauses.
If loop control variable for simd-based directives is explicitly marked
as linear/lastprivate in clauses, codegen for such construct would
crash. Patch fixes this problem.

llvm-svn: 267101
2016-04-22 03:56:56 +00:00
Saleem Abdulrasool 10a4972a8d revert SVN r265702, r265640
Revert the two changes to thread CodeGenOptions into the TargetInfo allocation
and to fix the layering violation by moving CodeGenOptions into Basic.
Code Generation is arguably not particularly "basic".  This addresses Richard's
post-commit review comments.  This change purely does the mechanical revert and
will be followed up with an alternate approach to thread the desired information
into TargetInfo.

llvm-svn: 265806
2016-04-08 16:52:00 +00:00
Saleem Abdulrasool 94cfc603d1 Basic: move CodeGenOptions from Frontend
This is a mechanical move of CodeGenOptions from libFrontend to libBasic.  This
fixes the layering violation introduced earlier by threading CodeGenOptions into
TargetInfo.  It should also fix the modules based self-hosting builds.  NFC.

llvm-svn: 265702
2016-04-07 17:49:44 +00:00
JF Bastien 92f4ef1017 NFC: make AtomicOrdering an enum class
Summary: See LLVM change D18775 for details, this change depends on it.

Reviewers: jyknight, reames

Subscribers: cfe-commits

Differential Revision: http://reviews.llvm.org/D18776

llvm-svn: 265569
2016-04-06 17:26:42 +00:00
John McCall 12f2352152 IRGen-level lowering for the Swift calling convention.
llvm-svn: 265324
2016-04-04 18:33:08 +00:00
Alexey Bataev 14fa1c6b60 [OPENMP] Allow runtime insert its own code inside OpenMP regions.
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
 (required) code to support target specific codegen.

llvm-svn: 264700
2016-03-29 05:34:15 +00:00
Alexey Bataev f539faa733 Revert "[OPENMP] Allow runtime insert its own code inside OpenMP regions."
Reverting because of failed tests.

llvm-svn: 264577
2016-03-28 12:58:34 +00:00
Alexey Bataev 424be92831 [OPENMP] Allow runtime insert its own code inside OpenMP regions.
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
 (required) code to support target specific codegen.

llvm-svn: 264576
2016-03-28 12:52:58 +00:00
Alexey Bataev f662b5943c Revert "[OPENMP] Allow runtime insert its own code inside OpenMP regions."
This reverts commit 3ee791165100607178073f14531a0dc90c622b36.

llvm-svn: 264570
2016-03-28 10:12:03 +00:00
Alexey Bataev b8c425c4f7 [OPENMP] Allow runtime insert its own code inside OpenMP regions.
Solution unifies interface of RegionCodeGenTy type to allow insert
runtime-specific code before/after main codegen action defined in
CGStmtOpenMP.cpp file. Runtime should not define its own RegionCodeGenTy
for general OpenMP directives, but must be allowed to insert its own
  (required) code to support target specific codegen.

llvm-svn: 264569
2016-03-28 09:53:43 +00:00
Pete Cooper 948677131f Revert "Convert some ObjC msgSends to runtime calls."
This reverts commit r263607.

This change caused more objc_retain/objc_release calls in the IR but those
are then incorrectly optimized by the ARC optimizer.  Work is going to have
to be done to ensure the ARC optimizer doesn't optimize user written RR, but
that should land before this change.

This change will also need to be updated to take account for any changes required
to ensure that user written calls to RR are distinct from those inserted by ARC.

llvm-svn: 263984
2016-03-21 20:50:03 +00:00
Pete Cooper be6c750a8e Convert some ObjC msgSends to runtime calls.
It is faster to directly call the ObjC runtime for methods such as retain/release instead of sending a message to those functions.

This patch adds support for converting messages to retain/release/alloc/autorelease to their equivalent runtime calls.

Tests included for the positive case of applying this transformation, negative tests that we ensure we only convert "alloc" to objc_alloc, not "alloc2", and also a driver test to ensure we enable this only for supported runtime versions.

Reviewed by John McCall.

Differential Revision: http://reviews.llvm.org/D14737

llvm-svn: 263607
2016-03-16 00:33:21 +00:00
Alexey Samsonov ae81bbb496 EmitCXXStructorCall -> EmitCXXDestructorCall. NFC.
This function is only used in Microsoft ABI and only to emit
destructors. Rename/simplify it accordingly.

llvm-svn: 263081
2016-03-10 00:20:37 +00:00
Alexey Bataev ef549a8955 [OPENMP 4.5] Codegen for data members in 'linear' clause
OpenMP 4.5 allows privatization of non-static data members in OpenMP
constructs. Patch adds proper codegen support for data members in
'linear' clause

llvm-svn: 263003
2016-03-09 09:49:09 +00:00
Carlo Bertolli fc35ad2bbc Reapply r262741 [OPENMP] Codegen for distribute directive
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.

http://reviews.llvm.org/D17170

llvm-svn: 262832
2016-03-07 16:04:49 +00:00
Samuel Antao bf4d18d3d2 Revert r262741 - [OPENMP] Codegen for distribute directive
Was causing a failure in one of the buildbot slaves.

llvm-svn: 262744
2016-03-04 21:02:14 +00:00
Carlo Bertolli 4a56e3831d [OPENMP] Codegen for distribute directive
This patch provide basic implementation of codegen for teams directive, excluding all clauses except dist_schedule. It also fixes parts of AST reader/writer to enable correct pre-compiled header handling.

http://reviews.llvm.org/D17170

llvm-svn: 262741
2016-03-04 20:24:58 +00:00
Carlo Bertolli 430d8ecc55 Add code generation for teams directive inside target region
llvm-svn: 262652
2016-03-03 20:34:23 +00:00
David Majnemer 25eb165f18 [MSVC Compat] Correctly handle finallys nested within finallys
We'd lose track of the parent CodeGenFunction, leading us to get
confused with regard to which function a nested finally belonged to.

Differential Revision: http://reviews.llvm.org/D17752

llvm-svn: 262379
2016-03-01 19:42:53 +00:00
Peter Collingbourne fb532b9a34 Add whole-program vtable optimization feature to Clang.
This patch introduces the -fwhole-program-vtables flag, which enables the
whole-program vtable optimization feature (D16795) in Clang.

Differential Revision: http://reviews.llvm.org/D16821

llvm-svn: 261767
2016-02-24 20:46:36 +00:00
Alexey Bataev 3392d76081 [OPENMP] Improved handling of pseudo-captured expressions in OpenMP.
Expressions inside 'schedule'|'dist_schedule' clause must be captured in
combined directives to avoid possible crash during codegen. Patch
improves handling of such constructs

llvm-svn: 260954
2016-02-16 11:18:12 +00:00
Rong Xu 9837ef56b4 [PGO] cc1 option name change for profile instrumentation
This patch changes cc1 option -fprofile-instr-generate to an enum option
-fprofile-instrument={clang|none}. It also changes cc1 options
-fprofile-instr-generate= to -fprofile-instrument-path=.
The driver level option -fprofile-instr-generate and -fprofile-instr-generate=
remain intact. This change will pave the way to integrate new PGO
instrumentation in IR level.

Review: http://reviews.llvm.org/D16730
llvm-svn: 259811
2016-02-04 18:39:09 +00:00
Alexey Bataev 31300ed0a5 [OPENMP 4.0] Fixed support of array sections/array subscripts.
Codegen for array sections/array subscripts worked only for expressions with arrays as base. Patch fixes codegen for bases with pointer/reference types.

llvm-svn: 259776
2016-02-04 11:27:03 +00:00
Arpith Chacko Jacob 05bebb578a [OpenMP] Parsing + sema for target parallel for directive.
Summary:
This patch adds parsing + sema for the target parallel for directive along with testcases.

Reviewers: ABataev

Differential Revision: http://reviews.llvm.org/D16759

llvm-svn: 259654
2016-02-03 15:46:42 +00:00
John McCall e399e5bd3d Emit calls to objc_unsafeClaimAutoreleasedReturnValue when
reclaiming a call result in order to ignore it or assign it
to an __unsafe_unretained variable.  This avoids adding
an unwanted retain/release pair when the return value is
not actually returned autoreleased (e.g. when it is returned
from a nonatomic getter or a typical collection accessor).

This runtime function is only available on the latest Apple
OS releases; the backwards-compatibility story is that you
don't get the optimization unless your deployment target is
recent enough.  Sorry.

rdar://20530049

llvm-svn: 258962
2016-01-27 18:32:30 +00:00
Arpith Chacko Jacob e955b3d3fe [OpenMP] Parsing + sema for target parallel directive.
Summary:
This patch adds parsing + sema for the target parallel directive and its clauses along with testcases.

Reviewers: ABataev

Differential Revision: http://reviews.llvm.org/D16553

Rebased to current trunk and updated test cases.

llvm-svn: 258832
2016-01-26 18:48:41 +00:00
Alexey Bataev 1189bd0205 [OPENMP 4.5] Allow arrays in 'reduction' clause.
OpenMP 4.5, alogn with array sections, allows to use variables of array type in reductions.

llvm-svn: 258804
2016-01-26 12:20:39 +00:00
Evgeniy Stepanov 3fd61df186 [cfi] Cross-DSO CFI diagnostic mode (clang part)
* Runtime diagnostic data for cfi-icall changed to match the rest of
  cfi checks
* Layout of all CFI diagnostic data changed to put Kind at the
  beginning. There is no ABI stability promise yet.
* Call cfi_slowpath_diag instead of cfi_slowpath when needed.
* Emit __cfi_check_fail function, which dispatches a CFI check
  faliure according to trap/recover settings of the current module.
* A tiny driver change to match the way the new handlers are done in
  compiler-rt.

llvm-svn: 258745
2016-01-25 23:34:52 +00:00
Justin Lebar 3039a593db [CUDA] Make printf work.
Summary:
The code in CGCUDACall is largely based on a patch written by Eli
Bendersky:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20140324/210218.html

That patch implemented an LLVM pass lowering printf to vprintf; this
one does something similar, but in Clang codegen.

Reviewers: echristo

Subscribers: cfe-commits, jhen, tra, majnemer

Differential Revision: http://reviews.llvm.org/D16372

llvm-svn: 258642
2016-01-23 21:28:14 +00:00
Alexey Bataev 8524d15954 [OPENMP] Fix crash on reduction for complex variables.
reworked codegen for reduction operation for complex types to avoid crash

llvm-svn: 258394
2016-01-21 12:35:58 +00:00
Samuel Antao 7259076032 [OpenMP] Parsing + sema for "target exit data" directive.
Patch by Arpith Jacob. Thanks!

llvm-svn: 258177
2016-01-19 20:04:50 +00:00
Samuel Antao df67fc468e [OpenMP] Parsing + sema for "target enter data" directive.
Patch by Arpith Jacob. Thanks!

llvm-svn: 258165
2016-01-19 19:15:56 +00:00
Peter Collingbourne dc13453128 Introduce -fsanitize-stats flag.
This is part of a new statistics gathering feature for the sanitizers.
See clang/docs/SanitizerStats.rst for further info and docs.

Differential Revision: http://reviews.llvm.org/D16175

llvm-svn: 257971
2016-01-16 00:31:22 +00:00
Alexey Bataev a6f2a14b94 [OPENMP 4.5] Codegen for 'schedule' clause with monotonic/nonmonotonic modifiers.
OpenMP 4.5 adds support for monotonic/nonmonotonic modifiers in 'schedule' clause. Add codegen for these modifiers.

llvm-svn: 256666
2015-12-31 06:52:34 +00:00
Evgeniy Stepanov fd6f92d5cb Cross-DSO control flow integrity (Clang part).
Clang-side cross-DSO CFI.

* Adds a command line flag -f[no-]sanitize-cfi-cross-dso.
* Links a runtime library when enabled.
* Emits __cfi_slowpath calls is bitset test fails.
* Emits extra hash-based bitsets for external CFI checks.
* Sets a module flag to enable __cfi_check generation during LTO.

This mode does not yet support diagnostics.

llvm-svn: 255694
2015-12-15 23:00:20 +00:00
Carlo Bertolli 6200a3d0f3 Add parse and sema of OpenMP distribute directive with all clauses except dist_schedule
llvm-svn: 255498
2015-12-14 14:51:25 +00:00
David Majnemer 4e52d6f811 Update clang to use the updated LLVM EH instructions
Depends on D15139.

Reviewers: rnk

Differential Revision: http://reviews.llvm.org/D15140

llvm-svn: 255423
2015-12-12 05:39:21 +00:00
NAKAMURA Takumi 2d5c6ddf74 Revert r255001, "Add parse and sema for OpenMP distribute directive and all its clauses excluding dist_schedule."
It causes memory leak. Some tests in test/OpenMP would fail.

llvm-svn: 255094
2015-12-09 04:35:57 +00:00
Carlo Bertolli b9bfa75b28 Add parse and sema for OpenMP distribute directive and all its clauses excluding dist_schedule.
llvm-svn: 255001
2015-12-08 04:21:03 +00:00
Alexey Bataev 0a6ed84a0d [OPENMP 4.5] Parsing/sema support for 'omp taskloop simd' directive.
OpenMP 4.5 adds directive 'taskloop simd'. Patch adds parsing/sema analysis for 'taskloop simd' directive and its clauses.

llvm-svn: 254597
2015-12-03 09:40:15 +00:00
George Burgess IV 3e3bb95b69 Add the `pass_object_size` attribute to clang.
`pass_object_size` is our way of enabling `__builtin_object_size` to
produce high quality results without requiring inlining to happen
everywhere.

A link to the design doc for this attribute is available at the
Differential review link below.

Differential Revision: http://reviews.llvm.org/D13263

llvm-svn: 254554
2015-12-02 21:58:08 +00:00
Samuel Antao 4af1b7b693 [OpenMP] Update target directive codegen to use 4.5 implicit data mappings.
Summary:
This patch implements the 4.5 specification for the implicit data maps. OpenMP 4.5 specification changes the default way data is captured into a target region. All the non-aggregate kinds are passed by value by default. This required activating the capturing by value during SEMA for the target region. All the non-aggregate values that can be encoded in the size of a pointer are properly casted and forwarded to the runtime library. On top of fixing the previous weird behavior for mapping pointers in nested data regions (an explicit map was always required), this also improves performance as the number of allocations/transactions to the device per non-aggregate map are reduced from two to only one - instead of passing a reference and the value, only the value passed.

Explicit maps will be added later on once firstprivate, private, and map clauses' SEMA and parsing are available.

Reviewers: hfinkel, rjmccall, ABataev

Subscribers: cfe-commits, carlo.bertolli

Differential Revision: http://reviews.llvm.org/D14940

llvm-svn: 254521
2015-12-02 17:44:43 +00:00
Alexey Bataev 49f6e78d71 [OPENMP 4.5] Parsing/sema analysis for 'taskloop' directive.
Adds initial parsing and semantic analysis for 'taskloop' directive.

llvm-svn: 254367
2015-12-01 04:18:41 +00:00
NAKAMURA Takumi 8965799aa3 CodeGenFunction.h: Prune a \param in r253926. [-Wdocumentation]
llvm-svn: 253938
2015-11-23 23:38:13 +00:00
Samuel Antao 798f11cfb7 Preserve exceptions information during calls code generation.
This patch changes the generation of CGFunctionInfo to contain 
the FunctionProtoType if it is available. This enables the code 
generation for call instructions to look into this type for 
exception information and therefore generate better quality 
IR - it will not create invoke instructions for functions that 
are know not to throw.

llvm-svn: 253926
2015-11-23 22:04:44 +00:00
Eric Christopher c7e79dbec8 In preparation to use it in more places rename
checkBuiltinTargetFeatures to checkTargetFeatures and sink
the error handling into the function.

llvm-svn: 252832
2015-11-12 00:44:04 +00:00
Eric Christopher 2b90a64e31 Extract out a function onto CodeGenModule for getting the map of
features for a particular function, then use it to clean up some
code.

llvm-svn: 252819
2015-11-11 23:05:08 +00:00
Tim Northover cc2a6e0608 Atomics: support __c11_* calls on _Atomic struct types.
When a struct's size is not a power of 2, the corresponding _Atomic() type is
promoted to the nearest. We already correctly handled normal C++ expressions of
this form, but direct calls to the __c11_atomic_whatever builtins ended up
performing dodgy operations on the smaller non-atomic types (e.g. memcpy too
much). Later optimisations removed this as undefined behaviour.

This patch converts EmitAtomicExpr to allocate its temporaries at the full
atomic width, sidestepping the issue.

llvm-svn: 252507
2015-11-09 19:56:35 +00:00
Reid Kleckner a002bd544c [WinEH] Mark calls inside cleanups as noinline
This works around PR25162. The MSVC tables make it very difficult to
correctly inline a C++ destructor that contains try / catch.  We've
attempted to address PR25162 in LLVM's backend, but it feels pretty
infeasible.  MSVC and ICC both appear to avoid inlining such complex
destructors.

Long term, we want to fix this by making the inliner smart enough to
know when it is inlining into a cleanup, so it can inline simple
destructors (~unique_ptr and ~vector) while avoiding destructors
containing try / catch.

llvm-svn: 251576
2015-10-28 23:06:42 +00:00
John McCall b04ecb753a Unify the ObjC entrypoint caches.
llvm-svn: 250918
2015-10-21 18:06:43 +00:00
Eric Christopher 15709991d0 Add an error when calling a builtin that requires features that don't
match the feature set of the function that they're being called from.

This ensures that we can effectively diagnose some[1] code that would
instead ICE in the backend with a failure to select message.

Example:

__m128d foo(__m128d a, __m128d b) {
  return __builtin_ia32_addsubps(b, a);
}

compiled for normal x86_64 via:

clang -target x86_64-linux-gnu -c

would fail to compile in the back end because the normal subtarget
features for x86_64 only include sse2 and the builtin requires sse3.

[1] We're still not erroring on:

__m128i bar(__m128i const *p) { return _mm_lddqu_si128(p); }

where we should fail and error on an always_inline function being
inlined into a function that doesn't support the subtarget features
required.

llvm-svn: 250473
2015-10-15 23:47:11 +00:00