llvm-project/clang/lib
Justin Lebar b080b630b1 [CodeGen] [CUDA] Add the ability to set default attrs on functions in linked modules.
Summary:
Now when you ask clang to link in a bitcode module, you can tell it to
set attributes on that module's functions to match what we would have
set if we'd emitted those functions ourselves.

This is particularly important for fast-math attributes in CUDA
compilations.

Each CUDA compilation links in libdevice, a bitcode library provided by
NVIDIA as part of the CUDA distribution.  Without this patch, if a user
function F compiled with -ffast-math calls a function G from libdevice,
F will have the unsafe-fp-math=true (etc.) attributes, but G will have
no attributes.

Since F calls G, the inliner will merge G's attributes into F's.  It
considers the lack of an unsafe-fp-math=true attribute on G to be
tantamount to unsafe-fp-math=false, so it "merges" these by setting
unsafe-fp-math=false on F.

This then continues up the call graph, until every function that
(transitively) calls something in libdevice gets unsafe-fp-math=false
set, thus disabling fast-math in almost all CUDA code.
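
The propagation described above can be illustrated with a small toy
model.  This is only a sketch, not LLVM's actual data structures or
API: Attr, Fn, Module, mergeOnInline, inlineAll, setDefaultAttrs and
makeExample are hypothetical names invented for the example, and the
assumption that default attributes are only filled in where absent is
a simplification made for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: these types are hypothetical and are not LLVM's API.
enum class Attr { Absent, False, True };

struct Fn {
  Attr unsafeFPMath;                 // Absent models "no attribute set"
  std::vector<std::string> callees;  // names of functions this one calls
};

using Module = std::map<std::string, Fn>;

// AND-style merge on inlining: the caller keeps unsafe-fp-math=true only
// if the callee also has it.  An absent attribute is treated as false.
void mergeOnInline(Fn &caller, const Fn &callee) {
  if (caller.unsafeFPMath == Attr::True && callee.unsafeFPMath != Attr::True)
    caller.unsafeFPMath = Attr::False;
}

// Apply the merge along every call edge until nothing changes, which
// models the effect spreading up the call graph.  Values only ever move
// from True to False, so the loop terminates.
void inlineAll(Module &m) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (auto &entry : m) {
      Fn &fn = entry.second;
      for (const std::string &calleeName : fn.callees) {
        auto it = m.find(calleeName);
        assert(it != m.end() && "callee must be defined in the module");
        Attr before = fn.unsafeFPMath;
        mergeOnInline(fn, it->second);
        if (before != fn.unsafeFPMath)
          changed = true;
      }
    }
  }
}

// Models the idea of the patch: give a linked-in function the attribute
// we would have set had we emitted it ourselves (here: only if absent).
void setDefaultAttrs(Fn &fn, bool fastMath) {
  if (fn.unsafeFPMath == Attr::Absent)
    fn.unsafeFPMath = fastMath ? Attr::True : Attr::False;
}

// G stands in for a libdevice function; F and kernel are user code
// compiled with -ffast-math.
Module makeExample(bool stampLinkedFunctions) {
  Module m;
  m["G"] = Fn{Attr::Absent, {}};
  m["F"] = Fn{Attr::True, {"G"}};
  m["kernel"] = Fn{Attr::True, {"F"}};
  if (stampLinkedFunctions)
    setDefaultAttrs(m["G"], /*fastMath=*/true);
  inlineAll(m);
  return m;
}
```

In this model, makeExample(false) leaves both F and kernel with
unsafe-fp-math=false, while makeExample(true) leaves them at true,
because G carries the attribute before any merge happens.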

Reviewers: echristo

Subscribers: hfinkel, llvm-commits, mehdi_amini

Differential Revision: https://reviews.llvm.org/D28538

llvm-svn: 293097
2017-01-25 21:29:48 +00:00
..
ARCMigrate Reapply "IntrusiveRefCntPtr -> std::shared_ptr for CompilerInvocationBase and CodeCompleteConsumer" 2017-01-06 19:49:01 +00:00
AST [OpenMP] Support for thread_limit-clause on the 'target teams' directive. 2017-01-25 11:44:35 +00:00
ASTMatchers Move VariantMatcher's Payload to std::shared_ptr rather than IntrusiveRefCntPtr 2017-01-05 18:51:54 +00:00
Analysis PR31631: fix bad CFG (and bogus warnings) when an if-statement has an init-statement and has binary operator as its condition. 2017-01-13 22:16:41 +00:00
Basic [OpenMP] Codegen support for 'target teams' on the host. 2017-01-25 02:18:43 +00:00
CodeGen [CodeGen] [CUDA] Add the ability to set default attrs on functions in linked modules. 2017-01-25 21:29:48 +00:00
Driver Driver: ignore -fno-objc-arc-exception when -fno-objc-arc set 2017-01-25 03:36:28 +00:00
Edit Fix problems in "[OpenCL] Enabling the usage of CLK_NULL_QUEUE as compare operand." 2016-12-23 14:55:49 +00:00
Format [clang-format] Implement comment reflowing. 2017-01-25 13:58:58 +00:00
Frontend [CodeGen] [CUDA] Add the ability to set default attrs on functions in linked modules. 2017-01-25 21:29:48 +00:00
FrontendTool unique_ptrify createDriverOptTable 2017-01-13 17:34:15 +00:00
Headers [OpenCL] Diagnose write_only image3d when extension is disabled 2017-01-25 12:18:50 +00:00
Index [index] Introduce symbol subkinds to mark an accessor getter or setter. 2017-01-11 21:42:48 +00:00
Lex P0426: Make the library implementation of constexpr char_traits a little easier 2017-01-20 00:45:35 +00:00
Parse Revert r292508 given that we intend to remove driver options for cxx modules. 2017-01-20 20:03:00 +00:00
Rewrite [analyzer] Re-apply r283092, attempt no.4, chunk no.4 (last) 2016-10-07 19:25:10 +00:00
Sema [OpenMP] Support for thread_limit-clause on the 'target teams' directive. 2017-01-25 11:44:35 +00:00
Serialization [OpenMP] Support for thread_limit-clause on the 'target teams' directive. 2017-01-25 11:44:35 +00:00
StaticAnalyzer [analyzer] Fix MacOSXAPIChecker fp with static locals seen from nested blocks. 2017-01-25 10:21:45 +00:00
Tooling clang-format: Make GetStyle return Expected<FormatStyle> instead of FormatStyle 2017-01-17 00:12:27 +00:00
CMakeLists.txt