Commit Graph

254991 Commits

Author SHA1 Message Date
Arpith Chacko Jacob 101e8fb1f3 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that use these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295333
2017-02-16 16:20:16 +00:00
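
The warp-level step of the hierarchical reduction described in r295333 can be illustrated with a host-side model. This is only a sketch, not the patch's codegen or the runtime's algorithm: `WarpSize` and `warpReduceSum` are hypothetical names, and each array element stands in for one lane's private register. On a real GPU, the read of a neighbouring lane's value would be done with a shuffle intrinsic rather than an array access.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed warp width for this model; hypothetical constant.
constexpr std::size_t WarpSize = 32;

// Host-side model of a tree reduction within one warp.
// Lanes[i] models the private value held by lane i.
int warpReduceSum(std::array<int, WarpSize> Lanes) {
  // Halve the stride each step: 16, 8, 4, 2, 1.
  for (std::size_t Offset = WarpSize / 2; Offset > 0; Offset /= 2) {
    // All lanes read their neighbour's old value "simultaneously",
    // so compute the next state from a snapshot of the current one.
    std::array<int, WarpSize> Next = Lanes;
    for (std::size_t Lane = 0; Lane + Offset < WarpSize; ++Lane)
      Next[Lane] = Lanes[Lane] + Lanes[Lane + Offset];
    Lanes = Next;
  }
  // After log2(WarpSize) steps, lane 0 holds the sum of all lanes.
  return Lanes[0];
}
```

Filling the lanes with 1 through 32 and calling `warpReduceSum` yields 528, the sum of those values, after five combining steps instead of 31 sequential additions.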
George Rimar 505ac8dc41 [ELF] - Do not crash when discarding sections that are referenced by others.
SHF_LINK_ORDER sections add special ordering requirements.
Such sections reference other sections. Previously we would crash
if a section that others referenced was discarded by the linker script.

This patch fixes that by discarding all dependent sections in that case.
It supports chained dependencies; a test case is provided.

Differential revision: https://reviews.llvm.org/D30033

llvm-svn: 295332
2017-02-16 16:06:13 +00:00
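
The chained discard described in r295332 can be sketched as a transitive walk over dependent sections. This is a minimal illustration under assumptions, not lld's implementation: the `Section` struct, its `Live` flag, and the `DependentSections` list are hypothetical stand-ins.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an input section; not lld's real class.
struct Section {
  std::string Name;
  bool Live = true;
  // Sections that reference this one (for example via SHF_LINK_ORDER).
  std::vector<Section *> DependentSections;
};

// Discard S and, transitively, every section that depends on it.
void discard(Section &S) {
  if (!S.Live)
    return; // Already discarded; this also stops on dependency cycles.
  S.Live = false;
  for (Section *Dep : S.DependentSections) {
    assert(Dep && "dependent section must not be null");
    discard(*Dep);
  }
}
```

With a chain where B depends on A and C depends on B, discarding A also discards B and C, while unrelated sections stay live.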
Sjoerd Meijer cb2d950214 [AArch64] AArch64AsmParser clean up of isImmediate functions. NFC
Regression test neon-diagnostics.s needed changing because it now
produces a more specific diagnostic about the immediate ranges. One
change in the expected error message is not obvious, but there are multiple
candidates and it happens to pick the immediate diagnostic.

Differential Revision: https://reviews.llvm.org/D29939

llvm-svn: 295331
2017-02-16 15:52:22 +00:00
Saleem Abdulrasool 3d99648f00 math: correct the MSVCRT condition
Fixes a number of tests in the testsuite on Windows.

llvm-svn: 295330
2017-02-16 15:47:50 +00:00
Saleem Abdulrasool 305b4f2ba9 threading_support: make __thread_sleep_for be alertable
On Windows, we were using `Sleep` which is not alertable.  This means
that if the thread was used for a user APC or WinProc handling and
thread::sleep was used, we could potentially deadlock.  Use `SleepEx`
with an alertable sleep, resuming until the time has expired if we are
awoken early.

llvm-svn: 295329
2017-02-16 15:47:45 +00:00
Pavel Labath 7dc6e51ef5 Fix build due to clang r295311
BuiltinType::Kind::OCLNDRange was removed.

llvm-svn: 295328
2017-02-16 15:32:19 +00:00
Dan Gohman 4a5496902c [WebAssembly] Add a cast to void to fix an unused private member warning, for now.
llvm-svn: 295327
2017-02-16 15:21:37 +00:00
Simon Pilgrim 2fe568c95e [X86] Remove local areOnlyUsersOf helper and use SDNode::areOnlyUsersOf instead.
llvm-svn: 295326
2017-02-16 15:11:49 +00:00
Marshall Clow e9110d71dd Remove uses of deprecated std::random_shuffle in the LLVM code base. Reviewed as https://reviews.llvm.org/D29780.
llvm-svn: 295325
2017-02-16 14:37:03 +00:00
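
For context on r295325: `std::random_shuffle` was deprecated in C++14 and removed in C++17, and `std::shuffle` with an explicit random engine is the standard replacement. The sketch below shows the general shape of such a migration; the helper name `shuffleInPlace` and the choice of `std::mt19937` are assumptions for illustration, not taken from the commit.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Shuffle V in place using an explicitly seeded engine.
// Old form: std::random_shuffle(V.begin(), V.end());
void shuffleInPlace(std::vector<int> &V, unsigned Seed) {
  std::mt19937 Generator(Seed);
  std::shuffle(V.begin(), V.end(), Generator);
}
```

Passing the engine explicitly makes the source of randomness visible at the call site, and a fixed seed gives a repeatable order within one standard library implementation.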
Rafael Espindola 908a3d3420 Ignore relocation sections in linker scripts.
Unfortunately, the common way of writing linker scripts seems to be
to get the output of ld.bfd --verbose and edit it a bit.

Also unfortunately, the bfd default script contains things like

.rela.dyn : { *(... .rela.data ...) }

but bfd actually ignores that for -emit-relocs, so we have to do the
same.

llvm-svn: 295324
2017-02-16 14:36:09 +00:00
Arpith Chacko Jacob bd6344c0be Revert r295319 while investigating buildbot failure.
llvm-svn: 295323
2017-02-16 14:25:35 +00:00
Rafael Espindola 82f00ec4a2 Fix crash with -emit-relocs -shared.
The code to handle the input SHT_REL/SHT_RELA sections was getting
confused with the linker generated relocation sections.

llvm-svn: 295322
2017-02-16 14:23:43 +00:00
Diana Picus 1540b06ef8 [ARM] GlobalISel: Select floating point loads
llvm-svn: 295321
2017-02-16 14:10:50 +00:00
Benjamin Kramer aad1bdc863 Silence sign compare warning. NFC.
ExprConstant.cpp:6344:20: warning: comparison of integers of different
signs: 'const size_t' (aka 'const unsigned long') and 'typename
iterator_traits<Expr *const *>::difference_type' (aka 'long')
[-Wsign-compare]

llvm-svn: 295320
2017-02-16 14:08:41 +00:00
Arpith Chacko Jacob 8e170fc857 [OpenMP] Parallel reduction on the NVPTX device.
This patch implements codegen for the reduction clause on
any parallel construct for elementary data types.  An efficient
implementation requires hierarchical reduction within a
warp and a threadblock.  It is complicated by the fact that
variables declared in the stack of a CUDA thread cannot be
shared with other threads.

The patch creates a struct to hold reduction variables and
a number of helper functions.  The OpenMP runtime on the GPU
implements reduction algorithms that use these helper
functions to perform reductions within a team.  Variables are
shared between CUDA threads using shuffle intrinsics.

An implementation of reductions on the NVPTX device is
substantially different to that of CPUs.  However, this patch
is written so that there are minimal changes to the rest of
OpenMP codegen.

The implemented design allows the compiler and runtime to be
decoupled, i.e., the runtime does not need to know of the
reduction operation(s), the type of the reduction variable(s),
or the number of reductions.  The design also allows reuse of
host codegen, with appropriate specialization for the NVPTX
device.

While the patch does introduce a number of abstractions, the
expected use case calls for inlining of the GPU OpenMP runtime.
After inlining and optimizations in LLVM, these abstractions
are unwound and performance of OpenMP reductions is comparable
to CUDA-canonical code.

Patch by Tian Jin in collaboration with Arpith Jacob

Reviewers: ABataev
Differential Revision: https://reviews.llvm.org/D29758

llvm-svn: 295319
2017-02-16 14:03:36 +00:00
Kuba Mracek 3e81c2675e [tsan] Provide external tags (object types) via debugging API
In D28836, we added a way to tag heap objects and thus provide object types in reports. This patch exposes this information through the debugging API.

Differential Revision: https://reviews.llvm.org/D30023

llvm-svn: 295318
2017-02-16 14:02:32 +00:00
Krasimir Georgiev 8fcdd5ab96 Fix clang-move test after clang-format update r295312
llvm-svn: 295317
2017-02-16 13:17:38 +00:00
Artur Pilipenko a1b384c4ce Revert r295314 "[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine"
This change causes some AMDGPU and PowerPC tests to fail.

llvm-svn: 295316
2017-02-16 13:04:46 +00:00
Artur Pilipenko daaa0c0f7d [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 295314
2017-02-16 12:53:26 +00:00
Anastasia Stulova b376bee642 [OpenCL][Doc] Added OpenCL vendor extension description to user manual doc
Added a description of a new feature that allows specifying
vendor extensions in a flexible way using a compiler pragma instead
of modifying source code directly (committed in clang@r289979).

Review: D29829
llvm-svn: 295313
2017-02-16 12:49:29 +00:00
Krasimir Georgiev bb99a36dc0 [clang-format] Align block comment decorations
Summary:
This patch implements block comment decoration alignment.

source:
```
/* line 1
* line 2
*/
```

result before:
```
/* line 1
* line 2
*/
```

result after:
```
/* line 1
 * line 2
 */
```

Reviewers: djasper, bkramer, klimek

Reviewed By: klimek

Subscribers: mprobst, cfe-commits, klimek

Differential Revision: https://reviews.llvm.org/D29943

llvm-svn: 295312
2017-02-16 12:39:31 +00:00
Anastasia Stulova 58984e7087 [OpenCL] Correct ndrange_t implementation
Removed ndrange_t as Clang builtin type and added
as a struct type in the OpenCL header.

Use type name to do the Sema checking in enqueue_kernel
and modify IR generation accordingly.

Review: D28058

Patch by Dmitry Borisenkov!

llvm-svn: 295311
2017-02-16 12:27:47 +00:00
Diana Picus b1701e0b05 [ARM] GlobalISel: Select G_SEQUENCE and G_EXTRACT
Since they're only used for passing around double precision floating point
values into the general purpose registers, we'll lower them to VMOVDRR and
VMOVRRD.

llvm-svn: 295310
2017-02-16 12:19:57 +00:00
Diana Picus 6beef3c087 [ARM] GlobalISel: Select double G_FADD and copies
Just use VADDD if available, bail out if not.

llvm-svn: 295309
2017-02-16 12:19:52 +00:00
Diana Picus 9b32faa821 [ARM] GlobalISel: Assert that we don't use the FPR bank if we don't have VFP
llvm-svn: 295308
2017-02-16 11:25:09 +00:00
Anastasia Stulova 9d98a316c5 [OpenCL] Disallow blocks capturing other blocks (v2.0, s6.12.5)
llvm-svn: 295307
2017-02-16 11:13:30 +00:00
Diana Picus a93803b9fe [ARM] GlobalISel: Add reg bank mappings for G_SEQUENCE and G_EXTRACT
Support G_SEQUENCE and G_EXTRACT as needed for passing double precision floating
point values in the soft-fp float mode.

llvm-svn: 295306
2017-02-16 11:00:31 +00:00
Krasimir Georgiev f7de84ab9f [clangd] Fix Output.log error
llvm-svn: 295305
2017-02-16 10:53:27 +00:00
Krasimir Georgiev 1b8bfd4b76 [clangd] Implement format on type
Summary:
This patch adds onTypeFormatting to clangd.

The trigger character is '}' and it works by scanning for the matching '{' and formatting the range in-between.

There are problems with ';' as a trigger character. In the following example, the cursor position is just before the `|`:
```
int main() {
  int i;|
}
```
becomes:
```
int main() {  int i;| }
```
which is not likely what the user intended.

Also, formatting at a semicolon in a scope that is not properly closed puts the following tokens in the same unwrapped line, which doesn't reformat nicely.

Reviewers: bkramer

Reviewed By: bkramer

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D29990

llvm-svn: 295304
2017-02-16 10:49:46 +00:00
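
The brace scan described in r295304 (typing '}' triggers a search for the matching '{') can be sketched as a backward walk with a depth counter. This is a simplified illustration, not clangd's code: `findMatchingOpenBrace` is a hypothetical helper, and it deliberately ignores braces inside string literals and comments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Given the offset of a '}' in Code, return the offset of its matching '{',
// or std::string::npos if there is none.
// Simplification: braces inside strings and comments are not skipped.
std::size_t findMatchingOpenBrace(const std::string &Code,
                                  std::size_t ClosePos) {
  assert(ClosePos < Code.size() && Code[ClosePos] == '}');
  int Depth = 0;
  // Walk backwards from ClosePos down to offset 0.
  for (std::size_t I = ClosePos + 1; I-- > 0;) {
    if (Code[I] == '}')
      ++Depth; // Entering a nested (or the initial) closed scope.
    else if (Code[I] == '{' && --Depth == 0)
      return I; // Depth returned to zero: this brace opens our scope.
  }
  return std::string::npos;
}
```

The range between the returned offset and `ClosePos` is then the span that would be handed to the formatter.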
Alexander Kornienko 762adef1a9 [clang-tidy] Ignore spaces between globs in the Checks option.
llvm-svn: 295303
2017-02-16 10:23:18 +00:00
Diana Picus 7f82c87022 [ARM] GlobalISel: Make the FPR bank 64-bit wide
Also add mappings for single and double precision FP, and use them for G_FADD
and G_LOAD.

llvm-svn: 295302
2017-02-16 10:12:49 +00:00
Erik Verbruggen 2c7c38d9bb Cache FileID when translating diagnostics in PCH files
Modules/preambles/PCH files can contain diagnostics, which, when used,
are added to the current ASTUnit. For that to work, they are translated
to use the current FileManager's FileIDs. When the entry is not the
main file, all local source locations will be checked by a linear
search. This becomes a problem when there are lots of diagnostics (say,
25000) and lots of local source locations (say, 440000): translation ends up
taking seconds when using such a preamble.

The fix is to cache the last FileID, because many subsequent diagnostics
refer to the same file. This reduces the time spent in
ASTUnit::TranslateStoredDiagnostics from seconds to a few milliseconds
for files with many slocs/diagnostics.

This fixes PR31353.
Differential Revision: https://reviews.llvm.org/D29755

llvm-svn: 295301
2017-02-16 09:49:30 +00:00
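
The idea behind r295301, remembering the last lookup result so that runs of diagnostics from the same file skip the linear search, can be sketched as follows. The types here (`FileEntry`, `FileIDTranslator`) and the `LinearSearches` counter are hypothetical stand-ins for illustration and do not mirror clang's `ASTUnit` code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a file record; not clang's type.
struct FileEntry {
  std::string Name;
  int ID;
};

class FileIDTranslator {
  const std::vector<FileEntry> &Files;
  const FileEntry *Last = nullptr; // Most recent successful lookup.

public:
  std::size_t LinearSearches = 0; // Counts slow-path lookups, for illustration.

  explicit FileIDTranslator(const std::vector<FileEntry> &Files)
      : Files(Files) {}

  // Returns the ID for Name, or -1 if the file is unknown.
  int translate(const std::string &Name) {
    // Fast path: consecutive diagnostics often refer to the same file.
    if (Last && Last->Name == Name)
      return Last->ID;
    // Slow path: linear search, then remember the hit.
    ++LinearSearches;
    for (const FileEntry &E : Files) {
      if (E.Name == Name) {
        Last = &E;
        return E.ID;
      }
    }
    return -1;
  }
};
```

A single-entry cache like this only pays off when lookups are clustered, which matches the commit's observation that many subsequent diagnostics refer to the same file.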
Diana Picus 21c3d8e0fc [ARM] GlobalISel: Legalize 64-bit G_FADD and G_LOAD
For now we just mark them as legal all the time and let the other passes bail
out if they can't handle it. In the future, we'll want to move more of the
brains into the legalizer.

llvm-svn: 295300
2017-02-16 09:09:49 +00:00
Vitaly Buka f813697f05 [sanitizers] Fix formatting of the shell script.
llvm-svn: 295299
2017-02-16 08:47:27 +00:00
George Rimar 09015fee3c [ELF] - Allow section to have multiple dependent sections.
That fixes a case when a section has more than one metadata
section. Previously GC would collect one of such sections
because we had an implementation that stored only the last one as
dependent.

Differential revision: https://reviews.llvm.org/D29981

llvm-svn: 295298
2017-02-16 08:41:19 +00:00
NAKAMURA Takumi 14246c937d RWMutex.h: Use llvm-config.h instead of config.h in installed headers.
llvm-svn: 295297
2017-02-16 08:22:08 +00:00
Vitaly Buka 69068dd50e [sanitizers] Redirect pthread calls to interceptors.
It's needed if libcxx is built without disabling threads.

llvm-svn: 295296
2017-02-16 08:06:17 +00:00
Diana Picus ca6a890d7f [ARM] GlobalISel: Lower double precision FP args
For the hard float calling convention, we just use the D registers.

For the soft-fp calling convention, we use the R registers and move values
to/from the D registers by means of G_SEQUENCE/G_EXTRACT. While doing so, we
make sure to honor the endianness of the target, since the CCAssignFn doesn't do
that for us.

For pure soft float targets, we still bail out because we don't support the
libcalls yet.

llvm-svn: 295295
2017-02-16 07:53:07 +00:00
Craig Topper 3731f4d173 [AVX-512][InstCombine] Teach InstCombine to optimize 512-bit packss/packus intrinsics like it does 128/256-bit.
llvm-svn: 295294
2017-02-16 07:35:23 +00:00
Richard Trieu e55fb7f6f1 Revert r295284: Add better ODR checking for modules.
Fix modules build bot.

llvm-svn: 295293
2017-02-16 07:09:18 +00:00
Roman Gareev 4eb07e481e [FIX] Fix the typo in ScheduleOptimizer.cpp.
llvm-svn: 295292
2017-02-16 07:04:41 +00:00
Craig Topper f0d1147fae [AVX-512] Replace 512-bit masked packss/packus builtins with new unmasked builtins.
These new unmasked builtins will enable us to easily support optimizing these builtins in InstCombine in the backend.

llvm-svn: 295291
2017-02-16 06:32:07 +00:00
Craig Topper 715873ead3 [AVX-512] Remove masked packss/packus intrinsics and autoupgrade to unmasked intrinsics with select instructions. For 512-bit add new unmasked intrinsics.
The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO.

llvm-svn: 295290
2017-02-16 06:31:54 +00:00
Rui Ueyama cd19b039ce Use isRelExprOneOf.
llvm-svn: 295289
2017-02-16 06:24:16 +00:00
Rui Ueyama f829e8c97d Removes a trivial accessor.
llvm-svn: 295288
2017-02-16 06:12:41 +00:00
Rui Ueyama 924b361d01 Do not overload a one-bit variable, NeedsCopyOrPltAddr.
This patch removes NeedsCopyOrPltAddr and instead adds two variables,
NeedsCopy and NeedsPltAddr. This uses one more bit in Symbol class,
but the actual size doesn't increase because we had unused bits.
This should improve code readability.

llvm-svn: 295287
2017-02-16 06:12:22 +00:00
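
The size argument in r295287 (one more bit, but no growth because spare bits were available) can be illustrated with bit-fields. The structs below are hypothetical and far smaller than lld's Symbol class; only the names `NeedsCopyOrPltAddr`, `NeedsCopy`, and `NeedsPltAddr` come from the commit, and the `Kind` field is an assumption. Bit-field layout is implementation-defined, so the size check reflects common ABIs rather than a language guarantee.

```cpp
#include <cassert>

// Before: one bit doing double duty. "Kind" is a made-up neighbouring field.
struct SymbolBefore {
  unsigned Kind : 6;
  unsigned NeedsCopyOrPltAddr : 1;
};

// After: two independent one-bit flags packed into the same storage unit.
struct SymbolAfter {
  unsigned Kind : 6;
  unsigned NeedsCopy : 1;
  unsigned NeedsPltAddr : 1;
};

// Both fit within a single unsigned, so the extra flag costs no space
// on typical compilers.
static_assert(sizeof(SymbolBefore) == sizeof(SymbolAfter),
              "extra flag should fit in previously unused bits");
```

Splitting the flag lets each condition be set and tested on its own, which is the readability gain the commit message refers to.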
Richard Trieu 2700dc1302 Loosen a Type check in ODR checking to try to fix the build bot.
llvm-svn: 295286
2017-02-16 05:48:25 +00:00
Petr Hosek 63524f56f1 [libunwind][CMake] Use libc++ headers when available
libunwind depends on C++ library headers. When building libunwind
as part of LLVM and libc++ is available, use its headers.

Differential Revision: https://reviews.llvm.org/D29997

llvm-svn: 295285
2017-02-16 05:18:08 +00:00
Richard Trieu f351ac8987 Add better ODR checking for modules.
Recommit r293585 that was reverted in r293611 with new fixes.  The previous
issue was determined to be an overly aggressive AST visitor from forward
declared objects.  The visitor will now only deeply visit certain Decl's and
only do a shallow information extraction from all other Decl's.

When objects are imported for modules, there is a chance that a name collision
will cause an ODR violation.  Previously, only a small number of such
violations were detected.  This patch provides a stronger check based on
AST nodes.

The information needed to uniquely identify an object is taken from the AST and
put into a one-dimensional byte stream.  This stream is then hashed to give
a value to represent the object, which is stored with the other object data
in the module.

When modules are loaded, and Decl's are merged, the hash values of the two
Decl's are compared.  Only Decl's with matched hash values will be merged.
Mismatched hashes will generate a module error, and if possible, point to the
first difference between the two objects.

The transform from AST to byte stream is a modified depth first algorithm.
Due to references between some AST nodes, a pure depth first algorithm could
generate loops.  For Stmt nodes, a straight depth first processing occurs.
For Type and Decl nodes, they are replaced with an index number and only on
first visit will these nodes be processed.  As an optimization, boolean
values are saved and stored together in reverse order at the end of the
byte stream to lower the amount of data that needs to be hashed.

Compile time impact was measured at 1.5-2.0% during module building, and
negligible during builds without module building.

Differential Revision: https://reviews.llvm.org/D21675

llvm-svn: 295284
2017-02-16 04:53:40 +00:00
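
The modified depth-first flattening described in r295284 can be sketched in miniature. This is an illustration under assumptions, not clang's ODR hashing code: `Node` and `Serializer` are hypothetical, every node is treated like a Type or Decl (replaced by an index, expanded only on first visit), the stream holds 32-bit words rather than bytes, the boolean-packing optimization is not modeled, and the FNV-1a-style mixing is an arbitrary choice of hash.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical AST-like node; stands in for a Type or Decl.
struct Node {
  std::uint8_t Kind;
  std::vector<const Node *> Children;
};

class Serializer {
  std::map<const Node *, std::uint32_t> Index; // First-visit order of nodes.
  std::vector<std::uint32_t> Stream;           // Flattened representation.

public:
  void visit(const Node *N) {
    assert(N && "node must not be null");
    auto Id = static_cast<std::uint32_t>(Index.size());
    auto Result = Index.emplace(N, Id);
    // Always emit the node's index number.
    Stream.push_back(Result.first->second);
    if (!Result.second)
      return; // Seen before: the index alone is enough, so cycles terminate.
    // First visit: emit the node's data and descend into its children.
    Stream.push_back(N->Kind);
    for (const Node *Child : N->Children)
      visit(Child);
  }

  const std::vector<std::uint32_t> &stream() const { return Stream; }

  // FNV-1a-style mixing over the 32-bit words of the stream.
  std::uint64_t hash() const {
    std::uint64_t H = 14695981039346656037ULL;
    for (std::uint32_t Word : Stream) {
      H ^= Word;
      H *= 1099511628211ULL;
    }
    return H;
  }
};
```

Because nodes are identified by visit order rather than by address, two separately built graphs with the same shape produce the same stream and therefore the same hash, which is what allows hashes from different modules to be compared.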
Rui Ueyama 26ad057099 Add comments.
llvm-svn: 295283
2017-02-16 04:51:46 +00:00