Summary: This is another step in a multi-step refactoring to move add_sanitizer_rt_symbols in the direction of other add_* functions in compiler-rt.
Reviewers: filcab, bogner, kubabrecka, zaks.anna, glider, samsonov
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D12409
llvm-svn: 246178
Current implementation tries to guess which Action will result in a
job which needs to incorporate device-side GPU binaries. The guessing
was attempting to work around the fact that multiple actions may be
combined into a single compiler invocation. If CudaHostAction ends up
being combined (and thus bypassed during action list traversal) no
device-side actions it pointed to were processed. The guessing worked
for most of the usual cases, but fell apart when external assembler
was used.
This change removes the guessing and makes sure we create and pass
device-side jobs regardless of how the jobs get combined.
* CudaHostAction is always inserted either at Compile phase or the
FinalPhase of current compilation, whichever happens first.
* If selectToolForJob combines CudaHostAction with other actions, it
passes info about CudaHostAction up to the caller
* When it sees that CudaHostAction got combined with other actions
(and hence will never be passed to BuildJobsForActions),
BuildJobsForActions creates device-side jobs the same way they would
be created if CudaHostAction was passed to BuildJobsForActions
directly.
* Added two more test cases to make sure GPU binaries are passed to
correct jobs.
Differential Revision: http://reviews.llvm.org/D11280
llvm-svn: 246174
The doc files for checks have been generated from the corresponding header files
using the docs/clang-tidy/tools/dump_check_docs.py script. Committing the script
as well, but the intention is to move all the user-facing docs from header files
to the rST files and add links to .h files appropriately.
llvm-svn: 246173
Changes mostly address formatting and unification of the style. Use
MarkDown style for inline code snippets and lists. Added some text
for a few checks.
The idea is to move most of the documentation out to separate rST files and have
implementation files refer to the corresponding documentation files.
llvm-svn: 246169
If a region does not have more than one loop, we do not identify it as
a Scop in ScopDetection. The main optimizations Polly is currently performing
(tiling, preparation for outer-loop vectorization and loop fusion) are unlikely
to have a positive impact on individual loops. In some cases, Polly's run-time
alias checks or conditional hoisting may still have a positive impact, but those
are mostly enabling transformations which LLVM already performs for individual
loops. As we do not focus on individual loops, we leave them untouched to not
introduce compile time regressions and execution time noise. This results in
good compile time reduction (oourafft: -73.99%, smg2000: -56.25%).
Contributed-by: Pratik Bhatu <cs12b1010@iith.ac.in>
Reviewers: grosser
Differential Revision: http://reviews.llvm.org/D12268
llvm-svn: 246161
Constant propagation for single precision math functions (such as
tanf) is already working, but was not enabled. This patch enables
these for many single-precision functions, and adds respective test
cases.
Newly handled functions: acosf asinf atanf atan2f ceilf coshf expf
exp2f fabsf floorf fmodf logf log10f powf sinhf tanf tanhf
llvm-svn: 246158
This is a basic implementation that allows lld to emit binaries
consumable by the HSA runtime.
Differential Revision: http://reviews.llvm.org/D11267
llvm-svn: 246155
Unlike scalar operations, we can perform vector operations on element types that
are smaller than the native integer types. We type-promote scalar operations if
they are smaller than a native type (e.g., i8 arithmetic is promoted to i32
arithmetic on Arm targets). This patch detects and removes type-promotions
within the reduction detection framework, enabling the vectorization of small
size reductions.
In the legality phase, we look through the ANDs and extensions that InstCombine
creates during promotion, keeping track of the smaller type. In the
profitability phase, we use the smaller type and ignore the ANDs and extensions
in the cost model. Finally, in the code generation phase, we truncate the result
of the reduction to allow InstCombine to rewrite the entire expression in the
smaller type.
This fixes PR21369.
http://reviews.llvm.org/D12202
Patch by Matt Simpson <mssimpso@codeaurora.org>!
llvm-svn: 246149
This patch fix the function GetTls for aarch64, where it assumes it
follows the x86_64 way where the TLS initial address is at the end
of TLS. Instead aarch64 set the TLS address as the thread pointer.
llvm-svn: 246148
... and move it into LoopUtils where it can be used by other passes, just like ReductionDescriptor. The API is very similar to ReductionDescriptor - that is, not very nice at all. Sorting these both out will come in a followup.
NFC
llvm-svn: 246145
This change allows the BlockGenerator to be reused in contexts where we want to
provide different/modified isl_ast_expressions, which are not only changed to
a different access relation than the original statement, but which may indeed
be different for each code-generated instance of the statement.
We ensure testing of this feature by moving Polly's support to import changed
access functions through a jscop file to use the BlockGenerators support for
generating arbitary access functions if provided.
This commit should not change the behavior of Polly for now. The diff is rather
large, but most changes are due to us passing the NewAccesses hash table through
functions. This style, even though rather verbose, matches what is done
throughout the BlockGenerator with other per-statement properties.
llvm-svn: 246144
Use ISL to compute the loop trip count when scalar evolution is unable to do
so.
Contributed-by: Matthew Simpson <mssimpso@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D9444
llvm-svn: 246142
Older OSX versions don't define NSOperatingSystemVersion, so building
lldb gets: error: unknown type name 'NSOperatingSystemVersion'
This patch fixes the build by having GetOSVersionNumbers return false if
__ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__ < 101000, causing lldb to
behave the same as it did before the commit.
Reviewed by: jasonmolenda
Subscribers: lldb-commits
Differential Revision: http://reviews.llvm.org/D12396
llvm-svn: 246138
Globals in address spaces other than one may have 0 as a valid address,
so we should not assume that they can be null.
Reviewed by Philip Reames.
llvm-svn: 246137
A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-store forwarding across a release fence.
We do need to make sure that stores before the fence can't be eliminated even if there's another store to the same location after the fence. In theory, we could reorder the second store above the fence and *then* eliminate the former, but we can't do this if the stores are on opposite sides of the fence.
Note: While more aggressive then what's there, this patch is still implementing a really conservative ordering. In particular, I'm not trying to exploit undefined behavior via races, or the fact that the LangRef says only 'atomic' accesses are ordered w.r.t. fences.
Differential Revision: http://reviews.llvm.org/D11434
llvm-svn: 246134
When computing base pointers, we introduce new instructions to propagate the base of existing instructions which might not be bases. However, the algorithm doesn't make any effort to recognize when the new instruction to be inserted is the same as an existing one already in the IR. Since this is happening immediately before rewriting, we don't really have a chance to fix it after the pass runs without teaching loop passes about statepoints.
I'm really not thrilled with this patch. I've rewritten it 4 different ways now, but this is the best I've come up with. The case where the new instruction is just the original base defining value could be merged into the existing algorithm with some complexity. The problem is that we might have something like an extractelement from a phi of two vectors. It may be trivially obvious that the base of the 0th element is an existing instruction, but I can't see how to make the algorithm itself figure that out. Thus, I resort to the call to SimplifyInstruction instead.
Note that we can only adjust the instructions we've inserted ourselves. The live sets are still being tracked in side structures at this point in the code. We can't easily muck with instructions which might be in them. Long term, I'm really thinking we need to materialize the live pointer sets explicitly in the IR somehow rather than using side structures to track them.
Differential Revision: http://reviews.llvm.org/D12004
llvm-svn: 246133
This patch ensures that every analysis diagnostic produced by the vectorizer
will be printed if the loop has a vectorization hint on it. The condition has
also been improved to prevent printing when a disabling hint is specified.
llvm-svn: 246132