Summary:
r310244 fixed a bug introduced by r309914 for non-Fuchsia builds.
In doing so it also reversed the intended effect of the change for
Fuchsia builds, which was to allow all the AllocateFromLocalPool
code and its variables to be optimized away entirely.
This change restores that optimization for Fuchsia builds, but
doesn't have the original change's bug because the comparison
arithmetic now takes into account the size of the elements.
Submitted on behalf of Roland McGrath.
Reviewers: vitalybuka, alekseyshl
Reviewed By: alekseyshl
Subscribers: llvm-commits, kubamracek
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D36430
llvm-svn: 310330
Summary:
Include <stdarg.h> for variable argument list macros (va_list, va_start etc).
Add fallback definition of _LIBCPP_GET_C_LOCALE, this is required for
GNU libstdc++ compatibility. Define new macro SANITIZER_GET_C_LOCALE.
This value is currently required for FreeBSD and NetBSD for printf_l(3) tests.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, kcc, vitalybuka, filcab, fjricci
Reviewed By: vitalybuka
Subscribers: llvm-commits, emaste, kubamracek, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D36406
llvm-svn: 310323
Summary:
Part of the code inspired by the original work on libsanitizer in GCC 5.4 by Christos Zoulas.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, kcc, vitalybuka, filcab, fjricci
Reviewed By: vitalybuka
Subscribers: davide, kubamracek, llvm-commits, #sanitizers
Tags: #sanitizers
Differential Revision: https://reviews.llvm.org/D36377
llvm-svn: 310322
Polly has traditionally always been executed at the beginning of the pass
pipeline as LLVM's inliner and DeLICM passes introduced plenty of scalar
dependences which prevented any kind of useful high-level loop optimizations
later in the pass pipeline. With DeLICM now being available, Polly can also
run optimizations when folded into the pass pipeline. This has the benefit
that Polly should now be more effective on C++ code and as an additional bonus,
no additional early canonicalization phase must be run. As a result, Polly
touches the code only if it applies a transformation. Code that does not
benefit from Polly is not touched and consequently will have the very same
execution time as without Polly enabled. Random performance changes, as could
sometimes be observed with polly-position=early are consequently not possible
any more. If performance is changed, this is due to Polly is choosing to
perform a transformation. If this choice is wrong, it can be fixed directly
in Polly.
http://polly.llvm.org/docs/Architecture.html#polly-in-the-llvm-pass-pipeline
llvm-svn: 310319
viewing of the final IR. This is useful for confirming that
structure layout was correct.
I've added two tests:
- A test that checks that structs in top-level code are completed
correctly during struct layout (they are)
- A test that checks that structs defined in function bodies are
cpmpleted correctly during struct layout (currently they are not,
so this is XFAIL).
The second test fails because LookupSameContext()
(ExternalASTMerger.cpp) can't find the struct. This is an issue I
intend to resolve separately.
Differential Revision: https://reviews.llvm.org/D36429
llvm-svn: 310318
This allows us to remove more scalar dependences. While this feature is still
rather experimental, we want to give it sufficient test coverage.
llvm-svn: 310314
Two write statements which write into the very same array slot generally are
conflicting. However, in case the value that is written is identical, this
does not cause any problem. Hence, allow such write pairs in this specific
situation.
llvm-svn: 310311
Executables may not contain a load config, and clients should be able to
test for nullability. Previously we'd return uninitialized memory. Now
getLoadConfig32/64 return valid pointers or null.
Fixes PR34108
llvm-svn: 310308
This commit implements the initial version of fully-indexed static
expansion.
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[j] = j;
T: A[i] = B[i]
```
After the pass, we want this :
```
for(int i = 0; i<Ni; i++)
for(int j = 0; j<Ni; j++)
S: B[i][j] = j;
T: A[i] = B[i][i]
```
For now we bail (fail) in the following cases:
- Scalar access
- Multiple writes per SAI
- MayWrite Access
- Expansion that leads to an access to the original array
Furthermore: We still miss checks for escaping references to the array
base pointers. A future commit will add the missing escape-checks to
stay correct in those cases. The expansion is still locked behind a
CLI-Option and should not yet be used.
Patch contributed by: Nicholas Bonfante <bonfante.nicolas@gmail.com>
Reviewers: simbuerg, Meinersbur, bollu
Reviewed By: Meinersbur
Subscribers: mgorny, llvm-commits, pollydev
Differential Revision: https://reviews.llvm.org/D34982
llvm-svn: 310304
This is similar to what's done on arm and x86_64, where
these calling conventions are silently ignored, as in
SVN r245076.
Differential Revision: https://reviews.llvm.org/D36105
llvm-svn: 310303
Summary: When device offloading is enabled and the device is an NVIDIA GPU, OpenMP target regions must be compiled with relocation enabled by passing the "-c" flag to the PTXAS invocation.
Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, Hahnfeld, jlebar, hfinkel, tstellar
Reviewed By: Hahnfeld
Subscribers: Hahnfeld, rengolin, mkuron, cfe-commits
Differential Revision: https://reviews.llvm.org/D29642
llvm-svn: 310300
Summary: SampleProfileLoader pass do need to happen after some early cleanup passes so that inlining can happen correctly inside the SampleProfileLoader pass.
Reviewers: chandlerc, davidxl, tejohnson
Reviewed By: chandlerc, tejohnson
Subscribers: sanjoy, mehdi_amini, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D36333
llvm-svn: 310296
Summary: When compiling code being offloaded by OpenMP to an NVIDIA GPU, pass the -v to PTXAS if it was passed to the CLANG driver.
Reviewers: arpith-jacob, caomhin, carlo.bertolli, ABataev, jlebar, hfinkel, tstellar
Reviewed By: jlebar
Subscribers: Hahnfeld, rengolin, cfe-commits
Differential Revision: https://reviews.llvm.org/D29644
llvm-svn: 310295
libc++'s inline namespace can change depending on the ABI version.
Instead of hardcoding __1 in the manual Microsoft ABI manglings for the
iostream globals, stringify _LIBCPP_NAMESPACE and use that instead, to
work across all ABI versions.
llvm-svn: 310290
The root cause of reverting was fixed - PR33514.
Summary:
The patch makes instruction count the highest priority for
LSR solution for X86 (previously registers had highest priority).
Reviewers: qcolombet
Differential Revision: http://reviews.llvm.org/D30562
From: Evgeny Stupachenko <evstupac@gmail.com>
<evgeny.v.stupachenko@intel.com>
llvm-svn: 310289
Summary:
All instructions with the DPP modifier may not write to certain lanes of
the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask
aren't set, so the destination register may be both defined and modified.
The right way to handle this is to add a constraint that the destination
register is the same as one of the inputs. We could tie the destination
to the first source, but that would be too restrictive for some use-cases
where we want the destination to be some other value before the
instruction executes. Instead, add a fake "old" source and tie it to the
destination. Effectively, the "old" source defines what value unwritten
lanes will get. We'll expose this functionality to users with a new
intrinsic later.
Also, we want to use DPP instructions for computing derivatives, which
means we need to set WQM for them. We also need to enable the entire
wavefront when using DPP intrinsics to implement nonuniform subgroup
reductions, since otherwise we'll get incorrect results in some cases.
To accomodate this, add a new operand to all DPP instructions which will
be interpreted by the SI WQM pass. This will be exposed with a new
intrinsic later. We'll also add support for Whole Wavefront Mode later.
I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up
the test. However, I could also keep the old behavior (where lanes that
aren't written are undefined) if people want it.
Reviewers: tstellar, arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye
Differential Revision: https://reviews.llvm.org/D34716
llvm-svn: 310283
This is an addition to the -polly-optree pass that reuses the array
content analysis from DeLICM to find array elements that contain the
same value as the value loaded when the target statement instance
is executed.
The analysis is now enabled by default.
The known content analysis could also be used to rematerialize any
llvm::Value that was written to some array element, but currently
only loads are forwarded.
Differential Revision: https://reviews.llvm.org/D36380
llvm-svn: 310279
This hasn't done anything in a long time. This was
running after the the control flow pseudos were expanded,
so this would never find them. The control flow pseudo
expansion was moved to solve the problem this pass was
supposed to solve in the first place, except handling
it earlier also fixes it for fast regalloc which doesn't
use LiveIntervals.
Noticed by checking LCOV reports.
llvm-svn: 310274
Note the original code I deleted incorrectly listed this as (X | C1) & C2 --> (X & C2^(C1&C2)) | C1 Which is only valid if C1 is a subset of C2. This relied on SimplifyDemandedBits to remove any extra bits from C1 before we got to that code.
My new implementation avoids relying on that behavior so that it can be naively verified with alive.
Differential Revision: https://reviews.llvm.org/D36384
llvm-svn: 310272
Using task_for_pid to get the "self" task is not necessary, and it can fail (e.g. for sandboxed processes). Let's just use mach_task_self().
Differential Revision: https://reviews.llvm.org/D36284
llvm-svn: 310271