As suggested by Fabian on PR37441, use PSHUFLW to extend shift amount types for use with PSRAD/PSRLD to reduce register pressure.
Some of this ideally would be done by combineTargetShuffle but its tricky to do as most of the shuffles are sharing inputs.
Differential Revision: https://reviews.llvm.org/D46959
llvm-svn: 332524
Currently, diagnoses code that calls container.data()[some_index] when the container exposes a suitable operator[]() method that can be used directly.
Patch by Shuai Wang.
llvm-svn: 332519
The getAtom() method wasn't doing what we needed in all cases. We want
the symbols for the function which defines that section. We can compute
this easily enough and we know that we have at most one function in each
section.
Once this lands I will revert rL331412 which is no longer needed.
Fixes PR37409
Differential Revision: https://reviews.llvm.org/D46970
llvm-svn: 332517
Summary:
Previously, `google-readability-casting` was disabled for Objective-C.
The Google Objective-C++ style allows both Objective-C and
C++ style in the same file. Since clang-tidy doesn't have a good
way to allow multiple styles per file, this disables the
check for Objective-C++.
Test Plan: New tests added. Ran tests with:
% make -j16 check-clang-tools
Before diff, confirmed tests failed:
https://reviews.llvm.org/P8081
After diff, confirrmed tests passed.
Reviewers: alexfh, Wizard, hokein, stephanemoore
Reviewed By: alexfh, Wizard, stephanemoore
Subscribers: stephanemoore, cfe-commits, bkramer, klimek
Differential Revision: https://reviews.llvm.org/D46659
llvm-svn: 332516
The cost computation assumes we do this correctly, but the actual
lowering was wrong.
Differential Revision: https://reviews.llvm.org/D46923
llvm-svn: 332514
This makes it possible to unwind hardware exception stack frames,
which necessarily save every register and so need an extra column
for storing the return address. CFI for the exception handler could
then look as follows:
.globl exception_vector
exception_vector:
.cfi_startproc
.cfi_signal_frame
.cfi_return_column 32
l.addi r1, r1, -0x100
.cfi_def_cfa_offset 0x100
l.sw 0x00(r1), r2
.cfi_offset 2, 0x00-0x100
l.sw 0x04(r1), r3
.cfi_offset 3, 0x04-0x100
l.sw 0x08(r1), r4
.cfi_offset 4, 0x08-0x100
l.mfspr r3, r0, SPR_EPCR_BASE
l.sw 0x78(r1), r3
.cfi_offset 32, 0x78-0x100
l.jal exception_handler
l.nop
l.lwz r2, 0x00(r1)
l.lwz r3, 0x04(r1)
l.lwz r4, 0x08(r1)
l.jr r9
l.nop
.cfi_endproc
This register could, of course, also be accessed by the trace
callback or personality function, if so desired.
llvm-svn: 332513
Before this commit, R9, the link register, was used as PC register.
However, a stack frame may have R9 not set to PC on entry, either
because it uses a custom calling convention, or, more likely,
because this is a signal or exception stack frame. Using R9 as
PC register made it impossible to unwind such frames.
All other architectures similarly use a dedicated PC register.
llvm-svn: 332512
The added test case was triggering assertion
> Assertion failed: (!SpecializedTemplate.is<SpecializedPartialSpecialization*>() && "Already set to a class template partial specialization!"), function setInstantiationOf, file clang/include/clang/AST/DeclTemplate.h, line 1825.
It was happening with ClassTemplateSpecializationDecl
`enable_if_not_same<int, int>`. Because this template is specialized for
equal types not to have a definition, it wasn't instantiated and its
specialization kind remained TSK_Undeclared. And because it was implicit
instantiation, we didn't mark the decl as invalid. So when we try to
find the best matching partial specialization the second time, we hit
the assertion as partial specialization is already set.
Fix by reusing stored partial specialization when available, instead of
looking for the best match every time.
rdar://problem/39524996
Reviewers: rsmith, arphaman
Reviewed By: rsmith
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D46909
llvm-svn: 332509
Summary:
This can be solved just in seconds with KLEE. Current libFuzzer
is able to satistfy 101 constraints out of 410 constraints presented during
the first hour of running with -use_value_profile=1 and -max_len=20.
During the next 3 hours, libFuzzer is able to generate ~50 NEW inputs,
bot none of those solve any new constraint.
During the next 20 hours, it didn't find any NEW inputs.
This test might be interesting for experimenting with the data flow tracing
approach started in https://reviews.llvm.org/D46666.
For the solution with KLEE and other information, see
https://github.com/Dor1s/codegate2017-quals-angrybird
Reviewers: kcc
Reviewed By: kcc
Subscribers: delcypher, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D46924
llvm-svn: 332507
Summary:
This is needed for the continuation of D46504,
to be able to store the timings.
Reviewers: george.karpenkov, NoQ, alexfh, sbenza
Reviewed By: alexfh
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D46939
llvm-svn: 332506
Summary:
This is needed for the continuation of D46504,
to be able to store the timings.
Reviewers: george.karpenkov, NoQ, alexfh, sbenza
Reviewed By: alexfh
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D46938
llvm-svn: 332505
Summary:
Although this is not stricly required, i would very much prefer
not to have known random precision losses along the way.
Reviewers: george.karpenkov, NoQ, alexfh, sbenza
Reviewed By: george.karpenkov
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D46937
llvm-svn: 332504
Summary: We have just used `.sys` suffix for the previous timer, this is clearly a typo
Reviewers: george.karpenkov, NoQ, alexfh, sbenza
Reviewed By: alexfh
Subscribers: llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D46936
llvm-svn: 332503
Summary:
It turns out that the previous code construct was not optimizing the allocation
and deallocation of batches. The class id was read as a class member (even
though a precomputed one) and nothing else was optimized. By changing the
construct this way, the compiler actually optimizes most of the allocation and
deallocation away to only work with a single class id, which not only saves some
CPU but also some code footprint.
Reviewers: alekseyshl, dvyukov
Reviewed By: dvyukov
Subscribers: dvyukov, delcypher, llvm-commits, #sanitizers
Differential Revision: https://reviews.llvm.org/D46961
llvm-svn: 332502
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332501
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332500
We need to clean up the DAG floating-point undef logic.
This process is similar to how we handled integer undef
logic in D43141.
And as we did there, I'm trying to reduce the patch by
changing tests that would probably become meaningless
once we make those fixes.
llvm-svn: 332499
As i64 types are not legal on 32-bit targets, insert these into a suitable zero vector and use the packed vXi64<->FP conversion instructions instead.
Fixes PR3163.
Differential Revision: https://reviews.llvm.org/D43441
llvm-svn: 332498
In post-commit review for r332416, Paul Robinson pointed out that the
test for -debugify-each is not checking what it needs to.
This commit tightens up the test.
llvm-svn: 332497
Summary:
Before this patch, signal handling wasn't signal safe. This leads to real-world
crashes. It used ManagedStatic inside of signals, this can allocate and can lead
to unexpected state when a signal occurs during llvm_shutdown (because
llvm_shutdown destroys the ManagedStatic). It also used cl::opt without custom
backing storage. Some de-allocation was performed as well. Acquiring a lock in a
signal handler is also a great way to deadlock.
We can't just disable signals on llvm_shutdown because the signals might do
useful work during that shutdown. We also can't just disable llvm_shutdown for
programs (instead of library uses of clang) because we'd have to then mark the
pointers as not leaked and make sure all the ManagedStatic uses are OK to leak
and remain so.
Move all of the code to lock-free datastructures instead, and avoid having any
of them in an inconsistent state. I'm not trying to be fancy, I'm not using any
explicit memory order because this code isn't hot. The only purpose of the
atomics is to guarantee that a signal firing on the same or a different thread
doesn't see an inconsistent state and crash. In some cases we might miss some
state (for example, we might fail to delete a temporary file), but that's fine.
Note that I haven't touched any of the backtrace support despite it not
technically being totally signal-safe. When that code is called we know
something bad is up and we don't expect to continue execution, so calling
something that e.g. sets errno is the least of our problems.
A similar patch should be applied to lib/Support/Windows/Signals.inc, but that
can be done separately.
Fix r332428 which I reverted in r332429. I originally used double-wide CAS
because I was lazy, but some platforms use a runtime function for that which
thankfully failed to link (it would have been bad for signal handlers
otherwise). I use a separate flag to guard the data instead.
<rdar://problem/28010281>
Reviewers: dexonsmith
Subscribers: steven_wu, llvm-commits
llvm-svn: 332496
We already know where the CUDA SDK is, so there is no point in
letting Clang search for it again and possibly finding no or
a different installation.
--cuda-path is supported since the beginning of CUDA support in
Clang, so making this required doesn't impose additional restrictions.
Differential Revision: https://reviews.llvm.org/D46930
llvm-svn: 332495
Move all logic related to selecting the bitcode compiler and linker
into a new file and dynamically test required compiler flags. This
also adds -fcuda-rdc for Clang trunk as previously attempted in D44992
which fixes the build.
As a result this change also enables building the library by default
if all prerequisites are met.
Differential Revision: https://reviews.llvm.org/D46901
llvm-svn: 332494
Summary: This change will help us turn the DispatchUnit into its own stage.
Reviewers: andreadb, RKSimon, courbet
Reviewed By: andreadb, courbet
Subscribers: mgorny, tschuett, gbedwell, llvm-commits
Differential Revision: https://reviews.llvm.org/D46916
llvm-svn: 332493
Clang often tries to create implicit module import for error recovery,
which does a great job helping out with diagnostics. However, sometimes
clang does not have enough information given that it's using an invalid
context to move on. Be more strict in those cases to avoid crashes.
We hit crash on invalids because of this but unfortunately there are no
testcases and I couldn't manage to create one. The crashtrace however
indicates pretty clear why it's happening.
rdar://problem/39313933
llvm-svn: 332491
As part of merging stores we check that fusing the nodes does not
cause a cycle due to one candidate store being indirectly dependent on
another store (this may happen via chained memory copies). This is
done by searching if a store is a predecessor to another store's
value.
Prune the search at the candidate search's root node which is a
predecessor to all candidate stores. This reduces the
size of the subgraph searched in large basic blocks.
Reviewers: jyknight
Subscribers: llvm-commits, hiraditya
Differential Revision: https://reviews.llvm.org/D46955
llvm-svn: 332490
An assertion was not prepared to be passed a nullptr because the
out-of-quota limit was exceeded. Bail-out before the assertion
since the assertion does not apply on out-of-quote.
This fixes llvm.org/PR37477.
llvm-svn: 332488
As far as I can tell from revision history, there's no good reason to call
these files .so instead of .dll in Windows, so use the normal extension.
Also change PipSquak from SHARED to MODULE -- it's never passed to
target_link_libraries() and only loaded via dlopen(), so MODULE is more
appropriate. This makes it possible to delete a workaround for SHARED ldflags
being not quite right as well.
No intended behavior change.
https://reviews.llvm.org/D46898
llvm-svn: 332487
For regular SVE vector operands, this patch introduces a more
sensible diagnostic when the vector has a wrong suffix (e.g. z0.s vs z0.b).
For example:
add z0.s, z1.s, z2.b -> invalid element width
^_____^
mismatch
For the vector-with-shift/extend (e.g. z0.s, uxtw #2) this patch takes
a slightly different approach and instead returns a 'invalid operand'
if the element size is not as expected. This is because the diagnostics
are more specificied to suggest using the right shift/extend suffix. This
is a trade-off not to introduce more operand classes and still provide
useful diagnostics for LD1 and PRF instructions.
For example:
ld1w z1.s, p0/z, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw)'
ld1w z1.d, p0/z, [x0, z0.s] -> invalid operand
^________________^
mismatch
For gather prefetches, both 'z0.s' and 'z0.d' would be allowed:
prfw #0, p0, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].s, (uxtw|sxtw) #2'
prfw #0, p0, [x0, z0.d] -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'
Without this change, the diagnostic would unnecessarily suggest a
different element size:
prfw #0, p0, [x0, z0.s] -> invalid shift/extend specified, expected 'z[0..31].d, (lsl|uxtw|sxtw) #2'
Reviewers: SjoerdMeijer, aemerson, fhahn, samparker, javed.absar
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D46688
llvm-svn: 332483
Keep loads and stores together (target defines how many loads
and stores to gang up), such that it will help in pairing
and vectorization.
Differential Revision https://reviews.llvm.org/D46477
llvm-svn: 332482
The canonicalization was restricted to shuffle masks with
a 1-to-1 mapping to the constant vector, but that disqualifies
the common splat pattern. This is part of solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463
llvm-svn: 332479
Summary:
For the 32-bit TransferBatch:
- `SetFromArray` callers have bounds `count`, so relax the `CHECK` to `DCHECK`;
- same for `Add`;
- mark `CopyToArray` as `const`;
For the 32-bit Primary:
- `{Dea,A}llocateBatch` are only called from places that check `class_id`,
relax the `CHECK` to `DCHECK`;
- same for `AllocateRegion`;
- remove `GetRegionBeginBySizeClass` that is not used;
- use a local variable for the random shuffle state, so that the compiler can
use a register instead of reading and writing to the `SizeClassInfo` at every
iteration;
For the 32-bit local cache:
- pass the count to drain instead of doing a `Min` everytime which is at times
superfluous.
Reviewers: alekseyshl
Reviewed By: alekseyshl
Subscribers: kubamracek, delcypher, #sanitizers, llvm-commits
Differential Revision: https://reviews.llvm.org/D46657
llvm-svn: 332478
functions.
If the combined construct is specified in the declare target function
and the device code is emitted, the compiler crashes because of the
incorrectly chosen captured stmt. We should choose the innermost
captured statement, not the outermost.
llvm-svn: 332477
The module ID numbering typically starts at 0 (in both the new and old
LTO APIs, used by linkers). Make llvm-lto consistent with that.
Split out of D46699.
llvm-svn: 332476