Cleanuppads may be merged together if one is the only predecessor of the
other in which case a simple transform can be performed: replace the
a cleanupret with a branch and remove an unnecessary cleanuppad.
Differential Revision: http://reviews.llvm.org/D17459
llvm-svn: 261390
TLSADDR nodes are lowered into actuall calls inside MC. In order to prevent
shrink-wrapping from pushing prologue/epilogue past them (which result
in TLS variables being accessed before the stack frame is set up), we
put markers, so that the stack gets adjusted properly.
Thanks to Quentin Colombet for guidance/help on how to fix this problem!
llvm-svn: 261387
Summary:
Instead of trying to replace SMRD instructions with a VGPR base pointer
with an equivalent MUBUF instruction, we now copy the base pointer to
SGPRs using v_readfirstlane.
This is safe to do, because any load selected as an SMRD instruction
has been proven to have a uniform base pointer, so each thread in the
wave will have the same pointer value in VGPRs.
This will fix some errors on VI from trying to replace SMRD instructions
with addr64-enabled MUBUF instructions that don't exist.
Reviewers: arsenm, cfang, nhaehnle
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17305
llvm-svn: 261385
Summary:
This bug was originally fixed in http://reviews.llvm.org/D7201.
However it was broken again by the fix to https://llvm.org/bugs/show_bug.cgi?id=22605.
This patch re-fixes __wrap_iter with GCC by providing a forward declaration of <vector> before the friend declaration in __wrap_iter.
This patch avoids the issues in PR22605 by putting canonical forward declarations in <iosfwd> and including <iosfwd> in <vector>.
<iosfwd> was chosen as the canonical forward declaration headers for the following reasons:
1. `<iosfwd>` is small with almost no dependancies.
2. It already forward declares `std::allocator`
3. It is already included in `<iterator>` which we need to fix the GCC bug.
This patch fixes the test "gcc_workaround.pass.cpp"
Reviewers: mclow.lists, EricWF
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D16345
llvm-svn: 261382
Summary:
According to the C++ standard <stdbool.h> isn't allowed to define `true` `false` or `bool`. However these macros are sometimes defined by the compilers `stdbool.h`.
Clang defines the macros whenever `__STRICT_ANSI__` isn't defined (ie `-std=gnu++11`).
New GCC versions define the macros in C++03 mode only, older GCC versions (4.9 and before) always define the macros.
This patch adds a wrapper header for `stdbool.h` that undefs the required macros.
Reviewers: mclow.lists, rsmith, EricWF
Subscribers: cfe-commits
Differential Revision: http://reviews.llvm.org/D16346
llvm-svn: 261381
option. Previously these options could both be used to specify that you were
compiling the implementation file of a module, with a different set of minor
bugs in each case.
This change removes -fmodule-implementation-of, and instead tracks a flag to
determine whether we're currently building a module. -fmodule-name now behaves
the same way that -fmodule-implementation-of previously did.
llvm-svn: 261372
Figured this would be a problem, but didn't want to jump the gun - large
inputs demonstrate it pretty easily (mostly for type units, but might as
well do the same for CUs too). A random sample 6m27s -> 27s change.
Also, by checking this up-front for CUs (rather than when building the
cu_index) we can probably provide better error messages (see FIXMEs),
hopefully providing the name of the CUs rather than just their
signature.
llvm-svn: 261364
Summary:
When optimizing for size, sqrt calls can be incorrectly selected as
AVX512 VSQRT instructions. This is because X86InstrAVX512.td has a
`Requires<[OptForSize]>` in its `avx512_sqrt_scalar` multiclass
definition. Even if the target does not support AVX512, the class can
apparently still be chosen, leading to an incorrect selection of
`vsqrtss`.
In PR26625, this lead to an assertion: Reg >= X86::FP0 && Reg <=
X86::FP6 && "Expected FP register!", because the `vsqrtss` instruction
requires an XMM register, which is not available on i686 CPUs.
Reviewers: grosbach, resistor, joker.eph
Subscribers: spatel, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D17414
llvm-svn: 261360
Add support for opencl_unroll_hint attribute from OpenCL v2.0 s6.11.5.
Reusing most of metadata generation from CGLoopInfo helper class.
The code is based on Khronos OpenCL compiler:
https://github.com/KhronosGroup/SPIR/tree/spirv-1.0
Patch by Liu Yaxun (Sam)!
Differential Revision: http://reviews.llvm.org/D16686
llvm-svn: 261350
Now that we don't always add an element to AllocatedStackSlots if we
don't find a pre-existing unallocated stack slot, bumping
StatepointMaxSlotsRequired to `NumSlots + 1` is not correct. Instead
bump the statistic near the push_back, to
Builder.FuncInfo.StatepointStackSlots.size().
llvm-svn: 261348
The check on MFI->getObjectSize() has to be on the FrameIndex, not on
the index of the FrameIndex in AllocatedStackSlots. Weirdly, the tests
I added in rL261336 didn't catch this.
llvm-svn: 261347
This patch enables the vectorization of first-order recurrences. A first-order
recurrence is a non-reduction recurrence relation in which the value of the
recurrence in the current loop iteration equals a value defined in the previous
iteration. The load PRE of the GVN pass often creates these recurrences by
hoisting loads from within loops.
In this patch, we add a new recurrence kind for first-order phi nodes and
attempt to vectorize them if possible. Vectorization is performed by shuffling
the values for the current and previous iterations. The vectorization cost
estimate is updated to account for the added shuffle instruction.
Contributed-by: Matthew Simpson and Chad Rosier <mcrosier@codeaurora.org>
Differential Revision: http://reviews.llvm.org/D16197
llvm-svn: 261346
This patches does the following:
+ fix return type: ClangExpressionParser::Parse returns unsigned, but was actually returning a signed value, num_errors.
+ use helper clang::TextDiagnosticBuffer::getNumErrors() instead of counting the errors ourself.
+ limit scoping of block-level automatic variables as much as practical.
+ remove reused multipurpose TextDiagnosticBuffer::const_iterator in favour of loop-scoped err, warn, and note variables in the diagnostic printing code.
+ refactor diagnostic printing loops to use a proper loop invariant.
Author: Luke Drummond <luke.drummond@codeplay.com>
Differential Revision: http://reviews.llvm.org/D17273
llvm-svn: 261345
There is a report in the PR from several months ago that it failed
intermittently, but it is passing consistently for me on FreeBSD 10
and 11. We can re-add a decorator if further testing shows it is
still flakey.
llvm.org/pr17214
llvm-svn: 261340
I ran the test suite yesterday and when I came back this morning the
queue_user_work_item.cc test was hung. This could be why the
sanitizer-windows buildbot keeps randomly timing out. I updated all the
usages of WaitForSingleObject involving threading events. I'm assuming
the API can reliably wait for subprocesses, which is what the majority
of call sites use it for.
While I'm at it, we can simplify some EH tests now that clang can
compile C++ EH.
llvm-svn: 261338
NFCI. They key motivation here is that I'd like to use
SmallBitVector::all() in a later change. Also, using a bit vector here
seemed better in general.
The only interesting change here is that in the failure case of
allocateStackSlot, we no longer (the equivalent of) push_back(true) to
AllocatedStackSlots. As far as I can tell, this is fine, since we'd
never re-use those slots in the same StatepointLoweringState instance.
Technically there was no need to change the operator[] type accesses to
set() and test(), but I thought it'd be nice to make it obvious that
we're using something other than a std::vector like thing.
llvm-svn: 261337
allocateStackSlot did not consider the size of the value to be spilled
before deciding to re-use a spill slot. This was originally okay (since
originally we'd only ever spill pointers), but it became not okay when
we changed our scheme to directly spill vectors of pointers.
While this change fixes the bug pointed out, it has two performance
caveats:
- It matches spill slot and spillee size exactly, while in theory we
can spill, e.g., an 8 byte pointer into a 16 byte slot. This is
slightly complicated to fix since in the stackmaps section, we report
the size of the spill slot as the size of the "indirect value"; and
if they're no longer equivalent, we'll have to keep track of the
(indirect) value size separately from the stack slot size.
- It will "spuriously run out" of reusable slots, since we now have an
second check in the search loop in addition to the availablity
check (e.g. you had two free scalar slots, and you first ask for a
vector slot followed by a scalar slot). I'll fix this in a later
commit.
llvm-svn: 261336
This removes the unusual loop structure in allocateStackSlot in favor of
something more straightforward. I've also removed the cautionary
comment in the function, which I suspect is historical cruft now, and
confuses more than it enlightens.
llvm-svn: 261335