Similar to what we do for vXi8 ASHR(X, 7), use SSE42's PCMPGTQ to splat the sign instead of using the PSRAD+PSHUFD.
Avoiding bitcasts this improves combines that utilize computeNumSignBits, permits memory folding and reduces pipe pressure. Although it does require a second register, given that this is a (cheap) zero register the impact is minimal.
Differential Revision: https://reviews.llvm.org/D32973
llvm-svn: 302525
There is no other explanation about why this only started happening
now, even though it crashes on old code (supposedly reachable from
here).
The only common factor between the failing bots is that they use GCC
(4.9 and 5.3) to compile Clang, while the others use Clang 3.8, but the
failure is while building the tests, as an assertion, on Clang.
Commenting it out for now in hope the bots will go back green, but we
should keep looking for the real cause, and update bugzilla.
llvm-svn: 302520
Summary: PPCGCodeGeneration now attaches the size of the kernel launch parameters at the end of the parameter list. For the existing CUDA Runtime, this gets ignored, but the OpenCL Runtime knows to check for kernel-argument size at the end of the parameter list. (The resulting parameters list is twice as long. This has been accounted for in the corresponding test cases).
Reviewers: grosser, Meinersbur, bollu
Reviewed By: bollu
Subscribers: nemanjai, yaxunl, Anastasia, pollydev, llvm-commits
Tags: #polly
Differential Revision: https://reviews.llvm.org/D32961
llvm-svn: 302515
- This change allows targets to opt-in to using them instead of the log2
shufflevector algorithm.
- The SLP and Loop vectorizers have the common code to do shuffle reductions
factored out into LoopUtils, and now have a unified interface for generating
reductions regardless of the preference of the target. LoopUtils now uses TTI
to determine what kind of reductions the target wants to handle.
- For CodeGen, basic legalization support is added.
Differential Revision: https://reviews.llvm.org/D30086
llvm-svn: 302514
This fixes the bug: https://bugs.llvm.org/show_bug.cgi?id=32638
int main()
{
[](auto x) noexcept(noexcept(x)) { } (0);
}
In the above code, prior to this patch, when substituting into the noexcept expression, i.e. transforming the DeclRefExpr that represents 'x' - clang attempts to capture 'x' because Sema's CurContext is still pointing to the pattern FunctionDecl (i.e. the templated-decl set in FinishTemplateArgumentDeduction) which does not match the substituted 'x's DeclContext, which leads to an attempt to capture and an assertion failure.
We fix this by adjusting Sema's CurContext to point to the substituted FunctionDecl under which the noexcept specifier's argument should be transformed, and so the ParmVarDecl that 'x' refers to has the same declcontext and no capture is attempted.
I briefly investigated whether the SwitchContext should occur right after VisitMethodDecl creates the new substituted FunctionDecl, instead of only during instantiating the exception specification - but seeing no other code that seemed to rely on that, I decided to leave it just for the duration of the exception specification instantiation.
llvm-svn: 302507
We were sometimes doing a function->pointer conversion in
Sema::CheckPlaceholderExpr, which isn't the job of CheckPlaceholderExpr.
So, when we saw typeof(OverloadedFunctionName), where
OverloadedFunctionName referenced a name with only one function that
could have its address taken, we'd give back a function pointer type
instead of a function type. This is incorrect.
I kept the logic for doing the function pointer conversion in
resolveAndFixAddressOfOnlyViableOverloadCandidate because it was more
consistent with existing ResolveAndFix* methods.
llvm-svn: 302506
When a type in a class is from a typedef, only check the canonical type. Skip
checking the intermediate underlying types. This is in response to PR 32965
llvm-svn: 302505
This reverts commit r302461.
It appears to be causing failures compiling gtest with debug info on the
Linux sanitizer bot. I was unable to reproduce the failure locally,
however.
llvm-svn: 302504
It appears that the code is actually dead since unbridged-cast
placeholder types are created by calling CastOperation::complete and
ImplicitCastExprs are never passed to it.
Spotted by Vedant Kumar.
rdar://problem/31542226
llvm-svn: 302503
Summary:
r284533 added hot and cold section prefixes based on profile
information, to enable grouping of hot/cold functions at link time.
However, it used "cold" as the prefix for cold sections, but gold only
recognizes "unlikely" (which is used by gcc for cold sections).
Therefore, cold sections were not properly being grouped. Switch to
using "unlikely"
Reviewers: danielcdh, davidxl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32983
llvm-svn: 302502
Previously, the force includes would complain about a missing _DEBUG symbol.
Now we dump macros before adding the force includes to the command line.
Now with proper newlines.
llvm-svn: 302497
blocks.
r302270 made changes to avoid emitting clang.arc.use at -O0 and instead
emit @objc_release. We also have to emit @objc_retain for the captured
variable at -O0 to match the @objc_release instead of just storing the
pointer to the capture field.
llvm-svn: 302495
Summary:
We define the `__xray_customeevent` builtin that gets translated to
IR calls to the correct intrinsic. The default implementation of this is
a no-op function. The codegen side of this follows the following logic:
- When `-fxray-instrument` is not provided in the driver, we elide all
calls to `__xray_customevent`.
- When `-fxray-instrument` is enabled and a function is marked as "never
instrumented", we elide all calls to `__xray_customevent` in that
function; if either marked as "always instrumented" or subject to
threshold-based instrumentation, we emit a call to the
`llvm.xray.customevent` intrinsic from LLVM for each
`__xray_customevent` occurrence in the function.
This change depends on D27503 (to land in LLVM first).
Reviewers: echristo, rsmith
Subscribers: mehdi_amini, pelikan, lrl, cfe-commits
Differential Revision: https://reviews.llvm.org/D30018
llvm-svn: 302492
In r298391 we fixed the umbrella framework model to work when submodules
named "Private" are used. This complements the work by allowing the
umbrella framework model to work in general.
rdar://problem/31790067
llvm-svn: 302491
This patch fixes the test failures and unexpected passes that occur
when testing against GCC 7. Specifically:
* don't mark __gcd as always inline because it's a recursive function. GCC diagnoses this.
* don't XFAIL the aligned allocation tests. GCC 7 supports them but not the -faligned-allocation option.
* Work around gcc.gnu.org/PR78489 in variants constructors.
llvm-svn: 302488
Summary:
For inalloca functions, this is a very common code pattern:
%argpack = type <{ i32, i32, i32 }>
define void @f(%argpack* inalloca %args) {
entry:
%a = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 0
%b = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 1
%c = getelementptr inbounds %argpack, %argpack* %args, i32 0, i32 2
tail call void @llvm.dbg.declare(metadata i32* %a, ... "a")
tail call void @llvm.dbg.declare(metadata i32* %c, ... "b")
tail call void @llvm.dbg.declare(metadata i32* %b, ... "c")
Even though these GEPs can be simplified to a constant offset from EBP
or RSP, we don't do that at -O0, and each GEP is computed into a
register. Registers used to compute argument addresses are typically
spilled and clobbered very quickly after the initial computation, so
live debug variable tracking loses information very quickly if we use
DBG_VALUE instructions.
This change moves processing of dbg.declare between argument lowering
and basic block isel, so that we can ask if an argument has a frame
index or not. If the argument lives in a register as is the case for
byval arguments on some targets, then we don't put it in the side table
and during ISel we emit DBG_VALUE instructions.
Reviewers: aprantl
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32980
llvm-svn: 302483