Replaces setTargetShuffleZeroElements with getTargetShuffleAndZeroables which reports the Zeroable elements but doesn't merge them into the decoded target shuffle mask (the merging has been moved up into getTargetShuffleInputs until we can get rid of it entirely).
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373867
Summary:
The overload resolution for enums with a fixed underlying type has changed in the C++14 standard. This patch implements the new rule.
Patch by Mark de Wever!
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D65695
llvm-svn: 373866
Summary:
The default legalization for v16i64->v16i8 tries to create a multiple stage truncate concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops. So it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpcks are all single uop instructions.
I tried to handle this by just custom splitting the v16i64->v16i8 shuffle. And hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68428
llvm-svn: 373864
Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare.
Reviewers: RKSimon, spatel, efriedma
Reviewed By: spatel
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68359
llvm-svn: 373863
Summary:
When using a user-defined conversion function template with a deduced return type the compiler gives a set of warnings:
```
bug.cc:252:44: error: cannot specify any part of a return type in the declaration of a conversion function; use an alias template to declare a conversion to 'auto (Ts &&...) const'
template <typename... Ts> operator auto()(Ts &&... xs) const;
^~~~~~~~~~~~~~~~~~~
bug.cc:252:29: error: conversion function cannot convert to a function type
template <typename... Ts> operator auto()(Ts &&... xs) const;
^
error: pointer to function type cannot have 'const' qualifier
```
after which it triggers an assertion failure. It seems the last error is incorrect and doesn't have any location information. This patch stops the compilation after the second warning.
Fixes bug 31422.
Patch by Mark de Wever!
Reviewers: rsmith
Reviewed By: rsmith
Subscribers: bbannier, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D64820
llvm-svn: 373862
Summary: The assertion in getLoopGuardBranch can be a 'return nullptr'
under if condition.
Authored By: DTharun
Reviewer: Whitney, fhahn
Reviewed By: Whitney, fhahn
Subscribers: fhahn, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D66084
llvm-svn: 373857
Removes Programming Documentation page. Also moves existing topics on Programming Documentation page to User Guides and Reference pages.
llvm-svn: 373856
Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform.
Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68250
llvm-svn: 373850
We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly.
This allows us to remove createTargetShuffleMask.
This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle masks isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask.
llvm-svn: 373846
Summary: The Google C++ and Chromium style guides are broken in the clang-format docs. This patch updates them.
Reviewers: djasper, MyDeveloperDay
Reviewed By: MyDeveloperDay
Subscribers: cfe-commits
Tags: #clang
Patch by: m4tx
Differential Revision: https://reviews.llvm.org/D61256
llvm-svn: 373844
As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop.
This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so its just the original SETCC ops that gets extended.
Differential Revision: https://reviews.llvm.org/D68226
llvm-svn: 373834
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708https://bugs.llvm.org/show_bug.cgi?id=43146
We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).
Here, we add a scalar cost model adjustment with a conservative pattern match and cost
summation for a multi-instruction sequence that can probably be reduced later.
This should prevent SLP from creating a vector reduction unless that sequence is
extremely cheap.
In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:
movbe rax, qword ptr [rdi]
or:
mov rax, qword ptr [rdi]
Not some (half) vector monstrosity as we currently do using SLP:
vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
movzx eax, byte ptr [rdi]
movzx ecx, byte ptr [rdi + 5]
shl rcx, 40
movzx edx, byte ptr [rdi + 6]
shl rdx, 48
or rdx, rcx
movzx ecx, byte ptr [rdi + 7]
shl rcx, 56
or rcx, rdx
or rcx, rax
vextracti128 xmm1, ymm0, 1
vpor xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
vpor xmm0, xmm0, xmm1
vmovq rax, xmm0
or rax, rcx
vzeroupper
ret
Differential Revision: https://reviews.llvm.org/D67841
llvm-svn: 373833
Rename some variables to match lowerShuffleAsRepeatedMaskAndLanePermute - prep work toward adding some equivalent sublane functionality.
llvm-svn: 373832
Added some tests testing urem and srem operations with a constant divisor.
Patch by TG908 (Tim Gymnich)
Differential Revision: https://reviews.llvm.org/D68421
llvm-svn: 373830
The static analyzer is warning about potential null dereferences, but we should be able to use castAs<> directly and if not assert will fire for us.
llvm-svn: 373829
The static analyzer is warning about potential null dereferences, but we should be able to use castAs<> directly and if not assert will fire for us.
llvm-svn: 373827
The static analyzer is warning about potential null dereferences, but we should be able to use castAs<> directly and if not assert will fire for us.
llvm-svn: 373826
The static analyzer is warning about potential null dereferences, but we should be able to use castAs<> directly and if not assert will fire for us.
llvm-svn: 373824
Summary:
This patch makes the `SpacesInSquareBrackets` setting also apply to C++ lambdas with parameters.
Looking through the revision history, it appears support for only array brackets was added, and lambda brackets were ignored. Therefore, I am inclined to think it was simply an omission, rather than a deliberate choice.
See https://bugs.llvm.org/show_bug.cgi?id=17887 and https://reviews.llvm.org/D4944.
Reviewers: MyDeveloperDay, reuk, owenpan
Reviewed By: MyDeveloperDay
Subscribers: cfe-commits
Patch by: mitchell-stellar
Tags: #clang-format, #clang
Differential Revision: https://reviews.llvm.org/D68473
llvm-svn: 373821
This looks like a defect in gcc-5 where it chooses a constexpr
constructor from the initializer-list that it considers to be explicit.
I've tried to reproduce but I can't install anything prior to gcc-6 easily
on my system, and that doesn't have the error. So this is speculative
pacification.
Reported by Steven Wan.
llvm-svn: 373820
The motivation is to reuse the key value parsing logic here to
parse instance specific pass options within the context of MLIR.
The primary functionality exposed is the "," splitting for
arrays and the logic for properly handling duplicate definitions
of a single flag.
Patch by: Parker Schuh <parkers@google.com>
Differential Revision: https://reviews.llvm.org/D68294
llvm-svn: 373815
This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't be ordered w/respect to side effects which followed before the end of the block.
Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well.
Spotted via manual inspecting of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant".
llvm-svn: 373814