Summary:
The current peeling implementation bails out on loop nests.
This patch introduces a field in the TargetTransformInfo structure that
certain targets can use to relax that constraint when it is
profitable (disabled by default).
An additional option is also added to enable peeling manually for
experimentation and testing purposes.
Reviewers: fhahn, lebedev.ri, xbolva00
Reviewed By: xbolva00
Subscribers: RKSimon, xbolva00, hiraditya, zzheng, llvm-commits
Differential Revision: https://reviews.llvm.org/D70304
Summary:
This revision adds padding for 1-D vectors in the common case of x86
execution with a standard data layout. This supports properly interfacing
codegen with arrays of e.g. `vector<9xf32>`.
Such vectors are already assumed padded to the next power of 2 by LLVM
codegen with the default x86 data layout:
```
define void @test_vector_add_1d_2_3(<3 x float>* nocapture readnone %0,
<3 x float>* nocapture readonly %1, i64 %2, i64 %3, i64 %4, <3 x float>*
nocapture readnone %5, <3 x float>* nocapture readonly %6, i64 %7, i64
%8, i64 %9, <3 x float>* nocapture readnone %10, <3 x float>* nocapture
%11, i64 %12, i64 %13, i64 %14) local_unnamed_addr {
%16 = getelementptr <3 x float>, <3 x float>* %6, i64 1
%17 = load <3 x float>, <3 x float>* %16, align 16
%18 = getelementptr <3 x float>, <3 x float>* %1, i64 1
%19 = load <3 x float>, <3 x float>* %18, align 16
%20 = fadd <3 x float> %17, %19
%21 = getelementptr <3 x float>, <3 x float>* %11, i64 1
```
The pointer addressing a `vector<3xf32>` is assumed aligned `@16`.
Similarly, the pointer addressing a `vector<65xf32>` is assumed aligned
`@512`.
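A minimal sketch of the padding arithmetic described above, assuming 4-byte
f32 elements. The helper names `nextPowerOfTwo` and `paddedVectorBytes` are
hypothetical and are not part of this revision:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two that is not less than `bytes`.
std::uint64_t nextPowerOfTwo(std::uint64_t bytes) {
  assert(bytes > 0 && "size must be non-zero");
  std::uint64_t result = 1;
  while (result < bytes)
    result <<= 1;
  return result;
}

// Hypothetical helper: padded size in bytes of a 1-D vector that holds
// `numElements` elements of `elementBytes` bytes each.
std::uint64_t paddedVectorBytes(std::uint64_t numElements,
                                std::uint64_t elementBytes) {
  return nextPowerOfTwo(numElements * elementBytes);
}
```

Under these assumptions `paddedVectorBytes(3, 4)` gives 16 and
`paddedVectorBytes(65, 4)` gives 512, matching the two alignments quoted above.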
This revision allows using objects such as `vector<3xf32>` properly with
the standard x86 data layout used in the JitRunner. Integration testing
is done out of tree; at the moment such testing fails without this
change.
Reviewers: ftynse
Subscribers: mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75459
These declarations use a mix of unsigned and signed argument and
return types. This is not in accordance with OpenCL v2.0 s6.13.11.
Differential Revision: https://reviews.llvm.org/D74910
Summary: This review is a mostly trivial change to use an explicit ABI flag for the unstable external template list. This follows the practice of one ABI flag per feature, and provides a spot for the rationale / motivation for the flag.
Reviewers: EricWF, ldionne
Subscribers: dexonsmith, libcxx-commits
Tags: #libc
Differential Revision: https://reviews.llvm.org/D75457
Summary:
Enabling _GLIBCXX_DEBUG (implied by LLVM_ENABLE_EXPENSIVE_CHECKS) causes
std::min_element (and presumably others) to no longer be constexpr, which
in turn causes the build to fail.
This seems like a bug in the GCC STL. This change works around it.
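One possible shape of such a workaround is a hand-written constexpr loop that
does not depend on the standard library algorithm. This is only a sketch:
`constexprMin` and `kSizes` are hypothetical names, and the actual change in
this patch may look different.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical replacement for std::min_element on a fixed-size array.
// A plain loop stays constexpr (C++14 and later) regardless of whether the
// standard library's debug mode is enabled.
template <typename T, std::size_t N>
constexpr T constexprMin(const T (&values)[N]) {
  T best = values[0];
  for (std::size_t i = 1; i < N; ++i)
    if (values[i] < best)
      best = values[i];
  return best;
}

// Illustrative data only.
constexpr int kSizes[] = {8, 3, 5};
static_assert(constexprMin(kSizes) == 3, "evaluated at compile time");
```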
Change-Id: I5fc471caa9c4de3ef4e87aeeac8df1b960e8e72c
Reviewers: tstellar, hans, serge-sans-paille
Differential Revision: https://reviews.llvm.org/D75199
- Remove unnecessary includes from the headers
- Fix cppcheck definition/declaration arg mismatch warnings
- Tidy up old comments (MVT usage was removed a long time ago)
- Use SmallVector::append for repeated mask entries
getReductionVars, getInductionVars and getFirstOrderRecurrences were all
being returned from LoopVectorizationLegality as pointers to lists. This
just changes them to be references, cleaning up the interface slightly.
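A small sketch of the kind of interface change described above.
`LegalitySketch` and its members are hypothetical stand-ins, not the real
LoopVectorizationLegality class:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an analysis class that owns a list.
class LegalitySketch {
public:
  // Before: callers receive a pointer and have to dereference it.
  const std::vector<int> *getInductionVarsPtr() const { return &Inductions; }

  // After: a reference cannot be null and is used like the list itself.
  const std::vector<int> &getInductionVars() const { return Inductions; }

  void addInduction(int Id) { Inductions.push_back(Id); }

private:
  std::vector<int> Inductions;
};

std::size_t countViaPointer(const LegalitySketch &L) {
  return L.getInductionVarsPtr()->size();
}

std::size_t countViaReference(const LegalitySketch &L) {
  return L.getInductionVars().size();
}
```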
Differential Revision: https://reviews.llvm.org/D75448
Summary: Added brackets to fix the loop trip count computation.
The brackets ensure the bounds are subtracted before we divide
the result by the step of the loop.
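A minimal illustration of why the brackets matter, assuming a loop of the form
`for (i = lb; i < ub; i += step)` where `ub - lb` is an exact multiple of
`step`. Both function names are hypothetical and this is not the code from the
patch:

```cpp
#include <cassert>
#include <cstdint>

// With brackets: subtract the bounds first, then divide by the step.
std::int64_t tripCount(std::int64_t lb, std::int64_t ub, std::int64_t step) {
  assert(step > 0 && lb <= ub);
  return (ub - lb) / step;
}

// Without brackets: operator precedence divides lb by step first, so the
// result is ub - (lb / step), which is not the trip count.
std::int64_t tripCountMissingBrackets(std::int64_t lb, std::int64_t ub,
                                      std::int64_t step) {
  assert(step > 0 && lb <= ub);
  return ub - lb / step;
}
```

For `lb = 4`, `ub = 20`, `step = 4` the bracketed form gives 4 iterations,
while the unbracketed form gives 19.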
Differential Revision: https://reviews.llvm.org/D75449
I'm making the CHECK lines vague enough that they pass at -O0.
If that is too vague (we really want to check the data flow
to verify that the variables are not mismatched, etc), then
we can adjust those lines again to more closely match the output
at -O0 rather than -O1.
This change is based on the post-commit comments for:
83f4372f3a
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20200224/307888.html
Summary:
This patch reverts 2c5ee78de1.
Kythe (https://github.com/kythe/kythe/issues/4381) now supports returning ctor refs as part of class references, so
there is no need to query the ctor refs in the index (doing so would also
make the results worse, with lots of duplicates).
Reviewers: sammccall
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75439
There are no failures from the first set of RUN lines here,
so the CHECKs were already vague enough to not be affected
by optimizations. The final RUN line does induce some kind
of failure, so I'll try to fix that separately in a
follow-up.
Summary:
Disable merging of Type? into a single token.
Merge ?? ?. and ?[ into a single token.
Reviewers: krasimir, MyDeveloperDay
Reviewed By: krasimir
Subscribers: cfe-commits
Tags: #clang-format, #clang
Differential Revision: https://reviews.llvm.org/D75368
Summary:
This is to ensure that the template declaration is seen before
any template specialization.
Reviewers: mravishankar, antiagainst, rriddle!
Differential Revision: https://reviews.llvm.org/D75442
Summary:
All callers are already passing spelling locations to locateMacroAt.
Also, there's no point in looking at macro expansions to figure out undefs, as
it is forbidden to have PP directives inside macro bodies.
Also fixes a bug when the previous sourcelocation is unavailable.
Reviewers: sammccall, hokein
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75259
This patch upstreams support for the ARM Armv8.1m CPU Cortex-M55.
In detail, it adds support for:
- mcpu option in clang
- Arm Target Features in clang
- llvm Arm TargetParser definitions
Details of the CPU can be found here:
https://developer.arm.com/ip-products/processors/cortex-m/cortex-m55
Reviewers: chill
Reviewed By: chill
Subscribers: dmgreen, kristof.beyls, hiraditya, cfe-commits,
llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D74966
Summary:
Have a description object for the stream functions
that can store different aspects of a single stream operation.
I plan to extend the structure with other members,
for example pre-callback and index of the stream argument.
Reviewers: Szelethus, baloghadamsoftware, NoQ, martong, Charusso, xazax.hun
Reviewed By: Szelethus
Subscribers: rnkovacs, xazax.hun, baloghadamsoftware, szepet, a.sidorin, mikhail.ramalho, Szelethus, donat.nagy, dkrupp, gamesh411, Charusso, martong, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D75158
in C++ templates."
This was reverted in 802b22b5c8 due to
a missing .bc file and a chromium bot failure.
https://bugs.chromium.org/p/chromium/issues/detail?id=1057559#c1
This revision addresses both of them.
Summary:
This patch adds support for debuginfo generation for defaulted
parameters in clang and also extends corresponding DebugMetadata/IR to support this feature.
Reviewers: probinson, aprantl, dblaikie
Reviewed By: aprantl, dblaikie
Differential Revision: https://reviews.llvm.org/D73462
A printer refactoring removed automatic newline printing in the printer
of a ModuleOp. As a consequence, mlir-opt no longer printed a newline
after the closing brace of a module, which made it hard to distinguish
when used from command line. Print the newline character explicitly in
mlir-opt.
Summary:
This patch adds the following LLVM IR intrinsics for SVE:
1. non-temporal gather loads
* @llvm.aarch64.sve.ldnt1.gather
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.ldnt1.gather.scalar.offset
2. non-temporal scatter stores
* @llvm.aarch64.sve.stnt1.scatter
* @llvm.aarch64.sve.stnt1.scatter.uxtw
* @llvm.aarch64.sve.stnt1.scatter.scalar.offset
These intrinsics are mapped to the corresponding SVE instructions
(example for half-words, zero-extending):
* ldnt1h { z0.s }, p0/z, [z0.s, x0]
* stnt1h { z0.s }, p0, [z0.s, x0]
Note that for non-temporal gathers/scatters, the SVE spec defines only
one instruction type: "vector + scalar". For this reason, we swap the
arguments when processing intrinsics that implement the "scalar +
vector" addressing mode:
* @llvm.aarch64.sve.ldnt1.gather
* @llvm.aarch64.sve.ldnt1.gather.uxtw
* @llvm.aarch64.sve.stnt1.scatter
* @llvm.aarch64.sve.stnt1.scatter.uxtw
In other words, all intrinsics for gather-loads and scatter-stores
implemented in this patch are mapped to the same load and store
instruction, respectively.
The sve2_mem_gldnt_vs multiclass (and its counterpart for scatter
stores) from SVEInstrFormats.td was split into:
* sve2_mem_gldnt_vec_vs_32_ptrs (32bit wide base addresses)
* sve2_mem_gldnt_vec_vs_64_ptrs (64bit wide base addresses)
This is consistent with what we did for
@llvm.aarch64.sve.ld1.scalar.offset and highlights the actual split in
the spec and the implementation.
Reviewed by: sdesmalen
Differential Revision: https://reviews.llvm.org/D74858
Summary:
These instructions convert a vector of floats to a vector of integers
of the same size, with assorted non-default rounding modes.
Implemented in IR as target-specific intrinsics, because as far as I
can see there are no matches for that functionality in the standard IR
intrinsics list.
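As a scalar illustration of what a float-to-integer conversion with an
explicit rounding mode means, here is a sketch using the C++ standard library.
The four function names are hypothetical, the choice of these four modes is an
assumption rather than a statement of what the instructions provide, and
inputs are assumed to fit in a 32-bit integer:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Round to nearest, ties away from zero.
std::int32_t toIntTiesAway(float x) {
  return static_cast<std::int32_t>(std::round(x));
}

// Round to nearest, ties to even. Assumes the floating-point environment is
// still in its default FE_TONEAREST mode.
std::int32_t toIntTiesEven(float x) {
  return static_cast<std::int32_t>(std::nearbyint(x));
}

// Round toward plus infinity.
std::int32_t toIntTowardPlusInf(float x) {
  return static_cast<std::int32_t>(std::ceil(x));
}

// Round toward minus infinity.
std::int32_t toIntTowardMinusInf(float x) {
  return static_cast<std::int32_t>(std::floor(x));
}
```

A plain C++ cast truncates toward zero, which is why each of these needs an
explicit rounding step before the cast.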
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75255
Summary:
These instructions make a vector of `<4 x float>` by widening every
other lane of a vector of `<8 x half>`.
I wondered about representing these using standard IR, along the lines
of a shufflevector to extract elements of the input into a `<4 x half>`
followed by an `fpext` to turn that into `<4 x float>`. But it looks as
if that would take a lot of work in isel lowering to make it match any
pattern I could sensibly write in Tablegen, and also I haven't been
able to think of any other case where that pattern might be generated
in IR, so there wouldn't be any extra code generation win from doing
it that way.
Therefore, I've just used another target-specific intrinsic. We can
always change it to the other way later if anyone thinks of a good
reason.
(In order to put the intrinsic definition near similar things in
`IntrinsicsARM.td`, I've also lifted the definition of the
`MVEMXPredicated` multiclass higher up the file, without changing it.)
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: miyuki
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75254
Summary:
The two MVE instructions that convert between v4f32 and v8f16 were
implemented as instances of the same class, with the same MC operand
list.
But that's not really appropriate, because the narrowing conversion
only partially overwrites its output register (it only has 4 f16
values to write into a vector of 8), so even when unpredicated, it
needs a $Qd_src input, a constraint tying that to the $Qd output, and
a vpred_n.
The widening conversion is better represented like any other
instruction that completely replaces its output when unpredicated: it
should have no $Qd_src operand, and instead, a vpred_r containing a
$inactive parameter. That's a better match to other similar
instructions, such as its integer analogue, the VMOVL instruction that
makes a v4i32 by sign- or zero-extending every other lane of a v8i16.
This commit brings the widening VCVT.F32.F16 into line with the other
instructions that behave like it. That means you can write isel
patterns that use it unpredicated, without having to add a pointless
undefined $QdSrc operand.
No existing code generation uses that instruction yet, so there should
be no functional change from this fix.
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75253
Summary:
These instructions work like VMOVN (narrowing a vector of wide values
to half size, and overwriting every other lane of an output register
with the result), except that the narrowing conversion is saturating.
They come in three signedness flavours: signed to signed, unsigned to
unsigned, and signed to unsigned. All are represented in IR by a
target-specific intrinsic that takes two separate 'unsigned' flags.
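A scalar sketch of the three saturating flavours for a 32-bit to 16-bit
narrowing, plus a model of overwriting every other lane of the output. All
names here are hypothetical, and the `oddLanes` flag is an assumption used
only to pick which half of the lanes gets written:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Signed input, signed output: clamp to [INT16_MIN, INT16_MAX].
std::int16_t saturateSignedToSigned(std::int32_t v) {
  v = std::max<std::int32_t>(v, INT16_MIN);
  v = std::min<std::int32_t>(v, INT16_MAX);
  return static_cast<std::int16_t>(v);
}

// Unsigned input, unsigned output: clamp to [0, UINT16_MAX].
std::uint16_t saturateUnsignedToUnsigned(std::uint32_t v) {
  return static_cast<std::uint16_t>(std::min<std::uint32_t>(v, UINT16_MAX));
}

// Signed input, unsigned output: negative values saturate to 0.
std::uint16_t saturateSignedToUnsigned(std::int32_t v) {
  v = std::max<std::int32_t>(v, 0);
  v = std::min<std::int32_t>(v, UINT16_MAX);
  return static_cast<std::uint16_t>(v);
}

// Write the four narrowed values into every other lane of an 8-lane vector,
// leaving the remaining lanes of `dest` untouched.
std::array<std::int16_t, 8>
narrowIntoLanes(std::array<std::int16_t, 8> dest,
                const std::array<std::int32_t, 4> &src, bool oddLanes) {
  for (std::size_t i = 0; i < src.size(); ++i)
    dest[2 * i + (oddLanes ? 1 : 0)] = saturateSignedToSigned(src[i]);
  return dest;
}
```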
Reviewers: MarkMurrayARM, dmgreen, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75252
Summary:
In this patch I've done a slightly bigger rewrite to also remove the
hardcoded header lengths.
Reviewers: jhenderson, dblaikie, ikudrin
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75119
Summary:
This could be considered obvious, but I am putting it up to illustrate
the usefulness/impact of the getInitialLength change.
Reviewers: dblaikie, jhenderson, ikudrin
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75117
The MVE gather instructions smaller than 32 bits zero extend the values
in the offset register, as opposed to sign extending them. We need to
make sure that the code that we select from is suitably extended, which
this patch attempts to fix by tightening up the offset checks.
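A scalar sketch of why the kind of extension matters for a narrow offset. The
helper names are hypothetical and this models only the arithmetic, not the
instruction selection code:

```cpp
#include <cassert>
#include <cstdint>

// Zero extension: the 16-bit pattern is treated as an unsigned value.
std::uint32_t zeroExtend16(std::uint16_t raw) { return raw; }

// Sign extension: the same 16-bit pattern is reinterpreted as a signed
// value first (two's complement assumed), so 0xFFFF becomes -1.
std::int32_t signExtend16(std::uint16_t raw) {
  return static_cast<std::int16_t>(raw);
}

// Address computed by a gather that zero extends its offset.
std::uint32_t gatherAddress(std::uint32_t base, std::uint16_t rawOffset) {
  return base + zeroExtend16(rawOffset);
}
```

For the offset pattern 0xFFFF, zero extension gives 65535 while sign extension
gives -1, so selecting a zero-extending gather for code that expected a
sign-extended offset would address the wrong element.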
Differential Revision: https://reviews.llvm.org/D75361
Move Base64 implementation from clangd/SemanticHighlighting to
llvm/Support/Base64, fix its implementation and provide a decent test suite.
The previous implementation used the + operator instead of | to combine some
results, which is a problem when shifting signed values. A signed char holding
0xFF is implicitly promoted to a (signed) int before the shift, so shifting it
by 16 results in 0xffff0000, which is negative. Combining negative numbers with
a + in that context is not what we want to do.
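A minimal sketch of the signedness problem when packing three bytes into one
24-bit group. `packWithPlus` and `packWithOr` are hypothetical names, not the
code from clangd or llvm/Support/Base64; the second function shows one way to
avoid the problem, by converting each byte to unsigned char before shifting
and combining with |:

```cpp
#include <cassert>
#include <cstdint>

// Problematic shape: each byte is sign extended to 32 bits before it is
// shifted, and the pieces are added. The shift is done on an unsigned value
// so that the arithmetic in this sketch stays well defined.
std::uint32_t packWithPlus(signed char a, signed char b, signed char c) {
  std::uint32_t hi = static_cast<std::uint32_t>(static_cast<std::int32_t>(a)) << 16;
  std::uint32_t mid = static_cast<std::uint32_t>(static_cast<std::int32_t>(b)) << 8;
  std::uint32_t lo = static_cast<std::uint32_t>(static_cast<std::int32_t>(c));
  return hi + mid + lo;
}

// Safer shape: drop the sign by going through unsigned char, then combine
// the non-overlapping bit fields with |.
std::uint32_t packWithOr(signed char a, signed char b, signed char c) {
  std::uint32_t hi = static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 16;
  std::uint32_t mid = static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8;
  std::uint32_t lo = static_cast<std::uint32_t>(static_cast<unsigned char>(c));
  return hi | mid | lo;
}
```

With the bytes 0x01, 0xFF, 0x00 (the middle one being -1 as a signed char),
`packWithPlus` yields 0x0000FF00 because the sign-extended middle byte borrows
from the top byte, while `packWithOr` yields the intended 0x0001FF00.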
This fixes https://github.com/llvm/llvm-project/issues/149.
Differential Revision: https://reviews.llvm.org/D75057