Commit Graph

64005 Commits

Author SHA1 Message Date
Jinsong Ji 8671191d26 [NFC][PowerPC] Small code refactor in LoopInstrFormPrep
Avoid some duplicate code.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D109083
2021-09-02 03:16:01 +00:00
Chen Zheng 2596120199 [PowerPC] small code format refactor ; NFC
address the code review comments in patch https://reviews.llvm.org/D105872
2021-09-02 01:39:32 +00:00
Jon Roelofs 9237eda304 Revert "[AArch64][GlobalISel] Legalize bswap <2 x i16>"
This reverts commit 5cd63e9ec2.

https://bugs.llvm.org/show_bug.cgi?id=51707

The sequence feeding in/out of the rev32/ushr isn't quite right:

 _swap:
         ldr     h0, [x0]
         ldr     h1, [x0, #2]
-        mov     v0.h[1], v1.h[0]
+        mov     v0.s[1], v1.s[0]
         rev32   v0.8b, v0.8b
         ushr    v0.2s, v0.2s, #16
-        mov     h1, v0.h[1]
+        mov     s1, v0.s[1]
         str     h0, [x0]
         str     h1, [x0, #2]
         ret
2021-09-01 16:49:20 -07:00
Stanislav Mekhanoshin f3645c792a [AMDGPU] Use S_BITCMP1_* to replace AND in optimizeCompareInstr
Differential Revision: https://reviews.llvm.org/D109082
2021-09-01 15:59:12 -07:00
Stanislav Mekhanoshin bf77b11277 [AMDGPU] Introduce optimizeCompareInstr
The following patterns are currently handled:

s_cmp_eq_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_eq_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_eq_u64 (s_and_b64 $src, 1), 1 => s_and_b64 $src, 1
s_cmp_ge_u32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_ge_i32 (s_and_b32 $src, 1), 1 => s_and_b32 $src, 1
s_cmp_lg_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_lg_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_lg_u64 (s_and_b64 $src, 1), 0 => s_and_b64 $src, 1
s_cmp_gt_u32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1
s_cmp_gt_i32 (s_and_b32 $src, 1), 0 => s_and_b32 $src, 1

Differential Revision: https://reviews.llvm.org/D109031
2021-09-01 15:57:05 -07:00
Alexander Pivovarov 4b04d54206 [RISCV] Fix typo in RISCVSchedSiFive7.td
Fix typo in "microarchitecure".

Differential Revision: https://reviews.llvm.org/D109006
2021-09-01 16:39:48 -05:00
David Green 49476a4d66 [ARM] Add MVE lowering for fptosi.sat
This adds lowering of the llvm.fptosi.sat and llvm.fptoui.sat intrinsics,
selecting a VCVT instruction which under MVE will inherently perform the
saturate.
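
As a rough illustration (a hand-written sketch, not a test from the
patch), this is the kind of call that can now be selected to a single
VCVT, which saturates on its own under MVE:

```
declare <4 x i32> @llvm.fptosi.sat.v4i32.v4f32(<4 x float>)

define <4 x i32> @cvt(<4 x float> %x) {
  %r = call <4 x i32> @llvm.fptosi.sat.v4i32.v4f32(<4 x float> %x)
  ret <4 x i32> %r
}
```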

Differential Revision: https://reviews.llvm.org/D107865
2021-09-01 22:38:47 +01:00
alex-t e3cbf1d437 [AMDGPU] enable scalar compare in truncate selection
Currently, the truncate selection dag node is expanded as a bitwise AND plus compare to 1.  This change enables scalar comparison in the pattern if the truncate node is uniform.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108925
2021-09-01 23:35:11 +03:00
Nikita Popov 7f058ce8c2 [WebAssembly] Support opaque pointers in FixFunctionBitcasts
With opaque pointers, no actual bitcasts will be present. Instead,
there will be a mismatch between the call FunctionType and the
function ValueType. Change the code to collect CallBases
specifically (rather than general Uses) and compare these types.

RAUW is no longer performed, as there would no longer be any
bitcasts that can be RAUWd.
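
A minimal hand-written sketch of the case being handled (function names
are hypothetical): with opaque pointers the mismatch is visible only in
the call's function type, not as a bitcast:

```
declare void @callee(i32)

define void @caller() {
  ; the call uses function type void (), while @callee's type is void (i32)
  call void @callee()
  ret void
}
```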

Differential Revision: https://reviews.llvm.org/D108880
2021-09-01 22:17:24 +02:00
Craig Topper ccbb4c8b4f [RISCV] Fold (RISCVISD::SELECT_CC X, Y, CC, Z, Z) -> Z.
If the true and false values are the same, we don't need a SELECT_CC.

This would normally be folded before a select is legalized to
select_cc. The test case exploits the late legalization of vscale
to trigger a case where they become identical after legalization.

This works around an issue found on a test case in D107957. In that
case the true/false values were both eventually 0 and the select was
used by a vector AVL operand. The select_cc got expanded to control
flow and a phi, but the phi inputs were both copies from X0. MachineIR
optimizations simplified this to a single copy from X0 going into the
vector instruction. This became the input of a vsetvli after vsetvli
insertion. Then register coalescing folded the copy into the vsetvli.
X0 as the source of a vsetvli is a special encoding and should not be
created by coalescing. We need to fix our vsetvli handling to make sure
this can never happen any other way, but removing the unneeded select
is still a worthwhile optimization.
2021-09-01 12:37:52 -07:00
Philip Reames 29fa37ec9f [SCEV] If max BTC is zero, then so is the exact BTC [2 of 2]
This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remaining cause of max counts being more precise than exact counts is that we apply context-sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch.

The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because they happened to be identical. When we optimized one with context-sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll

Differential Revision: https://reviews.llvm.org/D109015
2021-09-01 11:51:48 -07:00
Arthur Eubanks b9b419a13c [NFC] Remove redundant code added in 04ce2de3 2021-09-01 11:30:07 -07:00
Thomas Lively fec4749200 [WebAssembly] Lower v2f32 to v2f64 extending loads with promote_low
Previously extra wide v4f32 to v4f64 extending loads would be legalized to v2f32
to v2f64 extending loads, which would then be scalarized by legalization. (v2f32
to v2f64 extending loads not produced by legalization were already being emitted
correctly.) Instead, mark v2f32 to v2f64 extending loads as legal and explicitly
lower them using promote_low. This regresses the addressing modes supported for
the extloads not produced by legalization, but that's a fine trade off for now.
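
As a sketch (hand-written, using opaque-pointer syntax), this is the
shape of a v2f32 to v2f64 extending load that is now marked legal and
lowered via promote_low:

```
define <2 x double> @extload(ptr %p) {
  %v = load <2 x float>, ptr %p
  %w = fpext <2 x float> %v to <2 x double>
  ret <2 x double> %w
}
```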

Differential Revision: https://reviews.llvm.org/D108496
2021-09-01 10:27:42 -07:00
Amara Emerson a86bbe1e31 [AArch64][GlobalISel] Handle any-extending FPR loads in manual selection code.
When we have an any-extending FPR bank load, none of the tablegen patterns
match and we fall back to the C++ selector. Like with the truncating stores
that were fixed recently, the C++ wasn't able to handle it and ended up
generating invalid copies between different size regclasses.

This change adds handling for this case, splitting the load into a regular
load and a SUBREG_TO_REG to extend it into the original wide destination reg.
2021-09-01 10:19:22 -07:00
hsmahesha 97688bfd3d Revert "Revert "Disable ReplaceLDS pass, patch up tests to match""
This reverts commit 5ae6804d17.
2021-09-01 21:52:50 +05:30
hsmahesha 5ae6804d17 Revert "Disable ReplaceLDS pass, patch up tests to match"
This reverts commit 50ad3478bd.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D109062
2021-09-01 21:19:39 +05:30
Alexander Kornienko 893ac53afc Fix -Wunused-variable 2021-09-01 11:29:30 +02:00
Christudasan Devadasan 4dab15288d [AMDGPU] Introduce RC flags for vector register classes
Configure and use the TSFlags in TargetRegisterClass to
have unique flags for VGPR and AGPR register classes.
The vector register class queries like `hasVGPRs` will
now become more efficient with just a bitwise operation.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108815
2021-09-01 02:55:45 -04:00
Kai Luo 5eaebd5d64 [PowerPC] Implement quadword atomic load/store
Add support to load/store i128 atomically.
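
A minimal sketch of what is now supported (hand-written, opaque-pointer
syntax):

```
define i128 @load_quad(ptr %p) {
  %v = load atomic i128, ptr %p acquire, align 16
  ret i128 %v
}

define void @store_quad(ptr %p, i128 %v) {
  store atomic i128 %v, ptr %p release, align 16
  ret void
}
```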

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D105612
2021-09-01 06:55:40 +00:00
Luke a78dd726f4 [SLP][RISCV] Implement unsigned getMinVectorRegisterBitWidth() for RISCV
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108973
2021-09-01 14:25:15 +08:00
hsmahesha 98f4713122 [AMDGPU] Split entry basic block after alloca instructions.
While initializing the LDS pointers within the entry basic block of kernel(s), make
sure that the entry basic block is split after alloca instructions.
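
Roughly (a hand-written sketch; block and value names are illustrative),
the entry block ends up split right after the allocas, and the LDS
pointer initialization goes into the new block:

```
define amdgpu_kernel void @kern() {
entry:
  %buf = alloca i32, addrspace(5)
  br label %entry.split          ; split point after the allocas

entry.split:
  ; LDS pointer initialization is inserted here
  store i32 0, ptr addrspace(5) %buf
  ret void
}
```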

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108971
2021-09-01 10:18:44 +05:30
Wang, Pengfei 74043caef2 [X86] Enable half type support in inline assembly constraints
Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105799
2021-09-01 09:29:31 +08:00
Nick Desaulniers e9b3f25730 [RISCVISelLowering] avoid emitting libcalls to __mulodi4() and __multi3()
Similar to D108842, D108844, D108926, D108928, and D108936.

__has_builtin(__builtin_mul_overflow) returns true for 32b RISCV targets,
but Clang is deferring to compiler RT when encountering long long types.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D108939
2021-08-31 11:23:56 -07:00
Nick Desaulniers d8b6ae072d [PPCISelLowering] avoid emitting libcalls to __mulodi4()
Similar to D108842, D108844, and D108926.

__has_builtin(__builtin_mul_overflow) returns true for 32b PPC targets,
but Clang is deferring to compiler RT when encountering long long types.
This breaks ppc44x_defconfig + CONFIG_BLK_DEV_NBD=y builds of the Linux
kernel that are using builtin_mul_overflow with these types for these
targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support these builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D108936
2021-08-31 11:09:58 -07:00
Joe Nash c96839265a [AMDGPU] Enable ds_min/ds_max on more subtargets
Adds patterns for f64 ds_min/ds_max. Shrinks HasLDSFPAtomics
scope to enable f32.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D108994

Change-Id: Id890b677841ee588b20d42b1bb3f4cdbf6e9ba1a
2021-08-31 13:22:31 -04:00
David Green 22c384129e [ARM] Add missing validForTailPredication for VMINNM/VMAXNM
Apparently this was missing, preventing the generation of tail
predication loops containing VMINNM, VMAXNM, VMINNMA and VMAXNMA.
2021-08-31 18:19:03 +01:00
Simon Pilgrim 9e2d14c285 [X86] Copy X86SchedSkylakeServer.td to X86SchedIceLake.td
Icelake, Rocketlake and Tigerlake targets currently use the SkylakeServer scheduler model, despite being a later microarchitecture, leading to both reported bugs (PR48110) and discrepancies when comparing llvm-mca reports to other profiling tools (OSACA, uops, uica, etc.). And tbh I'm getting sick of llvm-mca getting blamed for what are backend scheduler model issues :-(

This patch doesn't attempt to fix any of these discrepancies - there should be no changes in codegen - it's a setup patch that copies the skx model, renames all the resources, adds the additional ports (but doesn't reference them yet) and updates the llvm-exegesis pfm counter mappings (based off https://sourceforge.net/p/perfmon2/libpfm4/ci/master/tree/lib/events/intel_icl_events.h).

This should make it trivial for anyone with hardware access to use llvm-exegesis reports to iteratively improve the model (my attempts to get hold of a cheap tiger lake box haven't been fruitful yet....).

I will copy the SkylakeServer llvm-mca resource tests as follow up commits - the diff should entirely be the resource renames.

Differential Revision: https://reviews.llvm.org/D108914
2021-08-31 11:57:20 +01:00
Simon Wallis f417b660ee [Arm] Add assert in T2 Imm7s code emitter
Add assert to provoke failure in object file output, not just in disassembly output.

Reviewed By: yroux

Differential Revision: https://reviews.llvm.org/D107259
2021-08-31 08:16:48 +01:00
Alexander Pivovarov eb946cc5b6 Fix typo in comments
Reviewed By: MaskRay, jsji

Differential Revision: https://reviews.llvm.org/D108857
2021-08-31 11:55:40 +05:30
Heejin Ahn 3419e85b15 [WebAssembly] Free setjmpTable before exiting calls in EmSjLj
This is an improvement over D107852. We don't need to enumerate specific
function names; we can just check for the `noreturn` attribute. This also
requires us to make sure `__resumeException` and `emscripten_longjmp`
have the `noreturn` attribute too; one of them is a JS function and the
other calls a JS function so Clang does not have a way to deduce they
don't return.

This is effectively NFC, because I'm not sure there is an additional
case this covers; if we add a custom function call that has the
`noreturn` attribute, it will be processed within the SjLj handling and
turned into an `__invoke` call. So this really applies to some special
functions like `emscripten_longjmp`.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D108955
2021-08-30 21:46:25 -07:00
Heejin Ahn b8fc71b7ae [WebAssembly] Share rethrowing BBs in LowerEmscriptenEHSjLj
There are four kinds of "rethrowing" BBs in this pass:
1. In Emscripten SjLj, after a possibly longjmping function call, we
   check if the thrown longjmp corresponds to one of setjmps within the
   current function. If not, we rethrow the longjmp by calling
   `emscripten_longjmp`.
2. In Emscripten EH, after a possibly throwing function call, we check
   if the thrown exception corresponds to the current `catch` clauses.
   If not, we rethrow the exception by calling `__resumeException`.
3. When both Emscripten EH and SjLj are used, when we check for an
   exception after a possibly throwing function call, it is possible
   that we get not an exception but a longjmp. In this case, we
   shouldn't swallow it; we should rethrow the longjmp by calling
   `emscripten_longjmp`.
4. When both Emscripten EH and SjLj are used, when we check for a
   longjmp after a possibly longjmping function call, it is possible
   that we get not a longjmp but an exception. In this case, we
   shouldn't swallow it; we should rethrow the exception by calling
   `__resumeException`.

Case 1 is in Emscripten SjLj, 2 is in Emscripten EH, and 3 and 4 are
relevant when both Emscripten EH and SjLj are used. 3 and 4 were first
implemented in D106525.

We create BBs for 1, 3, and 4 in this pass. We create those BBs for
every throwing/longjmping function call, along with other BBs that
contain condition checks. What this CL does is to create a single BB
within a function for each of cases 1, 3, and 4. These BBs are exiting
BBs in the function and thus don't have successors, so they are easy to
share between calls.

The names of BBs created are:
Case 1: `call.em.longjmp`
Case 3: `rethrow.exn`
Case 4: `rethrow.longjmp`

For case 2 we don't currently create BBs; we only replace the
existing `resume` instruction with `call @__resumeException`. And Clang
already creates only a single `resume` BB per function and reuses it,
so we don't need to optimize this case.

Not sure what are good benchmarks for EH/SjLj, but this decreases the
size of the object file for `grfmt_jpeg.bc` (presumably from opencv) we
got from one of our users by 8.9%. Even after running `wasm-opt -O4` on
them, there is still 4.8% improvement.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D108945
2021-08-30 21:44:34 -07:00
Nikita Popov c1b7540645 [TTI] Sink IVDescriptors.h include (NFC)
Forward declare RecurrenceDescriptor and include IVDescriptors.h
only in implementation code that actually needs it.
2021-08-30 22:41:58 +02:00
Owen Anderson db9de22f2b Teach the AArch64 backend patterns to generate the EOR3 instruction.
Adds patterns to match the EOR3 instruction.
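
A sketch of the IR shape that the new patterns match (assuming the SHA3
feature is available; names are illustrative):

```
define <16 x i8> @eor3(<16 x i8> %a, <16 x i8> %b, <16 x i8> %c) {
  %t = xor <16 x i8> %a, %b
  %r = xor <16 x i8> %t, %c
  ret <16 x i8> %r
}
```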

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D108793
2021-08-30 20:01:08 +00:00
David Green efa340fbd2 [ARM] Workaround tailpredication min/max costmodel
The min/max intrinsics are not yet canonical, but when they are the tail
predication analysis will change from treating them like icmp to
treating them like intrinsics. Unfortunately, they can currently produce
better code by not being tail predicated, thanks to the vectorizer picking
higher VFs and the backend folding to better instructions (especially
for saturate patterns). In the long run we will need to improve the
vectorizer's cost modelling, recognizing the instruction directly, but in
the meantime this treats min/max as before to prevent performance
regressions.
2021-08-30 19:19:51 +01:00
Nikita Popov 0529e2e018 [InstrInfo] Use 64-bit immediates for analyzeCompare() (NFCI)
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInstr() as well. This avoids truncation for
targets that support immediates larger than 32 bits. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.

This is a followup to D108076.

Differential Revision: https://reviews.llvm.org/D108875
2021-08-30 19:46:04 +02:00
Kazu Hirata c50faffb4e [llvm] Remove redundant calls to str() and c_str() (NFC)
Identified with readability-redundant-string-cstr.
2021-08-30 09:05:05 -07:00
Craig Topper 0560a4adb3 [RISCV] Enable CONCAT_VECTORS for fixed FP vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D108487
2021-08-30 08:47:45 -07:00
Simon Pilgrim af2920ec6f [TTI][X86] getArithmeticInstrCost - move opcode canonicalization before all target-specific costs. NFCI.
The GLM/SLM special cases still get tested first but after the MUL/DIV/REM pattern detection - this will be necessary for when we make the SLM vXi32 MUL canonicalization generic to improve PMULLW/PMULHW/PMADDWD cost support etc.
2021-08-30 12:24:59 +01:00
Wang, Pengfei ab40dbfe03 [X86] AVX512FP16 instructions enabling 6/6
Enable FP16 complex FMA instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105269
2021-08-30 13:08:45 +08:00
Qiu Chaofan 3bdd850d0c [PowerPC] Set branch/call instructions as no hasSideEffects
PowerPC can model these instructions, so we don't need this flag set.

Reviewed By: shchenz, jsji

Differential Revision: https://reviews.llvm.org/D71983
2021-08-30 12:23:35 +08:00
Vince Bridgers 55ba1de7c5 [X86] Remove X86LowerAMXType::getRowFromCol from X86LowerAMXType.cpp
Remove method X86LowerAMXType::getRowFromCol since it's not used, and
it's causing a warning.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D108862
2021-08-29 12:27:34 -05:00
Yonghong Song 4948927058 [BPF] support btf_tag attribute in .BTF section
A new kind BTF_KIND_TAG is added to .BTF to encode
btf_tag attributes. The format looks like
   CommonType.name : attribute string
   CommonType.type : attached to a struct/union/func/var.
   CommonType.info : encoding BTF_KIND_TAG
                     kflag == 1 to indicate the attribute is
                     for CommonType.type, or kflag == 0
                     for struct/union member or func argument.
   one uint32_t    : to encode which member/argument starting from 0.

If one particular type or member/argument has more than one attribute,
multiple BTF_KIND_TAG will be generated.

Differential Revision: https://reviews.llvm.org/D106622
2021-08-28 21:02:27 -07:00
Nikita Popov 16086d47c0 [WebAssembly] Fix FastISel of condition in different block (PR51651)
If the icmp is in a different block, then the register for the icmp
operand may not be initialized, as it nominally does not have
cross-block uses. Add a check that the icmp is in the same block
as the branch, which should be the common case.
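
A reduced hand-written sketch of the shape the new check guards against
(not the PR51651 reproducer itself): the condition is defined in a
different block than the branch that uses it:

```
define i32 @f(i32 %x) {
entry:
  %c = icmp eq i32 %x, 0
  br label %next

next:
  ; the branch condition comes from another block
  br i1 %c, label %a, label %b

a:
  ret i32 1

b:
  ret i32 0
}
```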

This matches what X86 FastISel does:
5b6b090cf2/llvm/lib/Target/X86/X86FastISel.cpp (L1648)

The "not" transform that could have a similar issue is dropped
entirely, because it is currently dead: The incoming value is
a branch or select condition of type i1, but this code requires
an i32 to trigger.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51651.

Differential Revision: https://reviews.llvm.org/D108840
2021-08-28 10:28:24 +02:00
Nick Desaulniers c8c176d999 [MipsISelLowering] avoid emitting libcalls to __mulodi4()
__has_builtin(__builtin_mul_overflow) returns true for 32b MIPS targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support malta_defconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108844
2021-08-27 15:15:36 -07:00
Nick Desaulniers 5c91b98c5d [ARMISelLowering] avoid emitting libcalls to __mulodi4()
__has_builtin(__builtin_mul_overflow) returns true for 32b ARM targets,
but Clang is deferring to compiler RT when encountering `long long`
types. This breaks sanitizer builds of the Linux kernel that are using
__builtin_mul_overflow with these types for these targets.

If the semantics of __has_builtin mean "the compiler resolves these,
always" then we shouldn't conditionally emit a libcall.

This will still need to be worked around in the Linux kernel in order to
continue to support allmodconfig builds of the Linux kernel for this
target with older releases of clang.

Link: https://bugs.llvm.org/show_bug.cgi?id=28629
Link: https://github.com/ClangBuiltLinux/linux/issues/1438

Reviewed By: rengolin

Differential Revision: https://reviews.llvm.org/D108842
2021-08-27 15:14:47 -07:00
Craig Topper dbf0d8118c [RISCV] Use ~0ULL instead of ~0U when checking for invalid ErrorInfo.
ErrorInfo is a uint64_t and is initialized to all 1s.

Not sure how to test this. Noticed while working on .insn support.
2021-08-27 12:30:33 -07:00
Roman Lebedev 6734018041 [Codegen][X86] EltsFromConsecutiveLoads(): if only have AVX1, ensure that the "load" is actually foldable (PR51615)
This fixes another reproducer from https://bugs.llvm.org/show_bug.cgi?id=51615
And again, the fix lies not in the code added in D105390.

In this case, we don't check at all that the "broadcast-from-mem" we create
can actually fold the load. Here, its operand was not a load at all:
```
Combining: t16: v8i32 = vector_shuffle<0,u,u,u,0,u,u,u> t14, undef:v8i32
Creating new node: t29: i32 = undef
RepeatLoad:
t8: i32 = truncate t7
  t7: i64 = extract_vector_elt t5, Constant:i64<0>
    t5: v2i64,ch = load<(load (s128) from %ir.arg)> t0, t2, undef:i64
      t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t1: i64 = Register %0
      t4: i64 = undef
    t3: i64 = Constant<0>
Combining: t15: v8i32 = undef

```

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D108821
2021-08-27 20:26:53 +03:00
Philipp Krones 54e8cae565 [MC][RISCV] Add RISCV MCObjectFileInfo
This makes sure that the text section will have a 2-byte alignment if
the +c extension is enabled.

Reviewed By: MaskRay, luismarques

Differential Revision: https://reviews.llvm.org/D102052
2021-08-27 18:23:29 +01:00
Craig Topper 0eeab8b282 [RISCV] Add -riscv-v-fixed-length-vector-elen-max to limit the ELEN used for fixed length vectorization.
This adds an ELEN limit for fixed length vectors. This will scalarize
any elements larger than this. It will also disable some fractional
LMULs. For example, if ELEN=32 then mf8 becomes illegal, i32/f32
vectors can't use any fractional LMULs, i16/f16 can only use mf2,
and i8 can use mf2 and mf4.
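
For example (a hand-written sketch), with
-riscv-v-fixed-length-vector-elen-max=32 the i64 elements below exceed
ELEN, so an operation like this is scalarized instead of being
vectorized:

```
define <4 x i64> @add64(<4 x i64> %a, <4 x i64> %b) {
  %r = add <4 x i64> %a, %b
  ret <4 x i64> %r
}
```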

We may also need something for the scalable vectors, but that has
interactions with the intrinsics and we can't scalarize a scalable
vector.

Longer term, this should come from one of the Zve* features.
2021-08-27 10:17:35 -07:00
Jun Ma 15b2a8e7fa [AArch64][SVE] Optimize ptrue predicate pattern with known sve register width.
For vectors whose size exactly equals getMaxSVEVectorSizeInBits, just use
AArch64SVEPredPattern::all, which can enable the use of unpredicated ptrue when available.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D108706
2021-08-27 20:03:48 +08:00