145210 Commits

Author SHA1 Message Date
Craig Topper 85f3f6b3cc [RISCV] Lower scalable vector masked loads to intrinsics to match fixed vectors and reduce isel patterns.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98840
2021-03-19 10:39:35 -07:00
Fraser Cormack d399b82e2a [RISCV] Maintain fixed-length info when optimizing BUILD_VECTORs
I'm not sure how I failed to notice this before, but when optimizing
dominant-element BUILD_VECTORs we would lower via the scalable container type,
which lost us the information about the fixed length of the vector types. By
lowering via the fixed-length type we can preserve that information and
eliminate redundant vsetvli instructions.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98938
2021-03-19 17:21:06 +00:00
Philip Reames 00d0315a7c [SCEV] Factor out a lambda for strict condition splitting [NFC] 2021-03-19 10:07:12 -07:00
Fraser Cormack 550292ecb1 [RISCV] Fix missing scalable->fixed-length vector conversion
Returning the scalable-vector container type would present problems when
the fixed-length INSERT_VECTOR_ELT was used by later operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98776
2021-03-19 16:49:47 +00:00
Simon Pilgrim 9d2df96407 [DAG] computeKnownBits - add ISD::MULHS/MULHU/SMUL_LOHI/UMUL_LOHI handling
Reuse the existing KnownBits multiplication code to handle the 'extend + multiply + extract high bits' pattern for multiply-high ops.

Noticed while looking at the codegen for D88785 / D98587 - the patch helps division-by-constant expansion code in particular, which suggests that we might have some further KnownBits div/rem cases we could handle - but this was far easier to implement.

Differential Revision: https://reviews.llvm.org/D98857
2021-03-19 16:02:31 +00:00
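
The 'extend + multiply + extract high bits' pattern mentioned in the commit above can be sketched in Python. This only illustrates what the multiply-high ops compute, not the KnownBits code itself; the helper names `mulhu` and `mulhs` and the default 32-bit width are assumptions made for the example.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mulhu(a: int, b: int, bits: int = 32) -> int:
    """Unsigned multiply-high: zero-extend, multiply, keep the high half."""
    mask = (1 << bits) - 1
    return ((a & mask) * (b & mask)) >> bits


def mulhs(a: int, b: int, bits: int = 32) -> int:
    """Signed multiply-high: sign-extend, multiply, keep the high half."""
    mask = (1 << bits) - 1
    product = sign_extend(a, bits) * sign_extend(b, bits)
    # Returned as an unsigned bit pattern of width `bits`.
    return (product >> bits) & mask
```

If both operands are known to fit in 16 bits, the full product fits in 32 bits, so `mulhu` of two such 32-bit values is always zero. That is the kind of fact a known-bits analysis of the widened multiply can recover.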
Stanislav Mekhanoshin 57effe2205 [AMDGPU] Remove dead glc1 handling in asm parser. NFC. 2021-03-19 08:37:47 -07:00
Simon Pilgrim ffb2887103 [DAG] Fold shuffle(bop(shuffle(x,y),shuffle(z,w)),undef) -> bop(shuffle'(x,y),shuffle'(z,w))
Followup to D96345, handle unary shuffles of binops (as well as binary shuffles) if we can merge the shuffle with inner operand shuffles.

Differential Revision: https://reviews.llvm.org/D98646
2021-03-19 14:14:56 +00:00
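
A minimal sketch of why the shuffle fold above is sound for an element-wise binop: a unary outer mask can be composed with each inner two-input mask. The list-based vectors, the `UNDEF` marker, and the helper names are assumptions for illustration and do not mirror the DAGCombiner code.

```python
UNDEF = -1  # mask entry meaning "don't care"


def shuffle(a, b, mask):
    """Two-input shuffle: index i < len(a) picks a[i], otherwise b[i - len(a)]."""
    both = list(a) + list(b)
    return [None if m == UNDEF else both[m] for m in mask]


def bop(lhs, rhs):
    """Element-wise binary op (addition here); undef lanes stay undef."""
    return [None if p is None or q is None else p + q for p, q in zip(lhs, rhs)]


def merge_masks(outer, inner):
    """Fold a unary outer mask into an inner two-input mask."""
    return [UNDEF if m == UNDEF else inner[m] for m in outer]


def before_fold(x, y, z, w, mask_xy, mask_zw, outer):
    inner = bop(shuffle(x, y, mask_xy), shuffle(z, w, mask_zw))
    return shuffle(inner, [None] * len(inner), outer)


def after_fold(x, y, z, w, mask_xy, mask_zw, outer):
    return bop(shuffle(x, y, merge_masks(outer, mask_xy)),
               shuffle(z, w, merge_masks(outer, mask_zw)))
```

Both functions return the same lanes for any outer mask that only references the first operand, which is the unary-shuffle case the commit adds.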
Paul C. Anagnostopoulos a9fc44c557 [TableGen] Improve handling of template arguments
This requires changes to TableGen files and some C++ files due to
incompatible multiclass template arguments that slipped through
before the improved handling.
2021-03-19 09:57:53 -04:00
Ricky Taylor 028d6250ea [M68k] Replace unknown operand with explicit type
Replace the unknown operand used for immediate operands for DIV/MUL with a fixed 16-bit immediate.

This is required since the assembly parser generator requires that all operands are typed.

Differential Revision: https://reviews.llvm.org/D98819
2021-03-19 13:44:46 +00:00
Jeroen Dobbelaere 04790d9cfb Support intrinsic overloading on unnamed types
This patch adds support for intrinsic overloading on unnamed types.

This fixes PR38117 and PR48340 and will also be needed for the Full Restrict Patches (D68484).

The main problem is that the intrinsic overloading name mangling is using 's_s' for unnamed types.
This can result in identical intrinsic mangled names for different function prototypes.

This patch changes this by adding a '.XXXXX' to the intrinsic mangled name when at least one of the types is based on an unnamed type, ensuring that we get a unique name.

Implementation details:
- The mapping is created on demand and kept in Module.
- It also checks for existing clashes and recycles potentially existing prototypes and declarations.
- Because of extra data in Module, Intrinsic::getName needs an extra Module* argument and, for speed, an optional FunctionType* argument.
- I still kept the original two-argument 'Intrinsic::getName' around which keeps the original behavior (providing the base name).
-- Main reason is that I did not want to change the LLVMIntrinsicGetName version, as I don't know how acceptable such a change is
-- The current situation already has a limitation. So that should not get worse with this patch.
- Intrinsic::getDeclaration and the verifier are now using the new version.

Other notes:
- As far as I see, this should not suffer from stability issues. The count is only added for prototypes depending on at least one anonymous struct
- The initial count starts from 0 for each intrinsic mangled name.
- In case of name clashes, existing prototypes are remembered and reused when that makes sense.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D91250
2021-03-19 14:34:25 +01:00
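
The on-demand mapping described in the commit above might look roughly like the following sketch. It is not the LLVM implementation: the class name, the string representation of prototypes, the intrinsic name `llvm.foo`, and the use of a plain decimal counter as the suffix are all assumptions made for the example.

```python
class UniqueIntrinsicNames:
    """Per-module table that assigns a count to each distinct prototype."""

    def __init__(self):
        # base mangled name -> list of prototypes seen so far
        self._prototypes = {}

    def get_name(self, base_name, prototype, has_unnamed_type):
        if not has_unnamed_type:
            # No unnamed type involved: the base mangled name is kept as-is.
            return base_name
        seen = self._prototypes.setdefault(base_name, [])
        if prototype not in seen:
            # New prototype: take the next free count, starting from 0.
            seen.append(prototype)
        # Known prototype: reuse the count it was given before.
        return f"{base_name}.{seen.index(prototype)}"
```

Two different prototypes that both mangle to the same `s_s` base name then receive different suffixes, while asking again for an already registered prototype returns the same name.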
Nemanja Ivanovic a8697c57fa [PowerPC] Fix the check for 16-bit signed field in peephole
When a D-Form instruction is fed by an add-immediate, we attempt
to merge the two immediates to form a single displacement so we
can remove the add-immediate.

However, we don't check whether the new displacement fits into
a 16-bit signed immediate field early enough. Namely, we do a
sign-extend from 16 bits first which will discard high bits and
then we check whether the result is a 16-bit signed immediate.
It of course will always be.

Move the check prior to the sign extend to ensure we are checking
the correct value.

Fixes https://bugs.llvm.org/show_bug.cgi?id=49640
2021-03-19 07:15:53 -05:00
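
The ordering problem in the PowerPC commit above can be shown with a small Python sketch. The function names are hypothetical and the code only models the arithmetic, not the peephole itself.

```python
def sign_extend16(value: int) -> int:
    """Keep the low 16 bits and interpret them as a signed value."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def is_int16(value: int) -> bool:
    return -0x8000 <= value <= 0x7FFF


def merged_displacement_buggy(disp: int, add_imm: int):
    # High bits are discarded first, so the range check below always passes.
    merged = sign_extend16(disp + add_imm)
    return merged if is_int16(merged) else None


def merged_displacement_fixed(disp: int, add_imm: int):
    # Check the real sum before any truncation happens.
    merged = disp + add_imm
    return merged if is_int16(merged) else None
```

With `disp = 0x7FF0` and `add_imm = 0x20`, the sum 0x8010 does not fit in a signed 16-bit field. The buggy order silently produces -32752, while the fixed order rejects the merge by returning `None`.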
Abhina Sreeskantharajan 4f750f6ebc [SystemZ][z/OS] Distinguish between text and binary files on z/OS
This patch consists of the initial changes to help distinguish between text and binary content correctly on z/OS. I would like to get feedback from Windows users on setting OF_None for all ToolOutputFiles. This seems to have been done as an optimization to prevent CRLF translation on Windows in the past.

Reviewed By: zibi

Differential Revision: https://reviews.llvm.org/D97785
2021-03-19 08:09:57 -04:00
Ricky Taylor cd442157cf [M68k] Convert register Aliases to AltNames
This makes it simpler to determine when two registers are actually the
same vs just partially aliasing.

The only real caveat is that it becomes impossible to know which name
was used for the register previously. (i.e. parsing assembly and then
disassembling it can result in the register name changing.)

Differential Revision: https://reviews.llvm.org/D98536
2021-03-19 11:44:53 +00:00
Ricky Taylor 51884c6bef [M68k] Introduce DReg bead
This is required in order to determine during disassembly whether a
Reg bead without associated DA bead is referring to a data register.

Differential Revision: https://reviews.llvm.org/D98534
2021-03-19 11:44:53 +00:00
Jay Foad 5a5a531214 [AMDGPU] Remove some redundant code. NFC.
This is redundant because we have already checked that we can't handle
divergent 64-bit atomic operands.
2021-03-19 11:36:15 +00:00
Jay Foad 5dd5ddcb41 [AMDGPU] Skip building some IR if it won't be used. NFC. 2021-03-19 11:36:14 +00:00
Jay Foad c96dfe0d8b [AMDGPU] Sink Intrinsic::getDeclaration calls to where they are used. NFC. 2021-03-19 11:36:14 +00:00
Simon Pilgrim a96897219d [KnownBits] Add knownbits analysis for mulhs/mulhu 'multiply high' instructions
Split off from D98857

https://reviews.llvm.org/D98866
2021-03-19 08:56:06 +00:00
Mikael Holmen 6d22ba48ea [NVPTX] Fix warning, remove extra ";" [NFC]
gcc complained with
../lib/Target/NVPTX/NVPTXLowerArgs.cpp:203:2: warning: extra ';' [-Wpedantic]
  203 | };
      |  ^
2021-03-19 09:26:14 +01:00
Max Kazantsev 8eefa07fcf [NFC] Move function up in code 2021-03-19 14:03:31 +07:00
Max Kazantsev 8bb952b57f [NFC] Factor out utility function for finding common dom of user set 2021-03-19 13:49:29 +07:00
Fangrui Song c241659d15 [X86] Fix -Wunused-function in -DLLVM_ENABLE_ASSERTIONS=off builds 2021-03-18 23:22:58 -07:00
Max Kazantsev 16370e02a7 [IndVars] Provide eliminateIVComparison with context
We can prove more predicates when we have a context when eliminating ICmp.
As a first (and very obvious) approximation we can use the ICmp instruction itself,
though in the future we are going to use a common dominator of all its users.
Some refactoring is needed before that.

Observed ~0.5% negative compile time impact.

Differential Revision: https://reviews.llvm.org/D98697
Reviewed By: lebedev.ri
2021-03-19 12:28:22 +07:00
Wenlei He 1410db70b9 [CSSPGO] Add attribute metadata for context profile
This change adds an attribute field to the metadata of a context profile. Currently we have an inline attribute that indicates whether the leaf frame corresponding to a context profile was inlined in the previous build.

This will be used to help estimate inlining and will be taken into account when trimming contexts. Changes for that in llvm-profgen will follow. It will also help tuning.

Differential Revision: https://reviews.llvm.org/D98823
2021-03-18 22:00:56 -07:00
Max Kazantsev fff1363ba0 [SCEV] Add false->any implication
By definition of Implication operator, `false -> true` and `false -> false`. It means that
`false` implies any predicate, no matter true or false. We don't need to go any further
trying to prove the statement we need and just always say that `false` implies it in this case.

In practice it means that we are trying to prove something guarded by `false` condition,
which means that this code is unreachable, and we can safely prove any fact or perform any
transform in this code.

Differential Revision: https://reviews.llvm.org/D98706
Reviewed By: lebedev.ri
2021-03-19 11:29:48 +07:00
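
The truth-table argument in the commit above is small enough to spell out in Python. `is_implied_by_condition` is a hypothetical helper that shows the short-circuit; it is not a ScalarEvolution function.

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q


def is_implied_by_condition(condition_known_false: bool, prove) -> bool:
    """A guard that is known to be false implies any predicate."""
    if condition_known_false:
        # No need to run the (possibly expensive) proof at all.
        return True
    return prove()
```

Because `implies(False, q)` holds for both values of `q`, the early return never claims something that material implication does not already guarantee.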
Philip Reames fa26da0582 Add a couple of missing attribute query methods [NFC] 2021-03-18 17:33:20 -07:00
Hsiangkai Wang aa8d33a6d6 [RISCV] Spilling for Zvlsseg registers.
For Zvlsseg, we create several tuple register classes. When spilling for
these tuple register classes, we need to iterate NF times to load/store
these tuple registers.

Differential Revision: https://reviews.llvm.org/D98629
2021-03-19 07:46:16 +08:00
Fangrui Song 9558456b53 [SanitizerCoverage] Make __start_/__stop_ symbols extern_weak
On ELF, we place the metadata sections (`__sancov_guards`, `__sancov_cntrs`,
`__sancov_bools`, `__sancov_pcs`) in section groups (either `comdat any` or
`comdat noduplicates`).

With `--gc-sections`, LLD since D96753 and GNU ld `-z start-stop-gc` may garbage
collect such sections. If all `__sancov_bools` are discarded, LLD will error
`error: undefined hidden symbol: __start___sancov_cntrs` (other sections are similar).

```
% cat a.c
void discarded() {}
% clang -fsanitize-coverage=func,trace-pc-guard -fpic -fvisibility=hidden a.c -shared -fuse-ld=lld -Wl,--gc-sections
...
ld.lld: error: undefined hidden symbol: __start___sancov_guards
>>> referenced by a.c
>>>               /tmp/a-456662.o:(sancov.module_ctor_trace_pc_guard)
```

Use the `extern_weak` linkage (lowered to undefined weak symbols) to avoid the
undefined error.

Differential Revision: https://reviews.llvm.org/D98903
2021-03-18 16:46:04 -07:00
Craig Topper c9861f722e [RISCV] Correct the output chain in lowerFixedLengthVectorMaskedLoadToRVV
We returned the input chain instead of the output chain from the
new load. This bypasses the load in the chain. I haven't found a
good way to test this yet. IR order prevents my initial attempts
at causing reordering.
2021-03-18 16:34:35 -07:00
George Balatsouras d10f173f34 [dfsan] Add -dfsan-fast-8-labels flag
This is only adding support to the dfsan instrumentation pass but not
to the runtime.

Added more RUN lines for testing: for each instrumentation test that
had a -dfsan-fast-16-labels invocation, a new invocation was added
using fast8.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D98734
2021-03-18 16:28:42 -07:00
Jessica Paquette 0ca83730cc Recommit "[AArch64][GlobalISel] Fold constants into G_GLOBAL_VALUE"
This reverts commit 962b73dd0f.

This commit was reverted because of some internal SPEC test failures.

It turns out that this wasn't actually relevant to anything in open source, so
it's safe to recommit this.
2021-03-18 16:01:02 -07:00
Craig Topper 182b831aeb [DAGCombiner][RISCV] Teach visitMGATHER/MSCATTER to remove gather/scatters with all zeros masks that use SPLAT_VECTOR.
Previously only all zeros BUILD_VECTOR was recognized.
2021-03-18 15:34:14 -07:00
Yuanfang Chen b4a8c0ebb6 [LTO][MC] Discard non-prevailing defined symbols in module-level assembly
This is the alternative approach to D96931.

In LTO, for each module with an inlineasm block, prepend the directive ".lto_discard <sym>, <sym>*" to the beginning of the inline
asm. ".lto_discard" is both a module inlineasm block marker and (optionally) provides a list of symbols to be discarded.

In MC, while emitting for inlineasm, discard symbol bindings and symbol
definitions according to ".lto_discard".

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D98762
2021-03-18 15:33:42 -07:00
Stanislav Mekhanoshin edd6da10d2 [AMDGPU] Remove cpol, tfe, and swz from MUBUF patterns
These are always selected as 0 anyway.

Differential Revision: https://reviews.llvm.org/D98663
2021-03-18 14:36:04 -07:00
Mehdi Amini 3614df3537 Revert "[VPlan] Add plain text (not DOT's digraph) dumps"
This reverts commit 6b053c9867.
The build is broken:

ld.lld: error: undefined symbol: llvm::VPlan::printDOT(llvm::raw_ostream&) const
>>> referenced by LoopVectorize.cpp
>>>               LoopVectorize.cpp.o:(llvm::LoopVectorizationPlanner::printPlans(llvm::raw_ostream&)) in archive lib/libLLVMVectorize.a
2021-03-18 19:20:39 +00:00
Andrei Elovikov 6b053c9867 [VPlan] Add plain text (not DOT's digraph) dumps
I foresee two uses for this:
1) It's easier to use those in a debugger.
2) Once we start implementing more VPlan-to-VPlan transformations (especially
   inner loop massaging stuff), using the vectorized LLVM IR as CHECK targets in
   LIT tests would become too obscure. I can imagine that we'd want to CHECK
   against VPlan dumps after multiple transformations instead. That would be
   easier with plain text dumps than with DOT format.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D96628
2021-03-18 11:33:39 -07:00
Thomas Lively f5764a8654 [WebAssembly] Finalize SIMD names and opcodes
Updates the names (e.g. widen => extend, saturate => sat) and opcodes of all
SIMD instructions to match the finalized SIMD spec. Deliberately does not change
the public interface in wasm_simd128.h yet; that will require more care.

Depends on D98466.

Differential Revision: https://reviews.llvm.org/D98676
2021-03-18 11:21:25 -07:00
Thomas Lively 2f2ae08da9 [WebAssembly] Remove experimental SIMD instructions
Removes the instruction definitions, intrinsics, and builtins for qfma/qfms,
signselect, and prefetch instructions, which were not included in the final
WebAssembly SIMD spec.

Depends on D98457.

Differential Revision: https://reviews.llvm.org/D98466
2021-03-18 11:21:24 -07:00
Thomas Lively 8638c897f4 [WebAssembly] Remove unimplemented-simd target feature
Now that the WebAssembly SIMD specification is finalized and engines are
generally up-to-date, there is no need for a separate target feature for gating
SIMD instructions that engines have not implemented. With this change,
v128.const is now enabled by default with the simd128 target feature.

Differential Revision: https://reviews.llvm.org/D98457
2021-03-18 10:23:12 -07:00
Peter Waller 0d6482a76a [llvm][AArch64][SVE] Lower fixed length vector fabs
Seemingly straightforward.

Differential Revision: https://reviews.llvm.org/D98434
2021-03-18 17:20:08 +00:00
Stanislav Mekhanoshin 961e4384f4 [AMDGPU] Support SCC on buffer atomics
Differential Revision: https://reviews.llvm.org/D98731
2021-03-18 09:56:14 -07:00
Wei Mi 14756b70ee [SampleFDO] Don't mix up the existing indirect call value profile with the new
value profile annotated after inlining.

In https://reviews.llvm.org/D96806 and https://reviews.llvm.org/D97350, we
use the magic number -1 in the value profile to avoid repeated indirect call
promotion to the same target for an indirect call. Function updateIDTMetaData
is used to mark a target as being promoted in the value profile with the
magic number. updateIDTMetaData is also used to update the value profile
when an indirect call is inlined and a new inline instance profile should be
applied. For the second case, currently updateIDTMetaData mixes up the
existing value profile of the indirect call with the new profile, leading
to the problematic scenario that a target count is larger than the total count
in the value profile.

The patch fixes the problem. When updateIDTMetaData is used to update the
value profile after inlining, all the values in the existing value profile
will be dropped except the values with the magic number counts.

Differential Revision: https://reviews.llvm.org/D98835
2021-03-18 09:54:34 -07:00
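
The post-inlining update described in the commit above can be sketched as follows. The dictionary representation, the function names, and the choice to keep an existing promoted marker when the new profile mentions the same target are assumptions for illustration; only the magic number -1 and the "drop everything except magic-number entries" rule come from the commit message.

```python
PROMOTED = -1  # magic count marking a target as already promoted


def update_after_inlining(existing: dict, inlined: dict) -> dict:
    """Drop existing values except promoted markers, then apply the new profile."""
    kept = {target: count for target, count in existing.items() if count == PROMOTED}
    for target, count in inlined.items():
        # Assumption: a promoted marker is never overwritten by a new count.
        kept.setdefault(target, count)
    return kept


def total_count(profile: dict) -> int:
    """Sum of real counts, ignoring promoted markers."""
    return sum(count for count in profile.values() if count != PROMOTED)
```

Since stale counts from the old profile are no longer mixed in, no single target count can exceed the total computed from the new profile.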
Mircea Trofin 92ccc6cb17 Reapply "[NPM][CGSCC] FunctionAnalysisManagerCGSCCProxy: do not clear immutable function passes"
This reverts commit 11b70b9e3a.

The bot failure was due to ArgumentPromotion deleting functions
without deleting their analyses. This was separately fixed in 4b1c807.
2021-03-18 09:44:34 -07:00
Mircea Trofin 4b1c8070bb [NFC][ArgumentPromotion] Clear FAM cached results of erased function.
Not doing it here can lead to subtle bugs - the analysis results are
associated by the Function object's address. Nothing stops the memory
allocator from allocating new functions at the same address.
2021-03-18 09:17:32 -07:00
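
The address-reuse hazard described in the commit above can be reproduced with a toy model. `Arena` and `AnalysisCache` are hypothetical stand-ins for the memory allocator and the function analysis manager; the allocator is deliberately written to reuse the most recently freed address.

```python
class Arena:
    """Toy allocator that hands out the most recently freed address first."""

    def __init__(self):
        self._free = []
        self._next = 0x1000

    def allocate(self) -> int:
        if self._free:
            return self._free.pop()
        addr, self._next = self._next, self._next + 0x10
        return addr

    def free(self, addr: int) -> None:
        self._free.append(addr)


class AnalysisCache:
    """Results keyed by object address only."""

    def __init__(self):
        self._results = {}

    def get(self, addr: int, compute):
        if addr not in self._results:
            self._results[addr] = compute()
        return self._results[addr]

    def clear(self, addr: int) -> None:
        self._results.pop(addr, None)
```

If a function is erased without calling `clear`, a new function allocated at the same address gets the old function's cached result. Clearing the entry when the function is erased removes that possibility.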
Chris Lattner ced7256778 [libsupport] Silence a bogus valgrind warning.
Valgrind is reporting this bogus warning because it doesn't model
pthread_sigmask fully accurately.  This is a valgrind bug, but
silencing it has effectively no cost, so just do it.

==73662== Syscall param __pthread_sigmask(set) points to uninitialised byte(s)
==73662==    at 0x101E9D4C2: __pthread_sigmask (in /usr/lib/system/libsystem_kernel.dylib)
==73662==    by 0x101EFB5EA: pthread_sigmask (in /usr/lib/system/libsystem_pthread.dylib)
==73662==    by 0x1000D9F6D: llvm::sys::Process::SafelyCloseFileDescriptor(int) (in /Users/chrisl/Projects/circt/build/bin/firtool)
==73662==    by 0x100072795: llvm::ErrorOr<std::__1::unique_ptr<llvm::MemoryBuffer, std::__1::default_delete<llvm::MemoryBuffer> > > getFileAux<llvm::MemoryBuffer>(llvm::Twine const&, long long, unsigned long long, unsigned long long, bool, bool) (in /Users/chrisl/Projects/circt/build/bin/firtool)
==73662==    by 0x100072573: llvm::MemoryBuffer::getFileOrSTDIN(llvm::Twine const&, long long, bool) (in /Users/chrisl/Projects/circt/build/bin/firtool)
==73662==    by 0x100282C25: mlir::openInputFile(llvm::StringRef, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) (in /Users/chrisl/Projects/circt/build/bin

Differential Revision: https://reviews.llvm.org/D98830
2021-03-18 09:09:20 -07:00
Stanislav Mekhanoshin 3f37c28230 [AMDGPU] Remove unused template parameters of MUBUF_Real_AllAddr_vi
Differential Revision: https://reviews.llvm.org/D98804
2021-03-18 09:02:38 -07:00
Jon Chesterfield 253f804deb [amdgpu] Update med3 combine to skip i64
Fixes an assumption that a type which is not i32 will be i16. This asserts
when trying to sign/zero extend an i64 to i32.

Test case was cut down from an openmp application. Variations on it are hit by
other combines before reaching the problematic one, e.g. replacing the
immediate values with other function arguments changes the codegen path and
misses this combine.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D98872
2021-03-18 15:56:41 +00:00
Simon Pilgrim 1ba5c550d4 [DAG] Improve folding (sext_in_reg (*_extend_vector_inreg x)) -> (sext_vector_inreg x)
Extend this to support ComputeNumSignBits of the (used) source vector elements so that we can handle more than just the case where we're sext_in_reg from the source element signbit.

Noticed while investigating the poor codegen in D98587.
2021-03-18 15:34:53 +00:00
Sid Manning c539be1dcb [Hexagon] Add support for named registers cs0 and cs1
Allow inline assembly code to reference cs0 and cs1.
2021-03-18 09:53:22 -05:00
Andrew Savonichev e6ce0db378 [MCA] Ensure that writes occur in-order
Delay the issue of a new instruction if that leads to out-of-order
commits of writes.

This patch fixes the problem described in:
https://bugs.llvm.org/show_bug.cgi?id=41796#c3

Differential Revision: https://reviews.llvm.org/D98604
2021-03-18 17:10:20 +03:00
Matt Arsenault b9a0384983 GlobalISel: Preserve source value information for outgoing byval args
Pass through the original argument IR value in order to preserve the
aliasing information in the memcpy memory operands.
2021-03-18 09:16:54 -04:00
Matt Arsenault 61f834cc09 GlobalISel: Insert memcpy for outgoing byval arguments
byval requires an implicit copy between the caller and callee such
that the callee may write into the stack area without it modifying the
value in the parent. Previously, this was passing through the raw
pointer value which would break if the callee wrote into it.

Most of the time, this copy can be optimized out (however we don't
have the optimization SelectionDAG does yet).

This will trigger more fallbacks for AMDGPU now, since we don't have
legalization for memcpy yet (although we should stop using byval
anyway).
2021-03-18 09:16:54 -04:00
Alexey Bataev b3ced9852c [SLP]Fix crash on extending scheduling region.
If the SLP vectorizer tries to extend the scheduling region and runs out of
the budget too early, but still extends the region to the new ending
instructions (i.e., it was able to extend the region for the first
instruction in the bundle, but not for the second), the compiler needs to
recalculate dependencies in full, just as if the extension had been
successful. Without it, the schedule data chunks may end up with the
wrong number of (unscheduled) dependencies, and the result may be an
incorrect function in which the vectorized instruction does not dominate
the extractelement instruction.

Differential Revision: https://reviews.llvm.org/D98531
2021-03-18 06:11:08 -07:00
Max Kazantsev 26ec76add5 [NFC] One more use case for evaluatePredicate 2021-03-18 19:21:29 +07:00
Max Kazantsev 1067a13cc1 [NFC] Use evaluatePredicate in eliminateComparison
Just makes code simpler.
2021-03-18 19:21:29 +07:00
Max Kazantsev b3a1500ea8 [SCEV][NFC] API for predicate evaluation
Provides an API that checks whether a predicate is known to be true or
false with one call. The current implementation is naive and just
calls isKnownPredicate twice, but in the future we can rework this
logic to try to prove both facts with one check.
2021-03-18 19:21:29 +07:00
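
The naive "call isKnownPredicate twice" approach from the commit above can be sketched in Python. The range-based `is_known_predicate` is a stand-in invented for this example, as are the string predicate names and the choice of `None` for "neither fact could be proven"; none of this is the ScalarEvolution code.

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # inclusive [lo, hi] bounds of a value

INVERSE = {"<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def is_known_predicate(pred: str, lhs: Range, rhs: Range) -> bool:
    """True only if `lhs pred rhs` holds for every pair of values in the ranges."""
    if pred == "<":
        return lhs[1] < rhs[0]
    if pred == "<=":
        return lhs[1] <= rhs[0]
    if pred == ">":
        return lhs[0] > rhs[1]
    if pred == ">=":
        return lhs[0] >= rhs[1]
    raise ValueError(f"unsupported predicate: {pred}")


def evaluate_predicate(pred: str, lhs: Range, rhs: Range) -> Optional[bool]:
    """Naive version: one query for the predicate, one for its inverse."""
    if is_known_predicate(pred, lhs, rhs):
        return True
    if is_known_predicate(INVERSE[pred], lhs, rhs):
        return False
    return None
```

Callers get a three-way answer from one call instead of issuing both queries themselves, which is the simplification the follow-up NFC commits rely on.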
Simon Pilgrim b1afa187c8 [DAG] SelectionDAG::isSplatValue - add ISD::ABS handling
Add ISD::ABS to the existing unary instructions handling for splat detection

This is similar to D83605, but doesn't appear to need to touch any of the wasm refactoring.

Differential Revision: https://reviews.llvm.org/D98778
2021-03-18 10:28:29 +00:00
Fraser Cormack 3495031a39 [RISCV] Support scalable-vector masked scatter operations
This patch adds support for masked scatter intrinsics on scalable vector
types. It is mostly an extension of the earlier masked gather support
introduced in D96263, since the addressing mode legalization is the
same.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96486
2021-03-18 10:17:50 +00:00
Fraser Cormack 0331399dc9 [RISCV] Support scalable-vector masked gather operations
This patch supports the masked gather intrinsics in RVV.

The RVV indexed load/store instructions only support the "unsigned unscaled"
addressing mode; indices are implicitly zero-extended or truncated to XLEN and
are treated as byte offsets. This ISA supports the intrinsics directly, but not
the majority of various forms of the MGATHER SDNode that LLVM combines to. Any
signed or scaled indexing is extended to the XLEN value type and scaled
accordingly. This is done during DAG combining as widening the index types to
XLEN may produce illegal vectors that require splitting, e.g.
nxv16i8->nxv16i64.

Support for scalable-vector CONCAT_VECTORS was added to avoid spilling via the
stack when lowering split legalized index operands.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D96263
2021-03-18 09:26:18 +00:00
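
The index rewrite described in the commit above amounts to a little modular arithmetic, sketched here in Python. The helper names are hypothetical, XLEN is assumed to be 64, and wrapping address arithmetic is assumed in order to show why the widened offset is equivalent to the original signed, scaled index.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def to_unsigned_unscaled(index: int, index_bits: int, scale: int, xlen: int = 64) -> int:
    """Turn a signed, scaled index into an XLEN-wide unsigned byte offset."""
    byte_offset = sign_extend(index, index_bits) * scale
    return byte_offset & ((1 << xlen) - 1)


def effective_address(base: int, offset: int, xlen: int = 64) -> int:
    """Base plus unsigned byte offset, wrapping modulo 2**XLEN."""
    return (base + offset) & ((1 << xlen) - 1)
```

For an i8 index of -1 with a scale of 4, the offset becomes 2**64 - 4, and adding it to a base of 0x1000 yields 0x0FFC, the same address that signed indexing would have produced. Widening every index to 64 bits is also what makes vectors such as nxv16i64 appear.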
Bing1 Yu 0002d4bf36 [X86][AMX][NFC] Give correct Passname for Tile Register Pre-configure 2021-03-18 17:15:44 +08:00
Fraser Cormack c2b4600ec8 [RISCV] Support bitcasts of fixed-length mask vectors
Without this patch, bitcasts of fixed-length mask vectors would go
through the stack.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98779
2021-03-18 08:52:42 +00:00
Luo, Yuanke e64adc0b88 [X86] Fix compile time regression of D93594.
D93594 depends on the dominator tree and loop information, which increased
compile time when building with -O0. However, they are only used to amend
the dominator tree and loop information so that it is unnecessary to
re-analyze them. Given that the dominator tree or loop information is
absent in this pass, we can avoid amending them.

Differential Revision: https://reviews.llvm.org/D98773
2021-03-18 16:52:43 +08:00
Sjoerd Meijer 90ecb862a0 [AArch64] Rewrite (add, csel) to cinc
Don't rewrite an add instruction with 2 SET_CC operands into a csel
instruction. The total instruction sequence uses an extra instruction and
register. Preventing this allows us to match a `(add, csel)` pattern and
rewrite this into a `cinc`.

Differential Revision: https://reviews.llvm.org/D98704
2021-03-18 08:49:27 +00:00
Lang Hames 86ec3fd9d9 [JITLink] Improve out-of-range error messages.
Switches all backends to use the makeTargetOutOfRangeError function from
JITLink.h.
2021-03-17 21:35:24 -07:00
ShihPo Hung fca5d63aa8 [RISCV] Fix isel pattern of masked vmslt[u]
This patch changes the operand order of masked vmslt[u]
from (mask, rs1, scalar, maskedoff, vl)
to (maskedoff, rs1, scalar, mask, vl).

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98839
2021-03-17 20:18:11 -07:00
Krzysztof Parzyszek b292dce230 [ObjectYAML] Handle Hexagon V68 2021-03-17 21:43:35 -05:00
Krzysztof Parzyszek 0ddf38c99e [Hexagon] Improve stack address base reuse for HVX spills
The offset in HVX loads/stores is only 4 bits long, so often an
extra register is needed to hold the address. Minimize the number
of such registers by "standardizing" the base addresses and reusing
preexisting base registers when replacing frame indices.
2021-03-17 21:22:56 -05:00
Krzysztof Parzyszek 849412270b [Hexagon] Add more patterns for HVX loads and stores
In particular, add patterns for loads/stores to the stack
(with a frame index as address).
2021-03-17 21:01:52 -05:00
Chen Zheng d33b016ada [XCOFF][llvm-dwarfdump] llvm-dwarfdump support for XCOFF
Author: hubert.reinterpretcast, shchenz

Reviewed By: jasonliu, echristo

Differential Revision: https://reviews.llvm.org/D97186
2021-03-17 21:21:51 -04:00
Amara Emerson 28963d895b [GlobalISel] Don't DCE LIFETIME_START/LIFETIME_END markers.
These are pseudos without any users, so DCE was killing them in the combiner.

Marking them as having side effects doesn't seem quite right since they don't.

Gives a nice 0.3% geomean size win on CTMark -Os.

Differential Revision: https://reviews.llvm.org/D98811
2021-03-17 18:02:08 -07:00
Carl Ritson 1a4bc3aba3 [AMDGPU] Avoid unnecessary graph visits during WQM marking
Avoid revisiting nodes with the same set of defined lanes by
using a unified visited set which integrates lanes into the key.
This retains the intent of the original code by still revisiting
a subgraph if a different set of lanes is defined and hence
marking might progress differently.

Note: the default size of the visited set has been confirmed to
cover >99% of invocations in a large array of test shaders.

Reviewed By: piotr

Differential Revision: https://reviews.llvm.org/D98772
2021-03-18 10:00:41 +09:00
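
A unified visited set that integrates lanes into the key, as described in the commit above, could be sketched like this. The graph encoding, the way lane masks are narrowed along edges, and the function name are assumptions for illustration and do not reflect the actual WQM pass.

```python
def mark_lanes(graph: dict, start: str, lanes: int):
    """Worklist traversal whose visited set is keyed on (node, lanes)."""
    visited = set()
    worklist = [(start, lanes)]
    visits = 0
    while worklist:
        key = worklist.pop()
        if key in visited:
            # Same node with the same set of defined lanes: nothing new to learn.
            continue
        visited.add(key)
        visits += 1
        node, node_lanes = key
        for succ, edge_lanes in graph.get(node, []):
            # Hypothetical propagation rule: lanes are narrowed along each edge.
            worklist.append((succ, node_lanes & edge_lanes))
    return visited, visits
```

On a diamond-shaped graph the shared node is visited once when both paths carry the same lanes, and twice when the paths carry different lanes, which preserves the intent of revisiting a subgraph only when marking might progress differently.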
Joel E. Denny f87b4109b2 [FileCheck] Fix redundant diagnostics due to numeric errors
Fixed substitution printing not to produce an empty diagnostic for
errors handled elsewhere.

Reviewed By: thopre

Differential Revision: https://reviews.llvm.org/D98088
2021-03-17 19:25:41 -04:00
Joel E. Denny dd59c1324d [FileCheck] Fix numeric error propagation
A more general name might be match-time error propagation.  That is,
it's conceivable we'll one day have non-numeric errors that require
the handling fixed by this patch.

Without this patch, FileCheck behaves as follows:

```
$ cat check
CHECK-NOT: [[#0x8000000000000000+0x8000000000000000]]

$ FileCheck -vv -dump-input=never check < input
check:1:54: remark: implicit EOF: expected string found in input
CHECK-NOT: [[#0x8000000000000000+0x8000000000000000]]
                                                     ^
<stdin>:2:1: note: found here

^
check:1:15: error: unable to substitute variable or numeric expression: overflow error
CHECK-NOT: [[#0x8000000000000000+0x8000000000000000]]
              ^
$ echo $?
0
```

Notice that the exit status is 0 even though there's an error.
Moreover, FileCheck doesn't print the error diagnostic unless both
`-dump-input=never` and `-vv` are specified.

The same problem occurs when `CHECK-NOT` does have a match but a
capture fails due to overflow: exit status is 0, and no diagnostic is
printed unless both `-dump-input=never` and `-vv` are specified.  The
usefulness of capturing from `CHECK-NOT` is questionable, but this
case should certainly produce an error.

With this patch, FileCheck always includes the error diagnostic and
has non-zero exit status for the above examples.  It's conceivable
that this change will cause some existing tests to fail, but my
assumption is that they should fail.  Moreover, with nearly every
project enabled, this patch didn't produce additional `check-all`
failures for me.

This patch also extends input dumps to include such numeric error
diagnostics for both expected and excluded patterns.

As noted in fixmes in some of the tests added by this patch, this
patch worsens an existing issue with redundant diagnostics.  I'll fix
that bug in a subsequent patch.

Reviewed By: thopre, jhenderson

Differential Revision: https://reviews.llvm.org/D98086
2021-03-17 19:25:41 -04:00
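
The overflow in the `CHECK-NOT` example above, and the requirement that it reach the exit status, can be modelled in a few lines of Python. This is a sketch under the assumption that the expression is evaluated as an unsigned 64-bit addition; `checked_add_u64` and `run_check` are hypothetical names and not part of FileCheck.

```python
UINT64_MAX = (1 << 64) - 1


def checked_add_u64(lhs: int, rhs: int) -> int:
    """Add two unsigned 64-bit values, raising instead of wrapping."""
    result = lhs + rhs
    if result > UINT64_MAX:
        raise OverflowError("overflow error")
    return result


def run_check(lhs: int, rhs: int) -> int:
    """Return a process-style exit status: non-zero if evaluation failed."""
    try:
        checked_add_u64(lhs, rhs)
    except OverflowError:
        # The match-time error must not be swallowed; it decides the status.
        return 1
    return 0
```

`0x8000000000000000 + 0x8000000000000000` equals 2**64, one more than the largest unsigned 64-bit value, so `run_check` reports failure for the operands used in the commit message.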
Arthur Eubanks 792bed6a4c Revert "[NewPM] Verify LoopAnalysisResults after a loop pass"
This reverts commit 6db3ab2903.

Causing too large of compile time regression.
2021-03-17 15:22:52 -07:00
Amara Emerson d7fed7b899 [AArch64][GlobalISel] Fall back if disabling neon/fp in the translator.
The previous technique relied on early-exiting the legalizer predicate
initialization, leaving an empty rule table. That causes a fallback
for most instructions, but some, like G_ZEXT, have legacy rules defined
and can try to continue, but then crash.

We should fall back earlier, in the translator, to avoid this issue.

Differential Revision: https://reviews.llvm.org/D98730
2021-03-17 15:08:08 -07:00
Steven Wu 991df7333d [Object][MachO] Handle end iterator in getSymbolType()
Fix a bug in MachOObjectFile::getSymbolType() where it did not check whether
the iterator is end() before dereferencing it. Instead, return the
`Other` type, which aligns with the behavior of `llvm-nm`.

rdar://75291638

Reviewed By: davide, ab

Differential Revision: https://reviews.llvm.org/D98739
2021-03-17 15:06:45 -07:00
David Green 35e0567d58 [ARM] Add VREV MVE shuffle costs
This uses the shuffle mask cost from D98206 to give a better cost for MVE
VREV instructions. This helps especially in VectorCombine, where the cost
of shuffles is used to reorder bitcasts, and it keeps the phase
ordering test for fp16 reductions producing optimal code. isVREVMask
has been moved to a header file to allow it to be used across target
transform and isel lowering.

Differential Revision: https://reviews.llvm.org/D98210
2021-03-17 21:21:43 +00:00
Arthur Eubanks 6db3ab2903 [NewPM] Verify LoopAnalysisResults after a loop pass
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those.

Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98805
2021-03-17 13:37:22 -07:00
Ricky Taylor eb6b455ba1 [M68k] Forward declare getMCInstrBeads in one place
At the moment `getMCInstrBeads` is forward-declared in a few places;
bring these declarations together into a single header file.

This was done as part of the disassembler work, since the disassembler
would otherwise add one more forward declaration.

Differential Revision: https://reviews.llvm.org/D98533
2021-03-17 13:31:27 -07:00
Ricky Taylor 2416f24363 [M68k] Use fixed asm string for MxPseudo instructions
This is required because empty strings are not allowed when generating
the assembly parser tables.

Differential Revision: https://reviews.llvm.org/D98532
2021-03-17 13:31:27 -07:00
Nico Weber 605a503f35 [lld-link] emit an error when writing a PDB > 4 GiB
Maybe there's a way to make them work, but until I've investigated
if tools can consume large PDBs, erroring out is better than slowly
and silently consuming all available RAM due to internal invariants
being violated.

(Patch to make writing larger files work at
https://bugs.chromium.org/p/chromium/issues/detail?id=1179085#c25
but I haven't had time to check if windbg & co can consume these
large PDBs. llvm-pdbutil can't, but we can fix that one at least :) )

Differential Revision: https://reviews.llvm.org/D98788
2021-03-17 15:15:08 -04:00
Philip Reames 31764ea295 [LCSSA] Extract a utility for deciding if a new use requires a new lcssa phi [NFC]
(Triggered by a review comment on D98728, but otherwise unrelated.)
2021-03-17 12:14:01 -07:00
Craig Topper 92b39c6907 [RISCV] Use getTargetExtractSubreg and getTargetInsertSubreg to simplify some code. NFCI 2021-03-17 12:10:19 -07:00
Philip Reames 7c7f4676cd [LICM] Fix a crash when sinking instructions w/token operands
It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly, by not forming LCSSA for such cases, but the ad hoc fixup implementation in LICM did not.

This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate; it also applies to uses of the preallocated family of intrinsics.

Differential Revision: https://reviews.llvm.org/D98728
2021-03-17 11:18:46 -07:00
David Green e2935dcfc4 [TTI] Add a Mask to getShuffleCost
This adds a Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided, the backend can return a more accurate cost.
For example, VREV costs could be returned by the ARM backend. This should
be NFC until then, laying the groundwork for that to be added.

Differential Revision: https://reviews.llvm.org/D98206
2021-03-17 17:46:26 +00:00
Craig Topper 696ddef569 [RISCV] Support masked load/store for fixed vectors.
Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D98561
2021-03-17 10:26:15 -07:00
Stephen Tozer 3bfddc2593 Reapply "[DebugInfo] Handle multiple variable location operands in IR"
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.
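
The difference between the two kinds of container can be sketched in standalone C++. This is only an illustration of the insertion-ordered idea behind a MapVector-style container, not LLVM's implementation; `InsertionOrderedMap` and `keysInOrder` are hypothetical names introduced for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical insertion-ordered map: a hash map gives fast lookup, while a
// vector records entries in the order in which they were first inserted.
template <typename K, typename V>
class InsertionOrderedMap {
  std::unordered_map<K, std::size_t> index_;  // key -> position in entries_
  std::vector<std::pair<K, V>> entries_;      // entries in insertion order

public:
  // Returns the value for `key`, default-constructing it on first use.
  V &operator[](const K &key) {
    auto it = index_.find(key);
    if (it == index_.end()) {
      it = index_.emplace(key, entries_.size()).first;
      entries_.emplace_back(key, V());
    }
    assert(it->second < entries_.size());
    return entries_[it->second].second;
  }

  // Iteration walks the vector, so it never depends on hash values.
  typename std::vector<std::pair<K, V>>::const_iterator begin() const {
    return entries_.begin();
  }
  typename std::vector<std::pair<K, V>>::const_iterator end() const {
    return entries_.end();
  }
};

// Collects the keys in iteration order, which here is insertion order.
std::vector<int> keysInOrder(const InsertionOrderedMap<int, int> &map) {
  std::vector<int> keys;
  for (const auto &entry : map)
    keys.push_back(entry.first);
  return keys;
}
```

With a plain hash map keyed on pointers, iteration order can change from run to run, so any code emitted per iteration changes order too; iterating the vector side removes that dependence.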

This reverts commit 01ac6d1587.
2021-03-17 16:45:25 +00:00
Bardia Mahjour fa9d8ace09 [CGSCC] Print CG node itself instead of its address
Fix the debug output from cgscc
2021-03-17 12:36:55 -04:00
LemonBoy 4f024938e4 [LoopVectorize] Refine hasIrregularType predicate
The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector.
The previous check returned invalid results in some cases where there is padding between the array elements: e.g. a 4-element array of u7 values was considered compatible with <4 x u7>, even though the vector only loads/stores 28 bits instead of 32.

The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding) bits.
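
The size mismatch can be worked through in a small standalone sketch. It assumes that an array element occupies its bit width rounded up to whole bytes (a real data layout also accounts for alignment), and the helper names below are hypothetical rather than LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assumed storage size of one array element: the bit width rounded up to a
// whole number of bytes. Hypothetical helper, not an LLVM query.
constexpr std::uint64_t allocSizeInBits(std::uint64_t typeSizeInBits) {
  return (typeSizeInBits + 7) / 8 * 8;
}

// True if array elements of this width carry padding bits, in which case an
// array of N elements is not bit-for-bit compatible with a packed vector.
constexpr bool hasPaddingBetweenElements(std::uint64_t typeSizeInBits) {
  return allocSizeInBits(typeSizeInBits) != typeSizeInBits;
}

// Total bits occupied by an array of `count` elements, padding included.
constexpr std::uint64_t arraySizeInBits(std::uint64_t count,
                                        std::uint64_t typeSizeInBits) {
  return count * allocSizeInBits(typeSizeInBits);
}

// Total bits covered by a packed vector of `count` elements.
constexpr std::uint64_t packedVectorSizeInBits(std::uint64_t count,
                                               std::uint64_t typeSizeInBits) {
  return count * typeSizeInBits;
}
```

For the case above, `arraySizeInBits(4, 7)` is 32 while `packedVectorSizeInBits(4, 7)` is 28, so a 7-bit element type has to be treated as irregular.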

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D97465
2021-03-17 17:03:47 +01:00
David Green 402f2cae7d [ARM] Use lrdsb for more thumb1 loads.
Given a sextload i16, we can usually generate "ldrsh [rn, rm]". If we
don't naturally have an rn, rm addressing mode, we can either generate
"ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn, rm]".

We currently generate the first, always creating a sxth. They are both
the same number of instructions, but if we generate the second then the
mov #0 will likely be CSE'd or pulled out of a loop, etc.

This adjusts the ISel patterns to do that, creating a mov instead of a
sxth.

Differential Revision: https://reviews.llvm.org/D98693
2021-03-17 15:29:02 +00:00
Alexey Lapshin 021de7cf80 [llvm-objcopy][NFC] Move ownership keeping code into restoreStatOnFile().
D93881 added functionality that preserves ownership of the output file
if llvm-objcopy is run as root. That code was added at the place
where the output file is created. llvm-objcopy already has a function that
sets/restores rights/permissions for the output file:
restoreStatOnFile(). This patch moves the ownership-preserving
code into restoreStatOnFile().

Differential Revision: https://reviews.llvm.org/D98511
2021-03-17 17:27:00 +03:00
Hans Wennborg 01ac6d1587 Revert "[DebugInfo] Handle multiple variable location operands in IR"
This caused non-deterministic compiler output; see comment on the
code review.

> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232

This reverts commit df69c69427.
2021-03-17 13:36:48 +01:00
Bradley Smith cf0da91ba5 [AArch64][SVE/NEON] Add support for FROUNDEVEN for both NEON and fixed length SVE
Previously NEON used a target-specific intrinsic for frintn. Given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed-length SVE.

Differential Revision: https://reviews.llvm.org/D98487
2021-03-17 11:41:22 +00:00
David Green 3c25c40d51 [LV] Account for the cost of predication of scalarized load/store
This adds the cost of an i1 extract and a branch to the cost in
getMemInstScalarizationCost when the instruction is predicated. These
predicated loads/stores would generate blocks of something like:

    %c1 = extractelement <4 x i1> %C, i32 1
    br i1 %c1, label %if, label %else
  if:
    %sa = extractelement <4 x i32> %a, i32 1
    %sb = getelementptr inbounds float, float* %pg, i32 %sa
    %sv = extractelement <4 x float> %x, i32 1
    store float %sv, float* %sb, align 4
  else:

So this increases the cost by the extract and branch. This is probably
still too low in many cases due to the cost of all that branching, but
there is already an existing hack increasing the cost using
useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is
a load or if there is more than one store. This patch improves the cost
for when there is only a single store, and hopefully at some point in
the future the hack can be removed.

Differential Revision: https://reviews.llvm.org/D98243
2021-03-17 10:57:50 +00:00
Bu Le 9abe500473 [SLP] Fix the trunc instruction insertion problem
The current SLP pass has a piece of code that inserts a trunc instruction
after the vectorized instruction. If the vectorized instruction
is a phi node and not the last phi node in the BB, the trunc instruction
will be inserted between two phi nodes, which triggers a verifier failure
in debug builds or an unpredictable error in another pass.
This patch changes the algorithm so that, if the last vectorized instruction
is a phi, the trunc is inserted after the last phi node in the current BB.
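
The placement rule can be sketched on a simplified block model. The `Kind` enum and `insertionIndexAfter` below are hypothetical and only model the "skip past the remaining phis" step; they are not the SLP vectorizer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified instruction model: only whether an instruction is a phi matters.
enum class Kind { Phi, Other };

// Returns the index at which a new non-phi instruction can be placed so that
// it follows the instruction at `vectorized` without splitting the run of
// phi nodes at the top of the block.
std::size_t insertionIndexAfter(const std::vector<Kind> &block,
                                std::size_t vectorized) {
  assert(vectorized < block.size());
  std::size_t next = vectorized + 1;
  if (block[vectorized] == Kind::Phi) {
    // Skip every phi that follows, landing just after the last one.
    while (next < block.size() && block[next] == Kind::Phi)
      ++next;
  }
  return next;
}
```

For a block laid out as phi, phi, phi, other, a vectorized phi at index 0 yields insertion index 3 rather than 1, so no phi is separated from the others.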
2021-03-17 13:51:08 +03:00
Fraser Cormack 70251759a2 [RISCV] Optimize "dominant element" BUILD_VECTORs
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.

In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.

This optimization is disabled when optimizing for size.
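
The selection step can be sketched as follows. This is a simplified model under stated assumptions: the threshold used here (strictly more than half of the defined elements) is a placeholder and not the patch's actual threshold, the 2-element special case and the optimize-for-size check are not modeled, and `Element`, `SplatPlan` and `planDominantSplat` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One BUILD_VECTOR operand in this model: either undef or a constant value.
struct Element {
  bool defined;
  int value;
};

// Result of the analysis: splat `splatValue`, then patch up `inserts`.
// `splatValue` and `inserts` are meaningful only when `profitable` is true.
struct SplatPlan {
  bool profitable;
  int splatValue;
  std::vector<std::pair<std::size_t, int>> inserts;  // (index, value)
};

SplatPlan planDominantSplat(const std::vector<Element> &elements) {
  SplatPlan plan{false, 0, {}};

  // Count occurrences of each defined value; undef elements are ignored.
  std::map<int, std::size_t> counts;
  std::size_t numDefined = 0;
  for (const Element &e : elements) {
    if (!e.defined)
      continue;
    ++counts[e.value];
    ++numDefined;
  }

  // Pick the most frequent value (the smallest value wins ties).
  std::size_t bestCount = 0;
  for (const auto &kv : counts) {
    if (kv.second > bestCount) {
      bestCount = kv.second;
      plan.splatValue = kv.first;
    }
  }
  assert(bestCount <= numDefined);

  // Assumed threshold: the value must cover more than half of the defined
  // elements. The real threshold is a tuning choice.
  if (bestCount * 2 <= numDefined)
    return plan;

  plan.profitable = true;
  for (std::size_t i = 0; i < elements.size(); ++i) {
    // Undef lanes are covered by the splat and need no insert.
    if (elements[i].defined && elements[i].value != plan.splatValue)
      plan.inserts.emplace_back(i, elements[i].value);
  }
  return plan;
}
```

For the operands 5, 5, undef, 5, 7 this model splats 5 and records a single insert of 7 at index 4; the undef lane simply takes the splat value.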

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D98700
2021-03-17 10:09:04 +00:00
Jay Foad 967b64beb4 [AMDGPU] Split dot2-insts feature
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.

Differential Revision: https://reviews.llvm.org/D98717
2021-03-17 09:42:21 +00:00
Arthur Eubanks 70af2924a7 [Unswitch] Guard dbgs logging with LLVM_DEBUG 2021-03-16 22:31:57 -07:00
Max Kazantsev a6074b092c [BasicAA] Drop dependency on Loop Info. PR43276
BasicAA stores a reference to LoopInfo. This imposes an implicit
requirement to keep it up to date whenever we modify the IR (in particular,
whenever we modify terminators of blocks that belong to loops). Failing
to do so leaves the LoopInfo in an incorrect state.

Because general AA does not require loop info updates and provides no API to
update it properly, users of AA reasonably assume that there is no need to
update the loop info. This can be a source of bugs, as the example in PR43276 shows.

This patch drops dependence of BasicAA on LoopInfo to avoid this problem.

This may potentially pessimize the result of queries to BasicAA.

Differential Revision: https://reviews.llvm.org/D98627
Reviewed By: nikic
2021-03-17 11:43:44 +07:00
Anirudh Prasad 9f5da80013 Revert "[AsmParser][SystemZ][z/OS] Reland "Introduce HLASM Comment Syntax""
This reverts commit b605cfb336.

Differential Revision: https://reviews.llvm.org/D98744
2021-03-16 18:39:04 -04:00