Commit Graph

152643 Commits

Author SHA1 Message Date
Jay Foad 3264e95938 [CodeGen] Update LiveIntervals in TargetInstrInfo::convertToThreeAddress
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.

Differential Revision: https://reviews.llvm.org/D113493
2021-11-17 10:16:47 +00:00
Stanislav Mekhanoshin c74f2e5b27 [InstCombine] Use SpecificBinaryOp_match in two more places
Differential Revision: https://reviews.llvm.org/D114038
2021-11-17 01:16:06 -08:00
Roman Lebedev 2037ec725f [X86][Costmodel] `*ext v64i1 to v32i16` can appear after legalization, cost is same as for `*ext v32i1 to v32i16`
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113914
2021-11-17 12:02:50 +03:00
Roman Lebedev 23b194bf18 [X86][Costmodel] `trunc v32i16 to v64i1` can appear after legalization, cost is same as for `trunc v32i16 to v32i1`
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113913
2021-11-17 12:02:50 +03:00
Eric Tang f7eb061a5f [SelectionDAG] Make WidenVecRes_SELECT work for scalable vectors
This change makes WidenVecRes_SELECT work for scalable vectors.

    This patch is split from [D110319](https://reviews.llvm.org/D110319)

Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D110388
2021-11-17 08:55:11 +00:00
Aaron Puchert b20da5117f Don't add irrelevant items to queue in DwarfCompileUnit::createScopeChildrenDIE (NFC)
Instead of popping them and then immediately throwing them away, we can
just filter out globals and items in different scopes before adding them
to WorkList. Shouldn't change anything but keep the queue smaller.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D113864
2021-11-17 00:01:20 +01:00
Aaron Puchert 86b3100cde [DebugInfo] Use DbgEntityKind in DbgEntity interface (NFC)
It was being used occasionally already, and using it on the constructor
and getDbgEntityID has obvious type safety benefits.

Also use llvm_unreachable in the switch as usual, but since only these
two values are used in constructor calls I think it's still NFC.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D113862
2021-11-17 00:01:20 +01:00
Benjamin Kramer 8b8e8704ce [PowerPC] Fix a nullptr dereference
LiMI1/LiMI2 can be null, so don't call a method on them before checking.
Found by ubsan.
2021-11-16 23:52:42 +01:00
Philip Reames 8d85e945b2 [SCEV] Canonicalize X - urem X, Y patterns
There are multiple possible ways to represent the X - urem X, Y pattern. SCEV was not canonicalizing, and thus, depending on which you analyzed, you could get different results. The sub representation appears to produce strictly inferior results in practice, so I decided to canonicalize to the Y * X/Y version.

The motivation here is that runtime unroll produces the sub X - (and X, Y-1) pattern when Y is a power of two. SCEV is thus unable to recognize that an unrolled loop exits because we don't figure out that the new unrolled step evenly divides the trip count of the unrolled loop. After instcombine runs, we convert to the andn form which SCEV recognizes, so essentially, this is just fixing a nasty pass ordering dependency.

The ARM loop hardware interaction in the test diff is opaque to me, but the review comments from others with knowledge of the infrastructure appear to indicate these are improvements in loop recognition, not regressions.

Differential Revision: https://reviews.llvm.org/D114018
2021-11-16 11:59:21 -08:00
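The identity behind the [SCEV] canonicalization in the entry above can be checked by brute force. The following is a minimal Python sketch over small unsigned integers, not SCEV's implementation; all function names are hypothetical and chosen for illustration only.

```python
def sub_urem_form(x, y):
    # X - (X urem Y)
    return x - (x % y)


def mul_udiv_form(x, y):
    # Y * (X udiv Y), the form the commit canonicalizes to
    return y * (x // y)


def sub_and_form(x, y):
    # X - (X & (Y - 1)); only equivalent when Y is a power of two
    return x - (x & (y - 1))


def is_power_of_two(y):
    return y > 0 and (y & (y - 1)) == 0


def check(limit=64):
    # Exhaustively compare the three forms on small unsigned values.
    for x in range(limit):
        for y in range(1, limit):
            assert sub_urem_form(x, y) == mul_udiv_form(x, y)
            if is_power_of_two(y):
                assert sub_and_form(x, y) == mul_udiv_form(x, y)
    return True
```

For example, with X = 13 and Y = 4 all three forms evaluate to 12.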
Arthur Eubanks c95a9f46c9 [Loads] Handle addrspacecast constant expressions when determining dereferenceability
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D114015
2021-11-16 11:17:57 -08:00
Victor Huang ae27ca9a67 [PowerPC] PPC backend optimization on conditional trap instructions
This patch adds PPC back end optimization to analyze the arguments of a
conditional trap instruction to execute one of the following:
1. Delete it if it never traps
2. Replace it if it always traps
3. Otherwise keep it

Reviewed By: nemanjai, amyk, PowerPC

Differential revision: https://reviews.llvm.org/D111434
2021-11-16 13:11:57 -06:00
Hongtao Yu 042cefd2b5 [CSSPGO] Fix a hash code truncating issue in ContextTrieNode.
std::hash returns a 64-bit hash code, while previously we were using only the lower 32 bits, which caused hash collisions for large workloads.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D113688
2021-11-16 11:01:52 -08:00
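The truncation problem described in the [CSSPGO] entry above can be illustrated with a small Python sketch. The two hash codes below are made-up values, not real `std::hash` outputs, and `truncate_to_32` is a hypothetical helper: any two 64-bit codes that differ only in their upper 32 bits become indistinguishable once truncated.

```python
MASK32 = (1 << 32) - 1


def truncate_to_32(hash_code):
    # Keep only the low 32 bits, as the pre-fix code effectively did.
    return hash_code & MASK32


# Two distinct (made-up) 64-bit hash codes that share their low 32 bits.
hash_a = 0x0000000112345678
hash_b = 0x0000000212345678

# Distinct as 64-bit values, identical after truncation.
collides_after_truncation = (
    hash_a != hash_b and truncate_to_32(hash_a) == truncate_to_32(hash_b)
)
```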
River Riddle 4c484f11d3 [llvm] Add a SFINAE template parameter to DenseMapInfo
This allows for using SFINAE partial specialization for DenseMapInfo.
In MLIR, this is particularly useful as it will allow for defining partial
specializations that support all Attribute, Op, and Type classes without
needing to specialize DenseMapInfo for each individual class.

Differential Revision: https://reviews.llvm.org/D113641
2021-11-16 18:54:14 +00:00
Mircea Trofin c6b9b702a0 [NFC][Regalloc] Factor out eviction decision from eviction attempt
This splits tryEvict into a const tryFindEvictionCandidate, which
attempts to find a candidate, and the actual eviction (should the former
be successful)

The newly introduced tryFindEvictionCandidate will move subsequently
into the RegAllocEvictionAdvisor.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D113941
2021-11-16 10:50:23 -08:00
Duncan P. N. Exon Smith fd6018072a DebugInfo: Make DWARFExpression::iterator a const iterator
3d1d8c767b made
DWARFExpression::iterator's Operation member `mutable`. After a few prep
commits, the iterator can instead be made a `const` iterator since no
caller can change the Operation.

Differential Revision: https://reviews.llvm.org/D113958
2021-11-16 10:25:10 -08:00
Duncan P. N. Exon Smith a0f1f17131 DebugInfo: Stop modifying Operation::Error inside of verify()
The only caller of Operation::verify() is DWARFExpression::verify(),
which iterates past the (ephemeral) Operation immediately after.

- Stop setting Operation::Error because the mutation will never be
  observed.
- Change verify() to a static function to be sure all callers are
  updated.

Differential Revision: https://reviews.llvm.org/D113957
2021-11-16 10:21:04 -08:00
Kazu Hirata ee0133dc6d [llvm] Use range-for loops (NFC) 2021-11-16 09:01:56 -08:00
Philip Reames ed6b69a38f Add a hasPoisonGeneratingFlags proxy wrapper to Instruction [NFC]
This just cuts down on casts to Operator.
2021-11-16 08:48:16 -08:00
David Sherwood 4607459022 [AArch64] Fix TypeSize->uint64_t implicit conversion in AArch64ISelLowering::hasAndNot
For now I've just changed the code to only return true from
AArch64ISelLowering::hasAndNot if the vector is fixed-length.
Once we have the right patterns or DAG combines to use bic/bif
we can also enable this for SVE.

Test added here:

  CodeGen/AArch64/vselect-constants.ll

Differential Revision: https://reviews.llvm.org/D113994
2021-11-16 16:25:16 +00:00
Jon Chesterfield 30b29db7c7 [amdgpu] Don't crash on empty global ctor/dtor
Global ctor/dtor can be an empty array, which is a Constant not a
ConstantArray. The cast<ConstantArray> therefore asserts / crashes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D113800
2021-11-16 14:36:08 +00:00
Sanjay Patel 8fce94f916 [InstCombine] canonicalize icmp with trunc op into mask and cmp, part 2
If C is a high-bit mask:
(trunc X) u< C --> (X & C) != C (are any masked-high-bits clear?)

If C is low-bit mask:
(trunc X) u> C --> (X & ~C) != 0 (are any masked-high-bits set?)

If C is not-of-power-of-2 (one clear bit):
(trunc X) u> C --> (X & (C+1)) == C+1 (are all masked-high-bits set?)

This extends the fold added with:
acabad9ff6 (https://alive2.llvm.org/ce/z/aFr7qV)

Using decomposeBitTestICmp() to generalize this is a planned follow-up, but that requires removing an inverse fold.

Here are Alive2 generalizations for these folds:
https://alive2.llvm.org/ce/z/u-ZpC_ (ult, the previous patch)
https://alive2.llvm.org/ce/z/YsuAu2 (ult, this patch)
https://alive2.llvm.org/ce/z/ekktQP (ugt, low bitmask)
https://alive2.llvm.org/ce/z/pJY9wR (ugt, one clear bit)

Differential Revision: https://reviews.llvm.org/D112634
2021-11-16 09:27:30 -05:00
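The three folds listed in the [InstCombine] entry above can be sanity-checked exhaustively at a small bit width. This is a Python sketch under assumptions: X is 8 bits wide, the truncated type is 4 bits, and the constant in the wide comparison is the zero-extension of the narrow constant C. The Alive2 links in the commit remain the authoritative proofs; the helper names here are hypothetical.

```python
WIDE_BITS, NARROW_BITS = 8, 4
NARROW_MASK = (1 << NARROW_BITS) - 1


def trunc(x):
    return x & NARROW_MASK


def high_bit_masks():
    # 1111, 1110, 1100, 1000
    return [NARROW_MASK & ~((1 << k) - 1) for k in range(NARROW_BITS)]


def low_bit_masks():
    # 0001, 0011, 0111, 1111
    return [(1 << k) - 1 for k in range(1, NARROW_BITS + 1)]


def one_clear_bit_constants():
    # All-ones with exactly one bit cleared: 1110, 1101, 1011, 0111
    return [NARROW_MASK & ~(1 << k) for k in range(NARROW_BITS)]


def check_folds():
    for x in range(1 << WIDE_BITS):
        t = trunc(x)
        for c in high_bit_masks():
            # (trunc X) u< C --> (X & C) != C
            assert (t < c) == ((x & c) != c)
        for c in low_bit_masks():
            # (trunc X) u> C --> (X & ~C) != 0, with ~C taken in the narrow type
            assert (t > c) == ((x & (~c & NARROW_MASK)) != 0)
        for c in one_clear_bit_constants():
            # (trunc X) u> C --> (X & (C+1)) == C+1
            assert (t > c) == ((x & (c + 1)) == c + 1)
    return True
```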
Alexey Bataev 900cc1a226 [SLP]Improve cost of the gather nodes.
No need to count the final shuffle cost for the constants, gathering of
the constants is just a constant vector + extra inserts, if required.

Differential Revision: https://reviews.llvm.org/D113770
2021-11-16 06:25:07 -08:00
Alexey Bataev cdf8a53c1d [SLP]Fix windows build, NFC.
Need to put `IndexIdx` var to the list of captures.
2021-11-16 06:09:51 -08:00
Alexey Bataev aa9bbb64be [SLP]Adjust GEP indices types when trying to build entries.
Need to adjust the types of GEP indices when building the tree
entries/operands. Otherwise some of the nodes might differ and the
vectorizer is unable to correctly find them and count their cost.

Differential Revision: https://reviews.llvm.org/D113792
2021-11-16 05:44:33 -08:00
Sander.DeSmalen@arm.com 305816ff1e [IndVarSimplify] Reduce nondeterministic behaviour in visitIVCast.
rGf39978b84f1d3a1da6c32db48f64c8daae64b3ad led to and/or exposed
an issue with IndVarSimplification for a loop where a i32 phi node is
no longer replaced by a widened (i64) phi node, because the SCEVs of a
sign-extend no longer folded the same way. I'm unsure how to properly
explain this because it's all rather complicated, but in short: SCEVs
don't fold as nicely as they used to and this caused a difference.

While investigating this, I found that IndVarSimplify can actually
optimise the case in the way we want to if it chooses the widened IV to
be 'signed' (the i32 IV is both sign and zero-extended). Oddly enough,
there is some level of indeterminism in the way the algorithm works,
it just picks the sign of the 'first' zext/sext user, where the order of
the users-iterator is not guaranteed to be the same on each invocation
of the pass (e.g. shown by first running loop-rotate, which puts the
users in a different order).

While I think the fix is valid in the sense that consistently picking
_any_ order is better than having a nondeterministic order, I could
use a bit of advice from people more familiar with this area of the
code-base.

For example, I'm not sure if this fix is hiding another issue where the
IndVarSimplify pass could actually draw the same conclusions (i.e. that
it only needs an i64 phi node) if it does a bit more work, regardless
of whether it chooses the induction variable to be signed or unsigned.

I'm also not sure if choosing signed is better than unsigned, or whether
that just happens to be beneficial only in this individual case.

Any feedback would be much appreciated!

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D112573
2021-11-16 12:41:04 +00:00
Florian Hahn b7aec4f08e [SCEV] Support rewriting ZExt expressions with loop guard info.
So far, applying loop guard information has been restricted to
SCEVUnknown. In a few cases, like PR40961 and PR52464, this leads to
SCEV failing to determine tight upper bounds for the backedge taken
count.

This patch adjusts SCEVLoopGuardRewriter and applyLoopGuards to support
re-writing ZExt expressions.

This is a first step towards fixing  PR40961 and PR52464.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D113577
2021-11-16 11:16:07 +00:00
Matt Devereau f526c600c0 [AArch64][SVE] Instcombine SVE LD1/ST1 to stock LLVM IR
InstCombine AArch64 LD1/ST1 to llvm.masked.load/llvm.masked.store
and LD1/ST1 to load/store when a ptrue all predicate pattern operand
is present.

This allows existing IR optimizations such as dead-load removal to
occur.

Differential Revision: https://reviews.llvm.org/D113489
2021-11-16 11:10:23 +00:00
Frederik Gossen 3f3d4e8a15 Fix unused variable warning in LoadStoreOpt.cpp with (void) 2021-11-16 12:03:59 +01:00
Frederik Gossen 2bceb7c8da Revert "Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp"
This reverts commit 40a609aebe.
2021-11-16 12:00:17 +01:00
Frederik Gossen ecfe7a3404 Revert "Fix unused variable warning."
This reverts commit a062e2a8ca.
2021-11-16 11:59:34 +01:00
Frederik Gossen 9a6817b7ed Revert "Fix another unused variable error."
This reverts commit 5b84ae7c48.
2021-11-16 11:58:02 +01:00
Adrian Kuegel 5b84ae7c48 Fix another unused variable error. 2021-11-16 11:32:44 +01:00
Adrian Kuegel a062e2a8ca Fix unused variable warning. 2021-11-16 11:17:33 +01:00
Frederik Gossen 40a609aebe Fix unused variable in llvm/lib/CodeGen/GlobalISel/LoadStoreOpt.cpp 2021-11-16 11:05:18 +01:00
Amara Emerson dcd8728d83 Remove unnecessary <any> include. 2021-11-16 00:50:30 -08:00
jacquesguan 6405e8b584 [RISCV] Refactor some rvv instructions' definition with foreach.
Simplify rvv instructions that use eew in their mnemonic and encoding with foreach. And fix a scheduling bug.

Differential Revision: https://reviews.llvm.org/D113453
2021-11-16 15:20:45 +08:00
Serguei Katkov 0ecb12a27f [STATEPOINT] Force implicit-def for lr register.
The behavior of the STATEPOINT instruction is similar to that of a call instruction.
On AArch64, the BL instruction implicitly defines the lr register, so
the STATEPOINT instruction should do the same.
However, STATEPOINT is a general pseudo instruction and I could not find
a way to override the list of implicit defs for a specific target.

So this patch post-processes the inserted STATEPOINT instruction by
adding an implicit dead def for lr.

Reviewers: reames, loicottet, ostannard
Reviewed By: reames
Subscribers: danilaml, hiraditya, kristof.beyls, llvm-commits, yrouban
Differential Revision: https://reviews.llvm.org/D111114
2021-11-16 12:52:00 +07:00
Kazu Hirata 7f00806a6a [llvm] Use make_early_inc_range (NFC) 2021-11-15 21:28:46 -08:00
Amara Emerson dc84770d55 [GlobalISel] Add a store-merging optimization pass and enable for AArch64.
This is a first attempt at a constant value consecutive store merging pass,
a counterpart to the DAGCombiner's store merging optimization.

The high level goals of this pass:

* Have a simple and efficient algorithm. As close to linear time as we can get.
  Thus, prioritizing scalability of the algorithm over merging every corner case
  we can find. The DAGCombiner's store merging code has been the source of
  compile time and complexity issues in the past and I wanted to avoid that.
* Don't introduce any new data structures for ordering memory operations. In MIR,
  we don't have the concept of chains like we do in the DAG, and the instruction
  order is stricter than enforcing ordering with graph edges. Although I
  considered adding something similar, I couldn't justify the overhead.

The pass is currently split into 3 main parts. The main store merging code focuses
on identifying candidate stores and managing the candidate group that's under
consideration for merging. Analyzing addressing of stores is a potentially
complex part and for now there's just a basic implementation to identify easy
cases. Finally, the other main bit of complexity is the alias analysis, which
tries to follow the same logic as the DAG's AA.

Currently this implementation only supports merging of constant stores. Stores
of arbitrary variables are technically possible with a very small change, but
the DAG chooses not to do this. Doing so here makes most code worse since
there's extra overhead in merging values into wider registers.

On AArch64 -Os, this optimization results in very minor savings on CTMark.

Differential Revision: https://reviews.llvm.org/D109131
2021-11-15 21:10:39 -08:00
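The basic idea of merging consecutive constant stores, as described in the [GlobalISel] entry above, can be shown with a toy model. This Python sketch is not the pass's algorithm: it assumes little-endian layout, ignores aliasing and legal store widths entirely, and represents each store as a hypothetical `(offset, size_in_bytes, value)` tuple in program order.

```python
def merge_constant_stores(stores):
    """Merge runs of byte-adjacent constant stores into wider stores.

    Each store is (offset, size_in_bytes, value); values are assumed to fit
    in their size. Little-endian layout is assumed.
    """
    merged = []
    for offset, size, value in stores:
        if merged:
            prev_offset, prev_size, prev_value = merged[-1]
            if prev_offset + prev_size == offset:
                # The new store covers the next higher addresses, so in
                # little-endian order its bytes land in the higher bits.
                merged[-1] = (
                    prev_offset,
                    prev_size + size,
                    prev_value | (value << (8 * prev_size)),
                )
                continue
        merged.append((offset, size, value))
    return merged
```

A single linear scan like this reflects the commit's stated goal of staying close to linear time, but a real implementation would also have to reject merges that cross aliasing memory operations or produce unsupported widths.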
Craig Topper 391b0ba603 [RISCV] Override TargetLowering::hasAndNot for Zbb.
Differential Revision: https://reviews.llvm.org/D113937
2021-11-15 18:44:07 -08:00
Fabian Wolff b484fa8289 [X86] Fix crash with inline asm using wrong register name
Fixes PR#48678. `X86TargetLowering::getRegForInlineAsmConstraint()` can adjust the register class to match the type, e.g. change `VR128X` to `VR256X` if the type needs 256 bits. However, the function currently returns the unadjusted register and the adjusted register class, e.g. `xmm15` and `VR256X`, which then causes an assertion failure later because the register class does not contain that register. This patch fixes this behavior.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D113834
2021-11-16 10:38:12 +08:00
Matt Arsenault 659887b405 AMDGPU: Mark prolog/epilog SCC defs as dead
A future change will add SCC liveness checks. Since we are still
relying on forward register scavenging, add dead flags to avoid
spuriously detecting SCC as live.
2021-11-15 21:35:06 -05:00
Duncan P. N. Exon Smith 79df41011b DebugInfo: const-qualify accessors of DWARFExpression::Operation
Add `const` to DWARFExpression::Operation's accessors and make
Operation::extract() private, since it's only used by the friend class
DWARFExpression::iterator.
2021-11-15 17:30:10 -08:00
Craig Topper 233def40f7 [DAGCombiner] Prevent unfoldMaskedMerge from creating an AND with two inverted inputs.
It's possible that the mask is already a NOT. At least if InstCombine
hasn't canonicalized the input. In that case we will form an ANDN with
X instead of with Y. So we don't need to worry about Y being a constant.

We might need to check that X isn't a constant instead, but we don't
have a test case for that yet.

This fixes a size regression found when trying to enable this combine
for RISCV in D113937.

Differential Revision: https://reviews.llvm.org/D113948
2021-11-15 17:15:51 -08:00
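The effect of an already-inverted mask in the [DAGCombiner] entry above can be sketched in Python. This assumes the masked-merge pattern is `((X ^ Y) & M) ^ Y` and that unfolding produces `(X & M) | (Y & ~M)`; the commit message does not spell either out, so treat both as assumptions. All names are hypothetical.

```python
BITS = 4
MASK = (1 << BITS) - 1


def masked_merge(x, y, m):
    # Folded form: take bits of x where m is set, bits of y elsewhere.
    return ((x ^ y) & m) ^ y


def unfolded(x, y, m):
    # Unfolded form: the inverted mask is applied to y (an and-not with Y).
    return (x & m) | (y & (~m & MASK))


def unfolded_with_inverted_mask(x, y, n):
    # If the mask is already a NOT (m = ~n), the two inversions on the y
    # side cancel, and the and-not ends up applied to x instead.
    return (x & (~n & MASK)) | (y & n)


def check():
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            for m in range(MASK + 1):
                assert masked_merge(x, y, m) == unfolded(x, y, m)
                inverted = ~m & MASK
                assert masked_merge(x, y, inverted) == unfolded_with_inverted_mask(x, y, m)
    return True
```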
Mehrnoosh Heidarpour 62c51a72f9 [InstSimplify] Fold A|B | (A^B) --> A|B
This patch adds the following fold opportunity:
A|B | (A^B) --> A|B

that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479

https://alive2.llvm.org/ce/z/33-My-

Test cases with base results are added in D113860

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D113861
2021-11-15 18:55:04 -05:00
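The fold in the [InstSimplify] entry above is easy to confirm by brute force. A minimal Python sketch at 4 bits follows; it is an illustration only (the function name is hypothetical), and the Alive2 link in the commit is the actual proof.

```python
BITS = 4


def check_or_xor_fold():
    # (A | B) | (A ^ B) == A | B: every bit set in A ^ B is already set in A | B.
    for a in range(1 << BITS):
        for b in range(1 << BITS):
            assert ((a | b) | (a ^ b)) == (a | b)
    return True
```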
Ben Shi 4c3d916c4b [RISCV] Optimize immediate materialisation with SH*ADD
Use LUI+SH*ADD+ADDI to compose specific immediates.

Reviewed By: craig.topper, luismarques

Differential Revision: https://reviews.llvm.org/D113568
2021-11-15 23:34:28 +00:00
Stanislav Mekhanoshin 833cdb0a07 Revert "[InstSimplify] Fold A|B | (A^B) --> A|B"
This reverts commit 193c40e966.
2021-11-15 14:56:20 -08:00
Arthur Eubanks 19867de9e7 [NewPM] Only invalidate modified functions' analyses in CGSCC passes + turn on eagerly invalidate analyses
Previously, any change in any function in an SCC would cause all
analyses for all functions in the SCC to be invalidated. With this
change, we now manually invalidate analyses for functions we modify,
then let the pass manager know that all function analyses should be
preserved since we've already handled function analysis invalidation.

So far this only touches the inliner, argpromotion, function-attrs, and
updateCGAndAnalysisManager(), since they are the most used.

This is part of an effort to investigate running the function
simplification pipeline less on functions we visit multiple times in the
inliner pipeline.

However, this causes major memory regressions especially on larger IR.
To counteract this, turn on the option to eagerly invalidate function
analyses. This invalidates analyses on functions immediately after
they're processed in a module or scc to function adaptor for specific
parts of the pipeline.

Within an SCC, if a pass only modifies one function, other functions in
the SCC do not have their analyses invalidated, so in later function
passes in the SCC pass manager the analyses may still be cached. It is
only after the function passes that the eager invalidation takes effect.
For the default pipelines this makes sense because the inliner pipeline
runs the function simplification pipeline after all other SCC passes
(except CoroSplit which doesn't request any analyses).

Overall this has mostly positive effects on compile time and positive effects on memory usage.
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=instructions
https://llvm-compile-time-tracker.com/compare.php?from=7f627596977624730f9298a1b69883af1555765e&to=39e824e0d3ca8a517502f13032dfa67304841c90&stat=max-rss

D113196 shows that we slightly regressed compile times in exchange for
some memory improvements when turning on eager invalidation.  D100917
shows that we slightly improved compile times in exchange for major
memory regressions in some cases when invalidating less in SCC passes.
Turning these on at the same time keeps the memory improvements while
keeping compile times neutral/slightly positive.

Reviewed By: asbirlea, nikic

Differential Revision: https://reviews.llvm.org/D113304
2021-11-15 14:44:53 -08:00
Steven Wu fcd07f8107 [JITLink] Fix splitBlock if there are symbols span across the boundary
Fix `splitBlock` so that it can handle the case where the block being
split has symbols that span the split boundary. This is an error
case in general, but for EHFrame splitting on MachO platforms there is an
anonymous symbol that marks the entire block. The current implementation
leaves a symbol that is out of bounds of the underlying block. Fix
the problem by dropping such symbols when the block is split.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D113912
2021-11-15 13:55:21 -08:00
Stanislav Mekhanoshin 193c40e966 [InstSimplify] Fold A|B | (A^B) --> A|B
This patch adds the following fold opportunity:
A|B | (A^B) --> A|B

that is reported here : https://bugs.llvm.org/show_bug.cgi?id=52479

https://alive2.llvm.org/ce/z/33-My-

Test cases with base results are added in D113860

(authored by MehrHeidar, committed by rampitec).

Differential Revision:  https://reviews.llvm.org/D113861
2021-11-15 13:49:20 -08:00
Jonas Paulsson 1c3ef9ef4a [SystemZ] Support symbolic displacements.
This patch adds support for symbolic displacements, e.g. like 'lg %r0,
sym(%r1)', which is done using relocations. This is needed to compile the
kernel without disabling the integrated assembler.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D113341
2021-11-15 16:46:31 -05:00
Mircea Trofin 19e6b730ce [NFC][Regalloc] Factor types that would be used by the eviction advisor
This is in preparation for pulling the eviction decision-making into an
analysis pass, which would then allow swapping that decision making
process.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D113929
2021-11-15 13:15:14 -08:00
Fangrui Song fee52fe0ad [X86] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off build. NFC 2021-11-15 13:10:47 -08:00
Philip Reames 8f95e915cd [unroll-runtime] Relax two profitability limitations on multi-exit unrolling
This change is mostly about getting rid of some "uninteresting" cases in a follow on deeper heuristic change.  If anyone sees actually interesting code differences out of this, please let me know.  I'm not expecting this to have much impact at all.

Case 1 - With the single deoptimize non-latch exit, we can't have two exiting blocks sharing an exit block.  We can only hit this with a poorly documented debug flag.

Case 2 - Why should we treat epilog cases differently from prolog cases?  Or to say it differently, why should starting with a constant control whether a multiple exit loop gets unrolled?

Sorry for the lack of tests here.  These are both *exceedingly* narrow cases in practice, and after a while trying, I couldn't come up with a test which did anything "useful" as opposed to simply exercise a random combination of force flags.  Note that the legality cases for each are already exercised with force flags.
2021-11-15 13:00:14 -08:00
Snehasish Kumar bee8e203c6 [InstrProf][NFC] Fix a few typos in code comments. 2021-11-15 12:55:25 -08:00
Nico Weber b4e50e5228 [asm] Make EmitMSInlineAsmStr and EmitGCCInlineAsmStr more alike
https://reviews.llvm.org/D71677 copied a bunch of code from
EmitGCCInlineAsmStr() to EmitMSInlineAsmStr() but made a few small
(likely unintentional) changes. This makes these pieces look the same.

No behavior change.

(Why are these functions two copies? No great reason as far as I can tell.
https://reviews.llvm.org/rG1778831a3d1d24ab6545635f63da4d9c5f8f0ac7 did the
split; we might want to undo them at some point. But PR23933 suggests
that a bigger change is planned for this file in the future, so keeping
this incremental for now.)

Differential Revision: https://reviews.llvm.org/D113924
2021-11-15 15:43:01 -05:00
Nico Weber 0be836b7dd [asm] Convert AsmPrinter::PrintSpecial() to StringRef
No behavior change.

Differential Revision: https://reviews.llvm.org/D113911
2021-11-15 15:38:27 -05:00
Nico Weber 833393e021 [asm] Correctly handle special names in variants
There's really no reason why anyone should use these special names in a variant.
I noticed this while reading the code: all other writes to OS are guarded by
this conditional, and the behavior with the check seems more correct, so
let's add the check.

Differential Revision: https://reviews.llvm.org/D113909
2021-11-15 15:37:09 -05:00
Lei Huang f50c6c1718 [PowerPC] Fix 32bit vector insert instructions for ISA3.1
The platform-independent ISD::INSERT_VECTOR_ELT takes an element index,
but vins* instructions take a byte index. Update 32bit td patterns for
vector insert to handle the element index accordingly.

Since vector inserts with a non-constant index are supported in
ISA3.1, there is no need to use the platform-specific ISD node,
PPCISD::VECINSERT.  Update the td pattern to directly use
ISD::INSERT_VECTOR_ELT instead.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D113802
2021-11-15 14:36:39 -06:00
Philip Reames 423da61835 [runtime-unroll] Inline canSafelyUnrollMultiExitLoop [NFC]
All of the interesting logic from this routine has been removed, so inline the single check into the sole non-assert caller.  The assert use has little value with the restructured code and is simply dropped.
2021-11-15 11:39:07 -08:00
Stanislav Mekhanoshin e785f4ab6a [PatternMatch] Add m_BinOp/m_c_BinOp with specific opcode
Differential Revision: https://reviews.llvm.org/D113508
2021-11-15 11:24:27 -08:00
Philip Reames e99902a872 [runtime-unroll] Restructure if-clause to improve readability [NFC] 2021-11-15 11:13:27 -08:00
Alexey Bataev 224e46d355 [SLP][DOT][NFCI]Output all scalars for the splats, not only the first one. 2021-11-15 10:54:26 -08:00
Simon Pilgrim 441de2536b [X86] Add generic splitVectorOp helper. NFC
Update splitVectorIntUnary/splitVectorIntBinary to use this internally, after some operand type sanity checks.

Avoids code duplication and makes it easier to split other vector instruction forms in the future.
2021-11-15 17:59:23 +00:00
Craig Topper f59307bfdc [RISCV] Teach needVSETVLIPHI to handle mask register instructions.
This handles the case where the mask register instruction input
comes from a Phi of vsetvlis. If the VLMAX is the same as the VLMAX
required by the mask register instruction, we can avoid a vsetvli.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D113204
2021-11-15 09:57:28 -08:00
Mehrnoosh Heidarpour 7daa95c8fa [InstCombine] Fold (A^B)|~A-->~(A&B)
https://alive2.llvm.org/ce/z/2v6rhF

Fixes:
https://llvm.org/PR52478

Differential Revision: https://reviews.llvm.org/D113783
2021-11-15 12:29:37 -05:00
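The fold in the [InstCombine] entry above can also be confirmed exhaustively. This is a small Python sketch at 4 bits with a hypothetical function name; bitwise NOT is masked to the chosen width because Python integers are unbounded.

```python
BITS = 4
MASK = (1 << BITS) - 1


def check_xor_or_not_fold():
    # (A ^ B) | ~A == ~(A & B)
    for a in range(MASK + 1):
        for b in range(MASK + 1):
            lhs = (a ^ b) | (~a & MASK)
            rhs = ~(a & b) & MASK
            assert lhs == rhs
    return True
```

Per bit: where A is 0, `~A` already yields 1, matching `~(A & B)`; where A is 1, both sides reduce to `~B`.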
Simon Pilgrim fc7c1cebbc [X86] LowerFunnelShift - pull out repeated EltSizeInBits variable. NFC. 2021-11-15 17:11:44 +00:00
Sanjay Patel 3d01507c2d [x86] fold vector (X > -1) & Y to shift+andn (2nd try)
The first try at this patch ( bf5748a1af ) was reverted ( 5be64d4164 )
because it could crash. The cause of that problem was failing to account
for the optional peek-through-bitcast in the enclosing function.

This version of the patch adds a clause to avoid the fold in case of
bitcasts because it is unlikely to be profitable in that scenario.

A test case based on https://llvm.org/PR52504 was added to make sure
we don't have that problem again.

Original commit message:

and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y

This avoids the -1 constant vector in favor of an arithmetic shift
instruction if it exists (the ISA is still not complete after all
these years...).

We catch this pattern late in combining by matching PCMPGT, so it
should not interfere with more general folds.

Differential Revision: https://reviews.llvm.org/D113603
2021-11-15 11:09:32 -05:00
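The per-lane equivalence behind the [x86] fold in the entry above can be modeled in Python. This sketch assumes 8-bit lanes and models `pcmpgt`, `vsrai`, and `pandn` as plain functions on a single lane; it says nothing about the DAG-level conditions (such as the bitcast restriction) that the patch handles.

```python
BITS = 8
MASK = (1 << BITS) - 1


def to_signed(v):
    # Interpret an unsigned lane value as two's-complement.
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def pcmpgt(a, b):
    # Signed greater-than compare: all-ones lane if a > b, else zero.
    return MASK if to_signed(a) > to_signed(b) else 0


def vsrai(a, amount):
    # Arithmetic shift right (Python's >> on negative ints is arithmetic).
    return (to_signed(a) >> amount) & MASK


def pandn(a, b):
    # (NOT a) AND b
    return (~a & MASK) & b


def original(x, y):
    return pcmpgt(x, MASK) & y  # MASK is the lane encoding of -1


def folded(x, y):
    return pandn(vsrai(x, BITS - 1), y)


def check():
    for x in range(MASK + 1):
        for y in range(MASK + 1):
            assert original(x, y) == folded(x, y)
    return True
```

Shifting by BitWidth-1 smears the sign bit across the lane, so the shifted value is all-ones exactly when X is negative, which is the complement of the `X > -1` mask.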
Roman Lebedev 5c7255fe3a [X86][Costmodel] `getReplicationShuffleCost()`: promote 8 bit-wide elements to 32 bit when no AVX512VBMI
Currently `X86TTIImpl::getInterleavedMemoryOpCostAVX512()` asks about i8 elt type,
so this change does affect vectorization. In the end, it will ask about i1.

We should also try to promote to i16 if we have AVX512BW, i'll do that in a follow-up.
All costs here look good, i've added the missing truncation costs in preparatory patches.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113853
2021-11-15 19:04:02 +03:00
Roman Lebedev a468c39c90 [X86][Costmodel] `trunc v32i16 to v64i8` can appear after legalization, cost is same as for `trunc v32i16 to v32i8`
Some of the costs get larger here,
but i suppose that makes sense since we'd previously query
scalarization costs that may not be really representative of the reality.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113852
2021-11-15 19:04:02 +03:00
Roman Lebedev 9e57d9b09d [X86][Costmodel] `trunc v8i64 to v16i8/v32i8/v64i8` can appear after legalization, cost is same as for `trunc v8i64 to v8i8`
While this one is trivial and identical to the previous patch,
there is a weird cost change in a follow-up patch that i'm not sure about.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113851
2021-11-15 19:04:02 +03:00
Roman Lebedev 0116c708c6 [X86][Costmodel] `trunc v16i32 to v32i8/v64i8` can appear after legalization, cost is same as for `trunc v16i32 to v16i8`
While this one is trivial and identical to the previous patch,
there is a weird cost change in a follow-up patch that i'm not sure about.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113850
2021-11-15 19:04:02 +03:00
Simon Pilgrim ea9e6aa423 [X86] getAVX512Node() - find constant broadcasts to encourage load-folding
If an operand is a bitcasted or widened constant, try to more aggressively create broadcastable constants for folding, which in particular helps non-VLX modes.

I've refactored getAVX512Node so that VLX targets can make better use of this as well.

NOTE: In the future, I think we should consider removing the broadcast of constant data from DAG entirely and move this to either X86InstrInfo::foldMemoryOperand or a new pass - AVX1/2 targets have similar problems with missed (whole vector) folds that need to be improved as well.

Differential Revision: https://reviews.llvm.org/D113845
2021-11-15 15:52:03 +00:00
Alexey Bataev 036207d5f2 [SLP]Improve splat detection.
A bunch of scalars can be treated as a splat not only if all elements
are the same but also if some of them are undefvalues.

Differential Revision: https://reviews.llvm.org/D113774
2021-11-15 07:50:34 -08:00
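The relaxed splat rule from the [SLP] entry above can be expressed as a short predicate. This Python sketch uses a hypothetical `UNDEF` sentinel in place of LLVM's undef values, and treating an all-undef list as "not a splat" is an assumption made here rather than something the commit message states.

```python
UNDEF = object()  # hypothetical stand-in for an undef value


def is_splat(scalars):
    # A splat if all defined elements are the same value; undef elements
    # are ignored. Assumption: at least one defined element is required.
    defined = [s for s in scalars if s is not UNDEF]
    return bool(defined) and all(s == defined[0] for s in defined)
```

For example, `is_splat([7, UNDEF, 7, UNDEF])` is true, while `is_splat([7, UNDEF, 8])` is not.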
Alexey Bataev b85152f8b1 [SLP][NFC]Use `isa_and_nonnull` and fix comment, NFC. 2021-11-15 06:49:33 -08:00
ksyx 72b5138d37 Revert "[GVN][NFC] Remove redundant check"
This reverts commit c35e8185d8.

mstorsjo reported in the revision thread that one VNCoercion assertion
is violated, seemingly in relation to this commit. As per "If a test case
that demonstrates a problem is reported in the commit thread, please
revert and investigate offline", this commit is reverted.
2021-11-15 09:14:13 -05:00
Alexey Bataev 6fb5bed7d1 [SLP]Do not create unused gather nodes for scalar arguments of vector intrinsics.
If the vector intrinsic has a scalar argument, we currently still create
a tree entry for this argument. This entry is not used; it just consumes
resources and increases the cost of the tree.

Differential Revision: https://reviews.llvm.org/D113806
2021-11-15 06:11:19 -08:00
Florian Hahn 112c1c346a
[IVDescriptor] Make sure the sign is included for negative extension.
At the moment, computeRecurrenceType does not include any sign bits in
the maximum bit width. If the value can be negative, this means the sign
bit will be missing and the sext won't properly extend the value.

If the value can be negative, increment the bitwidth by one to make sure
there is at least one sign bit in the result value.

Note that the increment is also needed *if* the value is *known* to be
negative, as a sign bit needs to be preserved for the sext to work.
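
A small worked example of why the extra bit matters, written as a standalone sketch. It assumes the recurrence width is derived as "type bits minus redundant sign bits"; the helper names `numSignBits` and `truncThenSext` are hypothetical and only model the arithmetic on `int32_t`.

```cpp
#include <cstdint>

// Number of leading bits equal to the sign bit, counting the sign bit itself.
unsigned numSignBits(int32_t V) {
  uint32_t U = static_cast<uint32_t>(V);
  uint32_t Sign = U >> 31;
  unsigned N = 1;
  for (int Bit = 30; Bit >= 0 && ((U >> Bit) & 1u) == Sign; --Bit)
    ++N;
  return N;
}

// Keeps the low `Bits` bits of V and sign-extends them back to 32 bits.
// Precondition: 1 <= Bits <= 32.
int32_t truncThenSext(int32_t V, unsigned Bits) {
  uint32_t Mask = Bits >= 32 ? ~0u : ((1u << Bits) - 1u);
  uint32_t Low = static_cast<uint32_t>(V) & Mask;
  uint32_t SignBit = 1u << (Bits - 1);
  // Classic xor/subtract sign extension from bit (Bits - 1).
  return static_cast<int32_t>((Low ^ SignBit) - SignBit);
}
```

For the value -3, 30 of the 32 bits are copies of the sign bit, so "32 minus sign bits" gives a width of 2. Truncating to 2 bits and sign-extending yields 1, whereas a width of 3 keeps one sign bit and round-trips to -3.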

Note that this at the moment prevents vectorization, because the
analysis computes i1 as type for the recurrence when looking through the
AND in lookThroughAnd.

Fixes PR51794, PR52485.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D113056
2021-11-15 13:12:57 +00:00
Hans Wennborg 5be64d4164 Revert "[x86] fold vector (X > -1) & Y to shift+andn"
This caused assertion failures:

  llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:9446:
  void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode *, llvm::SDNode *):
  Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i))
  && "Cannot use this version of ReplaceAllUsesWith!"' failed.

See comment on the code review.

(Had to update some expectations in test/CodeGen/X86/vselect-zero.ll
 manually due to other changes having landed after the reverted one.)

> and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y
>
> This avoids the -1 constant vector in favor of an arithmetic shift
> instruction if it exists (the ISA is still not complete after all
> these years...).
>
> We catch this pattern late in combining by matching PCMPGT, so it
> should not interfere with more general folds.
>
> Differential Revision: https://reviews.llvm.org/D113603

This reverts commit bf5748a1af.
2021-11-15 12:35:49 +01:00
Simon Pilgrim 7bac1985f4 [DAG] SimplifyVBinOp - add SDLoc() argument
Pass in SDLoc instead of (repeated) local creations in SimplifyVBinOp and scalarizeBinOpOfSplats
2021-11-15 10:43:56 +00:00
Simon Pilgrim 8658d20724 [DAG] SimplifyVBinOp - pull out repeated getValueType() call. NFC. 2021-11-15 10:43:55 +00:00
Jay Foad 4119da2f7c [MachineVerifier] Live interval for a subreg must have subranges
MachineVerifier verified the subranges of a live interval if
they existed, but did not complain if they did not exist.

This patch changes the verifier to complain if there are no
subranges in the live interval for a subreg operand (so long
as MachineRegisterInfo says we should be tracking subreg
liveness for that register). This matches the conditions for
LiveIntervalCalc to create subranges in the first place.

Differential Revision: https://reviews.llvm.org/D112556
2021-11-15 10:13:35 +00:00
Dmitry Preobrazhensky 91f4650ebb [AMDGPU][MC][GFX10] Corrected global_atomic_fcmpswap*
Corrected src data size of global_atomic_fcmpswap and global_atomic_fcmpswap_x2 opcodes.

Differential Revision: https://reviews.llvm.org/D113746
2021-11-15 12:51:12 +03:00
David Green 4c3bfdc7f1 [ARM] Fix GatherScatter AddLikeOr condition 2021-11-15 09:44:41 +00:00
Peter Waller 599ea3e73f [AArch64][SVE] Break false dependencies for inactive lanes of FP unary operations
Follow up to D105889, covering instructions using sve_fp_2op_p_zd_HSD:
frintn, frintp, frintm, frintz, frinta, frintx, frinti, frecpx and
fsqrt.

Reviewed By: bsmith

Differential Revision: https://reviews.llvm.org/D113485
2021-11-15 09:15:21 +00:00
Simon Moll 7cf887b950 [VE] Fix SDNode user loop after efa896e5f7
Rewriting SDNode user loops broke VEISelLowering (commit efa896e5f7).
This fixes it.
2021-11-15 09:53:09 +01:00
Sander de Smalen f835fe8ef7 [LV] Rename blockNeedsPredication to blockNeedsPredicationForAnyReason.
The interface is a convenience function to ask if a block requires
predication when widening, but it's important that there are two
separate concepts to consider:
(A) The block was predicated in the original loop.
(B) The block was unpredicated in the original loop, but requires
    predication because of tail folding.

In the case of (B) we know that at least one lane of the vector will
be executed, which means we can implement a load from a uniform address
with a scalar load + splat (D112552). In the case of predication because
of (A), we cannot do this, because the scalar load itself requires
predication.

The name 'blockNeedsPredication' does not make the distinction between
(A) and (B), hence the reason to rename it.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D113392
2021-11-15 08:04:20 +00:00
Kyungwoo Lee 6747d44bda [DebugInfo] Fix end_sequence of debug_line in LTO Object
In an LTO build, the `end_sequence` in the debug_line table for each compile unit (CU) points to the end of the text section, which merges all CUs. The `end_sequence` needs to point to the end of each CU's range. This bug often causes an invalid `debug_line` table in the final `.dSYM` binary for MachO after running `dsymutil`, which tries to compensate for an out-of-range `end_sequence` address.
The fix is to sync the line table termination with the range operations that are already maintained in DwarfDebug. When the CU or section changes, nodebug functions appear, or the module is finished, the prior pending line table is terminated using the last range label. In the MC path, where no range is tracked, the old logic is conservatively used to end the line table using the section end symbol.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D108261
2021-11-14 20:19:47 -08:00
Kazu Hirata feb40a3a47 [llvm] Use range-based for loops with instructions (NFC) 2021-11-14 19:40:48 -08:00
Kazu Hirata d243cbf8ea [llvm] Use isa instead of dyn_cast (NFC) 2021-11-14 19:40:46 -08:00
Kazu Hirata a84a401f7e [AMDGPU] Remove selectStoreIntrinsic (NFC)
The last use was removed on Jan 13, 2020 in commit
533d650e94.
2021-11-14 19:40:44 -08:00
Mircea Trofin a32c2c3808 [NFC] Use Optional<ProfileCount> to model invalid counts
ProfileCount could model invalid values, but a user had no indication
that the getCount method could return bogus data. Optional<ProfileCount>
addresses that, because the user must dereference the optional. In
addition, the patch removes concept duplication.

Differential Revision: https://reviews.llvm.org/D113839
2021-11-14 19:03:30 -08:00
Chen Zheng eec9ca622c [PowerPC] guard update form prepare with non-const increment with option
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D113471
2021-11-15 02:16:46 +00:00
Lang Hames 69be352a19 Reapply "[ORC] Initial MachO debugging support (via GDB JIT debug.." with fixes.
This reapplies e1933a0488 (which was reverted in
f55ba3525e due to bot failures, e.g.
https://lab.llvm.org/buildbot/#/builders/117/builds/2768).

The bot failures were due to a missing symbol error: We use the input object's
mangling to decide how to mangle the debug-info registration function name. This
caused lookup of the registration function to fail when the input object
mangling didn't match the host mangling.

Disabling the test on non-Darwin platforms is the easiest short-term solution.
I have filed https://llvm.org/PR52503 with a proposed longer term solution.
2021-11-14 14:44:07 -08:00
Simon Pilgrim c3a772fdf5 [X86] Add getPack helper
This helper provides a more complete approach for lowering to X86ISD::PACKSS/PACKUS nodes - testing for existing suitable sign/zero extension before recreating it.

It also optionally packs the upper half instead of the lower half.
2021-11-14 21:27:15 +00:00
Koakuma 3e0f3041cc [SPARC] Zero-extend the operands when doing UMULO on 64-bit integers
On SPARC, S/UMULO operations on 64-bit integers work by extending the values to 128-bit, then doing a multiplication and checking the upper half of the result.
This makes UMULO work correctly by putting a zero in the upper half rather than doing a sign extension.
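
A scaled-down sketch of the difference, using 32-bit operands widened to 64 bits in place of the 64-to-128-bit case from the commit. The function names are hypothetical and this is not the SPARC lowering code; it only illustrates why sign extension gives a false overflow for an unsigned multiply.

```cpp
#include <cstdint>

// Zero-extend both operands, multiply, and inspect the upper half.
bool umuloZeroExtended(uint32_t A, uint32_t B) {
  uint64_t Wide = static_cast<uint64_t>(A) * static_cast<uint64_t>(B);
  return (Wide >> 32) != 0;
}

// Faulty variant: sign extension copies the top bit of each operand into
// the upper half, so large unsigned inputs look like they overflow.
bool umuloSignExtended(uint32_t A, uint32_t B) {
  uint64_t WideA =
      static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(A)));
  uint64_t WideB =
      static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(B)));
  uint64_t Wide = WideA * WideB; // wraps at 64 bits, like the wide multiply
  return (Wide >> 32) != 0;
}
```

Multiplying `0xFFFFFFFF` by 1 clearly fits in 32 bits, yet the sign-extended variant reports an overflow because the upper half is already all ones before the multiply.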

Reviewed By: LemonBoy

Differential Revision: https://reviews.llvm.org/D110555
2021-11-14 19:59:52 +01:00
Kazu Hirata 7379736774 [llvm] Use range-based for loops with User::operands (NFC) 2021-11-14 09:32:38 -08:00
Kazu Hirata 098e935174 [llvm] Use range-based for loops with CallBase::args (NFC) 2021-11-14 09:32:36 -08:00
Roman Lebedev 4dd2f0446c
[X86][Costmodel] `getReplicationShuffleCost()`: promote 16 bit-wide elements to 32 bit when no AVX512BW
The basic idea is simple: if we don't have a native shuffle for this element type,
then we must have a native shuffle for a wider element type,
so promote, replicate, demote.

I believe, asking `getCastInstrCost(Instruction::Trunc` is correct semantically,
case in point `trunc <32 x i32> to <32 x i8>` aka 2 * ZMM will naively result in
2 * XMM, that then will be packed into 1 * YMM,
and it should count the cost of said packing,
not just the truncations.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113609
2021-11-14 20:01:38 +03:00
Roman Lebedev e876698a5d
[NFC][TTI] `getReplicationShuffleCost()`: s/Replicated/Dst/
'Replicated' is a mouthful and somewhat ambiguous,
while 'destination' is pretty self-explanatory.
2021-11-14 20:01:38 +03:00
Roman Lebedev b283961012
[X86][Costmodel] `trunc v8i64 to v16i16/v32i16` can appear after legalization, cost is same as for `trunc v8i64 to v8i16`
Same as D113842, but for i64

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113843
2021-11-14 18:41:38 +03:00
Roman Lebedev a5f2fdca99
[X86][Costmodel] `trunc v16i32 to v32i16` can appear after legalization, cost is same as for `trunc v16i32 to v16i16`
This was noticed in D113609, hopefully it unblocks that patch.
There are likely other similar problems.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113842
2021-11-14 18:41:37 +03:00
Mircea Trofin 0662a3612c [NFC][InlineFunction] Renamed some vars to conform to coding style 2021-11-14 07:26:44 -08:00
Sanjay Patel 254c5246e9 [DAGCombiner] match inverted/swapped patterns for vselect of mask of signbit
This was noted as a follow-up to D113212 / D113426:
4fc1fc4005
7e30404c3b
11522cfcad

https://alive2.llvm.org/ce/z/e4o96b

The canonicalization rules for these IR patterns are complicated,
and we were not matching the expected forms in 2 out of the 3
cases. We can make codegen more robust by matching the swapped
forms (and that will also work if these patterns are created late).
2021-11-14 09:35:26 -05:00
Simon Pilgrim f4143ffed7 [X86] Widen 128/256-bit VPTERNLOG patterns to 512-bit on non-VLX targets
Similar to what we've done for other ops, this patch widens VPTERNLOG to a 512-bit op for non-VLX targets.

Fixes regressions in D113192

Differential Revision: https://reviews.llvm.org/D113827
2021-11-14 13:40:53 +00:00
David Green 355ee18c5d [TypePromotion] Extend TypePromotion::isSafeWrap
This modifies the preconditions of TypePromotion's isSafeWrap
method, to allow it to work from all constants from the ICmp.
Using the code:
  %a = add %x, C1
  %c = icmp ult %a, C2

According to Alive, we can prove that is equivalent to
icmp ult (add zext(%x), sext(C1)), zext(C2)  given
C1 <=s 0 and C1 >s C2.
https://alive2.llvm.org/ce/z/CECYZB
Which is similar to what is already present. We can also
prove icmp ult (add zext(%x), sext(C1)), sext(C2) given
C1 <=s 0 and C1 <=s C2.
https://alive2.llvm.org/ce/z/KKgyeL
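
The two equivalences above can be spot-checked by brute force for one concrete pair of widths. The sketch below assumes an i8 value promoted to i32 and exhaustively compares the narrow and promoted comparisons under the stated preconditions; all function names are hypothetical, and this complements rather than replaces the Alive proofs.

```cpp
#include <cstdint>

// Narrow form: icmp ult (add i8 %x, C1), C2, with wrapping i8 arithmetic.
bool narrowCompare(uint8_t X, int8_t C1, int8_t C2) {
  uint8_t Sum = static_cast<uint8_t>(X + static_cast<uint8_t>(C1));
  return Sum < static_cast<uint8_t>(C2);
}

// Promoted form in i32: %x is zero-extended, C1 is sign-extended, and C2 is
// sign- or zero-extended depending on SextC2.
bool wideCompare(uint8_t X, int8_t C1, int8_t C2, bool SextC2) {
  uint32_t Sum = static_cast<uint32_t>(X) +
                 static_cast<uint32_t>(static_cast<int32_t>(C1));
  uint32_t Rhs = SextC2 ? static_cast<uint32_t>(static_cast<int32_t>(C2))
                        : static_cast<uint32_t>(static_cast<uint8_t>(C2));
  return Sum < Rhs;
}

// Checks every i8 combination that satisfies the preconditions:
//   zext(C2) case: C1 <=s 0 and C1 >s C2
//   sext(C2) case: C1 <=s 0 and C1 <=s C2
bool promotionIsSafe(bool SextC2) {
  for (int C1 = -128; C1 <= 0; ++C1)
    for (int C2 = -128; C2 <= 127; ++C2) {
      bool Precondition = SextC2 ? (C1 <= C2) : (C1 > C2);
      if (!Precondition)
        continue;
      for (int X = 0; X <= 255; ++X) {
        auto NX = static_cast<uint8_t>(X);
        auto NC1 = static_cast<int8_t>(C1);
        auto NC2 = static_cast<int8_t>(C2);
        if (narrowCompare(NX, NC1, NC2) != wideCompare(NX, NC1, NC2, SextC2))
          return false;
      }
    }
  return true;
}
```

Dropping the `C1 <=s 0` requirement breaks the equivalence: with x = 255, C1 = 1 and C2 = 5, the narrow add wraps to 0 while the promoted add yields 256, so the two comparisons disagree.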

The PrepareWrappingAdds method was removed, and the
constants are now altered to sext or zext directly as
required by the above methods.

Differential Revision: https://reviews.llvm.org/D113678
2021-11-14 11:18:31 +00:00
Kristina Bessonova 5b4bfd8c24 [DwarfCompileUnit] getOrCreateCommonBlock(): check for existing entity first. NFCI
For global variables and common blocks there is no way to create entities
through getOrCreateContextDIE(), so no need to obtain the context first.

Differential Revision: https://reviews.llvm.org/D113651
2021-11-14 10:58:24 +02:00
Kristina Bessonova 90c5ab54a9 [DwarfCompileUnit] getOrCreateGlobalVariableDIE(): remove outdated comment. NFC 2021-11-14 10:56:54 +02:00
Lang Hames f55ba3525e Revert "[ORC] Initial MachO debugging support (via GDB JIT debug..."
This reverts commit e1933a0488 until I can look
into bot failures.
2021-11-14 00:14:39 -08:00
Kazu Hirata 7505b7045f [llvm] Use GetElementPtrInst::indices (NFC) 2021-11-13 21:43:28 -08:00
Lang Hames e1933a0488 [ORC] Initial MachO debugging support (via GDB JIT debug registration interface)
This commit adds a new plugin, GDBJITDebugInfoRegistrationPlugin, that checks
for objects containing debug info and registers any debug info found via the
GDB JIT registration API.

To enable this registration without redundantly representing non-debug sections
this plugin synthesizes a new embedded object within a section of the LinkGraph.
An allocation action is used to make the registration call.

Currently MachO only. ELF users can still use the DebugObjectManagerPlugin. The
two are likely to be merged in the near future.
2021-11-13 13:21:01 -08:00
ksyx c35e8185d8
[GVN][NFC] Remove redundant check
The if-check above the deleted part guarantees that StoreOffset <= LoadOffset
and that StoreOffset + StoreSize >= LoadOffset + LoadSize. Since
LoadOffset + LoadSize > LoadOffset when LoadSize > 0, it follows that
StoreOffset + StoreSize > LoadOffset is guaranteed whenever LoadSize > 0.
A type with nonpositive size would be meaningless, so the check can be
removed.

Part of revision D100179
Reviewed By: nikic
2021-11-13 15:59:43 -05:00
Keith Smiley 86e2af8043 reland: [VFS] Use original path when falling back to external FS
This reverts commit f0cf544d6f.

Just a small change to fix:

```
/home/buildbot/as-builder-4/llvm-clang-x86_64-expensive-checks-ubuntu/llvm-project/llvm/lib/Support/VirtualFileSystem.cpp: In static member function ‘static llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> > llvm::vfs::File::getWithPath(llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> >, const llvm::Twine&)’:
/home/buildbot/as-builder-4/llvm-clang-x86_64-expensive-checks-ubuntu/llvm-project/llvm/lib/Support/VirtualFileSystem.cpp:2084:10: error: could not convert ‘F’ from ‘std::unique_ptr<llvm::vfs::File>’ to ‘llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> >’
   return F;
          ^
```

Differential Revision: https://reviews.llvm.org/D113832
2021-11-13 12:14:34 -08:00
Keith Smiley f0cf544d6f Revert "[VFS] Use original path when falling back to external FS"
```
/work/omp-vega20-0/openmp-offload-amdgpu-runtime/llvm.src/llvm/lib/Support/VirtualFileSystem.cpp: In static member function 'static llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> > llvm::vfs::File::getWithPath(llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> >, const llvm::Twine&)':
/work/omp-vega20-0/openmp-offload-amdgpu-runtime/llvm.src/llvm/lib/Support/VirtualFileSystem.cpp:2084:10: error: could not convert 'F' from 'std::unique_ptr<llvm::vfs::File>' to 'llvm::ErrorOr<std::unique_ptr<llvm::vfs::File> >'
   return F;
          ^
```

This reverts commit c972175649.
2021-11-13 10:11:51 -08:00
Keith Smiley c972175649 [VFS] Use original path when falling back to external FS
This is a follow up to 0be9ca7c0f to make
paths in the case of falling back to the external file system use the
original format, preserving relative paths, and allow the external
filesystem to canonicalize them if needed.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D109128
2021-11-13 09:34:44 -08:00
Kazu Hirata 609ccbb240 [PowerPC] Use SDNode::uses (NFC) 2021-11-13 08:34:22 -08:00
Simon Pilgrim a310cbae02 [X86] Add getAVX512Node helper. NFC.
For AVX512 targets without VLX, we have to widen 128/256-bit vectors to 512-bits to use some specific AVX512 instructions (or some other instructions with predicates etc.).

I've pulled out the widening code from LowerFunnelShift into the helper function, so we can convert some other widening patterns in the future.
2021-11-13 13:59:42 +00:00
Florian Hahn 8ed8d37088
[SCEV] Update SCEVLoopGuardRewriter to hold reference to map. (NFC)
SCEVLoopGuardRewriter doesn't need to copy the rewrite map. It can just
hold a const reference instead, to avoid an unnecessary copy.
2021-11-13 09:39:14 +00:00
Craig Topper 82bc6a094e [X86] Promote f16 STRICT_FROUND to f32 and call libc.
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D113817
2021-11-12 21:37:03 -08:00
Lang Hames 2272ec1c63 [JITLink][MachO] Fix "find-symbol-by-address" logic.
Only search within the requested section, and allow one-past-the-end addresses.

This is needed to support section-end-address references to sections with no
symbols in them.
2021-11-12 21:28:32 -08:00
Kazu Hirata efa896e5f7 [Target] Use SDNode::uses (NFC) 2021-11-12 21:23:04 -08:00
Duncan P. N. Exon Smith 79c5479822 Support: Pass wrapped Error's error code through FileError
Change FileError to pass through the error code from the Error it wraps.
This allows APIs that return ECError to transition to FileError without
changing returned std::error_code.

This was extracted from https://reviews.llvm.org/D109345.

Differential Revision: https://reviews.llvm.org/D113225
2021-11-12 21:19:09 -08:00
Phoebe Wang e49fcfc7cd [X86][ABI] Change the alignment of f80 in 32-bit calling convention to meet with different data layout
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113739
2021-11-13 10:00:34 +08:00
Ben Langmuir 2a739f2789 [ORC][ORC-RT] Register type metadata from __swift5_types MachO section
Similar to how the other swift sections are registered by the ORC
runtime's macho platform, add the __swift5_types section, which contains
type metadata. Add a simple test that demonstrates that the swift
runtime recognized the registered types.

rdar://85358530

Differential Revision: https://reviews.llvm.org/D113811
2021-11-12 16:39:59 -08:00
Philip Reames 37ead201e6 [runtime-unroll] Use incrementing IVs instead of decrementing ones
This is one of those wonderful "in theory X doesn't matter, but in practice it does" changes. In this particular case, we shift the IVs inserted by the runtime unroller to clamp the iteration count of the loops* from decrementing to incrementing.

Why does this matter?  A couple of reasons:
* SCEV doesn't have a native subtract node.  Instead, all subtracts (A - B) are represented as A + -1 * B, dropping any flags invalidated by that rewrite.  As a result, SCEV is slightly less good at reasoning about edge cases involving decrementing addrecs than incrementing ones.  (You can see this in the inferred flags in some of the test cases.)
* Other parts of the optimizer produce incrementing IVs, and they're common in idiomatic source language.  We do have support for reversing IVs, but in general if we produce one of each, the pair will persist surprisingly far through the optimizer before being coalesced.  (You can see this looking at nearby phis in the test cases.)

Note that if the hardware prefers decrementing (i.e. zero tested) loops, LSR should convert back immediately before codegen.

* Mostly irrelevant detail: The main loop of the prolog case is handled independently and will simply use the original IV with a changed start value.  We could in theory use this scheme for all iteration clamping, but that's a larger and more invasive change.
2021-11-12 15:44:58 -08:00
Craig Topper 02bed66cd5 [RISCV] Improve codegen for i32 udiv/urem by constant on RV64.
The division by constant optimization often produces constants that
are uimm32, but not simm32. These constants require 3 or 4 instructions
to materialize without Zba.

Since these instructions are often used by a multiply with a LHS
that needs to be zero extended with an AND, we can switch the MUL
to a MULHU by shifting both inputs left by 32. Once we shift the
constant left, the upper 32 bits no longer need to be 0 so constant
materialization is free to use LUI+ADDIW. This reduces the constant
materialization from 4 instructions to 3 in some cases while also
reducing the zero extend of the LHS from 2 shifts to 1.
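
The MUL-to-MULHU rewrite rests on a simple identity: shifting both inputs left by 32 scales the 128-bit product by 2^64, so its high 64 bits equal the original 64-bit product. The sketch below models this in portable C++; `mulhu` is a software stand-in for the RISC-V instruction, and the other names are hypothetical.

```cpp
#include <cstdint>

// Portable model of MULHU: the upper 64 bits of the unsigned 128-bit
// product, assembled from 32-bit partial products.
uint64_t mulhu(uint64_t A, uint64_t B) {
  const uint64_t Mask = 0xFFFFFFFFull;
  uint64_t ALo = A & Mask, AHi = A >> 32;
  uint64_t BLo = B & Mask, BHi = B >> 32;
  uint64_t LoLo = ALo * BLo, LoHi = ALo * BHi;
  uint64_t HiLo = AHi * BLo, HiHi = AHi * BHi;
  uint64_t Mid = (LoLo >> 32) + (LoHi & Mask) + (HiLo & Mask);
  return HiHi + (LoHi >> 32) + (HiLo >> 32) + (Mid >> 32);
}

// Baseline: zero-extend the 32-bit LHS held in a 64-bit register, then
// multiply by a uimm32 constant.
uint64_t mulZext(uint64_t Reg, uint32_t C) {
  return (Reg & 0xFFFFFFFFull) * static_cast<uint64_t>(C);
}

// Rewritten form: one left shift replaces the zero extend, and the shifted
// constant no longer needs zero upper bits.
uint64_t mulViaMulhu(uint64_t Reg, uint32_t C) {
  return mulhu(Reg << 32, static_cast<uint64_t>(C) << 32);
}
```

Both forms produce the same product even when the upper 32 bits of the register hold garbage, since the left shift discards them. As an illustration with the usual divide-by-5 magic constant `0xCCCCCCCD`, shifting the product right by 34 recovers the quotient.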

Differential Revision: https://reviews.llvm.org/D113805
2021-11-12 14:49:10 -08:00
Philip Reames de2fed6152 [unroll] Keep unrolled iterations with initial iteration
The unrolling code was previously inserting new cloned blocks at the end of the function.  The result of this with typical loop structures is that the new iterations are placed far from the initial iteration.

With unrolling, the general assumption is that a) the loop is reasonably hot, and b) the first Count-1 copies of the loop are rarely (if ever) loop exiting.  As such, placing Count-1 copies out of line is a fairly poor code placement choice.  We'd much rather fall through into the hot (non-exiting) path.  For code with branch profiles, later layout would fix this, but this may have a positive impact on non-PGO compiled code.

However, the real motivation for this change isn't performance.  It's readability and human understanding.  Having to jump around long distances in an IR file to trace an unrolled loop structure is error-prone and tedious.
2021-11-12 11:40:50 -08:00
Lang Hames 9d5e647428 [JITLink] Fix think-o in handwritten CWrapperFunctionResult -> Error converter.
We need to skip the length field when generating error strings.

No test case: This hand-hacked deserializer should be removed in the near future
once JITLink can use generic ORC APIs (including SPS and WrapperFunction).
2021-11-12 10:36:17 -08:00
Florian Hahn 03cfea68c6
[SCEV] Update SCEVLoopGuardRewriter to take SCEV -> SCEV map (NFC).
Split off refactoring from D113577 to reduce the diff. NFC as the new
interface will only be used in D113577.
2021-11-12 18:16:03 +00:00
Simon Pilgrim 6bb71738e2 [X86] convertShiftLeftToScale - improve vXi8 constant handling
Add support for v32i8/v64i8 converting shift-by-constant to multiply-by-constant. This helps us avoid the generic vXi8 shift lowering, and a lot of VPBLENDVB ops which can be particularly slow.

We also needed to reorder a few shift lowering patterns to prevent regressions, particularly for XOP+AVX2 (Excavator) targets (which can split to fast v16i8 shifts) and AVX512-BWI targets (which prefer to extend to fast v32i16 shifts).
2021-11-12 16:48:10 +00:00
Joel E. Denny c9dfe322ee [OpenMP] Fix main thread barrier for Pascal and amdgpu
Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781.

Reviewed By: jdoerfert, JonChesterfield, tianshilei1992

Differential Revision: https://reviews.llvm.org/D113602
2021-11-12 11:18:45 -05:00
Jay Foad a70bbb5f7a [AMDGPU] Simplify 64-bit division/remainder expansion
The old expansion open-coded a 64-bit addition in a strange way, by
adding the high parts *without* carry-in from the low part, and then
adding the carry back in later on. Fixing this saves a couple of
instructions and makes the code much easier to understand.
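
To make the carry handling concrete, here is a sketch of a 64-bit addition built from 32-bit halves, in both the direct form and a roundabout form that patches the carry in afterwards. It is an illustration of the arithmetic only, not the AMDGPU expansion itself; the type and function names are hypothetical.

```cpp
#include <cstdint>

struct U64Parts {
  uint32_t Lo;
  uint32_t Hi;
};

// Direct form: the carry out of the low add feeds straight into the high add.
U64Parts addWithCarryIn(U64Parts A, U64Parts B) {
  uint32_t Lo = A.Lo + B.Lo;
  uint32_t Carry = Lo < A.Lo ? 1u : 0u; // unsigned wrap means a carry occurred
  return {Lo, A.Hi + B.Hi + Carry};
}

// Roundabout form: add the high parts without carry-in, then add the carry
// back in as a separate step.
U64Parts addWithDeferredCarry(U64Parts A, U64Parts B) {
  uint32_t Hi = A.Hi + B.Hi;
  uint32_t Lo = A.Lo + B.Lo;
  uint32_t Carry = Lo < A.Lo ? 1u : 0u;
  Hi += Carry;
  return {Lo, Hi};
}

// Reassembles the two halves into a single 64-bit value.
uint64_t join(U64Parts P) {
  return (static_cast<uint64_t>(P.Hi) << 32) | P.Lo;
}
```

Both forms compute the same sum; the direct one simply needs fewer steps, which matches the commit's observation that fixing the expansion saves instructions.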

Differential Revision: https://reviews.llvm.org/D113679
2021-11-12 15:48:41 +00:00
Kazu Hirata 99d5cbbd7e [CodeGen] Use SDNode::uses (NFC) 2021-11-12 07:33:29 -08:00
Kerry McLaughlin 7647822156 [AArch64][SVE] Remove i1 type from isElementTypeLegalForScalableVector
`collectElementTypesForWidening` collects the types of load, store and
reduction Phis in a loop. These types are later checked using
`isElementTypeLegalForScalableVector` to prevent vectorisation of
loops with instruction types that are unsupported.

This patch removes i1 from the list of types supported for scalable
vectors. This fixes an assert ("Cannot yet scalarize uniform stores") in
`setCostBasedWideningDecision` when we have a loop containing a uniform
i1 store and a scalable VF, which we cannot create a scatter for.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D113680
2021-11-12 14:24:38 +00:00
Alexey Bataev 352c46e707 [SLP]Improve vectorization of split loads.
Fix the cost estimation for split loads: since we already look at the
subregisters, there is no need to permute them; we only need to estimate
the subregister insert, if it is smaller than the real register. Also, with
split loads, it might already be profitable to vectorize smaller trees
with gathering of the loads.

Differential Revision: https://reviews.llvm.org/D107188
2021-11-12 06:13:22 -08:00
Simon Pilgrim 59087dce3b [X86] combineX86ShufflesConstants - constant fold from target shuffles unless optsize = true
Currently we only constant fold target shuffles if any of the sources has one use, or it would remove a variable shuffle mask - the aim being to avoid constant pool bloat.

This patch proposes we should constant fold by default and only limit this if optsize is enabled - I've added a basic test for this in vector-mul.ll (the pmuludq case is by far the most common), I can add other specific test cases if people need them.

This should permit further constant folding, break some instruction dependencies and help reduce shuffle port pressure.

Differential Revision: https://reviews.llvm.org/D113748
2021-11-12 14:02:43 +00:00
Sanjay Patel bf5748a1af [x86] fold vector (X > -1) & Y to shift+andn
and (pcmpgt X, -1), Y --> pandn (vsrai X, BitWidth-1), Y

This avoids the -1 constant vector in favor of an arithmetic shift
instruction if it exists (the ISA is still not complete after all
these years...).

We catch this pattern late in combining by matching PCMPGT, so it
should not interfere with more general folds.
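
A per-lane scalar model of the fold, assuming 32-bit lanes. It only checks that the two expressions agree on a single element; the function names are hypothetical and nothing here reflects how the DAG combine is implemented.

```cpp
#include <cstdint>

// One i32 lane of: and (pcmpgt X, -1), Y
uint32_t viaCompare(int32_t X, uint32_t Y) {
  uint32_t Mask = X > -1 ? 0xFFFFFFFFu : 0u; // pcmpgt yields all-ones or zero
  return Mask & Y;
}

// One i32 lane of: pandn (vsrai X, 31), Y, where pandn(A, B) = ~A & B
uint32_t viaShiftAndn(int32_t X, uint32_t Y) {
  // Right shift of a negative value is arithmetic: guaranteed since C++20,
  // and the behaviour of mainstream compilers before that.
  uint32_t SignSplat = static_cast<uint32_t>(X >> 31);
  return ~SignSplat & Y;
}
```

For non-negative X the shift produces zero, so the and-not passes Y through; for negative X it produces all ones and the lane is cleared, matching the compare-based form without needing a -1 constant vector.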

Differential Revision: https://reviews.llvm.org/D113603
2021-11-12 08:17:46 -05:00
Florian Hahn 819bca9b90
[SCEV] Use APIntOps::umin to select best max BC count (NFC).
Suggested in D102267, but I missed this in the committed version.
2021-11-12 12:20:01 +00:00
Neubauer, Sebastian d1f45ed58f [AMDGPU][NFC] Fix typos
Differential Revision: https://reviews.llvm.org/D113672
2021-11-12 11:37:21 +01:00
Simon Moll 751aa6c280 [VE][NFCi] Remove unused tablegen parameters
TableGen has started warning about unused template parameters in the isel patterns.  Remove those.

Reviewed By: kaz7

Differential Revision: https://reviews.llvm.org/D113675
2021-11-12 08:19:50 +01:00
Markus Lavin 4e94e25c90 Fix minor deficiency in machine-sink.
Register uses that are MRI->isConstantPhysReg() should not inhibit
sinking transformation.

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D111531
2021-11-12 08:01:13 +01:00
Kazu Hirata 2ca45adf24 [CodeGen, Target] Use MachineRegisterInfo::use_operands (NFC) 2021-11-11 22:28:55 -08:00
Serge Pavlov 3057e850b8 [X86] Preserve FPSW when popping x87 stack
When the compiler converts x87 operations to the stack model, it may insert
instructions that pop the top stack element. To do so, the compiler inserts
an FSTP instruction right after the instruction that calculates the value on
the stack. This can break code that uses the FPSW set by that
instruction. For example, an FXAM instruction is usually followed by
FNSTSW, but FSTP is inserted after FXAM. As FSTP leaves the condition code
in FPSW undefined, the compiler produces incorrect code.

With this change, FSTP is inserted after the FPSW consumer if the last
instruction sets FPSW.

Differential Revision: https://reviews.llvm.org/D113335
2021-11-12 12:00:09 +07:00
Luís Ferreira 665b4138d9 [DebugInfo] run clang-format on some unformatted files
This trivial patch runs clang-format on some unformatted files before
making logic changes, to prevent hard-to-review diffs.

Differential Revision: https://reviews.llvm.org/D113572
2021-11-11 18:59:41 -08:00
Phoebe Wang 74b979abcd [X86][FP16] Avoid to generate VZEXT_MOVL with i16
This fixes the crash due to lacking VZEXT_MOVL support with i16.

Reviewed By: LuoYuanke, RKSimon

Differential Revision: https://reviews.llvm.org/D113661
2021-11-12 09:32:29 +08:00
Nikita Popov 986416251b [InstCombine] Drop redundant fold for and/or of icmp eq/ne (NFCI)
This handles a special case of foldAndOrOfICmpsUsingRanges()
with two equality predicates.
2021-11-11 20:25:40 +01:00
Min-Yih Hsu 99152a4164 [M68k][NFC] Rename 'GlSel' -> 'GISel'
AArch64 as well as other targets use the abbrev "GISel" so we'd better
be consistent with them. NFC.
2021-11-11 11:01:09 -08:00
Simon Pilgrim 94a901a50a [X86] Move LowerFunnelShift below LowerShift. NFC.
Makes it easier to reuse the various vector shift helpers defined above LowerShift
2021-11-11 18:45:51 +00:00
Simon Pilgrim 010b09b0c5 [DAG] reassociateOpsCommutative - test getNode result directly. NFC
Matches the clean code style we use directly above
2021-11-11 18:45:50 +00:00
Mircea Trofin f64eee1625 [NFC][InlineAdvisor] Inform advisor when the module is invalidated
This avoids unnecessary re-calculation of module-wide features in the
MLInlineAdvisor. In cases where function passes don't invalidate
functions (and, thus, don't invalidate the module), but we re-process a
CGSCC, we currently refresh module features unnecessarily. The
overhead of fetching cached results (even though they weren't themselves
invalidated) was noticeable in certain modules' compilations.

We don't want to just invalidate the advisor object, though, via the
analysis manager, because we'd then need to re-create expensive state
(like the model evaluator in the ML 'development' mode).

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D113644
2021-11-11 10:23:49 -08:00