157805 Commits

Author SHA1 Message Date
Quentin Colombet e547a333a4 [DeadArgElim] Set unused arguments for internal functions
Prior to this patch we would only set to undef the unused arguments of the
external functions. The rationale was that unused arguments of internal
functions wouldn't need to be turned into undef arguments because they
should have been simply eliminated by the time we reach that code.

This is actually not true because there are plenty of cases where we can't
remove unused arguments. For instance, if the internal function is used in
an indirect call, it may not be possible to change the function signature.
Yet, for statically known call-sites we would still like to mark the unused
arguments as undef.

This patch enables the "set undef arguments" optimization on internal
functions when we encounter cases where internal functions cannot be
optimized. I.e., whenever an internal function is marked "live".

Differential Revision: https://reviews.llvm.org/D124699
2022-05-02 11:16:32 -07:00
Jonas Paulsson 304378fd09 Reapply "[BuildLibCalls] Introduce getOrInsertLibFunc() for use when building
libcalls." (was 0f8c626). This reverts commit 14d9390.

The patch previously failed to recognize cases where the user had defined a
function alias with the same name as the library function.
Module::getFunction() would then return nullptr, which is what the
sanitizer discovered.

In this updated version, a new function isLibFuncEmittable() has also been
introduced, which is now used instead of TLI->has() any time a library function
is to be emitted. It additionally makes sure that there is, e.g., no function
alias with the same name in the module.

Reviewed By: Eli Friedman

Differential Revision: https://reviews.llvm.org/D123198
2022-05-02 19:37:00 +02:00
Amy Kwan 2534dc120a [PowerPC] Enable CR bits support for Power8 and above.
This patch turns on support for CR bit accesses for Power8 and above. CR bits
are turned on by default for Power8 and above because later architectures
make use of builtins and instructions that require CR bit
accesses (such as the use of setbc in the vector string isolate predicate
and bcd builtins on Power10).

This patch also adds the clang portion to allow for turning on CR bits in the
front end if the user so desires.

Differential Revision: https://reviews.llvm.org/D124060
2022-05-02 12:06:15 -05:00
Arthur Eubanks b07aab8fc1 [GlobalOpt] Iterate over replaced values deterministically to constprop
If there are pre-existing dead instructions, the order we visit replaced
values can cause us sometimes to not delete dead instructions.

The added test non-deterministically failed without the change.
2022-05-02 09:43:20 -07:00
Nikita Popov 95fedfab6c [InstCombine] Handle non-canonical GEP index in indexed compare fold (PR55228)
Normally the index type will already be canonicalized here, but
this is not guaranteed depending on visitation order. The code
was already accounting for a potentially needed sext, but a trunc
may also be needed.

Add a ConstantExpr::getSExtOrTrunc() helper method to make this
simpler. This matches the corresponding IRBuilder method in behavior.
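
As a rough illustration of what a "sext or trunc" helper computes, here is a minimal Python sketch on plain integers. The function name and the bit-pattern representation are assumptions made for illustration. This is not the LLVM API.

```python
def sext_or_trunc(value: int, src_bits: int, dst_bits: int) -> int:
    """Resize an unsigned src_bits-wide bit pattern to dst_bits (hypothetical helper)."""
    value &= (1 << src_bits) - 1
    if dst_bits <= src_bits:
        # Truncate (or no-op when the widths match): keep the low dst_bits bits.
        return value & ((1 << dst_bits) - 1)
    # Sign-extend: replicate the source sign bit into the new high bits.
    if value >> (src_bits - 1):
        value |= ((1 << dst_bits) - 1) & ~((1 << src_bits) - 1)
    return value
```

A single entry point covers both directions, so a caller does not need to know in advance whether the index is narrower or wider than the target type.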

Fixes https://github.com/llvm/llvm-project/issues/55228.
2022-05-02 17:56:01 +02:00
Simon Pilgrim 59dc8ce95a [X86] Reduce some superfluous diffs between znver1/znver2 models. NFC
znver2 is mainly a search+replace of the znver1 model, but for no reason the HADD and DPPS have been moved around - try to keep these in sync (no actual changes in the models).
2022-05-02 16:45:43 +01:00
Simon Pilgrim ce9c0faca1 [X86][AMX] combineLdSt - don't dereference dyn_cast. NFC
This leads to null pointer dereference warnings - use cast<> which will assert that the cast is correct.
2022-05-02 16:45:43 +01:00
Augie Fackler c7ae423e39 BuildLibCalls: add alloc-family attribute to many allocator functions
Differential Revision: https://reviews.llvm.org/D123086
2022-05-02 11:12:55 -04:00
Augie Fackler e940456531 BuildLibCalls: infer allocptr attribute for free and realloc() family functions
Differential Revision: https://reviews.llvm.org/D123084
2022-05-02 09:43:21 -04:00
Simon Pilgrim c7662dc3e5 [X86] MOVDDUP has the same sched behaviour as MOVSHDUP/MOVSLDUP on Skylake
Fixes an old TODO - confirmed on Agner + uops.info
2022-05-02 12:50:37 +01:00
David Green 2dcb2d8562 [AArch64] Cost modelling for fptoi_sat
This builds on top of the target-independent cost model added in D124269
to add aarch64 specific costs for fptoui_sat and fptosi_sat intrinsics.
For many common types they will be legal instructions as the AArch64
instructions will saturate naturally. For unsupported pairs of integer
and floating point types, an additional min/max clamp is needed.

Differential Revision: https://reviews.llvm.org/D124357
2022-05-02 11:36:05 +01:00
Simon Pilgrim 86bb7df6e6 [CostModel][X86] getScalarizationOverhead - handle vXi1 extracts with MOVMSK (pre-AVX512)
We can quickly extract multiple elements of a bool vector using MOVMSK ops - since we don't know what generated the vXi1, I've been optimistic and assumed we can use PMOVMSKB to extract the maximum number of bools with a single op.

The MOVMSK pattern isn't great for extract+insert round trips as vXi1 type legalization can interfere with this a lot - so this relies on us remaining good at using getScalarizationOverhead properly (and tagging both Insert and Extract modes) for those round trip cases.

The AVX512 KMOV codegen for bool extraction is a bit of a mess so for now I've not included that - the per-element cost is a lot more accurate for current codegen.
2022-05-02 09:58:39 +01:00
Nikita Popov aae5f8115a [Local] Consider atomic loads from constant global as dead
Per the guidance in
https://llvm.org/docs/Atomics.html#atomics-and-ir-optimization,
an atomic load from a constant global can be dropped, as there can
be no stores to synchronize with. Any write to the constant global
would be UB.

IPSCCP will already drop such loads, but the main helper in Local
doesn't recognize this currently. This is motivated by D118387.

Differential Revision: https://reviews.llvm.org/D124241
2022-05-02 10:52:58 +02:00
Nikita Popov 597946a4dd [ConstantFold] Don't convert getelementptr to ptrtoint+inttoptr
ConstantFolding currently converts "getelementptr i8, Ptr, (sub 0, V)"
to "inttoptr (sub (ptrtoint Ptr), V)". This transform is, taken by
itself, correct, but does come with two issues:

1. It unnecessarily broadens provenance by introducing an inttoptr.
   We generally prefer not to introduce inttoptr during optimization.
2. For the case where V == ptrtoint Ptr, this folds to inttoptr 0,
   which further folds to null. In that case provenance becomes
   incorrect. This has been observed as a real-world miscompile with
   rustc.

We should probably address that incorrect inttoptr 0 fold at some
point, but in either case we should also drop this inttoptr-introducing
fold. Instead, replace it with a fold rooted at
ptrtoint(getelementptr), which seems to cover the original
motivation for this fold (test2 in the changed file).

Differential Revision: https://reviews.llvm.org/D124677
2022-05-02 10:24:46 +02:00
Phoebe Wang 7c04454227 [ArgPromotion][Attributor] Update min-legal-vector-width when do promotion
X86 codegen uses the function attribute `min-legal-vector-width` to select the proper ABI. The intention of the attribute is to reflect the user's requirements when passing or returning vector arguments. So the Clang front-end iterates over the vector arguments and sets `min-legal-vector-width` to the maximum width for both caller and callee.

It is assumed that middle-end optimizations don't care about the attribute, except for inlining and argument promotion.
- For inlining, we propagate the attribute of the inlined function, because the function it is inlined into becomes the new caller.
- For argument promotion, we check the `min-legal-vector-width` of the caller and callee and refuse to promote when they don't match.

The problem comes from the combination of these optimizations, as shown by https://godbolt.org/z/zo3hba8xW. The caller `foo` has two callees, `bar` and `baz`. When doing argument promotion, `foo` and `bar` have the same `min-legal-vector-width`, so the argument is promoted to a vector. Then inlining inlines `baz` into `foo` and updates `min-legal-vector-width`, which results in an ABI mismatch between `foo` and `bar`.

This patch fixes the problem by expanding the concept of `min-legal-vector-width` to be an indicator of function arguments. That is, any pass that touches function arguments has to set `min-legal-vector-width` to a value that reflects the width of the vector arguments. This makes sense to me because any argument modification is ABI related and should be responsible for ABI compatibility.

Differential Revision: https://reviews.llvm.org/D123284
2022-05-02 14:13:05 +08:00
Fangrui Song 2019c9b1c8 [RISCV] Lower case the first letter of LowerRISCVMachineOperandToMCOperand. NFC
2022-05-01 14:13:55 -07:00
Florian Hahn 5387a38c38 [SimpleLoopUnswitch] Freeze individual OR/AND operands.
In some cases, it is not enough to freeze the final AND/OR operation
when chaining a number of invariant conditions together.

After creating a chain of ANDs/ORs, we assume all unswitched operands to
be either true or false. But if any of the operands is poison, the rest
of the operands could have any value after branching on the frozen
condition.

To avoid that, freeze individual operands, if needed. In some cases this
may lead to unnecessary freezes, but it seems required at least for some
cases (see trivial-unswitch-freeze-individual-conditions.ll)

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124554
2022-05-01 20:11:05 +01:00
Simon Pilgrim 34f97a3709 [VectorCombine] Merge isa<>/cast<> into dyn_cast<>. NFC.
We want to handle the assert in VectorCombine so avoid the repeated isa/cast code.
2022-05-01 20:09:10 +01:00
Simon Pilgrim ae8b10e543 [DAG] (style) Break apart if-else chain as they all return
2022-05-01 17:56:59 +01:00
Simon Pilgrim 980f41d7c4 [X86] (style) Use auto for dyn_cast<> results
2022-05-01 17:15:18 +01:00
Simon Pilgrim d4f06ec874 [X86] (style) Don't use auto for non obvious types
2022-05-01 17:10:21 +01:00
Simon Pilgrim 09761ce295 [SLPVectorizer] Remove weird unicode character from comment. NFCI.
Whatever it was, Visual Assist really didn't like it....
2022-05-01 16:37:21 +01:00
Simon Pilgrim d5198cf92f [CostModel][X86] Check for 'null op' truncations
If the legalized src/dst types are the same, assume the "truncation" is free.

This fixes some edge cases such as mul lo/hi ops and bool vectors which will get legalized back to legal vector widths
2022-05-01 12:03:40 +01:00
Simon Pilgrim c2964746e3 [CostModel][X86] Reduce cost of vector selects on SSE2/AVX1 targets
Based off the script from D103695, we were exaggerating the cost of the OR(AND(X,M),AND(Y,~M)) expansion using instruction count instead of effective throughput
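
For reference, the OR(AND(X,M),AND(Y,~M)) expansion mentioned above can be sketched in Python on fixed-width integers. The function name and the 32-bit default width are assumptions for illustration only.

```python
def bitwise_select(x: int, y: int, mask: int, bits: int = 32) -> int:
    """Pick bits of x where mask is 1 and bits of y where mask is 0."""
    full = (1 << bits) - 1
    # AND, AND-NOT and OR: three logical operations in total.
    return ((x & mask) | (y & ~mask & full)) & full
```

The expansion is three simple logical operations, which is why counting instructions alone can overstate its cost compared with measuring effective throughput.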
2022-05-01 09:32:14 +01:00
Jack Andersen 09325d3606 [CAPI] Expose CastInst::getCastOpcode in C API
Reviewed By: deadalnix

Differential Revision: https://reviews.llvm.org/D91514
2022-04-30 18:40:04 -04:00
Dmitry Vassiliev 2e7e0975c0 [NVPTX] Prefix "$L__" for branch label names
A global variable may have the same name as a label, and ptxas does not accept it.
Prefix labels with $L__ to fix this.

Reviewed By: MaskRay, tra

Differential Revision: https://reviews.llvm.org/D119669
2022-04-30 21:55:20 +02:00
Florian Hahn 8b022f87b0 [SimpleLoopUnswitch] Freeze trivial conditions if needed.
Trivial unswitching can also introduce new branches on undef/poison.
Freeze the conditions if needed.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124549
2022-04-30 19:53:36 +01:00
Paul Walker f10a8f6752 [LegalizeDAG] Fix TypeSize conversion error when expanding SIGN_EXTEND_INREG
SIGN_EXTEND_INREG expansion can trigger a TypeSize error because
"VT.getSizeInBits() == 1" is used to detect a boolean without
first verifying VT is a scalar.
2022-04-30 19:21:48 +01:00
Craig Topper 6affe87bda [DAGCombiner] When matching a disguised rotate by constant don't forget to apply LHSMask/RHSMask.
We try to match as a disguised rotate by constant of these forms
(shl (X | Y), C1) | (srl X, C2) --> (rotl X, C1) | (shl Y, C1)
(shl X, C1) | (srl (X | Y), C2) --> (rotl X, C1) | (srl Y, C2)

We may have also looked through an AND to find the shift. If we
did, we need to apply a mask to the result.
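
The two unmasked identities above can be checked with a small Python sketch on 8-bit values, assuming C1 + C2 equals the bit width. The helper names are hypothetical, and the LHSMask/RHSMask handling that this patch fixes is deliberately not modeled.

```python
BITS = 8
FULL = (1 << BITS) - 1

def shl(v: int, c: int) -> int:
    return (v << c) & FULL

def srl(v: int, c: int) -> int:
    return (v & FULL) >> c

def rotl(v: int, c: int) -> int:
    c %= BITS
    return (shl(v, c) | srl(v, BITS - c)) if c else (v & FULL)

def forms_agree(x: int, y: int, c1: int) -> bool:
    """Check both disguised-rotate forms for one (x, y, c1), with c2 = BITS - c1."""
    c2 = BITS - c1
    first = (shl(x | y, c1) | srl(x, c2)) == (rotl(x, c1) | shl(y, c1))
    second = (shl(x, c1) | srl(x | y, c2)) == (rotl(x, c1) | srl(y, c2))
    return first and second
```

Both forms hold because a shift distributes over OR, so the extra Y term survives only on the side where it was originally shifted.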

I'll add an AArch64 test and pre-commit it and the RISC-V test
tomorrow.

Fixes PR55201.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D124711
2022-04-30 11:02:30 -07:00
luxufan e098281c27 [RISCV] Don't getDebugLoc for the end node of MBB iterator
Because of shrink wrapping, the block in which to insert the epilog may not
have any instructions (only debug instructions), and the insertion position may
point to MBB.end(), which doesn't have a DebugLoc. This patch fixes this
problem.

The test program was copied from the issue: https://github.com/llvm/llvm-project/issues/53662

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D123679
2022-04-30 16:00:20 +08:00
Saleem Abdulrasool 24ba1302b3 AArch64: modify Swift async frame record storage on Windows
The frame layout on Windows differs from that on other platforms. It
will spill the registers in descending numeric value (i.e. x30, x29,
...). Furthermore, the x29, x30 pair is particularly important as it
is used for the fast stack walking. As a result, we cannot simply
insert the Swift async frame record in between the store. To provide
the simplistic search mechanism, always spill the async frame record
prior to the spilled registers.

This was caught by the assertion failure in the frame lowering code when
building the runtime for Windows AArch64.

Fixes: #55058

Differential Revision: https://reviews.llvm.org/D124498
Reviewed By: mstorsjo
2022-04-30 09:01:33 -07:00
Juneyoung Lee 40a2e35599 [InstCombine] Remove the undef-related workaround code in visitSelectInst
This patch removes an old hack in visitSelectInst that was written to avoid miscompilation bugs in loop unswitch.
(Added via https://reviews.llvm.org/D35811)

The legacy loop unswitch pass will be removed after D124376, and the new simple loop unswitch pass correctly uses freeze to avoid introducing UB after D124252.

Since the hack is not necessary anymore, this patch removes it.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D124426
2022-04-30 20:48:42 +09:00
Simon Pilgrim 92235e3bf4 [X86] lowerShuffleAsRepeatedMaskAndLanePermute - permit 32-bit sublane permute for unary v32i8 cases
Increase the likelihood that we can lower to a permd(pshufb()) pattern, but only after we've attempted with 64-bit sublane permutes first

Fixes #55066
2022-04-30 11:00:28 +01:00
Yeting Kuo c069e37019 [RISCV] Add DAGCombine to fold base operation and reduction.
Transform (<bop> x, (reduce.<bop> vec, splat(neutral_element))) to
(reduce.<bop> vec, splat (x)).
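
A minimal Python sketch of why this fold is sound for an associative and commutative operation: a reduction that starts from the neutral element and is then combined with x gives the same result as a reduction that starts from x. The function names are hypothetical, and the start value is modeled as a scalar rather than a splat vector.

```python
from functools import reduce
import operator

def reduce_with_start(bop, vec, start):
    """Model of reduce.<bop> with an explicit start value."""
    return reduce(bop, vec, start)

def before_fold(bop, x, vec, neutral):
    # (<bop> x, (reduce.<bop> vec, splat(neutral_element)))
    return bop(x, reduce_with_start(bop, vec, neutral))

def after_fold(bop, x, vec):
    # (reduce.<bop> vec, splat(x))
    return reduce_with_start(bop, vec, x)
```

For example, with `operator.add` and neutral element 0, both forms sum the vector and add x exactly once.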

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122563
2022-04-30 14:07:05 +08:00
Craig Topper f91690f7db [RISCV] Don't merge addi into load/store address if addi has a FrameIndex operand.
This fixes a crash from D124231.

We can't fold
  (load (add base, (addi src, off1)), off2)
     -> (load (add base, src), off1+off2)
if the src is a FrameIndex. FrameIndex cannot be the operand of an
add.

There was an immediate==0 check that I think was trying to catch
the common case of FrameIndex addis where the immediate is 0, but
they can also appear in non-zero form. Instead explicitly check
for a FrameIndex operand.
2022-04-29 18:22:20 -07:00
Craig Topper 5aa1a7b307 [RISCV] Remove 'frameindex' from list for ComplexPattern. NFC
Putting a node in this list allows the node to be used as the root
of an isel pattern that would then call the ComplexPattern. The
usual case is to use the ComplexPattern as the operand of another
operator.

AddrFI is never used as a root operation. frameindex is handled
directly with custom code in RISCVISelDAGToDAG::Select. So adding
frameindex to the list here serves no purpose.
2022-04-29 17:41:07 -07:00
Hongtao Yu bdb8c50a1c [CSSPGO] Turn on priority inlining for probe-only profile
We have seen that the priority inliner delivered on-par performance with the old inliner for probe-only CSSPGO profile, as long as there is no size budget. I'm turning on the priority inliner for probe-only profile by default.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D124632
2022-04-29 17:31:56 -07:00
Hongtao Yu e36786d15f [CSSPGO] Rename ProfileIsCSNested and ProfileIsCSFlat
To be more clear and definitive, I'm renaming `ProfileIsCSFlat` back to `ProfileIsCS` which stands for full context-sensitive flat profiles.  `ProfileIsCSNested` is now renamed to `ProfileIsPreInlined` and is extended to be applicable for CS flat profiles too. More specifically, `ProfileIsPreInlined` is for any kind of profiles (flat or nested) that contain 'ShouldBeInlined' contexts. The flag is encoded in the profile summary section for extbinary profiles and is computed on-the-fly for text profiles.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D122602
2022-04-29 17:03:52 -07:00
David Kreitzer 6918a15f43 Test commit. Fixed a typo in a comment.
2022-04-29 16:18:09 -07:00
Dmitry Vassiliev 8c49ab040c [NVPTX] Add add.cc/addc.cc/sub.cc/subc.cc for i64
PTX supports those instructions for i64 starting from 4.3.
The patch also marks corresponding DAG nodes legal for both i32 and i64.

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D124698
2022-04-29 15:32:22 -07:00
Simon Dardis 938ed8ae99 [MIPS] Address instruction selection failure for abs.[sd]
Previously, the instruction selection of ISD::FABS was decided at the
point of setting the MIPS target lowering operation, by choosing
either `Custom` lowering or `Legal`. This led to instruction selection
failures as functions could be marked as having no NaNs.

Changing the lowering to always be `Custom` and directly handling
the cases where MIPS selects the instructions for ISD::FABS resolves
this crash.

Thanks to kray for reporting the issue and to Simon Atanasyan for producing
the reduced test case.

This resolves PR/53722.

Differential Revision: https://reviews.llvm.org/D124651
2022-04-29 23:10:58 +01:00
James Y Knight 02aa795785 Revert "[JumpThreading][NFC][CompileTime] Do not recompute BPI/BFI analyzes"
This change has caused non-reproducibility of a self-build of Clang
when using NewPM and providing profile data.

This reverts commit 35f38583d2.
2022-04-29 21:15:47 +00:00
Congzhe Cao c428a3d2a0 [LoopCacheAnalysis] Enable delinearization of fixed sized arrays
Currently loop cache cost (LCC) cannot analyze fixed-size arrays
since it cannot delinearize them. This patch adds the capability
to delinearize fixed-size arrays to LCC. Most of the code is ported
from DependenceAnalysis.cpp and some refactoring will be done in a
follow-up patch.

Reviewed By: #loopoptwg, Meinersbur

Differential Revision: https://reviews.llvm.org/D122857
2022-04-29 16:01:27 -04:00
Stanislav Mekhanoshin 51e02409f0 [AMDGPU] Produce waitcounts for LDS DMA
MUBUF and FLAT LDS DMA operations need a wait on vmcnt before the written
LDS can be accessed. A load from LDS to VMEM does not need a wait.

Differential Revision: https://reviews.llvm.org/D124626
2022-04-29 11:14:11 -07:00
David Penry dcb77643e3 Reapply [CodeGen][ARM] Enable Swing Modulo Scheduling for ARM
Fixed "private field is not used" warning when compiled
with clang.

original commit: 28d09bbbc3
reverted in: fa49021c68

------

This patch permits Swing Modulo Scheduling for ARM targets and
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In addition, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-29 10:54:39 -07:00
Philip Reames 3ea191ed03 [RISCV] Factor repeating code into getMaskTypeFor(VT) [nfc]
2022-04-29 10:00:57 -07:00
Joe Nash 813e521e55 [AMDGPU] Add gfx11 subtarget ELF definition
This is the first patch of a series to upstream support for the new
subtarget.

Contributors:
Jay Foad <jay.foad@amd.com>
Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>

Patch 1/N for upstreaming AMDGPU gfx11 architectures.

Reviewed By: foad, kzhuravl, #amdgpu

Differential Revision: https://reviews.llvm.org/D124536
2022-04-29 12:27:17 -04:00
Paul Walker b481512485 [SVE] Move reg+reg gather/scatter addressing optimisations from lowering into DAG combine.
This is essentially a refactoring patch but allows more cases to
be caught, hence the output changes to some tests.

Differential Revision: https://reviews.llvm.org/D122994
2022-04-29 17:42:33 +01:00
Philip Reames f927be0df8 [RISCV] Extract getAllOnesMask helper [nfc]
2022-04-29 09:30:18 -07:00
Alexey Bataev 484fcb9888 [SLP][NFC]Fix a comment.
2022-04-29 09:27:13 -07:00
Craig Topper 5c38373125 [RISCV] Improve constant materialization for cases that can use LUI+ADDI instead of LUI+ADDIW.
It's possible that we have a constant that isn't simm32 so we can't
use LUI+ADDIW, but we can use LUI+ADDI. Because ADDI uses a sign
extended constant, it's possible that after subtracting it out, we
end up with a simm32 that maps to LUI.

This patch detects this case after removing Lo12 and before shifting
the value for SLLI.
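
The case described above can be sketched in Python, treating the constant as a signed 64-bit value. The function names are hypothetical and this only models the arithmetic, not the actual RISC-V materialization code.

```python
def sign_extend(value: int, bits: int) -> int:
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def is_simm32(value: int) -> bool:
    return -(1 << 31) <= value < (1 << 31)

def split_lui_addi(value: int):
    """Return (hi, lo12) if a non-simm32 value equals LUI(hi) + ADDI(lo12), else None."""
    if is_simm32(value):
        return None  # Already covered by the LUI+ADDIW path.
    lo12 = sign_extend(value, 12)  # ADDI uses a sign-extended 12-bit immediate.
    hi = value - lo12  # A multiple of 4096 by construction.
    return (hi, lo12) if is_simm32(hi) else None
```

For instance, -2^31 - 1 is not a simm32, but subtracting its sign-extended low 12 bits (-1) leaves -2^31, which a single LUI can produce.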

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124222
2022-04-29 08:58:32 -07:00
Simon Pilgrim b424055b52 [X86] lowerShuffleAsRepeatedMaskAndLanePermute - move the sublane split code into a lambda helper. NFC.
This is a NFC cleanup as part of the work on #55066 - the idea being that we will be able to check for multiple sub lane scales.
2022-04-29 16:03:50 +01:00
Alexey Bataev 371412e065 [COST]Fix crash for non-power-2 vector shuffle mask.
Need to normalize the mask to avoid possible crashes during attempts
to estimate cost of the very long shuffles with non-power-2 number of
elements in masks.
2022-04-29 07:28:07 -07:00
Florian Hahn a80081763c [SimplifyCFG] Avoid shifting by a too large exponent.
TI->getBitWidth can be > 64 and in those cases the shift will be UB due
to the exponent being too large.

To fix this, cap the shift at 63. I think this should work out fine,
because TableSize is itself a 64 bit type and the maximum table size
must fit in the type. Also, if we would underestimate the size here, at
most we get an extra ZExt.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D124608
2022-04-29 15:19:06 +01:00
Anna Thomas 205246cb64 [CompileTime] [Passes] Avoid computing unnecessary analyses. NFC
Similar to c515b2f39e, if there are no loops in the function as seen
through LI, we should avoid computing the remaining expensive analyses
(such as SCEV, BPI).  Reordered the analysis requests and return early
if there are no loops.

The logic of avoiding expensive analyses is applied to LoopVectorizer,
LoopLoadElimination and LoopUnrollPass, i.e. all function passes which operate
on loops.

This is an NFC with compile time improvement.

Differential Revision: https://reviews.llvm.org/D124529
2022-04-29 10:00:06 -04:00
Stefan Pintilie f685bce808 [PowerPC][NFC] Add a function to determine if a call needs to be NOTOC.
Add the isNoTOCCallInstr function to PPCInstrInfo to determine if a call opcode
does not need a TOC restore after the call. All call opcodes should be listed in
this function. A default unreachable in this function should force future call
opcodes to also be added.

This is a follow up patch to D122012

Reviewed By: jsji, shchenz

Differential Revision: https://reviews.llvm.org/D124415
2022-04-29 08:36:07 -05:00
Paul Walker 23c509754d [DAGCombiner] Stop invalid sign conversion in refineIndexType.
When looking through extends of gather/scatter indices it's safe
to convert a known positive signed index to unsigned, but unsigned
indices must remain unsigned.

Depends On D123318

Differential Revision: https://reviews.llvm.org/D123326
2022-04-29 14:20:13 +01:00
Paul Walker 59588f0a3d [SVE][ISel] Ensure explicit gather/scatter offset extension isn't lost.
getGatherScatterIndexIsExtended currently looks through all
SIGN_EXTEND_INREG operations regardless of their input type.  This
patch restricts the code to only look through i32->i64 extensions,
which are the ones supported implicitly by SVE addressing modes.

Differential Revision: https://reviews.llvm.org/D123318
2022-04-29 14:20:13 +01:00
Joseph Huber 643c9b22ef [OpenMP] Make generating offloading entries more generic
This patch moves the logic for generating the offloading entries to the
OpenMPIRBuilder. This makes it easier to re-use in other places, such as
for OpenMP support in Flang or using the same method for generating
offloading entries for other languages like CUDA.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D123460
2022-04-29 09:14:31 -04:00
Nikita Popov 027c728f29 [SelectionDAGBuilder] Don't create MGATHER/MSCATTER with Scale != ElemSize
This is an alternative to D124530. In getUniformBase() only create
scales that match the gather/scatter element size. If targets also
support other scales, then they can produce those scales in target
DAG combines. This is what X86 already does (as long as the
resulting scale would be 1, 2, 4 or 8).

This essentially restores the pre-opaque-pointer state of things.

Fixes https://github.com/llvm/llvm-project/issues/55021.

Differential Revision: https://reviews.llvm.org/D124605
2022-04-29 14:57:53 +02:00
Nikita Popov 1881711fbb [InstCombine] Remove memset of undef value
This removes memset with undef char. We already do this for stores
of undef value.

This comes with the caveat that this optimization is not, strictly
speaking, legal for undef values, because we might be overwriting
a poison value. However, our entire load/store model currently still
operates on undef values, so we need to support undef here as well
for internal consistency.

Once https://github.com/llvm/llvm-project/issues/52930 is resolved,
these and related folds can be limited to poison -- I've added
FIXMEs to that effect.

Differential Revision: https://reviews.llvm.org/D124173
2022-04-29 14:51:18 +02:00
Ricky Zhou 24a133e16f [LV] Rename CountRoundDown to VectorTripCount (NFC)
The name CountRoundDown is potentially misleading, as the number of
iterations can be rounded up when folding the tail.
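
A simplified Python sketch of the rounding behavior, assuming the vector loop advances by a fixed step (for example VF * UF) per iteration. The function and parameter names are hypothetical, and details such as a required scalar epilogue are ignored.

```python
def vector_trip_count(trip_count: int, step: int, fold_tail: bool) -> int:
    """Number of scalar iterations covered by the vector loop (simplified model)."""
    if fold_tail:
        # With tail folding the last, partial vector iteration is masked,
        # so the count is rounded up to a multiple of step.
        trip_count += step - 1
    # Without tail folding this rounds down; the remainder runs in a scalar loop.
    return trip_count - trip_count % step
```

With 10 iterations and a step of 4, the count is 8 without tail folding and 12 with it, which is why a name implying "round down" does not fit both cases.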

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D119681
2022-04-29 13:50:00 +01:00
Nikita Popov 982cbed819 [InstCombine] Fold logical and/or of range icmps with nowrap flags
This is an edge-case where we don't convert to bitwise and/or based
on implies poison reasoning, so explicitly try to perform the fold
in logical form. The transform itself is poison-safe, as both icmps
are based on the same value and any nowrap flags are discarded as
part of the fold (https://alive2.llvm.org/ce/z/aCwC8b for the used
example).
2022-04-29 14:42:42 +02:00
Florian Hahn e66127e69b [VPlan] Simplify & adjust code as suggested in D123005.
Improve code as suggested in D123005. Applied separately, because the
comments were made on a diff that has not been rebased to current main.
2022-04-29 13:34:54 +01:00
NAKAMURA Takumi 61d3a3afe2 AVRExpandPseudoInsts.cpp: Fix a warning. [-Wunused-but-set-variable]
It has been enabled since llvmorg-15-init-5683-g2af845a6519c, aka D122271.
2022-04-29 21:01:47 +09:00
Paul Walker 7a0b897e86 [DAGCombiner][SVE] Ensure MGATHER/MSCATTER addressing mode combines preserve index scaling
refineUniformBase and selectGatherScatterAddrMode both attempt the
transformation:

  base(0) + index(A+splat(B)) => base(B) + index(A)

However, this is only safe when index is not implicitly scaled.
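
A small Python sketch of why the scale matters, modeling each gather/scatter address as base + scale * index. The function names are hypothetical.

```python
def addresses(base: int, index: list, scale: int) -> list:
    """Per-lane addresses of a gather/scatter with an implicitly scaled index."""
    return [base + scale * i for i in index]

def original(a: list, b: int, scale: int) -> list:
    # base(0) + index(A + splat(B))
    return addresses(0, [x + b for x in a], scale)

def transformed(a: list, b: int, scale: int) -> list:
    # base(B) + index(A)
    return addresses(b, a, scale)
```

With scale 1 the two forms agree. With scale 4, A = [0, 1, 2] and B = 16, the original form yields [64, 68, 72] while the transformed form yields [16, 20, 24], because B is no longer multiplied by the scale once it moves into the base.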

Differential Revision: https://reviews.llvm.org/D123222
2022-04-29 12:35:16 +01:00
Simon Pilgrim 3562f855b7 [X86] SimplifyDemandedVectorEltsForTargetNode - fold (uniform) shift(0,x) -> 0
2022-04-29 12:08:47 +01:00
Nikita Popov 57aaeefc18 [InstCombine] Pass ICmpInsts to foldAndOrOfICmpsUsingRanges() (NFC)
Pass the whole instruction rather than unpacking it. This makes it
easier to reuse the function in another place, as the entire
logic is encapsulated.
2022-04-29 12:46:31 +02:00
Simon Pilgrim 336a1233b2 [X86] SimplifyDemandedVectorEltsForTargetNode - fold shift(0,x) -> 0
2022-04-29 11:32:54 +01:00
Nikita Popov 1f53932a95 [InstCombine] Remove foldAndOrOfEqualityCmpsWithConstants() fold
This fold handles a special subset of foldAndOrOfICmpsUsingRanges(),
use the more generic implementation instead.

The result can differ if a representation using a range comparison
is possible, in which case that is preferred over masking. There is
a canonicalization opportunity here.
2022-04-29 12:23:00 +02:00
Nikita Popov 5515263e44 [InstCombine] Fold and of two ranges differing by mask
This is the de Morgan conjugated variant of the existing fold for
ors. Implement this by switching the range code to always work
on ors and invert the operands at the start and end. This makes
reasoning easier and makes the extension more obviously correct.
2022-04-29 12:01:38 +02:00
Florian Hahn fb4113ef0c [Passes] Remove legacy LoopUnswitch pass.
The legacy LoopUnswitch pass is only used in the legacy pass manager
pipeline, which is deprecated.

The NewPM replacement is SimpleLoopUnswitch and I think it is time to
remove the legacy LoopUnswitch code.

Fixes #31000.

Reviewed By: aeubanks, Meinersbur, asbirlea

Differential Revision: https://reviews.llvm.org/D124376
2022-04-29 10:30:49 +01:00
Simon Pilgrim 6c44e398ec [X86] combineShuffle - reuse SDLoc. NFCI.
2022-04-29 10:30:11 +01:00
Simon Pilgrim 2d7f0b1c22 [X86] Fold ANDNP(undef,x)/ANDNP(x,undef) -> 0
Matches the fold in DAGCombiner::visitANDLike.
2022-04-29 10:20:48 +01:00
Nikita Popov d5ee20fcc9 [InstCombine] Switch an or of icmps fold to use constant ranges
We can express this fold more naturally when working on the constant
range implementation. This change is not entirely NFC, because the
code now also handles cases that don't match the precise pattern
this previously looked for, e.g. we can omit an add on one of the
ranges.
2022-04-29 11:15:54 +02:00
David Green 7047c47918 [VecCombine] Fix sort comparator logic in foldShuffleFromReductions
I think this sort comparator was overly complex, and the windows
expensive check bot agreed, failing as it was not giving a strict weak
ordering. Change it to use the comparison of the mask values as unsigned
integers. This should sort the undef elements to the end whilst keeping
X<Y otherwise.
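
The idea of comparing mask values as unsigned integers can be sketched in Python. Undef mask elements are represented as -1, as in LLVM shuffle masks. The function names and the 32-bit width are assumptions for illustration.

```python
def as_unsigned(mask_elt: int, bits: int = 32) -> int:
    """Reinterpret a signed mask element as an unsigned integer."""
    return mask_elt & ((1 << bits) - 1)

def sort_mask(mask: list) -> list:
    # -1 (undef) becomes the largest unsigned value, so it sorts to the end,
    # while all other elements keep their usual X < Y order.
    return sorted(mask, key=as_unsigned)
```

Because the key is a plain integer, the resulting order is a strict weak ordering by construction.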
2022-04-29 09:30:02 +01:00
Nikita Popov 884e9a877b [SimplifyCFG] Replace condition value when threading
Replace the condition value with the known constant value on the
threaded edge. This happens implicitly with phi threading because
we replace with the incoming value, but not for non-phi threading.
2022-04-29 09:50:27 +02:00
Nikita Popov 4e545bdb35 [SimplifyCFG] Thread branches on same condition in more cases (PR54980)
SimplifyCFG implements basic jump threading, if a branch is
performed on a phi node with constant operands. However,
InstCombine canonicalizes such phis to the condition value of a
previous branch, if possible. SimplifyCFG does support this as
well, but only in the very limited case where the same condition
is used in a direct predecessor -- notably, this does not include
the common diamond pattern (i.e. two consecutive if/elses on the
same condition).

This patch extends the code to look back a limited number of
blocks to find a branch on the same value, rather than only
looking at the direct predecessor.

Fixes https://github.com/llvm/llvm-project/issues/54980.

Differential Revision: https://reviews.llvm.org/D124159
2022-04-29 09:44:05 +02:00
Serge Pavlov c96cc500f0 [SystemZ] Custom lowering of llvm.is_fpclass
Differential Revision: https://reviews.llvm.org/D114695
2022-04-29 13:27:36 +07:00
LiaoChunyu 03a3654203 [RISCV] Add cost model for SK_Broadcast
Add cost model for broadcast shuffle in RISCVTTIImpl::getShuffleCost
with scalable vectors. The cost model might not be the best.

For scalable vectors, BasicTTIImpl::getShuffleCost returns an invalid cost,
so this patch relies on the existing cost model in BasicTTIImpl.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124101
2022-04-29 13:28:02 +08:00
Hsiangkai Wang c62b014db9 [RISCV] Merge addi into load/store as there is a ADD between them
This patch adds peephole optimizations for the following patterns:

(load (add base, (addi src, off1)), off2)
   -> (load (add base, src), off1+off2)
(store val, (add base, (addi src, off1)), off2)
   -> (store val, (add base, src), off1+off2)
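
The patterns above preserve the effective address because addition is associative. A small Python sketch of that arithmetic follows; the `can_merge` check is an assumption on my part (RISC-V load/store immediates are 12-bit signed, so the combined offset presumably has to stay encodable), not something stated in the message.

```python
def fits_simm12(value):
    """RISC-V load/store and ADDI immediates are 12-bit signed."""
    return -2048 <= value <= 2047


def address_before(base, src, off1, off2):
    # (load (add base, (addi src, off1)), off2)
    return (base + (src + off1)) + off2


def address_after(base, src, off1, off2):
    # (load (add base, src), off1+off2)
    return (base + src) + (off1 + off2)


def can_merge(off1, off2):
    # Assumed legality check: the combined offset must still be encodable.
    return fits_simm12(off1 + off2)
```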

Differential Revision: https://reviews.llvm.org/D124231
2022-04-29 04:33:05 +00:00
Serge Pavlov 9fc58f1820 [PowerPC] Support of ppc_fp128 in lowering of llvm.is_fpclass
PowerPC supports `ppc_fp128`, which is not an IEEE floating point
type. The generic lowering of llvm.is_fpclass could not handle it
properly. This change extends the generic lowering code to
support `ppc_fp128`.

The change was tested on an emulator using runtime tests from
https://reviews.llvm.org/D112933 and the patch for clang
https://reviews.llvm.org/D112932.

Differential Revision: https://reviews.llvm.org/D113908
2022-04-29 11:10:47 +07:00
Roman Lebedev 981ed72a17 [NFC][SCEV] Refactor `createNodeForSelectViaUMinSeq()` out of `createNodeForSelectOrPHIViaUMinSeq()` 2022-04-29 02:37:06 +03:00
Zequan Wu 4fe2ab5279 Revert "[DebugInfo][InstrRef] Describe value sizes when spilt to stack"
This reverts commit a15b66e76d.

This causes the linker to crash on an assertion: `Assertion failed: !Expr->isComplex(), file C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\lib\CodeGen\LiveDebugValues\InstrRefBasedImpl.cpp, line 907`.
2022-04-28 16:18:16 -07:00
Mircea Trofin 49942d595f [NFC] remove const from FunctionPropertiesAnalysis::run, keep on Result
The goal in 75881d8b02 was just to modify what `Result` is; there was
no need to also modify ::run.
2022-04-28 15:10:21 -07:00
CHIANG, YU-HSUN (Tommy Chiang, oToToT) 4a31af88a2 [MC][AArch64] Enable '+v8a' when nothing specified for MCSubtargetInfo
Since D110065, 'R' profile support has been added to LLVM. It turns the
`generic` cpu into the intersection of v8-a and v8-r. However, this
causes some backward compatibility problems. The original patch makes
the clang driver implicitly pass -march=armv8-a when only the triple
is specified. Since it only applies to clang, other tools like
llvm-objdump still face the backward compatibility problem.

This patch applies the same idea to MC related tools by enabling '+v8a'
feature when nothing is specified (both CPU and FS are empty) for
MCSubtargetInfo creation.

This patch should fix PR53956.

Reviewed by: labrinea

Differential Revision: https://reviews.llvm.org/D124319
2022-04-29 04:53:22 +08:00
David Penry fa49021c68 Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
This reverts commit 28d09bbbc3
while I investigate a buildbot failure.
2022-04-28 13:29:27 -07:00
Simon Pilgrim ab17ed0723 [X86] Don't fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y)) on BMI2 targets
With BMI2 we have SHRX which is a lot quicker than regular x86 shifts.

Fixes #55138
2022-04-28 21:28:16 +01:00
David Penry 28d09bbbc3 [CodeGen][ARM] Enable Swing Module Scheduling for ARM
This patch permits Swing Modulo Scheduling for ARM targets and
turns it on by default for the Cortex-M7.  The t2Bcc
instruction is recognized as a loop-ending branch.

MachinePipeliner is extended by adding support for
"unpipelineable" instructions.  These instructions are
those which contribute to the loop exit test; in the SMS
papers they are removed before creating the dependence graph
and then inserted into the final schedule of the kernel and
prologues. Support for these instructions was not previously
necessary because current targets supporting SMS have only
supported it for hardware loop branches, which have no
loop-exit-contributing instructions in the loop body.

The current structure of the MachinePipeliner makes it difficult
to remove/exclude these instructions from the dependence graph.
Therefore, this patch leaves them in the graph, but adds a
"normalization" method which moves them in the schedule to
stage 0, which causes them to appear properly in kernel and
prologues.

It was also necessary to be more careful about boundary nodes
when iterating across successors in the dependence graph because
the loop exit branch is now a non-artificial successor to
instructions in the graph. In addition, schedules with physical
use/def pairs in the same cycle should be treated as creating an
invalid schedule because the scheduling logic doesn't respect
physical register dependence once scheduled to the same cycle.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D122672
2022-04-28 13:01:18 -07:00
Mircea Trofin 75881d8b02 [NFC] const-ed the return type of FunctionPropertiesAnalysis
The result is a data bag; this makes sure it's signaled to a user that
the data can't be mutated when, for example, doing something like:

auto &R = FAM.getResult<FunctionPropertiesAnalysis>(F);
...
R.Uses++;
2022-04-28 12:42:16 -07:00
Florian Hahn f4e1eaa375 Revert "[VPlan] Remove uneeded needsVectorIV check."
This reverts commit 43842b887e while I
investigate a buildbot failure.

It also reverts the follow-up commit
2883de0514.
2022-04-28 20:16:21 +01:00
David Tenty 8042699a30 [LLVM] Add exported visibility style for XCOFF
For the AIX linker, under default options, global or weak symbols which
have all visibility bits set to zero (i.e. no visibility, similar to ELF
default) are only exported if specified on an export list provided to
the linker. So AIX has an additional visibility style called
"exported" which indicates to the linker that the symbol should
be explicitly globally exported.

This change maps "dllexport" in the LLVM IR to correspond to XCOFF
exported as we feel this best models the intended semantic (discussion
on the discourse RFC thread: https://discourse.llvm.org/t/rfc-adding-exported-visibility-style-to-the-ir-to-model-xcoff-exported-visibility/61853)
and allows us to enable writing this visibility for the AIX target
in the assembly path.

Reviewed By: DiggerLin

Differential Revision: https://reviews.llvm.org/D123951
2022-04-28 14:56:00 -04:00
David Green ded8187e35 [VectorCombine] Try to reduce shuffle cost for commutative reduction operands
Given a shuffle feeding a commutative reduction, the lane ordering of
the shuffle will not alter the result. This is also true if there are a
number of operations between the reduction and the shuffle, providing
they only operate lane-wise. This patch searches for cases like that in
Vector Combine, allowing us to check the cost of the shuffle vs an
in-order identity shuffle and replace the order if possible. This only
handles a single shuffle at the moment to keep things simple, and is
able to ignore splats that produce results where every result is the
same.
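
The underlying observation can be sketched in a few lines of Python (a toy model, not the Vector Combine code): for a commutative reduction such as add, reordering the lanes selected by the shuffle mask leaves the reduced value unchanged, so the mask can be replaced by a sorted, cheaper one. The helper names are hypothetical, and undef lanes are not modeled here.

```python
def shuffle(vec, mask):
    """Single-source shuffle: lane i of the result is vec[mask[i]]."""
    return [vec[m] for m in mask]


def reduce_add(vec):
    """Commutative, associative reduction over all lanes."""
    return sum(vec)


def reorder_mask(mask):
    # Sorting keeps the same multiset of source lanes; for a permutation
    # mask this yields the in-order identity shuffle.
    return sorted(mask)
```

With `vec = [10, 20, 30, 40]` and `mask = [3, 1, 2, 0]`, the reordered mask is the identity `[0, 1, 2, 3]` and both reductions give the same sum.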

This is a more powerful version of a combine that already happens in
instcombine, capable of optimizing more cases by looking through more
instructions and being able to cost the shuffle.

Differential Revision: https://reviews.llvm.org/D123494
2022-04-28 19:46:12 +01:00
Alan Zhao 3333c28fc0 [llvm-ml] Improve indirect call parsing
In MASM, if a QWORD symbol is passed to a jmp or call instruction in
64-bit mode or a DWORD or WORD symbol is passed in 32-bit mode, then
MSVC's assembler recognizes that as an indirect call. Additionally, if
the operand is qualified as a ptr, then that should also be an indirect
call.

Furthermore, in 64-bit mode, such operands are implicitly rip-relative
(in fact, MSVC's assembler ml64.exe does not allow explicitly specifying
rip as a base register.)

To keep this patch manageable, this patch does not include:
* error messages for wrong operand types (e.g. passing a QWORD in 32-bit
  mode)
* resolving indirect calls if the symbol is declared after its first
  use (llvm-ml currently only runs a single pass).
* implementing the extern keyword (required to resolve
  https://crbug.com/762167.)

This patch is likely missing a bunch of edge cases, so please do point
them out in the review.

Reviewed By: epastor, hans, MaskRay

Committed By: epastor (on behalf of ayzhao)

Differential Revision: https://reviews.llvm.org/D124413
2022-04-28 13:17:19 -04:00
Simon Pilgrim a9215ed9cc [InstCombine][X86] simplifyDemandedVectorEltsIntrinsic - handle avx2 per-element vector shifts 2022-04-28 18:14:54 +01:00
Alexey Bataev 75e1cf4a6a [COST]Improve cost model for shuffles in SLP.
Introduced masks where they were not added and improved target-dependent
cost models to avoid returning incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-28 10:04:41 -07:00
Craig Topper ec11fbb1d6 [RISCV] Use default promotion for (i32 (shl 1, X)) on RV64 when Zbs is enabled.
This improves opportunities to use bset/bclr/binv. Unfortunately,
there are no W versions of these instructions so this isn't always
a clear win. If we use SLLW we get free sign extend and shift masking,
but need to put a 1 in a register and can't remove an or/xor. If
we use bset/bclr/binv we remove the immediate materialization and
logic op, but might need a mask on the shift amount and sext.w.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D124096
2022-04-28 09:58:30 -07:00
Pavel Samolysov 9197959e13 [ArgPromotion] Move ArgPart and OffsetAndArgPart to anonymous namespace
The structure ArgPart and alias OffsetAndArgPart have been moved
into the anonymous namespace. NFC.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D124617
2022-04-28 09:51:46 -07:00
Pavel Samolysov 6b825e50f7 [ArgPromotion] Change the condition to check the promotion limit
The condition should be 'ArgParts.size() > MaxElements', so that if we
have exactly 3 elements in the 'ArgParts' vector, the promotion should
be allowed because the 'MaxElement' threshold is not exceeded yet.

The default value for 'MaxElement' has been decreased to 2 in order
to avoid an actual change in argument promoting behavior. However,
this changes byval argument transformation behavior: at most 2
arguments may now be added to the function, instead of the 3 allowed
before.
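
To see why lowering the default keeps argument promotion unchanged, here is a small Python sketch. Only the new condition is quoted in the message; the shape of the old check (`>=` against a default of 3) is my assumption, inferred from the description, and the byval path is not modeled.

```python
def too_many_parts_old(num_parts, max_elements=3):
    # Assumed shape of the previous check (not quoted in the message).
    return num_parts >= max_elements


def too_many_parts_new(num_parts, max_elements=2):
    # Condition named in the message: ArgParts.size() > MaxElements.
    return num_parts > max_elements
```

Under that assumption the two predicates agree for every size, so promotion is still rejected from 3 parts upward with the default settings.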

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D124178
2022-04-28 09:42:58 -07:00
Bjorn Pettersson 3a39bb96ca [SelectionDAG] Use correct boolean representation in FoldConstantArithmetic
The description of SETCC says
  /// SetCC operator - This evaluates to a true value iff the condition is
  /// true.  If the result value type is not i1 then the high bits conform
  /// to getBooleanContents.

Without this patch, we sign extended the i1 to the used larger type
regardless of getBooleanContents. This resulted in miscompiles, as
shown in the attached testcase that ended up returning -1 instead of
1 when using -mattr=+v.
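
The difference between the two extensions can be shown with a short Python sketch. The string constants are modeled on LLVM's ZeroOrOneBooleanContent and ZeroOrNegativeOneBooleanContent; the function names are hypothetical and this is not the FoldConstantArithmetic code.

```python
def mask_to_width(value, bits):
    """Keep only the low `bits` bits of value."""
    return value & ((1 << bits) - 1)


def to_signed(value, bits):
    """Interpret a `bits`-wide bit pattern as a two's complement integer."""
    value = mask_to_width(value, bits)
    return value - (1 << bits) if value >> (bits - 1) else value


def extend_setcc_result(cond, bits, boolean_contents):
    """Widen an i1 condition to `bits` bits per the target's boolean contents."""
    if boolean_contents == "ZeroOrOne":
        raw = 1 if cond else 0  # zero extension of the i1
    elif boolean_contents == "ZeroOrNegativeOne":
        raw = mask_to_width(-1, bits) if cond else 0  # sign extension of the i1
    else:
        raise ValueError("unsupported boolean contents: %r" % (boolean_contents,))
    return to_signed(raw, bits)
```

Always taking the sign-extension branch yields -1 for a true condition, which is the wrong constant on a target whose booleans are zero-or-one.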

Fixes https://github.com/llvm/llvm-project/issues/55168

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124618
2022-04-28 18:42:16 +02:00
Simon Pilgrim 9e3b7e8e65 [X86] getTargetVShiftByConstNode - use SelectionDAG::FoldConstantArithmetic to perform constant folding. NFCI.
Remove some unnecessary code duplication.
2022-04-28 17:10:20 +01:00
Craig Topper 8631a5e712 [RISCV] Fix alias printing for vmnot.m
By clearing the HasDummyMask flag from mask register binary operations
and mask load/store.

HasDummyMask was causing an extra operand to get appended when
converting from MachineInstr to MCInst. This extra operand doesn't
appear in the assembly string so was mostly ignored, but it prevented
the alias instruction printing from working correctly.

Reviewed By: arcbbb

Differential Revision: https://reviews.llvm.org/D124424
2022-04-28 08:33:52 -07:00
Florian Hahn 2883de0514 [VPlan] Fix comment formatting from 43842b887e. 2022-04-28 16:31:48 +01:00
Florian Hahn 43842b887e [VPlan] Remove uneeded needsVectorIV check.
Remove one of the last remaining uses of ::needsVectorIV, preparing for
its removal. Now that usesScalars is available and based on the
information explicit in VPlan, there is no need to use the pre-computed
needsVectorIV.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D123720
2022-04-28 16:27:34 +01:00
Alexey Bataev 9861ca0c23 Revert "[COST]Improve cost model for shuffles in SLP."
This reverts commit 29a470e380 to fix
a crash reported in https://reviews.llvm.org/D100486#3479989.
2022-04-28 08:11:56 -07:00
Simon Pilgrim de7cee24b6 [X86] getBT - attempt to peek through aext(and(trunc(x),c)) mask/modulo
Ideally we'd fold this with generic DAGCombiner, but that only works for !isTruncateFree cases - we might be able to adapt IsDesirableToPromoteOp to find truncated src ops in the future, but for now just use this peephole.

Noticed in Issue #55138
2022-04-28 16:10:26 +01:00
Pavel Samolysov 744a837838 [ArgPromotion] Rename variables according to the code style. NFC
Some loop counters ('i', 'e') and variables ('type') were not named
in accordance with the code style, and clang-tidy issues warnings
about the use of such variables. This patch renames the variables
and fixes some typos in the comments within the source file.

Differential Revision: https://reviews.llvm.org/D123662
2022-04-28 15:32:05 +02:00
Chris Jackson c792884589 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Reland 3f2b76ec90 with the test corrected
to require x86-registered-target.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-28 14:21:56 +01:00
Chris Jackson cd5f9efc4d Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]"
This reverts commit 3f2b76ec90.
2022-04-28 14:07:31 +01:00
Nikita Popov 90dba831ae [InstCombine] Fold or of icmp ne trunc/and
This adds the de Morgan conjugated variant for the existing
"and eq" style fold.

Proof: https://alive2.llvm.org/ce/z/tkNAcG
2022-04-28 15:07:16 +02:00
Chris Jackson 3f2b76ec90 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Reland commit 74273d575f following a fix
for a memory leak. The DVIRecoveryRecord vectors now use unique_ptr.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-28 13:55:49 +01:00
Simon Pilgrim ed8dffef4c [X86] getFauxShuffle - don't assume an UNDEF src element for AND/ANDNP results in an UNDEF shuffle mask index
The other src element might be zero, guaranteeing zero.

Fixes #55157
2022-04-28 12:32:58 +01:00
Michael Forster cfb4e78252 Revert "[llvm-pdbutil] Add options to only dump symbol record at specified offset and its parents or children with spcified depth."
This reverts commit a3b7cb015f.

symbol-offset.test fails under MSAN:

[  1] ; RUN: llvm-pdbutil yaml2pdb %p/Inputs/symbol-offset.yaml --pdb=%t.pdb [FAIL]
llvm-pdbutil yaml2pdb <REDACTED>/llvm/test/tools/llvm-pdbutil/Inputs/symbol-offset.yaml --pdb=<REDACTED>/tmp/symbol-offset.test/symbol-offset.test.tmp.pdb
==9283==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x55f975e5eb91 in __libcpp_tls_set <REDACTED>/include/c++/v1/__threading_support:428:12
    #1 0x55f975e5eb91 in set_pointer <REDACTED>/include/c++/v1/thread:196:5
    #2 0x55f975e5eb91 in void* std::__msan::__thread_proxy<std::__msan::tuple<std::__msan::unique_ptr<std::__msan::__thread_struct, std::__msan::default_delete<std::__msan::__thread_struct> >, llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::'lambda'()::operator()() const::'lambda'()> >(void*) <REDACTED>/include/c++/v1/thread:285:27
    #3 0x7f74a1e55b54 in start_thread (<REDACTED>/libpthread.so.0+0xbb54) (BuildId: 64752de50ebd1a108f4b3f8d0d7e1a13)
    #4 0x7f74a1dc9f7e in clone (<REDACTED>/libc.so.6+0x13cf7e) (BuildId: 7cfed7708e5ab7fcb286b373de21ee76)
2022-04-28 12:42:31 +02:00
Ties Stuij 051deb2d9d [ARM] add Armv9 build attribute
The build attribute number can be found in the Arm ABI addenda32 document:
https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#335target-related-attributes

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D124090
2022-04-28 10:48:26 +01:00
Lian Wang dc0ae8ce18 [RISCV] Support VP_SETCC mask operations
Support VP_SETCC mask operations by turning them into logical operations.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D124438
2022-04-28 08:52:29 +00:00
Nikita Popov b9dc565147 [GVN] Encode GEPs in offset representation
When using opaque pointers, convert GEPs into offset representation
of the form P + V1 * Scale1 + V2 * Scale2 + ... + ConstantOffset.
This allows us to recognize equivalent address calculations even if
the GEPs don't use the same source element type.
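
A toy Python model of the offset form may help. Each GEP is described by its base and a list of (index, stride-in-bytes) steps, where the strides are assumed to have been derived from the source element type already; the function name and the tuple layout are hypothetical, not GVN's actual expression encoding.

```python
def gep_to_offset_form(base, steps):
    """Return (base, ((var, scale), ...), constant_offset) for a GEP.

    `steps` is a list of (index, stride_in_bytes); an index is either an
    int (constant) or a str naming a variable.
    """
    scales = {}
    const = 0
    for index, stride in steps:
        if isinstance(index, int):
            const += index * stride  # fold constant indices into the offset
        else:
            scales[index] = scales.get(index, 0) + stride
    variable_terms = tuple(sorted((v, s) for v, s in scales.items() if s != 0))
    return (base, variable_terms, const)


# getelementptr [4 x i16], ptr %p, i64 %i, i64 2
gep_a = gep_to_offset_form("%p", [("%i", 8), (2, 2)])
# getelementptr {i32, i32}, ptr %p, i64 %i, i32 1
gep_b = gep_to_offset_form("%p", [("%i", 8), (1, 4)])
```

Both GEPs reduce to `%p + %i * 8 + 4`, so they compare equal even though their source element types differ.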

This fixes an opaque pointer codegen regression seen in rustc.

Differential Revision: https://reviews.llvm.org/D124527
2022-04-28 09:32:05 +02:00
Luo, Yuanke 942ec5c36d [X86][AMX] combine tile cast and load/store instruction.
The `llvm.x86.cast.tile.to.vector` intrinsic is lowered to
`llvm.x86.tilestored64.internal` and `load <256 x i32>`. The
`llvm.x86.cast.vector.to.tile` is lowered to `store <256 x i32>` and
`llvm.x86.tileloadd64.internal`. When `llvm.x86.cast.tile.to.vector` is
used by `store <256 x i32>` or `load <256 x i32>` is used by
`llvm.x86.cast.vector.to.tile`, they can be combined into
`llvm.x86.tilestored64.internal` and `llvm.x86.tileloadd64.internal`, respectively.

Differential Revision: https://reviews.llvm.org/D124378
2022-04-28 14:55:21 +08:00
Max Kazantsev 35f38583d2 [JumpThreading][NFC][CompileTime] Do not recompute BPI/BFI analyzes
They can already be available, and even if not, DT/LI can be available.
We should not recompute them. Old PM is unchanged because it would
require changing dependencies, and we don't care enough about it.

Differential Revision: https://reviews.llvm.org/D124439
Reviewed By: nikic, aeubanks
2022-04-28 10:46:08 +07:00
Wenju He 96d3be8443 [InferAddressSpaces] Check if AS are the same in isNoopPtrIntCastPair
isNoopAddrSpaceCast expects SrcAS to be different from DestAS.
If the two AS are the same, treat the ptrtoint/inttoptr pair as a no-op cast.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D123573
2022-04-28 11:10:55 +08:00
Liqin.Weng 6365bde658 [XCORE][CodeGen][NFC] Use ArrayRef in TargetLowering functions
Reviewed By: nigelp-xmos

Differential Revision: https://reviews.llvm.org/D123661
2022-04-28 02:06:46 +00:00
Shengchen Kan 6a6b0e4a63 [X86] Check the address in machine verifier
1. The scale factor must be 1, 2, 4, or 8
2. The displacement must fit in a 32-bit signed integer
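
The two rules above amount to a pair of range checks. A Python sketch follows; the function name and error strings are hypothetical and do not reproduce the verifier's actual diagnostics.

```python
VALID_SCALES = (1, 2, 4, 8)
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def check_x86_address(scale, disp):
    """Return a list of problems; an empty list means both checks pass."""
    errors = []
    if scale not in VALID_SCALES:
        errors.append("scale factor %d is not 1, 2, 4 or 8" % scale)
    if not INT32_MIN <= disp <= INT32_MAX:
        errors.append("displacement %d does not fit in a signed 32-bit integer" % disp)
    return errors
```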

Noticed by: https://github.com/llvm/llvm-project/issues/55091

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D124455
2022-04-28 10:05:39 +08:00
Arthur Eubanks 4e65291837 [OpaquePtr][GlobalOpt] Don't attempt to evaluate global constructors with arguments
Previously all entries in global_ctors had to have the void()* type and
we'd skip evaluating bitcasted functions. With opaque pointers we may
see the function directly.

Fixes #55147.

Reviewed By: #opaque-pointers, nikic

Differential Revision: https://reviews.llvm.org/D124553
2022-04-27 19:00:44 -07:00
Fangrui Song c74a706893 [LegacyPM] Remove ThreadSanitizerLegacyPass
Using the legacy PM for the optimization pipeline was deprecated in 13.0.0.
Following recent changes to remove non-core features of the legacy
PM/optimization pipeline, remove ThreadSanitizerLegacyPass.

Reviewed By: #sanitizers, vitalybuka

Differential Revision: https://reviews.llvm.org/D124209
2022-04-27 16:25:41 -07:00
Kirill Stoimenov 761366e6ae Revert "[Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]"
This reverts commit 74273d575f.

Buildbot: https://lab.llvm.org/buildbot/#/builders/5/builds/22795
Failing with memory leak.
2022-04-27 23:11:48 +00:00
Zequan Wu 1043eeaf86 [llvm-symbolizer][native-pdb] Don't reset CurLineOffset if NextLineOffset is none 2022-04-27 16:05:19 -07:00
Matt Arsenault 7762a3ce18 Revert "BranchFolder: Assert on SSA functions"
This reverts commit 6ff91d17d6.
2022-04-27 19:02:15 -04:00
Matt Arsenault 6ff91d17d6 BranchFolder: Assert on SSA functions
We probably should have the opposite of getRequiredProperties for this
2022-04-27 18:51:37 -04:00
Bill Wendling 8f2ec974d1 [X86] Move target-generic code into CodeGen [NFC]
This code is the same for all platforms.

Differential Revision: https://reviews.llvm.org/D124566
2022-04-27 15:37:28 -07:00
Matt Arsenault 7c2db66632 llvm-reduce: Support multiple MachineFunctions
The current testcase I'm trying to reduce only reproduces with IPRA
enabled and requires handling multiple functions.

The only real difference vs. the IR is the extra indirection to look for
the underlying MachineFunction, so treat the ReduceWorkItem as the
module instead of the function.

The ugliest piece of this is really the ugliness of
MachineModuleInfo. It not only tracks actual module state, but has a
number of transient fields used for isel and/or the asm printer. These
shouldn't do any harm for the use here, though they should be
separated out.
2022-04-27 18:11:59 -04:00
Zequan Wu a3b7cb015f [llvm-pdbutil] Add options to only dump symbol record at specified offset and its parents or children with spcified depth.
Right now, if we want to dump a symbol at a specified offset, we need to use `grep`.
And it can only show surrounding symbols in layout (not in lexical scope sense).

This adds similar options to `dump` command as `llvm-dwarfdump` to allow users
to dump symbol record at specified offset and its parents or children with
specified depth.

`--symbol-offset=` must be used with `--modi` to dump only one symbol at given
offset.

`--show-parents`/`--show-children` must be used with `--symbol-offset` to
dump all symbols that are parents/children of the symbol at given offset.

`--parent-recurse-depth`/`--children-recurse-depth` must be used with
`--show-parents`/`--show-children` to specify the max up/down depth.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D124317
2022-04-27 14:37:35 -07:00
David Blaikie 727c590fe9 DebugInfo: Use hash-based unit lookup when available in dwp files
Fix a test case that had a bogus (probably I hand crafted it at some
point) index that didn't point to the right data in the process.
2022-04-27 21:18:14 +00:00
Simon Pilgrim e378577524 [X86] Use is128BitLaneRepeatedShuffleMask wrapper. NFC.
We don't need to know the actual repeated mask.
2022-04-27 21:09:57 +01:00
Nicolas Abram Lujan f8a574bf4d [InstCombine] C0 >> (X - C1) --> (C0 << C1) >> X
With the right pre-conditions, we can fold the offset
into the shifted constant:
https://alive2.llvm.org/ce/z/drMRBU
https://alive2.llvm.org/ce/z/cUQv-_
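
For intuition, the logical-shift case can be brute-forced in Python on a small bit width. The pre-conditions used here are my assumptions for the sketch (C0 << C1 loses no bits, X - C1 does not wrap, and X is a valid shift amount); the linked proofs remain the authoritative statement of what the patch requires.

```python
def fold_holds(bits=8):
    """Exhaustively compare C0 >> (X - C1) with (C0 << C1) >> X."""
    limit = 1 << bits
    for c0 in range(limit):
        for c1 in range(bits):
            shifted = c0 << c1
            if shifted >= limit:
                continue  # assumed pre-condition: C0 << C1 loses no bits
            for x in range(c1, bits):  # assumed: X >= C1 and X < bit width
                if c0 >> (x - c1) != shifted >> x:
                    return False
    return True
```

Under these assumptions the equality follows from floor(C0 * 2^C1 / 2^X) = floor(C0 / 2^(X - C1)).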

Fixes #55016

Differential Revision: https://reviews.llvm.org/D124369
2022-04-27 14:18:30 -04:00
Craig Topper c2614b31d9 [RISCV] Add isCommutable to scalar FMA instructions.
The default implementation of findCommutedOpIndices picks the
first two source operands. That's exactly what we want for the
scalar FMA instructions.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124463
2022-04-27 11:07:18 -07:00
Martin Sebor efa0f12c0b [InstCombine] Fold strnlen calls in equality to zero.
Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D123818
2022-04-27 12:03:24 -06:00
Alexey Bataev 29a470e380 [COST]Improve cost model for shuffles in SLP.
Introduced masks where they were not added and improved target-dependent
cost models to avoid returning incorrect cost results after
adding masks.

Differential Revision: https://reviews.llvm.org/D100486
2022-04-27 10:56:26 -07:00
Wei Wang 26a0d53b15 [CHR] Skip region containing llvm.coro.id
When a block containing llvm.coro.id is cloned during CHR, it inserts an invalid
PHI node with token type at the beginning of the block containing llvm.coro.begin.
To avoid this, we exclude regions containing llvm.coro.id.

Reviewed By: ChuanqiXu

Differential Revision: https://reviews.llvm.org/D124418
2022-04-27 10:27:25 -07:00
Chris Bieneman 05b765ff69 [DXIL] [NFC] Remove dead attribute code paths
DXIL doesn't support attributes added after LLVM 3.7. The DXILPrepare
pass removes those attributes so they should never be present by the
time we reach the DXIL bitcode writer.

In the event that we somehow try to write a newer attribute in the DXIL
writer, we should fail hard (crash), because the output would be
invalid. This case should only be possible if the DXIL writer were
called without DXILPrepare being run first, which shouldn't be possible.

This patch also adds a default case to the switch statement over the
attribute list which covers all the removed cases and any new attribute
kinds that may be added in the future. The default case is handled like
other unsupported cases by a call to llvm_unreachable.
2022-04-27 10:46:59 -05:00
Simon Pilgrim 03482bccad [X86] collectConcatOps - add ability to collect from vector 'widening' patterns
Recognise insert_subvector(undef, x, lo/hi) patterns where we double the width of a vector - creating an UNDEF subvector on the fly.
2022-04-27 15:38:58 +01:00
David Green 46cef9a82d [AArch64] Attempt to fix bots by ensuring legalized type is a vector 2022-04-27 15:36:15 +01:00
Roman Lebedev ffafa71f64 [InstCombine] 'round up integer': if bias is just right, just reuse instructions
This is only useful if we can't create new instruction
because %x.aligned has other uses and already sticks around.
2022-04-27 17:27:02 +03:00
Roman Lebedev aac0afd1dd [InstCombine] Fold 'round up integer' pattern (when alignment is a power of two)
But don't deal with non-splats.

The test coverage is sufficiently exhaustive,
and alive is happy about the changes there.

Example with constants: https://alive2.llvm.org/ce/z/EUaJ5- / https://alive2.llvm.org/ce/z/Bkng2X
General proof: https://alive2.llvm.org/ce/z/3RjJ5A
2022-04-27 17:26:55 +03:00
Shilei Tian a6b355dd31 [SLP] Fix a typo that causes redundant assertion and potential segment fault
Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D124497
2022-04-27 10:07:59 -04:00
Ivan Kosarev 6ddf2a824d [AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.
As older waves execute long sequences of VALU instructions, this may
prevent younger waves from calculating addresses and then issuing their
VMEM loads, which in turn leaves the VALU unit idle. This patch tries
to prevent this by temporarily raising the wave's priority.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D124246
2022-04-27 14:37:18 +01:00
Anna Thomas c515b2f39e [IRCE] Avoid computing potentially unnecessary analyses. NFC
IRCE is a function pass that operates on loops. If there are no loops in
the function (as seen through LI), we should avoid computing the
remaining expensive analyses (such as BPI). Reordered the analyses
requests and return early if there are no loops. This is NFC with a
compile-time improvement.

The same will be done in a follow-up patch for the loop vectorizer.

Reviewed-By: nikic
Differential Revision: https://reviews.llvm.org/D124478
2022-04-27 09:22:10 -04:00
Denis Antrushin 4059770af5 [StatepointLowering] Only export STATEPOINT results if used in nonlocal blocks.
Currently we always export STATEPOINT results (GC pointers lowered via VRegs)
to virtual registers. When processing gc.relocate instructions we have to
generate CopyFromRegs node and then export it to VReg again if gc.relocate
is used in other basic blocks. This results in generation of extra COPY MIR
instruction if statepoint and its gc.relocate are in the same BB, but gc.relocate
result is used in other blocks.

This patch changes this behavior to export statepoint results only if used
in other basic blocks. For local uses StatepointLoweringState.(get|set)Location()
API is used to communicate appropriate statepoint result from `LowerStatepoint()`
to `visitGCRelocate()`.

This is NFC and is purely a compile-time optimization. On big methods it can improve
codegen compile time by up to 10%.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D124444
2022-04-27 15:53:24 +03:00
David Green 8e2a0e61f5 [AArch64] Break up larger shuffle-masks into legal sizes in getShuffleCost
Given a larger-than-legal shuffle mask, the final codegen will split
into multiple sub-vectors. This attempts to model that in
AArch64TTIImpl::getShuffleCost, splitting masks up according to the size
of the legalized vectors. If the sub-masks have at most 2 input sources
we can call getShuffleCost on them and sum the costs, to get a more
accurate final cost for the entire shuffle. The call to
improveShuffleKindFromMask helps to improve the shuffle kind for the
sub-mask cost call.

Differential Revision: https://reviews.llvm.org/D123414
2022-04-27 13:51:50 +01:00
Chris Jackson 74273d575f [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
This relands commit 8f550368b1.

The test is amended with REQUIRES: x86-registered-target, in line with
the other debuginfo-scev-salvage tests.

Differential Revision: https://reviews.llvm.org/D120169
2022-04-27 13:10:30 +01:00
Chris Jackson 855752e563 Revert [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics[2/2]
This reverts commit 8f550368b1.
2022-04-27 13:06:03 +01:00
Chris Jackson 8f550368b1 [Debuginfo][LSR] Add salvaging variadic dbg.value intrinsics [2/2]
Second of two patches to extend SCEV-based salvaging to dbg.value
intrinsics that have multiple location ops pre-LSR. This second patch
adds the core implementation.

Reviewers: @StephenTozer, @djtodoro

Differential Revision: https://reviews.llvm.org/D120169
2022-04-27 12:47:35 +01:00