Commit Graph

143125 Commits

Bjorn Pettersson 985b9b7e42 [PM] Avoid duplicates in the Used/Preserved/Required sets
The pass analysis uses "sets" implemented using a SmallVector type
to keep track of Used, Preserved, Required and RequiredTransitive
passes. With nested analyses we could end up with duplicates in
those sets, as there were no checks to see if a pass already
existed in the "set" before pushing to the vectors. The idea with
this patch is to avoid such duplicates by not pushing elements
that are already contained when adding elements to those sets.

To align with the above, PMDataManager::collectRequiredAndUsedAnalyses
is changed to skip adding both the Required and RequiredTransitive
passes to its result vectors (since RequiredTransitive is always
a subset of Required, we ended up with duplicates when traversing
both sets).

The main goal of this is to avoid spending time verifying the same
analysis multiple times in PMDataManager::verifyPreservedAnalysis
when iterating over the Preserved "set". It is assumed that removing
duplicates from a "set" shouldn't have any other negative impact
(I have not seen any problems so far). If this ends up causing
problems, one could instead do some uniqueness filtering of the
vector being traversed in verifyPreservedAnalysis.
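
A minimal sketch of the duplicate-avoiding push (the helper name is
hypothetical, not code from the patch):

```
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

// Push only if the element is not already present, keeping the
// SmallVector-backed "set" duplicate-free. A linear scan is fine
// here because these sets stay small.
template <typename T>
static void addToAnalysisSet(llvm::SmallVectorImpl<T> &Set, const T &Elem) {
  if (!llvm::is_contained(Set, Elem))
    Set.push_back(Elem);
}
```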

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D94416
2021-01-20 13:55:18 +01:00
Mark Murray cab20f6105 [AArch64] Add missing "flagm" feature to the .arch_extension directive.
Depends on D94970

Differential Revision: https://reviews.llvm.org/D94971
2021-01-20 11:57:39 +00:00
Mark Murray f344c028de [AArch64] Add missing "pauth" feature to the .arch_extension directive.
Differential Revision: https://reviews.llvm.org/D94970
2021-01-20 11:57:39 +00:00
Chuanqi Xu c1bc7981ba [Coroutine] Retain alignment information when merging frame variables
Summary: This is to address bug48712.
The solution in this patch is to merge variable a into the storage
frame of variable b only if the alignment of a is a multiple of the
alignment of b.
There may be other strategies, but for now I think they are hard to
handle and of little benefit; we can implement them in the future.
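
Restated as a predicate, the merge condition above looks roughly like
this (a sketch; the helper and parameter names are hypothetical):

```
#include <cstdint>

// Variable a may be merged into variable b's storage frame only if
// a's alignment is a multiple of b's (alignments are powers of two).
static bool canMergeInto(uint64_t AlignA, uint64_t AlignB) {
  return AlignA % AlignB == 0;
}
```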

Test-plan: check-llvm

Reviewers: jmorse, lxfind, junparser

Differential Revision: https://reviews.llvm.org/D94891
2021-01-20 18:59:00 +08:00
Mirko Brkusanin a6a72dfdf2 [AMDGPU][GlobalISel] Avoid selecting S_PACK with constants
If constants are hidden behind G_ANYEXT we can treat them the same way as G_SEXT.
For that purpose we extend getConstantVRegValWithLookThrough with an option
to handle G_ANYEXT the same way as G_SEXT.

Differential Revision: https://reviews.llvm.org/D92219
2021-01-20 11:54:53 +01:00
Petar Avramovic 4ab704d628 [AMDGPU][MC] Add tfe disassembler support for MIMG opcodes
With tfe on there can be a vgpr write to vdata+1.
Add tablegen support for a 5-register vdata store.
This is required for a 4-register vdata store with tfe.

Differential Revision: https://reviews.llvm.org/D94960
2021-01-20 10:37:09 +01:00
Gabriel Hjort Åkerlund 2aeaaf841b [GlobalISel] Add missing operand update when copy is required
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.
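
A simplified sketch of the fixed pattern, assuming the GlobalISel
utility signature from llvm/CodeGen/GlobalISel/Utils.h (the wrapper
itself is hypothetical):

```
#include "llvm/CodeGen/GlobalISel/Utils.h"
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// constrainOperandRegClass() may insert a COPY and return the COPY's
// destination register when the requested class does not match. The
// operand must then be rewritten to use that register; without the
// setReg() below, the instruction keeps reading the unconstrained one.
static void constrainAndUpdate(MachineFunction &MF,
                               const TargetRegisterInfo &TRI,
                               MachineRegisterInfo &MRI,
                               const TargetInstrInfo &TII,
                               const RegisterBankInfo &RBI, MachineInstr &MI,
                               const TargetRegisterClass &RC,
                               MachineOperand &MO) {
  Register Constrained =
      constrainOperandRegClass(MF, TRI, MRI, TII, RBI, MI, RC, MO);
  if (Constrained != MO.getReg())
    MO.setReg(Constrained); // the previously missing update
}
```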

Reviewed By: dsanders

Differential Revision: https://reviews.llvm.org/D91244
2021-01-20 10:32:52 +01:00
David Sherwood 255a507716 [NFC][InstructionCost] Use InstructionCost in lib/Transforms/IPO/IROutliner.cpp
In places where we call a TTI.getXXCost() function I have changed
the code to use InstructionCost instead of unsigned. This is in
preparation for later on when we will change the TTI interfaces
to return InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
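
Illustrative of why the type swap matters (not the actual IROutliner
code): InstructionCost carries a validity state that a plain unsigned
lacks.

```
#include "llvm/Support/InstructionCost.h"
#include <cassert>
using namespace llvm;

int main() {
  InstructionCost Total = 0;
  Total += 4;                             // ordinary arithmetic still works
  Total += InstructionCost::getInvalid(); // an unknown cost poisons the sum
  assert(!Total.isValid() && "invalid costs propagate through sums");
  return 0;
}
```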

Differential Revision: https://reviews.llvm.org/D94427
2021-01-20 08:33:59 +00:00
Bill Wendling e22295385c [X86] Add segment and address-size override prefixes
X86 allows for the "addr32" and "addr16" address size override prefixes.
Also, these and the segment override prefixes should be recognized as
valid prefixes.

Differential Revision: https://reviews.llvm.org/D94726
2021-01-19 23:54:31 -08:00
Hsiangkai Wang 8ca4b174d7 [RISCV] Implement vlseg intrinsics.
For Zvlsseg, we need contiguous vector registers for the values. We need
to define new register classes for the different combinations of (number
of fields and LMUL). For example,

when the number of fields (NF) = 3 and LMUL = 2, the values will be assigned
to (V0M2, V2M2, V4M2), (V2M2, V4M2, V6M2), (V4M2, V6M2, V8M2), ...

We define the vlseg intrinsics with multiple outputs. There is no way to
describe the codegen patterns with multiple outputs in the tablegen
files. We do the codegen in RISCVISelDAGToDAG and use EXTRACT_SUBREG to
extract the output values.

The multiple scalable vector values will be put into a struct. This
patch depends on the support for scalable vector structs.

Differential Revision: https://reviews.llvm.org/D94229
2021-01-20 14:26:04 +08:00
Kazu Hirata b023cdeacc [llvm] Use llvm::all_of (NFC) 2021-01-19 20:19:17 -08:00
Kazu Hirata 978c754076 [llvm] Use llvm::any_of (NFC) 2021-01-19 20:19:16 -08:00
Kazu Hirata 8857202489 [llvm] Use llvm::find (NFC) 2021-01-19 20:19:14 -08:00
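
The three NFC cleanups above swap explicit begin()/end() iterator pairs
for LLVM's range-based helpers from llvm/ADT/STLExtras.h; a minimal
illustration (not taken from the patches):

```
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

bool demo() {
  llvm::SmallVector<int, 4> Vals = {1, 2, 3, 4};
  // Before: std::all_of(Vals.begin(), Vals.end(), Pred), etc.
  bool AllPositive = llvm::all_of(Vals, [](int V) { return V > 0; });
  bool AnyEven = llvm::any_of(Vals, [](int V) { return V % 2 == 0; });
  auto It = llvm::find(Vals, 3); // first occurrence, or Vals.end()
  return AllPositive && AnyEven && It != Vals.end();
}
```
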
ShihPo Hung 4dae2247fd [RISCV] refactor VPatBinary (NFC)
Make it easier to reuse for intrinsic vrgatherei16
which needs to encode both LMUL & EMUL in the instruction name,
like PseudoVRGATHEREI16_VV_M1_M1 and PseudoVRGATHEREI16_VV_M1_M2.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94951
2021-01-19 19:09:56 -08:00
Juneyoung Lee 4479c0c2c0 Allow nonnull/align attribute to accept poison
Currently LLVM relies on ValueTracking's `isKnownNonZero` to attach `nonnull`, which can return true when the value is poison.
To make the semantics of `nonnull` consistent with the behavior of `isKnownNonZero`, this changes the semantics of `nonnull` to accept poison, and to return poison if the input pointer is null.
This makes many transformations like below legal:

```
%p = gep inbounds %x, 1 ; % p is non-null pointer or poison
call void @f(%p)        ; instcombine converts this to call void @f(nonnull %p)
```

On the other hand, these semantics make propagation of `nonnull` to the caller illegal.
The reason is that passing poison to a `nonnull` argument no longer immediately raises UB, so such a program is still well defined if the callee does not use the argument.
Having the `noundef` attribute there re-allows this propagation.

```
define void @f(i8* %p) {       ; functionattr cannot mark %p nonnull here anymore
  call void @g(i8* nonnull %p) ; .. because @g never raises UB if it never uses %p.
  ret void
}
```

Another attribute that needs to be updated is `align`. This patch updates the semantics of align to accept poison as well.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90529
2021-01-20 11:31:23 +09:00
Ian Levesque 68a1f09107 [xray] Honor xray-never function-instrument attribute
function-instrument=xray-never wasn't actually honored before. We were
getting lucky that it worked because CodeGenFunction would omit the
other xray attributes when a function was annotated with
xray_never_instrument. This patch adds proper support.
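
For reference, this is the source-level attribute whose lowering is
being fixed; illustrative usage, not code from the patch:

```
// Clang lowers this attribute to "function-instrument"="xray-never" in
// IR; with this patch the XRay instrumentation pass actually skips it.
[[clang::xray_never_instrument]] void neverPatched() {
  // no XRay entry/exit sleds are inserted here
}
```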

Differential Revision: https://reviews.llvm.org/D89441
2021-01-19 18:47:09 -05:00
Wei Mi 21b1ad0340 [SampleFDO] Add support for splitting function profiles with context into
separate sections.

For ThinLTO, all the function profiles without context have been annotated to
outline functions if possible in the prelink phase. In the postlink phase,
profile annotation is only meaningful for function profiles with context. If
the profile is large, it is better to split it into two parts, one with
context and one without, so the profile reading in the postlink phase only
has to read the part with context. To support this splitting, we extend the
ExtBinary format to support different section arrangements. This makes it
flexible to add other section layouts in the future without the need to
create new classes inheriting from the ExtBinary class.

Differential Revision: https://reviews.llvm.org/D94435
2021-01-19 15:16:19 -08:00
Sam Clegg 96ef4f307d Revert "[WebAssembly] call_indirect issues table number relocs"
This reverts commit 418df4a6ab.

This change broke Emscripten tests, I believe because it started
generating a 5-byte-wide table index in the call_indirect instruction.
Neither v8 nor wabt seems to be able to handle that. The spec
currently says that this is a single 0x0 byte and:

"In future versions of WebAssembly, the zero byte occurring in the
encoding of the call_indirect instruction may be used to
index additional tables."

So we need to revisit this change. For backwards compat I guess
we need to guarantee that __indirect_function_table is always at
address zero. We could also consider making this a single-byte
relocation with an assert if we have more than 127 tables (for now).

Differential Revision: https://reviews.llvm.org/D95005
2021-01-19 15:06:07 -08:00
Craig Topper e75a4b6ea9 [RISCV] Remove NotHasStdExtZbb predicate from zext.h/sext.b/sext.h InstAliases. NFC
NotHasStdExtZbb doesn't have an AssemblerPredicate associated with it
so it didn't do anything. We don't need it either because the sorting
rules in tablegen prioritize by number of predicates. So the
dedicated instructions in the B extension that have predicates
will be prioritized automatically.
2021-01-19 14:31:48 -08:00
Alexey Bataev e463bd53c0 Revert "[SLP]Merge reorder and reuse shuffles."
This reverts commit 438682de6a to fix a
bug where the size of the resulting vector was reduced for an entry
node with multiple users.
2021-01-19 11:48:04 -08:00
Mitch Phillips 5b7aef6eb4 Revert "[PDB] Defer relocating .debug$S until commit time and parallelize it"
This reverts commit 6529d7c5a4.

Reason: Broke the ASan buildbots.
http://lab.llvm.org:8011/#/builders/99/builds/1567
2021-01-19 11:45:48 -08:00
Jonas Devlieghere a4b42c621b [llvm] Protect signpost map with a mutex
Use a mutex to protect concurrent access to the signpost map. This fixes
nondeterministic crashes in LLDB that appeared after using signposts in
the timer implementation.
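
A minimal sketch of the pattern (types and names hypothetical, not the
actual Signposts code):

```
#include <cstdint>
#include <map>
#include <mutex>

class SignpostEmitter {
  std::mutex Mutex; // guards Signposts against concurrent access
  std::map<const void *, uint64_t> Signposts;

public:
  uint64_t getSignpostId(const void *Object) {
    std::lock_guard<std::mutex> Lock(Mutex);
    // try_emplace only inserts if the key is not present yet
    auto It = Signposts.try_emplace(Object, Signposts.size() + 1).first;
    return It->second;
  }
};
```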

Differential revision: https://reviews.llvm.org/D94285
2021-01-19 11:41:54 -08:00
Mariya Podchishchaeva 7113de301a [ScalarizeMaskedMemIntrin] Add missing dependency
The pass has a dependency on 'TargetTransformInfoWrapperPass', but the
corresponding call to INITIALIZE_PASS_DEPENDENCY was missing.
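
The legacy-PM registration pattern looks roughly like this (a sketch;
the description strings are illustrative):

```
INITIALIZE_PASS_BEGIN(ScalarizeMaskedMemIntrinLegacyPass, DEBUG_TYPE,
                      "Scalarize unsupported masked memory intrinsics",
                      false, false)
// The missing line: without it, the analysis may not be initialized
// before the pass that requires it.
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_END(ScalarizeMaskedMemIntrinLegacyPass, DEBUG_TYPE,
                    "Scalarize unsupported masked memory intrinsics",
                    false, false)
```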

Differential Revision: https://reviews.llvm.org/D94916
2021-01-19 22:33:47 +03:00
Nikita Popov 21443381c0 Reapply [InstCombine] Replace one-use select operand based on condition
Relative to the original change, this adds a check that the
instruction on which we're replacing operands is safe to speculatively
execute, because that's what we're effectively doing. We're executing
the instruction with the replaced operand, which is fine if it's pure,
but not fine if it can cause side-effects or UB (aka is not speculatable).

Additionally, we cannot (generally) replace operands in phi nodes,
as these may refer to a different loop iteration. This is also covered
by the speculation check.
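
A simplified sketch of the guard described above (the helper name is
hypothetical):

```
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Replacing an operand of I effectively speculates I with the new
// value, so I must be free of side effects and UB; phi nodes are
// excluded since their operands may belong to another loop iteration.
static bool mayReplaceOperandIn(const Instruction &I) {
  return !isa<PHINode>(I) && isSafeToSpeculativelyExecute(&I);
}
```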

-----

InstCombine already performs a fold where X == Y ? f(X) : Z is
transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
if f(X) only has one use, then we can always directly replace the
use inside the instruction. To actually be profitable, limit it to
the case where Y is a non-expr constant.

This could be further extended to replace uses further up a one-use
instruction chain, but for now this only looks one level up.

Among other things, this also subsumes D94860.

Differential Revision: https://reviews.llvm.org/D94862
2021-01-19 20:26:38 +01:00
Craig Topper ce8b3937dd [RISCV] Add DAG combine to turn (setcc X, 1, setne) -> (setcc X, 0, seteq) if we can prove X is 0/1.
If we are able to compare with 0 instead of 1, we might be able
to fold the setcc into a beqz/bnez.

Often these setccs start life as an xor that gets converted to
a setcc by DAG combiner's rebuildSetcc. I looked into detecting
(xor X, 1) and converting to (seteq X, 0) based on boolean contents
being 0/1 in rebuildSetcc instead of using computeKnownBits. It was
very perturbing to AMDGPU tests which I didn't look closely at.
It caused a few changes on a couple of other targets, but they didn't
seem to amount to much, if any, improvement.

Reviewed By: lenary

Differential Revision: https://reviews.llvm.org/D94730
2021-01-19 11:21:48 -08:00
Jeroen Dobbelaere 121cac01e8 [noalias.decl] Look through llvm.experimental.noalias.scope.decl
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D93042
2021-01-19 20:09:42 +01:00
Brendon Cahoon 57443bfb4a [Hexagon] Fix segment start to adjust for gaps between segments
The Hexagon Vector Combine pass generates stores for a complete
aligned vector. The start of each section is a multiple of the
vector size, so that value is passed to normalize to compute
the offset of the stores in the section.  The first store may
not occur at offset 0 when there is a gap between sections.
2021-01-19 12:49:39 -06:00
Jay Foad 18cb7441b6 [AMDGPU] Simpler names for arch-specific ttmp registers. NFC.
Rename the *_gfx9_gfx10 ttmp registers to *_gfx9plus for simplicity,
and use the corresponding isGFX9Plus predicate to decide when to use
them instead of the old *_vi versions.

Differential Revision: https://reviews.llvm.org/D94975
2021-01-19 18:47:14 +00:00
Jessica Paquette cbf5246359 Fix buildbot after cfc6073017
Windows buildbots were not happy with using find_if + instructionsWithoutDebug.

In cfc6073017, instructionsWithoutDebug is not technically necessary. So,
just iterate over the block directly.

http://lab.llvm.org:8011/#/builders/127/builds/4732/steps/7/logs/stdio
2021-01-19 10:38:04 -08:00
Jessica Paquette cfc6073017 [GlobalISel] Combine (a[0]) | (a[1] << k1) | ...| (a[m] << kn) into a wide load
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)

This tries to recognize patterns like below (assuming a little-endian target):

```
s8* x = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((s32)a)

s8* x = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```

(This patch also handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second example above does not.)

To recognize the pattern, this searches from the last G_OR in the expression
tree.

E.g.

```
    Reg   Reg
     \    /
      OR_1   Reg
       \    /
        OR_2
          \     Reg
           .. /
          Root
```

Each non-OR register in the tree is put in a list. Each register in the list is
then checked to see if it's an appropriate load + shift logic.

If every register is a load + potentially a shift, the combine checks if those
loads + shifts, when OR'd together, are equivalent to a wide load (possibly with
a BSWAP.)

To simplify things, this patch

(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
    specific location

An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)

At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.

Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.

Differential Revision: https://reviews.llvm.org/D94350
2021-01-19 10:24:27 -08:00
Fraser Cormack 9c6a00fe99 [RISCV] Add ISel patterns for scalable mask exts & truncs
Original patch by @rogfer01.

This patch adds support for sign-, zero-, and any-extension from
scalable mask vector types to integer vector types, as well as
truncation in the opposite direction.

Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94590
2021-01-19 18:13:15 +00:00
David Green 6a563eef13 [ARM] Expand vXi1 VSELECT's
We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as
expanded to turn them into a series of logical operations.

Differential Revision: https://reviews.llvm.org/D94946
2021-01-19 17:56:50 +00:00
Nikita Popov 051ec9f5f4 [ValueTracking] Strengthen impliesPoison reasoning
Split impliesPoison into two recursive walks, one over V, the
other over ValAssumedPoison. This allows us to reason about poison
implications in a number of additional cases that are important
in practice. This is a generalized form of D94859, which handles
the cmp to cmp implication in particular.

Differential Revision: https://reviews.llvm.org/D94866
2021-01-19 18:04:23 +01:00
Fraser Cormack 15fd6bae0e [RISCV] Extend RVV VType info with the type's AVL (NFC)
This patch factors out the "VLMax" operand passed to most
scalable-vector ISel patterns into a property of each VType.

This is seen as a preparatory change to allow RVV in the future to
more easily support fixed-length vector types with constrained vector
lengths, with the AVL operand set to the length of the fixed-length
vector. It has no effect on the scalable code generation path.

Reviewed By: HsiangKai

Differential Revision: https://reviews.llvm.org/D94594
2021-01-19 15:46:56 +00:00
David Green f373b30923 [ARM] Add MVE add.sat costs
This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs,
based on when the instruction is legal. For smaller-than-legal types
that are promoted we generate shr(qadd(shl, shl)), so a cost of 4 is
appropriate.
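
A scalar model of the promoted sequence (illustrative C++ only; on MVE
the middle step is a single hardware qadd). The four operations are why
the cost is 4:

```
#include <algorithm>
#include <cstdint>

// shr(qadd(shl, shl)) for an i8 saturating add, modeled in scalar code
// (assumes C++20 semantics for shifts on negative values).
int8_t sadd_sat_i8(int8_t A, int8_t B) {
  int64_t WideA = static_cast<int64_t>(A) << 24;        // shl
  int64_t WideB = static_cast<int64_t>(B) << 24;        // shl
  int64_t Sum = WideA + WideB;                          // qadd: add...
  Sum = std::clamp<int64_t>(Sum, INT32_MIN, INT32_MAX); // ...and saturate
  return static_cast<int8_t>(Sum >> 24);                // shr (arithmetic)
}
```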

Differential Revision: https://reviews.llvm.org/D94958
2021-01-19 15:38:46 +00:00
Victor Huang 909d6c86ea [PowerPC] Fix the check for the instruction using FRSP/XSRSP output register
When performing peephole optimization to simplify the code, after removing
the FRSP/XSRSP instruction we set any uses of that FRSP/XSRSP to the
source of the FRSP/XSRSP.

We find the machine instruction using the virtual register that holds the
FRSP/XSRSP result by searching all following instructions, and we ran into
an issue when the first use of that virtual register is a debug MI, causing:
1. the virtual register in the debug MI to be removed unexpectedly;
2. the virtual register used in the non-debug MI not to be replaced with
   the source of the FRSP/XSRSP, leaving it in an undef state.

This patch fixes the issue by only searching non-debug machine instructions
that use the virtual register holding the FRSP/XSRSP result, and only when
that virtual register has a single non-debug use.

Differential Revision: https://reviews.llvm.org/D94711
Reviewed by: nemanjai
2021-01-19 09:20:03 -06:00
Florian Hahn 3747b69b53
[LoopRotate] Calls not lowered to calls should not block rotation.
83daa49758 made loop-rotate more conservative in the presence of
function calls in the prepare-for-lto stage. The code did not properly
account for calls that are not actual function calls, like calls to
intrinsics. This patch updates the code to ensure only calls that are
lowered to actual calls are considered inline candidates.
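
A hypothetical sketch of the refined test (not the exact code from the
patch):

```
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Intrinsic calls are not lowered to actual function calls, so they
// should not count as inline candidates that block rotation in the
// prepare-for-LTO stage.
static bool isInlineCandidateCall(const CallInst &CI) {
  if (isa<IntrinsicInst>(&CI))
    return false; // e.g. llvm.dbg.*, llvm.assume: no real call emitted
  const Function *Callee = CI.getCalledFunction();
  return Callee && !Callee->isDeclaration(); // body available to inline
}
```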
2021-01-19 14:37:36 +00:00
Tim Northover 6259fbd8b6 AArch64: add apple-a14 as a CPU
This CPU supports all v8.5a features except BTI, and so identifies as v8.5a to
Clang. A bit weird, but the best way for things like xnu to detect the new
features it cares about.
2021-01-19 14:04:53 +00:00
Hans Wennborg ec877106a3 [ThinLTO] Also prune Thin-* files from the ThinLTO cache
Such files (Thin-%%%%%%.tmp.o) are supposed to be deleted immediately
after they're used (either by renaming or deletion). However, we've seen
instances on Windows where this doesn't happen, probably due to the
filesystem being flaky. This is effectively a resource leak which has
prevented us from using the ThinLTO cache on Windows.

Since those temporary files are in the thinlto cache directory which we
prune periodically anyway, allowing them to be pruned too seems like a
tidy way to solve the problem.

Differential revision: https://reviews.llvm.org/D94962
2021-01-19 14:43:49 +01:00
Caroline Concatto 172f1f8952 [AArch64][SVE]Add cost model for vector reduce for scalable vector
This patch computes the cost of vector.reduce<operand> for scalable vectors.
The cost is split into two parts: the legalization cost and the cost of
the horizontal reduction.

Differential Revision: https://reviews.llvm.org/D93639
2021-01-19 11:54:16 +00:00
Simon Pilgrim 5626adcd6b [X86][SSE] combineVectorSignBitsTruncation - fold trunc(srl(x,c)) -> packss(sra(x,c))
If a srl doesn't introduce any sign bits into the truncated result, then replace with a sra to let us use a PACKSS truncation - fixes a regression noticed in D56387 on pre-SSE41 targets that don't have PACKUSDW.
2021-01-19 11:04:13 +00:00
Hans Wennborg 58bdfcfac0 Revert 5238e7b302 "[InstCombine] Replace one-use select operand based on condition"
This caused a miscompile in Chromium; see comments on the code review
for discussion and a pointer to a reproducer.

> InstCombine already performs a fold where X == Y ? f(X) : Z is
> transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
> if f(X) only has one use, then we can always directly replace the
> use inside the instruction. To actually be profitable, limit it to
> the case where Y is a non-expr constant.
>
> This could be further extended to replace uses further up a one-use
> instruction chain, but for now this only looks one level up.
>
> Among other things, this also subsumes D94860.
>
> Differential Revision: https://reviews.llvm.org/D94862

This also reverts the follow-up
a003f26539cf4db744655e76c41f4c4a8913f116:

> [llvm] Prevent infinite loop in InstCombine of select statements
>
> This fixes an issue where the RHS and LHS of the comparison operation
> creating the predicate were swapped back and forth forever.
>
> Differential Revision: https://reviews.llvm.org/D94934
2021-01-19 11:50:56 +01:00
Jay Foad 49dce85584 [AMDGPU] Simplify AMDGPUInstPrinter::printExpSrcN. NFC.
Change-Id: Idd7f47647bc0faa3ad6f61f44728c0f20540ec00
2021-01-19 10:39:56 +00:00
Florian Hahn 83daa49758
[LoopRotate] Add PrepareForLTO stage, avoid rotating with inline cands.
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.

The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.

This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.

I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.

From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory, has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.

There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have been
vectorized earlier (by scalarizing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).

There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.

I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.

Reviewed By: sanwou01

Differential Revision: https://reviews.llvm.org/D94232
2021-01-19 10:15:29 +00:00
Yvan Roux 244ad228f3 [ARM][MachineOutliner] Add stack fixup feature
This patch handles cases where we have to save/restore the link register
into the stack and where load/store instructions that use the stack are
part of the outlined region. It checks that no overflow is introduced by
the new offset and fixes up these instructions accordingly.

Differential Revision: https://reviews.llvm.org/D92934
2021-01-19 10:59:09 +01:00
Fraser Cormack c81ea9429f [RISCV] Add scalable-vector integer extension patterns
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D94694
2021-01-19 09:30:36 +00:00
Tres Popp a003f26539 [llvm] Prevent infinite loop in InstCombine of select statements
This fixes an issue where the RHS and LHS of the comparison operation
creating the predicate were swapped back and forth forever.

Differential Revision: https://reviews.llvm.org/D94934
2021-01-19 10:31:48 +01:00
Lang Hames 95b63c7b13 [ORC] Move LookupRequest from OrcShared to Orc.
It depends on Orc types (SymbolLookupSet), so can't be part of OrcShared.
2021-01-19 20:23:47 +11:00
Tres Popp 170199f562 [llvm][nvptx] add atomicity to counter in ISelLowering
Previously uniqueCallSite could have race conditions between different
threads. Now it is accessed with an atomic RMW and will be unique
between different threads.

Differential Revision: https://reviews.llvm.org/D94784
2021-01-19 10:20:20 +01:00
David Sherwood c3ce262794 [NFC] Make remaining cost functions in LoopVectorize.cpp use InstructionCost
A previous patch has already changed getInstructionCost to return
an InstructionCost type. This patch changes the other various
getXXXCost functions to return an InstructionCost too. This is a
non-functional change - I've added a few asserts that the costs
are valid in places where we're selecting between vector call
and intrinsic costs. However, since we don't yet return invalid
costs from any of the TTI implementations these asserts should
not fire.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Differential Revision: https://reviews.llvm.org/D94065
2021-01-19 09:08:40 +00:00