Commit Graph

152419 Commits

Author SHA1 Message Date
David Blaikie 0a5c26f2ef DebugInfo: Simplified Template Names: drop unneeded space in arrays
Matching a recent clang change I've made, now 'int[3]' is formatted
without the space between the type and array bound. This commit updates
libDebugInfoDWARF/llvm-dwarfdump to match that formatting.
2021-11-05 22:50:57 -07:00
Bin Cheng 54d891a7d5 [RISCV]: Fix typo by abstracting VWholeLoad* classes
This patch abstracts VWholeLoad* classes into VWholeLoadN, simplifies
existing code as well as fixes a typo.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D109319
2021-11-06 10:48:03 +08:00
Bin Cheng d488f1fff2 [RISCV][NFC]: Refactor classes for load/store instructions of RVV
This patch refactors classes for load/store of V extension by:
- Introduce new class for VUnitStrideLoadFF and VUnitStrideSegmentLoadFF
  so that uses of L/SUMOP* are not spread around different places.
- Reorder classes for Unit-Stride load/store in line with table
  describing lumop/sumop in riscv-v-spec.pdf.

Reviewed By: HsiangKai, craig.topper

Differential Revision: https://reviews.llvm.org/D109318
2021-11-06 10:48:03 +08:00
Kazu Hirata 87e53a0ad8 [llvm] Use make_early_inc_range (NFC) 2021-11-05 19:39:07 -07:00
David Blaikie f57d0e2726 DWARF Simplified Template Names: Narrow down the handling for operator overloads
Actually we can, for now, remove the explicit "operator" handling
entirely - since clang currently won't try to flag any of these as
rebuildable. That seems like a reasonable state for now, but it could be
narrowed down to only apply to conversion operators, most likely - but
would need more nuance for op> and op>> since they would be incorrectly
flagged as already having their template arguments (due to the trailing
'>').
2021-11-05 15:41:56 -07:00
Philip Reames d24a0e8857 [SCEV] Use constant range of RHS to prove NUW on narrow IV in trip count logic
The basic idea here is that given a zero extended narrow IV, we can prove the inner IV to be NUW if we can prove there's a value the inner IV must take before overflow which must exit the loop.

Differential Revision: https://reviews.llvm.org/D109457
2021-11-05 15:36:47 -07:00
Scott Linder f82bdf0fcc [NFC][Verifier] Remove redundant Module parameters
These `M` parameters shadow the `M` member in `VerifierSupport`, and
both always refer to the same module. Eliminate the redundant parameters
and always use the member.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D106474
2021-11-05 21:30:02 +00:00
Jay Foad bdaa181007 [TwoAddressInstructionPass] Update existing physreg live intervals
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.

Differential Revision: https://reviews.llvm.org/D113191
2021-11-05 21:20:30 +00:00
Roman Lebedev a5cd27880a
[IR] Improve member `ShuffleVectorInst::isReplicationMask()`
When we have an actual shuffle, we can impose the additional restriction
that the mask replicates the elements of the first operand, so we know
the replication factor as a ratio of output and op0 vector sizes.
2021-11-06 00:09:27 +03:00
Martin Storsjö 86c01b1bc6 [DebugInfo] [PDB] Force injected source paths to use backslashes
This fixes lld/COFF/pdb-natvis.test (which only is run on Windows)
when using paths with forward slashes on Windows.

Differential Revision: https://reviews.llvm.org/D113265
2021-11-05 21:50:42 +02:00
Sanjay Patel 7e30404c3b [DAGCombiner] add fold for vselect based on mask of signbit, part 2
This is the 'or' sibling for the fold added with:
D113212

https://alive2.llvm.org/ce/z/tgnp7K

Note that neither of these transforms is poison-safe,
but it does not seem to matter at this level. We have
had the scalar version of D113212 for a long time, so
this is just making optimizer behavior consistent.

We do not have the scalar version of *this* fold,
however, so that is another follow-up.
2021-11-05 15:02:12 -04:00
Yonghong Song 3466e00716 Reland "[Attr] support btf_type_tag attribute"
This is to revert commit f95bd18b5f (Revert "[Attr] support
btf_type_tag attribute") plus a bug fix.

Previous change failed to handle cases like below:
    $ cat reduced.c
    void a(*);
    void a() {}
    $ clang -c reduced.c -O2 -g

In such cases, during clang IR generation, for function a(),
CGCodeGen has numParams = 1 for FunctionType. But for
FunctionTypeLoc we have FuncTypeLoc.NumParams = 0. By using
FunctionType.numParams as the bound to access FuncTypeLoc
params, a random crash is triggered. The bug fix is to
check against FuncTypeLoc.NumParams before accessing
FuncTypeLoc.getParam(Idx).

Differential Revision: https://reviews.llvm.org/D111199
2021-11-05 11:25:17 -07:00
Florian Hahn f64580f8d2
[AArch64][GISel] Optimize 8 and 16 bit variants of uaddo.
Try simplify G_UADDO with 8 or 16 bit operands to wide G_ADD and TBNZ if
result is only used in the no-overflow case. It is restricted to cases
where we know that the high-bits of the operands are 0. If there's an
overflow, then the the 9th or 17th bit must be set, which can be checked
using TBNZ.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D111888
2021-11-05 19:11:15 +01:00
Shao-Ce SUN 5c3d7184b4 [RISCV] Support Zfhmin extension
According to RISC-V Unprivileged ISA 15.6.

Reviewed By: asb

Differential Revision: https://reviews.llvm.org/D111866
2021-11-06 01:41:02 +08:00
Kazu Hirata 2c4ba3e9d3 [Target] Use make_early_inc_range (NFC) 2021-11-05 09:14:32 -07:00
Whitney Tsang 93421108d2 Add NoOpLoopNestPass and LOOPNEST_PASS macro
Having a NoOpLoopNestPass can ensure that only outermost loop is invoked
for a LoopNestPass with a lit test.

There are some existing passes that are implemented as LoopNestPass, but
they are still using LOOP_PASS macro.
It would be easier to identify LoopNestPasses with a LOOPNEST_PASS
macro.

Differential Revision: https://reviews.llvm.org/D113185
2021-11-05 16:11:48 +00:00
Michael Liao af2ae2cf42 [BranchRelaxation] Fix warning on unused variable. NFC. 2021-11-05 11:18:27 -04:00
David Green 08056e1888 [InstCombine] Generalize sadd.sat combine to compute sign bits.
There is a combine in instcombine to transform a saturated add/sub into
a saddsat/ssubsat, currently handling inputs which are both sign
extended (https://alive2.llvm.org/ce/z/68qpTn). This can generalize to,
for example ashr of at least the bitwidth (https://alive2.llvm.org/ce/z/4TFyX-
and https://alive2.llvm.org/ce/z/qDWzFs for example). Which means it
generalizes further to "the number of sign bits", needing to be enough
to truncate to the size of the saturate. (An example using `or` for
instance: https://alive2.llvm.org/ce/z/EI_h_A).

So this patch makes use of ComputeNumSignBits (with the newly added
ComputeMinSignedBits) in matchSAddSubSat to generalize the fold to any
inputs with enough sign bits known, truncating the inputs to the new
size of the saturate.

Differential Revision: https://reviews.llvm.org/D112298
2021-11-05 15:05:09 +00:00
David Green 61225c0818 [ValueTracking][InstCombine] Introduce and use ComputeMinSignedBits
This introduces a new ComputeMinSignedBits method for ValueTracking that
returns the BitWidth - SignBits + 1 from ComputeSignBits, and represents
the minimum bit size for the value as a signed integer.  Similar to the
existing APInt::getMinSignedBits method, this can make some of the
reasoning around ComputeSignBits more natural.

See https://reviews.llvm.org/D112298
2021-11-05 14:41:37 +00:00
Simon Pilgrim 9e6506299a [DAG] FoldConstantVectorArithmetic - remove SDNodeFlags argument
Another minor step towards merging FoldConstantVectorArithmetic into FoldConstantArithmetic.

We don't use SDNodeFlags in any constant folding inside DAG, so passing the Flags argument is a waste of time - an alternative would be to wire up FoldConstantArithmetic to take SDNodeFlags just-in-case we someday start using it, but we don't have any way to test it and I'd prefer to avoid dead code.

Differential Revision: https://reviews.llvm.org/D113276
2021-11-05 14:36:17 +00:00
Roman Lebedev ad617183bb
[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: mask is i8 not i1
Even though AVX512's masked mem ops (unlike AVX1/2) have a mask
that is a `VF x i1`, replication of said masks happens after
promotion of it to `VF x i8`, so we should use `i8`, not `i1`,
when calculating the cost of mask replication.
2021-11-05 17:27:02 +03:00
Sanjay Patel 4fc1fc4005 [DAGCombiner] add fold for vselect based on mask of signbit
(X s< 0) ? Y : 0 --> (X s>> BW-1) & Y

We canonicalize to the icmp+select form in IR, and we already have this fold
for scalar select in SDAG, so I think it's an oversight that we don't have
the fold for vectors. It seems neutral for AArch64 and saves some instructions
on x86.

Whether we should also have the sibling folds for the inverse condition or
all-ones true value may depend on target-specific factors such as whether
there's an "and-not" instruction.

Differential Revision: https://reviews.llvm.org/D113212
2021-11-05 10:06:16 -04:00
Roman Lebedev 01d8759ac9
[IR][ShuffleVector] Introduce `isReplicationMask()` matcher
Avid readers of this saga may recall from previous installments,
that replication mask replicates (lol) each of the `VF` elements
in a vector `ReplicationFactor` times. For example, the mask for
`ReplicationFactor=3` and `VF=4` is: `<0,0,0,1,1,1,2,2,2,3,3,3>`.
More importantly, replication mask is used by LoopVectorizer
when using masked interleaved memory operations.

As discussed in previous installments, while it is used by LV,
and we **seem** to support masked interleaved memory operations on X86,
it's support in cost model leaves a lot to be desired:
until basically yesterday even for AVX512 we had no cost model for it.

As it has been witnessed in the recent
AVX2 `X86TTIImpl::getInterleavedMemoryOpCost()`
costmodel patches, while it is hard-enough to query the cost
of a particular assembly sequence [from llvm-mca],
afterwards the check lines LV costmodel tests must be updated manually.
This is, at the very least, boring.

Okay, now we have decent costmodel coverage for interleaving shuffles,
but now basically the same mind-killing sequence has to be performed
for replication mask. I think we can improve at least the second half
of the problem, by teaching
the `TargetTransformInfoImplCRTPBase::getUserCost()` to recognize
`Instruction::ShuffleVector` that are repetition masks,
adding exhaustive test coverage
using `-cost-model -analyze` + `utils/update_analyze_test_checks.py`

This way we can have good exhaustive coverage for cost model,
and only basic coverage for the LV costmodel.

This patch adds precise undef-aware `isReplicationMask()`,
with exhaustive test coverage.
* `InstructionsTest.ShuffleMaskIsReplicationMask` shows that
   it correctly detects all the known masks.
* `InstructionsTest.ShuffleMaskIsReplicationMask_undef`
  shows that replacing some mask elements in a known replication mask
  still allows us to recognize it as a replication mask.
  Note, with enough undef elts, we may detect a different tuple.
* `InstructionsTest.ShuffleMaskIsReplicationMask_Exhaustive_Correctness`
  shows that if we detected the replication mask with given params,
  then if we actually generate a true replication mask with said params,
  it matches element-wise ignoring undef mask elements.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D113214
2021-11-05 16:53:47 +03:00
David Sherwood 657a1dcd0d [AArch64] Add target DAG combine for UUNPKHI/LO
When created a UUNPKLO/HI node with an undef input then the
output should also be undef. I've added a target DAG combine
function to ensure we avoid creating an unnecessary uunpklo/hi
instruction.

Differential Revision: https://reviews.llvm.org/D113266
2021-11-05 13:50:59 +00:00
Quinn Pham c71fbdd87b [NFC] Inclusive language: Remove instances of master in URLs
[NFC] This patch fixes URLs containing "master". Old URLs were either broken or
redirecting to the new URL.

Reviewed By: #libc, ldionne, mehdi_amini

Differential Revision: https://reviews.llvm.org/D113186
2021-11-05 08:48:41 -05:00
Simon Pilgrim f2703c3c33 [DAG] FoldConstantArithmetic - rename NumOps -> NumElts. NFC.
NumOps represents the number of elements for vector constant folding, rename this NumElts so in future we can the consistently use NumOps to represent the number of operands of the opcode.

Minor cleanup before trying to begin generalizing FoldConstantArithmetic to support opcodes other than binops.
2021-11-05 13:32:34 +00:00
Jingu Kang a7b1872593 [AArch64] Fix a bug from a pattern for uaddv(uaddlp(x)) ==> uaddlv
A pattern has selected wrong uaddlv MI. It should be as below.

uaddv(uaddlp(v8i8)) ==> uaddlv(v8i8)

Differential Revision: https://reviews.llvm.org/D113263
2021-11-05 12:48:18 +00:00
Alfredo Dal'Ava Junior 1cb9f37a17 [FreeBSD] Do not mark __stack_chk_guard as dso_local
This symbol is defined in libc.so so it is definitely not DSO-Local.
Marking it as such causes problems on some platforms (such as PowerPC).

Differential revision: https://reviews.llvm.org/D109090
2021-11-05 07:29:50 -05:00
Simon Pilgrim c1e7911c3b [DAG] FoldConstantArithmetic - fold bitlogic(bitcast(x),bitcast(y)) -> bitcast(bitlogic(x,y))
To constant fold bitwise logic ops where we've legalized constant build vectors to a different type (e.g. v2i64 -> v4i32), this patch adds a basic ability to peek through the bitcasts and perform the constant fold on the inner operands.

The MVE predicate v2i64 regressions will be addressed by future support for basic v2i64 type support.

One of the yak shaving fixes for D113192....

Differential Revision: https://reviews.llvm.org/D113202
2021-11-05 12:00:59 +00:00
Simon Pilgrim 5e9ac7c0a5 [X86] Enable v32i16 rotate lowering on non-BWI targets
Fixes one of the regressions in D113192
2021-11-05 11:00:31 +00:00
Chen Zheng fed2889f07 [PowerPC] use correct selection for v16i8/v8i16 splat load
Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D113236
2021-11-05 10:04:03 +00:00
Jay Foad 0321bd64e6 Revert "[TwoAddressInstructionPass] Update existing physreg live intervals"
This reverts commit ec0e1e88d2.

It was pushed by mistake.
2021-11-05 09:54:26 +00:00
Jay Foad c93bf53a3e [AMDGPU] NFC formatting fixes in SIMemoryLegalizer 2021-11-05 09:10:24 +00:00
Jay Foad ec0e1e88d2 [TwoAddressInstructionPass] Update existing physreg live intervals
In TwoAddressInstructionPass::processTiedPairs with
-early-live-intervals, update any preexisting physreg live intervals,
as well as virtreg live intervals. By default (without
-precompute-phys-liveness) physreg live intervals only exist for
registers that are live-in to some basic block.

Differential Revision: https://reviews.llvm.org/D113191
2021-11-05 09:10:24 +00:00
Qiu Chaofan 5fd406e254 [PowerPC] Add intrinsic to convert between ppc_fp128 and fp128
ppc_fp128 and fp128 are both 128-bit floating point types. However, we
can't do conversion between them now, since trunc/ext are not allowed
for same-size fp types.

This patch adds two new intrinsics: llvm.ppc.convert.f128.to.ppcf128 and
llvm.convert.ppcf128.to.f128, to support such conversion.

Reviewed By: shchenz

Differential Revision: https://reviews.llvm.org/D109421
2021-11-05 16:58:38 +08:00
Martin Storsjö df0ba47c36 [Support] Allow configuring the preferred type of slashes on Windows
Default to preferring forward slashes when built for MinGW, as
many usecases, when e.g. Clang is used as a drop-in replacement
for GCC, requires the compiler to output paths with forward slashes.

Not all tests pass yet, if configuring to prefer forward slashes though.

Differential Revision: https://reviews.llvm.org/D112787
2021-11-05 10:42:02 +02:00
Martin Storsjö f4d83c56c9 [Support] [Windows] Convert paths to the preferred form
This normalizes most paths (except ones input from the user as command
line arguments) into the preferred form, if `real_style()` evaluates to
`windows_forward`.

Differential Revision: https://reviews.llvm.org/D111880
2021-11-05 10:41:51 +02:00
Martin Storsjö a8b54834a1 [Support] Add a new path style for Windows with forward slashes
This behaves just like the regular Windows style, with both separator
forms accepted, but with get_separator() returning forward slashes.

Add a more descriptive name for the existing style, keeping the old
name around as an alias initially.

Add a new function `make_preferred()` (like the C++17
`std::filesystem::path` function with the same name), which converts
windows paths to the preferred separator form (while this one works on
any platform and takes a `path::Style` argument).

Contrary to `native()` (just like `make_preferred()` in `std::filesystem`),
this doesn't do anything at all on Posix, it doesn't try to reinterpret
backslashes into forward slashes there.

Differential Revision: https://reviews.llvm.org/D111879
2021-11-05 10:41:51 +02:00
Martin Storsjö f95bd18b5f Revert "[Attr] support btf_type_tag attribute"
This reverts commits 737e4216c5 and
ce7ac9e66a.

After those commits, the compiler can crash with a reduced
testcase like this:

$ cat reduced.c
void a(*);
void a() {}
$ clang -c reduced.c -O2 -g
2021-11-05 10:36:40 +02:00
Chen Zheng 9695027066 [PowerPC] address post-commit comments for D106555; NFC
Address namanjai post commit comments.
2021-11-05 05:30:53 +00:00
Vitaly Buka 1caabbef8e [OpaquePtr] Fix initialization-order-fiasco
Asan detects it after D112732.
2021-11-04 19:29:06 -07:00
Shengchen Kan be08e452f3 [X86][MS-InlineAsm] Add constraint *m for memory access w/ global var
Constraint `*m` should be used when the address of a variable is passed
as a value. And the constraint is missing for MS inline assembly when sth
is written to the address of the variable.

The missing would cause FE delete the definition of the static varible,
and then result in "undefined reference to xxx" issue.

Reviewed By: xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D113096
2021-11-05 09:11:41 +08:00
Kirill Stoimenov 3f1aca58df [ASan] Added stack safety support in address sanitizer.
Added and implemented -asan-use-stack-safety flag, which control if ASan would use the Stack Safety results to emit less code for operations which are marked as 'safe' by the static analysis.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D112098
2021-11-04 17:22:31 -07:00
Arthur Eubanks 7175886a0f [NewPM] Make eager analysis invalidation per-adaptor
Follow-up change to D111575.
We don't need eager invalidation on every adaptor. Most notably,
adaptors running passes that use very few analyses, or passes that
purely invalidate specific analyses.

Also allow testing of this via a pipeline string
"function<eager-inv>()".

The compile time/memory impact of this is very comparable to D111575.
https://llvm-compile-time-tracker.com/compare.php?from=9a2eec512a29df45c90c2fcb741e9d5c693b1383&to=b9f20bcdea138060967d95a98eab87ce725b22bb&stat=instructions

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D113196
2021-11-04 17:16:11 -07:00
Yonghong Song 41860e602a BPF: Support btf_type_tag attribute
A new kind BTF_KIND_TYPE_TAG is defined. The tags associated
with a pointer type are emitted in their IR order as modifiers.
For example, for the following declaration:
  int __tag1 * __tag1 __tag2 *g;
The BTF type chain will look like
  VAR(g) -> __tag1 --> __tag2 -> pointer -> __tag1 -> pointer -> int
In the above "->" means BTF CommonType.Type which indicates
the point-to type.

Differential Revision: https://reviews.llvm.org/D113222
2021-11-04 17:01:36 -07:00
Philip Reames dec15d9a0a [indvars] Use loop guards when canonicalizing exit conditions
This extends the logic in canonicalizeExitConditions to use loop guards to specialize the SCEV of the loop invariant term before quering it's range.
2021-11-04 15:23:34 -07:00
Arthur Eubanks 13317286f8 [NewPM] Use the default AA pipeline by default
We almost always want to use the default AA pipeline. It's very easy for
users of PassBuilder to forget to customize the AAManager to use the
default AA pipeline (for example, the NewPM C API forgets to do this).

If somebody wants a custom AA pipeline, similar to what is being done
now with the default AA pipeline registration, they can

  FAM.registerPass([&] { return std::move(MyAA); });

before calling

  PB.registerFunctionAnalyses(FAM);

For example, LTOBackend.cpp and NewPMDriver.cpp do this.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113210
2021-11-04 15:10:34 -07:00
Ben Langmuir a2639dcbe6 [ORC] Add a utility for adding missing "self" relocations to a Symbol
If a tool wants to introduce new indirections via stubs at link-time in
ORC, it can cause fidelity issues around the address of the function if
some references to the function do not have relocations. This is known
to happen inside the body of the function itself on x86_64 for example,
where a PC-relative address is formed, but without a relocation.

```
_foo:
  leaq -7(%rip), %rax ## form pointer to '_foo' without relocation

_bar:
  leaq (%rip), %rax ##  uses X86_64_RELOC_SIGNED to '_foo'
```

The consequence of introducing a stub for such a function at link time
is that if it forms a pointer to itself without relocation, it will not
have the same value as a pointer from outside the function. If the
function pointer is used as a key, this can cause problems.

This utility provides best-effort support for adding such missing
relocations using MCDisassembler and MCInstrAnalysis to identify the
problematic instructions. Currently it is only implemented for x86_64.

Note: the related issue with call/jump instructions is not handled
here, only forming function pointers.

rdar://83514317

Differential revision: https://reviews.llvm.org/D113038
2021-11-04 15:01:05 -07:00
David Blaikie 7cdd262351 DebugInfo: Fix incorrect line table lookup when resolving decl_file from a split unit
Specifically in DWARFv5 the unit for the line table entry was correct
but the context was incorrect - leading to looking up .debug_line_str in
the dwp instead of the executable.

(perhaps we could/should remove the context pointer entirely, and rely
on the one in the unit... I might try that as a separate follow-up
commit)
2021-11-04 14:54:27 -07:00
Philip Reames c0d9bf2f6a [indvars] Allow rotation (narrowing) of exit test when discovering trip count
This relaxes the one-use requirement on the rotation transform specifically for the case where we know we're zexting an IV of the loop.  This allows us to discover trip count information in SCEV, which seems worth a single extra loop invariant truncate.  Honestly, I'd prefer if SCEV could just compute the trip count directly (e.g. D109457), but this unblocks practical benefit.
2021-11-04 14:49:24 -07:00
Yonghong Song 737e4216c5 [Attr] support btf_type_tag attribute
This patch added clang codegen and llvm support
for btf_type_tag support. Currently, btf_type_tag
attribute info is preserved in DebugInfo IR only for
pointer types associated with typedef, global variable
and function declaration. Eventually, such information
is emitted to dwarf.

The following is an example:
  $ cat test.c
  #define __tag __attribute__((btf_type_tag("tag")))
  int __tag *g;
  $ clang -O2 -g -c test.c
  $ llvm-dwarfdump --debug-info test.o
  ...
  0x0000001e:   DW_TAG_variable
                  DW_AT_name      ("g")
                  DW_AT_type      (0x00000033 "int *")
                  DW_AT_external  (true)
                  DW_AT_decl_file ("/home/yhs/test.c")
                  DW_AT_decl_line (2)
                  DW_AT_location  (DW_OP_addr 0x0)

  0x00000033:   DW_TAG_pointer_type
                  DW_AT_type      (0x00000042 "int")

  0x00000038:     DW_TAG_LLVM_annotation
                    DW_AT_name    ("btf_type_tag")
                    DW_AT_const_value     ("tag")

  0x00000041:     NULL

  0x00000042:   DW_TAG_base_type
                  DW_AT_name      ("int")
                  DW_AT_encoding  (DW_ATE_signed)
                  DW_AT_byte_size (0x04)

  0x00000049:   NULL

Basically, a DW_TAG_LLVM_annotation tag will be inserted
under DW_TAG_pointer_type tag if that pointer has a btf_type_tag
associated with it.

Differential Revision: https://reviews.llvm.org/D111199
2021-11-04 14:23:31 -07:00
Philip Reames 453fdebd48 [indvars] Extend canonicalizeExitConditions to inverted operands
As discussed in the original reviews, but done in a follow on.
2021-11-04 14:20:37 -07:00
Thomas Symalla 76cbe62262 [AMDGPU] Changes the AMDGPU_Gfx calling convention by making the SGPRs 4..29 callee-save. This is to avoid superfluous s_movs when executing amdgpu_gfx function calls as the callee is likely not going to change the argument values.
This patch changes the AMDGPU_Gfx calling convention. It defines the SGPR registers s[4:29] as callee-save and leaves some SGPRs usable for callers. The intention is to avoid unneccessary s_mov instructions for arguments the caller would otherwise save and restore in these registers.

Reviewed By: sebastian-ne

Differential Revision: https://reviews.llvm.org/D111637
2021-11-04 21:50:18 +01:00
Noah Shutty d788c44f5c [Support] Improve Caching conformance with Support library behavior
This diff makes several amendments to the local file caching mechanism
which was migrated from ThinLTO to Support in
rGe678c51177102845c93529d457b020f969125373 in response to follow-up
discussion on that commit.

Patch By: noajshu

Differential Revision: https://reviews.llvm.org/D113080
2021-11-04 13:00:44 -07:00
David Green 091244023a [ARM] Move VPTBlock pass after post-ra scheduling
Currently when tail predicating loops, vpt blocks need to be created
with the vctp predicate in case we need to revert to non-tail predicated
form. This has the unfortunate side effect of severely hampering post-ra
scheduling at times as the instructions are already stuck in vpt blocks,
not allowed to be independently ordered.

This patch addresses that by just moving the creation of VPT blocks
later in the pipeline, after post-ra scheduling has been performed. This
allows more optimal scheduling post-ra before the vpt blocks are
created, leading to more optimal tail predicated loops.

Differential Revision: https://reviews.llvm.org/D113094
2021-11-04 18:42:12 +00:00
Wouter van Oortmerssen a320f877ce [WebAssembly] Fix debug locations for ExplicitLocals pass
This is a reworked version of the reverted patch: https://reviews.llvm.org/D112487
Note that
a) it doesn't need the test changes anymore, and
b) I checked at least locally it passes other.test_pthread_lsan_leak

Differential Revision: https://reviews.llvm.org/D113208
2021-11-04 11:38:03 -07:00
Rahman Lavaee f533ec37eb Make the BBAddrMap struct binary-format-agnostic.
The only binary-format-related field in the BBAddrMap structure is the function address (`Addr`), which will use uint64_t in 64B format and uint32_t in 32B format. This patch changes it to use uint64_t in both formats.
This allows non-templated use of the struct, at the expense of a marginal additional size overhead for the 32-bit format. The size of the BB address map section does not change.

Differential Revision: https://reviews.llvm.org/D112679
2021-11-04 10:27:24 -07:00
Zakk Chen 0649dfebba [RISCV] Rename some assembler mnemonic and intrinsic functions for RVV 1.0.
Rename vpopc/vmandnot/vmornot to vcpop/vmandn/vmorn assembler mnemonic.

Reviewed By: frasercrmck, jrtc27, craig.topper

Differential Revision: https://reviews.llvm.org/D111062
2021-11-04 10:08:01 -07:00
Kazu Hirata 2887117d2c [Hexagon] Use make_early_inc_range (NFC) 2021-11-04 08:51:05 -07:00
Jamie Schmeiser 8720149d9b Remove unused function from print-changed=dot-cfg code
Summary:
Remove unused function from print-changed=dot-cfg code to silence a
gcc compiler warning.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: uabelho(Mikael Holmen)
Differential Revision: https://reviews.llvm.org/D113188
2021-11-04 10:40:50 -04:00
Sander de Smalen 1ea4296208 [NFC] Remove from UnivariateLinearPolyBase::getValue().
This interface should not have existed in the first place, let alone
be a public member.

It allows calling `ElementCount::get(..)->getValue()`, which is ambiguous.
The interfaces to be used are either getFixedValue() or getKnownMinValue().
2021-11-04 14:32:08 +00:00
Sjoerd Meijer 3fd1902ad8 [FuncSpec] Enable it only with -O3
Function specialisation was running at all optimisation levels (if enabled on
the command line, it is not on by default). That was an oversight and not
something we want to do. Function specialisation duplicates functions when it
triggers, so the backend is processing more functions/instructions resulting in
compile-time increases, which seems more appropriate with -O3 and inline with
GCC. Please note that since function specialisation is not enabled by default,
this didn't require updating any pass manager tests.

Differential Revision: https://reviews.llvm.org/D112129
2021-11-04 13:59:00 +00:00
Chen Zheng f6db18fd4a [PowerPC][NFC] make option ppc-formprep-max-vars can be set more than one time. 2021-11-04 13:44:58 +00:00
Simon Pilgrim 87d5bb66eb [X86][SSE] Improve PMADDWD SimplifyDemandedVectorElts handling
Check both operands for zero elements to remove unnecessary demanded elts.

Try to help reduce some minor regressions noticed in D110995
2021-11-04 12:56:31 +00:00
Florian Hahn b4992dbb21
[LV] Clarify uniform worklist contains instrs demanding lane 0. 2021-11-04 13:11:50 +01:00
Tim Northover 3d39612b3d Coroutines: don't infer function attrs before lowering
Coroutines have weird semantics that don't quite match normal LLVM functions,
so trying to infer even simple attributes based on thier contents can go wrong.
2021-11-04 10:24:28 +00:00
David Green 1e5f814302 [InstCombine] Fix infinite recursion in ashr/xor vector fold.
The added test has poison lanes due to the vector shuffle. This can
cause an infinite loop of combines in instcombine where it folds
xor(ashr, -1) -> select (icmp slt 0), -1, 0 -> sext (icmp slt 0) -> xor(ashr, -1).
We usually prevent this by checking that the xor constant is not -1,
but with vectors some of the lanes may be -1, some may be poison. So
this changes the way we detect that from "!C1->isAllOnesValue()" to
"!match(C1, m_AllOnes())", which is more able to detect that some of the
lanes are poison.

Fixes PR52397
2021-11-04 09:24:27 +00:00
Qiu Chaofan a84118756c [PowerPC] Enforce side effects to FPSCR read/set intrinsics
Currently, FPSCR is not modeled, so in some early passes (such as
early-cse), the read/set intrinsics to FPSCR may get incorrect
simplification.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D112380
2021-11-04 11:45:32 +08:00
RamNalamothu 539f500e78 [AMDGPU] Do not add debug locations to the code inside prologue
There is no real source location for code inside prologue as it is
generated by compiler but source locations are being added to code
inside prologue as a side effect of https://reviews.llvm.org/D99269
because buildSpillLoadStore() is using source location of the real
instruction in the basic block if any.

Fixes: SWDEV-307590

Reviewed By: scott.linder, sebastian-ne

Differential Revision: https://reviews.llvm.org/D113100
2021-11-04 08:02:41 +05:30
Philip Reames d4708fa480 Backout must-exit based parts of 3fc9882e, and 412eb0
Not sure these are correct.  I think I missed a case when porting this from the original SCEV change to the IndVar changes.  I may end up reapplying this later with a comment about how this is correct, but in case the current bad feeling turns out to be true, I'm removing from tree while investigating further.
2021-11-03 15:19:49 -07:00
Arthur Eubanks 88052fc362 [ArgPromo] Preserve FunctionAnalysisManagerCGSCCProxy
We already make sure to properly clear analyses for deleted functions.

This makes investigating some future potential compile time improvements easier.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D113032
2021-11-03 14:56:58 -07:00
Craig Topper 5022ac0771 [RISCV] Use HasVInstructions and HasVInstructionsAnyF in more place in TableGen. NFC
Change RISCVSubtarget.hasVInstructionAnyF() to call hasVInstructionsF32
so that any changes to hasVInstructionsF32 are reflected.

The files were missed in D112496.
2021-11-03 14:32:45 -07:00
Matthias Braun 847a680733 X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr
This is a re-commit of e2c7ee0743 which
was reverted in a2a58d91e8. This includes
a fix to consistently check for EFLAGS being live-out. See phabricator
review.

Original Summary:

This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:

    CMP x, 13
    ...
    CMP x, 12   ; can be removed if we change the SETg
    SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP

Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.

Differential Revision: https://reviews.llvm.org/D110867
2021-11-03 14:12:23 -07:00
Philip Reames 64990f1408 Revert "[indvars] Move a check slightlly earlier [NFC]"
This reverts commit 7ff943a9ed.

This wasn't NFC.  isSigned != !isUnsigned as there are also relational operators.
2021-11-03 13:38:52 -07:00
Kirill Stoimenov a55c4ec1ce [ASan] Process functions in Asan module pass
This came up as recommendation while reviewing D112098.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D112732
2021-11-03 20:27:53 +00:00
alex-t 0a3d755ee9 [AMDGPU] Enable divergence-driven BFE selection
Detailed description: This change enables the bit field extract patterns
selection to s_bfe_u32 or v_bfe_u32 dependent on the pattern root node
divergence.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D110950
2021-11-03 23:26:59 +03:00
Martin Storsjö a39eba7207 [Support] [Windows] Use RemoveFileOnSignal if unable to use the delete-on-close flag
This takes care of cleaning up the temp files on crashes. It doesn't
handle cleanup when explicitly killed though.

Differential Revision: https://reviews.llvm.org/D112710
2021-11-03 21:29:37 +02:00
Philip Reames 7ff943a9ed [indvars] Move a check slightlly earlier [NFC] 2021-11-03 12:24:10 -07:00
Philip Reames 3fc9882e88 [indvars] Rotate zext though icmp to reduce loop varying computation
This change looks for cases where we can prove that an exit test of a loop can be performed in a narrower bitwidth, and that by doing so we can replace a loop-varying extend with a loop-invariant truncate.

The motivation here is that doing this unblocks the trip count analysis for narrow IVs involved in extended compare exit tests. It also has the nice side effect of simply making the code faster, even if we gain no other benefit from the improved analysis ability.

I've noted a few places this could be extended, but I think this stands reasonable on it's own as well.

Differential Revision: https://reviews.llvm.org/D112262
2021-11-03 12:09:20 -07:00
Vitaly Buka 32eb697c0a [PassBuilder] Remove unused function after D113072 2021-11-03 12:03:17 -07:00
Vitaly Buka 3131714f8d [NFC][asan] Use AddressSanitizerOptions in ModuleAddressSanitizerPass
Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D113072
2021-11-03 11:32:14 -07:00
Kirill Stoimenov b3145323b5 Revert "[ASan] Process functions in Asan module pass"
This reverts commit 76ea87b94e.

Reviewed By: kstoimenov

Differential Revision: https://reviews.llvm.org/D113129
2021-11-03 18:01:01 +00:00
Kirill Stoimenov 76ea87b94e [ASan] Process functions in Asan module pass
This came up as recommendation while reviewing D112098.

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D112732
2021-11-03 17:51:01 +00:00
Harald van Dijk 889c2b97bd
[X86] Fix X32 indirect call generation
The check for whether a zero extension was needed was subtly wrong and
saw a value that was already 64 bits, so did not extend.

Fixes PR52357.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112860
2021-11-03 16:43:44 +00:00
Sanjay Patel c85df3c7d5 [InstCombine] refactor fold for icmp with trunc op; NFC
There are at least 3 related folds we can add here - see D112634.
2021-11-03 12:43:15 -04:00
Roman Lebedev 9c2469c1dd
[PassManager] `buildModuleOptimizationPipeline()`: schedule `LoopDeletion` pass run before vectorization passes
Test thanks to Michael Kuklinski from `#llvm`: https://godbolt.org/z/bdrah5Goo
originally inspired by Daniel Lemire's https://lemire.me/blog/2021/10/26/in-c-is-empty-faster-than-comparing-the-size-with-zero/

We manage to deduce that the answer does not require looping,
but we do that after the last `LoopDeletion` pass run,
so we end up being stuck with a dead loop.

Now, as with all things SCEV, this has
a very expected ~`+0.12%` compile time performance regression:
https://llvm-compile-time-tracker.com/compare.php?from=0ae7bf124a9bca76dd9a91b2f7379168ff13f562&to=c2ae57c9b961aeb4a28c747266949340613a6d84&stat=instructions
(for comparison, doing that in function simplification pipeline
would have been ~`+0.5` compile time performance regression, D112840)

Looking at the transformation stats over vanilla test-suite, i think it's rather expected:
```
| statistic name                                   |  baseline |  proposed |     Δ |      % |    |%| |
|--------------------------------------------------|----------:|----------:|------:|-------:|-------:|
| scalar-evolution.NumBruteForceTripCountsComputed |       789 |       888 |    99 | 12.55% | 12.55% |
| scalar-evolution.NumTripCountsNotComputed        |    105592 |    117900 | 12308 | 11.66% | 11.66% |
| loop-delete.NumBackedgesBroken                   |       542 |       559 |    17 |  3.14% |  3.14% |
| regalloc.numExtends                              |        81 |        79 |    -2 | -2.47% |  2.47% |
| indvars.NumFoldedUser                            |       408 |       400 |    -8 | -1.96% |  1.96% |
| indvars.NumElimCmp                               |      3831 |      3758 |   -73 | -1.91% |  1.91% |
| scalar-evolution.NumTripCountsComputed           |    299759 |    304278 |  4519 |  1.51% |  1.51% |
| loop-delete.NumDeleted                           |      8055 |      8128 |    73 |  0.91% |  0.91% |
| machine-cse.NumCommutes                          |       111 |       110 |    -1 | -0.90% |  0.90% |
| globaldce.NumFunctions                           |      1187 |      1192 |     5 |  0.42% |  0.42% |
| codegenprepare.NumSelectsExpanded                |       277 |       278 |     1 |  0.36% |  0.36% |
| loop-unroll.NumRuntimeUnrolled                   |     13841 |     13791 |   -50 | -0.36% |  0.36% |
| machinelicm.NumPostRAHoisted                     |      1168 |      1172 |     4 |  0.34% |  0.34% |
| phi-node-elimination.NumCriticalEdgesSplit       |     83054 |     82879 |  -175 | -0.21% |  0.21% |
| machine-cse.NumPREs                              |      3085 |      3079 |    -6 | -0.19% |  0.19% |
| branch-folder.NumBranchOpts                      |    108122 |    107942 |  -180 | -0.17% |  0.17% |
| loop-unroll.NumUnrolled                          |     40136 |     40067 |   -69 | -0.17% |  0.17% |
| branch-folder.NumDeadBlocks                      |    130818 |    130607 |  -211 | -0.16% |  0.16% |
| codegenprepare.NumBlocksElim                     |     92856 |     92714 |  -142 | -0.15% |  0.15% |
| instsimplify.NumSimplified                       |    103263 |    103129 |  -134 | -0.13% |  0.13% |
| instcombine.NumConstProp                         |     26070 |     26102 |    32 |  0.12% |  0.12% |
| instsimplify.NumExpand                           |      1716 |      1718 |     2 |  0.12% |  0.12% |
| loop-unroll.NumCompletelyUnrolled                |      9236 |      9225 |   -11 | -0.12% |  0.12% |
| branch-folder.NumHoist                           |      2773 |      2770 |    -3 | -0.11% |  0.11% |
| regalloc.NumReloadsRemoved                       |     10822 |     10834 |    12 |  0.11% |  0.11% |
| regalloc.NumSnippets                             |     11394 |     11406 |    12 |  0.11% |  0.11% |
| machine-cse.NumCrossBBCSEs                       |      1052 |      1053 |     1 |  0.10% |  0.10% |
| machinelicm.NumCSEed                             |     99887 |     99784 |  -103 | -0.10% |  0.10% |
| branch-folder.NumTailMerge                       |     72501 |     72435 |   -66 | -0.09% |  0.09% |
| codegenprepare.NumExtUses                        |     22007 |     21987 |   -20 | -0.09% |  0.09% |
| local.NumRemoved                                 |     68232 |     68294 |    62 |  0.09% |  0.09% |
| loop-vectorize.LoopsAnalyzed                     |     75483 |     75413 |   -70 | -0.09% |  0.09% |
```

Note that i'm only changing current PM, and not touching obsolete PM.

This is an alternative to the function simplification pipeline variant
of the same change, D112840. It has both less compile time impact
(since the additional number of SCEV trip count calculations
is way lass less than with the D112840), and it is
much more powerful/impactful (almost 2x more loops deleted).

I have checked, and doing this after loop rotation
is favorable (more loops deleted).

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D112851
2021-11-03 19:24:49 +03:00
Kazu Hirata 4bef0304e1 [AArch64, AMDGPU] Use make_early_inc_range (NFC) 2021-11-03 09:22:51 -07:00
Hans Wennborg a2a58d91e8 Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr"
This casued miscompiles of switches, see comments on the code review.

> This extends `optimizeCompareInstr` to re-use previous comparison
> results if the previous comparison was with an immediate that was 1
> bigger or smaller. Example:
>
>     CMP x, 13
>     ...
>     CMP x, 12   ; can be removed if we change the SETg
>     SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP
>
> Motivation: This often happens because SelectionDAG canonicalization
> tends to add/subtract 1 often when optimizing for fallthrough blocks.
> Example for `x > C` the fallthrough optimization switches true/false
> blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
> `x < C + 1`.
>
> Differential Revision: https://reviews.llvm.org/D110867

This reverts commit e2c7ee0743.
2021-11-03 17:01:36 +01:00
Roman Lebedev df93c8a919
[X86] `X86TTIImpl::getInterleavedMemoryOpCostAVX512()`: fallback to scalarization cost computation for mask
I don't really buy that masked interleaved memory loads/stores are supported on X86.
There is zero costmodel test coverage, no actual cost modelling for the generation
of the mask repetition, and basically only two LV tests.
Additionally, i'm not very interested in AVX512.

I don't know if this really helps "soft" block over at
https://reviews.llvm.org/D111460#inline-1075467,
but i think it can't make things worse at least.

When we are being told that there is a masking, instead of
completely giving up and falling back to
fully scalarizing `BasicTTIImplBase::getInterleavedMemoryOpCost()`,
let's correctly query the cost of masked memory ops,
keep all the pretty shuffle cost modelling,
but scalarize the cost computation for the mask replication.

I think, not scalarizing the shuffles themselves
may adjust the computed costs a bit,
and maybe hopefully just enough to hide the "regressions"
at https://reviews.llvm.org/D111460#inline-1075467
I do mean hide, because the test coverage is non-existent.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112873
2021-11-03 18:14:35 +03:00
Erich Keane 09233412ed Revert part of D112349 to allow ifunc resolvers be declarations.
The patch in D112349 added a previously nonexistant restriction on ifunc
resolvers that they MUST be defintions.  However, the function
multiversioning depends on being able to resolve these resolvers at
link-time, so this additional restriction was breaking.
2021-11-03 07:15:16 -07:00
David Sherwood c0f2774973 [NFC][LoopVectorize] Simple tidy-up in InnerLoopVectorizer::createVectorIntOrFpInductionPHI
Use getSignedIntOrFpConstant instead of creating int or FP constants
manually.
2021-11-03 14:05:21 +00:00
Peter Waller 7a34145f40 Reland "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store"
This reverts commit 753eba6421.

Contiguous gather => masked load:

  (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1))
  => (masked.load (gep BasePtr IndexBase) Align Mask undef)

Contiguous scatter => masked store:

  (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1))
  => (masked.store Value (gep BasePtr IndexBase) Align Mask)

Tests with <vscale x 2 x double>:

[Gather, Scatter] for each [Positive test (index=1), Negative test
(index=2), Alignment propagation].

Differential Revision: https://reviews.llvm.org/D112076
2021-11-03 13:42:14 +00:00
Peter Waller 753eba6421 Revert "[AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store"
This reverts commit 1febf42f03, which has
a use-of-uninitialized-memory bug.

See: https://reviews.llvm.org/D112076
2021-11-03 13:39:38 +00:00
Florian Hahn 64bc31ee93
[LV] Drop unneeded use of getVPSingleValue (NFC).
VPReductionPHIRecipe inherits from VPValue, so there's no need to call
getVPSingleValue.
2021-11-03 14:26:15 +01:00
Florian Hahn 8e44bdd12a
[VPlan] Make VPWidenCanonicalIVRecipe a VPValue (NFC).
The recipe produces exactly one VPValue and can inherit directly from
it. This is in line with other recipes and avoids having to use
getVPSingleValue.
2021-11-03 14:11:01 +01:00
Andrew Savonichev 123ad720f1 [NVPTX] Mark special registers as reserved
A reserved register:
 - is not allocatable
 - is considered always live
 - is ignored by liveness tracking

NVPTX special registers match the criteria, and marking them as
reserved helps to avoid machine verifier error:

    *** Bad machine code: Using an undefined physical register ***
    - function:    foo
    - basic block: %bb.0  (0x557bb178b708)
    - instruction: %0:int32regs = MOV_SPECIAL $envreg0
    - operand 1:   $envreg0

Differential Revision: https://reviews.llvm.org/D113008
2021-11-03 15:48:04 +03:00
Cullen Rhodes d968b173d3 [TableGen] Emit a warning for unused template args
Add a warning to TableGen for unused template arguments in classes and
multiclasses, for example:

  multiclass Foo<int x> {
    def bar;
  }

  $ llvm-tblgen foo.td

  foo.td:1:20: warning: unused template argument: Foo::x
  multiclass Foo<int x> {
                     ^
A flag '--no-warn-on-unused-template-args' is added to disable the
warning. The warning is disabled for LLVM and sub-projects if
'LLVM_ENABLE_WARNINGS=OFF'.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D109359
2021-11-03 11:55:07 +00:00
Andrew Savonichev 0e70785538 [NVPTX] Add MoveParam instruction for TargetExternalSymbol operand
TargetExternalSymbol is considered to be an immediate and not a
register, so machine verifier emits an error:

    *** Bad machine code: Expected a register operand. ***
    - function:    static_offset
    - basic block: %bb.0 bb (0x560e9b306028)
    - instruction: %3:int64regs = MoveParamI64 &static_offset_param_1
    - operand 1:   &static_offset_param_1

The patch adds variants of this instruction with an immediate operand
for byval arguments on 64-bit and 32-bit targets.

Differential Revision: https://reviews.llvm.org/D113006
2021-11-03 14:43:41 +03:00
David Green 3bc586b9aa [ARM] Treat MVE gather add-like-or's like adds
LLVM has the habit of turning adds with no common bits set into ors,
which means we need to detect them and treat them like adds again in the
MVE gather/scatter lowering pass.

Differential Revision: https://reviews.llvm.org/D112922
2021-11-03 11:41:06 +00:00
Peter Waller 1febf42f03 [AArch64][SVE][InstCombine] Combine contiguous gather/scatter to load/store
Contiguous gather => masked load:

  (sve.ld1.gather.index Mask BasePtr (sve.index IndexBase 1))
  => (masked.load (gep BasePtr IndexBase) Align Mask undef)

Contiguous scatter => masked store:

  (sve.ld1.scatter.index Value Mask BasePtr (sve.index IndexBase 1))
  => (masked.store Value (gep BasePtr IndexBase) Align Mask)

Tests with <vscale x 2 x double>:

[Gather, Scatter] for each [Positive test (index=1), Negative test (index=2), Alignment propagation].

Differential Revision: https://reviews.llvm.org/D112076
2021-11-03 11:02:44 +00:00
David Green d36dd1f842 [ARM] Push gather/scatter shl index updates out of loops
This teaches the MVE gather scatter lowering pass that SHL is
essentially the same as Mul, where we are able to optimize the
induction of a gather/scatter address by pushing them out of loops.
https://alive2.llvm.org/ce/z/wG4VyT

Differential Revision: https://reviews.llvm.org/D112920
2021-11-03 11:00:05 +00:00
Qiu Chaofan 741aeda97d [PowerPC] Implement longdouble pack/unpack builtins
Implement two builtins to pack/unpack IBM extended long double float,
according to GCC 'Basic PowerPC Builtin Functions Available ISA 2.05'.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D112055
2021-11-03 17:57:25 +08:00
Andrew Savonichev 30a3a17df8 [NVPTX] Copy machine operand flags in TII::insertBranch
Before this patch, flags such as undef were dropped by TII::insertBranch
(used by BranchFolding pass), resulting in the following error from
machine verifier:

    *** Bad machine code: Reading virtual register without a def ***
    - function:    hoge
    - basic block: %bb.0 bb (0x562e9c240e68)
    - instruction: CBranch %2:int1regs, %bb.3
    - operand 0:   %2:int1regs

Differential Revision: https://reviews.llvm.org/D113001
2021-11-03 12:38:27 +03:00
Yi Kong 803d4f8a35 [ARM][AsmParser] Don't emit "deprecated instruction in IT block" warning if requested
Also fixed formatting in AsmMatcherEmitter because it was confusing.

Differential Revision: https://reviews.llvm.org/D112993
2021-11-03 17:18:04 +08:00
Piotr Sobczak 03961709ed [InstCombine] Extend pattern to replace shuffle's insertelement operand
In D71220 a pattern was added to replace shuffle's insertelement operand
if inserted scalar is not demanded. The pattern was added only for
the case where the shuffle's mask size is equal to element's vector size.
However, that condition is not required because the pattern does not
change the shuffle vector size.

This patch extends the pattern to also include cases where shuffle's mask
size is not equal to element's vector size.

Differential Revision: https://reviews.llvm.org/D112318
2021-11-03 09:43:04 +01:00
Ben Shi 59c3b48d99 Revert "[AArch64] Optimize add/sub with immediate"
This reverts commit 3de3ca3137.
2021-11-03 14:15:21 +08:00
Chen Zheng 5a8b196340 [PowerPC] handle more splat loads without stack operation
This mostly improves splat loads code generation on Power7

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D106555
2021-11-03 05:17:41 +00:00
Johannes Doerfert d61aac76bf [OpenMP][FIX] Do not signal SPMD-mode but then keep generic-mode
If we assume SPMD-mode during the fixpoint iteration we have to execute
the kernel in SPMD-mode. If we change our mind during manifest there is
the chance of a mismatch between the simplification, e.g., of
`__kmpc_is_spmd_exec_mode` calls, and the execution mode. This problem
was introduced in D109438.

This patch is compromise to resolve the problem purely in OpenMP-opt
while trying to keep the benefits of D109438 around. This might not
always work, see `get_hardware_num_threads_in_block_fold` but it often
does. At the same time we do keep value specialization and execution
mode in sync.

Proper solutions to this problem should be considered. I believe a new
execution mode is the easiest way forward (Singleton-SPMD).
Alternatively, SPMD-mode execution can be used with a way to provide a
new thread_limit (here 1) to the runtime. This is more general and could
be useful if we see `num_threads` clauses or workshared loops with small
trip counts in the kernel. In either proposal we need to disable the
guarding for the kernel (which was the motivation for D109438).

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D112894
2021-11-02 23:22:04 -05:00
Johannes Doerfert 73720c8059 [OpenMP][FIX] Introduce and use a simple generic-mode barrier
Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was
OK to be used in the custom state machine. Now that SPMD barriers are
assumed to be aligned we need to use a "generic" barrier in places
that are not aligned.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D112893
2021-11-02 23:22:01 -05:00
Johannes Doerfert e6e440ae5f [OpenMP][FIX] Ensure guarding uses proper global name
Global symbols cannot have any name so we need to sanitize the string
first. Also remove an assertion that is not actually necessary nor
true in general.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D112892
2021-11-02 23:21:53 -05:00
Abinav Puthan Purayil fbe61fb0aa [AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation.
The function to generate S_MOV_B64_IMM_PSEUDO was recently modified to
optimize AGPR to AGPR copy but it missed checking for the SGPR
clobbering for the S_MOV_B64_IMM_PSEUDO generation.

Differential Revision: https://reviews.llvm.org/D113005
2021-11-03 09:09:24 +05:30
Ben Shi 3de3ca3137 [AArch64] Optimize add/sub with immediate
Optimize ([add|sub] r, imm) -> ([ADD|SUB] ([ADD|SUB] r, #imm0, lsl #12), #imm1),
if imm == (imm0<<12)+imm1. and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Optimize ([add|sub] r, imm) -> ([SUB|ADD] ([SUB|ADD] r, #imm0, lsl #12), #imm1),
if imm == -(imm0<<12)-imm1, and both imm0 and imm1 are non-zero 12-bit unsigned
integers.

Reviewed By: jaykang10, dmgreen

Differential Revision: https://reviews.llvm.org/D111034
2021-11-03 03:06:43 +00:00
Liren Peng 57e093162e [ScalarEvolution] Infer loop max trip count from array accesses
Data references in a loop should not access elements over the
statically allocated size. So we can infer a loop max trip count
from this undefined behavior.

Reviewed By: reames, mkazantsev, nikic

Differential Revision: https://reviews.llvm.org/D109821
2021-11-03 10:40:18 +08:00
Phoebe Wang 8f101971b6 [X86][VARARG] Assign MMO earlier to avoid prolog insert point been sunk across VASTART_SAVE_XMM_REGS
The changes in D80163 defered the assignment of MachineMemOperand (MMO)
until the X86ExpandPseudo pass. This will result in crash due to prolog
insert point been sunk across the pseudo instruction VASTART_SAVE_XMM_REGS.

Moving the assignment to the creation of the node can avoid the problem.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D112859
2021-11-03 10:13:32 +08:00
Mircea Trofin 34f4fe3a90 [NFC][Regalloc] Ensure Query::interferingVRegs is accurate.
To correctly use Query, one had to first call collectInterferingVRegs to
pre-cache the query result, then call interferingVRegs. Failing the
former, interferingVRegs could be stale. This did cause a bug which was
addressed in D98232, but the underlying usability issue of the Query API
wasn't.

This patch addresses the latter by making collectInterferingVRegs an
implementation detail, and having interferingVRegs play both roles. One
side-effect of this is that interferingVRegs is not const anymore.

Differential Revision: https://reviews.llvm.org/D112882
2021-11-02 18:26:54 -07:00
Kazu Hirata 1b108ab975 [Transforms] Use make_early_inc_range (NFC) 2021-11-02 18:13:23 -07:00
Hongtao Yu d0eb472f33 [llvm-profdata] Print out section flags for FunctionMetadata section
As titled.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D113064
2021-11-02 17:59:22 -07:00
Eli Friedman c964afb2c8 [AArch64] Diagnose large adrp offset on Windows.
On Windows, this relocation can only encode a 21-bit offset. Make sure
we emit an error, instead of silently truncating the offset.

Found investigating https://bugs.llvm.org/show_bug.cgi?id=52378

Differential Revision: https://reviews.llvm.org/D113051
2021-11-02 15:11:22 -07:00
Nikita Popov c00e9c6345 [BasicAA] Check known access sizes earlier (NFC)
All heuristics for variable accesses require both access sizes to
be known, so check this once at the start, rather than for each
particular heuristic.
2021-11-02 21:26:26 +01:00
Nikita Popov 0b6ed92c8a [BasicAA] Use early returns (NFC)
Reduce nesting in aliasGEP() a bit by returning early.
2021-11-02 21:17:36 +01:00
Simon Pilgrim 53900a19fd [X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper for splat of normal vector loads. NFCI.
Reapplied from rG1cfecf4fc427 with fix for PR51226 - ensure the load is a normal (non-ext) load.
2021-11-02 20:03:25 +00:00
Nikita Popov 51e9f33603 [BasicAA] Use saturating multiply on range if nsw
If we know that the var * scale multiplication is nsw, we can use
a saturating multiplication on the range (as a good approximation
of an nsw multiply). This recovers some cases where the fix from
D112611 is unnecessarily strict. (This can be further strengthened
by using a saturating add, but we currently don't track all the
necessary information for that.)

This exposes an issue in our NSW tracking for multiplies. The code
was assuming that (X +nsw Y) *nsw Z results in
(X *nsw Z) +nsw (Y *nsw Z) -- however, it is possible that the
distributed multiplications overflow, even if the non-distributed
one does not. We should discard the nsw flag if the the offset is
non-zero. If we just have (X *nsw Y) *nsw Z then concluding
X *nsw (Y *nsw Z) is fine.

Differential Revision: https://reviews.llvm.org/D112848
2021-11-02 20:27:39 +01:00
Chih-Ping Chen 2ed29d87ef [CodeView] Fortran debug info emission in Code View.
Differential Revision: https://reviews.llvm.org/D112826
2021-11-02 15:06:21 -04:00
Christopher Tetreault 5718b9f128 [NFC] Reformat VerifyPreservedCFG for non-CPP-aware syntax highlighters
* Move `);` outside the #ENDIF. Syntax highlighters that highlight missed
  closing parens, but are not aware of the C Preprocessor saw the original
  code as having missed parens.
2021-11-02 11:35:38 -07:00
Simon Pilgrim 82e0eb22af [X86][AVX] combineConcatVectorOps - use getBROADCAST_LOAD helper. NFCI.
This is part of rG1cfecf4fc427 that was reverted to fix PR51226 - concating the broadcasts is OK, its the splatted loads that crash (we're not detecting extloads). I'm still creating a reduced test case so haven't added the load handling again yet.
2021-11-02 18:04:35 +00:00
Fraser Cormack d065b03801 [RISCV] Optimize vp.load with an all-ones mask
Similar to D110206, this patch optimizes unmasked vp.load intrinsics to
avoid the need of a vmset instruction to set the mask. It does so by
selecting a riscv_vle intrinsic rather than a riscv_vle_mask intrinsic.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D113022
2021-11-02 17:23:39 +00:00
Dmitry Makogon e09958d5eb [LoopPeel] Peel loops with exits followed by an unreachable or deopt block
Added support for peeling loops with exits that are followed either by an
unreachable-terminated block or block that has a terminatnig deoptimize call.
All blocks in the sequence must have an unique successor, maybe except
for the last one.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D110922
2021-11-02 23:12:04 +07:00
Arthur Eubanks e2024d72fa Revert "[NFC] Remove LinkAll*.h"
This reverts commit fe364e5dc7.

Causes breakages, e.g. https://lab.llvm.org/buildbot/#/builders/188/builds/5266
2021-11-02 09:08:09 -07:00
Jamie Schmeiser 816761f044 Add new choices dot-cfg and dot-cfg-quiet to print-changed.
Summary:
Add new options -print-changed=[dot-cfg | dot-cfg-quiet] which create
a website of DOT files showing colourized changes as the IR is changed
by passes in the new pass manager pipeline.

A new change reporter is introduced that creates a website of changes made
by passes in the opt pipeline that change the IR. The hidden option
-dot-cfg-dir=<dir> specifies a directory (defaulting to "./") into which the
website will be created.

A file passes.html is created that contains a list of all the passes that
act on the IR. Those that do not change the IR are listed as omitted
because of no change, ignored or filtered out (using -filter-print-func
and -filter-passes) or not listed in quiet mode. Those that
do change the IR are listed as a link to a DOT file which contains a
CFG depiction of the IR (ala -dot-cfg) except that the instructions,
basic blocks and links that are only in the IR before the pass (ie, removed)
and those that are only in the IR after the pass (ie, added) are shown in
red and green, respectively, while the aspects of the CFG that do not change
are shown in black. Additional hidden options
-dot-cfg-before-color=<dot named color>,
-dot-cfg-after-color=<dot named color> and
-dot-cfg-common-color=<dot named color> are defined that allow the
customization of the colors used in colorizing the CFG.
-change-printer-dot-path=<path to dot exe> is also added.

Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D87202
2021-11-02 12:06:25 -04:00
Arthur Eubanks fe364e5dc7 [NFC] Remove LinkAll*.h
These were added to prevent functions from being removed by WPO.

But that doesn't make sense, correct WPO will not remove functions we actually use.

I noticed these because compiling cc1_main.cpp was pulling in random LLVM pass headers.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112971
2021-11-02 08:43:17 -07:00
Jay Foad be1a8f8834 [AMDGPU] Really preserve LiveVariables in SILowerControlFlow
https://bugs.llvm.org/show_bug.cgi?id=52204

Differential Revision: https://reviews.llvm.org/D112731
2021-11-02 15:03:37 +00:00
Matt 895145aacb Revert "[AArch64][SVE] Combine predicated FMUL/FADD into FMA"
This reverts commit fc28a2f8ce.
2021-11-02 14:56:01 +00:00
Youngsuk Kim 76b53da3ce
[SimpleLoopUnswitch] Remove duplicate include.
Header "llvm/Transforms/Scalar/SimpleLoopUnswitch.h" is currently
included twice. This commit removes the duplicate 'include' line.

Previous commit 693eedb138
seems to have mistakenly added the duplicate 'include'.

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D112979
2021-11-02 15:22:41 +01:00
Sanjay Patel 829146164f [InstCombine] change 'not' match for bitwise select
The tests diffs are logically equivalent, and so this is
generally NFC, but this makes the code match the code
comment.

It should also be more efficient. If we choose the 'not'
operand (rather than the 'not' instruction) as the select
condition, then we don't have to invert the select
condition/operands as a subsequent transform.
2021-11-02 10:16:01 -04:00
Simon Pilgrim e173631dd1 [X86][AVX] SimplifyDemandedVectorEltsForTargetNode - use getBROADCAST_LOAD helper. NFCI.
Reduce width of X86ISD::SUBV_BROADCAST_LOAD node.
2021-11-02 14:07:22 +00:00
Simon Pilgrim 8ca666a280 [X86][AVX] lowerV2X128Shuffle - use getBROADCAST_LOAD helper. NFCI. 2021-11-02 14:07:21 +00:00
Daniele Vettorel 67887b0f81 [Scalarizer] Do not insert instructions between PHI nodes and debug intrinsics.
The scalarizer pass seems to be inserting instructions in-between PHI nodes or debug intrinsics that end up staying at the end of the pass, resulting in malformed IR and violating assumptions.

This patch adds a check to make sure the `extractelement` instructions that it adds are correctly placed after all PHI nodes and debug intrinsics.

Patch by vettoreldaniele.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D112472
2021-11-02 09:53:59 -04:00
Martin Liska c5029023fb Fix building with GCC 12:
Fixes: https://bugs.llvm.org/show_bug.cgi?id=52380

Differential Revision: https://reviews.llvm.org/D112990
2021-11-02 14:28:00 +01:00
jacquesguan a39eadcf16 [DAGCombiner] Teach combineShiftToMULH to handle constant and const splat vector.
Fold (srl (mul (zext i32:$a to i64), i64:c), 32) -> (mulhu $a, $b),
if c can truncate to i32 without loss.

Reviewed By: frasercrmck, craig.topper, RKSimon

Differential Revision: https://reviews.llvm.org/D108129
2021-11-02 12:04:23 +00:00
David Callahan 4ec1b8eeac [RISCV] Fix invalid kill on callee save
A callee save may be live (specifically X1) on entry and so a spill
should not mark it killed.

Differential Revision: https://reviews.llvm.org/D111285
2021-11-02 11:56:54 +00:00
Simon Pilgrim 325031786e [SelectionDAG] Optimize expansion for rotates/funnel shifts
If the type of a funnel shift needs to be expanded, expand it to two funnel shifts instead of regular shifts. For constant shifts, this doesn't make much difference, but for variable shifts it allows a more optimal lowering.

Also use the optimized funnel shift lowering for rotates.

Alive2: https://alive2.llvm.org/ce/z/TvHDB- / https://alive2.llvm.org/ce/z/yzPept

(Branched from D108058 as getting this completed should help unlock some other WIP patches).

Original Patch: @efriedma (Eli Friedman)

Differential Revision: https://reviews.llvm.org/D112443
2021-11-02 11:38:25 +00:00
Simon Pilgrim 37e17f278f [DAG] MatchRotate - remove (redundant) legal type check.
Rely on the hasOperation() instead - as commented on D77804, the mid-term intention is to recognise rotate/funnel-by-constant pre-legalization to help avoid SimplifyDemandedBits regressions.
2021-11-02 11:24:50 +00:00
Frederic Cambus 650311737e [llvm-readobj] Add support for reading OpenBSD ELF core notes.
Notes generated in OpenBSD core files provide additional information
about the kernel state and CPU registers. These notes are described
in core.5, which can be viewed here: https://man.openbsd.org/core.5

Differential Revision: https://reviews.llvm.org/D111966
2021-11-02 10:18:54 +01:00
Rosie Sumpter dcb8222d87 [LoopVectorize] Propagate fast-math flags for inloop reductions
This patch updates VPReductionRecipe::execute so that the fast-math
flags associated with the underlying instruction of the VPRecipe are
propagated through to the reductions which are created.

Differential Revision: https://reviews.llvm.org/D112548
2021-11-02 08:59:53 +00:00
Kazu Hirata 6bdb61c58a [CodeGen] Use make_early_inc_range (NFC) 2021-11-01 22:38:49 -07:00
hsmahesha e9ea992496 [IR] Replace *all* uses of a constant expression by corresponding instruction
When a constant expression CE is being converted into a corresponding instruction I,
CE is supposed to be replaced by I. However, it is possible that CE is being used multiple
times within a parent instruction PI. Make sure that *all* the uses of CE within PI are
replaced by I.

Reviewed By: rampitec, arsenm

Differential Revision: https://reviews.llvm.org/D112717
2021-11-02 10:01:46 +05:30
Hongtao Yu d137854412 [SamplePGO] Fix callsite sample lookup to use dwarf names when dwarf linkage name isn't available.
When linkage name isn't available in dwarf (ususally the case of C code),  looking up callee samples should be based on the dwarf name instead of using an empty string.

Also fixing a test issue where using empty string to look up callee samples accidentally returns the correct samples because it is treated as indirect call.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D112948
2021-11-01 21:24:33 -07:00
Lang Hames e9014d9743 [ORC] Run incoming jit-dispatch calls via the TaskDispatcher in SimpleRemoteEPC.
Handlers for jit-dispatch calls are allowed to make their own EPC calls, so we
don't want to run these on the handler thread.
2021-11-01 15:49:14 -07:00
Wouter van Oortmerssen ac65366485 [WebAssembly] support "return" and unreachable code in asm type checker
To support return (it not being supported well was the ground cause for
https://github.com/WebAssembly/wasi-sdk/issues/200) we also have to have
at least a basic notion of unreachable, which in this case just means to stop
type checking until there is an end_block (an incoming control flow edge).
This is conservative (may miss on some type checking opportunities) but is
simple and an improvement over what we had before.

Differential Revision: https://reviews.llvm.org/D112953
2021-11-01 15:42:58 -07:00
Yonghong Song f63405f6e3 BPF: Workaround an InstCombine ICmp transformation with llvm.bpf.compare builtin
Commit acabad9ff6 ("[InstCombine] try to canonicalize icmp with
trunc op into mask and cmp") added a transformation to
convert "(conv)a < power_2_const" to "a & <const>" in certain
cases and bpf kernel verifier has to handle the resulted code
conservatively and this may reject otherwise legitimate program.

This commit tries to prevent such a transformation. A bpf backend
builtin llvm.bpf.compare is added. The ICMP insn, which is subject to
above InstCombine transformation, is converted to the builtin
function. The builtin function is later lowered to original ICMP insn,
certainly after InstCombine pass.

With this change, all affected bpf strobemeta* selftests are
passed now.

Differential Revision: https://reviews.llvm.org/D112938
2021-11-01 14:46:20 -07:00
Arthur Eubanks 029f1a5344 [LazyCallGraph] Skip blockaddresses
blockaddresses do not participate in the call graph since the only
instructions that use them must all return to someplace within the
current function. And passes cannot retrieve a function address from a
blockaddress.

This was suggested by efriedma in D58260.

Fixes PR50881.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D112178
2021-11-01 13:10:24 -07:00
Nikita Popov 4972d12185 [SCEV] Only add direct loop users (NFC)
It it now sufficient to track only direct addrec users of a loop,
and let the SCEVUsers mechanism track and invalidate transitive users.

Differential Revision: https://reviews.llvm.org/D112875
2021-11-01 18:49:43 +01:00
Cameron McInally 702fd3d323 [SVE] Fix VLS FMA matching for CodeGenOpt::Aggressive.
For NEON, FMA matching is done in the MachineCombiner, and not the
DAGCombiner. That causes problems with VLS lowering, since the
vectors are fixed width at the DAGCombiner, but are scalable in
the MachineCombiner. This patch corrects it by matching FMAs for
VLS vectors in the DAGCombiner.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D112557
2021-11-01 10:43:52 -07:00
Jay Foad b8016b626e [CodeGen] Tweak coding style in LivePhysRegs::stepForward. NFC. 2021-11-01 16:01:24 +00:00
Sanjay Patel 42c94bc1ab [InstCombine] allow vector splat matching for bitwise logic fold
Similar to 54e969cffd (and with cosmetic updates to hopefully
make that easier to read), this fold has been around since early
in LLVM history.

Intermediate folds have been added subsequently, so extra uses
are required to exercise this code.

The test example actually shows an unintended consequence with
extra uses - we end up with an extra instruction compared to what
we started with. But this at least makes scalar/vector consistent.

General proof:
https://alive2.llvm.org/ce/z/tmuBza
2021-11-01 11:39:48 -04:00
Kazu Hirata d000431fb2 [X86] Remove X86ELFObjectWriter in X86AsmBackend.cpp (NFC)
Note that the identically named class is defined in an anonymous
namespace in X86ELFObjectWriter.cpp.
2021-11-01 08:31:54 -07:00
Jay Foad 7afef22926 [AMDGPU] Use MachineInstrBuilder::addReg. NFC. 2021-11-01 15:29:51 +00:00
Jay Foad 2b548b18c1 [AMDGPU] Shrink v_mac_legacy_f32 and v_fmac_legacy_f32
Differential Revision: https://reviews.llvm.org/D112917
2021-11-01 13:55:53 +00:00
Matt Morehouse 4d8b0aa5c0 [HWASan] Apply TagMaskByte to every global tag.
Previously we only applied it to the first one, which could allow
subsequent global tags to exceed the valid number of bits.

Reviewed By: hctim

Differential Revision: https://reviews.llvm.org/D112853
2021-11-01 06:31:44 -07:00
Sanjay Patel 54e969cffd [InstCombine] allow vector splat matching for bitwise logic folds
This fold was added long ago (part of fixing PR4216),
and it matched scalars only. Intermediate folds have
been added subsequently, so extra uses are required
to exercise this code.

General proof:
https://alive2.llvm.org/ce/z/G6BBhB

One of the specific tests:
https://alive2.llvm.org/ce/z/t0JhEB
2021-11-01 08:26:42 -04:00
Sanjay Patel 511ee8759f [InstCombine] reduce code duplication with commutative matcher; NFC 2021-11-01 08:26:41 -04:00
Mubashar Ahmad 0b83a18a2b [AArch64] Enablement of Cortex-X2
Enables support for Cortex-X2 cores.

Differential Revision: https://reviews.llvm.org/D112459
2021-11-01 11:55:24 +00:00
Simon Pilgrim 6fc50e531d [CostModel][X86] Remove old FIXME comments for AVX512F vector splitting
Similar to AVX1, the cost of splitting/merging 512-bit -> 256-bits vectors for arithmetic operations are typically hidden due to different used ports etc.
2021-11-01 11:11:11 +00:00
Simon Pilgrim fd485d8cda [X86][AVX] Prefer VINSERTF128 over VPERM2F128 for 128->256 subvector concatenations
The VINSERTF128 instruction is often much quicker, and never slower, than the more general VPERM2F128 instruction, so we should try to use that in more circumstances.

This requires a fallback to a commuted VPERM2F128 for the case where we need to fold the 256-bit vector source instead of the 128-bit subvector source.

There is one interesting side effect - DAGCombine's narrowExtractedVectorLoad combine gets called in a number of locations, this often creates an extracted subvector load without regard to other uses of the original wider load. I'm expecting AVX cpus to be capable of merging such aliased loads, but I do wonder whether narrowExtractedVectorLoad's call to X86TargetLowering::shouldReduceLoadWidth needs to be altered to check for more partial uses?

Noticed while investigating the quality of interleaved load/store codegen.

Differential Revision: https://reviews.llvm.org/D111960
2021-11-01 10:45:50 +00:00
David Sherwood 87a294d5eb [LoopVectorize] Change getRuntimeVFAsFloat to use unsigned int->FP conversion
We never expect the runtime VF to be negative so we should use
the uitofp instruction instead of sitofp.

Differential revision: https://reviews.llvm.org/D112610
2021-11-01 09:58:14 +00:00
Roman Lebedev b554e41e2d
[CVP] Canonicalize signed relational comparisons of scalar integers to unsigned comparison predicates
Now that the reasoning was added to ConstantRange in D90924,
this replicates IndVars variant of this transform (D111836)
in a pass that uses value range reasoning for the transform.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112895
2021-11-01 12:16:05 +03:00
Jun Ma 1f9fa54984 [Taildup] Don't tail-duplicate loop header with multiple successors as its latches
when Taildup hit loop with multiple latches like:
  //    1 -> 2 <-> 3                 |
  //          \  <-> 4               |
  //           \   <-> 5             |
  //            \---> rest           |
it may transform this loop into multiple loops by duplicate loop header.
However, this change may has little benefit while makes cfg much complex.
In some uncommon cases, it causes large compile time regression (offered by
@alexfh in D106056).

This patch disable tail-duplicate of such cases.

TestPlan: check-llvm

Differential Revision: https://reviews.llvm.org/D110613
2021-11-01 15:32:00 +08:00
Jun Ma c93f93b2e3 Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when switch cover all possible values.""""
This reverts commit 3a998c06a8.
2021-11-01 15:31:59 +08:00
Kazu Hirata 476e1ee3da [AArch64] Remove unused declaration hasSwiftExtendedFrame (NFC) 2021-10-31 22:58:56 -07:00
Max Kazantsev e512c5b166 [SCEV][NFC] Factor out common API for getting unique operands of a SCEV
This function is used at least in 2 places, to it makes sense to make it separate.

Differential Revision: https://reviews.llvm.org/D112516
Reviewed By: reames
2021-11-01 11:36:47 +07:00
Chen Zheng eeed1545b2 [PowerPC] turn off chain commoning by default. 2021-11-01 04:11:10 +00:00
Itay Bookstein 848812a55e [Verifier] Add verification logic for GlobalIFuncs
Verify that the resolver exists, that it is a defined
Function, and that its return type matches the ifunc's
type. Add corresponding check to BitcodeReader, change
clang to emit the correct type, and fix tests to comply.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D112349
2021-10-31 20:00:57 -07:00
Zi Xuan Wu cf78715cae [CSKY] First patch to construct codegen infra and generate first add instruction
Ooops. It constructs codegen infra and provide only basic code to generate first add instruction successfully.

Differential Revision: https://reviews.llvm.org/D112206
2021-11-01 10:06:56 +08:00
Shoaib Meenai 0cf624cad7 [TimeProfiler] Reset variable to nullptr
Otherwise we'll hit a spurious assert failure when we reset and then
reinitialize TimeProfiler on the same thread, as can happen when e.g.
using LLD as a library and running it multiple times in the same
process.

Makes `lld/test/MachO/time-trace.s` pass with `LLD_IN_TEST=2`, which
runs the linker twice in the same process and exposed the issue.

Reviewed By: MaskRay, mehdi_amini

Differential Revision: https://reviews.llvm.org/D112880
2021-10-31 16:14:30 -07:00
Roman Lebedev 03a4f1f3b8
[ConstantRange] Sign-flipping of signedness-invariant comparisons
For certain combination of LHS and RHS constant ranges,
the signedness of the relational comparison predicate is irrelevant.

This implements complete and precise model for all predicates,
as confirmed by the brute-force tests. I'm not sure if there are
some more cases that we can handle here.

In a follow-up, CVP will make use of this.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D90924
2021-10-31 22:53:17 +03:00
Lang Hames ff846fcb64 [ORC][ORC-RT] Switch MachO EH/TLV registration from EPC-calls to alloc actions.
MachOPlatform used to make an EPC-call (registerObjectSections) to register the
eh-frame and thread-data sections for each linked object with the ORC runtime.

Now that JITLinkMemoryManager supports allocation actions we can use these
instead of an EPC call. This saves us one EPC-call per object linked, and
manages registration/deregistration in the executor, rather than the controller
process. In the future we may use this to allow JIT'd code in the executor to
outlive the controller object while still being able to be cleanly destroyed.

Since the code for allocation actions must be available when the actions are
run, and since the eh-frame registration code lives in the ORC runtime itself,
this change required that MachO eh-frame support be split out of
macho_platform.cpp and into its own macho_ehframe_registration.cpp file that has
no other dependencies. During bootstrap we start by forcing emission of
macho_ehframe_registration.cpp so that eh-frame registration is guaranteed to be
available for the rest of the bootstrap process. Then we load the rest of the
MachO-platform runtime support, erroring out if there is any attempt to use
TLVs. Once the bootstrap process is complete all subsequent code can use all
features.
2021-10-31 10:27:40 -07:00
Lang Hames b77c6db959 [JITLink] Fix alloc action call signature in InProcessMemoryManager.
Alloc actions should return a CWrapperFunctionResult. JITLink does not have
access to this type yet, due to library layering issues, so add a cut-down
version with a fixme.
2021-10-31 10:27:40 -07:00
Craig Topper ada5458521 [RISCV] Expand scalable vector bswap. Fix crash for bitreverse.
Fix LegalizeVectorOps to not try shuffle or unrolling expansions for
scalable vectors.

Differential Revision: https://reviews.llvm.org/D112236
2021-10-31 10:01:27 -07:00
Kazu Hirata 1a605f395f [CodeGen] Use make_early_inc_range (NFC) 2021-10-31 07:57:36 -07:00
Kazu Hirata 72710af233 [CodeGen, Target] Use MachineBasicBlock::terminators (NFC) 2021-10-31 07:57:34 -07:00
Kazu Hirata c714da2ceb [Transforms] Use {DenseSet,SetVector,SmallPtrSet}::contains (NFC) 2021-10-31 07:57:32 -07:00
Kazu Hirata 4cc7c4724f [MachineCSE] Use make_early_inc_range (NFC) 2021-10-30 19:00:23 -07:00
Kazu Hirata c8b1ed5fb2 [clang, llvm] Use Optional::getValueOr (NFC) 2021-10-30 19:00:21 -07:00
Lang Hames 213666f804 [ORC] Move CWrapperFunctionResult out of the detail:: namespace.
This type has been moved up into the llvm::orc::shared namespace.

This type was originally put in the detail:: namespace on the assumption that
few (if any) LLVM source files would need to use it. In practice it has been
needed in many places, and will continue to be needed until/unless
OrcTargetProcess is fully merged into the ORC runtime.
2021-10-30 16:12:45 -07:00
Kazu Hirata 5970249439 [Hexagon] Remove chksetELFHeaderEFlags (NFC)
The function was introduced without any use on Nov 9, 2015 in commit
7cd0892729.
2021-10-30 08:43:43 -07:00
Kazu Hirata c3d63a0697 [Hexagon] Remove ValidArch (NFC)
This function seems to be unused for at least one year.
2021-10-30 08:43:41 -07:00
Kazu Hirata c5cd371cc9 [Hexagon] Remove unused struct InstTy (NFC) 2021-10-30 08:43:39 -07:00
Roman Lebedev 25043c8276
[NFCI] Introduce `ICmpInst::compare()` and use it where appropriate
As noted in https://reviews.llvm.org/D90924#inline-1076197
apparently this is a pretty common pattern,
let's not repeat it yet again, but have it in a common place.

There may be some more places where it could be used,
but these are the most obvious ones.
2021-10-30 17:50:06 +03:00
David Green 2c4a9e830c [ValueTracking] Teach computeConstantRange that the maximum value of a half is 65504
The maximal value of a half is 0x7bff, which is 65504 when converted to
an integer. This patch teaches that to computeConstantRange to compute a
constant range with the correct maximum value.
https://alive2.llvm.org/ce/z/BV_Spb
https://alive2.llvm.org/ce/z/Nwuqvb

The maximum value for a float converted in the same way is 3.4e38, which
requires 129bits of data. I have not added that here as integer types so
larger are rare, compared to integers types larger than 17 bits require
for half floats.

The MVE tests change because instsimplify happens to be run as a part of
the backend, where it doesn't tend to for other backends.

Differential Revision: https://reviews.llvm.org/D112694
2021-10-30 14:27:38 +01:00
Christudasan Devadasan aa2d3b59ce GlobalISel/Utils: Use incoming regbank while constraining the superclasses
Register operands with superclasses can possibly have multiple regBanks
if they have different register types. The regBank ambiguity resolved
during regbankselect should be used to constrain the operand regclass
instead of obtaining one from the MCInstrDesc.

This is a prerequisite patch for D109300 that introduces allocatable AV_*
Superclasses for AMDGPU by combining both VGPRs and AGPRs and we want to
restrain the regclass to either A or V based on the incoming regbank.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112323
2021-10-30 07:20:45 -04:00
David Green 66281baea1 [InstCombine] Fix type of constant in canonicalizeClampLike
As a followup to D108049, one of the constants could now be generated
with an incorrect type, now that the input could be truncated.
2021-10-30 09:06:21 +01:00
Kazu Hirata 972d4133e9 Use {DenseSet,SmallPtrSet}::contains (NFC) 2021-10-29 20:26:07 -07:00
Lang Hames afeb1e4ac7 [ORC] Move all pass config into MachOPlatformPlugin::modifyPassConfig.
NFC, this just makes it easier to see and reason about pass ordering.
2021-10-29 20:07:45 -07:00
Duncan P. N. Exon Smith 0d5b6423ba Support: Reduce stats in fs::copy_file on Darwin
fs::copy_file() on Darwin has a nice optimization to clone the file when
possible. Change the implementation to use clonefile() directly, instead
of the higher-level copyfile().  The latter does the wrong thing for
symlinks, which requires calling `stat` first...

With that out of the way, optimistically call clonefile() all the time,
and then for any error that's recoverable try again with copyfile()
(without the COPYFILE_CLONE flag, as before).

Differential Revision: https://reviews.llvm.org/D112250
2021-10-29 16:48:35 -07:00
Stanislav Mekhanoshin e5340ed30c [AMDGPU] Fix global isel for kernels using agprs on gfx90a
With Global ISel getReservedRegs() is called before function is
regbank selected for the first time. Defer caching of usesAGPRs()
in this case.

Differential Revision: https://reviews.llvm.org/D112644
2021-10-29 14:23:14 -07:00
Florian Hahn 274a9b0f0b
[DSE] Support redundant stores eliminated by memset.
This patch adds support to remove stores that write the same value
as earlier memesets.

It uses isOverwrite to check that a memset completely overwrites a later
store. The candidate store must store the same bytewise value as the
byte stored by the memset.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D112321
2021-10-29 22:19:53 +01:00
Nikita Popov cdf45f98ca [BasicAA] Extract linear expression multiplication (NFC)
Extract a common method for multiplying a linear expression by a
factor.
2021-10-29 22:41:40 +02:00
Sam Clegg 3b039c68f2 Revert "[WebAssembly] Fix debug locations for ExplicitLocals pass"
This reverts commit a66451ebbe.

This caused a failure when integrated with emscripten:
https://ci.chromium.org/ui/p/emscripten-releases/builders/try/linux/b8832019855439718609/overview
2021-10-29 13:34:18 -07:00
Nikita Popov 7cf7378a9d [BasicAA] Don't treat non-inbounds GEP as nsw
The scale multiplication is only guaranteed to be nsw if the GEP
is inbounds (or the multiplication is trivial). Previously we were
only considering explicit muls in GEP indices.
2021-10-29 22:30:44 +02:00
Nick Desaulniers 39e5dd113f [SparcISelLowering] avoid emitting libcalls to __muloti4 and __mulodi4
These compiler-rt-only symbols aren't available in libgcc.  Similar to
D108842, D108844, and D108926.

Fixes: pr/52043

Reviewed By: craig.topper, rengolin

Differential Revision: https://reviews.llvm.org/D112750
2021-10-29 13:14:09 -07:00
Sanjay Patel 285b8abce4 [x86] limit vector increment fold to allow load folding
The tests are based on the example from:
https://llvm.org/PR52032

I suspect that it looks worse than it actually is. :)
That is, llvm-mca says there's no uop/timing difference with the
load folding and pcmpeq vs. broadcast on Haswell (and probably
other targets).
The load-folding definitely makes the code smaller, so it's good
for that at least. So this requires carving a narrow hole in the
transform to get just this case without changing others that look
good as-is (in other words, the transform still seems good for
most examples).

Differential Revision: https://reviews.llvm.org/D112464
2021-10-29 15:48:35 -04:00
Sanjay Patel 837518d6a0 [x86] make mayFold* helpers visible to more files; NFC
The first function is needed for D112464, but we might
as well keep these together in case the others can be
used someday.
2021-10-29 15:48:35 -04:00
Sanjay Patel 8f786b4618 [InstCombine] fix comments to match code; NFC 2021-10-29 15:48:35 -04:00
modimo 5caad9b5d3 [InlineAdvisor] Add fallback/format switches and negative remark processing to Replay Inliner
Adds the following switches:

1. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback: controls what the replay advisor does for inline sites that are not present in the replay. Options are:

 1. Original: defers to original advisor
 2. AlwaysInline: inline all sites not in replay
 3. NeverInline: inline no sites not in replay

2. --sample-profile-inline-replay-format/--cgscc-inline-replay-format: controls what format should be generated to match against the replay remarks. Options are:

  1. Line
  2. LineColumn
  3. LineDiscriminator
  4. LineColumnDiscriminator

Adds support for negative inlining decisions. These are denoted by "will not be inlined into" as compared to the positive "inlined into" in the remarks.

All of these together with the previous `--sample-profile-inline-replay-scope/--cgscc-inline-replay-scope` allow tweaking in how to apply replay. In my testing, I'm using:
1. --sample-profile-inline-replay-scope/--cgscc-inline-replay-scope = Function to only replay on a function
2. --sample-profile-inline-replay-fallback/--cgscc-inline-replay-fallback = NeverInline since I'm feeding in only positive remarks to the replay system
3. --sample-profile-inline-replay-format/--cgscc-inline-replay-format = Line since I'm generating the remarks from DWARF information from GCC which can conflict quite heavily in column number compared to Clang

An alternative configuration could be to do Function, AlwaysInline, Line fallback with negative remarks which closer matches the final call-sites. Note that this can lead to unbounded inlining if a negative remark doesn't match/exist for one reason or another.

Updated various tests to cover the new switches and negative remarks

Testing:
ninja check-all

Reviewed By: wenlei, mtrofin

Differential Revision: https://reviews.llvm.org/D112040
2021-10-29 12:32:03 -07:00
Duncan P. N. Exon Smith 9902362701 Support: Use sys::path::is_style_{posix,windows}() in a few places
Use the new sys::path::is_style_posix() and is_style_windows() in a few
places that need to detect the system's native path style.

In llvm/lib/Support/Path.cpp, this patch removes most uses of the
private `real_style()`, where is_style_posix() and is_style_windows()
are just a little tidier.

Elsewhere, this removes `_WIN32` macro checks. Added a FIXME to a
FileManagerTest that seemed fishy, but maintained the existing
behaviour.

Differential Revision: https://reviews.llvm.org/D112289
2021-10-29 12:09:41 -07:00
modimo 51ce567b38 [SampleProfile] Add all callsites to AllCandidates if InlineReplay is in effect
Replay in sample profiling needs to be asked on candidates that may not have counts or below the threshold. If replay is in effect for a function make sure these are captured and also imported during thinLTO.

Testing:
ninja check-all

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D112033
2021-10-29 12:04:52 -07:00
Roman Lebedev 0ae7bf124a
[NFC][LoopDeletion] Count the number of broken backedges
Those don't contribute to the number of deleted loops.
2021-10-29 21:58:16 +03:00
Amara Emerson 5dd9e019dd [AArch64][GlobalISel] Fix an crash in RBS due to a new regclass being added.
rdar://84674985
2021-10-29 11:47:00 -07:00
Duncan P. N. Exon Smith 4e4883e1f3 Support: Expose sys::path::is_style_{posix,windows,native}()
Expose three helpers in namespace llvm::sys::path to detect the
path rules followed by sys::path::Style.

- is_style_posix()
- is_style_windows()
- is_style_native()

This are constexpr functions that that will allow a bunch of
path-related code to stop checking `_WIN32`.

Originally I looked at adding system_style(), analogous to
sys::endian::system_endianness(), but future patches (from others) will
add more Windows style variants for slash preferences. These helpers
should be resilient to that change, allowing callers to detect basic
path rules.

Differential Revision: https://reviews.llvm.org/D112288
2021-10-29 11:46:44 -07:00
Sanjay Patel d0e9879d96 [InstCombine] allow vector splat matching for bitwise logic folds
These transforms are also likely missing a one-use check,
but that's another patch.
2021-10-29 14:22:50 -04:00
Stanislav Mekhanoshin a905c54b76 [InstCombine] Fold `(~(a | b) & c) | ~(a | c)` into `~((b & c) | a)`
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
  %or1 = or i4 %b, %a
  %not1 = xor i4 %or1, -1
  %or2 = or i4 %a, %c
  %not2 = xor i4 %or2, -1
  %and = and i4 %not2, %b
  %or3 = or i4 %and, %not1
  ret i4 %or3
}

define i4 @tgt(i4 %a, i4 %b, i4 %c) {
  %and = and i4 %c, %b
  %or = or i4 %and, %a
  %or3 = xor i4 %or, -1
  ret i4 %or3
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112338
2021-10-29 10:58:09 -07:00
Matt Morehouse 33cc0cfd46 [X86] Don't affect jump tables under +tagged-globals.
`classifyLocalReference(nullptr)` is called to get the appropriate
relocation type for jump tables.  We should not use @GOTPCREL for this
case.

The new test cases trigger assertions without this patch.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D112832
2021-10-29 10:37:43 -07:00
Fraser Cormack 8314a04ede [SelectionDAG] Allow FindMemType to fail when widening loads & stores
This patch removes an internal failure found in FindMemType and "bubbles
it up" to the users of that method: GenWidenVectorLoads and
GenWidenVectorStores. FindMemType -- renamed findMemType -- now returns
an optional value, returning None if no such type is found.

Each of the aforementioned users now pre-calculates the list of types it
will use to widen the memory access. If the type breakdown is not
possible they will signal a failure, at which point the compiler will
crash as it does currently.

This patch is preparing the ground for alternative legalization
strategies for vector loads and stores, such as using vector-predication
versions of loads or stores.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112000
2021-10-29 18:27:31 +01:00
Craig Topper aefcd59895 [RISCV] Teach RISCVInsertVSETVLI::needVSETVLI to handle mask register instructions better.
If the VL operand of a mask register instruction comes from an
explicit vsetvli with a different VTYPE, we can still avoid needing
a vsetvli as long as the SEW/LMUL ratio is the same and policy bits
match.

Differential Revision: https://reviews.llvm.org/D112762
2021-10-29 09:49:36 -07:00
Simon Pilgrim 6102e5d56b [CostModel][X86] Remove old TODO comment
BMI (TZCNT) scalar handling was added at rGa2db388dce77c2f23f2009d7363a0b63bb54523c
2021-10-29 17:28:45 +01:00
Mircea Trofin d6790a0a3c [NFC] ProfileSummary: const most of the fields.
This simplifies readability / maintainability.
2021-10-29 08:36:08 -07:00
Bradley Smith 86972f1114 [AArch64][SVE] Use TargetFrameIndex in more SVE load/store addressing modes
Add support for generating TargetFrameIndex in complex patterns for
indexed addressing modes in SVE. Additionally, add missing load/stores
to getMemOpInfo and getLoadStoreImmIdx.

Differential Revision: https://reviews.llvm.org/D112617
2021-10-29 14:44:16 +00:00
Jay Foad 56f03d25b4 [IR] Remove createReplacementInstr. NFC.
It is unused since D112791.

Differential Revision: https://reviews.llvm.org/D112795
2021-10-29 15:03:19 +01:00
Jay Foad 1b758925ad [IR] Merge createReplacementInstr into ConstantExpr::getAsInstruction
createReplacementInstr was a trivial wrapper around
ConstantExpr::getAsInstruction, which also inserted the new instruction
into a basic block. Implement this directly in getAsInstruction by
adding an InsertBefore parameter and change all callers to use it. NFC.

A follow-up patch will remove createReplacementInstr.

Differential Revision: https://reviews.llvm.org/D112791
2021-10-29 15:02:58 +01:00
Jay Foad 21a1d4cf71 [AMDGPU] Change numBitsSigned for simplicity and document it. NFC.
Change numBitsSigned to return the minimum size of a signed integer that
can hold the value. This is different by one from the previous result
but is more consistent with numBitsUnsigned. Update all callers. All
callers are now more consistent between the signed and unsigned cases,
and some callers get simpler, especially the ones that deal with
quantities like numBitsSigned(LHS) + numBitsSigned(RHS).

Differential Revision: https://reviews.llvm.org/D112813
2021-10-29 14:22:06 +01:00
Chen Zheng 7591d21032 [PowerPC] fix a miscompile for Solaris build 2021-10-29 12:06:25 +00:00
Bradley Smith bf72a469ba [AArch64][SVE] Fix build failure introduced in 13faa5f440 2021-10-29 11:57:02 +00:00
David Green 11630dbbc3 [InstCombine] Fold BW/2+1 tops bits are same pattern
Match "icmp eq (trunc (lsr A, BW), (ashr (trunc A), BW-1))", which checks
the top BW/2 + 1 bits are all the same. Create "A >=s INT_MIN && A <=s
INT_MAX", which we generate as "icmp ult (add A, 2^BW-1), 2^BW" to skip
a few steps of instcombining.
https://alive2.llvm.org/ce/z/NjH6Ty
https://alive2.llvm.org/ce/z/_fEQ9P

Differential Revision: https://reviews.llvm.org/D109155
2021-10-29 12:30:20 +01:00
Simon Pilgrim 154c036ebb [X86] combineX86GatherScatter - only fold scale if the index isn't extended
As mentioned on D108539, when the gather indices are smaller than the pointer size, they are sign-extended BEFORE scale is applied, making the general fold unsafe.

If the index have sufficient sign-bits then folding the scale could be safe - I'll investigate this.
2021-10-29 11:48:05 +01:00
David Green 9020e22a87 [InstCombine] Convert xor (ashr X, BW-1), C -> select(X >=s 0, C, ~C)
The sequence of instructions `xor (ashr X, BW-1), C` (or with a truncation
`xor (trunc (ashr X, BW-1)), C)` takes a value, produces all zeros or all
ones and with it optionally inverts a constant depending on whether the
original input was positive or negative. This is the same as checking if
the value is positive, and selecting between the constant and ~constant.
https://alive2.llvm.org/ce/z/NJ85qY

This is a fairly general version of a fold that helps pull saturating
arithmetic into a canonical form.

Differential Revision: https://reviews.llvm.org/D109151
2021-10-29 11:19:20 +01:00
Bradley Smith 13faa5f440 [AArch64][SVE] Generate SVE >1 element structured load/stores from fixed types
This adds support for SVE structured loads/stores to the relevant target
hooks, such that we can support these instructions in the InterleavedAccess
pass.

Depends on D112078

Differential Revision: https://reviews.llvm.org/D112303
2021-10-29 09:35:57 +00:00
Cullen Rhodes 8686626244 [Sparc] NFC: Remove unused tblgen template args
Identified in D109359.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D109712
2021-10-29 09:16:15 +00:00
Neubauer, Sebastian c78640ee6a [TailDuplicator] Fix merging block with terminator
The TailDuplicator merged two blocks, even if the first one ended with
a terminator, resulting in invalid MIR, where a terminator is in the
middle of a block.

Abort merging if the first block ends with a terminator.

Differential Revision: https://reviews.llvm.org/D112226
2021-10-29 10:52:46 +02:00
Vang Thao 52b43d1549 [AMDGPU] Fix cvt_f32_ubyte combine with shl
Shift node is still needed to check if the shift is shr or shl to increment/decrement offset. Do not override the node.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D112733
2021-10-28 21:43:06 -07:00
Chuanqi Xu bb16e83932 [NFC] [Coroutines] Use llvm::make_scope_exit to replace self-defined RTTIHelper 2021-10-29 12:14:20 +08:00
Kazu Hirata 01b4789b62 [AMDGPU] Remove hasDefinedInitializer (NFC)
The last use was removed on Sep 16, 2021 in commit
7a62a5b56d.
2021-10-28 20:33:34 -07:00
Kazu Hirata dd5d46b009 [AMDGPU] Remove unused BBSelectRegister in AMDGPUMachineCFGStructurizer (NFC)
This field seems to be unused for at least one year.
2021-10-28 20:33:32 -07:00
Kazu Hirata 309357c01a [AMDGPU] Remove unused declaration eliminateDeadBranchOperands (NFC) 2021-10-28 20:33:30 -07:00
Abinav Puthan Purayil db8d7b6e2d [DAGCombine][NFC] s/it's/its in the comment of hasNoInfs(). 2021-10-29 07:36:38 +05:30
Lang Hames 12b2cc2294 [ORC] Rename SupportFunctionCall to WrapperFunctionCall.
The new name better suits the type.

This patch also changes the signature of the run method (it now returns a
WrapperFunctionResult), and adds runWithSPSRet methods that deserialize the
function result using SPS.

Together these chages bring this type into close alignment with its ORC runtime
counterpart.
2021-10-28 17:48:54 -07:00
Lang Hames 999c6a235e Reapply e32b1eee6a "[ORC] Change SPSExecutorAddr serialization,..." with fixes.
This re-applies e32b1eee6a, which was reverted in
20675d8f7d due to broken unit tests. This patch
includes fixes for the tests.
2021-10-28 16:40:25 -07:00
Thomas Lively fb67f3d969 [WebAssembly] Add prototype relaxed float to int trunc instructions
Add i32x4.relaxed_trunc_f32x4_s, i32x4.relaxed_trunc_f32x4_u,
i32x4.relaxed_trunc_f64x2_s_zero, i32x4.relaxed_trunc_f64x2_u_zero.

These are only exposed as builtins, and require user opt-in.

Differential Revision: https://reviews.llvm.org/D112186
2021-10-28 14:01:53 -07:00
Daniel Kiss d8075e8781 Reland "[ARM] __cxa_end_cleanup should be called instead of _UnwindResume."
This is relanding commit da1d1a0869 .
This patch additionally addresses failures found in buildbots & post review comments.

ARM EHABI[1] specifies the __cxa_end_cleanup to be called after cleanup.
It will call the UnwindResume.
__cxa_begin_cleanup will be called from libcxxabi while __cxa_end_cleanup is never called.
This will trigger a termination when a foreign exception is processed while UnwindResume is called
because the global state will be wrong due to the missing __cxa_end_cleanup call.

Additional test here: D109856
[1] https://github.com/ARM-software/abi-aa/blob/main/ehabi32/ehabi32.rst#941compiler-helper-functions

Reviewed By: logan

Differential Revision: https://reviews.llvm.org/D111703
2021-10-28 21:45:09 +02:00
Wouter van Oortmerssen a66451ebbe [WebAssembly] Fix debug locations for ExplicitLocals pass
Differential Revision: https://reviews.llvm.org/D112487
2021-10-28 12:35:46 -07:00
Stanislav Mekhanoshin f7f430c913 [InstCombine] Fixed non-determinisctic order of new instructions
Fixes non-determinisctic order of XOR instructions created after
5a7a458306. The order of call argument evaluation is not
defined, so create one Value before the call.
2021-10-28 12:14:02 -07:00
Stanislav Mekhanoshin 5a7a458306 [InstCombine] Fold `(c & ~(a | b)) | (b & ~(a | c))` to `~a & (b ^ c)`
```
----------------------------------------
define i4 @src(i4 %a, i4 %b, i4 %c) {
%0:
  %or1 = or i4 %a, %b
  %not1 = xor i4 %or1, 15
  %and1 = and i4 %not1, %c
  %or2 = or i4 %a, %c
  %not2 = xor i4 %or2, 15
  %and2 = and i4 %not2, %b
  %or3 = or i4 %and1, %and2
  ret i4 %or3
}
=>
define i4 @tgt(i4 %a, i4 %b, i4 %c) {
%0:
  %xor = xor i4 %b, %c
  %not = xor i4 %a, 15
  %or3 = and i4 %xor, %not
  ret i4 %or3
}
Transformation seems to be correct!
```

Differential Revision: https://reviews.llvm.org/D112276
2021-10-28 11:54:30 -07:00
Ahmed Bougacha bef777206e [AArch64] Rename some timm predicates for consistency. NFC.
timm isn't the common case, and TImmLeafs should make it clear what
they are.  We're adding a plain ImmLeaf for 0_65535, so rename
i64_imm0_65535 to timm64_0_65535, and imm32_0_7 to timm32_0_7.
2021-10-28 11:41:29 -07:00
Yuanfang Chen ac02bcad56 [IRSymTab] Mark __stack_chk_guard used
`__stack_chk_guard` is a global variable that has no uses before the LLVM code generation phase (how it is defined is platform-dependent). LTO needs to preserve this symbol for that reason. Currently, legacy LTO API preserves it by hardcoding the logic in Internalizer, but this symbol is not preserved by regular LTO API in thinlink phase. This patch marks `__stack_chk_guard` used during IR symbol table writing since this is how builtin functions are preserved by thinlink by using `RuntimeLibcalls.def`.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D112595
2021-10-28 11:22:26 -07:00
Yuanfang Chen c18ed69873 [Internalize] Preserve __stack_chk_fail in Internalizer correctly
Move the section collecting `AlwaysPreserved` up before any
`maybeInternalize` is called. Otherwise, functions in `AlwaysPreserved` (in this case, `__stack_chk_fail`)
are not preserved.

Reviewed By: MaskRay, tejohnson

Differential Revision: https://reviews.llvm.org/D112684
2021-10-28 11:22:26 -07:00
Guozhi Wei 1e46dcb77b [TwoAddressInstructionPass] Put all new instructions into DistanceMap
In function convertInstTo3Addr, after converting a two address instruction into
three address instruction, only the last new instruction is inserted into
DistanceMap. This is wrong, DistanceMap should track all instructions from the
beginning of current MBB to the working instruction. When a two address
instruction is converted to three address instruction, multiple instructions may
be generated (usually an extra COPY is generated), all of them should be
inserted into DistanceMap.

Similarly when unfolding memory operand in function tryInstructionTransform
DistanceMap is not maintained correctly.

Differential Revision: https://reviews.llvm.org/D111857
2021-10-28 11:11:59 -07:00
Matthias Braun e2c7ee0743 X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr
This extends `optimizeCompareInstr` to re-use previous comparison
results if the previous comparison was with an immediate that was 1
bigger or smaller. Example:

    CMP x, 13
    ...
    CMP x, 12   ; can be removed if we change the SETg
    SETg ...    ; x > 12  changed to `SETge` (x >= 13) removing CMP

Motivation: This often happens because SelectionDAG canonicalization
tends to add/subtract 1 often when optimizing for fallthrough blocks.
Example for `x > C` the fallthrough optimization switches true/false
blocks with `!(x > C)` --> `x <= C` and canonicalization turns this into
`x < C + 1`.

Differential Revision: https://reviews.llvm.org/D110867
2021-10-28 10:33:56 -07:00
Matthias Braun 97a1570d8c X86InstrInfo: Optimize more combinations of SUB+CMP
`X86InstrInfo::optimizeCompareInstr` would only optimize a `SUB`
followed by a `CMP` in `isRedundantFlagInstr`. This extends the code to
also look for other combinations like `CMP`+`CMP`, `TEST`+`TEST`, `SUB
x,0`+`TEST`.

- Change `isRedundantFlagInstr` to run `analyzeCompareInstr` on the
  candidate instruction and compare the results. This normalizes things
  and gives consistent results for various comparisons (`CMP x, y`,
  `SUB x, y`) and immediate cases (`TEST x, x`, `SUB x, 0`,
  `CMP x, 0`...).
- Turn `isRedundantFlagInstr` into a member function so it can call
  `analyzeCompare`.  - We now also check `isRedundantFlagInstr` even if
  `IsCmpZero` is true, since we now have cases like `TEST`+`TEST`.

Differential Revision: https://reviews.llvm.org/D110865
2021-10-28 10:33:56 -07:00
Florian Hahn c45045bfd0
[VPlan] Keep induction recipes in header.
This patch updates recipe creation to ensure all
VPWidenIntOrFpInductionRecipes are in the header block. At the moment,
new induction recipes can be created in different blocks when trying to
optimize casts and induction variables.

Having all induction recipes in the header makes it easier to
analyze/transform them in VPlan.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D111300
2021-10-28 18:22:05 +01:00
Nicolai Hähnle b437aaa672 MachineDominators: Define MachineDomTree type alias
This is a (very) small move towards making the machine dominators more
aligned with the IR dominators:

* DominatorTree / MachineDomTree is the class holding the dominator tree
* DominatorTreeWrapperPass / MachineDominatorTree is the corresponding
  (machine) function pass

This alignment will be used by analyses that are designed as templates
that work with LLVM IR as well as Machine IR.

Reviewed By: critson

Differential Revision: https://reviews.llvm.org/D112690
2021-10-28 22:30:35 +05:30
Leonard Grey 793b481f54 [CGProfile] Don't emit call graph profile edges with zero weight
With D112160 and D112164, on a Chrome Mac build this reduces the total
size of CGProfile sections by 78% (around 25% eliminated entirely) and
total size of object files by 0.14%.

Differential Revision: https://reviews.llvm.org/D112655
2021-10-28 11:32:49 -04:00