Commit Graph

146035 Commits

Author SHA1 Message Date
Serguei Katkov cf0d3477aa [GreedyRA ORE] Separate Folder Reloads and Zero Cost Folder Reloads
Patchpoint instructions have operands which is actually zero cost
(or the same as register) to use the value from the stack.
In terms of statistic it makes same to separate them.

Move from computation instructions related to stack spill/reload to
number of stack slot referenced.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100016
2021-04-14 14:25:28 +07:00
Bogdan Graur 0acf4e5005
[NFC] Fix unused warning.
Differential Revision: https://reviews.llvm.org/D100449
2021-04-14 09:09:20 +02:00
Serguei Katkov 02265ed7ad [Live Intervals] Teach Greedy RA to recognize special case live-through
Statepoint instruction has a deopt section which is actually live-through the call.
Currently this is handled by special post pass after RA - fixup-statepoint-caller-saved.

This change teaches Greedy RA that if segment of live interval is ended with statepoint
instruction and its reg is used in deopt bundle then this live interval interferes regmask of this statepoint
and as a result caller-saved register cannot be assigned to this live interval.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100296
2021-04-14 13:26:49 +07:00
Min-Yih Hsu 91b6ef64db [M68k] Put M68kInfo as the direct library dependency for AsmParser
M68kAsmParser uses `llvm::getTheM68kTarget` from M68kInfo, therefore we
should put M68kInfo as its direct dependency. Otherwise the build will
fail when building LLVM libraries as shared objects (building LLVM
libraries statically won't have this problem though).
2021-04-13 21:21:02 -07:00
Serguei Katkov d5ed0d4816 [Live Intervals] Factor-out unionBitMask. NFC.
For further re-usage in other place.
2021-04-14 10:39:01 +07:00
Wang, Pengfei a3b52a9d13 [X86][AMX] Refactor for PostRA ldtilecfg pass.
This is a follow up of D99010. We didn't consider the live range of shape registers when hoist ldtilecfg. There maybe risks, e.g. we happen to insert it to an invalid range of some registers and get unexpected error.

This patch fixes this problem by storing the value to corresponding stack place of ldtilecfg after all its definition immediately.

This patch also fix a problem in previous code: If we don't have a ldtilecfg which dominates all AMX instructions, we cannot initialize shapes for other ldtilecfg.

There're still some optimization points left. E.g. eliminate unused mov instructions, break the def-use dependency before RA etc.

Reviewed By: LuoYuanke, xiangzhangllvm

Differential Revision: https://reviews.llvm.org/D99966
2021-04-14 10:08:23 +08:00
Philip Reames 00c8be3f93 fix whitespace type 2021-04-13 19:02:41 -07:00
ShihPo Hung d5e962f1f2 [RISCV] Implement COPY for Zvlsseg registers
When copying Zvlsseg register tuples, we split the COPY to NF whole register moves
as below:

  $v10m2_v12m2 = COPY $v4m2_v6m2 # NF = 2
=>
  $v10m2 = PseudoVMV2R_V $v4m2
  $v12m2 = PseudoVMV2R_V $v6m2

This patch copies forwardCopyWillClobberTuple from AArch64 to check
register overlapping.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D100280
2021-04-13 18:55:51 -07:00
Nemanja Ivanovic 0148bf53f0 [PowerPC] Use correct node to get a super register from a subreg
The VSX tablegen file has some rather eggregious uses of
COPY_TO_REGCLASS even in situations where it needs to use
SUBREG_TO_REG. While this produces correct code, it often doesn't
allow the register coalescer to coalesce copies and the resulting
code ends up being suboptimal. This patch just changes over
patterns that should use SUBREG_TO_REG.
2021-04-13 19:52:21 -05:00
Sterling Augustine 32e264921b Revert "[GlobalOpt] Revert valgrind hacks"
This reverts commit dbc16ed199.
2021-04-13 17:47:07 -07:00
root 645ce31c20 Title: [RISCV] Add missing part of instruction vmsge {u}. VX Review By: craig.topper Differential Revision : https://reviews.llvm.org/D100115 2021-04-14 06:41:59 +08:00
Daniel Sanders be50657c6a [TableGen] Resolve concrete but not complete field access initializers
This fixes the resolution of Rec10.Zero in ListSlices.td.

As part of this, correct the definition of complete for ListInit such that
it's complete iff all the elements in the list are complete rather than
always being complete regardless of the elements. This is the reason
Rec10.TwoFive from ListSlices.td previously resolved despite being
incomplete like Rec10.Zero was

Depends on D100247

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D100253
2021-04-13 15:14:56 -07:00
Anirudh Prasad 6ddd8c28b7 [AsmParser][SystemZ][z/OS] Add support to AsmLexer to accept HLASM style integers
- Add support for HLASM style integers. These are the decimal integers [0-9].
- HLASM does not support the additional prefixed integers like, `0b`, `0x`, octal integers and Masm style integers.
- To achieve this, a field `LexHLASMStyleIntegers` (similar to the `LexMasmStyleIntegers` field) is introduced in `MCAsmLexer.h` as well as a corresponding setter.

Note: This field could also go into MCAsmInfo.h. I used the previous precedent set by the `LexMasmIntegers` field.

Depends on https://reviews.llvm.org/D99286

Reviewed By: epastor

Differential Revision: https://reviews.llvm.org/D99374
2021-04-13 15:29:37 -04:00
Craig Topper 6aa6f748ae [RISCV] Add a generic PatGprImm class and use it to simplify patterns in RISCVInstrInfoB.td. NFC 2021-04-13 12:07:24 -07:00
Craig Topper cb073f1bc0 [RISCV] Make use of PatGprGpr and PatGpr in RISCVInstrInfoB.td. NFC 2021-04-13 12:06:58 -07:00
Yonghong Song a285bdb56f BPF: remove default .extern data section
Currently, for any extern variable, if it doesn't have
section attribution, it will be put into a default ".extern"
btf DataSec. The initial design is to put every extern
variable in a DataSec so libbpf can use it.

But later on, libbpf actually requires extern variables
to put into special sections, e.g., ".kconfig", ".ksyms", etc.
so they can be used properly based on section name.

Andrii mentioned since ".extern" variables are
not actually used, it makes sense to remove it from
the compiler so libbpf does not need to deal with it,
esp. for static linking. The BTF for these extern variables
is still generated.

With this patch, I tested kernel selftests/bpf and all tests
passed. Indeed, removing ".extern" DataSec seems having no
impact.

Differential Revision: https://reviews.llvm.org/D100392
2021-04-13 11:35:52 -07:00
Nikita Popov faf9f11589 [SCEV] Don't walk uses of phis without SCEV expression when forgetting
I've run into some cases where a large fraction of compile-time is
spent invalidating SCEV. One of the causes is forgetLoop(), which
walks all values that are def-use reachable from the loop header
phis. When invalidating a topmost loop, that might be close to all
values in a function. Additionally, it's fairly common for there to
not actually be anything to invalidate, but we'll still be performing
this walk again and again.

My first thought was that we don't need to continue walking the uses
if the current value doesn't have a SCEV expression. However, this
isn't quite right, because SCEV construction can skip over values
(e.g. for a chain of adds, we might only create a SCEV expression
for the final value).

What this patch does instead is to only walk the (full) def-use chain
of loop phis that have a SCEV expression. If there's no expression
for a phi, then we also don't have any dependent expressions to
invalidate.

Differential Revision: https://reviews.llvm.org/D100264
2021-04-13 20:28:17 +02:00
Craig Topper 1afdfc6169 [RISCV] Rename RISCVISD::GREVI(W)/GORCI(W) to RISCVISD::GREV(W)/GORC(W). Don't require second operand to be a constant.
Prep work for adding intrinsics for these instructions in the
future.
2021-04-13 11:04:28 -07:00
Jessica Paquette 516d09387b [AArch64][GlobalISel] Mark G_CTPOP as legal for v16s8 and v8s8
G_CTPOP can be directly selected to CNT in these cases.

Differential Revision: https://reviews.llvm.org/D100349
2021-04-13 11:03:39 -07:00
Simon Pilgrim 74f98391a7 [X86][SSE] combineSetCCMOVMSK - allow comparison with upper (known zero) bits in CMP(MOVMSK(PACKSS())) -> CMP(MOVMSK()) fold
We already allow the comparison of the upper bits of 'IsAllOf' (allbits) patterns, but we can safely compare the known zero bits for 'IsAnyOf' (zerobits) patterns as well.

This fixes an issues where we are comparing a type wide than the number of vector elements, which avoids a regression mentioned in rGbaadbe04bf75.
2021-04-13 17:37:24 +01:00
Anirudh Prasad 7da22dfcd0 [SystemZ][z/OS] Introduce dialect querying helper functions
- In the SystemZAsmParser, there will be a few queries to the type of dialect it is (AD_ATT, AD_HLASM) in future patches.
- It would be nice to have two small helper functions `isParsingATT()` and `isParsingHLASM()`
- Putting this as a separate smaller patch allows us to remove its definitions from other dependent patches.

Reviewed By: uweigand, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D99891
2021-04-13 12:14:34 -04:00
Evgeny Leviant dbc16ed199 [GlobalOpt] Revert valgrind hacks
Differential revision: https://reviews.llvm.org/D69428
2021-04-13 19:11:10 +03:00
Yonghong Song 968292cb93 BPF: generate proper BTF for globals with WeakODRLinkage
For a global weak symbol defined as below:
  char g __attribute__((weak)) = 2;
LLVM generates an allocated global with WeakAnyLinkage,
for which BPF backend generates proper BTF info.

For the above example, if a modifier "const" is added like
  const char g __attribute__((weak)) = 2;
LLVM generates an allocated global with WeakODRLinkage,
for which BPF backend didn't generate any BTF as it
didn't handle WeakODRLinkage.

This patch addes support for WeakODRLinkage and proper
BTF info can be generated for weak symbol defined with
"const" modifier.

Differential Revision: https://reviews.llvm.org/D100362
2021-04-13 08:54:05 -07:00
Anirudh Prasad f7eec83932 [AsmParser][SystemZ][z/OS] Add in support to allow use of additional comment strings.
- Currently, MCAsmInfo provides a CommentString attribute, that various targets can set, so that the AsmLexer can appropriately lex a string as a comment based on the set value of the attribute.
- However, AsmLexer also supports a few additional comment syntaxes, in addition to what's specified as a CommentString attribute. This includes regular C-style block comments (/* ... */), regular C-style line comments (// .... ) and #. While I'm not sure as to why this behaviour exists, I am assuming it does to maintain backward compatibility with GNU AS (see https://sourceware.org/binutils/docs/as/Comments.html#Comments for reference)
For example:
Consider a target which sets the CommentString attribute to '*'.
The following strings are all lexed as comments.

```
"# abc" -> comment
"// abc" -> comment
"/* abc */ -> comment
"* abc" -> comment
```

- In HLASM however, only "*" is accepted as a comment string, and nothing else.
- To achieve this, an additional attribute (`AllowAdditionalComments`) has been added to MCAsmInfo. If this attribute is set to false, then only the string specified by the CommentString attribute is used as a possible comment string to be lexed by the AsmLexer. The regular C-style block comments, line comments and "#" are disabled. As a final note, "#" will still be treated as a comment, if the CommentString attribute is set to "#".

Depends on https://reviews.llvm.org/D99277

Reviewed By: abhina.sreeskantharajan, myiwanch

Differential Revision: https://reviews.llvm.org/D99286
2021-04-13 11:15:09 -04:00
Tim Northover 5e3d9fcc3a StackProtector: ensure protection does not interfere with tail call frame.
The IR stack protector pass must insert stack checks before the call instead of
between it and the return.

Similarly, SDAG one should recognize that ADJCALLFRAME instructions could be
part of the terminal sequence of a tail call. In this case because such call
frames cannot be nested in LLVM the stack protection code must skip over the
whole sequence (or risk clobbering argument registers).
2021-04-13 15:14:57 +01:00
Sander de Smalen 03f47bdcb1 [TTI] NFC: Change get[Interleaved]MemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100205
2021-04-13 14:21:02 +01:00
Sander de Smalen d676b5749d [TTI] NFC: Change getMaskedMemoryOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100204
2021-04-13 14:21:01 +01:00
Sander de Smalen db134e2428 [TTI] NFC: Change getCmpSelInstrCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100203
2021-04-13 14:21:01 +01:00
Sander de Smalen 2285dfb73f [TTI] NFC: Change getMinMaxReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100202
2021-04-13 14:21:00 +01:00
Sander de Smalen bd86824d98 [TTI] NFC: Change getArithmeticReductionCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

This patch is practically NFC, with the exception of an AArch64 SVE related
cost-model change, where we can now return an Invalid cost instead of some
bogus number.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100201
2021-04-13 14:20:59 +01:00
Sander de Smalen fd1f8a5462 [TTI] NFC: Change getGatherScatterOpCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100200
2021-04-13 14:20:59 +01:00
Sander de Smalen 92d8421f49 [TTI] NFC: Change getCastInstrCost and getExtractWithExtendCost to return InstructionCost
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D100199
2021-04-13 14:20:58 +01:00
madhur13490 5682ae2fc6 [AMDGPU] Set implicit arg attributes for indirect calls
This patch adds attributes corresponding to
implicits to functions/kernels if
1. it has an indirect call OR
2. it's address is taken.

Once such attributes are set, rest of the codegen would work
out-of-box for indirect calls. This patch eliminates
the potential overhead -fixed-abi imposes even though indirect functions
calls are not used.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D99347
2021-04-13 13:15:13 +00:00
Martin Storsjö 45f8946a75 [CodeView] Fix the ARM64 CPUType enum
The old, incorrect one seems to have been added in
d41ac895bb, with a similarly placed
entry added in EnumTables.cpp in
eb4d6142dc.

This matches the value documented at
https://docs.microsoft.com/en-us/visualstudio/debugger/debug-interface-access/cv-cpu-type-e?view=vs-2019.

This fixes running obj2yaml on an object file generated by MSVC.

Differential Revision: https://reviews.llvm.org/D100306
2021-04-13 12:54:22 +03:00
Florian Hahn 467b1f1cd2
[SimplifyCFG] Allow hoisting terminators only with HoistCommonInsts=false.
As a side-effect of the change to default HoistCommonInsts to false
early in the pipeline, we fail to convert conditional branch & phis to
selects early on, which prevents vectorization for loops that contain
conditional branches that effectively are selects (or if the loop gets
vectorized, it will get vectorized very inefficiently).

This patch updates SimplifyCFG to perform hoisting if the only
instruction in both BBs is an equal branch. In this case, the only
additional instructions are selects for phis, which should be cheap.

Even though we perform hoisting, the benefits of this kind of hoisting
should by far outweigh the negatives.

For example, the loop in the code below will not get vectorized on
AArch64 with the current default, but will with the patch. This is a
fundamental pattern we should definitely vectorize. Besides that, I
think the select variants should be easier to use for reasoning across
other passes as well.

https://clang.godbolt.org/z/sbjd8Wshx

```
double clamp(double v) {
  if (v < 0.0)
    return 0.0;
  if (v > 6.0)
    return 6.0;
  return v;
}

void loop(double* X, double *Y) {
  for (unsigned i = 0; i < 20000; i++) {
    X[i] = clamp(Y[i]);
  }
}
```

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100329
2021-04-13 10:33:35 +01:00
Ricky Taylor 6e098e133d [M68k] Implement AsmParser
This is a work-in-progress implementation of an assembler for M68k.

Outstanding work:
- Updating existing tests assembly syntax
- Writing new tests for the assembler (and disassembler)

I've left those until there's consensus that this approach is okay (I hope that's okay!).

Questions I'm aware of:
- Should this use Motorola or gas syntax? (At the moment it uses Motorola syntax.)
- The disassembler produces a table at runtime for disassembly generated from the code beads. Is this okay? (This is less than ideal but as I mentioned in my llvm-dev post, it's quite complicated to write a table-gen parser for code beads.)

Depends on D98519

Depends on D98532

Depends on D98534

Depends on D98535

Depends on D98536

Differential Revision: https://reviews.llvm.org/D98537
2021-04-13 09:25:34 +01:00
Craig Topper 7c9bbbf735 [RISCV] Rename RISCVISD::SHFLI to RISCVISD::SHFL and don't require the second operand to be an immediate.
Prep work for adding intrinsics in the future.

Left an assert that the input is constant in ReplaceNodeResults,
as the intrinsic shouldn't go through that path.
2021-04-12 23:46:50 -07:00
Amy Huang dad5caa59e Revert "Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
This change causes an assert / segmentation fault in LTO builds.

This reverts commit f2e4f3eff3.
2021-04-12 20:10:17 -07:00
Serguei Katkov c362179b0a [GreedyRA ORE] Add debug location for function level report
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: thegameg
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100168
2021-04-13 09:49:12 +07:00
Chen Zheng 80aa9b0f7b [PowerPC] stop reverse mem op generation for some cases.
We should consider the feeder user number when we do reverse memory
operation transformation. Otherwise, we may get negative impact.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D100166
2021-04-12 22:41:28 -04:00
Evgeniy Brevnov e50aa1af2d [NARY][NFC] Use hasNUsesOrMore instead of getNumUses since it's more
efficient.
2021-04-13 09:29:49 +07:00
Freddy Ye 3fc1fe8db8 [X86] Support -march=rocketlake
Reviewed By: skan, craig.topper, MaskRay

Differential Revision: https://reviews.llvm.org/D100085
2021-04-13 09:48:13 +08:00
Gulfem Savrun Yeniceri e96df3e531 [Passes] Add relative lookup table converter pass
Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244

This patch adds a new pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Differential Revision: https://reviews.llvm.org/D94355
2021-04-13 01:29:41 +00:00
Nick Desaulniers 237d4ee835 [JumpThreading] merge debug info when merging select+br
Jump threading can replace select then unconditional branch with
conditional branch, but when doing so loses debug info.

This destructive transform is eventually leading to a failed Verifier
run during full LTO builds of the Linux kernel with CFI and KCOV
enabled, as reported in PR39531.

ModuleSanitizerCoveragePass will insert calls to
__sanitizer_cov_trace_pc, and sometimes split critical edges,
using whatever debug info may or may not exist for the branch for
the added libcall. Since we can inline calls to
__sanitizer_cov_trace_pc due to LTO, this can lead to the error
observed in PR39531 when the debug info isn't propagated to
the libcall, because of prior destructive transforms that failed to
retain debug info.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100137
2021-04-12 17:51:21 -07:00
Arthur Eubanks a8ab1f98d2 [Evaluator] Look through invariant.group intrinsics
Turning on -fstrict-vtable-pointers in Chrome caused an extra global
initializer. Turns out that a llvm.strip.invariant.group intrinsic was
causing GlobalOpt to fail to step through some simple code.

We can treat *.invariant.group uses as simply their operand.
Value::stripPointerCastsForAliasAnalysis() does exactly this. This
should be safe because the Evaluator does not skip memory accesses due
to invariants or alias analysis.

However, we don't want to leak that we've stripped arbitrary pointer
casts to users of Evaluator, so we bail out if we evaluate a function to
any constant, since we may have looked through *.invariant.group calls
and aliasing pointers cannot be arbitrarily substituted.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D98843
2021-04-12 16:12:15 -07:00
Nick Desaulniers 4914c98367 [SantizerCoverage] handle missing DBG MD when inserting libcalls
Instruction::getDebugLoc can return an invalid DebugLoc. For such cases
where metadata was accidentally removed from the libcall insertion
point, simply insert a DILocation with line 0 scoped to the caller. When
we can inline the libcall, such as during LTO, then we won't fail a
Verifier check that all calls to functions with debug metadata
themselves must have debug metadata.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100158
2021-04-12 15:55:58 -07:00
Yuanfang Chen c5fda0e662 Reland "Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom""
This reverts commit a3fabc79ae (relands
f4d682d6ce with fix for the compile-time
regression issue).
2021-04-12 14:50:54 -07:00
Fangrui Song 0a614fff4f [ARM] Fix -Wmissing-field-initializers 2021-04-12 14:28:23 -07:00
Jian Cai ed1734931a Fix up build failures after cfce5b26a8
Build log: https://lab.llvm.org/buildbot/#/builders/37/builds/3538

Differential Revision: https://reviews.llvm.org/D98916
2021-04-12 14:09:15 -07:00
Nikita Popov a3fabc79ae Revert "[InstCombine] when calling conventions are compatible, don't convert the call to undef idiom"
This reverts commit f4d682d6ce.

This caused a significant compile-time regression:
https://llvm-compile-time-tracker.com/compare.php?from=4b7bad9eaea2233521a94f6b096aaa88dc584e23&to=f4d682d6ce6c5b3a41a0acf297507c82f5c21eef&stat=instructions

Possibly this is due to overeager parsing of target triples.
2021-04-12 22:55:59 +02:00
Sanjay Patel 5354a213a0 [InstCombine] fold shift+trunc signbit check
https://alive2.llvm.org/ce/z/6vQvrP

This solves:
https://llvm.org/PR49866
2021-04-12 16:19:43 -04:00
Jian Cai cfce5b26a8 [ARM] support symbolic expression as immediate in memory instructions
Currently the ARM backend only accpets constant expressions as the
immediate operand in load and store instructions. This allows the
result of symbolic expressions to be used in memory instructions. For
example,

0:
.space 2048
strb r2, [r0, #(.-0b)]

would be assembled into the following instructions.

strb	r2, [r0, #2048]

This only adds support to ldr, ldrb, str, and strb in arm mode to
address the build failure of Linux kernel for now, but should facilitate
adding support to similar instructions in the future if the need arises.

Link:
https://github.com/ClangBuiltLinux/linux/issues/1329

Reviewed By: peter.smith, nickdesaulniers

Differential Revision: https://reviews.llvm.org/D98916
2021-04-12 12:13:55 -07:00
Sanjay Patel 661cc71a1c [PassManager][PhaseOrdering] lower expects before running simplifyCFG
Retry of 330619a3a6 that includes a clang test update.

Original commit message:

If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898 <https://reviews.llvm.org/D98898>.

In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.

We make the same change to the old pass manager to keep things synchronized.

Differential Revision: https://reviews.llvm.org/D100213
2021-04-12 15:07:53 -04:00
Arthur Eubanks be00edfee5 [NewPM] Fix -print-changed when a -filter-print-funcs function is removed
-filter-print-funcs -print-changed was crashing after the filter func
was removed by a pass with
  Assertion failed: After.find("*** IR Dump") == 0 && "Unexpected banner format."
We weren't printing the banner because when we have -filter-print-funcs,
we print each function separately, letting the print function filter out
unwanted functions.

Reviewed By: jamieschmeiser

Differential Revision: https://reviews.llvm.org/D100237
2021-04-12 11:55:17 -07:00
Sanjay Patel 23ac9d1e6e Revert "[PassManager][PhaseOrdering] lower expects before running simplifyCFG"
This reverts commit 330619a3a6.
There are clang tests that also need to be updated.
2021-04-12 13:58:54 -04:00
Arthur Eubanks 269b335bd7 [Inliner] Propagate SROA analysis through invariant group intrinsics
SROA can handle invariant group intrinsics, let the inliner know that
for better heuristics when the intrinsics are present.

This fixes size issues in a couple files when turning on
-fstrict-vtable-pointers in Chrome.

Reviewed By: rnk, mtrofin

Differential Revision: https://reviews.llvm.org/D100249
2021-04-12 10:54:22 -07:00
Hamza Sood 0a92aff721 Replace uses of std::iterator with explicit using
This patch removes all uses of `std::iterator`, which was deprecated in C++17.
While this isn't currently an issue while compiling LLVM, it's useful for those using LLVM as a library.

For some reason there're a few places that were seemingly able to use `std` functions unqualified, which no longer works after this patch. I've updated those places, but I'm not really sure why it worked in the first place.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D67586
2021-04-12 10:47:14 -07:00
Fraser Cormack d737c47137 [RISCV] Support vector SET[U]LT and SET[U]GE with splatted immediates
This patch adds more optimized codegen for the above SETCC forms,
by matching the '.vi' vector forms when the immediate is a 5-bit signed
immediate plus 1. The immediate can be decremented and the corresponding
SET[U]LE or SET[U]GT forms can be matched.

This work was left as a TODO from D94168.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100096
2021-04-12 18:36:45 +01:00
Yuanfang Chen f4d682d6ce [InstCombine] when calling conventions are compatible, don't convert the call to undef idiom
D24453 enabled libcalls simplication for ARM PCS. This may cause
caller/callee calling conventions mismatch in some situations such as
LTO. This patch makes instcombine aware that the compatible calling
conventions differences are benign (not emitting undef idom).

Differential Revision: https://reviews.llvm.org/D99773
2021-04-12 09:32:23 -07:00
Sanjay Patel 330619a3a6 [PassManager][PhaseOrdering] lower expects before running simplifyCFG
If we run passes before lowering llvm.expect intrinsics to metadata,
then those passes have no way to act on the hints provided by llvm.expect.
SimplifyCFG is the known offender, and we made it smarter about profile
metadata in D98898.

In the motivating example from https://llvm.org/PR49336 , this means we
were ignoring the recommended method for a programmer to tell the compiler
that a compare+branch is expensive. This change appears to solve that case -
the metadata survives to the backend, the compare order is as expected in IR,
and the backend does not do anything to reverse it.

We make the same change to the old pass manager to keep things synchronized.

Differential Revision: https://reviews.llvm.org/D100213
2021-04-12 12:23:31 -04:00
David Green dd31b2c6e5 [ARM] Add a number of intrinsics for MVE lane interleaving
Add a number of intrinsics which natively lower to MVE operations to the
lane interleaving pass, allowing it to efficiently interleave the lanes
of chucks of operations containing these intrinsics.

Differential Revision: https://reviews.llvm.org/D97293
2021-04-12 17:23:02 +01:00
Stephen Tozer f2e4f3eff3 Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
The causes of the previous build errors have been fixed in revisions
aa3e78a59f, and
140757bfaa

This reverts commit f40976bd01.
2021-04-12 16:57:29 +01:00
Simon Pilgrim baadbe04bf [X86] Fold cmpeq/ne(trunc(logic(x)),0) --> cmpeq/ne(logic(x),0)
Fixes the issues noted in PR48768, where the and/or/xor instruction had been promoted to avoid i8/i16 partial-dependencies, but the test against zero had not.

We can almost certainly relax this fold to work for any truncation, although it breaks a number of existing folds (notable movmsk folds which tend to rely on the truncate to determine the demanded bits/elts in the source vector).

There is a reverse combine in TargetLowering.SimplifySetCC so we must wait until after legalization before attempting this.
2021-04-12 16:05:34 +01:00
Wang, Pengfei 4cbaaf4a24 [X86][AMX] Hoist ldtilecfg
The previous code calculated the first ldtilecfg by dominating all AMX registers' def. This may result in the ldtilecfg being inserted into a loop.

This patch try to calculate the nearest point where all shapes of AMX registers are reachable.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D99010
2021-04-12 22:36:41 +08:00
David Green 6c0a1ed3a9 [ARM] Add FP handling for MVE lane interleaving
FP16 to FP32 converts can be handled in MVE lane interleaving, much like
the sext/zext lowering we do. This expands the pass with fpext and
fptrunc handling, and basic fp operations allowing more efficient
lowering of fp vectors.

Differential Revision: https://reviews.llvm.org/D97292
2021-04-12 15:28:13 +01:00
Malhar Jajoo 58f3201a20 [ARM] Updates to arm-block-placement pass
The patch makes two updates to the arm-block-placement pass:
- Handle arbitrarily nested loops
- Extends the search (for t2WhileLoopStartLR) to the predecessor of the
  preHeader.

Differential Revision: https://reviews.llvm.org/D99649
2021-04-12 14:46:23 +01:00
Paul C. Anagnostopoulos 489cdedd11 [TableGen] Fix bug in recent change to ListInit::convertInitListSlice()
Differential Revision: https://reviews.llvm.org/D100247
2021-04-12 09:44:39 -04:00
Andrew Savonichev f037b07b5c Revert "[AArch64] Add Machine InstCombiner patterns for FMUL indexed variant"
This reverts commit cca9b5985c.

Buildbot reported an error for CodeGen/AArch64/machine-combiner-fmul-dup.mir:

*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function:    indexed_2s
- basic block: %bb.0 entry (0x640fee8)
Virtual register %7 is used after the block.

*** Bad machine code: Virtual register defs don't dominate all uses. ***
- function:    indexed_2s
- v. register: %7
LLVM ERROR: Found 2 machine code errors.
2021-04-12 16:28:49 +03:00
Andrew Savonichev cca9b5985c [AArch64] Add Machine InstCombiner patterns for FMUL indexed variant
This patch adds DUP+FMUL => FMUL_indexed pattern to InstCombiner.
FMUL_indexed is normally selected during instruction selection, but it
does not work in cases when VDUP and VMUL are in different basic
blocks.

Differential Revision: https://reviews.llvm.org/D99662
2021-04-12 16:08:39 +03:00
Sebastian Neubauer 6cc91adf1e [AMDGPU] Kill temporary register after restoring
Not a correctness issue, but the temporary register is not used
afterwards and should be dead.

Differential Revision: https://reviews.llvm.org/D100295
2021-04-12 14:20:03 +02:00
Bradley Smith f2593a0bd1 [AArch64][SVE] Remove redundant PTEST of MATCH/NMATCH results
Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D99584
2021-04-12 12:55:00 +01:00
Stephen Tozer aa3e78a59f Reapply "[DebugInfo] Correctly track SDNode dependencies for list debug values"
Fixed memory leak error by using BumpAllocator for SDDbgValue arrays.

This reverts commit 1b589172bd.
2021-04-12 12:51:29 +01:00
Dmitry Preobrazhensky 67b39661c8 [AMDGPU][MC][NFC] Removed extra spaces
Fixed bugs 49646, 49647.

Differential Revision: https://reviews.llvm.org/D100173
2021-04-12 13:33:19 +03:00
Sebastian Neubauer 7a8e65dd3d [AMDGPU] Fix ubsan error
The RegScavenger can be null sometimes, so a pointer is needed.

Fixes UBSan error introduced in f9a8c6a0e5.
2021-04-12 12:14:00 +02:00
Sebastian Neubauer b76c2a6c2b [AMDGPU] Fix saving fp and bp
Spilling the fp or bp to scratch could overwrite VGPRs of inactive
lanes. Fix that by using only the active lanes of the scavenged VGPR.

This builds on the assumptions that
1. a function is never called with exec=0
2. lanes do not die in a function, i.e. exec!=0 in the function epilog
3. no new lanes are active when exiting the function, i.e. exec in the
   epilog is a subset of exec in the prolog.

Differential Revision: https://reviews.llvm.org/D96869
2021-04-12 11:52:55 +02:00
Sebastian Neubauer 32bc9a9bc3 [AMDGPU] Unify spill code
Instead of reimplementing spilling in prolog and epilog, reuse
buildSpillLoadStore.

Reviewed By: scott.linder

Differential Revision: https://reviews.llvm.org/D99269
2021-04-12 11:19:08 +02:00
Sebastian Neubauer f9a8c6a0e5 [AMDGPU] Save VGPR of whole wave when spilling
Spilling SGPRs to scratch uses a temporary VGPR. LLVM currently cannot
determine if a VGPR is used in other lanes or not, so we need to save
all lanes of the VGPR. We even need to save the VGPR if it is marked as
dead.

The generated code depends on two things:
- Can we scavenge an SGPR to save EXEC?
- And can we scavenge a VGPR?

If we can scavenge an SGPR, we
- save EXEC into the SGPR
- set the needed lane mask
- save the temporary VGPR
- write the spilled SGPR into VGPR lanes
- save the VGPR again to the target stack slot
- restore the VGPR
- restore EXEC

If we were not able to scavenge an SGPR, we do the same operations, but
everytime the temporary VGPR is written to memory, we
- write VGPR to memory
- flip exec (s_not exec, exec)
- write VGPR again (previously inactive lanes)

Surprisingly often, we are able to scavenge an SGPR, even though we are
at the brink of running out of SGPRs.
Scavenging a VGPR does not have a great effect (saves three instructions
if no SGPR was scavenged), but we need to know if the VGPR we use is
live before or not, otherwise the machine verifier complains.

Differential Revision: https://reviews.llvm.org/D96336
2021-04-12 11:01:38 +02:00
Stelios Ioannou a655f250fe [AArch64] Adds memory operands for indexed loads.
This patch adds the memory operands for indexed loads so
that certain optimizations can take place.

Differential Revision: https://reviews.llvm.org/D100215/

Change-Id: I539fcf046ca4ad1e7df1d893f57d751419d8364d
2021-04-12 09:11:37 +01:00
Zhang Qing Shan d69c236e1d [NFC][Debug] Fix unnecessary deep-copy for vector to save compiling time
We saw some big compiling time impact after enabling the debug entry value
feature for X86 platform(D73534). Compiling time goes from 900s->1600s with
our testcase. It is caused by allocating/freeing the memory busily.

'using FwdRegWorklist = MapVector<unsigned, SmallVector<FwdRegParamInfo, 2>>;'
The value for this map is vector, and we miss the reference when access the
element. The same happens for `auto CalleesMap = MF->getCallSitesInfo();` which is a DenseMap.

Reviewed by: djtodoro, flychen50

Differential Revision: https://reviews.llvm.org/D100162
2021-04-12 14:55:03 +08:00
Bing1 Yu 747111ea71 [X86] Pass to transform tdpbsud&tdpbusd&tdpbuud intrinsics to scalar operation
Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D99244
2021-04-12 13:58:14 +08:00
Evgeniy Brevnov 36b932d6a3 [NARY] Don't optimize min/max if there are side uses
Say we have
%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%2,%a)

The optimization will try to reassociate the later one so that we can rewrite it to %3=min(%1, %c) and remove %2.
But if %2 has another uses outside of %3 then we can't remove %2 and end up with:

%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%1, %c)

This doesn't harm by itself except it is not profitable and changes IR for no good reason.
What is bad it triggers next iteration which finds out that optimization is applicable to %2 and %3 and generates:

%1=min(%a,%b)
%2=min(%b,%c)
%3=min(%1,%c)
%4=min(%2,%a)

and so on...

The solution is to prevent optimization in the first place if intermediate result (%2) has side uses and
known to be not removed.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D100170
2021-04-12 12:43:54 +07:00
Freddy Ye 5cb47be410 [X86] Remove FeatureCLWB from FeaturesICLClient
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100279
2021-04-12 12:08:59 +08:00
Chen Zheng bb346146a5 [Debug-Info] make fortran CHARACTER(1) type as valid unsigned type
This resolves https://bugs.llvm.org/show_bug.cgi?id=49872

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D100015
2021-04-11 23:17:01 -04:00
Qiu Chaofan ece7345859 [PowerPC] Lower f128 SETCC/SELECT_CC as libcall if p9vector disabled
XSCMPUQP is not available for pre-P9 subtargets. This patch will lower
them into libcall for correct behavior on power7/power8.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D92083
2021-04-12 10:33:32 +08:00
Jim Lin a3bfddbb6a [RISCV][NFC] Remove unneeded explict XLenVT type on codegen patterns
Customized SDNode has been specified the explict XLenVT type.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100190
2021-04-12 10:16:06 +08:00
Craig Topper cb4c793e46 [RISCV] Update computeKnownBitsForTargetNode to treat READ_VLENB as being 16 byte aligned.
According to the 0.10 spec, VLEN is at least 128 bits and is a
power of 2.
2021-04-11 17:54:23 -07:00
Craig Topper ff902080a9 [RISCV] Use SLLI/SRLI instead of SLLIW/SRLIW for (srl (and X, 0xffff), C) custom isel on RV64.
We don't need the sign extending behavior here and SLLI/SRLI
are able to compress to C.SLLI/C.SRLI.
2021-04-11 13:59:51 -07:00
Roman Lebedev 8fc8c745cf
[NFCI][SimplifyCFG] PerformValueComparisonIntoPredecessorFolding(): improve Dominator Tree updating
Same as with previous patches.
2021-04-11 23:56:23 +03:00
Roman Lebedev 13fca9d816
[NFCI][SimplifyCFG] mergeEmptyReturnBlocks(): improve Dominator Tree updating
Same as with previous patches.
2021-04-11 23:56:23 +03:00
Roman Lebedev 0699da1569
[NFCI][Local] MergeBasicBlockIntoOnlyPred(): improve Dominator Tree updating
Same as with TryToSimplifyUncondBranchFromEmptyBlock()/MergeBlockIntoPredecessor() patch.
2021-04-11 23:56:23 +03:00
Roman Lebedev e5692a564a
[NFCI][BasicBlockUtils] MergeBlockIntoPredecessor(): improve Dominator Tree updating
Same as with TryToSimplifyUncondBranchFromEmptyBlock() patch.
2021-04-11 23:56:23 +03:00
Roman Lebedev 2def9c3d8e
[NFCI][Local] TryToSimplifyUncondBranchFromEmptyBlock(): improve Dominator Tree updating
First, we don't need vector-ness for the predecessor lists.

Secondly, like elsewhere, do insertions before deletions.

Lastly, the check that we actually need to insert an edge,
that it doesn't exist already, is backwards. Instead of
looking at successors of every single 'PredOfBB',
just always look at predecessors of the 'Succ'.
The result is always the same, but we avoid *really* inefficient code.
2021-04-11 23:56:22 +03:00
Roman Lebedev 6d44b3c56d
[NFCI][DomTreeUpdater] applyUpdates(): reserve space for updates first
While, indeed, we may end up pushing less updates that we'd reserve space
for, self-dominating updates aren't often enough for that to matter.
But this should matter for normal updates.
2021-04-11 23:56:22 +03:00
Simon Pilgrim 231b87618b [X86][AVX512] Fold not(kmov(x)) -> kmov(not(x)) and not(widen_subvector(x)) -> widen_subvector(not(x))
Improve AVX512 mask inversion, rG38c799bce801 exposed some missing opportunities to move scalar not() back onto the boolvector types for folding with setcc etc.
2021-04-11 20:07:09 +01:00
Thomas Lively ea8dd3ee2e [WebAssembly] Update v128.any_true
In the final SIMD spec, there is only a single v128.any_true instruction, rather
than one for each lane interpretation because the semantics do not depend on the
lane interpretation.

Differential Revision: https://reviews.llvm.org/D100241
2021-04-11 11:13:16 -07:00
Simon Pilgrim 13bdac5709 [X86] combineXor - Pull out repeated getOperand() calls. NFCI. 2021-04-11 19:01:59 +01:00
Simon Pilgrim 38c799bce8 [X86] Fold cmpeq/ne(and(X,Y),Y) --> cmpeq/ne(and(~X,Y),0)
Followup to D100177, handle an similar (demorgan inverse style) case from PR47797 as well

The AVX512 test cases could be further improved if we folded not(iX bitcast(vXi1)) -> (iX bitcast(not(vXi1)))

Alive2: https://alive2.llvm.org/ce/z/AnA_-W
2021-04-11 18:42:01 +01:00
Craig Topper 3ae71226ef [RISCV] Drop earlyclobber constraint from vwadd(u).wx, vwsub(u).wx, vfwadd.wf and vfwsub.wf.
The first source has the same EEW as the destination and the other
source is a scalar so the overlap constraints don't apply to
the unmasked version.

For the masked version we have a constraint that the destination
can't be V0 so that covers the only overlap issue there.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D100217
2021-04-11 10:19:45 -07:00
Craig Topper bc0e052730 [RISCV] Teach targetShrinkDemandedConstant to preserve (and X, 0xffff) when zext.h is supported.
Similar to what we do for zext.w.

Disable the (srl (and X, 0xffff), C) custom isel when zext.h is
available.
2021-04-11 10:03:35 -07:00
Roman Lebedev 91248e2db9
[InstCombine] Improve "get low bit mask upto and including bit X" pattern
https://alive2.llvm.org/ce/z/3u-48R
2021-04-11 18:08:08 +03:00
Roman Lebedev a36bb7fd76
[InstCombine] (X | Op01C) + Op1C --> X + (Op01C + Op1C) iff the or is actually an add
https://alive2.llvm.org/ce/z/Coc5yf
2021-04-11 18:08:08 +03:00
Roman Lebedev 005881e96e
[LoopIdiom] left-shift-until-bittest: set all allowed no-wrap flags on add/sub
I've checked each one of these with alive2,
and this is both correct and precise.
2021-04-11 18:08:07 +03:00
Arthur Eubanks c88b87f9ce Revert "Remove "Rewrite Symbols" from codegen pipeline"
This reverts commit 6210261ecb.

addr-label.ll crashes on armv7.
2021-04-10 23:28:16 -07:00
Arthur Eubanks 6210261ecb Remove "Rewrite Symbols" from codegen pipeline
It breaks up the function pass manager in the codegen pipeline.

With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.

Add a check that we only have one function pass manager in the codegen
pipeline.

This required reverting commit 9583a3f2625818b78c0cf6d473cdedb9f23ad82c:
"[AsmPrinter] Delete dead takeDeletedSymbsForFunction()".
This was not NFC as initially thought. By coalescing two function
psas managers, this exposed the reverted code as necessary.
addr-label.ll was crashing due to an emitted blockaddress's block being
removed but the label not emitted.

Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99707
2021-04-10 22:38:44 -07:00
Roman Lebedev 9829f5e6b1
[CVP] @llvm.[us]{min,max}() intrinsics handling
If we can tell that either one of the arguments is taken,
bypass the intrinsic.

Notably, we are indeed fine with non-strict predicate:
* UL: https://alive2.llvm.org/ce/z/69qVW9 https://alive2.llvm.org/ce/z/kNFTKf
      https://alive2.llvm.org/ce/z/AvaPw2 https://alive2.llvm.org/ce/z/oxo53i
* UG: https://alive2.llvm.org/ce/z/wxHeGH https://alive2.llvm.org/ce/z/Lf76qx
* SL: https://alive2.llvm.org/ce/z/hkeTGS https://alive2.llvm.org/ce/z/eR_b-W
* SG: https://alive2.llvm.org/ce/z/wEqRm7 https://alive2.llvm.org/ce/z/FpAsVr

Much like with all other comparison handling in CVP,
while we could sort-of handle two Value's,
at least for plain ICmpInst it does not appear to be worthwhile.

This only fires 78 times on test-suite + dt + rs,
but we don't canonicalize to these yet. (only SCEV produces them)
2021-04-11 00:33:47 +03:00
Nikita Popov 8de2f1ff79 [IVUsers] Check LoopSimplify cache earlier (NFC)
Check the cache before calling isLoopSimplifyForm(). Otherwise we'd
always perform the check for the innermost loop and only skip it
for dominating loops.
2021-04-10 22:58:13 +02:00
Wenlei He 00ef28ef21 [CSSPGO] Fix dangling context strings and improve profile order consistency and error handling
This patch fixed the following issues along side with some refactoring:

1. Fix bugs where StringRef for context string out live the underlying std::string. We now keep string table in profile generator to hold std::strings. We also do the same for bracketed context strings in profile writer.
2. Make sure profile output strictly follow (total sample, name) order. Previously, there's inconsistency between ProfileMap's key and FunctionSamples's name, leading to inconsistent ordering. This is now fixed by introducing context profile canonicalization. Assertions are also added to make sure ProfileMap's key and FunctionSamples's name are always consistent.
3. Enhanced error handling for profile writing to make sure we bubble up errors properly for both llvm-profgen and llvm-profdata when string table is not populated correctly for extended binary profile.
4. Keep all internal context representation bracket free. This avoids creating new strings for context trimming, merging and preinline. getNameWithContext API is now simplied accordingly.
5. Factor out the code for context trimming and merging into SampleContextTrimmer in SampleProf.cpp. This enables llvm-profdata to use the trimmer when merging profiles. Changes in llvm-profgen will be in separate patch.

Differential Revision: https://reviews.llvm.org/D100090
2021-04-10 12:39:10 -07:00
Roman Lebedev f041757e9c
[NFC][JumpThreading] Increment 'NumFolds' statistic all places terminator becomes uncond 2021-04-10 21:24:29 +03:00
Roman Lebedev a407738def
[NFC][CVP] Add statistic for function pointer argument non-null-ness deduction 2021-04-10 21:23:20 +03:00
Roman Lebedev fe7b3ad8d5
[CVP] LVI: Use in-block values when checking value signedness domain
This has a huge positive impact on all the folds that use these helpers,
as it can be seen on vanilla test-suite + rawspeed + darktable:
correlated-value-propagation.NumSRems             +75.68% (+ 28)
correlated-value-propagation.NumAShrs             +63.87% (+198)
correlated-value-propagation.NumSDivs             +49.42% (+127)
correlated-value-propagation.NumSExt              + 8.85% (+593)
correlated-value-propagation.NumUDivURemsNarrowed	+ 8.65% (+34)

... while having pretty minimal compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=e8c7f43e2c2c6f3581ec1c6489ec21ad9f98958a&to=4cd197711e58ee1b2faeee0c35eea54540185569&stat=instructions
2021-04-10 21:10:59 +03:00
Roman Lebedev 257eda0794
[NFC][LVI] getPredicateAt(): drop default value for UseBlockValue
The default is likely wrong.
Out of all the callees, only a single one needs to pass-in false (JumpThread),
everything else either already passes true, or should pass true.

Until the default is flipped, at least make it harder to unintentionally
add new callees with UseBlockValue=false.
2021-04-10 20:46:01 +03:00
Roman Lebedev e8c7f43e2c
[NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
2021-04-10 19:38:55 +03:00
Roman Lebedev 7b12c8c59d
Revert "[NFC][ConstantRange] Add 'icmp' helper method"
This reverts commit 17cf2c9423.
2021-04-10 19:37:53 +03:00
Roman Lebedev 17cf2c9423
[NFC][ConstantRange] Add 'icmp' helper method
"Does the predicate hold between two ranges?"

Not very surprisingly, some places were already doing this check,
without explicitly naming the algorithm, cleanup them all.
2021-04-10 19:09:52 +03:00
Roman Lebedev c329a47d9e
[CVP] @llvm.abs() handling
Iff we know the sigdness domain of the argument,
we can either skip @llvm.abs, or do negation directly.

Notably, INT_MIN can belong to either domain:
* X u<= INT_MIN --> X  is always fine
  https://alive2.llvm.org/ce/z/QB8j-C https://alive2.llvm.org/ce/z/7sFKpS
* X s<= 0 --> -X  is always fine
  https://alive2.llvm.org/ce/z/QbGSyq https://alive2.llvm.org/ce/z/APsN84

If all else fails, try to inferr NSW flag:
https://alive2.llvm.org/ce/z/qCJfYm
2021-04-10 16:47:31 +03:00
dfukalov 8f4b7e94a2 [AMDGPU][CostModel] Refine cost model for control-flow instructions.
Added cost estimation for switch instruction, updated costs of branches, fixed
phi cost.
Had to increase `-amdgpu-unroll-threshold-if` default value since conditional
branch cost (size) was corrected to higher value.
Test renamed to "control-flow.ll".

Removed redundant code in `X86TTIImpl::getCFInstrCost()` and
`PPCTTIImpl::getCFInstrCost()`.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D96805
2021-04-10 09:20:24 +03:00
Duncan P. N. Exon Smith 0db6488a77 Support: Add move semantics to mapped_file_region
Update llvm::sys::fs::mapped_file_region to have a move constructor and
a move assignment operator, allowing it to be used as an Optional. Also,
update FileOutputBuffer's OnDiskBuffer to take advantage of this,
avoiding an extra allocation from the unique_ptr.

A nice follow-up would be to make the mapped_file_region constructor
private and replace its use with a factory function, such as
mapped_file_region::create(), that returns an Expected (or ErrorOr). I
don't plan on doing that immediately, but I might swing back later.

No functionality change, besides the saved allocation in OnDiskBuffer.

Differential Revision: https://reviews.llvm.org/D100159
2021-04-09 17:56:26 -07:00
Mitch Phillips 092f288d36 Revert "[AMDGPU] Remove MachineDCE after SIFoldOperands"
This reverts commit 5a0117b2d0.

Reason: Dependent change d19a42eba9 broke
the ASan buildbots.
2021-04-09 15:47:44 -07:00
Mitch Phillips 3d4730a73f Revert "[AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs"
This reverts commit d19a42eba9.

Reason: Broke the ASan buildbots. See the original phabricator review
for more details: https://reviews.llvm.org/D100188
2021-04-09 15:47:44 -07:00
Jessica Paquette 49c3565b9b [AArch64][GlobalISel] Swap compare operands when it may be profitable
This adds support for swapping comparison operands when it may introduce new
folding opportunities.

This is roughly the same as the code added to AArch64ISelLowering in
162435e7b5.

For an example of a testcase which exercises this, see
llvm/test/CodeGen/AArch64/swap-compare-operands.ll

(Godbolt for that testcase: https://godbolt.org/z/43WEMb)

The idea behind this is that sometimes, we may be able to fold away, say, a
shift or extend in a compare by swapping its operands.

e.g. in the case of this compare:

```
lsl x8, x0, #1
cmp x8, x1
cset w0, lt
```

The following is equivalent:

```
cmp x1, x0, lsl #1
cset w0, gt
```

Most of the code here is just a reimplementation of what already exists in
AArch64ISelLowering.

(See `getCmpOperandFoldingProfit` and `getAArch64Cmp` for the equivalent code.)

Note that most of the AND code in the testcase doesn't actually fold. It seems
like we're missing selection support for that sort of fold right now, since SDAG
happily folds these away (e.g testSwapCmpWithShiftedZeroExtend8_32 in the
original .ll testcase)

Differential Revision: https://reviews.llvm.org/D89422
2021-04-09 15:46:48 -07:00
Roman Lebedev 077bff39d4
[Analysis] isDereferenceableAndAlignedPointer(): recurse into select's hands
By doing this within the method itself,
we support traversing multiple levels of selects (TODO: PHI's),
fixing the SROA `std::clamp()` testcase.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47271
Mostly fixes https://bugs.llvm.org/show_bug.cgi?id=49909
2021-04-10 00:56:28 +03:00
Mitch Phillips 1a2756b777 Revert "[PowerPC] Add ROP Protection Instructions for PowerPC"
This reverts commit 16fe741c69.

Reason: Broke the UBSan buildbots. More information available in the
phabricator review: https://reviews.llvm.org/D99375
2021-04-09 13:36:41 -07:00
Jay Foad 5a0117b2d0 [AMDGPU] Remove MachineDCE after SIFoldOperands
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.

Differential Revision: https://reviews.llvm.org/D100189
2021-04-09 20:41:09 +01:00
Jay Foad d19a42eba9 [AMDGPU] SIFoldOperands: eagerly erase dead REG_SEQUENCEs
This is fairly cheap to implement and means less work for future
passes like MachineDCE.

Differential Revision: https://reviews.llvm.org/D100188
2021-04-09 20:41:09 +01:00
Alina Sbirlea f6bff8d157 [MSSA] Rename uses in IDF regardless of new def position in basic block.
When inserting a new def and renaming of uses is asked, always compute
IDF and do the renaming for the blocks with Phis in that IDF.
Resolves PR49859.

Differential Revision: https://reviews.llvm.org/D100163
2021-04-09 12:32:37 -07:00
Stefan Pintilie 5bca7cdafb Add correct types to the xxsplti32dx pattern.
Regiser types for xxsplti32dx for two td file patterns was incorrect.
Fixed the two types and added a test case that was reduced from a larger
failing test.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D100223
2021-04-09 14:11:34 -05:00
Duncan P. N. Exon Smith 365053d2a5 Support: Remove code duplication for mapped_file_region accessors, NFC 2021-04-09 11:49:40 -07:00
Thomas Lively f30c429da6 [WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR
When lowering a BUILD_VECTOR SDNode, we choose among various possible vector
creation instructions in an attempt to minimize the total number of instructions
used. We previously considered using swizzles, consts, and splats, and this
patch adds shuffles as well. A common pattern that now lowers to shuffles is
when two 64-bit vectors are concatenated. Previously, concatenations generally
lowered to sequences of extract_lane and replace_lane instructions when they
could have been a single shuffle.

Differential Revision: https://reviews.llvm.org/D100018
2021-04-09 11:21:49 -07:00
Alex Richardson 107189a26e [TableGen] Report an error message on a missing comma
I recently forgot a comma in a defm argument list and tablegen just
failed with exit code 1 without printing an error message. I believe
this issue was introduced in a9fc44c557.

This change prints the following instead:
.../clang/include/clang/Driver/Options.td:569:3: error: Expected comma before next argument

Reviewed By: Paul-C-Anagnostopoulos

Differential Revision: https://reviews.llvm.org/D100178
2021-04-09 18:48:49 +01:00
Amara Emerson 40e75cafc0 [AArch64][GlobalISel] Fix incorrect codegen for <16 x s8> G_ASHR.
Fixes PR49904
2021-04-09 10:41:41 -07:00
Stefan Pintilie 16fe741c69 [PowerPC] Add ROP Protection Instructions for PowerPC
There are four new PowerPC instructions that are introduced in
Power 10. They are hashst, hashchk, hashstp, hashchkp.

These instructions will be used for ROP Protection.
This patch adds the four instructions.

Reviewed By: nemanjai, amyk, #powerpc

Differential Revision: https://reviews.llvm.org/D99375
2021-04-09 12:09:01 -05:00
Adrian Prantl 6ce76ff7eb Update the linkage name of coro-split functions in the debug info.
This patch updates the linkage name in the DISubprogram of coro-split
functions, which is particularly important for Swift, where the
funclets have a special name mangling. This patch does not affect C++
coroutines, since the DW_AT_specification is expected to hold the
(original) linkage name. I believe this is mostly due to limitations
in AsmPrinter, so we might be able to relax this restriction in the
future.

Differential Revision: https://reviews.llvm.org/D99693
2021-04-09 09:50:56 -07:00
Simon Pilgrim d8bc4de3cf [X86] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) on non-BMI targets (PR44136)
Followup to D100177, enable the fold for non-BMI targets as well.
2021-04-09 16:11:11 +01:00
Simon Pilgrim 245036950a [X86][BMI] Fold cmpeq/ne(or(X,Y),X) --> cmpeq/ne(and(~X,Y),0) (PR44136)
I've initially just enabled this for BMI which has the ANDN instruction for i32/i64 - the i16/i8 cases give an idea of what'd we get when we enable it in all cases (I'll do this as a later commit).

Additionally, the i16/i8 cases could be freely promoted to i32 (as the args are already zeroext) and we could then make use of ANDN + the free cmp0 there as well - this has come up in PR48768 and PR49028 so I'm going to look at this soon.

https://alive2.llvm.org/ce/z/QVWHP_
https://alive2.llvm.org/ce/z/pLngT-

Vector cases do not appear to benefit from this as we end up with having to generate the zero vector as well - this is one of the reasons I didn't try to tie this into hasAndNot/hasAndNotCompare.

Differential Revision: https://reviews.llvm.org/D100177
2021-04-09 15:52:03 +01:00
Sanjay Patel 84cdccc9dc [InstCombine] try to eliminate an instruction in min/max -> abs fold
As suggested in the review thread for 5094e12 and seen in the
motivating example from https://llvm.org/PR49885, it's not
clear if we have a way to create the optimal code without
this heuristic.
2021-04-09 10:34:03 -04:00
Momchil Velikov acf3279a03 For non-null pointer checks, do not descend through out-of-bounds GEPs
In LazyValueInfoImpl::isNonNullAtEndOfBlock we populate a set of
pointers, known to be non-null at the end of a block (e.g. because we
did a load through them). We then infer that any pointer, based on an
element of this set is non-null as well ("based" here meaning a
non-null pointer is the underlying object). This is incorrect, even if
the base pointer was non-null, the value of a GEP, that lacks the
inbounds` attribute, may be null.

This issue appeared as miscompilation of the following test case:

int puts(const char *);

typedef struct iter {
  int *val;
} iter_t;

static long distance(iter_t first, iter_t last) {
  long r = 0;
  for (; first.val != last.val; first.val++)
    ++r;
  return r;
}

int main() {
  int arr[2] = {0};
  iter_t i, j;
  i.val = arr;
  j.val = arr + 1;
  if (distance(i, j) >= 2)
    puts("failed");
  else
    puts("passed");
}

This fixes PR49662.

Differential Revision: https://reviews.llvm.org/D99642
2021-04-09 14:09:23 +01:00
Jay Foad a4ced03d34 [AMDGPU] SIFoldOperands: eagerly delete dead copies
This is cheap to implement, means less work for future passes like
MachineDCE, and slightly improves the folding in some cases.

Differential Revision: https://reviews.llvm.org/D100117
2021-04-09 13:52:54 +01:00
Sebastian Neubauer cc7add5298 [AMDGPU] Use SIInstrFlags for flat variants. NFC
Use SIInstrFlags to differentiate between the different
variants of flat instructions (flat, global and scratch).
This should make it easier to bundle the immediate offset logic in a
single place and implement restrictions and bug workarounds.

Fixed version of D99587, which does not rely on the address space.

Differential Revision: https://reviews.llvm.org/D99743
2021-04-09 12:28:36 +02:00
dfukalov c1a88e007b [AA][NFC] Convert AliasResult to class containing offset for PartialAlias case.
Add an ability to store `Offset` between partially aliased location. Use this
storage within returned `ResultAlias` instead of caching it in `AAQueryInfo`.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98718
2021-04-09 13:26:09 +03:00
dfukalov d066079728 [NFC][AA] Prepare to convert AliasResult to class with PartialAlias offset.
Main reason is preparation to transform AliasResult to class that contains
offset for PartialAlias case.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D98027
2021-04-09 12:54:22 +03:00
Simon Pilgrim 3ae0a405fc [X86] combineHorizOpWithShuffle - peek through one use bitcasts when decoding shuffles.
Checking for one use, peek through bitcasts of the horizop args to allows us to merge shuffles of different widths through the horizop.
2021-04-09 10:51:04 +01:00
Max Kazantsev baf17e2cc9 [NFC] Move statictic increment out of helper 2021-04-09 16:32:35 +07:00
Sebastian Neubauer ba217b4655 [RegisterScavenging] Add asserts for better errors
These cases were failing before, but with cryptic asserts.
Add asserts in the RegScavenger that fail earlier with better
messages. NFC

Differential Revision: https://reviews.llvm.org/D100109
2021-04-09 11:23:36 +02:00
Sebastian Neubauer 36138db116 [AMDGPU] IsFlatScratch/Global -> FlatScratch/Global
Remove 'Is' from IsFlatScratch/Global. NFC

Differential Revision: https://reviews.llvm.org/D100108
2021-04-09 11:20:31 +02:00
Max Kazantsev 275f3a2540 [GVN][NFC] Factor out load elimination logic via PRE for reuse 2021-04-09 16:12:25 +07:00
Jim Lin 7eaa2810c4 [RISCV][NFC] Replace explicit type i64 with riscv customized SDTypeProfile.
New SDTypeProfile can be reused for other word operation patterns without explicit i64 type in the future.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100097
2021-04-09 17:06:17 +08:00
Jim Lin 6169f1537c [RISCV][NFC] Fix formatting 2021-04-09 14:41:09 +08:00
Serguei Katkov f6e3b4fe58 [GreedyRA ORE] Re-factor computeNumberOfSplillsReloads.
Replace if-else to if-continue usage.
This simplifies further extension of the collected stats.
2021-04-09 12:44:11 +07:00
Arthur Eubanks 4c89bcadf6 [LICM] Hoist loads with invariant.group metadata
Previously loading the vtable used in calling a virtual method in a loop
was not hoisted out of the loop. This fixes that.

canSinkOrHoistInst() itself doesn't check that the load operands are
loop invariant, callers also check that separately.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D99784
2021-04-08 21:57:37 -07:00
Serguei Katkov d2e15a83a6 [RS4GC] Cleanup meetBDVState. NFC.
meetBDVState looks pretty difficult to read and follow.
This is purely NFC but doing several things:

1) Combine meet and meetBDVState
2) Move the function to be a member of BDVState
3) Make BDVState be a mutable object
4) Convert switch to sequence of ifs
5) Adds comments.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D99064
2021-04-09 10:20:25 +07:00