Commit Graph

156499 Commits

Author SHA1 Message Date
Aaron Puchert c1a31ee65b [PPCISelLowering] Avoid emitting calls to __multi3, __muloti4
After D108936, @llvm.smul.with.overflow.i64 was lowered to __multi3
instead of __mulodi4, which also doesn't exist on PowerPC 32-bit, not
even with compiler-rt. Block it as well so that we get inline code.

Because libgcc doesn't have __muloti4, we block that as well.
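For illustration only (a hedged sketch, not code from the commit or its tests), this is the kind of source that produces the @llvm.smul.with.overflow.i64 intrinsic in question:

```
#include <stdbool.h>

/* On 32-bit PowerPC neither __mulodi4 nor __multi3 can be assumed to
   exist, so this overflow check must be lowered to inline code rather
   than a libcall. */
bool mul64_overflows(long long a, long long b, long long *res) {
  return __builtin_mul_overflow(a, b, res);
}
```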

Fixes #54460.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D122090
2022-03-20 20:59:30 +01:00
Kazu Hirata bce1bf0ee2 [Transform] Apply clang-tidy fixes for readability-redundant-smartptr-get (NFC) 2022-03-20 10:41:22 -07:00
esmeyi de20a3b677 [XCOFF] support XCOFFObjectWriter for fileHeader and sectionHeaders in 64-bit XCOFF.
This is the first patch to enable the XCOFF64 object writer.
Currently only fileHeader and sectionHeaders are supported.

Reviewed By: jhenderson, DiggerLin

Differential Revision: https://reviews.llvm.org/D120861
2022-03-20 09:31:29 -04:00
Luo, Yuanke 10bb623192 enable binop identity constant folds for add
Differential Revision: https://reviews.llvm.org/D119654
2022-03-20 19:07:16 +08:00
Shengchen Kan 51e6059c12 [X86] Simplify function isDataInvariant by using X86MnemonicTables
This is not an NFC change because we add more instructions like
IMUL16/32/64r, MOV16ao16 and MOV16rr_REV etc. to the list.
But I think it's reasonable.

Reviewed By: Amir

Differential Revision: https://reviews.llvm.org/D122063
2022-03-20 18:38:58 +08:00
Simon Pilgrim 1ae3c4e948 [X86] combineAddOrSubToADCOrSBB - split to more cleanly handle commuted variants.
Split combineAddOrSubToADCOrSBB into wrapper (which handles ADDs with commuted args) and the real combine, which no longer has to account for commutation.

I'm intending to extend combineAddOrSubToADCOrSBB to detect patterns other than just X86ISD::SETCC, so we need to detect all patterns without detecting them as part of a commutation swap.
2022-03-20 09:14:21 +00:00
Shengchen Kan e58dadf3e2 [X86][NFC] Generate fields and getters for subtarget features
Non-duplicated comments are moved from X86Subtarget.h to X86.td.
This is a follow-up patch for D120906.
2022-03-20 15:27:21 +08:00
Shengchen Kan ae0ae91903 [X86][NFC] Remove unused variable UseAA 2022-03-20 13:21:25 +08:00
Shengchen Kan c266776429 [X86][NFC] Remove unused feature UseAA 2022-03-20 13:14:13 +08:00
Shengchen Kan 076a9dc99a [X86][NFC] Rename hasCMOV() to canUseCMOV(), hasLAHFSAHF() to canUseLAHFSAHF()
To make them less like other feature functions.
This is a follow-up patch for D121978.
2022-03-20 12:00:25 +08:00
Craig Topper 4eb59f0179 [SelectionDAG][RISCV] Make RegsForValue::getCopyToRegs explicitly zero_extend constants.
ComputePHILiveOutRegInfo assumes that constant incoming values to
Phis will be zero extended if they aren't a legal type. To guarantee
that, we should zero_extend rather than any_extend constants.

This fixes a bug for RISCV where any_extend of constants can be
treated as a sign_extend.

Differential Revision: https://reviews.llvm.org/D122053
2022-03-19 18:43:14 -07:00
Jon Chesterfield bcbd4cf1f2 Revert "[amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal"
Reconsidered, better to handle per-function state in the constructor as before.
This reverts commit 98e474c1b3.
2022-03-20 00:58:26 +00:00
Philip Reames 6253b77da9 [SLP] Respect control dependence within a block during scheduling
This fixes an active miscompile visible in the test changes.  The basic problem is that the scheduling dependency graph didn't have any edges for control dependence within a single basic block.  The result is that we could (and in some rare cases *did*) perform reorderings within a block which could introduce new undefined behavior along paths which didn't previously contain any.

Impact-wise, we have two major cases where control is not guaranteed to reach a later instruction in the block: calls which may throw, and calls containing infinite loops.
* The former case was mostly covered by the memory dependencies, and triggering it requires a function which can throw but does not write to memory.  In theory, such a case is possible, but not likely in practice.
* The latter case is likely more of an issue in practice (see the sketch below).  After this code was first written, we changed the IR semantics to allow well-defined infinite loops without satisfying mustprogress.  Even for C/C++ - which do imply mustprogress - recent changes to how we treat atomics (e.g. an atomic read does not always imply a write) could expose this issue.  I'm a bit shocked we don't seem to have a bug report which hit this in real code.
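A minimal, hypothetical C sketch of the second case (assuming an external call the optimizer cannot prove returns; not the test case from this commit):

```
extern void maybe_never_returns(void);  /* may contain an infinite loop */

void f(int *a, int n) {
  maybe_never_returns(); /* control may never reach the statements below */
  a[0] = a[0] / n;       /* UB if n == 0, but only on paths that reach it; */
  a[1] = a[1] / n;       /* the scheduler must not hoist these over the call */
}
```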

Compile-time wise, this results in a single extra scan of the scheduling window in the common case.  Since we stop scanning at the next instruction which isn't guaranteed to execute, no matter what order we traverse instructions in, we scan the block once.  The exception to this is that when we extend the scheduling window downwards, we invalidate all dependencies and thus rescan.  So the potentially expensive case is when we have a call in a big scheduling window which is frequently extended.  We could optimize this case (by caching the last instruction not guaranteed to transfer execution and scanning only the extended window starting there), but I decided to leave the complexity until it mattered.  That same case is already degenerate with memory dependences, which are more expensive than the control dependence scan.

We could also consider combining the memory dependence and control dependence sets to reduce memory usage, but since it complicates the code slightly and makes debugging a bit harder, I went with the simplest scheme for now.

This was noticed while trying to understand the failures reported against D118538, but is not otherwise related to that change.
2022-03-19 13:36:24 -07:00
Tal Kedar 129311ac0b [libSupport] make CallBacksToRun static local
In order to allow compiling with -Werror=global-constructors under C++20 and above.

Discussion: https://discourse.llvm.org/t/llvm-lib-support-signals-cpp-fails-to-compile-due-to-werror-global-constructors/61070

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D122067
2022-03-19 20:31:04 +00:00
Florian Hahn 1a820ff039
[LV] Remove unnecessary uses of Loop* (NFC).
Update functions that previously took a loop pointer but only used it to get the
pre-header. Instead, pass the block directly. This removes the
requirement for the loop object to be created up-front.
2022-03-19 20:18:47 +00:00
Craig Topper 57b41af838 [X86] Rename FeatureCMPXCHG8B/FeatureCMPXCHG16B to FeatureCX8/CX16 to match CPUID.
Rename hasCMPXCHG16B() to canUseCMPXCHG16B() to make it less like other
feature functions. Add a similar canUseCMPXCHG8B() that aliases
hasCX8() to keep similar naming.

Differential Revision: https://reviews.llvm.org/D121978
2022-03-19 12:34:06 -07:00
Johannes Doerfert 4166738c38 [OpenMP][FIX] Do not crash when kernels are debug wrapper functions
With debug information enabled (-g) Clang will wrap the actual target
region into a new function which is called from the "kernel". The problem
is that the "kernel" is now basically a wrapper without all the things
we expect. More importantly, if we end up asking for an AAKernelInfo
for the "target region function" we might try to turn it into SPMD mode.
That used to cause an assertion as that function doesn't have an
appropriately named `_exec_mode` global. While the global is going away
soon we still need to make sure to properly handle this case, e.g.,
perform optimizations reliably.

Differential Revision: https://reviews.llvm.org/D122043
2022-03-19 14:15:55 -05:00
Simon Pilgrim 34110a7320 [X86] combineAddOrSubToADCOrSBB - pull out repeated Y.getOperand(1) calls. NFC. 2022-03-19 17:56:11 +00:00
Simon Pilgrim b90478d422 [X86] createShuffleMaskFromVSELECT - handle BLENDV constant masks as well as VSELECT constant masks
Handle constant masks for both vselect nodes (mask != 0) and blendv nodes (mask < 0)
2022-03-19 16:51:07 +00:00
Jon Chesterfield 98e474c1b3 [amdgpu][nfc] Pass function instead of module to allocateModuleLDSGlobal 2022-03-19 16:42:17 +00:00
Simon Pilgrim a6c18bfbe3 [X86] combineSelect - don't constant fold BLENDV nodes like VSELECT
If an X86ISD::BLENDV op appears before legalization (in this test case due to the icmp_slt x, 0), its constant mask was being treated as a vselect mask (mask != 0) instead of a blendv mask (mask < 0).
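For reference, a scalar per-lane sketch of the two mask interpretations (illustrative only, not the DAG code):

```
#include <stdint.h>

/* VSELECT selects on any non-zero mask element, while X86ISD::BLENDV
   (like the BLENDV* instructions) only looks at the sign bit. */
int8_t vselect_lane(int8_t mask, int8_t a, int8_t b) { return mask != 0 ? a : b; }
int8_t blendv_lane(int8_t mask, int8_t a, int8_t b)  { return mask < 0 ? a : b; }
```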

This just prevents constant folding entirely for non-VSELECT ops.
2022-03-19 16:31:19 +00:00
Simon Pilgrim 56ad791f46 [X86] LowerAndToBT - fold BT(NOT(X),Y) -> BT(X,Y) and flip the CondCode 2022-03-19 14:03:03 +00:00
Simon Pilgrim c7ba5a9aff [X86][SSE] Add initial support for extracting non-constant bool vector elements
We can use MOVMSK+TEST/BT to extract individual bool elements even if the index isn't constant

This relies on combineBitcastvxi1 so some AVX512 cases still aren't optimized as they avoid MOVMSK usage.
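Roughly what the emitted sequence does, written by hand with SSE2 intrinsics (an illustrative sketch; the actual combine works on the vXi1 mask directly):

```
#include <emmintrin.h>

/* Extract the i-th bool of a 16 x i8 compare result with a variable index:
   MOVMSK collapses the sign bits into a scalar mask, then a shift/BT picks
   out bit i. */
int extract_bool(__m128i cmp_result, int i) {
  int bits = _mm_movemask_epi8(cmp_result);
  return (bits >> i) & 1;
}
```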
2022-03-19 13:31:05 +00:00
chenglin.bi dd3b90e4d7 [AArch64] Combine ISD::SETCC into AArch64ISD::ANDS
When N > 12, (2^N - 1) is not a legal add immediate (isLegalAddImmediate will return false).
And if the SetCC input uses this number, the DAG combiner will generate one more SRL instruction.
So combine [setcc (srl x, imm), 0, ne] into [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison.
Fixes https://github.com/llvm/llvm-project/issues/54283
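A hypothetical C-level example of the pattern (assumed for illustration; the linked issue has the original reproducer):

```
/* (x >> 13) != 0 is the DAG pattern [setcc (srl x, 13), 0, ne].  After the
   combine it becomes [setcc (and x, (-1 << 13)), 0, ne], which emitComparison
   can emit as a single ANDS instead of an extra LSR before the compare. */
int high_bits_set(unsigned long x) {
  return (x >> 13) != 0;
}
```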

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D121449
2022-03-19 13:04:16 +00:00
Paul Walker f46fe36d59 [AArch64] Fix incorrect getSetCCInverse usage within trySwapVSelectOperands.
When inverting the compare predicate, trySwapVSelectOperands is
incorrectly using the type of the select's cond operand rather
than the type of cond's operands. This means we're treating all
inversions as if they were integer.

Differential Revision: https://reviews.llvm.org/D121968
2022-03-19 12:36:14 +00:00
Shengchen Kan ea9139fe16 [Xcore] Set Int_MemBarrier as a meta-instruction
Reviewed By: nigelp-xmos

Differential Revision: https://reviews.llvm.org/D121982
2022-03-19 17:52:37 +08:00
Craig Topper 306ff74154 [SelectionDAG] Use APInt::zextOrSelf instead of zextOrTrunc in ComputePHILiveOutRegInfo
The width never decreases here.
2022-03-18 23:26:19 -07:00
Chris Bieneman 95871187bf Add DXIL triple
This patch adds triple support for:

* dxil architecture
* shadermodel OS (with version parsing)
* shader stages as environment

Reviewed By: MaskRay, pete

Differential Revision: https://reviews.llvm.org/D122031
2022-03-19 00:17:43 -05:00
Eli Friedman ddca66622c [ARM] Fix shouldExpandAtomicLoadInIR for subtargets without ldrexd.
Regression from 2f497ec3; we should not try to generate ldrexd on
targets that don't have it.

Also, while I'm here, fix shouldExpandAtomicStoreInIR, for consistency.
That doesn't really have any practical effect, though.  On Thumb targets
where we need to use __sync_* libcalls, there is no libcall for stores,
so SelectionDAG calls __sync_lock_test_and_set_8 anyway.
2022-03-18 15:54:38 -07:00
Fangrui Song c6692f819e [GlobalOpt] Don't replace alias with aliasee if either alias/aliasee may be preemptible
Generalize D99629 for ELF. A default visibility non-local symbol is preemptible
in a -shared link. `isInterposable` is an insufficient condition.

Moreover, a non-preemptible alias may be referenced in a sub constant expression
which intends to lower to a PC-relative relocation. Replacing the alias with a
preemptible aliasee may introduce a linker error.

Respect dso_preemptable and suppress the optimization to fix the above issues. With
the change, `alias = 345` will not be rewritten to use aliasee in a `-fpic`
compile.
```
int aliasee;
extern int alias __attribute__((alias("aliasee"), visibility("hidden")));
void foo() { alias = 345; } // intended to access the local copy
```

While here, refine the condition for the alias as well.

For some binary formats like COFF, `isInterposable` is a sufficient condition.
But I think canonicalization for the changed case has little advantage, so I
don't bother to add the `Triple(M.getTargetTriple()).isOSBinFormatELF()` or
`getPICLevel/getPIELevel` complexity.

For instrumentations, it's recommended not to create aliases that refer to
globals that have a weak linkage or are preemptible. However, the following is
supported and the IR needs to handle such cases.
```
int aliasee __attribute__((weak));
extern int alias __attribute__((alias("aliasee")));
```

There are other places where GlobalAlias isInterposable usage may need to be
fixed.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D107249
2022-03-18 14:17:05 -07:00
Arthur Eubanks ddc702376a [NewPM] Don't skip SCCs not in current RefSCC
With D107249 I saw huge compile time regressions on a module (150s ->
5700s). This turned out to be due to a huge RefSCC in
the module. As we ran the function simplification pipeline on functions
in the SCCs in the RefSCC, some of those SCCs would be split out to
their RefSCC, a child of the current RefSCC. We'd skip the remaining
SCCs in the huge RefSCC because the current RefSCC is now the RefSCC
just split out, then revisit the original huge RefSCC from the
beginning.  This happened many times because many functions in the
RefSCC were optimizable to the point of becoming their own RefSCC.

This patch makes it so we don't skip SCCs not in the current RefSCC so
that we split out all the child RefSCCs on the first iteration of
RefSCC. When we split out a RefSCC, we invalidate the original RefSCC
and add the remainder of the SCCs into a new RefSCC in
RCWorklist. This happens repeatedly until we finish visiting all
SCCs, at which point there is only one valid RefSCC in
RCWorklist from the original RefSCC containing all the SCCs that
were not split out, and we visit that.

For example, in the newly added test cgscc-refscc-mutation-order.ll,
we'd previously run instcombine in this order:
f1, f2, f1, f3, f1, f4, f1

Now it's:
f1, f2, f3, f4, f1

This can cause more passes to be run in some specific cases,
e.g. if f1<->f2 gets optimized to f1<-f2, we'd previously run f1, f2;
now we run f1, f2, f2.

This improves kimwitu++ compile times by a lot (12-15% for various -O3 configs):
https://llvm-compile-time-tracker.com/compare.php?from=2371c5a0e06d22b48da0427cebaf53a5e5c54635&to=00908f1d67400cab1ad7bcd7cacc7558d1672e97&stat=instructions

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D121953
2022-03-18 14:16:29 -07:00
Eli Friedman 5cd9fa551e Fix computation of MadeChange bit in AtomicExpandPass.
Fixes llvm-clang-x86_64-expensive-checks-debian failure with 2f497ec3.

expandAtomicStore always modifies the function, so make sure we set
MadeChange unconditionally. Not sure how nobody else has stumbled over
this before.
2022-03-18 13:47:11 -07:00
Stanislav Mekhanoshin e9a49c6483 [AMDGPU] gfx940 basic speed model
This is incomplete and will handle more instructions as they are added.

Differential Revision: https://reviews.llvm.org/D121966
2022-03-18 13:19:47 -07:00
Stanislav Mekhanoshin 4570527e72 [AMDGPU] Disable some MFMA instructions on gfx940
Differential Revision: https://reviews.llvm.org/D121956
2022-03-18 13:19:12 -07:00
Stanislav Mekhanoshin 0a79e1f30a [AMDGPU] reuse blgp as neg in 2 mfma operations on gfx940
GFX940 repurposes BLGP as NEG only in DGEMM MFMA.

Differential Revision: https://reviews.llvm.org/D121745
2022-03-18 12:56:51 -07:00
Eli Friedman 2f497ec3a0 [ARM] Fix ARM backend to correctly use atomic expansion routines.
Without this patch, clang would generate calls to __sync_* routines on
targets where it does not make sense; we can't assume the routines exist
on unknown targets. Linux has special implementations of the routines
that work on old ARM targets; other targets have no such routines. In
general, atomics operations which aren't natively supported should go
through libatomic (__atomic_*) APIs, which can support arbitrary atomics
through locks.
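A hedged illustration of the intended lowering (the exact libcall depends on the target and runtime; the names below are the usual libatomic entry points, assumed for the example):

```
#include <stdatomic.h>

/* On an ARM target without native 64-bit atomic instructions, this should
   go through libatomic (an __atomic_* libcall, which may take a lock)
   rather than assuming a __sync_* helper exists. */
long long fetch_add64(_Atomic long long *p, long long v) {
  return atomic_fetch_add(p, v);
}
```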

ARM targets older than v6, where this patch makes a difference, are rare
in practice, but not completely extinct. See, for example, discussion on
D116088.

This also affects Cortex-M0, but I don't think __sync_* routines
actually exist in any Cortex-M0 libraries. So in practice this just
leads to a slightly different linker error for those cases, I think.

Mechanically, this patch does the following:

- Ensures we run atomic expansion unconditionally; it never makes sense to
completely skip it.
- Fixes getMaxAtomicSizeInBitsSupported() so it returns an appropriate
number on all ARM subtargets.
- Fixes shouldExpandAtomicRMWInIR() and shouldExpandAtomicCmpXchgInIR() to
correctly handle subtargets that don't have atomic instructions.

Differential Revision: https://reviews.llvm.org/D120026
2022-03-18 12:43:57 -07:00
Simon Pilgrim 5dde9c1286 [CostModel][X86] Reduce cost of extracting bool vector elements
For constant indices, these are now just a MOVMSK+TEST/BT
2022-03-18 19:02:47 +00:00
Philip Reames 1093949cff [SLP] Add comment clarifying assumption that tripped me up [NFC]
I keep thinking this assumption is probably exploitable for a bug in the existing implementation, but all of my attempts at writing a test case have failed.  So for the moment, just document this very subtle assumption.
2022-03-18 11:40:19 -07:00
Kazu Hirata 3e0f7c7881 [Vectorize] Fix an 'unused function' warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3917:13: error:
  unused function 'needToScheduleSingleInstruction'
  [-Werror,-Wunused-function]
2022-03-18 11:24:57 -07:00
Kazu Hirata 6cf1bd3ad3 [VE] Fix an 'unused variable' warning
This patch fixes:

  llvm/lib/Target/VE/VVPISelLowering.cpp:186:11: error: unused
  variable 'PassThru' [-Werror,-Wunused-variable]
2022-03-18 11:24:56 -07:00
Kazu Hirata b3d8c0d069 [Vectorize] Fix an 'unused variable' warning
This patch fixes:

  llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8148:18: error:
  unused variable 'SDTE' [-Werror,-Wunused-variable]
2022-03-18 11:24:54 -07:00
Nick Desaulniers e1bae23f6f [SCCP] do not clean up dead blocks that have their address taken
Fixes a crash observed in IPSCCP.

Because the SCCPSolver has already internalized BlockAddresses as
Constants or ConstantExprs, we don't want to try to update their Values
in the ValueLatticeElement. Instead, continue to propagate these
BlockAddress Constants, continue converting BasicBlocks to unreachable,
but don't delete the "dead" BasicBlocks which happen to have their
address taken.  Leave replacing the BlockAddresses to another pass.
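A rough GNU-C sketch of a block whose address is taken (hypothetical, not the reproducer from the linked issues):

```
/* The label 'dead' may become unreachable after constant propagation, but
   its blockaddress is still referenced, so the block must not be deleted;
   it is only converted to unreachable. */
void *taken(void) {
  static void *addr = &&dead;
  if (0)
    goto *addr;
  return addr;
dead:
  return 0;
}
```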

Fixes: https://github.com/llvm/llvm-project/issues/54238
Fixes: https://github.com/llvm/llvm-project/issues/54251

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D121744
2022-03-18 11:02:15 -07:00
Philip Reames 8f108c32bc Revert "[SLP] Optionally preserve MemorySSA"
This reverts commit 1cfa986d68.  See https://github.com/llvm/llvm-project/issues/54256 for why I'm discontinuing the project.

Separately, it turns out that while this patch does correctly preserve MSSA, it's correct only at the end of the pass, not between vectorization attempts.  Even if we decide to resurrect this, we'll need to fix that before reapplying.
2022-03-18 10:45:59 -07:00
Florian Mayer 078b546555 [HWASan] do not replace lifetime intrinsics with tagged address.
Quote from the LLVM Language Reference
  If ptr is a stack-allocated object and it points to the first byte of the
  object, the object is initially marked as dead. ptr is conservatively
  considered as a non-stack-allocated object if the stack coloring algorithm
  that is used in the optimization pipeline cannot conclude that ptr is a
  stack-allocated object.

By replacing the alloca pointer with the tagged address before this change,
we confused the stack coloring algorithm.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D121835
2022-03-18 10:39:51 -07:00
Florian Mayer dbc918b649 Revert "[HWASan] do not replace lifetime intrinsics with tagged address."
Failed on buildbot:

/home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/llc: error: : error: unable to get target for 'aarch64-unknown-linux-android29', see --version and --triple.
FileCheck error: '<stdin>' is empty.
FileCheck command line:  /home/buildbot/buildbot-root/llvm-clang-x86_64-sie-ubuntu-fast/build/bin/FileCheck /home/buildbot/buildbot-root/llvm-project/llvm/test/Instrumentation/HWAddressSanitizer/stack-coloring.ll --check-prefix=COLOR

This reverts commit 208b923e74.
2022-03-18 10:04:48 -07:00
Florian Hahn 5ab421fb4e
[LICM] Add allowspeculation pass options.
This adds a new option to control AllowSpeculation added in D119965 when
using `-passes=...`.

This allows reproducing #54023 using opt.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D121944
2022-03-18 16:51:57 +00:00
Florian Mayer 208b923e74 [HWASan] do not replace lifetime intrinsics with tagged address.
Quote from the LLVM Language Reference
  If ptr is a stack-allocated object and it points to the first byte of the
  object, the object is initially marked as dead. ptr is conservatively
  considered as a non-stack-allocated object if the stack coloring algorithm
  that is used in the optimization pipeline cannot conclude that ptr is a
  stack-allocated object.

By replacing the alloca pointer with the tagged address before this change,
we confused the stack coloring algorithm.

Reviewed By: eugenis

Differential Revision: https://reviews.llvm.org/D121835
2022-03-18 09:45:05 -07:00
Tomas Matheson 831ab35b2f [ARM][AArch64] generate subtarget feature flags
Reland of D120906 after sanitizer failures.

This patch aims to reduce a lot of the boilerplate around adding new subtarget
features. From the SubtargetFeatures tablegen definitions, a series of calls to
the macro GET_SUBTARGETINFO_MACRO are generated in
ARM/AArch64GenSubtargetInfo.inc.  ARMSubtarget/AArch64Subtarget can then use
this macro to define bool members and the corresponding getter methods.

Some naming inconsistencies have been fixed to allow this, and one unused
member removed.

This implementation only applies to boolean members; in future both BitVector
and enum members could also be generated.

Differential Revision: https://reviews.llvm.org/D120906
2022-03-18 16:07:00 +00:00
Florian Hahn 1b7ef6aac8
[BasicAA] Account for wrapping when using abs(VarIndex) >= abs(Scale).
The patch adds an extra check to only set MinAbsVarIndex if
abs(V * Scale) won't wrap. In the absence of IsNSW, try to use the
bitwidths of the original V and Scale to rule out wrapping.

Attempt to model https://alive2.llvm.org/ce/z/HE8ZKj

The code in the else if below probably needs the same treatment, but I
need to come up with a test first.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D121695
2022-03-18 14:41:15 +00:00
Kevin P. Neal bd050a34fe [FPEnv][InstSimplify] Teach CannotBeNegativeZero() about constrained intrinsics.
Currently some optimizations are disabled because llvm::CannotBeNegativeZero()
does not know how to deal with the constrained intrinsics. This patch fixes
that by extending the existing implementation.

Differential Revision: https://reviews.llvm.org/D121483
2022-03-18 10:24:48 -04:00