Commit Graph

147671 Commits

Author SHA1 Message Date
Jingu Kang a2a0ac42ab [SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV
This pass transforms loops that contain a conditional branch with induction
variable. For example, it transforms left code to right code:

                             newbound = min(n, c)
 while (iv < n) {            while(iv < newbound) {
   A                           A
   if (iv < c)                 B
     B                         C
   C                         }
 }                           if (iv != n) {
                               while (iv < n) {
                                 A
                                 C
                               }
                             }

Differential Revision: https://reviews.llvm.org/D102234
2021-06-07 10:55:25 +01:00
Florian Hahn 23c2f2e6b2
[LV] Mark increment of main vector loop induction variable as NUW.
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.

If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.

This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.

At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.

Note that this could probably be further improved by using information
from the original IV.

Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g

Part of a set of fixes required for PR50412.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D103255
2021-06-07 10:47:52 +01:00
Esme-Yi bcb20aa770 Fixed the build failure of yaml2obj in XCOFFEmitter.cpp:
error: ambiguous overload for 'operator=='
  (operand types are 'llvm::yaml::Hex16' and 'llvm::XCOFF::MagicNumber')
     Is64Bit = Obj.Header.Magic == XCOFF::XCOFF64;
2021-06-07 05:45:05 +00:00
Esme-Yi 50bb1b930d [yaml2obj] Initial the support of yaml2obj for 32-bit XCOFF.
Summary: The patch implements the mapping of the Yaml
information to XCOFF object file to enable the yaml2obj
tool for XCOFF. Currently only 32-bit is supported.

Reviewed By: jhenderson, shchenz

Differential Revision: https://reviews.llvm.org/D95505
2021-06-07 04:14:44 +00:00
Simon Pilgrim 432eff22ab [CostModel][X86] Add 512-bit bswap costs 2021-06-06 22:36:34 +01:00
Simon Pilgrim ae973380c5 [CostModel][X86] Improve AVX512 FDIV costs
Add missing v16f32/v8f64 costs and adjust other costs as well based off the SkylakeServer model
2021-06-06 21:41:05 +01:00
Craig Topper 8bde5f06a1 [RISCV] Replace && with ||. Spotted by coverity.
We should be exiting when the shift amount is greater than
the bit width regardless of whether it is a power of 2.

Reported by Simon Pilgrim here https://reviews.llvm.org/D96661

This requires getting a shift amount that is out of bounds that
wasn't already optimized by SelectionDAG. This would be pretty
trick to construct a test for.

Or it would require a non-power of 2 shift amount and a mask
that has runs of ones and zeros of the next lowest power of 2 from
that shift amount. I tried a little to produce a test for this,
but didn't get it to work.
2021-06-06 13:09:51 -07:00
Simon Pilgrim 8ab8b3fad7 [X86][SSE] LowerFP_TO_INT - remove dead code. NFCI.
Non-Strict v2f32->v2i64 cases have already early-returned to be handled by legalization.
2021-06-06 20:04:15 +01:00
Simon Pilgrim 4879c8f3b0 [X86][SSE] combineVectorTruncation - simplify PSHUFB-is-better logic. NFCI.
OutSVT is guaranteed to be i8/i16 and we accept any InSVT that isn't i64
2021-06-06 20:04:14 +01:00
maekawatoshiki 0a9d079931 Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit 2165360003.

To fix the crash problem in legacy pass manager
2021-06-07 01:26:47 +09:00
Simon Pilgrim b69e16b5cc X86MachObjectWriter.cpp - silence null deference warnings. NFCI.
The MCSymbol data should always be present for non-absolute sections so assert that it is to silence static analysis warnings.
2021-06-06 15:33:47 +01:00
Nikita Popov 1ffa6499ea [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Simon Pilgrim 0f938a6ed8 X86Operand.h - fix uninitialized variable warnings in constructor. NFCI. 2021-06-06 15:25:03 +01:00
Simon Pilgrim 76a1be05fa AssumeBundleQueries.cpp - don't dereference a dyn_cast<> result. NFCI.
Use cast<> instead which will assert that the cast is correct and not just return null - the match() should have already failed if the cast isn't valid anyhow.

Fixes static analysis warning.
2021-06-06 15:25:03 +01:00
Nikita Popov 506875c879 [TargetLowering] Move methods out of line (NFC)
Move methods using IRBuilder out of line, so we can drop the
dependency on the header.
2021-06-06 16:02:10 +02:00
Nikita Popov 9914200393 [CodeGen] Add missing includes (NFC)
These currently rely on the IRBuilder.h include in TargetLowering.h.
Make them explicit.
2021-06-06 15:48:27 +02:00
Simon Pilgrim 9ced408fe9 SimplifyCFG.cpp - remove dead early-return code added at rGcc63203908da. NFCI.
We've already checked that ScanIdx == 0 a few lines above.
2021-06-06 14:15:11 +01:00
Simon Pilgrim 937c4cffd0 Fix implicit fall through compiler warning. NFCI. 2021-06-06 13:45:11 +01:00
Simon Pilgrim ab2d295552 BPFISelDAGToDAG.cpp - don't dereference a dyn_cast<> result. NFCI.
Use cast<> instead which will assert that the cast is correct and not just return null.

Fixes static analysis warnings.
2021-06-06 13:24:29 +01:00
Liqiang Tao 48252d7570 Revert "[llvm] Add interface to order inlining" 2021-06-06 14:45:03 +08:00
Liqiang Tao 478dc47292 [llvm] Add interface to order inlining
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order, i.e. use queue or priority queue.

Reviewed By: kazu

Differential Revision: https://reviews.llvm.org/D103315
2021-06-06 12:03:02 +08:00
Simon Pilgrim e8423dbf35 BranchProbability.cpp - add missing implicit cmath header dependency. NFCI.
Noticed while investigating if we can remove an unnecessary MathExtras.h include from SmallVector.h (necessary for gcc builds but not MSVC)
2021-06-05 21:14:43 +01:00
David Green 12f53e5392 [AArch64] Remove AArch64ISD::NEG
This NEG node is just a vector negation, easily represented as a SUB
zero. Removing it from the one place it is generated is essentially an
NFC, but can allow some extra folding. The updated tests are now loading
different constant literals, which have already been negated.

Differential Revision: https://reviews.llvm.org/D103703
2021-06-05 19:54:42 +01:00
Simon Pilgrim be51737f59 Fix "not all control paths return a value" MSVC warning. NFCI. 2021-06-05 19:42:00 +01:00
Simon Pilgrim 24b9bc8498 MsgPackReader.cpp - add missing implicit MathExtras.h header dependency. NFCI.
Noticed while investigating if we can remove an unnecessary MathExtras.h include from SmallVector.h
2021-06-05 18:05:40 +01:00
Simon Pilgrim e32d73ef5e NativeFormatting.cpp - add missing implicit MathExtras.h header dependency. NFCI.
Noticed while investigating if we can remove an unnecessary MathExtras.h include from SmallVector.h
2021-06-05 18:05:39 +01:00
Roman Lebedev e350494fb0
[NFC] Promote willNotOverflow() / getStrengthenedNoWrapFlagsFromBinOp() from IndVars into SCEV proper
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to loose some optimization potential.
2021-06-05 12:17:51 +03:00
Nikita Popov db45746821 [LoopUnroll] Separate peeling from unrolling
Loop peeling is currently performed as part of UnrollLoop().
Outside test scenarios, it is always performed with an unroll
count of 1. This means that unrolling doesn't actually do anything
apart from performing post-unroll simplification.

When testing, it's currently possible to specify both an explicit
peel count and an explicit unroll count. This doesn't perform any
sensible operation and may result in miscompiles, see
https://bugs.llvm.org/show_bug.cgi?id=45939.

This patch moves peeling from UnrollLoop() into tryToUnrollLoop(),
so that peeling does not also perform a susequent unroll. We only
run the post-unroll simplifications. Specifying both an explicit
peel count and unroll count is forbidden.

In the future, we may want to support both (non-PGO) peeling a
loop and unrolling it, but this needs to be done by first performing
the peel and then recalculating unrolling heuristics on a now
possibly analyzable loop.

Differential Revision: https://reviews.llvm.org/D103362
2021-06-05 10:32:00 +02:00
Vitaly Buka e3258b0894 Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always)."
Windows is still broken.

This reverts commit 927688a4cd.
2021-06-05 00:39:50 -07:00
Kevin Athey 927688a4cd Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always).
In addition:
  - optionally add global flag to capture compile intent for UAR:
    __asan_detect_use_after_return_always.
    The global is a SANITIZER_WEAK_ATTRIBUTE.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D103304
2021-06-05 00:26:10 -07:00
Fangrui Song 06e7de795b Fix some -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=off build 2021-06-04 23:34:43 -07:00
Jim Lin 170b70b74b [RISCV] Replace (XLenVT (VLOp GPR:$vl)) with VLOpFrag
This is for D100288 to reduce the changes.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103682
2021-06-05 12:49:31 +08:00
Vitaly Buka d8a4a2cb93 Revert "Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always)."
Reverts commits of D103304, it breaks Darwin.

This reverts commit 60e5243e59.
This reverts commit 26b3ea224e.
This reverts commit 17600ec32a.
2021-06-04 20:20:11 -07:00
Kevin Athey 60e5243e59 Update and improve compiler-rt tests for -mllvm -asan_use_after_return=(never|[runtime]|always).
In addition:
  - optionally add global flag to capture compile intent for UAR:
    __asan_detect_use_after_return_always.
    The global is a SANITIZER_WEAK_ATTRIBUTE.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D103304
2021-06-04 16:30:47 -07:00
Roman Lebedev 852497711d
[X86] AMD Zen 3: double the LoopMicroOpBufferSize
While the IndVars issue (PR50384) has been resolved,
and the compile performance improved, a new blocker emerged,
the codegen machine instruction scheduling is also quadratic.
So we still can't really specify the right value here.

Filed PR50584.
2021-06-05 01:23:58 +03:00
Fangrui Song 9e51d1f348 [InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat
`__profd_*` variables are referenced by code only when value profiling is
enabled. If disabled (e.g. default -fprofile-instr-generate), the symbols just
waste space on ELF/Mach-O. We change the comdat symbol from `__profd_*` to
`__profc_*` because an internal symbol does not provide deduplication features
on COFF. The choice doesn't matter on ELF.

(In -DLLVM_BUILD_INSTRUMENTED_COVERAGE=on build, there is now no `__profd_*` symbols.)

On Windows this enables further optimization. We are no longer affected by the
link.exe limitation: an external symbol in IMAGE_COMDAT_SELECT_ASSOCIATIVE can
cause duplicate definition error.
https://lists.llvm.org/pipermail/llvm-dev/2021-May/150758.html
We can thus use llvm.compiler.used instead of llvm.used like ELF (D97585).
This avoids many `/INCLUDE:` directives in `.drectve`.

Here is rnk's measurement for Chrome:
```
This reduced object file size of base_unittests.exe, compiled with coverage, optimizations, and gmlt debug info by 10%:

#BEFORE

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
1047758867

$ du -cksh base_unittests.exe
82M     base_unittests.exe
82M     total

# AFTER

$ find . -iname '*.obj' | xargs du -b | awk '{ sum += $1 } END { print sum}'
937886499

$ du -cksh base_unittests.exe
78M     base_unittests.exe
78M     total
```

The change is NFC for Mach-O.

Reviewed By: davidxl, rnk

Differential Revision: https://reviews.llvm.org/D103372
2021-06-04 13:27:56 -07:00
Nikita Popov 14f350daf2 [IndVars] Don't forget value when inferring nowrap flags
When SimplifyIndVars infers IR nowrap flags from SCEV, this may
happen in two ways: Either nowrap flags were already present in
SCEV and just get transferred to IR. Or zero/sign extension of
addrecs infers additional nowrap flags, and those get transferred
to IR. In the latter case, calling forgetValue() ensures that the
newly inferred nowrap flags get propagated to any other SCEV
expressions based on the addrec. However, the invalidation can
also have a major compile-time effect in some cases. For
https://bugs.llvm.org/show_bug.cgi?id=50384 with n=512 compile-
time drops from 7.1s to 0.8s without this invalidation. At the
same time, removing the invalidation doesn't affect any codegen
in test-suite.

Differential Revision: https://reviews.llvm.org/D103424
2021-06-04 20:57:22 +02:00
Rong Xu 8d581857d7 [SampleFDO] New hierarchical discriminator for FS SampleFDO (llvm-profdata part)
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is for llvm-profdata part of change. It sets the bit masks for the
profile reader in llvm-profdata. Also add an internal option
"-fs-discriminator-pass" for show and merge command to process the profile
offline.

This patch also moved setDiscriminatorMaskedBitFrom() to
SampleProfileReader::create() to simplify the interface.

Differential Revision: https://reviews.llvm.org/D103550
2021-06-04 11:22:06 -07:00
Adam Nemet ffde966cd9 [Matrix] Fix transpose-multiply folding if transpose has multiple uses
Don't add it to FusedInsts in this case.

Differential Revision: https://reviews.llvm.org/D103627
2021-06-04 10:55:03 -07:00
Jessica Paquette 507d193ea7 [AArch64][GlobalISel] Handle multiple phis in fixupPHIOpBanks
If we ended up with two phi instructions in a block, and we needed to fix up
the banks for the first one, we'd end up inserting our COPY before the second
phi.

E.g.

```
%x = G_PHI ...
%fixup = COPY ...
%y = G_PHI ...
```

This is invalid MIR, and breaks assumptions made by the register allocator later
down the line. With the verifier enabled, it also emits a verification error.

This teaches fixupPHIOpBanks to walk past any phi instructions in the block
when emitting the fixup copies.

Here's an example of the crashing code (same as added testcase):
https://godbolt.org/z/h5j1x3o6e

Differential Revision: https://reviews.llvm.org/D103582
2021-06-04 09:59:36 -07:00
Mark Schimmel 12592a439a Add commutable attribute to opcodes for ARC
This patch sets the isCommutable attribute for several opcodes that have
the "reg = OPCODE reg, reg" format.

Differential Revision: https://reviews.llvm.org/D103653
2021-06-04 19:49:19 +03:00
Joseph Huber 4a08163c73 [Attributor] Check HeapToStack's state for isKnownHeapToStack
This patch changes the `isKnownHeapToStack` and `isAssumedHeapToStack`
member functions to return if a function call is going to be altered by
HeapToStack.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103574
2021-06-04 12:38:33 -04:00
Craig Topper c653711fd3 [RISCV] Teach vsetvli insertion pass that operations on masks don't care about SEW/LMUL.
All that really matters is that the VLMAX of the preceding
instructions is the same as the VLMAX required by the mask
operation.

Also update the vmsge(u) handling to use the SEW/LMUL we use for
other mask register operations. We were matching it to the compare
before. Some cases will be improve if we fix masked compares to
use tail agnostic policy. I think they ignore the tail policy
anyway.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D103299
2021-06-04 09:17:46 -07:00
Alexey Bataev c84a5448b5 [OPENMP]Fix PR50129: omp cancel parallel not working as expected.
Need to emit a call for __kmpc_cancel_barrier in the exit block for
__kmpc_cancel function call if cancellation of the parallel block is
requested.

Differential Revision: https://reviews.llvm.org/D103646
2021-06-04 08:24:55 -07:00
Bradley Smith a85f5874e2 [AArch64] Remove SETCC of CSEL when the latter's condition can be inverted
setcc (csel 0, 1, cond, X), 1, ne ==> csel 0, 1, !cond, X

Where X is a condition code setting instruction.

Co-authored-by: Paul Walker <paul.walker@arm.com>

Differential Revision: https://reviews.llvm.org/D103256
2021-06-04 15:53:21 +01:00
Nico Weber e9a9c85098 Revert "[InstrProfiling] If no value profiling, make data variable private and (for Windows) use one comdat"
This reverts commit a14fc749aa.
Breaks check-profile on macOS. See https://reviews.llvm.org/D103372 for details.
2021-06-04 10:00:12 -04:00
Nicholas Guy 3043cbc436 [AArch64] Further enable UnrollAndJam
Due to the dependency on runtime unrolling, UnJ is only
enabled by default on in-order scheduling models,
and if a cpu is specified through -mcpu.

Differential Revision: https://reviews.llvm.org/D103604
2021-06-04 14:18:49 +01:00
Mirko Brkusanin 35ef4c940b [AMDGPU][GlobalISel] Legalize G_ABS
Legalize and select G_ABS so that we can use llvm.abs intrinsic

Differential Revision: https://reviews.llvm.org/D102391
2021-06-04 14:46:43 +02:00
Dmitry Preobrazhensky cd093cbb11 [AMDGPU][MC][NFC] Fixed typos in parser
Differential Revision: https://reviews.llvm.org/D103680
2021-06-04 15:40:42 +03:00
Bradley Smith e42ee2d509 [AArch64][SVE] Add support for using reverse forms of SVE2 shifts
When using and ACLE intrinsic for an SVE2 shift, if the predicate passed
has all relevant lanes active, then use a reversed version of the
instruction if beneficial.
2021-06-04 12:56:53 +01:00