Commit Graph

138936 Commits

Author SHA1 Message Date
Martin Storsjö 46416f0803 [CodeGen] [WinException] Remove a redundant explicit section switch for aarch64
The following EmitWinEHHandlerData() implicitly switches to .xdata, just
like on x86_64.

This became orphaned from the original code requiring it in
0b61d220c9 / https://reviews.llvm.org/D61095.

Differential Revision: https://reviews.llvm.org/D87447
2020-09-11 10:31:04 +03:00
Michael Liao f787fe15d8 [EarlyCSE] Remove unnecessary operand swap.
- As min/max are commutative operators, there is no need to swap
  operands. That breaks the convention calculating the hash value.
2020-09-11 02:14:04 -04:00
Alok Kumar Sharma e45b0708ae [DebugInfo] Fixing CodeView assert related to lowerBound field of DISubrange.
This is to fix CodeView build failure https://bugs.llvm.org/show_bug.cgi?id=47287
    after DIsSubrange upgrade D80197

    Assert condition is now removed and Count is calculated in case LowerBound
    is absent or zero and Count or UpperBound is constant. If Count is unknown
    it is later handled as VLA (currently Count is set to zero).

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D87406
2020-09-11 11:12:49 +05:30
Mircea Trofin da92448828 [NFC][MLInliner] Presort instruction successions.
Differential Revision: https://reviews.llvm.org/D87489
2020-09-10 21:40:49 -07:00
Michael Liao 41e68f7ee7 [EarlyCSE] Fix and recommit the revised c9826829d7
In addition to calculate hash consistently by swapping SELECT's
operands, we also need to inverse the select pattern favor to match the
original logic.

[EarlyCSE] Equivalent SELECTs should hash equally

DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 23:30:56 -04:00
Michael Liao 39dc75f66c Revert "[EarlyCSE] Equivalent SELECTs should hash equally"
This reverts commit c9826829d7 as it
breaks regression tests.
2020-09-10 22:37:35 -04:00
Zarko Todorovski 035396197a Remove unused variable introduce in 0448d11a06 causing build
failures with -Werror on.
2020-09-10 20:07:11 -04:00
Reid Kleckner 2c73bef7fa Fix wrong comment about enabling optimizations to work around a bug 2020-09-10 16:45:20 -07:00
Amara Emerson 0448d11a06 [AArch64][GlobalISel] Don't emit a branch for a fallthrough G_BR at -O0.
With optimizations we leave the decision to eliminate fallthrough branches to
bock placement, but at -O0 we should do it in the selector to save code size.

This regressed -O0 with a recent change to a combiner.
2020-09-10 15:01:26 -07:00
Reid Kleckner 4e3edef4b8 Use pragmas to work around MSVC x86_32 debug miscompile bug
Halide users reported this here: https://llvm.org/pr46176
I reported the issue to MSVC here:
https://developercommunity.visualstudio.com/content/problem/1179643/msvc-copies-overaligned-non-trivially-copyable-par.html

This codepath is apparently not covered by LLVM's unit tests, so I added
coverage in a unit test.

If we want to support this configuration going forward, it means that is
in general not safe to pass a SmallVector<T, N> by value if alignof(T)
is greater than 4. This doesn't appear to come up often because passing
a SmallVector by value is inefficient and not idiomatic: it copies the
inline storage. In this case, the SmallVector<LLT,4> is captured by
value by a lambda, and the lambda is passed by value into std::function,
and that's how we hit the bug.

Differential Revision: https://reviews.llvm.org/D87475
2020-09-10 14:50:01 -07:00
Florian Hahn fb109c42d9 [DSE] Switch to MemorySSA-backed DSE by default.
The tests have been updated and I plan to move them from the MSSA
directory up.

Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.

One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat it's return value to
belong to the same underlying object as the passed pointer.

There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.

This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html

For the MultiSource/SPEC2000/SPEC2006 the number of eliminated stores
goes from ~17500 (legayc DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.

Impact on CTMark:
```
                                     Legacy Pass Manager
                        exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g.            + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLO-g (link only)      + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
                                     New Pass Manager
                       exec instrs.   size-text
O3                       + 0.95%       - 0.25%
ReleaseThinLTO           + 1.34%       - 0.41%
ReleaseLTO-g.            + 1.71%       - 0.35%
RelThinLTO (link only)   + 0.96%       - 0.41%
RelLO-g (link only)      + 2.21%       - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions

Reviewed By: asbirlea, xbolva00, nikic

Differential Revision: https://reviews.llvm.org/D87163
2020-09-10 22:24:32 +01:00
Bryan Chan c9826829d7 [EarlyCSE] Equivalent SELECTs should hash equally
DenseMap<SimpleValue> assumes that, if its isEqual method returns true
for two elements, then its getHashValue method must return the same value
for them. This invariant is broken when one SELECT node is a min/max
operation, and the other can be transformed into an equivalent min/max by
inverting its predicate and swapping its operands. This patch fixes an
assertion failure that would occur intermittently while compiling the
following IR:

    define i32 @t(i32 %i) {
      %cmp = icmp sle i32 0, %i
      %twin1 = select i1 %cmp, i32 %i, i32 0
      %cmpinv = icmp sgt i32 0, %i
      %twin2 = select i1 %cmpinv,  i32 0, i32 %i
      %sink = add i32 %twin1, %twin2
      ret i32 %sink
    }

Differential Revision: https://reviews.llvm.org/D86843
2020-09-10 16:59:24 -04:00
Lang Hames c74900ca67 [ORC] Make MaterializationResponsibility immovable, pass by unique_ptr.
Making MaterializationResponsibility instances immovable allows their
associated VModuleKeys to be updated by the ExecutionSession while the
responsibility is still in-flight. This will be used in the upcoming
removable code feature to enable safe merging of resource keys even if
there are active compiles using the keys being merged.
2020-09-10 13:21:46 -07:00
Nikita Popov a5168bdb4a [DemandedBits][BDCE] Add support for min/max intrinsics
Add DemandedBits / BDCE support for min/max intrinsics: If the low
bits are not demanded in the result, they also aren't demanded in
the operands.

Differential Revision: https://reviews.llvm.org/D87161
2020-09-10 22:13:31 +02:00
Nikita Popov 99e78cb718 [DemandedBits] Add braces to large if (NFC)
While the if only contains a single statement, it happens to be
a huge switch. Add braces to make this code easier to read.
2020-09-10 22:13:27 +02:00
Volkan Keles d4bf90271f GlobalISel: Combine fneg(fneg x) to x
https://reviews.llvm.org/D87473
2020-09-10 12:57:38 -07:00
Anna Thomas b1b9806370 [ImplicitNullChecks] NFC: Remove unused PointerReg arg in dep analysis
The PointerReg arg was passed into the dependence function for an
assertion which no longer exists. So, this patch updates the dependence
functions to avoid the PointerReg in the signature.

Tests-Run: make check
2020-09-10 15:31:57 -04:00
Christopher Tetreault 7ddfd9b3eb [SVE] Bail from VectorUtils heuristics for scalable vectors
Bail from maskIsAllZeroOrUndef and maskIsAllOneOrUndef prior to iterating over the number of
elements for scalable vectors.

Assert that the mask type is not scalable in possiblyDemandedEltsInMask .

Assert that the types are correct in all three functions.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D87424
2020-09-10 12:29:37 -07:00
Krzysztof Parzyszek 783e28a508 [Hexagon] Split pair-based masked memops 2020-09-10 14:24:42 -05:00
Dominic Chen 4252f3009b
[WebAssembly] Set unreachable as canonical to permit disassembly
Currently, using llvm-objdump to disassemble a function containing
unreachable will trigger an assertion while decoding the opcode, since both
unreachable and debug_unreachable have the same encoding. To avoid this, set
unreachable as the canonical decoding.

Differential Revision: https://reviews.llvm.org/D87431
2020-09-10 15:04:16 -04:00
Craig Topper c195ae2f00 [SLPVectorizer][X86][AMDGPU] Remove fcmp+select to fmin/fmax reduction support.
Previously we could match fcmp+select to a reduction if the fcmp had
the nonans fast math flag. But if the select had the nonans fast
math flag, InstCombine would turn it into a fminnum/fmaxnum intrinsic
before SLP gets to it. Seems fairly likely that if one of the
fcmp+select pair have the fast math flag, they both would.

My plan is to start vectorizing the fmaxnum/fminnum version soon,
but I wanted to get this code out as it had some of the strangest
fast math flag behaviors.
2020-09-10 11:49:19 -07:00
Fangrui Song a0ffe2b21a [PGO] Skip if an IndirectBrInst critical edge cannot be split
PGOInstrumentation runs `SplitIndirectBrCriticalEdges` but some IndirectBrInst
critical edge cannot be split. `getInstrBB` will crash when calling `SplitCriticalEdge`, e.g.

  int foo(char *p) {
    void *targets[2];
    targets[0] = &&indirect;
    targets[1] = &&end;
    for (;; p++)
      if (*p == 7) {
  indirect:
        goto *targets[p[1]]; // the self loop is critical in -O
      }
  end:
    return 0;
  }

Skip such critical edges to prevent a crash.

Reviewed By: davidxl, lebedev.ri

Differential Revision: https://reviews.llvm.org/D87435
2020-09-10 11:04:14 -07:00
Anna Thomas 46329f6079 [ImplicitNullCheck] Handle instructions that preserve zero value
This is the first in a series of patches to make implicit null checks
more general. This patch identifies instructions that preserves zero
value of a register and considers that as a valid instruction to hoist
along with the faulting load. See added testcases.

Reviewed-By: reames, dantrushin

Differential Revision: https://reviews.llvm.org/D87108
2020-09-10 13:39:50 -04:00
Mircea Trofin e543708e5e [NFC][ThinLTO] Let llvm::EmbedBitcodeInModule handle serialization.
llvm::EmbedBitcodeInModule handles serializing the passed-in module, if
the provided MemoryBufferRef is invalid. This is already the path taken
in one of the uses of the API - clang::EmbedBitcode, when called from
BackendConsumer::HandleTranslationUnit - so might as well do the same
here and reduce (by very little) code duplication.

The only difference this patch introduces is that the serialization happens
with ShouldPreserveUseListOrder set to true.

Differential Revision: https://reviews.llvm.org/D87339
2020-09-10 10:25:00 -07:00
Ettore Tiotto 6b13cfe739 [ArgumentPromotion]: Copy function metadata after promoting arguments
The argument promotion pass currently fails to copy function annotations
over to the modified function after promoting arguments.
This patch copies the original function annotation to the new function.

Reviewed By: fhann

Differential Revision: https://reviews.llvm.org/D86630
2020-09-10 13:08:57 -04:00
Kit Barton 009cd4e491 [PPC][GlobalISel] Add initial GlobalIsel infrastructure
This adds the initial GlobalISel skeleton for PowerPC. It can only run
ir-translator and legalizer for `ret void`.

This is largely based on the initial GlobalISel patch for RISCV
(https://reviews.llvm.org/D65219).

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D83100
2020-09-10 11:58:01 -05:00
Simon Pilgrim f42f733af9 SwitchLoweringUtils.h - reduce TargetLowering.h include. NFCI.
Only include the headers we actually need, and move the remaining includes down to implicit dependent files.
2020-09-10 17:42:18 +01:00
Owen Anderson 3d9c85e4d8 Mark FMOV constant materialization as being as cheap as a move.
This prevents us from doing things like LICM'ing it out of a loop,
which is usually a net loss because we end up having to spill a
callee-saved FPR to accomodate it.

This does perturb instruction scheduling around this instruction,
so a number of tests had to be updated to account for it.

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D87316
2020-09-10 16:38:59 +00:00
Simon Pilgrim 601557e9f9 Hexagon.h - remove unnecessary includes. NFCI.
Replace with forward declarations and move includes to implicit dependent files.
2020-09-10 16:59:43 +01:00
Krzysztof Parzyszek 8a08740db6 [GVN] Account for masked loads/stores depending on load/store instructions
This is a case where an intrinsic depends on a non-call instruction.

Differential Revision: https://reviews.llvm.org/D87423
2020-09-10 10:57:33 -05:00
Simon Pilgrim b585fdae24 [X86] Use Register instead of unsigned. NFCI.
Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.
2020-09-10 16:05:33 +01:00
Simon Pilgrim 9f830e0af7 AArch64MachineFunctionInfo.h - remove unnecessary TargetFrameLowering.h include. NFCI. 2020-09-10 16:05:33 +01:00
Nikita Popov 4e413e1621 [InstCombine] Temporarily do not drop volatile stores before unreachable
See discussion in D87149. Dropping volatile stores here is legal
per LLVM semantics, but causes issues for real code and may result
in a change to LLVM volatile semantics. Temporarily treat volatile
stores as "not guaranteed to transfer execution" in just this place,
until this issue has been resolved.
2020-09-10 16:16:44 +02:00
Jay Foad 517202c720 [TargetLowering] Fix comments describing XOR -> OR/AND transformations 2020-09-10 13:56:34 +01:00
Florian Hahn a5ec99da6e [DSE] Support eliminating memcpy.inline.
MemoryLocation has been taught about memcpy.inline, which means we can
get the memory locations read and written by it. This means DSE can
handle memcpy.inline
2020-09-10 13:19:25 +01:00
Max Kazantsev 8c0bbbade1 [NFC] Refactoring in SCEV: add missing `const` qualifiers 2020-09-10 19:06:37 +07:00
Simon Pilgrim de25ebaac6 [CostModel][X86] Add vXi32 division by uniform constant costs (PR47476)
Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.
2020-09-10 12:17:54 +01:00
Simon Pilgrim fc49abee56 [X86][SSE] lowerShuffleAsSplitOrBlend always returns a shuffle.
lowerShuffleAsSplitOrBlend always returns a target shuffle result (and is the default operation for lowering some shuffle types), so we don't need to check for null.
2020-09-10 11:45:08 +01:00
Simon Pilgrim e80605e242 [X86] Remove WaitInsert::TTI member. NFCI.
This is only ever set/used inside WaitInsert::runOnMachineFunction so don't bother storing it in the class.
2020-09-10 11:45:08 +01:00
Kerry McLaughlin cd89f5c91b [SVE][CodeGen] Legalisation of truncate for scalable vectors
Truncating from an illegal SVE type to a legal type, e.g.
`trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>`
fails after PromoteIntOp_CONCAT_VECTORS attempts to
create a BUILD_VECTOR.

This patch changes the promote function to create a sequence of
INSERT_SUBVECTORs if the return type is scalable, and replaces
these with UNPK+UZP1 for AArch64.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D86548
2020-09-10 11:35:33 +01:00
Juneyoung Lee 1b9884df8d Enable InsertFreeze flag of JumpThreading when used in LTO
This patch enables inserting freeze when JumpThreading converts a select to
a conditional branch when it is run in LTO.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85534
2020-09-10 19:05:49 +09:00
Sam Tebbs b81c57d646 [ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane
values

The effects of unpredicated vector instruction with unknown
lanes cannot be predicted and therefore cannot be tail predicated. This
does not apply to predicated vector instructions and so this patch
allows tail predication on them.

Differential Revision: https://reviews.llvm.org/D87376
2020-09-10 10:34:32 +01:00
Sam Parker 0bdf8c9127 [SCEV] Constant expansion cost at minsize
As code size is the only thing we care about at minsize, query the
cost of materialising immediates when calculating the cost of a SCEV
expansion. We also modify the CostKind to TCK_CodeSize for minsize,
instead of RecipThroughput.

Differential Revision: https://reviews.llvm.org/D76434
2020-09-10 08:21:11 +01:00
Sam Parker 1919b65052 [ARM] Tail predicate VQDMULH and VQRDMULH
Mark the family of instructions as valid for tail predication.

Differential Revision: https://reviews.llvm.org/D87348
2020-09-10 08:20:07 +01:00
Juneyoung Lee 39c1653b3d [JumpThreading] Conditionally freeze its condition when unfolding select
This patch fixes pr45956 (https://bugs.llvm.org/show_bug.cgi?id=45956 ).
To minimize its impact to the quality of generated code, I suggest enabling
this only for LTO as a start (it has two JumpThreading passes registered).
This patch contains a flag that makes JumpThreading enable it.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84940
2020-09-10 15:49:40 +09:00
Max Kazantsev cde8fc65ae [NFC] Rename variables to avoid name confusion
Name `LI` is used for loop info, loop and load inst at the same
function, which causes a lot of confusion.
2020-09-10 13:41:10 +07:00
Max Kazantsev c413a8a8ec [LoopLoadElim] Filter away candidates that stop being AddRecs after loop versioning. PR47457
The test in PR47457 demonstrates a situation when candidate load's pointer's SCEV
is no loger a SCEVAddRec after loop versioning. The code there assumes that it is
always a SCEVAddRec and crashes otherwise.

This patch makes sure that we do not consider candidates for which this requirement
is broken after the versioning.

Differential Revision: https://reviews.llvm.org/D87355
Reviewed By: asbirlea
2020-09-10 13:30:31 +07:00
Qiu Chaofan 6afb279100 [PowerPC] [FPEnv] Disable strict FP mutation by default
22a0edd0 introduced a config IsStrictFPEnabled, which controls the
strict floating point mutation (transforming some strict-fp operations
into non-strict in ISel). This patch disables the mutation by default
since we've finished PowerPC strict-fp enablement in backend.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87222
2020-09-10 13:28:09 +08:00
Petr Hosek c4d7536136 [CMake] Simplify CMake handling for libxml2
This matches the changes made to handling of zlib done in 10b1b4a
where we rely on find_package and the imported target rather than
manually appending the library and include paths. The use of
LLVM_LIBXML2_ENABLED has been replaced by LLVM_ENABLE_LIBXML2
thus reducing the number of variables.

Differential Revision: https://reviews.llvm.org/D84563
2020-09-09 21:44:44 -07:00
Jordan Rupprecht 52f0837778 [NFC] Move definition of variable now only used in debug builds 2020-09-09 20:23:59 -07:00
Matt Arsenault e15215e041 AMDGPU: Hoist check for VGPRs 2020-09-09 19:45:40 -04:00
Matt Arsenault 85490874b2 AMDGPU: Skip all meta instructions in hazard recognizer
This was not adding a necessary nop due to thinking the kill counted.
2020-09-09 19:45:40 -04:00
Matt Arsenault 82cbc9330a AMDGPU: Fix inserting waitcnts before kill uses 2020-09-09 19:45:40 -04:00
Juneyoung Lee a6183d0f02 [ValueTracking] isKnownNonZero, computeKnownBits for freeze
This implements support for isKnownNonZero, computeKnownBits when freeze is involved.

```
  br (x != 0), BB1, BB2
BB1:
  y = freeze x
```

In the above program, we can say that y is non-zero. The reason is as follows:

(1) If x was poison, `br (x != 0)` raised UB
(2) If x was fully undef, the branch again raised UB
(3) If x was non-zero partially undef, say `undef | 1`, `freeze x` will return a nondeterministic value which is also non-zero.
(4) If x was just a concrete value, it is trivial

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D75808
2020-09-10 08:07:38 +09:00
dfukalov c259d3a061 [AMDGPU] Fix for folding v2.16 literals.
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.

Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158
2020-09-10 01:39:25 +03:00
Jessica Paquette 480e7f43a2 [AArch64][GlobalISel] Share address mode selection code for memops
We were missing support for the G_ADD_LOW + ADRP folding optimization in the
manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD.

As a result, we were missing cases like this:

```
@foo = external hidden global i32*
define void @baz(i32* %0) {
store i32* %0, i32** @foo
ret void
}
```

https://godbolt.org/z/16r7ad

This functionality already existed in the addressing mode functions for the
importer. So, this patch makes the manual selection code use
`selectAddrModeIndexed` rather than duplicating work.

This is a 0.2% geomean code size improvement for CTMark at -O3.

There is one code size increase (0.1% on lencod) which is likely because
`selectAddrModeIndexed` doesn't look through constants.

Differential Revision: https://reviews.llvm.org/D87397
2020-09-09 15:14:46 -07:00
Florian Hahn 9969c317ff [DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber.
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.

Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.

Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87386
2020-09-09 23:01:58 +01:00
Nikita Popov 0a5dc7effb [DAGCombiner] Fold fmin/fmax of NaN
fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.

This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.

Differential Revision: https://reviews.llvm.org/D87415
2020-09-09 23:53:32 +02:00
Amara Emerson e5784ef8f6 [GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator.
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.

This is enabled only with optimizations enabled like SelectionDAG.

Differential Revision: https://reviews.llvm.org/D86824
2020-09-09 14:31:12 -07:00
Amara Emerson 467a071285 [GlobalISel][IRTranslator] Generate better conditional branch lowering.
This is a port of the functionality from SelectionDAG, which tries to find
a tree of conditions from compares that are then combined using OR or AND,
before using that result as the input to a branch. Instead of naively
lowering the code as is, this change converts that into a sequence of
conditional branches on the sub-expressions of the tree.

Like SelectionDAG, we re-use the case block codegen functionality from
the switch lowering utils, which causes us to generate some different code.
The result of which I've tried to mitigate in earlier combine patches.

Differential Revision: https://reviews.llvm.org/D86665
2020-09-09 13:16:11 -07:00
Amara Emerson cc76da7ada [GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less.
This combine previously tried to take sequences like:
  %cond = G_ICMP pred, a, b
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.

I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.

Differential Revision: https://reviews.llvm.org/D86664
2020-09-09 13:08:16 -07:00
Hiroshi Yamauchi 0ab6a15698 [X86] Add support for using fast short rep mov for memcpy lowering.
Disabled by default behind an option.

Differential Revision: https://reviews.llvm.org/D86883
2020-09-09 12:46:40 -07:00
Jian Cai 415a4fbea7 [MC] Resolve the difference of symbols in consecutive MCDataFragements
Try to resolve the difference of two symbols in consecutive MCDataFragments.
This is important for an idiom like "foo:instr; .if . - foo; instr; .endif"
(https://bugs.llvm.org/show_bug.cgi?id=43795).

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D69411
2020-09-09 12:35:43 -07:00
Fangrui Song ad61e346d3 [gcov] Give the __llvm_gcov_ctr load instruction a name for more readable output 2020-09-09 12:34:43 -07:00
Krzysztof Parzyszek 0ee54cf883 [Hexagon] Account for truncating pairs to non-pairs when widening truncates
Added missing selection patterns for vpackl.
2020-09-09 14:31:52 -05:00
Fangrui Song dbac20bb6b [gcov] Don't split entry block; add a synthetic entry block instead
The entry block is split at the first instruction where `shouldKeepInEntry`
returns false. The created basic block has a br jumping to the original entry
block. The new basic block causes the function label line and the other entry
block lines to be covered by different basic blocks, which can affect line
counts with special control flows (fork/exec in the entry block requires
heuristics in llvm-cov gcov to get consistent line counts).

  int main() { // BB0
    return 0;  // BB2 (due to entry block splitting)
  }
  // BB1 is the exit block (since gcov 4.8)

This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and
inserts an edge from the synthetic entry block to the original entry block. We
can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The
number of basic blocks does not change, but the emitted .gcno files will be
smaller because we can save one GCOV_TAG_LINES tag.

  // BB0 is the synthetic entry block with a single edge to BB2
  int main() { // BB2
    return 0;  // BB2
  }
  // BB1 is the exit block (since gcov 4.8)
2020-09-09 12:25:24 -07:00
Guillaume Chatelet 5a4a0cfcfb [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)
This is preparatory work to unable storing alignment for AtomicCmpXchgInst.
See D83136 for context and bug: https://bugs.llvm.org/show_bug.cgi?id=27168

This is the fixed version of D83375, which was submitted and reverted.

Differential Revision: https://reviews.llvm.org/D87373
2020-09-09 19:10:30 +00:00
Mark de Wever 08196e0b2e Implements [[likely]] and [[unlikely]] in IfStmt.
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.

Differential Revision: https://reviews.llvm.org/D85091
2020-09-09 20:48:37 +02:00
Krzysztof Parzyszek 81ff2d30a9 [DSE] Handle masked stores 2020-09-09 13:31:31 -05:00
Ulrich Weigand 1a25133bcd [DAGCombine] Skip re-visiting EntryToken to avoid compile time explosion
During the main DAGCombine loop, whenever a node gets replaced, the new
node and all its users are pushed onto the worklist.  Omit this if the
new node is the EntryToken (e.g. if a store managed to get optimized
out), because re-visiting the EntryToken and its users will not uncover
any additional opportunities, but there may be a large number of such
users, potentially causing compile time explosion.

This compile time explosion showed up in particular when building the
SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any
platform without SIMD vector support.

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86963
2020-09-09 19:13:46 +02:00
Mircea Trofin 4b15fc9ddb [NFC][MLInliner] Don't initialize in an assert.
Since the build bots have assertions enabled, this flew under the radar.
2020-09-09 09:56:07 -07:00
Simon Pilgrim 6e45b98934 X86CallFrameOptimization.cpp - use const references where possible. NFCI. 2020-09-09 16:35:08 +01:00
Simon Pilgrim e706116e11 X86FrameLowering::adjustStackWithPops - cleanup auto usage. NFCI.
Don't use auto for non-obvious types, and use const references.
2020-09-09 16:15:02 +01:00
Qiu Chaofan 88ff4d2ca1 [PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.

One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87220
2020-09-09 22:40:58 +08:00
Jay Foad 649bde488c [AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter
NFC.
2020-09-09 15:18:31 +01:00
Dmitry Preobrazhensky 95b7040e43 [AMDGPU][MC] Improved diagnostic messages for invalid registers
Corrected parser to issue meaningful error messages for invalid and malformed registers.

See bug 41303: https://bugs.llvm.org/show_bug.cgi?id=41303

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87234
2020-09-09 16:44:03 +03:00
Alon Kom 818cf30b83 [MachinePipeliner] Fix II_setByPragma initialization
II_setByPragma was not reset between 2 calls of the MachinePipleiner pass

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D87088
2020-09-09 13:38:35 +00:00
Denis Antrushin 4358fa782e [Statepoints] Update DAG root after emitting statepoint.
Since we always generate CopyToRegs for statepoint results,
we must update DAG root after emitting statepoint, so that
these copies are scheduled before any possible local uses.
Note: getControlRoot() flushes all PendingExports, not only
those we generates for relocates. If that'll become a problem,
we can change it to flushing relocate exports only.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87251
2020-09-09 20:22:10 +07:00
Ronak Chauhan f078577f31 Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit 487a805310.

Tests fail on big endian machines.
2020-09-09 18:01:28 +05:30
Simon Pilgrim d816499f95 [KnownBits] Move SelectionDAG::computeKnownBits ISD::ABS handling to KnownBits::abs
Move the ISD::ABS handling to a KnownBits::abs handler, to simplify future implementations in ValueTracking/GlobalISel.
2020-09-09 13:22:58 +01:00
David Stenberg 48fc781438 [UnifyFunctionExitNodes] Fix Modified status for unreachable blocks
If a function had at most one return block, the pass would return false
regardless if an unified unreachable block was created.

This patch fixes that by refactoring runOnFunction into two separate
helper functions for handling the unreachable blocks respectively the
return blocks, as suggested by @bjope in a review comment.

This was caught using the check introduced by D80916.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D85818
2020-09-09 13:36:03 +02:00
Juneyoung Lee 36c8621638 [BuildLibCalls] Add more noundef to library functions
This patch follows D85345 and adds more noundef attributes to return values/arguments of library functions
that are mostly about accessing the file system or processes.

A few functions like `chmod` or `times` use typedef `mode_t` and `clock_t`.
They are neither struct nor union, so they cannot contain undef even if they're lowered to iN in IR. So, it is fine to add noundef to them.

- clock_t's actual type is size_t (C17, 7.27.1.3), so it isn't struct or union.

- For mode_t, either int or long is used in practice because programmers use bit manipulation. So, I think it is okay that it's never aggregate in practice.

After this patch, the remaining library functions are those that eagerly participate in optimizations: they can be removed, reordered, or
introduced by a transformation from primitive IR operations.
For them, a few testings is needed, since it may not be valid to add noundef anymore even if C standard says it's okay.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D85894
2020-09-09 20:33:35 +09:00
Juneyoung Lee 25ce1e0497 [ValueTracking] Add UndefOrPoison/Poison-only version of relevant functions
This patch adds isGuaranteedNotToBePoison and programUndefinedIfUndefOrPoison.

isGuaranteedNotToBePoison will be used at D75808. The latter function is used at isGuaranteedNotToBePoison.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D84242
2020-09-09 20:00:26 +09:00
Simon Pilgrim 455cce3e21 TrigramIndex.cpp - remove unnecessary includes. NFCI.
TrigramIndex.h already includes most of these.
2020-09-09 11:38:31 +01:00
Simon Pilgrim f16b2d8315 ARMTargetParser.cpp - use auto const references in for range loops. NFCI.
Fix static analysis warnings about unnecessary copies.
2020-09-09 11:38:31 +01:00
Simon Pilgrim 24ecfdac7b [APFloat] Fix uninitialized variable in IEEEFloat constructors
Some constructors of IEEEFloat do not initialize member variable exponent.
Fix it by initializing exponent with the following values:

For NaNs, the `exponent` is `maxExponent+1`.
For Infinities, the `exponent` is `maxExponent+1`.
For Zeroes, the `exponent` is `maxExponent-1`.

Patch by: @nullptr.cpp (Yang Fan)

Differential Revision: https://reviews.llvm.org/D86997
2020-09-09 11:38:30 +01:00
Mirko Brkusanin 43af2a6faa [AMDGPU] Workaround for LDS Misalignment bug on GFX10
Add subtarget feature check to avoid using ds_read/write_b96/128 with too
low alignment if a bug is present on that specific hardware.
Add this "feature" to GFX 10.1.1 as it is also affected.
Add global-isel test.
2020-09-09 11:46:09 +02:00
Florian Hahn 2bcc4db761 [EarlyCSE] Explicitly require AAResultsWrapperPass.
The MemorySSAWrapperPass depends on AAResultsWrapperPass and if
MemorySSA is preserved but AAResultsWrapperPass is not, this could lead
to a crash when updating the last user of the MemorySSAWrapperPass.

Alternatively AAResultsWrapperPass could be marked preserved by GVN, but
I am not sure if that would be safe. I am not sure what is required in
order to preserve AAResultsWrapperPass. At the moment, it seems like a
couple of passes that do similar transforms to GVN are preserving it.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87137
2020-09-09 09:14:50 +01:00
Denis Antrushin 2a52c3301a [Statepoints] Properly handle const base pointer.
Current code in InstEmitter assumes all GC pointers are either
VRegs or stack slots - hence, taking only one operand.
But it is possible to have constant base, in which case it
occupies two machine operands.

Add a convinience function to StackMaps to get index of next
meta argument and use it in InsrEmitter to properly advance to
the next statepoint meta operand.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87252
2020-09-09 14:07:00 +07:00
Sam Parker 3ebc755227 [ARM] Try to rematerialize VCTP instructions
We really want to try and avoid spilling P0, which can be difficult
since there's only one register, so try to rematerialize any VCTP
instructions.

Differential Revision: https://reviews.llvm.org/D87280
2020-09-09 07:41:22 +01:00
Johannes Doerfert d445b6dfec [Attributor] Cleanup `::initialize` of various AAs
This commit cleans up the ::initialize method of various AAs in the
following ways:
  - If an associated function is required, give up on declarations.
    This was discovered as a real problem when lots of llvm.dbg.XXX
    call sites were assumed `noreturn` until proven otherwise. That
    does not make any sense and caused huge regressions and missed
    deductions.
  - Require more associated declarations for function interface AAs.
  - Use the IRAttribute::initialize to determine if function interface
    AAs can be used in IPO, don't replicate the checks (especially
    isFunctionIPOAmendable) all over the place. Arguably the function
    declaration check should be moved to some central place to.
2020-09-09 01:38:25 -05:00
Fangrui Song 6a9a0bfc33 [llvm-cov gcov] Simply computation of line counts and exit block counter 2020-09-08 23:15:37 -07:00
Johannes Doerfert 849146ba93 [Attributor] Associate the callback callee with a call site argument (if any)
If we have a callback, call site arguments were already associated with
the callback callee. Now we also associate the function with the
callback callee, thus we know ensure that the following holds true (if
all return nonnull):
   `getAssociatedArgument()->getParent() == getAssociatedFunction()`

To test this an early exit from
  `AAMemoryBehaviorCallSiteArgument::initialize``
is included as well. Without the change to getAssociatedFunction() this
kind of early exit for declarations would cause callback call site
arguments to miss out.
2020-09-09 00:52:17 -05:00
Johannes Doerfert cefd2a2c70 [Attributor] Cleanup `IRPosition::getArgNo` usages
As we handle callback calls we need to disambiguate the call site
argument number from the callee argument number. While always equal in
non-callback calls, a callback comes with a partial parameter-argument
mapping so there is no implicit correspondence. Here we split
`IRPosition::getArgNo()` into two public functions, `getCallSiteArgNo()`
and `getCalleeArgNo()`. Usages are adjusted to pick the right one for
their purpose. This fixed some problems that would have been exposed as
we more aggressively optimize callbacks.
2020-09-09 00:52:17 -05:00
Johannes Doerfert c0ab901bdd [Attributor] Selectively look at the callee even when there are operand bundles
While operand bundles carry unpredictable semantics, we know some of
them and can therefore "ignore" them. In this case we allow to look at
the declaration of `llvm.assume` when asked for the attributes at a call
site. The assume operand bundles we have do not invalidate the
declaration attributes.

We cannot test this in isolation because the llvm.assume attributes are
determined by the parser. However, a follow up patch will provide test
coverage.
2020-09-09 00:52:17 -05:00
Johannes Doerfert d5d75f61e5 [Attributor] Provide a command line option that limits recursion depth
In `MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.cpp` we initialized
attributes until stack frame ~35k caused space to run out. The initial
size 1024 is pretty much random.
2020-09-09 00:47:02 -05:00
Max Kazantsev 795e4ee9d2 [NFC] Move functon from IndVarSimplify to SCEV
This function can be reused in other places.

Differential Revision: https://reviews.llvm.org/D87274
Reviewed By: fhahn, lebedev.ri
2020-09-09 11:20:59 +07:00
Krzysztof Parzyszek c2b7b9b642 [Hexagon] Fix order of operands in V6_vdealb4w 2020-09-08 22:09:28 -05:00
Fangrui Song b9d086693b [llvm-cov gcov] Compute unmeasured arc counts by Kirchhoff's circuit law
For a CFG G=(V,E), Knuth describes that by Kirchoff's circuit law, the minimum
number of counters necessary is |E|-(|V|-1). The emitted edges form a spanning
tree. libgcov emitted .gcda files leverages this optimization while clang
--coverage's doesn't.

Propagate counts by Kirchhoff's circuit law so that llvm-cov gcov can
correctly print line counts of gcc --coverage emitted files and enable
the future improvement of clang --coverage.
2020-09-08 18:45:11 -07:00
Brad Smith 88b368a1c4 [PowerPC] Set setMaxAtomicSizeInBitsSupported appropriately for 32-bit PowerPC in PPCTargetLowering
Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D86165
2020-09-08 21:21:14 -04:00
Mircea Trofin 4013bab9c4 [NFC][ThinLTO] EmbedBitcodeSection doesn't need the Config
Instead, passing in the command line options, initialized to nullptr. In
an upcoming patch, we can then use the parameter to pass actual command
line options.

Differential Revision: https://reviews.llvm.org/D87336
2020-09-08 17:14:44 -07:00
Krzysztof Parzyszek 055d209589 Handle masked loads and stores in MemoryLocation/Dependence
Differential Revision: https://reviews.llvm.org/D87061
2020-09-08 19:08:44 -05:00
David Blaikie be561fad1e Remove unused variable(s) 2020-09-08 16:58:01 -07:00
Craig Topper 844e94a502 [SelectionDAGBuilder] Remove Unnecessary FastMathFlags temporary. Use SDNodeFlags instead. NFCI
This was a missed simplication in D87200
2020-09-08 15:50:12 -07:00
David Blaikie 69da27c749 llvm-symbolizer: Add optional "start file" to match "start line"
Since a function might have portions of its code coming from multiple
different files, "start line" is ambiguous (it can't just be resolved
relative to the file/line specified). Add start file to disambiguate it.
2020-09-08 15:40:58 -07:00
Craig Topper b1e68f885b [SelectionDAGBuilder] Pass fast math flags to getNode calls rather than trying to set them after the fact.:
This removes the after the fact FMF handling from D46854 in favor of passing fast math flags to getNode. This should be a superset of D87130.

This required adding a SDNodeFlags to SelectionDAG::getSetCC.

Now we manage to contant fold some stuff undefs during the
initial getNode that we don't do in later DAG combines.

Differential Revision: https://reviews.llvm.org/D87200
2020-09-08 15:27:21 -07:00
Krzysztof Parzyszek d183f47261 [Hexagon] Handle widening of truncation's operand with legal result
Failing example: v8i8 = truncate v8i32. v8i8 is legal, but v8i32 was
widened to HVX. Make sure that v8i8 does not get altered (even if it's
changed to another legal type).
2020-09-08 16:07:39 -05:00
Nikita Popov 8453fbf088 [ValueTracking] Compute known bits of min/max intrinsics
Implement known bits for the min/max intrinsics based on the
recently added KnownBits primitives.
2020-09-08 21:08:17 +02:00
David Stenberg 17dce2fe43 [UnifyFunctionExitNodes] Remove unused getters, NFC
The get{Return,Unwind,Unreachable}Block functions in
UnifyFunctionExitNodes have not been used for many years,
so just remove them.

Reviewed By: bjope

Differential Revision: https://reviews.llvm.org/D87078
2020-09-08 20:42:28 +02:00
Nikita Popov f6b87da0c7 [InstCombine] Fold comparison of abs with int min
If the abs is poisoning, this is already folded to true/false.
For non-poisoning abs, we can convert this to a comparison with
the operand.
2020-09-08 20:23:03 +02:00
Nikita Popov e97f3b1b43 [InstCombine] Fold abs of known negative operand
If we know that the abs operand is known negative, we can replace
it with a neg.

To avoid computing known bits twice, I've removed the fold for the
non-negative case from InstSimplify. Both the non-negative and the
negative case are handled by InstCombine now, with one known bits call.

Differential Revision: https://reviews.llvm.org/D87196
2020-09-08 20:14:35 +02:00
Xun Li 59a467ee4f [Coroutine] Make dealing with alloca spills more robust
D66230 attempted to fix a problem where when there are allocas used before CoroBegin.
It keeps allocas and their uses stay in put if there are no escapse/changes to the data before CoroBegin.
Unfortunately that's incorrect.
Consider this code:

%var = alloca i32
%1 = getelementptr .. %var; stays put
%f = call i8* @llvm.coro.begin
store ... %1
After this fix, %1 will now stay put, however if a store happens after coro.begin and hence modifies the content, this change will not be reflected in the coroutine frame (and will eventually be DCEed).
To generalize the problem, if any alias ptr is created before coro.begin for an Alloca and that alias ptr is latter written into after coro.begin, it will lead to incorrect behavior.

There are also a few other minor issues, such as incorrect dominate condition check in the ptr visitor, unhandled memory intrinsics and etc.
Ths patch attempts to fix some of these issue, and make it more robust to deal with aliases.

While visiting through the alloca pointer, we also keep track of all aliases created that will be used after CoroBegin. We track the offset of each alias, and then reacreate these aliases after CoroBegin using these offset.
It's worth noting that this is not perfect and there will still be cases we cannot handle. I think it's impractical to handle all cases given the current design.
This patch makes it more robust and should be a pure win.
In the meantime, we need to think about what how to completely elimiante these issues, likely through the route as @rjmccall mentioned in D66230.

Differential Revision: https://reviews.llvm.org/D86859
2020-09-08 10:59:13 -07:00
Craig Topper e6bb4c8e7b [X86] SSE4_A should only imply SSE3 not SSSE3 in the frontend.
SSE4_1 and SSE4_2 due imply SSSE3. So I guess I got confused when
switching the code to being table based in D83273.

Fixes PR47464
2020-09-08 10:50:59 -07:00
Simon Pilgrim 0dacf3b5ac RISCVMatInt.h - remove unnecessary includes. NFCI.
Add APInt forward declaration and move include to RISCVMatInt.cpp
2020-09-08 18:25:24 +01:00
Volkan Keles 1242dd330d GlobalISel: Combine `op undef, x` to 0
https://reviews.llvm.org/D86611
2020-09-08 09:46:38 -07:00
Heejin Ahn d25c17f317 [WebAssembly] Fix fixEndsAtEndOfFunction for try-catch
When the function return type is non-void and `end` instructions are at
the very end of a function, CFGStackify's `fixEndsAtEndOfFunction`
function fixes the corresponding block/loop/try's type to match the
function's return type. This is applied to consecutive `end` markers at
the end of a function. For example, when the function return type is
`i32`,
```
block i32    ;; return type is fixed to i32
  ...
  loop i32   ;; return type is fixed to i32
    ...
  end_loop
end_block
end_function
```

But try-catch is a little different, because it consists of two parts:
a try part and a catch part, and both parts' return type should satisfy
the function's return type. Which means,
```
try i32      ;; return type is fixed to i32
  ...
  block i32  ;; this should be changed i32 too!
    ...
  end_block
catch
  ...
end_try
end_function
```
As you can see in this example, it is not sufficient to only `end`
instructions at the end of a function; in case of `try`, we should
check instructions before `catch`es, in case their corresponding `try`'s
type has been fixed.

This changes `fixEndsAtEndOfFunction`'s algorithm to use a worklist
that contains a reverse iterator, each of which is a starting point for
a new backward `end` instruction search.

Fixes https://bugs.llvm.org/show_bug.cgi?id=47413.

Reviewed By: dschuff, tlively

Differential Revision: https://reviews.llvm.org/D87207
2020-09-08 09:27:40 -07:00
Simon Pilgrim 3c83b967cf LiveRegUnits.h - reduce MachineRegisterInfo.h include. NFC.
We only need to include MachineInstrBundle.h, but exposes an implicit dependency in MachineOutliner.h.

Also, remove duplicate includes from LiveRegUnits.cpp + MachineOutliner.cpp.
2020-09-08 17:27:00 +01:00
Ronak Chauhan 487a805310 [AMDGPU] Support disassembly for AMDGPU kernel descriptors
Decode AMDGPU Kernel descriptors as assembler directives.

Reviewed By: scott.linder, jhenderson, kzhuravl

Differential Revision: https://reviews.llvm.org/D80713
2020-09-08 21:26:11 +05:30
Jonas Paulsson 6dc3e22b57 [DAGTypeLegalizer] Handle ZERO_EXTEND of promoted type in WidenVecRes_Convert.
On SystemZ, a ZERO_EXTEND of an i1 vector handled by WidenVecRes_Convert()
always ended up being scalarized, because the type action of the input is
promotion which was previously an unhandled case in this method.

This fixes https://bugs.llvm.org/show_bug.cgi?id=47132.

Differential Revision: https://reviews.llvm.org/D86268

Patch by Eli Friedman.
Review: Ulrich Weigand
2020-09-08 16:49:51 +02:00
Florian Hahn c7b7c32f4a [DSE,MemorySSA] Increase walker limit a bit.
This slightly bumps the walker limit so that it covers more cases while
not increasing compile-time too much:
http://llvm-compile-time-tracker.com/compare.php?from=0fc1c2b51ba0cfb9145139af35be638333865251&to=91144a50ea4fa82c0c877e77784f60371640b263&stat=instructions
2020-09-08 14:55:46 +01:00
Simon Pilgrim fcff2c32c0 X86CallLowering.cpp - improve auto const/pointer/reference qualifiers. NFCI.
Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.
2020-09-08 13:01:23 +01:00
Simon Pilgrim 0729ae367a X86DomainReassignment.cpp - improve auto const/pointer/reference qualifiers. NFCI.
Fix clang-tidy warnings by ensuring auto variables are more cleanly qualified, or just avoid auto entirely.
2020-09-08 13:01:23 +01:00
Xing GUO 25c3fa3f13 [DWARFYAML] Make the debug_ranges section optional.
This patch makes the debug_ranges section optional. When we specify an
empty debug_ranges section, yaml2obj only emits the section header.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D87263
2020-09-08 19:55:47 +08:00
Sam Tebbs 7aabb6ad77 [ARM][LowOverheadLoops] Remove modifications to the correct element
count register

After my patch at D86087, code that now uses the mov operand rather than
the vctp operand will no longer remove modifications to the vctp operand
as they should. This patch fixes that by explicitly removing
modifications to the vctp operand rather than the register used as the
element count.
2020-09-08 10:30:05 +01:00
Qiu Chaofan 8d9c13f37d Revert "[PowerPC] Implement instruction clustering for stores"
This reverts commit 3c0b325023, (along
with ea795304 and bb39eb9e) since it breaks test with UB sanitizer.
2020-09-08 17:24:08 +08:00
Serge Guelton 38778e1087 Provide anchor for compiler extensions
This patch is cherry-picked from 04b0a4e22e3b4549f9d241f8a9f37eebecb62a31, and
amended to prevent an undefined reference to `llvm::EnableABIBreakingChecks'
2020-09-08 10:33:38 +02:00
Qiu Chaofan bb39eb9e7f [PowerPC] Fix getMemOperandWithOffsetWidth
Commit 3c0b3250 introduced memory cluster under pwr10 target, but a
check for operands was unexpectedly removed. This adds it back to avoid
regression.
2020-09-08 15:35:25 +08:00
Simon Wallis 8ee1419ab6 [AARCH64][RegisterCoalescer] clang miscompiles zero-extension to long long
Implement AArch64 variant of shouldCoalesce() to detect a known failing case
and prevent the coalescing of a 32-bit copy into a 64-bit sign-extending load.

Do not coalesce in the following case:
COPY where source is bottom 32 bits of a 64-register,
and destination is a 32-bit subregister of a 64-bit register,
ie it causes the rest of the register to be implicitly set to zero.

A mir test has been added.

In the test case, the 32-bit copy implements a 32 to 64 bit zero extension
and relies on the upper 32 bits being zeroed.

Coalescing to the result of the 64-bit load meant overwriting
the upper 32 bits incorrectly when the loaded byte was negative.

Reviewed By: john.brawn

Differential Revision: https://reviews.llvm.org/D85956
2020-09-08 08:04:52 +01:00
Mikael Holmen ea795304ec [PowerPC] Add parentheses to silence gcc warning
Without gcc 7.4 warns with

../lib/Target/PowerPC/PPCInstrInfo.cpp:2284:25: warning: suggest parentheses around '&&' within '||' [-Wparentheses]
          BaseOp1.isFI() &&
          ~~~~~~~~~~~~~~~^~
              "Only base registers and frame indices are supported.");
              ~
2020-09-08 08:39:57 +02:00
Andrew Wei 78071fb524 [LSR] Canonicalize a formula before insert it into the list
In GenerateConstantOffsetsImpl, we may generate non canonical Formula
if BaseRegs of that Formula is updated and includes a recurrent expr reg
related with current loop while its ScaledReg is not.

Patched by: mdchen
Reviewed By: qcolombet

Differential Revision: https://reviews.llvm.org/D86939
2020-09-08 13:14:53 +08:00
Johannes Doerfert 711bf7dcf9 [Attributor][FIX] Don't crash on internalizing linkonce_odr hidden functions
The CloneFunctionInto has implicit requirements with regards to the
linkage and visibility of the function. We now update these after we did
the CloneFunctionInto on the copy with the same linkage and visibility
as the original.
2020-09-07 23:38:09 -05:00
Johannes Doerfert e6208849c8 [Attributor][NFC] Change variable spelling 2020-09-07 23:38:09 -05:00
Johannes Doerfert 8637acac5a [Attributor][NFC] Clang tidy: no else after continue 2020-09-07 23:38:08 -05:00
Johannes Doerfert ff70c25d76 [Attributor][NFC] Expand `auto` types (clang-fix-it) 2020-09-07 23:38:08 -05:00
Johannes Doerfert 79651265b2 [Attributor][FIX] Properly return changed if the IR was modified
Deleting or replacing anything is certainly a modification. This caused
a later assertion in IPSCCP when compiling 400.perlbench with the new PM.
I'm not sure how to test this.
2020-09-07 23:38:08 -05:00
Qiu Chaofan 3c0b325023 [PowerPC] Implement instruction clustering for stores
On Power10, it's profitable to schedule some stores with adjacent target
address together. This patch implements this feature.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D86754
2020-09-08 11:03:09 +08:00
Florian Hahn efb8e156da [DSE,MemorySSA] Add an early check for read clobbers to traversal.
Depending on the benchmark, this early exit can save a substantial
amount of compile-time:

http://llvm-compile-time-tracker.com/compare.php?from=505f2d817aa8e07ba98e5fd4a8f6ff0666f89df1&to=eb4e441147f9b4b7a5fcbbc57428cadbe9e01f10&stat=instructions
2020-09-07 23:22:10 +01:00
Roman Lebedev bb7d3af113
Reland [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline
This was reverted in 503deec218
because it caused gigantic increase (3x) in branch mispredictions
in certain benchmarks on certain CPU's,
see https://reviews.llvm.org/D84108#2227365.

It has since been investigated and here are the results:
https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20200907/827578.html
> It's an amazingly severe regression, but it's also all due to branch
> mispredicts (about 3x without this). The code layout looks ok so there's
> probably something else to deal with. I'm not sure there's anything we can
> reasonably do so we'll just have to take the hit for now and wait for
> another code reorganization to make the branch predictor a bit more happy :)
>
> Thanks for giving us some time to investigate and feel free to recommit
> whenever you'd like.
>
> -eric

So let's just reland this.
Original commit message:


I've been looking at missed vectorizations in one codebase.
One particular thing that stands out is that some of the loops
reach vectorizer in a rather mangled form, with weird PHI's,
and some of the loops aren't even in a rotated form.

After taking a more detailed look, that happened because
the loop's headers were too big by then. It is evident that
SimplifyCFG's common code hoisting transform is at fault there,
because the pattern it handles is precisely the unrotated
loop basic block structure.

Surprizingly, `SimplifyCFGOpt::HoistThenElseCodeToIf()` is enabled
by default, and is always run, unlike it's friend, common code sinking
transform, `SinkCommonCodeFromPredecessors()`, which is not enabled
by default and is only run once very late in the pipeline.

I'm proposing to harmonize this, and disable common code hoisting
until //late// in pipeline. Definition of //late// may vary,
here currently i've picked the same one as for code sinking,
but i suppose we could enable it as soon as right after
loop rotation happens.

Experimentation shows that this does indeed unsurprizingly help,
more loops got rotated, although other issues remain elsewhere.

Now, this undoubtedly seriously shakes phase ordering.
This will undoubtedly be a mixed bag in terms of both compile- and
run- time performance, codesize. Since we no longer aggressively
hoist+deduplicate common code, we don't pay the price of said hoisting
(which wasn't big). That may allow more loops to be rotated,
so we pay that price. That, in turn, that may enable all the transforms
that require canonical (rotated) loop form, including but not limited to
vectorization, so we pay that too. And in general, no deduplication means
more [duplicate] instructions going through the optimizations. But there's still
late hoisting, some of them will be caught late.

As per benchmarks i've run {F12360204}, this is mostly within the noise,
there are some small improvements, some small regressions.
One big regression i saw i fixed in rG8d487668d09fb0e4e54f36207f07c1480ffabbfd, but i'm sure
this will expose many more pre-existing missed optimizations, as usual :S

llvm-compile-time-tracker.com thoughts on this:
http://llvm-compile-time-tracker.com/compare.php?from=e40315d2b4ed1e38962a8f33ff151693ed4ada63&to=c8289c0ecbf235da9fb0e3bc052e3c0d6bff5cf9&stat=instructions
* this does regress compile-time by +0.5% geomean (unsurprizingly)
* size impact varies; for ThinLTO it's actually an improvement

The largest fallout appears to be in GVN's load partial redundancy
elimination, it spends *much* more time in
`MemoryDependenceResults::getNonLocalPointerDependency()`.
Non-local `MemoryDependenceResults` is widely-known to be, uh, costly.
There does not appear to be a proper solution to this issue,
other than silencing the compile-time performance regression
by tuning cut-off thresholds in `MemoryDependenceResults`,
at the cost of potentially regressing run-time performance.
D84609 attempts to move in that direction, but the path is unclear
and is going to take some time.

If we look at stats before/after diffs, some excerpts:
* RawSpeed (the target) {F12360200}
  * -14 (-73.68%) loops not rotated due to the header size (yay)
  * -272 (-0.67%) `"Number of live out of a loop variables"` - good for vectorizer
  * -3937 (-64.19%) common instructions hoisted
  * +561 (+0.06%) x86 asm instructions
  * -2 basic blocks
  * +2418 (+0.11%) IR instructions
* vanilla test-suite + RawSpeed + darktable  {F12360201}
  * -36396 (-65.29%) common instructions hoisted
  * +1676 (+0.02%) x86 asm instructions
  * +662 (+0.06%) basic blocks
  * +4395 (+0.04%) IR instructions

It is likely to be sub-optimal for when optimizing for code size,
so one might want to change tune pipeline by enabling sinking/hoisting
when optimizing for size.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D84108

This reverts commit 503deec218.
2020-09-08 00:24:03 +03:00
Nikita Popov ddab4cd83e [KnownBits] Avoid some copies (NFC)
These lambdas don't need copies, use const reference.
2020-09-07 22:19:29 +02:00
Nikita Popov 9fb46a452d [SCCP] Compute ranges for supported intrinsics
For intrinsics supported by ConstantRange, compute the result range
based on the argument ranges. We do this independently of whether
some or all of the input ranges are full, as we can often still
constrain the result in some way.

Differential Revision: https://reviews.llvm.org/D87183
2020-09-07 22:16:06 +02:00
Craig Topper da79b1eecc [SelectionDAG][X86][ARM] Teach ExpandIntRes_ABS to use sra+add+xor expansion when ADDCARRY is supported.
Rather than using SELECT instructions, use SRA, UADDO/ADDCARRY and
XORs to expand ABS. This is the multi-part version of the sequence
we use in LegalizeDAG.

It's also the same as the Custom sequence uses for i64 on 32-bit
and i128 on 64-bit. So we can remove the X86 customization.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D87215
2020-09-07 13:15:26 -07:00
Sanjay Patel 8b30067919 [InstCombine] improve fold of pointer differences
This was supposed to be an NFC cleanup, but there's
a real logic difference (did not drop 'nsw') visible
in some tests in addition to an efficiency improvement.

This is because in the case where we have 2 GEPs,
the code was *always* swapping the operands and
negating the result. But if we have 2 GEPs, we
should *never* need swapping/negation AFAICT.

This is part of improving flags propagation noticed
with PR47430.
2020-09-07 15:54:32 -04:00
Craig Topper 01b3e16757 [X86] Use the same sequence for i128 ISD::ABS on 64-bit targets as we use for i64 on 32-bit targets.
Differential Revision: https://reviews.llvm.org/D87214
2020-09-07 11:14:05 -07:00
Sanjay Patel 7a06b166b1 [DAGCombiner] allow more store merging for non-i8 truncated ops
This is a follow-up suggested in D86420 - if we have a pair of stores
in inverted order for the target endian, we can rotate the source
bits into place.
The "be_i64_to_i16_order" test shows a limitation of the current
function (which might be avoided if we integrate this function with
the other cases in mergeConsecutiveStores). In the earlier
"be_i64_to_i16" test, we skip the first 2 stores because we do not
match the full set as consecutive or rotate-able, but then we reach
the last 2 stores and see that they are an inverted pair of 16-bit
stores. The "be_i64_to_i16_order" test alters the program order of
the stores, so we miss matching the sub-pattern.

Differential Revision: https://reviews.llvm.org/D87112
2020-09-07 14:12:36 -04:00
Eric Astor a3ec4a3158 [ms] [llvm-ml] Allow use of locally-defined variables in expressions
MASM allows variables defined by equate statements to be used in expressions.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86946
2020-09-07 14:00:14 -04:00
Eric Astor 2feb6e9b84 [ms] [llvm-ml] Fix STRUCT field alignment
MASM aligns fields to the _minimum_ of the STRUCT alignment value and the size of the next field.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86945
2020-09-07 13:58:59 -04:00
Eric Astor e52e7ad54d [ms] [llvm-ml] Add support for bitwise named operators (AND, NOT, OR) in MASM
Add support for expressions of the form '1 or 2', etc.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D86944
2020-09-07 13:57:54 -04:00
Simon Pilgrim 5ea9e655ef VPlan.h - remove unnecessary forward declarations. NFCI.
Already defined in includes.
2020-09-07 18:35:06 +01:00
Simon Pilgrim 4e89a0ab02 MipsISelLowering.h - remove CCState/CCValAssign forward declarations. NFCI.
These are already defined in the CallingConvLower.h include.
2020-09-07 18:15:26 +01:00
Simon Pilgrim 95ca3aacf0 BTFDebug.h - reduce MachineInstr.h include to forward declaration. NFCI. 2020-09-07 17:51:13 +01:00