Commit Graph

138807 Commits

Author SHA1 Message Date
Krzysztof Parzyszek 8a08740db6 [GVN] Account for masked loads/stores depending on load/store instructions
This is a case where an intrinsic depends on a non-call instruction.

Differential Revision: https://reviews.llvm.org/D87423
2020-09-10 10:57:33 -05:00
Simon Pilgrim b585fdae24 [X86] Use Register instead of unsigned. NFCI.
Fixes llvm-prefer-register-over-unsigned clang-tidy warnings.
2020-09-10 16:05:33 +01:00
Simon Pilgrim 9f830e0af7 AArch64MachineFunctionInfo.h - remove unnecessary TargetFrameLowering.h include. NFCI. 2020-09-10 16:05:33 +01:00
Nikita Popov 4e413e1621 [InstCombine] Temporarily do not drop volatile stores before unreachable
See discussion in D87149. Dropping volatile stores here is legal
per LLVM semantics, but causes issues for real code and may result
in a change to LLVM volatile semantics. Temporarily treat volatile
stores as "not guaranteed to transfer execution" in just this place,
until this issue has been resolved.
2020-09-10 16:16:44 +02:00
Jay Foad 517202c720 [TargetLowering] Fix comments describing XOR -> OR/AND transformations 2020-09-10 13:56:34 +01:00
Florian Hahn a5ec99da6e [DSE] Support eliminating memcpy.inline.
MemoryLocation has been taught about memcpy.inline, which means we can
get the memory locations read and written by it. This means DSE can
handle memcpy.inline
2020-09-10 13:19:25 +01:00
Max Kazantsev 8c0bbbade1 [NFC] Refactoring in SCEV: add missing `const` qualifiers 2020-09-10 19:06:37 +07:00
Simon Pilgrim de25ebaac6 [CostModel][X86] Add vXi32 division by uniform constant costs (PR47476)
Other types can be handled in future patches but their uniform / non-uniform costs are more similar and don't appear to cause many vectorization issues.
2020-09-10 12:17:54 +01:00
Simon Pilgrim fc49abee56 [X86][SSE] lowerShuffleAsSplitOrBlend always returns a shuffle.
lowerShuffleAsSplitOrBlend always returns a target shuffle result (and is the default operation for lowering some shuffle types), so we don't need to check for null.
2020-09-10 11:45:08 +01:00
Simon Pilgrim e80605e242 [X86] Remove WaitInsert::TTI member. NFCI.
This is only ever set/used inside WaitInsert::runOnMachineFunction so don't bother storing it in the class.
2020-09-10 11:45:08 +01:00
Kerry McLaughlin cd89f5c91b [SVE][CodeGen] Legalisation of truncate for scalable vectors
Truncating from an illegal SVE type to a legal type, e.g.
`trunc <vscale x 4 x i64> %in to <vscale x 4 x i32>`
fails after PromoteIntOp_CONCAT_VECTORS attempts to
create a BUILD_VECTOR.

This patch changes the promote function to create a sequence of
INSERT_SUBVECTORs if the return type is scalable, and replaces
these with UNPK+UZP1 for AArch64.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D86548
2020-09-10 11:35:33 +01:00
Juneyoung Lee 1b9884df8d Enable InsertFreeze flag of JumpThreading when used in LTO
This patch enables inserting freeze when JumpThreading converts a select to
a conditional branch when it is run in LTO.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D85534
2020-09-10 19:05:49 +09:00
Sam Tebbs b81c57d646 [ARM][LowOverheadLoops] Allow tail predication on predicated instructions with unknown lane
values

The effects of unpredicated vector instruction with unknown
lanes cannot be predicted and therefore cannot be tail predicated. This
does not apply to predicated vector instructions and so this patch
allows tail predication on them.

Differential Revision: https://reviews.llvm.org/D87376
2020-09-10 10:34:32 +01:00
Sam Parker 0bdf8c9127 [SCEV] Constant expansion cost at minsize
As code size is the only thing we care about at minsize, query the
cost of materialising immediates when calculating the cost of a SCEV
expansion. We also modify the CostKind to TCK_CodeSize for minsize,
instead of RecipThroughput.

Differential Revision: https://reviews.llvm.org/D76434
2020-09-10 08:21:11 +01:00
Sam Parker 1919b65052 [ARM] Tail predicate VQDMULH and VQRDMULH
Mark the family of instructions as valid for tail predication.

Differential Revision: https://reviews.llvm.org/D87348
2020-09-10 08:20:07 +01:00
Juneyoung Lee 39c1653b3d [JumpThreading] Conditionally freeze its condition when unfolding select
This patch fixes pr45956 (https://bugs.llvm.org/show_bug.cgi?id=45956 ).
To minimize its impact to the quality of generated code, I suggest enabling
this only for LTO as a start (it has two JumpThreading passes registered).
This patch contains a flag that makes JumpThreading enable it.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D84940
2020-09-10 15:49:40 +09:00
Max Kazantsev cde8fc65ae [NFC] Rename variables to avoid name confusion
Name `LI` is used for loop info, loop and load inst at the same
function, which causes a lot of confusion.
2020-09-10 13:41:10 +07:00
Max Kazantsev c413a8a8ec [LoopLoadElim] Filter away candidates that stop being AddRecs after loop versioning. PR47457
The test in PR47457 demonstrates a situation when candidate load's pointer's SCEV
is no loger a SCEVAddRec after loop versioning. The code there assumes that it is
always a SCEVAddRec and crashes otherwise.

This patch makes sure that we do not consider candidates for which this requirement
is broken after the versioning.

Differential Revision: https://reviews.llvm.org/D87355
Reviewed By: asbirlea
2020-09-10 13:30:31 +07:00
Qiu Chaofan 6afb279100 [PowerPC] [FPEnv] Disable strict FP mutation by default
22a0edd0 introduced a config IsStrictFPEnabled, which controls the
strict floating point mutation (transforming some strict-fp operations
into non-strict in ISel). This patch disables the mutation by default
since we've finished PowerPC strict-fp enablement in backend.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87222
2020-09-10 13:28:09 +08:00
Petr Hosek c4d7536136 [CMake] Simplify CMake handling for libxml2
This matches the changes made to handling of zlib done in 10b1b4a
where we rely on find_package and the imported target rather than
manually appending the library and include paths. The use of
LLVM_LIBXML2_ENABLED has been replaced by LLVM_ENABLE_LIBXML2
thus reducing the number of variables.

Differential Revision: https://reviews.llvm.org/D84563
2020-09-09 21:44:44 -07:00
Jordan Rupprecht 52f0837778 [NFC] Move definition of variable now only used in debug builds 2020-09-09 20:23:59 -07:00
Matt Arsenault e15215e041 AMDGPU: Hoist check for VGPRs 2020-09-09 19:45:40 -04:00
Matt Arsenault 85490874b2 AMDGPU: Skip all meta instructions in hazard recognizer
This was not adding a necessary nop due to thinking the kill counted.
2020-09-09 19:45:40 -04:00
Matt Arsenault 82cbc9330a AMDGPU: Fix inserting waitcnts before kill uses 2020-09-09 19:45:40 -04:00
Juneyoung Lee a6183d0f02 [ValueTracking] isKnownNonZero, computeKnownBits for freeze
This implements support for isKnownNonZero, computeKnownBits when freeze is involved.

```
  br (x != 0), BB1, BB2
BB1:
  y = freeze x
```

In the above program, we can say that y is non-zero. The reason is as follows:

(1) If x was poison, `br (x != 0)` raised UB
(2) If x was fully undef, the branch again raised UB
(3) If x was non-zero partially undef, say `undef | 1`, `freeze x` will return a nondeterministic value which is also non-zero.
(4) If x was just a concrete value, it is trivial

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D75808
2020-09-10 08:07:38 +09:00
dfukalov c259d3a061 [AMDGPU] Fix for folding v2.16 literals.
It was found some packed immediate operands (e.g. `<half 1.0, half 2.0>`) are
incorrectly processed so one of two packed values were lost.

Introduced new function to check immediate 32-bit operand can be folded.
Converted condition about current op_sel flags value to fall-through.

Fixes: SWDEV-247595

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D87158
2020-09-10 01:39:25 +03:00
Jessica Paquette 480e7f43a2 [AArch64][GlobalISel] Share address mode selection code for memops
We were missing support for the G_ADD_LOW + ADRP folding optimization in the
manual selection code for G_LOAD, G_STORE, and G_ZEXTLOAD.

As a result, we were missing cases like this:

```
@foo = external hidden global i32*
define void @baz(i32* %0) {
store i32* %0, i32** @foo
ret void
}
```

https://godbolt.org/z/16r7ad

This functionality already existed in the addressing mode functions for the
importer. So, this patch makes the manual selection code use
`selectAddrModeIndexed` rather than duplicating work.

This is a 0.2% geomean code size improvement for CTMark at -O3.

There is one code size increase (0.1% on lencod) which is likely because
`selectAddrModeIndexed` doesn't look through constants.

Differential Revision: https://reviews.llvm.org/D87397
2020-09-09 15:14:46 -07:00
Florian Hahn 9969c317ff [DSE,MemorySSA] Handle atomic stores explicitly in isReadClobber.
Atomic stores are modeled as MemoryDef to model the fact that they may
not be reordered, depending on the ordering constraints.

Atomic stores that are monotonic or weaker do not limit re-ordering, so
we do not have to treat them as potential read clobbers.

Note that llvm/test/Transforms/DeadStoreElimination/MSSA/atomic.ll
already contains a set of negative test cases.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D87386
2020-09-09 23:01:58 +01:00
Nikita Popov 0a5dc7effb [DAGCombiner] Fold fmin/fmax of NaN
fminnum(X, NaN) is X, fminimum(X, NaN) is NaN. This mirrors the
behavior of existing InstSimplify folds.

This is expected to improve the reduction lowerings in D87391,
which use NaN as a neutral element.

Differential Revision: https://reviews.llvm.org/D87415
2020-09-09 23:53:32 +02:00
Amara Emerson e5784ef8f6 [GlobalISel] Enable usage of BranchProbabilityInfo in IRTranslator.
We weren't using this before, so none of the MachineFunction CFG edges had the
branch probability information added. As a result, block placement later in the
pipeline was flying blind.

This is enabled only with optimizations enabled like SelectionDAG.

Differential Revision: https://reviews.llvm.org/D86824
2020-09-09 14:31:12 -07:00
Amara Emerson 467a071285 [GlobalISel][IRTranslator] Generate better conditional branch lowering.
This is a port of the functionality from SelectionDAG, which tries to find
a tree of conditions from compares that are then combined using OR or AND,
before using that result as the input to a branch. Instead of naively
lowering the code as is, this change converts that into a sequence of
conditional branches on the sub-expressions of the tree.

Like SelectionDAG, we re-use the case block codegen functionality from
the switch lowering utils, which causes us to generate some different code.
The result of which I've tried to mitigate in earlier combine patches.

Differential Revision: https://reviews.llvm.org/D86665
2020-09-09 13:16:11 -07:00
Amara Emerson cc76da7ada [GlobalISel] Rewrite the elide-br-by-swapping-icmp-ops combine to do less.
This combine previously tried to take sequences like:
  %cond = G_ICMP pred, a, b
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and by inverting the compare predicate and swapping branch targets, delete the
G_BR and instead have a single conditional branch to the falsebb. Since in an
earlier patch we have a combine to fold not(icmp) into just an inverted icmp,
we don't need this combine to do as much. This patch instead generalizes the
combine by just looking for:
  G_BRCOND %cond, %truebb
  G_BR %falsebb
%truebb:
  ...
%falsebb:
  ...

and then inverting the condition using a not (xor). The xor can be folded away
in a separate combine. This change also lets us avoid some optimization code
in the IRTranslator.

I also think that deleting G_BRs in the combiner is unnecessary. That's
something that targets can decide to do at selection time and could simplify
generic code in future.

Differential Revision: https://reviews.llvm.org/D86664
2020-09-09 13:08:16 -07:00
Hiroshi Yamauchi 0ab6a15698 [X86] Add support for using fast short rep mov for memcpy lowering.
Disabled by default behind an option.

Differential Revision: https://reviews.llvm.org/D86883
2020-09-09 12:46:40 -07:00
Jian Cai 415a4fbea7 [MC] Resolve the difference of symbols in consecutive MCDataFragements
Try to resolve the difference of two symbols in consecutive MCDataFragments.
This is important for an idiom like "foo:instr; .if . - foo; instr; .endif"
(https://bugs.llvm.org/show_bug.cgi?id=43795).

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D69411
2020-09-09 12:35:43 -07:00
Fangrui Song ad61e346d3 [gcov] Give the __llvm_gcov_ctr load instruction a name for more readable output 2020-09-09 12:34:43 -07:00
Krzysztof Parzyszek 0ee54cf883 [Hexagon] Account for truncating pairs to non-pairs when widening truncates
Added missing selection patterns for vpackl.
2020-09-09 14:31:52 -05:00
Fangrui Song dbac20bb6b [gcov] Don't split entry block; add a synthetic entry block instead
The entry block is split at the first instruction where `shouldKeepInEntry`
returns false. The created basic block has a br jumping to the original entry
block. The new basic block causes the function label line and the other entry
block lines to be covered by different basic blocks, which can affect line
counts with special control flows (fork/exec in the entry block requires
heuristics in llvm-cov gcov to get consistent line counts).

  int main() { // BB0
    return 0;  // BB2 (due to entry block splitting)
  }
  // BB1 is the exit block (since gcov 4.8)

This patch adds a synthetic entry block (like PGOInstrumentation and GCC) and
inserts an edge from the synthetic entry block to the original entry block. We
can thus remove the tricky `shouldKeepInEntry` and entry block splitting. The
number of basic blocks does not change, but the emitted .gcno files will be
smaller because we can save one GCOV_TAG_LINES tag.

  // BB0 is the synthetic entry block with a single edge to BB2
  int main() { // BB2
    return 0;  // BB2
  }
  // BB1 is the exit block (since gcov 4.8)
2020-09-09 12:25:24 -07:00
Guillaume Chatelet 5a4a0cfcfb [NFC] Separate bitcode reading for FUNC_CODE_INST_CMPXCHG(_OLD)
This is preparatory work to unable storing alignment for AtomicCmpXchgInst.
See D83136 for context and bug: https://bugs.llvm.org/show_bug.cgi?id=27168

This is the fixed version of D83375, which was submitted and reverted.

Differential Revision: https://reviews.llvm.org/D87373
2020-09-09 19:10:30 +00:00
Mark de Wever 08196e0b2e Implements [[likely]] and [[unlikely]] in IfStmt.
This is the initial part of the implementation of the C++20 likelihood
attributes. It handles the attributes in an if statement.

Differential Revision: https://reviews.llvm.org/D85091
2020-09-09 20:48:37 +02:00
Krzysztof Parzyszek 81ff2d30a9 [DSE] Handle masked stores 2020-09-09 13:31:31 -05:00
Ulrich Weigand 1a25133bcd [DAGCombine] Skip re-visiting EntryToken to avoid compile time explosion
During the main DAGCombine loop, whenever a node gets replaced, the new
node and all its users are pushed onto the worklist.  Omit this if the
new node is the EntryToken (e.g. if a store managed to get optimized
out), because re-visiting the EntryToken and its users will not uncover
any additional opportunities, but there may be a large number of such
users, potentially causing compile time explosion.

This compile time explosion showed up in particular when building the
SingleSource/UnitTests/matrix-types-spec.cpp test-suite case on any
platform without SIMD vector support.

Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86963
2020-09-09 19:13:46 +02:00
Mircea Trofin 4b15fc9ddb [NFC][MLInliner] Don't initialize in an assert.
Since the build bots have assertions enabled, this flew under the radar.
2020-09-09 09:56:07 -07:00
Simon Pilgrim 6e45b98934 X86CallFrameOptimization.cpp - use const references where possible. NFCI. 2020-09-09 16:35:08 +01:00
Simon Pilgrim e706116e11 X86FrameLowering::adjustStackWithPops - cleanup auto usage. NFCI.
Don't use auto for non-obvious types, and use const references.
2020-09-09 16:15:02 +01:00
Qiu Chaofan 88ff4d2ca1 [PowerPC] Fix STRICT_FRINT/STRICT_FNEARBYINT lowering
In standard C library, both rint and nearbyint returns rounding result
in current rounding mode. But nearbyint never raises inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising inexact
exception. So we can't select constrained fnearbyint into xvrdpic.

One exception here is xsrqpi, which will not raise inexact exception, so
fnearbyint f128 is okay here.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D87220
2020-09-09 22:40:58 +08:00
Jay Foad 649bde488c [AMDGPU] Simplify S_SETREG_B32 case in EmitInstrWithCustomInserter
NFC.
2020-09-09 15:18:31 +01:00
Dmitry Preobrazhensky 95b7040e43 [AMDGPU][MC] Improved diagnostic messages for invalid registers
Corrected parser to issue meaningful error messages for invalid and malformed registers.

See bug 41303: https://bugs.llvm.org/show_bug.cgi?id=41303

Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D87234
2020-09-09 16:44:03 +03:00
Alon Kom 818cf30b83 [MachinePipeliner] Fix II_setByPragma initialization
II_setByPragma was not reset between 2 calls of the MachinePipleiner pass

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D87088
2020-09-09 13:38:35 +00:00
Denis Antrushin 4358fa782e [Statepoints] Update DAG root after emitting statepoint.
Since we always generate CopyToRegs for statepoint results,
we must update DAG root after emitting statepoint, so that
these copies are scheduled before any possible local uses.
Note: getControlRoot() flushes all PendingExports, not only
those we generates for relocates. If that'll become a problem,
we can change it to flushing relocate exports only.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D87251
2020-09-09 20:22:10 +07:00
Ronak Chauhan f078577f31 Revert "[AMDGPU] Support disassembly for AMDGPU kernel descriptors"
This reverts commit 487a805310.

Tests fail on big endian machines.
2020-09-09 18:01:28 +05:30