Commit Graph

141045 Commits

Author SHA1 Message Date
Kerry McLaughlin ffbbfc76ca [SVE][CodeGen] Add the isTruncatingStore flag to MSCATTER
This patch adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter().
Updated SelectionDAGDumper::print_details for MaskedScatterSDNode to print
the details of masked scatters (is truncating, signed or scaled).

This is the first in a series of patches which adds support for scalable masked scatters

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D90939
2020-11-11 10:58:24 +00:00
Sander de Smalen 9ff701100a [LoopVectorizer] Silence warning in GetRegUsage.
This patch silences the warning:
	error: lambda capture 'DL' is not used [-Werror,-Wunused-lambda-capture]
	  auto GetRegUsage = [&DL, &TTI=TTI](Type *Ty, ElementCount VF) {
	                      ~^~~
	1 error generated.

Introduced in:
  https://reviews.llvm.org/rGb873aba3943c067a5efd5303cbdf5aeb0732cf88
2020-11-11 10:54:20 +00:00
Sander de Smalen b873aba394 [LoopVectorizer] NFCI: Calculate register usage based on TLI.getTypeLegalizationCost.
This is more accurate than dividing the bitwidth based on the element count by the
maximum register size, as it can just reuse whatever has been calculated for
legalization of these types.

This change is also necessary when calculating register usage for scalable vectors, where
the legalization of these types cannot be done based on the widest register size, because
that does not take the 'vscale' component into account.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D91059
2020-11-11 10:18:50 +00:00
Sam Parker 898a81dfc5 [NFC][ARM] Replace lambda with any_of 2020-11-11 10:02:55 +00:00
Sander de Smalen 0141f5a49d [LoopVectorizer] NFC: Return ElementCount from compute[Feasible]MaxVF
Interfaces changed to return `ElementCount`:
* LoopVectorizationCostModel::computeMaxVF
* LoopVectorizationCostModel::computeFeasibleMaxVF

This is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90880
2020-11-11 09:55:06 +00:00
Chen Zheng 09e34048bf [SelectionDAG] fminnum should be a binary operator
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D91163
2020-11-11 03:41:40 -05:00
Amara Emerson 2262393090 [AArch64][GlobalISel] Port some AArch64 target specific MUL combines from SDAG.
These do things like turn a multiply of a pow-2+1 into a shift and and add,
which is a common pattern that pops up, and is universally better than expensive
madd instructions with a constant.

I've added check lines to an existing codegen test since the code being ported
is almost identical, however the mul by negative pow2 constant tests don't generate
the same code because we're missing some generic G_MUL combines still.

Differential Revision: https://reviews.llvm.org/D91125
2020-11-10 22:21:13 -08:00
Xun Li b8a8ef3276 [SafeStack] Make sure SafeStack does not break musttail call contract
SafeStack instrumentation should not insert anything inbetween musttail call and return instruction.
For every ReturnInst that needs to be instrumented, we adjust the insertion point to the musttail call if exists.

Differential Revision: https://reviews.llvm.org/D90702
2020-11-10 20:46:05 -08:00
Max Kazantsev 7dcc889917 [SCEV] Generalize no-self-wrap check in isLoopInvariantExitCondDuringFirstIterations
Lift limitation on step being `+/- 1`. In fact, the only thing it is needed for
is proving no-self-wrap. We can instead check this flag directly.

Theoretically it can increase the scope of the transform, but I could not
construct such test easily.

Differential Revision: https://reviews.llvm.org/D91126
Reviewed By: apilipenko
2020-11-11 11:17:13 +07:00
Kazu Hirata 21fbe2ee68 Revert "[BranchProbabilityInfo] Use SmallVector (NFC)"
This reverts commit 2f1038c7b6.
2020-11-10 19:17:13 -08:00
Gaurav Jain a0b7129767 [NFC] Use [MC]Register in TwoAddressInstructionPass
Differential Revision: https://reviews.llvm.org/D90902
2020-11-10 19:01:56 -08:00
Chen Zheng 4eb8359e74 [EarlyCSE] delete abs/nabs handling
delete abs/nabs handling in earlycse pass to avoid bugs related to
hashing values. After abs/nabs is canonicalized to intrinsics in D87188,
we should get CSE ability for abs/nabs back.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D90734
2020-11-10 21:10:58 -05:00
Gaurav Jain 3726b14428 [NFC] Use [MC]Register for x86 target
Differential Revision: https://reviews.llvm.org/D91161
2020-11-10 15:49:39 -08:00
Kazushi (Jam) Marukawa dd6f607ea8 [VE] Implement FoldImmediate
Implement FoldImmediate for only integer aritihmetic operations.
Add regression tests also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91150
2020-11-11 08:08:32 +09:00
Florian Hahn c8d73d939f Revert "[VPlan] Use VPValue def for VPWidenSelectRecipe."
This reverts commit a8e50f1c6e.

This reportedly breaks building the Linux kernel.
  https://bugs.llvm.org/show_bug.cgi?id=48142
2020-11-10 22:50:46 +00:00
Pirama Arumuga Nainar 8262e94a6d [ARM] Fix PR 47980: Use constrainRegClass during foldImmediate opt.
Previously we used setRegClass to rgpr, which may expand the register
domain if the result was already in a constrained class (tcgpr in the
above PR).

Differential Revision: https://reviews.llvm.org/D91192
2020-11-10 13:38:11 -08:00
Stanislav Mekhanoshin 544ef42e40 [AMDGPU] Set default op_sel_hi on accvgpr read/write
These are opsel opcodes with op_sel actually being ignored.
As a such op_sel_hi needs to be set to default 1 even though
these bits are ignored. This is compatibility change.

Differential Revision: https://reviews.llvm.org/D91202
2020-11-10 13:07:29 -08:00
Bruno Cardoso Lopes dc14542a71 [Coroutines] Add missing llvm.dbg.declare's to cover for more allocas
Tracking local variables across suspend points is still somewhat incomplete.
Consider this coroutine snippet:

```
resumable foo() {
  int x[10] = {};
  int a = 3;
  co_await std::experimental::suspend_always();
  a++;
  x[0] = 1;
  a += 2;
  x[1] = 2;
  a += 3;
  x[2] = 3;
}
```

Can't manage to print `a` or `x` if they turn out to be allocas during
CoroSplit (which happens if you build this code with `-O0` prior to this
commit):

```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x0000000100003729 main-noprint`foo() at main-noprint.cpp:43:5
   40     co_await std::experimental::suspend_always();
   41     a++;
   42     x[0] = 1;
-> 43     a += 2;
   44     x[1] = 2;
   45     a += 3;
   46     x[2] = 3;
(lldb) p x
error: <user expression 21>:1:1: use of undeclared identifier 'x'
x
^
```

The generated IR contains a `llvm.dbg.declare` for `x` in it's initialization
basic block. After CoroSplit, the `llvm.dbg.declare` might not dominate all of
`x` uses and we lose debugging quality.

Add `llvm.dbg.value`s to all relevant basic blocks such that if later
transformations break the dominance the reliable debug info is already in
place. For instance, this BB:

```
await.ready:
  ...
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
  ...
  %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
  ...
  %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```

becomes:

```
await.ready:
  ...
  call void @llvm.dbg.value(metadata [10 x i32]* %x.reload.addr, metadata !751, metadata !DIExpression()), !dbg !753
  ...
  %arrayidx = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 0, !dbg !760
  ...
  %arrayidx19 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 1, !dbg !763
  ...
  %arrayidx21 = getelementptr inbounds [10 x i32], [10 x i32]* %x.reload.addr, i64 0, i64 2, !dbg !766
```

Differential Revision: https://reviews.llvm.org/D90772
2020-11-10 12:36:07 -08:00
Sjoerd Meijer 2ef47910d5 [LoopFlatten] Run it earlier, just before IndVarSimplify
This is a prep step for widening induction variables in LoopFlatten if this is
posssible (D90640), to avoid having to perform certain overflow checks. Since
IndVarSimplify may already widen induction variables, we want to run
LoopFlatten just before IndVarSimplify. This is a minor reshuffle as both
passes were already close after each other.

Differential Revision: https://reviews.llvm.org/D90402
2020-11-10 20:22:41 +00:00
Sjoerd Meijer 706ead0e87 [LoopFlatten] Make it a FunctionPass
This converts LoopFlatten from a LoopPass to a FunctionPass so that we don't
run into problems of a loop pass deleting a (inner)loop.

Differential Revision: https://reviews.llvm.org/D90940
2020-11-10 20:03:31 +00:00
Florian Hahn a8e50f1c6e
[VPlan] Use VPValue def for VPWidenSelectRecipe.
This patch turns VPWidenSelectRecipe into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84682
2020-11-10 19:39:37 +00:00
Sam Clegg 504cb2730c [lld][WebAssembly] Convert TLS tests to asm format
Fix a corresponding bug in WasmAsmParser around parsing.tdata sections.

Differential Revision: https://reviews.llvm.org/D91113
2020-11-10 11:38:53 -08:00
Benjamin Kramer 92c61a045f [ARM] Silence unused variable warning in Release builds. NFC. 2020-11-10 20:35:28 +01:00
Craig Topper 70b481e8db [RISCV] Add missing copyright header to RISCVBaseInfo.cpp. NFC 2020-11-10 11:33:08 -08:00
David Tenty ae032e2714 [CMake][ExecutionEngine] add HAVE_(DE)REGISTER_FRAME as a config.h macros
The macro HAVE_EHTABLE_SUPPORT is used by parts of ExecutionEngine to tell __register_frame/__deregister_frame is available to register the
FDE for a generated (JIT) code. It's currently set by a slowly growing set of macro tests in the respective headers, which is updated now and then when it fails to link on some platform or another due to the symbols being missing (see for example https://bugs.llvm.org/show_bug.cgi?id=5715).

This change converts the macro in two HAVE_(DE)REGISTER_FRAME config.h macros (like most of the other HAVE_* macros) and set's them based on whether CMake can actually find a definition for these symbols to link to at configuration time.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D87114
2020-11-10 13:09:44 -05:00
David Green 08d1c2d470 [ARM] Introduce t2DoLoopStartTP
This introduces a new pseudo instruction, almost identical to a
t2DoLoopStart but taking 2 parameters - the original loop iteration
count needed for a low overhead loop, plus the VCTP element count needed
for a DLSTP instruction setting up a tail predicated loop. The idea is
that the instruction holds both values and the backend
ARMLowOverheadLoops pass can pick between the two, depending on whether
it creates a tail predicated loop or falls back to a low overhead loop.

To do that there needs to be something that converts a t2DoLoopStart to
a t2DoLoopStartTP, for which this patch repurposes the
MVEVPTOptimisationsPass as a "tail predication and vpt optimisation"
pass. The extra operand for the t2DoLoopStartTP is chosen based on the
operands of VCTP's in the loop, and the instruction is moved as late in
the block as possible to attempt to increase the likelihood of making
tail predicated loops.

Differential Revision: https://reviews.llvm.org/D90591
2020-11-10 18:08:12 +00:00
Jonas Paulsson 89a1042b6a Make inferLibFuncAttributes() add SExt attribute on second arg to ldexp.
This was missing as discovered by the SystemZ multistage bot:
http://lab.llvm.org:8011/#/builders/8, where wrong code resulted when this
extension was not performed.

Thanks for review by Ulrich Weigand and Roman Lebedev.

Differential Revision: https://reviews.llvm.org/D90760
2020-11-10 18:32:15 +01:00
Jay Foad bb8d1437a6 [AMDGPU] Simplify multiclass EXP_m. NFC. 2020-11-10 17:28:36 +00:00
David Green dbe1bf63aa [ARM] Cleanup for ARMLowOverheadLoops. NFC 2020-11-10 17:28:07 +00:00
Simon Pilgrim 46a734621d [ValueTracking] computeKnownBitsFromShiftOperator - always return with Known2 containing the shifted value source. NFCI.
As detailed on D90479, in most circumstances we will always call computeKnownBits for Op0, so always perform this by pulling out the duplicate calls.
2020-11-10 17:03:17 +00:00
Simon Pilgrim 929a127932 [ValueTracking] computeKnownBitsFromShiftOperator - consistently use Known2 for the shifted value. NFCI.
Minor cleanup as part of getting D90479 moving again.
2020-11-10 17:03:17 +00:00
David Green c7e275388e [ARM] Don't aggressively unroll vector remainder loops
We already do not unroll loops with vector instructions under MVE, but
that does not include the remainder loops that the vectorizer produces.
These remainder loops will be rarely executed and are not worth
unrolling, as the trip count is likely to be low if they get executed at
all. Luckily they get llvm.loop.isvectorized to make recognizing them
simpler.

We have wanted to do this for a while but hit issues with low overhead
loops being reverted due to difficult registry allocation. With recent
changes that seems to be less of an issue now.

Differential Revision: https://reviews.llvm.org/D90055
2020-11-10 17:01:31 +00:00
Kazu Hirata 85cd7ffade [BranchProbabilityInfo] Use a range-based for loop (NFC) 2020-11-10 09:00:18 -08:00
David Green 73a6cd4b6b [ARM] Add a RegAllocHint for hinting t2DoLoopStart towards LR
This hints the operand of a t2DoLoopStart towards using LR, which can
help make it more likely to become t2DLS lr, lr. This makes it easier to
move if needed (as the input is the same as the output), or potentially
remove entirely.

The hint is added after others (from COPY's etc) which still take
precedence. It needed to find a place to add the hint, which currently
uses the post isel custom inserter.

Differential Revision: https://reviews.llvm.org/D89883
2020-11-10 16:28:57 +00:00
David Green b2ac9681a7 [ARM] Alter t2DoLoopStart to define lr
This changes the definition of t2DoLoopStart from
t2DoLoopStart rGPR
to
GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together,
and we can more reliably generate loops without reverting or being at
the whims of the register allocator.

This is a fairly simple change in itself, but leads to a number of other
required alterations.

 - The hardware loop pass, if UsePhi is set, now generates loops of the
   form:
       %start = llvm.start.loop.iterations(%N)
     loop:
       %p = phi [%start], [%dec]
       %dec = llvm.loop.decrement.reg(%p, 1)
       %c = icmp ne %dec, 0
       br %c, loop, exit
 - For this a new llvm.start.loop.iterations intrinsic was added, identical
   to llvm.set.loop.iterations but produces a value as seen above, gluing
   the loop together more through def-use chains.
 - This new instrinsic conceptually produces the same output as input,
   which is taught to SCEV so that the checks in MVETailPredication are not
   affected.
 - Some minor changes are needed to the ARMLowOverheadLoop pass, but it has
   been left mostly as before. We should now more reliably be able to tell
   that the t2DoLoopStart is correct without having to prove it, but
   t2WhileLoopStart and tail-predicated loops will remain the same.
 - And all the tests have been updated. There are a lot of them!

This patch on it's own might cause more trouble that it helps, with more
tail-predicated loops being reverted, but some additional patches can
hopefully improve upon that to get to something that is better overall.

Differential Revision: https://reviews.llvm.org/D89881
2020-11-10 15:57:58 +00:00
Kazushi (Jam) Marukawa 543b30db06 [VE][NFC] Change cast to dyn_cast
We used cast where we should use dyn_cast.  So, change it this time.
Old code cause problems if I implement brind instruction and compile
openmp using new compiler.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91151
2020-11-10 21:49:16 +09:00
Pablo Barrio 642b21beba [AArch64] Enable RAS 1.1 system registers in all AArch64
Some use cases (e.g. kernel devs) have strict requirements to only enable
features available with -march=armv8-a, e.g. no armv8.1-a. Enabling RAS 1.1 in
all AArch64 means they can consider to support it.

Bear in mind that the first versions of the Armv8 architecture still do not
support RAS 1.1. This patch only lets devs write code with the user-friendly
register mnemonic instead of the ugly generic S<op0>_<op1>_<Cn>_<Cm>_<op2>.
They still need to place runtime checks to make sure that the CPU to run on
supports RAS 1.1.

Differential Revision: https://reviews.llvm.org/D90594
2020-11-10 12:13:33 +00:00
Sanne Wouda dd03881bd5 Add loop distribution to the LTO pipeline
The LoopDistribute pass is missing from the LTO pipeline, so
-enable-loop-distribute has no effect during post-link. The pre-link
loop distribution doesn't seem to survive the LTO pipeline either.

With this patch (and -flto -mllvm -enable-loop-distribute) we see a 43%
uplift on SPEC 2006 hmmer for AArch64. The rest of SPECINT 2006 is
unaffected.

Differential Revision: https://reviews.llvm.org/D89896
2020-11-10 12:04:32 +00:00
Kazushi (Jam) Marukawa c84b2c49be [VE] Support inline assembly with vector regsiters
Support inline assembly with vector registers.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91146
2020-11-10 20:55:38 +09:00
Sander de Smalen f47573f9bf [LoopVectorizer] NFC: Propagate ElementCount to more interfaces.
Interfaces changed to take `ElementCount` as parameters:
* LoopVectorizationPlanner::buildVPlans
* LoopVectorizationPlanner::buildVPlansWithVPRecipes
* LoopVectorizationCostModel::selectVectorizationFactor

This patch is NFC for fixed-width vectors.

Reviewed By: dmgreen, ctetreau

Differential Revision: https://reviews.llvm.org/D90879
2020-11-10 11:11:02 +00:00
Mirko Brkusanin a75d6178b8 [GlobalISel] Add combine for (x | mask) -> x when (x | mask) == x
If we have a mask, and a value x, where (x | mask) == x, we can drop the OR
and just use x.

Differential Revision: https://reviews.llvm.org/D90952
2020-11-10 11:32:13 +01:00
Mirko Brkusanin fb36ab0a42 [GlobalISel] Expand combine for (x & mask) -> x when (x & mask) == x
We can use KnownBitsAnalysis to cover cases when mask is not trivial. It can
also help with cases when mask is not constant but can still be folded into
one. Since 'and' is comutative we should treat both operands as possible
replacements.

Differential Revision: https://reviews.llvm.org/D90674
2020-11-10 11:32:13 +01:00
Mirko Brkusanin 53ae95c946 [AMDGPU][GlobalISel] Combine shift + logic + shift with constant operands
This sequence of instructions can be simplified if they are single use and
some operands are constants. Additional combines may be applied afterwards.

Differential Revision: https://reviews.llvm.org/D90223
2020-11-10 11:32:13 +01:00
Mirko Brkusanin de719586a8 [AMDGPU][GlobalISel] Fold a chain of two shift instructions with constant operands
Sequence of same shift instructions with constant operands can be combined into
a single shift instruction.

Differential Revision: https://reviews.llvm.org/D90217
2020-11-10 11:32:12 +01:00
Kazushi (Jam) Marukawa b65ef65b22 [VE] Support inline assembly
Support inline assembly with scalar registers.  Add a regression test also.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D91119
2020-11-10 18:56:22 +09:00
Jay Foad 0ad4d04002 [AMDGPU] Remove an unused return value. NFC.
Differential Revision: https://reviews.llvm.org/D91063
2020-11-10 09:15:14 +00:00
Max Kazantsev 25755a0159 [NFC] Add flag to disable IV widening in indvar instance
This allows us to have control over IV widening in the pipeline.
2020-11-10 15:10:44 +07:00
Esme-Yi 6e0ad5bc8c [PowerPC] Add an ISEL pattern for Mul with Imm.
Summary: This patch try to do the following transformation if the multiplier doen't fit int16:
			(mul X, c1 << c2) -> (rldicr (mulli X, c1) c2)

Reviewed By: jsji, steven.zhang

Differential Revision: https://reviews.llvm.org/D87384
2020-11-10 06:52:39 +00:00
Max Kazantsev 3ec69c16c3 [NFC] Different way of getting step 2020-11-10 13:48:02 +07:00
Max Kazantsev 6022a8b7e8 [SCEV] Drop cached ranges of AddRecs after flag update
Our range computation methods benefit from no-wrap flags. But if the ranges
were first computed before the flags were set, the cached range will be too
pessimistic.

We need to drop cached ranges whenever we sharpen AddRec's no wrap flags.

Differential Revision: https://reviews.llvm.org/D89847
Reviewed By: fhahn
2020-11-10 12:37:12 +07:00
Carl Ritson fde8351743 [AMDGPU] Fix lowering of S_MOV_{B32,B64}_term
If the source of S_MOV_{B32,B64}_term is an immediate then it
cannot be lowered to a COPY.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D90451
2020-11-10 12:16:31 +09:00
Arthur Eubanks 1cbf8e89b5 [NewPM] Port -separate-const-offset-from-gep
Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D91095
2020-11-09 17:42:36 -08:00
Michael Kruse e5dba2d7e5 [OMPIRBuilder] Start 'Create' methods with lower case. NFC.
For consistency with the IRBuilder, OpenMPIRBuilder has method names starting with 'Create'. However, the LLVM coding style has methods names starting with lower case letters, as all other OpenMPIRBuilder already methods do. The clang-tidy configuration used by Phabricator also warns about the naming violation, adding noise to the reviews.

This patch renames all `OpenMPIRBuilder::CreateXYZ` methods to `OpenMPIRBuilder::createXYZ`, and updates all in-tree callers.

I tested check-llvm, check-clang, check-mlir and check-flang to ensure that I did not miss a caller.

Reviewed By: mehdi_amini, fghanim

Differential Revision: https://reviews.llvm.org/D91109
2020-11-09 19:35:11 -06:00
Kazu Hirata 2f1038c7b6 [BranchProbabilityInfo] Use SmallVector (NFC)
This patch simplifies BranchProbabilityInfo by changing the type of
Probs.

Without this patch:

  DenseMap<Edge, BranchProbability> Probs

maps an ordered pair of a BasicBlock* and a successor index to an edge
probability.

With this patch:

  DenseMap<const BasicBlock *, SmallVector<BranchProbability, 2>> Probs

maps a BasicBlock* to a vector of edge probabilities.

BranchProbabilityInfo has a property that for a given basic block, we
either have edge probabilities for all successors or do not have any
edge probability at all.  This property combined with the current map
type leads to a somewhat complicated algorithm in eraseBlock to erase
map entries one by one while increasing the successor index.

The new map type allows us to remove the all edge probabilities for a
given basic block in a more intuitive manner, namely:

  Probs.erase(BB);

Differential Revision: https://reviews.llvm.org/D91017
2020-11-09 17:29:40 -08:00
Xun Li c2cb093d9b [Coroutine] Move all used local allocas to the .resume function
Prior to D89768, any alloca that's used after suspension points will be put on to the coroutine frame, and hence they will always be reloaded in the resume function.
However D89768 introduced a more precise way to determine whether an alloca should live on the frame. Allocas that are only used within one suspension region (hence does not need to live across suspension points) will not be put on the frame. They will remain local to the resume function.
When creating the new entry for the .resume function, the existing logic only moved all the allocas from the old entry to the new entry. This covers every alloca from the old entry. However allocas that's defined afer coro.begin are put into a separate basic block during CoroSplit (the PostSpill basic block). We need to make sure these allocas are moved to the new entry as well if they are used.
This patch walks through all allocas, and check if they are still used but are not reachable from the new entry, if so, we move them to the new entry.

Differential Revision: https://reviews.llvm.org/D90977
2020-11-09 17:24:49 -08:00
Josh Stone 4463b73e79 Enable opt-bisect for the new pass manager
This instruments a should-run-optional-pass callback using the existing
OptBisect class to decide if new passes should be skipped. Passes that
force isRequired never reach this at all, so they are not included in
"BISECT:" output nor its pass count.

The test case is resurrected from r267022, an early version of D19172
that had new pass manager support (later reverted and redone without).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D87951
2020-11-09 15:57:48 -08:00
Michael Kruse f44ee0f5e7 [OpenMPIRBuilder] Implement CreateCanonicalLoop.
CreateCanonicalLoop generates a standardized control flow structure for OpenMP canonical for loops. The structure can be consumed by loop-associated directives such as worksharing-loop, distribute, simd etc. as well as loop transformations such as tile and unroll.

This is a first design without considering all complexities yet. The control-flow emits more basic block than strictly necessary, but these will be optimized by CFGSimplify anyway, provide a nice separation of concerns and might later be useful with more complex scenarios. I successfully implemented a basic tile construct using this API, which is not part of this patch.

The fundamental building block is the CreateCanonicalLoop that only takes the loop trip count and operates on the logical iteration spaces only. An overloaded CreateCanonicalLoop for using LB, UB, Increment is provided as well, but at least for C++, Clang will need to implement a loop counter to logical induction variable mapping anyway, since iterator overload resolution cannot be done in LLVMFrontend.

As there currently is no user for CreateCanonicalLoop, it is only called from unittests. Similarly, CanonicalLoopInfo::eraseFromParent() is used in my file implementation and might be generally useful for implementing loop-associated constructs, but is not used in this patch itself.

The following non-exhaustive list describes not yet covered items:
 * collapse clause (including non-rectangular and non-perfectly nested); idea is to provide a OpenMPIRBuilder::collapseLoopNest method consuming multiple nested loops and returning a new CanonicalLoopInfo that can be used for loop-associated directives.
 * simarly: ordered clause for DOACROSS loops
 * branch weights
 * Cancellation point (?)
 * AllocaIP
 * break statement (if needed at all)
 * Exceptions (if not completely handled in the front-end)
  * Using it in Clang; this requires implementing at least one loop-associated construct.
 * ...

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D90830
2020-11-09 15:03:32 -06:00
Arthur Eubanks cdb51bfaa7 [NewPM] Add unique-internal-linkage-names to PassRegistry.def
Pass was already ported, just not properly hooked up.
2020-11-09 12:54:13 -08:00
Eric Astor d657f7cd30 [ms] [llvm-ml] Support MASM's relational operators (EQ, LT, etc.)
Support the named relational operators (EQ, LT, etc.).

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89733
2020-11-09 14:01:36 -05:00
David Zarzycki a41ea782c8 [SelectionDAG] Enable CTPOP optimization fine tuning
Add a TLI hook to allow SelectionDAG to fine tune the conversion of CTPOP to a chain of "x & (x - 1)" when CTPOP isn't legal.

A subsequent patch will attempt to fine tune the X86 code gen.

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D89952
2020-11-09 13:49:01 -05:00
Francesco Petrogalli 9f61931e07 [llvm][AArch64] Allow TB(N)Z to drop signext for sign bit tests.
For example if the sign extension is only used in for TBZ, and the value is used elsewhere with a zero extension, this can eliminate a sign extension.

Reviewed By: samparker

Differential Revision: https://reviews.llvm.org/D90606
2020-11-09 18:27:48 +00:00
Eric Astor 3a71f55194 [ms] [llvm-ml] Support REPEAT/FOR/WHILE macro-like directives
Support MASM's REPEAT, FOR, FORC, and WHILE macro-like directives.

Also adds support for macro argument substitution inside quoted strings, and additional testing for macro directives.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D89732
2020-11-09 13:21:18 -05:00
David Green c8cd7e2bbf [ARM] Remove MI variable aliasing. NFC
This was accidentally using the same name for two different variables in
the same line. Whilst it seems to work for some compilers, others have
trouble and it is probably not a fantastic idea.
2020-11-09 18:18:43 +00:00
Craig Topper 5d3fd3df94 [RISCV] Make ctlz/cttz cheap to speculatively execute so CodeGenPrepare won't insert a zero check.
Add additional isel patterns for ctzw/clzw instructions.

Differential Revision: https://reviews.llvm.org/D91040
2020-11-09 10:13:45 -08:00
Craig Topper a59076006b [RISCV] Add isel patterns for using PACK for zext.h and zext.w.
Differential Revision: https://reviews.llvm.org/D91024
2020-11-09 10:13:45 -08:00
Craig Topper 4265cbaa34 [RISCV] Make SIGN_EXTEND_INREG from i8/i16 legal when Zbb extension is enabled.
This produces better code for sign extend to i64 on RV32 target.

Differential Revision: https://reviews.llvm.org/D91023
2020-11-09 10:13:45 -08:00
Craig Topper c0dd22e44a [RISCV] Add isel patterns to match sbset/sbclr/sbinv/sbext even if the shift amount isn't masked.
This uses the shiftop PatFrags to handle the masked shift amount
and unmasked shift amount cases. That also checks XLen as part
of the masked amount check so we don't need separate RV32 and RV64
patterns.

Differential Revision: https://reviews.llvm.org/D91016
2020-11-09 09:55:26 -08:00
Paul Robinson 920befb337 [FastISel] Reduce spills around mem-intrinsic calls
FastISel generates instructions to materialize "local values" at the
top of a block, in the hope that these values could be reused within
the block.  To reduce spills and restores, FastISel treats calls as
sub-block boundaries, flushing the "local value map" at each call.

This patch treats the mem* intrinsics as if they were calls, because
at O0 generally they are calls.  Eliminating these spills/restores is
actually better for debugging (especially a "continue at this line"
command), code size, stack frame size, and maybe even performance.

Differential Revision: https://reviews.llvm.org/D90877
2020-11-09 09:45:14 -08:00
Mircea Trofin 2ac3a7d0c4 [NFC] Use [MC]Register
Differential Revision: https://reviews.llvm.org/D90795
2020-11-09 08:37:14 -08:00
jasonliu 42d2109380 [XCOFF] Enable explicit sections on AIX
Implement mechanism to allow explicit sections to be generated on AIX.

Reviewed By: DiggerLin

Differential Revision: https://reviews.llvm.org/D88615
2020-11-09 16:27:38 +00:00
Stanislav Mekhanoshin d5a465866e [AMDGPU] Omit buffer resource with flat scratch.
Differential Revision: https://reviews.llvm.org/D90979
2020-11-09 08:05:20 -08:00
Paul C. Anagnostopoulos 91d2e5c81a [TableGen] Add the !filter bang operator.
Add a test. Update the Programmer's Reference.

Use it in some TableGen files.

Differential Revision: https://reviews.llvm.org/D91008
2020-11-09 10:56:55 -05:00
David Zarzycki 57e46e7123 [SelectionDAG] NFC: Hoist is legal check
This was requested during the code review of D89952.
2020-11-09 10:55:15 -05:00
Sebastian Neubauer a022b1ccd8 [AMDGPU] Add amdgpu_gfx calling convention
Add a calling convention called amdgpu_gfx for real function calls
within graphics shaders. For the moment, this uses the same calling
convention as other calls in amdgpu, with registers excluded for return
address, stack pointer and stack buffer descriptor.

Differential Revision: https://reviews.llvm.org/D88540
2020-11-09 16:51:44 +01:00
Momchil Velikov 937ab6a785 [ARM][MachineOutliner] Emit more CFI instructions
This patch make the outliner emit CFI instructions in a few more
places:

  * after LR is restored, but before the return in an outlined
  function

  * around save/restore of LR to/from a register at calls to outlined
  functions

  * around save/restore of LR to/from the stack at calls to outlined
  functions

The latter two only when the function does NOT spill LR. If the
function spills LR, then outliner generated saves/restores around
calls are not considered interesting for unwinding the frame.

Differential Revision: https://reviews.llvm.org/D89483
2020-11-09 15:26:18 +00:00
Sam Tebbs 40a3f7e48d [ARM][LowOverheadLoops] Merge a VCMP and the new VPST into a VPT
There were cases where a VCMP and a VPST were merged even if the VCMP
didn't have the same defs of its operands as the VPST. This is fixed by
adding RDA checks for the defs. This however gave rise to cases where
the new VPST created would precede the un-merged VCMP and so would fail
a predicate mask assertion since the VCMP wasn't predicated. This was
solved by converting the VCMP to a VPT instead of inserting the new
VPST.

Differential Revision: https://reviews.llvm.org/D90461
2020-11-09 15:03:48 +00:00
Sjoerd Meijer e2dcea4489 [LoopFlatten] FlattenInfo bookkeeping. NFC.
Introduce struct FlattenInfo to group some of the bookkeeping. Besides this
being a bit of a clean-up, it is a prep step for next additions (D90640). I
could take things a bit further, but thought this was a good first step also
not to make this change too large.

Differential Revision: https://reviews.llvm.org/D90408
2020-11-09 14:50:26 +00:00
Florian Hahn f0d76275cb
[VPlan] Print result value for loads in VPWidenMemoryInst (NFC).
For loads, print the result value.
2020-11-09 14:01:29 +00:00
Florian Hahn 537829f2a7
[VPlan] Add isStore helper to VPWidenMemoryInstructionRecipe (NFC).
Move logic to check if the recipe is a store to a helper for easier
reuse.
2020-11-09 14:01:29 +00:00
Jay Foad 55ea017759 [AMDGPU] Remove unused DisableDecoder machinery. NFC.
This has been unused since D24738.
2020-11-09 13:53:27 +00:00
Florian Hahn fec64de261
[VPlan] Use VPValue def for VPWidenCall.
This patch turns VPWidenCall into a VPValue and uses it
during VPlan construction and codegeneration instead of the plain IR
reference where possible.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D84681
2020-11-09 13:29:41 +00:00
David Green a0a9e1c798 [ARM] Remove kill flags between VCMP and insertion point
When we fold a VCMP into a VPST instruction any kill flags between the
old VCMP position and the new insertion point need to be removed, in
order to keep the verifier happy.

Differential Revision: https://reviews.llvm.org/D90964
2020-11-09 13:17:53 +00:00
Lucas Prates c2c2cc1360 [ARM][AArch64] Adding Neoverse V1 CPU support
Add support for the Neoverse V1 CPU to the ARM and AArch64 backends.

This is based on patches from Mark Murray and Victor Campos.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D90765
2020-11-09 13:15:40 +00:00
Francesco Petrogalli fc2fe6817e [llvm][AArch64] Simplify (and (sign_extend..) #bitmask).
Fold

    VT = (and (sign_extend NarrowVT to VT) #bitmask)

into

    VT = (zero_extend NarrowVT)

With this combine, the test replaces a sign extended load + an
unsigned extention with a zero extended load to render one of the
operands of the last multiplication.

  BEFORE                       |  AFTER
    f_i16_i32:                 |    f_i16_i32:
         .fnstart              |           .fnstart
         ldrsh   r0, [r0]      |           ldrh    r1, [r1]
         ldrsh   r1, [r1]      |           ldrsh   r0, [r0]
         smulbb  r0, r1, r0    |           smulbb  r0, r0, r1
         uxth    r1, r1        |           mul     r0, r0, r1
         mul     r0, r0, r1    |           bx      lr
         bx      lr            |

Reviewed By: resistor

Differential Revision: https://reviews.llvm.org/D90605
2020-11-09 12:53:36 +00:00
Florian Hahn 091c5c9a18
[VPlan] Add printOperands helper to VPUser (NFC).
Factor out the code for printing operands of a VPUser so it can be
re-used when printing other recipes.
2020-11-09 12:30:57 +00:00
LemonBoy 42732d33cc
[InstCombine] Fix constant-folding of overflowing arithmetic ops on vectors
Feeding vector values to `InstCombiner::OptimizeOverflowCheck` produces a scalar boolean flag if it proves the overflow check can be eliminated.
This causes `InstCombiner::CreateOverflowTuple` to crash as it correctly expects a vector of i1 values instead.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D89628
2020-11-09 14:41:07 +03:00
Michał Górny afcdd43bf7 [llvm] [Support] Fix segv if argv0 is null in getMainExecutable()
When LLDB Python bindings are used and stack backtraces are enabled
for logging, getMainExecutable() is called with argv0 being null.
This caused the fallback function getprogpath() (used on FreeBSD, NetBSD
and Linux) to segfault.  Make it handle null executable name gracefully.

Differential Revision: https://reviews.llvm.org/D91012
2020-11-09 11:35:11 +01:00
Georgii Rymar a7a447be0f [yaml2obj] - ProgramHeaders: introduce FirstSec/LastSec instead of Sections list.
Imagine we have a YAML declaration of few sections: `foo1`, `<unnamed 2>`, `foo3`, `foo4`.

To put them into segment we can do (1*):

```
Sections:
 - Section: foo1
 - Section: foo4
```

or we can use (2*):

```
Sections:
 - Section: foo1
 - Section: foo3
 - Section: foo4
```

or (3*) :

```
Sections:
 - Section: foo1
## "(index 2)" here is a name that we automatically created for a unnamed section.
 - Section: (index 2)
 - Section: foo3
 - Section: foo4
```

It looks really confusing that we don't have to list all of sections.

At first I've tried to make this rule stricter and report an error when there is a gap
(i.e. when a section is included into segment, but not listed explicitly).
This did not work perfect, because such approach conflicts with unnamed sections/fills (see (3*)).

This patch drops "Sections" key and introduces 2 keys instead: `FirstSec` and `LastSec`.
Both are optional.

Differential revision: https://reviews.llvm.org/D90458
2020-11-09 13:00:50 +03:00
Georgii Rymar 99a6401acc Recommit: [llvm-readelf/obj] - Allow dumping of ELF header even if some elements are corrupt.
This is recommit for D90903 with fixes for BB:
1) Used std::move<> when returning Expected<> (http://lab.llvm.org:8011/#/builders/112/builds/913)
2) Fixed the name of temporarily file in the file-headers.test (http://lab.llvm.org:8011/#/builders/36/builds/1269)
   (a local old temporarily file was used before)

For creating `ELFObjectFile` instances we have the factory method
`ELFObjectFile<ELFT>::create(MemoryBufferRef Object)`.

The problem of this method is that it scans the section header to locate some sections.
When a file is truncated or has broken fields in the ELF header, this approach does
not allow us to create the `ELFObjectFile` and dump the ELF header.

This is https://bugs.llvm.org/show_bug.cgi?id=40804

This patch suggests a solution - it allows to delay scaning sections in the
`ELFObjectFile<ELFT>::create`. It now allows user code to call an object
initialization (`initContent()`) later. With that it is possible,
for example, for dumpers just to dump the file header and exit.
By default initialization is still performed as before, what helps to keep
the logic of existent callers untouched.

I've experimented with different approaches when worked on this patch.
I think this approach is better than doing initialization of sections (i.e. scan of them)
on demand, because normally users of `ELFObjectFile` API expect to work with a valid object.
In most cases when a section header table can't be read (because of an error), we don't
have to continue to work with object. So we probably don't need to implement a more complex API.

Differential revision: https://reviews.llvm.org/D90903
2020-11-09 12:53:53 +03:00
Tim Northover f7fe7ea24d [MergeFunctions] fix function attribute comparison in FunctionComparator
The comparison of AttributeSets stopped after seeing a matching type attribute.
Subsequent mismatching attributes were not detected causing a crash.
2020-11-09 09:19:11 +00:00
Georgii Rymar f59216b58f Revert "[llvm-readelf/obj] - Allow dumping of ELF header even if some elements are corrupt."
This reverts commit ea8a0b8b29.

It broke BBots.
http://lab.llvm.org:8011/#/builders/14/builds/1439
http://lab.llvm.org:8011/#/builders/112/builds/913
2020-11-09 11:50:50 +03:00
Georgii Rymar ea8a0b8b29 [llvm-readelf/obj] - Allow dumping of ELF header even if some elements are corrupt.
For creating `ELFObjectFile` instances we have the factory method
`ELFObjectFile<ELFT>::create(MemoryBufferRef Object)`.

The problem of this method is that it scans the section header to locate some sections.
When a file is truncated or has broken fields in the ELF header, this approach does
not allow us to create the `ELFObjectFile` and dump the ELF header.

This is https://bugs.llvm.org/show_bug.cgi?id=40804

This patch suggests a solution - it allows to delay scaning sections in the
`ELFObjectFile<ELFT>::create`. It now allows user code to call an object
initialization (`initContent()`) later. With that it is possible,
for example, for dumpers just to dump the file header and exit.
By default initialization is still performed as before, what helps to keep
the logic of existent callers untouched.

I've experimented with different approaches when worked on this patch.
I think this approach is better than doing initialization of sections (i.e. scan of them)
on demand, because normally users of `ELFObjectFile` API expect to work with a valid object.
In most cases when a section header table can't be read (because of an error), we don't
have to continue to work with object. So we probably don't need to implement a more complex API.

Differential revision: https://reviews.llvm.org/D90903
2020-11-09 11:27:07 +03:00
Georgii Rymar c9d036ad4a [yaml2obj] - Implement BBAddrMapSection::getEntries(). NFC.
This allows to use the generic fields validation
mechanism that we have.

The behavior (i.e. an error reported) remains the same.
2020-11-09 11:11:57 +03:00
Michael Liao fa5d31f825 [GlobalsAA] Teach to handle `addrspacecast`. 2020-11-09 00:04:52 -05:00
Sanjay Patel 00808e321c [InstSimplify] allow vector folds for (Pow2C << X) == NonPow2C
Existing pre-conditions seem to be correct:
https://rise4fun.com/Alive/lCLB

  Name: non-zero C1
  Pre: !isPowerOf2(C1) && isPowerOf2(C2) && C1 != 0
  %sub = shl i8 C2, %X
  %cmp = icmp eq i8 %sub, C1
  =>
  %cmp = false

  Name: one == C2
  Pre: !isPowerOf2(C1) && isPowerOf2(C2) && C2 == 1
  %sub = shl i8 C2, %X
  %cmp = icmp eq i8 %sub, C1
  =>
  %cmp = false

  Name: nuw
  Pre: !isPowerOf2(C1) && isPowerOf2(C2)
  %sub = shl nuw i8 C2, %X
  %cmp = icmp eq i8 %sub, C1
  =>
  %cmp = false

  Name: nsw
  Pre: !isPowerOf2(C1) && isPowerOf2(C2)
  %sub = shl nsw i8 C2, %X
  %cmp = icmp eq i8 %sub, C1
  =>
  %cmp = false
2020-11-08 09:52:05 -05:00
Simon Pilgrim b11eaf5617 [DSE] Don't dereference a dyn_cast<> result - use cast<> instead. NFCI.
We were relying on the dyn_cast<> succeeding - better use cast<> and have it assert that its the correct type than dereference a null result.
2020-11-08 13:07:45 +00:00
Simon Pilgrim 0fe91ad463 [InstCombine] foldSelectFunnelShift - block poison in funnel shift value
As raised by @nlopes on D90382 - if this is not a rotate then the select was blocking poison from the 'shift-by-zero' non-TVal, but a funnel shift won't - so freeze it.
2020-11-08 12:58:30 +00:00
Florian Hahn e8dc17a2b7
[LoopInterchange] Skip non SCEV-able operands in cost function.
This fixes a crash when trying to get a SCEV expression for operands
that are not SCEV-able.
2020-11-08 11:41:19 +00:00
Pedro Tammela 5e8ecff0d8 [Reg2Mem] add support for the new pass manager
This patch refactors the pass to accomodate the new pass manager
boilerplate.

Differential Revision: https://reviews.llvm.org/D91005
2020-11-08 11:14:05 +00:00
Arthur Eubanks 226e179f74 Revert "[NewPM] Provide method to run all pipeline callbacks, used for -O0"
This reverts commit ae38540042.
As well as some follow-up test fixes.

The original change causes new-pass-manager.ll to fail when polly is enabled.
2020-11-08 00:32:35 -08:00