Commit Graph

34024 Commits

Anna Welker 1e413a8c36 [ARM][MVE] Add support for incrementing gathers
Enables the MVEGatherScatterLowering pass to build
pre-incrementing gathers. Incrementing writeback gathers
are built when it is possible to replace the loop increment
instruction.

Differential Revision: https://reviews.llvm.org/D76786
2020-05-07 12:33:50 +01:00
Kerry McLaughlin 3bcd3dd473 [CodeGen][SVE] Lowering of shift operations with scalable types
Summary:
Adds AArch64ISD nodes for:
 - SHL_PRED (logical shift left)
 - SHR_PRED (logical shift right)
 - SRA_PRED (arithmetic shift right)

Existing patterns for unpredicated left shift by immediate
have also been moved into the appropriate multiclasses
in SVEInstrFormats.td.

Reviewers: sdesmalen, efriedma, ctetreau, huihuiz, rengolin

Reviewed By: efriedma

Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79478
2020-05-07 11:43:49 +01:00
Jay Foad 17e13da29d [AMDGPU] Re-auto-generate test checks 2020-05-07 11:08:11 +01:00
Carl Ritson e3ffe7269b [AMDGPU] Cluster shader exports
Summary:
Add DAG scheduling mutation to cluster export instructions.
This avoids unnecessary waitcnts being added when computation
ends up interspersed with exports.

Reviewers: foad, arsenm, rampitec, nhaehnle

Reviewed By: foad

Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79481
2020-05-07 19:05:38 +09:00
Carl Ritson 0d4d86cbd1 [AMDGPU] Precommit test for D79481. NFC
Test shows unnecessary s_waitcnt between shader exports.
2020-05-07 19:01:51 +09:00
Kerry McLaughlin a31f4c52bf [SVE][CodeGen] Fix legalisation for scalable types
Summary:
This patch handles illegal scalable types when lowering IR operations,
addressing several places where the value of isScalableVector() is
ignored.

For types such as <vscale x 8 x i32>, this means splitting the
operations. In this example, we would split it into two
operations of type <vscale x 4 x i32> for the low and high halves.

In cases such as <vscale x 2 x i32>, the elements in the vector
will be promoted. In this case they will be promoted to
i64 (with a vector of type <vscale x 2 x i64>).

Reviewers: sdesmalen, efriedma, huntergr

Reviewed By: efriedma

Subscribers: david-arm, tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78812
2020-05-07 10:01:31 +01:00
Sam Parker 3c9b6dfa54 [NFC][ARM] Add tail predication test 2020-05-07 08:19:32 +01:00
Craig Topper 350645594e [X86] Enable combinePMULH to match multiplies with elements larger than i32.
We're truncating so the extra bits will be discarded.
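
For reference, a scalar sketch of the idiom being matched (an editor's illustration, not code from the patch):

```
#include <cstdint>

// PMULH idiom: widen, multiply, take bits [31:16], truncate. The product
// of two i16 values fits in 32 bits, so widening to 64 bits instead of
// 32 leaves the truncated result unchanged.
int16_t mulh(int16_t a, int16_t b) {
  return int16_t((int64_t(a) * int64_t(b)) >> 16);
}
```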
2020-05-06 23:13:59 -07:00
Craig Topper 1796cfd837 [X86] Add test cases for missed opportunity to match pmulh from multiplies with elements larger than i32.
We currently look for vXi32 sext/zext to match PMULH, but it
doesn't matter how many extra bits above i32 there are.
2020-05-06 23:13:58 -07:00
Eli Friedman 2c8546107a [AArch64][SVE] Implement lowering for SIGN_EXTEND etc. of SVE predicates.
Now using patterns, since there's a single-instruction lowering. (We
could convert to VSELECT and pattern-match that, but there doesn't seem
to be much point.)

I think this might be the first instruction to use nested multiclasses
this way? It seems like a good way to reduce duplication between
different integer widths. Let me know if it seems like an improvement.

Also, while I'm here, fix the return type of SETCC so we don't try to
merge a sign-extend with a SETCC.

Differential Revision: https://reviews.llvm.org/D79193
2020-05-06 17:56:32 -07:00
Ulrich Weigand 947f78ac27 [SystemZ] Fix/optimize vec_load_len and related intrinsics
When using vec_load/store_len_r with an immediate length operand
of 16 or larger, LLVM will currently emit a VLRL/VSTRL instruction
with that immediate.  This creates a valid encoding (which should be
supported by the assembler), but always traps at runtime.  This patch
fixes this by not creating VLRL/VSTRL in those cases.

This would result in loading the length into a register and
calling VLRLR/VSTRLR instead.  However, these operations with
a length of 15 or larger are in fact simply equivalent to a
full vector load or store.  And in fact the same holds true for
vec_load/store_len as well.

Therefore, add a DAGCombine rule to replace those operations with
plain vector loads or stores if the length is known at compile
time and equal to or larger than 15.
2020-05-06 21:15:58 +02:00
LemonBoy 7fa5abd343 [SelectionDAG] Fix assertion failure with big shift amounts
Calling getShiftAmountTy with LegalTypes set may return a type that's too narrow to hold the shift amount for the integer type it's applied to.

Fixes the regression introduced by D79096

Differential Revision: https://reviews.llvm.org/D79405
2020-05-06 11:58:37 -07:00
Sanjay Patel 1b678ee8a6 [x86] add test of shift+cast+concat for PR45794; NFC
Depends on D79360 / rG2f1fe1864d25 for the transform.
2020-05-06 14:18:04 -04:00
Michael Liao 4ee5a04187 [amdgpu] Fix check of VCC.
Summary: - Need to include checking on the new 16-bit subregs.

Reviewers: rampitec

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79498
2020-05-06 14:16:37 -04:00
Simon Pilgrim fe6f5ba0bf [X86][AVX] Add PR45808 test case for badly promoted comparison mask arithmetic 2020-05-06 19:09:57 +01:00
Simon Pilgrim 8817334ce3 [X86] getShuffleScalarElt - add CONCAT_VECTORS/INSERT_VECTOR_ELT support.
This helped fix some i686 vXi64 broadcast folds that were becoming v2Xi32 broadcasts because we didn't match the broadcast until after SimplifyDemandedBits worked out we only used the bottom 32-bits in PMUL(U)DQ and type legalization had split the original i64 load.

A couple of regressions occurred which required some fixups - adding concat_vectors(broadcast_load,broadcast_load) splat support and recognising (unnecessary) unary shuffles of already broadcasted vectors.

This came about as part of the work investigating vector load combining from shuffles for PR42550.
2020-05-06 18:13:33 +01:00
Stanislav Mekhanoshin 54d6dfe996 [AMDGPU] Drop 16 bit subreg suffixes on print
We do not want to break asm syntax. These suffixes are
quite useful for debugging, so add an option to print
them. Right now it is NFC.

Differential Revision: https://reviews.llvm.org/D79435
2020-05-06 08:14:10 -07:00
Luís Marques a3e6e624c7 [RISCV][NFC] Add more constant materialization tests
This patch adds more constant materialization tests, focusing on cases where
we could improve our materialization instruction sequences (particularly for
RV64). Several of these cases will be improved upon in follow-up patches.

Differential Revision: https://reviews.llvm.org/D79453
2020-05-06 16:06:16 +01:00
David Green f5f83cf4df [ARM] VMOVhr load -> vldr
Much like the similar combine added recently for VMOVrh load, this
adds a fold for VMOVhr load turning it into a vldr.f16 as opposed to a
vldrh and vmov.f16.

Differential Revision: https://reviews.llvm.org/D78714
2020-05-06 15:45:56 +01:00
Ram Nalamothu f7060f4f88 For PAL, make sure Scratch Buffer Descriptor does not clobber GIT pointer
Since SRSRC has alignment requirements, first find registers for SRSRC that do
not clobber the GIT pointer; then, if those registers clobber the preloaded
Scratch Wave Offset register, copy the Scratch Wave Offset register to a free SGPR.
2020-05-06 10:31:15 -04:00
Sanjay Patel 2f1fe1864d [DAGCombiner] sink target-supported FP<->int cast op after concat vectors
Try to combine N short vector cast ops into 1 wide vector cast op:
concat (cast X), (cast Y)... -> cast (concat X, Y...)

This is part of solving PR45794:
https://bugs.llvm.org/show_bug.cgi?id=45794

As noted in the code comment, this is uglier than I was hoping because
the opcode determines whether we pass the source or destination type
to isOperationLegalOrCustom(). Also IIUC, there's no way to validate
what the other (dest or src) type is. Without the extra legality check
on that, there's an ARM regression test in:
test/CodeGen/ARM/isel-v8i32-crash.ll
...that will crash trying to lower an unsupported v8f32 to v8i16.

Differential Revision: https://reviews.llvm.org/D79360
2020-05-06 10:25:58 -04:00
Matt Arsenault 074c371a48 AMDGPU: Insert kernarg code after allocas
This produces more normal looking IR by keeping all the allocas
clustered at the start of the block.
2020-05-06 10:19:56 -04:00
David Green d05f8a38c5 [ARM] VMOVrh of VMOVhr
A VMOVhr of a VMOVrh can be simply folded to the original HPR value.

Differential Revision: https://reviews.llvm.org/D78710
2020-05-06 15:10:01 +01:00
David Green a349949f8a [ARM] Extract from a VDUP
If we get into the situation where we are extracting from a VDUP, the
extracted value is just the origin, so long as the types match or we can
bitcast between the two.

Differential Revision: https://reviews.llvm.org/D78708
2020-05-06 14:51:25 +01:00
David Green ed7db68c35 [ARM] Convert a bitcast VDUP to a VDUP
The idea, under MVE, is to introduce more bitcasts around VDUP's in an
attempt to get the type correct across basic block boundaries. In order
to do that without other regressions we need a few fixups, of which this
is the first. If the code is a bitcast of a VDUP, we can convert that
straight into a VDUP of the new type, so long as they have the same
size.

Differential Revision: https://reviews.llvm.org/D78706
2020-05-06 14:14:21 +01:00
Dmitry Preobrazhensky 5998baccb9 [AMDGPU][MC][GFX9+] Enabled 21-bit signed offsets for SMEM instructions
Reviewers: arsenm, rampitec

Differential Revision: https://reviews.llvm.org/D79288
2020-05-06 14:13:10 +03:00
Konstantin Schwarz e82b0e9a8e [GlobalISel][InlineAsm] Add support for basic output operand constraints
Reviewers: arsenm, dsanders, aemerson, volkan, t.p.northover, paquette

Reviewed By: arsenm

Subscribers: gargaroff, wdng, rovka, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78318
2020-05-06 10:06:13 +02:00
Craig Topper 0fac1c1912 [X86] Allow Yz inline assembly constraint to choose ymm0 or zmm0 when avx/avx512 are enabled and type is 256 or 512 bits
gcc supports selecting ymm0/zmm0 for the Yz constraint when used with 256 or 512 bit vector types.
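
A hypothetical usage, assuming compilation with -mavx (the constraint behavior is GCC's; this exact snippet is the editor's illustration, not from the patch):

```
#include <immintrin.h>

// With this change, a 256-bit operand can satisfy "Yz" by being
// assigned ymm0 rather than being rejected.
__m256 pinToYmm0(__m256 v) {
  asm("" : "+Yz"(v)); // pins v to (y)mm0
  return v;
}
```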

Fixes PR45806

Differential Revision: https://reviews.llvm.org/D79448
2020-05-05 21:12:30 -07:00
Jessica Paquette b1b86d1c28 [AArch64][GlobalISel] Fold shifts into G_ICMP
Since G_ICMP can be selected to a SUBS, we can fold shifts into such compares.

E.g.

```
cmp	w1, w0, lsl #3
cmp	w1, w0, lsr #3
cmp	w1, w0, asr #3
```

This is done the same way as for adds and subtracts, using
`selectShiftedRegister`.

This gives some minor code size savings on CTMark.
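
As an illustration (editor's example, not from the patch), source like the following can now fold the shift into the compare:

```
// May lower to: cmp w0, w1, lsl #3
bool cmpShifted(int a, int b) {
  return a == (b << 3);
}
```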

https://reviews.llvm.org/D79365
2020-05-05 18:35:39 -07:00
Craig Topper a4286fc952 [X86] Fix usage of Align constructing MachineMemOperands.
Similar to D77687, but for the X86 specific code.

Differential Revision: https://reviews.llvm.org/D79381
2020-05-05 15:25:02 -07:00
Christudasan Devadasan b8a616ec59 [AMDGPU] Fixed the test by adding the triple. 2020-05-06 00:14:01 +05:30
Momchil Velikov fb18dffaeb Revert "[ARM] CMSE code generation"
This reverts commit 7cbbf89d23.

The regression tests fail with the expensive checks.
2020-05-05 19:05:40 +01:00
Christudasan Devadasan 375cec4b6c [AMDGPU] Introduce more scratch registers in the ABI.
The AMDGPU target has a convention that defines all VGPRs
(except the initial 32 argument registers) as callee-saved.
This convention is not always efficient: when the callee
requires many registers, it ends up emitting a large number of
spills, even though its caller requires only a few.

This patch revises the ABI by introducing more scratch registers
that a callee can freely use.
The 256 vgpr registers now become:
  32 argument registers
  112 scratch registers and
  112 callee saved registers.
The scratch registers and the CSRs are intermixed at regular
intervals (a split boundary of 8) to obtain a better occupancy.

Reviewers: arsenm, t-tye, rampitec, b-sumner, mjbedy, tpr

Reviewed By: arsenm, t-tye

Differential Revision: https://reviews.llvm.org/D76356
2020-05-05 23:02:58 +05:30
Momchil Velikov 7cbbf89d23 [ARM] CMSE code generation
This patch implements the final bits of CMSE code generation:

* emit special linker symbols

* restrict parameter passing to not use memory

* emit BXNS and BLXNS instructions for returns from non-secure entry
  functions, and non-secure function calls, respectively

* emit code to save/restore secure floating-point state around calls
  to non-secure functions

* emit code to save/restore non-secure floating-point state upon
  entry to non-secure entry function, and return to non-secure state

* emit code to clobber registers not used for arguments and returns
  when switching to non-secure state

Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green,
possibly others.

Differential Revision: https://reviews.llvm.org/D76518
2020-05-05 18:23:28 +01:00
Stanislav Mekhanoshin 9ef166e657 [AMDGPU] Fix FoldImmediate for 16 bit operand
Differential Revision: https://reviews.llvm.org/D79362
2020-05-05 10:19:14 -07:00
Jinsong Ji 80b78a47e5 [MachinePipeliner] Add ORE for MachinePipeliner
This patch adds ORE for MachinePipeliner, so that people can analyze
their code using opt-viewer or other tools, then optimize the code to
catch more pipelining opportunities.

Reviewed By: bcahoon

Differential Revision: https://reviews.llvm.org/D79368
2020-05-05 16:04:53 +00:00
David Green 146d44c251 [LSR] Don't require register reuse under postinc
LSR has some logic that tries to aggressively reuse registers in
formulae. This can lead to sub-optimal decisions in complex loops where
the backend is trying to use shouldFavorPostInc. This disables the
re-use in those situations.

Differential Revision: https://reviews.llvm.org/D79301
2020-05-05 16:04:50 +01:00
Jay Foad 3d76824b7f [AMDGPU] Better support for VMEM soft clauses in GCNHazardRecognizer
VMEM soft clauses only contain VMEM and FLAT instructions. Teaching
GCNHazardRecognizer::checkSoftClauseHazards that other kinds of
instructions will naturally break the clause means there are far fewer
cases where it has to insert an s_nop instruction to forcibly break the
clause.

Differential Revision: https://reviews.llvm.org/D79353
2020-05-05 15:49:09 +01:00
Sebastian Neubauer 1de4e56933 [AMDGPU] Don't mark the .note section as ALLOC
Marking a section as ALLOC tells the ELF loader to load the section into memory.
As we do not want to load the notes into VRAM, the flag should not be there.

On AMDHSA, .note is still marked as ALLOC, apparently this is currently
needed for OpenCL (see https://reviews.llvm.org/D74995).

Differential Revision: https://reviews.llvm.org/D76278
2020-05-05 14:21:45 +02:00
Sander de Smalen 8cb5663abd [AArch64][SVE] Guard bitcast patterns under IsLE predicate
Reviewed By: efriedma

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79352
2020-05-05 13:18:35 +01:00
David Green f85acb1915 [ARM] Correct the type on a predicate cast
A PREDICATE_CAST(PREDICATE_CAST(X)) can be converted to a
PREDICATE_CAST(X) as the operation can convert between any forms of
predicates (v4i1/v8i1/v16i1/i32). Unfortunately I got the type wrong on
one of the rarer converts, which would lead to invalid nodes during
isel. This fixes it up to use the correct type.

Differential Revision: https://reviews.llvm.org/D79402
2020-05-05 13:15:10 +01:00
Simon Pilgrim e53d4869a0 [X86][AVX] combineVectorSignBitsTruncation - avoid complex vXi64->vXi32 PACKSS truncations (PR45794)
Unless we're truncating an 'all-bits' result, using PACKSS for vXi64->vXi32 truncation causes problems with later combines as ComputeNumSignBits struggles to see through BITCASTs to smaller types. If we don't use PACKSS in these cases then we fallback to shuffles which are usually just as good.
2020-05-05 11:57:25 +01:00
Simon Pilgrim 371a69ac9a [X86][AVX] Add PR45794 sitofp v4i64-v4f32 test case 2020-05-05 11:32:48 +01:00
Heejin Ahn 834debfffd [WebAssembly] Fix block marker placing after fixUnwindMismatches
Summary:
This fixes a few things that are connected. It is very hard to provide
an independent test case for each of those fixes, because they are
interconnected and sometimes one masks another. The provided test case
triggers some of those bugs below but not all.

---

1. Background:
`placeBlockMarker` takes a BB, and if the BB is a destination of some
branch, it places `end_block` marker there, and computes the nearest
common dominator of all predecessors (what we call 'header') and places
a `block` marker there.

When we first place markers, we traverse BBs from top to bottom. For
example, when there are 5 BBs A, B, C, D, and E and B, D, and E are
branch destinations, if we mark the BB given to `placeBlockMarker` with `*`
and draw a rectangle representing the border of `block` and `end_block`
markers, the process is going to look like
```
                       -------
           -----       |-----|
 ---       |---|       ||---||
 |A|       ||A||       |||A|||
 ---  -->  |---|  -->  ||---||
 *B        | B |       || B ||
  C        | C |       || C ||
  D        -----       |-----|
  E         *D         |  D  |
             E         -------
                         *E
```
which means when we first place markers, we go from inner to outer
scopes. So when we place a `block` marker, if the header already
contains other `block` or `try` marker, it has to belong to an inner
scope, so the existing `block`/`try` markers should go _after_ the new
marker. This was the assumption we had.

But after placing all markers we run `fixUnwindMismatches` function.
There we do some control flow transformation and create some branches,
and we call `placeBlockMarker` again to place `block`/`end_block`
markers for those newly created branches. We can't assume that we are
traversing branch destination BBs from top to bottom now because we are
basically inserting some new markers in the middle of existing markers.

Fix:
In `placeBlockMarker`, we no longer assume that the BB given is
in top-to-bottom order, and when placing `block` markers, we
calculate whether existing `block` or `try` markers are in inner or
outer scopes with respect to the current scope.

---

2. Background:
In `fixUnwindMismatches`, when there is a call whose correct unwind
destination mismatches the current destination after initially placing
`try` markers, we wrap that with a new nested `try`/`catch`/`end` and
jump to the correct handler within the new `catch`. The correct handler
code is split as a separate BB from its original EH pad so it can be
branched to. Here's an example:

- Before
```
mbb:
  call @foo       <- Unwind destination mismatch!
wrong-ehpad:
  catch
  ...
cont:
  end_try
  ...
correct-ehpad:
  catch
  [handler code]
```

- After
```
mbb:
  try                (new)
  call @foo
nested-ehpad:        (new)
  catch              (new)
  local.set n / drop (new)
  br %handleri       (new)
nested-end:          (new)
  end_try            (new)
wrong-ehpad:
  catch
  ...
cont:
  end_try
  ...
correct-ehpad:
  catch
  local.set n / drop (new)
handler:             (new)
  end_try
  [handler code]
```

Note that after this transformation, it is possible there are no calls
to actually unwind to `correct-ehpad` here. `call @foo` now
branches to `handler`, and there can be no other calls to unwind to
`correct-ehpad`. In this case `correct-ehpad` does not have any
predecessors anymore.

This can cause a bug in `placeBlockMarker`, because we may need to place
`end_block` marker in `handler`, and `placeBlockMarker` computes the
nearest common dominator of all predecessors. If one of `handler`'s
predecessors (here `correct-ehpad`) does not have any predecessors, i.e.,
no way of reaching it, we cannot correctly compute the common dominator
of predecessors of `handler`, and end up placing no `block`/`end`
markers. This bug actually sometimes masks the bug 1.

Fix:
When we have an EH pad that does not have any predecessors after this
transformation, delete all its successors, so that its successors don't
have any dangling predecessors.

---

3. Background:
Actually the `handler` BB in the example shown in bug 2 doesn't need an
`end_block` marker, despite it being a new branch destination, because
it already has an `end_try` marker which can serve the same purpose. I just
put that example there for illustration purposes. There is a case where we
actually need to place an `end_block` marker: when the branch dest is the
appendix BB. The appendix BB is created when a call that is
supposed to unwind to the caller ends up unwinding to a wrong EH pad. In
this case we also wrap the call with a nested `try`/`catch`/`end`,
create an 'appendix' BB at the very end of the function, and branch to
that BB, where we rethrow the exception to the caller.

Fix:
When we don't actually need to place block markers, we don't.

---

4. In case we fall through to the continuation BB after the catch block,
after extracting handler code in `fixUnwindMismatches` (refer to bug 2
for an example), we now have to add a branch to it to bypass the
handler.
- Before
```
try
  ...
  (falls through to 'cont')
catch
  handler body
end
              <-- cont
```

- After
```
try
  ...
  br %cont    (new)
catch
end
handler body
              <-- cont
```

The problem is, we haven't been placing a new `end_block` marker in the
`cont` BB in this case. We should, and this fixes it. But it is hard to
provide a test case that triggers this bug, because the current
compilation pipeline from .ll to .s does not generate this kind of code;
we always have a `br` after `invoke`. But code without `br` is still
valid, and we can have that kind of code if we have some pipeline
changes or optimizations later. Even MIR test cases cannot trigger this
part for now, because we don't encode auxiliary EH-related data
structures (such as `WasmEHFuncInfo`) in MIR yet. Those functionalities
can be added later, but I don't think we should block this fix on that.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79324
2020-05-05 02:06:47 -07:00
Pierre-vh d5eb7ffa33 [Target][ARM] Fold or(A, B) more aggressively for I1 vectors
This patch makes the folding of or(A, B) into not(and(not(A), not(B)))
more aggressive for I1 vectors. This only affects Thumb2 MVE and improves
codegen, because it removes a lot of msr/mrs instructions on VPR.P0.

This patch also adds a xor(vcmp) -> !vcmp fold for MVE.
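
The identity behind the fold, sketched by the editor on a 16-bit mask standing in for a v16i1 predicate:

```
#include <cstdint>

// De Morgan: a | b == ~(~a & ~b). On MVE predicates the right-hand
// form can avoid moves through VPR.P0.
uint16_t orPred(uint16_t a, uint16_t b) {
  return uint16_t(~(~a & ~b));
}
```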

Differential Revision: https://reviews.llvm.org/D77202
2020-05-05 10:03:02 +01:00
Pierre-vh ffdda495f7 [Target][ARM] Add PerformVSELECTCombine for MVE Integer Ops
This patch adds an implementation of PerformVSELECTCombine in the
ARM DAG Combiner that transforms vselect(not(cond), lhs, rhs) into
vselect(cond, rhs, lhs).

Normally, this should be done by the target-independent DAG Combiner,
but it doesn't handle the kind of constants that we generate, so we
have to reimplement it here.

Differential Revision: https://reviews.llvm.org/D77712
2020-05-05 10:03:02 +01:00
David Green 09767af848 [ARM] MVE predcast with const test. NFC 2020-05-05 09:53:42 +01:00
Krzysztof Parzyszek 156092bbcc [RegisterCoalescer] Extend a subrange if needed when filling range gap
Register live ranges may have had gaps that after coalescing should be
removed. This is done by adding a new segment to the range, and merging
it with neighboring segments. When doing so, do not assume that each
subrange of the register ended at the same index. If a subrange ended
earlier, adding this segment could make the live range invalid.
Instead, if the subrange is not live at the start of the segment,
extend it first.
2020-05-04 16:49:59 -05:00
Sanjay Patel 58c1770b8f [x86] add test for shift+op+concat; NFC
D79360 could change this kind of sequence.
2020-05-04 17:31:45 -04:00
David Green 8430141578 [ARM] Complex LSR test showing inefficient codegen. NFC 2020-05-04 21:50:10 +01:00
Eli Friedman 1eb160fe8d [ARM] Fix tail call validity checking for varargs calls.
If a varargs function is calling a non-varargs function, or vice versa,
make sure we use the correct "varargs" bit for each.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45234

Differential Revision: https://reviews.llvm.org/D79199
2020-05-04 12:34:14 -07:00
Sanjay Patel f1d083ab45 [x86] add tests for concat of casts; NFC 2020-05-04 15:22:31 -04:00
Snehasish Kumar c8ac29ab1d Descriptive symbol names for machine basic block sections.
Today symbol names generated for machine basic block sections use a
unary encoding to reduce bloat. This is essential when every basic block
in the binary is assigned a symbol; however, with basic block clusters
(rG05192e585ce175b55f2a26b83b4ed7882785c8e6), when we only need to
generate a few non-temporary symbols, we can assign more descriptive
names, making them more user friendly. With this change:

Cold cluster section for function foo is named "foo.cold"
Exception cluster section for function foo is named "foo.eh"
Other cluster sections identified by their ids are named "foo.ID"
Using this format works well with existing tools. It will demangle as
expected and works with existing symbolizers, profilers and debuggers
out of the box.

$ c++filt _Z3foov.cold
foo() [clone .cold]

$ c++filt _Z3foov.eh
foo() [clone .eh]

$ c++filt _Z3foov.1234
foo() [clone 1234]

Tests for basicblock-sections are updated with some cleanup where
appropriate.

Differential Revision: https://reviews.llvm.org/D79221
2020-05-04 19:06:43 +00:00
Sean Fertile 47f9e71ac7 [PowerPC][AIX][NFC] Remove spills and reloads from arg passing test. 2020-05-04 14:26:33 -04:00
Stanislav Mekhanoshin c85eda74b8 [AMDGPU] fix copies between 32 and 16 bit
This is a hack to fix illegal 32- to 16-bit copies.
The problem is that when we make 16-bit subregs legal it creates
a huge number of failures which can only be resolved at once
without a temporary hack like this.

The next step is to change operands, instruction definitions
and patterns until this hack is not needed.

Differential Revision: https://reviews.llvm.org/D79119
2020-05-04 08:54:22 -07:00
Alex Richardson d1ff003fbb [SelectionDAGBuilder] Stop setting alignment to one for hidden sret values
We allocated a suitably aligned frame index so we know that all the values
have ABI alignment.
For MIPS this avoids using a pair of lwl + lwr instructions instead of a
single lw. I found this when compiling CHERI pure capability code where
we can't use the lwl/lwr unaligned loads/stores and were falling back
to a byte load + shift + or sequence.

This should save a few instructions for MIPS and possibly other backends
that don't have fast unaligned loads/stores.
It also improves code generation for CodeGen/X86/pr34653.ll and
CodeGen/WebAssembly/offset.ll since they can now use aligned loads.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D78999
2020-05-04 14:44:39 +01:00
Alex Richardson 3fc738846e [MIPS] Add a baseline test showing current inefficient hidden sret lowering
SelectionDAGBuilder currently doesn't propagate the known alignment of
the sret parameter. This is inefficient for MIPS and highly inefficient for
our out-of-tree CHERI-extended MIPS, since we don't have lwl/lwr and so fall
back to byte loads for align == 1.
2020-05-04 14:44:39 +01:00
alex-t 5b898bddff [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.
Summary: This change enables all kinds of carry-out ISD opcodes to be selected according to the node divergence.

Reviewers: rampitec, arsenm, vpykhtin

Reviewed By: rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78091
2020-05-04 16:42:25 +03:00
Raul Tambre 0863e94ebd [AArch64] Add NVIDIA Carmel support
Summary:
NVIDIA's Carmel ARM64 cores are used in Tegra194 chips found in Jetson AGX Xavier, DRIVE AGX Xavier and DRIVE AGX Pegasus.

References:
* https://devblogs.nvidia.com/nvidia-jetson-agx-xavier-32-teraops-ai-robotics/#h.huq9xtg75a5e
* NVIDIA Xavier Series System-on-Chip Technical Reference Manual 1.3 (https://developer.nvidia.com/embedded/downloads#?search=Xavier%20Series%20SoC%20Technical%20Reference%20Manual)

Reviewers: sdesmalen, paquette

Reviewed By: sdesmalen

Subscribers: llvm-commits, ianshmean, kristof.beyls, hiraditya, jfb, danielkiss, cfe-commits, t.p.northover

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D77940
2020-05-04 13:52:30 +01:00
Kerry McLaughlin 19f5da9c1d [SVE][Codegen] Lower legal min & max operations
Summary:
This patch adds AArch64ISD nodes for [S|U]MIN_PRED
and [S|U]MAX_PRED, and lowers both SVE intrinsics and
IR operations for min and max to these nodes.

There are two forms of these instructions for SVE: a predicated
form and an immediate (unpredicated) form. The patterns
which existed for the latter have been updated to match a
predicated node with an immediate and map this
to the immediate instruction.

Reviewers: sdesmalen, efriedma, dancgr, rengolin

Reviewed By: efriedma

Subscribers: huihuiz, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79087
2020-05-04 11:19:19 +01:00
Craig Topper 8b53fdd3b6 [X86] Custom legalize v16i64->v16i8 truncate with avx512.
Default legalization will create two v8i64 truncs to v8i32, concat
them to v16i32, and then truncate the rest of the way to v16i8.

Instead we can truncate directly from v8i64 to v8i8 in the lower
half of an xmm. Then concat the two halves to use vpunpcklqdq.
This is the same number of uops, but the dependency chain through
the uops is better since the halves are merged at the end.

I had to add SimplifyDemandedBits support for VTRUNC to prevent
a regression on vector-trunc-math.ll. combineTruncatedArithmetic
no longer gets a chance to shrink vXi64 mul so we were producing
the v8i64 multiply sequence using multiple PMULUDQs. With the
demanded bits fix we are able to prune out the extra ops leaving
just two PMULUDQs, one for each v8i64 half. This is twice the
width of the 2 v8i32 PMULLDs we had before, but PMULUDQ is 1
uop and PMULLD is 2. We also save some truncates. It's probably
worth using PMULUDQ even when PMULLQ is available since the latter
is 3 uops, but that will require a different change.

Differential Revision: https://reviews.llvm.org/D79231
2020-05-03 23:26:04 -07:00
Simon Pilgrim ff5094c03f [X86] Add tests showing failure to fold mul(abs(x),abs(x)) -> mul(x,x) (PR39476) 2020-05-03 17:39:48 +01:00
Craig Topper cd75b74073 [X86] Fix a few issues in the evex-to-vex-compress.mir test.
Don't use $noreg for instructions that take register inputs.
Only allow $noreg for parts of memory operands.

Don't use index register with $rip base.

Use RETQ instead of the RET pseudo. This pass is after the
ExpandPseudo pass that converts RET to RETQ.
2020-05-02 18:02:12 -07:00
LemonBoy 6d103ca855 [SelectionDAG] Unify scalarizeVectorLoad and VectorLegalizer::ExpandLoad
The two code paths have the same goal, legalizing a load of a non-byte-sized vector by loading the "flattened" representation in memory, slicing off each single element and then building a vector out of those pieces.

The technique employed by `ExpandLoad`  is slightly more convoluted and produces slightly better codegen on ARM, AMDGPU and x86 but suffers from some bugs (D78480) and is wrong for BE machines.

Differential Revision: https://reviews.llvm.org/D79096
2020-05-02 15:18:10 -07:00
Sam Elliott fe4245a4c1 [RISCV] Implement convertSelectOfConstantsToMath
Summary:
The current lowering of `select` on RISC-V uses a branch instruction to load a
register with one or other value. This is inefficient, especially in the case of
small constants that can be computed easily.

By implementing the TargetLowering::convertSelectOfConstantsToMath hook, some of
the simpler cases are covered that let us avoid introducing a branch in these
cases.
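
For illustration, the flavor of rewrite the hook enables (editor's sketch; the exact cases handled are in the patch):

```
#include <cstdint>

// A select of adjacent constants can become arithmetic on the
// condition bit instead of a branch: cond ? c+1 : c == zext(cond) + c.
int32_t sel(bool cond, int32_t c) {
  return cond ? c + 1 : c;
}
```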

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D79260
2020-05-02 15:05:57 +01:00
Sam Elliott bf552d29ee [RISCV][NFC] Tests for (select (const), (const))
Summary:
This just adds some simple cases for testing select of constants. There will be
a follow-up patch that improves code generation in some of these cases.

Reviewed By: luismarques

Differential Revision: https://reviews.llvm.org/D79259
2020-05-02 15:05:45 +01:00
Sam Elliott a4a9a1f671 [RISCV] Add patterns for checking isnan
Summary:
This patch addresses some weird assembly sequences we were seeing during
comparing floats. In particular, comparing a float to itself tells you whether
it is NaN or not, which we were doing correctly, but with an extra unneeded
`and` instruction.

This patch specialises the existing patterns to remove the `and` instructions
when both their operands are the same.
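
The underlying pattern, for reference:

```
// IEEE-754: a value is NaN iff it compares unequal to itself. The
// specialised patterns let this lower without the extra `and`.
bool isNan(float x) {
  return x != x;
}
```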

Reviewed By: luismarques, asb

Differential Revision: https://reviews.llvm.org/D78908
2020-05-02 15:01:04 +01:00
Sam Elliott 910ca0e435 [RISCV][NFC] Add tests for checking isnan patterns
Summary:
I worked on adding some SelectionDAG patterns to address code generated by these
examples, which came out of some differential testing against GCC. The pattern
additions will be in a follow-up patch.

Reviewers: luismarques, asb

Reviewed By: luismarques, asb

Differential Revision: https://reviews.llvm.org/D78907
2020-05-02 14:57:18 +01:00
Craig Topper 8555c91337 [X86] Use more accurate increments for the induction variables in sad.ll. NFC
I think some copy/pasting was used to create loops of different
VFs. But the increment of the induction variable wasn't updated
to match the VF.

This has no effect on the pattern matching we're testing, it just
helps the test make sense to the reader.
2020-05-01 18:55:22 -07:00
Thomas Lively e0f52842c8 [WebAssembly] Renumber SIMD opcodes
Summary:
As described in https://github.com/WebAssembly/simd/pull/209. This is
the final reorganization of the SIMD opcode space before
standardization. It has been landed in concert with corresponding
changes in other projects in the WebAssembly SIMD ecosystem.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D79224
2020-05-01 17:20:49 -07:00
Simon Pilgrim 7cb5a51f38 [DAG] SimplifyDemandedVectorElts - add INSERT_SUBVECTOR SimplifyMultipleUseDemandedBits handling 2020-05-01 16:20:51 +01:00
Simon Pilgrim 65d32a9892 [DAG] SimplifyDemandedVectorElts - remove INSERT_SUBVECTOR if we don't demand the subvector 2020-05-01 16:20:51 +01:00
Simon Pilgrim e3c0be596c [DAG] SimplifyDemandedVectorElts - add EXTRACT_SUBVECTOR SimplifyMultipleUseDemandedBits handling 2020-05-01 13:48:07 +01:00
Simon Pilgrim 8cbd8194c1 [X86] Improving folding of concat_vectors of subvectors from the same broadcast
Handle concat_vectors(extract_subvector(broadcast(x)), extract_subvector(broadcast(x))) -> broadcast(x)

To expose this we also need collectConcatOps to recognise the insert_subvector(x, extract_subvector(x, lo), hi) subvector splat pattern
2020-05-01 11:23:10 +01:00
Jay Foad 5f7ea85e78 [AMDGPU] Remove unnecessary s_waitcnt between VMEM loads
VMEM loads of the same type (sampler vs no sampler) are guaranteed to
write their result registers in order, so there is no need for an
s_waitcnt even if they write to overlapping vgprs.

Differential Revision: https://reviews.llvm.org/D79176
2020-05-01 10:10:23 +01:00
Hubert Tong 0e8608b3c3 [tests] Revert unhelpful change from d73eed42d1 2020-04-30 22:18:54 -04:00
Hubert Tong d73eed42d1 [tests] Speculative fix for buildbot breakage from c5f7c039ef 2020-04-30 22:04:53 -04:00
Suyog Sarda ea093f6481 Handle cases for subregisters.
While restoring latency, check whether any register of the
source instruction is a subregister of the successor instructions,
in addition to checking for the same register.
2020-04-30 20:32:33 -05:00
Craig Topper c5f7c039ef [X86] Add x, t and g modifiers for inline asm
This patch adds the x, t and g modifiers for inline asm from GCC. These will print a vector register as xmm*, ymm* or zmm* respectively.

I also fixed register names with modifiers with inteldialect so they are no longer printed with a leading %.
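
A hypothetical usage of the new modifiers (editor's example): %x prints the operand as an xmm register, while t and g would print ymm and zmm names.

```
#include <immintrin.h>

// Adds the low 128-bit halves; %x0/%x1/%x2 force xmm register names
// for the three vector operands (upper bits of the result are zeroed).
__m256 addLow(__m256 a, __m256 b) {
  __m256 r;
  asm("vaddps %x2, %x1, %x0" : "=x"(r) : "x"(a), "x"(b));
  return r;
}
```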

Patch by Amanieu d'Antras

Differential Revision: https://reviews.llvm.org/D78977
2020-04-30 17:45:45 -07:00
Craig Topper 6a1ad76dab [X86] Don't return true from isTruncateFree for vectors
Also fix some cost tables for vXi1 types to match the costs entries for the types they will be promoted to.

Differential Revision: https://reviews.llvm.org/D79045
2020-04-30 16:43:35 -07:00
Simon Pilgrim bf468f4349 [X86][SSE] Canonicalize UNARYSHUFFLE(XOR(X,-1) -> XOR(UNARYSHUFFLE(X),-1)
This pushes the NOT pattern up the DAG to help expose it for further combines (AND->ANDN in particular).

The PSHUFD/MOVDDUP 'splat' cases are the only ones I've seen in the wild so far, we can further generalize if/when we need to.
2020-04-30 19:18:51 +01:00
Simon Pilgrim 51308ee30c [X86] Extend combine-bitselect tests
Add AVX512VL tests and v2i64/v8i64 mask broadcast tests that match the existing v4i64 versions
2020-04-30 16:24:17 +01:00
Simon Pilgrim 30211c4783 [X86] combineANDXORWithAllOnesIntoANDNP - add BROADCAST handling
Fold BROADCAST(NOT(Y)) -> BROADCAST(Y) as part of finding a NOT inversion.
2020-04-30 16:24:17 +01:00
diggerlin a2c8cd1812 [AIX] emit .extern and .weak directive linkage
SUMMARY:

emit .extern and .weak directive linkage

Reviewers: hubert.reinterpretcast, Jason Liu
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D76932
2020-04-30 09:54:10 -04:00
Simon Pilgrim f6cdcb0a5a [X86][SSE] Add bitselect tests where the mask is a broadcasted scalar
Shows issue that the IsNot() test can't see through shuffles/broadcasts
2020-04-30 14:45:21 +01:00
Simon Pilgrim 96238486ed [DAGCombine] Move the remaining X86 funnel shift patterns to DAGCombine
X86 matches several 'shift+xor' funnel shift patterns:

  fold (or (srl (srl x1, 1), (xor y, 31)), (shl x0, y))  -> (fshl x0, x1, y)
  fold (or (shl (shl x0, 1), (xor y, 31)), (srl x1, y))  -> (fshr x0, x1, y)
  fold (or (shl (add x0, x0), (xor y, 31)), (srl x1, y)) -> (fshr x0, x1, y)

These patterns are also what we end up with under the proposed expansion changes in D77301.

This patch moves these to DAGCombine's generic MatchFunnelPosNeg.

All existing X86 test cases still pass, and we just have a small codegen change in pr32282.ll.
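
For reference, the C form of the first idiom (editor's sketch); for n in [0, 31] it computes exactly fshl(x0, x1, n):

```
#include <cstdint>

// (x1 >> 1) >> (n ^ 31) equals x1 >> (32 - n) while avoiding the
// undefined 32-bit shift at n == 0, which is why this shape appears.
uint32_t fshl32(uint32_t x0, uint32_t x1, uint32_t n) {
  return (x0 << n) | ((x1 >> 1) >> (n ^ 31));
}
```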

Reviewed By: @spatel

Differential Revision: https://reviews.llvm.org/D78935
2020-04-30 12:57:17 +01:00
Cullen Rhodes 672b62ea21 [AArch64][SVE] Custom lowering of floating-point reductions
Summary:
This patch implements custom floating-point reduction ISD nodes that
have vector results, which are used to lower the following intrinsics:

    * llvm.aarch64.sve.fadda
    * llvm.aarch64.sve.faddv
    * llvm.aarch64.sve.fmaxv
    * llvm.aarch64.sve.fmaxnmv
    * llvm.aarch64.sve.fminv
    * llvm.aarch64.sve.fminnmv

SVE reduction instructions keep their result within a vector register,
with all other bits set to zero.

Changes in this patch were implemented by Paul Walker and Sander de
Smalen.

Reviewers: sdesmalen, efriedma, rengolin

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78723
2020-04-30 10:18:40 +00:00
David Sherwood 058cd8c5be [CodeGen] Add support for inserting elements into scalable vectors
Summary:
This patch tries to ensure that we do something sensible when
generating code for the ISD::INSERT_VECTOR_ELT DAG node when operating
on scalable vectors. Previously we always returned 'undef' when
inserting an element into an out-of-bounds lane index, whereas now
we only do this for fixed length vectors. For scalable vectors it
is assumed that the backend will do the right thing in the same way
that we have to deal with variable lane indices.

In this patch I have permitted a few basic combinations for scalable
vector types where it makes sense, but in general avoided most cases
for now as they currently require the use of BUILD_VECTOR nodes.

This patch includes tests for all scalable vector types when inserting
into lane 0, but I've only included one or two vector types for other
cases such as variable lane inserts.

Differential Revision: https://reviews.llvm.org/D78992
2020-04-30 11:14:04 +01:00
Evgeniy Brevnov 3e68a66704 [BPI][NFC] Reuse post dominator tree from analysis manager when available
Summary: Currently BPI unconditionally creates a post dominator tree each time. While this is not incorrect, we can save compile time by reusing an existing post dominator tree (when it's valid) provided by the analysis manager.

Reviewers: skatkov, taewookoh, yrouban

Reviewed By: skatkov

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78987
2020-04-30 11:31:03 +07:00
Puyan Lotfi 9b16ece6ca [test][MachineOutliner] REQUIRES: asserts
This new test checks some of the debug output to verify at what iteration
the outliner reached a fixed point. For now I am making it REQUIRES:
asserts so that it won't break any bots that have asserts disabled.
2020-04-29 19:43:17 -04:00
Puyan Lotfi ffd5e121d7 [NFCi] Iterative Outliner + clang-format refactoring.
Prior to D69446 I had done some NFC cleanup to make landing an iterative
outliner a cleaner, more straightforward patch. Since then, that seems to have
landed, but I noticed some ways it could be cleaned up. Specifically:

1) doOutline was meant to be the re-runable function, but instead
   runOnceOnModule was created that just calls doOutline.
2) In D69446 we discussed that the flag allowing the re-run of the
   outliner should be a flag to tell how many additional times to run
   the outliner again, not the total number of times. I don't think it
   makes sense to introduce a flag, but print an error if the flag is
   set to 0.

This is an NFCi, the i being that I get rid of the way that the
machine-outline-runs flag could be used to tell the outliner to not run
at all, and because I renamed the flag to '-machine-outliner-reruns'.

Differential Revision: https://reviews.llvm.org/D79070
2020-04-29 18:36:47 -04:00
Sanjay Patel cecee111e4 [x86] add tests for awkward 'icmp eq i1'; NFC 2020-04-29 14:39:47 -04:00
Simon Pilgrim f0903de1aa [x86] Enable bypassing 64-bit division on generic x86-64
This is currently enabled for Intel big cores from Sandy Bridge onward, as well as Atom, Silvermont, and KNL, due to 64-bit division being so slow on these cores. AMD cores can do this in hardware (use 32-bit division based on input operand width), so it's not a win there. But since the majority of x86 CPUs benefit from this optimization, and since the potential upside is significantly greater than the downside, we should enable this for the generic x86-64 target.
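
A sketch of the bypass idea (editor's illustration of the transform, not the actual emitted code):

```
#include <cstdint>

// If both operands fit in 32 bits, use the much faster 32-bit divide;
// otherwise fall back to the full 64-bit divide.
uint64_t bypassDiv(uint64_t a, uint64_t b) {
  if (((a | b) >> 32) == 0)
    return uint64_t(uint32_t(a) / uint32_t(b));
  return a / b;
}
```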

Patch By: @atdt

Reviewed By: @craig.topper, @RKSimon

Differential Revision: https://reviews.llvm.org/D75567
2020-04-29 16:55:48 +01:00
Chen Zheng 957c5dd78b [PowerPC-QPX] add more test for QPX madd/msub operands order - NFC 2020-04-29 01:17:14 -04:00
QingShan Zhang b5f89744cc [DAGCombine] Checking the cost directly to improve the code readability
Call getNegatedExpression(Cost) and check the Cost to make the code clearer.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D78347
2020-04-29 01:49:39 +00:00
Stanislav Mekhanoshin 26777ad7a0 [AMDGPU] Adapt GCNRegBankReassign for 16 bit subregs
This allows the pass to avoid crashing and to analyze 16-bit subregs if those
appear in the instructions. At the same time it does not attempt
to reassign them. It can still correctly identify register
banks to let larger registers be reassigned.

More work will be needed here when real instructions will use
these registers and more tests as well.

Differential Revision: https://reviews.llvm.org/D78772
2020-04-28 16:16:04 -07:00
Stanislav Mekhanoshin 8a30460697 [AMDGPU] Define AGPR subregs
These are only needed as VGPR counterpart.

Differential Revision: https://reviews.llvm.org/D78597
2020-04-28 15:30:43 -07:00
Craig Topper 446a3be8f1 [X86] Add PACK instructions to hasUndefRegUpdate so the BreakFalseDeps pass will reassign an undef second source to match the first source
We generate PACK instructions with an undef second source when we are truncating from a 128-bit vector to something narrower and we don't care about the upper bits of the vector register. The register allocation process will always assign untied undef uses to xmm0. This creates a false dependency on xmm0.

By adding these instructions to hasUndefRegUpdate, we can get the BreakFalseDeps pass to reassign the source to match the other input. Normally this interface is used for instructions that might need an xor inserted to break the dependency. But the pass also has a heuristic that tries to use the same register as other sources. That should always be possible for these instructions so we'll never trigger the xor dependency break.

Differential Revision: https://reviews.llvm.org/D79032
2020-04-28 15:11:32 -07:00
Stanislav Mekhanoshin 46a75436f8 [AMDGPU] Define special SGPR subregs
These are used in SReg_32 and when we start to use SGPR_LO16
there will be complaints that not all registers in RC support
all subreg indexes. For now it is NFC.

Unused regunits are reserved so that verifier does not complain
about missing phys reg live-ins.

Differential Revision: https://reviews.llvm.org/D78591
2020-04-28 14:57:46 -07:00
Davide Italiano 0ed276bb08 [GlobalISel] Assign the correct debug location when combining G_ANYEXT/G_ZEXT
<rdar://problem/62535712>
2020-04-28 14:12:33 -07:00
Stanislav Mekhanoshin 395d93358e Revert "[AMDGPU] Define special SGPR subregs"
This reverts commit 1baaa080e0.
2020-04-28 13:53:15 -07:00
Stanislav Mekhanoshin 1baaa080e0 [AMDGPU] Define special SGPR subregs
These are used in SReg_32 and when we start to use SGPR_LO16
there will be complaints that not all registers in RC support
all subreg indexes. For now it is NFC.

Unused regunits are reserved so that verifier does not complain
about missing phys reg live-ins.

Differential Revision: https://reviews.llvm.org/D78591
2020-04-28 13:34:24 -07:00
Sean Fertile 2a3cf5e583 [PowerPC][AIX] Pass ByVal formal args that span registers and stack.
Implement passing of ByVal formal arguments when the argument is passed
partly in the argument registers, with the remainder of the argument
passed on the stack.

Differential Revision: https://reviews.llvm.org/D78515
2020-04-28 14:57:14 -04:00
Jessica Paquette 2af31b3b65 [AArch64][GlobalISel] Select immediate forms of compares by wiggling constants
Similar to code in `getAArch64Cmp` in AArch64ISelLowering.

When we get a compare against a constant, sometimes, that constant isn't valid
for selecting an immediate form.

However, sometimes, you can get a valid constant by adding 1 or subtracting 1,
and updating the condition code.

This implements the following transformations when valid:

- x slt c => x sle c - 1
- x sge c => x sgt c - 1
- x ult c => x ule c - 1
- x uge c => x ugt c - 1

- x sle c => x slt c + 1
- x sgt c => x sge c + 1
- x ule c => x ult c + 1
- x ugt c => x uge c + 1

Valid meaning the constant doesn't wrap around when we fudge it, and the result
gives us a compare which can be selected into an immediate form.
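
A minimal sketch of one of the rewrites (hypothetical names; the real code covers all eight cases plus the immediate-form check):

```
#include <cstdint>
#include <limits>

enum Pred { SLT, SLE };

// x slt c ==> x sle (c - 1), valid only when c - 1 does not wrap.
bool adjustToImmediateForm(Pred &P, int64_t &C) {
  if (P == SLT && C != std::numeric_limits<int64_t>::min()) {
    P = SLE;
    C -= 1;
    return true;
  }
  return false;
}
```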

This also moves `getImmedFromMO` higher up in the file so we can use it.

Differential Revision: https://reviews.llvm.org/D78769
2020-04-28 11:35:01 -07:00
Craig Topper 0de7ddbfb0 [X86] Handle more cases in combineAddOrSubToADCOrSBB.
This adds support for

X + SETAE --> sbb X, -1
X - SETAE --> adc X, -1
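
For reference, the kind of source the first pattern matches (editor's example): after cmp a, b, SETAE equals 1 - CF, so x + (a >= b) is exactly sbb x, -1.

```
#include <cstdint>

// sbb x, -1 computes x - (-1) - CF == x + 1 - CF == x + SETAE.
uint32_t addSetAE(uint32_t x, uint32_t a, uint32_t b) {
  return x + (a >= b);
}
```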

Fixes PR45700

Differential Revision: https://reviews.llvm.org/D78984
2020-04-28 10:39:39 -07:00
Craig Topper c480dc6b47 [X86] Pre-commit tests for D78984. NFC
These tests show some missed opportunities to use sbb/adc.
2020-04-28 10:39:37 -07:00
Francis Visoiu Mistrih e770153865 [AArch64] Add support for -ffixed-x30
Add support for reserving LR in:

* the driver through `-ffixed-x30`
* cc1 through `-target-feature +reserve-x30`
* the backend through `-mattr=+reserve-x30`
* a subtarget feature `reserve-x30`

the same way we're doing for the other registers.
2020-04-28 08:48:28 -07:00
David Green 1084b32339 [ARM] Always replace FP16 bitcasts with VMOVhr or VMOVrh
This changes the logic with lowering fp16 bitcasts to always produce
either a VMOVhr or a VMOVrh, instead of only trying to do it with
certain surrounding nodes. To perform the same optimisations demand bits
and known bits information has been added for them.

Differential Revision: https://reviews.llvm.org/D78587
2020-04-28 16:12:53 +01:00
Krzysztof Parzyszek 25a4b1904c Handle part-word LL/SC in atomic expansion pass
Differential Revision: https://reviews.llvm.org/D77213
2020-04-28 10:07:39 -05:00
Chen Zheng 22fdbd01a3 [Powerpc] add triple for new added qpx test case - NFC 2020-04-28 05:32:10 -04:00
Ng Zhi An 500b4ad5f4 [PowerPC] Fix downcast from nullptr for target streamer
getTargetStreamer() might return null (e.g. when running inlined-strings.ll test),
downcasting to a reference will be wrong. This is detectable with -fsanitize=null.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D78686
2020-04-28 09:20:10 +00:00
Chen Zheng 949018cc27 [PowerPC] add test case for reorder operands of qpx fma instr - nfc. 2020-04-28 04:43:32 -04:00
Jonas Paulsson c84461ba8d [SystemZ] Fix test case.
Remove bad kill flags from load-and-test.mir as discovered by
https://reviews.llvm.org/D78586: "[MachineVerifier] Add more checks for
registers in live-in lists".

Review: Ulrich Weigand
2020-04-28 09:43:03 +02:00
Kazushi (Jam) Marukawa 3c80478d73 [VE] Update branch instructions
Summary:
Change all mnemonics to match the assembly instructions, to simplify mnemonic
naming rules. This time, update all branch instructions. This also changes
the code to use the %s10 register consistently.

Differential Revision: https://reviews.llvm.org/D78889
2020-04-28 09:41:01 +02:00
Kazushi (Jam) Marukawa 0314e8980f [VE] Support floating point immediate values
Summary:
Add simm7fp/mimmfp to represent floating point immediate values.
Also clean up the multiclasses defining floating point arithmetic instructions
to handle simm7fp/mimmfp operands, and add several regression tests
for the new operands.

Differential Revision: https://reviews.llvm.org/D78887
2020-04-28 09:36:10 +02:00
Chen Zheng 45d92806ea [PowerPC] use inst-level fast-math-flags to drive MachineCombiner
Currently, on the PowerPC target, the function-scope UnsafeFPMath
option is used to drive the Machine Combiner pass.

This is not accurate in two ways:
1: the scope is wrong. The Machine Combiner pass only requires
   instruction-level flags, not a function-scope option.
2: the floating-point flags are wrong. The Machine Combiner pass
   only requires the floating-point flags reassoc and nsz.

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D78183
2020-04-28 03:31:12 -04:00
Kang Zhang 4bb0a1cb70 [PowerPC] Fix the liveins for ppc-expand-isel pass
Summary:
In the ppc-expand-isel pass, we use stepForward() to update the
liveins. This function is not recommended because it needs
accurate kill info.

This patch uses the function computeAndAddLiveIns() to update the
liveins; it's the recommended method and can fix the liveins bug for
the ppc-expand-isel pass.

Reviewed By: efriedma, lkail

Differential Revision: https://reviews.llvm.org/D78657
2020-04-28 03:22:48 +00:00
LemonBoy f30416fdde [AsmPrinter] Fix emission of non-standard integer constants for BE targets
The code assumed that zero-extending the integer constant to the
designated alloc size would be fine even for BE targets, but that's not
the case as that pulls in zeros from the MSB side while we actually
expect the padding zeros to go after the LSB.

I've changed the codepath handling the constant integers to use the
store size for both small(er than u64) and big constants and then add
zero padding right after that.
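
A worked example with assumed values:

```
// Emitting an i20 constant 0xABCDE (store size 3 bytes, alloc size 4)
// on a big-endian target:
//   zero-extending to the alloc size gives 0x000ABCDE
//     -> bytes 00 0A BC DE  (wrong: zeros pulled in on the MSB side)
//   emitting the store-size bytes, then padding, gives
//     -> bytes 0A BC DE 00  (right: padding zeros follow the LSB)
```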

Differential Revision: https://reviews.llvm.org/D78011
2020-04-27 14:57:29 -07:00
Wei Mi 68d2301e12 Recommit "Generate Callee Saved Register (CSR) related cfi directives
like .cfi_restore"

Insert .cfi_offset/.cfi_register when IncomingCSRSaved of current block
is larger than OutgoingCSRSaved of its previous block.

Original commit message:
https://reviews.llvm.org/D42848 only handled CFA related cfi directives but
didn't handle CSR related cfi. The patch adds the CSR part. Basically it reuses
the framework created in D42848. For each basicblock, the patch tracks which
CSR set have been saved at its CFG predecessors's exits, and compare the CSR
set with the set at its previous basicblock's exit (The previous block is the
block laid before the current block). If the saved CSR set at its previous
basicblock's exit is larger, .cfi_restore will be inserted.

The patch also generates proper .cfi_restore in epilogue to make sure the
saved CSR set is consistent for the incoming edges of each block.

Differential Revision: https://reviews.llvm.org/D74303
2020-04-27 12:46:58 -07:00
Davide Italiano c8433a5b1b [GlobalISel] Remove debug locations when emitting constants.
The tl;dr story is that this causes jumps in the emitted line
tables, even at `-O0`. We could at some point consider more fancy
solutions to preserve locations, but it doesn't seem to be worth
the effort for now.

<rdar://problem/62460788>

Differential Revision:  https://reviews.llvm.org/D78947
2020-04-27 11:27:08 -07:00
Stefan Pintilie 1354a03e74 [PowerPC][Future] Implement PC Relative Tail Calls
Tail Calls were initially disabled for PC Relative code because it was not safe
to make certain assumptions about the tail calls (namely that all compiled
functions no longer used the TOC pointer in R2). However, once all of the
TOC pointer references have been removed it is safe to tail call everything
that was tail called prior to the PC relative additions as well as a number of
new cases.
For example, it is now possible to tail call indirect functions as there is no
need to save and restore the TOC pointer for indirect functions if the caller
is marked as may clobber R2 (st_other=1). For the same reason it is now also
possible to tail call functions that are external.

Differential Revision: https://reviews.llvm.org/D77788
2020-04-27 12:55:08 -05:00
Simon Pilgrim bd60b2983e [X86][SSE] Regenerate oddsubvector.ll test checks
Fixes some missed address symbol regexs
2020-04-27 18:48:02 +01:00
David Sherwood 121ca44c19 [CodeGen] Use SPLAT_VECTOR for zeroinitialiser with scalable types
Adding tests that I forgot to add as part of a previous change:

https://reviews.llvm.org/D78636
2020-04-27 16:16:48 +01:00
David Green 61b8af0375 [ARM] Allow fma in tail predicated loops
There are some intrinsics like this that currently block tail
predication but should be fine. This allows fma through, as it is the one
that I ran into. There may be others that need the same treatment but
I've only done this one here.

Differential Revision: https://reviews.llvm.org/D78385
2020-04-27 15:32:47 +01:00
Simon Pilgrim d9e174dbf7 [X86][SSE] getFauxShuffle - account for PEXTW/PEXTB implicit zero-extension
The insert(truncate/extend(extract(vec0,c0)),vec1,c1) case in rGacbc5ede99 wasn't combining the 'mineltsize' with the src vector elt size which may be smaller due to implicit extension during extraction.

Reduced from test case provided by @mstorsjo
2020-04-27 12:46:50 +01:00
David Green 8807139026 [ARM] Only produce qadd8b under hasV6Ops
When compiling for an armv5te CPU from clang, the +dsp attribute is set.
This meant we could try and generate qadd8 instructions where we would
end up having no pattern. I've changed the condition here to be hasV6Ops
&& hasDSP, which is what other parts of ARMISelLowering seem to use for
similar instructions.

Fixed PR45677.

Differential Revision: https://reviews.llvm.org/D78877
2020-04-27 10:13:29 +01:00
Matt Arsenault 4cef9812eb AMDGPU: Add some missing atomics tests
We had no FP atomic load/store coverage.
2020-04-26 15:09:35 -04:00
Simon Pilgrim acbc5ede99 [X86][SSE] getFauxShuffle - support insert(truncate/extend(extract(vec0,c0)),vec1,c1) shuffle patterns at the byte level
Followup to the PR45604 fix at rGe71dd7c011a3 where we disabled most of these cases.

By creating the shuffle at the byte level we can handle any extension/truncation as long as we track how small the scalar got and assume that the upper bytes will need to be zero.
2020-04-26 15:31:01 +01:00
Kang Zhang f85e35d2a3 [NFC][PowerPC] Add the killed flag for the case expand-isel-liveness.mir 2020-04-26 04:40:20 +00:00
Kang Zhang fe2a522533 [NFC][PowerPC] Add a new test case in expand-isel-liveness.mir 2020-04-26 03:15:54 +00:00
Craig Topper c1cb733db6 [X86] Improve lowering of v16i8->v16i1 truncate under prefer-vector-width=256. 2020-04-25 15:20:33 -07:00
Sanjay Patel 7f4ff782d4 [x86] use vector instructions to lower even more FP->int->FP casts
This is another enhancement to D77895/D78362
to avoid a round-trip from XMM->GPR->XMM.
This time we handle the case of starting/ending with different FP types
but always with signed i32 as the intermediate value.
I think this covers all of the faux vector optimization possibilities
for pre-AVX512.

There is at least 1 other transform mentioned in PR36617:
https://bugs.llvm.org/show_bug.cgi?id=36617#c19
...where we fold an 'fpext' into a preceding 'sitofp'. I think we will
want to handle that earlier (DAGCombiner or instcombine) because that's
a target-independent optimization.
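
A hedged sketch of the source-level pattern being improved (function names
are illustrative):

  // FP -> signed i32 -> a different FP type; with vector instructions the
  // round-trip can stay in XMM registers (e.g. cvttps2dq + cvtdq2pd)
  // instead of bouncing through a GPR.
  double f32_to_f64_via_i32(float x) { return (double)(int)x; }
  float f64_to_f32_via_i32(double x) { return (float)(int)x; }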

Differential Revision: https://reviews.llvm.org/D78758
2020-04-25 11:38:54 -04:00
Snehasish Kumar 0cc063a8ff Use .text.unlikely and .text.eh prefixes for MachineBasicBlock sections.
Summary:
Instead of adding a ".unlikely" or ".eh" suffix for machine basic blocks,
this change updates the behaviour to use an appropriate prefix.
This allows lld to group basic block sections together
when -z keep-text-section-prefix is specified and matches the behaviour
observed in gcc.

Reviewers: tmsriram, mtrofin, efriedma

Reviewed By: tmsriram, efriedma

Subscribers: eli.friedman, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78742
2020-04-24 15:07:38 -07:00
Fangrui Song 10bc12588d [XRay] Change Sled.Function to PC-relative for sled version 2 and make llvm-xray support sled version 2 addresses
Follow-up of D78082 and D78590.

Otherwise, because xray_instr_map is now read-only, the absolute
relocation used for Sled.Function will cause a text relocation.
2020-04-24 14:41:56 -07:00
Amara Emerson dbb0356771 [AArch64][GlobalISel] Fix sub-64b stack parameter passing on Darwin.
A previous bug fix for varargs introduced a regression where we would
incorrectly widen some stores to memory when passing i8/i16 parameters on the
stack. This didn't show up seemingly because it only happens when there is
no signext/zeroext parameter attribute, which I think for Darwin clang adds.

Swift however seems to be a different story, and a plain anyext on the parameter
triggered the bug.

To fix this, I've added a new ValueHandler::assignValueToAddress type override
which lets us distinguish between varargs and fixed args (we still need this
widening behaviour for varargs to fix the original bug in 2018).

rdar://61353552
2020-04-24 13:56:43 -07:00
Matt Arsenault 35e6a9c839 AMDGPU: Break read2/write2 search range on a memory fence
This is to fix performance regressions introduced by
86c944d790.

The old search would collect all potentially mergeable instructions in
the entire block. In this case, the same address is written in
multiple places in the block on the other side of a fence. When sorted
by offset, the two unmergeable, identical addresses would be next to
each other and the merge would give up.

Break the search space when we encounter an instruction we won't be
able to merge across. This will keep the identical addresses in
different merge attempts.

This may also improve compile time by reducing the merge list size.
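
A generic, hypothetical sketch of the new search strategy (not the actual
load/store optimizer API):

  #include <cstddef>
  #include <vector>

  struct Inst { bool IsBarrier; bool IsCandidate; };

  // Collect mergeable candidates only up to the first instruction we
  // cannot merge across (e.g. a memory fence), so identical addresses on
  // opposite sides of the fence end up in separate merge attempts.
  std::vector<const Inst *>
  collectCandidates(const std::vector<Inst> &Block, std::size_t Start) {
    std::vector<const Inst *> List;
    for (std::size_t I = Start; I < Block.size(); ++I) {
      if (Block[I].IsBarrier)
        break; // previously the scan continued over the whole block
      if (Block[I].IsCandidate)
        List.push_back(&Block[I]);
    }
    return List;
  }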
2020-04-24 15:53:30 -04:00
Vedant Kumar c0fa447e02 AArch64: Remove reversedInstructionsWithoutDebug helper
When using reversedInstructionsWithoutDebug to construct a range from a
pair of MachineInstrBundleIterators, the range unexpectedly leaves out an
element. This results in mis-optimization as @mstorsjo points out in
https://reviews.llvm.org/D78157.

The problem is that when we convert a MachineInstrBundleIterator to a
reverse iterator, the result gets incremented:

  MachineInstrBundleIterator(++I.getReverse())

The comment there explains that the "resulting iterator will dereference
... to the previous node, which is somewhat unexpected; but converting
the two endpoints in a range will give the same range in reverse". This
makes it hard to understand what reversedInstructionsWithoutDebug will
do: I've removed the helper to prevent similar mistakes in the future.
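
A standard-library analogue of the pitfall, as a self-contained C++
example (std::reverse_iterator behaves the same way here):

  #include <cassert>
  #include <iterator>
  #include <vector>

  int main() {
    std::vector<int> V{1, 2, 3, 4};
    auto It = V.begin() + 2;                    // points at 3
    std::reverse_iterator<decltype(It)> RIt(It);
    assert(*RIt == 2);                          // previous element, not 3
    // Converting *both* endpoints still gives the same range, reversed:
    std::vector<int> R(V.rbegin(), V.rend());
    assert((R == std::vector<int>{4, 3, 2, 1}));
    return 0;
  }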
2020-04-24 11:28:17 -07:00
Fangrui Song 25e22613df [XRay] Change ARM/AArch64/powerpc64le to use version 2 sled (PC-relative address)
Follow-up of D78082 (x86-64).

This change avoids dynamic relocations in `xray_instr_map` for ARM/AArch64/powerpc64le.

MIPS64 cannot use 64-bit PC-relative addresses because R_MIPS_PC64 is not defined.
Because MIPS32 shares the same code, for simplicity, we don't use PC-relative addresses for MIPS32 as well.

Tested on AArch64 Linux and ppc64le Linux.

Reviewed By: ianlevesque

Differential Revision: https://reviews.llvm.org/D78590
2020-04-24 08:35:43 -07:00
Luke Geeson 7da1905125 [AArch32] Armv8.6-a Matrix Mult Assembly + Intrinsics
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

This patch includes:

- Assembly support for AArch32
- Intrinsics Support for AArch32 Neon Intrinsics for Matrix
  Multiplication

Note: these extensions are optional in earlier architecture versions and
mandatory in 8.6a, so they are enabled by default there

No additional IR types or C Types are needed for this extension.
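
For reference, a hedged usage sketch of one of the new intrinsics, assuming
the vmmlaq_s32 ACLE spelling and an i8mm-enabled target:

  #include <arm_neon.h>

  // Signed 8-bit integer matrix multiply-accumulate (SMMLA):
  // a 2x2 int32 accumulator += (2x8 int8 matrix) * (8x2 int8 matrix).
  int32x4_t mmla_acc(int32x4_t acc, int8x16_t a, int8x16_t b) {
    return vmmlaq_s32(acc, a, b);
  }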

This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)

Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman

Reviewers: t.p.northover, miyuki

Reviewed By: miyuki

Subscribers: miyuki, ostannard, kristof.beyls, hiraditya, danielkiss,
cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77872
2020-04-24 15:54:06 +01:00
Luke Geeson 832cd74913 [AArch64] Armv8.6-a Matrix Mult Assembly + Intrinsics
This patch upstreams support for the Armv8.6-a Matrix Multiplication
Extension. A summary of the features can be found here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

This patch includes:

- Assembly support for AArch64 only (no SVE or Neon)
- Intrinsics Support for AArch64 Armv8.6a Matrix Multiplication Instructions (No bfloat16 matrix multiplication)

No IR types or C Types are needed for this extension.

This is part of a patch series, starting with BFloat16 support and
the other components in the armv8.6a extension (in previous patches
linked in phabricator)

Based on work by:
- Luke Geeson
- Oliver Stannard
- Luke Cheeseman

Reviewers: ostannard, t.p.northover, rengolin, kmclaughlin

Reviewed By: kmclaughlin

Subscribers: kmclaughlin, kristof.beyls, hiraditya, danielkiss,
cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D77871
2020-04-24 15:54:06 +01:00
Piotr Sobczak 7631af3af2 [AMDGPU] Skip generating cache invalidating instructions on AMDPAL
Summary:
The frontend guarantees that coherent accesses have
the corresponding cache policy bits set (glc, dlc).
Therefore there is no need for extra instructions
that invalidate the cache.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78800
2020-04-24 13:53:44 +02:00
David Green a947be51bd [ARM] Various tests for MVE and FP16 codegen. NFC 2020-04-24 12:11:46 +01:00
Kerry McLaughlin 53dd72a87a [SVE][CodeGen] Lower SDIV & UDIV to SVE intrinsics
Summary:
This patch maps IR operations for sdiv & udiv to the
@llvm.aarch64.sve.[s|u]div intrinsics.

A ptrue must be created during lowering as the div instructions
have only a predicated form.

Patch contains changes by Andrzej Warzynski.

Reviewers: sdesmalen, c-rhodes, efriedma, cameron.mcinally, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, andwar, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78569
2020-04-24 11:38:20 +01:00
Kang Zhang 302e11cd97 [NFC][PowerPC] Fix the liveins for 3 mir test cases 2020-04-24 08:03:02 +00:00
Christudasan Devadasan 207cd5f68f [AMDGPU] Add the SGPR used for FP copy to block livein lists.
The temporary register used for FP copy
should be live throughout the function.
2020-04-24 11:47:38 +05:30
Krzysztof Parzyszek 5c7a2cfac1 [Hexagon] Fix result word order when bitcasting vector pred to int64/128 2020-04-23 19:15:11 -05:00
James Y Knight 248a5db3f2 Change callbr to only define its output SSA variable on the normal
path, not the indirect targets.

Fixes: PR45565.

Differential Revision: https://reviews.llvm.org/D78341
2020-04-23 19:36:44 -04:00
aartbik 907871d9ad [llvm] [CodeGen] Fixed vector halving bug for masked load
Summary:
Given a VL=14 that is enveloped by a proper VL=16, splitting the
masked load using the enveloping halving VL=8/8 should
eventually yield VL=8/6. This fixes various assert failures
in getHalfNumVectorElementsVT() and IncrementMemoryAddress().

Note, I suspect similar fixes will be needed for other masked
operations, but for now I send out a fix for masked load only.

Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563

Reviewers: craig.topper, mehdi_amini, nicolasvasilache

Reviewed By: craig.topper

Subscribers: hiraditya, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78608
2020-04-23 15:12:44 -07:00
aartbik e4e187d203 [llvm] [X86] Processed test with update_llc_test_checks
Summary:
As requested in another review for a similar regression test,
I updated this test with the same utility.

Reviewers: dmgreen, craig.topper, mehdi_amini, nicolasvasilache

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78739
2020-04-23 15:09:53 -07:00
Sanjay Patel b53fd70b9e [x86] add tests for FP->int->FP with different FP types; NFC 2020-04-23 17:07:16 -04:00
Jay Foad 479145a5c2 [AMDGPU] Avoid hard-coded line numbers in error message checks
This makes it easier for us to maintain downstream changes to some of
these tests. NFC.

Differential Revision: https://reviews.llvm.org/D78716
2020-04-23 21:06:09 +01:00
Matt Arsenault 89c8c80bd5 AMDGPU: Change pre-gfx9 implementation of fcanonicalize to mul
If f32 denormals were enabled pre-gfx9, we would still try to
implement this with v_max_f32. Pre-gfx9, these instructions ignored
the denormal mode and did not flush. Switch to the multiply form for
f32 as a workaround which should always work in any case.

This fixes conformance failures when the library implementation of
fmin/fmax were accidentally not inlined, forcing the assumption of no
flushing on targets where denormals are not enabled by default. This
is a workaround, since really we should not be mixing code with
different FP mode expectations, but prefer the lowering that will work
in any mode.

Now this will always use max to implement canonicalize on gfx9+. This
is only really beneficial for f64. For f32/f16 it's a neutral choice
(and worse in terms of code size in 1 case), but possibly worse for
the compiler since it does add an extra register use operand. Leave
this change for later.
2020-04-23 15:24:13 -04:00
Matt Arsenault 5fe3f06596 AMDGPU/GlobalISel: Add new baseline checks for canonicalize 2020-04-23 15:04:32 -04:00
Simon Pilgrim 757c7c244b [X86][SSE] Add SSE2 extract-concat tests
Check pre-SSE41 codegen where we have fewer PEXTR*/PINSR* instructions
2020-04-23 19:40:34 +01:00
Krzysztof Parzyszek d8e1dd8b9b [Hexagon] Add missing live-in registers in some codegen tests 2020-04-23 10:28:04 -05:00
Jay Foad 0337017a9f [AMDGPU] Use SGPR instead of SReg classes
12994a70cf did this for 128-bit classes:

    SGPR_128 only includes the real allocatable SGPRs, and SReg_128 adds
    the additional non-allocatable TTMP registers. There's no point in
    allocating SReg_128 vregs. This shrinks the size of the classes
    regalloc needs to consider, which is usually good.

This patch extends it to all classes > 64 bits, for consistency.

Differential Revision: https://reviews.llvm.org/D78622
2020-04-23 11:45:22 +01:00
Sander de Smalen a5e0389b2a [AArch64] Define ACLE FP conversion intrinsics with more specific predicate.
This patch changes the FP conversion intrinsics to take a predicate
that matches the number of lanes for the vector with the widest element
type as opposed to using <vscale x 16 x i1>.

For example:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f16(<vscale x 4 x float>, <vscale x 4 x i1>, <vscale x 8 x half>)```
now uses <vscale x 4 x i1> instead of <vscale x 16 x i1>

And similar for:
```<vscale x 4 x float> @llvm.aarch64.sve.fcvt.f32f64(<vscale x 4 x float>, <vscale x 2 x i1>, <vscale x 2 x double>)```
where the predicate now matches the wider type, so <vscale x 2 x i1>.

Reviewers: efriedma, SjoerdMeijer, paulwalker-arm, rengolin

Reviewed By: efriedma

Tags: #clang

Differential Revision: https://reviews.llvm.org/D78402
2020-04-23 10:53:23 +01:00
Amara Emerson 613f12dd8e [AArch64][GlobalISel] Set the current debug loc when missing in some cases. 2020-04-23 01:34:57 -07:00
Vedant Kumar e0b60c6df2 [AArch64CollectLOH] Debug insts should not break LOH collection [14/14]
Fix an issue where the presence of debug instructions could break
collection of linker optimization hints.
2020-04-22 17:03:41 -07:00
Vedant Kumar ff8c417d31 [AArch64PreLegalizerCombiner] Fix debug invariance issue in matchFConstantToConstant [13/14]
Fix an issue where the FConstantToConstant combine could fail if debug
instructions were present.
2020-04-22 17:03:41 -07:00
Vedant Kumar c2c2dc526a [AArch64LoadStoreOptimizer] Skip debug insts during pattern matching [12/14]
Do not count the presence of debug insts against the limit set by
LdStLimit, and allow the optimizer to find matching insts by skipping
over debug insts.

Differential Revision: https://reviews.llvm.org/D78411
2020-04-22 17:03:40 -07:00
Vedant Kumar bf4c70b355 [AArch64ConditionOptimizer] Fix missed optimization due to debug insts [11/14]
Summary:
The findSuitableCompare method can fail if debug instructions are
present in the MBB -- fix this by using helpers to skip over debug
insts.

Reviewers: aemerson, paquette

Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78265
2020-04-22 17:03:40 -07:00
Vedant Kumar 78d69e97cc [AArch64CondBrTuning] Ignore debug insts when scanning for NZCV clobbers [10/14]
Summary:
This fixes several instances in which condbr optimization was missed
due to a debug instruction appearing as a bogus NZCV clobber.

Reviewers: aemerson, paquette

Subscribers: kristof.beyls, hiraditya, jfb, danielkiss, aprantl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78264
2020-04-22 17:03:40 -07:00
Vedant Kumar 4a51b61cb3 [AArch64] Clean up assorted usage of hasOneUse/use_instructions [9/14]
Summary:
Use the variants of these APIs which skip over debug instructions. This
is mostly a cleanup, but it does fix a debug-variance issue which causes
addsub-shifted.ll and addsub_ext.ll to fail when debug info is inserted
by -mir-debugify.

Reviewers: aemerson, paquette

Subscribers: kristof.beyls, hiraditya, jfb, danielkiss, llvm-commits, aprantl

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78262
2020-04-22 17:03:40 -07:00
Vedant Kumar b157974ab3 [AArch64ConditionalCompares] Ignore debug insts in findConvertibleCompare [8/14]
Summary:
Fix an issue where the presence of debug info could disable the ccmp
optimization due to findConvertibleCompare failing too early (the error
is "Can't create ccmp with multiple uses", where the "use" is a
DBG_VALUE inst).

Depends on D78151.

Reviewers: t.p.northover, paquette, aemerson

Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78156
2020-04-22 17:03:40 -07:00
Vedant Kumar f0b52beef3 [AArch64InstrInfo] Ignore debug insts in areCFlagsAccessedBetweenInstrs [7/14]
Summary:
Fix an issue where the presence of debug info could disable a peephole
optimization due to areCFlagsAccessedBetweenInstrs returning the wrong
result.

In test/CodeGen/AArch64/arm64-csel.ll, the issue was found in the
function @foo5, in which the first compare could successfully be
optimized but not the second.

Reviewers: t.p.northover, eastig, paquette

Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, dsanders, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78157
2020-04-22 17:03:40 -07:00
Vedant Kumar 26271c8384 [AArch64InstrInfo] Ignore debug insts in canInstrSubstituteCmpInstr [6/14]
Summary:
Fix an issue where the presence of debug info could disable a peephole
optimization in optimizeCompareInstr due to canInstrSubstituteCmpInstr
returning the wrong result.

Depends on D78137.

Reviewers: t.p.northover, eastig, paquette

Subscribers: kristof.beyls, hiraditya, danielkiss, aprantl, llvm-commits, dsanders

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78151
2020-04-22 17:03:40 -07:00
Vedant Kumar f1a71b5949 [GIsel][LegalizerHelper] Account for debug insts when creating mem libcalls [5/14]
Summary:
While lowering memory intrinsics, GIsel attempts to form a tail call to
a library routine.

There might be a DBG_LABEL or something after the intrinsic call,
though: in that case, GIsel should still be able to form the tail call,
and should also delete the debug insts after the tail call as the
transform makes them invalid.

Reviewers: dsanders, aemerson

Subscribers: hiraditya, aprantl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78335
2020-04-22 17:03:40 -07:00
Vedant Kumar ba9db54505 [GIsel][CombinerHelper] Fix for missed ElideBrByInvertingCond/CombineIndexedLoadStore combines [4/14]
Summary:
Fix an issue which could result in ElideBrByInvertingCond or
CombineIndexedLoadStore being missed when debug info is present. In both
cases the fix is s/hasOneUse/hasOneNonDbgUse/.

Reviewers: aemerson, dsanders

Subscribers: hiraditya, aprantl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78254
2020-04-22 17:03:40 -07:00
Vedant Kumar 5c04274dab [GIsel][CombinerHelper] Don't consider debug insts in dominance queries [3/14]
Summary:
This fixes several issues where the presence of debug instructions could
disable certain combines, due to dominance queries finding uses/defs that
don't actually exist.

Reviewers: dsanders, fhahn, paquette, aemerson

Subscribers: hiraditya, arphaman, aprantl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78253
2020-04-22 17:03:40 -07:00
Vedant Kumar 5bae277584 [GISel][RegBankSelect] Hide assertion failure from LLT::getScalarSizeInBits [2/14]
Summary:
It looks like RegBankSelect can try to assign a bank based on a
DBG_VALUE instead of ignoring it. This eventually leads to an assert
in AArch64RegisterBankInfo::getInstrMapping because there is some info
missing from the DBG_VALUE MachineOperand (I see: `Assertion failed:
(RawData != 0 && "Invalid Type"), function getScalarSizeInBits`).

I'm not 100% sure it's safe to insert DBG_VALUE instructions right
before RegBankSelect (that's what -debugify-and-strip-all-safe is
doing). Any advice appreciated.

Depends on D78135.

Reviewers: ab, qcolombet, dsanders, aprantl

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78137
2020-04-22 17:03:39 -07:00
Vedant Kumar 6b58018c05 [ARM] Mark some tests as not safe for -debugify-and-strip-all, NFC
These tests contain debug instructions which get checked, so we can't
insert synthetic debug info and expect the tests to pass.

The rest of the ARM backend tests appear to be fair game.
2020-04-22 17:03:39 -07:00
Vedant Kumar 2a5675f11d [MachineDebugify] Insert synthetic DBG_VALUE instructions
Summary:
Teach MachineDebugify how to insert DBG_VALUE instructions.  This can
help find bugs causing CodeGen differences when debug info is present.
DBG_VALUE instructions are only emitted when -debugify-level is set to
locations+variables.

There is essentially no attempt made to match up DBG_VALUE register
operands with the local variables they ought to correspond to. I'm not
sure how to improve the situation. In some cases (MachineMemOperand?)
it's possible to find the IR instruction a MachineInstr corresponds to,
but in general this seems to call for "undoing" the work done by ISel.

Reviewers: dsanders, aprantl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78135
2020-04-22 17:03:39 -07:00
Eli Friedman 1a78b0bd38 [MachineOutliner] Teach outliner to set live-ins
Preserving liveness can be useful even late in the pipeline, if we're
doing substantial optimization work afterwards. (See, for example,
D76065.) Teach MachineOutliner how to correctly set live-ins on the
basic block in outlined functions.

Differential Revision: https://reviews.llvm.org/D78605
2020-04-22 14:19:26 -07:00
Victor Huang a60ca4b4e9 [PowerPC][Future] Initial support for PCRel addressing to get block address
Add initial support for PC-Relative addressing to get the block address
instead of using the TOC.

Differential Revision: https://reviews.llvm.org/D76294
2020-04-22 15:01:29 -05:00
Puyan Lotfi 264c07ef77 [llvm][MIRVRegNamer] Avoid collisions across jump table indices.
Hash Jump Table Indices uniquely within a basic block for MIR
Canonicalizer / MIR VReg Renamer passes.

Differential Revision: https://reviews.llvm.org/D77966
2020-04-22 14:58:44 -04:00
David Green eecba95067 [ARM] Replace arm vendor with none. NFC 2020-04-22 18:19:35 +01:00
Victor Huang 02141a17ae [PowerPC][Future] Remove redundant r2 save and restore for indirect call
Currently an indirect call produces the following sequence in PC-Relative mode:

extern void function( );
extern void (*ptrfunc) ( );

void g() {
    ptrfunc=function;
}

void f() {
    (*ptrfunc) ( );
}

Producing

paddi 3, 0, .LC0@PCREL, 1
ld 3, 0(3)
std 2, 24(1)
ld 12, 0(3)
mtctr 12
bctrl
ld 2, 24(1)

Though the caller does not use or preserve r2, it is still saved and restored
across a function call. This patch removes these redundant saves and
restores for indirect calls.

Differential Revision: https://reviews.llvm.org/D77749
2020-04-22 12:05:51 -05:00
Victor Huang 43abef06f4 [PowerPC][Future] Initial support for PCRel addressing for jump tables.
Add initial support for PC Relative addressing to get the jump table base
address instead of using the TOC.

Differential Revision: https://reviews.llvm.org/D75931
2020-04-22 10:45:01 -05:00
John Brawn 8211cfb7c8 [ARM] Don't shrink STM if it would cause an unknown base register store
If a 16-bit thumb STM with writeback stores the base register but it isn't the
first register in the list, then an unknown value is stored. The load/store
optimizer knows this and generates a 32-bit STM without writeback instead, but
thumb2 size reduction converts it into a 16-bit STM. Fix this by having thumb2
size reduction notice such STMs and leave them as they are.

Differential Revision: https://reviews.llvm.org/D78493
2020-04-22 14:50:42 +01:00
David Green 892af45c86 [ARM] Distribute MVE post-increments
This adds some extra processing into the Pre-RA ARM load/store optimizer
to detect and merge MVE loads/stores and adds of the same base. We don't
always turn these into a post-inc during ISel: because the DAG is a
graph, we don't always know an order to use for the nodes, so we don't
know which nodes to make post-inc and which to use the new post-inc of.
After ISel, we have an order that we can use to post-inc the
following instructions.

So this looks for a loads/store with a starting offset of 0, and an
add/sub from the same base, plus a number of other loads/stores. We then
do some checks and convert the zero offset load/store into a postinc
variant. Any loads/stores after it have the offset subtracted from their
immediates.  For example:
  LDR #4           LDR #4
  LDR #0           LDR_POSTINC #16
  LDR #8           LDR #-8
  LDR #12          LDR #-4
  ADD #16
It only handles MVE loads/stores at the moment. Normal loads/stores will
be added in a follow-up patch; they just have some extra details to
ensure that we keep generating LDRD/LDM successfully.

Differential Revision: https://reviews.llvm.org/D77813
2020-04-22 14:16:51 +01:00
David Green 48ac4e6938 [ARM] MVE FMA loop tests. NFC 2020-04-22 13:27:40 +01:00
Jay Foad dbdffe3ee9 [AMDGPU] Add 192-bit register classes
Differential Revision: https://reviews.llvm.org/D78312
2020-04-22 13:10:37 +01:00
Lucas Prates 727e6fb84a [NFC][llvm][X86] Adding missing -mtriple to X86 test.
The modified test was missing the specification of the intended triple
in its run line, assuming X86 is the default.
2020-04-22 11:55:57 +01:00
Kerry McLaughlin 17f6e18acf [AArch64][SVE] Add SVE intrinsic for LD1RQ
Summary:
Adds the following intrinsic for contiguous load & replicate:
  - @llvm.aarch64.sve.ld1rq

The LD1RQ intrinsic only needs the SImmS16XForm added by this
patch. The others (SImmS2XForm, SImmS3XForm & SImmS4XForm)
were added for consistency.

Reviewers: andwar, sdesmalen, efriedma, cameron.mcinally, dancgr, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76929
2020-04-22 11:29:27 +01:00
aartbik 5397f29087 [llvm] [X86] Make test more robust against different builds
Summary:
Rationale:
Using the --debug-only flag requires a debug build. Also, the debug output is not always consistent over different builds.
This change avoids these problems by just testing the generated assembly for AVX.

Reviewers: craig.topper, mehdi_amini, nicolasvasilache

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78609
2020-04-22 00:23:46 -07:00
Qiu Chaofan c12722cde8 [PowerPC] Exploit RLDIMI for OR with large immediates
This patch exploits the rldimi instruction for patterns like
`or %a, 0b000011110000`, which saves a number of instructions when the
operand has only one use, compared with the `li-ori-sldi-or` sequence.
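
A hedged example of the kind of source pattern this targets (the constant
is illustrative; its set bits form one contiguous run):

  // Materializing the mask normally costs li + ori + sldi before the or;
  // rldimi can rotate-and-insert the run of ones instead.
  unsigned long long set_run(unsigned long long a) {
    return a | 0xFFFF0000000ULL; // ones in bits 28..43
  }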

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D77850
2020-04-22 14:16:52 +08:00
Eli Friedman 704293b168 [ARM] Fix MIR tests with invalid live-ins.
A register can't be live if it isn't defined; fix issues in various
testcases.

Differential Revision: https://reviews.llvm.org/D78529
2020-04-21 12:13:35 -07:00
Eli Friedman b4b9faa120 [AArch64] Fix MIR tests with invalid live-ins.
A register can't be live if it isn't defined; fix issues in various
testcases.

Differential Revision: https://reviews.llvm.org/D78531
2020-04-21 12:13:32 -07:00
aartbik 8387bee94d [llvm] [X86] Fixed type bug in vselect for AVX masked load
Summary:
Bugzilla issue 45563
https://bugs.llvm.org/show_bug.cgi?id=45563

Reviewers: nicolasvasilache, mehdi_amini, craig.topper

Reviewed By: craig.topper

Subscribers: RKSimon, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78527
2020-04-21 11:11:35 -07:00
Pavel Iliin be881e2831 [AArch64] FMLA/FMLS patterns improvement.
FMLA/FMLS f16 indexed patterns added.
Fixes https://bugs.llvm.org/show_bug.cgi?id=45467
Removed redundant v2f32 vector_extract indexed pattern since
Instruction Selection is able to match v4f32 instead.
2020-04-21 18:23:21 +01:00
Fangrui Song 5771c98562 [XRay] Change xray_instr_map sled addresses from absolute to PC relative for x86-64
xray_instr_map contains absolute addresses of sleds, which are relocated
by `R_*_RELATIVE` when linked in -pie or -shared mode.

By making these addresses relative to PC, we can avoid the dynamic
relocations and remove the SHF_WRITE flag from xray_instr_map.  We can
thus save VM pages containing xray_instr_map (because they are not
modified).
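
A generic illustration of the encoding idea (not the actual XRay
structures):

  #include <cstdint>

  // Storing "target address minus entry address" instead of an absolute
  // pointer means the entry needs no load-time relocation and its page
  // can stay read-only.
  struct Sled { int64_t FunctionDelta; };

  uint64_t resolveFunction(const Sled *S) {
    return static_cast<uint64_t>(reinterpret_cast<uintptr_t>(S) +
                                 S->FunctionDelta);
  }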

This patch changes x86-64 and bumps the sled version to 2. Subsequent
changes will change powerpc64le and AArch64.

Reviewed By: dberris, ianlevesque

Differential Revision: https://reviews.llvm.org/D78082
2020-04-21 09:36:09 -07:00
Stefan Pintilie a92ee77d85 [PowerPC][Future] Add offsets to PC Relative relocations.
This is an optimization that applies to global addresses and
allows for the following transformation:
Convert this:

paddi r3, 0, symbol@PCREL, 1
ld r4, 8(r3)

To this:

pld r4, symbol@PCREL+8(0), 1

An instruction is saved and the linker can do the addition when
the symbol is resolved.

Differential Revision: https://reviews.llvm.org/D76160
2020-04-21 11:08:19 -05:00
Kang Zhang e477915bfe [PowerPC] Add a new test case expand-isel-liveness.mir 2020-04-21 16:00:34 +00:00
Sean Fertile cd8e9e8fcd [PowerPC][AIX][NFC] Fix use of FileCheck variable in lit test. 2020-04-21 10:56:46 -04:00
Pavel Iliin c2dd38f1cb [AArch64][NFC] One more intrinsic test. 2020-04-21 15:20:07 +01:00
Kerry McLaughlin 0df40d6ef8 [AArch64][SVE] Add addressing mode for contiguous loads & stores
Summary:
This patch adds the register + register addressing mode for
SVE contiguous load and store intrinsics (LD1 & ST1)

Reviewers: sdesmalen, fpetrogalli, efriedma, rengolin

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78509
2020-04-21 12:04:43 +01:00
Sam Parker 27d19101e9 [ARM][ParallelDSP] Handle squaring multiplies
The logic in ARMParallelDSP is set up to merge two 16-bit loads into
a 32-bit load and feed them into the smlads. This requires that four
loads are combined for the four inputs, but there wasn't actually a
check for this.
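
A hedged sketch of the squaring case the check now catches (hypothetical
function):

  // A squaring multiply reuses one 16-bit load for both operands, so the
  // four distinct loads assumed when pairing inputs for an smlad are not
  // all present.
  int sum_squares(const short *a, int n) {
    int acc = 0;
    for (int i = 0; i < n; i += 2)
      acc += a[i] * a[i] + a[i + 1] * a[i + 1];
    return acc;
  }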

Differential Revision: https://reviews.llvm.org/D78492
2020-04-21 08:39:56 +01:00
Yonghong Song 3cb7e7bf95 BPF: fix a CORE optimization bug
For a test case in this patch like the one below
  struct t { int a; } __attribute__((preserve_access_index));
  int foo(void *);
  int test(struct t *arg) {
      long param[1];
      param[0] = (long)&arg->a;
      return foo(param);
  }

The IR right before BPF SimplifyPatchable phase:
  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %2:gpr = LDD killed %1:gpr, 0
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %2:gpr
  STD killed %3:gpr, %stack.0.param, 0
After SimplifyPatchable phase, the incorrect IR is generated:
  %1:gpr = LD_imm64 @"llvm.t:0:0$0:0"
  %3:gpr = ADD_rr %0:gpr(tied-def 0), killed %1:gpr
  CORE_MEM killed %3:gpr, 306, %0:gpr, @"llvm.t:0:0$0:0"

Note that the CORE_MEM pseudo op is introduced to encode
memory operations related to CORE. In the above, we intend
to check whether we have a store like
   *(%3:gpr + 0) = ...
and if this is the case, we could replace it with
   *(%0:gpr + @"llvm.t:0:0$0:0") = ...

Unfortunately, in the above, IR for the store is
   *(%stack.0.param + 0) = %3:gpr
and transformation should not happen.

Note that we won't have problem if the actual CORE
dereference (arg->a) happens.

This patch fixes the problem by skipping the CORE optimization if
the ADD_rr result is not used as the base address of the store
operation.

Differential Revision: https://reviews.llvm.org/D78466
2020-04-20 19:54:51 -07:00
Pavel Iliin 6e22a1e5c4 [AArch64][NFC] More intrinsic tests. 2020-04-21 00:00:26 +01:00
David Green ce1840a90a [ARM] MVE and scalar postinc mir tests. NFC 2020-04-20 22:00:07 +01:00
Piotr Sobczak c48ceaf37b Revert "[AMDGPU] Set the CostPerUse value for vgpr registers."
This reverts commit 728b878de6.

D76417 has caused vgpr count to go up significantly in real-world
graphics content.
2020-04-20 22:47:31 +02:00
Andrew Litteken 1488bef8fc [MachineOutliner] Annotation for outlined functions in AArch64
- Add support for comments on outlined functions that record the conditions under which they were outlined (e.g. thunks, tail calls)
- Adapt emitFunctionHeader to print a comment next to the header if the target specifies it, based on information in MachineFunctionInfo
- Add a MIR test for function annotation

Differential Revision: https://reviews.llvm.org/D78062
2020-04-20 13:33:31 -07:00
Chris Bowler ff048af2e3 [NFC] [AIX] [PowerPC] Add missing instruction to AIX byval test 2020-04-20 15:00:59 -04:00
David Tenty 0098324947 [AIX] Return the correct set of callee saved regs
Summary:
r13 isn't reserved on 32-bit AIX, which is reflected in our calling
convention but not in the callee saved regs.

Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu

Reviewed By: sfertile

Subscribers: thakis, lei, wuzish, nemanjai, hiraditya, kbarton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77101
2020-04-20 14:31:08 -04:00
Nemanja Ivanovic 64b31d96df [PowerPC] Do not attempt to reuse load for 64-bit FP_TO_UINT without FPCVT
We call the function that attempts to reuse the conversion without checking
whether the target matches the constraints that the callee expects. This patch
adds the check prior to the call.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=43976

Differential revision: https://reviews.llvm.org/D77564
2020-04-20 13:00:06 -05:00
David Green 460202b464 [ARM] Add an low overhead sibling loop test. NFC 2020-04-20 18:46:38 +01:00
Sean Fertile 8541a3cc9d [PowerPC][AIX] Use a file check variable for register used in addressing. 2020-04-20 13:08:09 -04:00
David Tenty 28ae1969dc Revert "[AIX] Return the correct set of callee saved regs"
This reverts commit 6c881bf1fe.
2020-04-20 13:06:37 -04:00
Sean Fertile d52bb6d099 [PowerPC][AIX] ByVal formal argument support: passing on the stack.
Adds support for passing a ByVal formal argument completely on the stack
(ie after all argument registers are exhausted).

Differential Revision: https://reviews.llvm.org/D78263
2020-04-20 12:04:59 -04:00
David Tenty 6c881bf1fe [AIX] Return the correct set of callee saved regs
Summary:
r13 isn't reserved on 32-bit AIX, which is reflected in our calling
convention but not in the callee saved regs.

Reviewers: sfertile, ZarkoCA, cebowleratibm, jasonliu

Reviewed By: sfertile

Subscribers: lei, wuzish, nemanjai, hiraditya, kbarton, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77101
2020-04-20 11:22:17 -04:00
Petre-Ionut Tudor 52474992b1 Revert "[ARM] Fix conditions for lowering to S[LR]I"
This reverts commit cabfcf840a.
The patch introduced another bug in the optimization.
2020-04-20 16:11:04 +01:00
Ayke van Laethem 8aad119d93 [AVR] Do not place functions in .progmem.data
Previously, the AVR backend would put functions in .progmem.data. This
is probably a regression from when functions still lived in address
space 0. With this change, only global constants are placed in
.progmem.data.

This is not complete: avr-gcc additionally respects -fdata-sections for
progmem global constants, which LLVM doesn't yet do. But fixing that is
a bit more complicated (and I believe other backends such as RISC-V
might also have similar issues).

Differential Revision: https://reviews.llvm.org/D78212
2020-04-20 13:56:38 +02:00
Ayke van Laethem 9505b5cb66 [AVR] Do not use divmod calls for bigger integers
The avr-libc provides *divmodqi4, *divmodhi4, and *divmodsi4 functions,
but does not provide a *divmoddi4. Instead it provides regular *divdi3
and *moddi3 functions.

Note that avr-libc doesn't support *divti3 or *modti3 for 128-bit
integer manipulation.

Source:
https://github.com/gcc-mirror/gcc/blob/releases/gcc-5.4.0/libgcc/config/avr/lib1funcs.S
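
A minimal illustration of the affected operations (hypothetical function
names):

  // On AVR, 64-bit division and modulo now lower to separate __divdi3 and
  // __moddi3 libcalls instead of a nonexistent __divmoddi4.
  long long quot(long long a, long long b) { return a / b; }
  long long rem(long long a, long long b) { return a % b; }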

Differential Revision: https://reviews.llvm.org/D78437
2020-04-20 13:56:38 +02:00
Kerry McLaughlin 33ffce5414 [AArch64][SVE] Remove LD1/ST1 dependency on llvm.masked.load/store
Summary:
The SVE masked load and store intrinsics introduced in D76688 rely on
common llvm.masked.load/store nodes. This patch creates new ISD nodes
for LD1(S) & ST1 to remove this dependency.

Additionally, this adds support for sign & zero extending
loads and truncating stores.

Reviewers: sdesmalen, efriedma, cameron.mcinally, c-rhodes, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, andwar, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78204
2020-04-20 11:08:11 +01:00
Sam Parker 62f97123fb [ARM][MVE] Add patterns for VRHADD
Add patterns which use standard add nodes along with arm vshr imm
nodes.

Differential Revision: https://reviews.llvm.org/D77069
2020-04-20 10:05:21 +01:00
Kang Zhang a8e15ee04a [CodeGen] Support freeze expand for ppc_fp128
Summary:
The patch D29014 added the new ISD::FREEZE and can deal with
integer types.
The patch D76980 added SoftenFloatRes_FREEZE for floating point.
But we still lacked expansion for ppc_fp128, which caused assertions
in some cases.
This patch supports freeze expansion for ppc_fp128.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D78278
2020-04-20 07:27:41 +00:00
Xiang1 Zhang 0980038a5e Handle CET for -exception-model sjlj
Summary:
In SjLj exception mode, lowering creates a new landingpad BB for the old landingpad BB and uses an indirect branch to jump to the old landingpad BB.
So we should add 2 endbr instructions for this exception model.

Reviewers: hjl.tools, craig.topper, annita.zhang, LuoYuanke, pengfei, efriedma

Reviewed By: LuoYuanke

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77124
2020-04-20 11:13:40 +08:00
David Green a0b1616359 [ARM] Regenerate tests. NFC 2020-04-19 13:45:39 +01:00
Simon Pilgrim e71dd7c011 [X86][SSE] getFauxShuffle - don't combine shuffles with small truncated scalars (PR45604)
getFauxShuffle attempts to combine INSERT_VECTOR_ELT(TRUNCATE/EXTEND(EXTRACT_VECTOR_ELT(x))) patterns into a target shuffle chain.

PR45604 identified an issue where the scalar was truncated to a size smaller than the destination vector element and then zero extended back, which requires the upper bits to be zero'd which we don't currently do.

To avoid the bug I've added an early out in these truncation cases, a future commit should allow us to handle this by inserting the necessary SM_SentinelZero padding.
2020-04-19 13:35:22 +01:00
Sanjay Patel cceb630a07 [x86] use vector instructions to lower more FP->int->FP casts
This is an enhancement to D77895 to avoid another
round-trip from XMM->GPR->XMM. This time we handle
the case of starting/ending with an f64 and casting
to signed i32 as the intermediate value.

It's a bit more involved than I initially assumed
because we need to use target-specific opcodes to
represent the non-standard cast ops.

Differential Revision: https://reviews.llvm.org/D78362
2020-04-19 08:33:17 -04:00
Simon Pilgrim d6db919bee [X86][SSE] Add test case for PR45604 2020-04-19 13:13:54 +01:00
LemonBoy a5d161c119 [PowerPC] Don't use rldicl for PPC32
According to https://www.ibm.com/support/knowledgecenter/ssw_aix_72/assembler/idalangref_rldicl_rletdw_instrs.html rldicl should not be used when targeting 32-bit CPUs.

Reviewed By: #powerpc, nemanjai, MaskRay

Differential Revision: https://reviews.llvm.org/D77946
2020-04-18 17:24:25 -07:00
Andrew Litteken 8d5024f7fe Fix outlining of CFI instructions when they can be grouped in a tail call
[MachineOutliner] fix test for excluding CFI and add test to include CFI in outlining

New test to check that we only outline CFI instructions if all CFI
instructions in the function would be captured by the outlining

Adds x86 tests analogous to the AArch64 CFI tests

Revision: https://reviews.llvm.org/D77852
2020-04-17 22:26:34 -07:00
Craig Topper 31a166e4cb [X86] Clean up some mir tests with INLINEASM to avoid regdef or to correct the immediate for the regdef.
The immediate used for the regdef is the encoding for the register
class in the enum generated by tablegen. This encoding will change
any time a new register class is added. Since the number is part
of the input, this means it can become stale.

This change modifies some tests to avoid this kind of immediate
altogether, and updates one test to use the current encoding of
GR64.
2020-04-17 21:55:44 -07:00
Jessica Paquette 66037b84cf MachineFunctionInfo for AArch64 in MIR
Starting with hasRedZone, add MachineFunctionInfo to be put in the YAML for MIR files.

Split out of: D78062

Based on the MachineFunctionInfo implementation for WebAssembly

Differential Revision: https://reviews.llvm.org/D78173

Patch by Andrew Litteken! (AndrewLitteken)
2020-04-17 15:16:59 -07:00
Craig Topper 5f69e53e55 [X86] Remove single incoming value phis from tests for the loop SAD pattern. NFC
InstCombine should ensure these don't exist.

I'm looking at making some changes to how we detect these
patterns and not having to worry about these phis will help.
2020-04-17 13:39:47 -07:00
Francesco Petrogalli 897fdec586 [llvm][CodeGen] Addressing modes for SVE stN.
This reverts commit 17b1869b72.

It is an attempt to fix the failure reported at

The patch differs from the original one reviewed at
https://reviews.llvm.org/D77435 only in the use of std::make_tuple
when building the return value of `findAddrModeSVELoadStore`:

   -  return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
   +  return std::make_tuple(IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset);

the original patch submitted at
fc4e954ed5
was failing the following build:

http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420/

with error:

/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10:
error: chosen constructor is explicit in copy-initialization
  return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19:
   note: explicit constructor declared here
           constexpr tuple(_UElements&&... __elements)
	                     ^
			     1 error generated.
2020-04-17 20:35:35 +01:00
Francesco Petrogalli 17b1869b72 Revert "[llvm][CodeGen] Addressing modes for SVE stN."
This reverts commit fc4e954ed5.

The commit reported the following failure:

http://lab.llvm.org:8011/builders/clang-armv7-linux-build-cache/builds/29420

FAILED: lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o
/usr/bin/c++   -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64 -I/usr/include/libxml2 -Iinclude -I/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/include -mthumb -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3  -fvisibility=hidden    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MMD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelDAGToDAG.cpp.o -c /home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp
/home/buildslave/buildslave/clang-armv7-linux-build-cache/llvm/llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp:1439:10: error: chosen constructor is explicit in copy-initialization
  return {IsRegReg ? Opc_rr : Opc_ri, NewBase, NewOffset};
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	   /usr/bin/../lib/gcc/arm-linux-gnueabihf/5.4.0/../../../../include/c++/5.4.0/tuple:479:19: note: explicit constructor declared here
	           constexpr tuple(_UElements&&... __elements)
2020-04-17 20:03:11 +01:00
Stanislav Mekhanoshin 992fbce4e9 [AMDGPU] copyPhysReg() for 16 bit SGPR subregs
Differential Revision: https://reviews.llvm.org/D78255
2020-04-17 11:59:39 -07:00
Stanislav Mekhanoshin fde2aefa22 [AMDGPU] Use SDWA for 16 bit subreg copy
This simplifies the logic and allows it to be used on GFX8.

Differential Revision: https://reviews.llvm.org/D78150
2020-04-17 11:45:44 -07:00
Francesco Petrogalli fc4e954ed5 [llvm][CodeGen] Addressing modes for SVE stN.
Reviewers: efriedma, sdesmalen, c-rhodes, ctetreau

Reviewed By: c-rhodes

Subscribers: tschuett, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77435
2020-04-17 19:31:44 +01:00
Francesco Petrogalli 48879c02bf [llvm][CodeGen] Fix issue for SVE gather prefetch.
Summary:
This change fixes an issue where the dagcombine incorrectly used an addressing mode with scaled offsets (indices) instead of unscaled offsets.
Those addressing modes do not exist for `prfh`, `prfw` and `prfd`, hence we can reuse `prfb` because that has unscaled offsets, and because the pseudo-code in the XML spec suggests that the element size is not used for the amount of data that is prefetched by the instruction.

FWIW, GCC also emits a `prfb` for these cases.

Reviewers: sdesmalen, andwar, rengolin

Reviewed By: sdesmalen

Subscribers: tschuett, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78069
2020-04-17 19:23:28 +01:00
Petre-Ionut Tudor cabfcf840a [ARM] Fix conditions for lowering to S[LR]I
Summary:
Fixed wrong conditions for generating (S[LR]I X, Y, C2) from
(or (and X, BvecC1), (lsl Y, C2)) and added ISel nodes to lower to S[LR]I. The
optimisation is also enabled by default now.

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77387
2020-04-17 17:19:24 +01:00
Stefan Pintilie b771c4a842 [PowerPC][Future] More support for PCRel addressing for global values
Add initial support for PC Relative addressing for global values that
require GOT indirect addressing. This patch adds PCRelative support for
global addresses that may not be known at link time and may require
access through the GOT.

Differential Revision: https://reviews.llvm.org/D76064
2020-04-17 11:06:13 -05:00
Dominik Montada 55e3a7c6b2 [GlobalISel][AMDGPU] add legalization for G_FREEZE
Summary:
Copy the legalization rules from SelectionDAG:
-widenScalar using anyext
-narrowScalar using intermediate merges
-scalarize/fewerElements using unmerge
-moreElements using G_IMPLICIT_DEF and insert

Add G_FREEZE legalization actions to AMDGPULegalizerInfo.
Use the same legalization actions as G_IMPLICIT_DEF.

Depends on D77795.

Reviewers: dsanders, arsenm, aqjune, aditya_nandakumar, t.p.northover, lebedev.ri, paquette, aemerson

Reviewed By: arsenm

Subscribers: kzhuravl, yaxunl, dstuttard, tpr, t-tye, jvesely, nhaehnle, kerbowa, wdng, rovka, hiraditya, volkan, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78092
2020-04-17 16:44:46 +02:00
jasonliu 77618cc237 [XCOFF][AIX] Fix getSymbol to return the correct qualname when necessary
Summary:
AIX symbols have a qualname and an unqualified name. The stock getSymbol
could only return the unqualified name, which led us to patch many
caller sides (lowerConstant, getMCSymbolForTOCPseudoMO).
So we should try to address this problem on the callee
side (getSymbol) and clean up the caller side instead.

Note: this is a "mostly" NFC patch, with a fix for the original
lowerConstant behavior.

Differential Revision: https://reviews.llvm.org/D78045
2020-04-17 13:45:14 +00:00
Sanjay Patel a6fc687e34 [x86] add/adjust tests for FP<->int casts; NFC 2020-04-17 08:22:42 -04:00
Roger Ferrer Ibanez 5f23686412 [RISCV][AsmParser] Implement .option (no)pic
Differential Revision: https://reviews.llvm.org/D77867
2020-04-17 12:08:30 +00:00
Sam Parker f88000a4b5 [ARM][MVE] Add VHADD and VHSUB patterns
Add patterns that use a normal, non-wrapping, add and sub nodes along
with an arm vshr imm node.
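
A hedged sketch of the source-level idiom behind these patterns
(hypothetical function):

  // Halving add: widen, add without wrapping, shift right by one, then
  // narrow. MVE's vhadd computes this directly in the narrow type, and
  // vhsub does likewise for subtraction.
  void hadd_i8(signed char *r, const signed char *a, const signed char *b,
               int n) {
    for (int i = 0; i < n; i++)
      r[i] = (signed char)(((int)a[i] + (int)b[i]) >> 1);
  }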

Differential Revision: https://reviews.llvm.org/D77065
2020-04-17 07:45:15 +01:00
QingShan Zhang 4bd186c0ff [PowerPC] Exploit the rldicl + rldicl when and with mask
If we AND with a constant like 0xFFFFFFC00000, we currently use several
instructions to generate this 48-bit constant and a final "and". However, we
could exploit it with two rotate instructions.

       MB          ME               MB+63-ME
+----------------------+     +----------------------+
|0000001111111111111000| ->  |0000000001111111111111|
+----------------------+     +----------------------+
 0                    63      0                    63
Rotate left by ME + 1 bits first, then mask it with (MB + 63 - ME, 63), and
finally rotate back. Notice that we need to reduce the amounts modulo 64 for
the wrapping case.
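
A self-contained C++ demonstration of the trick with a concrete mask
(helper and variable names are illustrative):

  #include <cassert>
  #include <cstdint>

  static uint64_t rotl64(uint64_t v, unsigned n) {
    n &= 63;
    return (v << n) | (v >> ((64 - n) & 63));
  }

  int main() {
    const uint64_t x = 0x123456789ABCDEF0ULL;
    const uint64_t mask = 0xFFFFFFC00000ULL;  // ones in bits 22..47
    const unsigned tz = 22;                   // trailing zeros of the mask
    uint64_t rotated = rotl64(x, 64 - tz);    // run now occupies bits 0..25
    uint64_t masked = rotated & (mask >> tz); // rldicl-style clear of high bits
    uint64_t result = rotl64(masked, tz);     // rotate back into place
    assert(result == (x & mask));
    return 0;
  }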

Reviewed by: ChenZheng, Nemanjai

Differential Revision: https://reviews.llvm.org/D71831
2020-04-17 05:24:00 +00:00
Craig Topper 944cc5e0ab [SelectionDAGBuilder][CGP][X86] Move some of SDB's gather/scatter uniform base handling to CGP.
I've always found the "findValue" a little odd and
inconsistent with other things in SDB.

This simplifies the code in SDB to just handle a splat constant
address or a 2 operand GEP in the same BB. This removes the
need for "findValue" since the operands to the GEP are
guaranteed to be available. The splat constant handling is
new, but was needed to avoid regressions due to constant
folding combining GEPs created in CGP.

CGP is now responsible for canonicalizing gather/scatters into
this form. The pattern I'm using for scalarizing, a scalar GEP
followed by a GEP with an all zeroes index, seems to be subject
to constant folding that the insertelement+shufflevector was not.

Differential Revision: https://reviews.llvm.org/D76947
2020-04-16 17:49:22 -07:00
Wouter van Oortmerssen 48139ebc3a [WebAssembly] Add int32 DW_OP_WASM_location variant
This is to allow us to add relocatable global indices as symbols.
Also adds the R_WASM_GLOBAL_INDEX_I32 relocation type to support it.

See discussion in https://github.com/WebAssembly/debugging/issues/12
2020-04-16 16:32:17 -07:00
Sanjay Patel b29fca30fa [x86] auto-generate complete test checks; NFC 2020-04-16 17:16:51 -04:00
David Green 94052da929 [ARM] MVE postinc tests. NFC 2020-04-16 22:05:28 +01:00
David Green 8e8c3c3408 [ARM] Mir test for machine sinking multiple def instructions. NFC 2020-04-16 20:58:14 +01:00
bd1976llvm 86478d3de9 [MC][ELF] Put explicit section name symbols into entry size compatible sections
Ensure that symbols explicitly* assigned a section name are placed into
a section with a compatible entry size.

This is done by creating multiple sections with the same name** if
incompatible symbols are explicitly given the name of an incompatible
section, whilst:

  - Avoiding using uniqued sections where possible (for readability and
    to maximize compatibility with assemblers).

  - Creating as few SHF_MERGE sections as possible (for efficiency).

Given that each symbol is assigned to a section in a single pass, we
must decide which section each symbol is assigned to without seeing the
properties of all symbols. A stable and easy to understand assignment is
desirable. The following rules facilitate this: The "generic" section
for a given section name will be mergeable if the name is a mergeable
"default" section name (such as .debug_str), a mergeable "implicit"
section name (such as .rodata.str2.2), or MC has already created a
mergeable "generic" section for the given section name (e.g. in response
to a section directive in inline assembly). Otherwise, the "generic"
section for a given name is non-mergeable; and, non-mergeable symbols
are assigned to the "generic" section, while mergeable symbols are
assigned to uniqued sections.

Terminology:
"default" sections are those always created by MC initially, e.g. .text
or .debug_str.

"implicit" sections are those created normally by MC in response to the
symbols that it encounters, i.e. in the absence of an explicit section
name assignment on the symbol, e.g. a function foo might be placed into
a .text.foo section.

"generic" sections are those that are referred to when a unique section
ID is not supplied, e.g. if there are multiple unique .bob sections then
".quad .bob" will reference the generic .bob section. Typically, the
generic section is just the first section of a given name to be created.
Default sections are always generic.

* Typically, section names might be explicitly assigned in source code
using a language extension e.g. a section attribute: _attribute_
((section ("section-name"))) -
https://clang.llvm.org/docs/AttributeReference.html

** I refer to such sections as unique/uniqued sections. In assembly the
", unique," assembly syntax is used to express such sections.

Fixes https://bugs.llvm.org/show_bug.cgi?id=43457.

See https://reviews.llvm.org/D68101 for previous discussions leading to
this patch.

Some minor fixes were required to LLVM's tests, as tests had been using
the old behavior, which allowed explicitly assigning globals with
incompatible entry sizes to a section.

This fix relies on the ",unique ," assembly feature. This feature is not
available until bintuils version 2.35
(https://sourceware.org/bugzilla/show_bug.cgi?id=25380). If the
integrated assembler is not being used then we avoid using this feature
for compatibility and instead try to place mergeable symbols into
non-mergeable sections or issue an error otherwise.

Differential Revision: https://reviews.llvm.org/D72194
2020-04-16 19:12:49 +00:00
Jaydeep Chauhan 561cb14e74 [LLVM] Remove wrong DBG_VALUE instruction with one operand in AArch64 test case
Summary:
The AArch64 test case llvm/test/CodeGen/AArch64/branch-target-enforcement.mir checks for an invalid DBG_VALUE instruction with one operand (`DBG_VALUE $lr`), and this DBG_VALUE instruction is echoed from the test case itself only.

The correct format of DBG_VALUE is given at the link below:
https://llvm.org/docs/SourceLevelDebugging.html#variable-locations-in-instruction-selection-and-mir

Reviewers: dsanders, eli.friedman, jmorse, vsk

Reviewed By: dsanders

Subscribers: kristof.beyls, danielkiss, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78309
2020-04-16 11:58:07 -07:00
Cameron McInally 1223255c2d [AArch64][SVE] Add DestructiveBinaryImm SQSHLU patterns.
Add DestructiveBinaryImm SQSHLU patterns and tests. These patterns allow the SQSHLU instruction to match with a MOVPRFX.

Differential Revision: https://reviews.llvm.org/D76728
2020-04-16 13:48:08 -05:00
Stefan Pintilie 18b6050324 [PowerPC][Future] Initial support for PC Relative addressing for global values
This patch adds PC Relative support for global values that are known at link
time. If a global value requires access through the global offset table (GOT)
it is not covered in this patch.

Differential Revision: https://reviews.llvm.org/D75280
2020-04-16 12:45:22 -05:00