Commit Graph

31631 Commits

Author SHA1 Message Date
Oliver Stannard 6ae3d310bd Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions"
This reverts commit cec2d5c174.

Reverting because this is still creating outlined functions with return
address signing instructions with mismatches SP values. For example:

  int *volatile v;

  void foo(int x) {
    int a[x];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }

  void bar(int x) {
    int a[x];
    v = 0;
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }

This generates these two outlined functions, both of which modify SP
between the paciasp and retaa instructions:

  $ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf
  ...
  OUTLINED_FUNCTION_0:                    // @OUTLINED_FUNCTION_0
          .cfi_sections .debug_frame
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          mov     w8, w0
          lsl     x8, x8, #2
          add     x8, x8, #15             // =15
          mov     x9, sp
          and     x8, x8, #0x7fffffff0
          sub     x8, x9, x8
          mov     x29, sp
          mov     sp, x8
          adrp    x9, v
          retaa
  ...
  OUTLINED_FUNCTION_1:                    // @OUTLINED_FUNCTION_1
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          mov     sp, x29
          retaa
2019-12-11 12:06:20 +00:00
Kerry McLaughlin 3f5bf35f86 [AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary:
Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.

Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71000
2019-12-11 11:13:51 +00:00
czhengsz bf4580b7e7 [PowerPC][NFC] add test case for lwa - loop ds form prep 2019-12-11 06:10:11 -05:00
Sjoerd Meijer d97cf1f889 [ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.

To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:

    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov r3, r2
    subs  r2, #4
    vldrh.u32 q2, [r1], #8
    vmov  q1, q0
    vmla.u32  q0, q2, r0
    letp  lr, .LBB0_1
  @ %bb.2:
    vctp.32 r3
    ..

which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.

Differential Revision: https://reviews.llvm.org/D71007
2019-12-11 10:20:19 +00:00
Simon Tatham bd0f271c9e [ARM][MVE] Add intrinsics for immediate shifts. (reland)
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen, MarkMurrayARM

Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-11 10:10:09 +00:00
QingShan Zhang f99297176c [PowerPC] Exploitate the Vector Integer Average Instructions
PowerPC has instruction to do the semantics of this piece of code:

vector int foo(vector int m, vector int n) {
  return (m + n + 1) >> 1;
}
This patch is adding the match rule to select it.

Differential Revision: https://reviews.llvm.org/D71002
2019-12-11 07:25:57 +00:00
Craig Topper 935d41e4bd [X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256
This is an improvement to 88dacbd436
2019-12-10 17:29:02 -08:00
Puyan Lotfi f364686f34 [llvm][MIRVRegNamerUtil] Adding hashing against MachineInstr flags.
Now, flags will result in differing hashes for a given MI. In effect, if
you have two instructions with everything identical except for their
flags then you should get two different hashes and fewer collisions.

Differential Revision: https://reviews.llvm.org/D70479
2019-12-10 20:16:14 -05:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00
Craig Topper 88dacbd436 [X86] Go back to considering v64i1 as a legal type under min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256.
This reverts 3e1aee2ba7 in favor
of a different approach.

Scalarizing isn't great codegen, but making the type illegal was
interfering with k constraint in inline assembly.
2019-12-10 15:07:55 -08:00
Yonghong Song 7d0e8930ed [BPF] put not-section-attribute externs into BTF ".extern" data section
Currently for extern variables with section attribute, those
BTF_KIND_VARs will not be placed in any DataSec. This is
inconvenient as any other generated BTF_KIND_VAR belongs to
one DataSec. This patch put these extern variables into
".extern" section so bpf loader can have a consistent
processing mechanism for all data sections and variables.
2019-12-10 11:45:17 -08:00
Mikhail Maltsev e6d3261c67 [ARM][MVE] Refactor complex vector intrinsics [NFCI]
Summary:
This patch refactors instruction selection of the complex vector
addition, multiplication and multiply-add intrinsics, so that it is
now based on TableGen patterns rather than C++ code.

It also changes the first parameter (halving vs non-halving) of the
arm_mve_vcaddq IR intrinsic to match the corresponding instruction
encoding, hence it requires some changes in the tests.

The patch addresses David's comment in https://reviews.llvm.org/D71190

Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71245
2019-12-10 16:21:52 +00:00
diggerlin 98f5f022f0 [BUG-FIX][XCOFF] fixed a bug of XCOFFObjectFile.cpp when there is padding at the last csect of a sections
SUMMARY:
  Fixed a bug of XCOFFObjectFile.cpp when there is padding at the last csect of a sections.
when there is a tail padding of a section, but the value of CurrentAddressLocation do not be increased by the padding size. it will hit assert assert(CurrentAddressLocation == Section->Address && "We should have no padding between sections.");

Reviewers: daltenty,hubert.reinterpretcast,

Differential Revision: https://reviews.llvm.org/D70859
2019-12-10 11:14:49 -05:00
Kiran Chandramohan 965ed1e974 [AArch64] Fix issues with large arrays on stack
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour, for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion, the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch,
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.

Reviewers: sdesmalen, efriedma, fhahn, aemerson

Reviewed By: efriedma

Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70496
2019-12-10 11:44:41 +00:00
Cullen Rhodes 1b9a608c84 [AArch64][SVE] Add wide compare immediate patterns
Summary:
Recognize wide compares where the wide operand is a splat of a scalar
value in the appropriate range and convert to the immediate variant of
the instruction.

Patch by Graham Hunter

Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71009
2019-12-10 10:41:22 +00:00
Mikael Holmen 4763267eee [LegalizeTypes] Bugfixes for big-endian targets when handling BITCASTs
Summary:
This fixes PR44135.

The special case when we promote a bitcast from a vector to an int
needs special handling when we are on a big-endian target.

Prior to this fix, for the added vec_to_int we see the following in the
SelectionDAG printouts

Type-legalized selection DAG: %bb.1 'foo:bb.1'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
      t17: v4i32 = bitcast t2
    t23: i32 = extract_vector_elt t17, Constant:i32<3>
  t8: ch,glue = CopyToReg t0, Register:i32 $r0, t23
  t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1

and I think here the extract_vector_elt is wrong and extracts the value
from the wrong index.

The program program should return the 32 bits made up of the elements at
index 4 and 5 in the vec6 array, but with

    t23: i32 = extract_vector_elt t17, Constant:i32<3>

as far as I can tell, we will extract values that originally didn't even
exist in the vec6 vectore.

If we would instead extract the element at index 2 we would get the wanted
values.

With this fix we insert a right shift after the bitcast in
DAGTypeLegalizer::PromoteIntRes_BITCAST which then gives us

Type-legalized selection DAG: %bb.1 'vec_to_int:bb.1'
SelectionDAG has 9 nodes:
  t0: ch = EntryToken
        t2: v8i16,ch = CopyFromReg t0, Register:v8i16 %0
      t23: v4i32 = bitcast t2
    t27: i32 = extract_vector_elt t23, Constant:i32<2>
  t8: ch,glue = CopyToReg t0, Register:i32 $r0, t27
  t9: ch = ARMISD::RET_FLAG t8, Register:i32 $r0, t8:1

So now we get

    t27: i32 = extract_vector_elt t23, Constant:i32<2>

which is what we want.

Similarly, the new int_to_vec testcase exposes a bug where we cast the other
direction. Then we instead need to add a left shift before the bitcast on
big-endian targets for the bits in the input integer to end up at the exptected
place in the vector.

Reviewers: bogner, spatel, craig.topper, t.p.northover, dmgreen, efriedma, SjoerdMeijer, samparker

Reviewed By: efriedma

Subscribers: eli.friedman, bjope, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70942
2019-12-10 11:22:35 +01:00
Mikael Holmen 4d280d3ac0 Add testcases exposing PR44135 2019-12-10 11:22:35 +01:00
Yonghong Song 4448125007 [BPF] Support to emit debugInfo for extern variables
extern variable usage in BPF is different from traditional
pure user space application. Recent discussion in linux bpf
mailing list has two use cases where debug info types are
required to use extern variables:
  - extern types are required to have a suitable interface
    in libbpf (bpf loader) to provide kernel config parameters
    to bpf programs.
    https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t
  - extern types are required so kernel bpf verifier can
    verify program which uses external functions more precisely.
    This will make later link with actual external function no
    need to reverify.
    https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed

This patch added bpf support to consume such info into BTF,
which can then be used by bpf loader. Function processFuncPrototypes()
only adds extern function definitions into BTF. The functions
with actual definition have been added to BTF in some other places.

Differential Revision: https://reviews.llvm.org/D70697
2019-12-09 21:53:29 -08:00
Liu, Chen3 bbf7860b93 add support for strict operation fpextend/fpround/fsqrt on X86 backend
Differential Revision: https://reviews.llvm.org/D71184
2019-12-10 09:04:28 +08:00
Eric Christopher 9c6b7f68b8 Revert "[ARM][MVE] Add intrinsics for immediate shifts."
and two follow-on commits: one warning fix and one functionality.

As it's breaking at least the lto bot:

http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio

This reverts commits:

 8d70f3c933
 ff4dceef92
 d97b3e3e65
2019-12-09 16:47:38 -08:00
Dávid Bolvanský 584ed88226 [Codegen][X86] Modernize/regenerate old tests. NFCI.
Summary:
Switch to FileCheck where possible.
Adjust tests so they can be easily regenerated by update scripts.

Reviewers: craig.topper, spatel, RKSimon

Reviewed By: spatel

Subscribers: MatzeB, qcolombet, arphaman, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71211
2019-12-10 00:27:46 +01:00
Eli Friedman f1ddef34f1 [AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors.
The generated sequence with whilelo is unintuitive, but it's the best
I could come up with given the limited number of SVE instructions that
interact with scalar registers. The other sequence I was considering
was something like dup+cmpne, but an extra scalar instruction seems
better than an extra vector instruction.

Differential Revision: https://reviews.llvm.org/D71160
2019-12-09 15:09:33 -08:00
Hiroshi Yamauchi d9ae493937 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

A second try after reverted D71072.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71149
2019-12-09 12:42:59 -08:00
Jinsong Ji 3d41a58eac [PowerPC][NFC] Rename ANDI(S)o8 to ANDI(S)8o
Summary:
This is found during https://reviews.llvm.org/D70758
All the other record forms are having suffix o at the end.
ANDIo8 and ANDISo8 are the only two that put o before 8.

This patch rename them to be consistent with others.

Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg

Reviewed By: jhibbits

Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70928
2019-12-09 19:21:34 +00:00
Mark Murray fc3417cb5a [ARM][MVE][Intrinsics] Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics.
Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen, miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71198
2019-12-09 17:41:47 +00:00
Mark Murray 2eb61fa5d6 [ARM][MVE][Intrinsics] Add VMULL[BT]Q_(INT|POLY) intrinsics.
Summary: Add VMULL[BT]Q_(INT|POLY) intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71066
2019-12-09 17:41:47 +00:00
Simon Tatham d97b3e3e65 [ARM][MVE] Add intrinsics for immediate shifts.
Summary:
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-09 15:44:09 +00:00
Sam Elliott cb664baf50 [RISCV] Fix mir-target-flags.ll 2019-12-09 13:51:08 +00:00
Sam Elliott c20930a724 [RISCV] Machine Operand Flag Serialization
Summary:
These hooks ensure that the RISC-V backend can serialize and parse MIR
correctly.

Reviewers: jrtc27, luismarques

Reviewed By: luismarques

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70666
2019-12-09 13:18:32 +00:00
Mikhail Maltsev 0d1490bf6a [ARM][MVE] Add complex vector intrinsics
Summary:
This patch adds intrinsics for the following MVE instructions:
* VCADD, VHCADD
* VCMUL
* VCMLA

Each of the above 3 groups has a corresponding new LLVM IR intrinsic.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71190
2019-12-09 12:05:59 +00:00
David Green b1aba0378e [ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
2019-12-09 11:37:34 +00:00
Amaury Séchet d7aded3937 [PowerPC] Automatically generate store-constant.ll . NFC 2019-12-09 01:08:18 +01:00
David Green 792fab343b [ARM] Attempt to use whole register vmovs for MVE shuffles.
MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.

This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509
2019-12-08 10:53:54 +00:00
David Green 3a6eb5f160 [ARM] Disable VLD4 under MVE
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
2019-12-08 10:37:29 +00:00
Yonghong Song 5ea611daf9 [BPF] Support weak global variables for BTF
Generate types for global variables with "weak" attribute.
Keep allocation scope the same for both weak and non-weak
globals as ELF symbol table can determine whether a global
symbol is weak or not.

Differential Revision: https://reviews.llvm.org/D71162
2019-12-07 08:58:19 -08:00
Ulrich Weigand 9db13b5a7d [FPEnv] Constrained FCmp intrinsics
This adds support for constrained floating-point comparison intrinsics.

Specifically, we add:

      declare <ty2>
      @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
                                          metadata <condition code>,
                                          metadata <exception behavior>)
      declare <ty2>
      @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
                                           metadata <condition code>,
                                           metadata <exception behavior>)

The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).

The condition code is implemented as a metadata string.  The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).

These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations.  Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes.  The patch includes support
for the common legalization operations for those nodes.

The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).

Differential Revision: https://reviews.llvm.org/D69281
2019-12-07 11:28:39 +01:00
Kai Luo 884351547d [PowerPC] Fix MI peephole optimization for splats
Summary:
This patch fixes an issue where the PPC MI peephole optimization pass incorrectly remove a vector swap.

Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately feed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered.

This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register.

Here is an example in pseudo-MIR of what happens in the test cased added in this patch:

Before PPC MI peephole optimization:
```
%arg = XVADDDP %0, %1

$f1 = COPY %arg.sub_64
call double rint(double)
%res.first = COPY $f1
%vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64

%arg.swapped = XXPERMDI %arg, %arg, 2
$f1 = COPY %arg.swapped.sub_64
call double rint(double)
%res.second = COPY $f1

%vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
%vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2
; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ]
```

After optimization:
```
; ...
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1
; so the pass replaces the swap with a copy:
%vec.res = COPY %vec.res.splat
; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ]
```

As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat.

Committed for vddvss (Colin Samples). Thanks Colin for the patch!
Differential Revision: https://reviews.llvm.org/D69497
2019-12-07 14:51:20 +08:00
Amara Emerson c77b441140 [AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates.
Only implemented for the type combinations already supported for G_SHL.

Differential Revision: https://reviews.llvm.org/D71153
2019-12-06 16:24:57 -08:00
Amara Emerson 84fdd9d7a5 [X86] Fix prolog/epilog mismatch for stack protectors on win32-macho.
The xor'ing behaviour is only used for msvc/crt environments, when we're targeting
macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing
when targeting win32-macho to be consistent.

Differential Revision: https://reviews.llvm.org/D71095
2019-12-06 14:44:56 -08:00
Craig Topper 28b573d249 [TargetLowering] Fix another potential FPE in expandFP_TO_UINT
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:

// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)

Instead, I'd suggest to use the following sequence:

// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).

In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.

There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)

Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper

Differential Revision: https://reviews.llvm.org/D67105
2019-12-06 14:11:04 -08:00
Reid Kleckner c089f02898 [X86] Don't setup and teardown memory for a musttail call
Summary:
musttail calls should not require allocating extra stack for arguments.
Updates to arguments passed in memory should happen in place before the
epilogue.

This bug was mostly a missed optimization, unless inalloca was used and
store to push conversion fired.

If a reserved call frame was used for an inalloca musttail call, the
call setup and teardown instructions would be deleted, and SP
adjustments would be inserted in the prologue and epilogue. You can see
these are removed from several test cases in this change.

In the case where the stack frame was not reserved, i.e. call frame
optimization fires and turns argument stores into pushes, then the
imbalanced call frame setup instructions created for inalloca calls
become a problem. They remain in the instruction stream, resulting in a
call setup that allocates zero bytes (expected for inalloca), and a call
teardown that deallocates the inalloca pack. This deallocation was
unbalanced, leading to subsequent crashes.

Reviewers: hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71097
2019-12-06 12:58:54 -08:00
Hiroshi Yamauchi 2eb30fafa5 Revert "[PGO][PGSO] Instrument the code gen / target passes."
This reverts commit 9a0b5e1407.

This seems to break buildbots.
2019-12-06 12:17:32 -08:00
Alina Sbirlea c7faa68142 Revert "ARM-Darwin: keep the frame register reserved even if not updated."
This reverts commit a7d90af1be.

This revision came back as the root-cause for crashes in internal
ARM-IOS apps.
Reproducer in https://bugs.llvm.org/show_bug.cgi?id=44231.
2019-12-06 10:59:26 -08:00
Hiroshi Yamauchi 9a0b5e1407 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71072
2019-12-06 10:43:39 -08:00
Guozhi Wei 72942459d0 [MBP] Avoid tail duplication if it can't bring benefit
Current tail duplication integrated in bb layout is designed to increase the fallthrough from a BB's predecessor to its successor, but we have observed cases that duplication doesn't increase fallthrough, or it brings too much size overhead.

To overcome these two issues in function canTailDuplicateUnplacedPreds I add two checks:

  make sure there is at least one duplication in current work set.
  the number of duplication should not exceed the number of successors.

The modification in hasBetterLayoutPredecessor fixes a bug that potential predecessor must be at the bottom of a chain.

Differential Revision: https://reviews.llvm.org/D64376
2019-12-06 09:53:53 -08:00
diggerlin 4a7e00df34 [AIX][XCOFF] created a test case to verify the raw text section of xcoffobject file
SUMMARY:
in the patch https://reviews.llvm.org/D66969 . we need a test case to verify the out text section of the xcoffobject file is correct or not.

but we do not have llvm disassembly tools to dump the xcoffobjectfile . since we commit the patch https://reviews.llvm.org/D70255, we have tools for it. we create this test case for it.

Reviewers: daltenty,hubert.reinterpretcast,

Differential Revision: https://reviews.llvm.org/D70719
2019-12-06 10:12:09 -05:00
Cullen Rhodes 2c63e8e36d [AArch64] Fix a bug with jump table generation
Summary:
When trying to calculate the offsets for the jump table entries
we fail to take into account the block alignment, which could be
greater than 4 bytes. This led to cases where the jump table
offset was too big to fit in a byte.

Reviewers: t.p.northover, sdesmalen, ostannard

Reviewed By: ostannard

Subscribers: ostannard, kristof.beyls, hiraditya, llvm-commits

Committed on behalf of David Sherwood (david-arm)

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70533
2019-12-06 14:31:53 +00:00
Cullen Rhodes b31a531f9b [AArch64][SVE2] Implement while comparison intrinsics
Summary:
Adds the following intrinsics:

    * whilege, whilegt, whilehi, whilehs

Reviewers: sdesmalen, rovka, dancgr, efriedma, rengolin, huntergr

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70909
2019-12-06 11:29:34 +00:00
Ulrich Weigand b3009edcf3 [X86] Regenerate test to fix build bot failures
After my recent commit daee549 the following test case is failing:
CodeGen/X86/vector-constrained-fp-intrinsics.ll

Not sure why I didn't catch this earlier, seems to be affected
by other changes that came in recently.

Fixed by regerenating the test again.  Sorry for the disruption!
2019-12-06 12:11:56 +01:00
Cullen Rhodes bb8c679f4b [AArch64][SVE] Implement integer compare intrinsics
Summary:
Adds intrinsics for the following:

    * cmphs, cmphi
    * cmpge, cmpgt
    * cmpeq, cmpne
    * cmplt, cmple
    * cmplo, cmpls

Includes a minor change to `TLI.getMemValueType` that fixes a crash due to the
scalable flag being dropped.

Reviewers: sdesmalen, efriedma, rengolin, rovka, dancgr, huntergr

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70889
2019-12-06 10:39:06 +00:00