Commit Graph

55352 Commits

Author SHA1 Message Date
Craig Topper b2f19320dc [X86] Use isOneConstant to simplify some code. NFC 2019-12-29 16:53:38 -08:00
Craig Topper 599d070910 [X86] Remove dyn_casts to ConstantSDNode for operand 1 of X86ISD::VSRLI/VSRAI/VSRLI. Use getConstantOperandVal and APInt operations.
These nodes should only ever be formed with an i8 TargetConstant
so we don't need to check for it to be a constant. It's also
always 8-bits so we don't need to use APInt compare functions.
2019-12-29 16:53:38 -08:00
Fangrui Song 5edb40c022 [SelectionDAG] Disallow indirect "i" constraint
This allows us to delete InlineAsm::Constraint_i workarounds in
SelectionDAGISel::SelectInlineAsmMemoryOperand overrides and
TargetLowering::getInlineAsmMemConstraint overrides.

They were introduced to X86 in r237517 to prevent crashes for
constraints like "=*imr". They were later copied to other targets.
2019-12-29 16:50:42 -08:00
Craig Topper a5c96e326a [X86] Stop accidentally custom type legalizing v4i32->v4f32 on SSE1 only targets.
We had a Custom operation action for v4i32 on SSE1. But since
v4i32 isn't legal until SSE2 this was not what was intended. The
code that get executed was intended for op legalization and
creates a bunch of v4i32 nodes that all end up scalarized.
2019-12-28 23:11:48 -08:00
Craig Topper ae321faeed [X86] Remove a redundant (scalar_to_vector (extract_vector_elt X))) in LowerUINT_TO_FP_i32. NFCI 2019-12-28 21:49:22 -08:00
Fangrui Song f748fdb05f [X86] Fix -enable-machine-outliner for x86-32 after D48683
D48683 accidentally disabled -enable-machine-outliner for x86-32.
2019-12-28 17:26:51 -08:00
Nemanja Ivanovic b6cf400aae Fix bots after a9ad65a2b3
In the last commit, I neglected to initialize the new subtarget feature
I added which caused failures on a few bots. This should fix that.
2019-12-28 13:07:18 -06:00
Nemanja Ivanovic a9ad65a2b3 [PowerPC] Change default for unaligned FP access for older subtargets
This is a fix for https://bugs.llvm.org/show_bug.cgi?id=40554

Some CPU's trap to the kernel on unaligned floating point access and there are
kernels that do not handle the interrupt. The program then fails with a SIGBUS
according to the PR. This just switches the default for unaligned access to only
allow it on recent server CPUs that are known to allow this.

Differential revision: https://reviews.llvm.org/D71954
2019-12-28 11:20:52 -06:00
Kang Zhang d1b51c5de7 [PowerPC] Modify the hasSideEffects of some VSX instructions from 1 to 0
Summary:
If we didn't set the value for hasSideEffects bit in our td file,  `llvm-tblgen`
will set it as true for those instructions which has no match pattern.
Below 6 instructions don't set the hasSideEffects flag and don't have match
pattern, so their hasSideEffects flag will be set true by llvm-tblgen.

But in fact below instructions don't modify any special register and don't have
other SideEffects, they shouldn't have SideEffects.
This patch is to modify the hasSideEffects of below instructions from 1 to 0.

```
VEXTUHLX
VEXTUHRX
VEXTUWLX
VEXTUWRX
VSPLTBs
VSPLTHs
```

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D71391
2019-12-28 09:04:54 +00:00
Matt Arsenault 5ce2ca524e AMDGPU/GlobalISel: Use SReg_32 for readfirstlane constraining
This matches the DAG behavior where we don't use SReg_32_XM0
everywhere anymore, and fixes not coalescing the copies into m0.
2019-12-27 17:52:12 -05:00
Matt Arsenault e9775bb5d8 Hexagon: Fix missing tablegen mode comment 2019-12-27 17:05:04 -05:00
Matt Arsenault e29ae3799b TII: Fix using Register for a subregister index argument 2019-12-27 16:53:29 -05:00
Matt Arsenault 9acd9544db AMDGPU: Use Register 2019-12-27 16:53:21 -05:00
Matt Arsenault e088846712 AMDGPU/GlobalISel: Fix extra result register in fdiv64 lowering
There ended up being two result registers, which would fail on
select. It was really defing a new temp register in the correct def
position, instead of the correct result register.
2019-12-27 08:49:43 -05:00
Matt Arsenault ed9a56b0f2 AMDGPU/GlobalISel: Select some 128-bit load/stores 2019-12-27 08:49:43 -05:00
Matt Arsenault a37e958558 AMDGPU: Use correct DebugLoc 2019-12-27 08:49:43 -05:00
Craig Topper fca4736874 [X86] Allow v2i32->v2f32 strict and non-strict uint_to_fp to be widened to v4i32->v4f32 under avx512.
With avx512vl we get v4i32->v4f32 uint_to_fp instructions. With
avx512f we get v16i32->v16f32 instructions which we can use to
emulate v4i32->v4f32.
2019-12-27 00:28:44 -08:00
Craig Topper 20aab49492 [X86] Custom widen v2i32->v2f32 strict_sint_to_fp to avoid scalarization. 2019-12-27 00:28:44 -08:00
Fangrui Song 7a7334663c Delete llvm.{sig,}{setjmp,longjmp} remnant after r136821
Intrinsic has incorrect argument type!
  i32 (i32*)* @llvm.setjmp

*wipes tear*
2019-12-27 00:00:14 -08:00
Craig Topper ecbaf152f8 [X86] Custom widen 128/256-bit vXi32 fp_to_uint on avx512f targets without avx512vl. Similar for vXi64 on avx512dq without avx512vl.
Summary:
Previously we did this with isel patterns that used garbage in
the widened part of the source. But that's not valid for strictfp.
So now we custom widen and use zeroes for the widened elemens for
strictfp.

This replaces D71864.

Reviewers: RKSimon, spatel, andrew.w.kaylor, pengfei, LiuChen3

Reviewed By: pengfei

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71879
2019-12-26 22:04:40 -08:00
Craig Topper 50fb3957c1 [X86] Custom widen strict v2f32->v2i32 by padding with zeroes.
For non-strict, generic type legalization will take care of this,
but that doesn't happen currently for strict nodes.
2019-12-26 21:45:18 -08:00
Fangrui Song c4a97b64e3 [X86] Fix -Wmisleading-indentation after D71892 2019-12-26 21:41:16 -08:00
Craig Topper 53ee806d93 [X86][FPEnv] Promote some float strictfp operations to double on i686-pc-windows-msvc to match what we do for non-strict.
The float libcalls are inlined in MSVC's math header where they
just cast to double and use the double libcall. Do the same when
we emit libcalls.
2019-12-26 20:22:24 -08:00
Craig Topper a5d266b9cf [X86] Add custom legalization for strict_uint_to_fp v2i32->v2f32.
I believe the algorithm we use for non-strict is exception safe
for strict. The fsub won't generate any exceptions. After it we
will have an exact version of the i32 integer in a double. Then
we just round it to f32. That rounding will generate a precision
exception if it can't be represented exactly.
2019-12-26 19:10:26 -08:00
Liu, Chen3 1a7b69f5dd add custom operation for strict fpextend/fpround
Differential Revision: https://reviews.llvm.org/D71892
2019-12-27 08:28:33 +08:00
Eric Christopher 1584e2f987 Remove SrcVT only used in an assert and propagate query. 2019-12-26 15:28:32 -08:00
Craig Topper f953882113 [X86] Custom widen 128/256-bit vXi32 uint_to_fp on avx512f targets without avx512vl. Similar for vXi64 sint_to_fp/uint_to_fp on avx512dq without avx512vl.
Previously we widened these through isel patterns, but that
didn't work for STRICT_ nodes. Those need to be padded with
zeroes in the upper bits which is harder to do in isel patterns.
2019-12-26 14:46:56 -08:00
Craig Topper 90ff34e6ab [X86] Add custom widening for v2i32->v2f64 strict_uint_to_fp with AVX512F, but not AVX512VL.
Previously we were widening with isel patterns, but that wasn't
exception safe for strict FP. So now we widen to v4i32->v4f64
during type legalization. And then let op legalization further
widen to v8i32->v8f64.

The vec_int_to_fp.ll changes are caused by us no longer narrowing
extracts of strict_uint_to_fp to the v4i32->v2f64 instruction
without AVX512VL only to have isel rewiden it. Now we just keep
it wide throughout. So we don't have an opportunity to narrow
the load.
2019-12-26 13:40:56 -08:00
Craig Topper bb0138729b [X86] Add custom widening for v2f64->v2i32 strict_fp_to_uint with avx512f, but not avx512vl.
AVX512F added instruction for vector fp_to_uint conversions. With
AVX512VL we can use a specific instruction that does v2f64->v4i32 with
zeroes in the 2 extra elements. For non-strict nodes without AVX512VL
we relied on type legalization to turn it to v4f64->v4i32 which would
later be widened by op legalization to v8f64->v8i32. But type legalization
doesn't currently widen strict nodes since it doesn't know how to
safely and efficiently pad the extra elements. But for X86 we know
padding with zeroes is safe and efficient so do that ourselves.
2019-12-26 12:42:27 -08:00
Yonghong Song ffd57408ef [BPF] Enable relocation location for load/store/shifts
Previous btf field relocation is always at assignment like
   r1 = 4
which is converted from an ld_imm64 instruction.

This patch did an optimization such that relocation
instruction might be load/store/shift. Specically, the
following insns may also have relocation, except BPF_MOV:
  LDB, LDH, LDW, LDD, STB, STH, STW, STD,
  LDB32, LDH32, LDW32, STB32, STH32, STW32,
  SLL, SRL, SRA

To accomplish this, a few BPF target specific
codegen only instructions are invented. They
are generated at backend BPF SimplifyPatchable phase,
which is at early llc phase when SSA form is available.
The new codegen only instructions will be converted to
real proper instructions at the codegen and BTF emission stage.

Note that, as revealed by a few tests, this optimization might
be actual generating more relocations:
Scenario 1:
  if (...) {
    ... __builtin_preserve_field_info(arg->b2, 0) ...
  } else {
    ... __builtin_preserve_field_info(arg->b2, 0) ...
  }
  Compiler could do CSE to only have one relocation. But if both
  of the above is translated into codegen internal instructions,
  the compiler will not be able to do that.
Scenario 2:
  offset = ... __builtin_preserve_field_info(arg->b2, 0) ...
  ...
  ...  offset ...
  ...  offset ...
  ...  offset ...
  For whatever reason, the compiler might be temporarily do copy
  propagation of the righthand of "offset" assignment like
  ...  __builtin_preserve_field_info(arg->b2, 0) ...
  ...  __builtin_preserve_field_info(arg->b2, 0) ...
  and CSE will be able to deduplicate later.
  But if these intrinsics are converted to BPF pseudo instructions,
  they will not be able to get deduplicated.

I do not expect we have big instruction count difference.
It may actually reduce instruction count since now relocation
is in deeper insn dependency chain.
For example, for test offset-reloc-fieldinfo-2.ll, this patch
generates 7 instead of 6 relocations for non-alu32 mode, but it
actually reduced instruction count from 29 to 26.

Differential Revision: https://reviews.llvm.org/D71790
2019-12-26 09:07:39 -08:00
Craig Topper c91bf72e2c [X86] Merge the SINT_TO_FP/UINT_TO_FP handlers in ReplaceNodeResults since the AVX512DQ+AVX512VL code is very similar in both. NFC 2019-12-26 08:58:34 -08:00
Craig Topper 4e6b0dd681 [X86] Add custom lowering for v2i64->v2f32 strict_sint_to_fp/strict_uint_to_fp for avx512dq+avx512vl targets.
With avx512dq+avx512vl we have instruction that implements this and
places zeroes in the upper 64-bits of the destination xmm register.
2019-12-26 08:58:34 -08:00
czhengsz 1b57749a53 [PowerPC] stop folding if result rlwinm mask is wrap while original rlwinm is not.
%1:g8rc = RLWINM8 %0:g8rc, 0, 16, 9
%2:g8rc = RLWINM8 killed %1:g8rc, 0, 0, 31
->
%2:g8rc = RLWINM8 %0:g8rc, 0, 16, 9

The above folding is wrong. Before transformation, %2:g8rc is 32 bit value. After
transformation, %2:g8rc becomes a 64 bit value.
This patch fixes above issue.

Reviewed by: steven.zhang

Differential Revision: https://reviews.llvm.org/D71833
2019-12-25 21:56:18 -05:00
QingShan Zhang e973783916 [NFC][PowerPC] Add a function tryAndWithMask to handle all the cases
that 'and' with constant

More patches will be committed later to exploit more about 'and' with
constant.

Differential Revision: https://reviews.llvm.org/D71693
2019-12-26 02:48:30 +00:00
Kang Zhang 6d88b7d6e7 [PowerPC] Modify the hasSideEffects of MTLR and MFLR from 1 to 0
Summary:
If we didn't set the value for hasSideEffects bit in our td file, `llvm-tblgen`
will set it as true for those instructions which has no match pattern.
The instructions `MTLR` and `MFLR` don't set the hasSideEffects flag and don't
have match pattern, so their hasSideEffects flag will be set true by
`llvm-tblgen`.
But in fact, we can use `[LR]` to model the two instructions, so they should not
have SideEffects.

This patch is to modify the hasSideEffects of MTLR and MFLR from 1 to 0.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D71390
2019-12-26 02:12:32 +00:00
Wang, Pengfei 472bded3ed [X86] Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend
Summary: Enable STRICT_SINT_TO_FP/STRICT_UINT_TO_FP on X86 backend

Reviewers: craig.topper, RKSimon, LiuChen3, uweigand, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71871
2019-12-26 08:15:13 +08:00
Craig Topper c5b4a2386b [X86] Use zero vector to extend to 512-bits for strict_fp_to_uint v2i1->v2f64 on targets with AVX512F, but not AVX512VL.
In the worst case, this requires a 128-bit move instruction to
implicitly zero the upper bits. In the common case, we should
recognize the producing instruction already zeroed the upper bits.
2019-12-25 10:46:00 -08:00
Craig Topper 4af5b23db3 [X86FixupSetCC] Remember the preceding eflags defining instruction while we're scanning the basic block instead of looking back for it.
Summary:
We're already scanning forward through the basic block. Might as
well just remember eflags defs instead of doing a bounded search
backwards later.

Based on a comment in D71841.

Reviewers: RKSimon, spatel, uweigand

Reviewed By: uweigand

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71865
2019-12-25 10:26:13 -08:00
Craig Topper 2498d88259 [X86] Merge together some common code in LowerFP_TO_INT now that we have STRICT_CVTTP2SI/STRICT_CVTTP2UI nodes. NFC 2019-12-25 09:57:27 -08:00
Liu, Chen3 8304781cae Add missing strict_fp_to_int
Differential Revision: https://reviews.llvm.org/D71867
2019-12-25 16:10:10 +08:00
Craig Topper 27dc4c319b [X86FixupSetCC] Use MachineInstr::readRegister/definesRegister to check for EFLAGS use/def instead of our own custom operand scan. NFCI 2019-12-24 20:34:33 -08:00
Fangrui Song 1ac7c50ded [WinEH] Delete addFnAttr("no-frame-pointer-elim") which seems no longer needed
It was added in rL238619.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D71862
2019-12-24 17:02:19 -08:00
Matt Arsenault 1f054d667e AMDGPU/GlobalISel: Fix mapping and selection of llvm.amdgcn.div.fixup 2019-12-24 15:36:29 -05:00
Craig Topper c06e53119b [X86] Use 128-bit vector instructions for f32/f64->i64 conversions on 32-bit targets with avx512dq and avx512vl instructions.
On 32-bit targets we can't use the scalar instruction so we
insert the scalar into a vector and use packed conversions.
Previously we used either v4f32->v4i64 or v4f64->v4i64 to avoid
some complexity creating target specific ISD opcodes for
v4f32->v2i64. But this causes extra vzeroupper instructions and
possibly frequency throttling on Intel CPUs.

This patch changes this to create a 128-bit vector and uses a
target specific ISD opcode if needed.
2019-12-24 11:20:10 -08:00
Craig Topper a21beccea2 [X86] Add STRICT versions of CVTTP2SI, CVTTP2UI, CMPM, and CMPP.
Differential Revision: https://reviews.llvm.org/D71850
2019-12-24 10:07:04 -08:00
Matt Arsenault df5c2159d0 AMDGPU/GlobalISel: Legalize some 16-bit round instructions 2019-12-24 09:53:01 -05:00
Matt Arsenault 9035fa6b54 AMDGPU/GlobalISel: Lower llvm.amdgcn.else 2019-12-24 09:53:01 -05:00
Fangrui Song e0d855b399 [SelectionDAG] Change SelectionDAGISel::{funcInfo,SDB} to use unique_ptr
CurDAG is referenced more than 2000 times and used in many gerated .cpp
files. Don't touch it for now.
2019-12-23 22:41:05 -08:00
Ulrich Weigand 0d3f782e41 [FPEnv][X86] More strict int <-> FP conversion fixes
Fix several several additional problems with the int <-> FP conversion
logic both in common code and in the X86 target. In particular:

- The STRICT_FP_TO_UINT expansion emits a floating-point compare. This
  compare can raise exceptions and therefore needs to be a strict compare.
  I've made it signaling (even though quiet would also be correct) as
  signaling is the more usual default for an LT. This code exists both
  in common code and in the X86 target.

- The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode:
  it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one
  of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP
  that ends up not chosen. I've fixed the algorithm to use only a single
  STRICT_SINT_TO_FP instead.

- The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do
  the wrong thing because it calls getOperationAction using the result VT.
  But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to
  be called using the operand VT.

- Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate
  STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D71840
2019-12-23 21:11:45 +01:00
Jay Foad c7c05b0c8a [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointer
Summary:
The only useful information the UndefValue conveys is the address space,
which MachinePointerInfo can represent directly without referring to an
IR value.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71838
2019-12-23 15:58:19 +00:00
Sanjay Patel 8cefc37be5 [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' extract_subvector(bitcast()) support
This moves the X86 specific transform from rL364407
into DAGCombiner to generically handle 'little to big' cases
(for example: extract_subvector(v2i64 bitcast(v16i8))). This
allows us to remove both the x86 implementation and the aarch64
bitcast(extract_subvector(bitcast())) combine.

Earlier patches that dealt with regressions initially exposed
by this patch:
rG5e5e99c041e4
rG0b38af89e2c0

Patch by: @RKSimon (Simon Pilgrim)

Differential Revision: https://reviews.llvm.org/D63815
2019-12-23 10:11:45 -05:00
Martin Storsjö 5a751e747d [AArch64] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Improve the classifyGlobalFunctionReference method to set
MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in
AArch64TargetLowering::LowerCall to use the return value from
classifyGlobalFunctionReference for these cases.

Add code in both AArch64FastISel and GlobalISel/IRTranslator to
bail out for function calls to extern weak functions on windows,
to let SelectionDAG handle them.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71721
2019-12-23 12:13:49 +02:00
Martin Storsjö b774aa1011 [ARM] [Windows] Use COFF stubs for calls to extern_weak functions
As the extern_weak target might be missing, resolving to the absolute
address zero, we can't use the normal direct PC-relative branch
instructions (as that would result in relocations out of range).

Instead check the shouldAssumeDSOLocal method and load the address
from a COFF stub.

This matches what was done for X86 in 6bf108d77a.

Differential Revision: https://reviews.llvm.org/D71720
2019-12-23 12:13:49 +02:00
Shengchen Kan 70fa4c4f88 [NFC] Style cleanups
1. Remove duplicate function for class name at the beginning of the
comment.
2. Use auto where the type is already obvious from the context.
2019-12-23 17:02:36 +08:00
QingShan Zhang 6d5e35e89d [Power9] Remove the PPCISD::XXREVERSE as it has completely the same semantics of ISD::BSWAP
The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP.
We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse.

Differential Revision: https://reviews.llvm.org/D70657
2019-12-23 07:44:33 +00:00
Jim Lin da0fe5db99 [AVR] Fix codegen for rotate instructions
Summary:
    This patch introduces the ROLBRd and RORBRd pseudo-instructions,
    which implemenent the "traditional" rotate operations; instead of
    the AVR rotate instructions that use the carry bit.

    The code is not optimized at all. Especially when dealing with
    loops of rotate instructions, this codegen should be improved some
    day.

Related bug: 41358 <https://bugs.llvm.org/show_bug.cgi?id=41358>

//Note//: This is my first submitted patch.

Reviewers: dylanmckay, Jim

Reviewed By: dylanmckay

Subscribers: hiraditya, llvm-commits, dylanmckay, dsprenkels

Tags: #llvm

Patched by dsprenkels (Daan Sprenkels)

Differential Revision: https://reviews.llvm.org/D60365
2019-12-23 11:41:28 +08:00
Kai Luo 9681dc9627 [PowerPC] Exploit `vrl(b|h|w|d)` to perform vector rotation
Summary:
Currently, we set legalization action of `ISD::ROTL` vectors as
`Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)`
to lower `ISD::ROTL` directly.

Differential Revision: https://reviews.llvm.org/D71324
2019-12-23 03:04:43 +00:00
Mark de Wever 2d903cc965 [AMDGPU] Fixes -Wrange-loop-analysis warnings
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.

Differential Revision: https://reviews.llvm.org/D71815
2019-12-22 19:39:28 +01:00
Mark de Wever 9c11026c1b [Hexagon] Fixes -Wrange-loop-analysis warnings
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.

Differential Revision: https://reviews.llvm.org/D71814
2019-12-22 19:35:02 +01:00
Mark de Wever 31262d6722 [NVPTX] Fixes -Wrange-loop-analysis warnings
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.

Also removed the top-level const as requested by Aaron Ballman in similar
patches.

Differential Revision: https://reviews.llvm.org/D71812
2019-12-22 19:27:44 +01:00
Mark de Wever 1b344e7967 [PowerPC] Fixes -Wrange-loop-analysis warnings
This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall.

Differential Revision: https://reviews.llvm.org/D71811
2019-12-22 19:23:57 +01:00
Eric Astor dc5b614fa9 [ms] [X86] Use "P" modifier on operands to call instructions in inline X86 assembly.
Summary:
This is documented as the appropriate template modifier for call operands.
Fixes PR44272, and adds a regression test.

Also adds support for operand modifiers in Intel-style inline assembly.

Reviewers: rnk

Reviewed By: rnk

Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71677
2019-12-22 09:16:34 -05:00
Sanjay Patel 0b38af89e2 [AArch64] match splat of bitcasted extract subvector to DUPLANE
This is another potential regression exposed by D63815.

Here we peek through a bitcast to find an extract subvector and
scale the splat offset based on that:
splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC'

Differential Revision: https://reviews.llvm.org/D71672
2019-12-22 08:37:03 -05:00
Simon Pilgrim 6945d383b9 Fix "result of 32-bit shift implicitly converted to 64 bits" warning. NFC. 2019-12-21 17:45:30 +00:00
Florian Hahn d269255b95 [AArch64] Respect reserved registers while renaming in LdSt opt.
We cannot pick reserved registers as rename registers.

Fixes https://bugs.llvm.org/show_bug.cgi?id=44358
2019-12-21 15:10:07 +01:00
Matt Arsenault 42a26445f9 AMDGPU/GlobalISel: Fix misuse of div_scale intrinsics
Confusingly, the intrinsic operands do not match the
instruction/custom node. The order is shuffled, and the 3rd operand is
an immediate to select operands.

I'm not 100% sure I did this right, but fdiv still doesn't select end
to end and it will be easier to tell when it does. This at least
avoids an assertion in RegBankSelect and allows hitting the fallback
on selection.
2019-12-21 04:55:36 -05:00
Matt Arsenault dff3f8d742 AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xor 2019-12-21 04:55:36 -05:00
Matt Arsenault d688a6739d AMDGPU/GlobalISel: Simplify code
This can directly access the register bank, and doesn't need to get it
through the ID.
2019-12-21 04:55:36 -05:00
Yury Delendik adf7a0a558 [WebAssembly] Use TargetIndex operands in DbgValue to track WebAssembly operands locations
Extends DWARF expression language to express locals/globals locations. (via
target-index operands atm) (possible variants are: non-virtual registers
or address spaces)

The WebAssemblyExplicitLocals can replace virtual registers to targertindex
operand type at the time when WebAssembly backend introduces
{get,set,tee}_local instead of corresponding virtual registers.

Reviewed By: aprantl, dschuff

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D52634
2019-12-20 14:39:05 -08:00
Bill Wendling dc03b960d0 Add parentheses to silence warning 2019-12-20 12:52:36 -08:00
Philip Reames c148e2e2ef More style cleanups following rG14fc20ca6282 [NFC]
Demote member functions to static functions where possible
Use early continue/early return to reduce nesting
Clarify comments slightly.
Reuse previously define expression in one case.
2019-12-20 12:19:33 -08:00
Philip Reames 4024d49edc Fix a memory leak introduced w/the instruction padding support in rG14fc20ca6282
Should have caught this in review, but only noticed when addressing post commit style items.  We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory.  This wasn't even conditional on the new off by default flags, so it was an unconditional leak.
2019-12-20 12:04:07 -08:00
Philip Reames 14fc20ca62 Align branches within 32-Byte boundary (NOP padding)
WARNING: If you're looking at this patch because you're looking for a full
performace mitigation of the Intel JCC Erratum, this is not it!

This is a preliminary patch on the patch towards mitigating the performance
regressions caused by Intel's microcode update for Jump Conditional Code
Erratum.  For context, see:
https://www.intel.com/content/www/us/en/support/articles/000055650.html

The patch adds the required assembler infrastructure and command line options
needed to exercise the logic for INTERNAL TESTING.  These are NOT public flags,
and should not be used for anything other than LLVM's own testing/debugging
purposes.  They are likely to change both in spelling and meaning.

WARNING: This patch is knowingly incorrect in some cornercases.  We need, and
do not yet provide, a mechanism to selective enable/disable the padding.
Conversation on this will continue in parellel with work on extending this
infrastructure to support prefix padding.

The goal here is to have the assembler align specific instructions such that
they neither cross or end at a 32 byte boundary.  The impacted instructions are:
a. Conditional jump.
b. Fused conditional jump.
c. Unconditional jump.
d. Indirect jump.
e. Ret.
f. Call.

The new options for llvm-mc are:
    -x86-align-branch-boundary=NUM aligns branches within NUM byte boundary.
    -x86-align-branch=TYPE[+TYPE...] specifies types of branches to align.

A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit
NOP to align the fused/unfused branch.

alignBranchesBegin inserts MCBoundaryAlignFragment before instructions,
alignBranchesEnd marks the end of the branch to be aligned,
relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch.

Nop padding is disabled when the instruction may be rewritten by the linker,
such as TLS Call.

Process Note: I am landing a patch by skan as it has been LGTMed, and
continuing to iterate on the review is simply slowing us down at this point.
We can and will continue to iterate in tree.

Patch By: skan
Differential Revision: https://reviews.llvm.org/D70157
2019-12-20 11:35:50 -08:00
Fangrui Song e8054f0933 [PPC32] Emit R_PPC_PLTREL24 for calls to dso_local ifunc
static void *ifunc(void) __attribute__((ifunc("resolver")));
  void foo() { ifunc(); }

The relocation produced by the ifunc() call:

1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000
3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000
4. clang -msecure-plt -fPIE => R_PPC_REL24

4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this
relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a
function, both GNU ld and lld (after D71621 fix) may produce a wrong
result.

This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC.
Both GNU ld and lld (after D71621) will be happy.

Reviewed By: sfertile

Differential Revision: https://reviews.llvm.org/D71649
2019-12-20 11:32:02 -08:00
Craig Topper de2378b4f3 [X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables before a later use.
The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code.

A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands.

To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe.

I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that.

Differential Revision: https://reviews.llvm.org/D71736
2019-12-20 11:24:45 -08:00
Danilo Carvalho Grael 15bfd2cd54 [AArch64][SVE] Replace integer immediate intrinsics with splat vector variant
Summary: Replace the integer immediate intrisics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics.

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71614
2019-12-20 13:52:19 -05:00
Jonas Paulsson 9fcebad5e5 [SystemZ] Add a mapping from "select register" to "load on condition" (2-addr).
The SELR(Mux) instructions can be converted to two-address form as LOCR(Mux)
instructions whenever one of the sources are the same reg as dest. By adding
this mapping in getTwoOperandOpcode(), we get:

- Two-address hints in getRegAllocationHints() for select register
  instructions.

- No need anymore for special handling in SystemZShortenInst.cpp -
  shortenSelect() removed.

The two-address hints are now added before the GRX32 hints, which should be
preferred.

Review: Ulrich Weigand
https://reviews.llvm.org/D68870
2019-12-20 10:44:58 -08:00
Jonas Paulsson 3174683e21 [SystemZ] Bugfix and improve the handling of CC values.
It was recently discovered that the handling of CC values was actually broken
since overflow was not properly handled ('nsw' flag not checked for).

Add and sub instructions now have a new target specific instruction flag
named SystemZII::CCIfNoSignedWrap. It means that the CC result can be used
instead of a compare with 0, but only if the instruction has the 'nsw' flag
set.

This patch also adds the improvements of conversion to logical instructions
and the analyzing of add with immediates, to be able to eliminate more
compares.

Review: Ulrich Weigand
https://reviews.llvm.org/D66868
2019-12-20 10:20:23 -08:00
Victor Campos 2ff5a596cb Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit bbcf1c3496.
2019-12-20 18:08:24 +00:00
Ulrich Weigand ede8293d7d [SystemZ][FPEnv] Enable strict vector FP extends/truncations
The back-end currently has special DAGCombine code to detect
cases where two floating-point extend or truncate operations
can be combined into a single vector operation.

This patch extends that support to also handle strict FP operations.

Note that currently only the case where both operations have the
same input chain are supported.  This already suffices to cover
the common case where the operations result from scalarizing a
non-legal vector type.  More general cases can be supported in
the future.
2019-12-20 15:36:56 +01:00
Paul Walker 6cba90dc4d [AArch64][SVE] Correct intrinsics and patterns for logical predicate instructions
In general SVE intrinsics are considered predicated and merging
with everything else having suitable decoration.  For predicated
zeroing operations (like the predicate logical instructions) we
use the "_z" suffix.  After this change all intrinsics use their
expected names (i.e. orr instead of or and eor instead of xor).

I've removed intrinsics and patterns for condition code setting
instructions as that data is not returned as part of the intrinsic.
The expectation is to ask for a cc flag explicitly.

For example:
  a = and_z(pg, p1, p2)
  cc = ptest_<flag>(pg, a)

With the code generator expected to use "s" variants of instructions
when available.

Differential Revision: https://reviews.llvm.org/D71715
2019-12-20 14:22:27 +00:00
Cullen Rhodes 974f00a436 [AArch64][SVE] Fold constant multiply of element count
Summary:
E.g.

  %0 = tail call i64 @llvm.aarch64.sve.cntw(i32 31)
  %mul = mul i64 %0, <const>

Should emit:

  cntw    x0, all, mul #<const>

For <const> in the range 1-16.

Patch by Kerry McLaughlin

Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71014
2019-12-20 11:58:00 +00:00
Andrzej Warzynski be2b7ea89a [AArch64][SVE] Add intrnisics for saturating scalar arithmetic
Summary:
The following intrnisics are added:
  * @llvm.aarch64.sve.sqdec{b|h|w|d|p}
  * @llvm.aarch64.sve.sqinc{b|h|w|d|p}
  * @llvm.aarch64.sve.uqdec{b|h|w|d|p}
  * @llvm.aarch64.sve.uqinc{b|h|w|d|p}

For every intrnisic there a scalar variants (with n32 or n64 suffix) and
vector variants (no suffix).

Reviewers: sdesmalen, rengolin, efriedma

Reviewed By: sdesmalen, efriedma

Subscribers: eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71252
2019-12-20 11:09:40 +00:00
Cullen Rhodes 3f9005eb89 Recommit "[AArch64][SVE] Add permutation and selection intrinsics"
Recommit 23c28c4043 (reverted in
dcb48f50bd) with a fix for an assert
"Request for a fixed size on a scalable object" being triggered in
`LowerSVEIntrinsicEXT`. The fix is to call `getKnownMinSize` on the
TypeSize object.
2019-12-20 10:45:17 +00:00
Andrzej Warzynski 88a973cf68 [AArch64][SVE] Add intrinsics for binary narrowing operations
Summary:
The following intrinsics for binary narrowing shift righ operations are
added:
  * @llvm.aarch64.sve.shrnb
  * @llvm.aarch64.sve.uqshrnb
  * @llvm.aarch64.sve.sqshrnb
  * @llvm.aarch64.sve.sqshrunb
  * @llvm.aarch64.sve.uqrshrnb
  * @llvm.aarch64.sve.sqrshrnb
  * @llvm.aarch64.sve.sqrshrunb
  * @llvm.aarch64.sve.shrnt
  * @llvm.aarch64.sve.uqshrnt
  * @llvm.aarch64.sve.sqshrnt
  * @llvm.aarch64.sve.sqshrunt
  * @llvm.aarch64.sve.uqrshrnt
  * @llvm.aarch64.sve.sqrshrnt
  * @llvm.aarch64.sve.sqrshrunt

Reviewers: sdesmalen, rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71552
2019-12-20 10:20:30 +00:00
Sam Parker acbc9aed72 [ARM][MVE] Fixes for tail predication.
1) Fix an issue with the incorrect value being used for the number of
   elements being passed to [d|w]lstp. We were trying to check that
   the value was available at LoopStart, but this doesn't consider
   that the last instruction in the block could also define the
   register. Two helpers have been added to RDA for this.
2) Insert some code to now try to move the element count def or the
   insertion point so that we can perform more tail predication.
3) Related to (1), the same off-by-one could prevent us from
   generating a low-overhead loop when a mov lr could have been
   the last instruction in the block.
4) Fix up some instruction attributes so that not all the
   low-overhead loop instructions are labelled as branches and
   terminators - as this is not true for dls/dlstp.

Differential Revision: https://reviews.llvm.org/D71609
2019-12-20 09:34:18 +00:00
Sam Parker 4042518335 [ARM][MVE] Tail predicate in the presence of vcmp
Record the discovered VPT blocks while checking for validity and, for
now, only handle blocks that begin with VPST and not VPT. We're now
allowing more than one instruction to define vpr, but each block must
somehow be predicated using the vctp. This leaves us with several
scenarios which need fixing up:
1) A VPT block with is only predicated by the vctp and has no
   internal vpr defs.
2) A VPT block which is only predicated by the vctp but has an
   internal vpr def.
3) A VPT block which is predicated upon the vctp as well as another
   vpr def.
4) A VPT block which is not predicated upon a vctp, but contains it
   and all instructions within the block are predicated upon in.

The changes needed are, for:
1) The easy one, just remove the vpst and unpredicate the
   instructions in the block.
2) Remove the vpst and unpredicate the instructions up to the
   internal vpr def. Need insert a new vpst to predicate the
   remaining instructions.
3) No nothing.
4) The vctp will be inside a vpt and the instruction will be removed,
   so adjust the size of the mask on the vpst.

Differential Revision: https://reviews.llvm.org/D71107
2019-12-20 08:42:11 +00:00
Sam Parker 4f0fe6b97e [ARM][MVE] Tail predicate bottom/top muls.
Add VMULL and VQDMULL variants to our tail predication white list.

Differential Revision: https://reviews.llvm.org/D71465
2019-12-20 08:33:01 +00:00
Craig Topper bf507d4259 [X86] Make EmitCmp into a static function and explicitly return chain result for STRICT_FCMP. NFCI
The only thing its getting from the X86TargetLowering class is
the subtarget which we can easily pass. This function only has
one call site now since this might help the compiler inline it.

Explicitly return both the flag result and the chain result for
STRICT_FCMP nodes. This removes an assumption in the caller that
getValue(1) is the right way to get the chain.
2019-12-19 23:03:15 -08:00
Craig Topper 9b6fafa399 [X86] Directly call EmitTest in two places instead of creating a null constant and calling EmitCmp. NFCI
EmitCmp will just immediately call EmitTest and discard the null
constant only to have EmitTest create it again if it doesn't fold.

So just skip all that and go directly to EmitTest.
2019-12-19 23:03:06 -08:00
Philip Reames 8277c91cf3 [StackMaps] Be explicit about label formation [NFC] (try 2)
Recommit after making the same API change in non-x86 targets.  This has been build for all targets, and tested for effected ones.  Why the difference?  Because my disk filled up when I tried make check for all.

For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated.  This just rearranges the code to make the upcoming change more obvious.
2019-12-19 14:05:30 -08:00
Eric Christopher add710eb23 Temporarily Revert "[StackMaps] Be explicit about label formation [NFC]"
as it broke the aarch64 build.

This reverts commit bc7595d934.
2019-12-19 12:52:40 -08:00
Philip Reames bc7595d934 [StackMaps] Be explicit about label formation [NFC]
For auto-padding assembler support, we'll need to bundle the label with the instructions (nops or call sequences) so that they don't get separated.  This just rearranges the code to make the upcoming change more obvious.
2019-12-19 12:38:44 -08:00
Philip Reames cf6aafa47c [FaultMaps] Make label formation a bit more explicit [NFC]
This is in advance of assembler padding directives support where we'll need to bundle the label w/the corresponding faulting instruction to avoid padding being inserted between.
2019-12-19 12:38:44 -08:00
Luís Marques ec4f06a77d [RISCV] Don't crash on unsupported relocations
Summary: Instead of crashing due to the `llvm_unreachable`, provide a proper
error when invalid fixups/relocations are encountered.

Reviewers: asb, lenary
Reviewed By: asb
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71536
2019-12-19 17:21:30 +00:00
Jonas Paulsson 6be1578895 [SystemZ] Recognize mrecord-mcount in backend
Emit the __mcount_loc section for all fentry calls.

Review: Ulrich Weigand
https://reviews.llvm.org/D71629
2019-12-19 09:06:18 -08:00
lewis-revill a116f28a0d [RISCV] Enable the machine outliner for RISC-V
This patch enables the machine outliner for RISC-V and adds the
necessary logic for checking whether sequences can be safely outlined,
and describing how they should be outlined. Outlined functions are
called using the register t0 (x5) as the return address register, which
must be available for an occurrence of a sequence to be safely outlined.

Differential Revision: https://reviews.llvm.org/D66210
2019-12-19 16:41:53 +00:00
Justin Hibbits d3aeac8e20 [PowerPC] Only use PLT annotations if using PIC relocation model
Summary:
The default static (non-PIC, non-PIE) model for 32-bit powerpc does not
use @PLT annotations and relocations in GCC.  LLVM shouldn't use @PLT
annotations either, because it breaks secure-PLT linking with (some
versions of?) GNU LD.

Update the available-externally.ll test to reflect that default mode should be
the same as the static relocation, by using the same check prefix.

Reviewed by:    sfertile
Differential Revision: https://reviews.llvm.org/D70570
2019-12-19 09:27:13 -06:00
Cullen Rhodes dcb48f50bd Revert "[AArch64][SVE] Add permutation and selection intrinsics"
This reverts commit 23c28c4043.

It caused build failures in the following expensive checks builders:

    http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1295
    http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/700

Reverting for now whilst I figure what the issue is.
2019-12-19 14:26:14 +00:00
Cullen Rhodes 23c28c4043 [AArch64][SVE] Add permutation and selection intrinsics
Summary:
Adds the following intrinsics:

    * @llvm.aarch64.sve.clasta
    * @llvm.aarch64.sve.clasta_n
    * @llvm.aarch64.sve.clastb
    * @llvm.aarch64.sve.clastb_n
    * @llvm.aarch64.sve.compact
    * @llvm.aarch64.sve.ext
    * @llvm.aarch64.sve.lasta
    * @llvm.aarch64.sve.lastb
    * @llvm.aarch64.sve.rev
    * @llvm.aarch64.sve.splice
    * @llvm.aarch64.sve.tbl
    * @llvm.aarch64.sve.trn1
    * @llvm.aarch64.sve.trn2
    * @llvm.aarch64.sve.uzp1
    * @llvm.aarch64.sve.uzp2
    * @llvm.aarch64.sve.zip1
    * @llvm.aarch64.sve.zip2

Reviewers: sdesmalen, efriedma, dancgr, mgudim, huntergr, rengolin

Reviewed By: sdesmalen, efriedma

Subscribers: kmclaughlin, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71401
2019-12-19 13:18:40 +00:00
Miloš Stojanović d005df4c16 [llvm-exegesis] Fix pfm counter names for Haswell for older versions of libpfm
The inconsistency caused uops mode to fail on an older version of libpfm
since the dispatched_port was added as an alias for executed_port only
after v4.6.0 of libpfm.

Differential revision: https://reviews.llvm.org/D71665
2019-12-19 12:57:17 +01:00
Jay Foad c5c935ab66 Make more use of MachineInstr::mayLoadOrStore. 2019-12-19 11:51:52 +00:00
Victor Campos bbcf1c3496 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

Reviewers: dmgreen, efriedma, john.brawn

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70072
2019-12-19 11:23:01 +00:00
Cullen Rhodes eca0c97a6b [AArch64][SVE] Implement pfirst and pnext intrinsics
Reviewers: sdesmalen, efriedma, dancgr, mgudim, cameron.mcinally

Reviewed By: cameron.mcinally

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71472
2019-12-19 11:03:32 +00:00
Cullen Rhodes 49199465a3 [AArch64][SVE] Implement ptrue intrinsic
Reviewers: sdesmalen, eli.friedman, dancgr, mgudim, cameron.mcinally,
huntergr, efriedma

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71457
2019-12-19 11:02:05 +00:00
Stanislav Mekhanoshin 58578f7056 [AMDGPU] Implemented fma cost analysis
Differential Revision: https://reviews.llvm.org/D71676
2019-12-18 23:54:20 -08:00
Liu, Chen3 2f932b5729 Enable STRICT_FP_TO_SINT/UINT on X86 backend
This patch is mainly for custom lowering the vector operation.

Differential Revision: https://reviews.llvm.org/D71592
2019-12-19 14:49:13 +08:00
czhengsz f5440ec41d [PowerPC] make lwa as a valid ds candidate in ppcloopinstrformprep pass
Fix a FIXME in ppcloopinstrformprep pass.

Reviewed by: nemanjai

Differential Revision: https://reviews.llvm.org/D71346
2019-12-18 21:06:57 -05:00
Thomas Lively 71eb8023d8 [WebAssembly] Add avgr_u intrinsics and require nuw in patterns
Summary:
The vector pattern `(a + b + 1) / 2` was previously selected to an
avgr_u instruction regardless of nuw flags, but this is incorrect in
the case where either addition may have an unsigned wrap. This CL
changes the existing pattern to require both adds to have nuw flags
and adds builtin functions and intrinsics for the avgr_u instructions
because the corrected pattern is not representable in C.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71648
2019-12-18 15:31:38 -08:00
Craig Topper f0df4218b6 [X86] Add a simple hack to IsProfitableToFold to prevent vselect+strict fp operations from being folded into masked instructions.
We really need to update the isel patterns to prevent this, but
that requires some tablegen de-tangling. So this hack will work
for correctness in the short term.
2019-12-18 14:42:56 -08:00
Ulrich Weigand 1946461344 [FPEnv] Strict versions of llvm.minimum/llvm.maximum
Add new intrinsics
   llvm.experimental.constrained.minimum
   llvm.experimental.constrained.maximum
as strict versions of llvm.minimum and llvm.maximum.

Includes SystemZ back-end support.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D71624
2019-12-18 21:35:28 +01:00
Danilo Carvalho Grael c7abf88411 Revert "[AArch64][SVE] Replace integer immediate intrinsics with splat vector variant"
This reverts commit 830e08b98b and eb1857ce0d.

This commit leads to an unexpected failure on test/CodeGen/AArch64/sve-gather-scatter-dag-combine.ll.

The review will need more changes before its re-commited.
2019-12-18 14:14:10 -05:00
Jonas Paulsson ca520592c0 [Clang FE, SystemZ] Don't add "true" value for the "mnop-mcount" attribute.
Let the "mnop-mcount" function attribute simply be present or non-present.
Update SystemZ backend as well to use hasFnAttribute() instead.

Review: Ulrich Weigand
https://reviews.llvm.org/D71669
2019-12-18 11:04:13 -08:00
Stefan Pintilie ec3d6f3ecb [PowerPC][NFC] Refactor splat of constant to vector.
Refactor the splatting of a constant to a vector so that common code is used
both for Power9 and Power8.

Patch by: Anil Mahmud

Differential Revision: https://reviews.llvm.org/D71481
2019-12-18 12:43:19 -06:00
Danilo Carvalho Grael 830e08b98b [AArch64][SVE] Replace integer immediate intrinsics with splat vector variant
Summary: Replace the integer immediate intrisics with splat vector variants so they can be applied as optimizations for the C/C++ intrinsics.

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71614
2019-12-18 13:11:21 -05:00
Daniel Sanders c3cb089a87 [gicombiner] Import tryCombineIndexedLoadStore()
Summary:
Now that arbitrary data is supported, import tryCombineIndexedLoadStore()

Depends on D69147

Reviewers: bogner, volkan

Reviewed By: volkan

Subscribers: hiraditya, arphaman, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69151
2019-12-18 14:41:38 +00:00
Sanjay Patel 5e5e99c041 [AArch64] match fcvtl2 with bitcasted extract
This should eliminate a regression seen in D63815.

If we are FP extending the high half extract of a vector,
we should be able to peek through a bitcast sitting
between the extract and extend.

This replaces tablegen patterns with a more general
DAG to DAG override, so we can handle any casted type.

Differential Revision: https://reviews.llvm.org/D71515
2019-12-18 08:47:07 -05:00
Daniel Sanders 55c57408b0 [gicombiner] Add support for arbitrary match data being passed from match to apply
Summary:
This is used by the extending_loads combine to tell the apply step which
use is the preferred one to fold and the other uses should be re-written
to consume.

Depends on D69117

Reviewers: volkan, bogner

Reviewed By: volkan

Subscribers: hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69147
2019-12-18 12:27:29 +00:00
Victor Campos 364b8f5fbe [AArch64] Improve codegen of volatile load/store of i128
Summary:
Instead of generating two i64 instructions for each load or store of a
volatile i128 value (two LDRs or STRs), now emit a single LDP or STP.

Reviewers: labrinea, t.p.northover, efriedma

Reviewed By: efriedma

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D69559
2019-12-18 10:03:12 +00:00
Jay Foad 97ca7c2cc9 [AArch64] Enable clustering memory accesses to fixed stack objects
Summary:
r347747 added support for clustering mem ops with FI base operands
including support for fixed stack objects in shouldClusterFI, but
apparently this was never tested.

This patch fixes shouldClusterFI to work with scaled as well as
unscaled load/store instructions, and fixes the ordering of memory ops
in MemOpInfo::operator< to ensure that memory addresses always
increase, regardless of which direction the stack grows.

Subscribers: MatzeB, kristof.beyls, hiraditya, javed.absar, arphaman, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71334
2019-12-18 09:46:11 +00:00
Anna Welker 7cd1cfdd6b [NFC][TTI] Add Alignment for isLegalMasked[Gather/Scatter]
Add an extra parameter so alignment can be taken under
consideration in gather/scatter legalization.

Differential Revision: https://reviews.llvm.org/D71610
2019-12-18 09:14:39 +00:00
Wang, Pengfei 1949235d13 [X86] Add strict fma support
Summary: Add strict fma support

Reviewers: craig.topper, RKSimon, LiuChen3

Subscribers: hiraditya, llvm-commits, LuoYuanke

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71604
2019-12-18 11:44:00 +08:00
Nemanja Ivanovic a5da8d90da [PowerPC] Add missing legalization for vector BSWAP
We somehow missed doing this when we were working on Power9 exploitation.
This just adds the missing legalization and cost for producing the vector
intrinsics.

Differential revision: https://reviews.llvm.org/D70436
2019-12-17 19:07:34 -06:00
Craig Topper 004fdbe041 [X86] Manually format some setOperationAction calls to line up arguments to improve readability. NFC 2019-12-17 16:11:31 -08:00
Xiaoqing Wu a17619e0b0 [AArch64][GlobalISel]: Fix a crash in GlobalIsel in dealing with 16bit uadd.with.overflow.
Summary: AArch64 doesn't support uadd.with.overflow.i16 natively. This change adds a legalization rule to convert the 32bit add result to 16bit. This should fix PR43981.

Reviewers: arsenm, qcolombet, paquette, aemerson

Reviewed By: paquette

Subscribers: wdng, rovka, kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71587
2019-12-17 16:05:00 -08:00
Stanislav Mekhanoshin b8ac5894a1 [AMDGPU] Fixed cost model for packed 16 bit ops
Differential Revision: https://reviews.llvm.org/D71622
2019-12-17 15:14:17 -08:00
Thomas Lively f1b351e14a [WebAssembly] Implement SIMD {i8x16,i16x8}.avgr_u instructions
Summary:
These instructions were added to the spec proposal in
https://github.com/WebAssembly/simd/pull/126. Their semantics are
equivalent to `(a + b + 1) / 2`. The opcode for the experimental
i32x4.dot_i16x8_s is also bumped due to a collision with the
i8x16.avgr_u opcode.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71628
2019-12-17 15:05:50 -08:00
David Tenty 84161f18cc [AIX] Avoid unset csect assert for functions defined after their use in TOC
Summary:
If a function is defined after it appears in a TOC expression, we may
try to access an unset containing csect when returning a symbol for the
expression.

Reviewers: Xiangling_L, DiggerLin, jasonliu, hubert.reinterpretcast

Reviewed By: hubert.reinterpretcast

Subscribers: hubert.reinterpretcast, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71125
2019-12-17 16:59:22 -05:00
Tom Stellard c3bc805f4f AMDGPU/SILoadStoreOptimillzer: Refactor CombineInfo struct
Summary:
Modify CombineInfo to only store information about a single instruction.
This is a little easier to work with and removes a lot of duplicate
initialization code.

Reviewers: arsenm, nhaehnle

Reviewed By: arsenm, nhaehnle

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71045
2019-12-17 13:43:10 -08:00
Jay Foad 0412f518dc [AMDGPU] Fix typo in SIInstrInfo::memOpsHaveSameBasePtr
Summary:
The typo has been present since memOpsHaveSameBasePtr was introduced in
r313208.

It caused SIInstrInfo::shouldClusterMemOps to cluster more mem ops than
it was supposed to.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71616
2019-12-17 18:54:27 +00:00
Zakk Chen 2c8e22d25c [RISCV] Add subtargets initialized with target feature
expected failed test (RV32IF-ILP32F) will be fixed in a subsequent patch.

Reviewers: efriedma, lenary, asb

Reviewed By: efriedma, lenary

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70116
2019-12-17 09:34:01 -08:00
Ulrich Weigand d1c0f14be8 [SystemZ][FPEnv] Back-end support for STRICT_[SU]INT_TO_FP
As of b1d8576 there is middle-end support for STRICT_[SU]INT_TO_FP,
so this patch adds SystemZ back-end support as well.

The patch is SystemZ target specific except for adding SD patterns
strict_[su]int_to_fp and any_[su]int_to_fp to TargetSelectionDAG.td
as usual.
2019-12-17 18:24:05 +01:00
Mitch Phillips 2423774cc2 Revert "Honor -fuse-init-array when os is not specified on x86"
This reverts commit aa5ee8f244.

This change broke the sanitizer buildbots. See comments at the patchset
(https://reviews.llvm.org/D71360) for more information.
2019-12-17 07:36:59 -08:00
Kevin P. Neal b1d8576b0a This adds constrained intrinsics for the signed and unsigned conversions
of integers to floating point.

This includes some of Craig Topper's changes for promotion support from
D71130.

Differential Revision: https://reviews.llvm.org/D69275
2019-12-17 10:06:51 -05:00
Luís Marques e332a09619 [RISCV][NFC] Trivial cleanup
Fix a typo. Remove two seemingly out-of-date TODO comments.
2019-12-17 11:44:35 +00:00
Kristof Beyls 870f39d310 Fix assertion failure in getMemOperandWithOffsetWidth
This fixes an assertion failure that triggers inside
getMemOperandWithOffset when Machine Sinking calls it on a MachineInstr
that is not a memory operation.

Different backends implement getMemOperandWithOffset differently: some
return false on non-memory MachineInstrs, others assert.

The Machine Sinking pass in at least SinkingPreventsImplicitNullCheck
relies on getMemOperandWithOffset to return false on non-memory
MachineInstrs, instead of asserting.

This patch updates the documentation on getMemOperandWithOffset that it
should return false on any MachineInstr it cannot handle, instead of
asserting. It also adapts the in-tree backends accordingly where
necessary.

Differential Revision: https://reviews.llvm.org/D71359
2019-12-17 10:56:09 +00:00
Guillaume Chatelet 531c1161b9 Resubmit "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove"
Summary:
This is a resubmit of D71473.

This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: aaron.ballman, courbet

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71547
2019-12-17 10:07:46 +01:00
Kamlesh Kumar aa5ee8f244 Honor -fuse-init-array when os is not specified on x86
Currently -fuse-init-array option is not effective when target triple
does not specify os, on x86,x86_64.
i.e.

// -fuse-init-array is not honored.
$ clang -target i386 -fuse-init-array test.c -S

// -fuse-init-array is honored.
$ clang -target i386-linux -fuse-init-array test.c -S

This patch fixes first case.
And does cleanup.

Reviewers: rnk, craig.topper, fhahn, echristo

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D71360
2019-12-16 15:21:23 -08:00
Ana Pazos d7af86bdd0 [RISCV] Added isCompressibleInst() to estimate size in getInstSizeInBytes()
Summary:
Modified compression emitter tablegen backend to emit isCompressibleInst()
check which in turn is used by getInstSizeInBytes() to better estimate
instruction size. Note the generation of compressed instructions in RISC-V
happens late in the assembler therefore instruction size estimate might be off
if computed before.

Reviewers: lenary, asb, luismarques, lewis-revill

Reviewed By: asb

Subscribers: sameer.abuasal, lewis-revill, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, lenary, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D68290
2019-12-16 15:15:10 -08:00
Fangrui Song 002adabb3a [AArch64][SVE] Change pattern generation code to fix -Wimplicit-fallthrough after D71483 2019-12-16 15:09:05 -08:00
Danilo Carvalho Grael f933878991 [AArch64][SVE] Add patterns for logical immediate operations.
Summary:
Add pattern matching for the following SVE logical vector and immediate instructions:
- and/bic, orr/orn, eor/eon.

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71483
2019-12-16 16:18:34 -05:00
Thomas Lively 3a93756dfb [WebAssembly] Replace SIMD int min/max builtins with patterns
Summary:
The instructions were originally implemented via builtins and
intrinsics so users would have to explicitly opt-in to using
them. This was useful while were validating whether these instructions
should have been merged into the spec proposal. Now that they have
been, we can use normal codegen patterns, so the intrinsics and
builtins are no longer useful.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71500
2019-12-16 11:48:49 -08:00
Jonas Paulsson 49f55dda01 [SystemZ] Improve verification of MachineOperands.
Now that the machine verifier will check for cases of register/immediate
MachineOperands and their correspondence to the MC instruction descriptor,
this patch adds the operand types to the descriptors where they were
previously missing. All MCOI::OPERAND_UNKNOWN operand types have been handled
to get a known type, except for G_... (global isel) instructions.

Review: Ulrich Weigand
https://reviews.llvm.org/D71494
2019-12-16 09:51:54 -08:00
Miloš Stojanović d7efa6b198 [mips] Add an assert in getTargetStreamer()
Check if the TargetStreamer can be accessed.

Differential Revision: https://reviews.llvm.org/D71477
2019-12-16 17:04:55 +01:00
Guillaume Chatelet 4658da10e4 Revert "[Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove"
This reverts commit 181ab91efc.
2019-12-16 15:19:49 +01:00
David Tellenbach df0cc105fa Reland [AArch64][MachineOutliner] Return address signing for outlined functions
Summary:
Reland after fixing a bug that allowed outlining of SP modifying instructions
that invalidated return address signing.

During AArch64 frame lowering instructions to enable return address
signing are inserted into functions if needed. Functions generated during
machine outlining don't run through target frame lowering and hence are
missing such instructions.

This patch introduces the following changes:

1. If not all functions that potentially participate in function outlining agree
   on their return address signing scope and their return address signing key,
   outlining is disabled for these functions.
2. If not all functions that potentially participate in function outlining agree
   on their support for v8.3A features, outlining is disabled for these
   functions.
3. If an outlining candidate would outline instructions that modify sp in a way
   that invalidates return address signing, outlining is disabled for that
   particular candidate.
4. If all candidate functions agree on the signing scope, signing key and their
   support for v8.3 features, the outlined function behaves as if it had the
   same scope and key attributes and as if it would provide the same v8.3A
   support as the original functions.

Reviewers: ostannard, paquette

Reviewed By: ostannard

Subscribers: kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70635
2019-12-16 14:40:45 +01:00
Guillaume Chatelet 181ab91efc [Alignment][NFC] Deprecate CreateMemCpy/CreateMemMove
Summary:
This patch introduces a set of functions to enable deprecation of IRBuilder functions without breaking out of tree clients.
Functions will be deprecated one by one and as in tree code is cleaned up.

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71473
2019-12-16 13:35:55 +01:00
Andrzej Warzynski c41d2b5ab2 [AArch64][SVE2] Add intrinsics for binary narrowing operations
Summary:
The following intrinsics for binary narrowing add and sub operations are
added:
  * @llvm.aarch64.sve.addhnb
  * @llvm.aarch64.sve.addhnt
  * @llvm.aarch64.sve.raddhnb
  * @llvm.aarch64.sve.raddhnt
  * @llvm.aarch64.sve.subhnb
  * @llvm.aarch64.sve.subhnt
  * @llvm.aarch64.sve.rsubhnb
  * @llvm.aarch64.sve.rsubhnt

Reviewers: sdesmalen, rengolin, efriedma

Reviewed By: sdesmalen, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71424
2019-12-16 12:22:56 +00:00
Kristof Beyls 7f4f07ddf3 [AArch64] Enable emission of stack maps for non-Mach-O binaries on AArch64.
The emission of stack maps in AArch64 binaries has been disabled for all
binary formats except Mach-O since rL206610, probably mistakenly, as far
as I can tell. This patch reverts this to its intended state.

Differential Revision: https://reviews.llvm.org/D70069

Patch by Loic Ottet.
2019-12-16 12:02:47 +00:00
Andrzej Warzynski 7e20c3a71d [Aarch64][SVE] Add intrinsics for scatter stores
Summary:
This patch adds the following SVE intrinsics for scatter stores:
* 64-bit offsets:
  * @llvm.aarch64.sve.st1.scatter (unscaled)
  * @llvm.aarch64.sve.st1.scatter.index (scaled)
* 32-bit unscaled offsets:
  * @llvm.aarch64.sve.st1.scatter.uxtw (zero-extended offset)
  * @llvm.aarch64.sve.st1.scatter.sxtw (sign-extended-offset)
* 32-bit scaled offsets:
  * @llvm.aarch64.sve.st1.scatter.uxtw.index (zero-extended offset)
  * @llvm.aarch64.sve.st1.scatter.sxtw.index (sign-extended offset)
* vector base + immediate:
  * @llvm.aarch64.sve.st1.scatter.imm

Reviewers: rengolin, efriedma, sdesmalen

Reviewed By: efriedma, sdesmalen

Subscribers: kmclaughlin, eli.friedman, tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71074
2019-12-16 11:52:53 +00:00
Jay Foad f8495017f0 Fix whitespace. 2019-12-16 10:42:34 +00:00
Jay Foad 4f17b1784e Fix for AMDGPU MUL_I24 known bits calculation
Summary:
At present, the code calculating known bits of AMDGPU MUL_I24 confuses the concepts of "non-negative number" and "positive number".

In some situations, it results in incorrect code. I have a case where the optimizer replaces the result of calculating MUL_I24(-5, 0) with -8.

Reviewers: foad, arsenm

Reviewed By: arsenm

Subscribers: foad, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Patch by Eugene Kuznetsov.

Differential Revision: https://reviews.llvm.org/D70367
2019-12-16 10:25:57 +00:00
Sjoerd Meijer 049f9672d8 [ARM] Move MVE opcode helper functions to ARMBaseInstrInfo. NFC.
In ARMLowOverheadLoops.cpp, MVETailPredication.cpp, and MVEVPTBlock.cpp we have
quite a few helper functions all looking at the opcodes of MVE instructions.
This moves all these utility functions to ARMBaseInstrInfo.

Diferential Revision: https://reviews.llvm.org/D71426
2019-12-16 09:13:59 +00:00
Jim Lin 7e0fd77645 [PowerPC] Fix %llvm.ppc.altivec.vc* lowering
Summary:
r372285 changed LLVM to use a `TargetConstant` for parameters of intrinsics that are required to be immediates.

Since that commit, use of `%llvm.ppc.altivec.vc{fsx,fux,tsxs,tuxs}` intrinsics has not worked, and resulted in a `LLVM ERROR: Cannot select: intrinsic %llvm.ppc.altivec.vc*` error. The intrinsics' TableGen definitions matched on `imm` instead of `timm`.

This commit updates those definitions to use `timm`.

Fixes: https://llvm.org/PR44239

Reviewers: hfinkel, nemanjai, #powerpc, Jim

Reviewed By: Jim

Subscribers: qiucf, wuzish, Jim, hiraditya, kbarton, jsji, shchenz, llvm-commits

Tags: #llvm

Patched by vddvss (Colin Samples).

Differential Revision: https://reviews.llvm.org/D71138
2019-12-16 10:21:55 +08:00
Logan Chien 061a94e4e2 Revert "AArch64: Fix frame record chain"
Breaks aosp-O3-polly-before-vectorizer-unprofitable with the following
error message:

void llvm::emitFrameOffset(llvm::MachineBasicBlock &,
MachineBasicBlock::iterator, const llvm::DebugLoc &, unsigned int,
unsigned int, llvm::StackOffset, const llvm::TargetInstrInfo *,
MachineInstr::MIFlag, bool, bool, bool *): Assertion `(DestReg !=
AArch64::SP || Bytes % 16 == 0) && "SP increment/decrement not 16-byte
aligned"' failed.

This reverts commit d4e10e6adb.
2019-12-14 13:58:40 -08:00
Logan Chien d4e10e6adb AArch64: Fix frame record chain
The commit r369122 may keep LR and FP register (aka. frame record) in
the middle of a frame, thus we must add the offsets to ensure the FP
register always points to innermost frame record on the stack.

According to AAPCS64[1], a conforming code shall construct a linked list
of stack frames that can be traversed with frame records.  This commit
is also essential to frame-pointer-based stack unwinder (e.g.  the stack
unwinder in linx-perf-tools.)

[1] https://github.com/ARM-software/software-standards/blob/master/abi/aapcs64/aapcs64.rst#the-frame-pointer

Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64/framelayout-frame-record.ll
Test: llvm-lit ${LLVM_SRC}/test/CodeGen/AArch64

Differential Revision: https://reviews.llvm.org/D70800
2019-12-14 10:23:20 -08:00
Fangrui Song a0aa58dad5 [AArch64] Save FP for leaf functions when disabling frame pointer elimination
The change allows clang -mno-omit-leaf-frame-pointer to disable frame
pointer elimination. This behavior matches X86 and Mips, and also GCC
AArch64.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D71168
2019-12-13 18:48:58 -08:00
Sean Fertile 93faa237da [PowerPC] Add Support for indirect calls on AIX.
Extends the desciptor-based indirect call support for 32-bit codegen,
and enables indirect calls for AIX.

In-depth Description:
In a function descriptor based ABI, a function pointer points at a
descriptor structure as opposed to the function's entry point. The
descriptor takes the form of 3 pointers: 1 for the function's entry
point, 1 for the TOC anchor of the module containing the function
definition, and 1 for the environment pointer:

struct FunctionDescriptor {
  void *EntryPoint;
  void *TOCAnchor;
  void *EnvironmentPointer;
};

An indirect call has several steps of loading the the information from
the descriptor into the proper registers for setting up the call. Namely
it has to:

1) Save the caller's TOC pointer into the TOC save slot in the linkage
   area, and then load the callee's TOC pointer into the TOC register
   (GPR 2 on AIX).

2) Load the function descriptor's entry point into the count register.

3) Load the environment pointer into the environment pointer register
   (GPR 11 on AIX).

4) Perform the call by branching on count register.

5) Restore the caller's TOC pointer after returning from the indirect call.

A couple important caveats to the above:

- There is no way to directly load a value from memory into the count register.
  Instead we populate the count register by loading the entry point address into
  a gpr and then moving the gpr to the count register.

- The TOC restore has to come immediately after the branch on count register
  instruction (i.e., the 1st instruction executed after we return from the
  call). This is an implementation limitation. We could, in theory, schedule
  the restore elsewhere as long as no uses of the TOC pointer fall in between
  the call and the restore; however, to keep it simple, we insert a pseudo
  instruction that represents both the indirect branch instruction and the
  load instruction that restores the caller's TOC from the linkage area. As
  they flow through the compiler as a single pseudo instruction, nothing can be
  inserted between them and the caller's TOC is then valid at any use.

Differtential Revision: https://reviews.llvm.org/D70724
2019-12-13 20:07:00 -05:00
Fangrui Song 40c288b75c [Mips] Fix gcc -Wunused-but-set-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71028 2019-12-13 16:41:08 -08:00
Roman Tereshin 8731799fc6 [Legalizer] Making artifact combining order-independent
Legalization algorithm is complicated by two facts:
1) While regular instructions should be possible to legalize in
   an isolated, per-instruction, context-free manner, legalization
   artifacts can only be eliminated in pairs, which could be deeply, and
   ultimately arbitrary nested: { [ () ] }, where which paranthesis kind
   depicts an artifact kind, like extend, unmerge, etc. Such structure
   can only be fully eliminated by simple local combines if they are
   attempted in a particular order (inside out), or alternatively by
   repeated scans each eliminating only one innermost pair, resulting in
   O(n^2) complexity.
2) Some artifacts might in fact be regular instructions that could (and
   sometimes should) be legalized by the target-specific rules. Which
   means failure to eliminate all artifacts on the first iteration is
   not a failure, they need to be tried as instructions, which may
   produce more artifacts, including the ones that are in fact regular
   instructions, resulting in a non-constant number of iterations
   required to finish the process.

I trust the recently introduced termination condition (no new artifacts
were created during as-a-regular-instruction-retrial of artifacts not
eliminated on the previous iteration) to be efficient in providing
termination, but only performing the legalization in full if and only if
at each step such chains of artifacts are successfully eliminated in
full as well.

Which is currently not guaranteed, as the artifact combines are applied
only once and in an arbitrary order that has to do with the order of
creation or insertion of artifacts into their worklist, which is a no
particular order.

In this patch I make a small change to the artifact combiner, making it
to re-insert into the worklist immediate (modulo a look-through copies)
artifact users of each vreg that changes its definition due to an
artifact combine.

Here the first scan through the artifacts worklist, while not
being done in any guaranteed order, only needs to find the innermost
pair(s) of artifacts that could be immediately combined out. After that
the process follows def-use chains, making them shorter at each step, thus
combining everything that can be combined in O(n) time.

Reviewers: volkan, aditya_nandakumar, qcolombet, paquette, aemerson, dsanders

Reviewed By: aditya_nandakumar, paquette

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71448
2019-12-13 15:45:18 -08:00
Sam Elliott a0f43b0043 [RISCV] Move DebugLoc Copy into CompressInstEmitter
Summary:
This copy ensures that debug location information is kept for
compressed instructions. There are places where both compressInstruction and
uncompressInstruction are called that were not doing this copy, discarding some
debug info.

This change merely moves the copy into the generated file, so you cannot forget
to copy the location over when compressing or uncompressing.

Reviewers: asb, luismarques

Reviewed By: luismarques

Subscribers: sameer.abuasal, aprantl, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D67493
2019-12-13 20:01:04 +00:00
Momchil Velikov 8e8e3181aa [ARM] Fix in ICE when retrieving the number of micro-ops for vlldm/vlstm
The big switch in `ARMBaseInstrInfo::getNumMicroOps` is missing cases for
`VLLDM` and `VLSTM`, which are currently defined with itineraries having a
dynamic count of micro-ops.

Assuming an optimistic case in which these instruction do not actually perform
loads or stores, and with the idea that Armv8-m cores are supposed to use the
new style scheduling models, this patch just sets the itinerary for those two
instructions to `NoItinerary`.

Differential Revision: https://reviews.llvm.org/D71266
2019-12-13 18:19:40 +00:00
Momchil Velikov d53e61863d [AArch64] Emit PAC/BTI .note.gnu.property flags
This patch make LLVM emit the processor specific program property types
defined in AArch64 ELF spec
https://developer.arm.com/docs/ihi0056/f/elf-for-the-arm-64-bit-architecture-aarch64-abi-2019q2-documentation

A file containing no functions gets both property flags.  Otherwise, a property
is set iff all the functions in the file have the corresponding attribute.

Patch by Daniel Kiss and Momchil Velikov.

Differential Revision: https://reviews.llvm.org/D71019
2019-12-13 17:38:20 +00:00
Fangrui Song f99eedeb72 [MC][PowerPC] Fix a crash when redefining a symbol after .set
Fix PR44284. This is probably not valid assembly but we should not crash.

Reviewed By: luporl, #powerpc, steven.zhang

Differential Revision: https://reviews.llvm.org/D71443
2019-12-13 09:31:54 -08:00
Fangrui Song f16377f11c [ARM][MVE] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D71062 2019-12-13 09:26:26 -08:00
Sam Parker 84593f058b [ARM][MVE] Make VPT invalid for tail predication
We've been marking VPT incompatible instructions as invalid for tail
predication too, though this may not strictly be true. VPT are
incompatible and, unless its the first predicate def in a loop,
they shouldn't be compatible for tail predication either.

Differential Revision: https://reviews.llvm.org/D71410
2019-12-13 15:01:08 +00:00
Mikhail Maltsev 99581fd4c8 [ARM][MVE] Add vector reduction intrinsics with two vector operands
Summary:
This patch adds intrinsics for the following MVE instructions:
* VABAV
* VMLADAV, VMLSDAV
* VMLALDAV, VMLSLDAV
* VRMLALDAVH, VRMLSLDAVH

Each of the above 4 groups has a corresponding new LLVM IR intrinsic,
since the instructions cannot be easily represented using
general-purpose IR operations.

Reviewers: simon_tatham, ostannard, dmgreen, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71062
2019-12-13 13:17:29 +00:00
Simon Tatham 25305a9311 [ARM][MVE] Add intrinsics for more immediate shifts.
Summary:
This fills in the remaining shift operations that take a single vector
input and an immediate shift count: the `vqshl`, `vqshlu`, `vrshr` and
`vshll[bt]` families.

`vshll[bt]` (which shifts each input lane left into a double-width
output lane) is the most interesting one. There are separate MC
instruction ids for shifting by exactly the input lane width and
shifting by less than that, because the instruction encoding is so
completely different for the lane-width special case. So I had to
write two sets of patterns to match based on the immediate shift
count, which involved adding a ComplexPattern matcher to avoid the
general-case pattern accidentally matching the special case too. For
that family I've made sure to add an llc codegen test for both
versions of each instruction.

I'm experimenting with a new strategy for parametrising the isel
patterns for all these instructions: adding extra fields to the
relevant `Instruction` subclass itself, which are ignored by the
Tablegen backends that generate the MC data, but can be retrieved from
each instance of that instruction subclass when it's passed as a
template parameter to the multiclass that generates its isel patterns.
A nice effect of that is that I can fill in those informational fields
using `let` blocks, rather than having to type them out once per
instruction at `defm` time.

(As a result, quite a lot of existing instruction `def`s are
reindented by this patch, so it's clearer to read with whitespace
changes ignored.)

Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71458
2019-12-13 13:07:39 +00:00
John Brawn 01ba201abc [ARM] Add custom strict fp conversion lowering when non-strict is custom
We have custom lowering for operations converting to/from floating-point types
when we don't have hardware support for those types, and this doesn't interact
well with the target-independent legalization of the strict versions of these
operations. Fix this by adding similar custom lowering of the strict versions.

This fixes the last of the assertion failures in the CodeGen/ARM/fp-intrinsics
test, with the remaining failures due to poor instruction selection.

Differential Revision: https://reviews.llvm.org/D71127
2019-12-13 13:00:00 +00:00
Tim Renouf fce1a6f584 Revert "AMDGPU: Try to commute sub of boolean ext"
This reverts commit 69fcfb7d35.

As shown in the test I attached to this commit, the change I reverted
causes a problem with "zext(cc1) - zext(cc2)". It commuted
the operands to the sub and used different logic to select the addc/subc
instruction:
   sub zext (setcc), x => addcarry 0, x, setcc
   sub sext (setcc), x => subcarry 0, x, setcc

... but that is bogus. I believe it is not possible to fold those commuted
patterns into any form of addcarry or subcarry. It may have worked as
intended before "AMDGPU: Change boolean content type to 0 or 1" because
the setcc was considered to be -1 rather than 1.

Differential Revision: https://reviews.llvm.org/D70978

Change-Id: If2139421aa6c935cbd1d925af58fe4a4aa9e8f43
2019-12-13 12:49:06 +00:00
Alex Richardson be15dfa88f [NFC] Use EVT instead of bool for getSetCCInverse()
Summary:
The use of a boolean isInteger flag (generally initialized using
VT.isInteger()) caused errors in our out-of-tree CHERI backend
(https://github.com/CTSRD-CHERI/llvm-project).

In our backend, pointers use a separate ValueType (iFATPTR) and therefore
.isInteger() returns false. This meant that getSetCCInverse() was using the
floating-point variant and generated incorrect code for us:
`(void *)0x12033091e < (void *)0xffffffffffffffff` would return false.

Committing this change will significantly reduce our merge conflicts
for each upstream merge.

Reviewers: spatel, bogner

Reviewed By: bogner

Subscribers: wuzish, arsenm, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70917
2019-12-13 12:22:03 +00:00
Sjoerd Meijer e91420e17d Revert "[ARM][MVE] findVCMPToFoldIntoVPS. NFC."
This reverts commit 9468e3334b.

There's a test that doesn't like this change. The RDA analysis
gets invalided by changes in the block, which is not taken into
account. Revert while I work on a fix for this.
2019-12-13 11:56:44 +00:00
Mark Murray 228c74076d [ARM][MVE][Intrinsics] Add *_x() variants of my *_m() intrinsics.
Summary:
Better use of multiclass is used, and this helped find some existing
bugs in the predicated VMULL* intrinsics, which are now fixed.

The refactored VMULL[TB]Q_(INT|POLY)_M() intrinsics were discovered
to have an argument ("inactive") with incorrect type, and this required
a fix that is included in this whole patch. The argument "inactive"
should have been the same width (per vector element) as the return
type of the intrinsic, but was not in the case where the return type
was double the element width of the input types.

To assist in testing the multiclassing , and to thwart further gremlins,
the unit tests are improved in scope.

The *.ll tests are all generated by a small bit of throw-away scripting
from the corresponding *.c tests, and as such the diffs are large and
nasty. Look at the file rather than the diff.

Reviewers: dmgreen, miyuki, ostannard, simon_tatham

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71421
2019-12-13 11:51:23 +00:00
Kerry McLaughlin 4194ca8e5a Recommit "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
Updated pred_load patterns added to AArch64SVEInstrInfo.td by this patch
to use reg + imm non-temporal loads to fix previous test failures.

Original commit message:

Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
2019-12-13 10:08:20 +00:00
Nate Voorhies bc16666de4 [NFC][AArch64] Fix typo.
Summary: Coaleascer should be coalescer.

Reviewers: qcolombet, Jim

Reviewed By: Jim

Subscribers: Jim, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70731
2019-12-13 10:25:36 +08:00
Danilo Carvalho Grael 6bed43f3c4 [AArch64][SVE] Add integer arithmetic with immediate instructions.
Summary:
Add pattern matching for the following instructions:
- add, sub, subr, sqadd, sqsub, uqadd, uqsub

This patch required complex patterns to match the immediate with optinal left shift.

I re-used the Select function from the other SVE repo to implement the complext pattern.

I plan on doing another patch to also match constant vector of the same immediate.

Reviewers: sdesmalen, huntergr, rengolin, efriedma, c-rhodes, mgudim, kmclaughlin

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, amehsan

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71370
2019-12-12 17:28:46 -05:00
Jonas Paulsson 61f5ba5c32 [SystemZ] Implement the packed stack layout
Any llvm function with the "packed-stack" attribute will be compiled to use
the packed stack layout which reuses unused parts of the incoming register
save area. This is needed for building the Linux kernel.

Review: Ulrich Weigand
https://reviews.llvm.org/D70821
2019-12-12 10:26:03 -08:00
Michael Liao 11b2b2f4b1 [amdgpu] Fix `-Wenum-compare` warning. NFC. 2019-12-12 11:44:16 -05:00
Sjoerd Meijer 9468e3334b [ARM][MVE] findVCMPToFoldIntoVPS. NFC.
This adds ReachingDefAnalysis (RDA) to the VPTBlock pass, so that we can
reimplement findVCMPToFoldIntoVPS with just a few calls to RDA.

Differential Revision: https://reviews.llvm.org/D71330
2019-12-12 15:41:20 +00:00
Tom Stellard bf13a71095 AMDGPU/SILoadStoreOptimizer: Simplify function
Reviewers: arsenm, nhaehnle

Reviewed By: arsenm

Subscribers: merge_guards_bot, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71044
2019-12-12 06:53:03 -08:00
Sam Parker 1274ac3dc2 [ARM][MVE] Sink vector shift operand
Recommit e0b966643f. sub instructions were being generated for the
negated value, and for some reason they were the register only ones.
I think the problem was because I was grabbing the 'zero' from
vmovimm, which is a target constant. Now I'm just generating a new
Constant zero and so rsb instructions are now generated.

Original commit message:

The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 14:34:00 +00:00
Mirko Brkusanin d7357c52a4 [Mips] Add support for min/max/umin/umax atomics
In order to properly implement these atomic we need one register more than other
binary atomics. It is used for storing result from comparing values in addition
to the one that is used for actual result of operation.

https://reviews.llvm.org/D71028
2019-12-12 11:32:37 +01:00
Cullen Rhodes bbd16b6876 [AArch64][SVE] Remove nxv1f32 and nxv1f64 as legal types
Summary: Also cleans up ZPR register class definition.

Reviewers: sdesmalen, cameron.mcinally, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71351
2019-12-12 09:49:22 +00:00
Sam Parker f8ff3bf55b Revert "[ARM][MVE] Sink vector shift operand"
This reverts commit e0b966643f.

Instruction selection is failing with expensive checks.
2019-12-12 07:52:57 +00:00
Sam Parker e0b966643f [ARM][MVE] Sink vector shift operand
The shift amount operand can be provided in a general purpose
register so sink it. Flip the vdup and negate so the existing
patterns can be used for matching.

Differential Revision: https://reviews.llvm.org/D70841
2019-12-12 07:35:21 +00:00
Cameron McInally 7aa5c16088 [AArch64][SVE] Add patterns for scalable vselect
This patch matches scalable vector selects to predicated move instructions.

Differential Revision: https://reviews.llvm.org/D71298
2019-12-11 20:15:44 -06:00
Reid Kleckner 5d986953c8 [IR] Split out target specific intrinsic enums into separate headers
This has two main effects:
- Optimizes debug info size by saving 221.86 MB of obj file size in a
  Windows optimized+debug build of 'all'. This is 3.03% of 7,332.7MB of
  object file size.
- Incremental step towards decoupling target intrinsics.

The enums are still compact, so adding and removing a single
target-specific intrinsic will trigger a rebuild of all of LLVM.
Assigning distinct target id spaces is potential future work.

Part of PR34259

Reviewers: efriedma, echristo, MaskRay

Reviewed By: echristo, MaskRay

Differential Revision: https://reviews.llvm.org/D71320
2019-12-11 18:02:14 -08:00
Reid Kleckner 85ba5f637a Rename TTI::getIntImmCost for instructions and intrinsics
Soon Intrinsic::ID will be a plain integer, so this overload will not be
possible.

Rename both overloads to ensure that downstream targets observe this as
a build failure instead of a runtime failure.

Split off from D71320

Reviewers: efriedma

Differential Revision: https://reviews.llvm.org/D71381
2019-12-11 18:00:20 -08:00
Sanjay Patel cdf5cfea8e Revert "[SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()"
This reverts commit d1f0bdf2d2.
The patch can cause infinite loops in DAGCombiner.
2019-12-11 16:56:58 -05:00
Sam Clegg 881d877846 [WebAssembly] Add new `export_name` clang attribute for controlling wasm export names
This is equivalent to the existing `import_name` and `import_module`
attributes which control the import names in the final wasm binary
produced by lld.

This maps the existing

This attribute currently requires a string rather than using the
symbol name for a couple of reasons:

1. Avoid confusion with static and dynamic linking which is
   based on symbol name.  Exporting a function from a wasm module using
   this directive is orthogonal to both static and dynamic linking.
2. Avoids name mangling.

Differential Revision: https://reviews.llvm.org/D70520
2019-12-11 11:54:57 -08:00
Fangrui Song 25e21a09b3 Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=Off builds after D65958 and D70450 2019-12-11 11:04:03 -08:00
Andrzej Warzynski a75463c471 Add intrinsics for unary narrowing operations
Summary:
The following intrinsics for unary narrowing operations are added:
 * @llvm.aarch64.sve.sqxtnb
 * @llvm.aarch64.sve.uqxtnb
 * @llvm.aarch64.sve.sqxtunb
 * @llvm.aarch64.sve.sqxtnt
 * @llvm.aarch64.sve.uqxtnt
 * @llvm.aarch64.sve.sqxtunt

Reviewers: sdesmalen, rengolin, efriedma

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71270
2019-12-11 18:55:51 +00:00
Florian Hahn 2675a3c880 [AArch64] Be more careful to skip debug operands in LdSt Optimizier.
This fixes crashes with $noreg operands.
2019-12-11 18:47:45 +00:00
Sanjay Patel d1f0bdf2d2 [SDAG] remove use restriction in isNegatibleForFree() when called from getNegatedExpression()
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets
to show the problem more generally.

We check the number of uses as a predicate for whether some
value is free to negate, but that use count can change as we
rewrite the expression in getNegatedExpression(). So something
that was marked free to negate during the cost evaluation
phase becomes not free to negate during the rewrite phase (or
the inverse - something that was not free becomes free).
This can lead to a crash/assert because we expect that
everything in an expression that is negatible to be handled
in the corresponding code within getNegatedExpression().

This patch skips the use check during the rewrite phase.
So we determine that some expression isNegatibleForFree
(identically to without this patch), but during the rewrite,
don't rely on use counts to decide how to create the optimal
expression.

Differential Revision: https://reviews.llvm.org/D70975
2019-12-11 13:30:39 -05:00
Florian Hahn 4fe92abceb [AArch64] Skip debug ops with regsOverlap in AArch64 LD/ST opt.
This fixes a crash when debug instructions are in between 2 stores.
2019-12-11 16:26:31 +00:00
Craig Topper 3adc819b7a [X86] Erase dead LEA instruction after converting it to MOV in FixupLEAPass::processInstrForSlow3OpLEA. 2019-12-11 07:51:23 -08:00
Ulrich Weigand ac473394ff [SystemZ] Fix 128-bit strict FMA expansion pre-z14
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.

This worked correctly for regular FMA, but was implemented
incorrectly for the strict version.  This was not noticed
because we did not have test coverage for this case.

This patch fixes that incorrect expansion and adds the
missing test cases.
2019-12-11 16:32:08 +01:00
Kerry McLaughlin c0a3ab3655 Revert "[AArch64][SVE] Implement intrinsics for non-temporal loads & stores"
This reverts commit 3f5bf35f86 as it was
causing build failures in llvm-clang-x86_64-expensive-checks:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/392
http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-ubuntu/builds/1045
2019-12-11 13:58:39 +00:00
Florian Hahn 17554b8961 [AArch64] Teach Load/Store optimizier to rename store operands for pairing.
In some cases, we can rename a store operand, in order to enable pairing
of stores.  For store pairs, that cannot be merged because the first
tored register is defined in between the second store, we try to find
suitable rename register.

First, we check if we can rename the given register:

1. The first store register must be killed at the store, which means we
   do not have to rename instructions after the first store.
2. We scan backwards from the first store, to find the definition of the
   stored register and check all uses in between are renamable. Along
   they way, we collect the minimal register classes of the uses for
   overlapping (sub/super)registers.

Second, we try to find an available register from the minimal physical
register class of the original register. A suitable register must not be

1. defined before FirstMI
2. between the previous definition of the register to rename
3. a callee saved register.

We use KILL flags to clear defined registers while scanning from the
beginning to the end of the block.

This triggers quite often, here are the top changes for MultiSource,
SPEC2000, SPEC2006 compiled with -O3 for iOS:

Metric: aarch64-ldst-opt.NumPairCreated

Program                                        base     patch    diff
 test-suite...nch/fourinarow/fourinarow.test     2.00    39.00   1850.0%
 test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test    46.00    80.00   73.9%
 test-suite...chmarks/Olden/power/power.test    70.00    96.00   37.1%
 test-suite...cations/hexxagon/hexxagon.test    29.00    39.00   34.5%
 test-suite...nchmarks/McCat/05-eks/eks.test   100.00   132.00   32.0%
 test-suite.../Trimaran/enc-rc4/enc-rc4.test    46.00    59.00   28.3%
 test-suite...T2006/473.astar/473.astar.test   160.00   200.00   25.0%
 test-suite.../Trimaran/enc-md5/enc-md5.test     8.00    10.00   25.0%
 test-suite...telecomm-gsm/telecomm-gsm.test   113.00   139.00   23.0%
 test-suite...ediabench/gsm/toast/toast.test   113.00   139.00   23.0%
 test-suite...Source/Benchmarks/sim/sim.test    91.00   111.00   22.0%
 test-suite...C/CFP2000/179.art/179.art.test    41.00    49.00   19.5%
 test-suite...peg2/mpeg2dec/mpeg2decode.test   245.00   279.00   13.9%
 test-suite...marks/Olden/health/health.test    16.00    18.00   12.5%
 test-suite...ks/Prolangs-C/cdecl/cdecl.test    90.00   101.00   12.2%
 test-suite...fice-ispell/office-ispell.test    91.00   100.00    9.9%
 test-suite...oxyApps-C/miniGMG/miniGMG.test   430.00   465.00    8.1%
 test-suite...lowfish/security-blowfish.test    39.00    42.00    7.7%
 test-suite.../Applications/spiff/spiff.test    42.00    45.00    7.1%
 test-suite...arks/mafft/pairlocalalign.test   2473.00  2646.00   7.0%
 test-suite.../VersaBench/ecbdes/ecbdes.test    29.00    31.00    6.9%
 test-suite...nch/beamformer/beamformer.test   220.00   235.00    6.8%
 test-suite...CFP2000/177.mesa/177.mesa.test   2110.00  2252.00   6.7%
 test-suite...ve-susan/automotive-susan.test   109.00   116.00    6.4%
 test-suite...s-C/unix-smail/unix-smail.test    65.00    69.00    6.2%
 test-suite...CI_Purple/SMG2000/smg2000.test   1194.00  1265.00   5.9%
 test-suite.../Benchmarks/nbench/nbench.test   472.00   500.00    5.9%
 test-suite...oxyApps-C/miniAMR/miniAMR.test   248.00   262.00    5.6%
 test-suite...quoia/CrystalMk/CrystalMk.test    18.00    19.00    5.6%
 test-suite...rks/tramp3d-v4/tramp3d-v4.test   7331.00  7710.00   5.2%
 test-suite.../Benchmarks/Bullet/bullet.test   5651.00  5938.00   5.1%
 test-suite...ternal/HMMER/hmmcalibrate.test   750.00   788.00    5.1%
 test-suite...T2006/456.hmmer/456.hmmer.test   764.00   802.00    5.0%
 test-suite...ications/JM/ldecod/ldecod.test   1028.00  1079.00   5.0%
 test-suite...CFP2006/444.namd/444.namd.test   1368.00  1434.00   4.8%
 test-suite...marks/7zip/7zip-benchmark.test   4471.00  4685.00   4.8%
 test-suite...6/464.h264ref/464.h264ref.test   3122.00  3271.00   4.8%
 test-suite...pplications/oggenc/oggenc.test   1497.00  1565.00   4.5%
 test-suite...T2000/300.twolf/300.twolf.test   742.00   774.00    4.3%
 test-suite.../Prolangs-C/loader/loader.test    24.00    25.00    4.2%
 test-suite...0.perlbench/400.perlbench.test   1983.00  2058.00   3.8%
 test-suite...ications/JM/lencod/lencod.test   4612.00  4785.00   3.8%
 test-suite...yApps-C++/PENNANT/PENNANT.test   995.00   1032.00   3.7%
 test-suite...arks/VersaBench/dbms/dbms.test    54.00    56.00    3.7%

Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D70450
2019-12-11 13:50:11 +00:00
Andrzej Warzynski 65651f197a [AArch64][SVE] Add DAG combine rules for gather loads and sext/zext
Summary:
These changes allow us to support sign-extending gather loads with the
exisiting intrinsics (i.e. @llvm.aarch64.sve.ld1.gather.*).

Reviewers: sdesmalen, huntergr, kmclaughlin, efriedma, rengolin, rovka, dancgr, mgudim

Reviewed By: sdesmalen

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential revision: https://reviews.llvm.org/D70812
2019-12-11 12:56:18 +00:00
Oliver Stannard 6ae3d310bd Revert "Reland [AArch64][MachineOutliner] Return address signing for outlined functions"
This reverts commit cec2d5c174.

Reverting because this is still creating outlined functions with return
address signing instructions with mismatches SP values. For example:

  int *volatile v;

  void foo(int x) {
    int a[x];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }

  void bar(int x) {
    int a[x];
    v = 0;
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
    v = &a[0];
  }

This generates these two outlined functions, both of which modify SP
between the paciasp and retaa instructions:

  $ clang --target=aarch64-arm-none-eabi -march=armv8.3-a -c test2.c -o - -S -Oz -mbranch-protection=pac-ret+leaf
  ...
  OUTLINED_FUNCTION_0:                    // @OUTLINED_FUNCTION_0
          .cfi_sections .debug_frame
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          mov     w8, w0
          lsl     x8, x8, #2
          add     x8, x8, #15             // =15
          mov     x9, sp
          and     x8, x8, #0x7fffffff0
          sub     x8, x9, x8
          mov     x29, sp
          mov     sp, x8
          adrp    x9, v
          retaa
  ...
  OUTLINED_FUNCTION_1:                    // @OUTLINED_FUNCTION_1
          .cfi_startproc
  // %bb.0:
          paciasp
          .cfi_negate_ra_state
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          str     x8, [x9, :lo12:v]
          mov     sp, x29
          retaa
2019-12-11 12:06:20 +00:00
Kerry McLaughlin 3f5bf35f86 [AArch64][SVE] Implement intrinsics for non-temporal loads & stores
Summary:
Adds the following intrinsics:
  - llvm.aarch64.sve.ldnt1
  - llvm.aarch64.sve.stnt1

This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.

Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71000
2019-12-11 11:13:51 +00:00
Sjoerd Meijer d97cf1f889 [ARM][LowOverheadLoops] Remove dead loop update instructions.
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.

To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.

This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:

    ..
    dlstp.32  lr, r2
  .LBB0_1:
    mov r3, r2
    subs  r2, #4
    vldrh.u32 q2, [r1], #8
    vmov  q1, q0
    vmla.u32  q0, q2, r0
    letp  lr, .LBB0_1
  @ %bb.2:
    vctp.32 r3
    ..

which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.

Differential Revision: https://reviews.llvm.org/D71007
2019-12-11 10:20:19 +00:00
Simon Tatham bd0f271c9e [ARM][MVE] Add intrinsics for immediate shifts. (reland)
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: dmgreen, MarkMurrayARM

Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-11 10:10:09 +00:00
QingShan Zhang eba7cbd3d0 [NFC][PowerPC] Remove the dead conditions in the if(cond) 2019-12-11 09:57:06 +00:00
QingShan Zhang f99297176c [PowerPC] Exploitate the Vector Integer Average Instructions
PowerPC has instruction to do the semantics of this piece of code:

vector int foo(vector int m, vector int n) {
  return (m + n + 1) >> 1;
}
This patch is adding the match rule to select it.

Differential Revision: https://reviews.llvm.org/D71002
2019-12-11 07:25:57 +00:00
Craig Topper 935d41e4bd [X86] Split v64i1 arguments into 2 v32i1s that will be promoted to v32i8 under min-legal-vector-width=256
This is an improvement to 88dacbd436
2019-12-10 17:29:02 -08:00
Wang, Pengfei 21bc8631fe [FPEnv][X86] Constrained FCmp intrinsics enabling on X86
Summary: This is a follow up of D69281, it enables the X86 backend support for the FP comparision.

Reviewers: uweigand, kpn, craig.topper, RKSimon, cameron.mcinally, andrew.w.kaylor

Subscribers: hiraditya, llvm-commits, annita.zhang, LuoYuanke, LiuChen3

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70582
2019-12-11 08:23:09 +08:00
Craig Topper 88dacbd436 [X86] Go back to considering v64i1 as a legal type under min-legal-vector-width=256. Scalarize v64i1 arguments and shuffles under min-legal-vector-width=256.
This reverts 3e1aee2ba7 in favor
of a different approach.

Scalarizing isn't great codegen, but making the type illegal was
interfering with k constraint in inline assembly.
2019-12-10 15:07:55 -08:00
Yonghong Song 7d0e8930ed [BPF] put not-section-attribute externs into BTF ".extern" data section
Currently for extern variables with section attribute, those
BTF_KIND_VARs will not be placed in any DataSec. This is
inconvenient as any other generated BTF_KIND_VAR belongs to
one DataSec. This patch put these extern variables into
".extern" section so bpf loader can have a consistent
processing mechanism for all data sections and variables.
2019-12-10 11:45:17 -08:00
Simon Cook a6e50e40e6 [RISCV] Improve assembler missing feature warnings
This adds support for printing improved missing feature error messages
from the assembler, which now indicates which feature caused the parse
to fail.

Differential Revision: https://reviews.llvm.org/D69899
2019-12-10 16:44:48 +00:00
Mikhail Maltsev e6d3261c67 [ARM][MVE] Refactor complex vector intrinsics [NFCI]
Summary:
This patch refactors instruction selection of the complex vector
addition, multiplication and multiply-add intrinsics, so that it is
now based on TableGen patterns rather than C++ code.

It also changes the first parameter (halving vs non-halving) of the
arm_mve_vcaddq IR intrinsic to match the corresponding instruction
encoding, hence it requires some changes in the tests.

The patch addresses David's comment in https://reviews.llvm.org/D71190

Reviewers: dmgreen, ostannard, simon_tatham, MarkMurrayARM

Reviewed By: dmgreen

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71245
2019-12-10 16:21:52 +00:00
Guillaume Chatelet 1b2842bf90 [Alignment][NFC] CreateMemSet use MaybeAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71213
2019-12-10 15:17:44 +01:00
Kiran Chandramohan 965ed1e974 [AArch64] Fix issues with large arrays on stack
Summary:
This patch fixes a few issues when large arrays are allocated on the
stack. Currently, clang has inconsistent behaviour, for debug builds
there is an assertion failure when the array size on stack is around 2GB
but there is no assertion when the stack is around 8GB. For release
builds there is no assertion, the compilation succeeds but generates
incorrect code. The incorrect code generated is due to using
int/unsigned int instead of their 64-bit counterparts. This patch,
1) Removes the assertion in frame legality check.
2) Converts int/unsigned int in some places to the 64-bit variants. This
helps in generating correct code and removes the inconsistent behaviour.
3) Adds a test which runs without optimisations.

Reviewers: sdesmalen, efriedma, fhahn, aemerson

Reviewed By: efriedma

Subscribers: eli.friedman, fpetrogalli, kristof.beyls, hiraditya,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70496
2019-12-10 11:44:41 +00:00
Cullen Rhodes 1b9a608c84 [AArch64][SVE] Add wide compare immediate patterns
Summary:
Recognize wide compares where the wide operand is a splat of a scalar
value in the appropriate range and convert to the immediate variant of
the instruction.

Patch by Graham Hunter

Reviewers: sdesmalen, efriedma, dancgr, rovka, rengolin

Reviewed By: efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl,
llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71009
2019-12-10 10:41:22 +00:00
Yonghong Song 4448125007 [BPF] Support to emit debugInfo for extern variables
extern variable usage in BPF is different from traditional
pure user space application. Recent discussion in linux bpf
mailing list has two use cases where debug info types are
required to use extern variables:
  - extern types are required to have a suitable interface
    in libbpf (bpf loader) to provide kernel config parameters
    to bpf programs.
    https://lore.kernel.org/bpf/CAEf4BzYCNo5GeVGMhp3fhysQ=_axAf=23PtwaZs-yAyafmXC9g@mail.gmail.com/T/#t
  - extern types are required so kernel bpf verifier can
    verify program which uses external functions more precisely.
    This will make later link with actual external function no
    need to reverify.
    https://lore.kernel.org/bpf/87eez4odqp.fsf@toke.dk/T/#m8d5c3e87ffe7f2764e02d722cb0d8cbc136880ed

This patch added bpf support to consume such info into BTF,
which can then be used by bpf loader. Function processFuncPrototypes()
only adds extern function definitions into BTF. The functions
with actual definition have been added to BTF in some other places.

Differential Revision: https://reviews.llvm.org/D70697
2019-12-09 21:53:29 -08:00
Huihui Zhang 6507e13589 [NFC] Add { } to silence compiler warning [-Wmissing-braces].
../llvm/lib/Target/PowerPC/PPCISelLowering.cpp:5371:37: warning: suggest braces around initialization of subobject [-Wmissing-braces]
  std::array<EVT, 2> ReturnTypes = {MVT::Other, MVT::Glue};
                                    ^~~~~~~~~~~~~~~~~~~~~
                                    {                    }
2019-12-09 17:19:34 -08:00
Liu, Chen3 bbf7860b93 add support for strict operation fpextend/fpround/fsqrt on X86 backend
Differential Revision: https://reviews.llvm.org/D71184
2019-12-10 09:04:28 +08:00
Eric Christopher 9c6b7f68b8 Revert "[ARM][MVE] Add intrinsics for immediate shifts."
and two follow-on commits: one warning fix and one functionality.

As it's breaking at least the lto bot:

http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/15132/steps/test-stage1-compiler/logs/stdio

This reverts commits:

 8d70f3c933
 ff4dceef92
 d97b3e3e65
2019-12-09 16:47:38 -08:00
Eli Friedman f1ddef34f1 [AArch64][SVE] Implement SPLAT_VECTOR for i1 vectors.
The generated sequence with whilelo is unintuitive, but it's the best
I could come up with given the limited number of SVE instructions that
interact with scalar registers. The other sequence I was considering
was something like dup+cmpne, but an extra scalar instruction seems
better than an extra vector instruction.

Differential Revision: https://reviews.llvm.org/D71160
2019-12-09 15:09:33 -08:00
Jinsong Ji a0b025b8e7 [PowerPC] [NFC] Cleanup xxpermdi peephole optimization
Summary:
Following on from rG884351547da2, this patch cleans up the logic for `xxpermdi`
peephole optimizations by converting two layers of nested `if`s to early breaks
and simplifying the logic.

Reviewers: hfinkel, nemanjai, jsji, lkail, #powerpc, steven.zhang

Reviewed By: #powerpc, steven.zhang

Subscribers: wuzish, steven.zhang, hiraditya, kbarton, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71170

Patch by vddvss (Colin Samples).
2019-12-09 21:41:26 +00:00
Hiroshi Yamauchi d9ae493937 [PGO][PGSO] Instrument the code gen / target passes.
Summary:
Split off of D67120.

Add the profile guided size optimization instrumentation / queries in the code
gen or target passes. This doesn't enable the size optimizations in those passes
yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass
queries).

A second try after reverted D71072.

Reviewers: davidxl

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71149
2019-12-09 12:42:59 -08:00
Jinsong Ji 3d41a58eac [PowerPC][NFC] Rename ANDI(S)o8 to ANDI(S)8o
Summary:
This is found during https://reviews.llvm.org/D70758
All the other record forms are having suffix o at the end.
ANDIo8 and ANDISo8 are the only two that put o before 8.

This patch rename them to be consistent with others.

Reviewers: #powerpc, hfinkel, nemanjai, lei, steven.zhang, echristo, jhibbits, joerg

Reviewed By: jhibbits

Subscribers: wuzish, hiraditya, kbarton, shchenz, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70928
2019-12-09 19:21:34 +00:00
Mark Murray fc3417cb5a [ARM][MVE][Intrinsics] Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics.
Summary: Add VQADDQ, VHADDQ, VRHADDQ, VQSUBQ, VHSUBQ, VQDMULHQ, VQRDMULHQ intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen, miyuki

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71198
2019-12-09 17:41:47 +00:00
Mark Murray 2eb61fa5d6 [ARM][MVE][Intrinsics] Add VMULL[BT]Q_(INT|POLY) intrinsics.
Summary: Add VMULL[BT]Q_(INT|POLY) intrinsics and unit tests.

Reviewers: simon_tatham, ostannard, dmgreen

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71066
2019-12-09 17:41:47 +00:00
Sean Fertile c78726fae0 [PowerPC] Refactor FinishCall. [NFC]
Refactor FinishCall to be more easily understandable as a precursor to
implementing indirect calls for AIX. The refactor tries to group similar
code together at the cost of some code duplication. The high level
overview of the refactor:

- Adds a number of helper functions for things like:
  * Determining if a call is indirect.
  * What the Opcode for a call is.
  * Transforming the callee for a direct function call.
  * Extracting the Chain operand from a CallSeqStart node.
  * Building the operands of the call.

- Adds helpers for building the indirect call DAG nodes
  (excluding the call instruction itself which is created in
  `FinishCall`).

- Removes PrepareCall, which has been subsumed by the
  helpers.

- Rename 'InFlag' to 'Glue'.

- FinishCall has been refactored to:
  1) Set TOC pointer usage on the DAG for the TOC based
     subtargets.
  2) Calculate if a call is indirect.
  3) Determine the Opcode to use for the call
     instruction.
  4) Transform the Callee for direct calls, or build
     the DAG nodes for indirect calls.
  5) Buildup the call operands.
  6) Emit the call instruction.
  7) If needed, emit the callSeqEnd Node and
     finish lowering by calling `LowerCallResult`

Differential Revision: https://reviews.llvm.org/D70126
2019-12-09 12:40:15 -05:00
Simon Tatham 8d70f3c933 [ARM] Fix NEON failure introduced by D71065.
I rewrote the isel tablegen for MVE immediate shifts, and accidentally
removed the `let Predicates=[HasMVEInt]` that was wrapping the old
version, which seems to have allowed those rules to cause trouble on
non-MVE targets. That's what I get for only re-running the MVE tests.
2019-12-09 16:56:00 +00:00
Simon Tatham d97b3e3e65 [ARM][MVE] Add intrinsics for immediate shifts.
Summary:
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.

There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.

In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.

Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard

Reviewed By: MarkMurrayARM

Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71065
2019-12-09 15:44:09 +00:00
Sam Elliott c20930a724 [RISCV] Machine Operand Flag Serialization
Summary:
These hooks ensure that the RISC-V backend can serialize and parse MIR
correctly.

Reviewers: jrtc27, luismarques

Reviewed By: luismarques

Subscribers: hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, pzheng, sameer.abuasal, apazos, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70666
2019-12-09 13:18:32 +00:00
Mikhail Maltsev 0d1490bf6a [ARM][MVE] Add complex vector intrinsics
Summary:
This patch adds intrinsics for the following MVE instructions:
* VCADD, VHCADD
* VCMUL
* VCMLA

Each of the above 3 groups has a corresponding new LLVM IR intrinsic.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: MarkMurrayARM

Subscribers: merge_guards_bot, kristof.beyls, hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D71190
2019-12-09 12:05:59 +00:00
David Green b1aba0378e [ARM] Enable MVE masked loads and stores
With the extra optimisations we have done, these should now be fine to
enable by default. Which is what this patch does.

Differential Revision: https://reviews.llvm.org/D70968
2019-12-09 11:37:34 +00:00
David Green be7a107070 [ARM] Teach the Arm cost model that a Shift can be folded into other instructions
This attempts to teach the cost model in Arm that code such as:
  %s = shl i32 %a, 3
  %a = and i32 %s, %b
Can under Arm or Thumb2 become:
  and r0, r1, r2, lsl #3

So the cost of the shift can essentially be free. To do this without
trying to artificially adjust the cost of the "and" instruction, it
needs to get the users of the shl and check if they are a type of
instruction that the shift can be folded into. And so it needs to have
access to the actual instruction in getArithmeticInstrCost, which if
available is added as an extra parameter much like getCastInstrCost.

We otherwise limit it to shifts with a single user, which should
hopefully handle most of the cases. The list of instruction that the
shift can be folded into include ADC, ADD, AND, BIC, CMP, EOR, MVN, ORR,
ORN, RSB, SBC and SUB. This translates to Add, Sub, And, Or, Xor and
ICmp.

Differential Revision: https://reviews.llvm.org/D70966
2019-12-09 10:24:33 +00:00
David Green f008b5b8ce [ARM] Additional tests and minor formatting. NFC
This adds some extra cost model tests for shifts, and does some minor
adjustments to some Neon code to make it clear as to what it applies to.
Both NFC.
2019-12-09 10:24:33 +00:00
David Stenberg 6965f835b4 [DebugInfo] Make describeLoadedValue() reg aware
Summary:
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.

This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.

This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.

This also fixes situations in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.

This also allows us to handle cases like this:

  $ebx = [...]
  $rdi = MOVSX64rr32 $ebx
  $esi = MOV32rr $edi
  CALL64pcrel32 @call

The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.

This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.

I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.

Reviewers: djtodoro, NikolaPrica, aprantl, vsk

Reviewed By: djtodoro, vsk

Subscribers: ormris, hiraditya, llvm-commits

Tags: #debug-info, #llvm

Differential Revision: https://reviews.llvm.org/D70431
2019-12-09 10:47:49 +01:00
David Stenberg f3696533f2 Revert "[DebugInfo] Make describeLoadedValue() reg aware"
This reverts commit 3cd93a4efc.
I'll recommit with a well-formatted arcanist commit message.
2019-12-09 10:45:13 +01:00
David Stenberg 3cd93a4efc [DebugInfo] Make describeLoadedValue() reg aware
Currently the describeLoadedValue() hook is assumed to describe the
value of the instruction's first explicit define. The hook will not be
called for instructions with more than one explicit define.

This commit adds a register parameter to the describeLoadedValue() hook,
and invokes the hook for all registers in the worklist.

This will allow us to for example describe instructions which produce
more than two parameters' values; e.g. Hexagon's various combine
instructions.

This also fixes a case in our downstream target where we may pass
smaller parameters in the high part of a register. If such a parameter's
value is produced by a larger copy instruction, we can't describe the
call site value using the super-register, and we instead need to know
which sub-register that should be used.

This also allows us to handle cases like this:

  $ebx = [...]
  $rdi = MOVSX64rr32 $ebx
  $esi = MOV32rr $edi
  CALL64pcrel32 @call

The hook will first be invoked for the MOV32rr instruction, which will
say that @call's second parameter (passed in $esi) is described by $edi.
As $edi is not preserved it will be added to the worklist. When we get
to the MOVSX64rr32 instruction, we need to describe two values; the
sign-extended value of $ebx -> $rdi for the first parameter, and $ebx ->
$edi for the second parameter, which is now possible.

This commit modifies the dbgcall-site-lea-interpretation.mir test case.
In the test case, the values of some 32-bit parameters were produced
with LEA64r. Perhaps we can in general cases handle such by emitting
expressions that AND out the lower 32-bits, but I have not been able to
land in a case where a LEA64r is used for a 32-bit parameter instead of
LEA64_32 from C code.

I have not found a case where it would be useful to describe parameters
using implicit defines, so in this patch the hook is still only invoked
for explicit defines of forwarding registers.
2019-12-09 10:44:17 +01:00
David Green 792fab343b [ARM] Attempt to use whole register vmovs for MVE shuffles.
MVE doesn't have the range of shuffle instructions available in Neon. We
also cannot use the trick of cutting a difficult vector shuffle in half
to simplify things. Instead we need to be more careful about how we
lower shuffles.

This patch adds an extra combine that attempts to find "whole lane"
vmovs when lowering shuffles of smaller types. This helps us make some
shuffles a lot simpler, generating single lane movs for the parts that
can make use of it, falling back to the original shuffle for the rest.

Differential Revision: https://reviews.llvm.org/D69509
2019-12-08 10:53:54 +00:00
David Green 3a6eb5f160 [ARM] Disable VLD4 under MVE
Alas, using half the available vector registers in a single instruction
is just too much for the register allocator to handle. The mve-vldst4.ll
test here fails when these instructions are enabled at present. This
patch disables the generation of VLD4 and VST4 by adding a
mve-max-interleave-factor option, which we currently default to 2.

Differential Revision: https://reviews.llvm.org/D71109
2019-12-08 10:37:29 +00:00
Ulrich Weigand a6fcdb211d [SystemZ] Fix build bot failures
My patch 9db13b5a7d seems to have
caused some build bots to fail due to warnings that appear only
when using -Wcovered-switch-default.

This patch is an attempt to fix this by trying to avoid both the warning
"default label in switch which covers all enumeration values"
for the inner switch statements and at the same time the warning
"this statement may fall through"
for the outer switch statement in getVectorComparison
(SystemZISelLowering.cpp).
2019-12-07 19:37:16 +01:00
Yonghong Song 5ea611daf9 [BPF] Support weak global variables for BTF
Generate types for global variables with "weak" attribute.
Keep allocation scope the same for both weak and non-weak
globals as ELF symbol table can determine whether a global
symbol is weak or not.

Differential Revision: https://reviews.llvm.org/D71162
2019-12-07 08:58:19 -08:00
Ulrich Weigand 9db13b5a7d [FPEnv] Constrained FCmp intrinsics
This adds support for constrained floating-point comparison intrinsics.

Specifically, we add:

      declare <ty2>
      @llvm.experimental.constrained.fcmp(<type> <op1>, <type> <op2>,
                                          metadata <condition code>,
                                          metadata <exception behavior>)
      declare <ty2>
      @llvm.experimental.constrained.fcmps(<type> <op1>, <type> <op2>,
                                           metadata <condition code>,
                                           metadata <exception behavior>)

The first variant implements an IEEE "quiet" comparison (i.e. we only
get an invalid FP exception if either argument is a SNaN), while the
second variant implements an IEEE "signaling" comparison (i.e. we get
an invalid FP exception if either argument is any NaN).

The condition code is implemented as a metadata string.  The same set
of predicates as for the fcmp instruction is supported (except for the
"true" and "false" predicates).

These new intrinsics are mapped by SelectionDAG codegen onto two new
ISD opcodes, ISD::STRICT_FSETCC and ISD::STRICT_FSETCCS, again
representing quiet vs. signaling comparison operations.  Otherwise
those nodes look like SETCC nodes, with an additional chain argument
and result as usual for strict FP nodes.  The patch includes support
for the common legalization operations for those nodes.

The patch also includes full SystemZ back-end support for the new
ISD nodes, mapping them to all available SystemZ instruction to
fully implement strict semantics (scalar and vector).

Differential Revision: https://reviews.llvm.org/D69281
2019-12-07 11:28:39 +01:00
Kai Luo 884351547d [PowerPC] Fix MI peephole optimization for splats
Summary:
This patch fixes an issue where the PPC MI peephole optimization pass incorrectly remove a vector swap.

Specifically, the pass can combine a splat/swap to a splat/copy. It uses `TargetRegisterInfo::lookThruCopyLike` to determine that the operands to the splat are the same. However, the current logic only compares the operands based on register numbers. In the case where the splat operands are ultimately feed from the same physical register, the pass can incorrectly remove a swap if the feed register for one of the operands has been clobbered.

This patch adds a check to ensure that the registers feeding are both virtual registers or the operands to the splat or swap are both the same register.

Here is an example in pseudo-MIR of what happens in the test cased added in this patch:

Before PPC MI peephole optimization:
```
%arg = XVADDDP %0, %1

$f1 = COPY %arg.sub_64
call double rint(double)
%res.first = COPY $f1
%vec.res.first = SUBREG_TO_REG 1, %res.first, %subreg.sub_64

%arg.swapped = XXPERMDI %arg, %arg, 2
$f1 = COPY %arg.swapped.sub_64
call double rint(double)
%res.second = COPY $f1

%vec.res.second = SUBREG_TO_REG 1, %res.second, %subreg.sub_64
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
%vec.res = XXPERMDI %vec.res.splat, %vec.res.splat, 2
; %vec.res == [ %vec.res.second[0], %vec.res.first[0] ]
```

After optimization:
```
; ...
%vec.res.splat = XXPERMDI %vec.res.first, %vec.res.second, 0
; lookThruCopyLike(%vec.res.first) == lookThruCopyLike(%vec.res.second) == $f1
; so the pass replaces the swap with a copy:
%vec.res = COPY %vec.res.splat
; %vec.res == [ %vec.res.first[0], %vec.res.second[0] ]
```

As best as I can tell, this has occurred since r288152, which added support for lowering certain vector operations to direct moves in the form of a splat.

Committed for vddvss (Colin Samples). Thanks Colin for the patch!
Differential Revision: https://reviews.llvm.org/D69497
2019-12-07 14:51:20 +08:00
Amara Emerson 7ac9662401 [AArch64][GlobalISel] Add missing default statement to a switch in the selector. 2019-12-06 17:43:27 -08:00
Sterling Augustine aa3c877fb5 Move variable only used in an assert into the assert itself.
This prevents unused variable warnings from breaking the build.
2019-12-06 17:09:19 -08:00
Amara Emerson c77b441140 [AArch64][GlobalISel] Add support for selection of vector G_SHL with immediates.
Only implemented for the type combinations already supported for G_SHL.

Differential Revision: https://reviews.llvm.org/D71153
2019-12-06 16:24:57 -08:00
Sam Clegg b4f4e370b5 [WebAssebmly][MC] Support .import_name/.import_field asm directives
Convert the MC test to use asm rather than bitcode.

This is a precursor to https://reviews.llvm.org/D70520.

Differential Revision: https://reviews.llvm.org/D70877
2019-12-06 15:09:56 -08:00
Amara Emerson 84fdd9d7a5 [X86] Fix prolog/epilog mismatch for stack protectors on win32-macho.
The xor'ing behaviour is only used for msvc/crt environments, when we're targeting
macho the guard load code doesn't know about the xor in the epilog. Disable xor'ing
when targeting win32-macho to be consistent.

Differential Revision: https://reviews.llvm.org/D71095
2019-12-06 14:44:56 -08:00
Craig Topper 28b573d249 [TargetLowering] Fix another potential FPE in expandFP_TO_UINT
D53794 introduced code to perform the FP_TO_UINT expansion via FP_TO_SINT in a way that would never expose floating-point exceptions in the intermediate steps. Unfortunately, I just noticed there is still a way this can happen. As discussed in D53794, the compiler now generates this sequence:

// Sel = Src < 0x8000000000000000
// Val = select Sel, Src, Src - 0x8000000000000000
// Ofs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val) ^ Ofs
The problem is with the Src - 0x8000000000000000 expression. As I mentioned in the original review, that expression can never overflow or underflow if the original value is in range for FP_TO_UINT. But I missed that we can get an Inexact exception in the case where Src is a very small positive value. (In this case the result of the sub is ignored, but that doesn't help.)

Instead, I'd suggest to use the following sequence:

// Sel = Src < 0x8000000000000000
// FltOfs = select Sel, 0, 0x8000000000000000
// IntOfs = select Sel, 0, 0x8000000000000000
// Result = fp_to_sint(Val - FltOfs) ^ IntOfs
In the case where the value is already in range of FP_TO_SINT, we now simply compute Val - 0, which now definitely cannot trap (unless Val is a NaN in which case we'd want to trap anyway).

In the case where the value is not in range of FP_TO_SINT, but still in range of FP_TO_UINT, the sub can never be inexact, as Val is between 2^(n-1) and (2^n)-1, i.e. always has the 2^(n-1) bit set, and the sub is always simply clearing that bit.

There is a slight complication in the case where Val is a constant, so we know at compile time whether Sel is true or false. In that scenario, the old code would automatically optimize the sub away, while this no longer happens with the new code. Instead, I've added extra code to check for this case and then just fall back to FP_TO_SINT directly. (This seems to catch even slightly more cases.)

Original version of the patch by Ulrich Weigand. X86 changes added by Craig Topper

Differential Revision: https://reviews.llvm.org/D67105
2019-12-06 14:11:04 -08:00
Reid Kleckner c089f02898 [X86] Don't setup and teardown memory for a musttail call
Summary:
musttail calls should not require allocating extra stack for arguments.
Updates to arguments passed in memory should happen in place before the
epilogue.

This bug was mostly a missed optimization, unless inalloca was used and
store to push conversion fired.

If a reserved call frame was used for an inalloca musttail call, the
call setup and teardown instructions would be deleted, and SP
adjustments would be inserted in the prologue and epilogue. You can see
these are removed from several test cases in this change.

In the case where the stack frame was not reserved, i.e. call frame
optimization fires and turns argument stores into pushes, then the
imbalanced call frame setup instructions created for inalloca calls
become a problem. They remain in the instruction stream, resulting in a
call setup that allocates zero bytes (expected for inalloca), and a call
teardown that deallocates the inalloca pack. This deallocation was
unbalanced, leading to subsequent crashes.

Reviewers: hans

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D71097
2019-12-06 12:58:54 -08:00
Hiroshi Yamauchi 2eb30fafa5 Revert "[PGO][PGSO] Instrument the code gen / target passes."
This reverts commit 9a0b5e1407.

This seems to break buildbots.
2019-12-06 12:17:32 -08:00