Commit Graph

56885 Commits

Author SHA1 Message Date
Shengchen Kan d0efd7bfcf [X86][MC] Disable Prefix padding after hardcode/prefix
Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight, eli.friedman

Reviewed By: craig.topper

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76475
2020-04-01 09:49:52 +08:00
Matt Arsenault 43e576593e AMDGPU/GlobalISel: Fix insert point when lowering G_FMAD 2020-03-31 19:57:06 -04:00
Fangrui Song 4af7560b37 [PPCInstPrinter] Print conditional branches as `bt 2, $target` instead of `bt 2, .+$imm`
Follow-up of D76591.

Reviewed By: #powerpc, sfertile

Differential Revision: https://reviews.llvm.org/D76907
2020-03-31 15:05:38 -07:00
Daniel Frampton 494abe139a [AArch64] Change AArch64 Windows EH UnwindHelp object to be a fixed object
The UnwindHelp object is used during exception handling by runtime
code. It must be findable from a fixed offset from FP.

This change allocates the UnwindHelp object as a fixed object (as is
done for x86_64) to ensure that both the generated code and runtime
agree on the location of the object.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45346

Differential Revision: https://reviews.llvm.org/D77016
2020-03-31 14:21:21 -07:00
Daniel Frampton 522b4c4b88 [AArch64] Fix mismatch in prologue and epilogue for funclets on Windows
The generated code for a funclet can have an add to sp in the epilogue
for which there is no corresponding sub in the prologue.

This patch removes the early return from emitPrologue that was
preventing the sub to sp, and instead conditionalizes the appropriate
parts of the rest of the function.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45345

Differential Revision: https://reviews.llvm.org/D77015
2020-03-31 14:21:18 -07:00
Eli Friedman 1ee6ec2bf3 Remove "mask" operand from shufflevector.
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

Differential Revision: https://reviews.llvm.org/D72467
2020-03-31 13:08:59 -07:00
Eli Friedman dacf8d3562 [AArch64][SVE] Add support for fcmp.
This also requires support for boolean "not", so I added boolean logic
while I was there.

Differential Revision: https://reviews.llvm.org/D76901
2020-03-31 12:04:39 -07:00
Stanislav Mekhanoshin 08682dcc86 [AMDGPU] Define 16 bit VGPR subregs
We have loads preserving low and high 16 bits of their
destinations. However, we always use a whole 32 bit register
for these. The same happens with 16 bit stores, we have to
use full 32 bit register so if high bits are clobbered the
register needs to be copied. One example of such code is
added to the load-hi16.ll.

The proper solution to the problem is to define 16 bit subregs
and use them in the operations which do not read another half
of a VGPR or preserve it if the VGPR is written.

This patch simply defines subregisters and register classes.
At the moment there should be no difference in code generation.
A lot more work is needed to actually use these new register
classes. Therefore, there are no new tests at this time.

Register weight calculation has changed with new subregs so
appropriate changes were made to keep all calculations just
as they are now, especially calculations of register pressure.

Differential Revision: https://reviews.llvm.org/D74873
2020-03-31 11:49:06 -07:00
Ulrich Weigand c726c920e0 [SystemZ] Allow %r0 in address context for AsmParser
Registers used in any address (as well as in a few other contexts)
have special semantics when a "zero" register is used, which is
why the back-end defines extra register classes ADDR32, ADDR64 etc
to be used to prevent the register allocator from using %r0 there.

However, when writing assembler code "by hand", you sometimes need
to trigger that special semantics.  However, currently the AsmParser
will reject %r0 in those places.  In some cases it may be possible
to write that instruction differently - but in others it is currently
not possible at all.

This check in AsmParser simply seems overly strict, so this patch
just removes the check completely.  This brings the behaviour of
AsmParser in line with the GNU assembler as well.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=45092
2020-03-31 19:48:50 +02:00
Guillaume Chatelet 998118c3d3 [Alignment][NFC] Deprecate MachineMemOperand::getMachineMemOperand version that takes an untyped alignement.
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, jfb, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77138
2020-03-31 16:05:31 +00:00
Jonas Paulsson 665bebb46f [SystemZ] Add isCommutable flag on VFA and VFM.
NFC

Review: Ulrich Weigand
2020-03-31 17:17:52 +02:00
Jonas Paulsson f481d48893 [SystemZ] Improve foldMemoryOperandImpl().
Fold MS(G)RKC -> MS(G)C.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76771
2020-03-31 17:17:51 +02:00
Simon Pilgrim 7e0e5fa499 Revert rGefe59d6717dcdf7777acb9b7a734e1a520bdf22a "[X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations"
This might be causing an issue on the fuchsia-x86_64-linux buildbot - reverting to see what happens.
2020-03-31 15:47:30 +01:00
Simon Pilgrim 38619fa7da Fix enumeral mismatch warning. NFCI.
Don't mix llvm::ISD and llvm::X86ISD.
2020-03-31 15:38:02 +01:00
Simon Pilgrim efe59d6717 [X86][SSE] lowerShuffleWithPACK - extend to use chained PACKs for larger truncations
If canLowerByDroppingEvenElements indicates that the shuffle is a N:1 compaction pattern and the inputs are suitably sign/zero extended then we can use a chain of PACKSS/PACKUS to compact.

This helps avoid PSHUFB (and its mask load) for short shuffle chains, shuffle combining will still replace with a PSHUFB if we have enough shuffles as getFauxShuffleMask can recognise PACKSS/PACKUS chains.
2020-03-31 14:48:48 +01:00
Guillaume Chatelet b9810988b2 [Alignment][NFC] Transitionning more getMachineMemOperand call sites
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77127
2020-03-31 11:04:10 +00:00
Simon Pilgrim 98357dee1c [X86] Combine concat(palignr,palignr) -> palignr(concat,concat)
combineX86ShufflesRecursively should handle this someday
2020-03-31 11:06:35 +01:00
Simon Pilgrim 7a4a98a9c4 [X86] Move canLowerByDroppingEvenElements earlier to be with matchShuffleWithPACK. NFCI.
Make sure its defined earlier so more shuffle lowering methods can use it.
2020-03-31 10:56:35 +01:00
David Green 2c5f43f9dd [ARM] Fix qdadd operand order
qdadd is defined as sat(Rm + sat(2*Rn)). We had the Rm and Rn switched
the wrong way around.

Differential Revision: https://reviews.llvm.org/D77049
2020-03-31 10:11:36 +01:00
Guillaume Chatelet c9d5c19597 [Alignment][NFC] Transitionning more getMachineMemOperand call sites
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, jrtc27, atanasyan, Jim, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77121
2020-03-31 08:36:18 +00:00
Sebastian Neubauer 5d3a69feca [AMDGPU] New llvm.amdgcn.ballot intrinsic
Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

Differential Revision: https://reviews.llvm.org/D65088
2020-03-31 10:35:39 +02:00
Kai Wang 581ba35291 [RISCV] ELF attribute section for RISC-V.
Leverage ARM ELF build attribute section to create ELF attribute section
for RISC-V. Extract the common part of parsing logic for this section
into ELFAttributeParser.[cpp|h] and ELFAttributes.[cpp|h].

Differential Revision: https://reviews.llvm.org/D74023
2020-03-31 16:16:19 +08:00
Shengchen Kan 86b4076027 [NFC] Remove unuseful infrastructure 2020-03-31 16:14:08 +08:00
Guillaume Chatelet 0de874adfb [Alignment][NFC] Transition to inferAlignFromPtrInfo
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, jvesely, nhaehnle, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77120
2020-03-31 08:06:49 +00:00
Guillaume Chatelet 80ef5c5640 Remove unused variable 2020-03-31 07:29:49 +00:00
Djordje Todorovic bcbd60aeb5 [Mips] Make MipsBranchExpansion aware of BBIT family of branch
Octeon branches (bbit0/bbit032/bbit1/bbit132) have an immediate operand,
so it is legal to have such replacement within
MipsBranchExpansion::replaceBranch().

According to the specification, a branch (e.g. bbit0 ) looks like:

bbit0  rs p offset  // p is an immediate operand
  if !rs<p> then branch

Without this patch, an assertion triggers in the method,
and the problem has been found in the real example.

Differential Revision: https://reviews.llvm.org/D76842
2020-03-31 09:20:51 +02:00
Dylan McKay 7b808b105f [AVR] Generalize the previous interrupt bugfix to signal handlers too 2020-03-31 19:33:34 +13:00
Dylan McKay 339b34266c [AVR] Respect the 'interrupt' function attribute
In the past, AVR functions were only lowered with interrupt-specific
machine code if the function was defined with the "avr-interrupt" or
"avr-signal" calling conventions.

This patch modifies the backend so that if the function does not have a
special calling convention, but does have an "interrupt" attribute,
that function is interpreted as a function with interrupts.

This also extracts the "is this function an interrupt" logic from
several disparate places in the backend into one AVRMachineFunctionInfo
attribute.

Bug found by Wilhelm Meier.
2020-03-31 19:00:18 +13:00
QingShan Zhang 4eeb56d088 [PowerPC] Don't do the folding if the operand is R0/X0
We have this transformation in PowerPC peephole:

Replace instruction:
  renamable $x28 = ADDI8 renamable $x7, -8
  renamable $x28 = ADD8 killed renamable $x28, renamable $x0
  STFD killed renamable $f0, -8, killed renamable $x28 :: (store 8 into %ir._ind_cast99.epil)
with:
  renamable $x28 = ADDI8 renamable $x7, -16
  STFDX killed renamable $f0, $x0, killed $x28 :: (store 8 into %ir._ind_cast99.epil)

It is invalid as the '$x0' in STFDX is constant 0, not register r0.

Reviewed By: Nemanjai

Differential Revision: https://reviews.llvm.org/D77034
2020-03-31 02:50:19 +00:00
Matt Arsenault b8fc192d42 Revert "[GISel]: Fix incorrect IRTranslation while translating null pointer types"
This reverts commit b3297ef051.

This change is incorrect. The current semantic of null in the IR is a
pointer with the bitvalue 0. It is not a cast from an integer 0, so
this should preserve the pointer type.
2020-03-30 19:30:42 -04:00
Matt Arsenault d0dd24a381 AMDGPU/GlobalISel: Fix crashing on weird G_INSERT sources
No test since these cases shouldn't really be getting through the
legalizer.
2020-03-30 18:14:04 -04:00
Matt Arsenault db9f0d1ce5 AMDGPU: Form v_cvt_ubyte* with f16 results
We get 2 conversion instructions anyway. Previously we would get a
conversion with SDWA reading from a byte source, which has a larger
encoding.
2020-03-30 17:59:49 -04:00
Matt Arsenault b27d255e1e AMDGPU/GlobalISel: Form CVT_F32_UBYTE0 2020-03-30 17:45:55 -04:00
Matt Arsenault bcb643c8af AMDGPU/GlobalISel: Handle image atomics 2020-03-30 17:41:04 -04:00
Matt Arsenault 48eda37282 AMDGPU/GlobalISel: Start selecting image intrinsics
Does not handled atomics yet.
2020-03-30 17:33:04 -04:00
Matt Arsenault 570a578e46 AMDGPU: Account for dmask when computing image mem size
Only the number of elements in the dmask will really be accessed.
2020-03-30 17:30:58 -04:00
Jay Foad cee65d51fe AMDGPU: Implement getMemcpyLoopLoweringType
Summary: Based on a patch by Matt Arsenault.

Reviewers: rampitec, kerbowa, nhaehnle, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77057
2020-03-30 22:21:01 +01:00
Matt Arsenault 2641ba52a9 AMDGPU/GlobalISel: Round up image operations with 5, 6 or 7 addresses
The instruction definitions are missing for these register types, so
round up to 8 like the DAG.
2020-03-30 17:02:47 -04:00
Matt Arsenault 42d5609809 AMDGPU/GlobalISel: Start handling _L to _LZ optimization
We currently don't have a way to map to the equivalent intrinsic
opcode, so track immediate 0s in place of the address for the
selection to know to change the final opcode.
2020-03-30 17:02:30 -04:00
Matt Arsenault 4919f2e1c5 AMDGPU/GlobalISel: Basic legalize rules for G_FSHR
Only handles easy 32-bit cases.
2020-03-30 11:53:01 -07:00
Jakub Kuderski 77ce2e21a8 [AMDGPU] Add Relocation Constant Support
Summary:
This change adds amdgcn.reloc.constant intrinsic to the amdgpu backend, which will compile into a relocation entry in the resulting elf.

The intrinsics takes a MetadataNode (String) as its only argument, which specifies the symbol name of the relocation entry.

`SelectionDAGBuilder::getValueImpl` is changed to allow metadata operands passed through to ISel.

Author: csyonghe <yonghe@google.com>

Reviewers: tpr, nhaehnle

Reviewed By: nhaehnle

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76440
2020-03-30 13:49:20 -04:00
Sameer Sahasrabuddhe 3cbbded68c Introduce unify-loop-exits pass.
For each natural loop with multiple exit blocks, this pass creates a
new block N such that all exiting blocks now branch to N, and then
control flow is redistributed to all the original exit blocks.

The bulk of the tranformation is a new function introduced in
BasicBlockUtils that an redirect control flow from a set of incoming
blocks to a set of outgoing blocks via a common "hub".

This is a useful workaround for a limitation in the structurizer which
incorrectly orders blocks when processing a nest of loops. This pass
bypasses that issue by ensuring that each natural loop is recognized
as a separate region. Since the structurizer is a region pass, it no
longer sees a nest of loops in a single region, and instead processes
each "level" in the nesting as a separate region.

The AMDGPU backend provides a new option to enable this pass before
the structurizer, which may eventually be enabled by default.

Reviewers: madhur13490, arsenm, nhaehnle

Reviewed By: nhaehnle

Differential Revision: https://reviews.llvm.org/D75865
2020-03-30 13:23:56 -04:00
Yuanfang Chen ece79f4708 [X86] make sure POP has implicit def/use of stack pointer when materializing 8-bit immediates for minsize
Summary:
Otherwise PostRA list scheduler may reorder instruction, such as

schedule this
'''
pushq  $0x8
pop    %rbx
lea    0x2a0(%rsp),%r15
'''
to
'''
pushq  $0x8
lea    0x2a0(%rsp),%r15
pop    %rbx
'''
by mistake. The patch is to prevent this to happen by making sure POP has
implicit use of SP.

Reviewers: craig.topper

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77031
2020-03-30 09:25:31 -07:00
Guillaume Chatelet bdf77209b9 [Alignment][NFC] Use Align version of getMachineMemOperand
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: jyknight, sdardis, nemanjai, hiraditya, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, jfb, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77059
2020-03-30 15:46:27 +00:00
Matt Arsenault bb009498c2 AMDGPU/GlobalISel: Hack to fix i24 argument lowering
I still think the call lowering type legalization logic split between
the generic code and target is too confusing, but largely induced by
the reliance on the DAG infrastructure.
2020-03-30 11:00:45 -04:00
Matt Arsenault 90a36bbd7c AMDGPU/GlobalISel: Legalize 64-bit G_UDIV/G_UREM
Mostly ported from the DAG version. This results in much worse code
than the DAG version, largely due to a much worse expansion for
G_UMULH.
2020-03-30 10:57:37 -04:00
Simon Pilgrim e95d04f4f1 [X86][AVX] lowerV4X128Shuffle - attempt to widen to 2x256 to simplify shuffles
If we are lowering to X86ISD::SHUF128 we are going to lose track of individual 128-bit lanes that are UNDEF, so if we can widen these to guarantee that they are sequential with their neighbour we should. This helps with later shuffle combines.
2020-03-30 12:22:26 +01:00
Florian Hahn c3b03f3d0c [AMDGPU] Drop const for value that is copied (NFC).
This fixes

    warning: loop variable 'Def' of type 'const llvm::Register' creates a copy from type 'const llvm::Register' [-Wrange-loop-analysis]

llvm::Register just contains a single unsigned and should be copied.

Reviewers: rampitec

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D77011
2020-03-30 10:59:59 +01:00
Sam Parker 94b195ff12 [ARM][LowOverheadLoops] Add horizontal reduction support
Add a bit more logic into the 'FalseLaneZeros' tracking to enable
horizontal reductions and also make the VADDV variants
validForTailPredication.

Differential Revision: https://reviews.llvm.org/D76708
2020-03-30 09:55:41 +01:00
Guillaume Chatelet b91535f6c7 [Alignment][NFC] Return Align for SelectionDAGNodes::getOriginalAlignment/getAlignment
Summary:
Also deprecate getOriginalAlignment, getAlignment will take much more time as it is pervasive through the codebase (including TableGened files).

This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76933
2020-03-30 07:26:48 +00:00
David Green c9eaed5149 [ARM] MVE VMOV.i64
In the original batch of MVE VMOVimm code generation VMOV.i64 was left
out due to the way it was done downstream. It turns out that it's fairly
simple though. This adds the codegen for it, similar to NEON.

Bigendian is technically incorrect in this version, which John is fixing
in a Neon patch.
2020-03-30 07:44:23 +01:00
Simon Pilgrim 9c8ec99c80 [X86][AVX] Combine 128/256-bit lane shuffles with zeroable upper subvectors to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128/SHUF128, and we can use the implicit zeroing of the uppers.
2020-03-29 19:51:38 +01:00
Simon Pilgrim 8206c50cde [X86] Add isAnyZero shuffle mask helper 2020-03-29 19:51:37 +01:00
Matt Arsenault d15723ef06 AMDGPU/GlobalISel: Remove redundant virtual 2020-03-29 14:03:07 -04:00
Matt Arsenault ab7a41069e AMDGPU: Fix using wrong instruction for FP conversion
This was was never actually hit, but FTRUNC was clearly not the intent
here.
2020-03-29 14:03:07 -04:00
Matt Arsenault 97bbe7ad2a AMDGPU: Fix typo 2020-03-29 14:03:06 -04:00
Simon Pilgrim 7734e4b3a3 [X86][AVX] Combine 128-bit lane shuffles with a zeroable upper half to EXTRACT_SUBVECTOR (PR40720)
As explained on PR40720, EXTRACTF128 is always as good/better than VPERM2F128, and we can use the implicit zeroing of the upper half.

I've added some extra tests to vector-shuffle-combining-avx2.ll to make sure we don't lose coverage.
2020-03-29 16:41:59 +01:00
Simon Pilgrim da4c7db793 [X86] Rename matchShuffleAsByteRotate to matchShuffleAsElementRotate. NFC.
This was an inner helper function for the real matchShuffleAsByteRotate function, but it is more generic and is used directly for VALIGN lowering which doesn't work at the byte level.
2020-03-29 16:41:58 +01:00
Simon Pilgrim 10439f9e32 [X86][AVX] Add X86ISD::VALIGN target shuffle decode support
Allows us to combine VALIGN instructions with other shuffles - the combiner doesn't create VALIGN yet though.
2020-03-29 16:41:58 +01:00
Simon Pilgrim a7115d51be [X86] X86CallFrameOptimization - generalize slow push code path
Replace the explicit isAtom() || isSLM() test with the more general (and more specific) slowTwoMemOps() check to avoid the use of the PUSHrmm push from memory case.

This is actually very tricky to test in anything but quite complex code, but the atomic-idempotent.ll tests seem to be the most straightforward to use.

Differential Revision: https://reviews.llvm.org/D76239
2020-03-29 11:01:59 +01:00
Fangrui Song fc93787d7e [MC][PowerPC] Make .reloc support arbitrary relocation types
Generalizes ad7199f3e6 (R_PPC_NONE/R_PPC64_NONE).
2020-03-28 17:04:31 -07:00
Matt Arsenault 9564f46766 AMDGPU: Make use of default operands 2020-03-28 17:33:29 -04:00
Benjamin Kramer 2d24d74b85 [AMDGPU] Stabilize sort order
Found by the expensive checks in llvm::sort.
2020-03-28 20:20:14 +01:00
Yonghong Song ced0d1f42b [BPF] support 128bit int explicitly in layout spec
Currently, bpf does not specify 128bit alignment in its
layout spec. So for a structure like
  struct ipv6_key_t {
    unsigned pid;
    unsigned __int128 saddr;
    unsigned short lport;
  };
clang will generate IR type
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }
Additional padding is to ensure later IR->MIR can generate correct
stack layout with target layout spec.

But it is common practice for a tracing program to be
first compiled with target flag (e.g., x86_64 or aarch64) through
clang to generate IR and then go through llc to generate bpf
byte code. Tracing program often refers to kernel internal
data structures which needs to be compiled with non-bpf target.

But such a compilation model may cause a problem on aarch64.
The bcc issue https://github.com/iovisor/bcc/issues/2827
reported such a problem.

For the above structure, since aarch64 has "i128:128" in its
layout string, the generated IR will have
  %struct.ipv6_key_t = type { i32, i128, i16 }

Since bpf does not have "i128:128" in its spec string,
the selectionDAG assumes alignment 8 for i128 and
computes the stack storage size for the above is 32 bytes,
which leads incorrect code later.

The x86_64 does not have this issue as it does not have
"i128:128" in its layout spec as it does permits i128 to
be alignmented at 8 bytes at stack. Its IR type looks like
  %struct.ipv6_key_t = type { i32, [12 x i8], i128, i16, [14 x i8] }

The fix here is add i128 support in layout spec, the same as
aarch64. The only downside is we may have less optimal stack
allocation in certain cases since we require 16byte alignment
for i128 instead of 8. But this is probably fine as i128 is
not used widely and in most cases users should already
have proper alignment.

Differential Revision: https://reviews.llvm.org/D76587
2020-03-28 11:46:29 -07:00
Benjamin Kramer 4065e92195 Upgrade some instances of std::sort to llvm::sort. NFC. 2020-03-28 19:23:29 +01:00
Michael Liao d2dd0fac48 Fix `-Wsign-compare` warning. NFC. 2020-03-28 10:20:27 -04:00
Kamlesh Kumar aabc24acf0 [RISCV] Support llvm.thread.pointer
Fixes https://bugs.llvm.org/show_bug.cgi?id=45303 (clang crashed on __builtin_thread_pointer)

Reviewed By: lenary, MaskRay, luismarques

Differential Revision: https://reviews.llvm.org/D76828
2020-03-27 17:30:12 -07:00
Francesco Petrogalli c66d1f38f6 [llvm][Support] Add isZero method for TypeSize. [NFC]
Summary:
The method is used where TypeSize is implicitly cast to integer for
being checked against 0.

Reviewers: sdesmalen, efriedma

Reviewed By: sdesmalen, efriedma

Subscribers: efriedma, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76748
2020-03-27 21:03:44 +00:00
Matt Arsenault a8cc9047de CodeGen: Add -denormal-fp-math-f32 flag
Make the set of FP related attributes and command flags closer.
2020-03-27 14:00:39 -07:00
Jay Foad a6dfd827e5 [AMDGPU] Fix getEUsPerCU for gfx10 in CU mode
Summary:
"Per CU" is a bit simplistic for gfx10, but I couldn't think of a better
name.

Reviewers: arsenm, rampitec, nhaehnle, dstuttard, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76861
2020-03-27 20:36:49 +00:00
Fangrui Song 152d14da64 [MC][X86] Make .reloc support arbitrary relocation types
Generalizes D62014 (R_386_NONE/R_X86_64_NONE).

Unlike ARM (D76746) and AArch64 (D76754), we cannot delete FK_NONE from
getFixupKindSize because FK_NONE is still used by R_386_TLS_DESC_CALL/R_X86_64_TLSDESC_CALL.
2020-03-27 13:33:15 -07:00
diggerlin 9c20f09985 [AIX] Address comment https://reviews.llvm.org/D76162#inline-701237
SUMMARY:

Address clang format issue:

"clang format this block, I don't think the spaces are aligned correctly."

Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D76162
2020-03-27 16:21:53 -04:00
Matt Arsenault 348735b723 AMDGPU: Stop setting attributes based on TargetOptions
Having arbitrary passes looking at the TargetOptions is pretty
messy. This was also disregarding if a function already had an
explicit attribute setting on it. opt/llc now add the attributes to
functions that don't specify the attribute. clang and lld do not call
the function to do this, which they maybe should.

This was also treating unsafe-fp-math as implying the others, and
setting the other attributes based on it. This is not done anywhere
else, and I'm not sure is correct based on the current description of
the option bit.

Effectively reverts 1d8cf2be89
2020-03-27 13:13:43 -07:00
Matt Arsenault 0ab5b5b858 Fix denormal-fp-math flag and attribute interaction
Make these behave the same way unsafe-fp-math and co. The command line
flag should add the attribute to functions that do not already have
it, and leave existing attributes. The attribute is the actual
implementation, but the flag is useful in some testing situations.

AMDGPU has a variety of tests with denormals enabled/disabled that
would require a painful level of test duplication without a flag. This
doesn't expose setting the separate input/output modes, or add a flag
for the f32 version yet.

Tests will be included in future patch.
2020-03-27 12:48:58 -07:00
Fangrui Song 34d77516b8 [MC][AArch64] Make .reloc support arbitrary relocation types
Depends on D76746. Generalizes D61973.

Differential Revision: https://reviews.llvm.org/D76754
2020-03-27 12:30:52 -07:00
Fangrui Song c389526171 [MC][ARM] Make .reloc support arbitrary relocation types
Generalizes D61992. In GNU as, the .reloc directive supports arbitrary relocation types.

A MCFixupKind value `V` larger than or equal to FirstLiteralRelocationKind
is used to represent the relocation type whose number is V-FirstLiteralRelocationKind.

This is useful for linker tests. Without the feature the assembler
cannot produce certain relocation records (e.g.  R_ARM_ALU_PC_G0/R_ARM_LDR_PC_G0)
This helps move forward D75349 and D76575.

Differential Revision: https://reviews.llvm.org/D76746
2020-03-27 12:29:49 -07:00
Craig Topper cdd1cd7120 [X86] Don't form masked instructions if the operation has an additional user.
This will cause the operation to be repeated in both a mask and another masked
or unmasked form. This can a wasted of execution resources.

Differential Revision: https://reviews.llvm.org/D60940
2020-03-27 10:44:22 -07:00
Simon Pilgrim 950ea61653 [X86] Remove orphan LowerSTRICT_FSETCC declaration. NFCI.
LowerSETCC handles strict cases as well, we don't have a separate function.
2020-03-27 17:03:19 +00:00
Guillaume Chatelet 74eac9031a [Alignment][NFC] MachineMemOperand::getAlign/getBaseAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: arsenm, dschuff, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, jrtc27, atanasyan, jfb, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76925
2020-03-27 15:49:13 +00:00
Sam Parker d7084fa34a [ARM][LowOverheadLoops] DoubleWidthResult instructions canGenerateZeros
Given that some instructions generate wider result elements than
their inputs, flag them as being able to generate non zeros in the
false lanes.

Differential Revision: https://reviews.llvm.org/D76766
2020-03-27 15:26:13 +00:00
Sam Parker 0e6aa08381 [ARM][MVE] Add DoubleWidthResult flag
Add a flag for those instructions which read from the top/bottom
halves of their inputs and produce a vector of results with double
width elements.

Differential Revision: https://reviews.llvm.org/D76762
2020-03-27 13:44:04 +00:00
Guillaume Chatelet e2ef6127d9 [Alignment] Fix overaligning bug
Summary:
This was discovered while converting to Align type.

See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76914
2020-03-27 12:57:50 +00:00
Simon Pilgrim d6ddabd7ef Revert rG6ff1ea3244c543ad24fc99c7f4979db2f2078593 "Fix "use of uninitialized variable" static analyzer warning. NFCI."
@dblaikie noticed that this may interfere with msan analysis
2020-03-27 11:44:03 +00:00
Jonas Paulsson 35173dddd1 [SystemZ] Fix typos in comments. 2020-03-27 12:31:48 +01:00
David Green 8689f98e9b [ARM] Fix MVE VCMPr f16 pattern
This patterns seemed to be using the f32 instruction, not f16. Fix it to
use the correct one.

Differential Revision: https://reviews.llvm.org/D76841
2020-03-27 11:18:24 +00:00
Fangrui Song 6728a9ae19 [MCInstPrinter] Add parameter `Address` to printCustomAliasOperand. NFC
Follow-up of D72172 and llvmorg-11-init-6896-gb3cc5dcef0f.
2020-03-27 00:38:20 -07:00
Fangrui Song b3cc5dcef0 [MCInstPrinter] Add parameter `Address` to MCInstPrinter::printAliasInstr. NFC
Follow-up of D72172.
2020-03-27 00:03:32 -07:00
Shengchen Kan 1fb4f99a21 [X86][MC] Fix the bug for prefix padding support
Summary:
There is a tiny logic error of D75300, making branch is not
correctly aligned with option -x86-pad-max-prefix-size

Reviewers: reames, MaskRay, craig.topper, LuoYuanke, jyknight

Reviewed By: reames

Subscribers: hiraditya, llvm-commits, annita.zhang

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76285
2020-03-27 14:16:09 +08:00
Dan Gohman d865437d9c [WebAssembly] Fix the order of destructors in the LowerGlobalDtors pass.
Fix the LowerGlobalDtors pass to run destructors in the same order as the
regular LLVM destructor lowering -- in reverse order. Adjacent
destructors with the same associated object are grouped, but destructors
are not reordered based on associated objects.

Differential Revision: https://reviews.llvm.org/D70685
2020-03-26 16:19:02 -07:00
Stanislav Mekhanoshin 4c4b71843b [AMDGPU] Propagate amdgpu-waves-per-eu to callees
Differential Revision: https://reviews.llvm.org/D76868
2020-03-26 14:43:44 -07:00
Craig Topper 9f7d4150b9 [X86] Move combineLoopMAddPattern and combineLoopSADPattern to an IR pass before SelecitonDAG.
These transforms rely on a vector reduction flag on the SDNode
set by SelectionDAGBuilder. This flag exists because SelectionDAG
can't see across basic blocks so SelectionDAGBuilder is looking
across and saving the info. X86 is the only target that uses this
flag currently. By removing the X86 code we can remove the flag
and the SelectionDAGBuilder code.

This pass adds a dedicated IR pass for X86 that looks across the
blocks and transforms the IR into a form that the X86 SelectionDAG
can finish.

An advantage of this new approach is that we can enhance it to
shrink the phi nodes and final reduction tree based on the zeroes
that we need to concatenate to bring the partially reduced
reduction back up to the original width.

Differential Revision: https://reviews.llvm.org/D76649
2020-03-26 14:10:20 -07:00
Simon Pilgrim ad36491ebb [X86] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on all targets
Extends rG9d1721ce3926 to support AVX2+ targets.
2020-03-26 20:46:24 +00:00
Jay Foad 0fe096c4e9 [AMDGPU] Rename overloaded getMaxWavesPerEU to getWavesPerEUForWorkGroup
Summary: I think Max in the name was misleading. NFC.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76860
2020-03-26 20:21:04 +00:00
Jay Foad bb9c4fd7ea [AMDGPU] Remove getMaxWavesPerCU in favour of getWavesPerWorkGroup.
Summary:
These methods were identical. I chose to remove getMaxWavesPerCU because
I think Max in the name was misleading. NFC.

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76859
2020-03-26 20:21:04 +00:00
Derek Schuff e110897e28 [WEbAssembly] Clear frame base vreg in explicit-locals when stack pointer is dead
Having an alloca in a function causes the stack pointer to be generated in the
prolog, but if it's unused other than for debug info, explicit-locals will drop
it and not allocate a local. In this case we need to reset the FrameBaseVreg.

Differential Revision: https://reviews.llvm.org/D76784
2020-03-26 13:07:32 -07:00
Simon Pilgrim 39a52a19ed [X86] lowerV16I8Shuffle - create v8i16 mask for PACKUS(AND(),AND()) patterns.
We can improve computeKnownBits results by avoiding excess bitcasts.

For this pattern we were doing:

  (v16i8 PACKUS(v8i16 BITCAST(v16i8 AND(V1, MASK)), v8i16 BITCAST(v16i8 AND(V2, MASK))))

By performing the MASK/AND with a v8i16 type and bitcasting V1/V2 directly we can help computeKnownBits see that the mask is clearing the upper bits and allows shuffle combining to peek through later on.

This will be necessary to extend rG9d1721ce3926 to AVX2+ targets in a future patch.
2020-03-26 19:59:57 +00:00
diggerlin fdfe411e7c [AIX] discard the label in the csect of function description and use qualname for linkage
SUMMARY:

SUMMARY
for a source file  "test.c"

void foo() {};

llc will generate assembly code as (assembly patch)
     .globl  foo
     .globl  .foo
     .csect foo[DS]
foo:

        .long   .foo
        .long   TOC[TC0]
        .long   0

   and symbol table as (xcoff object file)
   [4]     m   0x00000004     .data     1  unamex                    foo
   [5]     a4  0x0000000c       0    0     SD       DS    0    0
   [6]     m   0x00000004     .data     1  extern                    foo
   [7]     a4  0x00000004       0    0     LD       DS    0    0

   After first patch, the assembly will be as

        .globl  foo[DS]                 # -- Begin function foo
        .globl  .foo
        .align  2
        .csect foo[DS]
        .long   .foo
        .long   TOC[TC0]
        .long   0

    and symbol table will as
   [6]     m   0x00000004     .data     1  extern                    foo
   [7]     a4  0x00000004       0    0     DS      DS    0    0
Change the code for the assembly path and xcoff objectfile patch for llc.

Reviewers: Jason Liu
Subscribers: wuzish, nemanjai, hiraditya

Differential Revision: https://reviews.llvm.org/D76162
2020-03-26 15:46:52 -04:00
Scott Linder bd12ecb88f [AMDGPU] Fix PC register mapping in wave32 mode
Summary:
The PC_32 DWARF register is for a 32-bit process address space which we
don't implement in AMDGCN; another way of putting this is that the size
of the PC register is not a function of the wavefront size. If we ever
implement a 32-bit process address space we will need to add two more
DwarfFlavours i.e. we will need to represent the product of (wave32,
wave64) x (64-bit address space, 32-bit address space).

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76732
2020-03-26 14:43:25 -04:00
David Blaikie 9002db05a2 Roll otherwise unused subexpressions into an assertion 2020-03-26 11:32:33 -07:00
Guillaume Chatelet b727aabcb8 [Alignment][NFC] Use llvmTargetFrameLowering::getStackAlign
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Reviewed By: courbet

Subscribers: wuzish, arsenm, jyknight, nemanjai, jvesely, nhaehnle, hiraditya, kbarton, fedor.sergeev, jrtc27, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76613
2020-03-26 18:15:53 +00:00
Jay Foad 0602c20b1b [AMDGPU] Make use of divideCeil. NFC. 2020-03-26 16:11:35 +00:00
Jay Foad 596bed3fd3 [AMDGPU] Remove unused methods. NFC. 2020-03-26 16:11:35 +00:00
Justin Hibbits 459e8e9488 [PowerPC]: Don't allow r0 as a target for LD_GOT_TPREL_L/32
Summary:
The linker is free to relax this (relocation R_PPC_GOT_TPREL16) against
R_PPC_TLS, if it sees fit (initial exec to local exec).  If r0 is used,
this can generate execution-invalid code (converts to 'addi %rX, %r0,
FOO, which translates in PPC-lingo to li %rX, FOO).  Forbid this
instead.

This fixes static binaries using locales on FreeBSD/powerpc
(tested on FreeBSD/powerpcspe).

Reviewed By: nemanjai
Differential Revision: https://reviews.llvm.org/D76662
2020-03-26 10:59:28 -05:00
Simon Pilgrim 9d1721ce39 [X86][SSE] Prefer PACKUS(AND(),AND()) to SHUFFLE(PSHUFB(),PSHUFB()) on pre-AVX2 targets
As discussed on PR31443, we should be trying to use PACKUS for binary truncation patterns to reduce the number of shuffles.

The plan is to support AVX2+ targets once we've worked around PR45315 - we fail to peek through a VBROADCAST_LOAD mask to recognise zero upper bits in a PACKUS pattern.

We should also be able to add support for v8i16 and possibly 256/512-bit vectors as well.
2020-03-26 15:47:43 +00:00
Fangrui Song 3eef47407b [PPCInstPrinter] Change printBranchOperand(calltarget) to print the target address in hexadecimal form
```
// llvm-objdump -d output (before)
0: bl .-4
4: bl .+0
8: bl .+4

// llvm-objdump -d output (after) ; GNU objdump -d
0: bl 0xfffffffc / bl 0xfffffffffffffffc
4: bl 0x4
8: bl 0xc
```

Many Operand's are not annotated as OPERAND_PCREL.
They are not affected (e.g. `b .+67108860`). I plan to fix them in future patches.

Modified test/tools/llvm-objdump/ELF/PowerPC/branch-offset.s to test
address space wraparound for powerpc32 and powerpc64.

Reviewed By: sfertile, jhenderson

Differential Revision: https://reviews.llvm.org/D76591
2020-03-26 08:32:29 -07:00
Fangrui Song 87de9a0786 [X86InstPrinter] Change printPCRelImm to print the target address in hexadecimal form
```
// llvm-objdump -d output (before)
400000: e8 0b 00 00 00   callq 11
400005: e8 0b 00 00 00   callq 11

// llvm-objdump -d output (after)
400000: e8 0b 00 00 00  callq 0x400010
400005: e8 0b 00 00 00  callq 0x400015

// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
400000: e8 0b 00 00 00  callq 400010
400005: e8 0b 00 00 00  callq 400015
```

In llvm-objdump, we pass the address of the next MCInst. Ideally we
should just thread the address of the current address, unfortunately we
cannot call X86MCCodeEmitter::encodeInstruction (X86MCCodeEmitter
requires MCInstrInfo and MCContext) to get the length of the MCInst.

MCInstPrinter::printInst has other callers (e.g llvm-mc -filetype=asm, llvm-mca) which set Address to 0.
They leave MCInstPrinter::PrintBranchImmAsAddress as false and this change is a no-op for them.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D76580
2020-03-26 08:28:59 -07:00
Fangrui Song 5fad05e80d [MCInstPrinter] Pass `Address` parameter to MCOI::OPERAND_PCREL typed operands. NFC
Follow-up of D72172 and D72180

This patch passes `uint64_t Address` to print methods of PC-relative
operands so that subsequent target specific patches can change
`*InstPrinter::print{Operand,PCRelImm,...}` to customize the output.

Add MCInstPrinter::PrintBranchImmAsAddress which is set to true by
llvm-objdump.

```
// Current llvm-objdump -d output
aarch64: 20000: bl #0
ppc:     20000: bl .+4
x86:     20000: callq 0

// Ideal output
aarch64: 20000: bl 0x20000
ppc:     20000: bl 0x20004
x86:     20000: callq 0x20005

// GNU objdump -d. The lack of 0x is not ideal because the result cannot be re-assembled
aarch64: 20000: bl 20000
ppc:     20000: bl 0x20004
x86:     20000: callq 20005
```

In `lib/Target/X86/X86GenAsmWriter1.inc` (generated by `llvm-tblgen -gen-asm-writer`):

```
   case 12:
     // CALL64pcrel32, CALLpcrel16, CALLpcrel32, EH_SjLj_Setup, JCXZ, JECXZ, J...
-    printPCRelImm(MI, 0, O);
+    printPCRelImm(MI, Address, 0, O);
     return;
```

Some targets have 2 `printOperand` overloads, one without `Address` and
one with `Address`. They should annotate derived `Operand` properly with
`let OperandType = "OPERAND_PCREL"`.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D76574
2020-03-26 08:21:15 -07:00
Simon Pilgrim e30d29ebc1 [X86][SSE] getFauxShuffleMask - peek through TRUNCATE/AEXT/ZEXT for INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT())
As long we extract from a source vector with smaller elements and we zero-extend the element in the final shuffle mask then we can safely peek through truncations and any/zero-extensions to find the source extraction.
2020-03-26 11:57:45 +00:00
Jonas Paulsson 8bf9e317e4 [SystemZ] Bugfix in tieOpsIfNeeded()
This function did a check which was broken to see if an opcode requires op0
and op1 to be tied. By chance this is NFC.

Review: Ulrich Weigand
2020-03-26 12:22:14 +01:00
Kang Zhang 4673699a47 [PowerPC] Remove the repeated definition for some InstAlias for mtspr/mfspr
Summary:
Below InstAlias have been redefined, this patch is to remove the repeated
definition.
mtdec/mfdec mtsdr1/mfsdr1 mtsrr0/mfsrr0 mtsrr1/mfsrr1 mtasr

Reviewed By: nemanjai, steven.zhang

Differential Revision: https://reviews.llvm.org/D75821
2020-03-26 09:58:30 +00:00
Cullen Rhodes 9086db707d [AArch64][SVE] Implement structured store intrinsics
Summary:
This patch adds initial support for the following intrinsics:

    * llvm.aarch64.sve.st2
    * llvm.aarch64.sve.st3
    * llvm.aarch64.sve.st4

For storing two, three and four vectors worth of data. Basic codegen for
reg+immediate forms are implemented. Reg+reg addressing modes will be
addressed in a later patch.

These intrinsics are intended for use in the Arm C Language Extension
(ACLE).

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D75947
2020-03-26 09:34:51 +00:00
Ties Stuij 71ae267d1f [PATCH] [ARM] ARMv8.6-a command-line + BFloat16 Asm Support
Summary:
This patch introduces command-line support for the Armv8.6-a architecture and assembly support for BFloat16. Details can be found
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

in addition to the GCC patch for the 8..6-a CLI:
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02647.html

In detail this patch

- march options for armv8.6-a
- BFloat16 assembly

This is part of a patch series, starting with command-line and Bfloat16
assembly support. The subsequent patches will upstream intrinsics
support for BFloat16, followed by Matrix Multiplication and the
remaining Virtualization features of the armv8.6-a architecture.

Based on work by:
- labrinea
- MarkMurrayARM
- Luke Cheeseman
- Javed Asbar
- Mikhail Maltsev
- Luke Geeson

Reviewers: SjoerdMeijer, craig.topper, rjmccall, jfb, LukeGeeson

Reviewed By: SjoerdMeijer

Subscribers: stuij, kristof.beyls, hiraditya, dexonsmith, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D76062
2020-03-26 09:17:20 +00:00
David Green 37b9cc8f29 [ARM] Sink splats to vector float instructions
Some MVE floating point instructions have gpr register variants that take
the scalar gpr value and splat them to all lanes. In order to accept
them in loops, the shuffle_vector and insert need to be sunk down into
the loop, next to the instruction so that ISel can see the whole
pattern.

This does that sinking for FAdd, FSub, FMul and FCmp. The patterns for
mul are slightly more constrained as there are no fms variants taking
register arguments.

Differential Revision: https://reviews.llvm.org/D76023
2020-03-26 09:02:18 +00:00
QingShan Zhang 1ef7bf4121 [PowerPC] Improve the way legalize mul for v8i16 and add pattern to match mul + add
We can legalize the operation MUL for v8i16 with instruction (vmladduhm A, B, 0)
if altivec enabled. Now, it is set as custom and expand it later, which is not
the right way. And then, we can add the pattern to match the mul + add with (vmladduhm A, B, C)

Reviewed By: Nemanjai

Differential Revision: https://reviews.llvm.org/D76751
2020-03-26 04:46:49 +00:00
Stanislav Mekhanoshin e06d707aa2 [AMDGPU] Fixed function traversal in attribute propagation
AMDGPUPropagateAttributes pass was skipping some of the functions
when cloning. Functions were added to root set and then skipped
on the next interation because they are already in the root set,
while were meant to be processed with different features.

Differential Revision: https://reviews.llvm.org/D76815
2020-03-25 18:47:09 -07:00
Stanislav Mekhanoshin 6e00e3fcb0 [AMDGPU] Preserve original symbol during attribute propagation
AMDGPUPropagateAttributes can swap names while cloning a function.
Only do it if original symbol was not externally visible.

Differential Revision: https://reviews.llvm.org/D76789
2020-03-25 15:26:30 -07:00
Simon Pilgrim c6e5531f9b [X86][AVX] Combine shuffles to TRUNCATE/VTRUNC patterns
Add support for combining shuffles to AVX512 truncate instructions - another step toward fixing D56387/D66004. It also fixes SKX code on PR31443.

We could probably extend this further to handle non-VLX truncation cases.
2020-03-25 17:41:51 +00:00
Mikhail Maltsev bb4da94e5b [ARM,CDE] Implement predicated Q-register CDE intrinsics
Summary:
This patch implements the following CDE intrinsics:

  T __arm_vcx1q_m(int coproc, T inactive, uint32_t imm, mve_pred_t p);
  T __arm_vcx2q_m(int coproc, T inactive, U n, uint32_t imm, mve_pred_t p);
  T __arm_vcx3q_m(int coproc, T inactive, U n, V m, uint32_t imm, mve_pred_t p);

  T __arm_vcx1qa_m(int coproc, T acc, uint32_t imm, mve_pred_t p);
  T __arm_vcx2qa_m(int coproc, T acc, U n, uint32_t imm, mve_pred_t p);
  T __arm_vcx3qa_m(int coproc, T acc, U n, V m, uint32_t imm, mve_pred_t p);

The intrinsics are not part of the released ACLE spec, but internally at
Arm we have reached consensus to add them to the next ACLE release.

Reviewers: simon_tatham, MarkMurrayARM, ostannard, dmgreen

Reviewed By: simon_tatham

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76610
2020-03-25 17:08:19 +00:00
Yvan Roux bd069ad39c [ARM] Move ConstantIsland and LowOverheadLoops Passes.
Move ARM ConstantIsland and LowOverheadLopps passes later in the pipeline
such that they will be run after the upcoming Machine Outlining pass.

Differential Revision: https://reviews.llvm.org/D76065
2020-03-25 16:49:21 +01:00
cdevadas ce984129ea [AMDGPU] Add SIPreEmitPeephole pass.
This pass can handle all the optimization
opportunities found just before code emission.
Presently it includes the handling of vcc branch
optimization that was handled earlier in SIInsertSkips.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D76712
2020-03-25 15:35:35 +00:00
Jonas Paulsson f09b891d4a [SystemZ] Improve foldMemoryOperandImpl()
A spilled load of an immediate can use MVHI/MVGHI instead.
A compare of a spilled register against an immediate can use CHSI/CGHSI.
A logical compare can use CLFHSI/CLGHSI.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76055
2020-03-25 16:21:08 +01:00
Sean Fertile 3282d875d6 [PowerPC][AIX] ByVal formal arguments in a single register.
Adds support for passing ByVal formal arguments as long as they fit in a
single register.

Differential Revision: https://reviews.llvm.org/D76401
2020-03-25 11:09:40 -04:00
Kerry McLaughlin 05606329e2 [AArch64][SVE] Add SVE intrinsics for masked loads & stores
Summary:
Implements the following intrinsics for contiguous loads & stores:
  - @llvm.aarch64.sve.ld1
  - @llvm.aarch64.sve.st1

Reviewers: sdesmalen, andwar, efriedma, cameron.mcinally, dancgr, rengolin

Reviewed By: cameron.mcinally

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76688
2020-03-25 11:48:40 +00:00
Sam Parker e87250202d [ARM][MVE] Add HorizontalReduction flag
Add a target flag for instructions that reduce into one, or more,
scalar reg(s), including variants of:
- VADDV
- VABAV
- VMINV/VMAXV
- VMLADAV

Differential Revision: https://reviews.llvm.org/D76683
2020-03-25 11:12:03 +00:00
Kazushi (Jam) Marukawa 28a42dd1b9 [VE] Change name of enum to CondCode
Summary: Change enum name for condition codes from CondCodes to CondCode.

Reviewers: arsenm, simoll, k-ishizaka

Reviewed By: arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm, #ve

Differential Revision: https://reviews.llvm.org/D76747
2020-03-25 09:20:05 +01:00
Amara Emerson 472d282046 [AArch64][GlobalISel] Don't localize TLS G_GLOBAL_VALUEs on Darwin.
On Darwin these need to be selected into a function call for the TLS
address lookup. As a result, they can't be moved below a physreg write,
which happens in call sequences. In the long term, we should have some
mechanism in the localizer to prevent localizing into target-specific
atomic instruction sequences.

rdar://60056248

Differential Revision: https://reviews.llvm.org/D76652
2020-03-24 13:35:50 -07:00
Reid Kleckner 597718aae0 Re-land "Avoid emitting unreachable SP adjustments after `throw`"
This reverts commit 4e0fe038f4. Re-lands
65b21282c7.

After landing 5ff5ddd0ad to add int3 into
trailing unreachable blocks, we can now remove these extra stack
adjustments without confusing the Win64 unwinder. See
https://llvm.org/45064#c4 or X86AvoidTrailingCall.cpp for a full
explanation.

Fixes PR45064.
2020-03-24 12:04:43 -07:00
Matt Arsenault 26ebc51a34 AMDGPU/GlobalISel: Fix smrd loads of v4i64 2020-03-24 13:44:41 -04:00
David Green f8c79b94af [ARM] Fold VMOVrh VLDR to LDRH
This adds a simple fold to combine VMOVrh load to a integer load.
Similar to what is already performed for BITCAST, but needs to account
for the types being of different sizes, creating an zero extending load.

Differential Revision: https://reviews.llvm.org/D76485
2020-03-24 15:51:03 +00:00
Simon Pilgrim 714402147d [X86][SSE1] Add support for logic+movmsk patterns (PR42870)
rL368506 handled the basic case, but we need to account for boolean logic patterns as well.
2020-03-24 14:28:40 +00:00
David Green 1232cfa385 [ARM] Don't split trunc stores that can be better handled as VMOVN
We deliberately split stores of the form
store(truncate(larger-than-legal-type)) into two stores, allowing each
store to perform part of the truncate for free.

There are times however where it makes more sense to use VMOVN to
de-interlace the results back into a single vector, and store that in
one go. This adds a check for that situation, not splitting the store if
it looks like a VMOVN can be more useful.

Differential Revision: https://reviews.llvm.org/D76511
2020-03-24 08:48:52 +00:00
Sam Parker 94cacebcca [ARM][LowOverheadLoops] Add checks for narrowing
Modify ValidateLiveOuts to track 'FalseLaneZeros' more precisely,
including checks on specific operations that can generate non-zeros
from zero values, e.g VMVN. We can then check that any instructions
that retain some information in their output register (all narrowing
instructions) that they only use and def registers that always have
zeros in their falsely predicated bytes, whether or not tail
predication happens.

Most of the logic remains the same, just the names of the data
structures and helpers have been renamed to reflect the change in
logic. The key change, apart from the opcode checkers, is that the
FalseZeros set now strictly contains only instructions which will
always generate zeros, and not instructions that could also have
their false bytes masked away later.

Differential Revision: https://reviews.llvm.org/D76235
2020-03-24 08:41:48 +00:00
Sam Parker 6f86e6bf40 [ARM][MVE] Add target flag for narrowing insts
Add a flag, 'RetainsPreviousHalfElement', for operations that operate
on top/bottom halves of their input and only write to half of their
destination, leaving the other half to retain its previous value.

Differential Revision: https://reviews.llvm.org/D76608
2020-03-24 08:36:44 +00:00
Chen Zheng 9d07d91fb6 [PowerPC] fix a typo in commit 3f85134d71
Implement target hook isProfitableToHoist - typo fix.
2020-03-24 01:56:15 -04:00
Nemanja Ivanovic bfa9ce1cb2 [PowerPC] Improve handling of some BUILD_VECTOR nodes
An analysis of real world code turned up a number of patterns with BUILD_VECTOR
of nodes resulting from operations on extracted vector elements for which we
produce poor code. This addresses those cases. No attempt is made for
completeness as that would entail a large amount of work for something that
there is no evidence of in real code.

Differential revision: https://reviews.llvm.org/D72660
2020-03-23 17:34:29 -05:00
Justin Hibbits f0990e104b [PowerPC]: e500 target can't use lwsync, use msync instead
The e500 core has a silicon bug that triggers an illegal instruction
program trap on any sync other than msync.  Other cores will typically
ignore illegal sync types, and the documentation even implies that the
'illegal' bits are ignored.

Address this hardware deficiency by only using msync, like the PPC440.

Differential Revision:  https://reviews.llvm.org/D76614
2020-03-23 17:15:27 -05:00
Matt Arsenault 66073953a5 AMDGPU: Allow vectorization of round intrinsic
There seems to be a small benefit to the legalized sequence for v2f16
round with packed instructions, so allow vectorizing it by reducing
the cost.

An unintended side effect is vectorization of f32 round also
happens. The current FMA logic seems off to me, and isn't checking for
packed instructions.
2020-03-23 17:00:41 -04:00
Matt Arsenault 2ad5fc1d91 AMDGPU/GlobalISel: Implement computeNumSignBitsForTargetInstr 2020-03-23 15:02:30 -04:00
Reid Kleckner 5ff5ddd0ad [Win64] Insert int3 into trailing empty BBs
Otherwise, the Win64 unwinder considers direct branches to such empty
trailing BBs to be a branch out of the function. It treats such a branch
as a tail call, which can only be part of an epilogue. If the unwinder
misclassifies such a branch as part of the epilogue, it will fail to
unwind the stack further. This can lead to bad stack traces, or failure
to handle exceptions properly. This is described in
https://llvm.org/PR45064#c4, and by the comment at the top of the
X86AvoidTrailingCallPass.cpp file.

It should be safe to insert int3 for such blocks. An empty trailing BB
that reaches this pass is pretty much guaranteed to be unreachable.  If
a program executed such a block, it would fall off the end of the
function.

Most of the complexity in this patch comes from threading through the
"EHFuncletEntry" boolean on the MIRParser and registering the pass so we
can stop and start codegen around it. I used an MIR test because we
should teach LLVM to optimize away these branches as a follow-up.

Reviewed By: hans

Differential Revision: https://reviews.llvm.org/D76531
2020-03-23 08:50:37 -07:00
Ram Nalamothu 24698e526f Implement wave32 DWARF register mapping
Implement the DWARF register mapping described in llvm/docs/AMDGPUUsage.rst.

This enables generating appropriate DWARF register numbers for wave64 and
wave32 modes.
2020-03-23 10:24:16 -04:00
Sanjay Patel 0eeee83d75 [VectorUtils] move x86's scaleShuffleMask to generic VectorUtils
We have some long-standing missing shuffle optimizations that could
use this transform via VectorCombine now:
https://bugs.llvm.org/show_bug.cgi?id=35454
(and we still don't get that case in the backend either)

This function is apparently templated because there's existing code
in IR that treats mask values as unsigned and backend code that
treats masks values as signed.

The mask values are not endian-dependent (as shown by the existing
bitcast transform from DAGCombiner).

Differential Revision: https://reviews.llvm.org/D76508
2020-03-23 09:58:55 -04:00
Jonas Paulsson 9adc7fc3cd [SystemZ] Perform instruction shortening for fused fp ops.
Replace single-lane (W... form) vector "multiply and add" and "multiply and
subtract" instructions with equivalent floating point instructions whenever
possible in SystemZShortenInst.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D76370
2020-03-23 14:12:13 +01:00
Guillaume Chatelet 3ba550a05a [Alignment][NFC] Use TFL::getStackAlign()
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: dylanmckay, sdardis, nemanjai, hiraditya, kbarton, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76551
2020-03-23 13:48:29 +01:00
Guillaume Chatelet ea64ee0edb [Alignment][NFC] Deprecate ensureMaxAlignment
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790

Reviewers: courbet

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76368
2020-03-23 11:31:33 +01:00
Craig Topper e2cb121374 [X86] Remove maximum vector length limit from combineBasicSADPattern.
createPSADBW uses SplitsOpsAndApply so should be able to handle
any size.

Restrict the extract result type to i32 or i64 since that's what
we have coverage for today and probably matches what the
isSimple() check gave us before.

Differential Revision: https://reviews.llvm.org/D76560
2020-03-22 15:02:05 -07:00
Craig Topper f4c67dfa92 [X86] More accurately model the cost of horizontal reductions.
This patch attempts to more accurately model the reduction of
power of 2 vectors of types we natively support. This takes into
account the narrowing of vectors that occur as we go from 512
bits to 256 bits, to 128 bits. It also takes into account the use
of wider elements in the shuffles for the first 2 steps of a
reduction from 128 bits. And uses a v8i16 shift for the final step
of vXi8 reduction.

The default implementation uses the legalized type for the arithmetic
for all levels. And uses the single source permute cost of the
legalized type for all levels. This penalizes things like
lack of v16i8 pshufb on pre-sse3 targets and the splitting and
joining that needs to be done for integer types on AVX1. We never
need v16i8 shuffle for a reduction and we only need split AVX1 ops
when type the type wide and needs to be split. I think we're still
over costing splits and joins for AVX1, but we're closer now.

I've also removed all pairwise special casing because I don't
think we ever want to generate that on X86. I've also adjusted
the add handling to more accurately account for any type splitting
that occurs before we reach a legal type.

Differential Revision: https://reviews.llvm.org/D76478
2020-03-22 14:20:15 -07:00
Simon Atanasyan 2dc4eb08cd [mips] Implement .cpadd directive
This directive inserts code to add $gp to the argument's register when
support for position independent code is enabled.

For example, this code:
  .cpadd $4
expands to:
  addu $4, $4, $gp
2020-03-22 23:34:32 +03:00
Simon Atanasyan 9bbddfbeaa [mips] Implement sne pseudo instruction
The `sne Dst, Src1, Src2/Imm` pseudo instruction sets register `Dst` to
1 if register `Src1` is not equal to `Src2/Imm` and to 0 otherwise.
2020-03-22 23:34:31 +03:00
Simon Atanasyan dca9e40c0c [mips] Implement sle/sleu pseudo instructions
The `sle/sleu Dst, Src1, Src2/Imm` pseudo instructions set register
`Dst` to 1 if register `Src1` is less than or equal `Src2/Imm` and
to 0 otherwise.
2020-03-22 23:34:31 +03:00
Simon Atanasyan 862f120fdb [mips] Remove instructions related to "wired paired single" from the P5600 model. 2020-03-22 23:34:31 +03:00