Commit Graph

Author SHA1 Message Date
Craig Topper 35585aff34 [X86] Remove custom DAG combine for SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG.
We only needed this because it provided really aggressive constant folding even through constant pool entries created from build_vectors. The main case was for vXi8 MULH legalization, which was happening as part of legalize DAG instead of as part of legalize vector ops. Now it's part of vector op legalization and we've added special handling for build vectors of all constants there. This has removed the need for this code on the lit tests we have.
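For context, ZERO_EXTEND_VECTOR_INREG and SIGN_EXTEND_VECTOR_INREG take the low lanes of a vector and widen each into a larger lane while keeping the total vector width, so folding them over constant inputs is a per-lane extension. A minimal C++ sketch of that fold for a v16i8 to v8i16 case, modelling vectors as `std::array` (the function names are hypothetical, not LLVM APIs):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of folding ZERO_EXTEND_VECTOR_INREG on a constant
// v16i8 operand into a v8i16 result: only the low 8 source lanes are
// used, and each one is zero-extended to 16 bits.
std::array<uint16_t, 8> foldZExtVectorInReg(const std::array<uint8_t, 16>& src) {
  std::array<uint16_t, 8> result{};
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = static_cast<uint16_t>(src[i]);  // zero-extend lane i
  return result;
}

// Sign-extending variant (SIGN_EXTEND_VECTOR_INREG): reinterpret each
// low lane as signed before widening it.
std::array<int16_t, 8> foldSExtVectorInReg(const std::array<uint8_t, 16>& src) {
  std::array<int16_t, 8> result{};
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = static_cast<int16_t>(static_cast<int8_t>(src[i]));
  return result;
}
```

The upper eight source lanes never reach the result, which is what distinguishes the "in-register" extensions from a plain vector sign/zero extend.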

llvm-svn: 348237
2018-12-04 04:51:07 +00:00
Sanjin Sijaric dc6403d133 [ARM64][Windows] Fix local stack size for funclets
The comment was misplaced, and the code didn't do what the comment indicated,
namely ignoring the varargs portion when computing the local stack size of a
funclet in emitEpilogue.  This results in incorrect offset computations within
funclets that are contained in vararg functions.

Differential Revision: https://reviews.llvm.org/D55096

llvm-svn: 348222
2018-12-04 00:54:52 +00:00
Jessica Paquette bce2086ad1 [MachineOutliner] Move stack instr check logic to getOutliningCandidateInfo
This moves the stack check logic into a lambda within getOutliningCandidateInfo.

This allows us to be less conservative with stack checks. Whether or not a
stack instruction is safe to outline is dependent on the frame variant and call
variant of the outlined function; only in cases where we modify the stack can
these be unsafe.

So, if we move that logic later, when we're looking at an individual candidate,
we can make better decisions here.

This gives some code size savings as a result.

llvm-svn: 348220
2018-12-04 00:31:55 +00:00
Jessica Paquette 2f5833ecd9 [MachineOutliner][AArch64][NFC] Add early exit to candidate discarding logic
If we dropped too many candidates to be beneficial when dropping candidates
that modify the stack, there's no reason to check for other cost model
qualities.

llvm-svn: 348219
2018-12-04 00:31:47 +00:00
Krzysztof Parzyszek 44c1f81b27 [Hexagon] Switch to auto-generated intrinsic definitions and patterns
llvm-svn: 348206
2018-12-03 22:40:36 +00:00
Krzysztof Parzyszek 9dafa8a2c6 [Hexagon] Extract operand decoders into a separate file, NFC
These decoders are automatically generated. Keeping them separated makes
updating architectures easier.

llvm-svn: 348196
2018-12-03 21:59:21 +00:00
Sanjay Patel d24f63477d [DAGCombiner] narrow truncated vector binops when legal
This is the smallest vector enhancement I could find to D54640.
Here, we're allowing narrowing to only legal vector ops because we'll see
regressions without that. All of the test diffs are wins from what I can tell.
With AVX/AVX512, we can shrink ymm/zmm ops to xmm.

x86 vector multiplies are the problem case that we're avoiding due to the
patchwork ISA, and it's not clear to me if we can dance around those
regressions using TLI hooks or if we need preliminary patches to plug those
holes.

Differential Revision: https://reviews.llvm.org/D55126

llvm-svn: 348195
2018-12-03 21:57:35 +00:00
Simon Atanasyan f76884b0d3 [mips] Fix TestDWARF32Version5Addr8AllForms test failure on MIPS hosts
The `DIEExpr` is used in debug information entries for either TLS variables
or call sites. For now the last case is unsupported for targets with delay
slots, for MIPS in particular.

The `DIEExpr::EmitValue` method calls a virtual `EmitDebugThreadLocal`
routine which, in case of MIPS, always emits either `.dtprelword` or
`.dtpreldword` directives. That is okay for "main" code, but in unit
tests `DIEExpr` instances can be created for non-TLS variables too, even
on MIPS hosts. That is the reason for the `TestDWARF32Version5Addr8AllForms`
failure, because handling of the `R_MIPS_TLS_DTPREL` relocation writes an
incorrect value into the DWARF structures. In any case, unconditionally emitting
`.dtprelword` directives will be incorrect when/if debug information
entries for call sites become supported on MIPS.

The patch solves the problem by wrapping the expression created in the
`MipsTargetObjectFile::getDebugThreadLocalSymbol` method into a
`MipsMCExpr` expression with a new `MEK_DTPREL` tag. This tag is
recognized in the `MipsAsmPrinter::EmitDebugThreadLocal` method, and
`.dtprelword` directives are created in this case only. In other cases the
expression is emitted as regular data.

Differential Revision: http://reviews.llvm.org/D54937

llvm-svn: 348194
2018-12-03 21:54:43 +00:00
Krzysztof Parzyszek a45a55fc67 [Hexagon] Remove unused encodings, NFC
llvm-svn: 348193
2018-12-03 21:49:12 +00:00
Wouter van Oortmerssen c7b89f0f62 [WebAssembly] Enforce assembler emits to streamer in order.
Summary:
The assembler processes directives and instructions in whatever order
they are in the file, then directly emits them to the streamer. This
could cause badly written (or generated) .s files to produce
incorrect binaries.

It now has state that tracks what it has most recently seen, to
enforce that they are emitted in a given order that always produces
correct wasm binaries.

Also added a new test that compares obj2yaml output from llc (the
backend) to that going via .s and the assembler to ensure both paths
generate the same binaries.

The features this test covers could be extended.

Passes all wasm Lit tests.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=39557

Reviewers: sbc100, dschuff, aheejin

Subscribers: jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D55149

llvm-svn: 348185
2018-12-03 20:30:28 +00:00
Krzysztof Parzyszek 6290a73f29 [Hexagon] Update timing classes
llvm-svn: 348183
2018-12-03 20:13:18 +00:00
Krzysztof Parzyszek 1cbc5cd364 [Hexagon] Change instruction type field in TSFlags to 7 bits
llvm-svn: 348171
2018-12-03 19:34:04 +00:00
Jessica Paquette 2accb31690 [MachineOutliner] Drop candidates that require fixups if it's beneficial
If it's a bigger code size win to drop candidates that require stack fixups
than to demote every candidate to that variant, the outliner should do that.

This happens if the number of bytes taken by calls to functions that don't
require fixups, plus the number of bytes that'd be left, is less than the
number of bytes that it'd take to emit a save + restore for all candidates.
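The cost rule in the previous paragraph can be sketched as follows. This is a simplified model, not the outliner's actual code: `Candidate`, the per-call byte counts, and the assumption that a dropped candidate leaves its whole sequence inline are all illustrative.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-candidate data; field names are illustrative only.
struct Candidate {
  unsigned SequenceBytes;  // bytes left in place if this candidate is dropped
  bool NeedsStackFixup;    // true if the candidate modifies the stack
};

// Returns true if dropping the stack-modifying candidates is the smaller
// option, following the rule described above:
//   bytes(calls for candidates without fixups) + bytes(sequences left inline)
//     < bytes(save + restore calls for every candidate)
bool shouldDropFixupCandidates(const std::vector<Candidate>& Cands,
                               unsigned PlainCallBytes,
                               unsigned SaveRestoreCallBytes) {
  unsigned KeepCallBytes = 0;  // cost of calls for the candidates we keep
  unsigned LeftBytes = 0;      // cost of sequences we no longer outline
  for (const Candidate& C : Cands) {
    if (C.NeedsStackFixup)
      LeftBytes += C.SequenceBytes;
    else
      KeepCallBytes += PlainCallBytes;
  }
  unsigned DemoteAllBytes =
      SaveRestoreCallBytes * static_cast<unsigned>(Cands.size());
  return KeepCallBytes + LeftBytes < DemoteAllBytes;
}
```

For instance, with hypothetical 4-byte plain calls and 12-byte save/restore calls, three stack-free candidates plus one small stack-modifying candidate favour dropping the latter, while a mostly stack-modifying set favours demoting everything.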

Also add tests for each possible new behaviour.

- machine-outliner-compatible-candidates shows that when we have candidates
that don't use the stack, we can use the default call variant along with the
no save/regsave variant.

- machine-outliner-all-stack shows that when it's better to fix up the stack,
we still demote all candidates to that case.

- machine-outliner-drop-stack shows that we can discard candidates that
require stack fixups when it would be beneficial to do so.

llvm-svn: 348168
2018-12-03 19:11:27 +00:00
Krzysztof Parzyszek 71a7f447f6 [Hexagon] Add HasV5 predicate for compatibility with auto-generated files
llvm-svn: 348167
2018-12-03 19:05:42 +00:00
Krzysztof Parzyszek a55515f9a6 [Hexagon] Remove unused operand definitions, NFC
llvm-svn: 348163
2018-12-03 18:54:24 +00:00
Krzysztof Parzyszek 7ecc277ef9 [Hexagon] Some formatting changes, NFC
llvm-svn: 348162
2018-12-03 18:40:15 +00:00
Craig Topper 5440b63fa8 [X86] Teach LowerMUL/LowerMULH for vXi8 to unpack constant RHS.
Summary:
We need to unpackl and unpackh the operands to use two vXi16 multiplies. Previously it looks like the low unpack would get constant folded at least in the 128-bit case after shuffle lowering turned the unpackl into ZERO_EXTEND_VECTOR_INREG and X86 custom DAG combined it. The same doesn't happen for the high half. So we'd load a constant and then shuffle it. But the low half would just be loaded and used by the multiply directly.
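Per lane, the unpack-based lowering computes the high byte of a 16-bit product. A scalar C++ model of one 128-bit register, assuming unsigned MULHU (signed MULH would sign-extend instead); the helper names are hypothetical, and real 256-bit unpacks operate within each 128-bit lane:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16i8 = std::array<uint8_t, 16>;
using V8i16 = std::array<uint16_t, 8>;

// Model of punpcklbw/punpckhbw against a zero vector: the low (or high)
// eight bytes are zero-extended into 16-bit lanes.
V8i16 unpackWithZero(const V16i8& v, bool high) {
  V8i16 r{};
  std::size_t base = high ? 8 : 0;
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = v[base + i];
  return r;
}

// vXi8 MULHU as two vXi16 multiplies: unpack both halves, multiply in
// 16 bits, keep the high byte of each product, then pack the results.
V16i8 mulhuV16i8(const V16i8& a, const V16i8& b) {
  V16i8 out{};
  for (int half = 0; half < 2; ++half) {
    V8i16 wa = unpackWithZero(a, half == 1);
    // With a constant RHS, each unpacked half can become its own
    // constant pool entry instead of a load followed by a shuffle.
    V8i16 wb = unpackWithZero(b, half == 1);
    for (std::size_t i = 0; i < 8; ++i) {
      uint16_t prod = static_cast<uint16_t>(wa[i] * wb[i]);  // 255*255 fits
      out[half * 8 + i] = static_cast<uint8_t>(prod >> 8);
    }
  }
  return out;
}
```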

After this patch we now end up with a constant pool entry for the low and high unpacks separately with no shuffle operations.

This is a step towards removing custom constant folding for ZERO_EXTEND_VECTOR_INREG/SIGN_EXTEND_VECTOR_INREG in the X86 backend.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55165

llvm-svn: 348159
2018-12-03 18:26:27 +00:00
Craig Topper e35b01f8ea [X86] Add DAG combine to combine a v8i32->v8i16 truncate with a packuswb that truncates v8i16->v8i8.
Summary:
Under -x86-experimental-vector-widening-legalization, fp_to_uint/fp_to_sint with result vector types smaller than 128 bits are custom type legalized by promoting the result to a 128-bit vector by promoting the elements, inserting an assertzext/assertsext, then truncating back to the original type. The truncate will be further legalized to a pack shuffle. In the case of a v8i8 result type, we'll end up with a v8i16 fp_to_sint. This will need to be further legalized during vector op legalization by promoting to v8i32 and then truncating again. Under avx2 this produces good code with two pack instructions, but under avx512 it results in a truncate instruction and a packuswb instruction, when we should be able to get away with a single truncate instruction.

The other option is to promote all the way to vXi32 result type during the first type legalization. But in some experimentation that seemed to require more work to produce good code for other configurations.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54836

llvm-svn: 348158
2018-12-03 18:26:24 +00:00
Jonas Paulsson 8ae0f88b13 [SystemZ::TTI] Return zero cost for ICmp that becomes Load And Test.
A loaded value with multiple users compared with 0 will become a single
load-and-test instruction. The load is not folded in this case (multiple
users), but the compare instruction is eliminated.

This patch returns 0 cost for the icmp in these cases.

Review: Ulrich Weigand
https://reviews.llvm.org/D55111

llvm-svn: 348141
2018-12-03 14:30:18 +00:00
Pablo Barrio a17f855698 [AArch64] Add command-line option for SSBS
Summary:
SSBS (Speculative Store Bypass Safe) is only mandatory from 8.5
onwards but is optional from Armv8.0-A. This patch adds a command
line option to enable SSBS, as it was previously only possible to
enable by selecting -march=armv8.5-a.

Similar patch upstream in GNU binutils:
https://sourceware.org/ml/binutils/2018-09/msg00274.html

Reviewers: olista01, samparker, aemerson

Reviewed By: samparker

Subscribers: javed.absar, kristof.beyls, kristina, llvm-commits

Differential Revision: https://reviews.llvm.org/D54629

llvm-svn: 348137
2018-12-03 14:00:47 +00:00
Ron Lieberman 16de4fd2eb [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos
The introduction of S_{ADD|SUB}_U64_PSEUDO instructions which are decomposed
into VOP3 instruction pairs for S_ADD_U64_PSEUDO:
  V_ADD_I32_e64
  V_ADDC_U32_e64
and for S_SUB_U64_PSEUDO
  V_SUB_I32_e64
  V_SUBB_U32_e64
precludes the use of SDWA to encode a constant.
SDWA: Sub-Dword addressing is supported on VOP1 and VOP2 instructions,
but not on VOP3 instructions.

We desire to fold the bit-and operand into the instruction encoding
for the V_ADD_I32 instruction. This requires that we transform the
VOP3 into a VOP2 form of the instruction (_e32).
  %19:vgpr_32 = V_AND_B32_e32 255,
      killed %16:vgpr_32, implicit $exec
  %47:vgpr_32, %49:sreg_64_xexec = V_ADD_I32_e64
      %26.sub0:vreg_64, %19:vgpr_32, implicit $exec
  %48:vgpr_32, dead %50:sreg_64_xexec = V_ADDC_U32_e64
      %26.sub1:vreg_64, %54:vgpr_32, killed %49:sreg_64_xexec, implicit $exec

which then allows the SDWA encoding and becomes
  %47:vgpr_32 = V_ADD_I32_sdwa
      0, %26.sub0:vreg_64, 0, killed %16:vgpr_32, 0, 6, 0, 6, 0,
      implicit-def $vcc, implicit $exec
  %48:vgpr_32 = V_ADDC_U32_e32
      0, %26.sub1:vreg_64, implicit-def $vcc, implicit $vcc, implicit $exec
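Reading the operands of the final `V_ADD_I32_sdwa`, the trailing `0` is presumably src1_sel = BYTE_0, which selects the low byte of `%16` and so subsumes the `V_AND_B32` with 255. A small C++ model of that equivalence; the enum values are assumed to mirror the AMDGPU SDWA select encoding (BYTE_0 = 0 through DWORD = 6), and the functions are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Assumed to mirror the SDWA operand-select encoding.
enum SdwaSel : unsigned { BYTE_0 = 0, BYTE_1 = 1, BYTE_2 = 2, BYTE_3 = 3,
                          WORD_0 = 4, WORD_1 = 5, DWORD = 6 };

// Extract the selected sub-dword of a 32-bit source, zero-extended
// (no sext modifier applied).
uint32_t sdwaSelect(uint32_t v, SdwaSel sel) {
  switch (sel) {
    case BYTE_0: case BYTE_1: case BYTE_2: case BYTE_3:
      return (v >> (8 * sel)) & 0xFFu;
    case WORD_0: case WORD_1:
      return (v >> (16 * (sel - WORD_0))) & 0xFFFFu;
    case DWORD:
    default:
      return v;
  }
}

// Model of a VOP2 add with per-operand SDWA selects; wraps modulo 2^32.
uint32_t addSdwa(uint32_t src0, uint32_t src1, SdwaSel sel0, SdwaSel sel1) {
  return sdwaSelect(src0, sel0) + sdwaSelect(src1, sel1);
}
```

With src0 selected as DWORD and src1 as BYTE_0, the add produces the same value as adding `src1 & 255`, which is why the AND can be folded away once the instruction is in VOP2 form.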


Differential Revision: https://reviews.llvm.org/D54882

llvm-svn: 348132
2018-12-03 13:04:54 +00:00
Tim Northover 5745b6ac3b ARM: use target-specific SUBS node when combining cmp with cmov.
This has two positive effects. First, using a custom node prevents
recombination leading to an infinite loop since the output DAG is notionally a
little more complex than the input one. Using a flag-setting instruction also
allows the subtraction to be folded with the related comparison more easily.

https://reviews.llvm.org/D53190

llvm-svn: 348122
2018-12-03 11:16:21 +00:00
Diogo N. Sampaio 3c7d062b6b [NFC][AArch64] Split out backend features
This patch splits backend features currently
hidden behind architecture versions.

For example, currently the only way to activate
the complex numbers extension is to target a v8.3
architecture; after this patch, the extension
can be enabled separately.

This refactoring is required by the new command lines proposal:
http://lists.llvm.org/pipermail/llvm-dev/2018-September/126346.html

Reviewers: DavidSpickett, olista01, t.p.northover

Subscribers: kristof.beyls, bryanpkc, javed.absar, pbarrio

Differential revision: https://reviews.llvm.org/D54633

llvm-svn: 348121
2018-12-03 11:08:13 +00:00
Oliver Stannard 4cf35b4ab0 [ARM][MC] Move information about variadic register defs into tablegen
Currently, variadic operands on an MCInst are assumed to be uses,
because they come after the defs. However, this is not always the case,
for example the Arm/Thumb LDM instructions write to a variable number of
registers.

This adds a property of instruction definitions which can be used to
mark variadic operands as defs. This only affects MCInst, because
MachineInstruction already tracks use/def per operand in each instance
of the instruction, so it can already represent this.

This property can then be checked in MCInstrDesc, allowing us to remove
some special cases in ARMAsmParser::isITBlockTerminator.

Differential revision: https://reviews.llvm.org/D54853

llvm-svn: 348114
2018-12-03 10:32:42 +00:00
Oliver Stannard c588110f13 [ARM][Asm] Debug trace for the processInstruction loop
In the Arm assembly parser, we first match an instruction, then call
processInstruction to possibly change it to a different encoding, to
match rules in the architecture manual which can't be expressed by the
table-generated matcher.

This adds debug printing so that this process is visible when using the
-debug option.

To support this, I've added a new overload of MCInst::dump_pretty which
takes the opcode name as a StringRef, since we don't have an InstPrinter
instance in the assembly parser. Instead, we can get the same
information directly from the MCInstrInfo.

Differential revision: https://reviews.llvm.org/D54852

llvm-svn: 348113
2018-12-03 10:21:28 +00:00
Sjoerd Meijer 5afc957eba [ARM] FP16: support vld1.16 for vector loads with post-increment
Differential Revision: https://reviews.llvm.org/D55112

llvm-svn: 348110
2018-12-03 08:26:34 +00:00
Kang Zhang 51986417f9 [PowerPC] Fix inconsistent ImmMustBeMultipleOf for same instruction
Summary:
There are 4 instructions which have inconsistent ImmMustBeMultipleOf in the
function PPCInstrInfo::instrHasImmForm, they are LFS, LFD, STFS, STFD.
These four instructions should set the ImmMustBeMultipleOf to 1 instead of 4.
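The distinction behind the fix is that LFS, LFD, STFS and STFD use a D-form displacement (any signed 16-bit value), whereas DS-form instructions such as LD and STD need a multiple of 4. A hypothetical C++ check illustrating why a multiple-of-4 requirement is overly strict for these four instructions:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check modelled on the ImmMustBeMultipleOf idea: a
// displacement is encodable if it fits a signed 16-bit field and is a
// multiple of the required value (1 for D-form, 4 for DS-form).
bool isEncodableDisplacement(int64_t imm, int64_t mustBeMultipleOf) {
  if (imm < INT16_MIN || imm > INT16_MAX)
    return false;  // does not fit the 16-bit displacement field
  return imm % mustBeMultipleOf == 0;
}
```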

Reviewed By: steven.zhang

Differential Revision: https://reviews.llvm.org/D54738

llvm-svn: 348109
2018-12-03 03:32:57 +00:00
QingShan Zhang 8b7653db72 [NFC] [PowerPC] add an routine in PPCTargetLowering to determine if a global is accessed as got-indirect or not.
In theory, we should let the PPC target to determine how to lower the TOC Entry for globals. 
And the PPCTargetLowering requires this query to do some optimization for TOC_Entry. 

Differential Revision: https://reviews.llvm.org/D54925

llvm-svn: 348108
2018-12-03 03:32:16 +00:00
Craig Topper 959b415e2f [X86] Add a DAG combine to turn stores of vXi1 on pre-avx512 targets into a bitcast and a store of a iX scalar.
llvm-svn: 348104
2018-12-02 19:47:14 +00:00
Craig Topper 6f54ff57fd [X86] Fix bad comment. NFC
llvm-svn: 348103
2018-12-02 19:47:13 +00:00
Craig Topper 204e4110e0 [X86] Simplify LowerBITCAST code for v2i32/v4i16/v8i8/i64->mmx/i64/f64 bitcast.
Previously this code generated its own extracts and build_vector. But we can use a simpler concat_vectors or scalar_to_vector operation and let type legalization do additional legalization of those operations.

llvm-svn: 348087
2018-12-02 07:52:39 +00:00
Craig Topper 4bb077910a [X86] Add custom type legalization for v2i32/v4i16/v8i8->mmx bitcasts to avoid a store/load to/from the stack.
Widen the input to a 128 bit vector by padding with undef elements. Then use a movdq2q to convert from xmm register to mmx register.

llvm-svn: 348086
2018-12-02 05:46:50 +00:00
Craig Topper ec096a1dae [X86] Custom type legalize v2i32/v4i16/v8i8->i64 bitcasts in 64-bit mode similar to what's done when the destination is f64.
The generic legalizer will fall back to a stack spill that uses a truncating store. That store will get expanded into a shuffle and non-truncating store on pre-avx512 targets. Once that happens the stack store/load pair will be combined away leaving behind the shuffle and bitcasts. On avx512 targets the truncating store is legal, so it doesn't get folded away.

By custom legalizing it we can avoid this churn and maybe produce better code.

llvm-svn: 348085
2018-12-02 05:46:48 +00:00
Jessica Paquette 9a7103b0f8 [MachineOutliner][AArch64] Improve checks for stack instructions
If we know that we'll definitely save LR to a register, there's no reason to
pre-check whether or not a stack instruction is unsafe to fix up.

This makes it so that we check for that condition before mapping instructions.

This allows us to outline more, since we don't pessimise as many instructions.

Also update some tests, since we outline more.

llvm-svn: 348081
2018-12-01 21:24:06 +00:00
Craig Topper f4b13927e7 [X86] Don't use zero_extend_vector_inreg for mulhu lowering with sse 4.1
Summary: With sse4.1 we use two zero_extend_vector_inreg and a pshufd to expand the v16i8 input into two v8i16 vectors for the multiply. That's 3 shuffles to extend one operand. The other operand is usually constant as this is mostly used by division by constant optimization. Pre sse4.1 we use a punpckhbw and a punpcklbw with a zero vector. That's two shuffles and an xor and a copy due to tied register constraints. That seems maybe better than the 3 shuffles. With AVX we avoid the copy so that's obviously better.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D55138

llvm-svn: 348079
2018-12-01 19:26:31 +00:00
Graham Sellers ba559ac058 [AMDGPU] Split 64-Bit XNOR to 64-Bit NOT/XOR
The identity ~(x ^ y) == (~x ^ y) == (x ^ ~y) allows XNOR (XOR/NOT) to turn into NOT/XOR. Handling this case with its own split means we can make the NOT remain in the scalar unit. Previously, we split 64-bit XNOR into two 32-bit XNOR, then lowered. Now, we get three instructions (s_not, v_xor, v_xor) rather than four in the case where either of the sources is a scalar 64-bit.
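The identity can be checked directly on 64-bit integers. This C++ sketch contrasts the old split (two 32-bit XNORs) with a NOT on the scalar operand followed by XOR; the function names and the scalar/vector labelling of the operands are illustrative only:

```cpp
#include <cassert>
#include <cstdint>

// 64-bit XNOR as previously split: two 32-bit XNORs on the halves.
uint64_t xnorSplitHalves(uint64_t x, uint64_t y) {
  uint32_t lo = ~(static_cast<uint32_t>(x) ^ static_cast<uint32_t>(y));
  uint32_t hi = ~(static_cast<uint32_t>(x >> 32) ^ static_cast<uint32_t>(y >> 32));
  return (static_cast<uint64_t>(hi) << 32) | lo;
}

// 64-bit XNOR as NOT + XOR: the NOT applies to the scalar operand
// (one 64-bit s_not), and the XOR becomes two 32-bit v_xor.
uint64_t xnorNotThenXor(uint64_t scalarX, uint64_t vectorY) {
  uint64_t notX = ~scalarX;  // can remain on the scalar unit
  return notX ^ vectorY;
}
```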

Add test cases to xnor.ll to attempt XNOR Vx, Sy and XNOR Sx, Vy. Also add a test that uses the opposite identity such that (~x ^ y) on the scalar unit (or vector for gfx906) can generate XNOR. This already worked, but I didn't see a test for it.

Differential: https://reviews.llvm.org/D55071
llvm-svn: 348075
2018-12-01 12:27:53 +00:00
Alex Bradbury 757d296222 [RISCV] Remove RV64I SLLW/SRLW/SRAW patterns and add new test cases
As noted by Eli Friedman <https://reviews.llvm.org/D52977?id=168629#1315291>,
the RV64I shift patterns for SLLW/SRLW/SRAW make some incorrect assumptions.
SRAW assumed that (sext_inreg foo, i32) could only be produced when
sign-extending an i32. However, it can be produced by input such as:

define i64 @tricky_ashr(i64 %a, i64 %b) {
  %1 = shl i64 %a, 32
  %2 = ashr i64 %1, 32
  %3 = ashr i64 %2, %b
  ret i64 %3
}

It's important not to select sraw in the above case, because sraw only uses
the lower 5 bits of the shift amount, while a shift of 32-63 would be valid.
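A C++ model of why the pattern was wrong for `tricky_ashr`: the reference semantics shift the sign-extended value by up to 63, while `sraw` masks the shift amount to 5 bits. The functions are illustrative models of the IR and of the instruction, not LLVM code:

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of tricky_ashr: sign-extend the low 32 bits
// (shl 32 + ashr 32), then a full 64-bit arithmetic shift. Masking with
// 63 models the RV64 sra instruction; in IR a shift of 64+ is poison.
int64_t trickyAshr(int64_t a, int64_t b) {
  int64_t sext = static_cast<int32_t>(a);
  return sext >> (static_cast<uint64_t>(b) & 63);
}

// RV64I sraw: shifts the low 32 bits using only the low 5 bits of the
// shift amount, then sign-extends the 32-bit result.
int64_t sraw(int64_t rs1, int64_t rs2) {
  int32_t lo = static_cast<int32_t>(rs1);
  return static_cast<int64_t>(lo >> (static_cast<uint64_t>(rs2) & 31));
}
```

For shift amounts below 32 the two agree; for a shift of 40, `sraw` behaves as a shift by 8, so selecting it would miscompile.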

Similarly, the patterns for srlw assumed (and foo, 0xffffffff) would only be 
produced when zero-extending a value that was originally i32 in LLVM IR. This
is obviously incorrect.

This patch removes the SLLW/SRLW/SRAW shift patterns for the time being and 
adds test cases that would demonstrate a miscompile if the incorrect patterns 
were re-added.

llvm-svn: 348067
2018-12-01 05:00:00 +00:00
Artem Belevich e5664b1559 [NVPTX] Add lowering of i128 numbers as struct fields
Addition to D34555 - override VTs computation with ComputePTXValueVTs
for struct fields.

Author: Denys Zariaiev <denys.zariaiev@gmail.com>

Differential Revision: https://reviews.llvm.org/D55144

llvm-svn: 348057
2018-12-01 00:21:52 +00:00
Nicolai Haehnle a7b00058e0 AMDGPU: Divergence-driven selection of scalar buffer load intrinsics
Summary:
Moving SMRD to VMEM in SIFixSGPRCopies is rather bad for performance if
the load is really uniform. So select the scalar load intrinsics directly
to either VMEM or SMRD buffer loads based on divergence analysis.

If an offset happens to end up in a VGPR -- either because a floating
point calculation was involved, or due to other remaining deficiencies
in SIFixSGPRCopies -- we use v_readfirstlane.

There is some unrelated churn in tests since we now select MUBUF offsets
in a unified way with non-scalar buffer loads.

Change-Id: I170e6816323beb1348677b358c9d380865cd1a19

Reviewers: arsenm, alex-t, rampitec, tpr

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D53283

llvm-svn: 348050
2018-11-30 22:55:38 +00:00
Nicolai Haehnle a9cc92c247 AMDGPU: Fix various issues around the VirtReg2Value mapping
Summary:
The VirtReg2Value mapping is crucial for getting consistently
reliable divergence information into the SelectionDAG. This
patch fixes a bunch of issues that lead to incorrect divergence
info and introduces tight assertions to ensure we don't regress:

1. VirtReg2Value is generated lazily; there were some cases where
   a lookup was performed before all relevant virtual registers were
   created, leading to an out-of-sync mapping. Those cases were:

  - Complex code to lower formal arguments that generated CopyFromReg
    nodes from live-in registers (fixed by never querying the mapping
    for live-in registers).

  - Code that generates CopyToReg for formal arguments that are used
    outside the entry basic block (fixed by never querying the
    mapping for Register nodes, which don't need the divergence info
    anyway).

2. For complex values that are lowered to a sequence of registers,
   all registers must be reflected in the VirtReg2Value mapping.

I am not adding any new tests, since I'm not actually aware of any
bugs that these problems are causing with trunk as-is. However,
I recently added a test case (in r346423) which fails when D53283 is
applied without this change. Also, the new assertions should provide
most of the effective test coverage.

There is one test change in sdwa-peephole.ll. The underlying issue
is that since the divergence info is now correct, the DAGISel will
select V_OR_B32 directly instead of S_OR_B32. This leads to an extra
COPY which affects the behavior of MachineLICM in a way that ends up
with the S_MOV_B32 with the constant in a different basic block than
the V_OR_B32, which is presumably what defeats the peephole.

Reviewers: alex-t, arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D54340

llvm-svn: 348049
2018-11-30 22:55:29 +00:00
Jessica Paquette 1cb18ec4ec [MachineOutliner] Outline both register save calls + no LR save calls together
Instead of treating the outlined functions for these as distinct frames, they
should be combined into one case. Neither allows for stack fixups, and both
generate the same frame. Thus, they ought to be considered one case.

This makes the code far easier to understand, for one thing. It also offers
some small code size improvements. It's fairly rare to see a class of outlined
functions that doesn't fall entirely into one variant (on CTMark anyway). It
does happen from time to time though.

This mostly offers some serious simplification.

Also update the test to show the added functionality.

llvm-svn: 348036
2018-11-30 21:14:58 +00:00
Peter Collingbourne 35fcc294ab AArch64: Don't emit CFI for SCS register in nounwind functions.
All that you can legitimately do with the CFI for a nounwind function
is get a backtrace, and adjusting the SCS register is not (currently)
required for this purpose.

Differential Revision: https://reviews.llvm.org/D54988

llvm-svn: 348035
2018-11-30 21:04:25 +00:00
Craig Topper 4d80f199e8 [X86] Change vXi8 MULHU lowering to unpack high and low half of lanes instead of extracting and concating low and high half registers.
This reduces the number of shuffle operations that need to be done. The splitting strategy requires the shuffle unit for the extraction and the extension. With the unpack strategy the unpacks accomplish a splitting and extending in one operation.

llvm-svn: 348019
2018-11-30 18:43:18 +00:00
Craig Topper 8191307d09 [X86] Prefer lowerVectorShuffleAsBitMask over using a avx512 masked operation when avx512bw/avx512vl is enabled.
This does require a constant pool load instead of loading an immediate into a gpr, moving to a k register and masking. But it's fewer instructions and more consistent with previous ISAs. It probably opens up more combine opportunities, as one of the test cases demonstrates.

llvm-svn: 348018
2018-11-30 18:43:15 +00:00
Ron Lieberman f48e43bbf7 [AMDGPU] Disable SReg Global LD/ST, perf regression
Differential Revision: https://reviews.llvm.org/D55093

llvm-svn: 348014
2018-11-30 18:29:17 +00:00
Valery Pykhtin 3d9afa273f [AMDGPU] Combine DPP mov with use instructions (VOP1/2/3)
Introduces DPP pseudo instructions and the pass that combines DPP mov with subsequent uses.

Differential revision: https://reviews.llvm.org/D53762

llvm-svn: 347993
2018-11-30 14:21:56 +00:00
Alex Bradbury 4830fdd21a [RISCV] Add additional CSR instruction aliases (imm. operands)
This patch adds CSR instructions aliases for the cases where the instruction 
takes an immediate operand but the alias doesn't have the i suffix. This is 
necessary for gas/gcc compatibility.

gas doesn't do a similar conversion for fsflags or fsrm, so this should be 
complete.

Differential Revision: https://reviews.llvm.org/D55008
Patch by Luís Marques.

llvm-svn: 347991
2018-11-30 14:10:52 +00:00
Alex Bradbury 26403def69 [RISCV] Add UNIMP instruction (32- and 16-bit forms)
This patch adds support for UNIMP in both 32- and 16-bit forms. The 32-bit 
form can be seen as a variant of the ECALL/EBREAK/etc. family of instructions. 
The 16-bit form is just all zeroes, which isn't a valid RISC-V instruction, 
but still follows the 16-bit instruction form (i.e. bits 0-1 != 11).
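The length rule mentioned above (bits 0-1 != 11 means a 16-bit instruction) can be sketched in C++ as follows. Only the 16- and 32-bit forms are modelled, and the helper names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// RISC-V length decoding for the 16- and 32-bit forms: if the two lowest
// bits are not 0b11, the parcel is a 16-bit compressed instruction.
// (Longer encodings reserved by the spec are ignored here.)
unsigned instructionLengthBytes(uint32_t firstParcel) {
  return (firstParcel & 0x3u) != 0x3u ? 2 : 4;
}

// c.unimp is the all-zeroes 16-bit parcel: it has the compressed length
// but is not a valid instruction.
bool isCUnimp(uint16_t parcel) { return parcel == 0; }
```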

Until recently unimp was undocumented and supported just by binutils, which 
printed unimp for either the 16 or 32-bit form. Both forms are now documented 
<https://github.com/riscv/riscv-asm-manual/pull/20> and binutils now supports 
c.unimp <https://sourceware.org/ml/binutils-cvs/2018-11/msg00179.html>.

Differential Revision: https://reviews.llvm.org/D54316
Patch by Luís Marques.

llvm-svn: 347988
2018-11-30 13:39:17 +00:00
Alex Bradbury e0e62e97df [TargetLowering][RISCV] Introduce isSExtCheaperThanZExt hook and implement for RISC-V
DAGTypeLegalizer::PromoteSetCCOperands currently prefers to zero-extend 
operands when it is able to do so. For some targets this is more expensive 
than a sign-extension, which is also a valid choice. Introduce the 
isSExtCheaperThanZExt hook and use it in the new SExtOrZExtPromotedInteger 
helper. On RISC-V, we prefer sign-extension for FromTy == MVT::i32 and ToTy == 
MVT::i64, as it can be performed using a single instruction.
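Sign-extension is a valid alternative because it preserves the unsigned ordering of i32 values, so an sltu on promoted operands gives the same answer either way. On RV64 the sign-extension is a single instruction (sext.w, an alias of addiw), while zero-extension in base RV64I needs a pair of shifts. A small C++ check of the ordering claim; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// i32 -> i64 promotions as plain integer conversions.
uint64_t sext32To64(uint32_t v) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(v)));
}
uint64_t zext32To64(uint32_t v) { return v; }

// An unsigned less-than (sltu) on the promoted operands.
bool sltuAfterSext(uint32_t a, uint32_t b) { return sext32To64(a) < sext32To64(b); }
bool sltuAfterZext(uint32_t a, uint32_t b) { return zext32To64(a) < zext32To64(b); }
```

Values with bit 31 set are mapped above every value without it, and their relative order is kept, so the comparison result is unchanged.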

Differential Revision: https://reviews.llvm.org/D52978

llvm-svn: 347977
2018-11-30 09:56:54 +00:00
Alex Bradbury bc96a98ed0 [RISCV] Introduce codegen patterns for instructions introduced in RV64I
As discussed in the RFC 
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, 64-bit 
RISC-V has i64 as the only legal integer type.  This patch introduces patterns 
to support codegen of the new instructions 
introduced in RV64I: addw, addiw, subw, sllw, slliw, srlw, srliw, sraw,
sraiw, ld, sd.

Custom selection code is needed for srliw as SimplifyDemandedBits will remove 
lower bits from the mask, meaning the obvious pattern won't work:

def : Pat<(sext_inreg (srl (and GPR:$rs1, 0xffffffff), uimm5:$shamt), i32),
          (SRLIW GPR:$rs1, uimm5:$shamt)>;
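To illustrate the mask problem: once SimplifyDemandedBits notices that the low `shamt` bits of the AND are shifted out, it may shrink the mask (for example to 0xffffff00 when shamt is 8), so the literal 0xffffffff in the pattern no longer matches even though the computed value is unchanged. A C++ model of both forms; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// RV64I srliw: logical right shift of the low 32 bits, with the 32-bit
// result sign-extended to 64 bits.
int64_t srliw(uint64_t rs1, unsigned shamt) {
  uint32_t lo = static_cast<uint32_t>(rs1) >> (shamt & 31);
  return static_cast<int32_t>(lo);  // sign-extend bit 31 of the result
}

// The DAG shape the pattern expects:
//   (sext_inreg (srl (and x, mask), shamt), i32)
int64_t srlAndSextInReg(uint64_t x, uint64_t mask, unsigned shamt) {
  uint64_t shifted = (x & mask) >> shamt;
  return static_cast<int32_t>(static_cast<uint32_t>(shifted));  // sext_inreg i32
}
```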

This is sufficient to compile and execute all of the GCC torture suite for 
RV64I other than those files using frameaddr or returnaddr intrinsics 
(LegalizeDAG doesn't know how to promote the operands - a future patch 
addresses this).

When promoting i32 sltu/sltiu operands, it would be more efficient to use 
sign-extension rather than zero-extension for RV64. A future patch adds a hook 
to allow this.

Differential Revision: https://reviews.llvm.org/D52977

llvm-svn: 347973
2018-11-30 09:38:44 +00:00