Commit Graph

50431 Commits

Thomas Lively 6bf2b40051 [WebAssembly] Expand SIMD shifts while V8's implementation disagrees
Summary:
V8 currently implements SIMD shifts as taking an immediate operand,
which disagrees with the spec proposal and the toolchain
implementation. As a stopgap measure to get things working, unroll all
vector shifts. Since this is a temporary measure, there are no tests.

Reviewers: aheejin, dschuff

Subscribers: sbc100, jgravelle-google, sunfish, dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D56520

llvm-svn: 351151
2019-01-15 02:16:03 +00:00
Marek Olsak 33eb4d947d AMDGPU: Add a fast path for icmp.i1(src, false, NE)
Summary:
This allows moving the condition from the intrinsic to the standard ICmp
opcode, so that LLVM can do simplifications on it. The icmp.i1 intrinsic
is an identity for retrieving the SGPR mask.

We can also get the mask from `and i1`, `or i1`, and `xor i1`.

Reviewers: arsenm, nhaehnle

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D52060

llvm-svn: 351150
2019-01-15 02:13:18 +00:00
Evandro Menezes f793fe1402 [AArch64] Adjust the feature set for Exynos
Enable the fusion of arithmetic and logic instructions for Exynos M4.

llvm-svn: 351149
2019-01-15 01:53:49 +00:00
Reid Kleckner fe5e5dcab0 [X86] Avoid clobbering ESP/RSP in the epilogue.
Summary:
In r345197 ESP and RSP were added to GR32_TC/GR64_TC, allowing them to
be used for tail calls, but this also caused `findDeadCallerSavedReg` to
think they were acceptable targets for clobbering. Filter them out.

Fixes PR40289.

Patch by Geoffry Song!

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D56617

llvm-svn: 351146
2019-01-15 01:24:18 +00:00
Eli Friedman 33aecc8182 [AArch64] Explicitly use v1i64 type for llvm.aarch64.neon.abs.i64 .
Otherwise, with D56544, the intrinsic will be expanded to an integer
csel, which is probably not what the user expected.  This matches the
general convention of using "v1" types to represent scalar integer
operations in vector registers.
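
A minimal (hypothetical) IR example calling the affected intrinsic; with this
change it should be lowered via v1i64 so the computation stays in a vector
register instead of expanding to integer code:

declare i64 @llvm.aarch64.neon.abs.i64(i64)

define i64 @scalar_abs(i64 %x) {
  %r = call i64 @llvm.aarch64.neon.abs.i64(i64 %x)
  ret i64 %r
}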

While I'm here, also add some error checking so we don't generate
illegal ABS nodes.

Differential Revision: https://reviews.llvm.org/D56616

llvm-svn: 351141
2019-01-15 00:15:24 +00:00
Evandro Menezes bf59cb02c3 [AArch64] Add new target feature to fuse arithmetic and logic operations
This feature enables the fusion of some arithmetic and logic instructions.

Differential revision: https://reviews.llvm.org/D56572

llvm-svn: 351139
2019-01-14 23:54:36 +00:00
Benjamin Kramer 2fc8ede082 [X86] Fix unused variable warning in Release builds. NFC.
llvm-svn: 351136
2019-01-14 23:29:54 +00:00
Nikita Popov 5885eec35a Revert "[CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors"
This reverts commit r351125.

I missed test changes in an SLPVectorizer test, due to the cost model
changes. Reverting for now.

llvm-svn: 351129
2019-01-14 22:18:39 +00:00
Thomas Lively 2a47e03ee4 [WebAssembly][FastISel] Do not assume naive CmpInst lowering
Summary:
Fixes https://bugs.llvm.org/show_bug.cgi?id=40172. See
test/CodeGen/WebAssembly/PR40172.ll for an explanation.

Reviewers: dschuff, aheejin

Subscribers: nikic, llvm-commits, sunfish, jgravelle-google, sbc100

Differential Revision: https://reviews.llvm.org/D56457

llvm-svn: 351127
2019-01-14 22:03:43 +00:00
Nikita Popov 8e9a8432a8 [CodeGen][X86] Expand USUBSAT to UMAX+SUB, also for vectors
Related to https://bugs.llvm.org/show_bug.cgi?id=40123.

Rather than scalarizing, expand a vector USUBSAT into UMAX+SUB,
which produces much better code for X86.
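
As a rough illustration (function and value names hypothetical, not from this
change), the expansion is equivalent to the following IR, with umax written as
a compare plus select:

define <8 x i16> @usubsat_as_umax_sub(<8 x i16> %a, <8 x i16> %b) {
  %cmp = icmp ugt <8 x i16> %a, %b
  %max = select <8 x i1> %cmp, <8 x i16> %a, <8 x i16> %b   ; umax(a, b)
  %res = sub <8 x i16> %max, %b                             ; 0 wherever b >= a
  ret <8 x i16> %res
}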

Differential Revision: https://reviews.llvm.org/D56636

llvm-svn: 351125
2019-01-14 21:43:30 +00:00
Craig Topper 9906f77f82 [X86] Silence a -Wparentheses warning on gcc. NFC
llvm-svn: 351111
2019-01-14 19:44:02 +00:00
Simon Pilgrim bfe2ee453a [X86][SSSE3] Bail out of lowerVectorShuffleAsPermuteAndUnpack for shuffle-with-zero (PR40306)
If we have PSHUFB and we're shuffling with a zero vector, then we are better off not doing VECTOR_SHUFFLE(UNPCK()) as we lose track of those zero elements.

llvm-svn: 351103
2019-01-14 19:07:26 +00:00
Sanjay Patel b23ff7a0e2 [x86] lower extracted add/sub to horizontal vector math
add (extractelt (X, 0), extractelt (X, 1)) --> extractelt (hadd X, X), 0

This is the integer sibling to D56011.

There's an additional restriction to only do this transform in the
case where we don't have extra extracts from the source vector. Without
that, we can fail to match larger horizontal patterns that are more
beneficial than this minimal case. An improvement to the more general
h-op lowering may allow us to remove the restriction here in a follow-up.
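
A minimal IR sketch of the pattern this transform targets (names hypothetical);
after this change the add of the two low lanes should lower to a horizontal add
such as phaddd, with the exact instruction depending on the subtarget:

define i32 @extract_add(<4 x i32> %x) {
  %e0 = extractelement <4 x i32> %x, i32 0
  %e1 = extractelement <4 x i32> %x, i32 1
  %sum = add i32 %e0, %e1
  ret i32 %sum
}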

llvm-svn: 351093
2019-01-14 18:44:02 +00:00
Dan Gohman bbb548d85f [WebAssembly] Remove old intrinsics
This removes the old grow_memory and mem.grow-style intrinsics, leaving just
the memory.grow-style intrinsics.

Differential Revision: https://reviews.llvm.org/D56645

llvm-svn: 351084
2019-01-14 18:23:45 +00:00
Aleksandar Beserminji 4c4c0377ca [mips] Optimize shifts for types larger than GPR size (mips2/mips3)
With this patch, shifts are lowered to optimal number of instructions
necessary to shift types larger than the general purpose register size.
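
For example (hypothetical IR, not taken from the tests), a 64-bit shift on a
32-bit subtarget such as mips2 is wider than the general purpose registers and
is the kind of operation that now gets a tighter lowering:

define i64 @shl_i64(i64 %a, i64 %b) {
  %r = shl i64 %a, %b
  ret i64 %r
}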

This resolves PR32293.

Thanks to Kyle Butt for reporting the issue!

Differential Revision: https://reviews.llvm.org/D56320

llvm-svn: 351059
2019-01-14 12:28:51 +00:00
Diana Picus 8987d00653 [ARM GlobalISel] Import MOVi32imm into GlobalISel
Make it possible for TableGen to produce code for selecting MOVi32imm.
This allows reasonably recent ARM targets to select a lot more constants
than before.

We achieve this by adding GISelPredicateCode to arm_i32imm. It's
impossible to use the exact same code for both DAGISel and GlobalISel,
since one uses "Subtarget->" and the other "STI." to refer to the
subtarget. Moreover, in GlobalISel we don't have ready access to the
MachineFunction, so we need to add a bit of code for obtaining it from
the instruction that we're selecting. This is also the reason why it
needs to remain a PatLeaf instead of the more specific IntImmLeaf.

llvm-svn: 351056
2019-01-14 12:04:08 +00:00
David Stuttard f77079f892 [AMDGPU] Add support for TFE/LWE in image intrinsics. 2nd try
TFE and LWE support requires extra result registers that are written in the
event of a failure in order to detect that failure case.
The specific use-case that initiated these changes is sparse texture support.

This means that if image intrinsics are used with either option turned on, the
programmer must ensure that the return type can contain all of the expected
results. This can result in redundant registers since the vector size must be a
power-of-2.

This change takes roughly 6 parts:
1. Modify the instruction defs in tablegen to add new instruction variants that
can accommodate the extra return values.
2. Updates to lowerImage in SIISelLowering.cpp to accommodate setting TFE or LWE
(where the bulk of the work for these instruction types is now done)
3. Extra verification code to catch cases where intrinsics have been used but
insufficient return registers are used.
4. Modification to the adjustWritemask optimisation to account for TFE/LWE being
enabled (requires extra registers to be maintained for error return value).
5. An extra pass to zero-initialize the error value return - this is because if
the error does not occur, the register is not written and thus must be zeroed
before use. Also added a new (on by default) option to ensure ALL return values
are zero-initialized, which is required for sparse texture support.
6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO
for this to re-enable and handle correctly).

There's an additional fix now to avoid a dmask=0:

For an image intrinsic with tfe where all result channels except tfe
were unused, I was getting an image instruction with dmask=0 and only a
single vgpr result for tfe. That is incorrect because the hardware
assumes there is at least one vgpr result, plus the one for tfe.

Fixed by forcing dmask to 1, which gives the desired two vgpr result
with tfe in the second one.

The TFE or LWE result is returned from the intrinsics using an aggregate
type. Look in the test code provided to see how this works, but in essence IR
code to invoke the intrinsic looks as follows:

%v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15,
                                      i32 %s, <8 x i32> %rsrc, i32 1, i32 0)
%v.vec = extractvalue {<4 x float>, i32} %v, 0
%v.err = extractvalue {<4 x float>, i32} %v, 1

This re-submit of the change also includes a slight modification in
SIISelLowering.cpp to work around a compiler bug for the powerpc_le
platform that caused a buildbot failure on a previous submission.

Differential revision: https://reviews.llvm.org/D48826

Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda


Workaround for ppcle compiler bug

Change-Id: Ie284cf24b2271215be1b9dc95b485fd15000e32b
llvm-svn: 351054
2019-01-14 11:55:24 +00:00
Francis Visoiu Mistrih b7cef81fd3 Replace "no-frame-pointer-*" function attributes with "frame-pointer"
Part of the effort to refactor frame pointer code generation. We used
to use two function attributes, "no-frame-pointer-elim" and
"no-frame-pointer-elim-non-leaf", to represent three kinds of frame
pointer usage: (all) all frames use the frame pointer, (non-leaf)
non-leaf frames use the frame pointer, (none) no frames use the frame
pointer. This CL makes the idea explicit by using a single enum
function attribute, "frame-pointer".

Option "-frame-pointer=" replaces "-disable-fp-elim" for tools such as
llc.

"no-frame-pointer-elim" and "no-frame-pointer-elim-non-leaf" are still
supported for easy migration to "frame-pointer".
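
A minimal IR sketch of the migration (the "all" and "none" values are taken
from the description above; the "non-leaf" spelling is assumed):

; old style
define void @f() "no-frame-pointer-elim"="true" {
  ret void
}

; new style: "frame-pointer" is one of "all", "non-leaf", or "none"
define void @g() "frame-pointer"="all" {
  ret void
}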

Tests are mostly updated with:

// replace command line args ‘-disable-fp-elim=false’ with ‘-frame-pointer=none’
grep -iIrnl '\-disable-fp-elim=false' * | xargs sed -i '' -e "s/-disable-fp-elim=false/-frame-pointer=none/g"

// replace command line args ‘-disable-fp-elim’ with ‘-frame-pointer=all’
grep -iIrnl '\-disable-fp-elim' * | xargs sed -i '' -e "s/-disable-fp-elim/-frame-pointer=all/g"

Patch by Yuanfang Chen (tabloid.adroit)!

Differential Revision: https://reviews.llvm.org/D56351

llvm-svn: 351049
2019-01-14 10:55:55 +00:00
Petar Avramovic 7d370a36bb [MIPS GlobalISel] Add pre legalizer combiner pass
Introduce a GlobalISel pre-legalizer combiner pass for MIPS.
It will be used to cope with instructions that require
combining before legalization.

Differential Revision: https://reviews.llvm.org/D56269

llvm-svn: 351046
2019-01-14 10:27:05 +00:00
Craig Topper e7b4ea4726 [X86] Remove mask parameter from avx512 pmultishiftqb intrinsics. Use select in IR instead.
Fixes PR40259

llvm-svn: 351035
2019-01-14 08:46:45 +00:00
Craig Topper ab077dda72 [X86] Update type profile for DBPSADBW to indicate the immediate is an i8 not just any int.
Removes some type checks from X86GenDAGISel.inc

llvm-svn: 351033
2019-01-14 02:59:08 +00:00
Craig Topper c8cd85588b [X86] Remove unused intrinsic handlers. NFC
llvm-svn: 351032
2019-01-14 01:56:59 +00:00
Craig Topper 075fcc1151 [X86] Remove FPCLASS intrinsic handler. Use INTR_TYPE_2OP instead. NFC
llvm-svn: 351031
2019-01-14 01:44:09 +00:00
Craig Topper 3f3b8ef442 [X86] Remove mask parameter from vpshufbitqmb intrinsics. Change result to a vXi1 vector.
The input mask can be represented with an AND in IR.

Fixes PR40258

llvm-svn: 351028
2019-01-14 00:03:50 +00:00
Craig Topper 31156bbdb9 [X86] Add more ISD nodes to handle masked versions of VCVT(T)PD2DQZ128/VCVT(T)PD2UDQZ128 which only produce 2 result elements and zero the upper elements.
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.

Fixes another case from PR34877

llvm-svn: 351018
2019-01-13 02:59:59 +00:00
Craig Topper 4561edbec0 [X86] Add X86ISD::VMFPROUND to handle the masked case of VCVTPD2PSZ128 which only produces 2 result elements and zeroes the upper elements.
We can't represent this properly with vselect like we normally do. We also have to update the instruction definition to use a VK2WM mask instead of VK4WM to represent this.

Fixes another case from PR34877.

llvm-svn: 351017
2019-01-13 02:59:57 +00:00
Simon Pilgrim a0069ba0db [X86] More aggressive shuffle mask widening in combineExtractWithShuffle
Use demanded extract index to set most of the shuffle mask to undef, making it easier to widen and peek through.

llvm-svn: 351013
2019-01-12 16:38:56 +00:00
Simon Pilgrim a21e2bd682 [X86] Improve vXi64 ISD::ABS codegen with SSE41+
Make use of vblendvpd to select on the sign bit.
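
For reference, a pattern like the following IR (hypothetical example) is
roughly what gets recognized as ISD::ABS during SelectionDAG building; with
SSE41+ the select on the sign bit can now use (v)blendvpd:

define <2 x i64> @abs_v2i64(<2 x i64> %x) {
  %neg = sub <2 x i64> zeroinitializer, %x
  %cmp = icmp slt <2 x i64> %x, zeroinitializer
  %abs = select <2 x i1> %cmp, <2 x i64> %neg, <2 x i64> %x
  ret <2 x i64> %abs
}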

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350999
2019-01-12 10:28:12 +00:00
Simon Pilgrim ca0de0363b [X86][AARCH64] Improve ISD::ABS support
This patch takes some of the code from D49837 to allow us to enable ISD::ABS support for all SSE vector types.

Differential Revision: https://reviews.llvm.org/D56544

llvm-svn: 350998
2019-01-12 09:59:32 +00:00
Craig Topper 90fe6edcba [X86] Remove X86ISD::SELECT as it's no longer used by any of our intrinsic lowering.
llvm-svn: 350995
2019-01-12 08:15:54 +00:00
Craig Topper 33b2cf50e3 [X86] Add ISD node for masked version of CVTPS2PH.
The 128-bit input produces 64 bits of output and fills the upper 64 bits with 0. The mask only applies to the lower elements, but we can't represent this with a vselect like we normally do.

This also avoids the need to have a special X86ISD::SELECT when avx512bw isn't enabled since vselect v8i16 isn't legal there.

Fixes another instruction for PR34877.

llvm-svn: 350994
2019-01-12 08:05:12 +00:00
Alex Bradbury 61aa940074 [RISCV] Introduce codegen patterns for RV64M-only instructions
As discussed on llvm-dev
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>, we have
to be careful when trying to select the *w RV64M instructions. i32 is not a
legal type for RV64 in the RISC-V backend, so operations have been promoted by
the time they reach instruction selection. Information about whether the
operation was originally a 32-bit operation has been lost, and it's easy to
write incorrect patterns.

Similarly to the variable 32-bit shifts, a DAG combine on ANY_EXTEND will
produce a SIGN_EXTEND if this is likely to result in sdiv/udiv/urem being
selected (and so save instructions to sext/zext the input operands).
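
As a rough example (hypothetical IR), a 32-bit division on riscv64 with the M
extension: the operands are promoted to i64, and the combine described above
should let a *w instruction such as divw be selected instead of a full 64-bit
divide plus extra extension instructions:

define signext i32 @sdiv_i32(i32 signext %a, i32 signext %b) {
  %r = sdiv i32 %a, %b
  ret i32 %r
}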

Differential Revision: https://reviews.llvm.org/D53230

llvm-svn: 350993
2019-01-12 07:43:06 +00:00
Alex Bradbury d05eae7a7b [RISCV] Add patterns for RV64I SLLW/SRLW/SRAW instructions
This restores support for selecting the SLLW/SRLW/SRAW instructions, which was
removed in rL348067 as the previous patterns made some unsafe assumptions.
Also see the related llvm-dev discussion
<http://lists.llvm.org/pipermail/llvm-dev/2018-December/128497.html>

Ultimately I didn't introduce a custom SelectionDAG node, but instead added a
DAG combine that inserts an AssertZext i5 on the shift amount for an i32
variable-length shift and also added an ANY_EXTEND DAG-combine which will
instead produce a SIGN_EXTEND for an i32 variable-length shift, increasing the
opportunity to safely select SLLW/SRLW/SRAW.
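
For example (hypothetical IR), an i32 variable-amount shift on riscv64 is the
case these combines target, allowing SLLW to be selected:

define signext i32 @shl_i32(i32 signext %a, i32 signext %b) {
  %r = shl i32 %a, %b
  ret i32 %r
}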

There are obviously different ways of addressing this (a number discussed in
the llvm-dev thread), so I'd welcome further feedback and comments.

Note that there are now some cases in
test/CodeGen/RISCV/rv64i-exhaustive-w-insts.ll where sraw/srlw/sllw is
selected even though sra/srl/sll could be used without any extra instructions.
Given both are semantically equivalent, there doesn't seem to be a good reason
to prefer one over the other. Since it would require more logic to still select
sra/srl/sll in those cases, I've left it preferring the *w variants.

Differential Revision: https://reviews.llvm.org/D56264

llvm-svn: 350992
2019-01-12 07:32:31 +00:00
Craig Topper a69d903204 [X86] Remove unnecessary code from getMaskNode.
We no longer need to extend mask scalars before bitcasting them to vXi1. This was only needed for the truncate intrinsics, and was really a bug in our lowering of them.

llvm-svn: 350991
2019-01-12 06:13:44 +00:00
Craig Topper bf61525e8c [X86] When lowering v1i1/v2i1/v4i1/v8i1 load/store with avx512f, but not avx512dq, use v16i1 as the intermediate mask type instead of v8i1.
We still use i8 for the load/store type, so we need to convert to/from i16 around the mask type.

By doing this we get an i8->i16 extload which we can then pattern match to a KMOVW if the access is aligned.

llvm-svn: 350989
2019-01-12 02:22:10 +00:00
Craig Topper 8695e6dfc4 [X86] Change some patterns that select MOVZX16rm8 to instead select MOVZX32rm8 and extract the subregister.
This should be a shorter encoding and is consistent with what we do for zext i8->i16.
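
A minimal example of the affected pattern (hypothetical IR, in the typed-pointer
syntax of the time); the byte load plus zero-extend should now select
MOVZX32rm8 followed by a sub-register extract:

define zeroext i16 @zext_load(i8* %p) {
  %b = load i8, i8* %p
  %w = zext i8 %b to i16
  ret i16 %w
}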

llvm-svn: 350988
2019-01-12 02:22:06 +00:00
Evandro Menezes 7bc55e4075 [ARM] Fix typo
Fix typo in r350952.

llvm-svn: 350986
2019-01-12 01:06:43 +00:00
Craig Topper abe6ef8d09 [X86] Add ISD nodes for masked truncate so we can properly represent when the output has more elements than the input due to needing to be 128 bits.
We can't properly represent this with a vselect since the upper elements of the result are supposed to be zeroed regardless of the mask.

This also reuses the new nodes even when the result type fits in 128 bits if the input is q/d and the result is w/b since vselect w/b using k-register condition isn't legal without avx512bw. Currently we're doing this even when avx512bw is enabled, but I might change that.

This fixes some of PR34877

llvm-svn: 350985
2019-01-12 00:55:27 +00:00
Evandro Menezes 7d7e3256cd [AArch64] Improve Exynos predicates
Expand the predicate for shifted arithmetic and logic instructions to also
consider the respective unshifted instructions.

llvm-svn: 350976
2019-01-11 22:39:47 +00:00
Nirav Dave 6b7f5aac72 [X86] Fix incomplete handling of register-assigned variables in parsing.
Teach x86 assembly operand parsing to distinguish between assembler
variables assigned to named registers and those assigned to immediate
values.

Reviewers: rnk, nickdesaulniers, void

Subscribers: hiraditya, jyknight, llvm-commits

Differential Revision: https://reviews.llvm.org/D56287

llvm-svn: 350966
2019-01-11 20:17:36 +00:00
Evandro Menezes 0c14c87d00 [AArch64] Add pipeline model for Exynos M4
Add the scheduling and cost model for Exynos M4.

llvm-svn: 350960
2019-01-11 19:36:25 +00:00
Evandro Menezes 0674762112 [AArch64] Create feature set for Exynos M4
Complete the feature set for Exynos M4 and update test cases.

llvm-svn: 350953
2019-01-11 18:54:25 +00:00
Sanjay Patel 40cd4b77e9 [x86] allow insert/extract when matching horizontal ops
Previously, we limited this transform to cases where the
extraction into the build vector happens from vectors of
the same type as the build vector, but that's not required.

There's a slight potential regression seen in the AVX512
result for phadd -- we're using the 256-bit flavor of the
instruction now even though the 128-bit subset is sufficient.
The same problem could already be seen in the AVX2 result.
Follow-up patches will attempt to narrow that back down.

llvm-svn: 350928
2019-01-11 14:27:59 +00:00
Craig Topper b97885cc2e [X86] Change vXi1 extract_vector_elt lowering to be legal if the index is 0. Add DAG combine to turn scalar_to_vector+extract_vector_elt into extract_subvector.
We were lowering the last step extract_vector_elt to a bitcast+truncate. Change it to use an extract_vector_elt of index 0 instead. Add isel patterns to do the equivalent of what the bitcast would have done. Plus an isel pattern for an any_extend+extract to prevent some regressions.

Finally add a DAG combine to turn v1i1 scalar_to_vector+extract_vector_elt of 0 into an extract_subvector.

This fixes some of the regressions from r350800.

llvm-svn: 350918
2019-01-11 05:44:56 +00:00
Heejin Ahn e73c7a1ab2 [WebAssembly] Fix stack pointer store check in RegStackify
Summary:
We now use the __stack_pointer global and the global.get/global.set instructions.
This fixes the checking routine for __stack_pointer writes accordingly.

This also fixes the existing __stack_pointer test in reg-stackify.ll:
That test used to pass not because of __stack_pointer clashes but
because the function `stackpointer_callee` was not marked as `readnone`,
so it was assumed to possibly write to memory arbitrarily, and the
`global.set` instruction was marked as `mayStore` in the .td definition,
so they were identified as intervening writes. After we added `readnone`
to its attributes, this test fails without this patch.

Reviewers: dschuff, sunfish

Subscribers: jgravelle-google, sbc100, llvm-commits

Differential Revision: https://reviews.llvm.org/D56094

llvm-svn: 350906
2019-01-10 23:12:07 +00:00
Anton Korobeynikov 0681d6bc90 [MSP430] Minor fixes/improvements for assembler/disassembler
* Teach AsmParser to recognize @rn in the destination operand as 0(rn).
* Do not allow the Disassembler to decode instructions whose size is larger
  than the number of input bytes.
* Fix UB in MSP430MCCodeEmitter.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56547

llvm-svn: 350903
2019-01-10 22:59:50 +00:00
Anton Korobeynikov 29ffb6d558 [MSP430] Add missing instruction forms
* Add missing mm, [r|m]n, [r|m]p instruction forms.
* Fix bit16mc instruction.

Patch by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D56546

llvm-svn: 350902
2019-01-10 22:54:53 +00:00
Thomas Lively 64a39a1c4e [WebAssembly] Add unimplemented-simd128 subtarget feature
Summary:
This is a third attempt, but this time we have vetted it on Windows
first. The previous errors were due to an uninitialized class member.

Reviewers: aheejin

Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D56560

llvm-svn: 350901
2019-01-10 22:32:11 +00:00
Craig Topper 844f989608 [X86] Call SimplifyDemandedBits on conditions of X86ISD::SHRUNKBLEND
This extends combineVSelectToShrunkBlend to be able to resimplify SHRUNKBLENDs that have already been created.

This should help some of the regressions from D56387.

Differential Revision: https://reviews.llvm.org/D56421

llvm-svn: 350875
2019-01-10 19:05:34 +00:00
Craig Topper 350e6e9d7c [X86] Simplify the BRCOND handling for FCMP_UNE.
Despite what the comment says, FCMP_UNE would be an OR, not an AND. In the lowering code the first branch created still goes to the original destination. The second branch was exchanged to go to where the subsequent unconditional branch went. This is different from what we do for FCMP_OEQ, where both branches that we create go to the original unconditional branch.

As far as I can tell, I think this means we don't need to exchange the branch target with the unconditional branch for FCMP_UNE at all.

Differential Revision: https://reviews.llvm.org/D56309

llvm-svn: 350873
2019-01-10 19:02:14 +00:00