Commit Graph

19838 Commits

Author SHA1 Message Date
Sanjay Patel 8db22f9032 [x86] add memcmp tests, remove run
Add tests for vector lengths that could be handled without a libcall.

There's an explicit test for 'nobuiltin', so there's not much value
in a separate run that checks that same behavior over and over again.

llvm-svn: 298611
2017-03-23 15:38:22 +00:00
Igor Breger a8ba572dcf [GlobalISel][X86] Support G_STORE/G_LOAD operation
Summary:
1. Support pointer type as function argumnet and return value
2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128
3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank.
4. Support instruction selection for G_LOAD/G_STORE

Reviewers: zvi, rovka, ab, qcolombet

Reviewed By: rovka

Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

Differential Revision: https://reviews.llvm.org/D30973

llvm-svn: 298609
2017-03-23 15:25:57 +00:00
Nirav Dave e9ca32ae52 [SDAG] Fix zeroExtend assertion error
Move CombineTo preventing deleted node from being returned in
visitZERO_EXTEND.

Fixes PR32284.

Reviewers: RKSimon, bogner

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31254

llvm-svn: 298604
2017-03-23 15:01:50 +00:00
Simon Pilgrim d341290b5c [X86][SSE] Add computeNumSignBits test for sitofp of (extended) i64 extracted element
llvm-svn: 298592
2017-03-23 13:18:09 +00:00
Michael Zuckerman 85436ece89 [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction
Up until now, vpmovm2 instruction described its destination operand size
by the source operand size. This patch adds new pattern for the vpmovm2
instruction. The node describes new expansion of the destination (from
{128|256} to 512).

Differential Revision: https://reviews.llvm.org/D30654

llvm-svn: 298586
2017-03-23 09:57:01 +00:00
Artyom Skrobov 92c0653095 Reapply r298417 "[ARM] Recommit the glueless lowering of addc/adde in Thumb1"
The UB in t2_so_imm_neg conversion has been addressed under D31242 / r298512

This reverts commit r298482.

llvm-svn: 298562
2017-03-22 23:35:51 +00:00
Konstantin Zhuravlyov 4cbb68959b [AMDGPU] Do not emit isa info as code object metadata
- It was decided to expose this information through other means (rocr)

Differential Revision: https://reviews.llvm.org/D30970

llvm-svn: 298560
2017-03-22 23:27:09 +00:00
Konstantin Zhuravlyov a780ffaac2 [AMDGPU] Emit kernel debug properties as code object metadata
Differential Revision: https://reviews.llvm.org/D30969

llvm-svn: 298558
2017-03-22 23:10:46 +00:00
Konstantin Zhuravlyov ca0e7f6472 [AMDGPU] Emit kernel code properties as code object metadata
- These are not required for low level runtime

Differential Revision: https://reviews.llvm.org/D29949

llvm-svn: 298556
2017-03-22 22:54:39 +00:00
Sanjay Patel 51e077924d [x86] improve tests, add tests, auto-generate checks; NFC
llvm-svn: 298553
2017-03-22 22:39:17 +00:00
Konstantin Zhuravlyov 7498cd61fb [AMDGPU] Restructure code object metadata creation
- Rename runtime metadata -> code object metadata
  - Make metadata not flow
  - Switch enums to use ScalarEnumerationTraits
  - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
  - Introduce in-memory representation for attributes
  - Code object metadata streamer
  - Create metadata for isa and printf during EmitStartOfAsmFile
  - Create metadata for kernel during EmitFunctionBodyStart
  - Finalize and emit metadata to .note during EmitEndOfAsmFile
  - Other minor improvements/bug fixes

Differential Revision: https://reviews.llvm.org/D29948

llvm-svn: 298552
2017-03-22 22:32:22 +00:00
Artyom Skrobov 50a066b313 [ARM] t2_so_imm_neg had a subtle bug in the conversion, and could trigger UB by negating (int)-2147483648. By pure luck, none of the pre-existing tests triggered this; so I'm adding one.
Summary: Thanks to Vitaly Buka for helping catch this.

Reviewers: rengolin, jmolloy, efriedma, vitalybuka

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D31242

llvm-svn: 298512
2017-03-22 15:09:30 +00:00
Simon Pilgrim 345481599f [X86] Add multiply by constant tests (PR28513)
As discussed on PR28513, add tests for constant multiplication by constants between 1 to 32

llvm-svn: 298497
2017-03-22 12:03:56 +00:00
Jonas Paulsson 808c89f467 [SystemZ] Don't drop any operands in expandZExtPseudo()
Make sure that any operands, e.g. of an implicit def of a super reg is
transferred to the new instruction.

Review: Ulrich Weigand
llvm-svn: 298484
2017-03-22 06:03:32 +00:00
Vitaly Buka e69c137f90 Revert "[ARM] Recommit the glueless lowering of addc/adde in Thumb1, including the amended (no UB anymore) fix for adding/subtracting -2147483648."
Fails check-llvm with ubsan

This reverts commit r298417.

llvm-svn: 298482
2017-03-22 05:07:44 +00:00
Aditya Nandakumar bc389badbc [GlobalISel]: Create VREGs for ConstantInt args
This patch changes the behavior of IRTranslating intrinsics where we
now create VREG + G_CONSTANT for ConstantInt values. We already do this
for FloatingPoint values. This makes it easier for the backends to
select code and it won't have to de-duplicate creation+selection of
constants.

Reviewed by: ab

llvm-svn: 298473
2017-03-22 01:16:39 +00:00
Ahmed Bougacha 15b3e8a93a [GlobalISel] Update DBG_VALUEs referencing DCE'd instructions.
Quentin points out that r298358 would cause us to emit different code
with debug info.  That's a big no-no; also erase the instructions that
only live thanks to DBG_VALUE users.

Adrian explained how this is an existing problem and an OK thing to do:
clang has allocas for all variables so shouldn't be affected at -O0, but
swift uses a bit of inlineasm to explicitly keep values live for the
purpose of debug info quality.  I'm not sure there is a better scheme.

llvm-svn: 298460
2017-03-21 23:42:54 +00:00
Ahmed Bougacha e8e1fa3a7c [GlobalISel] Don't translate br to layout successor.
MI can represent fallthrough to layout successor blocks, and our
post-isel representation uses that extensively.

We might as well use it too, to avoid translating and carrying along
unnecessary branches.

llvm-svn: 298459
2017-03-21 23:42:50 +00:00
Matt Arsenault 5b20fbb748 AMDGPU: Rename SI_RETURN
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.

llvm-svn: 298452
2017-03-21 22:18:10 +00:00
Matthias Braun 8445cbd1ca SplitKit: Fix subreg copy related problems
Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases leading to crashs
  when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
  block.

This fixes http://llvm.org/PR32353

llvm-svn: 298448
2017-03-21 21:58:08 +00:00
Matt Arsenault 3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Tim Northover dd4b9d6d7b GlobalISel: widen booleans by zero-extending to a byte.
A bool is represented by a single byte, which the ARM ABI requires to be either
0 or 1. So we cannot use G_ANYEXT when legalizing the type.

llvm-svn: 298439
2017-03-21 21:12:04 +00:00
George Burgess IV 56c7e88c2c Let llvm.objectsize be conservative with null pointers
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.

This fixes PR23277.

Differential Revision: https://reviews.llvm.org/D28494

llvm-svn: 298430
2017-03-21 20:08:59 +00:00
Artyom Skrobov 40a4f40679 [ARM] Recommit the glueless lowering of addc/adde in Thumb1,
including the amended (no UB anymore) fix for adding/subtracting -2147483648.

This reverts r298328 "[ARM] Revert r297443 and r297820."
and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648""

llvm-svn: 298417
2017-03-21 18:39:41 +00:00
Krzysztof Parzyszek d033d1fd82 Recommit r298282 with fixes for memory allocation/deallocation
[Hexagon] Recognize polynomial-modulo loop idiom again

Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.

llvm-svn: 298400
2017-03-21 17:09:27 +00:00
Marek Olsak 5c7a61d221 AMDGPU: Buffer descriptor changes for GFX9
Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31158

llvm-svn: 298397
2017-03-21 17:00:39 +00:00
Marek Olsak e22fdb9cac AMDGPU: Always use VGPR indexing on GFX9
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31157

llvm-svn: 298396
2017-03-21 17:00:32 +00:00
Krzysztof Parzyszek 5e7f06f354 [Hexagon] Add -march=hexagon to a testcase
llvm-svn: 298395
2017-03-21 16:59:40 +00:00
Matt Arsenault f8fb605a68 AMDGPU: Fix asserting on 0 dmask for image intrinsics
Fold these to undef during lowering so users get eliminated.

llvm-svn: 298387
2017-03-21 16:32:17 +00:00
Matt Arsenault 964a848514 AMDGPU: Convert image intrinsic uses in tests
llvm-svn: 298386
2017-03-21 16:24:12 +00:00
Matt Arsenault dce313c3cf DAG: Fold bitcast/extract_vector_elt of undef to undef
Fixes not eliminating store when intrinsic is lowered to undef.

llvm-svn: 298385
2017-03-21 16:20:16 +00:00
Simon Pilgrim 5e39cbaee5 Fix shufpd test name.
llvm-svn: 298381
2017-03-21 15:12:53 +00:00
Sanjay Patel 79379cae15 [x86] use PMOVMSK for vector-sized equality comparisons
We could do better by splitting any oversized type into whatever vector size the target supports, 
but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte
structs, so I think we can wire that up with a TLI hook that feeds into this.

Differential Revision: https://reviews.llvm.org/D31156

llvm-svn: 298376
2017-03-21 13:50:33 +00:00
Simon Pilgrim 8bda035121 [X86][AVX] Tests showing missing SHUFPD + ZERO lowering
This lowers to SHUFPD if the input is zeroinitializer but not with a demanded elts optimized build vector.

llvm-svn: 298370
2017-03-21 13:30:40 +00:00
Valery Pykhtin fd4c410f4d [AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler
Differential revision: https://reviews.llvm.org/D31046

llvm-svn: 298368
2017-03-21 13:15:46 +00:00
Volkan Keles 044e003203 [GlobalISel] Fix shufflevector tests
clang-lld-x86_64-2stage fails because of the order
of the instructions. `CHECK-DAG` directives should
fix the problem.

llvm-svn: 298367
2017-03-21 13:12:59 +00:00
Sam Kolton f60ad58dad [ADMGPU] SDWA peephole optimization pass.
Summary:
First iteration of SDWA peephole.

This pass tries to combine several instruction into one SDWA instruction. E.g. it converts:
'''
    V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
    V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
    V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
   V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''

Pass structure:
    1. Iterate over machine instruction in basic block and try to apply "SDWA patterns" to each of them. SDWA patterns match machine instruction into either source or destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
    2. Iterate over found SDWA operands and find instruction that could be potentially coverted into SDWA. E.g. for source SDWA operand potential instruction are all instruction in this basic block that uses '''%vreg0'''
    3. Iterate over all potential instructions and check if they can be converted into SDWA.
    4. Convert instructions to SDWA.

This review contains basic implementation of SDWA peephole pass. This pass requires additional testing fot both correctness and performance (no performance testing done).
There are several ways this pass can be improved:
    1. Make this pass work on whole function not only basic block. As I can see this can be done right now without changes to pass.
    2. Introduce more SDWA patterns
    3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365
2017-03-21 12:51:34 +00:00
Andrea Di Biagio 7937be7dd3 [DebugInfo][X86] Teach Optimize LEAs pass to handle debug values
This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were
not removed because they were being used by debug values. The debug values are
now ignored when determining whether LEAs are redundant.

For now the debug values for the redundant LEAs are marked as undefined,
effectively lost. The intention is for a follow up patch which will attempt to
preserve the debug values where possible.

Patch by Andrew Ng.

Differential Revision: https://reviews.llvm.org/D30835

llvm-svn: 298360
2017-03-21 11:36:21 +00:00
Jonas Paulsson 54c7680e1f [DAGTypeLegalizer] Handle widening truncate to vector of i1.
Previously, PromoteIntRes_TRUNCATE() did not handle the case where
the operand needs widening, which resulted in llvm_unreachable().

This patch adds the needed handling, along with a test case.

Review: Eli Friedman, Simon Pilgrim.
https://reviews.llvm.org/D31077

llvm-svn: 298357
2017-03-21 10:24:14 +00:00
Volkan Keles 75bdc7690e [GlobalISel] Translate shufflevector
Reviewers: qcolombet, aditya_nandakumar, t.p.northover, javed.absar, ab, dsanders

Reviewed By: javed.absar

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30962

llvm-svn: 298347
2017-03-21 08:44:13 +00:00
Jonas Paulsson bd65421f08 [SystemZ] Don't drop MO flags in foldMemoryOperandImpl()
The def operand of the new LG/LD should have the old def operands
flags and subreg index.

New test: test/CodeGen/SystemZ/fold-memory-op-impl.ll

Review: Ulrich Weigand
llvm-svn: 298341
2017-03-21 05:49:40 +00:00
Vitaly Buka c12716e742 Revert "[Hexagon] Recognize polynomial-modulo loop idiom again"
Fix memory leaks on check-llvm tests detected by Asan.

This reverts commit r298282.

llvm-svn: 298329
2017-03-21 00:59:51 +00:00
Eli Friedman 76732acc23 [ARM] Revert r297443 and r297820.
The glueless lowering of addc/adde in Thumb1 has known serious
miscompiles (see https://reviews.llvm.org/D31081), and r297820
causes an infinite loop for certain constructs.  It's not
clear when they will be fixed, so let's just take them out
of the tree for now.

(I resolved a small conflict with r297453.)

llvm-svn: 298328
2017-03-21 00:26:39 +00:00
Vadzim Dambrouski ba789cbd3d [ARM] Fix PR32130: Handle promotion of zero sized constants.
The special case of zero sized values was previously not handled correctly.
This patch handles this by not promoting if the size is zero.

Patch by Tim Neumann.

Differential Revision: https://reviews.llvm.org/D31116

llvm-svn: 298320
2017-03-20 22:59:57 +00:00
Sanjay Patel f238902f52 [x86] add tests for setcc of i128/i256; NFC
llvm-svn: 298317
2017-03-20 22:15:40 +00:00
Tim Northover 4340d64f91 GlobalISel: add implicit defs & uses when mutating an instruction.
Otherwise a scheduler might do bad things to the code we produce.

llvm-svn: 298311
2017-03-20 21:58:23 +00:00
David L. Jones d61548471c [X86] Clean up test/CodeGen/X86/2006-03-01-InstrSchedBug.ll
Summary:
- Migrated from grep to FileCheck.
- Re-indented, removed boilerplate comments.
- Added 'entry' label at beginning of basic block.

Patch by Jorge Gorbe!

Reviewed By: RKSimon

Subscribers: RKSimon, jgorbe, llvm-commits

Differential Revision: https://reviews.llvm.org/D30317

llvm-svn: 298298
2017-03-20 20:10:30 +00:00
Nirav Dave f5f0864ac2 Add test case for merging of chained stores of mismatched type.
llvm-svn: 298293
2017-03-20 19:48:22 +00:00
Krzysztof Parzyszek 8490251de3 [Hexagon] Recognize polynomial-modulo loop idiom again
Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.

llvm-svn: 298282
2017-03-20 18:12:58 +00:00
Konstantin Zhuravlyov 2534bc07f4 [AMDGPU] Run always inliner early in opt
Differential Revision: https://reviews.llvm.org/D31141

llvm-svn: 298281
2017-03-20 18:06:45 +00:00
Reid Kleckner 8819c73878 [WinEH] Adjust decision to emit SEH moves for leaf functions
Move the check for "MF->hasWinCFI()" up into the calculation of the
shouldEmitMoves boolean, rather than putting it in the early returning
if. This ensures that endFunction doesn't try to emit .seh_* directives
for leaf functions.

llvm-svn: 298276
2017-03-20 17:45:59 +00:00
Tim Northover 89268b183f GlobalISel: allow quad-precision values to be dumped.
Otherwise the fallback path fails with an assertion on AAPCS AArch64 targets,
when "long double" is encountered.

llvm-svn: 298273
2017-03-20 16:52:08 +00:00
Diana Picus d79253a9f7 [GlobalISel] Use the correct calling conv for calls
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.

Differential Revision: https://reviews.llvm.org/D31039

llvm-svn: 298254
2017-03-20 14:40:18 +00:00
Konstantin Zhuravlyov 8a67eb144f Revert "[AMDGPU] Run always inliner early in opt"
This reverts commit r297958, it breaks device-libs build.

llvm-svn: 298239
2017-03-20 09:26:08 +00:00
Craig Topper 5992c8d1dc [AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel
Summary:
Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coallescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust.

This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test break. But this patch should dodge the problem entirely.

Reviewers: zvi, delena, RKSimon, igorb

Reviewed By: igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31056

llvm-svn: 298228
2017-03-19 17:11:09 +00:00
Simon Pilgrim 8424df7dea Fix constant folding of fp2int to large integers
We make the assumption in most of our constant folding code that a fp2int will target an integer of 128-bits or less, calling the APFloat::convertToInteger with only uint64_t[2] of raw bits for the result.

Fuzz testing (PR24662) showed that we don't handle other cases at all, resulting in stack overflows and all sorts of crashes.

This patch uses the APSInt version of APFloat::convertToInteger instead to better handle such cases.

Differential Revision: https://reviews.llvm.org/D31074

llvm-svn: 298226
2017-03-19 16:50:25 +00:00
Ahmed Bougacha 931904d777 [GlobalISel] Don't select trivially dead instructions.
Folding instructions when selecting can cause them to become dead.
Don't select these dead instructions (if they don't have other side
effects, and don't define physical registers).

Preserve existing tests by adding COPYs.

In some tests, the G_CONSTANT vregs never get constrained to a class:
the only use of the vreg was folded into another instruction, so the
G_CONSTANT, now dead, never gets selected.

llvm-svn: 298224
2017-03-19 16:13:00 +00:00
Ahmed Bougacha 48bcd22ce8 [GlobalISel][AArch64] Add DBG_VALUE select test. NFC.
llvm-svn: 298223
2017-03-19 16:12:53 +00:00
Ahmed Bougacha dcd416a4b9 [GlobalISel][AArch64] Split out cast select tests. NFC.
And remove some redundant bitcast tests.

Also split the test functions themselves: it makes it obvious to see
what's tested where and what isn't, it makes the tests much easier to
read and manually update, and, most importantly, it makes them almost
trivial to update using tooling.  Yes, it's obnoxiously verbose, but
said tooling helps upgrade to better MIR syntax whenever available.

llvm-svn: 298222
2017-03-19 16:12:51 +00:00
Oren Ben Simhon 75537b6566 [MIR] Test assumes x64 windows calling convention upon printing/parsing MIR output/input.
llvm-svn: 298212
2017-03-19 13:23:20 +00:00
Benjamin Kramer 6520f83ba4 [MIR] Add triple to test that assumes it runs on windows.
llvm-svn: 298211
2017-03-19 13:04:35 +00:00
Oren Ben Simhon 9ce0ec5dbc CalleeSavedRegister was removed from MIR and is recalculated upon MIR parsing.
llvm-svn: 298210
2017-03-19 11:18:09 +00:00
Oren Ben Simhon a96fdbf233 Moving the test to x86 because other architectures do not suport regcall calling convention.
llvm-svn: 298209
2017-03-19 08:53:42 +00:00
Oren Ben Simhon 0ef61ec32a [MIR] Support Customed Register Mask and CSRs
The MIR printer dumps a string that describe the register mask of a function.
A static predefined list of register masks matches a static list of strings.
However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails.
This patch adds support to custom register mask printing and dumping.
Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic.
As such this data needs to be dumped and parsed back to the Machine Register Info.

Differential Revision: https://reviews.llvm.org/D30971

llvm-svn: 298207
2017-03-19 08:14:18 +00:00
Nirav Dave ac6081cb67 Make library calls sensitive to regparm module flag (Fixes PR3997).
Reviewers: mkuper, rnk

Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27050

llvm-svn: 298179
2017-03-18 00:44:07 +00:00
Stanislav Mekhanoshin 8e45acfc38 [AMDGPU] Add address space based alias analysis pass
This is direct port of HSAILAliasAnalysis pass, just cleaned for
style and renamed.

Differential Revision: https://reviews.llvm.org/D31103

llvm-svn: 298172
2017-03-17 23:56:58 +00:00
Sanjay Patel 0429b1a431 [x86] regenerate checks; NFC
llvm-svn: 298166
2017-03-17 23:04:18 +00:00
Sanjay Patel 77e6ebe748 [x86] regenerate checks; NFC
llvm-svn: 298164
2017-03-17 22:47:21 +00:00
Jessica Paquette ea8cc09be0 [Outliner] Add outliner for AArch64
This commit adds the necessary target hooks for outlining in AArch64. It also
refactors the switch statement used in `getMemOpBaseRegImmOfsWidth` into a
more general function, `getMemOpInfo`. This allows the outliner to share that
code without copying and pasting it.

The AArch64 outliner can be run using -mllvm -enable-machine-outliner, as with
the X86-64 outliner.

The test for this pass verifies that the outliner does, in fact outline
functions, fixes up the stack accesses properly, and can correctly generate a
tail call. In the future, this test should be replaced with a MIR test, so that
we can properly test immediate offset overflows in fixed-up instructions.

llvm-svn: 298162
2017-03-17 22:26:55 +00:00
Evgeniy Stepanov 51c962f72e Add !associated metadata.
This is an ELF-specific thing that adds SHF_LINK_ORDER to the global's section
pointing to the metadata argument's section. The effect of that is a reverse dependency
between sections for the linker GC.

!associated does not change the behavior of global-dce. The global
may also need to be added to llvm.compiler.used.

Since SHF_LINK_ORDER is per-section, !associated effectively enables
fdata-sections for the affected globals, the same as comdats do.

Differential Revision: https://reviews.llvm.org/D29104

llvm-svn: 298157
2017-03-17 22:17:24 +00:00
Eli Friedman 46ddab3810 [SelectionDAG] Remove redundant stores more aggressively.
Handle TokenFactors more aggressively in
SDValue::reachesChainWithoutSideEffects.  This isn't really a
very effective change anymore because of other changes to
chain handling, but it's a cheap check, and the expanded
comments are still useful.

It might be possible to loosen the hasOneUse() requirement with a
deeper analysis, but a naive implementation of that check would be
expensive.

Differential Revision: https://reviews.llvm.org/D29845

llvm-svn: 298156
2017-03-17 22:15:50 +00:00
Matt Arsenault e70d5dcf3e AMDGPU: Fix handling of constant phi input loop conditions
If the loop condition was an i1 phi with a constantexpr input, this
would add a loop intrinsic fed by a phi dependent on a call to
if.break in the same block. Insert the call in the loop header.

llvm-svn: 298121
2017-03-17 20:52:21 +00:00
Sanjay Patel 455703a0c6 [x86] clean up setcc with negated operand transform and add missing test; NFCI
llvm-svn: 298118
2017-03-17 20:29:40 +00:00
Reid Kleckner edf1cbb580 [X86] Emit fewer instructions to allocate >16GB stack frames
Summary:
Use this code pattern when RAX is live, instead of emitting up to 2
billion adjustments:
  pushq %rax
  movabsq +-$Offset+-8, %rax
  addq %rsp, %rax
  xchg %rax, (%rsp)
  movq (%rsp), %rsp

Try to clean this code up a bit while I'm here. In particular, hoist the
logic that handles the entire adjustment with `movabsq $imm, %rax` out
of the loop.

This negates the offset in the prologue and uses ADD because X86 only
has a two operand subtract which always subtracts from the destination
register, which can no longer be RSP.

Fixes PR31962

Reviewers: majnemer, sdardis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30052

llvm-svn: 298116
2017-03-17 20:25:49 +00:00
Jun Bum Lim 4230101def [CodeGenPrep]Restructure promoting Ext to form ExtLoad
Summary:
Instead of just looking for a load which is mergable with Ext to form ExtLoad, trying to promote Exts as long as the cost is acceptable. This change is not a NFC as it continue promoting Exts even after finding a load during promotions; the change in arm64-codegen-prepare-extload.ll described in 2.b might show the case.
This change was motivated from D26524.  Based on this change, I will move the transformation performed in aarch64-type-promotion into CGP.

Reviewers: jmolloy, qcolombet, mcrosier, javed.absar

Reviewed By: qcolombet

Subscribers: rengolin, llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D27853

llvm-svn: 298114
2017-03-17 19:05:21 +00:00
Simon Pilgrim 5a68d401c7 [SelectionDAG] Add SelectionDAG.computeKnownBits test support for ISD::ABS
llvm-svn: 298108
2017-03-17 17:45:36 +00:00
Sanjay Patel 25bd713d33 [x86] avoid adc/sbb assert when both sides of add are zexted (PR32316)
As noted in the comment, we might want to account for this case,
but I didn't look at what that would mean for the asm. 

I'm also not sure why this only reproduces with avx512, but I'm 
putting a conservative fix in for now to avoid the crash. 

Also, if both sides of an add are zexted, shouldn't we shrink that add?

https://bugs.llvm.org/show_bug.cgi?id=32316

llvm-svn: 298107
2017-03-17 17:27:31 +00:00
Simon Pilgrim d06b025c9c [X86] Add SelectionDAG.computeKnownBits test showing inability to handle ISD::ABS
We have to be careful as abs(INT_MIN) == INT_MIN.

llvm-svn: 298103
2017-03-17 16:58:15 +00:00
Chad Rosier a69dcb6b66 [AArch64] Use alias analysis in the load/store optimization pass.
This allows the optimization to rearrange loads and stores more aggressively.

Differential Revision: http://reviews.llvm.org/D30903

llvm-svn: 298092
2017-03-17 14:19:55 +00:00
Craig Topper a8d4097445 [AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled regardless of whether +fma was added on the command line.
We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F was enabled but VLX and FMA weren't.

I didn't mask FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted disabling FMA to also disable AVX512. Instead we just can't prevent FMA instructions if AVX512 is enabled.

Another option would be to promote 128/256-bit to 512-bit, do the operation and extract it. But that requires a lot of extra isel patterns. Since no CPUs exist that support AVX512, but not FMA just using the VEX instructions seems better.

llvm-svn: 298051
2017-03-17 07:37:31 +00:00
Jonas Paulsson f496bd9a59 [SystemZ] New CodeGen tests for vector compare / select.
New SystemZ tests for the improved codegen of vector compare and select,
including cases with a logical combination of two compares.

Review: Ulrich Weigand.
https://reviews.llvm.org/D29489

llvm-svn: 298049
2017-03-17 07:11:46 +00:00
Jonas Paulsson 8a7bd24c82 [SystemZ] Add use of super-reg in splitMove()
If one of the subregs of the 128 bit reg is undefined when splitMove() splits
a store into two instructions, a use of an undefined physical register
results.

To remedy this, an implicit use of the super register is added onto both new
instructions, along with propagated kill and undef flags.

This was discovered with llvm-stress, and that test case is attached as
test/CodeGen/SystemZ/splitMove_undefReg_mverifier.ll

Thanks to Matthias Braun for helping with a nice explanation.

Review: Ulrich Weigand
llvm-svn: 298047
2017-03-17 06:47:08 +00:00
Craig Topper 6a1290a0fd [AVX-512] Give priority to EVEX encoded scalar FMA instructions when we have FMA, AVX512 and no VLX.
We were giving priority if VLX was enabled.

llvm-svn: 298046
2017-03-17 06:10:37 +00:00
Craig Topper c1338f21ed [X86] Use update_llc_test_checks.py to regenerate a test and add command lines to demonstrate that we don't pick EVEX encoded instruction when AVX512 and FMA3 are both enabled.
This bug only exists on the scalar llvm.fma instrinsics. Looks like we don't test the llvm.fma intrinsics very thoroughly. In fact I don't see any tests for the vector versions.

llvm-svn: 298045
2017-03-17 06:00:01 +00:00
Craig Topper 30c89eeeb6 [X86] Use update_llc_test_checks.py to regenerate a test.
llvm-svn: 298044
2017-03-17 05:59:57 +00:00
Matthias Braun f0b68d3fbc SplitKit: Correctly implement partial subregister copies
- This fixes a bug where subregister incompatible with the vregs register
  class where used.
- Implement the case where multiple copies are necessary to cover a
  given lanemask.

Differential Revision: https://reviews.llvm.org/D30438

llvm-svn: 298025
2017-03-17 00:41:39 +00:00
Eli Friedman da228fee0c [ARM] Use alias analysis in ARMPreAllocLoadStoreOpt.
This allows the optimization to rearrange loads and stores more
aggressively. This doesn't really affect performance, but it helps
codesize.

Differential Revision: https://reviews.llvm.org/D30839

llvm-svn: 298021
2017-03-17 00:34:26 +00:00
Kyle Butt d609d6ebf0 CodeGen: BlockPlacement: Adjust test case so it covers rL297925. NFC
I had ajusted the test case before when testing a chain of length 2, and then
reverted it with rL296845 when I switched to 3 triangles. After running
benchmarks and examining generated code at length 2 I forgot to put the test
back.

llvm-svn: 298000
2017-03-16 21:33:29 +00:00
Daniel Sanders 16846764d0 [globalisel] Correct one more simple immediate that should be a ConstantInt.
llvm-svn: 297979
2017-03-16 19:59:19 +00:00
Craig Topper 96efc5c227 [AVX-512] Add tests for kandn, kor, kxor, and kxnor intrinsics.
llvm-svn: 297978
2017-03-16 19:58:06 +00:00
Daniel Sanders 0e64202871 [globalisel] Correct G_CONSTANT path of selectArithImmed()
Earlier stages of GlobalISel always use ConstantInt in G_CONSTANT so that's
what we should check for.

This fixes a crash introduced in r297782.

llvm-svn: 297968
2017-03-16 18:04:50 +00:00
Adrian Prantl 981f03e6a2 PR32288: More efficient encoding for DWARF expr subregister access.
Citing http://bugs.llvm.org/show_bug.cgi?id=32288

  The DWARF generated by LLVM includes this location:

  0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
  0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
  to assume the DWARF consumer knows which part of a register
  logically holds the value (low bytes, high bytes, how many bytes,
  etc) for a primitive value like an integer.

This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.

(This reapplies r297960 with two additional testcase updates).

rdar://problem/31069390
https://reviews.llvm.org/D31010

llvm-svn: 297965
2017-03-16 17:14:56 +00:00
Stanislav Mekhanoshin f80507979d [AMDGPU] Run always inliner early in opt
We can mark functions to always inline early in the opt. Since we do not have
call support this early inlining creates opportunities for inter-procedural
optimizations which would not occur otherwise.

Differential Revision: https://reviews.llvm.org/D31016

llvm-svn: 297958
2017-03-16 16:11:46 +00:00
Simon Pilgrim e5d7e6f5e3 [X86] Add PR22338 test case
llvm-svn: 297957
2017-03-16 15:10:42 +00:00
Jonas Paulsson 84319bfc40 [SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.
Don't scalarize VSELECT->SETCC when operands/results needs to be widened,
or when the type of the SETCC operands are different from those of the VSELECT.

(VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled.

The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is
no longer needed and has been removed.

Updated tests:

test/CodeGen/ARM/vuzp.ll
test/CodeGen/NVPTX/f16x2-instructions.ll
test/CodeGen/X86/2011-10-19-widen_vselect.ll
test/CodeGen/X86/2011-10-21-widen-cmp.ll
test/CodeGen/X86/psubus.ll
test/CodeGen/X86/vselect-pcmp.ll

Review: Eli Friedman, Simon Pilgrim
https://reviews.llvm.org/D29489

llvm-svn: 297930
2017-03-16 07:17:12 +00:00
Kyle Butt 08655997eb CodeGen: BlockPlacement: Reduce TriangleChainCount to 2
This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
test-suite. Only 5 targets in the entire llvm test-suite are affected,
and on those 5 targets the size increase is 0.027%

llvm-svn: 297925
2017-03-16 01:32:29 +00:00
Matt Arsenault 7dc01c96ae AMDGPU: Allow sinking of addressing modes for atomic_inc/dec
llvm-svn: 297913
2017-03-15 23:15:12 +00:00
Matt Arsenault 02d915be90 CodeGenPrepare: Sink addressing modes for atomics
llvm-svn: 297903
2017-03-15 22:35:20 +00:00
Zvi Rackover 9ad1b235d5 Second attempt for fix Hexagon buildbot by moving test to under X86/
llvm-svn: 297893
2017-03-15 21:13:45 +00:00
Zvi Rackover de43c859f5 Limit test's triple in attempt to fix broken buildbot
Regression test for a target-independent bug keeps failing in the
Hexagon backend due to what appears an unrelated issue.

llvm-svn: 297888
2017-03-15 20:29:58 +00:00
Zvi Rackover 48cdde0e59 [DAGCombine] Bail out if can't create a vector with at least two elements
Summary:

Fixes pr32278

Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30978

llvm-svn: 297878
2017-03-15 19:48:36 +00:00
Ahmed Bougacha 2fb8030748 [GlobalISel] Avoid translating synthetic constants to new G_CONSTANTS.
Currently, we create a G_CONSTANT for every "synthetic" integer
constant operand (for instance, for the G_GEP offset).
Instead, share the G_CONSTANTs we might have created by going through
the ValueToVReg machinery.

When we're emitting synthetic constants, we do need to get Constants from
the context.  One could argue that we shouldn't modify the context at
all (for instance, this means that we're going to use a tad more memory
if the constant wasn't used elsewhere), but constants are mostly
harmless.  We currently do this for extractvalue and all.

For constant fcmp, this does mean we'll emit an extra COPY, which is not
necessarily more optimal than an extra materialized constant.
But that preserves the current intended design of uniqued G_CONSTANTs,
and the rematerialization problem exists elsewhere and should be
resolved with a single coherent solution.

llvm-svn: 297875
2017-03-15 19:21:11 +00:00
Ahmed Bougacha 62cd73d989 [GlobalISel][AArch64] Select ADDXri.
We're now able to select ADDWri thanks to the new complex pattern
support.  Extend that to ADDXri.

llvm-svn: 297874
2017-03-15 19:20:59 +00:00
Matt Arsenault 86e02ce2dc AMDGPU: Fix unnecessary ands when packing f16 vectors
computeKnownBits didn't handle fp_to_fp16 to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known 0.

llvm-svn: 297873
2017-03-15 19:04:26 +00:00
Tim Northover 0d98b03b9f ARM: avoid clobbering register in v6 jump-table expansion.
If we got unlucky with register allocation and actual constpool placement, we
could end up producing a tTBB_JT with an index that's already been clobbered.

Technically, we might be able to fix this situation up with a MOV, but I think
the constant islands pass is complex enough without having to deal with more
weird edge-cases.

llvm-svn: 297871
2017-03-15 18:38:13 +00:00
Ahmed Bougacha 07f247b6c2 [GlobalISel] Insert translated switch icmp blocks after switch parent.
Now that we preserve the IR layout, we would end up with all the newly
synthesized switch comparison blocks at the end of the function.
Instead, use a hopefully more reasonable layout, with the comparison
blocks immediately following the switch comparison blocks.

llvm-svn: 297869
2017-03-15 18:22:37 +00:00
Ahmed Bougacha a61c214f51 [GlobalISel] Preserve IR block layout.
It makes the output function layout more predictable;  the layout has
an effect on performance, we don't want it to be at the mercy of the
translator's visitation order and such.
The predictable output is also easier to digest.

getOrCreateBB isn't appropriately named anymore, as it never needs to
create anything.  Rename it and extract the MBB creation logic out of it.

A couple tests were sensitive to the order. Update them.

llvm-svn: 297868
2017-03-15 18:22:33 +00:00
Ahmed Bougacha 1a6deeefe0 [GlobalISel][AArch64] Add back constant select tests. NFC.
More of r297856.

llvm-svn: 297859
2017-03-15 16:51:41 +00:00
Ahmed Bougacha d691cf731c [GlobalISel][AArch64] Use appropriate test function names. NFC.
These FP tests are on FPR, not GPR.  Don't lie in the name.

llvm-svn: 297857
2017-03-15 16:29:40 +00:00
Ahmed Bougacha 170778f0db [GlobalISel][AArch64] Split out select tests. NFC.
The test has grown enough to be annoying to navigate.
While there, Remove unnecessary RUNs, and cleanup a couple comments.

llvm-svn: 297856
2017-03-15 16:29:37 +00:00
Peter Collingbourne d44a01aae6 CodeGen: Use the source filename as the argument to .file, rather than the module ID.
Using the module ID here is wrong for a couple of reasons:
1) The module ID is not persisted, so we can end up with different
   object file contents given the same input file (for example if the same
   file is accessed via different paths).
2) With ThinLTO the module ID field may contain the path to a bitcode file,
   which is incorrect, as the .file argument is supposed to contain the path to
   a source file.

Differential Revision: https://reviews.llvm.org/D30584

llvm-svn: 297853
2017-03-15 16:24:52 +00:00
Simon Pilgrim 018eedd9a5 [SelectionDAG] Support BUILD_VECTOR implicit truncation in SelectionDAG::ComputeNumSignBits (PR32273)
llvm-svn: 297852
2017-03-15 16:22:24 +00:00
Nemanja Ivanovic ffcf0fb1cc [PowerPC][Altivec] Add mfvrd and mffprd extended mnemonic
mfvrd and mffprd are both alias to mfvrsd.
This patch enables correct parsing of the aliases, but we still emit a mfvrsd.

Committing on behalf of brunoalr (Bruno Rosa).

Differential Revision: https://reviews.llvm.org/D29177

llvm-svn: 297849
2017-03-15 16:04:53 +00:00
Simon Pilgrim a5f332edd1 [SelectionDAG][AArch64] Add test case showing incorrect SelectionDAG::ComputeNumSignBits BUILD_VECTOR handling
Reduced from a mixture of PR32273 and David Green's test cases showing SelectionDAG::ComputeNumSignBits not correctly handling BUILD_VECTOR implicit truncation of inputs.

llvm-svn: 297847
2017-03-15 15:40:34 +00:00
Artyom Skrobov e72e1ba434 Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648"
This reverts r297820 which apparently fails on A15 hosts.

llvm-svn: 297842
2017-03-15 14:50:43 +00:00
Eric Liu 8f49635500 Add 'REQUIRES: asserts' to pr32278.ll introduced in r297822
llvm-svn: 297835
2017-03-15 13:37:20 +00:00
Simon Pilgrim 493f4462bf [X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputs
Turns out it can happen, so the assertion was too harsh

Found during fuzz testing

llvm-svn: 297833
2017-03-15 13:16:46 +00:00
Petar Jovanovic b71386a4a4 [Mips] Add support to match more patterns for DEXT and CINS
This patch adds support for recognizing more patterns to match to DEXT and
CINS instructions.
It finds cases where multiple instructions could be replaced with a single
DEXT or CINS instruction.

For example, for the following:

define i64 @dext_and32(i64 zeroext %a) {
entry:

 %and = and i64 %a, 4294967295
 ret i64 %and
}

instead of generating:

 0000000000000088 <dext_and32>:

 88:   64010001        daddiu  at,zero,1
 8c:   0001083c        dsll32  at,at,0x0
 90:   6421ffff        daddiu  at,at,-1
 94:   03e00008        jr      ra
 98:   00811024        and     v0,a0,at
 9c:   00000000        nop

the following gets generated:

 0000000000000068 <dext_and32>:

 68:   03e00008        jr      ra
 6c:   7c82f803        dext    v0,a0,0x0,0x20

Cases that are covered:

DEXT:

 1. and $src, mask where mask > 0xffff
 2. zext $src zero extend from i32 to i64

CINS:

 1. and (shl $src, pos), mask
 2. shl (and $src, mask), pos
 3. zext (shl $src, pos) zero extend from i32 to i64

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D30464

llvm-svn: 297832
2017-03-15 13:10:08 +00:00
Zvi Rackover 4aacd5d3c4 Fix malformed XFAIL in previous commit
llvm-svn: 297823
2017-03-15 11:44:14 +00:00
Zvi Rackover 81f7b88910 [DAGCombine] Add reproducer for pr32278
llvm-svn: 297822
2017-03-15 11:34:51 +00:00
Artyom Skrobov 3fa5fd1dd2 [Thumb1] Fix the bug when adding/subtracting -2147483648
Differential Revision: https://reviews.llvm.org/D30829

llvm-svn: 297820
2017-03-15 10:19:16 +00:00
Sam Parker 654cb8263a [ARM] Enable SMLAL[B|T] isel
Enable the selection of the 64-bit signed multiply accumulate
instructions which operate on 16-bit operands. These are enabled for
ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb
architectures.

Differential Revision: https://reviews.llvm.org/D30044

llvm-svn: 297809
2017-03-15 08:27:11 +00:00
Taewook Oh fb1833efeb [BranchFolding] Merge debug locations from common tail instead of removing
Summary: D25742 improved the precision of debug locations for PGO by removing debug locations from common tail when tail-merging. However, if identical insturctions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location of identical instructions across SameTails and assign it to the instruction in the common tail, so that the debug locations are maintained if they are same across identical instructions.

Reviewers: aprantl, probinson, MatzeB, rob.lougher

Reviewed By: aprantl

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D30226

llvm-svn: 297805
2017-03-15 05:44:59 +00:00
Peter Collingbourne 7f6e2c97b8 Ensure that prefix data is preserved with subsections-via-symbols
On MachO platforms that use subsections-via-symbols dead code stripping will
drop prefix data. Unfortunately there is no great way to convey the relationship
between a function and its prefix data to the linker. We are forced to use a bit
of a hack: we give the prefix data it’s own symbol, and mark the actual function
entry an .alt_entry.

Patch by Moritz Angermann!

Differential Revision: https://reviews.llvm.org/D30770

llvm-svn: 297804
2017-03-15 04:18:16 +00:00
Volkan Keles 4862c63594 [GlobalISel] IRTranslator: Return the scalar for <1 x Ty> constant vectors
Summary:
<1 x Ty> is not a legal vector type in LLT, we shouldn’t build G_MERGE_VALUES
instruction for them.

Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30948

llvm-svn: 297792
2017-03-14 23:45:06 +00:00
Daniel Sanders 8a4bae9993 [globalisel][tblgen] Add support for ComplexPatterns
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.

Depends on D30046, D29712

Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30089

llvm-svn: 297782
2017-03-14 21:32:08 +00:00
Simon Pilgrim cf2da96c82 [SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.

ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.

At the moment this patch doesn't attempt legalization as we only create an ABS node if its legal/custom.

Differential Revision: https://reviews.llvm.org/D29639

llvm-svn: 297780
2017-03-14 21:26:58 +00:00
Sanjay Patel 8dd99dce6c [DAG] vector div/rem with any zero element in divisor is undef
This is the backend counterpart to:
https://reviews.llvm.org/rL297390
https://reviews.llvm.org/rL297409
and follow-up to:
https://reviews.llvm.org/rL297384

It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, 
but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those 
someday.

Differential Revision: https://reviews.llvm.org/D30826

llvm-svn: 297762
2017-03-14 18:06:28 +00:00
Simon Pilgrim 3a196cbc4f [X86] Add extra BITREVERSE tests
Test on 32-bit and 64-bit targets.

Add bitreverse tests for i64, i32 and i16

llvm-svn: 297741
2017-03-14 14:03:16 +00:00
Simon Pilgrim e1a72a936f [X86][MMX] Update FIXME comment. NFCI.
llvm-svn: 297736
2017-03-14 12:13:41 +00:00
Sam Parker 916b1ba617 [ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandBits. Added a couple of
extra tests.

Differential Revision: https://reviews.llvm.org/D30708

llvm-svn: 297716
2017-03-14 09:13:22 +00:00
Oren Ben Simhon fe34c5e429 Disable Callee Saved Registers
Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller.
Some CCs use additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list.
The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee.
The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
The patch will also assist to implement future no_caller_saved_regsiters attribute intended for interrupt handler CC.

Differential Revision: https://reviews.llvm.org/D28566

llvm-svn: 297715
2017-03-14 09:09:26 +00:00
Craig Topper 7a5ee1c5ed [AVX-512] Use iPTR instead of i64 in patterns for extract_subvector/insert_subvector index.
llvm-svn: 297707
2017-03-14 06:40:04 +00:00
Craig Topper b0a82eaea6 [AVX-512] Add test cases that demonstrate some patterns that don't work correctly in 32-bit mode. NFC
llvm-svn: 297706
2017-03-14 06:40:00 +00:00
Nirav Dave 4fc8401abf Recommitting Craig Topper's patch now that r296476 has been recommitted.
When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.

This recovers a test case that was severely broken by r296476, my making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 297698
2017-03-14 01:42:23 +00:00
Nirav Dave 54e22f33d9 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores search up the chain through a single load, and
    finds all possible stores by looking down from through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly constructs
    wider loads, but then promotes them to float operations which appear
    but requires more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         some in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Artyom Skrobov bf19d4bc29 [Thumb1] combine ADDC/SUBC with a negative immediate
Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400

Reviewers: efriedma, jmolloy

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30829

llvm-svn: 297682
2017-03-13 22:36:14 +00:00
Craig Topper 784f241b59 [AVX-512] Fix another case where we are copying from a mask register using AH/BH/CH/DH with fastisel.
Fixes PR32256. Still planning to do an audit for other possible cases.

llvm-svn: 297678
2017-03-13 21:58:54 +00:00
Volkan Keles 38a91a0de6 GlobalISel: Translate ConstantDataVector
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, javed.absar, ab

Reviewed By: qcolombet, dsanders, ab

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30216

llvm-svn: 297670
2017-03-13 21:36:19 +00:00
Tim Northover 55e6f10d69 Revert "GlobalISel: move vector extract/insert inside generic opcode region."
I was writing against an earlier branch and Volkan had already fixed this.

llvm-svn: 297668
2017-03-13 21:25:10 +00:00
Simon Pilgrim 9df7d08cb2 [X86][MMX] Fix folding of shift value loads to cover whole 64-bits
rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as a i64 to determine the shift value.

This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32-bit are zero'd for cases where the shift value has come from a scalar source.

Found during fuzz testing.

Differential Revision: https://reviews.llvm.org/D30833

llvm-svn: 297667
2017-03-13 21:23:29 +00:00
Tim Northover 0f1d32d557 GlobalISel: move vector extract/insert inside generic opcode region.
Otherwise they won't be legalized or selected, causing instruction selection to
fail horribly.

llvm-svn: 297666
2017-03-13 21:18:59 +00:00
Andrew Kaylor a11d020699 Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier
I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced.

llvm-svn: 297664
2017-03-13 20:35:10 +00:00
Matt Arsenault 971c85ebb4 AMDGPU: Treat 0 as private null pointer in addrspacecast lowering
llvm-svn: 297658
2017-03-13 19:47:31 +00:00
Jessica Paquette c984e21394 [Outliner] Add tail call support
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.

Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.

llvm-svn: 297653
2017-03-13 18:39:33 +00:00
Craig Topper 616641632e [X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies.
For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2.

llvm-svn: 297652
2017-03-13 18:34:46 +00:00
Craig Topper eb7ea28bdd [AVX-512] If gather mask is all ones, force the input to a zero vector.
We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too.

Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today.

llvm-svn: 297651
2017-03-13 18:17:46 +00:00
Diana Picus 94db2e288b [ARM] GlobalISel: Support SP in regbankselect
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.

llvm-svn: 297621
2017-03-13 14:28:34 +00:00
Craig Topper 7746565754 [AVX-512] Add EVEX2VEX test cases for the cvt instructions fixed in r297599 and r297600.
llvm-svn: 297603
2017-03-13 05:47:56 +00:00
Craig Topper bb4089d260 Revert "[AVX-512] EVEX2VEX, don't reject intrinsic instructions when both have a memory operand. We should just continue to check other operands instead."
This reverts r297596.

There were other issues that were making this not work that have been fixed now. Reverting this results in a more accurate table.

llvm-svn: 297602
2017-03-13 05:34:03 +00:00
Craig Topper 166085f0f2 [AVX-512] EVEX2VEX, don't reject intrinsic instructions when both have a memory operand. We should just continue to check other operands instead.
This exposed that we have several intrinsic instructions that have identical TSFlags to other instructions. We should merge their patterns and kill of the duplicate. I'll fix that in a follow up patch.

llvm-svn: 297596
2017-03-13 00:36:49 +00:00
Craig Topper 7d56c8315b [AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics.
The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly.

llvm-svn: 297591
2017-03-12 22:29:12 +00:00
Sanjay Patel f06b963a2b [x86] don't blindly transform SETB into SBB
I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. 
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.

This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.

Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.

The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.

Differential Revision: https://reviews.llvm.org/D30611

llvm-svn: 297586
2017-03-12 18:28:48 +00:00
Azharuddin Mohammed 473b75c3d5 Remove CRC32 instructions from AArch64InstrInfo::hasShiftedReg
Summary:
A53 scheduler causes an assertion failure on all CRC instructions:
include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand
&llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i <
getNumOperands() && "getOperand() out of range!"' failed.

The case statements corresponding to CRC instructions are incorrect and should
be removed.

Also adding a testcase while on this.

Reviewers: t.p.northover, javed.absar, apazos, rengolin

Reviewed By: rengolin

Subscribers: evandro, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30274

llvm-svn: 297582
2017-03-12 14:02:32 +00:00
Igor Breger 293dfb9768 [X86] Add vector zext tests.
llvm-svn: 297581
2017-03-12 13:20:10 +00:00
Craig Topper 58647b16e5 [AVX-512] Fix a bad use of a high GR8 register after copying from a mask register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask.
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.

I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.

llvm-svn: 297574
2017-03-12 03:37:37 +00:00
Craig Topper e726cd0cd1 [AVX-512] Add test case for PR32241. Fix coming in another commit.
llvm-svn: 297573
2017-03-12 03:37:34 +00:00
Simon Pilgrim 18debfa5b4 [X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41)
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.

This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.

Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?

Differential Revision: https://reviews.llvm.org/D29841

llvm-svn: 297568
2017-03-11 20:42:31 +00:00
Craig Topper d511c2ce04 [X86] Add avx2 gather tests cases that show a failure to remove zeroing of the source when the mask is all ones.
llvm-svn: 297564
2017-03-11 18:26:00 +00:00
Matt Arsenault dd905b0e9b AMDGPU: Remove packf16 intrinsic
llvm-svn: 297557
2017-03-11 05:51:16 +00:00
Matt Arsenault 3cb9ff8863 AMDGPU: Keep track of modifiers when converting v_mac to v_mad
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.

This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.

Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")

Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>

llvm-svn: 297556
2017-03-11 05:40:40 +00:00
Stanislav Mekhanoshin 79da2a7698 [AMDGPU] Remove getBidirectionalReasonRank
This method inverts the Reason field of a scheduling candidate.
It does right comparison between RegCritical and RegExcess, but
everything else is broken. In fact it can prefer less strong reason
such as Weak over RegCritical because Weak > -RegCritical.

The CandReason enum is properly sorted, so just remove artificial
ranking.

Differential Revision: https://reviews.llvm.org/D30557

llvm-svn: 297536
2017-03-11 00:29:27 +00:00
Krzysztof Parzyszek 0e7b1f83b7 [RDF] Remove the map of reaching defs from copy propagation
Use Liveness::getNearestAliasedRef to find the reaching def instead.

llvm-svn: 297526
2017-03-10 22:44:24 +00:00
Simon Pilgrim 128a10a41d [X86][SSE] Fix load folding for (V)CVTDQ2PD
This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl

llvm-svn: 297523
2017-03-10 22:35:07 +00:00
Simon Pilgrim 9956661456 [X86][RTM] Regenerate RTM intrinsic tests for 32/64-bit targets.
llvm-svn: 297518
2017-03-10 21:55:24 +00:00
Volkan Keles 970fee4bfe GlobalISel: Translate ConstantAggregateZero vectors
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30259

llvm-svn: 297509
2017-03-10 21:23:13 +00:00
Volkan Keles 04cb08cc83 [GlobalISel] Translate insertelement and extractelement
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30761

llvm-svn: 297495
2017-03-10 19:08:28 +00:00
Simon Pilgrim 7dedbfa89d [SelectionDAG] Add support for BUILD_VECTOR to ComputeNumSignBits
llvm-svn: 297492
2017-03-10 18:36:46 +00:00
Simon Pilgrim e54cd65399 [X86][SSE] Added tests showing missed truncations for sitofp conversion
SelectionDAG::ComputeNumSignBits is poor at build_vector handling, meaning that we can't see that all the vXi64 sources are in fact sign extended i32 or smaller.

llvm-svn: 297486
2017-03-10 18:01:53 +00:00
Amaury Sechet 62e0759d56 [SelectionDAG] Make SelectionDAG aware of the known bits in USUBO and SSUBO and SUBC.
Summary:
Depends on D30379

This improves the state of things for the sub class of operation.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30436

llvm-svn: 297482
2017-03-10 17:26:44 +00:00
Simon Pilgrim ed655f09db [X86][MMX] Add tests showing missed opportunities to use MMX sitofp conversions
If we are transferring MMX registers to XMM for conversion we could use the MMX equivalents (CVTPI2PD + CVTPI2PS) without affecting rounding/exceptions etc.

llvm-svn: 297481
2017-03-10 17:23:55 +00:00
Amaury Sechet 69fa16c810 [SelectionDAG] Make SelectionDAG aware of the known bits in UADDO and SADDO.
Summary: As per title. This is extracted from D29872 and I threw SADDO in.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30379

llvm-svn: 297479
2017-03-10 17:06:52 +00:00
Simon Pilgrim c6b55729a5 [X86][MMX] Add tests showing missed opportunities to use MMX fptosi conversions
If we are transferring XMM conversion results to MMX registers we could use the MMX equivalents (CVTPD2PI/CVTTPD2PI + CVTPS2PI/CVTTPS2PI) with affecting rounding/expections etc.

llvm-svn: 297476
2017-03-10 16:59:43 +00:00
Simon Pilgrim b8856148d9 [X86][MMX] Updated bad stack spill shift value test to actually show the problem
Cleaning up the ir had stopped showing the issue.

llvm-svn: 297475
2017-03-10 16:18:50 +00:00
Simon Pilgrim 67d25b298a [X86][MMX] Regenerate mmx bitcast tests
llvm-svn: 297474
2017-03-10 16:07:39 +00:00
Simon Pilgrim caa9172ba7 [X86][MMX] Add test showing bad stack spill of shift value
i32 is spilled to stack but 64-bit mmx is reloaded - leaving garbage in the other half of the register

llvm-svn: 297471
2017-03-10 15:53:41 +00:00
Simon Pilgrim 63ad95aee6 [X86][MMX] Regenerate mmx load folding tests
llvm-svn: 297470
2017-03-10 15:41:05 +00:00
Simon Dardis 7090d145e8 [mips][msa] Accept more values for constant splats
This patches teaches the MIPS backend to accept more values for constant
splats. Previously, only 10 bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be acceptted. This patch relaxes
that constraint so that any constant value that be splatted is accepted.

As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30640

llvm-svn: 297457
2017-03-10 13:27:14 +00:00
Artyom Skrobov 0c93ceb5d8 For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.

Reviewers: jmolloy, rogfer01, efriedma

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30400

llvm-svn: 297443
2017-03-10 07:40:27 +00:00
Sanjay Patel 65e2e6805a [x86] add tests for vec div/rem with 0 element in divisor; NFC
llvm-svn: 297433
2017-03-10 00:55:29 +00:00
Ahmed Bougacha 4ec6d5abed [GlobalISel] Fallback when failing to translate invoke.
We unintentionally stopped falling back in r293670.

While there, change an unusual construct.

llvm-svn: 297425
2017-03-10 00:25:35 +00:00
Tim Northover aa995c98f4 GlobalISel: support trivial inlineasm calls.
They're used for nefarious purposes by ObjC.

llvm-svn: 297422
2017-03-09 23:36:26 +00:00
Amaury Sechet e7d102cf02 [DAGCombiner] Do various combine on uaddo.
Summary: This essentially does the same transform as for ADC.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30417

llvm-svn: 297416
2017-03-09 22:47:00 +00:00
Krzysztof Parzyszek 544210304f [Hexagon] Fixes to the bitsplit generation
- Fix the insertion point, which occasionally could have been incorrect.
- Avoid creating multiple bitsplits with the same operands, if an old one
  could be reused.

llvm-svn: 297414
2017-03-09 22:02:14 +00:00
Tim Northover d1e951e5eb GlobalISel: inform FrameLowering when we emit a function call.
Amongst other things (I expect) this is necessary to ensure decent backtraces
when an "unreachable" is involved.

llvm-svn: 297413
2017-03-09 22:00:39 +00:00
Tim Northover 7a9ea8f628 GlobalISel: put debug info for static allocas in the MachineFunction.
The good reason to do this is that static allocas are pretty simple to handle
(especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline
should give some kind of performance benefit.

The bad reason is that the debug pipeline is an unholy mess of implicit
contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a
load or not involves the services of at least 3 soothsayers and the sacrifice
of at least one chicken.  And it still gets it wrong if the variable is at SP
directly.

llvm-svn: 297410
2017-03-09 21:12:06 +00:00
Amaury Sechet 10425de063 [DAGCombiner] Do various combine on usubo.
Summary: This essentially does the same transform as for SUBC.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30437

llvm-svn: 297404
2017-03-09 19:28:00 +00:00
Krzysztof Parzyszek 78c4fcf12e [Hexagon] Propagate zext of i1 into arithmetic code in selection DAG
(op ... (zext i1 c) ...) -> (select c (op ... 1 ...),
                                      (op ... 0 ...))

llvm-svn: 297391
2017-03-09 16:29:30 +00:00
Sam Parker b308b48d69 [ARM] Remove t2xtpk feature from tests
I previously removed the T2XtPk feature from the ARM backend, but it
looks like I missed some of the tests that were using the feature.

Differential Revision: https://reviews.llvm.org/D30778

llvm-svn: 297386
2017-03-09 15:14:32 +00:00
Sanjay Patel df21979db7 [DAG] recognize div/rem by 0 as undef before trying constant folding
As discussed in the review thread for rL297026, this is actually 2 changes that 
would independently fix all of the test cases in the patch:

1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before 
   foldBinopIntoSelect() as a matter of efficiency.

I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().

I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).

Differential Revision:
https://reviews.llvm.org/D30741

llvm-svn: 297384
2017-03-09 15:02:25 +00:00
Simon Dardis 7577ce2140 [mips] Revert fixes for PR32020.
The fix introduces segfaults and clobbers the value to be stored when
the atomic sequence loops.

Revert "[Target/MIPS] Kill dead code, no functional change intended."

This reverts commit r296153.

Revert "Recommit "[mips] Fix atomic compare and swap at O0.""

This reverts commit r296134.

llvm-svn: 297380
2017-03-09 14:03:26 +00:00
Simon Dardis 158956c6cc [mips] Fix return lowering
Fix a machine verifier issue where a instruction was using a invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
mark as killed earlier.

This partially resolves PR/27458

Thanks to Quentin Colombet for reporting the issue!

llvm-svn: 297372
2017-03-09 11:19:48 +00:00
Adam Nemet 5361b82d54 [SSP] In opt remarks, stream Function directly
With this, it shows up as an attribute in YAML and non-printable characters
are properly removed by GlobalValue::getRealLinkageName.

llvm-svn: 297362
2017-03-09 06:10:27 +00:00
Matt Arsenault 9a3fd87523 DAG: Check no signed zeros instead of unsafe math attribute
llvm-svn: 297354
2017-03-09 01:36:39 +00:00
Tim Northover 7596bd7a27 GlobalISel: correctly handle trivial fcmp predicates.
It makes sense to only do them once in IRTranslator rather than making everyone
deal with them.

llvm-svn: 297304
2017-03-08 18:49:54 +00:00
Volkan Keles 5698b2ae6e [GlobalISel] Add default action for G_FNEG
Summary: rL297171 introduced G_FNEG for floating-point negation instruction and IRTranslator started to translate `FSUB -0.0, X` to `FNEG X`. This patch adds a default action for G_FNEG to avoid breaking existing targets.

Reviewers: qcolombet, ab, kristof.beyls, t.p.northover, aditya_nandakumar, dsanders

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits

Differential Revision: https://reviews.llvm.org/D30721

llvm-svn: 297301
2017-03-08 18:09:14 +00:00
Sanjay Patel 9f495695bb [x86] regenerate checks; NFC
This test could be reduced? The check fails for a seemingly unrelated change,
so I'm adding full checks to see what is happening.

llvm-svn: 297296
2017-03-08 17:19:56 +00:00
Krzysztof Parzyszek 1b7197e690 [Hexagon] Use correct offset when extracting from the high word
When extracting a bitfield from the high register in a register pair,
the final offset should be relative to the high register (for 32-bit
extracts).

llvm-svn: 297288
2017-03-08 15:46:28 +00:00
Daniel Cederman 9db582a656 [Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty()
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: davide, RKSimon, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D27089

llvm-svn: 297285
2017-03-08 15:23:10 +00:00
Tim Shen c7472d912b Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size""
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/

Re-commit r296771 to continue testing on the patch.

Sorry for the trouble!

llvm-svn: 297256
2017-03-08 02:41:35 +00:00