Commit Graph

19938 Commits

Author SHA1 Message Date
Sanjay Patel 8db22f9032 [x86] add memcmp tests, remove run
Add tests for vector lengths that could be handled without a libcall.

There's an explicit test for 'nobuiltin', so there's not much value
in a separate run that checks that same behavior over and over again.

llvm-svn: 298611
2017-03-23 15:38:22 +00:00
Igor Breger a8ba572dcf [GlobalISel][X86] Support G_STORE/G_LOAD operation
Summary:
1. Support pointer type as function argument and return value
2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128
3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank.
4. Support instruction selection for G_LOAD/G_STORE

Reviewers: zvi, rovka, ab, qcolombet

Reviewed By: rovka

Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

Differential Revision: https://reviews.llvm.org/D30973

llvm-svn: 298609
2017-03-23 15:25:57 +00:00
Nirav Dave e9ca32ae52 [SDAG] Fix zeroExtend assertion error
Move CombineTo preventing deleted node from being returned in
visitZERO_EXTEND.

Fixes PR32284.

Reviewers: RKSimon, bogner

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31254

llvm-svn: 298604
2017-03-23 15:01:50 +00:00
Simon Pilgrim d341290b5c [X86][SSE] Add computeNumSignBits test for sitofp of (extended) i64 extracted element
llvm-svn: 298592
2017-03-23 13:18:09 +00:00
Michael Zuckerman 85436ece89 [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction
Up until now, vpmovm2 instruction described its destination operand size
by the source operand size. This patch adds new pattern for the vpmovm2
instruction. The node describes new expansion of the destination (from
{128|256} to 512).

Differential Revision: https://reviews.llvm.org/D30654

llvm-svn: 298586
2017-03-23 09:57:01 +00:00
Artyom Skrobov 92c0653095 Reapply r298417 "[ARM] Recommit the glueless lowering of addc/adde in Thumb1"
The UB in t2_so_imm_neg conversion has been addressed under D31242 / r298512

This reverts commit r298482.

llvm-svn: 298562
2017-03-22 23:35:51 +00:00
Konstantin Zhuravlyov 4cbb68959b [AMDGPU] Do not emit isa info as code object metadata
- It was decided to expose this information through other means (rocr)

Differential Revision: https://reviews.llvm.org/D30970

llvm-svn: 298560
2017-03-22 23:27:09 +00:00
Konstantin Zhuravlyov a780ffaac2 [AMDGPU] Emit kernel debug properties as code object metadata
Differential Revision: https://reviews.llvm.org/D30969

llvm-svn: 298558
2017-03-22 23:10:46 +00:00
Konstantin Zhuravlyov ca0e7f6472 [AMDGPU] Emit kernel code properties as code object metadata
- These are not required for low level runtime

Differential Revision: https://reviews.llvm.org/D29949

llvm-svn: 298556
2017-03-22 22:54:39 +00:00
Sanjay Patel 51e077924d [x86] improve tests, add tests, auto-generate checks; NFC
llvm-svn: 298553
2017-03-22 22:39:17 +00:00
Konstantin Zhuravlyov 7498cd61fb [AMDGPU] Restructure code object metadata creation
- Rename runtime metadata -> code object metadata
  - Make metadata not flow
  - Switch enums to use ScalarEnumerationTraits
  - Cleanup and move AMDGPUCodeObjectMetadata.h to AMDGPU/MCTargetDesc
  - Introduce in-memory representation for attributes
  - Code object metadata streamer
  - Create metadata for isa and printf during EmitStartOfAsmFile
  - Create metadata for kernel during EmitFunctionBodyStart
  - Finalize and emit metadata to .note during EmitEndOfAsmFile
  - Other minor improvements/bug fixes

Differential Revision: https://reviews.llvm.org/D29948

llvm-svn: 298552
2017-03-22 22:32:22 +00:00
Artyom Skrobov 50a066b313 [ARM] t2_so_imm_neg had a subtle bug in the conversion, and could trigger UB by negating (int)-2147483648. By pure luck, none of the pre-existing tests triggered this; so I'm adding one.
Summary: Thanks to Vitaly Buka for helping catch this.

Reviewers: rengolin, jmolloy, efriedma, vitalybuka

Subscribers: llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D31242

llvm-svn: 298512
2017-03-22 15:09:30 +00:00
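To illustrate the overflow mentioned in the commit above: in C and C++, negating the most negative 32-bit `int` is undefined behaviour because the result is not representable. The sketch below models 32-bit two's-complement wraparound in Python. The helpers `to_i32` and `neg_i32` are hypothetical names used only for illustration and are not part of LLVM.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def to_i32(x):
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x


def neg_i32(x):
    """Negate with 32-bit wraparound.

    This models what two's-complement hardware typically produces;
    in C/C++ the same negation of INT32_MIN is undefined behaviour.
    """
    return to_i32(-x)


# The mathematically correct result does not fit in 32 bits...
unrepresentable = -INT32_MIN > INT32_MAX
# ...so the wrapped negation lands back on INT32_MIN itself.
wrapped = neg_i32(INT32_MIN)
```

Under this model, every other 32-bit value negates as expected; only `INT32_MIN` maps to itself, which is why a test has to target that value specifically.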
Simon Pilgrim 345481599f [X86] Add multiply by constant tests (PR28513)
As discussed on PR28513, add tests for multiplication by constants from 1 to 32

llvm-svn: 298497
2017-03-22 12:03:56 +00:00
Jonas Paulsson 808c89f467 [SystemZ] Don't drop any operands in expandZExtPseudo()
Make sure that any operands, e.g. an implicit def of a super reg, are
transferred to the new instruction.

Review: Ulrich Weigand
llvm-svn: 298484
2017-03-22 06:03:32 +00:00
Vitaly Buka e69c137f90 Revert "[ARM] Recommit the glueless lowering of addc/adde in Thumb1, including the amended (no UB anymore) fix for adding/subtracting -2147483648."
Fails check-llvm with ubsan

This reverts commit r298417.

llvm-svn: 298482
2017-03-22 05:07:44 +00:00
Aditya Nandakumar bc389badbc [GlobalISel]: Create VREGs for ConstantInt args
This patch changes the behavior of IRTranslating intrinsics where we
now create VREG + G_CONSTANT for ConstantInt values. We already do this
for FloatingPoint values. This makes it easier for the backends to
select code and it won't have to de-duplicate creation+selection of
constants.

Reviewed by: ab

llvm-svn: 298473
2017-03-22 01:16:39 +00:00
Ahmed Bougacha 15b3e8a93a [GlobalISel] Update DBG_VALUEs referencing DCE'd instructions.
Quentin points out that r298358 would cause us to emit different code
with debug info.  That's a big no-no; also erase the instructions that
only live thanks to DBG_VALUE users.

Adrian explained how this is an existing problem and an OK thing to do:
clang has allocas for all variables so shouldn't be affected at -O0, but
swift uses a bit of inlineasm to explicitly keep values live for the
purpose of debug info quality.  I'm not sure there is a better scheme.

llvm-svn: 298460
2017-03-21 23:42:54 +00:00
Ahmed Bougacha e8e1fa3a7c [GlobalISel] Don't translate br to layout successor.
MI can represent fallthrough to layout successor blocks, and our
post-isel representation uses that extensively.

We might as well use it too, to avoid translating and carrying along
unnecessary branches.

llvm-svn: 298459
2017-03-21 23:42:50 +00:00
Matt Arsenault 5b20fbb748 AMDGPU: Rename SI_RETURN
This is used for a specific type of return to a shader part's
epilog code. Rename to try avoiding confusion from a true
call's return.

llvm-svn: 298452
2017-03-21 22:18:10 +00:00
Matthias Braun 8445cbd1ca SplitKit: Fix subreg copy related problems
Fix two problems related to r298025:
- SplitKit would create duplicate VNIs in some cases leading to crashes
  when hoisting copies.
- VirtRegMap could fail expanding copies at the beginning of a basic
  block.

This fixes http://llvm.org/PR32353

llvm-svn: 298448
2017-03-21 21:58:08 +00:00
Matt Arsenault 3dbeefa978 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
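For readers who prefer not to use perl, the substitution quoted in the commit above can be sketched in Python. The function name `mark_kernels` and the sample IR string are made up for illustration; the commit itself only gives the perl one-liner.

```python
import re


def mark_kernels(ir_text):
    """Apply the same rewrite as s/define void/define amdgpu_kernel void/.

    Like the perl one-liner, this only touches functions declared as
    'define void', i.e. those with no explicit calling convention.
    """
    return re.sub(r"define void", "define amdgpu_kernel void", ir_text)


sample = "define void @foo() {\n  ret void\n}\n"
converted = mark_kernels(sample)
```

Because the rewritten text no longer contains the substring `define void`, applying the function a second time leaves the input unchanged.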
Tim Northover dd4b9d6d7b GlobalISel: widen booleans by zero-extending to a byte.
A bool is represented by a single byte, which the ARM ABI requires to be either
0 or 1. So we cannot use G_ANYEXT when legalizing the type.

llvm-svn: 298439
2017-03-21 21:12:04 +00:00
George Burgess IV 56c7e88c2c Let llvm.objectsize be conservative with null pointers
This adds a parameter to @llvm.objectsize that makes it return
conservative values if it's given null.

This fixes PR23277.

Differential Revision: https://reviews.llvm.org/D28494

llvm-svn: 298430
2017-03-21 20:08:59 +00:00
Artyom Skrobov 40a4f40679 [ARM] Recommit the glueless lowering of addc/adde in Thumb1,
including the amended (no UB anymore) fix for adding/subtracting -2147483648.

This reverts r298328 "[ARM] Revert r297443 and r297820."
and partially reverts r297842 "Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648""

llvm-svn: 298417
2017-03-21 18:39:41 +00:00
Krzysztof Parzyszek d033d1fd82 Recommit r298282 with fixes for memory allocation/deallocation
[Hexagon] Recognize polynomial-modulo loop idiom again

Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.

llvm-svn: 298400
2017-03-21 17:09:27 +00:00
Marek Olsak 5c7a61d221 AMDGPU: Buffer descriptor changes for GFX9
Reviewers: arsenm

Subscribers: qcolombet, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31158

llvm-svn: 298397
2017-03-21 17:00:39 +00:00
Marek Olsak e22fdb9cac AMDGPU: Always use VGPR indexing on GFX9
Reviewers: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr

Differential Revision: https://reviews.llvm.org/D31157

llvm-svn: 298396
2017-03-21 17:00:32 +00:00
Krzysztof Parzyszek 5e7f06f354 [Hexagon] Add -march=hexagon to a testcase
llvm-svn: 298395
2017-03-21 16:59:40 +00:00
Matt Arsenault f8fb605a68 AMDGPU: Fix asserting on 0 dmask for image intrinsics
Fold these to undef during lowering so users get eliminated.

llvm-svn: 298387
2017-03-21 16:32:17 +00:00
Matt Arsenault 964a848514 AMDGPU: Convert image intrinsic uses in tests
llvm-svn: 298386
2017-03-21 16:24:12 +00:00
Matt Arsenault dce313c3cf DAG: Fold bitcast/extract_vector_elt of undef to undef
Fixes not eliminating store when intrinsic is lowered to undef.

llvm-svn: 298385
2017-03-21 16:20:16 +00:00
Simon Pilgrim 5e39cbaee5 Fix shufpd test name.
llvm-svn: 298381
2017-03-21 15:12:53 +00:00
Sanjay Patel 79379cae15 [x86] use PMOVMSK for vector-sized equality comparisons
We could do better by splitting any oversized type into whatever vector size the target supports, 
but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte
structs, so I think we can wire that up with a TLI hook that feeds into this.

Differential Revision: https://reviews.llvm.org/D31156

llvm-svn: 298376
2017-03-21 13:50:33 +00:00
Simon Pilgrim 8bda035121 [X86][AVX] Tests showing missing SHUFPD + ZERO lowering
This lowers to SHUFPD if the input is zeroinitializer but not with a demanded elts optimized build vector.

llvm-svn: 298370
2017-03-21 13:30:40 +00:00
Valery Pykhtin fd4c410f4d [AMDGPU] Iterative scheduling infrastructure + minimal registry scheduler
Differential revision: https://reviews.llvm.org/D31046

llvm-svn: 298368
2017-03-21 13:15:46 +00:00
Volkan Keles 044e003203 [GlobalISel] Fix shufflevector tests
clang-lld-x86_64-2stage fails because of the order
of the instructions. `CHECK-DAG` directives should
fix the problem.

llvm-svn: 298367
2017-03-21 13:12:59 +00:00
Sam Kolton f60ad58dad [AMDGPU] SDWA peephole optimization pass.
Summary:
First iteration of SDWA peephole.

This pass tries to combine several instructions into one SDWA instruction. E.g. it converts:
'''
    V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
    V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
    V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
'''
Into:
'''
   V_ADD_I32_sdwa %vreg4, %vreg1, %vreg3 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:DWORD
'''

Pass structure:
    1. Iterate over the machine instructions in a basic block and try to apply "SDWA patterns" to each of them. An SDWA pattern matches a machine instruction to either a source or a destination SDWA operand. E.g. ''' V_LSHRREV_B32_e32 %vreg0, 16, %vreg1''' is matched to source SDWA operand '''%vreg1 src_sel:WORD_1'''.
    2. Iterate over the found SDWA operands and find instructions that could potentially be converted into SDWA. E.g. for a source SDWA operand, the potential instructions are all instructions in this basic block that use '''%vreg0'''.
    3. Iterate over all potential instructions and check if they can be converted into SDWA.
    4. Convert instructions to SDWA.

This review contains a basic implementation of the SDWA peephole pass. This pass requires additional testing for both correctness and performance (no performance testing done).
There are several ways this pass can be improved:
    1. Make this pass work on the whole function, not only a basic block. As far as I can see, this can be done right now without changes to the pass.
    2. Introduce more SDWA patterns
    3. Introduce mnemonics to limit when SDWA patterns should apply

Reviewers: vpykhtin, alex-t, arsenm, rampitec

Subscribers: wdng, nhaehnle, mgorny

Differential Revision: https://reviews.llvm.org/D30038

llvm-svn: 298365
2017-03-21 12:51:34 +00:00
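The shift/add/shift example in the commit above can be checked with a small bit-level model. This is only a sketch under stated assumptions: registers are treated as unsigned 32-bit values, `src0_sel:WORD_1` is assumed to select the upper 16 bits, and `dst_sel:WORD_1` with `dst_unused:UNUSED_PAD` is assumed to write the low 16 bits of the result into the upper half of the destination and zero-fill the rest. The function names are hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF


def original_sequence(vreg1, vreg3):
    """Model of the three-instruction sequence from the commit message."""
    vreg0 = (vreg1 & MASK32) >> 16      # V_LSHRREV_B32_e32 %vreg0, 16, %vreg1
    vreg2 = (vreg0 + vreg3) & MASK32    # V_ADD_I32_e32 %vreg2, %vreg0, %vreg3
    vreg4 = (vreg2 << 16) & MASK32      # V_LSHLREV_B32_e32 %vreg4, 16, %vreg2
    return vreg4


def sdwa_model(vreg1, vreg3):
    """Assumed model of the single V_ADD_I32_sdwa instruction."""
    src0 = (vreg1 >> 16) & 0xFFFF       # src0_sel:WORD_1 (assumed: upper half)
    src1 = vreg3 & MASK32               # src1_sel:DWORD (full register)
    result = (src0 + src1) & MASK32
    # dst_sel:WORD_1 + UNUSED_PAD (assumed: low 16 result bits go to the
    # upper half of the destination, remaining bits are zero)
    return (result & 0xFFFF) << 16


# Compare both models on a fixed set of random inputs.
rng = random.Random(0)
samples = [(rng.getrandbits(32), rng.getrandbits(32)) for _ in range(1000)]
all_equal = all(original_sequence(a, b) == sdwa_model(a, b) for a, b in samples)
```

The two models agree because shifting a 32-bit sum left by 16 discards its upper 16 bits, which is the same as keeping only the low word of the sum and placing it in the upper half.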
Andrea Di Biagio 7937be7dd3 [DebugInfo][X86] Teach Optimize LEAs pass to handle debug values
This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were
not removed because they were being used by debug values. The debug values are
now ignored when determining whether LEAs are redundant.

For now the debug values for the redundant LEAs are marked as undefined,
effectively lost. The intention is for a follow up patch which will attempt to
preserve the debug values where possible.

Patch by Andrew Ng.

Differential Revision: https://reviews.llvm.org/D30835

llvm-svn: 298360
2017-03-21 11:36:21 +00:00
Jonas Paulsson 54c7680e1f [DAGTypeLegalizer] Handle widening truncate to vector of i1.
Previously, PromoteIntRes_TRUNCATE() did not handle the case where
the operand needs widening, which resulted in llvm_unreachable().

This patch adds the needed handling, along with a test case.

Review: Eli Friedman, Simon Pilgrim.
https://reviews.llvm.org/D31077

llvm-svn: 298357
2017-03-21 10:24:14 +00:00
Volkan Keles 75bdc7690e [GlobalISel] Translate shufflevector
Reviewers: qcolombet, aditya_nandakumar, t.p.northover, javed.absar, ab, dsanders

Reviewed By: javed.absar

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30962

llvm-svn: 298347
2017-03-21 08:44:13 +00:00
Jonas Paulsson bd65421f08 [SystemZ] Don't drop MO flags in foldMemoryOperandImpl()
The def operand of the new LG/LD should have the old def operands
flags and subreg index.

New test: test/CodeGen/SystemZ/fold-memory-op-impl.ll

Review: Ulrich Weigand
llvm-svn: 298341
2017-03-21 05:49:40 +00:00
Vitaly Buka c12716e742 Revert "[Hexagon] Recognize polynomial-modulo loop idiom again"
Fix memory leaks on check-llvm tests detected by Asan.

This reverts commit r298282.

llvm-svn: 298329
2017-03-21 00:59:51 +00:00
Eli Friedman 76732acc23 [ARM] Revert r297443 and r297820.
The glueless lowering of addc/adde in Thumb1 has known serious
miscompiles (see https://reviews.llvm.org/D31081), and r297820
causes an infinite loop for certain constructs.  It's not
clear when they will be fixed, so let's just take them out
of the tree for now.

(I resolved a small conflict with r297453.)

llvm-svn: 298328
2017-03-21 00:26:39 +00:00
Vadzim Dambrouski ba789cbd3d [ARM] Fix PR32130: Handle promotion of zero sized constants.
The special case of zero sized values was previously not handled correctly.
This patch handles this by not promoting if the size is zero.

Patch by Tim Neumann.

Differential Revision: https://reviews.llvm.org/D31116

llvm-svn: 298320
2017-03-20 22:59:57 +00:00
Sanjay Patel f238902f52 [x86] add tests for setcc of i128/i256; NFC
llvm-svn: 298317
2017-03-20 22:15:40 +00:00
Tim Northover 4340d64f91 GlobalISel: add implicit defs & uses when mutating an instruction.
Otherwise a scheduler might do bad things to the code we produce.

llvm-svn: 298311
2017-03-20 21:58:23 +00:00
David L. Jones d61548471c [X86] Clean up test/CodeGen/X86/2006-03-01-InstrSchedBug.ll
Summary:
- Migrated from grep to FileCheck.
- Re-indented, removed boilerplate comments.
- Added 'entry' label at beginning of basic block.

Patch by Jorge Gorbe!

Reviewed By: RKSimon

Subscribers: RKSimon, jgorbe, llvm-commits

Differential Revision: https://reviews.llvm.org/D30317

llvm-svn: 298298
2017-03-20 20:10:30 +00:00
Nirav Dave f5f0864ac2 Add test case for merging of chained stores of mismatched type.
llvm-svn: 298293
2017-03-20 19:48:22 +00:00
Krzysztof Parzyszek 8490251de3 [Hexagon] Recognize polynomial-modulo loop idiom again
Regain the ability to recognize loops calculating polynomial modulo
operation. This ability has been lost due to some changes in the
preceding optimizations. Add code to preprocess the IR to a form
that the pattern matching code can recognize.

llvm-svn: 298282
2017-03-20 18:12:58 +00:00
Konstantin Zhuravlyov 2534bc07f4 [AMDGPU] Run always inliner early in opt
Differential Revision: https://reviews.llvm.org/D31141

llvm-svn: 298281
2017-03-20 18:06:45 +00:00
Reid Kleckner 8819c73878 [WinEH] Adjust decision to emit SEH moves for leaf functions
Move the check for "MF->hasWinCFI()" up into the calculation of the
shouldEmitMoves boolean, rather than putting it in the early returning
if. This ensures that endFunction doesn't try to emit .seh_* directives
for leaf functions.

llvm-svn: 298276
2017-03-20 17:45:59 +00:00
Tim Northover 89268b183f GlobalISel: allow quad-precision values to be dumped.
Otherwise the fallback path fails with an assertion on AAPCS AArch64 targets,
when "long double" is encountered.

llvm-svn: 298273
2017-03-20 16:52:08 +00:00
Diana Picus d79253a9f7 [GlobalISel] Use the correct calling conv for calls
This commit adds a parameter that lets us pass in the calling convention
of the call to CallLowering::lowerCall. This allows us to handle
situations where the calling convetion of the callee is different from
that of the caller.

Differential Revision: https://reviews.llvm.org/D31039

llvm-svn: 298254
2017-03-20 14:40:18 +00:00
Konstantin Zhuravlyov 8a67eb144f Revert "[AMDGPU] Run always inliner early in opt"
This reverts commit r297958, it breaks device-libs build.

llvm-svn: 298239
2017-03-20 09:26:08 +00:00
Craig Topper 5992c8d1dc [AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel
Summary:
Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coalescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine, which seems more robust.

This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain, so the kor-sequence test breaks. But this patch should dodge the problem entirely.

Reviewers: zvi, delena, RKSimon, igorb

Reviewed By: igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31056

llvm-svn: 298228
2017-03-19 17:11:09 +00:00
Simon Pilgrim 8424df7dea Fix constant folding of fp2int to large integers
We make the assumption in most of our constant folding code that a fp2int will target an integer of 128-bits or less, calling the APFloat::convertToInteger with only uint64_t[2] of raw bits for the result.

Fuzz testing (PR24662) showed that we don't handle other cases at all, resulting in stack overflows and all sorts of crashes.

This patch uses the APSInt version of APFloat::convertToInteger instead to better handle such cases.

Differential Revision: https://reviews.llvm.org/D31074

llvm-svn: 298226
2017-03-19 16:50:25 +00:00
Ahmed Bougacha 931904d777 [GlobalISel] Don't select trivially dead instructions.
Folding instructions when selecting can cause them to become dead.
Don't select these dead instructions (if they don't have other side
effects, and don't define physical registers).

Preserve existing tests by adding COPYs.

In some tests, the G_CONSTANT vregs never get constrained to a class:
the only use of the vreg was folded into another instruction, so the
G_CONSTANT, now dead, never gets selected.

llvm-svn: 298224
2017-03-19 16:13:00 +00:00
Ahmed Bougacha 48bcd22ce8 [GlobalISel][AArch64] Add DBG_VALUE select test. NFC.
llvm-svn: 298223
2017-03-19 16:12:53 +00:00
Ahmed Bougacha dcd416a4b9 [GlobalISel][AArch64] Split out cast select tests. NFC.
And remove some redundant bitcast tests.

Also split the test functions themselves: it makes it obvious to see
what's tested where and what isn't, it makes the tests much easier to
read and manually update, and, most importantly, it makes them almost
trivial to update using tooling.  Yes, it's obnoxiously verbose, but
said tooling helps upgrade to better MIR syntax whenever available.

llvm-svn: 298222
2017-03-19 16:12:51 +00:00
Oren Ben Simhon 75537b6566 [MIR] Test assumes x64 windows calling convention upon printing/parsing MIR output/input.
llvm-svn: 298212
2017-03-19 13:23:20 +00:00
Benjamin Kramer 6520f83ba4 [MIR] Add triple to test that assumes it runs on windows.
llvm-svn: 298211
2017-03-19 13:04:35 +00:00
Oren Ben Simhon 9ce0ec5dbc CalleeSavedRegister was removed from MIR and is recalculated upon MIR parsing.
llvm-svn: 298210
2017-03-19 11:18:09 +00:00
Oren Ben Simhon a96fdbf233 Moving the test to x86 because other architectures do not support regcall calling convention.
llvm-svn: 298209
2017-03-19 08:53:42 +00:00
Oren Ben Simhon 0ef61ec32a [MIR] Support Customed Register Mask and CSRs
The MIR printer dumps a string that describes the register mask of a function.
A static predefined list of register masks matches a static list of strings.
However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails.
This patch adds support to custom register mask printing and dumping.
Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic.
As such this data needs to be dumped and parsed back to the Machine Register Info.

Differential Revision: https://reviews.llvm.org/D30971

llvm-svn: 298207
2017-03-19 08:14:18 +00:00
Nirav Dave ac6081cb67 Make library calls sensitive to regparm module flag (Fixes PR3997).
Reviewers: mkuper, rnk

Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27050

llvm-svn: 298179
2017-03-18 00:44:07 +00:00
Stanislav Mekhanoshin 8e45acfc38 [AMDGPU] Add address space based alias analysis pass
This is direct port of HSAILAliasAnalysis pass, just cleaned for
style and renamed.

Differential Revision: https://reviews.llvm.org/D31103

llvm-svn: 298172
2017-03-17 23:56:58 +00:00
Sanjay Patel 0429b1a431 [x86] regenerate checks; NFC
llvm-svn: 298166
2017-03-17 23:04:18 +00:00
Sanjay Patel 77e6ebe748 [x86] regenerate checks; NFC
llvm-svn: 298164
2017-03-17 22:47:21 +00:00
Jessica Paquette ea8cc09be0 [Outliner] Add outliner for AArch64
This commit adds the necessary target hooks for outlining in AArch64. It also
refactors the switch statement used in `getMemOpBaseRegImmOfsWidth` into a
more general function, `getMemOpInfo`. This allows the outliner to share that
code without copying and pasting it.

The AArch64 outliner can be run using -mllvm -enable-machine-outliner, as with
the X86-64 outliner.

The test for this pass verifies that the outliner does, in fact, outline
functions, fixes up the stack accesses properly, and can correctly generate a
tail call. In the future, this test should be replaced with a MIR test, so that
we can properly test immediate offset overflows in fixed-up instructions.

llvm-svn: 298162
2017-03-17 22:26:55 +00:00
Evgeniy Stepanov 51c962f72e Add !associated metadata.
This is an ELF-specific thing that adds SHF_LINK_ORDER to the global's section
pointing to the metadata argument's section. The effect of that is a reverse dependency
between sections for the linker GC.

!associated does not change the behavior of global-dce. The global
may also need to be added to llvm.compiler.used.

Since SHF_LINK_ORDER is per-section, !associated effectively enables
fdata-sections for the affected globals, the same as comdats do.

Differential Revision: https://reviews.llvm.org/D29104

llvm-svn: 298157
2017-03-17 22:17:24 +00:00
Eli Friedman 46ddab3810 [SelectionDAG] Remove redundant stores more aggressively.
Handle TokenFactors more aggressively in
SDValue::reachesChainWithoutSideEffects.  This isn't really a
very effective change anymore because of other changes to
chain handling, but it's a cheap check, and the expanded
comments are still useful.

It might be possible to loosen the hasOneUse() requirement with a
deeper analysis, but a naive implementation of that check would be
expensive.

Differential Revision: https://reviews.llvm.org/D29845

llvm-svn: 298156
2017-03-17 22:15:50 +00:00
Matt Arsenault e70d5dcf3e AMDGPU: Fix handling of constant phi input loop conditions
If the loop condition was an i1 phi with a constantexpr input, this
would add a loop intrinsic fed by a phi dependent on a call to
if.break in the same block. Insert the call in the loop header.

llvm-svn: 298121
2017-03-17 20:52:21 +00:00
Sanjay Patel 455703a0c6 [x86] clean up setcc with negated operand transform and add missing test; NFCI
llvm-svn: 298118
2017-03-17 20:29:40 +00:00
Reid Kleckner edf1cbb580 [X86] Emit fewer instructions to allocate >16GB stack frames
Summary:
Use this code pattern when RAX is live, instead of emitting up to 2
billion adjustments:
  pushq %rax
  movabsq +-$Offset+-8, %rax
  addq %rsp, %rax
  xchg %rax, (%rsp)
  movq (%rsp), %rsp

Try to clean this code up a bit while I'm here. In particular, hoist the
logic that handles the entire adjustment with `movabsq $imm, %rax` out
of the loop.

This negates the offset in the prologue and uses ADD because X86 only
has a two operand subtract which always subtracts from the destination
register, which can no longer be RSP.

Fixes PR31962

Reviewers: majnemer, sdardis

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30052

llvm-svn: 298116
2017-03-17 20:25:49 +00:00
Jun Bum Lim 4230101def [CodeGenPrep]Restructure promoting Ext to form ExtLoad
Summary:
Instead of just looking for a load which is mergeable with an Ext to form an ExtLoad, try to promote Exts as long as the cost is acceptable. This change is not NFC, as it continues promoting Exts even after finding a load during promotion; the change in arm64-codegen-prepare-extload.ll described in 2.b might show the case.
This change was motivated from D26524.  Based on this change, I will move the transformation performed in aarch64-type-promotion into CGP.

Reviewers: jmolloy, qcolombet, mcrosier, javed.absar

Reviewed By: qcolombet

Subscribers: rengolin, llvm-commits, aemerson

Differential Revision: https://reviews.llvm.org/D27853

llvm-svn: 298114
2017-03-17 19:05:21 +00:00
Simon Pilgrim 5a68d401c7 [SelectionDAG] Add SelectionDAG.computeKnownBits test support for ISD::ABS
llvm-svn: 298108
2017-03-17 17:45:36 +00:00
Sanjay Patel 25bd713d33 [x86] avoid adc/sbb assert when both sides of add are zexted (PR32316)
As noted in the comment, we might want to account for this case,
but I didn't look at what that would mean for the asm. 

I'm also not sure why this only reproduces with avx512, but I'm 
putting a conservative fix in for now to avoid the crash. 

Also, if both sides of an add are zexted, shouldn't we shrink that add?

https://bugs.llvm.org/show_bug.cgi?id=32316

llvm-svn: 298107
2017-03-17 17:27:31 +00:00
Simon Pilgrim d06b025c9c [X86] Add SelectionDAG.computeKnownBits test showing inability to handle ISD::ABS
We have to be careful as abs(INT_MIN) == INT_MIN.

llvm-svn: 298103
2017-03-17 16:58:15 +00:00
Chad Rosier a69dcb6b66 [AArch64] Use alias analysis in the load/store optimization pass.
This allows the optimization to rearrange loads and stores more aggressively.

Differential Revision: http://reviews.llvm.org/D30903

llvm-svn: 298092
2017-03-17 14:19:55 +00:00
Craig Topper a8d4097445 [AVX-512] Make VEX encoded FMA instructions available when AVX512 is enabled regardless of whether +fma was added on the command line.
We weren't able to handle isel of the 128/256-bit FMA instructions when AVX512F was enabled but VLX and FMA weren't.

I didn't make FeatureAVX512 imply FeatureFMA as I wasn't sure I wanted disabling FMA to also disable AVX512. Instead we just can't prevent FMA instructions if AVX512 is enabled.

Another option would be to promote 128/256-bit to 512-bit, do the operation and extract it. But that requires a lot of extra isel patterns. Since no CPUs exist that support AVX512 but not FMA, just using the VEX instructions seems better.

llvm-svn: 298051
2017-03-17 07:37:31 +00:00
Jonas Paulsson f496bd9a59 [SystemZ] New CodeGen tests for vector compare / select.
New SystemZ tests for the improved codegen of vector compare and select,
including cases with a logical combination of two compares.

Review: Ulrich Weigand.
https://reviews.llvm.org/D29489

llvm-svn: 298049
2017-03-17 07:11:46 +00:00
Jonas Paulsson 8a7bd24c82 [SystemZ] Add use of super-reg in splitMove()
If one of the subregs of the 128 bit reg is undefined when splitMove() splits
a store into two instructions, a use of an undefined physical register
results.

To remedy this, an implicit use of the super register is added onto both new
instructions, along with propagated kill and undef flags.

This was discovered with llvm-stress, and that test case is attached as
test/CodeGen/SystemZ/splitMove_undefReg_mverifier.ll

Thanks to Matthias Braun for helping with a nice explanation.

Review: Ulrich Weigand
llvm-svn: 298047
2017-03-17 06:47:08 +00:00
Craig Topper 6a1290a0fd [AVX-512] Give priority to EVEX encoded scalar FMA instructions when we have FMA, AVX512 and no VLX.
We were giving priority if VLX was enabled.

llvm-svn: 298046
2017-03-17 06:10:37 +00:00
Craig Topper c1338f21ed [X86] Use update_llc_test_checks.py to regenerate a test and add command lines to demonstrate that we don't pick EVEX encoded instruction when AVX512 and FMA3 are both enabled.
This bug only exists on the scalar llvm.fma intrinsics. Looks like we don't test the llvm.fma intrinsics very thoroughly. In fact I don't see any tests for the vector versions.

llvm-svn: 298045
2017-03-17 06:00:01 +00:00
Craig Topper 30c89eeeb6 [X86] Use update_llc_test_checks.py to regenerate a test.
llvm-svn: 298044
2017-03-17 05:59:57 +00:00
Matthias Braun f0b68d3fbc SplitKit: Correctly implement partial subregister copies
- This fixes a bug where subregisters incompatible with the vreg's register
  class were used.
- Implement the case where multiple copies are necessary to cover a
  given lanemask.

Differential Revision: https://reviews.llvm.org/D30438

llvm-svn: 298025
2017-03-17 00:41:39 +00:00
Eli Friedman da228fee0c [ARM] Use alias analysis in ARMPreAllocLoadStoreOpt.
This allows the optimization to rearrange loads and stores more
aggressively. This doesn't really affect performance, but it helps
codesize.

Differential Revision: https://reviews.llvm.org/D30839

llvm-svn: 298021
2017-03-17 00:34:26 +00:00
Kyle Butt d609d6ebf0 CodeGen: BlockPlacement: Adjust test case so it covers rL297925. NFC
I had adjusted the test case before when testing a chain of length 2, and then
reverted it with rL296845 when I switched to 3 triangles. After running
benchmarks and examining generated code at length 2 I forgot to put the test
back.

llvm-svn: 298000
2017-03-16 21:33:29 +00:00
Daniel Sanders 16846764d0 [globalisel] Correct one more simple immediate that should be a ConstantInt.
llvm-svn: 297979
2017-03-16 19:59:19 +00:00
Craig Topper 96efc5c227 [AVX-512] Add tests for kandn, kor, kxor, and kxnor intrinsics.
llvm-svn: 297978
2017-03-16 19:58:06 +00:00
Daniel Sanders 0e64202871 [globalisel] Correct G_CONSTANT path of selectArithImmed()
Earlier stages of GlobalISel always use ConstantInt in G_CONSTANT so that's
what we should check for.

This fixes a crash introduced in r297782.

llvm-svn: 297968
2017-03-16 18:04:50 +00:00
Adrian Prantl 981f03e6a2 PR32288: More efficient encoding for DWARF expr subregister access.
Citing http://bugs.llvm.org/show_bug.cgi?id=32288

  The DWARF generated by LLVM includes this location:

  0x55 0x93 0x04 DW_OP_reg5 DW_OP_piece(4) When GCC's DWARF is simply
  0x55 (DW_OP_reg5) without the DW_OP_piece. I believe it's reasonable
  to assume the DWARF consumer knows which part of a register
  logically holds the value (low bytes, high bytes, how many bytes,
  etc) for a primitive value like an integer.

This patch gets rid of the redundant DW_OP_piece when a subregister is
at offset 0. It also adds previously missing subregister masking when
a subregister is followed by another operation.
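
As a rough illustration of the encoding change (a sketch only, not LLVM's
DwarfExpression code): DW_OP_reg5 is the single byte 0x55, and DW_OP_piece is
0x93 followed by a ULEB128 byte count. The helper names below are hypothetical,
and only the offset-0 case is modeled.

```python
DW_OP_REG0 = 0x50   # DW_OP_reg0..DW_OP_reg31 occupy 0x50..0x6f
DW_OP_PIECE = 0x93  # operand: ULEB128 size in bytes


def uleb128(value):
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_low_subreg(reg, size_bytes, drop_redundant_piece=True):
    """Hypothetical helper: location of a subregister at offset 0 of `reg`."""
    if not 0 <= reg <= 31:
        raise ValueError("sketch only handles DW_OP_reg0..DW_OP_reg31")
    expr = bytes([DW_OP_REG0 + reg])
    if drop_redundant_piece:
        # New behaviour: the consumer is assumed to know which low bytes matter.
        return expr
    # Old behaviour: always append DW_OP_piece(size).
    return expr + bytes([DW_OP_PIECE]) + uleb128(size_bytes)
```

With drop_redundant_piece=False this yields the three bytes quoted in the bug
report; with the default it yields the single byte that GCC emits.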

(This reapplies r297960 with two additional testcase updates).

rdar://problem/31069390
https://reviews.llvm.org/D31010

llvm-svn: 297965
2017-03-16 17:14:56 +00:00
Stanislav Mekhanoshin f80507979d [AMDGPU] Run always inliner early in opt
We can mark functions to always inline early in the opt. Since we do not have
call support this early inlining creates opportunities for inter-procedural
optimizations which would not occur otherwise.

Differential Revision: https://reviews.llvm.org/D31016

llvm-svn: 297958
2017-03-16 16:11:46 +00:00
Simon Pilgrim e5d7e6f5e3 [X86] Add PR22338 test case
llvm-svn: 297957
2017-03-16 15:10:42 +00:00
Jonas Paulsson 84319bfc40 [SelectionDAG] Optimize VSELECT->SETCC of incompatible or illegal types.
Don't scalarize VSELECT->SETCC when operands/results need to be widened,
or when the types of the SETCC operands are different from those of the VSELECT.

(VSELECT SETCC) and (VSELECT (AND/OR/XOR (SETCC,SETCC))) are handled.

The previous splitting of VSELECT->SETCC in DAGCombiner::visitVSELECT() is
no longer needed and has been removed.

Updated tests:

test/CodeGen/ARM/vuzp.ll
test/CodeGen/NVPTX/f16x2-instructions.ll
test/CodeGen/X86/2011-10-19-widen_vselect.ll
test/CodeGen/X86/2011-10-21-widen-cmp.ll
test/CodeGen/X86/psubus.ll
test/CodeGen/X86/vselect-pcmp.ll

Review: Eli Friedman, Simon Pilgrim
https://reviews.llvm.org/D29489

llvm-svn: 297930
2017-03-16 07:17:12 +00:00
Kyle Butt 08655997eb CodeGen: BlockPlacement: Reduce TriangleChainCount to 2
This produces a 1% speedup on an important internal Google benchmark
(protocol buffers), with no other regressions in google or in the llvm
test-suite. Only 5 targets in the entire llvm test-suite are affected,
and on those 5 targets the size increase is 0.027%

llvm-svn: 297925
2017-03-16 01:32:29 +00:00
Matt Arsenault 7dc01c96ae AMDGPU: Allow sinking of addressing modes for atomic_inc/dec
llvm-svn: 297913
2017-03-15 23:15:12 +00:00
Matt Arsenault 02d915be90 CodeGenPrepare: Sink addressing modes for atomics
llvm-svn: 297903
2017-03-15 22:35:20 +00:00
Zvi Rackover 9ad1b235d5 Second attempt to fix Hexagon buildbot by moving test under X86/
llvm-svn: 297893
2017-03-15 21:13:45 +00:00
Zvi Rackover de43c859f5 Limit test's triple in attempt to fix broken buildbot
Regression test for a target-independent bug keeps failing in the
Hexagon backend due to what appears to be an unrelated issue.

llvm-svn: 297888
2017-03-15 20:29:58 +00:00
Zvi Rackover 48cdde0e59 [DAGCombine] Bail out if can't create a vector with at least two elements
Summary:

Fixes pr32278

Reviewers: igorb, craig.topper, RKSimon, spatel, hfinkel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30978

llvm-svn: 297878
2017-03-15 19:48:36 +00:00
Ahmed Bougacha 2fb8030748 [GlobalISel] Avoid translating synthetic constants to new G_CONSTANTS.
Currently, we create a G_CONSTANT for every "synthetic" integer
constant operand (for instance, for the G_GEP offset).
Instead, share the G_CONSTANTs we might have created by going through
the ValueToVReg machinery.

When we're emitting synthetic constants, we do need to get Constants from
the context.  One could argue that we shouldn't modify the context at
all (for instance, this means that we're going to use a tad more memory
if the constant wasn't used elsewhere), but constants are mostly
harmless.  We currently do this for extractvalue and all.

For constant fcmp, this does mean we'll emit an extra COPY, which is not
necessarily more optimal than an extra materialized constant.
But that preserves the current intended design of uniqued G_CONSTANTs,
and the rematerialization problem exists elsewhere and should be
resolved with a single coherent solution.

llvm-svn: 297875
2017-03-15 19:21:11 +00:00
Ahmed Bougacha 62cd73d989 [GlobalISel][AArch64] Select ADDXri.
We're now able to select ADDWri thanks to the new complex pattern
support.  Extend that to ADDXri.

llvm-svn: 297874
2017-03-15 19:20:59 +00:00
Matt Arsenault 86e02ce2dc AMDGPU: Fix unnecessary ands when packing f16 vectors
computeKnownBits didn't handle fp_to_fp16 to report
the high bits as 0. ARM maps the generic node to an instruction
that does not modify the high bits of the register, so introduce
a target node where the high bits are known 0.

llvm-svn: 297873
2017-03-15 19:04:26 +00:00
Tim Northover 0d98b03b9f ARM: avoid clobbering register in v6 jump-table expansion.
If we got unlucky with register allocation and actual constpool placement, we
could end up producing a tTBB_JT with an index that's already been clobbered.

Technically, we might be able to fix this situation up with a MOV, but I think
the constant islands pass is complex enough without having to deal with more
weird edge-cases.

llvm-svn: 297871
2017-03-15 18:38:13 +00:00
Ahmed Bougacha 07f247b6c2 [GlobalISel] Insert translated switch icmp blocks after switch parent.
Now that we preserve the IR layout, we would end up with all the newly
synthesized switch comparison blocks at the end of the function.
Instead, use a hopefully more reasonable layout, with the comparison
blocks immediately following the switch parent block.

llvm-svn: 297869
2017-03-15 18:22:37 +00:00
Ahmed Bougacha a61c214f51 [GlobalISel] Preserve IR block layout.
It makes the output function layout more predictable;  the layout has
an effect on performance, we don't want it to be at the mercy of the
translator's visitation order and such.
The predictable output is also easier to digest.

getOrCreateBB isn't appropriately named anymore, as it never needs to
create anything.  Rename it and extract the MBB creation logic out of it.

A couple tests were sensitive to the order. Update them.

llvm-svn: 297868
2017-03-15 18:22:33 +00:00
Ahmed Bougacha 1a6deeefe0 [GlobalISel][AArch64] Add back constant select tests. NFC.
More of r297856.

llvm-svn: 297859
2017-03-15 16:51:41 +00:00
Ahmed Bougacha d691cf731c [GlobalISel][AArch64] Use appropriate test function names. NFC.
These FP tests are on FPR, not GPR.  Don't lie in the name.

llvm-svn: 297857
2017-03-15 16:29:40 +00:00
Ahmed Bougacha 170778f0db [GlobalISel][AArch64] Split out select tests. NFC.
The test has grown enough to be annoying to navigate.
While there, remove unnecessary RUNs and clean up a couple of comments.

llvm-svn: 297856
2017-03-15 16:29:37 +00:00
Peter Collingbourne d44a01aae6 CodeGen: Use the source filename as the argument to .file, rather than the module ID.
Using the module ID here is wrong for a couple of reasons:
1) The module ID is not persisted, so we can end up with different
   object file contents given the same input file (for example if the same
   file is accessed via different paths).
2) With ThinLTO the module ID field may contain the path to a bitcode file,
   which is incorrect, as the .file argument is supposed to contain the path to
   a source file.

Differential Revision: https://reviews.llvm.org/D30584

llvm-svn: 297853
2017-03-15 16:24:52 +00:00
Simon Pilgrim 018eedd9a5 [SelectionDAG] Support BUILD_VECTOR implicit truncation in SelectionDAG::ComputeNumSignBits (PR32273)
llvm-svn: 297852
2017-03-15 16:22:24 +00:00
Nemanja Ivanovic ffcf0fb1cc [PowerPC][Altivec] Add mfvrd and mffprd extended mnemonic
mfvrd and mffprd are both aliases of mfvrsd.
This patch enables correct parsing of the aliases, but we still emit a mfvrsd.

Committing on behalf of brunoalr (Bruno Rosa).

Differential Revision: https://reviews.llvm.org/D29177

llvm-svn: 297849
2017-03-15 16:04:53 +00:00
Simon Pilgrim a5f332edd1 [SelectionDAG][AArch64] Add test case showing incorrect SelectionDAG::ComputeNumSignBits BUILD_VECTOR handling
Reduced from a mixture of PR32273 and David Green's test cases showing SelectionDAG::ComputeNumSignBits not correctly handling BUILD_VECTOR implicit truncation of inputs.

llvm-svn: 297847
2017-03-15 15:40:34 +00:00
Artyom Skrobov e72e1ba434 Revert "[Thumb1] Fix the bug when adding/subtracting -2147483648"
This reverts r297820 which apparently fails on A15 hosts.

llvm-svn: 297842
2017-03-15 14:50:43 +00:00
Eric Liu 8f49635500 Add 'REQUIRES: asserts' to pr32278.ll introduced in r297822
llvm-svn: 297835
2017-03-15 13:37:20 +00:00
Simon Pilgrim 493f4462bf [X86][SSE] Fixed shuffle MOVSS/MOVSD combining of all zeroable inputs
Turns out it can happen, so the assertion was too harsh

Found during fuzz testing

llvm-svn: 297833
2017-03-15 13:16:46 +00:00
Petar Jovanovic b71386a4a4 [Mips] Add support to match more patterns for DEXT and CINS
This patch adds support for recognizing more patterns to match to DEXT and
CINS instructions.
It finds cases where multiple instructions could be replaced with a single
DEXT or CINS instruction.

For example, for the following:

define i64 @dext_and32(i64 zeroext %a) {
entry:

 %and = and i64 %a, 4294967295
 ret i64 %and
}

instead of generating:

 0000000000000088 <dext_and32>:

 88:   64010001        daddiu  at,zero,1
 8c:   0001083c        dsll32  at,at,0x0
 90:   6421ffff        daddiu  at,at,-1
 94:   03e00008        jr      ra
 98:   00811024        and     v0,a0,at
 9c:   00000000        nop

the following gets generated:

 0000000000000068 <dext_and32>:

 68:   03e00008        jr      ra
 6c:   7c82f803        dext    v0,a0,0x0,0x20

Cases that are covered:

DEXT:

 1. and $src, mask where mask > 0xffff
 2. zext $src zero extend from i32 to i64

CINS:

 1. and (shl $src, pos), mask
 2. shl (and $src, mask), pos
 3. zext (shl $src, pos) zero extend from i32 to i64
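
The covered cases can be sketched with a simplified bit-level model. This is an
illustration, not the MIPS backend's matching code; the function signatures
take a field size rather than the real instructions' encoded operands, and are
assumptions made for readability.

```python
MASK64 = (1 << 64) - 1


def dext(src, pos, size):
    """Extract `size` bits of `src` starting at bit `pos`, zero-extended."""
    return (src >> pos) & ((1 << size) - 1)


def cins(src, pos, size):
    """Place the low `size` bits of `src` at bit `pos`; all other bits are zero."""
    return ((src & ((1 << size) - 1)) << pos) & MASK64


a = 0x1234_5678_9ABC_DEF0

# DEXT case 1: and $src, 0xffffffff  ==  dext $src, 0, 32
assert (a & 4294967295) == dext(a, 0, 32)

# CINS case 1: and (shl $src, 8), 0xff00  ==  cins $src, 8, 8
assert ((a << 8) & 0xFF00) == cins(a, 8, 8)

# CINS case 2: shl (and $src, 0xff), 8  ==  cins $src, 8, 8
assert ((a & 0xFF) << 8) == cins(a, 8, 8)
```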

Patch by Violeta Vukobrat.

Differential Revision: https://reviews.llvm.org/D30464

llvm-svn: 297832
2017-03-15 13:10:08 +00:00
Zvi Rackover 4aacd5d3c4 Fix malformed XFAIL in previous commit
llvm-svn: 297823
2017-03-15 11:44:14 +00:00
Zvi Rackover 81f7b88910 [DAGCombine] Add reproducer for pr32278
llvm-svn: 297822
2017-03-15 11:34:51 +00:00
Artyom Skrobov 3fa5fd1dd2 [Thumb1] Fix the bug when adding/subtracting -2147483648
Differential Revision: https://reviews.llvm.org/D30829

llvm-svn: 297820
2017-03-15 10:19:16 +00:00
Sam Parker 654cb8263a [ARM] Enable SMLAL[B|T] isel
Enable the selection of the 64-bit signed multiply accumulate
instructions which operate on 16-bit operands. These are enabled for
ARMv5TE onwards for ARM and for V6T2 and other DSP enabled Thumb
architectures.

Differential Revision: https://reviews.llvm.org/D30044

llvm-svn: 297809
2017-03-15 08:27:11 +00:00
Taewook Oh fb1833efeb [BranchFolding] Merge debug locations from common tail instead of removing
Summary: D25742 improved the precision of debug locations for PGO by removing debug locations from common tail when tail-merging. However, if identical instructions that are merged into a common tail have the same debug locations, there's no need to remove them. This patch creates a merged debug location of identical instructions across SameTails and assigns it to the instruction in the common tail, so that the debug locations are maintained if they are the same across identical instructions.
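
The merging rule described in the summary can be sketched as follows. This is a
minimal model, not the BranchFolding implementation; representing a debug
location as a (file, line, column) tuple and an absent location as None are
assumptions for illustration.

```python
def merge_debug_locations(locations):
    """Return the shared location if every copy agrees, otherwise None.

    `locations` holds the debug location of the same instruction in each
    tail that is being merged (hypothetical representation: tuples).
    """
    if not locations:
        return None
    first = locations[0]
    return first if all(loc == first for loc in locations) else None


# Identical locations survive the merge; differing ones are dropped.
same = merge_debug_locations([("a.c", 10, 3), ("a.c", 10, 3)])
mixed = merge_debug_locations([("a.c", 10, 3), ("a.c", 42, 1)])
```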

Reviewers: aprantl, probinson, MatzeB, rob.lougher

Reviewed By: aprantl

Subscribers: andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D30226

llvm-svn: 297805
2017-03-15 05:44:59 +00:00
Peter Collingbourne 7f6e2c97b8 Ensure that prefix data is preserved with subsections-via-symbols
On MachO platforms that use subsections-via-symbols dead code stripping will
drop prefix data. Unfortunately there is no great way to convey the relationship
between a function and its prefix data to the linker. We are forced to use a bit
of a hack: we give the prefix data its own symbol, and mark the actual function
entry an .alt_entry.

Patch by Moritz Angermann!

Differential Revision: https://reviews.llvm.org/D30770

llvm-svn: 297804
2017-03-15 04:18:16 +00:00
Volkan Keles 4862c63594 [GlobalISel] IRTranslator: Return the scalar for <1 x Ty> constant vectors
Summary:
<1 x Ty> is not a legal vector type in LLT, so we shouldn’t build a G_MERGE_VALUES
instruction for it.

Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, ab, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30948

llvm-svn: 297792
2017-03-14 23:45:06 +00:00
Daniel Sanders 8a4bae9993 [globalisel][tblgen] Add support for ComplexPatterns
Summary:
Adds a new kind of MachineOperand: MO_Placeholder.
This operand must not appear in the MIR and only exists as a way of
creating an 'uninitialized' operand until a matcher function overwrites it.

Depends on D30046, D29712

Reviewers: t.p.northover, ab, rovka, aditya_nandakumar, javed.absar, qcolombet

Reviewed By: qcolombet

Subscribers: dberris, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D30089

llvm-svn: 297782
2017-03-14 21:32:08 +00:00
Simon Pilgrim cf2da96c82 [SelectionDAG] Add a signed integer absolute ISD node
Reduced version of D26357 - based on the discussion on llvm-dev about canonicalization of UMIN/UMAX/SMIN/SMAX as well as ABS I've reduced that patch to just the ABS ISD node (with x86/sse support) to improve basic combines and lowering.

ARM/AArch64, Hexagon, PowerPC and NVPTX all have similar instructions allowing us to make this a generic opcode and move away from the hard coded tablegen patterns which makes it tricky to match more complex patterns.

At the moment this patch doesn't attempt legalization as we only create an ABS node if it's legal/custom.

Differential Revision: https://reviews.llvm.org/D29639

llvm-svn: 297780
2017-03-14 21:26:58 +00:00
Sanjay Patel 8dd99dce6c [DAG] vector div/rem with any zero element in divisor is undef
This is the backend counterpart to:
https://reviews.llvm.org/rL297390
https://reviews.llvm.org/rL297409
and follow-up to:
https://reviews.llvm.org/rL297384

It surprised me that we need to duplicate the check in FoldConstantArithmetic and FoldConstantVectorArithmetic, 
but one or the other doesn't catch all of the test cases. There is an existing code comment about merging those 
someday.
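
A minimal model of the folding rule named in the title, assuming unsigned
division on constant vectors and using None to stand in for undef (both are
assumptions for illustration; this is not the DAG constant-folding code):

```python
UNDEF = None  # hypothetical stand-in for an undef result


def fold_vector_udiv(lhs, rhs):
    """Fold a constant vector udiv; any zero divisor element makes it undef."""
    if len(lhs) != len(rhs):
        raise ValueError("vector operands must have the same length")
    if any(d == 0 for d in rhs):
        return UNDEF
    return [n // d for n, d in zip(lhs, rhs)]
```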

Differential Revision: https://reviews.llvm.org/D30826

llvm-svn: 297762
2017-03-14 18:06:28 +00:00
Simon Pilgrim 3a196cbc4f [X86] Add extra BITREVERSE tests
Test on 32-bit and 64-bit targets.

Add bitreverse tests for i64, i32 and i16

llvm-svn: 297741
2017-03-14 14:03:16 +00:00
Simon Pilgrim e1a72a936f [X86][MMX] Update FIXME comment. NFCI.
llvm-svn: 297736
2017-03-14 12:13:41 +00:00
Sam Parker 916b1ba617 [ARM] Move SMULW[B|T] isel to DAG Combine
Create nodes for smulwb and smulwt and move their selection from
DAGToDAG to DAG combine. smlawb and smlawt can then be selected
using tablegen. Added some helper functions to detect shift patterns
as well as a wrapper around SimplifyDemandedBits. Added a couple of
extra tests.

Differential Revision: https://reviews.llvm.org/D30708

llvm-svn: 297716
2017-03-14 09:13:22 +00:00
Oren Ben Simhon fe34c5e429 Disable Callee Saved Registers
Each Calling convention (CC) defines a static list of registers that should be preserved by a callee function. All other registers should be saved by the caller.
Some CCs use an additional condition: If the register is used for passing/returning arguments – the caller needs to save it - even if it is part of the Callee Saved Registers (CSR) list.
The current LLVM implementation doesn’t support it. It will save a register if it is part of the static CSR list and will not care if the register is passed/returned by the callee.
The solution is to dynamically allocate the CSR lists (Only for these CCs). The lists will be updated with actual registers that should be saved by the callee.
Since we need the allocated lists to live as long as the function exists, the list should reside inside the Machine Register Info (MRI) which is a property of the Machine Function and managed by it (and has the same life span).
The lists should be saved in the MRI and populated upon LowerCall and LowerFormalArguments.
The patch will also help implement the future no_caller_saved_registers attribute intended for interrupt handler CC.

Differential Revision: https://reviews.llvm.org/D28566

llvm-svn: 297715
2017-03-14 09:09:26 +00:00
Craig Topper 7a5ee1c5ed [AVX-512] Use iPTR instead of i64 in patterns for extract_subvector/insert_subvector index.
llvm-svn: 297707
2017-03-14 06:40:04 +00:00
Craig Topper b0a82eaea6 [AVX-512] Add test cases that demonstrate some patterns that don't work correctly in 32-bit mode. NFC
llvm-svn: 297706
2017-03-14 06:40:00 +00:00
Nirav Dave 4fc8401abf Recommitting Craig Topper's patch now that r296476 has been recommitted.
When checking if chain node is foldable, make sure the intermediate nodes have a single use across all results not just the result that was used to reach the chain node.

This recovers a test case that was severely broken by r296476, by making sure we don't create ADD/ADC that loads and stores when there is also a flag dependency.

llvm-svn: 297698
2017-03-14 01:42:23 +00:00
Nirav Dave 54e22f33d9 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommitting with compile-time improvements

    Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.

    * Simplify Consecutive Merge Store Candidate Search

    Now that address aliasing is much less conservative, push through
    simplified store merging search and chain alias analysis which only
    checks for parallel stores through the chain subgraph. This is cleaner
    as the separation of non-interfering loads/stores from the
    store-merging logic.

    When merging stores, search up the chain through a single load, and
    find all possible stores by looking down through a load and a
    TokenFactor to all stores visited.

    This improves the quality of the output SelectionDAG and the output
    Codegen (save perhaps for some ARM cases where we correctly construct
    wider loads, but then promote them to float operations which appear
    to require more expensive constant generation).

    Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)

    Additional Minor Changes:

      1. Finishes removing unused AliasLoad code

      2. Unifies the chain aggregation in the merged stores across code
         paths

      3. Re-add the Store node to the worklist after calling
         SimplifyDemandedBits.

      4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
         arbitrary, but seems sufficient to not cause regressions in
         tests.

      5. Remove Chain dependencies of Memory operations on CopyfromReg
         nodes as these are captured by data dependence

      6. Forward loads-store values through tokenfactors containing
          {CopyToReg,CopyFromReg} Values.

      7. Peephole to convert buildvector of extract_vector_elt to
         extract_subvector if possible (see
         CodeGen/AArch64/store-merge.ll)

      8. Store merging for the ARM target is restricted to 32-bit as
         in some contexts invalid 64-bit operations are being
         generated. This can be removed once appropriate checks are
         added.

    This finishes the change Matt Arsenault started in r246307 and
    jyknight's original patch.

    Many tests required some changes as memory operations are now
    reorderable, improving load-store forwarding. One test in
    particular is worth noting:

      CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store
      forwarding converts a load-store pair into a parallel store and
      a memory-realized bitcast of the same value. However, because we
      lose the sharing of the explicit and implicit store values we
      must create another local store. A similar transformation
      happens before SelectionDAG as well.

    Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

llvm-svn: 297695
2017-03-14 00:34:14 +00:00
Artyom Skrobov bf19d4bc29 [Thumb1] combine ADDC/SUBC with a negative immediate
Summary: This simple optimization has been split out of https://reviews.llvm.org/D30400

Reviewers: efriedma, jmolloy

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30829

llvm-svn: 297682
2017-03-13 22:36:14 +00:00
Craig Topper 784f241b59 [AVX-512] Fix another case where we are copying from a mask register using AH/BH/CH/DH with fastisel.
Fixes PR32256. Still planning to do an audit for other possible cases.

llvm-svn: 297678
2017-03-13 21:58:54 +00:00
Volkan Keles 38a91a0de6 GlobalISel: Translate ConstantDataVector
Reviewers: qcolombet, aditya_nandakumar, dsanders, t.p.northover, javed.absar, ab

Reviewed By: qcolombet, dsanders, ab

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30216

llvm-svn: 297670
2017-03-13 21:36:19 +00:00
Tim Northover 55e6f10d69 Revert "GlobalISel: move vector extract/insert inside generic opcode region."
I was writing against an earlier branch and Volkan had already fixed this.

llvm-svn: 297668
2017-03-13 21:25:10 +00:00
Simon Pilgrim 9df7d08cb2 [X86][MMX] Fix folding of shift value loads to cover whole 64-bits
rL230225 made the assumption that only the lower 32-bits of an MMX register load is used as a shift value, when in fact the whole 64-bits are reloaded and treated as an i64 to determine the shift value.

This patch reverts rL230225 to ensure that the whole 64-bits of memory are folded and ensures that the upper 32 bits are zeroed for cases where the shift value has come from a scalar source.

Found during fuzz testing.

Differential Revision: https://reviews.llvm.org/D30833

llvm-svn: 297667
2017-03-13 21:23:29 +00:00
Tim Northover 0f1d32d557 GlobalISel: move vector extract/insert inside generic opcode region.
Otherwise they won't be legalized or selected, causing instruction selection to
fail horribly.

llvm-svn: 297666
2017-03-13 21:18:59 +00:00
Andrew Kaylor a11d020699 Revert r295004 (Add MXCSR) due to errors reported by MachineVerifier
I am leaving the code in clang which filters mxcsr from the clobber list because that is still technically correct and will be useful again when the MXCSR register is reintroduced.

llvm-svn: 297664
2017-03-13 20:35:10 +00:00
Matt Arsenault 971c85ebb4 AMDGPU: Treat 0 as private null pointer in addrspacecast lowering
llvm-svn: 297658
2017-03-13 19:47:31 +00:00
Jessica Paquette c984e21394 [Outliner] Add tail call support
This commit adds tail call support to the MachineOutliner pass. This allows
the outliner to insert jumps rather than calls in areas where tail calling is
possible. Outlined tail calls include the return or terminator of the basic
block being outlined from.

Tail call support allows the outliner to take returns and terminators into
consideration while finding candidates to outline. It also allows the outliner
to save more instructions. For example, in the X86-64 outliner, a tail called
outlined function saves one instruction since no return has to be inserted.

llvm-svn: 297653
2017-03-13 18:39:33 +00:00
Craig Topper 616641632e [X86] Lower AVX2 gather intrinsics similar to AVX-512. Apply the same input source optimizations to break execution dependencies.
For AVX-512 we force the input to zero if the input is undef or the mask is all ones to break an execution dependency. This patch brings the same behavior to AVX2.

llvm-svn: 297652
2017-03-13 18:34:46 +00:00
Craig Topper eb7ea28bdd [AVX-512] If gather mask is all ones, force the input to a zero vector.
We were already forcing undef inputs to become a zero vector, this now catches an all ones mask too.

Ideally we'd use undef and let execution dep fix handle picking the best register/clearance for the undef, but I don't think it can handle the early clobber today.

llvm-svn: 297651
2017-03-13 18:17:46 +00:00
Diana Picus 94db2e288b [ARM] GlobalISel: Support SP in regbankselect
We used to hit an unreachable in getRegBankFromRegClass when dealing with the
stack pointer. This commit adds support for the GPRsp reg class.

llvm-svn: 297621
2017-03-13 14:28:34 +00:00
Craig Topper 7746565754 [AVX-512] Add EVEX2VEX test cases for the cvt instructions fixed in r297599 and r297600.
llvm-svn: 297603
2017-03-13 05:47:56 +00:00
Craig Topper bb4089d260 Revert "[AVX-512] EVEX2VEX, don't reject intrinsic instructions when both have a memory operand. We should just continue to check other operands instead."
This reverts r297596.

There were other issues that were making this not work that have been fixed now. Reverting this results in a more accurate table.

llvm-svn: 297602
2017-03-13 05:34:03 +00:00
Craig Topper 166085f0f2 [AVX-512] EVEX2VEX, don't reject intrinsic instructions when both have a memory operand. We should just continue to check other operands instead.
This exposed that we have several intrinsic instructions that have identical TSFlags to other instructions. We should merge their patterns and kill off the duplicates. I'll fix that in a follow up patch.

llvm-svn: 297596
2017-03-13 00:36:49 +00:00
Craig Topper 7d56c8315b [AVX-512] Fix the valid immediates for the scatter/gather prefetch intrinsics.
The immediate should be 1 or 2, not 0 or 1. This was found while adding bounds checking to clang. In fact the existing clang builtin test failed if we ran it all the way to assembly.

llvm-svn: 297591
2017-03-12 22:29:12 +00:00
Sanjay Patel f06b963a2b [x86] don't blindly transform SETB into SBB
I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. 
This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.

This patch moves those transforms under visitAdd/visitSub, so we're only creating sbb/adc when it
is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that
existing behavior in this patch.

Also, I'm skeptical that sbb/adc are a win for all micro-arches, so I added comments to the test files
where this transform still fires.

The test changes here are all cases where we no longer produce sbb/adc. Avoiding partial register
stalls (generating an xor to clear a register) is not handled in some cases, but that's a separate
issue.

Differential Revision: https://reviews.llvm.org/D30611

llvm-svn: 297586
2017-03-12 18:28:48 +00:00
Azharuddin Mohammed 473b75c3d5 Remove CRC32 instructions from AArch64InstrInfo::hasShiftedReg
Summary:
A53 scheduler causes an assertion failure on all CRC instructions:
include/llvm/CodeGen/MachineInstr.h:280: const llvm::MachineOperand
&llvm::MachineInstr::getOperand(unsigned int) const: Assertion `i <
getNumOperands() && "getOperand() out of range!"' failed.

The case statements corresponding to CRC instructions are incorrect and should
be removed.

Also adding a testcase while on this.

Reviewers: t.p.northover, javed.absar, apazos, rengolin

Reviewed By: rengolin

Subscribers: evandro, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30274

llvm-svn: 297582
2017-03-12 14:02:32 +00:00
Igor Breger 293dfb9768 [X86] Add vector zext tests.
llvm-svn: 297581
2017-03-12 13:20:10 +00:00
Craig Topper 58647b16e5 [AVX-512] Fix a bad use of a high GR8 register after copying from a mask register during fast isel. This ends up extracting from bits 15:8 instead of the lower bits of the mask.
I'm pretty sure there are more problems lurking here. But I think this fixes PR32241.

I've added the test case from that bug and added asserts that will fail if we ever try to copy between high registers and mask registers again.

llvm-svn: 297574
2017-03-12 03:37:37 +00:00
Craig Topper e726cd0cd1 [AVX-512] Add test case for PR32241. Fix coming in another commit.
llvm-svn: 297573
2017-03-12 03:37:34 +00:00
Simon Pilgrim 18debfa5b4 [X86][SSE] Improve extraction of elements from v16i8 (pre-SSE41)
Without SSE41 (pextrb) we currently extract byte elements from a vector by spilling to stack and reloading the byte.

This patch is an initial attempt at using MOVD/PEXTRW to extract the relevant DWORD/WORD from the vector and then shift+truncate to collect the correct byte.
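
The PEXTRW path can be sketched as a small model: fetch the 16-bit word that
contains the wanted byte, then shift and truncate. The function names are
hypothetical and the vector is modeled as 16 little-endian bytes; this is not
the X86 lowering code itself.

```python
def pextrw(vec_bytes, word_index):
    """Model of PEXTRW: return 16-bit word `word_index` of a 16-byte vector."""
    lo = vec_bytes[2 * word_index]
    hi = vec_bytes[2 * word_index + 1]
    return lo | (hi << 8)


def extract_byte(vec_bytes, byte_index):
    """Extract one byte via word extract + shift + truncate."""
    word = pextrw(vec_bytes, byte_index // 2)
    shift = 8 * (byte_index % 2)  # odd bytes live in the high half of the word
    return (word >> shift) & 0xFF


vec = bytes(range(16, 32))
assert all(extract_byte(vec, i) == vec[i] for i in range(16))
```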

Extraction of multiple bytes this way would result in code bloat, but as explained in the patch we could probably afford to be more aggressive with the supported extractions before again falling back on spilling - possibly through counting the number of extracts and which DWORD/WORD they originate?

Differential Revision: https://reviews.llvm.org/D29841

llvm-svn: 297568
2017-03-11 20:42:31 +00:00
Craig Topper d511c2ce04 [X86] Add avx2 gather tests cases that show a failure to remove zeroing of the source when the mask is all ones.
llvm-svn: 297564
2017-03-11 18:26:00 +00:00
Matt Arsenault dd905b0e9b AMDGPU: Remove packf16 intrinsic
llvm-svn: 297557
2017-03-11 05:51:16 +00:00
Matt Arsenault 3cb9ff8863 AMDGPU: Keep track of modifiers when converting v_mac to v_mad
Since v_max_f32_e64/v_max_f16_e64 can be folded if the target
instruction supports the clamp bit, we also need to maintain
modifiers when converting v_mac to v_mad.

This fixes a rendering issue with Dirt Rally because a v_mac
instruction with the clamp bit set was converted to a v_mad
but that bit was lost during the conversion.

Fixes: e184e01dd79 ("AMDGPU: Fold FP clamp as modifier bit")

Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com>

llvm-svn: 297556
2017-03-11 05:40:40 +00:00
Stanislav Mekhanoshin 79da2a7698 [AMDGPU] Remove getBidirectionalReasonRank
This method inverts the Reason field of a scheduling candidate.
It does the right comparison between RegCritical and RegExcess, but
everything else is broken. In fact it can prefer a weaker reason
such as Weak over RegCritical because Weak > -RegCritical.

The CandReason enum is properly sorted, so just remove artificial
ranking.

Differential Revision: https://reviews.llvm.org/D30557

llvm-svn: 297536
2017-03-11 00:29:27 +00:00
Krzysztof Parzyszek 0e7b1f83b7 [RDF] Remove the map of reaching defs from copy propagation
Use Liveness::getNearestAliasedRef to find the reaching def instead.

llvm-svn: 297526
2017-03-10 22:44:24 +00:00
Simon Pilgrim 128a10a41d [X86][SSE] Fix load folding for (V)CVTDQ2PD
This only requires a 64-bit memory source, not the whole 128-bits. But the 128-bit case is still supported via X86InstrInfo::foldMemoryOperandImpl

llvm-svn: 297523
2017-03-10 22:35:07 +00:00
Simon Pilgrim 9956661456 [X86][RTM] Regenerate RTM intrinsic tests for 32/64-bit targets.
llvm-svn: 297518
2017-03-10 21:55:24 +00:00
Volkan Keles 970fee4bfe GlobalISel: Translate ConstantAggregateZero vectors
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30259

llvm-svn: 297509
2017-03-10 21:23:13 +00:00
Volkan Keles 04cb08cc83 [GlobalISel] Translate insertelement and extractelement
Reviewers: qcolombet, aditya_nandakumar, dsanders, ab, t.p.northover, javed.absar

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30761

llvm-svn: 297495
2017-03-10 19:08:28 +00:00
Simon Pilgrim 7dedbfa89d [SelectionDAG] Add support for BUILD_VECTOR to ComputeNumSignBits
llvm-svn: 297492
2017-03-10 18:36:46 +00:00
Simon Pilgrim e54cd65399 [X86][SSE] Added tests showing missed truncations for sitofp conversion
SelectionDAG::ComputeNumSignBits is poor at build_vector handling, meaning that we can't see that all the vXi64 sources are in fact sign extended i32 or smaller.

llvm-svn: 297486
2017-03-10 18:01:53 +00:00
Amaury Sechet 62e0759d56 [SelectionDAG] Make SelectionDAG aware of the known bits in USUBO and SSUBO and SUBC.
Summary:
Depends on D30379

This improves the state of things for the sub class of operation.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30436

llvm-svn: 297482
2017-03-10 17:26:44 +00:00
Simon Pilgrim ed655f09db [X86][MMX] Add tests showing missed opportunities to use MMX sitofp conversions
If we are transferring MMX registers to XMM for conversion we could use the MMX equivalents (CVTPI2PD + CVTPI2PS) without affecting rounding/exceptions etc.

llvm-svn: 297481
2017-03-10 17:23:55 +00:00
Amaury Sechet 69fa16c810 [SelectionDAG] Make SelectionDAG aware of the known bits in UADDO and SADDO.
Summary: As per title. This is extracted from D29872 and I threw SADDO in.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30379

llvm-svn: 297479
2017-03-10 17:06:52 +00:00
Simon Pilgrim c6b55729a5 [X86][MMX] Add tests showing missed opportunities to use MMX fptosi conversions
If we are transferring XMM conversion results to MMX registers we could use the MMX equivalents (CVTPD2PI/CVTTPD2PI + CVTPS2PI/CVTTPS2PI) without affecting rounding/exceptions etc.

llvm-svn: 297476
2017-03-10 16:59:43 +00:00
Simon Pilgrim b8856148d9 [X86][MMX] Updated bad stack spill shift value test to actually show the problem
Cleaning up the IR had stopped showing the issue.

llvm-svn: 297475
2017-03-10 16:18:50 +00:00
Simon Pilgrim 67d25b298a [X86][MMX] Regenerate mmx bitcast tests
llvm-svn: 297474
2017-03-10 16:07:39 +00:00
Simon Pilgrim caa9172ba7 [X86][MMX] Add test showing bad stack spill of shift value
i32 is spilled to stack but 64-bit mmx is reloaded - leaving garbage in the other half of the register

llvm-svn: 297471
2017-03-10 15:53:41 +00:00
Simon Pilgrim 63ad95aee6 [X86][MMX] Regenerate mmx load folding tests
llvm-svn: 297470
2017-03-10 15:41:05 +00:00
Simon Dardis 7090d145e8 [mips][msa] Accept more values for constant splats
This patch teaches the MIPS backend to accept more values for constant
splats. Previously, only 10 bit signed immediates or values that could be
loaded using an ldi.[bhwd] instruction would be accepted. This patch relaxes
that constraint so that any constant value that can be splatted is accepted.

As a result, the constant pool is used less for vector operations, and the
suite of bit manipulation instructions b(clr|set|neg)i can now be used with
the full range of their immediate operand.

Reviewers: slthakur

Differential Revision: https://reviews.llvm.org/D30640

llvm-svn: 297457
2017-03-10 13:27:14 +00:00
Artyom Skrobov 0c93ceb5d8 For Thumb1, lower ADDC/ADDE/SUBC/SUBE via the glueless ARMISD nodes,
same as already done for ARM and Thumb2.

Reviewers: jmolloy, rogfer01, efriedma

Subscribers: aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30400

llvm-svn: 297443
2017-03-10 07:40:27 +00:00
Sanjay Patel 65e2e6805a [x86] add tests for vec div/rem with 0 element in divisor; NFC
llvm-svn: 297433
2017-03-10 00:55:29 +00:00
Ahmed Bougacha 4ec6d5abed [GlobalISel] Fallback when failing to translate invoke.
We unintentionally stopped falling back in r293670.

While there, change an unusual construct.

llvm-svn: 297425
2017-03-10 00:25:35 +00:00
Tim Northover aa995c98f4 GlobalISel: support trivial inlineasm calls.
They're used for nefarious purposes by ObjC.

llvm-svn: 297422
2017-03-09 23:36:26 +00:00
Amaury Sechet e7d102cf02 [DAGCombiner] Do various combine on uaddo.
Summary: This essentially does the same transform as for ADC.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30417

llvm-svn: 297416
2017-03-09 22:47:00 +00:00
Krzysztof Parzyszek 544210304f [Hexagon] Fixes to the bitsplit generation
- Fix the insertion point, which occasionally could have been incorrect.
- Avoid creating multiple bitsplits with the same operands, if an old one
  could be reused.

llvm-svn: 297414
2017-03-09 22:02:14 +00:00
Tim Northover d1e951e5eb GlobalISel: inform FrameLowering when we emit a function call.
Amongst other things (I expect) this is necessary to ensure decent backtraces
when an "unreachable" is involved.

llvm-svn: 297413
2017-03-09 22:00:39 +00:00
Tim Northover 7a9ea8f628 GlobalISel: put debug info for static allocas in the MachineFunction.
The good reason to do this is that static allocas are pretty simple to handle
(especially at -O0) and avoiding tracking DBG_VALUEs throughout the pipeline
should give some kind of performance benefit.

The bad reason is that the debug pipeline is an unholy mess of implicit
contracts, where determining whether "DBG_VALUE %reg, imm" actually implies a
load or not involves the services of at least 3 soothsayers and the sacrifice
of at least one chicken.  And it still gets it wrong if the variable is at SP
directly.

llvm-svn: 297410
2017-03-09 21:12:06 +00:00
Amaury Sechet 10425de063 [DAGCombiner] Do various combine on usubo.
Summary: This essentially does the same transform as for SUBC.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30437

llvm-svn: 297404
2017-03-09 19:28:00 +00:00
Krzysztof Parzyszek 78c4fcf12e [Hexagon] Propagate zext of i1 into arithmetic code in selection DAG
(op ... (zext i1 c) ...) -> (select c (op ... 1 ...),
                                      (op ... 0 ...))
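
As a rough illustration of why the rewrite above preserves the result, here is a scalar sketch in plain C++. The helper names and the choice of add/sub as the operation are assumptions made for the example; the actual transform works on selection DAG nodes:

```cpp
#include <cassert>

// Original form: the i1 condition is zero-extended and fed to the operation.
int opWithZext(int (*Op)(int, int), int X, bool C) {
  return Op(X, static_cast<int>(C));
}

// Rewritten form: select between the operation applied to 1 and to 0.
int opWithSelect(int (*Op)(int, int), int X, bool C) {
  return C ? Op(X, 1) : Op(X, 0);
}

// Two example operations used to exercise the helpers.
int addOp(int A, int B) { return A + B; }
int subOp(int A, int B) { return A - B; }
```

Both forms agree for either value of the condition; the second exposes the constants 1 and 0 directly to the operation.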

llvm-svn: 297391
2017-03-09 16:29:30 +00:00
Sam Parker b308b48d69 [ARM] Remove t2xtpk feature from tests
I previously removed the T2XtPk feature from the ARM backend, but it
looks like I missed some of the tests that were using the feature.

Differential Revision: https://reviews.llvm.org/D30778

llvm-svn: 297386
2017-03-09 15:14:32 +00:00
Sanjay Patel df21979db7 [DAG] recognize div/rem by 0 as undef before trying constant folding
As discussed in the review thread for rL297026, this is actually 2 changes that 
would independently fix all of the test cases in the patch:

1. Return undef in FoldConstantArithmetic for div/rem by 0.
2. Move basic undef simplifications for div/rem (simplifyDivRem()) before 
   foldBinopIntoSelect() as a matter of efficiency.
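
A minimal sketch of the first change, assuming a toy folder over 32-bit unsigned constants. FoldResult, foldUDiv and foldURem are hypothetical names, not the SelectionDAG API:

```cpp
#include <cassert>
#include <cstdint>

// Result of a toy constant fold: either a concrete value or "undef".
struct FoldResult {
  bool IsUndef;
  std::uint32_t Value; // meaningful only when IsUndef is false
};

// Division by zero is recognized before any arithmetic is attempted.
FoldResult foldUDiv(std::uint32_t LHS, std::uint32_t RHS) {
  if (RHS == 0)
    return {true, 0};
  return {false, LHS / RHS};
}

// The same early check applies to remainder.
FoldResult foldURem(std::uint32_t LHS, std::uint32_t RHS) {
  if (RHS == 0)
    return {true, 0};
  return {false, LHS % RHS};
}
```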

I will handle the case of vectors with any zero element as a follow-up. That change
is the DAG sibling for D30665 + adding a check of vector elements to FoldConstantVectorArithmetic().

I'm deleting the test for PR30693 because it does not test for the actual bug any more
(dangers of using bugpoint).

Differential Revision:
https://reviews.llvm.org/D30741

llvm-svn: 297384
2017-03-09 15:02:25 +00:00
Simon Dardis 7577ce2140 [mips] Revert fixes for PR32020.
The fix introduces segfaults and clobbers the value to be stored when
the atomic sequence loops.

Revert "[Target/MIPS] Kill dead code, no functional change intended."

This reverts commit r296153.

Revert "Recommit "[mips] Fix atomic compare and swap at O0.""

This reverts commit r296134.

llvm-svn: 297380
2017-03-09 14:03:26 +00:00
Simon Dardis 158956c6cc [mips] Fix return lowering
Fix a machine verifier issue where an instruction was using an invalid
register. The return pseudo is expanded and has the return address
register added to it. The return register may have been spuriously
marked as killed earlier.

This partially resolves PR/27458

Thanks to Quentin Colombet for reporting the issue!

llvm-svn: 297372
2017-03-09 11:19:48 +00:00
Adam Nemet 5361b82d54 [SSP] In opt remarks, stream Function directly
With this, it shows up as an attribute in YAML and non-printable characters
are properly removed by GlobalValue::getRealLinkageName.

llvm-svn: 297362
2017-03-09 06:10:27 +00:00
Matt Arsenault 9a3fd87523 DAG: Check no signed zeros instead of unsafe math attribute
llvm-svn: 297354
2017-03-09 01:36:39 +00:00
Tim Northover 7596bd7a27 GlobalISel: correctly handle trivial fcmp predicates.
It makes sense to only do them once in IRTranslator rather than making everyone
deal with them.

llvm-svn: 297304
2017-03-08 18:49:54 +00:00
Volkan Keles 5698b2ae6e [GlobalISel] Add default action for G_FNEG
Summary: rL297171 introduced G_FNEG for floating-point negation instruction and IRTranslator started to translate `FSUB -0.0, X` to `FNEG X`. This patch adds a default action for G_FNEG to avoid breaking existing targets.

Reviewers: qcolombet, ab, kristof.beyls, t.p.northover, aditya_nandakumar, dsanders

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits

Differential Revision: https://reviews.llvm.org/D30721

llvm-svn: 297301
2017-03-08 18:09:14 +00:00
Sanjay Patel 9f495695bb [x86] regenerate checks; NFC
This test could be reduced? The check fails for a seemingly unrelated change,
so I'm adding full checks to see what is happening.

llvm-svn: 297296
2017-03-08 17:19:56 +00:00
Krzysztof Parzyszek 1b7197e690 [Hexagon] Use correct offset when extracting from the high word
When extracting a bitfield from the high register in a register pair,
the final offset should be relative to the high register (for 32-bit
extracts).

llvm-svn: 297288
2017-03-08 15:46:28 +00:00
Daniel Cederman 9db582a656 [Sparc] Check register use with isPhysRegUsed() instead of reg_nodbg_empty()
Summary: By using reg_nodbg_empty() to determine if a function can be
treated as a leaf function or not, we miss the case when the register
pair L0_L1 is used but not L0 by itself. This has the effect that
use_all_i32_regs(), a test in reserved-regs.ll which tries to use all
registers, gets treated as a leaf function.

Reviewers: jyknight, venkatra

Reviewed By: jyknight

Subscribers: davide, RKSimon, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D27089

llvm-svn: 297285
2017-03-08 15:23:10 +00:00
Tim Shen c7472d912b Revert "Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size""
After inspection, it's an UB in our code base. Someone cast a var-arg
function pointer to a non-var-arg one. :/

Re-commit r296771 to continue testing on the patch.

Sorry for the trouble!

llvm-svn: 297256
2017-03-08 02:41:35 +00:00
Matt Arsenault 52d1b62a28 AMDGPU: Don't wait at end of block with a trivial successor
If there is only one successor, and that successor only
has one predecessor the wait can obviously be delayed until
uses or the end of the next block. This avoids code quality
regressions when there are trivial fallthrough blocks inserted
for structurization.
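
The condition described above can be written as a small predicate. This is only a sketch over a toy CFG representation; Block and canDelayWaitPastEnd are hypothetical names and do not come from the AMDGPU backend:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy basic block: successor and predecessor lists hold block indices.
struct Block {
  std::vector<std::size_t> Succs;
  std::vector<std::size_t> Preds;
};

// True when block B has exactly one successor and that successor has
// exactly one predecessor, so the wait can be deferred into the successor.
bool canDelayWaitPastEnd(const std::vector<Block> &CFG, std::size_t B) {
  const Block &Blk = CFG[B];
  if (Blk.Succs.size() != 1)
    return false;
  return CFG[Blk.Succs[0]].Preds.size() == 1;
}
```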

llvm-svn: 297251
2017-03-08 01:06:58 +00:00
Eli Friedman c2c2e21d77 [DAGCombine] Simplify ISD::AND in GetDemandedBits.
This helps in cases involving bitfields where an AND is exposed by
legalization.

Differential Revision: https://reviews.llvm.org/D30472

llvm-svn: 297249
2017-03-08 00:56:35 +00:00
Matt Arsenault d8ed207a20 AMDGPU: Constant fold rcp node
When doing arcp optimization with a constant denominator,
this was leaving behind rcps with constant inputs.

llvm-svn: 297248
2017-03-08 00:48:46 +00:00
Changpeng Fang 6b49fa4ca7 AMDGPU/SI: Do not insert EndCf in an unreachable block
Reviewers:
  arsenm

Differential Revision:
  http://reviews.llvm.org/D22025

llvm-svn: 297243
2017-03-07 23:29:36 +00:00
Krzysztof Parzyszek 434d50a796 [Hexagon] Check for presence before looking registers up in bit tracker
llvm-svn: 297240
2017-03-07 23:12:04 +00:00
Krzysztof Parzyszek 8e4d2e0512 [Hexagon] Generate bitsplit instruction
llvm-svn: 297239
2017-03-07 23:08:35 +00:00
Tim Northover 542d1c1463 GlobalISel: use inserts for landingpad instead of sequences.
llvm-svn: 297237
2017-03-07 23:04:06 +00:00
Tim Northover 2eb18d3c4b GlobalISel: fix legalization of G_INSERT
We were calculating incorrect extract/insert offsets by trying to be too
tricksy with min/max. It's clearer to just split the logic up into "register
starts before this segment" vs "after".

llvm-svn: 297226
2017-03-07 21:24:33 +00:00
Ahmed Bougacha 55d10423a6 [GlobalISel] Don't translate intrinsics with metadata parameters.
Some intrinsics take metadata parameters.  These all need custom
handling of some form, and cannot possibly be lowered generically to
G_INTRINSIC calls with vreg operands.
Reject them, instead of hitting an assert later in getOrCreateVReg.

llvm-svn: 297209
2017-03-07 20:53:09 +00:00
Ahmed Bougacha 5c7924fca5 [GlobalISel] Avoid invalidating ValToVReg when translating no-op bitcast.
When we translate a no-op (same type) bitcast, we try to be clever and
only emit a COPY if we already assigned a vreg to the defined value.
However, when we didn't, we tried to assign to a reference into the
ValToVReg DenseMap, even though the RHS of the assignment
(getOrCreateVReg) could potentially grow that DenseMap, invalidating the
reference.

Avoid that by getting the source vreg first.
I audited the rest of the translator; this is the only tricky case.

The test is quite unwieldy, as the problem is caused by the DenseMap
growing, which happens after the 47th mapped value.
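
The ordering fix can be sketched with a toy table. ToyValToVReg below is a hypothetical vector-backed stand-in for the real map; like a DenseMap, growing it may move its storage and invalidate references taken earlier:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy value-to-vreg table. A vreg of 0 means "not assigned yet".
struct ToyValToVReg {
  std::vector<std::pair<int, unsigned>> Slots;
  unsigned NextVReg = 1;

  // Returns a reference to the slot for Key, appending an empty slot if needed.
  unsigned &slot(int Key) {
    for (auto &S : Slots)
      if (S.first == Key)
        return S.second;
    Slots.emplace_back(Key, 0u);
    return Slots.back().second;
  }

  // Returns the vreg for Key, assigning a fresh one on first use.
  unsigned getOrCreateVReg(int Key) {
    unsigned &R = slot(Key);
    if (R == 0)
      R = NextVReg++;
    return R;
  }
};

// Unsafe shape (shown only as a comment, never executed):
//   unsigned &Dst = Map.slot(DstKey);
//   Dst = Map.getOrCreateVReg(SrcKey); // may grow Slots, leaving Dst dangling
//
// Safe shape: resolve the source first, take the reference last.
unsigned translateNoOpBitcast(ToyValToVReg &Map, int DstKey, int SrcKey) {
  unsigned Src = Map.getOrCreateVReg(SrcKey); // may grow the table
  unsigned &Dst = Map.slot(DstKey);           // taken after any growth
  if (Dst == 0)
    Dst = Src; // no vreg yet: reuse the source vreg
  // Otherwise a real translator would emit a COPY from Src to Dst here.
  return Dst;
}
```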

llvm-svn: 297208
2017-03-07 20:53:06 +00:00
Ahmed Bougacha 38455ea8a6 [GlobalISel] Relax vector G_SELECT assertion.
For vector operands, the `select` instruction supports both vector and
non-vector conditions.  The MIR builder had an overly restrictive
assertion, that only accepted vector conditions for vector selects
(in effect implementing ISD::VSELECT).

Make it possible to express the full range of G_SELECTs.
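
The two condition forms can be illustrated on fixed-size arrays. This is a plain C++ sketch with hypothetical helper names, not MIR builder code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// Scalar condition: one bit chooses an entire vector operand.
Vec4 selectScalarCond(bool Cond, const Vec4 &A, const Vec4 &B) {
  return Cond ? A : B;
}

// Vector condition: each lane chooses independently.
Vec4 selectVectorCond(const Mask4 &Cond, const Vec4 &A, const Vec4 &B) {
  Vec4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = Cond[I] ? A[I] : B[I];
  return R;
}
```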

llvm-svn: 297207
2017-03-07 20:53:03 +00:00
Ahmed Bougacha 70dd6c2212 [GlobalISel] Add vector select translation test. NFC.
llvm-svn: 297206
2017-03-07 20:53:00 +00:00
Ahmed Bougacha c373262d52 [GlobalISel] Ignore %noreg when applying default regbank mapping.
When computing the mapping for non-generic instructions, we skipped
%noreg operands, because we can't always reason about their banks.

Also skip them when applying the mapping.  Otherwise, we could end
up with mappings that we can't apply.

While there, duplicate an assert to distinguish between the two
error conditions.

llvm-svn: 297201
2017-03-07 20:34:23 +00:00
Ahmed Bougacha 4826bae8b4 [GlobalISel] Emit DBG_VALUE %noreg for non-int/fp constant values.
When a dbg_value has a constant operand that isn't representable in MI,
there isn't much we can do.  Use %noreg (0) for those situations.
This matches the SelectionDAG behavior.

llvm-svn: 297200
2017-03-07 20:34:20 +00:00
Ahmed Bougacha ab50ecb1c7 [GlobalISel] Add constant dbg.value translation tests. NFC.
llvm-svn: 297199
2017-03-07 20:34:13 +00:00
Artem Belevich 2524a22562 [NVPTX] Fixed lowering of unaligned loads/stores of f16 scalars and vectors.
Differential Revision: https://reviews.llvm.org/D30672

llvm-svn: 297198
2017-03-07 20:33:38 +00:00
Arnold Schwaighofer 69e74b48f2 SjLjEHPrepare: Fix the pass for swifterror arguments
We cannot leave the identity copies 'select true, arg, undef' that this pass
inserts for arguments to simplify handling of values on swifterror arguments.

swifterror arguments have restrictions on their uses.

rdar://30839288

llvm-svn: 297197
2017-03-07 20:29:02 +00:00
Joel Jones 2852088126 [AArch64] Vulcan is now ThunderX2T99
Broadcom Vulcan is now Cavium ThunderX2T99.

LLVM Bugzilla: http://bugs.llvm.org/show_bug.cgi?id=32113

Minor fixes for the alignments of loops and functions for
ThunderX T81/T83/T88 (better performance).

Patch was tested with SpecCPU2006.

Patch by Stefan Teleman

Differential Revision: https://reviews.llvm.org/D30510

llvm-svn: 297190
2017-03-07 19:42:40 +00:00
Volkan Keles 20d3c4200d [GlobalISel] Translate floating-point negation
Reviewers: qcolombet, javed.absar, aditya_nandakumar, dsanders, t.p.northover, ab

Reviewed By: qcolombet

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30671

llvm-svn: 297171
2017-03-07 18:03:28 +00:00
Krzysztof Parzyszek 3cceffb752 [Hexagon] Do not insert instructions before PHI nodes
llvm-svn: 297141
2017-03-07 14:20:19 +00:00
Ranjeet Singh 3d0af578cc [ARM] Reapply r296865 "[ARM] fpscr read/write intrinsics not aware of each other"
The original patch r296865 was reverted because it broke the Chromium builds for
Android (https://bugs.llvm.org/show_bug.cgi?id=32134). This patch reapplies
r296865 with a fix to make sure it doesn't cause the build regression.

The problem was that intrinsic selection on int_arm_get_fpscr was failing in
ISel. The code to manually select this intrinsic still treated it as the
version with no side-effects (INTRINSIC_WO_CHAIN), which does not match the
definition in the tablegen code, which says it does have side-effects. I've
fixed this by updating the intrinsic type to INTRINSIC_W_CHAIN (has
side-effects). I've also added a test for this based on Hans's original
reproducer.

Differential Revision: https://reviews.llvm.org/D30645

llvm-svn: 297137
2017-03-07 11:17:53 +00:00
Jonas Paulsson 1d33cd3988 [SystemZ] Add check VT.isSimple() in canTreatAsByteVector()
Since BB-vectorizer can produce vectors of for example 3 elements,
this check is needed.

Review: Ulrich Weigand
llvm-svn: 297136
2017-03-07 09:49:31 +00:00
Artyom Skrobov 1388e2f792 In Thumb1, materialize a move between low registers as a `movs`, if CPSR isn't live.
Summary: Previously, it had always been materialized as a push/pop sequence.

Reviewers: labrinea, jroelofs

Reviewed By: jroelofs

Subscribers: llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D30648

llvm-svn: 297134
2017-03-07 09:38:16 +00:00
Ayman Musa ac5a2c43af [X86][AVX512] Add missing entries to EVEX2VEX tables
The evex2vex pass defines two tables that map EVEX instructions to their identical VEX counterparts where possible. This adds all missing entries.

Differential Revision: https://reviews.llvm.org/D30501

llvm-svn: 297126
2017-03-07 08:05:53 +00:00
Tim Shen 70054bb827 Revert "[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size"
This reverts commit r296771.

We found some widespread test failures internally. I'm working on a
testcase. Politely revert the patch in the meantime. :)

llvm-svn: 297124
2017-03-07 07:40:10 +00:00
Konstantin Zhuravlyov e8aaab8abe Revert "AMDGPU: Set MCAsmInfo::PointerSize"
It breaks line tables because the patch is not complete; I am working on a complete one at the moment.

This reverts commit r294031.

llvm-svn: 297118
2017-03-07 04:44:33 +00:00
Tim Northover c2c545b8f7 GlobalISel: restrict G_EXTRACT instruction to just one operand.
A bit more painful than G_INSERT because it was more widely used, but this
should simplify the handling of extract operations in most locations.

llvm-svn: 297100
2017-03-06 23:50:28 +00:00
Jessica Paquette 596f483a5e [Outliner] Fixed Asan bot failure in r296418
Fixed the asan bot failure which led to the last commit of the outliner being reverted.
The change is in lib/CodeGen/MachineOutliner.cpp in the SuffixTree's constructor. LeafVector
is no longer initialized using reserve but just a standard constructor.

llvm-svn: 297081
2017-03-06 21:31:18 +00:00
Chad Rosier 9a70c7c02a [AArch64][Redundant Copy Elim] Add support for CMN and shifted imm.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle CMN instructions as well as shifted
immediates.

Differential Revision: https://reviews.llvm.org/D30576.

llvm-svn: 297078
2017-03-06 21:20:00 +00:00
Krzysztof Parzyszek 9e60e51a71 Revert r297039, it's causing some mysterious buildbot failures
llvm-svn: 297062
2017-03-06 20:24:21 +00:00
Jan Vesely 3ea1704434 AMDGPU/R600: Fix ALU clause markers use detection
also exit early on kill instead of redefinition.

Differential Revision: https://reviews.llvm.org/D30230

llvm-svn: 297060
2017-03-06 20:10:05 +00:00
Krzysztof Parzyszek 5b8fae5edd [IfConversion] Only renormalize probabilities if branches are analyzable
If a block has non-analyzable branches, the listed successors don't need
to add up to one. For example, if a block has a conditional tail call,
that tail call will not have a corresponding successor in the successor
list, but will still be a possible branch.
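
A sketch of the guard, using plain doubles rather than LLVM's BranchProbability type. maybeRenormalize is a hypothetical helper written for this illustration:

```cpp
#include <cassert>
#include <vector>

// Rescales successor probabilities to sum to one, but only when the block's
// branches are analyzable. Otherwise the list may legitimately sum to less
// than one (e.g. a conditional tail call has no listed successor).
std::vector<double> maybeRenormalize(std::vector<double> Probs,
                                     bool BranchesAnalyzable) {
  if (!BranchesAnalyzable)
    return Probs;
  double Sum = 0.0;
  for (double P : Probs)
    Sum += P;
  if (Sum <= 0.0)
    return Probs;
  for (double &P : Probs)
    P /= Sum;
  return Probs;
}
```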

Differential Revision: https://reviews.llvm.org/D30556

llvm-svn: 297054
2017-03-06 19:12:42 +00:00
Tim Northover 95b6d5f2b1 GlobalISel: don't emit degenerate G_INSERT instructions.
Before, we were producing G_INSERT instructions that were actually closer to a
cast or even a COPY when both input and output sizes are the same. This doesn't
really make sense and means that everything interpreting a G_INSERT also has to
handle all these kinds of casts.

So now we detect these degenerate cases and emit real casts instead.

llvm-svn: 297051
2017-03-06 19:04:17 +00:00
Reid Kleckner 812191584f [X86] Fix arg copy elision for illegal types
Use the store size of the argument type, which will be a byte-sized
quantity, rather than dividing the size in bits by 8.
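
The difference between the two computations shows up for types narrower than a byte. A small sketch, assuming the store size rounds the bit width up to whole bytes (the helper names are made up for this example):

```cpp
#include <cassert>

// Truncating division: an i1 value appears to need 0 bytes.
unsigned sizeByTruncatingDivision(unsigned Bits) { return Bits / 8; }

// Store size: round up to a whole number of bytes.
unsigned storeSizeInBytes(unsigned Bits) { return (Bits + 7) / 8; }
```

For byte-multiple widths such as 64 the two agree; for 1 bit the first yields 0 and the second yields 1.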

Fixes PR32136 and re-enables copy elision from i64 arguments.

Reverts the workaround from r296950.

llvm-svn: 297045
2017-03-06 18:39:39 +00:00
Tim Northover 75e0b91e59 GlobalISel: refactor legalization of G_INSERT.
Now that G_INSERT instructions can only insert one register, this code was
overly general. In another direction it didn't handle registers that crossed
split boundaries properly, which needed to be fixed.

llvm-svn: 297042
2017-03-06 18:23:04 +00:00
Krzysztof Parzyszek 03c5c21568 [TableGen] Ensure proper ordering of subtarget feature names
llvm-svn: 297039
2017-03-06 18:08:37 +00:00
Krzysztof Parzyszek 8a4c601abc [Hexagon] Early-if-convert branches that may exit the loop
Merge the tail block into the loop in cases where the main loop body
exits early, subject to profitability constraints. This will coalesce
the loop body into fewer blocks.

For example:
  loop:                           loop:
    // loop body                      // loop body
    if (...) jump exit      -->       // more body
  more:                               if (...) jump exit
    // more body                      jump loop
    jump loop

llvm-svn: 297033
2017-03-06 17:24:04 +00:00
Krzysztof Parzyszek e16ce15687 [Hexagon] Mark dead defs as <dead> in expand-condsets
The code in updateDeadFlags removed unnecessary <dead> flags, but there
can be cases where such a flag is not set, and yet a register has become
dead. For example, if a mux with identical inputs is replaced with a COPY,
the predicate register may no longer be used after that.

llvm-svn: 297032
2017-03-06 17:09:06 +00:00
Krzysztof Parzyszek 143158b72e [Hexagon] Pick a dot-old instruction that matches the architecture
llvm-svn: 297031
2017-03-06 17:03:16 +00:00
Sanjay Patel 7f7947bf41 [DAGCombiner] simplify div/rem-by-0
Refactoring of duplicated code and more fixes to follow.

This is motivated by the post-commit comments for r296699:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435182.html

I.e., we can crash if we're missing obvious simplifications like this that
exist in the IR simplifier or if these occur later than expected.

The x86 change for non-splat division shows a potential opportunity to improve
vector codegen: we assumed that since only one lane had meaningful results, we
should do the math in scalar. But that means moving back and forth from vector
registers.

llvm-svn: 297026
2017-03-06 16:36:42 +00:00
Sanjay Patel 9aad934710 [x86] add tests to show missing div/rem simplifications; NFC
These are not x86-specific, but the problem is not visible for all targets
because it is masked by other transforms. These can lead to compiler crashes.

llvm-svn: 297017
2017-03-06 15:50:07 +00:00
Nemanja Ivanovic 12e67d868a [PowerPC] Fix failure with STBRX when store is narrower than the bswap
Fixes a crash caused by r296811 by truncating the input of the STBRX node
when the bswap is wider than i32.

Fixes https://bugs.llvm.org/show_bug.cgi?id=32140

Differential Revision: https://reviews.llvm.org/D30615

llvm-svn: 297001
2017-03-06 07:32:13 +00:00
Dean Michael Berris 7e8eea429f [XRay] Allow logging the first argument of a function call.
Summary:
Functions with the "xray-log-args" attribute will have a special XRay sled kind
emitted, for compiler-rt to copy any call arguments to your logging handler.

For practical and performance reasons, only the first argument is supported, and
only up to 64 bits.

Reviewers: dberris

Reviewed By: dberris

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29702

llvm-svn: 296998
2017-03-06 06:48:56 +00:00
Simon Pilgrim 584d6d9d91 [SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions
Found by fuzz testing after rL296985 landed

llvm-svn: 296989
2017-03-05 15:52:18 +00:00
Simon Pilgrim 9f5c251d57 [X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops
As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT.

We're missing a couple of shuffle combines that will be added in a future patch for review.

Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases.

Differential Revision: https://reviews.llvm.org/D30549

llvm-svn: 296985
2017-03-05 09:57:20 +00:00
Sanjay Patel b974be5ef4 [x86] don't require a zext when forming ADC/SBB
The larger goal is to move the ADC/SBB transforms currently in 
combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're 
creating ADC/SBB in lots of places where we shouldn't.

This was intended to be an NFC change, but avx-512 has something 
strange going on. It doesn't seem like any of the affected tests 
should really be using SET+TEST or ADC; a simple ADD could replace
several instructions. But that's another bug...

llvm-svn: 296978
2017-03-04 20:35:19 +00:00
Sanjay Patel 066f3208bf [DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C)
select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook.

This is part of the ongoing process to obsolete D24480.  The motivation is to 
canonicalize to select IR in InstCombine whenever possible, so we need to have a way to
undo that easily in codegen.
 
PowerPC is an obvious winner for this kind of transform because it has fast and complete 
bit-twiddling abilities but generally lousy conditional execution perf (although this might
have changed in recent implementations).

x86 also sees some wins, but the effect is limited because these transforms already mostly
exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 
changes just shows that that code is a mess of special-case holes. We may be able to remove 
some of that logic now.

My guess is that other targets will want to enable this hook for most cases. The likely 
follow-ups would be to add value type and/or the constants themselves as parameters for the
hook. As the tests in select_const.ll show, we can transform any select-of-constants to 
math/logic, but the general transform for any 2 constants needs one more instruction 
(multiply or 'and').
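
The identities involved can be checked on 32-bit unsigned values, where wraparound is well defined. This is a scalar sketch with hypothetical function names, not DAGCombiner code:

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;

// select Cond, C + 1, C  ==  add (zext Cond), C
u32 selectPlusOne(bool Cond, u32 C) { return Cond ? C + 1u : C; }
u32 addZext(bool Cond, u32 C) { return C + static_cast<u32>(Cond); }

// select Cond, C - 1, C  ==  add (sext Cond), C, since sext of true is all ones
u32 selectMinusOne(bool Cond, u32 C) { return Cond ? C - 1u : C; }
u32 addSext(bool Cond, u32 C) { return C + (Cond ? 0xFFFFFFFFu : 0u); }

// General case for any two constants, at the cost of one extra multiply:
// select Cond, A, B  ==  B + zext(Cond) * (A - B)
u32 selectViaMul(bool Cond, u32 A, u32 B) {
  return B + static_cast<u32>(Cond) * (A - B);
}
```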

ARM is one target that I think may not want this for most cases. I see infinite loops there
because it wants to use selects to enable conditionally executed instructions.

Differential Revision: https://reviews.llvm.org/D30537

llvm-svn: 296977
2017-03-04 19:18:09 +00:00
Simon Pilgrim 40a0e66b37 [X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets
Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there don't appear to be any combines that can cause this any more as they all have their own legality checks.

Differential Revision: https://reviews.llvm.org/D30213

llvm-svn: 296966
2017-03-04 12:50:47 +00:00
Florian Hahn 6406f98342 [legalize-types] Remove stale entries from SoftenedFloats.
Summary:
When replacing a SDValue, we should remove the replaced value from
SoftenedFloats (and possibly the other maps as well?).

When we revisit a Node because it needs analyzing again, we have to
remove all result values from SoftenedFloats (and possibly other maps?).

This fixes the fp128 test failures with expensive checks for X86.

I think we probably should also remove the values from the other maps
(PromotedIntegers and so on), let me know what you think.

Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon

Reviewed By: chh

Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits

Differential Revision: https://reviews.llvm.org/D29265

llvm-svn: 296964
2017-03-04 12:00:35 +00:00
Matthias Braun 21f340fd25 X86ISelLowering: Only perform copy elision on legal types.
This fixes cases where i1 types were not properly legalized yet and led
to the creation of 0-sized stack slots.

This fixes http://llvm.org/PR32136

llvm-svn: 296950
2017-03-04 01:40:40 +00:00
Sanjay Patel a84fd041c6 [x86] check for commuted add pattern to find ADC/SBB
llvm-svn: 296933
2017-03-04 00:18:31 +00:00
Sanjay Patel 71c1958fca [x86] add test to show failed recognition of commuted pattern; NFC
llvm-svn: 296931
2017-03-04 00:06:37 +00:00
Hans Wennborg 1c9d800fbc Revert r296865 "[ARM] fpscr read/write intrinsics not aware of each other"
It caused PR32134: "Cannot select: intrinsic %llvm.arm.get.fpscr".

llvm-svn: 296926
2017-03-03 23:19:31 +00:00
Tim Northover 3e6a7afd81 GlobalISel: constrain G_INSERT to inserting just one value per instruction.
It's much easier to reason about single-value inserts and no-one was actually
using the variadic variants before.

llvm-svn: 296923
2017-03-03 23:05:47 +00:00
Tim Northover bf017293af GlobalISel: add merge/unmerge nodes for legalization.
These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which
assume the individual parts will be contiguous, homogeneous, and occupy the
entirety of the larger register. This makes reasoning about them much easier
since you only have to look at the first register being merged and the result
to know what the instruction is doing.

I intend to gradually replace all uses of the more complicated sequence/extract
with these (or single-element insert/extracts), and then remove the older
variants. For now we start with legalization.

llvm-svn: 296921
2017-03-03 22:46:09 +00:00
Sanjay Patel 000d61acfd [x86] regenerate checks; NFC
llvm-svn: 296908
2017-03-03 20:48:54 +00:00
Sanjay Patel 1716aa45f1 [x86] regenerate checks; NFC
llvm-svn: 296883
2017-03-03 16:58:51 +00:00
Sanjay Patel 9f5db7d4e0 [x86] regenerate checks; NFC
llvm-svn: 296881
2017-03-03 16:45:57 +00:00
Sanjay Patel 872e0b86eb [x86] regenerate checks; NFC
llvm-svn: 296880
2017-03-03 16:42:43 +00:00
Sanjay Patel 6fa6316769 [x86] regenerate checks; NFC
llvm-svn: 296877
2017-03-03 16:34:35 +00:00
Ranjeet Singh 7b60a9ed0c [ARM] fpscr read/write intrinsics not aware of each other
The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and
write to the fpscr (Floating-Point Status and Control Register) register.

A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which
treats this intrinsic as IntrNoMem, which means it's not a memory access and
doesn't have any other side-effects. Having this property on this intrinsic
means that various optimizations can be done on this such as common
sub-expression elimination with other reads. This can cause issues if there has
been a write to this register, e.g.

void foo(int *p) {
     p[0] = __builtin_arm_get_fpscr();
     __builtin_arm_set_fpscr(1);
     p[1] = __builtin_arm_get_fpscr();
}

In the above example the second read is currently CSE'd into the first read.
This is because llvm isn't aware that the write done by __builtin_arm_set_fpscr
affects the same register that __builtin_arm_get_fpscr reads from. To fix this
problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is
treated as a memory access.
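
The effect of the bad CSE can be modelled on any host with a toy register.
This is only a sketch: fakeFpscr, getFpscr and setFpscr are hypothetical
stand-ins for the real register and builtins, and fooAfterBadCSE is a
hand-written approximation of what the optimizer effectively produced.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for the FPSCR; not the real register or the real builtins.
static uint32_t fakeFpscr = 0;

uint32_t getFpscr() { return fakeFpscr; }
void setFpscr(uint32_t v) { fakeFpscr = v; }

// Intended behaviour: the second read observes the intervening write.
void fooCorrect(uint32_t *p) {
  p[0] = getFpscr();
  setFpscr(1);
  p[1] = getFpscr();
}

// Approximation of the miscompile: the second read reuses the first result.
void fooAfterBadCSE(uint32_t *p) {
  uint32_t first = getFpscr();
  p[0] = first;
  setFpscr(1);
  p[1] = first;
}
```

Starting from a register value of 0, fooCorrect stores 0 and then 1, while
fooAfterBadCSE stores 0 twice.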

Differential Revision: https://reviews.llvm.org/D30542

llvm-svn: 296865
2017-03-03 11:40:07 +00:00
Chandler Carruth ce52b80744 [SDAG] Revert r296476 (and r296486, r296668, r296690).
This patch causes compile times for some patterns to explode. I have
a (large, unreduced) test case that slows down by more than 20x and
several test cases slow down by 2x. I'm sending some of the test cases
directly to Nirav and following up with more details in the review log,
but this should unblock anyone else hitting this.

llvm-svn: 296862
2017-03-03 10:02:25 +00:00
Amjad Aboud 4f97751798 [X86] Generate VZEROUPPER for Skylake-avx512.
VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be.

Differential Revision: https://reviews.llvm.org/D29874

llvm-svn: 296859
2017-03-03 09:03:24 +00:00
Igor Breger 321cf3c650 [GlobalISel][X86] Support float/double and vector types.
Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector.

Reviewers: delena, zvi

Reviewed By: zvi

Subscribers: dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30533

llvm-svn: 296856
2017-03-03 08:06:46 +00:00
Kyle Butt 1fa6030767 CodeGen: BlockPlacement: Precompute layout for chains of triangles.
For chains of triangles with small join blocks that can be tail duplicated, a
simple calculation of probabilities is insufficient. Tail duplication
can be profitable in 3 different ways for these cases:

1) The post-dominators marked 50% are actually taken 56% (This shrinks with
   longer chains)
2) The chains are statically correlated. Branch probabilities have a very
   U-shaped distribution.
   [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805]
   If the branches in a chain are likely to be from the same side of the
   distribution as their predecessor, but are independent at runtime, this
   transformation is profitable. (Because the cost of being wrong is a small
   fixed cost, unlike the standard triangle layout where the cost of being
   wrong scales with the # of triangles.)
3) The chains are dynamically correlated: the probability that a previous
   branch was taken positively influences whether the next branch will be
   taken.
We believe that 2 and 3 are common enough to justify the small margin in 1.

The code pre-scans a function's CFG to identify this pattern and marks the edges
so that the standard layout algorithm can use the computed results.

llvm-svn: 296845
2017-03-03 01:00:22 +00:00
Taewook Oh 96c6415697 [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
Summary:
Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of the newly created SDValue is 'setcc', it makes more sense to take DebugLoc from 't1' than 't4'. For the code below

```
extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}
```

, the following is the bitcode representation of 'foo' at the end of LLVM IR level optimization:

```
define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)

```

As you can see, in the 'entry' block, the 'icmp' instruction and the 'br' instruction have different debug locations. However, with the current implementation, there's no distinction between the debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc', 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLoc of 'xor' and 'brcond' follows the debug location of the 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.
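
The intended behaviour of the fold can be sketched with simplified stand-in types. These are not the real SelectionDAG classes: SetCC, XorWithTrue, the debugColumn field and foldNotOfSetCC are all hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; not the real SelectionDAG classes.
enum class CondCode { EQ, NE };

struct SetCC {
  CondCode cc;
  int debugColumn; // stands in for the node's SDLoc
};

struct XorWithTrue {
  SetCC operand;   // the setcc being negated
  int debugColumn; // stands in for the xor's SDLoc
};

CondCode invert(CondCode cc) {
  return cc == CondCode::EQ ? CondCode::NE : CondCode::EQ;
}

// Fold xor(setcc(x, y, cc), true) into setcc(x, y, !cc), taking the
// location from the original setcc rather than from the xor.
SetCC foldNotOfSetCC(const XorWithTrue &n) {
  return SetCC{invert(n.operand.cc), n.operand.debugColumn};
}
```

With the locations from the example above (column 9 for the compare, column 7 for the branch), the folded node keeps column 9.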

Reviewers: atrick, bogner, andreadb, craig.topper, aprantl

Reviewed By: andreadb

Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29813

llvm-svn: 296825
2017-03-02 21:58:35 +00:00
Krzysztof Parzyszek e720feb1c6 [Hexagon] Pick the right branch opcode depending on branch probabilities
Specifically, pick the opcode with the correct branch prediction, i.e.
jump:t or jump:nt.

llvm-svn: 296821
2017-03-02 21:49:49 +00:00
Tobias Grosser 02d86b80ec Revert "AMDGPU: Re-do update for branch-relaxation test"
This commit also relied on r296812, which I just reverted. We should probably
apply it again, after the r296812 has been discussed and been reapplied in some
variant.

llvm-svn: 296820
2017-03-02 21:47:51 +00:00
Kyle Butt 1393761e0c CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic.
Outlining optional branches isn't a good heuristic, and it's never been
on by default. Remove it to clean things up.

llvm-svn: 296818
2017-03-02 21:44:24 +00:00
Eli Friedman bb821276d0 [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation
as LastOp.

This patch fixes some cases where we would move stores to the wrong
insert point.

Re-commit with a fix to increment NumMove in the right place.

Differential Revision: https://reviews.llvm.org/D30124

llvm-svn: 296815
2017-03-02 21:39:39 +00:00
Guozhi Wei ed28e742ee [PPC] Fix code generation for bswap(int32) followed by store16
This patch fixes pr32063.

Current code in PPCTargetLowering::PerformDAGCombine can transform

bswap
store

into a single PPCISD::STBRX instruction, but it doesn't consider the case where the operand size of bswap may be larger than the store size. When that occurs, we need 2 modifications:

1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT().

2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side.
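
The need for the shift can be checked with plain integer arithmetic. The sketch below only models the values involved; the helper names are hypothetical and the shift amount of 16 assumes a 32-bit bswap feeding a 16-bit store.

```cpp
#include <cassert>
#include <cstdint>

uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
         ((v << 8) & 0x00FF0000u) | (v << 24);
}

uint16_t bswap16(uint16_t v) {
  return static_cast<uint16_t>((v >> 8) | (v << 8));
}

// What the IR asks for: swap all 32 bits, then store the low 16.
uint16_t truncOfBswap32(uint32_t v) {
  return static_cast<uint16_t>(bswap32(v));
}

// Model of a halfword byte-reversed store fed with the shifted operand.
uint16_t halfwordReverseOfShifted(uint32_t v) {
  return bswap16(static_cast<uint16_t>(v >> 16));
}

// Model of the old combine: the operand is used without the shift.
uint16_t halfwordReverseUnshifted(uint32_t v) {
  return bswap16(static_cast<uint16_t>(v));
}
```

For v = 0x11223344 the expected stored halfword is 0x2211; the shifted form produces it, while the unshifted form yields 0x4433.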

Differential Revision: https://reviews.llvm.org/D30362

llvm-svn: 296811
2017-03-02 21:07:59 +00:00
Chad Rosier ea25eca04a [AArch64] Extend redundant copy elimination pass to handle non-zero stores.
This patch extends the current functionality of the AArch64 redundant copy
elimination pass to handle non-zero cases such as:

BB#0:
  cmp x0, #1
  b.eq .LBB0_1
.LBB0_1:
  orr x0, xzr, #0x1  ; <-- redundant copy; x0 known to hold #1.

Differential Revision: https://reviews.llvm.org/D29344

llvm-svn: 296809
2017-03-02 20:48:11 +00:00
Vadzim Dambrouski eafb805506 [MSP430] Add SRet support to MSP430 target
This patch adds support for struct return values to the MSP430
target backend. It also reverses the order of argument and return
registers in the calling convention to bring it into closer
alignment with the published EABI from TI.

Patch by Andrew Wygle (awygle).

Differential Revision: https://reviews.llvm.org/D29069

llvm-svn: 296807
2017-03-02 20:25:10 +00:00
Simon Pilgrim b3067dc374 [X86][MMX] Fixed i32 extraction on 32-bit targets
MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal

llvm-svn: 296782
2017-03-02 18:56:06 +00:00
Krzysztof Parzyszek 056c945a5d [Hexagon] Skip blocks that define vector predicate registers in early-if
llvm-svn: 296777
2017-03-02 18:10:59 +00:00
Krzysztof Parzyszek fcbb7d10fe [Hexagon] Properly handle 'q' constraint in 128-byte vector mode
llvm-svn: 296772
2017-03-02 17:50:24 +00:00
Nemanja Ivanovic db8425eff0 [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
This patch reduces the stack frame size by not allocating the parameter area if
it is not required. In the current implementation LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29881

llvm-svn: 296771
2017-03-02 17:38:59 +00:00
Sanjay Patel fffa179837 [DAGCombiner] avoid assertion when folding binops with opaque constants
This bug was introduced with:
https://reviews.llvm.org/rL296699

There may be a way to loosen the restriction, but for now just bail out
on any opaque constant.

The tests show that opacity is target-specific. This goes back to cost
calculations in ConstantHoisting based on TTI->getIntImmCost().

llvm-svn: 296768
2017-03-02 17:18:56 +00:00
Tim Northover e80d6d1360 GlobalISel: record correct stack usage for signext parameters.
The CallingConv.td rules allocate 8 bytes for these kinds of arguments
on AAPCS targets, but we were only recording the smaller amount. The
difference is theoretical on AArch64 because we don't actually store
more than the smaller amount, but it's still much better to have these
two components in agreement.

Based on Diana Picus's ARM equivalent patch (where it matters a lot
more).

llvm-svn: 296754
2017-03-02 15:34:18 +00:00
Andrew V. Tischenko 2855dc7ddc Added special test covering a problem with PIC relocation model on SLM architecture. The fix will come in D26855.
llvm-svn: 296746
2017-03-02 13:47:03 +00:00
Serge Pavlov e2bf69715f Do not verify MachineDominatorTree if it is not calculated
If the dominator tree is not calculated or is invalidated, set the corresponding
pointer in the pass state to nullptr. Such a pointer value indicates that
operations with the dominator tree are not allowed. In particular, it allows
verification to be skipped for such a pass state. The dominator tree is not
calculated if the machine dominator pass was skipped, which occurs in the case
of entities with linkage available_externally.

The change fixes some test failures observed when expensive checks
are enabled.

Differential Revision: https://reviews.llvm.org/D29280

llvm-svn: 296742
2017-03-02 12:00:10 +00:00
Matthias Braun dbcf9e2ee4 LiveRegMatrix: Fix some subreg interference checks
Surprisingly, one of the three interference checks in LiveRegMatrix was
using the main live range instead of the appropriate subregister range
resulting in unnecessarily conservative results.

llvm-svn: 296722
2017-03-02 00:35:08 +00:00
Eli Friedman 933863ce61 Revert r296708; causing test failures on ARM hosts.
Original commit message:

[ARM] Fix insert point for store rescheduling.
    
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.

llvm-svn: 296718
2017-03-02 00:08:50 +00:00
Amaury Sechet 71f511fd1e [DAGCombiner] mulhi + 1 never overflows.
Summary:
This can be used to optimize large multiplications after legalization.
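
The claim in the title can be checked exhaustively at a small width. The
sketch below assumes the unsigned case and uses 8-bit operands; mulhu8 and
maxMulhu8 are hypothetical helper names. The largest product is
255 * 255 = 0xFE01, so the high half never exceeds 0xFE and adding 1 still
fits in 8 bits.

```cpp
#include <cassert>
#include <cstdint>

// High half of an 8-bit by 8-bit unsigned multiply.
uint8_t mulhu8(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((static_cast<unsigned>(a) * b) >> 8);
}

// Largest high half over all pairs of 8-bit operands.
uint8_t maxMulhu8() {
  uint8_t best = 0;
  for (unsigned a = 0; a <= 0xFF; ++a)
    for (unsigned b = 0; b <= 0xFF; ++b) {
      uint8_t hi = mulhu8(static_cast<uint8_t>(a), static_cast<uint8_t>(b));
      if (hi > best)
        best = hi;
    }
  return best;
}
```

The same argument holds for any width n: (2^n - 1)^2 = 2^(2n) - 2^(n+1) + 1,
whose high half is 2^n - 2.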

Depends on D29565

Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29587

llvm-svn: 296711
2017-03-01 23:44:17 +00:00
Ahmed Bougacha 120ae22d70 [GlobalISel] Add a way for targets to enable GISel.
Until now, we've had to use -global-isel to enable GISel.  But using
that on other targets that don't support it will result in an abort, as we
can't build a full pipeline.
Additionally, we want to experiment with enabling GISel by default for
some targets: we can't just enable GISel by default, even among those
targets that do have some support, because the level of support varies.

This first step adds an override for the target to explicitly define its
level of support.  For AArch64, do that using
a new command-line option (I know..):
  -aarch64-enable-global-isel-at-O=<N>
Where N is the opt-level below which GISel should be used.

Default that to -1, so that we still don't enable GISel anywhere.
We're not there yet!

While there, remove a couple LLVM_UNLIKELYs.  Building the pipeline is
such a cold path that in practice that shouldn't matter at all.

llvm-svn: 296710
2017-03-01 23:33:08 +00:00
Amaury Sechet 683f5743f6 Improve mulhi overflow test. NFC
llvm-svn: 296709
2017-03-01 23:31:19 +00:00
Eli Friedman 1c9216b003 [ARM] Fix insert point for store rescheduling.
In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last
operation which we want to merge. If we break out of the loop because
an operation has the wrong offset, we shouldn't use that operation as
LastOp.
    
This patch fixes some cases where we would sink stores for no reason.
    
Differential Revision: https://reviews.llvm.org/D30124

llvm-svn: 296708
2017-03-01 23:20:29 +00:00
Eli Friedman 28c2c0e311 [ARM] Check correct instructions for load/store rescheduling.
This code starts from the high end of the sorted vector of offsets, and
works backwards: it tries to find contiguous offsets, process them, then
pops them from the end of the vector. Most of the code agrees with this
order of processing, but one loop doesn't: it instead processes elements
from the low end of the vector (which are nodes with unrelated offsets).
Fix that loop to process the correct elements.
    
This has a few implications. One, we don't incorrectly return early when
processing multiple groups of offsets in the same block (which allows
rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert
point for loads, so they're correctly sorted (which affects the
scheduling of vldm-liveness.ll). I think it might also impact some of
the heuristics slightly.
    
Differential Revision: https://reviews.llvm.org/D30368

llvm-svn: 296701
2017-03-01 22:56:20 +00:00
Sanjay Patel 92938657a0 [DAGCombiner] fold binops with constant into select-of-constants
This is part of the ongoing attempt to improve select codegen for all targets and select 
canonicalization in IR (see D24480 for more background). The transform is a subset of what
is done in InstCombine's FoldOpIntoSelect().
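
In source-level terms the transform looks roughly like the pair of functions
below. This is only an illustration with made-up constants and hypothetical
function names, not the DAGCombiner code.

```cpp
#include <cassert>
#include <cstdint>

// Before the fold: the select picks a constant, then the binop runs on it.
int32_t addOfSelect(bool cond) {
  const int32_t selected = cond ? 10 : 20;
  return selected + 5;
}

// After the fold: the binop has been applied to each arm ahead of time,
// leaving a single select between two new constants.
int32_t selectOfFolded(bool cond) {
  return cond ? 15 : 25;
}
```

Both functions return the same value for either condition; the folded form
simply has one operation fewer at run time.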

I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that 
hopes to convert more selects to basic math ops. This appears to be a general missing DAG
transform though, so I added tests for all standard binops in rL296621 
(PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86).

The poor output for "sel_constants_shl_constant" is tracked with:
https://bugs.llvm.org/show_bug.cgi?id=32105

Differential Revision: https://reviews.llvm.org/D30502

llvm-svn: 296699
2017-03-01 22:51:31 +00:00
Amaury Sechet 250b4a7491 Add test case for mulhi's overflow. NFC
llvm-svn: 296696
2017-03-01 22:27:21 +00:00
Reid Kleckner f7c0980c10 Elide argument copies during instruction selection
Summary:
Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped.  This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before ISel to identify
copy elision candidates. A copy elision candidate is an argument that is
used to fully initialize an alloca before any other possibly escaping
uses of that alloca. If an argument is a copy elision candidate, we set
a flag on the InputArg. If the target generates loads from a fixed
stack object that matches the size and alignment requirements of the
alloca, the SelectionDAG builder will delete the stack object created
for the alloca and replace it with the fixed stack object. The load is
left behind to satisfy any remaining uses of the argument value. The
store is now dead and is therefore elided. The fixed stack object is
also marked as mutable, as it may now be modified by the user, and it
would be invalid to rematerialize the initial load from it.

Supersedes D28388

Fixes PR26328

Reviewers: chandlerc, MatzeB, qcolombet, inglorion, hans

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D29668

llvm-svn: 296683
2017-03-01 21:42:00 +00:00
Sanjay Patel f8edc3e870 [x86] add vector tests for more coverage of D30502; NFC
llvm-svn: 296671
2017-03-01 20:31:23 +00:00
Nemanja Ivanovic b223cfabcc Improve scheduling with branch coalescing
This patch adds a MachineSSA pass that coalesces blocks that branch
on the same condition.

Committing on behalf of Lei Huang.

Differential Revision: https://reviews.llvm.org/D28249

llvm-svn: 296670
2017-03-01 20:29:34 +00:00
Nirav Dave 0a4703b5ec [DAG] Prevent Stale nodes from entering worklist
Add check that deleted nodes do not get added to worklist. This can
occur when a node's operand is simplified to an existing node.

This fixes PR32108.

Reviewers: jyknight, hfinkel, chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D30506

llvm-svn: 296668
2017-03-01 20:19:38 +00:00
Nirav Dave 3de7fce3ac Add test cases for merging stores of multiply used stores
llvm-svn: 296667
2017-03-01 20:18:14 +00:00
Artur Pilipenko e1b2d31468 [DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine
Resubmit r295336 after the bug with non-zero offset patterns on BE targets is fixed (r296336).

Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns.

Reviewed By: filcab

Differential Revision: https://reviews.llvm.org/D29591

llvm-svn: 296651
2017-03-01 18:12:29 +00:00
Krzysztof Parzyszek 8f23dd6d68 [Hexagon] Fix lowering of formal arguments of type i1
On Hexagon, values of type i1 are passed in registers of type i32,
even though i1 is not a legal value for these registers. This is a
special case and needs special handling to maintain consistency of
the lowering information.

This fixes PR32089.

llvm-svn: 296645
2017-03-01 17:30:10 +00:00
Diana Picus 9c52309b37 [ARM] GlobalISel: Lower call params that need extensions
Lower i1, i8 and i16 call parameters by extending them before storing them on
the stack. Also make sure we encode the correct, extended size in the
corresponding memory operand, and that we compute the correct stack size in the
end.

The latter is a bit more complicated because we used to compute the stack size
in the getStackAddress method, based on the Size and Offset of the parameters.
However, if the last parameter is sign extended, we'd be using the wrong,
non-extended size, and we'd end up with a smaller stack than we need to hold the
extended value. Instead of hacking this up based on the value of Size in
getStackAddress, we move our stack size handling logic to assignArg, where we
have access to the CCState which knows everything we could possibly want to know
about the stack. This way we don't need to duplicate any knowledge or resort to
any ugly hacks.

On this same occasion, update the IRTranslator test to check the sizes of the
stores everywhere, not just for sign extended parameters.

llvm-svn: 296631
2017-03-01 15:35:14 +00:00
Sanjay Patel 88a1b8b466 [x86] auto-generate checks; NFC
llvm-svn: 296629
2017-03-01 14:46:59 +00:00
Sanjay Patel f0496a6a5c [x86] regenerate checks; NFC
llvm-svn: 296628
2017-03-01 14:41:57 +00:00