Commit Graph

6656 Commits

Author SHA1 Message Date
Reid Kleckner 6ddae31045 [WinEH] Fix funclet prologues with stack realignment
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.

While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.

llvm-svn: 252210
2015-11-05 21:09:49 +00:00
Petar Jovanovic 99fba3c141 Add cfi instr for CFA calculation when movpc is expanded to call and pop
This fixes the issue of wrong CFA calculation in the following case:

0x08048400 <+0>:	push   %ebx
0x08048401 <+1>:	sub    $0x8,%esp
0x08048404 <+4>:	**call   0x8048409 <test+9>**
0x08048409 <+9>:	**pop    %eax**
0x0804840a <+10>:	add    $0x1bf7,%eax
0x08048410 <+16>:	mov    %eax,%ebx
0x08048412 <+18>:	call   0x80483f0 <bar>
0x08048417 <+23>:	add    $0x8,%esp
0x0804841a <+26>:	pop    %ebx
0x0804841b <+27>:	ret

The highlighted instructions are a product of movpc instruction. The call
instruction changes the stack pointer, and pop instruction restores its
value. However, the rule for computing CFA is not updated and is wrong on
the pop instruction. So, e.g. backtrace in gdb does not work when on the pop
instruction. This adds cfi instructions for both call and pop instructions.

cfi_adjust_cfa_offset** instruction is used with the appropriate offset for
setting the rules to calculate CFA correctly.

Patch by Violeta Vukobrat.

Differential Revision: http://reviews.llvm.org/D14021

llvm-svn: 252176
2015-11-05 17:19:59 +00:00
Asaf Badouh f99c054ebc revert rev. 252153 due to build failure on ubuntu
[X86][AVX512] add comi with Sae

llvm-svn: 252154
2015-11-05 08:55:54 +00:00
Asaf Badouh 7fdabf0a35 [X86][AVX512] add comi with Sae
add builtin_ia32_vcomisd and builtin_ia32_vcomisd

Differential Revision: http://reviews.llvm.org/D14331

llvm-svn: 252153
2015-11-05 08:45:06 +00:00
Quentin Colombet 421723cdd8 [x86] Teach the shrink-wrapping hooks to do the proper thing with Win64.
Win64 has some strict requirements for the epilogue. As a result, we disable
shrink-wrapping for Win64 unless the block that gets the epilogue is already an
exit block.

Fixes PR24193.

llvm-svn: 252088
2015-11-04 22:37:28 +00:00
Simon Pilgrim 7e6606f4f1 [X86][SSE] Add general memory folding for (V)INSERTPS instruction
This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction.

The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening.

This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done.

It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted.

Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll

Differential Revision: http://reviews.llvm.org/D13988

llvm-svn: 252074
2015-11-04 20:48:09 +00:00
Andrew Kaylor e41a8c4182 Created new X86 FMA3 opcodes (FMA*_Int) that are used now for lowering of scalar FMA intrinsics.
Patch by Slava Klochkov 

The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result.

Reviewers: Quentin Colombet, Elena Demikhovsky

Differential Revision: http://reviews.llvm.org/D13710

llvm-svn: 252060
2015-11-04 18:10:41 +00:00
Michael Kuperstein b34de72269 [X86] DAGCombine should not introduce FILD in soft-float mode
The x86 "sitofp i64 to double" dag combine, in 32-bit mode, lowers sitofp 
directly to X86ISD::FILD (or FILD_FLAG). This should not be done in soft-float mode.

llvm-svn: 252042
2015-11-04 11:17:53 +00:00
Igor Laevsky 35fe692025 [StatepointLowering] Remove distinction between call and invoke safepoints
There is no point in having invoke safepoints handled differently than the
call safepoints. All relevant decisions could be made by looking at whether
or not gc.result and gc.relocate lay in a same basic block. This change will
 allow to lower call safepoints with relocates and results in a different 
basic blocks. See test case for example.

Differential Revision: http://reviews.llvm.org/D14158

llvm-svn: 252028
2015-11-04 01:16:10 +00:00
Simon Pilgrim b0d860a394 [X86][AVX] Tweaked shuffle stack folding tests
To avoid alternative lowerings.

llvm-svn: 251986
2015-11-03 21:58:35 +00:00
Simon Pilgrim df993479c9 [X86][AVX512] Fixed shuffle test name to match shuffle
llvm-svn: 251984
2015-11-03 21:39:30 +00:00
Simon Pilgrim e88dc04c48 [X86][XOP] Add support for the matching of the VPCMOV bit select instruction
XOP has the VPCMOV instruction that performs the common vector bit select operation OR( AND( SRC1, SRC3 ), AND( SRC2, ~SRC3 ) )

This patch adds tablegen pattern matching for this instruction.

Differential Revision: http://reviews.llvm.org/D8841

llvm-svn: 251975
2015-11-03 20:27:01 +00:00
Rafael Espindola 41302723de Remove unnecessary dependency on section and string positions.
llvm-svn: 251964
2015-11-03 19:24:17 +00:00
Michael Kuperstein 73dc85293f [X86] Generate .cfi_adjust_cfa_offset correctly when pushing arguments
When push instructions are being used to pass function arguments on
the stack, and either EH or debugging are enabled, we need to generate
.cfi_adjust_cfa_offset directives appropriately. For (synch) EH, it is
enough for the CFA offset to be correct at every call site, while
for debugging we want to be correct after every push.

Darwin does not support this well, so don't use pushes whenever it
would be required.

Differential Revision: http://reviews.llvm.org/D13767

llvm-svn: 251904
2015-11-03 08:17:25 +00:00
Cong Hou b90b9e0531 In MachineBlockPlacement, filter cold blocks off the loop chain when profile data is available.
In the current BB placement algorithm, a loop chain always contains all loop blocks. This has a drawback that cold blocks in the loop may be inserted on a hot function path, hence increasing branch cost and also reducing icache locality.

Consider a simple example shown below:

A
|
B⇆C
|
D

When B->C is quite cold, the best BB-layout should be A,B,D,C. But the current implementation produces A,C,B,D.

This patch filters those cold blocks off from the loop chain by comparing the ratio:

LoopBBFreq / LoopFreq

to 20%: if it is less than 20%, we don't include this BB to the loop chain. Here LoopFreq is the frequency of the loop when we reduce the loop into a single node. In general we have more cold blocks when the loop has few iterations. And vice versa.


Differential revision: http://reviews.llvm.org/D11662

llvm-svn: 251833
2015-11-02 21:24:00 +00:00
James Y Knight 646c4032e7 Fix two issues in MergeConsecutiveStores:
1) PR25154. This is basically a repeat of PR18102, which was fixed in
r200201, and broken again by r234430. The latter changed which of the
store nodes was merged into from the first to the last. Thus, we now
also need to prefer merging a later store at a given address into the
target node, instead of an earlier one.

2) While investigating that, I also realized I'd introduced a bug in
r236850. There, I removed a check for alignment -- not realizing that
nothing except the alignment check was ensuring that none of the stores
were overlapping! This is a really bogus way to ensure there's no
aliased stores.

A better solution to both of these issues is likely to always use the
code added in the 'if (UseAA)' branches which rearrange the chain based
on a more principled analysis. I'll look into whether that can be used
always, but in the interest of getting things back to working, I think a
minimal change makes sense.

llvm-svn: 251816
2015-11-02 18:48:08 +00:00
Igor Breger fa798a9dbb AVX512: Implemented encoding and intrinsics for VBROADCASTI32x2 and VBROADCASTF32x2 instructions.
Differential Revision: http://reviews.llvm.org/D14216

llvm-svn: 251781
2015-11-02 07:39:36 +00:00
Craig Topper cc56e3a1fc [X86] Don't pass a scale value of 0 to scatter/gather intrinsics. This causes the code emitter to throw an assertion if we try to encode it. Need to add a check to fail isel for this, but for now avoid testing it.
llvm-svn: 251779
2015-11-02 07:24:37 +00:00
Elena Demikhovsky db738d9cc3 AVX-512: Optimized SIMD truncate operations for AVX512F set.
Optimized <8 x i32> to <8 x i16>
<4 x i64> to < 4 x i32>
<16 x i16> to <16 x i8>
All these oprtrations use now AVX512F set (KNL). Before this change it was implemented with AVX2 set.


Differential Revision: http://reviews.llvm.org/D14108

llvm-svn: 251764
2015-11-01 11:45:47 +00:00
Simon Pilgrim 6baace0324 [X86][SSE] Added load+sext tests for 16i1->16i8 and 32i1->32i8
llvm-svn: 251661
2015-10-29 22:19:21 +00:00
Simon Pilgrim ca56a72af9 [X86][SSE] Shuffle blends with zero
This patch generalizes the zeroing of vector elements with the BLEND instructions. Currently a zero vector will only blend if the shuffled elements are correctly inline, this patch recognises when a vector input is zero (or zeroable) and modifies a local copy of the shuffle mask to support a blend. As a zeroable vector input may not be all zeroes, the zeroable vector is regenerated if necessary.

Differential Revision: http://reviews.llvm.org/D14050

llvm-svn: 251659
2015-10-29 22:11:28 +00:00
Simon Pilgrim 94c4943562 [X86][AVX512] Test UNPCK with non-sequential scalars
Missing tests for r251297

llvm-svn: 251453
2015-10-27 21:18:45 +00:00
Sanjay Patel bbd4c79c8f Use the 'arcp' fast-math-flag when combining repeated FP divisors
This is a usage of the IR-level fast-math-flags now that they are propagated to SDNodes. 
This was originally part of D8900.

Removing the global 'enable-unsafe-fp-math' checks will require auto-upgrade and 
possibly other changes.

Differential Revision: http://reviews.llvm.org/D9708

llvm-svn: 251450
2015-10-27 20:27:25 +00:00
Asaf Badouh c7cb880669 [X86][AVX512] [X86][AVX512] add convert float to half
convert float to half with mask/maskz for the reg to reg version and mask for the reg to mem version (there is no maskz version for reg to mem).

Differential Revision: http://reviews.llvm.org/D14113

llvm-svn: 251409
2015-10-27 15:37:17 +00:00
Michael Kuperstein e1194bdb4f [X86] Make elfiamcu an OS, not an environment.
GNU tools require elfiamcu to take up the entire OS field, so, e.g.
i?86-*-linux-elfiamcu is not considered a legal triple.
Make us compatible.

Differential Revision: http://reviews.llvm.org/D14081

llvm-svn: 251390
2015-10-27 07:23:59 +00:00
Sanjay Patel 309c4f93e5 [x86] replace integer logic ops with packed SSE FP logic ops
If we have an operand to a bitwise logic op that's already in
an XMM register and the result is going to be sent to an XMM
register, then use an SSE logic op to avoid moves between the
integer and vector register files.

Related commits:
http://reviews.llvm.org/rL248395
http://reviews.llvm.org/rL248399
http://reviews.llvm.org/rL248404
http://reviews.llvm.org/rL248409
http://reviews.llvm.org/rL248415

This should solve PR22428:
https://llvm.org/bugs/show_bug.cgi?id=22428

llvm-svn: 251378
2015-10-27 01:28:07 +00:00
Steve King fee370be72 Fix llc crash processing S/UREM for -Oz builds caused by rL250825.
When taking the remainder of a value divided by a constant, visitREM()
attempts to convert the REM to a longer but faster sequence of instructions.
This conversion calls combine() on a speculative DIV instruction. Commit
rL250825 may cause this combine() to return a DIVREM, corrupting nearby nodes.
Flow eventually hits unreachable().

This patch adds a test case and a check to prevent visitREM() from trying
to convert the REM instruction in cases where a DIVREM is possible.
See http://reviews.llvm.org/D14035

llvm-svn: 251373
2015-10-27 00:14:06 +00:00
Sanjay Patel 28d1598e5b add FP logic test cases to show current codegen (PR22428)
llvm-svn: 251370
2015-10-26 23:52:42 +00:00
Chandler Carruth 5a14186b6c [x86] Make the vselect-minmax test 2x to 3x faster by deleting all the
instructions that aren't relevant for instruction selection of vector
min and max.

llvm-svn: 251366
2015-10-26 22:54:53 +00:00
Igor Breger 2e919c89ce fix test errors (on windows) for commit r251287
llvm-svn: 251288
2015-10-26 13:31:41 +00:00
Igor Breger e4ddc3f4cd AVX512: Enabled VPBROADCASTB lowering for v64i8 vectors.
Differential Revision: http://reviews.llvm.org/D13896

llvm-svn: 251287
2015-10-26 13:01:02 +00:00
Igor Breger 684af8156c AVX-512: Use correct extract vector length.
Bug https://llvm.org/bugs/show_bug.cgi?id=25318

Differential Revision: http://reviews.llvm.org/D14062

llvm-svn: 251285
2015-10-26 12:26:34 +00:00
Igor Breger f8e461f920 AVX512: Add AVX-512 not materializable instructions.
Otherwise value can be reused , despite its value could be changed - produces incorrect assembler.

https://llvm.org/bugs/show_bug.cgi?id=25270

Differential Revision: http://reviews.llvm.org/D14057

llvm-svn: 251275
2015-10-26 08:37:12 +00:00
Simon Pilgrim 3e5e272fca [X86][AVX] Regenerate tests.
llvm-svn: 251263
2015-10-25 21:47:09 +00:00
Simon Pilgrim ec6db262e0 [X86][SSE4A] Fix for EXTRQI shuffle lowering.
Incorrect range test - found during fuzz testing.

llvm-svn: 251245
2015-10-25 17:40:54 +00:00
Simon Pilgrim b9ab397647 [X86][SSE] Refreshed tests (missing AVX512 patterns)
llvm-svn: 251238
2015-10-25 15:39:22 +00:00
Elena Demikhovsky 092858588a Scalarizer for masked.gather and masked.scatter intrinsics.
When the target does not support these intrinsics they should be converted to a chain of scalar load or store operations.
If the mask is not constant, the scalarizer will build a chain of conditional basic blocks.
I added isLegalMaskedGather() isLegalMaskedScatter() APIs.

Differential Revision: http://reviews.llvm.org/D13722

llvm-svn: 251237
2015-10-25 15:37:55 +00:00
Simon Pilgrim be187a0a1a [X86][SSE] Added tests for shuffling through bitcasts.
llvm-svn: 251236
2015-10-25 15:32:04 +00:00
Simon Pilgrim 40ada57b7e [X86][SSE] vector sext/zext tests - remove unnecessary mcpu arguments
llvm-svn: 251233
2015-10-25 12:15:00 +00:00
Simon Pilgrim b398da1d5c [X86][SSE] shift/rotate tests - remove unnecessary mcpu arguments and regenerate/cleanup
llvm-svn: 251232
2015-10-25 12:07:45 +00:00
Simon Pilgrim 5d4866b174 [X86] PMOV*X* tests - remove unnecessary mcpu arguments and regenerate
llvm-svn: 251230
2015-10-25 11:55:10 +00:00
Simon Pilgrim 5e79ea8281 [X86] Stack folding tests - just use mtriple - no need for mcpu in these tests
llvm-svn: 251229
2015-10-25 11:42:46 +00:00
Michael Kuperstein eaa16005af [X86] Use correct calling convention for MCU psABI libcalls
When using the MCU psABI, compiler-generated library calls should pass
some parameters in-register. However, since inreg marking for x86 is currently
done by the front end, it will not be applied to backend-generated calls.

This is a workaround for PR3997, which describes a similar issue for -mregparm.

Differential Revision: http://reviews.llvm.org/D13977

llvm-svn: 251223
2015-10-25 08:14:05 +00:00
Simon Pilgrim 53c2bff5fe [X86][SSE] Use lowerVectorShuffleWithUNPCK instead of custom matches.
Most 128-bit and 256-bit shuffles were manually matching UNPCK patterns - use lowerVectorShuffleWithUNPCK to be more thorough.

llvm-svn: 251211
2015-10-24 22:45:04 +00:00
Simon Pilgrim 64f772eac9 Removed old FIXME - we do generate movddup for SSE3 and higher
llvm-svn: 251205
2015-10-24 20:15:43 +00:00
Simon Pilgrim 7430804fe1 [DAGCombiner] Generalize masking of constant rotates.
We don't need a mask of a rotation result to be a constant splat - any constant scalar/vector can be usefully folded.

Followup to D13851.

llvm-svn: 251197
2015-10-24 18:44:52 +00:00
Hans Wennborg 34d40434a7 X86ISelLowering: Support tail calls to/from callee pop functions
This enables tail calls with thiscall, stdcall, vectorcall and
fastcall functions.

Differential Revision: http://reviews.llvm.org/D13999

llvm-svn: 251190
2015-10-24 16:47:10 +00:00
Simon Pilgrim d5ef318b5b [X86][XOP] Add support for lowering vector rotations
This patch adds support for lowering to the XOP VPROT / VPROTI vector bit rotation instructions.

This has required changes to the DAGCombiner rotation pattern matching to support vector types - so far I've only changed it to support splat vectors, but generalising this further is feasible in the future.

Differential Revision: http://reviews.llvm.org/D13851

llvm-svn: 251188
2015-10-24 13:17:26 +00:00
Reid Kleckner f02e33ce42 [X86] Clean up the tail call eligibility logic
Summary:
The logic here isn't straightforward because our support for
TargetOptions::GuaranteedTailCallOpt.

Also fix a bug where we were allowing tail calls to cdecl functions from
fastcall and vectorcall functions. We were special casing thiscall and
stdcall callers rather than checking for any convention that requires
clearing stack arguments before returning.

Reviewers: hans

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14024

llvm-svn: 251137
2015-10-23 19:35:38 +00:00
Joseph Tremoulet 3d0fbf1d74 [CodeGen] Mark setjmp/catchret MBBs address-taken
Summary:
This ensures that BranchFolding (and similar) won't remove these blocks.

Also allow AsmPrinter::EmitBasicBlockStart to process MBBs which are
address-taken but do not have BBs that are address-taken, since otherwise
its call to getAddrLabelSymbolTableToEmit would fail an assertion on such
blocks.  I audited the other callers of getAddrLabelSymbolTableToEmit
(and getAddrLabelSymbol); they all have BBs known to be address-taken
except for the call through getAddrLabelSymbol from
WinException::create32bitRef; that call is actually now unreachable, so
I've removed it and updated the signature of create32bitRef.

This fixes PR25168.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: pgavlin, llvm-commits

Differential Revision: http://reviews.llvm.org/D13774

llvm-svn: 251113
2015-10-23 15:06:05 +00:00
Zia Ansari 8f509a7044 [X86] - Catch extra combine opportunities for redundant imuls.
When we fold "mul ((add x, c1), c1)" -> "add ((mul x, c2), c1*c2)", we bail if (add x, c1) has multiple
users which would result in an extra add instruction.
In such cases, this patch adds a check to see if we can eliminate a multiply instruction in exchange for the extra add.

I also added the capability of doing the existing optimization with non-splatted vectors (splatted also works).

Differential Revision: http://reviews.llvm.org/D13740

llvm-svn: 251028
2015-10-22 16:14:45 +00:00
Asaf Badouh 7c52245660 [X86][AVX512] extend vcvtph2ps to support xmm/ymm and sae versions
Differential Revision: http://reviews.llvm.org/D13945

llvm-svn: 251018
2015-10-22 14:01:16 +00:00
Elena Demikhovsky 5c97dfdc9c AVX-512: Fixed a bug in select_cc for i1 type
Fixed faiure:
LLVM ERROR: Cannot select: t33: i1 = select_cc t25, Constant:i32<0>, t45, t42, seteq:ch

added a test

Differential Revision: http://reviews.llvm.org/D13943

llvm-svn: 250996
2015-10-22 07:10:29 +00:00
Reid Kleckner cfaeb42f9c [WinEH] Add test for llvm.va.start in catchpad
It already works, but we should have a test for it.

This used to be PR23094 in the old model.

llvm-svn: 250936
2015-10-21 19:54:40 +00:00
Sanjay Patel 557001d1c7 [x86] add test case that shows holes in LEA isel
llvm-svn: 250910
2015-10-21 17:24:00 +00:00
Elena Demikhovsky 3ad76a1acd Masked Load/Store optimization for scalar code
When we have to convert the masked.load, masked.store to scalar code, we generate a chain of conditional basic blocks.
I added optimization for constant mask vector.

Differential Revision: http://reviews.llvm.org/D13855

llvm-svn: 250893
2015-10-21 11:50:54 +00:00
Simon Pilgrim bb17881731 [X86][SSE] Add 256-bit vector bit rotation tests.
llvm-svn: 250853
2015-10-20 20:27:23 +00:00
Igor Breger 21296d230a AVX512: Implemented encoding and intrinsics for VPBROADCASTB/W/D/Q instructions.
Differential Revision: http://reviews.llvm.org/D13884

llvm-svn: 250819
2015-10-20 11:56:42 +00:00
Andrea Di Biagio 9a85b7abe0 [x86] Fix AVX maskload/store intrinsic prototypes.
The mask value type for maskload/maskstore GCC builtins is never a vector of
packed floats/doubles.

This patch fixes the following issues:
1. The mask argument for builtin_ia32_maskloadpd and builtin_ia32_maskstorepd
   should be of type llvm_v2i64_ty and not llvm_v2f64_ty.
2. The mask argument for builtin_ia32_maskloadpd256 and
   builtin_ia32_maskstorepd256 should be of type llvm_v4i64_ty and not
   llvm_v4f64_ty.
3. The mask argument for builtin_ia32_maskloadps and builtin_ia32_maskstoreps
   should be of type llvm_v4i32_ty and not llvm_v4f32_ty.
4. The mask argument for builtin_ia32_maskloadps256 and
   builtin_ia32_maskstoreps256 should be of type llvm_v8i32_ty and not
   llvm_v8f32_ty.

Differential Revision: http://reviews.llvm.org/D13776

llvm-svn: 250817
2015-10-20 11:20:13 +00:00
Cong Hou 7745dbc5c4 Enhance loop rotation with existence of profile data in MachineBlockPlacement pass.
Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation:

1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header.
2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop.
3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain.

Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies.

Differential revision: http://reviews.llvm.org/D10717

llvm-svn: 250754
2015-10-19 23:16:40 +00:00
Simon Pilgrim 4708060e94 [X86][SSE] Add vector bit rotation tests.
llvm-svn: 250656
2015-10-18 12:54:37 +00:00
Asaf Badouh 696e8e0bb7 [X86][AVX512DQ] add scalar fpclass
Differential Revision: http://reviews.llvm.org/D13769

llvm-svn: 250650
2015-10-18 11:04:38 +00:00
Igor Breger cbb9550537 AVX512: Lowering i8/i16 vector CTLZ using the dword LZCNT vector instruction
Differential Revision: http://reviews.llvm.org/D13632

llvm-svn: 250649
2015-10-18 09:56:39 +00:00
Simon Pilgrim 4773763bb0 [X86][XOP] Add VPROT rotate by immediate intrinsics tests
llvm-svn: 250618
2015-10-17 18:21:53 +00:00
Simon Pilgrim 5b65f28fe7 [X86][FastISel] Teach how to select SSE4A nontemporal stores.
Add FastISel support for SSE4A scalar float / double non-temporal stores

Follow up to D13698

Differential Revision: http://reviews.llvm.org/D13773

llvm-svn: 250610
2015-10-17 13:04:42 +00:00
Reid Kleckner 28e490342b [WinEH] Fix stack alignment in funclets and ParentFrameOffset calculation
Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset
is incorrect when CSRs are involved. We were supposed to have a test
case to catch this, but it wasn't very rigorous.

The main effect here is that calling _CxxThrowException inside a
catchpad doesn't immediately crash on MOVAPS when you have an odd number
of CSRs.

llvm-svn: 250583
2015-10-16 23:43:27 +00:00
Sanjay Patel bbd524496c [x86] promote 'add nsw' to a wider type to allow more combines
The motivation for this patch starts with PR20134:
https://llvm.org/bugs/show_bug.cgi?id=20134

void foo(int *a, int i) {
  a[i] = a[i+1] + a[i+2];
}

It seems better to produce this (14 bytes):

movslq	%esi, %rsi
movl	0x4(%rdi,%rsi,4), %eax
addl	0x8(%rdi,%rsi,4), %eax
movl	%eax, (%rdi,%rsi,4)

Rather than this (22 bytes):

leal	0x1(%rsi), %eax
cltq             
leal	0x2(%rsi), %ecx      
movslq	%ecx, %rcx     
movl	(%rdi,%rcx,4), %ecx
addl	(%rdi,%rax,4), %ecx
movslq	%esi, %rax       
movl	%ecx, (%rdi,%rax,4)

The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine, 
but it gets more complicated after that because we need to consider architecture and micro-architecture. For
example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in
hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: 
FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm 
also not sure if FeatureSlowLEA should also mean "slow complex addressing mode".

I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size
improvements when building clang and the LLVM tools with the patched compiler.

A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available
in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that
this patch takes care of.

Differential Revision: http://reviews.llvm.org/D13757

llvm-svn: 250560
2015-10-16 22:14:12 +00:00
Joseph Tremoulet d11a998e81 [WinEH] Fix CatchRetSuccessorColorMap accounting
Summary:
We now use the block for the catchpad itself, rather than its normal
successor, as the funclet entry.
Putting the normal successor in the map leads downstream funclet
membership computations to erroneous results.

Reviewers: majnemer, rnk

Subscribers: rnk, llvm-commits

Differential Revision: http://reviews.llvm.org/D13798

llvm-svn: 250552
2015-10-16 21:22:54 +00:00
Andrew Kaylor 09b39acc03 Fix assertion failure with fp128 to unsigned i64 conversion
Patch by Mitch Bodart

Differential Revision: http://reviews.llvm.org/D13780

llvm-svn: 250550
2015-10-16 20:39:20 +00:00
Craig Topper 09b6598572 [X86] Add fxsr feature flag for fxsave/fxrestore instructions.
llvm-svn: 250497
2015-10-16 06:03:09 +00:00
JF Bastien 2cdd5e4710 x86: preserve flags when folding atomic operations
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags. I fixed some of this issue in D13680 but had missed INC/DEC.

This patch adds the missing EFLAGS definition.

llvm-svn: 250438
2015-10-15 18:24:52 +00:00
Kevin B. Smith 89760f04f9 Change test to use FileCheck rather than grep.
Differential Revision: http://reviews.llvm.org/D13751

llvm-svn: 250431
2015-10-15 17:05:12 +00:00
JF Bastien 5b327712b0 x86 FP atomic codegen: don't drop globals, stack
Summary:
x86 codegen is clever about generating good code for relaxed
floating-point operations, but it was being silly when globals and
immediates were involved, forgetting where the global was and
loading/storing from/to the wrong place. The same applied to hard-coded
address immediates.

Don't let it forget about the displacement.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25171

A very similar bug when doing floating-points atomics to the stack is
also fixed by this patch.

This fixes https://llvm.org/bugs/show_bug.cgi?id=25144

Reviewers: pete

Subscribers: llvm-commits, majnemer, rsmith

Differential Revision: http://reviews.llvm.org/D13749

llvm-svn: 250429
2015-10-15 16:46:29 +00:00
Igor Breger d7bae451de AVX512: Implemented DAG lowering for shuff62x2/shufi62x2 instructions ( shuffle packed values at 128-bit granularity )
Differential Revision: http://reviews.llvm.org/D13648

llvm-svn: 250400
2015-10-15 13:29:07 +00:00
Igor Breger b4bb190eed AVX512: Implemented encoding and intrinsics for vpternlogd/q.
Differential Revision: http://reviews.llvm.org/D13768

llvm-svn: 250396
2015-10-15 12:33:24 +00:00
Elena Demikhovsky ecff21b297 AVX-512: Fixed a bug in shuffle lowering 32-bit mode
AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants.
I split 8x64-bit const vector to 16x32 on 32-bit mode.

Differential Revision: http://reviews.llvm.org/D13644

llvm-svn: 250390
2015-10-15 11:35:33 +00:00
Andrea Di Biagio 6a61cecef0 [x86] Merge test pr24562.ll into x86-fold-pshufb.ll. NFC.
llvm-svn: 250387
2015-10-15 09:54:25 +00:00
Akira Hatanaka 8ad7399f8e [MachO] Stop generating *coal* sections.
Recommit r250342: move coal-sections-powerpc.s to subdirectory for powerpc.

Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250370
2015-10-15 05:28:38 +00:00
Akira Hatanaka 276332b47f Revert r250349.
Test case coal-sections-powerpc.s is still failing on some buildbots.

llvm-svn: 250351
2015-10-15 00:11:03 +00:00
Akira Hatanaka 1cea644114 [MachO] Stop generating *coal* sections.
Recommit r250342: add -arch=ppc32 to the RUN lines of powerpc tests.

Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250349
2015-10-14 23:48:10 +00:00
Akira Hatanaka d58d347e42 Revert r250342.
Investigate why coal-sections-powerpc.s is failing on some buildbots.

llvm-svn: 250346
2015-10-14 23:29:10 +00:00
Akira Hatanaka c078ae3e4f [MachO] Stop generating *coal* sections.
Some background on why we don't have to use *coal* sections anymore:
Long ago when C++ was new and "weak" had not been standardized, an attempt was
made in cctools to support C++ inlines that can be coalesced by putting them
into their own section (TEXT/textcoal_nt instead of TEXT/text).

The current macho linker supports the weak-def bit on any symbol to allow it to
be coalesced, but the compiler still puts weak-def functions/data into alternate
section names, which the linker must map back to the base section name.

This patch makes changes that are necessary to prevent the compiler from using
the "coal" sections and have it use the non-coal sections instead when the
target architecture is not powerpc:

TEXT/textcoal_nt instead use TEXT/text
TEXT/const_coal instead use TEXT/const
DATA/datacoal_nt instead use DATA/data

If the target is powerpc, we continue to use the *coal* sections since anyone
targeting powerpc is probably using an old linker that doesn't have support for
the weak-def bits.

Also, have the assembler issue a warning if it encounters a *coal* section in
the assembly file and inform the users to use the non-coal sections instead.

rdar://problem/14265330

Differential Revision: http://reviews.llvm.org/D13188

llvm-svn: 250342
2015-10-14 22:45:36 +00:00
Sanjay Patel e2b528074d add x86 codegen tests for 'add nsw' followed by 'sext'
llvm-svn: 250332
2015-10-14 21:47:03 +00:00
Andrea Di Biagio c47edbef4c [x86][FastISel] Teach how to select nontemporal stores.
This patch teaches x86 fast-isel how to select nontemporal stores.

On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords.
Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal
vector stores.

Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti'
for doubleword/quadword nontemporal stores. In the case of nontemporal stores
of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of
movntps/movntpd/movntdq.

With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now
always get the expected (V)MOVNT instructions.
The lack of fast-isel support for nontemporal stores was spotted when analyzing
the -O0 codegen for nontemporal stores.

Differential Revision: http://reviews.llvm.org/D13698

llvm-svn: 250285
2015-10-14 10:03:13 +00:00
JF Bastien 986ed68eed x86: preserve flags when folding atomic operations
Summary:
D4796 taught LLVM to fold some atomic integer operations into a single
instruction. The pattern was unaware that the instructions clobbered
flags.

This patch adds the missing EFLAGS definition.

Floating point operations don't set flags, the subsequent fadd
optimization is therefore correct. The same applies for surrounding
load/store optimizations.

Reviewers: rsmith, rtrieu

Subscribers: llvm-commits, reames, morisset

Differential Revision: http://reviews.llvm.org/D13680

llvm-svn: 250135
2015-10-13 00:28:47 +00:00
Cong Hou bf22f5063a Assign correct edge weights to unwind destinations when lowering invoke statement.
When lowering invoke statement, all unwind destinations are directly added as successors of call site block, and the weight of those new edges are not assigned properly. Actually, default weight 16 are used for those edges. This patch calculates the proper edge weights for those edges when collecting all unwind destinations.

Differential revision: http://reviews.llvm.org/D13354

llvm-svn: 250119
2015-10-12 23:02:58 +00:00
Reid Kleckner 4a5f35c0ae Make Win64 localescape offsets FP relative instead of SP relative
We made them SP relative back in March (r233137) because that's the
value the runtime passes to EH functions. With the new cleanuppad IR,
funclets adjust their frame argument from SP to FP, so our offsets
should now be FP-relative.

llvm-svn: 250088
2015-10-12 19:43:34 +00:00
Andrea Di Biagio b0fe4eb199 [x86] Fix wrong lowering of vsetcc nodes (PR25080).
Function LowerVSETCC (in X86ISelLowering.cpp) worked under the wrong
assumption that for non-AVX512 targets, the source type and destination type
of a type-legalized setcc node were always the same type.

This assumption was unfortunately incorrect; the type legalizer is not always
able to promote the return type of a setcc to the same type as the first
operand of a setcc.

In the case of a vsetcc node, the legalizer firstly checks if the first input
operand has a legal type. If so, then it promotes the return type of the vsetcc
to that same type. Otherwise, the return type is promoted to the 'next legal
type', which, for vectors of MVT::i1 is always a 128-bit integer vector type.

Example (-mattr=+avx):

  %0 = trunc <8 x i32> %a to <8 x i23>
  %1 = icmp eq <8 x i23> %0, zeroinitializer

The initial selection dag for the code above is:

v8i1 = setcc t5, t7, seteq:ch
  t5: v8i23 = truncate t2
    t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %vreg1
    t7: v8i32 = build_vector of all zeroes.

The type legalizer would firstly check if 't5' has a legal type. If so, then it
would reuse that same type to promote the return type of the setcc node.
Unfortunately 't5' is of illegal type v8i23, and therefore it cannot be used to
promote the return type of the setcc node. Consequently, the setcc return type
is promoted to v8i16. Later on, 't5' is promoted to v8i32 thus leading to the
following dag node:
  v8i16 = setcc t32, t25, seteq:ch

  where t32 and t25 are now values of type v8i32.

Before this patch, function LowerVSETCC would have wrongly expanded the setcc
to a single X86ISD::PCMPEQ. Surprisingly, ISel was still able to match an
instruction. In our case, ISel would have matched a VPCMPEQWrr:
  t37: v8i16 = X86ISD::VPCMPEQWrr t36, t25

However, t36 and t25 are both VR256, while the result type is instead of class
VR128. This inconsistency ended up causing the insertion of COPY instructions
like this:
  %vreg7<def> = COPY %vreg3; VR128:%vreg7 VR256:%vreg3

Which is an invalid full copy (not a sub register copy).
Eventually, the backend would have hit an UNREACHABLE "Cannot emit physreg copy
instruction" in the attempt to expand the malformed pseudo COPY instructions.

This patch fixes the problem adding the missing logic in LowerVSETCC to handle
the corner case of a setcc with 128-bit return type and 256-bit operand type.

This problem was originally reported by Dimitry as PR25080. It has been latent
for a very long time. I have added the minimal reproducible from that bugzilla
as test setcc-lowering.ll.

Differential Revision: http://reviews.llvm.org/D13660

llvm-svn: 250085
2015-10-12 19:22:30 +00:00
Amjad Aboud 1db6d7af46 [X86] Add XSAVE intrinsic family
Add intrinsics for the
  XSAVE instructions (XSAVE/XSAVE64/XRSTOR/XRSTOR64)
  XSAVEOPT instructions (XSAVEOPT/XSAVEOPT64)
  XSAVEC instructions (XSAVEC/XSAVEC64)
  XSAVES instructions (XSAVES/XSAVES64/XRSTORS/XRSTORS64)

Differential Revision: http://reviews.llvm.org/D13012

llvm-svn: 250029
2015-10-12 11:47:46 +00:00
Andrea Di Biagio a0922ed8fe [x86] PR24562: fix incorrect folding of PSHUFB nodes with a mask where all indices have the most significant bit set.
This patch fixes a problem in function 'combineX86ShuffleChain' that causes a
chain of shuffles to be wrongly folded away when the combined shuffle mask has
only one element.

We may end up with a combined shuffle mask of one element as a result of
multiple calls to function 'canWidenShuffleElements()'.
Function canWidenShuffleElements attempts to simplify a shuffle mask by widening
the size of the elements being shuffled.
For every pair of shuffle indices, function canWidenShuffleElements checks if
indices refer to adjacent elements. If all pairs refer to "adjacent" elements
then the shuffle mask is safely widened. As a consequence of widening, we end up
with a new shuffle mask which is half the size of the original shuffle mask.

The byte shuffle (pshufb) from test pr24562.ll has a mask of all SM_SentinelZero
indices. Function canWidenShuffleElements would combine each pair of
SM_SentinelZero indices into a single SM_SentinelZero index. So, in a
logarithmic number of steps (4 in this case), the pshufb mask is simplified to
a mask with only one index which is equal to SM_SentinelZero.

Before this patch, function combineX86ShuffleChain wrongly assumed that a mask
of size one is always equivalent to an identity mask. So, the entire shuffle
chain was just folded away as the combined shuffle mask was treated as a no-op
mask.

With this patch we know check if the only element of a combined shuffle mask is
SM_SentinelZero. In case, we propagate a zero vector.

Differential Revision: http://reviews.llvm.org/D13364

llvm-svn: 250027
2015-10-12 11:25:41 +00:00
Simon Pilgrim d45c88bbb5 [DAGCombiner] Improved FMA combine support for vectors
Enabled constant canonicalization for all constants.

Improved combining of constant vectors.

llvm-svn: 249993
2015-10-11 19:48:12 +00:00
Craig Topper 87990ee4ec [X86] Remove special validation for INT immediate operand from AsmParser. Instead mark its operand type as u8imm which will cause it to fail to match. This is more consistent with other instruction behavior.
This also fixes a bug where negative immediates below -128 were not being reported as errors.

llvm-svn: 249989
2015-10-11 18:27:24 +00:00
Simon Pilgrim 52d47e5704 [X86][XOP] Added support for the lowering of 128-bit vector integer comparisons to XOP PCOM/PCOMU instructions.
The XOP vector integer comparisons can deal with all signed/unsigned comparison cases directly and can be easily commuted as well (D7646).

llvm-svn: 249976
2015-10-11 14:15:17 +00:00
Simon Pilgrim bdbf839a3b [X86][SSE] Vector signed/unsigned integer compare tests.
llvm-svn: 249954
2015-10-10 22:21:05 +00:00
Reid Kleckner eb7cd6c889 [SEH] Update SEH codegen tests to use the new IR
Also Fix a buglet where SEH tables had ranges that spanned funclets.

The remaining tests using the old landingpad IR are preparation tests,
and will be deleted along with the old preparation.

llvm-svn: 249917
2015-10-09 23:05:54 +00:00
David Majnemer 35d27b21a1 [WinEH] Insert the catchpad return before CSR restoration
x64 catchpads use rax to inform the unwinder where control should go
next.  However, we must initialize rax before the epilogue sequence so
as to not perturb the unwinder.

llvm-svn: 249910
2015-10-09 22:18:45 +00:00
Reid Kleckner e1c8a7f9c7 [SEH] Fix _except_handler4 table base states
We got them right for the old IR, but not with funclets.  Port the old
test to the new IR and fix the code.

llvm-svn: 249906
2015-10-09 21:27:28 +00:00
Reid Kleckner d880dc7509 [SEH] Remember to emit the last invoke range for SEH
This wasn't very observable in execution tests, because usually there is
an invoke in the catchpad that unwinds the the catchendpad but never
actually throws.

llvm-svn: 249898
2015-10-09 20:39:39 +00:00
Reid Kleckner ae44e871cd Revert "Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64"""
This reverts commit r249794.

Apparently my checkouts are full of unexpected surprises today.

llvm-svn: 249796
2015-10-09 01:13:17 +00:00
Reid Kleckner b510401785 Revert "Revert r248959, "[WinEH] Emit int3 after noreturn calls on Win64""
This reverts commit r249032.

TODO write commit msg

llvm-svn: 249794
2015-10-09 01:11:37 +00:00