It seems we were already upgrading 128-bit VPCMOV, but the intrinsic was still defined and being used in isel patterns. While I was here I also simplified the tablegen multiclasses.
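For reference, VPCMOV is a per-bit select, so the upgraded IR amounts to plain logic ops. A minimal sketch of the equivalent pattern (function name made up):

define <2 x i64> @vpcmov_equiv(<2 x i64> %a, <2 x i64> %b, <2 x i64> %c) {
  ; result = (a & c) | (b & ~c), computed per bit
  %notc = xor <2 x i64> %c, <i64 -1, i64 -1>
  %t = and <2 x i64> %a, %c
  %f = and <2 x i64> %b, %notc
  %r = or <2 x i64> %t, %f
  ret <2 x i64> %r
}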
llvm-svn: 295564
- Adapt MachineBasicBlock::getName() to have the same behavior as the IR
BasicBlock (Value::getName()).
- Add it to lib/CodeGen/CodeGen.cpp::initializeCodeGen so that it is linked
into the CodeGen library.
- MachineRegionInfoPass's name conflicts with RegionInfoPass's name ("region").
- MachineRegionInfo should depend on MachineDominatorTree,
MachinePostDominatorTree and MachineDominanceFrontier instead of their
respective IR versions.
- Since there were no tests for this, add an X86 MIR test.
Patch by Francis Visoiu Mistrih <fvisoiumistrih@apple.com>
llvm-svn: 295518
During legalization we often create shuffles (via a build_vector scalarization stage) whose masks are "any_extend_vector_inreg" style, as well as masks that are the equivalent of a "truncate_vector_inreg" (if we had such a thing).
This patch attempts to match these cases instead of leaving them to generic shuffle lowering, which typically loses track of the undefined elements of the shuffles, resulting in an unnecessary extension+truncation stage for widened illegal types.
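For illustration (not a test from the patch), a shuffle with this kind of mask is really an any-extend-in-reg of the low elements:

define <8 x i16> @anyext_inreg_style(<8 x i16> %x) {
  ; Equivalent to any-extending the low 4 x i16 to 4 x i32 in-register:
  ; even lanes carry the source elements, odd lanes (the high halves of
  ; the widened elements) are undef.
  %s = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 3, i32 undef>
  ret <8 x i16> %s
}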
The 2011-10-21-widen-cmp.ll regression will be fixed by making SIGN_EXTEND_VECTOR_IN_REG legal in SSE instead of lowering them to X86ISD::VSEXT (PR31712).
Differential Revision: https://reviews.llvm.org/D29454
llvm-svn: 295451
The original commit was reverted in r283329 due to a miscompile in
Chromium. That turned out to be the same issue as PR31257, which was
fixed in r295262.
llvm-svn: 295357
The existing code always saves the xmm registers for 64-bit targets even if the
target doesn't support SSE (which is common for kernels). Thus, the compiler
inserts movaps instructions which lead to CPU exceptions when an interrupt
handler is invoked.
This commit fixes this bug by returning a register set without xmm registers
from getCalleeSavedRegs and getCallPreservedMask for such targets.
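A minimal reproducer sketch (the attribute spelling is assumed here, not taken from the patch):

define x86_intrcc void @isr(ptr byval(i8) %frame) #0 {
  ; With SSE disabled, the callee-saved set must not include xmm
  ; registers; previously the movaps saves emitted here faulted at
  ; run time.
  ret void
}
attributes #0 = { "target-features"="-sse,-sse2" }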
Patch by Philipp Oppermann.
Differential Revision: https://reviews.llvm.org/D29959
llvm-svn: 295347
Resubmit -r295314 with PowerPC and AMDGPU tests updated.
Support {a|s}ext, {a|z|s}ext load nodes as part of load combine patterns.
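For example (illustrative IR, little-endian), the zext loads below can now participate in the combine and fold into a single wider load:

define i32 @load_combine_zext(ptr %p) {
  ; Two adjacent i16 loads assembled into an i32; with ext load support
  ; this whole pattern becomes one 32-bit load.
  %p1 = getelementptr inbounds i16, ptr %p, i64 1
  %lo = load i16, ptr %p, align 4
  %hi = load i16, ptr %p1, align 2
  %lo32 = zext i16 %lo to i32
  %hi32 = zext i16 %hi to i32
  %hs = shl i32 %hi32, 16
  %r = or i32 %hs, %lo32
  ret i32 %r
}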
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295336
Support {a|s}ext, {a|z|s}ext load nodes as part of load combine patterns.
Reviewed By: filcab
Differential Revision: https://reviews.llvm.org/D29591
llvm-svn: 295314
The new 512-bit unmasked intrinsics will make it easy to handle these with the SSE/AVX intrinsics in InstCombine where we currently have a TODO.
llvm-svn: 295290
This reverts r294348, which removed support for conditional tail calls
due to PR31257. It fixes the PR by marking live registers as
implicitly used and defined by the now predicated tailcall. This is
similar to how IfConversion predicates instructions.
Differential Revision: https://reviews.llvm.org/D29856
llvm-svn: 295262
Lay out trellis-shaped CFGs optimally.
A trellis of the shape below:
A     B
|\   /|
| \ / |
|  X  |
| / \ |
|/   \|
C     D
would be laid out A; B->C; D by the current layout algorithm. Now we identify
trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an
increasing number of predecessors. A trellis is a group of 2 or more
predecessor blocks that all have the same successors. Because trellises now lay
out well, we can tail duplicate to extend existing trellises.
As an example consider the following CFG:
  B   D   F   H
 / \ / \ / \ / \
A---C---E---G---Ret
where A, C, E, and G are all small (currently 2 instructions).
The CFG-preserving layout is then A,B,C,D,E,F,G,H,Ret.
The current code will copy C into B, E into D, and G into F, yielding the layout
A,C,B(C),E,D(E),G,F(G),H,Ret
define void @straight_test(i32 %tag) {
entry:
  br label %test1
test1: ; A
  %tagbit1 = and i32 %tag, 1
  %tagbit1eq0 = icmp eq i32 %tagbit1, 0
  br i1 %tagbit1eq0, label %test2, label %optional1
optional1: ; B
  call void @a()
  br label %test2
test2: ; C
  %tagbit2 = and i32 %tag, 2
  %tagbit2eq0 = icmp eq i32 %tagbit2, 0
  br i1 %tagbit2eq0, label %test3, label %optional2
optional2: ; D
  call void @b()
  br label %test3
test3: ; E
  %tagbit3 = and i32 %tag, 4
  %tagbit3eq0 = icmp eq i32 %tagbit3, 0
  br i1 %tagbit3eq0, label %test4, label %optional3
optional3: ; F
  call void @c()
  br label %test4
test4: ; G
  %tagbit4 = and i32 %tag, 8
  %tagbit4eq0 = icmp eq i32 %tagbit4, 0
  br i1 %tagbit4eq0, label %exit, label %optional4
optional4: ; H
  call void @d()
  br label %exit
exit:
  ret void
}
Here is the layout after D27742:
straight_test: # @straight_test
; ... Prologue elided
; BB#0: # %entry ; A (merged with test1)
; ... More prologue elided
  mr 30, 3
  andi. 3, 30, 1
  bc 12, 1, .LBB0_2
; BB#1: # %test2 ; C
  rlwinm. 3, 30, 0, 30, 30
  beq 0, .LBB0_3
  b .LBB0_4
.LBB0_2: # %optional1 ; B (copy of C)
  bl a
  nop
  rlwinm. 3, 30, 0, 30, 30
  bne 0, .LBB0_4
.LBB0_3: # %test3 ; E
  rlwinm. 3, 30, 0, 29, 29
  beq 0, .LBB0_5
  b .LBB0_6
.LBB0_4: # %optional2 ; D (copy of E)
  bl b
  nop
  rlwinm. 3, 30, 0, 29, 29
  bne 0, .LBB0_6
.LBB0_5: # %test4 ; G
  rlwinm. 3, 30, 0, 28, 28
  beq 0, .LBB0_8
  b .LBB0_7
.LBB0_6: # %optional3 ; F (copy of G)
  bl c
  nop
  rlwinm. 3, 30, 0, 28, 28
  beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
  bl d
  nop
.LBB0_8: # %exit ; Ret
  ld 30, 96(1) # 8-byte Folded Reload
  addi 1, 1, 112
  ld 0, 16(1)
  mtlr 0
  blr
The tail-duplication has produced some benefit, but it has also produced a
trellis which is not laid out optimally. With this patch, we improve the layout
of such trellises and decrease the computed cost of tail-duplication
accordingly.
This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have
back edges, which is a negative, but it has a bigger compensating positive,
which is that it handles long strings of skipped blocks much better than the
original layout. Both layouts handle runs of executed blocks equally well.
Branch prediction also improves if there is any correlation between subsequent
optional blocks.
Here is the resulting concrete layout:
straight_test: # @straight_test
; BB#0: # %entry ; A (merged with test1)
  mr 30, 3
  andi. 3, 30, 1
  bc 12, 1, .LBB0_4
; BB#1: # %test2 ; C
  rlwinm. 3, 30, 0, 30, 30
  bne 0, .LBB0_5
.LBB0_2: # %test3 ; E
  rlwinm. 3, 30, 0, 29, 29
  bne 0, .LBB0_6
.LBB0_3: # %test4 ; G
  rlwinm. 3, 30, 0, 28, 28
  bne 0, .LBB0_7
  b .LBB0_8
.LBB0_4: # %optional1 ; B (Copy of C)
  bl a
  nop
  rlwinm. 3, 30, 0, 30, 30
  beq 0, .LBB0_2
.LBB0_5: # %optional2 ; D (Copy of E)
  bl b
  nop
  rlwinm. 3, 30, 0, 29, 29
  beq 0, .LBB0_3
.LBB0_6: # %optional3 ; F (Copy of G)
  bl c
  nop
  rlwinm. 3, 30, 0, 28, 28
  beq 0, .LBB0_8
.LBB0_7: # %optional4 ; H
  bl d
  nop
.LBB0_8: # %exit
Differential Revision: https://reviews.llvm.org/D28522
llvm-svn: 295223
We currently can't legalize those, but we should really not be creating
them in the first place, since legalization would probably look similar to the
way we legalize CONCAT_VECTORS - basically replacing the INSERT with a BUILD.
This fixes PR31956.
Differential Revision: https://reviews.llvm.org/D29961
llvm-svn: 295213
Summary:
We don't seem to have great rules on what a valid VBROADCAST node looks like. And as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs.
As you can see from the things improved here we are currently missing patterns for 128-bit loads being extended to 256-bit before the vbroadcast.
I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that, this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512 bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.
This requires an additional change in target shuffle combining to look for the EXTRACT_SUBVECTOR and look through it to find the original operand. I'm sure this change isn't perfect, but it was enough to fix the few test failures that the new EXTRACT_SUBVECTOR otherwise caused.
Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases where we don't remove a useless insert into element 1 before broadcasting element 0.
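As a concrete picture (illustrative IR, not from the patch), a splat whose source is 256 bits wide now lowers via (vbroadcast (extract_subvector x, 0)) instead of needing a dedicated wide-input pattern:

define <8 x i32> @splat_lane0(<8 x i32> %x) {
  ; Broadcast lane 0 of a 256-bit vector to all lanes.
  %b = shufflevector <8 x i32> %x, <8 x i32> undef, <8 x i32> zeroinitializer
  ret <8 x i32> %b
}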
Reviewers: delena, RKSimon, zvi
Reviewed By: zvi
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D28747
llvm-svn: 295155
Summary:
Blocks ending in unreachable are typically cold because they end the
program or throw an exception, so merging them with other identical
blocks is usually profitable because it reduces the size of cold code.
MachineBlockPlacement generally does not arrange to fall through to such
blocks, so commoning these blocks will not introduce additional
unconditional branches.
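A small IR illustration (hypothetical): %fail1 and %fail2 below are identical and end in unreachable, so the corresponding machine blocks can now be merged without adding an unconditional branch on the hot path.

declare void @abort() noreturn

define void @check2(i1 %a, i1 %b) {
entry:
  br i1 %a, label %ok1, label %fail1
ok1:
  br i1 %b, label %ok2, label %fail2
ok2:
  ret void
fail1: ; cold, identical to %fail2
  call void @abort()
  unreachable
fail2: ; now merged with %fail1
  call void @abort()
  unreachable
}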
Reviewers: hans, iteratee, haicheng
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D29153
llvm-svn: 295105
This adds MXCSR to the set of recognized registers for X86 targets and updates the instructions that read or write it. I do not intend for all of the various floating point instructions that implicitly use the control bits or update the status bits of this register to ever have that usage modeled by default. However, when constrained floating point modes (such as strict FP exception status modeling or dynamic rounding modes) are enabled, implicit use/def information for MXCSR will be added to those instructions.
Until those additional updates are made, this should cause (almost?) no functional changes. Theoretically, this will prevent instructions like LDMXCSR and STMXCSR from being moved past one another, but that should be prevented anyway and I haven't found a case where it is happening now.
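A sketch of the ordering this provides, using the existing SSE intrinsics (the scenario is illustrative): once MXCSR is modeled, the read below cannot be scheduled past the write.

declare void @llvm.x86.sse.stmxcsr(ptr)
declare void @llvm.x86.sse.ldmxcsr(ptr)

define void @swap_mxcsr(ptr %save, ptr %new) {
  ; stmxcsr reads MXCSR; ldmxcsr writes it. Modeling the register
  ; keeps these two calls in order.
  call void @llvm.x86.sse.stmxcsr(ptr %save)
  call void @llvm.x86.sse.ldmxcsr(ptr %new)
  ret void
}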
Differential Revision: https://reviews.llvm.org/D29903
llvm-svn: 295004
Backends don't support this yet. They would have to move to the swifterror
register before the tail call to make sure it is live-in to the call.
rdar://30495920
llvm-svn: 294982
This is consistent with what we do for GlobalISel. That way, it is easy
to see whether or not FastISel is able to fully select a function.
At some point we may want to switch that to an optimization remark.
llvm-svn: 294970