llvm-project

Commit Graph

Author	SHA1	Message	Date
Ahmed Bougacha	04200a7c86	[X86] Remove \brief in FixupBW. NFC. llvm-svn: 268754	2016-05-06 17:28:47 +00:00
Ahmed Bougacha	cfd9e55e90	[X86] Simplify FixupBW sub_8bit_hi-related logic. NFC. Instead of passing around sizes and asking for subregs, we can check the subreg indices we care about: sub_8bit_hi and sub_8bit. Differential Revision: http://reviews.llvm.org/D20006 llvm-svn: 268753	2016-05-06 17:28:42 +00:00
Justin Bogner	b012699741	SDAG: Rename Select->SelectImpl and repurpose Select as returning void This is a step towards removing the rampant undefined behaviour in SelectionDAG, which is a part of llvm.org/PR26808. We rename SelectionDAGISel::Select to SelectImpl and update targets to match, and then change Select to return void and consolidate the sketchy behaviour we're trying to get away from there. Next, we'll update backends to implement `void Select(...)` instead of SelectImpl and eventually drop the base Select implementation. llvm-svn: 268693	2016-05-05 23:19:08 +00:00
Hans Wennborg	501e739d8a	X86CallFrameOptimization: make adjustCallSequence's return type void It always returned the same value (true). No functionality change. llvm-svn: 268645	2016-05-05 16:39:31 +00:00
Marcin Koscielnicki	0275fac2c9	[X86] Extend some Linux special cases to cover kFreeBSD. Both Linux and kFreeBSD use glibc, so follow similiar code paths. Add isTargetGlibc to check for this, and use it instead of isTargetLinux in a few places. Fixes PR22248 for kFreeBSD. Differential Revision: http://reviews.llvm.org/D19104 llvm-svn: 268624	2016-05-05 11:35:51 +00:00
David Majnemer	911d0e3c21	[X86] Use the right type when folding xor (truncate (shift)) -> setcc The result type of setcc is dependent on whether or not AVX512 is present. We had an X86-specific DAG-combine which assumed that the result type should be i8 when it could be i1. This meant that we would generate illegal setccs which LowerSETCC did not like. Instead, use an appropriate type and zero extend to i8. Also, there were some scenarios where the fold should have fired but didn't because we were overly cautious about the types. This meant that we generated: shrl $31, %edi andl $1, %edi kmovw %edi, %k0 kxnorw %k0, %k0, %k1 kshiftrw $15, %k1, %k1 kxorw %k1, %k0, %k0 kmovw %k0, %eax instead of: testl %edi, %edi setns %al This fixes PR27638. llvm-svn: 268609	2016-05-05 06:00:56 +00:00
Quentin Colombet	0c5bfd0514	[X86] Add a few register classes for x32 address accesses. The new register classes allow to tell the machine verifier that it is fine to use RIP for address accesses in x32 mode. Prior to that patch, we would complain that we are using a GR64 in place of GR32, whereas it is actually fine to use GR64 for x32 as long as the 32 high bits are 0s. RIP has this property and is used for RIP-relative addressing. This partially fixes http://llvm.org/PR27481. llvm-svn: 268567	2016-05-04 22:45:31 +00:00
David Majnemer	2c5aeabedd	[X86] Lower zext i1 arguments i1 is now a legal type for X86 with AVX512. There were some paths in X86FastISel which were not quite ready to see an i1 value: they were not quite sure how to deal with sign/zero extends for call arguments. DTRT by extending to i8 for zeroext and bailing out of FastISel for signext. This fixes PR27591. llvm-svn: 268470	2016-05-04 00:22:23 +00:00
Simon Pilgrim	be439d7f1a	[X86] Tidied up SDValue's SDNode referencing. NFCI. llvm-svn: 268445	2016-05-03 21:44:45 +00:00
Tim Northover	d2ecbccf27	X86-Darwin: start emitting data-region directives for jump-tables. The surrounding tools can cope these days, and they were invented for a reason. llvm-svn: 268437	2016-05-03 21:03:41 +00:00
David L Kreitzer	c9fbf1018a	Add an address space for the X86 SS segment. Patch by Michael LeMay (michael.lemay@intel.com) Differential Revision: http://reviews.llvm.org/D17093 llvm-svn: 268431	2016-05-03 20:16:08 +00:00
Simon Pilgrim	d2752708a3	[X86][SSE] Added target shuffle combine to MOVQ llvm-svn: 268391	2016-05-03 15:05:13 +00:00
Igor Breger	58c07806ae	[AVX512] Add support for commutative MAX/MIN . In general VMAX{PS,PD} and VMIN{PS,PD} instruction are not commutative . In combine pass only if UnsafeFPMath are used VMAX/VMAX are converted to commutative nodes VMAXC/VMAXC. Differential Revision: http://reviews.llvm.org/D19860 llvm-svn: 268375	2016-05-03 11:51:45 +00:00
Igor Breger	ab076c683c	[AVX512] Fix lowerV4X128VectorShuffle to select correctly input operands . Differential Revision: http://reviews.llvm.org/D19803 llvm-svn: 268368	2016-05-03 08:08:44 +00:00
Matthias Braun	d1aabb2813	livePhysRegs: Pass MBB by reference in addLive{Ins\|Outs}(); NFC The block must no be nullptr for the addLiveIns()/addLiveOuts() function. llvm-svn: 268340	2016-05-03 00:24:32 +00:00
Matthias Braun	24f26e6d91	LivePhysRegs: Automatically determine presence of pristine regs. Remove the AddPristinesAndCSRs parameters from addLiveIns()/addLiveOuts(). We need to respect pristine registers after prologue epilogue insertion, Seeing that we got this wrong in at least two commits already, we should rather pay the small price to query MachineFrameInfo for it. There are three cases that did not set AddPristineAndCSRs to true even after register allocation: - ExecutionDepsFix: live-out registers are used as a hint that the register is used soon. This is not true for pristine registers so use the new addLiveOutsNoPristines() to maintain this behaviour. - SystemZShortenInst: Not setting AddPristineAndCSRs to true looks like a bug, should do the right thing automatically now. - StackMapLivenessAnalysis: Not adding pristine registers looks like a bug to me. Added a FIXME comment but maintain the current behaviour as a change may need to get coordinated with GC runtimes. llvm-svn: 268336	2016-05-03 00:08:46 +00:00
Quentin Colombet	4e1d389ac5	[X86] Model FAULTING_LOAD_OP as a terminator and branch. This operation may branch to the handler block and we do not want it to happen anywhere within the basic block. Moreover, by marking it "terminator and branch" the machine verifier does not wrongly assume (because of AnalyzeBranch not knowing better) the branch is analyzable. Indeed, the target was seeing only the unconditional branch and not the faulting load op and thought it was a simple unconditional block. The machine verifier was complaining because of that and moreover, other optimizations could have done wrong transformation! In the process, simplify the representation of the handler block in the faulting load op. Now, we directly reference the handler block instead of using a label. This has the benefits of: 1. MC knows how to issue a label for a BB, so leave that to it. 2. Accessing the target BB from its label is painful, whereas it is direct from a MBB operand. Note: The 2 bytes offset in implicit-null-check.ll comes from the fact the unconditional jumps are not removed anymore, as the whole terminator sequence is not analyzable anymore. Will fix it in a subsequence commit. llvm-svn: 268327	2016-05-02 22:58:54 +00:00
Simon Pilgrim	52f8693263	[X86][SSE] Added placeholder for 128/256-bit wide shuffle combines Begun adding placeholder for future support for vperm2f128/vshuff64x2 style 128/256-bit wide shuffles llvm-svn: 268306	2016-05-02 21:12:48 +00:00
Simon Pilgrim	e5e04baf95	[X86][SSE] Dropped X86ISD::FGETSIGNx86 and use MOVMSK instead for FGETSIGN lowering movmsk.ll tests are unchanged. llvm-svn: 268237	2016-05-02 14:58:22 +00:00
David L Kreitzer	0fe4632bd7	Enable the X86 call frame optimization for the 64-bit targets that allow it. Fixes PR27241. Differential Revision: http://reviews.llvm.org/D19688 llvm-svn: 268227	2016-05-02 13:45:25 +00:00
Craig Topper	7b5925a5b6	[X86] Fix a bug in LOCK arithmetic operation pattern matching where the wrong immediate predicate check was being used for 64-bit instructions with 8-bit immediates. This didn't cause a bug because the order of the patterns ensured that the 64-bit instructions with 32-bit immediates were selected first. llvm-svn: 268212	2016-05-02 05:44:21 +00:00
Craig Topper	b6da65403a	[AVX512] VPACKUSWB/VPACKSSWB should not be encoded with EVEX.W=1. While there fix the execution domain for VPACKSSDW/VPACKUSDW. llvm-svn: 268200	2016-05-01 17:38:32 +00:00
Igor Breger	131008fbcb	Change AVX512 braodcastsd/ss patterns interaction with spilling . New implementation take a scalar register and generate a vector without COPY_TO_REGCLASS (turn it into a VR128 register ) .The issue is that during register allocation we may spill a scalar value using 128-bit loads and stores, wasting cache bandwidth. Differential Revision: http://reviews.llvm.org/D19579 llvm-svn: 268190	2016-05-01 08:40:00 +00:00
Craig Topper	e430de8be6	[AVX512] Prefer AVX512 VPACK instructions over AVX/AVX2 instructions when VLX and BWI are supported. llvm-svn: 268189	2016-05-01 06:52:19 +00:00
Craig Topper	5acb5a1caf	[AVX512] Add HasVLX to the 128/256-bit versions of VPACKSSDW/USDW/SSWB/USWB and VPMADDUBSW/VPMADDWD. llvm-svn: 268188	2016-05-01 06:24:57 +00:00
Craig Topper	db290664f6	[AVX512] Make sure 128/256-bit DQI versions of VAND/VANDN/VOR/VXOR are also marked as requiring VLX. llvm-svn: 268186	2016-05-01 05:57:06 +00:00
Craig Topper	f77ca947ce	[X86] Add an AddedComplexity to another pattern to put it near similar in the output file. llvm-svn: 268184	2016-05-01 05:22:15 +00:00
Craig Topper	742977ede8	[X86] Remove a seemlingly unused pattern. The same pattern appears elsewhere with an AddedComplexity that made this unreachable. llvm-svn: 268183	2016-05-01 05:22:13 +00:00
Craig Topper	eb9a87918b	[X86] Add AddedComplexity to keep some similar patterns near each other in the output file. llvm-svn: 268181	2016-05-01 04:59:49 +00:00
Craig Topper	7ed84d826e	[X86] Remove some redundant selection patterns. llvm-svn: 268180	2016-05-01 04:59:46 +00:00
Craig Topper	c9b1923358	[AVX512] Replace vector_extract with extractelt in some patterns. They mean the same thing but vector_extract is deprecated. NFC llvm-svn: 268179	2016-05-01 04:59:44 +00:00
Craig Topper	99f6b620cc	[AVX512] Add hasSideEffects/mayLoad/mayStore flags to some instructions. llvm-svn: 268174	2016-05-01 01:03:56 +00:00
Craig Topper	e012ede137	[X86] Reduce memory usage of MemOp2RegOp and RegOp2MemOp folding maps. llvm-svn: 268164	2016-04-30 17:59:49 +00:00
Sriraman Tallam	7da9b445ea	Differential Revision: http://reviews.llvm.org/D19733 llvm-svn: 268106	2016-04-29 21:19:16 +00:00
Filipe Cabecinhas	0da9937517	Unify XDEBUG and EXPENSIVE_CHECKS (into the latter), and add an option to the cmake build to enable them. Summary: Historically, we had a switch in the Makefiles for turning on "expensive checks". This has never been ported to the cmake build, but the (dead-ish) code is still around. This will also make it easier to turn it on in buildbots. Reviewers: chandlerc Subscribers: jyknight, mzolotukhin, RKSimon, gberry, llvm-commits Differential Revision: http://reviews.llvm.org/D19723 llvm-svn: 268050	2016-04-29 15:22:48 +00:00
Craig Topper	b805723294	[X86] Remove unnecessary header file containing a small class. It was only included in one place. Just define the class directly in the cpp file. NFC llvm-svn: 267985	2016-04-29 04:22:28 +00:00
Craig Topper	e7c1cd18d3	[X86] Include X86MCTargetDesc.h directly in X86Disassembler.cpp instead of duplicating parts of it. NFC llvm-svn: 267984	2016-04-29 04:22:26 +00:00
Craig Topper	184310d6a9	[X86] Use nested switches to vary the operand to helper functions that were previously called in multiple cases. This seems to help the inliner reduce code. NFC llvm-svn: 267964	2016-04-29 00:51:30 +00:00
Craig Topper	477649a4c0	[X86] Remove unused operand from a function and all its callers. NFC llvm-svn: 267854	2016-04-28 05:58:46 +00:00
Craig Topper	33772c5375	[CodeGen] Default CTTZ_ZERO_UNDEF/CTLZ_ZERO_UNDEF to Expand in TargetLoweringBase. This is what the majority of the targets want and removes a bunch of code. Set it to Legal explicitly in the few cases where that's the desired behavior. llvm-svn: 267853	2016-04-28 03:34:31 +00:00
Mitch Bodart	e60465ddf7	[X86] Enable the post-RA-scheduler for clang's default 32-bit cpu. For compilations with no explicit cpu specified, this exhibits nice gains on Silvermont, with neutral performance on big cores. Differential Revision: http://reviews.llvm.org/D19138 llvm-svn: 267809	2016-04-27 22:52:35 +00:00
Quentin Colombet	bf200688de	[X86][FastISel] Make sure we use the right register class when we select stores. llvm-svn: 267806	2016-04-27 22:33:42 +00:00
Quentin Colombet	d6dbec4c6f	[X86] Fix the lowering of TLS calls. The callseq_end node must be glued with the TLS calls, otherwise, the generic code will miss the uses of the returned value and will mark it dead. Moreover, TLSCall 64-bit pseudo must not set an implicit-use on RDI, the pseudo uses the symbol address at this point not RDI and the lowering will do the right thing. llvm-svn: 267797	2016-04-27 21:37:37 +00:00
Kevin B. Smith	c378a99ba5	[X86]: Quit promoting 16 bit loads to 32 bit. Differential Revision: http://reviews.llvm.org/D19592 llvm-svn: 267773	2016-04-27 19:58:03 +00:00
Nico Weber	e69b9548b8	Revert r267649, it caused PR27539. llvm-svn: 267723	2016-04-27 15:16:54 +00:00
Ahmed Bougacha	9a0c9adade	[X86] Set AddPristinesAndCSRs to FixupBW LivePhysRegs. NFC. We run after PEI, so we need to AddPristinesAndCSRs. In practice, that makes no difference here, because we only ask about liveness of super-registers of defined GR8/GR16 registers, so they can't be pristine. Still, it's the correct thing to do. Thanks to Quentin for noticing! Follow-up to r267495. llvm-svn: 267658	2016-04-27 01:51:38 +00:00
Ahmed Bougacha	19a2ee591a	[X86] Don't assume that MMX extractelts are from index 0. It's probably the case for all 3 MMX users out there, but with hand-crafted IR, you can trigger selection failures. Fix that. llvm-svn: 267652	2016-04-27 01:35:29 +00:00
Ahmed Bougacha	e68363a03c	[X86] Re-enable MMX i32 extractelt combine. This effectively adds back the extractelt combine removed by r262358: the direct case can still occur (because x86_mmx is special, see r262446), but it's the indirect case that's now superseded by the generic combine. llvm-svn: 267651	2016-04-27 01:35:25 +00:00
Cong Hou	6f879d9eb1	Detects the SAD pattern on X86 so that much better code will be emitted once the pattern is matched. Differential revision: http://reviews.llvm.org/D14840 llvm-svn: 267649	2016-04-27 01:29:18 +00:00
Quentin Colombet	4ff3cfb673	[X86] Make sure it is safe to clobber EFLAGS, if need be, when choosing the prologue. Do not use basic blocks that have EFLAGS live-in as prologue if we need to realign the stack. Realigning the stack uses AND instruction and this clobbers EFLAGS. An other alternative would have been to save and restore EFLAGS around the stack realignment code, but this is likely inefficient. Fixes PR27531. llvm-svn: 267634	2016-04-26 23:44:14 +00:00
Quentin Colombet	2b3a4e787e	[X86] Teach the expansion of copy instructions how to do proper liveness. When the simple analysis provided by MachineBasicBlock::computeRegisterLiveness fails, fall back on the LivePhysReg utility. llvm-svn: 267623	2016-04-26 23:14:32 +00:00
Andrew Kaylor	2bee5ef462	Optimization bisect support in X86-specific passes Differential Revision: http://reviews.llvm.org/D19439 llvm-svn: 267608	2016-04-26 21:44:24 +00:00
Ahmed Bougacha	128f8732a5	[CodeGen] Add getBuildVector and getSplatBuildVector helpers. NFCI. Differential Revision: http://reviews.llvm.org/D17176 llvm-svn: 267606	2016-04-26 21:15:30 +00:00
Manman Ren	1c3f65a18c	Swift Calling Convention: use %RAX for sret. We don't need to copy the sret argument into %rax upon return. rdar://25671494 llvm-svn: 267579	2016-04-26 18:08:06 +00:00
Andrey Turetskiy	b405606432	[X86] PR27502: Fix the LEA optimization pass. Handle MachineBasicBlock as a memory displacement operand in the LEA optimization pass. Differential Revision: http://reviews.llvm.org/D19409 llvm-svn: 267551	2016-04-26 12:18:12 +00:00
Ahmed Bougacha	5cf735a5b1	[X86] Use LivePhysRegs in X86FixupBWInsts. Kill-flags, which computeRegisterLiveness uses, are not reliable. LivePhysRegs is. Differential Revision: http://reviews.llvm.org/D19472 llvm-svn: 267495	2016-04-26 00:00:48 +00:00
Craig Topper	03734c7ce1	[X86] Replace a SmallVector used to pass 2 values to an ArrayRef parameter with a fixed size array. NFC llvm-svn: 267377	2016-04-25 04:30:29 +00:00
Simon Pilgrim	dd748b83aa	[X86][SSE] getTargetShuffleMaskIndices - dropped (unused) UNDEF handling We aren't currently making use of this in any successful mask decode and its actually incorrect as it inserts the wrong number of SM_SentinelUndef mask elements. llvm-svn: 267350	2016-04-24 16:49:53 +00:00
Simon Pilgrim	7c25ef92a3	[X86][SSE] Use range loop. NFCI. llvm-svn: 267349	2016-04-24 16:33:35 +00:00
Simon Pilgrim	f379a6c684	[X86][XOP] Fixed VPPERM permute op decoding (PR27472). Fixed issue with VPPERM target shuffle mask decoding that was incorrectly masking off the 3-bit permute op with a 2-bit mask. llvm-svn: 267346	2016-04-24 15:05:04 +00:00
Simon Pilgrim	9f5697ef68	[X86][SSE] Improved support for decoding target shuffle masks through bitcasts Reused the ability to split constants of a type wider than the shuffle mask to work with masks generated from scalar constants transfered to xmm. This fixes an issue preventing PSHUFB target shuffle masks decoding rematerialized scalar constants and also exposes the XOP VPPERM bug described in PR27472. llvm-svn: 267343	2016-04-24 14:53:54 +00:00
Craig Topper	dbc981f71f	[X86] Merge LowerCTLZ and LowerCTLZ_ZERO_UNDEF into a single function that branches internally for the one difference, allowing the rest of the code to be common. NFC llvm-svn: 267331	2016-04-24 06:27:39 +00:00
Craig Topper	6469a39f51	[X86] Node need to check if AVX512 is supported when lowering vector CTLZ. The CTLZ operation is only Custom for vectors if AVX512 is enabled so if a vector gets here AVX512 is implied. NFC llvm-svn: 267330	2016-04-24 06:27:35 +00:00
Craig Topper	e78eac1d31	[X86] Remove isel patterns for selecting tzcnt/lzcnt from cmove/ne+cttz/ctlz. These are folded by DAG combine now. llvm-svn: 267326	2016-04-24 04:38:34 +00:00
Craig Topper	601b6c69bc	[X86] Fix patterns that turn cmove/cmovne+ctlz/cttz into lzcnt/tzcnt instructions. Only one of the conditions should be valid for each pattern, not both. Update tests accordingly. llvm-svn: 267311	2016-04-24 02:01:22 +00:00
Davide Italiano	f59b0da654	[MC/ELF] Implement support for GOTPCRELX/REX_GOTPCRELX. The option to control the emission of the new relocations is -relax-relocations (blatantly copied from GNU as). It can't be enabled by default because it breaks relatively recent versions of ld.bfd/ld.gold (late 2015). llvm-svn: 267307	2016-04-24 01:03:57 +00:00
Davide Italiano	4652c59568	[MC/ELF] Pass Fixup to getRelocType64. In preparation for other changes. llvm-svn: 267300	2016-04-23 22:26:31 +00:00
Sriraman Tallam	3cb773431d	Differential Revision: http://reviews.llvm.org/D19040 llvm-svn: 267229	2016-04-22 21:41:58 +00:00
Peter Collingbourne	265ebd7d70	CodeGen: Use PLT relocations for relative references to unnamed_addr functions. The relative vtable ABI (PR26723) needs PLT relocations to refer to virtual functions defined in other DSOs. The unnamed_addr attribute means that the function's address is not significant, so we're allowed to substitute it with the address of a PLT entry. Also includes a bonus feature: addends for COFF image-relative references. Differential Revision: http://reviews.llvm.org/D17938 llvm-svn: 267211	2016-04-22 20:40:10 +00:00
Nirav Dave	9a878c4930	Emit code16 in assembly in 16-bit mode Summary: When generating assembly using -m16 we must explicitly mark it as 16-bit. Emit .code16 at beginning of file. Fixes wrong results when using -fno-integrated-as. Reviewers: dwmw2 Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D19392 llvm-svn: 267152	2016-04-22 13:36:11 +00:00
Ashutosh Nema	468558a061	[X86]: Changing cost for “TRUNCATE v16i32 to v16i8” in SSE4.1 mode. Summary: rL256194 transforms truncations between vectors of integers into PACKUS/PACKSS operations during DAG combine. This generates better code for truncate, so cost of truncate needs to be changed but looks like it got changed only in SSE2 table Whereas this change is also applicable for SSE4.1, so the cost of truncate needs to be changed for that as well. Cost of “TRUNCATE v16i32 to v16i8” & “TRUNCATE v16i16 to v16i8” should be same in SSE4.1 & SSE2 table. Removing their cost from SSE4.1, so it will fall back to SSE2. Reviewers: Simon Pilgrim llvm-svn: 267123	2016-04-22 08:34:05 +00:00
Craig Topper	59479e7208	[AVX512] Teach lowering to use vplzcntd/q to implement 128/256-bit CTTZ_ZERO_UNDEF even without VLX support. We can just extend to 512-bits and extract like we do for CTLZ. llvm-svn: 267100	2016-04-22 03:22:38 +00:00
Craig Topper	21690db05a	[AVX512] Add CTTZ support for v8i64 and v16i32 vectors. llvm-svn: 266968	2016-04-21 07:30:06 +00:00
Craig Topper	340ad0a0c9	[AVX512] Add support for lowering CTTZ v64i8 and v32i16 with BWI instructions. llvm-svn: 266963	2016-04-21 06:39:34 +00:00
Craig Topper	7dedfdc60a	[X86] Remove redundant calls to setOperationAction for EXTRACT_VECTOR_ELT/INSERT_VECTOR_ELT from SSE41 block. They were already done in an earlier block. NFC llvm-svn: 266962	2016-04-21 06:39:32 +00:00
Craig Topper	032e985cbc	[X86] Remove some operations from the default Expand all vector ops loop. Instead let them stay Legal and mark them Expand for specific types where needed. Reduces overall number of calls to setOperationAction. NFC llvm-svn: 266961	2016-04-21 06:39:29 +00:00
Craig Topper	98c855d480	[X86] Remove old leftover MMX code that sets various 64-bit vector operations to Expand. These vector types aren't legal so these operations would never make it far enough to need to expand. NFC llvm-svn: 266960	2016-04-21 06:39:26 +00:00
Craig Topper	3e6be4c27a	[X86] Remove unnecessary setting of CTTZ_ZERO_UNDEF to Custom for vector types where we can't do any better than the Custom lowering of CTTZ. LegalizeVectorOps will expand to CTTZ since its marked Custom. CTTZ_ZERO_UNDEF can be custom lowered specially if CTLZ is supported. Otherwise CTTZ and CTTZ_ZERO_UNDEF are handled the same way by using CTPOP and bitmath. llvm-svn: 266952	2016-04-21 04:44:00 +00:00
Craig Topper	3dd625ce79	[AVX512] Add support for popcount of v8i64 and v16i32 with and without BWI instructions. Without BWI we have to split the vectors into 256-bit vectors so we can use AVX2 pshufb and then concatenate the results. llvm-svn: 266950	2016-04-21 03:57:24 +00:00
Davide Italiano	bf4df85ba7	[MC] Silence warning due to unused variable in !Debug builds. llvm-svn: 266901	2016-04-20 18:45:31 +00:00
Davide Italiano	8a8f24b098	[MC] EmitNop: Make an assertion more useful. Differential Revision: http://reviews.llvm.org/D19334 llvm-svn: 266895	2016-04-20 17:53:21 +00:00
Asaf Badouh	89406d1815	[X86] enable PIE for functions Call locally defined function directly for PIE/fPIE Differential Revision: http://reviews.llvm.org/D19226 llvm-svn: 266863	2016-04-20 08:32:57 +00:00
Craig Topper	99e60e9f1f	[AVX512] Add popcount support for v32i16 and v64i8. llvm-svn: 266858	2016-04-20 05:18:55 +00:00
Craig Topper	3e8f1e483c	[X86] Mark some floating point operations that are always expanded for vector types as Expand in a floating point only loop instead of looping through all vector types. llvm-svn: 266850	2016-04-20 01:57:44 +00:00
Craig Topper	7f28d55a00	[X86] Don't mark vector loads and shifts Expand in advance. Loads are always marked Legal or Promote for all the legal types later. Shifts are always marked custom. NFC llvm-svn: 266849	2016-04-20 01:57:42 +00:00
Craig Topper	ab7497dd6e	[X86] Merge the two different SSE2 blocks in the X86TargetLowering constructor. Also qualfiy the XOP block with !useSoftFloat to match the other vector blocks. llvm-svn: 266848	2016-04-20 01:57:40 +00:00
Craig Topper	397968ea16	[X86] Don't set vector FADD,FSUB,FMUL,FDIV,FNEG,FSQRT to Expand early. For every legal FP type we either set them to Legal or Custom anyway. So let them stay defaulted to Legal and only change when they need to be Custom. llvm-svn: 266847	2016-04-20 01:57:38 +00:00
Tim Shen	a1d8bc5597	[PPC, SSP] Support PowerPC Linux stack protection. llvm-svn: 266809	2016-04-19 20:14:52 +00:00
Tim Shen	e885d5e4d3	[SSP, 2/2] Create llvm.stackguard() intrinsic and lower it to LOAD_STACK_GUARD With this change, ideally IR pass can always generate llvm.stackguard call to get the stack guard; but for now there are still IR form stack guard customizations around (see getIRStackGuard()). Future SSP customization should go through LOAD_STACK_GUARD. There is a behavior change: stack guard values are not CSEed anymore, since we should never reuse the value in case that it has been spilled (and corrupted). See ssp-guard-spill.ll. This also cause the change of stack size and codegen in X86 and AArch64 test cases. Ideally we'd like to know if the guard created in llvm.stackprotector() gets spilled or not. If the value is spilled, discard the value and reload stack guard; otherwise reuse the value. This can be done by teaching register allocator to know how to rematerialize LOAD_STACK_GUARD and force a rematerialization (which seems hard), or check for spilling in expandPostRAPseudo. It only makes sense when the stack guard is a global variable, which requires more instructions to load. Anyway, this seems to go out of the scope of the current patch. llvm-svn: 266806	2016-04-19 19:40:37 +00:00
Sanjoy Das	2effffd456	[X86] Simplify StackMapShadowTracker; NFC - Elide trivial contructor and desctructor - Move implementation out of an unnecessary explicit llvm namespace scope llvm-svn: 266794	2016-04-19 18:48:16 +00:00
Sanjoy Das	6ecfae61dc	[X86MCInstLower] Clean up EmitNops; NFC Instead of having a conditional assert inside EmitNops, refactor so that the caller can have the assert instead. llvm-svn: 266793	2016-04-19 18:48:13 +00:00
David L Kreitzer	d5cb34118d	Preliminary changes for fixing PR27241. Generalized/restructured some things in preparation for enabling the outgoing parameter store-to-push optimization for 64-bit targets. Differential Revision: http://reviews.llvm.org/D19222 llvm-svn: 266774	2016-04-19 17:43:44 +00:00
Simon Pilgrim	32b1c9fe7f	[X86][AVX2] Prefer VPERMQ/VPERMPD over VINSERTI128/VINSERTF128 for unary shuffles Using VPERMQ/VPERMPD allows memory folding of the (repeated) input where VINSERTI128/VINSERTF128 can not. Differential Revision: http://reviews.llvm.org/D19228 llvm-svn: 266728	2016-04-19 12:26:40 +00:00
Sanjoy Das	c0441c29df	Introduce a "patchable-function" function attribute Summary: The `"patchable-function"` attribute can be used by an LLVM client to influence LLVM's code generation in ways that makes the generated code easily patchable at runtime (for instance, to redirect control). Right now only one patchability scheme is supported, `"prologue-short-redirect"`, but this can be expanded in the future. Reviewers: joker.eph, rnk, echristo, dberris Subscribers: joker.eph, echristo, mcrosier, llvm-commits Differential Revision: http://reviews.llvm.org/D19046 llvm-svn: 266715	2016-04-19 05:24:47 +00:00
Mehdi Amini	b550cb1750	[NFC] Header cleanup Removed some unused headers, replaced some headers with forward class declarations. Found using simple scripts like this one: clear && ack --cpp -l '#include "llvm/ADT/IndexedMap.h"' \| xargs grep -L 'IndexedMap[<]' \| xargs grep -n --color=auto 'IndexedMap' Patch by Eugene Kosov <claprix@yandex.ru> Differential Revision: http://reviews.llvm.org/D19219 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266595	2016-04-18 09:17:29 +00:00
Craig Topper	221e1c2b1f	[X86] Be explicit about calls to setOperationAction for AVX2 and AVX512 rather than just looping over all vector types and conditinally matching them. NFC llvm-svn: 266577	2016-04-17 22:49:46 +00:00
Simon Pilgrim	dd153476fd	[X86] Added TODO comment for target shuffle mask decoding of bitcasted masks llvm-svn: 266559	2016-04-17 11:34:18 +00:00
Asaf Badouh	aec79651c1	[X86] Remove unneeded variables no functional change. ExtraLoad and WrapperKind are been used only if (OpFlags == X86II::MO_GOTPCREL). Differential Revision: http://reviews.llvm.org/D18942 llvm-svn: 266557	2016-04-17 08:28:40 +00:00
Craig Topper	75869d5701	[AVX512] ISD::MUL v2i64/v4i64 should only be legal if DQI and VLX features are enabled. llvm-svn: 266554	2016-04-17 07:25:39 +00:00
Craig Topper	1663e7a472	[X86] Use ternary operator to reduce code slightly. NFC llvm-svn: 266534	2016-04-16 19:09:32 +00:00
Simon Pilgrim	fd4b9b02a3	[X86][XOP] Added VPPERM constant mask decoding and target shuffle combining support Added additional test that peeks through bitcast to v16i8 mask llvm-svn: 266533	2016-04-16 17:52:07 +00:00
Craig Topper	ea46b592ab	Add a setOperationPromotedToType convenience method that sets an operation to promoted and set the type in one call. Use it so save code in X86. llvm-svn: 266413	2016-04-15 06:20:18 +00:00
Craig Topper	13e9dc66e4	[X86] AND, OR, and XOR of vectors are always legal no need to set them legal explicitly. llvm-svn: 266412	2016-04-15 06:20:14 +00:00
Craig Topper	5e20fd3e7c	[X86] Combine an if and else block that had the same set of calls to setOperationAction that only varied in Legal/Custom. Use the ternary operator on that argument instead. NFC llvm-svn: 266410	2016-04-15 04:57:09 +00:00
Reid Kleckner	28865809fe	Sink DI metadata usage out of MachineInstr.h and MachineInstrBuilder.h MachineInstr.h and MachineInstrBuilder.h are very popular headers, widely included across all LLVM backends. It turns out that there only a handful of TUs that actually care about DI operands on MachineInstrs. After this change, touching DebugInfoMetadata.h and rebuilding llc only needs 112 actions instead of 542. llvm-svn: 266351	2016-04-14 18:29:59 +00:00
Mehdi Amini	867e91468b	Do not use getGlobalContext()... ever. This code was creating a new type in the global context, regardless of which context the user is sitting in, what can possibly go wrong? From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 266275	2016-04-14 04:36:40 +00:00
Matthias Braun	46b0f03e12	TargetLowering: Factor out common code for tail call eligibility checking; NFC llvm-svn: 266270	2016-04-14 01:10:42 +00:00
Matthias Braun	588d1cdad4	X86: Use a callee save register for the swiftself parameter. It is very likely that the swiftself parameter is alive throughout most functions function so putting it into a callee save register should avoid spills for the callers with only a minimum amount of extra spills in the callees. Currently the generated code is correct but unnecessarily spills and reloads arguments passed in callee save registers, I will address this in upcoming patches. This also adds a missing check that for tail calls the preserved value of the caller must be the same as the callees parameter. Differential Revision: http://reviews.llvm.org/D18902 llvm-svn: 266252	2016-04-13 21:43:21 +00:00
David L Kreitzer	99775c1b6e	Fixed a few typos and formatting problems. NFCI. llvm-svn: 266135	2016-04-12 21:45:09 +00:00
Justin Bogner	32ad24d4ef	X86: Avoid accessing SDValues after they've been RAUW'd This fixes two use-after-frees in selectLEA64_32Addr. If matchAddress matches an ADD with an AND as an operand, and that AND hits one of the "heroic transforms" that folds masks and shifts, we end up with N pointing to an SDNode that was deleted. Make sure we're done accessing it before that. Found by ASan with the recycling allocator changes in llvm.org/PR26808. llvm-svn: 266130	2016-04-12 21:34:24 +00:00
Sanjay Patel	e6a0a23e08	fix indentation; NFC llvm-svn: 266097	2016-04-12 18:01:48 +00:00
Manman Ren	5751814eda	Swift Calling Convention: swifterror target support. Differential Revision: http://reviews.llvm.org/D18716 llvm-svn: 265997	2016-04-11 21:08:06 +00:00
Sriraman Tallam	f39e190ad8	Test commit. llvm-svn: 265976	2016-04-11 18:40:50 +00:00
Andrey Turetskiy	9df334c28e	[X86] Restrict max long nop length for Lakemont. Restrict the max length of long nops for Lakemont to 7. Experiments on MCU benchmarks (Dhrystone, Coremark) show that this is the most optimal length. Differential Revision: http://reviews.llvm.org/D18897 llvm-svn: 265924	2016-04-11 10:07:36 +00:00
Simon Pilgrim	d263fdc512	[X86][AVX512BW] Add support for v64i8 multiplies Extend the existing lowering of vXi8 multiplies to support v64i8 on avx512bw targets. I added the Lower512IntArith helper function to help with this - not sure how often this could be used in the future, but it seemed better than putting all that logic inside LowerMUL. Differential Revision: http://reviews.llvm.org/D18937 llvm-svn: 265902	2016-04-10 17:02:48 +00:00
Craig Topper	35db8ecb50	[X86] Use for loops over types to reduce code for setting up operation actions. llvm-svn: 265893	2016-04-10 05:39:32 +00:00
Craig Topper	dcc8f49bf0	[X86] Remove unnecessary setOperationAction for SRA v2i64/v4i64 when VLX is suppored. This is already done for SSE2/AVX2 which VLX implies. NFC llvm-svn: 265892	2016-04-10 05:39:28 +00:00
Davide Italiano	7aa47094b2	[MC] support TLSDESC and TLSCALL / GNU2 tls dialect Differential Revision: http://reviews.llvm.org/D18885 llvm-svn: 265881	2016-04-09 20:32:33 +00:00
Sanjay Patel	4abae4e0fa	[x86] use BMI 'andn' for logic + compare ops With BMI, we can use 'andn' to save an instruction when the result is only used in a compare. This is related to one of the potential sequences to check 'isfinite' in: https://llvm.org/bugs/show_bug.cgi?id=27164 Differential Revision: http://reviews.llvm.org/D18910 llvm-svn: 265875	2016-04-09 16:02:52 +00:00
Simon Pilgrim	1cc5712763	[X86][XOP] Support for VPPERM 2-input shuffle mask decoding This patch adds support for decoding XOP VPPERM instruction when it represents a basic shuffle. The mask decoding required the existing MCInstrLowering code to be updated to support binary shuffles - the implementation now matches what is done in X86InstrComments.cpp. Differential Revision: http://reviews.llvm.org/D18441 llvm-svn: 265874	2016-04-09 14:51:26 +00:00
Craig Topper	f027107094	[X86] Use for loops over types to reduce code for setting up operation actions. NFC llvm-svn: 265871	2016-04-09 06:31:02 +00:00
Craig Topper	e801ed9e15	[X86] Remove calls to setOperationAction that set CTLZ_ZERO_UNDEF for some vector types to Expand. Expand is already set for all operations for all vector types earlier so this is redundant. NFC llvm-svn: 265870	2016-04-09 05:53:48 +00:00
Tim Shen	0012756489	[SSP] Remove llvm.stackprotectorcheck. This is a cleanup patch for SSP support in LLVM. There is no functional change. llvm.stackprotectorcheck is not needed, because SelectionDAG isn't actually lowering it in SelectBasicBlock; rather, it adds check code in FinishBasicBlock, ignoring the position where the intrinsic is inserted (See FindSplitPointForStackProtector()). llvm-svn: 265851	2016-04-08 21:26:31 +00:00
Hans Wennborg	e25b65bdb7	Rangeify a loop. NFC. llvm-svn: 265846	2016-04-08 20:46:09 +00:00
Hans Wennborg	74ff770670	Remove some redundant variables from X86TargetLowering::LowerDYNAMIC_STACKALLOC These are already defined, with the same values, a few lines up. NFC. llvm-svn: 265845	2016-04-08 20:46:00 +00:00
Kevin B. Smith	e0a6fc3bcc	[X86] Fix PR23155 by turning on X86FixupBWInsts by default. Differential Revision: http://reviews.llvm.org/D18866 llvm-svn: 265830	2016-04-08 18:58:29 +00:00
Simon Pilgrim	476170384f	[X86] Tidied up shuffle decode function doxygen descriptions As discussed on D18441 - auto brief is used so we don't need /brief, we don't need to include the function name and added some missing descriptions. llvm-svn: 265785	2016-04-08 14:17:07 +00:00
Kevin B. Smith	3802c4af59	[X86]: Fix for PR27251. Differential Revision: http://reviews.llvm.org/D18850 llvm-svn: 265690	2016-04-07 16:15:34 +00:00
Simon Pilgrim	d54bae6525	[X86][SSE] Add support for VZEXT constant folding llvm-svn: 265646	2016-04-07 07:52:45 +00:00
Ahmed Bougacha	1cf67fb9cb	[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. Re-apply r265450 which caused PR27245 and was reverted in r265559 because of a wrong generalization: the fetch_and_add->add_and_fetch combine only works in specific, but pretty common, cases: (icmp slt x, 0) -> (icmp sle (add x, 1), 0) (icmp sge x, 0) -> (icmp sgt (add x, 1), 0) (icmp sle x, 0) -> (icmp slt (sub x, 1), 0) (icmp sgt x, 0) -> (icmp sge (sub x, 1), 0) Original Message: We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fallback to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244. Differential Revision: http://reviews.llvm.org/D17633 llvm-svn: 265636	2016-04-07 02:07:10 +00:00
Hans Wennborg	ab16be799c	Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" Third time's the charm? The previous attempt (r265345) caused ASan test failures on X86, as broken CFI caused stack traces to not work. This version of the patch makes sure not to merge with stack adjustments that have CFI, and to not add merged instructions' offests to the CFI about to be generated. This is already covered by the lit tests; I just got the expectations wrong previously. llvm-svn: 265623	2016-04-07 00:05:49 +00:00
JF Bastien	800f87a871	NFC: make AtomicOrdering an enum class Summary: In the context of http://wg21.link/lwg2445 C++ uses the concept of 'stronger' ordering but doesn't define it properly. This should be fixed in C++17 barring a small question that's still open. The code currently plays fast and loose with the AtomicOrdering enum. Using an enum class is one step towards tightening things. I later also want to tighten related enums, such as clang's AtomicOrderingKind (which should be shared with LLVM as a 'C++ ABI' enum). This change touches a few lines of code which can be improved later, I'd like to keep it as NFC for now as it's already quite complex. I have related changes for clang. As a follow-up I'll add: bool operator<(AtomicOrdering, AtomicOrdering) = delete; bool operator>(AtomicOrdering, AtomicOrdering) = delete; bool operator<=(AtomicOrdering, AtomicOrdering) = delete; bool operator>=(AtomicOrdering, AtomicOrdering) = delete; This is separate so that clang and LLVM changes don't need to be in sync. Reviewers: jyknight, reames Subscribers: jyknight, llvm-commits Differential Revision: http://reviews.llvm.org/D18775 llvm-svn: 265602	2016-04-06 21:19:33 +00:00
Hans Wennborg	6849f8f15f	Revert r265450 "[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC." It caused ASan 32-bit tests to hang (PR27245). llvm-svn: 265559	2016-04-06 16:44:38 +00:00
Hans Wennborg	a7e396b5ef	Revert "Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)"" It seems to be causing ASan tests to crash, probably due to miscompiling the run-time somehow. llvm-svn: 265551	2016-04-06 16:10:20 +00:00
Evgeniy Stepanov	dde29e2799	Faster stack-protector for Android/AArch64. Bionic has a defined thread-local location for the stack protector cookie. Emit a direct load instead of going through __stack_chk_guard. llvm-svn: 265481	2016-04-05 22:41:50 +00:00
Manman Ren	f8bdd88cd9	Swift Calling Convention: add swiftcc. Differential Revision: http://reviews.llvm.org/D17863 llvm-svn: 265480	2016-04-05 22:41:47 +00:00
Duncan P. N. Exon Smith	91d3cfed78	Revert "Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes." This reverts commit r265454 since it broke the build. E.g.: http://lab.llvm.org:8080/green/job/clang-stage1-cmake-RA-incremental_build/22413/ llvm-svn: 265459	2016-04-05 20:45:04 +00:00
Eugene Zelenko	1760dc2a23	Fix Clang-tidy modernize-deprecated-headers warnings in remaining files; other minor fixes. Some Include What You Use suggestions were used too. Use anonymous namespaces in source files. Differential revision: http://reviews.llvm.org/D18778 llvm-svn: 265454	2016-04-05 20:19:49 +00:00
Ahmed Bougacha	50e6cd4a3a	[X86] Reuse EFLAGS and form LOCKed ops when only user is SETCC. We only generate LOCKed versions of add/sub when the result is unused. It often happens that the result is used, but only by a comparison. We can optimize those out by reusing EFLAGS, which lets us use the proper instructions, instead of having to fallback to LXADD. Instead of doing this as an MI peephole (as we do for the other non-LOCKed (really, non-MR) forms), do it in ISel. It becomes quite tricky later. This also makes it eventually possible to stop expanding and/or/xor if the only user is an icmp (also see D18141). This uses the LOCK ISD opcodes added by r262244. Differential Revision: http://reviews.llvm.org/D17633 llvm-svn: 265450	2016-04-05 20:02:57 +00:00
Ahmed Bougacha	629446ba03	[X86] Simplify early-exit check. NFC. llvm-svn: 265447	2016-04-05 20:02:22 +00:00
Sanjay Patel	4c7d094451	fix typo; NFC llvm-svn: 265442	2016-04-05 19:27:39 +00:00
Hans Wennborg	a47a692341	Re-commit r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" The original commit miscompiled things on 32-bit Windows, e.g. a Clang boostrap. It turns out that mergeSPUpdates() was a bit too generous in what it interpreted as a stack adjustment, causing the following code: addl $12, %esp leal -4(%ebp), %esp To be "optimized" into simply: addl $8, %esp This commit tightens up mergeSPUpdates() and includes a new test (test14 in movtopush.ll) for this situation. llvm-svn: 265345	2016-04-04 21:02:46 +00:00
Matthias Braun	870c34f0cf	ARM, AArch64, X86: Check preserved registers for tail calls. We can only perform a tail call to a callee that preserves all the registers that the caller needs to preserve. This situation happens with calling conventions like preserver_mostcc or cxx_fast_tls. It was explicitely handled for fast_tls and failing for preserve_most. This patch generalizes the check to any calling convention. Related to rdar://24207743 Differential Revision: http://reviews.llvm.org/D18680 llvm-svn: 265329	2016-04-04 18:56:13 +00:00
Derek Schuff	1dbf7a571f	Add MachineFunctionProperty checks for AllVRegsAllocated for target passes Summary: This adds the same checks that were added in r264593 to all target-specific passes that run after register allocation. Reviewers: qcolombet Subscribers: jyknight, dsanders, llvm-commits Differential Revision: http://reviews.llvm.org/D18525 llvm-svn: 265313	2016-04-04 17:09:25 +00:00
Elena Demikhovsky	e99c561391	AVX-512: Truncating store for i1 vectors Implemented truncstore for KNL and skylake-avx512. Covered vectors from v2i1 to v64i1. We save the value in bits (not in bytes) - v32i1 is saved in 4 bytes. Differential Revision: http://reviews.llvm.org/D18740 llvm-svn: 265283	2016-04-04 07:17:47 +00:00
Simon Pilgrim	0edd3d771a	[X86] Removed duplicate code. llvm-svn: 265274	2016-04-03 20:40:35 +00:00
Simon Pilgrim	cd0dfc93eb	[X86][SSE] Support for MOVMSK signbit extraction instructions Add support for lowering with the MOVMSK instruction to extract vector element signbits to a GPR. This is an early step towards more optimal handling of vector comparison results. Differential Revision: http://reviews.llvm.org/D18741 llvm-svn: 265266	2016-04-03 18:22:03 +00:00
Simon Pilgrim	20d1d4f045	[X86] Tidied up X86ISD instruction nodes. NFCI. Tidied up comments, stripped trailing whitespace, split apart nodes that aren't related. No change in ordering although there is definitely some scope for it. llvm-svn: 265263	2016-04-03 14:14:32 +00:00
Elena Demikhovsky	5e426f7356	AVX-512: Load and Extended Load for i1 vectors Implemented load+{sign\|zero}_extend for i1 vectors Fixed failures in i1 vector load. Covered loading of v2i1, v4i1, v8i1, v16i1, v32i1, v64i1 vectors for KNL and SKX. Differential Revision: http://reviews.llvm.org/D18737 llvm-svn: 265259	2016-04-03 08:41:12 +00:00
Sanjay Patel	9f413364d5	[x86] avoid intermediate splat for non-zero memsets (PR27100) Follow-up to http://reviews.llvm.org/D18566 and http://reviews.llvm.org/D18676 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The 16-byte test that was added in D18566 is now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. Note that the SSE1 path is not changed in this patch. That can be a follow-up. This patch should resolve PR27100. llvm-svn: 265161	2016-04-01 17:36:45 +00:00
Sanjay Patel	a05e0ff223	[x86] avoid intermediate splat for non-zero memsets (PR27100) Follow-up to D18566 - where we noticed that an intermediate splat was being generated for memsets of non-zero chars. That was because we told getMemsetStores() to use a 32-bit vector element type, and it happily obliged by producing that constant using an integer multiply. The tests that were added in the last patch are now equivalent for AVX1 and AVX2 (no splats, just a vector load), but we have PR27141 to track that splat difference. In the new tests, the splat via shuffling looks ok to me, but there might be some room for improvement depending on uarch there. Note that the SSE1/2 paths are not changed in this patch. That can be a follow-up. This patch should resolve PR27100. Differential Revision: http://reviews.llvm.org/D18676 llvm-svn: 265148	2016-04-01 16:27:14 +00:00
Andrea Di Biagio	8c48841907	[x86] Remove redundant call to setTargetDAGCombine for BUILD_VECTOR node type. Since revision 235394, we no longer perform target specific combines on build_vector nodes. No functional change intended. llvm-svn: 265138	2016-04-01 12:25:44 +00:00
Andrey Turetskiy	958eb46443	[X86] Introduce Lakemont CPU. Add a new Intel MCU CPU Lakemont, which doesn't support X87. Differential Revision: http://reviews.llvm.org/D18650 llvm-svn: 265128	2016-04-01 10:16:15 +00:00
Michael Kuperstein	7bab713188	Use range-based for loops. NFC. llvm-svn: 265105	2016-04-01 03:45:08 +00:00
Hans Wennborg	649159df3c	Follow-up to r265036: I got these iterators mixed up llvm-svn: 265076	2016-03-31 23:55:16 +00:00
Hans Wennborg	132cd62121	Revert r265039 "[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140)" I think it might have caused these build breakages: http://lab.llvm.org:8011/builders/clang-x86-win2008-selfhost/builds/7234/steps/build%20stage%202/logs/stdio http://lab.llvm.org:8011/builders/sanitizer-windows/builds/19566/steps/run%20tests/logs/stdio llvm-svn: 265046	2016-03-31 20:27:30 +00:00
Hans Wennborg	e97fb414e8	[X86] Merge adjacent stack adjustments in eliminateCallFramePseudoInstr (PR27140) For code such as: void f(int, int); void g() { f(1, 2); } compiled for 32-bit X86 Linux, Clang would previously generate: subl $12, %esp subl $8, %esp pushl $2 pushl $1 calll f addl $16, %esp addl $12, %esp retl This patch fixes that by merging adjacent stack adjustments in eliminateCallFramePseudoInstr(). Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265039	2016-03-31 19:26:24 +00:00
Hans Wennborg	e1a2e90ffa	Change eliminateCallFramePseudoInstr() to return an iterator This will become necessary in a subsequent change to make this method merge adjacent stack adjustments, i.e. it might erase the previous and/or next instruction. It also greatly simplifies the calls to this function from Prolog- EpilogInserter. Previously, that had a bunch of logic to resume iteration after the call; now it just continues with the returned iterator. Note that this changes the behaviour of PEI a little. Previously, it attempted to re-visit the new instruction created by eliminateCallFramePseudoInstr(). That code was added in r36625, but I can't see any reason for it: the new instructions will obviously not be pseudo instructions, they will not have FrameIndex operands, and we have already accounted for the stack adjustment. Differential Revision: http://reviews.llvm.org/D18627 llvm-svn: 265036	2016-03-31 18:33:38 +00:00
Sanjay Patel	92d5ea5e07	[x86] use SSE/AVX ops for non-zero memsets (PR27100) Move the memset check down to the CPU-with-slow-SSE-unaligned-memops case: this allows fast targets to take advantage of SSE/AVX instructions and prevents slow targets from stepping into a codegen sinkhole while trying to splat a byte into an XMM reg. Follow-on bugs exposed by the current codegen are: https://llvm.org/bugs/show_bug.cgi?id=27141 https://llvm.org/bugs/show_bug.cgi?id=27143 Differential Revision: http://reviews.llvm.org/D18566 llvm-svn: 265029	2016-03-31 17:30:06 +00:00
Nirav Dave	83ce54aac2	Prevent X86ISelLowering from merging volatile loads Change isConsecutiveLoads to check that loads are non-volatile as this is a requirement for any load merges. Propagate change to two callers. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18546 llvm-svn: 265013	2016-03-31 13:40:55 +00:00
Craig Topper	d2aa03a60a	[X86] Use MVT instead of EVT in code called after legalization. llvm-svn: 264992	2016-03-31 04:37:41 +00:00
Hans Wennborg	6596977130	[X86] Enable call frame optimization ("mov to push") not only for optsize (PR26325) The size savings are significant, and from what I can tell, both ICC and GCC do this. Differential Revision: http://reviews.llvm.org/D18573 llvm-svn: 264966	2016-03-30 23:38:01 +00:00
Matthias Braun	8d41436004	CodeGen: Factor out code for tail call result compatibility check; NFC llvm-svn: 264959	2016-03-30 22:46:04 +00:00
Aaron Ballman	ef0fe1eed8	Silencing warnings from MSVC 2015 Update 2. All of these changes silence "C4334 '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)". NFC. llvm-svn: 264929	2016-03-30 21:30:00 +00:00
Simon Pilgrim	c49bd2ede0	[X86][AVX] Ensure EltsFromConsecutiveLoads tests the entire vector for consecutive loads/zeros Fix for issue introduced D17297, where we were breaking early from the loop detecting consecutive loads which could leave us thinking a consecutive load with zeros was possible. llvm-svn: 264922	2016-03-30 20:52:24 +00:00
Nirav Dave	8dd66e5753	Remove HasFnAttribute guards to getFnAttribute calls These checks are redundant and can be removed Reviewers: hans Subscribers: llvm-commits, mzolotukhin Differential Revision: http://reviews.llvm.org/D18564 llvm-svn: 264872	2016-03-30 15:41:12 +00:00
Simon Pilgrim	b87ffe8519	[X86][XOP] BITREVERSE lowering using VPPERM XOP's VPPERM has some great 'permute operations' that it can do as well as part of shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction. llvm-svn: 264870	2016-03-30 14:14:00 +00:00
Chandler Carruth	8e06a10d1f	[x86] Fix a horrible bug in our lowering of x86 floating point atomic operations. Specifically, we had code that tried to badly approximate reconstructing all of the possible variations on addressing modes in two x86 instructions based on those in one pseudo instruction. This is not the first bug uncovered with doing this, so stop doing it altogether. Instead generically and pedantically copy every operand from the address over to both new instructions, and strip kill flags from any register operands. This fixes a subtle bug seen in the wild where we would mysteriously drop parts of the addressing mode, causing for example the index argument in the added test case to just be completely ignored. Hypothetically, this was an extremely bad miscompile because it actually caused a predictable and leveragable write of a 64bit quantity to an unintended offset (the first element of the array intead of whatever other element was intended). As a consequence, in theory this could even have introduced security vulnerabilities. However, this was only something that could happen with an atomic floating point add. No other operation could trigger this bug, so it seems extremely unlikely to have occured widely in the wild. But it did in fact occur, and frequently in scientific applications which were using relaxed atomic updates of a floating point value after adding a delta. Those would end up being quite badly miscompiled by LLVM, which is how we found this. Of course, this often looks like a race condition in the code, but it was actually a miscompile. I suspect that this whole RELEASE_FADD thing was a complete mistake. There is no such operation, and I worry that anything other than add will get remarkably worse codegeneration. But that's not for this change.... llvm-svn: 264845	2016-03-30 08:41:59 +00:00
Chandler Carruth	81c3ddeb1c	[x86] Extract a helper function to compute the full addressing mode from an x86 MachineInstr's operands. This will be super useful to fix some bad atomics code in my next commit. No functionality changed. llvm-svn: 264819	2016-03-30 03:10:24 +00:00
Manman Ren	f46262e0b7	Swift Calling Convention: add swiftself attribute. Differential Revision: http://reviews.llvm.org/D17866 llvm-svn: 264754	2016-03-29 17:37:21 +00:00
Elena Demikhovsky	95629caaa9	AVX-512: fixed a bug in fp_to_uint pattern on KNL Fixed fp_to_uint instruction selection on KNL. One pattern was missing for <4 x double> to <4 x i32> Differential Revision: http://reviews.llvm.org/D18512 llvm-svn: 264701	2016-03-29 06:33:41 +00:00
Simon Pilgrim	d3df400fa9	[X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all their scalar elements. If all a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTOR and just apply the bit operation to the vectors. The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent. Its not in our interest to start make a general purpose vectorizer from this, but I'm seeing enough of these scalar bit operations from the later legalization/scalarization stages to support them at least. Differential Revision: http://reviews.llvm.org/D18492 llvm-svn: 264666	2016-03-28 21:33:52 +00:00
Elena Demikhovsky	83f0647d85	AVX-512: Fixed ICMP instruction selection for i1 operands ICMP instruction selection fails on SKX and KNL for i1 operand. I use XOR to resolve: (A == B) is equivalent to (A xor B) == 0 Differential Revision: http://reviews.llvm.org/D18511 llvm-svn: 264566	2016-03-28 07:47:58 +00:00
Simon Pilgrim	dcdf85033c	[X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization llvm-svn: 264517	2016-03-26 18:32:13 +00:00
Simon Pilgrim	e4dbeb40c6	[X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264512	2016-03-26 15:44:55 +00:00
Simon Pilgrim	3eef33a806	[X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors Currently this is to mainly to prevent scalarization of integer division by constants. Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264511	2016-03-26 15:27:20 +00:00
Simon Pilgrim	7379a70677	[X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies. Only pre-AVX512BW targets need to split v32i8 vectors. llvm-svn: 264509	2016-03-26 09:44:27 +00:00
Simon Pilgrim	ff7b7141cd	[X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC. LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead. llvm-svn: 264506	2016-03-26 09:29:04 +00:00
David Majnemer	020e890a19	[X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr We forgot to add the second machine operand to our ADJCALLSTACKDOWN, resulting in crashes in PEI. This fixes PR27071. llvm-svn: 264465	2016-03-25 21:49:11 +00:00
Hans Wennborg	5f916d3df4	[X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize 64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes, respectively, whereas and/or with 8-bit immediate is only three bytes. Since these instructions imply an additional memory read (which the CPU could elide, but we don't think it does), restrict these patterns to minsize functions. Differential Revision: http://reviews.llvm.org/D18374 llvm-svn: 264440	2016-03-25 18:11:31 +00:00
Simon Pilgrim	ac04923b0f	[X86][SSE] Don't duplicate Lower256IntArith functionality in LowerShift. NFC. LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith. llvm-svn: 264403	2016-03-25 14:17:54 +00:00
Elena Demikhovsky	abc9c04ab7	fixed typo llvm-svn: 264395	2016-03-25 10:08:36 +00:00
Hans Wennborg	4ae5119eeb	X86: Use push-pop for materializing 8-bit immediates for minsize (take 2) This is the same as r255936, with added logic for avoiding clobbering of the red zone (PR26023). Differential Revision: http://reviews.llvm.org/D18246 llvm-svn: 264375	2016-03-25 01:10:56 +00:00
Simon Pilgrim	a6ba27fbde	[X86][XOP] Fixed instruction postfixes to more closely match operands Suggested by Sanjay in D18189 as the multiple folding options in XOP instructions can be tricky llvm-svn: 264305	2016-03-24 16:31:30 +00:00
Elena Demikhovsky	95f3173ce9	AVX-512: Generate KTEST instead of TEST fir i1 vectors KTEST instruction may be used instead of TEST in this case: %int_sel3 = bitcast <8 x i1> %sel3 to i8 %res = icmp eq i8 %int_sel3, zeroinitializer br i1 %res, label %L2, label %L1 Differential Revision: http://reviews.llvm.org/D18444 llvm-svn: 264298	2016-03-24 15:53:45 +00:00
Simon Pilgrim	d7c4fce47d	[X86][XOP] Merged 128/256 bit 4op instruction definitions. NFCI. llvm-svn: 264294	2016-03-24 15:28:02 +00:00
Simon Pilgrim	572ca71573	[X86][XOP] Support for VPPERM byte shuffle instruction This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode. Differential Revision: http://reviews.llvm.org/D18189 llvm-svn: 264260	2016-03-24 11:52:43 +00:00
Paul Robinson	f81836bd18	[PS4] Guarantee an instruction after a 'noreturn' call. We need the "return address" of a noreturn call to be within the bounds of the calling function; TrapUnreachable turns 'unreachable' into a 'ud2' instruction, which has that desired effect. Differential Revision: http://reviews.llvm.org/D18414 llvm-svn: 264224	2016-03-24 00:10:03 +00:00
Cong Hou	94710840fb	Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case: jne.BB1 jp.BB1 jmp.BB2 .BB1: ... .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne.BB1 jnp.BB2 .BB1: ... .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined negation conditions for them. In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 264199	2016-03-23 21:45:37 +00:00
Sanjay Patel	7876f180b5	[x86] make peekThroughBitcasts() a helper function This should be hoisted further up so it can be used in DAGCombiner and other backends, but I'm limiting the scope in the interest of patch minimalism. It's not quite NFC because some of the replaced code was using an 'if' check rather than a 'while' loop, so those cases would only look through a single bitcast. llvm-svn: 264186	2016-03-23 20:16:37 +00:00
Andrey Turetskiy	6a3d561ea0	[X86] Introduction of FeatureX87. Add FeatureX87 in X86 backend to be able to define CPUs which doesn't have x87. Differential Revision: http://reviews.llvm.org/D13979 llvm-svn: 264148	2016-03-23 11:13:54 +00:00
Joerg Sonnenberger	772bb5b65d	Typo llvm-svn: 264110	2016-03-22 22:24:52 +00:00
Simon Pilgrim	25fb4177fb	[X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions. We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG but can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine as this only properly helps for lowering to VSEXT/VZEXT. Removes a lot of unnecessary any_extend + mask pattern - (Fix for PR25718). Reapplied with a fix for PR26953 (missing vector widening legalization). Differential Revision: http://reviews.llvm.org/D17932 llvm-svn: 264062	2016-03-22 16:22:08 +00:00
Simon Pilgrim	fcc4532afa	[X86][SSE] Tidyup setTargetShuffleZeroElements to match computeZeroableShuffleElements Based on feedback for D14261 llvm-svn: 263911	2016-03-20 17:43:07 +00:00
Simon Pilgrim	c44472a5bc	[X86][SSE] Detect zeroable shuffle elements from different value types Improve computeZeroableShuffleElements to be able to peek through bitcasts to extract zero/undef values from BUILD_VECTOR nodes of different element sizes to the shuffle mask. Differential Revision: http://reviews.llvm.org/D14261 llvm-svn: 263906	2016-03-20 15:45:42 +00:00
Igor Breger	3ea8af5108	AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR Differential Revision: http://reviews.llvm.org/D18211 llvm-svn: 263898	2016-03-20 13:09:43 +00:00
Michael Kuperstein	048cc3b7a8	Use a range-based for loop. NFC. llvm-svn: 263889	2016-03-20 00:16:13 +00:00
Manman Ren	4865d89653	[CXX_FAST_TLS] Disable tail call when calling conventions are mismatched. Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller and callee have mismatched calling conventions. llvm-svn: 263856	2016-03-18 23:41:51 +00:00
Simon Pilgrim	0f37fbac51	[X86][SSE] Simplified blend-with-zero combining We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in a endless loop of contrasting combines This patch stops the combine if we already have a blend in place (means we miss some domain corrections) llvm-svn: 263717	2016-03-17 15:59:36 +00:00
Sanjay Patel	be37e62e0c	fix function names; NFC llvm-svn: 263646	2016-03-16 18:00:09 +00:00

... 2 3 4 5 6 ...

13230 Commits