llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	ed44cf6ccd	Refactor duplicated code. llvm-svn: 272936	2016-06-16 18:50:12 +00:00
Sanjay Patel	0e9afea3c8	[x86] autoupgrade and remove AVX2 integer min/max intrinsics This will (hopefully very temporarily) break clang. The clang side of this should be the next commit. llvm-svn: 272932	2016-06-16 18:44:20 +00:00
Wei Ding	ab3d91b8f1	AMDGPU: Add v_mad 16-bit instructions definition. Differential Revision: http://reviews.llvm.org/D21362 llvm-svn: 272919	2016-06-16 16:50:04 +00:00
Rafael Espindola	afade35003	Don't print (PLT) on arm. The R_ARM_PLT32 relocation is deprecated and is not produced by MC. This means that the code being deleted is dead from the .o point of view and was making the .s more confusing. llvm-svn: 272909	2016-06-16 16:09:53 +00:00
Sanjay Patel	51ab757941	[x86] autoupgrade and remove SSE2/SSE41 integer min/max intrinsics Follow-up to: http://reviews.llvm.org/rL272806 http://reviews.llvm.org/rL272807 llvm-svn: 272907	2016-06-16 15:48:30 +00:00
Rafael Espindola	9ba9c5bde5	Refactor duplicated code. NFC. llvm-svn: 272905	2016-06-16 15:44:06 +00:00
Rafael Espindola	c1d739f253	Refactor duplicated code. NFC. llvm-svn: 272904	2016-06-16 15:40:24 +00:00
Rafael Espindola	c24f0eeb8d	Refactor duplicated code. NFC. llvm-svn: 272903	2016-06-16 15:31:06 +00:00
Rafael Espindola	3888bdb022	Refactor duplicated code. NFC. llvm-svn: 272901	2016-06-16 15:22:01 +00:00
Vasileios Kalintiris	22ec97fb24	[mips] Fix small typo. NFC. llvm-svn: 272895	2016-06-16 14:25:13 +00:00
Daniel Sanders	de7816b0cd	[mips][mips16] Fix machine verifier errors about incorrect register classes on load/stores. Summary: [ls][bh] and [ls][bh]u cannot use sp-relative addresses and must therefore lower frameindex nodes such that there is a copy to a CPU16Regs register. This is now done consistently using a separate addressing mode that does not permit frameindex nodes. As part of this I've had to remove an optimization that reduced the number of instructions needed to work around the lack of sp-relative addresses on [ls][bh] and [ls][bh]u. This optimization used one of the eight CPU16Regs registers as a copy of the stack pointer and it's implementation was the root cause of many of the register vs register class mismatches. lw/sw can use sp-relative addresses but we ought to ensure that we use the correct version of lw/sw internally for things like IAS. This is not currently the case and this change does not fix this. However, this change does clean it up sufficiently well to fix the machine verifier failures. Also removed irrelevant functions from stchar.ll. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21062 llvm-svn: 272882	2016-06-16 10:20:59 +00:00
Daniel Sanders	1d14864bb3	[llvm-objdump] Support detection of feature bits from the object and implement this for Mips. Summary: The Mips implementation only covers the feature bits described by the ELF e_flags so far. Mips stores additional feature bits such as MSA in the .MIPS.abiflags section. Also fixed a small bug this revealed where microMIPS wouldn't add the EF_MIPS_MICROMIPS flag when using -filetype=obj. Reviewers: echristo, rafael Subscribers: rafael, mehdi_amini, dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21125 llvm-svn: 272880	2016-06-16 09:17:03 +00:00
Hrvoje Varga	f1e0a03d08	[mips][micromips] Implement DCLO, DCLZ, DROTR, DROTR32 and DROTRV instructions Differential Revision: http://reviews.llvm.org/D16917 llvm-svn: 272876	2016-06-16 07:06:25 +00:00
Craig Topper	97b1fc92e8	[X86] Pre-size some SmallVectors using the constructor in the shuffle lowering code instead of using push_back. Some of these already did this but used resize or assign instead of the constructor. NFC llvm-svn: 272872	2016-06-16 03:58:45 +00:00
Craig Topper	66f1a8b608	[X86] Remove else after return. NFC llvm-svn: 272871	2016-06-16 03:58:42 +00:00
Craig Topper	ceda65bdc4	[X86] Inline a couple lambdas into their callers since they are only used once and it all fits on a single line. NFC llvm-svn: 272869	2016-06-16 03:11:00 +00:00
Tim Northover	daa1c018b0	AArch64: allow MOV (imm) alias to be printed The backend has been around for years, it's pretty ridiculous that we can't even use the preferred form for printing "MOV" aliases. Unfortunately, TableGen can't handle the complex predicates when printing so it's a bunch of nasty C++. Oh well. llvm-svn: 272865	2016-06-16 01:42:25 +00:00
Eric Christopher	87590fae55	Tidy the asm parser: 80-col, whitespace. llvm-svn: 272861	2016-06-16 01:00:53 +00:00
Krzysztof Parzyszek	f2a4f8f10a	[Hexagon] Fix/simplify some conditional statements Fix for PR28138. llvm-svn: 272836	2016-06-15 21:05:04 +00:00
Kevin B. Smith	4f81990049	[X86]: Fix for uninitialized access introduced in r272797. llvm-svn: 272835	2016-06-15 20:52:19 +00:00
Tim Northover	389a1e39ea	AArch64: stop trying to use 32-bit MOVZs when expanding patchpoints. Of course the assembly was right but because the opcode was MOVZWi it was encoded as "movz w16, #65535, lsl #32" which is an unallocated encoding and would go horribly wrong on a CPU. No idea how this bug survived this long. It seems nobody is using that aspect of patchpoints. llvm-svn: 272831	2016-06-15 20:33:36 +00:00
Sanjay Patel	1a4569df54	[x86] add folds for x86 vector compare nodes (PR27924) Ideally, we can get rid of most x86 LLVM intrinsics by transforming them to IR (and some of that happened with http://reviews.llvm.org/rL272807), but it doesn't cost much to have some simple folds in the backend too while we're working on that and as a backstop. This fixes: https://llvm.org/bugs/show_bug.cgi?id=27924 Differential Revision: http://reviews.llvm.org/D21356 llvm-svn: 272828	2016-06-15 20:26:58 +00:00
Kevin B. Smith	acbda9ef30	[X86]: Updated r272801 to promote 16 bit compares with immediate operand to 32 bits. This is in response to a comment by Eli Friedman. llvm-svn: 272814	2016-06-15 18:18:05 +00:00
Pankaj Gode	a67fea464c	Test commit after access grant. Modified comment by adding a period. llvm-svn: 272808	2016-06-15 17:24:52 +00:00
Sanjay Patel	30e0456562	[x86] fix function name; NFC llvm-svn: 272805	2016-06-15 17:12:29 +00:00
Kevin B. Smith	54566a0e9a	[X86]: Quit promoting 8 and 16 bit compares to 32 bit. Differential Revision: http://reviews.llvm.org/D21144 llvm-svn: 272801	2016-06-15 16:37:46 +00:00
Nirav Dave	194cb55f37	Revert "Preserve DebugInfo when replacing values in DAGCombiner" Reverting due to assertion failure in lib/CodeGen/SelectionDAG/InstrEmitter.cpp This reverts commit r272792. llvm-svn: 272799	2016-06-15 16:08:50 +00:00
Kevin B. Smith	c3c82cdbd0	[X86]: Improve Liveness checking for X86FixupBWInsts.cpp Differential Revision: http://reviews.llvm.org/D21085 llvm-svn: 272797	2016-06-15 16:03:06 +00:00
Vasileios Kalintiris	7b4ab98b03	[mips] Eliminate unused code for addrRegReg complex pattern. NFC. Reviewers: dsanders, sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21381 llvm-svn: 272794	2016-06-15 15:30:07 +00:00
Nirav Dave	a72e308403	Preserve DebugInfo when replacing values in DAGCombiner [DAG] Previously debug values would transfer debuginfo for the selected start node for a replacement which allows for debug to be dropped. Push debug value transfer to occur with node/value replacement in SelectionDAG, remove now extraneous transfers of debug values. This refixes PR9817 which was being incompletely checked in the testsuite. Reviewers: jyknight Subscribers: dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D21037 llvm-svn: 272792	2016-06-15 14:50:08 +00:00
Ranjeet Singh	0db7be886e	Reverting r272778 because there's an assertion failure when running the test CodeGen/ARM/intrinsics-coprocessor.ll llvm-svn: 272791	2016-06-15 14:23:29 +00:00
Valery Pykhtin	02e2086e41	[AMDGPU] Fix few coding style issues. NFC. llvm-svn: 272785	2016-06-15 13:55:09 +00:00
Ranjeet Singh	351364fe76	[ARM] Add support for mrrc/mrrc2 intrinsics. Differential Revision: http://reviews.llvm.org/D21178 llvm-svn: 272778	2016-06-15 11:32:24 +00:00
Daniel Sanders	f2895d344d	[mips] Replace AdditionalRequires<[IsGP64bit]> with GPR_64. NFC. Summary: Also fixed one case where HasMips64 was being used instead of IsGP64bit. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: http://reviews.llvm.org/D21028 llvm-svn: 272771	2016-06-15 10:36:16 +00:00
Daniel Sanders	8015c708ea	[mips] clang-format Mips16ISelDAGToDAG.{cpp,h} llvm-svn: 272768	2016-06-15 09:44:22 +00:00
Daniel Sanders	d3bb20821d	[mips][msa] Fix register/register-class mismatches in emitINSERT_DF_VIDX(). Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21068 llvm-svn: 272765	2016-06-15 08:43:23 +00:00
Zlatko Buljan	d2ed9c6c2c	[mips][microMIPS] Add CodeGen support for AND, OR16, OR, XOR*, NOT16 and NOR instructions Differential Revision: http://reviews.llvm.org/D16719 llvm-svn: 272764	2016-06-15 07:46:24 +00:00
Igor Breger	64cfd3a442	[AVX512] Fix BLENDM lowering patterns. Operands should be swapped to match SELECT behavior. Use BLENDM instead of masked move instruction. Differential Revision: http://reviews.llvm.org/D21001 llvm-svn: 272763	2016-06-15 07:30:38 +00:00
Sanjoy Das	4f7a86c74d	Push a dependent computation into the assert that uses it; NFC ... instead of explicitly conditioning on NDEBUG. Also use an easier to read conditional expression. (Addresses post-commit review from David Blaikie.) llvm-svn: 272762	2016-06-15 07:27:04 +00:00
Nicolai Haehnle	a609259832	AMDGPU: Fix MUBUF offset bugs affecting llvm.amdgcn.buffer.* intrinsics Summary: This fixes two related bugs. First, the generic optimization passes unfortunately generate negative constant offsets but the hardware treats SOffset as an unsigned value. Second, there is a hardware bug on SI and CI, where address clamping in MUBUF instructions does not work correctly when SOffset is larger than the buffer size. This patch works around this bug by never using SOffset. An alternative workaround would be to do the clamping manually when SOffset is too large, but generating the required code sequence during instruction selection would be rather involved, and in any case the resulting code would probably be worse. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96360 Reviewers: arsenm, tstellarAMD Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21326 llvm-svn: 272761	2016-06-15 07:13:05 +00:00
Sanjoy Das	3f59c0c3ab	Fix unused variable warning; NFC TailCallReturnAddrDelta is used only in an assert, so put it under defined(NDEBUG). llvm-svn: 272760	2016-06-15 06:53:59 +00:00
Sanjoy Das	0272be206a	Don't force SP-relative addressing for statepoints Summary: ... when the offset is not statically known. Prioritize addresses relative to the stack pointer in the stackmap, but fallback gracefully to other modes of addressing if the offset to the stack pointer is not a known constant. Patch by Oscar Blumberg! Reviewers: sanjoy Subscribers: llvm-commits, majnemer, rnk, sanjoy, thanm Differential Revision: http://reviews.llvm.org/D21259 llvm-svn: 272756	2016-06-15 05:35:14 +00:00
Tom Stellard	82785e9fe7	AMDGPU/SI: Correctly encode constant expressions Summary: We we have an MCConstantExpr, we can encode it directly into the instruction instead of emitting fixups. Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov, arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21236 Change-Id: I88b3edf288d48e65c5d705fc4850d281f8e36948 llvm-svn: 272750	2016-06-15 03:09:39 +00:00
Tom Stellard	89049702ce	AMDGPU/AsmParser: Add support for parsing symbol operands Summary: We can now reference symbols directly in operands, like this: s_mov_b32 s0, global Reviewers: artem.tamazov, vpykhtin, SamWot, nhaustov Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D21038 llvm-svn: 272748	2016-06-15 02:54:14 +00:00
David Majnemer	cbf614a93b	Remove the ScalarReplAggregates pass Nearly all the changes to this pass have been done while maintaining and updating other parts of LLVM. LLVM has had another pass, SROA, which has superseded ScalarReplAggregates for quite some time. Differential Revision: http://reviews.llvm.org/D21316 llvm-svn: 272737	2016-06-15 00:19:09 +00:00
Matt Arsenault	f42c69206d	AMDGPU: Run pointer optimization passes llvm-svn: 272736	2016-06-15 00:11:01 +00:00
Peter Collingbourne	96efdd6107	IR: Introduce local_unnamed_addr attribute. If a local_unnamed_addr attribute is attached to a global, the address is known to be insignificant within the module. It is distinct from the existing unnamed_addr attribute in that it only describes a local property of the module rather than a global property of the symbol. This attribute is intended to be used by the code generator and LTO to allow the linker to decide whether the global needs to be in the symbol table. It is possible to exclude a global from the symbol table if three things are true: - This attribute is present on every instance of the global (which means that the normal rule that the global must have a unique address can be broken without being observable by the program by performing comparisons against the global's address) - The global has linkonce_odr linkage (which means that each linkage unit must have its own copy of the global if it requires one, and the copy in each linkage unit must be the same) - It is a constant or a function (which means that the program cannot observe that the unique-address rule has been broken by writing to the global) Although this attribute could in principle be computed from the module contents, LTO clients (i.e. linkers) will normally need to be able to compute this property as part of symbol resolution, and it would be inefficient to materialize every module just to compute it. See: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160509/356401.html http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160516/356738.html for earlier discussion. Part of the fix for PR27553. Differential Revision: http://reviews.llvm.org/D20348 llvm-svn: 272709	2016-06-14 21:01:22 +00:00
Tom Stellard	bf3e6e5bb4	AMDGPU/SI: Refactor fixup handling for constant addrspace variables Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Re-commit this after fixing a bug where we were trying to use a reference to a Triple object that had already been destroyed. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272705	2016-06-14 20:29:59 +00:00
Wei Mi	b799a625f9	[X86] Reduce the width of multiplification when its operands are extended from i8 or i16 For <N x i32> type mul, pmuludq will be used for targets without SSE41, which often introduces many extra pack and unpack instructions in vectorized loop body because pmuludq generates <N/2 x i64> type value. However when the operands of <N x i32> mul are extended from smaller size values like i8 and i16, the type of mul may be shrunk to use pmullw + pmulhw/pmulhuw instead of pmuludq, which generates better code. For targets with SSE41, pmulld is supported so no shrinking is needed. Differential Revision: http://reviews.llvm.org/D20931 llvm-svn: 272694	2016-06-14 18:53:20 +00:00
Tom Stellard	b1a523fa68	Revert "AMDGPU/SI: Refactor fixup handling for constant addrspace variables" This reverts commit r272675. llvm-svn: 272677	2016-06-14 15:16:35 +00:00
Tom Stellard	5e6298b0f2	AMDGPU/SI: Refactor fixup handling for constant addrspace variables Summary: We now use a standard fixup type applying the pc-relative address of constant address space variables, and we have the GlobalAddress lowering code add the required 4 byte offset to the global address rather than doing it as part of the fixup. This refactoring will make it easier to use the same code for global address space variables and also simplifies the code. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21154 llvm-svn: 272675	2016-06-14 15:11:01 +00:00
Artem Tamazov	17091364d1	[AMDGPU][llvm-mc] Predefined symbols to access -mcpu from the assembly source (.option.machine_version...) The feature allows for conditional assembly etc. TODO: make those symbols read-only. Test added. Differential Revision: http://reviews.llvm.org/D21238 llvm-svn: 272673	2016-06-14 15:03:59 +00:00
Simon Dardis	878c0b1b76	[mips] Optimize stack pointer adjustments. Instead of always using addu to adjust the stack pointer when the size out is of the range of an addiu instruction, use subu so that a smaller constant can be generated. This can give savings of ~3 instructions whenever a function has a a stack frame whose size is out of range of an addiu instruction. This change may break some naive stack unwinders. Partially resolves PR/26291. Thanks to David Chisnall for reporting the issue. Reviewers: dsanders, vkalintiris Differential Review: http://reviews.llvm.org/D21321 llvm-svn: 272666	2016-06-14 13:39:43 +00:00
James Molloy	65b6be1d3a	[Thumb] Fix off-by-one error in r272007 We can only generate immediates up to #510 with a MOV+ADD, not #511, because there's no such instruction as add #256. Found by Oliver Stannard and csmith! llvm-svn: 272665	2016-06-14 13:33:07 +00:00
Simon Dardis	4fbf76f7c3	[mips][atomics] Fix atomic instruction descriptions and uses. PR27458 highlights that the MIPS backend does not have well formed MIR for atomic operations (among other errors). This patch adds expands and corrects the LL/SC descriptions and uses for MIPS(64). Reviewers: dsanders, vkalintiris Differential Review: http://reviews.llvm.org/D19719 llvm-svn: 272655	2016-06-14 11:29:28 +00:00
Daniel Sanders	e858136d91	[mips][ias] Implement one N32 case (of two) for .cpsetup. This patch implements the N32 case where -mno-shared is in effect. The case where -mshared is in effect will be added later since doing that now requires additional changes to how we handle %hi(%neg(%gp_rel(foo))) expressions to emit the three relocations as three relocations (currently only one of the three would be emitted) which then requires further changes to our MCFixup handling. While we could fix both cases together, fixing the -mno-shared case allows us to fix the ELFCLASS bug (where N32 incorrectly uses ELFCLASS64 instead of ELFCLASS32) in a way that allows cpsetup.s to check for a correct output instead of another incorrect output. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: http://reviews.llvm.org/D21131 llvm-svn: 272652	2016-06-14 10:13:47 +00:00
Simon Pilgrim	cf1165b86e	[X86][SSE4A] Added patterns for nontemporal stores of scalar float/doubles using MOVNTSD/MOVNTSS llvm-svn: 272651	2016-06-14 09:43:38 +00:00
Simon Dardis	e661e528db	[mips] MIPS32/64 itineraries Itineraries for some pre MIPSR6 and EVA instructions. Some pseudo expanded instructions are marked as having no scheduling info. Reviewers: dsanders, vkalintiris Differential Review: http://reviews.llvm.org/D20418 llvm-svn: 272648	2016-06-14 09:35:29 +00:00
Daniel Sanders	435a653437	[mips][dsp] Fix use without def on DSPCtrl registers read by rddsp intrinsic. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21063 llvm-svn: 272647	2016-06-14 09:29:46 +00:00
Daniel Sanders	d2a49ec3ab	[mips][msa] copyPhysReg() should not set RegState::Define on result of CTCMSA. Summary: The machine verifier reports 'Explicit operand marked as def' when it is manually specified even though it agrees with the operand info. Reviewers: sdardis Subscribers: dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D21065 llvm-svn: 272646	2016-06-14 09:11:33 +00:00
Craig Topper	34d9707825	[AVX512] Use AND32ri8 instead of AND32ri when anding with 1 to create single bit masks. This results in a smaller encoding. llvm-svn: 272627	2016-06-14 03:13:03 +00:00
Craig Topper	99e30e6a66	[AVX512] Use MOVZX32 instead of MOVZ16 for loading single v8/v4/v2/v1 masks when KMOVB is not available. This has better behavior with respect to partial register stalls since it won't need to preserve the upper 16-bits of the GPR. llvm-svn: 272626	2016-06-14 03:13:00 +00:00
Craig Topper	ddab395397	[AVX512] Add patterns for zero-extending a mask that use the def of KMOVW/KMOVB without going through an EXTRACT_SUBREG and a MOVZX. llvm-svn: 272625	2016-06-14 03:12:54 +00:00
Kevin Enderby	d2d2ce9b9f	Update the AArch64ExternalSymbolizer to print literal strings as escaped strings so it is the same as the MCExternalSymbolizer. rdar://17349181 llvm-svn: 272588	2016-06-13 21:08:57 +00:00
David Majnemer	248190ba69	[X86] Remove llvm.x86.bit.scan.{forward,reverse}.32 The need for these intrinsics has been obviated by r272564 which reimplements their functionality using generic IR. llvm-svn: 272566	2016-06-13 17:33:13 +00:00
Marek Olsak	e93f6d6923	AMDGPU/SI: Set INDEX_STRIDE for scratch coalescing Summary: Mesa and other users must set this to enable coalescing: - STRIDE = 0 - SWIZZLE_ENABLE = 1 This makes one particular compute shader 8x faster. Reviewers: tstellarAMD, arsenm Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21136 llvm-svn: 272556	2016-06-13 16:05:57 +00:00
Matt Arsenault	80bc355048	AMDGPU: Fix post-RA verifier errors with trackLivenessAfterRegAlloc The condition reg of the cndmask_b64 expansion can't be killed by the first one, and the implicit super register implicit def is needed. llvm-svn: 272554	2016-06-13 15:53:52 +00:00
Ulrich Weigand	daae87aa21	[SystemZ] Enable index register memory constraints for inline ASM This enables use of the 'R' and 'T' memory constraints for inline ASM operands on SystemZ, which allow an index register as well as an immediate displacement. This patch includes corresponding documentation and test case updates. As with the last patch of this kind, I moved the 'm' constraint to the most general case, which is now 'T' (base + 20-bit signed displacement + index register). Author: colpell Differential Revision: http://reviews.llvm.org/D21239 llvm-svn: 272547	2016-06-13 14:24:05 +00:00
Ranjeet Singh	933e1aa39f	[ARM] Reverting r272544 because clang patch needs to go in as soon as llvm patch has gone in because tests will start breaking in Clang. llvm-svn: 272546	2016-06-13 10:58:24 +00:00
Ranjeet Singh	8feacb330d	[ARM] Add mrrc/mrrc2 co-processor intrinsics MRRC/MRRC2 instruction writes to two registers. The intrinsic definition returns a single uint64_t to represent the write, this is a compact way of representing a write to two 32 bit registers, the alternative might have been two return a struct of 2 uint32_t's but this isn't as nice. Differential Revision: llvm-svn: 272544	2016-06-13 10:43:50 +00:00
Haojian Wu	7900ca1e7e	Fix an enumeral mismatch warning. Summary: The "-Werror=enum-compare" shows that the statement is using two different enums: enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType' A follow-up fix on D21235. Reviewers: klimek Subscribers: spatel, cfe-commits Differential Revision: http://reviews.llvm.org/D21278 llvm-svn: 272539	2016-06-13 09:03:45 +00:00
Craig Topper	13cf7cac07	[AVX512] Remove maksed pshufd, pshuflw, and phufhw intrinsics and autoupgrade them to selects and shufflevector. llvm-svn: 272527	2016-06-13 02:36:48 +00:00
Benjamin Kramer	4ca41fd09e	Run clang-tidy's performance-unnecessary-copy-initialization over LLVM. No functionality change intended. llvm-svn: 272516	2016-06-12 17:30:47 +00:00
Benjamin Kramer	d3f4c05aea	Move instances of std::function. Or replace with llvm::function_ref if it's never stored. NFC intended. llvm-svn: 272513	2016-06-12 16:13:55 +00:00
Benjamin Kramer	bdc4956bac	Pass DebugLoc and SDLoc by const ref. This used to be free, copying and moving DebugLocs became expensive after the metadata rewrite. Passing by reference eliminates a ton of track/untrack operations. No functionality change intended. llvm-svn: 272512	2016-06-12 15:39:02 +00:00
Sanjay Patel	977530a8c9	[x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044) This patch is intended to solve: https://llvm.org/bugs/show_bug.cgi?id=28044 By changing the definition of X86ISD::CMPP to use float types, we allow it to be created and pass legalization for an SSE1-only target where v4i32 is not legal. The motivational trail for this change includes: https://llvm.org/bugs/show_bug.cgi?id=28001 and eventually makes this trigger: http://reviews.llvm.org/D21190 Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86 intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86 intrinsics, a big pile of generic optimizations can trigger. Differential Revision: http://reviews.llvm.org/D21235 llvm-svn: 272511	2016-06-12 15:03:25 +00:00
Craig Topper	1067986c5b	[X86] Remove sse2 pshufd/pshuflw/pshufhw intrinsics and upgrade them to shufflevector. llvm-svn: 272510	2016-06-12 14:11:32 +00:00
Craig Topper	251030babe	[AVX512] Remove the masked palignr intrinsics that I forgot to remove when I added auto-upgrade code to turn them into shufflevectors and selects. llvm-svn: 272497	2016-06-12 04:14:13 +00:00
Simon Pilgrim	3fc09f7be6	[CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets To account for the fast PSHUFB implementation now available llvm-svn: 272484	2016-06-11 19:23:02 +00:00
Simon Pilgrim	5b9bade8dd	[X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach. llvm-svn: 272477	2016-06-11 15:44:13 +00:00
Simon Pilgrim	b13961d25b	Strip trailing whitespace. NFCI. llvm-svn: 272476	2016-06-11 14:34:10 +00:00
Craig Topper	504fba5c8a	[AVX512] Lower v8i64 and v16i32 to pshufd when possible. llvm-svn: 272473	2016-06-11 13:43:21 +00:00
Simon Pilgrim	6800a45790	[X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining llvm-svn: 272471	2016-06-11 13:38:28 +00:00
Simon Pilgrim	255fdd0666	[X86][SSE] Use vXi8 return type for PSLLDQ/PSRLDQ instructions These are byte shift instructions and it will make shuffle combining a lot more straightforward if we can assume a vXi8 vector of bytes so decoded shuffle masks match the return type's number of elements llvm-svn: 272468	2016-06-11 12:54:37 +00:00
Simon Pilgrim	d386941676	[X86][AVX512] Tidied up VSHUFF32x4/VSHUFF64x2/VSHUFI32x4/VSHUFI64x2 comment generation Now matches other shuffles llvm-svn: 272464	2016-06-11 11:18:38 +00:00
Chandler Carruth	4c0e94dce6	Try a bit harder to remove the signed and unsigned comparison warning. Hopefully this time it actually works and stays away. llvm-svn: 272463	2016-06-11 09:13:00 +00:00
Chandler Carruth	306e270b83	Compare to an unsigned literal to avoid a -Wsign-compare warning. llvm-svn: 272459	2016-06-11 08:02:01 +00:00
Craig Topper	40abd1cc61	[AVX512] Add support for lowering v32i16 shuffles with repeated lanes. This allows us to create 512-bit PSHUFLW/PSHUFHW. llvm-svn: 272450	2016-06-11 03:27:42 +00:00
Craig Topper	b9b86fcfff	[AVX512] No need to check for BWI being enabled before lowering v32i16 and v64i8 shuffles. If we get this far the types are already legal which means BWI must be enabled. llvm-svn: 272449	2016-06-11 03:27:37 +00:00
Sanjoy Das	39c226fdba	[STLExtras] Introduce and use llvm::count_if; NFC (This is split out from was D21115) llvm-svn: 272435	2016-06-10 21:18:39 +00:00
Chad Rosier	d1f6c840ee	[AArch64] Move comments closer to relevant check. NFC. llvm-svn: 272430	2016-06-10 20:49:18 +00:00
Chad Rosier	c5083c2ccf	[AArch64] Refactor a check earlier. NFC. llvm-svn: 272429	2016-06-10 20:47:14 +00:00
Sanjay Patel	b114fd65fc	[x86] enable bitcasted fabs/fneg transforms The vector cases don't change because we already have folds in X86ISelLowering to look through and remove bitcasts. llvm-svn: 272427	2016-06-10 20:33:50 +00:00
Nico Weber	2cf5e89e1d	Remove a few gendered pronouns. llvm-svn: 272422	2016-06-10 20:06:03 +00:00
Zhan Jun Liau	ab42cbce98	[SystemZ] Support Compare and Traps Support and generate Compare and Traps like CRT, CIT, etc. Support Trap as legal DAG opcodes and generate "j .+2" for them by default. Add support for Conditional Traps and use the If Converter to convert them into the corresponding compare and trap opcodes. Differential Revision: http://reviews.llvm.org/D21155 llvm-svn: 272419	2016-06-10 19:58:10 +00:00
Tom Stellard	f3af841462	AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations Summary: We need to set the fixup type to FK_Data_4 for the SCRATCH_RSRC_DWORD[01] symbols, since these require absolute relocations, and fixup_si_rodata is for relative relocations. Reviewers: arsenm, kzhuravl Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: http://reviews.llvm.org/D21153 llvm-svn: 272417	2016-06-10 19:26:38 +00:00
Michael Kuperstein	9a0542a792	[X86] Add costs for SSE zext/sext to v4i64 to TTI The costs are somewhat hand-wavy, but should be much closer to the truth than what we get from BasicTTI. Differential Revision: http://reviews.llvm.org/D21156 llvm-svn: 272406	2016-06-10 17:01:05 +00:00
Evandro Menezes	a3a0a60cff	[AArch64] Add preferred alignments for Exynos M1 Differential Revision: http://reviews.llvm.org/D21203 llvm-svn: 272400	2016-06-10 16:00:18 +00:00
Krzysztof Parzyszek	96cfc381e2	[Hexagon] Remove incorrect offset scaling llvm-svn: 272399	2016-06-10 15:43:18 +00:00
Roman Shirokiy	d93998f606	Test commit llvm-svn: 272393	2016-06-10 13:12:48 +00:00
Sam Kolton	945231ada5	[AMDGPU] AsmParser: Support for sext() modifier in SDWA. Some code cleaning in AMDGPUOperand. Summary: sext() modifier is supported in SDWA instructions only for integer operands. Spec is unclear should integer operands support abs and neg modifiers with sext - for now they are not supported. Renamed InputModsWithNoDefault to FloatInputMods. Added SextInputMods for operands that support sext() modifier. Added AMDGPUOperand::Modifier struct to handle register and immediate modifiers. Code cleaning in AMDGPUOperand class: organize method in groups (render-, predicate-methods...). Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20968 llvm-svn: 272384	2016-06-10 09:57:59 +00:00
Roger Ferrer Ibanez	1efc17e1ec	test commit: remove trailing whitespaces in README.txt llvm-svn: 272380	2016-06-10 08:19:58 +00:00
Craig Topper	200d237e57	[AVX512] Add shuffle comment printing for masked VPERMPD/VPERMQ. llvm-svn: 272371	2016-06-10 05:12:40 +00:00
Craig Topper	89c1761474	[AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names. llvm-svn: 272367	2016-06-10 04:48:05 +00:00
Matt Arsenault	37fefd68cc	AMDGPU: Fix trailing whitespace llvm-svn: 272364	2016-06-10 02:18:02 +00:00
Matt Arsenault	58ddad5bd6	AMDGPU: v_cndmask_b32 does not def vcc Fixes verifier errors after SIShrinkInstructions. llvm-svn: 272351	2016-06-10 00:18:41 +00:00
Tom Stellard	26a2ab7477	AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_permute Summary: This fixes a bug with ds_permute instructions where if it was passed a constant address, then the offset operand would get assigned a register operand instead of an immediate. Reviewers: scchan, arsenm Subscribers: arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19994 llvm-svn: 272349	2016-06-10 00:01:04 +00:00
Tom Stellard	1d3940e8cc	AMDGPU/SI: Use common topological sort algorithm in SIScheduleDAGMI Reviewers: arsenm, axeldavy Subscribers: MatzeB, arsenm, llvm-commits Differential Revision: http://reviews.llvm.org/D19823 llvm-svn: 272346	2016-06-09 23:48:02 +00:00
Matt Arsenault	7757c59e48	AMDGPU: Fix flat atomics The flat atomics could already be selected, but only when using flat instructions for global memory. Add patterns for flat addresses. llvm-svn: 272345	2016-06-09 23:42:54 +00:00
Matt Arsenault	887018179a	AMDGPU: Fix i64 global cmpxchg This was using extract_subreg sub0 to extract the low register of the result instead of sub0_sub1, producing an invalid copy. There doesn't seem to be a way to use the compound subreg indices in tablegen since those are generated, so manually select it. llvm-svn: 272344	2016-06-09 23:42:48 +00:00
Eric Christopher	1dbb23e162	Add aliases for mfvrsave/mtvrsave. Update a test as we're now going to emit it for easier reading of generated assembly as well. llvm-svn: 272339	2016-06-09 23:27:48 +00:00
Matt Arsenault	e2bd9a32bc	AMDGPU: Run verifer after insert waits pass llvm-svn: 272338	2016-06-09 23:19:14 +00:00
Matt Arsenault	cfb61e780a	AMDGPU: Remove incorrect assertion I'm still not sure under what circumstances the offset here is non-0, but private memory is not limited to 27-bits. llvm-svn: 272337	2016-06-09 23:19:08 +00:00
Matt Arsenault	c3a01ec9db	AMDGPU: Properly initialize SIShrinkInstructions llvm-svn: 272336	2016-06-09 23:18:47 +00:00
Simon Pilgrim	643734c565	[X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments llvm-svn: 272319	2016-06-09 22:03:15 +00:00
Simon Pilgrim	f718682eb9	[X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ llvm-svn: 272308	2016-06-09 21:09:03 +00:00
Simon Pilgrim	47c76e201a	[X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR llvm-svn: 272307	2016-06-09 20:53:12 +00:00
Simon Pilgrim	0ab9d3026a	[X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts 512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget llvm-svn: 272300	2016-06-09 20:13:58 +00:00
Justin Lebar	ed2c282d4b	[NVPTX] Add intrinsics for shfl instructions. Summary: Currently clang emits these instructions via inline (volatile) asm in the CUDA headers. Switching to intrinsics will let the optimizer reason across calls to these intrinsics. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: http://reviews.llvm.org/D21160 llvm-svn: 272298	2016-06-09 20:04:08 +00:00
Wei Ding	ed0f97fad2	AMDGPU/SI: Fix 32-bit fdiv lowering We were using the fast fdiv lowering for all division, implementation of IEEE754 fdiv is added. http://reviews.llvm.org/D20557 llvm-svn: 272292	2016-06-09 19:17:15 +00:00
Ulrich Weigand	79564611d9	[SystemZ] Enable long displacement constraints for inline ASM operands This enables use of the 'S' constraint for inline ASM operands on SystemZ, which allows for a memory reference with a signed 20-bit immediate displacement. This patch includes corresponding documentation and test case updates. I've changed the 'T' constraint to match the new behavior for 'S', as 'T' also uses a long displacement (though index constraints are still not implemented). I also changed 'm' to match the behavior for 'S' as this will allow for a wider range of displacements for 'm', though correct me if that's not the right decision. Author: colpell Differential Revision: http://reviews.llvm.org/D21097 llvm-svn: 272266	2016-06-09 15:19:16 +00:00
Hrvoje Varga	c962c4936e	[mips][microMIPS] Implement BOVC, BNVC, EXT, INS and JALRC instructions Differential Revision: http://reviews.llvm.org/D11798 llvm-svn: 272259	2016-06-09 12:57:23 +00:00
James Molloy	a7dbf987b5	[Thumb] A branch is not part of an IT block ReplaceTailWithBranchTo assumed that if an instruction is predicated, it must be part of an IT block. This is not correct for conditional branches. No testcase as this was triggered by the reverted patch r272017 - test coverage will occur when that patch is re-reverted and there is no known way to trigger this in the meantime. llvm-svn: 272258	2016-06-09 11:51:29 +00:00
Igor Breger	f635367e2b	[AVX512] Remove masked_move/blendm intrinsic from back-end. This is complement patch to D21060. Differential Revision: http://reviews.llvm.org/D21174 llvm-svn: 272257	2016-06-09 11:46:55 +00:00
Zlatko Buljan	cd242c1655	[mips][microMIPS] Add CodeGen support for SEL., SELEQZ, SELNEZ, SELEQZ., SELNEZ.* and CMP.condn.fmt instructions Differential Revision: http://reviews.llvm.org/D20862 llvm-svn: 272256	2016-06-09 11:15:53 +00:00
Sam Kolton	c9bdcb75c4	[AMDGPU] Disassembler: Support for sdwa instructions Reviewers: vpykhtin, tstellarAMD Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D21129 llvm-svn: 272255	2016-06-09 11:04:45 +00:00
Craig Topper	6f7288dc44	[AVX512] Fix shuffle decode printing for several instructions with write masks. There are still more bugs here with UNPCK and PALIGN for sure. But these were the easiest ones to fix. llvm-svn: 272252	2016-06-09 07:49:08 +00:00
James Molloy	feb9f4243b	[Thumb] Select a BIC instead of AND if the immediate can be encoded more optimally negated If an immediate is only used in an AND node, it is possible that the immediate can be more optimally materialized when negated. If this is the case, we can negate the immediate and use a BIC instead; int i(int a) { return a & 0xfffffeec; } Used to produce: ldr r1, [CONSTPOOL] ands r0, r1 CONSTPOOL: 0xfffffeec And now produces: movs r1, #255 adds r1, #20 ; Less costly immediate generation bics r0, r1 llvm-svn: 272251	2016-06-09 07:39:08 +00:00
Craig Topper	7a2993093e	[X86] Bring consistent naming to the SSE/AVX and AVX512 PALIGNR instructions. Then add shuffle decode printing for the EVEX forms which is made easier by having the naming structure more similar to other instructions. llvm-svn: 272249	2016-06-09 07:06:38 +00:00
Craig Topper	565a5b5451	[X86] Fix bad comment in assert. NFC llvm-svn: 272248	2016-06-09 07:06:33 +00:00
Saleem Abdulrasool	6c19ffc8bc	AArch64: support the `.arch` directive in the IAS Add support to the AArch64 IAS for the `.arch` directive. This allows the assembly input to use architectural functionality in part of a file. This is used in existing code like BoringSSL. Resolves PR26016! llvm-svn: 272241	2016-06-09 02:56:40 +00:00
Benjamin Kramer	c321e53402	Apply most suggestions of clang-tidy's performance-unnecessary-value-param Avoids unnecessary copies. All changes audited & pass tests with asan. No functional change intended. llvm-svn: 272190	2016-06-08 19:09:22 +00:00
Quentin Colombet	d1cd30b218	[AArch64][RegisterBankInfo] G_OR are fine on either GPR or FPR. Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or FPR for 64-bit or 32-bit values. Add test cases demonstrating how this information is used to coalesce a computation on a single register bank. llvm-svn: 272170	2016-06-08 16:53:32 +00:00
Oliver Stannard	b3378e2f3c	[ARM] MSR instructions implicitly set CPSR The MSR instructions can write to the CPSR, but we did not model this fact, so we could emit them in the middle of IT blocks, changing the condition flags for later instructions in the block. The tests use two calls to llvm.write_register.i32 because it is valid to use these instructions at the end of an IT block, which if conversion does do in some cases. With two calls, the first clobbers the flags, so a branch has to be used to make the second one conditional. Differential Revision: http://reviews.llvm.org/D21139 llvm-svn: 272154	2016-06-08 15:26:34 +00:00
Vasileios Kalintiris	a9e5154dc5	[mips] Add a proper file header in MipsFastISel.cpp llvm-svn: 272138	2016-06-08 13:13:15 +00:00
Krzysztof Parzyszek	b16882ddf1	[Hexagon] Modify HexagonExpandCondsets to handle subregisters Also, switch to using functions from LiveIntervalAnalysis to update live intervals, instead of performing the updates manually. Re-committing r272045. llvm-svn: 272135	2016-06-08 12:31:16 +00:00
Diana Picus	0781d10ac4	[ARM] Remove redundant check. NFC isSwift is tested earlier and known to be false when we reach this code. llvm-svn: 272127	2016-06-08 10:29:02 +00:00
Benjamin Kramer	46e38f3678	Avoid copies of std::strings and APInt/APFloats where we only read from it As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean. llvm-svn: 272126	2016-06-08 10:01:20 +00:00
Igor Breger	982e4003a6	[AVX512] Fix cvtusi2sd instruction Opcode, it should be 0x7B instead of 0x2A. llvm-svn: 272122	2016-06-08 07:48:23 +00:00
Quentin Colombet	a4ac7cdac2	[AArch64][RegisterBankInfo] Use the generic implementation of copyCost. Long term we may want to give high cost at FPR to/from GPR copies. llvm-svn: 272086	2016-06-08 01:24:00 +00:00
Quentin Colombet	cfbdee2312	[RegisterBankInfo] Add a size argument for the cost of copy. The cost of a copy may be different based on how many bits we have to copy around. E.g., a 8-bit copy may be different than a 32-bit copy. llvm-svn: 272084	2016-06-08 01:11:03 +00:00
Nicolai Haehnle	c00e03b8f5	AMDGPU: Add amdgpu-ps-wqm-outputs function attributes Summary: The presence of this attribute indicates that VGPR outputs should be computed in whole quad mode. This will be used by Mesa for prolog pixel shaders, so that derivatives can be taken of shader inputs computed by the prolog, fixing a bug. The generated code could certainly be improved: if a prolog pixel shader is used (which isn't common in modern OpenGL - they're used for gl_Color, polygon stipples, and forcing per-sample interpolation), Mesa will use this attribute unconditionally, because it has to be conservative. So WQM may be used in the prolog when it isn't really needed, and furthermore a silly back-and-forth switch is likely to happen at the boundary between prolog and main shader parts. Fixing this is a bit involved: we'd first have to add a mechanism by which LLVM writes the WQM-related input requirements to the main shader part binary, and then Mesa specializes the prolog part accordingly. At that point, we may as well just compile a monolithic shader... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130 Reviewers: arsenm, tstellarAMD, mareko Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: http://reviews.llvm.org/D20839 llvm-svn: 272063	2016-06-07 21:37:17 +00:00
Eric Christopher	538d09d0dd	Revert "Differential Revision: http://reviews.llvm.org/D20557 " Author: Wei Ding <wei.ding2@amd.com> Date: Tue Jun 7 19:04:44 2016 +0000 Differential Revision: http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8 as it was breaking the bots. This reverts commit r272044. llvm-svn: 272056	2016-06-07 20:27:12 +00:00
Etienne Bergeron	22bfa83208	[stack-protection] Add support for MSVC buffer security check Summary: This patch is adding support for the MSVC buffer security check implementation The buffer security check is turned on with the '/GS' compiler switch. * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx * To be added to clang here: http://reviews.llvm.org/D20347 Some overview of buffer security check feature and implementation: * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/ * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html For the following example: ``` int example(int offset, int index) { char buffer[10]; memset(buffer, 0xCC, index); return buffer[index]; } ``` The MSVC compiler is adding these instructions to perform stack integrity check: ``` push ebp mov ebp,esp sub esp,50h [1] mov eax,dword ptr [__security_cookie (01068024h)] [2] xor eax,ebp [3] mov dword ptr [ebp-4],eax push ebx push esi push edi mov eax,dword ptr [index] push eax push 0CCh lea ecx,[buffer] push ecx call _memset (010610B9h) add esp,0Ch mov eax,dword ptr [index] movsx eax,byte ptr buffer[eax] pop edi pop esi pop ebx [4] mov ecx,dword ptr [ebp-4] [5] xor ecx,ebp [6] call @__security_check_cookie@4 (01061276h) mov esp,ebp pop ebp ret ``` The instrumentation above is: * [1] is loading the global security canary, * [3] is storing the local computed ([2]) canary to the guard slot, * [4] is loading the guard slot and ([5]) re-compute the global canary, * [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling. Overview of the current stack-protection implementation: * lib/CodeGen/StackProtector.cpp * There is a default stack-protection implementation applied on intermediate representation. * The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie. * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast). * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling. * Guard manipulation and comparison are added directly to the intermediate representation. * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp * There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls). * see long comment above 'class StackProtectorDescriptor' declaration. * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr). * 'getSDagStackGuard' returns the appropriate stack guard (security cookie) * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'. * include/llvm/Target/TargetLowering.h * Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'. * lib/Target/X86/X86ISelLowering.cpp * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm. Function-based Instrumentation: * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions. * To support function-based instrumentation, this patch is * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h), * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue. * modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation, * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp), * if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp). Modifications * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp) * adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h) Results * IR generated instrumentation: ``` clang-cl /GS test.cc /Od /c -mllvm -print-isel-input ``` ``` * Final LLVM Code input to ISel * ; Function Attrs: nounwind sspstrong define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 { entry: %StackGuardSlot = alloca i8* <<<-- Allocated guard slot %0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot) %index.addr = alloca i32, align 4 %offset.addr = alloca i32, align 4 %buffer = alloca [10 x i8], align 1 store i32 %index, i32* %index.addr, align 4 store i32 %offset, i32* %offset.addr, align 4 %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0 %1 = load i32, i32* %index.addr, align 4 call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false) %2 = load i32, i32* %index.addr, align 4 %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2 %3 = load i8, i8* %arrayidx, align 1 %conv = sext i8 %3 to i32 %4 = load volatile i8, i8* %StackGuardSlot <<<-- Loading Guard slot call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check ret i32 %conv } ``` * SelectionDAG generated instrumentation: ``` clang-cl /GS test.cc /O1 /c /FA ``` ``` "?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z" # BB#0: # %entry pushl %esi subl $16, %esp movl ___security_cookie, %eax <<<-- Loading Stack Guard value movl 28(%esp), %esi movl %eax, 12(%esp) <<<-- Store to Guard slot leal 2(%esp), %eax pushl %esi pushl $204 pushl %eax calll _memset addl $12, %esp movsbl 2(%esp,%esi), %esi movl 12(%esp), %ecx <<<-- Loading Guard slot calll @__security_check_cookie@4 <<<-- Epilogue function-based check movl %esi, %eax addl $16, %esp popl %esi retl ``` Reviewers: kcc, pcc, eugenis, rnk Subscribers: majnemer, llvm-commits, hans, thakis, rnk Differential Revision: http://reviews.llvm.org/D20346 llvm-svn: 272053	2016-06-07 20:15:35 +00:00
Krzysztof Parzyszek	8b61759c05	Revert r272045 since GCC doesn't know how to compile it. llvm-svn: 272048	2016-06-07 19:25:28 +00:00
Krzysztof Parzyszek	8424280eef	[Hexagon] Modify HexagonExpandCondsets to handle subregisters Also, switch to using functions from LiveIntervalAnalysis to update live intervals, instead of performing the updates manually. llvm-svn: 272045	2016-06-07 19:06:23 +00:00
Wei Ding	a70216f1b3	Differential Revision: http://reviews.llvm.org/D20557 llvm-svn: 272044	2016-06-07 19:04:44 +00:00
Geoff Berry	486f49cc63	Reapply [AArch64] Fix isLegalAddImmediate() to return true for valid negative values. Originally reviewed here: http://reviews.llvm.org/D17463 llvm-svn: 272023	2016-06-07 16:48:43 +00:00
Oliver Stannard	8de5f24d10	[ARM] Accept conditional versions of BXNS and BLXNS These instructions end in "S" but are not flag-setting, so they need including in the list of special cases in the assembly parser. Differential Revision: http://reviews.llvm.org/D21077 llvm-svn: 272015	2016-06-07 14:58:48 +00:00
Simon Pilgrim	35c06a0282	[X86][SSE] Add general lowering of nontemporal vector loads (fixed bad merge) Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272011	2016-06-07 13:47:23 +00:00
Simon Pilgrim	9a89623b57	[X86][SSE] Add general lowering of nontemporal vector loads Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272010	2016-06-07 13:34:24 +00:00
James Molloy	b101383fb5	[Thumb-1] Add optimized constant materialization for integers [256..512) We can materialize these integers using a MOV; ADDi8 pair. llvm-svn: 272007	2016-06-07 13:10:14 +00:00
Igor Breger	61e628591f	[AVX512] Fix load opcode for fast isel. Differential Revision: http://reviews.llvm.org/D21067 llvm-svn: 272006	2016-06-07 13:08:45 +00:00
Ulrich Weigand	6b0634b304	[PowerPC] Support multiple return values with fast isel Using an LLVM IR aggregate return value type containing three or more integer values causes an abort in the fast isel pass. This patch adds two more registers to RetCC_PPC64_ELF_FIS to allow returning up to four integers with fast isel, just the same as is currently supported with regular isel (RetCC_PPC). This is needed for Swift and (possibly) other non-clang frontends. Fixes PR26190. llvm-svn: 272005	2016-06-07 12:48:22 +00:00
Simon Pilgrim	ca1da1bf07	[X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened. This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases. llvm-svn: 272003	2016-06-07 12:20:14 +00:00
James Molloy	53298a1808	[ARM] Shrink post-indexed LDR and STR to LDM/STM A Thumb-2 post-indexed LDR instruction such as: ldr.w r0, [r1], #4 Can be rewritten as: ldm.n r1!, {r0} LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode. llvm-svn: 272002	2016-06-07 12:13:34 +00:00
James Molloy	75afc95112	[ARM] Transform LDMs into writeback form to save code size If we have an LDM that uses only low registers and doesn't write to its base register: ldm.w r0, {r1, r2, r3} And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding: ldm.n r0!, {r1, r2, r3} Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't. llvm-svn: 272000	2016-06-07 11:47:24 +00:00
Peter Smith	353a2286e2	[ARM] Incorrect relocation type for Thumb2 B<cond>.w The Thumb2 conditional branch B<cond>.W has a different encoding (T3) to the unconditional branch B.W (T4) as it needs to record <cond>. As the encoding is different the B<cond>.W is given a different relocation type. ELF for the ARM Architecture 4.6.1.6 (Table-13) states that R_ARM_THM_JUMP19 should be used for B<cond>.W. At present the MC layer is using the R_ARM_THM_JUMP24 from B.W. This change makes B<cond>.W use R_ARM_THM_JUMP19 and alters the existing test that checks for R_ARM_THM_JUMP24 to expect R_ARM_THM_JUMP19. llvm-svn: 271997	2016-06-07 10:34:33 +00:00
Craig Topper	2f90c1fedf	[AVX512] Allow avx2 and sse41 nontemporal load intrinsics to select EVEX encoded instructions when VLX is enabled. llvm-svn: 271988	2016-06-07 07:27:57 +00:00
Craig Topper	e1cac15feb	[AVX512] Remove unnecessary mayLoad, mayStore, hasSidEffects flags from instructions that have patterns that imply them. Add the same set of flags to instructions that don't have patterns to imply them. llvm-svn: 271987	2016-06-07 07:27:54 +00:00
Craig Topper	0fcf925699	[AVX512] Add NoVLX to a couple patterns that have VLX equivalents. Ordering of the patterns in the .td file protects this, but its better to be explicit. llvm-svn: 271986	2016-06-07 07:27:51 +00:00
Saleem Abdulrasool	532dcbc2c5	ARM: correct TLS access on WoA TLS access requires an offset from the TLS index. The index itself is the section-relative distance of the symbol. For ARM, the relevant relocation (IMAGE_REL_ARM_SECREL) is applied as a constant. This means that the value may not be an immediate and must be lowered into a constant pool. This offset will not be base relocated. We were previously emitting the actual address of the symbol which would be base relocated and would therefore be the vaue offset by the ImageBase + TLS Offset. llvm-svn: 271974	2016-06-07 03:15:07 +00:00
Saleem Abdulrasool	ce4eee4951	ARM: clang-format a couple of switches, add comments clang-format a couple of switches in preparation for a future change. Add some enumeration comments llvm-svn: 271973	2016-06-07 03:15:01 +00:00
Saleem Abdulrasool	0cd0cb992a	ARM: normalise space in the patterns Just adjust the whitespace for the selection patterns. NFC. llvm-svn: 271972	2016-06-07 03:14:57 +00:00
Matt Arsenault	02458c2d27	AMDGPU: Add function for getting instruction size llvm-svn: 271936	2016-06-06 20:10:33 +00:00
Matt Arsenault	3b2e2a59e8	AMDGPU: Fix constantexpr addrspacecasts If we had a constant group address space cast the queue pointer wasn't enabled for the function, resulting in a crash on noreg later. llvm-svn: 271935	2016-06-06 20:03:31 +00:00
Artem Tamazov	135487767b	[AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC. Another step for unification llvm assembler/disassembler with sp3. Besides, CodeGen output is a bit improved, thus changes in CodeGen tests. Assembler/Disassembler tests updated/added. Differential Revision: http://reviews.llvm.org/D20796 llvm-svn: 271900	2016-06-06 15:23:43 +00:00
Igor Breger	edafb0595e	[KNL] Fix UMULO lowering. Differential Revision: http://reviews.llvm.org/D21013 llvm-svn: 271891	2016-06-06 12:24:52 +00:00
Benjamin Kramer	f21beb2c74	Remove dead function with incredibly broken assert. Found by clang-tidy's misc-assert-side-effect. llvm-svn: 271887	2016-06-06 12:10:42 +00:00
Filipe Cabecinhas	6e7d5467c0	[NFC] Silence gcc warning (-Wsign-compare) llvm-svn: 271882	2016-06-06 10:49:56 +00:00
Craig Topper	143446d5c1	[AVX512] Add PALIGNR shuffle lowering for v32i16 and v16i32. llvm-svn: 271870	2016-06-06 05:39:10 +00:00
Simon Pilgrim	64c6de4525	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS raw mask decoding for target shuffle combines llvm-svn: 271834	2016-06-05 15:21:30 +00:00
Simon Pilgrim	478295dadd	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS as a target shuffle type llvm-svn: 271831	2016-06-05 15:01:45 +00:00
Simon Pilgrim	163987a235	[X86][XOP] Tidied up DecodeVPERMIL2PMask to more closely match DecodeVPERMILPMask. llvm-svn: 271830	2016-06-05 14:33:43 +00:00
Craig Topper	8eeda57a40	[AVX512] Add support for lowering PALIGNR for v64i8. Could do this for other types to, but this is what's needed to replace the instrinsic with native IR in clang. llvm-svn: 271828	2016-06-05 06:29:12 +00:00
Craig Topper	9f51c9ef15	[AVX512] Fix PANDN combining for v4i32/v8i32 when VLX is enabled. v4i32/v8i32 ANDs aren't promoted to v2i64/v4i64 when VLX is enabled. llvm-svn: 271826	2016-06-05 05:35:11 +00:00
Simon Pilgrim	2ead861d07	[X86][XOP] Added VPERMIL2PD/VPERMIL2PS shuffle mask comment decoding llvm-svn: 271809	2016-06-04 21:44:28 +00:00
Craig Topper	e609bd6600	[X86] Add the VR128L/H and VR256L/H to the list of vector register classes for inline asm constraints. Also fix the comment on the function. llvm-svn: 271802	2016-06-04 20:15:08 +00:00
Saleem Abdulrasool	1fcdc23a6e	X86: enable TLS on Windows itanium Windows itanium is nearly identical to windows-msvc (MS ABI for C, itanium for C++). Enable the TLS support for the target similar to the MSVC model. llvm-svn: 271797	2016-06-04 18:27:22 +00:00
Simon Pilgrim	fd2eda4f64	[X86][AVX2] Fix v16i16 SHL lowering (PR27730) The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result. The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits. llvm-svn: 271796	2016-06-04 16:45:33 +00:00
Craig Topper	6ae375c9ba	[X86] Use smaller types to shrink the intrinsic lowering tables by about 12K. llvm-svn: 271776	2016-06-04 04:32:17 +00:00
Craig Topper	5250634334	[X86] Use X86ISD::ABS for lowering pabs SSSE3/AVX intrinsics to match AVX512. Should allow those intrinsics to use the EVEX encoded instructions and get the extra registers when available. llvm-svn: 271775	2016-06-04 04:32:15 +00:00
Chad Rosier	be879ea751	[AArch64] Spot SBFX-compatible code expressed with sign_extend. This is very similar to r271677, but for extracts from i32 with the SIGN_EXTEND acting on a arithmetic shift. llvm-svn: 271717	2016-06-03 20:05:49 +00:00
Derek Schuff	5859a9ed80	[WebAssembly] Emit type signatures for declared functions Under emscripten, C code can take the address of a function implemented in Javascript (which is exposed via an import in wasm). Because imports do not have linear memory address in wasm, we need to generate a thunk to be the target of the indirect call; it call the import directly. To make this possible, LLVM needs to emit the type signatures for these functions, because they may not be called directly or referred to other than where the address is taken. This uses s new .s directive (.functype) which specifies the signature. Differential Revision: http://reviews.llvm.org/D20891 Re-apply r271599 but instead of bailing with an error when a declared function has multiple returns, replace it with a pointer argument. Also add the test case I forgot to 'git add' last time around. llvm-svn: 271703	2016-06-03 18:34:36 +00:00
Sjoerd Meijer	9bc93f6298	Code size optimisation: do not inline memcpy if this expansion results in more instructions than the libary call. Differential Revision: http://reviews.llvm.org/D20958 llvm-svn: 271678	2016-06-03 15:38:55 +00:00
Chad Rosier	2d658703e1	[AArch64] Spot SBFX-compatbile code expressed with sign_extend_inreg. We were assuming all SBFX-like operations would have the shl/asr form, but often when the field being extracted is an i8 or i16, we end up with a SIGN_EXTEND_INREG acting on a shift instead. This is a port of r213754 from ARM to AArch64. llvm-svn: 271677	2016-06-03 15:00:09 +00:00
Artem Tamazov	f88397c84c	[test/AMDGPU] Square-braced-syntax for registers: add macro test/example. Test added as per discussion in http://reviews.llvm.org/D20588. The macro is just a demonstration, useless in practice. Coding style fixes. Differential Revision: http://reviews.llvm.org/D20797 llvm-svn: 271675	2016-06-03 14:41:17 +00:00
Sjoerd Meijer	d906bf1369	RAS extensions are part of ARMv8.2-A. This change enables them by introducing a new instruction to ARM and AArch64 targets and several system registers. Patch by: Roger Ferrer Ibanez and Oliver Stannard Differential Revision: http://reviews.llvm.org/D20282 llvm-svn: 271670	2016-06-03 14:03:27 +00:00
Sjoerd Meijer	9da258d8e5	ARM target does not use printAliasInstr machinery which forces having special checks in ArmInstPrinter::printInstruction. This patch addresses this issue. Not all special checks could be removed: either they involve elaborated conditions under which the alias is emitted (e.g. ldm/stm on sp may be pop/push but only if the number of registers is >= 2) or the number of registers is multivalued (like happens again with ldm/stm) and they do not match the InstAlias pattern which assumes single-valued operands in the pattern. Patch by: Roger Ferrer Ibanez Differential Revision: http://reviews.llvm.org/D20237 llvm-svn: 271667	2016-06-03 13:19:43 +00:00
Sam Kolton	a4a99ad1bc	[AMDGPU] Assembler: More tests for SDWA instructions. Fix for SDWA float modifiers. Summary: Depends on D20625 Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20674 llvm-svn: 271662	2016-06-03 11:43:09 +00:00
Daniel Sanders	43750eab82	[mips] EABI CodeGen is completely untested and seems to have bitrotted. Remove it. Summary: There are no tests, no EABI buildbots, and simple test cases do not work. There is a single MIPS16 test using a mips*-gnueabi triple but this test doesn't test EABI and the triple doesn't cause EABI to be used. Reviewers: sdardis Subscribers: tberghammer, danalbert, srhines, dsanders, sdardis, llvm-commits Differential Revision: http://reviews.llvm.org/D20906 llvm-svn: 271658	2016-06-03 10:38:09 +00:00
Sam Kolton	05ef1c940f	[AMDGPU] Assembler: Custom converters for SDWA instructions. Support for _dpp and _sdwa suffixes in mnemonics. Summary: Added custom converters for SDWA instruction to support optional operands and modifiers. Support for _dpp and _sdwa suffixes that allows to force DPP or SDWA encoding for instructions. Reviewers: tstellarAMD, vpykhtin, artem.tamazov Subscribers: arsenm, kzhuravl Differential Revision: http://reviews.llvm.org/D20625 llvm-svn: 271655	2016-06-03 10:27:37 +00:00
Chandler Carruth	9ac86efd4d	Remove bogus initialization of the PPC and Hexagon SelectionDAGISel subclasses. These are not passes proper. We don't support registering them, they can't be constructed with default arguments, and the ID is actually in a base class. Only these two targets even had any boiler plate to try to do this, and it had to be munged out of the INITIALIZE_PASS macros to work. What's worse, the boiler plate has rotted and the "name" of the pass is actually the description string now!!! =/ All of this is completely unnecessary. No other target bothers, and nothing breaks if you don't initialize them because CodeGen has an entirely separate initialization path that is somewhat more durable than relying on the implicit initialization the way the 'opt' tool does for registered passes. llvm-svn: 271650	2016-06-03 10:13:31 +00:00
Chandler Carruth	d474144a7a	Use the standard INITIALIZE_PASS macro rather than hand rolling a (not entirely correct) version of its contents. llvm-svn: 271649	2016-06-03 10:13:29 +00:00
Daniel Sanders	6ba3dd6b71	[mips] Implement 'la' macro in PIC mode for O32. Summary: N32 support will follow in a later patch since the symbol version of 'la' incorrectly believes N32 to have 64-bit pointers and rejects it early. This fixes the three incorrectly expanded 'la' macros found in bionic. Reviewers: sdardis Subscribers: dsanders, llvm-commits, sdardis Differential Revision: http://reviews.llvm.org/D20820 llvm-svn: 271644	2016-06-03 09:53:06 +00:00
Simon Pilgrim	e85506b6e0	[X86][XOP] Support for VPERMIL2PD/VPERMIL2PS 2-input shuffle instructions This patch begins adding support for lowering to the XOP VPERMIL2PD/VPERMIL2PS shuffle instructions - adding the X86ISD::VPERMIL2 opcode and cleaning up the usage. The internal llvm intrinsics were assuming the shuffle mask operand was the same type as the float/double input operands (I guess to simplify the intrinsic definitions in X86InstrXOP.td to a single value type). These needed changing to integer types (matching the clang builtin and the AMD intrinsics definitions), an auto upgrade path is added to convert old calls. Mask decoding/target shuffle support will be added in future patches. Differential Revision: http://reviews.llvm.org/D20049 llvm-svn: 271633	2016-06-03 08:06:03 +00:00
Craig Topper	5e3d314488	[X86] Fix some isel patterns to remove an operand from some multiclasses. NFC llvm-svn: 271631	2016-06-03 05:58:52 +00:00
Craig Topper	e7ae106147	[AVX512] Ensure EVEX vpshufd, vpshuflw, and vpshufhw have isel priority over the VEX encoded ones. llvm-svn: 271629	2016-06-03 05:31:04 +00:00
Craig Topper	01f53b1773	[AVX512] Fix shuffle comment printing for EVEX encoded PSHUFD, PSHUFHW, and PSHUFLW. llvm-svn: 271628	2016-06-03 05:31:00 +00:00
Craig Topper	dc70d8a4b7	[X86] Simplify a multiclass to remove a parameter. NFC llvm-svn: 271627	2016-06-03 05:30:56 +00:00

... 2 3 4 5 6 ...

38071 Commits