llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	4fa79758b7	Small simplification, p0 is the same as p. llvm-svn: 197699	2013-12-19 16:51:03 +00:00
Zoran Jovanovic	8e918c3c4d	Support for microMIPS control instructions. llvm-svn: 197696	2013-12-19 16:25:00 +00:00
Rafael Espindola	9ec26f395b	Long doubles are required to be aligned to 128 bits and svr4 32 bits. Clang was already getting this right. llvm-svn: 197694	2013-12-19 16:23:59 +00:00
Hal Finkel	2345347eb9	Add a disassembler to the PowerPC backend The tests for the disassembler were adapted from the encoder tests, and for the most part, the output from the disassembler matches that encoder-test inputs. There are some places where more-informative mnemonics could be produced (notably for the branch instructions), and those cases are noted in the tests with FIXMEs. Future work includes: - Generating more-informative mnemonics when possible (this may also be done in the printer). - Remove the dependence on positional "numbered" operand-to-variable mapping (for both encoding and decoding). - Internally using 64-bit instruction variants in 64-bit mode (if this turns out to matter). llvm-svn: 197693	2013-12-19 16:13:01 +00:00
Zoran Jovanovic	ff9d5f3284	Support for microMIPS LL and SC instructions. llvm-svn: 197692	2013-12-19 16:12:56 +00:00
Zoran Jovanovic	69be811a6e	Support for microMIPS TLS relocations. llvm-svn: 197685	2013-12-19 16:02:32 +00:00
Matt Arsenault	a98cd6a56e	R600/SI: Make private pointers be 32-bit. Different sized address spaces should theoretically work most of the time now, and since 64-bit add is currently disabled, using more 32-bit pointers fixes some cases. llvm-svn: 197659	2013-12-19 05:32:55 +00:00
Saleem Abdulrasool	c0da2cb3b4	ARM IAS: support .inst directive This adds support for the .inst directive. This is an ARM specific directive to indicate an instruction encoded as a constant expression. The major difference between .word, .short, or .byte and .inst is that the latter will be disassembled as an instruction since it does not get flagged as data. llvm-svn: 197657	2013-12-19 05:17:58 +00:00
Josh Magee	22b8ba2d67	[stackprotector] Use analysis from the StackProtector pass for stack layout in PEI a nd LocalStackSlot passes. This changes the MachineFrameInfo API to use the new SSPLayoutKind information produced by the StackProtector pass (instead of a boolean flag) and updates a few pass dependencies (to preserve the SSP analysis). The stack layout follows the same approach used prior to this change - i.e., only LargeArray stack objects will be placed near the canary and everything else will be laid out normally. After this change, structures containing large arrays will also be placed near the canary - a case previously missed by the old implementation. Out of tree targets will need to update their usage of MachineFrameInfo::CreateStackObject to remove the MayNeedSP argument. The next patch will implement the rules for sspstrong and sspreq. The end goal is to support ssp-strong stack layout rules. WIP. Differential Revision: http://llvm-reviews.chandlerc.com/D2158 llvm-svn: 197653	2013-12-19 03:17:11 +00:00
Rafael Espindola	2fc7101e3c	Add stack alignment information for Sparc. This matches the data in clang which was added by Jakob Stoklund Olesen in r179596. Thanks for erikjv on irc for pointing me to the relevant documents: http://sparc.com/standards/64.psabi.1.35.ps.Z page 25: Every stack frame must be 16-byte aligned. http://sparc.com/standards/psABI3rd.pdf page 3-10: Although the architecture requires only word alignment, software convention and the operating system require every stack frame to be doubleword aligned. I tried to add a test, but it looks like sparc doesn't implement dynamic stack realignment. This will be tested in clang shortly. llvm-svn: 197646	2013-12-19 02:21:16 +00:00
Reid Kleckner	a534a38130	Begin adding docs and IR-level support for the inalloca attribute The inalloca attribute is designed to support passing C++ objects by value in the Microsoft C++ ABI. It behaves the same as byval, except that it always implies that the argument is in memory and that the bytes are never copied. This attribute allows the caller to take the address of an outgoing argument's memory and execute arbitrary code to store into it. This patch adds basic IR support, docs, and verification. It does not attempt to implement any lowering or fix any possibly broken transforms. When this patch lands, a complete description of this feature should appear at http://llvm.org/docs/InAlloca.html . Differential Revision: http://llvm-reviews.chandlerc.com/D2173 llvm-svn: 197645	2013-12-19 02:14:12 +00:00
Rafael Espindola	ddb913cc8f	Synchronize the NaCl DataLayout strings with the ones in clang. Patch by Derek Schuff. llvm-svn: 197640	2013-12-19 00:44:37 +00:00
Reed Kotler	47f3c64a48	Make cosmetic changes as part of Mips internal post commit review of patch r196331. llvm-svn: 197638	2013-12-19 00:43:08 +00:00
Reed Kotler	2500bd6c20	Fix a problem with mips16 stubs when calls are transformed during tail call optimization. Some more work may be needed for indirect calls but this patch fixes the current regression in Prolangc++/trees. S2 optimization as part of the general cleanup and optimization of prolog and epilog was not saving S2 in this case and needed to. llvm-svn: 197630	2013-12-18 23:57:48 +00:00
Weiming Zhao	63871d255f	[aarch32] fix bug 18268: Incorrect condition of vsel Given vsel_cc, op1, op2, since vsel has no LE/LT, to generate vsel for such selection, it needs to inverse cc and swap op1 and op2. To inverse cc, both L/G and E bits should be flipped. llvm-svn: 197615	2013-12-18 22:25:17 +00:00
Rafael Espindola	84a8726a31	Correctly handle the degenerated triple "thumb". Fixes a crash in llc where some parts think the target is thumb and others think it is ARM. llvm-svn: 197607	2013-12-18 21:29:44 +00:00
Logan Chien	a39510aeaa	[arm] Rename Tag_VFP_arch to Tag_FP_arch. According to "Addenda to ABI for ARM architecture", Tag_FP_arch is the new name for the equivalent Tag_VFP_arch. This commit renames Tag_VFP_arch to Tag_FP_arch. llvm-svn: 197587	2013-12-18 17:23:15 +00:00
Rafael Espindola	988f35e999	Fix f64 and f128 for ppc-darwin. This patch adds -f64:32:64 to 32 bit ppc darwin since a f64 inside a structure are only 32 bit aligned. The patch also drop -f128:64:128 from all ppc darwin, since f128 is 128 bit aligned. llvm-svn: 197574	2013-12-18 15:06:25 +00:00
Rafael Espindola	382ee385fd	One ppc32-darwin, a i64 inside a structure can have 32 bit alignment. Thanks for Iain Sandoe for testing this with the original gcc. Clang was already getting this right. llvm-svn: 197572	2013-12-18 14:35:37 +00:00
Tim Northover	f1c31b95e0	ARM: update comment to match reality llvm-svn: 197570	2013-12-18 14:18:36 +00:00
Tim Northover	44594ad7e2	ARM: set default float ABI based on triple. Clang sets the float-abi target option manually, but no longer annotates each function with its ABI. This can lead to confusing mistmatch between "clang -emit-llvm \| llc" and normal clang invocations. Besides which, gnueabihf actually is hard-float. Defaulting to soft was just perverse. llvm-svn: 197554	2013-12-18 09:27:33 +00:00
Kevin Qin	53eaea0104	[AArch64 NEON]Implment loading vector constant form constant pool. llvm-svn: 197551	2013-12-18 06:26:04 +00:00
Rafael Espindola	febb8d2b96	Fix N32 registers and stack alignment. This patch fixes the "n" and "S" components of the data layout for mips. Clang already gets this right. This will be tested in clang. llvm-svn: 197536	2013-12-17 23:15:58 +00:00
Hal Finkel	b4b99e545b	Eliminate PPC instruction decoding ambiguities The instruction definitions in the PPC backend have a number of variants defined for the same instruction to represent differences between 64-bit and 32-bit semantics. In order to generate a disassembler for the PPC backend, we need to mark all but one of these as CodeGen only. No functionality change intended; this is prep work for PPC disassembly support. llvm-svn: 197535	2013-12-17 23:05:18 +00:00
Rafael Espindola	8c08120dba	On APCS, only try to align aggregates to 32 bits instead of 64. This matches clang's behavior and since it is only a preference, it is not an ABI issue. llvm-svn: 197526	2013-12-17 21:36:54 +00:00
Rafael Espindola	9704fd03d1	Handle i64 first for clarity. No functionality change. llvm-svn: 197524	2013-12-17 21:28:36 +00:00
Duncan P. N. Exon Smith	ab5dbebc11	Assert that the last operand is actually EFLAGS This is another follow-up to r197503, after a post-commit review by Andy. <rdar://problem/15627766> llvm-svn: 197520	2013-12-17 20:28:21 +00:00
Matheus Almeida	8cc8b35a73	[mips] Fix off by one issue when applying a fixup. The branch offset for a R_MIPS_PC16 relocation is indeed a 16-bit signed immediate. llvm-svn: 197506	2013-12-17 17:10:00 +00:00
Duncan P. N. Exon Smith	512601d77f	Revert "Revert "Mark vastart_save_xmm_regs as changing EFLAGS"" This reverts commit r197481, recommiting r197469 with an extra fix. The vastart_save_xmm_regs pseudo-instruction expands to a test and a branch, so it modifies EFLAGS. Mark it so, or else the scheduler might place it in the middle of another test+branch. This fixes a bug exposed by r192750, which changed the initial scheduler to source-order as part of enabling the MI Scheduler for X86. This re-commit changes the VASTART_SAVE_XMM_REGS custom inserter not to try to save %flags, and adds a test that catches the bad behavior of r197469. <rdar://problem/15627766> llvm-svn: 197503	2013-12-17 15:54:45 +00:00
Rafael Espindola	345d718d16	Fix the pointer size for the PS3 datalayout. This will be tested from clang. llvm-svn: 197501	2013-12-17 15:29:48 +00:00
Stepan Dyatkovskiy	7f7c2710e0	Fix for PR18045: http://llvm.org/bugs/show_bug.cgi?id=18045 Short issue description: For X86 machines with sse < sse4.1 we got failures for some particular load/store vector sequences: $ clang-trunk -m32 -O2 test-case.c fatal error: error in backend: Cannot select: 0x4200920: v4i32,ch = load 0x41d6ab0, 0x4205850, 0x41dcb10<LD16[getelementptr inbounds ([4 x i32]* @e, i32 0, i32 0)](align=4)> [ORD=82] [ID=58] 0x4205850: i32 = X86ISD::Wrapper 0x41d5490 [ORD=26] [ID=43] 0x41d5490: i32 = TargetGlobalAddress<[4 x i32]* @e> 0 [ORD=26] [ID=23] 0x41dcb10: i32 = undef [ID=2] The reason is that EltsFromConsecutiveLoads could emit such load instruction both before and after legalize stage. Though this instruction is not legal for machines with SSSE3 and lower. The fix: In EltsFromConsecutiveLoads, if we have passed legalize stage, we check whether nodes it emits are legal. P.S.: If you get failure in time from 12:00 and till 22:00 (UTC-8), perhaps I'll slow with response, so you better reject this commit. Thanks! llvm-svn: 197492	2013-12-17 12:07:33 +00:00
Elena Demikhovsky	c5f6726a24	AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader). Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1. llvm-svn: 197482	2013-12-17 08:33:15 +00:00
Duncan P. N. Exon Smith	b2d4274d3f	Revert "Mark vastart_save_xmm_regs as changing EFLAGS" This reverts commit r197469. The sanitizer and dragonegg buildbots are failing, I think because of this change. Reverting until I figure out why. llvm-svn: 197481	2013-12-17 07:13:58 +00:00
Duncan P. N. Exon Smith	a4acde39e9	Mark vastart_save_xmm_regs as changing EFLAGS The vastart_save_xmm_regs pseudo-instruction expands to a test and a branch, so it modifies EFLAGS. Mark it so, or else the scheduler might place it in the middle of another test+branch. This fixes a bug exposed by r192750, which turned on the MI Scheduler for X86. <rdar://problem/15627766> llvm-svn: 197469	2013-12-17 06:12:05 +00:00
Andrew Trick	e339828b90	Allow MachineCSE to coalesce trivial subregister copies the same way that it coalesces normal copies. Without this, MachineCSE is powerless to handle redundant operations with truncated source operands. This required fixing the 2-addr pass to handle tied subregisters. It isn't clear what combinations of subregisters can legally be tied, but the simple case of truncated source operands is now safely handled: %vreg11<def> = COPY %vreg1:sub_32bit; GR32:%vreg11 GR64:%vreg1 %vreg12<def> = COPY %vreg2:sub_32bit; GR32:%vreg12 GR64:%vreg2 %vreg13<def,tied1> = ADD32rr %vreg11<tied0>, %vreg12<kill>, %EFLAGS<imp-def> Test case: cse-add-with-overflow.ll. This exposed an existing bug in PPCInstrInfo::commuteInstruction. Thanks to Rafael for the test case: PowerPC/crash.ll. llvm-svn: 197465	2013-12-17 04:50:45 +00:00
Andrew Trick	9defbd882b	whitespace llvm-svn: 197464	2013-12-17 04:50:40 +00:00
Yi Jiang	6ab044ee35	Enable double to float shrinking optimizations for binary functions like 'fmin/fmax'. Fix radar:15283121 llvm-svn: 197434	2013-12-16 22:42:40 +00:00
Juergen Ributzka	9ed985baad	[Stackmap] Allow WebKit_JS calling convention to store 4 byte sized and aligned arguments. This allows the WebKit_JS calling convention to perform partial writes on a 4 byte granularity to stack slots. llvm-svn: 197431	2013-12-16 22:05:32 +00:00
Matt Arsenault	cb34f84e39	Fix typo in instruction name. SI_KIL -> SI_KILL llvm-svn: 197425	2013-12-16 20:58:33 +00:00
Juergen Ributzka	b1612c18ab	[Stackmap] The first integer argument is passed in register for the WebKit_JS calling convention. Pass the first integer argument (callee) in register to optimize inline caches. llvm-svn: 197416	2013-12-16 19:53:31 +00:00
Rafael Espindola	e89b41495a	One last cleanup of LLVM's DataLayout strings. Produce them in the same order on every target. The order is that of getStringRepresentation: e\|E-i-f-v-a-s-n-S*. llvm-svn: 197411	2013-12-16 19:31:14 +00:00
Rafael Espindola	0eb1ebeaac	Structure R600's computeDataLayout more like every other target. While there, simplify "p3:32:32:32" to "p3:32:32". llvm-svn: 197407	2013-12-16 19:18:57 +00:00
Joerg Sonnenberger	8fe41b7319	Recognize EABIHF as environment and use it for RTAPI + VFP. llvm-svn: 197405	2013-12-16 18:51:28 +00:00
Chad Rosier	5f87edb484	[AArch64] Fix v1fx patterns for Floating-point Multiply Extend and Floating-point Compare to Zero. llvm-svn: 197402	2013-12-16 18:29:35 +00:00
Rafael Espindola	bccb9d45ad	The preferred alignment defaults to the abi alignment. Omit if it is the same. llvm-svn: 197400	2013-12-16 18:01:51 +00:00
Rafael Espindola	f057093fdc	Don't duplicate the DataLayout defaults for integer, floats and vectors. llvm-svn: 197398	2013-12-16 17:41:15 +00:00
Rafael Espindola	8afbb28cea	On DataLayout, omit the default of p:64:64:64. llvm-svn: 197397	2013-12-16 17:15:29 +00:00
Hal Finkel	0a576d52fa	Set has_asmparser in PowerPC/LLVMBuild.txt PowerPC now has an asm parser (and has for many months now); indicate this in PowerPC/LLVMBuild.txt. llvm-svn: 197393	2013-12-16 15:48:09 +00:00
Elena Demikhovsky	47fc44e52e	AVX-512: Added legal type MVT::i1 and VK1 register for it. Added scalar compare VCMPSS, VCMPSD. Implemented LowerSELECT for scalar FP operations. I replaced FSETCCss, FSETCCsd with one node type FSETCCs. Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1. llvm-svn: 197384	2013-12-16 13:52:35 +00:00
Evgeniy Stepanov	a1df6379a6	Fix Android regression in r197332. llvm-svn: 197366	2013-12-16 07:02:51 +00:00
Hao Liu	774cabb538	[AArch64]Fix the pattern match failure for v1i8/v1i16/v1i32 types. Currently we have such types as legal vector types. The DAG combiner may generate some DAG nodes having such types but we don't have patterns to match them. E.g. a load i32 and a bitcast i32 to v1i32 will be combined into a load v1i32: bitcast (load i32) to v1i32 -> load v1i32. So this patch fixes such problems for load/dup instructions. If v1i8/v1i16/v1i32 are not legal any more, the code in this patch can be deleted. So I also add some FIXME. llvm-svn: 197361	2013-12-16 02:51:28 +00:00
Reed Kotler	b69ea1e92e	remove an uneeded statement (condition is covered by the statement that follows). llvm-svn: 197358	2013-12-15 23:33:59 +00:00
Reed Kotler	06b3c4f484	Fix some indentation. llvm-svn: 197357	2013-12-15 23:03:35 +00:00
Reed Kotler	4d030b4e89	Get rid of an superfluous tab in the .s file. This was originally part of a multi-line pseudo which worked around a linker bug for mips16. llvm-svn: 197356	2013-12-15 22:02:31 +00:00
Reed Kotler	5c29d63a66	Last change for mips16 prolog/epilog cleanup and optimization. Some tiny cosmetic code changes to follow. Because of the wide ranging nature of the patch a full 24 test cycle was needed to check against regression. This was the smallest patch I could make to progress from the earlier ones in the series. llvm-svn: 197350	2013-12-15 20:49:30 +00:00
Joerg Sonnenberger	ddb582896a	There is no exp10 on NetBSD. llvm-svn: 197348	2013-12-15 20:36:17 +00:00
Joerg Sonnenberger	7466979f20	Replace string matching with a switch on Triple::getEnvironment. llvm-svn: 197332	2013-12-15 00:12:52 +00:00
Matt Arsenault	52226f9a8e	Don't manually calculate size in bytes llvm-svn: 197327	2013-12-14 18:21:59 +00:00
Iain Sandoe	e0b4cb62f5	[Powerpc darwin] AsmParser Base implementation. This is a base implementation of the powerpc-apple-darwin asm parser dialect. * Enables infrastructure (essentially isDarwin()) and fixes up the parsing of asm directives to separate out ELF and MachO/Darwin additions. * Enables parsing of {r,f,v}XX as register identifiers. * Enables parsing of lo16() hi16() and ha16() as modifiers. The changes to the test case are from David Fang (fangism). llvm-svn: 197324	2013-12-14 13:34:02 +00:00
Juergen Ributzka	36f4619753	[Stackmap] Only the AnyReg calling convention should preserve all registers. llvm-svn: 197316	2013-12-14 06:52:59 +00:00
Rafael Espindola	456f047546	Refactor NVPTX's computeDataLayout. No functionality change. llvm-svn: 197312	2013-12-14 06:42:48 +00:00
Rafael Espindola	307d7abc7f	Turn NVPTXSubtarget::getDataLayout into a static function. No functionality change. llvm-svn: 197311	2013-12-14 06:36:30 +00:00
Rafael Espindola	ceb0c4962a	Turn AMDGPUSubtarget::getDataLayout into a static function. No functionality change. llvm-svn: 197310	2013-12-14 06:13:44 +00:00
Kevin Enderby	651898c19f	Fixed a bug in getARMFixupKindMachOInfo() where three ARM fixup kinds were falling into the cases for 24-bit branch kinds which are not 24-bit branches. The routine is to return false for fixups are expected to always be resolvable at assembly time. Which these three fixups are as they have limited displacement and are for local references within a function. rdar://15586725 llvm-svn: 197282	2013-12-13 22:46:54 +00:00
Chad Rosier	e139dd4fe6	[AArch64] Simplify the Neon Scalar3Same patterns for floating-point reciprocal step, floating-point reciprocal square root step, floating-point absolute difference, and integer/floating-point compare instructions. Also, move the scalar general arithmetic operation patterns closer to similar code. No functional change intended. llvm-svn: 197250	2013-12-13 17:56:44 +00:00
Rafael Espindola	1caa693a7b	Assume defaults to produce smaller datalayout strings. llvm-svn: 197249	2013-12-13 17:56:11 +00:00
Rafael Espindola	dfc1470d2d	Fix pr18235. The cpp backend is not a reasonable fallback for a missing target. It is a very special backend, so it is reasonable to use it only if explicitly requested. While at it, simplify the interface a bit. llvm-svn: 197241	2013-12-13 16:05:32 +00:00
Richard Sandiford	0847c450b6	[SystemZ] Optimize X [!=]= Y in cases where X - Y or Y - X is also computed In those cases it's better to compare the result of the subtraction against zero. llvm-svn: 197239	2013-12-13 15:50:30 +00:00
Richard Sandiford	c3dc44781b	[SystemZ] Make more use of TMHH This originally came about after noticing that InstCombine turns some of the TMHH (icmp (and...), ...) tests into plain comparisons. Since there is no instruction to compare with a 64-bit immediate, TMHH is generally better than an ordered comparison for the cases that it can handle. llvm-svn: 197238	2013-12-13 15:46:55 +00:00
Iain Sandoe	680385830f	test commit. Amend a comment. llvm-svn: 197237	2013-12-13 15:46:48 +00:00
Richard Sandiford	57485472e2	[SystemZ] Extend integer absolute selection This patch makes more use of LPGFR and LNGFR. It builds on top of the LTGFR selection from r197234. Most of the tests are motivated by what InstCombine would produce. llvm-svn: 197236	2013-12-13 15:35:00 +00:00
Richard Sandiford	d420f7344f	[SystemZ] Add a structure to represent a selected comparison ...in an attempt to rein back the increasingly complex selection code. A knock-on effect is that ICmpType is exposed from the outset, which slightly simplifies adjustSubwordCmp. The code is no piece of art even after this change, but at least it should be slightly better. No behavioral change intended. llvm-svn: 197235	2013-12-13 15:28:45 +00:00
Richard Sandiford	bd2f0e9cd0	[SystemZ] Make more use of LTGFR InstCombine turns (sext (trunc)) into (ashr (shl)), then converts any comparison of the ashr against zero into a comparison of the shl against zero. This makes sense in itself, but we want to undo it for z, since the sign- extension instruction has a CC-setting form. I've included tests for both the original and InstCombined variants, but the former already worked. The patch fixes the latter. llvm-svn: 197234	2013-12-13 15:07:39 +00:00
Benjamin Kramer	e723bb10b0	X86: When lowering shl_parts, don't emit shift amounts larger than the bit width. While it's safe for the X86-specific shift nodes, dag combining will kill generic nodes. Insert an AND to make it safe, isel will nuke it as x86's shift instructions have an implicit AND. Fixes PR16108, which contains a contraption to hit this case in between constant folders. llvm-svn: 197228	2013-12-13 13:40:24 +00:00
Joerg Sonnenberger	002a14765e	Enabling thumb2 mode used to force support for armv6t2. Replace this with a temporary assertion and adjust the various test cases. llvm-svn: 197224	2013-12-13 11:16:00 +00:00
Matheus Almeida	e0d75aacf1	[mips] Add checks for alignment and maximum displacements for most of the branch instructions for mips and micromips instruction sets thus avoiding the situation of generating branches to undesired locations if offsets cannot be encoded. This patch also checks if a fixup cannot be applied and returns a fatal error if that's the case. llvm-svn: 197223	2013-12-13 11:11:02 +00:00
Kai Nacke	87b23aec08	Change stack probing code for MingW. Since gcc 4.6 the compiler uses ___chkstk_ms which has the same semantics as the MS CRT function __chkstk. This simplifies the prologue generation a bit. Reviewed by Rafael Espíndola. llvm-svn: 197205	2013-12-13 05:37:05 +00:00
Rafael Espindola	720ae4f885	Simplify the datalayout string of ARM and AArch64. No functionality change. Reviewed by Tim Northover. llvm-svn: 197172	2013-12-12 17:43:37 +00:00
Rafael Espindola	3db958387f	Simplify the SystemZ datalayout string. Reviewed by Richard Sandiford. llvm-svn: 197170	2013-12-12 17:30:07 +00:00
Rafael Espindola	e8f4d58700	Use "a" instead of "a0" in DataLayout. It means exactly the same and is just a bit shorter. llvm-svn: 197169	2013-12-12 17:21:51 +00:00
Rafael Espindola	32cb5ac904	Switch to the new MingW ABI. GCC 4.7 changed the MingW ABI. On the LLVM side it means that sret functions don't pop the stack. llvm-svn: 197163	2013-12-12 16:06:58 +00:00
Chad Rosier	4055f42d22	[AArch64] Removed unnecessary copy patterns with v1fx types. - Copy patterns with float/double types are enough. - Fix typos in test case names that were using v1fx. - There is no ACLE intrinsic that uses v1f32 type. And there is no conflict of neon and non-neon ovelapped operations with this type, so there is no need to support operations with this type. - Remove v1f32 from FPR32 register and disallow v1f32 as a legal type for operations. Patch by Ana Pazos! llvm-svn: 197159	2013-12-12 15:46:29 +00:00
Andrea Di Biagio	9b5c3dcf01	Added new X86 patterns to select SSE scalar fp arithmetic instructions from a vector packed single/double fp operation followed by a vector insert. The effect is that the backend coverts the packed fp instruction followed by a vectro insert into a SSE or AVX scalar fp instruction. For example, given the following code: __m128 foo(__m128 A, __m128 B) { __m128 C = A + B; return (__m128) {c[0], a[1], a[2], a[3]}; } previously we generated: addps %xmm0, %xmm1 movss %xmm1, %xmm0 we now generate: addss %xmm1, %xmm0 llvm-svn: 197145	2013-12-12 11:50:47 +00:00
Gabor Greif	5fde43bf2e	typo in comment llvm-svn: 197136	2013-12-12 08:00:34 +00:00
Hao Liu	46a10eec28	[AArch64]Fix the problem that AArch64 backend fails to select scalar_to_vector of vector types having more than one element. llvm-svn: 197135	2013-12-12 07:36:26 +00:00
Reed Kotler	3230e725aa	Check for null pointer before dereferencing. A careless typo on my part. I don't know why this did not show up earlier. This code has been around for ages. llvm-svn: 197119	2013-12-12 02:41:11 +00:00
Yi Jiang	f92a574246	Resubmit r196544: Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x) llvm-svn: 197109	2013-12-12 01:55:04 +00:00
Hal Finkel	fa50630e43	Remove unused multiclass from PPCInstrInfo.td llvm-svn: 197100	2013-12-12 00:23:29 +00:00
Hal Finkel	ceb1f12d9a	Improve instruction scheduling for the PPC POWER7 Aside from a few minor latency corrections, the major change here is a new hazard recognizer which focuses on better dispatch-group formation on the POWER7. As with the PPC970's hazard recognizer, the most important thing it does is avoid load-after-store hazards within the same dispatch group. It uses the POWER7's special dispatch-group-terminating nop instruction (instead of inserting multiple regular nop instructions). This new hazard recognizer makes use of the scheduling dependency graph itself, built using AA information, to robustly detect the possibility of load-after-store hazards. significant test-suite performance changes (the error bars are 99.5% confidence intervals based on 5 test-suite runs both with and without the change -- speedups are negative): speedups: MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 -0.55171% +/- 0.333168% MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl -17.5576% +/- 14.598% MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl -29.5708% +/- 7.09058% MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt -34.9471% +/- 11.4391% SingleSource/Benchmarks/BenchmarkGame/puzzle -25.1347% +/- 11.0104% SingleSource/Benchmarks/Misc/flops-8 -17.7297% +/- 9.79061% SingleSource/Benchmarks/Shootout-C++/ary3 -35.5018% +/- 23.9458% SingleSource/Regression/C/uint64_to_float -56.3165% +/- 25.4234% SingleSource/UnitTests/Vectorizer/gcc-loops -18.5309% +/- 6.8496% regressions: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 18.351% +/- 12.156% SingleSource/Benchmarks/Shootout-C++/methcall 27.3086% +/- 14.4733% llvm-svn: 197099	2013-12-12 00:19:11 +00:00
Chad Rosier	446d8ea0fb	[AArch64] Refactor NEON floating-point Max/Min/Maxnm/Minnm across vector AArch64 intrinsics to use f32 types, rather than their vector equivalents. llvm-svn: 197090	2013-12-11 23:21:25 +00:00
Hal Finkel	94a6f380bb	Fix the PPC subsumes-predicate check For one predicate to subsume another, they must both check the same condition register. Failure to check this prerequisite was causing miscompiles. Fixes PR18003. llvm-svn: 197089	2013-12-11 23:12:25 +00:00
Chad Rosier	088f93d4b5	[AArch64] Add NEON scalar floating-point compare LLVM AArch64 intrinsics that use f32/f64 types, rather than their vector equivalents. llvm-svn: 197068	2013-12-11 21:03:46 +00:00
Chad Rosier	473a01e1c9	[AArch64] Refactor the NEON scalar floating-point reciprocal step and floating-point reciprocal square root step LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector equivalents. llvm-svn: 197067	2013-12-11 21:03:43 +00:00
Chad Rosier	7098fcc062	[AArch64] Refactor the NEON scalar floating-point reciprocal estimate, floating- point reciprocal exponent, and floating-point reciprocal square root estimate LLVM AArch64 intrinsics to use f32/f64 types, rather than their vector equivalents. llvm-svn: 197066	2013-12-11 21:03:40 +00:00
Rafael Espindola	009e758628	Don't set unused variable. llvm-svn: 197064	2013-12-11 20:40:57 +00:00
Tom Stellard	d7e146ede6	R600: Re-format Processors.td This makes it a little easier to read. Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 197058	2013-12-11 17:51:51 +00:00
Tom Stellard	f2ba972af6	R600: Register AMDGPUCFGStructurizer pass This enables -print-before-all to dump MachineInstrs after it is run. Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 197057	2013-12-11 17:51:47 +00:00
Tom Stellard	1de5582d06	R600: Register R600EmitClauseMarkers pass This enables -print-before-all to dump MachineInstrs after it is run. Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 197056	2013-12-11 17:51:41 +00:00
Logan Chien	439e8f9e38	[arm] Implement ARM .arch directive. llvm-svn: 197052	2013-12-11 17:16:25 +00:00
Tim Northover	76fc8a4c40	ARM: constrain register-class in fast-isel The tests were no longer using fast-isel at all (MachO needs an "ios" rather than "darwin" triple at the moment and Linux needs ARM mode). Once that was corrected, the verifier complained about a t2ADDri created for the alloca. llvm-svn: 197046	2013-12-11 16:04:57 +00:00
Elena Demikhovsky	cf08809813	AVX-512: Removed "z" suffix from AVX-512 instructions, since it is incompatible with GCC. I moved a test from avx512-vbroadcast-crash.ll to avx512-vbroadcast.ll I defined HasAVX512 predicate as AssemblerPredicate. It means that you should invoke llvm-mc with "-mcpu=knl" to get encoding for AVX-512 instructions. I need this to let AsmMatcher to set different encoding for AVX and AVX-512 instructions that have the same mnemonic and operands (all scalar instructions). llvm-svn: 197041	2013-12-11 14:31:04 +00:00
Richard Sandiford	73170f8488	[SystemZ] Optimize fcmp X, 0 in cases where X is also negated In such cases it's often better to test the result of the negation instead, since the negation also sets CC. llvm-svn: 197032	2013-12-11 11:45:08 +00:00
Reed Kotler	5bde5c35f4	Distinguish and choose 16 or 32 bit forms of save/restore for Mips16. llvm-svn: 196999	2013-12-11 03:32:44 +00:00
Kevin Qin	310b6c08ba	[AArch64 NEON] Get instruction BSL matched to VSELECT. llvm-svn: 196998	2013-12-11 02:33:50 +00:00
Rafael Espindola	b2fb78d45a	Move mips' datalayout computation out of line and add comments. llvm-svn: 196996	2013-12-11 01:41:10 +00:00
Rafael Espindola	60f48e5a67	Move Sparc's getDataLayout out of line and add comments. llvm-svn: 196990	2013-12-11 01:07:43 +00:00
NAKAMURA Takumi	8bc9bfaa5a	Prune redundant dependencies in LLVMBuild.txt. llvm-svn: 196988	2013-12-11 00:30:57 +00:00
Rafael Espindola	5b3585871b	Move PPC's getDataLayoutString out of line and document it better. llvm-svn: 196987	2013-12-11 00:09:06 +00:00
Reid Kleckner	ad92aca47c	Revert the backend fatal error from r196939 The combination of inline asm, stack realignment, and dynamic allocas turns out to be too common to reject out of hand. ASan inserts empy inline asm fragments and uses aligned allocas. Compiling any trivial function containing a dynamic alloca with ASan is enough to trigger the check. XFAIL the test cases that would be miscompiled and add one that uses the relevant functionality. llvm-svn: 196986	2013-12-10 23:23:52 +00:00
Rafael Espindola	002f8aa584	Refactor the computation of the x86 datalayout. llvm-svn: 196976	2013-12-10 22:05:32 +00:00
Matt Arsenault	eaa3a7efab	Use llvm_unreachable instead of assert(0) llvm-svn: 196971	2013-12-10 21:37:42 +00:00
David Fang	1b01849f2d	on darwin<10, fallback to .weak_definition (PPC,X86) .weak_def_can_be_hidden was not yet supported by the system assembler llvm-svn: 196970	2013-12-10 21:37:41 +00:00
Chad Rosier	f70af21651	[AArch64] Refactor the NEON floating-point absolute difference LLVM AArch64 intrinsic to use f32/f64 types, rather than their vector equivalents. llvm-svn: 196965	2013-12-10 21:33:59 +00:00
Chad Rosier	07cc3f9100	[AArch64] Refactor the NEON signed/unsigned floating-point convert to fixed-point LLVM AArch64 intrinsics to use f32/f64, rather than their vector equivalents. llvm-svn: 196964	2013-12-10 21:33:56 +00:00
Chad Rosier	98b8baa35c	[AArch64] Overload NEON signed/unsigned floating-point convert to fixed-point and fixed-point convert to floating-point LLVM AArch64 intrinsics. llvm-svn: 196963	2013-12-10 21:33:53 +00:00
Chad Rosier	cc34d187b8	[AArch64] Overload NEON signed/unsigned integer convert to floating-point LLVM AArch64 intrinsics. llvm-svn: 196962	2013-12-10 21:33:50 +00:00
Reid Kleckner	ee08897fb8	Reland "Fix miscompile of MS inline assembly with stack realignment" This re-lands commit r196876, which was reverted in r196879. The tests have been fixed to pass on platforms with a stack alignment larger than 4. Update to clang side tests will land shortly. llvm-svn: 196939	2013-12-10 18:27:32 +00:00
Tim Northover	9653eb5759	Make Triple's isOSBinFormatXXX functions partition triple-space. Most users would be surprised if "isCOFF" and "isMachO" were simultaneously true, unless they'd put the compiler in a box with a gun attached to a photon detector. This makes sure precisely one of the three formats is true for any triple and simplifies some target logic based on that. llvm-svn: 196934	2013-12-10 16:57:43 +00:00
Chad Rosier	7a9bba442f	[AArch64] Refactor the Neon vector/scalar floating-point convert intrinsics so that they use float/double rather than the vector equivalents when appropriate. llvm-svn: 196930	2013-12-10 16:11:39 +00:00
Chad Rosier	fcc4c366d1	[AArch64] Refactor the Neon vector/scalar floating-point convert implementation. Specifically, reuse the ARM intrinsics when possible. llvm-svn: 196926	2013-12-10 15:35:33 +00:00
Andrea Di Biagio	f7c33c8162	Ensure that the backend no longer emits unnecessary vector insert instructions immediately after SSE scalar fp instructions like addss or mulss. Added patterns to select SSE scalar fp arithmetic instructions from a scalar fp operation followed by a blend. For example, given the following code: __m128 foo(__m128 A, __m128 B) { A[0] += B[0]; return A; } previously we generated: addss %xmm0, %xmm1 movss %xmm1, %xmm0 now we generate: addss %xmm1, %xmm0 llvm-svn: 196925	2013-12-10 15:22:48 +00:00
Vincent Lejeune	cc0ea74c7b	R600: Fix an infinite loop when trying to reorganize export/tex vector input llvm-svn: 196923	2013-12-10 14:43:31 +00:00
Vincent Lejeune	f92d64d160	R600: Fix input modifiers lost for Cayman llvm-svn: 196922	2013-12-10 14:43:27 +00:00
Reed Kotler	0ff4001781	Next step in Mips16 prologue/epilogue cleanup. Save S2(reg 18) only when we are calling floating point stubs that have a return value of float or complex. Some more work to make this better but this is the first step. llvm-svn: 196921	2013-12-10 14:29:38 +00:00
Elena Demikhovsky	e382c3fdcd	AVX-512: changed intrinsics for mask operations llvm-svn: 196918	2013-12-10 13:53:10 +00:00
Elena Demikhovsky	6270b388c8	AVX-512: Changed intrinsics of VPCONFLICT to match GCC builtin form llvm-svn: 196914	2013-12-10 11:58:35 +00:00
Daniel Sanders	c309be2f1f	[mips][msa] Correct sld and sldi builtins. Summary: The result register of these instructions is also the first operand. Reviewers: jacksprat, dsanders Reviewed By: dsanders Differential Revision: http://llvm-reviews.chandlerc.com/D2362 Differential Revision: http://llvm-reviews.chandlerc.com/D2363 llvm-svn: 196910	2013-12-10 11:37:00 +00:00
Richard Sandiford	bef3d7af2b	Add TargetLowering::prepareVolatileOrAtomicLoad One unusual feature of the z architecture is that the result of a previous load can be reused indefinitely for subsequent loads, even if a cache-coherent store to that location is performed by another CPU. A special serializing instruction must be used if you want to force a load to be reattempted. Since volatile loads are not supposed to be omitted in this way, we should insert a serializing instruction before each such load. The same goes for atomic loads. The patch implements this at the IR->DAG boundary, in a similar way to atomic fences. It is a no-op for targets other than SystemZ. llvm-svn: 196906	2013-12-10 10:49:34 +00:00
Richard Sandiford	9afe613d12	Add TargetLowering::prepareVolatileOrAtomicLoad One unusual feature of the z architecture is that the result of a previous load can be reused indefinitely for subsequent loads, even if a cache-coherent store to that location is performed by another CPU. A special serializing instruction must be used if you want to force a load to be reattempted. Since volatile loads are not supposed to be omitted in this way, we should insert a serializing instruction before each such load. The same goes for atomic loads. The patch implements this at the IR->DAG boundary, in a similar way to atomic fences. It is a no-op for targets other than SystemZ. llvm-svn: 196905	2013-12-10 10:36:34 +00:00
Kevin Qin	43385c7065	[AArch64 NEON] Replace fpimm with fpz32 for floating compare with zero. This is a small change to be strict. Just want get pattern safer. llvm-svn: 196889	2013-12-10 06:51:07 +00:00
Kevin Qin	04396d1e69	[AArch64 NEON] Support poly128_t and implement relevant intrinsic. llvm-svn: 196887	2013-12-10 06:48:35 +00:00
NAKAMURA Takumi	396d4d3c7e	Add proper dependencies to LLVMBuild.txt in llvm/lib. I'll prune redundant deps in LLVMBuild.txt, later. llvm-svn: 196881	2013-12-10 05:39:34 +00:00
NAKAMURA Takumi	e3afe2ef62	Whitespaces. llvm-svn: 196880	2013-12-10 05:39:12 +00:00
Reid Kleckner	0a9509f080	Revert "Fix miscompile of MS inline assembly with stack realignment" This reverts commit r196876. Its tests failed on the bots, so I'll figure it out tomorrow. llvm-svn: 196879	2013-12-10 05:31:27 +00:00
Reid Kleckner	7f10a8cd45	Fix miscompile of MS inline assembly with stack realignment For stack frames requiring realignment, three pointers may be needed: - ebp to address incoming arguments - esi (could be any callee-saved register) to address locals - esp to address outgoing arguments We would use esi unconditionally without verifying that it did not conflict with inline assembly. This change doesn't do the verification, it simply emits a fatal error on functions that use stack realignment, dynamic SP adjustments, and inline assembly. Because stack realignment is common on Windows, we also no longer assume that MS inline assembly clobbers esp. Instead, we analyze the inline instructions for implicit definitions and check if esp is there. If so, we require the use of a base pointer and consider it in the condition above. Mostly fixes PR16830, but we could try harder to find a non-conflicting base pointer. Reviewers: sunfish Differential Revision: http://llvm-reviews.chandlerc.com/D1317 llvm-svn: 196876	2013-12-10 05:12:23 +00:00
Rafael Espindola	1d224bd65f	Add comments documenting the ARM datalayout string. llvm-svn: 196850	2013-12-10 00:37:37 +00:00
Rafael Espindola	74d682b443	Simplify further. Thanks to Jim Grosbach for noticing it. llvm-svn: 196846	2013-12-10 00:15:35 +00:00
Rafael Espindola	964bf07fb8	Refactor the construction of the DataLayout string on ARM. llvm-svn: 196843	2013-12-09 23:56:41 +00:00
Chad Rosier	5c8bf9c3db	[AArch64] Refactor the NEON scalar reduce pairwise intrinsics, so that they use float/double rather than the vector equivalents when appropriate. llvm-svn: 196833	2013-12-09 22:47:38 +00:00
Chad Rosier	3b0b3ee71e	[AArch64] Refactor NEON scalar reduce pairwise front-end codegen to remove unnecessary patterns in tablegen. llvm-svn: 196832	2013-12-09 22:47:34 +00:00
Chad Rosier	397ff3945c	[AArch64] Remove q and non-q intrinsic definitions in the NEON scalar reduce pairwise implementation, using an overloaded definition instead. llvm-svn: 196831	2013-12-09 22:47:31 +00:00
Reed Kotler	b102fa5aef	get rid of superfluous comment llvm-svn: 196829	2013-12-09 22:08:32 +00:00
Reed Kotler	2e362b3b4b	Delete some old code used for testing that is not needed anymore. This is part of the mips16 epilogue/prologue cleanup. llvm-svn: 196824	2013-12-09 21:19:51 +00:00
Rafael Espindola	1a3a22fad1	Don't add suffixes for stdcall/fastcall on 64 coff. This matches the behavior of both msvc and mingw. llvm-svn: 196814	2013-12-09 20:44:48 +00:00
Rafael Espindola	e2a1418e68	Don't set a variable to its default value. llvm-svn: 196807	2013-12-09 19:36:11 +00:00
Ana Pazos	bde2828ae0	Fix pattern match for movi with 0D result Patch by Jiangning Liu. With some test case changes: - intrinsic test added to the existing /test/CodeGen/AArch64/neon-aba-abd.ll. - New test cases to cover movi 1D scenario without using the intrinsic in test/CodeGen/AArch64/neon-mov.ll. llvm-svn: 196806	2013-12-09 19:29:14 +00:00
Daniel Sanders	3519dce968	[mips][msa] Fix invalid generated code when lowering FrameIndex involving unaligned offsets. Summary: The MSA ld.[bhwd] and st.[bhwd] instructions scale the immediate by the element size before use as an offset. The offset must therefore be a multiple of the element size to be valid in these instructions. However, an unaligned base address is valid in MSA. This commit causes the compiler to emit valid code when the calculated offset is not a multiple of the element size by accounting for the offset using addiu and using a zero offset in the load/store. Depends on D2338 Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://llvm-reviews.chandlerc.com/D2339 llvm-svn: 196777	2013-12-09 12:47:12 +00:00
Daniel Sanders	26a5a7475e	[mips][msa] Fix suboptimal FrameIndex lowering for ld.[hwd] and st.[hwd] Summary: The immediate in these instructions is scaled before use as an offset. They therefore have a wider reach than ld.b/st.b. Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://llvm-reviews.chandlerc.com/D2338 llvm-svn: 196775	2013-12-09 11:50:16 +00:00
Vladimir Medic	0d02be37c2	Method parseSetAssignment treats every operand with '$' sign as register and the parsing is directed to set alias for register. This will result in errors reported when expressions containing label references are parsed(for example long jumps) As we can't make a complete solution now it has been decided to enable .set directive to handle long jump expressions. This will cause parser to report errors when parsing integer based register assignments, for example: .set r3, will be reported as error. Still, the need for expressions is higher priority as the integer based register assignments are Mips specific and can be avoided using register names. llvm-svn: 196773	2013-12-09 11:03:25 +00:00
Venkatraman Govindaraju	61116e7084	[SPARCV9]: Adjust the resultant pointer of DYNAMIC_STACKALLOC with the stack BIAS on sparcV9. llvm-svn: 196755	2013-12-09 05:13:25 +00:00
Venkatraman Govindaraju	f6c8fe983b	[Sparc]: Implement getSetCCResultType() in SparcTargetLowering so that umulo/smulo can be lowered on sparcv9 without an assertion error. llvm-svn: 196751	2013-12-09 04:02:15 +00:00
Hao Liu	96a587a9f7	[AArch64]Add missing pair intrinsics such as: int32_t vminv_s32(int32x2_t a) which should be compiled into SMINP Vd.2S,Vn.2S,Vm.2S llvm-svn: 196749	2013-12-09 03:51:42 +00:00
Hao Liu	868caea6d1	[AArch64]Pattern match failures for truncate store and extend load llvm-svn: 196748	2013-12-09 03:34:08 +00:00
Venkatraman Govindaraju	72cc248524	[SparcV9]: Expand MULHU/MULHS:i64 and UMUL_LOHI/SMUL_LOHI:i64 on sparcv9. This fixes PR18150. llvm-svn: 196735	2013-12-08 22:06:07 +00:00
Manman Ren	2e06c8c777	Revert 196544 due to internal bot failures. llvm-svn: 196732	2013-12-08 20:28:33 +00:00
Reed Kotler	abaed9ecea	Make sure we mark these registers as defined. Previously was done in the td file. llvm-svn: 196731	2013-12-08 19:21:47 +00:00
Reed Kotler	e0a34ee66e	Cleaning up of prologue/epilogue code for Mips16. First step here is to make save/restore into variable number of argument instructions. llvm-svn: 196726	2013-12-08 16:51:52 +00:00
Tim Northover	a4173715f7	ARM: fix folding of stack-adjustment (yet again). When trying to eliminate an "sub sp, sp, #N" instruction by folding it into an existing push/pop using dummy registers, we need to account for the fact that this might affect precisely how "fp" gets set in the prologue. We were attempting this, but assuming that whenever we performed a fold it would make a difference. This is false, for example, in: push {r4, r7, lr} add fp, sp, #4 vpush {d8} sub sp, sp, #8 we can fold the "sub" into the "vpush", forming "vpush {d7, d8}". However, in that case the "add fp" instruction mustn't change, which we were getting wrong before. Should fix PR18160. llvm-svn: 196725	2013-12-08 15:56:50 +00:00
Rafael Espindola	080133453b	Remove the notion of primitive types. They were out of place since the introduction of arbitrary precision integer types. This also synchronizes the documentation to Types.h, so it refers to first class types and single value types. llvm-svn: 196661	2013-12-07 19:34:20 +00:00
Vincent Lejeune	92b0a64906	Add a RequireStructuredCFG Field to TargetMachine. llvm-svn: 196634	2013-12-07 01:49:19 +00:00
Vincent Lejeune	ae7e96062c	R600: Remove orphaned declarations llvm-svn: 196633	2013-12-07 01:49:10 +00:00
Ana Pazos	93a07c2185	Added support for mcpu krait - krait processor currently modeled with the same features as A9. - Krait processor additionally has VFP4 (fused multiply add/sub) and hardware division features enabled. - krait has currently the same Schedule model as A9 - krait cpu flag is not recognized by the GNU assembler yet, it is replaced with march=armv7-a to avoid a lower march from being used. llvm-svn: 196619	2013-12-06 22:48:17 +00:00
Weiming Zhao	43d8e6cb3b	Bug 18149: [AArch32] VSel instructions has no ARMCC field The current peephole optimizing for compare inst assumes an instr that uses CPSR has an MO for ARM Cond code.However, for VSEL instructions (vseqeq, vselgt, vselgt, vselvs), there is no such operand nor do they support the modification of Cond Code. llvm-svn: 196588	2013-12-06 17:56:48 +00:00
Cameron McInally	e3cc4aacb9	Update AVX512 vector blend intrinsic names. llvm-svn: 196581	2013-12-06 13:35:35 +00:00
Richard Sandiford	198ddf83c1	[SystemZ] Use LOAD AND TEST for comparisons with -0 ...since it os equivalent to comparison with +0. llvm-svn: 196580	2013-12-06 09:59:12 +00:00
Richard Sandiford	7b4118a0fc	[SystemZ] Extend the use of C(L)GFR instcombine prefers to put extended operands first, so this patch handles that case for C(L)GFR. llvm-svn: 196579	2013-12-06 09:56:50 +00:00
Richard Sandiford	48ef6abddc	[SystemZ] Optimize selects between 0 and -1 Since z has no setcc instruction as such, the choice of setBooleanContents is a bit arbitrary. Currently it's set to ZeroOrOneBooleanContent, so we produced a branch-free form when selecting between 0 and 1, but not when selecting between 0 and -1. This patch handles the latter case too. At some point I'd like to measure whether it's better to use conditional moves for constant selects on z196, but that's future work. llvm-svn: 196578	2013-12-06 09:53:09 +00:00
Eric Christopher	99952a0823	Fix an index array check. Patch by Marius Wachtler. llvm-svn: 196561	2013-12-06 02:45:24 +00:00
Reed Kotler	2db182b5e8	Delete dead code. llvm-svn: 196551	2013-12-06 00:13:50 +00:00
Yi Jiang	01cfa94212	Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) ―> __exp10(x) llvm-svn: 196544	2013-12-05 22:42:50 +00:00
Ana Pazos	6b0a8c50dd	Implemented vget/vset_lane_f16 intrinsics llvm-svn: 196533	2013-12-05 21:07:49 +00:00
Andrew Trick	880e573d98	MI-Sched: handle latency of in-order operations with the new machine model. The per-operand machine model allows the target to define "unbuffered" processor resources. This change is a quick, cheap way to model stalls caused by the latency of operations that use such resources. This only applies when the processor's micro-op buffer size is non-zero (Out-of-Order). We can't precisely model in-order stalls during out-of-order execution, but this is an easy and effective heuristic. It benefits cortex-a9 scheduling when using the new machine model, which is not yet on by default. MI-Sched for armv7 was evaluated on Swift (and only not enabled because of a performance bug related to predication). However, we never evaluated Cortex-A9 performance on MI-Sched in its current form. This change adds MI-Sched functionality to reach performance goals on A9. The only remaining change is to allow MI-Sched to run as a PostRA pass. I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7: -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false For a simple saxpy loop I see a 1.7x speedup. Here are the llvm-testsuite results: (min run time over 2 runs, filtering tiny changes) Speedups: \| Benchmarks/BenchmarkGame/recursive \| 52.39% \| \| Benchmarks/VersaBench/beamformer \| 20.80% \| \| Benchmarks/Misc/pi \| 19.97% \| \| Benchmarks/Misc/mandel-2 \| 19.95% \| \| SPEC/CFP2000/188.ammp \| 18.72% \| \| Benchmarks/McCat/08-main/main \| 18.58% \| \| Benchmarks/Misc-C++/Large/sphereflake \| 18.46% \| \| Benchmarks/Olden/power \| 17.11% \| \| Benchmarks/Misc-C++/mandel-text \| 16.47% \| \| Benchmarks/Misc/oourafft \| 15.94% \| \| Benchmarks/Misc/flops-7 \| 14.99% \| \| Benchmarks/FreeBench/distray \| 14.26% \| \| SPEC/CFP2006/470.lbm \| 14.00% \| \| mediabench/mpeg2/mpeg2dec/mpeg2decode \| 12.28% \| \| Benchmarks/SmallPT/smallpt \| 10.36% \| \| Benchmarks/Misc-C++/Large/ray \| 8.97% \| \| Benchmarks/Misc/fp-convert \| 8.75% \| \| Benchmarks/Olden/perimeter \| 7.10% \| \| Benchmarks/Bullet/bullet \| 7.03% \| \| Benchmarks/Misc/mandel \| 6.75% \| \| Benchmarks/Olden/voronoi \| 6.26% \| \| Benchmarks/Misc/flops-8 \| 5.77% \| \| Benchmarks/Misc/matmul_f64_4x4 \| 5.19% \| \| Benchmarks/MiBench/security-rijndael \| 5.15% \| \| Benchmarks/Misc/flops-6 \| 5.10% \| \| Benchmarks/Olden/tsp \| 4.46% \| \| Benchmarks/MiBench/consumer-lame \| 4.28% \| \| Benchmarks/Misc/flops-5 \| 4.27% \| \| Benchmarks/mafft/pairlocalalign \| 4.19% \| \| Benchmarks/Misc/himenobmtxpa \| 4.07% \| \| Benchmarks/Misc/lowercase \| 4.06% \| \| SPEC/CFP2006/433.milc \| 3.99% \| \| Benchmarks/tramp3d-v4 \| 3.79% \| \| Benchmarks/FreeBench/pifft \| 3.66% \| \| Benchmarks/Ptrdist/ks \| 3.21% \| \| Benchmarks/Adobe-C++/loop_unroll \| 3.12% \| \| SPEC/CINT2000/175.vpr \| 3.12% \| \| Benchmarks/nbench \| 2.98% \| \| SPEC/CFP2000/183.equake \| 2.91% \| \| Benchmarks/Misc/perlin \| 2.85% \| \| Benchmarks/Misc/flops-1 \| 2.82% \| \| Benchmarks/Misc-C++-EH/spirit \| 2.80% \| \| Benchmarks/Misc/flops-2 \| 2.77% \| \| Benchmarks/NPB-serial/is \| 2.42% \| \| Benchmarks/ASC_Sequoia/CrystalMk \| 2.33% \| \| Benchmarks/BenchmarkGame/n-body \| 2.28% \| \| Benchmarks/SciMark2-C/scimark2 \| 2.27% \| \| Benchmarks/Olden/bh \| 2.03% \| \| skidmarks10/skidmarks \| 1.81% \| \| Benchmarks/Misc/flops \| 1.72% \| Slowdowns: \| Benchmarks/llubenchmark/llu \| -14.14% \| \| Benchmarks/Polybench/stencils/seidel-2d \| -5.67% \| \| Benchmarks/Adobe-C++/functionobjects \| -5.25% \| \| Benchmarks/Misc-C++/oopack_v1p8 \| -5.00% \| \| Benchmarks/Shootout/hash \| -2.35% \| \| Benchmarks/Prolangs-C++/ocean \| -2.01% \| \| Benchmarks/Polybench/medley/floyd-warshall \| -1.98% \| \| Polybench/linear-algebra/kernels/3mm \| -1.95% \| \| Benchmarks/McCat/09-vor/vor \| -1.68% \| llvm-svn: 196516	2013-12-05 17:55:58 +00:00
Andrew Trick	ff199a4b8e	Fix the A9 machine model. VTRN writes two registers. llvm-svn: 196514	2013-12-05 17:55:49 +00:00
Rafael Espindola	4cc2b87375	Add a default constructor to get deterministic behavior. Should fix the msan and valgrind bots. llvm-svn: 196509	2013-12-05 16:21:17 +00:00
Justin Holewinski	4459717bab	[NVPTX] Fix off-by-one error when creating the VT list for an SDNode llvm-svn: 196503	2013-12-05 12:58:00 +00:00
Matheus Almeida	a6beac1acc	[mips] Small code generation improvement for conditional operator (select) in case the operands are constants and its difference is \|1\|. It should be possible in those cases to rematerialize the result using MIPS's slt and similar instructions. The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed otherwise the optimization implemented in this patch would have been triggered (difference between the operands was 1) and that would have changed the semantic of the tests. llvm-svn: 196498	2013-12-05 12:07:05 +00:00
Matheus Almeida	a611c0f405	[mips] Add some comments related to the optimization performed in performSELECTCombine. The structure of the code was slightly modified so that the next patch is easier to read/review. No functional changes. llvm-svn: 196496	2013-12-05 11:56:56 +00:00
Matheus Almeida	6b59c449d9	[mips][msa] Fix issue with immediate fields of LD/ST instructions not being correctly encoded/decoded. In more detail, immediate fields of LD/ST instructions should be divided/multiplied by the size of the data format before encoding and after decoding, respectively. llvm-svn: 196494	2013-12-05 11:06:22 +00:00
Tim Northover	e4def5e228	ARM: fix yet another stack-folding bug We were trying to fold the stack adjustment into the wrong instruction in the situation where the entire basic-block was epilogue code. Really, it can only ever be valid to do the folding precisely where the "add sp, ..." would be placed so there's no need for a separate iterator to track that. Should fix PR18136. llvm-svn: 196493	2013-12-05 11:02:02 +00:00
Rafael Espindola	117b20c492	Remove the isImplicitlyPrivate argument of getNameWithPrefix. getSymbolWithGlobalValueBase use is to create a name of a new symbol based on the name of an existing GV. Assert that and then remove the last call to pass true to isImplicitlyPrivate. This gives the mangler API a 1:1 mapping from GV to names, which is what we need to drop the mangler dependency on the target (and use an extended datalayout instead). llvm-svn: 196472	2013-12-05 05:53:12 +00:00
Alp Toker	f907b891da	Correct word hyphenations This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471	2013-12-05 05:44:44 +00:00
Rafael Espindola	01d19d0299	Hide the stub created for MO_ExternalSymbol too. given declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1) declare void @foo() define void @bar() { call void @foo() call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false) ret void } We used to produce L_foo$stub: .indirect_symbol _foo .ascii "\364\364\364\364\364" _memset$stub: .indirect_symbol _memset .ascii "\364\364\364\364\364" We not produce a private stub for memset too. Stubs are not needed with recent linkers, but we still produce them for darwin8. Thanks to David Fang for confirming that gcc used to do this too. llvm-svn: 196468	2013-12-05 05:19:12 +00:00
Matt Arsenault	89cc49fe5d	R600/SI: Add comments for number of used registers. llvm-svn: 196467	2013-12-05 05:15:35 +00:00
Jiangning Liu	65d8e3422a	For AArch64, add missing register cost calculation for big value types like v4i64 and v8i64. llvm-svn: 196456	2013-12-05 02:12:01 +00:00
Cameron McInally	30bbb214e5	Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load. Patch by Aleksey Bader. llvm-svn: 196435	2013-12-05 00:11:25 +00:00
Kevin Enderby	86496a45cb	Fix a bug in darwin's 32-bit X86 handling of evaluating fixups. Where it would use a scattered relocation entry but falls back to a normal relocation entry because the FixupOffset is more than 24-bits. The bug is in the X86MachObjectWriter::RecordScatteredRelocation() where it changes reference parameter FixedValue but then returns false to indicate it did not create a scattered relocation entry. The fix is simply to save the original value of the parameter FixedValue at the start of the method and restore it if we are returning false in that case. rdar://15526046 llvm-svn: 196432	2013-12-04 23:36:24 +00:00
David Peixotto	8ad70b3542	Add support for parsing ARM symbol variants on ELF targets ARM symbol variants are written with parens instead of @ like this: .word __GLOBAL_I_a(target1) This commit adds support for parsing these symbol variants in expressions. We introduce a new flag to MCAsmInfo that indicates the parser should use parens to parse the symbol variant. The expression parser is modified to look for symbol variants using parens instead of @ when the corresponding MCAsmInfo flag is true. The MCAsmInfo parens flag is enabled only for ARM on ELF. By adding this flag to MCAsmInfo, we are able to get rid of redundant ARM-specific symbol variants and use the generic variants instead (e.g. VK_GOT instead of VK_ARM_GOT). We use the new UseParensForSymbolVariant attribute in MCAsmInfo to correctly print the symbol variants for arm. To achive this we need to keep a handle to the MCAsmInfo in the MCSymbolRefExpr class that we can check when printing the symbol variant. Updated Tests: Changed case of symbol variant to match the generic kind. test/CodeGen/ARM/tls-models.ll test/CodeGen/ARM/tls1.ll test/CodeGen/ARM/tls2.ll test/CodeGen/Thumb2/tls1.ll test/CodeGen/Thumb2/tls2.ll PR18080 llvm-svn: 196424	2013-12-04 22:43:20 +00:00
Cameron McInally	cbb51dacfb	Fix assembly syntax for AVX512 vector blend instructions. llvm-svn: 196393	2013-12-04 18:05:36 +00:00
Michael Liao	9a0e3f4823	[X86] Check YMM31/ZMM31 as well - No test case as there's no calling convention preserve YMM31/ZMM31 only llvm-svn: 196391	2013-12-04 17:44:22 +00:00
Chad Rosier	1d22b5d1c0	Update the UseFusedMAC definition to directly specify its dependence on having VFP4. Patch by Daniel Stewart! llvm-svn: 196390	2013-12-04 17:16:36 +00:00
Cameron McInally	c5f420e129	Suppress '(x < y) ? a : 0 -> (x < y) & a' transform on X86 architectures with dedicated mask registers. Patch by Aleksey Bader. llvm-svn: 196386	2013-12-04 14:52:33 +00:00
Kevin Qin	afd095de8b	[AArch64 Neon] Add ACLE intrinsic vceqz_f64. llvm-svn: 196362	2013-12-04 08:02:34 +00:00
Kevin Qin	f9832e8de7	[AArch64 NEON] Add missing compare intrinsics. llvm-svn: 196360	2013-12-04 07:53:28 +00:00
Juergen Ributzka	17e0d9ee6c	[Stackmap] Emit multi-byte nops for X86. llvm-svn: 196334	2013-12-04 00:39:08 +00:00
Reed Kotler	59975c2cba	final patch for very long conditional branches for mips16 constant islands. this completes the basic port of ARM constant islands to Mips16. More testing, code review, cleanup is in order but basically everything seems to be working. A bug in gas is preventing some of the runtime testing but I hope to resolve this soon. llvm-svn: 196331	2013-12-03 23:42:51 +00:00
Rafael Espindola	0a2baf8eaf	Fix mingw32 thiscall + sret. Unlike msvc, when handling a thiscall + sret gcc will * Put the sret in %ecx * Put the this pointer is (%esp) This fixes, for example, calling stringstream::str. llvm-svn: 196312	2013-12-03 20:51:23 +00:00
James Molloy	8a25992f39	Addrspacecasts are no-ops on ARM. Testcase added. llvm-svn: 196269	2013-12-03 11:23:11 +00:00
Richard Sandiford	ccc2a7c1a0	[SystemZ] Fix choice of known-zero mask in insertion optimization The backend converts 64-bit ORs into subreg moves if the upper 32 bits of one operand and the low 32 bits of the other are known to be zero. It then tries to peel away redundant ANDs from the upper 32 bits. Since AND masks are canonicalized to exclude known-zero bits, the test ORs the mask and the known-zero bits together before checking for redundancy. The problem was that it was using the wrong node when checking for known-zero bits, so could drop ANDs that were still needed. llvm-svn: 196267	2013-12-03 11:01:54 +00:00
Michael Liao	14b02848a3	Enhance the fix of PR17631 - The fix to PR17631 fixes part of the cases where 'vzeroupper' should not be issued before 'call' insn. There're other cases where helper calls will be inserted not limited to epilog. These helper calls do not follow the standard calling convention and won't clobber any YMM registers. (So far, all call conventions will clobber any or part of YMM registers.) This patch enhances the previous fix to cover more cases 'vzerosupper' should not be inserted by checking if that function call won't clobber any YMM registers and skipping it if so. llvm-svn: 196261	2013-12-03 09:17:32 +00:00
Hao Liu	dca64f4a20	[AArch64]Add missing floating point convert, round and misc intrinsics. E.g. int64x1_t vcvt_s64_f64(float64x1_t a) -> FCVTZS Dd, Dn llvm-svn: 196210	2013-12-03 06:06:55 +00:00
Hao Liu	c250cbc095	AArch64: add missing ACLE intrinsics mapping to general arithmetic operation from VFP instructions. E.g. float64x1_t vadd_f64(float64x1_t a, float64x1_t b) -> FADD Dd, Dn, Dm. llvm-svn: 196208	2013-12-03 05:58:30 +00:00
NAKAMURA Takumi	bc815b2d21	Whitespace. llvm-svn: 196203	2013-12-03 05:28:27 +00:00
Hao Liu	21a461353a	AArch64: Add missing scalar pair intrinsics. E.g. "float32_t vaddv_f32(float32x2_t a)" to be matched into "faddp s0, v1.2s". llvm-svn: 196198	2013-12-03 03:39:47 +00:00
Jiangning Liu	3a541d46a1	Add some missing pattern matches for AArch64 Neon intrinsics like vuqadd_s64 and friends. llvm-svn: 196192	2013-12-03 01:33:52 +00:00
Jiangning Liu	94a7bb2130	Add some missing pattern matches for AArch64 Neon intrinsics like vmull_high_n_s16 and friends. llvm-svn: 196190	2013-12-03 01:29:32 +00:00
Rafael Espindola	20a8621e5f	Don't set PrivateGlobalPrefix for NVPTX and R600. These targets have special asm printers that don't use these. llvm-svn: 196187	2013-12-03 01:03:35 +00:00
Hal Finkel	563cc05cae	Remove PPCScoreboardHazardRecognizer PPCScoreboardHazardRecognizer was a subclass of ScoreboardHazardRecognizer which did only one thing: filtered out nodes in EmitInstruction for which DAG->getInstrDesc(SU) returned NULL. This used to be the case for PPC pseudo instructions. As far as I can tell, this is no longer true, and so we can use ScoreboardHazardRecognizer directly. llvm-svn: 196171	2013-12-02 23:52:46 +00:00
Rafael Espindola	5113d166f5	Refactor the setting of PrivateGlobalPrefix. No functionality change. llvm-svn: 196170	2013-12-02 23:39:26 +00:00
Rafael Espindola	5733d9bbb0	Don't set PrivateGlobalPrefix twice in the same function. llvm-svn: 196169	2013-12-02 23:26:31 +00:00
Rafael Espindola	04867ce9b0	Convert two char* that are only ever used as booleans to bool. llvm-svn: 196168	2013-12-02 23:04:51 +00:00
Chad Rosier	3106de3f9d	[AArch64] Implemented vcopy_lane patterns using scalar DUP instruction. Patch by Ana Pazos! llvm-svn: 196151	2013-12-02 21:05:16 +00:00
Vincent Lejeune	4b8d9e303c	R600: Workaround for cayman loop bug llvm-svn: 196121	2013-12-02 17:29:37 +00:00
Rafael Espindola	f4e6b29a03	Move getSymbolWithGlobalValueBase to TargetLoweringObjectFile. This allows it to be used in TargetLoweringObjectFileImpl.cpp. llvm-svn: 196117	2013-12-02 16:25:47 +00:00
Alp Toker	a5b88a5851	Introduce poor man's consumeToken() in X86AsmParser This makes the code a little more idiomatic. No change in behaviour. llvm-svn: 196113	2013-12-02 16:06:06 +00:00
Rafael Espindola	957cf6f9e1	Remove dead code. MO_JumpTableIndex and MO_ExternalSymbol don't show up on inline asm. Keeping parts of the old asm printer just to print inline asm to a string that we then parse back looks like a hack. llvm-svn: 196111	2013-12-02 15:36:37 +00:00
Tim Northover	dee8604caf	ARM: decide whether to use movw/movt based on "minsize" attribute. llvm-svn: 196102	2013-12-02 14:46:26 +00:00
NAKAMURA Takumi	e4b8a2e559	XCoreFrameLowering.cpp: Use [in,out] instead of [in] [out]. [-Wdocumentation] llvm-svn: 196094	2013-12-02 11:31:25 +00:00
Robert Lytton	7fbca3cce0	XCore target: Make handling of large frames not dependent upon an FP. eliminateFrameIndex() has been reworked to handle both small & large frames with either a FP or SP. An additional Slot is required for Scavenging spills when not using FP for large frames. Reworked the handling of Register Scavenging. Whether we are using an FP or not, whether it is a large frame or not, and whether we are using a large code model or not are now independent. llvm-svn: 196091	2013-12-02 11:05:28 +00:00
Tim Northover	72360d201c	ARM: add pseudo-instructions for lit-pool global materialisation These are used by MachO only at the moment, and (much like the existing MOVW/MOVT set) work around the fact that the labels used in the actual instructions often contain PC-dependent components, which means that repeatedly materialising the same global can't be CSEed. With small modifications, it could be adapted to how ELF finds the address of _GLOBAL_OFFSET_TABLE_, which would give similar benefits in PIC mode there. llvm-svn: 196090	2013-12-02 10:35:41 +00:00
Benjamin Kramer	2725bd21ff	XCore: Unbreak C++11 build. llvm-svn: 196089	2013-12-02 10:29:26 +00:00
Robert Lytton	d3ffa66c6c	XCore target: fix large code model 'select' indirect address handling. llvm-svn: 196088	2013-12-02 10:18:37 +00:00
Robert Lytton	ff38d37c77	XCore target: Add large code model When using large code model: Global objects larger than 'CodeModelLargeSize' bytes are placed in sections named with a trailing ".large" The folded global address of such objects are lowered into the const pool. During inspection it was noted that LowerConstantPool() was using a default offset of zero. A fix was made, but due to only offsets of zero being generated, testing only verifies the change is not detrimental. Correct the flags emitted for explicitly specified sections. We assume the size of the object queried by getSectionForConstant() is never greater than CodeModelLargeSize. To handle greater than CodeModelLargeSize, changes to AsmPrinter would be required. llvm-svn: 196087	2013-12-02 10:18:31 +00:00
Robert Lytton	0abd2c96b5	XCore target: Fix eliminateFrameIndex() to handle large frames Large frame offsets are loaded from the ConstantPool. Where possible, offsets are encoded using the smaller MKMSK instruction. Large frame offsets can only be used when there is a frame-pointer. llvm-svn: 196085	2013-12-02 10:18:19 +00:00
Robert Lytton	a9f984fb76	XCore target: Enable frames larger than 65535 to be lowered llvm-svn: 196084	2013-12-02 10:18:14 +00:00
Rafael Espindola	321f55ae9f	Remove leftovers from a non-MC asm printer. llvm-svn: 196068	2013-12-02 05:42:16 +00:00
Rafael Espindola	8473259602	Remove #if 0 declarations. llvm-svn: 196067	2013-12-02 05:24:28 +00:00
Rafael Espindola	50712a456d	Change the default of AsmWriterClassName and isMCAsmWriter. llvm-svn: 196065	2013-12-02 04:55:42 +00:00
Rafael Espindola	d0f993ae68	Remove dead declarations. llvm-svn: 196063	2013-12-02 04:18:19 +00:00
Rafael Espindola	32635781bb	Refactor for clarity and efficiency. The PPC GetSymbolFromOperand already prefixed stubs of MO_ExternalSymbol, so this should be a nop. llvm-svn: 196059	2013-12-02 03:26:43 +00:00
Tim Northover	45479dcf49	ARM: fix bug in -Oz stack adjustment folding Previously, we clobbered callee-saved registers when folding an "add sp, #N" into a "pop {rD, ...}" instruction. This change checks whether a register we're going to add to the "pop" could actually be live outside the function before doing so and should fix the issue. This should fix PR18081. llvm-svn: 196046	2013-12-01 14:16:24 +00:00
Benjamin Kramer	951b15eb09	Revamp error checking in the ms inline asm parser. - Actually abort when an error occurred. - Check that the frontend lookup worked when parsing length/size/type operators. Tested by a clang test. PR18096. llvm-svn: 196044	2013-12-01 11:47:42 +00:00
Hal Finkel	42daeae9bd	Add a scheduling model (with itinerary) for the PPC POWER7 This adds a scheduling model for the POWER7 (P7) core, and enables the machine-instruction scheduler when targeting the P7. Scheduling for the P7, like earlier ooo PPC cores, requires considering both dispatch group hazards, and functional unit resources and latencies. These are both modeled in a combined itinerary. Dispatch group formation is still handled by the post-RA scheduler (which still needs to be updated for the P7, but nevertheless does a pretty good job). One interesting aspect of this change is that I've also enabled to use of AA duing CodeGen for the P7 (just as it is for the embedded cores). The benchmark results seem to support this decision (see below), and while this is normally useful for in-order cores, and not for ooo cores like the P7, I think that the dispatch slot hazards are enough like in-order resources to make the AA useful. Test suite significant performance differences (where negative is a speedup, and positive is a regression) vs. the current situation: MultiSource/Benchmarks/BitBench/drop3/drop3 with AA: N/A without AA: -28.7614% +/- 19.8356% (significantly against AA) MultiSource/Benchmarks/FreeBench/neural/neural with AA: -17.7406% +/- 11.2712% without AA: N/A (significantly in favor of AA) MultiSource/Benchmarks/SciMark2-C/scimark2 with AA: -11.2079% +/- 1.80543% without AA: -11.3263% +/- 2.79651% MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt with AA: -41.8649% +/- 17.0053% without AA: -34.5256% +/- 23.7072% MultiSource/Benchmarks/mafft/pairlocalalign with AA: 25.3016% +/- 17.8614% without AA: 38.6629% +/- 14.9391% (significantly in favor of AA) MultiSource/Benchmarks/sim/sim with AA: N/A without AA: 13.4844% +/- 7.18195% (significantly in favor of AA) SingleSource/Benchmarks/BenchmarkGame/Large/fasta with AA: 15.0664% +/- 6.70216% without AA: 12.7747% +/- 8.43043% SingleSource/Benchmarks/BenchmarkGame/puzzle with AA: 82.2713% +/- 26.3567% without AA: 75.7525% +/- 41.1842% SingleSource/Benchmarks/Misc/flops-2 with AA: -37.1621% +/- 20.7964% without AA: -35.2342% +/- 20.2999% (significantly in favor of AA) These are 99.5% confidence intervals from 5 runs per configuration. Regarding the choice to turn on AA during CodeGen, of these results, four seem significantly in favor of using AA, and one seems significantly against. I'm not making this decision based on these numbers alone, but these results seem consistent with results I have from other tests, and so I think that, on balance, using AA is a win. llvm-svn: 195981	2013-11-30 20:55:12 +00:00
Hal Finkel	46402a4211	Split some PPC itinerary classes In preparation for adding scheduling definitions for the POWER7, split some PPC itinerary classes so that the P7's latencies and hazards can be better described. For the most part, this means differentiating indexed from non-index pre-increment loads and stores. Also, differentiate single from double-precision sqrt. No functionality change intended (except for a more-specific latency for single-precision sqrt on the A2). llvm-svn: 195980	2013-11-30 20:41:13 +00:00
Zoran Jovanovic	9d86e26e62	Fixed issue with microMIPS long branch. llvm-svn: 195975	2013-11-30 19:12:28 +00:00
Daniel Sanders	7fd68d6018	[mips][msa] MSA loads and stores have a 10-bit offset. Account for this when lowering FrameIndex. This prevents the compiler from emitting invalid ld.[bhwd]'s and st.[bhwd]'s when the stack frame is between 512 and 32,768 bytes in size. llvm-svn: 195973	2013-11-30 13:47:57 +00:00
Daniel Sanders	7153414768	[mips][msa] A small refactor to reduce patch noise in my next commit No functional change. An if-statement has been split into two nested if-statements. llvm-svn: 195972	2013-11-30 13:15:21 +00:00
Reed Kotler	ad450f239f	Part 1 of 3 patches that completes very long conditional branches in constant islands for Mips16. We introdcuce JalB16 as a synomnym for Jal16. It makes it easier to read and is also necessary because Jal16 is a call instruction but JalB16 is being used as a branch. Various parts of LLVM will not work properly even in this late stage of the backend if we use what was declared as a call instruction to function as a branch. For one, basic block labels may not get emitted in some situations. llvm-svn: 195968	2013-11-29 22:32:56 +00:00
Zoran Jovanovic	1bc3cce040	Revert revision 195965. llvm-svn: 195967	2013-11-29 22:10:02 +00:00
Zoran Jovanovic	ff2a40ce4d	Fixed issue with microMIPS long branch. llvm-svn: 195965	2013-11-29 21:41:24 +00:00
Hal Finkel	1df3205e8c	Adjust PPC A2 input operand latencies On the PPC A2, instructions are only issued after their input operands are ready. Model this by specifying that input operands are read at dispatch (0 cycles after issue). This changes all input operand latencies from 1 to 0. Significant test-suite performance changes (these are 99.5% confidence intervals on 6 runs for both before and after): speedups: MultiSource/Benchmarks/sim/sim -1.21915% +/- 0.175063% MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt -1.23946% +/- 1.05133% SingleSource/Benchmarks/Misc/flops-2 -1.24237% +/- 0.681362% MultiSource/Applications/JM/lencod/lencod -1.33992% +/- 0.757498% MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt -1.51802% +/- 1.21468% MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt -2.18818% +/- 1.28605% MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt -2.21977% +/- 1.19499% SingleSource/Benchmarks/BenchmarkGame/spectral-norm -2.29822% +/- 0.671871% MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl -2.40975% +/- 0.355931% SingleSource/Benchmarks/Misc/fp-convert -2.41899% +/- 1.04751% MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl -2.50349% +/- 0.126765% SingleSource/Benchmarks/Misc/flops-3 -3.00214% +/- 0.700795% MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt -3.56995% +/- 3.2929% MultiSource/Applications/sgefa/sgefa -4.24908% +/- 2.00413% MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk -18.1294% +/- 3.96489% regressions: MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl 1.03249% +/- 0.178547% MultiSource/Applications/hexxagon/hexxagon 1.16597% +/- 0.285235% MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt 1.39576% +/- 1.07855% SingleSource/Benchmarks/Misc-C++/stepanov_v1p2 1.71539% +/- 0.173182% MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1 1.90013% +/- 0.866472% MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl 2.39854% +/- 1.05914% MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl 2.4402% +/- 0.817904% MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl 5.87997% +/- 3.3172% MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc 9.02643% +/- 5.79591% MultiSource/Benchmarks/VersaBench/bmm/bmm 10.3517% +/- 1.227% Obviously, there are data points on both sides of this; but I think, overall, this supports making the change. llvm-svn: 195951	2013-11-29 07:04:59 +00:00
Hal Finkel	5a7162f36b	Create a PPC440 SchedMachineModel Some of the older PPC processor definitions don't have associated SchedMachineModels; correct this for the PPC440. llvm-svn: 195949	2013-11-29 06:32:17 +00:00
Hal Finkel	4035e8d86a	Fixup PPC440 load/store operand latencies The operand latencies for loads and stores in the PPC440 itinerary were wrong (the store operands are all inputs, and the "with update" (pre-increment) instructions need a latency for the additional output). llvm-svn: 195948	2013-11-29 06:19:43 +00:00
Hal Finkel	a10bd1d23a	Adjust PPC440 operand latencies The operand latencies for the PPC440 should be specified relative to dispatch, not relative to the initial fetch-and-decode stages. Because most instructions (ignoring bypass) wait in dispatch until their operands are ready, this is modeled as reading input operands "at dispatch" (0 cycles after issue), and so every input and output operand has 4 cycles subtracted from it. This could alter scheduling slightly, but I don't expect a large effect. llvm-svn: 195947	2013-11-29 05:59:00 +00:00
Hal Finkel	dd06369913	Don't model the fetch and decode units for the PPC440 Modeling the fetch and decode units in the PPC440 itinerary does not add anything to the hazard detection capability (and so modeling them just wastes compile time). No functionality change intended. llvm-svn: 195946	2013-11-29 05:58:38 +00:00
Lang Hames	39609996d9	Refactor a lot of patchpoint/stackmap related code to simplify and make it target independent. Most of the x86 specific stackmap/patchpoint handling was necessitated by the use of the native address-mode format for frame index operands. PEI has now been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing us to use a simple, platform independent register/offset pair for frame indexes on stackmap/patchpoints. Notes: - Folding is now platform independent and automatically supported. - Emiting patchpoints with direct memory references now just involves calling the TargetLoweringBase::emitPatchPoint utility method from the target's XXXTargetLowering::EmitInstrWithCustomInserter method. (See X86TargetLowering for an example). - No more ugly platform-specific operand parsers. This patch shouldn't change the generated output for X86. llvm-svn: 195944	2013-11-29 03:07:54 +00:00
Hao Liu	ba38eee8ac	AArch64: The pattern match should check the range of the immediate value. Or we can generate some illegal instructions. E.g. shrn2 v0.4s, v1.2d, #35. The legal range should be in [1, 16]. llvm-svn: 195941	2013-11-29 02:11:22 +00:00
Jiangning Liu	c429c00f3b	Add missing pattern for supporting intrinsic function vbsl_f64 with argument double floating point. llvm-svn: 195938	2013-11-29 01:37:15 +00:00
Kevin Qin	337cfcc83c	[AArch64 NEON]Fix a assertion failure when disassemble SHLL instruction. llvm-svn: 195936	2013-11-29 01:29:16 +00:00
Rafael Espindola	d5bd5a4716	Refactor to remove a bit of duplication. No functionality change. llvm-svn: 195933	2013-11-28 20:12:44 +00:00
Benjamin Kramer	ea1982aff9	Silence sign-compare warning and reduce nesting. No functionality change. llvm-svn: 195932	2013-11-28 19:58:56 +00:00
NAKAMURA Takumi	226e10edff	[CMake] Let add_public_tablegen_target() provide intrinsics_gen, too. I think, in principle, intrinsics_gen may be added explicitly. That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen. Almost all source files depend on both CommonTaleGen and intrinsics_gen. Explicit add_dependencies() have been pruned under lib/Target. llvm-svn: 195929	2013-11-28 17:04:31 +00:00
NAKAMURA Takumi	ce746c6c49	[CMake] Let add_public_tablegen_target responsible to provide dependency to CommonTableGen. add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS. LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope. llvm-svn: 195927	2013-11-28 17:04:04 +00:00
Rafael Espindola	848493d886	The global prefix is always one char. Don't use a string for it. llvm-svn: 195926	2013-11-28 17:00:49 +00:00
NAKAMURA Takumi	b2abd160b3	[CMake] Prune include_directories() in llvm/lib/Target, take #2 . I forgot to commit them. They were staging in my local repo. llvm-svn: 195924	2013-11-28 15:30:37 +00:00
Daniel Sanders	063b74ad4e	[mips] Revert test commit r195922. llvm-svn: 195923	2013-11-28 15:26:33 +00:00
Daniel Sanders	eb16443fca	[mips] A test commit to test my Herald and Audit workflow Will be reverted in the next commit llvm-svn: 195922	2013-11-28 15:25:43 +00:00
NAKAMURA Takumi	413518f1f8	[CMake] Prune include_directories() in llvm/lib/Target. add_llvm_target() sets them. llvm-svn: 195921	2013-11-28 14:53:30 +00:00
NAKAMURA Takumi	979e604d8c	Add newline at eof. llvm-svn: 195920	2013-11-28 14:52:52 +00:00
Rafael Espindola	3e3a3f1f85	Use the mangler consistently instead of using getGlobalPrefix directly. llvm-svn: 195911	2013-11-28 08:59:52 +00:00
Hal Finkel	92720ab1b2	Don't share functional units among the PPC itineraries Instead of sharing functional unit names between the various PPC itineraries, give each core its own unit names prefixed with the core name. This follows the convention used by other backends (such as ARM), and removes a non-obvious ordering dependency between the various PPCSchedule*.td files. No functionality change intended. llvm-svn: 195908	2013-11-28 06:05:59 +00:00
Jiangning Liu	4bc9dbd846	Remove the variable only used by assert to avoid the build failure caused by build options [-Werror,-Wunused-variable]. llvm-svn: 195905	2013-11-28 01:34:55 +00:00
Hao Liu	f9f468abee	AArch64: Fix a bug about disassembling post-index load single element to 4 vectors llvm-svn: 195903	2013-11-28 01:07:45 +00:00
Reed Kotler	0d409e2dfe	Check in conditional branches for constant islands. Still need to finish conditional branches for very large targets. That will be the next small patch. Everything now should in principle work as good (functionality wise) as without constant islands so we decided at Mips/Imagination to make constant islands the default for Mips16 now so that it will get excercised a lot and this port is still experimentatl though hopefully soon we will change the status. Some more cleanup and code review is in order but things are converging fast. llvm-svn: 195902	2013-11-28 00:56:37 +00:00
Akira Hatanaka	f6109e4ad7	[mips] Redefine TAILCALL as a pseudo instruction. No functionality change. llvm-svn: 195896	2013-11-27 23:58:32 +00:00
Akira Hatanaka	f9a0ec4fc4	Add MipsOptimizePICCall.cpp to CMakeLists.txt. llvm-svn: 195894	2013-11-27 23:47:25 +00:00
Akira Hatanaka	168d4e5b20	[mips] Implement the following optimizations using dominance information to make PIC calls a little more efficient: 1. Remove instructions setting up $gp if it is known that a function has been called at least once. 2. Save the address of a called function in a register instead of loading it from the GOT at every call site. llvm-svn: 195892	2013-11-27 23:38:42 +00:00
Hal Finkel	3e5a360ba3	Add IIC_ prefix to PPC instruction-class names This adds the IIC_ prefix to the instruction itinerary class names, giving the PPC backend a naming convention for itinerary classes that is more consistent with that used by the X86 and ARM backends. Instruction scheduling in the PPC backend needs a bunch of cleanup and improvement (especially for the ooo cores). This is just a preliminary step. No functionality change intended. llvm-svn: 195890	2013-11-27 23:26:09 +00:00
Rafael Espindola	c90584b6f6	Don't set GlobalPrefix to the default value. llvm-svn: 195884	2013-11-27 21:57:54 +00:00
Rafael Espindola	429e3fb068	The R600 has its own asm printer which doesn't use GlobalPrefix. Drop it. llvm-svn: 195883	2013-11-27 21:52:37 +00:00
Tom Stellard	175e7a8c97	R600: Expand vector FABS NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195881	2013-11-27 21:23:39 +00:00
Tom Stellard	c149dc02d3	R600/SI: Implement spilling of SGPRs v5 SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions. v2: - Fix encoding of Lane Mask - Use correct register flags, so we don't overwrite the low dword when restoring multi-dword registers. v3: - Register spilling seems to hang the GPU, so replace all shaders that need spilling with a dummy shader. v4: - Fix *LANE definitions - Change destination reg class for 32-bit SMRD instructions v5: - Remove small optimization that was crashing Serious Sam 3. https://bugs.freedesktop.org/show_bug.cgi?id=68224 https://bugs.freedesktop.org/show_bug.cgi?id=71285 NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195880	2013-11-27 21:23:35 +00:00
Tom Stellard	859199dad8	R600/SI: Use SGPR_32 register class for 32-bit SMRD outputs Writing to the M0 register from an SMRD instruction hangs the GPU, so we need to use the SGPR_32 register class, which does not include M0. NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195879	2013-11-27 21:23:29 +00:00
Tom Stellard	4d566b2edf	R600: Add support for ISD::FROUND NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195878	2013-11-27 21:23:20 +00:00
Rafael Espindola	3dc549dbe3	Remove dead code. MO_ExternalSymbol and MO_JumpTableIndex don't show up in inline asm. llvm-svn: 195861	2013-11-27 18:38:14 +00:00
Rafael Espindola	52434f9673	Convert two if sequences to switches. llvm-svn: 195859	2013-11-27 18:26:51 +00:00
Rafael Espindola	ed20f478bc	Use a switch. llvm-svn: 195857	2013-11-27 18:18:24 +00:00
Rafael Espindola	c5c7bb6b20	Remove more dead code now that this is only used for inline asm. MO_ConstantPoolIndex is handled in printLeaMemReference. MO_JumpTableIndex and MO_ExternalSymbol don't show up in inline asm. llvm-svn: 195847	2013-11-27 15:13:06 +00:00
Jiangning Liu	97aa8cf8b7	Fix the AArch64 NEON bug exposed by checking constant integer argument range of ACLE intrinsics. llvm-svn: 195843	2013-11-27 14:02:25 +00:00
Rafael Espindola	e370147b8c	Convert more methods in static helpers. llvm-svn: 195826	2013-11-27 07:34:09 +00:00
Rafael Espindola	7caa135677	Convert these methods into static functions. llvm-svn: 195825	2013-11-27 07:14:26 +00:00
Rafael Espindola	09cf06c75e	Cleanup and test X86AsmPrinter::printPCRelImm. It is only used for asm printing. On X86 we put basic block addresses on register before passing them to inline asm, so the MO_MachineBasicBlock case was dead. MO_ExternalSymbol was dead since any symbol being passed to inline asm is represented as MO_GlobalAddress. The MO_GlobalAddress and MO_Register cases were not tested. llvm-svn: 195824	2013-11-27 06:53:13 +00:00
Hal Finkel	8081ae9134	Fix comment in PPCA2Model llvm-svn: 195807	2013-11-27 03:12:56 +00:00
Rafael Espindola	d0ed730f92	Remove dead argument. llvm-svn: 195806	2013-11-27 02:25:20 +00:00
Chad Rosier	75290c6307	[AArch64] Add support for NEON scalar floating-point absolute difference. llvm-svn: 195803	2013-11-27 01:45:58 +00:00
Chad Rosier	9653d5c989	[AArch64] Add support for NEON scalar floating-point to integer convert instructions. llvm-svn: 195788	2013-11-26 22:17:37 +00:00
Reed Kotler	3aeb1d0857	Fix a bug related to constant islands for Mips16 and mips16/32 dual mode. The determination of when we are doing constant pools was being made too early in the asm printer. llvm-svn: 195781	2013-11-26 20:38:40 +00:00
Michael Liao	d617a3015d	Fix PR18054 - Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG lowering where we need to check whether x is a vector type (in-reg type) of i8, i16 or i32; otherwise, that optimization is not valid. llvm-svn: 195779	2013-11-26 20:31:31 +00:00
Tim Northover	fa36dfeeca	Darwin-ARM: use movw/movt for static relocations llvm-svn: 195759	2013-11-26 12:45:05 +00:00
Richard Sandiford	dd7dd930d1	[SystemZ] Fix incorrect use of RISBG for a zero-extended right shift We would wrongly transform the testcase into the equivalent of an AND with 1. The problem was that, when testing whether the shifted-in bits of the right shift were significant, we used the width of the final zero-extended result rather than the width of the shifted value. llvm-svn: 195731	2013-11-26 10:53:16 +00:00
Kevin Qin	599c47d0de	Refactored the implementation of AArch64 NEON instruction ZIP, UZP and TRN. Fix a bug when mixed use of vget_high_u8() and vuzp_u8(). llvm-svn: 195716	2013-11-26 03:26:47 +00:00
Kevin Qin	33ca18fdcf	[AArch64]Implement 128 bit register copy with NEON. llvm-svn: 195713	2013-11-26 02:33:42 +00:00
Andrew Trick	391dbadb51	StackMap: Implement support for DirectMemRefOp. A Direct stack map location records the address of frame index. This address is itself the value that the runtime requested. This differs from IndirectMemRefOp locations, which refer to a stack locations from which the requested values must be loaded. Direct locations can directly communicate the address if an alloca, while IndirectMemRefOp handle register spills. For example: entry: %a = alloca i64... llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a) Since both the alloca and stackmap intrinsic are in the entry block, and the intrinsic takes the address of the alloca, the runtime can assume that LLVM will not substitute alloca with any intervening value. This must be verified by the runtime by checking that the stack map's location is a Direct location type. The runtime can then determine the alloca's relative location on the stack immediately after compilation, or at any time thereafter. This differs from Register and Indirect locations, because the runtime can only read the values in those locations when execution reaches the instruction address of the stack map. llvm-svn: 195712	2013-11-26 02:03:25 +00:00
Andrew Trick	d3ab37cfeb	whitespace llvm-svn: 195711	2013-11-26 02:03:20 +00:00
Cameron McInally	c592e5251c	Add an intrinsic for the SSE2 PAUSE instruction. llvm-svn: 195697	2013-11-26 00:20:43 +00:00
Rafael Espindola	a834e30130	Do the string comparison in the constructor instead of once per nop. Thanks to Roman Divacky for the suggestion. llvm-svn: 195684	2013-11-25 20:50:03 +00:00
Rafael Espindola	1b8bfdaae3	Don't use nopl in cpus that don't support it. Patch by Mikulas Patocka. I added the test. I checked that for cpu names that gas knows about, it also doesn't generate nopl. The modified cpus: i686 - there are i686-class CPUs that don't have nopl: Via c3, Transmeta Crusoe, Microsoft VirtualBox - see https://bbs.archlinux.org/viewtopic.php?pid=775414 k6, k6-2, k6-3, winchip-c6, winchip2 - these are 586-class CPUs via c3 c3-2 - see https://bugs.archlinux.org/task/19733 as a proof that Via c3 and c3-Nehemiah don't have nopl llvm-svn: 195679	2013-11-25 20:15:14 +00:00
Tim Northover	d34094e525	Fix indentation typo llvm-svn: 195660	2013-11-25 17:04:35 +00:00
Tim Northover	db962e2c45	ARM: remove special cases for Darwin dynamic-no-pic mode. These are handled almost identically to static mode (and ELF's global address materialisation), except that a symbol may have "$non_lazy_ptr" appended. This can be handled by passing appropriate flags along with the instruction instead of using entirely separate pseudo-instructions. llvm-svn: 195655	2013-11-25 16:24:52 +00:00
Tim Northover	dfe2156c91	ARM: remove unused patterns. There is no sane way for an LEApcrel (= single ADR) instruction to generate a global address on any ARM target I know of. Fortunately, no-one was trying to any more, but there were vestigial patterns. llvm-svn: 195644	2013-11-25 14:40:57 +00:00
Amara Emerson	34df448f7c	[ARM] Enable FeatureMP for Cortex-A5 by default. Patch by Oliver Stannard. llvm-svn: 195640	2013-11-25 13:17:15 +00:00
Tim Northover	89ccb616bd	X86: enable AVX2 under Haswell native compilation Patch by Adam Strzelecki llvm-svn: 195632	2013-11-25 09:52:59 +00:00
Hao Liu	fbd2b4484c	Fixed a bug about disassembling AArch64 post-index load/store single element instructions. ie. echo "0x00 0x04 0x80 0x0d" \| ../bin/llvm-mc -triple=aarch64 -mattr=+neon -disassemble echo "0x00 0x00 0x80 0x0d" \| ../bin/llvm-mc -triple=aarch64 -mattr=+neon -disassemble will be disassembled into the same instruction st1 {v0b}[0], [x0], x0. llvm-svn: 195591	2013-11-25 01:53:26 +00:00
NAKAMURA Takumi	edbeaee857	SparcFrameLowering.cpp: Prune 'DL' [-Wunused-variable] llvm-svn: 195590	2013-11-25 00:52:46 +00:00
Venkatraman Govindaraju	1116868a0d	[Sparc] Emit large negative adjustments to SP/FP with sethi+xor instead of sethi+or. This generates correct code for both sparc32 and sparc64. llvm-svn: 195576	2013-11-24 20:23:25 +00:00
Venkatraman Govindaraju	9c338504e5	[Sparc]: Implement LEA pattern for sparcv9. llvm-svn: 195575	2013-11-24 20:07:35 +00:00
Venkatraman Govindaraju	f79528c132	[SparcV9]: Do not emit .register directives for global registers that are clobbered by calls but not used in the function itself. llvm-svn: 195574	2013-11-24 18:41:49 +00:00
Venkatraman Govindaraju	0510db0597	[SparcV9] Enable custom lowering of DYNAMIC_STACKALLOC in sparc64. llvm-svn: 195573	2013-11-24 17:41:41 +00:00
Reed Kotler	a787aa2b1e	Make sure that for C++ emitting LwConstant32 pseudos, that it corresponds to what is needed for constant islands. The prescan method for Mips16 constant islands will eventually go away. It is only temporary and should be done earlier when the instructions are first created or from the DAG. If we keep it here we need to handle better the situation where constant islands is called multiple times since don't want to prescan more than once. llvm-svn: 195569	2013-11-24 06:18:50 +00:00
Reed Kotler	d3b28ebe03	Fix a funny bug I introduced during conversion of ARM constant islands to Mips. I had to move some code and I moved a declaration forward past it's first use in the function but by nutty coincidence there was another variable of the same name and type and with completely unrelated function that was declared globally in the class so no compilation error ensued. It required some unusual conditions for it to even matter. Caused test case casts.c in test-suite to fail during compilation with a duplicate symbol error. I would have noticed it during final code review for this port. llvm-svn: 195565	2013-11-24 02:53:09 +00:00
Tom Stellard	c0845334da	R600/SI: Fixing handling of condition codes We were ignoring the ordered/onordered bits and also the signed/unsigned bits of condition codes when lowering the DAG to MachineInstrs. NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195514	2013-11-22 23:07:58 +00:00
Jim Grosbach	860934a924	X86: Perform integer comparisons at i32 or larger. Utilizing the 8 and 16 bit comparison instructions, even when an input can be folded into the comparison instruction itself, is typically not worth it. There are too many partial register stalls as a result, leading to significant slowdowns. By always performing comparisons on at least 32-bit registers, performance of the calculation chain leading to the comparison improves. Continue to use the smaller comparisons when minimizing size, as that allows better folding of loads into the comparison instructions. rdar://15386341 llvm-svn: 195496	2013-11-22 19:57:47 +00:00
Paul Robinson	d89125a5d8	Teach ISel not to optimize 'optnone' functions (revised). Improvements over r195317: - Set/restore EnableFastISel flag instead of just running FastISel within SelectAllBasicBlocks; the flag is checked in various places, and FastISel won't run properly if those places don't do the right thing. - Test looks for normal ISel versus FastISel behavior, and not something more subtle that doesn't work everywhere. Based on work by Andrea Di Biagio. llvm-svn: 195491	2013-11-22 19:11:24 +00:00
Michael Liao	02160d580b	Fix PR18014 - When simplifying the mask generation for BLEND, check whether that mask is also consumed by other non-BLEND insns. If true, skip that simplification. llvm-svn: 195476	2013-11-22 17:56:57 +00:00
Richard Sandiford	f03789ca3f	[SystemZ] Fix TMHH and TMHL usage for z10 with -O0 I've no idea why I decided to handle TMxx differently from all the other high/low logic operations, but it was a stupid thing to do. The high registers aren't available as separate 32-bit registers on z10, so subreg_h32 can't be used on a GR64 there. I've normally been testing with z196 and with -O3 and so hadn't noticed this until now. llvm-svn: 195473	2013-11-22 17:28:28 +00:00
Rafael Espindola	5a8e985ad3	Don't produce tail calls when the caller is x86_thiscallcc. The callee will not pop the stack for us. llvm-svn: 195467	2013-11-22 15:18:28 +00:00
Daniel Sanders	d40aea8768	Fix typo in a comment added in r195455. Credit to Matheus Almeida for spotting it. llvm-svn: 195456	2013-11-22 13:22:52 +00:00
Daniel Sanders	630dbe0a14	[mips][msa] Fix corner case for integer constant splats with undef values. lowerBUILD_VECTOR() was treating integer constant splats as being legal regardless of whether they had undef values. This caused instruction selection failures when the undefs were legalized to zero, making the constant non-splat. Fixed this by requiring HasAnyUndef to be false for a integer constant splat to be legal. If it is true, a new node is generated with the undefs replaced with the necessary values to remain a splat. llvm-svn: 195455	2013-11-22 13:14:06 +00:00
Richard Barton	c31078cded	Add support for Cortex-A12. Patch by Oliver Stannard! llvm-svn: 195448	2013-11-22 11:53:16 +00:00
Daniel Sanders	fd8e416879	[mips][msa] Float vector constants cannot use ldi.[wd] directly. Bitcast from the appropriate integer vector type. Fixes an instruction selection failure detected by llvm-stress. llvm-svn: 195444	2013-11-22 11:24:50 +00:00
Kostya Serebryany	4007009815	Revert r195318 as it causes miscompilation (PR18029) llvm-svn: 195439	2013-11-22 10:30:39 +00:00
Hao Liu	e8bdc8c864	Fix a Cygwin build failure caused by enum values starting with '_', which is conflicted with some platform macros. This patch only renames variables, no functional change. llvm-svn: 195432	2013-11-22 09:24:41 +00:00
Hao Liu	25aed9bb5b	Fix the bugs about AArch64 Load/Store vector types and bitcast between i64 and vector types. e.g. "%tmp = load <2 x i64>* %ptr" can't be selected. "%tmp = bitcast i64 %in to <2 x i32>" can't be selected. llvm-svn: 195424	2013-11-22 08:47:22 +00:00
Hao Liu	91ae869692	Revert last change by haoliu because of buildbot failure. llvm-svn: 195423	2013-11-22 08:34:54 +00:00
Hao Liu	b75d80fdf0	Fix a Cygwin build failure caused by enum values starting with '_', which is conflicted with some platform macros. This solution only renames variables, no functional change. NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195421	2013-11-22 08:17:16 +00:00
Jiangning Liu	a91633a435	For AArch64 back-end instruction selection, lower Neon_Lowxxx with EXTRCT_SUBREG. llvm-svn: 195408	2013-11-22 02:45:13 +00:00
Lang Hames	1ca1123598	Fix a typo where we were creating <def,kill> operands instead of <def,dead> ones. Add an assertion to make sure we catch this in the future. Fixes <rdar://problem/15464559>. llvm-svn: 195401	2013-11-22 00:46:32 +00:00
Tom Stellard	cd6b0a658a	R600: Implement TargetInstrInfo::isLegalToSplitMBBAt() Splitting a basic block will create a new ALU clause, so we need to make sure we aren't moving uses of registers that are local to their current clause into a new one. I had a test case for this, but unfortunately unrelated schedule changes invalidated it, and I wasn't been able to come up with another one. NOTE: This is a candidate for the 3.4 branch. llvm-svn: 195399	2013-11-22 00:41:08 +00:00
Ekaterina Romanova	d5fa55470c	SHLD/SHRD are VectorPath (microcode) instructions known to have poor latency on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, optimizing for speed on these platforms should be implemented using alternative sequences of instructions composed of add, adc, shr, shl, or and lea which are directPath instructions. These alternative instructions not only have a lower latency but they also increase the decode bandwidth by allowing simultaneous decoding of a third directPath instruction. AMD's processors family K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD's processors, I disabled folding (or (x << c) \| (y >> (64 - c))) when we are not optimizing for size. It might be beneficial to disable this folding for some of the Intel's processors. However, since I couldn't find specific recommendations regarding using SHLD/SHRD instructions on Intel's processors, I haven't disabled this peephole for Intel. llvm-svn: 195383	2013-11-21 23:21:26 +00:00
Daniel Sanders	c8c50fb41f	[mips][msa] Fix a corner case in performORCombine() when combining nodes into VSELECT. Mask == ~InvMask asserts if the width of Mask and InvMask differ. The combine isn't valid (with two exceptions, see below) if the widths differ so test for this before testing Mask == ~InvMask. In the specific cases of Mask=~0 and InvMask=0, as well as Mask=0 and InvMask=~0, the combine is still valid. However, there are more appropriate combines that could be used in these cases such as folding x & 0 to 0, or x & ~0 to x. llvm-svn: 195364	2013-11-21 16:11:31 +00:00
Artyom Skrobov	468ee230ea	[ARM] add basic Cortex-A7 support to LLVM backend llvm-svn: 195358	2013-11-21 14:03:21 +00:00
Daniel Sanders	6e664bcef3	[mips][msa/dsp] Only do DSP combines if DSP is enabled. Fixes a crash (null pointer dereferenced) when MSA is enabled. llvm-svn: 195343	2013-11-21 11:40:14 +00:00
NAKAMURA Takumi	66c95430b8	Whitespace. llvm-svn: 195341	2013-11-21 11:08:31 +00:00
NAKAMURA Takumi	43aa939625	Revert r195317 (and r195333), "Teach ISel not to optimize 'optnone' functions." It broke, at least, i686 target. It is reproducible with "llc -mtriple=i686-unknown". FYI, it didn't appear to add either "-O0" or "-fast-isel". llvm-svn: 195339	2013-11-21 10:55:15 +00:00
Ana Pazos	9ac2fc85d2	Implemented Neon scalar vdup_lane intrinsics. Fixed scalar dup alias and added test case. llvm-svn: 195330	2013-11-21 08:16:15 +00:00
Ana Pazos	fbc1adbaa7	Implemented Neon scalar by element intrinsics. Intrinsics implemented: vqdmull_lane, vqdmulh_lane, vqrdmulh_lane, vqdmlal_lane, vqdmlsl_lane scalar Neon intrinsics. llvm-svn: 195327	2013-11-21 07:37:04 +00:00
Bill Wendling	07787f8747	The basic problem is that some mainstream programs cannot deal with the way clang optimizes tail calls, as in this example: int foo(void); int bar(void) { return foo(); } where the call is transformed to: calll .L0$pb .L0$pb: popl %eax .Ltmp0: addl $_GLOBAL_OFFSET_TABLE_+(.Ltmp0-.L0$pb), %eax movl foo@GOT(%eax), %eax popl %ebp jmpl *%eax # TAILCALL However, the GOT references must all be resolved at dlopen() time, and so this approach cannot be used with lazy dynamic linking (e.g. using RTLD_LAZY), which usually populates the PLT with stubs that perform the actual resolving. This patch changes X86TargetLowering::LowerCall() to skip tail call optimization, if the called function is a global or external symbol. Patch by Dimitry Andric! PR15086 llvm-svn: 195318	2013-11-21 07:04:30 +00:00
Paul Robinson	b379efeb53	Teach ISel not to optimize 'optnone' functions. Based on work by Andrea Di Biagio. llvm-svn: 195317	2013-11-21 06:33:32 +00:00
Reed Kotler	2fc05be887	Add, to constant islands, long jumps similar to ARM far branch. llvm-svn: 195312	2013-11-21 05:13:23 +00:00
Hal Finkel	884bde3031	PPC popcnt[dw] do not have record forms The instruction definitions incorrectly specified that popcntd and popcntw have record forms; they do not. This mistake was causing invalid code generation. llvm-svn: 195272	2013-11-20 20:54:55 +00:00
Daniel Sanders	f93e8152c4	[mips][msa] Pseudo instructions require HasMSA too. Inherit from MSAPseudo instead of MipsPseudo There's no test case for this commit. This is because it is doubtful that the incorrect behaviour can actually trigger. When MSA is not enabled, the type legalizer should have eliminated all occurrences of patterns the affected pseudo-instruction could possibly match before instruction selection occurs. llvm-svn: 195252	2013-11-20 14:32:28 +00:00
Daniel Sanders	6b97d604ff	[mips][msa] Remove unused instruction class MSA_I8_X_DESC_BASE llvm-svn: 195245	2013-11-20 13:01:10 +00:00
NAKAMURA Takumi	3dedf827f8	X86ISelLowering.cpp: Mark a variable VT as LLVM_ATTRIBUTE_UNUSED. [-Wunused-variable] llvm-svn: 195238	2013-11-20 10:55:22 +00:00
NAKAMURA Takumi	f2392ebb94	Whitespace. llvm-svn: 195237	2013-11-20 10:55:15 +00:00
Elena Demikhovsky	a5967af97d	Fixed compilation error. llvm-svn: 195230	2013-11-20 09:23:22 +00:00
Elena Demikhovsky	e1f9bf054f	AVX-512: Concat 4 128-bit vectors in one 512-bit vector. llvm-svn: 195229	2013-11-20 09:10:40 +00:00
Hal Finkel	22498fa6e3	PPC: Optimize rldicl generation for masked shifts Masking operations (where only some number of the low bits are being kept) are selected to rldicl(x, 0, mb). If x is a logical right shift (which would become rldicl(y, 64-n, n)), we might be able to fold the two instructions together: rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb) for n <= mb The right shift is really a left rotate followed by a mask, and if the explicit mask is a more-restrictive sub-mask of the mask implied by the shift, only one rldicl is needed. llvm-svn: 195185	2013-11-20 01:10:15 +00:00
Jack Carter	03af6d1456	long line correction llvm-svn: 195175	2013-11-20 00:12:44 +00:00
Jack Carter	6ef6cc5c40	reverts 195057 per request llvm-svn: 195152	2013-11-19 20:53:28 +00:00
Cameron McInally	d1cd0be6f3	Fix assembly operands for the SSE2 cvtsd2ss instruction. llvm-svn: 195129	2013-11-19 14:36:00 +00:00
Simon Atanasyan	1093afe27a	[Mips] Adjust float ABI settings in case of MIPS16 mode. Hard float for mips16 means essentially to compile as soft float but to use a runtime library for soft float that is written with native mips32 floating point instructions (those runtime routines run in mips32 hard float mode). The patch reviewed by Reed Kotler. llvm-svn: 195123	2013-11-19 12:20:17 +00:00
Andrew Trick	0ab5ba8c35	Use symbolic operands in the patchpoint folding routine and fix a spilling bug. Fixes <rdar://15487687> [JS] AnyRegCC argument ends up being spilled llvm-svn: 195094	2013-11-19 03:29:59 +00:00
Andrew Trick	d4e3dc6d14	Add an abstraction to handle patchpoint operands. Hard-coded operand indices were scattered throughout lowering stages and layers. It was super bug prone. llvm-svn: 195093	2013-11-19 03:29:56 +00:00
Hao Liu	16edc4675c	Implement AArch64 neon instructions class SIMD lsone and SIMD lone-post. llvm-svn: 195078	2013-11-19 02:17:05 +00:00
Eric Christopher	37776fb3a6	Remove unused special member functions and reformat. llvm-svn: 195077	2013-11-19 02:01:07 +00:00
Eric Christopher	1192a8b428	Fix previous commit and fully remove variable. llvm-svn: 195076	2013-11-19 01:52:38 +00:00
Eric Christopher	ac81451e50	Remove unused variable. llvm-svn: 195075	2013-11-19 01:50:29 +00:00
Jiangning Liu	0c0c1e8598	Implement AArch64 SISD intrinsics for vget_high and vget_low. llvm-svn: 195074	2013-11-19 01:46:48 +00:00
Kevin Qin	7f8073edc2	implement MC layer of AArch64 neon instruction PMULL and PMULL2 with 128 bit integer. llvm-svn: 195072	2013-11-19 01:40:25 +00:00
Jiangning Liu	e329114ae5	Add predicate for AArch64 crypto instructions. llvm-svn: 195071	2013-11-19 01:38:31 +00:00
Jack Carter	b9fd457a32	[Mips] Support for MicroMips STO refactoring. No true functional changes. Change the "hack" name of emitMipsHackSTOCG to emitSymSTO. Remove demonstration code in AsmParser for emitMipsHackSTOCG and emitMipsHackELFFlags. The STO field is in an ELF symbol and is not an explicit directive. That said, we are missing the compliment call in AsmParser and that will need to be addressed soon. XFAIL dummy tests for emitMipsHackELFFlags and emitMipsHackELFFlags. These will built out with following patches. llvm-svn: 195067	2013-11-19 01:25:18 +00:00
Juergen Ributzka	d12ccbd343	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. The memory leaks in this version have been fixed. Thanks Alexey for pointing them out. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 195064	2013-11-19 00:57:56 +00:00
Jack Carter	86ac5c1b39	[Mips] MipsTargetStreamer refactoring. No functionality changes. llvm-svn: 195057	2013-11-18 23:55:27 +00:00
Reid Kleckner	8b2ad2a962	Revert "COFF: Emit all MCSymbols rather than filtering out some of them" This reverts commit r190888, to fix PR17967. The original change wasn't the right way to get @feat.00 into the object file. The right fix is to make @feat.00 be a global symbol. llvm-svn: 195053	2013-11-18 23:08:12 +00:00
Matt Arsenault	3a4d86a1a4	R600/SI: Fix moveToVALU when the first operand is VSrc. Moving into a VSrc doesn't always work, since it could be replaced with an SGPR later. llvm-svn: 195042	2013-11-18 20:09:55 +00:00
Matt Arsenault	08f7e37aa9	R600/SI: Fix multiple SGPR reads when using VCC. No other SGPR operands are allowed, so if VCC is used, move the other to a VGPR. llvm-svn: 195041	2013-11-18 20:09:50 +00:00
Matt Arsenault	fb826fa6e1	R600/SI: Implement add i64, but do not yet enable. Test doesn't actually check the output. I need to fix add i64 being matched for the addressing calculations. llvm-svn: 195040	2013-11-18 20:09:47 +00:00
Matt Arsenault	bf6e1e7ff7	R600/SI: Specify SSrc operands llvm-svn: 195039	2013-11-18 20:09:43 +00:00
Matt Arsenault	e8d214662a	R600/SI: addc / adde i32 are legal llvm-svn: 195038	2013-11-18 20:09:40 +00:00
Matt Arsenault	04fca446b1	R600/SI: Match addc to S_ADD_U32. The carry always goes to SCC. llvm-svn: 195037	2013-11-18 20:09:37 +00:00
Matt Arsenault	f8c089ac25	R600/SI: Match adde/sube to S_ADDC_U32/S_SUBB_U32 llvm-svn: 195036	2013-11-18 20:09:34 +00:00
Matt Arsenault	e27a41b5a4	R600/SI: Specify S_ADD/S_SUB set SCC and add is commutable llvm-svn: 195035	2013-11-18 20:09:32 +00:00
Matt Arsenault	43b8e4ed3b	R600/SI: Move patterns to match add / sub to scalar instructions llvm-svn: 195034	2013-11-18 20:09:29 +00:00
Matt Arsenault	f0b1e3a776	R600/SI: Fix extra defs of VCC / SCC. When replacing scalar operations with vector, the wrong implicit output register was used. llvm-svn: 195033	2013-11-18 20:09:21 +00:00
Tom Stellard	66df8a2c0a	R600: Enable the IR structurizer by default llvm-svn: 195031	2013-11-18 19:43:44 +00:00
Tom Stellard	827ec9b630	R600: Fix a crash in the AMDILCFGStrucurizer The ifPatternMatch() function was not correctly reporting the number of matches in some cases. llvm-svn: 195030	2013-11-18 19:43:38 +00:00
Tom Stellard	783893a893	R600: Add a SubtargetFeatture for disabling the ifcvt pass. This is useful when writing test cases for the AMDIL structurizer. llvm-svn: 195029	2013-11-18 19:43:33 +00:00
Tom Stellard	f1e3f77507	R600: Use lower-case for EnableIRStructurizer feature llc converts all values passed to -mattr= to lowercase, so this enables us to toggle this feature when using llc. llvm-svn: 195028	2013-11-18 19:43:29 +00:00
Tom Stellard	f340787d79	R600/SI: Fix illegal VGPR->SGPR copy inside of loop llvm-svn: 195026	2013-11-18 18:50:20 +00:00
Tom Stellard	13de545693	R600/SI: Fix another case of illegal VGPR->SGPR copy llvm-svn: 195025	2013-11-18 18:50:15 +00:00
Daniel Sanders	08d3cd163d	[mips] Fix 'ran out of registers' in MIPS32 with FP64 when generating code for (ConstantFP 0.0) Fixed an inappropriate use of BuildPairF64 when compiling for MIPS32 with FP64 which resulted in an impossible constraint on the register allocation. It now uses BuildPairF64_64. llvm-svn: 195007	2013-11-18 13:12:43 +00:00
Matheus Almeida	50c6e82222	[mips][msa] Update encoding of bnz.v (typo). Note that there's no hardware yet that relies on that encoding. llvm-svn: 195006	2013-11-18 13:09:54 +00:00
Matheus Almeida	779c593708	[mips][msa] Fix immediate value of LSA instruction as it was being wrongly encoded. The immediate field should be encoded as "imm - 1" as the CPU always adds one to that field. llvm-svn: 195004	2013-11-18 12:32:49 +00:00
Alexey Samsonov	49109a279c	Revert r194865 and r194874. This change is incorrect. If you delete virtual destructor of both a base class and a subclass, then the following code: Base *foo = new Child(); delete foo; will not cause the destructor for members of Child class. As a result, I observe plently of memory leaks. Notable examples I investigated are: ObjectBuffer and ObjectBufferStream, AttributeImpl and StringSAttributeImpl. llvm-svn: 194997	2013-11-18 09:31:53 +00:00
Kevin Qin	6588c1a638	[AArch64 NEON]Add mov alias for simd copy instructions. Set some unspecified bits of INS/DUP to zero as ARMARM requested. llvm-svn: 194996	2013-11-18 09:20:32 +00:00
Hao Liu	5a4e4e107d	Implement the newly added ACLE functions for ld1/st1 with 2/3/4 vectors. The functions are like: vst1_s8_x2 ... llvm-svn: 194990	2013-11-18 06:31:53 +00:00
Andrew Trick	10d5be4e6e	Added a size field to the stack map record to handle subregister spills. Implementing this on bigendian platforms could get strange. I added a target hook, getStackSlotRange, per Jakob's recommendation to make this as explicit as possible. llvm-svn: 194942	2013-11-17 01:36:23 +00:00
Juergen Ributzka	565acf9278	The WebKit_JS CC preserves the same registers as the C CC. llvm-svn: 194936	2013-11-16 22:08:58 +00:00
Vincent Lejeune	745d4298b1	R600: Make dot_4 instructions predicable llvm-svn: 194927	2013-11-16 16:24:41 +00:00
Jim Grosbach	664d148a92	X86: Encode the 'h' cpu subtype in the MachO header for x86. llvm-svn: 194906	2013-11-16 00:52:57 +00:00
Ana Pazos	d035209bd7	Implemented aarch64 Neon scalar vmulx_lane intrinsics Implemented aarch64 Neon scalar vfma_lane intrinsics Implemented aarch64 Neon scalar vfms_lane intrinsics Implemented legacy vmul_n_f64, vmul_lane_f64, vmul_laneq_f64 intrinsics (v1f64 parameter type) using Neon scalar instructions. Implemented legacy vfma_lane_f64, vfms_lane_f64, vfma_laneq_f64, vfms_laneq_f64 intrinsics (v1f64 parameter type) using Neon scalar instructions. llvm-svn: 194888	2013-11-15 23:32:10 +00:00
Lang Hames	56045cb219	Remove unused arguments. llvm-svn: 194882	2013-11-15 23:19:01 +00:00
Lang Hames	24e3954700	During folding for patchpoint/stackmap instructions, defer creation of new MIs until we know that folding will be successful. No functional change. llvm-svn: 194880	2013-11-15 23:13:21 +00:00
Juergen Ributzka	dbedae89b9	[weak vtables] Remove a bunch of weak vtables This patch removes most of the trivial cases of weak vtables by pinning them to a single object file. Differential Revision: http://llvm-reviews.chandlerc.com/D2068 Reviewed by Andy llvm-svn: 194865	2013-11-15 22:34:48 +00:00
Matt Arsenault	f14032af0e	Make method static llvm-svn: 194858	2013-11-15 22:02:28 +00:00
Chad Rosier	0c57c3402e	[AArch64] Fix the scalar NEON ACLE functions so that they return float/double rather than the vector equivalent. llvm-svn: 194853	2013-11-15 21:28:10 +00:00
Bob Wilson	9f3e6b25ee	Avoid illegal integer promotion in fastisel Stop folding constant adds into GEP when the type size doesn't match. Otherwise, the adds' operands are effectively being promoted, changing the conditions of an overflow. Results are different when: sext(a) + sext(b) != sext(a + b) Problem originally found on x86-64, but also fixed issues with ARM and PPC, which used similar code. <rdar://problem/15292280> Patch by Duncan Exon Smith! llvm-svn: 194840	2013-11-15 19:09:27 +00:00
Tom Stellard	519ae39c45	R600/SI: Add VReg_96 register class to SIRegisterInfo::hasVGPRs() This fixes a crash with GNOME settings manager. llvm-svn: 194836	2013-11-15 18:26:45 +00:00
Cameron McInally	ad41f1f693	Add AVX512 unmasked FMA intrinsics and support. llvm-svn: 194824	2013-11-15 17:01:14 +00:00
Daniel Sanders	71ce0cab02	[mips][msa] lowerMSABitClear() should use SelectionDAG::getNOT() instead of using a long-winded equivalent. Now that getConstant(-1, MVT::v2i64) works correctly on MIPS32 we can use SelectionDAG::getNOT() to produce the bitmask. llvm-svn: 194819	2013-11-15 16:02:04 +00:00
Alexey Samsonov	0d4f1c51db	Hopefully fix uninitialized memory read in AArch64AsmParser found by MSan bootstrap bot llvm-svn: 194818	2013-11-15 15:49:30 +00:00

... 6 7 8 9 10 ...

26916 Commits