llvm-project

Commit Graph

Author	SHA1	Message	Date
Nadav Rotem	977e0be4a0	Efficient lowering of vector sdiv when the divisor is a splatted power of two constant. PR 14848. The lowered sequence is based on the existing sequence the target-independent DAG Combiner creates for the scalar case. Patch by Zvi Rackover. llvm-svn: 171953	2013-01-09 05:14:33 +00:00
Andrew Trick	9f0b95f260	MIsched: add an ILP window property to machine model. This was an experimental option, but needs to be defined per-target. e.g. PPC A2 needs to aggressively hide latency. I converted some in-order scheduling tests to A2. Hal is working on more test cases. llvm-svn: 171946	2013-01-09 03:36:49 +00:00
Tim Northover	90fb75d859	Specify complete triple for fp128 tests. This avoids FileCheck failing over different comment characters in assembly (notably powerpc64 on Linux vs Darwin) and should fix David's build-bot. llvm-svn: 171886	2013-01-08 19:36:33 +00:00
Preston Gurd	a01daace88	Pad Short Functions for Intel Atom The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. This patch has been updated to address Nadav's review comments - Optimize only at >= O1 and don't do optimization if -Os is set - Stores MachineBasicBlock* instead of BBNum - Uses DenseMap instead of std::map - Fixes placement of braces Patch by Andy Zhang. llvm-svn: 171879	2013-01-08 18:27:24 +00:00
Tim Northover	7bb9992cce	Allow the asm printer to print fp128 values properly. llvm-svn: 171866	2013-01-08 16:56:23 +00:00
Bill Schmidt	9b1e3e25dc	This patch addresses bug 14678 by fixing two problems in medium code model code generation. Variables addressed through a GlobalAlias were not being handled, and variables with available_externally linkage were treated incorrectly. The patch contains two new tests to verify the correct code generation for these cases. llvm-svn: 171778	2013-01-07 19:29:18 +00:00
Silviu Baranga	a055aab506	Make the MergeGlobals pass correctly handle the address space qualifiers of the global variables. We partition the set of globals by their address space, and apply the same the trasnformation as before to merge them. llvm-svn: 171730	2013-01-07 12:31:25 +00:00
Craig Topper	4f1c7256f9	Fix suffix handling for parsing and printing of cvtsi2ss, cvtsi2sd, cvtss2si, cvttss2si, cvtsd2si, and cvttsd2si to match gas behavior. cvtsi2* should parse with an 'l' or 'q' suffix or no suffix at all. No suffix should be treated the same as 'l' suffix. Printing should always print a suffix. Previously we didn't parse or print an 'l' suffix. cvtt2si/cvt2si should parse with an 'l' or 'q' suffix or not suffix at all. No suffix should use the destination register size to choose encoding. Printing should not print a suffix. Original 'l' suffix issue with cvtsi2* pointed out by Michael Kuperstein. llvm-svn: 171668	2013-01-06 20:39:29 +00:00
Evan Cheng	3fb03e23a4	Fix for PR14739. It's not safe to fold a load into a call across a store. Thanks to Nick Lewycky for the initial patch. llvm-svn: 171665	2013-01-06 19:00:15 +00:00
Craig Topper	92a70b1e65	Recommit r171461 which was incorrectly reverted. Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171608	2013-01-05 07:39:25 +00:00
Nadav Rotem	478b6a47ec	Revert revision 171524. Original message: URL: http://llvm.org/viewvc/llvm-project?rev=171524&view=rev Log: The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171603	2013-01-05 05:42:48 +00:00
Preston Gurd	e36b685a94	The current Intel Atom microarchitecture has a feature whereby when a function returns early then it is slightly faster to execute a sequence of NOP instructions to wait until the return address is ready, as opposed to simply stalling on the ret instruction until the return address is ready. When compiling for X86 Atom only, this patch will run a pass, called "X86PadShortFunction" which will add NOP instructions where less than four cycles elapse between function entry and return. It includes tests. Patch by Andy Zhang. llvm-svn: 171524	2013-01-04 20:54:54 +00:00
Akira Hatanaka	b13b33359b	[mips] MipsTargetLowering::getSetCCResultType should return a vector type if vectors are being compared. llvm-svn: 171517	2013-01-04 20:06:01 +00:00
Nadav Rotem	c616a5408a	Revert revision: 171467. This transformation is incorrect and makes some tests fail. Original message: Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171468	2013-01-04 17:35:21 +00:00
Elena Demikhovsky	5f2f06d2d9	Simplified TRUNCATE operation that comes after SETCC. It is possible since SETCC result is 0 or -1. Added a test. llvm-svn: 171467	2013-01-03 08:48:33 +00:00
Michael Gottesman	820aac1c78	Revert "Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks." This reverts commit r171461 since it breaks the following tests: Clang :: Analysis/outofbound-notwork.c Clang :: Analysis/string-fail.c Clang :: CXX/basic/basic.lookup/basic.lookup.qual/p6-0x.cpp Clang :: CXX/basic/basic.lookup/basic.lookup.unqual/p15.cpp Clang :: CXX/dcl.dcl/dcl.spec/dcl.fct.spec/p4.cpp Clang :: CXX/dcl.dcl/dcl.spec/dcl.stc/p10.cpp Clang :: CXX/temp/temp.param/p14.cpp Clang :: CXX/temp/temp.res/temp.dep.res/temp.point/p1.cpp Clang :: CodeGen/2009-02-13-zerosize-union-field-ppc.c Clang :: CodeGen/blocks-2.c Clang :: CodeGen/libcalls-d.c Clang :: CodeGen/libcalls-ld.c Clang :: CodeGenCXX/conversion-function.cpp Clang :: CodeGenCXX/debug-info-limit-type.cpp Clang :: CodeGenCXX/inheriting-constructor.cpp Clang :: FixIt/fixit-errors.c Clang :: FixIt/fixit-pmem.cpp Clang :: Modules/namespaces.cpp Clang :: PCH/changed-files.c Clang :: PCH/pr4489.c Clang :: PCH/source-manager-stack.c Clang :: Parser/cxx-ambig-decl-expr-xfail.cpp Clang :: SemaCXX/switch-implicit-fallthrough-cxx98.cpp Clang :: SemaTemplate/instantiate-function-1.mm llvm-svn: 171466	2013-01-03 08:18:30 +00:00
Craig Topper	7c27cc9fd0	Mark DIV/IDIV instructions hasSideEffects=1 because they can trap when dividing by 0. This is needed to keep early if conversion from moving them across basic blocks. llvm-svn: 171461	2013-01-03 06:40:20 +00:00
Jakob Stoklund Olesen	725d57682b	Fix PR14732 by handling all kinds of IMPLICIT_DEF live ranges. Most IMPLICIT_DEF instructions are removed by the ProcessImplicitDefs pass, and a few are reinserted by PHIElimination when a PHI argument is <undef>. RegisterCoalescer was assuming that all IMPLICIT_DEF live ranges look like those created by PHIElimination, and that their live range never leaves the basic block. The PR14732 test case does tricks with PHI nodes that causes a longer IMPLICIT_DEF live range to appear. This happens very rarely, but RegisterCoalescer should be able to handle it. llvm-svn: 171435	2013-01-03 00:47:51 +00:00
Tom Stellard	567f886eb0	DAGCombiner: Avoid generating illegal vector INT_TO_FP nodes DAGCombiner::reduceBuildVecConvertToConvertBuildVec() was making two mistakes: 1. It was checking the legality of scalar INT_TO_FP nodes and then generating vector nodes. 2. It was passing the result value type to TargetLoweringInfo::getOperationAction() when it should have been passing the value type of the first operand. llvm-svn: 171420	2013-01-02 22:13:01 +00:00
Nadav Rotem	c8d7047fa9	AVX: Fix a bug in WidenMaskArithmetic. llvm-svn: 171397	2013-01-02 17:40:39 +00:00
Hal Finkel	6dbdd4307b	Support ppcf128 in SelectionDAG::getConstantFP Fixes pr14751. Patch by Kai; Thanks! llvm-svn: 171261	2012-12-30 19:03:32 +00:00
Dmitri Gribenko	56bf2e1830	Tests: rewrite 'opt ... %s' to 'opt ... < %s' so that opt does not emit a ModuleID This is done to avoid odd test failures, like the one fixed in r171243. llvm-svn: 171250	2012-12-30 02:33:22 +00:00
Nadav Rotem	3da9ac72fa	AVX: Move the ZEXT/ANYEXT DAGCo optimizations to the lowering of these optimizations. The old test cases still cover all of these lowering/optimizations. The single change that we have is that now anyext does not need to zero a register, because it does not use the exact code path as the zero_extend. llvm-svn: 171178	2012-12-28 05:45:24 +00:00
Nadav Rotem	2a054b4475	On AVX/AVX2 the type v8i1 is legalized to v8i16, which is an XMM sized register. In most cases we actually compare or select YMM-sized registers and mixing the two types creates horrible code. This commit optimizes some of the transition sequences. PR14657. llvm-svn: 171148	2012-12-27 08:15:45 +00:00
NAKAMURA Takumi	40aa3285f4	llvm/test/CodeGen/X86: FileCheck-ize two tests in r171083. llvm-svn: 171084	2012-12-26 03:19:30 +00:00
NAKAMURA Takumi	334f685328	llvm/test/CodeGen/X86: Disable avx in two tests corresponding to r171082. llvm-svn: 171083	2012-12-26 03:08:55 +00:00
Hal Finkel	2ebe6d08cd	Loosen scheduling restrictions on the PPC dcbt intrinsic As with the prefetch intrinsic to which it maps, simply have dcbt marked as reading from and writing to its arguments instead of having unmodeled side effects. While this might cause unwanted code motion (because aliasing checks don't really capture cache-line sharing), it is more important that prefetches in unrolled loops don't block the scheduler from rearranging the unrolled loop body. llvm-svn: 171073	2012-12-25 18:51:18 +00:00
Hal Finkel	1b5ff08d43	Expand PPC64 atomic load and store Use of store or load with the atomic specifier on 64-bit types would cause instruction-selection failures. As with the 32-bit case, these can use the default expansion in terms of cmp-and-swap. llvm-svn: 171072	2012-12-25 17:22:53 +00:00
Benjamin Kramer	a9f265ee98	Harden test so it's not affected by changes to compare lowering. This only failed on hosts that don't have SSE41. llvm-svn: 171066	2012-12-25 13:23:23 +00:00
Benjamin Kramer	81b5a8fd2e	X86: Shave off one shuffle from the pcmpeqq sequence for SSE2 by making use of and commutativity. llvm-svn: 171064	2012-12-25 13:09:08 +00:00
Benjamin Kramer	df4af41b9b	X86: Custom lower <2 x i64> eq and ne when SSE41 is not available. pcmpeqd, pshufd, pshufd, pand is cheaper than unpack + cmpq, sbbq, cmpq, sbbq + pack. Small speedup on loop-vectorized viterbi (-march=core2). llvm-svn: 171063	2012-12-25 12:54:19 +00:00
NAKAMURA Takumi	1b18db7ea3	llvm/test/CodeGen/X86/fold-vex.ll: Add explicit triple. llvm-svn: 171029	2012-12-24 11:14:06 +00:00
Nadav Rotem	dc0ad92b64	Some x86 instructions can load/store one of the operands to memory. On SSE, this memory needs to be aligned. When these instructions are encoded in VEX (on AVX) there is no such requirement. This changes the folding tables and removes the alignment restrictions from VEX-encoded instructions. llvm-svn: 171024	2012-12-24 09:40:33 +00:00
Benjamin Kramer	76268ac682	X86: Turn mul of <4 x i32> into pmuludq when no SSE4.1 is available. pmuludq is slow, but it turns out that all the unpacking and packing of the scalarized mul is even slower. 10% speedup on loop-vectorized paq8p. llvm-svn: 170985	2012-12-22 16:07:56 +00:00
Benjamin Kramer	b2f0a2bd4b	X86: Emit vector sext as shuffle + sra if vpmovsx is not available. Also loosen the SSSE3 dependency a bit, expanded pshufb + psra is still better than scalarized loads. Fixes PR14590. llvm-svn: 170984	2012-12-22 11:34:28 +00:00
Nadav Rotem	d5aae980cb	In some cases, due to scheduling constraints we copy the EFLAGS. The only way to read the eflags is using push and pop. If we don't adjust the stack then we run over the first frame index. This is not something that we want to do, so we have to make sure that our machine function does not copy the flags. If it does then we have to emit the prolog that adjusts the stack. rdar://12896831 llvm-svn: 170961	2012-12-21 23:48:49 +00:00
Benjamin Kramer	b4688f84bd	try to unbreak ppc buildbots. llvm-svn: 170913	2012-12-21 18:11:45 +00:00
Benjamin Kramer	82d1c371e2	X86: Match pmin/pmax as a target specific dag combine. This occurs during vectorization. Part of PR14667. llvm-svn: 170908	2012-12-21 17:46:58 +00:00
Tom Stellard	a8b0351720	R600: Expand vec4 INT <-> FP conversions llvm-svn: 170901	2012-12-21 16:33:24 +00:00
Reed Kotler	93f778d2bd	Add test case for r170674 llvm-svn: 170823	2012-12-21 00:55:10 +00:00
Eric Christopher	6e47b725ff	Move these files over to the debug info directory. llvm-svn: 170810	2012-12-21 00:03:42 +00:00
Bob Wilson	7bba4f8957	Revert "Adding support for llvm.arm.neon.vaddl[su].* and" This reverts r170694. The operations can be represented in IR without adding any new intrinsics. llvm-svn: 170765	2012-12-20 21:09:38 +00:00
Evan Cheng	ddc0cb6dc5	On some ARM cpus, flags setting movs with shifter operand, i.e. lsl, lsr, asr, are more expensive than the non-flag setting variant. Teach thumb2 size reduction pass to avoid generating them unless we are optimizing for size. rdar://12892707 llvm-svn: 170728	2012-12-20 19:59:30 +00:00
Rafael Espindola	642c7cd56e	Simplify the testcase a bit. I checked that it would still crash llc before the corresponding fix. llvm-svn: 170709	2012-12-20 17:47:27 +00:00
Renato Golin	6b2ea4a48f	Adding support for llvm.arm.neon.vaddl[su].* and llvm.arm.neon.vsub[su].* intrinsics. Patch by Pete Couperus <pjcoup@gmail.com> llvm-svn: 170694	2012-12-20 13:52:11 +00:00
Reed Kotler	d019dbf75e	fix most of remaining issues with large frames. these patches are tested a lot by test-suite but make check tests are forthcoming once the next few patches that complete this are committed. with the next few patches the pass rate for mips16 is near 100% llvm-svn: 170656	2012-12-20 04:07:42 +00:00
Akira Hatanaka	f423672117	[mips] Use "or $r0, $r1, $zero" instead of "addu $r0, $zero, $r1" to copy physical register $r1 to $r0. GNU disassembler recognizes an "or" instruction as a "move", and this change makes the disassembled code easier to read. Original patch by Reed Kotler. llvm-svn: 170655	2012-12-20 04:06:06 +00:00
Bob Wilson	3365b80290	Do not introduce vector operations in functions marked with noimplicitfloat. <rdar://problem/12879313> llvm-svn: 170630	2012-12-20 01:36:20 +00:00
Evan Cheng	eae6d2ccea	LLVM sdisel normalize bit extraction of the form: ((x & 0xff00) >> 8) << 2 to (x >> 6) & 0x3fc This is general goodness since it folds a left shift into the mask. However, the trailing zeros in the mask prevents the ARM backend from using the bit extraction instructions. And worse since the mask materialization may require an addition instruction. This comes up fairly frequently when the result of the bit twiddling is used as memory address. e.g. = ptr[(x & 0xFF0000) >> 16] We want to generate: ubfx r3, r1, #16, #8 ldr.w r3, [r0, r3, lsl #2] vs. mov.w r9, #1020 and.w r2, r9, r1, lsr #14 ldr r2, [r0, r2] Add a late ARM specific isel optimization to ARMDAGToDAGISel::PreprocessISelDAG(). It folds the left shift to the 'base + offset' address computation; change the mask to one which doesn't have trailing zeros and enable the use of ubfx. Note the optimization has to be done late since it's target specific and we don't want to change the DAG normalization. It's also fairly restrictive as shifter operands are not always free. It's only done for lsh 1 / 2. It's known to be free on some cpus and they are most common for address computation. This is a slight win for blowfish, rijndael, etc. rdar://12870177 llvm-svn: 170581	2012-12-19 20:16:09 +00:00
Benjamin Kramer	c5071466d4	PowerPC: Expand VSELECT nodes. There's probably a better expansion for those nodes than the default for altivec, but this is better than crashing. VSELECTs occur in loop vectorizer output. llvm-svn: 170551	2012-12-19 15:49:14 +00:00

1 2 3 4 5 ...

6773 Commits