llvm-project

Commit Graph

Author	SHA1	Message	Date
Dan Gohman	ab316350bf	Fix fast-isel to not emit invalid assembly when presented with a constant shift count that doesn't fit in the shift instruction's immediate field. This fixes PR3242. llvm-svn: 61281	2008-12-20 17:19:40 +00:00
Dan Gohman	bb92a1b815	Use the correct Preds and Succs lists in setHeightDirty() and setDepthDirty(), respectively. This fixes PR3241. llvm-svn: 61276	2008-12-20 16:34:57 +00:00
Evan Cheng	0869f78555	Fix PR3149. If an early clobber def is a physical register and it is tied to an input operand, it effectively extends the live range of the physical register. Currently we do not have a good way to represent this. 172 %ECX<def> = MOV32rr %reg1039<kill> 180 INLINEASM <es:subl $5,$1 sbbl $3,$0>, 10, %EAX<def>, 14, %ECX<earlyclobber,def>, 9, %EAX<kill>, 36, <fi#0>, 1, %reg0, 0, 9, %ECX<kill>, 36, <fi#1>, 1, %reg0, 0 188 %EAX<def> = MOV32rr %EAX<kill> 196 %ECX<def> = MOV32rr %ECX<kill> 204 %ECX<def> = MOV32rr %ECX<kill> 212 %EAX<def> = MOV32rr %EAX<kill> 220 %EAX<def> = MOV32rr %EAX 228 %reg1039<def> = MOV32rr %ECX<kill> The early clobber operand ties ECX input to the ECX def. The live interval of ECX is represented as this: %reg20,inf = [46,47:1)[174,230:0) 0@174-(230) 1@46-(47) The right way to represent this is something like %reg20,inf = [46,47:2)[174,182:1)[181:230:0) 0@174-(182) 1@181-230 @2@46-(47) Of course that won't work since that means overlapping live ranges defined by two val#. The workaround for now is to add a bit to val# which says the val# is redefined by an early clobber def somewhere. This prevents the move at 228 from being optimized away by SimpleRegisterCoalescing::AdjustCopiesBackFrom. llvm-svn: 61259	2008-12-19 20:58:01 +00:00
Evan Cheng	3b3de7c228	- CodeGenPrepare does not split loop back edges but it only knows about back edges of single block loops. It now does a DFS walk to find loop back edges. - Use SplitBlockPredecessors to factor out common predecessors of the critical edge destination. This is disabled for now due to some regressions. llvm-svn: 61248	2008-12-19 18:03:11 +00:00
Rafael Espindola	770b4b830a	Fix bug 3202. The EH_frame and .eh symbols are now private, except for darwin9 and earlier. The patch also fixes the definition of PrivateGlobalPrefix on ppc linux. llvm-svn: 61242	2008-12-19 10:55:56 +00:00
Mon P Wang	308a1acaaf	Fix test to account for generating some vector code for mul v2i64 instead of incorrectly generating pmuldq llvm-svn: 61228	2008-12-18 23:42:37 +00:00
Mon P Wang	6e5f4bc1e7	Added some basic test cases for r61209 llvm-svn: 61210	2008-12-18 20:05:58 +00:00
Eli Friedman	6cf404f2d1	Fix for PR3225: disable a broken optimization in DAGTypeLegalizer::ExpandShiftWithKnownAmountBit. In terms of restoring the optimization, the best fix here isn't obvious... any ideas? llvm-svn: 61119	2008-12-17 03:35:17 +00:00
Dale Johannesen	f51dcef803	A new dag combine; several permutations of this are there under ADD, this one was missing. llvm-svn: 61107	2008-12-16 22:13:49 +00:00
Dan Gohman	51559185f1	Enable anti-dependence breaking by default when post-RA scheduling is enabled. llvm-svn: 61078	2008-12-16 06:21:45 +00:00
Dan Gohman	dddc1ac7ea	Fix some register-alias-related bugs in the post-RA scheduler liveness computation code. Also, avoid adding output-dependency edges when both defs are dead, which frequently happens with EFLAGS defs. Compute Depth and Height lazily, and always in terms of edge latency values. For the schedulers that don't care about latency, edge latencies are set to 1. Eliminate Cycle and CycleBound, and LatencyPriorityQueue's Latencies array. These are all subsumed by the Depth and Height fields. llvm-svn: 61073	2008-12-16 03:25:46 +00:00
Mon P Wang	580f2c7b61	Added support for splitting and scalarizing vector shifts. llvm-svn: 61050	2008-12-15 21:44:00 +00:00
Mon P Wang	ac4e120912	Added support to LegalizeType for expanding the operands of scalar to vector and insert vector element. Modified extract vector element to extend the result to match the expected promoted type. llvm-svn: 61029	2008-12-15 06:57:02 +00:00
Bill Wendling	c4499feb1a	- Use patterns instead of creating completely new instruction matching patterns, which are identical to the original patterns. - Change the multiply with overflow so that we distinguish between signed and unsigned multiplication. Currently, unsigned multiplication with overflow isn't working! llvm-svn: 60963	2008-12-12 21:15:41 +00:00
Bill Wendling	0864a75ebf	If ADD, SUB, or MUL have an overflow bit that's used, don't do transformation on them. The DAG combiner expects that nodes that are transformed have one value result. llvm-svn: 60857	2008-12-10 22:36:00 +00:00
Mon P Wang	4637c3c698	Fixed a bug when trying to optimize an extract vector element of a bit convert that changes the number of elements of a shuffle. llvm-svn: 60829	2008-12-10 03:59:02 +00:00
Bill Wendling	8008cb9a77	Implement fast-isel conversion of a branch instruction that's branching on an overflow/carry from the "arithmetic with overflow" intrinsics. It searches the machine basic block from bottom to top to find the SETO/SETC instruction that is its conditional. If an instruction modifies EFLAGS before it reaches the SETO/SETC instruction, then it defaults to the normal instruction emission. llvm-svn: 60807	2008-12-09 23:19:12 +00:00
Bill Wendling	db8ec2d75a	Add sub/mul overflow intrinsics. This currently doesn't have a target-independent way of determining overflow on multiplication. It's very tricky. Patch by Zoltan Varga! llvm-svn: 60800	2008-12-09 22:08:41 +00:00
Duncan Sands	445071c44f	Fix PR3117: not all nodes being legalized. The essential problem was that the DAG can contain random unused nodes which were never analyzed. When remapping a value of a node being processed, such a node may become used and need to be analyzed; however due to operands being transformed during analysis the node may morph into a different one. Users of the morphing node need to be updated, and this wasn't happening. While there I added a bunch of documentation and sanity checks, so I (or some other poor soul) won't have to scratch their head over this stuff so long trying to remember how it was all supposed to work next time some obscure problem pops up! The extra sanity checking exposed a few places where invariants weren't being preserved, so those are fixed too. Since some of the sanity checking is expensive, I added a flag to turn it on. It is also turned on when building with ENABLE_EXPENSIVE_CHECKS=1. llvm-svn: 60797	2008-12-09 21:33:20 +00:00
Mon P Wang	4dd832d241	Fix getNode to allow a vector for the shift amount for shifts of vectors. Fix the shift amount when unrolling a vector shift into scalar shifts. Fix problem in getShuffleScalarElt where it assumes that the input of a bit convert must be a vector. llvm-svn: 60740	2008-12-09 05:46:39 +00:00
Dan Gohman	4c31524bec	Factor out the code for sign-extending/truncating gep indices and use it in x86 address mode folding. Also, make getRegForValue return 0 for illegal types even if it has a ValueMap for them, because Argument values are put in the ValueMap. This fixes PR3181. llvm-svn: 60696	2008-12-08 07:57:47 +00:00
Dale Johannesen	0733759b5a	Fix test to pass on Linux. llvm-svn: 60614	2008-12-05 22:38:21 +00:00
Dale Johannesen	9efd2ce55b	Make LoopStrengthReduce smarter about hoisting things out of loops when they can be subsumed into addressing modes. Change X86 addressing mode check to realize that some PIC references need an extra register. (I believe this is correct for Linux, if not, I'm sure someone will tell me.) llvm-svn: 60608	2008-12-05 21:47:27 +00:00
Evan Cheng	7a15646d69	This test also requires -mattr=+sse41. llvm-svn: 60601	2008-12-05 19:26:37 +00:00
Evan Cheng	fd8c4d5975	Effectively undo 60461 in PIC mode, which simply transforms V_SET0 / V_SETALLONES into a load from constpool in order to fold into restores. This is not safe to do when PIC base is being used for a number of reasons: 1. GlobalBaseReg may have been spilled. 2. It may not be live at the use. 3. Spiller doesn't know this is happening so it won't prevent GlobalBaseReg from being spilled later (That by itself is a nasty hack. It's needed because we don't insert the reload until later). llvm-svn: 60595	2008-12-05 17:23:48 +00:00
Evan Cheng	2a03c7e977	Re-did 60519. It turns out Darwin's handling of hidden visibility symbols is a bit more complicated than I expected. Both declarations and weak definitions still need a stub indirection. However, the stubs are in data section and they contain the addresses of the actual symbols. llvm-svn: 60571	2008-12-05 01:06:39 +00:00
Bill Wendling	6949f6135b	Temporarily revert r60519. It was causing a bootstrap failure: /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/ -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.5.0/bin/ -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.5.0/lib/ -isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.5.0/include -isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.5.0/sys-include -DHAVE_CONFIG_H -I. -I../../../llvm-gcc.src/libgomp -I. -I../../../llvm-gcc.src/libgomp/config/posix -I../../../llvm-gcc.src/libgomp -Wall -pthread -Werror -O2 -g -O2 -MT barrier.lo -MD -MP -MF .deps/barrier.Tpo -c ../../../llvm-gcc.src/libgomp/barrier.c -fno-common -DPIC -o .libs/barrier.o checking for sys/file.h... /var/folders/zG/zGE-ZJOGFiGjv0B5cs5oYE+++TM/-Tmp-//cc34Jg5P.s:13:non-relocatable subtraction expression, "_gomp_tls_key" minus "L1$pb" /var/folders/zG/zGE-ZJOGFiGjv0B5cs5oYE+++TM/-Tmp-//cc34Jg5P.s:13:symbol: "_gomp_tls_key" can't be undefined in a subtraction expression make[4]: * [barrier.lo] Error 1 make[4]: * Waiting for unfinished jobs.... /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.obj/./gcc/ -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.5.0/bin/ -B/Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.5.0/lib/ -isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.5.0/include -isystem /Volumes/Sandbox/Buildbot/llvm/full-llvm/build/llvm-gcc.install/i386-apple-darwin9.5.0/sys-include -DHAVE_CONFIG_H -I. -I../../../llvm-gcc.src/libgomp -I. -I../../../llvm-gcc.src/libgomp/config/posix -I../../../llvm-gcc.src/libgomp -Wall -pthread -Werror -O2 -g -O2 -MT alloc.lo -MD -MP -MF .deps/alloc.Tpo -c ../../../llvm-gcc.src/libgomp/alloc.c -o alloc.o >/dev/null 2>&1 yes checking for sys/param.h... make[3]: * [all-recursive] Error 1 make[2]: * [all] Error 2 make[1]: * [all-target-libgomp] Error 2 make[1]: * Waiting for unfinished jobs.... llvm-svn: 60527	2008-12-04 04:07:00 +00:00
Evan Cheng	011c4fa8a1	Visibility hidden GVs do not require extra load of symbol address from the GOT or non-lazy-ptr. llvm-svn: 60519	2008-12-04 01:56:50 +00:00
Evan Cheng	1339e72d97	Use mmx (punpckldq VR64, (mmx_v_set0)) to clear high 32-bits of a VR64 register. llvm-svn: 60499	2008-12-03 19:38:05 +00:00
Evan Cheng	b5a97ff651	Fix test. llvm-svn: 60476	2008-12-03 08:20:45 +00:00
Dan Gohman	cc78cdf275	Mark x86's V_SET0 and V_SETALLONES with isSimpleLoad, and teach X86's foldMemoryOperand how to "fold" them, by converting them into constant-pool loads. When they aren't folded, they use xorps/cmpeqd, but for example when register pressure is high, they may now be folded as memory operands, which reduces register pressure. Also, mark V_SET0 isAsCheapAsAMove so that two-address-elimination will remat it instead of copying zeros around (V_SETALLONES was already marked). llvm-svn: 60461	2008-12-03 05:21:24 +00:00
Bill Wendling	e3402692d8	Change label to 'carry' for unsigned adds. llvm-svn: 60460	2008-12-03 02:43:12 +00:00
Dan Gohman	5d3d1f69e1	Fix byval arguments in the fastcc calling convention. The fastcc convention delegates to the regular x86-32 convention which handles byval, but only after it handles a few cases, and it's necessary to handle byval before handling those cases. This fixes PR3122 (and rdar://6400815), llvm-gcc miscompiling LLVM. llvm-svn: 60453	2008-12-03 01:28:04 +00:00
Dan Gohman	971c88f3b2	Add nounwind attributes to this test. llvm-svn: 60451	2008-12-03 01:10:18 +00:00
Dale Johannesen	b43a689520	testcases for recent dag combiner changes llvm-svn: 60449	2008-12-03 00:52:41 +00:00
Evan Cheng	1718fd4375	Fix PR3124: overly strict assert. llvm-svn: 60392	2008-12-02 02:15:36 +00:00
Bill Wendling	30e9dc81c8	Second stab at target-dependent lowering of everyone's favorite nodes: [SU]ADDO - LowerXADDO lowers [SU]ADDO into an ADD with an implicit EFLAGS define. The EFLAGS are fed into a SETCC node which has the conditional COND_O or COND_C, depending on the type of ADDO requested. - LowerBRCOND now recognizes if it's coming from a SETCC node with COND_O or COND_C set. llvm-svn: 60388	2008-12-02 01:06:39 +00:00
Chris Lattner	b2f131a4ab	Add rdar reference, make this actually fail when the patch isn't applied. llvm-svn: 60376	2008-12-01 22:35:31 +00:00
Dale Johannesen	069a4eee55	Consider only references to an IV within the loop when figuring out the base of the IV. This produces better code in the example. (Addresses use (IV) instead of (BASE,IV) - a significant improvement on low-register machines like x86). llvm-svn: 60374	2008-12-01 22:00:01 +00:00
Eli Friedman	c8228d263b	Followup to r60283: optimize arbitrary width signed divisions as well as unsigned divisions. Same caveats as before. llvm-svn: 60284	2008-11-30 06:35:39 +00:00
Eli Friedman	1b7fc154a5	Fix for PR2164: allow transforming arbitrary-width unsigned divides into multiplies. Some more cleverness would be nice, though. It would be nice if we could do this transformation on illegal types. Also, we would prefer a narrower constant when possible so that we can use a narrower multiply, which can be cheaper. llvm-svn: 60283	2008-11-30 06:02:26 +00:00
Eli Friedman	bd0f57821a	APIntify a test which is potentially unsafe otherwise, and fix the nearby FIXME. I'm not sure what the right way to fix the Cell test was; if the approach I used isn't okay, please let me know. llvm-svn: 60277	2008-11-30 04:59:26 +00:00
Bill Wendling	077eb6fcc2	XFAIL test due to reverting of patch. llvm-svn: 60161	2008-11-27 07:34:10 +00:00
Evan Cheng	3761143755	Avoid inserting noop's in the middle of a loop. llvm-svn: 60141	2008-11-27 01:16:00 +00:00
Evan Cheng	83bdb38965	On x86, favor folding short immediates into some arithmetic operations (e.g. add, and, xor, etc.) because materializing an immediate in a register is expensive in terms of code size. E.g. movl 4(%esp), %eax addl $4, %eax is 2 bytes shorter than movl $4, %eax addl 4(%esp), %eax llvm-svn: 60139	2008-11-27 00:49:46 +00:00
Evan Cheng	d1dda5339d	Add -march=x86. llvm-svn: 60135	2008-11-27 00:37:06 +00:00
Bill Wendling	a69ced6b68	Add x86-specific test for add-with-overflow intrinsics. llvm-svn: 60125	2008-11-26 22:42:19 +00:00
Chris Lattner	397a11ccd8	Turn on my codegen prepare heuristic by default. It doesn't affect performance in most cases on the Grawp tester, but does speed some things up (like shootout/hash by 15%). This also doesn't impact compile time in a noticeable way on the Grawp tester. It also, of course, gets the testcase it was designed for right :) llvm-svn: 60120	2008-11-26 22:16:44 +00:00
Chris Lattner	eb3e4fb6fb	This adds in some code (currently disabled unless you pass -enable-smarter-addr-folding to llc) that gives CGP a better cost model for when to sink computations into addressing modes. The basic observation is that sinking increases register pressure when part of the addr computation has to be available for other reasons, such as having a use that is a non-memory operation. In cases where it works, it can substantially reduce register pressure. This code is currently an overall win on 403.gcc and 255.vortex (the two things I've been looking at), but there are several things I want to do before enabling it by default: 1. This isn't doing any caching of results, so it is much slower than it could be. It currently slows down release-asserts llc by 1.7% on 176.gcc: 27.12s -> 27.60s. 2. This doesn't think about inline asm memory operands yet. 3. The cost model botches the case when the needed value is live across the computation for other reasons. I'll continue poking at this, and eventually turn it on as llcbeta. llvm-svn: 60074	2008-11-26 02:00:14 +00:00
Chris Lattner	a9ab165b08	Teach CodeGenPrepare to look through Bitcast instructions when attempting to optimize addressing modes. This allows us to optimize things like isel-sink2.ll into: movl 4(%esp), %eax cmpb $0, 4(%eax) jne LBB1_2 ## F LBB1_1: ## TB movl $4, %eax ret LBB1_2: ## F movzbl 7(%eax), %eax ret instead of: _test: movl 4(%esp), %eax cmpb $0, 4(%eax) leal 4(%eax), %eax jne LBB1_2 ## F LBB1_1: ## TB movl $4, %eax ret LBB1_2: ## F movzbl 3(%eax), %eax ret This shrinks (e.g.) 403.gcc from 1133510 to 1128345 lines of .s. Note that the 2008-10-16-SpillerBug.ll testcase is dubious at best, I doubt it is really testing what it thinks it is. llvm-svn: 60068	2008-11-26 00:26:16 +00:00
Chris Lattner	f0e01def8c	fix an over-reduced test. llvm-svn: 60067	2008-11-26 00:12:08 +00:00
Chris Lattner	0f98f74c74	this doesn't need EH llvm-svn: 60066	2008-11-26 00:03:26 +00:00
Dan Gohman	ad2134d45d	Initial support for anti-dependence breaking. Currently this code does not introduce any new spilling; it just uses unused registers. Refactor the SUnit topological sort code out of the RRList scheduler and make use of it to help with the post-pass scheduler. llvm-svn: 59999	2008-11-25 00:52:40 +00:00
Owen Anderson	1af37c2fca	Add support for rematerialization in pre-alloc-splitting. llvm-svn: 59587	2008-11-19 04:28:29 +00:00
Lang Hames	41df63945d	Removed 2008-10-17-SpillerBug.ll as it does not provide an accurate test of PR2898. llvm-svn: 59431	2008-11-16 23:30:12 +00:00
Lang Hames	782ec1a746	2008-10-17-SpillerBug.ll is currently failing, but this doesn't reflect an actual regression of PR2898. This test should probably be removed. I've XFAILed it for now to keep buildbot quiet while this is considered. llvm-svn: 59415	2008-11-16 13:11:09 +00:00
Mon P Wang	7a82474387	Improved shuffle normalization to avoid using extract/build when we can extract using different indexes for two vectors. Added a few tests for vector shuffles. llvm-svn: 59399	2008-11-16 05:06:27 +00:00
Dan Gohman	072734ebd6	Remove the FlaggedNodes member from SUnit. Instead of requiring each SUnit to carry a SmallVector of flagged nodes, just calculate the flagged nodes dynamically when they are needed. The local-liveness change is due to a trivial scheduling change where the scheduler makes an arbitrary decision differently. llvm-svn: 59273	2008-11-13 23:24:17 +00:00
Dale Johannesen	b47f6d3237	testcase for PR 1779. llvm-svn: 59268	2008-11-13 22:17:10 +00:00
Dale Johannesen	ffc67df2aa	Fix the testb optimization so x86 also bootstraps. Reenable test. llvm-svn: 59101	2008-11-12 02:00:35 +00:00
Bill Wendling	b85755c829	Temporarily revert r58979 and related patch. It's causing a failure in X86 bootstrap: Comparing stages 2 and 3 warning: ./cc1-checksum.o differs warning: ./cc1obj-checksum.o differs warning: ./cc1objplus-checksum.o differs warning: ./cc1plus-checksum.o differs Bootstrap comparison failure! ./alias.o differs ./alloc-pool.o differs ./attribs.o differs ./bb-reorder.o differs ./bitmap.o differs ./build/errors.o differs ./build/genattrtab.o differs ./build/genautomata.o differs ./build/genemit.o differs ./build/genextract.o differs ... -bw llvm-svn: 59003	2008-11-10 21:22:06 +00:00
Duncan Sands	d5b53e1c6c	When promoting the result of fp_to_uint/fp_to_sint, inform the optimizers that the result must be zero/ sign extended from the smaller type. For example, if a fp to unsigned i16 is promoted to fp to i32, then we are allowed to assume that the extra 16 bits are zero (because the result of fp to i16 is undefined if the result does not fit in an i16). This is quite aggressive, but should help the optimizers produce better code. This requires correcting a test which thought that fp_to_uint is some kind of truncation, which it is not: in the testcase (which does fp to i1), either the fp value converts to 0 or 1 or the result is undefined, which is quite different to truncation. llvm-svn: 58991	2008-11-10 17:28:30 +00:00
Dale Johannesen	23be3fd970	Reenable test. llvm-svn: 58980	2008-11-10 07:30:32 +00:00
Duncan Sands	3b36fd87a4	XFAIL this while waiting for a fix. llvm-svn: 58934	2008-11-09 13:07:47 +00:00
Dale Johannesen	a129e0b826	Testcase for testb optimization. llvm-svn: 58827	2008-11-07 01:30:18 +00:00
Evan Cheng	27889ab29f	Add more vector move low and zero-extend patterns. llvm-svn: 58752	2008-11-05 06:04:51 +00:00
Dan Gohman	b9110e7fbb	The ANDMask node folds to a constant, and isn't the node that needs to have its node id set. The new and and shift nodes are the nodes that need the IDs. This fixes PR2982. llvm-svn: 58655	2008-11-03 23:43:55 +00:00
Dan Gohman	d7546abb8a	Change how extended types are represented in MVTs. Instead of fiddling bits, use a union of a SimpleValueType enum and a regular Type*. This increases the size of MVT on 64-bit hosts from 32 bits to 64 bits. In most cases, this doesn't add significant overhead. There are places in codegen that use arrays of MVTs, so these are now larger, but they're small in common cases. This eliminates restrictions on the size of integer types and vector types that can be represented in codegen. As the included testcase demonstrates, it's now possible to codegen very large add operations. There are still some complications with using very large types. PR2880 is still open so they can't be used as return values on normal targets, and there are no libcalls defined for very large integers so operations like multiply and divide aren't supported. This also introduces a minimal TableGen Type library, capable of handling IntegerType and VectorType. This will allow parts of TableGen that don't depend on using SimpleValueType values to handle arbitrary integer and vector types. llvm-svn: 58623	2008-11-03 17:56:27 +00:00
Duncan Sands	0207a3f897	Make VAARG work with x86 long double (which is 10 bytes long, but is passed in 12/16 bytes). llvm-svn: 58608	2008-11-03 11:51:11 +00:00
Dan Gohman	99cdf8893e	Use MOVSSmr instead of EXTRACTPSmr in the case of extracting vector element 0 for a store, as it's smaller and faster. llvm-svn: 58483	2008-10-31 00:57:24 +00:00
Duncan Sands	fbb10bbec4	Fix PR2977: LegalizeTypes support for expanding VAARG. llvm-svn: 58379	2008-10-29 14:25:28 +00:00
Evan Cheng	ce3ccc1ea0	- More pre-split fixes: spill slot live interval computation bug; restore point bug. - If a def is spilt, remember its spill index to allow its reuse. llvm-svn: 58375	2008-10-29 08:39:34 +00:00
Chris Lattner	38461f6b2f	Fix a nasty miscompilation of 176.gcc on linux/x86 where we synthesized a memset using 16-byte XMM stores, but where the stack realignment code didn't work. Until it does (PR2962) disable use of xmm regs in memcpy and memset formation for linux and other targets with insufficiently aligned stacks. This is part of PR2888 llvm-svn: 58317	2008-10-28 05:49:35 +00:00
Evan Cheng	fab31680e1	Avoid putting a split past the end of the live range; always shrink wrap live interval in the barrier mbb. llvm-svn: 58309	2008-10-28 00:47:49 +00:00
Evan Cheng	f46642ada6	Remove val# defined by a remat'ed def that is now dead. llvm-svn: 58294	2008-10-27 23:21:01 +00:00
Duncan Sands	8475d56794	Turn on LegalizeTypes, the new type legalization codegen infrastructure, by default. Please report any breakage to the mailing lists. llvm-svn: 58232	2008-10-27 08:42:46 +00:00
Evan Cheng	f713722975	For now, don't split live intervals around x87 stack register barriers. FpGET_ST0_80 must be right after a call instruction (and ADJCALLSTACKUP) so we need to find a way to prevent reload of x87 registers between them. llvm-svn: 58230	2008-10-27 07:14:50 +00:00
Evan Cheng	ed033ede22	Do not shrink wrap live interval in a mbb if it's livein any of its successor blocks. The mbb can be revisited again after all of the successors are processed. llvm-svn: 58184	2008-10-26 07:49:03 +00:00
Evan Cheng	f48367b8e9	Handle cases where there aren't uses in the barrier mbb. llvm-svn: 58174	2008-10-25 23:49:39 +00:00
Evan Cheng	85d71d4588	If val# def is ~0U, meaning it's defined by a PHI, and it's previously split, spill before the barrier because it's impossible to determine if all the defs are spilled in the same spill slot. llvm-svn: 58129	2008-10-25 00:52:41 +00:00
Dale Johannesen	e45896fc4f	Be kind to non-x86 hosts. llvm-svn: 58113	2008-10-24 21:20:25 +00:00
Duncan Sands	014f5bbaad	Fix translateX86CC: if SetCCOpcode is SETULE and LHS is a foldable load, then LHS and RHS are swapped and SetCCOpcode is changed to SETUGT. But the later code is expecting operands to be the wrong way round for SETUGT, but they are not in this case, resulting in an inverted compare. The solution is to move the load normalization before the correction for SETUGT. This bug was tickled by LegalizeTypes which happened to legalize the testcase slightly differently to LegalizeDAG. llvm-svn: 58092	2008-10-24 13:03:10 +00:00
Evan Cheng	4bac4d0a16	Avoid splitting an interval multiple times; avoid splitting re-materializable val# (for now). llvm-svn: 58068	2008-10-24 02:05:00 +00:00
Dan Gohman	8b44b88eff	Fix SelectionDAGBuild lowering of Select instructions to handle first-class aggregate values. Also, fix a bug in the Ret handling for empty aggregates. llvm-svn: 57925	2008-10-21 20:00:42 +00:00
Chris Lattner	192f27cb5c	really fix run line llvm-svn: 57889	2008-10-21 03:55:19 +00:00
Chris Lattner	b4ee2aebb5	fix run line llvm-svn: 57888	2008-10-21 03:54:49 +00:00
Chris Lattner	0b641e4718	remove some unneeded eh generation llvm-svn: 57887	2008-10-21 03:49:19 +00:00
Dan Gohman	269246b034	Don't create TargetGlobalAddress nodes with offsets that don't fit in the 32-bit signed offset field of addresses. Even though this may be intended, some linkers refuse to relocate code where the relocated address computation overflows. Also, fix the sign-extension of constant offsets to use the actual pointer size, rather than the size of the GlobalAddress node, which may be different, for example on x86-64 where MVT::i32 is used when the address is being fit into the 32-bit displacement field. llvm-svn: 57885	2008-10-21 03:38:42 +00:00
Dan Gohman	97d95d6d85	Optimized FCMP_OEQ and FCMP_UNE for x86. Where previously LLVM might emit code like this: ucomisd %xmm1, %xmm0 setne %al setp %cl orb %al, %cl jne .LBB4_2 it now emits this: ucomisd %xmm1, %xmm0 jne .LBB4_2 jp .LBB4_2 It has fewer instructions and uses fewer registers, but it does have more branches. And in the case that this code is followed by a non-fallthrough edge, it may be followed by a jmp instruction, resulting in three branch instructions in sequence. Some effort is made to avoid this situation. To achieve this, X86ISelLowering.cpp now recognizes FCMP_OEQ and FCMP_UNE in lowered form, and replaces them with code that emits two branches, except in the case where it would require converting a fall-through edge to an explicit branch. Also, X86InstrInfo.cpp's branch analysis and transform code now knows how to handle blocks with multiple conditional branches. It uses loops instead of having fixed checks for up to two instructions. It can now analyze and transform code generated from FCMP_OEQ and FCMP_UNE. llvm-svn: 57873	2008-10-21 03:29:32 +00:00
Dan Gohman	c835458da9	When the coalescer is doing rematerializing, have it remove the copy instruction from the instruction list before asking the target to create the new instruction. This gets the old instruction out of the way so that it doesn't interfere with the target's rematerialization code. In the case of x86, this helps it find more cases where EFLAGS is not live. Also, in the X86InstrInfo.cpp, teach isSafeToClobberEFLAGS to check to see if it reached the end of the block after scanning each instruction, instead of just before. This lets it notice when the end of the block is only two instructions away, without doing any additional scanning. These changes allow rematerialization to clobber EFLAGS in more cases, for example using xor instead of mov to set the return value to zero in the included testcase. llvm-svn: 57872	2008-10-21 03:24:31 +00:00
Chris Lattner	4396e0d2c3	Fix gcc.c-torture/compile/920520-1.c by inserting bitconverts for strange asm conditions earlier. In this case, we have a double being passed in an integer reg class. Convert to like sized integer register so that we allocate the right number for the class (two i32's for the f64 in this case). llvm-svn: 57862	2008-10-21 00:45:36 +00:00
Dan Gohman	2fe6bee5b6	Teach DAGCombine to fold constant offsets into GlobalAddress nodes, and add a TargetLowering hook for it to use to determine when this is legal (i.e. not in PIC mode, etc.) This allows instruction selection to emit folded constant offsets in more cases, such as the included testcase, eliminating the need for explicit arithmetic instructions. This eliminates the need for the C++ code in X86ISelDAGToDAG.cpp that attempted to achieve the same effect, but wasn't as effective. Also, fix handling of offsets in GlobalAddressSDNodes in several places, including changing GlobalAddressSDNode's offset from int to int64_t. The Mips, Alpha, Sparc, and CellSPU targets appear to be unaware of GlobalAddress offsets currently, so set the hook to false on those targets. llvm-svn: 57748	2008-10-18 02:06:02 +00:00
Evan Cheng	94169f1021	Fix PR2898. The spiller deleted a store for reuse before it knew for sure that the reuse happened. Patch by Lang Hames! llvm-svn: 57720	2008-10-17 20:56:41 +00:00
Chris Lattner	c7e65f4377	Fix a bug where the x86 backend would reject 64-bit r constraints when in 32-bit mode instead of assigning a register pair. This has nothing to do with PR2356, but I happened to notice it while working on it. llvm-svn: 57704	2008-10-17 17:59:52 +00:00
Evan Cheng	08acb24225	Fix a very subtle spiller bug: UpdateKills should not forget to track defs of aliases. llvm-svn: 57673	2008-10-17 06:16:07 +00:00
Dan Gohman	ca0546facc	Fun x86 encoding tricks: when adding an immediate value of 128, use a SUB instruction instead of an ADD, because -128 can be encoded in an 8-bit signed immediate field, while +128 can't be. This avoids the need for a 32-bit immediate field in this case. A similar optimization applies to 64-bit adds with 0x80000000, with the 32-bit signed immediate field. To support this, teach tablegen how to handle 64-bit constants. llvm-svn: 57663	2008-10-17 01:33:43 +00:00
Dan Gohman	a39b0a1f05	Define patterns for shld and shrd that match immediate shift counts, and patterns that match dynamic shift counts when the subtract is obscured by a truncate node. Add DAGCombiner support for recognizing rotate patterns when the shift counts are defined by truncate nodes. Fix and simplify the code for commuting shld and shrd instructions to work even when the given instruction doesn't have a parent, and when the caller needs a new instruction. These changes allow LLVM to use the shld, shrd, rol, and ror instructions on x86 to replace equivalent code using two shifts and an or in many more cases. llvm-svn: 57662	2008-10-17 01:23:35 +00:00
Dan Gohman	016f16daf1	Fix this test so it actually runs the grep lines. llvm-svn: 57653	2008-10-16 23:57:54 +00:00
Duncan Sands	7451f87273	Testcase for PR2762. llvm-svn: 57633	2008-10-16 08:56:46 +00:00
Evan Cheng	3b0f5e4d61	- Add target lowering hooks that specify which setcc conditions are illegal, i.e. conditions that cannot be checked with a single instruction. For example, SETONE and SETUEQ on x86. - Teach legalizer to implement an illegal setcc as an and / or of a number of legal setcc nodes. For now, only implement FP conditions. e.g. SETONE is implemented as SETO & SETNE, SETUEQ is SETUO | SETEQ. - Move x86 target over. llvm-svn: 57542	2008-10-15 02:05:31 +00:00
Dan Gohman	56b6885104	When doing the very-late shift-and address-mode optimization, create a new DAG node to represent the new shift to keep the DAG consistent, even though it'll almost always be folded into the address. If a user of the resulting address has multiple uses, the nodes may get revisited by a later MatchAddress call, in which case DAG inconsistencies do matter. This fixes PR2849. llvm-svn: 57465	2008-10-13 20:52:04 +00:00
Evan Cheng	4c499c4fa6	Also update sub-register intervals after a trivial computation is rematt'ed for a copy instruction. PR2775. llvm-svn: 57458	2008-10-13 18:35:52 +00:00
Evan Cheng	762f0f53ec	Add a test case for _Complex passed as a FCA. llvm-svn: 57456	2008-10-13 18:13:07 +00:00
Chris Lattner	2753955fc0	Change CALLSEQ_BEGIN and CALLSEQ_END to take TargetConstant's as parameters instead of raw Constants. This prevents the constants from being selected by the isel pass, fixing PR2735. llvm-svn: 57385	2008-10-11 22:08:30 +00:00
Dan Gohman	60ad173dfe	Remove -disable-fast-isel. Use cl::boolOrDefault with -fast-isel instead. So now: -fast-isel or -fast-isel=true enable fast-isel, and -fast-isel=false disables it. Fast-isel is also on by default with -fast, and off by default otherwise. llvm-svn: 57270	2008-10-07 23:00:56 +00:00
Dan Gohman	b8118fd432	Add a testcase for i256 add. i256 isn't fully supported in codegen right now, but add and subtract work. llvm-svn: 57260	2008-10-07 20:39:12 +00:00
Anders Carlsson	1699ad9030	Certain patterns involving the "movss" instruction were marked as requiring SSE2, when in reality movss is an SSE1 instruction. llvm-svn: 57246	2008-10-07 16:14:11 +00:00
Dale Johannesen	6c6729f3a8	Be more precise about which conversions of NaNs are Inexact. (These are not Inexact as defined by IEEE754, but that seems like a reasonable way to abstract what happens: information is lost.) llvm-svn: 57218	2008-10-06 22:59:10 +00:00
Evan Cheng	94d14f2d45	Fix PR2850 and PR2863. Only generate movddup for 128-bit SSE vector shuffles. llvm-svn: 57210	2008-10-06 21:13:08 +00:00
Anton Korobeynikov	b52ef06c8c	Revert r56675 - it breaks unwinding runtime everywhere. llvm-svn: 57048	2008-10-04 11:09:36 +00:00
Dan Gohman	78bb44fcd4	Fix a bug in the local allocator's liveness computation where it was setting kill flags on tied uses in two-address instructions. The kill flags were causing the allocator to think it could allocate the use and its tied def in different registers. llvm-svn: 57039	2008-10-04 00:31:14 +00:00
Dale Johannesen	867d549fce	Handle some 64-bit atomics on x86-32, some of the time. llvm-svn: 56963	2008-10-02 18:53:47 +00:00
Dan Gohman	88536398ff	Fix a think-o in isSafeToMove. This fixes it from thinking that volatile memory references are safe to move. llvm-svn: 56948	2008-10-02 15:04:30 +00:00
Dan Gohman	dfc507d2b5	Disable fast-isel for this test, as it doesn't emit the same number of instructions. llvm-svn: 56940	2008-10-01 23:48:35 +00:00
Devang Patel	1b76f2c40b	Remove OptimizeForSize global. Use function attribute optsize. llvm-svn: 56937	2008-10-01 23:18:38 +00:00
Dan Gohman	5c8c00af1f	Split this test and move it into target-specific directories. This fixes failures on configurations that don't have one or the other targets enabled. llvm-svn: 56926	2008-10-01 19:46:30 +00:00
Dan Gohman	7354227de0	nounwind-ify this test. llvm-svn: 56918	2008-10-01 15:07:14 +00:00
Bill Wendling	920f6d588e	Moved this option to the front-end. llvm-svn: 56901	2008-10-01 01:02:18 +00:00
Dan Gohman	1df16dff64	Use explicit target-triples to unbreak this test on non-darwin systems. llvm-svn: 56896	2008-10-01 00:25:38 +00:00
Bill Wendling	1782584f56	Just don't transform this memset into "bzero" if no-builtin is specified. llvm-svn: 56888	2008-09-30 22:05:33 +00:00
Bill Wendling	e818bc159f	- Initialize "--no-builtin" to "false". - Testcase for r56885. llvm-svn: 56886	2008-09-30 21:40:30 +00:00
Evan Cheng	9156bd2f48	Re-apply 56835 along with header file changes. llvm-svn: 56848	2008-09-30 15:44:16 +00:00
Duncan Sands	2b9adce1d0	Revert commit 56835 since it breaks the build. "If a re-materializable instruction has a register operand, the spiller will change the register operand's spill weight to HUGE_VAL to avoid it being spilled. However, if the operand is already in the queue ready to be spilled, avoid re-materializing it". llvm-svn: 56837	2008-09-30 10:00:30 +00:00
Evan Cheng	9469049f7d	If a re-materializable instruction has a register operand, the spiller will change the register operand's spill weight to HUGE_VAL to avoid it being spilled. However, if the operand is already in the queue ready to be spilled, avoid re-materializing it. llvm-svn: 56835	2008-09-30 06:36:58 +00:00
Evan Cheng	82237f2f42	Fix PR2835. Do not change the width of a volatile load. llvm-svn: 56792	2008-09-29 17:26:18 +00:00
Evan Cheng	3774b2f292	Re-apply 56683 with fixes. llvm-svn: 56748	2008-09-27 01:56:22 +00:00
Evan Cheng	7d6fa97567	Implement "punpckldq %xmm0, %xmm0" as "pshufd $0x50, %xmm0, %xmm0" unless optimizing for code size. llvm-svn: 56711	2008-09-26 23:41:32 +00:00
Bill Wendling	c966a737c5	Temporarily reverting r56683. This is causing a failure during the build of llvm-gcc: /Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/xgcc -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.obj/./gcc/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/bin/ -B/Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/lib/ -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/include -isystem /Volumes/Gir/devel/llvm/clean/llvm-gcc.install/i386-apple-darwin9.5.0/sys-include -mmacosx-version-min=10.4 -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -pipe -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I. -I../../llvm-gcc.src/gcc -I../../llvm-gcc.src/gcc/. -I../../llvm-gcc.src/gcc/../include -I./../intl -I../../llvm-gcc.src/gcc/../libcpp/include -I../../llvm-gcc.src/gcc/../libdecnumber -I../libdecnumber -I/Volumes/Gir/devel/llvm/clean/llvm.obj/include -I/Volumes/Gir/devel/llvm/clean/llvm.src/include -fexceptions -fvisibility=hidden -DHIDE_EXPORTS -c ../../llvm-gcc.src/gcc/unwind-dw2-fde-darwin.c -o libgcc/./unwind-dw2-fde-darwin.o Assertion failed: (TargetRegisterInfo::isVirtualRegister(regA) && TargetRegisterInfo::isVirtualRegister(regB) && "cannot update physical register live information"), function runOnMachineFunction, file /Volumes/Gir/devel/llvm/clean/llvm.src/lib/CodeGen/TwoAddressInstructionPass.cpp, line 311. ../../llvm-gcc.src/gcc/unwind-dw2.c:1527: internal compiler error: Abort trap Please submit a full bug report, with preprocessed source if appropriate. See <URL:http://developer.apple.com/bugreporter> for instructions. {standard input}:3521:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb" {standard input}:3521:symbol: "_dwarf_reg_size_table" can't be undefined in a subtraction expression {standard input}:3520:non-relocatable subtraction expression, "_dwarf_reg_size_table" minus "L20$pb" ... llvm-svn: 56703	2008-09-26 22:10:44 +00:00
Evan Cheng	d77cbe8947	Fix @llvm.frameaddress codegen. FP elimination optimization should be disabled when frame address is desired. Also add support for depth > 0. llvm-svn: 56683	2008-09-26 19:48:35 +00:00
Evan Cheng	994dd0bbec	Avoid spilling EBP / RBP twice in the prologue. llvm-svn: 56675	2008-09-26 19:14:21 +00:00
Evan Cheng	9dbe45c000	Prefer movlhps over punpcklqdq, etc. in more cases. llvm-svn: 56627	2008-09-25 23:35:16 +00:00
Evan Cheng	74c9ed91b0	With sse3 and when the source is a load or has multiple uses, favor movddup over shufp*, pshufd, etc. Without sse3 or when the source is from a register, make use of movlhps. llvm-svn: 56620	2008-09-25 20:50:48 +00:00
Dale Johannesen	c50ada2f56	Accept 'inreg' attribute on x86 functions as meaning sse_regparm (i.e. float/double values go in XMM0 instead of ST0). Update documentation to reflect reality. llvm-svn: 56619	2008-09-25 20:47:45 +00:00
Evan Cheng	f8ead16b50	Fix patterns for SSE4.1 move and sign extend instructions. Also add instructions which fold VZEXT_MOVL and VZEXT_LOAD. llvm-svn: 56594	2008-09-24 23:27:55 +00:00
Dale Johannesen	86d421df23	Remove SelectionDag early allocation of registers for earlyclobbers. Teach Local RA about earlyclobber, and add some tests for it. llvm-svn: 56592	2008-09-24 23:13:09 +00:00
Evan Cheng	e0add20c1b	Properly handle 'm' inline asm constraints. If a GV is being selected for the addressing mode, it requires the same logic for PIC relative addressing, etc. llvm-svn: 56526	2008-09-24 00:05:32 +00:00
Evan Cheng	9e9426cb82	Support x86 specific inline asm modifier 'J'. llvm-svn: 56483	2008-09-22 23:57:37 +00:00
Arnold Schwaighofer	796a271c5f	Change the calling convention used when tail call optimization is enabled from CC_X86_32_TailCall to CC_X86_32_FastCC. llvm-svn: 56436	2008-09-22 14:50:07 +00:00
Evan Cheng	c042000649	Fix PR2808. When regalloc runs out of registers, it spills a physical register around the live interval being allocated. Do not continue to try to spill another register, just grab the physical register and move on. llvm-svn: 56381	2008-09-20 01:28:05 +00:00
Evan Cheng	65502487b7	Clean up the test. llvm-svn: 56380	2008-09-20 01:26:27 +00:00
Evan Cheng	4730522235	No need to print function stubs for Mac OS X 10.5 and up. Linker will handle it. llvm-svn: 56378	2008-09-20 00:13:45 +00:00
Dan Gohman	9801ba451a	Refactor X86SelectConstAddr, folding it into X86SelectAddress. This results in better code for globals. Also, unbreak the local CSE for GlobalValue stub loads. llvm-svn: 56371	2008-09-19 22:16:54 +00:00
Evan Cheng	4c0197043c	Re-materialized definition instructions may be dead. Whack them. llvm-svn: 56352	2008-09-19 17:38:47 +00:00
Dale Johannesen	f8610ebebc	Add a bit to mark operands of asm's that conflict with an earlyclobber operand elsewhere. Propagate this bit and the earlyclobber bit through SDISel. Change linear-scan RA not to allocate regs in a way that conflicts with an earlyclobber. See also comments. llvm-svn: 56290	2008-09-17 21:13:11 +00:00
Dan Gohman	68e7735a38	Teach LSR to optimize away SMAX operations for tripcounts in common cases. See the comment above OptimizeSMax for the full story, and the testcase for an example. This cancels out a pessimization commonly attributed to indvars, and will allow us to lift some of the artificial throttles in indvars, rather than add new ones. llvm-svn: 56230	2008-09-15 21:22:06 +00:00
Arnold Schwaighofer	33ad850d93	Add indirect tail call (function pointer) examples. llvm-svn: 56127	2008-09-11 22:24:28 +00:00
Arnold Schwaighofer	dd45bc25ac	When tailcallopt is enabled all fastcc calls must have an aligned argument stack size. Add a test case. llvm-svn: 56119	2008-09-11 20:28:43 +00:00
Evan Cheng	5456a37280	Fix PR2748. Avoid coalescing a physical register with a virtual register when that would create an illegal extract_subreg. e.g. vr1024 = extract_subreg vr1025, 1 ... vr1024 = mov8rr AH. If vr1024 is coalesced with AH, the extract_subreg is now illegal since AH does not have a super-reg whose sub-register 1 is AH. llvm-svn: 56118	2008-09-11 20:07:10 +00:00
Evan Cheng	4c9fbbb511	Fix PR2783 - coalescer bug. Missing a TargetRegisterInfo::isVirtualRegister check. llvm-svn: 56112	2008-09-11 18:40:32 +00:00
Evan Cheng	b401449ceb	Propagate subreg index when promoting a load to a copy. llvm-svn: 56085	2008-09-11 01:02:12 +00:00

913 Commits