llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	b9e8e18949	Don't insert 128-bit UNDEF into 256-bit vectors. Just keep the 256-bit vector. Original patch by Elena Demikhovsky. Tweaked by me to allow possibility of covering more cases. llvm-svn: 158792	2012-06-20 05:39:26 +00:00
Andrew Trick	ff2ed7b687	A new algorithm for computing LoopInfo. Temporarily disabled. -stable-loops enables a new algorithm for generating the Loop forest. It differs from the original algorithm in a few respects: - Not determined by use-list order. - Initially guarantees RPO order of block and subloops. - Linear in the number of CFG edges. - Nonrecursive. I didn't want to change the LoopInfo API yet, so the block lists are still inclusive. This seems strange to me, and it means that building LoopInfo is not strictly linear, but it may not be a problem in practice. At least the block lists start out in RPO order now. In the future we may add an attribute or wrapper analysis that allows other passes to assume RPO order. The primary motivation of this work was not to optimize LoopInfo, but to allow reproducing performance issues by decomposing the compilation stages. I'm often unable to do this with the current LoopInfo, because the loop tree order determines Loop pass order. Serializing the IR tends to invert the order, which reverses the optimization order. This makes it nearly impossible to debug interdependent loop optimizations such as LSR. I also believe this will provide more stable performance results across time. llvm-svn: 158790	2012-06-20 05:23:33 +00:00
Francois Pichet	5dc987a3d1	Unbreak the MSVC build: add return to unimplemented functions. llvm-svn: 158788	2012-06-20 04:08:49 +00:00
Andrew Trick	cda51d430d	Move the implementation of LoopInfo into LoopInfoImpl.h. The implementation only needs inclusion from LoopInfo.cpp and MachineLoopInfo.cpp. Clients of the interface should only include the interface. This makes the interface readable and speeds up rebuilds after modifying the implementation. llvm-svn: 158787	2012-06-20 03:42:09 +00:00
Nick Kledzik	18497e9242	Add permissions(), map_file_pages(), and unmap_file_pages() to llvm::sys::fs and add unit test. Unix is implemented. Windows side needs to be implemented. llvm-svn: 158770	2012-06-20 00:28:54 +00:00
Kaelyn Uhrain	2212f807c7	Don't assert when given an empty range. llvm::RawMemoryObject handles empty ranges just fine, and the assert can be triggered in the wild by e.g. invoking clang with a file that included an empty pre-compiled header file when clang has been built with assertions enabled. Without assertions enabled, clang will properly report that the empty file is not a valid PCH. llvm-svn: 158769	2012-06-20 00:16:40 +00:00
Jakob Stoklund Olesen	3802bbf35e	Add regunit liveness support to LiveIntervals::handleMove(). When LiveIntervals is tracking fixed interference in regunits, make sure to update those intervals as well. Currently guarded by -live-regunits. llvm-svn: 158766	2012-06-19 23:50:18 +00:00
Chad Rosier	651f9a485a	Tidy up. llvm-svn: 158762	2012-06-19 23:37:57 +00:00
Chad Rosier	7369692790	Add an ensureMaxAlignment() function to MachineFrameInfo (analogous to ensureAlignment() in MachineFunction). Also, drop setMaxAlignment() in favor of this new function. This creates a main entry point to setting MaxAlignment, which will be helpful for future work. No functionality change intended. llvm-svn: 158758	2012-06-19 22:59:12 +00:00
Lang Hames	39fb1d08dc	Add DAG-combines for aggressive FMA formation. This patch adds DAG combines to form FMAs from pairs of FADD + FMUL or FSUB + FMUL. The combines are performed when: (a) Either AllowExcessFPPrecision option (-enable-excess-fp-precision for llc) OR UnsafeFPMath option (-enable-unsafe-fp-math) are set, and (b) TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) is true for the type of the FADD/FSUB, and (c) The FMUL only has one user (the FADD/FSUB). If your target has fast FMA instructions you can make use of these combines by overriding TargetLoweringInfo::isFMAFasterThanMulAndAdd(VT) to return true for types supported by your FMA instruction, and adding patterns to match ISD::FMA to your FMA instructions. llvm-svn: 158757	2012-06-19 22:51:23 +00:00
Jakob Stoklund Olesen	2db1125b15	80 col. llvm-svn: 158755	2012-06-19 22:50:53 +00:00
Jakob Stoklund Olesen	0f855e4263	Implement PPCInstrInfo::isCoalescableExtInstr(). The PPC::EXTSW instruction preserves the low 32 bits of its input, just like some of the x86 instructions. Use it to reduce register pressure when the low 32 bits have multiple uses. This requires a small change to PeepholeOptimizer since EXTSW takes a 64-bit input register. This is related to PR5997. llvm-svn: 158743	2012-06-19 21:14:34 +00:00
Jakob Stoklund Olesen	8eb9905a7c	Style: Don't reuse variables for multiple purposes. No functional change. llvm-svn: 158742	2012-06-19 21:10:18 +00:00
Chandler Carruth	198422a475	Fix PR13148, an inf-loop in StringMap. StringMap suffered from the same bug as DenseMap: when you explicitly construct it with a small number of buckets, you can arrange for the tombstone-based growth path to be followed when the number of buckets was less than '8'. In that case, even with a full map, it would compare '0' as not less than '0', and refuse to grow the table, leading to inf-loops trying to find an empty bucket on the next insertion. The fix is very simple: use '<=' as the comparison. The same fix was applied to DenseMap as well during its recent refactoring. Thanks to Alex Bolz for the great report and test case. =] llvm-svn: 158725	2012-06-19 17:40:35 +00:00
Benjamin Kramer	83aa94711b	Emit TableGen's header comment with C-style comments, so it can be used from C89 code. Should silence warnings when compiling the X86 disassembler. llvm-svn: 158723	2012-06-19 17:04:16 +00:00
Jan Wen Voung	7f5d79f864	Have ARM ELF use correct reloc for "b" instr. The condition code didn't actually matter for arm "b" instructions, unlike "bl". It should just use the R_ARM_JUMP24 reloc. llvm-svn: 158722	2012-06-19 16:03:02 +00:00
Hal Finkel	d465810f7c	Mark most PPC register classes to avoid write-after-write. For processors with the G5-like instruction-grouping scheme, this helps avoid early group termination due to a write-after-write dependency within the group. It should also help on pipelined embedded cores. On POWER7, over the test suite, this gives an average 0.5% speedup. The largest speedups are: SingleSource/Benchmarks/Stanford/Quicksort - 33% MultiSource/Applications/d/make_dparser - 21% MultiSource/Benchmarks/FreeBench/analyzer/analyzer - 12% MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft - 12% Largest slowdowns: SingleSource/Benchmarks/Stanford/Bubblesort - 23% MultiSource/Benchmarks/Prolangs-C++/city/city - 21% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 16% MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode - 13% llvm-svn: 158719	2012-06-19 13:57:17 +00:00
Michael J. Spencer	96ebd91d6c	[Support/PathV2] Fix out of bounds access in identify_magic when the file is empty. llvm-svn: 158704	2012-06-19 05:29:57 +00:00
Akira Hatanaka	9f96bb8619	Make MipsLongBranch::runOnMachineFunction return true. llvm-svn: 158702	2012-06-19 03:45:29 +00:00
Akira Hatanaka	9846239bbc	Use MachineBasicBlock::instr_iterator instead of MachineBasicBlock::iterator in MipsCodeEmitter.cpp. llvm-svn: 158701	2012-06-19 03:39:45 +00:00
Hal Finkel	1cc27e44a4	Add support for generating reg+reg preinc stores on PPC. PPC will now generate STWUX and friends. llvm-svn: 158698	2012-06-19 02:34:32 +00:00
Rafael Espindola	ca3e0ee8b3	Move the support for using .init_array from ARM to the generic TargetLoweringObjectFileELF. Use this to support it on X86. Unlike ARM, on X86 it is not easy to find out if .init_array should be used or not, so the decision is made via TargetOptions and defaults to off. Add a command line option to llc that enables it. llvm-svn: 158692	2012-06-19 00:48:28 +00:00
Nuno Lopes	f9abcb7ba9	revert r158660, since Chris has some issues with this patch (namely using code to represent information only used by the compiler) Original commit msg: add the 'alloc' metadata node to represent the size of offset of buffers pointed to by pointers. This metadata can be attached to any instruction returning a pointer llvm-svn: 158688	2012-06-18 23:34:26 +00:00
Manman Ren	6e1fd46fdf	ARM: use NEON loads and stores if possible when handling struct byval. This change is to be enabled in clang. rdar://9877866 llvm-svn: 158684	2012-06-18 22:23:48 +00:00
Hal Finkel	8eac009633	Allow up to 64 functional units per processor itinerary. This patch changes the type used to hold the FU bitset from unsigned to uint64_t. This will be needed for some upcoming PowerPC itineraries. llvm-svn: 158679	2012-06-18 21:08:18 +00:00
Marshall Clow	d3e2a76ca4	Added accessors for getting coff_relocation info llvm-svn: 158675	2012-06-18 19:47:16 +00:00
Jim Grosbach	cb540f5cff	ARM: Define generic HINT instruction. The NOP, WFE, WFI, SEV and YIELD instructions are all hints w/ a different immediate value in bits [7,0]. Define a generic HINT instruction and refactor NOP, WFE, WFI, SEV and YIELD to be assembly aliases of that. rdar://11600518 llvm-svn: 158674	2012-06-18 19:45:50 +00:00
Nuno Lopes	b7c941bad9	add the 'alloc' metadata node to represent the size of offset of buffers pointed to by pointers. This metadata can be attached to any instruction returning a pointer llvm-svn: 158660	2012-06-18 16:04:04 +00:00
Joel Jones	3237ce737e	This change handles another case for generating the bic instruction when a compile time constant is known. This occurs when implicitly zero extending function arguments from 16 bits to 32 bits. The 8 bit case doesn't need to be handled, as the 8 bit constants are encoded directly, thereby not needing a separate load instruction to form the constant into a register. <rdar://problem/11481151> llvm-svn: 158659	2012-06-18 14:51:32 +00:00
Chandler Carruth	2cc11fd8c7	Temporarily revert r158087. This patch causes problems when both dynamic stack realignment and dynamic allocas combine in the same function. With this patch, we no longer build the epilog correctly, and silently restore registers from the wrong position in the stack. Thanks to Matt for tracking this down, and getting at least an initial test case to Chad. I'm going to try to check a variation of that test case in so we can easily track the fixes required. llvm-svn: 158654	2012-06-18 07:03:12 +00:00
Pete Cooper	33ee6c9bf1	Now that SROA can form alloca's for dynamic vector accesses, further improve it to be able to replace operations on these vector alloca's with insert/extract element insts llvm-svn: 158623	2012-06-17 03:58:26 +00:00
Benjamin Kramer	d0b767f849	Disable the right instance of TheJIT, this one is only used in asserts. llvm-svn: 158610	2012-06-16 21:55:52 +00:00
Benjamin Kramer	b9f84bb0ce	Guard private fields that are unused in Release builds with #ifndef NDEBUG. llvm-svn: 158608	2012-06-16 21:48:13 +00:00
Hal Finkel	6261c2dc28	Cleanup trip-count finding for PPC CTR loops (and some bug fixes). This cleans up the method used to find trip counts in order to form CTR loops on PPC. This refactoring allows the pass to find loops which have a constant trip count but also happen to end with a comparison to zero. This also adds explicit FIXMEs to mark two different classes of loops that are currently ignored. In addition, we now search through all potential induction operations instead of just the first. Also, we check the predicate code on the conditional branch and abort the transformation if the code is not EQ or NE, and we then make sure that the branch to be transformed matches the condition register defined by the comparison (multiple possible comparisons will be considered). llvm-svn: 158607	2012-06-16 20:34:07 +00:00
Hal Finkel	fa103d3fc7	Teach BBVectorize to combine, when possible, or discard metadata when fusing instructions. The present implementation handles only TBAA and FP metadata, discarding everything else. For debug metadata, the current behavior is maintained (the debug metadata associated with one of the instructions will be kept, discarding that attached to the other). This should address PR 13040. llvm-svn: 158606	2012-06-16 20:34:06 +00:00
Hal Finkel	16ddd4b66b	Move the Metadata merging methods from GVN and make them public in MDNode. There are other passes, BBVectorize specifically, that also need some of this functionality. llvm-svn: 158605	2012-06-16 20:33:37 +00:00
Rafael Espindola	f70bea93e2	Implement irpc. Extracted from a patch by the PaX team. I just added the test. llvm-svn: 158604	2012-06-16 18:03:25 +00:00
Kay Tiong Khoo	390edb0d91	*no need to pollute Intel syntax with bonus mnemonics; operand size is explicitly specified llvm-svn: 158603	2012-06-16 17:19:49 +00:00
NAKAMURA Takumi	e2d4a09305	Mips/AsmParser/CMakeLists.txt: Fix dependency. llvm-svn: 158602	2012-06-16 15:33:52 +00:00
Evan Cheng	773b2cd63c	It's not deterministic to iterate over SmallPtrSet. Replace it with SmallSetVector. Patch by Daniel Reynaud. rdar://11671029 llvm-svn: 158594	2012-06-16 04:28:11 +00:00
Pete Cooper	818e9f4a26	Fix crash from r158529 on Bullet. Dynamic GEPs created by SROA needed to insert extra "i32 0" operands to index through structs and arrays to get to the vector being indexed. llvm-svn: 158590	2012-06-16 01:43:26 +00:00
Chandler Carruth	52de271da1	Don't call 'FilesToRemove[0]' when the vector is empty, even to compute the address of it. Found by a checking STL implementation used on a dragonegg builder. Sorry about this one. =/ llvm-svn: 158582	2012-06-16 00:44:07 +00:00
Chandler Carruth	e6196eba0d	Harden the Unix signals code to be more async signal safe. This is likely only the tip of the iceberg, but this particular bug caused any double-free on a glibc system to turn into a deadlock! It is not generally safe to either allocate or release heap memory from within the signal handler. The 'pop_back()' in RemoveFilesToRemove was deleting memory and causing the deadlock. What's worse, eraseFromDisk in PathV1 has lots of allocation and deallocation paths. We even passed 'true' in a place that would have caused the signal handler to try to run the 'system' system call and shell out to 'rm -rf'. That was never going to work... This patch switches the file removal to use a vector of strings so that the exact text needed for the 'unlink' system call can be stored there. It switches the loop to be a boring indexed loop, and directly calls unlink without looking at the error. It also works quite hard to ensure that calling 'c_str()' is safe, by ensuring that the non-signal-handling code path that manipulates the vector always leaves it in a state where every element has already had 'c_str()' called at least once. I dunno exactly how overkill this is, but it fixes the deadlock-on-double free issue, and seems likely to prevent any other issues from sneaking up. Sorry for not having a test case, but I really don't know how to test signal handling code easily.... llvm-svn: 158580	2012-06-16 00:09:41 +00:00
Jakob Stoklund Olesen	38a6fbf933	Remove final verification in RABasic. We now have a proper machine code verifier pass between register allocation and rewriting. llvm-svn: 158577	2012-06-15 23:48:48 +00:00
Jakob Stoklund Olesen	45c1f9976c	Print out register number in InlineSpiller. llvm-svn: 158575	2012-06-15 23:47:09 +00:00
Jakob Stoklund Olesen	13dffcb766	Accept null PhysReg arguments to checkRegMaskInterference. Calling checkRegMaskInterference(VirtReg) checks if VirtReg crosses any regmask operands, regardless of the registers they clobber. llvm-svn: 158563	2012-06-15 22:24:22 +00:00
Kevin Enderby	6c7279ec2e	Fix the encoding of the armv7m (MClass) for MSR registers other than aspr, iaspr, espr and xpsr which also needed to have 0b10 in their mask encoding bits. llvm-svn: 158560	2012-06-15 22:14:44 +00:00
Manman Ren	e0763c7472	ARM: optimization for sub+abs. This patch will optimize abs(x-y) FROM sub, movs, rsbmi TO subs, rsbmi For abs, we will use cmp instead of movs. This is necessary because we already have an existing peephole pass which optimizes away cmp following sub. rdar: 11633193 llvm-svn: 158551	2012-06-15 21:32:12 +00:00
Kay Tiong Khoo	3d8fc90f96	*fixed to separate mnemonic from operands with tab llvm-svn: 158543	2012-06-15 21:04:21 +00:00
Andrew Trick	8370c7c38f	LSR: fix expansion of scaled reg in non-address type formulae. For non-address users, Base and Scaled registers are not specially associated to fit an address mode, so SCEVExpander should apply normal expansion rules. Otherwise we may sink computation into inner loops that have already been optimized. llvm-svn: 158537	2012-06-15 20:07:29 +00:00
Andrew Trick	aca8fb3c45	LSR fix: "Special" users are just like "Basic" users but allow -1 scale. llvm-svn: 158536	2012-06-15 20:07:26 +00:00
Bill Wendling	4fd966347a	Remove assignments which aren't used afterwards. llvm-svn: 158535	2012-06-15 19:30:42 +00:00
Pete Cooper	e24d6a19e3	Allow SROA to split up an array of vectors into multiple vectors, even when the vectors are dynamically indexed llvm-svn: 158529	2012-06-15 18:07:29 +00:00
Rafael Espindola	1821c6c3b0	Some optimizations done by globalopt are safe only for internal linkage, not linkonce linkage. For example, it is not valid to add unnamed_addr. This also fixes a crash in g++.dg/opt/static5.C. llvm-svn: 158528	2012-06-15 18:00:24 +00:00
Jakob Stoklund Olesen	a15a224db0	Preserve <undef> flags in ARMExpandPseudo. This probably mostly shows up in bugpoint-generated code. llvm-svn: 158527	2012-06-15 17:46:54 +00:00
Jakob Stoklund Olesen	5767ad727c	Use regunit liveness in RegisterCoalescer when it is available. We only do very limited physreg coalescing now, but we still merge virtual registers into reserved registers. llvm-svn: 158526	2012-06-15 17:36:48 +00:00
Rafael Espindola	768b41c17a	Factor macro argument parsing into helper methods and add support for .irp. Patch extracted from a larger one by the PaX team. I added the testcases and tightened error handling a bit. llvm-svn: 158523	2012-06-15 14:02:34 +00:00
Duncan Sands	7838603ffc	Fix issues (infinite loop and/or crash) with self-referential instructions, for example degenerate phi nodes and binops that use themselves in unreachable code. Thanks to Charles Davis for the testcase that uncovered this can of worms. llvm-svn: 158508	2012-06-15 08:37:50 +00:00
Craig Topper	11913052d6	Move AVX version of convert instructions that write to GPRs to the Op1 table. llvm-svn: 158497	2012-06-15 07:02:58 +00:00
Marshall Clow	bfb85e676c	Had a closing brace inside an #ifdef -- oops! llvm-svn: 158485	2012-06-15 01:15:47 +00:00
Marshall Clow	71757ef3ed	Adding accessors to COFFObjectFile so that clients can get at the (non-generic) bits llvm-svn: 158484	2012-06-15 01:08:25 +00:00
Pete Cooper	1d1fa72837	Recommit r158407: Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access. Now with additional fix and test for indexing into a vector inside a struct llvm-svn: 158479	2012-06-14 23:53:53 +00:00
Rafael Espindola	def1b09be2	Implement the isSafeToDiscardIfUnused predicate and use it in globalopt and globaldce. Globaldce was already removing linkonce globals, but globalopt was not. llvm-svn: 158476	2012-06-14 22:48:13 +00:00
Pete Cooper	8bbce768d8	Move X86::VCVTTSD2SIrr from the 2 operand to 1 operand MemRegOp table. Can someone with more knowledge of this please look at other entries to see if others need moved. llvm-svn: 158474	2012-06-14 22:12:58 +00:00
Akira Hatanaka	5fd22485a3	Fix coding style violations. Remove white spaces and tabs. llvm-svn: 158471	2012-06-14 21:10:56 +00:00
Akira Hatanaka	d8ab16b86f	1. introduce MipsPat in place of Pat in order to exclude those from being used by Mips16 or Micro Mips 2. clean up a few lines too long encountered Patch by Reed Kotler. llvm-svn: 158470	2012-06-14 21:03:23 +00:00
Akira Hatanaka	1b420ac4c8	Make machine verifier check the first instruction of the last bundle instead of the last instruction of a basic block. llvm-svn: 158468	2012-06-14 20:51:13 +00:00
Lang Hames	a33db65bd9	Make comment slightly more helpful. llvm-svn: 158467	2012-06-14 20:37:15 +00:00
Pete Cooper	5d19452f3f	Revert r158454: Allow SROA to look at a vector type... It's breaking the vectorise buildbot. This reverts commit 12c1f86ffa731e2952c80d2cc577000c96b8962c. llvm-svn: 158462	2012-06-14 18:32:52 +00:00
Andrew Trick	45877fa011	misched: disable SSA check pending PR13112. llvm-svn: 158461	2012-06-14 17:48:49 +00:00
Pete Cooper	a7e6d58a87	Recommit r158407: Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access. Now with additional fix and test for indexing into a vector inside a struct llvm-svn: 158454	2012-06-14 16:38:13 +00:00
NAKAMURA Takumi	27bdc671ed	MipsLongBranch.cpp: Tweak llvm::next() to appease msvc. llvm-svn: 158446	2012-06-14 12:29:48 +00:00
Richard Barton	b0ec375b96	Replace assertion failure for badly formatted CPS instruction with error message. llvm-svn: 158445	2012-06-14 10:48:04 +00:00
Jush Lu	ac96b764ea	Cleanup whitespace. llvm-svn: 158443	2012-06-14 06:08:19 +00:00
Manman Ren	c2bc2d106b	InstCombine: fix a bug when combining (fcmp cc0 x, y) && (fcmp cc1 x, y). uno && ueq was converted to ueq, it should be converted to uno. llvm-svn: 158441	2012-06-14 05:57:42 +00:00
Akira Hatanaka	d74b1c1a48	Fix Mips/CMakeLists.txt. llvm-svn: 158437	2012-06-14 01:23:55 +00:00
Akira Hatanaka	a215929d5f	Add file MipsLongBranch.cpp. llvm-svn: 158436	2012-06-14 01:22:24 +00:00
Akira Hatanaka	a1b142f97c	Remove code in MipsAsmPrinter and MipsMCInstLower. llvm-svn: 158434	2012-06-14 01:20:12 +00:00
Akira Hatanaka	eb36522a4d	Add long branch expansion pass for MIPS. llvm-svn: 158433	2012-06-14 01:19:35 +00:00
Akira Hatanaka	64f8df28ed	Add AT to the list of registers clobbered by branches so that it is available as a scratch register when they are expanded to long branches. llvm-svn: 158432	2012-06-14 01:17:59 +00:00
Akira Hatanaka	194a8773ea	In MipsRegisterInfo::eliminateFrameIndex, call Mips::loadImmediate to load an immediate that does not fit into 16-bit. llvm-svn: 158431	2012-06-14 01:17:36 +00:00
Akira Hatanaka	2372c8bb5f	In MipsFrameLowering::emitPrologue and emitEpilogue, call Mips::loadImmediate to load an immediate that does not fit into 16-bit. Also, take into consideration the global base register slot on the stack when computing the stack size. llvm-svn: 158430	2012-06-14 01:17:13 +00:00
Akira Hatanaka	acd1a7dc68	Define function MipsInstrInfo::GetInstSizeInBytes, which will be called to compute the size of basic blocks in a function. Also, define a function which emits a series of instructions to load an immediate. llvm-svn: 158429	2012-06-14 01:16:45 +00:00
Akira Hatanaka	0c76448471	In MipsISelDAGToDAG.cpp, store the global base register to a stack frame object. Long-branches need access to the global base register to get the destination address. llvm-svn: 158428	2012-06-14 01:16:15 +00:00
Akira Hatanaka	51c70c62cf	Add methods to MipsFunctionInfo for initializing and accessing the stack frame object for the global base register. This is the first of a series of patches which implements long branch expansion for MIPS. llvm-svn: 158427	2012-06-14 01:15:36 +00:00
Akira Hatanaka	5ac78681c1	Bundle jump/branch instructions with the instructions in the delay slot in delay slot filler pass of MIPS, per suggestion of Jakob Stoklund Olesen. This change, along with the fix in r158154, enables machine verification to be run after delay slot filling. llvm-svn: 158426	2012-06-13 23:25:52 +00:00
Akira Hatanaka	df5205ef3d	Implement a DAGCombine in MipsISelLowering.cpp which transforms the following pattern: (add v0, (add v1, abs_lo(tjt))) => (add (add v0, v1), abs_lo(tjt)) "tjt" is a TargetJumpTable node. llvm-svn: 158419	2012-06-13 20:33:18 +00:00
Akira Hatanaka	1daf8c2a16	Set a higher value for maxStoresPerMemcpy in MipsISelLowering.cpp. llvm-svn: 158414	2012-06-13 19:33:32 +00:00
Akira Hatanaka	9586618c58	Simplify CreateLoadLR and CreateStoreLR in MipsISelLowering.cpp. llvm-svn: 158413	2012-06-13 19:06:08 +00:00
Akira Hatanaka	f0273603f5	Implement fastcc calling convention for MIPS. llvm-svn: 158410	2012-06-13 18:06:00 +00:00
Richard Osborne	ab7d788eb5	Fix pattern for MKMSK instruction. llvm-svn: 158409	2012-06-13 17:59:12 +00:00
Pete Cooper	e2fe809772	Revert "Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access" This reverts commit 51786e0aaec76b973205066bd44f7f427b21969f. llvm-svn: 158408	2012-06-13 17:55:22 +00:00
Pete Cooper	e1d4e8b563	Allow SROA to look at a vector type and see if the offset is out of range to be replaced with a scalar access llvm-svn: 158407	2012-06-13 17:30:34 +00:00
Argyrios Kyrtzidis	444fd42634	Fix building ThreadLocal.cpp with --disable-threads. llvm-svn: 158405	2012-06-13 16:30:06 +00:00
Kay Tiong Khoo	f294921e24	*typo: Cyles changed to Cycles llvm-svn: 158404	2012-06-13 15:53:04 +00:00
Duncan Sands	409d8ae165	It is possible for several constants which aren't individually absorbing to combine to the absorbing element. Thanks to nbjoerg on IRC for pointing this out. llvm-svn: 158399	2012-06-13 12:15:56 +00:00
Duncan Sands	318a89ddac	When linearizing a multiplication, return at once if we see a factor of zero, since then the entire expression must equal zero (similarly for other operations with an absorbing element). With this in place a bunch of reassociate code for handling constants is dead since it is all taken care of when linearizing. No intended functionality change. llvm-svn: 158398	2012-06-13 09:42:13 +00:00
Craig Topper	71dc02d659	Fix intrinsics for XOP frczss/sd instructions. These instructions only take one source register and zero the upper bits of the destination rather than preserving them. llvm-svn: 158396	2012-06-13 07:18:53 +00:00
Hal Finkel	9898614854	Add another missing 64-bit itinerary definition for the PPC A2 core. llvm-svn: 158393	2012-06-13 05:55:09 +00:00
Manman Ren	d33f4efbfd	SimplifyCFG: fold unconditional branch to its predecessor if profitable. This patch extends FoldBranchToCommonDest to fold unconditional branches. For unconditional branches, we fold them if it is easy to update the phi nodes in the common successors. rdar://10554090 llvm-svn: 158392	2012-06-13 05:43:29 +00:00
Jakob Stoklund Olesen	1c66b87f7d	Eliminate struct TableGenBackend. TableGen backends are simply written as functions now. Patch by Sean Silva! llvm-svn: 158389	2012-06-13 05:15:49 +00:00
Akira Hatanaka	21371766d1	Clean up trailing blanks in Mips16InstrFormats.td Patch by Reed Kotler. llvm-svn: 158382	2012-06-13 02:42:47 +00:00
Akira Hatanaka	5fa541231b	disable use of directive .set nomicromips until this directive is pushed in gas to open source fsf Patch by Reed Kotler. llvm-svn: 158381	2012-06-13 02:41:14 +00:00
Andrew Trick	344fb64fa3	sched: fix latency of memory dependence chain edges for consistency. For store->load dependencies that may alias, we should always use TrueMemOrderLatency, which may eventually become a subtarget hook. In effect, we should guarantee at least TrueMemOrderLatency on at least one DAG path from a store to a may-alias load. This should fix the standard mode as well as -enable-aa-sched-mi. llvm-svn: 158380	2012-06-13 02:39:03 +00:00
Andrew Trick	5b90645abb	sched: Avoid trivially redundant DAG edges. Take the one with higher latency. llvm-svn: 158379	2012-06-13 02:39:00 +00:00
Akira Hatanaka	3fe00f29ad	1. fix places where immed is used in place of imm to be consistent with non mips16 2. fix some comments to change OPcode->EXTEND for extended instructions Patch by Reed Kotler. llvm-svn: 158378	2012-06-13 02:37:54 +00:00
Hal Finkel	79c39da135	Add some missing 64-bit itinerary definitions for the PPC A2 core. llvm-svn: 158373	2012-06-12 20:32:29 +00:00
Duncan Sands	72aea01b6e	Use DenseMap as SmallMap workaround rather than std::map, at Chandler's request. llvm-svn: 158371	2012-06-12 20:26:43 +00:00
Duncan Sands	67cd591989	Use std::map rather than SmallMap because SmallMap assumes that the value has POD type, causing memory corruption when mapping to APInts with bitwidth > 64. Merge another crash testcase into crash.ll while there. llvm-svn: 158369	2012-06-12 20:16:51 +00:00
Chad Rosier	c6916f88a8	[arm-fast-isel] Add support for -arm-long-calls. Patch by Jush Lu <jush.msn@gmail.com>. llvm-svn: 158368	2012-06-12 19:25:13 +00:00
Hal Finkel	8c33dde666	Split out the PPC instruction class IntSimple from IntGeneral. On the POWER7, adds and logical operations can also be handled in the load/store pipelines. We'll call these IntSimple. llvm-svn: 158366	2012-06-12 19:01:24 +00:00
Hal Finkel	f1cc96ab50	Fixes for PPC host detection and features. POWER4 is a 64-bit CPU (better matched to the 970). The g3 is really the 750 (no altivec), the g4+ is the 74xx (not the 750). Patch by Andreas Tobler. llvm-svn: 158363	2012-06-12 16:39:23 +00:00
Duncan Sands	d7aeefebd6	Now that Reassociate's LinearizeExprTree can look through arbitrary expression topologies, it is quite possible for a leaf node to have huge multiplicity, for example: x0 = x*x, x1 = x0*x0, x2 = x1*x1, ... rapidly gives a value which is x raised to a vast power (the multiplicity, or weight, of x). This patch fixes the computation of weights by correctly computing them no matter how big they are, rather than just overflowing and getting a wrong value. It turns out that the weight for a value never needs more bits to represent than the value itself, so it is enough to represent weights as APInts of the same bitwidth and do the right overflow-avoiding dance steps when computing weights. As a side-effect it reduces the number of multiplies needed in some cases of large powers. While there, in view of external uses (eg by the vectorizer) I made LinearizeExprTree static, pushing the rank computation out into users. This is progress towards fixing PR13021. llvm-svn: 158358	2012-06-12 14:33:56 +00:00
Hal Finkel	59b0ee8a56	Reapply r158337, this time properly protect Darwin/PPC host CPU use with __ppc__. Original commit message: Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName(). Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support). llvm-svn: 158349	2012-06-12 03:03:13 +00:00
Argyrios Kyrtzidis	c6dc4d75fd	Satisfy C++ aliasing rules, per suggestion by Chandler. llvm-svn: 158346	2012-06-12 01:06:16 +00:00
Jakob Stoklund Olesen	f8f128606c	Revert r158337 "Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName()." This commit broke most of the PowerPC unit tests when running on Intel/Apple. llvm-svn: 158345	2012-06-12 00:58:40 +00:00
Argyrios Kyrtzidis	8d19c86c9a	For llvm::sys::ThreadLocalImpl instead of malloc'ing the platform-specific thread local data, embed them in the class using a uint64_t and make sure we get compiler errors if there's a platform where this is not big enough. This makes ThreadLocal more safe for using it in conjunction with CrashRecoveryContext. Related to crash in rdar://11434201. llvm-svn: 158342	2012-06-12 00:21:31 +00:00
Andrew Trick	3e465fb225	misched: When querying RegisterPressureTracker, always save current and max pressure. llvm-svn: 158340	2012-06-11 23:42:23 +00:00
Andrew Trick	d054bd833a	misched: regpressure getMaxPressureDelta, revert accidental checkin. llvm-svn: 158339	2012-06-11 23:42:20 +00:00
Hal Finkel	23c699e497	Move PPC host-CPU detection logic from PPCSubtarget into sys::getHostCPUName(). Both the new Linux functionality and the old Darwin functions have been moved. This change also allows this information to be queried directly by clang and other frontends (clang, for example, will now have real -mcpu=native support). llvm-svn: 158337	2012-06-11 23:14:31 +00:00
Hal Finkel	bddc916f2b	Enable MFOCRF generation on the PPC A2 core. llvm-svn: 158324	2012-06-11 19:57:04 +00:00
Hal Finkel	bfd3d08d18	Rename the PPC target feature gpul to mfocrf. The PPC target feature gpul (IsGigaProcessor) was only used for one thing: To enable the generation of the MFOCRF instruction. Furthermore, this instruction is available on other PPC cores outside of the G5 line. This feature now corresponds to the HasMFOCRF flag. No functionality change. llvm-svn: 158323	2012-06-11 19:57:01 +00:00
Hal Finkel	25d4c568d3	Add A2 to the list of PPC CPUs recognized by Linux host CPU-type detection. llvm-svn: 158322	2012-06-11 19:56:57 +00:00
Hal Finkel	2c09058f19	Emit the two-operand form of the PPC mfcr instruction as mfocrf. This is necessary on Linux and supported on Darwin, see PR2604. llvm-svn: 158315	2012-06-11 15:43:15 +00:00
Hal Finkel	ba671c0ea7	Add local CPU detection for Linux PPC. This functionality mirrors that available on PPC/Darwin. llvm-svn: 158314	2012-06-11 15:43:13 +00:00
Hal Finkel	f2b9c38d6f	Add POWER6 and POWER7 CPU types to the PPC backend. No functional change; these will be used by upcoming scheduler enhancements. llvm-svn: 158313	2012-06-11 15:43:08 +00:00
Jakob Stoklund Olesen	e6aed139f0	Write llvm-tblgen backends as functions instead of sub-classes. The TableGenBackend base class doesn't do much, and will be removed completely soon. Patch by Sean Silva! llvm-svn: 158311	2012-06-11 15:37:55 +00:00
Bill Wendling	4b79647a6e	Re-enable the CMN instruction. We turned off the CMN instruction because it had semantics which we weren't getting correct. If we are comparing with an immediate, then it's okay to use the CMN instruction. <rdar://problem/7569620> llvm-svn: 158302	2012-06-11 08:07:26 +00:00
Benjamin Kramer	2150145ae4	InstCombine: factor code better. No functionality change. llvm-svn: 158301	2012-06-11 08:01:25 +00:00
Benjamin Kramer	8b8a76974f	InstCombine: Turn (zext A) == (B & (1<<X)-1) into A == (trunc B), narrowing the compare. This saves a cast, and zext is more expensive on platforms with subreg support than trunc is. This occurs in the BSD implementation of memchr(3), see PR12750. On the synthetic benchmark from that bug stupid_memchr and bsd_memchr have the same performance now when not inlining either function. stupid_memchr: 323.0us bsd_memchr: 321.0us memchr: 479.0us where memchr is the llvm-gcc compiled bsd_memchr from osx lion's libc. When inlining is enabled bsd_memchr still regresses down to llvm-gcc memchr time, I haven't fully understood the issue yet, something is grossly mangling the loop after inlining. llvm-svn: 158297	2012-06-10 20:35:00 +00:00
Hal Finkel	4e9f1a859f	Enable ILP scheduling for all nodes by default on PPC. Over the entire test-suite, this has an insignificantly negative average performance impact, but reduces some of the worst slowdowns from the anti-dep. change (r158294). Largest speedups: SingleSource/Benchmarks/Stanford/Quicksort - 28% SingleSource/Benchmarks/Stanford/Towers - 24% SingleSource/Benchmarks/Shootout-C++/matrix - 23% MultiSource/Benchmarks/SciMark2-C/scimark2 - 19% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - 15% (matrix and automotive-bitcount were both in the top-5 slowdown list from the anti-dep. change) Largest slowdowns: MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% MultiSource/Benchmarks/mediabench/gsm/toast/toast - 26% MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan - 21% SingleSource/Benchmarks/CoyoteBench/lpbench - 20% MultiSource/Applications/d/make_dparser - 16% llvm-svn: 158296	2012-06-10 19:32:29 +00:00
Nadav Rotem	17ee58a792	Add AutoUpgrade support for the SSE4 ptest intrinsics. Patch by Michael Kuperstein. llvm-svn: 158295	2012-06-10 18:42:51 +00:00
Hal Finkel	a8100281ae	Use critical anti-dep. breaking on all PPC targets, but also add other register classes. Using 'all' instead of 'critical' would be better because it would make it easier to satisfy the bundling constraints, but, as noted in the FIXME, that is currently not possible with the crs. This yields an average 1% speedup over the entire test suite (on Power 7). Largest speedups: SingleSource/Benchmarks/Shootout-C++/moments - 40% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 28% SingleSource/Benchmarks/BenchmarkGame/nsieve-bits - 26% SingleSource/Benchmarks/McGill/misr - 23% MultiSource/Applications/JM/ldecod/ldecod - 22% Largest slowdowns: SingleSource/Benchmarks/Shootout-C++/matrix - -29% SingleSource/Benchmarks/Shootout-C++/ary3 - -22% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -18% SingleSource/Benchmarks/Shootout-C++/ary - -17% MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount - -15% llvm-svn: 158294	2012-06-10 11:15:36 +00:00
Craig Topper	7afe343be5	Add intrinsics for immediate form of XOP vprot instructions. Use i128mem instead of f128mem for integer XOP instructions. llvm-svn: 158291	2012-06-10 07:31:56 +00:00
Hal Finkel	2edfbddcf0	Improve ext/trunc patterns on PPC64. The PPC64 backend had patterns for i32 <-> i64 extensions and truncations that would leave self-moves in the final assembly. Replacing those patterns with ones based on the SUBREG builtins yields better-looking code. Thanks to Jakob and Owen for their suggestions in this matter. llvm-svn: 158283	2012-06-09 22:10:19 +00:00
Craig Topper	a54893c662	Use XOP vpcom intrinsics in patterns instead of a target specific SDNode type. Remove the custom lowering code that selected the SDNode type. llvm-svn: 158279	2012-06-09 17:02:24 +00:00
Craig Topper	3352ba55b9	Replace XOP vpcom intrinsics with fewer intrinsics that take the immediate as an argument. llvm-svn: 158278	2012-06-09 16:46:13 +00:00
Aaron Ballman	36a978cca2	Disabling a spurious deprecation warning about using PathV1 from within the PathV1 implementation file. llvm-svn: 158274	2012-06-09 13:59:29 +00:00
Aaron Ballman	503bbff367	Fixing a typo in the comments. llvm-svn: 158273	2012-06-09 13:46:36 +00:00
Benjamin Kramer	0748008df5	Allocate the contents of DwarfDebug's StringMaps in a single big BumpPtrAllocator. llvm-svn: 158265	2012-06-09 10:34:15 +00:00
Duncan Sands	556eab8878	Silence a gcc-4.6 warning: GCC fails to understand that secondReg and cmpOp2 are correlated, and thinks that cmpOp2 may be used uninitialized. llvm-svn: 158263	2012-06-09 10:04:03 +00:00
Hal Finkel	eb50c2d4a4	Enable tail merging on PPC. Tail merging had been disabled on PPC because it would disturb bundling decisions made during pre-RA scheduling on the 970 cores. Now, however, all bundling decisions are made during post-RA scheduling, and tail merging is generally beneficial (the average test-suite speedup is insignificantly positive). Largest test-suite speedups: MultiSource/Benchmarks/mediabench/gsm/toast/toast - 30% MultiSource/Benchmarks/BitBench/uuencode/uuencode - 23% SingleSource/Benchmarks/Shootout-C++/ary - 21% SingleSource/Benchmarks/Stanford/Queens - 17% Largest slowdowns: MultiSource/Benchmarks/MiBench/security-sha/security-sha - 24% MultiSource/Benchmarks/McCat/03-testtrie/testtrie - 22% MultiSource/Applications/JM/ldecod/ldecod - 14% MultiSource/Benchmarks/mediabench/g721/g721encode/encode - 9% This is improved by using full (instead of just critical) anti-dependency breaking, but doing so still causes miscompiles and so cannot yet be enabled by default. llvm-svn: 158259	2012-06-09 03:14:50 +00:00
Andrew Trick	fc8ce08be3	Register pressure: added getPressureAfterInstr. llvm-svn: 158256	2012-06-09 02:16:58 +00:00
Jakob Stoklund Olesen	c26fbbfba5	Sketch a LiveRegMatrix analysis pass. The LiveRegMatrix represents the live range of assigned virtual registers in a Live interval union per register unit. This is not fundamentally different from the interference tracking in RegAllocBase that both RABasic and RAGreedy use. The important differences are: - LiveRegMatrix tracks interference per register unit instead of per physical register. This makes interference checks cheaper and assignments slightly more expensive. For example, the ARM D7 register has 24 aliases, so we would check 24 physregs before assigning to one. With unit-based interference, we check 2 units before assigning to 2 units. - LiveRegMatrix caches regmask interference checks. That is currently duplicated functionality in RABasic and RAGreedy. - LiveRegMatrix is a pass which makes it possible to insert target-dependent passes between register allocation and rewriting. Such passes could tweak the register assignments with interference checking support from LiveRegMatrix. Eventually, RABasic and RAGreedy will be switched to LiveRegMatrix. llvm-svn: 158255	2012-06-09 02:13:10 +00:00
Jack Carter	2db37e8226	Test commit llvm-svn: 158250	2012-06-09 00:27:55 +00:00
Jakob Stoklund Olesen	be336295cd	Also compute MBB live-in lists in the new rewriter pass. This deduplicates some code from the optimizing register allocators, and it means that it is now possible to change the register allocators' solutions simply by editing the VirtRegMap between the register allocator pass and the rewriter. llvm-svn: 158249	2012-06-09 00:14:47 +00:00
Dmitri Gribenko	dbeafa773a	Convert comments to proper Doxygen comments. llvm-svn: 158248	2012-06-09 00:01:45 +00:00
Jakob Stoklund Olesen	1224312f5b	Reintroduce VirtRegRewriter. OK, not really. We don't want to reintroduce the old rewriter hacks. This patch extracts virtual register rewriting as a separate pass that runs after the register allocator. This is possible now that CodeGen/Passes.cpp can configure the full optimizing register allocator pipeline. The rewriter pass uses register assignments in VirtRegMap to rewrite virtual registers to physical registers, and it inserts kill flags based on live intervals. These finalization steps are the same for the optimizing register allocators: RABasic, RAGreedy, and PBQP. llvm-svn: 158244	2012-06-08 23:44:45 +00:00
Nuno Lopes	2710f1b049	canonicalize: -%a + 42 into 42 - %a previously we were emitting: -(%a + 42) This fixes the infinite loop in PR12338. The generated code is still not perfect, though. Will work on that next llvm-svn: 158237	2012-06-08 22:30:05 +00:00
Evan Cheng	c5adccab1a	Start implementing pre-ra if-converter: using speculation and selects to eliminate branches. llvm-svn: 158234	2012-06-08 21:53:50 +00:00
Andrew Trick	423fa6faee	TargetInstrInfo hooks implemented in codegen should be declared pure virtual. llvm-svn: 158233	2012-06-08 21:52:38 +00:00
Duncan Sands	3293f460e7	Reapply commit 158073 with a fix (the testcase was already committed). The problem was that by moving instructions around inside the function, the pass could accidentally move the iterator being used to advance over the function too. Fix this by only processing the instruction equal to the iterator, and leaving processing of instructions that might not be equal to the iterator to later (later = after traversing the basic block; it could also wait until after traversing the entire function, but this might make the sets quite big). Original commit message: Grab-bag of reassociate tweaks. Unify handling of dead instructions and instructions to reoptimize. Exploit this to more systematically eliminate dead instructions (this isn't very useful in practice but is convenient for analysing some testcase I am working on). No need for WeakVH any more: use an AssertingVH instead. llvm-svn: 158226	2012-06-08 20:15:33 +00:00
Hal Finkel	41e6fd1df9	Remove the TODO statement in the PPC README re: CTR loops As Chris points out, this can now be removed! TODO: check if the associated section on viterbi's inner loop can also be removed. llvm-svn: 158224	2012-06-08 20:02:09 +00:00
Hal Finkel	c6b5debb40	Enable PPC CTR loop formation by default. Thanks to Jakob's help, this now causes no new test suite failures! Over the entire test suite, this gives an average 1% speedup. The largest speedups are: SingleSource/Benchmarks/Misc/pi - 108% SingleSource/Benchmarks/CoyoteBench/lpbench - 54% MultiSource/Benchmarks/Prolangs-C/unix-smail/unix-smail - 50% SingleSource/Benchmarks/Shootout/ary3 - 32% SingleSource/Benchmarks/Shootout-C++/matrix - 30% The largest slowdowns are: MultiSource/Benchmarks/mediabench/gsm/toast/toast - -30% MultiSource/Benchmarks/Prolangs-C/bison/mybison - -25% MultiSource/Benchmarks/BitBench/uuencode/uuencode - -22% MultiSource/Applications/d/make_dparser - -14% SingleSource/Benchmarks/Shootout-C++/ary - -13% In light of these slowdowns, additional profiling work is obviously needed! llvm-svn: 158223	2012-06-08 19:19:53 +00:00
Hal Finkel	3d32ad3a7f	Mark the PPC CTRRC and CTRRC8 register classes as non-allocatable. Marking these classes as non-allocatable allows CTR loop generation to work correctly with the block placement passes, etc. These register classes are currently used only by some unused TCRETURN patterns. In future cleanup, these will be removed. Thanks again to Jakob for suggesting this fix to the CTR loop problem! llvm-svn: 158221	2012-06-08 19:02:08 +00:00
Manman Ren	6bc2d27073	Enable optimization for integer ABS on X86 if Subtarget has CMOV. llvm-svn: 158220	2012-06-08 18:58:26 +00:00
Chad Rosier	3d464d8068	Fix a crash in APInt::lshr when shiftAmt > BitWidth. Patch by James Benton <jbenton@vmware.com>. llvm-svn: 158213	2012-06-08 18:04:52 +00:00
Andrew Trick	596af1b02e	Fix Target->Codegen dependence. Bulk move of TargetInstrInfo implementation into TargetInstrInfoImpl. This is dirty because the code isn't part of TargetInstrInfoImpl class, nor should it be, because the methods are not target hooks. However, it's the current mechanism for keeping libTarget useful outside the backend. You'll get a not-so-nice link error if you invoke a TargetInstrInfo method that depends on CodeGen. The TargetInstrInfoImpl class should probably be removed since it doesn't really solve this problem. To really fix this, we probably need separate interfaces for the CodeGen/nonCodeGen sides of TargetInstrInfo. llvm-svn: 158212	2012-06-08 17:23:27 +00:00
Nuno Lopes	4b68c1da54	BoundsChecking: add support for ConstantPointerNull. fixes a bunch of instrumentation failures in loops with reallocs llvm-svn: 158210	2012-06-08 16:31:42 +00:00
Hal Finkel	821e00121c	Disable the PPC CTR-Loops pass by default. The pass itself works well, but something in the Machine* infrastructure does not understand terminators which define registers. Without the ability to use the block-placement pass, etc. this causes performance regressions (and so is turned off by default). Turning off the analysis turns off the problems with the Machine* infrastructure. llvm-svn: 158206	2012-06-08 15:38:25 +00:00
Hal Finkel	8b01503ee5	Fix a bug in the new PPC CTR-Loops pass. The code which tests for an induction operation cannot assume that any ADDI instruction will have a register operand because the operand could also be a frame index; for example: %vreg16<def> = ADDI8 <fi#0>, 0; G8RC:%vreg16 llvm-svn: 158205	2012-06-08 15:38:23 +00:00
Hal Finkel	96c2d4d945	Add the PPCCTRLoops pass: a PPC machine-code-level optimization pass to form CTR-based loop branching code. This pass is derived from the Hexagon HardwareLoops pass. The only significant enhancement over the Hexagon pass is that PPCCTRLoops will also attempt to delete the replaced add and compare operations if they are no longer otherwise used. Also, invalid preheader DebugLoc is not used. llvm-svn: 158204	2012-06-08 15:38:21 +00:00
Duncan Sands	9a5cf92250	Revert commit 158073 while waiting for a fix. The issue is that reassociate can move instructions within the instruction list. If the instruction just happens to be the one the basic block iterator is pointing to, and it is moved to a different basic block, then we get into an infinite loop due to the iterator running off the end of the basic block (for some reason this doesn't fire any assertions). Original commit message: Grab-bag of reassociate tweaks. Unify handling of dead instructions and instructions to reoptimize. Exploit this to more systematically eliminate dead instructions (this isn't very useful in practice but is convenient for analysing some testcase I am working on). No need for WeakVH any more: use an AssertingVH instead. llvm-svn: 158199	2012-06-08 13:37:30 +00:00
Manman Ren	2cdc8afccf	X86: optimize generated code for integer ABS This patch will generate the following for integer ABS: movl %edi, %eax negl %eax cmovll %edi, %eax INSTEAD OF movl %edi, %ecx sarl $31, %ecx leal (%rdi,%rcx), %eax xorl %ecx, %eax There exists a target-independent DAG combine for integer ABS, which converts integer ABS to sar+add+xor. For X86, we match this pattern back to neg+cmov. This is implemented in PerformXorCombine. rdar://10695237 llvm-svn: 158175	2012-06-07 22:39:10 +00:00
Nadav Rotem	bbd40f67d8	Do not optimize the used bits of the x86 vselect condition operand, when the condition operand is a vector of 1-bit predicates. This may happen on MIC devices. llvm-svn: 158168	2012-06-07 20:53:48 +00:00
Nadav Rotem	4e50efead6	Fix a bug in FoldSelectOpOp. Bitcast ops may change the number of vector elements, which may disagree with the select condition type. llvm-svn: 158166	2012-06-07 20:28:57 +00:00
Andrew Trick	a5d24ca453	Continue factoring computeOperandLatency. Use it for ARM hasHighOperandLatency. llvm-svn: 158164	2012-06-07 19:42:04 +00:00
Andrew Trick	5b1cadf9f7	ARM getOperandLatency rewrite. Match expectations of the new latency API. Cleanup and make the logic consistent. llvm-svn: 158163	2012-06-07 19:42:00 +00:00
Andrew Trick	3564bdfa61	ARM getOperandLatency should return -1 for unknown, consistent with API llvm-svn: 158162	2012-06-07 19:41:58 +00:00
Andrew Trick	fb1a74c2b2	Fix ARM getInstrLatency logic to work with the current API. llvm-svn: 158161	2012-06-07 19:41:55 +00:00
Manman Ren	746e4859d0	PR13046: we can't replace usage of SUB with CMP in the lowering phase. It will cause assertion failure later on. llvm-svn: 158160	2012-06-07 19:27:33 +00:00
Rafael Espindola	55d1145bd5	Use a base register instead of an index register with the local dynamic model. Fixes pr13048. llvm-svn: 158158	2012-06-07 18:39:19 +00:00
Pete Cooper	cd72016cab	Move terminator machine verification to check MachineBasicBlock::instr_iterator instead of MBB::iterator llvm-svn: 158154	2012-06-07 17:41:39 +00:00
Manman Ren	ae02c5a93e	X86: replace SUB with CMP if possible This patch will optimize the following movq %rdi, %rax subq %rsi, %rax cmovsq %rsi, %rdi movq %rdi, %rax to cmpq %rsi, %rdi cmovsq %rsi, %rdi movq %rdi, %rax Perform this optimization if the actual result of SUB is not used. rdar: 11540023 llvm-svn: 158126	2012-06-07 00:42:47 +00:00
Manman Ren	9c9641812c	Revert r157755. The commit is intended to fix rdar://11540023. It is implemented as part of peephole optimization. We can actually implement this in the SelectionDAG lowering phase. llvm-svn: 158122	2012-06-06 23:53:03 +00:00
Jakob Stoklund Olesen	00e7dffefb	Properly verify liveness with bundled machine instructions. Bundles should be treated as one atomic transaction when checking liveness. That is how the register allocator (and VLIW targets) treats bundles. llvm-svn: 158116	2012-06-06 22:34:30 +00:00
Benjamin Kramer	3f87e3b707	Add accessors for all private members of DisasmContext. LLVM should be -Wunused-private-field clean now. llvm-svn: 158103	2012-06-06 20:45:10 +00:00
Andrew Trick	05ff4667eb	Move RegisterClassInfo.h. Allow targets to access this API. It's required for RegisterPressure. llvm-svn: 158102	2012-06-06 20:29:31 +00:00
Andrew Trick	88517f608c	Move RegisterPressure.h. Make it a general utility for use by Targets. llvm-svn: 158097	2012-06-06 19:47:35 +00:00
Benjamin Kramer	009b1c1cf1	Round 2 of dead private variable removal. LLVM is now -Wunused-private-field clean except for - lib/MC/MCDisassembler/Disassembler.h. Not sure why it keeps all those unaccessible fields. - gtest. llvm-svn: 158096	2012-06-06 19:47:08 +00:00
Benjamin Kramer	628a39faa3	Remove unused private fields found by clang's new -Wunused-private-field. There are some that I didn't remove this round because they looked like obvious stubs. There are dead variables in gtest too, they should be fixed upstream. llvm-svn: 158090	2012-06-06 18:25:08 +00:00
Chad Rosier	5d6f01ad77	Add support for dynamic stack realignment in the presence of dynamic allocas on X86. rdar://11496434 llvm-svn: 158087	2012-06-06 17:37:40 +00:00
Chad Rosier	faa3894628	Fix combine of uno && ord -> false so that the ordering of the fcmps doesn't matter. rdar://11579835 llvm-svn: 158084	2012-06-06 17:22:40 +00:00
Jakob Stoklund Olesen	f435b1867d	Remove dead debug option -disable-rematerialization. Remat has been stable for years, and it isn't done by LiveIntervalAnalysis any longer. (See LiveRangeEdit). llvm-svn: 158079	2012-06-06 16:22:41 +00:00
Duncan Sands	763da45e9e	Grab-bag of reassociate tweaks. Unify handling of dead instructions and instructions to reoptimize. Exploit this to more systematically eliminate dead instructions (this isn't very useful in practice but is convenient for analysing some testcase I am working on). No need for WeakVH any more: use an AssertingVH instead. llvm-svn: 158073	2012-06-06 14:53:10 +00:00
Benjamin Kramer	3de5d40f4d	Stop leaking RegScavengers from TailDuplication. llvm-svn: 158069	2012-06-06 13:53:41 +00:00
Richard Barton	f1ef87ddbb	Correct decoder for T1 conditional B encoding llvm-svn: 158055	2012-06-06 09:12:53 +00:00
Craig Topper	bf2409e8aa	Mark several instructions SSE2 instead of SSE3 as they should be. llvm-svn: 158049	2012-06-06 06:45:27 +00:00
Jakob Stoklund Olesen	c141ba584e	Move LiveUnionArray into LiveIntervalUnion.h It is useful outside RegAllocBase. llvm-svn: 158041	2012-06-05 23:57:30 +00:00
Jakob Stoklund Olesen	46d229c573	Don't print register names in LiveIntervalUnion::print(). Soon we'll be making LiveIntervalUnions for register units as well. This was the only place using the RepReg member, so just remove it. llvm-svn: 158038	2012-06-05 23:07:19 +00:00
Matt Beaumont-Gay	7ba769bedd	Suppress -Wunused-variable in -Asserts build llvm-svn: 158037	2012-06-05 23:00:03 +00:00
Jakob Stoklund Olesen	f3f7d6f6e2	Simplify LiveInterval::print(). Don't print out the register number and spill weight, making the TRI argument unnecessary. This allows callers to interpret the reg field. It can currently be a virtual register, a physical register, a spill slot, or a register unit. llvm-svn: 158031	2012-06-05 22:51:54 +00:00
Jakob Stoklund Olesen	12e03dae44	Add experimental support for register unit liveness. Instead of computing a live interval per physreg, LiveIntervals can compute live intervals per register unit. This makes impossible the confusing situation where aliasing registers could have overlapping live intervals. It should also make fixed interference checking cheaper since registers have fewer register units than aliases. Live intervals for regunits are computed on demand, using MRI use-def chains and the new LiveRangeCalc class. Only regunits live in to ABI blocks are precomputed during LiveIntervals::runOnMachineFunction(). The regunit liveness computations don't depend on LiveVariables. llvm-svn: 158029	2012-06-05 22:02:15 +00:00
Jakob Stoklund Olesen	989b3b1516	Implement LiveRangeCalc::extendToUses() and createDeadDefs(). These LiveRangeCalc methods are to be used when computing a live range from scratch. llvm-svn: 158027	2012-06-05 21:54:09 +00:00
Andrew Trick	4b037005d2	MachineInstr::eraseFromParent fix for removing bundled instrs. Patch by Ivan Llopard. llvm-svn: 158025	2012-06-05 21:44:23 +00:00
Andrew Trick	4544606c71	misched: API for minimum vs. expected latency. Minimum latency determines per-cycle scheduling groups. Expected latency determines critical path and cost. llvm-svn: 158021	2012-06-05 21:11:27 +00:00
Lang Hames	a59100cc08	Add a new intrinsic: llvm.fmuladd. This intrinsic represents a multiply-add expression (a * b + c) that can be implemented as a fused multiply-add (fma) if the target determines that this will be more efficient. This intrinsic will be used to implement FP_CONTRACT support and an aggressive FMA formation mode. If your target has a fast FMA instruction you should override the isFMAFasterThanMulAndAdd method in TargetLowering to return true. llvm-svn: 158014	2012-06-05 19:07:46 +00:00
Yuan Lin	572a3a2cce	Fix header file include order in NVPTX backend NV_CONTRIB llvm-svn: 158013	2012-06-05 19:06:13 +00:00
Andrew Trick	a6fb910fad	LoopUnroll: always check for NULL LoopPassManager llvm-svn: 158007	2012-06-05 17:51:05 +00:00
Roman Divacky	c856653fb3	PPC32 uses R2 as the TLS register. Fix the copy and paste. llvm-svn: 158004	2012-06-05 17:14:17 +00:00
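
The integer ABS change in llvm-svn 158175 describes two instruction sequences for the same result: the target-independent sar+add+xor form and the X86 neg+cmov form. The sketch below models both on 32-bit values in Python to show that they agree, including at the minimum signed value. It is an illustration only, not LLVM code; the helper names (`to_signed32`, `abs_sar_add_xor`, `abs_neg_cmov`) are hypothetical, and the cmov step is modeled as a plain conditional select rather than as exact flag semantics.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit registers


def to_signed32(v):
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def abs_sar_add_xor(x):
    """sar+add+xor form: mask = x >> 31; result = (x + mask) ^ mask."""
    mask = x >> 31  # arithmetic shift: 0 for x >= 0, -1 for x < 0
    return to_signed32(((x + mask) & MASK32) ^ (mask & MASK32))


def abs_neg_cmov(x):
    """neg+cmov form, modeled as: negate, then keep x when x is positive."""
    neg = to_signed32(-x)        # negl: two's-complement negate with wraparound
    return x if x > 0 else neg   # cmovl modeled as a conditional select


INT_MIN = -(1 << 31)
INT_MAX = (1 << 31) - 1
samples = [INT_MIN, INT_MIN + 1, -5, -1, 0, 1, 5, INT_MAX]
results = [(abs_sar_add_xor(x), abs_neg_cmov(x)) for x in samples]
```

Both forms leave the minimum signed value unchanged, since its magnitude is not representable in 32 bits.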


55057 Commits
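
The llvm.fmuladd intrinsic from llvm-svn 158014 represents a * b + c and lets the target choose between a fused multiply-add and a separate multiply and add. The two choices can give different floating-point results, because the fused form rounds once and the unfused form rounds twice. The sketch below shows that difference in Python doubles. It is an illustration only: `unfused` and `fused` are hypothetical helper names, and the single-rounding result is emulated with exact rational arithmetic rather than a hardware FMA instruction.

```python
from fractions import Fraction


def unfused(a, b, c):
    """Separate multiply and add: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused(a, b, c):
    """Emulated fused multiply-add: compute a*b+c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# a * b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

two_roundings = unfused(a, b, c)  # the rounded product 1.0 cancels against c
one_rounding = fused(a, b, c)     # the small term survives the single rounding
```

With these inputs the unfused path returns 0.0 while the single-rounding path returns -2**-60, which is why code that uses the intrinsic has to tolerate either result.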