llvm-project

Commit Graph

Author	SHA1	Message	Date
Hal Finkel	8ee309d9b7	Simplify checking for pointer types in BBVectorize (this change was suggested by Duncan). llvm-svn: 154787	2012-04-16 03:49:42 +00:00
Hal Finkel	e0cf6397fd	Remove dead SD nodes after the combining pass. Fixes PR12201. llvm-svn: 154786	2012-04-16 03:33:22 +00:00
Chandler Carruth	ccc7e42b1f	Rewrite how machine block placement handles loop rotation. This is a complex change that resulted from a great deal of experimentation with several different benchmarks. The one which proved the most useful is included as a test case, but I don't know that it captures all of the relevant changes, as I didn't have specific regression tests for each, they were more the result of reasoning about what the old algorithm would possibly do wrong. I'm also failing at the moment to craft more targeted regression tests for these changes, if anyone has ideas, it would be welcome. The first big thing broken with the old algorithm is the idea that we can take a basic block which has a loop-exiting successor and a looping successor and use the looping successor as the layout top in order to get that particular block to be the bottom of the loop after layout. This happens to work in many cases, but not in all. The second big thing broken was that we didn't try to select the exit which fell into the nearest enclosing loop (to which we exit at all). As a consequence, even if the rotation worked perfectly, it would result in one of two bad layouts. Either the bottom of the loop would get fallthrough, skipping across a nearer enclosing loop and thereby making it discontiguous, or it would be forced to take an explicit jump over the nearest enclosing loop to reach its successor. The point of the rotation is to get fallthrough, so we need it to fallthrough to the nearest loop it can. The fix to the first issue is to actually layout the loop from the loop header, and then rotate the loop such that the correct exiting edge can be a fallthrough edge. This is actually much easier than I anticipated because we can handle all the hard parts of finding a viable rotation before we do the layout. We just store that, and then rotate after layout is finished. No inner loops get split across the post-rotation backedge because we check for them when selecting the rotation. That fix exposed a latent problem with our exiting block selection -- we should allow the backedge to point into the middle of some inner-loop chain as there is no real penalty to it, the whole point is that it won't be a fallthrough edge. This may have blocked the rotation at all in some cases, I have no idea and no test case as I've never seen it in practice, it was just noticed by inspection. Finally, all of these fixes, and studying the loops they produce, highlighted another problem: in rotating loops like this, we sometimes fail to align the destination of these backwards jumping edges. Fix this by actually walking the backwards edges rather than relying on loopinfo. This fixes regressions on heapsort if block placement is enabled as well as lots of other cases where the previous logic would introduce an abundance of unnecessary branches into the execution. llvm-svn: 154783	2012-04-16 01:12:56 +00:00
Craig Topper	b86fa404d3	Merge vpermps/vpermd and vpermpd/vpermq SD nodes. llvm-svn: 154782	2012-04-16 00:41:45 +00:00
Craig Topper	b04fe34030	Fix SDTypeProfile for vpermps. The mask operand should be v8i32. llvm-svn: 154781	2012-04-16 00:12:20 +00:00
Craig Topper	1f8c9eb925	Spacing fixes and 80 column fixes. Use 0 instead of 0x80 for undef indices in vpermps/vpermd. Hardware only looks at lower 3-bits. llvm-svn: 154780	2012-04-15 23:48:57 +00:00
Craig Topper	bfc9a5f7d3	Remove AVX2 vpermq and vpermpd intrinsics. These can now be handled with normal shuffle vectors. llvm-svn: 154778	2012-04-15 22:43:31 +00:00
Craig Topper	a6e377f34f	Make member variables of AsmToken private. Remove unnecessary forward declarations. Remove an unnecessary include. llvm-svn: 154775	2012-04-15 22:00:22 +00:00
Jakub Staszak	9414f0f266	Fix class name. llvm-svn: 154773	2012-04-15 20:22:36 +00:00
Nadav Rotem	aeacc17d50	Do not convert between fp128 <-> ppc_fp128 since there is no legal cast conversion between the two. Patch by nobled <nobled@dreamwidth.org> llvm-svn: 154772	2012-04-15 20:17:14 +00:00
Jakub Staszak	89f1d0a5a4	Fix filename and register numbers. llvm-svn: 154771	2012-04-15 20:13:47 +00:00
Nadav Rotem	42bcd04ee3	Fix PR12529. The Vxx family of instructions are only supported by AVX. Use non-vex instructions for SSE4. llvm-svn: 154770	2012-04-15 19:36:44 +00:00
Duncan Sands	62d5f6f247	Add the MDBuilder helper class for conveniently creating metadata. llvm-svn: 154766	2012-04-15 18:03:49 +00:00
Benjamin Kramer	673824b4a1	Wire up support for diagnostic ranges in the ARMAsmParser. As an example, attach range info to the "invalid instruction" message: $ clang -arch arm -c asm.c asm.c:2:11: error: invalid instruction __asm__("foo r0"); ^ <inline asm>:1:2: note: instantiated into assembly here foo r0 ^~~ llvm-svn: 154765	2012-04-15 17:04:27 +00:00
Nadav Rotem	02ef0c3524	When emulating vselect using OR/AND/XOR make sure to bitcast the result back to the original type. llvm-svn: 154764	2012-04-15 15:08:09 +00:00
Elena Demikhovsky	779a72b49e	Added VPERM optimization for AVX2 shuffles llvm-svn: 154761	2012-04-15 11:18:59 +00:00
NAKAMURA Takumi	67de410135	HexagonCopyToCombine.cpp: Silence two warnings, -Wunused-variable, with -Asserts. llvm-svn: 154759	2012-04-15 05:33:43 +00:00
NAKAMURA Takumi	355eebf4cf	Target/Hexagon: Tweak to fix msvc build. llvm-svn: 154758	2012-04-15 05:09:09 +00:00
Anshuman Dasgupta	d07ba6208f	Remove trailing whitespace. llvm-svn: 154755	2012-04-14 20:59:13 +00:00
Anshuman Dasgupta	888bcf9c63	Add VLIW packetizer to ReleaseNotes.html and CREDITS.TXT. Committing patch by Sundeep Kushwaha. llvm-svn: 154754	2012-04-14 20:57:13 +00:00
Brendon Cahoon	5aa9db38ac	Add the loop unrolling info to ReleaseNotes.html and CREDITS.TXT. llvm-svn: 154752	2012-04-14 16:54:12 +00:00
Duncan Sands	c7dc70709c	There is no need for setIsExact to be public. Make it private. llvm-svn: 154750	2012-04-14 15:43:22 +00:00
Duncan Sands	34bd91a49f	Rename "fpaccuracy" metadata to the more generic "fpmath". That's because I'm thinking of generalizing it to be able to specify other freedoms beyond accuracy (such as that NaN's don't have to be respected). I'd like the 3.1 release (the first one with this metadata) to have the more generic name already rather than having to auto-upgrade it in 3.2. llvm-svn: 154744	2012-04-14 12:36:06 +00:00
Benjamin Kramer	3b342bf114	Make StringMap's copy ctor non-explicit. Without this gcc doesn't allow us to put a StringMap into a std::map. Works with clang though. llvm-svn: 154737	2012-04-14 09:04:57 +00:00
Hal Finkel	83c9796033	Fix an error in BBVectorize important for vectorizing pointer types. When vectorizing pointer types it is important to realize that potential pairs cannot be connected via the address pointer argument of a load or store. This is because even after vectorization, the address is still a scalar because the address of the higher half of the pair is implicit from the address of the lower half (it need not be, and should not be, explicitly computed). llvm-svn: 154735	2012-04-14 07:32:50 +00:00
Hal Finkel	f589519a67	Enhance BBVectorize to more-properly handle pointer values and vectorize GEPs. llvm-svn: 154734	2012-04-14 07:32:43 +00:00
Andrew Trick	97d5b9cca6	misched: Added CanHandleTerminators. This is a special flag for targets that really want their block terminators in the DAG. The default scheduler cannot handle this correctly, so it becomes the specialized scheduler's responsibility to schedule terminators. llvm-svn: 154712	2012-04-13 23:29:54 +00:00
Bob Wilson	3b0fda08e6	Remove old code to strip out unwanted PPC slices for Apple llvmCore. llvm-svn: 154706	2012-04-13 22:58:53 +00:00
Richard Smith	3e8f1f6aea	Fix X86 codegen for 'atomicrmw nand' to generate x = ~(x & y), not x = ~x & y. llvm-svn: 154705	2012-04-13 22:47:00 +00:00
Sirish Pande	f4db4b2cb4	Remove iostream from New Value Jump. llvm-svn: 154703	2012-04-13 21:01:35 +00:00
Hal Finkel	b2336a79f9	Add support to BBVectorize for vectorizing selects. llvm-svn: 154700	2012-04-13 20:45:45 +00:00
Sirish Pande	0e6e36d1d0	Add support for Hexagon Architectural feature, New Value Jump. llvm-svn: 154696	2012-04-13 20:22:31 +00:00
Sirish Pande	a8071a0f88	Pass to replace transfer/copy instructions with combine instructions where possible. llvm-svn: 154695	2012-04-13 20:22:19 +00:00
Benjamin Kramer	330970d658	Reduce malloc traffic in DwarfAccelTable - Don't copy offsets into HashData, the underlying vector won't change once the table is finalized. - Allocate HashData and HashDataContents in a BumpPtrAllocator. - Allocate string map entries in the same allocator. - Random cleanups. llvm-svn: 154694	2012-04-13 20:06:17 +00:00
Tony Linthicum	7f13de2d6f	Support for Hexagon backend. llvm-svn: 154692	2012-04-13 19:09:44 +00:00
Tony Linthicum	66851c3e95	Support for Hexagon backend. llvm-svn: 154691	2012-04-13 19:09:18 +00:00
Evan Cheng	267a4ada52	On Darwin targets, only use vfma etc. if the source uses the fma() intrinsic explicitly. llvm-svn: 154689	2012-04-13 18:59:28 +00:00
Dan Gohman	670f93744b	Add some comments, and fix a few places that missed setting Changed. llvm-svn: 154687	2012-04-13 18:57:48 +00:00
Kevin Enderby	c407cc7a40	For ARM disassembly only print 32 unsigned bits for the address of branch targets so if the branch target has the high bit set it does not get printed as: beq 0xffffffff8008c404 llvm-svn: 154685	2012-04-13 18:46:37 +00:00
Dan Gohman	e1e352af2b	Consider ObjC runtime calls objc_storeWeak and others which make a copy of their argument as "escape" points for objc_retainBlock optimization. This fixes rdar://11229925. llvm-svn: 154682	2012-04-13 18:28:58 +00:00
Hal Finkel	204bf5352a	By default, use Early-CSE instead of GVN for vectorization cleanup. As has been suggested by Duncan and others, Early-CSE and GVN should do similar redundancy elimination, but Early-CSE is much less expensive. Most of my autovectorization benchmarks show a performance regression, but all of these are < 0.1%, and so I think that it is still worth using the less expensive pass. llvm-svn: 154673	2012-04-13 17:15:33 +00:00
Sylvestre Ledru	a10d97ac91	Catch the Python exception when subprocess.Popen fails. For example, if llc cannot be found, the full Python stack trace is displayed and no useful information is provided. Also fail the process when an exception occurs. llvm-svn: 154665	2012-04-13 11:22:18 +00:00
Benjamin Kramer	a737f7de2b	Remove unused variable. llvm-svn: 154661	2012-04-13 08:09:12 +00:00
Craig Topper	eb455832b4	Silence various build warnings from Hexagon backend that show up in release builds. Mostly converting 'assert(0)' to 'llvm_unreachable' to silence warnings about missing returns. Also fold some variable declarations into asserts to prevent the variables from being unused in release builds. llvm-svn: 154660	2012-04-13 06:38:11 +00:00
Craig Topper	374f19cade	Fix target specific intrinsic handling to adjust intrinsic number before doing attribute table lookup. Also fix attribute table lookup to handle 'invalid' intrinsic correctly. Fixes PR12542 llvm-svn: 154658	2012-04-13 06:14:57 +00:00
Craig Topper	bc6bc81449	Remove getElfArchType from ELF.h. It's only used in ELFObjectFile.cpp and there's already a copy there. ELF.h was hiding the one there and causing an unused function warning. llvm-svn: 154657	2012-04-13 05:58:19 +00:00
Dan Gohman	de8d2c446b	Use the new Use-aware dominates method to apply the objc runtime library return value optimization for phi uses. Even when the phi itself is not dominated, the specific use may be dominated. llvm-svn: 154647	2012-04-13 01:08:28 +00:00
Bill Wendling	585583c8dd	Code-gen may inject code into the IR before it emits the ASM. The linker obviously cannot know that this code is present, let alone used. So prevent the internalize pass from internalizing those global values which code-gen may insert. llvm-svn: 154645	2012-04-13 01:06:27 +00:00
Dan Gohman	8478d76d64	Don't move objc_autorelease calls past autorelease pool boundaries when optimizing autorelease calls on phi nodes with null operands. This fixes rdar://11207070. llvm-svn: 154642	2012-04-13 00:59:57 +00:00
Dan Gohman	4f8ced58a7	Def here is an Instruction, so !isa<Instruction>(Def) is always false, as Eli noticed. llvm-svn: 154641	2012-04-13 00:50:57 +00:00
Dan Gohman	73273275a4	Add forms of dominates and isReachableFromEntry that accept a Use directly instead of a user Instruction. This allows them to test whether a def dominates a particular operand if the user instruction is a PHI. llvm-svn: 154631	2012-04-12 23:31:46 +00:00
Kevin Enderby	40d4e47003	Fix a few more places in the ARM disassembler so that branches get symbolic operands added when using the C disassembler API. llvm-svn: 154628	2012-04-12 23:13:34 +00:00
Ted Kremenek	967aaa956f	Update CMake build. llvm-svn: 154622	2012-04-12 22:15:23 +00:00
Evandro Menezes	6a6a66e313	Hexagon: fix CMake error. llvm-svn: 154620	2012-04-12 21:44:58 +00:00
Sirish Pande	1d195b9c25	Disable Hexagon test temporarily. There is an assert at line 558 in ScheduleDAGInstrs::buildSchedGraph(AliasAnalysis *AA). This assert needs to be addressed for the post-RA scheduler. Until that assert is addressed, any pass that uses the post-RA scheduler will fail. So, I am temporarily disabling the Hexagon tests until that fix is in. The assert is as follows: assert(!MI->isTerminator() && !MI->isLabel() && "Cannot schedule terminators or labels!"); llvm-svn: 154617	2012-04-12 21:06:54 +00:00
Sirish Pande	b486144c12	HexagonPacketizer patch. llvm-svn: 154616	2012-04-12 21:06:38 +00:00
Preston Gurd	2138ef6d3d	This patch improves the MCJIT runtime dynamic loader by adding new handling of zero-initialized sections, virtual sections and common symbols and preventing the loading of sections which are not required for execution such as debug information. Patch by Andy Kaylor! llvm-svn: 154610	2012-04-12 20:13:57 +00:00
Evan Cheng	3e869f002c	Generalize r153635 to deal with TokenFactor chains; also clean up the logic and fix the tests. rdar://11069732, rdar://11236106 llvm-svn: 154604	2012-04-12 19:14:21 +00:00
Evandro Menezes	5cee621c88	Hexagon: enable assembler output through the MC layer. llvm-svn: 154597	2012-04-12 17:55:53 +00:00
Anshuman Dasgupta	47628b2580	Add DFA generator for VLIW targets to ReleaseNotes.html and CREDITS.TXT. llvm-svn: 154590	2012-04-12 15:17:35 +00:00
Benjamin Kramer	df4477c506	Remove README entry obsoleted by register masks. llvm-svn: 154588	2012-04-12 12:47:29 +00:00
Jean-Daniel Dupas	1332038a8f	Remove a remaining reference to the obsolete C backend in configure llvm-svn: 154587	2012-04-12 12:02:39 +00:00
Craig Topper	d0271b27cb	Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. llvm-svn: 154580	2012-04-12 07:23:00 +00:00
Jim Grosbach	4324f426ce	ARM 'adr' fixups don't need the interworking addend tweaking. They reference the PC directly, so things work properly that way. rdar://11231229 llvm-svn: 154576	2012-04-12 01:19:35 +00:00
Akira Hatanaka	c80ae58a5e	Revert changes that were accidentally committed. llvm-svn: 154563	2012-04-11 23:19:55 +00:00
Akira Hatanaka	1e962f250b	Fix string that is being checked. llvm-svn: 154547	2012-04-11 23:11:33 +00:00
Akira Hatanaka	47ad674f67	Emit neg.s or neg.d only if -enable-no-nans-fp-math is supplied by user, otherwise expand FNEG during legalization. llvm-svn: 154546	2012-04-11 22:59:08 +00:00
Akira Hatanaka	7f4c9d1429	Emit abs.s or abs.d only if -enable-no-nans-fp-math is supplied by user. Invalid operation is signaled if the operand of these instructions is NaN. llvm-svn: 154545	2012-04-11 22:49:04 +00:00
Kevin Enderby	72f18bbcff	Fixed a case of ARM disassembly getting an assert on a bad encoding of a VST instruction. llvm-svn: 154544	2012-04-11 22:40:17 +00:00
Akira Hatanaka	4f5c8421b3	Fix bugs in lowering of FCOPYSIGN nodes. - FCOPYSIGN nodes that have operands of different types were not handled. - Different code was generated depending on the endianness of the target. Additionally, code is added that emits INS and EXT instructions, if they are supported by target (they are R2 instructions). llvm-svn: 154540	2012-04-11 22:13:04 +00:00
Jim Grosbach	b4722bba5f	Remove incorrect comment. llvm-svn: 154533	2012-04-11 21:09:54 +00:00
Jim Grosbach	3263a07d48	Tidy up. Remove hard tab characters. llvm-svn: 154532	2012-04-11 21:02:33 +00:00
Jim Grosbach	dac4a95b35	Tidy up. Whitespace. llvm-svn: 154531	2012-04-11 21:02:30 +00:00
Benjamin Kramer	63fa02ea89	Fix pasto. llvm-svn: 154527	2012-04-11 20:20:37 +00:00
Chad Rosier	cc899f3b6d	Typo. llvm-svn: 154522	2012-04-11 19:21:58 +00:00
Andrew Trick	972541503f	TableGen's regpressure: emit per-registerclass weight limits. llvm-svn: 154518	2012-04-11 18:16:28 +00:00
Jim Grosbach	6e536de1a1	ARM 'vuzp.32 Dd, Dm' is a pseudo-instruction. While there is an encoding for it in VUZP, the result of that is undefined, so we should avoid it. Define the instruction as a pseudo for VTRN.32 instead, as the ARM ARM indicates. rdar://11222366 llvm-svn: 154511	2012-04-11 17:40:18 +00:00
Andrew Trick	a5eee987e0	TableGen'd regpressure: register unit set pruning. The pruning is more complete if it is not done incrementally. The code is also a tad less convoluted. llvm-svn: 154510	2012-04-11 17:35:26 +00:00
Jim Grosbach	4640c8169f	ARM 'vzip.32 Dd, Dm' is a pseudo-instruction. While there is an encoding for it in VZIP, the result of that is undefined, so we should avoid it. Define the instruction as a pseudo for VTRN.32 instead, as the ARM ARM indicates. rdar://11221911 llvm-svn: 154505	2012-04-11 16:53:25 +00:00
Sylvestre Ledru	14ada94682	Fix the build under Debian GNU/Hurd. Thanks to Pino Toscano for the patch llvm-svn: 154500	2012-04-11 15:35:36 +00:00
Benjamin Kramer	2335a5cb85	Cache the hash value of the operands in the MDNode. FoldingSet is implemented as a chained hash table. When there is a hash collision during insertion, which is common as we fill the table until a load factor of 2.0 is hit, we walk the chained elements, comparing every operand with the new element's operands. This can be very expensive if the MDNode has many operands. We sacrifice a word of space in MDNode to cache the full hash value, reducing compares on collision to a minimum. MDNode grows from 28 to 32 bytes + operands on x86. On x86_64 the new bits fit nicely into existing padding, not growing the struct at all. The actual speedup depends a lot on the test case and is typically between 1% and 2% for C++ code with clang -c -O0 -g. llvm-svn: 154497	2012-04-11 14:06:54 +00:00
Benjamin Kramer	63057a5ff0	FoldingSet: Push the hash through FoldingSetTraits::Equals, so clients can use it. llvm-svn: 154496	2012-04-11 14:06:47 +00:00
Benjamin Kramer	7a426b5f2e	Compute hashes directly with hash_combine instead of taking a detour through FoldingSetNodeID. llvm-svn: 154495	2012-04-11 14:06:39 +00:00
Nadav Rotem	372cf15125	remove unused argument llvm-svn: 154494	2012-04-11 11:05:21 +00:00
Duncan Sands	264d2e7121	Add a C binding to the Target and TargetMachine classes to allow for emitting binary and assembly. Patch by Carlo Kok. Emitting was inspired by but not based on the D llvm bindings. llvm-svn: 154493	2012-04-11 10:25:24 +00:00
Chandler Carruth	7ae90d4d2d	Add two statistics to help track how we are computing the inline cost. Yea, 'NumCallerCallersAnalyzed' isn't a great name, suggestions welcome. llvm-svn: 154492	2012-04-11 10:15:10 +00:00
Nadav Rotem	9d376b6578	Reapply 154397. Original message: Fix a dagcombine optimization which assumes that the vsetcc result type is always of the same size as the compared values. This is true for SSE/AVX/NEON but not for all targets. llvm-svn: 154490	2012-04-11 08:26:11 +00:00
Duncan Sands	a4b125634e	Comment typo fix. llvm-svn: 154488	2012-04-11 08:13:47 +00:00
Evan Cheng	5efc442290	Add more fused mul+add/sub patterns. rdar://10139676 llvm-svn: 154484	2012-04-11 06:59:47 +00:00
Nadav Rotem	9bc178ac5c	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Evan Cheng	48346c1cd9	Clean up ARM fused multiply + add/sub support some more: rename some isel predicates. Also remove NEON2 since it's not really useful and it is confusing. If NEON + VFP4 implies NEON2 but NEON2 doesn't imply NEON + VFP4, what does it really mean? rdar://10139676 llvm-svn: 154480	2012-04-11 05:33:07 +00:00
Craig Topper	692d584910	Fix an overly indented line. Remove an 'else' after an 'if' that returns. llvm-svn: 154479	2012-04-11 04:55:51 +00:00
Craig Topper	bc680061e8	Inline implVisitAluOverflow by introducing a nested switch to convert the intrinsic to an nodetype. llvm-svn: 154478	2012-04-11 04:34:11 +00:00
Andrew Trick	b1a92d3b35	Tablegen'd regpressure: emit the weighted pressure limit. llvm-svn: 154477	2012-04-11 04:31:33 +00:00
Andrew Trick	0d94c73c26	Table-generated register pressure fixes. Handle mixing allocatable and unallocatable register gracefully. Simplify the pruning of register unit sets. llvm-svn: 154474	2012-04-11 03:19:15 +00:00
Craig Topper	3ef01cdb2e	Optimize code a bit by calling push_back only once in some loops. Reduces compiled code size a bit. llvm-svn: 154473	2012-04-11 03:06:35 +00:00
Evan Cheng	67a09fc397	Match (fneg (fma) to vfnma. rdar://10139676 llvm-svn: 154469	2012-04-11 01:21:25 +00:00
Charles Davis	74c282b5ef	Add retw and lretw instructions. Also, fix Intel syntax parsing for all ret instructions. llvm-svn: 154468	2012-04-11 01:10:53 +00:00
Evan Cheng	d0f61cbefe	Merge fma.ll into fusedMAC.ll llvm-svn: 154466	2012-04-11 01:03:11 +00:00
Kevin Enderby	d2980cd041	Fix ARM disassembly of VLD instructions with writebacks. And add a test case for all opcodes handled by DecodeVLDInstruction() in ARMDisassembler.cpp. llvm-svn: 154459	2012-04-11 00:25:40 +00:00
Jim Grosbach	ad66de155b	ARM add missing Thumb1 two-operand aliases for shift-by-immediate. rdar://11222742 llvm-svn: 154457	2012-04-11 00:15:16 +00:00
Evan Cheng	aca6c822e6	Fix a number of problems with ARM fused multiply add/subtract instructions. 1. The new instruction itinerary entries are not properly described. 2. The asm parser can't handle vfms and vfnms. 3. There were no assembler, disassembler test cases. 4. HasNEON2 has the wrong assembler predicate. rdar://10139676 llvm-svn: 154456	2012-04-11 00:13:00 +00:00
Jakob Stoklund Olesen	645bdd4b69	Tweak MachineLICM heuristics for cheap instructions. Allow cheap instructions to be hoisted if they are register pressure neutral or better. This happens if the instruction is the last loop use of another virtual register. Only expensive instructions are allowed to increase loop register pressure. llvm-svn: 154455	2012-04-11 00:00:28 +00:00
Jakob Stoklund Olesen	a3e86a604a	Only check for PHI uses inside the current loop. Hoisting a value that is used by a PHI in the loop will introduce a copy because the live range is extended to cross the PHI. The same applies to PHIs in exit blocks. Also use this opportunity to make HasLoopPHIUse() non-recursive. llvm-svn: 154454	2012-04-11 00:00:26 +00:00
Jakob Stoklund Olesen	0bcf8f4bfb	Fix test to be register assignment invariant. llvm-svn: 154453	2012-04-11 00:00:24 +00:00
Andrew Trick	f8b1a66620	TableGen/reginfo potential bug: typo from previous checkin. llvm-svn: 154452	2012-04-10 23:53:32 +00:00
Owen Anderson	6f1ee1634d	Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at some point. Zap a testcase that this allows us to completely fold away. llvm-svn: 154447	2012-04-10 22:46:53 +00:00
Dylan Noblesmith	68f310df58	llvm-stress: stop abusing ConstantFP::get() ConstantFP::get(Type, double) is unreliably host-specific: it can't handle a type like PPC128 on an x86 host. It even has a comment to that effect: "This should only be used for simple constant values like 2.0/1.0 etc, that are known-valid both as host double and as the target format." Instead, use APFloat. While we're at it, randomize the floating point value more thoroughly; it was previously limited to the range 0 to 2*19 - 1. PR12451. llvm-svn: 154446	2012-04-10 22:44:51 +00:00
Dylan Noblesmith	2a592dcc46	llvm-stress: don't make vectors of x86_mmx type LangRef.html says: "There are no arrays, vectors or constants of this type." This was hitting assertions when passing the -generate-x86-mmx option. PR12452. llvm-svn: 154445	2012-04-10 22:44:49 +00:00
Kostya Serebryany	5ba61ac651	[tsan] two more compile-time optimizations: - don't instrument reads from constant globals. Saves ~1.5% of instrumented instructions on CPU2006 (counting static instructions, not their execution). - don't instrument reads from vtable (which is a global constant too). Saves ~5%. I did not measure the run-time impact of this, but it is certainly non-negative. llvm-svn: 154444	2012-04-10 22:29:17 +00:00
Evan Cheng	d0007f3c83	Handle llvm.fma.* intrinsics. rdar://10914096 llvm-svn: 154439	2012-04-10 21:40:28 +00:00
Duncan Sands	4f53074cca	Add a comment noting that the fdiv -> fmul conversion won't generate multiplication by a denormal, and some tests checking that. llvm-svn: 154431	2012-04-10 20:35:27 +00:00
Bill Wendling	c4c568b2d9	The MDString class stored a StringRef to the string which was already in a StringMap. This was redundant and unnecessarily bloated the MDString class. Because the MDString class is a "Value" and will never have a "name", and because the Name field in the Value class is a pointer to a StringMap entry, we repurpose the Name field for an MDString. It stores the StringMap entry in the Name field, and uses the normal methods to get the string (name) back. PR12474 llvm-svn: 154429	2012-04-10 20:12:16 +00:00
Chad Rosier	f7345b027a	Whitespace. llvm-svn: 154427	2012-04-10 19:42:07 +00:00
Chad Rosier	235a7a1746	Revert r154396, which looks to be the real culprit behind the bot failures. llvm-svn: 154426	2012-04-10 19:39:18 +00:00
Eric Christopher	65ada95b84	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
Kostya Serebryany	bf2de80be6	[tsan] compile-time instrumentation: do not instrument a read if a write to the same temp follows in the same BB. Also add stats printing. On Spec CPU2006 this optimization saves roughly 4% of instrumented reads (which is 3% of all instrumented accesses): Writes : 161216 Reads : 446458 Reads-before-write: 18295 llvm-svn: 154418	2012-04-10 18:18:56 +00:00
Eric Christopher	e9abba71fe	To ensure that we have more accurate line information for a block don't elide the branch instruction if it's the only one in the block, otherwise it's ok. PR9796 and rdar://11215207 llvm-svn: 154417	2012-04-10 18:18:10 +00:00
Owen Anderson	3efc8f22bd	Revert r154397, which was causing make check failures on the buildbots. llvm-svn: 154414	2012-04-10 18:02:12 +00:00
Jim Grosbach	df5a244797	ARM fix cc_out operand handling for t2SUBrr instructions. We were incorrectly conflating some add variants which don't have a cc_out operand with the mirroring sub encodings, which do. Part of the awesome non-orthogonality legacy of thumb1. Similarly, handling of add/sub of an immediate was sometimes incorrectly removing the cc_out operand for add/sub register variants. rdar://11216577 llvm-svn: 154411	2012-04-10 17:31:55 +00:00
David Blaikie	2735136655	Remove unused variable. llvm-svn: 154398	2012-04-10 15:23:13 +00:00
Nadav Rotem	065564d85a	Fix a dagcombine optimization which assumes that the vsetcc result type is always of the same size as the compared values. This is true for SSE/AVX/NEON but not for all targets. llvm-svn: 154397	2012-04-10 14:58:31 +00:00
Nadav Rotem	f934f91709	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Chandler Carruth	68062617a6	Make a somewhat subtle change in the logic of block placement. Sometimes the loop header has a non-loop predecessor which has been pre-fused into its chain due to unanalyzable branches. In this case, rotating the header into the body of the loop in order to place a loop exit at the bottom of the loop is a Very Bad Idea as it makes the loop non-contiguous. I'm working on a good test case for this, but it's a bit annoying to craft. I should get one shortly, but I'm submitting this now so I can begin the (lengthy) performance analysis process. An initial run of LNT looks really, really good, but there is too much noise there for me to trust it much. llvm-svn: 154395	2012-04-10 13:35:57 +00:00
Anton Korobeynikov	4d1220de34	Transform div to mul with reciprocal only when fp imm is legal. This fixes PR12516 and uncovers one weird problem in legalize (workarounded) llvm-svn: 154394	2012-04-10 13:22:49 +00:00
David Chisnall	bbec87205d	Use the correct section types on Solaris for unwind data on both x86 and x86-64. Patch by Dmitri Shubin! llvm-svn: 154391	2012-04-10 11:44:33 +00:00
Duncan Sands	af06b26c8e	Express the number of ULPs in fpaccuracy metadata as a real rather than a rational number, eg as 2.5 rather than 5, 2. OK'd by Peter Collingbourne. llvm-svn: 154387	2012-04-10 08:22:43 +00:00
Andrew Trick	4442bfe559	Fix 12513: Loop unrolling breaks with indirect branches. Take this opportunity to generalize the indirectbr bailout logic for loop transformations. CFG transformations will never get indirectbr right, and there's no point trying. llvm-svn: 154386	2012-04-10 05:14:42 +00:00
Andrew Trick	4104ed9c76	whitespace llvm-svn: 154385	2012-04-10 05:14:37 +00:00
Andrew Trick	7d52db9864	Fix for register pressure tables. Recent refactoring introduced a bug. Fix: added buildRegUnitSets. llvm-svn: 154382	2012-04-10 03:36:49 +00:00
Evan Cheng	0752624970	Add proper checks. llvm-svn: 154379	2012-04-10 03:15:42 +00:00
Evan Cheng	136861d994	Make the code slightly more palatable. llvm-svn: 154378	2012-04-10 03:15:18 +00:00
Andrew Trick	9002c3157f	Use std::includes instead of my own implementation. Jakob's review. llvm-svn: 154377	2012-04-10 03:12:29 +00:00
Andrew Trick	31f6487532	Added a TargetRegisterInfo interface for accessing register pressure sets. llvm-svn: 154375	2012-04-10 02:25:26 +00:00
Andrew Trick	739a00386e	Added register unit sets to the target description. This is a new algorithm that finds sets of register units that can be used to model register pressure. This handles arbitrary, overlapping register classes. Each register class is associated with a (small) list of pressure sets. These are the dimensions of pressure affected by the register class's liveness. llvm-svn: 154374	2012-04-10 02:25:24 +00:00
Andrew Trick	1d7a2c572c	Added register unit weights to the target description. This is a new algorithm that associates registers with weighted register units to accurately model their effect on register pressure. This handles registers with multiple overlapping subregisters. It is possible, but almost inconceivable that the algorithm fails to find an exact solution for a target description. If an exact solution cannot be found, an inexact, but reasonable solution will be chosen. llvm-svn: 154373	2012-04-10 02:25:21 +00:00
Andrew Trick	3a6e88dcc9	Fix header comment llvm-svn: 154372	2012-04-10 02:25:18 +00:00
Danil Malyshev	549515e128	Add a constructor for DataRefImpl and remove excess initialization. llvm-svn: 154371	2012-04-10 01:54:44 +00:00
Evan Cheng	f8bad08001	Fix a long standing tail call optimization bug. When a libcall is emitted, the legalizer always uses the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 llvm-svn: 154370	2012-04-10 01:51:00 +00:00
Rafael Espindola	1d9672bdce	Don't try to zExt just to check if an integer constant is zero, it might not fit in an i64. llvm-svn: 154364	2012-04-10 00:16:22 +00:00
Jim Grosbach	8f99bc3aed	ARM LDR/LDRT has the same encoding collision as STR/STRT. Generalized logic of r154141. llvm-svn: 154362	2012-04-10 00:13:07 +00:00
Lang Hames	ec96cd0690	Test case for PR12495. llvm-svn: 154359	2012-04-09 23:58:59 +00:00
Bill Wendling	b5cedde66d	Revert the 'EnableInitializing' flag. There is debate on whether we should run that pass by default in LTO. llvm-svn: 154356	2012-04-09 23:16:51 +00:00
Bill Wendling	383fda29be	Apply the scope restrictions after parsing the command line options. There may be some which are used in that function. llvm-svn: 154348	2012-04-09 22:18:01 +00:00
Akira Hatanaka	8483a6c47d	Have TargetLowering::getPICJumpTableRelocBase return a node that points to the GOT if jump table uses 64-bit gp-relative relocation. llvm-svn: 154341	2012-04-09 20:32:12 +00:00
Chad Rosier	e0e38f61a5	When performing a truncating store, it's possible to rearrange the data in-register, such that we can use a single vector store rather than a series of scalar stores. For func_4_8 the generated code vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vmov.u16 r0, d16[3] strb r0, [r2, #3] vmov.u16 r0, d16[2] strb r0, [r2, #2] vmov.u16 r0, d16[1] strb r0, [r2, #1] vmov.u16 r0, d16[0] strb r0, [r2] bx lr becomes vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vuzp.8 d16, d17 vst1.32 {d16[0]}, [r2, :32] bx lr I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll, but I couldn't think of a way to judiciously apply this combine. This ldrh r0, [r0, #4] strh r0, [r1] becomes vldr d16, [r0] vmov.u16 r0, d16[2] vmov.32 d16[0], r0 vuzp.16 d16, d17 vst1.32 {d16[0]}, [r1, :32] PR11158 rdar://10703339 llvm-svn: 154340	2012-04-09 20:32:02 +00:00
Lang Hames	3ad11ff90f	Patch r153892 for PR11861 apparently broke an external project (see PR12493). This patch restores TwoAddressInstructionPass's pre-r153892 behaviour when rescheduling instructions in TryInstructionTransform. Hopefully this will fix PR12493. To refix PR11861, lowering of INSERT_SUBREGS is deferred until after the copy that unties the operands is emitted (this seems to be a more appropriate fix for that issue anyway). llvm-svn: 154338	2012-04-09 20:17:30 +00:00
Chad Rosier	99cbde9e82	Update comments and remove unnecessary isVolatile() check. llvm-svn: 154336	2012-04-09 19:38:15 +00:00
Eric Christopher	132a998331	Typo. llvm-svn: 154329	2012-04-09 17:54:34 +00:00
David Blaikie	e6b6fae8ff	Fix accidentally constant conditions found by uncommitted improvements to -Wconstant-conversion. A couple of cases where we were accidentally creating constant conditions by something like "x == a || b" instead of "x == a || x == b". In one case a conditional & then unreachable was used - I transformed this into a direct assert instead. llvm-svn: 154324	2012-04-09 16:29:35 +00:00
Rafael Espindola	8f62b3248e	Pattern match a setcc of boolean value with 0 as a truncate. llvm-svn: 154322	2012-04-09 16:06:03 +00:00
Preston Gurd	2eec367227	This patch adds X86 instruction itineraries, which were missed by the original patch to add itineraries, to X86InstrArithmetic.td. llvm-svn: 154320	2012-04-09 15:32:22 +00:00
Duncan Sands	f1e1bb213f	Clarify that fpaccuracy metadata is giving the compiler permission to use a less accurate method. llvm-svn: 154319	2012-04-09 14:08:00 +00:00
Nadav Rotem	fb7e2ae53c	Lower some x86 shuffle sequences to the vblend family of instructions. llvm-svn: 154313	2012-04-09 08:33:21 +00:00
Bill Wendling	deffc42d63	s/lto_codegen_whole_program_optimization/lto_codegen_set_whole_program_optimization/ llvm-svn: 154312	2012-04-09 08:32:21 +00:00
Nadav Rotem	b801ca3976	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. llvm-svn: 154310	2012-04-09 07:45:58 +00:00
Craig Topper	9c3da316ec	Remove unnecessary type check when combining and/or/xor of swizzles. Move some checks to allow better early out. llvm-svn: 154309	2012-04-09 07:19:09 +00:00
Craig Topper	e5893f64e8	Remove unnecessary 'else' on an 'if' that always returns llvm-svn: 154308	2012-04-09 05:59:53 +00:00
Craig Topper	e3ad4834ae	Optimize code slightly. No functionality change. llvm-svn: 154307	2012-04-09 05:55:33 +00:00
Bill Wendling	8a49d049e1	Add a hook to turn on the internalize pass through the LTO interface. llvm-svn: 154306	2012-04-09 05:26:48 +00:00
Craig Topper	5894fe430a	Replace some explicit checks with asserts for conditions that should never happen. llvm-svn: 154305	2012-04-09 05:16:56 +00:00
Chandler Carruth	3779ac10b4	Cleanup and relax a restriction on the matching of global offsets into x86 addressing modes. This allows PIE-based TLS offsets to fit directly into an addressing mode immediate offset, which is the last remaining code quality issue from PR12380. With this patch, that PR is completely fixed. To understand why this patch is correct to match these offsets into addressing mode immediates, break it down by cases: 1) 32-bit is trivially correct, and unmodified here. 2) 64-bit non-small mode is unchanged and never matches. 3) 64-bit small PIC code which is RIP-relative is handled specially in the match to try to fit RIP into the base register. If it fails, it now early exits. This behavior is unchanged by the patch. 4) 64-bit small non-PIC code which is not RIP-relative continues to work as it did before. The reason these immediates are safe is because the ABI ensures they fit in small mode. This behavior is unchanged. 5) 64-bit small PIC code which is not using RIP-relative addressing. This is the only case changed by the patch, and the primary place you see it is in TLS, either the win64 section offset TLS or Linux local-exec TLS model in a PIC compilation. Here the ABI again ensures that the immediates fit because we are in small mode, and any other operations required due to the PIC relocation model have been handled externally to the Wrapper node (extra loads etc are made around the wrapper node in ISelLowering). I've tested this as much as I can comparing it with GCC's output, and everything appears safe. I discussed this with Anton and it made sense to him at least at face value. That said, if there are issues with PIC code after this patch, yell and we can revert it. llvm-svn: 154304	2012-04-09 02:13:06 +00:00
Chandler Carruth	84b834267e	Fold 15 tiny test cases into a single file that implements the comprehensive testing of TLS codegen for x86. Convert all of the ones that were still using grep to use FileCheck. Remove some redundancies between them. Perhaps most interestingly expand the test cases so that they actually fully list the instruction snippet being tested. TLS operations are very narrowly defined, and so these seem reasonably stable. More importantly, the existing test cases already were crazy fine grained, expecting specific registers to be allocated. This just clarifies that no other instructions are expected, and fills in some crucial gaps that weren't being tested at all. This will make any subsequent changes to TLS much more clear during review. llvm-svn: 154303	2012-04-09 01:43:17 +00:00
Craig Topper	6148fe65e8	Optimize code a bit. No functional change intended. llvm-svn: 154299	2012-04-08 23:15:04 +00:00
Benjamin Kramer	bb6ff08766	Silence sign-compare warning. llvm-svn: 154297	2012-04-08 19:04:45 +00:00
Duncan Sands	2f1dc3814b	Only have codegen turn fdiv by a constant into fmul by the reciprocal when -ffast-math, i.e. don't just always do it if the reciprocal can be formed exactly. There is already an IR level transform that does that, and it does it more carefully. llvm-svn: 154296	2012-04-08 18:08:12 +00:00
Craig Topper	c8e2d91a58	Simplify code that tries to do vector extracts for shuffles when the mask width and the input vector widths don't match. No need to check the min and max are in range before calculating the start index. The range check after having the start index is sufficient. Also no need to check for an extract from the beginning differently. llvm-svn: 154295	2012-04-08 17:53:33 +00:00
Chandler Carruth	ede4a8aa2b	Teach LLVM about a PIE option which, when enabled on top of PIC, makes optimizations which are valid for position independent code being linked into a single executable, but not for such code being linked into a shared library. I discussed the design of this with Eric Christopher, and the decision was to support an optional bit rather than a completely separate relocation model. Fundamentally, this is still PIC relocation, it's just that certain optimizations are only valid under a PIC relocation model when the resulting code won't be in a shared library. The simplest path to here is to expose a single bit option in the TargetOptions. If folks have different/better designs, I'm all ears. =] I've included the first optimization based upon this: changing TLS models to the *Exec models when PIE is enabled. This is the LLVM component of PR12380 and is all of the hard work. llvm-svn: 154294	2012-04-08 17:51:45 +00:00
Chandler Carruth	16f0ebcbb5	Move the TLSModel information into the TargetMachine rather than hiding in TargetLowering. There was already a FIXME about this location being odd. The interface is simplified as a consequence. This will also make it easier to change TLS models when compiling with PIE. llvm-svn: 154292	2012-04-08 17:20:55 +00:00
Benjamin Kramer	25a3d816a6	EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it. llvm-svn: 154288	2012-04-08 14:53:14 +00:00
Chandler Carruth	bed1abf9ca	Remove an overzealous assert. The assert was trying to catch places where a chain outside of the loop block-set ended up in the worklist for scheduling as part of the contiguous loop. However, asserting the first block in the chain is in the loop-set isn't a valid check -- we may be forced to drag a chain into the worklist due to one block in the chain being part of the loop even though the first block is not in the loop. This occurs when we have been forced to form a chain early due to un-analyzable branches. No test case here as I have no idea how to even begin reducing one, and it will be hopelessly fragile. We have to somehow end up with a loop header of an inner loop which is a successor of a basic block with an unanalyzable pair of branch instructions. Ow. Self-host triggers it so it is unlikely it will regress. This at least gets block placement back to passing selfhost and the test suite. There are still a lot of slowdowns that I don't like coming out of block placement, although there are now also a lot of speedups. =[ I'm seeing swings in both directions up to 10%. I'm going to try to find time to dig into this and see if we can turn this on for 3.1 as it does a really good job of cleaning up after some loops that degraded with the inliner changes. llvm-svn: 154287	2012-04-08 14:37:02 +00:00
Chandler Carruth	49158908dc	Add a debug-only 'dump' method to the BlockChain structure to ease debugging. llvm-svn: 154286	2012-04-08 14:37:01 +00:00
Chandler Carruth	f82b0e2d29	Teach InstCombine to nuke a common alloca pattern -- an alloca which has GEPs, bit casts, and stores reaching it but no other instructions. These often show up during the iterative processing of the inliner, SROA, and DCE. Once we hit this point, we can completely remove the alloca. These were actually showing up in the final, fully optimized code in a bunch of inliner tests I've been working on, and notably they show up after LLVM finishes optimizing away all function calls involved in hash_combine(a, b). llvm-svn: 154285	2012-04-08 14:36:56 +00:00
Nadav Rotem	82609df647	AVX2: Build splat vectors by broadcasting a scalar from the constant pool. Previously we used three instructions to broadcast an immediate value into a vector register. On Sandybridge we continue to load the broadcasted value from the constant pool. llvm-svn: 154284	2012-04-08 12:54:54 +00:00
Bill Wendling	8c783d4122	Remove old 'grep' lines. llvm-svn: 154283	2012-04-08 11:53:54 +00:00
Bill Wendling	ccf1109040	Formatting changes. Don't put spaces in front of some code, which only makes it look 'off'. llvm-svn: 154282	2012-04-08 11:52:52 +00:00
Bill Wendling	57f8e5ebe4	FileCheckize these testcases. llvm-svn: 154281	2012-04-08 11:00:38 +00:00
Bill Wendling	5c0068f807	Remove the 'Parent' pointer from the MDNodeOperand class. An MDNode has a list of MDNodeOperands allocated directly after it as part of its allocation. Therefore, the Parent of the MDNodeOperands can be found by walking back through the operands to the beginning of that list. Mark the first operand's value pointer as being the 'first' operand so that we know where the beginning of said list is. This saves a lot of space during LTO with -O0 -g flags. llvm-svn: 154280	2012-04-08 10:20:49 +00:00
Bill Wendling	9b2503a006	Allow subclasses of the ValueHandleBase to store information as part of the value pointer by making the value pointer into a pointer-int pair with 2 bits available for flags. llvm-svn: 154279	2012-04-08 10:16:43 +00:00
Craig Topper	d024cef233	Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1. llvm-svn: 154272	2012-04-07 22:32:29 +00:00
Craig Topper	aa9aab5ad2	Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns. llvm-svn: 154268	2012-04-07 21:57:43 +00:00
Craig Topper	e09d1c5c48	Remove 'else' after 'if' that ends in return. llvm-svn: 154267	2012-04-07 21:23:41 +00:00
Nadav Rotem	71d07ae5cb	1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new shuffle node because it could introduce new shuffle nodes that were not supported efficiently by the target. 2. Add a more restrictive shuffle-of-shuffle optimization for cases where the second shuffle reverses the transformation of the first shuffle. llvm-svn: 154266	2012-04-07 21:19:08 +00:00
Duncan Sands	5f8397a934	Convert floating point division by a constant into multiplication by the reciprocal if converting to the reciprocal is exact. Do it even if inexact if -ffast-math. This substantially speeds up ac.f90 from the polyhedron benchmarks. llvm-svn: 154265	2012-04-07 20:04:00 +00:00
Chandler Carruth	75a1cf327a	Perform partial SROA on the helper hashing structure. I really wish the optimizers could do this for us, but expecting partial SROA of classes with template methods through cloning is probably expecting too much heroics. With this change, the begin/end pointer pairs which indicate the status of each loop iteration are actually passed directly into each layer of the combine_data calls, and the inliner has a chance to see when most of the combine_data function could be deleted by inlining. Similarly for 'length'. We have to be careful to limit the places where in/out reference parameters are used as those will also defeat the inliner / optimizers from properly propagating constants. With this change, LLVM is able to fully inline and unroll the hash computation of small sets of values, such as two or three pointers. These now decompose into essentially straight-line code with no loops or function calls. There is still one code quality problem to be solved with the hashing -- LLVM is failing to nuke the alloca. It removes all loads from the alloca, leaving only lifetime intrinsics and dead(!!) stores to the alloca. =/ Very unfortunate. llvm-svn: 154264	2012-04-07 20:01:31 +00:00
Chandler Carruth	28192c9398	Fix ValueTracking to conclude that debug intrinsics are safe to speculate. Without this, loop rotate (among many other places) would suddenly stop working in the presence of debug info. I found this looking at loop rotate, and have augmented its tests with a reduction out of a very hot loop in yacr2 where failing to do this rotation costs sometimes more than 10% in runtime performance, perturbing numerous downstream optimizations. This should have no impact on performance without debug info, but the change in performance when debug info is enabled can be extreme. As a consequence (and this is how I got to this yak) any profiling of performance problems should be treated with deep suspicion -- they may have been wildly inaccurate if debug info was enabled for profiling. =/ Just a heads up. llvm-svn: 154263	2012-04-07 19:22:18 +00:00
Benjamin Kramer	e1f4ca1b0f	SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW. Found by inspection. llvm-svn: 154262	2012-04-07 17:19:26 +00:00
Bob Wilson	6f9be7e2c6	Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543> The tLDRr instruction with the last register operand set to the zero register prints in assembly as if no register was specified, and the assembler encodes it as a tLDRi instruction with a zero immediate. With the integrated assembler, that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which is broken. Emit the instruction as tLDRi with a zero immediate. I don't know if there's a good way to write a testcase for this. Suggestions welcome. Opportunities for follow-up work: 1) The asm printer should complain if a non-optional register operand is set to the zero register, instead of silently dropping it. 2) The integrated assembler should complain in the same situation, instead of silently emitting the operand as "r0". llvm-svn: 154261	2012-04-07 16:51:59 +00:00
Hongbin Zheng	5758f495da	Refactor: Use positive field names in VectorizeConfig. llvm-svn: 154249	2012-04-07 03:56:23 +00:00
NAKAMURA Takumi	b95f64134e	Target/X86/MCTargetDesc/X86MCAsmInfo.cpp: Enable DwarfCFI (aka DW2) on Cygming. Cygwin-1.7 supports dw2. Some recent mingw distros support one, too. I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin. llvm-svn: 154247	2012-04-07 02:24:20 +00:00
Alexis Hunt	78fce432b7	Make the test for r154235 more platform-independent with a shorter string. llvm-svn: 154243	2012-04-07 01:33:14 +00:00
Alexis Hunt	0235f684f0	Output UTF-8-encoded characters as identifier characters into assembly by default. This is a behaviour configurable in the MCAsmInfo. I've decided to turn it on by default in (possibly optimistic) hopes that most assemblers are reasonably sane. If this proves a problem, switching to default seems reasonable. I'm not sure if this is the opportune place to test, but it seemed good to make sure it was tested somewhere. llvm-svn: 154235	2012-04-07 00:37:53 +00:00
Jim Grosbach	0c509fa6bf	Tidy up. 80 columns. llvm-svn: 154226	2012-04-06 23:43:50 +00:00
Jakob Stoklund Olesen	baa3566091	ARMPat is equivalent to Requires<[IsARM]>. llvm-svn: 154210	2012-04-06 21:21:59 +00:00
Jakob Stoklund Olesen	b4bd3880ba	Eliminate iOS-specific tail call instructions. After register masks were introduced to represent the call clobbers, it is no longer necessary to have duplicate instructions for iOS. llvm-svn: 154209	2012-04-06 21:17:42 +00:00
Akira Hatanaka	487e56763d	Add lines in global-address.ll to test N32 and N64 code generation. llvm-svn: 154202	2012-04-06 20:23:36 +00:00
Chandler Carruth	8a102c21e3	There is no portable std::abs overload for int64_t, use the llvm::abs64 which exists for this purpose. llvm-svn: 154199	2012-04-06 20:10:52 +00:00
Sean Callanan	e804b5b762	Fixed two leaks in the MC disassembler. The MC disassembler requires a MCSubtargetInfo and a MCInstrInfo to exist in order to initialize the instruction printer and disassembler; however, although the printer and disassembler keep references to these objects they do not own them. Previously, the MCSubtargetInfo and MCInstrInfo objects were just leaked. I have extended LLVMDisasmContext to own these objects and delete them when it is destroyed. llvm-svn: 154192	2012-04-06 18:21:09 +00:00
Jakob Stoklund Olesen	967b86a0a2	Allow negative immediates in ARM and Thumb2 compares. ARM and Thumb2 mode can use cmn instructions to compare against negative immediates. Thumb1 mode can't. llvm-svn: 154183	2012-04-06 17:45:04 +00:00
David Chisnall	c1c9cdab23	Reintroduce InlineCostAnalyzer::getInlineCost() variant with explicit callee parameter until we have a more sensible API for doing the same thing. Reviewed by Chandler. llvm-svn: 154180	2012-04-06 17:27:41 +00:00
Chandler Carruth	49da93396e	Sink the collection of return instructions until after all simplification has been performed. This is a bit less efficient (requires another ilist walk of the basic blocks) but shouldn't matter in practice. More importantly, it's just too much work to keep track of all the various ways the return instructions can be mutated while simplifying them. This fixes yet another crasher, reported by Daniel Dunbar. llvm-svn: 154179	2012-04-06 17:21:31 +00:00
Chandler Carruth	e547fefcb7	Tweak this test to ensure the inliner did indeed fire. Thanks to Richard Smith for pointing this out in review. llvm-svn: 154178	2012-04-06 17:21:28 +00:00
Duncan Sands	d12b18f820	Make GVN's propagateEquality non-recursive. No intended functionality change. The modifications are a lot more trivial than they appear to be in the diff! llvm-svn: 154174	2012-04-06 15:31:09 +00:00
Craig Topper	bdc9f071a4	Test case for PR12413 llvm-svn: 154172	2012-04-06 14:38:25 +00:00
Benjamin Kramer	3cacabfb04	Fix narrowing conversion. llvm-svn: 154171	2012-04-06 13:33:52 +00:00
Benjamin Kramer	15e21a159e	DenseMap: Perform the pod-like object optimization when the value type is POD-like, not the DenseMapInfo for it. Purge now unused template arguments. This has been broken since r91421. Patch by Lubos Lunak! llvm-svn: 154170	2012-04-06 10:43:44 +00:00
Craig Topper	447417c932	Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413. llvm-svn: 154166	2012-04-06 07:45:23 +00:00
Craig Topper	4eb9616b24	Add the tests that were supposed to go with r153935 that I forgot to svn add. llvm-svn: 154165	2012-04-06 07:09:59 +00:00
Chandler Carruth	17e335888c	Actually finish this sentence in the comment the way I intended. Thanks Matt for pointing this out. llvm-svn: 154158	2012-04-06 01:19:38 +00:00
Chandler Carruth	e41f6f4189	Sink the return instruction collection until after we're done deleting dead code, including dead return instructions in some cases. Otherwise, we end up having a bogus pointer to a return instruction that blows up much further down the road. It turns out that this pattern is both simpler to code, easier to update in the face of enhancements to the inliner cleanup, and likely cheaper given that it won't add dead instructions to the list. Thanks to John Regehr's numerous test cases for teasing this out. llvm-svn: 154157	2012-04-06 01:11:52 +00:00
Jakob Stoklund Olesen	6a2e99a46a	Deduplicate ARM call-related instructions. We had special instructions for iOS because r9 is call-clobbered, but that is represented dynamically by the register mask operands now, so there is no need for the pseudo-instructions. llvm-svn: 154144	2012-04-06 00:04:58 +00:00
Jim Grosbach	d6a1a1dc2f	ARM: Don't form a t2LDRi8 or t2STRi8 with an offset of zero. The load/store optimizer splits LDRD/STRD into two instructions when the register pairing doesn't work out. For negative offsets in Thumb2, it uses t2STRi8 to do that. That's fine, except for the case when the offset is in the range [-4,-1]. In that case, we'll also form a second t2STRi8 with the original offset plus 4, resulting in a t2STRi8 with a non-negative offset, which ends up as if it were an STRT, which is completely bogus. Similarly for loads. No testcase, unfortunately, as any I've been able to construct is both large and extremely fragile. rdar://11193937 llvm-svn: 154141	2012-04-05 23:51:24 +00:00
Kaelyn Uhrain	cb5b585cca	Fix the build breakage introduced by r154131. The empty 1-argument operator delete is for the benefit of the destructor. A couple of spot checks of running yaml-bench under valgrind against a few of the files under test/YAMLParser did not reveal any leaks introduced by this change. llvm-svn: 154137	2012-04-05 23:06:17 +00:00
Kaelyn Uhrain	64aa24e13f	Really fix -Wnon-virtual-dtor warnings; gcc needs the dtors to be explicitly marked as virtual. llvm-svn: 154131	2012-04-05 22:11:12 +00:00
Bill Wendling	4f60125dd8	The internalize pass can be dangerous for LTO. Consider the following program: $ cat main.c void foo(void) { } int main(int argc, char *argv[]) { foo(); return 0; } $ cat bundle.c extern void foo(void); void bar(void) { foo(); } $ clang -o main main.c $ clang -o bundle.so bundle.c -bundle -bundle_loader ./main $ nm -m bundle.so 0000000000000f40 (__TEXT,__text) external _bar (undefined) external _foo (from executable) (undefined) external dyld_stub_binder (from libSystem) $ clang -o main main.c -O4 $ clang -o bundle.so bundle.c -bundle -bundle_loader ./main Undefined symbols for architecture x86_64: "_foo", referenced from: _bar in bundle-elQN6d.o ld: symbol(s) not found for architecture x86_64 clang: error: linker command failed with exit code 1 (use -v to see invocation) The linker was told that the 'foo' in 'main' was 'internal' and had no uses, so it was dead stripped. Another situation is something like: define void @foo() { ret void } define void @bar() { call asm volatile "call _foo" ... ret void } The only use of 'foo' is inside of an inline ASM call. Since we don't look inside those for uses of functions, we don't specify this as a "use." Get around this by not invoking the 'internalize' pass by default. This is an admitted hack for LTO correctness. <rdar://problem/11185386> llvm-svn: 154124	2012-04-05 21:26:44 +00:00
Jim Grosbach	930f2f66e7	ARM assembly aliases for add negative immediates using sub. 'add r2, #-1024' should just use 'sub r2, #1024' rather than erroring out. Thumb1 aliases for adding a negative immediate to the stack pointer, also. rdar://11192734 llvm-svn: 154123	2012-04-05 20:57:13 +00:00
Akira Hatanaka	43fb2b2cea	Reapply test case in 154038, this time with triple to prevent the backend from emitting gp_rel relocation. llvm-svn: 154122	2012-04-05 20:44:35 +00:00
Eric Christopher	aec8a82694	Patch to set is_stmt a little better for prologue lines in a function. This enables debuggers to see what are interesting lines for a breakpoint rather than any line that starts a function. rdar://9852092 llvm-svn: 154120	2012-04-05 20:39:05 +00:00
Jakob Stoklund Olesen	37492eac8c	Don't break the IV update in TLI::SimplifySetCC(). LSR always tries to make the ICmp in the loop latch use the incremented induction variable. This allows the induction variable to be kept in a single register. When the induction variable limit is equal to the stride, SimplifySetCC() would break LSR's hard work by transforming: (icmp (add iv, stride), stride) --> (cmp iv, 0) This forced us to use lea for the IC update, preventing the simpler incl+cmp. <rdar://problem/7643606> <rdar://problem/11184260> llvm-svn: 154119	2012-04-05 20:30:20 +00:00
Dan Gohman	cc64bbca81	Fix accidentally inverted logic from r152803, and make the testcase slightly less trivial. This fixes rdar://11171718. llvm-svn: 154118	2012-04-05 20:27:21 +00:00
Sylvestre Ledru	e8235fef31	Fix a problem in the target detection for Debian GNU/HURD llvm-svn: 154117	2012-04-05 19:34:15 +00:00
Sylvestre Ledru	4cf7dae516	Fix a problem in the target detection for Debian GNU/kFreeBSD llvm-svn: 154114	2012-04-05 18:53:09 +00:00
Owen Anderson	a6eebf6013	Treat f16 the same as f80/f128 for the purposes of generating constants during instruction selection. llvm-svn: 154113	2012-04-05 18:50:32 +00:00
Silviu Baranga	af3c79f0ac	Added support for unpredictable ADC/SBC instructions on ARM, and also fixed some corner cases involving the PC register as an operand for these instructions. llvm-svn: 154101	2012-04-05 16:19:29 +00:00
Silviu Baranga	d365397daa	Added support for handling unpredictable arithmetic instructions on ARM. llvm-svn: 154100	2012-04-05 16:13:15 +00:00
Hongbin Zheng	31d33b8318	BBVectorize: Add the const modifier to the VectorizeConfig because we won't modify it. llvm-svn: 154098	2012-04-05 16:07:49 +00:00
Hongbin Zheng	d6825173d3	Introduce the VectorizeConfig class, with which we can control the behavior of the BBVectorizePass without using command line options. As pointed out by Hal, we can ask the TargetLoweringInfo for the architecture specific VectorizeConfig to perform vectorizing with architecture specific information. llvm-svn: 154096	2012-04-05 15:46:55 +00:00
James Molloy	1ea6473688	An oversight when applying the patches for r150956 and r150957 to a vanilla tree meant I forgot to svn add these testcases. Noticed while investigating PR12274! llvm-svn: 154090	2012-04-05 10:01:12 +00:00
Hongbin Zheng	6edbc39bd7	Add the function "vectorizeBasicBlock" which allows users to vectorize a BasicBlock in other passes, e.g. we can call vectorizeBasicBlock in the loop unroll pass right after the loop is unrolled. llvm-svn: 154089	2012-04-05 08:05:16 +00:00
Jim Grosbach	15c6884a4b	ARM assembly aliases for two-operand V[R]SHR instructions. rdar://11189467 llvm-svn: 154087	2012-04-05 07:23:53 +00:00
Argyrios Kyrtzidis	ef909265e8	In MemoryBuffer::getOpenFile() make sure that the buffer is null-terminated if the caller requested a null-terminated one. When mapping the file there could be a racing issue that resulted in the file being larger than the FileSize passed by the caller. We already have an assertion for this in MemoryBuffer::init() but have a runtime guarantee that the buffer will be null-terminated, so do a copy that adds a null-terminator. Protects against crash of rdar://11161822. llvm-svn: 154082	2012-04-05 04:23:56 +00:00
Jim Grosbach	3d00eecc53	ARM assembly parsing for 'msr' plain 'cpsr' operand. Plain 'cpsr' is an alias for 'cpsr_fc'. rdar://11153753 llvm-svn: 154080	2012-04-05 03:17:53 +00:00
Jakob Stoklund Olesen	f2390e8303	Pass the right sign to TLI->isLegalICmpImmediate. LSR can fold three addressing modes into its ICmpZero node: ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset ICmpZero -1ScaleReg + Offset => ICmp ScaleReg, Offset ICmpZero BaseReg + -1ScaleReg => ICmp BaseReg, ScaleReg The first two cases are only used if TLI->isLegalICmpImmediate() likes the offset. Make sure the right Offset sign is passed to this method in the second case. The ARM version is not symmetric. <rdar://problem/11184260> llvm-svn: 154079	2012-04-05 03:10:56 +00:00
Bob Wilson	1864146ab7	Do not include multiple -arch options in CPPFLAGS. llvm-svn: 154070	2012-04-05 00:35:55 +00:00
Michael J. Spencer	b2d30b8699	Fix -Wnon-virtual-dtor warnings. llvm-svn: 154063	2012-04-04 22:34:55 +00:00
Akira Hatanaka	121342fcc2	Reapply 154038 without the failing test. llvm-svn: 154062	2012-04-04 22:16:36 +00:00
Owen Anderson	4743c6e159	Revert r154038. It was causing make check failures. llvm-svn: 154054	2012-04-04 21:18:58 +00:00
Pete Cooper	d7290700e6	REG_SEQUENCE expansion to COPY instructions wasn't taking account of sub register indices on the source registers. No simple test case llvm-svn: 154051	2012-04-04 21:03:25 +00:00
Benjamin Kramer	379018b2da	Fix a C++11 UDL conflict. Still not fixed in the standard ;) llvm-svn: 154044	2012-04-04 20:33:56 +00:00
Pete Cooper	8a3dc0ed8c	f16 FREM can now be legalized by promoting to f32 llvm-svn: 154039	2012-04-04 19:36:31 +00:00
Akira Hatanaka	9705c865d9	Fix LowerGlobalAddress to produce instructions with the correct relocation types for N32 ABI. Add new test case and update existing ones. llvm-svn: 154038	2012-04-04 19:02:38 +00:00
Akira Hatanaka	591ecdd7c1	Fix LowerJumpTable to produce instructions with the correct relocation types for N32 ABI. Test case will be updated after the patch that fixes TargetLowering::getPICJumpTableRelocBase is checked in. llvm-svn: 154036	2012-04-04 18:31:32 +00:00
Akira Hatanaka	b3a2b8c199	Fix LowerConstantPool to produce instructions with the correct relocation types for N32 ABI and update test case. llvm-svn: 154034	2012-04-04 18:26:12 +00:00
Jakob Stoklund Olesen	0a5b72f0e4	Implement ARMBaseInstrInfo::commuteInstruction() for MOVCCr. A MOVCCr instruction can be commuted by inverting the condition. This can help reduce register pressure and remove unnecessary copies in some cases. <rdar://problem/11182914> llvm-svn: 154033	2012-04-04 18:23:42 +00:00
Jakob Stoklund Olesen	92fd79a639	Remove spurious debug output. llvm-svn: 154032	2012-04-04 18:23:38 +00:00
Akira Hatanaka	aeff24e424	Fix LowerBlockAddress to produce instructions with the correct relocation types for N32 ABI and update test case. llvm-svn: 154031	2012-04-04 18:22:53 +00:00
Hongbin Zheng	e1fd20172b	Add testcase for r154007, when a function has the optsize attribute, the loop should be unrolled according to the value of OptSizeUnrollThreshold. llvm-svn: 154014	2012-04-04 13:24:40 +00:00
Rafael Espindola	ba0a6cabb8	Always compute all the bits in ComputeMaskedBits. This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011	2012-04-04 12:51:34 +00:00
Hongbin Zheng	b21b865fe8	LoopUnrollPass: Use variable "Threshold" instead of "CurrentThreshold" when reducing unroll count, otherwise the reduced unroll count is not taking the "OptimizeForSize" attribute into account. llvm-svn: 154007	2012-04-04 11:44:08 +00:00
Benjamin Kramer	a1355d17ca	Move yaml::Stream's dtor out of line so it can see Scanner's dtor. llvm-svn: 154004	2012-04-04 08:53:34 +00:00

81821 Commits