llvm-project

Commit Graph

Author	SHA1	Message	Date
Evandro Menezes	6a6a66e313	Hexagon: fix CMake error. llvm-svn: 154620	2012-04-12 21:44:58 +00:00
Sirish Pande	b486144c12	HexagonPacketizer patch. llvm-svn: 154616	2012-04-12 21:06:38 +00:00
Preston Gurd	2138ef6d3d	This patch improves the MCJIT runtime dynamic loader by adding new handling of zero-initialized sections, virtual sections and common symbols and preventing the loading of sections which are not required for execution such as debug information. Patch by Andy Kaylor! llvm-svn: 154610	2012-04-12 20:13:57 +00:00
Evan Cheng	3e869f002c	Generalize r153635 to deal with TokenFactor chains; also clean up the logic and fix the tests. rdar://11069732, rdar://11236106 llvm-svn: 154604	2012-04-12 19:14:21 +00:00
Evandro Menezes	5cee621c88	Hexagon: enable assembler output through the MC layer. llvm-svn: 154597	2012-04-12 17:55:53 +00:00
Benjamin Kramer	df4477c506	Remove README entry obsoleted by register masks. llvm-svn: 154588	2012-04-12 12:47:29 +00:00
Craig Topper	d0271b27cb	Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. llvm-svn: 154580	2012-04-12 07:23:00 +00:00
Jim Grosbach	4324f426ce	ARM 'adr' fixups don't need the interworking addend tweaking. They reference the PC directly, so things work properly that way. rdar://11231229 llvm-svn: 154576	2012-04-12 01:19:35 +00:00
Akira Hatanaka	47ad674f67	Emit neg.s or neg.d only if -enable-no-nans-fp-math is supplied by user, otherwise expand FNEG during legalization. llvm-svn: 154546	2012-04-11 22:59:08 +00:00
Akira Hatanaka	7f4c9d1429	Emit abs.s or abs.d only if -enable-no-nans-fp-math is supplied by user. Invalid operation is signaled if the operand of these instructions is NaN. llvm-svn: 154545	2012-04-11 22:49:04 +00:00
Kevin Enderby	72f18bbcff	Fixed a case of ARM disassembly getting an assert on a bad encoding of a VST instruction. llvm-svn: 154544	2012-04-11 22:40:17 +00:00
Akira Hatanaka	4f5c8421b3	Fix bugs in lowering of FCOPYSIGN nodes. - FCOPYSIGN nodes that have operands of different types were not handled. - Different code was generated depending on the endianness of the target. Additionally, code is added that emits INS and EXT instructions, if they are supported by target (they are R2 instructions). llvm-svn: 154540	2012-04-11 22:13:04 +00:00
Chad Rosier	cc899f3b6d	Typo. llvm-svn: 154522	2012-04-11 19:21:58 +00:00
Jim Grosbach	6e536de1a1	ARM 'vuzp.32 Dd, Dm' is a pseudo-instruction. While there is an encoding for it in VUZP, the result of that is undefined, so we should avoid it. Define the instruction as a pseudo for VTRN.32 instead, as the ARM ARM indicates. rdar://11222366 llvm-svn: 154511	2012-04-11 17:40:18 +00:00
Jim Grosbach	4640c8169f	ARM 'vzip.32 Dd, Dm' is a pseudo-instruction. While there is an encoding for it in VZIP, the result of that is undefined, so we should avoid it. Define the instruction as a pseudo for VTRN.32 instead, as the ARM ARM indicates. rdar://11221911 llvm-svn: 154505	2012-04-11 16:53:25 +00:00
Sylvestre Ledru	14ada94682	Fix the build under Debian GNU/Hurd. Thanks to Pino Toscano for the patch llvm-svn: 154500	2012-04-11 15:35:36 +00:00
Benjamin Kramer	2335a5cb85	Cache the hash value of the operands in the MDNode. FoldingSet is implemented as a chained hash table. When there is a hash collision during insertion, which is common as we fill the table until a load factor of 2.0 is hit, we walk the chained elements, comparing every operand with the new element's operands. This can be very expensive if the MDNode has many operands. We sacrifice a word of space in MDNode to cache the full hash value, reducing compares on collision to a minimum. MDNode grows from 28 to 32 bytes + operands on x86. On x86_64 the new bits fit nicely into existing padding, not growing the struct at all. The actual speedup depends a lot on the test case and is typically between 1% and 2% for C++ code with clang -c -O0 -g. llvm-svn: 154497	2012-04-11 14:06:54 +00:00
Benjamin Kramer	63057a5ff0	FoldingSet: Push the hash through FoldingSetTraits::Equals, so clients can use it. llvm-svn: 154496	2012-04-11 14:06:47 +00:00
Benjamin Kramer	7a426b5f2e	Compute hashes directly with hash_combine instead of taking a detour through FoldingSetNodeID. llvm-svn: 154495	2012-04-11 14:06:39 +00:00
Nadav Rotem	372cf15125	remove unused argument llvm-svn: 154494	2012-04-11 11:05:21 +00:00
Duncan Sands	264d2e7121	Add a C binding to the Target and TargetMachine classes to allow for emitting binary and assembly. Patch by Carlo Kok. Emitting was inspired by but not based on the D llvm bindings. llvm-svn: 154493	2012-04-11 10:25:24 +00:00
Chandler Carruth	7ae90d4d2d	Add two statistics to help track how we are computing the inline cost. Yea, 'NumCallerCallersAnalyzed' isn't a great name, suggestions welcome. llvm-svn: 154492	2012-04-11 10:15:10 +00:00
Nadav Rotem	9d376b6578	Reapply 154397. Original message: Fix a dagcombine optimization which assumes that the vsetcc result type is always of the same size as the compared values. This is true for SSE/AVX/NEON but not for all targets. llvm-svn: 154490	2012-04-11 08:26:11 +00:00
Evan Cheng	5efc442290	Add more fused mul+add/sub patterns. rdar://10139676 llvm-svn: 154484	2012-04-11 06:59:47 +00:00
Nadav Rotem	9bc178ac5c	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Evan Cheng	48346c1cd9	Clean up ARM fused multiply + add/sub support some more: rename some isel predicates. Also remove NEON2 since it's not really useful and it is confusing. If NEON + VFP4 implies NEON2 but NEON2 doesn't imply NEON + VFP4, what does it really mean? rdar://10139676 llvm-svn: 154480	2012-04-11 05:33:07 +00:00
Craig Topper	692d584910	Fix an overly indented line. Remove an 'else' after an 'if' that returns. llvm-svn: 154479	2012-04-11 04:55:51 +00:00
Craig Topper	bc680061e8	Inline implVisitAluOverflow by introducing a nested switch to convert the intrinsic to a nodetype. llvm-svn: 154478	2012-04-11 04:34:11 +00:00
Craig Topper	3ef01cdb2e	Optimize code a bit by calling push_back only once in some loops. Reduces compiled code size a bit. llvm-svn: 154473	2012-04-11 03:06:35 +00:00
Evan Cheng	67a09fc397	Match (fneg (fma) to vfnma. rdar://10139676 llvm-svn: 154469	2012-04-11 01:21:25 +00:00
Charles Davis	74c282b5ef	Add retw and lretw instructions. Also, fix Intel syntax parsing for all ret instructions. llvm-svn: 154468	2012-04-11 01:10:53 +00:00
Kevin Enderby	d2980cd041	Fix ARM disassembly of VLD instructions with writebacks. And add a test case for all opcodes handled by DecodeVLDInstruction() in ARMDisassembler.cpp. llvm-svn: 154459	2012-04-11 00:25:40 +00:00
Jim Grosbach	ad66de155b	ARM add missing Thumb1 two-operand aliases for shift-by-immediate. rdar://11222742 llvm-svn: 154457	2012-04-11 00:15:16 +00:00
Evan Cheng	aca6c822e6	Fix a number of problems with ARM fused multiply add/subtract instructions. 1. The new instruction itinerary entries are not properly described. 2. The asm parser can't handle vfms and vfnms. 3. There were no assembler, disassembler test cases. 4. HasNEON2 has the wrong assembler predicate. rdar://10139676 llvm-svn: 154456	2012-04-11 00:13:00 +00:00
Jakob Stoklund Olesen	645bdd4b69	Tweak MachineLICM heuristics for cheap instructions. Allow cheap instructions to be hoisted if they are register pressure neutral or better. This happens if the instruction is the last loop use of another virtual register. Only expensive instructions are allowed to increase loop register pressure. llvm-svn: 154455	2012-04-11 00:00:28 +00:00
Jakob Stoklund Olesen	a3e86a604a	Only check for PHI uses inside the current loop. Hoisting a value that is used by a PHI in the loop will introduce a copy because the live range is extended to cross the PHI. The same applies to PHIs in exit blocks. Also use this opportunity to make HasLoopPHIUse() non-recursive. llvm-svn: 154454	2012-04-11 00:00:26 +00:00
Owen Anderson	6f1ee1634d	Move the constant-folding support for FP_ROUND in SelectionDAG from the one-operand version of getNode() to the two-operand version, since it became a two-operand node at some point. Zap a testcase that this allows us to completely fold away. llvm-svn: 154447	2012-04-10 22:46:53 +00:00
Kostya Serebryany	5ba61ac651	[tsan] two more compile-time optimizations: - don't instrument reads from constant globals. Saves ~1.5% of instrumented instructions on CPU2006 (counting static instructions, not their execution). - don't instrument reads from vtable (which is a global constant too). Saves ~5%. I did not measure the run-time impact of this, but it is certainly non-negative. llvm-svn: 154444	2012-04-10 22:29:17 +00:00
Evan Cheng	d0007f3c83	Handle llvm.fma.* intrinsics. rdar://10914096 llvm-svn: 154439	2012-04-10 21:40:28 +00:00
Duncan Sands	4f53074cca	Add a comment noting that the fdiv -> fmul conversion won't generate multiplication by a denormal, and some tests checking that. llvm-svn: 154431	2012-04-10 20:35:27 +00:00
Bill Wendling	c4c568b2d9	The MDString class stored a StringRef to the string which was already in a StringMap. This was redundant and unnecessarily bloated the MDString class. Because the MDString class is a "Value" and will never have a "name", and because the Name field in the Value class is a pointer to a StringMap entry, we repurpose the Name field for an MDString. It stores the StringMap entry in the Name field, and uses the normal methods to get the string (name) back. PR12474 llvm-svn: 154429	2012-04-10 20:12:16 +00:00
Chad Rosier	f7345b027a	Whitespace. llvm-svn: 154427	2012-04-10 19:42:07 +00:00
Chad Rosier	235a7a1746	Revert r154396, which looks to be the real culprit behind the bot failures. llvm-svn: 154426	2012-04-10 19:39:18 +00:00
Eric Christopher	65ada95b84	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
Kostya Serebryany	bf2de80be6	[tsan] compile-time instrumentation: do not instrument a read if a write to the same temp follows in the same BB. Also add stats printing. On Spec CPU2006 this optimization saves roughly 4% of instrumented reads (which is 3% of all instrumented accesses): Writes : 161216 Reads : 446458 Reads-before-write: 18295 llvm-svn: 154418	2012-04-10 18:18:56 +00:00
Eric Christopher	e9abba71fe	To ensure that we have more accurate line information for a block don't elide the branch instruction if it's the only one in the block, otherwise it's ok. PR9796 and rdar://11215207 llvm-svn: 154417	2012-04-10 18:18:10 +00:00
Owen Anderson	3efc8f22bd	Revert r154397, which was causing make check failures on the buildbots. llvm-svn: 154414	2012-04-10 18:02:12 +00:00
Jim Grosbach	df5a244797	ARM fix cc_out operand handling for t2SUBrr instructions. We were incorrectly conflating some add variants which don't have a cc_out operand with the mirroring sub encodings, which do. Part of the awesome non-orthogonality legacy of thumb1. Similarly, handling of add/sub of an immediate was sometimes incorrectly removing the cc_out operand for add/sub register variants. rdar://11216577 llvm-svn: 154411	2012-04-10 17:31:55 +00:00
David Blaikie	2735136655	Remove unused variable. llvm-svn: 154398	2012-04-10 15:23:13 +00:00
Nadav Rotem	065564d85a	Fix a dagcombine optimization which assumes that the vsetcc result type is always of the same size as the compared values. This is true for SSE/AVX/NEON but not for all targets. llvm-svn: 154397	2012-04-10 14:58:31 +00:00
Nadav Rotem	f934f91709	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Chandler Carruth	68062617a6	Make a somewhat subtle change in the logic of block placement. Sometimes the loop header has a non-loop predecessor which has been pre-fused into its chain due to unanalyzable branches. In this case, rotating the header into the body of the loop in order to place a loop exit at the bottom of the loop is a Very Bad Idea as it makes the loop non-contiguous. I'm working on a good test case for this, but it's a bit annoying to craft. I should get one shortly, but I'm submitting this now so I can begin the (lengthy) performance analysis process. An initial run of LNT looks really, really good, but there is too much noise there for me to trust it much. llvm-svn: 154395	2012-04-10 13:35:57 +00:00
Anton Korobeynikov	4d1220de34	Transform div to mul with reciprocal only when fp imm is legal. This fixes PR12516 and uncovers one weird problem in legalize (worked around). llvm-svn: 154394	2012-04-10 13:22:49 +00:00
David Chisnall	bbec87205d	Use the correct section types on Solaris for unwind data on both x86 and x86-64. Patch by Dmitri Shubin! llvm-svn: 154391	2012-04-10 11:44:33 +00:00
Duncan Sands	af06b26c8e	Express the number of ULPs in fpaccuracy metadata as a real rather than a rational number, eg as 2.5 rather than 5, 2. OK'd by Peter Collingbourne. llvm-svn: 154387	2012-04-10 08:22:43 +00:00
Andrew Trick	4442bfe559	Fix 12513: Loop unrolling breaks with indirect branches. Take this opportunity to generalize the indirectbr bailout logic for loop transformations. CFG transformations will never get indirectbr right, and there's no point trying. llvm-svn: 154386	2012-04-10 05:14:42 +00:00
Andrew Trick	4104ed9c76	whitespace llvm-svn: 154385	2012-04-10 05:14:37 +00:00
Evan Cheng	136861d994	Make the code slightly more palatable. llvm-svn: 154378	2012-04-10 03:15:18 +00:00
Danil Malyshev	549515e128	Add a constructor for DataRefImpl and remove excess initialization. llvm-svn: 154371	2012-04-10 01:54:44 +00:00
Evan Cheng	f8bad08001	Fix a long standing tail call optimization bug. When a libcall is emitted, the legalizer always uses the DAG entry node. This is wrong when the libcall is emitted as a tail call since it effectively folds the return node. If the return node's input chain is not the entry (i.e. call, load, or store) use that as the tail call input chain. PR12419 rdar://9770785 rdar://11195178 llvm-svn: 154370	2012-04-10 01:51:00 +00:00
Rafael Espindola	1d9672bdce	Don't try to zExt just to check if an integer constant is zero, it might not fit in a i64. llvm-svn: 154364	2012-04-10 00:16:22 +00:00
Jim Grosbach	8f99bc3aed	ARM LDR/LDRT has the same encoding collision as STR/STRT. Generalized logic of r154141. llvm-svn: 154362	2012-04-10 00:13:07 +00:00
Akira Hatanaka	8483a6c47d	Have TargetLowering::getPICJumpTableRelocBase return a node that points to the GOT if jump table uses 64-bit gp-relative relocation. llvm-svn: 154341	2012-04-09 20:32:12 +00:00
Chad Rosier	e0e38f61a5	When performing a truncating store, it's possible to rearrange the data in-register, such that we can use a single vector store rather than a series of scalar stores. For func_4_8 the generated code vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vmov.u16 r0, d16[3] strb r0, [r2, #3] vmov.u16 r0, d16[2] strb r0, [r2, #2] vmov.u16 r0, d16[1] strb r0, [r2, #1] vmov.u16 r0, d16[0] strb r0, [r2] bx lr becomes vldr d16, LCPI0_0 vmov d17, r0, r1 vadd.i16 d16, d17, d16 vuzp.8 d16, d17 vst1.32 {d16[0]}, [r2, :32] bx lr I'm not fond of how this combine pessimizes 2012-03-13-DAGCombineBug.ll, but I couldn't think of a way to judiciously apply this combine. This ldrh r0, [r0, #4] strh r0, [r1] becomes vldr d16, [r0] vmov.u16 r0, d16[2] vmov.32 d16[0], r0 vuzp.16 d16, d17 vst1.32 {d16[0]}, [r1, :32] PR11158 rdar://10703339 llvm-svn: 154340	2012-04-09 20:32:02 +00:00
Lang Hames	3ad11ff90f	Patch r153892 for PR11861 apparently broke an external project (see PR12493). This patch restores TwoAddressInstructionPass's pre-r153892 behaviour when rescheduling instructions in TryInstructionTransform. Hopefully this will fix PR12493. To refix PR11861, lowering of INSERT_SUBREGS is deferred until after the copy that unties the operands is emitted (this seems to be a more appropriate fix for that issue anyway). llvm-svn: 154338	2012-04-09 20:17:30 +00:00
Chad Rosier	99cbde9e82	Update comments and remove unnecessary isVolatile() check. llvm-svn: 154336	2012-04-09 19:38:15 +00:00
David Blaikie	e6b6fae8ff	Fix accidentally constant conditions found by uncommitted improvements to -Wconstant-conversion. A couple of cases where we were accidentally creating constant conditions by something like "x == a || b" instead of "x == a || x == b". In one case a conditional & then unreachable was used - I transformed this into a direct assert instead. llvm-svn: 154324	2012-04-09 16:29:35 +00:00
Rafael Espindola	8f62b3248e	Pattern match a setcc of boolean value with 0 as a truncate. llvm-svn: 154322	2012-04-09 16:06:03 +00:00
Preston Gurd	2eec367227	This patch adds X86 instruction itineraries, which were missed by the original patch to add itineraries, to X86InstrArithmetic.td. llvm-svn: 154320	2012-04-09 15:32:22 +00:00
Nadav Rotem	fb7e2ae53c	Lower some x86 shuffle sequences to the vblend family of instructions. llvm-svn: 154313	2012-04-09 08:33:21 +00:00
Nadav Rotem	b801ca3976	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. llvm-svn: 154310	2012-04-09 07:45:58 +00:00
Craig Topper	9c3da316ec	Remove unnecessary type check when combining and/or/xor of swizzles. Move some checks to allow better early out. llvm-svn: 154309	2012-04-09 07:19:09 +00:00
Craig Topper	e5893f64e8	Remove unnecessary 'else' on an 'if' that always returns llvm-svn: 154308	2012-04-09 05:59:53 +00:00
Craig Topper	e3ad4834ae	Optimize code slightly. No functionality change. llvm-svn: 154307	2012-04-09 05:55:33 +00:00
Craig Topper	5894fe430a	Replace some explicit checks with asserts for conditions that should never happen. llvm-svn: 154305	2012-04-09 05:16:56 +00:00
Chandler Carruth	3779ac10b4	Cleanup and relax a restriction on the matching of global offsets into x86 addressing modes. This allows PIE-based TLS offsets to fit directly into an addressing mode immediate offset, which is the last remaining code quality issue from PR12380. With this patch, that PR is completely fixed. To understand why this patch is correct to match these offsets into addressing mode immediates, break it down by cases: 1) 32-bit is trivially correct, and unmodified here. 2) 64-bit non-small mode is unchanged and never matches. 3) 64-bit small PIC code which is RIP-relative is handled specially in the match to try to fit RIP into the base register. If it fails, it now early exits. This behavior is unchanged by the patch. 4) 64-bit small non-PIC code which is not RIP-relative continues to work as it did before. The reason these immediates are safe is because the ABI ensures they fit in small mode. This behavior is unchanged. 5) 64-bit small PIC code which is not using RIP-relative addressing. This is the only case changed by the patch, and the primary place you see it is in TLS, either the win64 section offset TLS or Linux local-exec TLS model in a PIC compilation. Here the ABI again ensures that the immediates fit because we are in small mode, and any other operations required due to the PIC relocation model have been handled externally to the Wrapper node (extra loads etc are made around the wrapper node in ISelLowering). I've tested this as much as I can comparing it with GCC's output, and everything appears safe. I discussed this with Anton and it made sense to him at least at face value. That said, if there are issues with PIC code after this patch, yell and we can revert it. llvm-svn: 154304	2012-04-09 02:13:06 +00:00
Craig Topper	6148fe65e8	Optimize code a bit. No functional change intended. llvm-svn: 154299	2012-04-08 23:15:04 +00:00
Benjamin Kramer	bb6ff08766	Silence sign-compare warning. llvm-svn: 154297	2012-04-08 19:04:45 +00:00
Duncan Sands	2f1dc3814b	Only have codegen turn fdiv by a constant into fmul by the reciprocal when -ffast-math, i.e. don't just always do it if the reciprocal can be formed exactly. There is already an IR level transform that does that, and it does it more carefully. llvm-svn: 154296	2012-04-08 18:08:12 +00:00
Craig Topper	c8e2d91a58	Simplify code that tries to do vector extracts for shuffles when the mask width and the input vector widths don't match. No need to check the min and max are in range before calculating the start index. The range check after having the start index is sufficient. Also no need to check for an extract from the beginning differently. llvm-svn: 154295	2012-04-08 17:53:33 +00:00
Chandler Carruth	ede4a8aa2b	Teach LLVM about a PIE option which, when enabled on top of PIC, makes optimizations which are valid for position independent code being linked into a single executable, but not for such code being linked into a shared library. I discussed the design of this with Eric Christopher, and the decision was to support an optional bit rather than a completely separate relocation model. Fundamentally, this is still PIC relocation, it's just that certain optimizations are only valid under a PIC relocation model when the resulting code won't be in a shared library. The simplest path to here is to expose a single bit option in the TargetOptions. If folks have different/better designs, I'm all ears. =] I've included the first optimization based upon this: changing TLS models to the *Exec models when PIE is enabled. This is the LLVM component of PR12380 and is all of the hard work. llvm-svn: 154294	2012-04-08 17:51:45 +00:00
Chandler Carruth	16f0ebcbb5	Move the TLSModel information into the TargetMachine rather than hiding in TargetLowering. There was already a FIXME about this location being odd. The interface is simplified as a consequence. This will also make it easier to change TLS models when compiling with PIE. llvm-svn: 154292	2012-04-08 17:20:55 +00:00
Benjamin Kramer	25a3d816a6	EngineBuilder::create is expected to take ownership of the TargetMachine passed to it. Delete it on error or when we create an interpreter that doesn't need it. llvm-svn: 154288	2012-04-08 14:53:14 +00:00
Chandler Carruth	bed1abf9ca	Remove an overzealous assert. The assert was trying to catch places where a chain outside of the loop block-set ended up in the worklist for scheduling as part of the contiguous loop. However, asserting the first block in the chain is in the loop-set isn't a valid check -- we may be forced to drag a chain into the worklist due to one block in the chain being part of the loop even though the first block is not in the loop. This occurs when we have been forced to form a chain early due to un-analyzable branches. No test case here as I have no idea how to even begin reducing one, and it will be hopelessly fragile. We have to somehow end up with a loop header of an inner loop which is a successor of a basic block with an unanalyzable pair of branch instructions. Ow. Self-host triggers it so it is unlikely it will regress. This at least gets block placement back to passing selfhost and the test suite. There are still a lot of slowdowns that I don't like coming out of block placement, although there are now also a lot of speedups. =[ I'm seeing swings in both directions up to 10%. I'm going to try to find time to dig into this and see if we can turn this on for 3.1 as it does a really good job of cleaning up after some loops that degraded with the inliner changes. llvm-svn: 154287	2012-04-08 14:37:02 +00:00
Chandler Carruth	49158908dc	Add a debug-only 'dump' method to the BlockChain structure to ease debugging. llvm-svn: 154286	2012-04-08 14:37:01 +00:00
Chandler Carruth	f82b0e2d29	Teach InstCombine to nuke a common alloca pattern -- an alloca which has GEPs, bit casts, and stores reaching it but no other instructions. These often show up during the iterative processing of the inliner, SROA, and DCE. Once we hit this point, we can completely remove the alloca. These were actually showing up in the final, fully optimized code in a bunch of inliner tests I've been working on, and notably they show up after LLVM finishes optimizing away all function calls involved in hash_combine(a, b). llvm-svn: 154285	2012-04-08 14:36:56 +00:00
Nadav Rotem	82609df647	AVX2: Build splat vectors by broadcasting a scalar from the constant pool. Previously we used three instructions to broadcast an immediate value into a vector register. On Sandybridge we continue to load the broadcasted value from the constant pool. llvm-svn: 154284	2012-04-08 12:54:54 +00:00
Bill Wendling	5c0068f807	Remove the 'Parent' pointer from the MDNodeOperand class. An MDNode has a list of MDNodeOperands allocated directly after it as part of its allocation. Therefore, the Parent of the MDNodeOperands can be found by walking back through the operands to the beginning of that list. Mark the first operand's value pointer as being the 'first' operand so that we know where the beginning of said list is. This saves a lot of space during LTO with -O0 -g flags. llvm-svn: 154280	2012-04-08 10:20:49 +00:00
Bill Wendling	9b2503a006	Allow subclasses of the ValueHandleBase to store information as part of the value pointer by making the value pointer into a pointer-int pair with 2 bits available for flags. llvm-svn: 154279	2012-04-08 10:16:43 +00:00
Craig Topper	d024cef233	Turn avx2 vinserti128 intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove patterns for selecting the intrinsic. Similar was already done for avx1. llvm-svn: 154272	2012-04-07 22:32:29 +00:00
Craig Topper	aa9aab5ad2	Move vinsertf128 patterns near the instruction definitions. Add AddedComplexity to AVX2 vextracti128 patterns to give them priority over the integer versions of vextractf128 patterns. llvm-svn: 154268	2012-04-07 21:57:43 +00:00
Craig Topper	e09d1c5c48	Remove 'else' after 'if' that ends in return. llvm-svn: 154267	2012-04-07 21:23:41 +00:00
Nadav Rotem	71d07ae5cb	1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new shuffle node because it could introduce new shuffle nodes that were not supported efficiently by the target. 2. Add a more restrictive shuffle-of-shuffle optimization for cases where the second shuffle reverses the transformation of the first shuffle. llvm-svn: 154266	2012-04-07 21:19:08 +00:00
Duncan Sands	5f8397a934	Convert floating point division by a constant into multiplication by the reciprocal if converting to the reciprocal is exact. Do it even if inexact if -ffast-math. This substantially speeds up ac.f90 from the polyhedron benchmarks. llvm-svn: 154265	2012-04-07 20:04:00 +00:00
Chandler Carruth	28192c9398	Fix ValueTracking to conclude that debug intrinsics are safe to speculate. Without this, loop rotate (among many other places) would suddenly stop working in the presence of debug info. I found this looking at loop rotate, and have augmented its tests with a reduction out of a very hot loop in yacr2 where failing to do this rotation costs sometimes more than 10% in runtime performance, perturbing numerous downstream optimizations. This should have no impact on performance without debug info, but the change in performance when debug info is enabled can be extreme. As a consequence (and this is how I got to this yak) any profiling of performance problems should be treated with deep suspicion -- they may have been wildly inaccurate if debug info was enabled for profiling. =/ Just a heads up. llvm-svn: 154263	2012-04-07 19:22:18 +00:00
Benjamin Kramer	e1f4ca1b0f	SCEV: When expanding a GEP the final addition to the base pointer has NUW but not NSW. Found by inspection. llvm-svn: 154262	2012-04-07 17:19:26 +00:00
Bob Wilson	6f9be7e2c6	Fix Thumb __builtin_longjmp with integrated assembler. <rdar://problem/11203543> The tLDRr instruction with the last register operand set to the zero register prints in assembly as if no register was specified, and the assembler encodes it as a tLDRi instruction with a zero immediate. With the integrated assembler, that zero register gets emitted as "r0", so we get "ldr rx, [ry, r0]" which is broken. Emit the instruction as tLDRi with a zero immediate. I don't know if there's a good way to write a testcase for this. Suggestions welcome. Opportunities for follow-up work: 1) The asm printer should complain if a non-optional register operand is set to the zero register, instead of silently dropping it. 2) The integrated assembler should complain in the same situation, instead of silently emitting the operand as "r0". llvm-svn: 154261	2012-04-07 16:51:59 +00:00
Hongbin Zheng	5758f495da	Refactor: Use positive field names in VectorizeConfig. llvm-svn: 154249	2012-04-07 03:56:23 +00:00
NAKAMURA Takumi	b95f64134e	Target/X86/MCTargetDesc/X86MCAsmInfo.cpp: Enable DwarfCFI (aka DW2) on Cygming. Cygwin-1.7 supports dw2. Some recent mingw distros support one, too. I have confirmed test-suite/SingleSource/Benchmarks/Shootout-C++/except.cpp can pass on Cygwin. llvm-svn: 154247	2012-04-07 02:24:20 +00:00
Alexis Hunt	0235f684f0	Output UTF-8-encoded characters as identifier characters into assembly by default. This is a behaviour configurable in the MCAsmInfo. I've decided to turn it on by default in (possibly optimistic) hopes that most assemblers are reasonably sane. If this proves a problem, switching to default seems reasonable. I'm not sure if this is the opportune place to test, but it seemed good to make sure it was tested somewhere. llvm-svn: 154235	2012-04-07 00:37:53 +00:00
Jim Grosbach	0c509fa6bf	Tidy up. 80 columns. llvm-svn: 154226	2012-04-06 23:43:50 +00:00
Jakob Stoklund Olesen	baa3566091	ARMPat is equivalent to Requires<[IsARM]>. llvm-svn: 154210	2012-04-06 21:21:59 +00:00
Jakob Stoklund Olesen	b4bd3880ba	Eliminate iOS-specific tail call instructions. After register masks were introduced to represent the call clobbers, it is no longer necessary to have duplicate instructions for iOS. llvm-svn: 154209	2012-04-06 21:17:42 +00:00
Chandler Carruth	8a102c21e3	There is no portable std::abs overload for int64_t, use the llvm::abs64 which exists for this purpose. llvm-svn: 154199	2012-04-06 20:10:52 +00:00
Sean Callanan	e804b5b762	Fixed two leaks in the MC disassembler. The MC disassembler requires a MCSubtargetInfo and a MCInstrInfo to exist in order to initialize the instruction printer and disassembler; however, although the printer and disassembler keep references to these objects they do not own them. Previously, the MCSubtargetInfo and MCInstrInfo objects were just leaked. I have extended LLVMDisasmContext to own these objects and delete them when it is destroyed. llvm-svn: 154192	2012-04-06 18:21:09 +00:00
Jakob Stoklund Olesen	967b86a0a2	Allow negative immediates in ARM and Thumb2 compares. ARM and Thumb2 mode can use cmn instructions to compare against negative immediates. Thumb1 mode can't. llvm-svn: 154183	2012-04-06 17:45:04 +00:00
David Chisnall	c1c9cdab23	Reintroduce InlineCostAnalyzer::getInlineCost() variant with explicit callee parameter until we have a more sensible API for doing the same thing. Reviewed by Chandler. llvm-svn: 154180	2012-04-06 17:27:41 +00:00
Chandler Carruth	49da93396e	Sink the collection of return instructions until after all simplification has been performed. This is a bit less efficient (requires another ilist walk of the basic blocks) but shouldn't matter in practice. More importantly, it's just too much work to keep track of all the various ways the return instructions can be mutated while simplifying them. This fixes yet another crasher, reported by Daniel Dunbar. llvm-svn: 154179	2012-04-06 17:21:31 +00:00
Duncan Sands	d12b18f820	Make GVN's propagateEquality non-recursive. No intended functionality change. The modifications are a lot more trivial than they appear to be in the diff! llvm-svn: 154174	2012-04-06 15:31:09 +00:00
Benjamin Kramer	3cacabfb04	Fix narrowing conversion. llvm-svn: 154171	2012-04-06 13:33:52 +00:00
Craig Topper	447417c932	Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413. llvm-svn: 154166	2012-04-06 07:45:23 +00:00
Chandler Carruth	e41f6f4189	Sink the return instruction collection until after we're done deleting dead code, including dead return instructions in some cases. Otherwise, we end up having a bogus pointer to a return instruction that blows up much further down the road. It turns out that this pattern is both simpler to code, easier to update in the face of enhancements to the inliner cleanup, and likely cheaper given that it won't add dead instructions to the list. Thanks to John Regehr's numerous test cases for teasing this out. llvm-svn: 154157	2012-04-06 01:11:52 +00:00
Jakob Stoklund Olesen	6a2e99a46a	Deduplicate ARM call-related instructions. We had special instructions for iOS because r9 is call-clobbered, but that is represented dynamically by the register mask operands now, so there is no need for the pseudo-instructions. llvm-svn: 154144	2012-04-06 00:04:58 +00:00
Jim Grosbach	d6a1a1dc2f	ARM: Don't form a t2LDRi8 or t2STRi8 with an offset of zero. The load/store optimizer splits LDRD/STRD into two instructions when the register pairing doesn't work out. For negative offsets in Thumb2, it uses t2STRi8 to do that. That's fine, except for the case when the offset is in the range [-4,-1]. In that case, we'll also form a second t2STRi8 with the original offset plus 4, resulting in a t2STRi8 with a non-negative offset, which ends up as if it were an STRT, which is completely bogus. Similarly for loads. No testcase, unfortunately, as any I've been able to construct is both large and extremely fragile. rdar://11193937 llvm-svn: 154141	2012-04-05 23:51:24 +00:00
Jim Grosbach	930f2f66e7	ARM assembly aliases for add negative immediates using sub. 'add r2, #-1024' should just use 'sub r2, #1024' rather than erroring out. Thumb1 aliases for adding a negative immediate to the stack pointer, also. rdar://11192734 llvm-svn: 154123	2012-04-05 20:57:13 +00:00
Eric Christopher	aec8a82694	Patch to set is_stmt a little better for prologue lines in a function. This enables debuggers to see what are interesting lines for a breakpoint rather than any line that starts a function. rdar://9852092 llvm-svn: 154120	2012-04-05 20:39:05 +00:00
Jakob Stoklund Olesen	37492eac8c	Don't break the IV update in TLI::SimplifySetCC(). LSR always tries to make the ICmp in the loop latch use the incremented induction variable. This allows the induction variable to be kept in a single register. When the induction variable limit is equal to the stride, SimplifySetCC() would break LSR's hard work by transforming: (icmp (add iv, stride), stride) --> (cmp iv, 0). This forced us to use lea for the IV update, preventing the simpler incl+cmp. <rdar://problem/7643606> <rdar://problem/11184260> llvm-svn: 154119	2012-04-05 20:30:20 +00:00
Dan Gohman	cc64bbca81	Fix accidentally inverted logic from r152803, and make the testcase slightly less trivial. This fixes rdar://11171718. llvm-svn: 154118	2012-04-05 20:27:21 +00:00
Owen Anderson	a6eebf6013	Treat f16 the same as f80/f128 for the purposes of generating constants during instruction selection. llvm-svn: 154113	2012-04-05 18:50:32 +00:00
Silviu Baranga	af3c79f0ac	Added support for unpredictable ADC/SBC instructions on ARM, and also fixed some corner cases involving the PC register as an operand for these instructions. llvm-svn: 154101	2012-04-05 16:19:29 +00:00
Silviu Baranga	d365397daa	Added support for handling unpredictable arithmetic instructions on ARM. llvm-svn: 154100	2012-04-05 16:13:15 +00:00
Hongbin Zheng	31d33b8318	BBVectorize: Add the const modifier to the VectorizeConfig because we won't modify it. llvm-svn: 154098	2012-04-05 16:07:49 +00:00
Hongbin Zheng	d6825173d3	Introduce the VectorizeConfig class, with which we can control the behavior of the BBVectorizePass without using command-line options. As pointed out by Hal, we can ask the TargetLoweringInfo for the architecture-specific VectorizeConfig to perform vectorizing with architecture-specific information. llvm-svn: 154096	2012-04-05 15:46:55 +00:00
Hongbin Zheng	6edbc39bd7	Add the function "vectorizeBasicBlock", which allows users to vectorize a BasicBlock in other passes, e.g. we can call vectorizeBasicBlock in the loop unroll pass right after the loop is unrolled. llvm-svn: 154089	2012-04-05 08:05:16 +00:00
Jim Grosbach	15c6884a4b	ARM assembly aliases for two-operand V[R]SHR instructions. rdar://11189467 llvm-svn: 154087	2012-04-05 07:23:53 +00:00
Argyrios Kyrtzidis	ef909265e8	In MemoryBuffer::getOpenFile() make sure that the buffer is null-terminated if the caller requested a null-terminated one. When mapping the file there could be a race that resulted in the file being larger than the FileSize passed by the caller. We already have an assertion for this in MemoryBuffer::init(), but we need a runtime guarantee that the buffer will be null-terminated, so do a copy that adds a null-terminator. Protects against the crash in rdar://11161822. llvm-svn: 154082	2012-04-05 04:23:56 +00:00
Jim Grosbach	3d00eecc53	ARM assembly parsing for 'msr' plain 'cpsr' operand. Plain 'cpsr' is an alias for 'cpsr_fc'. rdar://11153753 llvm-svn: 154080	2012-04-05 03:17:53 +00:00
Jakob Stoklund Olesen	f2390e8303	Pass the right sign to TLI->isLegalICmpImmediate. LSR can fold three addressing modes into its ICmpZero node: ICmpZero BaseReg + Offset => ICmp BaseReg, -Offset; ICmpZero -1*ScaleReg + Offset => ICmp ScaleReg, Offset; ICmpZero BaseReg + -1*ScaleReg => ICmp BaseReg, ScaleReg. The first two cases are only used if TLI->isLegalICmpImmediate() likes the offset. Make sure the right Offset sign is passed to this method in the second case. The ARM version is not symmetric. <rdar://problem/11184260> llvm-svn: 154079	2012-04-05 03:10:56 +00:00
Akira Hatanaka	121342fcc2	Reapply 154038 without the failing test. llvm-svn: 154062	2012-04-04 22:16:36 +00:00
Owen Anderson	4743c6e159	Revert r154038. It was causing make check failures. llvm-svn: 154054	2012-04-04 21:18:58 +00:00
Pete Cooper	d7290700e6	REG_SEQUENCE expansion to COPY instructions wasn't taking account of sub-register indices on the source registers. No simple test case. llvm-svn: 154051	2012-04-04 21:03:25 +00:00
Benjamin Kramer	379018b2da	Fix a C++11 UDL conflict. Still not fixed in the standard ;) llvm-svn: 154044	2012-04-04 20:33:56 +00:00
Pete Cooper	8a3dc0ed8c	f16 FREM can now be legalized by promoting to f32 llvm-svn: 154039	2012-04-04 19:36:31 +00:00
Akira Hatanaka	9705c865d9	Fix LowerGlobalAddress to produce instructions with the correct relocation types for N32 ABI. Add new test case and update existing ones. llvm-svn: 154038	2012-04-04 19:02:38 +00:00
Akira Hatanaka	591ecdd7c1	Fix LowerJumpTable to produce instructions with the correct relocation types for N32 ABI. Test case will be updated after the patch that fixes TargetLowering::getPICJumpTableRelocBase is checked in. llvm-svn: 154036	2012-04-04 18:31:32 +00:00
Akira Hatanaka	b3a2b8c199	Fix LowerConstantPool to produce instructions with the correct relocation types for N32 ABI and update test case. llvm-svn: 154034	2012-04-04 18:26:12 +00:00
Jakob Stoklund Olesen	0a5b72f0e4	Implement ARMBaseInstrInfo::commuteInstruction() for MOVCCr. A MOVCCr instruction can be commuted by inverting the condition. This can help reduce register pressure and remove unnecessary copies in some cases. <rdar://problem/11182914> llvm-svn: 154033	2012-04-04 18:23:42 +00:00
Jakob Stoklund Olesen	92fd79a639	Remove spurious debug output. llvm-svn: 154032	2012-04-04 18:23:38 +00:00
Akira Hatanaka	aeff24e424	Fix LowerBlockAddress to produce instructions with the correct relocation types for N32 ABI and update test case. llvm-svn: 154031	2012-04-04 18:22:53 +00:00
Rafael Espindola	ba0a6cabb8	Always compute all the bits in ComputeMaskedBits. This allows us to keep passing reduced masks to SimplifyDemandedBits, but know about all the bits if SimplifyDemandedBits fails. This allows instcombine to simplify cases like the one in the included testcase. llvm-svn: 154011	2012-04-04 12:51:34 +00:00
Hongbin Zheng	b21b865fe8	LoopUnrollPass: Use variable "Threshold" instead of "CurrentThreshold" when reducing unroll count, otherwise the reduced unroll count is not taking the "OptimizeForSize" attribute into account. llvm-svn: 154007	2012-04-04 11:44:08 +00:00
Benjamin Kramer	a1355d17ca	Move yaml::Stream's dtor out of line so it can see Scanner's dtor. llvm-svn: 154004	2012-04-04 08:53:34 +00:00
Craig Topper	4c7d995029	Remove default case from switch that was already covering all cases. llvm-svn: 153996	2012-04-04 04:42:42 +00:00
Pete Cooper	e7bff68a5e	Removed useless default case from a switch that was already covering all the enum values. llvm-svn: 153984	2012-04-04 00:53:04 +00:00
Michael J. Spencer	afc0d6a36f	Sorry about that. MSVC seems to accept just about any random string you give it ;/ llvm-svn: 153979	2012-04-03 23:36:44 +00:00
Michael J. Spencer	22120c47a7	Add YAML parser to Support. llvm-svn: 153977	2012-04-03 23:09:22 +00:00
Pete Cooper	9511ec86f9	Add VSELECT to LegalizeVectorTypes::ScalariseVectorResult. Previously it would crash if it encountered a 1-element VSELECT. Solution is slightly more complicated than just creating a SELECT as we have to mask or sign extend the vector condition if it had different boolean contents from the scalar condition. Fixes <rdar://problem/11178095> llvm-svn: 153976	2012-04-03 22:57:55 +00:00
Pete Cooper	b98934cf72	Removed one last bad continue statement meant to be removed in r153914. llvm-svn: 153975	2012-04-03 22:18:49 +00:00
Chad Rosier	2a02fe1bb2	Fix an issue in SimplifySetCC() specific to vector comparisons. When folding X == X we need to check getBooleanContents() to determine if the result is a vector of ones or a vector of negative ones. I tried creating a test case, but the problem seems to only be exposed on a much older version of clang (around r144500). rdar://10923049 llvm-svn: 153966	2012-04-03 20:11:24 +00:00
Eric Christopher	b81e2b403c	Fix thinko check for number of operands to be the one that actually might have more than 19 operands. Add a testcase to make sure I never screw that up again. Part of rdar://11026482 llvm-svn: 153961	2012-04-03 17:55:42 +00:00
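
The offset arithmetic described in commit d6a1a1dc2f (splitting an LDRD/STRD into two word accesses) can be sketched roughly as follows. This is only an illustration, not LLVM code: `splitDoublewordOffsets`, `fitsNegativeImm8`, and `bothHalvesFitNegativeImm8` are hypothetical names, and the [-255, -1] range for the negative-immediate form is an assumption made for this sketch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper (not part of LLVM): byte offsets of the two word-sized
// accesses that replace one doubleword access at `offset`.
inline std::pair<int, int> splitDoublewordOffsets(int offset) {
  return std::make_pair(offset, offset + 4);
}

// Assumption for this sketch: the negative-immediate form is only usable
// for offsets in [-255, -1]; zero and positive offsets need another form.
inline bool fitsNegativeImm8(int offset) {
  return offset >= -255 && offset <= -1;
}

// True only when both halves of the split can use the negative-immediate
// form. For an original offset in [-4, -1], the second half lands in
// [0, 3], so this returns false.
inline bool bothHalvesFitNegativeImm8(int offset) {
  const std::pair<int, int> halves = splitDoublewordOffsets(offset);
  return fitsNegativeImm8(halves.first) && fitsNegativeImm8(halves.second);
}
```

Under these assumptions, an original offset of -8 splits into -8 and -4, both negative, while an original offset of -4 splits into -4 and 0, which is the problematic case the commit message describes.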


53972 Commits