llvm-project

Commit Graph

Author	SHA1	Message	Date
Juergen Ributzka	75b2f34069	[FastISel][AArch64] Teach the address computation to also fold sub instructions. Tiny enhancement to the address computation code to also fold sub instructions if the rhs is constant and can be folded into the offset. llvm-svn: 219186	2014-10-07 03:40:03 +00:00
Juergen Ributzka	42bf665f2b	[FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction." This commit fixes an issue with sign-/zero-extending loads that was discovered by Richard Barton. We use now the correct load instructions for sign-extending loads to 64bit. Also updated and added more unit tests. llvm-svn: 219185	2014-10-07 03:39:59 +00:00
Gerolf Hoflehner	c0b4c20e5e	[InstCombine] re-commit r218721 icmp-select-icmp optimization Takes care of the assert that caused build fails. Rather than asserting the code checks now that the definition and use are in the same block, and does not attempt to optimize when that is not the case. llvm-svn: 219175	2014-10-07 00:16:12 +00:00
NAKAMURA Takumi	c62436c60a	ARMInstPrinter.cpp: Suppress a warning for -Asserts. [-Wunused-variable] llvm-svn: 219172	2014-10-06 23:48:04 +00:00
David Majnemer	121a174f52	Support: Add a utility to remap std{in,out,err} to /dev/null if closed It's possible to start a program with one (or all) of the standard file descriptors closed. Subsequent open system calls will give the program a low-numbered file descriptor. This is problematic because we may believe we are writing to standard out instead of a file. Introduce Process::FixupStandardFileDescriptors, a helper function to remap standard file descriptors to /dev/null if they were closed before the program started. llvm-svn: 219170	2014-10-06 23:16:18 +00:00
David Blaikie	e44ee92a3f	range-for some loops in DAE llvm-svn: 219167	2014-10-06 22:59:29 +00:00
Duncan P. N. Exon Smith	e5d7d9797b	LoopUnroll: Change code order of changes to new basic blocks Add new basic blocks to `LoopInfo` earlier. No functionality change intended (simplifies upcoming bugfix patch). llvm-svn: 219150	2014-10-06 22:05:02 +00:00
Duncan P. N. Exon Smith	0bbf5418c6	Sink comment, NFC llvm-svn: 219149	2014-10-06 22:04:59 +00:00
Hal Finkel	9808595319	[DAGCombine] Remove SIGN_EXTEND-related inf-loop The patch's author points out that, despite the function's documentation, getSetCCResultType is only used to get the SETCC result type (with one here-removed problematic exception). In one case, getSetCCResultType was being used to get the predicate type to use for a SELECT node, and then SIGN_EXTENDing (or truncating) to get the input predicate to match that type. Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the extension/truncation seems unnecessary here: SELECT is defined as: Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1 then the high bits must conform to getBooleanContents. So here we remove this use of getSetCCResultType and update getSetCCResultType's documentation to reflect its actual uses. Patch by deadal nix! llvm-svn: 219141	2014-10-06 20:19:47 +00:00
Sanjay Patel	7bc9185ab5	Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y) The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c: float distance = sqrt(dx * dx + dy * dy + dz * dz); float mag = dt / (distance * distance * distance); Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces: addis 3, 2, .LCPI4_2@toc@ha lfs 4, .LCPI4_2@toc@l(3) addis 3, 2, .LCPI4_1@toc@ha lfs 0, .LCPI4_1@toc@l(3) fcmpu 0, 1, 4 beq 0, .LBB4_2 # BB#1: frsqrtes 4, 1 addis 3, 2, .LCPI4_0@toc@ha lfs 5, .LCPI4_0@toc@l(3) fnmsubs 13, 1, 5, 1 fmuls 6, 4, 4 fmadds 1, 13, 6, 5 fmuls 1, 4, 1 fres 4, 1 <--- reciprocal of reciprocal square root fnmsubs 1, 1, 4, 0 fmadds 4, 4, 1, 4 .LBB4_2: fmuls 1, 4, 2 fres 2, 1 fnmsubs 0, 1, 2, 0 fmadds 0, 2, 0, 2 fmuls 1, 3, 0 blr After the patch, this simplifies to: frsqrtes 0, 1 addis 3, 2, .LCPI4_1@toc@ha fres 5, 2 lfs 4, .LCPI4_1@toc@l(3) addis 3, 2, .LCPI4_0@toc@ha lfs 7, .LCPI4_0@toc@l(3) fnmsubs 13, 1, 4, 1 fmuls 6, 0, 0 fnmsubs 2, 2, 5, 7 fmadds 1, 13, 6, 4 fmadds 2, 5, 2, 5 fmuls 0, 0, 1 fmuls 0, 0, 2 fmuls 1, 3, 0 blr Differential Revision: http://reviews.llvm.org/D5628 llvm-svn: 219139	2014-10-06 19:31:18 +00:00
Hal Finkel	43ce71f1b1	[BasicAA] Revert "Revert r218714 - Make better use of zext and sign information." This reverts r218944, which reverted r218714, plus a bug fix. Description of the bug in r218714 (by Nick) The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0) - ignoring that %idxprom is scaled by -1 here led the patch to incorrectly conclude that %a > %b. Revised patch by Nick White, thanks! Thanks to Lang for isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls. Original commit message: Two related things: 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones. 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias. Patch by Nick White! llvm-svn: 219135	2014-10-06 18:37:59 +00:00
Duncan P. N. Exon Smith	a7a90a2f19	BFI: Improve assertion message, since it's actually firing This assertion is firing because -loop-unroll is failing to preserve -loop-info (see PR20987). Improve it. llvm-svn: 219130	2014-10-06 17:42:00 +00:00
Tim Northover	ea964f53c3	ARM: silence unused variable warning llvm-svn: 219128	2014-10-06 17:26:36 +00:00
Tim Northover	8997fedfc6	ARM: remove dead InstPrinting code This instruction form is handled by different AsmOperands now, so the code is completely dead (and wrong anyway). llvm-svn: 219127	2014-10-06 17:10:13 +00:00
Hans Wennborg	1b1a399489	MachObjectWriter: optimize the string table for common suffices This is a follow-up to r207670 (ELF) and r218636 (COFF). Differential Revision: http://reviews.llvm.org/D5622 llvm-svn: 219126	2014-10-06 17:05:19 +00:00
Benjamin Kramer	6bf8af5de9	DbgValueHistoryCalculator: Store modified registers in a BitVector instead of std::set. And iterate over the smaller map instead of the larger set first. Reduces the time spent in calculateDbgValueHistory by 30-40%. llvm-svn: 219123	2014-10-06 15:31:04 +00:00
Hal Finkel	8eae3ad2ff	[CFL-AA] Update for handling of globals and more tests We used to return PartialAlias if either variable being queried interacted with arguments or globals. AFAICT, we can change this to only returning MayAlias iff both variables being queried interacted with arguments or globals. Also, adding some basic functionality tests: some basic IPA tests, checking that we give conservative responses with arguments/globals thrown in the mix, and ensuring that we trace values through stores and loads. Note that saying that 'x' interacted with arguments or globals means that the Attributes of the StratifiedSet that 'x' belongs to has any bits set. Patch by George Burgess IV, thanks! llvm-svn: 219122	2014-10-06 14:42:56 +00:00
Yaron Keren	c8514a3421	Make the MD5 result name consistent between functions, header and source. llvm-svn: 219121	2014-10-06 13:48:07 +00:00
Rafael Espindola	11527a1d71	Note that a gold bug has been fixed. We should be able to stop working around it at some point in the future. llvm-svn: 219115	2014-10-06 12:33:27 +00:00
Benjamin Kramer	4ba642a2f7	X86: Drop the isConvertibleTo3Addr bit from shufps/shufpd now that we don't convert them anymore. llvm-svn: 219112	2014-10-06 09:56:40 +00:00
Eric Christopher	47e079d45e	Refactor RelocVisitor to take an object. This removes some string comparisons and makes it a bit easier to check individual targets. Patch by Charlie Turner. llvm-svn: 219108	2014-10-06 06:55:55 +00:00
Eric Christopher	3faf2f1e02	Add subtarget caches to aarch64, arm, ppc, and x86. These will make it easier to test further changes to the code generation and optimization pipelines as those are moved to subtargets initialized with target feature and target cpu. llvm-svn: 219106	2014-10-06 06:45:36 +00:00
Yaron Keren	28a3fc6c3e	Resolve ambiguity between llvm::make_unique and std::make_unique. Introduced in r219098. llvm-svn: 219105	2014-10-06 06:39:57 +00:00
David Blaikie	febfafd13a	DebugInfo: Sink constructImportedEntityDIE down into DwarfUnit from DwarfDebug. It was just calling a bunch of DwarfUnit functions anyway, as can be seen by the simplification of removing "TheCU" from all the function calls in the implementation. llvm-svn: 219103	2014-10-06 05:37:24 +00:00
Frederic Riss	d1cfc3c791	[dwarfdump] Print the name for referenced specification of abstract_origin DIEs. Reviewers: dblaikie, samsonov, echristo, aprantl Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5466 llvm-svn: 219099	2014-10-06 03:36:31 +00:00
Frederic Riss	6005dbd62e	Factor the Unit section parsing into the DWARFUnitSection class. Summary: No functional change. Reviewers: dblaikie, samsonov Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5522 llvm-svn: 219098	2014-10-06 03:36:18 +00:00
Chandler Carruth	6d2472daca	[PM] Remove an unused and rather expensive mapping from an analysis group's interface to all of the implementations of that analysis group. The groups themselves can and do manage this anyways, the pass registry needn't involve itself. llvm-svn: 219097	2014-10-06 00:30:59 +00:00
Chandler Carruth	9cf0b8f0eb	[PM] Remove the (deeply misguided) 'unregister' functionality from the pass registry. This style of registry is somewhat questionable, but it being non-monotonic is crazy. No one is (or should be) unloading DSOs with passes and unregistering them here. I've checked with a few folks and I don't know of anyone using this functionality or any important use case where it is necessary. llvm-svn: 219096	2014-10-06 00:13:25 +00:00
Chandler Carruth	484bc69aec	[cleanup] Switch to using range-based for loops in two very obvious places. llvm-svn: 219095	2014-10-06 00:06:48 +00:00
Chandler Carruth	c34cfb9c0a	[cleanup] Fix up trailing whitespace and formatting in the pass registry code prior to hacking on it more significantly. llvm-svn: 219094	2014-10-05 23:59:03 +00:00
Owen Anderson	8373d338f6	Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions. Particularly, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical. Patch by Dmitri Shtilman. llvm-svn: 219092	2014-10-05 23:41:26 +00:00
Chandler Carruth	0927da4583	[x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd. This trades a (register-renamer-friendly) movaps for a floating point / integer domain cross. That is a very bad trade, even on architectures where domain crossing is relatively fast. On any chip where there is even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely to cause a spill to be introduced because the reason for the copy is to destructively shuffle in place. Thanks to Ben Kramer for fixing a bug in this code that my new shuffle lowering exposed and highlighting that perhaps it should just go away. =] llvm-svn: 219090	2014-10-05 22:57:31 +00:00
Chandler Carruth	daa1ff985c	[x86, dag] Teach the DAG combiner to prune inputs to a vector_shuffle that are unused. This allows the combiner to delete math feeding shuffles where the math isn't actually necessary. This improves some of the vperm2x128 tests that regressed when the vector shuffle lowering started actually generating vperm instructions rather than forcibly decomposing them. Sadly, this isn't enough to get this really right because we still form a completely unnecessary permutation. To fix that, we also need to fold shuffles which just rearrange concatenated or inserted subvectors. llvm-svn: 219086	2014-10-05 19:14:34 +00:00
David Blaikie	60b8662ea7	Remove unused map This became unnecessary/unused in r208636 llvm-svn: 219085	2014-10-05 16:31:13 +00:00
Benjamin Kramer	77b0e13aba	X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd. It's debatable whether this transform is useful at all, but for now make sure we don't generate invalid asm. llvm-svn: 219084	2014-10-05 16:14:29 +00:00
Elena Demikhovsky	44bf0637d5	AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q. This instruction allows broadcasting a mask vector to a data vector. llvm-svn: 219083	2014-10-05 14:11:08 +00:00
Benjamin Kramer	12a2d10769	Simplify code. No functionality change. llvm-svn: 219082	2014-10-05 12:21:57 +00:00
Chandler Carruth	acecdc0211	[x86] Fix PR21139, one of the last remaining regressions found in the new vector shuffle lowering. This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I refactored it a bit to use std::count_if and a mutable array ref but the core idea was exactly right. I also added some direct testing of this case. I believe PR21137 is now the only remaining regression. llvm-svn: 219081	2014-10-05 12:07:34 +00:00
Chandler Carruth	9f4d9fa54e	[x86] Teach the new vector shuffle lowering how to lower 128-bit shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering. You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out. llvm-svn: 219079	2014-10-05 11:41:36 +00:00
NAKAMURA Takumi	2a295fd337	HexagonMCCodeEmitter.cpp: Prune 2nd redundant \brief. [-Wdocumentation] llvm-svn: 219073	2014-10-05 04:54:54 +00:00
NAKAMURA Takumi	431c9d3f1f	HexagonDesc: Update LLVMBuild.txt. llvm-svn: 219071	2014-10-05 04:54:29 +00:00
Hal Finkel	4564688806	[InstCombine] Simplify the logic from r219067 using ValueTracking Joerg suggested on IRC that I look at generalizing the logic from r219067 to handle more general redundancies (like removing an assume(x > 3) dominated by an assume(x > 5)). The way to do this would be to ask ValueTracking to determine the value of the i1 argument. It turns out that ValueTracking is not very good at this right now (although it does get the trivial redundancy case) because it does not understand ICmps. Nevertheless, the resulting code in InstCombine is simpler than r219067, so we might as well do it now. llvm-svn: 219070	2014-10-05 00:53:02 +00:00
Benjamin Kramer	4b92c6b8e5	[SystemZ] Make operator bool explicit. NFC. llvm-svn: 219069	2014-10-04 22:44:35 +00:00
Benjamin Kramer	2e52f02864	Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it. llvm-svn: 219068	2014-10-04 22:44:29 +00:00
Hal Finkel	04a156139e	[InstCombine] Remove redundant @llvm.assume intrinsics For any @llvm.assume intrinsic, if there is another which dominates it and uses the same condition, then it is redundant and can be removed. While this does not alter the semantics of the @llvm.assume intrinsics, it makes subsequent handling more efficient (and the resulting IR easier to read). llvm-svn: 219067	2014-10-04 21:27:06 +00:00
Benjamin Kramer	c6cc58e703	Remove unnecessary copying or replace it with moves in a bunch of places. NFC. llvm-svn: 219061	2014-10-04 16:55:56 +00:00
David Blaikie	cda2aa823e	Sink DwarfDebug::updateSubprogramScopeDIE into DwarfCompileUnit This requires exposing some of the current function state from DwarfDebug. I hope there's not too much of that to expose as I go through all the functions, but it still seems nicer to expose singular data down to multiple consumers, than have consumers expose raw mapping data structures up to DwarfDebug for building subprograms. Part of a series of refactoring to allow subprograms in both the skeleton and dwo CUs under Fission. llvm-svn: 219060	2014-10-04 16:24:00 +00:00
David Blaikie	8945219dc9	Reformatting accidentally left out of r219057 llvm-svn: 219059	2014-10-04 16:00:26 +00:00
David Blaikie	14499a7d68	Sink DwarfDebug::attachLowHighPC into DwarfCompileUnit One of many things to sink down into DwarfCompileUnit to allow handling of subprograms in both the skeleton and dwo CU under Fission. llvm-svn: 219058	2014-10-04 15:58:47 +00:00
David Blaikie	37c5231051	Move DwarfCompileUnit from DwarfUnit.h to its own header (DwarfCompileUnit.h) In preparation for sinking all the subprogram emission code down from DwarfDebug into DwarfCompileUnit, this will avoid bloating DwarfUnit.h/cpp greatly and make concerns a bit more clear/isolated. (sinking this handling down is part of the work to handle emitting minimal subprograms for -gmlt-like data into the skeleton CU under fission) llvm-svn: 219057	2014-10-04 15:49:50 +00:00
Chandler Carruth	99627bfbff	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidentally or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046	2014-10-04 03:52:55 +00:00
Jingyue Wu	4938e271c6	Add fake use to suppress defined-but-unused warnings llvm-svn: 219045	2014-10-04 03:50:10 +00:00
Chandler Carruth	200e87c0c5	[x86] Fix a bug in the VZEXT DAG combine that I just made more powerful. It turns out this combine was always somewhat flawed -- there are cases where nested VZEXT nodes can't be combined: if their types have a mismatch that can be observed in the result. While none of these show up currently, once I switch to the new vector shuffle lowering a few test cases actually form such nested VZEXT nodes. I've not come up with any IR pattern that I can sensibly write to exercise this, but it will be covered by tests once I flip the switch. llvm-svn: 219044	2014-10-04 02:51:03 +00:00
Chandler Carruth	7e26a67ffa	[x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT nodes to the DAG combining of them. This will allow the combine to fire on both old vector shuffle lowering and the new vector shuffle lowering and generally seems like a cleaner design. I've trimmed down the code a bit and tried to make it and the surrounding combine fairly clean while moving it around. llvm-svn: 219042	2014-10-04 01:05:48 +00:00
Matt Arsenault	c996175b57	R600/SI: Custom lower f64 -> i64 conversions llvm-svn: 219038	2014-10-03 23:54:56 +00:00
Matt Arsenault	f7c95e3eda	R600: Custom lower [s\|u]int_to_fp for i64 -> f64 llvm-svn: 219037	2014-10-03 23:54:41 +00:00
Matt Arsenault	6cda887776	R600/SI: Fix ftrunc f64 conformance failures. Re-add the tests since they were deleted at some point llvm-svn: 219036	2014-10-03 23:54:27 +00:00
Chandler Carruth	f3e880697a	[x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't have a blend instruction). Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll completely with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior. The patterns I've added are just ridiculously repetitive. Suggestions on making them better very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious. llvm-svn: 219033	2014-10-03 22:43:17 +00:00
Chris Bieneman	489d1dce3f	Converting the ErrorHandlerMutex to a ManagedStatic to avoid the static constructor and destructor. llvm-svn: 219028	2014-10-03 22:03:12 +00:00
Chandler Carruth	0adda1e4d4	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022	2014-10-03 21:38:49 +00:00
Richard Smith	1ed4229f6f	PR21145: Teach LLVM about C++14 sized deallocation functions. C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014	2014-10-03 20:17:06 +00:00
Duncan P. N. Exon Smith	176b691d32	Revert "Revert "DI: Fold constant arguments into a single MDString"" This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010	2014-10-03 20:01:09 +00:00
Adam Nemet	ff63a2dc51	[ISel] Keep matching state consistent when folding during X86 address match In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it creates a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009	2014-10-03 20:00:34 +00:00
Tom Stellard	fae1dc8a12	R600: Align functions to 256 bytes llvm-svn: 219002	2014-10-03 19:02:02 +00:00
Benjamin Kramer	e12a6bac32	Eliminate some deep std::vector copies. NFC. llvm-svn: 218999	2014-10-03 18:33:16 +00:00
Benjamin Kramer	cb3e06ba00	MCParser: Modernize memory handling. NFC. llvm-svn: 218998	2014-10-03 18:32:55 +00:00
Rui Ueyama	1af0865871	llvm-readobj: print out the fields of the COFF delay-import table llvm-svn: 218996	2014-10-03 18:07:18 +00:00
Robin Morisset	9098fee690	[Power] Use lwsync for non-seq_cst fences Summary: hwsync is only required for seq_cst fences, acquire and release one can use the cheaper lwsync. Test Plan: Added some cases to atomics.ll + make check-all Reviewers: jfb, wschmidt Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5317 llvm-svn: 218995	2014-10-03 18:04:36 +00:00
Hans Wennborg	6a654333c5	MipsAsmParser.cpp: fix VS2012 build llvm-svn: 218991	2014-10-03 17:16:24 +00:00
Hans Wennborg	da47cf46de	HexagonMCCodeEmitter.h: deleted member functions are not supported in VS2012 llvm-svn: 218990	2014-10-03 17:02:28 +00:00
Daniel Sanders	ef638fea2d	[mips] Print warning when using register names not available in N32/64 Summary: The register names t4-t7 are not available in the N32 and N64 ABIs. This patch prints a warning, when those names are used in N32/64, along with a fix-it with the correct register names. Patch by Vasileios Kalintiris Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5272 llvm-svn: 218989	2014-10-03 15:37:37 +00:00
Sid Manning	40d809399f	Fix build break on Hexagon Differential Revision: http://reviews.llvm.org/D5600 llvm-svn: 218987	2014-10-03 13:59:01 +00:00
Sid Manning	7da3f9acba	Adding skeleton for unit testing Hexagon Code Emission Adding and modifying CMakeLists.txt files to run unit tests under unittests/Target/* if the directory exists. Adding basic unit test to check that code emitter object can be retrieved. Differential Revision: http://reviews.llvm.org/D5523 Change by: Colin LeMahieu llvm-svn: 218986	2014-10-03 13:18:11 +00:00
Chandler Carruth	1964078936	[x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are dramatically faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends aren't as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985	2014-10-03 13:11:13 +00:00
Renato Golin	4e31ae1051	Revert 202433 - Provide a target override for the latest regalloc heuristic That commit was introduced in order to help investigate a problem in ARM codegen breaking from commit 202304 (Add a limit to the heuristic that register allocates instructions in local order). Recent analysis indicated that the problem no longer exists, so I'm reverting this change. See PR18996. llvm-svn: 218981	2014-10-03 12:20:53 +00:00
Chandler Carruth	4bf341de3c	[x86] Refactor the element insertion logic in the new vector shuffle lowering to handle the potential mirroring of 2-element vectors (because we can't reliably sort them one way) in the caller rather than in the insertion logic. This will simplify things considerably as more ways to fail to match the insertion are added because now we have a nice try and retry point. llvm-svn: 218980	2014-10-03 12:01:55 +00:00
Chandler Carruth	971a560cb8	[x86] Significantly improve the ability of the new vector shuffle lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977	2014-10-03 11:25:58 +00:00
Chandler Carruth	e91b316266	[x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. llvm-svn: 218974	2014-10-03 10:11:39 +00:00
James Molloy	cb7449d058	Revert r215343. This was contentious and needs investigation. llvm-svn: 218971	2014-10-03 09:29:24 +00:00
Lang Hames	89e9c17235	[BasicAA] Revert r218714 - Make better use of zext and sign information. This patch broke 447.dealII on Darwin. I'm currently working on a reduced test-case, but reverting for now to keep the bots happy. <rdar://problem/18530107> llvm-svn: 218944	2014-10-03 01:33:47 +00:00
Eric Christopher	f12e1ab313	constify TargetMachine parameter. llvm-svn: 218934	2014-10-03 00:42:41 +00:00
Rui Ueyama	15d993591c	llvm-readobj: print COFF delay-load import table This patch adds another iterator to access the delay-load import table and use it from llvm-readobj. http://reviews.llvm.org/D5594 llvm-svn: 218933	2014-10-03 00:41:58 +00:00
Eric Christopher	5312afe7e1	constify TargetMachine argument. llvm-svn: 218930	2014-10-03 00:17:59 +00:00
Eric Christopher	a94e592e49	We can grab the options struct from the TargetMachine, no need to pass it down in the constructor. llvm-svn: 218929	2014-10-03 00:10:03 +00:00
Adam Nemet	4dca3ce4b0	[AVX512] Pull pattern for subvector insert into the instruction definition No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. llvm-svn: 218928	2014-10-02 23:18:30 +00:00
Adam Nemet	4e2ef472d2	[AVX512] Refactor subvector inserts No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. llvm-svn: 218927	2014-10-02 23:18:28 +00:00
Adam Nemet	dc87aea176	[AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm Just like in the case of extracts, the refactoring is uncovering some typos in the code. llvm-svn: 218926	2014-10-02 23:18:26 +00:00
Hal Finkel	fe3368cb57	[PowerPC] Modern Book-E cores support sync Older Book-E cores, such as the PPC 440, support only msync (which has the same encoding as sync 0), but not any of the other sync forms. Newer Book-E cores, however, do support sync, and for performance reasons we should allow the use of the more-general form. This refactors msync use into its own feature group so that it applies by default only to older Book-E cores (of the relevant cores, we only have definitions for the PPC440/450 currently). llvm-svn: 218923	2014-10-02 22:34:22 +00:00
Robin Morisset	e1ca44bd4c	[Power] Improve the expansion of atomic loads/stores Summary: Atomic loads and stores of up to the native size (32 bits, or 64 for PPC64) can be lowered to a simple load or store instruction (as the synchronization is already handled by AtomicExpand, and the atomicity is guaranteed thanks to the alignment requirements of atomic accesses). This is exactly what this patch does. Previously, these were implemented by complex load-linked/store-conditional loops, an obvious performance problem. For example, this patch turns ``` define void @store_i8_unordered(i8* %mem) { store atomic i8 42, i8* %mem unordered, align 1 ret void } ``` from ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: rlwinm r2, r3, 3, 27, 28 li r4, 42 xori r5, r2, 24 rlwinm r2, r3, 0, 0, 29 li r3, 255 slw r4, r4, r5 slw r3, r3, r5 and r4, r4, r3 LBB4_1: ; =>This Inner Loop Header: Depth=1 lwarx r5, 0, r2 andc r5, r5, r3 or r5, r4, r5 stwcx. r5, 0, r2 bne cr0, LBB4_1 ; BB#2: blr ``` into ``` _store_i8_unordered: ; @store_i8_unordered ; BB#0: li r2, 42 stb r2, 0(r3) blr ``` which looks like a pretty clear win to me. Test Plan: fixed the tests + new test for indexed accesses + make check-all Reviewers: jfb, wschmidt, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5587 llvm-svn: 218922	2014-10-02 22:27:07 +00:00
Chandler Carruth	7425c8c279	Fix the threshold added in r186434 (a re-apply of r185393) and updated to be a ManagedStatic in r218163 to not be a global variable written and read to from within the innards of SpillPlacement. This will fix a really scary race condition for anyone that has two copies of LLVM running spill placement concurrently. Yikes! This will also fix a really significant compile time hit that r218163 caused because the spill placement threshold read is actually in the very hot path of this code. The memory fence on each read was showing up as huge compile time regressions when spilling is responsible for most of the compile time. For example, optimizing sanitized code showed over 50% compile time regressions here. =/ llvm-svn: 218921	2014-10-02 22:23:14 +00:00
Juergen Ributzka	99bd3cba8b	[Stackmaps] Make the frame-pointer required for stackmaps. Do not eliminate the frame pointer if there is a stackmap or patchpoint in the function. All stackmap references should be FP relative. This fixes PR21107. llvm-svn: 218920	2014-10-02 22:21:49 +00:00
Duncan P. N. Exon Smith	786cd049fc	Revert "DI: Fold constant arguments into a single MDString" This reverts commit r218914 while I investigate some bots. llvm-svn: 218918	2014-10-02 22:15:31 +00:00
Rui Ueyama	861021f986	llvm-readobj: print COFF imported symbols This patch defines a new iterator for the imported symbols. Make a change to COFFDumper to use that iterator to print out imported symbols and its ordinals. llvm-svn: 218915	2014-10-02 22:05:29 +00:00
Duncan P. N. Exon Smith	571f97bd90	DI: Fold constant arguments into a single MDString This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcases upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 218914	2014-10-02 21:56:57 +00:00
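
The folding scheme described in r218914 can be sketched roughly as follows. This is a minimal Python illustration, not the LLVM implementation: `fold_fields` and `unfold_fields` are hypothetical helper names, and the only behavior taken from the commit message is that integers are stringified and a `\0` character separates the fields.

```python
SEPARATOR = "\0"  # the commit message names `\0` as the field separator


def fold_fields(fields):
    """Stringify each constant field and join them into one string."""
    return SEPARATOR.join(str(f) for f in fields)


def unfold_fields(folded):
    """Split a folded string back into its fields (all returned as strings)."""
    return folded.split(SEPARATOR)


# Hypothetical field values, chosen only for illustration.
folded = fold_fields([786688, "x", 0, 24])
fields = unfold_fields(folded)
```

Note that the round trip returns every field as a string, so a consumer of such a format would have to know which positions hold integers.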
Chandler Carruth	75e182b414	[x86] Teach the new vector shuffle lowering to widen floating point elements as well as integer elements in order to form simpler shuffle patterns. This is the primary reason why we were failing to match some of the 2-and-2 floating point shuffles such as PR21140. Even after fixing this we need to support some extra patterns in the backend in order to match the resulting X86ISD::UNPCKL nodes into the correct instructions. This commit should fix PR21140 and includes more comprehensive testing of insertion patterns in v4 shuffles. Not all of the added tests are beautiful. For example, we don't have clever instructions to insert-via-load in the integer domain. There are also some places where we aren't sufficiently cunning with our use of movq and movd, but that's future work. llvm-svn: 218911	2014-10-02 21:37:14 +00:00
Duncan P. N. Exon Smith	f02fe70805	LTO: Document the Boolean argument from r218784 llvm-svn: 218907	2014-10-02 21:11:04 +00:00
Sanjay Patel	12d1ce5408	Optimize square root squared (PR21126). When unsafe-fp-math is enabled, we can turn sqrt(X) * sqrt(X) into X. This can happen in the real world when calculating x ** 3/2. This occurs in test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c. Differential Revision: http://reviews.llvm.org/D5584 llvm-svn: 218906	2014-10-02 21:10:54 +00:00
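
A small Python check, offered only as an illustration of why the transform in r218906 is gated on unsafe-fp-math: in IEEE-754 double precision, `sqrt(x) * sqrt(x)` is not always bit-identical to `x`, so replacing the product with `x` can change the result slightly. The value `2.0` is an arbitrary choice for the demonstration.

```python
import math

x = 2.0
product = math.sqrt(x) * math.sqrt(x)

# The product is very close to x but not exactly equal, so folding
# sqrt(x) * sqrt(x) into x is only valid when strict FP semantics are relaxed.
differs = product != x
error = abs(product - x)

# x ** 1.5 can be computed as x * sqrt(x), the shape the commit message
# mentions for "x ** 3/2".
three_halves_close = math.isclose(x ** 1.5, x * math.sqrt(x))
```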
Justin Bogner	ad69e64761	InstrProf: Avoid linear search in a hot loop Every time we were adding or removing an expression when generating a coverage mapping we were doing a linear search to try and deduplicate the list. The indices in the list are important, so we can't just replace it by a DenseMap entirely, but an auxiliary DenseMap for fast lookup massively improves the performance issues I was seeing here. llvm-svn: 218892	2014-10-02 17:14:18 +00:00
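
The idea in r218892, keeping an index-stable list while using an auxiliary map for deduplication, might look like the sketch below. It is a Python stand-in under stated assumptions: `ExpressionList` and its members are hypothetical names, a `dict` plays the role of the DenseMap, and only the "add" path is shown.

```python
class ExpressionList:
    """List of unique expressions with O(1) average-case deduplication."""

    def __init__(self):
        self.expressions = []  # indices into this list must stay stable
        self.index_of = {}     # auxiliary lookup: expression -> index

    def get(self, expr):
        """Return the index of expr, appending it only if it is new."""
        idx = self.index_of.get(expr)
        if idx is None:
            idx = len(self.expressions)
            self.expressions.append(expr)
            self.index_of[expr] = idx
        return idx


exprs = ExpressionList()
a = exprs.get(("add", 0, 1))
b = exprs.get(("sub", 0, 1))
c = exprs.get(("add", 0, 1))  # duplicate: found via the map, no list scan
```

The list remains the source of truth for indices; the map only answers "have we seen this expression, and where?" without scanning.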
Rui Ueyama	1e152d5eec	This patch adds a new flag "-coff-imports" to llvm-readobj. When the flag is given, the command prints out the COFF import table. Currently only the import table directory will be printed. I'm going to make another patch to print out the imported symbols. The implementation of import directory entry iterator in COFFObjectFile.cpp was buggy. This patch fixes that too. http://reviews.llvm.org/D5569 llvm-svn: 218891	2014-10-02 17:02:18 +00:00
Justin Bogner	f9535c418f	Reapply "InstrProf: Don't keep a large sparse list around just to zero it" When I was preparing r218879 for commit, I removed an early return that I decided was just noise. It wasn't. This is r218879 no-crash edition. This reverts commit r218881, reapplying r218879. llvm-svn: 218887	2014-10-02 16:43:31 +00:00
Adrian Prantl	38666f1d13	Remove an extra whitespace. llvm-svn: 218886	2014-10-02 16:42:15 +00:00
Adrian Prantl	75a0dac4b3	Pretty-printer: Paper over an ambiguity between line table entries and tagged mdnodes. fixes http://llvm.org/bugs/show_bug.cgi?id=21131 llvm-svn: 218885	2014-10-02 16:42:13 +00:00
Justin Bogner	70b5c562ce	Revert "InstrProf: Don't keep a large sparse list around just to zero it" This seems to be crashing on some buildbots. Reverting to investigate. This reverts commit r218879. llvm-svn: 218881	2014-10-02 16:15:27 +00:00
Justin Bogner	d6a9e4b3be	InstrProf: Don't keep a large sparse list around just to zero it The Terms vector here represented a polynomial of all possible counters, and is used to simplify expressions when generating coverage mapping. There are a few problems with this: 1. Keeping the vector as a member is wasteful, since we clear it every time we use it. 2. Most expressions refer to a subset of the counters, so we end up iterating over a large number of zeros doing nothing a lot of the time. This updates the user of the vector to store the terms locally, and uses a sort and combine approach so that we only operate on counters that are actually used in a given expression. For small cases this makes very little difference, but in cases with a very large number of counted regions this is a significant performance fix. llvm-svn: 218879	2014-10-02 16:04:03 +00:00
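
The "sort and combine" approach mentioned in r218879 can be illustrated with a short Python sketch. This is an assumption-laden stand-in, not the InstrProf code: `simplify_terms` is a hypothetical name, and each term is modeled as a `(counter_id, factor)` pair.

```python
def simplify_terms(terms):
    """Combine (counter_id, factor) pairs that share a counter id.

    Sorting groups equal ids together, so a single linear pass can merge
    them; terms whose factors cancel to zero are dropped.
    """
    combined = []
    for counter_id, factor in sorted(terms):
        if combined and combined[-1][0] == counter_id:
            combined[-1] = (counter_id, combined[-1][1] + factor)
        else:
            combined.append((counter_id, factor))
    return [(cid, f) for cid, f in combined if f != 0]


# Counter 7 is added twice and subtracted once; counter 2 cancels out.
result = simplify_terms([(7, 1), (2, 1), (7, 1), (2, -1), (7, -1)])
```

The work done is proportional to the number of terms in the expression rather than to the total number of counters, which matches the motivation given in the commit message.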
Sanjay Patel	b41d46118a	Use the local variable that other clauses around here are already using. llvm-svn: 218876	2014-10-02 15:20:45 +00:00
Sanjay Patel	0d7dee654d	Remove duplicate function names from comments. NFC. llvm-svn: 218875	2014-10-02 15:13:22 +00:00
Tilmann Scheller	383b4fff4c	[NVPTX] Remove dead code. Found by the Clang static analyzer. llvm-svn: 218874	2014-10-02 15:12:48 +00:00
Joerg Sonnenberger	f148a6d498	Support padding unaligned data in .text. llvm-svn: 218870	2014-10-02 13:41:42 +00:00
Aaron Ballman	254dd7e439	Silence a -Wsign-compare warning. NFC. llvm-svn: 218868	2014-10-02 13:17:11 +00:00
Zinovy Nis	ccc3e3733b	[BUG][INDVAR] Fix for PR21014: wrong SCEV operands commuting for non-commutative instructions My commit rL216160 introduced a bug PR21014: IndVars widens code 'for (i = ; i < ...; i++) arr[ CONST - i]' into 'for (i = ; i < ...; i++) arr[ i - CONST]' thus inverting index expression. This patch fixes it. Thanks to Jörg Sonnenberger for pointing. Differential Revision: http://reviews.llvm.org/D5576 llvm-svn: 218867	2014-10-02 13:01:15 +00:00
Justin Bogner	5eec02a399	InstrProf: Simplify counting a file's regions when writing coverage (NFC) When writing a coverage mapping we iterate through the mapping regions in order of FileID, but we were then repeatedly searching from the beginning of the list to count the number of regions with a given FileID. It is simpler and more efficient to search forward from the current iterator to find the number of regions. llvm-svn: 218842	2014-10-02 00:31:00 +00:00
Chandler Carruth	8a16802d46	[x86] Improve and correct how the new vector shuffle lowering was matching and lowering 64-bit insertions. The first problem was that we weren't looking through bitcasts to discover that we could lower as insertions. Once fixed, we in turn weren't looking through bitcasts to discover that we could fold a load into the lowering. Once fixed, we weren't forming a SCALAR_TO_VECTOR node around the inserted element and instead were passing a scalar to a DAG node that expected a vector. It turns out there are some patterns that will "lower" this into the correct asm, but the rest of the X86 backend is very unhappy with such antics. This should fix a few more edge case regressions I've spotted going through the regression test suite to enable the new vector shuffle lowering. llvm-svn: 218839	2014-10-01 23:14:28 +00:00
Lang Hames	24f0c24de9	[MCJIT] Don't crash in debugging output for sections that aren't emitted. llvm-svn: 218836	2014-10-01 21:57:47 +00:00
Eric Christopher	f6ed33e7fa	constify the TargetMachine argument used in the subtarget and lowering constructors. llvm-svn: 218832	2014-10-01 21:36:28 +00:00
Duncan P. N. Exon Smith	379e375761	DIBuilder: Remove duplicated comments, NFC These comments already appear in the header, and some of them are out-of-date anyway. llvm-svn: 218829	2014-10-01 21:32:15 +00:00
Duncan P. N. Exon Smith	9affbbaac0	Revert "DIBuilder: Remove dead code" This reverts commit r218820. It turns out that Adrian has an outstanding SROA patch that uses this. I've updated it to forward to `createExpression()`. llvm-svn: 218828	2014-10-01 21:32:12 +00:00
Sanjay Patel	9ebfbb969d	Lower FNEG ( FABS (x) ) -> FNABS (x) [X86 codegen] PR20578 Negative FABS of either a scalar or vector should be handled the same way on x86 with SSE/AVX: a single OR instruction of the FP operand with a constant to light up the sign bit(s). http://llvm.org/bugs/show_bug.cgi?id=20578 Differential Revision: http://reviews.llvm.org/D5201 llvm-svn: 218822	2014-10-01 21:20:06 +00:00
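
The bit trick behind r218822, computing the negative absolute value by OR-ing the operand with a sign-bit constant, can be demonstrated on a scalar double in Python. This is only a sketch of the arithmetic: `fnabs` is a hypothetical helper, and it works on one 64-bit value rather than on SSE/AVX registers.

```python
import struct

SIGN_BIT = 0x8000000000000000  # sign bit of an IEEE-754 double


def fnabs(x):
    """Return -|x| by setting the sign bit of x's bit pattern."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits | SIGN_BIT))[0]


neg_from_pos = fnabs(3.5)   # positive input becomes negative
neg_from_neg = fnabs(-2.0)  # negative input is unchanged
neg_zero = fnabs(0.0)       # +0.0 becomes -0.0
```

Because the sign bit is set unconditionally, the result is the same whether the input was positive or negative, which is why a single OR replaces the FABS and FNEG pair.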
Duncan P. N. Exon Smith	1ce4fd36bf	DIBuilder: Remove dead code I neglected to update `DIBuilder::createPieceExpression()` in r218797, which I noticed while rebasing a patch for PR17891. On closer inspection, it looks like dead code. If there are any downstream users of this, you should transition to the more general `createExpression()`. Or, we can add this back, but then it should just forward to `createExpression()`. llvm-svn: 218820	2014-10-01 21:14:20 +00:00
Eric Christopher	eb6e3bbf47	Now that the optimization level is adjusting the feature string before we hit the subtarget, remove the constructor parameter. llvm-svn: 218817	2014-10-01 21:05:35 +00:00
Argyrios Kyrtzidis	0b9f5507c8	Adds 'override' to overriding methods. NFC. llvm-svn: 218815	2014-10-01 21:00:44 +00:00
Eric Christopher	36448af7f5	Rework the PPC TargetMachine so that the non-function specific overrides happen at TargetMachine creation and not on every subtarget creation. llvm-svn: 218805	2014-10-01 20:38:26 +00:00
Eric Christopher	12f4a78581	constify TargetMachine parameter for X86TargetLowering. llvm-svn: 218804	2014-10-01 20:38:22 +00:00
Sanjay Patel	7b2cd9ad86	Make the sqrt intrinsic return undef for a negative input. As discussed here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140609/220598.html And again here: http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/077168.html The sqrt of a negative number when using the llvm intrinsic is undefined. We should return undef rather than 0.0 to match the definition in the LLVM IR lang ref. This change should not affect any code that isn't using "no-nans-fp-math"; ie, no-nans is a requirement for generating the llvm intrinsic in place of a sqrt function call. Unfortunately, the behavior introduced by this patch will not match current gcc, xlc, icc, and possibly other compilers. The current clang/llvm behavior of returning 0.0 doesn't either. We knowingly approve of this difference with the other compilers in an attempt to flag code that is invoking undefined behavior. A front-end warning should also try to convince the user that the program will fail: http://llvm.org/bugs/show_bug.cgi?id=21093 Differential Revision: http://reviews.llvm.org/D5527 llvm-svn: 218803	2014-10-01 20:36:33 +00:00
Duncan P. N. Exon Smith	611afb229c	DIBuilder: Encapsulate DIExpression's element type `DIExpression`'s elements are 64-bit integers that are stored as `ConstantInt`. The accessors already encapsulate the storage. This commit updates the `DIBuilder` API to also encapsulate that. llvm-svn: 218797	2014-10-01 20:26:08 +00:00
Bruno Cardoso Lopes	e3c513a965	[MemoryDepAnalysis] Fix compile time slowdown - Problem One program takes ~3min to compile under -O2. This happens after a certain function A is inlined ~700 times in a function B, inserting thousands of new BBs. This leads to 80% of the compilation time spent in GVN::processNonLocalLoad and MemoryDependenceAnalysis::getNonLocalPointerDependency, while searching for nonlocal information for basic blocks. Usually, to avoid spending a long time to process nonlocal loads, GVN bails out if it gets more than 100 deps as a result from MD->getNonLocalPointerDependency. However this only happens after all nonlocal information for BBs have been computed, which is the bottleneck in this scenario. For instance, there are 8280 times where getNonLocalPointerDependency returns deps with more than 100 bbs and from those, 600 times it returns more than 1000 blocks. - Solution Bail out early during the nonlocal info computation whenever we reach a specified threshold. This patch proposes a 100 BBs threshold, it also reduces the compile time from 3min to 23s. - Testing The test-suite presented no compile nor execution time regressions. Some numbers from my machine (x86_64 darwin): - 17s under -Oz (which avoids inlining). - 1.3s under -O1. - 2m51s under -O2 ToT *** 23s under -O2 w/ Result.size() > 100 - 1m54s under -O2 w/ Result.size() > 500 With NumResultsLimit = 100, GVN yields the same outcome as in the unlimited 3min version. http://reviews.llvm.org/D5532 rdar://problem/18188041 llvm-svn: 218792	2014-10-01 20:07:13 +00:00
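
The early bail-out described in r218792 can be sketched in Python as below. This is a simplified illustration under assumptions: `collect_nonlocal_deps` is a hypothetical function, the control-flow graph is a plain dictionary of predecessor lists, and only the default limit of 100 comes from the commit message.

```python
def collect_nonlocal_deps(start, predecessors, limit=100):
    """Walk predecessor blocks, giving up once more than `limit` results pile up.

    Returns the list of visited blocks, or None when the walk bailed out.
    """
    results, seen, worklist = [], {start}, [start]
    while worklist:
        block = worklist.pop()
        results.append(block)
        if len(results) > limit:
            return None  # bail out early instead of finishing the walk
        for pred in predecessors.get(block, ()):
            if pred not in seen:
                seen.add(pred)
                worklist.append(pred)
    return results


# A straight-line chain of 500 blocks: block i has block i - 1 as predecessor.
chain = {i: [i - 1] for i in range(1, 500)}
long_walk = collect_nonlocal_deps(499, chain)
short_walk = collect_nonlocal_deps(50, chain)
```

Checking the threshold inside the loop is the point of the change: the walk stops as soon as the result is known to be too large to be useful, rather than after all blocks have been processed.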
Sanjay Patel	0e4a83e89c	Don't repeat function/variable name in comment. NFC. llvm-svn: 218791	2014-10-01 19:39:32 +00:00
Tim Northover	5d72c5de02	ARM: allow copying of CPSR when all else fails. As with x86 and AArch64, certain situations can arise where we need to spill CPSR in the middle of a calculation. These should be avoided where possible (MRS/MSR is rather expensive), which ARM is actually better at than the other two since it tries to Glue defs to uses, but as a last ditch effort, copying is better than crashing. rdar://problem/18011155 llvm-svn: 218789	2014-10-01 19:21:03 +00:00
Adrian Prantl	87b7eb9d0f	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787	2014-10-01 18:55:02 +00:00
Reed Kotler	b9dc248e9e	Add fptrunc to mips fast-isel Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel Test Plan: fptrunc.ll checked also with 4 internal mips build bot flavors mips32r1/mips32r2 and at -O0 and -O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: rfuhler Differential Revision: http://reviews.llvm.org/D5553 llvm-svn: 218785	2014-10-01 18:47:02 +00:00
Duncan P. N. Exon Smith	30c9242caa	LTO: Ignore disabled diagnostic remarks r206400 and r209442 added remarks that are disabled by default. However, if a diagnostic handler is registered, the remarks are sent unfiltered to the handler. This is the right behaviour for clang, since it has its own filters. However, the diagnostic handler exposed in the LTO API receives only the severity and message. It doesn't have the information to filter by pass name. For LTO, disabled remarks should be filtered by the producer. I've changed `LLVMContext::setDiagnosticHandler()` to take a `bool` argument indicating whether to respect the built-in filters. This defaults to `false`, so other consumers don't have a behaviour change, but `LTOCodeGenerator::setDiagnosticHandler()` sets it to `true`. To make this behaviour testable, I added a `-use-diagnostic-handler` command-line option to `llvm-lto`. This fixes PR21108. llvm-svn: 218784	2014-10-01 18:36:03 +00:00
Adrian Prantl	b458dc2eee	Revert r218778 while investigating buildbot breakage. "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782	2014-10-01 18:10:54 +00:00
Adrian Prantl	25a7174e7a	Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778	2014-10-01 17:55:39 +00:00
Tom Stellard	79243d9664	R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table llvm-svn: 218776	2014-10-01 17:15:17 +00:00
Tom Stellard	0a4e9a3b25	C API: Add LLVMCloneModule() llvm-svn: 218775	2014-10-01 17:14:57 +00:00
Jingyue Wu	fd47fb9976	Revert r216862 due to a performance regression Reported by Alexey Volkov in PR21115 llvm-svn: 218771	2014-10-01 15:22:13 +00:00
Toma Tabacu	c4c202a9a7	[mips] Rename emit and parse functions for the .cpload assembler directive. NFC. Summary: It's better if we have a consistent name for .cpload-related functions. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5437 llvm-svn: 218768	2014-10-01 14:53:19 +00:00
Tom Stellard	3a35d8f4c2	R600/SI: Add a generic pseudo EXP instruction llvm-svn: 218767	2014-10-01 14:44:45 +00:00
Tom Stellard	0c238c2fbe	R600/SI: Add generic pseudo MTBUF instructions llvm-svn: 218766	2014-10-01 14:44:43 +00:00
Tom Stellard	c470c96e6b	R600/SI: Add generic pseudo SMRD instructions llvm-svn: 218765	2014-10-01 14:44:42 +00:00
Oliver Stannard	d4e0a4fd2c	[ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5 Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when targeting ARMv8, but they are actually present on any target with FP-ARMv8. Note that FP-ARMv8 is called FPv5 when it is part of an M-profile core, but they have the same instructions so we model them both as FPARMv8 in the ARM backend. llvm-svn: 218763	2014-10-01 13:13:18 +00:00
Chandler Carruth	6c02c031b8	[x86] Fix a few more tiny patterns with the new vector shuffle lowering that keep cropping up in the regression test suite. This also addresses one of the issues raised on the mailing list with failing to form 'movsd' in as many cases as we realistically should. There will be corresponding patches forthcoming for v4f32 at least. This was a lot of fuss for a relatively small gain, but all the fuss was on my end trying different ways of holding the pieces of the x86 fragment patterns just right. Now that it works, the code is reasonably simple. In the new test cases I'm adding here, v2i64 sticks out as just plain horrible. I've not come up with any great ideas here other than that it would be nice to recognize when we're going to take a domain crossing hit and cross earlier to get the decent instructions. At least with AVX it is slightly less silly.... llvm-svn: 218756	2014-10-01 11:14:02 +00:00
Chandler Carruth	048486109b	[x86] Delete some extraneous logic from the new vector shuffle lowering. Nothing was relying on this and there are potentially some edge cases that it would not be correct under. Removing it seems better than trying to "fix" it as nothing was relying on it. llvm-svn: 218755	2014-10-01 11:13:57 +00:00
Tom Coxon	e493f177ee	[AArch64] Allow access to all system registers with MRS/MSR instructions. The A64 instruction set includes a generic register syntax for accessing implementation-defined system registers. The syntax for these registers is: S<op0>_<op1>_<CRn>_<CRm>_<op2> The encoding space permitted for implementation-defined system registers is: op0 op1 CRn CRm op2 11 xxx 1x11 xxxx xxx The full encoding space can now be accessed: op0 op1 CRn CRm op2 xx xxx xxxx xxxx xxx This is useful to anyone needing to write assembly code supporting new system registers before the assembler has learned the official names for them. llvm-svn: 218753	2014-10-01 10:13:59 +00:00
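
The generic register syntax from r218753 can be explored with a small Python parser. This is a sketch under assumptions: the field widths (2, 3, 4, 4, 3 bits) come from the `xx xxx xxxx xxxx xxx` layout in the commit message, while the `C` prefix on the CRn and CRm fields, the helper names, and the packing of the fields into one integer in left-to-right order are illustrative choices, not a statement of how the assembler encodes the instruction.

```python
import re

_SYSREG = re.compile(r"^S(\d+)_(\d+)_C(\d+)_C(\d+)_(\d+)$", re.IGNORECASE)
_FIELD_BITS = (2, 3, 4, 4, 3)  # op0, op1, CRn, CRm, op2


def parse_generic_sysreg(name):
    """Parse S<op0>_<op1>_C<n>_C<m>_<op2> into a tuple of five integers."""
    match = _SYSREG.match(name)
    if match is None:
        raise ValueError(f"not a generic system register name: {name!r}")
    fields = tuple(int(group) for group in match.groups())
    for value, bits in zip(fields, _FIELD_BITS):
        if value >= (1 << bits):
            raise ValueError(f"field value {value} does not fit in {bits} bits")
    return fields


def pack_fields(fields):
    """Concatenate the five fields, op0 first, into a 16-bit integer."""
    encoded = 0
    for value, bits in zip(fields, _FIELD_BITS):
        encoded = (encoded << bits) | value
    return encoded


fields = parse_generic_sysreg("S3_0_C15_C2_1")
packed = pack_fields(fields)
```

With every field allowed its full range, any of the 2^16 combinations can be named, which is the "full encoding space" the commit message refers to.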
Evgeniy Stepanov	815f2869ad	Revert r218721, r218735. Failing bootstrap on Linux (arm, x86). http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470 http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518 llvm-svn: 218752	2014-10-01 10:07:28 +00:00
Asiri Rathnayake	530b3edab6	Add missing natural vector cast. Summary: The natural vector cast node (similar to bitcast) AArch64ISD::NVCAST was introduced in r217159 and r217138. This patch adds a missing cast from v2f32 to v1i64 which is causing some compilation failures. Also added test cases to cover various modimm types and BUILD_VECTORs with i64 elements. llvm-svn: 218751	2014-10-01 09:59:45 +00:00
Oliver Stannard	37e4daab05	[ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM) The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be modelled using the same target feature, and all double-precision operations are already disabled by the fp-only-sp target features. llvm-svn: 218747	2014-10-01 09:02:17 +00:00
Daniel Sanders	92db6b78f7	[mips] Fix disassembly of [ls][wd]c[23], cache, and pref Fixes PR21015, and PR20993. Patch by Jun Koi llvm-svn: 218745	2014-10-01 08:26:55 +00:00
Sasa Stankovic	7072a7968f	[mips] For indirect calls we don't need $gp to point to .got. Mips linker doesn't generate lazy binding stub for a function whose address is taken in the program. Differential Revision: http://reviews.llvm.org/D5067 llvm-svn: 218744	2014-10-01 08:22:21 +00:00
Lang Hames	2f27b2fe89	[MCJIT] Turn the getSymbolAddress free function created in r218626 into a static member of RTDyldMemoryManager (and rename to getSymbolAddressInProcess). The functionality this provides is very specific to RTDyldMemoryManager, so it makes sense to keep it in that class to avoid accidental re-use. No functional change. llvm-svn: 218741	2014-10-01 04:11:13 +00:00
Nick Lewycky	5f75f4ddb9	Fix typo in comment from r218733 llvm-svn: 218739	2014-10-01 03:37:34 +00:00

73340 Commits