llvm-project

Commit Graph

Author	SHA1	Message	Date
Chandler Carruth	acecdc0211	[x86] Fix PR21139, one of the last remaining regressions found in the new vector shuffle lowering. This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I refactored it a bit to use std::count_if and a mutable array ref but the core idea was exactly right. I also added some direct testing of this case. I believe PR21137 is now the only remaining regression. llvm-svn: 219081	2014-10-05 12:07:34 +00:00
Chandler Carruth	9f4d9fa54e	[x86] Teach the new vector shuffle lowering how to lower 128-bit shuffles using AVX and AVX2 instructions. This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering. You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out. llvm-svn: 219079	2014-10-05 11:41:36 +00:00
NAKAMURA Takumi	2a295fd337	HexagonMCCodeEmitter.cpp: Prune 2nd redundant \brief. [-Wdocumentation] llvm-svn: 219073	2014-10-05 04:54:54 +00:00
NAKAMURA Takumi	431c9d3f1f	HexagonDesc: Update LLVMBuild.txt. llvm-svn: 219071	2014-10-05 04:54:29 +00:00
Hal Finkel	4564688806	[InstCombine] Simplify the logic from r219067 using ValueTracking Joerg suggested on IRC that I look at generalizing the logic from r219067 to handle more general redundancies (like removing an assume(x > 3) dominated by an assume(x > 5)). The way to do this would be to ask ValueTracking to determine the value of the i1 argument. It turns out that ValueTracking is not very good at this right now (although it does get the trivial redundancy case) because it does not understand ICmps. Nevertheless, the resulting code in InstCombine is simpler than r219067, so we might as well do it now. llvm-svn: 219070	2014-10-05 00:53:02 +00:00
Benjamin Kramer	4b92c6b8e5	[SystemZ] Make operator bool explicit. NFC. llvm-svn: 219069	2014-10-04 22:44:35 +00:00
Benjamin Kramer	2e52f02864	Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it. llvm-svn: 219068	2014-10-04 22:44:29 +00:00
Hal Finkel	04a156139e	[InstCombine] Remove redundant @llvm.assume intrinsics For any @llvm.assume intrinsic, if there is another which dominates it and uses the same condition, then it is redundant and can be removed. While this does not alter the semantics of the @llvm.assume intrinsics, it makes subsequent handling more efficient (and the resulting IR easier to read). llvm-svn: 219067	2014-10-04 21:27:06 +00:00
Benjamin Kramer	c6cc58e703	Remove unnecessary copying or replace it with moves in a bunch of places. NFC. llvm-svn: 219061	2014-10-04 16:55:56 +00:00
David Blaikie	cda2aa823e	Sink DwarfDebug::updateSubprogramScopeDIE into DwarfCompileUnit This requires exposing some of the current function state from DwarfDebug. I hope there's not too much of that to expose as I go through all the functions, but it still seems nicer to expose singular data down to multiple consumers, than have consumers expose raw mapping data structures up to DwarfDebug for building subprograms. Part of a series of refactoring to allow subprograms in both the skeleton and dwo CUs under Fission. llvm-svn: 219060	2014-10-04 16:24:00 +00:00
David Blaikie	8945219dc9	Reformatting accidentally left out of r219057 llvm-svn: 219059	2014-10-04 16:00:26 +00:00
David Blaikie	14499a7d68	Sink DwarfDebug::attachLowHighPC into DwarfCompileUnit One of many things to sink down into DwarfCompileUnit to allow handling of subprograms in both the skeleton and dwo CU under Fission. llvm-svn: 219058	2014-10-04 15:58:47 +00:00
David Blaikie	37c5231051	Move DwarfCompileUnit from DwarfUnit.h to its own header (DwarfCompileUnit.h) In preparation for sinking all the subprogram emission code down from DwarfDebug into DwarfCompileUnit, this will avoid bloating DwarfUnit.h/cpp greatly and make concerns a bit more clear/isolated. (sinking this handling down is part of the work to handle emitting minimal subprograms for -gmlt-like data into the skeleton CU under fission) llvm-svn: 219057	2014-10-04 15:49:50 +00:00
Chandler Carruth	99627bfbff	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidentally or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046	2014-10-04 03:52:55 +00:00
Jingyue Wu	4938e271c6	Add fake use to suppress defined-but-unused warnings llvm-svn: 219045	2014-10-04 03:50:10 +00:00
Chandler Carruth	200e87c0c5	[x86] Fix a bug in the VZEXT DAG combine that I just made more powerful. It turns out this combine was always somewhat flawed -- there are cases where nested VZEXT nodes can't be combined: if their types have a mismatch that can be observed in the result. While none of these show up currently, once I switch to the new vector shuffle lowering a few test cases actually form such nested VZEXT nodes. I've not come up with any IR pattern that I can sensibly write to exercise this, but it will be covered by tests once I flip the switch. llvm-svn: 219044	2014-10-04 02:51:03 +00:00
Chandler Carruth	7e26a67ffa	[x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT nodes to the DAG combining of them. This will allow the combine to fire on both old vector shuffle lowering and the new vector shuffle lowering and generally seems like a cleaner design. I've trimmed down the code a bit and tried to make it and the surrounding combine fairly clean while moving it around. llvm-svn: 219042	2014-10-04 01:05:48 +00:00
Matt Arsenault	c996175b57	R600/SI: Custom lower f64 -> i64 conversions llvm-svn: 219038	2014-10-03 23:54:56 +00:00
Matt Arsenault	f7c95e3eda	R600: Custom lower [s|u]int_to_fp for i64 -> f64 llvm-svn: 219037	2014-10-03 23:54:41 +00:00
Matt Arsenault	6cda887776	R600/SI: Fix ftrunc f64 conformance failures. Re-add the tests since they were deleted at some point llvm-svn: 219036	2014-10-03 23:54:27 +00:00
Chandler Carruth	f3e880697a	[x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't have a blend instruction). Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll completely with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior. The patterns I've added are just ridiculously repetitive. Suggestions on making them better very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious. llvm-svn: 219033	2014-10-03 22:43:17 +00:00
Chris Bieneman	489d1dce3f	Converting the ErrorHandlerMutex to a ManagedStatic to avoid the static constructor and destructor. llvm-svn: 219028	2014-10-03 22:03:12 +00:00
Chandler Carruth	0adda1e4d4	[x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. For non-loads, blendps is much faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and three ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering. This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though. llvm-svn: 219022	2014-10-03 21:38:49 +00:00
Richard Smith	1ed4229f6f	PR21145: Teach LLVM about C++14 sized deallocation functions. C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before. llvm-svn: 219014	2014-10-03 20:17:06 +00:00
Duncan P. N. Exon Smith	176b691d32	Revert "Revert "DI: Fold constant arguments into a single MDString"" This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why). Original commit message follows. -- This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator. Part of PR17891. Note: I've attached my testcase upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help. llvm-svn: 219010	2014-10-03 20:01:09 +00:00
Adam Nemet	ff63a2dc51	[ISel] Keep matching state consistent when folding during X86 address match In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address. However as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND which the code is prepared for but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again hitting an assert because the node is no longer a load even though this was checked before. Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it creates a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction. I've also talked a little bit to Dan Gohman on llvm-dev who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.). So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here). The listener is only installed on X86. I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case. Fixes rdar://problem/18206171 llvm-svn: 219009	2014-10-03 20:00:34 +00:00
Tom Stellard	fae1dc8a12	R600: Align functions to 256 bytes llvm-svn: 219002	2014-10-03 19:02:02 +00:00
Benjamin Kramer	e12a6bac32	Eliminate some deep std::vector copies. NFC. llvm-svn: 218999	2014-10-03 18:33:16 +00:00
Benjamin Kramer	cb3e06ba00	MCParser: Modernize memory handling. NFC. llvm-svn: 218998	2014-10-03 18:32:55 +00:00
Rui Ueyama	1af0865871	llvm-readobj: print out the fields of the COFF delay-import table llvm-svn: 218996	2014-10-03 18:07:18 +00:00
Robin Morisset	9098fee690	[Power] Use lwsync for non-seq_cst fences Summary: hwsync is only required for seq_cst fences, acquire and release one can use the cheaper lwsync. Test Plan: Added some cases to atomics.ll + make check-all Reviewers: jfb, wschmidt Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5317 llvm-svn: 218995	2014-10-03 18:04:36 +00:00
Hans Wennborg	6a654333c5	MipsAsmParser.cpp: fix VS2012 build llvm-svn: 218991	2014-10-03 17:16:24 +00:00
Hans Wennborg	da47cf46de	HexagonMCCodeEmitter.h: deleted member functions are not supported in VS2012 llvm-svn: 218990	2014-10-03 17:02:28 +00:00
Daniel Sanders	ef638fea2d	[mips] Print warning when using register names not available in N32/64 Summary: The register names t4-t7 are not available in the N32 and N64 ABIs. This patch prints a warning, when those names are used in N32/64, along with a fix-it with the correct register names. Patch by Vasileios Kalintiris Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5272 llvm-svn: 218989	2014-10-03 15:37:37 +00:00
Sid Manning	40d809399f	Fix build break on Hexagon Differential Revision: http://reviews.llvm.org/D5600 llvm-svn: 218987	2014-10-03 13:59:01 +00:00
Sid Manning	7da3f9acba	Adding skeleton for unit testing Hexagon Code Emission Adding and modifying CMakeLists.txt files to run unit tests under unittests/Target/* if the directory exists. Adding basic unit test to check that code emitter object can be retrieved. Differential Revision: http://reviews.llvm.org/D5523 Change by: Colin LeMahieu llvm-svn: 218986	2014-10-03 13:18:11 +00:00
Chandler Carruth	1964078936	[x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single element vector inserts. This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are dramatically faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends aren't as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit. So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch. I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress. llvm-svn: 218985	2014-10-03 13:11:13 +00:00
Renato Golin	4e31ae1051	Revert 202433 - Provide a target override for the latest regalloc heuristic That commit was introduced in order to help investigate a problem in ARM codegen breaking from commit 202304 (Add a limit to the heuristic that register allocates instructions in local order). Recent analysis indicated that the problem no longer exists, so I'm reverting this change. See PR18996. llvm-svn: 218981	2014-10-03 12:20:53 +00:00
Chandler Carruth	4bf341de3c	[x86] Refactor the element insertion logic in the new vector shuffle lowering to handle the potential mirroring of 2-element vectors (because we can't reliably sort them one way) in the caller rather than in the insertion logic. This will simplify things considerably as more ways to fail to match the insertion are added because now we have a nice try and retry point. llvm-svn: 218980	2014-10-03 12:01:55 +00:00
Chandler Carruth	971a560cb8	[x86] Significantly improve the ability of the new vector shuffle lowering to match VZEXT_MOVL patterns. I hadn't realized that these had sufficient pattern smarts in the backend to lower zext-ing from the low element of a vector without it being a scalar_to_vector node. They do, and this is how to match a bunch of patterns for movq, movss, etc. There is a weird propensity to end up using pshufd to place the element afterward even though it means domain crossing (or rather, to use xorps+movss to zext the element rather than movq) but that's an orthogonal problem with VZEXT_MOVL that someone should probably look at. llvm-svn: 218977	2014-10-03 11:25:58 +00:00
Chandler Carruth	e91b316266	[x86] Unbreak SSE1 with the new vector shuffle lowering. We can't widen element types to form illegal vector types. I've added a special SSE1 test case here that makes sure we don't break this going forward. llvm-svn: 218974	2014-10-03 10:11:39 +00:00
James Molloy	cb7449d058	Revert r215343. This was contentious and needs investigation. llvm-svn: 218971	2014-10-03 09:29:24 +00:00
Lang Hames	89e9c17235	[BasicAA] Revert r218714 - Make better use of zext and sign information. This patch broke 447.dealII on Darwin. I'm currently working on a reduced test-case, but reverting for now to keep the bots happy. <rdar://problem/18530107> llvm-svn: 218944	2014-10-03 01:33:47 +00:00
Eric Christopher	f12e1ab313	constify TargetMachine parameter. llvm-svn: 218934	2014-10-03 00:42:41 +00:00
Rui Ueyama	15d993591c	llvm-readobj: print COFF delay-load import table This patch adds another iterator to access the delay-load import table and use it from llvm-readobj. http://reviews.llvm.org/D5594 llvm-svn: 218933	2014-10-03 00:41:58 +00:00
Eric Christopher	5312afe7e1	constify TargetMachine argument. llvm-svn: 218930	2014-10-03 00:17:59 +00:00
Eric Christopher	a94e592e49	We can grab the options struct from the TargetMachine, no need to pass it down in the constructor. llvm-svn: 218929	2014-10-03 00:10:03 +00:00
Adam Nemet	4dca3ce4b0	[AVX512] Pull pattern for subvector insert into the instruction definition No functional change intended. Very similar to the change I made for subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll covers this. llvm-svn: 218928	2014-10-02 23:18:30 +00:00
Adam Nemet	4e2ef472d2	[AVX512] Refactor subvector inserts No functional change. Very similar to the extract refactoring I did in r218478. Compared X86.td.expanded before and after. llvm-svn: 218927	2014-10-02 23:18:28 +00:00
Adam Nemet	dc87aea176	[AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm Just like in the case of extracts, the refactoring is uncovering some typos in the code. llvm-svn: 218926	2014-10-02 23:18:26 +00:00
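
Benjamin Kramer's 4b92c6b8e5 and 2e52f02864 make `operator bool` (and, for AAMDNodes, the constructor) explicit. As a minimal sketch of what that buys, assuming a hypothetical `NodeBundle` type that only loosely resembles a bundle of optional metadata pointers (it is not LLVM's actual `AAMDNodes` definition), the compile-time checks below show that an explicit conversion still works in `if` conditions but no longer leaks into implicit `bool` or `int` conversions:

```cpp
#include <type_traits>

// Hypothetical node type; stands in for a metadata node pointer target.
struct Node {};

// Hypothetical bundle of optional pointers (not LLVM's AAMDNodes).
struct NodeBundle {
  const Node *A = nullptr;
  const Node *B = nullptr;

  NodeBundle() = default;
  // Explicit: a stray pointer cannot silently become a bundle.
  explicit NodeBundle(const Node *First) : A(First) {}

  // Explicit: usable in if/while/!/&&, never as an implicit bool or int.
  explicit operator bool() const { return A != nullptr || B != nullptr; }
};

// Implicit conversions are rejected at compile time...
static_assert(!std::is_convertible<NodeBundle, bool>::value,
              "no implicit conversion to bool");
static_assert(!std::is_convertible<NodeBundle, int>::value,
              "no implicit conversion to int");
static_assert(!std::is_convertible<const Node *, NodeBundle>::value,
              "constructor is explicit");
// ...while explicit construction of a bool is still allowed.
static_assert(std::is_constructible<bool, NodeBundle>::value,
              "explicit conversion to bool still works");

// Contextual conversion in a condition is permitted for explicit operator bool.
bool hasAny(const NodeBundle &N) {
  if (N)
    return true;
  return false;
}
```

The design trade-off is small: callers occasionally need `static_cast<bool>(X)` where they previously wrote `X`, in exchange for the compiler rejecting accidental arithmetic or overload resolution through `bool`.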
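
Hal Finkel's r219067 (04a156139e) removes an `@llvm.assume` when a dominating `@llvm.assume` uses the same condition. The sketch below illustrates only that rule, under a loudly simplifying assumption: a hypothetical toy IR consisting of a single basic block, where every earlier instruction dominates every later one, so dominance reduces to program order. The real InstCombine code works on LLVM IR and general control flow, and r219070 later reframes the check through ValueTracking; neither is modeled here.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy instruction: either an assume of a named i1 condition,
// or some other instruction we simply carry through.
struct Inst {
  bool IsAssume;
  std::string Cond;  // name of the i1 condition (only meaningful for assumes)
};

// Single-block assumption: an earlier instruction dominates all later ones,
// so an assume is redundant if the same condition was already assumed above.
std::vector<Inst> dropRedundantAssumes(const std::vector<Inst> &Block) {
  std::unordered_set<std::string> Assumed;
  std::vector<Inst> Out;
  for (const Inst &I : Block) {
    if (I.IsAssume && !Assumed.insert(I.Cond).second)
      continue;  // a dominating assume already uses this exact condition
    Out.push_back(I);
  }
  return Out;
}
```

Because only an identical condition is matched, an `assume(x > 3)` dominated by `assume(x > 5)` survives here; that is exactly the more general redundancy r219070 hands to ValueTracking.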
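
Chris Bieneman's 489d1dce3f converts the ErrorHandlerMutex to a `ManagedStatic` to avoid a static constructor and destructor. LLVM's `ManagedStatic<T>` constructs its object lazily on first access and destroys it in `llvm_shutdown()`. As a swapped-in, standard-C++-only sketch of the same goal (not LLVM's implementation), a deliberately leaked function-local static avoids both a global constructor and an exit-time destructor; the names `errorHandlerMutex` and `reportFatal` are hypothetical:

```cpp
#include <mutex>

// Hypothetical accessor. The pointer is initialized on the first call
// (thread-safe since C++11), so nothing runs at program start-up, and the
// mutex is intentionally never deleted, so nothing runs at exit either.
std::mutex &errorHandlerMutex() {
  static std::mutex *M = new std::mutex;
  return *M;
}

// Hypothetical caller: serialize access to a shared error-handler slot.
void reportFatal() {
  std::lock_guard<std::mutex> Lock(errorHandlerMutex());
  // ... invoke the installed error handler while holding the lock ...
}
```

Unlike this leaky version, `ManagedStatic` does tear its object down, but only when `llvm_shutdown()` is called explicitly rather than at an unspecified point during static destruction.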
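
Duncan P. N. Exon Smith's 176b691d32 folds constant debug-info arguments into a single MDString, stringifying integers and separating fields with a `\0` character. A minimal sketch of that encoding with plain `std::string`, assuming hypothetical helper names `foldFields`, `field`, and `splitFields` (these are not LLVM APIs) and assuming no field itself contains `\0`:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: stringify an integer field.
std::string field(std::int64_t V) { return std::to_string(V); }

// Hypothetical helper: join all fields into one string, '\0'-separated.
std::string foldFields(const std::vector<std::string> &Fields) {
  std::string Out;
  for (std::size_t I = 0; I < Fields.size(); ++I) {
    if (I != 0)
      Out.push_back('\0');
    Out += Fields[I];
  }
  return Out;
}

// Hypothetical helper: recover the fields by splitting on '\0'.
std::vector<std::string> splitFields(const std::string &Folded) {
  std::vector<std::string> Out;
  std::string::size_type Start = 0;
  while (true) {
    std::string::size_type End = Folded.find('\0', Start);
    if (End == std::string::npos) {
      Out.push_back(Folded.substr(Start));
      break;
    }
    Out.push_back(Folded.substr(Start, End - Start));
    Start = End + 1;
  }
  return Out;
}
```

Note that an empty field list and a single empty field both fold to an empty string, so a real encoding needs the field count to be known from context to round-trip unambiguously.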
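
Robin Morisset's 9098fee690 lowers acquire and release fences on Power to the cheaper `lwsync`, reserving `hwsync` for `seq_cst` fences. The C++ message-passing pattern below uses exactly those two fence kinds. The `lwsync` comments reflect the commit message only; the instruction actually emitted depends on the compiler and its version. The names `Payload`, `Ready`, `producer`, and `consumer` are illustrative:

```cpp
#include <atomic>

int Payload = 0;               // plain, non-atomic data being published
std::atomic<bool> Ready{false};

// Writer: the release fence orders the Payload write before the flag store.
void producer() {
  Payload = 42;
  std::atomic_thread_fence(std::memory_order_release);  // lwsync on Power per 9098fee690
  Ready.store(true, std::memory_order_relaxed);
}

// Reader: once the flag is observed, the acquire fence makes Payload visible.
int consumer() {
  while (!Ready.load(std::memory_order_relaxed)) {
  }
  std::atomic_thread_fence(std::memory_order_acquire);  // lwsync on Power per 9098fee690
  return Payload;
}
```

In real use `producer` and `consumer` run on different threads; the release/acquire fence pair is enough for this pattern, and only code that needs a single total order across all `seq_cst` operations pays for the heavier `hwsync`.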


73203 Commits