llvm-project

Commit Graph

Author	SHA1	Message	Date
Rafael Espindola	9075f77064	Use short names for jumptable sections. Also refactor code to remove some duplication. llvm-svn: 230087	2015-02-20 23:28:28 +00:00
Matt Arsenault	20711b7bae	R600/SI: Remove v_sub_f64 pseudo The expansion code does the same thing. Since the operands were not defined with the correct types, this has the side effect of fixing operand folding since the expanded pseudo would never use SGPRs or inline immediates. llvm-svn: 230072	2015-02-20 22:10:45 +00:00
Matt Arsenault	8d6300346f	R600: Use new fmad node. This enables a few useful combines that used to only use fma. Also since v_mad_f32 apparently does not support denormals, disable the existing cases that are custom handled if they are requested. llvm-svn: 230071	2015-02-20 22:10:41 +00:00
Jozef Kolek	0365675522	Reversed revision 229706. The reason is regression, which is caused by the usage of instruction ADDU16 by CodeGen. For this instruction an improper register is allocated, i.e. the register that is not from register set defined for the instruction. llvm-svn: 230053	2015-02-20 20:26:52 +00:00
Andrea Di Biagio	7035178aeb	[X86][FastIsel] Teach how to select float-half conversion intrinsics. This patch teaches X86FastISel how to select intrinsic 'convert_from_fp16' and intrinsic 'convert_to_fp16'. If the target has F16C, we can select VCVTPS2PHrr for a float-half conversion, and VCVTPH2PSrr for a half-float conversion. Differential Revision: http://reviews.llvm.org/D7673 llvm-svn: 230043	2015-02-20 19:37:14 +00:00
Kit Barton	263edb99ab	I incorrectly marked the VORC instruction as isCommutable when I added it. This fix removes the VORC instruction definition from the isCommutable block. Phabricator review: http://reviews.llvm.org/D7772 llvm-svn: 230020	2015-02-20 15:54:58 +00:00
Hal Finkel	e5aaf3f2cd	[PowerPC] Loop Data Prefetching for the BG/Q The IBM BG/Q supercomputer's A2 cores have a hardware prefetching unit, the L1P, but it does not prefetch directly into the A2's L1 cache. Instead, it prefetches into its own L1P buffer, and the latency to access that buffer is significantly higher than that to the L1 cache (although smaller than the latency to the L2 cache). As a result, especially when multiple hardware threads are not actively busy, explicitly prefetching data into the L1 cache is advantageous. I've been using this pass out-of-tree for data prefetching on the BG/Q for well over a year, and it has worked quite well. It is enabled by default only for the BG/Q, but can be enabled for other cores as well via a command-line option. Eventually, we might want to add some TTI interfaces and move this into Transforms/Scalar (there is nothing particularly target dependent about it, although only machines like the BG/Q will benefit from its simplistic strategy). llvm-svn: 229966	2015-02-20 05:08:21 +00:00
Chandler Carruth	4041f2217b	[x86] Remove the old vector shuffle lowering code and its flag. The new shuffle lowering has been the default for some time. I've enabled the new legality testing by default with no really blocking regressions. I've fuzz tested this very heavily (many millions of fuzz test cases have passed at this point). And this cleans up a ton of code. =] Thanks again to the many folks that helped with this transition. There was a lot of work by others that went into the new shuffle lowering to make it really excellent. In case you aren't using a diff algorithm that can handle this: X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-) llvm-svn: 229964	2015-02-20 04:25:04 +00:00
Chandler Carruth	eb206aa1ea	[x86] Now that the new vector shuffle legality is enabled and everything is going well, remove the flag and the code for the old legality tests. This is the first step toward removing the entire old vector shuffle lowering. Much more code to delete coming up next. llvm-svn: 229963	2015-02-20 03:59:35 +00:00
Chandler Carruth	d2b14b296c	[x86] Make the new vector shuffle legality test on by default, which reflects the fact that the x86 backend can in fact lower any shuffle you want it to with reasonably high code quality. My recent work on the new vector shuffle has made this regress very little. The diff in the test cases makes me very, very happy. llvm-svn: 229958	2015-02-20 03:05:47 +00:00
Chandler Carruth	6677809820	[x86] Clean up a couple of test cases with the new update script. Split one test case that is only partially tested in 32-bits into two test cases so that the script doesn't generate massive spews of tests for the cases we don't care about. llvm-svn: 229955	2015-02-20 02:44:13 +00:00
Chandler Carruth	301ed0c3b4	Revert r229944: EH: Prune unreachable resume instructions during Dwarf EH preparation This doesn't pass 'ninja check-llvm' for me. Lots of tests, including the ones updated, fail with crashes and other explosions. llvm-svn: 229952	2015-02-20 02:15:36 +00:00
Reid Kleckner	0b647e6cca	EH: Prune unreachable resume instructions during Dwarf EH preparation Today a simple function that only catches exceptions and doesn't run destructor cleanups ends up containing a dead call to _Unwind_Resume (PR20300). We can't remove these dead resume instructions during normal optimization because inlining might introduce additional landingpads that do have cleanups to run. Instead we can do this during EH preparation, which is guaranteed to run after inlining. Fixes PR20300. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D7744 llvm-svn: 229944	2015-02-20 01:00:19 +00:00
Eric Christopher	0d94fa98e5	Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics." The instructions were being generated on architectures that don't support avx512. This reverts commit r229837. llvm-svn: 229942	2015-02-20 00:45:28 +00:00
Ahmed Bougacha	db141ac37d	[ARM] Re-re-apply VLD1/VST1 base-update combine. This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting. The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?). To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). rdar://19717869, rdar://14062261. llvm-svn: 229932	2015-02-19 23:52:41 +00:00
Sanjay Patel	f34a29a845	add X86 load folding tests for unary math ops X86 load folding is fragile; eg, the tests here don't work without AVX even though they should. This is because we have a mix of tablegen patterns that have been added over time, and we have a load folding table used by the peephole optimizer that has to be kept in sync with the ever-changing ISA and tablegen defs. llvm-svn: 229870	2015-02-19 16:59:11 +00:00
Chandler Carruth	5d1a84b7b8	[x86] Delete still more piles of complex code now that we have a good systematic lowering of v8i16. This required a slight strategy shift to prefer unpack lowerings in more places. While this isn't a cut-and-dry win in every case, it is in the overwhelming majority. There are only a few places where the old lowering would probably be a touch faster, and then only by a small margin. In some cases, this is yet another significant improvement. llvm-svn: 229859	2015-02-19 15:21:57 +00:00
Chandler Carruth	0b39536390	[x86] Teach the unpack lowering how to lower with an initial unpack in addition to lowering to trees rooted in an unpack. This saves shuffles and or registers in many various ways, lets us handle another class of v4i32 shuffles pre SSE4.1 without domain crosses, etc. llvm-svn: 229856	2015-02-19 15:06:13 +00:00
Chandler Carruth	352eba1c29	[x86] Dramatically improve v8i16 shuffle lowering by not using its terribly complex partial blend logic. This code path was one of the more complex and bug prone when it first went in and it hasn't fared much better. Ultimately, with the simpler basis for unpack lowering and support bit-math blending, this is completely obsolete. In the worst case without this we generate different but equivalent instructions. However, in many cases we generate much better code. This is especially true when blends or pshufb is available. This does expose one (minor) weakness of the unpack lowering that I'll try to address. In case you were wondering, this is actually a big part of what I've been trying to pull off in the recent string of commits. llvm-svn: 229853	2015-02-19 14:08:24 +00:00
Chandler Carruth	2c0390ca4b	[x86] Remove the final fallback in the v8i16 lowering that isn't really needed, and significantly improve the SSSE3 path. This makes the new strategy much more clear. If we can blend, we just go with that. If we can't blend, we try to permute into an unpack so that we handle cases where the unpack doing the blend also simplifies the shuffle. If that fails and we've got SSSE3, we now call into factored-out pshufb lowering code so that we leverage the fact that pshufb can set up a blend for us while shuffling. This generates great code, especially because we know we don't have a fast blend at this point. Finally, we fall back on decomposing into permutes and blends because we do at least have a bit-math-based blend if we need to use that. This pretty significantly improves some of the v8i16 code paths. We never need to form pshufb for the single-input shuffles because we have effective target-specific combines to form it there, but we were missing its effectiveness in the blends. llvm-svn: 229851	2015-02-19 13:56:49 +00:00
Chandler Carruth	f0f0d27391	[x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing them into permutes and a blend with the generic decomposition logic. This works really well in almost every case and lets the code only manage the expansion of a single input into two v8i16 vectors to perform the actual shuffle. The blend-based merging is often much nicer than the pack based merging that this replaces. The only place where it isn't we end up blending between two packs when we could do a single pack. To handle that case, just teach the v2i64 lowering to handle these blends by digging out the operands. With this we're down to only really random permutations that cause an explosion of instructions. llvm-svn: 229849	2015-02-19 13:15:12 +00:00
Chandler Carruth	8817e5e01b	[x86] Remove the insanely over-aggressive unpack lowering strategy for v16i8 shuffles, and replace it with new facilities. This uses precise patterns to match exact unpacks, and the new generalized unpack lowering only when we detect a case where we will have to shuffle both inputs anyways and they terminate in exactly a blend. This fixes all of the blend horrors that I uncovered by always lowering blends through the vector shuffle lowering. It also removes sooooo much of the crazy instruction sequences required for v16i8 lowering previously. Much cleaner now. The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when it would be marginally nicer to use pshufb+pshufb+por. However, the difference there is tiny. In many cases it's a win because we re-use the pshufb mask. In others, we get to avoid the pshufb entirely. I've left a FIXME, but I'm dubious we can really do better than this. I'm actually pretty happy with this lowering now. For SSE2 this exposes some horrors that were really already there. Those will have to be fixed by changing a different path through the v16i8 lowering. llvm-svn: 229846	2015-02-19 12:10:37 +00:00
Jozef Kolek	5d171fc291	[mips][microMIPS] Make usage of AND16, OR16 and XOR16 by code generator Differential Revision: http://reviews.llvm.org/D7611 llvm-svn: 229845	2015-02-19 11:51:32 +00:00
Elena Demikhovsky	69e8b45b13	AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. llvm-svn: 229837	2015-02-19 10:48:04 +00:00
Chandler Carruth	bcb6c5f62d	[x86] Add support for bit-wise blending and use it in the v8 and v16 lowering paths. I'm going to be leveraging this to simplify a lot of the overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes. Sadly, this isn't profitable on v4i32 and v2i64. There, the float and double blending instructions for pre-SSE4.1 are actually pretty good, and we can't beat them with bit math. And once SSE4.1 comes around we have direct blending support and this ceases to be relevant. Also, some of the test cases look odd because the domain fixer canonicalizes these to floating point domain. That's OK, it'll use the integer domain when it matters and some day I may be able to update enough of LLVM to canonicalize the other way. This restores almost all of the regressions from teaching x86's vselect lowering to always use vector shuffle lowering for blends. The remaining problems are because the v16 lowering path is still doing crazy things. I'll be re-arranging that strategy in more detail in subsequent commits to finish recovering the performance here. llvm-svn: 229836	2015-02-19 10:46:52 +00:00
Chandler Carruth	b89464a9b6	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835	2015-02-19 10:36:19 +00:00
Peter Collingbourne	f4498a4fd3	llvm-mc: Use Target::createNullStreamer to fix crashes on target-specific asm directives. llvm-svn: 229798	2015-02-19 00:45:04 +00:00
Chandler Carruth	c8e6877065	[x86] Merge checks for a recently added test case that is the same on all SSE variants and AVX variants. llvm-svn: 229770	2015-02-18 23:20:49 +00:00
Reid Kleckner	7bb0738d82	Add an IR-to-IR test for dwarf EH preparation using opt This tests the simple resume instruction elimination logic that we have before making some changes to it. llvm-svn: 229768	2015-02-18 23:17:41 +00:00
Reid Kleckner	4dd0304e34	dos2unix the WinEH file and tests llvm-svn: 229735	2015-02-18 19:52:46 +00:00
Andrew Kaylor	527c5dc68d	Adding implementation to outline C++ catch handlers for native Windows 64 exception handling. Differential Revision: http://reviews.llvm.org/D7363 llvm-svn: 229715	2015-02-18 18:31:51 +00:00
Jozef Kolek	3c6724f442	[mips][microMIPS] Make usage of ADDU16 and SUBU16 by code generator Differential Revision: http://reviews.llvm.org/D7609 llvm-svn: 229706	2015-02-18 17:33:56 +00:00
Daniel Sanders	1779314e3c	[mips] Add backend support for Mips32r[35] and Mips64r[35]. Summary: These ISAs didn't add any instructions so they are almost identical to Mips32r2 and Mips64r2. Even the ELF e_flags are the same. However, the ISA revision in .MIPS.abiflags is 3 or 5 respectively instead of 2. Reviewers: vmedic Reviewed By: vmedic Subscribers: tomatabacu, llvm-commits, atanasyan Differential Revision: http://reviews.llvm.org/D7381 llvm-svn: 229695	2015-02-18 16:24:50 +00:00
Kit Barton	298beb5e86	This patch adds the VSX logical instructions introduced in the Power ISA 2.07. It also removes the added complexity that favors VMX versions of the three instructions. Phabricator review: http://reviews.llvm.org/D7616 Committing on Nemanja's behalf. llvm-svn: 229694	2015-02-18 16:21:46 +00:00
Vasileios Kalintiris	611cb70b83	[mips] Avoid redundant sign extension of the result of binary bitwise instructions. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7581 llvm-svn: 229675	2015-02-18 14:57:05 +00:00
Bradley Smith	26c9922a59	[ARM] Add missing M/R class CPUs Add some of the missing M and R class Cortex CPUs, namely: Cortex-M0+ (called Cortex-M0plus for GCC compatibility) Cortex-M1 SC000 SC300 Cortex-R5 llvm-svn: 229660	2015-02-18 10:33:30 +00:00
Michael Kuperstein	af9befa6b7	Fixes two issues in SimplifyDemandedBits of sext_in_reg: 1) We should not try to simplify if the sext has multiple uses 2) There is no need to simplify if the source value is already sign-extended. Patch by Gil Rapaport <gil.rapaport@intel.com> Differential Revision: http://reviews.llvm.org/D6949 llvm-svn: 229659	2015-02-18 09:43:40 +00:00
Chandler Carruth	48cc6c623a	[x86] Refactor the bit shift code the same as I just did the byte shift code. While this didn't have the miscompile (it used MatchLeft consistently) it missed some cases where it could use right shifts. I've added a test case Craig Topper came up with to exercise the right shift matching. This code is really identical between the two. I'm going to merge them next so that we don't keep two copies of all of this logic. llvm-svn: 229655	2015-02-18 09:19:58 +00:00
Ulrich Weigand	7db6918e2b	[SystemZ] Support all TLS access models - CodeGen part The current SystemZ back-end only supports the local-exec TLS access model. This patch adds all required CodeGen support for the other TLS models, which means in particular: - Expand initial-exec TLS accesses by loading TLS offsets from the GOT using @indntpoff relocations. - Expand general-dynamic and local-dynamic accesses by generating the appropriate calls to __tls_get_offset. Note that this routine has a non-standard ABI and requires loading the GOT pointer into %r12, so the patch also adds support for the GLOBAL_OFFSET_TABLE ISD node. - Add a new platform-specific optimization pass to remove redundant __tls_get_offset calls in the local-dynamic model (modeled after the corresponding X86 pass). - Add test cases verifying all access models and optimizations. llvm-svn: 229654	2015-02-18 09:13:27 +00:00
Daniel Jasper	4d7b04384e	Remove experimental options to control machine block placement. This reverts r226034. Benchmarking with those flags has not revealed anything interesting. llvm-svn: 229648	2015-02-18 08:18:07 +00:00
Elena Demikhovsky	714f23bcdb	AVX-512: Added support for FP instructions with embedded rounding mode. By Asaf Badouh <asaf.badouh@intel.com> llvm-svn: 229645	2015-02-18 07:59:20 +00:00
Craig Topper	55ac42426e	[X86] Add another test case for the bug fixed in r229642. With the bug a vpsrldq was emitted instead of pslldq. llvm-svn: 229643	2015-02-18 07:45:43 +00:00
Chandler Carruth	55553f5299	[x86] Rewrite the byte shift detection to not use boolean variables to track state. I didn't like this in the code review because the pattern tends to be error prone, but I didn't see a clear way to rewrite it. Turns out that there were bugs here, I found them when fuzz testing our shuffle lowering for correctness on x86. The core of the problem is that we need to consistently test all our preconditions for the same directionality of shift and the same input vector. Instead, formulate this as two predicates (one doesn't depend on the input in any way), pass things like the directionality and input vector as inputs, and loop over the alternatives. This fixes a pattern of very rare miscompiles coming out of this code. Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz testing. The new code is over half a million test runs with no failures yet. I've also fuzzed every other function in the lowering code with over 3.5 million test cases and not discovered any other miscompiles. llvm-svn: 229642	2015-02-18 07:13:48 +00:00
Craig Topper	b324e43aed	[X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles. llvm-svn: 229640	2015-02-18 06:24:44 +00:00
Matt Arsenault	caa1288fff	R600/SI: Add missing offset operand to buffer bothen llvm-svn: 229605	2015-02-18 02:04:38 +00:00
Matt Arsenault	2ad8bab7ee	R600/SI: Add missing soffset operand to global atomics llvm-svn: 229604	2015-02-18 02:04:35 +00:00
Andrea Di Biagio	e7b58ee555	[X86][FastIsel] Teach how to select scalar integer to float/double conversions. This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to float conversion, and how to select a (V)CVTSI2SDrr for an integer to double conversion. Added test 'fast-isel-int-float-conversion.ll'. Differential Revision: http://reviews.llvm.org/D7698 llvm-svn: 229589	2015-02-17 23:40:58 +00:00
Rafael Espindola	df19519800	Add r228939 back with a fix. The problem in the original patch was not switching back to .text after printing an eh table. Original message: On ELF, put PIC jump tables in a non executable section. Fixes PR22558. llvm-svn: 229586	2015-02-17 23:34:51 +00:00
Rafael Espindola	8c77768609	Add a test showing the problem in r228939. If an EH table is printed in between the function and the jump table we would fail to switch back to the text section to print the jump table. llvm-svn: 229580	2015-02-17 23:21:46 +00:00
Simon Pilgrim	1d89a02abb	[X86][SSE] Generalised unpckl/unpckh shuffle matching Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves. Differential Revision: http://reviews.llvm.org/D7564 llvm-svn: 229571	2015-02-17 22:24:32 +00:00

12009 Commits