llvm-project

Commit Graph

Author	SHA1	Message	Date
Chandler Carruth	4041f2217b	[x86] Remove the old vector shuffle lowering code and its flag. The new shuffle lowering has been the default for some time. I've enabled the new legality testing by default with no really blocking regressions. I've fuzz tested this very heavily (many millions of fuzz test cases have passed at this point). And this cleans up a ton of code. =] Thanks again to the many folks that helped with this transition. There was a lot of work by others that went into the new shuffle lowering to make it really excellent. In case you aren't using a diff algorithm that can handle this: X86ISelLowering.cpp: 22 insertions(+), 2940 deletions(-) llvm-svn: 229964	2015-02-20 04:25:04 +00:00
Chandler Carruth	eb206aa1ea	[x86] Now that the new vector shuffle legality is enabled and everything is going well, remove the flag and the code for the old legality tests. This is the first step toward removing the entire old vector shuffle lowering. Much more code to delete coming up next. llvm-svn: 229963	2015-02-20 03:59:35 +00:00
Chandler Carruth	d2b14b296c	[x86] Make the new vector shuffle legality test on by default, which reflects the fact that the x86 backend can in fact lower any shuffle you want it to with reasonably high code quality. My recent work on the new vector shuffle has made this regress very little. The diff in the test cases makes me very, very happy. llvm-svn: 229958	2015-02-20 03:05:47 +00:00
Eric Christopher	0d94fa98e5	Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics." The instructions were being generated on architectures that don't support avx512. This reverts commit r229837. llvm-svn: 229942	2015-02-20 00:45:28 +00:00
Eric Christopher	06b32cdfed	Add a license header to the AVX512 file. llvm-svn: 229941	2015-02-20 00:36:53 +00:00
Ahmed Bougacha	db141ac37d	[ARM] Re-re-apply VLD1/VST1 base-update combine. This re-applies r223862, r224198, r224203, and r224754, which were reverted in r228129 because they exposed Clang misalignment problems when self-hosting. The combine caused the crashes because we turned ISD::LOAD/STORE nodes to ARMISD::VLD1/VST1_UPD nodes. When selecting addressing modes, we were very lax for the former, and only emitted the alignment operand (as in "[r1:128]") when it was larger than the standard alignment of the memory type. However, for ARMISD nodes, we just used the MMO alignment, no matter what. In our case, we turned ISD nodes to ARMISD nodes, and this caused the alignment operands to start being emitted. And that's how we exposed alignment problems that were ignored before (but I believe would have been caught with SCTRL.A==1?). To fix this, we can just mirror the hack done for ISD nodes: only take into account the MMO alignment when the access is overaligned. Original commit message: We used to only combine intrinsics, and turn them into VLD1_UPD/VST1_UPD when the base pointer is incremented after the load/store. We can do the same thing for generic load/stores. Note that we can only combine the first load/store+adds pair in a sequence (as might be generated for a v16f32 load for instance), because other combines turn the base pointer addition chain (each computing the address of the next load, from the address of the last load) into independent additions (common base pointer + this load's offset). rdar://19717869, rdar://14062261. llvm-svn: 229932	2015-02-19 23:52:41 +00:00
Ahmed Bougacha	dfdf54bed0	[ARM] Minor cleanup to CombineBaseUpdate. NFC. In preparation for a future patch: - rename isLoad to isLoadOp: the former is confusing, and can be taken to refer to the fact that the node is an ISD::LOAD. (it isn't, yet.) - change formatting here and there. - add some comments. - const-ify bools. llvm-svn: 229929	2015-02-19 23:30:37 +00:00
Ahmed Bougacha	4c2b0781a5	[CodeGen] Use ArrayRef instead of std::vector&. NFC. The former lets us use SmallVectors. Do so in ARM and AArch64. llvm-svn: 229925	2015-02-19 23:13:10 +00:00
Colin LeMahieu	1174fea31c	[Hexagon] Moving remaining methods off of HexagonMCInst in to HexagonMCInstrInfo and eliminating HexagonMCInst class. llvm-svn: 229914	2015-02-19 21:10:50 +00:00
Eric Christopher	64d35be6d6	Remove unused argument from emitInlineAsmStart. llvm-svn: 229907	2015-02-19 19:52:25 +00:00
Colin LeMahieu	745c4710db	[Hexagon] Moving more functions off of HexagonMCInst and in to HexagonMCInstrInfo. llvm-svn: 229903	2015-02-19 19:49:27 +00:00
Colin LeMahieu	af304e5192	[Hexagon] Creating HexagonMCInstrInfo namespace as landing zone for static functions detached from HexagonMCInst. llvm-svn: 229885	2015-02-19 19:00:00 +00:00
Colin LeMahieu	f08a3ccf50	[Hexagon] Removing static variable holding MCInstrInfo. llvm-svn: 229872	2015-02-19 17:38:39 +00:00
Benjamin Kramer	ea68a944a1	Demote vectors to arrays. No functionality change. llvm-svn: 229861	2015-02-19 15:26:17 +00:00
Chandler Carruth	5d1a84b7b8	[x86] Delete still more piles of complex code now that we have a good systematic lowering of v8i16. This required a slight strategy shift to prefer unpack lowerings in more places. While this isn't a cut-and-dry win in every case, it is in the overwhelming majority. There are only a few places where the old lowering would probably be a touch faster, and then only by a small margin. In some cases, this is yet another significant improvement. llvm-svn: 229859	2015-02-19 15:21:57 +00:00
Chandler Carruth	0b39536390	[x86] Teach the unpack lowering how to lower with an initial unpack in addition to lowering to trees rooted in an unpack. This saves shuffles and or registers in many various ways, lets us handle another class of v4i32 shuffles pre SSE4.1 without domain crosses, etc. llvm-svn: 229856	2015-02-19 15:06:13 +00:00
Chandler Carruth	352eba1c29	[x86] Dramatically improve v8i16 shuffle lowering by not using its terribly complex partial blend logic. This code path was one of the more complex and bug prone when it first went in and it hasn't fared much better. Ultimately, with the simpler basis for unpack lowering and support bit-math blending, this is completely obsolete. In the worst case without this we generate different but equivalent instructions. However, in many cases we generate much better code. This is especially true when blends or pshufb is available. This does expose one (minor) weakness of the unpack lowering that I'll try to address. In case you were wondering, this is actually a big part of what I've been trying to pull off in the recent string of commits. llvm-svn: 229853	2015-02-19 14:08:24 +00:00
Chandler Carruth	2c0390ca4b	[x86] Remove the final fallback in the v8i16 lowering that isn't really needed, and significantly improve the SSSE3 path. This makes the new strategy much more clear. If we can blend, we just go with that. If we can't blend, we try to permute into an unpack so that we handle cases where the unpack doing the blend also simplifies the shuffle. If that fails and we've got SSSE3, we now call into factored-out pshufb lowering code so that we leverage the fact that pshufb can set up a blend for us while shuffling. This generates great code, especially because we know we don't have a fast blend at this point. Finally, we fall back on decomposing into permutes and blends because we do at least have a bit-math-based blend if we need to use that. This pretty significantly improves some of the v8i16 code paths. We never need to form pshufb for the single-input shuffles because we have effective target-specific combines to form it there, but we were missing its effectiveness in the blends. llvm-svn: 229851	2015-02-19 13:56:49 +00:00
Chandler Carruth	f0f0d27391	[x86] Simplify the pre-SSSE3 v16i8 lowering significantly by decomposing them into permutes and a blend with the generic decomposition logic. This works really well in almost every case and lets the code only manage the expansion of a single input into two v8i16 vectors to perform the actual shuffle. The blend-based merging is often much nicer than the pack based merging that this replaces. The only place where it isn't we end up blending between two packs when we could do a single pack. To handle that case, just teach the v2i64 lowering to handle these blends by digging out the operands. With this we're down to only really random permutations that cause an explosion of instructions. llvm-svn: 229849	2015-02-19 13:15:12 +00:00
Chandler Carruth	8817e5e01b	[x86] Remove the insanely over-aggressive unpack lowering strategy for v16i8 shuffles, and replace it with new facilities. This uses precise patterns to match exact unpacks, and the new generalized unpack lowering only when we detect a case where we will have to shuffle both inputs anyways and they terminate in exactly a blend. This fixes all of the blend horrors that I uncovered by always lowering blends through the vector shuffle lowering. It also removes sooooo much of the crazy instruction sequences required for v16i8 lowering previously. Much cleaner now. The only "meh" aspect is that we sometimes use pshufb+pshufb+unpck when it would be marginally nicer to use pshufb+pshufb+por. However, the difference there is tiny. In many cases it's a win because we re-use the pshufb mask. In others, we get to avoid the pshufb entirely. I've left a FIXME, but I'm dubious we can really do better than this. I'm actually pretty happy with this lowering now. For SSE2 this exposes some horrors that were really already there. Those will have to be fixed by changing a different path through the v16i8 lowering. llvm-svn: 229846	2015-02-19 12:10:37 +00:00
Jozef Kolek	5d171fc291	[mips][microMIPS] Make usage of AND16, OR16 and XOR16 by code generator Differential Revision: http://reviews.llvm.org/D7611 llvm-svn: 229845	2015-02-19 11:51:32 +00:00
Chandler Carruth	38dea42ddf	[x86] The SELECT x86 DAG combine also does legalization. It used to rely on things not being marked as either custom or legal, but we now do custom lowering of more VSELECT nodes. To cope with this, manually replicate the legality tests here. These have to stay in sync with the set of tests used in the custom lowering of VSELECT. Ideally, we wouldn't do any of this combine-based-legalization when we have an actual custom legalization step for VSELECT, but I'm not going to be able to rewrite all of that today. I don't have a test case for this currently, but it was found when compiling a number of the test-suite benchmarks. I'll try to reduce a test case and add it. This should at least fix the test-suite fallout on build bots. llvm-svn: 229844	2015-02-19 11:43:37 +00:00
Michael Kuperstein	efd7a96d2e	Reverting r229831 due to multiple ARM/PPC/MIPS build-bot failures. llvm-svn: 229841	2015-02-19 11:38:11 +00:00
Elena Demikhovsky	69e8b45b13	AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics. llvm-svn: 229837	2015-02-19 10:48:04 +00:00
Chandler Carruth	bcb6c5f62d	[x86] Add support for bit-wise blending and use it in the v8 and v16 lowering paths. I'm going to be leveraging this to simplify a lot of the overly complex lowering of v8 and v16 shuffles in pre-SSSE3 modes. Sadly, this isn't profitable on v4i32 and v2i64. There, the float and double blending instructions for pre-SSE4.1 are actually pretty good, and we can't beat them with bit math. And once SSE4.1 comes around we have direct blending support and this ceases to be relevant. Also, some of the test cases look odd because the domain fixer canonicalizes these to floating point domain. That's OK, it'll use the integer domain when it matters and some day I may be able to update enough of LLVM to canonicalize the other way. This restores almost all of the regressions from teaching x86's vselect lowering to always use vector shuffle lowering for blends. The remaining problems are because the v16 lowering path is still doing crazy things. I'll be re-arranging that strategy in more detail in subsequent commits to finish recovering the performance here. llvm-svn: 229836	2015-02-19 10:46:52 +00:00
Chandler Carruth	b89464a9b6	[x86,sdag] Two interrelated changes to the x86 and sdag code. First, don't combine bit masking into vector shuffles (even ones the target can handle) once operation legalization has taken place. Custom legalization of vector shuffles may exist for these patterns (making the predicate return true) but that custom legalization may in some cases produce the exact bit math this matches. We only really want to handle this prior to operation legalization. However, the x86 backend, in a fit of awesome, relied on this. What it would do is mark VSELECTs as expand, which would turn them into arithmetic, which this would then match back into vector shuffles, which we would then lower properly. Amazing. Instead, the second change is to teach the x86 backend to directly form vector shuffles from VSELECT nodes with constant conditions, and to mark all of the vector types we support lowering blends as shuffles as custom VSELECT lowering. We still mark the forms which actually support variable blends as legal so that the custom lowering is bypassed, and the legal lowering can even be used by the vector shuffle legalization (yes, i know, this is confusing. but that's how the patterns are written). This makes the VSELECT lowering much more sensible, and in fact should fix a bunch of bugs with it. However, as you'll see in the test cases, right now what it does is point out the hilarious deficiency of the new vector shuffle lowering when it comes to blends. Fortunately, my very next patch fixes that. I can't submit it yet, because that patch, somewhat obviously, forms the exact and/or pattern that the DAG combine is matching here! Without this patch, teaching the vector shuffle lowering to produce the right code infloops in the DAG combiner. With this patch alone, we produce terrible code but at least lower through the right paths. With both patches, all the regressions here should be fixed, and a bunch of the improvements (like using 2 shufps with no memory loads instead of 2 andps with memory loads and an orps) will stay. Win! There is one other change worth noting here. We had hilariously wrong vectorization cost estimates for vselect because we fell through to the code path that assumed all "expand" vector operations are scalarized. However, the "expand" lowering of VSELECT is vector bit math, most definitely not scalarized. So now we go back to the correct if horribly naive cost of "1" for "not scalarized". If anyone wants to add actual modeling of shuffle costs, that would be cool, but this seems an improvement on its own. Note the removal of 16 and 32 "costs" for doing a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of course, we don't right now because of OMG bad code, but I'm going to fix that. Next patch. I promise. llvm-svn: 229835	2015-02-19 10:36:19 +00:00
Michael Kuperstein	ba5b04c798	Use std::bitset for SubtargetFeatures Previously, subtarget features were a bitfield with the underlying type being uint64_t. Since several targets (X86 and ARM, in particular) have hit or were very close to hitting this bound, switching the features to use a bitset. No functional change. Differential Revision: http://reviews.llvm.org/D7065 llvm-svn: 229831	2015-02-19 09:01:04 +00:00
Eric Christopher	d84f5d30e2	Remove the local subtarget variable from the SystemZ asm printer and update the two calls accordingly. llvm-svn: 229805	2015-02-19 01:26:28 +00:00
Eric Christopher	0795a2ef0c	Remove a few more calls to TargetMachine::getSubtarget from the R600 port. llvm-svn: 229804	2015-02-19 01:10:55 +00:00
Eric Christopher	7edca437f5	Grab the subtarget off of the machine function for the R600 asm printer and clean up a bunch of uses. llvm-svn: 229803	2015-02-19 01:10:53 +00:00
Eric Christopher	96caeda730	Remove the DisasmEnabled AsmPrinter variable and just look it up on the subtarget where it's set anyhow than looking it up 2-3 times in the same place. llvm-svn: 229802	2015-02-19 01:10:49 +00:00
Peter Collingbourne	fb8002cbe0	MC: Remove NullStreamer hook, as it is redundant with NullTargetStreamer. llvm-svn: 229799	2015-02-19 00:45:07 +00:00
Peter Collingbourne	20c7259ce9	Introduce Target::createNullTargetStreamer and use it from IRObjectFile. A null MCTargetStreamer allows IRObjectFile to ignore target-specific directives. Previously we were crashing. Differential Revision: http://reviews.llvm.org/D7711 llvm-svn: 229797	2015-02-19 00:45:02 +00:00
Eric Christopher	ca929f2469	Avoid using a self-referential initializer and fix up uses. llvm-svn: 229790	2015-02-19 00:22:47 +00:00
Eric Christopher	111de895a0	80-column fixups. llvm-svn: 229789	2015-02-19 00:15:33 +00:00
Eric Christopher	02389e3886	Remove all use of is64bit off of NVPTXSubtarget and clean up code accordingly. This changes the constructors of a number of classes that don't need to know the subtarget's 64-bitness. llvm-svn: 229787	2015-02-19 00:08:27 +00:00
Eric Christopher	beffc4e84f	Remove all use of getDrvInterface off of NVPTXSubtarget and clean up code accordingly. Delete code that was checking for all cases of an enum. llvm-svn: 229786	2015-02-19 00:08:23 +00:00
Eric Christopher	6aad8b1801	Migrate the NVPTX backend asm printer to a per function subtarget. This involved moving two non-subtarget dependent features (64-bitness and the driver interface) to the NVPTX target machine and updating the uses (or migrating around the subtarget use for ease of review). Otherwise use the cached subtarget or create a default subtarget based on the TargetMachine cpu and feature string for the module level assembler emission. llvm-svn: 229785	2015-02-19 00:08:14 +00:00
Marek Olsak	9b8f32eed1	R600/SI: Fix READLANE and WRITELANE lane select for VI VOP2 declares vsrc1, but VOP3 declares src1. We can't use the same "ins" if the operands have different names in VOP2 and VOP3 encodings. This fixes a hang in geometry shaders which spill M0 on VI. (BTW it doesn't look like M0 needs spilling and the spilling seems duplicated 3 times) llvm-svn: 229752	2015-02-18 22:12:45 +00:00
Marek Olsak	8eeebcccb5	R600/SI: Simplify verification of AMDGPU::OPERAND_REG_INLINE_C llvm-svn: 229751	2015-02-18 22:12:41 +00:00
Marek Olsak	b8c818337d	R600/SI: Remove explicit VOP operand checking This should be handled by the OperandType checking. llvm-svn: 229750	2015-02-18 22:12:37 +00:00
Jozef Kolek	3c6724f442	[mips][microMIPS] Make usage of ADDU16 and SUBU16 by code generator Differential Revision: http://reviews.llvm.org/D7609 llvm-svn: 229706	2015-02-18 17:33:56 +00:00
Jozef Kolek	1fd6548297	[mips][microMIPS] Implement JALX instruction Differential Revision: http://reviews.llvm.org/D5047 llvm-svn: 229702	2015-02-18 17:15:48 +00:00
Daniel Sanders	1779314e3c	[mips] Add backend support for Mips32r[35] and Mips64r[35]. Summary: These ISA's didn't add any instructions so they are almost identical to Mips32r2 and Mips64r2. Even the ELF e_flags are the same, However the ISA revision in .MIPS.abiflags is 3 or 5 respectively instead of 2. Reviewers: vmedic Reviewed By: vmedic Subscribers: tomatabacu, llvm-commits, atanasyan Differential Revision: http://reviews.llvm.org/D7381 llvm-svn: 229695	2015-02-18 16:24:50 +00:00
Kit Barton	298beb5e86	This patch adds the VSX logical instructions introduced in the Power ISA 2.07. It also removes the added complexity that favors VMX versions of the three instructions. Phabricator review: http://reviews.llvm.org/D7616 Committing on Nemanja's behalf. llvm-svn: 229694	2015-02-18 16:21:46 +00:00
Tom Stellard	1ca873bbc5	R600/SI: Don't set isCodeGenOnly = 1 on all instructions We only need to set this on pseudo instructions which won't be used by the assembler. llvm-svn: 229689	2015-02-18 16:08:17 +00:00
Tom Stellard	c34c37ae66	R600/SI: Add missing VOP1 instructions llvm-svn: 229688	2015-02-18 16:08:15 +00:00
Tom Stellard	894b9883f4	R600/SI: Add missing VOP2 instructions llvm-svn: 229687	2015-02-18 16:08:14 +00:00
Tom Stellard	0c0008cb6e	R600/SI: Add definition for S_CBRANCH_G_FORK llvm-svn: 229686	2015-02-18 16:08:13 +00:00
Tom Stellard	ce449ade7e	R600/SI: Add missing SOP1 instructions llvm-svn: 229685	2015-02-18 16:08:11 +00:00
Tom Stellard	ee21faa029	R600/SI: Refactor SOP2 definitions llvm-svn: 229684	2015-02-18 16:08:09 +00:00
Vasileios Kalintiris	611cb70b83	[mips] Avoid redundant sign extension of the result of binary bitwise instructions. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7581 llvm-svn: 229675	2015-02-18 14:57:05 +00:00
Benjamin Kramer	6ca8992018	X86: Use bitset to manage a bag of bits. NFC. Doesn't matter in terms of memory usage or perf here, but it's a neat simplification. llvm-svn: 229672	2015-02-18 14:10:44 +00:00
Toma Tabacu	8874eac5e6	[mips] [IAS] Fix using .cpsetup with local labels (PR22518). Summary: Parse for an MCExpr instead of an Identifier and use the symbol for relocations, not just the symbol's name. This fixes errors when using local labels in .cpsetup (PR22518). Reviewers: dsanders Reviewed By: dsanders Subscribers: seanbruno, emaste, llvm-commits Differential Revision: http://reviews.llvm.org/D7697 llvm-svn: 229671	2015-02-18 13:46:53 +00:00
Chandler Carruth	bbb377c3a1	[x86] Tighten the assertions to document that canonicalization has actually removed all but a very small number of choices for v2i64. Also remove dead code handling cases that simply cannot arise. llvm-svn: 229670	2015-02-18 11:46:29 +00:00
Chandler Carruth	811f0ee8c1	[x86] Switch an if which is trivially true to an assert. NFC llvm-svn: 229669	2015-02-18 11:46:27 +00:00
Chandler Carruth	8f3e585b17	[x86] Remove some more 'bit' nomenclature from the generic shift lowering. llvm-svn: 229668	2015-02-18 11:46:23 +00:00
Chandler Carruth	672a98ea28	[x86] Fold together the two shift lowering strategies. They were doing quite literally the same work, we just need to special case the >64-bit element shift code emission to emit the byte shift instructions and offsets. This also makes reasoning about each of the vector lowering strategies easier as we don't have to remember to use both forms. llvm-svn: 229662	2015-02-18 10:40:38 +00:00
Bradley Smith	26c9922a59	[ARM] Add missing M/R class CPUs Add some of the missing M and R class Cortex CPUs, namely: Cortex-M0+ (called Cortex-M0plus for GCC compatibility) Cortex-M1 SC000 SC300 Cortex-R5 llvm-svn: 229660	2015-02-18 10:33:30 +00:00
Ulrich Weigand	b7e5909a42	[SystemZ] Clean up warning Removed (unreachable) default case in switch to clean up warning: lib/Target/SystemZ/SystemZISelLowering.cpp:1974:5: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] llvm-svn: 229658	2015-02-18 09:42:23 +00:00
Chandler Carruth	48cc6c623a	[x86] Refactor the bit shift code the same as I just did the byte shift code. While this didn't have the miscompile (it used MatchLeft consistently) it missed some cases where it could use right shifts. I've added a test case Craig Topper came up with to exercise the right shift matching. This code is really identical between the two. I'm going to merge them next so that we don't keep two copies of all of this logic. llvm-svn: 229655	2015-02-18 09:19:58 +00:00
Ulrich Weigand	7db6918e2b	[SystemZ] Support all TLS access models - CodeGen part The current SystemZ back-end only supports the local-exec TLS access model. This patch adds all required CodeGen support for the other TLS models, which means in particular: - Expand initial-exec TLS accesses by loading TLS offsets from the GOT using @indntpoff relocations. - Expand general-dynamic and local-dynamic accesses by generating the appropriate calls to __tls_get_offset. Note that this routine has a non-standard ABI and requires loading the GOT pointer into %r12, so the patch also adds support for the GLOBAL_OFFSET_TABLE ISD node. - Add a new platform-specific optimization pass to remove redundant __tls_get_offset calls in the local-dynamic model (modeled after the corresponding X86 pass). - Add test cases verifying all access models and optimizations. llvm-svn: 229654	2015-02-18 09:13:27 +00:00
Ulrich Weigand	7bdd7c2346	[SystemZ] Support all TLS access models - MC part The current SystemZ back-end only supports the local-exec TLS access model. This patch adds all required MC support for the other TLS models, which means in particular: - Support additional relocation types for Initial-exec model: R_390_TLS_IEENT Local-dynamic-model: R_390_TLS_LDO32, R_390_TLS_LDO64, R_390_TLS_LDM32, R_390_TLS_LDM64, R_390_TLS_LDCALL General-dynamic model: R_390_TLS_GD32, R_390_TLS_GD64, R_390_TLS_GDCALL - Support assembler syntax to generate additional relocations for use with __tls_get_offset calls: :tls_gdcall: :tls_ldcall: The patch also adds a new test to verify fixups and relocations, and removes the (already unused) FK_390_PLT16DBL/FK_390_PLT32DBL fixup kinds. llvm-svn: 229652	2015-02-18 09:11:36 +00:00
Elena Demikhovsky	714f23bcdb	AVX-512: Added support for FP instructions with embedded rounding mode. By Asaf Badouh <asaf.badouh@intel.com> llvm-svn: 229645	2015-02-18 07:59:20 +00:00
Chandler Carruth	55553f5299	[x86] Rewrite the byte shift detection to not use boolean variables to track state. I didn't like this in the code review because the pattern tends to be error prone, but I didn't see a clear way to rewrite it. Turns out that there were bugs here, I found them when fuzz testing our shuffle lowering for correctness on x86. The core of the problem is that we need to consistently test all our preconditions for the same directionality of shift and the same input vector. Instead, formulate this as two predicates (one doesn't depend on the input in any way), pass things like the directionality and input vector as inputs, and loop over the alternatives. This fixes a pattern of very rare miscompiles coming out of this code. Turned up roughly 4 out of every 1 million v8 shuffles in my fuzz testing. The new code is over half a million test runs with no failures yet. I've also fuzzed every other function in the lowering code with over 3.5 million test cases and not discovered any other miscompiles. llvm-svn: 229642	2015-02-18 07:13:48 +00:00
Craig Topper	b324e43aed	[X86] Remove AVX2 and SSE2 pslldq and psrldq intrinsics. We can represent them in IR with vector shuffles now. All their uses have been removed from clang in favor of shuffles. llvm-svn: 229640	2015-02-18 06:24:44 +00:00
Matt Arsenault	0ba644b66b	R600/SI: Rename dst encoding field to be consistent with docs The docs call this vdst instead of just dst. llvm-svn: 229614	2015-02-18 02:15:37 +00:00
Matt Arsenault	e3dbcf6656	R600/SI: Consistently capitalize encoding field names Some formats capitalized these, but most didn't. Change them all to be consistently lowercase. Now, non-encoding fields and convenience bits are capitalized. Also remove weird looking empty line in some of the formats. llvm-svn: 229613	2015-02-18 02:15:35 +00:00
Matt Arsenault	1ecac06a6f	R600/SI: Set noNamedPositionallyEncodedOperands llvm-svn: 229612	2015-02-18 02:15:32 +00:00
Matt Arsenault	096ec1e10c	R600/SI: Fix src1_modifiers for class instructions src1 doesn't have modifiers, but the operand was missing resulting in an encoding build error when all fields are required. llvm-svn: 229611	2015-02-18 02:15:30 +00:00
Matt Arsenault	65fa1c425d	R600/SI: Fix not setting clamp / omod for v_cndmask_b32_e64 Rename the multiclass since it now applies to the output modifiers as well. llvm-svn: 229610	2015-02-18 02:15:27 +00:00
Matt Arsenault	284d7dfb53	R600: Fix operand encoding error llvm-svn: 229609	2015-02-18 02:10:42 +00:00
Matt Arsenault	1991f5e40b	R600/SI: Fix encoding error from glc bit on VI SMRD instructions llvm-svn: 229608	2015-02-18 02:10:40 +00:00
Matt Arsenault	e6c5241814	R600/SI: Fix operand encoding for flat instructions llvm-svn: 229607	2015-02-18 02:10:37 +00:00
Matt Arsenault	07e3bb153f	R600/SI: Fix error from vdst on no return atomics Set the ignored field to 0 so we can enable noNamedPositionallyEncodedOperands. llvm-svn: 229606	2015-02-18 02:10:35 +00:00
Matt Arsenault	caa1288fff	R600/SI: Add missing offset operand to buffer bothen llvm-svn: 229605	2015-02-18 02:04:38 +00:00
Matt Arsenault	2ad8bab7ee	R600/SI: Add missing soffset operand to global atomics llvm-svn: 229604	2015-02-18 02:04:35 +00:00
Matt Arsenault	3c34ae293c	R600/SI: Fix brace indentation llvm-svn: 229603	2015-02-18 02:04:31 +00:00
Eric Christopher	8af49b3214	Make the Mips AsmPrinter independent of global subtarget initialization. Initialize the subtarget once per function and migrate EmitStartOfAsmFile to either use calls on the TargetMachine or get information from the subtarget we'd use for assembling. The top-level-ness of the MIPS attribute output for assembly is, by nature, contrary to how we'd want to do this for an LTO situation where we have multiple cpu architectures so this solution is good enough for now. llvm-svn: 229596	2015-02-18 01:01:57 +00:00
Eric Christopher	bbe6ff50f3	Unify selectMipsCPU implementations. llvm-svn: 229595	2015-02-18 00:55:06 +00:00
Andrea Di Biagio	e7b58ee555	[X86][FastIsel] Teach how to select scalar integer to float/double conversions. This patch teaches fast-isel how to select a (V)CVTSI2SSrr for an integer to float conversion, and how to select a (V)CVTSI2SDrr for an integer to double conversion. Added test 'fast-isel-int-float-conversion.ll'. Differential Revision: http://reviews.llvm.org/D7698 llvm-svn: 229589	2015-02-17 23:40:58 +00:00
Rafael Espindola	df19519800	Add r228939 back with a fix. The problem in the original patch was not switching back to .text after printing an eh table. Original message: On ELF, put PIC jump tables in a non executable section. Fixes PR22558. llvm-svn: 229586	2015-02-17 23:34:51 +00:00
Sanjay Patel	e951a3839a	rename variables again because these tables also deal with stores; NFC Suggestion by Simon Pilgrim llvm-svn: 229574	2015-02-17 22:38:06 +00:00
Simon Pilgrim	1d89a02abb	[X86][SSE] Generalised unpckl/unpckh shuffle matching Added commuted unpckl/unpckh shuffle matching patterns as many cases containing undefined lanes fail to commute by themselves. Differential Revision: http://reviews.llvm.org/D7564 llvm-svn: 229571	2015-02-17 22:24:32 +00:00
Sanjay Patel	1a20fdf36f	Add comment to explain a non-obvious setting; NFC. This is paraphrased from Simon Pilgrim's comment in: http://reviews.llvm.org/D7492 llvm-svn: 229566	2015-02-17 22:09:54 +00:00
Sanjay Patel	203ee500e9	remove function names from comments; NFC llvm-svn: 229558	2015-02-17 21:55:20 +00:00
Sanjay Patel	52f9f7c0f3	replace meaningless variable names; NFCI llvm-svn: 229549	2015-02-17 21:37:28 +00:00
Tom Stellard	7b3aa88ac1	R600/SI: Fix asan errors in SIFoldOperands We were trying to fold into implicit uses, which led to out of bounds access of the MCInstrDesc::OpInfo array. llvm-svn: 229533	2015-02-17 20:11:54 +00:00
Sanjay Patel	b811c1d6a5	prevent folding a scalar FP load into a packed logical FP instruction (PR22371) Change the memory operands in sse12_fp_packed_scalar_logical_alias from scalars to vectors. That's what the hardware packed logical FP instructions define: 128-bit memory operands. There are no scalar versions of these instructions...because this is x86. Generating the wrong code (folding a scalar load into a 128-bit load) is still possible using the peephole optimization pass and the load folding tables. We won't completely solve this bug until we either fix the lowering in fabs/fneg/fcopysign and any other places where scalar FP logic is created or fix the load folding in foldMemoryOperandImpl() to make sure it isn't changing the size of the load. Differential Revision: http://reviews.llvm.org/D7474 llvm-svn: 229531	2015-02-17 20:08:21 +00:00
Eric Christopher	a49d68e078	Make the ARM AsmPrinter independent of global subtarget initialization. Initialize the subtarget once per function and migrate Emit{Start\|End}OfAsmFile to either use attributes on the TargetMachine or get information from the subtarget we'd use for assembling. One bit (getISAEncoding) touched the general AsmPrinter and the debug output. Handle this one by passing the function for the subprogram down and updating all callers and users. The top-level-ness of the ARM attribute output for assembly is, by nature, contrary to how we'd want to do this for an LTO situation where we have multiple cpu architectures so this solution is good enough for now. llvm-svn: 229528	2015-02-17 20:02:32 +00:00
Tom Stellard	bc3776803b	R600/SI: Extend private extload pattern to include zext loads llvm-svn: 229507	2015-02-17 16:36:00 +00:00
Benjamin Kramer	6cd780ff21	Prefer SmallVector::append/insert over push_back loops. Same functionality, but hoists the vector growth out of the loop. llvm-svn: 229500	2015-02-17 15:29:18 +00:00
Andrea Di Biagio	eb97f92489	[X86] Silence -Wsign-compare warnings. GCC 4.8 reported two new warnings due to comparisons between signed and unsigned integer expressions. The new warnings were accidentally introduced by revision 229480. Added explicit casts to silence the warnings. No functional change intended. llvm-svn: 229488	2015-02-17 11:20:11 +00:00
Elena Demikhovsky	ba84672519	AVX-512: changes in intel_ocl_bi calling conventions - added mask types v8i1 and v16i1 to possible function parameters - enabled passing 512-bit vectors in standard CC - added a test for KNL intel_ocl_bi conventions llvm-svn: 229482	2015-02-17 09:20:12 +00:00
Michael Kuperstein	ff5acaf50c	[X86] Combine vector anyext + and into a vector zext Vector zext tends to get legalized into a vector anyext, represented as a vector shuffle with an undef vector + a bitcast, that gets ANDed with a mask that zeroes the undef elements. Combine this into an explicit shuffle with a zero vector instead. This allows shuffle lowering to match it as a zext, instead of matching it as an anyext and emitting an explicit AND. This combine only covers a subset of the cases, but it's a start. Differential Revision: http://reviews.llvm.org/D7666 llvm-svn: 229480	2015-02-17 08:22:51 +00:00
Eric Christopher	5c0e009d3a	Make the PowerPC AsmPrinter independent of global subtarget initialization. Initialize the subtarget once per function and migrate EmitStartOfAsmFile to either use attributes on the TargetMachine or get information from all of the various subtargets. llvm-svn: 229475	2015-02-17 07:21:21 +00:00
Eric Christopher	75dc3904a5	Add a FIXME to move IsLittleEndian to the target machine. llvm-svn: 229472	2015-02-17 06:45:17 +00:00
Eric Christopher	fee6aaf683	Move ABI handling and 64-bitness to the PowerPC target machine. This required changing how the computation of the ABI is handled and how some of the checks for ABI/target are done. llvm-svn: 229471	2015-02-17 06:45:15 +00:00
Chandler Carruth	55db07016e	[x86] Teach the unpack lowering to try wider element unpacks. This allows it to match still more places where previously we would have to fall back on floating point shuffles or other more complex lowering strategies. I'm hoping to replace some of the hand-rolled unpack matching with this routine as it gets more and more clever. llvm-svn: 229463	2015-02-17 02:12:24 +00:00
Hal Finkel	5cedafb8cd	[PowerPC] Support non-direct-sub/superclass VSX copies Our register allocation has become better recently, it seems, and is now starting to generate cross-block copies into inflated register classes. These copies are not transformed into subregister insertions/extractions by the PPCVSXCopy class, and so need to be handled directly by PPCInstrInfo::copyPhysReg. The code to do this was almost there, but not quite (it was unnecessarily restricting itself to only the direct sub/super-register-class case (not copying between, for example, something in VRRC and the lower-half of VSRC which are super-registers of F8RC). Triggering this behavior manually is difficult; I'm including two bugpoint-reduced test cases from the test suite. llvm-svn: 229457	2015-02-16 23:46:30 +00:00


32113 Commits