llvm-project

Commit Graph

Author	SHA1	Message	Date
Artyom Skrobov	a70dfe18d3	Re-apply r237247 - [AArch64] Codegen VMAX/VMIN for safe math cases No longer breaks SPEC2000/2006 llvm-svn: 237361	2015-05-14 12:59:46 +00:00
Elena Demikhovsky	d5b3e376d2	AVX-512: Added i1 type handling for calling conventions. i1 type is a legal type on AVX-512 and can be passed as parameter or return value. i1 is promoted to i8 on return and to i32 for call arguments (i8 is also promoted to i32 here). The result code is similar to the previous X86 targets, where i1 is allways promoted to i8. llvm-svn: 237350	2015-05-14 09:04:45 +00:00
Ahmed Bougacha	6402ad27c0	[CodeGen] Use standard -not gnueabi- naming for f16 libcalls on Darwin. Other targets probably should as well. Since r237161, compiler-rt has both, but I don't see why anything other than gnueabi would use a gnueabi naming scheme. llvm-svn: 237324	2015-05-14 01:00:51 +00:00
Brendon Cahoon	d11c92a41c	[Hexagon] Generate loop1 instruction for nested loops loop1 is for the outer loop and loop0 is for the inner loop. Differential Revision: http://reviews.llvm.org/D9680 llvm-svn: 237266	2015-05-13 17:56:03 +00:00
Brendon Cahoon	254e656862	[Hexagon] Generate hardware loop when loop has a critical edge The hardware loop pass should try to generate a hardware loop instruction when the original loop has a critical edge. Differential Revision: http://reviews.llvm.org/D9678 llvm-svn: 237258	2015-05-13 14:54:24 +00:00
Silviu Baranga	780a3b3be7	Revert r237247 - [AArch64] Codegen VMAX/VMIN.. as it is causing failures in SPEC2000/2006 llvm-svn: 237256	2015-05-13 14:03:18 +00:00
Artyom Skrobov	b526681e08	[AArch64] Codegen VMAX/VMIN for safe math cases llvm-svn: 237247	2015-05-13 12:01:09 +00:00
Sanjoy Das	a1d39ba940	[Statepoints] Support for "patchable" statepoints. Summary: This change adds two new parameters to the statepoint intrinsic, `i64 id` and `i32 num_patch_bytes`. `id` gets propagated to the ID field in the generated StackMap section. If the `num_patch_bytes` is non-zero then the statepoint is lowered to `num_patch_bytes` bytes of nops instead of a call (the spill and reload code remains unchanged). A non-zero `num_patch_bytes` is useful in situations where a language runtime requires complete control over how a call is lowered. This change brings statepoints one step closer to patchpoints. With some additional work (that is not part of this patch) it should be possible to get rid of `TargetOpcode::STATEPOINT` altogether. PlaceSafepoints generates `statepoint` wrappers with `id` set to `0xABCDEF00` (the old default value for the ID reported in the stackmap) and `num_patch_bytes` set to `0`. This can be made more sophisticated later. Reviewers: reames, pgavlin, swaroop.sridhar, AndyAyers Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9546 llvm-svn: 237214	2015-05-12 23:52:24 +00:00
Saleem Abdulrasool	ee13fbe848	CodeGen: ignore DEBUG_VALUE nodes in KILL tagging DEBUG_VALUE nodes do not take part in code generation. Ignore them when performing KILL updates. Addresses PR23486. llvm-svn: 237211	2015-05-12 23:36:18 +00:00
Chandler Carruth	942fba95e2	Revert r237175: [X86] Always return the sret parameter in eax/rax ... This commit broke an x86 test and the bots have been broken for well over an hour now so I'm just reverting. llvm-svn: 237210	2015-05-12 23:34:27 +00:00
Reid Kleckner	b465563b46	[X86] Always return the sret parameter in eax/rax, even on 32-bit Summary: This rule was always in the old SysV i386 ABI docs and the new ones that H.J. Lu has put together, but we never noticed: EAX scratch register; also used to return integer and pointer values from functions; also stores the address of a returned struct or union Fixes PR23491. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9715 llvm-svn: 237175	2015-05-12 20:56:32 +00:00
Sundeep Kushwaha	83a4039c17	[PATCH] [HEXAGON] Add a test program to verify calling convention for large struct return by value. Differential Revision: http://reviews.llvm.org/D9709 llvm-svn: 237170	2015-05-12 20:13:10 +00:00
Pat Gavlin	c7dc6d6ee7	[Statepoints] Split the calling convention and statepoint flags operand to STATEPOINT into two separate operands. Differential Revision: http://reviews.llvm.org/D9623 llvm-svn: 237166	2015-05-12 19:50:19 +00:00
Petar Jovanovic	e0de8f4efb	[Mips] Return false for isFPCloseToIncomingSP() On Mips, frame pointer points to the same side of the frame as the stack pointer. This function is used to decide where to put register scavenging spill slot. So far, it was put on the wrong side of the frame, and thus it was too far away from $fp when frame was larger than 2^15 bytes. Patch by Vladimir Radosavljevic. http://reviews.llvm.org/D8895 llvm-svn: 237153	2015-05-12 17:14:05 +00:00
Tom Stellard	28d13a4b12	R600/SI: add pass to mark CF live ranges as non-spillable Spilling can insert instructions almost anywhere, and this can mess up control flow lowering in a multitude of ways, due to instruction reordering. Let's sort this out the easy way: never spill registers involved with control flow, i.e. saved EXEC masks. Unfortunately, this does not work at all with optimizations disabled, as the register allocator ignores spill weights. This should be addressed in a future commit. The test was reduced from the "stacks" shader of [1]. Some issues trigger the machine verifier while another one is checked manually. [1] http://madebyevan.com/webgl-path-tracing/ v2: only insert pass with optimizations enabled, merge test runs. Patch by: Grigori Goronzy llvm-svn: 237152	2015-05-12 17:13:02 +00:00
Sunil Srivastava	d79dfcbc37	Changed renaming of local symbols by inserting a dot vefore the numeric suffix. One code change and several test changes to match that details in http://reviews.llvm.org/D9481 llvm-svn: 237150	2015-05-12 16:47:30 +00:00
Tom Stellard	8f96dfc9ea	R600/SI: Remove M0Reg register class It is no longer used. llvm-svn: 237142	2015-05-12 15:00:52 +00:00
Tom Stellard	381a94a764	R600/SI: Remove explicit m0 operand from DS instructions Instead add m0 as an implicit operand. This helps avoid spills of the m0 register in some cases. llvm-svn: 237141	2015-05-12 15:00:49 +00:00
Tom Stellard	7b222e110f	R600/SI: Make sendmsg test more strict We want to make sure that the m0 copies are being cse'd. llvm-svn: 237134	2015-05-12 14:18:16 +00:00
Elena Demikhovsky	fae20d3565	AVX-512, X86: Added lowering for shift operations for SKX. The other changes in the LowerShift() are not functional, just to make the code more convenient. So, the functional changes for SKX only. llvm-svn: 237129	2015-05-12 13:25:46 +00:00
John Brawn	70605f7d22	[ARM] Use AEABI aligned function variants AEABI defines aligned variants of memcpy etc. that can be faster than the default version due to not having to do alignment checks. When emitting target code for these functions make use of these aligned variants if possible. Also convert memset to memclr if possible. Differential Revision: http://reviews.llvm.org/D8060 llvm-svn: 237127	2015-05-12 13:13:38 +00:00
Igor Laevsky	87ef5eaf46	Reverse ordering of base and derived pointer during safepoint lowering. According to the documentation in StackMap section for the safepoint we should have: "The first Location in each pair describes the base pointer for the object. The second is the derived pointer actually being relocated." But before this change we emitted them in reverse order - derived pointer first, base pointer second. llvm-svn: 237126	2015-05-12 13:12:14 +00:00
Vasileios Kalintiris	b48c905613	[mips][FastISel] Handle calls with non legal types i8 and i16. Summary: Allow calls with non legal integer types based on i8 and i16 to be processed by mips fast-isel. Based on a patch by Reed Kotler. Test Plan: "Make check" test forthcoming. Test-suite passes at O0/O2 and with mips32 r1/r2 Reviewers: rkotler, dsanders Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D6770 llvm-svn: 237121	2015-05-12 12:29:17 +00:00
Vasileios Kalintiris	c07f848785	[mips][FastISel] Simplify callabi.ll by using multiple check prefixes. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9635 llvm-svn: 237119	2015-05-12 12:17:11 +00:00
Vasileios Kalintiris	32cd69a2eb	[mips][FastISel] Allow computation of addresses from constant expressions. Summary: Try to compute addresses when the offset from a memory location is a constant expression. Based on a patch by Reed Kotler. Test Plan: Passes test-suite for -O0/O2 and mips 32 r1/r2 Reviewers: rkotler, dsanders Subscribers: llvm-commits, aemerson, rfuhler Differential Revision: http://reviews.llvm.org/D6767 llvm-svn: 237117	2015-05-12 12:08:31 +00:00
Elena Demikhovsky	c1ac5d7bd5	AVX-512: select operation for i1 vectors like: select i1 %cond, <16 x i1> %a, <16 x i1> %b. I added pseudo-CMOV patterns to resolve the "select". Added tests for KNL and SKX. llvm-svn: 237106	2015-05-12 09:36:52 +00:00
Michael Kuperstein	6f5ff6905c	[X86] DAGCombine should not assume arbitrary vector types are simple The X86-specific DAGCombine for stores should not assume vector types are always simple. This fixes PR23476. Differential Revision: http://reviews.llvm.org/D9659 llvm-svn: 237097	2015-05-12 07:33:07 +00:00
Eric Christopher	824f42f209	Migrate existing backends that care about software floating point to use the information in the module rather than TargetOptions. We've had and clang has used the use-soft-float attribute for some time now so have the backends set a subtarget feature based on a particular function now that subtargets are created based on functions and function attributes. For the one middle end soft float check go ahead and create an overloadable TargetLowering::useSoftFloat function that just checks the TargetSubtargetInfo in all cases. Also remove the command line option that hard codes whether or not soft-float is set by using the attribute for all of the target specific test cases - for the generic just go ahead and add the attribute in the one case that showed up. llvm-svn: 237079	2015-05-12 01:26:05 +00:00
Andrew Kaylor	cc14f387e8	[WinEH] Handle nested landing pads that return directly to the parent function. Differential Revision: http://reviews.llvm.org/D9684 llvm-svn: 237063	2015-05-11 23:06:02 +00:00
Andrew Kaylor	762a6bea1f	[WinEH] Update exception numbering to give handlers their own base state. Differential Revision: http://reviews.llvm.org/D9512 llvm-svn: 237014	2015-05-11 19:41:19 +00:00
Pirama Arumuga Nainar	af171e7720	[X86] Updates to X86 backend for f16 promotion Summary: r235215 adds support for f16 to be considered as a load/store type and promote f16 operations to f32. This patch has miscellaneous fixes for the X86 backend so all f16 operations are handled: 1. Set loadextaction for f16 vectors to expand. 2. Handle FP_EXTEND in a switch statement when handling v2f32 3. Do not fold (FP_TO_SINT (load f16)) into FP_TO_INT*_IN_MEM or (store (SINT_TO_FP )) to a FILD. Tests included. Reviewers: ab, srhines, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9092 llvm-svn: 237004	2015-05-11 17:14:39 +00:00
Elena Demikhovsky	3822af0715	AVX-512: Changed CC parameter in "cmp" intrinsic from i8 to i32 according to the Intel Spec by Igor Breger (igor.breger@intel.com) llvm-svn: 236979	2015-05-11 09:03:14 +00:00
Elena Demikhovsky	0d7e9364d1	AVX-512: Added SKX instructions and intrinsics: {add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae By Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236971	2015-05-11 06:05:05 +00:00
Elena Demikhovsky	f40342d6a2	AVX-512: fixed UINT_TO_FP operation for 512-bit types. llvm-svn: 236955	2015-05-10 14:23:52 +00:00
Elena Demikhovsky	75d1489326	AVX-512: fixed a bug in i1 vectors lowering llvm-svn: 236947	2015-05-10 10:33:32 +00:00
NAKAMURA Takumi	7f52d6d82c	llvm/test/CodeGen/AArch64/tailcall_misched_graph.ll: s/REQUIRE/REQUIRES/ llvm-svn: 236928	2015-05-09 05:59:00 +00:00
James Y Knight	fca02be3c1	Fix MergeConsecutiveStore for non-byte-sized memory accesses. The bug showed up as a compile-time assertion failure: Assertion `NumBits >= MIN_INT_BITS && "bitwidth too small"' failed when building msan tests on x86-64. Prior to r236850, this bug was masked due to a bogus alignment check, which also accidentally rejected non-byte-sized accesses. Afterwards, an invalid ElementSizeBytes == 0 got further into the function, and triggered the assertion failure. It would probably be a good idea to allow it to handle merging stores of unusual widths as well, but for now, to un-break it, I'm just making the minimal fix. Differential Revision: http://reviews.llvm.org/D9626 llvm-svn: 236927	2015-05-09 03:13:37 +00:00
Pete Cooper	d54fb89901	[Fast-ISel] Don't mark the first use of a remat constant as killed. When emitting something like 'add x, 1000' if we remat the 1000 then we should be able to mark the vreg containing 1000 as killed. Given that we go bottom up in fast-isel, a later use of 1000 will be higher up in the BB and won't kill it, or be impacted by the lower kill. However, rematerialised constant expressions aren't generated bottom up. The local value save area grows downwards. This means that if you remat 2 constant expressions which both use 1000 then the first will kill it, then the second, which is lower in the BB will read a killed register. This is the case in the attached test where the 2 GEPs both need to generate 'add x, 6680' for the constant offset. Note that this commit only makes kill flag generation conservative. There's nothing else obviously wrong with the local value save area growing downwards, and in fact it needs to for handling arbitrarily complex constant expressions. However, it would be nice if there was a solution which would let us generate more accurate kill flags, or just kill flags completely. llvm-svn: 236922	2015-05-09 00:51:03 +00:00
Arnold Schwaighofer	f54b73d681	ScheduleDAGInstrs: In functions with tail calls PseudoSourceValues are not non-aliasing distinct objects The code that builds the dependence graph assumes that two PseudoSourceValues don't alias. In a tail calling function two FixedStackObjects might refer to the same location. Worse 'immutable' fixed stack objects like function arguments are not immutable and will be clobbered. Change this so that a load from a FixedStackObject is not invariant in a tail calling function and don't return a PseudoSourceValue for an instruction in tail calling functions when building the dependence graph so that we handle function arguments conservatively. Fix for PR23459. rdar://20740035 llvm-svn: 236916	2015-05-08 23:52:00 +00:00
Hans Wennborg	ae0254dabc	Switch lowering: cluster adjacent fall-through cases even at -O0 It's cheap to do, and codegen is much faster if cases can be merged into clusters. llvm-svn: 236905	2015-05-08 21:23:39 +00:00
Pete Cooper	e4bb07ecff	[Fast-ISel] Clear kill flags on registers replaced by updateValueMap. When selecting an extract instruction, we don't actually generate code but instead work out which register we are reading, and rewrite uses of the extract def to the source register. This is done via updateValueMap,. However, its possible that the source register we are rewriting to to also have uses. If those uses are after a kill of the value we are rewriting from then we have uses after a kill and the verifier fails. This code checks for the case where the to register is also used, and if so it clears all kill on the from register. This is conservative, but better that always clearing kills on the from register. llvm-svn: 236897	2015-05-08 20:46:54 +00:00
Brendon Cahoon	bece8edcdd	[Hexagon] Generate more hardware loops Refactored parts of the hardware loop pass to generate more. Also, added more tests. Differential Revision: http://reviews.llvm.org/D9568 llvm-svn: 236896	2015-05-08 20:18:21 +00:00
Pete Cooper	7f7c9f1dad	[X86] Fast-ISel was incorrectly always killing the source of a truncate. A trunc from i32 to i1 on x86_64 generates an instruction such as %vreg19<def> = COPY %vreg9:sub_8bit<kill>; GR8:%vreg19 GR32:%vreg9 However, the copy here should only have the kill flag on the 32-bit path, not the 64-bit one. Otherwise, we are killing the source of the truncate which could be used later in the program. llvm-svn: 236890	2015-05-08 18:29:42 +00:00
Pat Gavlin	cc0431d1c0	Extend the statepoint intrinsic to allow statepoints to be marked as transitions from GC-aware code to code that is not GC-aware. This changes the shape of the statepoint intrinsic from: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 unused, ...call args, i32 # deopt args, ...deopt args, ...gc args) to: @llvm.experimental.gc.statepoint(anyptr target, i32 # call args, i32 flags, ...call args, i32 # transition args, ...transition args, i32 # deopt args, ...deopt args, ...gc args) This extension offers the backend the opportunity to insert (somewhat) arbitrary code to manage the transition from GC-aware code to code that is not GC-aware and back. In order to support the injection of transition code, this extension wraps the STATEPOINT ISD node generated by the usual lowering lowering with two additional nodes: GC_TRANSITION_START and GC_TRANSITION_END. The transition arguments that were passed passed to the intrinsic (if any) are lowered and provided as operands to these nodes and may be used by the backend during code generation. Eventually, the lowering of the GC_TRANSITION_{START,END} nodes should be informed by the GC strategy in use for the function containing the intrinsic call; for now, these nodes are instead replaced with no-ops. Differential Revision: http://reviews.llvm.org/D9501 llvm-svn: 236888	2015-05-08 18:07:42 +00:00
Pete Cooper	85b1c48b20	Clear kill flags on all used registers when sinking instructions. The test here was sinking the AND here to a lower BB: %vreg7<def> = ANDWri %vreg8, 0; GPR32common:%vreg7,%vreg8 TBNZW %vreg8<kill>, 0, <BB#1>; GPR32common:%vreg8 which meant that vreg8 was read after it was killed. This commit changes the code from clearing kill flags on the AND to clearing flags on all registers used by the AND. llvm-svn: 236886	2015-05-08 17:54:32 +00:00
Brendon Cahoon	df43e68629	[Hexagon] Update AnalyzeBranch, etc target hooks Improved the AnalyzeBranch, InsertBranch, and RemoveBranch functions in order to handle more of our branch instructions. This requires changes to analyzeCompare and PredicateInstructions. Specifically, we've added support for new value compare jumps, improved handling of endloop, added more compare instructions, and improved support for predicate instructions. Differential Revision: http://reviews.llvm.org/D9559 llvm-svn: 236876	2015-05-08 16:16:29 +00:00
Andrea Di Biagio	84e22b9096	[X86] Teach 'getTargetShuffleMask' how to look through ISD::WrapperRIP when decoding a PSHUFB mask. The function 'getTargetShuffleMask' already knows how to deal with PSHUFB nodes where the mask node is a load from constant pool, and the constant pool node is wrapped by a X86ISD::Wrapper node. This patch extends that logic by teaching it how to also look through X86ISD::WrapperRIP. This helps function combineX86ShufflesRecusively to combine more shuffle sequences containing PSHUFB nodes if we are in RIPRel PIC mode. Before this change, llc (with -relocation-model=pic -march=x86-64) was unable to decode a pshufb where the mask was loaded from a constant pool. For example, the no-op shuffle from test 'x86-fold-pshufb.ll' was not folded into its operand, so instead of generating a single 'movaps' the backend always generated a sub-optimal 'movdqa + pshufb' sequence. Added test x86-fold-pshufb.ll. llvm-svn: 236863	2015-05-08 15:11:07 +00:00
James Y Knight	38514d2177	Fix test added in r236850 for OSX builders. Need to specify triple so that llvm emits the asm syntax that the test expected. llvm-svn: 236855	2015-05-08 14:04:54 +00:00
James Y Knight	284e7b3d6c	Fix alignment checks in MergeConsecutiveStores. 1) check whether the alignment of the memory is sufficient for the merged store or load to be efficient. Not doing so can result in some ridiculously poor code generation, if merging creates a vector operation which must be aligned but isn't. 2) DON'T check that the alignment of each load/store is equal. If you're merging 2 4-byte stores, the first might have 8-byte alignment, but the second certainly will have 4-byte alignment. We do want to allow those to be merged. llvm-svn: 236850	2015-05-08 13:47:01 +00:00
Vasileios Kalintiris	42544d6472	[mips] Emit the .insn directive for empty basic blocks. Summary: In microMIPS, labels need to know whether they are on code or data. This is indicated with STO_MIPS_MICROMIPS and can be inferred by being followed by instructions. For empty basic blocks, we can ensure this by emitting the .insn directive after the label. Also, this fixes some failures in our out-of-tree microMIPS buildbots, for the exception handling regression tests under: SingleSource/Regression/C++/EH Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9530 llvm-svn: 236815	2015-05-08 09:10:15 +00:00
Eric Christopher	81fa35ca24	Now that we have a soft-float attribute, use it instead of the hard coded command line option for the Mips soft float tests. llvm-svn: 236801	2015-05-08 00:57:22 +00:00
Pete Cooper	ba593ad3f3	Clear kill flags in tail duplication. If we duplicate an instruction then we must also clear kill flags on any uses we rewrite. Otherwise we might be killing a register which was used in other BBs. For example, here the entry BB ended up with these instructions, the ADD having been tail duplicated. %vreg24<def> = t2ADDri %vreg10<kill>, 1, pred:14, pred:%noreg, opt:%noreg; GPRnopc:%vreg24 rGPR:%vreg10 %vreg22<def> = COPY %vreg10; GPR:%vreg22 rGPR:%vreg10 The copy here is inserted after the add and so needs vreg10 to be live. llvm-svn: 236782	2015-05-07 21:48:26 +00:00
Pete Cooper	f52123b454	[AArch64] Fix sext/zext folding in address arithmetic. We were accidentally folding a sign/zero extend in to address arithmetic in a different BB when the extend wasn't available there. Cross BB fast-isel isn't safe, so restrict this to only when the extend is in the same BB as the use. llvm-svn: 236764	2015-05-07 19:21:36 +00:00
Nemanja Ivanovic	f3c94b1e3c	Add VSX Scalar loads and stores to the PPC back end This patch corresponds to review: http://reviews.llvm.org/D9440 It adds a new register class to the PPC back end to contain single precision values in VSX registers. Additionally, it adds scalar loads and stores for VSX registers. llvm-svn: 236755	2015-05-07 18:24:05 +00:00
Diego Novillo	de5b8016ab	Fix information loss in branch probability computation. Summary: This addresses PR 22718. When branch weights are too large, they were being clamped to the range [1, MaxWeightForBB]. But this clamping is only applied to edges that go outside the range, so it distorts the relative branch probabilities. This patch changes the weight calculation to scale every branch so the relative probabilities are preserved. The scaling is done differently now. First, all the branch weights are added up, and if the sum exceeds 32 bits, it computes an integer scale to bring all the weights within the range. The patch fixes an existing test that had slightly wrong branch probabilities due to the previous clamping. It now gets branch weights scaled accordingly. Reviewers: dexonsmith Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9442 llvm-svn: 236750	2015-05-07 17:22:06 +00:00
Sanjay Patel	a9f6d3505d	[x86] eliminate unnecessary shuffling/moves with unary scalar math ops (PR21507) Finish the job that was abandoned in D6958 following the refactoring in http://reviews.llvm.org/rL230221: 1. Uncomment the intrinsic def for the AVX r_Int instruction. 2. Add missing r_Int entries to the load folding tables; there are already tests that check these in "test/Codegen/X86/fold-load-unops.ll", so I haven't added any more in this patch. 3. Add patterns to solve PR21507 ( https://llvm.org/bugs/show_bug.cgi?id=21507 ). So instead of this: movaps %xmm0, %xmm1 rcpss %xmm1, %xmm1 movss %xmm1, %xmm0 We should now get: rcpss %xmm0, %xmm0 And instead of this: vsqrtss %xmm0, %xmm0, %xmm1 vblendps $1, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm1[0],xmm0[1,2,3] We should now get: vsqrtss %xmm0, %xmm0, %xmm0 Differential Revision: http://reviews.llvm.org/D9504 llvm-svn: 236740	2015-05-07 15:48:53 +00:00
Elena Demikhovsky	29792e9a80	AVX-512: Added all forms of FP compare instructions for KNL and SKX. Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec. By Igor Breger (igor.breger@intel.com) llvm-svn: 236714	2015-05-07 11:24:42 +00:00
NAKAMURA Takumi	386c22c0ff	llvm/test/CodeGen/X86/llc-override-mcpu-mattr.ll: Tweak not to be affected by x64 Calling Convention. llvm-svn: 236710	2015-05-07 10:18:28 +00:00
Akira Hatanaka	3058d0f080	Let llc and opt override "-target-cpu" and "-target-features" via command line options. This commit fixes a bug in llc and opt where "-mcpu" and "-mattr" wouldn't override function attributes "-target-cpu" and "-target-features" in the IR. Differential Revision: http://reviews.llvm.org/D9537 llvm-svn: 236677	2015-05-06 23:54:14 +00:00
Pete Cooper	27483915e8	Handle dead defs in the if converter. We had code such as this: r2 = ... t2Bcc label1: ldr ... r2 label2; return r2<dead, def> The if converter was transforming this to r2<def> = ... return [pred] r2<dead,def> ldr <r2, kill> return which fails the machine verifier because the ldr now reads from a dead def. The fix here detects dead defs in stepForward and passes them back to the caller in the clobbers list. The caller then clears the dead flag from the def is the value is live. llvm-svn: 236660	2015-05-06 22:51:04 +00:00
Pete Cooper	54085cdc7b	Fix incorrect kill flags in fastisel. If called twice in the same BB on the same constant, FastISel::fastEmit_ri_ was marking the materialized vreg as killed on each use, instead of only the last use. Change this to only mark the last use as killed by making earlier uses check if the vreg is already used elsewhere. llvm-svn: 236650	2015-05-06 22:09:29 +00:00
Pete Cooper	d31583ddfb	[x86] Fix register class of folded load index reg. When folding a load in to another instruction, we need to fix the class of the index register Otherwise, it could be something like GR64 not GR64_NOSP and would fail the machine verifier. llvm-svn: 236644	2015-05-06 21:37:19 +00:00
Tim Northover	e4310fe946	CodeGen: move over-zealous assert into actual if statement. It's quite possible to encounter an insertvalue instruction that's more deeply nested than the value we're looking for, but when that happens we really mustn't compare beyond the end of the index array. Since I couldn't see any guarantees about what comparisons std::equal makes, we probably need to directly check the size beforehand. In practice, I suspect most std::equal implementations would probably bail early, which would be OK. But just in case... rdar://20834485 llvm-svn: 236635	2015-05-06 20:07:38 +00:00
Duncan P. N. Exon Smith	653c1099b4	DwarfDebug: Emit number of bytes in .debug_loc entry directly Emit the number of bytes in a `.debug_loc` entry directly. The old code created temp labels (expensive), emitted the difference between them, and then emitted one on each side of the relevant bytes. (I'm looking at `llc` memory usage on `verify-uselistorder.lto.opt.bc` (the optimized version of ld64's `-save-temps` when linking the `verify-uselistorder` executable in an LTO bootstrap). I've hacked `MCContext::Allocate()` to just call `malloc()` instead of using the `BumpPtrAllocator` so that the heap profile is easier to read. As far as peak memory is concerned, `MCContext::Allocate()` is equivalent to a leak, since it only gets freed at process teardown. In my heap profile, this patch drops memory usage of `DwarfDebug::emitDebugLoc()` from 132.56 MB (11.4%) down to 29.86 MB (2.7%) at peak memory. Some of that must be noise from `SmallVector` (or other) allocations -- peak memory only dropped from 1160 MB down to 1100 MB -- but this nevertheless shaves 5% off the top.) llvm-svn: 236629	2015-05-06 19:11:20 +00:00
Pawel Bylica	3b0adaf6b0	Readd the regression test from r236584. Calling convention fixed to linux. llvm-svn: 236610	2015-05-06 16:43:21 +00:00
Pete Cooper	d927c6eaf8	[ARM] Fast-Isel was incorrectly selecting <2 x double> adds. With neon enabled, we reach SelectBinaryFPOp and are able to get registers for a <2 x double> add. However, we shouldn't actually attempt arithmetic on it as ARMIselLowering says "v2f64 is legal so that QR subregs can be extracted as f64 elements, but neither Neon nor VFP support any arithmetic operations on it." This commit disables SelectBinaryFPOp for any vector types. There's already a FIXME to try handle neon. Doing so would require fixing this conditional which isn't safe for vectors 'VT == MVT::f64 \|\| VT == MVT::i64' llvm-svn: 236609	2015-05-06 16:39:17 +00:00
Bill Schmidt	5fe2e25f7c	[PPC64LE] Adjust vector splats during VSX swap optimization The initial code drop for VSX swap optimization permitted the optimization only when all operations in a web of related computation are lane-insensitive. For some lane-sensitive operations, we can still permit the optimization provided that we make adjustments to those operations. This patch adds special handling for vector splats so that their presence doesn't kill the optimization. Vector splats are lane-sensitive since they identify by number a vector element to be used as the source of a splat. When swap optimizations take place, the desired vector element will move to the opposite doubleword of the quadword vector. We thus replace the index I by (I + N/2) % N, where N is the number of elements in the vector. A new test case is added to test that swap optimization succeeds when vector splats are present, and that the proper input element is used as the source of the splat. An ancillary change removes SH_BUILDVEC as one of the kinds of special handling that may be required by VSX swap optimization. From experience with GCC, I had expected to need some modifications for vector build operations, but I did not find that to be the case. llvm-svn: 236606	2015-05-06 15:40:46 +00:00
Artyom Skrobov	3f8eae92a4	[ARM] generate VMAXNM/VMINNM for a compare followed by a select, in safe math mode too llvm-svn: 236590	2015-05-06 11:44:10 +00:00
Pawel Bylica	b25491faf4	Revert regression test from r236584. Temporary remove a regression test added in r236584. It fails on Windows. llvm-svn: 236586	2015-05-06 10:41:46 +00:00
Pawel Bylica	9f1fb9d1ef	SelectionDAG: Handle out-of-bounds index in extract vector element Summary: This patch correctly handles undef case of EXTRACT_VECTOR_ELT node where the element index is constant and not less than vector size. Test Plan: CodeGen for X86 test included. Also one incorrect regression test fixed. Reviewers: qcolombet, chandlerc, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D9250 llvm-svn: 236584	2015-05-06 10:19:14 +00:00
Ahmed Bougacha	e8d0c4ccea	[ARM][FastISel] Use TST #1 instead of CMP #0 for select. Since r234249, i1 are sext instead of zext; because of that, doing "CMP rN, #0; IT EQ/NE" isn't correct anymore. "TST #1" is the conservatively correct alternative - the tradeoff being that it doesn't have a 16-bit encoding -, so use that instead. llvm-svn: 236569	2015-05-06 04:14:02 +00:00
Sanjoy Das	77b16b78e9	[Statepoints] Remove broken test case. statepoint-indirect-return.ll breaks on linux systems. Delete the test case to make the bots green while I figure out what the right fix is. llvm-svn: 236568	2015-05-06 02:51:46 +00:00
Sanjoy Das	c6bf3e9f12	[StatepointLowering] Don't create temporary instructions. NFCI. Summary: Instead of creating a temporary call instruction and lowering that, use SelectionDAGBuilder::lowerCallOperands. Reviewers: reames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9480 llvm-svn: 236563	2015-05-06 02:36:20 +00:00
Pete Cooper	d0dae3e577	[X86 fast-isel] Constrain the index reg class to not include SP. The index reg on instructions with complex address modes is a GPR64_NOSP. Constrain it to appease the machine verifier. llvm-svn: 236557	2015-05-05 23:41:53 +00:00
Pete Cooper	ce9ad757c7	Fix IfConverter to handle regmask machine operands. Note, this is a recommit of r236515 after fixing an error in r236514. The buildbot ran fast enough that it picked up r236514 prior to r236515 and threw an error. r236515 itself ran 'make check' without errors. Original commit message follows: A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks. These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register. Reviewed by Matthias Braun and Quentin Colombet. llvm-svn: 236550	2015-05-05 22:09:41 +00:00
Peter Collingbourne	85a0e23bc8	Thumb2SizeReduction: Check the correct set of registers for LDMIA. The register set for LDMIA begins at offset 3, not 4. We were previously missing the short encoding of this instruction in the case where the base register was the first register in the register set. Also clean up some dead code: - The isARMLowRegister check is redundant with what VerifyLowRegs does; replace with an assert. - Remove handling of LDMDB instruction, which has no short encoding (and does not appear in ReduceTable). Differential Revision: http://reviews.llvm.org/D9485 llvm-svn: 236535	2015-05-05 20:07:10 +00:00
Ulrich Weigand	9958c489bb	[DAGCombiner] Account for getVectorIdxTy() when narrowing vector load This patch makes ReplaceExtractVectorEltOfLoadWithNarrowedLoad convert the element number from getVectorIdxTy() to PtrTy before doing pointer arithmetic on it. This is needed on z, where element numbers are i32 but pointers are i64. Original patch by Richard Sandiford. llvm-svn: 236530	2015-05-05 19:34:10 +00:00
Ulrich Weigand	af2c618e2b	[DAGCombiner] Fix ReplaceExtractVectorEltOfLoadWithNarrowedLoad for BE For little-endian, the function would convert (extract_vector_elt (load X), Y) to X + Ysizeof(elt). For big-endian it would instead use X + sizeof(vec) - Ysizeof(elt). The big-endian case wasn't right since vector index order always follows memory/array order, even for big-endian. (Note that the current handling has to be wrong for Y==0 since it would access beyond the end of the vector.) Original patch by Richard Sandiford. llvm-svn: 236529	2015-05-05 19:33:37 +00:00
Ulrich Weigand	2693c0a491	[LegalizeVectorTypes] Allow single loads and stores for more short vectors When lowering a load or store for TypeWidenVector, the type legalizer would use a single load or store if the associated integer type was legal. E.g. it would load a v4i8 as an i32 if i32 was legal. This patch extends that behavior to promoted integers as well as legal ones. If the integer type for the full vector width is TypePromoteInteger, the element type is going to be TypePromoteInteger too, and it's still better to use a single promoting load or truncating store rather than N individual promoting loads or truncating stores. E.g. if you have a v2i8 on a target where i16 is promoted to i32, it's better to load the v2i8 as an i16 rather than load both i8s individually. Original patch by Richard Sandiford. llvm-svn: 236528	2015-05-05 19:32:57 +00:00
Ulrich Weigand	c1708b2618	[SystemZ] Add vector intrinsics This adds intrinsics to allow access to all of the z13 vector instructions. Note that instructions whose semantics can be described by standard LLVM IR do not get any intrinsics. For each instructions whose semantics cannot (fully) be described, we define an LLVM IR target-specific intrinsic that directly maps to this instruction. For instructions that also set the condition code, the LLVM IR intrinsic returns the post-instruction CC value as a second result. Instruction selection will attempt to detect code that compares that CC value against constants and use the condition code directly instead. Based on a patch by Richard Sandiford. llvm-svn: 236527	2015-05-05 19:31:09 +00:00
Ulrich Weigand	5211f9ff4d	[SystemZ] Mark v1i128 and v1f128 as unsupported The ABI specifies that <1 x i128> and <1 x fp128> are supposed to be passed in vector registers. We do not yet support those types, and some infrastructure is missing before we can do so. In order to prevent accidentally generating code violating the ABI, this patch adds checks to detect those types and error out if user code attempts to use them. llvm-svn: 236526	2015-05-05 19:30:05 +00:00
Ulrich Weigand	cd2a1b5341	[SystemZ] Handle sub-128 vectors The ABI allows sub-128 vectors to be passed and returned in registers, with the vector occupying the upper part of a register. We therefore want to legalize those types by widening the vector rather than promoting the elements. The patch includes some simple tests for sub-128 vectors and also tests that we can recognize various pack sequences, some of which use sub-128 vectors as temporary results. One of these forms is based on the pack sequences generated by llvmpipe when no intrinsics are used. Signed unpacks are recognized as BUILD_VECTORs whose elements are individually sign-extended. Unsigned unpacks can have the equivalent form with zero extension, but they also occur as shuffles in which some elements are zero. Based on a patch by Richard Sandiford. llvm-svn: 236525	2015-05-05 19:29:21 +00:00
Ulrich Weigand	49506d78e7	[SystemZ] Add CodeGen support for scalar f64 ops in vector registers The z13 vector facility includes some instructions that operate only on the high f64 in a v2f64, effectively extending the FP register set from 16 to 32 registers. It's still better to use the old instructions if the operands happen to fit though, since the older instructions have a shorter encoding. Based on a patch by Richard Sandiford. llvm-svn: 236524	2015-05-05 19:28:34 +00:00
Ulrich Weigand	80b3af7ab3	[SystemZ] Add CodeGen support for v4f32 The architecture doesn't really have any native v4f32 operations except v4f32->v2f64 and v2f64->v4f32 conversions, with only half of the v4f32 elements being used. Even so, using vector registers for <4 x float> and scalarising individual operations is much better than generating completely scalar code, since there's much less register pressure. It's also more efficient to do v4f32 comparisons by extending to 2 v2f64s, comparing those, then packing the result. This particularly helps with llvmpipe. Based on a patch by Richard Sandiford. llvm-svn: 236523	2015-05-05 19:27:45 +00:00
Ulrich Weigand	cd808237b2	[SystemZ] Add CodeGen support for v2f64 This adds ABI and CodeGen support for the v2f64 type, which is natively supported by z13 instructions. Based on a patch by Richard Sandiford. llvm-svn: 236522	2015-05-05 19:26:48 +00:00
Ulrich Weigand	ce4c109585	[SystemZ] Add CodeGen support for integer vector types This the first of a series of patches to add CodeGen support exploiting the instructions of the z13 vector facility. This patch adds support for the native integer vector types (v16i8, v8i16, v4i32, v2i64). When the vector facility is present, we default to the new vector ABI. This is characterized by two major differences: - Vector types are passed/returned in vector registers (except for unnamed arguments of a variable-argument list function). - Vector types are at most 8-byte aligned. The reason for the choice of 8-byte vector alignment is that the hardware is able to efficiently load vectors at 8-byte alignment, and the ABI only guarantees 8-byte alignment of the stack pointer, so requiring any higher alignment for vectors would require dynamic stack re-alignment code. However, for compatibility with old code that may use vector types, when not using the vector facility, the old alignment rules (vector types are naturally aligned) remain in use. These alignment rules are not only implemented at the C language level (implemented in clang), but also at the LLVM IR level. This is done by selecting a different DataLayout string depending on whether the vector ABI is in effect or not. Based on a patch by Richard Sandiford. llvm-svn: 236521	2015-05-05 19:25:42 +00:00
Pete Cooper	05b84d4168	Revert "Fix IfConverter to handle regmask machine operands." This reverts commit b27413cbfd78d959c18e713bfa271fb69e6b3303 (ie r236515). This is to get the bots green while i investigate the failures. llvm-svn: 236517	2015-05-05 18:49:05 +00:00
Pete Cooper	6ebc207703	Fix IfConverter to handle regmask machine operands. A regmask (typically seen on a call) clobbers the set of registers it lists. The IfConverter, in UpdatePredRedefs, was handling register defs, but not regmasks. These are slightly different to a def in that we need to add both an implicit use and def to appease the machine verifier. Otherwise, uses after the if converted call could think they are reading an undefined register. Reviewed by Matthias Braun and Quentin Colombet. llvm-svn: 236515	2015-05-05 18:31:36 +00:00
Reid Kleckner	0738a9c02e	Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236360. This change exposed a bug in WinEHPrepare by opting win32 code into EH preparation. We already knew that WinEHPrepare has bugs, and is the status quo for x64, so I don't think that's a reason to hold off on this change. I disabled exceptions in the sanitizer tests in r236505 and an earlier revision. llvm-svn: 236508	2015-05-05 17:44:16 +00:00
Quentin Colombet	61b305edfd	[ShrinkWrap] Add (a simplified version) of shrink-wrapping. This patch introduces a new pass that computes the safe point to insert the prologue and epilogue of the function. The interest is to find safe points that are cheaper than the entry and exits blocks. As an example and to avoid regressions to be introduce, this patch also implements the required bits to enable the shrink-wrapping pass for AArch64. Context Currently we insert the prologue and epilogue of the method/function in the entry and exits blocks. Although this is correct, we can do a better job when those are not immediately required and insert them at less frequently executed places. The job of the shrink-wrapping pass is to identify such places. Motivating example Let us consider the following function that perform a call only in one branch of a if: define i32 @f(i32 %a, i32 %b) { %tmp = alloca i32, align 4 %tmp2 = icmp slt i32 %a, %b br i1 %tmp2, label %true, label %false true: store i32 %a, i32* %tmp, align 4 %tmp4 = call i32 @doSomething(i32 0, i32* %tmp) br label %false false: %tmp.0 = phi i32 [ %tmp4, %true ], [ %a, %0 ] ret i32 %tmp.0 } On AArch64 this code generates (removing the cfi directives to ease readabilities): _f: ; @f ; BB#0: stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething LBB0_2: ; %false mov sp, x29 ldp x29, x30, [sp], #16 ret With shrink-wrapping we could generate: _f: ; @f ; BB#0: cmp w0, w1 b.ge LBB0_2 ; BB#1: ; %true stp x29, x30, [sp, #-16]! mov x29, sp sub sp, sp, #16 ; =16 stur w0, [x29, #-4] sub x1, x29, #4 ; =4 mov w0, wzr bl _doSomething add sp, x29, #16 ; =16 ldp x29, x30, [sp], #16 LBB0_2: ; %false ret Therefore, we would pay the overhead of setting up/destroying the frame only if we actually do the call. Proposed Solution This patch introduces a new machine pass that perform the shrink-wrapping analysis (See the comments at the beginning of ShrinkWrap.cpp for more details). It then stores the safe save and restore point into the MachineFrameInfo attached to the MachineFunction. This information is then used by the PrologEpilogInserter (PEI) to place the related code at the right place. This pass runs right before the PEI. Unlike the original paper of Chow from PLDI’88, this implementation of shrink-wrapping does not use expensive data-flow analysis and does not need hack to properly avoid frequently executed point. Instead, it relies on dominance and loop properties. The pass is off by default and each target can opt-in by setting the EnableShrinkWrap boolean to true in their derived class of TargetPassConfig. This setting can also be overwritten on the command line by using -enable-shrink-wrap. Before you try out the pass for your target, make sure you properly fix your emitProlog/emitEpilog/adjustForXXX method to cope with basic blocks that are not necessarily the entry block. Design Decisions 1. ShrinkWrap is its own pass right now. It could frankly be merged into PEI but for debugging and clarity I thought it was best to have its own file. 2. Right now, we only support one save point and one restore point. At some point we can expand this to several save point and restore point, the impacted component would then be: - The pass itself: New algorithm needed. - MachineFrameInfo: Hold a list or set of Save/Restore point instead of one pointer. - PEI: Should loop over the save point and restore point. Anyhow, at least for this first iteration, I do not believe this is interesting to support the complex cases. We should revisit that when we motivating examples. Differential Revision: http://reviews.llvm.org/D9210 <rdar://problem/3201744> llvm-svn: 236507	2015-05-05 17:38:16 +00:00
Kit Barton	d4eb73c00e	This patch adds ABI support for v1i128 data type. It adds v1i128 to the appropriate register classes and checks parameter passing and return values. This is related to http://reviews.llvm.org/D9081, which will add instructions that exploit the v1i128 datatype. Phabricator review: http://reviews.llvm.org/D9475 llvm-svn: 236503	2015-05-05 16:10:44 +00:00
Daniel Sanders	eda60d217b	[mips] Generate code for insert/extract operations when using the N64 ABI and MSA. Summary: When using the N64 ABI, element-indices use the i64 type instead of i32. In many cases, we can use iPTR to account for this but additional patterns and pseudo's are also required. This fixes most (but not quite all) failures in the test-suite when using N64 and MSA together. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9342 llvm-svn: 236494	2015-05-05 10:32:24 +00:00
Daniel Sanders	4160c802d9	[mips][msa] Test basic operations for the N32 ABI too. Summary: This required adding instruction aliases for dneg. N64 will be enabled shortly but requires additional bugfixes. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9341 llvm-svn: 236489	2015-05-05 08:48:35 +00:00
Reid Kleckner	9dad227b85	[X86] Fix assertion while DAG combining offsets and ExternalSymbols ExternalSymbol nodes do not contain offsets, unlike GlobalValue nodes. llvm-svn: 236471	2015-05-04 23:22:36 +00:00
Sanjay Patel	ec2d7358b9	zap windows line endings; NFC llvm-svn: 236460	2015-05-04 21:27:27 +00:00
Tim Northover	851ff69b42	CodeGen: match up correct insertvalue indices when assessing tail calls. When deciding whether a value comes from the aggregate or inserted value of an insertvalue instruction, we compare the indices against those of the location we're interested in. One of the lists needs reversing because the input data is backwards (so that modifications take place at the end of the SmallVector), but we were reversing both before leading to incorrect results. Should fix PR23408 llvm-svn: 236457	2015-05-04 20:41:51 +00:00
Elena Demikhovsky	d41e506342	AVX-512: added a test for encoding by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236421	2015-05-04 12:59:15 +00:00
Elena Demikhovsky	60eb9db7bb	AVX-512: added calling convention for i1 vectors in 32-bit mode. Fixed some bugs in extend/truncate for AVX-512 target. Removed VBROADCASTM (masked broadcast) node, since it is not used any more. llvm-svn: 236420	2015-05-04 12:40:50 +00:00
Elena Demikhovsky	52266388f8	AVX-512: added integer "add" and "sub" instructions with saturation for SKX with intrinsics and tests by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236418	2015-05-04 12:35:55 +00:00
Elena Demikhovsky	2557a22be7	AVX-512: Added VPACK* instructions forms for KNL and SKX and their intrinsics by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 236414	2015-05-04 09:14:02 +00:00
Elena Demikhovsky	1b60ed7069	Masked gather and scatter intrinsics - enabled codegen for KNL. llvm-svn: 236394	2015-05-03 07:12:25 +00:00
Simon Pilgrim	017ca19384	[DAGCombiner] Enabled vector float/double -> int constant folding llvm-svn: 236387	2015-05-02 13:04:07 +00:00
Simon Pilgrim	e170a4f5fa	Line ending fix llvm-svn: 236386	2015-05-02 11:50:47 +00:00
Simon Pilgrim	7d6df82dd1	[SSE] Added vector int (i32 and i64) -> float/double conversion tests llvm-svn: 236385	2015-05-02 11:42:47 +00:00
Simon Pilgrim	6e3b7bad11	[SSE] Added vector float/double -> i32 and i64 conversion tests llvm-svn: 236384	2015-05-02 11:18:47 +00:00
Eric Christopher	a2d44dee73	Rework test to use FileCheck by making sure we have no xmm registers with numbers. llvm-svn: 236373	2015-05-02 01:06:17 +00:00
Reid Kleckner	83d89fa546	Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236359. Things are still broken despite testing. :( llvm-svn: 236360	2015-05-01 22:50:14 +00:00
Reid Kleckner	51476acd77	Re-land "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236340. llvm-svn: 236359	2015-05-01 22:40:25 +00:00
Colin LeMahieu	bb0d7cbee1	[Hexagon] r236351 fix does not work on builder configurations yet. llvm-svn: 236358	2015-05-01 22:39:20 +00:00
Quentin Colombet	0de2346859	[AArch64][FastISel] Variant of the logical instructions that use two input registers cannot write on SP. rdar://problem/20748715 llvm-svn: 236352	2015-05-01 21:34:57 +00:00
Colin LeMahieu	b662565475	[Hexagon] Adding expression MC emission and removing XFAIL from test that hits this code path. llvm-svn: 236348	2015-05-01 21:14:21 +00:00
Quentin Colombet	9df2fa261b	[AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences. rdar://problem/20748715 llvm-svn: 236346	2015-05-01 20:57:11 +00:00
Reid Kleckner	2747d3d55a	Revert "[WinEH] Add an EH registration and state insertion pass for 32-bit x86" This reverts commit r236339, it breaks the win32 clang-cl self-host. llvm-svn: 236340	2015-05-01 20:14:04 +00:00
Reid Kleckner	4856fc61b4	[WinEH] Add an EH registration and state insertion pass for 32-bit x86 This pass is responsible for constructing the EH registration object that gets linked into fs:00, which is all it does in this change. In the future, it will also insert stores to update the EH state number. I considered keeping this functionality in WinEHPrepare, but it's pretty separable and X86 specific. It has conceptually very little to do with the task of WinEHPrepare, which is currently outlining. WinEHPrepare is also in theory useful on ARM, but this logic is pretty x86 specific. Reviewers: andrew.w.kaylor, majnemer Differential Revision: http://reviews.llvm.org/D9422 llvm-svn: 236339	2015-05-01 20:04:54 +00:00
Peter Collingbourne	d27d3a151f	ARM: Align functions containing Thumb-2 jump tables to 4 bytes. Functions with jump tables need an alignment of 4 because they use the ADR instruction, which aligns the PC to 4 bytes before adding an offset. Differential Revision: http://reviews.llvm.org/D9424 llvm-svn: 236327	2015-05-01 18:05:59 +00:00
Simon Pilgrim	9fb06bca67	[SelectionDAG] Unary vector constant folding integer legality fixes This patch fixes issues with vector constant folding not correctly handling scalar input operands if they require implicit truncation - this was tested with llvm-stress as recommended by Patrik H Hagglund. The patch ensures that integer input scalars from a build vector are correctly truncated before folding, and that constant integer scalar results are promoted to a legal type before inclusion in the new folded build vector. I have added another crash test case and also a test for UINT_TO_FP / SINT_TO_FP using an non-truncated scalar input, which was failing before this patch. Differential Revision: http://reviews.llvm.org/D9282 llvm-svn: 236308	2015-05-01 08:20:04 +00:00
Tom Stellard	aa798340c3	R600/SI: Add VCC as an implict def of SI_KILL When SI_KILL has a register operand, its lowered form writes to vcc. llvm-svn: 236307	2015-05-01 03:44:09 +00:00
Tom Stellard	0b7feb1cb7	R600/SI: Fix verifier errors from the SIAnnotateControlFlow pass This pass was generating 'Instruction does not dominate all uses!' errors for programs which had loops with a condition variable that depended on the result of a phi instruction from outside of the loop. The pass was inserting new phi nodes outside of the loop which used values defined inside the loop. http://bugs.freedesktop.org/show_bug.cgi?id=90056 llvm-svn: 236306	2015-05-01 03:44:08 +00:00
Quentin Colombet	65b5b01d56	[ARM][TEST] Strengthen test against smarter reg alloc. Follow-up of r236247. rdar://problem/20770899 llvm-svn: 236296	2015-05-01 00:45:55 +00:00
Pete Cooper	2127b00cd5	[ARM] optimizeSelect should clear kill flags. If we move an instruction from one block down to a MOVC and predicate it, then the original instruction could be moved in to a loop. In this case, its invalid for any kill flags to remain on there. Fails with -verfy-machineinstrs. rdar://problem/20752113 llvm-svn: 236290	2015-04-30 23:57:47 +00:00
Pete Cooper	451755d370	Commute the internal flag on MachineOperands. When commuting a thumb instruction in the size reduction pass, thumb instructions are represented as a bundle and so some operands may be marked as internal. The internal flag has to move with the operand when commuting. This test is sensitive to register allocation so can't specifically check that this error was happening, but so long as it continues to pass with -verify then hopefully its still ok. rdar://problem/20752113 llvm-svn: 236282	2015-04-30 23:14:14 +00:00
Quentin Colombet	329fa890ba	[AArch64] Fix bad register class constraint in fast-isel for TST instruction. rdar://problem/20748715 llvm-svn: 236273	2015-04-30 22:27:20 +00:00
Pete Cooper	5111881cfc	Don't always apply kill flag in thumb2 ABS pseudo expansion. The expansion for t2ABS was always setting the kill flag on the rsb instruction. It should instead only be set on rsb if it was set on the original ABS instruction. rdar://problem/20752113 llvm-svn: 236272	2015-04-30 22:15:59 +00:00
Reid Kleckner	60d5232be2	[X86] Use 4 byte preferred aggregate alignment on Win32 This helps reduce the frequency of stack realignment prologues in 32-bit X86 Windows code. Before this change and the corresponding clang change, we would take the max of the type preferred alignment and the explicit alignment on the alloca. If you don't override aggregate alignment in datalayout, you get a default of 8. This dates back to 2007 / r34356, and changing it seems prohibitively difficult at this point. llvm-svn: 236270	2015-04-30 22:11:59 +00:00
Andrea Di Biagio	737a361006	Fix comment in test. NFC. llvm-svn: 236262	2015-04-30 21:22:28 +00:00
Andrea Di Biagio	c84b5bdd69	Fix for PR23103. Correctly propagate the 'IsUndef' flag to the register operands of a commuted instruction. Revision 220239 exposed a latent bug in method 'TargetInstrInfo::commuteInstruction'. When commuting the operands of a machine instruction, method 'commuteInstruction' didn't correctly propagate the 'IsUndef' flag to the register operands of the new (commuted) instruction. Before this patch, the following instruction: %vreg4<def> = VADDSDrr %vreg14, %vreg5<undef>; FR64:%vreg4,%vreg14,%vreg5 was wrongly converted by method 'commuteInstruction' into: %vreg4<def> = VADDSDrr %vreg5, %vreg14<undef>; FR64:%vreg4,%vreg5,%vreg14 The correct instruction should have been: %vreg4<def> = VADDSDrr %vreg5<undef>, %vreg14; FR64:%vreg4,%vreg5,%vreg14 This patch fixes the problem in method 'TargetInstrInfo::commuteInstruction'. When swapping the operands of a machine instruction, we now make sure that 'IsUndef' flags are correctly set. Added test case 'pr23103.ll'. Differential Revision: http://reviews.llvm.org/D9406 llvm-svn: 236258	2015-04-30 21:03:29 +00:00
Pete Cooper	4d8d2ec3eb	Don't rewrite jumps to empty BBs to landing pads. In the test case here, the 'unreachable' BB was removed by BranchFolding because its empty. It then rewrote the jump from 'entry' to jump to its fallthrough, which was a landing pad. This results in 'entry' jumping to 2 different landing pads, which fails the machine verifier. rdar://problem/20750162 llvm-svn: 236248	2015-04-30 18:58:23 +00:00
Quentin Colombet	0a905042cd	[ARM] Do not generate invalid encoding for stack adjust, even if this is just temporary. Because of that: 1. The machine verifier was complaining on such code. 2. The generate code worked just because the thumb reduction size pass fixed the opcode. rdar://problem/20749824 llvm-svn: 236247	2015-04-30 18:52:49 +00:00
Jan Vesely	808fff585b	Reinstate revisions r234755, r234759, r234760 changes: Don't apply on hexagon and NVPTX since they no longer claim to support UADDO/USUBO Add location to getConstant Drop comment about the ops being turned into expand llvm-svn: 236240	2015-04-30 17:15:56 +00:00
Daniel Sanders	59f89aa8ed	[mips][msa] Rename main check prefix to 'ALL' in basic operations tests. NFC Summary: The majority of the checks are subtarget independent. The few that aren't will be corrected shortly. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9340 llvm-svn: 236220	2015-04-30 09:57:37 +00:00
Daniel Sanders	fa159165be	[mips][msa] Use CHECK-LABEL where missing, and remove checks matching the .size directive. NFC. Summary: Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9339 llvm-svn: 236219	2015-04-30 09:56:30 +00:00
Daniel Sanders	90b059d555	[mips] Add missing signext attributes to MSA basic operations tests. NFC. Summary: This doesn't make much difference to MIPS32, but it will simplify a MIPS64r6 bugfix which will follow shortly by removing unnecessary sign-extension of parameters. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9338 llvm-svn: 236216	2015-04-30 09:24:09 +00:00
Simon Pilgrim	ecf5875bd5	[SSE] Fix for MUL v16i8 on pre-SSE41 targets (PR23369). Sign extension of i8 to i16 was placing the unpacked bytes in the lower byte instead of the upper byte. llvm-svn: 236209	2015-04-30 08:23:16 +00:00
Owen Anderson	d8a029c81b	Semantically revert r236031, which is not a good idea for in-order targets. At the least it should be guarded by some kind of target hook. It also introduced catastrophic compile time and code quality regressions on some out of tree targets (test case still being reduced/sanitized). Sanjay agreed with reverting this patch until these issues can be resolved. llvm-svn: 236199	2015-04-30 04:06:32 +00:00
Hans Wennborg	680a60f0d0	XFAIL test/CodeGen/Generic/MachineBranchProb.ll on Hexagon (PR23377) llvm-svn: 236196	2015-04-30 01:59:04 +00:00
Hans Wennborg	4b828d35fd	Switch lowering: use profile info to build weight-balanced binary search trees This will cause hot nodes to appear closer to the root. The literature says building the tree like this makes it a near-optimal (in terms of search time given key frequencies) binary search tree. In LLVM's case, we can do up to 3 comparisons in each leaf node, so it might be better to opt for lower tree height in some cases; that's something to look into in the future. Differential Revision: http://reviews.llvm.org/D9318 llvm-svn: 236192	2015-04-30 00:57:37 +00:00
Ahmed Bougacha	ace0593b42	Flip r236172 testcase RUN option ordering for BSD sed(1). NFC. llvm-svn: 236186	2015-04-30 00:07:34 +00:00
Pete Cooper	46361a1ea1	Change x86 CMOVE_F to read it source, not write it. This was breaking sqlite with the machine verifier because operand 0 was a def according to tablegen, but didn't have the 'isDef' flag set. Looking at the ISA, its clear that this operand is a source as writing to st(0) is implicit. So move the operand to the correct place in the td file. rdar://problem/20751584 llvm-svn: 236183	2015-04-29 23:51:33 +00:00
Reid Kleckner	bcda1cd45a	[WinEH] Start EH preparation for 32-bit x86, it uses no arguments 32-bit x86 MSVC-style exceptions are functionaly similar to 64-bit, but they take no arguments. Instead, they implicitly use the value of EBP passed in by the caller as a pointer to the parent's frame. In LLVM, we can represent this as llvm.frameaddress(1), and feed that into all of our calls to llvm.framerecover. The next steps are: - Add an alloca to the fs:00 linked list of handlers - Add something like llvm.sjlj.lsda or generalize it to store in the alloca - Move state number calculation to WinEHPrepare, arrange for FunctionLoweringInfo to call it - Use the state numbers to insert explicit loads and stores in the IR llvm-svn: 236172	2015-04-29 22:49:54 +00:00
Reid Kleckner	c695471365	[X86] Avoid mangling frameescape labels x86 Windows uses the '_' prefix for all global symbols, and this was mistakenly being applied to frameescape labels, which are not externally visible global symbols. They use the private global prefix 'L'. The right way to fix this is probably to stop masquerading this label as an ExternalSymbol and create a new SDNode type. These labels are not "external", and we know they will be resolved by assembly time. Having a custom SDNode type would allow us to do better X86 address mode matching, so it's probably worth doing eventually. llvm-svn: 236123	2015-04-29 16:46:01 +00:00
Duncan P. N. Exon Smith	a9308c49ef	IR: Give 'DI' prefix to debug info metadata Finish off PR23080 by renaming the debug info IR constructs from `MD` to `DI`. The last of the `DIDescriptor` classes were deleted in r235356, and the last of the related typedefs removed in r235413, so this has all baked for about a week. Note: If you have out-of-tree code (like a frontend), I recommend that you get everything compiling and tests passing with the previous commit before updating to this one. It'll be easier to keep track of what code is using the `DIDescriptor` hierarchy and what you've already updated, and I think you're extremely unlikely to insert bugs. YMMV of course. Back to this commit: I did this using the rename-md-di-nodes.sh upgrade script I've attached to PR23080 (both code and testcases) and filtered through clang-format-diff.py. I edited the tests for test/Assembler/invalid-generic-debug-node-*.ll by hand since the columns were off-by-three. It should work on your out-of-tree testcases (and code, if you've followed the advice in the previous paragraph). Some of the tests are in badly named files now (e.g., test/Assembler/invalid-mdcompositetype-missing-tag.ll should be 'dicompositetype'); I'll come back and move the files in a follow-up commit. llvm-svn: 236120	2015-04-29 16:38:44 +00:00
Vasileios Kalintiris	1249e74648	Mips fast-isel - handle functions which return i8 or i6 . Summary: Allow Mips fast-isel to handle functions which return i8/i16 signed/unsigned. Test Plan: Make check tests are forthcoming. Already passes test-suite at O0/O2 for Mips 32 r1/r2 Reviewers: dsanders, rkotler Subscribers: llvm-commits, rfuhler Differential Revision: http://reviews.llvm.org/D6765 llvm-svn: 236103	2015-04-29 14:17:14 +00:00
Daniel Sanders	301f937765	[mips] Correct 128-bit shifts on 64-bit targets. Summary: The existing code was correct for 32-bit GPR's but not 64-bit GPR's. It now accounts for both cases. Reviewers: vkalintiris Reviewed By: vkalintiris Subscribers: llvm-commits, mohit.bhakkad, sagar Differential Revision: http://reviews.llvm.org/D9337 llvm-svn: 236099	2015-04-29 12:28:58 +00:00
Tim Northover	e18d662201	ARM: fix peephole optimisation of TST We were trying to look through COPY instructions, but only to the next instruction in a BB and incorrectly anyway. The cases where that would actually be a good idea are rare enough (and not even tested!) that it's not worth trying to get right. rdar://20721342 llvm-svn: 236050	2015-04-28 22:03:55 +00:00
Andrew Kaylor	046f7b42f2	[WinEH] Split blocks at calls to llvm.eh.begincatch Differential Revision: http://reviews.llvm.org/D9311 llvm-svn: 236046	2015-04-28 21:54:14 +00:00
Sanjay Patel	2fbc4e5c49	transform fadd chains to increase parallelism This is a compromise: with this simple patch, we should always handle a chain of exactly 3 operations optimally, but we're not generating the optimal balanced binary tree for a longer sequence. In general, this transform will reduce the dependency chain for a sequence of instructions using N operands from a worst case N-1 dependent operations to N/2 dependent operations. The optimal balanced binary tree would reduce the chain to log2(N). The trade-off for not dealing with longer sequences is: (1) we have less complexity in the compiler, (2) we avoid unknown compile-time blowup calculating a balanced tree, and (3) we don't need to worry about the increased register pressure required to parallelize longer sequences. It also seems unlikely that we would ever encounter really long strings of dependent ops like that in the wild, but I'm not sure how to verify that speculation. FWIW, I see no perf difference for test-suite running on btver2 (x86-64) with -ffast-math and this patch. We can extend this patch to cover other associative operations such as fmul, fmax, fmin, integer add, integer mul. This is a partial fix for: https://llvm.org/bugs/show_bug.cgi?id=17305 and if extended: https://llvm.org/bugs/show_bug.cgi?id=21768 https://llvm.org/bugs/show_bug.cgi?id=23116 The issue also came up in: http://reviews.llvm.org/D8941 Differential Revision: http://reviews.llvm.org/D9232 llvm-svn: 236031	2015-04-28 21:03:22 +00:00
Tom Stellard	96301d2455	R600: Fix up for AsmPrinter's OutStreamer being a unique_ptr Fixes a crash with basically any OpenGL application using the radeonsi driver. Patch by: Michel Dänzer Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90176 Signed-off-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 236004	2015-04-28 17:37:03 +00:00
Justin Holewinski	3d2a976197	[NVPTX] Handle addrspacecast constant expressions in aggregate initializers We need to track if an AddrSpaceCast expression was seen when generating an MCExpr for a ConstantExpr. This change introduces a custom lowerConstant method to the NVPTX asm printer that will create NVPTXGenericMCSymbolRefExpr nodes at the appropriate places to encode the information that a given symbol needs to be casted to a generic address. llvm-svn: 236000	2015-04-28 17:18:30 +00:00
Elena Demikhovsky	1f7b3644d3	Fixed crash of variable shift inst on AVX2 https://llvm.org/bugs/show_bug.cgi?id=22955 llvm-svn: 235993	2015-04-28 14:46:35 +00:00
Elena Demikhovsky	ae51853924	AVX-512: Added "pandn" intrinsics set by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 235971	2015-04-28 08:12:42 +00:00
Hans Wennborg	67c03759e4	Switch lowering: Take branch weight into account when ordering for fall-through Previously, the code would try to put a fall-through case last, even if that meant moving a case with much higher branch weight further down the chain. Ordering by branch weight is most important, putting a fall-through block last is secondary. llvm-svn: 235942	2015-04-27 23:35:22 +00:00
Ahmed Bougacha	c004c60c0a	[AArch64] Also combine vector selects fed by non-i1 SETCCs. After legalization, scalar SETCC has an i32 result type on AArch64. The i1 requirement seems too conservative, replace it with an assert. This also means that we now can run after legalization. That should also be fine, since the ops legalizer runs again after each combine, and all types created all have the same sizes as the (legal) inputs. Exposed by r235917; while there, robustize its tests (bsl also uses the register it defines). llvm-svn: 235922	2015-04-27 21:43:12 +00:00
Ahmed Bougacha	89bba61c84	[AArch64] Don't assert when combining (v3f32 select (setcc f64)). When the setcc has f64 operands, we can't build a vector setcc mask to feed a vselect, because f64 doesn't divide v3f32 evenly. Just bail out when that happens. llvm-svn: 235917	2015-04-27 21:01:20 +00:00
Hans Wennborg	ba6d2568f9	Switch lowering: order bit tests by branch weight. llvm-svn: 235912	2015-04-27 20:21:17 +00:00
Bill Schmidt	fe723b9a6d	[PPC64LE] Remove unnecessary swaps from lane-insensitive vector computations This patch adds a new SSA MI pass that runs on little-endian PPC64 code with VSX enabled. Loads and stores of 4x32 and 2x64 vectors without alignment constraints are accomplished for little-endian using lxvd2x/xxswapd and xxswapd/stxvd2x. The existence of the additional xxswapd instructions hurts performance in comparison with big-endian code, but they are necessary in the general case to support correct semantics. However, the general case does not apply to most vector code. Many vector instructions are lane-insensitive; they do not "care" which lanes the parallel computations are performed within, provided that the resulting data is stored into the correct locations. Thus this pass looks for computations that perform only lane-insensitive operations, and remove the unnecessary swaps from loads and stores in such computations. Future improvements will allow computations using certain lane-sensitive operations to also be optimized in this manner, by modifying the lane-sensitive operations to account for the permuted order of the lanes. However, this patch only adds the infrastructure to permit this; no lane-sensitive operations are optimized at this time. This code is heavily exercised by the various vectorizing applications in the projects/test-suite tree. For the time being, I have only added one simple test case to demonstrate what the pass is doing. Although it is quite simple, it provides coverage for much of the code, including the special case handling of copies and subreg-to-reg operations feeding the swaps. I plan to add additional tests in the future as I fill in more of the "special handling" code. Two existing tests were affected, because they expected the swaps to be present, but they are now removed. llvm-svn: 235910	2015-04-27 19:57:34 +00:00
Elena Demikhovsky	a480ef5494	AVX-512: added calling conventions for i1 vectors. Fixed bug: https://llvm.org/bugs/show_bug.cgi?id=20724 llvm-svn: 235889	2015-04-27 15:11:19 +00:00
Brendon Cahoon	55bdeb7bc7	[Hexagon] Use constant extenders to fix up hardware loops Use a loop instruction with a constant extender for a hardware loop instruction that is too far away from the start of the loop. This is cheaper than changing the SA register value. Differential Revision: http://reviews.llvm.org/D9262 llvm-svn: 235882	2015-04-27 14:16:43 +00:00
Vasileios Kalintiris	7a6b18783f	Reapply "[mips][FastISel] Implement shift ops for Mips fast-isel."" This reapplies r235194, which was reverted in r235495 because it was causing a failure in our out-of-tree buildbots for MIPS. With the sign-extension patch in r235718, this patch doesn't cause any problem any more. llvm-svn: 235878	2015-04-27 13:28:05 +00:00
Elena Demikhovsky	d1084c5b3f	AVX-512: Extend/Truncate operations for SKX, SETCC for bit-vectors llvm-svn: 235875	2015-04-27 12:57:59 +00:00
Simon Pilgrim	4f683c264a	[X86][SSE] Add v16i8/v32i8 multiplication support Patch to allow int8 vectors to be multiplied on the SSE unit instead of being scalarized. The patch sign extends the i8 lanes to i16, uses the SSE2 pmullw multiplication instruction, then packs the lower byte from each result. Differential Revision: http://reviews.llvm.org/D9115 llvm-svn: 235837	2015-04-27 07:55:46 +00:00
Matt Arsenault	957bfc7458	R600: Remove / merge redundant testcases llvm-svn: 235813	2015-04-26 00:53:33 +00:00
Sanjay Patel	3eb5146b3c	add SSE run to check non-AVX codegen llvm-svn: 235809	2015-04-25 20:41:51 +00:00
Simon Pilgrim	aedd3c5160	line endings fix llvm-svn: 235800	2015-04-25 12:12:43 +00:00
Reid Kleckner	cfbfe6f29c	[SEH] Implement GetExceptionCode in __except blocks This introduces an intrinsic called llvm.eh.exceptioncode. It is lowered by copying the EAX value live into whatever basic block it is called from. Obviously, this only works if you insert it late during codegen, because otherwise mid-level passes might reschedule it. llvm-svn: 235768	2015-04-24 20:25:05 +00:00
David Blaikie	445e3fbc54	[opaque pointer type] Add textual IR support for explicit type parameter to the invoke instruction Same as r235145 for the call instruction - the justification, tradeoffs, etc are all the same. The conversion script worked the same without any false negatives (after replacing 'call' with 'invoke'). llvm-svn: 235755	2015-04-24 19:32:54 +00:00
Sundeep Kushwaha	5d41a6992d	[PATCH] [Hexagon] Adding a test case for calling convention. http://reviews.llvm.org/D9241 llvm-svn: 235754	2015-04-24 19:22:02 +00:00
Yaron Keren	de2e2b0214	Teach AArch64\lit.local.cfg the new triple names windows-gnu and windows-msvc. Tests were failing when built with -DLLVM_DEFAULT_TARGET_TRIPLE=i686-pc-windows-gnu. llvm-svn: 235733	2015-04-24 17:14:16 +00:00
Hans Wennborg	ec679a8b3b	Switch lowering: fix APInt overflow causing infinite loop / OOM llvm-svn: 235729	2015-04-24 16:53:55 +00:00
Reid Kleckner	2c3ccaacb7	[WinEH] Split the landingpad BB instead of cloning it This means we don't have to RAUW the landingpad instruction and landingpad BB, which is a nice win. llvm-svn: 235725	2015-04-24 16:22:19 +00:00
Jingyue Wu	312fd0242d	[NVPTX] Emits "generic()" depending on the original address space Summary: Fixes a bug in the NVPTX codegen. The code used to miss necessary "generic()" on aggregates of addrspacecasts. Test Plan: addrspacecast-gvar.ll Reviewers: eliben, jholewinski Reviewed By: jholewinski Subscribers: jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D9130 llvm-svn: 235689	2015-04-24 02:57:30 +00:00
Matt Arsenault	5e10016f03	R600/SI: Fix verifier error when producing v_madmk_f32 Copy the kill flags when swapping the operands. llvm-svn: 235687	2015-04-24 01:57:58 +00:00
Matthias Braun	e1a67412cf	R600/RegisterCoalescer: Enable more rematerialization/add missing testcase This enables the rematerialization of some R600 MOV instructions in the RegisterCoalescer and adds a testcase for r235668. llvm-svn: 235675	2015-04-24 00:25:50 +00:00
Reid Kleckner	5c5facc2ce	Re-commit "[SEH] Remove the old __C_specific_handler code now that WinEHPrepare works" This reverts commit r235617. r235649 should have addressed the problems. llvm-svn: 235667	2015-04-23 23:22:33 +00:00
Hal Finkel	d86e90abdd	[PowerPC] Use sync inst alias when printing So long as the choice between printing msync and sync is not ambiguous, we can print 'sync 0' and just 'sync'. llvm-svn: 235663	2015-04-23 23:05:08 +00:00
Tom Stellard	ff5cf0e1fd	R600: Correctly lower CONCAT_VECTOR nodes with more than 2 operands llvm-svn: 235662	2015-04-23 22:59:24 +00:00
Andrew Kaylor	20ae2a311f	[WinEH] Ignore filter clauses while mapping landing pad blocks. llvm-svn: 235656	2015-04-23 22:38:36 +00:00
Reid Kleckner	e3af86e9d9	[WinEH] Replace more lpad value uses with undef We were asserting on code like this: extern "C" unsigned long _exception_code(); void might_crash(unsigned long); void foo() { __try { might_crash(0); } __except(1) { might_crash(_exception_code()); } } Gtest and many other libraries get the exception code from the __except block. What's supposed to happen here is that EAX is live into the __except block, and it contains the exception code. Eventually we'll represent that as a use of the landingpad ehptr value, but for now we can replace it with undef. llvm-svn: 235649	2015-04-23 21:22:30 +00:00
Quentin Colombet	796d906e06	[MachineCopyPropagation] Handle undef flags conservatively so that we do not remove copies that are useful after breaking some hardware dependencies. In other words, handle this kind of situations conservatively by assuming reg2 is redefined by the undef flag. reg1 = copy reg2 = inst reg2<undef> reg2 = copy reg1 Copy propagation used to remove the last copy. This is incorrect because the undef flag on reg2 in inst, allows next passes to put whatever trashed value in reg2 that may help. In practice we end up with this code: reg1 = copy reg2 reg2 = 0 = inst reg2<undef> reg2 = copy reg1 This fixes PR21743. llvm-svn: 235647	2015-04-23 21:17:39 +00:00
Tom Stellard	8b0182af2f	R600/SI: Fix indirect addressing with a negative constant offset When the base register index of the vector plus the constant offset was less than zero, we were passing the wrong base register to the indirect addressing instruction. In this case, we need to set the base register to v0 and then add the computed (negative) index to m0. llvm-svn: 235641	2015-04-23 20:32:01 +00:00
Peter Collingbourne	167668f8c8	Thumb2: When applying branch optimizations, visit branches in reverse order. The order in which branches appear in ImmBranches is approximately their order within the function body. By visiting later branches first, we reduce the distance between earlier forward branches and their targets, making it more likely that the cbn?z optimization, which can only apply to forward branches, will succeed for those earlier branches. Differential Revision: http://reviews.llvm.org/D9185 llvm-svn: 235640	2015-04-23 20:31:35 +00:00
Peter Collingbourne	cfee5b04bc	ARM: When re-creating a branch via InsertBranch, preserve CPSR flags. In particular, this preserves the kill flag, which allows the Thumb2 cbn?z optimization to be applied in cases where a branch has been re-created after the live variables analysis pass, e.g. by the machine block placement pass. This appears to be low risk; a number of other targets seem to already be doing something similar, e.g. AArch64, PowerPC. Differential Revision: http://reviews.llvm.org/D9184 llvm-svn: 235639	2015-04-23 20:31:32 +00:00
Peter Collingbourne	6529523151	Thumb2: When optimizing for size, do not if-convert branches involving comparisons with zero. This allows the constant island pass to lower these branches to cbn?z instructions, resulting in a shorter instruction sequence. Differential Revision: http://reviews.llvm.org/D9183 llvm-svn: 235638	2015-04-23 20:31:30 +00:00
Peter Collingbourne	78f1ecc59c	ARM: When spilling extra registers for alignment, prefer low registers on all Thumb targets. This makes it more likely that we can use the 16-bit push and pop instructions on Thumb-2, saving around 4 bytes per function. Differential Revision: http://reviews.llvm.org/D9165 llvm-svn: 235637	2015-04-23 20:31:26 +00:00
Peter Collingbourne	1213918bf4	ARM: Only enforce 4-byte alignment on Thumb-2 functions with constant pools. This appears to have been introduced back in r76698 as part of an unrelated change. I can find no official ARM documentation stating that Thumb-2 functions require 4-byte alignment; in fact, ARM documentation appears to contradict this (see, e.g., ARM Architecture Reference Manual Thumb-2 Supplement, section 2.6.1: "Thumb-2 enforces 16-bit alignment on all instructions."). Also remove code that sets alignment for ARM functions, which is redundant with code in the MachineFunction constructor, and remove the hidden -arm-align-constant-islands flag, which has been enabled by default since r146739 (Dec 2011) and has probably received sufficient testing by now. Differential Revision: http://reviews.llvm.org/D9138 llvm-svn: 235636	2015-04-23 20:31:22 +00:00
Reid Kleckner	909ea7e6b8	Revert "[SEH] Remove the old __C_specific_handler code now that WinEHPrepare works" We still have some "uses remain after removal" issues in -O0 builds. This reverts commit r235557. llvm-svn: 235617	2015-04-23 18:34:01 +00:00
Hal Finkel	7c5cb066d0	[PowerPC] Enable printing instructions using aliases TableGen had been nicely generating code to print a number of instructions using shorter aliases (and PowerPC has plenty of short mnemonics), but we were not calling it. For some of the aliases we support in the parser, TableGen can't infer the "inverse" alias relationship, so there is still more to do. Thus, after some hours of updating test cases... llvm-svn: 235616	2015-04-23 18:30:38 +00:00
Pirama Arumuga Nainar	745615ca00	[AArch64] Add nvcast patterns for v4f16 and v8f16 Summary: Constant stores of f16 vectors can create NvCast nodes from various operand types to v4f16 or v8f16 depending on patterns in the stored constants. This patch adds nvcast rules with v4f16 and v8f16 values. AArchISelLowering::LowerBUILD_VECTOR has the details on which constant patterns generate the nvcast nodes. Reviewers: jmolloy, srhines, ab Subscribers: rengolin, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D9201 llvm-svn: 235610	2015-04-23 17:32:25 +00:00
Pirama Arumuga Nainar	b18815354d	[AArch64] Handle vec4, vec8, vec16 *itofp for half Summary: Set operation action for SINT_TO_FP and UINT_TO_FP nodes with v4i32, v8i8, v8i16 inputs to allow promotion of v4f16 results. Add tests for sitofp and uitofp for vec4, vec8, vec16, and i8, i16, i32, and i64 vectors. Only missing tests are for v16i8 and v16i16 as the shift operations are too complicated to write a proper check sequence. The conversions from v4i64 to v4f16 do not depend on this patch - v4i64 is split and the conversion gets handled while lowering v2i64. I am adding a test here for completeness. Reviewers: aemerson, rengolin, ab, jmolloy, srhines Subscribers: rengolin, aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D9166 llvm-svn: 235609	2015-04-23 17:16:27 +00:00
Hans Wennborg	0867b151c9	Re-commit r235560: Switch lowering: extract jump tables and bit tests before building binary tree (PR22262) Third time's the charm. The previous commit was reverted as a reverse for-loop in SelectionDAGBuilder::lowerWorkItem did 'I--' on an iterator at the beginning of a vector, causing asserts when using debugging iterators. This commit fixes that. llvm-svn: 235608	2015-04-23 16:45:24 +00:00
Sanjay Patel	f4b0f07430	use update_llc_test_checks.py to tighten checking; remove unnecessary CPU param llvm-svn: 235604	2015-04-23 16:07:50 +00:00
Krzysztof Parzyszek	876a19d855	[Hexagon] Shrink-wrap stack frame (Hexagon-specific) llvm-svn: 235603	2015-04-23 16:05:39 +00:00
Krzysztof Parzyszek	a17cebd219	[Hexagon] Add testcases for stack alignment and variable-sized objects llvm-svn: 235602	2015-04-23 15:12:49 +00:00
Aaron Ballman	0be238cebd	Revert r235560; this commit was causing several failed assertions in Debug builds using MSVC's STL. The iterator is being used outside of its valid range. llvm-svn: 235597	2015-04-23 13:41:59 +00:00
Simon Pilgrim	86b034bae9	[DAGCombiner] Remove extra bitcasts surrounding vector shuffles Patch to remove extra bitcasts from shuffles, this is often a legacy of XformToShuffleWithZero being used to combine bitmaskings (of float vectors bitcast to integer vectors) into shuffles: bitcast(shuffle(bitcast(s0),bitcast(s1))) -> shuffle(s0,s1) Differential Revision: http://reviews.llvm.org/D9097 llvm-svn: 235578	2015-04-23 08:43:13 +00:00
Andrew Kaylor	86e67f7ebc	[WinEH] Removing seh-filter.ll until I can determine its validity llvm-svn: 235566	2015-04-23 00:38:22 +00:00
Andrew Kaylor	43e1d76278	[WinEH] Don't skip landing pads that end with an unreachable instruction. llvm-svn: 235563	2015-04-23 00:20:44 +00:00
Hans Wennborg	15823d49b6	Switch lowering: extract jump tables and bit tests before building binary tree (PR22262) This is a re-commit of r235101, which also fixes the problems with the previous patch: - Switches with only a default case and non-fallthrough were handled incorrectly - The previous patch tickled a bug in PowerPC Early-Return Creation which is fixed here. > This is a major rewrite of the SelectionDAG switch lowering. The previous code > would lower switches as a binary tre, discovering clusters of cases > suitable for lowering by jump tables or bit tests as it went along. To increase > the likelihood of finding jump tables, the binary tree pivot was selected to > maximize case density on both sides of the pivot. > > By not selecting the pivot in the middle, the binary trees would not always > be balanced, leading to performance problems in the generated code. > > This patch rewrites the lowering to search for clusters of cases > suitable for jump tables or bit tests first, and then builds the binary > tree around those clusters. This way, the binary tree will always be balanced. > > This has the added benefit of decoupling the different aspects of the lowering: > tree building and jump table or bit tests finding are now easier to tweak > separately. > > For example, this will enable us to balance the tree based on profile info > in the future. > > The algorithm for finding jump tables is quadratic, whereas the previous algorithm > was O(n log n) for common cases, and quadratic only in the worst-case. This > doesn't seem to be major problem in practice, e.g. compiling a file consisting > of a 10k-case switch was only 30% slower, and such large switches should be rare > in practice. Compiling e.g. gcc.c showed no compile-time difference. If this > does turn out to be a problem, we could limit the search space of the algorithm. > > This commit also disables all optimizations during switch lowering in -O0. > > Differential Revision: http://reviews.llvm.org/D8649 llvm-svn: 235560	2015-04-22 23:14:56 +00:00
Reid Kleckner	64a2a6a473	[SEH] Remove the old __C_specific_handler code now that WinEHPrepare works This removes the -sehprepare flag and makes __C_specific_handler functions always to use WinEHPrepare. This was tested by building all of chromium_builder_tests and running a few tests that use SEH, but if something breaks, we can revert this. llvm-svn: 235557	2015-04-22 22:13:09 +00:00
Krzysztof Parzyszek	952d951418	[Hexagon] Some cleanup of instruction selection code llvm-svn: 235552	2015-04-22 21:17:00 +00:00
Reid Kleckner	fd7df284b8	[WinEH] Demote values and phis live across exception handlers up front In particular, this handles SSA values that are live out of a handler. The existing code only handles values that are live in to a handler. It also handles phi nodes in the block where normal control should resume after the end of a catch handler. When EH return points have phi nodes, we need to split the return edge. It is impossible for phi elimination to emit copies in the previous block if that block gets outlined. The indirectbr that we leave in the function is only notional, and is eliminated from the MachineFunction CFG early on. Reviewers: majnemer, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D9158 llvm-svn: 235545	2015-04-22 21:05:21 +00:00
Krzysztof Parzyszek	cd97c985c7	[Hexagon] Use A2_tfrsi for constant pool and jump table addresses llvm-svn: 235535	2015-04-22 18:25:53 +00:00
Pirama Arumuga Nainar	67e82482c0	Fix correctness check for test_vec_fpextend_double Summary: Remove the CHECK-DAG calls introduced in r235341, and add a comment that this test may break due to scheduling variations. This patch completes the fix discussed in http://reviews.llvm.org/D8804 Reviewers: dsanders, srhines Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9178 llvm-svn: 235530	2015-04-22 18:04:12 +00:00
Matt Arsenault	deaef8e24b	R600: Fix always inline pass breaking noinline functions No test since calls are not actually supported yet. llvm-svn: 235524	2015-04-22 17:10:44 +00:00
Sanjay Patel	cab567873f	[x86] Add store-folded memop patterns for vcvtps2ph Differential Revision: http://reviews.llvm.org/D7296 llvm-svn: 235517	2015-04-22 16:11:19 +00:00
Andrea Di Biagio	6cd2f42fac	[X86][AVX] Fix failure due to a missing ISel pattern to select VBROADCAST nodes (PR23259). This fixes a regression introduced at revision 218263. On AVX, if we optimize for size, a splat build_vector of a load is lowered into a VBROADCAST node. This is done even if the value type of the splat build_vector node is v2i64. Since AVX doesn't support v2f64/v2i64 broadcasts, revision 218263 added two extra tablegen patterns to allow selecting a VMOVDDUPrm from an X86VBroadcast where the scalar element comes from a loadi64/loadf64. However, revision 218263 forgot to add an extra fallback pattern for the case where we have a X86VBroadcast of a loadi64 with multiple uses. This patch adds the missing tablegen pattern in X86InstrSSE.td. This patch also adds an extra test to 'splat-for-size.ll' to verify that ISel doesn't crash with a 'fatal error in the backend' due to a missing AVX pattern to select v2i64 X86ISD::BROADCAST nodes. llvm-svn: 235509	2015-04-22 14:53:39 +00:00
Hal Finkel	0d49cf2645	[DAGCombine] Disable select(c, load,load) for indexed loads This turned up after r235333, but was a pre-existing bug. The optimization which transforms select(c, load, load) into a load of a select of the addresses does not handle indexed loads (pre/post inc/dec). However, it did not check for them either, leading to a crash if it tried to transform one of them. llvm-svn: 235497	2015-04-22 11:32:25 +00:00
Vasileios Kalintiris	e7508c9fc7	Revert "[mips][FastISel] Implement shift ops for Mips fast-isel." This reverts commit r235194. It was causing a failure in FastISel buildbots due to sign-extension issues. llvm-svn: 235495	2015-04-22 10:08:46 +00:00
James Molloy	cd2334e86e	[AArch64] Disable complex GEP optimization by default. Enough concerns were raised that this optimization is pessimising some code patterns. The obvious fix, to add a Reassociate run afterwards, causes even more pessimisation in some cases due to fewer complex addressing modes being matched. As there isn't a trivial fix for this, backing this out by default until someone gets a chance to fix the addressing mode matcher. llvm-svn: 235491	2015-04-22 09:11:38 +00:00
Lang Hames	65613a634a	[patchpoint] Add support for symbolic patchpoint targets to SelectionDAG and the X86 backend. The code generated for symbolic targets is identical to the code generated for constant targets, except that a relocation is emitted to fix up the actual target address at link-time. This allows IR and object files containing patchpoints to be cached across JIT-invocations where the target address may change. llvm-svn: 235483	2015-04-22 06:02:31 +00:00
Sanjay Patel	fe1365ac50	[x86] allow 64-bit extracted vector element integer stores on a 32-bit system With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system even though 64-bit integers are not legal types. So instead of producing this: pshufd $229, %xmm0, %xmm1 ## xmm1 = xmm0[1,1,2,3] movd %xmm0, (%eax) movd %xmm1, 4(%eax) We can do: movq %xmm0, (%eax) This is a fix for the problem noted in D7296. Differential Revision: http://reviews.llvm.org/D9134 llvm-svn: 235460	2015-04-22 00:24:30 +00:00
Reid Kleckner	f14787dad8	[WinEH] Correctly handle inlined __finally blocks with captures We should also teach the inliner to collapse framerecover of frameaddress of the current frame down to an alloca, but that can happen later. llvm-svn: 235459	2015-04-22 00:07:52 +00:00
Krzysztof Parzyszek	499bc5faa1	[Hexagon] Patterns for frame index with offset for isel llvm-svn: 235418	2015-04-21 21:28:03 +00:00
Reid Kleckner	d2a1a51996	Re-land r235154-r235156 under the existing -sehprepare flag Keep the old SEH fan-in lowering on by default for now, since projects rely on it. This will make it easy to test this change with a simple flag flip. llvm-svn: 235399	2015-04-21 18:23:57 +00:00
Matthias Braun	9e9e8b3230	X86: Match for X86ISD nodes in LowerBUILD_VECTOR instead of BUILD_VECTORCombine There doesn't seem to be a reason to perform this target ISD node matching in an DAGCombine, moving it to lowering fixes PR23296. Differential Revision: http://reviews.llvm.org/D9137 llvm-svn: 235394	2015-04-21 17:21:36 +00:00
Vasileios Kalintiris	32177d6bec	[mips] Optimize code generation for 64-bit variable shift instructions. Summary: The 64-bit version of the variable shift instructions uses the shift_rotate_reg class which uses a GPR32Opnd to specify the variable shift amount. With this patch we avoid the generation of a redundant SLL instruction for the variable shift instructions in 64-bit targets. Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7413 llvm-svn: 235376	2015-04-21 10:49:03 +00:00
Elena Demikhovsky	50b88ddb87	AVX-512: Added logical and arithmetic instructions for SKX by Asaf Badouh (asaf.badouh@intel.com) llvm-svn: 235375	2015-04-21 10:27:40 +00:00
Simon Pilgrim	398ce22b86	[X86][SSE] Provide execution domains for scalar floating point operations This is an updated version of Chandler's patch D7402 that got accepted but never committed, and has bit-rotted a bit since. I've updated the execution domain declarations to match the approach of the packed templates and also added some extra scalar unary tests. Differential Revision: http://reviews.llvm.org/D9095 llvm-svn: 235372	2015-04-21 08:40:22 +00:00
Simon Pilgrim	860f08779c	CONCAT_VECTOR of BUILD_VECTOR - minor fix Fixed issue with the combine of CONCAT_VECTOR of 2 BUILD_VECTOR nodes - the optimisation wasn't ensuring that the scalar operands of both nodes were the same type/size for implicit truncation. Test case spotted by Patrik Hagglund llvm-svn: 235371	2015-04-21 08:05:43 +00:00
Pawel Bylica	57c2f7c756	Fix generic shift expansion when shift amount is 0 Summary: This fixes http://llvm.org/bugs/show_bug.cgi?id=16439. This is one possible way to approach this. The other would be to split InL>>(nbits-Amt) into (InL>>(nbits-1-Amt))>>1, which is also valid since since we only need to care about Amt up nbits-1. It's hard to tell which one is better since the shift might be expensive if this stage of expansion is not yet a legal machine integer, whereas comparisons with zero are relatively cheap at all sizes, but more expensive than a shift if the shift is on a legal machine type. Patch by Keno Fischer! Test Plan: regression test from http://reviews.llvm.org/D7752 Reviewers: chfast, resistor Reviewed By: chfast, resistor Subscribers: sanjoy, resistor, chfast, llvm-commits Differential Revision: http://reviews.llvm.org/D4978 llvm-svn: 235370	2015-04-21 06:28:36 +00:00
Matthias Braun	b6b5aaad98	X86: Do not select X86 custom vector nodes if operand types don't match X86ISD::ADDSUB, X86ISD::(F)HADD, X86ISD::(F)HSUB should not be selected if the operand types do not match the result type because vector type legalization cannot deal with this for custom nodes. Testcase X86ISD::ADDSUB is attached. I could not create a testcase for the FHADD/FHSUB cases because of: https://llvm.org/bugs/show_bug.cgi?id=23296 Differential Revision: http://reviews.llvm.org/D9120 llvm-svn: 235367	2015-04-21 01:13:41 +00:00
Pirama Arumuga Nainar	80f958dbf4	Fix flakiness in fp16-promote.ll Summary: In the f16-promote test, make the checks for native conversion instructions similar to the libcall checks: - Remove hard coded register names - Do not check exact instruction sequences. This fixes test flakiness due to non-determinism in instruction scheduling and register allocation. I also fixed a few minor things in the CHECK-LIBCALL checks. I'll try to find a way to check that unnecessary loads, stores, or conversions don't happen. Reviewers: mzolotukhin, srhines, ab Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D9112 llvm-svn: 235363	2015-04-20 23:54:41 +00:00
Sanjay Patel	362f89cd46	use update_llc_test_checks.py to tighten checking Also, replace win and linux runs with a generic run because that makes no difference in what this test is checking. llvm-svn: 235361	2015-04-20 23:31:53 +00:00
Andrew Kaylor	41758517bf	[WinEH] Fix problem with mapping shared empty handler blocks. Differential Revision: http://reviews.llvm.org/D9125 llvm-svn: 235354	2015-04-20 22:04:09 +00:00
Olivier Sallenave	b99c2eb0f0	Refactoring and enhancement to FMA combine. llvm-svn: 235344	2015-04-20 20:29:40 +00:00
Andrew Kaylor	3ae6251ceb	Fixing line endings llvm-svn: 235342	2015-04-20 20:27:28 +00:00
Pirama Arumuga Nainar	34056dea1b	[MIPS] OperationAction for FP_TO_FP16, FP16_TO_FP Summary: Set operation action for FP16 conversion opcodes, so the Op legalizer can choose the gnu_* libcalls for Mips. Set LoadExtAction and TruncStoreAction for f16 scalars and vectors to prevent (fpext (load )) and (store (fptrunc)) from getting combined into unsupported operations. Added test cases to test that these operations are handled correctly for f16 scalars and vectors. This patch depends on http://reviews.llvm.org/D8755. Reviewers: srhines Subscribers: llvm-commits, ab Differential Revision: http://reviews.llvm.org/D8804 llvm-svn: 235341	2015-04-20 20:15:36 +00:00
Tom Stellard	69a7b91e95	DAGCombine: Remove redundant NaN checks around ISD::FSQRT This folds: (select (setcc x, -0.0, *lt), NaN, (fsqrt x)) -> ( fsqrt x) llvm-svn: 235333	2015-04-20 19:38:27 +00:00
Andrea Di Biagio	98c367093d	[X86][FastIsel] Fix assertion failure when selecting int-to-double conversion (PR23273). This fixes a regression introduced at revision 231243. The target-independent selection algorithm in FastISel knows how to select a SINT_TO_FP if the target is SSE but not AVX. That is because on X86, the tablegen'd 'fastEmit' functions know how to select CVTSI2SSrr and CVTSI2SDrr. Method X86FastISel::X86SelectSIToFP was therefore working under the wrong assumption that the target was AVX. That assumption was incorrect since we can have a target that is neither AVX nor SSE. So, rather than asserting for the presence of AVX, we should have had an early exit from 'X86SelectSIToFP' if the target was not AVX. This patch fixes the issue replacing the invalid assertion with an early exit. Thanks to Dimitry Andric for reporting this problem and for providing a small reproducible testcase. Added test pr23273.ll. llvm-svn: 235295	2015-04-20 11:56:59 +00:00
Hal Finkel	1e5733bbed	[InlineAsm] Remove EarlyClobber on registers that are also inputs When an inline asm call has an output register marked as early-clobber, but that same register is also an input operand, what should we do? GCC accepts this, and is documented to accept this for read/write operands saying, "Furthermore, if the earlyclobber operand is also a read/write operand, then that operand is written only after it's used." For write-only operands, the situation seems less clear, but I have at least one existing codebase that assumes this will work, in part because it has syscall macros like this: ({ \ register uint64_t r0 __asm__ ("r0") = (__NR_ ## name); \ register uint64_t r3 __asm__ ("r3") = ((uint64_t) (arg0)); \ register uint64_t r4 __asm__ ("r4") = ((uint64_t) (arg1)); \ register uint64_t r5 __asm__ ("r5") = ((uint64_t) (arg2)); \ __asm__ __volatile__ \ ("sc" \ : "=&r"(r0),"=&r"(r3),"=&r"(r4),"=&r"(r5) \ : "0"(r0), "1"(r3), "2"(r4), "3"(r5) \ : "r6","r7","r8","r9","r10","r11","r12","cr0","memory"); \ r3; \ }) Furthermore, with register aliases and subregister relationships that only the backend knows about, rejecting this in the frontend seems like a difficult proposition (if we wanted to do so). However, keeping the early-clobber flag on the INLINEASM MI does not work for us, because it will cause the register's live interval to end to soon (so it will not appear defined to be used as an input). Fortunately, fixing this does not seem hard: When forming the INLINEASM MI, check to see if any of the early-clobber outputs are also inputs, and if so, remove the early-clobber flag. llvm-svn: 235283	2015-04-20 00:01:30 +00:00
Simon Pilgrim	749953eebb	[X86][SSE] Fix for getScalarValueForVectorElement to detect scalar sources requiring truncation. The fix ensures that scalar sources inserted into a vector are the correct bit size. Integer scalar sources from BUILD_VECTOR and SCALAR_TO_VECTOR nodes may require truncation that this function doesn't currently support. llvm-svn: 235281	2015-04-19 22:16:49 +00:00
Simon Pilgrim	4c107b5258	[X86][SSE] Extended copysign tests to include llvm intrinsic implementation and constant folding. llvm-svn: 235279	2015-04-19 21:34:57 +00:00
Simon Pilgrim	6292d50eda	[X86][AVX2] Force execution domain on broadcast folding tests. llvm-svn: 235260	2015-04-18 21:24:16 +00:00
Simon Pilgrim	e68c8606ed	[X86][SSE] Force execution domain on float/double unpack shuffle tests. llvm-svn: 235259	2015-04-18 18:50:55 +00:00
Ahmed Bougacha	279e3ee954	[GlobalMerge] Look at uses to create smaller global sets. Instead of merging everything together, look at the users of GlobalVariables, and try to group them by function, to create sets of globals used "together". Using that information, a less-aggressive alternative is to keep merging everything together except globals that are only ever used alone, that is, those for which it's clearly non-profitable to merge with others. In my testing, grouping by Function is too aggressive, but grouping by BasicBlock is too conservative. Anything in-between isn't trivially available, so stick with Function grouping for now. cl::opts are added for testing; both enabled by default. A few of the testcases aren't testing the merging proper, but just various edge cases when merging does occur. Update them to use the previous grouping behavior. Also, one of the tests is unrelated to GlobalMerge; change it accordingly. While there, switch to r234666' flags rather than the brutal -O3. Differential Revision: http://reviews.llvm.org/D8070 llvm-svn: 235249	2015-04-18 01:21:58 +00:00
Ahmed Bougacha	e14a4d487e	[AArch64] Don't force MVT::Untyped when selecting LD1LANEpost. The result is either an Untyped reg sequence, on ldN with N > 1, or just the type of the input vector, on ld1. Don't force Untyped. Instead, just use the type of the reg sequence. This mirrors the behavior of createTuple, which feeds the LD1*_POST. The narrow code path wasn't actually covered by tests, because V64 insert_vector_elt are widened to V128 before the LD1LANEpost combine has the chance to run, usually. The only case where it does run on V64 vectors is if the vector ops legalizer ran. So, tickle the code with a ctpop. Fixes PR23265. llvm-svn: 235243	2015-04-17 23:43:33 +00:00
Ahmed Bougacha	28fc7ca53f	Fix another typo in r235224 testcase. NFC. Third time's the charm! llvm-svn: 235242	2015-04-17 23:38:46 +00:00
Andrew Kaylor	ea8df61d4d	[WinEH] Fixes for a few cppeh failures. Differential Review: http://reviews.llvm.org/D9065 llvm-svn: 235239	2015-04-17 23:05:43 +00:00
Pete Cooper	2bbbd8b543	AArch64: Add test for returning [2 x i64] in registers. NFC. llvm-svn: 235228	2015-04-17 21:31:25 +00:00
Ahmed Bougacha	84b801f6b4	Fix typo in r235224 testcase. NFC. llvm-svn: 235226	2015-04-17 21:11:58 +00:00
Ahmed Bougacha	2448ef5f33	[AArch64] Avoid vector->load dependency cycles when creating LD1post. They would break the SelectionDAG. Note that the opposite load->vector dependency is already obvious in: (LD1post vec, ..) llvm-svn: 235224	2015-04-17 21:02:30 +00:00
Pirama Arumuga Nainar	db7c07e2bf	Add support to promote f16 to f32 Summary: This patch adds legalization support to operate on FP16 as a load/store type and do operations on it as floats. Tests for ARM are added to test/CodeGen/ARM/fp16-promote.ll Reviewers: srhines, t.p.northover Differential Revision: http://reviews.llvm.org/D8755 llvm-svn: 235215	2015-04-17 18:36:25 +00:00
Vasileios Kalintiris	816ea84e7a	[mips][FastISel] Implement FastMaterializeAlloca in Mips fast-isel. Summary: Implement the method FastMaterializeAlloca in Mips fast-isel Based on a patch by Reed Kotler. Test Plan: Passes test-suite at O0/O2 for mips32 r1/r2 fastalloca.ll Reviewers: dsanders, rkotler Subscribers: rfuhler, llvm-commits Differential Revision: http://reviews.llvm.org/D6742 llvm-svn: 235213	2015-04-17 17:29:58 +00:00
Sanjay Patel	2161c49a4e	[X86, AVX] add an exedepfix entry for vmovq == vmovlps == vmovlpd This is the AVX extension of r235014: http://llvm.org/viewvc/llvm-project?view=revision&revision=235014 Review: http://reviews.llvm.org/D8691 llvm-svn: 235210	2015-04-17 17:02:37 +00:00
Vasileios Kalintiris	a4035e6284	[mips][FastISel] Implement shift ops for Mips fast-isel. Summary: Add shift operators implementation to fast-isel for Mips. These are shift ops for non legal forms, i.e. i8 and i16. Based on a patch by Reed Kotler. Test Plan: Reviewers: dsanders Subscribers: echristo, rfuhler, llvm-commits Differential Revision: http://reviews.llvm.org/D6726 llvm-svn: 235194	2015-04-17 14:29:21 +00:00
James Molloy	a4ff7b2713	Fix TRUNCATE splitting helper logic. This is a followon to r233681 - I'd misunderstood the semantics of FTRUNC, and had confused it with (FP_ROUND ..., 0). Thanks for Ahmed Bougacha for his post-commit review! llvm-svn: 235191	2015-04-17 13:51:40 +00:00
Vasileios Kalintiris	bb60cfb5c4	[mips] Teach the delay slot filler to remove needless KILL instructions. Summary: Previously, the presence of KILL instructions would block valid candidates from filling a specific delay slot. With the elimination of the KILL instructions, in the appropriate range, we are able to fill more slots and keep the information from future def/use analysis consistent. Reviewers: dsanders Reviewed By: dsanders Subscribers: hfinkel, llvm-commits Differential Revision: http://reviews.llvm.org/D7724 llvm-svn: 235183	2015-04-17 12:01:02 +00:00
Nico Weber	a762fa6c98	Revert r235154-r235156, they cause asserts when building win64 code (http://crbug.com/477988 ) llvm-svn: 235170	2015-04-17 09:10:43 +00:00
Reid Kleckner	56cc879ac1	Fix test failure due to racing commits It looks like r235145 changed the .ll syntax for variadic calls. Update tests to use the new syntax. llvm-svn: 235156	2015-04-17 01:09:53 +00:00
Reid Kleckner	d4523e3c51	[SEH] Reimplement x64 SEH using WinEHPrepare This now emits simple, unoptimized xdata tables for __C_specific_handler based on the handlers listed in @llvm.eh.actions calls produced by WinEHPrepare. This adds support for running __finally blocks when exceptions are thrown, and removes the old landingpad fan-in codepath. I ran some manual execution tests on small basic test cases with and without optimization, as well as on Chrome base_unittests, which uses a small amount of SEH. I'm sure there are bugs, and we may need to revert. llvm-svn: 235154	2015-04-17 01:01:27 +00:00
Ahmed Bougacha	941420d9ea	[AArch64] Don't assert on f16 in DUP PerfectShuffle generator. Found by code inspection, but breaking i16 at least breaks other tests. They aren't checking this in particular though, so also add some explicit tests for the already working types. llvm-svn: 235148	2015-04-16 23:57:07 +00:00

... 3 4 5 6 7 ...

12912 Commits