llvm-project

Commit Graph

Author	SHA1	Message	Date
Yonghong Song	6621cf67cf	bpf: fix bug on silently truncating 64-bit immediate We came across an llvm bug when compiling some testcases that 64-bit immediates are silently truncated into 32-bit and then packed into BPF_JMP \| BPF_K encoding. This caused comparison with wrong value. This bug looks to be introduced by r308080. The Select_Ri pattern is supposed to be lowered into J_Ri while the latter only support 32-bit immediate encoding, therefore Select_Ri should have similar immediate predicate check as what J_Ri are doing. Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 315889	2017-10-16 04:14:53 +00:00
Hiroshi Inoue	e3a3e3c9e9	[PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass. If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated. One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI. For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated. void int_func(int); void ii_test(int a) { if (a & 1) return int_func(a); } Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG. Differential Revision: https://reviews.llvm.org/D31319 llvm-svn: 315888	2017-10-16 04:12:57 +00:00
Daniel Sanders	ea8711b88e	Re-commit r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook. At this point, we can import the simplests G_LOAD rules and select load instructions using them. Further patches will support for the predicates to enable additional loads as well as the stores. The previous commit failed on MSVC due to a failure to convert an initializer_list to a std::vector. Hopefully, MSVC will accept this version. Depends on D37457 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D37458 llvm-svn: 315887	2017-10-16 03:36:29 +00:00
Daniel Sanders	ce72d611af	Revert r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* MSVC doesn't like one of the constructors. llvm-svn: 315886	2017-10-16 02:15:39 +00:00
Daniel Sanders	6735ea86cd	[globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook. At this point, we can import the simplests G_LOAD rules and select load instructions using them. Further patches will support for the predicates to enable additional loads as well as the stores. Depends on D37457 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D37458 llvm-svn: 315885	2017-10-16 01:16:35 +00:00
Daniel Sanders	a71f454765	[globalisel][tablegen] Implement unindexed load, non-extending load, and MemVT checks Summary: This includes some context-sensitivity in the MVT to LLT conversion so that pointer types are tested correctly. FIXME: I'm not happy with the way this is done since everything is a special-case. I've yet to find a reasonable way to implement it. select-load.mir fails because <1 x s64> loads in tablegen get priority over s64 loads. This is fixed in the next patch and as such they should be committed together, I've posted them separately to help with the review. Depends on D37456 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D37457 llvm-svn: 315884	2017-10-16 00:56:30 +00:00
Craig Topper	a5af4a64d0	[AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering. Summary: This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes. There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8. Reviewers: RKSimon, zvi, delena Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38714 llvm-svn: 315860	2017-10-15 16:41:17 +00:00
Amjad Aboud	c8d67979c0	[X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565 Differential Revision: https://reviews.llvm.org/D38359 llvm-svn: 315851	2017-10-15 11:00:56 +00:00
Craig Topper	a9cd59fb5d	[X86] Lower vselect with constant condition to vector_shuffle even with AVX512 instructions. Summary: It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register. It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day. Reviewers: RKSimon, delena, zvi Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38932 llvm-svn: 315849	2017-10-15 06:39:07 +00:00
Craig Topper	f02e97859b	[X86] Don't use constant condition for select instruction when testing masking ops. We should be able to fold constant conditions by converting to shuffles, but fixing that would break these tests in their current form. Since they are really trying to test masking ops, add a non-constant mask to the selects. llvm-svn: 315848	2017-10-15 06:05:50 +00:00
Konstantin Zhuravlyov	263f7f6676	AMDGPU: Temporary disable pal metadata check line in llvm-readobj test It fails on mips llvm-svn: 315837	2017-10-14 23:42:11 +00:00
Craig Topper	dfb443e88c	[X86] Remove a bunch of dead FileCheck lines with the wrong prefix. llvm-svn: 315828	2017-10-14 21:46:55 +00:00
Simon Pilgrim	36fe00ee17	[X86][SSE] Don't attempt to reduce the imul vector width of odd sized vectors (PR34947) llvm-svn: 315825	2017-10-14 19:57:19 +00:00
Simon Pilgrim	3f49b988e0	[X86][SSE] Test vector imul reduction on 32 and 64-bit targets llvm-svn: 315824	2017-10-14 19:46:08 +00:00
Konstantin Zhuravlyov	a01d8b0b63	AMDGPU: Bring HSA metadata on par with the specification Differential Revision: https://reviews.llvm.org/D38753 llvm-svn: 315821	2017-10-14 19:03:51 +00:00
Konstantin Zhuravlyov	b3c605d680	llvm-readobj: Print AMDGPU note contents Differential Revision: https://reviews.llvm.org/D38752 llvm-svn: 315819	2017-10-14 18:21:42 +00:00
Simon Pilgrim	5bd4431aec	Cleanup update_llc_test_checks.py notes. llvm-svn: 315817	2017-10-14 17:37:03 +00:00
Konstantin Zhuravlyov	7b4be1ed89	AMDGPU: Cleanup elf-notes.ll test llvm-svn: 315816	2017-10-14 17:36:53 +00:00
Konstantin Zhuravlyov	716af741e9	llvm-readobj: Print AMDGPU note type names Differential Revision: https://reviews.llvm.org/D38751 llvm-svn: 315813	2017-10-14 16:43:46 +00:00
Konstantin Zhuravlyov	eda425edd4	AMDGPU: Do not emit deprecated notes for code object v3 Differential Revision: https://reviews.llvm.org/D38749 llvm-svn: 315810	2017-10-14 15:59:07 +00:00
Konstantin Zhuravlyov	9c05b2bc3b	AMDGPU: Add support for isa version note - Emit NT_AMD_AMDGPU_ISA - Add assembler parsing for isa version directive - If isa version directive does not match command line arguments, then return error Differential Revision: https://reviews.llvm.org/D38748 llvm-svn: 315808	2017-10-14 15:40:33 +00:00
Simon Pilgrim	f367c27d2d	[X86][SSE] Support combining AND(EXTRACT(SHUF(X)), C) -> EXTRACT(SHUF(X)) If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into shuffle. Fixes the last issue with PR22415 llvm-svn: 315807	2017-10-14 15:01:36 +00:00
Craig Topper	f7e777763d	[X86] Add patterns for vzmovl+cvtpd2dq/cvttpd2dq with a load. llvm-svn: 315802	2017-10-14 07:04:48 +00:00
Craig Topper	61010a85b8	[X86] Add AVX512 versions of VCVTPD2PS to load folding tables. llvm-svn: 315801	2017-10-14 05:55:43 +00:00
Craig Topper	ee277e190c	[X86] Add patterns for vzmovl+cvtpd2ps with a load. llvm-svn: 315800	2017-10-14 05:55:42 +00:00
Craig Topper	134241e4af	[X86] Add AVX512 flavors of VCVTDQ2PD plus VCVTUDQ2PD to the load folding tables. llvm-svn: 315796	2017-10-14 04:18:08 +00:00
Daniel Sanders	bfa9e2cae7	[globalisel][tablegen] Simplify named operand/operator lookups and fix a wrong-code bug this revealed. Summary: Operand variable lookups are now performed by the RuleMatcher rather than searching the whole matcher hierarchy for a match. This revealed a wrong-code bug that currently affects ARM and X86 where patterns that use a variable more than once in the match pattern will be imported but won't check that the operands are identical. This can cause the tablegen-erated matcher to accept matches that should be rejected. Depends on D36569 Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Subscribers: aemerson, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D36618 llvm-svn: 315780	2017-10-14 00:31:58 +00:00
Craig Topper	f6c69564e7	[X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations. For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP. We may be able to use this for AVX1 as well which would allow us to remove more isel patterns. I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP. Differential Revision: https://reviews.llvm.org/D38836 llvm-svn: 315768	2017-10-13 21:56:48 +00:00
Craig Topper	526b70a089	[X86] Use fsub in the movddup scheduling tests to prevent a future patch from folding movddup as a broadcast load. llvm-svn: 315767	2017-10-13 21:56:45 +00:00
Daniel Sanders	11300cead8	[globalisel][tablegen] Add support for fpimm and import of APInt/APFloat based ImmLeaf. Summary: There's only a tablegen testcase for IntImmLeaf and not a CodeGen one because the relevant rules are rejected for other reasons at the moment. On AArch64, it's because there's an SDNodeXForm attached to the operand. On X86, it's because the rule either emits multiple instructions or has another predicate using PatFrag which cannot easily be supported at the same time. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D36569 llvm-svn: 315761	2017-10-13 21:28:03 +00:00
Matt Arsenault	e11d8aca77	AMDGPU: Implement hasBitPreservingFPLogic llvm-svn: 315754	2017-10-13 21:10:22 +00:00
Matt Arsenault	550c66d10f	AMDGPU: Look for src mods before fp_extend When selecting modifiers for mad_mix instructions, look at fneg/fabs that occur before the conversion. llvm-svn: 315748	2017-10-13 20:45:49 +00:00
Matt Arsenault	4d70754e3c	AMDGPU: Implement isFPExtFoldable This helps match v_mad_mix* in some cases. llvm-svn: 315744	2017-10-13 20:18:59 +00:00
Krzysztof Parzyszek	7c9c05888c	[Hexagon] Minimize number of repeated constant extenders Each constant extender requires an extra instruction, which adds to the code size and also reduces the number of available slots in an instruction packet. In most cases, the value of a repeated constant extender could be loaded into a register, and the instructions using the extender could be replaced with their counterparts that use that register instead. This patch adds a pass that tries to reduce the number of constant extenders, including extenders which differ only in an immediate offset known at compile time, e.g. @global and @global+12. llvm-svn: 315735	2017-10-13 19:02:59 +00:00
Craig Topper	5d692917f4	[X86] Add initial skeleton support for knm cpu This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it. Differential Revision: https://reviews.llvm.org/D38811 llvm-svn: 315722	2017-10-13 18:10:17 +00:00
Simon Pilgrim	c4977fa9a1	[X86] Test scalar integer absolutes on 32-bit targets with/without CMOV llvm-svn: 315711	2017-10-13 17:09:20 +00:00
Reid Kleckner	be3724b5e1	Not all buildbots seem to dump the nuw flag in SDAG llvm-svn: 315710	2017-10-13 17:00:49 +00:00
Simon Pilgrim	df9611e178	[X86] Updated scalar integer absolute tests to cover i8/i16/i32/i64 llvm-svn: 315706	2017-10-13 16:53:07 +00:00
Reid Kleckner	c687a34870	Update test to expect nuw flag in SDAG dump, fixes test after r315690 llvm-svn: 315698	2017-10-13 16:13:23 +00:00
Krzysztof Parzyszek	a0f2f7c413	[Hexagon] Add patterns for cmpb/cmph with immediate arguments Patch by Sumanth Gundapaneni. llvm-svn: 315692	2017-10-13 15:43:12 +00:00
Craig Topper	bf0de9d3b6	[X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead. There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table. llvm-svn: 315674	2017-10-13 06:07:10 +00:00
Craig Topper	11655b22dc	[X86] Add the test case for r315613 that I forgot to 'git add'. llvm-svn: 315649	2017-10-13 00:20:47 +00:00
Wei Ding	5676acad9e	Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. Differential Revision: http://reviews.llvm.org/D37348 llvm-svn: 315610	2017-10-12 19:37:14 +00:00
Craig Topper	3dc37cc592	[X86] Add a bunch of -mcpu strings to the cpus.ll test. We were missing most of the "core" aliases as well as skylake, cannonlake, and knights landing. llvm-svn: 315606	2017-10-12 18:55:57 +00:00
Artem Belevich	3bafc2f0d9	[NVPTX] Implemented wmma intrinsics and instructions. WMMA = "Warp Level Matrix Multiply-Accumulate". These are the new instructions introduced in PTX6.0 and available on sm_70 GPUs. Differential Revision: https://reviews.llvm.org/D38645 llvm-svn: 315601	2017-10-12 18:27:55 +00:00
Lei Huang	0724fea2da	[PowerPC] Add profitablilty check for conversion to mtctr loops Add profitability checks for modifying counted loops to use the mtctr instruction. The latency of mtctr is only justified if there are more than 4 comparisons that will be removed as a result. Usually counted loops are formed relatively early and before unrolling, so most low trip count loops often don't survive. However we want to ensure that if they do, we do not mistakenly update them to mtctr loops. Use CodeMetrics to ensure we are only doing this for small loops with small trip counts. Differential Revision: https://reviews.llvm.org/D38212 llvm-svn: 315592	2017-10-12 16:43:33 +00:00
Tim Renouf	c8ffffe462	[AMDGPU] For amdpal, widen interpolation mode workaround Summary: The interpolation mode workaround ensures that at least one interpolation mode is enabled in PSInputAddr. It does not also check PSInputEna on the basis that the user might enable bits in that depending on run-time state. However, for amdpal os type, the user does not enable some bits after compilation based on run-time states; the register values being generated here are the final ones set in the hardware. Therefore, apply the workaround to PSInputAddr and PSInputEnable together. (The case where a bit is set in PSInputAddr but not in PSInputEnable is where the frontend set up an input arg for a particular interpolation mode, but nothing uses that input arg. Really we should have an earlier pass that removes such an arg.) Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37758 llvm-svn: 315591	2017-10-12 16:16:41 +00:00
Mikael Holmen	a079ef68e3	[RegisterCoalescer] Don't set read-undef in pruneValues, only clear Summary: The comments in the code said // Remove <def,read-undef> flags. This def is now a partial redef. but the code didn't just remove read-undef, it could introduce new ones which could cause errors. E.g. if we have something like %vreg1<def> = IMPLICIT_DEF %vreg2:subreg1<def, read-undef> = op %vreg3, %vreg4 %vreg2:subreg2<def> = op %vreg6, %vreg7 and we merge %vreg1 and %vreg2 then we should not set undef on the second subreg def, which the old code did. Now we solve this by actually do what the code comment says. We remove read-undef flags rather than remove or introduce them. Reviewers: qcolombet, MatzeB Reviewed By: MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38616 llvm-svn: 315564	2017-10-12 06:21:28 +00:00
Wei Mi	1736efd16a	Revert r307036 because of PR34919. llvm-svn: 315540	2017-10-12 00:24:52 +00:00
Konstantin Zhuravlyov	c3beb6a075	AMDGPU/NFC: Minor clean ups in PAL metadata - Move PAL metadata definitions to AMDGPUMetadata - Make naming consistent with HSA metadata Differential Revision: https://reviews.llvm.org/D38745 llvm-svn: 315523	2017-10-11 22:41:09 +00:00
Konstantin Zhuravlyov	a63b0f9d20	AMDGPU/NFC: Rename code object metadata as HSA metadata - Rename AMDGPUCodeObjectMetadata to AMDGPUMetadata (PAL metadata will be included in this file in the follow up change) - Rename AMDGPUCodeObjectMetadataStreamer to AMDGPUHSAMetadataStreamer - Introduce HSAMD namespace - Other minor name changes in function and test names llvm-svn: 315522	2017-10-11 22:18:53 +00:00
Krzysztof Parzyszek	c4a9a8d8e0	[Hexagon] Make sure that new-value jump is packetized with producer llvm-svn: 315510	2017-10-11 21:20:43 +00:00
Florian Hahn	e52abba277	[MachineCombiner] Fix initialisation of LastUpdate for incremental update. Summary: Fixes a bogus iterator resulting from the removal of a block's first instruction at the point that incremental update is enabled. Patch by Paul Walker. Reviewers: fhahn, Gerolf, efriedma, MatzeB Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38734 llvm-svn: 315502	2017-10-11 20:25:58 +00:00
Lei Huang	263dc4ef3a	[PowerPC] Utilize DQ-Form instructions for spill/restore and fix FrameIndex elimination to only use `lis/addi` if necessary. Currently we produce a bunch of unnecessary code when emitting the prologue/epilogue for spills/restores. Namely, if the load from stack slot/store to stack slot instruction is an X-Form instruction, we will always produce an LIS/ORI sequence for the stack offset. Furthermore, we have not exploited the P9 vector D-Form loads/stores for this purpose. This patch address both issues. Specifying the D-Form load as the instruction to use for stack spills/reloads should be safe because: 1. The stack should be aligned according to the ABI 2. If the stack isn't aligned, PPCRegisterInfo::eliminateFrameIndex() will check for the offset being a multiple of 16 and will convert it to an X-Form instruction if it isn't. Differential Revision : https://reviews.llvm.org/D38758 llvm-svn: 315500	2017-10-11 20:20:58 +00:00
Sanjay Patel	6c0aef77aa	[x86] avoid infinite loop from SoftenFloatOperand (PR34866) Legalization of fp128 assumes things that we should have asserts for, so that's another potential improvement. Differential Revision: https://reviews.llvm.org/D38771 llvm-svn: 315485	2017-10-11 18:24:21 +00:00
Lei Huang	f9c7f7fed4	[NFC] update test case so checks are not order dependent when not needed llvm-svn: 315482	2017-10-11 18:04:41 +00:00
Krzysztof Parzyszek	8f174dde92	[Pipeliner] Improve serialization order for post-increments The pipeliner is generating a serial sequence that causes poor register allocation when a post-increment instruction appears prior to the use of the post-increment register. This occurs when there is a circular set of dependences involved with a sequence of instructions in the same cycle. In this case, there is no serialization of the parallel semantics that will not cause an additional register to be allocated. This patch fixes the problem by changing the instructions so that the post-increment instruction is used by the subsequent instruction, which enables the register allocator to make a better decision and not require another register. Patch by Brendon Cahoon. llvm-svn: 315466	2017-10-11 15:51:44 +00:00
Sanjay Patel	34fd5eaaf0	[DAGCombiner] convert insertelement of bitcasted vector into shuffle Eg: insert v4i32 V, (v2i16 X), 2 --> shuffle v8i16 V', X', {0,1,2,3,8,9,6,7} This is a generalization of the IR fold in D38316 to handle insertion into a non-undef vector. We may want to abandon that one if we can't find value in squashing the more specific pattern sooner. We're using the existing legal shuffle target hook to avoid AVX512 horror with vXi1 shuffles. There may be room for improvement in the shuffle lowering here, but that would be follow-up work. Differential Revision: https://reviews.llvm.org/D38388 llvm-svn: 315460	2017-10-11 14:12:16 +00:00
Simon Dardis	442ee63468	[mips] Add missing tests from rL315451 llvm-svn: 315454	2017-10-11 11:45:06 +00:00
Uriel Korach	782f28bf2f	[X86] Added tests for TESTM and TESTNM (NFC) Adding this test files now so after another commit that will add a new pattern for TESTM and TESTNM instructions will show the improvemnts that have been done. Change-Id: If3908b7f91897d764053312365a2bc1de78b291d llvm-svn: 315443	2017-10-11 08:39:25 +00:00
Craig Topper	6ce20bd184	[X86] Add 128-bit version of vbroadcasti32x2 to shuffle comment decoding. llvm-svn: 315395	2017-10-11 00:11:53 +00:00
Craig Topper	bb0e316dc7	[X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load. We already have these patterns for AVX512VL, but not AVX1 or 2. llvm-svn: 315382	2017-10-10 22:40:31 +00:00
Sanjay Patel	b74063d21f	[x86] fix prefix typos for CHECK lines; NFC llvm-svn: 315368	2017-10-10 21:12:47 +00:00
Simon Dardis	b994128d14	[mips] Correct the instruction predicates for microMIPSr3 Rather than using the AdditionalPredicates mechanism to guard the microMIPS instructions, use the existing predicates to properly guard those instructions. This also resolves a case where an instruction pattern was incorrectly available for microMIPS32R6, which caused a register allocation failure as the registers specified in the pattern were not available. Reviewers: nitesh.jain, atanasyan Differential Revision: https://reviews.llvm.org/D38451 llvm-svn: 315362	2017-10-10 20:52:53 +00:00
Matt Arsenault	f42074b699	AMDGPU: Fix missing skipFunction calls llvm-svn: 315361	2017-10-10 20:48:36 +00:00
Matt Arsenault	d674e0ac0d	AMDGPU: Fix failure to select branch with optnone opt-bisect/optnone disable the AMDGPUUniformAnnotateValues pass. The heuristic in the custom selector for brcond deferred the branch uniformity check to the pattern, which would fail. llvm-svn: 315360	2017-10-10 20:34:49 +00:00
Yaxun Liu	de4b88d9a1	[AMDGPU] Lower enqueued blocks and generate runtime metadata This patch adds a post-linking pass which replaces the function pointer of enqueued block kernel with a global variable (runtime handle) and adds runtime-handle attribute to the enqueued block kernel. In LLVM CodeGen the runtime-handle metadata will be translated to RuntimeHandle metadata in code object. Runtime allocates a global buffer for each kernel with RuntimeHandel metadata and saves the kernel address required for the AQL packet into the buffer. __enqueue_kernel function in device library knows that the invoke function pointer in the block literal is actually runtime handle and loads the kernel address from it and puts it into AQL packet for dispatching. This cannot be done in FE since FE cannot create a unique global variable with external linkage across LLVM modules. The global variable with internal linkage does not work since optimization passes will try to replace loads of the global variable with its initialization value. Differential Revision: https://reviews.llvm.org/D38610 llvm-svn: 315352	2017-10-10 19:39:48 +00:00
Adrian Prantl	16b8b47152	Debug Info: Fix the SDLoc propagation for a DAGCombiner rule This patch ensures that the rule: fold (zext (load x)) -> (zext (truncate (zextload x))) propagates the SDLoc of the load to the zextload. <rdar://problem/33755881> llvm-svn: 315340	2017-10-10 18:08:32 +00:00
Jacob Gravelle	37af00e7d0	[WebAssembly] Narrow the scope of WebAssemblyFixFunctionBitcasts Summary: The pass to fix function bitcasts generates thunks for functions that are called directly with a mismatching signature. It was also generating thunks in cases where the function was address-taken, causing aliasing problems in otherwise valid cases. This patch tightens the restrictions for when the pass runs. Reviewers: sunfish, dschuff Subscribers: jfb, sbc100, llvm-commits, aheejin Differential Revision: https://reviews.llvm.org/D38640 llvm-svn: 315326	2017-10-10 16:20:18 +00:00
Simon Pilgrim	053a299a9b	[X86][AVX512] Regenerate element insertion/extraction tests llvm-svn: 315322	2017-10-10 15:58:54 +00:00
Sanjay Patel	7d52c7ca74	[x86] add tests for insertelement; NFC llvm-svn: 315312	2017-10-10 13:45:25 +00:00
David Stuttard	51c1b22806	[DAGCombine] Fix for shuffle to vector extend for non power 2 vectors Summary: See https://llvm.org/PR33743 for more details It seems that for non-power of 2 vector sizes, the algorithm can produce non-matching sizes for input and result causing an assert. This usually isn't a problem as the isAnyExtend check will weed these out, but in some cases (most often with lots of undefined values for the mask indices) it can pass this check for non power of 2 vectors. Adding in an extra check that ensures that bit size will match for the result and input (as required) Subscribers: nhaehnle Differential Revision: https://reviews.llvm.org/D35241 llvm-svn: 315307	2017-10-10 12:45:45 +00:00
Nicolai Haehnle	312b64f4d7	AMDGPU: Split MUBUF offset into aligned components Summary: Atomic buffer operations do not work (and trap on gfx9) when the components are unaligned, even if their sum is aligned. Previously, we generated an offset of 4156 without an SGPR by splitting it as 4095 + 61 (immediate + inline constant). The highest offset for which we can do this correctly is 4156 = 4092 + 64. Fixes dEQP-GLES31.functional.ssbo.atomic.* Reviewers: arsenm Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37850 llvm-svn: 315302	2017-10-10 12:22:23 +00:00
Gadi Haber	2b132eb4f8	[X86][SKYLAKE] Update regression test to differentiate between HASWELL and SKYLAKE scheduling.<NFC> NFC. Updated 6 regression tests to differentiate between HASWELL and SKYLAKE scheduling information. The fix is in preparation of a patch to update the information of the Skylake Client scheduling to include the appropriate load and store latencies. Reviewers: zvi, RKSimon Differential Revision: https://reviews.llvm.org/D38685 Change-Id: Ifc6b98d9eaf266913698f24c766fd994fc977555 llvm-svn: 315291	2017-10-10 09:53:18 +00:00
Nemanja Ivanovic	7bf866eb10	Fix for PR34888. The issue is that we assume operand zero of the input to the add instruction is a register. In this case, the input comes from inline assembly and operand zero is not a register thereby causing a crash. The code will bail anyway if the input instruction doesn't have the right opcode. So do that check first and let short-circuiting prevent the crash. llvm-svn: 315285	2017-10-10 08:46:10 +00:00
Craig Topper	a88306e6fb	[AVX512] Add patterns to commute integer comparison instructions during isel. This enables broadcast loads to be commuted and allows normal loads to be folded without the peephole pass. llvm-svn: 315274	2017-10-10 06:36:46 +00:00
Reid Kleckner	ab23dace56	[MC] Suppress .Lcfi labels when emitting textual assembly Summary: This suppresses the generation of .Lcfi labels in our textual assembler. It was annoying that this generated cascading .Lcfi labels: llc foo.ll -o - \| llvm-mc \| llvm-mc After three trips through MCAsmStreamer, we'd have three labels in the output when none are necessary. We should only bother creating the labels and frame data when making a real object file. This supercedes D38605, which moved the entire .seh_ implementation into MCObjectStreamer. This has the advantage that we do more checking when emitting textual assembly, as a minor efficiency cost. Outputting textual assembly is not performance critical, so this shouldn't matter. Reviewers: majnemer, MatzeB Subscribers: qcolombet, nemanjai, javed.absar, eraman, hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D38638 llvm-svn: 315259	2017-10-10 00:57:36 +00:00
Aditya Nandakumar	c3bfc81a1f	[GISel]: Fix generation of illegal COPYs during CallLowering We end up creating COPY's that are either truncating/extending and this should be illegal. https://reviews.llvm.org/D37640 Patch for X86 and ARM by igorb, rovka llvm-svn: 315240	2017-10-09 20:07:43 +00:00
Zvi Rackover	c1d5955684	[X86] Unsigned saturation subtraction canonicalization [the backend part] Summary: On behalf of julia.koval@intel.com The patch transforms canonical version of unsigned saturation, which is sub(max(a,b),a) or sub(a,min(a,b)) to special psubus insturuction on targets, which support it(8bit and 16bit uints). umax(a,b) - b -> subus(a,b) a - umin(a,b) -> subus(a,b) There is also extra case handled, when right part of sub is 32 bit and can be truncated, using UMIN(this transformation was discussed in https://reviews.llvm.org/D25987). The example of special case code: ``` void foo(unsigned short p, int max, int n) { int i; unsigned m; for (i = 0; i < n; i++) { m = --p; p = (unsigned short)(m >= max ? m-max : 0); } } ``` Max in this example is truncated to max_short value, if it is greater than m, or just truncated to 16 bit, if it is not. It is vaid transformation, because if max > max_short, result of the expression will be zero. Here is the table of types, I try to support, special case items are bold: \| Size \| 128 \| 256 \| 512 \| ----- \| ----- \| ----- \| ----- \| i8 \| v16i8 \| v32i8 \| v64i8 \| i16 \| v8i16 \| v16i16 \| v32i16 \| i32 \| \| v8i32* \| v16i32 \| i64 \| \| \| v8i64 Reviewers: zvi, spatel, DavidKreitzer, RKSimon Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37534 llvm-svn: 315237	2017-10-09 20:01:10 +00:00
Sanjay Patel	2a61a821a0	[DAG] combine assertsexts around a trunc This was a suggested follow-up to: D37017 / https://reviews.llvm.org/rL313577 llvm-svn: 315206	2017-10-09 15:22:20 +00:00
Amara Emerson	24ca39ce71	[AArch64] Improve codegen for inverted overflow checking intrinsics E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the inverted condition code for the CSEL. rdar://28495949 Reviewed By: kristof.beyls Differential Revision: https://reviews.llvm.org/D38160 llvm-svn: 315205	2017-10-09 15:15:09 +00:00
Sanjay Patel	8557e29408	[x86] regenerate test checks; NFC llvm-svn: 315204	2017-10-09 15:01:58 +00:00
Sanjay Patel	be37ab864c	[AArch64] fix typos in test assertions llvm-svn: 315203	2017-10-09 01:29:54 +00:00
Craig Topper	4f8656a7af	[X86] Enable extended comparison predicate support for SETUEQ/SETONE when targeting AVX instructions. We believe that despite AMD's documentation, that they really do support all 32 comparision predicates under AVX. Differential Revision: https://reviews.llvm.org/D38609 llvm-svn: 315201	2017-10-09 01:05:15 +00:00
Simon Pilgrim	135a2639f4	[X86][SSE] Add test case for PR27708 llvm-svn: 315186	2017-10-08 19:18:10 +00:00
Craig Topper	977c546b0c	[X86] Regenerate fast-isel-select-pseudo-cmov.ll to prepare for D38609. llvm-svn: 315184	2017-10-08 17:54:50 +00:00
Simon Pilgrim	dc32c844f9	[X86] getTargetConstantBitsFromNode - add support for decoding scalar constants llvm-svn: 315182	2017-10-08 17:21:18 +00:00
Craig Topper	c97775c03c	[X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns Summary: We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to prefering MOVSS/MOVSD. Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary. Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commuting those blends back into blends that are equivalent to the original movss/movsd. This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type. The rest of the patch is removing all the excess scalar patterns. Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38023 llvm-svn: 315181	2017-10-08 16:57:23 +00:00
Amara Emerson	1f83777bd3	[AArch64][GlobalISel] Add a test case for G_PHI of p0 instruction selection. llvm-svn: 315179	2017-10-08 15:29:35 +00:00
Amara Emerson	e205cab66f	[AArch64][GlobalISel] Add a test case for G_PHI of p0 regbank selection. llvm-svn: 315178	2017-10-08 15:29:31 +00:00
Amara Emerson	1cd89ca669	[AArch64][GlobalISel] Make G_PHI of p0 types legal. Differential Revision: https://reviews.llvm.org/D38621 llvm-svn: 315177	2017-10-08 15:29:11 +00:00
Simon Pilgrim	6410ca70aa	[X86][XOP] Add XOP oddshuffles tests XOP codegen is often different to generic AVX - thank you vpperm! llvm-svn: 315176	2017-10-08 12:58:15 +00:00
Gadi Haber	684944b822	[X86][SKX] Adding the scheduling information for the SKX target. Adding the scheduling information for the SkylakeServer (SKX) target. This patch adds the instruction scheduling information for the SkylakeServer (SKX) architecture target by adding the file X86SchedSkylakeServer.td located under the X86 Target. We used the scheduling information retrieved from the Skylake architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each SKL instruction. The patch continues the scheduling replacement and insertion effort started with the SNB target in r310792, the HSW target in r311879 and the SkylakeClient (SKL) target in rL313613. Please expect some performance fluctuations due to code alignment effects. Reviewers: zvi, RKSimon, craig.topper, chandlerc, aymanmu Differential Revision: https://reviews.llvm.org/D38443 Change-Id: I5c228fcc09e9e5a99b6116e62b356c4f9b971185 llvm-svn: 315175	2017-10-08 12:52:54 +00:00
Craig Topper	bbca2f2978	[X86] Stop LowerSIGN_EXTEND_AVX512 from creating v8i16/v16i16/v16i8 vselects with a v8i1/v16i1 condition when BWI is not available. Some of the tests in vector-shuffle-v1.ll would get into an infinite loop without this. llvm-svn: 315172	2017-10-08 08:50:59 +00:00
Craig Topper	27170fee8d	[X86] If we see an insert of a bitcast into zero vector, canonicalize it to move the bitcast to the other side of the insert. This improves detection of zeroing of upper bits during isel. llvm-svn: 315161	2017-10-08 01:33:41 +00:00
Simon Pilgrim	9508fe7924	[X86][SSE] Match bitcasted BUILD_VECTOR of constants for v2i64 shifts on 64-bit targets (PR34855) Extension to rL315155, generate constant shifts on 64-bits as well as 32-bits. llvm-svn: 315156	2017-10-07 17:57:22 +00:00
Simon Pilgrim	70e1db78db	[X86][SSE] Match bitcasted v4i32 BUILD_VECTORS for v2i64 shifts on 64-bit targets (PR34855) We were already doing this for 32-bit targets, but we can generate these on 64-bits as well. llvm-svn: 315155	2017-10-07 17:42:17 +00:00
Craig Topper	2f60295364	[X86] Add X86ISD::CMOV to computeKnownBitsForTargetNode and ComputeNumSignBitsForTargetNode. Summary: Implementations based on ISD::SELECT. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38663 llvm-svn: 315153	2017-10-07 16:51:19 +00:00
Simon Pilgrim	73f143e774	[X86][SSE] Improve shuffling combining with horizontal operations Recognise cases when we can merge the shuffles with their horizontal (HADD/HSUB/PACK) instruction inputs. Replaces an older implementation which performed some of this during lowering, expanding an existing target shuffle combine stage instead. Differential Revision: https://reviews.llvm.org/D38506 llvm-svn: 315150	2017-10-07 12:42:23 +00:00
Jessica Paquette	13593843f6	[MachineOutliner] Disable outlining from LinkOnceODRs by default Say you have two identical linkonceodr functions, one in M1 and one in M2. Say that the outliner outlines A,B,C from one function, and D,E,F from another function (where letters are instructions). Now those functions are not identical, and cannot be deduped. Locally to M1 and M2, these outlining choices would be good-- to the whole program, however, this might not be true! To mitigate this, this commit makes it so that the outliner sees linkonceodr functions as unsafe to outline from. It also adds a flag, -enable-linkonceodr-outlining, which allows the user to specify that they want to outline from such functions when they know what they're doing. Changing this handles most code size regressions in the test suite caused by competing with linker dedupe. It also doesn't have a huge impact on the code size improvements from the outliner. There are 6 tests that regress > 5% from outlining WITH linkonceodrs to outlining WITHOUT linkonceodrs. Overall, most tests either improve or are not impacted. Not outlined vs outlined without linkonceodrs: https://hastebin.com/raw/qeguxavuda Not outlined vs outlined with linkonceodrs: https://hastebin.com/raw/edepoqoqic Outlined with linkonceodrs vs outlined without linkonceodrs: https://hastebin.com/raw/awiqifiheb Numbers generated using compare.py with -m size.__text. Tests run for AArch64 with -Oz -mllvm -enable-machine-outliner -mno-red-zone. llvm-svn: 315136	2017-10-07 00:16:34 +00:00
Cameron McInally	9d64101fe8	[AVX512] Fix TERNLOG when folding broadcast Patch to fix ternlog instructions with a folded broadcast. The broadcast decorator, e.g. {1toX}, was missing. Differential Revision: https://reviews.llvm.org/D38649 llvm-svn: 315122	2017-10-06 22:31:29 +00:00
Amara Emerson	ba0f79af92	[GlobalISel] Fix legalizer trying to process a deleted instruction. In some cases an instruction is deleted from the block during combining, however it can still exist in the legalizer worklist. This change modifies the combiner routines to add the given MI to the dead instruction list rather than trying to remove it from the block themselves. The responsibility is then on the caller to delete the instruction from the block and any worklists. Differential Revision: https://reviews.llvm.org/D38622 llvm-svn: 315092	2017-10-06 19:24:15 +00:00
Diana Picus	57285d7a43	[ARM] GlobalISel: Make tests less strict These are intended as integration tests, so they shouldn't be too specific about what they're checking. llvm-svn: 315083	2017-10-06 17:47:27 +00:00
Stanislav Mekhanoshin	de42c29a68	[AMDGPU] New 64 bit div/rem expansion Old expansion was 20 VGPRs, 78 SGPRs and ~380 instructions. This expansion is 11 VGPRs, 12 SGPRs and ~120 instructions. Passes OpenCL conformance test_integer_ops quick_[u]long_math Differential Revision: https://reviews.llvm.org/D38607 llvm-svn: 315081	2017-10-06 17:24:45 +00:00
Diana Picus	e393bc72ee	[ARM] GlobalISel: Select shifts Unfortunately TableGen doesn't handle this yet: Unable to deduce gMIR opcode to handle Src (which is a leaf). Just add some temporary hand-written code to generate the proper MOVsr. llvm-svn: 315071	2017-10-06 15:39:16 +00:00
Diana Picus	a81a4b17e5	[ARM] GlobalISel: Map shift operands to GPRs llvm-svn: 315067	2017-10-06 14:52:43 +00:00
Diana Picus	2c95730450	[ARM] GlobalISel: Mark shifts as legal for s32 The new legalize combiner introduces shifts all over the place, so we should support them sooner rather than later. llvm-svn: 315064	2017-10-06 14:30:05 +00:00
Jonas Paulsson	c63ed222b8	[SystemZ] Enable machine scheduler. The machine scheduler (before register allocation) is enabled by default for SystemZ. The SelectionDAG scheduling preference now becomes source order scheduling (was regpressure). Review: Ulrich Weigand https://reviews.llvm.org/D37977 llvm-svn: 315063	2017-10-06 13:59:28 +00:00
Simon Pilgrim	a29dbdf2ca	[X86][SSE] Add SKX cpu tests to SSE/AVX scheduling tests (D38443) llvm-svn: 315061	2017-10-06 13:40:29 +00:00
Derek Schuff	885dc59297	[WebAssembly] Add the rest of the atomic loads Add extending loads and constant offset patterns A bit more refactoring of the tablegen to make the patterns fairly nice and uniform between the regular and atomic loads. Differential Revision: https://reviews.llvm.org/D38523 llvm-svn: 315022	2017-10-05 21:18:42 +00:00
Balaram Makam	7c79470c71	[ARM/AARCH64] Make test MachineBranchProb.ll more robust and re-enable for ARM/AArch64 Summary: Make test robust enough to not fail due to CFG changes and re-enable for ARM/AArch64. Reviewers: rovka, fhahn Reviewed By: fhahn Subscribers: fhahn, aemerson, rengolin, mcrosier, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D38590 llvm-svn: 315002	2017-10-05 18:33:34 +00:00
Davide Italiano	ff829cea8b	[PassManager] Run global optimizations after the inliner. The inliner performs some kind of dead code elimination as it goes, but there are cases that are not really caught by it. We might at some point consider teaching the inliner about them, but it is OK for now to run GlobalOpt + GlobalDCE in tandem as their benefits generally outweight the cost, making the whole pipeline faster. This fixes PR34652. Differential Revision: https://reviews.llvm.org/D38154 llvm-svn: 314997	2017-10-05 18:06:37 +00:00
Matt Arsenault	2d3f8f333d	AMDGPU: Set v2i32 any_extend to expand llvm-svn: 314993	2017-10-05 17:38:30 +00:00
Artur Pilipenko	7b15254c8f	[X86] Fix chains update when lowering BUILD_VECTOR to a vector load The code which lowers BUILD_VECTOR of consecutive loads into a single vector load doesn't update chains properly. As a result the vector load can be reordered with the store to the same location. The current code in EltsFromConsecutiveLoads only updates the chain following the first load. The fix is to update the chains following all the loads comprising the vector. This is a fix for PR10114. Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D38547 llvm-svn: 314988	2017-10-05 16:28:21 +00:00
Konstantin Zhuravlyov	aa0835a7ab	AMDGPU: Add and set AMDGPU-specific e_flags Differential Revision: https://reviews.llvm.org/D38556 llvm-svn: 314987	2017-10-05 16:19:18 +00:00
Matt Arsenault	aafff87dda	AMDGPU: Do not fold clamp instructions when sources are different Patch by hakzsam (Samuel Pitoiset) llvm-svn: 314951	2017-10-05 00:13:17 +00:00
Matt Arsenault	9ab1fa6803	AMDGPU: Fix not accounting for instruction size in bundles These were counted as 0. Fixes branch limit exceeded errors in some large programs. llvm-svn: 314944	2017-10-04 22:59:12 +00:00
Konstantin Zhuravlyov	8684f7b4f9	AMDGPU: Correctly set EI_OSABI based on the os Differential Revision: https://reviews.llvm.org/D38555 llvm-svn: 314943	2017-10-04 22:44:13 +00:00
Simon Pilgrim	9edbe110e8	[X86][AVX] Improve (i8 bitcast (v8i1 x)) handling for v8i64/v8f64 512-bit vector compare results. AVX1/AVX2 targets were missing a chance to use vmovmskps for v8f32/v8i32 results for bool vector bitcasts llvm-svn: 314921	2017-10-04 18:00:42 +00:00
Hans Wennborg	2a6c9adb2f	Revert r314886 "[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.)" It broke the Chromium / SQLite build; see PR34830. > Summary: > 1/ Operand folding during complex pattern matching for LEAs has been > extended, such that it promotes Scale to accommodate similar operand > appearing in the DAG. > e.g. > T1 = A + B > T2 = T1 + 10 > T3 = T2 + A > For above DAG rooted at T3, X86AddressMode will no look like > Base = B , Index = A , Scale = 2 , Disp = 10 > > 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs > so that if there is an opportunity then complex LEAs (having 3 operands) > could be factored out. > e.g. > leal 1(%rax,%rcx,1), %rdx > leal 1(%rax,%rcx,2), %rcx > will be factored as following > leal 1(%rax,%rcx,1), %rdx > leal (%rdx,%rcx) , %edx > > 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, > thus avoiding creation of any complex LEAs within a loop. > > Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy > > Reviewed By: lsaba > > Subscribers: jmolloy, spatel, igorb, llvm-commits > > Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314919	2017-10-04 17:54:06 +00:00
Simon Pilgrim	b47b3f2564	[X86][SSE] Add support for lowering v8i16 binary shuffles to PACKSS/PACKUS Missed in D38472 llvm-svn: 314916	2017-10-04 17:31:28 +00:00
Craig Topper	6fb55716e9	[X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079 Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results. Differential Revision: https://reviews.llvm.org/D38449 llvm-svn: 314914	2017-10-04 17:20:12 +00:00
Balaram Makam	45ff83ce1e	"[ARM] Mark flaky test MachineBranchProb.ll unsupported again for ARM/AArch64" r314857 changed the CFG that resulted in the flaky test MachineBranchProb.ll to fail the bots again. Marking it as unsupported for ARM/AArch64 again until we find the cause. llvm-svn: 314912	2017-10-04 16:45:24 +00:00
Simon Pilgrim	bd5d2f0284	[X86][SSE] Add support for lowering unary shuffles to PACKSS/PACKUS Extension to D38472 llvm-svn: 314901	2017-10-04 13:12:08 +00:00
Dylan McKay	d00f9c1ef1	[AVR] Elaborate LDWRdPtr into `ld r, X++; ld r+1, X` Patch by Gergo Erdi. llvm-svn: 314896	2017-10-04 10:33:36 +00:00
Dylan McKay	39069208d5	[AVR] Insert JMP for long branches Previously, on long branches (relative jumps of >4 kB), an assertion failure was hit, as AVRInstrInfo::insertIndirectBranch was not implemented. Despite its name, it is called by the branch relaxator for all unconditional jumps. Patch by Thomas Backman. llvm-svn: 314891	2017-10-04 09:51:28 +00:00
Dylan McKay	c4b002bf5a	[AVR] Fix displacement overflow for LDDW/STDW In some cases, the code generator attempts to generate instructions such as: lddw r24, Y+63 which expands to: ldd r24, Y+63 ldd r25, Y+64 # Oops! This is actually ld r25, Y in the binary This commit limits the first offset to 62, and thus the second to 63. It also updates some asserts in AVRExpandPseudoInsts.cpp, including for INW and OUTW, which appear to be unused. Patch by Thomas Backman. llvm-svn: 314890	2017-10-04 09:51:21 +00:00
Jatin Bhateja	3c29bacd43	[X86] Improvement in CodeGen instruction selection for LEAs (re-applying post required revision changes.) Summary: 1/ Operand folding during complex pattern matching for LEAs has been extended, such that it promotes Scale to accommodate similar operand appearing in the DAG. e.g. T1 = A + B T2 = T1 + 10 T3 = T2 + A For above DAG rooted at T3, X86AddressMode will no look like Base = B , Index = A , Scale = 2 , Disp = 10 2/ During OptimizeLEAPass down the pipeline factorization is now performed over LEAs so that if there is an opportunity then complex LEAs (having 3 operands) could be factored out. e.g. leal 1(%rax,%rcx,1), %rdx leal 1(%rax,%rcx,2), %rcx will be factored as following leal 1(%rax,%rcx,1), %rdx leal (%rdx,%rcx) , %edx 3/ Aggressive operand folding for AM based selection for LEAs is sensitive to loops, thus avoiding creation of any complex LEAs within a loop. Reviewers: lsaba, RKSimon, craig.topper, qcolombet, jmolloy Reviewed By: lsaba Subscribers: jmolloy, spatel, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35014 llvm-svn: 314886	2017-10-04 09:02:10 +00:00
Mikael Holmen	a1a3f5c5e6	Recommit [UnreachableBlockElim] Use COPY if PHI input is undef This time invoking llc with "-march=x86-64" in the testcase, so we don't assume the default target is x86. Summary: If we have %vreg0<def> = PHI %vreg2<undef>, <BB#0>, %vreg3, <BB#2>; GR32:%vreg0,%vreg2,%vreg3 %vreg3<def,tied1> = ADD32ri8 %vreg0<kill,tied0>, 1, %EFLAGS<imp-def>; GR32:%vreg3,%vreg0 then we can't just change %vreg0 into %vreg3, since %vreg2 is actually undef. We would have to also copy the undef flag to be able to change the register. Instead we deal with this case like other cases where we can't just replace the register: we insert a COPY. The code creating the COPY already copied all flags from the PHI input, so the undef flag will be transferred as it should. Reviewers: kparzysz Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38235 llvm-svn: 314882	2017-10-04 07:42:45 +00:00
Mikael Holmen	75b1992f78	Revert r314879 "[UnreachableBlockElim] Use COPY if PHI input is undef" Build-bots broke on the new testcase. I'll investigate and fix. llvm-svn: 314880	2017-10-04 06:39:22 +00:00
Mikael Holmen	65eb2f394c	[UnreachableBlockElim] Use COPY if PHI input is undef Summary: If we have %vreg0<def> = PHI %vreg2<undef>, <BB#0>, %vreg3, <BB#2>; GR32:%vreg0,%vreg2,%vreg3 %vreg3<def,tied1> = ADD32ri8 %vreg0<kill,tied0>, 1, %EFLAGS<imp-def>; GR32:%vreg3,%vreg0 then we can't just change %vreg0 into %vreg3, since %vreg2 is actually undef. We would have to also copy the undef flag to be able to change the register. Instead we deal with this case like other cases where we can't just replace the register: we insert a COPY. The code creating the COPY already copied all flags from the PHI input, so the undef flag will be transferred as it should. Reviewers: kparzysz Reviewed By: kparzysz Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38235 llvm-svn: 314879	2017-10-04 06:06:31 +00:00
Martin Storsjo	e14145dcb0	[X86] Fix using the SJLJ jump table on x86_64 The previous version didn't work if the jump table base address didn't fit in 32 bit, since it was encoded as an immediate offset. And in case the jump table is encoded as 32 bit label differences, we need to load and add them to the table base first. This solves the first half of the issues mentioned in PR34720. Also fix some of the errors pointed out by -verify-machineinstrs, by using GR32_NOSPRegClass. Differential Revision: https://reviews.llvm.org/D38333 llvm-svn: 314876	2017-10-04 05:12:10 +00:00
Balaram Makam	e0c43152b5	[AArch64] Use LateSimplifyCFG after expanding atomic operations. Summary: After r308422 we defer optimizations that can destroy loop canonical forms to LateSimplifyCFG. Running LateSimplifyCFG after expanding atomic operations can exploit more control-flow opportunities. Reviewers: mcrosier, t.p.northover, efriedma Reviewed By: efriedma Subscribers: aemerson, rengolin, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D38262 llvm-svn: 314857	2017-10-03 22:39:24 +00:00
Konstantin Zhuravlyov	908fa90b51	AMDGPU: Expand setcc for v2i32 and v4i32 llvm-svn: 314852	2017-10-03 21:31:24 +00:00
Jessica Paquette	acc15e1265	[MachineOutliner] Fix off-by-one in cost model This commit does two things. Firstly, it cleans up some of the benefit calculation wrt outlined functions and candidates. Secondly, it fixes an off-by-one bug in the cost model which was caused by the benefit value of an OutlinedFunction and Candidate differing by 1. It updates the remarks test to reflect this change. llvm-svn: 314836	2017-10-03 20:32:55 +00:00
Tim Renouf	72800f0436	[AMDGPU] implemented pal metadata Summary: For the amdpal OS type: We write an AMDGPU_PAL_METADATA record in the .note section in the ELF (or as an assembler directive). It contains key=value pairs of 32 bit ints. It is a merge of metadata from codegen of the shaders, and metadata provided by the frontend as _amdgpu_pal_metadata IR metadata. Where both sources have a key=value with the same key, the two values are ORed together. This .note record is part of the amdpal ABI and will be documented in docs/AMDGPUUsage.rst in a future commit. Eventually the amdpal OS type will stop generating the .AMDGPU.config section once the frontend has safely moved over to using the .note records above instead of .AMDGPU.config. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D37753 llvm-svn: 314829	2017-10-03 19:03:52 +00:00
Alexander Timofeev	4651396584	[AMDGPU] Avoid predicated execution of the basic blocks containing scalar instructions. Differential revision: https://reviews.llvm.org/D38293 llvm-svn: 314828	2017-10-03 18:55:36 +00:00
Simon Pilgrim	261a6c29ea	[X86] Add non-SSE tests for PR15215 as well llvm-svn: 314815	2017-10-03 17:04:36 +00:00
Geoff Berry	fabedbad11	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This reverts commit r314729. Another bug has been encountered in an out-of-tree target reported by Quentin. llvm-svn: 314814	2017-10-03 16:59:13 +00:00
Simon Pilgrim	46a804cfd9	[X86][SSE] Add bool vector extraction test cases from PR15215 llvm-svn: 314813	2017-10-03 16:56:57 +00:00
Simon Dardis	055192ccd3	[mips] Enable spilling and reloading of the dsp register set. The dsp register class is an alias of the gpr register class, so we have to define instructions for spilling and reloading. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D38038 llvm-svn: 314798	2017-10-03 13:45:49 +00:00
Simon Pilgrim	cf99d069c3	[X86][SSE] Add support for decoding PACKSS/PACKUS shuffles masks with UNDEF llvm-svn: 314792	2017-10-03 12:41:39 +00:00
Simon Pilgrim	f5f291d129	[X86][SSE] Add support for lowering shuffles to PACKSS/PACKUS If the upper bits of a truncation shuffle patterns have at least the minimum number of sign/zero bits on their inputs then we can safely use PACKSS/PACKUS as shuffles. Partial fix for https://bugs.llvm.org/show_bug.cgi?id=34773 Differential Revision: https://reviews.llvm.org/D38472 llvm-svn: 314788	2017-10-03 12:01:31 +00:00
Bjorn Pettersson	90cc1b53d0	[DebugInfo] Handle endianness when moving debug info for split integer values (reapplied) Summary: Take the target's endianness into account when splitting the debug information in DAGTypeLegalizer::SetExpandedInteger. This patch fixes so that, for big-endian targets, the fragment expression corresponding to the high part of a split integer value is placed at offset 0, in order to correctly represent the memory address order. I have attached a PPC32 reproducer where the resulting DWARF pieces for a 64-bit integer were incorrectly reversed. Original patch was reverted due to using -stop-after=isel in the test case (but that is only working when AMDGPU target is included in the llc build). The test case has now been updated to use -stop-before=expand-isel-pseudos instead. Patch by: dstenb Reviewers: JDevlieghere, aprantl, dblaikie Reviewed By: JDevlieghere, aprantl, dblaikie Subscribers: nemanjai Differential Revision: https://reviews.llvm.org/D38172 llvm-svn: 314781	2017-10-03 11:03:02 +00:00
Simon Pilgrim	640fbf5132	[X86][SSE] Add support for shuffle combining from PACKSS/PACKUS Mentioned in D38472 llvm-svn: 314777	2017-10-03 09:54:03 +00:00
Simon Pilgrim	19d535e75b	[X86][SSE] Add support for PACKSS/PACKUS constant folding Pulled out of D38472 llvm-svn: 314776	2017-10-03 09:41:00 +00:00
Martin Storsjo	1e54738676	[X86] Provide the LSDA pointer with RIP relative addressing if necessary This makes sure the LSDA pointer isn't truncated to 32 bit. Make LowerINTRINSIC_WO_CHAIN a member function instead of a static function, so that it can use the getGlobalWrapperKind method. This solves the second half of the issues mentioned in PR34720. Differential Revision: https://reviews.llvm.org/D38343 llvm-svn: 314767	2017-10-03 06:29:58 +00:00
Quentin Colombet	c2f3cea608	[Legalizer] Add support for G_OR NarrowScalar. Legalize bitwise OR: A = BinOp<Ty> B, C into: B1, ..., BN = G_UNMERGE_VALUES B C1, ..., CN = G_UNMERGE_VALUES C A1 = BinOp<Ty/N> B1, C2 ... AN = BinOp<Ty/N> BN, CN A = G_MERGE_VALUES A1, ..., AN llvm-svn: 314760	2017-10-03 04:53:56 +00:00
Tim Shen	59465d29f8	[PowerPC] Revert r314666. See https://reviews.llvm.org/D38172. I tried to XFAIL it, but sometimes XPASS triggers the bot. Simply revert it. llvm-svn: 314739	2017-10-02 23:20:06 +00:00
Tim Shen	7b8928c729	[PowerPC] Temporarily disable the test introduced by r314666 See https://reviews.llvm.org/D38172 for details. llvm-svn: 314732	2017-10-02 22:40:32 +00:00
Geoff Berry	bfc5fb4571	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 314729	2017-10-02 22:01:37 +00:00
Matt Arsenault	b866d10127	AMDGPU: Fix potentially incorrectly matching check lines These check lines are supposed to make sure the new d16 load instructions aren't used, but the expected instruction name is a prefix of the incorrect instruction name. llvm-svn: 314714	2017-10-02 20:31:16 +00:00
Walter Lee	35b09cbd42	Add support for Myriad ma2x8x series of CPUs Summary: Also add support for some older Myriad CPUs that were missing. Reviewers: jyknight Subscribers: fedor.sergeev Differential Revision: https://reviews.llvm.org/D37552 llvm-svn: 314705	2017-10-02 18:50:48 +00:00
Stanislav Mekhanoshin	6deb212522	Eliminate ftrunc if source is know to be rounded Differential Revision: https://reviews.llvm.org/D38421 llvm-svn: 314688	2017-10-02 16:57:07 +00:00
Simon Pilgrim	65bde8806f	[X86][SSE] Add PACKSS/PACKUS constant folding tests llvm-svn: 314682	2017-10-02 15:43:26 +00:00
Simon Pilgrim	ec528b2cb6	Regenerate test (missing broadcast constant comments). NFCI. Still avoiding the floating point comments to prevent linux/windows discrepancies. llvm-svn: 314681	2017-10-02 15:22:35 +00:00
Simon Pilgrim	050f60370c	Regenerate test (missing broadcast constant comments). NFCI. llvm-svn: 314680	2017-10-02 15:21:14 +00:00
Simon Pilgrim	037f6d10d5	Regenerate test. NFCI. llvm-svn: 314679	2017-10-02 15:16:30 +00:00
Bjorn Pettersson	839775a277	[Debug info] Handle endianness when moving debug info for split integer values Summary: Take the target's endianness into account when splitting the debug information in DAGTypeLegalizer::SetExpandedInteger. This patch fixes so that, for big-endian targets, the fragment expression corresponding to the high part of a split integer value is placed at offset 0, in order to correctly represent the memory address order. I have attached a PPC32 reproducer where the resulting DWARF pieces for a 64-bit integer were incorrectly reversed. Patch by: dstenb Reviewers: JDevlieghere, aprantl, dblaikie Reviewed By: JDevlieghere, aprantl, dblaikie Subscribers: nemanjai Differential Revision: https://reviews.llvm.org/D38172 llvm-svn: 314666	2017-10-02 12:46:32 +00:00
Hiroshi Inoue	dcedd66b00	[PowerPC] support ZERO_EXTEND in tryBitPermutation This patch add a support of ISD::ZERO_EXTEND in PPCDAGToDAGISel::tryBitPermutation to increase the opportunity to use rotate-and-mask by reordering ZEXT and ANDI. Since tryBitPermutation stops analyzing nodes if it hits a ZEXT node while traversing SDNodes, we want to avoid ZEXT between two nodes that can be folded into a rotate-and-mask instruction. For example, we allow these nodes t9: i32 = add t7, Constant:i32<1> t11: i32 = and t9, Constant:i32<255> t12: i64 = zero_extend t11 t14: i64 = shl t12, Constant:i64<2> to be folded into a rotate-and-mask instruction. Such case often happens in array accesses with logical AND operation in the index, e.g. array[i & 0xFF]; Differential Revision: https://reviews.llvm.org/D37514 llvm-svn: 314655	2017-10-02 09:24:00 +00:00
Michael Zuckerman	e4084f6bdb	[X86][LLVM]Expanding Supports lowerInterleaved{store\|load}() in X86InterleavedAccess (VF64 stride 3-4) I continue to support different VF interleaved and in this pass for this patch, I added the vf64 stride3 support for both load and store. I also added support fot the stride4 store. Reviewers: 1. zvi 2. dorit 3. igorb 4. guyblank Differential Revision: https://reviews.llvm.org/D37687 Change-Id: I3d238efedf217d1768b348d710de1efa2f19d27b llvm-svn: 314651	2017-10-02 07:35:25 +00:00
Ron Lieberman	9bcdd80b66	[Hexagon] Check vector elements for equivalence in the HexagonVectorLoopCarriedReuse pass If the two instructions being compared for equivalence have corresponding operands that are integer constants, then check their values to determine equivalence. Patch by Suyog Sarda! llvm-svn: 314642	2017-10-02 00:34:07 +00:00
Ron Lieberman	f90493d220	[Hexagon] Patch to Extract i1 element from vector of i1 This patch extracts 1 element from vector consisting of elements of size 1 bit at given index. llvm-svn: 314641	2017-10-02 00:16:15 +00:00
Craig Topper	c20b46da2f	[X86] Change register&memory TEST instructions from MRMSrcMem to MRMDstMem Summary: Intel documentation shows the memory operand as the first operand. But we currently treat it as the second operand. Conceptually the order doesn't matter since it doesn't write memory. We have aliases to parse with the operands in either order and the isel matching is commutable. For the register&register form order does matter for the assembly parser. PR22995 was previously filed and fixed by changing the register&register form from MRMSrcReg to MRMDestReg to match gas. Ideally the memory form should match by using MRMDestMem. I believe this supercedes D38025 which was trying to switch the register&register form back to pre-PR22995. Reviewers: aymanmus, RKSimon, zvi Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38120 llvm-svn: 314639	2017-10-01 23:53:53 +00:00
Simon Pilgrim	df23a2700d	[X86][SSE] Add faux shuffle combining support for PACKUS llvm-svn: 314631	2017-10-01 18:43:48 +00:00
Simon Pilgrim	4f255ad6a0	[X86][AVX2] Simplify PACKUS combine test Trying to use a AND mask is tricky as after legalization its nigh impossible for computeKnownBits to do anything with it llvm-svn: 314630	2017-10-01 18:17:39 +00:00
Simon Pilgrim	836fa6dcfd	[X86][SSE] Improve shuffle combining of PACKSS instructions. Support unary packing and fix the faux shuffle mask for vectors larger than 128 bits. llvm-svn: 314629	2017-10-01 17:54:55 +00:00
Simon Pilgrim	d25c200cd6	[X86][SSE] Add shuffle combining tests with PACKSS/PACKUS llvm-svn: 314628	2017-10-01 17:30:44 +00:00
Jina Nahias	98c7f91e54	pre-commit adding test for broadcastm pattern Differential Revision: https://reviews.llvm.org/D38312 Change-Id: Ifbc4189549f2f59995019a86f85f989c04e4d37d llvm-svn: 314626	2017-10-01 14:25:21 +00:00
Michael Zuckerman	1746895490	Adding test for interleved, case stride 4 vf64 store<NFC>. Change-Id: I9ea62aac81b763c83d26613dca6fcd846997a017 llvm-svn: 314621	2017-10-01 09:37:38 +00:00
Xin Tong	bffac0eb81	Fix typo. NFC llvm-svn: 314615	2017-10-01 00:10:52 +00:00
Xin Tong	c063c3f09d	Revert "Fix typo [NFC]" This reverts commit e60b5028619be1c81bd039d63a0627dac32d38f9. Incorrectly include changes that are not typo fix. llvm-svn: 314614	2017-10-01 00:09:53 +00:00
Xin Tong	efec219e1b	Fix typo [NFC] llvm-svn: 314613	2017-10-01 00:07:24 +00:00
Simon Pilgrim	1fffcc4580	Regenerate mul combine tests to update broadcast comment. llvm-svn: 314607	2017-09-30 22:27:46 +00:00
Simon Pilgrim	a8dd6f4f30	[X86][SSE] Fold (VSRAI (VSHLI X, C1), C1) --> X iff NumSignBits(X) > C1 Remove sign extend in register style pattern if the sign is already extended enough llvm-svn: 314599	2017-09-30 17:57:34 +00:00
Craig Topper	619569841a	[AVX-512] Add patterns to make fp compare instructions commutable during isel. llvm-svn: 314598	2017-09-30 17:02:39 +00:00
Simon Pilgrim	5bd43bce07	[X86][SSE] Add vector truncation cases inspired by PR34773 We should be using PACKSS/PACKUS more aggressively when we know the state of the upper bits llvm-svn: 314597	2017-09-30 16:14:59 +00:00
Michael Zuckerman	b92b6d424f	Code refactoring for the interleaved code <NFC> Change-Id: I7831c9febad8e14278a5bc87584a0053dc837be1 llvm-svn: 314596	2017-09-30 14:55:03 +00:00
Gadi Haber	c3b33f0f0d	[X86][SKX] Added codegen regression test for avx512 instructions scheduling.NFC. NFC. Added code gen regression tests for avx512 instructions scheduling called avx512-schedule.ll and avx512-shuffle-schedule.ll. This patch is in preparation of a larger patch of adding all SKX instruction scheduling and therefore the scheduling for the avx512 instructions are still missing. Reviewers: zvi, delena, RKSimon, igorb Differential Revision: https://reviews.llvm.org/D38035 Change-Id: I792762763127a921b9e13684b58af03646536533 llvm-svn: 314594	2017-09-30 14:30:23 +00:00
Craig Topper	d92ade96f4	[X86] Support v64i8 mulhu/mulhs Implemented by splitting into two v32i8 mulhu/mulhs and concatenating the results. Differential Revision: https://reviews.llvm.org/D38307 llvm-svn: 314584	2017-09-30 04:21:46 +00:00
Stanislav Mekhanoshin	1d8cf2be89	[AMDGPU] Set fast-math flags on functions given the options We have a single library build without relaxation options. When inlined library functions remove fast math attributes from the functions they are integrated into. This patch sets relaxation attributes on the functions after linking provided corresponding relaxation options are given. Math instructions inside the inlined functions remain to have no fast flags, but inlining does not prevent fast math transformations of a surrounding caller code anymore. Differential Revision: https://reviews.llvm.org/D38325 llvm-svn: 314568	2017-09-29 23:40:19 +00:00
Yaxun Liu	b33607e5a1	CodeGen: Fix pointer info in expandUnalignedLoad/Store Currently expandUnalignedLoad/Store uses place holder pointer info for temporary memory operand in stack, which does not have correct address space. This causes unaligned private double16 load/store to be lowered to flat_load instead of buffer_load for amdgcn target. This fixes failures of OpenCL conformance test basic/vload_private/vstore_private on target amdgcn---amdgizcl. Differential Revision: https://reviews.llvm.org/D35361 llvm-svn: 314566	2017-09-29 23:31:14 +00:00
Nicolai Haehnle	ce4ddd06da	AMDGPU: VALU carry-in and v_cndmask condition cannot be EXEC The hardware will only forward EXEC_LO; the high 32 bits will be zero. Additionally, inline constants do not work. At least, v_addc_u32_e64 v0, vcc, v0, v1, -1 which could conceivably be used to combine (v0 + v1 + 1) into a single instruction, acts as if all carry-in bits are zero. The llvm.amdgcn.ps.live test is adjusted; it would be nice to combine s_mov_b64 s[0:1], exec v_cndmask_b32_e64 v0, v1, v2, s[0:1] into v_mov_b32 v0, v3 but it's not particularly high priority. Fixes dEQP-GLES31.functional.shaders.helper_invocation.value.* llvm-svn: 314522	2017-09-29 15:37:31 +00:00
Jonas Paulsson	c9e363ac69	[SystemZ] implement shouldCoalesce() Implement shouldCoalesce() to help regalloc avoid running out of GR128 registers. If a COPY involving a subreg of a GR128 is coalesced, the live range of the GR128 virtual register will be extended. If this happens where there are enough phys-reg clobbers present, regalloc will run out of registers (if there is not a single GR128 allocatable register available). This patch tries to allow coalescing only when it can prove that this will be safe by checking the (local) interval in question. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D37899 https://bugs.llvm.org/show_bug.cgi?id=34610 llvm-svn: 314516	2017-09-29 14:31:39 +00:00
Amara Emerson	7d6c55f8aa	[X86] Improve codegen for inverted overflow checking intrinsics. Adds a new combine for: xor(setcc cc, val), 1 --> setcc (invert(cc), val) Differential Revision: https://reviews.llvm.org/D38161 llvm-svn: 314514	2017-09-29 13:53:44 +00:00
Aleksandar Beserminji	29341b88ac	[mips] Reordering callseq* nodes to be linear Fix nested callseq* nodes by moving callseq_start after the arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear. Recommitting r314497. This version does not contain test which fails when compiler is not build in debug mode. Differential Revision: https://reviews.llvm.org/D37328 llvm-svn: 314507	2017-09-29 11:05:02 +00:00
Aleksandar Beserminji	a0a01e7172	Revert "[mips] Reordering callseq* nodes to be linear" Added test relies on the compiler being built in debug mode, which may not be the case. This reverts commit r314497. llvm-svn: 314506	2017-09-29 10:52:03 +00:00
Simon Pilgrim	2b96841d1d	[X86][SSE] Added more tests for vector multiplications as utility for D37896 Added additional tests for vector multiplications with multipliers that are: * powers of 2 displaced by 1, * product of a power of 2 displaced by one with another power of 2. Patch by @pacxx (Michael Haidl) Differential Revision: https://reviews.llvm.org/D38350 llvm-svn: 314504	2017-09-29 10:02:01 +00:00
Tim Renouf	ef1ae8ffac	[AMDGPU] calling conventions for AMDPAL OS type Summary: This commit adds comments on how the AMDPAL OS type overloads the existing AMDGPU_ calling conventions used by Mesa, and adds a couple of new ones. Reviewers: arsenm, nhaehnle, dstuttard Subscribers: mehdi_amini, kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37752 llvm-svn: 314502	2017-09-29 09:51:22 +00:00
Tim Renouf	132291589f	[AMDGPU] AMDPAL scratch buffer support Summary: Added support for scratch (including spilling) for OS type amdpal: generates code to set up the scratch descriptor if it is needed. With amdpal, the scratch resource descriptor is loaded from offset 0 of the global information table. The low 32 bits of the address of the global information table is passed in s0. Added amdgpu-git-ptr-high function attribute to hard-wire the high 32 bits of the address of the global information table. If the function attribute is not specified, or is 0xffffffff, then the backend generates code to use the high 32 bits of pc. The documentation for the AMDPAL ABI will be added in a later commit. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye Differential Revision: https://reviews.llvm.org/D37483 llvm-svn: 314501	2017-09-29 09:49:35 +00:00
Tim Renouf	9f7ead3334	[Triple] Add AMDPAL operating system type Summary: This operating system type represents the AMDGPU PAL runtime, and will be required by the AMDGPU backend in order to generate correct code for this runtime. Currently it generates the same code as not specifying an OS at all. That will change in future commits. Patch from Tim Corringham. Subscribers: arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D37380 llvm-svn: 314500	2017-09-29 09:48:12 +00:00
Aleksandar Beserminji	502dcb035a	[mips] Reordering callseq* nodes to be linear Fix nested callseq* nodes by moving callseq_start after the arguments calculation to temporary registers, so that callseq* nodes in resulting DAG are linear. Differential Revision: https://reviews.llvm.org/D37328 llvm-svn: 314497	2017-09-29 09:32:14 +00:00
Craig Topper	6255c7b675	[X86] Don't select (cmp (and, imm), 0) to testw Summary: X86ISelDAGToDAG tries to analyze ANDs compared with 0 to optimize to narrower immediates using subregisters. I don't think we should be optimizing to 16-bit test instructions. It goes against our normal behavior of promoting i16 operations to i32. It only saves one byte due to the need to add a 0x66 prefix. I think it would also be subject to a length changing prefix penalty in the decoders on Intel CPUs. Reviewers: RKSimon, zvi, spatel Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38273 llvm-svn: 314474	2017-09-28 23:35:36 +00:00
Matthias Braun	51687912a4	ARM: Fix cases where CSI Restored bit is not cleared LR is an untypical callee saved register in that it is restored into a different register (PC) and thus does not live-out of the return block. This case requires the `Restored` flag in CalleeSavedInfo to be cleared. This fixes a number of cases where this wasn't handled correctly yet. llvm-svn: 314471	2017-09-28 23:12:06 +00:00
Sanjay Patel	4664d77316	[x86] add tests for possible insertelement to shuffle transform; NFC See PR34716 and D38316 for more discussion. llvm-svn: 314466	2017-09-28 22:27:25 +00:00
Ulrich Weigand	df86855f61	[SystemZ] Fix fall-out from r314428 The expensive-checks build bot found a problem with the r314428 commit: if CC is live after a ATOMIC_CMP_SWAPW instruction, it needs to be marked as live-in to the block after the loop the pseudo gets expanded to. This actually fixes a code-gen bug as well, since if the CC isn't live, the CR and JLH are merged to a CRJLH which doesn't actually set the condition code any more. llvm-svn: 314465	2017-09-28 22:08:25 +00:00
Craig Topper	ed19350293	[X86] Make use of vpmovwb when possible in LowerMULH If we have BWI, we can truncate in a much simpler way by using vpmovwb. This even works without VLX by using the wider zmm->ymm truncate with a subvector extract. Differential Revision: https://reviews.llvm.org/D38375 llvm-svn: 314457	2017-09-28 20:10:34 +00:00
Martin Storsjo	d6218cc385	[ARM] Restore the right frame pointer register in Int_eh_sjlj_longjmp In setupEntryBlockAndCallSites in CodeGen/SjLjEHPrepare.cpp, we fetch and store the actual frame pointer, but on return via the longjmp intrinsic, it always was restored into the r7 variable. On windows, the frame pointer should be restored into r11 instead of r7. On Darwin (where sjlj exception handling is used by default), the frame pointer is always r7, both in arm and thumb mode, and likewise, on windows, the frame pointer always is r11. On linux however, if sjlj exception handling is enabled (which it isn't by default), libcxxabi and the user code can be built in differing modes using different registers as frame pointer. Therefore, when restoring registers on a platform where we don't always use the same register depending on code mode, restore both r7 and r11. Differential Revision: https://reviews.llvm.org/D38253 llvm-svn: 314451	2017-09-28 19:04:30 +00:00
Martin Storsjo	adceba59a2	[ARM] Fix SJLJ exception handling when manually chosen on a platform where it isn't default Differential Revision: https://reviews.llvm.org/D38252 llvm-svn: 314450	2017-09-28 19:04:14 +00:00
Matthias Braun	5c3e8a450e	MIR: Serialize CaleeSavedInfo Restored flag llvm-svn: 314449	2017-09-28 18:52:14 +00:00

... 2 3 4 5 6 ...

22013 Commits