llvm-project

Commit Graph

Author	SHA1	Message	Date
Stanislav Mekhanoshin	68257700f8	[AMDGPU] Add exec copy to LiveIntervals in SILowerControlFlow::emitElse This instruction is missing from LiveIntervals. I'm not aware of any problems because of this though. Differential Revision: https://reviews.llvm.org/D28879 llvm-svn: 292521	2017-01-19 21:26:22 +00:00
Serge Rogatch	f83d2a25bf	[XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier Summary: Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future. This patch is one of a series, see also - https://reviews.llvm.org/D28623 Reviewers: rengolin, dberris Reviewed By: dberris Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown Differential Revision: https://reviews.llvm.org/D28624 llvm-svn: 292516	2017-01-19 20:24:23 +00:00
Simon Pilgrim	db101e4d57	[X86][SSE] Improve comments describing combineTruncatedArithmetic. NFCI. llvm-svn: 292502	2017-01-19 18:18:32 +00:00
Simon Pilgrim	5f2f53b106	[X86][SSE] Attempt to pre-truncate arithmetic operations that have already been extended As discussed on D28219 - it is profitable to combine trunc(binop (s/zext(x), s/zext(y)) to binop(trunc(s/zext(x)), trunc(s/zext(y))) assuming the trunc(ext()) will simplify further llvm-svn: 292493	2017-01-19 16:25:02 +00:00
Kristof Beyls	e9412b4d47	[GlobalISel] Pointers are legal operands for G_SELECT on AArch64 Differential Revision: https://reviews.llvm.org/D28805 llvm-svn: 292481	2017-01-19 13:32:14 +00:00
Elena Demikhovsky	e01512cecf	Recommitting unsigned saturation with a bugfix. A test case that crashed is added to avx512-trunc.ll. (PR31589) llvm-svn: 292479	2017-01-19 12:08:21 +00:00
Daniel Sanders	d64d5024a4	Re-commit: [globalisel] Tablegen-erate current Register Bank Information Summary: Adds a RegisterBank tablegen class that can be used to declare the register banks and an associated tablegen pass to generate the necessary code. Changes since first commit attempt: * Added missing guards * Added more missing guards * Found and fixed a use-after-free bug involving Twine locals Reviewers: t.p.northover, ab, rovka, qcolombet Reviewed By: qcolombet Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D27338 llvm-svn: 292478	2017-01-19 11:15:55 +00:00
Craig Topper	200ea31684	[AVX-512] Support ADD/SUB/MUL of mask vectors Summary: Currently we expand and scalarize these operations, but I think we should be able to implement ADD/SUB with KXOR and MUL with KAND. We already do this for scalar i1 operations so I just extended it to vectors of i1. Reviewers: zvi, delena Reviewed By: delena Subscribers: guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D28888 llvm-svn: 292474	2017-01-19 07:12:35 +00:00
Matt Arsenault	3e6f9b5773	AMDGPU: Disable some fneg combines unless nsz For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either. llvm-svn: 292473	2017-01-19 06:35:27 +00:00
Matt Arsenault	3b99f12a4e	AMDGPU: Remove modifiers from v_div_scale_* They seem to produce nonsense results when used. This should be applied to the release branch. llvm-svn: 292472	2017-01-19 06:04:12 +00:00
Craig Topper	c227529105	[X86] Merge LowerADD and LowerSUB into a single LowerADD_SUB since they are identical. llvm-svn: 292469	2017-01-19 03:49:29 +00:00
Craig Topper	b561e66384	[AVX-512] Use VSHUF instructions instead of two inserts as fallback for subvector broadcasts that can't fold the load. llvm-svn: 292466	2017-01-19 02:34:29 +00:00
Dehao Chen	1ce8d6ca59	Add -debug-info-for-profiling to emit more debug info for sample pgo profile collection Summary: SamplePGO binaries built with -gmlt to collect profile. The current -gmlt debug info is limited, and we need some additional info: * start line of all subprograms * linkage name of all subprograms * standalone subprograms (functions that has neither inlined nor been inlined) This patch adds these information to the -gmlt binary. The impact on speccpu2006 binary size (size increase comparing with -g0 binary, also includes data for -g binary, which does not change with this patch): -gmlt(orig) -gmlt(patched) -g 433.milc 4.68% 5.40% 19.73% 444.namd 8.45% 8.93% 45.99% 447.dealII 97.43% 115.21% 374.89% 450.soplex 27.75% 31.88% 126.04% 453.povray 21.81% 26.16% 92.03% 470.lbm 0.60% 0.67% 1.96% 482.sphinx3 5.77% 6.47% 26.17% 400.perlbench 17.81% 19.43% 73.08% 401.bzip2 3.73% 3.92% 12.18% 403.gcc 31.75% 34.48% 122.75% 429.mcf 0.78% 0.88% 3.89% 445.gobmk 6.08% 7.92% 42.27% 456.hmmer 10.36% 11.25% 35.23% 458.sjeng 5.08% 5.42% 14.36% 462.libquantum 1.71% 1.96% 6.36% 464.h264ref 15.61% 16.56% 43.92% 471.omnetpp 11.93% 15.84% 60.09% 473.astar 3.11% 3.69% 14.18% 483.xalancbmk 56.29% 81.63% 353.22% geomean 15.60% 18.30% 57.81% Debug info size change for -gmlt binary with this patch: 433.milc 13.46% 444.namd 5.35% 447.dealII 18.21% 450.soplex 14.68% 453.povray 19.65% 470.lbm 6.03% 482.sphinx3 11.21% 400.perlbench 8.91% 401.bzip2 4.41% 403.gcc 8.56% 429.mcf 8.24% 445.gobmk 29.47% 456.hmmer 8.19% 458.sjeng 6.05% 462.libquantum 11.23% 464.h264ref 5.93% 471.omnetpp 31.89% 473.astar 16.20% 483.xalancbmk 44.62% geomean 16.83% Reviewers: davidxl, echristo, dblaikie Reviewed By: echristo, dblaikie Subscribers: aprantl, probinson, llvm-commits, mehdi_amini Differential Revision: https://reviews.llvm.org/D25434 llvm-svn: 292457	2017-01-19 00:44:11 +00:00
Artem Belevich	3d3f6190ab	[NVPTX] Fix lowering of fp16 ISD::FNEG. There's no neg.f16 instruction, so negation has to be done via subtraction from zero. Differential Revision: https://reviews.llvm.org/D28876 llvm-svn: 292452	2017-01-19 00:14:45 +00:00
Krzysztof Parzyszek	954dd8d9ba	[Hexagon] Remove dead defs from the live set when expanding wstores llvm-svn: 292445	2017-01-18 23:11:40 +00:00
Michael Kuperstein	d3d2925933	Revert r291670 because it introduces a crash. r291670 doesn't crash on the original testcase from PR31589, but it crashes on a slightly more complex one. PR31589 has the new reproducer. llvm-svn: 292444	2017-01-18 23:05:58 +00:00
Evandro Menezes	7960b2e19a	[AArch64] Generate literals by the little end ARM seems to prefer that long literals be formed from their little end in order to promote the fusion of the instrs pairs MOV/MOVK and MOVK/MOVK on Cortex A57 and others (v. "Cortex A57 Software Optimisation Guide", section 4.14). Differential revision: https://reviews.llvm.org/D28697 llvm-svn: 292422	2017-01-18 18:57:08 +00:00
Stanislav Mekhanoshin	a4e63ead4b	[AMDGPU] Do not allow register coalescer to create big superregs Limit register coalescer by not allowing it to artificially increase size of registers beyond dword. Such super-registers are in fact register sequences and not distinct HW registers. With more super-regs we would need to allocate adjacent registers and constrain regalloc more than needed. Moreover, our super registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates register allocation even more, resulting in excessive spilling. Differential Revision: https://reviews.llvm.org/D28782 llvm-svn: 292413	2017-01-18 17:30:05 +00:00
Kirill Bobyrev	6afbaf0944	Revert 292404 due to buildbot failures. llvm-svn: 292407	2017-01-18 16:34:25 +00:00
Kirill Bobyrev	9ad06dbe17	[X86] Minor code cleanup to fix several clang-tidy warnings. NFC llvm-svn: 292404	2017-01-18 16:15:47 +00:00
Chad Rosier	771db6f895	[Assembler] Fix crash when assembling .quad for AArch32. A 64-bit relocation does not exist in 32-bit ARMELF. Report an error instead of crashing. PR23870 Patch by Sanne Wouda (sanwou01). Differential Revision: https://reviews.llvm.org/D28851 llvm-svn: 292373	2017-01-18 15:02:54 +00:00
Florian Hahn	8485cecd3f	[thumb,framelowering] Reset NoVRegs in Thumb1FrameLowering::emitPrologue. Summary: In this function, virtual registers can be introduced (for example through calls to emitThumbRegPlusImmInReg). doScavengeFrameVirtualRegs will replace those virtual registers with concrete registers later on in PrologEpilogInserter, which sets NoVRegs again. This patch fixes the Codegen/Thumb/segmented-stacks.ll test case which failed with expensive checks. https://llvm.org/bugs/show_bug.cgi?id=27484 Reviewers: rnk, bkramer, olista01 Reviewed By: olista01 Subscribers: llvm-commits, rengolin Differential Revision: https://reviews.llvm.org/D28829 llvm-svn: 292372	2017-01-18 15:01:22 +00:00
Daniel Sanders	af76f989b5	Re-revert: [globalisel] Tablegen-erate current Register Bank Information More missing guards. My build didn't notice it due to a stale file left over from a Global ISel build. llvm-svn: 292369	2017-01-18 14:26:12 +00:00
Daniel Sanders	517b61cb69	Re-commit: [globalisel] Tablegen-erate current Register Bank Information Summary: Adds a RegisterBank tablegen class that can be used to declare the register banks and an associated tablegen pass to generate the necessary code. Changes since last commit: The new tablegen pass is now correctly guarded by LLVM_BUILD_GLOBAL_ISEL and this should fix the buildbots however it may not be the whole fix. The previous buildbot failures suggest there may be a memory bug lurking that I'm unable to reproduce (including when using asan) or spot in the source. If they re-occur on this commit then I'll need assistance from the bot owners to track it down. Reviewers: t.p.northover, ab, rovka, qcolombet Reviewed By: qcolombet Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D27338 llvm-svn: 292367	2017-01-18 14:17:50 +00:00
Sam Parker	df7c6ef96f	[ARM] Create objdump subtarget from build attrs Enable an ELFObjectFile to read its ARM build attributes to produce a target triple with a specific ARM architecture. llvm-objdump now uses this functionality to automatically produce a more accurate target. Differential Revision: https://reviews.llvm.org/D28769 llvm-svn: 292366	2017-01-18 13:52:12 +00:00
Michael Zuckerman	0c0240ce84	[X86] Improve mul combine for negative multiplier (2^c - 1) This patch improves the mul instruction combine function (combineMul) by adding a new layer of logic. In this patch, we are adding the ability to fold (mul x, -((1 << c) - 1)) or (mul x, -((1 << c) + 1)) into neg((x << c) - x) or neg((x << c) + x), respectively. Differential Revision: https://reviews.llvm.org/D28232 llvm-svn: 292358	2017-01-18 09:31:13 +00:00
Renato Golin	03c5e69d07	Revert "[XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier" This reverts commit r292210, as it broke the Thumb buildbot with: clang-5.0: error: the clang compiler does not support '-fxray-instrument on thumbv7-unknown-linux-gnueabihf'. llvm-svn: 292357	2017-01-18 09:08:43 +00:00
Jonas Paulsson	a9bb00d82b	[SystemZ] Proper handling of undef flag while expanding pseudo. During post-RA pseudo expansion, an 'undef' flag of the source operand should be propagated by emitGRX32Move(). Review: Ulrich Weigand llvm-svn: 292353	2017-01-18 08:32:54 +00:00
Marina Yatsina	197db00e3e	[X86] Fix for bugzilla 31576 - add support for "data32" instruction prefix This patch fixes bugzilla 31576 (https://llvm.org/bugs/show_bug.cgi?id=31576). "data32" instruction prefix was not defined in the llvm. An exception had to be added to the X86 tablegen and AsmPrinter because both "data16" and "data32" are encoded to 0x66 (but in different modes). Differential Revision: https://reviews.llvm.org/D28468 llvm-svn: 292352	2017-01-18 08:07:51 +00:00
Dan Gohman	73e3aaa61e	[WebAssembly] Update grow_memory's return type. The grow_memory instruction now returns the previous memory size. Add the return type to the LLVM intrinsic. llvm-svn: 292322	2017-01-18 01:02:45 +00:00
Justin Lebar	1cf6bf4989	[NVPTX] Support global variables of integer type larger than i64. Reviewers: tra, majnemer Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28825 llvm-svn: 292316	2017-01-18 00:29:53 +00:00
Justin Lebar	9c46450dbb	[NVPTX] Standardize asm printer on "foo \tbar". Some instructions were printed as "foo\tbar", but most are printed as "foo \tbar". Standardize on the latter form. llvm-svn: 292306	2017-01-18 00:09:36 +00:00
Justin Lebar	2a2d6f0ddd	[NVPTX] Clean up nested !strconcat calls. !strconcat is a variadic function; it will concatenate an arbitrary number of strings. There's no need to nest it. llvm-svn: 292305	2017-01-18 00:09:19 +00:00
Justin Lebar	cc938fc197	[NVPTX] Implement min/max in tablegen, rather than with custom DAGCombine logic. Summary: This change also lets us use max.{s,u}16. There's a vague warning in a test about this maybe being less efficient, but I could not come up with a case where the resulting SASS (sm_35 or sm_60) was different with or without max.{s,u}16. It's true that nvcc seems to emit only max.{s,u}32, but even ptxas 7.0 seems to have no problem generating efficient SASS from max.{s,u}16 (the casts up to i32 and back down to i16 seem to be implicit and nops, happening via register aliasing). In the absence of evidence, better to have fewer special cases, emit more straightforward code, etc. In particular, if a new GPU has 16-bit min/max instructions, we want to be able to use them. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28732 llvm-svn: 292304	2017-01-18 00:09:01 +00:00
Justin Lebar	7dc3d6c341	[NVPTX] Lower integer absolute value idiom to abs instruction. Summary: Previously we lowered it literally, to shifts and xors. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28722 llvm-svn: 292303	2017-01-18 00:08:44 +00:00
Justin Lebar	1091a9f566	[NVPTX] Improve lowering of llvm.ctpop. Summary: Avoid an unnecessary conversion operation when using the result of ctpop.i32 or ctpop.i16 as an i32, as in both cases the ptx instruction we run returns an i32. (Previously if we used the value as an i32, we'd do an unnecessary zext+trunc.) Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28721 llvm-svn: 292302	2017-01-18 00:08:27 +00:00
Justin Lebar	c7d20128bd	[NVPTX] Add lowering for llvm.bitreverse. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28720 llvm-svn: 292301	2017-01-18 00:08:10 +00:00
Justin Lebar	d17de5380b	[NVPTX] Improve lowering of llvm.ctlz. Summary: * Disable "ctlz speculation", which inserts a branch on every ctlz(x) which has defined behavior on x == 0 to check whether x is, in fact zero. * Add DAG patterns that avoid re-truncating or re-expanding the result of the 16- and 64-bit ctz instructions. Reviewers: tra Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D28719 llvm-svn: 292299	2017-01-18 00:07:35 +00:00
Tim Northover	33a1a0b001	GlobalISel: fix comparison order for G_FCMP As with G_ICMP we'd written the CSET instructions backwards. llvm-svn: 292285	2017-01-17 23:04:01 +00:00
Tim Northover	509091f9e0	GlobalISel: add callseq instructions to record stack usage llvm-svn: 292284	2017-01-17 22:43:34 +00:00
Tim Northover	d943354216	GlobalISel: correctly handle varargs Some platforms (notably iOS) use a different calling convention for unnamed vs named parameters in varargs functions, so we need to keep track of this information when translating calls. Since not many platforms are involved, the guts of the special handling is in the ValueHandler class (with a generic implementation that should work for most targets). llvm-svn: 292283	2017-01-17 22:30:10 +00:00
Alexei Starovoitov	efefbc4a19	[bpf] fix stack-use-after-scope Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 292258	2017-01-17 21:14:00 +00:00
Joerg Sonnenberger	270dd41f75	Remove an overeager assert from r288844. llvm-svn: 292244	2017-01-17 19:29:15 +00:00
Bob Wilson	f2d0b68b3b	Revert r291640 change to fold X86 comparison with atomic_load_add. Even with the fix from r291630, this still causes problems. I get widespread assertion failures in the Swift runtime's WeakRefCount::increment() function. I sent a reduced testcase in reply to the commit. llvm-svn: 292242	2017-01-17 19:18:57 +00:00
Sam Kolton	9dffada98b	[AMDGPU] Assembler: fix v_mac_f16 immediates Reviewers: vpykhtin, artem.tamazov, tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D28802 llvm-svn: 292224	2017-01-17 15:26:02 +00:00
Serge Rogatch	50be6b45a9	[XRay][Arm] Repair XRay table emission on Arm32 and add tests to identify such problem earlier Summary: Emission of XRay table was occasionally disabled for Arm32, but this bug was not then detected because earlier (also by mistake) testing of XRay was occasionally disabled on 32-bit Arm targets. This patch should fix that problem and detect such problems in the future. This patch is one of a series, see also - https://reviews.llvm.org/D28623 Reviewers: rengolin, dberris Reviewed By: dberris Subscribers: llvm-commits, aemerson, rengolin, dberris, iid_iunknown Differential Revision: https://reviews.llvm.org/D28624 llvm-svn: 292210	2017-01-17 11:52:10 +00:00
Matt Arsenault	4165efdc58	AMDGPU: Add replacement export intrinsics llvm-svn: 292205	2017-01-17 07:26:53 +00:00
Alexei Starovoitov	e4975487f5	[bpf] error when unknown bpf helper is called Emit error when BPF backend sees a call to a global function or to an external symbol. The kernel verifier only allows calls to predefined helpers from bpf.h which are defined in 'enum bpf_func_id'. Such calls in assembler must look like 'call [1-9]+' where number matches bpf_func_id. Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 292204	2017-01-17 07:26:17 +00:00
Craig Topper	729d30d0ae	[AVX-512] Add support for taking a bitcast between a SUBV_BROADCAST and VSELECT and moving it to the input of the SUBV_BROADCAST if it will help with using a masked operation. llvm-svn: 292201	2017-01-17 06:49:59 +00:00
Alexei Starovoitov	05de2e4818	[bpf] error when BPF stack size exceeds 512 bytes Signed-off-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 292180	2017-01-17 01:05:17 +00:00
Matt Arsenault	2aab1d45ff	AMDGPU: Remove dead pattern This is the unsafe conversion pattern, but not guarded by an unsafe math check. It is also already done in LegalizeDAG. llvm-svn: 292173	2017-01-17 00:10:43 +00:00
Jan Vesely	334f51a6fe	AMDGPU/EG,CM: Implement _noret global atomics _RTN versions will be a lot more complicated Differential Revision: https://reviews.llvm.org/D28067 llvm-svn: 292162	2017-01-16 21:20:13 +00:00
Tony Jiang	8e8c444d3d	[PowerPC] Expand ISEL instruction into if-then-else sequence. Generally, the ISEL is expanded into if-then-else sequence, in some cases (like when the destination register is the same with the true or false value register), it may just be expanded into just the if or else sequence. llvm-svn: 292154	2017-01-16 20:12:26 +00:00
Chad Rosier	58fb5f5e58	[AArch64] Falkor supports Rounding Double Multiply Add/Subtract instructions. Falkor only partially implements the ARMv8.1a extensions, so this patch refactors the support for the SQRDML[A\|S]H instruction into a separate feature. Differential Revision: https://reviews.llvm.org/D28681 llvm-svn: 292142	2017-01-16 16:28:43 +00:00
Daniel Sanders	a83a1a69c5	Revert r292132: [globalisel] Tablegen-erate current Register Bank Information'... Several buildbots encountered a crash in tablegen when building this commit. Reverting while I investigate the cause. llvm-svn: 292136	2017-01-16 15:34:43 +00:00
Daniel Sanders	ab8194def0	[globalisel] Tablegen-erate current Register Bank Information Summary: Adds a RegisterBank tablegen class that can be used to declare the register banks and an associated tablegen pass to generate the necessary code. Reviewers: t.p.northover, ab, rovka, qcolombet Subscribers: aditya_nandakumar, rengolin, kristof.beyls, vkalintiris, mgorny, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D27338 llvm-svn: 292132	2017-01-16 15:20:43 +00:00
Tony Jiang	8da139a9fd	Revert "[PowerPC] Expand ISEL instruction into if-then-else sequence." This reverts commit 1d0e0374438ca6e153844c683826ba9b82486bb1. llvm-svn: 292131	2017-01-16 15:01:07 +00:00
Tony Jiang	7630b8c5ee	[PowerPC] Expand ISEL instruction into if-then-else sequence. Generally, the ISEL is expanded into if-then-else sequence, in some cases (like when the destination register is the same with the true or false value register), it may just be expanded into just the if or else sequence. llvm-svn: 292128	2017-01-16 14:43:12 +00:00
Simon Dardis	730fdb73a1	[mips] Correct c.cond.fmt instruction definition. Permit explicit $fcc<X> operand in c.cond.fmt instruction. Add c.cond.fmt to the MIPS to microMIPS instruction mapping table. Check that $fcc1 - $fcc7 are unusable for MIPS-I to MIPS-III for c.cond.fmt, bc1t, bc1f. Reviewers: seanbruno, zoran.jovanovic, vkalintiris Differential Revision: https://reviews.llvm.org/D24510 llvm-svn: 292117	2017-01-16 13:55:58 +00:00
Craig Topper	fba613e407	[X86] Merge the disassemblers handling of the different TYPE_RELs by getting the size information from the ENCODING field. NFCI llvm-svn: 292096	2017-01-16 06:49:09 +00:00
Craig Topper	ad944a1cac	[X86] Reduce the number of operand 'types' the disassembler needs to deal with. NFCI We were frequently checking for a list of types and the different types conveyed no real information. So lump them together explicitly. llvm-svn: 292095	2017-01-16 06:49:03 +00:00
Craig Topper	3173a1f8ff	[AVX-512] Teach the disassembler about all of the EVEX gather and scatter instructions. llvm-svn: 292094	2017-01-16 05:44:33 +00:00
Craig Topper	33ac064137	[AVX-512] Begin giving the disassembler a way to recognize that VSIB is a different encoding than regular addressing modes. This part first teaches it not to check error if EVEX.V2 is used by a VSIB instruction. llvm-svn: 292093	2017-01-16 05:44:25 +00:00
Craig Topper	7dfd583644	[AVX-512] Correct memory operand size for VPGATHERQPS and VPGATHERQD with ZMM index. Similar for SCATTER and the prefetch gather and scatter instructions. Fixes PR31618. llvm-svn: 292088	2017-01-16 00:55:58 +00:00
Craig Topper	8be6ebce2b	[AVX-512] Fix register class in one of the gather/scatter memory operands so that all 32 bit registers can be allowed. llvm-svn: 292087	2017-01-16 00:55:50 +00:00
Simon Pilgrim	6ed996cdf0	[CostModel][X86] Fix AVX512BW vector shift costs for vXi16 types We already have patterns in place to support 128/256-bit shifts without AVX512VL llvm-svn: 292077	2017-01-15 20:44:00 +00:00
Justin Lebar	38746d9718	[NVPTX] Let there be One True Way to set NVVMReflect params. Summary: Previously there were three ways to inform the NVVMReflect pass whether you wanted to flush denormals to zero: * An LLVM command-line option * Parameters to the NVVMReflect constructor * Metadata on the module itself. This change removes the first two, leaving only the third. The motivation for this change, aside from simplifying things, is that we want LLVM to be aware of whether it's operating in FTZ mode, so other passes can use this information. Ideally we'd have a target-generic piece of metadata on the module. This change moves us in that direction. Reviewers: tra Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D28700 llvm-svn: 292068	2017-01-15 16:54:35 +00:00
Michael Zuckerman	6baa3838e9	Fix blend mask by switching the sides of the operands, since the Blend node uses the opposite mask from the Select node. llvm-svn: 292066	2017-01-15 16:43:14 +00:00
Craig Topper	f1388ef006	[AVX-512] Remove unnecessary duplicate broadcast patterns. NFC llvm-svn: 292053	2017-01-15 06:15:45 +00:00
Craig Topper	52317e8b6e	[AVX-512] Replicate some broadcast patterns to VLX and disable the AVX2 patterns when VLX is available. llvm-svn: 292051	2017-01-15 05:47:45 +00:00
Craig Topper	c294cff863	[X86] Remove untested MOVDDUP patterns. These all involve bitcasts around the memory operands. This isn't something we normally do for isel patterns. I suspect DAG combine should convert the load type making this unnecessary. llvm-svn: 292050	2017-01-15 05:21:29 +00:00
Davide Italiano	76de68eaf9	[TargetLowering] Simplify a bit. NFCI. llvm-svn: 292024	2017-01-14 20:09:29 +00:00
Simon Pilgrim	d419b73a42	[CostModel][X86] Updated vXi64 ASHR costs on AVX512 targets now that D28604 has landed llvm-svn: 292023	2017-01-14 19:24:23 +00:00
Simon Pilgrim	8e5ecf8ad1	[X86][XOP] Added support for VPMADCSWD 'extend+hadd' IFMA patterns VPMADCSWD act as VPADDD( VPMADDWD( x, y ), z ) - multiply+extend+hadd and add to v4i32 accumulator llvm-svn: 292021	2017-01-14 18:52:13 +00:00
Simon Pilgrim	b290805e94	[X86][XOP] Added support for VPMACSDQH/VPMACSDQL 'extension' IFMA patterns VPMACSDQH/VPMACSDQL act as VPADDQ( VPMULDQ( x, y ), z ) - multiply+extending either the odd/even 4i32 input elements and adding to v2i64 accumulator llvm-svn: 292020	2017-01-14 18:08:54 +00:00
Simon Pilgrim	a1631749f8	[X86][XOP] Added support for VPMACSWW/VPMACSDD 'lossy' IFMA patterns VPMACSWW/VPMACSDD act as add( mul( x, y ), z ) - ignoring any upper bits from both the multiply and add stages llvm-svn: 292019	2017-01-14 17:13:52 +00:00
Craig Topper	63e2cd6caa	[AVX-512] Teach two address instruction pass to replace masked move instructions with blendm instructions when it's beneficial. Isel now selects masked move instructions for vselect instead of blendm. But sometimes it is beneficial to register allocation to remove the tied register constraint by using blendm instructions. This also picks up cases where the masked move was created due to a masked load intrinsic. Differential Revision: https://reviews.llvm.org/D28454 llvm-svn: 292005	2017-01-14 07:50:52 +00:00
Craig Topper	09b7e0f01d	[AVX-512] Replace V_SET0 in AVX-512 patterns with AVX512_128_SET0. Enhance AVX512_128_SET0 expansion to make this possible. We'll now expand AVX512_128_SET0 to an EVEX VXORD if VLX is available. Or if it's not, but register allocation has selected a non-extended register, we will use VEX VXORPS. And if it's an extended register without VLX we'll use a 512-bit XOR. Do the same for AVX512_FsFLD0SS/SD. This makes it possible for the register allocator to have all 32 registers available to work with. llvm-svn: 292004	2017-01-14 07:29:24 +00:00
Craig Topper	9cc685a56e	[X86] Simplify the code that calculates a scaled blend mask. We don't need a second loop. llvm-svn: 291996	2017-01-14 04:29:15 +00:00
Craig Topper	9850210d03	[AVX-512] Change blend mask in lowerVectorShuffleAsBlend to a 64-bit value. Also add 32-bit mode command lines to the test case that exercises this just to make sure we sanely handle the 64-bit immediate there. This fixes a undefined sanitizer failure from r291888. llvm-svn: 291994	2017-01-14 04:19:35 +00:00
David L. Jones	41cecba8e9	"Use" lambda captures which are otherwise only used in asserts. NFC Summary: The LLVM coding standards recommend "using" values that are only needed by asserts: http://llvm.org/docs/CodingStandards.html#assert-liberally Without this change, LLVM cannot bootstrap with -Werror as the second stage fails with this new warning: https://reviews.llvm.org/rL291905 See also the previous fixes: https://reviews.llvm.org/rL291916 https://reviews.llvm.org/rL291939 https://reviews.llvm.org/rL291940 https://reviews.llvm.org/rL291941 Reviewers: rsmith Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D28695 llvm-svn: 291957	2017-01-13 21:02:41 +00:00
Artem Belevich	64dc9be7b4	[NVPTX] Added support for half-precision floating point. Only scalar half-precision operations are supported at the moment. - Adds general support for 'half' type in NVPTX. - fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math). - Type conversions to/from fp16 are supported on all GPU variants. - On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage. Differential Revision: https://reviews.llvm.org/D28540 llvm-svn: 291956	2017-01-13 20:56:17 +00:00
Konstantin Zhuravlyov	7d88275577	[AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64) Differential Revision: https://reviews.llvm.org/D28496 llvm-svn: 291954	2017-01-13 19:49:25 +00:00
Artem Belevich	d109f46573	[NVPTX] Only lower sin/cos to approximate instructions if unsafe math is allowed. Previously we'd always lower @llvm.{sin,cos}.f32 to {sin.cos}.approx.f32 instruction even when unsafe FP math was not allowed. Clang-generated IR is not affected by this as it uses precise sin/cos from CUDA's libdevice when unsafe math is disabled. Differential Revision: https://reviews.llvm.org/D28619 llvm-svn: 291936	2017-01-13 18:48:13 +00:00
Malcolm Parsons	17d266bc96	Remove unused lambda captures. NFC llvm-svn: 291916	2017-01-13 17:12:16 +00:00
Ivan Krasin	1ed7896c1b	Revert r291903 and r291898. Reason: they break check-lld on the bots. Summary: Revert [ARM] Fix ubig32_t read in ARMAttributeParser Now using support functions to read data instead of trying to perform casts. =========================================================== Revert [ARM] Enable objdump to construct triple for ARM Now that The ARMAttributeParser has been moved into the library, it has been modified so that it can parse the attributes without printing them and stores them in a map. ELFObjectFile now queries the attributes to fill out the architecture details of a provided triple for 'arm' and 'thumb' targets. llvm-objdump uses this new functionality. Subscribers: llvm-commits, samparker, aemerson, mgorny Differential Revision: https://reviews.llvm.org/D28683 llvm-svn: 291911	2017-01-13 16:45:15 +00:00
Saleem Abdulrasool	6ef45916c6	ARM: match GCC's behaviour for builtins GCC changes the CC between the user-code and the builtins based on the value of `-target` rather than `-mfloat-abi`. When a HF target is used, the VFP variant of the AAPCS CC is used. Otherwise, the AAPCS variant is used. In all cases, the AEABI functions use the AAPCS CC. Adjust the calling convention based on the target. Resolves PR30543! llvm-svn: 291909	2017-01-13 16:25:33 +00:00
Benjamin Kramer	061f4a5fe6	Apply clang-tidy's performance-unnecessary-value-param to LLVM. With some minor manual fixes for using function_ref instead of std::function. No functional change intended. llvm-svn: 291904	2017-01-13 14:39:03 +00:00
Daniel Sanders	d6a1831ea7	[globalisel][aarch64] Make getCopyMapping() take register banks ID's rather than IsGPR booleans Summary: This allows the function to handle architectures with more than two register banks. Depends on D27978 Reviewers: ab, t.p.northover, rovka, qcolombet Subscribers: aditya_nandakumar, kristof.beyls, aemerson, rengolin, vkalintiris, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D27339 llvm-svn: 291902	2017-01-13 14:16:33 +00:00
Simon Pilgrim	7f2a6d5e8c	[X86][AVX512] Add support for variable ASHR v2i64/v4i64 without VLX Use v8i64 variable ASHR instructions if we don't have VLX. This is a reduced version of D28537 that just adds support for variable shifts - I'll continue with that patch (for just constant/uniform shifts) once I've fixed the type legalization issue in avx512-cvt.ll. Differential Revision: https://reviews.llvm.org/D28604 llvm-svn: 291901	2017-01-13 13:16:19 +00:00
Daniel Sanders	21ac840fca	[aarch64][globalisel] Move getValueMapping/getCopyMapping to AArch64GenRegisterBankInfo. NFC. Summary: We did lose a little specificity in the assertion messages for the PartialMappingIdx enumerators in this change but this was necessary to avoid unnecessary use of 'public:' and we haven't lost anything that can't be discovered easily in lldb. Once this is tablegen-erated we could also safely remove the assertions. Depends on D27976 Reviewers: t.p.northover, ab, rovka, qcolombet Subscribers: aditya_nandakumar, aemerson, rengolin, vkalintiris, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D27978 llvm-svn: 291900	2017-01-13 11:50:34 +00:00
Daniel Sanders	f81cf47e65	[aarch64][globalisel] Refactor getRegBankBaseIdxOffset() to remove the power-of-2 assumption. NFC Summary: We don't exploit it yet though Depends on D27976 Reviewers: t.p.northover, ab, rovka, qcolombet Subscribers: aditya_nandakumar, aemerson, rengolin, vkalintiris, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D27977 llvm-svn: 291899	2017-01-13 11:23:37 +00:00
Sam Parker	770ceb69ba	[ARM] Enable objdump to construct triple for ARM Now that the ARMAttributeParser has been moved into the library, it has been modified so that it can parse the attributes without printing them and stores them in a map. ELFObjectFile now queries the attributes to fill out the architecture details of a provided triple for 'arm' and 'thumb' targets. llvm-objdump uses this new functionality. Differential Revision: https://reviews.llvm.org/D28281 llvm-svn: 291898	2017-01-13 11:04:21 +00:00
Daniel Sanders	438a1ecc2c	[aarch64][globalisel] Move data into <Target>GenRegisterBankInfo. NFC. Summary: Depends on D27809 Reviewers: t.p.northover, rovka, qcolombet, ab Subscribers: aditya_nandakumar, aemerson, rengolin, vkalintiris, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D27976 llvm-svn: 291897	2017-01-13 10:53:57 +00:00
Diana Picus	a2c59149e1	[ARM] CodeGen: Replace AddDefaultT1CC and AddNoT1CC. NFC For AddDefaultT1CC, we add a new helper t1CondCodeOp, which creates the appropriate register operand. For AddNoT1CC, we use the existing condCodeOp helper - we only had two uses of AddNoT1CC, so at this point it's probably not worth having yet another helper just for them. Differential Revision: https://reviews.llvm.org/D28603 llvm-svn: 291894	2017-01-13 10:37:37 +00:00
Diana Picus	8a73f5562f	[ARM] CodeGen: Remove AddDefaultCC. NFC. Replace all uses of AddDefaultCC with add(condCodeOp()). The transformation has been done automatically with a custom tool based on Clang AST Matchers + RefactoringTool. Differential Revision: https://reviews.llvm.org/D28557 llvm-svn: 291893	2017-01-13 10:18:01 +00:00
Diana Picus	116bbab4e4	[CodeGen] Rename MachineInstrBuilder::addOperand. NFC Rename from addOperand to just add, to match the other method that has been added to MachineInstrBuilder for adding more than just 1 operand. See https://reviews.llvm.org/D28057 for the whole discussion. Differential Revision: https://reviews.llvm.org/D28556 llvm-svn: 291891	2017-01-13 09:58:52 +00:00
Diana Picus	4f8c3e1882	[ARM] CodeGen: Remove AddDefaultPred. NFC. Replace all uses of AddDefaultPred with MachineInstrBuilder::add(predOps()). This makes the code building MachineInstrs more readable, because it allows us to write code like: MIB.addSomeOperand(blah) .add(predOps()) .addAnotherOperand(blahblah) instead of AddDefaultPred(MIB.addSomeOperand(blah)) .addAnotherOperand(blahblah) This commit also adds the predOps helper in the ARM backend, as well as the add method taking a variable number of operands to the MachineInstrBuilder. The transformation has been done mostly automatically with a custom tool based on Clang AST Matchers + RefactoringTool. Differential Revision: https://reviews.llvm.org/D28555 llvm-svn: 291890	2017-01-13 09:37:56 +00:00
Michael Zuckerman	558a4d8419	[X86][AVX512] Adding missing shuffle lowering to blend mask instructions Some shuffles can be lowered to blend mask instructions (VPBLENDMB/VPBLENDMW/VPBLENDMD/VPBLENDMQ). In this patch, I added a new pattern match for this case. Reviewers: 1. craig.topper 2. guyblank 3. RKSimon 4. igorb Differential Revision: https://reviews.llvm.org/D28483 llvm-svn: 291888	2017-01-13 09:06:00 +00:00
Craig Topper	1ec84c2a18	[AVX-512] Remove unmasked BLENDM instructions from the wrong load folding table. The unmasked versions read memory from operand 2, but were in the operand 3 table. These aren't the most interesting set of blendm instructions as the unmasked version isn't useful. We were also missing the B and W forms. I'll add the masked versions of all sizes in a future patch. llvm-svn: 291885	2017-01-13 07:28:56 +00:00

40944 Commits