llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	fe3d59e98b	[X86][AVX512] UNPCKL/H PS and PD should be scheduled with WriteFShuffle not WriteFAdd llvm-svn: 330023	2018-04-13 14:41:05 +00:00
Jun Bum Lim	06073bfff7	[PostRASink]Add register dependency check for implicit operands Summary: This change extend the register dependency check for implicit operands in Copy instructions. Fixes PR36902. Reviewers: thegameg, sebpop, uweigand, jnspaulsson, gberry, mcrosier, qcolombet, MatzeB Reviewed By: thegameg Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44958 llvm-svn: 330018	2018-04-13 14:23:09 +00:00
Ivan A. Kosarev	f533a6e5aa	[NEON] Support intrinsic for scalar and vector versions of the VRINTN instruction Differential Revision: https://reviews.llvm.org/D45514 llvm-svn: 330011	2018-04-13 12:45:12 +00:00
Hiroshi Inoue	372ffa15cb	[NFC] fix trivial typos in comments "the the" -> "the", "we we" -> "we", etc llvm-svn: 330006	2018-04-13 11:37:06 +00:00
Gabor Buella	604be4424b	[X86] Introduce cldemote instruction Hint to hardware to move the cache line containing the address to a more distant level of the cache without writing back to memory. Reviewers: craig.topper, zvi Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45256 llvm-svn: 329992	2018-04-13 07:35:08 +00:00
Craig Topper	254ed028a4	[X86] Remove the pmuldq/pmuldq intrinsics and replace with native IR. This completes the work started in r329604 and r329605 when we changed clang to no longer use the intrinsics. We lost some InstCombine SimplifyDemandedBit optimizations through this change as we aren't able to fold 'and', bitcast, shuffle very well. llvm-svn: 329990	2018-04-13 06:07:18 +00:00
Sanjay Patel	9adb386a8e	[PowerPC] add fsub-fneg test; NFC This is a test for a transform that was suggested in the post-commit mailing list thread for rL329821. The target in question is not in trunk, so PPC gets to stand in for it because it's the only in-tree target that sets 'isFPExtFree()' to 'true'. llvm-svn: 329963	2018-04-12 22:14:23 +00:00
Peter Collingbourne	00db326b0d	AArch64: Introduce a DAG combine for folding offsets into addresses. This is a code size win in code that takes offseted addresses frequently, such as C++ constructors that typically need to compute an offseted address of a vtable. This reduces the size of Chromium for Android's .text section by 108KB. Differential Revision: https://reviews.llvm.org/D45199 llvm-svn: 329956	2018-04-12 21:23:55 +00:00
Gabor Buella	297c138798	[X86] Introduce LLVM wbinvd intrinsic A previously missing intrinsic for an old instruction. Reviewers: craig.topper, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45312 llvm-svn: 329936	2018-04-12 18:38:18 +00:00
Lei Huang	10367eb422	[Power9]Legalize and emit code for converting (Un)Signed DWord to Quad-Precision Legalize and emit code for: * xscvsdqp * xscvudqp Differential Revision: https://reviews.llvm.org/D45230 llvm-svn: 329931	2018-04-12 18:00:14 +00:00
Jessica Paquette	8aa6cd5cb9	[AArch64] Move AFI->setRedZone(false) to top of emitPrologue AFI->setRedZone(false) was put in the wrong place before, and so it only fired on functions that didn't have stack frames. This moves that to the top of emitPrologue to make sure that every function without a redzone has it set correctly. This also adds a function representing one of the early exit cases (GHC calling convention) to the MachineOutliner noredzone test to ensure that we can outline from functions like these, where we never use a redzone. llvm-svn: 329922	2018-04-12 16:16:18 +00:00
Sanjay Patel	5ace2b765a	revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617) This change is exposing UB in source code - as was warned/predicted. :) See D44909 for discussion. Reverting while we figure out how to fix things. llvm-svn: 329920	2018-04-12 15:27:01 +00:00
Shiva Chen	b48b027d05	[RISCV] Change function alignment to 4 bytes, and 2 bytes for RVC Summary: According RISC-V ELF psABI specification, base RV32 and RV64 ISAs only allow 32-bit instruction alignment, but instruction allow to be aligned to 16-bit boundaries for C-extension. So we just align to 4 bytes and 2 bytes for C-extension is enough. Reviewers: asb, apazos Differential Revision: https://reviews.llvm.org/D45560 Patch by Kito Cheng. llvm-svn: 329899	2018-04-12 11:30:59 +00:00
Petar Jovanovic	984db9ecbc	[MIPS GlobalISel] minor update to MIR tests added in r329819 Remove 'registers' section, as suggested (D. Sanders) at code review https://reviews.llvm.org/D44304 llvm-svn: 329888	2018-04-12 09:12:29 +00:00
Hiroshi Inoue	bcadfee2ad	[NFC] fix trivial typos in documents and comments "is is" -> "is", "if if" -> "if", "or or" -> "or" llvm-svn: 329878	2018-04-12 05:53:20 +00:00
Alex Bradbury	21d28fe8b8	[RISCV] Codegen support for RV32D floating point comparison operations Also add double-prevoius-failure.ll which captures a test case that at one point triggered a compiler crash, while developing calling convention support for f64 on RV32D with soft-float ABI. llvm-svn: 329877	2018-04-12 05:50:06 +00:00
Alex Bradbury	60baa2e015	[RISCV] Codegen support for RV32D floating point conversion operations This also includes support and a test for truncating stores, which are now possible thanks to the fpround pattern. llvm-svn: 329876	2018-04-12 05:47:15 +00:00
Alex Bradbury	5d0dfa5e0e	[RISCV] Add codegen support for RV32D floating point arithmetic operations llvm-svn: 329874	2018-04-12 05:42:42 +00:00
Alex Bradbury	8f296478eb	[RISCV] Add tests missed in r329871 llvm-svn: 329872	2018-04-12 05:36:44 +00:00
Nemanja Ivanovic	c564dc060a	[PowerPC] Fix condition for 64-bit rotate when replacing r+r instr with r+i This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039 The condition only covers one of the two 64-bit rotate instructions. This just adds the second (RLDICLo). Patch by Josh Stone. llvm-svn: 329852	2018-04-11 21:25:44 +00:00
Puyan Lotfi	0cba63c064	Attempting to work around a non-determinism issue. The main thing that matters with this test is that the COPYs are moved together not where the REG_SEQUENCES are. llvm-svn: 329850	2018-04-11 20:29:32 +00:00
Gabor Buella	2ef36f3571	[X86] Describe wbnoinvd instruction Similar to the wbinvd instruction, except this one does not invalidate caches. Ring 0 only. The encoding matches a wbinvd instruction with an F3 prefix. Reviewers: craig.topper, zvi, ashlykov Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D43816 llvm-svn: 329847	2018-04-11 20:01:57 +00:00
Simon Pilgrim	8fc2b49620	[X86][Atom] Convert Atom scheduler model to SchedRW (PR32431) Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550. I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here, There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers. There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical. NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild. Differential Revision: https://reviews.llvm.org/D45486 llvm-svn: 329837	2018-04-11 18:23:01 +00:00
Tim Renouf	fd8d4af3bc	[AMDGPU] Ensure there are enough registers for wave dispatch Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Re-landed after noticing that the buildbot failure from 329808 seemed to be unrelated. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329826	2018-04-11 17:18:36 +00:00
Reid Kleckner	0828699488	[FastISel] Disable local value sinking by default This is causing compilation timeouts on code with long sequences of local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out that code coverage instrumentation is a great way to create sequences like this, which how our users ran into the issue in practice. Intel has a tool that detects these kinds of non-linear compile time issues, and Andy Kaylor reported it as PR37010. The current sinking code scans the whole basic block once per local value sink, which happens before emitting each call. In theory, local values should only be introduced to be used by instructions between the current flush point and the last flush point, so we should only need to scan those instructions. llvm-svn: 329822	2018-04-11 16:03:07 +00:00
Paul Robinson	0195469a23	[DWARFv5] Fuss with asm syntax for conveying MD5 checksum. Previously the MD5 option of the .file directive provided the checksum as a quoted hex string; now it's a normal hex number with 0x prefix, same as the .octa directive accepts. Differential Revision: https://reviews.llvm.org/D45459 llvm-svn: 329820	2018-04-11 15:14:05 +00:00
Petar Jovanovic	366857a23a	[MIPS GlobalISel] Select add i32, i32 Add the minimal support necessary to lower a function that returns the sum of two i32 values. Support argument/return lowering of i32 values through registers only. Add tablegen for regbankselect and instructionselect. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D44304 llvm-svn: 329819	2018-04-11 15:12:32 +00:00
Yaxun Liu	9381ae9791	[AMDGPU] Fix lowering enqueue_kernel Two issues were fixed: runtime has difficulty to allocate memory for an external symbol of a kernel and set the address of the external symbol, therefore make the runtime handle of an enqueued kernel an ordinary global variable. Runtime only needs to store the address of the loaded kernel to the handle and has verified that this approach works. handle the situation where __enqueue_kernel* gets inlined therefore the enqueued kernel may be used through a constant expr instead of an instruction. Differential Revision: https://reviews.llvm.org/D45187 llvm-svn: 329815	2018-04-11 14:46:15 +00:00
Tim Renouf	8ca33bfcf3	Revert "[AMDGPU] Ensure there are enough registers for wave dispatch" This reverts 329808. That change caused a report of a failure in test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect it is an expensive-check-only error. Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0 llvm-svn: 329811	2018-04-11 14:27:41 +00:00
Tim Renouf	f26b723491	[AMDGPU] Ensure there are enough registers for wave dispatch Summary: This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to allow for registers set up in wave dispatch, even if those registers are not used in the shader. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45503 Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771 llvm-svn: 329808	2018-04-11 14:02:41 +00:00
Simon Pilgrim	89c8a10f7c	[X86] Add variable shuffle schedule classes Split variable index shuffles from immediate index shuffles WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.) WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.) WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.) WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.) Differential Revision: https://reviews.llvm.org/D45404 llvm-svn: 329806	2018-04-11 13:49:19 +00:00
Francis Visoiu Mistrih	7bcb5720fd	[AArch64] Add test case for r329797 Forgot to add a test case in the previous commit. llvm-svn: 329805	2018-04-11 13:37:25 +00:00
Simon Pilgrim	6f97328b1f	[X86][SSE] Tweak cmpps schedule test so that it works properly with just sse1 movhps/movlps test are still broken so we can't disable sse2 yet llvm-svn: 329802	2018-04-11 13:15:36 +00:00
Sjoerd Meijer	ac96d7c4b3	[ARM] FP16 VSEL codegen This is a follow up of rL327695 to instruction select more variants of VSELGT and VSELGE, for which it is necessary to custom lower SELECT. More work is required in this area, which will be addressed soon: - more variants need to be regression tested, but this depends on the next point. - first LowerConstantFP need to be adjusted for fp16 values. Differential Revision: https://reviews.llvm.org/D45205 llvm-svn: 329788	2018-04-11 09:28:04 +00:00
Craig Topper	9507fa358c	[X86] Remove 128/256-bit masked pmaddubsw and pmaddwd intrinsics. Replace 512-bit masked intrinsic with unmasked intrinsic and a select. The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency. llvm-svn: 329774	2018-04-11 04:55:04 +00:00
Craig Topper	ee2c1dea4d	[X86] In X86FlagsCopyLowering, when rewriting a memory setcc we need to emit an explicit MOV8mr instruction. Previously the code only knew how to handle setcc to a register. This should fix a crash in the chromium build. llvm-svn: 329771	2018-04-11 01:09:10 +00:00
Craig Topper	72fa9f12a7	[X86] Switch a test from grep to FileCheck. NFC llvm-svn: 329769	2018-04-11 01:05:32 +00:00
Sriraman Tallam	182f2df7c5	Simplification of libcall like printf->puts must check for RtLibUseGOT metadata. With -fno-plt, for example, calls to printf when getting converted to puts still use the PLT. This patch checks for the metadata "RtLibUseGOT" and annotates the declaration with the right attributes. Differential Revision: https://reviews.llvm.org/D45180 llvm-svn: 329768	2018-04-10 23:32:36 +00:00
Sriraman Tallam	d693093a65	GOTPCREL references must always use RIP. With -fno-plt, global value references can use GOTPCREL and RIP must be used. Differential Revision: https://reviews.llvm.org/D45460 llvm-svn: 329765	2018-04-10 22:50:05 +00:00
Marek Olsak	a9a58fa236	AMDGPU: enable 128-bit for local addr space under an option Author: Samuel Pitoiset ds_read_b128 and ds_write_b128 have been recently enabled under the amdgpu-ds128 option because the performance benefit is unclear. Though, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. Not sure what is broken, but as ds_read_b128/ds_write_b128 are not enabled by default, just introduce a global option and enable 128-bit only if requested (until it's fixed/used correctly). v2: - fix regressions in merge-stores.ll and multiple_tails.ll Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464 llvm-svn: 329764	2018-04-10 22:48:23 +00:00
Galina Kistanova	3dc27f1a69	Disable flaky tests till they get fixed. llvm-svn: 329763	2018-04-10 22:07:29 +00:00
Geoff Berry	5696e075c3	[AArch64][Falkor] Fix bug in Falkor HWPF collision avoidance pass. Summary: When inserting MOVs to avoid Falkor HWPF collisions, the non-base register operand of load instructions (e.g. a register offset) was not being considered live, so it could potentially have been used as a scratch register, clobbering the actual offset value. Reviewers: mcrosier Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45502 llvm-svn: 329761	2018-04-10 21:43:03 +00:00
Steven Wu	d0804aa6dc	[MachO] Emit Weak ReadOnlyWithRel to ConstDataSection Summary: Darwin dynamic linker can handle weak symbols in ConstDataSection. ReadonReadOnlyWithRel symbols should be emitted in ConstDataSection instead of normal DataSection. rdar://problem/39298457 Reviewers: dexonsmith, kledzik Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45472 llvm-svn: 329752	2018-04-10 20:16:35 +00:00
Amara Emerson	e27d5016ef	[AArch64] Fix isel failure when BUILD_PAIR nodes are left over. rdar://39175175 llvm-svn: 329743	2018-04-10 19:01:58 +00:00
Gabor Buella	213edc4a15	[X86] Split up -march=icelake to -client & -server Reviewers: craig.topper, zvi, echristo Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D45055 llvm-svn: 329742	2018-04-10 18:59:13 +00:00
Krzysztof Parzyszek	71a4c0ca07	[CodeGen] Fix printing bundles in MIR output Delay printing the newline until after the opening bracket was printed, e.g. BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 { renamable $r1 = S2_asr_i_r renamable $r1, 1 renamable $r21 = A2_tfrsi 0 } instead of BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 { renamable $r1 = S2_asr_i_r renamable $r1, 1 renamable $r21 = A2_tfrsi 0 } llvm-svn: 329719	2018-04-10 16:46:13 +00:00
Peter Collingbourne	a7d936f0c0	Revert r329611, "AArch64: Allow offsets to be folded into addresses with ELF." Caused a build failure in check-tsan. llvm-svn: 329718	2018-04-10 16:19:30 +00:00
Francis Visoiu Mistrih	f2c22050e8	[AArch64] Use FP to access the emergency spill slot In the presence of variable-sized stack objects, we always picked the base pointer when resolving frame indices if it was available. This makes us hit an assert where we can't reach the emergency spill slot if it's too far away from the base pointer. Since on AArch64 we decide to place the emergency spill slot at the top of the frame, it makes more sense to use FP to access it. The changes here don't affect only emergency spill slots but all the frame indices. The goal here is to try to choose between FP, BP and SP so that we minimize the offset and avoid scavenging, or worse, asserting when trying to access a slot allocated by the scavenger. Previously discussed here: https://reviews.llvm.org/D40876. Differential Revision: https://reviews.llvm.org/D45358 llvm-svn: 329691	2018-04-10 11:29:40 +00:00
Tim Renouf	7190a4692a	[AMDGPU] For OS type AMDPAL, fixed scratch on compute shader Summary: For OS type AMDPAL, the scratch descriptor is loaded from offset 0 of the GIT, whose 32 bit pointer is in s0 (s8 for gfx9 merged shaders). This commit fixes that to use offset 0x10 instead of offset 0 for a compute shader, per the PAL ABI spec. V2: Ensure s0 (s8 for gfx9 merged shader) is marked live-in when loading scratch descriptor from GIT. Reviewers: kzhuravl, nhaehnle, timcorringham Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits, dstuttard, nhaehnle, arsenm Differential Revision: https://reviews.llvm.org/D44468 Change-Id: I93dffa647758e37f613bb5e0dfca840d82e6d26f llvm-svn: 329690	2018-04-10 11:25:15 +00:00
Chandler Carruth	0ca3bd0729	[x86] Model the direction flag (DF) separately from the rest of EFLAGS. This cleans up a number of operations that only claimed te use EFLAGS due to using DF. But no instructions which we think of us setting EFLAGS actually modify DF (other than things like popf) and so this needlessly creates uses of EFLAGS that aren't really there. In fact, DF is so restrictive it is pretty easy to model. Only STD, CLD, and the whole-flags writes (WRFLAGS and POPF) need to model this. I've also somewhat cleaned up some of the flag management instruction definitions to be in the correct .td file. Adding this extra register also uncovered a failure to use the correct datatype to hold X86 registers, and I've corrected that as necessary here. Differential Revision: https://reviews.llvm.org/D45154 llvm-svn: 329673	2018-04-10 06:40:51 +00:00

1 2 3 4 5 ...

24155 Commits