llvm-project

Author	SHA1	Message	Date
Matt Arsenault	b4cbf9862c	AMDGPU/GlobalISel: Select more G_INSERT cases At minimum handle the s64 insert type, which is emitted in real cases during legalization. We really need TableGen to emit something like the inverse of composeSubRegIndices to determine the subreg index to use. llvm-svn: 373938	2019-10-07 18:43:31 +00:00
Matt Arsenault	27269054d2	GlobalISel: Add target pre-isel instructions Allows targets to introduce regbankselectable pseudo-instructions. Currently the closest feature to this is an intrinsic. However this requires creating a public intrinsic declaration. This litters the public intrinsic namespace with operations we don't necessarily want to expose to IR producers, and would rather leave as private to the backend. Use a new instruction bit. A previous attempt tried to keep using enum value ranges, but it turned into a mess. llvm-svn: 373937	2019-10-07 18:43:29 +00:00
Amaury Sechet	a6a70415c8	Regenerate ptr-rotate.ll . NFC llvm-svn: 373908	2019-10-07 14:10:21 +00:00
Simon Atanasyan	55ac745828	[Mips] Always save RA when disabling frame pointer elimination This ensures that frame-based unwinding will continue to work when calling a noreturn function; there is not much use having the caller's frame pointer saved if you don't also have the caller's program counter. Patch by James Clarke. Differential Revision: https://reviews.llvm.org/D68542 llvm-svn: 373907	2019-10-07 14:01:37 +00:00
Kevin P. Neal	1c3d19c82d	[FPEnv] Add constrained intrinsics for lrint and lround Earlier in the year intrinsics for lrint, llrint, lround and llround were added to llvm. The constrained versions are now implemented here. Reviewed by: andrew.w.kaylor, craig.topper, cameron.mcinally Approved by: craig.topper Differential Revision: https://reviews.llvm.org/D64746 llvm-svn: 373900	2019-10-07 13:20:00 +00:00
Jay Foad	301decd93d	[AMDGPU] Fix test checks The GFX10-DENORM-STRICT checks were only passing by accident. Fix them to make the test more robust in the face of scheduling or register allocation changes. llvm-svn: 373893	2019-10-07 10:57:41 +00:00
Craig Topper	6785108356	[X86] Autogenerate checks in leaFixup32.mir and leaFixup64.mir. NFC llvm-svn: 373878	2019-10-07 06:50:56 +00:00
Craig Topper	2c4f078877	[X86] Support LEA64_32r in processInstrForSlow3OpLEA and use INC/DEC when possible. Move the erasing and iterator updating inside to match the other slow LEA function. I've adapted code from optTwoAddrLEA and basically rebuilt the implementation here. We do lose the kill flags now just like optTwoAddrLEA. This runs late enough in the pipeline that shouldn't really be a problem. llvm-svn: 373877	2019-10-07 06:27:55 +00:00
Yi-Hong Lyu	6088f84398	[NFC][CGP] Tests for making ICMP_EQ use CR result of ICMP_S(L|G)T dominators llvm-svn: 373876	2019-10-07 05:29:11 +00:00
Simon Pilgrim	b4ba3cbda0	[X86][AVX] Access a scalar float/double as a free extract from a broadcast load (PR43217) If a fp scalar is loaded and then used as both a scalar and a vector broadcast, perform the load as a broadcast and then extract the scalar for 'free' from the 0th element. This involved switching the order of the X86ISD::BROADCAST combines so we only convert to X86ISD::BROADCAST_LOAD once all other canonicalizations have been attempted. Adds a DAGCombinerInfo::recursivelyDeleteUnusedNodes wrapper. Fixes PR43217 Differential Revision: https://reviews.llvm.org/D68544 llvm-svn: 373871	2019-10-06 21:11:45 +00:00
Craig Topper	570ae49d03	[X86] Add custom type legalization for v16i64->v16i8 truncate and v8i64->v8i8 truncate when v8i64 isn't legal Summary: The default legalization for v16i64->v16i8 tries to create a multiple stage truncate concatenating after each stage and truncating again. But avx512 implements truncates with multiple uops. So it should be better to truncate all the way to the desired element size and then concatenate the pieces using unpckl instructions. This minimizes the number of 2 uop truncates. The unpcks are all single uop instructions. I tried to handle this by just custom splitting the v16i64->v16i8 shuffle. And hoped that the DAG combiner would leave the two halves in the state needed to make D68374 do the job for each half. This worked for the first half, but the second half got messed up. So I've implemented custom handling for v8i64->v8i8 when v8i64 needs to be split to produce the VTRUNCs directly. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68428 llvm-svn: 373864	2019-10-06 18:43:08 +00:00
Craig Topper	842dde6be4	[LegalizeTypes][X86] When splitting a vselect for type legalization, don't split a setcc condition if the setcc input is legal and vXi1 conditions are supported Summary: The VSELECT splitting code tries to split a setcc input as well. But on avx512 where mask registers are well supported it should be better to just split the mask and use a single compare. Reviewers: RKSimon, spatel, efriedma Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68359 llvm-svn: 373863	2019-10-06 18:43:03 +00:00
Sanjay Patel	f643fabb52	Revert [DAGCombine] Match more patterns for half word bswap This reverts r373850 (git commit `25ba49824d`) This patch appears to cause multiple codegen regression test failures - http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10680 llvm-svn: 373853	2019-10-06 15:27:34 +00:00
Amaury Sechet	25ba49824d	[DAGCombine] Match more patterns for half word bswap Summary: It ensures that the bswap is generated even when a part of the subtree already matches a bswap transform. Reviewers: craig.topper, efriedma, RKSimon, lebedev.ri Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68250 llvm-svn: 373850	2019-10-06 14:14:55 +00:00
Simon Pilgrim	032dd9b086	[X86][SSE] matchVectorShuffleAsBlend - use Zeroable element mask directly. We can make use of the Zeroable mask to indicate which elements we can safely set to zero instead of creating a target shuffle mask on the fly. This allows us to remove createTargetShuffleMask. This is part of the work to fix PR43024 and allow us to use SimplifyDemandedElts to simplify shuffle chains - we need to get to a point where the target shuffle mask isn't adjusted by its source inputs in setTargetShuffleZeroElements but instead we cache them in a parallel Zeroable mask. llvm-svn: 373846	2019-10-06 12:38:38 +00:00
David Zarzycki	7653ff398d	[X86] Enable AVX512BW for memcmp() llvm-svn: 373845	2019-10-06 10:25:52 +00:00
Matt Arsenault	c0ec72d4f8	AMDGPU/GlobalISel: RegBankSelect DS GWS intrinsics llvm-svn: 373840	2019-10-06 01:37:38 +00:00
Matt Arsenault	bcd6b1d209	AMDGPU/GlobalISel: Lower G_ATOMIC_CMPXCHG_WITH_SUCCESS llvm-svn: 373839	2019-10-06 01:37:37 +00:00
Matt Arsenault	a5b9c75674	GlobalISel: Partially implement lower for G_EXTRACT Turn into shift and truncate. Doesn't yet handle pointers. llvm-svn: 373838	2019-10-06 01:37:35 +00:00
Matt Arsenault	69c65a8609	AMDGPU/GlobalISel: Fix RegBankSelect for sendmsg intrinsics This wasn't updated for the immarg handling change. llvm-svn: 373837	2019-10-06 01:37:34 +00:00
Craig Topper	2decdf42b9	[FastISel] Copy the inline assembly dialect to the INLINEASM instruction. Fixes PR43575. llvm-svn: 373836	2019-10-05 23:21:17 +00:00
Simon Pilgrim	8815be04ec	[X86][AVX] Push sign extensions of comparison bool results through bitops (PR42025) As discussed on PR42025, with more complex boolean math we can end up with many truncations/extensions of the comparison results through each bitop. This patch handles the cases introduced in combineBitcastvxi1 by pushing the sign extension through the AND/OR/XOR ops so it's just the original SETCC ops that get extended. Differential Revision: https://reviews.llvm.org/D68226 llvm-svn: 373834	2019-10-05 20:49:34 +00:00
David Bolvansky	41c934acaf	[SelectionDAG] Add tests for LKK algorithm Added some tests testing urem and srem operations with a constant divisor. Patch by TG908 (Tim Gymnich) Differential Revision: https://reviews.llvm.org/D68421 llvm-svn: 373830	2019-10-05 14:29:25 +00:00
Philip Reames	d5a4dad206	Fix a nasty miscompile in experimental unordered atomic lowering This is an omission in rL371441. Loads which happened to be unordered weren't being added to the PendingLoad set, and thus weren't being ordered w/respect to side effects which followed before the end of the block. Included test case is how I spotted this. We had an atomic load being folded into a using instruction after a fence that load was supposed to be ordered with. I'm sure it showed up a bunch of other ways as well. Spotted via manual inspection of assembly differences in a corpus w/and w/o the new experimental mode. Finding this with testing would have been "unpleasant". llvm-svn: 373814	2019-10-05 00:32:10 +00:00
Philip Reames	9fe5d730c7	[Test] Add a test case for a missed opportunity in implicit null checking llvm-svn: 373813	2019-10-04 23:46:26 +00:00
Reid Kleckner	67cfa79c01	Revert [CodeGen] Do the Simple Early Return in block-placement pass to optimize the blocks This reverts r371177 (git commit `f879c68755`) It caused PR43566 by removing empty, address-taken MachineBasicBlocks. Such blocks may have references from blockaddress or other operands, and need more consideration to be removed. See the PR for a test case to use when relanding. llvm-svn: 373805	2019-10-04 22:24:21 +00:00
Jessica Paquette	784892c964	[MachineOutliner] Disable outlining from noreturn functions Outlining from noreturn functions doesn't do the correct thing right now. The outliner should respect that the caller is marked noreturn. In the event that we have a noreturn function, and the outlined code is in tail position, the outliner will not see that the outlined function should be tail called. As a result, you end up with a regular call containing a return. Fixing this requires that we check that all candidates live inside noreturn functions. So, for the sake of correctness, don't outline from noreturn functions right now. Add machine-outliner-noreturn.mir to test this. llvm-svn: 373791	2019-10-04 21:24:12 +00:00
Eli Friedman	23ae13d51f	[ScheduleDAG] When a node is cloned, add an edge between the nodes. InstrEmitter's virtual register handling assumes that clones are emitted after the cloned node. Make sure this assumption actually holds. Fixes a "Node emitted out of order - early" assertion on the testcase. This is probably a very rare case to actually hit in practice; even without the explicit edge, the scheduler will usually end up scheduling the nodes in the expected order due to other constraints. Differential Revision: https://reviews.llvm.org/D68068 llvm-svn: 373782	2019-10-04 19:51:40 +00:00
Craig Topper	87aa59a0c7	[X86] Remove isel patterns for mask vpcmpgt/vpcmpeq. Switch vpcmp to these based on the immediate in MCInstLower The immediate form of VPCMP can represent these completely. The vpcmpgt/eq are just shorter encodings. This patch removes the isel patterns and just swaps the opcodes and removes the immediate in MCInstLower. This matches where we do some other encodings tricks. Removes over 10K bytes from the isel table. Differential Revision: https://reviews.llvm.org/D68446 llvm-svn: 373766	2019-10-04 18:02:46 +00:00
Craig Topper	074fa390d2	[X86] Add DAG combine to form saturating VTRUNCUS/VTRUNCS from VTRUNC We already do this for ISD::TRUNCATE, but we can do the same for X86ISD::VTRUNC Differential Revision: https://reviews.llvm.org/D68432 llvm-svn: 373765	2019-10-04 17:53:18 +00:00
Kevin P. Neal	68b8052121	[FPEnv] Strict FP tests should use the requisite function attributes. A set of function attributes is required in any function that uses constrained floating point intrinsics. None of our tests use these attributes. This patch fixes this. These tests have been tested against the IR verifier changes in D68233. Reviewed by: andrew.w.kaylor, cameron.mcinally, uweigand Approved by: andrew.w.kaylor Differential Revision: https://reviews.llvm.org/D67925 llvm-svn: 373761	2019-10-04 17:03:46 +00:00
Tim Northover	a7d90af1be	ARM-Darwin: keep the frame register reserved even if not updated. Darwin platforms need the frame register to always point at a valid record even if it's not updated in a leaf function. Backtraces are more important than one extra GPR. llvm-svn: 373738	2019-10-04 12:29:32 +00:00
Matt Arsenault	d7cad4fb41	AMDGPU/GlobalISel: Fix using wrong addrspace for aperture This was always passing the destination flat address space, when it should be picking between the two valid source options. llvm-svn: 373716	2019-10-04 08:35:38 +00:00
Matt Arsenault	412e0bf8f3	AMDGPU/GlobalISel: Select G_PTRTOINT llvm-svn: 373715	2019-10-04 08:35:37 +00:00
Matt Arsenault	be9521acaa	AMDGPU/GlobalISel: Support wave32 waterfall loops llvm-svn: 373714	2019-10-04 08:35:35 +00:00
David Zarzycki	03b216d854	[X86] Enable inline memcmp() to use AVX512 llvm-svn: 373706	2019-10-04 07:42:34 +00:00
Shiva Chen	ff55e2e047	[RISCV] Split SP adjustment to reduce the offset of callee saved register spill and restore We would like to split the SP adjustment to reduce the instructions in prologue and epilogue as the following case. In this way, the offset of the callee saved register could fit in a single store. add sp,sp,-2032 sw ra,2028(sp) sw s0,2024(sp) sw s1,2020(sp) sw s3,2012(sp) sw s4,2008(sp) add sp,sp,-64 Differential Revision: https://reviews.llvm.org/D68011 llvm-svn: 373688	2019-10-04 02:00:57 +00:00
Sanjay Patel	288079aafd	[DAGCombiner] add operation legality checks before creating shift ops (PR43542) As discussed on llvm-dev and: https://bugs.llvm.org/show_bug.cgi?id=43542 ...we have transforms that assume shift operations are legal and transforms to use them are profitable, but that may not hold for simple targets. In this case, the MSP430 target custom lowers shifts by repeating (many) simpler/fixed ops. That can be avoided by keeping this code as setcc/select. Differential Revision: https://reviews.llvm.org/D68397 llvm-svn: 373666	2019-10-03 21:34:04 +00:00
Philip Reames	82cb5bc302	[Tests] Add a unordered atomic load combine test llvm-svn: 373659	2019-10-03 20:28:59 +00:00
Philip Reames	65d63ac05a	[Test] Fix inconsistency in alignment in test case The IR was using a fixed 8 byte alignment, but the MIR portion was using native alignment. Since the test doesn't appear to be deliberately testing overalignment, just make the IR match the MIR. llvm-svn: 373658	2019-10-03 20:24:18 +00:00
Jinsong Ji	230cf9a360	[AArch64][SVE] Move the testcase into CodeGen dir https://reviews.llvm.org/rL373600 added an AArch64 testcase in top dir which should be moved to Codegen dir. llvm-svn: 373657	2019-10-03 20:21:23 +00:00
Jinsong Ji	4a6881eabc	[PowerPC] Adjust the naming and operand order of fnmsub patterns Summary: This is a follow-up patch to https://reviews.llvm.org/D67595. Adjust naming and the Commutable operands for additional patterns to make it easier to read. The testcase update also shows that we can save some unnecessary fmr as well. Reviewers: #powerpc, steven.zhang, hfinkel, nemanjai Reviewed By: #powerpc, nemanjai Subscribers: wuzish, hiraditya, kbarton, MaskRay, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68112 llvm-svn: 373652	2019-10-03 19:36:42 +00:00
Craig Topper	185ee6ec7c	[X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes. This patch recognizes the shuffle pattern we get from a v8i64->v8i8 truncate when v8i64 isn't a legal type. With VLX we can use two VTRUNCs, unpckldq, and an insert_subvector. Differential Revision: https://reviews.llvm.org/D68374 llvm-svn: 373645	2019-10-03 18:34:42 +00:00
Matt Arsenault	ed77b27441	AMDGPU/GlobalISel: Handle RegBankSelect of G_INSERT_VECTOR_ELT llvm-svn: 373639	2019-10-03 17:59:03 +00:00
Matt Arsenault	233ff982c7	AMDGPU/GlobalISel: Split 64-bit vector extracts during RegBankSelect Register indexing 64-bit elements is possible on the SALU, but not the VALU. Handle splitting this into two 32-bit indexes. Extend waterfall loop handling to allow moving a range of instructions. llvm-svn: 373638	2019-10-03 17:55:27 +00:00
Matt Arsenault	56271fe180	AMDGPU/GlobalISel: Allow VGPR to index SGPR register We can still do a waterfall loop over the index if using a VGPR to index an SGPR. The result will still be a VGPR, but we can avoid the wide copy of the source register to a VGPR. llvm-svn: 373637	2019-10-03 17:50:32 +00:00
Matt Arsenault	9256183994	AMDGPU/GlobalISel: Add some more tests for G_INSERT legalization llvm-svn: 373636	2019-10-03 17:50:31 +00:00
Matt Arsenault	3d23e58dbe	AMDGPU/GlobalISel: Fix mutationIsSane assert v8s8 and This would try to do FewerElements to v9s8 llvm-svn: 373635	2019-10-03 17:50:29 +00:00
James Molloy	9972c992eb	[ModuloSchedule] removeBranch() before creating the trip count condition The Hexagon code assumes there's no existing terminator when inserting its trip count condition check. This causes swp-stages5.ll to break. The generated code looks good to me, it is likely a permutation. I have disabled the new codegen path to keep everything green and will investigate along with the other 3-4 tests that have different codegen. Fixes expensive-checks build. llvm-svn: 373629	2019-10-03 17:10:32 +00:00
Yonghong Song	02ac75092d	[BPF] Handle offset reloc endpoint ending in the middle of chain properly While studying bitfield support, I found an issue with an example like the one in test offset-reloc-middle-chain.ll. struct t1 { int c; }; struct s1 { struct t1 b; }; struct r1 { struct s1 a; }; #define _(x) __builtin_preserve_access_index(x) void test1(void *p1, void *p2, void *p3); void test(struct r1 *arg) { struct s1 *ps = _(&arg->a); struct t1 *pt = _(&arg->a.b); int *pi = _(&arg->a.b.c); test1(ps, pt, pi); } The IR looks like: %0 = llvm.preserve.struct.access(base, ...) %1 = llvm.preserve.struct.access(%0, ...) %2 = llvm.preserve.struct.access(%1, ...) using %0, %1 and %2 In this case, we need to generate three relocations corresponding to chains: (%0), (%0, %1) and (%0, %1, %2). After collecting all the chains, the current implementation processes each chain (in a map) with code generation sequentially. For example, after (%0) is processed, the code may look like: %0 = base + special_global_variable // llvm.preserve.struct.access(base, ...) is delisted // from the instruction stream. %1 = llvm.preserve.struct.access(%0, ...) %2 = llvm.preserve.struct.access(%1, ...) using %0, %1 and %2 When processing chain (%0, %1), the current implementation tries to visit intrinsic llvm.preserve.struct.access(base, ...) to get some of its properties and this caused a segfault. This patch fixed the issue by remembering all necessary information (kind, metadata, access_index, base) during analysis phase, so in code generation phase there is no need to examine the intrinsic call instructions. This also simplifies the code. Differential Revision: https://reviews.llvm.org/D68389 llvm-svn: 373621	2019-10-03 16:30:29 +00:00
Sanjay Patel	38c265fe26	[MSP430] add tests for unwanted shift codegen; NFC (PR43542) llvm-svn: 373607	2019-10-03 14:54:03 +00:00
Simon Atanasyan	afe7197f13	[mips] Use llvm-readobj `-A` flag in test cases. NFC llvm-svn: 373589	2019-10-03 12:08:04 +00:00
Sander de Smalen	4f99b6f0fe	[AArch64] Static (de)allocation of SVE stack objects. Adds support to AArch64FrameLowering to allocate fixed-stack SVE objects. The focus of this patch is purely to allow the stack frame to allocate/deallocate space for scalable SVE objects. More dynamic allocation (at compile-time, i.e. determining placement of SVE objects on the stack), or resolving frame-index references that include scalable-sized offsets, are left for subsequent patches. SVE objects are allocated in the stack frame as a separate region below the callee-save area, and above the alignment gap. This is done so that the SVE objects can be accessed directly from the FP at (runtime) VL-based offsets to benefit from using the VL-scaled addressing modes. The layout looks as follows: +-------------+ | stack arg | +-------------+ | Callee Saves| | X29, X30 | (if available) |-------------| <- FP (if available) | : | | SVE area | | : | +-------------+ |/////////////| alignment gap. | : | | Stack objs | | : | +-------------+ <- SP after call and frame-setup SVE and non-SVE stack objects are distinguished using different StackIDs. The offsets for objects with TargetStackID::SVEVector should be interpreted as purely scalable offsets within their respective SVE region. Reviewers: thegameg, rovka, t.p.northover, efriedma, rengolin, greened Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61437 llvm-svn: 373585	2019-10-03 11:33:50 +00:00
Craig Topper	3a6950d3f0	[X86] Add test case for v8i64->v8i8 truncate with avx512 and prefer-vector-width/min-legal-vector-width=256. NFC With vpmovqb, we should be able to do better here until we get AVX512VBMI on Cannonlake/Icelake. llvm-svn: 373569	2019-10-03 06:18:45 +00:00
Matt Arsenault	1c135a39aa	AMDGPU/GlobalISel: Expand G_BITCAST legality llvm-svn: 373567	2019-10-03 05:46:08 +00:00
Craig Topper	eb420aa379	[X86] Add DAG combine to turn (bitcast (vbroadcast_load)) into just a vbroadcast_load if the scalar size is the same. This improves broadcast load folding of i64 elements on 32-bit targets where i64 isn't legal. Previously we had to represent these as vXf64 vbroadcast_loads and a bitcast to vXi64. But we didn't have any isel patterns looking for that. This also allows us to remove or simplify some isel patterns that were looking for bitcasted vbroadcast_loads. llvm-svn: 373566	2019-10-03 05:30:02 +00:00
Craig Topper	f849f41469	[X86] Add broadcast load folding patterns to NoVLX VPMULLQ/VPMAXSQ/VPMAXUQ/VPMINSQ/VPMINUQ patterns. More fixes for PR36191. llvm-svn: 373560	2019-10-03 03:16:27 +00:00
Stanislav Mekhanoshin	1384c3a5b8	[AMDGPU] Fix illegal agpr use by VALU When SIFixSGPRCopies attempts to fix an illegal copy from a vector to a scalar register it calls moveToVALU(). A copy from an agpr to sgpr becomes a copy from agpr to agpr, which may result in an illegal register class at a use of this copy. The solution is to always copy it into a vgpr. This may result in a subsequent copy into an agpr if that is what is really needed, but this should not happen too often and will likely be folded later. The opposite situation cannot happen because an sgpr is always illegal where an agpr is legal, so such user instructions cannot exist. Differential Revision: https://reviews.llvm.org/D68358 llvm-svn: 373544	2019-10-02 23:23:46 +00:00
Craig Topper	f5bda7fe24	[X86] Add test cases for suboptimal vselect+setcc splitting. If the vselect result type needs to be split, it will also try to split the condition if it happens to be a setcc. With avx512 where k-registers are legal, it's probably better to just use a kshift to split the mask register. llvm-svn: 373536	2019-10-02 22:35:03 +00:00
Yi-Hong Lyu	c7be067974	[PowerPC] Fix SH field overflow issue Store rlwinm Rx, Ry, 32, 0, 31 as rlwinm Rx, Ry, 0, 0, 31 and store rldicl Rx, Ry, 64, 0 as rldicl Rx, Ry, 0, 0. Otherwise the SH field overflows and fails an assertion in the assembly printing stage. Differential Revision: https://reviews.llvm.org/D66991 llvm-svn: 373519	2019-10-02 20:25:16 +00:00
Craig Topper	74c7d6be28	[X86] Rewrite the vXi1 subvector insertion code to not rely on the value of bits that might be undef The previous code tried to do a trick where we would extract the subvector from the location we were inserting. Then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed no other bits of the original vector would be modified. Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easily readable to see what's happening. This patch uses a safer, but more costly approach. It isolates the bits above the insertion point and the bits below the insertion point and ORs those together leaving 0 for the insertion location. Then widens the subvector with 0s in the upper bits, shifts it into position with 0s in the lower bits. Then we do another OR. Differential Revision: https://reviews.llvm.org/D68311 llvm-svn: 373495	2019-10-02 17:47:09 +00:00
Thomas Lively	5b74c39d72	[WebAssembly] Error when using wasm64 for ISel Summary: 64-bit WebAssembly (wasm64) is not specified and not supported in the WebAssembly backend. We do have support for it in clang, however, and we would like to keep that support because we expect wasm64 to be specified and supported in the future. For now add an error when trying to use wasm64 from the backend to minimize user confusion from unexplained crashes. Reviewers: aheejin, dschuff, sunfish Subscribers: sbc100, jgravelle-google, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68254 llvm-svn: 373493	2019-10-02 17:34:44 +00:00
Piotr Sobczak	265e94e657	[AMDGPU] Extend buffer intrinsics with swizzling Summary: Extend cachepolicy operand in the new VMEM buffer intrinsics to supply information whether the buffer data is swizzled. Also, propagate this information to MIR. Intrinsics updated: int_amdgcn_raw_buffer_load int_amdgcn_raw_buffer_load_format int_amdgcn_raw_buffer_store int_amdgcn_raw_buffer_store_format int_amdgcn_raw_tbuffer_load int_amdgcn_raw_tbuffer_store int_amdgcn_struct_buffer_load int_amdgcn_struct_buffer_load_format int_amdgcn_struct_buffer_store int_amdgcn_struct_buffer_store_format int_amdgcn_struct_tbuffer_load int_amdgcn_struct_tbuffer_store Furthermore, disable merging of VMEM buffer instructions in SI Load/Store optimizer, if the "swizzled" bit on the instruction is on. The default value of the bit is 0, meaning that data in buffer is linear and buffer instructions can be merged. There is no difference in the generated code with this commit. However, in the future it will be expected that front-ends use buffer intrinsics with correct "swizzled" bit set. Reviewers: arsenm, nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, arphaman, jfb, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68200 llvm-svn: 373491	2019-10-02 17:22:36 +00:00
Hans Wennborg	9330005a54	Reapply r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)" This was reverted in r373454 due to breaking the expensive-checks bot. This version addresses that by omitting the addSuccessorWithProb() call when omitting the range check. > Switch lowering: omit range check for bit tests when default is unreachable (PR43129) > > This is modeled after the same functionality for jump tables, which was > added in r357067. > > Differential revision: https://reviews.llvm.org/D68131 llvm-svn: 373477	2019-10-02 14:35:06 +00:00
Kerry McLaughlin	822b298958	[AArch64][SVE] Implement int_aarch64_sve_cnt intrinsic Summary: This patch includes tests for the VecOfBitcastsToInt type added by D68021 Reviewers: c-rhodes, sdesmalen, rovka Reviewed By: c-rhodes Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits, cfe-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68023 llvm-svn: 373468	2019-10-02 13:09:54 +00:00
James Molloy	9026518e73	[ModuloSchedule] Peel out prologs and epilogs, generate actual code Summary: This extends the PeelingModuloScheduleExpander to generate prolog and epilog code, and correctly stitch uses through the prolog, kernel, epilog DAG. The key concept in this patch is to ensure that all transforms are local; only a function of a block and its immediate predecessor and successor. By defining the problem in this way we can inductively rewrite the entire DAG using only local knowledge that is easy to reason about. For example, we assume that all prologs and epilogs are near-perfect clones of the steady-state kernel. This means that if a block has an instruction that is predicated out, we can redirect all users of that instruction to that equivalent instruction in our immediate predecessor. As all blocks are clones, every instruction must have an equivalent in every other block. Similarly we can make the assumption by construction that if a value defined in a block is used outside that block, the only possible user is its immediate successors. We maintain this even for values that are used outside the loop by creating a limited form of LCSSA. This code isn't small, but it isn't complex. Enabled a bunch of testing from Hexagon. There are a couple of tests not enabled yet; I'm about 80% sure there isn't buggy codegen but the tests are checking for patterns that we don't produce. Those still need a bit more investigation. In the meantime we (Google) are happy with the code produced by this on our downstream SMS implementation, and believe it generates correct code. Subscribers: mgorny, hiraditya, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68205 llvm-svn: 373462	2019-10-02 12:46:44 +00:00
Hans Wennborg	372aece777	Revert r373431 "Switch lowering: omit range check for bit tests when default is unreachable (PR43129)" This broke http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/19967 > Switch lowering: omit range check for bit tests when default is unreachable (PR43129) > > This is modeled after the same functionality for jump tables, which was > added in r357067. > > Differential revision: https://reviews.llvm.org/D68131 llvm-svn: 373454	2019-10-02 12:08:44 +00:00
David Green	c9b5ab8b1c	[ARM] Identity shuffles are legal Identity shuffles, of the form (0, 1, 2, 3, ...) are perfectly OK under MVE (they essentially just become bitcasts). We were not catching that in the existing set of what we considered legal though. On NEON, they would be covered by vext's, but that is not generally available in MVE. This uses ShuffleVectorInst::isIdentityMask which is a little odd to use here but does what we want and prevents us from just rewriting what is the same function. Differential Revision: https://reviews.llvm.org/D68241 llvm-svn: 373446	2019-10-02 11:40:51 +00:00
Hans Wennborg	cbefc36fcc	Switch lowering: omit range check for bit tests when default is unreachable (PR43129) This is modeled after the same functionality for jump tables, which was added in r357067. Differential revision: https://reviews.llvm.org/D68131 llvm-svn: 373431	2019-10-02 08:32:15 +00:00
Craig Topper	8d6a863b02	[X86] Add broadcast load folding patterns to the NoVLX compare patterns. These patterns use zmm registers for 128/256-bit compares when the VLX instructions aren't available. Previously we only supported registers, but as PR36191 notes we can fold broadcast loads, but not regular loads. llvm-svn: 373423	2019-10-02 04:45:02 +00:00
Matt Arsenault	cdfe5efe9b	AMDGPU/GlobalISel: Assume VGPR for G_FRAME_INDEX In principle this should behave as any other constant. However eliminateFrameIndex currently assumes a VALU use and uses a vector shift. Work around this by selecting to VGPR for now until eliminateFrameIndex is fixed. llvm-svn: 373415	2019-10-02 01:02:24 +00:00
Matt Arsenault	bfce0c2664	AMDGPU/GlobalISel: Private loads always use VGPRs llvm-svn: 373414	2019-10-02 01:02:21 +00:00
Matt Arsenault	05aa8a733e	AMDGPU/GlobalISel: Legalize 1024-bit G_BUILD_VECTOR This will be needed to support AGPR operations. llvm-svn: 373413	2019-10-02 01:02:18 +00:00
Matt Arsenault	3a657afb3a	AMDGPU/GlobalISel: Fix RegBankSelect for 1024-bit values llvm-svn: 373412	2019-10-02 01:02:14 +00:00
Stanislav Mekhanoshin	075bc48a7f	[AMDGPU] separate accounting for agprs Account and report agprs separately on gfx908. Other targets do not change the reporting. Differential Revision: https://reviews.llvm.org/D68307 llvm-svn: 373411	2019-10-02 00:26:58 +00:00
Craig Topper	8c19925f42	[X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32 The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, a v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to the i32 we can avoid the split. It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use v2i32 index stored in a v4i32 register with v2i64 data size. I've limited this to before legalize types to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts to be sure the truncate will constant fold. Differential Revision: https://reviews.llvm.org/D68247 llvm-svn: 373408	2019-10-01 23:18:31 +00:00
Changpeng Fang	e4ee28d14c	AMDGPU: Fix an out of date assert in addressing FrameIndex Reviewers: arsenm Differential Revision: https://reviews.llvm.org/D67574 llvm-svn: 373404	2019-10-01 23:07:14 +00:00
Craig Topper	0da163a2cf	Revert r373172 "[X86] Add custom isel logic to match VPTERNLOG from 2 logic ops." This seems to be causing some performance regressions that I'm trying to investigate. One thing that stands out is that this transform can increase the live range of the operands of the earlier logic op. This can be bad for register allocation. If there are two logic op inputs we should really combine the one that is closest, but SelectionDAG doesn't have a good way to do that. Maybe we need to do this as a basic block transform in Machine IR. llvm-svn: 373401	2019-10-01 22:40:03 +00:00
Craig Topper	912870573c	[X86] convertToThreeAddress, make sure second operand of SUB32ri is really an immediate before calling getImm(). It might be a symbol instead. We can't fold those since we can't negate them. Similar for other SUB with immediates. Fixes PR43529. llvm-svn: 373397	2019-10-01 21:55:55 +00:00
Sanjay Patel	9738fd6387	[BypassSlowDivision][CodeGenPrepare] avoid crashing on unused code (PR43514) https://bugs.llvm.org/show_bug.cgi?id=43514 llvm-svn: 373394	2019-10-01 21:25:36 +00:00
Jakub Kuderski	7ed4fb389b	Add a missing pass in ARM O3 pipeline llvm-svn: 373382	2019-10-01 18:53:54 +00:00
Jakub Kuderski	856c1cd852	[Dominators][CodeGen] Don't mark MachineDominatorTree as preserved in MachineLICM llvm-svn: 373378	2019-10-01 18:27:44 +00:00
David Green	a3ebcfe5a6	[ARM] Some MVE shuffle plus extend tests. NFC llvm-svn: 373368	2019-10-01 18:04:02 +00:00
Matt Arsenault	9dba603748	AMDGPU/GlobalISel: Increase max legal size to 1024 There are 1024 bit register classes defined for AGPRs. Additionally OpenCL defines vectors up to 16 x i64, and this helps those tests legalize. llvm-svn: 373350	2019-10-01 16:35:06 +00:00
Craig Topper	105e82edde	[X86] Add a VBROADCAST_LOAD ISD opcode representing a scalar load broadcasted to a vector. Summary: This adds the ISD opcode and a DAG combine to create it. There are probably some places where we can directly create it, but I'll leave that for future work. This updates all of the isel patterns to look for this new node. I had to add a few additional isel patterns for aligned extloads which we should probably fix with a DAG combine or something. This does mean that the broadcast load folding for avx512 can no longer match a broadcasted aligned extload. There's still some work to do here for combining a broadcast of a broadcast_load. We also need to improve extractelement or demanded vector elements of a broadcast_load. I'll try to get those done before I submit this patch. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68198 llvm-svn: 373349	2019-10-01 16:28:20 +00:00
Jakub Kuderski	56b52a207f	[Dominators][CodeGen] Add MachinePostDominatorTree verification Summary: This patch implements Machine PostDominator Tree verification and ensures that the verification doesn't fail the in-tree tests. MPDT verification can be enabled using `verify-machine-dom-info` -- the same flag used by Machine Dominator Tree verification. Flipping the flag revealed that MachineSink falsely claimed to preserve CFG and MDT/MPDT. This patch fixes that. Reviewers: arsenm, hliao, rampitec, vpykhtin, grosser Reviewed By: hliao Subscribers: wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68235 llvm-svn: 373341	2019-10-01 15:23:27 +00:00
Sam Parker	ef7990a88a	[NFC][ARM][MVE] More tests Add some tail predication tests with fast math. llvm-svn: 373331	2019-10-01 13:02:14 +00:00
Dmitri Gribenko	827a7fab78	Revert "GlobalISel: Handle llvm.read_register" This reverts commit r373294. It broke Clang's CodeGen/arm64-microsoft-status-reg.cpp: http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/18483 llvm-svn: 373310	2019-10-01 08:24:01 +00:00
Craig Topper	220cf53540	[X86] Consider isCodeGenOnly in the EVEX2VEX pass to make VMAXPD/PS map to the non-commutable VEX instruction. Use EVEX2VEX override to fix the scalar instructions. Previously the match was ambiguous and VMAXPS/PD and VMAXCPS/PD were mapped to the same VEX instruction. But we should keep the commutableness when changing the opcode. llvm-svn: 373303	2019-10-01 07:10:09 +00:00
Heejin Ahn	e2bcab6100	[WebAssembly] Make sure EH pads are preferred in sorting Summary: In CFGSort, we try to make EH pads have higher priorities as soon as they are ready to be sorted, to prevent creation of unwind destination mismatches in CFGStackify. We did that by making priority queues' comparison function prefer EH pads, but it was possible for an EH pad to be popped from `Preferred` queue and then not sorted immediately and enter `Ready` queue instead in a certain condition. This patch makes sure that special condition does not consider EH pads as its candidates. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68229 llvm-svn: 373302	2019-10-01 06:53:28 +00:00
Heejin Ahn	61d5c76a18	[WebAssembly] Unstackify regs after fixing unwinding mismatches Summary: Fixing unwind mismatches for exception handling can result in splicing existing BBs and moving some of instructions to new BBs. In this case some of stackified def registers in the original BB can be used in the split BB. For example, we have this BB and suppose %r0 is a stackified register. ``` bb.1: %r0 = call @foo ... use %r0 ... ``` After fixing unwind mismatches in CFGStackify, `bb.1` can be split and some instructions can be moved to a newly created BB: ``` bb.1: %r0 = call @foo bb.split (new): ... use %r0 ... ``` In this case we should make %r0 un-stackified, because its use is now in another BB. When splitting a BB, this CL unstackifies all def registers that have uses in the new split BB. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68218 llvm-svn: 373301	2019-10-01 06:21:53 +00:00
Matt Arsenault	fdea5e02ce	AMDGPU/GlobalISel: Select s1 src G_SITOFP/G_UITOFP llvm-svn: 373298	2019-10-01 02:23:20 +00:00
Matt Arsenault	59b91aa93e	AMDGPU/GlobalISel: Add support for init.exec intrinsics The existing wave32 behavior seems broken and incomplete, but this reproduces it. llvm-svn: 373296	2019-10-01 02:07:25 +00:00
Matt Arsenault	bdcc6d3d26	GlobalISel: Handle llvm.read_register SelectionDAG has a bunch of machinery to defer this to selection time for some reason. Just directly emit a copy during IRTranslator. The x86 usage does somewhat questionably check hasFP, which could depend on the whole function being at minimum translated. This does lose the convergent bit if the callsite had it, which may be a problem. We also lose that in general for intrinsics, which may also be a problem. llvm-svn: 373294	2019-10-01 02:07:16 +00:00
Matt Arsenault	8f6bdb7668	AMDGPU/GlobalISel: Avoid creating shift of 0 in arg lowering This is sort of papering over the fact that we don't run a combiner anywhere, but avoiding creating 2 instructions in the first place is easy. llvm-svn: 373293	2019-10-01 01:44:46 +00:00
Craig Topper	5dc49a8374	[X86] Add test case to show missed opportunity to shrink a constant index to a gather in order to avoid splitting. Also add a test case for an index that could be shrunk, but would create a narrow type. We can go ahead and do it we just need to be before type legalization. Similar test cases for scatter as well. llvm-svn: 373290	2019-10-01 01:27:52 +00:00
Matt Arsenault	54167ea316	AMDGPU/GlobalISel: Select G_UADDO/G_USUBO llvm-svn: 373288	2019-10-01 01:23:13 +00:00
Matt Arsenault	ed85b0cee6	GlobalISel: Implement widenScalar for G_SITOFP/G_UITOFP sources Legalize 16-bit G_SITOFP/G_UITOFP for AMDGPU. llvm-svn: 373287	2019-10-01 01:06:48 +00:00
Matt Arsenault	77ac400117	AMDGPU/GlobalISel: Legalize G_GLOBAL_VALUE Handle other cases besides LDS. Mostly a straight port of the existing handling, without the intermediate custom nodes. llvm-svn: 373286	2019-10-01 01:06:43 +00:00
Amaury Sechet	d60c297d1d	Add partial bswap test to the X86 backend. NFC llvm-svn: 373271	2019-09-30 22:52:28 +00:00

30985 Commits