llvm-project

Commit Graph

Author	SHA1	Message	Date
Scott Linder	35213793bc	[CodeGen] Fix assert in SelectionDAG::computeKnownBits Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR when zero extending the demanded elements mask if it is already as long as the source vector. Differential Revision: https://reviews.llvm.org/D49574 llvm-svn: 339600	2018-08-13 18:44:21 +00:00
Matt Arsenault	3763f307bd	AMDGPU: Cleanup min/max legacy tests Also add some more tests in preparation for a future patch. llvm-svn: 339526	2018-08-12 19:29:53 +00:00
Matt Arsenault	1201301b94	DAG: Check no-signed-zeros instead of unsafe-fp-math Addresses fixme, although this should still be checking individual operand flags. llvm-svn: 339525	2018-08-12 19:09:12 +00:00
Matt Arsenault	13b0db9285	AMDGPU: Check NSZ MI flag when folding omod I'm not sure the exact nsz flag combination that is OK. I think as long as it's on either, this is OK. For now just check it on the omod multiply. llvm-svn: 339513	2018-08-12 08:44:25 +00:00
Matt Arsenault	b5acec1f79	AMDGPU: Use splat vectors for undefs when folding canonicalize If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component. llvm-svn: 339512	2018-08-12 08:42:54 +00:00
Matt Arsenault	3ead7d7389	AMDGPU: Fix packing undef parts of build_vector llvm-svn: 339511	2018-08-12 08:42:46 +00:00
Tom Stellard	8adc86a7dc	AMDGPU/GlobalISel: Define instruction mapping for G_INSERT Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49625 llvm-svn: 339491	2018-08-11 00:51:54 +00:00
Matt Arsenault	940e6075e4	AMDGPU: More canonicalized operations llvm-svn: 339464	2018-08-10 19:20:17 +00:00
Matt Arsenault	3dcf4ce435	AMDGPU: Combine and of seto/setuo and fp_class Clear the nan (or non-nan) test bits from the mask. llvm-svn: 339462	2018-08-10 18:58:56 +00:00
Matt Arsenault	8ad00d30fa	AMDGPU: Match isfinite pattern to class instructions llvm-svn: 339460	2018-08-10 18:58:41 +00:00
Matt Arsenault	935f3b70fe	AMDGPU: Error more gracefully on libcalls I think this is the only situation where the callsite will have a null instruction. llvm-svn: 339271	2018-08-08 16:58:39 +00:00
Matt Arsenault	e719139b10	AMDGPU: Fix shifts for i128 llvm-svn: 339270	2018-08-08 16:58:33 +00:00
Jan Vesely	7b2c98ab59	AMDGPU: Remove broken i16 ternary patterns Fixup test to check for GCN prefix These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence: v_mad_i16 v2, v10, v9, v11 v_med3_i32 v2, v2, v8, v7 Fixes mad_sat(char) piglit on VI. Differential Revision: https://reviews.llvm.org/D49836 llvm-svn: 339190	2018-08-07 21:54:37 +00:00
Matt Arsenault	08f3fe4fae	AMDGPU: cvt_pk_rtz_f16 canonicalizes llvm-svn: 339078	2018-08-06 23:01:31 +00:00
Matt Arsenault	e94ee833f9	AMDGPU: Handle some vector operations in isCanonicalized llvm-svn: 339077	2018-08-06 22:45:51 +00:00
Matt Arsenault	a29e76244a	AMDGPU: Push fcanonicalize through partially constant build_vector This usually avoids some re-packing code, and may help find canonical sources. llvm-svn: 339072	2018-08-06 22:30:44 +00:00
Matt Arsenault	d49ab0b214	AMDGPU: Treat more custom operations as canonicalizing Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct. There are still more operations that need to be handled. llvm-svn: 339065	2018-08-06 21:58:11 +00:00
Matt Arsenault	ce6d61fba8	AMDGPU: Conversions always produce canonical results Not sure why this was checking for denormals for f16. My interpretation of the IEEE standard is conversions should produce a canonical result, and the ISA manual says denormals are created when appropriate. llvm-svn: 339064	2018-08-06 21:51:52 +00:00
Matt Arsenault	f8768bfc84	AMDGPU: Fix implementation of isCanonicalized If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign. The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result. llvm-svn: 339061	2018-08-06 21:38:27 +00:00
Matt Arsenault	0d1b3934e2	AMDGPU: Fold v_lshl_or_b32 with 0 src0 Appears from expansion of some packed cases. llvm-svn: 339025	2018-08-06 15:40:20 +00:00
Matt Arsenault	dbf77c5b41	AMDGPU: Rename check prefixes in test Will avoid noisy diff in future change. llvm-svn: 339022	2018-08-06 15:16:12 +00:00
Matt Arsenault	c3dc8e65e2	DAG: Enhance isKnownNeverNaN Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910	2018-08-03 18:27:52 +00:00
Tim Renouf	abd85fb1f5	[AMDGPU] Reworked SIFixWWMLiveness Summary: I encountered some problems with SIFixWWMLiveness when WWM is in a loop: 1. It sometimes gave invalid MIR where there is some control flow path to the new implicit use of a register on EXIT_WWM that does not pass through any def. 2. There were lots of false positives of registers that needed to have an implicit use added to EXIT_WWM. 3. Adding an implicit use to EXIT_WWM (and adding an implicit def just before the WWM code, which I tried in order to fix (1)) caused lots of the values to be spilled and reloaded unnecessarily. This commit is a rework of SIFixWWMLiveness, with the following changes: 1. Instead of considering any register with a def that can reach the WWM code and a def that can be reached from the WWM code, it now considers three specific cases that need to be handled. 2. A register that needs liveness over WWM to be synthesized now has it done by adding itself as an implicit use to defs other than the dominant one. Also added the following fixmes: FIXME: We should detect whether a register in one of the above categories is already live at the WWM code before deciding to add the implicit uses to synthesize its liveness. FIXME: I believe this whole scheme may be flawed due to the possibility of the register allocator doing live interval splitting. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46756 Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94 llvm-svn: 338783	2018-08-02 23:31:32 +00:00
Tim Renouf	f1c7b92a6a	[AMDGPU] Avoid using divergent value in mubuf addr64 descriptor Summary: This fixes a problem where a load from global+idx generated incorrect code on <=gfx7 when the index is divergent. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47383 Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed llvm-svn: 338779	2018-08-02 22:53:57 +00:00
Matt Arsenault	1f3977a856	DAG: Fix vector widening fcanonicalize llvm-svn: 338715	2018-08-02 13:43:53 +00:00
Matt Arsenault	36cdcfadcf	AMDGPU: Fix scalarizing v4f16 fcanonicalize llvm-svn: 338714	2018-08-02 13:43:42 +00:00
Matt Arsenault	709374d186	AMDGPU: Improve hack for packing conversion ops Mutate the node type during selection when it doesn't matter. This avoids an intermediate bitcast node on targets with legal i16/f16. Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16, which I assume are OK. llvm-svn: 338619	2018-08-01 20:13:58 +00:00
Matt Arsenault	55ab9213d3	AMDGPU: Partially fix handling of packed amdgpu_ps arguments Fixes annoying limitations when writing tests. Also remove more leftover code for manually scalarizing arguments and return values. llvm-svn: 338618	2018-08-01 19:57:34 +00:00
Jan Vesely	93b252799b	AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8) llvm-svn: 338610	2018-08-01 18:36:07 +00:00
Ryan Taylor	894c8fd0e2	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523	2018-08-01 12:12:01 +00:00
Konstantin Zhuravlyov	bb30ef7af4	AMDGPU: Add clamp bit to dot intrinsics Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470	2018-08-01 01:31:30 +00:00
Matt Arsenault	118c47b6d1	AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests R600 breaks on too many things to usefully test changes with ieee_mode on vs. off. llvm-svn: 338435	2018-07-31 20:38:42 +00:00
Matt Arsenault	feedabfde7	AMDGPU: Break 64-bit arguments into 32-bit pieces llvm-svn: 338421	2018-07-31 19:29:04 +00:00
Matt Arsenault	0395da7842	AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418	2018-07-31 19:17:47 +00:00
Matt Arsenault	9ced1e0d80	AMDGPU: Scalarize vector argument types to calls When lowering calling conventions, prefer to decompose vectors into the constituent register types. This avoids artificial constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416	2018-07-31 19:05:14 +00:00
Matt Arsenault	a5ed032118	DAG: Fix PromoteFloatResult for fcanonicalize llvm-svn: 338382	2018-07-31 14:15:22 +00:00
Matt Arsenault	4aec86d37a	AMDGPU: Fold undef fcanonicalize to qNaN We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376	2018-07-31 13:34:31 +00:00
Matt Arsenault	c1335eaf7e	AMDGPU: Fix test check line bugs llvm-svn: 338374	2018-07-31 13:25:23 +00:00
Matt Arsenault	de496c32a4	AMDGPU: Reduce code size with fcanonicalize (fneg x) When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifiers. llvm-svn: 338244	2018-07-30 12:16:58 +00:00
Matt Arsenault	f3c9a34def	AMDGPU: Make fneg combine handle fcanonicalize llvm-svn: 338243	2018-07-30 12:16:47 +00:00
Nicolai Haehnle	7f0d05d532	AMDGPU: Force skip over s_sendmsg and exp instructions Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235	2018-07-30 09:23:59 +00:00
Matt Arsenault	8f9dde94b7	AMDGPU: Stop wasting argument registers with v3i32/v3f32 SelectionDAGBuilder widens v3i32/v3f32 arguments to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197	2018-07-28 14:11:34 +00:00
Matt Arsenault	72b0e38b26	AMDGPU: Stop trying to extend arguments for clover This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193	2018-07-28 12:34:25 +00:00
Jan Vesely	6ff58ed5ca	AMDGPU/R600: Add MOV instructions to BFE patterns R600 can't handle immediates for BFE, these will be eliminated later. Fixes powr/pow regressions on r600 since r334817 Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127	2018-07-27 15:00:13 +00:00
Matt Arsenault	0183c56c11	AMDGPU: Fix code size for return_to_epilog pseudo llvm-svn: 338113	2018-07-27 09:15:03 +00:00
Tom Stellard	e9bdc5f1d8	AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49624 llvm-svn: 338102	2018-07-27 06:04:40 +00:00
Scott Linder	eb1f75d561	[AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory. Differential Revision: https://reviews.llvm.org/D49448 llvm-svn: 338060	2018-07-26 19:47:51 +00:00
Stanislav Mekhanoshin	7e7268ac1c	[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion Differential Revision: https://reviews.llvm.org/D49761 llvm-svn: 337938	2018-07-25 17:02:11 +00:00
Tom Stellard	b7f19e6d1e	AMDGPU/GlobalISel: Legalize G_INSERT Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798	2018-07-24 02:19:20 +00:00
Matt Arsenault	4bec7d4261	Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering" Reverts r337079 with fix for msan error. llvm-svn: 337535	2018-07-20 09:05:08 +00:00

1740 Commits
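
Note on eb1f75d561 (VGPR spill offsets): the commit message describes scaling a spill offset by the wave size when it does not fit in the 12-bit immediate field and is instead added to the soffset SGPR. The following is a minimal standalone sketch of that decision, not the code from the patch. The names SpillOffsetSplit, fitsInImm12 and splitSpillOffset are hypothetical, and the sketch assumes the whole offset moves to the SGPR side when it does not fit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: how a spill offset is divided between the
// instruction's immediate field and the value added to the soffset SGPR.
struct SpillOffsetSplit {
  uint32_t ImmOffset;   // goes into the 12-bit offset immediate
  uint64_t SOffsetAdd;  // added to the soffset SGPR (already scaled)
};

// True if the offset can be encoded in an unsigned 12-bit immediate.
constexpr bool fitsInImm12(uint32_t Offset) {
  return Offset < (1u << 12);
}

// Assumption: offsets that fit stay in the immediate unscaled; offsets that
// do not fit are moved entirely to the SGPR and multiplied by the wave size,
// since (per the commit message) scratch memory is swizzled by the hardware.
SpillOffsetSplit splitSpillOffset(uint32_t Offset, uint32_t WaveSize) {
  if (fitsInImm12(Offset))
    return {Offset, 0};
  return {0, static_cast<uint64_t>(Offset) * WaveSize};
}
```

With a wave size of 64, an offset of 100 stays in the immediate, while an offset of 4096 leaves the immediate at 0 and contributes 4096 * 64 to the SGPR.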