llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	8b9c4f2855	[X86][AVX] Cleanup subvector broadcast tests - remove old prefixes. llvm-svn: 310265	2017-08-07 15:50:43 +00:00
Sanjay Patel	807f92b8ff	[x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097) llvm-svn: 310264	2017-08-07 15:47:48 +00:00
Matt Arsenault	8728c5f2db	AMDGPU: Cleanup subtarget features Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258	2017-08-07 14:58:04 +00:00
Nirav Dave	3d3bde7682	[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector. Relanding after case to insert explicit truncation as necessary. Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally improves vector shuffle computations. Reviewers: efriedma, RKSimon, spatel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35566 llvm-svn: 310256	2017-08-07 14:07:49 +00:00
Michael Zuckerman	680ac10aa7	[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4). This patch expands the support of lowerInterleavedStore to 16x8i stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future. The patch goal is to optimize the following sequence: At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding each 16 chars: c0, c1, , c16 m0, m1, , m16 y0, y1, , y16 k0, k1, ., k16 And these need to be transposed/interleaved and stored like so: c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 .... Differential Revision: https://reviews.llvm.org/D35829 llvm-svn: 310252	2017-08-07 13:22:39 +00:00
Simon Pilgrim	e2bb83a4ee	[X86][AVX] Added test for broadcast shuffle with undefs (PR34041) llvm-svn: 310249	2017-08-07 12:24:33 +00:00
Guy Blank	5ca01695f7	[SelectionDAG] reset NewNodesMustHaveLegalTypes flag between basic blocks The NewNodesMustHaveLegalTypes flag is set to false at the beginning of CodeGenAndEmitDAG, and set to true after legalizing types. But before calling CodeGenAndEmitDAG we build the DAG for the basic block. So for the first basic block NewNodesMustHaveLegalTypes would be 'false' during the SDAG building, and for all other basic blocks it would be 'true'. This patch sets the flag to false before SDAG building each basic block. Differential Revision: https://reviews.llvm.org/D33435 llvm-svn: 310239	2017-08-07 05:51:14 +00:00
Sanjay Patel	a923c2ee95	[x86] use more shift or LEA for select-of-constants We can convert any select-of-constants to math ops: http://rise4fun.com/Alive/d7d For this patch, I'm enhancing an existing x86 transform that uses fake multiplies (they always become shl/lea) to avoid cmov or branching. The current code misses cases where we have a negative constant and a positive constant, so this is just trying to plug that hole. The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start with a select in IR, create a select DAG node, convert it into a sext, convert it back into a select, and then lower it to sext machine code. Some notes about the test diffs: 1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR. 2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a post-DAG problem though. 3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if that's a regression, but I think those would always be nearly equivalent. 4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs. 5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0. 6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again. 7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops. Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass. Differential Revision: https://reviews.llvm.org/D35340 llvm-svn: 310208	2017-08-06 16:27:07 +00:00
Simon Pilgrim	aeb2b1e36d	[X86][X87] Regenerate inline-asm tests llvm-svn: 310201	2017-08-06 12:17:10 +00:00
Simon Pilgrim	5489781cc7	[X86][X87] Add test case for PR34080 Test with/without the sandybridge (default) model for SSE2, SSE3 and AVX targets. pre-SSE3 the issue is the order of the fpsw and fpcw load/stores (with SSE3 trunc-store FIST instructions avoid the sw/cw manipulations). llvm-svn: 310198	2017-08-06 11:22:33 +00:00
Craig Topper	6bfa2aee78	[X86] Enable isel to use the PAUSE instruction even when SSE2 is disabled Summary: On older processors this instruction encoding is treated as a NOP. MSVC doesn't disable intrinsics based on features the way clang/gcc does. Because the PAUSE instruction encoding doesn't crash older processors, some software out there uses these intrinsics without checking for SSE2. This change also seems to also be consistent with gcc behavior. Fixes PR34079 Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36361 llvm-svn: 310190	2017-08-05 23:34:44 +00:00
Matt Arsenault	b94972cb82	IPRA: Don't crash on null getCallPreservedMask Kernels aren't callable, so they don't have a call preserved mask. llvm-svn: 310172	2017-08-05 07:50:18 +00:00
Joel Jones	60711ca253	[AArch64] LSE Atomics reorg - part 1 Add memory synchronization semantics to LSE Atomics. The memory semantics feature will be added in a subsequent patch. In this patch, several corrections were added to the existing LSE Atomics implementation, based on the ARM Errata D11904 from 05/12/2017. Patch by: steleman Differential Revision: https://reviews.llvm.org/D35319 llvm-svn: 310167	2017-08-05 04:30:55 +00:00
Reid Kleckner	61776e6910	Commit the local change I had to make my test pass llvm-svn: 310153	2017-08-05 00:15:40 +00:00
Reid Kleckner	7662d50d10	[X86] Teach fastisel to select calls to dllimport functions Summary: Direct calls to dllimport functions are very common Windows. We should add them to the -O0 fast path. Reviewers: rafael Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D36197 llvm-svn: 310152	2017-08-05 00:10:43 +00:00
Craig Topper	4a8a0a2a88	[X86] Regenerate the fsin/fcos instruction test using update_llc_test_checks.py. NFC This looks to have been converted from a grep based test at some point in a really strange way. llvm-svn: 310150	2017-08-04 23:36:03 +00:00
Nico Weber	b24df62bb6	Revert r310058, it caused PR34073. llvm-svn: 310118	2017-08-04 20:24:13 +00:00
Ulrich Weigand	a11f63a952	[SystemZ] Add support for 128-bit atomic load/store/cmpxchg This adds support for the main 128-bit atomic operations, using the SystemZ instructions LPQ, STPQ, and CDSG. Generating these instructions is a bit more complex than usual since the i128 type is not legal for the back-end. Therefore, we have to hook the LowerOperationWrapper and ReplaceNodeResults TargetLowering callbacks. llvm-svn: 310094	2017-08-04 18:57:58 +00:00
Ulrich Weigand	02f1c02c27	[SystemZ] Eliminate unnecessary serialization operations We currently emit a serialization operation (bcr 14, 0) before every atomic load and after every atomic store. This is overly conservative. The SystemZ architecture actually does not require any serialization for atomic loads, and a serialization after an atomic store only if we need to enforce sequential consistency. This is what other compilers for the platform implement as well. llvm-svn: 310093	2017-08-04 18:53:35 +00:00
Connor Abbott	66b9bd6e50	[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 llvm-svn: 310088	2017-08-04 18:36:54 +00:00
Connor Abbott	92638ab625	[AMDGPU] Add support for Whole Wavefront Mode Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 llvm-svn: 310087	2017-08-04 18:36:52 +00:00
Connor Abbott	8c217d0a29	[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085	2017-08-04 18:36:49 +00:00
Chad Rosier	14fc82a1df	[AArch64] Fix an assertion for pre-index generation with unscaled loads/stores. Differential Revision: https://reviews.llvm.org/D36248 PR34035 llvm-svn: 310066	2017-08-04 16:44:06 +00:00
Simon Pilgrim	5c63586489	[DAGCombiner] Extending pattern detection for vector shuffle. If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index. Committed on behalf of @jbhateja (Jatin Bhateja) Differential Revision: https://reviews.llvm.org/D35788 llvm-svn: 310058	2017-08-04 12:46:35 +00:00
Zoran Jovanovic	1c17001235	[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP Author: milena.vujosevic.janicic The patch extends size reduction pass for MicroMIPS. The following instructions are examined and transformed, if possible: ADDIU instruction is transformed into 16-bit instruction ADDIUSP ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP Usage of u_int64_t replaced by uint64_t to avoid issues because of which previous patch version was reverted: Differential Revision: https://reviews.llvm.org/D34511 llvm-svn: 310044	2017-08-04 10:18:44 +00:00
Stanislav Mekhanoshin	6c7a8d0b5f	[AMDGPU] Preserve inverted bit in SI_IF in presence of SI_KILL In case if SI_KILL is in between of the SI_IF and SI_END_CF we need to preserve the bits actually flipped by if rather then restoring the original mask. Differential Revision: https://reviews.llvm.org/D36299 llvm-svn: 310031	2017-08-04 06:58:42 +00:00
Connor Abbott	00755362b9	[AMDGPU] Add missing hazard for DPP-after-EXEC-write Summary: Following the docs, we need at least 5 wait states between an EXEC write and an instruction that uses DPP. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34849 llvm-svn: 310013	2017-08-04 01:09:43 +00:00
Matt Arsenault	a176cc5b93	AMDGPU: Don't use report_fatal_error for unsupported call types llvm-svn: 310004	2017-08-03 23:32:41 +00:00
Matt Arsenault	a202538bfa	AMDGPU: Remove error on calls for amdgcn Repurpose the -amdgpu-function-calls flag. Rather than require it to emit a call, only use it to run the always inline path or not. llvm-svn: 310003	2017-08-03 23:24:05 +00:00
Matt Arsenault	817c253e60	AMDGPU: Fix implicitarg.ptr handling special inputs llvm-svn: 310002	2017-08-03 23:12:44 +00:00
Matt Arsenault	8623e8d864	AMDGPU: Pass special input registers to functions llvm-svn: 309998	2017-08-03 23:00:29 +00:00
Taewook Oh	424c88c381	Move unit test to the proper location Summary: Move test/CodeGen/AArch64/reg-bank-128bit.mir to test/CodeGen/AArch64/GlobalISel/reg-bank-128bit.mir so that the test is executed only when global-isel is enabled. lit.local.cfg under test/CodeGen/AArch64/GlobalISel checks if 'global-isel' is in the available_features while the same file under test/CodeGen/AArch64 doesn't. Reviewers: qcolombet, davide Reviewed By: davide Subscribers: davide, aemerson, javed.absar, igorb, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D36282 llvm-svn: 309985	2017-08-03 21:07:12 +00:00
Connor Abbott	82267a553d	test commit llvm-svn: 309981	2017-08-03 20:22:30 +00:00
Simon Pilgrim	99d0ab38bc	[X86] Adding a test for vector shuffle extractions. When both the vector inputs of the shuffle vector is comprising of same vector or shuffle mask is accessing elements from only one operand vector (like in PR33758 test already present). Committed on behalf of @jbhateja (Jatin Bhateja) Differential Revision: https://reviews.llvm.org/D36271 llvm-svn: 309963	2017-08-03 17:04:59 +00:00
Simon Pilgrim	0dfd4fa8d3	[X86][AVX512] Tidied up v64i8 vector shuffle tests with triple llvm-svn: 309961	2017-08-03 16:56:52 +00:00
Changpeng Fang	ef4dbb46da	AMDGPU/SI: Don't fix a PHI under uniform branch in SIFixSGPRCopies only when sources and destination are all sgprs Summary: If a PHI has at lease one VGPR operand, we have to fix the PHI in SIFixSGPRCopies. Reviewer: Matt Differential Revision: http://reviews.llvm.org/D34727 llvm-svn: 309959	2017-08-03 16:37:02 +00:00
Nirav Dave	3fc1c2365c	[DAG] Allow merging of stores of vector loads Remove restriction disallowing merging of stores vector loads into larger store of larger vector load. Reviewers: RKSimon, efriedma, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36158 llvm-svn: 309951	2017-08-03 15:51:20 +00:00
Nico Weber	bfde70b097	Revert r309923, it caused PR34045. llvm-svn: 309950	2017-08-03 15:41:26 +00:00
Simon Dardis	51296593a8	[SelectionDAG] Resolve PR33978. rL306209 taught SelectionDAG how to add the dereferenceable flag when expanding memcpy and memmove. The fix however contained a nit where the offset + size was constructed as an APInt of PointerSize rather than PointerSizeInBits. This lead to isDereferenceableAndAlignedPointer() get truncated values or values which would be sign extended within that function leading to incorrect results. Thanks to Alex Crichton for reporting the issue! This resolves PR33978. Reviewers: inouehrs Differential Revision: https://reviews.llvm.org/D36236 llvm-svn: 309930	2017-08-03 09:38:46 +00:00
Diana Picus	930e6ec8f3	[ARM] GlobalISel: Select simple G_GLOBAL_VALUE instructions Add support in the instruction selector for G_GLOBAL_VALUE for ELF and MachO for the static relocation model. We don't handle Windows yet because that's Thumb-only, and we don't handle Thumb in general at the moment. Support for PIC, ROPI, RWPI and TLS will be added in subsequent commits. Differential Revision: https://reviews.llvm.org/D35883 llvm-svn: 309927	2017-08-03 09:14:59 +00:00
Dinar Temirbulatov	a0beedef1c	[X86] SET0 to use XMM registers where possible PR26018 PR32862 Differential Revision: https://reviews.llvm.org/D35965 llvm-svn: 309926	2017-08-03 08:50:18 +00:00
Roger Ferrer Ibanez	854980341b	[ARM] Use ADDCARRY / SUBCARRY This patch: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) <- (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) <- (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) -> C Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 309923	2017-08-03 07:45:10 +00:00
Rafael Espindola	79e238afee	Delete Default and JITDefault code models IMHO it is an antipattern to have a enum value that is Default. At any given piece of code it is not clear if we have to handle Default or if has already been mapped to a concrete value. In this case in particular, only the target can do the mapping and it is nice to make sure it is always done. This deletes the two default enum values of CodeModel and uses an explicit Optional<CodeModel> when it is possible that it is unspecified. llvm-svn: 309911	2017-08-03 02:16:21 +00:00
Tom Stellard	3337d74399	AMDGPU/GlobalISel: Mark 32-bit G_FMUL as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D36218 llvm-svn: 309898	2017-08-02 22:56:30 +00:00
Stefan Pintilie	873889ca16	[Power9] Exploit vector absolute difference instructions on Power 9 Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW) for byte, halfword and word. We should take advantage of these. Differential Revision: https://reviews.llvm.org/D34684 llvm-svn: 309876	2017-08-02 20:07:21 +00:00
Evandro Menezes	3840db527b	[AArch64] Add Exynos M2 feature test (NFC) Test fusion of AES operations. llvm-svn: 309855	2017-08-02 18:55:34 +00:00
Evandro Menezes	89a67b8353	[AArch64] Improve the test of conditional branch fusion Separate the checking of the fused pairings with B.cc and CBcc. llvm-svn: 309825	2017-08-02 15:34:06 +00:00
Diana Picus	d5a00b0ff6	[MIR] Print target-specific constant pools This should enable us to test the generation of target-specific constant pools, e.g. for ARM: constants: - id: 0 value: 'g(GOT_PREL)-(LPC0+8-.)' alignment: 4 isTargetSpecific: true I intend to use this to test PIC support in GlobalISel for ARM. This is difficult to test outside of that context, since the existing MIR tests usually rely on parser support as well, and that seems a bit trickier to add. We could try to add a unit test, but the setup for that seems rather convoluted and overkill. We do test however that the parser reports a nice error when encountering a target-specific constant pool. Differential Revision: https://reviews.llvm.org/D36092 llvm-svn: 309806	2017-08-02 11:09:30 +00:00
Matt Arsenault	8e8f8f43b0	AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it llvm-svn: 309783	2017-08-02 01:52:45 +00:00
Matt Arsenault	1d6317c3ad	AMDGPU: Fix emitting encoded calls This was failing on out of bounds access to the extra operands on the s_swappc_b64 beyond those in the instruction definition. This was working, but somehow regressed within the past few weeks, although I don't see any obvious commit. llvm-svn: 309782	2017-08-02 01:42:04 +00:00
Matt Arsenault	6ed7b9bfc0	AMDGPU: Analyze callee resource usage in AsmPrinter llvm-svn: 309781	2017-08-02 01:31:28 +00:00
Matt Arsenault	d1867c0345	AMDGPU: Don't place arguments in emergency stack slot When finding the fixed offsets for function arguments, this needs to skip over the 4 bytes reserved for the emergency stack slot. llvm-svn: 309776	2017-08-02 00:59:51 +00:00
Matt Arsenault	acc5e82b0e	DAG: Undo and->or combine with FrameIndexes This pattern shows up when lowering byval copies on AMDGPU. The byval object access is split into 4-byte chunks, adding a constant offset to the FixedStack base. When some of the offsets turn into ors, this prevents combining the constant offsets. This makes it not apparent that the object is there when matching addressing modes, so it ends up using a scratch wave offset relative access and the lengthy frame index expansion for that. llvm-svn: 309775	2017-08-02 00:43:42 +00:00
Matthias Braun	6b898beb8e	X86: Do not use llc -march in tests. `llc -march` is problematic because it only switches the target architecture, but leaves the operating system unchanged. This occasionally leads to indeterministic tests because the OS from LLVM_DEFAULT_TARGET_TRIPLE is used. However we can simply always use `llc -mtriple` instead. This changes all the tests to do this to avoid people using -march when they copy and paste parts of tests. See also the discussion in https://reviews.llvm.org/D35287 llvm-svn: 309774	2017-08-02 00:28:10 +00:00
Stanislav Mekhanoshin	da0edef1bd	[AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused With SI_END_CF elimination for some nested control flow we can now eliminate saved exec register completely by turning a saveexec version of instruction into just a logical instruction. Differential Revision: https://reviews.llvm.org/D36007 llvm-svn: 309766	2017-08-01 23:44:35 +00:00
Stanislav Mekhanoshin	37e7f959c0	[AMDGPU] Collapse adjacent SI_END_CF Add a pass to remove redundant S_OR_B64 instructions enabling lanes in the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any vector instructions between them we can only keep outer SI_END_CF, given that CFG is structured and exec bits of the outer end statement are always not less than exec bit of the inner one. This needs to be done before the RA to eliminate saved exec bits registers but after register coalescer to have no vector registers copies in between of different end cf statements. Differential Revision: https://reviews.llvm.org/D35967 llvm-svn: 309762	2017-08-01 23:14:32 +00:00
Matthias Braun	7c91336efe	ARM: Do not use llc -march in tests. `llc -march` is problematic because it only switches the target architecture, but leaves the operating system unchanged. This occasionally leads to indeterministic tests because the OS from LLVM_DEFAULT_TARGET_TRIPLE is used. However we can simply always use `llc -mtriple` instead. This changes all the tests to do this to avoid people using -march when they copy and paste parts of tests. See also the discussion in https://reviews.llvm.org/D35287 llvm-svn: 309755	2017-08-01 22:20:49 +00:00
Matthias Braun	e2d2ce9ff1	PowerPC: Do not use llc -march in tests. `llc -march` is problematic because it only switches the target architecture, but leaves the operating system unchanged. This occasionally leads to indeterministic tests because the OS from LLVM_DEFAULT_TARGET_TRIPLE is used. However we can simply always use `llc -mtriple` instead. This changes all the tests to do this to avoid people using -march when they copy and paste parts of tests. This patch: - Removes -march if the .ll file already has a matching `target triple` directive or -mtriple argument. - In all other cases changes -march=ppc32/-march=ppc64 to -mtriple=ppc32--/-mtriple=ppc64-- See also the discussion in https://reviews.llvm.org/D35287 llvm-svn: 309754	2017-08-01 22:20:41 +00:00
Adrian Prantl	032d2381bf	Remove PrologEpilogInserter's usage of DBG_VALUE's offset field In the last half-dozen commits to LLVM I removed code that became dead after removing the offset parameter from llvm.dbg.value gradually proceeding from IR towards the backend. Before I can move on to DwarfDebug and friends there is one last side-called offset I need to remove: This patch modifies PrologEpilogInserter's use of the DBG_VALUE's offset argument to use a DIExpression instead. Because the PrologEpilogInserter runs at the Machine level I had to play a little trick with a named llvm.dbg.mir node to get the DIExpressions to print in MIR dumps (which print the llvm::Module followed by the MachineFunction dump). I also had to add rudimentary DwarfExpression support to CodeView and as a side-effect also fixed a bug (CodeViewDebug::collectVariableInfo was supposed to give up on variables with complex DIExpressions, but would fail to do so for fragments, which are also modeled as DIExpressions). With this last holdover removed we will have only one canonical way of representing offsets to debug locations which will simplify the code in DwarfDebug (and future versions of CodeViewDebug once it starts handling more complex expressions) and make it easier to reason about. This patch is NFC-ish: All test case changes are for assembler comments and the binary output does not change. rdar://problem/33580047 Differential Revision: https://reviews.llvm.org/D36125 llvm-svn: 309751	2017-08-01 21:45:24 +00:00
Martin Storsjo	eacf4e408b	[AArch64] Rewrite stack frame handling for win64 vararg functions The previous attempt, which made do with a single offset in computeCalleeSaveRegisterPairs, wasn't quite enough. The previous attempt only worked as long as CombineSPBump == true (since the offset would be adjusted later in fixupCalleeSaveRestoreStackOffset). Instead include the size for the fixed stack area used for win64 varargs in calculations in emitPrologue/emitEpilogue. The stack consists of mainly three parts; - AFI->getLocalStackSize() - AFI->getCalleeSavedStackSize() - FixedObject Most of the places in the code which previously used the CSStackSize now use PrologueSaveSize instead, which is the sum of the latter two, while some cases which need exactly the middle one use AFI->getCalleeSavedStackSize() explicitly instead of a local variable. In addition to moving the offsetting into emitPrologue/emitEpilogue (which fixes functions with CombineSPBump == false), also set the frame pointer to point to the right location, where the frame pointer and link register actually are stored. In addition to the prologue/epilogue, this also requires changes to resolveFrameIndexReference. Add tests for a function that keeps a frame pointer and another one that uses a VLA. Differential Revision: https://reviews.llvm.org/D35919 llvm-svn: 309744	2017-08-01 21:13:54 +00:00
Matt Arsenault	206f826348	AMDGPU: Fix handling of div_scale with undef inputs The src0 register must match src1 or src2, but if these were undefined they could end up using different implicit_defed virtual registers. Force these to use one undef vreg or pick the defined other register. Also fixes producing invalid nodes without the right number of inputs when src2 is undef. llvm-svn: 309743	2017-08-01 20:49:41 +00:00
Matt Arsenault	a5fcb83bfa	AMDGPU: Add test for r308774 llvm-svn: 309733	2017-08-01 19:54:58 +00:00
Matt Arsenault	b62a4eb524	AMDGPU: Initial implementation of calls Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. llvm-svn: 309732	2017-08-01 19:54:18 +00:00
Simon Pilgrim	f0a5d09b8d	[X86][SSE3] Add scheduler tests for MONITOR/MWAIT llvm-svn: 309718	2017-08-01 18:16:44 +00:00
Nirav Dave	27a6605bdc	Revert "[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector." This reverts commit r309680 which appears to be raising an assertion in the test-suite. llvm-svn: 309717	2017-08-01 18:09:25 +00:00
Simon Pilgrim	486072d3d6	[X86][SSE] Added missing vector logic intrinsic schedules Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431). Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them. llvm-svn: 309715	2017-08-01 17:51:20 +00:00
Sanjay Patel	4dbdd470a8	[CGP] use narrower types in memcmp expansion when possible This only affects very small memcmp on x86 for now, but it will become more important if we allow vector-sized load and compares. llvm-svn: 309711	2017-08-01 17:24:54 +00:00
Craig Topper	2a5bba7325	[X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36129 llvm-svn: 309706	2017-08-01 17:18:14 +00:00
Simon Pilgrim	3f24ff6130	[X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431). Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class llvm-svn: 309701	2017-08-01 16:47:48 +00:00
Craig Topper	e2cc589283	[X86] Split bmi.ll into a bmi test and a bmi2 test. This moves all the bmi2 specific intrinsics to a separate test file and adds a bmi1 only command line to the existing bmi test. This will allow us to see the missed opportunity to use bextr to handle 64-bit 'and' with a large mask. This will be improved in an upcoming patch. llvm-svn: 309700	2017-08-01 16:45:11 +00:00
Simon Pilgrim	810677eba2	[X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules llvm-svn: 309699	2017-08-01 16:18:25 +00:00
Manoj Gupta	7ebbe655ed	[X86] Fix a crash in FEntryInserter Pass. Summary: FEntryInserter pass unconditionally derefs the first Instruction in the first Basic Block. The pass crashes when the first BasicBlock is empty. Fix the crash by not dereferencing the basic Block iterator. This fixes an issue observed when building Linux kernel 4.4 with clang. Fixes PR33971. Reviewers: hfinkel, niravd, dblaikie Reviewed By: niravd Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D35979 llvm-svn: 309694	2017-08-01 15:39:12 +00:00
Craig Topper	2462a713ae	[AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32. We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though. This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need. This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch. Differential Revision: https://reviews.llvm.org/D35978 llvm-svn: 309693	2017-08-01 15:31:24 +00:00
Simon Pilgrim	78b305f8d5	[X86][SSSE3] Fix typos in pabsw/pmulhrsw tests for load folding scheduling. llvm-svn: 309692	2017-08-01 15:31:24 +00:00
Simon Pilgrim	8484698321	[X86] Added missing cpu to fix generic scheduling model tests llvm-svn: 309691	2017-08-01 15:14:35 +00:00
Nirav Dave	b5cb48c6ae	[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector. Summary: Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally improves vector shuffle computations. Reviewers: efriedma, RKSimon, spatel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35566 llvm-svn: 309680	2017-08-01 13:45:35 +00:00
Strahinja Petrovic	a2b4748bdc	[Mips] Fix for BBIT octeon instruction This patch enables control flow optimization for variations of BBIT instruction. In this case optimization removes unnecessary branch after BBIT instruction. Differential Revision: https://reviews.llvm.org/D35359 llvm-svn: 309679	2017-08-01 13:42:45 +00:00
Krzysztof Parzyszek	91ff5c6d47	[Hexagon] Convert HVX vector constants of i1 to i8 Certain operations require vector of i1 values. However, for Hexagon architecture compatibility, they need to be represented as vector of i8. Patch by Suyog Sarda. llvm-svn: 309677	2017-08-01 13:12:53 +00:00
Simon Pilgrim	4e0b41450d	[X86] Regenerate big structure return test and check on x86_64 as well. llvm-svn: 309676	2017-08-01 13:12:15 +00:00
Tom Stellard	9d8337d857	AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35916 llvm-svn: 309675	2017-08-01 12:38:33 +00:00
Andrew V. Tischenko	d56595184b	Support itineraries in TargetSubtargetInfo::getSchedInfoStr - Now if the given instr does not have sched model then we try to calculate the latecy/throughput with help of itineraries. Differential Revision https://reviews.llvm.org/D35997 llvm-svn: 309666	2017-08-01 09:15:43 +00:00
Eli Friedman	37f41d1e7c	[ScheduleDAG] Don't schedule node with physical register interference https://reviews.llvm.org/D31536 didn't really solve the problem it was trying to solve; it got rid of the assertion failure, but we were still scheduling the DAG incorrectly (mixing together instructions from different calls), leading to a MachineVerifier failure. In order to schedule the DAG correctly, we have to make sure we don't schedule a node which should be blocked by an interference. Fix ScheduleDAGRRList::PickNodeToScheduleBottomUp so it doesn't pick a node like that. The added call to FindAvailableNode() is the key change here; this makes sure we don't try to schedule a call while we're in the middle of scheduling a different call. I'm not sure this is the right approach; in particular, I'm not sure how to prove we don't end up with an infinite loop of repeatedly backtracking. This also reverts the code change from D31536. It doesn't do anything useful: we should never schedule an ADJCALLSTACKDOWN unless we've already scheduled the corresponding ADJCALLSTACKUP. Differential Revision: https://reviews.llvm.org/D33818 llvm-svn: 309642	2017-08-01 00:28:40 +00:00
Craig Topper	410d252f5b	[AVX-512] Add unmasked subvector inserts and extract to the execution domain tables. llvm-svn: 309632	2017-07-31 22:07:29 +00:00
Craig Topper	7dd26785e7	[AVX512] Add a common prefix to avx512-insert-extract.ll so we can reduce the number of check lines on some test cases. This was pointed out during the review for D313804. llvm-svn: 309629	2017-07-31 21:20:06 +00:00
Craig Topper	043c44efff	[AVX-512] Use AVX512 as test check prefix instead of AVX3. NFC llvm-svn: 309625	2017-07-31 20:58:06 +00:00
Konstantin Belochapka	b77d0a5cf1	[X86][MMX] Added custom lowering action for MMX SELECT (PR30418) Fix for pr30418 - error in backend: Cannot select: t17: x86mmx = select_cc t2, Constant:i64<0>, t7, t8, seteq:ch Differential Revision: https://reviews.llvm.org/D34661 llvm-svn: 309614	2017-07-31 20:11:49 +00:00
Quentin Colombet	15f6ffbf7c	[TargetPassConfig] Feature generic options to setup start/stop-after/before This patch refactors the code used in llc such that all the users of the addPassesToEmitFile API have access to a homogeneous way of handling start/stop-after/before options right out of the box. In particular, just invoking addPassesToEmitFile will set the proper pipeline without additional effort (modulo parsing a .mir file if the start-before/after options are used. NFC. Differential Revision: https://reviews.llvm.org/D30913 llvm-svn: 309599	2017-07-31 18:24:07 +00:00
Sanjay Patel	fea731a4aa	[CGP] use subtract or subtract-of-cmps for result of memcmp expansion As noted in the code comment, transforming this in the other direction might require a separate transform here in CGP given the block-at-a-time DAG constraint. Besides that theoretical motivation, there are 2 practical motivations for the subtract-of-cmps form: 1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still). There is discussion about canonicalizing IR to the select form ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), so we probably need to add DAG transforms for those patterns anyway, but this improves the memcmp output without waiting for that step. 2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided if we choose to enable that. Differential Revision: https://reviews.llvm.org/D34904 llvm-svn: 309597	2017-07-31 18:08:24 +00:00
Craig Topper	cb0e74975a	[AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead. These were taking priority over the aligned load instructions since there is no vmovda8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use. But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output has avx/avx2 in some cases. I also generally dislike patterns rooted in a bitcast which these were. Differential Revision: https://reviews.llvm.org/D35977 llvm-svn: 309589	2017-07-31 17:35:44 +00:00
Aditya Nandakumar	02c602e18c	[GISel]: Support Widening G_ICMP's destination operand. Updated AArch64 to widen destination to s32. https://reviews.llvm.org/D35737 Reviewed by Tim llvm-svn: 309579	2017-07-31 17:00:16 +00:00
Simon Pilgrim	11ea6fdcd7	[X86] Extending a test cases for LEA factorization. Submitted on the behalf of Jatin Bhateja Differential Revision: https://reviews.llvm.org/D36048 llvm-svn: 309565	2017-07-31 14:23:28 +00:00
Simon Dardis	8930b4a049	[SelectionDAG][mips] Fix PR33883 PR33883 shows that calls to intrinsic functions should not have their vector arguments or returns subject to ABI changes required by the target. This resolves PR33883. Thanks to Alex Crichton for reporting the issue! Reviewers: zoran.jovanovic, atanasyan Differential Revision: https://reviews.llvm.org/D35765 llvm-svn: 309561	2017-07-31 14:06:58 +00:00
Guy Blank	b169d56dc3	[X86][AVX512] Add masked MOVS[S\|D] patterns Added patterns to recognize AND 1 on the mask of a scalar masked move is not needed since only the lower bit is relevant for the instruction. Differential Revision: https://reviews.llvm.org/D35897 llvm-svn: 309546	2017-07-31 08:26:14 +00:00
Craig Topper	97e9fa7954	[X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a load involved. We already had a pattern without load, but with a load we were falling back to a regular 'and' due to pattern complexity priority. llvm-svn: 309535	2017-07-31 05:55:54 +00:00
Michael Zuckerman	11148120d0	Expanding the test case for vf8 for stride 4 interleaved. llvm-svn: 309511	2017-07-30 11:54:57 +00:00
Florian Hahn	f63a5e91db	[AArch64] Tie source and destination operands for AESMC/AESIMC. Summary: Most CPUs implementing AES fusion require instruction pairs of the form AESE Vn, _ AESMC Vn, Vn and AESD Vn, _ AESIMC Vn, Vn The constraint is added to AES(I)MC instructions which use the result of an AES(E\|D) instruction by using AES(I)MCTrr pseudo instructions, which constraint source and destination registers to be the same. A nice side effect of this change is that now all possible pairs are scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll test case. I had to update aes_load_store. The version I added initially was very reduced and with the new constraint, AESE/AESMC could not be scheduled back-to-back. I updated the test to be more realistic and still expose the same scheduling problem as the initial test case. Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga Reviewed By: t.p.northover, evandro Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35299 llvm-svn: 309495	2017-07-29 20:35:28 +00:00
Florian Hahn	2f86e3d494	[AArch64] Use 8 bytes as preferred function alignment on Cortex-A53. Summary: This change gives a 0.25% speedup on execution time, a 0.82% improvement in benchmark scores and a 0.20% increase in binary size on a Cortex-A53. These numbers are the geomean results on a wide range of benchmarks from the test-suite and a range of proprietary suites. Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin Reviewed By: rengolin Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35568 llvm-svn: 309494	2017-07-29 20:04:54 +00:00
Simon Pilgrim	718cb0ea62	[SelectionDAG][X86] CombineBT - more aggressively determine demanded bits This patch is in 2 parts: 1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT. 2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match. Differential Revision: https://reviews.llvm.org/D35896 llvm-svn: 309486	2017-07-29 14:50:25 +00:00
Matt Arsenault	9608a2891d	AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat Checking the encoding is insufficient since now there can be global or scratch instructions. llvm-svn: 309472	2017-07-29 01:26:21 +00:00
Matt Arsenault	dc8f5cc39c	AMDGPU: Teach isLegalAddressingMode about global_* instructions Also refine the flat check to respect flat-for-global feature, and constant fallback should check global handling, not specifically MUBUF. llvm-svn: 309471	2017-07-29 01:12:31 +00:00

1 2 3 4 5 ...

21168 Commits