llvm-project

Commit Graph

Author	SHA1	Message	Date
Chad Rosier	1bbd7fb38e	[AArch64] Add support for generating pre- and post-index load/store pairs. llvm-svn: 248593	2015-09-25 17:48:17 +00:00
Matt Arsenault	0a10900070	AMDGPU: Disable some passes that are not meaningful Don't run passes related to stack maps, garbage collection, exceptions since these aren't useful for GPUs. There might be a few more to turn off that I'm less sure about (e.g. ShrinkWrapping) or I'm not sure how to disable (SafeStack and StackProtector) llvm-svn: 248591	2015-09-25 17:41:20 +00:00
Matt Arsenault	4bf43d4e68	AMDGPU: Handle i64->v2i32 loads/stores in PreprocessISelDAG This fixes a select error when the i64 source was also bitcasted to v2i32 in the original source. Instead of awkwardly trying to select the modified source value and the store, replace before isel begins. Uses a worklist to avoid possible problems from mutating the DAG, although it seems to work OK without it. llvm-svn: 248589	2015-09-25 17:27:08 +00:00
Matt Arsenault	0cb8517dc6	AMDGPU: Fix recomputing dominator tree unnecessarily SIFixSGPRCopies does not modify the CFG, but this was being recomputed before running SIFoldOperands. llvm-svn: 248587	2015-09-25 17:21:28 +00:00
Matt Arsenault	2d6fdb8495	AMDGPU: Re-justify workaround and fix worked around problem When buffer resource descriptors were built, the upper two components of the descriptor were first composed into a 64-bit register because legalizeOperands assumed all operands had the same register class. Fix that problem, but keep the workaround. I'm not sure anything is actually emitting such a REG_SEQUENCE now. If multiple resource descriptors are set up with different base pointers, this is copied with a single s_mov_b64. We probably should fix this better by recognizing a pair of s_mov_b32 later, but for now delete the dead code. llvm-svn: 248585	2015-09-25 17:08:42 +00:00
Matt Arsenault	3ad55ec946	AMDGPU: Don't create REG_SEQUENCE with SGPR dest and VGPR sources This avoids needing to re-legalize the new REG_SEQUENCE. llvm-svn: 248584	2015-09-25 17:08:40 +00:00
Matt Arsenault	6525aa3529	AMDGPU: Fix not adding exec to defs of cmpx instruction pseudos This was only set on the final _si/_vi version, but not on the pseudos most of codegen sees. No test since these instructions aren't used yet. llvm-svn: 248583	2015-09-25 16:58:27 +00:00
Matt Arsenault	5f70436c49	AMDGPU: Improve accuracy of instruction rates for VOPC These were all using the default 32-bit VALU write class, but the i64/f64 compares are half rate. I'm not sure this is really correct, because they are still using the write to VALU write class, even though they really write to the SALU. llvm-svn: 248582	2015-09-25 16:58:25 +00:00
Saleem Abdulrasool	8e99f50768	ARM: make -Asserts,-Werror=unused-variable build happy The value was only used in an assertion. Sink the variable usage into the assertion. llvm-svn: 248562	2015-09-25 05:41:02 +00:00
Saleem Abdulrasool	fe83b50289	ARM: address WoA division limitation We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a pseudo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically. llvm-svn: 248561	2015-09-25 05:15:46 +00:00
Matt Arsenault	8aa9973696	AMDGPU: Remove unused includes llvm-svn: 248553	2015-09-25 00:28:43 +00:00
Chad Rosier	b02f5a5a1f	[AArch64] Improve the readability of the ld/st optimization pass. NFC. In this context, MI is an add/sub instruction not a load/store. llvm-svn: 248540	2015-09-24 21:27:49 +00:00
Simon Pilgrim	68d0050c6a	[X86][SSE2] Fix zero/any extension shuffles that don't start from the first element Fix for D12561 - we weren't correctly ensuring that the base element for extension was moved to start on a boundary suitable for UNPCKL/H llvm-svn: 248536	2015-09-24 21:02:17 +00:00
Matt Arsenault	e66621b306	AMDGPU: Add s_dcache_* instructions llvm-svn: 248533	2015-09-24 19:52:27 +00:00
Matt Arsenault	d6adfb401c	AMDGPU: Add cache invalidation instructions. These are necessary for implementing mem_fence for OpenCL 2.0. The VI assembler tests are disabled since it seems to be using the wrong encoding or opcode. llvm-svn: 248532	2015-09-24 19:52:21 +00:00
Chad Rosier	7cd472b719	[AArch64] The paired post-increment store instruction has an output register. The pre- and post-increment version update the base register, but the post- version was defined incorrectly. There is no test case as we don't currently generate these instructions, but I plan on changing that in the near future. llvm-svn: 248528	2015-09-24 19:21:42 +00:00
Artyom Skrobov	cf296444ab	[ARM] Handle +t2dsp feature as an ArchExtKind in ARMTargetParser.def Currently, the availability of DSP instructions (ACLE 6.4.7) is handled in a hand-rolled tricky condition block in tools/clang/lib/Basic/Targets.cpp, with a FIXME: attached. This patch changes the handling of +t2dsp to be in line with other architecture extensions. Following a revert of r248152 and new review comments, this patch also includes renaming FeatureDSPThumb2 -> FeatureDSP, hasThumb2DSP() -> hasDSP(), etc. The spelling of "t2dsp" is preserved, pending a further investigation of its possible external usage. Differential Revision: http://reviews.llvm.org/D12937 llvm-svn: 248519	2015-09-24 17:31:16 +00:00
Daniel Sanders	090f6e41c4	[mips] Use PredicateControl for the MSA ASE instructions. NFC. Reviewers: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13092 llvm-svn: 248486	2015-09-24 12:10:23 +00:00
Matt Arsenault	68d938649e	Introduce target hook for optimizing register copies Allow a target to do something other than search for copies that will avoid cross register bank copies. Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisfied with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets. I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg. Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation, this just relaxes them to not check for specific registers. llvm-svn: 248478	2015-09-24 08:36:14 +00:00
Matt Arsenault	e068f9a263	AMDGPU: Return after instruction is processed. llvm-svn: 248476	2015-09-24 07:51:28 +00:00
Matt Arsenault	708586faa2	AMDGPU: Remove another unnecessary check from commuteInstruction llvm-svn: 248475	2015-09-24 07:51:25 +00:00
Matt Arsenault	fa242960fc	AMDGPU: Add readonly to InstrMapping functions llvm-svn: 248474	2015-09-24 07:51:23 +00:00
Matt Arsenault	cab64f1c75	AMDGPU: Fix printing trailing whitespace for mubuf atomics llvm-svn: 248472	2015-09-24 07:51:17 +00:00
Matt Arsenault	c8e2ce4046	AMDGPU: Reduce number of copies emitted Instead of always inserting a copy in case the super register is itself a subregister, only extract to the super reg class if this is actually the case. This shouldn't really change codegen, but makes looking at the output of SIFixSGPRCopies easier to read. llvm-svn: 248467	2015-09-24 07:16:37 +00:00
Tim Northover	beb5bccf88	ARM: fix folding stack adjustment (again again again...) This time, the issue is that we weren't accounting for the possibility that aligned DPRs could have been stored after the final "push" in a prologue. When that happened we effectively moved a "sub sp, #N" from below the aligned stores to above them, and everything went to pot. To make it worse, I'd actually committed something testing that we produced wrong code, so the test update is tiny. llvm-svn: 248437	2015-09-23 22:21:09 +00:00
Sanjay Patel	1a6534661b	[x86] replace integer 'xor' ops with packed SSE FP 'xor' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx xorl %eax, %ecx movd %ecx, %xmm0 into this: xorps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248415	2015-09-23 18:33:42 +00:00
Sanjay Patel	aba37553c4	[x86] replace integer 'or' ops with packed SSE FP 'or' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx orl %eax, %ecx movd %ecx, %xmm0 into this: orps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 This is an extension of: http://reviews.llvm.org/rL248395 llvm-svn: 248409	2015-09-23 18:19:07 +00:00
Evgeniy Stepanov	a2002b08f7	Android support for SafeStack. Add two new ways of accessing the unsafe stack pointer: * At a fixed offset from the thread TLS base. This is very similar to StackProtector cookies, but we plan to extend it to other backends (ARM in particular) soon. Bionic-side implementation here: https://android-review.googlesource.com/170988. * Via a function call, as a fallback for platforms that provide neither a fixed TLS slot, nor a reasonable TLS implementation (i.e. not emutls). This is a re-commit of a change in r248357 that was reverted in r248358. llvm-svn: 248405	2015-09-23 18:07:56 +00:00
Sanjay Patel	b14ecd34f7	move call to convertIntLogicToFPLogic up; NFCI The BEXTR comments didn't make sense before, we may want to extend the FP logic transform to work on vectors, and this way is more beautiful. llvm-svn: 248404	2015-09-23 18:03:37 +00:00
Sanjay Patel	ade3abd2d9	[x86] move code for converting int logic to FP logic to a helper function; NFCI This is a follow-on to: http://reviews.llvm.org/rL248395 so we can add the call to the or/xor combines too. llvm-svn: 248399	2015-09-23 17:39:41 +00:00
Sanjay Patel	df2495f331	[x86] replace integer 'and' ops with packed SSE FP 'and' ops when operating on FP scalars Turn this: movd %xmm0, %eax movd %xmm1, %ecx andl %eax, %ecx movd %ecx, %xmm0 into this: andps %xmm1, %xmm0 This is related to, but does not solve: https://llvm.org/bugs/show_bug.cgi?id=22428 Differential Revision: http://reviews.llvm.org/D13065 llvm-svn: 248395	2015-09-23 17:00:06 +00:00
Dan Gohman	979840d31f	[WebAssembly] Fix hasAddr64 being used before being initialized. This reverts r248388 and fixes the underlying bug: hasAddr64 was initialized in runOnMachineFunction, but runOnMachineFunction isn't ever called in CodeGen/WebAssembly/global.ll since that testcase has no functions. The fix here is to use AsmPrinter's getPointerSize() as needed to determine the pointer size instead. llvm-svn: 248394	2015-09-23 16:59:10 +00:00
Alexander Kornienko	a3eaa204e6	Fix CodeGen/WebAssembly/global.ll test under ASAN. llvm-svn: 248388	2015-09-23 15:41:25 +00:00
Chad Rosier	2dfd35499e	[AArch64] Refactor pre- and post-index merge functions into a single function. NFC. llvm-svn: 248377	2015-09-23 13:51:44 +00:00
Oliver Stannard	f2ed5c68d2	[ARM] Add option to force fast-isel The ARM backend has some logic that only allows the fast-isel to be enabled for subtargets where it is known to be stable. This adds a backend option to override this and force the fast-isel to be used for any target, to allow it to be tested. This is an ARM-specific option, because no other backend disables the fast-isel on a per-subtarget basis. llvm-svn: 248369	2015-09-23 09:19:54 +00:00
Simon Pilgrim	9cb018b6b6	[X86][SSE] Replace 128-bit SSE41 PMOVSX intrinsics with native IR This patch removes the x86.sse41.pmovsx* intrinsics, provides a suitable upgrade path and updates relevant tests to sign extend a subvector instead. LLVM counterpart to D12835 Differential Revision: http://reviews.llvm.org/D13002 llvm-svn: 248368	2015-09-23 08:48:33 +00:00
Sanjoy Das	2aacc0ecca	[SCEV] Introduce ScalarEvolution::getOne and getZero. Summary: It is fairly common to call SE->getConstant(Ty, 0) or SE->getConstant(Ty, 1); this change makes such uses a little bit briefer. I've refactored the call sites I could find easily to use getZero / getOne. Reviewers: hfinkel, majnemer, reames Subscribers: sanjoy, llvm-commits Differential Revision: http://reviews.llvm.org/D12947 llvm-svn: 248362	2015-09-23 01:59:04 +00:00
Evgeniy Stepanov	8d0e3011d8	Revert "Android support for SafeStack." test/Transforms/SafeStack/abi.ll breaks when target is not supported; needs refactoring. llvm-svn: 248358	2015-09-23 01:23:22 +00:00
Evgeniy Stepanov	ce2e16f00c	Android support for SafeStack. Add two new ways of accessing the unsafe stack pointer: * At a fixed offset from the thread TLS base. This is very similar to StackProtector cookies, but we plan to extend it to other backends (ARM in particular) soon. Bionic-side implementation here: https://android-review.googlesource.com/170988. * Via a function call, as a fallback for platforms that provide neither a fixed TLS slot, nor a reasonable TLS implementation (i.e. not emutls). llvm-svn: 248357	2015-09-23 01:03:51 +00:00
Ahmed Bougacha	81616a72ea	[ARM] Emit clrex in the expanded cmpxchg fail block. ARM counterpart to r248291: In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248294	2015-09-22 17:22:58 +00:00
Ahmed Bougacha	07a844d758	[AArch64] Emit clrex in the expanded cmpxchg fail block. In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it. Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations". Differential Revision: http://reviews.llvm.org/D13033 llvm-svn: 248291	2015-09-22 17:21:44 +00:00
Daniel Sanders	86cce70010	[mips][sched] Split IIBranch into specific instruction classes. Summary: Almost no functional change since the InstrItinData's have been duplicated. The one functional change is to remove IIBranch from the MSA branches. The classes will be assigned to the MSA instructions as part of implementing the P5600 scheduler. II_IndirectBranchPseudo and II_ReturnPseudo can probably be removed. I've preserved the itinerary information for the corresponding pseudo instructions to avoid making a functional change to these pseudos in this patch. Reviewers: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12189 llvm-svn: 248273	2015-09-22 13:36:28 +00:00
Daniel Sanders	1af1d275bc	[mips][sched] Temporarily rename IIAlu to IIM16Alu. NFC. Summary: The only instructions left in IIAlu are MIPS16 specific. We're not implementing a MIPS16 scheduler at this time so rename the class to make it obvious that they are MIPS16 instructions. Reviewers: vkalintiris Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D12188 llvm-svn: 248267	2015-09-22 12:36:28 +00:00
Stephen Canon	8216d88511	Don't raise inexact when lowering ceil, floor, round, trunc. The C standard has historically not specified whether or not these functions should raise the inexact flag. Traditionally on Darwin, these functions did raise inexact, and the llvm lowerings followed that convention. n1778 (C bindings for IEEE-754 (2008)) clarifies that these functions should not set inexact. This patch brings the lowerings for arm64 and x86 in line with the newly specified behavior. This also lets us fold some logic into TD patterns, which is nice. Differential Revision: http://reviews.llvm.org/D12969 llvm-svn: 248266	2015-09-22 11:43:17 +00:00
NAKAMURA Takumi	10c80e7996	Prune trailing whitespaces. llvm-svn: 248265	2015-09-22 11:19:03 +00:00
NAKAMURA Takumi	0a7d0ad95f	Untabify. llvm-svn: 248264	2015-09-22 11:15:07 +00:00
NAKAMURA Takumi	a9cb538a74	Reformat blank lines. llvm-svn: 248263	2015-09-22 11:14:39 +00:00
NAKAMURA Takumi	84965031a7	Reformat comment lines. llvm-svn: 248262	2015-09-22 11:14:12 +00:00
NAKAMURA Takumi	70ad98aca4	Reformat. llvm-svn: 248261	2015-09-22 11:13:55 +00:00
NAKAMURA Takumi	59a16a76be	ARMInstrInfo.cpp: Reformat. llvm-svn: 248260	2015-09-22 11:10:17 +00:00

34379 Commits