llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	a65e6b8335	AMDGPU: Remove brev intrinsic llvm-svn: 275620	2016-07-15 21:27:13 +00:00
Matt Arsenault	82e5e1e564	AMDGPU: Fix TargetPrefix for remaining r600 intrinsics llvm-svn: 275619	2016-07-15 21:27:08 +00:00
Matt Arsenault	11d3e21f2b	AMDGPU: Remove AMDGPU.ldexp llvm-svn: 275618	2016-07-15 21:26:56 +00:00
Matt Arsenault	09b2c4aee8	AMDGPU: Remove legacy rsq.clamped intrinsic Mesa still has a use of llvm.AMDGPU.rsq.f64 remaining. Also fix mismatch with non-IEEE rsq selecting to IEEE rsq. llvm-svn: 275617	2016-07-15 21:26:52 +00:00
Saleem Abdulrasool	467269a40e	CodeGen: avoid emitting unnecessary CFI Remove unnecessary clutter in assembly output. When using SjLj EH, the CFI is not actually used for anything. Do not emit the CFI needlessly. The minor test adjustments are interesting. The prologue test was just overzealous matcching. The interesting case is the LSDA change. It was originally added to ensure that various compilations did not mangle the name (it explicitly checked the name!). However, subsequent cleanups made it more reliant on the CFI to find the name. Parse the generated code flow to generically find the label still. llvm-svn: 275614	2016-07-15 21:10:29 +00:00
Nico Weber	8d66df15f4	Teach fast isel about the win64 calling convention. This mostly just works. Vectorcall rets are still not supported. The win64_eh test change is because fast isel doesn't use rsi for temporary computations, so it doesn't need to be pushed. The test case I'm changing was originally added to test pushes, but by now there are other test cases in that file exercising that code path. https://reviews.llvm.org/D22422 llvm-svn: 275607	2016-07-15 20:18:37 +00:00
Vitaly Buka	7f64844481	Revert "[AMDGPU] Add metadata for runtime" This reverts commit r275566. llvm-svn: 275599	2016-07-15 19:14:57 +00:00
Krzysztof Parzyszek	bba0bf7d37	[Hexagon] Improve patterns with stack-based addressing - Treat bitwise OR with a frame index as an ADD wherever possible, fold it into addressing mode. - Extend patterns for memops to allow memops with frame indexes as address operands. llvm-svn: 275569	2016-07-15 15:35:52 +00:00
Nico Weber	f7f2b81602	In dag-optnone.ll, use varargs instead of win64 to fast SDIsel. The test used to rely on targeting win64 to disable fast isel, but I'd like to teach fast isel about win64 rets. Change the test to use varargs to disable fast isel. llvm-svn: 275568	2016-07-15 15:30:18 +00:00
Yaxun Liu	b3d17690eb	[AMDGPU] Add metadata for runtime Added emitting metadata to elf for runtime. Runtime requires certain information (metadata) about kernels to be able to execute and query them. Such information is emitted to an elf section as a key-value pair stream. Differential Revision: https://reviews.llvm.org/D21849 llvm-svn: 275566	2016-07-15 14:58:21 +00:00
Simon Pilgrim	efd841e294	[X86][AVX] Added shuffle tests for UNPCK+PERMUTE lowerVectorShuffleAsPermuteAndUnpack could solve this if it worked with 256-bit vectors llvm-svn: 275554	2016-07-15 11:51:46 +00:00
Simon Pilgrim	cf9c31550c	[X86][AVX2] Added a memory version of test_mm256_broadcastsi128_si256 This should lower to vbroadcasti128 llvm-svn: 275552	2016-07-15 11:40:27 +00:00
Simon Pilgrim	2683ad54ad	[X86][AVX2] Improve lowerShuffleAsRepeatedMaskAndLanePermute permutation of 64-bit sub-lanes As discussed on PR28136, lowerShuffleAsRepeatedMaskAndLanePermute was attempting to match repeated masks at the 128-bit level and then permute the resultant lanes at the 128-bit (AVX1) or 64-bit (AVX2) sub-lane level. This change allows us to create the repeated masks at the sub-lane level (and then concat them together to create a 128-bit repeated mask) and then select which sub-lane to permute. This has no effect on the AVX1 codegen. Fixes PR28136. llvm-svn: 275543	2016-07-15 09:49:12 +00:00
James Molloy	b3326df56a	[Thumb-1] Select post-increment load and store where possible Thumb-1 doesn't have post-inc or pre-inc load or store instructions. However the LDM/STM instructions with writeback can function as post-inc load/store: ldm r0!, {r1} @ load from r0 into r1 and increment r0 by 4 Obviously, this only works if the post increment is 4. llvm-svn: 275540	2016-07-15 08:03:56 +00:00
James Molloy	a454a11d60	[ARM] Prefer indirect calls in minsize mode ... When we emit several calls to the same function in the same basic block. An indirect call uses a "BLX r0" instruction which has a 16-bit encoding. If many calls are made to the same target, this can enable significant code size reductions. llvm-svn: 275537	2016-07-15 07:55:21 +00:00
Matt Arsenault	b91805ea2b	AMDGPU: Fix not expanding control flow after some kill blocks Also stop trying to insert skip blocks at end_cf. This was inserting them at the end of the block which doesn't make sense. The skip should be inserted at the beginning of the block right after the end cf. Just remove this for now since no tests seem to stress this and I think this can be handled more generally later. Fixes bug 28550 llvm-svn: 275510	2016-07-15 00:58:15 +00:00
Matt Arsenault	fa5a86a403	AMDGPU: Fix trying to skip from a block with no successors Found while reducing bug 28550 llvm-svn: 275509	2016-07-15 00:58:13 +00:00
Matt Arsenault	83ab049af2	AMDGPU: Fix splitting kill blocks with defs before kill llvm-svn: 275508	2016-07-15 00:58:09 +00:00
Simon Pilgrim	420b266d0a	[X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle (reapplied) This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. This was incorrectly reverted in rL275421 during triage of PR28552. llvm-svn: 275497	2016-07-14 23:05:09 +00:00
Krzysztof Parzyszek	ecea07c50e	[Hexagon] Packetize function call arguments with tail call instructions On Hexagon is it legal to packetize the instructions setting up call arguments with the call instruction itself. This was already done, except for tail calls. Make sure tail calls are handled as well. llvm-svn: 275458	2016-07-14 19:30:55 +00:00
Sanjay Patel	2996a342f3	auto-generate checks Note: I removed the checks after each jump because that's noise, but we apparently need branches rather than returning i1 to see the bt codegen in some cases. llvm-svn: 275439	2016-07-14 17:07:55 +00:00
Nico Weber	5bb284226b	Don't optimize movs to pushes in -O0 builds. https://reviews.llvm.org/D22362 llvm-svn: 275431	2016-07-14 15:40:22 +00:00
Nico Weber	3afaf16abc	Revert r275411, it cause PR28552. llvm-svn: 275421	2016-07-14 14:49:35 +00:00
Nico Weber	ecdf45b1e6	Teach fast isel calls and rets about stdcall. stdcall is callee-pop like thiscall, so the thiscall changes already did most of the work for this. This change only opts stdcall in and adds tests. llvm-svn: 275414	2016-07-14 13:54:26 +00:00
Simon Pilgrim	bed37ccd54	[X86][AVX] Added an additional vperm2f128 memory folding test llvm-svn: 275413	2016-07-14 13:40:53 +00:00
Simon Pilgrim	3ecb6bdd5f	[X86][AVX2] Allow VPERMPD/VPERMQ shuffles to call combineShuffle This improves the situation discussed in D19228 where we were forcing VPERMPD/VPERMQ where VPERM2F128/VPERM2I128 would have been better. llvm-svn: 275411	2016-07-14 13:28:43 +00:00
Daniel Sanders	46fe6550ac	[mips] SelectionDAGISel subclasses now follow the optimization level. Summary: It was recently discovered that, for Mips's SelectionDAGISel subclasses, all optimization levels caused SelectionDAGISel to behave like -O2. This change adds the necessary plumbing to initialize the optimization level. Reviewers: andrew.w.kaylor Subscribers: andrew.w.kaylor, sdardis, dean, llvm-commits, vradosavljevic, petarj, qcolombet, probinson, dsanders Differential Revision: https://reviews.llvm.org/D14900 llvm-svn: 275410	2016-07-14 13:25:22 +00:00
Simon Pilgrim	053d32906f	[X86][AVX] Add support for narrowing 128-bit+ shuffle mask elements to 64-bits to allow combining Primarily this is to allow blend with zero instead of having to use vperm2f128, but we can use this in the future to deal with AVX512 cases where we need to keep the original element size to correctly fold masked operations. llvm-svn: 275406	2016-07-14 12:58:04 +00:00
Simon Pilgrim	700e4a1ab8	[X86][AVX] Add 128-bit wide shuffle tests that should combine to blend-with-zero llvm-svn: 275402	2016-07-14 12:21:40 +00:00
Simon Pilgrim	a76a8e50e5	[X86][AVX] Add VBROADCASTF128/VBROADCASTI128 shuffle comments support llvm-svn: 275400	2016-07-14 12:07:43 +00:00
Simon Pilgrim	9e812169cc	[X86][AVX] Regenerate broadcast upgrade tests llvm-svn: 275398	2016-07-14 11:05:43 +00:00
Eli Friedman	17e8ea18e9	[X86] Fix stupid typo in isel lowering. Apparently someone miscounted the number of zeros in the immediate. Fixes https://llvm.org/bugs/show_bug.cgi?id=28544 . llvm-svn: 275376	2016-07-14 05:48:25 +00:00
Matt Arsenault	ca7f5701f8	AMDGPU/R600: Delete/rename intrinsics no longer used by mesa Use the replacement pass to update the tests, and delete old names. llvm-svn: 275375	2016-07-14 05:47:17 +00:00
Matt Arsenault	897eee4187	AMDGPU: Remove unused intrinsics llvm-svn: 275371	2016-07-14 05:23:19 +00:00
Matt Arsenault	aa94c1e7ee	AMDGPU: Fix test not actually testing anything It wasn't actually running the pass, and since it is missing the llvm prefix, the eh intrinsic was not really an IntrinsicInst. Also add missing test for lifetime markers. llvm-svn: 275370	2016-07-14 05:23:15 +00:00
Dean Michael Berris	52735fc435	XRay: Add entry and exit sleds Summary: In this patch we implement the following parts of XRay: - Supporting a function attribute named 'function-instrument' which currently only supports 'xray-always'. We should be able to use this attribute for other instrumentation approaches. - Supporting a function attribute named 'xray-instruction-threshold' used to determine whether a function is instrumented with a minimum number of instructions (IR instruction counts). - X86-specific nop sleds as described in the white paper. - A machine function pass that adds the different instrumentation marker instructions at a very late stage. - A way of identifying which return opcode is considered "normal" for each architecture. There are some caveats here: 1) We don't handle PATCHABLE_RET in platforms other than x86_64 yet -- this means if IR used PATCHABLE_RET directly instead of a normal ret, instruction lowering for that platform might do the wrong thing. We think this should be handled at instruction selection time to by default be unpacked for platforms where XRay is not availble yet. 2) The generated section for X86 is different from what is described from the white paper for the sole reason that LLVM allows us to do this neatly. We're taking the opportunity to deviate from the white paper from this perspective to allow us to get richer information from the runtime library. Reviewers: sanjoy, eugenis, kcc, pcc, echristo, rnk Subscribers: niravd, majnemer, atrick, rnk, emaste, bmakam, mcrosier, mehdi_amini, llvm-commits Differential Revision: http://reviews.llvm.org/D19904 llvm-svn: 275367	2016-07-14 04:06:33 +00:00
Nico Weber	af7e8465e1	Teach fast isel about thiscall (and callee-pop) calls. http://reviews.llvm.org/D22315 llvm-svn: 275360	2016-07-14 01:52:51 +00:00
Mehdi Amini	9e332a7719	Add missing test for r275347 "[IPRA] Set callee saved registers to none for local function when IPRA is enabled." llvm-svn: 275358	2016-07-14 01:31:20 +00:00
Michael Kuperstein	be837fa40f	[DAG] Correctly chain masked loads If a masked loads is not added to the chain, it should not reset the chain's root. This fixes the remaining part of PR28515. llvm-svn: 275340	2016-07-13 23:23:40 +00:00
Quentin Colombet	68a84587c5	[MIR] Fix one GlobalISel test case that I missed in r275314. llvm-svn: 275333	2016-07-13 22:35:33 +00:00
Nico Weber	b888555bcc	Add a triple to fix test on bots after 275320. llvm-svn: 275327	2016-07-13 22:19:40 +00:00
Nico Weber	eb9488b151	Fix a TODO in X86CallFrameOptimization to not rely on a codegen artifact. This happens to make X86CallFrameOptimization in -O0 / FastISel builds as well, but it's not clear if the pass should run in that setup. http://reviews.llvm.org/D22314 llvm-svn: 275320	2016-07-13 21:38:27 +00:00
Quentin Colombet	545e558b82	[MIR] Print on the given output instead of stderr. Currently the MIR framework prints all its outputs (errors and actual representation) on stderr. This patch fixes that by printing the regular output in the output specified with -o. Differential Revision: http://reviews.llvm.org/D22251 llvm-svn: 275314	2016-07-13 20:36:03 +00:00
Matt Arsenault	f071102647	AMDGPU: Remove last AMDIL intrinsics llvm-svn: 275309	2016-07-13 19:42:06 +00:00
Andrew Kaylor	346dd7f1bd	Reverting r275284 due to platform-specific test failures llvm-svn: 275304	2016-07-13 19:09:16 +00:00
Simon Pilgrim	5d664af3c3	[X86][SSE] Regenerate truncated shift test Check SSE2 and AVX2 implementations llvm-svn: 275300	2016-07-13 18:50:10 +00:00
Simon Pilgrim	631643e7d9	Regenerate test llvm-svn: 275299	2016-07-13 18:46:37 +00:00
Krzysztof Parzyszek	cb4dd7656b	Move mempcpy_call.ll to X86 subdirectory llvm-svn: 275294	2016-07-13 18:28:45 +00:00
Andrew Kaylor	12cccdd731	Fix for Bug 26903, adds support to inline __builtin_mempcpy Patch by Sunita Marathe Differential Revision: http://reviews.llvm.org/D21920 llvm-svn: 275284	2016-07-13 17:25:11 +00:00
Matthias Braun	512424f28a	PatchableFunction: Skip pseudos that do not create code This fixes http://llvm.org/PR28524 llvm-svn: 275278	2016-07-13 16:37:29 +00:00

1 2 3 4 5 ...

16580 Commits