llvm-project

Author	SHA1	Message	Date
Artyom Skrobov	7fd67e25aa	Adding support for TargetLoweringBase::LibCall Summary: TargetLoweringBase::Expand is defined as "Try to expand this to other ops, otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between the two possibilities was defined in a rather convoluted way: - if DIVREM is legal, expand to DIVREM - if DIVREM has a custom lowering, expand to DIVREM - if DIVREM libcall is defined and a remainder from the same division is computed elsewhere, expand to a DIVREM libcall - else, expand to a DIV libcall This had the undesirable effect that if both DIV and DIVREM are implemented as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM libcall, even when the remainder isn't used. The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that backends can directly control whether they prefer an expansion or a conversion to a libcall. This makes the generic lowering code even more generic, allowing its reuse in a wider range of target-specific configurations. The useful effect is that ARM backend will now generate a call to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where it doesn't need the remainder. There's no functional change outside the ARM backend. Reviewers: t.p.northover, rengolin Subscribers: t.p.northover, llvm-commits, aemerson Differential Revision: http://reviews.llvm.org/D13862 llvm-svn: 250826	2015-10-20 13:14:52 +00:00
Igor Breger	21296d230a	AVX512: Implemented encoding and intrinsics for VPBROADCASTB/W/D/Q instructions. Differential Revision: http://reviews.llvm.org/D13884 llvm-svn: 250819	2015-10-20 11:56:42 +00:00
Andrea Di Biagio	9a85b7abe0	[x86] Fix AVX maskload/store intrinsic prototypes. The mask value type for maskload/maskstore GCC builtins is never a vector of packed floats/doubles. This patch fixes the following issues: 1. The mask argument for builtin_ia32_maskloadpd and builtin_ia32_maskstorepd should be of type llvm_v2i64_ty and not llvm_v2f64_ty. 2. The mask argument for builtin_ia32_maskloadpd256 and builtin_ia32_maskstorepd256 should be of type llvm_v4i64_ty and not llvm_v4f64_ty. 3. The mask argument for builtin_ia32_maskloadps and builtin_ia32_maskstoreps should be of type llvm_v4i32_ty and not llvm_v4f32_ty. 4. The mask argument for builtin_ia32_maskloadps256 and builtin_ia32_maskstoreps256 should be of type llvm_v8i32_ty and not llvm_v8f32_ty. Differential Revision: http://reviews.llvm.org/D13776 llvm-svn: 250817	2015-10-20 11:20:13 +00:00
Matt Arsenault	8f18917a90	AMDGPU: Stop reserving v[254:255] This wasn't doing anything useful. They weren't explicitly used anywhere, and the RegScavenger ignores reserved registers. This for some reason caused a random scheduling change in the test. Getting the check lines to pass is too frustrating, and there's probably not too much value in checking the vector case's operands N times. llvm-svn: 250794	2015-10-20 03:59:58 +00:00
JF Bastien	c8f89e86d5	WebAssembly: fix call/return syntax. They are now typeless, unlike other operations. llvm-svn: 250793	2015-10-20 01:26:54 +00:00
JF Bastien	3b0177c542	WebAssembly: fix syntax for br_if. llvm-svn: 250777	2015-10-20 00:37:42 +00:00
Cong Hou	7745dbc5c4	Enhance loop rotation with existence of profile data in MachineBlockPlacement pass. Currently, in MachineBlockPlacement pass the loop is rotated to let the best exit to be the last BB in the loop chain, to maximize the fall-through from the loop to outside. With profile data, we can determine the cost in terms of missed fall through opportunities when rotating a loop chain and select the best rotation. Basically, there are three kinds of cost to consider for each rotation: 1. The possibly missed fall through edge (if it exists) from BB out of the loop to the loop header. 2. The possibly missed fall through edges (if they exist) from the loop exits to BB out of the loop. 3. The missed fall through edge (if it exists) from the last BB to the first BB in the loop chain. Therefore, the cost for a given rotation is the sum of costs listed above. We select the best rotation with the smallest cost. This is only for PGO mode when we have more precise edge frequencies. Differential revision: http://reviews.llvm.org/D10717 llvm-svn: 250754	2015-10-19 23:16:40 +00:00
Jun Bum Lim	d3548303ec	[AArch64] Merge halfword loads into a 32-bit load Convert two halfword loads into a single 32-bit word load with bitfield extract instructions. For example: ldrh w0, [x2]; ldrh w1, [x2, #2] becomes ldr w0, [x2]; ubfx w1, w0, #16, #16; and w0, w0, #0xffff llvm-svn: 250719	2015-10-19 18:34:53 +00:00
Krzysztof Parzyszek	db8677067c	[Hexagon] Delay emission of CFI instructions Emit the CFI instructions after all code transformations have been done. This will avoid any interference between CFI instructions and packetization. llvm-svn: 250714	2015-10-19 17:46:01 +00:00
Asiri Rathnayake	1040a53be3	Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions The mapping of these two intrinsics in ARMInstrInfo.td had a small omission which led to their operands not being validated/transformed before being lowered into usat and ssat instructions. This can cause incorrect instructions to be emitted. I've also added tests for the remaining two saturating arithmetic intrinsics @llvm.arm.qadd and @llvm.arm.qsub as they are missing codegen tests. llvm-svn: 250697	2015-10-19 11:44:24 +00:00
Simon Pilgrim	4708060e94	[X86][SSE] Add vector bit rotation tests. llvm-svn: 250656	2015-10-18 12:54:37 +00:00
Asaf Badouh	696e8e0bb7	[X86][AVX512DQ] add scalar fpclass Differential Revision: http://reviews.llvm.org/D13769 llvm-svn: 250650	2015-10-18 11:04:38 +00:00
Igor Breger	cbb9550537	AVX512: Lowering i8/i16 vector CTLZ using the dword LZCNT vector instruction Differential Revision: http://reviews.llvm.org/D13632 llvm-svn: 250649	2015-10-18 09:56:39 +00:00
Simon Pilgrim	4773763bb0	[X86][XOP] Add VPROT rotate by immediate intrinsics tests llvm-svn: 250618	2015-10-17 18:21:53 +00:00
Simon Pilgrim	5b65f28fe7	[X86][FastISel] Teach how to select SSE4A nontemporal stores. Add FastISel support for SSE4A scalar float / double non-temporal stores Follow up to D13698 Differential Revision: http://reviews.llvm.org/D13773 llvm-svn: 250610	2015-10-17 13:04:42 +00:00
Colin LeMahieu	68d155be8e	[Hexagon] Reverting test file change. llvm-svn: 250601	2015-10-17 01:58:51 +00:00
Colin LeMahieu	7c9587136d	[Hexagon] Adding skeleton of HVX extension instructions. llvm-svn: 250600	2015-10-17 01:33:04 +00:00
JF Bastien	3428ed4f53	WebAssembly: don't omit dead vregs from locals Summary: This is a temporary hack until we get around to remapping the vreg numbers to local numbers. Dead vregs cause bad numbering and make consumers sad. We could also just look at debug info and use named locals instead, but vregs have to work properly anyways so there! Reviewers: binji, sunfish Subscribers: jfb, llvm-commits, dschuff Differential Revision: http://reviews.llvm.org/D13839 llvm-svn: 250594	2015-10-17 00:25:38 +00:00
JF Bastien	4f43e80ece	WebAssembly: fix the syntax for comparisons Summary: It has also slightly changed. Reviewers: binji Subscribers: jfb, dschuff, llvm-commits, sunfish Differential Revision: http://reviews.llvm.org/D13837 llvm-svn: 250591	2015-10-17 00:12:29 +00:00
Joseph Tremoulet	55b51e9dcc	[WinEH] Fix eh.exceptionpointer intrinsic lowering Summary: Some shared code for handling eh.exceptionpointer and eh.exceptioncode needs to not share the part that truncates to 32 bits, which is intended just for exception codes. Reviewers: rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13747 llvm-svn: 250588	2015-10-17 00:08:08 +00:00
Reid Kleckner	28e490342b	[WinEH] Fix stack alignment in funclets and ParentFrameOffset calculation Our previous value of "16 + 8 + MaxCallFrameSize" for ParentFrameOffset is incorrect when CSRs are involved. We were supposed to have a test case to catch this, but it wasn't very rigorous. The main effect here is that calling _CxxThrowException inside a catchpad doesn't immediately crash on MOVAPS when you have an odd number of CSRs. llvm-svn: 250583	2015-10-16 23:43:27 +00:00
Sanjay Patel	bbd524496c	[x86] promote 'add nsw' to a wider type to allow more combines The motivation for this patch starts with PR20134: https://llvm.org/bugs/show_bug.cgi?id=20134 void foo(int *a, int i) { a[i] = a[i+1] + a[i+2]; } It seems better to produce this (14 bytes): movslq %esi, %rsi movl 0x4(%rdi,%rsi,4), %eax addl 0x8(%rdi,%rsi,4), %eax movl %eax, (%rdi,%rsi,4) Rather than this (22 bytes): leal 0x1(%rsi), %eax cltq leal 0x2(%rsi), %ecx movslq %ecx, %rcx movl (%rdi,%rcx,4), %ecx addl (%rdi,%rax,4), %ecx movslq %esi, %rax movl %ecx, (%rdi,%rax,4) The most basic problem (the first test case in the patch combines constants) should also be fixed in InstCombine, but it gets more complicated after that because we need to consider architecture and micro-architecture. For example, AArch64 may not see any benefit from the more general transform because the ISA solves the sexting in hardware. Some x86 chips may not want to replace 2 ADD insts with 1 LEA, and there's an attribute for that: FeatureSlowLEA. But I suspect that doesn't go far enough or maybe it's not getting used when it should; I'm also not sure if FeatureSlowLEA should also mean "slow complex addressing mode". I see no perf differences on test-suite with this change running on AMD Jaguar, but I see small code size improvements when building clang and the LLVM tools with the patched compiler. A more general solution to the sext(add nsw(x, C)) problem that works for multiple targets is available in CodeGenPrepare, but it may take quite a bit more work to get that to fire on all of the test cases that this patch takes care of. Differential Revision: http://reviews.llvm.org/D13757 llvm-svn: 250560	2015-10-16 22:14:12 +00:00
Joseph Tremoulet	d11a998e81	[WinEH] Fix CatchRetSuccessorColorMap accounting Summary: We now use the block for the catchpad itself, rather than its normal successor, as the funclet entry. Putting the normal successor in the map leads downstream funclet membership computations to erroneous results. Reviewers: majnemer, rnk Subscribers: rnk, llvm-commits Differential Revision: http://reviews.llvm.org/D13798 llvm-svn: 250552	2015-10-16 21:22:54 +00:00
Andrew Kaylor	09b39acc03	Fix assertion failure with fp128 to unsigned i64 conversion Patch by Mitch Bodart Differential Revision: http://reviews.llvm.org/D13780 llvm-svn: 250550	2015-10-16 20:39:20 +00:00
Krzysztof Parzyszek	a7c5f0409c	[Hexagon] Split double registers llvm-svn: 250549	2015-10-16 20:38:54 +00:00
Krzysztof Parzyszek	5b7dd0cdf9	[Hexagon] Merge adjacent stores llvm-svn: 250542	2015-10-16 19:43:56 +00:00
JF Bastien	6126d2b883	WebAssembly: fix load/store syntax Summary: The syntax has changed a bit recently. Reviewers: binji Subscribers: llvm-commits, jfb, sunfish, dschuff Differential Revision: http://reviews.llvm.org/D13821 llvm-svn: 250535	2015-10-16 18:24:42 +00:00
Joseph Tremoulet	53e9cbd95a	[WinEH] Fix endpad coloring/numbering Summary: When a cleanup's cleanupendpad or cleanupret targets a catchendpad, stop trying to propagate the cleanup's parent's color to the catchendpad, since what's needed is the cleanup's grandparent's color and the catchendpad will get that color from the catchpad linkage already. We already had this exclusion for invokes, but were missing it for cleanupendpad/cleanupret. Also add a missing line that tags cleanupendpads' states in the EHPadStateMap, without which lowering invokes that target cleanupendpads which unwind to other handlers (and so don't have the -1 state) will fail. This fixes the reduced IR repro in PR25163. Reviewers: majnemer, andrew.w.kaylor, rnk Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D13797 llvm-svn: 250534	2015-10-16 18:08:16 +00:00
Charlie Turner	434d4599d4	[AArch64] Implement vector splitting on UADDV. Summary: Fixes PR25056. Reviewers: mcrosier, junbuml, jmolloy Subscribers: aemerson, rengolin, llvm-commits Differential Revision: http://reviews.llvm.org/D13466 llvm-svn: 250520	2015-10-16 15:38:25 +00:00
Craig Topper	09b6598572	[X86] Add fxsr feature flag for fxsave/fxrestore instructions. llvm-svn: 250497	2015-10-16 06:03:09 +00:00
JF Bastien	1d20a5e9e8	WebAssembly: update syntax Summary: Follow the same syntax as for the spec repo. Both have evolved slightly independently and need to converge again. This, along with wasmate changes, allows me to do the following: echo "int add(int a, int b) { return a + b; }" > add.c ./out/bin/clang -O2 -S --target=wasm32-unknown-unknown add.c -o add.wack ./experimental/prototype-wasmate/wasmate.py add.wack > add.wast ./sexpr-wasm-prototype/out/sexpr-wasm add.wast -o add.wasm ./sexpr-wasm-prototype/third_party/v8-native-prototype/v8/v8/out/Release/d8 -e "print(WASM.instantiateModule(readbuffer('add.wasm'), {print:print}).add(42, 1337));" As you'd expect, the d8 shell prints out the right value. Reviewers: sunfish Subscribers: jfb, llvm-commits, dschuff Differential Revision: http://reviews.llvm.org/D13712 llvm-svn: 250480	2015-10-16 00:53:49 +00:00
JF Bastien	2cdd5e4710	x86: preserve flags when folding atomic operations D4796 taught LLVM to fold some atomic integer operations into a single instruction. The pattern was unaware that the instructions clobbered flags. I fixed some of this issue in D13680 but had missed INC/DEC. This patch adds the missing EFLAGS definition. llvm-svn: 250438	2015-10-15 18:24:52 +00:00
Kevin B. Smith	89760f04f9	Change test to use FileCheck rather than grep. Differential Revision: http://reviews.llvm.org/D13751 llvm-svn: 250431	2015-10-15 17:05:12 +00:00
JF Bastien	5b327712b0	x86 FP atomic codegen: don't drop globals, stack Summary: x86 codegen is clever about generating good code for relaxed floating-point operations, but it was being silly when globals and immediates were involved, forgetting where the global was and loading/storing from/to the wrong place. The same applied to hard-coded address immediates. Don't let it forget about the displacement. This fixes https://llvm.org/bugs/show_bug.cgi?id=25171 A very similar bug when doing floating-points atomics to the stack is also fixed by this patch. This fixes https://llvm.org/bugs/show_bug.cgi?id=25144 Reviewers: pete Subscribers: llvm-commits, majnemer, rsmith Differential Revision: http://reviews.llvm.org/D13749 llvm-svn: 250429	2015-10-15 16:46:29 +00:00
Daniel Sanders	8008de5551	[mips][mips16] MIPS16 is not a CPU/Architecture but is an ASE. Summary: The -mcpu=mips16 option caused the Integrated Assembler to crash because it couldn't figure out the architecture revision number to write to the .MIPS.abiflags section. This CPU definition has been removed because, like microMIPS, MIPS16 is an ASE to a base architecture. Reviewers: vkalintiris Subscribers: rkotler, llvm-commits, dsanders Differential Revision: http://reviews.llvm.org/D13656 llvm-svn: 250407	2015-10-15 14:34:23 +00:00
Igor Breger	d7bae451de	AVX512: Implemented DAG lowering for shuff64x2/shufi64x2 instructions (shuffle packed values at 128-bit granularity) Differential Revision: http://reviews.llvm.org/D13648 llvm-svn: 250400	2015-10-15 13:29:07 +00:00
Igor Breger	b4bb190eed	AVX512: Implemented encoding and intrinsics for vpternlogd/q. Differential Revision: http://reviews.llvm.org/D13768 llvm-svn: 250396	2015-10-15 12:33:24 +00:00
Elena Demikhovsky	ecff21b297	AVX-512: Fixed a bug in shuffle lowering 32-bit mode AVX-512 bit shuffle fails on 32 bit since we create a vector of 64-bit constants. I split 8x64-bit const vector to 16x32 on 32-bit mode. Differential Revision: http://reviews.llvm.org/D13644 llvm-svn: 250390	2015-10-15 11:35:33 +00:00
Andrea Di Biagio	6a61cecef0	[x86] Merge test pr24562.ll into x86-fold-pshufb.ll. NFC. llvm-svn: 250387	2015-10-15 09:54:25 +00:00
Akira Hatanaka	8ad7399f8e	[MachO] Stop generating coal sections. Recommit r250342: move coal-sections-powerpc.s to subdirectory for powerpc. Some background on why we don't have to use coal sections anymore: Long ago when C++ was new and "weak" had not been standardized, an attempt was made in cctools to support C++ inlines that can be coalesced by putting them into their own section (TEXT/textcoal_nt instead of TEXT/text). The current macho linker supports the weak-def bit on any symbol to allow it to be coalesced, but the compiler still puts weak-def functions/data into alternate section names, which the linker must map back to the base section name. This patch makes changes that are necessary to prevent the compiler from using the "coal" sections and have it use the non-coal sections instead when the target architecture is not powerpc: TEXT/textcoal_nt instead use TEXT/text TEXT/const_coal instead use TEXT/const DATA/datacoal_nt instead use DATA/data If the target is powerpc, we continue to use the coal sections since anyone targeting powerpc is probably using an old linker that doesn't have support for the weak-def bits. Also, have the assembler issue a warning if it encounters a coal section in the assembly file and inform the users to use the non-coal sections instead. rdar://problem/14265330 Differential Revision: http://reviews.llvm.org/D13188 llvm-svn: 250370	2015-10-15 05:28:38 +00:00
Quentin Colombet	5084e44d71	[ARM] Make sure we do not dereference the end iterator when accessing debug information. Although the problem was always here, it would only be exposed when shrink-wrapping is enabled. rdar://problem/23110493 llvm-svn: 250352	2015-10-15 00:41:26 +00:00
Akira Hatanaka	276332b47f	Revert r250349. Test case coal-sections-powerpc.s is still failing on some buildbots. llvm-svn: 250351	2015-10-15 00:11:03 +00:00
Akira Hatanaka	1cea644114	[MachO] Stop generating coal sections. Recommit r250342: add -arch=ppc32 to the RUN lines of powerpc tests. Some background on why we don't have to use coal sections anymore: Long ago when C++ was new and "weak" had not been standardized, an attempt was made in cctools to support C++ inlines that can be coalesced by putting them into their own section (TEXT/textcoal_nt instead of TEXT/text). The current macho linker supports the weak-def bit on any symbol to allow it to be coalesced, but the compiler still puts weak-def functions/data into alternate section names, which the linker must map back to the base section name. This patch makes changes that are necessary to prevent the compiler from using the "coal" sections and have it use the non-coal sections instead when the target architecture is not powerpc: TEXT/textcoal_nt instead use TEXT/text TEXT/const_coal instead use TEXT/const DATA/datacoal_nt instead use DATA/data If the target is powerpc, we continue to use the coal sections since anyone targeting powerpc is probably using an old linker that doesn't have support for the weak-def bits. Also, have the assembler issue a warning if it encounters a coal section in the assembly file and inform the users to use the non-coal sections instead. rdar://problem/14265330 Differential Revision: http://reviews.llvm.org/D13188 llvm-svn: 250349	2015-10-14 23:48:10 +00:00
Akira Hatanaka	d58d347e42	Revert r250342. Investigate why coal-sections-powerpc.s is failing on some buildbots. llvm-svn: 250346	2015-10-14 23:29:10 +00:00
Akira Hatanaka	c078ae3e4f	[MachO] Stop generating coal sections. Some background on why we don't have to use coal sections anymore: Long ago when C++ was new and "weak" had not been standardized, an attempt was made in cctools to support C++ inlines that can be coalesced by putting them into their own section (TEXT/textcoal_nt instead of TEXT/text). The current macho linker supports the weak-def bit on any symbol to allow it to be coalesced, but the compiler still puts weak-def functions/data into alternate section names, which the linker must map back to the base section name. This patch makes changes that are necessary to prevent the compiler from using the "coal" sections and have it use the non-coal sections instead when the target architecture is not powerpc: TEXT/textcoal_nt instead use TEXT/text TEXT/const_coal instead use TEXT/const DATA/datacoal_nt instead use DATA/data If the target is powerpc, we continue to use the coal sections since anyone targeting powerpc is probably using an old linker that doesn't have support for the weak-def bits. Also, have the assembler issue a warning if it encounters a coal section in the assembly file and inform the users to use the non-coal sections instead. rdar://problem/14265330 Differential Revision: http://reviews.llvm.org/D13188 llvm-svn: 250342	2015-10-14 22:45:36 +00:00
Sanjay Patel	e2b528074d	add x86 codegen tests for 'add nsw' followed by 'sext' llvm-svn: 250332	2015-10-14 21:47:03 +00:00
Bill Schmidt	048cc97fb1	[PowerPC] Fix invalid lxvdsx optimization (PR25157) PR25157 identifies a bug where a load plus a vector shuffle is incorrectly converted into an LXVDSX instruction. That optimization is only valid if the load is of a doubleword, and in the noted case, it was not. This corrects that problem. Joint patch with Eric Schweitz, who provided the bugpoint-reduced test case. llvm-svn: 250324	2015-10-14 20:45:00 +00:00
Andrea Di Biagio	c47edbef4c	[x86][FastISel] Teach how to select nontemporal stores. This patch teaches x86 fast-isel how to select nontemporal stores. On x86, we can use MOVNTI for nontemporal stores of doublewords/quadwords. Instructions (V)MOVNTPS/PD/DQ can be used for SSE2/AVX aligned nontemporal vector stores. Before this patch, fast-isel always selected 'movd/movq' instead of 'movnti' for doubleword/quadword nontemporal stores. In the case of nontemporal stores of aligned vectors, fast-isel always selected movaps/movapd/movdqa instead of movntps/movntpd/movntdq. With this patch, if we use SSE2/AVX intrinsics for nontemporal stores we now always get the expected (V)MOVNT instructions. The lack of fast-isel support for nontemporal stores was spotted when analyzing the -O0 codegen for nontemporal stores. Differential Revision: http://reviews.llvm.org/D13698 llvm-svn: 250285	2015-10-14 10:03:13 +00:00
Joseph Tremoulet	28c89bbb36	[WinEH] Add CoreCLR EH table emission Summary: Emit the handler and clause locations immediately after the standard xdata. Clauses are emitted in the same order and format used to communicate them to the CLR Execution Engine. Add a lit test to verify correct table generation on a small but interesting example function. Reviewers: majnemer, andrew.w.kaylor, rnk Subscribers: pgavlin, AndyAyers, llvm-commits Differential Revision: http://reviews.llvm.org/D13451 llvm-svn: 250219	2015-10-13 20:18:27 +00:00
Matt Arsenault	e5d9515fb7	DAGCombiner: Don't stop finding better chain on 2 aliases The comment says this was stopped because it was unlikely to be profitable. This is not true if you want to combine vector loads with multiple components. For a simple case that looks like t0 = load t0 ... t1 = load t0 ... t2 = load t0 ... t3 = load t0 ... t4 = store t0:1, t0:1 t5 = store t4, t1:0 t6 = store t5, t2:0 t7 = store t6, t3:0 We want to get all of these stores onto a chain that is a TokenFactor of these N loads. This mostly solves the AMDGPU merge-stores.ll regressions with -combiner-alias-analysis for merging vector stores of vector loads. llvm-svn: 250138	2015-10-13 00:49:00 +00:00
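
Note on commit d3548303ec ([AArch64] halfword-load merge): the rewrite in that message replaces two 16-bit loads with one 32-bit load followed by a bitfield extract and a mask. The following is a minimal sketch, not LLVM code, that models both instruction sequences on a byte buffer to show why they should produce the same register values. It assumes little-endian memory and a readable 4-byte span at the address; the function names are hypothetical and exist only for this illustration.

```python
import struct


def load_halfwords_separately(mem: bytes, addr: int) -> tuple:
    """Model of the original sequence: two independent halfword loads."""
    w0 = struct.unpack_from("<H", mem, addr)[0]      # ldrh w0, [x2]
    w1 = struct.unpack_from("<H", mem, addr + 2)[0]  # ldrh w1, [x2, #2]
    return w0, w1


def load_halfwords_merged(mem: bytes, addr: int) -> tuple:
    """Model of the merged sequence: one word load, then extract and mask."""
    word = struct.unpack_from("<I", mem, addr)[0]    # ldr  w0, [x2]
    w1 = (word >> 16) & 0xFFFF                       # ubfx w1, w0, #16, #16
    w0 = word & 0xFFFF                               # and  w0, w0, #0xffff
    return w0, w1


if __name__ == "__main__":
    memory = bytes([0x34, 0x12, 0x78, 0x56])
    assert load_halfwords_separately(memory, 0) == load_halfwords_merged(memory, 0)
```

The order of the last two operations matters in the merged form: the upper halfword has to be extracted before the lower 16 bits are masked in place, because both read the same loaded word.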

13962 Commits