llvm-project

Commit Graph

Author	SHA1	Message	Date
Eli Friedman	96e3cd85bd	[ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available. The inline sequence is very long (about 70 bytes on Thumb1), so it's not really a good idea to inline it, especially when optimizing for size. Differential Revision: https://reviews.llvm.org/D47917 llvm-svn: 340458	2018-08-22 21:47:14 +00:00
Martin Storsjo	5ab1d107bb	[ARM] Avoid injecting constant islands in movw+movt pairs on Windows On Windows, movw+movt pairs with relocations are handled with a single relocation that covers them both. Therefore we can't inject anything between these instructions, otherwise the relocation (which in LLVM only is treated as the movw instruction's relocation, while the movt instruction's relocation is dropped) will end up bogus. These instructions are bundled up until right before the constant islands pass, making this effectively the only place that can split them apart. Differential Revision: https://reviews.llvm.org/D51032 llvm-svn: 340451	2018-08-22 20:34:12 +00:00
Eli Friedman	c11e2b9470	[ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant. This avoids a potential infinite loop setting and unsetting bits in the mask. Reduced from a failure on the polly-aosp bot. Differential Revision: https://reviews.llvm.org/D51066 llvm-svn: 340446	2018-08-22 20:13:45 +00:00
Sam Parker	4d519fc3b5	[ARM] Rotated operand patterns for *xtb16 Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16 so that they can perform a ror. Differential Revision: https://reviews.llvm.org/D51034 llvm-svn: 340405	2018-08-22 12:58:36 +00:00
Sam Parker	597811e7a7	[DAGCombiner] Reduce load widths of shifted masks During combining, ReduceLoadWdith is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 340261	2018-08-21 10:26:59 +00:00
Eli Friedman	73e8a784e6	[SelectionDAG] Improve the legalisation lowering of UMULO. There is no way in the universe, that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway. This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations. Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication. Baring any oversights introduced by porting the algorithm to DAG, confidence in correctness of this algorithm is extremely high. Following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of original, so anything under 1 is “better” and anything above 1 is worse. +-------+-----------+-----------+-------------+-------------+ \| Arch \| u64u64 t \| u64u64 s \| u128u128 t \| u128u128 s \| +-------+-----------+-----------+-------------+-------------+ \| X64 \| - \| - \| ~0.5 \| ~0.64 \| \| i686 \| ~0.5 \| ~0.6666 \| ~0.05 \| ~0.9 \| \| armv7 \| - \| ~0.75 \| - \| ~1.4 \| +-------+-----------+-----------+-------------+-------------+ Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 (one Intel Haswell, other AMD Ryzen) based machines. Size numbers have been collected by looking at the size of function containing an overflowing multiply in a loop. All in all, it can be seen that both performance and size has improved except in the case of armv7 where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seem to benefit from this change a lot, taking only 5% of the time compared to original algorithm to calculate the same thing. The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible. Patch by Simonas Kazlauskas! Differential Revision: https://reviews.llvm.org/D50310 llvm-svn: 339922	2018-08-16 18:39:39 +00:00
Sam Parker	0d51197051	[ARM] Ignore GEPs in ARMCodeGenPrepare While searching through the use-def tree, ignore GetElementPtrInst instructions because they don't need promoting and neither do their indices. Otherwise, the wide indices prevent the transformation from happening. Differential Revision: https://reviews.llvm.org/D50762 llvm-svn: 339871	2018-08-16 12:24:40 +00:00
Sam Parker	0e2f0bd48e	[ARM] Allow zext in ARMCodeGenPrepare Treat zext instructions as roots, like we do for truncs. Differential Revision: https://reviews.llvm.org/D50759 llvm-svn: 339868	2018-08-16 11:54:09 +00:00
Sam Parker	13567dbbd8	[ARM] Allow signed icmps in ARMCodeGenPrepare Originally committed in r339755 which was reverted in r339806 due to an asan issue. The issue was caused by my assumption that operands to a CallInst mapped to the FunctionType Params. CallInsts are now handled by iterating over their ArgOperands instead of Operands. Original Message: Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339858	2018-08-16 10:05:39 +00:00
Vitaly Buka	ed4239f482	Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare" use-after-poison in check-llvm under asan This reverts commit r339755. llvm-svn: 339806	2018-08-15 20:09:35 +00:00
Sam Parker	fabf7fe5f8	[ARM] TypeSize lower bound for ARMCodeGenPrepare We only try to promote types with are smaller than 16-bits, but we also need to check that the type is not less than 8-bits. Differential Revision: https://reviews.llvm.org/D50769 llvm-svn: 339770	2018-08-15 13:29:50 +00:00
Sam Parker	6548cd3905	[ARM] Allow signed icmps in ARMCodeGenPrepare Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339755	2018-08-15 08:23:03 +00:00
Sam Parker	7def86bbdb	[ARM] Allow pointer values in ARMCodeGenPrepare Add pointers to the list of allowed types, but don't try to promote them. Also fixed a bug with the promotion of undef values, so a new value is now created instead of mutating in place. We also now only promote if there's an instruction in the use-def chains other than the icmp, sinks and sources. Differential Revision: https://reviews.llvm.org/D50054 llvm-svn: 339754	2018-08-15 07:52:35 +00:00
Luke Geeson	4ce41d2bb7	[ARM] Added FP16 VREV Vector Instrinsic CodeGen support llvm-svn: 339546	2018-08-13 08:37:41 +00:00
Eli Friedman	e1687a89e8	[ARM] Adjust AND immediates to make them cheaper to select. LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics. Some potential improvements outlined in ARMTargetLowering::targetShrinkDemandedConstant, but seems to work pretty well already. The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.) According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate. Differential Revision: https://reviews.llvm.org/D50030 llvm-svn: 339472	2018-08-10 21:21:53 +00:00
Sam Parker	8c4b964c5a	[ARM] Disallow zexts in ARMCodeGenPrepare Enabling ARMCodeGenPrepare by default caused a whole load of failures. This is due to zexts and truncs not being handled properly. ZExts are messy so it's just easier to disable for now and truncs are allowed only as 'sinks'. I still need to figure out why allowing them as 'sources' causes so many failures. The other main changes are that we are explicit in the types that we converting to, it's now always 'TypeSize'. Type support is also now performed while checking for valid opcodes as it unnecessarily complicated having the checks are different stages. I've moved the tests around too, so we have the zext and truncs in their own file as well as the overflowing opcode tests. Differential Revision: https://reviews.llvm.org/D50518 llvm-svn: 339432	2018-08-10 13:57:13 +00:00
Evandro Menezes	9a92fe0c9e	[ARM] Replace processor check with feature Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a processor check. Otherwise, NFC. llvm-svn: 339354	2018-08-09 16:13:24 +00:00
Sjoerd Meijer	806f70d229	[ARM] FP16: codegen support for VTRN Differential Revision: https://reviews.llvm.org/D50454 llvm-svn: 339340	2018-08-09 12:45:09 +00:00
Petr Hosek	7b27454477	[ADT] Normalize empty triple components LLVM triple normalization is handling "unknown" and empty components differently; for example given "x86_64-unknown-linux-gnu" and "x86_64-linux-gnu" which should be equivalent, triple normalization returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's config.sub returns "x86_64-unknown-linux-gnu" for both "x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the triple normalization to behave the same way, replacing empty triple components with "unknown". This addresses PR37129. Differential Revision: https://reviews.llvm.org/D50219 llvm-svn: 339294	2018-08-08 22:23:57 +00:00
Eli Friedman	5b45a39056	[ARM] Avoid spilling lr with Thumb1 tail calls. Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we can fold the "bx lr" into the "pop". However, if there are tail calls involved, restoring lr is expensive, so skip the optimization in that case. The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.) Differential Revision: https://reviews.llvm.org/D49459 llvm-svn: 339283	2018-08-08 20:03:10 +00:00
Ties Stuij	0244aa67d6	revert tests of '[CodeGen] emit inline asm clobber list warnings for reserved' llvm-svn: 339276	2018-08-08 17:19:32 +00:00
Ties Stuij	52f3631f4b	[CodeGen] emit inline asm clobber list warnings for reserved Summary: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: ``` extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } ``` Compiled for thumb, this gives: ``` $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... ``` r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: ``` repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ ``` Reviewers: eli.friedman, olista01, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49727 llvm-svn: 339257	2018-08-08 15:15:59 +00:00
Sjoerd Meijer	1919ecfd0b	[ARM][NFC] Replaced tab-characters in test file vtrn.ll llvm-svn: 339251	2018-08-08 14:42:11 +00:00
Sjoerd Meijer	f8c394f0f5	[ARM] FP16: codegen support for VEXT Differential Revision: https://reviews.llvm.org/D50427 llvm-svn: 339241	2018-08-08 13:26:38 +00:00
Sjoerd Meijer	db5908deb9	[ARM] FP16: vector vmov and vdup support This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50329 llvm-svn: 339238	2018-08-08 13:11:31 +00:00
Sjoerd Meijer	920a453485	[ARM] FP16: vector VMUL variants This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50326 llvm-svn: 339232	2018-08-08 10:27:34 +00:00
Sjoerd Meijer	b33a4c02cc	[ARM] FP16: support vector INT_TO_FP and FP_TO_INT This adds codegen support for the different vcvt_f16 variants. Differential Revision: https://reviews.llvm.org/D50393 llvm-svn: 339227	2018-08-08 09:45:34 +00:00
Thomas Preud'homme	4107b31df2	Support inline asm with multiple 64bit output in 32bit GPR Summary: Extend fix for PR34170 to support inline assembly with multiple output operands that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR as in the PR). Reviewers: bogner, t.p.northover, lattner, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, tra, eraman, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45437 llvm-svn: 339225	2018-08-08 09:35:26 +00:00
Sjoerd Meijer	b264944ed5	[ARM] FP16: support the vector vmin and vmax variants Differential Revision: https://reviews.llvm.org/D50238 llvm-svn: 339221	2018-08-08 07:20:15 +00:00
Sjoerd Meijer	b39cd886b9	[ARM] FP16: codegen support for VACGT Differential Revision: https://reviews.llvm.org/D50236 llvm-svn: 339148	2018-08-07 15:11:47 +00:00
Sjoerd Meijer	a2ddddfd3e	[ARM][NFC] Replaced tab characters in test file vfcmp.ll. llvm-svn: 339111	2018-08-07 08:05:15 +00:00
Sjoerd Meijer	d62c5ec2fe	[ARM] FP16: support vector zip and unzip This is addressing PR38404. Differential Revision: https://reviews.llvm.org/D50186 llvm-svn: 338835	2018-08-03 09:24:29 +00:00
Sjoerd Meijer	9b30213828	[ARM] FP16: support VFMA This is addressing PR38404. llvm-svn: 338830	2018-08-03 09:12:56 +00:00
Sjoerd Meijer	8e7fab0443	[ARM][NFC] Follow up of r338568 I disabled more tests than necessary, this enables them. llvm-svn: 338717	2018-08-02 14:04:48 +00:00
Alexander Ivchenko	49168f6778	[GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value This is logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685	2018-08-02 08:33:31 +00:00
Sjoerd Meijer	590e4e8dde	[ARM] Armv8.2-A FP16 vector intrinsics tests Clang support for the Armv8.2-A FP16 vector intrinsic was committed in rC328277, but this was never followed up, i.e. the LLVM part is missing. I've raised PR38404, and this is the first step to address this. I.e., this adds tests for the Armv8.2-A FP16 vector intrinsic, and thus shows which intrinsics already work, and which need further work. Differential Revision: https://reviews.llvm.org/D50142 llvm-svn: 338568	2018-08-01 14:43:59 +00:00
Reid Kleckner	b32ff46ff7	Revert r338354 "[ARM] Revert r337821" Disable ARMCodeGenPrepare by default again. It is causing verifier failues in V8 that look like: Duplicate integer as switch case switch i32 %trunc, label %if.end13 [ i32 0, label %cleanup36 i32 0, label %if.then8 ], !dbg !4981 i32 0 fatal error: error in backend: Broken function found, compilation aborted! I will continue reducing the test case and send it along. llvm-svn: 338452	2018-07-31 23:09:42 +00:00
Sam Parker	2a6c842fda	[ARM] Revert r337821 Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing. llvm-svn: 338354	2018-07-31 09:04:14 +00:00
Thomas Preud'homme	196149c943	Reapply "Fix crash on inline asm with 64bit matching input in 32bit GPR" This reapplies commit r338206 reverted by r338214 since the bug that r338206 uncovered has been fixed in r338268. Add support for inline assembly with matching input operand that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR). Note that regular input is already handled by existing code. llvm-svn: 338269	2018-07-30 16:48:39 +00:00
Thomas Preud'homme	6c1b075299	Fix uninitialized read in ARM's PrintAsmOperand Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206. Reviewers: t.p.northover, weimingz, javed.absar, chill Reviewed By: chill Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49984 llvm-svn: 338268	2018-07-30 16:45:40 +00:00
Petr Pavlu	8b6eff4e77	[ARM] Fix over-alignment in arguments that are HA of 128-bit vectors Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS. Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element. The patch fixes the problem by updating the alignment with the item size only if this results in reducing it. Differential Revision: https://reviews.llvm.org/D49720 llvm-svn: 338233	2018-07-30 08:49:30 +00:00
Sanjay Patel	7312206f2f	revert r338206 because the test does not pass Example of bot failure: http://lab.llvm.org:8011/builders/clang-cmake-armv8-quick/builds/5107/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Ainline-asm-operand-implicit-cast.ll llvm-svn: 338214	2018-07-29 14:30:49 +00:00
Thomas Preud'homme	74ffd14e15	Fix crash on inline asm with 64bit matching input in 32bit GPR Add support for inline assembly with matching input operand that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR). Note that regular input is already handled by existing code. llvm-svn: 338206	2018-07-28 21:33:39 +00:00
Craig Topper	50b1d4303d	[DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B) This can be useful since addition is commutable, and subtraction is not. This matches a transform that is also done by InstCombine. llvm-svn: 338181	2018-07-28 00:27:25 +00:00
Evandro Menezes	fcca45f0dd	[ARM] Add new target feature to fuse literal generation This feature enables the fusion of such operations on Cortex A57 and Cortex A72, as recommended in their Software Optimisation Guides, sections 4.14 and 4.11, respectively. Differential revision: https://reviews.llvm.org/D49563 llvm-svn: 338147	2018-07-27 18:16:47 +00:00
Thomas Preud'homme	768d6ce4a3	Fix PR34170: Crash on inline asm with 64bit output in 32bit GPR Add support for inline assembly with output operand that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR as in the PR). llvm-svn: 337903	2018-07-25 11:11:12 +00:00
Sam Parker	8b93e82c3d	[ARM] Disable ARMCodeGenPrepare by default ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug. llvm-svn: 337821	2018-07-24 12:04:23 +00:00
Sam Parker	3828c6ff94	[ARM] ARMCodeGenPrepare backend pass Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687	2018-07-23 12:27:47 +00:00
Evandro Menezes	fffa9b5897	[ARM] Add new feature to enable optimizing the VFP registers Enable the optimization of operations on DPR and SPR via a feature instead of checking the target. Differential revision: https://reviews.llvm.org/D49463 llvm-svn: 337575	2018-07-20 16:49:28 +00:00
Tim Northover	a7c18ee8c4	ARM: switch armv7em MachO triple to hard-float defaults and libcalls. We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. Recommitted without breaking ELF this time. llvm-svn: 337450	2018-07-19 12:44:51 +00:00
Tim Northover	da142d10d9	Revert "ARM: switch armv7em triple to hard-float defaults and libcalls." This reverts commit r337385 until it can be targeted at MachO only. llvm-svn: 337424	2018-07-18 21:32:49 +00:00
Tim Northover	d4abd14c1b	ARM: switch armv7em triple to hard-float defaults and libcalls. We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. llvm-svn: 337385	2018-07-18 12:37:04 +00:00
Simon Pilgrim	e4d12bb2d6	[DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops. Differential Revision: https://reviews.llvm.org/D49262 llvm-svn: 337258	2018-07-17 09:45:35 +00:00
Simon Pilgrim	a39389ebff	[ARM] Regenerated arg endian test As requested on D49262 llvm-svn: 336980	2018-07-13 09:16:56 +00:00
Joel E. Denny	9fa9c9368d	[FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the dos line endings there prevent me from commiting via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843	2018-07-11 20:25:49 +00:00
Sjoerd Meijer	53449daa96	[ARM] ParallelDSP: multiple reduction stmts in loop This fixes an issue that we were not properly supporting multiple reduction stmts in a loop, and not generating SMLADs for these cases. The alias analysis checks were done too early, making it too conservative. Differential revision: https://reviews.llvm.org/D49125 llvm-svn: 336795	2018-07-11 12:36:25 +00:00
Eli Friedman	d2c739230c	[ARM] Treat cmn immediates as legal in isLegalICmpImmediate. The original code attempted to do this, but the std::abs() call didn't actually do anything due to implicit type conversions. Fix the type conversions, and perform the correct check for negative immediates. This probably has very little practical impact, but it's worth fixing just to avoid confusion in the future, I think. Differential Revision: https://reviews.llvm.org/D48907 llvm-svn: 336742	2018-07-10 23:44:37 +00:00
Nico Weber	038dbf3c24	Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084. llvm-svn: 336453	2018-07-06 17:37:24 +00:00
Sjoerd Meijer	b3e06faa28	[ARM] ParallelDSP: added statistics, NFC. Added statistics for the number of SMLAD instructions created, and als renamed the pass name to -arm-parallel-dsp. Differential Revision: https://reviews.llvm.org/D48971 llvm-svn: 336441	2018-07-06 14:47:09 +00:00
Diogo N. Sampaio	81e9dd1ed7	Commit rL336426 cause buildbot failures http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50537/testReport/junit/LLVM/CodeGen_AArch64/FoldRedundantShiftedMasking_ll/ This removes the comments of the function label causing this error. llvm-svn: 336440	2018-07-06 14:41:09 +00:00
Diogo N. Sampaio	742bf1a255	[SelectionDAG] https://reviews.llvm.org/D48278 D48278 Allow to reduce redundant shift masks. For example: x1 = x & 0xAB00 x2 = (x >> 8) & 0xAB can be reduced to: x1 = x & 0xAB00 x2 = x1 >> 8 It only allows folding when the masks and shift values are constants. llvm-svn: 336426	2018-07-06 09:42:25 +00:00
Ivan A. Kosarev	466037900c	[NEON] Fix combining of vldx_dup intrinsics with updating of base addresses Resolves: Unsupported ARM Neon intrinsics in Target-specific DAG combine function for VLDDUP https://bugs.llvm.org/show_bug.cgi?id=38031 Related diff: D48439 Differential Revision: https://reviews.llvm.org/D48920 llvm-svn: 336325	2018-07-05 08:59:49 +00:00
Mikael Holmen	8505f34b29	Partial revert of "NFC - Various typo fixes in tests" This partially reverts r336268 since it causes buildbot failures. Added FIXME at the places where the CHECKs are misspelled. llvm-svn: 336323	2018-07-05 08:42:16 +00:00
Sjoerd Meijer	27be58b307	[ARM] ParallelDSP: only support i16 loads for now We were miscompiling i8 loads, so reject them as unsupported narrow operations for now. Differential Revision: https://reviews.llvm.org/D48944 llvm-svn: 336319	2018-07-05 08:21:40 +00:00
Gabor Buella	da4a966e1c	NFC - Various typo fixes in tests llvm-svn: 336268	2018-07-04 13:28:39 +00:00
Vadzim Dambrouski	fd10286e04	[ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m. Reviewers: efriedma, rogfer01, javed.absar Reviewed By: efriedma, rogfer01 Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D48846 llvm-svn: 336144	2018-07-02 21:05:26 +00:00
Sjoerd Meijer	c89ca5582a	[ARM] Parallel DSP Pass Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of this pass is to do some straightforward IR pattern matching to create ACLE DSP intrinsics, which map on these 32-bit SIMD operations. Currently, only the SMLAD instruction gets recognised. This instruction performs two multiplications with 16-bit operands, and stores the result in an accumulator. We will follow this up with patches to recognise SMLAD in more cases, and also to generate other DSP instructions (like e.g. SADD16). Patch by: Sam Parker and Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D48128 llvm-svn: 335850	2018-06-28 12:55:29 +00:00
Ivan A. Kosarev	7231598fce	[NEON] Support vldNq intrinsics in AArch32 (LLVM part) This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example. Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagating it other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes. The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction. This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions. Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as Arch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct, if that is wrong. That's what we generate with this patch applied: vld2q_dup_f16: vld2.16 {d0[], d2[]}, [r0] vld2.16 {d1[], d3[]}, [r0] vld3q_dup_f16: vld3.16 {d0[], d2[], d4[]}, [r0] vld3.16 {d1[], d3[], d5[]}, [r0] vld4q_dup_f16: vld4.16 {d0[], d2[], d4[], d6[]}, [r0] vld4.16 {d1[], d3[], d5[], d7[]}, [r0] Differential Revision: https://reviews.llvm.org/D48439 llvm-svn: 335733	2018-06-27 13:57:52 +00:00
Than McIntosh	3190993a02	[X86,ARM] Retain split-stack prolog check for sibling calls Summary: If a routine with no stack frame makes a sibling call, we need to preserve the stack space check even if the local stack frame is empty, since the call target could be a "no-split" function (in which case the linker needs to be able to fix up the prolog sequence in order to switch to a larger stack). This fixes PR37807. Reviewers: cherry, javed.absar Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D48444 llvm-svn: 335604	2018-06-26 14:11:30 +00:00
George Rimar	dcf59c5480	Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text" With compilation fix. Original commit message: D39788 added a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. This change does following two things on top: 1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to eliminate them fast during resolving the COMDATs. 2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text. With that linker will be able to do -gc-sections on dead stack sizes sections. Differential revision: https://reviews.llvm.org/D46874 llvm-svn: 335336	2018-06-22 10:53:47 +00:00
George Rimar	6d448da1be	Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text" It broke bots. http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891 http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443 http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551 llvm-svn: 335333	2018-06-22 10:27:33 +00:00
George Rimar	e14485a0c6	[MC] - Add .stack_size sections into groups and link them with .text D39788 added a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. This change does following two things on top: 1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to eliminate them fast during resolving the COMDATs. 2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text. With that linker will be able to do -gc-sections on dead stack sizes sections. Differential revision: https://reviews.llvm.org/D46874 llvm-svn: 335332	2018-06-22 10:10:53 +00:00
Sjoerd Meijer	1043dffbd3	Recommit of r335326, with the test fixed that I missed. llvm-svn: 335331	2018-06-22 10:03:03 +00:00
Sjoerd Meijer	7ee5b090de	Reverting r335326 while I look at the test failure llvm-svn: 335328	2018-06-22 09:17:08 +00:00
Sjoerd Meijer	8d2f1565b7	[ARM] ARMv6m and v8m.baseline strict align This sets target feature FeatureStrictAlign for Armv6-m and Armv8-m.baseline, because it has no support for unaligned accesses. It looks like we always pass target feature "+strict-align" from Clang, so this is not a user facing problem, but querying the subtarget (in e.g. llc) for unaligned access support is incorrect. Differential Revision: https://reviews.llvm.org/D48437 llvm-svn: 335326	2018-06-22 08:48:13 +00:00
David Green	21a2973cc4	[ARM] Enable useAA() for the in-order Cortex-R52 This option allows codegen (such as DAGCombine or MI scheduling) to use alias analysis information, which can help with the codegen on in-order cpu's, especially machine scheduling. Here I have done things the same way as AArch64, adding a subtarget feature to enable this for specific cores, and enabled it for the R52 where we have a schedule to make use of it. Differential Revision: https://reviews.llvm.org/D48074 llvm-svn: 335249	2018-06-21 15:48:29 +00:00
Sam Parker	ad71333b92	[NFC][ARM] ldrd/strd negative tests Add negative tests for load and stores of alignment 2. llvm-svn: 335241	2018-06-21 14:53:06 +00:00
David Green	a465188500	[DAGCombine] Fix alignment for offset loads/stores The alignment parameter to getExtLoad is treated as a base alignment, not the alignment of the load (base + offset). When we infer a better alignment for a Ptr we need to ensure that it applies to the base to prevent the alignment on the load from being wrong. This fixes a bug where the alignment could then be used to incorrectly prove noalias between a load and a store, leading to a miscompile. Differential Revision: https://reviews.llvm.org/D48029 llvm-svn: 335210	2018-06-21 08:30:07 +00:00
Alina Sbirlea	dfd14adeb0	Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183	2018-06-20 22:01:04 +00:00
Tim Northover	644a819534	ARM: convert ORR instructions to ADD where possible on Thumb. Thumb has more 16-bit encoding space dedicated to ADD than ORR, allowing both a 3-address encoding and a wider range of immediates. So, particularly when optimizing for code size (but it doesn't make things worse elsewhere) it's beneficial to select an OR operation to an ADD if we know overflow won't occur. This is made even better by LLVM's penchant for putting operations in canonical form by converting the other way. llvm-svn: 335119	2018-06-20 12:09:44 +00:00
Eli Friedman	e6b4719244	[ARM] Add Thumb1 coverage for cmn testcases. There's a missed optimization for immediates: we can save two instructions by using adds instead of movs+mvns+cmp. llvm-svn: 335002	2018-06-19 00:09:44 +00:00
Eli Friedman	801c2f4c3a	[ARM] Testcase for missed optimization with i16 compare. The result looks weird because the DAG actually has an explicit shift; I haven't figured out why, exactly. llvm-svn: 335000	2018-06-19 00:07:30 +00:00
Michael Berg	0c20447a02	easing the constraint for isNegatibleForFree and GetNegatedExpression Summary: Here we relax the old constraint which utilized unsafe with the TargetOption flag HonorSignDependentRoundingFPMathOption, with the assertion that unsafe is no longer needed or never was required for correctness on FDIV/FMUL. Reviewers: spatel, hfinkel, wristow, arsenm, javed.absar Reviewed By: spatel Subscribers: efriedma, wdng, tpr Differential Revision: https://reviews.llvm.org/D48057 llvm-svn: 334769	2018-06-14 20:54:13 +00:00
Matt Arsenault	5615fa0a87	DAG: Fix extract_subvector combine for a single element This would fail before because 1x vectors aren't legal, so instead just use the scalar type. Avoids regressions in a future AMDGPU commit to add v4i16/v4f16 as legal types. Test update is just the one test that this triggers on in tree now. It wasn't checking anything before. The result is completely changed since the selects are eliminated. Not sure if it's considered better or not. llvm-svn: 334440	2018-06-11 21:27:41 +00:00
Ivan A. Kosarev	847daa11f8	[NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47447 llvm-svn: 334361	2018-06-10 09:27:27 +00:00
Eli Friedman	864df22307	[ARM] Allow CMPZ transforms even if the input has multiple uses. It looks like this got left in by accident in r289794; I can't think of any reason this check would be necessary. (Maybe it was meant to be a check that the AND has one use? But we check that a few lines earlier.) Differential Revision: https://reviews.llvm.org/D47921 llvm-svn: 334322	2018-06-08 21:16:56 +00:00
Evandro Menezes	b2c8244715	[AArch64, ARM] Add support for Samsung Exynos M4 Create a separate feature set for Exynos M4 and add test cases. llvm-svn: 334115	2018-06-06 18:56:00 +00:00
David Green	25312b2b6c	[GlobalMerge] Set the alignment on merged global structs If no alignment is set, the abi/preferred alignment of structs will be used which may be higher than required. This can lead to extra padding and in the end an increase in data size. Differential Revision: https://reviews.llvm.org/D47633 llvm-svn: 334099	2018-06-06 14:48:32 +00:00
Peter Smith	57f661bd7d	[MC] Pass MCSubtargetInfo to fixupNeedsRelaxation and applyFixup On targets like Arm some relaxations may only be performed when certain architectural features are available. As functions can be compiled with differing levels of architectural support we must make a judgement on whether we can relax based on the MCSubtargetInfo for the function. This change passes through the MCSubtargetInfo for the function to fixupNeedsRelaxation so that the decision on whether to relax can be made per function. In this patch, only the ARM backend makes use of this information. We must also pass the MCSubtargetInfo to applyFixup because some fixups skip error checking on the assumption that relaxation has occurred, to prevent code-generation errors applyFixup must see the same MCSubtargetInfo as fixupNeedsRelaxation. Differential Revision: https://reviews.llvm.org/D44928 llvm-svn: 334078	2018-06-06 09:40:06 +00:00
Ivan A. Kosarev	60a991ed1a	[NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47120 llvm-svn: 333825	2018-06-02 16:40:03 +00:00
Ivan A. Kosarev	73c5337a64	Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)" The LLVM part was committed instead of the Clang part. Differential Revision: https://reviews.llvm.org/D47121 llvm-svn: 333824	2018-06-02 16:38:38 +00:00
Ivan A. Kosarev	51f19b9ee1	[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part) We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47121 llvm-svn: 333819	2018-06-02 16:26:42 +00:00
Eli Friedman	63fead0f43	[ARM] Enable SETCCCARRY lowering for Thumb1. We've had Thumb1 support for ARMISD::SUBE for a while now, so this just works. Reduces codesize a bit for 64-bit integer comparisons. Differential Revision: https://reviews.llvm.org/D47387 llvm-svn: 333445	2018-05-29 18:17:16 +00:00
Tim Northover	4e3eec39fa	ARM: be conservative when asked load/store alignment of weird type. Chances are we'll be asked again after type legalization, but before that point it's better to claim misaligned accesses aren't allowed than to assert. llvm-svn: 332840	2018-05-21 12:43:54 +00:00
Haicheng Wu	69ba0613f2	[GlobalMerge] Exit early if only one global is to be merged To save some compilation time and prevent some unnecessary changes. Differential Revision: https://reviews.llvm.org/D46640 llvm-svn: 332813	2018-05-19 18:00:02 +00:00
Sanjay Patel	5c48b73fff	[ARM] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in https://reviews.llvm.org/D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332638	2018-05-17 18:09:56 +00:00
Sanjay Patel	38892e5ce1	[ARM] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in https://reviews.llvm.org/D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. Follow-up to: https://reviews.llvm.org/rL332538 ...because that change wasn't enough. llvm-svn: 332637	2018-05-17 18:08:27 +00:00
Sanjay Patel	2c1846de2d	[ARM] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332539	2018-05-16 22:20:33 +00:00
Sanjay Patel	ce20ac0bc5	[ARM] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332538	2018-05-16 22:20:26 +00:00
Sanjay Patel	066309f4ed	[ARM] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332537	2018-05-16 22:20:11 +00:00
Sanjay Patel	6dcda28ddc	[ARM] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332533	2018-05-16 21:57:19 +00:00
Sanjay Patel	82eca55953	[ARM] preserve test intent by removing undef We need to clean up the DAG floating-point undef logic. This process is similar to how we handled integer undef logic in D43141. And as we did there, I'm trying to reduce the patch by changing tests that would probably become meaningless once we correct FP undef folding. llvm-svn: 332532	2018-05-16 21:57:00 +00:00
Sirish Pande	cabe50a308	[AArch64] Gangup loads and stores for pairing. Keep loads and stores together (target defines how many loads and stores to gang up), such that it will help in pairing and vectorization. Differential Revision https://reviews.llvm.org/D46477 llvm-svn: 332482	2018-05-16 15:36:52 +00:00
Amara Emerson	0d6a26dffc	[GlobalISel][IRTranslator] Split aggregates during IR translation. We currently handle all aggregates by creating one large LLT, and letting the legalizer deal with splitting them up. However using this approach means that we can't support big endian code correctly. This patch changes the way that the IRTranslator deals with aggregate values, by splitting them up into their constituent element values. To do this, parts of the translator need to be modified to deal with multiple VRegs for a single Value. A new Value to VReg mapper is introduced to help keep compile time under control, currently there is no measurable impact on CTMark despite the extra code being generated in some cases. Patch is based on the original work of Tim Northover. Differential Revision: https://reviews.llvm.org/D46018 llvm-svn: 332449	2018-05-16 10:32:02 +00:00
Sanjay Patel	8652c53d29	[DAG] propagate FMF for all FPMathOperators This is a simple hack based on what's proposed in D37686, but we can extend it if needed in follow-ups. It gets us most of the FMF functionality that we want without adding any state bits to the flags. It also intentionally leaves out non-FMF flags (nsw, etc) to minimize the patch. It should provide a superset of the functionality from D46563 - the extra tests show propagation and codegen diffs for fcmp, vecreduce, and FP libcalls. The PPC log2() test shows the limits of this most basic approach - we only applied 'afn' to the last node created for the call. AFAIK, there aren't any libcall optimizations based on the flags currently, so that shouldn't make any difference. Differential Revision: https://reviews.llvm.org/D46854 llvm-svn: 332358	2018-05-15 14:16:24 +00:00
Martin Storsjo	ace7ae935f	[ARM] Back up R4 and LR if calling the stack probe function Differential Revision: https://reviews.llvm.org/D46777 llvm-svn: 332298	2018-05-14 21:32:52 +00:00
Shiva Chen	2c864551df	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label. In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter. We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841	2018-05-09 02:40:45 +00:00
Daniel Sanders	acc008cb0c	[globalisel] Remove redundant -global-isel option from tests that use -run-pass. NFC As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the -global-isel option is redundant when -run-pass is given. -global-isel sets up the GlobalISel passes in the pass manager but -run-pass skips that entirely and configures it's own pipeline. llvm-svn: 331603	2018-05-05 21:19:59 +00:00
Tim Northover	28e0a6f7dd	ARM: don't try to over-align large vectors as arguments. By default LLVM thinks very large vectors get aligned to their size when passed across functions. Unfortunately no-one told the ARM backend so it doesn't trigger stack realignment and so accesses can cause the usual misalignment issues (e.g. a data abort). This changes the ABI alignment to the stack alignment, which in practice (and as a bonus) also coincides with the alignment "natural" vectors get. llvm-svn: 331451	2018-05-03 12:54:25 +00:00
Vedant Kumar	e23173b677	[DAGCombiner] Fix SDLoc in a (zext (zextload x)) combine (4/N) The logic for this combine is almost identical to the logic for a (sext (sextload x)) combine. This commit factors out the logic so it can be shared by both combines, and corrects the SDLoc assigned in the zext version of the combine. Prior to this patch, for the given test case, we would apply the location associated with the udiv instruction to instructions which perform the load. Part of: llvm.org/PR37262 llvm-svn: 331303	2018-05-01 19:51:15 +00:00
Vedant Kumar	d7117ed0f9	[DAGCombiner] Fix SDLoc in a (sext (sextload x)) combine (3/N) Prior to this patch, for the given test case, we would apply the location associated with the sdiv instruction to instructions which perform the load. Part of: llvm.org/PR37262. Differential Revision: https://reviews.llvm.org/D46222 llvm-svn: 331302	2018-05-01 19:51:15 +00:00
Vedant Kumar	ee4bfcaa5a	[DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N) Setting the right SDLoc on a newly-created zextload fixes a line table bug which resulted in non-linear stepping behavior. Several backend tests contained CHECK lines which relied on the IROrder inherited from the wrong SDLoc. This patch breaks that dependence where feasbile and regenerates test cases where not. In some cases, changing a node's IROrder may alter register allocation and spill behavior. This can affect performance. I have chosen not to prevent this by applying a "known good" IROrder to SDLocs, as this may hide a more general bug in the scheduler, or cause regressions on other test inputs. rdar://33755881, Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D45995 llvm-svn: 331300	2018-05-01 19:26:15 +00:00
Daniel Sanders	27fe8a5011	[globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand Summary: Currently only the memory size is supported but others can be added as needed. narrowScalar for G_LOAD and G_STORE now correctly update the MachineMemOperand and will refuse to legalize atomics since those need more careful expansions to maintain atomicity. Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar Reviewed By: aemerson Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45466 llvm-svn: 331071	2018-04-27 19:48:53 +00:00
Oliver Stannard	f3632143da	[ARM] Codegen for v8.2A dot product intrinsics This adds IR intrinsics for the ARM dot-product instructions introduced in v8.2-A. Differential revision: https://reviews.llvm.org/D46106 llvm-svn: 331032	2018-04-27 12:50:40 +00:00
David Green	c4cccea4c9	[ARM] Enable misched for R52. Back when the R52 schedule was added in rL286949, there was no way to enable machine schedules in ARM for specific cores. Since then a target feature has been added. This enables the feature for R52, removing the need to manually specify compiler flags. llvm-svn: 331027	2018-04-27 11:29:49 +00:00
Francis Visoiu Mistrih	57fcd3454a	[MIR] Add support for debug metadata for fixed stack objects Debug var, expr and loc were only supported for non-fixed stack objects. This patch adds the following fields to the "fixedStack:" entries, and renames the ones from "stack:" to: * debug-info-variable * debug-info-expression * debug-info-location Differential Revision: https://reviews.llvm.org/D46032 llvm-svn: 330859	2018-04-25 18:58:06 +00:00
Sanjay Patel	3d453ad711	[DAGCombine] (float)((int) f) --> ftrunc (PR36617) This was originally committed at rL328921 and reverted at rL329920 to investigate failures in Chrome. This time I've added to the ReleaseNotes to warn users of the potential of exposing UB and let me repeat that here for more exposure: Optimization of floating-point casts is improved. This may cause surprising results for code that is relying on undefined behavior. Code sanitizers can be used to detect affected patterns such as this: int main() { float x = 4294967296.0f; x = (float)((int)x); printf("junk in the ftrunc: %f\n", x); return 0; } $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of representable values of type 'int' junk in the ftrunc: 0.000000 Original commit message: fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 330437	2018-04-20 15:07:55 +00:00
Sjoerd Meijer	a79ea80d7b	[ARM] Add some missing FP16 VSEL test cases Differential Revision: https://reviews.llvm.org/D45724 llvm-svn: 330313	2018-04-19 08:21:50 +00:00
Tim Northover	271d3d2771	MachO: trap unreachable instructions Debugability is more important than saving 4 bytes to let us to fall through to nonense. llvm-svn: 330073	2018-04-13 22:25:20 +00:00
Sjoerd Meijer	834f7dc7ab	[ARM] FP16 vmaxnm/vminnm scalar instructions This adds code generation support for the FP16 vmaxnm/vminnm scalar instructions. Differential Revision: https://reviews.llvm.org/D44675 llvm-svn: 330034	2018-04-13 15:34:26 +00:00
Ivan A. Kosarev	f533a6e5aa	[NEON] Support intrinsic for scalar and vector versions of the VRINTN instruction Differential Revision: https://reviews.llvm.org/D45514 llvm-svn: 330011	2018-04-13 12:45:12 +00:00
Sanjay Patel	5ace2b765a	revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617) This change is exposing UB in source code - as was warned/predicted. :) See D44909 for discussion. Reverting while we figure out how to fix things. llvm-svn: 329920	2018-04-12 15:27:01 +00:00
Reid Kleckner	0828699488	[FastISel] Disable local value sinking by default This is causing compilation timeouts on code with long sequences of local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out that code coverage instrumentation is a great way to create sequences like this, which how our users ran into the issue in practice. Intel has a tool that detects these kinds of non-linear compile time issues, and Andy Kaylor reported it as PR37010. The current sinking code scans the whole basic block once per local value sink, which happens before emitting each call. In theory, local values should only be introduced to be used by instructions between the current flush point and the last flush point, so we should only need to scan those instructions. llvm-svn: 329822	2018-04-11 16:03:07 +00:00
Sjoerd Meijer	ac96d7c4b3	[ARM] FP16 VSEL codegen This is a follow up of rL327695 to instruction select more variants of VSELGT and VSELGE, for which it is necessary to custom lower SELECT. More work is required in this area, which will be addressed soon: - more variants need to be regression tested, but this depends on the next point. - first LowerConstantFP need to be adjusted for fp16 values. Differential Revision: https://reviews.llvm.org/D45205 llvm-svn: 329788	2018-04-11 09:28:04 +00:00
Krzysztof Parzyszek	71a4c0ca07	[CodeGen] Fix printing bundles in MIR output Delay printing the newline until after the opening bracket was printed, e.g. BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 { renamable $r1 = S2_asr_i_r renamable $r1, 1 renamable $r21 = A2_tfrsi 0 } instead of BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 { renamable $r1 = S2_asr_i_r renamable $r1, 1 renamable $r21 = A2_tfrsi 0 } llvm-svn: 329719	2018-04-10 16:46:13 +00:00
Sam Parker	1f4f4d9a08	[DAGCombine] Improve ReduceLoad for SRL Recommitting r329283, third time lucky... If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 329551	2018-04-09 08:16:11 +00:00
Guozhi Wei	0eb86c8efc	[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst)) In our real world application, we found the following optimization is missed in DAGCombiner (zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst)) If the user of original zext is an add, it may enable further lea optimization on x86. This patch add a new function CombineZExtLogicopShiftLoad to do this optimization. Differential Revision: https://reviews.llvm.org/D44402 llvm-svn: 329516	2018-04-07 23:36:10 +00:00
Craig Topper	5b95eae1c3	[DAGCombiner] Add a combine to turn a build vector of zero extends of extract vector elts into a vector zero extend and possibly an extract subvector. llvm-svn: 329509	2018-04-07 19:09:50 +00:00
Tim Northover	e25e458d52	Reapply ARM: Do not spill CSR to stack on entry to noreturn functions Should fix UBSan bot by also checking there's no "uwtable" attribute before skipping. Otherwise the unwind table will be useless since its moves expect CSRs to actually be preserved. A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch mostly by myeisha (pmb). llvm-svn: 329494	2018-04-07 10:57:03 +00:00
Vitaly Buka	de5f196530	Revert "ARM: Do not spill CSR to stack on entry to noreturn functions" Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM This reverts commit r329287 llvm-svn: 329486	2018-04-07 05:36:44 +00:00
Tim Northover	b30388bf11	ARM: Do not spill CSR to stack on entry to noreturn functions A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch by myeisha (pmb). llvm-svn: 329287	2018-04-05 14:26:06 +00:00
Sam Parker	0e7deb8104	[DAGCombine] Revert r329160 Again, broke the big endian stage 2 builders. llvm-svn: 329283	2018-04-05 13:46:17 +00:00
Sam Parker	7ec722d603	[DAGCombine] Improve ReduceLoadWidth for SRL Recommitting rL321259. Previosuly this caused an issue with PPCBE but I didn't receieve a reproducer and didn't have the time to follow up. If the issue appears again, please provide a reproducer so I can fix it. Original commit message: If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 329160	2018-04-04 09:26:56 +00:00
Sanjay Patel	6124cae8f7	[DAGCombine] (float)((int) f) --> ftrunc (PR36617) fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 328921	2018-03-31 17:55:44 +00:00
Christof Douma	a1e77c0e02	[ARM] Support float literals under XO Follow up patch of r328313 to support the UseVMOVSR constraint. Removed some unneeded instructions from the test and removed some stray comments. Differential Revision: https://reviews.llvm.org/D44941 llvm-svn: 328691	2018-03-28 10:02:26 +00:00
Artur Pilipenko	ca1d849cd6	Fix a reoccuring typo in load-combine tests %tmp = bitcast i32* %arg to i8* %tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0 - %tmp2 = load i8, i8* %tmp, align 1 + %tmp2 = load i8, i8* %tmp1, align 1 This doesn't change the semantics of the tests but makes use of %tmp1 which was originally intended. llvm-svn: 328642	2018-03-27 17:33:50 +00:00
Krzysztof Parzyszek	52396bb9c5	Use .set instead of = when printing assignment in assembly output On Hexagon "x = y" is a syntax used in most instructions, and is not treated as a directive. Differential Revision: https://reviews.llvm.org/D44256 llvm-svn: 328635	2018-03-27 16:44:41 +00:00
Rafael Espindola	78fdca3cd5	Use local symbols for creating .stack-size. llvm-svn: 328581	2018-03-26 20:40:22 +00:00
Christof Douma	4a025cc79d	[ARM] Support float literals under XO When targeting execute-only and fp-armv8, float constants in a compare resulted in instruction selection failures. This is now fixed by using vmov.f32 where possible, otherwise the floating point constant is lowered into a integer constant that is moved into a floating point register. This patch also restores using fpcmp with immediate 0 under fp-armv8. Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443 llvm-svn: 328313	2018-03-23 13:02:03 +00:00
Rafael Espindola	d947977e87	Run dos2unix on a test. NFC. llvm-svn: 327934	2018-03-20 01:06:29 +00:00
Aaron Smith	6738960588	[SelectionDAG] Transfer DbgValues when integer operations are promoted Summary: DbgValue nodes were not transferred when integer DAG nodes were promoted. For example, if an i32 add node was promoted to an i64 add node by DAGTypeLegalizer::PromoteIntegerResult(), its DbgValue node was not transferred to the new node. The simple fix is to update SetPromotedInteger() to transfer DbgValues. Add AArch64/dbg-value-i8.ll to test this change and fix ARM/debug-info-d16-reg.ll which had the wrong DILocalVariable nodes with arg numbers even though they are not for function parameters. Patch by Se Jong Oh! Reviewers: vsk, JDevlieghere, aprantl Reviewed By: JDevlieghere Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D44546 llvm-svn: 327919	2018-03-19 22:58:50 +00:00
Martin Storsjo	9a55c1b0dc	[ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes This extends the use of this attribute on ARM and AArch64 from SVN r325900 (where it was only checked for fixed stack allocations on ARM/AArch64, but for all stack allocations on X86). This also adds a testcase for the existing use of disabling the fixed stack probe with the attribute on ARM and AArch64. Differential Revision: https://reviews.llvm.org/D44291 llvm-svn: 327897	2018-03-19 20:06:50 +00:00
Sjoerd Meijer	d16037d9bb	[ARM] Support for v4f16 and v8f16 vectors This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which uses v4f16 and v8f16 vector operands and return values. All the moving parts are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16 intrinsic. In a follow-up patch the rest of the intrinsics and tests will be added. Differential Revision: https://reviews.llvm.org/D44538 llvm-svn: 327839	2018-03-19 13:35:25 +00:00
Sjoerd Meijer	d391a1a985	[ARM] FP16 codegen support for VSEL This implements lowering of SELECT_CC for f16s, which enables codegen of VSEL with f16 types. Differential Revision: https://reviews.llvm.org/D44518 llvm-svn: 327695	2018-03-16 08:06:25 +00:00
Craig Topper	1b8cf49704	[SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job picking the result type for the setcc. Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type. But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. If for example the result type was promoted to v8i16 from v8i1, but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86. This legality check also caused us reject the getSetccResultType when the input type needed to be widened or split. Even though that result wouldn't have caused legalization to get stuck. This patch tries to fix this by detecting the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try a ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to default type to promote to. For any other illegal values we might get back from the initial call to getSetccResultType we just keep and allow it to be re-legalized later via splitting or widening or scalarizing. llvm-svn: 327683	2018-03-15 23:04:11 +00:00
Reid Kleckner	3a7a2e4a0a	[FastISel] Sink local value materializations to first use Summary: Local values are constants, global addresses, and stack addresses that can't be folded into the instruction that uses them. For example, when storing the address of a global variable into memory, we need to materialize that address into a register. FastISel doesn't want to materialize any given local value more than once, so it generates all local value materialization code at EmitStartPt, which always dominates the current insertion point. This allows it to maintain a map of local value registers, and it knows that the local value area will always dominate the current insertion point. The downside is that local value instructions are always emitted without a source location. This is done to prevent jumpy line tables, but it means that the local value area will be considered part of the previous statement. Consider this C code: call1(); // line 1 ++global; // line 2 ++global; // line 3 call2(&global, &local); // line 4 Today we end up with assembly and line tables like this: .loc 1 1 callq call1 leaq global(%rip), %rdi leaq local(%rsp), %rsi .loc 1 2 addq $1, global(%rip) .loc 1 3 addq $1, global(%rip) .loc 1 4 callq call2 The LEA instructions in the local value area have no source location and are treated as being on line 1. Stepping through the code in a debugger and correlating it with the assembly won't make much sense, because these materializations are only required for line 4. This is actually problematic for the VS debugger "set next statement" feature, which effectively assumes that there are no registers live across statement boundaries. By sinking the local value code into the statement and fixing up the source location, we can make that feature work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and https://crbug.com/793819. This change is obviously not enough to make this feature work reliably in all cases, but I felt that it was worth doing anyway because it usually generates smaller, more comprehensible -O0 code. I measured a 0.12% regression in code generation time with LLC on the sqlite3 amalgamation, so I think this is worth doing. There are some special cases worth calling out in the commit message: 1. local values materialized for phis 2. local values used by no-op casts 3. dead local value code Local values can be materialized for phis, and this does not show up as a vreg use in MachineRegisterInfo. In this case, if there are no other uses, this patch sinks the value to the first terminator, EH label, or the end of the BB if nothing else exists. Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. Lastly, if the local value register has no other uses, we can delete it. This comes up when fastisel tries two instruction selection approaches and the first materializes the value but fails and the second succeeds without using the local value. Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43093 llvm-svn: 327581	2018-03-14 21:54:21 +00:00
Francis Visoiu Mistrih	e85b06d65f	[CodeGen] Use MIR syntax for MachineMemOperand printing Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)". rdar://38163529 Differential Revision: https://reviews.llvm.org/D42377 llvm-svn: 327580	2018-03-14 21:52:13 +00:00
Arnold Schwaighofer	bf1638daa8	SjLjEHPrepare: Don't reg-to-mem swifterror values swifterror llvm values model the swifterror register as memory at the LLVM IR level. ISel will perform adhoc mem-to-reg on them. swifterror values are constraint in how they can be used. Spilling them to memory is not allowed. SjLjEHPrepare tried to lower swifterror values to memory which is unecessary since the back-end will spill and reload the register as neccessary (as long as clobbering calls are marked as such which is the case here) and further leads to invalid IR because swifterror values can't be stored to memory. rdar://38164004 llvm-svn: 327521	2018-03-14 15:44:07 +00:00
Sjoerd Meijer	af30f06d5c	[ARM] Fix for PR36577 Don't PerformSHLSimplify if the given node is used by a node that also uses a constant because we may get stuck in an infinite combine loop. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577 Patch by Sam Parker. Differential Revision: https://reviews.llvm.org/D44097 llvm-svn: 326882	2018-03-07 09:10:44 +00:00
Florian Hahn	9deef20b6c	[ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is broken due to 2 separate bugs: 1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing rules to expand them to non pseudo MIR. These are selected for ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD. 2) Selection of the right VLD/VST instruction is broken for load and store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called with MIR opcode for fixed writeback (ie increment is access size) and call getVLDSTRegisterUpdateOpcode() to select an opcode with register writeback if base register update is of a different size. Since getVLDSTRegisterUpdateOpcode() only knows about VLD1/VLD2/VST1/VST2 the call is currently conditional on the number of element in the vector. However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for load and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not updated which later lead to a fixed writeback instruction being constructed with an extra operand for the register writeback. This patch addresses the two issues as follows: - it adds the necessary mapping from VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register to VLD1d64Twb_register and VLD1d64Qwb_register respectively. Like for the existing _fixed variants, the cost of these is bumped for unaligned access. - it changes the logic in SelectVLD and SelectVSD to call isVLDfixed and isVSTfixed respectively to decide whether the opcode should be updated. It also reworks the logic and comments for pushing the writeback offset operand and r0 operand to clarify the logic: writeback offset needs to be pushed if it's a register writeback, r0 needs to be pushed if not and the instruction is a VLD1/VLD2/VST1/VST2. Reviewers: rengolin, t.p.northover, samparker Reviewed By: samparker Patch by Thomas Preud'homme <thomas.preudhomme@arm.com> Differential Revision: https://reviews.llvm.org/D42970 llvm-svn: 326570	2018-03-02 13:02:55 +00:00

1 2 3 4 5 ...

3587 Commits