llvm-project

Commit Graph

Author	SHA1	Message	Date
Nirav Dave	6ce9f72f76	[DAGCombine] Improve alias analysis for chain of independent stores. FindBetterNeighborChains simultaneously improves the chain dependencies of a chain of related stores avoiding the generation of extra token factors. For chains longer than the GatherAllAliasDepths, stores further down in the chain will necessarily fail, a potentially significant waste and preventing otherwise trivial parallelization. This patch directly parallelizes the chains of stores before improving each store. This generally improves DAG-level parallelism. Reviewers: courbet, spatel, RKSimon, bogner, efriedma, craig.topper, rnk Subscribers: sdardis, javed.absar, hiraditya, jrtc27, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D53552 llvm-svn: 346432	2018-11-08 19:14:20 +00:00
Eli Friedman	d00fb2e0a8	[AArch64] [Windows] Trap after noreturn calls. Like the comment says, this isn't the most efficient fix in terms of codesize, but it works. Differential Revision: https://reviews.llvm.org/D54129 llvm-svn: 346358	2018-11-07 21:31:14 +00:00
Volkan Keles	fa441730bb	[AArch64][GlobalISel] Simplify and autogenerate the legalizer tests llvm-svn: 346253	2018-11-06 18:59:18 +00:00
Volkan Keles	ecacfe9c6c	Reland r346166: [GlobalISel] Refactor the artifact combiner a bit by using MIPatternMatch It was causing a crash because we were trying to get the definition of a target register. Fixed the issue by adding a check and added a test case for that. llvm-svn: 346251	2018-11-06 18:31:25 +00:00
Roman Lebedev	7db25f2b38	[NFC][x86][AArch64] extract-bits.ll: add test with 'ashr'. llvm-svn: 346121	2018-11-05 09:20:08 +00:00
Mandeep Singh Grang	547a0d765a	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Patch by: Yin Ma (yinma@codeaurora.org) Reviewers: mgrang, ssijaric, eli.friedman, TomTan, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53996 llvm-svn: 345909	2018-11-01 23:22:25 +00:00
Jessica Paquette	c991cf3687	[MachineOutliner][NFC] Remember when you map something illegal across MBBs Instruction mapping in the outliner uses "illegal numbers" to signify that something can't ever be part of an outlining candidate. This means that the number is unique and can't be part of any repeated substring. Because each of these is unique, we can use a single unique number to represent a range of things we can't outline. The outliner tries to leverage this using a flag which is set in an MBB when the previous instruction we tried to map was "illegal". This patch improves that logic to work across MBBs. As a bonus, this also simplifies the mapping logic somewhat. This also updates the machine-outliner-remarks test, which was impacted by the order of Candidates on an OutlinedFunction changing. This order isn't guaranteed, so I added a FIXME to fix that in a follow-up. The order of Candidates on an OutlinedFunction isn't important, so this still is NFC. llvm-svn: 345906	2018-11-01 23:09:06 +00:00
Mandeep Singh Grang	df19e57a1c	[COFF, ARM64] Implement llvm.addressofreturnaddress intrinsic Reviewers: rnk, mstorsjo, efriedma, TomTan Reviewed By: efriedma Subscribers: javed.absar, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D53962 llvm-svn: 345892	2018-11-01 21:23:47 +00:00
Volkan Keles	0a8dc9eb0f	[GlobalISel] Fix a bug in LegalizeRuleSet::clampMaxNumElements Summary: This function was causing a crash when `MaxElements == 1` because it was trying to create a single element vector type. Reviewers: dsanders, aemerson, aditya_nandakumar Reviewed By: dsanders Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53734 llvm-svn: 345875	2018-11-01 19:01:53 +00:00
Mandeep Singh Grang	b0cdf56dd7	Revert "[COFF, ARM64] Implement Intrinsic.sponentry for AArch64" This reverts commit 585b6667b4712e3c7f32401e929855b3313b4ff2. llvm-svn: 345863	2018-11-01 17:53:57 +00:00
Mandeep Singh Grang	88ad9ac720	[COFF, ARM64] Implement Intrinsic.sponentry for AArch64 Summary: This patch adds Intrinsic.sponentry. This intrinsic is required to correctly support setjmp for AArch64 Windows platform. Reviewers: mgrang, TomTan, rnk, compnerd, mstorsjo, efriedma Reviewed By: efriedma Subscribers: majnemer, chrib, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53673 llvm-svn: 345791	2018-10-31 23:16:20 +00:00
Daniel Sanders	a01348fa2a	[globalisel][irtranslator] Fix test from r345743 on non-asserts builds. llvm-svn: 345754	2018-10-31 17:58:47 +00:00
Daniel Sanders	3b39040ad4	[globalisel][irtranslator] Verify that DILocations aren't lost in translation Summary: Also fix a couple bugs where DILocations are lost. EntryBuilder wasn't passing on debug locations for PHI's, constants, GLOBAL_VALUE, etc. Reviewers: aprantl, vsk, bogner, aditya_nandakumar, volkan, rtereshin, aemerson Reviewed By: aemerson Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53740 llvm-svn: 345743	2018-10-31 17:31:23 +00:00
Matthias Braun	8763c0c5b7	MachineModuleInfo: Initialize DbgInfoAvailable depending on debug_cus existing Before this patch DbgInfoAvailable was set to true in DwarfDebug::beginModule() or CodeViewDebug::CodeViewDebug(). This made MIR testing weird since passes would suddenly stop dealing with debug info just because we stopped the pipeline before the debug printers. This patch changes the logic to initialize DbgInfoAvailable based on the fact that debug_compile_units exist in the llvm Module. The debug printers may then override it with false in case of debug printing being disabled. Differential Revision: https://reviews.llvm.org/D53885 llvm-svn: 345740	2018-10-31 17:18:41 +00:00
Sanjin Sijaric	fadebc8aae	[ARM64] [Windows] Exception handling support in frame lowering Emit pseudo instructions indicating unwind codes corresponding to each instruction inside the prologue/epilogue. These are used by the MCLayer to populate the .xdata section. Differential Revision: https://reviews.llvm.org/D50288 llvm-svn: 345701	2018-10-31 09:27:01 +00:00
Martin Storsjo	315357faca	[AArch64] Mark condition flags and x16/x17 as clobbered when calling __chkstk This is similar to SVN r311061 for ARM. Differential Revision: https://reviews.llvm.org/D53878 llvm-svn: 345698	2018-10-31 08:14:09 +00:00
Matthias Braun	a83403892a	MachineOperand/MIParser: Do not print debug-use flag, infer it The debug-use flag must be set exactly for uses on DBG_VALUEs. This is so obvious that it can be trivially inferred while parsing. This will reduce noise when printing while omitting an information that has little value to the user. The parser will keep recognizing the flag for compatibility with old `.mir` files. Differential Revision: https://reviews.llvm.org/D53903 llvm-svn: 345671	2018-10-30 23:28:27 +00:00
Mandeep Singh Grang	71e0cc2a0b	[COFF, ARM64] Make sure to forward arguments from vararg to musttail vararg Summary: Thunk functions in Windows are vararg functions that call a musttail function to pass the arguments after the fixup is done. We need to make sure that we forward the arguments from the caller vararg to the callee vararg function. This is the same mechanism that is used for Windows on X86. Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma Reviewed By: efriedma Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53843 llvm-svn: 345641	2018-10-30 20:46:10 +00:00
Bjorn Pettersson	fe09a20f09	[DAGCombiner] Fix for big endian in ForwardStoreValueToDirectLoad Summary: Normalize the offset for endianness before checking if the store covers the load in ForwardStoreValueToDirectLoad. Without this we missed out on some optimizations for big endian targets. If for example having a 4 bytes store followed by a 1 byte load, loading the least significant byte from the store, the STCoversLD check would fail (see @test4 in test/CodeGen/AArch64/load-store-forwarding.ll). This patch also fixes a problem seen in an out-of-tree target. The target has i40 as a legal type, it is big endian, and the StoreSize for i40 is 48 bits. So when normalizing the offset for endianness we need to take the StoreSize into account (assuming that padding added when storing into a larger StoreSize always is added at the most significant end). Reviewers: niravd Reviewed By: niravd Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho Differential Revision: https://reviews.llvm.org/D53776 llvm-svn: 345636	2018-10-30 20:16:39 +00:00
Eli Friedman	93d0129b78	[AArch64] [Windows] SEH opcodes should be scheduling boundaries. Prevents the post-RA scheduler from modifying the prologue sequences emitted by frame lowering. This is roughly similar to what we do for other targets: TargetInstrInfo::isSchedulingBoundary checks isPosition(), which checks for CFI_INSTRUCTION. isSEHInstruction is taken from D50288; it'll land with whatever patch lands first. Differential Revision: https://reviews.llvm.org/D53851 llvm-svn: 345634	2018-10-30 19:24:51 +00:00
David Greene	3e89fa8e08	[AArch64] Create proper memoperand for multi-vector stores Re-apply r345315 with testcase fixes. Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345631	2018-10-30 19:17:51 +00:00
Craig Topper	b293322cee	[LegalizeTypes] Teach PromoteIntRes_BITCAST to better handle a bitcast with vector output type and a vector input type that needs to be widened Summary: Previously if we had a bitcast vector output type that needs promotion and a vector input type that needs widening we would just do a stack store and load to handle the conversion. We can do a little better if we can widen the bitcast to a legal vector type the same size as the widened input type. Then we can do the bitcast between this widened type and the widened input type. Afterwards we can extract_subvector back to the original output and any_extend that. Type legalization will then circle back and handle promotion of the extract_subvector and the any_extend will just be removed. This will avoid going through the stack and allows us to remove a custom version of this legalization from X86. Reviewers: efriedma, RKSimon Reviewed By: efriedma Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53229 llvm-svn: 345567	2018-10-30 03:27:15 +00:00
Craig Topper	2640795c94	[AArch64] Add test case for D53229. NFC llvm-svn: 345566	2018-10-30 03:27:13 +00:00
Jessica Paquette	e3932eeea4	[MachineOutliner] Inherit target features from parent function If a function has target features, it may contain instructions that aren't represented in the default set of instructions. If the outliner pulls out one of these instructions, and the function doesn't have the right attributes attached, we'll run into an LLVM error explaining that the target doesn't support the necessary feature for the instruction. This makes outlined functions inherit target features from their parents. It also updates the machine-outliner.ll test to check that we're properly inheriting target features. llvm-svn: 345535	2018-10-29 20:27:07 +00:00
Matthias Braun	c045c557b0	Relax fast register allocator related test cases; NFC - Relax hard coded registers and stack frame sizes - Some test cleanups - Change phi-dbg.ll to match on mir output after phi elimination instead of going through the whole codegen pipeline. This is in preparation for https://reviews.llvm.org/D52010 I'm committing all the test changes upfront that work before and after independently. llvm-svn: 345532	2018-10-29 20:10:42 +00:00
Luke Cheeseman	71c989ae1f	[AArch64] Return address signing B key support - Add support to generate AUTIBSP, PACIBSP, RETAB instructions for return address signing - The key used to sign the function is controlled by the function attribute "sign-return-address-key" Differential Revision: https://reviews.llvm.org/D51427 llvm-svn: 345511	2018-10-29 16:26:58 +00:00
Sanjin Sijaric	96f2ea3dd4	[ARM64][Windows] MCLayer support for exception handling Add ARM64 unwind codes to MCLayer, as well as SEH directives that will be emitted by the frame lowering patch to follow. We only emit unwind codes into object files for now. Differential Revision: https://reviews.llvm.org/D50166 llvm-svn: 345450	2018-10-27 06:13:06 +00:00
Vlad Tsyrklevich	21beeb29ea	Revert "[AArch64] Create proper memoperand for multi-vector stores" This reverts commit r345315, it was causing test failures on sanitizer-x86_64-linux-fast. llvm-svn: 345356	2018-10-26 02:00:14 +00:00
Bryan Chan	f0923f16f8	[AArch64] Implement FP16FML intrinsics Add LLVM intrinsics for the ARMv8.2-A FP16FML vector-form instructions. Add a DAG pattern to define the indexed-form intrinsics in terms of the vector-form ones, similarly to how the Dot Product intrinsics were implemented. Based on a patch by Gao Yiling. Differential Revision: https://reviews.llvm.org/D53632 llvm-svn: 345337	2018-10-25 23:36:41 +00:00
David Greene	53e869da7d	[AArch64] Create proper memoperand for multi-vector stores Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is. Differential Revision: https://reviews.llvm.org/D52816 llvm-svn: 345315	2018-10-25 21:10:39 +00:00
Volkan Keles	f28e81f6aa	[AArch64][GlobalISel] Simplify a legalizer test. NFC. llvm-svn: 345307	2018-10-25 20:01:19 +00:00
Volkan Keles	60c6affcb0	[GlobalISel] LegalizerHelper: Fix the incorrect alignment when splitting loads/stores in narrowScalar Reviewers: dsanders, bogner, jpaquette, aemerson, ab, paquette Reviewed By: dsanders Subscribers: rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D53664 llvm-svn: 345292	2018-10-25 17:52:19 +00:00
Volkan Keles	3a103b1d25	[AArch64][GlobalISel] Fix the LegalityPredicate for lowerIf for G_LOAD/G_STORE Summary: Currently, Legalizer is trying to lower G_LOAD with a vector type that has more than two elements due to the incorrect LegalityPredicate. This patch fixes the issue by removing the multiplication by 8 as `MemDesc.Size` already contains the size in bits. Reviewers: dsanders, aemerson Reviewed By: dsanders Subscribers: rovka, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53679 llvm-svn: 345282	2018-10-25 17:23:25 +00:00
John Brawn	958865202d	[AArch64] Add EXT patterns for 64-bit EXT of a subvector of a 128-bit vector If we have a 64-bit EXT where one of the operands is a subvector of a 128-bit vector then in some cases we can eliminate an extract_subvector by converting to a 128-bit EXT of the 128-bit vector. Differential Revision: https://reviews.llvm.org/D53582 llvm-svn: 345275	2018-10-25 15:31:51 +00:00
John Brawn	49e61d90ca	[AArch64] Do 64-bit vector move of 0 and -1 by extracting from the 128-bit move Currently a vector move of 0 or -1 will use different instructions depending on the size of the vector. Using a single instruction (the 128-bit one) for both gives more opportunity for Machine CSE to eliminate instructions. Differential Revision: https://reviews.llvm.org/D53579 llvm-svn: 345270	2018-10-25 14:56:48 +00:00
Amara Emerson	cbd86d8429	[GlobalISel] Use the target preferred type for G_EXTRACT_VECTOR_ELT index. Allows for better imported pattern re-use. llvm-svn: 345265	2018-10-25 14:04:54 +00:00
Tim Northover	1c353419ab	AArch64: add a pass to compress jump-table entries when possible. llvm-svn: 345188	2018-10-24 20:19:09 +00:00
Matthias Braun	4f82406c46	SelectionDAG: Reuse bigger sized constants in memset expansion. When implementing memset's today we often see this pattern: $x0 = MOV 0xXYXYXYXYXYXYXYXY store $x0, ... $w1 = MOV 0xXYXYXYXY store $w1, ... We first create a 64bit constant in a 64bit register with all bytes the same and then create a 32bit constant with all bytes the same in a 32bit register. In many targets we could just access the lower byte of the 64bit register instead. - Ideally this would be handled by the ConstantHoist pass but it runs too early when memset isn't expanded yet. - The memset expansion code already had this optimization implemented, however SelectionDAG constantfolding would constantfold the "trunc(bigconstant)" pattern to "smallconstant". - This patch makes the memset expansion mark the constant as Opaque and stop DAGCombiner from constant folding in this situation. (Similar to how ConstantHoisting marks things as Opaque to avoid folding ADD/SUB/etc.) Differential Revision: https://reviews.llvm.org/D53181 llvm-svn: 345102	2018-10-23 23:19:23 +00:00
Justin Bogner	912adfba7e	Reapply "[MachineCopyPropagation] Reimplement CopyTracker in terms of register units" Recommits r342942, which was reverted in r343189, with a fix for an issue where we would propagate unsafely if we defined only the upper part of a register. Original message: Change the copy tracker to keep a single map of register units instead of 3 maps of registers. This gives a very significant compile time performance improvement to the pass. I measured a 30-40% decrease in time spent in MCP on x86 and AArch64 and much more significant improvements on out of tree targets with more registers. llvm-svn: 344942	2018-10-22 19:51:31 +00:00
Sanjay Patel	e439cc2745	[DAGCombiner] reduce insert+bitcast+extract vector ops to truncate (PR39016) This is a late backend subset of the IR transform added with: D52439 We can confirm that the conversion to a 'trunc' is correct by running: $ opt -instcombine -data-layout="e" (assuming the IR transforms are correct; change "e" to "E" for big-endian) As discussed in PR39016: https://bugs.llvm.org/show_bug.cgi?id=39016 ...the pattern may emerge during legalization, so that's why we are waiting for an insertelement to become a scalar_to_vector in the pattern matching here. The DAG allows for fun variations that are not possible in IR. Result types for extracts and scalar_to_vector don't necessarily match input types, so that means we have to be a bit more careful in the transform (see code comments). The tests show that we don't handle cases that require a shift (as we did in the IR version). I've left that as a potential follow-up because I'm not sure if that's a real concern at this late stage. Differential Revision: https://reviews.llvm.org/D53201 llvm-svn: 344872	2018-10-21 20:13:29 +00:00
Roman Tereshin	8d6ff4c0af	[MachineCSE][GlobalISel] Making sure MachineCSE works mid-GlobalISel (again) Change of approach, it looks like it's a much better idea to deal with the vregs that have LLTs and reg classes both properly, than trying to avoid creating those across all GlobalISel passes and all targets. The change mostly touches MachineRegisterInfo::constrainRegClass, which is apparently only used by MachineCSE. The changes are NFC for any pipeline but one that contains MachineCSE mid-GlobalISel. NOTE on isCallerPreservedOrConstPhysReg change in MachineCSE: There is no test covering it as the only way to insert a new pass (MachineCSE) from a command line I know of is llc's -run-pass option, which only works with MIR, but MIRParser freezes reserved registers upon MachineFunctions creation, making it impossible to reproduce the state that exposes the issue. Reviewed By: aditya_nandakumar Differential Revision: https://reviews.llvm.org/D53144 llvm-svn: 344822	2018-10-20 00:06:15 +00:00
Aditya Nandakumar	cd04e366d7	[GISel]: Allow PHIs to be DCEd https://reviews.llvm.org/D53304 Currently dead phis are not cleaned up during DCE. This patch allows dead PHI and G_PHI insts to be deleted. Reviewed by: dsanders llvm-svn: 344811	2018-10-19 20:11:52 +00:00
Simon Pilgrim	095a7fe635	[AARCH64] Improve vector popcnt lowering with ADDLP AARCH64 equivalent to D53257 - uses widening pairwise adds on vXi8 CTPOP to support i16/i32/i64 vectors. This is a blocker for generic vector CTPOP expansion (P32655) - this will remove the aarch64 diff from D53258. Differential Revision: https://reviews.llvm.org/D53259 llvm-svn: 344554	2018-10-15 21:15:58 +00:00
Sanjay Patel	8bd74785f0	[DAGCombiner] allow undef elts in vector fmul matching llvm-svn: 344534	2018-10-15 16:54:07 +00:00
Sanjay Patel	7cf5733f7f	[AArch64] add tests for fmul x, -2.0 with undef elts; NFC Also, add tests with commuted operands. There was no coverage for that case. llvm-svn: 344531	2018-10-15 16:44:00 +00:00
Simon Pilgrim	2ac03ec2c4	[AARCH64] Regenerate popcnt tests Improve codegen view as part of PR32655 llvm-svn: 344466	2018-10-13 21:50:15 +00:00
Arnaud A. de Grandmaison	162435e7b5	[AArch64] Swap comparison operands if that enables some folding. Summary: AArch64 can fold some shift+extend operations on the RHS operand of comparisons, so swap the operands if that makes sense. This provides a fix for https://bugs.llvm.org/show_bug.cgi?id=38751 Reviewers: efriedma, t.p.northover, javed.absar Subscribers: mcrosier, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D53067 llvm-svn: 344439	2018-10-13 07:43:56 +00:00
Zachary Turner	9f169afab2	Make YAML quote forward slashes. If you have the string /usr/bin, prior to this patch it would not be quoted by our YAML serializer. But a string like C:\src would be, due to the presence of a backslash. This makes the quoting rules of basically every single file path different depending on the path syntax (posix vs. Windows). While technically not required by the YAML specification to quote forward slashes, when the behavior of paths is inconsistent it makes it difficult to portably write FileCheck lines that will work with either kind of path. Differential Revision: https://reviews.llvm.org/D53169 llvm-svn: 344359	2018-10-12 16:31:20 +00:00
Zachary Turner	9c544199cf	Revert "Make YAML quote forward slashes." This reverts commit b86c16ad8c97dadc1f529da72a5bb74e9eaed344. This is being reverted because I forgot to write a useful commit message, so I'm going to resubmit it with an actual commit message. llvm-svn: 344358	2018-10-12 16:31:08 +00:00
Zachary Turner	ec234052a6	Make YAML quote forward slashes. llvm-svn: 344357	2018-10-12 16:24:09 +00:00
Sanjay Patel	f5b1892348	[AArch64][x86] add tests for trunc disguised as vector ops (PR39016); NFC These correspond to the IR transform from: D52439 llvm-svn: 344353	2018-10-12 15:22:14 +00:00
Roman Lebedev	62cd430602	[NFC][X86][AArch64] extract-bits.ll: add tests with constants+storing results. As noted in https://reviews.llvm.org/D53080#inline-467678, this may get pessimized by that diff. llvm-svn: 344182	2018-10-10 20:50:52 +00:00
Volkan Keles	da5578c5d0	[GlobalISel] Fix the artifact combiner to fold G_IMPLICIT_DEF properly Summary: GlobalISel generates incorrect code because the legalizer artifact combiner assumes `G_[SZ]EXT (G_IMPLICIT_DEF)` is equivalent to `G_IMPLICIT_DEF `. Replace `G_[SZ]EXT (G_IMPLICIT_DEF)` with 0 because the top bits will be 0 for G_ZEXT and 0/1 for the G_SEXT. Reviewers: aditya_nandakumar, dsanders, aemerson, javed.absar Reviewed By: aditya_nandakumar Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D52996 llvm-svn: 344163	2018-10-10 18:01:48 +00:00
Nirav Dave	07acc992dc	[DAGCombine] Improve Load-Store Forwarding Summary: Extend analysis forwarding loads from preceding stores to work with extended loads and truncated stores to the same address so long as the load is fully subsumed by the store. Hexagon's swp-epilog-phis.ll and swp-memrefs-epilog1.ll tests are deleted as they no longer seem to be relevant. Reviewers: RKSimon, rnk, kparzysz, javed.absar Subscribers: sdardis, nemanjai, hiraditya, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D49200 llvm-svn: 344142	2018-10-10 14:15:52 +00:00
Nemanja Ivanovic	72d4866e57	[DAGCombiner] Expand combining of FP logical ops to sign-setting FP ops We already do the following combines: (bitcast int (and (bitcast fp X to int), 0x7fff...) to fp) -> fabs X (bitcast int (xor (bitcast fp X to int), 0x8000...) to fp) -> fneg X When the target has "bit preserving fp logic". This patch just extends it to also combine: (bitcast int (or (bitcast fp X to int), 0x8000...) to fp) -> fneg (fabs X) As some targets have fnabs and even those that don't can efficiently lower both the fabs and the fneg. Differential revision: https://reviews.llvm.org/D44548 llvm-svn: 344093	2018-10-09 23:20:11 +00:00
Sanjay Patel	a875030ab3	[AArch64][x86] add tests for bitcasted fnabs; NFC Alternate target coverage for D44548. llvm-svn: 344059	2018-10-09 17:20:26 +00:00
Oliver Stannard	367b4741f4	[AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled When branch target identification is enabled, we can only do indirect tail-calls through x16 or x17. This means that the outliner can't transform a BLR instruction at the end of an outlined region into a BR. Differential revision: https://reviews.llvm.org/D52869 llvm-svn: 343969	2018-10-08 14:12:08 +00:00
Oliver Stannard	c922116a51	[AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI When branch target identification is enabled, all indirectly-callable functions start with a BTI C instruction. This instruction can only be the target of certain indirect branches (direct branches and fall-through are not affected): - A BLR instruction, in either a protected or unprotected page. - A BR instruction in a protected page, using x16 or x17. - A BR instruction in an unprotected page, using any register. Without BTI, we can use any non call-preserved register to hold the address for an indirect tail call. However, when BTI is enabled, then the code being compiled might be loaded into a BTI-protected page, where only x16 and x17 can be used for indirect tail calls. Legacy code without this restriction can still indirectly tail-call BTI-protected functions, because they will be loaded into an unprotected page, so any register is allowed. Differential revision: https://reviews.llvm.org/D52868 llvm-svn: 343968	2018-10-08 14:09:15 +00:00
Oliver Stannard	250e5a5b65	[AArch64][v8.5A] Branch Target Identification code-generation pass The Branch Target Identification extension, introduced to AArch64 in Armv8.5-A, adds the BTI instruction, which is used to mark valid targets of indirect branches. When enabled, the processor will trap if an instruction in a protected page tries to perform an indirect branch to any instruction other than a BTI. The BTI instruction uses encodings which were NOPs in earlier versions of the architecture, so BTI-enabled code will still run on earlier hardware, just without the extra protection. There are 3 variants of the BTI instruction, which are valid targets for different kinds of branches: - BTI C can be targeted by call instructions, and is intended to be used at function entry points. These are the BLR instruction, as well as BR with x16 or x17. These BR instructions are allowed for use in PLT entries, and we can also use them to allow indirect tail-calls. - BTI J can be targeted by BR only, and is intended to be used by jump tables. - BTI JC acts as both a BTI C and a BTI J instruction, and can be targeted by any BLR or BR instruction. Note that RET instructions are not restricted by branch target identification, the reason for this is that return addresses can be protected more effectively using return address signing. Direct branches and calls are also unaffected, as it is assumed that an attacker cannot modify executable pages (if they could, they wouldn't need to do a ROP/JOP attack). This patch adds a MachineFunctionPass which: - Adds a BTI C at the start of every function which could be indirectly called (either because it is address-taken, or externally visible so could be address-taken in another translation unit). - Adds a BTI J at the start of every basic block which could be indirectly branched to. This could be either done by a jump table, or by taking the address of the block (e.g. using the GCC label values extension).
We only need to use BTI JC when a function is indirectly-callable, and takes the address of the entry block. I've not been able to trigger this from C or IR, but I've included a MIR test just in case. Using BTI C at function entries relies on the fact that no other code in BTI-protected pages uses indirect tail-calls, unless they use x16 or x17 to hold the address. I'll add that code-generation restriction as a separate patch. Differential revision: https://reviews.llvm.org/D52867 llvm-svn: 343967	2018-10-08 14:04:24 +00:00
Oliver Stannard	9ecdac8ee0	[AArch64] Fix verifier error when outlining indirect calls The MachineOutliner for AArch64 transforms indirect calls into indirect tail calls, replacing the call with the TCRETURNri pseudo-instruction. This pseudo lowers to a BR, but has the isCall and isReturn flags set. The problem is that TCRETURNri takes a tcGPR64 as the register argument, to prevent indirect tail-calls from using caller-saved registers. The indirect calls transformed by the outliner could use caller-saved registers. This is fine, because the outliner ensures that the register is available at all call sites. However, this causes a verifier failure when the register is not in tcGPR64. The fix is to add a new pseudo-instruction like TCRETURNri, but which accepts any GPR. Differential revision: https://reviews.llvm.org/D52829 llvm-svn: 343959	2018-10-08 09:18:48 +00:00
Simon Pilgrim	012fda59a5	[AARCH64][X86] Remove _nonsplat from test names As discussed on D50222 llvm-svn: 343934	2018-10-07 11:24:04 +00:00
Jessica Paquette	b328d95333	[GlobalIsel] Add llvm.invariant.start and llvm.invariant.end Port over the implementation in SelectionDAGBuilder.cpp into the IRTranslator and update the arm64-irtranslator test. These were causing fallbacks in CTMark/Bullet (-Rpass-missed=gisel-select), and this patch fixes that. https://reviews.llvm.org/D52945 llvm-svn: 343885	2018-10-05 21:02:46 +00:00
Daniel Sanders	a464ffd52c	[globalisel][combine] When placing truncates, handle the case when the BB is empty GlobalISel uses MIR with implicit fallthrough on each basic block. As a result, getFirstNonPhi() can return end(). llvm-svn: 343829	2018-10-04 23:47:37 +00:00
Daniel Sanders	ab358bfd09	[globalisel][combine] Fix a rare crash when encountering an instruction whose op0 isn't a reg The simplest instance of this is an intrinsic with no results which will have the intrinsic ID as operand 0. Also fix some benign incorrectness when op0 is a reg but isn't a def that was guarded against by checking for the extension opcodes. llvm-svn: 343821	2018-10-04 21:44:32 +00:00
Daniel Sanders	a05c7583c9	[globalisel][combine] Improve the truncate placement for the extending-loads combine This brings the extending loads patch back to the original intent but minus the PHI bug and with another small improvement to de-dupe truncates that are inserted into the same block. The truncates are sunk to their uses unless this would require inserting before a phi in which case it sinks to the _beginning_ of the predecessor block for that path (but no earlier than the def). The reason for choosing the beginning of the predecessor is that it makes de-duping multiple truncates in the same block simple, and optimized code is going to run a scheduler at some point which will likely change the position anyway. llvm-svn: 343804	2018-10-04 18:44:58 +00:00
Matthias Braun	0c67a4e958	AArch64: Fix XSeqPairs/WSeqPairs problems - Fix spill/reloads of XSeqPairs failing with vregs (only physregs worked correctly) - Add missing spill/reload code for WSeqPairs class Differential Revision: https://reviews.llvm.org/D52761 llvm-svn: 343799	2018-10-04 17:02:53 +00:00
Daniel Sanders	fb9b99b26e	[globalisel][combines] Don't sink G_TRUNC down to use if that use is a G_PHI This fixes a problem where the register allocator fails to eliminate a PHI because there's a non-PHI in the middle of the PHI instructions at the start of a BB. This G_TRUNC can be better placed but this at least fixes the correctness issue quickly. I'll follow up with a patch to the verifier to catch this kind of bug in future. llvm-svn: 343693	2018-10-03 15:43:39 +00:00
Daniel Sanders	bad3936109	[globalisel] Fix one more missing Verifier pass from gisel-commandline-option.ll llvm-svn: 343658	2018-10-03 02:52:54 +00:00
Daniel Sanders	34eac35a60	Add the missing new files from r343654 llvm-svn: 343655	2018-10-03 02:21:30 +00:00
Daniel Sanders	c973ad1878	Re-commit: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 The previous commit failed portions of the test-suite on GreenDragon due to duplicate COPY instructions and iterator invalidation. Both issues have now been fixed. To assist with this, a helper (cloneVirtualRegister) has been added to MachineRegisterInfo that can be used to get another register that has the same type and class/bank as an existing one. llvm-svn: 343654	2018-10-03 02:12:17 +00:00
Daniel Sanders	f430d941e9	[globalisel] Attempt to fix llvm-clang-x86_64-expensive-checks-win The behaviour of this bot indicates that -verify-machineinstrs has been forced on and is therefore inserting the verifier on builds that don't expect it. Explicitly specify whether it's enabled or disabled for each test. llvm-svn: 343633	2018-10-02 20:51:27 +00:00
Matt Morehouse	4b1ec17fb0	Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload instructions" This reverts r343520 due to breakage of HWASan tests on Android. llvm-svn: 343616	2018-10-02 18:35:44 +00:00
Fangrui Song	99d4f74d01	[AArch64][DAGCombiner]: change -stop-after=isel to instruction-select "isel" is registered by AMDGPU. The test will break if the AMDGPU target is not built. llvm-svn: 343553	2018-10-02 00:22:51 +00:00
Daniel Sanders	33f42f97af	Revert: r343521 and r343541: [globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 There's a strange assertion on two of the Green Dragon bots that goes away when this is reverted. The assertion is in RegBankAlloc and if it is this commit then -verify-machine-instrs should have caught it earlier in the pipeline. llvm-svn: 343546	2018-10-01 22:32:08 +00:00
Daniel Sanders	9659bfda5a	[globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 Summary: Depends on D45541 Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D45543 llvm-svn: 343521	2018-10-01 18:56:47 +00:00
Matthias Braun	3e081703c3	X86, AArch64, ARM: Do not attach debug location to spill/reload instructions Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions). Differential Revision: https://reviews.llvm.org/D52125 llvm-svn: 343520	2018-10-01 18:56:39 +00:00
Matthias Braun	004fe6bf83	DAGCombiner: StoreMerging: Fix bad index calculating when adjusting mismatching vector types This fixes a case of bad index calculation when merging mismatching vector types. This changes the existing code to just use the existing extract_{subvector\|element} and a bitcast (instead of bitcast first and then newly created extract_xxx) so we don't need to adjust any indices in the first place. rdar://44584718 Differential Revision: https://reviews.llvm.org/D52681 llvm-svn: 343493	2018-10-01 16:25:50 +00:00
Roman Lebedev	0496477c5d	[NFC][CodeGen][X86][AArch64] Add 64-bit constant bit field extract pattern tests llvm-svn: 343404	2018-09-30 12:42:08 +00:00
Evandro Menezes	fc1852ff1c	[AArch64] Split zero cycle feature more granularly Split the `zcz` feature into specific ones for GP and FP registers, `zcz-gp` and `zcz-fp`, respectively, while retaining the original feature option to mean both. Differential revision: https://reviews.llvm.org/D52621 llvm-svn: 343354	2018-09-28 19:05:09 +00:00
Luke Cheeseman	10981cc884	Revert r343317 - asan buildbots are breaking and I need to investigate the issue llvm-svn: 343341	2018-09-28 17:01:50 +00:00
Luke Cheeseman	21f2955bb2	Reapply changes reverted by r343235 - Add fix so that all code paths that create DWARFContext with an ObjectFile initialise the target architecture in the context - Add an assert that the Arch is known in the Dwarf CallFrameString method llvm-svn: 343317	2018-09-28 13:37:27 +00:00
Luke Cheeseman	8e5676b1aa	Revert r343192 as an ubsan build is currently failing llvm-svn: 343235	2018-09-27 16:47:30 +00:00
Luke Cheeseman	f6844b307a	Reapply changes reverted in r343114, lldb patch to follow shortly llvm-svn: 343192	2018-09-27 10:39:20 +00:00
Luke Cheeseman	77aaa22081	Revert r343112 as CallFrameString API change has broken lldb builds llvm-svn: 343114	2018-09-26 14:48:03 +00:00
Luke Cheeseman	03ad8812f5	[AArch64] - Return address signing dwarf support - Reapply r343089 with a fix for DebugInfo/Sparc/gnu-window-save.ll llvm-svn: 343112	2018-09-26 14:30:29 +00:00
Francis Visoiu Mistrih	6acaa18afc	[CodeGen] Always print register ties in MI::dump() It was the case when calling MO::dump(), but MI::dump() was still depending on hasComplexRegisterTies(). The MIR output is not affected. llvm-svn: 343107	2018-09-26 13:33:09 +00:00
Hans Wennborg	00b88bbcaf	Revert r343089 "[AArch64] - Return address signing dwarf support" This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail. > Functions that have signed return addresses need additional dwarf support: > - After signing the LR, and before authenticating it, the LR register is in a > state that is unusable by a debugger or unwinder > - To account for this a new directive, .cfi_negate_ra_state, is added > - This directive says the signed state of the LR register has now changed, > i.e. unsigned -> signed or signed -> unsigned > - This directive has the same CFA code as the SPARC directive GNU_window_save > (0x2d), adding a macro to account for multiply defined codes > - This patch matches the gcc implementation of this support: > https://patchwork.ozlabs.org/patch/800271/ > > Differential Revision: https://reviews.llvm.org/D50136 llvm-svn: 343103	2018-09-26 12:57:45 +00:00
Luke Cheeseman	f755e687fc	[AArch64] - Return address signing dwarf support Functions that have signed return addresses need additional dwarf support: - After signing the LR, and before authenticating it, the LR register is in a state that is unusable by a debugger or unwinder - To account for this a new directive, .cfi_negate_ra_state, is added - This directive says the signed state of the LR register has now changed, i.e. unsigned -> signed or signed -> unsigned - This directive has the same CFA code as the SPARC directive GNU_window_save (0x2d), adding a macro to account for multiply defined codes - This patch matches the gcc implementation of this support: https://patchwork.ozlabs.org/patch/800271/ Differential Revision: https://reviews.llvm.org/D50136 llvm-svn: 343089	2018-09-26 10:14:15 +00:00
Christy Lee	e94374809e	Re-submitting changes in D51550 because it failed to patch. Reviewers: javed.absar, trentxintong, courbet Reviewed By: trentxintong Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52433 llvm-svn: 342919	2018-09-24 20:47:12 +00:00
Sanjay Patel	2c901742ca	[DAGCombiner] use UADDO to optimize saturated unsigned add This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select. As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities. This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86 which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone that is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests. Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place. 
Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886	2018-09-24 14:47:15 +00:00
Roman Lebedev	fb697d0f1b	[NFC][CodeGen][X86][AArch64] More tests for 'bit field extract' w/ constants It would be best to introduce ISD::BitFieldExtract, because clearly more than one backend faces the same problem. But for now let's solve this in the x86-specific DAG combine. https://bugs.llvm.org/show_bug.cgi?id=38938 llvm-svn: 342880	2018-09-24 13:24:20 +00:00
Tri Vo	6c47c62588	[AArch64] Support adding X[8-15,18] registers as CSRs. Summary: Specifying X[8-15,18] registers as callee-saved is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. As part of this patch we: - use custom CSR list/mask when user specifies custom CSRs - update Machine Register Info's list of CSRs with additional custom CSRs in LowerCall and LowerFormalArguments. Reviewers: srhines, nickdesaulniers, efriedma, javed.absar Reviewed By: nickdesaulniers Subscribers: kristof.beyls, jfb, llvm-commits Differential Revision: https://reviews.llvm.org/D52216 llvm-svn: 342824	2018-09-22 22:17:50 +00:00
Roman Lebedev	38c25ace53	[NFC][x86][AArch64] Add BEXTR-like test patterns. Summary: Also, adjust the check prefixes so that we actually get to check the BMI1-only-case. Reviewers: craig.topper, RKSimon, spatel, javed.absar Reviewed By: RKSimon Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D48490 llvm-svn: 342623	2018-09-20 07:54:49 +00:00
Matthias Braun	28d6a4ac9a	AArch64: Add FuseCryptoEOR fusion rules There's some additional rules available on newer apple CPUs. rdar://41235346 llvm-svn: 342590	2018-09-19 20:50:51 +00:00
John Brawn	83d7414e19	[TargetLowering] Android has sincos functions Since Android API version 9 the Android libm has had the sincos functions, so they should be recognised as libcalls and sincos optimisation should be applied. Differential Revision: https://reviews.llvm.org/D52025 llvm-svn: 342471	2018-09-18 13:18:21 +00:00
Simon Pilgrim	dbdd46da18	[AArch64] Add integer abs testcases for D51873. llvm-svn: 342156	2018-09-13 17:11:25 +00:00
Sander de Smalen	2d77e788f2	[AArch64] Implement aarch64_vector_pcs codegen support. This patch adds codegen support for the saving/restoring V8-V23 for functions specified with the aarch64_vector_pcs calling convention attribute, as added in patch D51477. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB Reviewed By: thegameg Differential Revision: https://reviews.llvm.org/D51479 llvm-svn: 342049	2018-09-12 12:10:22 +00:00
Jessica Paquette	2386eab360	[MachineOutliner] Add codegen size remarks to the MachineOutliner Since the outliner is a module pass, it doesn't get codegen size remarks like the other codegen passes do. This adds size remarks to the outliner. This is kind of a workaround, so it's peppered with FIXMEs; size remarks really ought to not ever be handled by the pass itself. However, since the outliner is the only "MachineModulePass", this works for now. Since the entire purpose of the MachineOutliner is to produce code size savings, it really ought to be included in codegen size remarks. If we ever go ahead and make a MachineModulePass (say, something similar to MachineFunctionPass), then all of this ought to be moved there. llvm-svn: 342009	2018-09-11 23:05:34 +00:00
Josh Stone	f446facab0	[GlobalISel] Lower dbg.declare into indirect DBG_VALUE Summary: D31439 changed the semantics of dbg.declare to take the address of a variable as the first argument, making it indirect. It specifically updated FastISel for this change here: https://reviews.llvm.org/D31439#change-WVArzi177jPl GlobalISel needs to follow suit, or else it will be missing a level of indirection in the generated debuginfo. This problem was seen in a Rust debuginfo test on aarch64, since GlobalISel is used at -O0 for aarch64. https://github.com/rust-lang/rust/issues/49807 https://bugzilla.redhat.com/show_bug.cgi?id=1611597 https://bugzilla.redhat.com/show_bug.cgi?id=1625768 Reviewers: dblaikie, aprantl, t.p.northover, javed.absar, rnk Reviewed By: rnk Subscribers: #debug-info, rovka, kristof.beyls, JDevlieghere, llvm-commits, tstellar Differential Revision: https://reviews.llvm.org/D51749 llvm-svn: 341969	2018-09-11 17:52:01 +00:00
Roman Lebedev	baf2628043	[DagCombine][NFC] Some more tests for X % C == 0 (UREM case) transform For https://reviews.llvm.org/D50222 Patch by: hermord (Dmytro Shynkevych)! llvm-svn: 341953	2018-09-11 15:34:26 +00:00
Sanjay Patel	e368f46788	[AArch64] test codegen for unsigned saturated add; NFC This is identical to the tests added for x86 at rL341845. A semi-generic DAGCombine should improve things universally. llvm-svn: 341935	2018-09-11 13:21:28 +00:00
Nick Desaulniers	287a3be379	[AArch64] Support reserving x1-7 registers. Summary: Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in Linux kernel. This change adds support for reserving registers x1 through x7. Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma Reviewed By: nickdesaulniers, efriedma Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D48580 llvm-svn: 341706	2018-09-07 20:58:57 +00:00
JF Bastien	2920061105	ARM64: improve non-zero memset isel by ~2x Summary: I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated where the generated code was bad. This patch fixes the majority of the issues by requesting that a 2xi64 vector be used for memset of 32 bytes and above. The patch leaves the former request for f128 unchanged, despite f128 materialization being suboptimal: doing otherwise runs into other asserts in isel and makes this patch too broad. This patch hides the issue that was present in bzero_40_stack and bzero_72_stack because the code now generates in a better order which doesn't have the store offset issue. I'm not aware of that issue appearing elsewhere at the moment. <rdar://problem/44157755> Reviewers: t.p.northover, MatzeB, javed.absar Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D51706 llvm-svn: 341558	2018-09-06 16:03:32 +00:00
JF Bastien	ec812ce3d6	NFC: more memset inline arm64 coverage I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch further expands the tests (which I already expanded in r341406) to capture more of the current code generation when it comes to stack-based small non-zero memset on arm64. This patch annotates some potential fixes. llvm-svn: 341493	2018-09-05 20:35:06 +00:00
Sanjay Patel	dbf52837fe	[DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x)) This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code. This is the basic case. Some potential enhancements are in the TODO comments: 1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper). 2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF. 3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right). Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests). Differential Revision: https://reviews.llvm.org/D51630 llvm-svn: 341481	2018-09-05 17:01:56 +00:00
Zhaoshi Zheng	a0aa41d793	Revert "Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions" Reland r341269. Use std::stable_sort when sorting constant candidates. Reverting commit, r341365: Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions One of the tests is failing 50% of the time when expensive checks are enabled. Not sure how deep the problem is so just reverting while the author can investigate so that the bots stop repeatedly failing and blaming things incorrectly. Will respond with details on the original commit. Original commit, r341269: [Constant Hoisting] Hoisting Constant GEP Expressions Leverage existing logic in constant hoisting pass to transform constant GEP expressions sharing the same base global variable. Multi-dimensional GEPs are rewritten into single-dimensional GEPs. https://reviews.llvm.org/D51396 Differential Revision: https://reviews.llvm.org/D51654 llvm-svn: 341417	2018-09-04 22:17:03 +00:00
JF Bastien	fd458fe205	NFC: expand memset inline arm64 coverage I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch simply expands the two existing tests to capture more of the current code generation when it comes to heap-based and stack-based small memset on arm64. The tested code is already pretty good, notably when it comes to using STP, FP stores, FP immediate generation, and folding one of the stores into a stack spill when possible. The uses of STUR could be improved, and some more pairing could occur. Straying from bzero patterns currently yields suboptimal code, and I expect a variety of small changes could make things way better. llvm-svn: 341406	2018-09-04 21:02:00 +00:00
Martin Storsjo	fed420d6b6	[MinGW] [AArch64] Add stubs for potential automatic dllimported variables The runtime pseudo relocations can't handle the AArch64 format PC relative addressing in adrp+add/ldr pairs. By using stubs, the potentially dllimported addresses can be touched up by the runtime pseudo relocation framework. Differential Revision: https://reviews.llvm.org/D51452 llvm-svn: 341401	2018-09-04 20:56:21 +00:00
Chandler Carruth	6cb12444cc	Revert r341269: [Constant Hoisting] Hoisting Constant GEP Expressions One of the tests is failing 50% of the time when expensive checks are enabled. Not sure how deep the problem is so just reverting while the author can investigate so that the bots stop repeatedly failing and blaming things incorrectly. Will respond with details on the original commit. llvm-svn: 341365	2018-09-04 13:36:44 +00:00
Sanjay Patel	0945959869	[AArch64][x86] add tests for pow(x, 0.25); NFC Folds for this were proposed in D49306, but we decided the transform is better suited for the backend. llvm-svn: 341341	2018-09-03 22:11:47 +00:00
Sander de Smalen	6cab60fa06	Extend hasStoreToStackSlot with list of FI accesses. For instructions that spill/fill to and from multiple frame-indices in a single instruction, hasStoreToStackSlot and hasLoadFromStackSlot should return an array of accesses, rather than just the first encounter of such an access. This better describes FI accesses for AArch64 (paired) LDP/STP instructions. Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D51537 llvm-svn: 341301	2018-09-03 09:15:58 +00:00
Roman Lebedev	d7a6244475	[DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle inverted pattern Summary: A follow-up for D49266 / rL337166 + D49497 / rL338044. This is still the same pattern to check for the [lack of] signed truncation, but in this case the constants and the predicate are negated. https://rise4fun.com/Alive/BDV https://rise4fun.com/Alive/n7Z Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma, dmgreen Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51532 llvm-svn: 341287	2018-09-02 13:56:22 +00:00
Zhaoshi Zheng	f5297fb24b	[Constant Hoisting] Hoisting Constant GEP Expressions Leverage existing logic in constant hoisting pass to transform constant GEP expressions sharing the same base global variable. Multi-dimensional GEPs are rewritten into single-dimensional GEPs. Differential Revision: https://reviews.llvm.org/D51396 llvm-svn: 341269	2018-09-01 00:04:56 +00:00
Roman Lebedev	75c2961b76	[NFC][X86][AArch64] A few more patterns for [lack of] signed truncation check pattern. llvm-svn: 341188	2018-08-31 08:52:03 +00:00
Ties Stuij	9c16d809d2	[CodeGen] emit inline asm clobber list warnings for reserved (cont) Summary: This is a continuation of https://reviews.llvm.org/D49727 Below the original text, current changes in the comments: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } Compiled for thumb, this gives: $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. 
The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. If we find a reserved register, we print a warning: repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ Reviewers: efriedma, olista01, javed.absar Reviewed By: efriedma Subscribers: eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D51165 llvm-svn: 341062	2018-08-30 12:52:35 +00:00
David Green	1f203bcd75	[AArch64] Optimise load(adr address) to ldr address Providing that the load is known to be 4 byte aligned, we can optimise a ldr(adr address) to just ldr address. Differential Revision: https://reviews.llvm.org/D51030 llvm-svn: 341058	2018-08-30 11:55:16 +00:00
Roman Lebedev	26a1836757	[NFC][CodeGen][SelectionDAG] Tests for X % C == 0 codegen improvement. Hacker's Delight 10-17: when C is constant, the result of X % C == 0 can be computed more cheaply without actually calculating the remainder. The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479. Patch by: hermord (Dmytro Shynkevych)! For https://reviews.llvm.org/D50222 llvm-svn: 341047	2018-08-30 09:32:21 +00:00
Huihui Zhang	2f4106592d	[GlobalMerge] Fix GlobalMerge on bss external global variables. Summary: Global variables that are external and zero initialized are supposed to be merged with global variables in the bss section rather than the data section. Reviewers: efriedma, rengolin, t.p.northover, javed.absar, asl, john.brawn, pcc Reviewed By: efriedma Subscribers: dmgreen, llvm-commits Differential Revision: https://reviews.llvm.org/D51379 llvm-svn: 341008	2018-08-30 00:49:50 +00:00
Peter Collingbourne	9c9c8b22d2	Start reserving x18 by default on Android targets. Differential Revision: https://reviews.llvm.org/D45588 llvm-svn: 340889	2018-08-29 01:38:47 +00:00
Aditya Nandakumar	6b4d343e13	[GISel]: Add missing opcodes for overflow intrinsics https://reviews.llvm.org/D51197 Currently, IRTranslator (and GISel) seems to be arbitrarily picking which overflow intrinsics get mapped into opcodes which either have a carry as an input or not. For intrinsics such as Intrinsic::uadd_with_overflow, translate it to an opcode (G_UADDO) which doesn't have any carry inputs (similar to LLVM IR). This patch adds 4 missing opcodes for completeness - G_UADDO, G_USUBO, G_SSUBE and G_SADDE. llvm-svn: 340865	2018-08-28 18:54:10 +00:00
Eli Friedman	071203bbf2	[AArch64] Reject inline asm with FP registers when FP is disabled. Otherwise, we would crash trying to deal with an illegal input. Differential Revision: https://reviews.llvm.org/D51202 llvm-svn: 340637	2018-08-24 19:12:13 +00:00
Sanjay Patel	ed1b9695ee	[SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527) This solves the motivating case from: https://bugs.llvm.org/show_bug.cgi?id=38527 If we are legalizing an FP vector op that maps to 1 of the LLVM intrinsics that mimic libm calls, but we're going to end up with scalar libcalls for that vector type anyway, then we should unroll the vector op into scalars before widening. This avoids libcalls because we've lost the knowledge that some of the scalar elements are undef. Differential Revision: https://reviews.llvm.org/D50791 llvm-svn: 340469	2018-08-22 22:52:05 +00:00
David Green	9dd1d451d9	[AArch64] Add Tiny Code Model for AArch64 This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397	2018-08-22 11:31:39 +00:00
Aditya Nandakumar	2a08285cf3	Revert "Revert r339977: [GISel]: Add Opcodes for a few LLVM Intrinsics" This reverts commit 7debc334e6421bb5251ef8f18e97166dfc7dd787. I missed updating legalizer-info-validation.mir as I had assertions turned off in my build and that specific test requires asserts. Fixed it now. llvm-svn: 340197	2018-08-20 18:43:19 +00:00
Matt Arsenault	25e51540e1	DAG: Fix isKnownNeverNaN for basic non-sNaN cases fadd/fsub/fmul need to worry about infinities as well as fdiv. llvm-svn: 340085	2018-08-17 21:19:22 +00:00
Luke Cheeseman	64dcdec60c	[AArch64] - Generate pointer authentication instructions - Generate pointer authentication instructions - The functions instrumented depend on function attributes: all (all functions instrumented) non-leaf (only those that spill LR) none - Function epilogues sign the LR before spilling to the stack and authenticate the LR once restored - If the target is v8.3a or greater then it can use the combined authenticate and return instruction Differential revision: https://reviews.llvm.org/D49793 llvm-svn: 340018	2018-08-17 12:53:22 +00:00
Chandler Carruth	b898b86f49	Revert r339977: [GISel]: Add Opcodes for a few LLVM Intrinsics This is breaking ~all the bots. llvm-svn: 339982	2018-08-17 04:47:16 +00:00
Aditya Nandakumar	973a557338	[GISel]: Add Opcodes for a few LLVM Intrinsics https://reviews.llvm.org/D50401 Add opcodes for llvm.intrinsic.trunc, round, and update the IRTranslator for the same. Reviewed by: dsanders. llvm-svn: 339977	2018-08-17 01:41:56 +00:00
Eli Friedman	73e8a784e6	[SelectionDAG] Improve the legalisation lowering of UMULO. There is no way in the universe, that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway. This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations. Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in correctness of this algorithm is extremely high. Following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of original, so anything under 1 is “better” and anything above 1 is worse. +-------+-----------+-----------+-------------+-------------+ \| Arch \| u64u64 t \| u64u64 s \| u128u128 t \| u128u128 s \| +-------+-----------+-----------+-------------+-------------+ \| X64 \| - \| - \| ~0.5 \| ~0.64 \| \| i686 \| ~0.5 \| ~0.6666 \| ~0.05 \| ~0.9 \| \| armv7 \| - \| ~0.75 \| - \| ~1.4 \| +-------+-----------+-----------+-------------+-------------+ Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 (one Intel Haswell, other AMD Ryzen) based machines. Size numbers have been collected by looking at the size of function containing an overflowing multiply in a loop. All in all, it can be seen that both performance and size have improved except in the case of armv7 where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seems to benefit from this change a lot, taking only 5% of the time compared to original algorithm to calculate the same thing. 
The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible. Patch by Simonas Kazlauskas! Differential Revision: https://reviews.llvm.org/D50310 llvm-svn: 339922	2018-08-16 18:39:39 +00:00
Sanjay Patel	49a8280f43	[AArch64] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC These correspond to the x86 tests added with rL339790 / rL339791, but I widened the non-fsin tests to v3f32 to show the problem because AArch supports v2f32 ops. llvm-svn: 339793	2018-08-15 17:06:21 +00:00
Amara Emerson	30e61404a8	[GlobalISel][IRTranslator] Fix a bug in handling repeating struct types during argument lowering. Differential Revision: https://reviews.llvm.org/D49442 llvm-svn: 339674	2018-08-14 12:04:25 +00:00
Sanjay Patel	15d1501aae	[SelectionDAG] try harder to convert funnel shift to rotate Similar to rL337966 - if the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. AArch only goes right, and PPC only goes left. x86 has both, so no diffs there. Differential Revision: https://reviews.llvm.org/D50091 llvm-svn: 339359	2018-08-09 17:26:22 +00:00
Ties Stuij	0244aa67d6	revert tests of '[CodeGen] emit inline asm clobber list warnings for reserved' llvm-svn: 339276	2018-08-08 17:19:32 +00:00
Ties Stuij	52f3631f4b	[CodeGen] emit inline asm clobber list warnings for reserved Summary: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: ``` extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } ``` Compiled for thumb, this gives: ``` $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... ``` r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. 
If we find a reserved register, we print a warning: ``` repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ ``` Reviewers: eli.friedman, olista01, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49727 llvm-svn: 339257	2018-08-08 15:15:59 +00:00
Bryan Chan	e023706471	[AArch64] Fix assertion failure on widened f16 BUILD_VECTOR Summary: Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the same type. This fixes an assertion failure in VerifySDNode. Reviewers: SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50202 llvm-svn: 339013	2018-08-06 14:14:41 +00:00
Aditya Nandakumar	e07b3b737b	[GISel]: Add Opcodes for CTLZ/CTTZ/CTPOP https://reviews.llvm.org/D48600 Added IRTranslator support to translate these known intrinsics into GISel opcodes. llvm-svn: 338944	2018-08-04 01:22:12 +00:00
Alexander Ivchenko	49168f6778	[GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value This is logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685	2018-08-02 08:33:31 +00:00
Lei Liu	b9a7b7a84d	Fix FCOPYSIGN expansion In expansion of FCOPYSIGN, the shift node is missing when the two operands of FCOPYSIGN are of the same size. We should always generate shift node (if the required shift bit is not zero) to put the sign bit into the right position, regardless of the size of underlying types. Differential Revision: https://reviews.llvm.org/D49973 llvm-svn: 338665	2018-08-02 01:54:12 +00:00
Sanjay Patel	8aac22e06a	[SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type The bug is visible in the constant-folded x86 tests. We can't use the negated shift amount when the type is not power-of-2: https://rise4fun.com/Alive/US1r ...so in that case, use the regular lowering that includes a select to guard against a shift-by-bitwidth. This path is improved by only calculating the modulo shift amount once now. Also, improve the rotate (with power-of-2 size) lowering to use a negate rather than subtract from bitwidth. This improves the codegen whether we have a rotate instruction or not (although we can still see that we're not matching to a legal rotate in all cases). llvm-svn: 338592	2018-08-01 17:17:08 +00:00
Bryan Chan	67106b5e08	[AArch64] Fix FCCMP with FP16 operands Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection. Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50115 llvm-svn: 338554	2018-08-01 13:50:29 +00:00
Amara Emerson	6cdfe29d8e	[GlobalISel][IRTranslator] Use RPO traversal when visiting blocks to translate. Previously we were just visiting the blocks in the function in IR order, which is rather arbitrary. Therefore we wouldn't always visit defs before uses, but the translation code relies on this assumption in some places. Only codegen change seen in tests is an elision of a redundant copy. Fixes PR38396 llvm-svn: 338476	2018-08-01 02:17:42 +00:00
Amara Emerson	1e8c164c63	[AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR. Also refactors some existing code to materialize addresses for the large code model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR. This implements PR36390. Differential Revision: https://reviews.llvm.org/D49903 llvm-svn: 338337	2018-07-31 00:09:02 +00:00
Amara Emerson	0e86c07077	[AArch64][GlobalISel] Make G_BLOCK_ADDR legal. Differential Revision: https://reviews.llvm.org/D49902 llvm-svn: 338336	2018-07-31 00:08:56 +00:00
Amara Emerson	6aff5a7810	[GlobalISel] Add a G_BLOCK_ADDR opcode to handle IR blockaddress constants. Differential Revision: https://reviews.llvm.org/D49900 llvm-svn: 338335	2018-07-31 00:08:50 +00:00
Sanjay Patel	9f807f44b1	[DAGCombiner] transform sub-of-shifted-signbit to add This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317	2018-07-30 22:21:37 +00:00
Jessica Paquette	fa3bee4756	[MachineOutliner][AArch64] Add support for saving LR to a register This teaches the outliner to save LR to a register rather than the stack when possible. This allows us to avoid bumping the stack in outlined functions in some cases. By doing this, in a later patch, we can teach the outliner to do something like this: f1: ... bl OUTLINED_FUNCTION ... f2: ... move LR's contents to a register bl OUTLINED_FUNCTION move the register's contents back instead of falling back to saving LR in both cases. llvm-svn: 338278	2018-07-30 17:45:28 +00:00
Jessica Paquette	bbcc8895bb	Add machine verifier to arm64-opt-remarks-lazy-bfi Previously, I thought this was a Windows failure. Then I realized it failed on every bot that used the verifier. This makes it use the verifier always, and adds that pass to the pipeline checks so that it's consistent across all bots. llvm-svn: 338272	2018-07-30 17:13:25 +00:00
David Bolvansky	2fa7fb14ea	[DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed Summary: Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op. Patch by: sameconrad (Sam Conrad) Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar Reviewed By: lebedev.ri Subscribers: efriedma, kparzysz, llvm-commits Differential Revision: https://reviews.llvm.org/D47681 llvm-svn: 338270	2018-07-30 16:50:00 +00:00
Jessica Paquette	7816531f3c	Attempt to fix Windows test failure caused by r338133 It seems like the pass pipeline on Windows is slightly different than on Linux and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing. This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly again. It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT lines following the line we care about (the optimization remark emitter) do a pretty good job of enforcing the ordering we want. Hopefully this works, since I don't have a Windows machine. ;) Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295 llvm-svn: 338267	2018-07-30 16:36:22 +00:00
Craig Topper	50b1d4303d	[DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B) This can be useful since addition is commutable, and subtraction is not. This matches a transform that is also done by InstCombine. llvm-svn: 338181	2018-07-28 00:27:25 +00:00
Jessica Paquette	f90edbe3d6	Recommit "Enable MachineOutliner by default under -Oz for AArch64" Fixed the ASAN failure from before in r338148, so recommitting. This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338160	2018-07-27 20:18:27 +00:00
Sanjay Patel	06c7d5aef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC The tests with a constant sub operand were added with rL338143, but the potential transform doesn't have that requirement, so adding more tests with variable operands. llvm-svn: 338150	2018-07-27 18:31:21 +00:00
Sanjay Patel	efac39eef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC llvm-svn: 338143	2018-07-27 18:12:29 +00:00
Jessica Paquette	faea2d3130	Revert "Enable MachineOutliner by default under -Oz for AArch64" It failed an Asan test on a bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio Fixing that before recommitting. llvm-svn: 338136	2018-07-27 17:25:38 +00:00
Jessica Paquette	d4229b985c	Enable MachineOutliner by default under -Oz for AArch64 This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338133	2018-07-27 16:44:42 +00:00
Sanjay Patel	c7abb416dc	[DAGCombiner] fold 'not' with signbit math This is a follow-up suggested in D48970. Alive proofs: https://rise4fun.com/Alive/sII We can eliminate an instruction in the usual select-of-constants to bit hack transform by adjusting the add/sub with constant. This is always a win. There are more transforms that are likely wins, but they may need target hooks in case some targets do not benefit. This is another step towards making up for canonicalizing to select-of-constants in rL331486. llvm-svn: 338132	2018-07-27 16:42:55 +00:00
Sanjay Patel	f815bc658b	[AArch64] add more tests for signbit math; NFC llvm-svn: 338129	2018-07-27 16:21:56 +00:00
Matthias Braun	09810c9269	MacroFusion: Fix macro fusion with ExitSU failing in top-down scheduling When fusing instructions A and B, we must add all predecessors of B as predecessors of A to avoid instructions getting scheduled in between. There is a special case involving ExitSU: Every other node must be scheduled before it by design and we don't need to make this explicit in the graph, however when fusing with a different node we need to schedule every other node before the fused node too and we need to make this explicit now: This patch adds a dependency from the fused node to all roots in the graph. Differential Revision: https://reviews.llvm.org/D49830 llvm-svn: 338046	2018-07-26 17:43:56 +00:00
Roman Lebedev	41ba5c1455	[DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes. Summary: A follow-up for D49266 / rL337166. At least one of these cases is more canonical, so we really do have to handle it. https://godbolt.org/g/pkzP3X https://rise4fun.com/Alive/pQyhZZ We won't get to these cases with I1 being -1, as that will be constant-folded to true or false. I'm also not sure we actually hit the 'ule' case, but I think the worst thing that could happen is that being dead code. Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49497 llvm-svn: 338044	2018-07-26 17:34:28 +00:00
Martin Storsjo	9dafd6f6d9	Revert "[COFF] Use comdat shared constants for MinGW as well" This reverts commit r337951. While that kind of shared constant generally works fine in a MinGW setting, it broke some cases of inline assembly that worked before: $ cat const-asm.c int MULH(int a, int b) { int rt, dummy; __asm__ ( "imull %3" :"=d"(rt), "=a"(dummy) :"a"(a), "rm"(b) ); return rt; } int func(int a) { return MULH(a, 1); } $ clang -target x86_64-win32-gnu -c const-asm.c -O2 const-asm.c:4:9: error: invalid variant '00000001' "imull %3" ^ <inline asm>:1:15: note: instantiated into assembly here imull __real@00000001(%rip) ^ A similar error is produced for i686 as well. The same test with a target of x86_64-win32-msvc or i686-win32-msvc works fine. llvm-svn: 338018	2018-07-26 10:48:20 +00:00
Amara Emerson	fdd089aa14	[GlobalISel] Fall back to SDISel for swifterror/swiftself attributes. We don't currently support these, fall back until we do. llvm-svn: 337994	2018-07-26 01:25:58 +00:00
Sanjay Patel	215dcbf4db	[SelectionDAG] try to convert funnel shift directly to rotate if legal If the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. This sidesteps the issue of custom lowering for rotates raised in PR38243: https://bugs.llvm.org/show_bug.cgi?id=38243 ...by only dealing with legal operations. llvm-svn: 337966	2018-07-25 21:38:30 +00:00
Sanjay Patel	f94c4c84e6	[AArch, PowerPC] add more tests for legal rotate ops; NFC llvm-svn: 337964	2018-07-25 21:25:50 +00:00
Martin Storsjo	ff33a95ed4	[COFF] Use comdat shared constants for MinGW as well GNU binutils tools have no problems with this kind of shared constants, provided that we actually hook it up completely in AsmPrinter and produce a global symbol. This effectively reverts SVN r335918 by hooking the rest of it up properly. This feature was implemented originally in SVN r213006, with no reason for why it can't be used for MinGW other than the fact that GCC doesn't do it while MSVC does. Differential Revision: https://reviews.llvm.org/D49646 llvm-svn: 337951	2018-07-25 18:35:42 +00:00
Martin Storsjo	d2662c32fb	[COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter In SVN r334523, the first half of comdat constant pool handling was hoisted from X86WindowsTargetObjectFile (which despite the name only was used for msvc targets) into the arch independent TargetLoweringObjectFileCOFF, but the other half of the handling was left behind in X86AsmPrinter::GetCPISymbol. With only half of the handling in place, inconsistent comdat sections/symbols are created, causing issues with both GNU binutils (avoided for X86 in SVN r335918) and with the MS linker, which would complain like this: fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4 Differential Revision: https://reviews.llvm.org/D49644 llvm-svn: 337950	2018-07-25 18:35:31 +00:00
Martin Storsjo	c2b701408e	[AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes This matches the structure used on X86 and ARM. This requires a little bit of duplication of the parts that are equal in both AArch64 COFF variants though. Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF didn't, but now they do. This makes AArch64 match X86 in how comdat is used for float constants for MinGW. Differential Revision: https://reviews.llvm.org/D49637 llvm-svn: 337755	2018-07-23 22:15:14 +00:00
Simon Pilgrim	bfb900d363	[DAGCombiner] Add rotate-extract tests Add new tests from D47681 to current codegen. Also added i686 codegen tests. llvm-svn: 337445	2018-07-19 09:27:34 +00:00
Roman Lebedev	5317e88300	[NFC][X86][AArch64][DAGCombine] More tests for optimizeSetCCOfSignedTruncationCheck() At least one of these cases is more canonical, so we really do have to handle it. https://godbolt.org/g/pkzP3X https://rise4fun.com/Alive/pQyh llvm-svn: 337400	2018-07-18 16:19:06 +00:00
Sanjay Patel	c71adc8040	[Intrinsics] define funnel shift IR intrinsics + DAG builder support As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" operation which may also be a single hardware instruction. Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working on that (much larger patch), but then I concluded that we don't need it. At least as a first step, we have all of the backend support necessary to match these ops...because it was required. And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are likely all that we'll ever need. There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes (along with improving the oversized shift documentation). Again, I don't think that's strictly necessary (as the test results here prove). That can be an efficiency improvement as a small follow-up patch. So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. Differential Revision: https://reviews.llvm.org/D49242 llvm-svn: 337221	2018-07-16 22:59:31 +00:00
Roman Lebedev	de506632aa	[X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal pattern does not lower efficiently, so we want to undo it. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here. We always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166	2018-07-16 12:44:10 +00:00
Sanjay Patel	a41c886c55	[DAGCombiner] extend(ifpositive(X)) -> shift-right (not X) This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction. Alive proofs: https://rise4fun.com/Alive/ysli Name: if pos, get -1 %c = icmp sgt i16 %x, -1 %r = sext i1 %c to i16 => %n = xor i16 %x, -1 %r = ashr i16 %n, 15 Name: if pos, get 1 %c = icmp sgt i16 %x, -1 %r = zext i1 %c to i16 => %n = xor i16 %x, -1 %r = lshr i16 %n, 15 Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130	2018-07-15 16:27:07 +00:00
Roman Lebedev	b64e74feed	[NFC][X86][AArch64] Negative tests for 'check for [no] signed truncation' pattern See D49247, D49266 I'm only adding the sane negative tests, and not adding the one-use tests yet. Also, not adding negative tests for the second pattern with inverted operands yet, since its handling will be added in a later differential. llvm-svn: 337014	2018-07-13 16:14:37 +00:00
Simon Pilgrim	9fe0bf3be7	[AArch64] Updated bigendian buildvector tests As suggested by @efriedma on D49262 - changed the extractelement to a store to prevent SimplifyDemandedVectorElts from simplifying the build vectors - this keeps the immediate generation which was the point of the tests. llvm-svn: 336981	2018-07-13 09:25:32 +00:00
Roman Lebedev	1574e49792	[NFC][X86][AArch64] Add tests for the 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But as it looks from these tests, i think we want to revert at least some cases in DAGCombine. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49247 llvm-svn: 336917	2018-07-12 17:00:11 +00:00
Joel E. Denny	9fa9c9368d	[FileCheck] Add -allow-deprecated-dag-overlap to failing llvm tests See https://reviews.llvm.org/D47106 for details. Reviewed By: probinson Differential Revision: https://reviews.llvm.org/D47171 This commit drops that patch's changes to: llvm/test/CodeGen/NVPTX/f16x2-instructions.ll llvm/test/CodeGen/NVPTX/param-load-store.ll For some reason, the dos line endings there prevent me from committing via the monorepo. A follow-up commit (not via the monorepo) will finish the patch. llvm-svn: 336843	2018-07-11 20:25:49 +00:00
Simon Pilgrim	f6ff75c4c2	Fix check-prefix vs check-prefixes typo in updated test llvm-svn: 336787	2018-07-11 10:42:51 +00:00
Simon Pilgrim	1975efe555	[AArch64] Regenerate SDIV tests Will make codegen diffs much easier to grok in a future patch llvm-svn: 336786	2018-07-11 10:39:50 +00:00
Daniel Sanders	9481399c0f	[globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchg Summary: This patch adds support for the atomicrmw instructions and the strong cmpxchg instruction to the IRTranslator. I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what difference it makes to the backend. As far as I can tell from the code, it only matters to AtomicExpandPass which is run at the LLVM-IR level. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D40092 llvm-svn: 336589	2018-07-09 19:33:40 +00:00
Yvan Roux	7382cf8225	[MachineOutliner] Add missing liveness tracking info in MIR test. This should bring the bots back to green state. llvm-svn: 336482	2018-07-07 08:42:31 +00:00
Nico Weber	038dbf3c24	Revert 336426 (and follow-ups 428, 440), it very likely caused PR38084. llvm-svn: 336453	2018-07-06 17:37:24 +00:00
Diogo N. Sampaio	81e9dd1ed7	Commit rL336426 cause buildbot failures http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/50537/testReport/junit/LLVM/CodeGen_AArch64/FoldRedundantShiftedMasking_ll/ This removes the comments of the function label causing this error. llvm-svn: 336440	2018-07-06 14:41:09 +00:00
Diogo N. Sampaio	742bf1a255	[SelectionDAG] https://reviews.llvm.org/D48278 D48278 Allow to reduce redundant shift masks. For example: x1 = x & 0xAB00 x2 = (x >> 8) & 0xAB can be reduced to: x1 = x & 0xAB00 x2 = x1 >> 8 It only allows folding when the masks and shift values are constants. llvm-svn: 336426	2018-07-06 09:42:25 +00:00
Sanjay Patel	bce899ff59	[AArch64, PowerPC, x86] add tests for signbit bit hacks; NFC llvm-svn: 336348	2018-07-05 13:16:46 +00:00
Mikael Holmen	8505f34b29	Partial revert of "NFC - Various typo fixes in tests" This partially reverts r336268 since it causes buildbot failures. Added FIXME at the places where the CHECKs are misspelled. llvm-svn: 336323	2018-07-05 08:42:16 +00:00
Gabor Buella	da4a966e1c	NFC - Various typo fixes in tests llvm-svn: 336268	2018-07-04 13:28:39 +00:00
Amara Emerson	d912ffaba5	[AArch64][GlobalISel] Fix fallbacks introduced in r336120 due to unselectable stores. r336120 resulted in falling back to SelectionDAG more often due to the G_STORE MMOs not matching the vreg size. This fixes that by explicitly any-extending the value. llvm-svn: 336209	2018-07-03 15:59:26 +00:00
Amara Emerson	846f2436e8	[AArch64][GlobalISel] Any-extend vararg parameters to stack slot size on Darwin. We currently don't any-extend vararg parameters before storing them to the stack locations on Darwin. However, SelectionDAG does this, and so user code is in the wild which inadvertently relies on this extension. This can manifest in cases where the value stored is (int)0, but the actual parameter is interpreted by va_arg as a pointer, and so not extending to 64 bits causes the callee to load additional undefined bits. llvm-svn: 336120	2018-07-02 16:39:09 +00:00
Jessica Paquette	8bda1881ca	[MachineOutliner] Add support for target-default outlining. This adds functionality to the outliner that allows targets to specify certain functions that should be outlined from by default. If a target supports default outlining, then it specifies that in its TargetOptions. In the case that it does, and the user hasn't specified that they never want to outline, the outliner will be added to the pass pipeline and will run on those default functions. This is a preliminary patch for turning the outliner on by default under -Oz for AArch64. https://reviews.llvm.org/D48776 llvm-svn: 336040	2018-06-30 03:56:03 +00:00
Jessica Paquette	79917b9686	[MachineOutliner] Add always and never options to -enable-machine-outliner This is a recommit of r335887, which was erroneously committed earlier. To enable the MachineOutliner by default on AArch64, we need to be able to disable the MachineOutliner and also provide an option to "always" enable the outliner. This adds that capability. It allows the user to still use the old -enable-machine-outliner option, which defaults to "always". This is building up to allowing the user to specify "always" versus the target default outlining behaviour. https://reviews.llvm.org/D48682 llvm-svn: 335986	2018-06-29 16:12:45 +00:00
Jessica Paquette	0c5d3ffbb8	[MachineOutliner] Never add the outliner in -O0 This is a recommit of r335879. We shouldn't add the outliner when compiling at -O0 even if -enable-machine-outliner is passed in. This makes sure that we don't add it in this case. This also removes -O0 from the outliner DWARF test. llvm-svn: 335930	2018-06-28 21:49:24 +00:00
Jessica Paquette	d6261bef7b	Revert "[MachineOutliner] Add always and never options to -enable-machine-outliner" I accidentally committed this instead of D48683 because I haven't had coffee yet. llvm-svn: 335883	2018-06-28 17:26:19 +00:00
Jessica Paquette	f3a44fe833	Revert "[MachineOutliner] Never add the outliner in -O0" This reverts commit 9c7c10e4073a0bc6a759ce5cd33afbac74930091. It relies on r335872 since that introduces the machine outliner flags test. I meant to commit D48683 in that commit, but got mixed up and committed D48682 instead. So, I'm reverting this and r335872, since D48682 hasn't made it through review yet. llvm-svn: 335882	2018-06-28 17:26:18 +00:00
Jessica Paquette	c9d675266e	[MachineOutliner] Never add the outliner in -O0 We shouldn't add the outliner when compiling at -O0 even if -enable-machine-outliner is passed in. This makes sure that we don't add it in this case. This also updates machine-outliner-flags to reflect the change and improves the comment describing what that test does. llvm-svn: 335879	2018-06-28 17:05:57 +00:00
Jessica Paquette	1ccb66c5fb	[MachineOutliner] Add always and never options to -enable-machine-outliner To enable the MachineOutliner by default on AArch64, we need to be able to disable the MachineOutliner and also provide an option to "always" enable the outliner. This adds that capability. It allows the user to still use the old -enable-machine-outliner option, which defaults to "always". This is building up to allowing the user to specify "always" versus the target-default outlining behaviour. llvm-svn: 335872	2018-06-28 16:39:42 +00:00
Daniel Sanders	bdeb880d14	[globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in AArch64 Now that we have the ability to legalize based on MMO's. Add support for legalizing based on AtomicOrdering and use it to correct the legalization of the atomic instructions. Also extend all() to be a variadic template as this ruleset now requires 3 and 4 argument versions. llvm-svn: 335767	2018-06-27 19:03:21 +00:00
Sanjay Patel	d052de856d	[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761	2018-06-27 18:16:40 +00:00
Jessica Paquette	f472f6159a	[MachineOutliner] Don't outline sequences where x16/x17/nzcv are live across It isn't safe to outline sequences of instructions where x16/x17/nzcv are live across the sequence. This teaches the outliner to check whether or not a specific candidate has x16/x17/nzcv live across it and discard the candidate in the case that that is true. https://bugs.llvm.org/show_bug.cgi?id=37573 https://reviews.llvm.org/D47655 llvm-svn: 335758	2018-06-27 17:43:27 +00:00
Luke Geeson	316327150b	[AArch64] Reverting FP16 vcvth_n_s64_f16 to fix llvm-svn: 335737	2018-06-27 14:34:40 +00:00
Adhemerval Zanella	cadcfed7aa	[AArch64] Add custom lowering for v4i8 trunc store This patch adds a custom trunc store lowering for v4i8 vector types. Since there is no v.4b register, the v4i8 is promoted to v4i16 (v.4h) and the default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extend the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectors. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjusts the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735	2018-06-27 13:58:46 +00:00
Luke Geeson	68cb233c0f	[AArch64] Remove Duplicate FP16 Patterns with same encoding, match on existing patterns llvm-svn: 335715	2018-06-27 09:20:13 +00:00

2489 Commits