llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	08e082619a	[X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation To lower this we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we create a repeated lane shuffle of those new sources to create the final result. This fixes PR35833 Differential Revision: https://reviews.llvm.org/D41794 llvm-svn: 339818	2018-08-15 21:21:52 +00:00
Matt Arsenault	6c7ba82900	AMDGPU: Address todo for handling 1/(2 pi) llvm-svn: 339814	2018-08-15 21:03:55 +00:00
Vitaly Buka	ed4239f482	Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare" use-after-poison in check-llvm under asan This reverts commit r339755. llvm-svn: 339806	2018-08-15 20:09:35 +00:00
Sanjay Patel	49a8280f43	[AArch64] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC These correspond to the x86 tests added with rL339790 / rL339791, but I widened the non-fsin tests to v3f32 to show the problem because AArch supports v2f32 ops. llvm-svn: 339793	2018-08-15 17:06:21 +00:00
Krzysztof Parzyszek	3b097b4d3e	[RegisterCoalescer] Ensure that both registers have subranges if one does llvm-svn: 339792	2018-08-15 17:04:58 +00:00
Sanjay Patel	712d42f53d	[x86] add fabs test for vector intrinsic to potential libcall bug; NFC This is a negative test for x86 because it has custom lowering for fabs. llvm-svn: 339791	2018-08-15 16:56:09 +00:00
Sanjay Patel	f9afee479f	[x86] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC llvm-svn: 339790	2018-08-15 16:35:50 +00:00
Krzysztof Parzyszek	88d267d094	[RegisterCoalescer] Reset VNInfo def when copying segments over llvm-svn: 339788	2018-08-15 16:21:53 +00:00
Derek Schuff	82812fb986	[WebAssembly] SIMD replace_lane Implement and test replace_lane instructions. Patch by Thomas Lively Differential Revision: https://reviews.llvm.org/D50750 llvm-svn: 339786	2018-08-15 16:18:51 +00:00
Krzysztof Parzyszek	46ce441df6	[RegAlloc] Check that subreg liveness tracking applies to given virtual reg Subregister liveness applies selectively to register classes with certain properties. Make sure that when it's enabled, it applies to a given virtual register (in virtual register rewriter). llvm-svn: 339784	2018-08-15 16:07:47 +00:00
Krzysztof Parzyszek	4e06beb820	[SystemZ] Add testcase for r339778 llvm-svn: 339780	2018-08-15 15:43:13 +00:00
Nemanja Ivanovic	5b9a4f8ee5	[PowerPC] Enhance the selection (ISD::VSELECT) of vector types Make ISD::VSELECT available (legal) as long as Altivec instructions are present; otherwise its default behavior is expansion. Use xxsel to match vselect if VSX is enabled, or vsel otherwise. To avoid writing many patterns in the td file, promote (for vectors, this is a bitcast) all other types to v4i32 and only pattern-match vselect of v4i32 into vsel or xxsel. Patch by wuzish Differential revision: https://reviews.llvm.org/D49531 llvm-svn: 339779	2018-08-15 15:30:36 +00:00
Sam Parker	fabf7fe5f8	[ARM] TypeSize lower bound for ARMCodeGenPrepare We only try to promote types which are smaller than 16-bits, but we also need to check that the type is not less than 8-bits. Differential Revision: https://reviews.llvm.org/D50769 llvm-svn: 339770	2018-08-15 13:29:50 +00:00
Nemanja Ivanovic	8b4bd09e22	[PowerPC] Don't run BV DAG Combine before legalization if it assumes legal types When trying to combine a DAG that builds a vector out of sign-extensions of vector extracts, the code assumes legal input types. Due to that, we have to disable this combine prior to legalization. In some cases, the DAG will look slightly different after legalization so account for that in the matching code. This is a fix for https://bugs.llvm.org/show_bug.cgi?id=38087 Differential Revision: https://reviews.llvm.org/D49080 llvm-svn: 339769	2018-08-15 12:58:13 +00:00
Simon Pilgrim	51cee894da	[X86][SSE] Add sdiv by nonuniform constant vector tests Tests cover each TargetLowering::BuildSDIV path separately plus combos llvm-svn: 339761	2018-08-15 10:59:29 +00:00
Aleksandr Urakov	eb3735e425	[X86] Add sibling-call test cases This commit adds new sibling-call test cases, so it will be possible to see how these test cases will be changed after applying D45653. See D45653 for details. llvm-svn: 339760	2018-08-15 10:54:06 +00:00
Simon Pilgrim	a272fa9b0c	[TargetLowering] Add support for non-uniform vectors to BuildExactSDIV This patch refactors the existing BuildExactSDIV implementation to support non-uniform constant vector denominators. Differential Revision: https://reviews.llvm.org/D50392 llvm-svn: 339756	2018-08-15 09:35:12 +00:00
Sam Parker	6548cd3905	[ARM] Allow signed icmps in ARMCodeGenPrepare Treat signed icmps as 'sinks', allowing them to be in the use-def tree, enabling more promotions to be performed. As a sink, any promoted incoming values need to be truncated before being used by the signed icmp. Differential Revision: https://reviews.llvm.org/D50067 llvm-svn: 339755	2018-08-15 08:23:03 +00:00
Sam Parker	7def86bbdb	[ARM] Allow pointer values in ARMCodeGenPrepare Add pointers to the list of allowed types, but don't try to promote them. Also fixed a bug with the promotion of undef values, so a new value is now created instead of mutating in place. We also now only promote if there's an instruction in the use-def chains other than the icmp, sinks and sources. Differential Revision: https://reviews.llvm.org/D50054 llvm-svn: 339754	2018-08-15 07:52:35 +00:00
Derek Schuff	4ec8bca13e	[WebAssembly] SIMD Splats Implement and test SIMD splat ops. Patch by Thomas Lively Differential Revision: https://reviews.llvm.org/D50741 llvm-svn: 339744	2018-08-15 00:30:27 +00:00
Heejin Ahn	283e1c11bd	[WebAssembly] Delete a specific push number from test expectations Summary: This shouldn't have been a specific number but rather a regex. This was a part of rL339474 which got reverted. Reviewers: aardappel Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50728 llvm-svn: 339736	2018-08-14 22:14:51 +00:00
Cameron McInally	00b0658aae	[FPEnv] Scalarize StrictFP vector operations Add a helper function to scalarize constrained FP operations as needed. Differential Revision: https://reviews.llvm.org/D50720 llvm-svn: 339735	2018-08-14 22:13:11 +00:00
Heejin Ahn	c15a87848b	[WebAssembly] SIMD encoding tests Modifies existing SIMD tests to also check that SIMD instructions are lowered to the expected bytes. This CL depends on D50597. Reviewers: aheejin Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50660 Patch by Thomas Lively (tlively) llvm-svn: 339712	2018-08-14 19:10:50 +00:00
Heejin Ahn	a0fd9c3e9a	[WebAssembly] SIMD extract_lane Implement instruction selection for all versions of the extract_lane instruction. Use explicit sext/zext to differentiate between extract_lane_s and extract_lane_u for applicable types, otherwise default to extract_lane_u. Reviewers: aheejin Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50597 Patch by Thomas Lively (tlively) llvm-svn: 339707	2018-08-14 18:53:27 +00:00
Simon Pilgrim	2ce3d6e135	[X86][SSE] Avoid duplicate shuffle input sources in combineX86ShufflesRecursively rL339686 added the case where a faux shuffle might have repeated shuffle inputs coming from either side of the OR(). This patch improves the insertion of the inputs into the source ops lists to account for this, as well as making it trivial to add support for shuffles with more than 2 inputs in the future. llvm-svn: 339696	2018-08-14 17:22:37 +00:00
Simon Pilgrim	ed55138247	[X86][SSE] Add shuffle combine support for OR(PSHUFB,PSHUFB) style patterns. If each element is zero from one (or both) inputs then we can combine these into a single shuffle mask. llvm-svn: 339686	2018-08-14 16:00:05 +00:00
Simon Pilgrim	52c88a7c0e	[X86][SSE] Add shuffle combine tests for OR(PSHUFB,PSHUFB) style patterns. We generate these shuffle patterns but we fail to combine them. llvm-svn: 339684	2018-08-14 15:21:26 +00:00
Amara Emerson	30e61404a8	[GlobalISel][IRTranslator] Fix a bug in handling repeating struct types during argument lowering. Differential Revision: https://reviews.llvm.org/D49442 llvm-svn: 339674	2018-08-14 12:04:25 +00:00
Tomasz Krupa	86a63889f3	[X86] Lowering addus/subus intrinsics to native IR Summary: This revision improves previous version (rL330322) which has been reverted due to crashes. This is the patch that lowers x86 intrinsics to native IR in order to enable optimizations. The patch also includes folding of previously missing saturation patterns so that IR emits the same machine instructions as the intrinsics. Reviewers: craig.topper, spatel, RKSimon Reviewed By: craig.topper Subscribers: mike.dvoretsky, DavidKreitzer, sroland, llvm-commits Differential Revision: https://reviews.llvm.org/D46179 llvm-svn: 339650	2018-08-14 08:00:56 +00:00
Wouter van Oortmerssen	a7be375586	Revert "[WebAssembly] Added default stack-only instruction mode for MC." This reverts commit 917a99b71ce21c975be7bfbf66f4040f965d9f3c. llvm-svn: 339630	2018-08-13 23:12:49 +00:00
Scott Linder	35213793bc	[CodeGen] Fix assert in SelectionDAG::computeKnownBits Fix SelectionDAG::computeKnownBits asserting when handling EXTRACT_SUBVECTOR when zero extending the demanded elements mask if it is already as long as the source vector. Differential Revision: https://reviews.llvm.org/D49574 llvm-svn: 339600	2018-08-13 18:44:21 +00:00
Daniel Cederman	dc3e4c6d95	Revert "[Sparc] Add support for the cycle counter available in GR740" It breaks when using EXPENSIVE_CHECKS with the error message "Bad machine code: Using an undefined physical register". llvm-svn: 339570	2018-08-13 14:18:09 +00:00
Simon Pilgrim	4aaf48013d	[X86] Add tests showing missing div/rem 0, X -> 0 combines llvm-svn: 339562	2018-08-13 13:29:54 +00:00
Simon Pilgrim	ee82a79041	[CGP] Fix GEP issue with out of range APInt constant values not fitting in int64_t Test case reduced from https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=7173 llvm-svn: 339556	2018-08-13 12:10:09 +00:00
Daniel Cederman	1bfbc62022	[Sparc] Add support for the cycle counter available in GR740 Summary: The GR740 provides an up cycle counter in the registers ASR22 and ASR23. As these registers can not be read together atomically we only use the value of ASR23 for llvm.readcyclecounter(). The ASR23 register holds the 32 LSBs of the up-counter. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48638 llvm-svn: 339551	2018-08-13 10:49:48 +00:00
Luke Geeson	4ce41d2bb7	[ARM] Added FP16 VREV Vector Intrinsic CodeGen support llvm-svn: 339546	2018-08-13 08:37:41 +00:00
Craig Topper	cacf12a149	[SelectionDAG] In PromoteFloatOp_BITCAST, insert a bitcast after the fp_to_fp16 in case the result type isn't a scalar integer. This is another variation of PR38533. In this case, the result type of the bitcast is legal and 16-bits wide, but not a scalar integer. So we need to emit the convert to i16 and then bitcast it to the true result type. This new bitcast will be further type legalized if necessary. llvm-svn: 339536	2018-08-13 06:53:49 +00:00
Craig Topper	e42a159537	[SelectionDAG] In PromoteIntRes_BITCAST, when the input is TypePromoteFloat, make sure the output type is scalar. For vectors, use a store and load of temporary. Previously if the result type was a vector, we emitted a FP_TO_FP16 with a vector result type which isn't valid. This is basically the opposite case of the root cause of PR38533. llvm-svn: 339535	2018-08-13 06:53:47 +00:00
Lei Liu	901a0a9588	Restore correct x86_64 EH encodings in kernel code model Fixes PR37524. The exception handling encodings for x86_64 in kernel code model has been changed with r309884. Restore it to correct ones. These encodings include PersonalityEncoding, LSDAEncoding and TTypeEncoding. Differential Revision: https://reviews.llvm.org/D50490 llvm-svn: 339534	2018-08-13 06:06:53 +00:00
Craig Topper	42e32117bb	[SelectionDAG] In PromoteFloatRes_BITCAST, insert a bitcast before the fp16_to_fp in case the input type isn't an i16. The bitcast can be further legalized as needed. Fixes PR38533. llvm-svn: 339533	2018-08-13 05:26:49 +00:00
Matt Arsenault	3763f307bd	AMDGPU: Cleanup min/max legacy tests Also add some more tests in preparation for a future patch. llvm-svn: 339526	2018-08-12 19:29:53 +00:00
Matt Arsenault	1201301b94	DAG: Check no-signed-zeros instead of unsafe-fp-math Addresses fixme, although this should still be checking individual operand flags. llvm-svn: 339525	2018-08-12 19:09:12 +00:00
Matt Arsenault	13b0db9285	AMDGPU: Check NSZ MI flag when folding omod I'm not sure the exact nsz flag combination that is OK. I think as long as it's on either, this is OK. For now just check it on the omod multiply. llvm-svn: 339513	2018-08-12 08:44:25 +00:00
Matt Arsenault	b5acec1f79	AMDGPU: Use splat vectors for undefs when folding canonicalize If one of the elements is undef, use the canonicalized constant from the other element instead of 0. Splat vectors are more useful for other optimizations, such as matching vector clamps. This was breaking on clamps of half3 from the undef 4th component. llvm-svn: 339512	2018-08-12 08:42:54 +00:00
Matt Arsenault	3ead7d7389	AMDGPU: Fix packing undef parts of build_vector llvm-svn: 339511	2018-08-12 08:42:46 +00:00
Craig Topper	570d47a010	[X86] Change the MOV32ri64 pseudo instruction to def a GR64 directly instead of wrapping it in a SUBREG_TO_REG. Now we switch to the subregister in expandPostRAPseudos where we already switched the opcode. This simplifies a few isel patterns that used the pseudo directly. And magically seems to have improved our ability to CSE it in the undef-label.ll test. llvm-svn: 339496	2018-08-11 05:33:00 +00:00
Tom Stellard	8adc86a7dc	AMDGPU/GlobalISel: Define instruction mapping for G_INSERT Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49625 llvm-svn: 339491	2018-08-11 00:51:54 +00:00
Wouter van Oortmerssen	ab26bd0647	[WebAssembly] Added default stack-only instruction mode for MC. Summary: Moved Explicit Locals pass to last. Made that pass obligatory. Made it convert from register to stack based instructions, and removed the registers. Fixes to related code that was expecting register based instructions. Added the correct testing flag to all tests, depending on what the format they were expecting so far. Translated one test to stack format as example: reg-stackify-stack.ll tested: llvm-lit -v `find test -name WebAssembly` unittests/MC/* Reviewers: dschuff, sunfish Subscribers: jfb, llvm-commits, aheejin, eraman, jgravelle-google, sbc100 Differential Revision: https://reviews.llvm.org/D50568 llvm-svn: 339474	2018-08-10 21:32:47 +00:00
Eli Friedman	e1687a89e8	[ARM] Adjust AND immediates to make them cheaper to select. LLVM normally prefers to minimize the number of bits set in an AND immediate, but that doesn't always match the available ARM instructions. In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer a two-instruction sequence movs+ands or movs+bics. Some potential improvements outlined in ARMTargetLowering::targetShrinkDemandedConstant, but seems to work pretty well already. The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX instruction due to a larger-than-expected mask. (It's orthogonal, in some sense, but as far as I can tell it's either impossible or nearly impossible to reproduce the bug without this change.) According to my testing, this seems to consistently improve codesize by a small amount by forming bic more often for ISD::AND with an immediate. Differential Revision: https://reviews.llvm.org/D50030 llvm-svn: 339472	2018-08-10 21:21:53 +00:00
Matt Arsenault	940e6075e4	AMDGPU: More canonicalized operations llvm-svn: 339464	2018-08-10 19:20:17 +00:00
Matt Arsenault	3dcf4ce435	AMDGPU: Combine and of seto/setuo and fp_class Clear the nan (or non-nan) test bits from the mask. llvm-svn: 339462	2018-08-10 18:58:56 +00:00
Matt Arsenault	8ad00d30fa	AMDGPU: Match isfinite pattern to class instructions llvm-svn: 339460	2018-08-10 18:58:41 +00:00
Sam Parker	8c4b964c5a	[ARM] Disallow zexts in ARMCodeGenPrepare Enabling ARMCodeGenPrepare by default caused a whole load of failures. This is due to zexts and truncs not being handled properly. ZExts are messy so it's just easier to disable for now and truncs are allowed only as 'sinks'. I still need to figure out why allowing them as 'sources' causes so many failures. The other main changes are that we are explicit in the types that we are converting to, it's now always 'TypeSize'. Type support is also now performed while checking for valid opcodes, as having the checks at different stages was unnecessarily complicated. I've moved the tests around too, so we have the zext and truncs in their own file as well as the overflowing opcode tests. Differential Revision: https://reviews.llvm.org/D50518 llvm-svn: 339432	2018-08-10 13:57:13 +00:00
Hans Wennborg	d4090be340	Rename the cfguard module flag to cfguardtable The previous name sounds like it inserts cfguard implementation, but it really just emits the table of address-taken functions. Change the name to better reflect that. Clang will be updated in the next commit. llvm-svn: 339419	2018-08-10 09:48:53 +00:00
Heejin Ahn	5831e9cc79	[WebAssembly] Gate i64x2 and f64x2 on -wasm-enable-unimplemented Summary: i64x2 and f64x2 operations are not implemented in V8, so we normally do not want to emit them. However, they are in the SIMD spec proposal, so we still want to be able to test them in the toolchain. This patch adds a flag to enable their emission. Reviewers: aheejin, dschuff Subscribers: sunfish, jgravelle-google, sbc100, llvm-commits Differential Revision: https://reviews.llvm.org/D50423 Patch by Thomas Lively (tlively) llvm-svn: 339407	2018-08-09 23:58:51 +00:00
Craig Topper	9a8136f7b4	[X86] Qualify one of the heuristics in combineMul to only apply to positive multiply amounts. This seems to slightly help the performance of one of our internal benchmarks. We probably need better heuristics here. llvm-svn: 339406	2018-08-09 23:27:42 +00:00
Krzysztof Parzyszek	75c2ca3638	[Hexagon] Map ISD::TRAP to J2_trap0(#0 ) llvm-svn: 339365	2018-08-09 18:03:45 +00:00
Sanjay Patel	15d1501aae	[SelectionDAG] try harder to convert funnel shift to rotate Similar to rL337966 - if the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. AArch only goes right, and PPC only goes left. x86 has both, so no diffs there. Differential Revision: https://reviews.llvm.org/D50091 llvm-svn: 339359	2018-08-09 17:26:22 +00:00
Michael Berg	ca38254601	extend folding fsub/fadd to fneg for FMF Summary: This change provides a common optimization path for both Unsafe and FMF driven optimization for this fsub fold adding reassociation, as it is the flag that most closely represents the translation Reviewers: spatel, wristow, arsenm Reviewed By: spatel Subscribers: wdng Differential Revision: https://reviews.llvm.org/D50195 llvm-svn: 339357	2018-08-09 17:00:03 +00:00
Evandro Menezes	9a92fe0c9e	[ARM] Replace processor check with feature Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a processor check. Otherwise, NFC. llvm-svn: 339354	2018-08-09 16:13:24 +00:00
Sjoerd Meijer	806f70d229	[ARM] FP16: codegen support for VTRN Differential Revision: https://reviews.llvm.org/D50454 llvm-svn: 339340	2018-08-09 12:45:09 +00:00
Simon Pilgrim	511c3fc529	[X86][SSE] Remove PMULDQ/PMULUDQ by zero Exposed by D50328 Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339337	2018-08-09 12:37:36 +00:00
Simon Pilgrim	01ae462fef	[X86][SSE] Combine (some) target shuffles with multiple uses As discussed on D41794, we have many cases where we fail to combine shuffles as the input operands have other uses. This patch permits these shuffles to be combined as long as they don't introduce additional variable shuffle masks, which should reduce instruction dependencies and allow the total number of shuffles to still drop without increasing the constant pool. However, this may mean that some memory folds may no longer occur, and on pre-AVX require the occasional extra register move. This also exposes some poor PMULDQ/PMULUDQ codegen which was doing unnecessary upper/lower calculations which will in fact fold to zero/undef - the fix will be added in a followup commit. Differential Revision: https://reviews.llvm.org/D50328 llvm-svn: 339335	2018-08-09 12:30:02 +00:00
Jonas Hahnfeld	20526bf483	[NVPTX] Select atomic loads and stores According to PTX ISA .volatile has the same memory synchronization semantics as .relaxed.sys, so it can be used to implement monotonic atomic loads and stores. This is important for OpenMP's atomic construct where - 'read's and 'write's are lowered to atomic loads and stores, and - an update of float or double types are lowered into a cmpxchg loop. (Note that PTX could do better because it has atom.add.f{32,64} but LLVM's atomicrmw instruction only allows integer types.) Higher levels of atomicity (like acquire and release) need additional synchronization properties which were added with PTX ISA 6.0 / sm_70. So using these instructions still results in an error. Differential Revision: https://reviews.llvm.org/D50391 llvm-svn: 339316	2018-08-09 07:45:49 +00:00
Sanjay Patel	f9a80fe87a	[x86] add test for commuted variant for fsub fold; NFC llvm-svn: 339300	2018-08-08 23:06:59 +00:00
Sanjay Patel	e47dc1a405	[DAGCombiner] loosen constraints for fsub+fadd fold isNegatibleForFree() should not matter here (as the test diffs show) because it's always a win to replace an fsub+fadd with fneg. The problem in D50195 persists because either (1) we are doing these folds in the wrong order or (2) we're missing another fold for fadd. llvm-svn: 339299	2018-08-08 23:04:43 +00:00
Petr Hosek	7b27454477	[ADT] Normalize empty triple components LLVM triple normalization is handling "unknown" and empty components differently; for example given "x86_64-unknown-linux-gnu" and "x86_64-linux-gnu" which should be equivalent, triple normalization returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's config.sub returns "x86_64-unknown-linux-gnu" for both "x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the triple normalization to behave the same way, replacing empty triple components with "unknown". This addresses PR37129. Differential Revision: https://reviews.llvm.org/D50219 llvm-svn: 339294	2018-08-08 22:23:57 +00:00
Sanjay Patel	f8937c8406	[x86] add tests for fsub+fadd with FMF; NFC These are related to the block of code under review in D50195. llvm-svn: 339293	2018-08-08 22:18:16 +00:00
Jonas Devlieghere	49ff4d9041	[DWARF] Unclamp line table version on Darwin for v5 and later. On Darwin we pin the DWARF line tables to version 2. Stop doing so for DWARF v5 and later. Differential revision: https://reviews.llvm.org/D49381 llvm-svn: 339288	2018-08-08 21:16:50 +00:00
Eli Friedman	5b45a39056	[ARM] Avoid spilling lr with Thumb1 tail calls. Normally, if any registers are spilled, we prefer to spill lr on Thumb1 so we can fold the "bx lr" into the "pop". However, if there are tail calls involved, restoring lr is expensive, so skip the optimization in that case. The spill of r7 in the new test also isn't necessary, but that's mostly orthogonal to this patch. (It's the same code in ARMFrameLowering, but it's not related to tail calls.) Differential Revision: https://reviews.llvm.org/D49459 llvm-svn: 339283	2018-08-08 20:03:10 +00:00
Ties Stuij	0244aa67d6	revert tests of '[CodeGen] emit inline asm clobber list warnings for reserved' llvm-svn: 339276	2018-08-08 17:19:32 +00:00
Krzysztof Parzyszek	1df7059150	[Hexagon] Diagnose misaligned absolute loads and stores Differential Revision: https://reviews.llvm.org/D50405 llvm-svn: 339272	2018-08-08 17:00:09 +00:00
Matt Arsenault	935f3b70fe	AMDGPU: Error more gracefully on libcalls I think this is the only situation where the callsite will have a null instruction. llvm-svn: 339271	2018-08-08 16:58:39 +00:00
Matt Arsenault	e719139b10	AMDGPU: Fix shifts for i128 llvm-svn: 339270	2018-08-08 16:58:33 +00:00
Zaara Syeda	b2595b988b	[PowerPC] Improve codegen for vector loads using scalar_to_vector This patch aims to improve the codegen for vector loads involving the scalar_to_vector (load X) sequence. Initially, ld->mv instructions were used for scalar_to_vector (load X), so this patch allows scalar_to_vector (load X) to utilize: LXSD and LXSDX for i64 and f64 LXSIWAX for i32 (sign extension to i64) LXSIWZX for i32 and f64 Committing on behalf of Amy Kwan. Differential Revision: https://reviews.llvm.org/D48950 llvm-svn: 339260	2018-08-08 15:20:43 +00:00
Ties Stuij	52f3631f4b	[CodeGen] emit inline asm clobber list warnings for reserved Summary: Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results. For example: ``` extern int bar(int[]); int foo(int i) { int a[i]; // VLA asm volatile( "mov r7, #1" : : : "r7" ); return 1 + bar(a); } ``` Compiled for thumb, this gives: ``` $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb ... foo: .fnstart @ %bb.0: @ %entry .save {r4, r5, r6, r7, lr} push {r4, r5, r6, r7, lr} .setfp r7, sp, #12 add r7, sp, #12 .pad #4 sub sp, #4 movs r1, #7 add.w r0, r1, r0, lsl #2 bic r0, r0, #7 sub.w r0, sp, r0 mov sp, r0 @APP mov.w r7, #1 @NO_APP bl bar adds r0, #1 sub.w r4, r7, #12 mov sp, r4 pop {r4, r5, r6, r7, pc} ... ``` r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block. This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now. The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter. 
If we find a reserved register, we print a warning: ``` repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm] "mov r7, #1" ^ ``` Reviewers: eli.friedman, olista01, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49727 llvm-svn: 339257	2018-08-08 15:15:59 +00:00
Simon Pilgrim	164e8b0b5c	[TargetLowering] BuildUDIV - Add support for divide by one (PR38477) Provide a pass-through of the numerator for divide by one cases - this is the same approach we take in DAGCombiner::visitSDIVLike. I investigated whether we could achieve this by magic MULHU/SRL values but nothing appeared to work as we don't have a way for MULHU(x,c) -> x llvm-svn: 339254	2018-08-08 14:51:19 +00:00
Sjoerd Meijer	1919ecfd0b	[ARM][NFC] Replaced tab-characters in test file vtrn.ll llvm-svn: 339251	2018-08-08 14:42:11 +00:00
Simon Pilgrim	9f5b8f093e	[X86][SSE] PR38477 test is more cleanly tested with udiv instead of urem Making the test use urem relies on it calling udiv-like combines, but the real issue is with the udiv so we're better off using that directly. llvm-svn: 339247	2018-08-08 14:11:44 +00:00
Sjoerd Meijer	f8c394f0f5	[ARM] FP16: codegen support for VEXT Differential Revision: https://reviews.llvm.org/D50427 llvm-svn: 339241	2018-08-08 13:26:38 +00:00
Sjoerd Meijer	db5908deb9	[ARM] FP16: vector vmov and vdup support This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50329 llvm-svn: 339238	2018-08-08 13:11:31 +00:00
Sjoerd Meijer	920a453485	[ARM] FP16: vector VMUL variants This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants. Differential Revision: https://reviews.llvm.org/D50326 llvm-svn: 339232	2018-08-08 10:27:34 +00:00
Simon Pilgrim	5477f11ba3	[X86][SSE] Add divide-by-one exact sdiv vector test Based on PR38477, we need to ensure we're testing for divide-by-one in non-uniform vectors llvm-svn: 339231	2018-08-08 10:16:43 +00:00
Simon Pilgrim	a10cfcc1db	[TargetLowering] BuildUDIV - Early out for divide by one (PR38477) We're not handling the UDIV by one special case properly - for now just early out. llvm-svn: 339229	2018-08-08 10:00:54 +00:00
Sjoerd Meijer	b33a4c02cc	[ARM] FP16: support vector INT_TO_FP and FP_TO_INT This adds codegen support for the different vcvt_f16 variants. Differential Revision: https://reviews.llvm.org/D50393 llvm-svn: 339227	2018-08-08 09:45:34 +00:00
Thomas Preud'homme	4107b31df2	Support inline asm with multiple 64bit output in 32bit GPR Summary: Extend fix for PR34170 to support inline assembly with multiple output operands that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR as in the PR). Reviewers: bogner, t.p.northover, lattner, javed.absar, efriedma Reviewed By: efriedma Subscribers: efriedma, tra, eraman, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45437 llvm-svn: 339225	2018-08-08 09:35:26 +00:00
Sjoerd Meijer	b264944ed5	[ARM] FP16: support the vector vmin and vmax variants Differential Revision: https://reviews.llvm.org/D50238 llvm-svn: 339221	2018-08-08 07:20:15 +00:00
Michael Berg	2e60ad2e58	[NFC] adding tests for Y - (X + Y) --> -X llvm-svn: 339197	2018-08-07 22:52:57 +00:00
Jan Vesely	7b2c98ab59	AMDGPU: Remove broken i16 ternary patterns Fixup test to check for GCN prefix These patterns always zero extend the result even though it might need sign extension. This has been broken since the addition of i16 support. It has popped up in mad_sat(char) test since min(max()) combination is turned into v_med3, resulting in the following (incorrect) sequence: v_mad_i16 v2, v10, v9, v11 v_med3_i32 v2, v2, v8, v7 Fixes mad_sat(char) piglit on VI. Differential Revision: https://reviews.llvm.org/D49836 llvm-svn: 339190	2018-08-07 21:54:37 +00:00
Derek Schuff	51ed131ed2	[WebAssembly] Update SIMD binary arithmetic Add missing SIMD types (v2f64) and binary ops. Also adds tablegen support for automatically prepending prefix byte to SIMD opcodes. Differential Revision: https://reviews.llvm.org/D50292 Patch by Thomas Lively llvm-svn: 339186	2018-08-07 21:24:01 +00:00
Krzysztof Parzyszek	e7ce247dd7	[Hexagon] Allow use of gather intrinsics even with no-packets Vgather must be in a packet with a store, which contradicts the no-packets feature. As a consequence, gather/scatter could not be used with no-packets. Relax this, and allow gather packets as exceptions to the no-packets requirements. llvm-svn: 339177	2018-08-07 20:33:47 +00:00
Heejin Ahn	7fb68d2679	[WebAssembly] CFG sort support for exception handling Summary: This patch extends CFGSort pass to support exception handling. Once it places a loop header, it does not place blocks that are not dominated by the loop header until all the loop blocks are sorted. This patch extends the same algorithm to exception 'catch' part, using the information calculated by WebAssemblyExceptionInfo class. Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, llvm-commits Differential Revision: https://reviews.llvm.org/D46500 llvm-svn: 339172	2018-08-07 20:19:23 +00:00
Craig Topper	49ed49fcb1	[SelectionDAG] When splitting scatter nodes during DAGCombine, create a serial chain dependency. Scatter could have multiple identical indices. We need to maintain sequential order. We get this right in LegalizeVectorTypes, but not in this code. Differential Revision: https://reviews.llvm.org/D50374 llvm-svn: 339157	2018-08-07 17:35:02 +00:00
Sjoerd Meijer	b39cd886b9	[ARM] FP16: codegen support for VACGT Differential Revision: https://reviews.llvm.org/D50236 llvm-svn: 339148	2018-08-07 15:11:47 +00:00
Aleksandar Beserminji	949a17c016	[mips] Handle branch expansion corner cases When potential jump instruction and target are in the same segment, use jump instruction with immediate field. In cases where offset does not fit immediate value of a bc/j instructions, offset is stored into register, and then jump register instruction is used. Differential Revision: https://reviews.llvm.org/D48019 llvm-svn: 339126	2018-08-07 10:45:45 +00:00
Simon Pilgrim	7e18938793	[TargetLowering] Add support for non-uniform vectors to BuildUDIV This patch refactors the existing TargetLowering::BuildUDIV base implementation to support non-uniform constant vector denominators. It also includes a fold for MULHU by pow2 constants to SRL which can now more readily occur from BuildUDIV. Differential Revision: https://reviews.llvm.org/D49248 llvm-svn: 339121	2018-08-07 09:51:34 +00:00
Simon Pilgrim	974a5a7d94	[X86][SSE] Add more non-uniform exact sdiv vector tests covering all/none ashr paths llvm-svn: 339120	2018-08-07 09:31:22 +00:00
Sjoerd Meijer	a2ddddfd3e	[ARM][NFC] Replaced tab characters in test file vfcmp.ll. llvm-svn: 339111	2018-08-07 08:05:15 +00:00
Heejin Ahn	e8653bb89a	[WebAssembly] Enable atomic expansion for unsupported atomicrmws Summary: Wasm does not have direct counterparts to some of LLVM IR's atomicrmw instructions (min, max, umin, umax, and nand). This enables atomic expansion using cmpxchg instruction within a loop for those atomicrmw instructions. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49440 llvm-svn: 339084	2018-08-07 00:22:22 +00:00
Matt Arsenault	08f3fe4fae	AMDGPU: cvt_pk_rtz_f16 canonicalizes llvm-svn: 339078	2018-08-06 23:01:31 +00:00
Matt Arsenault	e94ee833f9	AMDGPU: Handle some vector operations in isCanonicalized llvm-svn: 339077	2018-08-06 22:45:51 +00:00
Matt Arsenault	a29e76244a	AMDGPU: Push fcanonicalize through partially constant build_vector This usually avoids some re-packing code, and may help find canonical sources. llvm-svn: 339072	2018-08-06 22:30:44 +00:00
Matt Arsenault	d49ab0b214	AMDGPU: Treat more custom operations as canonicalizing Everything should quiet, and I think everything should flush. I assume the min3/med3/max3 follow the same rules as regular min/max for flushing, which should at least be conservatively correct. There are still more operations that need to be handled. llvm-svn: 339065	2018-08-06 21:58:11 +00:00
Matt Arsenault	ce6d61fba8	AMDGPU: Conversions always produce canonical results Not sure why this was checking for denormals for f16. My interpretation of the IEEE standard is conversions should produce a canonical result, and the ISA manual says denormals are created when appropriate. llvm-svn: 339064	2018-08-06 21:51:52 +00:00
Matt Arsenault	f8768bfc84	AMDGPU: Fix implementation of isCanonicalized If denormals are enabled, denormals are canonical. Also fix a few other issues. minnum/maxnum are supposed to canonicalize. Temporarily improve workaround for the instruction behavior change in gfx9. Handle selects and fcopysign. The tests were also largely broken, since they were checking for a flush used on some targets after the store of the result. llvm-svn: 339061	2018-08-06 21:38:27 +00:00
Reid Kleckner	15e91c3235	[X86] Fix assertion in subreg extraction This assert fires when attempting to extract a subregister from the global PIC base register. This virtual register SD node is not in the VRBaseMap, so we shouldn't call getVR to look it up there. If this is a RegisterSDNode, we should be able to use the virtual register directly. Fixes PR38385 llvm-svn: 339056	2018-08-06 21:16:16 +00:00
Easwaran Raman	10fd92dd94	[X86] Recognize a splat of negate in isFNEG Summary: Expand isFNEG so that we generate the appropriate F(N)M(ADD|SUB) instructions in more cases. For example, the following sequence a = _mm256_broadcast_ss(f) d = _mm256_fnmadd_ps(a, b, c) generates an fsub and fma without this patch and an fnma with this change. Reviewers: craig.topper Subscribers: llvm-commits, davidxl, wmi Differential Revision: https://reviews.llvm.org/D48467 llvm-svn: 339043	2018-08-06 19:23:38 +00:00
Craig Topper	0076477a4c	[X86] When using "and $0" and "orl $-1" to store 0 and -1 for minsize, make sure the store isn't volatile If the store is volatile this might be a memory mapped IO access. In that case we shouldn't generate a load that didn't exist in the source Differential Revision: https://reviews.llvm.org/D50270 llvm-svn: 339041	2018-08-06 18:44:26 +00:00
Craig Topper	f8a8c746e3	[X86] Add test cases to show bad use of "and $0" and "orl $-1" for minsize when the store is volatile If the store is volatile we shouldn't be adding a load that didn't exist in the source. llvm-svn: 339040	2018-08-06 18:44:21 +00:00
Wei Mi	3c1c088500	[RegisterCoalescer] Delay live interval update work until the rematerialization for all the uses from the same def is done. We run into a compile time problem with flex generated code combined with `-fno-jump-tables`. The cause is that machineLICM hoists a lot of invariants outside of a big loop, and drastically increases the compile time in global register splitting and copy coalescing. https://reviews.llvm.org/D49353 relieves the problem in global splitting. This patch is to handle the problem in copy coalescing. About the situation where the problem in copy coalescing happens. After machineLICM, we have several defs outside of a big loop with hundreds or thousands of uses inside the loop. Rematerialization in copy coalescing happens for each use and every time rematerialization is done, shrinkToUses will be called to update the huge live interval. Because we have 'n' uses for a def, and each live interval update will have at least 'n' complexity, the total update work is n^2. To fix the problem, we try to do the live interval update work in a collective way. If a def has many copylike uses larger than a threshold, each time rematerialization is done for one of those uses, we won't do the live interval update in time but delay that work until rematerialization for all those uses is completed, so we only have to do the live interval update work once. Delaying the live interval update could potentially change the copy coalescing result, so we hope to limit that change to those defs with many (like above a hundred) copylike uses, and the cutoff can be adjusted by the option -mllvm -late-remat-update-threshold=xxx. Differential Revision: https://reviews.llvm.org/D49519 llvm-svn: 339035	2018-08-06 17:30:45 +00:00
Matt Arsenault	0d1b3934e2	AMDGPU: Fold v_lshl_or_b32 with 0 src0 Appears from expansion of some packed cases. llvm-svn: 339025	2018-08-06 15:40:20 +00:00
Matt Arsenault	dbf77c5b41	AMDGPU: Rename check prefixes in test Will avoid noisy diff in future change. llvm-svn: 339022	2018-08-06 15:16:12 +00:00
Bryan Chan	e023706471	[AArch64] Fix assertion failure on widened f16 BUILD_VECTOR Summary: Ensure that NormalizedBuildVector returns a BUILD_VECTOR with operands of the same type. This fixes an assertion failure in VerifySDNode. Reviewers: SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50202 llvm-svn: 339013	2018-08-06 14:14:41 +00:00
Craig Topper	fb33181038	[X86] Remove stale comments from a test. NFC The 16-bit case was recently fixed so this comment no longer applies. llvm-svn: 338964	2018-08-05 06:25:01 +00:00
Aditya Nandakumar	e07b3b737b	[GISel]: Add Opcodes for CTLZ/CTTZ/CTPOP https://reviews.llvm.org/D48600 Added IRTranslator support to translate these known intrinsics into GISel opcodes. llvm-svn: 338944	2018-08-04 01:22:12 +00:00
Craig Topper	3c869cb5e5	[X86] Add isel patterns for atomic_load+sub+atomic_sub. Despite the comment removed in this patch, this is beneficial when the RHS of the sub is a register. llvm-svn: 338930	2018-08-03 22:08:30 +00:00
Craig Topper	84319d1b42	[X86] Add test cases to show missed opportunity to use RMW for atomic_load+sub+atomic_store. llvm-svn: 338929	2018-08-03 22:08:28 +00:00
Reid Kleckner	8e40702c1c	[X86] Re-generate abi-isel.ll checks with update_llc_test_checks.py These tests were clearly auto-generated when they were converted to FileCheck back in r80019 (2009), but we didn't have a fancy script to keep them up to date then. I've reviewed the diff, and we should be generating the exact same code sequences we used to. After this, I plan to commit a change that changes our output slightly, but in a way that is still correct. It will generate a large diff, and I want it to be clearly correct, so I am regenerating these checks in preparation for that. llvm-svn: 338928	2018-08-03 21:58:25 +00:00
Reid Kleckner	5578b53c92	[X86] Make abi-isel.ll like update_llc_test_checks.py output - Remove -asm-verbose=0 from every llc command. The tests still pass. - Reorder the RUN lines to match CHECKs. - Use -LABEL like update_llc_test_checks.py does. llvm-svn: 338927	2018-08-03 21:58:12 +00:00
Reid Kleckner	13a9035190	[X86] Layout tests exactly as update_llc_test_checks.py would Put the LLVM IR at the bottom of the function instead of the top. In my next patch, I will run update_llc_test_checks.py on this file, and I want to only highlight the diffs in the CHECK lines. Hopefully by doing this change first, the patch will be more understandable. llvm-svn: 338926	2018-08-03 21:57:59 +00:00
Craig Topper	d7391eefdf	[X86] Remove RELEASE_ and ACQUIRE_ pseudo instructions. Use isel patterns and the normal instructions instead At one point in time acquire implied mayLoad and mayStore as did release. Thus we needed separate pseudos that also carried that property. This appears to no longer be the case. I believe it was changed in 2012 with a comment saying that atomic memory accesses are marked volatile which preserves the ordering. So from what I can tell we shouldn't need additional pseudos since they don't carry any flags that are different from the normal instructions. The only thing I can think of is that we may consider them for load folding candidates in the peephole pass now where we didn't before. If that's important hopefully there's something in the memory operand we can check to prevent the folding without relying on pseudo instructions. Differential Revision: https://reviews.llvm.org/D50212 llvm-svn: 338925	2018-08-03 21:40:44 +00:00
Craig Topper	8c41136ca3	[X86] Autogenerate complete checks. NFC llvm-svn: 338921	2018-08-03 20:58:14 +00:00
Craig Topper	c4960582ec	[SelectionDAG] Teach LegalizeVectorTypes to widen the mask input to a masked store. The mask operand is visited before the data operand so we need to be able to widen it. Fixes PR38436. llvm-svn: 338915	2018-08-03 20:14:18 +00:00
Matt Arsenault	c3dc8e65e2	DAG: Enhance isKnownNeverNaN Add a parameter for testing specifically for sNaNs - at least one instruction pattern on AMDGPU needs to check specifically for this. Also handle more cases, and add a target hook for custom nodes, similar to the hooks for known bits. llvm-svn: 338910	2018-08-03 18:27:52 +00:00
Artem Belevich	0a11b6366a	[NVPTX] Handle __nvvm_reflect("__CUDA_ARCH"). Summary: libdevice in recent CUDA versions relies on __nvvm_reflect() to select GPU-specific bitcode. This patch addresses the requirement. Reviewers: jlebar Subscribers: jholewinski, sanjoy, hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D50207 llvm-svn: 338908	2018-08-03 18:05:24 +00:00
Craig Topper	feb2a58860	[X86] Add a DAG combine for the __builtin_parity idiom used by clang to enable better codegen Clang uses "ctpop & 1" to implement __builtin_parity. If the popcnt instruction isn't supported this generates a large amount of code to calculate the population count. Instead we can bisect the data down to a single byte using xor and then check the parity flag. Even when popcnt is supported, it's still a good idea to split 64-bit data on 32-bit targets using an xor in front of a single popcnt. Otherwise we get two popcnts and an add before the and. I've specifically targeted this at the sizes supported by clang builtins, but we could generalize this if we think that's useful. Differential Revision: https://reviews.llvm.org/D50165 llvm-svn: 338907	2018-08-03 18:00:29 +00:00
Craig Topper	b0ad9b9fd7	[X86] Add test cases for the current codegen of __builtin_parity. Will be improved in a follow-up commit llvm-svn: 338906	2018-08-03 18:00:23 +00:00
Nicholas Wilson	e408a89a3a	[WebAssembly] Cleanup of the way globals and global flags are handled Differential Revision: https://reviews.llvm.org/D44030 llvm-svn: 338894	2018-08-03 14:33:37 +00:00
Jonas Paulsson	f107b7275c	[SystemZ] Improve handling of instructions which expand to several groups Some instructions expand to more than one decoder group. This has been hitherto ignored, but is handled with this patch. Review: Ulrich Weigand https://reviews.llvm.org/D50187 llvm-svn: 338849	2018-08-03 10:43:05 +00:00
Sjoerd Meijer	d62c5ec2fe	[ARM] FP16: support vector zip and unzip This is addressing PR38404. Differential Revision: https://reviews.llvm.org/D50186 llvm-svn: 338835	2018-08-03 09:24:29 +00:00
Simon Pilgrim	4014fb1049	[X86] Add example of 'zero shift' guards on rotation patterns (PR34924) Basic pattern that leaves an unnecessary select on a rotation by zero result. This variant is trivial - the more general case with a compare+branch to prevent execution of undefined shifts is more tricky. llvm-svn: 338833	2018-08-03 09:20:02 +00:00
Sjoerd Meijer	9b30213828	[ARM] FP16: support VFMA This is addressing PR38404. llvm-svn: 338830	2018-08-03 09:12:56 +00:00
Craig Topper	a7a12399a1	[X86] Remove all the vector NOP bitcast patterns. Use a few lines of code in the Select method in X86ISelDAGToDAG.cpp instead. There are a lot of permutations of types here generating a lot of patterns in the isel table. It's more efficient to just ReplaceUses and RemoveDeadNode from the Select function. The test changes are because we have some shuffle patterns that have a bitcast as their root node. But the behavior is identical to another instruction whose pattern doesn't start with a bitcast. So this isn't a functional change. llvm-svn: 338824	2018-08-03 07:01:10 +00:00
Craig Topper	e902b7d0b0	[X86] Support fp128 and/or/xor/load/store with VEX and EVEX encoded instructions. Move all the patterns to X86InstrVecCompiler.td so we can keep SSE/AVX/AVX512 all in one place. To save some patterns we'll use an existing DAG combine to convert f128 fand/for/fxor to integer when sse2 is enabled. This allows us to reuse all the existing patterns for v2i64. I believe this now makes SHA instructions the only case where VEX/EVEX and legacy encoded instructions could be generated simultaneously. llvm-svn: 338821	2018-08-03 06:12:56 +00:00
Craig Topper	a80352c04e	[X86] When post-processing the DAG to remove zero extending moves for YMM/ZMM, make sure the producing instruction is VEX/XOP/EVEX encoded. If the producing instruction is legacy encoded it doesn't implicitly zero the upper bits. This is important for the SHA instructions which don't have a VEX encoded version. We might also be able to hit this with the incomplete f128 support that hasn't been ported to VEX. llvm-svn: 338812	2018-08-03 04:49:42 +00:00
Craig Topper	ded14af7aa	[X86] Autogenerate complete checks. NFC llvm-svn: 338811	2018-08-03 04:49:41 +00:00
Craig Topper	55697276dc	[X86] Autogenerate complete checks. NFC llvm-svn: 338802	2018-08-03 01:28:12 +00:00
Craig Topper	b99281c9b8	[X86] Autogenerate complete checks. NFC llvm-svn: 338799	2018-08-03 01:20:32 +00:00
Craig Topper	2c095444a4	[X86] Prevent promotion of i16 add/sub/and/or/xor to i32 if we can fold an atomic load and atomic store. This makes them consistent with i8/i32/i64. Which still seems to be more aggressive on folding than icc, gcc, or MSVC. llvm-svn: 338795	2018-08-03 00:37:34 +00:00
Tim Renouf	abd85fb1f5	[AMDGPU] Reworked SIFixWWMLiveness Summary: I encountered some problems with SIFixWWMLiveness when WWM is in a loop: 1. It sometimes gave invalid MIR where there is some control flow path to the new implicit use of a register on EXIT_WWM that does not pass through any def. 2. There were lots of false positives of registers that needed to have an implicit use added to EXIT_WWM. 3. Adding an implicit use to EXIT_WWM (and adding an implicit def just before the WWM code, which I tried in order to fix (1)) caused lots of the values to be spilled and reloaded unnecessarily. This commit is a rework of SIFixWWMLiveness, with the following changes: 1. Instead of considering any register with a def that can reach the WWM code and a def that can be reached from the WWM code, it now considers three specific cases that need to be handled. 2. A register that needs liveness over WWM to be synthesized now has it done by adding itself as an implicit use to defs other than the dominant one. Also added the following fixmes: FIXME: We should detect whether a register in one of the above categories is already live at the WWM code before deciding to add the implicit uses to synthesize its liveness. FIXME: I believe this whole scheme may be flawed due to the possibility of the register allocator doing live interval splitting. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D46756 Change-Id: Ie7fba0ede0378849181df3f1a9a7a39ed1a94a94 llvm-svn: 338783	2018-08-02 23:31:32 +00:00
Craig Topper	63873db5c4	[X86] Allow 'atomic_store (neg/not atomic_load)' to isel to a RMW instruction. There was a FIXME in the td file about a type inference issue that was easy to fix. llvm-svn: 338782	2018-08-02 23:30:38 +00:00
Craig Topper	2deeeae2a5	[X86] Add NEG and NOT test cases to atomic_mi.ll in preparation for fixing the FIXME in X86InstrCompiler.td to make these work for atomic load/store. llvm-svn: 338781	2018-08-02 23:30:31 +00:00
Tim Renouf	f1c7b92a6a	[AMDGPU] Avoid using divergent value in mubuf addr64 descriptor Summary: This fixes a problem where a load from global+idx generated incorrect code on <=gfx7 when the index is divergent. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D47383 Change-Id: Ib4d177d6254b1dd3f8ec0203fdddec94bd8bc5ed llvm-svn: 338779	2018-08-02 22:53:57 +00:00
Krzysztof Parzyszek	d91a9e27a9	[Hexagon] Simplify CFG after atomic expansion This will remove suboptimal branching from the generated ll/sc loops. The extra simplification pass affects a lot of testcases, which have been modified to accommodate this change: either by modifying the test to become immune to the CFG simplification, or (less preferably) by adding option -hexagon-initial-cfg-cleanup=0. llvm-svn: 338774	2018-08-02 22:17:53 +00:00
Heejin Ahn	4128cb0b6b	[WebAssembly] Support for atomic.wait / atomic.wake instructions Summary: This adds support for atomic.wait / atomic.wake instructions in the wasm thread proposal. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49395 llvm-svn: 338770	2018-08-02 21:44:24 +00:00
Craig Topper	db89ec1185	[X86] Autogenerate complete checks. NFC llvm-svn: 338765	2018-08-02 20:28:45 +00:00
Sam Clegg	41d7047de5	[WebAssembly] Ensure bitcasts that would result in invalid wasm are removed by FixFunctionBitcasts Rather than allowing invalid bitcasts to be lowered to wasm call instructions that won't validate, generate wrappers that contain unreachable thereby delaying the error until runtime. Differential Revision: https://reviews.llvm.org/D49517 llvm-svn: 338744	2018-08-02 17:38:06 +00:00
Craig Topper	0423881820	[X86] Allow fake unary unpckhpd and movhlps to be commuted for execution domain fixing purposes These instructions perform the same operation, but the semantic of which operand is destroyed is reversed. If the same register is used as both operands we can change the execution domain without worrying about this difference. Unfortunately, this really only works in cases where the input register is killed by the instruction. If it's not killed, the two address instruction pass inserts a copy that will become a move instruction. This makes the instruction use different physical registers that contain the same data at the time the unpck/movhlps executes. I've considered using a unary pseudo instruction with tied operand to trick the two address instruction pass. We could then expand the pseudo post regalloc to get the same physical register on both inputs. Differential Revision: https://reviews.llvm.org/D50157 llvm-svn: 338735	2018-08-02 16:48:01 +00:00
Simon Pilgrim	ef494e1722	[X86][SSE] Add uniform/non-uniform exact sdiv vector tests covering all paths Regenerated tests and tested on 64-bit (AVX2) as well. llvm-svn: 338729	2018-08-02 15:34:51 +00:00
Sjoerd Meijer	8e7fab0443	[ARM][NFC] Follow up of r338568 I disabled more tests than necessary, this enables them. llvm-svn: 338717	2018-08-02 14:04:48 +00:00
Matt Arsenault	1f3977a856	DAG: Fix vector widening fcanonicalize llvm-svn: 338715	2018-08-02 13:43:53 +00:00
Matt Arsenault	36cdcfadcf	AMDGPU: Fix scalarizing v4f16 fcanonicalize llvm-svn: 338714	2018-08-02 13:43:42 +00:00
Simon Pilgrim	090d58b2b5	[X86][SSE] Add more UDIV nonuniform-constant vector tests Ensure we cover all paths for vector data as requested on D49248 llvm-svn: 338698	2018-08-02 10:53:53 +00:00
Alexander Ivchenko	49168f6778	[GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value This is a logical continuation of https://reviews.llvm.org/D46018 (r332449) Differential Revision: https://reviews.llvm.org/D49660 llvm-svn: 338685	2018-08-02 08:33:31 +00:00
Lei Liu	b9a7b7a84d	Fix FCOPYSIGN expansion In expansion of FCOPYSIGN, the shift node is missing when the two operands of FCOPYSIGN are of the same size. We should always generate shift node (if the required shift bit is not zero) to put the sign bit into the right position, regardless of the size of underlying types. Differential Revision: https://reviews.llvm.org/D49973 llvm-svn: 338665	2018-08-02 01:54:12 +00:00
Nemanja Ivanovic	e1a525ed06	[PowerPC] Do not round values prior to converting to integer Adding the FP_ROUND nodes when combining FP_TO_[SU]INT of elements feeding a BUILD_VECTOR into an FP_TO_[SU]INT of the built vector loses precision. This patch removes the code that adds these nodes to true f64 operands. It also adds patterns required to ensure the code is still vectorized rather than converting individual elements and inserting into a vector. Fixes https://bugs.llvm.org/show_bug.cgi?id=38342 Differential Revision: https://reviews.llvm.org/D50121 llvm-svn: 338658	2018-08-02 00:03:22 +00:00
Reid Kleckner	a30a6d2c29	Load from the GOT for external symbols in the large, PIC code model Do the same handling for external symbols that we do for jump table symbols and global values. Fixes one of the cases in PR38385 llvm-svn: 338651	2018-08-01 22:56:05 +00:00
Matt Arsenault	709374d186	AMDGPU: Improve hack for packing conversion ops Mutate the node type during selection when it doesn't matter. This avoids an intermediate bitcast node on targets with legal i16/f16. Also fixes missing output modifiers on v_cvt_pkrtz_f32_f16, which I assume are OK. llvm-svn: 338619	2018-08-01 20:13:58 +00:00
Matt Arsenault	55ab9213d3	AMDGPU: Partially fix handling of packed amdgpu_ps arguments Fixes annoying limitations when writing tests. Also remove more leftover code for manually scalarizing arguments and return values. llvm-svn: 338618	2018-08-01 19:57:34 +00:00
Heejin Ahn	b3724b7169	[WebAssembly] Support for a ternary atomic RMW instruction Summary: This adds support for a ternary atomic RMW instruction: cmpxchg. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49195 llvm-svn: 338617	2018-08-01 19:40:28 +00:00
Craig Topper	c985d42903	[X86] Canonicalize the pattern for __builtin_ffs in a similar way to '__builtin_ffs + 5' We now emit a move of -1 before the cmov and do the addition after the cmov just like the case with an extra addition. This may be slightly worse for code size, but is more consistent with other compilers. And we might be able to hoist the mov -1 outside of loops. llvm-svn: 338613	2018-08-01 18:38:46 +00:00
Craig Topper	ffb8eb30ff	[X86] Add test cases for the patterns used by __builtin_ffs. We previously had tests for "__builtin_ffs + 5", but the SelectionDAG without an extra addition came out slightly different. llvm-svn: 338612	2018-08-01 18:38:43 +00:00
Jan Vesely	93b252799b	AMDGPU/R600: Convert kernel param loads to use PARAM_I_ADDRESS Non ext aligned i32 loads are still optimized to use CONSTANT_BUFFER (AS 8) llvm-svn: 338610	2018-08-01 18:36:07 +00:00
Vlad Tsyrklevich	ab016e00ec	[X86] FastISel fall back on !absolute_symbol GVs Summary: D25878, which added support for !absolute_symbol for normal X86 ISel, did not add support for materializing references to absolute symbols for X86 FastISel. This causes build failures because FastISel generates PC-relative relocations for absolute symbols. Fall back to normal ISel for references to !absolute_symbol GVs. Fix for PR38200. Reviewers: pcc, craig.topper Reviewed By: pcc Subscribers: hiraditya, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D50116 llvm-svn: 338599	2018-08-01 17:44:37 +00:00
Sanjay Patel	d5ae183034	[x86] remove stale FIXME note from test; NFC This was fixed with rL338592. llvm-svn: 338593	2018-08-01 17:18:50 +00:00
Sanjay Patel	8aac22e06a	[SelectionDAG] fix bug in translating funnel shift with non-power-of-2 type The bug is visible in the constant-folded x86 tests. We can't use the negated shift amount when the type is not power-of-2: https://rise4fun.com/Alive/US1r ...so in that case, use the regular lowering that includes a select to guard against a shift-by-bitwidth. This path is improved by only calculating the modulo shift amount once now. Also, improve the rotate (with power-of-2 size) lowering to use a negate rather than subtract from bitwidth. This improves the codegen whether we have a rotate instruction or not (although we can still see that we're not matching to a legal rotate in all cases). llvm-svn: 338592	2018-08-01 17:17:08 +00:00
Sanjay Patel	6d302c93cc	[x86] add tests to show miscompile for funnel shift with weird size; NFC llvm-svn: 338587	2018-08-01 16:59:54 +00:00
Sjoerd Meijer	590e4e8dde	[ARM] Armv8.2-A FP16 vector intrinsics tests Clang support for the Armv8.2-A FP16 vector intrinsic was committed in rC328277, but this was never followed up, i.e. the LLVM part is missing. I've raised PR38404, and this is the first step to address this. I.e., this adds tests for the Armv8.2-A FP16 vector intrinsic, and thus shows which intrinsics already work, and which need further work. Differential Revision: https://reviews.llvm.org/D50142 llvm-svn: 338568	2018-08-01 14:43:59 +00:00
Cameron McInally	04ae85859d	[FPEnv] Widen illegal width StrictFP vector operations as needed Differential Revision: https://reviews.llvm.org/D49806 llvm-svn: 338562	2018-08-01 14:17:19 +00:00
Bryan Chan	67106b5e08	[AArch64] Fix FCCMP with FP16 operands Summary: This patch adds support for FCCMP instruction with FP16 operands, avoiding an assertion during instruction selection. Reviewers: olista01, SjoerdMeijer, t.p.northover, javed.absar Reviewed By: SjoerdMeijer Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D50115 llvm-svn: 338554	2018-08-01 13:50:29 +00:00
Ryan Taylor	894c8fd0e2	[AMDGPU] Optimize _L image intrinsic to _LZ when lod is zero Summary: Add _L to _LZ image intrinsic table mapping to table gen. In ISelLowering check if image intrinsic has lod and if it's equal to zero, if so remove lod and change opcode to equivalent mapped _LZ. Change-Id: Ie24cd7e788e2195d846c7bd256151178cbb9ec71 Subscribers: arsenm, mehdi_amini, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49483 llvm-svn: 338523	2018-08-01 12:12:01 +00:00
Ulrich Weigand	58a9786e81	[SystemZ, TableGen] Fix shift count handling The DAG combiner logic to simplify AND masks in shift counts is invalid. While it is true that the SystemZ shift instructions ignore all but the low 6 bits of the shift count, it is still invalid to simplify the AND masks while the DAG still uses the standard shift operators (which are not defined to match the SystemZ instruction behavior). Instead, this patch performs equivalent operations during instruction selection. For completely removing the AND, this now happens via additional DAG match patterns implemented by a multi-alternative PatFrags. For simplifying a 32-bit AND to a 16-bit AND, the existing DAG patterns were already mostly OK, they just needed an output XForm to actually truncate the immediate value. Unfortunately, the latter change also exposed a bug in TableGen: it seems XForms are currently only handled correctly for direct operands of the outermost operation node. This patch also fixes that bug by simply recurring through the whole pattern. This should be NFC for all other targets. Differential Revision: https://reviews.llvm.org/D50096 llvm-svn: 338521	2018-08-01 11:57:58 +00:00
Petar Jovanovic	64c10ba8e2	[MIPS GlobalISel] Select global address Select G_GLOBAL_VALUE for position dependent code. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D49803 llvm-svn: 338499	2018-08-01 09:03:23 +00:00
Jatin Bhateja	36432a70c1	[X86] Adding more test patterns for lea-opt (PR37939) Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50128 llvm-svn: 338483	2018-08-01 03:53:27 +00:00
Chandler Carruth	2ce191e220	[x86] Fix a really subtle miscompile due to a somewhat glaring bug in EFLAGS copy lowering. If you have a branch of LLVM, you may want to cherrypick this. It is extremely unlikely to hit this case empirically, but it will likely manifest as an "impossible" branch being taken somewhere, and will be ... very hard to debug. Hitting this requires complex conditions living across complex control flow combined with some interesting memory (non-stack) initialized with the results of a comparison. Also, because you have to arrange for an EFLAGS copy to be in just the right place, almost anything you do to the code will hide the bug. I was unable to reduce anything remotely resembling a "good" test case from the place where I hit it, and so instead I have constructed synthetic MIR testing that directly exercises the bug in question (as well as the good behavior for completeness). The issue is that we would mistakenly assume any SETcc with a valid condition and an initial operand that was a register and a virtual register at that to be a register defining SETcc... It isn't though.... This would in turn cause us to test some other bizarre register, typically the base pointer of some memory. Now, testing this register and using that to branch on doesn't make any sense. It even fails the machine verifier (if you are running it) due to the wrong register class. But it will make it through LLVM, assemble, and it looks fine... But wow do you get a very unusual and surprising branch taken in your actual code. The fix is to actually check what kind of SETcc instruction we're dealing with. Because there are a bunch of them, I just test the may-store bit in the instruction. I've also added an assert for sanity that ensures we are, in fact, defining the register operand. =D llvm-svn: 338481	2018-08-01 03:01:58 +00:00
Chandler Carruth	014047a99a	[x86/slh] Add unwind info to several tests to make it more obvious that we aren't incorrectly generating any of it when doing SLH. There was a bug that only occurred with SLH that very much looked like it could be caused by bad unwind info, and so this was a prime suspect. Turns out that everything is fine, but this way we'll see if we end up, for example, putting things we shouldn't inside the prolog. llvm-svn: 338480	2018-08-01 03:01:10 +00:00
Amara Emerson	6cdfe29d8e	[GlobalISel][IRTranslator] Use RPO traversal when visiting blocks to translate. Previously we were just visiting the blocks in the function in IR order, which is rather arbitrary. Therefore we wouldn't always visit defs before uses, but the translation code relies on this assumption in some places. Only codegen change seen in tests is an elision of a redundant copy. Fixes PR38396 llvm-svn: 338476	2018-08-01 02:17:42 +00:00
Konstantin Zhuravlyov	bb30ef7af4	AMDGPU: Add clamp bit to dot intrinsics Differential Revision: https://reviews.llvm.org/D49874 llvm-svn: 338470	2018-08-01 01:31:30 +00:00
Reid Kleckner	b32ff46ff7	Revert r338354 "[ARM] Revert r337821" Disable ARMCodeGenPrepare by default again. It is causing verifier failures in V8 that look like: Duplicate integer as switch case switch i32 %trunc, label %if.end13 [ i32 0, label %cleanup36 i32 0, label %if.then8 ], !dbg !4981 i32 0 fatal error: error in backend: Broken function found, compilation aborted! I will continue reducing the test case and send it along. llvm-svn: 338452	2018-07-31 23:09:42 +00:00
Matt Arsenault	118c47b6d1	AMDGPU: Split amdgcn/r600 fminnum/fmaxnum tests R600 breaks on too many things to usefully test changes with ieee_mode on vs. off. llvm-svn: 338435	2018-07-31 20:38:42 +00:00
Matt Arsenault	feedabfde7	AMDGPU: Break 64-bit arguments into 32-bit pieces llvm-svn: 338421	2018-07-31 19:29:04 +00:00
Matt Arsenault	0395da7842	AMDGPU: Split wide vectors of i16/f16 into 32-bit regs on calls This improves code for the same reasons as scalarizing 32-bit element vectors. llvm-svn: 338418	2018-07-31 19:17:47 +00:00
Matt Arsenault	9ced1e0d80	AMDGPU: Scalarize vector argument types to calls When lowering calling conventions, prefer to decompose vectors into the constituent register types. This avoids artificial constraints to satisfy a wide super-register. This improves code quality because now optimizations don't need to deal with the super-register constraint. For example the immediate folding code doesn't deal with 4 component reg_sequences, so by breaking the register down earlier the existing immediate folding code is able to work. This also avoids the need for the shader input processing code to manually split vector types. llvm-svn: 338416	2018-07-31 19:05:14 +00:00
Simon Pilgrim	5d9b00d15b	[X86][SSE] Use ISD::MULHU for constant/non-zero ISD::SRL lowering (PR38151) As was done for vector rotations, we can efficiently use ISD::MULHU for vXi8/vXi16 ISD::SRL lowering. Shift-by-zero cases are still problematic (mainly on v32i8 due to extra AND/ANDN/OR or VPBLENDVB blend masks but v8i16/v16i16 aren't great either if PBLENDW fails) so I've limited this first patch to known non-zero cases if we can't easily use PBLENDW. Differential Revision: https://reviews.llvm.org/D49562 llvm-svn: 338407	2018-07-31 18:05:56 +00:00
Craig Topper	bef126fb71	[X86] Add pattern matching for PMADDUBSW Summary: Similar to D49636, but for PMADDUBSW. This instruction has the additional complexity that the addition of the two products saturates to 16-bits rather than wrapping around. And one operand is treated as signed and the other as unsigned. A C example that triggers this pattern ``` static const int N = 128; int8_t A[2*N]; uint8_t B[2*N]; int16_t C[N]; void foo() { for (int i = 0; i != N; ++i) C[i] = MIN(MAX((int16_t)A[2*i]*(int16_t)B[2*i] + (int16_t)A[2*i+1]*(int16_t)B[2*i+1], -32768), 32767); } ``` Reviewers: RKSimon, spatel, zvi Reviewed By: RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49829 llvm-svn: 338402	2018-07-31 17:12:08 +00:00
Craig Topper	d03d44e0b9	[X86] Add test cases that could use PMADDUBSW. llvm-svn: 338401	2018-07-31 17:12:06 +00:00
Francis Visoiu Mistrih	ae8002c1cf	[X86] Preserve more liveness information in emitStackProbeInline This commit fixes two issues with the liveness information after the call: 1) The code always spills RCX and RDX if InProlog == true, which results in a use of undefined phys reg. 2) FinalReg, JoinReg, RoundedReg, SizeReg are not added as live-ins to the basic blocks that use them, therefore they are seen as undefined. https://llvm.org/PR38376 Differential Revision: https://reviews.llvm.org/D50020 llvm-svn: 338400	2018-07-31 16:41:12 +00:00
Matt Arsenault	a5ed032118	DAG: Fix PromoteFloatResult for fcanonicalize llvm-svn: 338382	2018-07-31 14:15:22 +00:00
Matt Arsenault	4aec86d37a	AMDGPU: Fold undef fcanonicalize to qNaN We could choose a free 0 for this, but this matches the behavior for fmul undef, 1.0. Also, the NaN use is more useful for folding use operations although if it's not eliminated it is more expensive in terms of code size. llvm-svn: 338376	2018-07-31 13:34:31 +00:00
Matt Arsenault	c1335eaf7e	AMDGPU: Fix test check line bugs llvm-svn: 338374	2018-07-31 13:25:23 +00:00
Jonas Paulsson	2f12e45d5a	[SystemZ] Improve decoding in case of instructions with four register operands. Since z13, the max group size will be 2 if any μop has more than 3 register sources. This has been ignored so far in the SystemZHazardRecognizer, but is now handled by recognizing those instructions and adjusting the tracking of decoding and the cost heuristic for grouping. Review: Ulrich Weigand https://reviews.llvm.org/D49847 llvm-svn: 338368	2018-07-31 13:00:42 +00:00
Sam Parker	2a6c842fda	[ARM] Revert r337821 Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing. llvm-svn: 338354	2018-07-31 09:04:14 +00:00
Craig Topper	9164b9b16e	[X86] Stop accidentally running the Bonnell LEA fixup path on Goldmont. In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should do, we only check isSLM. This resulted in Goldmont going down the Bonnell path. llvm-svn: 338342	2018-07-31 00:43:54 +00:00
Ana Pazos	2baa767455	[RISCV] Fixed test case failure due to r338047 llvm-svn: 338341	2018-07-31 00:36:28 +00:00
Amara Emerson	1e8c164c63	[AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR. Also refactors some existing code to materialize addresses for the large code model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR. This implements PR36390. Differential Revision: https://reviews.llvm.org/D49903 llvm-svn: 338337	2018-07-31 00:09:02 +00:00
Amara Emerson	0e86c07077	[AArch64][GlobalISel] Make G_BLOCK_ADDR legal. Differential Revision: https://reviews.llvm.org/D49902 llvm-svn: 338336	2018-07-31 00:08:56 +00:00
Amara Emerson	6aff5a7810	[GlobalISel] Add a G_BLOCK_ADDR opcode to handle IR blockaddress constants. Differential Revision: https://reviews.llvm.org/D49900 llvm-svn: 338335	2018-07-31 00:08:50 +00:00
Sanjay Patel	9f807f44b1	[DAGCombiner] transform sub-of-shifted-signbit to add This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317	2018-07-30 22:21:37 +00:00
Jessica Paquette	fa3bee4756	[MachineOutliner][AArch64] Add support for saving LR to a register This teaches the outliner to save LR to a register rather than the stack when possible. This allows us to avoid bumping the stack in outlined functions in some cases. By doing this, in a later patch, we can teach the outliner to do something like this: f1: ... bl OUTLINED_FUNCTION ... f2: ... move LR's contents to a register bl OUTLINED_FUNCTION move the register's contents back instead of falling back to saving LR in both cases. llvm-svn: 338278	2018-07-30 17:45:28 +00:00
Jessica Paquette	bbcc8895bb	Add machine verifier to arm64-opt-remarks-lazy-bfi Previously, I thought this was a Windows failure. Then I realized it failed on every bot that used the verifier. This makes it use the verifier always, and adds that pass to the pipeline checks so that it's consistent across all bots. llvm-svn: 338272	2018-07-30 17:13:25 +00:00
David Bolvansky	2fa7fb14ea	[DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed Summary: Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op. Patch by: sameconrad (Sam Conrad) Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar Reviewed By: lebedev.ri Subscribers: efriedma, kparzysz, llvm-commits Differential Revision: https://reviews.llvm.org/D47681 llvm-svn: 338270	2018-07-30 16:50:00 +00:00
Thomas Preud'homme	196149c943	Reapply "Fix crash on inline asm with 64bit matching input in 32bit GPR" This reapplies commit r338206 reverted by r338214 since the bug that r338206 uncovered has been fixed in r338268. Add support for inline assembly with matching input operand that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR). Note that regular input is already handled by existing code. llvm-svn: 338269	2018-07-30 16:48:39 +00:00
Thomas Preud'homme	6c1b075299	Fix uninitialized read in ARM's PrintAsmOperand Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206. Reviewers: t.p.northover, weimingz, javed.absar, chill Reviewed By: chill Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49984 llvm-svn: 338268	2018-07-30 16:45:40 +00:00
Jessica Paquette	7816531f3c	Attempt to fix Windows test failure caused by r338133 It seems like the pass pipeline on Windows is slightly different than on Linux and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing. This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly again. It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT lines following the line we care about (the optimization remark emitter) do a pretty good job of enforcing the ordering we want. Hopefully this works, since I don't have a Windows machine. ;) Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295 llvm-svn: 338267	2018-07-30 16:36:22 +00:00
Simon Pilgrim	186b62c9e4	[X86] Regenerate NOBMI/BMI combine-select tests. Test cleanup for D38128 llvm-svn: 338265	2018-07-30 16:18:38 +00:00
Simon Pilgrim	2d5118432b	[X86] Regenerate PKU test to merge 32/64-bit rdpkru checks Test cleanup for D38128 llvm-svn: 338264	2018-07-30 16:15:18 +00:00
Simon Pilgrim	22ff9f94bb	[X86] Regenerate fast-isel tests. Test cleanup for D38128 llvm-svn: 338262	2018-07-30 16:13:40 +00:00
Krzysztof Parzyszek	24fae50905	[Hexagon] Simplify A4_rcmp[n]eqi R, 0 Consider cases when register R is known to be zero/non-zero, or when it is defined by a C2_muxii instruction. llvm-svn: 338251	2018-07-30 14:28:02 +00:00
Matt Arsenault	de496c32a4	AMDGPU: Reduce code size with fcanonicalize (fneg x) When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifiers. llvm-svn: 338244	2018-07-30 12:16:58 +00:00
Matt Arsenault	f3c9a34def	AMDGPU: Make fneg combine handle fcanonicalize llvm-svn: 338243	2018-07-30 12:16:47 +00:00
Francis Visoiu Mistrih	7d003657de	[MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction The machine verifier asserts with: Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542. It calls analyzeBranch which tries to call getMBB if the opcode is JMP_1, but in this case we do: JMP_1 @OUTLINED_FUNCTION I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used with brtarget8. Differential Revision: https://reviews.llvm.org/D49299 llvm-svn: 338237	2018-07-30 09:59:33 +00:00
Nicolai Haehnle	7f0d05d532	AMDGPU: Force skip over s_sendmsg and exp instructions Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235	2018-07-30 09:23:59 +00:00
Petr Pavlu	8b6eff4e77	[ARM] Fix over-alignment in arguments that are HA of 128-bit vectors Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS. Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element. The patch fixes the problem by updating the alignment with the item size only if this results in reducing it. Differential Revision: https://reviews.llvm.org/D49720 llvm-svn: 338233	2018-07-30 08:49:30 +00:00
Sanjay Patel	7312206f2f	revert r338206 because the test does not pass Example of bot failure: http://lab.llvm.org:8011/builders/clang-cmake-armv8-quick/builds/5107/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Ainline-asm-operand-implicit-cast.ll llvm-svn: 338214	2018-07-29 14:30:49 +00:00
Thomas Preud'homme	74ffd14e15	Fix crash on inline asm with 64bit matching input in 32bit GPR Add support for inline assembly with matching input operand that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR). Note that regular input is already handled by existing code. llvm-svn: 338206	2018-07-28 21:33:39 +00:00
Matt Arsenault	8f9dde94b7	AMDGPU: Stop wasting argument registers with v3i32/v3f32 SelectionDAGBuilder widens v3i32/v3f32 arguments to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197	2018-07-28 14:11:34 +00:00
Matt Arsenault	72b0e38b26	AMDGPU: Stop trying to extend arguments for clover This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193	2018-07-28 12:34:25 +00:00
Craig Topper	50b1d4303d	[DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B) This can be useful since addition is commutable, and subtraction is not. This matches a transform that is also done by InstCombine. llvm-svn: 338181	2018-07-28 00:27:25 +00:00
Wouter van Oortmerssen	a90d24da1c	Revert "[WebAssembly] Added default stack-only instruction mode for MC." This reverts commit d3c9af4179eae7793d1487d652e2d4e23844555f. (SVN revision 338164) llvm-svn: 338176	2018-07-27 23:19:51 +00:00
Craig Topper	c3e11bf3f7	[X86] Add support for expanding multiplies by constant where the constant is -3/-5/-9 multiplied by a power of 2. These can be replaced with an LEA, a shift, and a negate. This seems to match what gcc and icc would do. llvm-svn: 338174	2018-07-27 23:04:59 +00:00
Wouter van Oortmerssen	a67c4137c3	[WebAssembly] Added default stack-only instruction mode for MC. Summary: Moved Explicit Locals pass to last. Made that pass obligatory. Made it convert from register to stack based instructions, and removed the registers. Fixes to related code that was expecting register based instructions. Added the correct testing flag to all tests, depending on what the format they were expecting so far. Translated one test to stack format as example: reg-stackify-stack.ll tested: llvm-lit -v `find test -name WebAssembly` unittests/MC/* Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D49160 llvm-svn: 338164	2018-07-27 20:56:43 +00:00
Jessica Paquette	f90edbe3d6	Recommit "Enable MachineOutliner by default under -Oz for AArch64" Fixed the ASAN failure from before in r338148, so recommitting. This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338160	2018-07-27 20:18:27 +00:00
Sanjay Patel	06c7d5aef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC The tests with a constant sub operand were added with rL338143, but the potential transform doesn't have that requirement, so adding more tests with variable operands. llvm-svn: 338150	2018-07-27 18:31:21 +00:00
Evandro Menezes	fcca45f0dd	[ARM] Add new target feature to fuse literal generation This feature enables the fusion of such operations on Cortex A57 and Cortex A72, as recommended in their Software Optimisation Guides, sections 4.14 and 4.11, respectively. Differential revision: https://reviews.llvm.org/D49563 llvm-svn: 338147	2018-07-27 18:16:47 +00:00
Sanjay Patel	efac39eef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC llvm-svn: 338143	2018-07-27 18:12:29 +00:00
Jessica Paquette	faea2d3130	Revert "Enable MachineOutliner by default under -Oz for AArch64" It failed an Asan test on a bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio Fixing that before recommitting. llvm-svn: 338136	2018-07-27 17:25:38 +00:00
Yonghong Song	04ccfda075	bpf: add missing RegState to notify MachineInstr verifier necessary register usage Errors like the following are reported by: https://urldefense.proofpoint.com/v2/url?u=http-3A__lab.llvm.org-3A8011_builders_llvm-2Dclang-2Dx86-5F64-2Dexpensive-2Dchecks-2Dwin_builds_11261&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=DA8e1B5r073vIqRrFz7MRA&m=929oWPCf7Bf2qQnir4GBtowB8ZAlIRWsAdTfRkDaK-g&s=9k-wbEUVpUm474hhzsmAO29VXVvbxJPWD9RTgCD71fQ&e= * Bad machine code: Explicit definition marked as use * - function: cal_align1 - basic block: %bb.0 entry (0x47edd98) - instruction: LDB $r3, $r2, 0 - operand 0: $r3 This is because RegState info was missing for ScratchReg inside expandMEMCPY. This caused incomplete register usage information to MachineInstr verifier which then would complain as there could be potential code-gen issue if the complained MachineInstr is used in place where register usage information matters even though the memcpy expanding is not in such case as it happens at the last stage of IR optimization pipeline. We should always specify those register usage information which compiler couldn't deduct automatically whenever we add a hardware register manually. Reported-by: Builder llvm-clang-x86_64-expensive-checks-win Build #11261 Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 338134	2018-07-27 16:58:52 +00:00
Jessica Paquette	d4229b985c	Enable MachineOutliner by default under -Oz for AArch64 This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338133	2018-07-27 16:44:42 +00:00
Sanjay Patel	c7abb416dc	[DAGCombiner] fold 'not' with signbit math This is a follow-up suggested in D48970. Alive proofs: https://rise4fun.com/Alive/sII We can eliminate an instruction in the usual select-of-constants to bit hack transform by adjusting the add/sub with constant. This is always a win. There are more transforms that are likely wins, but they may need target hooks in case some targets do not benefit. This is another step towards making up for canonicalizing to select-of-constants in rL331486. llvm-svn: 338132	2018-07-27 16:42:55 +00:00
Sanjay Patel	1812d33e22	[x86] add more tests for signbit math; NFC llvm-svn: 338131	2018-07-27 16:22:40 +00:00
Sanjay Patel	60c04b961e	[PowerPC] add more tests for signbit math; NFC llvm-svn: 338130	2018-07-27 16:22:18 +00:00
Sanjay Patel	f815bc658b	[AArch64] add more tests for signbit math; NFC llvm-svn: 338129	2018-07-27 16:21:56 +00:00
Jan Vesely	6ff58ed5ca	AMDGPU/R600: Add MOV instructions to BFE patterns R600 can't handle immediates for BFE, these will be eliminated later. Fixes powr/pow regressions on r600 since r334817 Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127	2018-07-27 15:00:13 +00:00
Matt Arsenault	0183c56c11	AMDGPU: Fix code size for return_to_epilog pseudo llvm-svn: 338113	2018-07-27 09:15:03 +00:00
Tom Stellard	e9bdc5f1d8	AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49624 llvm-svn: 338102	2018-07-27 06:04:40 +00:00
Craig Topper	561e298e29	[X86] Remove an unnecessary 'if' that prevented treating INT64_MAX and -INT64_MAX as power of 2 minus 1 in the multiply expansion code. Not sure why they were being explicitly excluded, but I believe all the math inside the if works. I changed the absolute value to be uint64_t instead of int64_t so INT64_MIN+1 wouldn't be signed wrap. llvm-svn: 338101	2018-07-27 05:56:27 +00:00
Craig Topper	e364baa88b	[X86] Add matching for another pattern of PMADDWD. Summary: This is the pattern you get from the loop vectorizer for something like this int16_t A[1024]; int16_t B[1024]; int32_t C[512]; void pmaddwd() { for (int i = 0; i != 512; ++i) C[i] = (A[2*i]*B[2*i]) + (A[2*i+1]*B[2*i+1]); } In this case we will have (add (mul (build_vector), (build_vector)), (mul (build_vector), (build_vector))). This is different than the pattern we currently match which has the build_vectors between an add and a single multiply. I'm not sure what C code would get you that pattern. Reviewers: RKSimon, spatel, zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49636 llvm-svn: 338097	2018-07-27 04:29:10 +00:00
Craig Topper	f7bc550223	[X86] When removing sign extends from gather/scatter indices, make sure we handle UpdateNodeOperands finding an existing node to CSE with. If this happens the operands aren't updated and the existing node is returned. Make sure we pass this existing node up to the DAG combiner so that a proper replacement happens. Otherwise we get stuck in an infinite loop with an unoptimized node. llvm-svn: 338090	2018-07-27 00:00:30 +00:00
Craig Topper	1a40a06549	[SelectionDAGBuilder] Add masked loads to PendingLoads rather than calling DAG.setRoot. Masked loads are calling DAG.getRoot rather than calling SelectionDAGBuilder::getRoot, which means the PendingLoads weren't emptied to update the root and create any needed TokenFactor. So it would be incorrect to call setRoot for the masked load. This patch instead adds the masked load to PendingLoads so that the root doesn't get updated until a store or scatter or something happens. Alternatively, we could call SelectionDAGBuilder::getRoot before it, but that would create unnecessary serialization. llvm-svn: 338085	2018-07-26 23:22:11 +00:00
Scott Linder	eb1f75d561	[AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory. Differential Revision: https://reviews.llvm.org/D49448 llvm-svn: 338060	2018-07-26 19:47:51 +00:00
Ana Pazos	2e4106b73d	[RISCV] Add support for _interrupt attribute - Save/restore only registers that are used. This includes Callee saved registers and Caller saved registers (arguments and temporaries) for integer and FP registers. - If there is a call in the interrupt handler, save/restore all Caller saved registers (arguments and temporaries) and all FP registers. - Emit special return instructions depending on "interrupt" attribute type. Based on initial patch by Zhaoshi Zheng. Reviewers: asb Reviewed By: asb Subscribers: rkruppe, the_o, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D48411 llvm-svn: 338047	2018-07-26 17:49:43 +00:00
Matthias Braun	09810c9269	MacroFusion: Fix macro fusion with ExitSU failing in top-down scheduling When fusing instructions A and B, we must add all predecessors of B as predecessors of A to avoid instructions getting scheduled in between. There is a special case involving ExitSU: Every other node must be scheduled before it by design and we don't need to make this explicit in the graph, however when fusing with a different node we need to schedule every other node before the fused node too and we need to make this explicit now: This patch adds a dependency from the fused node to all roots in the graph. Differential Revision: https://reviews.llvm.org/D49830 llvm-svn: 338046	2018-07-26 17:43:56 +00:00
Roman Lebedev	41ba5c1455	[DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes. Summary: A follow-up for D49266 / rL337166. At least one of these cases is more canonical, so we really do have to handle it. https://godbolt.org/g/pkzP3X https://rise4fun.com/Alive/pQyhZZ We won't get to these cases with I1 being -1, as that will be constant-folded to true or false. I'm also not sure we actually hit the 'ule' case, but I think the worst thing that could happen is that being dead code. Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49497 llvm-svn: 338044	2018-07-26 17:34:28 +00:00
Stefan Maksimovic	4a612d4bf2	[mips] Sign extend i32 return values on MIPS64 Override getTypeForExtReturn so that functions returning an i32 typed value have it sign extended on MIPS64. Also provide patterns to get rid of unneeded sign extensions for arithmetic instructions which implicitly sign extend their results. Differential Revision: https://reviews.llvm.org/D48374 llvm-svn: 338019	2018-07-26 10:59:35 +00:00
Martin Storsjo	9dafd6f6d9	Revert "[COFF] Use comdat shared constants for MinGW as well" This reverts commit r337951. While that kind of shared constant generally works fine in a MinGW setting, it broke some cases of inline assembly that worked before: $ cat const-asm.c int MULH(int a, int b) { int rt, dummy; __asm__ ( "imull %3" :"=d"(rt), "=a"(dummy) :"a"(a), "rm"(b) ); return rt; } int func(int a) { return MULH(a, 1); } $ clang -target x86_64-win32-gnu -c const-asm.c -O2 const-asm.c:4:9: error: invalid variant '00000001' "imull %3" ^ <inline asm>:1:15: note: instantiated into assembly here imull __real@00000001(%rip) ^ A similar error is produced for i686 as well. The same test with a target of x86_64-win32-msvc or i686-win32-msvc works fine. llvm-svn: 338018	2018-07-26 10:48:20 +00:00
Craig Topper	4e687d5bb2	[X86] Don't use CombineTo to skip adding new nodes to the DAGCombiner worklist in combineMul. I'm not sure if this was trying to avoid optimizing the new nodes further or what. Or maybe to prevent a cycle if something tried to reform the multiply? But I don't think it's a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before and it will then add the first node to the worklist. This process will repeat until all the new nodes are visited. So this seems like an unreliable prevention at best. So this patch just returns the new nodes like any other combine. If this starts causing problems we can try to add target specific nodes or something to more directly prevent optimizations. Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users since those will be visited now. llvm-svn: 338007	2018-07-26 05:40:10 +00:00
Amara Emerson	fdd089aa14	[GlobalISel] Fall back to SDISel for swifterror/swiftself attributes. We don't currently support these, fall back until we do. llvm-svn: 337994	2018-07-26 01:25:58 +00:00
Yonghong Song	71d81e5c8f	bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order Some BPF JIT backends would want to optimize memcpy in their own architecture specific way. However, at the moment, there is no way for JIT backends to see memcpy semantics in a reliable way. This is because the LLVM BPF backend expands memcpy into load/store sequences and could possibly schedule them further apart from each other. So, BPF JIT backends inside the kernel can't reliably recognize memcpy semantics by peepholing the BPF sequence. This patch introduces new intrinsic expand infrastructure for memcpy. To get a stable in-order load/store sequence from memcpy, we first lower memcpy into a BPF::MEMCPY node which is then expanded into in-order load/store sequences in the expandPostRAPseudo pass, which happens after instruction scheduling. This way, kernel JIT backends can reliably recognize memcpy by scanning the BPF sequence. This new memcpy expand infrastructure is gated by a new option: -bpf-expand-memcpy-in-order Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 337977	2018-07-25 22:40:02 +00:00
Sanjay Patel	215dcbf4db	[SelectionDAG] try to convert funnel shift directly to rotate if legal If the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. This sidesteps the issue of custom lowering for rotates raised in PR38243: https://bugs.llvm.org/show_bug.cgi?id=38243 ...by only dealing with legal operations. llvm-svn: 337966	2018-07-25 21:38:30 +00:00
Sanjay Patel	f94c4c84e6	[AArch, PowerPC] add more tests for legal rotate ops; NFC llvm-svn: 337964	2018-07-25 21:25:50 +00:00


25699 Commits