llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	24b346da42	[X86] Emit a single shuffle for the v16i8->v4i32 step of a SIGN_EXTEND_VECTOR_INREG lowering on pre-sse4.1 targets. Previously we emitted two separate shuffles, one for unpcklbw and one for unpcklwd. Instead emit a single shuffle equivalent to both of the original shuffles. Shuffle lowering seems able to handle it. This avoids a bitcast between the two shuffles which seems helpful to DAG combine. Remove the custom type legalization for v8i8->v8i32. I had put that in to avoid some almost duplicate punpcklbw instructions I was seeing, but this lowering change seems to fix that. It also fixes some duplicate shuffles seen in vector-sext.ll llvm-svn: 347348	2018-11-20 21:21:52 +00:00
Craig Topper	17fa42a69b	[X86] Preserve undef information when creating a punpckl/hbw from a v16i8 where all the even or odd elements are undef. Previously if V2 was unused we ended up using V1 for both inputs as part of the code that follows the new code. By using lowerVectorShuffleWithUNPCK we keep the undef nature of V2 in the output. As near as I can tell this makes v16i8 behavior consistent with every other VT now. This does mean that we give the register allocator freedom to fill in random registers now and create false dependencies. But like I said we're already doing that for other types. llvm-svn: 347296	2018-11-20 09:04:01 +00:00
Simon Pilgrim	7f92efa5a9	[X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions. llvm-svn: 347177	2018-11-18 22:13:31 +00:00
Sanjay Patel	0a515595a7	[x86] allow vector load narrowing with multi-use values This is a long-awaited follow-up suggested in D33578. Since then, we've picked up even more opportunities for vector narrowing from changes like D53784, so there are a lot of test diffs. Apart from 2-3 strange cases, these are all wins. I've structured this to be no-functional-change-intended for any target except for x86 because I couldn't tell if AArch64, ARM, and AMDGPU would improve or not. All of those targets have existing regression tests (4, 4, 10 files respectively) that would be affected. Also, Hexagon overrides the shouldReduceLoadWidth() hook, but doesn't show any regression test diffs. The trade-off is deciding if an extra vector load is better than a single wide load + extract_subvector. For x86, this is almost always better (on paper at least) because we often can fold loads into subsequent ops and not increase the official instruction count. There's also some unknown -- but potentially large -- benefit from using narrower vector ops if wide ops are implemented with multiple uops and/or frequency throttling is avoided. Differential Revision: https://reviews.llvm.org/D54073 llvm-svn: 346595	2018-11-10 20:05:31 +00:00
Craig Topper	9a7e19b8f2	[DAGCombiner][X86][Mips] Enable combineShuffleOfScalars to run between vector op legalization and DAG legalization. Fix bad one use check in combineShuffleOfScalars. It's possible for vector op legalization to generate a shuffle. If that happens we should give a chance for DAG combine to combine that with a build_vector input. I also fixed a bug in combineShuffleOfScalars that was considering the number of uses on an undef input to a shuffle. We don't care how many times undef is used. Differential Revision: https://reviews.llvm.org/D54283 llvm-svn: 346530	2018-11-09 18:04:34 +00:00
Craig Topper	f7108aef14	[X86] In LowerEXTEND_VECTOR_INREG, emit a vector shuffle instead of directly using X86ISD::UNPCKL The majority of the changes are because the rest of shuffle lowering/combining prefers to replace the undef input with the other operand. Using UNPCKL directly seemed to avoid this and just grabbed a randomish register for the undef which can create false dependencies. llvm-svn: 346050	2018-11-02 22:48:02 +00:00
Craig Topper	60c202a494	[X86] Don't emit *_extend_vector_inreg nodes when both the input and output types are legal with AVX1 We already have custom lowering for the AVX case in LegalizeVectorOps. So it's better to keep the regular extend op around as long as possible. I had to qualify one place in DAG combine that created illegal vector extending load operations. This change by itself had no effect on any tests which is why it's included here. I've made a few cleanups to the custom lowering. The sign extend code no longer creates an identity shuffle with undef elements. The zero extend code now emits a zero_extend_vector_inreg instead of an unpckl with a zero vector. For the high half of the custom lowering of zero_extend/any_extend, we're now using an unpckh with a zero vector or undef. Previously we used a pshufd to move the upper 64-bits to the lower 64-bits and then used a zero_extend_vector_inreg. I think the zero vector should require less execution resources and be smaller code size. Differential Revision: https://reviews.llvm.org/D54024 llvm-svn: 346043	2018-11-02 21:09:49 +00:00
Sanjay Patel	8b207defea	[DAGCombiner] narrow vector binops when extraction is cheap Narrowing vector binops came up in the demanded bits discussion in D52912. I don't think we're going to be able to do this transform in IR as a canonicalization because of the risk of creating unsupported widths for vector ops, but we already have a DAG TLI hook to allow what I was hoping for: isExtractSubvectorCheap(). This is currently enabled for x86, ARM, and AArch64 (although only x86 has existing regression test diffs). This is artificially limited to not look through bitcasts because there are so many test diffs already, but that's marked with a TODO and is a small follow-up. Differential Revision: https://reviews.llvm.org/D53784 llvm-svn: 345602	2018-10-30 14:14:34 +00:00
Craig Topper	aa5eb2fbaa	[X86] Force floating point values in constant pool decoding to print in scientific notation so they can't be confused with integers. When the floating point constants are whole numbers they have no decimal point so look like integers, but mean something very different in something like an 'and' instruction. Ideally we would just print a decimal point and a 0, but I couldn't see how to make APFloat::toString do that. llvm-svn: 345488	2018-10-29 04:52:04 +00:00
Simon Pilgrim	838eb24014	[TargetLowering] Improve vXi64 UINT_TO_FP vXf64 support (P38226) As suggested on D52965, this patch moves the i64 to f64 UINT_TO_FP expansion code from LegalizeDAG into TargetLowering and makes it available to LegalizeVectorOps as well. Not only does this help perform X86 lowering as a true vectorization instead of (partially vectorized) scalar conversions, it avoids the HADDPD op from the scalar code which can be slow on most targets. The AVX512F does have the vcvtusi2sdq scalar operation but we don't unroll to use it as it seems to only help for the v2f64 case - otherwise the unrolling cost will certainly be too high. My feeling is that we should leave it to the vectorizers - and if it generates the vector UINT_TO_FP we should use it. Differential Revision: https://reviews.llvm.org/D53649 llvm-svn: 345256	2018-10-25 11:15:57 +00:00
Simon Pilgrim	0dcf1cea03	[X86][SSE] Add SSE41 vector int2fp tests llvm-svn: 343925	2018-10-06 20:24:27 +00:00
Simon Pilgrim	ad23f270db	[X86] Standardize floating point assembly comments Consistently try to use APFloat::toString for floating point constant comments to get rid of differences between Constant / ConstantDataSequential values - it should help stop some of the linux-windows buildbot failures matching NaN/INF etc. as well. Differential Revision: https://reviews.llvm.org/D52702 llvm-svn: 343562	2018-10-02 09:08:51 +00:00
Craig Topper	fe0b973fbf	[X86] Remove an fp->int->fp domain crossing in LowerUINT_TO_FP_i64. Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back? Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D52134 llvm-svn: 342327	2018-09-15 16:23:35 +00:00
Simon Pilgrim	1d181bc992	[X86][AVX] Use extract_subvector to reduce vector op widths (PR36761) We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types. This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types. Differential Revision: https://reviews.llvm.org/D49556 llvm-svn: 337500	2018-07-19 21:52:06 +00:00
Craig Topper	07a1787501	[X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics. This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to constrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well. llvm-svn: 337147	2018-07-16 06:56:09 +00:00
Vedant Kumar	cc7b2a55c2	[DAGCombiner] Change the SDLoc on split extloads (2/N) In DAGCombiner, we try to simplify this pattern: ([s|z]ext (load ...)) Conceptually, a new extload which is created while splitting the load should have the same debug location as the load. Making this change affects the IROrder of the new load, causing some test case churn. In practice, the new location is never different from the location of the [s|z]ext, at least not during check-llvm or a stage2 build. Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D46156 llvm-svn: 331301	2018-05-01 19:29:15 +00:00
Simon Pilgrim	ca38c762e4	[TargetLowering] Add vector BITCAST support to SimplifyDemandedVectorElts Notably helps cleanup after legalization of vector types Differential Revision: https://reviews.llvm.org/D43674 llvm-svn: 326838	2018-03-06 22:32:01 +00:00
Geoff Berry	a2b9011290	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Re-enable commit r323991 now that r325931 has been committed to make MachineOperand::isRenamable() check more conservative w.r.t. code changes and opt-in on a per-target basis. llvm-svn: 326208	2018-02-27 16:59:10 +00:00
Quentin Colombet	48abac82b8	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" This reverts commit r323991. This commit breaks targets that don't model all the register constraints in TableGen. So far the workaround was to set the hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the cases. For instance, when mutating an instruction (like in the lowering of COPYs) the isRenamable flag is not properly updated. The same problem will happen when attaching machine operand from one instruction to another. Geoff Berry is working on a fix in https://reviews.llvm.org/D43042. llvm-svn: 325421	2018-02-17 03:05:33 +00:00
Geoff Berry	94503c7bc3	[MachineCopyPropagation] Extend pass to do COPY source forwarding Summary: This change extends MachineCopyPropagation to do COPY source forwarding and adds an additional run of the pass to the default pass pipeline just after register allocation. This version of this patch uses the newly added MachineOperand::isRenamable bit to avoid forwarding registers in such a way as to violate constraints that aren't captured in the Machine IR (e.g. ABI or ISA constraints). This change is a continuation of the work started in D30751. Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits Differential Revision: https://reviews.llvm.org/D41835 llvm-svn: 323991	2018-02-01 18:54:01 +00:00
Puyan Lotfi	43e94b15ea	Followup on Proposal to move MIR physical register namespace to '$' sigil. Discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html In preparation for adding support for named vregs we are changing the sigil for physical registers in MIR to '$' from '%'. This will prevent name clashes of named physical register with named vregs. llvm-svn: 323922	2018-01-31 22:04:26 +00:00
Francis Visoiu Mistrih	a8a83d150f	[CodeGen] Use MachineOperand::print in the MIRPrinter for MO_Register. Work towards the unification of MIR and debug output by refactoring the interfaces. For MachineOperand::print, keep a simple version that can be easily called from `dump()`, and a more complex one which will be called from both the MIRPrinter and MachineInstr::print. Add extra checks inside MachineOperand for detached operands (operands with getParent() == nullptr). https://reviews.llvm.org/D40836 * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+)<def> ([^ ]+)/kill: \1 def \2 \3/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: \1 \2 def \3/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/kill: def ([^ ]+) ([^ ]+) ([^ ]+)<def>/kill: def \1 \2 def \3/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/<def>//g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<kill>/killed \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use,kill>/implicit killed \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<dead>/dead \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<def[ ],[ ]dead>/dead \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def[ ],[ ]dead>/implicit-def dead \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-def>/implicit-def \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<imp-use>/implicit \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<internal>/internal \1/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" -o -name "*.s" \) -type f -print0 | xargs -0 sed -i '' -E 's/([^ ]+)<undef>/undef \1/g' llvm-svn: 320022	2017-12-07 10:40:31 +00:00
Francis Visoiu Mistrih	25528d6de7	[CodeGen] Unify MBB reference format in both MIR and debug output As part of the unification of the debug format and the MIR format, print MBB references as '%bb.5'. The MIR printer prints the IR name of a MBB only for block definitions. * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)->getNumber/" << printMBBReference(\1)/g' * find . \( -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#" << ([a-zA-Z0-9_]+)\.getNumber/" << printMBBReference(\1)/g' * find . \( -name "*.txt" -o -name "*.s" -o -name "*.mir" -o -name "*.cpp" -o -name "*.h" -o -name "*.ll" \) -type f -print0 | xargs -0 sed -i '' -E 's/BB#([0-9]+)/%bb.\1/g' * grep -nr 'BB#' and fix Differential Revision: https://reviews.llvm.org/D40422 llvm-svn: 319665	2017-12-04 17:18:51 +00:00
Francis Visoiu Mistrih	9d7bb0cb40	[CodeGen] Print register names in lowercase in both MIR and debug output As part of the unification of the debug format and the MIR format, always print registers as lowercase. * Only debug printing is affected. It now follows MIR. Differential Revision: https://reviews.llvm.org/D40417 llvm-svn: 319187	2017-11-28 17:15:09 +00:00
Craig Topper	198f7d78d3	[X86] Regenerate a test with broadcast comments. NFC llvm-svn: 318640	2017-11-20 08:15:04 +00:00
Craig Topper	a5af4a64d0	[AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering. Summary: This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes. There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8. Reviewers: RKSimon, zvi, delena Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38714 llvm-svn: 315860	2017-10-15 16:41:17 +00:00
Geoff Berry	fabedbad11	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This reverts commit r314729. Another bug has been encountered in an out-of-tree target reported by Quentin. llvm-svn: 314814	2017-10-03 16:59:13 +00:00
Geoff Berry	bfc5fb4571	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Avoid bug in regalloc greedy/machine verifier when forwarding to use in an instruction that re-defines the same virtual register. - Fixed bug when forwarding to use in EarlyClobber instruction slot. - Fixed incorrect forwarding to register definitions that showed up in explicit_uses() iterator (e.g. in INLINEASM). - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 314729	2017-10-02 22:01:37 +00:00
Simon Pilgrim	ec528b2cb6	Regenerate test (missing broadcast constant comments). NFCI. Still avoiding the floating point comments to prevent linux/windows discrepancies. llvm-svn: 314681	2017-10-02 15:22:35 +00:00
Craig Topper	a6054328e8	[X86] Teach the execution domain fixing tables to use movlhps in place of unpcklpd for the packed single domain. MOVLHPS has a smaller encoding than UNPCKLPD in the legacy encodings. With VEX and EVEX encodings it doesn't matter. llvm-svn: 313509	2017-09-18 04:40:58 +00:00
Craig Topper	87f7381edf	[X86] Teach execution domain fixing to convert between FP and int unpack instructions. llvm-svn: 313508	2017-09-18 03:29:54 +00:00
Elena Demikhovsky	6cab129464	[X86 CodeGen] Optimization of ZeroExtendLoad for v2i8 vector Load with zero-extend and sign-extend from v2i8 to v2i32 is "Legal" since SSE4.1 and may be performed using the PMOVZXBD and PMOVSXBD instructions. llvm-svn: 313121	2017-09-13 06:40:26 +00:00
Sam McCall	f71bb198ed	Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" This crashes on boringSSL on PPC (will send reduced testcase) This reverts commit r312328. llvm-svn: 312490	2017-09-04 15:47:00 +00:00
Geoff Berry	65528f2991	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues addressed since original review: - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator. - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312328	2017-09-01 14:27:20 +00:00
Hans Wennborg	24775a0a6c	Revert r312154 "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" It caused PR34387: Assertion failed: (RegNo < NumRegs && "Attempting to access record for invalid register number!") > Issues identified by buildbots addressed since original review: > - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. > - The pass no longer forwards COPYs to physical register uses, since > doing so can break code that implicitly relies on the physical > register number of the use. > - The pass no longer forwards COPYs to undef uses, since doing so > can break the machine verifier by creating LiveRanges that don't > end on a use (since the undef operand is not considered a use). > > [MachineCopyPropagation] Extend pass to do COPY source forwarding > > This change extends MachineCopyPropagation to do COPY source forwarding. > > This change also extends the MachineCopyPropagation pass to be able to > be run during register allocation, after physical registers have been > assigned, but before the virtual registers have been re-written, which > allows it to remove virtual register COPY LiveIntervals that become dead > through the forwarding of all of their uses. llvm-svn: 312178	2017-08-30 22:11:37 +00:00
Geoff Berry	feffb0c8af	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Issues identified by buildbots addressed since original review: - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907. - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. llvm-svn: 312154	2017-08-30 18:41:07 +00:00
Craig Topper	48a7917079	[AVX512] Use 256-bit extract instructions for extracting bits [255:128] from a 512-bit register This enables the use of a smaller encoding by using a VEX instruction when possible. Differential Revision: https://reviews.llvm.org/D37092 llvm-svn: 312100	2017-08-30 07:26:12 +00:00
Craig Topper	641e2af9e8	[X86] Provide a separate feature bit for macro fusion support instead of basing it on the AVX flag Summary: Currently we determine if macro fusion is supported based on the AVX flag as a proxy for the processor being Sandy Bridge. This is really strange as now AMD supports AVX. It also means if the user explicitly disables AVX we disable macro fusion. This patch adds an explicit macro fusion feature. I've also enabled it for the generic 64-bit CPU (which doesn't have AVX). This is probably another candidate for being in the MI layer, but for now I at least wanted to correct the overloading of the AVX feature. Reviewers: spatel, chandlerc, RKSimon, zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D37280 llvm-svn: 312097	2017-08-30 04:34:48 +00:00
Geoff Berry	bd47e8a4f7	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" round 2 This reverts commit r311135. sanitizer-x86_64-linux-android buildbot is timing out with just this patch applied. llvm-svn: 311142	2017-08-18 01:43:11 +00:00
Geoff Berry	51f52c4fca	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Two issues identified by buildbots were addressed: - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 llvm-svn: 311135	2017-08-17 23:06:55 +00:00
Geoff Berry	4e38e02e6f	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" This reverts commit r311038. Several buildbots are breaking, and at least one appears to be due to the forwarding of physical regs enabled by this change. Reverting while I investigate further. llvm-svn: 311062	2017-08-17 04:04:11 +00:00
Geoff Berry	87f8d25150	[MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 llvm-svn: 311038	2017-08-16 20:50:01 +00:00
Simon Pilgrim	46dd55f1e1	[X86][SSE] Change BUILD_VECTOR interleaving ordering to improve coalescing/combine opportunities We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 1> Step 2: unpcklps X, Y ==> <3, 2, 1, 0> The issue is that, because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc. Instead, this patch unpacks progressively larger sequential vector elements together: e.g. for v4f32: Step 1: unpcklps 0, 1 ==> X: <?, ?, 1, 0> : unpcklps 2, 3 ==> Y: <?, ?, 3, 2> Step 2: unpcklpd X, Y ==> <3, 2, 1, 0> This does mean that we are creating UNPCKL shuffles of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree. Differential Revision: https://reviews.llvm.org/D33864 llvm-svn: 304688	2017-06-04 20:12:04 +00:00
Hans Wennborg	b00ffd8cb7	Revert r302938 "Add LiveRangeShrink pass to shrink live range within BB." This also reverts follow-ups r303292 and r303298. It broke some Chromium tests under MSan, and apparently also internal tests at Google. llvm-svn: 303369	2017-05-18 18:50:05 +00:00
Dehao Chen	65dd23e273	Add LiveRangeShrink pass to shrink live range within BB. Summary: LiveRangeShrink pass moves instruction right after the definition with the same BB if the instruction and its operands all have more than one use. This pass is inexpensive and guarantees optimal live-range within BB. Reviewers: davidxl, wmi, hfinkel, MatzeB, andreadb Reviewed By: MatzeB, andreadb Subscribers: hiraditya, jyknight, sanjoy, skatkov, gberry, jholewinski, qcolombet, javed.absar, krytarowski, atrick, spatel, RKSimon, andreadb, MatzeB, mehdi_amini, mgorny, efriedma, davide, dberlin, llvm-commits Differential Revision: https://reviews.llvm.org/D32563 llvm-svn: 302938	2017-05-12 19:29:27 +00:00
Ayman Musa	d9fb157845	[X86][SSE2] Fix asm string for movq (Move Quadword) instruction. Replace "mov{d|q}" with "movq". Differential Revision: https://reviews.llvm.org/D32220 llvm-svn: 301386	2017-04-26 07:08:44 +00:00
Michael Kuperstein	6129887d21	[X86] Revert r299387 due to AVX legalization infinite loop. llvm-svn: 299720	2017-04-06 22:33:25 +00:00
Simon Pilgrim	af33757b5d	[X86][SSE] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + VECTOR_SHUFFLE It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging. This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs a unary shuffle to duplicate the values. There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch. Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this. Differential Revision: https://reviews.llvm.org/D31373 llvm-svn: 299387	2017-04-03 21:06:51 +00:00
Simon Pilgrim	9f5c251d57	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985	2017-03-05 09:57:20 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00

103 Commits