llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	b941f5dc5f	[X86] Tag CET-IBT instruction scheduler classes llvm-svn: 324898	2018-02-12 15:57:00 +00:00
Simon Pilgrim	d0693a6501	[X86][MMX] Add missing scheduling class tag for EMMS/FEMMS We only tagged it with the itinerary class, so completeness checks were erroneously passed (PR35639). AMD targets can perform these a lot quicker than WriteMicrocoded so will need an override in the models. llvm-svn: 324897	2018-02-12 15:52:59 +00:00
Krzysztof Parzyszek	450d4cf93a	[NFC] Fix comment of class InstrStage Patch by Wei-Ren Chen. Differential Revision: https://reviews.llvm.org/D42905 llvm-svn: 324894	2018-02-12 15:02:49 +00:00
Alexey Bataev	ca2396e673	[SLP] Take user instructions cost into consideration in insertelement vectorization. Summary: For better vectorization result we should take into consideration the cost of the user insertelement instructions when we try to vectorize sequences that build the whole vector. I.e. if we have the following scalar code: ``` <Scalar code> insertelement <ScalarCode>, ... ``` we should consider the cost of the last `insertelement` instructions as the cost of the scalar code. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D42657 llvm-svn: 324893	2018-02-12 14:54:48 +00:00
Oliver Stannard	4269917304	[AArch64] Improve v8.1-A code-gen for atomic load-subtract Armv8.1-A added an atomic load-add instruction, but not a load-subtract instruction. Our current code-generation for atomic load-subtract always inserts a NEG instruction to negate its argument, even if it could be folded into a constant or another instruction. This adds lowering early in selection DAG to convert a load-subtract operation into a subtract and a load-add, allowing the normal DAG optimisations to work on it. I've left the old tablegen patterns in because they are still needed for global isel. Some of the tests in this patch are copied from D35375 by Chad Rosier (which was abandoned). Differential revision: https://reviews.llvm.org/D42477 llvm-svn: 324892	2018-02-12 14:22:03 +00:00
Sanjay Patel	39059d2630	[InstCombine] various clean-ups for commonIDivTransforms; NFC llvm-svn: 324891	2018-02-12 14:14:56 +00:00
Nicholas Wilson	5170b54013	Test commit: reformat comment llvm-svn: 324889	2018-02-12 13:17:09 +00:00
Hans Wennborg	7e19dfc45f	Revert r324835 "[X86] Reduce Store Forward Block issues in HW" It asserts building Chromium; see PR36346. (This also reverts the follow-up r324836.) > If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. > A "store forward block" occurs when a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchitecture is that a small store cannot be forwarded to a large load. > The estimated penalty for a store forward block is ~13 cycles. > > This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence > of a load and a store. > > The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. > Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. llvm-svn: 324887	2018-02-12 12:43:39 +00:00
Simon Atanasyan	0874cf5e62	[mips] Fix 'l' constraint handling for types smaller than 32 bits When the 'l' constraint is used correctly, llvm now generates valid code; otherwise it shows an error message. Previously this triggered an assertion. This commit is the same as r324869 with the test's file name fixed. llvm-svn: 324885	2018-02-12 12:21:55 +00:00
Simon Atanasyan	dc4ed35ea6	[mips] Revert rL324869 This commit adds inlineasm-cnstrnt-bad-l.ll, which clashes with inlineasm-cnstrnt-bad-L.ll on case-insensitive file systems. llvm-svn: 324882	2018-02-12 11:15:37 +00:00
Florian Hahn	e54a20e094	[LoopInterchange] Simplify splitInnerLoopHeader logic (NFC). We can use SplitBlock for both cases, which makes the code slightly simpler and updates both LoopInfo and the dominator tree. llvm-svn: 324881	2018-02-12 11:10:58 +00:00
David Green	6d9f8c9817	[CodeGen] Add a -trap-unreachable option for debugging Add a common -trap-unreachable option, similar to the target specific hexagon equivalent, which has been replaced. This turns unreachable instructions into traps, which is useful for debugging. Differential Revision: https://reviews.llvm.org/D42965 llvm-svn: 324880	2018-02-12 11:06:27 +00:00
Sam McCall	2d8242d60d	[gtest] Support raw_ostream printing functions more comprehensively. Summary: These are functions like operator<<(raw_ostream&, Foo). Previously these were only supported for messages. In the assertion EXPECT_EQ(A, B) << C; the local modifications would explicitly try to use raw_ostream printing for C. However A and B would look for a std::ostream printing function, and often fall back to gtest's default "168 byte object <00 01 FE 42 ...>". This patch pulls out the raw_ostream support into a new header under `custom/`. I changed the mechanism: instead of a convertible stream, we wrap the printed value in a proxy object to allow it to be sent to a std::ostream. I think the new way is clearer. I also changed the policy: we prefer raw_ostream printers over std::ostream ones. This is because the fallback printers are defined using std::ostream, while all the raw_ostream printers should be "good". Reviewers: ilya-biryukov, chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43091 llvm-svn: 324876	2018-02-12 10:20:09 +00:00
Simon Atanasyan	e08f2a19d4	[mips] Fix 'l' constraint handling for types smaller than 32 bits When the 'l' constraint is used correctly, llvm now generates valid code; otherwise it shows an error message. Previously this triggered an assertion. llvm-svn: 324869	2018-02-12 07:51:21 +00:00
Gerolf Hoflehner	bf82e99691	[MC] Issue error message when data region is not terminated llvm-svn: 324868	2018-02-12 07:19:05 +00:00
Max Kazantsev	b57ca09e43	[NFC] Fix typos llvm-svn: 324867	2018-02-12 05:16:28 +00:00
Max Kazantsev	db3a9e0cfe	[SCEV] Make getPostIncExpr guaranteed to return AddRec The current implementation of `getPostIncExpr` invokes `getAddExpr` for two recurrences and expects that it always returns a recurrence. But this is not guaranteed to happen if we have reached max recursion depth or refused to make SCEV simplification for other reasons. This patch changes its implementation so that now it always returns SCEVAddRec without relying on `getAddExpr`. Differential Revision: https://reviews.llvm.org/D42953 llvm-svn: 324866	2018-02-12 05:09:38 +00:00
Craig Topper	b424fafa9f	[X86] Don't look for TEST instruction shrinking opportunities when the root node is a X86ISD::SUB. I don't believe we ever create an X86ISD::SUB with a 0 constant which is what the TEST handling needs. The ternary operator at the end of this code shows up as only going one way in the llvm-cov report from the bots. llvm-svn: 324865	2018-02-12 03:02:02 +00:00
Craig Topper	3ccbd3f32f	[X86] Remove check for X86ISD::AND with no flag users from the TEST instruction immediate shrinking code. We turn X86ISD::AND with no flag users back to ISD::AND in PreprocessISelDAG. llvm-svn: 324864	2018-02-12 03:02:01 +00:00
Craig Topper	98ae8f833f	[X86] Change some compare patterns to use loadi8/loadi16/loadi32/loadi64 helper fragments. This enables CMP8mi to fold zextloadi8i1 which in all tests allows us to avoid creating a TEST8rr that peephole can't fold. llvm-svn: 324863	2018-02-12 02:48:42 +00:00
Craig Topper	27d5b6e4a6	[X86] Autogenerate complete checks. NFC llvm-svn: 324862	2018-02-12 02:03:36 +00:00
Craig Topper	3ce035acf3	[X86] Add KADD X86ISD opcode instead of reusing ISD::ADD. ISD::ADD implies individual vector element addition with no carries between elements. But for a vXi1 type that would be the same as XOR. And we already turn ISD::ADD into ISD::XOR for all vXi1 types during lowering. So the ISD::ADD pattern would never be able to match anyway. KADD is different: it adds the elements but also propagates a carry between them. This is just a way of doing an add in a k-register without bitcasting to the scalar domain. There's still no way to match the pattern, but at least it's not obviously wrong. llvm-svn: 324861	2018-02-12 01:33:38 +00:00
Craig Topper	dfc322ddf4	[X86] Allow zextload/extload i1->i8 to be folded into instructions during isel Previously we just emitted this as a MOV8rm which would likely get folded during the peephole pass anyway. This just makes it explicit earlier. The gpr-to-mask.ll test changed because the kaddb instruction has no memory form. llvm-svn: 324860	2018-02-12 01:33:36 +00:00
Charles Saternos	d061dd06e8	Follow on to rL324854 (Added tests) llvm-svn: 324859	2018-02-12 00:20:16 +00:00
Craig Topper	363e099446	[X86] Remove MASK_BINOP intrinsic type. NFC llvm-svn: 324858	2018-02-11 22:32:30 +00:00
Craig Topper	38d61c38a2	[X86] Remove dead code from getMaskNode that looked for a i64 mask with a maskVT that wasn't v64i1. NFC llvm-svn: 324857	2018-02-11 22:32:29 +00:00
Craig Topper	a7ac028a6b	[X86] Remove LowerBoolVSETCC_AVX512, we get this with a target independent DAG combine now. NFC llvm-svn: 324856	2018-02-11 22:32:27 +00:00
Charles Saternos	d3e7d19f59	[ThinLTO] Add GraphTraits for FunctionSummaries Add GraphTraits definitions to the FunctionSummary and ModuleSummaryIndex classes. These GraphTraits will be used to find SCCs in ThinLTO analysis passes. llvm-svn: 324854	2018-02-11 22:06:20 +00:00
Brock Wyma	19e17b3970	[CodeView] Allow variable names to be as long as the codeview format supports Instead of reserving 0xF00 bytes for the fixed length portion of the CodeView symbol name, calculate the actual length of the fixed length portion. Differential Revision: https://reviews.llvm.org/D42125 llvm-svn: 324850	2018-02-11 21:26:46 +00:00
Craig Topper	3a354152dd	[X86] Update some required-vector-width.ll test cases to not pass 512-bit vectors in arguments or return. ABI for these would require 512 bits support so we don't want to test that. llvm-svn: 324845	2018-02-11 18:52:16 +00:00
Simon Pilgrim	0d8c4bfc2a	[X86][SSE] Use SplitBinaryOpsAndApply to recognise PSUBUS patterns before they're split on AVX1 This needs to be generalised further to support AVX512BW cases but I want to add non-uniform constants first. llvm-svn: 324844	2018-02-11 17:29:42 +00:00
Sanjay Patel	510d647a4d	[InstCombine] X / (X * Y) -> 1 / Y if the multiplication does not overflow The related cases for (X * Y) / X were handled in rL124487. https://rise4fun.com/Alive/6k9 The division in these tests is subsequently eliminated by existing instcombines for 1/X. llvm-svn: 324843	2018-02-11 17:20:32 +00:00
Craig Topper	ca5a340171	[X86] Use min/max for vector ult/ugt compares if it avoids a sign flip. Summary: Currently we only use min/max to help with ule/uge compares because it removes an invert of the result that would otherwise be needed. But we can also use it for ult/ugt compares if it will prevent the need for a sign bit flip needed to use pcmpgt, at the cost of requiring an invert after the compare. I also refactored the code so that the max/min code is self contained and does its own return instead of setting up a flag to manipulate the rest of the function's behavior. Most of the test cases look ok with this. I did notice that we added instructions when one of the operands being sign flipped is a constant vector that we were able to constant fold the flip into. I also noticed that sometimes the SSE min/max clobbers a register that is needed after the compare. This resulted in an extra move being inserted before the min/max to preserve the register. We could try to detect this and switch from min to max and change the compare operands to use the operand that gets reused in the compare. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D42935 llvm-svn: 324842	2018-02-11 17:11:40 +00:00
Simon Pilgrim	c2544c572a	[X86][SSE] Moved SplitBinaryOpsAndApply earlier so more methods can use it. NFCI. llvm-svn: 324841	2018-02-11 17:01:43 +00:00
Sanjay Patel	aee107f30d	[InstCombine] add tests for div-mul folds; NFC The related cases for (X * Y) / X were handled in rL124487. llvm-svn: 324840	2018-02-11 16:52:44 +00:00
Sanjay Patel	eb8c408e50	[TargetLowering] try to create -1 constant operand for math ops via demanded bits This reverses instcombine's demanded bits' transform which always tries to clear bits in constants. As noted in PR35792 and shown in the test diffs: https://bugs.llvm.org/show_bug.cgi?id=35792 ...we can do better in codegen by trying to form -1. The x86 sub test shows a missed opportunity. I did investigate changing instcombine's behavior, but it would be more work to change canonicalization in IR. Clearing bits / shrinking constants can allow killing instructions, so we'd have to figure out how to not regress those cases. Differential Revision: https://reviews.llvm.org/D42986 llvm-svn: 324839	2018-02-11 14:38:23 +00:00
Simon Pilgrim	7630150222	[X86] Add PR33747 test case llvm-svn: 324838	2018-02-11 13:12:50 +00:00
Simon Pilgrim	0be5567a89	[X86][SSE] Enable SMIN/SMAX/UMIN/UMAX custom lowering for all legal types This allows us to recognise more saturation patterns and also simplify some MINMAX codegen that was failing to combine CMPGE comparisons to a legal CMPGT. Differential Revision: https://reviews.llvm.org/D43014 llvm-svn: 324837	2018-02-11 10:52:37 +00:00
Lama Saba	91e2b9d081	fix test/CodeGen/X86/fixup-sfb.ll test failure after commit https://reviews.llvm.org/rL324835 Change-Id: I2526c2f342654e85ce054237de03ae9db9ab4994 llvm-svn: 324836	2018-02-11 10:33:06 +00:00
Lama Saba	c2ba6c387e	[X86] Reduce Store Forward Block issues in HW If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs when a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchitecture is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Change-Id: I620b6dc91583ad9a1444591e3ddc00dd25d81748 llvm-svn: 324835	2018-02-11 09:34:12 +00:00
Craig Topper	24d3b28d93	[X86] Don't make 512-bit vectors legal when preferred vector width is 256 bits and 512 bits aren't required This patch adds a new function attribute "required-vector-width" that can be set by the frontend to indicate the maximum vector width present in the original source code. The idea is that this would be set based on ABI requirements, intrinsics or explicit vector types being used, maybe simd pragmas, etc. The backend will then use this information to determine if it's safe to make 512-bit vectors illegal when the preference is for 256-bit vectors. For code that has no vectors in it originally and only gets vectors through the loop and slp vectorizers this allows us to generate code largely similar to our AVX2 only output while still enabling AVX512 features like mask registers and gather/scatter. The loop vectorizer doesn't always obey TTI and will create oversized vectors with the expectation the backend will legalize it. In order to avoid changing the vectorizer and potentially harming our AVX2 codegen, this patch tries to make the legalizer behavior similar. This is restricted to CPUs that support AVX512F and AVX512VL so that we have good fallback options to use 128 and 256-bit vectors and still get masking. I've qualified every place I could find in X86ISelLowering.cpp and added test cases for many of them with 2 different values for the attribute to see the codegen differences. We still need to do frontend work for the attribute and teach the inliner how to merge it, etc. But this gets the codegen layer ready for it. Differential Revision: https://reviews.llvm.org/D42724 llvm-svn: 324834	2018-02-11 08:06:27 +00:00
Craig Topper	a4bf9b8d51	[X86] Remove setOperationAction lines for promoting vXi1 SINT_TO_FP/UINT_TO_FP. We promote these via a DAG combine now before lowering gets the chance. Also remove the v2i1 custom handling since it will no longer be triggered. llvm-svn: 324833	2018-02-11 07:44:33 +00:00
Craig Topper	36f913ee80	[SelectionDAG] Remove TargetLowering::getConstTrueVal. Use SelectionDAG::getBoolConstant in the one place it was used. SelectionDAG::getBoolConstant was recently introduced. At the time I didn't know getConstTrueVal existed, but I think getBoolConstant is better as it will use the source VT to make sure it can properly detect floating point if it is configured differently. llvm-svn: 324832	2018-02-11 04:58:58 +00:00
Craig Topper	ba5ad55965	[X86] Remove some redundant qualifications from the setOperationAction blocks. NFC These were added as part of the refactoring for prefer vector width. At the time I thought the hasAVX512 here would be replaced with "allow 512 bit vectors" so that it would read "allow 512 bit vectors OR VLX". But now the plan is to only give the option of disabling 512 bit vectors when VLX is enabled. So we don't need this qualification at all llvm-svn: 324831	2018-02-11 03:07:19 +00:00
Simon Pilgrim	d229bfd20d	[X86][SSE] Add SMIN/SMAX combine test As discussed on D43014, we need the ability to flip SMIN/SMAX to (legal) UMIN/UMAX llvm-svn: 324829	2018-02-10 23:38:50 +00:00
Craig Topper	4dccffc84a	[X86] Change signatures of avx512 packed fp compare intrinsics to return a vXi1 mask type to be closer to an fcmp. Summary: This patch changes the signature of the avx512 packed fp compare intrinsics to return a vXi1 vector and no longer take a mask as input. The casts to scalar type will now need to be explicit in the IR. The masking node will now be an explicit `and` in the IR. This makes the intrinsic look much more similar to an fcmp instruction that we wish we could use for these but can't. We already use icmp instructions for integer compares. Previously the lowering step of isel would turn the intrinsic into an X86 specific ISD node and emit the masking nodes as well as some bitcasts. This means DAG combines can't see the vXi1 type until somewhat late, making it more difficult to combine out gpr<->mask transition sequences. By exposing the vXi1 type explicitly in the IR and initial SelectionDAG we give earlier DAG combines and even InstCombine the chance to see it and optimize it. This should make any issues with gpr<->mask sequences the same between integer and fp, meaning we only have to fix them once. Reviewers: spatel, delena, RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D43137 llvm-svn: 324827	2018-02-10 23:33:55 +00:00
Simon Pilgrim	b781b2e24c	[X86][SSE] Add UMIN/UMAX combine test As discussed on D43014, we need the ability to flip UMIN/UMAX to (legal) SMIN/SMAX llvm-svn: 324826	2018-02-10 22:27:35 +00:00
Simon Pilgrim	19495198af	[InstCombine] Add constant vector support for ~(C >> Y) --> ~C >> Y Includes adding m_NonNegative constant pattern matcher llvm-svn: 324825	2018-02-10 21:46:09 +00:00
Simon Pilgrim	cb9a02f60e	[X86][SSE] Increase PMULLD costs to better match hardware Until Skylake, most hardware could only issue a PMULLD op every other cycle llvm-svn: 324823	2018-02-10 19:27:10 +00:00
Craig Topper	9121eb575e	[X86] Custom legalize (v2i32 (setcc (v2f32))) so that we don't end up with a (v4i1 (setcc (v4f32))) Under VLX, getSetCCResultType returns v2i1/v4i1 for v2f32/v4f32 so default type legalization will end up changing the setcc result type back to vXi1 if it had been extended. The resulting extend gets messed up further by type legalization and is difficult to recombine back to (v4i32 (setcc (v4f32))) after legalization. I went ahead and enabled this for SSE2 and later since it's always the result we want and this helps type legalization get there in fewer steps. llvm-svn: 324822	2018-02-10 19:12:58 +00:00
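
Note on 510d647a4d ([InstCombine] X / (X * Y) -> 1 / Y): the fold is only claimed when the multiplication does not overflow. The sketch below is not LLVM code; it is a small brute-force check in Python over narrow bit widths, written under the assumption that "does not overflow" corresponds to a non-wrapping unsigned multiply for unsigned division and a non-wrapping signed multiply for signed division. The function names (`check_udiv_fold`, `check_sdiv_fold`, `find_wrapping_counterexample`) are hypothetical and exist only for this illustration.

```python
from itertools import product


def sdiv(a, b):
    """Signed division that truncates toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def check_udiv_fold(bits):
    """Check X / (X * Y) == 1 / Y for unsigned values when X * Y does not wrap."""
    limit = 1 << bits
    for x, y in product(range(limit), repeat=2):
        prod = x * y
        if prod >= limit or prod == 0:
            continue  # skip wrapping multiplies and division by zero
        if x // prod != 1 // y:
            return False
    return True


def check_sdiv_fold(bits):
    """Check X / (X * Y) == 1 / Y for signed values when X * Y does not wrap."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for x, y in product(range(lo, hi + 1), repeat=2):
        prod = x * y
        if prod < lo or prod > hi or prod == 0:
            continue  # skip wrapping multiplies and division by zero
        if sdiv(x, prod) != sdiv(1, y):
            return False
    return True


def find_wrapping_counterexample(bits):
    """Return an unsigned (x, y) where the fold is wrong once X * Y wraps, or None."""
    limit = 1 << bits
    for x, y in product(range(limit), repeat=2):
        wrapped = (x * y) % limit
        if wrapped == 0 or y == 0:
            continue  # both sides must be defined
        if x // wrapped != 1 // y:
            return (x, y)
    return None


if __name__ == "__main__":
    print(check_udiv_fold(8), check_sdiv_fold(8))
    print(find_wrapping_counterexample(4))
```

The last function shows why the precondition matters: once the product is allowed to wrap, some input pair makes the two sides disagree.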


160074 Commits