llvm-project

Commit Graph

Author	SHA1	Message	Date
Alexander Ivchenko	5b8418983c	[NFC] Fix unused variable warning in X86RegisterBankInfo.cpp llvm-svn: 341198	2018-08-31 10:39:54 +00:00
Alexander Ivchenko	a26a364e75	[GlobalIsel][X86] Support for G_FCMP Differential Revision: https://reviews.llvm.org/D49172 llvm-svn: 341193	2018-08-31 09:38:27 +00:00
Andrea Di Biagio	b998eae2f2	[X86][BtVer2] Fix WriteFShuffle256 schedule write info. This patch fixes the number of micro opcodes, and processor resource cycles for the following AVX instructions: vinsertf128rr/rm vperm2f128rr/rm vbroadcastf128 Tests have been regenerated using the usual scripts in the llvm/utils directory. Differential Revision: https://reviews.llvm.org/D51492 llvm-svn: 341185	2018-08-31 08:30:47 +00:00
Martin Storsjo	f010872b5c	[MinGW] [X86] Pass true for the second parameter to StubValueTy for MO_COFFSTUB. NFC. These stubs should never be emitted for internal symbols, and nothing in AsmPrinter ever actually use this value when producing the stubs for COFF anyway. llvm-svn: 341177	2018-08-31 08:00:31 +00:00
Craig Topper	83e9f928ba	[X86] Don't do anything in ReplaceNodeResults for (v2i32 (fptoui/fptosi v2f32)) when -x86-experimental-vector-widening-legalization is on. We don't need to do our own widening, the generic legalizer can do it. llvm-svn: 341174	2018-08-31 07:05:39 +00:00
Craig Topper	e9af89a78b	[X86] Don't custom widen (v2i32 (setcc v2f32)) when -x86-experimental-vector-widening-legalization is in effect. We aren't doing anything than what the generic legalizer will do so just let it do it. llvm-svn: 341172	2018-08-31 07:05:37 +00:00
Craig Topper	1a8c99e670	[X86] Weaken an overly aggressive assert. This assert tried to check that AND constants are only on the RHS. But its possible for both operands to be constants if one is opaque which will prevent the AND from being constant folded. Fixes PR38771 llvm-svn: 341102	2018-08-30 19:35:38 +00:00
Alexander Ivchenko	af96112ec6	Make TargetInstrInfo::isCopyInstr return true for regular COPY-instructions ..Move all target-dependent checks into new isCopyInstrImpl method. This change allows us to treat MoveReg-type instructions and generic COPY instruction in the same way Differential Revision: https://reviews.llvm.org/D49913 llvm-svn: 341072	2018-08-30 14:32:47 +00:00
Andrew V. Tischenko	62f7a3207b	[X86] Improved sched model for X86 CMPXCHG* instructions. Differential Revision: https://reviews.llvm.org/D50070 llvm-svn: 341024	2018-08-30 06:26:00 +00:00
Craig Topper	b7b353be60	[X86] Make Feature64Bit useful We now only add +64bit to the CPU string for "generic" CPU. All other CPU names are assumed to have the feature flag already set if they support 64-bit. I've remove the implies from CMPXCHG8 so that Feature64Bit only comes in via CPUs or user passing -mattr=+64bit. I've changed the assert to a report_fatal_error so it's not lost in Release builds. The test updates are to fix things that tripped the new error. Differential Revision: https://reviews.llvm.org/D51231 llvm-svn: 341022	2018-08-30 06:01:05 +00:00
Martin Storsjo	489993db94	[MinGW] [X86] Add stubs for references to data variables that might end up imported from a dll Variables declared with the dllimport attribute are accessed via a stub variable named __imp_<var>. In MinGW configurations, variables that aren't declared with a dllimport attribute might still end up imported from another DLL with runtime pseudo relocs. For x86_64, this avoids the risk that the target is out of range for a 32 bit PC relative reference, in case the target DLL is loaded further than 4 GB from the reference. It also avoids having to make the text section writable at runtime when doing the runtime fixups, which makes it worthwhile to do for i386 as well. Add stub variables for all dso local data references where a definition of the variable isn't visible within the module, since the DLL data autoimporting might make them imported even though they are marked as dso local within LLVM. Don't do this for variables that actually are defined within the same module, since we then know for sure that it actually is dso local. Don't do this for references to functions, since there's no need for runtime pseudo relocations for autoimporting them; if a function from a different DLL is called without the appropriate dllimport attribute, the call just gets routed via a thunk instead. GCC does something similar since 4.9 (when compiling with -mcmodel=medium or large; from that version, medium is the default code model for x86_64 mingw), but only for x86_64. Differential Revision: https://reviews.llvm.org/D51288 llvm-svn: 340942	2018-08-29 17:28:34 +00:00
Simon Pilgrim	6b9bf7ecbc	[X86][AVX] Prefer VPBLENDW+VPBLENDD to VPBLENDVB for v16i16 blend shuffles Noticed while looking at D49562 codegen - we can avoid a large constant mask load and a slow VPBLENDVB select op by using VPBLENDW+VPBLENDD instead. TODO: As discussed on the patch, we should investigate adding VPBLENDVB handling to target shuffle combining as well, that will allow us to extend this to VPBLENDW+VPBLENDW+VPBLENDD. Differential Revision: https://reviews.llvm.org/D50074 llvm-svn: 340913	2018-08-29 10:51:08 +00:00
Craig Topper	9401fd0ed2	[X86] Add intrinsics for KADD instructions These are intrinsics for supporting kadd builtins in clang. These builtins are already in gcc to implement intrinsics from icc. Though they are missing from the Intel Intrinsics Guide. This instruction adds two mask registers together as if they were scalar rather than a vXi1. We might be able to get away with a bitcast to scalar and a normal add instruction, but that would require DAG combine smarts in the backend to recoqnize add+bitcast. For now I'd prefer to go with the easiest implementation so we can get these builtins in to clang with good codegen. Differential Revision: https://reviews.llvm.org/D51370 llvm-svn: 340869	2018-08-28 19:22:55 +00:00
Craig Topper	c73095e264	[X86] Mark the FUCOMI instructions as requiring CMOV to be enabled. NFCI These instructions were added on the PentiumPro along with CMOV. This was already comprehended by the lowering process which should emit an alternate sequence using FCOM and FNSTW. This just makes it an explicit error if that doesn't work for some reason. llvm-svn: 340844	2018-08-28 17:17:13 +00:00
Simon Pilgrim	af98587095	[X86][SSE] Improve variable scalar shift of vXi8 vectors (PR34694) This patch creates the shift mask and actual shift using the vXi16 vector shift ops. Differential Revision: https://reviews.llvm.org/D51263 llvm-svn: 340813	2018-08-28 10:37:29 +00:00
Simon Pilgrim	f119e27d80	[X86][SSE] Avoid vector extraction/insertion for non-constant uniform shifts As discussed on D51263, we're better off using byte shifts to clear the upper bits on pre-SSE41 hardware. llvm-svn: 340810	2018-08-28 10:14:09 +00:00
Craig Topper	c1436db753	[X86] Fix some comments to refer to KORTEST not KTEST. NFC KTEST is a different instruction. All of this code uses KORTEST. llvm-svn: 340799	2018-08-28 06:39:35 +00:00
Craig Topper	4be11c0585	[X86] When lowering v32i8 MULHS/MULHU, shuffle after the PACKUS rather than before. We're using a 256-bit PACKUS to do the truncation, but that instruction operates on 128-bit lanes. So previously we shuffled first to rearrange the lanes. But that requires 2 shuffles. Instead we can shuffle after the PACKUS using a single VPERMQ. This matches what our normal LowerTRUNCATE code does when it uses PACKUS. Differential Revision: https://reviews.llvm.org/D51284 llvm-svn: 340757	2018-08-27 17:20:41 +00:00
Craig Topper	fff90377fd	[X86] Add support for matching paddus patterns where one of the vectors is a constant. InstCombine mucks these up a bit. So we need to do some additional pattern matching to fix it. There are a still a few special cases not handled, but this covers the general case. Differential Revision: https://reviews.llvm.org/D50952 llvm-svn: 340756	2018-08-27 17:20:38 +00:00
Craig Topper	73ed2a2a6b	[X86] Cleanup the LowerMULH code by hoisting some commonalities between the vXi32 and vXi8 handling. NFCI vXi32 support was recently moved from LowerMUL_LOHI to LowerMULH. This commit shares the getOperand calls, switches both to use common IsSigned flag, and hoists the NumElems/NumElts variable. llvm-svn: 340720	2018-08-27 06:35:02 +00:00
Craig Topper	a72012c206	[X86] Correct the cost of (v4i32 (fptoui (v4f64))) under AVX512F. Summary: This was inheriting the cost from the AVX table, but should be legal under AVX512. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51267 llvm-svn: 340708	2018-08-26 18:47:44 +00:00
Craig Topper	128915f4ae	[X86] Add FeatureCMOV explicitly to all CPUs that support it. Remove FeatureCMOV implication from Feature64Bit and FeatureSSE1 Summary: Previously most CPUs inherited cmov support through Feature64Bit(or FeatureCMPXCHG16HB implying Feature64Bit) or FeatureSSE1. This has the surprising side effect that -mattr=-cmov causes an assert to fire in 64-bit mode because it clears the Feature64Bit. Or in 32-bit mode, -mattr=-cmov disables any sse/avx features which seems surprising. This patch removes the implication and instead updates hasCMOV in X86Subtarget to check SSE1 or is64Bit in addition to the regular cmov flag. This should keep most things working the way they did before. I don't believe there is a way to specific "-cmov" directly from clang so this should only effect our lower level tools. This does stop -mattr=cx16(cmpxchg16b) from implying cmov is enabled via the 64bit flag as you can see from one of the changed tests. But that was a 32-bit test so I don't know why it enabled cx16 anyway. For the other test I had to add -sse to override the new sse check in hasCMOV. Reviewers: RKSimon, DavidKreitzer, spatel Reviewed By: RKSimon Subscribers: llvm-commits, jfb Differential Revision: https://reviews.llvm.org/D51228 llvm-svn: 340707	2018-08-26 18:29:33 +00:00
Craig Topper	b68a78b9ac	[X86] Add FeatureCMOV to athlon and athlon-tbird cpus. Summary: This matches gcc and one cpuid dump I found online. Given that these are considered 7th generation x86 CPU it seems likely they support cmov since cmov was added by Intel in their 6th generation. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51264 llvm-svn: 340706	2018-08-26 18:29:27 +00:00
Sanjay Patel	113cac3b15	[SelectionDAG][x86] turn insertelement into undef with variable index into splat I noticed this along with the patterns in D51125, but when the index is variable, we don't convert insertelement into a build_vector. For x86, that means these get expanded at legalization time into the loading/spilling code that we see in the tests. I think it's always better to avoid going to memory on these, and we get the optimal 'broadcast' if it's available. I suspect other targets may want to look at enabling the hook. AArch64 and AMDGPU have regression tests that would be affected (although I did not check what would happen in those cases). In the most basic cases shown here, AArch64 would probably do much better with a splat. Differential Revision: https://reviews.llvm.org/D51186 llvm-svn: 340705	2018-08-26 18:20:41 +00:00
Craig Topper	4240ecb909	[X86] Fix typo in comment, expect->except. NFC llvm-svn: 340695	2018-08-26 03:43:23 +00:00
Craig Topper	ebec2793d1	[X86] Replace support for vXi32 SMUL_LOHI/UMUL_LOHI with MULHS/MULHU support instead. Summary: The only time vector SMUL_LOHI/UMUL_LOHI nodes are created is during division/remainder lowering. If its created before op legalization, generic DAGCombine immediately turns that SMUL_LOHI/UMUL_LOHI into a MULHS/MULHU since only the upper half is used. That node will stick around through vector op legalization and will be turned back into UMUL_LOHI/SMUL_LOHI during op legalization. It will then be custom lowered by the X86 backend. Due to this two step lowering the vector shuffles created by the custom lowering get legalized after their inputs rather than before. This prevents the shuffles from being combined with any build_vector of constants. This patch uses changes vXi32 to use MULHS/MULHU instead. This is what the later DAG combine did anyway. But by skipping the change back to UMUL_LOHI/SMUL_LOHI we lower it before any constant BUILD_VECTORS. This allows the vector_shuffle creation to constant fold with the build_vectors. This accounts for the test changes here. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D51254 llvm-svn: 340690	2018-08-25 18:01:24 +00:00
Craig Topper	a11a3b3818	[SelectionDAG][X86] Reorder the operands the MaskedStoreSDNode to put the value first. Summary: Previously the value being stored is the last operand in SDNode. This causes the type legalizer to visit the mask operand before the value operand. The type legalizer was more complicated because of this since we want the type of the value to drive the decisions. This patch moves the value to be the first operand so we visit it first during type legalization. It also simplifies the type legalization code accordingly. X86 is currently the only in tree target that uses this SDNode. Not sure if there are any users out of tree. Reviewers: RKSimon, delena, hfinkel, eli.friedman Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50402 llvm-svn: 340689	2018-08-25 17:48:17 +00:00
Craig Topper	bce8680605	[X86] Make sure type is a vector before calling VT.getVectorNumElements() in combineLoopMAddPattern Fixes PR38700. llvm-svn: 340688	2018-08-25 17:23:43 +00:00
Sanjay Patel	8a84c747d2	[x86] try harder to use broadcast to load a scalar into vector reg This is a preliminary step for a preliminary step for D50992. I noticed that x86 often misses chances to load a scalar directly into a vector register. So this patch is just allowing more of those cases to match a broadcast op in lowerBuildVectorAsBroadcast(). The old code comment said it doesn't make sense to use a broadcast when we're loading a single element and everything else is undef, but I think that's the best case in the improved tests in insert-loaded-scalar.ll. We avoid scalar-to-vector-register move and/or less efficient shuffling. Note that there are some existing types that were already producing a broadcast, but that happens semi-accidentally. Ie, it's not happening as part of lowerBuildVectorAsBroadcast(). The build vector gets expanded into load + shuffle, and then shuffle lowering produces the broadcast. Description of the other test diffs: 1. avx-basic.ll - replacing load+shufle is a win. 2. sse3-avx-addsub-2.ll - vmovddup vs. vbroadcastss is neutral 3. sse41.ll - don't care - we convert that intrinsic to generic IR now, so this test is deprecated 4. vector-shuffle-128-v8.ll / vector-shuffle-256-v16.ll - pshufb alternatives with an extra instruction are not obviously bad Differential Revision: https://reviews.llvm.org/D51125 llvm-svn: 340685	2018-08-25 14:56:05 +00:00
Craig Topper	4058e29e7d	[X86] Teach combineLoopMAddPattern to handle cases where there is no loop and the add has two multiply inputs Differential Revision: https://reviews.llvm.org/D50868 llvm-svn: 340631	2018-08-24 18:05:04 +00:00
Joel Galenson	c6f6c17c9b	Add missing override keyword (NFC) llvm-svn: 340615	2018-08-24 16:15:44 +00:00
Joel Galenson	d36fb48a27	Find PLT entries for x86, x86_64, and AArch64. This adds a new method to ELFObjectFileBase that returns the symbols and addresses of PLT entries. This design was suggested by pcc and eugenis in https://reviews.llvm.org/D49383. Differential Revision: https://reviews.llvm.org/D50203 llvm-svn: 340610	2018-08-24 15:21:56 +00:00
Sanjay Patel	40aa86751a	[x86] add debug option for and-immediate shrinking The commit that added this functionality: rL322957 may be causing/exposing a miscompile in PR38648: https://bugs.llvm.org/show_bug.cgi?id=38648 so allow enabling/disabling to make debugging easier. llvm-svn: 340540	2018-08-23 15:58:07 +00:00
Chandler Carruth	ae0cafece8	[x86/retpoline] Split the LLVM concept of retpolines into separate subtarget features for indirect calls and indirect branches. This is in preparation for enabling only the call retpolines when using speculative load hardening. I've continued to use subtarget features for now as they continue to seem the best fit given the lack of other retpoline like constructs so far. The LLVM side is pretty simple. I'd like to eventually get rid of the old feature, but not sure what backwards compatibility issues that will cause. This does remove the "implies" from requesting an external thunk. This always seemed somewhat questionable and is now clearly not desirable -- you specify a thunk the same way no matter which set of things are getting retpolines. I really want to keep this nicely isolated from end users and just an LLVM implementation detail, so I've moved the `-mretpoline` flag in Clang to no longer rely on a specific subtarget feature by that name and instead to be directly handled. In some ways this is simpler, but in order to preserve existing behavior I've had to add some fallback code so that users who relied on merely passing -mretpoline-external-thunk continue to get the same behavior. We should eventually remove this I suspect (we have never tested that it works!) but I've not done that in this patch. Differential Revision: https://reviews.llvm.org/D51150 llvm-svn: 340515	2018-08-23 06:06:38 +00:00
Craig Topper	cf9df99d79	[X86] Teach combineLoopSADPattern to handle cases where there is no loop and the add has two absolute difference inputs Previously we asumed a vector reduction add is part of a loop and one of the input is a phi. But the code in SelectionDAGBuilder that sets vector reduction flag handles more cases than that. It just requires that the use chain ends in a horizontal reduction. And there are no other uses. This means it can handle unrolled reduction loops. If the initial value of the reduction was 0, an unrolled loop would begin with a vector reduction add that has two sad inputs. Previously we would only transform one side of the add, but for this case we need to transform both sides. I've created a lambda to reuse some of the code for both sides. And fixed the variables names to remove reference to "phi". Differential Revision: https://reviews.llvm.org/D50817 llvm-svn: 340478	2018-08-22 23:19:01 +00:00
Craig Topper	538f8ab438	[X86] Replace (32/64 - n) shift amounts with (neg n) since the shift amount is masked in hardware Inspired by what AArch64 does for shifts, this patch attempts to replace shift amounts with neg if we can. This is done directly as part of isel so its as late as possible to avoid breaking some BZHI patterns since those patterns need an unmasked (32-n) to be correct. To avoid manual load folding and custom instruction selection for the negate. I've inserted new nodes in the DAG above the shift node in topological order. Differential Revision: https://reviews.llvm.org/D48789 llvm-svn: 340441	2018-08-22 19:39:09 +00:00
Craig Topper	87f78cfe15	[X86] In OptimizeLEAs pass, check that the key is in the LEAs map before accessing When the key is not already in the map, the access operator[] creates an empty value and grows the map. Resizing a map is very slow, so this needs to be avoided. Found with csmith + asserts. May help with https://bugs.llvm.org/show_bug.cgi?id=25843 Patch by Tom Rix. Differential Revision: https://reviews.llvm.org/D50780 llvm-svn: 340434	2018-08-22 18:24:13 +00:00
Simon Pilgrim	ffdfe45645	[X86][SSE] LowerMULH vXi8 - use SSE shifts directly. We know these vXi16 extended cases are legal constant splat shifts. llvm-svn: 340414	2018-08-22 15:37:11 +00:00
David Green	9dd1d451d9	[AArch64] Add Tiny Code Model for AArch64 This adds the plumbing for the Tiny code model for the AArch64 backend. This, instead of loading addresses through the normal ADRP;ADD pair used in the Small model, uses a single ADR. The 21 bit range of an ADR means that the code and its statically defined symbols need to be within 1MB of each other. This makes it mostly interesting for embedded applications where we want to fit as much as we can in as small a space as possible. Differential Revision: https://reviews.llvm.org/D49673 llvm-svn: 340397	2018-08-22 11:31:39 +00:00
Heejin Ahn	ed5e06b0a7	[WebAssembly] Add isEHScopeReturn instruction property Summary: So far, `isReturn` property is used to mean both a return instruction from a functon and the end of an EH scope, a scope that starts with a EH scope entry BB and ends with a catchret or a cleanupret instruction. Because WinEH uses funclets, all EH-scope-ending instructions are also real return instruction from a function. But for wasm, they only serve as the end marker of an EH scope but not a return instruction that exits a function. This mismatch caused incorrect prolog and epilog generation in wasm EH scopes. This patch fixes this. This patch is in the same vein with rL333045, which splits `MachineBasicBlock::isEHFuncletEntry` into `isEHFuncletEntry` and `isEHScopeEntry`. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D50653 llvm-svn: 340325	2018-08-21 19:44:11 +00:00
Simon Pilgrim	50eba6b380	[X86][SSE] Lower vXi8 general shifts to SSE shifts directly. NFCI. Most of these shifts are extended to vXi16 so we don't gain anything from forcing another round of generic shift lowering - we know these extended cases are legal constant splat shifts. llvm-svn: 340307	2018-08-21 17:27:03 +00:00
Simon Pilgrim	98eb4ae499	[X86][SSE] Lower v8i16 general shifts to SSE shifts directly. NFCI. We don't gain anything from forcing another round of generic shift lowering - we know these are legal constant splat shifts. llvm-svn: 340302	2018-08-21 17:05:07 +00:00
Simon Pilgrim	dbe4e9e3ff	[X86][SSE] Lower directly to SSE shifts in the BLEND(SHIFT, SHIFT) combine. NFCI. We don't gain anything from forcing another round of generic shift lowering - we know these are legal constant splat shifts. llvm-svn: 340300	2018-08-21 16:46:48 +00:00
Simon Pilgrim	5a83a1fd13	[X86][SSE] Add helper function to convert to/between the SSE vector shift opcodes. NFCI. Also remove some more getOpcode calls from LowerShift when we already have Opc. llvm-svn: 340290	2018-08-21 15:57:33 +00:00
Craig Topper	210ccfe3db	[X86] Prevent lowerVectorShuffleByMerging128BitLanes from creating cycles Due to some splat handling code in getVectorShuffle, its possible for NewV1/NewV2 to have their mask modified from what is requested. This can lead to cycles being created in the DAG. This patch examines the returned mask and makes sure its different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle. Fixes PR38639 Differential Revision: https://reviews.llvm.org/D50981 llvm-svn: 340214	2018-08-20 21:08:35 +00:00
Craig Topper	7dcb2c4b0a	[X86] Teach combineTruncatedArithmetic to handle some cases of ISD::SUB We can safely avoid interfering with the subus combine if both inputs are freely truncatable. Either both extends, or an extend and a constant vector. Differential Revision: https://reviews.llvm.org/D50878 llvm-svn: 340212	2018-08-20 20:57:35 +00:00
Craig Topper	803912ea57	[X86] Fix an issue in the matching for ADDUS. We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturing add. This rewrites the canonicalization to just be based on the condition code. llvm-svn: 340134	2018-08-19 04:26:31 +00:00
Craig Topper	2b03df9b05	[X86] Use SDValue::operator== instead of DAG.isEqualTo in strictly integer matching. isEqualTo is more useful for floating point. operator== is sufficient for integer. llvm-svn: 340130	2018-08-18 19:16:56 +00:00
Craig Topper	3e299d896f	[X86] Simplify the PADDUS legality check in combineSelect to match PSUBUS. NFC While there remove some trailing whitespace. llvm-svn: 340129	2018-08-18 18:51:04 +00:00
Craig Topper	40c9559b74	[X86] Add support for using 512-bit PSUBUS to combineSelect. The code already support 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements. While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular. llvm-svn: 340128	2018-08-18 18:51:03 +00:00

1 2 3 4 5 ...

17652 Commits