llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	c362f42b6a	[X86][Znver1] Remove InstRWs for BLENDVPS/PD Summary: This removes the InstRWs for BLENDVPS/PD in favor of WriteFVarBlend. The latency listed was 3 cycles but WriteFVarBlend is defined as 1 cycle latency. The 1 cycle latency matches Agner Fog's data. The patterns were missing the VEX forms which is why there are no test changes. We don't test "-mcpu=znver1 -mattr=-avx" Reviewers: RKSimon, GGanesh Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44841 llvm-svn: 329538	2018-04-08 17:53:15 +00:00
Simon Pilgrim	bf2df1e26c	[X86] Regenerate and + immediate mask tests Added i686 checks llvm-svn: 329529	2018-04-08 12:31:52 +00:00
Simon Pilgrim	44374cf7b0	[X86][PKU] Regenerate rdpkru/wrpkru intrinsic tests Added i686 checks llvm-svn: 329528	2018-04-08 12:30:30 +00:00
Simon Pilgrim	14df0ae8d2	[X86][SSE3] Regenerate mwait/monitor intrinsic tests Added i686 checks llvm-svn: 329527	2018-04-08 12:29:11 +00:00
Zvi Rackover	7a53f169f1	DAGCombiner: Combine SDIV with non-splat vector pow2 divisor Summary: Extend existing SDIV combine for pow2 constant divider to handle non-splat vectors of pow2 constants. Reviewers: RKSimon, craig.topper, spatel, hfinkel, efriedma Reviewed By: RKSimon Subscribers: magabari, llvm-commits Differential Revision: https://reviews.llvm.org/D42479 llvm-svn: 329525	2018-04-08 11:35:20 +00:00
Simon Pilgrim	86588fc809	[X86][Btver2] Add vector extract costs llvm-svn: 329524	2018-04-08 11:26:26 +00:00
Guozhi Wei	0eb86c8efc	[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst)) In our real world application, we found the following optimization is missed in DAGCombiner (zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst)) If the user of the original zext is an add, it may enable further lea optimization on x86. This patch adds a new function CombineZExtLogicopShiftLoad to do this optimization. Differential Revision: https://reviews.llvm.org/D44402 llvm-svn: 329516	2018-04-07 23:36:10 +00:00
Simon Pilgrim	d6981b1d37	[X86] Regenerate atom pshufb test llvm-svn: 329511	2018-04-07 19:50:09 +00:00
Craig Topper	ef37aebc96	[X86] Combine vXi64 multiplies to MULDQ/MULUDQ during DAG combine instead of lowering. Previously we used a custom lowering for this because of the AVX1 splitting requirement. But we can do the split during DAG combine if we check the types and subtarget llvm-svn: 329510	2018-04-07 19:09:52 +00:00
Craig Topper	5b95eae1c3	[DAGCombiner] Add a combine to turn a build vector of zero extends of extract vector elts into a vector zero extend and possibly an extract subvector. llvm-svn: 329509	2018-04-07 19:09:50 +00:00
Tim Northover	e25e458d52	Reapply ARM: Do not spill CSR to stack on entry to noreturn functions Should fix UBSan bot by also checking there's no "uwtable" attribute before skipping. Otherwise the unwind table will be useless since its moves expect CSRs to actually be preserved. A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch mostly by myeisha (pmb). llvm-svn: 329494	2018-04-07 10:57:03 +00:00
Vitaly Buka	de5f196530	Revert "ARM: Do not spill CSR to stack on entry to noreturn functions" Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM This reverts commit r329287 llvm-svn: 329486	2018-04-07 05:36:44 +00:00
Matt Davis	13b8331054	[StackProtector] Ignore certain intrinsics when calculating sspstrong heuristic. Summary: The 'strong' StackProtector heuristic takes into consideration call instructions. Certain intrinsics, such as lifetime.start, can cause the StackProtector to protect functions that do not need to be protected. Specifically, a volatile variable, (not optimized away), but belonging to a stack allocation will encourage a llvm.lifetime.start to be inserted during compilation. Because that intrinsic is a 'call' the strong StackProtector will see that the alloca'd variable is being passed to a call instruction, and insert a stack protector. In this case the intrinsic isn't really lowered to a call. This can cause unnecessary stack checking, at the cost of additional (wasted) CPU cycles. In the future we should rely on TargetTransformInfo::isLoweredToCall, but as of now that routine considers all intrinsics as not being lowerable. That needs to be corrected, and such a change is on my list of things to get moving on. As a side note, the updated stack-protector-dbginfo.ll test always seems to pass. I never see the dbg.declare/dbg.value reaching the StackProtector::HasAddressTaken, but I don't see any code excluding dbg intrinsic calls either, so I think it's the safest thing to do. Reviewers: void, timshen Reviewed By: timshen Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45331 llvm-svn: 329450	2018-04-06 20:14:13 +00:00
Craig Topper	f0d042619b	[X86] Attempt to model basic arithmetic instructions in the Haswell/Broadwell/Skylake scheduler models without InstRWs Summary: This patch removes InstRW overrides for basic arithmetic/logic instructions. To do this I've added the store address port to RMW. And used a WriteSequence to make the latency additive. It does not cover ADC/SBB because they have different latency. Apparently we were inconsistent about whether the store has latency or not thus the test changes. I've also left out Sandy Bridge because the load latency there is currently 4 cycles and should be 5. Reviewers: RKSimon, andreadb Reviewed By: andreadb Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D45351 llvm-svn: 329416	2018-04-06 16:16:48 +00:00
Simon Pilgrim	09eeb3a8b9	[X86][SandyBridge] Add (V)DPPS memory fold latencies Noticed this during D44654 llvm-svn: 329389	2018-04-06 11:25:21 +00:00
Simon Pilgrim	8a83f16ccd	[X86][SandyBridge] SBWriteResPair +5cy Memory Folds As mentioned on D44647, this patch increases the default memory latency to +5cy , which more closely matches what most custom cases are doing for reg-mem instructions. I've bumped LoadLatency, ReadAfterLd and WriteLoad values to 5cy to be consistent. As Sandy Bridge is currently our default generic model, this affects a lot of scheduling tests... Differential Revision: https://reviews.llvm.org/D44654 llvm-svn: 329388	2018-04-06 11:00:51 +00:00
Zvi Rackover	78a065ff16	X86 Tests: Add a case for combining sdiv by a splatted pow2 negative. NFC. Noticed test was missing while working on D42479. llvm-svn: 329356	2018-04-05 21:57:20 +00:00
Craig Topper	fbe3132f67	[X86] Separate CDQ and CDQE in the scheduler model. According to Agner's data, CDQE is closer to CWDE. llvm-svn: 329354	2018-04-05 21:56:19 +00:00
Craig Topper	4cc3827791	[X86] Add MOVZPQILo2PQIrr to the Sandy Bridge scheduler model llvm-svn: 329351	2018-04-05 21:40:32 +00:00
Craig Topper	3b0b96c591	[X86] Add LEAVE instruction to the scheduler models using the same data as LEAVE64. Make LEAVE/LEAVE64 more correct on Sandy Bridge. This is the 32-bit mode version of LEAVE64. It should be at least somewhat similar to LEAVE64. The Sandy Bridge version was missing a load port use. llvm-svn: 329347	2018-04-05 21:16:26 +00:00
Simon Pilgrim	9b41cac3e9	[X86][SSE] Add floating point add/mul fast-math vector.reduce tests Strict versions aren't working at all (PR36732) and the accumulators aren't supported (PR36734) llvm-svn: 329344	2018-04-05 21:01:21 +00:00
Simon Pilgrim	806252fab0	[X86][SSE] Add floating point min/max vector.reduce tests llvm-svn: 329343	2018-04-05 20:54:55 +00:00
Craig Topper	c6bb36a3d0	[X86] Remove some InstRWs for plain store instructions on Sandy Bridge. We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion. llvm-svn: 329339	2018-04-05 20:04:06 +00:00
Craig Topper	9eec2025c5	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329330	2018-04-05 18:38:45 +00:00
Simon Pilgrim	7f6f43fa3e	[X86][SSE] Add integer add/mul vector.reduce tests llvm-svn: 329321	2018-04-05 17:37:35 +00:00
Simon Pilgrim	de5d0ffe47	[X86][SSE] Add integer and/or/xor vector.reduce tests llvm-svn: 329320	2018-04-05 17:29:51 +00:00
Simon Pilgrim	57d324082c	[X86][SSE] Add integer min/max vector.reduce tests llvm-svn: 329319	2018-04-05 17:25:40 +00:00
Tim Northover	b30388bf11	ARM: Do not spill CSR to stack on entry to noreturn functions A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch by myeisha (pmb). llvm-svn: 329287	2018-04-05 14:26:06 +00:00
Sam Parker	0e7deb8104	[DAGCombine] Revert r329160 Again, broke the big endian stage 2 builders. llvm-svn: 329283	2018-04-05 13:46:17 +00:00
Craig Topper	15303dda0d	[X86] Revert r329251-329254 It's failing on the bots and I'm not sure why. This reverts: [X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. [X86] Use WriteFShuffle256 for VEXTRACTF128 to be consistent with VEXTRACTI128 which uses WriteShuffle256. [X86] Remove some InstRWs for plain store instructions on Sandy Bridge. [X86] Auto-generate complete checks. NFC llvm-svn: 329256	2018-04-05 05:19:36 +00:00
Craig Topper	25c7110a37	[X86] Synchronize the SchedRW on some EVEX instructions with their VEX equivalents. Mostly vector load, store, and move instructions. llvm-svn: 329254	2018-04-05 04:42:03 +00:00
Craig Topper	6c4e08c835	[X86] Remove some InstRWs for plain store instructions on Sandy Bridge. We were forcing the latency of these instructions to 5 cycles, but every other scheduler model had them as 1 cycle. I'm sure I didn't get everything, but this gets a big portion. llvm-svn: 329252	2018-04-05 04:42:01 +00:00
Craig Topper	5c36557426	[X86] Auto-generate complete checks. NFC llvm-svn: 329251	2018-04-05 04:41:59 +00:00
Craig Topper	498875fab0	[X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models. The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existent BSWAP16r. llvm-svn: 329211	2018-04-04 17:54:19 +00:00
Sam Parker	7ec722d603	[DAGCombine] Improve ReduceLoadWidth for SRL Recommitting rL321259. Previously this caused an issue with PPCBE but I didn't receive a reproducer and didn't have the time to follow up. If the issue appears again, please provide a reproducer so I can fix it. Original commit message: If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 329160	2018-04-04 09:26:56 +00:00
Vlad Tsyrklevich	e3446017ed	Add the ShadowCallStack pass Summary: The ShadowCallStack pass instruments functions marked with the shadowcallstack attribute. The instrumented prolog saves the return address to [gs:offset] where offset is stored and updated in [gs:0]. The instrumented epilog loads/updates the return address from [gs:0] and checks that it matches the return address on the stack before returning. Reviewers: pcc, vitalybuka Reviewed By: pcc Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc Differential Revision: https://reviews.llvm.org/D44802 llvm-svn: 329139	2018-04-04 01:21:16 +00:00
Jessica Paquette	5fa2a63785	[MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone This commit is similar to r329120, but uses the existing getUsesRedZone() function in X86MachineFunctionInfo. This teaches the outliner to look at whether or not a function truly uses a redzone instead of just the noredzone attribute on a function. Thus, after this commit, it's possible to outline from x86 without using -mno-red-zone and still get outlining results. This also adds a new test for the new redzone behaviour. llvm-svn: 329134	2018-04-03 23:32:41 +00:00
Jessica Paquette	d506bf8e3d	[MachineOutliner][NFC] Make outlined functions have internal linkage The linkage type on outlined functions was private before. This meant that if you set a breakpoint in an outlined function, the debugger wouldn't be able to give a sane name to the outlined function. This commit changes the linkage type to internal and updates any tests that relied on the prefixes on the names of outlined functions. llvm-svn: 329116	2018-04-03 21:36:00 +00:00
Sanjay Patel	223ef402c9	[x86] add tests for convert-FP-to-integer with constants; NFC We don't constant fold any of these, but we could...but if we do, we must produce the right answer. Unlike the IR fptosi instruction or its DAG node counterpart ISD::FP_TO_SINT, these are not undef for an out-of-range input. llvm-svn: 329100	2018-04-03 18:34:56 +00:00
Chandler Carruth	ff2f4fcd51	[x86] Fix a pretty obvious think-o with my asm scrubbing. You have to in fact use regular expression syntax to use regular expressions. Should restore the bots. Sorry for the noise on this test. Thanks to Philip for spotting the bug! llvm-svn: 329057	2018-04-03 10:28:56 +00:00
Chandler Carruth	44a791a57a	[x86] Clean up and enhance a test around eflags copying. This adds the basic test cases from all the EFLAGS bugs in more direct forms. It also switches to generated check lines, and includes both 32-bit and 64-bit variations. No functionality changing here, just setting things up to have a nice clean asm diff in my EFLAGS patch. llvm-svn: 329056	2018-04-03 10:04:37 +00:00
Chandler Carruth	6646becd0c	[x86] Extend my goofy SP offset scrubbing for llc test cases to actually do explicit scrubbing of the offsets of stack spills and reloads. You can always turn this off in order to test specific stack slot usage. We were already hiding most of this, but the new logic hides it more generically. Notably, we should effectively hide stack slot churn in functions that have a frame pointer now, and should also hide it when changing a function from stack pointer to frame pointer. That transition already changes enough to be clearly noticed in the test case diff, showing every spill and reload is really noisy without benefit. See the test case I ran this on as a classic example. llvm-svn: 329055	2018-04-03 09:57:05 +00:00
Chandler Carruth	72eb30f7b3	[x86] Tidy up test case, generate check lines with script. NFC. Just adds basic block labels and tidies up where comments go in the test case and then generates fresh CHECK lines with the script. This way, the check lines are much easier to maintain. They were already close to this but not quite there. llvm-svn: 329040	2018-04-03 02:19:05 +00:00
Rafael Espindola	8c58750cc4	Align stubs for external and common global variables to pointer size. This patch fixes PR36885: clang++ generates unaligned stub symbol holding a pointer. Patch by Rahul Chaudhry! llvm-svn: 329030	2018-04-02 23:20:30 +00:00
Lama Saba	927468309f	[X86] Reduce Store Forward Block issues in HW - Recommit after fixing Bug 36346 If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load. This "store forwarding" saves cycles by enabling the load to directly obtain the data instead of accessing the data from cache or memory. A "store forward block" occurs in cases where a store cannot be forwarded to the load. The most typical case of store forward block on Intel Core microarchitecture is that a small store cannot be forwarded to a large load. The estimated penalty for a store forward block is ~13 cycles. This pass tries to recognize and handle cases where a "store forward block" is created by the compiler when lowering memcpy calls to a sequence of a load and a store. The pass currently only handles cases where memcpy is lowered to XMM/YMM registers; it tries to break the memcpy into smaller copies. Breaking the memcpy should be possible since there is no atomicity guarantee for loads and stores to XMM/YMM. Differential revision: https://reviews.llvm.org/D41330 Change-Id: Ib48836ccdf6005989f7d4466fa2035b7b04415d9 llvm-svn: 328973	2018-04-02 13:48:28 +00:00
Craig Topper	96729cd64b	[X86][Silvermont] Use correct latency and throughput information for divide and square root in the scheduler model. Data taken from Table 16-17 in the Intel Optimization Manual. llvm-svn: 328962	2018-04-02 06:34:16 +00:00
Craig Topper	6a814904da	[X86][SkylakeServer] Correct throughput for 512-bit sqrt and divide. Data taken from the AVX512_SKX_PortAssign spreadsheet at http://instlatx64.atw.hu/ llvm-svn: 328961	2018-04-02 05:54:34 +00:00
Craig Topper	8104f266a4	[X86] Correct the throughput for divide instructions in Sandy Bridge/Haswell/Broadwell/Skylake scheduler models. Fixes most of PR36898. Still need to fix the 512-bit instructions, but Agner's tables don't have those. llvm-svn: 328960	2018-04-02 05:33:28 +00:00
Craig Topper	dc74094398	[X86] Fix the SchedRW for AVX512 shift instructions. It was being inadvertently defaulted to an FADD scheduler class. llvm-svn: 328959	2018-04-02 03:15:02 +00:00
Craig Topper	caec723a1a	[X86] Add an itinerary to BTR64rr. llvm-svn: 328956	2018-04-02 01:12:34 +00:00
Craig Topper	db6caabccc	[X86] Check if the load and store are to the same pointer before preventing i16 RMW shifts and subtracts from being promoted. llvm-svn: 328930	2018-04-01 06:29:28 +00:00
Craig Topper	3998041e80	[X86] Add test case to show failure to promote i16 subtract when the LHS is a load and the result is stored to a different address. We mistakenly believe we might be able to fold this as a RMW operation, but that doesn't end up happening. llvm-svn: 328929	2018-04-01 06:29:27 +00:00
Craig Topper	ae2de57db0	[X86] Allow i16 subtracts to be promoted if the load is on the LHS and it's not being stored. llvm-svn: 328928	2018-04-01 06:29:25 +00:00
Craig Topper	280f631350	[X86] Add test case to show failure to promote i16 subtract because we mistakenly believe the load can be folded. NFC The left hand side of the subtract is a load, but we can't fold those unless we also have a store. llvm-svn: 328927	2018-04-01 06:29:23 +00:00
Sanjay Patel	6124cae8f7	[DAGCombine] (float)((int) f) --> ftrunc (PR36617) fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 328921	2018-03-31 17:55:44 +00:00
Simon Pilgrim	3b8ad346f9	[X86][Btver2] Add MMX_PSHUFB to the JWritePSHUFB InstRW entries llvm-svn: 328918	2018-03-31 09:15:54 +00:00
Craig Topper	13a0f83a05	[X86] Add SchedRW for PMULLD Summary: It seems many CPUs don't implement this instruction as well as the other vector multiplies. Often using a multi uop flow. Silvermont in particular has a 7 uop flow with 11 cycle throughput. Sandy Bridge implements it as a single uop with 5 cycle latency and 1 cycle throughput. But Haswell and later use 2 uops with 10 cycle latency and 2 cycle throughput. This patch adds a new X86SchedWritePair we can use to tag this instruction separately. I've provided correct information for Silvermont, Btver2, and Sandy Bridge. I've removed the InstRWs for SandyBridge. I've left Haswell/Broadwell/Skylake InstRWs in place because I wasn't sure how to account for the different load latency between 128 and 256 bits. I also left Znver1 InstRWs in place because the existing values don't match Agner's spreadsheet. I also left a FIXME in the SandyBridge model because it being used for the "generic" model is too optimistic for the 256/512-bit versions since those are multiple uops on all known CPUs. Reviewers: RKSimon, GGanesh, courbet Reviewed By: RKSimon Subscribers: gchatelet, gbedwell, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D44972 llvm-svn: 328914	2018-03-31 04:54:32 +00:00
Sanjay Patel	e09b7dcf3d	[SelectionDAG] Removing FABS folding from DAGCombiner The code has bugs dealing with -0.0. Since D44550 introduced FABS pattern folding in InstCombine, this patch removes the now-redundant code that causes https://bugs.llvm.org/show_bug.cgi?id=36600. Patch by Mikhail Dvoretckii! Differential Revision: https://reviews.llvm.org/D44683 llvm-svn: 328872	2018-03-30 15:42:52 +00:00
Craig Topper	89310f56c8	[X86] Correct the placement of ReadAfterLd in BEXTR and BZHI. Add dedicated SchedRW for BEXTR/BZHI. These instructions have the memory operand before the register operand. So we need to put ReadDefault for all the load ops first. Then the ReadAfterLd Differential Revision: https://reviews.llvm.org/D44838 llvm-svn: 328823	2018-03-29 20:41:39 +00:00
Jun Bum Lim	f90fe701ef	[PostRAMachineSink] preserve CFG Summary: Mark the CFG as preserved since this pass does not make any changes to the CFG. Reviewers: sebpop, mzolotukhin, mcrosier Reviewed By: mzolotukhin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44845 llvm-svn: 328727	2018-03-28 19:56:26 +00:00
Simon Pilgrim	7237e0cf39	[X86][AVX2] Add shuffle test case from PR36933 llvm-svn: 328714	2018-03-28 16:48:48 +00:00
Paul Robinson	7cb26ad2ef	[DWARF] Suppress split line tables more carefully. If a given split type unit does not have source locations, don't have it refer to the split line table. If no split type unit refers to the split line table, don't emit the line table at all. This will save a little space on rare occasions, but also refactors things a bit to improve which class is responsible for what. Responding to review comments on r326395. Differential Revision: https://reviews.llvm.org/D44220 llvm-svn: 328670	2018-03-27 21:28:59 +00:00
Simon Pilgrim	a2f26788a3	[X86] Add WriteFMOVMSK/WriteVecMOVMSK/WriteMMXMOVMSK scheduler classes Currently MOVMSK instructions use the WriteVecLogic class, which is a very poor choice given that MOVMSK involves a SSE->GPR transfer. Differential Revision: https://reviews.llvm.org/D44924 llvm-svn: 328664	2018-03-27 20:38:54 +00:00
Krzysztof Parzyszek	52396bb9c5	Use .set instead of = when printing assignment in assembly output On Hexagon "x = y" is a syntax used in most instructions, and is not treated as a directive. Differential Revision: https://reviews.llvm.org/D44256 llvm-svn: 328635	2018-03-27 16:44:41 +00:00
Simon Pilgrim	5f7ab4fedf	[X86][Btver2] Add MMX_PMOVMSKBrr to MOVMSK scheduler class llvm-svn: 328620	2018-03-27 12:26:12 +00:00
Sanjay Patel	15f7df9f44	[x86] add RUN for target before roundss; NFC llvm-svn: 328601	2018-03-27 00:32:19 +00:00
Sanjay Patel	8653776367	[x86] add tests for ftrunc; NFC llvm-svn: 328592	2018-03-26 23:18:32 +00:00
Simon Pilgrim	f6440b6fb1	Fix newlines. NFCI. llvm-svn: 328583	2018-03-26 21:07:59 +00:00
Simon Pilgrim	28e7bcbba6	[X86] Add WriteCRC32 scheduler class Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582	2018-03-26 21:06:14 +00:00
Rafael Espindola	78fdca3cd5	Use local symbols for creating .stack-size. llvm-svn: 328581	2018-03-26 20:40:22 +00:00
Reid Kleckner	41fb2dba9c	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 Summary: Re-lands r328386 and r328443, reverting r328482. Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc). I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter. I also tested this by self-hosting clang and it passed tests on win64. Reviewers: mstorsjo, hans Subscribers: hiraditya, mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44900 llvm-svn: 328570	2018-03-26 18:49:48 +00:00
Simon Pilgrim	f33d905293	[X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566	2018-03-26 18:19:28 +00:00
Simon Pilgrim	86ea53123d	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551	2018-03-26 17:02:02 +00:00
Simon Pilgrim	8815105cd5	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs llvm-svn: 328541	2018-03-26 16:24:13 +00:00
Simon Pilgrim	0b73b29388	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505	2018-03-26 15:30:47 +00:00
Simon Pilgrim	3aa9344605	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501	2018-03-26 14:44:24 +00:00
Simon Pilgrim	67df1cf597	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497	2018-03-26 14:03:40 +00:00
Simon Pilgrim	caa203aed5	[X86][Btver2] Double the AGU and schedule pipe resources for YMM Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491	2018-03-26 13:15:20 +00:00
Hans Wennborg	311b63f13b	Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32" This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up patch at D44876 fixes this, but let's revert back to green for now until that's ready to land. (Also reverts r328443.) > Both GCC and MSVC only look at the low byte of a boolean when it is > passed. llvm-svn: 328482	2018-03-26 10:07:51 +00:00
Craig Topper	6f28d3c954	[X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT. llvm-svn: 328474	2018-03-26 05:05:12 +00:00
Craig Topper	cdfcf8ecda	[X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models. I've used Agner's data as best I could to get the values to converge on. llvm-svn: 328473	2018-03-26 05:05:10 +00:00
Craig Topper	fbf2d850e3	[X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions. llvm-svn: 328472	2018-03-26 04:20:36 +00:00
Craig Topper	659f85af14	[X86] Swap the itineraries on the memory and register forms of CVTDQ2PD. They were backwards. llvm-svn: 328469	2018-03-26 02:17:13 +00:00
Craig Topper	15fef89ad9	[X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 for Skylake. This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX. llvm-svn: 328465	2018-03-25 23:40:56 +00:00
Simon Pilgrim	913345f8f5	[X86][AES] Ensure we're testing both non-VEX/VEX variants of AES instructions on AVX targets Add skylake server tests as well llvm-svn: 328424	2018-03-24 15:05:12 +00:00
Simon Pilgrim	91fe24b8cf	[X86][SSE] Ensure we're testing both non-VEX/VEX variants of SSE instructions on AVX targets And ensure we don't use later instruction sets in SSE schedule tests llvm-svn: 328423	2018-03-24 14:51:52 +00:00
Simon Pilgrim	f7d0f7e6db	[X86][AVX1] Ensure we don't use later instruction sets in AVX1 schedule tests llvm-svn: 328421	2018-03-24 13:47:48 +00:00
Simon Pilgrim	d2016f95fb	[X86][AVX2] Ensure we don't use later instruction sets in AVX2 schedule tests llvm-svn: 328420	2018-03-24 13:47:01 +00:00
Craig Topper	2c0a62ab9a	[X86] Add a DAG combine to simplify PMULDQ/PMULUDQ nodes These nodes only use the lower 32 bits of their inputs so we can use SimplifyDemandedBits to simplify them. Differential Revision: https://reviews.llvm.org/D44375 llvm-svn: 328405	2018-03-24 01:52:01 +00:00
Reid Kleckner	e27b410661	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 Both GCC and MSVC only look at the low byte of a boolean when it is passed. llvm-svn: 328386	2018-03-23 23:38:53 +00:00
Simon Pilgrim	e5c0a041ff	[X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338	2018-03-23 17:38:59 +00:00
Simon Pilgrim	8619962c73	[X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318	2018-03-23 14:27:26 +00:00
Simon Pilgrim	2755893834	[X86][SandyBridge] Fix missing comma that was causing string concatenation of 2 instregex entries Found while updating D44687 llvm-svn: 328308	2018-03-23 11:56:38 +00:00
Martin Storsjo	db75aa96d3	Revert "[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))" This reverts commit r328252. This change broke building a number of projects when targeting ARM and AArch64, see PR36873. llvm-svn: 328297	2018-03-23 08:36:47 +00:00
Craig Topper	4787b7f434	[X86] Correct the latencies of SNB integer vector multiplies based on Agner's data. Add missing MMX multiplies. llvm-svn: 328295	2018-03-23 06:41:43 +00:00
Craig Topper	7580a7997d	[X86] Change VPSADBW itinerary to SSE_INTALU_ITINS_P to match the SSE version. llvm-svn: 328293	2018-03-23 06:41:40 +00:00
Craig Topper	7f142b8bf1	[X86] Merge VMOVMSKBrr and MOVMSKBrr in the SNB scheduler model. The VMOVMSKBrr was in a separate InstRW with a lower latency, but I assume they should be the same and the higher latency matches Agner's table so I'm going with that. llvm-svn: 328291	2018-03-23 06:41:38 +00:00
Craig Topper	fae4173b47	[X86] Add VEXTRB/W/D/Q to Zen scheduler model. The SSE versions were present, but not the VEX version. llvm-svn: 328290	2018-03-23 06:41:36 +00:00
Michael Zolotukhin	3520331f93	Reapply "[test] Add tests for llc passes pipelines." with a fix for bots with expensive checks on. llvm-svn: 328267	2018-03-22 23:02:48 +00:00
Craig Topper	adb173314d	[X86] Correct the VROUND regular expressions in Znver1 scheduler model to account for r328254 llvm-svn: 328260	2018-03-22 22:17:11 +00:00
Craig Topper	40d3b32e12	[X86] Rename VROUNDYPS* and VROUNDYPD* instructions to VROUNDPSY* and VROUNDPDY*. Fix itinerary mistake on all memory forms of VROUNDPD This makes the Y position consistent with other instructions. This should have been NFC, but while refactoring the multiclass I noticed that VROUNDPD memory forms were using the register itinerary. llvm-svn: 328254	2018-03-22 21:55:20 +00:00
Guozhi Wei	17ff975eb1	[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst)) In our real world application, we found the following optimization is missed in DAGCombiner (zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst)) If the user of the original zext is an add, it may enable further lea optimization on x86. This patch adds a new function CombineZExtLogicopShiftLoad to do this optimization. Differential Revision: https://reviews.llvm.org/D44402 llvm-svn: 328252	2018-03-22 21:47:25 +00:00
Craig Topper	58afb4ea58	[X86][SkylakeClient] Fix a bunch of instructions that were incorrectly assigned Port015 instead of Port01. The VEC ADD and VEC MUL units aren't present on port 5 on SkylakeClient. llvm-svn: 328241	2018-03-22 21:10:07 +00:00
Jun Bum Lim	2ecb7ba4c6	[CodeGen] Add a new pass for PostRA sink Summary: This pass sinks COPY instructions into a successor block, if the COPY is not used in the current block and the COPY is live-in to a single successor (i.e., doesn't require the COPY to be duplicated). This avoids executing the copies on paths where their results aren't needed. This also exposes additional opportunities for dead copy elimination and shrink wrapping. These copies were either not handled by or are inserted after the MachineSink pass. As an example of the former case, the MachineSink pass cannot sink COPY instructions with allocatable source registers; for AArch64 these types of copy instructions are frequently used to move function parameters (PhyReg) into virtual registers in the entry block. For the machine IR below, this pass will sink %w19 in the entry into its successor (%bb.1) because %w19 is only live-in in %bb.1. ``` %bb.0: %wzr = SUBSWri %w1, 1 %w19 = COPY %w0 Bcc 11, %bb.2 %bb.1: Live Ins: %w19 BL @fun %w0 = ADDWrr %w0, %w19 RET %w0 %bb.2: %w0 = COPY %wzr RET %w0 ``` As we sink %w19 (CSR in AArch64) into %bb.1, the shrink-wrapping pass will be able to see %bb.0 as a candidate. With this change I observed 12% more shrink-wrapping candidates and 13% more dead copies deleted in spec2000/2006/2017 on AArch64. Reviewers: qcolombet, MatzeB, thegameg, mcrosier, gberry, hfinkel, john.brawn, twoh, RKSimon, sebpop, kparzysz Reviewed By: sebpop Subscribers: evandro, sebpop, sfertile, aemerson, mgorny, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D41463 llvm-svn: 328237	2018-03-22 20:06:47 +00:00
Nirav Dave	8c5f47ac40	[DAG, X86] Fix ISel-time node insertion ids As in SystemZ backend, correctly propagate node ids when inserting new unselected nodes into the DAG during instruction Selection for X86 target. Fixes PR36865. Reviewers: jyknight, craig.topper Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D44797 llvm-svn: 328233	2018-03-22 19:32:07 +00:00
Craig Topper	4a3be6e578	[X86] Correct the scheduling data for some of the 32 and 64 bit multiplies to as best as I understand how they are implemented. llvm-svn: 328231	2018-03-22 19:22:51 +00:00
Jonas Devlieghere	7e69dd02bb	Revert "[test] Add tests for llc passes pipelines." This reverts r328159 because the two AArch64 tests fail on GreenDragon: http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-expensive/11030/ llvm-svn: 328188	2018-03-22 10:34:06 +00:00
Michael Zolotukhin	7e6fa1d6ae	[test] Add tests for llc passes pipelines. This is basically an extension of existing test test/CodeGen/X86/O0-pipeline.ll introduced in r302608. llvm-svn: 328159	2018-03-21 22:17:13 +00:00
Craig Topper	137a4dd84d	[X86] Fix the SchedRW for XOP vpcom register form instructions to not be marked as loads. llvm-svn: 328071	2018-03-21 03:41:33 +00:00
Craig Topper	d25f1acf67	[X86] Change PMULLD to 10 cycles on Skylake per Agner's tables and llvm-exegesis. Also restrict to port 0 and 1 for SkylakeClient. It looks like the scheduler models don't account for client not having a full vector ALU on port 5 like server. Fixes PR36808. llvm-svn: 328061	2018-03-20 23:39:48 +00:00
Martin Storsjo	07589fc496	[X86] Don't use the MSVC stack protector names on mingw Mingw uses the same stack protector functions as GCC provides on other platforms as well. Patch by Valentin Churavy! Differential Revision: https://reviews.llvm.org/D27296 llvm-svn: 328039	2018-03-20 20:37:51 +00:00
Krzysztof Parzyszek	eb0c510ecd	[X86] Add phony registers for high halves of regs with low halves Registers E[A-D]X, E[SD]I, E[BS]P, and EIP have 16-bit subregisters that cover the low halves of these registers. This change adds artificial subregisters for the high halves in order to differentiate (in terms of register units) between the 32- and the low 16-bit registers. This patch contains parts that aim to preserve the calculated register pressure. This is in order to preserve the current codegen (minimize the impact of this patch). The approach of having artificial subregisters could be used to fix PR23423, but the pressure calculation would need to be changed. Differential Revision: https://reviews.llvm.org/D43353 llvm-svn: 328016	2018-03-20 18:46:55 +00:00
Michael Zolotukhin	fb3f509e01	[XRay] Lazily compute MachineLoopInfo instead of requiring it. Summary: Currently X-Ray Instrumentation pass has a dependency on MachineLoopInfo (and thus on MachineDominatorTree as well) and we have to compute them even if X-Ray is not used. This patch changes it to a lazy computation to save compile time by avoiding these redundant computations. Reviewers: dberris, kubamracek Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D44666 llvm-svn: 327999	2018-03-20 17:02:29 +00:00
Simon Pilgrim	62690e9d0e	[X86][Haswell][Znver1] Fix typo in fldl instregexs Missing comma was causing 2 instregex entries to be concatenated together by mistake. Found while investigating PR35548 llvm-svn: 327992	2018-03-20 15:44:47 +00:00
Martin Storsjo	802b434156	[X86] Properly implement the calling convention for f80 for mingw/x86_64 In these cases, both parameters and return values are passed as a pointer to a stack allocation. MSVC doesn't use the f80 data type at all, while it is used for long doubles on mingw. Normally, this part of the calling convention is handled within clang, but for intrinsics that are lowered to libcalls, it may need to be handled within llvm as well. Differential Revision: https://reviews.llvm.org/D44592 llvm-svn: 327957	2018-03-20 06:19:38 +00:00
Craig Topper	4778fa7e8a	[X86] Fix the SchedRW for memory forms of CMP and TEST. They were incorrectly marked as RMW operations. Some of the CMP instructions worked, but the ones that use a similar encoding as RMW form of ADD ended up marked as RMW. TEST used the same tablegen class as some of the CMPs. llvm-svn: 327947	2018-03-20 03:55:17 +00:00
Craig Topper	3e9462607e	[X86] Add TEST16mi/TEST32mi/TEST64mi32 to the Sandybridge/Haswell/Broadwell/Skylake scheduler models. Move it from a load+store group on SNB to a load only group, the same group as CMP. llvm-svn: 327944	2018-03-20 03:02:03 +00:00
Craig Topper	7c90e29cf8	[X86] Add ROR/ROL/SHR/SAR by 1 instructions to the Sandy Bridge scheduler model. I assume these match the generic immediate version like they do in the other models. llvm-svn: 327943	2018-03-20 03:01:59 +00:00
Quentin Colombet	508f68233d	[ShrinkWrap] Take into account landing pad When scanning the function for CSR uses and defs, also check if the basic blocks are landing pads. Consider that landing pads need the CSRs to be properly set. That way we force the prologue/epilogue to always be pushed out of the problematic "throw" region. The "throw" region is problematic because the jumps are not properly modeled. Fixes PR36513 llvm-svn: 327942	2018-03-20 02:44:40 +00:00
Craig Topper	2330d6cd55	[X86] Fix the SNB scheduler for BLENDVB. PBLENDVBrr0 was with the memory version of VBLENDVB and PBLENDVBrm0 was missing. llvm-svn: 327937	2018-03-20 01:30:21 +00:00
Nirav Dave	3264c1bdf6	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying node id invariant traversal and correcting typo. llvm-svn: 327898	2018-03-19 20:19:46 +00:00
Craig Topper	5e65996fac	[X86] Remove OUT32rr/OUT8rr/OUT32ri/OUT8ri from Sandybridge scheduler model. PR35590 was already filed for this information being wrong. It's probably better to default to WriteSystem behavior instead of using something completely wrong. llvm-svn: 327882	2018-03-19 19:00:35 +00:00
Craig Topper	b4c7873f8c	[X86] Add JCXZ/JECXZ to Sandybridge/Haswell/Broadwell/Skylake scheduler models. JRCXZ was already present, but not the others. We never codegen this instruction so this doesn't affect much just trying to get them all into a single generated scheduler class in the output. llvm-svn: 327881	2018-03-19 19:00:32 +00:00
Craig Topper	afabf36505	[X86] Correct regular expression in Zen scheduler model that was excluding JECXZ instruction. The regex was looking for JECXZ_32 or JECXZ_64, but there is just one instruction called JECXZ. They used to exist as separate instructions, but were merged over 3 years ago. llvm-svn: 327880	2018-03-19 19:00:29 +00:00
Craig Topper	259eaa6e7c	[X86] Remove sse41 specific code from lowering v16i8 multiply With the SRAs removed from the SSE2 code in D44267, then there doesn't appear to be any advantage to the sse41 code. The punpcklbw instruction and pmovsx seem to have the same latency and throughput on most CPUs. And the SSE41 code requires moving the upper 64-bits into the lower 64-bit before the sign extend can be done. The unpckhbw in sse2 code can do better than that. llvm-svn: 327869	2018-03-19 17:31:41 +00:00
Craig Topper	5ccd87233f	[X86] Make the multiply and divide itineraries more consistent. Sometimes we used the same itinerary for MEM and REG forms, but that seems inconsistent with our usual usage. We also used the MUL8 itinerary for MULX32/64 which was also weird. The test changes are because we were using IIC_IMUL32_RR and IIC_IMUL64_RR instead of IIC_IMUL32_REG/IIC_IMUL64_REG for the 32 and 64 bit multiplies that produce double width result. llvm-svn: 327866	2018-03-19 16:38:33 +00:00
Matt Davis	4b54e5fc38	[CodeGen] Avoid handling DBG_VALUE in the LivePhysRegs (addUses,removeDefs,stepForward) Summary: This patch prevents DBG_VALUE instructions from influencing LivePhysRegs::stepBackwards and stepForwards. In at least one case, specifically branch folding, the stepBackwards logic was having an influence on code generation. The result was that certain code compiled with '-g -O2' would differ from that compiled with '-O2' alone. It seems that the original logic, accounting for DBG_VALUE, was influencing the placement of an IMPLICIT_DEF which had a later impact on how blocks were processed in branch folding. Reviewers: kparzysz, MatzeB Reviewed By: kparzysz Subscribers: bjope, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D43850 llvm-svn: 327862	2018-03-19 16:06:40 +00:00
Simon Pilgrim	30c38c3849	[X86] Generalize schedule classes to support multiple stages Currently the WriteResPair style multi-classes take a single pipeline stage and latency, this patch generalizes this to make it easier to create complex schedules with ResourceCycles and NumMicroOps be overridden from their defaults. This has already been done for the Jaguar scheduler to remove a number of custom schedule classes and adding it to the other x86 targets will make it much tidier as we add additional classes in the future to try and replace so many custom cases. I've converted some instructions but a lot of the models need a bit of cleanup after the patch has been committed - memory latencies not being consistent, the class not actually being used when we could remove some/all customs, etc. I'd prefer to keep this as NFC as possible so later patches can be smaller and target specific. Differential Revision: https://reviews.llvm.org/D44612 llvm-svn: 327855	2018-03-19 14:46:07 +00:00
Sanjay Patel	05daae75ad	[x86] put nops into the WriteNop class and customize for Jaguar 1. Given that we already have a classification bucket with 'nop' in the name, that's where 'nop' belongs. Right now, it's only used for prefix bytes and 'pause'. 2. Make the latency of this class '1' for Jaguar to tell the scheduler (and presumably llvm-mca) how to model the resource requirements better even though a nop has no dependencies. Differential Revision: https://reviews.llvm.org/D44608 llvm-svn: 327853	2018-03-19 14:26:50 +00:00
Clement Courbet	6d047b70a4	[MergeICmps] Re-land 324317 "Enable the MergeICmps Pass by default." Now that PR36557 is fixed. llvm-svn: 327840	2018-03-19 13:37:04 +00:00
Craig Topper	d10ceffa5f	[X86] Add ADD16i16/ADD32i32/ADD64i32 and similar to the scheduler models to match ADD8i8. Also move ADC8i8 and SBB8i8 in the Sandy Bridge model to the same class as ADC8ri and SBB8ri. That seems more accurate since the 8i8 form is just the register forced to AL instead of coming from modrm. llvm-svn: 327820	2018-03-19 04:21:40 +00:00
Simon Pilgrim	203876f104	[X86][Btver2] Fix crc32 schedule costs The default is currently FAdd for some reason llvm-svn: 327807	2018-03-18 19:54:42 +00:00
Craig Topper	2d451e73f9	[X86] Fix a bunch of overlapping regular expressions in the scheduler models. llvm-svn: 327787	2018-03-18 08:38:06 +00:00
Craig Topper	89dcda3e90	[X86] Remove MMX_MASKMOVQ64 and VMASKMOVDQU from scheduler models. The information was so wildly inaccurate and incomplete it's better to just remove it. MMX_MASKMOVQ64 showed up twice in several scheduler models. In Haswell and Broadwell they were on adjacent lines. On Skylake the copies had different information. MMX_MASKMOVQ and MASKMOVDQU were completely missing. MMX_MASKMOVQ64 was listed on Haswell/Broadwell as 1 cycle on port 1 despite it being a store instruction. Filed PR36780 to track fixing this right. llvm-svn: 327783	2018-03-18 03:24:42 +00:00
Nirav Dave	5f0ab71b62	Revert "[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172"" as it times out building test-suite on PPC. llvm-svn: 327778	2018-03-17 19:24:54 +00:00
Nirav Dave	982d3a56ea	[DAG, X86] Revert r327197 "Revert r327170, r327171, r327172" Reland ISel cycle checking improvements after simplifying and reducing node id invariant traversal. llvm-svn: 327777	2018-03-17 17:42:10 +00:00
Oren Ben Simhon	fdd72fd522	[X86] Added support for nocf_check attribute for indirect Branch Tracking X86 Supports Indirect Branch Tracking (IBT) as part of Control-Flow Enforcement Technology (CET). IBT instruments ENDBR instructions used to specify valid targets of indirect call / jmp. The `nocf_check` attribute has two roles in the context of X86 IBT technology: 1. Appertains to a function - do not add ENDBR instruction at the beginning of the function. 2. Appertains to a function pointer - do not track the target function of this pointer by adding nocf_check prefix to the indirect-call instruction. This patch implements `nocf_check` context for Indirect Branch Tracking. It also auto generates `nocf_check` prefixes before indirect branches to jump tables that are guarded by range checks. Differential Revision: https://reviews.llvm.org/D41879 llvm-svn: 327767	2018-03-17 13:29:46 +00:00
Reid Kleckner	f8b51c5f90	[IR] Avoid the need to prefix MS C++ symbols with '\01' Now the Windows mangling modes ('w' and 'x') do not do any mangling for symbols starting with '?'. This means that clang can stop adding the hideous '\01' leading escape. This means LLVM debug logs are less likely to contain ASCII escape characters and it will be easier to copy and paste MS symbol names from IR. Finally. For non-Windows platforms, names starting with '?' still get IR mangling, so once clang stops escaping MS C++ names, we will get extra '_' prefixing on MachO. That's fine, since it is currently impossible to construct a triple that uses the MS C++ ABI in clang and emits macho object files. Differential Revision: https://reviews.llvm.org/D7775 llvm-svn: 327734	2018-03-16 20:13:32 +00:00
Craig Topper	e6913ec340	[X86] Post process the DAG after isel to remove vector moves that were added to zero upper bits. We previously avoided inserting these moves during isel in a few cases which is implemented using a whitelist of opcodes. But it's too difficult to generate a perfect list of opcodes to whitelist. Especially with AVX512F without AVX512VL using 512 bit vectors to implement some 128/256 bit operations. Since isel is done bottoms up, we'd have to check the VT and opcode and subtarget in order to determine whether an EXTRACT_SUBREG would be generated for some operations. So instead of doing that, this patch adds a post processing step that detects when the moves are unnecessary after isel. At that point any EXTRACT_SUBREGs would have already been created and appear in the DAG. So then we just need to ensure the input to the move isn't one. Differential Revision: https://reviews.llvm.org/D44289 llvm-svn: 327724	2018-03-16 17:13:42 +00:00
Simon Pilgrim	23578e7d3c	[X86][Btver2] Add correct mul/imul schedule costs Integer multiply is performed on the JMul function unit and i64 requires double pumping llvm-svn: 327707	2018-03-16 14:01:01 +00:00
Simon Pilgrim	8d28ae6aec	[X86][Btver2] Add correct lzcnt/tzcnt/popcnt schedule costs Don't use WriteIMul defaults llvm-svn: 327706	2018-03-16 13:43:55 +00:00
Craig Topper	1b8cf49704	[SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job picking the result type for the setcc. Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type. But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. If for example the result type was promoted to v8i16 from v8i1, but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86. This legality check also caused us to reject the getSetccResultType when the input type needed to be widened or split. Even though that result wouldn't have caused legalization to get stuck. This patch tries to fix this by detecting the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try to ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to default type to promote to. For any other illegal values we might get back from the initial call to getSetccResultType we just keep and allow it to be re-legalized later via splitting or widening or scalarizing. llvm-svn: 327683	2018-03-15 23:04:11 +00:00
Craig Topper	c3983c34cd	[X86] Make sure we use FSUB instruction as the reference for operand order in isAddSubOrSubAdd when recognizing subadd The FADD part of the addsub/subadd pattern can have its operands commuted, but when checking for fsubadd we were using the fadd as reference and commuting the fsub node. llvm-svn: 327660	2018-03-15 20:30:54 +00:00
Craig Topper	46502fa2ef	[X86] Add test case showing bad fmsubadd creation due to bad commuting. The code that creates fmsubadd from shuffle vector has some code to allow commuting the operands of the fadd node. This code was originally created when we only recognized fmaddsub. When fmsubadd support was added this code was not updated and is now commuting the fsub operands instead. llvm-svn: 327659	2018-03-15 20:30:51 +00:00
Simon Pilgrim	d30df5769e	[X86][Btver2] Remove JAny resource, and map system/microcoded instructions to JALU pipes Simplifies throughput to the issue width (1/2) instead of permitting any pipe (1/6) llvm-svn: 327632	2018-03-15 15:12:12 +00:00
Simon Pilgrim	fb7aa57bf1	[X86][SSE] Introduce Float/Vector WriteMove, WriteLoad and WriteStore scheduler classes As discussed on D44428 and PR36726, this patch splits off WriteFMove/WriteVecMove, WriteFLoad/WriteVecLoad and WriteFStore/WriteVecStore scheduler classes to permit vectors to be handled separately from gpr/scalar types. I've minimised the diff here by only moving various basic SSE/AVX vector instructions across - we can fix the rest when called for. This does fix the MOVDQA vs MOVAPS/MOVAPD discrepancies mentioned on D44428. Differential Revision: https://reviews.llvm.org/D44471 llvm-svn: 327630	2018-03-15 14:45:30 +00:00
Simon Pilgrim	69a4132f63	[X86] Regenerate schedule tests with zero latency comments llvm-svn: 327628	2018-03-15 14:30:59 +00:00
Craig Topper	ff6e82c9d0	[X86] Add test cases for 512-bit addsub from build_vector. There is no 512 bit addsub instruction, but we partially match it to handle fmaddsub matching. We explicitly bail out for 512 bit vectors after failing the fmaddsub match, but we had no test coverage for that bail out. We might want to consider splitting and using 256 bit instructions instead of the long sequence seen here. llvm-svn: 327605	2018-03-15 06:49:01 +00:00
Craig Topper	26a3a80c87	[X86] Add support for matching FMSUBADD from build_vector. llvm-svn: 327604	2018-03-15 06:14:55 +00:00
Reid Kleckner	3a7a2e4a0a	[FastISel] Sink local value materializations to first use Summary: Local values are constants, global addresses, and stack addresses that can't be folded into the instruction that uses them. For example, when storing the address of a global variable into memory, we need to materialize that address into a register. FastISel doesn't want to materialize any given local value more than once, so it generates all local value materialization code at EmitStartPt, which always dominates the current insertion point. This allows it to maintain a map of local value registers, and it knows that the local value area will always dominate the current insertion point. The downside is that local value instructions are always emitted without a source location. This is done to prevent jumpy line tables, but it means that the local value area will be considered part of the previous statement. Consider this C code: call1(); // line 1 ++global; // line 2 ++global; // line 3 call2(&global, &local); // line 4 Today we end up with assembly and line tables like this: .loc 1 1 callq call1 leaq global(%rip), %rdi leaq local(%rsp), %rsi .loc 1 2 addq $1, global(%rip) .loc 1 3 addq $1, global(%rip) .loc 1 4 callq call2 The LEA instructions in the local value area have no source location and are treated as being on line 1. Stepping through the code in a debugger and correlating it with the assembly won't make much sense, because these materializations are only required for line 4. This is actually problematic for the VS debugger "set next statement" feature, which effectively assumes that there are no registers live across statement boundaries. By sinking the local value code into the statement and fixing up the source location, we can make that feature work. This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and https://crbug.com/793819. 
This change is obviously not enough to make this feature work reliably in all cases, but I felt that it was worth doing anyway because it usually generates smaller, more comprehensible -O0 code. I measured a 0.12% regression in code generation time with LLC on the sqlite3 amalgamation, so I think this is worth doing. There are some special cases worth calling out in the commit message: 1. local values materialized for phis 2. local values used by no-op casts 3. dead local value code Local values can be materialized for phis, and this does not show up as a vreg use in MachineRegisterInfo. In this case, if there are no other uses, this patch sinks the value to the first terminator, EH label, or the end of the BB if nothing else exists. Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. Lastly, if the local value register has no other uses, we can delete it. This comes up when fastisel tries two instruction selection approaches and the first materializes the value but fails and the second succeeds without using the local value. Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43093 llvm-svn: 327581	2018-03-14 21:54:21 +00:00
Francis Visoiu Mistrih	e85b06d65f	[CodeGen] Use MIR syntax for MachineMemOperand printing Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)". rdar://38163529 Differential Revision: https://reviews.llvm.org/D42377 llvm-svn: 327580	2018-03-14 21:52:13 +00:00
Simon Pilgrim	adf72e8549	[X86] Add haswell testing for PR35635 as well. To improve complete model testing for schedulers for instructions with multiple results. llvm-svn: 327572	2018-03-14 21:03:09 +00:00
Craig Topper	9c098ed819	[X86] Add back fast-isel code for handling i8 shifts. I removed this in r316797 because the coverage report showed no coverage and I thought it should have been handled by the auto generated table. I now see that there is code that bypasses the table if the shift amount is out of bounds. This adds back the code. We'll codegen out of bounds i8 shifts to effectively (amount & 0x1f). The 0x1f is a strange quirk of x86 that shift amounts are always masked to 5-bits(except 64-bits). So if the masked value is still out of bounds the result will be 0. Fixes PR36731. llvm-svn: 327540	2018-03-14 17:57:19 +00:00
Craig Topper	b36cb20ef9	[X86] Teach X86TargetLowering::targetShrinkDemandedConstant to set non-demanded bits if it helps create an and mask that can be matched as a zero extend. I had to modify the bswap recognition to allow unshrunk masks to make this work. Fixes PR36689. Differential Revision: https://reviews.llvm.org/D44442 llvm-svn: 327530	2018-03-14 16:55:15 +00:00
Simon Pilgrim	d1c3c995c0	[X86][AVX] Use WriteFShuffleLd for broadcast reg-mem instructions They shouldn't be treated as pure loads. Found while investigating D44428 llvm-svn: 327524	2018-03-14 15:47:08 +00:00
Alexander Ivchenko	86ef9ab28f	[GlobalIsel][X86] Support for G_SDIV instruction Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44430 llvm-svn: 327520	2018-03-14 15:41:11 +00:00
Simon Pilgrim	d594942928	[X86][Btver2] Fix YMM shuffle, permute and permutevar scheduler costs Account for ymm double pumping and add proper pshufb/permutevar support llvm-svn: 327510	2018-03-14 14:05:19 +00:00
Simon Pilgrim	de995e6e37	[X86][SSE] Use WriteFShuffleLd for MOVDDUP/MOVSHDUP/MOVSLDUP reg-mem instructions They shouldn't be treated as pure loads. Found while investigating D44428 llvm-svn: 327505	2018-03-14 13:22:56 +00:00
Alexander Ivchenko	0bd4d8c901	[GlobalISel][X86] Support G_LSHR/G_ASHR/G_SHL Support G_LSHR/G_ASHR/G_SHL. We have 3 variants for shift instructions : shift gpr, shift imm, shift 1. Currently GlobalIsel TableGen generate patterns for shift imm and shift 1, but with shiftCount i8. In G_LSHR/G_ASHR/G_SHL like LLVM-IR both arguments have the same type, so for now only shift i8 can use auto generated TableGen patterns. The support of G_SHL/G_ASHR enables tryCombineSExt from LegalizationArtifactCombiner.h to hit, which results in different legalization for the following tests: LLVM :: CodeGen/X86/GlobalISel/ext-x86-64.ll LLVM :: CodeGen/X86/GlobalISel/gep.ll LLVM :: CodeGen/X86/GlobalISel/legalize-ext-x86-64.mir -; X64-NEXT: movsbl %dil, %eax +; X64-NEXT: movl $24, %ecx +; X64-NEXT: # kill: def $cl killed $ecx +; X64-NEXT: shll %cl, %edi +; X64-NEXT: movl $24, %ecx +; X64-NEXT: # kill: def $cl killed $ecx +; X64-NEXT: sarl %cl, %edi +; X64-NEXT: movl %edi, %eax ..which is not optimal and should be addressed later. Rework of the patch by igorb Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44395 llvm-svn: 327499	2018-03-14 11:23:57 +00:00
Alexander Ivchenko	327de80529	[GlobalIsel][X86] Support for G_ZEXT instruction Reviewed By: igorb Differential Revision: https://reviews.llvm.org/D44378 llvm-svn: 327482	2018-03-14 09:11:23 +00:00
Craig Topper	9ca7e67c4c	[X86] Re-generate test to get proper capitalization of its CHECK lines. NFC llvm-svn: 327462	2018-03-13 23:31:48 +00:00
Craig Topper	cc060e921b	[X86] Rewrite LowerAVXCONCAT_VECTORS similar to how we handle vXi1 concats. This is better able to detect undef and zero pieces in the concat. Or cases when only one subvector is non-zero. This allows us to avoid silly things like double inserts into progressively larger undefs. This still builds 512 bit concats of 128 bits by building up through 256 bits first. But I don't know if that's best. We probably want to merge this with the vXi1 concat code since they are very similar. llvm-svn: 327454	2018-03-13 22:05:25 +00:00
Craig Topper	4aeec51986	[DAGCombiner] Allow visitEXTRACT_SUBVECTOR to combine with BUILD_VECTORS between LegalizeVectorOps and LegalizeDAG. BUILD_VECTORs aren't themselves legalized until LegalizeDAG so we should still be able to create an "illegal" one before that. This helps combine with BUILD_VECTORS that are introduced during LegalizeVectorOps due to unrolling. llvm-svn: 327446	2018-03-13 20:36:28 +00:00
Sanjay Patel	bb45cc126d	[x86] add test for WriteZero sched class instructions; NFC Nops should have zero latency because there is no result. Idioms like 'xorps xmm0, xmm0' may have zero latency because they are handled without using an execution unit. llvm-svn: 327435	2018-03-13 19:20:01 +00:00
Simon Pilgrim	9855b39380	[DAGCombine] visitREM - Don't assume that one divrem isn't driving another Under some circumstances the divrems won't have been combined together before getting to this code. So replace the assertion with an if() guard to not expand to X-((X/C)*C) to give the other combine a chance to happen. Reduced from OSS-Fuzz #6883 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=6883 llvm-svn: 327424	2018-03-13 17:17:15 +00:00
Simon Pilgrim	3d4c86d399	[X86][Btver2] Split i8/i16/i32/i64 div/idiv costs We were assuming a mixture of 32/64 division costs. llvm-svn: 327407	2018-03-13 15:22:24 +00:00
Simon Pilgrim	93bd7187f4	[X86][SSE41] createVariablePermute v2X64 - PCMPEQQ can test for index 0/1 and select between them. llvm-svn: 327385	2018-03-13 12:22:58 +00:00
Craig Topper	80058e30cc	[LegalizeTypes] In SplitVecOp_TruncateHelper, use GetSplitVector on the input instead of creating new extract_subvectors. llvm-svn: 327355	2018-03-13 01:17:40 +00:00
Simon Pilgrim	6618e2a09c	[X86][SSE] createVariablePermute - PSHUFB requires SSSE3 not just SSE3 llvm-svn: 327259	2018-03-12 12:30:04 +00:00
Simon Pilgrim	d09cc9c62c	[X86][MMX] Support MMX build vectors to avoid SSE usage (PR29222) 64-bit MMX vector generation usually ends up lowering into SSE instructions before being spilled/reloaded as a MMX type. This patch creates a MMX vector from MMX source values, taking the lowest element from each source and constructing broadcasts/build_vectors with direct calls to the MMX PUNPCKL/PSHUFW intrinsics. We're missing a few consecutive load combines that could be handled in a future patch if that would be useful - my main interest here is just avoiding a lot of the MMX/SSE crossover. Differential Revision: https://reviews.llvm.org/D43618 llvm-svn: 327247	2018-03-11 19:22:13 +00:00
Simon Pilgrim	55ed3dc676	[X86][AVX512] Added more non-VLX test cases Cleaned up check prefixes so that they actually share a bit more llvm-svn: 327246	2018-03-11 18:28:37 +00:00
Simon Pilgrim	30f74c14ff	[X86][AVX] createVariablePermute - scale v16i16 variable permutes to use v32i8 codegen XOP was already doing this, and now AVX performs v32i8 variable permutes as well. llvm-svn: 327245	2018-03-11 17:23:54 +00:00
Simon Pilgrim	b306501796	[X86][AVX] createVariablePermute - widen permutes for cases where the source vector is wider than the destination type llvm-svn: 327244	2018-03-11 17:00:46 +00:00
Simon Pilgrim	9a5d0c7540	[X86][AVX] createVariablePermute - use PSHUFB+PCMPGT+SELECT for v32i8 variable permutes Same as the VPERMILPS/VPERMILPD approach for v8f32/v4f64 cases, rely on PSHUFB using bits[3:0] for indexing - we can ignore the sign bit (zero element) as those index vector values are considered undefined. The select between the lo/hi permute results based on the index size. llvm-svn: 327242	2018-03-11 16:28:11 +00:00
Simon Pilgrim	f9cc80d218	[X86][AVX] createVariablePermute - use 2xVPERMIL+PCMPGT+SELECT for v8i32/v8f32 and v4i64/v4f64 variable permutes As VPERMILPS/VPERMILPD only selects elements based on the bits[1:0]/bit[1] then we can permute both the (repeated) lo/hi 128-bit vectors in each case and then select between these results based on whether the index was for lo/hi. For v4i64/v4f64 this avoids some rather nasty v4i64 multiplies on the AVX2 implementation, which seems to be worse than the extra port5 pressure from the additional shuffles/blends. llvm-svn: 327239	2018-03-11 11:52:26 +00:00
Simon Pilgrim	2565bd421e	[X86][AVX512] createVariablePermute - Non-VLX targets can widen v4i64/v8f64 variable permutes to v8i64/v8f64 Permutes in the upper elements will be undefined, but they will be discarded anyway. llvm-svn: 327238	2018-03-11 11:19:19 +00:00
Craig Topper	d88204fe1b	[X86] Add comments to the end of FMA3 instructions to make the operation clear Summary: There are 3 different operand orders for FMA instructions so figuring out the exact operation being performed requires a lot of thought. This patch adds a comment to the end of the assembly line to print the exact operation. I think I've got all the instructions in here except the ones with builtin rounding. I didn't update all tests, but I assume we can get them as we regenerate tests in the future. Reviewers: spatel, v_klochkov, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44345 llvm-svn: 327225	2018-03-10 21:30:46 +00:00
Simon Pilgrim	de7f3f0f91	[X86][XOP] createVariablePermute - use VPERMIL2 for v8i32/v4i64 variable permutes llvm-svn: 327222	2018-03-10 19:49:59 +00:00
Simon Pilgrim	ff1248f82f	[X86][XOP] createVariablePermute - use VPPERM for v16i16 variable permutes llvm-svn: 327218	2018-03-10 18:33:29 +00:00
Simon Pilgrim	8224241f75	[X86][XOP] createVariablePermute - use VPPERM for v32i8 variable permutes llvm-svn: 327213	2018-03-10 16:51:45 +00:00
Craig Topper	9804c67d21	[X86] Rewrite printMasking code in X86InstComments to use TSFlags to determine whether the instruction is masked. This should have been NFC, but it looks like we were missing PUNPCKLHQDQ/PUNPCKLQDQ instructions in there. llvm-svn: 327200	2018-03-10 03:12:00 +00:00
Rafael Espindola	63c378d343	Go back to sometimes assuming intrinsics are local. This fixes pr36674. While it is valid for shouldAssumeDSOLocal to return false anytime, always returning false for intrinsics is not optimal on i386 and also hits a bug in the backend. To use a plt, the caller must first set up ebx to handle the case of that file being linked into a PIE executable or shared library. In those cases the generated PLT uses ebx. Currently we can produce "calll expf@plt" without setting ebx. We could fix that by correctly setting ebx, but this would produce worse code for the case where the runtime library is statically linked. It would also require other tools to handle R_386_PLT32. llvm-svn: 327198	2018-03-10 02:42:14 +00:00
Nirav Dave	042678bd55	Revert: r327172 "Correct load-op-store cycle detection analysis" r327171 "Improve Dependency analysis when doing multi-node Instruction Selection" r327170 "[DAG] Enforce stricter NodeId invariant during Instruction selection" Reverting patch as NodeId invariant change is causing pathological increases in compile time on PPC llvm-svn: 327197	2018-03-10 02:16:15 +00:00
Craig Topper	f6ff51fc62	[TwoAddressInstructionPass] Improve tryInstructionCommute of X86 FMA and vpternlog instructions These instructions have 3 operands that can be commuted. The first commute we find may not be the best. So we should keep searching if we performed an aggressive commute. There may still be an operand that is killed or a physical register constraint that might be better. Differential Revision: https://reviews.llvm.org/D44324 llvm-svn: 327188	2018-03-09 23:36:58 +00:00
Nirav Dave	0fab41782d	Correct load-op-store cycle detection analysis Add missing cycle dependency checks in load-op-store fusion. Fixes PR36274. Reviewers: craig.topper, bogner Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D43154 llvm-svn: 327172	2018-03-09 20:58:07 +00:00
Nirav Dave	d668f69ee7	Improve Dependency analysis when doing multi-node Instruction Selection Relanding after fixing NodeId Invariant. Cleanup cycle/validity checks in ISel (IsLegalToFold, HandleMergeInputChains) and X86 (isFusableLoadOpStore). Now do a full search for cycles / dependencies pruning the search when topological property of NodeId allows. As part of this propagate the NodeId-based cutoffs to narrow hasPreprocessorHelper searches. Reviewers: craig.topper, bogner Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D41293 llvm-svn: 327171	2018-03-09 20:57:42 +00:00
Nirav Dave	071699bf82	[DAG] Enforce stricter NodeId invariant during Instruction selection Instruction Selection makes use of the topological ordering of nodes by node id (a node's operands have smaller node id than it) when doing cycle detection. During selection we may violate this property as a selection of multiple nodes may induce a use dependence (and thus a node id restriction) between two unrelated nodes. If a selected node has an unselected successor this may allow us to miss a cycle when detecting an invalid selection. This patch fixes this by marking all unselected successors of a selected node as having a negated node id. We avoid pruning on such negative ids but still can reconstruct the original id for pruning. In-tree targets have been updated to replace DAG-level replacements with ISel-level ones which enforce this property. This preemptively fixes PR36312 before triggering commit r324359 relands Reviewers: craig.topper, bogner, jyknight Subscribers: arsenm, nhaehnle, javed.absar, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43198 llvm-svn: 327170	2018-03-09 20:57:15 +00:00
Peter Collingbourne	2974856ad4	Use branch funnels for virtual calls when retpoline mitigation is enabled. The retpoline mitigation for variant 2 of CVE-2017-5715 inhibits the branch predictor, and as a result it can lead to a measurable loss of performance. We can reduce the performance impact of retpolined virtual calls by replacing them with a special construct known as a branch funnel, which is an instruction sequence that implements virtual calls to a set of known targets using a binary tree of direct branches. This allows the processor to speculatively execute valid implementations of the virtual function without allowing for speculative execution of calls to arbitrary addresses. This patch extends the whole-program devirtualization pass to replace certain virtual calls with calls to branch funnels, which are represented using a new llvm.icall.jumptable intrinsic. It also extends the LowerTypeTests pass to recognize the new intrinsic, generate code for the branch funnels (x86_64 only for now) and lay out virtual tables as required for each branch funnel. The implementation supports full LTO as well as ThinLTO, and extends the ThinLTO summary format used for whole-program devirtualization to support branch funnels. For more details see RFC: http://lists.llvm.org/pipermail/llvm-dev/2018-January/120672.html Differential Revision: https://reviews.llvm.org/D42453 llvm-svn: 327163	2018-03-09 19:11:44 +00:00
Simon Pilgrim	2cd489feb2	[X86][AVX] createVariablePermute - fix v2i64/v2f64 VPERMILPD index creation. The input indices vector will put the index in bit0, but VPERMILPD actually selects off bit1 - so we need to scale accordingly. llvm-svn: 327159	2018-03-09 18:37:56 +00:00
Craig Topper	784f1bbf5e	[X86] Remove SRAs from v16i8 multiply lowering on sse2 targets Previously we unpacked the even bytes of each input into the high byte of 16-bit elements then did a v8i16 arithmetic shift right by 8 bits to fill the upper bits of each word with sign bits. Then we did the v8i16 multiply and then masked to zero the upper 8-bits of each result. The same was done for all the odd bytes. The results are then packed together with packuswb. Since we are masking each multiply result element to 8-bits, and those 8-bits are determined only by the lower 8-bits of each of the inputs, we don't need to fill the upper bits with sign bits. So we can just unpack into the low byte of each element and treat the upper bits as garbage. This is what gcc also does. Differential Revision: https://reviews.llvm.org/D44267 llvm-svn: 327093	2018-03-09 01:22:31 +00:00
Sanjay Patel	0cdccf5f37	[x86] fix test to be independent of FP undef llvm-svn: 327030	2018-03-08 17:24:30 +00:00
Sanjay Patel	af2c4185a2	[x86] regenerate checks; NFC This test will fail if we fix FP undef constant folding. llvm-svn: 327026	2018-03-08 16:56:49 +00:00
Craig Topper	a406796f5f	[X86] Change X86::PMULDQ/PMULUDQ opcodes to take vXi64 type as input instead of vXi32. This instruction can be thought of as reading either the even elements of a vXi32 input or the lower half of each element of a vXi64 input. We currently use the vXi32 interpretation, but vXi64 matches better with its broadcast behavior in EVEX. I'm looking at moving MULDQ/MULUDQ creation to a DAG combine so we can do it when AVX512DQ is enabled without having to go through Custom lowering. But in some of the test cases we failed to use a broadcast load due to the size difference. This should help with that. I'm also wondering if we can model these instructions in native IR and remove the intrinsics and I think using a vXi64 type will work better with that. llvm-svn: 326991	2018-03-08 08:02:52 +00:00
Craig Topper	7ff9779768	[X86] Fix some isel patterns that used aligned vector load instructions with unaligned predicates. These patterns weren't checking the alignment of the load, but were using the aligned instructions. This will cause a GP fault if the data isn't aligned. I believe these were introduced in r312450. llvm-svn: 326967	2018-03-08 00:21:17 +00:00
Simon Pilgrim	dc1a0385ee	[X86][SSE] Regenerate float maxnum/minnum tests llvm-svn: 326930	2018-03-07 19:14:05 +00:00
Craig Topper	c3c15dd640	[X86] Make the MUL->VPMADDWD work before op legalization on AVX1 targets. Simplify feature checks by using isTypeLegal. The v8i32 conversion on AVX1 targets was only working after LowerMUL splits 256-bit vectors. While I was there I've also made it so we don't have to check for AVX2 and BWI directly and instead just ask if the type is legal. Differential Revision: https://reviews.llvm.org/D44190 llvm-svn: 326917	2018-03-07 17:53:18 +00:00
Simon Pilgrim	eab108ba39	[X86][X87] Add X87 fp80 conversion tests llvm-svn: 326897	2018-03-07 14:13:14 +00:00
Simon Pilgrim	ca38c762e4	[TargetLowering] Add vector BITCAST support to SimplifyDemandedVectorElts Notably helps cleanup after legalization of vector types Differential Revision: https://reviews.llvm.org/D43674 llvm-svn: 326838	2018-03-06 22:32:01 +00:00
Craig Topper	274e08dd81	[X86] Reject registers that require a REX prefix in inline asm constraints in 32-bit mode We don't currently reject r8-r15 or xmm8-32 or bpl/spl/sil/dil in 32-bit mode. Differential Revision: https://reviews.llvm.org/D44031 llvm-svn: 326826	2018-03-06 18:56:33 +00:00
Martin Storsjo	a7adc3185b	[X86] Handle EAX being live when calling chkstk for x86_64 EAX can turn out to be alive here, when shrink wrapping is done (which is allowed when using dwarf exceptions, contrary to the normal case with WinCFI). This fixes PR36487. Differential Revision: https://reviews.llvm.org/D43968 llvm-svn: 326764	2018-03-06 06:00:13 +00:00

11753 Commits