llvm-project

Commit Graph

Author	SHA1	Message	Date
Bjorn Pettersson	e06321382b	[RegisterCoalescer] Use substPhysReg in reMaterializeTrivialDef Summary: When RegisterCoalescer::reMaterializeTrivialDef is substituting a register use in a DBG_VALUE instruction, and the old register is a subreg, and the new register is a physical register, then we need to use substPhysReg in order to extract the correct subreg. Reviewers: wmi, aprantl Reviewed By: wmi Subscribers: hiraditya, MatzeB, qcolombet, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D50844 llvm-svn: 340326	2018-08-21 19:47:32 +00:00
Simon Pilgrim	9848e0c9ac	[X86][SSE] Add non-uniform udiv test that is mostly divide by 1. The test demonstrates over-complicated codegen for a udiv that only has one divisor that doesn't equal 1. This should have allowed the codegen to be a lot simpler (uniform shifts etc.) but only the SSE2 manages to make use of this...... llvm-svn: 340313	2018-08-21 18:02:28 +00:00
Craig Topper	b172b8884a	[BypassSlowDivision] Teach bypass slow division not to interfere with div by constant where constants have been constant hoisted, but not moved from their basic block. DAGCombiner doesn't pay attention to whether constants are opaque before doing the div by constant optimization. So BypassSlowDivision shouldn't introduce control flow that would make DAGCombiner unable to see an opaque constant. This can occur when a div and rem of the same constant are used in the same basic block. It will be hoisted, but not leave the block. Longer term we probably need to look into the X86 immediate cost model used by constant hoisting and maybe not mark div/rem immediates for hoisting at all. This fixes the case from PR38649. Differential Revision: https://reviews.llvm.org/D51000 llvm-svn: 340303	2018-08-21 17:15:33 +00:00
Simon Pilgrim	43cf2c20ab	[X86] Add SSE2 and XOP udiv combine tests llvm-svn: 340282	2018-08-21 15:21:45 +00:00
Simon Pilgrim	8e15b43092	[X86] Add SSE2 sdiv combine tests llvm-svn: 340264	2018-08-21 10:44:06 +00:00
Sam Parker	597811e7a7	[DAGCombiner] Reduce load widths of shifted masks During combining, ReduceLoadWidth is used to combine AND nodes that mask loads into narrow loads. This patch allows the mask to be a shifted constant. This results in a narrow load which is then left shifted to compensate for the new offset. Differential Revision: https://reviews.llvm.org/D50432 llvm-svn: 340261	2018-08-21 10:26:59 +00:00
Simon Pilgrim	72b324de4d	[TargetLowering] Add BuildSDiv support for division by one or negone. This reduces most of the sdiv stages (the MULHS, shifts etc.) to just zero/identity values and use the numerator scale factor to multiply by +1/-1. llvm-svn: 340260	2018-08-21 10:20:36 +00:00
Bjorn Pettersson	880f291577	[RegisterCoalescer] Do not assert when trying to remat dead values Summary: RegisterCoalescer::reMaterializeTrivialDef used to assert that the input register was live in. But as shown by the new coalesce-dead-lanes.mir test case that seems to be a valid scenario. We now return false instead of the assert, simply not rematerializing the dead def. Normally a COPY of an undef value is eliminated by eliminateUndefCopy(). However, we only do that when the destination isn't a physical register. So the situation above should be limited to the case when we copy an undef value to a physical register. Reviewers: kparzysz, wmi, tpr Reviewed By: kparzysz Subscribers: MatzeB, qcolombet, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D50842 llvm-svn: 340255	2018-08-21 07:49:05 +00:00
Craig Topper	9c57ba0dc3	[X86] Add test command line to expose PR38649. Bypass slow division and constant hoisting are conspiring to break div+rem of large constants. llvm-svn: 340217	2018-08-20 21:51:35 +00:00
Craig Topper	210ccfe3db	[X86] Prevent lowerVectorShuffleByMerging128BitLanes from creating cycles Due to some splat handling code in getVectorShuffle, it's possible for NewV1/NewV2 to have their mask modified from what is requested. This can lead to cycles being created in the DAG. This patch examines the returned mask and makes sure it's different. Long term we may need to look closer at that splat code in getVectorShuffle, or add more splat awareness to getVectorShuffle. Fixes PR38639 Differential Revision: https://reviews.llvm.org/D50981 llvm-svn: 340214	2018-08-20 21:08:35 +00:00
Craig Topper	7dcb2c4b0a	[X86] Teach combineTruncatedArithmetic to handle some cases of ISD::SUB We can safely avoid interfering with the subus combine if both inputs are freely truncatable. Either both extends, or an extend and a constant vector. Differential Revision: https://reviews.llvm.org/D50878 llvm-svn: 340212	2018-08-20 20:57:35 +00:00
Craig Topper	08e7e04998	[X86] Pre-commit test cases for D50878. llvm-svn: 340211	2018-08-20 20:57:32 +00:00
Cameron McInally	94b9029be9	[FPEnv] Support constrained FREM intrinsic Differential Revision: https://reviews.llvm.org/D50975 llvm-svn: 340201	2018-08-20 19:28:56 +00:00
Simon Pilgrim	6ac905926f	[TargetLowering] Disable BuildSDiv division by one or negone. Fuzz tests have detected an issue, currently working on a fix. llvm-svn: 340195	2018-08-20 18:23:54 +00:00
Simon Pilgrim	5b78c9d58d	[SelectionDAG] Add partial sign-bit support to ComputeNumSignBits for BITCAST nodes Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts. Handle the case where the sign bit extends to only part of the small elements. llvm-svn: 340169	2018-08-20 13:05:48 +00:00
Simon Pilgrim	11bec5b80c	[X86][SSE] Fix PACKSS bitcast test from rL340166 We need the sign bits to extend to the lower 16 bits of the even elements llvm-svn: 340167	2018-08-20 11:47:15 +00:00
Simon Pilgrim	cee9c64838	[X86][SSE] Add PACKSS test showing ComputeNumSignBits failure to handle a partial sign bits extension through a bitcast llvm-svn: 340166	2018-08-20 11:10:12 +00:00
Simon Pilgrim	686090a45f	[X86] Drop unnecessary exact qualifier from packss test llvm-svn: 340165	2018-08-20 11:01:51 +00:00
Simon Pilgrim	5b936ec89e	[SelectionDAG] Add basic demanded elements support to ComputeNumSignBits for BITCAST nodes Only adds support to the existing 'large element' scalar/vector to 'small element' vector bitcasts. The next step would be to support cases where the large elements aren't all sign bits, and determine the small element equivalent based on the demanded elements. llvm-svn: 340143	2018-08-19 17:47:50 +00:00
Simon Pilgrim	0fd72ab44f	[X86][SSE] Add PACKSS test showing ComputeNumSignBits failure to handle demanded elts through a bitcast llvm-svn: 340139	2018-08-19 16:01:47 +00:00
Craig Topper	803912ea57	[X86] Fix an issue in the matching for ADDUS. We were basically assuming only one operand of the compare could be an ADD node and using that to swap operands. But we can have a normal add followed by a saturating add. This rewrites the canonicalization to just be based on the condition code. llvm-svn: 340134	2018-08-19 04:26:31 +00:00
Craig Topper	a85d7e927b	[X86] Add a test case showing an issue in our addusw pattern matching. We are unable to handle a normal add followed by a saturating add with certain operand orders on the icmp. llvm-svn: 340133	2018-08-19 04:26:29 +00:00
Craig Topper	40c9559b74	[X86] Add support for using 512-bit PSUBUS to combineSelect. The code already supports 128 and 256 and even knows to split 256 for AVX1. So we really just needed to stop looking for specific VTs and subtarget features and just look for legal VTs with i8/i16 elements. While there, add some curly braces around outer if statement bodies that contain only another if. It makes all the closing curly braces look more regular. llvm-svn: 340128	2018-08-18 18:51:03 +00:00
Craig Topper	b40a1d5f84	[X86] Add test cases to show missed opportunities to use 512-bit PSUBUS. llvm-svn: 340127	2018-08-18 18:50:59 +00:00
Craig Topper	911efbb926	[X86] Add a signed test case for PR38622. Use nounwind to reduce the output on the unsigned test case. llvm-svn: 340121	2018-08-18 06:00:16 +00:00
Craig Topper	cc5dbbf759	[DAGCombiner] Allow divide by constant optimization on opaque constants. Summary: I believe this restores the behavior we had before r339147. Fixes PR38622. Reviewers: RKSimon, chandlerc, spatel Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D50936 llvm-svn: 340120	2018-08-18 05:52:42 +00:00
Simon Pilgrim	2f48122cc9	[X86][SSE] Lower constant vXi8 ISD::SRL/ISD::SRA using PMULLW Extending the concept introduced in D49562, this patch lowers constant vXi8 ISD::SRL/ISD::SRA by zero/sign extending to vXi16 and using PMULLW and then truncating the high 8 bits of the result. Differential Revision: https://reviews.llvm.org/D50781 llvm-svn: 340062	2018-08-17 18:03:11 +00:00
Francis Visoiu Mistrih	f006b491bd	[x86] Fix test breaking on Darwin after r339962 * -march=x86-64 -> -mtriple=x86_64-unknown-linux to avoid _ prefixes to symbols * add -start-before to avoid running the whole codegen on the IR. I assumed it is meant to be running after X86SpeculativeLoadHardening. llvm-svn: 340034	2018-08-17 14:47:01 +00:00
Francis Visoiu Mistrih	8bff832534	[X86] Fix liveness information when expanding X86::EH_SjLj_LongJmp64 test/CodeGen/X86/shadow-stack.ll has the following machine verifier errors:
```
*** Bad machine code: Using a killed virtual register ***
- function: bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: %3:gr64 = MOV64rm killed %2:gr64, 1, $noreg, 8, $noreg
- operand 1: killed %2:gr64
*** Bad machine code: Using a killed virtual register ***
- function: bar
- basic block: %bb.6 entry (0x7fdc81857818)
- instruction: $rsp = MOV64rm killed %2:gr64, 1, $noreg, 16, $noreg
- operand 1: killed %2:gr64
*** Bad machine code: Virtual register killed in block, but needed live out. ***
- function: bar
- basic block: %bb.2 entry (0x7fdc818574f8)
Virtual register %2 is used after the block.
```
The fix here is to only copy the machine operand's register without the kill flags for all the instructions except the very last one of the sequence. I had to insert dummy PHIs in the test case to force the NoPHI function property to be set to false. More on this here: https://llvm.org/PR38439 Differential Revision: https://reviews.llvm.org/D50260 llvm-svn: 340033	2018-08-17 14:46:56 +00:00
Simon Pilgrim	03e57521c0	[DAGCombiner] extractShiftForRotate - fix out of range shift issue Don't just check for negative shift amounts. Fixes OSS Fuzz #9935 https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=9935 llvm-svn: 340015	2018-08-17 12:25:18 +00:00
Simon Pilgrim	5113b48798	[DAGCombine] Improve (sra (sra x, c1), c2) -> (sra x, (add c1, c2)) folding Add support for cases where only some c1+c2 results exceed the max bitshift, clamping accordingly. Differential Revision: https://reviews.llvm.org/D35722 llvm-svn: 340010	2018-08-17 10:52:49 +00:00
Chandler Carruth	75ca6be1c1	[x86/MIR] Implement support for pre- and post-instruction symbols, as well as MIR parsing support for `MCSymbol` `MachineOperand`s. The only real way to test pre- and post-instruction symbol support is to use them in operands, so I ended up implementing that within the patch as well. I can split out the operand support if folks really want but it doesn't really seem worth it. The functional implementation of pre- and post-instruction symbols is now completely trivial. Two tiny bits of code in the (misnamed) AsmPrinter. It should be completely target independent as well. We emit these exactly the same way as we emit basic block labels. Most of the code here is to give full dumping, MIR printing, and MIR parsing support so that we can write useful tests. The MIR parsing of MC symbol operands still isn't 100%, as it forces the symbols to be non-temporary and non-local symbols with names. However, those names often can encode most (if not all) of the special semantics desired, and unnamed symbols seem especially annoying to serialize and de-serialize. While this isn't perfect or full support, it seems plenty to write tests that exercise usage of these kinds of operands. The MIR support for pre-and post-instruction symbols was quite straightforward. I chose to print them out in an as-if-operand syntax similar to debug locations as this seemed the cleanest way and let me use nice introducer tokens rather than inventing more magic punctuation like we use for memoperands. However, supporting MIR-based parsing of these symbols caused me to change the design of the symbol support to allow setting arbitrary symbols. Without this, I don't see any reasonable way to test things with MIR. Differential Revision: https://reviews.llvm.org/D50833 llvm-svn: 339962	2018-08-16 23:11:05 +00:00
Craig Topper	883ff69c93	[DAGCombiner] Don't reassociate operations that have the vector reduction flag set. When nodes are reassociated, the vector-reduction flag gets lost. The test case here is what would happen if you had a sum of absolute differences loop that started with a non-zero but constant sum and that loop was unrolled. The vectorizer will generate a constant vector for the initial value. And DAGCombiner reassociate tries to move it down the addition tree, erasing the vector-reduction flag. Interestingly, this moves constants in the opposite direction from the reassociate IR pass. I've chosen to just punt on the reassociate, but I suppose we could maybe preserve the flag if both nodes have it set. Differential Revision: https://reviews.llvm.org/D50827 llvm-svn: 339946	2018-08-16 21:54:05 +00:00
Craig Topper	bde2b43cb3	[X86] In EFLAGS copy pass, don't emit EXTRACT_SUBREG instructions since we're after peephole Normally the peephole pass converts EXTRACT_SUBREG to COPY instructions. But we're after peephole so we can't rely on it to clean these up. To fix this, the eflags pass now emits a COPY with a subreg input. I also noticed that in 32-bit mode we need to constrain the input to the copy to ensure the subreg is valid. Otherwise we'll fail verify-machineinstrs Differential Revision: https://reviews.llvm.org/D50656 llvm-svn: 339945	2018-08-16 21:54:02 +00:00
Craig Topper	3dfc5af178	[X86] Pre-commit test case for D50827. llvm-svn: 339926	2018-08-16 19:27:43 +00:00
Eli Friedman	73e8a784e6	[SelectionDAG] Improve the legalisation lowering of UMULO. There is no way in the universe that doing a full-width division in software will be faster than doing overflowing multiplication in software in the first place, especially given that this same full-width multiplication needs to be done anyway. This patch replaces the previous implementation with a direct lowering into an overflowing multiplication algorithm based on half-width operations. Correctness of the algorithm was verified by exhaustively checking the output of this algorithm for overflowing multiplication of 16 bit integers against an obviously correct widening multiplication. Barring any oversights introduced by porting the algorithm to DAG, confidence in correctness of this algorithm is extremely high. The following table shows the change in both t = runtime and s = space. The change is expressed as a multiplier of original, so anything under 1 is “better” and anything above 1 is worse.
+-------+-----------+-----------+-------------+-------------+
| Arch  | u64u64 t  | u64u64 s  | u128u128 t  | u128u128 s  |
+-------+-----------+-----------+-------------+-------------+
| X64   | -         | -         | ~0.5        | ~0.64       |
| i686  | ~0.5      | ~0.6666   | ~0.05       | ~0.9        |
| armv7 | -         | ~0.75     | -           | ~1.4        |
+-------+-----------+-----------+-------------+-------------+
Performance numbers have been collected by running overflowing multiplication in a loop under `perf` on two x86_64 (one Intel Haswell, other AMD Ryzen) based machines. Size numbers have been collected by looking at the size of function containing an overflowing multiply in a loop. All in all, it can be seen that both performance and size have improved except in the case of armv7 where code size has regressed for 128-bit multiply. u128*u128 overflowing multiply on 32-bit platforms seems to benefit from this change a lot, taking only 5% of the time compared to original algorithm to calculate the same thing.
The final benefit of this change is that LLVM is now capable of lowering the overflowing unsigned multiply for integers of any bit-width as long as the target is capable of lowering regular multiplication for the same bit-width. Previously, 128-bit overflowing multiply was the widest possible. Patch by Simonas Kazlauskas! Differential Revision: https://reviews.llvm.org/D50310 llvm-svn: 339922	2018-08-16 18:39:39 +00:00
Simon Pilgrim	87d0039a45	[TargetLowering] Add support for non-uniform vectors to BuildSDIV This patch refactors the existing TargetLowering::BuildSDIV base implementation to support non-uniform constant vector denominators. This is the last patch necessary to close PR36545 Differential Revision: https://reviews.llvm.org/D50765 llvm-svn: 339908	2018-08-16 17:44:33 +00:00
Simon Pilgrim	8b9e545477	[X86][SSE] Add sdiv by nonuniform constant vector test containing -1/+1 and all-bits style constants llvm-svn: 339901	2018-08-16 17:07:41 +00:00
Craig Topper	9c1d9fdeaa	[X86] Remove masking from the 512-bit padds and psubs intrinsics. Use select in IR instead. llvm-svn: 339842	2018-08-16 06:20:24 +00:00
Craig Topper	9d6983c9fd	[X86] Remove the unused masked 128 and 256-bit masked padds/psubs intrinsics. Still need to remove masking from the 512-bit versions. llvm-svn: 339841	2018-08-16 06:20:22 +00:00
Craig Topper	054b8cce2d	[X86] Correct some bad FileCheck prefixes in tests. Add test cases for v64i8 padd/psub saturation intrinsics. For some reason we had the 128/256-bit tests, but not the 512-bit tests. llvm-svn: 339840	2018-08-16 06:20:19 +00:00
Chandler Carruth	00c35c7794	[x86] Actually initialize the SLH pass with the x86 backend and use a shorter name ('x86-slh') for the internal flags and pass name. Without this, you can't use the -stop-after or -stop-before infrastructure. I seem to have just missed this when originally adding the pass. The shorter name solves two problems. First, the flag names were ... really long and hard to type/manage. Second, the pass name can't be the exact same as the flag name used to enable this, and there are already some users of that flag name so I'm avoiding changing it unnecessarily. llvm-svn: 339836	2018-08-16 01:22:19 +00:00
Craig Topper	08e082619a	[X86] Improve AVX1 shuffle lowering for v8f32 shuffles where the low half comes from V1 and the high half comes from V2 and the halves do the same operation To lower this we now create a new V1 containing the low half of both sources and a new V2 containing the upper half of both sources. Then we create a repeated lane shuffle of those new sources to create the final result. This fixes PR35833 Differential Revision: https://reviews.llvm.org/D41794 llvm-svn: 339818	2018-08-15 21:21:52 +00:00
Sanjay Patel	712d42f53d	[x86] add fabs test for vector intrinsic to potential libcall bug; NFC This is a negative test for x86 because it has custom lowering for fabs. llvm-svn: 339791	2018-08-15 16:56:09 +00:00
Sanjay Patel	f9afee479f	[x86] add tests for poor vector intrinsic lowering via legalization (PR38527); NFC llvm-svn: 339790	2018-08-15 16:35:50 +00:00
Simon Pilgrim	51cee894da	[X86][SSE] Add sdiv by nonuniform constant vector tests Tests cover each TargetLowering::BuildSDIV path separately plus combos llvm-svn: 339761	2018-08-15 10:59:29 +00:00
Aleksandr Urakov	eb3735e425	[X86] Add sibling-call test cases This commit adds new sibling-call test cases, so it will be possible to see how these test cases will be changed after applying D45653. See D45653 for details. llvm-svn: 339760	2018-08-15 10:54:06 +00:00
Simon Pilgrim	a272fa9b0c	[TargetLowering] Add support for non-uniform vectors to BuildExactSDIV This patch refactors the existing BuildExactSDIV implementation to support non-uniform constant vector denominators. Differential Revision: https://reviews.llvm.org/D50392 llvm-svn: 339756	2018-08-15 09:35:12 +00:00
Cameron McInally	00b0658aae	[FPEnv] Scalarize StrictFP vector operations Add a helper function to scalarize constrained FP operations as needed. Differential Revision: https://reviews.llvm.org/D50720 llvm-svn: 339735	2018-08-14 22:13:11 +00:00
Simon Pilgrim	2ce3d6e135	[X86][SSE] Avoid duplicate shuffle input sources in combineX86ShufflesRecursively rL339686 added the case where a faux shuffle might have repeated shuffle inputs coming from either side of the OR(). This patch improves the insertion of the inputs into the source ops lists to account for this, as well as making it trivial to add support for shuffles with more than 2 inputs in the future. llvm-svn: 339696	2018-08-14 17:22:37 +00:00
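Commit 2f48122cc9 lowers constant vXi8 `ISD::SRL`/`ISD::SRA` by zero/sign extending to vXi16, multiplying with PMULLW, and keeping the high 8 bits of the result. The scalar Python sketch below checks the arithmetic identity behind that trick for a single lane. It is not the X86 lowering itself: the per-lane multiplier `2**(8 - c)` is an assumption about how a right shift by `c` is expressed as a multiply, and `srl_via_mul`/`sra_via_mul` are hypothetical names.

```python
def to_i8(v: int) -> int:
    """Reinterpret the low 8 bits of v as a signed i8 value."""
    v &= 0xFF
    return v - 0x100 if v & 0x80 else v


def to_i16(v: int) -> int:
    """Reinterpret the low 16 bits of v as a signed i16 (PMULLW keeps the low 16 bits)."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def srl_via_mul(x: int, c: int) -> int:
    """Logical right shift of an unsigned i8 lane x by a constant c (0 <= c < 8).

    Assumed scheme: zero-extend to 16 bits, multiply by 2**(8 - c),
    then keep the high byte of the 16-bit product.
    """
    prod = ((x & 0xFF) * (1 << (8 - c))) & 0xFFFF  # low 16 bits of the product
    return prod >> 8                                # high byte


def sra_via_mul(x: int, c: int) -> int:
    """Arithmetic right shift of a signed i8 lane x by a constant c (0 <= c < 8).

    Assumed scheme: sign-extend to 16 bits, multiply by 2**(8 - c),
    then take the high byte with an arithmetic shift.
    """
    prod = to_i16(to_i8(x) * (1 << (8 - c)))  # signed 16-bit product
    return to_i8(prod >> 8)                   # Python's >> on ints is arithmetic
```

For example, `srl_via_mul(200, 3)` gives 25 and `sra_via_mul(-100, 3)` gives -13, matching `200 >> 3` and `-100 >> 3`. The multiply never overflows 16 bits for these ranges (at most 255 * 256 unsigned, or -128 * 256 signed), which is why the high byte is exactly the shifted value.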
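Commit 5113b48798 improves the `(sra (sra x, c1), c2) -> (sra x, (add c1, c2))` fold for vectors where only some lanes have `c1 + c2` past the maximum shift. Clamping is sound because an arithmetic shift by `bitwidth - 1` already fills a lane with copies of its sign bit, so any larger total gives the same value. Below is a minimal per-lane Python sketch, assuming the clamp value is `bitwidth - 1`. `fold_sra_amounts` is a hypothetical helper for illustration, not the DAGCombiner code.

```python
def sra(x: int, amt: int, bits: int) -> int:
    """Arithmetic shift right of a signed `bits`-wide value.

    Amounts outside [0, bits) are rejected, mirroring the fact that an
    ISD::SRA shift amount >= the bit width has no defined result.
    """
    if not 0 <= amt < bits:
        raise ValueError("shift amount out of range")
    return x >> amt  # Python's >> floors, which matches an arithmetic shift


def fold_sra_amounts(c1, c2, bits: int):
    """Per-lane amounts for (sra (sra x, c1), c2) -> (sra x, c1 + c2),
    clamped to bits - 1 in the lanes where the sum would be out of range."""
    return [min(a + b, bits - 1) for a, b in zip(c1, c2)]
```

With i8 lanes, `fold_sra_amounts([1, 4, 6, 7], [2, 4, 3, 7], 8)` yields `[3, 7, 7, 7]`: only the first lane keeps the plain sum, and the other three are clamped.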
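Commit 73e8a784e6 replaces the division-based UMULO legalisation with an overflowing multiplication algorithm built from half-width operations. The Python sketch below shows one such scheme, assuming each operand is split into high and low halves. It is illustrative only, not the SelectionDAG code, and `umulo_half_width` is a hypothetical name. The commit checked the real algorithm exhaustively for 16-bit integers against a widening multiply. The test here does the same check for 8-bit integers so the loop stays small.

```python
def umulo_half_width(lhs: int, rhs: int, bits: int):
    """Unsigned multiply-with-overflow of two `bits`-wide values (bits even),
    using only half-width multiplies plus one full-width add.

    Returns (low `bits` bits of the product, overflow flag).
    """
    half = bits // 2
    half_mask = (1 << half) - 1
    full_mask = (1 << bits) - 1

    lhs_hi, lhs_lo = lhs >> half, lhs & half_mask
    rhs_hi, rhs_lo = rhs >> half, rhs & half_mask

    # Both high halves nonzero: each operand is >= 2**half,
    # so the product is >= 2**bits and must overflow.
    both_hi = lhs_hi != 0 and rhs_hi != 0

    # Cross terms as half-width multiplies, each with its own overflow flag.
    cross1 = lhs_hi * rhs_lo
    cross2 = rhs_hi * lhs_lo
    cross1_ovf = cross1 > half_mask
    cross2_ovf = cross2 > half_mask

    # The low halves multiply into a full-width value without overflowing.
    low = lhs_lo * rhs_lo

    # Move the truncated cross terms into the high half, then add the low part.
    mid = (((cross1 & half_mask) << half) + ((cross2 & half_mask) << half)) & full_mask
    total = mid + low
    add_ovf = total > full_mask

    return total & full_mask, both_hi or cross1_ovf or cross2_ovf or add_ovf
```

For instance, `umulo_half_width(300, 200, 16)` returns `(60000, False)` and `umulo_half_width(0xFFFF, 2, 16)` returns `(0xFFFE, True)`. The early `both_hi` check is what lets the rest use only half-width multiplies. Once both high halves are known to be nonzero, the overflow answer is settled. The low result bits still come out right because the dropped term `lhs_hi * rhs_hi` only contributes at bit positions `bits` and above.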


12349 Commits