llvm-project

Commit Graph

Author	SHA1	Message	Date
Simon Pilgrim	abebe4c746	[DAGCombiner] Ensure we use the correct CC result type in visitSDIV We could get away with it for constant folded cases, but not for rL335719. Thanks to Krzysztof Parzyszek for noticing. llvm-svn: 335821	2018-06-28 09:54:28 +00:00
Sanjay Patel	d052de856d	[DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zeros As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761	2018-06-27 18:16:40 +00:00
Craig Topper	812fcb35e7	[X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit position If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index. Fixes PR37938. llvm-svn: 335754	2018-06-27 16:47:39 +00:00
Craig Topper	069628b4df	[X86] Add test cases for D48606. llvm-svn: 335753	2018-06-27 16:47:36 +00:00
Simon Pilgrim	8a02b25313	[X86][SSE] Add missing AVX512 rotation tests Increase coverage to make sure we're not doing anything stupid without AVX512BW llvm-svn: 335746	2018-06-27 16:00:53 +00:00
Craig Topper	31cbe75b3b	[X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744	2018-06-27 15:57:53 +00:00
Simon Pilgrim	d3e583a52d	[DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in pow2 expansion For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs llvm-svn: 335727	2018-06-27 12:45:31 +00:00
Simon Pilgrim	41afbcb9ca	[X86][SSE] Include MIN_SIGNED element in non-uniform SDIV pow2 tests llvm-svn: 335721	2018-06-27 10:59:36 +00:00
Simon Pilgrim	dfbcc66adc	[DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0) Fixes PR37569. llvm-svn: 335719	2018-06-27 10:21:06 +00:00
Simon Pilgrim	0a566bc0ae	[DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector expansion (PR37569) llvm-svn: 335717	2018-06-27 09:41:22 +00:00
Simon Pilgrim	c9e60adcb5	[X86] Add test for SDIV by sign bit (minsigned) value llvm-svn: 335671	2018-06-26 22:03:00 +00:00
Jessica Paquette	67599c2e1e	[X86][AsmParser] Recommit r335658 Recommit of r335658 so that it does not change the behaviour of any existing error output. llvm-svn: 335668	2018-06-26 21:30:34 +00:00
Jessica Paquette	0a80af0761	Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode" This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b. It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly. llvm-svn: 335660	2018-06-26 20:57:19 +00:00
Jessica Paquette	0e40d4bfc3	[X86][AsmParser] Emit an error when RIP-relative instructions are used in 32-bit mode Right now, when we use RIP-relative instructions in 32-bit mode, we'll just assert and crash. This adds an error message which tells the user that they can't do that in 32-bit mode, so that we don't crash (and also can see the issue outside of assert builds). llvm-svn: 335658	2018-06-26 20:33:46 +00:00
Simon Pilgrim	7f55af37f4	[DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion (PR37119) Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported. llvm-svn: 335637	2018-06-26 17:46:51 +00:00
Simon Pilgrim	1576df53a9	[X86][SSE] Add another sdiv by (nonuniform) minus one test (PR37119) Include a test that divides by -1 but not by 1 (another special case) llvm-svn: 335629	2018-06-26 17:06:05 +00:00
Than McIntosh	3190993a02	[X86,ARM] Retain split-stack prolog check for sibling calls Summary: If a routine with no stack frame makes a sibling call, we need to preserve the stack space check even if the local stack frame is empty, since the call target could be a "no-split" function (in which case the linker needs to be able to fix up the prolog sequence in order to switch to a larger stack). This fixes PR37807. Reviewers: cherry, javed.absar Subscribers: srhines, llvm-commits Differential Revision: https://reviews.llvm.org/D48444 llvm-svn: 335604	2018-06-26 14:11:30 +00:00
Craig Topper	c42ed4e3c4	[X86] Use XOR for SUB (C, X) during isel if will help fold an immediate Summary: Same idea as D48529, but restricted to X86 and done very late to avoid any surprises where subtract might be better for DAG combining. This seems like the safest way to do this trick. And we consider doing it as a DAG combine later. Reviewers: spatel, RKSimon Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D48557 llvm-svn: 335575	2018-06-26 03:11:15 +00:00
Craig Topper	689e363ff2	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. This recommits r335562 and 335563 as a single commit. The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the sigature of the builtin that software expects. By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away. llvm-svn: 335568	2018-06-26 01:37:02 +00:00
Craig Topper	6f4fdfa9af	Revert r335562 and 335563 "[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction." These were supposed to have been squashed to a single commit. llvm-svn: 335566	2018-06-26 01:31:53 +00:00
Craig Topper	c2ee4a5035	[X86] Redefine avx512 packed fpclass intrinsics to return a vXi1 mask and implement the mask input argument using an 'and' IR instruction. The frontend will surround the intrinsic with the appropriate marshalling to/from a scalar type to match the sigature of the builtin that software expects. By exposing the vXi1 type directly in the llvm intrinsic we make it available to optimizers much earlier. This can enable the scalar marshalling code to be optimized away. llvm-svn: 335563	2018-06-26 00:43:46 +00:00
Craig Topper	53a41858c1	[X86] Update fpclass intrinsic tests to chain their calls to the intrinsic rather than joining them with add. The test cases try to test masked and unmasked isntructions at the same time. Previously the masked version relies on an extra fucntion parameter. Then the two results were combined with 'add'. This patch gets rid of the second parameter and just passes the result of the first intrinsic into the mask argument of the second call. Then there's no need for an 'add'. This configuration works a lot better with an upcoming patch to redefine the intrinsics to use vXi1 types for the output and mask argument. llvm-svn: 335551	2018-06-25 23:29:47 +00:00
Reid Kleckner	88fee5fdbc	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. I restricted the MCJIT/eh-lg-pic.ll test to Linux, since the large PIC code model is not implemented for MachO yet. Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335508	2018-06-25 18:16:27 +00:00
Artur Pilipenko	ddc7f391d2	Revert change 335077 "[InlineSpiller] Fix a crash due to lack of forward progress from remat specifically for STATEPOINT" This change caused widespread assertion failures in our downstream testing: lib/CodeGen/LiveInterval.cpp:409: bool llvm::LiveRange::overlapsFrom(const llvm::LiveRange&, llvm::LiveRange::const_iterator) const: Assertion `!empty() && "empty range"' failed. llvm-svn: 335462	2018-06-25 12:58:13 +00:00
Artur Pilipenko	ab52071ddd	Revert change 335091. It adds extra test for the change 335077, which is also to be reverted as it causes test failures in downstream testing. llvm-svn: 335461	2018-06-25 12:55:58 +00:00
Craig Topper	4331d6218d	[X86] Remove the changes to combineScalarToVector made in r335037. They appear to be untested other than the test case for p37879.ll and I believe we should be using SimplifyDemandedElts here to handle these cases. llvm-svn: 335436	2018-06-25 00:21:53 +00:00
Sanjay Patel	962ee178fa	[DAGCombiner] eliminate setcc bool math when input is low-bit of some value This patch has the same motivating example as D48466: define void @foo(i64 %x, i32 %c.0282.in, i32 %d.0280, i32* %ptr0, i32* %ptr1) { %c.0282 = and i32 %c.0282.in, 268435455 %a16 = lshr i64 32508, %x %a17 = and i64 %a16, 1 %tobool = icmp eq i64 %a17, 0 %. = select i1 %tobool, i32 1, i32 2 %.286 = select i1 %tobool, i32 27, i32 26 %shr97 = lshr i32 %c.0282, %. %shl98 = shl i32 %c.0282.in, %.286 %or99 = or i32 %shr97, %shl98 %shr100 = lshr i32 %d.0280, %. %shl101 = shl i32 %d.0280, %.286 %or102 = or i32 %shr100, %shl101 store i32 %or99, i32* %ptr0 store i32 %or102, i32* %ptr1 ret void } ...but I'm trying to kill the setcc bool math sooner rather than later. By matching a larger pattern that includes both the low-bit mask and the trailing add/sub, we can create a universally good fold because we always eliminate the condition code intermediate value. Here are Alive proofs for these (currently instcombine folds the 'add' variants, but misses the 'sub' patterns): https://rise4fun.com/Alive/Gsyp Name: sub of zext cmp mask %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %z = zext i1 %c to i32 %r = sub i32 C1, %z => %optional_cast = zext i8 %a to i32 %r = add i32 %optional_cast, C1-1 Name: add of zext cmp mask %a = and i32 %x, 1 %c = icmp eq i32 %a, 0 %z = zext i1 %c to i8 %r = add i8 %z, C1 => %optional_cast = trunc i32 %a to i8 %r = sub i8 C1+1, %optional_cast All of the tests look like improvements or neutral to me. But it is possible that x86 test+set+bitop is better than what we now show here. I suspect we could do better by adding another fold for the 'sub' variants. We start with select-of-constant in IR in the larger motivating test, so that's why I included tests with selects. Proofs for those variants: https://rise4fun.com/Alive/Bx1 Name: true const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C2, i64 C1 => %z = zext i8 %a to i64 %r = sub i64 C2, %z Name: false const is bigger Pre: C2 == (C1 + 1) %a = and i8 %x, 1 %c = icmp eq i8 %a, 0 %r = select i1 %c, i64 C1, i64 C2 => %z = zext i8 %a to i64 %r = add i64 C1, %z Differential Revision: https://reviews.llvm.org/D48466 llvm-svn: 335433	2018-06-24 14:37:30 +00:00
Sanjay Patel	80b85a46db	[x86] add more tests for bit hacking opportunities with setcc; NFC Missed cases where the input and output are the same size in rL335391. llvm-svn: 335396	2018-06-22 22:07:26 +00:00
Sanjay Patel	705cde3ac8	[x86] add tests for bit hacking opportunities with setcc; NFC We likely gave up on folding some select-of-constants patterns in IR with rL331486, and we need to recover those in the DAG. The tests without select are based on our current DAGCombiner optimizations for select-of-constants. llvm-svn: 335391	2018-06-22 21:16:54 +00:00
Craig Topper	a55cc4a2e9	[X86] Add test cases showing missed select simplifcation for MCU when icmp is in a slightly different form. These test cases show that the "(select (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) ^ y" doesn't work if the select condition is changed to (and (x, 0x1) != 1) llvm-svn: 335389	2018-06-22 21:09:31 +00:00
Simon Pilgrim	938dbe664b	[X86][SSE] Add sdiv by (nonuniform) minus one tests (PR37119) Test cases from D45806 llvm-svn: 335376	2018-06-22 18:31:57 +00:00
Easwaran Raman	f997233890	[X86] Add a test to show missed opportunity to generate vfnmadd llvm-svn: 335367	2018-06-22 17:01:13 +00:00
Simon Pilgrim	234a6f6842	[X86] Regenerate tests to include fma comments Noticed in the review of D48467 llvm-svn: 335342	2018-06-22 12:41:48 +00:00
George Rimar	dcf59c5480	Recommit r335333 "[MC] - Add .stack_size sections into groups and link them with .text" With compilation fix. Original commit message: D39788 added a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. This change does following two things on top: 1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to eliminate them fast during resolving the COMDATs. 2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text. With that linker will be able to do -gc-sections on dead stack sizes sections. Differential revision: https://reviews.llvm.org/D46874 llvm-svn: 335336	2018-06-22 10:53:47 +00:00
George Rimar	6d448da1be	Revert r335332 "[MC] - Add .stack_size sections into groups and link them with .text" It broke bots. http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/12891 http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/9443 http://lab.llvm.org:8011/builders/lldb-x86_64-ubuntu-14.04-buildserver/builds/25551 llvm-svn: 335333	2018-06-22 10:27:33 +00:00
George Rimar	e14485a0c6	[MC] - Add .stack_size sections into groups and link them with .text D39788 added a '.stack-size' section containing metadata on function stack sizes to output ELF files behind the new -stack-size-section flag. This change does following two things on top: 1) Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to eliminate them fast during resolving the COMDATs. 2) Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text. With that linker will be able to do -gc-sections on dead stack sizes sections. Differential revision: https://reviews.llvm.org/D46874 llvm-svn: 335332	2018-06-22 10:10:53 +00:00
Mikhail Dvoretckii	0963562083	[X86] Changing the check for valid inputs in combineScalarToVector Changing the logic of scalar mask folding to check for valid input types rather than against invalid ones, making it more robust and fixing PR37879. Differential Revision: https://reviews.llvm.org/D48366 llvm-svn: 335323	2018-06-22 08:28:05 +00:00
Reid Kleckner	2ef486690c	[X86] Fix 32-bit mingw comdat names, only add one underscore llvm-svn: 335304	2018-06-21 23:06:33 +00:00
Reid Kleckner	3a2fd1c2f3	Revert r335297 "[X86] Implement more of x86-64 large and medium PIC code models" MCJIT can't handle R_X86_64_GOT64 yet. llvm-svn: 335300	2018-06-21 22:19:05 +00:00
Reid Kleckner	247fe6aeab	[X86] Implement more of x86-64 large and medium PIC code models Summary: The large code model allows code and data segments to exceed 2GB, which means that some symbol references may require a displacement that cannot be encoded as a displacement from RIP. The large PIC model even relaxes the assumption that the GOT itself is within 2GB of all code. Therefore, we need a special code sequence to materialize it: .LtmpN: leaq .LtmpN(%rip), %rbx movabsq $_GLOBAL_OFFSET_TABLE_-.LtmpN, %rax # Scratch addq %rax, %rbx # GOT base reg From that, non-local references go through the GOT base register instead of being PC-relative loads. Local references typically use GOTOFF symbols, like this: movq extern_gv@GOT(%rbx), %rax movq local_gv@GOTOFF(%rbx), %rax All calls end up being indirect: movabsq $local_fn@GOTOFF, %rax addq %rbx, %rax callq *%rax The medium code model retains the assumption that the code segment is less than 2GB, so calls are once again direct, and the RIP-relative loads can be used to access the GOT. Materializing the GOT is easy: leaq _GLOBAL_OFFSET_TABLE_(%rip), %rbx # GOT base reg DSO local data accesses will use it: movq local_gv@GOTOFF(%rbx), %rax Non-local data accesses will use RIP-relative addressing, which means we may not always need to materialize the GOT base: movq extern_gv@GOTPCREL(%rip), %rax Direct calls are basically the same as they are in the small code model: They use direct, PC-relative addressing, and the PLT is used for calls to non-local functions. This patch adds reasonably comprehensive testing of LEA, but there are lots of interesting folding opportunities that are unimplemented. Reviewers: chandlerc, echristo Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D47211 llvm-svn: 335297	2018-06-21 21:55:08 +00:00
Reid Kleckner	13c9ee684c	[mingw] Fix GCC ABI compatibility for comdat things Summary: GCC and the binutils COFF linker do comdats differently from MSVC. If we want to be ABI compatible, we have to do what they do, which is to emit unique section names like ".text$_Z3foov" instead of short section names like ".text". Otherwise, the binutils linker gets confused and reports multiple definition errors when two object files from GCC and Clang containing the same inline function are linked together. The best description of the issue is probably at https://github.com/Alexpux/MINGW-packages/issues/1677, we don't seem to have a good one in our tracker. I fixed up the .pdata and .xdata sections needed everywhere other than 32-bit x86. GCC doesn't use associative comdats for those, it appears to rely on the section name. Reviewers: smeenai, compnerd, mstorsjo, martell, mati865 Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D48402 llvm-svn: 335286	2018-06-21 20:27:38 +00:00
Matt Davis	d041f21810	[DebugInfo] Ignore DBG_VALUE instructions in PostRA Machine Sink Summary: The logic for handling the sinking of COPY instructions was generating different code when building with debug flags. The original code did not take into consideration debug instructions. This resulted in the registers in the DBG_VALUE instructions being treated as used, and prevented the COPY from being sunk. This patch avoids analyzing debug instructions when trying to sink COPY instructions. This patch also creates a routine from the code in MachineSinking::SinkInstruction to perform the logic of sinking an instruction along with its debug instructions. This functionality is used in multiple places, including the code for sinking COPY instrs. Reviewers: junbuml, javed.absar, MatzeB, bjope Reviewed By: bjope Subscribers: aprantl, probinson, thegameg, jonpa, bjope, vsk, kristof.beyls, JDevlieghere, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D45637 llvm-svn: 335264	2018-06-21 17:59:52 +00:00
Craig Topper	8014053cbd	[X86] Update fast-isel tests for clang r335253. The new IR fixes a mismatch in the final extractelement for the i32 intrinsics. Previously we extracted a 64-bit element even though we only wanted 32 bits. SimplifyDemandedElts isn't able to make FP elements undef now and the shuffle mask I used prevents the use of horizontal add we had before. Not sure we should have been using horizontal add anyway. It's implemented on Intel with two port 5 shuffles and an add. So we have on less shuffle now, but an additional instruction to decode. Differential Revision: https://reviews.llvm.org/D48347 llvm-svn: 335256	2018-06-21 16:54:18 +00:00
Stanislav Mekhanoshin	22ee191c3e	DAG combine "and\|or (select c, -1, 0), x" -> "select c, x, 0\|-1" Allowed folding for "and/or" binops with non-constant operand if arguments of select are 0/-1 values. Normally this code with "and" opcode does not get to a DAG combiner and simplified yet in the InstCombine. However AMDGPU produces it during lowering and InstCombine has no chance to optimize it out. In turn the same pattern with "or" opcode can reach DAG. Differential Revision: https://reviews.llvm.org/D48301 llvm-svn: 335250	2018-06-21 16:02:05 +00:00
Mikhail Dvoretckii	22c82af5c8	[x86] Lower some trunc + shuffle patterns to vpmov[q\|d][b\|w] This should help in lowering the following four intrinsics: _mm256_cvtepi32_epi8 _mm256_cvtepi64_epi16 _mm256_cvtepi64_epi8 _mm512_cvtepi64_epi8 Differential Revision: https://reviews.llvm.org/D46957 llvm-svn: 335238	2018-06-21 14:16:45 +00:00
Mikael Holmen	42f7bc96dd	[DebugInfo] Make sure all DBG_VALUEs' reguse operands have IsDebug property Summary: In some cases, these operands lacked the IsDebug property, which is meant to signal that they should not affect codegen. This patch adds a check for this property in the MachineVerifier and adds it where it was missing. This includes refactorings to use MachineInstrBuilder construction functions instead of manually setting up the intrinsic everywhere. Patch by: JesperAntonsson Reviewers: aprantl, rnk, echristo, javed.absar Reviewed By: aprantl Subscribers: qcolombet, sdardis, nemanjai, JDevlieghere, atanasyan, llvm-commits Differential Revision: https://reviews.llvm.org/D48319 llvm-svn: 335214	2018-06-21 10:03:34 +00:00
Craig Topper	76df3d61a3	[X86] Go through some tests that still reference old intrinsics that have been autoupgraded and replace them with the upgraded IR. This is mostly the stack folding tests and is by no means a thorough audit of tests. llvm-svn: 335204	2018-06-21 06:17:16 +00:00
Craig Topper	296526bf46	[X86] Remove masking from 512-bit floating max/min intrinsics. Use select instruction instead. llvm-svn: 335199	2018-06-21 05:00:56 +00:00
Alina Sbirlea	dfd14adeb0	Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred. Summary: Two utils methods have essentially the same functionality. This is an attempt to merge them into one. 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor Prior to the patch: 1. MergeBasicBlockIntoOnlyPred Updates either DomTree or DeferredDominance Moves all instructions from Pred to BB, deletes Pred Asserts BB has single predecessor If address was taken, replace the block address with constant 1 (?) 2. MergeBlockIntoPredecessor Updates DomTree, LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken After the patch: Method 2. MergeBlockIntoPredecessor is attempting to become the new default: Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults Moves all instruction from BB to Pred, deletes BB Returns if doesn't have a single predecessor Returns if BB's address was taken Uses of MergeBasicBlockIntoOnlyPred that need to be replaced: 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp Updated in this patch. No challenges. 2. lib/CodeGen/CodeGenPrepare.cpp Updated in this patch. i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation. ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks Some interesting aspects: - Since Pred is not deleted (BB is), the entry block does not need updating. - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred. - isMergingEmptyBlockProfitable assumes BB is the one to be deleted. - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead. - adding some test owner as subscribers for the interesting tests modified: test/CodeGen/X86/avx-cmp.ll test/CodeGen/AMDGPU/nested-loop-conditions.ll test/CodeGen/AMDGPU/si-annotate-cf.ll test/CodeGen/X86/hoist-spill.ll test/CodeGen/X86/2006-11-17-IllegalMove.ll 3. lib/Transforms/Scalar/JumpThreading.cpp Not covered in this patch. It is the only use case using the DeferredDominance. I would defer to Brian Rzycki to make this replacement. Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits Differential Revision: https://reviews.llvm.org/D48202 llvm-svn: 335183	2018-06-20 22:01:04 +00:00
Craig Topper	c2696d577b	[X86] Use setcc ISD opcode for AVX512 integer comparisons all the way to isel I don't believe there is any real reason to have separate X86 specific opcodes for vector compares. Setcc has the same behavior just uses a different encoding for the condition code. I had to change the CondCodeAction for SETLT and SETLE to prevent some transforms from changing SETGT lowering. Differential Revision: https://reviews.llvm.org/D43608 llvm-svn: 335173	2018-06-20 21:05:02 +00:00

1 2 3 4 5 ...

12030 Commits