llvm-project

Commit Graph

Author	SHA1	Message	Date
Sam Parker	2a6c842fda	[ARM] Revert r337821 Re-enabling ARMCodeGenPrepare by default after failing to reproduce the bootstrap issues that I was concerned it was causing. llvm-svn: 338354	2018-07-31 09:04:14 +00:00
Craig Topper	9164b9b16e	[X86] Stop accidentally running the Bonnell LEA fixup path on Goldmont. In one place we checked X86Subtarget.slowLEA() to decide if the pass should run. But to decide what the pass should do we only check isSLM. This resulted in Goldmont going down the Bonnell path. llvm-svn: 338342	2018-07-31 00:43:54 +00:00
Ana Pazos	2baa767455	[RISCV] Fixed test case failure due to r338047 llvm-svn: 338341	2018-07-31 00:36:28 +00:00
Amara Emerson	1e8c164c63	[AArch64][GlobalISel] Add isel support for G_BLOCK_ADDR. Also refactors some existing code to materialize addresses for the large code model so it can be shared between G_GLOBAL_VALUE and G_BLOCK_ADDR. This implements PR36390. Differential Revision: https://reviews.llvm.org/D49903 llvm-svn: 338337	2018-07-31 00:09:02 +00:00
Amara Emerson	0e86c07077	[AArch64][GlobalISel] Make G_BLOCK_ADDR legal. Differential Revision: https://reviews.llvm.org/D49902 llvm-svn: 338336	2018-07-31 00:08:56 +00:00
Amara Emerson	6aff5a7810	[GlobalISel] Add a G_BLOCK_ADDR opcode to handle IR blockaddress constants. Differential Revision: https://reviews.llvm.org/D49900 llvm-svn: 338335	2018-07-31 00:08:50 +00:00
Sanjay Patel	9f807f44b1	[DAGCombiner] transform sub-of-shifted-signbit to add This is exchanging a sub-of-1 with add-of-minus-1: https://rise4fun.com/Alive/plKAH This is another step towards improving select-of-constants codegen (see D48970). x86 is the motivating target, and those diffs all appear to be wins. PPC and AArch64 look neutral. I've limited this to early combining (!LegalOperations) in case a target wants to reverse it, but I think canonicalizing to 'add' is more likely to produce further transforms because we have more folds for 'add'. Differential Revision: https://reviews.llvm.org/D49924 llvm-svn: 338317	2018-07-30 22:21:37 +00:00
Jessica Paquette	fa3bee4756	[MachineOutliner][AArch64] Add support for saving LR to a register This teaches the outliner to save LR to a register rather than the stack when possible. This allows us to avoid bumping the stack in outlined functions in some cases. By doing this, in a later patch, we can teach the outliner to do something like this: f1: ... bl OUTLINED_FUNCTION ... f2: ... move LR's contents to a register bl OUTLINED_FUNCTION move the register's contents back instead of falling back to saving LR in both cases. llvm-svn: 338278	2018-07-30 17:45:28 +00:00
Jessica Paquette	bbcc8895bb	Add machine verifier to arm64-opt-remarks-lazy-bfi Previously, I thought this was a Windows failure. Then I realized it failed on every bot that used the verifier. This makes it use the verifier always, and adds that pass to the pipeline checks so that it's consistent across all bots. llvm-svn: 338272	2018-07-30 17:13:25 +00:00
David Bolvansky	2fa7fb14ea	[DAGCombiner] Bug 31275- Extract a shift from a constant mul or udiv if a rotate can be formed Summary: Attempt to extract a shrl from a udiv or a shl from a mul if this allows a rotate to be formed. This targets cases where the input to a rotate pattern was a mul or udiv by a constant and InstCombine merged one of the shifts with the op. Patch by: sameconrad (Sam Conrad) Reviewers: RKSimon, craig.topper, spatel, lebedev.ri, javed.absar Reviewed By: lebedev.ri Subscribers: efriedma, kparzysz, llvm-commits Differential Revision: https://reviews.llvm.org/D47681 llvm-svn: 338270	2018-07-30 16:50:00 +00:00
Thomas Preud'homme	196149c943	Reapply "Fix crash on inline asm with 64bit matching input in 32bit GPR" This reapplies commit r338206 reverted by r338214 since the bug that r338206 uncovered has been fixed in r338268. Add support for inline assembly with a matching input operand that does not naturally go in the register class it is constrained to (e.g. double in a 32-bit GPR). Note that regular input is already handled by existing code. llvm-svn: 338269	2018-07-30 16:48:39 +00:00
Thomas Preud'homme	6c1b075299	Fix uninitialized read in ARM's PrintAsmOperand Summary: Fix read of uninitialized RC variable in ARM's PrintAsmOperand when hasRegClassConstraint returns false. This was causing inline-asm-operand-implicit-cast test to fail in r338206. Reviewers: t.p.northover, weimingz, javed.absar, chill Reviewed By: chill Subscribers: chill, eraman, kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D49984 llvm-svn: 338268	2018-07-30 16:45:40 +00:00
Jessica Paquette	7816531f3c	Attempt to fix Windows test failure caused by r338133 It seems like the pass pipeline on Windows is slightly different than on Linux and macOS. As a result, the arm64-opt-remarks-lazy-bfi test has been failing. This switches a CHECK-NEXT to a CHECK-DAG to try and get this running properly again. It'd be nice to switch it back to a CHECK-NEXT if possible, but the CHECK-NEXT lines following the line we care about (the optimization remark emitter) do a pretty good job of enforcing the ordering we want. Hopefully this works, since I don't have a Windows machine. ;) Example failure: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11295 llvm-svn: 338267	2018-07-30 16:36:22 +00:00
Simon Pilgrim	186b62c9e4	[X86] Regenerate NOBMI/BMI combine-select tests. Test cleanup for D38128 llvm-svn: 338265	2018-07-30 16:18:38 +00:00
Simon Pilgrim	2d5118432b	[X86] Regenerate PKU test to merge 32/64-bit rdpkru checks Test cleanup for D38128 llvm-svn: 338264	2018-07-30 16:15:18 +00:00
Simon Pilgrim	22ff9f94bb	[X86] Regenerate fast-isel tests. Test cleanup for D38128 llvm-svn: 338262	2018-07-30 16:13:40 +00:00
Krzysztof Parzyszek	24fae50905	[Hexagon] Simplify A4_rcmp[n]eqi R, 0 Consider cases when register R is known to be zero/non-zero, or when it is defined by a C2_muxii instruction. llvm-svn: 338251	2018-07-30 14:28:02 +00:00
Matt Arsenault	de496c32a4	AMDGPU: Reduce code size with fcanonicalize (fneg x) When fcanonicalize is lowered to a mul, we can use -1.0 for free and avoid the cost of the bigger encoding for source modifiers. llvm-svn: 338244	2018-07-30 12:16:58 +00:00
Matt Arsenault	f3c9a34def	AMDGPU: Make fneg combine handle fcanonicalize llvm-svn: 338243	2018-07-30 12:16:47 +00:00
Francis Visoiu Mistrih	7d003657de	[MachineOutliner][X86] Use TAILJMPd64 instead of JMP_1 for TailCall construction The machine verifier asserts with: Assertion failed: (isMBB() && "Wrong MachineOperand accessor"), function getMBB, file ../include/llvm/CodeGen/MachineOperand.h, line 542. It calls analyzeBranch which tries to call getMBB if the opcode is JMP_1, but in this case we do: JMP_1 @OUTLINED_FUNCTION I believe we have to use TAILJMPd64 instead of JMP_1 since JMP_1 is used with brtarget8. Differential Revision: https://reviews.llvm.org/D49299 llvm-svn: 338237	2018-07-30 09:59:33 +00:00
Nicolai Haehnle	7f0d05d532	AMDGPU: Force skip over s_sendmsg and exp instructions Summary: These instructions interact with hardware blocks outside the shader core, and they can have "scalar" side effects even when EXEC = 0. We don't want these scalar side effects to occur when all lanes want to skip these instructions, so always add the execz skip branch instruction for basic blocks that contain them. Also ensure that we skip scalar stores / atomics, though we don't code-gen those yet. Reviewers: arsenm, rampitec Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D48431 Change-Id: Ieaeb58352e2789ffd64745603c14970c60819d44 llvm-svn: 338235	2018-07-30 09:23:59 +00:00
Petr Pavlu	8b6eff4e77	[ARM] Fix over-alignment in arguments that are HA of 128-bit vectors Code in `CC_ARM_AAPCS_Custom_Aggregate()` is responsible for handling homogeneous aggregates for `CC_ARM_AAPCS_VFP`. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible according to AAPCS. Once the first item was laid out, the alignment used for consecutive items was the size of one item. This logic went wrong for 128-bit vectors because their alignment is normally only 64 bits, and so could result in inserting unexpected padding between the first and second element. The patch fixes the problem by updating the alignment with the item size only if this results in reducing it. Differential Revision: https://reviews.llvm.org/D49720 llvm-svn: 338233	2018-07-30 08:49:30 +00:00
Sanjay Patel	7312206f2f	revert r338206 because the test does not pass Example of bot failure: http://lab.llvm.org:8011/builders/clang-cmake-armv8-quick/builds/5107/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Ainline-asm-operand-implicit-cast.ll llvm-svn: 338214	2018-07-29 14:30:49 +00:00
Thomas Preud'homme	74ffd14e15	Fix crash on inline asm with 64bit matching input in 32bit GPR Add support for inline assembly with a matching input operand that does not naturally go in the register class it is constrained to (e.g. double in a 32-bit GPR). Note that regular input is already handled by existing code. llvm-svn: 338206	2018-07-28 21:33:39 +00:00
Matt Arsenault	8f9dde94b7	AMDGPU: Stop wasting argument registers with v3i32/v3f32 SelectionDAGBuilder widens v3i32/v3f32 arguments to v4i32/v4f32 which consume an additional register. In addition to wasting argument space, this produces extra instructions since now it appears the 4th vector component has a meaningful value to most combines. llvm-svn: 338197	2018-07-28 14:11:34 +00:00
Matt Arsenault	72b0e38b26	AMDGPU: Stop trying to extend arguments for clover This was trying to replace i8/i16 arguments with i32, which was broken and no longer necessary. llvm-svn: 338193	2018-07-28 12:34:25 +00:00
Craig Topper	50b1d4303d	[DAGCombiner] Teach DAG combiner that A-(B-C) can be folded to A+(C-B) This can be useful since addition is commutable, and subtraction is not. This matches a transform that is also done by InstCombine. llvm-svn: 338181	2018-07-28 00:27:25 +00:00
Wouter van Oortmerssen	a90d24da1c	Revert "[WebAssembly] Added default stack-only instruction mode for MC." This reverts commit d3c9af4179eae7793d1487d652e2d4e23844555f. (SVN revision 338164) llvm-svn: 338176	2018-07-27 23:19:51 +00:00
Craig Topper	c3e11bf3f7	[X86] Add support for expanding multiplies by constant where the constant is -3/-5/-9 multiplied by a power of 2. These can be replaced with an LEA, a shift, and a negate. This seems to match what gcc and icc would do. llvm-svn: 338174	2018-07-27 23:04:59 +00:00
Wouter van Oortmerssen	a67c4137c3	[WebAssembly] Added default stack-only instruction mode for MC. Summary: Moved Explicit Locals pass to last. Made that pass obligatory. Made it convert from register to stack based instructions, and removed the registers. Fixes to related code that was expecting register based instructions. Added the correct testing flag to all tests, depending on the format they were expecting so far. Translated one test to stack format as example: reg-stackify-stack.ll tested: llvm-lit -v `find test -name WebAssembly` unittests/MC/* Reviewers: dschuff, sunfish Subscribers: sbc100, jgravelle-google, eraman, aheejin, llvm-commits Differential Revision: https://reviews.llvm.org/D49160 llvm-svn: 338164	2018-07-27 20:56:43 +00:00
Jessica Paquette	f90edbe3d6	Recommit "Enable MachineOutliner by default under -Oz for AArch64" Fixed the ASAN failure from before in r338148, so recommitting. This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338160	2018-07-27 20:18:27 +00:00
Sanjay Patel	06c7d5aef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC The tests with a constant sub operand were added with rL338143, but the potential transform doesn't have that requirement, so adding more tests with variable operands. llvm-svn: 338150	2018-07-27 18:31:21 +00:00
Evandro Menezes	fcca45f0dd	[ARM] Add new target feature to fuse literal generation This feature enables the fusion of such operations on Cortex A57 and Cortex A72, as recommended in their Software Optimisation Guides, sections 4.14 and 4.11, respectively. Differential revision: https://reviews.llvm.org/D49563 llvm-svn: 338147	2018-07-27 18:16:47 +00:00
Sanjay Patel	efac39eef6	[AArch64, PowerPC, x86] add more signbit math tests; NFC llvm-svn: 338143	2018-07-27 18:12:29 +00:00
Jessica Paquette	faea2d3130	Revert "Enable MachineOutliner by default under -Oz for AArch64" It failed an Asan test on a bot: http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/21543/steps/check-llvm%20asan/logs/stdio Fixing that before recommitting. llvm-svn: 338136	2018-07-27 17:25:38 +00:00
Yonghong Song	04ccfda075	bpf: add missing RegState to notify MachineInstr verifier necessary register usage Errors like the following are reported by: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/11261 * Bad machine code: Explicit definition marked as use * - function: cal_align1 - basic block: %bb.0 entry (0x47edd98) - instruction: LDB $r3, $r2, 0 - operand 0: $r3 This is because RegState info was missing for ScratchReg inside expandMEMCPY. This gave the MachineInstr verifier incomplete register usage information. The verifier complains because such a MachineInstr could cause a code-gen issue wherever register usage information matters, even though the memcpy expansion is not such a case, as it happens at the last stage of the IR optimization pipeline. We should always specify register usage information that the compiler cannot deduce automatically whenever we add a hardware register manually. Reported-by: Builder llvm-clang-x86_64-expensive-checks-win Build #11261 Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 338134	2018-07-27 16:58:52 +00:00
Jessica Paquette	d4229b985c	Enable MachineOutliner by default under -Oz for AArch64 This patch enables the MachineOutliner by default in AArch64 under -Oz. The MachineOutliner offers around a 4.5% improvement on the current -Oz code size improvements. We have done work into improving the debuggability of outlined code, so that users of -Oz won't be surprised by the optimization. We have also been executing the LLVM test suite and common external tests such as the SPEC suites continuously with no issue. The outliner has a low compile-time overhead of roughly 1%. At this point, the outliner would be a really good addition to the -Oz pass pipeline! llvm-svn: 338133	2018-07-27 16:44:42 +00:00
Sanjay Patel	c7abb416dc	[DAGCombiner] fold 'not' with signbit math This is a follow-up suggested in D48970. Alive proofs: https://rise4fun.com/Alive/sII We can eliminate an instruction in the usual select-of-constants to bit hack transform by adjusting the add/sub with constant. This is always a win. There are more transforms that are likely wins, but they may need target hooks in case some targets do not benefit. This is another step towards making up for canonicalizing to select-of-constants in rL331486. llvm-svn: 338132	2018-07-27 16:42:55 +00:00
Sanjay Patel	1812d33e22	[x86] add more tests for signbit math; NFC llvm-svn: 338131	2018-07-27 16:22:40 +00:00
Sanjay Patel	60c04b961e	[PowerPC] add more tests for signbit math; NFC llvm-svn: 338130	2018-07-27 16:22:18 +00:00
Sanjay Patel	f815bc658b	[AArch64] add more tests for signbit math; NFC llvm-svn: 338129	2018-07-27 16:21:56 +00:00
Jan Vesely	6ff58ed5ca	AMDGPU/R600: Add MOV instructions to BFE patterns R600 can't handle immediates for BFE, these will be eliminated later. Fixes powr/pow regressions on r600 since r334817 Differential Revision: https://reviews.llvm.org/D49641 llvm-svn: 338127	2018-07-27 15:00:13 +00:00
Matt Arsenault	0183c56c11	AMDGPU: Fix code size for return_to_epilog pseudo llvm-svn: 338113	2018-07-27 09:15:03 +00:00
Tom Stellard	e9bdc5f1d8	AMDGPU/GlobalISel: Fix crash in regbankselect on non-power-of-2 types Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D49624 llvm-svn: 338102	2018-07-27 06:04:40 +00:00
Craig Topper	561e298e29	[X86] Remove an unnecessary 'if' that prevented treating INT64_MAX and -INT64_MAX as power of 2 minus 1 in the multiply expansion code. Not sure why they were being explicitly excluded, but I believe all the math inside the if works. I changed the absolute value to be uint64_t instead of int64_t so INT64_MIN+1 wouldn't be signed wrap. llvm-svn: 338101	2018-07-27 05:56:27 +00:00
Craig Topper	e364baa88b	[X86] Add matching for another pattern of PMADDWD. Summary: This is the pattern you get from the loop vectorizer for something like this int16_t A[1024]; int16_t B[1024]; int32_t C[512]; void pmaddwd() { for (int i = 0; i != 512; ++i) C[i] = (A[2*i]*B[2*i]) + (A[2*i+1]*B[2*i+1]); } In this case we will have (add (mul (build_vector), (build_vector)), (mul (build_vector), (build_vector))). This is different than the pattern we currently match which has the build_vectors between an add and a single multiply. I'm not sure what C code would get you that pattern. Reviewers: RKSimon, spatel, zvi Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49636 llvm-svn: 338097	2018-07-27 04:29:10 +00:00
Craig Topper	f7bc550223	[X86] When removing sign extends from gather/scatter indices, make sure we handle UpdateNodeOperands finding an existing node to CSE with. If this happens the operands aren't updated and the existing node is returned. Make sure we pass this existing node up to the DAG combiner so that a proper replacement happens. Otherwise we get stuck in an infinite loop with an unoptimized node. llvm-svn: 338090	2018-07-27 00:00:30 +00:00
Craig Topper	1a40a06549	[SelectionDAGBuilder] Add masked loads to PendingLoads rather than calling DAG.setRoot. Masked loads are calling DAG.getRoot rather than calling SelectionDAGBuilder::getRoot, which means the PendingLoads weren't emptied to update the root and create any needed TokenFactor. So it would be incorrect to call setRoot for the masked load. This patch instead adds the masked load to PendingLoads so that the root doesn't get updated until a store or scatter or something happens. Alternatively, we could call SelectionDAGBuilder::getRoot before it, but that would create unnecessary serialization. llvm-svn: 338085	2018-07-26 23:22:11 +00:00
Scott Linder	eb1f75d561	[AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory. Differential Revision: https://reviews.llvm.org/D49448 llvm-svn: 338060	2018-07-26 19:47:51 +00:00
Ana Pazos	2e4106b73d	[RISCV] Add support for _interrupt attribute - Save/restore only registers that are used. This includes Callee saved registers and Caller saved registers (arguments and temporaries) for integer and FP registers. - If there is a call in the interrupt handler, save/restore all Caller saved registers (arguments and temporaries) and all FP registers. - Emit special return instructions depending on "interrupt" attribute type. Based on initial patch by Zhaoshi Zheng. Reviewers: asb Reviewed By: asb Subscribers: rkruppe, the_o, MartinMosbeck, brucehoult, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, mgrang, rogfer01, llvm-commits Differential Revision: https://reviews.llvm.org/D48411 llvm-svn: 338047	2018-07-26 17:49:43 +00:00
Matthias Braun	09810c9269	MacroFusion: Fix macro fusion with ExitSU failing in top-down scheduling When fusing instructions A and B, we must add all predecessors of B as predecessors of A to avoid instructions getting scheduled in between. There is a special case involving ExitSU: Every other node must be scheduled before it by design and we don't need to make this explicit in the graph, however when fusing with a different node we need to schedule every other node before the fused node too and we need to make this explicit now: This patch adds a dependency from the fused node to all roots in the graph. Differential Revision: https://reviews.llvm.org/D49830 llvm-svn: 338046	2018-07-26 17:43:56 +00:00
Roman Lebedev	41ba5c1455	[DAGCombine] optimizeSetCCOfSignedTruncationCheck(): handle ule,ugt CondCodes. Summary: A follow-up for D49266 / rL337166. At least one of these cases is more canonical, so we really do have to handle it. https://godbolt.org/g/pkzP3X https://rise4fun.com/Alive/pQyhZZ We won't get to these cases with I1 being -1, as that will be constant-folded to true or false. I'm also not sure we actually hit the 'ule' case, but I think the worst thing that could happen is that being dead code. Reviewers: spatel, craig.topper, RKSimon, javed.absar, efriedma Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49497 llvm-svn: 338044	2018-07-26 17:34:28 +00:00
Stefan Maksimovic	4a612d4bf2	[mips] Sign extend i32 return values on MIPS64 Override getTypeForExtReturn so that functions returning an i32 typed value have it sign extended on MIPS64. Also provide patterns to get rid of unneeded sign extensions for arithmetic instructions which implicitly sign extend their results. Differential Revision: https://reviews.llvm.org/D48374 llvm-svn: 338019	2018-07-26 10:59:35 +00:00
Martin Storsjo	9dafd6f6d9	Revert "[COFF] Use comdat shared constants for MinGW as well" This reverts commit r337951. While that kind of shared constant generally works fine in a MinGW setting, it broke some cases of inline assembly that worked before: $ cat const-asm.c int MULH(int a, int b) { int rt, dummy; __asm__ ( "imull %3" :"=d"(rt), "=a"(dummy) :"a"(a), "rm"(b) ); return rt; } int func(int a) { return MULH(a, 1); } $ clang -target x86_64-win32-gnu -c const-asm.c -O2 const-asm.c:4:9: error: invalid variant '00000001' "imull %3" ^ <inline asm>:1:15: note: instantiated into assembly here imull __real@00000001(%rip) ^ A similar error is produced for i686 as well. The same test with a target of x86_64-win32-msvc or i686-win32-msvc works fine. llvm-svn: 338018	2018-07-26 10:48:20 +00:00
Craig Topper	4e687d5bb2	[X86] Don't use CombineTo to skip adding new nodes to the DAGCombiner worklist in combineMul. I'm not sure if this was trying to avoid optimizing the new nodes further or what. Or maybe to prevent a cycle if something tried to reform the multiply? But I don't think it's a reliable way to do that. If the user of the expanded multiply is visited by the DAGCombiner after this conversion happens, the DAGCombiner will check its operands, see that they haven't been visited by the DAGCombiner before and it will then add the first node to the worklist. This process will repeat until all the new nodes are visited. So this seems like an unreliable prevention at best. So this patch just returns the new nodes like any other combine. If this starts causing problems we can try to add target specific nodes or something to more directly prevent optimizations. Now that we handle the combine normally, we can combine any negates the mul expansion creates into their users since those will be visited now. llvm-svn: 338007	2018-07-26 05:40:10 +00:00
Amara Emerson	fdd089aa14	[GlobalISel] Fall back to SDISel for swifterror/swiftself attributes. We don't currently support these, fall back until we do. llvm-svn: 337994	2018-07-26 01:25:58 +00:00
Yonghong Song	71d81e5c8f	bpf: new option -bpf-expand-memcpy-in-order to expand memcpy in order Some BPF JIT backends would want to optimize memcpy in their own architecture specific way. However, at the moment, there is no way for JIT backends to see memcpy semantics in a reliable way. This is because the LLVM BPF backend expands memcpy into load/store sequences and could possibly schedule them further apart from each other. So, BPF JIT backends inside the kernel can't reliably recognize memcpy semantics by peepholing the BPF sequence. This patch introduces new intrinsic expansion infrastructure for memcpy. To get a stable in-order load/store sequence from memcpy, we first lower memcpy into a BPF::MEMCPY node which is then expanded into in-order load/store sequences in the expandPostRAPseudo pass, which happens after instruction scheduling. This way, kernel JIT backends can reliably recognize memcpy by scanning the BPF sequence. This new memcpy expansion infrastructure is gated by a new option: -bpf-expand-memcpy-in-order Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 337977	2018-07-25 22:40:02 +00:00
Sanjay Patel	215dcbf4db	[SelectionDAG] try to convert funnel shift directly to rotate if legal If the DAGCombiner's rotate matching was working as expected, I don't think we'd see any test diffs here. This sidesteps the issue of custom lowering for rotates raised in PR38243: https://bugs.llvm.org/show_bug.cgi?id=38243 ...by only dealing with legal operations. llvm-svn: 337966	2018-07-25 21:38:30 +00:00
Sanjay Patel	f94c4c84e6	[AArch, PowerPC] add more tests for legal rotate ops; NFC llvm-svn: 337964	2018-07-25 21:25:50 +00:00
Martin Storsjo	ff33a95ed4	[COFF] Use comdat shared constants for MinGW as well GNU binutils tools have no problems with this kind of shared constants, provided that we actually hook it up completely in AsmPrinter and produce a global symbol. This effectively reverts SVN r335918 by hooking the rest of it up properly. This feature was implemented originally in SVN r213006, with no reason for why it can't be used for MinGW other than the fact that GCC doesn't do it while MSVC does. Differential Revision: https://reviews.llvm.org/D49646 llvm-svn: 337951	2018-07-25 18:35:42 +00:00
Martin Storsjo	d2662c32fb	[COFF] Hoist constant pool handling from X86AsmPrinter into AsmPrinter In SVN r334523, the first half of comdat constant pool handling was hoisted from X86WindowsTargetObjectFile (which despite the name only was used for msvc targets) into the arch independent TargetLoweringObjectFileCOFF, but the other half of the handling was left behind in X86AsmPrinter::GetCPISymbol. With only half of the handling in place, inconsistent comdat sections/symbols are created, causing issues with both GNU binutils (avoided for X86 in SVN r335918) and with the MS linker, which would complain like this: fatal error LNK1143: invalid or corrupt file: no symbol for COMDAT section 0x4 Differential Revision: https://reviews.llvm.org/D49644 llvm-svn: 337950	2018-07-25 18:35:31 +00:00
Eli Friedman	733f4ed1bb	[ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1. Saves materializing the immediate for the "ands". Corresponding patterns exist for lsrs+lsls, but that seems less common in practice. Now implemented as a DAGCombine. Differential Revision: https://reviews.llvm.org/D49585 llvm-svn: 337945	2018-07-25 18:22:22 +00:00
Ulrich Weigand	5f75371c5d	Fix corruption of result number in LegalizeVectorOps.cpp When VectorLegalizer::LegalizeOp creates a new SDValue after iterating over its arguments, we need to refer to the same result number of the new node that the original value used. Reviewed by: cameron.mcinally Differential Revision: https://reviews.llvm.org/D49805 llvm-svn: 337939	2018-07-25 17:08:13 +00:00
Stanislav Mekhanoshin	7e7268ac1c	[AMDGPU] Use AssumptionCacheTracker in the divrem32 expansion Differential Revision: https://reviews.llvm.org/D49761 llvm-svn: 337938	2018-07-25 17:02:11 +00:00
Krzysztof Parzyszek	4e07509d18	[Hexagon] Properly scale bit index when extracting elements from vNi1 For example v = <2 x i1> is represented as bbbbaaaa in a predicate register, where b = v[1], a = v[0]. Extracting v[1] is equivalent to extracting bit 4 from the predicate register. llvm-svn: 337934	2018-07-25 16:20:59 +00:00
Petar Jovanovic	58c0210023	[MIPS GlobalISel] Lower pointer arguments Add support for lowering pointer arguments. Changing type from pointer to integer is already done in MipsTargetLowering::getRegisterTypeForCallingConv. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D49419 llvm-svn: 337912	2018-07-25 12:35:01 +00:00
Thomas Preud'homme	768d6ce4a3	Fix PR34170: Crash on inline asm with 64bit output in 32bit GPR Add support for inline assembly with an output operand that does not naturally go in the register class it is constrained to (e.g. double in a 32-bit GPR as in the PR). llvm-svn: 337903	2018-07-25 11:11:12 +00:00
Craig Topper	d9fa8147c4	[X86] Autogenerate complete checks and fix a failure introduced in r337875. llvm-svn: 337889	2018-07-25 05:22:13 +00:00
Chandler Carruth	7024921c0a	[x86/SLH] Teach the x86 speculative load hardening pass to harden against v1.2 BCBS attacks directly. Attacks using spectre v1.2 (a subset of BCBS) are described in the paper here: https://people.csail.mit.edu/vlk/spectre11.pdf The core idea is to speculatively store over the address in a vtable, jumptable, or other target of indirect control flow that will be subsequently loaded. Speculative execution after such a store can forward the stored value to subsequent loads, and if called or jumped to, the speculative execution will be steered to this potentially attacker controlled address. Up until now, this could be mitigated by enabling retpolines. However, that is a relatively expensive technique to mitigate this particular flavor. Especially because in most cases SLH will have already mitigated this. To fully mitigate this with SLH, we need to do two core things: 1) Unfold loads from calls and jumps, allowing the loads to be post-load hardened. 2) Force hardening of incoming registers even if we didn't end up needing to harden the load itself. The reason we need to do these two things is because hardening calls and jumps from this particular variant is importantly different from hardening against leak of secret data. Because the "bad" data here isn't a secret, but in fact speculatively stored by the attacker, it may be loaded from any address, regardless of whether it is read-only memory, mapped memory, or a "hardened" address. The only 100% effective way to harden these instructions is to harden their operand itself. But to the extent possible, we'd like to take advantage of all the other hardening going on, we just need a fallback in case none of that happened to cover the particular input to the control transfer instruction. For users of SLH, currently they are paying 2% to 6% performance overhead for retpolines, but this mechanism is expected to be substantially cheaper. 
However, it is worth reminding folks that this does not mitigate all of the things retpolines do -- most notably, variant #2 is not in any way mitigated by this technique. So users of SLH may still want to enable retpolines, and the implementation is carefully designed to gracefully leverage retpolines to avoid the need for further hardening here when they are enabled. Differential Revision: https://reviews.llvm.org/D49663 llvm-svn: 337878	2018-07-25 01:51:29 +00:00
Craig Topper	fc501a9223	[X86] Use a shift plus an lea for multiplying by a constant that is a power of 2 plus 2/4/8. The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2. llvm-svn: 337875	2018-07-25 01:15:38 +00:00
Craig Topper	5be253d988	[X86] Expand mul by pow2 + 2 using a shift and two adds similar to what we do for pow2 - 2. llvm-svn: 337874	2018-07-25 01:15:35 +00:00
Craig Topper	56c104f104	[X86] Use a two lea sequence for multiply by 37, 41, and 73. These fit a pattern used by 11, 21, and 19. llvm-svn: 337871	2018-07-24 23:44:17 +00:00
Craig Topper	b5342b592e	[X86] Add test cases for multiply by 37, 41, and 73. These can all be handled with 2 LEAs similar to what we do for 11, 19, 21. llvm-svn: 337870	2018-07-24 23:44:15 +00:00
Craig Topper	f8fcee70a3	[X86] Change multiply by 26 to use two multiplies by 5 and an add instead of multiply by 3 and 9 and a subtract. Same number of operations, but ending in an add is friendlier due to it being commutable. llvm-svn: 337869	2018-07-24 23:44:12 +00:00
Craig Topper	5ddc0a2b14	[X86] When expanding a multiply by a negative of one less than a power of 2, like 31, don't generate a negate of a subtract that we'll never optimize. We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway. This patch makes us explicitly emit the swapped subtract. llvm-svn: 337858	2018-07-24 21:31:21 +00:00
Craig Topper	6d29891bef	[X86] Generalize the multiply by 30 lowering to generic multiply by power of 2 minus 2. Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA. Use this for multiply by 14 as well. llvm-svn: 337856	2018-07-24 21:15:41 +00:00
Heejin Ahn	8daef0751d	[WebAssembly] Add tests for weaker memory consistency orderings Summary: Currently all wasm atomic memory access instructions are sequentially consistent, so even if LLVM IR specifies weaker orderings than that, we should upgrade them to sequential ordering and treat them in the same way. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49194 llvm-svn: 337854	2018-07-24 21:06:44 +00:00
Craig Topper	86d6320b94	[X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - X. The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub. llvm-svn: 337851	2018-07-24 20:31:48 +00:00
Craig Topper	1296c622df	[X86] Add test case to show failure to combine away negates that may be created by mul by constant expansion. Mul by constant can expand to a sequence that ends with a negate. If the next instruction is an add or sub we might be able to fold the negate away. We currently fail to do this because we explicitly don't add anything to the DAG combine worklist when we expand multiplies. This is primarily to keep the multiply from being reformed, but we should consider adding the users to the worklist. llvm-svn: 337843	2018-07-24 18:36:46 +00:00
Simon Atanasyan	28ded4ee19	[mips] Fix local dynamic TLS with Sym64 For the final DTPREL addition, rather than a lui/daddiu/daddu triple, LLVM was erroneously emitting a daddiu/daddiu pair, treating the %dtprel_hi as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64. Instead, use a new TlsHi node and, although unnecessary due to the exact structure of the nodes emitted, use TlsHi for local exec too to prevent future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes, and TprelHi since its functionality is provided by the new common TlsHi node. Patch by James Clarke. Differential revision: https://reviews.llvm.org/D49259 llvm-svn: 337827	2018-07-24 13:47:52 +00:00
Sam Parker	8b93e82c3d	[ARM] Disable ARMCodeGenPrepare by default ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug. llvm-svn: 337821	2018-07-24 12:04:23 +00:00
Chandler Carruth	a25aca21af	[x86] Clean up and convert test to use generated CHECK lines. This test was already checking microscopic behavior of tail call under specific conditions. This just makes the CHECK lines much more consistent, clear, and easily updated when intentional changes are made. I've also switched the test to consistently name the entry block and to order the helper declarations and comments for specific tests in the more usual locations. llvm-svn: 337806	2018-07-24 03:18:08 +00:00
Chandler Carruth	d41dca2ddc	[x86] Update the CHECK lines of this test to use the latest patterns from the script. This minimizes the diff in subsequent changes. llvm-svn: 337805	2018-07-24 03:07:07 +00:00
Tom Stellard	b7f19e6d1e	AMDGPU/GlobalISel: Legalize G_INSERT Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798	2018-07-24 02:19:20 +00:00
Thomas Anderson	8e8a652c2f	Fix typo in test/CodeGen/Mips/dins.ll Differential Revision: https://reviews.llvm.org/D49704 llvm-svn: 337771	2018-07-23 23:19:53 +00:00
Martin Storsjo	100fc97051	[COFF] Fix assembly output of comdat sections without an attached symbol Since SVN r335286, the .xdata sections are produced without an attached symbol, which requires using a different syntax when printing assembly output. Instead of the usual syntax of '.section <name>,"dr",discard,<symbol>', use '.section <name>,"dr"' + '.linkonce discard' (which is what GCC uses for all assembly output). This fixes PR38254. Differential Revision: https://reviews.llvm.org/D49651 llvm-svn: 337756	2018-07-23 22:15:19 +00:00
Martin Storsjo	c2b701408e	[AArch64] Use MCAsmInfoMicrosoft and MCAsmInfoGNUCOFF as base classes This matches the structure used on X86 and ARM. This requires a little bit of duplication of the parts that are equal in both AArch64 COFF variants though. Before SVN r335286, these classes didn't add anything that MCAsmInfoCOFF didn't, but now they do. This makes AArch64 match X86 in how comdat is used for float constants for MinGW. Differential Revision: https://reviews.llvm.org/D49637 llvm-svn: 337755	2018-07-23 22:15:14 +00:00
Reid Kleckner	980c4df037	Re-land r335297 "[X86] Implement more of x86-64 large and medium PIC code models" Don't try to generate large PIC code for non-ELF targets. Neither COFF nor MachO have relocations for large position independent code, and users have been using "large PIC" code models to JIT 64-bit code for a while now. With this change, if they are generating ELF code, their JITed code will truly be PIC, but if they target MachO or COFF, it will contain 64-bit immediates that directly reference external symbols. For a JIT, that's perfectly fine. llvm-svn: 337740	2018-07-23 21:14:35 +00:00
Nirav Dave	5af81d5bfa	Add inline asm aliasing test. llvm-svn: 337734	2018-07-23 20:19:10 +00:00
Krzysztof Parzyszek	9500a24fce	[Hexagon] Handle unnamed globals in HexagonConstExpr Instead of comparing names, compare positions in the parent module. llvm-svn: 337723	2018-07-23 18:30:17 +00:00
Simon Atanasyan	307e5b31ce	[mips] Add more checks to the tls.ll test case. NFC llvm-svn: 337705	2018-07-23 16:05:44 +00:00
Cameron McInally	2c9bcffc92	[FPEnv] Legalize double width StrictFP vector operations Differential Revision: https://reviews.llvm.org/D48809 llvm-svn: 337698	2018-07-23 14:40:17 +00:00
Sam Parker	3828c6ff94	[ARM] ARMCodeGenPrepare backend pass Arm specific codegen prepare is implemented to perform type promotion on icmp operands, which can enable the removal of uxtb and uxth (unsigned extend) instructions. This is possible because performing type promotion before ISel alleviates this duty from the DAG builder which has to perform legalisation, but has a limited view on data ranges. The pass visits any instruction operand of an icmp and creates a worklist to traverse the use-def tree to determine whether the values can simply be promoted. Our concern is values in the registers overflowing the narrow (i8, i16) data range, so instructions marked with nuw can be promoted easily. For add and sub instructions, we are able to use the parallel dsp instructions to operate on scalar data types and avoid overflowing bits. Underflowing adds and subs are also permitted when the result is only used by an unsigned icmp. Differential Revision: https://reviews.llvm.org/D48832 llvm-svn: 337687	2018-07-23 12:27:47 +00:00
Chandler Carruth	1d926fb9f4	[x86/SLH] Fix a bug where we would harden tail calls twice -- once as a call, and then again as a return. Also added a comment to try and explain better why we would be doing what we're doing when hardening the (non-call) returns. llvm-svn: 337673	2018-07-23 07:56:15 +00:00
Chandler Carruth	b66f2d8df8	[x86/SLH] Add a test covering indirect forms of control flow. NFC. This specifically covers different ways of making indirect calls and jumps. There are some bugs in SLH that I will be fixing in subsequent patches where the diff in the generated instructions makes the bug fix much more clear, so just checking in a baseline of this test to start. I'm also going to be adding direct mitigation for variant 1.2 which this file very specifically tests in the various forms it can arise on x86. Again, the diff to the generated instructions should make the change for that much more clear, so having the test as a baseline seems useful. llvm-svn: 337672	2018-07-23 07:51:51 +00:00
Craig Topper	b2a626b52e	[X86] Remove the max vector width restriction from combineLoopMAddPattern and rely on splitOpsAndApply to handle splitting. This seems to be a net improvement. There's still an issue under avx512f where we have a 512-bit vpaddd, but not vpmaddwd so we end up doing two 256-bit vpmaddwds and inserting the results before a 512-bit vpaddd. It might be better to do two 512-bit paddds with zeros in the upper half. Same number of instructions, but breaks a dependency. llvm-svn: 337656	2018-07-22 19:44:35 +00:00
Craig Topper	d8f80e90ce	[X86] Add more MADD recurrence test cases with larger and narrower vector widths. llvm-svn: 337650	2018-07-22 05:16:47 +00:00
Simon Atanasyan	ecd1e0afdd	[mips] Move out the WrapperPat declaration from the NotInMicroMips predicate This is a follow-up to rL335185. That commit adds some WrapperPat patterns for the microMIPS target. But the declaration of the WrapperPat class is under the NotInMicroMips predicate, and microMIPS patterns cannot be selected because the predicate (Subtarget->inMicroMipsMode()) && (!Subtarget->inMicroMipsMode()) is always false. This change moves the WrapperPat class declaration out from the NotInMicroMips predicate and enables microMIPS WrapperPat patterns. Differential revision: https://reviews.llvm.org/D49533 llvm-svn: 337646	2018-07-21 16:16:03 +00:00
Krzysztof Parzyszek	05337bdb50	[Hexagon] Disable packets in test to avoid ordering issues in checks llvm-svn: 337624	2018-07-20 21:55:55 +00:00
Roman Tereshin	31d52847ef	Reapply "[LSV] Refactoring + supporting bitcasts to a type of different size" This reapplies commit r337489 reverted by r337541 Additionally, this commit contains a speculative fix to the issue reported in r337541 (the report does not contain an actionable reproducer, just a stack trace) llvm-svn: 337606	2018-07-20 20:10:04 +00:00
Craig Topper	28ac623f6f	[X86] Remove isel patterns for MOVSS/MOVSD ISD opcodes with integer types. Ideally our ISD node types going into the isel table would have types consistent with their instruction domain. This prevents us having to duplicate patterns with different types for the same instruction. Unfortunately, it seems our shuffle combining is currently relying on this a little to remove some bitcasts. This seems to enable some switching between shufps and shufd. Hopefully there's some way we can address this in the combining. Differential Revision: https://reviews.llvm.org/D49280 llvm-svn: 337590	2018-07-20 17:57:53 +00:00
Simon Pilgrim	70fcd0f481	[X86][XOP] Fix SUB constant folding for VPSHA/VPSHL shift lowering We can safely use getConstant here as we're still lowering, which allows constant folding to kick in and simplify the vector shift codegen. Noticed while working on D49562. llvm-svn: 337578	2018-07-20 16:55:18 +00:00
Evandro Menezes	fffa9b5897	[ARM] Add new feature to enable optimizing the VFP registers Enable the optimization of operations on DPR and SPR via a feature instead of checking the target. Differential revision: https://reviews.llvm.org/D49463 llvm-svn: 337575	2018-07-20 16:49:28 +00:00
Simon Pilgrim	c7132031a2	[X86][SSE] Use SplitOpsAndApply to improve HADD/HSUB lowering Improve AVX1 256-bit vector HADD/HSUB matching by using SplitOpsAndApply to split into 128-bit instructions. llvm-svn: 337568	2018-07-20 16:20:45 +00:00
Simon Pilgrim	a85b86a982	[X86][AVX] Add support for i16 256-bit vector horizontal op redundant shuffle removal llvm-svn: 337566	2018-07-20 15:51:01 +00:00
Simon Pilgrim	a2bc2d488c	[X86][AVX] Add v16i16 horizontal op redundant shuffle tests llvm-svn: 337565	2018-07-20 15:41:15 +00:00
Nirav Dave	25802ac9fd	[DAG] Avoid Node Update assertion due to AND simplification Check for construction-time folding for incomplete AND nodes in BackwardsPropagateMask. Fixes PR38185. Reviewers: RKSimon, samparker Reviewed By: samparker Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D49444 llvm-svn: 337563	2018-07-20 15:27:24 +00:00
Simon Pilgrim	7c56bce996	[X86][AVX] Add support for 32/64 bits 256-bit vector horizontal op redundant shuffle removal llvm-svn: 337561	2018-07-20 15:24:12 +00:00
Nirav Dave	5a4e11ad9c	[DAG] Fix Memory ordering check in ReduceLoadOpStore. When merging through a TokenFactor we need to check that the load may be ordered such that no other aliasing memory operations may happen. It is not sufficient to just check that the load is a member of the chain token factor as there may be an indirect chain. Require the load's chain has only one use. This fixes PR37826. Reviewers: spatel, davide, efriedma, craig.topper, RKSimon Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49388 llvm-svn: 337560	2018-07-20 15:20:50 +00:00
Simon Pilgrim	8342126c4e	[X86][AVX] Add 256-bit vector horizontal op redundant shuffle tests llvm-svn: 337558	2018-07-20 15:07:53 +00:00
Simon Pilgrim	f907e19b5e	Regenerate partial vector fold test. NFCI. llvm-svn: 337551	2018-07-20 13:58:57 +00:00
Simon Pilgrim	cbf5af12b0	Regenerate remainder test. llvm-svn: 337546	2018-07-20 13:14:29 +00:00
Ulrich Weigand	9dd23b8433	[SystemZ] Test case formatting fixes Fix systematically wrong whitespace from a prior automated change. NFC. llvm-svn: 337542	2018-07-20 12:12:10 +00:00
Sam McCall	57743883f1	Revert "[LSV] Refactoring + supporting bitcasts to a type of different size" This reverts commit r337489. It causes asserts to fire in some TensorFlow tests, e.g. tensorflow/compiler/tests/gather_test.py on GPU. Example stack trace: Start test case: GatherTest.testHigherRank assertion failed at third_party/llvm/llvm/lib/Support/APInt.cpp:819 in llvm::APInt llvm::APInt::trunc(unsigned int) const: width && "Can't truncate to 0 bits" @ 0x5559446ebe10 __assert_fail @ 0x55593ef32f5e llvm::APInt::trunc() @ 0x55593d78f86e (anonymous namespace)::Vectorizer::lookThroughComplexAddresses() @ 0x55593d78f2bc (anonymous namespace)::Vectorizer::areConsecutivePointers() @ 0x55593d78d128 (anonymous namespace)::Vectorizer::isConsecutiveAccess() @ 0x55593d78c926 (anonymous namespace)::Vectorizer::vectorizeInstructions() @ 0x55593d78c221 (anonymous namespace)::Vectorizer::vectorizeChains() @ 0x55593d78b948 (anonymous namespace)::Vectorizer::run() @ 0x55593d78b725 (anonymous namespace)::LoadStoreVectorizer::runOnFunction() @ 0x55593edf4b17 llvm::FPPassManager::runOnFunction() @ 0x55593edf4e55 llvm::FPPassManager::runOnModule() @ 0x55593edf563c (anonymous namespace)::MPPassManager::runOnModule() @ 0x55593edf5137 llvm::legacy::PassManagerImpl::run() @ 0x55593edf5b71 llvm::legacy::PassManager::run() @ 0x55593ced250d xla::gpu::IrDumpingPassManager::run() @ 0x55593ced5033 xla::gpu::(anonymous namespace)::EmitModuleToPTX() @ 0x55593ced40ba xla::gpu::(anonymous namespace)::CompileModuleToPtx() @ 0x55593ced33d0 xla::gpu::CompileToPtx() @ 0x55593b26b2a2 xla::gpu::NVPTXCompiler::RunBackend() @ 0x55593b21f973 xla::Service::BuildExecutable() @ 0x555938f44e64 xla::LocalService::CompileExecutable() @ 0x555938f30a85 xla::LocalClient::Compile() @ 0x555938de3c29 tensorflow::XlaCompilationCache::BuildExecutable() @ 0x555938de4e9e tensorflow::XlaCompilationCache::CompileImpl() @ 0x555938de3da5 tensorflow::XlaCompilationCache::Compile() @ 0x555938c5d962 tensorflow::XlaLocalLaunchBase::Compute() @ 0x555938c68151 tensorflow::XlaDevice::Compute() @ 0x55593f389e1f tensorflow::(anonymous namespace)::ExecutorState::Process() @ 0x55593f38a625 tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady()::$_1::operator()() * SIGABRT received by PID 7798 (TID 7837) from PID 7798; * llvm-svn: 337541	2018-07-20 12:03:00 +00:00
Jonas Paulsson	c88d3f6a99	[SystemZ] Reimplement SchedModel IssueWidth and WriteRes/ReadAdvance mappings. As a consequence of recent discussions (http://lists.llvm.org/pipermail/llvm-dev/2018-May/123164.html), this patch changes the SystemZ SchedModels so that the IssueWidth is 6, which is the decoder capacity, and NumMicroOps become the number of decoder slots needed per instruction. In addition, the SchedWrite latencies now match the MachineInstructions def-operand indexes, and ReadAdvances have been added on instructions with one register operand and one memory operand. Review: Ulrich Weigand https://reviews.llvm.org/D47008 llvm-svn: 337538	2018-07-20 09:40:43 +00:00
Matt Arsenault	4bec7d4261	Reapply "AMDGPU: Fix handling of alignment padding in DAG argument lowering" Reverts r337079 with fix for msan error. llvm-svn: 337535	2018-07-20 09:05:08 +00:00
Stephen Canon	34f5867310	Add x86_64-unknown triple to llc for x86 test. llvm-svn: 337523	2018-07-20 03:50:55 +00:00
Craig Topper	d8734450a2	[DAGCombiner] Fold X - (-Y * Z) -> X + (Y * Z) llvm-svn: 337518	2018-07-20 01:40:03 +00:00
Stephen Canon	8995c5f0f6	Skip out of SimplifyDemandedBits for BITCAST of f16 to i16 Mirrors the existing exit path for f128, avoiding a crash later on. Differential Revision: https://reviews.llvm.org/D49524 llvm-svn: 337506	2018-07-19 22:46:42 +00:00
Craig Topper	c12c5d421f	[DAGCombiner] Teach DAGCombiner that A-(-B) is A+B. We already knew A+(-B) is A-B in visitAdd. This does the opposite for visitSub. llvm-svn: 337502	2018-07-19 22:24:43 +00:00
Simon Pilgrim	1d181bc992	[X86][AVX] Use extract_subvector to reduce vector op widths (PR36761) We have a number of cases where we fail to reduce vector op widths, performing the op in a larger vector and then extracting a subvector. This is often because by default it would create illegal types. This peephole patch attempts to handle a few common cases detailed in PR36761, which typically involved extension+conversion to vX2f64 types. Differential Revision: https://reviews.llvm.org/D49556 llvm-svn: 337500	2018-07-19 21:52:06 +00:00
Roman Tereshin	b49b2a601f	[LSV] Refactoring + supporting bitcasts to a type of different size This is mostly preparation work for adding limited support for select instructions. It proved to be difficult to do due to the size and irregularity of Vectorizer::isConsecutiveAccess, this is fixed here I believe. It also turned out that these changes make it simpler to finish one of the TODOs and fix a number of other small issues, namely: 1. Looking through bitcasts to a type of a different size (requires careful tracking of the original load/store size and some math converting sizes in bytes to expected differences in indices of GEPs). 2. Reusing partial analysis of pointers done by first attempt in proving them consecutive instead of starting from scratch. This added limited support for nested GEPs co-existing with difficult sext/zext instructions. This also required a careful handling of negative differences between constant parts of offsets. 3. Handling a case where the first pointer index is not an add, but something else (a function parameter for instance). I observe an increased number of successful vectorizations on a large set of shader programs. Only a few shaders are affected, but those that are affected sport >5% fewer loads and stores than before the patch. Reviewed By: rampitec Differential-Revision: https://reviews.llvm.org/D49342 llvm-svn: 337489	2018-07-19 19:42:43 +00:00
Stefan Pintilie	0c122d5a41	[Power9] Code Cleanup - Remove needsAggressiveScheduling() As we already return true from needsAggressiveScheduling() for the most recent hardware it would be cleaner to just return true for all PowerPC hardware. Differential Revision: https://reviews.llvm.org/D48663 llvm-svn: 337488	2018-07-19 19:34:18 +00:00
Andrea Di Biagio	b6022aa8d9	[X86][BtVer2] correctly model the latency/throughput of LEA instructions. This patch fixes the latency/throughput of LEA instructions in the BtVer2 scheduling model. On Jaguar, a 3-operand LEA has a latency of 2cy, and a reciprocal throughput of 1. That is because it uses one cycle of SAGU followed by 1cy of ALU1. An LEA with a "Scale" operand is also slow, and it has the same latency profile as the 3-operand LEA. An LEA16r has a latency of 3cy, and a throughput of 0.5 (i.e. RThroughput of 2.0). This patch adds a new TIIPredicate named IsThreeOperandsLEAFn to X86Schedule.td. The tablegen backend (for instruction-info) expands that definition into this (file X86GenInstrInfo.inc): ``` static bool isThreeOperandsLEA(const MachineInstr &MI) { return ( ( MI.getOpcode() == X86::LEA32r || MI.getOpcode() == X86::LEA64r || MI.getOpcode() == X86::LEA64_32r || MI.getOpcode() == X86::LEA16r ) && MI.getOperand(1).isReg() && MI.getOperand(1).getReg() != 0 && MI.getOperand(3).isReg() && MI.getOperand(3).getReg() != 0 && ( ( MI.getOperand(4).isImm() && MI.getOperand(4).getImm() != 0 ) || (MI.getOperand(4).isGlobal()) ) ); } ``` A similar method is generated in the X86_MC namespace, and included into X86MCTargetDesc.cpp (the declaration lives in X86MCTargetDesc.h). Back to the BtVer2 scheduling model: A new scheduling predicate named JSlowLEAPredicate now checks if either the instruction is a three-operand LEA, or it is an LEA with a Scale value different than 1. A variant scheduling class uses that new predicate to correctly select the appropriate latency profile. Differential Revision: https://reviews.llvm.org/D49436 llvm-svn: 337469	2018-07-19 16:42:15 +00:00
Simon Pilgrim	d8291e34e1	[X86][SSE] Add FPEXT vXf32 - vXf64 tests Some basic subvector special cases based on PR36761 llvm-svn: 337464	2018-07-19 15:32:45 +00:00
Tim Northover	a7c18ee8c4	ARM: switch armv7em MachO triple to hard-float defaults and libcalls. We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. Recommitted without breaking ELF this time. llvm-svn: 337450	2018-07-19 12:44:51 +00:00
Simon Pilgrim	bfb900d363	[DAGCombiner] Add rotate-extract tests Add new tests from D47681 to current codegen. Also added i686 codegen tests. llvm-svn: 337445	2018-07-19 09:27:34 +00:00
Heejin Ahn	47068a42d2	[WebAssembly] Add missing -mattr=+exception-handling guards Summary: The use of exception handling instructions should only be enabled with `-mattr=+exception-handling` option. Reviewers: jgravelle-google Subscribers: dschuff, sbc100, sunfish, llvm-commits Differential Revision: https://reviews.llvm.org/D49391 llvm-svn: 337425	2018-07-18 21:42:22 +00:00
Tim Northover	da142d10d9	Revert "ARM: switch armv7em triple to hard-float defaults and libcalls." This reverts commit r337385 until it can be targeted at MachO only. llvm-svn: 337424	2018-07-18 21:32:49 +00:00
Simon Pilgrim	d4b82da113	[X86][SSE] Canonicalize scalar fp arithmetic shuffle patterns As discussed on PR38197, this canonicalizes MOVS(N0, OP(N0, N1)) --> MOVS(N0, SCALAR_TO_VECTOR(OP(N0[0], N1[0]))) This returns the scalar-fp codegen lost by rL336971. Additionally it handles the OP(N1, N0)) case for commutable (FADD/FMUL) ops. Differential Revision: https://reviews.llvm.org/D49474 llvm-svn: 337419	2018-07-18 19:55:19 +00:00
Nirav Dave	5217bbce04	[DAG] Add testcase. llvm-svn: 337414	2018-07-18 18:34:52 +00:00
Nirav Dave	a747d3ca60	[ScheduleDAG] Fix unfolding of SUnits to already existent nodes. Summary: If unfolding an SUnit results in both load or the operation using it which already exist in the DAG, abort the unfold if they are already scheduled. If not, make sure we don't add duplicate dependencies. This fixes PR37916. Reviewers: davide, eli.friedman, fhahn, bogner Subscribers: MatzeB, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48666 llvm-svn: 337409	2018-07-18 18:01:03 +00:00
Roman Lebedev	5317e88300	[NFC][X86][AArch64][DAGCombine] More tests for optimizeSetCCOfSignedTruncationCheck() At least one of these cases is more canonical, so we really do have to handle it. https://godbolt.org/g/pkzP3X https://rise4fun.com/Alive/pQyh llvm-svn: 337400	2018-07-18 16:19:06 +00:00
Simon Atanasyan	e9d7b3198a	[mips] Fix predicate for the MipsTruncIntFP pattern This is a follow-up to the rL337171. This patch fixes regression introduced by the r337171 and enables MipsTruncIntFP pattern. Differential revision: https://reviews.llvm.org/D49469 llvm-svn: 337392	2018-07-18 14:11:22 +00:00
Tim Northover	d4abd14c1b	ARM: switch armv7em triple to hard-float defaults and libcalls. We were emitting incorrect calls to libm functions that LLVM had decided it knew about because the default is soft-float. llvm-svn: 337385	2018-07-18 12:37:04 +00:00
Simon Pilgrim	21813140f6	[X86][SSE] Add extra scalar fop + blend tests for commuted inputs While working on PR38197, I noticed that we don't make use of FADD/FMUL being able to commute the inputs to support the addps+movss -> addss style combine llvm-svn: 337375	2018-07-18 10:54:13 +00:00
Daniel Cederman	959c8bf51c	Revert "[Sparc] Use the IntPair reg class for r constraints with value type f64" This reverts commit 55222c9183c6e07f53a54c4061677734f54feac1. I missed that this patch has a dependency on https://reviews.llvm.org/D49219 that has not been approved yet. llvm-svn: 337373	2018-07-18 10:05:30 +00:00
Daniel Cederman	4e38df18ea	[Sparc] Use the IntPair reg class for r constraints with value type f64 Summary: This is how it appears to be handled in GCC and it prevents a "Unknown mismatch" error in the SelectionDAGBuilder. Reviewers: venkatra, jyknight, jrtc27 Reviewed By: jyknight, jrtc27 Subscribers: eraman, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49218 llvm-svn: 337370	2018-07-18 09:25:33 +00:00
Craig Topper	92ea7a7b48	[X86] Enable commuting of VUNPCKHPD to VMOVLHPS to enable load folding by using VMOVLPS with a modified address. This required an annoying amount of tablegen multiclass changes to make only VUNPCKHPDZ128rr commutable. llvm-svn: 337357	2018-07-18 07:31:32 +00:00
Craig Topper	a2bbfa21ce	[X86] Add test case for missed opportunity to commute vunpckhpd to enable use of vmovlps to fold a load. We do this transform for SSE, but not AVX or AVX512VL. llvm-svn: 337356	2018-07-18 07:31:30 +00:00
Craig Topper	a9ec6a11d8	[X86] Regenerate fma.ll checks using current version of the script which produces different regular expressions on spills and reloads. NFC llvm-svn: 337354	2018-07-18 07:08:28 +00:00
Craig Topper	1425e10cc6	[X86] Generate v2f64 X86ISD::UNPCKL/UNPCKH instead of X86ISD::MOVLHPS/MOVHLPS for unary v2f64 {0,0} and {1,1} shuffles with SSE2. I'm trying to restrict the MOVLHPS/MOVHLPS ISD nodes to SSE1 only. With SSE2 we can use unpcks. I believe this will allow some patterns to be cleaned up to require fewer bitcasts. I've put in an odd isel hack to still select MOVHLPS instruction from the unpckh node to avoid changing tests and because movhlps is a shorter encoding. Ideally we'd do execution domain switching on this, but the operands are in the wrong order and are tied. We might be able to try a commute in the domain switching using custom code. We already support domain switching for UNPCKLPD and MOVLHPS. llvm-svn: 337348	2018-07-18 05:10:51 +00:00
Justin Hibbits	d52990c71b	Introduce codegen for the Signal Processing Engine Summary: The Signal Processing Engine (SPE) is found on NXP/Freescale e500v1, e500v2, and several e200 cores. This adds support targeting the e500v2, as this is more common than the e500v1, and is in SoCs still on the market. This patch is very intrusive because the SPE is binary incompatible with the traditional FPU. After discussing with others, the cleanest solution was to make both SPE and FPU features on top of a base PowerPC subset, so all FPU instructions are now wrapped with HasFPU predicates. Supported by this are: * Code generation following the SPE ABI at the LLVM IR level (calling conventions) * Single- and Double-precision math at the level supported by the APU. Still to do: * Vector operations * SPE intrinsics As this changes the Callee-saved register list order, one test, which tests the precise generated code, was updated to account for the new register order. Reviewed by: nemanjai Differential Revision: https://reviews.llvm.org/D44830 llvm-svn: 337347	2018-07-18 04:25:10 +00:00
Peter Collingbourne	fc50498ced	CodeGen: Don't create address significance table entries for thread-local variables. The presence of these symbols in the symbol table can cause symbol type mismatch errors (or undefined symbol errors on emulated TLS targets) and they can't be ICF'd anyway. llvm-svn: 337338	2018-07-18 00:21:40 +00:00
Craig Topper	a29f58dc31	[X86] Remove the vector alignment requirement from the patterns added in r337320. The resulting instruction will only load 64 bits so alignment isn't required. llvm-svn: 337334	2018-07-17 23:26:20 +00:00
Peter Collingbourne	bd9d313d5c	CodeGen: Add a target option for emitting .addrsig directives for all address-significant symbols. Differential Revision: https://reviews.llvm.org/D48143 llvm-svn: 337331	2018-07-17 22:40:08 +00:00
Craig Topper	9ef92865ec	[X86] Add patterns for folding full vector load into MOVHPS and MOVLPS with SSE1 only. llvm-svn: 337320	2018-07-17 20:16:18 +00:00
Craig Topper	c0f2e306f2	[X86] Add test case for missed opportunity to use MOVLPS on the SSE1 only targets. llvm-svn: 337319	2018-07-17 20:16:15 +00:00
Chandler Carruth	c0cb5731fc	[x86/SLH] Flesh out the data-invariant instruction table a bit based on feedback from Craig. Summary: The only thing he suggested that I've skipped here is the double-wide multiply instructions. Multiply is an area I'm nervous about there being some hidden data-dependent behavior, and it doesn't seem important for any benchmarks I have, so skipping it and sticking with the minimal multiply support that matches what I know is widely used in existing crypto libraries. We can always add double-wide multiply when we have clarity from vendors about its behavior and guarantees. I've tried to at least cover the fundamentals here with tests, although I've not tried to cover every width or permutation. I can add more tests where folks think it would be helpful. Reviewers: craig.topper Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49413 llvm-svn: 337308	2018-07-17 18:07:59 +00:00
Sam Clegg	28b3e99482	[WebAssembly] Update WebAssemblyLowerEmscriptenEHSjLj to handle separate compilation Previously we were assuming whole program compilation. Now that separate compilation is a thing we need to update this pass. Firstly, it can no longer assert on the existence of malloc and free. These functions might not be in the current translation unit. If we need them then we will generate imports for them. Secondly, the global helper function we create should be marked as weak since we will be generating a separate copy in each translation unit. Finally, the names of the symbols used must be unique and fixed since they need to agree across translation units. Differential Revision: https://reviews.llvm.org/D49263 llvm-svn: 337301	2018-07-17 16:40:03 +00:00
Petar Jovanovic	f10e4798b4	[Mips][FastISel] Fix handling of icmp with i1 type The Mips FastISel back-end does not extend i1 values while lowering icmp. Ensure that we bail into DAG ISel when handling this case. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D49290 llvm-svn: 337288	2018-07-17 14:57:46 +00:00
Tim Renouf	e1016f1bc7	More fixes for subreg join failure in RegCoalescer Summary: Part of the adjustCopiesBackFrom method wasn't correctly dealing with SubRange intervals when updating. 2 changes. The first to ensure that bogus SubRange Segments aren't propagated when encountering Segments of the form [1234r, 1234d:0) when preparing to merge value numbers. These can be removed in this case. The second forces a shrinkToUses call if SubRanges end on the copy index (instead of just the parent register). V2: Addressed review comments, plus MIR test instead of ll test Subscribers: MatzeB, qcolombet, nhaehnle Differential Revision: https://reviews.llvm.org/D40308 Change-Id: I1d2b2b4beea802fce11da01edf71feb2064aab05 llvm-svn: 337273	2018-07-17 12:38:39 +00:00
Simon Pilgrim	e4d12bb2d6	[DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops. Differential Revision: https://reviews.llvm.org/D49262 llvm-svn: 337258	2018-07-17 09:45:35 +00:00
Daniel Cederman	c812dcc3db	[Sparc] Do not depend on icc for ta 1 The ta instruction will always trap, regardless of the value of the integer condition codes. TRAPri is marked as using icc, so we cannot use a pattern for TRAPri to implement ta 1, as verify-machineinstrs can complain that icc is not defined. Instead we implement ta 1 the same way as ta 5. llvm-svn: 337236	2018-07-17 05:49:33 +00:00
Craig Topper	c376a1916b	[X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint into rndscale with loads, broadcast, and masking. This amounts to a pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from. llvm-svn: 337234	2018-07-17 05:48:48 +00:00
Craig Topper	c89ef8b13e	[X86] Add test cases for selecting floor/ceil/trunc/rint/nearbyint to rndscale with masking, loading, and broadcasting. llvm-svn: 337233	2018-07-17 05:48:46 +00:00
Craig Topper	6751727d76	[X86] Add a missing FMA3 scalar intrinsic pattern. This allows us to use 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics. llvm-svn: 337223	2018-07-16 23:10:58 +00:00
Sanjay Patel	c71adc8040	[Intrinsics] define funnel shift IR intrinsics + DAG builder support As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" operation which may also be a single hardware instruction. Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working on that (much larger patch), but then I concluded that we don't need it. At least as a first step, we have all of the backend support necessary to match these ops...because it was required. And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are likely all that we'll ever need. There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes (along with improving the oversized shift documentation). Again, I don't think that's strictly necessary (as the test results here prove). That can be an efficiency improvement as a small follow-up patch. So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. Differential Revision: https://reviews.llvm.org/D49242 llvm-svn: 337221	2018-07-16 22:59:31 +00:00
Farhana Aleen	c370d7b33d	[AMDGPU] Support a fdot2 pattern. Summary: Optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198	2018-07-16 18:19:59 +00:00
Wei Mi	40c4aa7637	[RegAlloc] Skip global splitting if the live range is huge and its spill is trivially rematerializable. We run into a case where machineLICM hoists a large number of live ranges outside of a big loop because it thinks those live ranges are trivially rematerializable. In regalloc, global splitting is tried out first for those live ranges before they are spilled and rematerialized. Because the global splitting algorithm is quadratic, adding a lot of global splitting candidates causes a huge compile time increase (50s to 1400s on my local machine when compiling a module). However, we think for live ranges which are very large and are trivially rematerializable, it is better to just skip global splitting so as to save compile time with little chance of sacrificing performance. We use the segment size of the live range to indirectly evaluate whether the global splitting of the live range can introduce high cost, and use an option as a knob to adjust the size limit threshold. Differential Revision: https://reviews.llvm.org/D49353 llvm-svn: 337186	2018-07-16 15:42:20 +00:00
Chandler Carruth	fa065aa75c	[x86/SLH] Completely rework how we sink post-load hardening past data invariant instructions to be both more correct and much more powerful. While testing, I continued to find issues with sinking post-load hardening. Unfortunately, it was amazingly hard to create any useful tests of this because we were mostly sinking across copies and other loading instructions. The fact that we couldn't sink past normal arithmetic was really a big oversight. So first, I've ported roughly the same set of instructions from the data invariant loads to also have their non-loading varieties understood to be data invariant. I've also added a few instructions that came up so often it again made testing complicated: inc, dec, and lea. With this, I was able to shake out a few nasty bugs in the validity checking. We need to restrict to hardening single-def instructions with defined registers that match a particular form: GPRs that don't have a NOREX constraint directly attached to their register class. The (tiny!) test case included catches all of the issues I was seeing (once we can sink the hardening at all) except for the NOREX issue. The only test I have there is horrible. It is large, inexplicable, and doesn't even produce an error unless you try to emit encodings. I can keep looking for a way to test it, but I'm out of ideas really. Thanks to Ben for giving me at least a sanity-check review. I'll follow up with Craig to go over this more thoroughly post-commit, but without it SLH crashes everywhere so landing it for now. Differential Revision: https://reviews.llvm.org/D49378 llvm-svn: 337177	2018-07-16 14:58:32 +00:00
Petar Jovanovic	021e4c82eb	[MIPS GlobalISel] Select instructions to load and store i32 on stack Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and G_CONSTANT. Support loads and stores of i32 values. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D48957 llvm-svn: 337168	2018-07-16 13:29:32 +00:00
Roman Lebedev	de506632aa	[X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal pattern does not lower efficiently, so we want to undo it. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here; we always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166	2018-07-16 12:44:10 +00:00
Daniel Cederman	92a700a215	[Sparc] Use the correct encoding for ta 3 Summary: The old encoding generated a "tn %g1 + 3" instruction instead of the expected "ta 3". Reviewers: venkatra, jyknight Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49171 llvm-svn: 337165	2018-07-16 12:28:26 +00:00
Daniel Cederman	ab09da7b57	[Sparc] Use the names .rem and .urem instead of __modsi3 and __umodsi3 Summary: These are the names used in libgcc. Reviewers: venkatra, jyknight, ekedaigle Reviewed By: jyknight Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48915 llvm-svn: 337164	2018-07-16 12:22:08 +00:00
Daniel Cederman	68765757d4	[Sparc] Generate ta 1 for the @llvm.debugtrap intrinsic Summary: Software trap number one is the trap used for breakpoints in the Sparc ABI. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48637 llvm-svn: 337163	2018-07-16 12:16:53 +00:00
Daniel Cederman	c3d8002c2e	Avoid losing Hi part when expanding VAARG nodes on big endian machines Summary: If the high part of the load is not used the offset to the next element will not be set correctly. For example, on Sparc V8, the following code will read val2 from offset 4 instead of 8. ``` int val = __builtin_va_arg(va, long long); int val2 = __builtin_va_arg(va, int); ``` Reviewers: jyknight Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48595 llvm-svn: 337161	2018-07-16 12:14:17 +00:00
Chandler Carruth	e66a6f48e3	[x86/SLH] Fix a bug where we would try to post-load harden non-GPRs. Found cases that hit the assert I added. This patch factors the validity checking into a nice helper routine and calls it when deciding to harden post-load, and asserts it when doing so later. I've added tests for the various ways of loading a floating point type, as well as loading all vector permutations. Even though many of these go to identical instructions, it seems good to somewhat comprehensively test them. I'm confident there will be more fixes needed here, I'll try to add tests each time as I get this predicate adjusted. llvm-svn: 337160	2018-07-16 11:38:48 +00:00
Mark Searles	72da47df25	run post-RA hazard recognizer pass late Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has the adverse side-effect that any consecutive S_NOP 0's, emitted by the hazard recognizer, will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154	2018-07-16 10:02:41 +00:00
Craig Topper	07a1787501	[X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics. This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to constrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well. llvm-svn: 337147	2018-07-16 06:56:09 +00:00
Chandler Carruth	cdf0addc65	[x86/SLH] Teach speculative load hardening to correctly harden the indices used by AVX2 and AVX-512 gather instructions. The index vector is hardened by broadcasting the predicate state into a vector register and then or-ing. We don't even have to worry about EFLAGS here. I've added a test for all of the gather intrinsics to make sure that we don't miss one. A particularly interesting creation is the gather prefetch, which needs to be marked as potentially "loading" to get the correct behavior. It's a memory access in many ways, and is actually relevant for SLH. Based on discussion with Craig in review, I've moved it to be `mayLoad` and `mayStore` rather than generic side effects. This matches how we model other prefetch instructions. Many thanks to Craig for the review here. Differential Revision: https://reviews.llvm.org/D49336 llvm-svn: 337144	2018-07-16 04:17:51 +00:00
Craig Topper	cdb0ed2910	[X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ. AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions. Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences. This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used. llvm-svn: 337137	2018-07-15 23:32:36 +00:00
Craig Topper	ec0038398a	[X86] Use 128-bit blends instead of vmovss/vmovsd for 512-bit vzmovl patterns to match AVX. llvm-svn: 337135	2018-07-15 18:51:08 +00:00
Craig Topper	8f34858779	[X86] Use 128-bit ops for 256-bit vzmovl patterns. 128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2 since we can use a 128-bit VBLENDW without AVX2. The only bad thing I see here is that we failed to reuse a vxorps in some of the tests, but I think that's an already known issue. llvm-svn: 337134	2018-07-15 18:51:07 +00:00
Sanjay Patel	a41c886c55	[DAGCombiner] extend(ifpositive(X)) -> shift-right (not X) This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction. Alive proofs: https://rise4fun.com/Alive/ysli Name: if pos, get -1 %c = icmp sgt i16 %x, -1 %r = sext i1 %c to i16 => %n = xor i16 %x, -1 %r = ashr i16 %n, 15 Name: if pos, get 1 %c = icmp sgt i16 %x, -1 %r = zext i1 %c to i16 => %n = xor i16 %x, -1 %r = lshr i16 %n, 15 Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130	2018-07-15 16:27:07 +00:00
Sanjay Patel	810f51ec1b	[AMDGPU] adjusted test checks because minnum with NaN gets simplified This was improved with rL337127, but I missed the failure in this test. I'm not sure what the expected result will be, so I've generalized it and added a FIXME comment. llvm-svn: 337128	2018-07-15 15:14:40 +00:00
Craig Topper	60ce856134	[X86] Add some optsize patterns for 256-bit X86vzmovl. These patterns use VMOVSS/SD. Without optsize we use BLENDI instead. llvm-svn: 337119	2018-07-15 06:03:19 +00:00
Francis Visoiu Mistrih	f905bf1408	[MachineOutliner] Check the last instruction from the sequence when updating liveness The MachineOutliner was doing an std::for_each from the call (inserted before the outlined sequence) to the iterator at the end of the sequence. std::for_each needs the iterator past the end, so the last instruction was not taken into account when propagating the liveness information. This fixes the machine verifier issue in machine-outliner-disubprogram.ll. Differential Revision: https://reviews.llvm.org/D49295 llvm-svn: 337090	2018-07-14 09:40:01 +00:00
Chandler Carruth	fb503ac027	[x86/SLH] Fix an issue where we wouldn't harden any loads if we found no conditions. This is only valid to do if we're hardening calls and rets with LFENCE which results in an LFENCE guarding the entire entry block for us. llvm-svn: 337089	2018-07-14 09:32:37 +00:00
Craig Topper	7426cf6717	[X86] Fix a subtle bug in the custom execution domain fixing for blends. The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted. Instead use getNumOperands() from the instruction description which will only count the operands that are defined in the td file. llvm-svn: 337088	2018-07-14 06:30:30 +00:00
Craig Topper	f0b164415c	[X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size. AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX. This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that. llvm-svn: 337083	2018-07-14 02:05:08 +00:00
Evgeniy Stepanov	1971ba097d	Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering" This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const, llvm::raw_ostream&, char const) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079	2018-07-14 01:20:53 +00:00
Tim Shen	e78bd2815e	Add a CHECK line for r337072. llvm-svn: 337074	2018-07-13 23:48:59 +00:00
Krzysztof Parzyszek	7ced04c0fd	[Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs If an HVX vector register is to be coalesced into a vector pair, make sure that the vector pair will not have a function call in its live range, unless it already had one. All HVX vector registers are volatile, so any vector register live across a function call will have to be spilled. If a vector needs to be spilled, and it's coalesced into a vector pair then the whole pair will need to be spilled (even if only a part of it is live), taking extra stack space. llvm-svn: 337073	2018-07-13 23:42:29 +00:00
Tim Shen	9e25d5d2ce	[LSR] If no Use is interesting, early return. Summary: By looking at the callers of getUse(), we can see that even though IVUsers may offer uses, they may not be interesting to LSR. It's possible that none of them is interesting. Reviewers: sanjoy Subscribers: jlebar, hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D49049 llvm-svn: 337072	2018-07-13 23:40:00 +00:00
Tom Stellard	ac68471326	AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnum and @llvm.maxnum Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46172 llvm-svn: 337056	2018-07-13 22:16:03 +00:00
Craig Topper	da424ba1c5	[X86][FastISel] Support uitofp with avx512. llvm-svn: 337055	2018-07-13 22:09:30 +00:00
Tom Stellard	390a5f4774	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45882 llvm-svn: 337046	2018-07-13 21:05:14 +00:00
Craig Topper	f0831eef0b	[X86][FastISel] Add EVEX support to sitofp handling. llvm-svn: 337045	2018-07-13 21:03:43 +00:00
Fangrui Song	90d9c201dc	[X86] Try fixing r336768 llvm-svn: 337043	2018-07-13 20:54:24 +00:00
Matt Arsenault	de95077780	AMDGPU: Fix handling of alignment padding in DAG argument lowering This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021	2018-07-13 16:40:25 +00:00
Roman Lebedev	b64e74feed	[NFC][X86][AArch64] Negative tests for 'check for [no] signed truncation' pattern See D49247, D49266 I'm only adding the sane negative tests, and not adding the one-use tests yet. Also, not adding negative tests for the second pattern with inverted operands yet, since its handling will be added in a later differential. llvm-svn: 337014	2018-07-13 16:14:37 +00:00
Nemanja Ivanovic	080c35050e	[PowerPC] Materialize more constants with CR-field set in late peephole Revision r322373 fixed a bug in how we materialize constants when the CR-field needs to be set. However the fix is overly conservative. It will only do the transform if AND-ing the input with the new constant produces the same new constant. This is of course correct, but not necessarily required. If there are no further uses of the constant, the constant can be changed. If there are no uses of the GPR result, the final result of the materialization isn't important other than it needs to compare to zero correctly (lt, gt, eq). Differential revision: https://reviews.llvm.org/D42109 llvm-svn: 337008	2018-07-13 15:21:03 +00:00
Simon Atanasyan	fb0d4e4432	[mips] Add microMIPS case to the tests and regenerate assertions using update_llc_test_checks.py. NFC llvm-svn: 337004	2018-07-13 15:03:24 +00:00
Chandler Carruth	90358e1ef1	[SLH] Introduce a new pass to do Speculative Load Hardening to mitigate Spectre variant #1 for x86. There is a lengthy, detailed RFC thread on llvm-dev which discusses the high level issues. High level discussion is probably best there. I've split the design document out of this patch and will land it separately once I update it to reflect the latest edits and updates to the Google doc used in the RFC thread. This patch is really just an initial step. It isn't quite ready for prime time and is only exposed via debugging flags. It has three major limitations currently: 1) It only supports x86-64, and only certain ABIs. Many assumptions are currently hard-coded and need to be factored out of the code here. 2) It doesn't include any options for more fine-grained control, either of which control flow edges are significant or which loads are important to be hardened. 3) The code is still quite rough and the testing lighter than I'd like. However, this is enough for people to begin using. I have had numerous requests from people to be able to experiment with this patch to understand the trade-offs it presents and how to use it. We would also like to encourage work to similar effect in other toolchains. The ARM folks are actively developing a system based on this for AArch64. We hope to merge this with their efforts when both are far enough along. But we also don't want to block making this available on that effort. Many thanks to the numerous people who helped along the way here. For this patch in particular, both Eric and Craig did a ton of review to even have confidence in it as an early, rough cut at this functionality. Differential Revision: https://reviews.llvm.org/D44824 llvm-svn: 336990	2018-07-13 11:13:58 +00:00
Chandler Carruth	1151692ec5	[x86] Fix a capitalization that I failed to save in my editor before landing the patch. =/ llvm-svn: 336986	2018-07-13 09:48:04 +00:00
Chandler Carruth	caa7b03a50	[x86] Teach the EFLAGS copy lowering to handle much more complex control flow patterns including forks, merges, and even cycles. This tries to cover a reasonably comprehensive set of patterns that still don't require PHIs or PHI placement. The coverage was inspired by the amazing variety of patterns produced when copying EFLAGS and restoring it to implement Speculative Load Hardening. Without this patch, we simply cannot make such complex and invasive changes to x86 instruction sequences due to EFLAGS. I've added "just" one test, but this test covers many different complexities and corner cases of this approach. It is actually more comprehensive, as far as I can tell, than anything that I have encountered in the wild on SLH. Because the test is so complex, I've tried to give somewhat thorough comments and an ASCII-art diagram of the control flows to make it a bit easier to read and maintain long-term. Differential Revision: https://reviews.llvm.org/D49220 llvm-svn: 336985	2018-07-13 09:39:10 +00:00
Simon Pilgrim	9fe0bf3be7	[AArch64] Updated bigendian buildvector tests As suggested by @efriedma on D49262 - changed the extractelement to a store to prevent SimplifyDemandedVectorElts from simplifying the build vectors - this keeps the immediate generation which was the point of the tests. llvm-svn: 336981	2018-07-13 09:25:32 +00:00
Simon Pilgrim	a39389ebff	[ARM] Regenerated arg endian test As requested on D49262 llvm-svn: 336980	2018-07-13 09:16:56 +00:00
Craig Topper	2ab325ba23	[X86] Remove isel patterns that turn packed add/sub/mul/div+movss/sd into scalar intrinsic instructions. This is not an optimization we should be doing in isel. This is more suitable for a DAG combine. My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occurred if we had done a packed op. A DAG combine would be better able to manage this. llvm-svn: 336971	2018-07-13 04:50:39 +00:00
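
Note on commit a41c886c55 ([DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)): the two i16 rewrites quoted in that message can be checked by brute force over all 65536 inputs. The sketch below is only an illustration in Python, not code from the LLVM tree; the helper names (`to_signed`, `sext_ifpositive`, `ashr_not`, and so on) are made up for this example, and i16 values are modelled as unsigned integers in [0, 0xFFFF].

```python
# Brute-force check of the i16 rewrites from commit a41c886c55.
# All names here are hypothetical helpers, not LLVM APIs.
BITS = 16
MASK = (1 << BITS) - 1


def to_signed(v):
    """Interpret a 16-bit pattern as a two's-complement signed integer."""
    v &= MASK
    return v - (1 << BITS) if v >> (BITS - 1) else v


def sext_ifpositive(x):
    # %c = icmp sgt i16 %x, -1 ; %r = sext i1 %c to i16
    return MASK if to_signed(x) > -1 else 0


def zext_ifpositive(x):
    # %c = icmp sgt i16 %x, -1 ; %r = zext i1 %c to i16
    return 1 if to_signed(x) > -1 else 0


def ashr_not(x):
    # %n = xor i16 %x, -1 ; %r = ashr i16 %n, 15
    n = (x ^ MASK) & MASK
    return (to_signed(n) >> (BITS - 1)) & MASK


def lshr_not(x):
    # %n = xor i16 %x, -1 ; %r = lshr i16 %n, 15
    n = (x ^ MASK) & MASK
    return n >> (BITS - 1)


def check_all():
    """Return True if both rewrites agree with the originals for every i16."""
    return all(
        sext_ifpositive(x) == ashr_not(x) and zext_ifpositive(x) == lshr_not(x)
        for x in range(1 << BITS)
    )
```

Calling `check_all()` exercises every 16-bit input; the Alive link in the commit message remains the authoritative proof for arbitrary bit widths.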

25458 Commits