llvm-project

Commit Graph

Author	SHA1	Message	Date
Sanjay Patel	9aad934710	[x86] add tests to show missing div/rem simplifications; NFC These are not x86-specific, but the problem is not visible for all targets because it is masked by other transforms. These can lead to compiler crashes. llvm-svn: 297017	2017-03-06 15:50:07 +00:00
Nemanja Ivanovic	12e67d868a	[PowerPC] Fix failure with STBRX when store is narrower than the bswap Fixes a crash caused by r296811 by truncating the input of the STBRX node when the bswap is wider than i32. Fixes https://bugs.llvm.org/show_bug.cgi?id=32140 Differential Revision: https://reviews.llvm.org/D30615 llvm-svn: 297001	2017-03-06 07:32:13 +00:00
Dean Michael Berris	7e8eea429f	[XRay] Allow logging the first argument of a function call. Summary: Functions with the "xray-log-args" attribute will have a special XRay sled kind emitted, for compiler-rt to copy any call arguments to your logging handler. For practical and performance reasons, only the first argument is supported, and only up to 64 bits. Reviewers: dberris Reviewed By: dberris Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29702 llvm-svn: 296998	2017-03-06 06:48:56 +00:00
Simon Pilgrim	584d6d9d91	[SelectionDAG] Fix vector splitting for *_EXTEND_VECTOR_INREG instructions Found by fuzz testing after rL296985 landed llvm-svn: 296989	2017-03-05 15:52:18 +00:00
Simon Pilgrim	9f5c251d57	[X86][SSE] Lower 128-bit vectors to SIGN/ZERO_EXTEND_VECTOR_IN_REG ops As described on PR31712, we miss a variety of legalization combines because we lower these to X86ISD::VSEXT/VZEXT despite them having the same functionality. This patch makes 128-bit (SSE41) SIGN/ZERO_EXTEND_VECTOR_IN_REG ops legal, adds the necessary tablegen plumbing and uses a helper 'getExtendInVec' to decide when to use SIGN/ZERO_EXTEND_VECTOR_IN_REG or VSEXT/VZEXT. We're missing a couple of shuffle combines that will be added in a future patch for review. Later patches can then support the AVX2 cases as a mixture of SIGN/ZERO_EXTEND and SIGN/ZERO_EXTEND_VECTOR_IN_REG, and then finally deal with the AVX512 cases. Differential Revision: https://reviews.llvm.org/D30549 llvm-svn: 296985	2017-03-05 09:57:20 +00:00
Sanjay Patel	b974be5ef4	[x86] don't require a zext when forming ADC/SBB The larger goal is to move the ADC/SBB transforms currently in combineX86SetCC() to combineAddOrSubToADCOrSBB() because we're creating ADC/SBB in lots of places where we shouldn't. This was intended to be an NFC change, but avx-512 has something strange going on. It doesn't seem like any of the affected tests should really be using SET+TEST or ADC; a simple ADD could replace several instructions. But that's another bug... llvm-svn: 296978	2017-03-04 20:35:19 +00:00
Sanjay Patel	066f3208bf	[DAGCombiner] allow transforming (select Cond, C +/- 1, C) to (add(ext Cond), C) select Cond, C +/- 1, C --> add(ext Cond), C -- with a target hook. This is part of the ongoing process to obsolete D24480. The motivation is to canonicalize to select IR in InstCombine whenever possible, so we need to have a way to undo that easily in codegen. PowerPC is an obvious winner for this kind of transform because it has fast and complete bit-twiddling abilities but generally lousy conditional execution perf (although this might have changed in recent implementations). x86 also sees some wins, but the effect is limited because these transforms already mostly exist in its target-specific combineSelectOfTwoConstants(). The fact that we see any x86 changes just shows that that code is a mess of special-case holes. We may be able to remove some of that logic now. My guess is that other targets will want to enable this hook for most cases. The likely follow-ups would be to add value type and/or the constants themselves as parameters for the hook. As the tests in select_const.ll show, we can transform any select-of-constants to math/logic, but the general transform for any 2 constants needs one more instruction (multiply or 'and'). ARM is one target that I think may not want this for most cases. I see infinite loops there because it wants to use selects to enable conditionally executed instructions. Differential Revision: https://reviews.llvm.org/D30537 llvm-svn: 296977	2017-03-04 19:18:09 +00:00
Simon Pilgrim	40a0e66b37	[X86][SSE] Enable post-legalize vXi64 shuffle combining on 32-bit targets Long ago (2010 according to svn blame), combineShuffle probably needed to prevent the accidental creation of illegal i64 types but there doesn't appear to be any combines that can cause this any more as they all have their own legality checks. Differential Revision: https://reviews.llvm.org/D30213 llvm-svn: 296966	2017-03-04 12:50:47 +00:00
Florian Hahn	6406f98342	[legalize-types] Remove stale entries from SoftenedFloats. Summary: When replacing a SDValue, we should remove the replaced value from SoftenedFloats (and possibly the other maps as well?). When we revisit a Node because it needs analyzing again, we have to remove all result values from SoftenedFloats (and possibly other maps?). This fixes the fp128 test failures with expensive checks for X86. I think we probably should also remove the values from the other maps (PromotedIntegers and so on), let me know what you think. Reviewers: baldrick, bogner, davidxl, ab, arsenm, pirama, chh, RKSimon Reviewed By: chh Subscribers: danalbert, wdng, srhines, hfinkel, sepavloff, llvm-commits Differential Revision: https://reviews.llvm.org/D29265 llvm-svn: 296964	2017-03-04 12:00:35 +00:00
Matthias Braun	21f340fd25	X86ISelLowering: Only perform copy elision on legal types. This fixes cases where i1 types were not properly legalized yet and lead to the creating of 0-sized stack slots. This fixes http://llvm.org/PR32136 llvm-svn: 296950	2017-03-04 01:40:40 +00:00
Sanjay Patel	a84fd041c6	[x86] check for commuted add pattern to find ADC/SBB llvm-svn: 296933	2017-03-04 00:18:31 +00:00
Sanjay Patel	71c1958fca	[x86] add test to show failed recognition of commuted pattern; NFC llvm-svn: 296931	2017-03-04 00:06:37 +00:00
Hans Wennborg	1c9d800fbc	Revert r296865 "[ARM] fpscr read/write intrinsics not aware of each other" It caused PR32134: "Cannot select: intrinsic %llvm.arm.get.fpscr". llvm-svn: 296926	2017-03-03 23:19:31 +00:00
Tim Northover	3e6a7afd81	GlobalISel: constrain G_INSERT to inserting just one value per instruction. It's much easier to reason about single-value inserts and no-one was actually using the variadic variants before. llvm-svn: 296923	2017-03-03 23:05:47 +00:00
Tim Northover	bf017293af	GlobalISel: add merge/unmerge nodes for legalization. These are simplified variants of the current G_SEQUENCE and G_EXTRACT, which assume the individual parts will be contiguous, homogeneous, and occupy the entirity of the larger register. This makes reasoning about them much easer since you only have to look at the first register being merged and the result to know what the instruction is doing. I intend to gradually replace all uses of the more complicated sequence/extract with these (or single-element insert/extracts), and then remove the older variants. For now we start with legalization. llvm-svn: 296921	2017-03-03 22:46:09 +00:00
Sanjay Patel	000d61acfd	[x86] regenerate checks; NFC llvm-svn: 296908	2017-03-03 20:48:54 +00:00
Sanjay Patel	1716aa45f1	[x86] regenerate checks; NFC llvm-svn: 296883	2017-03-03 16:58:51 +00:00
Sanjay Patel	9f5db7d4e0	[x86] regenerate checks; NFC llvm-svn: 296881	2017-03-03 16:45:57 +00:00
Sanjay Patel	872e0b86eb	[x86] regenerate checks; NFC llvm-svn: 296880	2017-03-03 16:42:43 +00:00
Sanjay Patel	6fa6316769	[x86] regenerate checks; NFC llvm-svn: 296877	2017-03-03 16:34:35 +00:00
Ranjeet Singh	7b60a9ed0c	[ARM] fpscr read/write intrinsics not aware of each other The intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr read and write to the fpscr (Floating-Point Status and Control Register) register. A bug exists in the __builtin_arm_get_fpscr intrinsic definition in llvm which treats this intrinsic as a IntroNoMem which means it's not a memory access and doesn't have any other side-effects. Having this property on this intrinsic means that various optimizations can be done on this such as common sub-expression elimination with other reads. This can cause issues if there has been write to this register, e.g. void foo(int *p) { p[0] = __builtin_arm_get_fpscr(); __builtin_arm_set_fpscr(1); p[1] = __builtin_arm_get_fpscr(); } in the above example the second read is currently CSE'd into the first read, this is because llvm isn't aware that the write done by __builtin_arm_set_fpscr effects the same register that __builtin_arm_get_fpscr reads from, to fix this problem I've removed the property IntrNoMem so that __builtin_arm_get_fpscr is treated as a memory access. Differential Revision: https://reviews.llvm.org/D30542 llvm-svn: 296865	2017-03-03 11:40:07 +00:00
Chandler Carruth	ce52b80744	[SDAG] Revert r296476 (and r296486, r296668, r296690). This patch causes compile times for some patterns to explode. I have a (large, unreduced) test case that slows down by more than 20x and several test cases slow down by 2x. I'm sending some of the test cases directly to Nirav and following up with more details in the review log, but this should unblock anyone else hitting this. llvm-svn: 296862	2017-03-03 10:02:25 +00:00
Amjad Aboud	4f97751798	[X86] Generate VZEROUPPER for Skylake-avx512. VZEROUPPER should not be issued on Knights Landing (KNL), but on Skylake-avx512 it should be. Differential Revision: https://reviews.llvm.org/D29874 llvm-svn: 296859	2017-03-03 09:03:24 +00:00
Igor Breger	321cf3c650	[GlobalISel][X86] Support float/double and vector types. Summary: [GlobalISel][X86] Add support for f32/f64 and vector types in RegisterBank and InstructionSelector. Reviewers: delena, zvi Reviewed By: zvi Subscribers: dberris, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D30533 llvm-svn: 296856	2017-03-03 08:06:46 +00:00
Kyle Butt	1fa6030767	CodeGen: BlockPlacement: Precompute layout for chains of triangles. For chains of triangles with small join blocks that can be tail duplicated, a simple calculation of probabilities is insufficient. Tail duplication can be profitable in 3 different ways for these cases: 1) The post-dominators marked 50% are actually taken 56% (This shrinks with longer chains) 2) The chains are statically correlated. Branch probabilities have a very U-shaped distribution. [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805] If the branches in a chain are likely to be from the same side of the distribution as their predecessor, but are independent at runtime, this transformation is profitable. (Because the cost of being wrong is a small fixed cost, unlike the standard triangle layout where the cost of being wrong scales with the # of triangles.) 3) The chains are dynamically correlated. If the probability that a previous branch was taken positively influences whether the next branch will be taken We believe that 2 and 3 are common enough to justify the small margin in 1. The code pre-scans a function's CFG to identify this pattern and marks the edges so that the standard layout algorithm can use the computed results. llvm-svn: 296845	2017-03-03 01:00:22 +00:00
Taewook Oh	96c6415697	[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y) Summary: Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below ``` extern int bar(); extern int baz(); int foo(int x, int y) { if (x != y) return bar(); else return baz(); } ``` , following is the bitcode representation of 'foo' at the end of llvm-ir level optimization: ``` define i32 @foo(i32 %x, i32 %y) !dbg !4 { entry: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12 tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13 %cmp = icmp ne i32 %x, %y, !dbg !14 br i1 %cmp, label %if.then, label %if.else, !dbg !16 if.then: ; preds = %entry %call = tail call i32 (...) @bar() #3, !dbg !17 br label %return, !dbg !18 if.else: ; preds = %entry %call1 = tail call i32 (...) @baz() #3, !dbg !19 br label %return, !dbg !20 return: ; preds = %if.else, %if.then %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ] ret i32 %retval.0, !dbg !21 } !14 = !DILocation(line: 5, column: 9, scope: !15) !16 = !DILocation(line: 5, column: 7, scope: !4) ``` As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue. Reviewers: atrick, bogner, andreadb, craig.topper, aprantl Reviewed By: andreadb Subscribers: jlebar, mkuper, jholewinski, andreadb, llvm-commits Differential Revision: https://reviews.llvm.org/D29813 llvm-svn: 296825	2017-03-02 21:58:35 +00:00
Krzysztof Parzyszek	e720feb1c6	[Hexagon] Pick the right branch opcode depending on branch probabilities Specifically, pick the opcode with the correct branch prediction, i.e. jump:t or jump:nt. llvm-svn: 296821	2017-03-02 21:49:49 +00:00
Tobias Grosser	02d86b80ec	Revert "AMDGPU: Re-do update for branch-relaxation test" This commit also relied on r296812, which I just reverted. We should probably apply it again, after the r296812 has been discussed and been reapplied in some variant. llvm-svn: 296820	2017-03-02 21:47:51 +00:00
Kyle Butt	1393761e0c	CodeGen: MachineBlockPlacement: Remove the unused outlining heuristic. Outlining optional branches isn't a good heuristic, and it's never been on by default. Remove it to clean things up. llvm-svn: 296818	2017-03-02 21:44:24 +00:00
Eli Friedman	bb821276d0	[ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would move stores to the wrong insert point. Re-commit with a fix to increment NumMove in the right place. Differential Revision: https://reviews.llvm.org/D30124 llvm-svn: 296815	2017-03-02 21:39:39 +00:00
Guozhi Wei	ed28e742ee	[PPC] Fix code generation for bswap(int32) followed by store16 This patch fixes pr32063. Current code in PPCTargetLowering::PerformDAGCombine can transform bswap store into a single PPCISD::STBRX instruction. but it doesn't consider the case that the operand size of bswap may be larger than store size. When it occurs, we need 2 modifications, 1 For the last operand of PPCISD::STBRX, we should not use DAG.getValueType(N->getOperand(1).getValueType()), instead we should use cast<StoreSDNode>(N)->getMemoryVT(). 2 Before PPCISD::STBRX, we need to shift the original operand of bswap to the right side. Differential Revision: https://reviews.llvm.org/D30362 llvm-svn: 296811	2017-03-02 21:07:59 +00:00
Chad Rosier	ea25eca04a	[AArch64] Extend redundant copy elimination pass to handle non-zero stores. This patch extends the current functionality of the AArch64 redundant copy elimination pass to handle non-zero cases such as: BB#0: cmp x0, #1 b.eq .LBB0_1 .LBB0_1: orr x0, xzr, #0x1 ; <-- redundant copy; x0 known to hold #1. Differential Revision: https://reviews.llvm.org/D29344 llvm-svn: 296809	2017-03-02 20:48:11 +00:00
Vadzim Dambrouski	eafb805506	[MSP430] Add SRet support to MSP430 target This patch adds support for struct return values to the MSP430 target backend. It also reverses the order of argument and return registers in the calling convention to bring it into closer alignment with the published EABI from TI. Patch by Andrew Wygle (awygle). Differential Revision: https://reviews.llvm.org/D29069 llvm-svn: 296807	2017-03-02 20:25:10 +00:00
Simon Pilgrim	b3067dc374	[X86][MMX] Fixed i32 extraction on 32-bit targets MMX extraction often ends up as extract_i32(bitcast_v2i32(extract_i64(bitcast_v1i64(x86mmx v), 0)), 0) which fails to simplify on 32-bit targets as i64 isn't legal llvm-svn: 296782	2017-03-02 18:56:06 +00:00
Krzysztof Parzyszek	056c945a5d	[Hexagon] Skip blocks that define vector predicate registers in early-if llvm-svn: 296777	2017-03-02 18:10:59 +00:00
Krzysztof Parzyszek	fcbb7d10fe	[Hexagon] Properly handle 'q' constraint in 128-byte vector mode llvm-svn: 296772	2017-03-02 17:50:24 +00:00
Nemanja Ivanovic	db8425eff0	[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size This patch reduces the stack frame size by not allocating the parameter area if it is not required. In the current implementation LowerFormalArguments_64SVR4 already handles the parameter area, but LowerCall_64SVR4 does not (when calculating the stack frame size). What this patch does is make LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29881 llvm-svn: 296771	2017-03-02 17:38:59 +00:00
Sanjay Patel	fffa179837	[DAGCombiner] avoid assertion when folding binops with opaque constants This bug was introduced with: https://reviews.llvm.org/rL296699 There may be a way to loosen the restriction, but for now just bail out on any opaque constant. The tests show that opacity is target-specific. This goes back to cost calculations in ConstantHoisting based on TTI->getIntImmCost(). llvm-svn: 296768	2017-03-02 17:18:56 +00:00
Tim Northover	e80d6d1360	GlobalISel: record correct stack usage for signext parameters. The CallingConv.td rules allocate 8 bytes for these kinds of arguments on AAPCS targets, but we were only recording the smaller amount. The difference is theoretical on AArch64 because we don't actually store more than the smaller amount, but it's still much better to have these two components in agreement. Based on Diana Picus's ARM equivalent patch (where it matters a lot more). llvm-svn: 296754	2017-03-02 15:34:18 +00:00
Andrew V. Tischenko	2855dc7ddc	Added special test covering a problem with PIC relocation model on SLM architecture. The fix will come in D26855. llvm-svn: 296746	2017-03-02 13:47:03 +00:00
Serge Pavlov	e2bf69715f	Do not verify MachimeDominatorTree if it is not calculated If dominator tree is not calculated or is invalidated, set corresponding pointer in the pass state to nullptr. Such pointer value will indicate that operations with dominator tree are not allowed. In particular, it allows to skip verification for such pass state. The dominator tree is not calculated if the machine dominator pass was skipped, it occures in the case of entities with linkage available_externally. The change fixes some test fails observed when expensive checks are enabled. Differential Revision: https://reviews.llvm.org/D29280 llvm-svn: 296742	2017-03-02 12:00:10 +00:00
Matthias Braun	dbcf9e2ee4	LiveRegMatrix: Fix some subreg interference checks Surprisingly, one of the three interference checks in LiveRegMatrix was using the main live range instead of the apropriate subregister range resulting in unnecessarily conservative results. llvm-svn: 296722	2017-03-02 00:35:08 +00:00
Eli Friedman	933863ce61	Revert r296708; causing test failures on ARM hosts. Original commit message: [ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason. llvm-svn: 296718	2017-03-02 00:08:50 +00:00
Amaury Sechet	71f511fd1e	[DAGCombiner] mulhi + 1 never overflow. Summary: This can be used to optimize large multiplications after legalization. Depends on D29565 Reviewers: mkuper, spatel, RKSimon, zvi, bkramer, aaboud, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D29587 llvm-svn: 296711	2017-03-01 23:44:17 +00:00
Ahmed Bougacha	120ae22d70	[GlobalISel] Add a way for targets to enable GISel. Until now, we've had to use -global-isel to enable GISel. But using that on other targets that don't support it will result in an abort, as we can't build a full pipeline. Additionally, we want to experiment with enabling GISel by default for some targets: we can't just enable GISel by default, even among those target that do have some support, because the level of support varies. This first step adds an override for the target to explicitly define its level of support. For AArch64, do that using a new command-line option (I know..): -aarch64-enable-global-isel-at-O=<N> Where N is the opt-level below which GISel should be used. Default that to -1, so that we still don't enable GISel anywhere. We're not there yet! While there, remove a couple LLVM_UNLIKELYs. Building the pipeline is such a cold path that in practice that shouldn't matter at all. llvm-svn: 296710	2017-03-01 23:33:08 +00:00
Amaury Sechet	683f5743f6	Improve mulhi overflow test. NFC llvm-svn: 296709	2017-03-01 23:31:19 +00:00
Eli Friedman	1c9216b003	[ARM] Fix insert point for store rescheduling. In ARMPreAllocLoadStoreOpt::RescheduleOps, LastOp should be the last operation which we want to merge. If we break out of the loop because an operation has the wrong offset, we shouldn't use that operation as LastOp. This patch fixes some cases where we would sink stores for no reason. Differential Revision: https://reviews.llvm.org/D30124 llvm-svn: 296708	2017-03-01 23:20:29 +00:00
Eli Friedman	28c2c0e311	[ARM] Check correct instructions for load/store rescheduling. This code starts from the high end of the sorted vector of offsets, and works backwards: it tries to find contiguous offsets, process them, then pops them from the end of the vector. Most of the code agrees with this order of processing, but one loop doesn't: it instead processes elements from the low end of the vector (which are nodes with unrelated offsets). Fix that loop to process the correct elements. This has a few implications. One, we don't incorrectly return early when processing multiple groups of offsets in the same block (which allows rescheduling prera-ldst-insertpt.mir). Two, we pick the correct insert point for loads, so they're correctly sorted (which affects the scheduling of vldm-liveness.ll). I think it might also impact some of the heuristics slightly. Differential Revision: https://reviews.llvm.org/D30368 llvm-svn: 296701	2017-03-01 22:56:20 +00:00
Sanjay Patel	92938657a0	[DAGCombiner] fold binops with constant into select-of-constants This is part of the ongoing attempt to improve select codegen for all targets and select canonicalization in IR (see D24480 for more background). The transform is a subset of what is done in InstCombine's FoldOpIntoSelect(). I first noticed a regression in the x86 avx512-insert-extract.ll tests with a patch that hopes to convert more selects to basic math ops. This appears to be a general missing DAG transform though, so I added tests for all standard binops in rL296621 (PowerPC was chosen semi-randomly; it has scripted FileCheck support, but so do ARM and x86). The poor output for "sel_constants_shl_constant" is tracked with: https://bugs.llvm.org/show_bug.cgi?id=32105 Differential Revision: https://reviews.llvm.org/D30502 llvm-svn: 296699	2017-03-01 22:51:31 +00:00
Amaury Sechet	250b4a7491	Add test case for mulhi's overflow. NFC llvm-svn: 296696	2017-03-01 22:27:21 +00:00

1 2 3 4 5 ...

19448 Commits