llvm-project

Commit Graph

Author	SHA1	Message	Date
Geoff Berry	51f52c4fca	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" Two issues identified by buildbots were addressed: - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use. - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use). [MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 llvm-svn: 311135	2017-08-17 23:06:55 +00:00
Sanjay Patel	f2d67f7ecc	[x86] add tests for vector select-of-constants; NFC We've discussed canonicalizing to this form in IR, so the backend should be prepared to lower these in ways better than what we see here in most cases. llvm-svn: 311103	2017-08-17 17:07:37 +00:00
Sanjay Patel	18424e1581	[PowerPC] add tests for vector select-of-constants; NFC We've discussed canonicalizing to this form in IR, so the backend should be prepared to lower these in ways better than what we see here. llvm-svn: 311099	2017-08-17 17:03:11 +00:00
Adrian Prantl	6a57daad81	Improve line debug info when translating a CaseBlock to SDNodes. The SelectionDAGBuilder translates various conditional branches into CaseBlocks which are then translated into SDNodes. If a conditional branch results in multiple CaseBlocks only the first CaseBlock is translated into SDNodes immediately, the rest of the CaseBlocks are put in a queue and processed when all LLVM IR instructions in the basic block have been processed. When a CaseBlock is transformed into SDNodes the SelectionDAGBuilder is queried for the current LLVM IR instruction and the resulting SDNodes are annotated with the debug info of the current instruction (if it exists and has debug metadata). When the deferred CaseBlocks are processed, the SelectionDAGBuilder does not have a current LLVM IR instruction, and the resulting SDNodes will not have any debuginfo. As DwarfDebug::beginInstruction() outputs a .loc directive for the first instruction in a labeled block (typically the case for something coming from a CaseBlock) this tends to produce a line-0 directive. This patch changes the handling of CaseBlocks to store the current instruction's debug info into the CaseBlock when it is created (and the SelectionDAGBuilder knows the current instruction) and to always use the stored debug info when translating a CaseBlock to SDNodes. Patch by Frej Drejhammar! Differential Revision: https://reviews.llvm.org/D36671 llvm-svn: 311097	2017-08-17 16:57:13 +00:00
Craig Topper	3a622a14f9	[AVX512] Don't switch unmasked subvector insert/extract instructions when AVX512DQI is enabled. There's no reason to switch instructions with and without DQI. It just creates extra isel patterns and test divergences. There is however value in enabling the masked version of the instructions with DQI. This required introducing some new multiclasses to enabling this splitting. Differential Revision: https://reviews.llvm.org/D36661 llvm-svn: 311091	2017-08-17 15:40:25 +00:00
Simon Pilgrim	8be9f4af4f	[DAGCombiner] Add support for non-uniform constant vectors to (mul x, (1 << c)) -> x << c llvm-svn: 311083	2017-08-17 13:03:34 +00:00
Daniel Sanders	edd0784be6	Re-commit: [globalisel][tablegen] Support zero-instruction emission. Summary: Support the case where an operand of a pattern is also the whole of the result pattern. In this case the original result and all its uses must be replaced by the operand. However, register class restrictions can require a COPY. This patch handles both cases by always emitting the copy and leaving it for the register allocator to optimize. The previous commit failed on Windows machines due to a flaw in the sort predicate which allowed both A < B < C and B == C to be satisfied simultaneously. The cause of this was some sloppiness in the priority order of G_CONSTANT instructions compared to other instructions. These had equal priority because it makes no difference, however there were operands had higher priority than G_CONSTANT but lower priority than any other instruction. As a result, a priority order between G_CONSTANT and other instructions must be enforced to ensure the predicate defines a strict weak order. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D36084 llvm-svn: 311076	2017-08-17 09:26:14 +00:00
Jonas Paulsson	57a705d9d0	[SystemZ, MachineScheduler] Improve post-RA scheduling. The idea of this patch is to continue the scheduler state over an MBB boundary in the case where the successor block has only one predecessor. This means that the scheduler will continue in the successor block (after emitting any branch instructions) with e.g. maintained processor resource counters. Benchmarks have been confirmed to benefit from this. The algorithm in MachineScheduler.cpp that extracts scheduling regions of an MBB has been extended so that the strategy may optionally reverse the order of processing the regions themselves. This is controlled by a new method doMBBSchedRegionsTopDown(), which defaults to false. Handling the top-most region of an MBB first also means that a top-down scheduler can continue the scheduler state across any scheduling boundary between to regions inside MBB. Review: Ulrich Weigand, Matthias Braun, Andy Trick. https://reviews.llvm.org/D35053 llvm-svn: 311072	2017-08-17 08:33:44 +00:00
Elad Cohen	124d32829c	[SelectionDAG] Teach the vector-types operand scalarizer about SETCC When v1i1 is legal (e.g. AVX512) the legalizer can reach a case where a v1i1 SETCC with an illgeal vector type operand wasn't scalarized (since v1i1 is legal) but its operands does have to be scalarized. This used to assert because SETCC was missing from the vector operand scalarizer. This patch attemps to teach the legalizer to handle these cases by scalazring the operands, converting the node into a scalar SETCC node. Differential revision: https://reviews.llvm.org/D36651 llvm-svn: 311071	2017-08-17 08:06:36 +00:00
Geoff Berry	4e38e02e6f	Revert "[MachineCopyPropagation] Extend pass to do COPY source forwarding" This reverts commit r311038. Several buildbots are breaking, and at least one appears to be due to the forwarding of physical regs enabled by this change. Reverting while I investigate further. llvm-svn: 311062	2017-08-17 04:04:11 +00:00
Saleem Abdulrasool	dd8c16b58e	ARM: mark CPSR as clobbered for Windows VLAs When lowering a VLA, we emit a __chstk call. However, this call can internally clobber CPSR. We did not mark this register as an ImpDef, which could potentially allow a comparison to be hoisted above the call to `__chkstk`. In such a case, the CPSR could be clobbered, and the check invalidated. When the support was initially added, it seemed that the call would take care of preventing CPSR from being clobbered, but this is not the case. Mark the register as clobbered to fix a possible state corruption. llvm-svn: 311061	2017-08-17 02:42:24 +00:00
Sanjay Patel	4abc3f6036	[x86] add cmov promotion tests for D36711; NFC This way we can see what the current codegen looks like. I've also explicitly added/removed the cmov attribute from the RUN lines, so we know exactly what we're checking in the runs. llvm-svn: 311052	2017-08-16 22:50:11 +00:00
Geoff Berry	87f8d25150	[MachineCopyPropagation] Extend pass to do COPY source forwarding This change extends MachineCopyPropagation to do COPY source forwarding. This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses. Reviewers: qcolombet, javed.absar, MatzeB, jonpa Subscribers: jyknight, nemanjai, llvm-commits, nhaehnle, mcrosier, mgorny Differential Revision: https://reviews.llvm.org/D30751 llvm-svn: 311038	2017-08-16 20:50:01 +00:00
Simon Pilgrim	38e8a023fa	[X86] Regenerate immediate store merging tests llvm-svn: 311016	2017-08-16 16:22:19 +00:00
Balaram Makam	c5698befb6	Revert "MachineInstr: Reason locally about some memory objects before going to AA." r310825 caused the clang-ppc64le-linux-lnt bot to go red (http://lab.llvm.org:8011/builders/clang-ppc64le-linux-lnt/builds/5712) because of a test-suite failure of SingleSource/UnitTests/2003-07-09-SignedArgs This reverts commit 0028f6a87224fb595a1c19c544cde9b003035996. llvm-svn: 311008	2017-08-16 14:17:43 +00:00
Simon Dardis	9c5d64b901	[mips] Handle variables with an explicit section and interactions with .sdata, .sbss If a variable has an explicit section such as .sdata or .sbss, it is placed in that section and accessed in a gp relative manner. This overrides the global -G setting. Otherwise if a variable has a explicit section attached to it, such as '.rodata' or '.mysection', it is not placed in the small data section. This also overrides the global -G setting. Reviewers: atanasyan, nitesh.jain Differential Revision: https://reviews.llvm.org/D36616 llvm-svn: 311001	2017-08-16 12:18:04 +00:00
Igor Breger	ce5ea38135	[GlobalISel][X86] Fix mir tests, use correct physical register.NFC. llvm-svn: 310996	2017-08-16 07:25:51 +00:00
Stanislav Mekhanoshin	a9487d92d7	[AMDGPU] Eliminate no effect instructions before s_endpgm Differential Revision: https://reviews.llvm.org/D36585 llvm-svn: 310987	2017-08-16 04:43:49 +00:00
Derek Schuff	8e71359561	[WebAssembly] Remove infinite loop from reg-stackify test r310940 exposed reverse-unreachable code to some optimizers, which caused some of the code in this test to be sunk, changing the input to the pass and breaking the exptectations. Since that change is irrelevant to this particular test, this change just adds an exit node to work around the problem; the test should really be more robust (or be an MIR test?) but this preserves the existing test intent. llvm-svn: 310981	2017-08-16 00:49:44 +00:00
Quentin Colombet	647b482c1e	[VirtRegRewriter] Properly model the register liveness on undef subreg definition Undef subreg definition means that the content of the super register doesn't matter at this point. While that's true for virtual registers, this may not hold when replacing them with actual physical registers. Indeed, some part of the physical register may be coalesced with the related virtual register and thus, the values for those parts matter and must be live. The fix consists in checking whether or not subregs of the physical register being assigned to an undef subreg definition are live through that def and insert an implicit use if they are. Doing so, will keep them alive until that point like they should be. E.g., let vreg14 being assigned to R0_R1 then %vreg14:gsub_0<def,read-undef> = COPY %R0 ; <-- R1 is still live here %vreg14:gsub_1<def> = COPY %R1 Before this changes, the rewriter would change the code into: %R0<def> = KILL %R0, %R0_R1<imp-def> ; <-- this tells R1 is redefined %R1<def> = KILL %R1, %R0_R1<imp-def>, %R0_R1<imp-use> ; this value of this R1 ; is believed to come ; from the previous ; instruction Because of this invalid liveness, later pass could make wrong choices and in particular clobber live register as it happened with the register scavenger in llvm.org/PR34107 Now we would generate: %R0<def> = KILL %R0, %R0_R1<imp-def>, %R0_R1<imp-use> ; This tells R1 needs to ; reach this point %R1<def> = KILL %R1, %R0_R1<imp-def>, %R0_R1<imp-use> The bug has been here forever, it got exposed recently because the register scavenger got smarter. Fixes llvm.org/PR34107 llvm-svn: 310979	2017-08-16 00:17:05 +00:00
Jakub Kuderski	638c085d07	[Dominators] Include infinite loops in PostDominatorTree Summary: This patch teaches PostDominatorTree about infinite loops. It is built on top of D29705 by @dberlin which includes a very detailed motivation for this change. What's new is that the patch also teaches the incremental updater how to deal with reverse-unreachable regions and how to properly maintain and verify tree roots. Before that, the incremental algorithm sometimes ended up preserving reverse-unreachable regions after updates that wouldn't appear in the tree if it was constructed from scratch on the same CFG. This patch makes the following assumptions: - A sequence of updates should produce the same tree as a recalculating it. - Any sequence of the same updates should lead to the same tree. - Siblings and roots are unordered. The last two properties are essential to efficiently perform batch updates in the future. When it comes to the first one, we can decide later that the consistency between freshly built tree and an updated one doesn't matter match, as there are many correct ways to pick roots in infinite loops, and to relax this assumption. That should enable us to recalculate postdominators less frequently. This patch is pretty conservative when it comes to incremental updates on reverse-unreachable regions and ends up recalculating the whole tree in many cases. It should be possible to improve the performance in many cases, if we decide that it's important enough. That being said, my experiments showed that reverse-unreachable are very rare in the IR emitted by clang when bootstrapping clang. Here are the statistics I collected by analyzing IR between passes and after each removePredecessor call: ``` # functions: 52283 # samples: 337609 # reverse unreachable BBs: 216022 # BBs: 247840796 Percent reverse-unreachable: 0.08716159869015269 % Max(PercRevUnreachable) in a function: 87.58620689655172 % # > 25 % samples: 471 ( 0.1395104988314885 % samples ) ... in 145 ( 0.27733680163724345 % functions ) ``` Most of the reverse-unreachable regions come from invalid IR where it wouldn't be possible to construct a PostDomTree anyway. I would like to commit this patch in the next week in order to be able to complete the work that depends on it before the end of my internship, so please don't wait long to voice your concerns :). Reviewers: dberlin, sanjoy, grosser, brzycki, davide, chandlerc, hfinkel Reviewed By: dberlin Subscribers: nhaehnle, javed.absar, kparzysz, uabelho, jlebar, hiraditya, llvm-commits, dberlin, david2050 Differential Revision: https://reviews.llvm.org/D35851 llvm-svn: 310940	2017-08-15 18:14:57 +00:00
Daniel Sanders	eb2f5f3256	Revert r310919 - [globalisel][tablegen] Support zero-instruction emission. As expected, this failed on the windows bots but the instrumentation showed something interesting. The ADD8ri and INC8r rules are never directly compared on the windows machines. That implies that the issue lies in transitivity of the Compare predicate. I believe I've already verified that but maybe I missed something. llvm-svn: 310922	2017-08-15 15:10:31 +00:00
Daniel Sanders	16e6dd3cd6	Re-commit with some instrumentation: [globalisel][tablegen] Support zero-instruction emission. Summary: Support the case where an operand of a pattern is also the whole of the result pattern. In this case the original result and all its uses must be replaced by the operand. However, register class restrictions can require a COPY. This patch handles both cases by always emitting the copy and leaving it for the register allocator to optimize. The previous commit failed on the windows bots and this one is likely to fail on those same bots. However, the added instrumentation should reveal a particular isHigherPriorityThan() evaluation which I'm expecting to expose that these machines are weighing priority of two rules differently from the non-windows machines. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D36084 llvm-svn: 310919	2017-08-15 13:50:09 +00:00
John Baldwin	1255b165bf	[MIPS] Implement support for -mstack-alignment. Summary: This is modeled on the implementation for x86 which stores the command line option in a 'StackAlignOverride' field in MipsSubtarget and then uses this to compute a 'stackAlignment' value in MipsSubtarget::initializeSubtargetDependencies. The stackAlignment() method in MipsSubTarget is renamed to getStackAlignment() and returns the computed 'stackAlignment'. Reviewers: sdardis Reviewed By: sdardis Subscribers: llvm-commits, arichardson Differential Revision: https://reviews.llvm.org/D35874 llvm-svn: 310891	2017-08-14 21:49:38 +00:00
Lei Huang	451ef4adcd	[PowerPC] Add codegen for VSX word extract convert to FP Add codegen for VSX word extract conversion from signed/unsigned to single/double precision. For UINT_TO_FP: Extract word unsigned and convert to float was implemented in https://reviews.llvm.org/D20239. Here we will add the missing extract integer and conversion to double. This utilizes the new P9 instruction xxextractuw to extracting an integer element when the result will be converted to double thereby saving 2 direct moves (VSR <-> GPR). For SINT_TO_FP: We will implement the following sequence which will also reduce the number of instructions by saving 2 direct moves. v4i32->f32: xxspltw xvcvsxwsp xscvspdpn v4i32->f64: xxspltw xvcvsxwdp Differential Revision: https://reviews.llvm.org/D35859 llvm-svn: 310866	2017-08-14 18:09:29 +00:00
Sanjay Patel	92653865e6	[x86] fold the mask op on 8- and 16-bit rotates Ref the post-commit thread for r310770: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170807/478507.html The motivating cases as 'C' source examples can look like this: unsigned char rotate_right_8(unsigned char v, int shift) { // shift &= 7; v = ( v >> shift ) \| ( v << ( 8 - shift ) ); return v; } https://godbolt.org/g/K6rc1A Notice that the source doesn't contain UB-safe masked shift amounts, but instcombine created those in order to produce narrow rotate patterns. This should be the last step needed to resolve PR34046: https://bugs.llvm.org/show_bug.cgi?id=34046 Differential Revision: https://reviews.llvm.org/D36644 llvm-svn: 310849	2017-08-14 15:55:43 +00:00
Elad Cohen	6a9edda356	[SelectionDAG] combine vextract (v1iX extract_subvector(vNiX, Idx)) into vextract(vNiX,Idx) when creating vextract with getNode(). This case appeared in AVX512 after fixing pr33349 in r310552. Differential revision: https://reviews.llvm.org/D36571 llvm-svn: 310828	2017-08-14 10:49:45 +00:00
Balaram Makam	d9f53414de	MachineInstr: Reason locally about some memory objects before going to AA. This addresses a FIXME in MachineInstr::mayAlias. llvm-svn: 310825	2017-08-14 09:41:40 +00:00
Elad Cohen	3a90a0c10d	Revert "[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)" This reverts commit r310782. llvm-svn: 310822	2017-08-14 09:06:00 +00:00
Chandler Carruth	fe6b509f83	[PowerPC] Revert r310346 (and followups r310356 & r310424) which introduce a miscompile bug. There appears to be a bug where the generated code to extract the sign bit doesn't work correctly for 32-bit inputs. I've replied to the original commit pointing out the problem. I think I see by inspection (and reading the manual for PPC) how to fix this, but I can't be 100% confident and I also don't know what the best way to test this is. Currently it seems nearly impossible to get the backend to hit this code path, but the patch autohr is likely in a better position to craft such test cases than I am, and based on where the bug is it should be easily done. Original commit message for r310346: """ [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it adds the handling for the special case where RHS == 0. Differential Revision: https://reviews.llvm.org/D34048 """ llvm-svn: 310809	2017-08-14 03:41:00 +00:00
Simon Pilgrim	8d9e6e607a	[X86][BMI] Add BEXTR demanded bits test cases (PR34042) llvm-svn: 310802	2017-08-13 20:35:38 +00:00
Gadi Haber	bed2c50607	[X86][SandyBridge] Additional updates to the SNB instructions scheduling information This is a continuation patch for commit r307529 which completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target (see also https://reviews.llvm.org/D35019). In this patch we added the scheduling information of additional SNB instructions that were missing from the patch commit r307529, fixed the scheduling of several resource groups that include only port0 instead of port05 (i.e., port0 OR port5) and fixed several incorrect instructions' scheduling in the r307529 commit. The patch also includes the X87 instructions which were missing in previous patch commit r307529 as reported in bugzilla bug 34080. Reviewers: zvi, RKSimon, chandlerc, igorb, m_zuckerman, craig.topper, aymanmus, dim Differential Revision: https://reviews.llvm.org/D36388 llvm-svn: 310792	2017-08-13 13:59:24 +00:00
Simon Pilgrim	631991fdde	[X86][AVX512] Added additional shuffle+trunc test case. An existing test should have covered this but a typo caused it to fail. I've kept both as the codegen for the typo case needs addressing as well. llvm-svn: 310791	2017-08-13 12:30:36 +00:00
Simon Pilgrim	ad1c5566a7	[X86][TBM] Add tests showing failure to fold RFLAGS result into TBM instructions. And fails to select TBM instructions at all. llvm-svn: 310790	2017-08-13 12:16:00 +00:00
Simon Pilgrim	808ce12878	[X86][TBM] Regenerate bextri intrinsics tests. NFCI. llvm-svn: 310788	2017-08-13 11:56:15 +00:00
Guy Blank	de425ae753	[X86][AVX512] Add combine for TESTM Add an X86 combine for TESTM when one of the operands is a BUILD_VECTOR(0,0,...). TESTM op0, BUILD_VECTOR(0,0,...) -> BUILD_VECTOR(0,0,...) TESTM BUILD_VECTOR(0,0,...), op1 -> BUILD_VECTOR(0,0,...) Differential Revision: https://reviews.llvm.org/D36536 llvm-svn: 310787	2017-08-13 08:03:37 +00:00
Craig Topper	44cb1ffb6a	[X86] When handling addcarry intrinsic, create the flag result with the correct type so we don't crash if we use a memory instruction Summary: Previously we were creating the flag result with MVT::Other which is interpretted as a Chain node. If we used a memory form of the instruction we would end up with a copyToReg that consumed the chain result of the adcx instruction instead of the flag result. Pretty sure we should be using MVT::i32 here, that's what we do other places we create these node types. We should probably consider this for 5.0 as well. Reviewers: RKSimon, zvi, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36645 llvm-svn: 310784	2017-08-12 20:19:44 +00:00
Simon Pilgrim	5a86f0e717	[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED) If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index. Reapplied with fix to only work with simple value types. Committed on behalf of @jbhateja (Jatin Bhateja) Differential Revision: https://reviews.llvm.org/D35788 llvm-svn: 310782	2017-08-12 17:43:25 +00:00
Simon Pilgrim	32546d1434	[X86] Regenerate merge store tests. NFCI. Gives us a much better idea of what is going on than just relying on a few checks. llvm-svn: 310780	2017-08-12 17:27:35 +00:00
Richard Smith	3704eba1d1	D36604: PR34148: Do not assume we can use a copy relocation for an `external_weak` global An `external_weak` global may be intended to resolve as a null pointer if it's not defined, so it doesn't make sense to use a copy relocation for it. Differential Revision: https://reviews.llvm.org/D36604 llvm-svn: 310773	2017-08-11 23:52:28 +00:00
Sanjay Patel	2b452c7192	[x86] add tests for rotate left/right with masked shifter; NFC As noted in the test comment, instcombine now produces the masked shift value even when it's not included in the source, so we should handle this. Although the AMD/Intel docs don't say it explicitly, over-rotating the narrow ops produces the same results. An existence proof that this works as expected on all x86 comes from gcc 4.9 or later: https://godbolt.org/g/K6rc1A llvm-svn: 310770	2017-08-11 22:38:40 +00:00
John Baldwin	eebcc47500	[MIPS] Use ABI to determine stack alignment. Summary: The stack alignment depends on the ABI (16 bytes for N32 and N64 and 8 bytes for O32), not the CPU type. Reviewers: sdardis Reviewed By: sdardis Subscribers: atanasyan, arichardson, llvm-commits Differential Revision: https://reviews.llvm.org/D36326 llvm-svn: 310768	2017-08-11 22:07:56 +00:00
Sanjay Patel	5d6df36fde	[x86] regenerate test checks, add 64-bit run; NFC llvm-svn: 310767	2017-08-11 22:05:33 +00:00
Craig Topper	ac217b7aa3	[X86] Don't use fsin/fcos/fsincos instructions ever Summary: Previously we would use these instructions if sse was disabled and fastmath was enabled. As mentioned in D28335, this is a bad idea. Reviewers: efriedma, scanon, DavidKreitzer Reviewed By: DavidKreitzer Subscribers: zvi, llvm-commits Differential Revision: https://reviews.llvm.org/D36344 llvm-svn: 310762	2017-08-11 20:55:29 +00:00
Rafael Espindola	b8956a70d3	Fix access to undefined weak symbols in pic code When the access to a weak symbol is not a call, the access has to be able to produce the value 0 at runtime. We were sometimes producing code sequences where that was not possible if the code was leaded more than 4g away from 0. llvm-svn: 310756	2017-08-11 20:49:27 +00:00
Matt Arsenault	71bcbd451f	AMDGPU: Start adding tail call support Handle the sibling call cases. llvm-svn: 310753	2017-08-11 20:42:08 +00:00
Daniel Sanders	e6c216ed5b	Revert r310716 (and r310735): [globalisel][tablegen] Support zero-instruction emission. Two of the Windows bots are failing test\CodeGen\X86\GlobalISel\select-inc.mir which should not have been affected by the change. Reverting while I investigate. Also reverted r310735 because it builds on r310716. llvm-svn: 310745	2017-08-11 19:19:21 +00:00
Stanislav Mekhanoshin	7f37794ebd	[AMDGPU] Ported and adopted AMDLibCalls pass The pass does simplifications of well known AMD library calls. If given -amdgpu-prelink option it works in a pre-link mode which allows to reference new library functions which will be linked in later. In addition it also used to process traditional AMD option -fuse-native which allows to replace some of the functions with their fast native implementations from the library. The necessary glue to pass the prelink option and translate -fuse-native is to be added to the driver. Differential Revision: https://reviews.llvm.org/D36436 llvm-svn: 310731	2017-08-11 16:42:09 +00:00
Craig Topper	561092f233	[AVX512] Remove and autoupgrade many of the broadcast intrinsics Summary: This autoupgrades most of the broadcast intrinsics. They've been unused in clang for some time. This leaves the 32x2 intrinsics because they are still used in clang. Reviewers: RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36606 llvm-svn: 310725	2017-08-11 16:22:45 +00:00
Craig Topper	0f30fe9634	[x86] Enable some support for lowerVectorShuffleWithUndefHalf with AVX-512 Summary: This teaches 512-bit shuffles to detect unused halfs in order to reduce shuffle size. We may need to refine the 512-bit exit point. I couldn't remember if we had good cross lane shuffles for 8/16 bit with AVX-512 or not. I believe this is step towards being able to handle D36454 without a special case. From here we need to improve our ability to combine extract_subvector with insert_subvector and other extract_subvectors. And we need to support narrowing binary operations where we don't demand all elements. This may be improvements to DAGCombiner::narrowExtractedVectorBinOp(by recognizing an insert_subvector in addition to concat) or we may need a target specific combiner. Reviewers: RKSimon, zvi, delena, jbhateja Reviewed By: RKSimon, jbhateja Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36601 llvm-svn: 310724	2017-08-11 16:20:05 +00:00
Sanjay Patel	169dae70a6	[x86] use more shift or LEA for select-of-constants (2nd try) The previous rev (r310208) failed to account for overflow when subtracting the constants to see if they're suitable for shift/lea. This version add a check for that and more test were added in r310490. We can convert any select-of-constants to math ops: http://rise4fun.com/Alive/d7d For this patch, I'm enhancing an existing x86 transform that uses fake multiplies (they always become shl/lea) to avoid cmov or branching. The current code misses cases where we have a negative constant and a positive constant, so this is just trying to plug that hole. The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start with a select in IR, create a select DAG node, convert it into a sext, convert it back into a select, and then lower it to sext machine code. Some notes about the test diffs: 1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR. 2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. We could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a post-DAG problem though. 3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if that's a regression, but those would always be nearly equivalent. 4. pr22338.ll and sext-i1.ll - These tests have undef operands, so we don't actually care about these diffs. 5. sbb.ll - This shows a win for what is likely a common case: choose -1 or 0. 6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again. 7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops. Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass. Differential Revision: https://reviews.llvm.org/D35340 llvm-svn: 310717	2017-08-11 15:44:14 +00:00
Daniel Sanders	1fb1ce0c87	[globalisel][tablegen] Support zero-instruction emission. Summary: Support the case where an operand of a pattern is also the whole of the result pattern. In this case the original result and all its uses must be replaced by the operand. However, register class restrictions can require a COPY. This patch handles both cases by always emitting the copy and leaving it for the register allocator to optimize. Depends on D35833 Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Subscribers: javed.absar, kristof.beyls, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D36084 llvm-svn: 310716	2017-08-11 15:40:32 +00:00
Simon Dardis	ae5b53e7cd	[mips] Lift the assertion on the types that can be used with MipsGPRel Post commit review of rL308619 highlighted the need for handling N64 with -fno-pic. Testing reveale a stale assert when generating a GP relative addressing mode. This patch removes that assert and adds the necessary patterns for MIPS64 to perform gp relative addressing with -fno-pic (and the implicit -mno-abicalls + -mgpopt). Reviewers: atanasyan, nitesh.jain Differential Revision: https://reviews.llvm.org/D36472 llvm-svn: 310713	2017-08-11 14:36:05 +00:00
Nirav Dave	0a48e5d506	Improve handling of insert_subvector of bitcast values Fix insert_subvector / extract_subvector merges of bitcast values. Reviewers: efriedma, craig.topper, RKSimon Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D34571 llvm-svn: 310711	2017-08-11 13:21:41 +00:00
Nirav Dave	d1b3f09faa	[X86][DAG] Switch X86 Target to post-legalized store merge Move store merge to happen after intrinsic lowering to allow lowered stores to be merged. Some regressions due in MergeConsecutiveStores to missing insert_subvector that are addressed in follow up patch. Reviewers: craig.topper, efriedma, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34559 llvm-svn: 310710	2017-08-11 13:21:35 +00:00
Simon Pilgrim	83cf3a29b5	[DAGCombiner] Remove shuffle support from simplifyShuffleMask rL310372 enabled simplifyShuffleMask to support undef shuffle mask inputs, but its causing hangs. Removing support until I can triage the problem llvm-svn: 310699	2017-08-11 08:37:00 +00:00
Mikael Holmen	8b10680922	[IfConversion] Maintain the CFG when predicating/merging blocks in IfConvert* Summary: This fixes PR32721 in IfConvertTriangle and possible similar problems in IfConvertSimple, IfConvertDiamond and IfConvertForkedDiamond. In PR32721 we had a triangle EBB \| \ \| \| \| TBB \| / FBB where FBB didn't have any successors at all since it ended with an unconditional return. Then TBB and FBB were be merged into EBB, but EBB would still keep its successors, and the use of analyzeBranch and CorrectExtraCFGEdges wouldn't help to remove them since the return instruction is not analyzable (at least not on ARM). The edge updating code and branch probability updating code is now pushed into MergeBlocks() which allows us to share the same update logic between more callsites. This lets us remove several dependencies on analyzeBranch and completely eliminate RemoveExtraEdges. One thing that showed up with this patch was that IfConversion sometimes left a successor with 0% probability even if there was no branch or fallthrough to the successor. One such example from the test case ifcvt_bad_zero_prob_succ.mir. The indirect branch tBRIND can only jump to bb.1, but without the patch we got: bb.0: successors: %bb.1(0x80000000) bb.1: successors: %bb.1(0x80000000), %bb.2(0x00000000) tBRIND %r1, 1, %cpsr B %bb.1 bb.2: There is no way to jump from bb.1 to bb2, but still there is a 0% edge from bb.1 to bb.2. With the patch applied we instead get the expected: bb.0: successors: %bb.1(0x80000000) bb.1: successors: %bb.1(0x80000000) tBRIND %r1, 1, %cpsr B %bb.1 Since bb.2 had no predecessor at all, it was removed. Several testcases had to be updated due to this since the removed successor made the "Branch Probability Basic Block Placement" pass sometimes place blocks in a different order. Finally added a couple of new test cases: * PR32721_ifcvt_triangle_unanalyzable.mir: Regression test for the original problem dexcribed in PR 32721. * ifcvt_triangleWoCvtToNextEdge.mir: Regression test for problem that caused a revert of my first attempt to solve PR 32721. * ifcvt_simple_bad_zero_prob_succ.mir: Test case showing the problem where a wrong successor with 0% probability was previously left. * ifcvt_[diamond\|forked_diamond\|simple]_unanalyzable.mir Very simple test cases for the simple and (forked) diamond cases involving unanalyzable branches that can be nice to have as a base if wanting to write more complicated tests. Reviewers: iteratee, MatzeB, grosser, kparzysz Reviewed By: kparzysz Subscribers: kbarton, davide, aemerson, nemanjai, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34099 llvm-svn: 310697	2017-08-11 06:57:08 +00:00
Nirav Dave	4d28c0ff4f	[DAG] Relax type restriction for store merge Summary: Allow stores of bitcastable types to be merged by peeking through BITCAST nodes and recasting stored values constant and vector extract nodes as necessary. Reviewers: jyknight, hfinkel, efriedma, RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34569 llvm-svn: 310655	2017-08-10 19:52:45 +00:00
Taewook Oh	f5040b9685	Make .file directive to have basename only Summary: Currently LLVM puts directory along with the filename in .file directive, but this behavior doesn't match gcc. There's a no clear description about which one is right (https://sourceware.org/binutils/docs/as/File.html#File), but one document (https://sourceware.org/gdb/current/onlinedocs/stabs/ELF-Linker-Relocation.html) suggests that STT_FILE symbol in elf file is expected to have basename only, which should have a same sting file .file directive according to (https://docs.oracle.com/cd/E26502_01/html/E28388/eoiyg.html). This also affects badly on the build system that uses hashing, as the directory info could be differnt from developer to developer even when they're working on same file. Reviewers: pcc, mehdi_amini Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36018 llvm-svn: 310642	2017-08-10 18:17:11 +00:00
Nirav Dave	926e2d39bf	[X86] Keep dependencies when constructing loads in combineStore Summary: Preserve chain dependecies between old and new loads constructed to prevent loads from reordering below later stores. Fixes PR34088. Reviewers: craig.topper, spatel, RKSimon, efriedma Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36528 llvm-svn: 310604	2017-08-10 15:12:32 +00:00
Guy Blank	136b543745	[SelectionDAG] Allow constant folding for implicitly truncating BUILD_VECTOR nodes. In FoldConstantArithmetic, handle BUILD_VECTOR nodes that do implicit truncation on the elements. This is similar to what is done in FoldConstantVectorArithmetic. Differential Revision: https://reviews.llvm.org/D36506 llvm-svn: 310593	2017-08-10 14:09:50 +00:00
Zoran Jovanovic	f4f2d084c6	[mips][microMIPS] Extending size reduction pass with XOR16 Author: milena.vujosevic.janicic Reviewers: sdardis The patch extends size reduction pass for MicroMIPS. XOR instruction is transformed into 16-bit instruction XOR16, if possible. Differential Revision: https://reviews.llvm.org/D34239 llvm-svn: 310579	2017-08-10 10:27:29 +00:00
Elad Cohen	22ba97a0a6	[SelectionDAG] When scalarizing vselect, don't assert on a legal cond operand. When scalarizing the result of a vselect, the legalizer currently expects to already have scalarized the operands. While this is true for the true/false operands (which have the same type as the result), it is not case for the condition operand. On X86 AVX512, v1i1 is legal - this leads to operations such as '< N x type> vselect < N x i1> < N x type> < N x type>' where < N x type > is illegal to hit an assertion during the scalarization. The handling is similar to r205625. This also exposes the fact that (v1i1 extract_subvector) should be legal and selectable on AVX512 - We do this by custom lowering to vector_extract_elt. This still leaves us in some cases with redundant dag nodes which will be combined in a separate soon to come patch. This fixes pr33349. Differential revision: https://reviews.llvm.org/D36511 llvm-svn: 310552	2017-08-10 07:44:23 +00:00
Matthias Braun	a88587ce0c	ARM: Fix CMP_SWAP expansion Clean up after my misguided attempt in r304267 to "fix" CMP_SWAP returning an uninitialized status value. - I was always using tMOVi8 to zero the status register which cannot encode higher register numbers and llvm would silently miscompile) - Nobody was ever looking at that status value outside the expansion. ARMDAGToDAGISel::SelectCMP_SWAP() the only place creating CMP_SWAP instructions was not mapping anything to it. (The cmpxchg status value from llvm IR is lowered to a manual comparison after the CMP_SWAP) So this: - Renames the register from "status" to "temp" it make it obvious that it isn't used outside the expansion. - Remove the zeroing status/temp register. - Keep the live-in list improvements from r304267 Fixes http://llvm.org/PR34056 llvm-svn: 310534	2017-08-09 22:22:05 +00:00
Krzysztof Parzyszek	1966fd79a7	[Hexagon] Ignore DBG_VALUEs when counting instructions in hexagon-early-if llvm-svn: 310524	2017-08-09 21:22:05 +00:00
Matt Arsenault	36cd1859f3	AMDGPU: Fix assert on n inline asm constraint llvm-svn: 310515	2017-08-09 20:09:35 +00:00
Guy Blank	7f60c991ae	[X86][AVX512] Choose correct registers in vpbroadcastb/w Fixes the vpbroadcastb/w instructions which use GPRs as source operands, to use the correct registers. The full GPR should be used, and not the subregister, as it happens before the patch. Fixes pr33795 Differential Revision: https://reviews.llvm.org/D36479 llvm-svn: 310498	2017-08-09 17:21:01 +00:00
Sanjay Patel	6f80d6b46f	[x86] add more tests for select-of-constants; NFC This is to help recommit a fixed version of r310208. As shown in PR34097, we could miscompile if subtraction of the constants overflowed. llvm-svn: 310490	2017-08-09 15:57:02 +00:00
Florian Hahn	d68bc7ae8d	[ARM] Emit error when ARM exec mode is not available. Summary: A similar error message has been removed from the ARMTargetMachineBase constructor in r306939. With this patch, we generate an error message for the example below, compiled with -mcpu=cortex-m0, which does not have ARM execution mode. __attribute__((target("arm"))) int foo(int a, int b) { return a + b % a; } __attribute__((target("thumb"))) int bar(int a, int b) { return a + b % a; } By adding this error message to ARMBaseTargetMachine::getSubtargetImpl, we can deal with functions that set -thumb-mode in target-features. At the moment it seems like Clang does not have access to target-feature specific information, so adding the error message to the frontend will be harder. Reviewers: echristo, richard.barton.arm, t.p.northover, rengolin, efriedma Reviewed By: echristo, efriedma Subscribers: efriedma, aemerson, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D35627 llvm-svn: 310486	2017-08-09 15:39:10 +00:00
Florian Hahn	fc4b3951e9	[ARM] Remove FeatureNoARM implies ModeThumb. Summary: By removing FeatureNoARM implies ModeThumb, we can detect cases where a function's target-features contain -thumb-mode (enables ARM codegen for the function), but the architecture does not support ARM mode. Previously, the implication caused the FeatureNoARM bit to be cleared for functions with -thumb-mode, making the assertion in ARMSubtarget::ARMSubtarget [1] pointless for such functions. This assertion is the only guard against generating ARM code for architectures without ARM codegen support. Is there a place where we could easily generate error messages for the user? At the moment, we would generate ARM code for Thumb-only architectures. X86 has the same behavior as ARM, as in it only has an assertion and no error message, but I think for ARM an error message would be helpful. What do you think? For the example below, `llc -mtriple=armv7m-eabi test.ll -o -` will generate ARM assembler (or fail with an assertion error with this patch). Note that if we run the resulting assembler through llvm-mc, we get an appropriate error message, but not when codegen is handled through clang. ``` define void @bar() #0 { entry: ret void } attributes #0 = { "target-features"="-thumb-mode" } ``` [1] `c1f7b54cef/lib/Target/ARM/ARMSubtarget.cpp (L147)` Reviewers: t.p.northover, rengolin, peter.smith, aadg, silviu.baranga, richard.barton.arm, echristo Reviewed By: rengolin, echristo Subscribers: efriedma, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35569 llvm-svn: 310476	2017-08-09 13:53:28 +00:00
Jonas Paulsson	6228aeda65	[LSR / TTI / SystemZ] Eliminate TargetTransformInfo::isFoldableMemAccess() isLegalAddressingMode() has recently gained the extra optional Instruction* parameter, and therefore it can now do the job that previously only isFoldableMemAccess() could do. The SystemZ implementation of isLegalAddressingMode() has gained the functionality of checking for offsets, which used to be done with isFoldableMemAccess(). The isFoldableMemAccess() hook has been removed everywhere. Review: Quentin Colombet, Ulrich Weigand https://reviews.llvm.org/D35933 llvm-svn: 310463	2017-08-09 11:28:01 +00:00
Serguei Katkov	6ea2e81cf6	[ImplicitNullCheck] Fix the bug when dependent instruction accesses memory It is possible that dependent instruction may access memory. In this case we must reject optimization because the memory change will be visible in null handler basic block. So we will execute an instruction which we must not execute if check fails. Reviewers: sanjoy, reames Reviewed By: sanjoy Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36392 llvm-svn: 310443	2017-08-09 05:17:02 +00:00
Jessica Paquette	d36945bf3a	[MachineOutliner] Ensure AArch64 outliner doesn't mess with W30 or LR Before, the outliner would mark all instructions that read from/modify LR as illegal. This doesn't handle W30, which overlaps with LR. This shouldn't be outlined. This commit fixes that by making modifiesRegister() and readsRegister() look at W30 + take in a TRI argument. This makes sure that modifiesRegister() and readsRegister() won't outline either of W30 and LR. https://reviews.llvm.org/D36435 llvm-svn: 310422	2017-08-08 21:51:26 +00:00
Connor Abbott	249fc7bd2a	[AMDGPU] Add llvm.amdgpu.update.dpp intrinsic Summary: Now that we've made all the necessary backend changes, we can add a new intrinsic which exposes the new capabilities to IR producers. Since llvm.amdgpu.update.dpp is a strict superset of llvm.amdgpu.mov.dpp, we should deprecate the former. We also add tests for all the functionality that was added in previous changes, now that we can access it via an IR construct. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34718 llvm-svn: 310399	2017-08-08 18:52:22 +00:00
Simon Pilgrim	91b7b991d4	[DAGCombiner] simplifyShuffleMask - handle UNDEF inputs from shuffles as well as BUILD_VECTOR Minor extension to D36393 llvm-svn: 310372	2017-08-08 16:10:33 +00:00
Nemanja Ivanovic	979dcb6f09	[PowerPC] Don't crash on larger splats achieved through 1-byte splats We've implemented a 1-byte splat using XXSPLTISB on P9. However, LLVM will produce a 1-byte splat even for wider element BUILD_VECTOR nodes. This patch prevents crashing in that situation. Differential Revision: https://reviews.llvm.org/D35650 llvm-svn: 310358	2017-08-08 13:52:45 +00:00
Amjad Aboud	6fa6813aec	[X86] Improved X86::CMOV to Branch heuristic. Resolved PR33954. This patch contains two more constraints that aim to reduce the noise cases where we convert CMOV into branch for small gain, and end up spending more cycles due to overhead. Differential Revision: https://reviews.llvm.org/D36081 llvm-svn: 310352	2017-08-08 12:17:56 +00:00
Nemanja Ivanovic	809fbfa6a1	[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLE/SETGE Adds handling for SETLE/SETGE comparisons on i32 values. Furthermore, it adds the handling for the special case where RHS == 0. Differential Revision: https://reviews.llvm.org/D34048 llvm-svn: 310346	2017-08-08 11:20:44 +00:00
Simon Pilgrim	ef44228acb	[DAGCombiner] Simplify shuffle mask index if the referenced input element is UNDEF Fixes one of the cases in PR34041. Differential Revision: https://reviews.llvm.org/D36393 llvm-svn: 310344	2017-08-08 11:03:30 +00:00
Daniel Sanders	0554004698	[globalisel][tablegen] Add support for importing 'imm' operands. Summary: This patch enables the import of rules containing 'imm' operands that do not constrain the acceptable values using predicates. Support for ImmLeaf will arrive in a later patch. Depends on D35681 Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Reviewed By: rovka Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits Differential Revision: https://reviews.llvm.org/D35833 llvm-svn: 310343	2017-08-08 10:44:31 +00:00
Simon Pilgrim	8c1167df5c	[X86][AVX] Added test for broadcast shuffle from binary sources with undefs (D36393) llvm-svn: 310317	2017-08-07 22:20:06 +00:00
Evgeny Stupachenko	c675290680	Reapply fix PR23384 (part 3 of 3) r304824 (was reverted in r305720). The root cause of reverting was fixed - PR33514. Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> <evgeny.v.stupachenko@intel.com> llvm-svn: 310289	2017-08-07 19:56:34 +00:00
Connor Abbott	79f3ade51a	[AMDGPU] Add pseudo "old" source to all DPP instructions Summary: All instructions with the DPP modifier may not write to certain lanes of the output if bound_ctrl=1 is set or any bits in bank_mask or row_mask aren't set, so the destination register may be both defined and modified. The right way to handle this is to add a constraint that the destination register is the same as one of the inputs. We could tie the destination to the first source, but that would be too restrictive for some use-cases where we want the destination to be some other value before the instruction executes. Instead, add a fake "old" source and tie it to the destination. Effectively, the "old" source defines what value unwritten lanes will get. We'll expose this functionality to users with a new intrinsic later. Also, we want to use DPP instructions for computing derivatives, which means we need to set WQM for them. We also need to enable the entire wavefront when using DPP intrinsics to implement nonuniform subgroup reductions, since otherwise we'll get incorrect results in some cases. To accomodate this, add a new operand to all DPP instructions which will be interpreted by the SI WQM pass. This will be exposed with a new intrinsic later. We'll also add support for Whole Wavefront Mode later. I also fixed llvm.amdgcn.mov.dpp to overwrite the source and fixed up the test. However, I could also keep the old behavior (where lanes that aren't written are undefined) if people want it. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34716 llvm-svn: 310283	2017-08-07 19:10:56 +00:00
Matt Arsenault	36b4b0bed7	AMDGPU: Remove -mcpu=SI Leftover from before amdgcn/r600 split. llvm-svn: 310277	2017-08-07 18:30:35 +00:00
Simon Pilgrim	0242cead2c	[X86][AVX] Add full test coverage of subvector_broadcasts from registers X86SubVBroadcast is for memory subvector broadcasts, but we must test that it handles all cases without the load as well just in case. This was noticed while I was triaging the test cases from PR34041. llvm-svn: 310268	2017-08-07 16:49:09 +00:00
Simon Pilgrim	8b9c4f2855	[X86][AVX] Cleanup subvector broadcast tests - remove old prefixes. llvm-svn: 310265	2017-08-07 15:50:43 +00:00
Sanjay Patel	807f92b8ff	[x86] revert r310208 to investigate test-suite failures (PR34105 / PR34097) llvm-svn: 310264	2017-08-07 15:47:48 +00:00
Matt Arsenault	8728c5f2db	AMDGPU: Cleanup subtarget features Try to avoid mutually exclusive features. Don't use a real default GPU, and use a fake "generic". The goal is to make it easier to see which set of features are incompatible between feature strings. Most of the test changes are due to random scheduling changes from not having a default fullspeed model. llvm-svn: 310258	2017-08-07 14:58:04 +00:00
Nirav Dave	3d3bde7682	[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector. Relanding after case to insert explicit truncation as necessary. Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally improves vector shuffle computations. Reviewers: efriedma, RKSimon, spatel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35566 llvm-svn: 310256	2017-08-07 14:07:49 +00:00
Michael Zuckerman	680ac10aa7	[X86][LLVM]Expanding Supports lowerInterleavedStore() in X86InterleavedAccess (VF16 stride 4). This patch expands the support of lowerInterleavedStore to 16x8i stride 4. LLVM creates suboptimal shuffle code-gen for AVX2. In overall, this patch is a specific fix for the pattern (Strid=4 VF=16) and we plan to include more patterns in the future. The patch goal is to optimize the following sequence: At the end of the computation, we have ymm2, ymm0, ymm12 and ymm3 holding each 16 chars: c0, c1, , c16 m0, m1, , m16 y0, y1, , y16 k0, k1, ., k16 And these need to be transposed/interleaved and stored like so: c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 .... Differential Revision: https://reviews.llvm.org/D35829 llvm-svn: 310252	2017-08-07 13:22:39 +00:00
Simon Pilgrim	e2bb83a4ee	[X86][AVX] Added test for broadcast shuffle with undefs (PR34041) llvm-svn: 310249	2017-08-07 12:24:33 +00:00
Guy Blank	5ca01695f7	[SelectionDAG] reset NewNodesMustHaveLegalTypes flag between basic blocks The NewNodesMustHaveLegalTypes flag is set to false at the beginning of CodeGenAndEmitDAG, and set to true after legalizing types. But before calling CodeGenAndEmitDAG we build the DAG for the basic block. So for the first basic block NewNodesMustHaveLegalTypes would be 'false' during the SDAG building, and for all other basic blocks it would be 'true'. This patch sets the flag to false before SDAG building each basic block. Differential Revision: https://reviews.llvm.org/D33435 llvm-svn: 310239	2017-08-07 05:51:14 +00:00
Sanjay Patel	a923c2ee95	[x86] use more shift or LEA for select-of-constants We can convert any select-of-constants to math ops: http://rise4fun.com/Alive/d7d For this patch, I'm enhancing an existing x86 transform that uses fake multiplies (they always become shl/lea) to avoid cmov or branching. The current code misses cases where we have a negative constant and a positive constant, so this is just trying to plug that hole. The DAGCombiner diff prevents us from hitting a terrible inefficiency: we can start with a select in IR, create a select DAG node, convert it into a sext, convert it back into a select, and then lower it to sext machine code. Some notes about the test diffs: 1. 2010-08-04-MaskedSignedCompare.ll - We were creating control flow that didn't exist in the IR. 2. memcmp.ll - Choose -1 or 1 is the case that got me looking at this again. I think we could avoid the push/pop in some cases if we used 'movzbl %al' instead of an xor on a different reg? That's a post-DAG problem though. 3. mul-constant-result.ll - The trade-off between sbb+not vs. setne+neg could be addressed if that's a regression, but I think those would always be nearly equivalent. 4. pr22338.ll and sext-i1.ll - These tests have undef operands, so I don't think we actually care about these diffs. 5. sbb.ll - This shows a win for what I think is a common case: choose -1 or 0. 6. select.ll - There's another borderline case here: cmp+sbb+or vs. test+set+lea? Also, sbb+not vs. setae+neg shows up again. 7. select_const.ll - These are motivating cases for the enhancement; replace cmov with cheaper ops. Assembly differences between movzbl and xor to avoid a partial reg stall are caused later by the X86 Fixup SetCC pass. Differential Revision: https://reviews.llvm.org/D35340 llvm-svn: 310208	2017-08-06 16:27:07 +00:00
Simon Pilgrim	aeb2b1e36d	[X86][X87] Regenerate inline-asm tests llvm-svn: 310201	2017-08-06 12:17:10 +00:00
Simon Pilgrim	5489781cc7	[X86][X87] Add test case for PR34080 Test with/without the sandybridge (default) model for SSE2, SSE3 and AVX targets. pre-SSE3 the issue is the order of the fpsw and fpcw load/stores (with SSE3 trunc-store FIST instructions avoid the sw/cw manipulations). llvm-svn: 310198	2017-08-06 11:22:33 +00:00
Craig Topper	6bfa2aee78	[X86] Enable isel to use the PAUSE instruction even when SSE2 is disabled Summary: On older processors this instruction encoding is treated as a NOP. MSVC doesn't disable intrinsics based on features the way clang/gcc does. Because the PAUSE instruction encoding doesn't crash older processors, some software out there uses these intrinsics without checking for SSE2. This change also seems to also be consistent with gcc behavior. Fixes PR34079 Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36361 llvm-svn: 310190	2017-08-05 23:34:44 +00:00
Matt Arsenault	b94972cb82	IPRA: Don't crash on null getCallPreservedMask Kernels aren't callable, so they don't have a call preserved mask. llvm-svn: 310172	2017-08-05 07:50:18 +00:00
Joel Jones	60711ca253	[AArch64] LSE Atomics reorg - part 1 Add memory synchronization semantics to LSE Atomics. The memory semantics feature will be added in a subsequent patch. In this patch, several corrections were added to the existing LSE Atomics implementation, based on the ARM Errata D11904 from 05/12/2017. Patch by: steleman Differential Revision: https://reviews.llvm.org/D35319 llvm-svn: 310167	2017-08-05 04:30:55 +00:00
Reid Kleckner	61776e6910	Commit the local change I had to make my test pass llvm-svn: 310153	2017-08-05 00:15:40 +00:00
Reid Kleckner	7662d50d10	[X86] Teach fastisel to select calls to dllimport functions Summary: Direct calls to dllimport functions are very common Windows. We should add them to the -O0 fast path. Reviewers: rafael Subscribers: llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D36197 llvm-svn: 310152	2017-08-05 00:10:43 +00:00
Craig Topper	4a8a0a2a88	[X86] Regenerate the fsin/fcos instruction test using update_llc_test_checks.py. NFC This looks to have been converted from a grep based test at some point in a really strange way. llvm-svn: 310150	2017-08-04 23:36:03 +00:00
Nico Weber	b24df62bb6	Revert r310058, it caused PR34073. llvm-svn: 310118	2017-08-04 20:24:13 +00:00
Ulrich Weigand	a11f63a952	[SystemZ] Add support for 128-bit atomic load/store/cmpxchg This adds support for the main 128-bit atomic operations, using the SystemZ instructions LPQ, STPQ, and CDSG. Generating these instructions is a bit more complex than usual since the i128 type is not legal for the back-end. Therefore, we have to hook the LowerOperationWrapper and ReplaceNodeResults TargetLowering callbacks. llvm-svn: 310094	2017-08-04 18:57:58 +00:00
Ulrich Weigand	02f1c02c27	[SystemZ] Eliminate unnecessary serialization operations We currently emit a serialization operation (bcr 14, 0) before every atomic load and after every atomic store. This is overly conservative. The SystemZ architecture actually does not require any serialization for atomic loads, and a serialization after an atomic store only if we need to enforce sequential consistency. This is what other compilers for the platform implement as well. llvm-svn: 310093	2017-08-04 18:53:35 +00:00
Connor Abbott	66b9bd6e50	[AMDGPU] Implement llvm.amdgcn.set.inactive intrinsic Summary: This intrinsic lets us set inactive lanes to an identity value when implementing wavefront reductions. In combination with Whole Wavefront Mode, it lets inactive lanes be skipped over as required by GLSL/Vulkan. Lowering the intrinsic needs to happen post-RA so that RA knows that the destination isn't completely overwritten due to the EXEC shenanigans, so we need another pseudo-instruction to represent the un-lowered intrinsic. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D34719 llvm-svn: 310088	2017-08-04 18:36:54 +00:00
Connor Abbott	92638ab625	[AMDGPU] Add support for Whole Wavefront Mode Summary: Whole Wavefront Wode (WWM) is similar to WQM, except that all of the lanes are always enabled, regardless of control flow. This is required for implementing wavefront reductions in non-uniform control flow, where we need to use the inactive lanes to propagate intermediate results, so they need to be enabled. We need to propagate WWM to uses (unless they're explicitly marked as exact) so that they also propagate intermediate results correctly. We do the analysis and exec mask munging during the WQM pass, since there are interactions with WQM for things that require both WQM and WWM. For simplicity, WWM is entirely block-local -- blocks are never WWM on entry or exit of a block, and WWM is not propagated to the block level. This means that computations involving WWM cannot involve control flow, but we only ever plan to use WWM for a few limited purposes (none of which involve control flow) anyways. Shaders can ask for WWM using the @llvm.amdgcn.wwm intrinsic. There isn't yet a way to turn WWM off -- that will be added in a future change. Finally, it turns out that turning on inactive lanes causes a number of problems with register allocation. While the best long-term solution seems like teaching LLVM's register allocator about predication, for now we need to add some hacks to prevent ourselves from getting into trouble due to constraints that aren't currently expressed in LLVM. For the gory details, see the comments at the top of SIFixWWMLiveness.cpp. Reviewers: arsenm, nhaehnle, tpr Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D35524 llvm-svn: 310087	2017-08-04 18:36:52 +00:00
Connor Abbott	8c217d0a29	[AMDGPU] Add an llvm.amdgcn.wqm intrinsic for WQM Summary: Previously, we assumed that certain types of instructions needed WQM in pixel shaders, particularly DS instructions and image sampling instructions. This was ok because with OpenGL, the assumption was correct. But we want to start using DPP instructions for derivatives as well as other things, so the assumption that we can infer whether to use WQM based on the instruction won't continue to hold. This intrinsic lets frontends like Mesa indicate what things need WQM based on their knowledge of the API, rather than second-guessing them in the backend. We need to keep around the old method of enabling WQM, but eventually we should remove it once Mesa catches up. For now, this will let us use DPP instructions for computing derivatives correctly. Reviewers: arsenm, tpr, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35167 llvm-svn: 310085	2017-08-04 18:36:49 +00:00
Chad Rosier	14fc82a1df	[AArch64] Fix an assertion for pre-index generation with unscaled loads/stores. Differential Revision: https://reviews.llvm.org/D36248 PR34035 llvm-svn: 310066	2017-08-04 16:44:06 +00:00
Simon Pilgrim	5c63586489	[DAGCombiner] Extending pattern detection for vector shuffle. If all the operands of a BUILD_VECTOR extract elements from same vector then split the vector efficiently based on the maximum vector access index. Committed on behalf of @jbhateja (Jatin Bhateja) Differential Revision: https://reviews.llvm.org/D35788 llvm-svn: 310058	2017-08-04 12:46:35 +00:00
Zoran Jovanovic	1c17001235	[mips][microMIPS] Extending size reduction pass with ADDIUSP and ADDIUR1SP Author: milena.vujosevic.janicic The patch extends size reduction pass for MicroMIPS. The following instructions are examined and transformed, if possible: ADDIU instruction is transformed into 16-bit instruction ADDIUSP ADDIU instruction is transformed into 16-bit instruction ADDIUR1SP Usage of u_int64_t replaced by uint64_t to avoid issues because of which previous patch version was reverted: Differential Revision: https://reviews.llvm.org/D34511 llvm-svn: 310044	2017-08-04 10:18:44 +00:00
Stanislav Mekhanoshin	6c7a8d0b5f	[AMDGPU] Preserve inverted bit in SI_IF in presence of SI_KILL In case if SI_KILL is in between of the SI_IF and SI_END_CF we need to preserve the bits actually flipped by if rather then restoring the original mask. Differential Revision: https://reviews.llvm.org/D36299 llvm-svn: 310031	2017-08-04 06:58:42 +00:00
Connor Abbott	00755362b9	[AMDGPU] Add missing hazard for DPP-after-EXEC-write Summary: Following the docs, we need at least 5 wait states between an EXEC write and an instruction that uses DPP. Reviewers: tstellar, arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D34849 llvm-svn: 310013	2017-08-04 01:09:43 +00:00
Matt Arsenault	a176cc5b93	AMDGPU: Don't use report_fatal_error for unsupported call types llvm-svn: 310004	2017-08-03 23:32:41 +00:00
Matt Arsenault	a202538bfa	AMDGPU: Remove error on calls for amdgcn Repurpose the -amdgpu-function-calls flag. Rather than require it to emit a call, only use it to run the always inline path or not. llvm-svn: 310003	2017-08-03 23:24:05 +00:00
Matt Arsenault	817c253e60	AMDGPU: Fix implicitarg.ptr handling special inputs llvm-svn: 310002	2017-08-03 23:12:44 +00:00
Matt Arsenault	8623e8d864	AMDGPU: Pass special input registers to functions llvm-svn: 309998	2017-08-03 23:00:29 +00:00
Taewook Oh	424c88c381	Move unit test to the proper location Summary: Move test/CodeGen/AArch64/reg-bank-128bit.mir to test/CodeGen/AArch64/GlobalISel/reg-bank-128bit.mir so that the test is executed only when global-isel is enabled. lit.local.cfg under test/CodeGen/AArch64/GlobalISel checks if 'global-isel' is in the available_features while the same file under test/CodeGen/AArch64 doesn't. Reviewers: qcolombet, davide Reviewed By: davide Subscribers: davide, aemerson, javed.absar, igorb, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D36282 llvm-svn: 309985	2017-08-03 21:07:12 +00:00
Connor Abbott	82267a553d	test commit llvm-svn: 309981	2017-08-03 20:22:30 +00:00
Simon Pilgrim	99d0ab38bc	[X86] Adding a test for vector shuffle extractions. When both the vector inputs of the shuffle vector is comprising of same vector or shuffle mask is accessing elements from only one operand vector (like in PR33758 test already present). Committed on behalf of @jbhateja (Jatin Bhateja) Differential Revision: https://reviews.llvm.org/D36271 llvm-svn: 309963	2017-08-03 17:04:59 +00:00
Simon Pilgrim	0dfd4fa8d3	[X86][AVX512] Tidied up v64i8 vector shuffle tests with triple llvm-svn: 309961	2017-08-03 16:56:52 +00:00
Changpeng Fang	ef4dbb46da	AMDGPU/SI: Don't fix a PHI under uniform branch in SIFixSGPRCopies only when sources and destination are all sgprs Summary: If a PHI has at lease one VGPR operand, we have to fix the PHI in SIFixSGPRCopies. Reviewer: Matt Differential Revision: http://reviews.llvm.org/D34727 llvm-svn: 309959	2017-08-03 16:37:02 +00:00
Nirav Dave	3fc1c2365c	[DAG] Allow merging of stores of vector loads Remove restriction disallowing merging of stores vector loads into larger store of larger vector load. Reviewers: RKSimon, efriedma, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36158 llvm-svn: 309951	2017-08-03 15:51:20 +00:00
Nico Weber	bfde70b097	Revert r309923, it caused PR34045. llvm-svn: 309950	2017-08-03 15:41:26 +00:00
Simon Dardis	51296593a8	[SelectionDAG] Resolve PR33978. rL306209 taught SelectionDAG how to add the dereferenceable flag when expanding memcpy and memmove. The fix however contained a nit where the offset + size was constructed as an APInt of PointerSize rather than PointerSizeInBits. This lead to isDereferenceableAndAlignedPointer() get truncated values or values which would be sign extended within that function leading to incorrect results. Thanks to Alex Crichton for reporting the issue! This resolves PR33978. Reviewers: inouehrs Differential Revision: https://reviews.llvm.org/D36236 llvm-svn: 309930	2017-08-03 09:38:46 +00:00
Diana Picus	930e6ec8f3	[ARM] GlobalISel: Select simple G_GLOBAL_VALUE instructions Add support in the instruction selector for G_GLOBAL_VALUE for ELF and MachO for the static relocation model. We don't handle Windows yet because that's Thumb-only, and we don't handle Thumb in general at the moment. Support for PIC, ROPI, RWPI and TLS will be added in subsequent commits. Differential Revision: https://reviews.llvm.org/D35883 llvm-svn: 309927	2017-08-03 09:14:59 +00:00
Dinar Temirbulatov	a0beedef1c	[X86] SET0 to use XMM registers where possible PR26018 PR32862 Differential Revision: https://reviews.llvm.org/D35965 llvm-svn: 309926	2017-08-03 08:50:18 +00:00
Roger Ferrer Ibanez	854980341b	[ARM] Use ADDCARRY / SUBCARRY This patch: - makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32 - lowering is done by first converting the boolean value into the carry flag using (_, C) <- (ARMISD::ADDC R, -1) and converted back to an integer value using (R, _) <- (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two operations does the actual addition. - for subtraction, given that ISD::SUBCARRY second result is actually a borrow, we need to invert the value of the second operand and result before and after using ARMISD::SUBE. We need to invert the carry result of ARMISD::SUBE to preserve the semantics. - given that the generic combiner may lower ISD::ADDCARRY and ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering as well otherwise i64 operations now would require branches. This implies updating the corresponding test for unsigned. - add new combiner to remove the redundant conversions from/to carry flags to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) -> C Differential Revision: https://reviews.llvm.org/D35192 llvm-svn: 309923	2017-08-03 07:45:10 +00:00
Rafael Espindola	79e238afee	Delete Default and JITDefault code models IMHO it is an antipattern to have a enum value that is Default. At any given piece of code it is not clear if we have to handle Default or if has already been mapped to a concrete value. In this case in particular, only the target can do the mapping and it is nice to make sure it is always done. This deletes the two default enum values of CodeModel and uses an explicit Optional<CodeModel> when it is possible that it is unspecified. llvm-svn: 309911	2017-08-03 02:16:21 +00:00
Tom Stellard	3337d74399	AMDGPU/GlobalISel: Mark 32-bit G_FMUL as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D36218 llvm-svn: 309898	2017-08-02 22:56:30 +00:00
Stefan Pintilie	873889ca16	[Power9] Exploit vector absolute difference instructions on Power 9 Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW) for byte, halfword and word. We should take advantage of these. Differential Revision: https://reviews.llvm.org/D34684 llvm-svn: 309876	2017-08-02 20:07:21 +00:00
Evandro Menezes	3840db527b	[AArch64] Add Exynos M2 feature test (NFC) Test fusion of AES operations. llvm-svn: 309855	2017-08-02 18:55:34 +00:00
Evandro Menezes	89a67b8353	[AArch64] Improve the test of conditional branch fusion Separate the checking of the fused pairings with B.cc and CBcc. llvm-svn: 309825	2017-08-02 15:34:06 +00:00
Diana Picus	d5a00b0ff6	[MIR] Print target-specific constant pools This should enable us to test the generation of target-specific constant pools, e.g. for ARM: constants: - id: 0 value: 'g(GOT_PREL)-(LPC0+8-.)' alignment: 4 isTargetSpecific: true I intend to use this to test PIC support in GlobalISel for ARM. This is difficult to test outside of that context, since the existing MIR tests usually rely on parser support as well, and that seems a bit trickier to add. We could try to add a unit test, but the setup for that seems rather convoluted and overkill. We do test however that the parser reports a nice error when encountering a target-specific constant pool. Differential Revision: https://reviews.llvm.org/D36092 llvm-svn: 309806	2017-08-02 11:09:30 +00:00
Matt Arsenault	8e8f8f43b0	AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it llvm-svn: 309783	2017-08-02 01:52:45 +00:00
Matt Arsenault	1d6317c3ad	AMDGPU: Fix emitting encoded calls This was failing on out of bounds access to the extra operands on the s_swappc_b64 beyond those in the instruction definition. This was working, but somehow regressed within the past few weeks, although I don't see any obvious commit. llvm-svn: 309782	2017-08-02 01:42:04 +00:00
Matt Arsenault	6ed7b9bfc0	AMDGPU: Analyze callee resource usage in AsmPrinter llvm-svn: 309781	2017-08-02 01:31:28 +00:00
Matt Arsenault	d1867c0345	AMDGPU: Don't place arguments in emergency stack slot When finding the fixed offsets for function arguments, this needs to skip over the 4 bytes reserved for the emergency stack slot. llvm-svn: 309776	2017-08-02 00:59:51 +00:00
Matt Arsenault	acc5e82b0e	DAG: Undo and->or combine with FrameIndexes This pattern shows up when lowering byval copies on AMDGPU. The byval object access is split into 4-byte chunks, adding a constant offset to the FixedStack base. When some of the offsets turn into ors, this prevents combining the constant offsets. This makes it not apparent that the object is there when matching addressing modes, so it ends up using a scratch wave offset relative access and the lengthy frame index expansion for that. llvm-svn: 309775	2017-08-02 00:43:42 +00:00
Matthias Braun	6b898beb8e	X86: Do not use llc -march in tests. `llc -march` is problematic because it only switches the target architecture, but leaves the operating system unchanged. This occasionally leads to indeterministic tests because the OS from LLVM_DEFAULT_TARGET_TRIPLE is used. However we can simply always use `llc -mtriple` instead. This changes all the tests to do this to avoid people using -march when they copy and paste parts of tests. See also the discussion in https://reviews.llvm.org/D35287 llvm-svn: 309774	2017-08-02 00:28:10 +00:00
Stanislav Mekhanoshin	da0edef1bd	[AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused With SI_END_CF elimination for some nested control flow we can now eliminate saved exec register completely by turning a saveexec version of instruction into just a logical instruction. Differential Revision: https://reviews.llvm.org/D36007 llvm-svn: 309766	2017-08-01 23:44:35 +00:00
Stanislav Mekhanoshin	37e7f959c0	[AMDGPU] Collapse adjacent SI_END_CF Add a pass to remove redundant S_OR_B64 instructions enabling lanes in the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any vector instructions between them we can only keep outer SI_END_CF, given that CFG is structured and exec bits of the outer end statement are always not less than exec bit of the inner one. This needs to be done before the RA to eliminate saved exec bits registers but after register coalescer to have no vector registers copies in between of different end cf statements. Differential Revision: https://reviews.llvm.org/D35967 llvm-svn: 309762	2017-08-01 23:14:32 +00:00
Matthias Braun	7c91336efe	ARM: Do not use llc -march in tests. `llc -march` is problematic because it only switches the target architecture, but leaves the operating system unchanged. This occasionally leads to indeterministic tests because the OS from LLVM_DEFAULT_TARGET_TRIPLE is used. However we can simply always use `llc -mtriple` instead. This changes all the tests to do this to avoid people using -march when they copy and paste parts of tests. See also the discussion in https://reviews.llvm.org/D35287 llvm-svn: 309755	2017-08-01 22:20:49 +00:00
Matthias Braun	e2d2ce9ff1	PowerPC: Do not use llc -march in tests. `llc -march` is problematic because it only switches the target architecture, but leaves the operating system unchanged. This occasionally leads to indeterministic tests because the OS from LLVM_DEFAULT_TARGET_TRIPLE is used. However we can simply always use `llc -mtriple` instead. This changes all the tests to do this to avoid people using -march when they copy and paste parts of tests. This patch: - Removes -march if the .ll file already has a matching `target triple` directive or -mtriple argument. - In all other cases changes -march=ppc32/-march=ppc64 to -mtriple=ppc32--/-mtriple=ppc64-- See also the discussion in https://reviews.llvm.org/D35287 llvm-svn: 309754	2017-08-01 22:20:41 +00:00
Adrian Prantl	032d2381bf	Remove PrologEpilogInserter's usage of DBG_VALUE's offset field In the last half-dozen commits to LLVM I removed code that became dead after removing the offset parameter from llvm.dbg.value gradually proceeding from IR towards the backend. Before I can move on to DwarfDebug and friends there is one last side-called offset I need to remove: This patch modifies PrologEpilogInserter's use of the DBG_VALUE's offset argument to use a DIExpression instead. Because the PrologEpilogInserter runs at the Machine level I had to play a little trick with a named llvm.dbg.mir node to get the DIExpressions to print in MIR dumps (which print the llvm::Module followed by the MachineFunction dump). I also had to add rudimentary DwarfExpression support to CodeView and as a side-effect also fixed a bug (CodeViewDebug::collectVariableInfo was supposed to give up on variables with complex DIExpressions, but would fail to do so for fragments, which are also modeled as DIExpressions). With this last holdover removed we will have only one canonical way of representing offsets to debug locations which will simplify the code in DwarfDebug (and future versions of CodeViewDebug once it starts handling more complex expressions) and make it easier to reason about. This patch is NFC-ish: All test case changes are for assembler comments and the binary output does not change. rdar://problem/33580047 Differential Revision: https://reviews.llvm.org/D36125 llvm-svn: 309751	2017-08-01 21:45:24 +00:00
Martin Storsjo	eacf4e408b	[AArch64] Rewrite stack frame handling for win64 vararg functions The previous attempt, which made do with a single offset in computeCalleeSaveRegisterPairs, wasn't quite enough. The previous attempt only worked as long as CombineSPBump == true (since the offset would be adjusted later in fixupCalleeSaveRestoreStackOffset). Instead include the size for the fixed stack area used for win64 varargs in calculations in emitPrologue/emitEpilogue. The stack consists of mainly three parts; - AFI->getLocalStackSize() - AFI->getCalleeSavedStackSize() - FixedObject Most of the places in the code which previously used the CSStackSize now use PrologueSaveSize instead, which is the sum of the latter two, while some cases which need exactly the middle one use AFI->getCalleeSavedStackSize() explicitly instead of a local variable. In addition to moving the offsetting into emitPrologue/emitEpilogue (which fixes functions with CombineSPBump == false), also set the frame pointer to point to the right location, where the frame pointer and link register actually are stored. In addition to the prologue/epilogue, this also requires changes to resolveFrameIndexReference. Add tests for a function that keeps a frame pointer and another one that uses a VLA. Differential Revision: https://reviews.llvm.org/D35919 llvm-svn: 309744	2017-08-01 21:13:54 +00:00
Matt Arsenault	206f826348	AMDGPU: Fix handling of div_scale with undef inputs The src0 register must match src1 or src2, but if these were undefined they could end up using different implicit_defed virtual registers. Force these to use one undef vreg or pick the defined other register. Also fixes producing invalid nodes without the right number of inputs when src2 is undef. llvm-svn: 309743	2017-08-01 20:49:41 +00:00
Matt Arsenault	a5fcb83bfa	AMDGPU: Add test for r308774 llvm-svn: 309733	2017-08-01 19:54:58 +00:00
Matt Arsenault	b62a4eb524	AMDGPU: Initial implementation of calls Includes a hack to fix the type selected for the GlobalAddress of the function, which will be fixed by changing the default datalayout to use generic pointers for 0. llvm-svn: 309732	2017-08-01 19:54:18 +00:00
Simon Pilgrim	f0a5d09b8d	[X86][SSE3] Add scheduler tests for MONITOR/MWAIT llvm-svn: 309718	2017-08-01 18:16:44 +00:00
Nirav Dave	27a6605bdc	Revert "[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector." This reverts commit r309680 which appears to be raising an assertion in the test-suite. llvm-svn: 309717	2017-08-01 18:09:25 +00:00
Simon Pilgrim	486072d3d6	[X86][SSE] Added missing vector logic intrinsic schedules Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431). Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them. llvm-svn: 309715	2017-08-01 17:51:20 +00:00
Sanjay Patel	4dbdd470a8	[CGP] use narrower types in memcmp expansion when possible This only affects very small memcmp on x86 for now, but it will become more important if we allow vector-sized load and compares. llvm-svn: 309711	2017-08-01 17:24:54 +00:00
Craig Topper	2a5bba7325	[X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D36129 llvm-svn: 309706	2017-08-01 17:18:14 +00:00
Simon Pilgrim	3f24ff6130	[X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431). Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class llvm-svn: 309701	2017-08-01 16:47:48 +00:00
Craig Topper	e2cc589283	[X86] Split bmi.ll into a bmi test and a bmi2 test. This moves all the bmi2 specific intrinsics to a separate test file and adds a bmi1 only command line to the existing bmi test. This will allow us to see the missed opportunity to use bextr to handle 64-bit 'and' with a large mask. This will be improved in an upcoming patch. llvm-svn: 309700	2017-08-01 16:45:11 +00:00
Simon Pilgrim	810677eba2	[X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules llvm-svn: 309699	2017-08-01 16:18:25 +00:00
Manoj Gupta	7ebbe655ed	[X86] Fix a crash in FEntryInserter Pass. Summary: FEntryInserter pass unconditionally derefs the first Instruction in the first Basic Block. The pass crashes when the first BasicBlock is empty. Fix the crash by not dereferencing the basic Block iterator. This fixes an issue observed when building Linux kernel 4.4 with clang. Fixes PR33971. Reviewers: hfinkel, niravd, dblaikie Reviewed By: niravd Subscribers: davide, llvm-commits Differential Revision: https://reviews.llvm.org/D35979 llvm-svn: 309694	2017-08-01 15:39:12 +00:00
Craig Topper	2462a713ae	[AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32. We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though. This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need. This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch. Differential Revision: https://reviews.llvm.org/D35978 llvm-svn: 309693	2017-08-01 15:31:24 +00:00
Simon Pilgrim	78b305f8d5	[X86][SSSE3] Fix typos in pabsw/pmulhrsw tests for load folding scheduling. llvm-svn: 309692	2017-08-01 15:31:24 +00:00
Simon Pilgrim	8484698321	[X86] Added missing cpu to fix generic scheduling model tests llvm-svn: 309691	2017-08-01 15:14:35 +00:00
Nirav Dave	b5cb48c6ae	[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector. Summary: Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally improves vector shuffle computations. Reviewers: efriedma, RKSimon, spatel Subscribers: javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35566 llvm-svn: 309680	2017-08-01 13:45:35 +00:00
Strahinja Petrovic	a2b4748bdc	[Mips] Fix for BBIT octeon instruction This patch enables control flow optimization for variations of BBIT instruction. In this case optimization removes unnecessary branch after BBIT instruction. Differential Revision: https://reviews.llvm.org/D35359 llvm-svn: 309679	2017-08-01 13:42:45 +00:00
Krzysztof Parzyszek	91ff5c6d47	[Hexagon] Convert HVX vector constants of i1 to i8 Certain operations require vector of i1 values. However, for Hexagon architecture compatibility, they need to be represented as vector of i8. Patch by Suyog Sarda. llvm-svn: 309677	2017-08-01 13:12:53 +00:00
Simon Pilgrim	4e0b41450d	[X86] Regenerate big structure return test and check on x86_64 as well. llvm-svn: 309676	2017-08-01 13:12:15 +00:00
Tom Stellard	9d8337d857	AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D35916 llvm-svn: 309675	2017-08-01 12:38:33 +00:00
Andrew V. Tischenko	d56595184b	Support itineraries in TargetSubtargetInfo::getSchedInfoStr - Now if the given instr does not have sched model then we try to calculate the latecy/throughput with help of itineraries. Differential Revision https://reviews.llvm.org/D35997 llvm-svn: 309666	2017-08-01 09:15:43 +00:00
Eli Friedman	37f41d1e7c	[ScheduleDAG] Don't schedule node with physical register interference https://reviews.llvm.org/D31536 didn't really solve the problem it was trying to solve; it got rid of the assertion failure, but we were still scheduling the DAG incorrectly (mixing together instructions from different calls), leading to a MachineVerifier failure. In order to schedule the DAG correctly, we have to make sure we don't schedule a node which should be blocked by an interference. Fix ScheduleDAGRRList::PickNodeToScheduleBottomUp so it doesn't pick a node like that. The added call to FindAvailableNode() is the key change here; this makes sure we don't try to schedule a call while we're in the middle of scheduling a different call. I'm not sure this is the right approach; in particular, I'm not sure how to prove we don't end up with an infinite loop of repeatedly backtracking. This also reverts the code change from D31536. It doesn't do anything useful: we should never schedule an ADJCALLSTACKDOWN unless we've already scheduled the corresponding ADJCALLSTACKUP. Differential Revision: https://reviews.llvm.org/D33818 llvm-svn: 309642	2017-08-01 00:28:40 +00:00
Craig Topper	410d252f5b	[AVX-512] Add unmasked subvector inserts and extract to the execution domain tables. llvm-svn: 309632	2017-07-31 22:07:29 +00:00
Craig Topper	7dd26785e7	[AVX512] Add a common prefix to avx512-insert-extract.ll so we can reduce the number of check lines on some test cases. This was pointed out during the review for D313804. llvm-svn: 309629	2017-07-31 21:20:06 +00:00
Craig Topper	043c44efff	[AVX-512] Use AVX512 as test check prefix instead of AVX3. NFC llvm-svn: 309625	2017-07-31 20:58:06 +00:00
Konstantin Belochapka	b77d0a5cf1	[X86][MMX] Added custom lowering action for MMX SELECT (PR30418) Fix for pr30418 - error in backend: Cannot select: t17: x86mmx = select_cc t2, Constant:i64<0>, t7, t8, seteq:ch Differential Revision: https://reviews.llvm.org/D34661 llvm-svn: 309614	2017-07-31 20:11:49 +00:00
Quentin Colombet	15f6ffbf7c	[TargetPassConfig] Feature generic options to setup start/stop-after/before This patch refactors the code used in llc such that all the users of the addPassesToEmitFile API have access to a homogeneous way of handling start/stop-after/before options right out of the box. In particular, just invoking addPassesToEmitFile will set the proper pipeline without additional effort (modulo parsing a .mir file if the start-before/after options are used. NFC. Differential Revision: https://reviews.llvm.org/D30913 llvm-svn: 309599	2017-07-31 18:24:07 +00:00
Sanjay Patel	fea731a4aa	[CGP] use subtract or subtract-of-cmps for result of memcmp expansion As noted in the code comment, transforming this in the other direction might require a separate transform here in CGP given the block-at-a-time DAG constraint. Besides that theoretical motivation, there are 2 practical motivations for the subtract-of-cmps form: 1. The codegen for both x86 and PPC is better for this IR (though PPC could be better still). There is discussion about canonicalizing IR to the select form ( http://lists.llvm.org/pipermail/llvm-dev/2017-July/114885.html ), so we probably need to add DAG transforms for those patterns anyway, but this improves the memcmp output without waiting for that step. 2. If we allow vector-sized chunks for the load and compare, x86 is better prepared to convert that to optimal code when using subtract-of-cmps, so another prerequisite patch is avoided if we choose to enable that. Differential Revision: https://reviews.llvm.org/D34904 llvm-svn: 309597	2017-07-31 18:08:24 +00:00
Craig Topper	cb0e74975a	[AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead. These were taking priority over the aligned load instructions since there is no vmovda8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use. But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output has avx/avx2 in some cases. I also generally dislike patterns rooted in a bitcast which these were. Differential Revision: https://reviews.llvm.org/D35977 llvm-svn: 309589	2017-07-31 17:35:44 +00:00
Aditya Nandakumar	02c602e18c	[GISel]: Support Widening G_ICMP's destination operand. Updated AArch64 to widen destination to s32. https://reviews.llvm.org/D35737 Reviewed by Tim llvm-svn: 309579	2017-07-31 17:00:16 +00:00
Simon Pilgrim	11ea6fdcd7	[X86] Extending a test cases for LEA factorization. Submitted on the behalf of Jatin Bhateja Differential Revision: https://reviews.llvm.org/D36048 llvm-svn: 309565	2017-07-31 14:23:28 +00:00
Simon Dardis	8930b4a049	[SelectionDAG][mips] Fix PR33883 PR33883 shows that calls to intrinsic functions should not have their vector arguments or returns subject to ABI changes required by the target. This resolves PR33883. Thanks to Alex Crichton for reporting the issue! Reviewers: zoran.jovanovic, atanasyan Differential Revision: https://reviews.llvm.org/D35765 llvm-svn: 309561	2017-07-31 14:06:58 +00:00
Guy Blank	b169d56dc3	[X86][AVX512] Add masked MOVS[S\|D] patterns Added patterns to recognize AND 1 on the mask of a scalar masked move is not needed since only the lower bit is relevant for the instruction. Differential Revision: https://reviews.llvm.org/D35897 llvm-svn: 309546	2017-07-31 08:26:14 +00:00
Craig Topper	97e9fa7954	[X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a load involved. We already had a pattern without load, but with a load we were falling back to a regular 'and' due to pattern complexity priority. llvm-svn: 309535	2017-07-31 05:55:54 +00:00
Michael Zuckerman	11148120d0	Expanding the test case for vf8 for stride 4 interleaved. llvm-svn: 309511	2017-07-30 11:54:57 +00:00
Florian Hahn	f63a5e91db	[AArch64] Tie source and destination operands for AESMC/AESIMC. Summary: Most CPUs implementing AES fusion require instruction pairs of the form AESE Vn, _ AESMC Vn, Vn and AESD Vn, _ AESIMC Vn, Vn The constraint is added to AES(I)MC instructions which use the result of an AES(E\|D) instruction by using AES(I)MCTrr pseudo instructions, which constraint source and destination registers to be the same. A nice side effect of this change is that now all possible pairs are scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll test case. I had to update aes_load_store. The version I added initially was very reduced and with the new constraint, AESE/AESMC could not be scheduled back-to-back. I updated the test to be more realistic and still expose the same scheduling problem as the initial test case. Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga Reviewed By: t.p.northover, evandro Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D35299 llvm-svn: 309495	2017-07-29 20:35:28 +00:00
Florian Hahn	2f86e3d494	[AArch64] Use 8 bytes as preferred function alignment on Cortex-A53. Summary: This change gives a 0.25% speedup on execution time, a 0.82% improvement in benchmark scores and a 0.20% increase in binary size on a Cortex-A53. These numbers are the geomean results on a wide range of benchmarks from the test-suite and a range of proprietary suites. Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin Reviewed By: rengolin Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35568 llvm-svn: 309494	2017-07-29 20:04:54 +00:00
Simon Pilgrim	718cb0ea62	[SelectionDAG][X86] CombineBT - more aggressively determine demanded bits This patch is in 2 parts: 1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT. 2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match. Differential Revision: https://reviews.llvm.org/D35896 llvm-svn: 309486	2017-07-29 14:50:25 +00:00
Matt Arsenault	9608a2891d	AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat Checking the encoding is insufficient since now there can be global or scratch instructions. llvm-svn: 309472	2017-07-29 01:26:21 +00:00
Matt Arsenault	dc8f5cc39c	AMDGPU: Teach isLegalAddressingMode about global_* instructions Also refine the flat check to respect flat-for-global feature, and constant fallback should check global handling, not specifically MUBUF. llvm-svn: 309471	2017-07-29 01:12:31 +00:00
Matt Arsenault	4e309b0861	AMDGPU: Start selecting global instructions llvm-svn: 309470	2017-07-29 01:03:53 +00:00
Farhana Aleen	7ccc9fd0e2	Added tests for i8 interleaved-load-pattern of stride=4, VF=(8, 16, 32). llvm-svn: 309447	2017-07-28 22:43:34 +00:00
Adrian Prantl	abe04759a6	Remove the obsolete offset parameter from @llvm.dbg.value There is no situation where this rarely-used argument cannot be substituted with a DIExpression and removing it allows us to simplify the DWARF backend. Note that this patch does not yet remove any of the newly dead code. rdar://problem/33580047 Differential Revision: https://reviews.llvm.org/D35951 llvm-svn: 309426	2017-07-28 20:21:02 +00:00
Reid Kleckner	9be82c3169	Fix conditional tail call branch folding when both edges are the same The conditional tail call logic did the wrong thing when both destinations of a conditional branch were the same: BB#1: derived from LLVM BB %entry Live Ins: %EFLAGS Predecessors according to CFG: BB#0 JE_1 <BB#5>, %EFLAGS<imp-use,kill> JMP_1 <BB#5> BB#5: derived from LLVM BB %sw.epilog Predecessors according to CFG: BB#1 TCRETURNdi64 <ga:@mergeable_conditional_tailcall>, 0, ... We would fold the JE_1 to a TCRETURNdi64cc, and then remove our BB#5 successor. Then BB#5 would be deleted as it had no predecessors, leaving a dangling "JMP_1 <BB#5>" reference behind to cause assertions later. This patch checks that both conditional branch destinations are different before doing the transform. The standard branch folding logic is able to remove both the JMP_1 and the JE_1, and for my test case we end up forming a better conditional tail call later. Fixes PR33980 llvm-svn: 309422	2017-07-28 19:48:40 +00:00
Matt Arsenault	da9ab148f3	AMDGPU: Look through a bitcast user of an out argument This allows handling of a lot more of the interesting cases in Blender. Most of the large functions unlikely to be inlined have this pattern. This is a special case for what clang emits for OpenCL 3 element vectors. Annoyingly, these are emitted as <3 x elt>* pointers, but accessed as <4 x elt>* operations. This also needs to handle cases where a struct containing a single vector is used. llvm-svn: 309419	2017-07-28 19:06:16 +00:00
Matt Arsenault	c06574ffc0	AMDGPU: Add pass to replace out arguments It is better to return arguments directly in registers if we are making a call rather than introducing expensive stack usage. In one of sample compile from one of Blender's many kernel variants, this fires on about ~20 different functions. Future improvements may be to recognize simple cases where the pointer is indexing a small array. This also fails when the store to the out argument is in a separate block from the return, which happens in a few of the Blender functions. This should also probably be using MemorySSA which might help with that. I'm not sure this is correct as a FunctionPass, but MemoryDependenceAnalysis seems to not work with a ModulePass. I'm also not sure where it should run.I think it should run before DeadArgumentElimination, so maybe either EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate. llvm-svn: 309416	2017-07-28 18:40:05 +00:00
Tim Northover	a7f583e33b	GlobalISel: map 128-bit values to an FPR by default. Eventually we may want to allow a pair of GPRs but absolutely nothing in the entire world is ready for that yet. llvm-svn: 309404	2017-07-28 17:11:01 +00:00
Matt Arsenault	9166ce86e8	AMDGPU: Annotate implicitarg.ptr usage We need to pass something to functions for this to work. It isn't derivable just from the kernarg segment pointer because the implicit arguments are placed after the kernel arguments. Also fixes missing test for the intrinsic. llvm-svn: 309398	2017-07-28 15:52:08 +00:00
Strahinja Petrovic	25e9e1b866	[ARM] Add the option to directly access TLS pointer This patch enables choice for accessing thread local storage pointer (like '-mtp' in gcc). Differential Revision: https://reviews.llvm.org/D34408 llvm-svn: 309381	2017-07-28 12:54:57 +00:00
Simon Pilgrim	1ff3da7273	[X86] Add test case for PR33290 llvm-svn: 309375	2017-07-28 09:43:52 +00:00
Simon Pilgrim	88d3bed351	[X86][AVX] Cleanup shuffle combine tests - remove old prefixes. llvm-svn: 309374	2017-07-28 09:41:55 +00:00
Peter Smith	5804364f7a	[ARM] Add test to check pcs of ARM ABI runtime floating point helpers The ARM Runtime ABI document (IHI0043) defines the AEABI floating point helper functions in section 4.1.2 The floating-point helper functions. The functions listed in this section must always use the base AAPCS calling convention. This test generates calls to all the helper functions that llvm supports and checks that the base AAPCS calling convention has been used. We test the equivalent of -mfloat-abi=soft, -mfloat-abi=softfp, -mfloat-abi=hardfp with an FPU that supports single and double precision, and one that only supports double precision. Differential Revision: https://reviews.llvm.org/D35904 llvm-svn: 309371	2017-07-28 09:21:00 +00:00
Matthias Braun	c618a466f1	ARMFrameLowering: Only set ExtraCSSpill for actually unused registers. The code assumed that unclobbered/unspilled callee saved registers are unused in the function. This is not true for callee saved registers that are also used to pass parameters such as swiftself. rdar://33401922 llvm-svn: 309350	2017-07-28 01:36:32 +00:00
Reid Kleckner	07a5d4372e	[X86] Fix latent bug in sibcall eligibility logic The X86 tail call eligibility logic was correct when it was written, but the addition of inalloca and argument copy elision broke its assumptions. It was assuming that fixed stack objects were immutable. Currently, we aim to emit a tail call if no arguments have to be re-arranged in memory. This code would trace the outgoing argument values back to check if they are loads from an incoming stack object. If the stack argument is immutable, then we won't need to store it back to the stack when we tail call. Fortunately, stack objects track their mutability, so we can just make the obvious check to fix the bug. This was http://crbug.com/749826 llvm-svn: 309343	2017-07-28 00:58:35 +00:00
Ahmed Bougacha	87807c5a86	[AArch64] Fix legality info passed to demanded bits for TBI opt. The (seldom-used) TBI-aware optimization had a typo lying dormant since it was first introduced, in r252573: when asking for demanded bits, it told TLI that it was running after legalize, where the opposite was true. This is an important piece of information, that the demanded bits analysis uses to make assumptions about the node. r301019 added such an assumption, which was broken by the TBI combine. Instead, pass the correct flags to TLO. llvm-svn: 309323	2017-07-27 21:27:25 +00:00

... 2 3 4 5 6 ...

21353 Commits