llvm-project

Commit Graph

Author	SHA1	Message	Date
Max Kazantsev	a63d333881	[SCEV] Add one more case in computeConstantDifference This patch teaches `computeConstantDifference` handle calculation of constant difference between `(X + C1)` and `(X + C2)` which is `(C2 - C1)`. Differential Revision: https://reviews.llvm.org/D43759 Reviewed By: anna llvm-svn: 328609	2018-03-27 04:54:00 +00:00
Mircea Trofin	56ba71b2a7	Revert "Revert "[lit] Generalized /dev/null support on Windows."" Summary: This reverts commit r328596. Checking if the arguments are strings before testing if they contain "/dev/null". Reviewers: rnk Reviewed By: rnk Subscribers: delcypher, llvm-commits Differential Revision: https://reviews.llvm.org/D44914 llvm-svn: 328603	2018-03-27 01:39:17 +00:00
Sanjay Patel	15f7df9f44	[x86] add RUN for target before roundss; NFC llvm-svn: 328601	2018-03-27 00:32:19 +00:00
Mircea Trofin	373c445c24	Revert "[lit] Generalized /dev/null support on Windows." This reverts commit ca7fdbb974384ce5a05528b22a41d46b1cc13e92. llvm-svn: 328596	2018-03-26 23:59:39 +00:00
Sanjay Patel	8653776367	[x86] add tests for ftrunc; NFC llvm-svn: 328592	2018-03-26 23:18:32 +00:00
Mircea Trofin	88911686c8	[lit] Generalized /dev/null support on Windows. Generalized /dev/null remapping on Windows, and added test. Reviewers: rnk Reviewed By: rnk Subscribers: amccarth, zturner, delcypher, llvm-commits Differential Revision: https://reviews.llvm.org/D44771 llvm-svn: 328589	2018-03-26 22:41:06 +00:00
Simon Pilgrim	f6440b6fb1	Fix newlines. NFCI. llvm-svn: 328583	2018-03-26 21:07:59 +00:00
Simon Pilgrim	28e7bcbba6	[X86] Add WriteCRC32 scheduler class Currently CRC32 instructions use the WriteFAdd class, this patch splits them off into their own, at the moment it is still mostly just a duplicate of WriteFAdd but it can now be tweaked on a target by target basis. Differential Revision: https://reviews.llvm.org/D44647 llvm-svn: 328582	2018-03-26 21:06:14 +00:00
Rafael Espindola	78fdca3cd5	Use local symbols for creating .stack-size. llvm-svn: 328581	2018-03-26 20:40:22 +00:00
Reid Kleckner	2fe905cfee	Fix go bindings test when using goma distributed build tool Goma[1] is a distributed build system similar to distcc and icecc primarily used to compile Chromium. The client is open source, and hopefully soon the server will be as well. The intended usage model is similar to most distributed build systems: prefix gomacc onto your compiler command line, and it transparently distributes compilation. The go lit config wants to determine the host compiler binary, so it needs some extra logic to avoid looking at these prefixes. [1] https://chromium.googlesource.com/infra/goma/client/ llvm-svn: 328580	2018-03-26 20:19:14 +00:00
Eli Friedman	88e2bac94d	[MemorySSA] Fix exponential compile-time updating MemorySSA. MemorySSAUpdater::getPreviousDefRecursive is a recursive algorithm, for each block, it computes the previous definition for each predecessor, then takes those definitions and combines them. But currently it doesn't remember results which it already computed; this means it can visit the same block multiple times, which adds up to exponential time overall. To fix this, this patch adds a cache. If we computed the result for a block already, we don't need to visit it again because we'll come up with the same result. Well, unless we RAUW a MemoryPHI; in that case, the TrackingVH will be updated automatically. This matches the original source paper for this algorithm. The testcase isn't really a test for the bug, but it adds coverage for the case where tryRemoveTrivialPhi erases an existing PHI node. (It's hard to write a good regression test for a performance issue.) Differential Revision: https://reviews.llvm.org/D44715 llvm-svn: 328577	2018-03-26 19:52:54 +00:00
Simon Pilgrim	fcf49df21c	[X86][Btver2] Add (U)COMISD/(U)COMISD scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) llvm-svn: 328573	2018-03-26 19:01:06 +00:00
Haicheng Wu	b45f921678	[SLP] Add more checks to a test case. NFC. llvm-svn: 328572	2018-03-26 18:59:28 +00:00
Reid Kleckner	41fb2dba9c	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 Summary: Re-lands r328386 and r328443, reverting r328482. Incorporates fixes from @mstorsjo in D44876 (thanks!) so that small parameters in i8 and i16 do not end up in the SysV register parameters (EDI, ESI, etc). I added tests for how we receive small parameters, since that is the important part. It's always safe to store more bytes than will be read, but the assumptions you make when loading them are what really matter. I also tested this by self-hosting clang and it passed tests on win64. Reviewers: mstorsjo, hans Subscribers: hiraditya, mstorsjo, llvm-commits Differential Revision: https://reviews.llvm.org/D44900 llvm-svn: 328570	2018-03-26 18:49:48 +00:00
Simon Pilgrim	f33d905293	[X86] Add WriteBitScan/WriteLZCNT/WriteTZCNT/WritePOPCNT scheduler classes (PR36881) Give the bit count instructions their own scheduler classes instead of forcing them into existing classes. These were mostly overridden anyway, but I had to add in costs from Agner for silvermont and znver1 and the Fam16h SoG for btver2 (Jaguar). Differential Revision: https://reviews.llvm.org/D44879 llvm-svn: 328566	2018-03-26 18:19:28 +00:00
Krzysztof Parzyszek	5488deb1ab	[Hexagon] Add more lit tests llvm-svn: 328561	2018-03-26 17:53:48 +00:00
Lei Huang	be0afb0870	[Power9]Legalize and emit code for quad-precision convert from double-precision Legalize and emit code for quad-precision floating point operation xscvdpqp and add option to guard the quad precision operation support. Differential Revision: https://reviews.llvm.org/D44746 llvm-svn: 328558	2018-03-26 17:46:25 +00:00
Stefan Pintilie	26d4f923c4	[PowerPC] Infrastructure work. Implement getting the opcode for a spill in one place. A new function getOpcodeForSpill should now be the only place to get the opcode for a given spilled register. Differential Revision: https://reviews.llvm.org/D43086 llvm-svn: 328556	2018-03-26 17:39:18 +00:00
Tim Corringham	7116e8963d	[AMDGPU] Improve disassembler error handling Summary: llvm-objdump now disassembles unrecognised opcodes as data, using the .long directive. We treat unrecognised opcodes as being 32 bit values, so move along 4 bytes rather than the single byte which previously resulted in a cascade of bogus disassembly following an unrecognised opcode. While no solution can always disassemble code that contains embedded data correctly this provides a significant improvement. The disassembler will now cope with an arbitrary length section as it no longer truncates it to a multiple of 4 bytes, and will use the .byte directive for trailing bytes. Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D44685 llvm-svn: 328553	2018-03-26 17:06:33 +00:00
Simon Pilgrim	86ea53123d	[X86][Btver2] Add CVTSI2SD/CVTSI2SS scheduler costs We still need to account for how Jaguar passes data from GPR -> XMM, which isn't as clean as XMM -> GPR..... llvm-svn: 328551	2018-03-26 17:02:02 +00:00
Krzysztof Parzyszek	9f041b1830	[Pipeliner] Add missing loop carried dependences The pipeliner is not adding a dependence edge for a loop carried dependence, and ends up scheduling a load from iteration n prior to an aliased store in iteration n-1. The code that adds the loop carried dependences in the pipeliner doesn't check if the memory objects for loads and stores are "identified" (i.e., distinct) objects. If they are not, then the code that adds the dependences needs to be conservative. The objects can be used to check dependences only when they are distinct objects. The code that checks for loop carried dependences has been updated to classify loads and stores that are not identified as "unknown" values. A store with an "unknown" value can potentially create a loop carried dependence with any pending load. Patch by Brendon Cahoon. llvm-svn: 328547	2018-03-26 16:50:11 +00:00
Haicheng Wu	0ec1dbe417	[SLP] Add a test case. NFC. llvm-svn: 328546	2018-03-26 16:47:37 +00:00
Krzysztof Parzyszek	a212204453	[Pipeliner] Use latency to compute RecMII The patch contains severals changes needed to pipeline an example that was transformed so that a Phi with a subreg is converted to copies. The pipeliner wasn't working for a couple of reasons. - The RecMII was 3 instead of 2 due to the extra copies. - Copy instructions contained a latency of 1. - The node order algorithm was not choosing the best "bottom" node, which caused an instruction to be scheduled that had a predecessor and successor already scheduled. - Updated the Hexagon Machine Scheduler to check if the node is latency bound when adding the cost for a 0-latency dependence. The RecMII was 3 because the computation looks at the number of nodes in the recurrence. The extra copy is an extra node but it shouldn't increase the latency. The new RecMII computation looks at the latency of the instructions in the recurrence. We changed the latency of the dependence of a copy to 0. The latency computation for the copy also checks the use of the copy (similar to a reg_sequence). The node order algorithm was not choosing the last instruction in the recurrence for a bottom up traversal. This was when the last instruction is a copy. A check was added when choosing the instruction to check for NodeNum if the maxASAP is the same. This means that the scheduler will not end up with another node in the recurrence that has both a predecessor and successor already scheduled. The cost computation in Hexagon Machine Scheduler adds cost when an instruction can be packetized with a zero-latency instruction. We should only do this if the schedule is latency bound. Patch by Brendon Cahoon. llvm-svn: 328542	2018-03-26 16:33:16 +00:00
Simon Pilgrim	8815105cd5	[X86][Btver2] Add CVTSD2SS/CVTSS2SD scheduler costs llvm-svn: 328541	2018-03-26 16:24:13 +00:00
Krzysztof Parzyszek	f13bbf1d58	[Pipeliner] Fix assert caused by pipeliner serialization The pipeliner is asserting because the serialization step that occurs at the end is deleting an instruction. The assert occurs later on because there is a use without a definition. The problem occurs when an instruction defines a value used by a REQ_SEQUENCE and that value is used by a COPY instruction. The latencies between these instructions are zero, so they are put in to the same packet. The serialization code is unable to handle this correctly, and ends up putting the REG_SEQUENCE before its definition. There is special code in the serialization step that attempts to handle zero-cost instructions (phis, copy, reg_sequence) differently than regular instructions. Unfortunately, this means the order does not come out correct. This patch simplifies the code by changing the seperate steps for handling zero-cost and regular instructions. Only phis are handled separate now, since they should occurs first. Then, this patch adds checks to make use the MoveUse is set to the smallest value if there are multiple uses in a cycle. Patch by Brendon Cahoon. llvm-svn: 328540	2018-03-26 16:23:29 +00:00
Sebastian Pop	d870aea03e	[InstCombine] reassociate loop invariant GEP chains to enable LICM This change brings performance of zlib up by 10%. The example below is from a hot loop in longest_match() from zlib. do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 %idx.ext1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 -1 In this example %idx.ext1 is a loop invariant. It will be moved above the use of loop induction variable %idx.ext such that it can be hoisted out of the loop by LICM. The operands that have dependences carried by the loop will be sinked down in the GEP chain. This patch will produce the following output: do.body: %cur_match.addr.0 = phi i32 [ %cur_match, %entry ], [ %2, %do.cond ] %idx.ext = zext i32 %cur_match.addr.0 to i64 %add.ptr = getelementptr inbounds i8, i8* %win, i64 %idx.ext1 %add.ptr2 = getelementptr inbounds i8, i8* %add.ptr, i64 -1 %add.ptr3 = getelementptr inbounds i8, i8* %add.ptr2, i64 %idx.ext llvm-svn: 328539	2018-03-26 16:19:31 +00:00
Simon Pilgrim	aa40148cae	[X86][Btver2] Account for the "+i" integer pipe transfer costs (1cy use of JALU0 for GPR PRF write) llvm-svn: 328536	2018-03-26 16:10:08 +00:00
Krzysztof Parzyszek	8e1363df4e	[Pipeliner] Fix check for order dependences when finalizing instructions The code in orderDepdences that looks at the order dependences between instructions was processing all the successor and predecessor order dependences. However, we really only want to check for an order dependence for instructions scheduled in the same cycle. Also, fixed how the pipeliner handles output dependences. An output dependence is also a potential loop carried dependence. The pipeliner didn't handle this case properly so an invalid schedule could be created that allowed an output dependence to be scheduled in the next iteration at the same cycle. Patch by Brendon Cahoon. llvm-svn: 328516	2018-03-26 16:05:55 +00:00
Krzysztof Parzyszek	3a0a15afe7	[Pipeliner] Fix in the pipeliner phi reuse code When the definition of a phi is used by a phi in the next iteration, the pipeliner was assuming that the definition is processed first. Because of the assumption, an incorrect phi name was used. This patch has a check to see if the phi definition has been processed already. Patch by Brendon Cahoon. llvm-svn: 328510	2018-03-26 15:58:16 +00:00
Krzysztof Parzyszek	785b6cec11	[Pipeliner] Correctly update memoperands in the epilog The pipeliner needs to be conservative when updating the memoperands of instructions in the epilog. Previously, the pipeliner was changing the offset of the memoperand based upon the scheduling stage. However, that is incorrect when control flow branches around the kernel code. The bug enabled a load and store to the same stack offset to be swapped. This patch fixes the bug by updating the size of the memoperands to be UINT_MAX. This conservative value means that dependences will be created between other loads and stores. Patch by Brendon Cahoon. llvm-svn: 328508	2018-03-26 15:45:55 +00:00
Krzysztof Parzyszek	56f0fc4716	[Hexagon] Give priority to post-incremementing memory accesses in LSR llvm-svn: 328506	2018-03-26 15:32:03 +00:00
Simon Pilgrim	0b73b29388	[X86][Btver2] Add CVTSD2SI/CVTSS2SI scheduler costs Account for the "+i" integer pipe transfer cost (1cy use of JALU0 for GPR PRF write) This also adds missing vcvttss2si tests llvm-svn: 328505	2018-03-26 15:30:47 +00:00
Sanjay Patel	4fd4fd610c	[InstCombine] distribute fmul over fadd/fsub This replaces a large chunk of code that was looking for compound patterns that include these sub-patterns. Existing tests ensure that all of the previous examples are still folded as expected. We still need to loosen the FMF check. llvm-svn: 328502	2018-03-26 15:03:57 +00:00
Simon Pilgrim	3aa9344605	[X86][Btver2] Fix YMM BLENDPD/BLENDPS + UNPCKPD/UNPCKP instructions costs These should match the YMM MOVDUP/ PERMILPD/PERMILPS + SHUFPD/SHUFPS shuffles instead of using the WriteFShuffle defaults. llvm-svn: 328501	2018-03-26 14:44:24 +00:00
Andrea Di Biagio	5ffd2c3cfc	[llvm-mca] Fix how views are added to the InstructionTables. This should fix the stack-use-after-scope reported by the asan buildbots after revision 328493. llvm-svn: 328499	2018-03-26 14:25:52 +00:00
Sanjay Patel	2455fef497	[InstCombine] check uses before creating instructions for fmul distribution As the tests show, we could create extra instructions without any obvious benefit. llvm-svn: 328498	2018-03-26 14:25:43 +00:00
Simon Pilgrim	67df1cf597	[X86][Btver2] Add (V)SQRTPD/(V)SQRTSD costs The xmm sd/pd versions were using the WriteFSQRT default which is modelled on sqrtss/sqrtps llvm-svn: 328497	2018-03-26 14:03:40 +00:00
Andrea Di Biagio	ff9c1092b7	[llvm-mca] Add a flag -instruction-info to enable/disable the instruction info view. llvm-svn: 328493	2018-03-26 13:44:54 +00:00
Simon Pilgrim	caa203aed5	[X86][Btver2] Double the AGU and schedule pipe resources for YMM Both the AGUs and schedule pipes are double pumped for 256-bit instructions as well as the functional units which we already model. llvm-svn: 328491	2018-03-26 13:15:20 +00:00
Andrea Di Biagio	d1569290ef	[llvm-mca] Add flag -instruction-tables to print the theoretical resource pressure distribution for instructions (PR36874) The goal of this patch is to address most of PR36874. To fully fix PR36874 we need to split the "InstructionInfo" view from the "SummaryView". That would make easy to check the latency and rthroughput as well. The patch reuses all the logic from ResourcePressureView to print out the "instruction tables". We have an entry for every instruction in the input sequence. Each entry reports the theoretical resource pressure distribution. Resource pressure is uniformly distributed across all the processor resource units of a group. At the moment, the backend pipeline is not configurable, so the only way to fix this is by creating a different driver that simply sends instruction events to the resource pressure view. That means, we don't use the Backend interface. Instead, it is simpler to just have a different code-path for when flag -instruction-tables is specified. Once Clement addresses bug 36663, then we can port the "instruction tables" logic into a stage of our configurable pipeline. Updated the BtVer2 test cases (thanks Simon for the help). Now we pass flag -instruction-tables to each modified test. Differential Revision: https://reviews.llvm.org/D44839 llvm-svn: 328487	2018-03-26 12:04:53 +00:00
Carlos Alberto Enciso	5bf31e1e63	Test commit - adding a new line. llvm-svn: 328484	2018-03-26 11:38:01 +00:00
Max Kazantsev	a55749312b	[LoopUnroll] Fix dangling pointers in SCEV Current logic of loop SCEV invalidation in Loop Unroller implicitly relies on fact that exit count of outer loops cannot rely on exiting blocks of inner loops, which is true in current implementation of backedge taken count calculation but is wrong in general. As result, when we only forget the loop that we have just unrolled, we may still have cached data for its outer loops (in particular, exit counts) which keeps references on blocks of inner loop that could have been changed or even deleted. The attached test demonstrates a situaton when after unrolling of innermost loop the outermost loop contains a dangling pointer on non-existant block. The problem shows up when we apply patch https://reviews.llvm.org/D44677 that makes SCEV smarter about exit count calculation. I am not sure if the bug exists without this patch, it appears that now it is accidentally correct just because in practice exact backedge taken count for outer loops with complex control flow inside is never calculated. But when SCEV learns to do so, this problem shows up. This patch replaces existing logic of SCEV loop invalidation with a correct one, which happens to be invalidation of outermost loop (which also leads to invalidation of all loops inside of it). It is the only way to ensure that no outer loop keeps dangling pointers on removed blocks, or just outdated information that has changed after unrolling. Differential Revision: https://reviews.llvm.org/D44818 Reviewed By: samparker llvm-svn: 328483	2018-03-26 11:31:46 +00:00
Hans Wennborg	311b63f13b	Revert r328386 "[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32" This broke Chromium (see crbug.com/825748). It looks like mstorsjo's follow-up patch at D44876 fixes this, but let's revert back to green for now until that's ready to land. (Also reverts r328443.) > Both GCC and MSVC only look at the low byte of a boolean when it is > passed. llvm-svn: 328482	2018-03-26 10:07:51 +00:00
Benjamin Kramer	8840f644b4	[DeadArgElim] Strip allocsize attributes when deleting an argument. Since allocsize refers to the argument number it gets invalidated when an argument is removed and the numbers shift. llvm-svn: 328481	2018-03-26 09:44:24 +00:00
Sam Parker	53a423a417	[IRCE] Enable increasing loops of variable bounds CanBeMin is currently used which will report true for any unknown values, but often a check is performed outside the loop which covers this situation: for (int i = 0; i < N; ++i) ... if (N > 0) for (int i = 0; i < N; ++i) ... So I've add 'LoopGuardedAgainstMin' which reports whether N is greater than the minimum value which then allows loop with a variable loop count to be optimised. I've also moved the increasing bound checking into its own function and replaced SumCanReachMax is another isLoopEntryGuardedByCond function. llvm-svn: 328480	2018-03-26 09:29:42 +00:00
Craig Topper	6f28d3c954	[X86] Fix the SchedRW for intrinsic register form of SQRT/RCP/RSQRT. llvm-svn: 328474	2018-03-26 05:05:12 +00:00
Craig Topper	cdfcf8ecda	[X86] Merge the SSE and AVX versions of fp divs and sqrts in the SandyBridge/Haswell/Broadwell/Skylake scheduler models. I've used Agner's data as best I could to get the values to converge on. llvm-svn: 328473	2018-03-26 05:05:10 +00:00
Craig Topper	fbf2d850e3	[X86] Add itinerary to intrinsic version of sqrtss, rcpss, and rsqrtss instructions. llvm-svn: 328472	2018-03-26 04:20:36 +00:00
Craig Topper	659f85af14	[X86] Swap the itineraries on the memory and register forms of CVTDQ2PD. They were backwards. llvm-svn: 328469	2018-03-26 02:17:13 +00:00
Craig Topper	15fef89ad9	[X86] Move (v)movss to port 5 only for Skylake. Move (v)movups/d to port 015 for Skylake. This matches Agner's data and is consistent with what the EVEX instructions were doing on SKX. llvm-svn: 328465	2018-03-25 23:40:56 +00:00
Sanjay Patel	93e64dd9a1	[PatternMatch] allow undef elements when matching vector FP +0.0 This continues the FP constant pattern matching improvements from: https://reviews.llvm.org/rL327627 https://reviews.llvm.org/rL327339 https://reviews.llvm.org/rL327307 Several integer constant matchers also have this ability. I'm separating matching of integer/pointer null from FP positive zero and renaming/commenting to make the functionality clearer. llvm-svn: 328461	2018-03-25 21:16:33 +00:00
Sanjay Patel	c84b48ec29	[InstSimplify, InstCombine] add/update tests with FP +0.0 vector with undef; NFC llvm-svn: 328455	2018-03-25 17:48:20 +00:00
Craig Topper	a985919d3e	[X86] Update cost model for Goldmont. Add fsqrt costs for Silvermont Add fdiv costs for Goldmont using table 16-17 of the Intel Optimization Manual. Also add overrides for FSQRT for Goldmont and Silvermont. Reviewers: RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D44644 llvm-svn: 328451	2018-03-25 15:58:12 +00:00
Sanjay Patel	e94a85533f	[InstCombine] adjust test comments; NFC llvm-svn: 328450	2018-03-25 14:24:32 +00:00
Sanjay Patel	60d0e119c7	[InstCombine] consolidate casted icmp vector tests We have thorough coverage of predicates and scalar types, so we just need a sampling of vector tests to show that things are working or not with vectors types. llvm-svn: 328449	2018-03-25 14:19:25 +00:00
Sanjay Patel	841aac04d4	[InstCombine] peek through more icmp of FP cast + bitcast This is an extension of rL328426 as noted in D44367. llvm-svn: 328448	2018-03-25 14:01:42 +00:00
Mandeep Singh Grang	98bc25a0f2	[RISCV] Use init_array instead of ctors for RISCV target, by default Summary: LLVM defaults to the newer .init_array/.fini_array scheme for static constructors rather than the less desirable .ctors/.dtors (the UseCtors flag defaults to false). This wasn't being respected in the RISC-V backend because it fails to call TargetLoweringObjectFileELF::InitializeELF with the the appropriate flag for UseInitArray. This patch fixes this by implementing RISCVELFTargetObjectFile and overriding its Initialize method to call InitializeELF(TM.Options.UseInitArray). Reviewers: asb, apazos Reviewed By: asb Subscribers: mgorny, rbar, johnrusso, simoncook, jordy.potman.lists, sabuasal, niosHD, kito-cheng, shiva0217, llvm-commits Differential Revision: https://reviews.llvm.org/D44750 llvm-svn: 328433	2018-03-24 18:37:19 +00:00
Sanjay Patel	745a9c62c2	[InstCombine] peek through FP casts for sign-bit compares (PR36682) This pattern came up in PR36682: https://bugs.llvm.org/show_bug.cgi?id=36682 https://godbolt.org/g/LhuD9A Equality checks are planned as a follow-up enhancement. Differential Revision: https://reviews.llvm.org/D44367 llvm-svn: 328426	2018-03-24 15:45:02 +00:00
Simon Pilgrim	913345f8f5	[X86][AES] Ensure we're testing both non-VEX/VEX variants of AES instructions on AVX targets Add skylake server tests as well llvm-svn: 328424	2018-03-24 15:05:12 +00:00
Simon Pilgrim	91fe24b8cf	[X86][SSE] Ensure we're testing both non-VEX/VEX variants of SSE instructions on AVX targets And ensure we don't use later instruction sets in SSE schedule tests llvm-svn: 328423	2018-03-24 14:51:52 +00:00
Sanjay Patel	6a176d527c	[InstCombine] add multi-use/vector tests for intrinsic shrinking; NFC llvm-svn: 328422	2018-03-24 14:45:41 +00:00
Simon Pilgrim	f7d0f7e6db	[X86][AVX1] Ensure we don't use later instruction sets in AVX1 schedule tests llvm-svn: 328421	2018-03-24 13:47:48 +00:00
Simon Pilgrim	d2016f95fb	[X86][AVX2] Ensure we don't use later instruction sets in AVX2 schedule tests llvm-svn: 328420	2018-03-24 13:47:01 +00:00
Eric Christopher	40eb34607c	Add REQUIRES lines for the targets being checked in this test. llvm-svn: 328408	2018-03-24 02:56:58 +00:00
Craig Topper	2c0a62ab9a	[X86] Add a DAG combine to simplify PMULDQ/PMULUDQ nodes These nodes only use the lower 32 bits of their inputs so we can use SimplifyDemandedBits to simplify them. Differential Revision: https://reviews.llvm.org/D44375 llvm-svn: 328405	2018-03-24 01:52:01 +00:00
Eric Christopher	fe6e6d93d9	Allow FDE references outside the +/-2GB range supported by PC relative offsets for code models other than small/medium. For JIT application, memory layout is less controlled and can result in truncations otherwise. Patch based on one by Olexa Bilaniuk! llvm-svn: 328400	2018-03-24 00:07:38 +00:00
Reid Kleckner	e27b410661	[X86] Fix Windows `i1 zeroext` conventions to use i8 instead of i32 Both GCC and MSVC only look at the low byte of a boolean when it is passed. llvm-svn: 328386	2018-03-23 23:38:53 +00:00
Fedor Sergeev	6660fd0f95	[PM][FunctionAttrs] add NoUnwind attribute inference to PostOrderFunctionAttrs pass Summary: This was motivated by absence of PrunEH functionality in new PM. It was decided that a proper way to do PruneEH is to add NoUnwind inference into PostOrderFunctionAttrs and then perform normal SimplifyCFG on top. This change generalizes attribute handling implemented for (a removal of) Convergent attribute, by introducing a generic builder-like class AttributeInferer It registers all the attribute inference requests, storing per-attribute predicates into a vector, and then goes through an SCC Node, scanning all the instructions for not breaking attribute assumptions. The main idea is that as soon all the instructions from all the functions of SCC Node conform to attribute assumptions then we are free to infer the attribute as set for all the functions of SCC Node. It handles two distinct cases of attributes: - those that might break due to derefinement of the function code for these attributes we are allowed to apply inference only if all the functions are "exact definitions". Example - NoUnwind. - those that do not care about derefinement for these attributes we are allowed to apply inference as soon as we see any function definition. Example - removal of Convergent attribute. Also in this commit: * Converted all the FunctionAttrs tests to use FileCheck and added new-PM invocations to them * FunctionAttrs/convergent.ll test demonstrates a difference in behavior between new and old PM implementations. Marked with FIXME. * PruneEH tests were converted to new-PM as well, using function-attrs+simplify-cfg combo as intended * some of "other" tests were updated since function-attrs now infers 'nounwind' even for old PM pipeline * -disable-nounwind-inference hidden option added as a possible workaround for a supposedly rare case when nounwind being inferred by default presents a problem Reviewers: chandlerc, jlebar Reviewed By: jlebar Subscribers: eraman, llvm-commits Differential Revision: https://reviews.llvm.org/D44415 llvm-svn: 328377	2018-03-23 21:46:16 +00:00
Sanjay Patel	9b09fe9f66	[InstCombine] increase test coverage for intrinsic shrinking; NFC There were no tests with vector types before this. llvm-svn: 328371	2018-03-23 21:13:53 +00:00
Krzysztof Parzyszek	bcf0a96f9e	[Hexagon] Boost profit for word-mask immediates, reduce for others This avoids unnecessary splitting due to uninteresting immediates. llvm-svn: 328364	2018-03-23 20:11:00 +00:00
Krzysztof Parzyszek	e247526cc9	[Hexagon] Fold offset in base+immediate loads/stores Optimize Ry = add(Rx,#n); memw(Ry+#0) = Rz => memw(Rx,#n) = Rz. Patch by Jyotsna Verma. llvm-svn: 328355	2018-03-23 19:30:34 +00:00
Tony Tye	88441a3d1e	[AMDGPU] Update OpenCL to use 48 bytes of implicit arguments for AMDGPU Add two additional implicit arguments for OpenCL for the AMDGPU target using the AMDHSA runtime to support device enqueue. Differential Revision: https://reviews.llvm.org/D44697 llvm-svn: 328351	2018-03-23 18:58:47 +00:00
Tony Tye	7a893d4e34	[AMDGPU] Remove use of OpenCL triple environment and replace with function attribute for AMDGPU - Remove use of the opencl and amdopencl environment member of the target triple for the AMDGPU target. - Use function attribute to communicate to the AMDGPU backend to add implicit arguments for OpenCL kernels for the AMDHSA OS. Differential Revision: https://reviews.llvm.org/D43736 llvm-svn: 328349	2018-03-23 18:45:18 +00:00
Krzysztof Parzyszek	5f7ba9a74c	[Hexagon] Always generate mux out of predicated transfers if possible HexagonGenMux would collapse pairs of predicated transfers if it assumed that the predicated .new forms cannot be created. Turns out that generating mux is preferable in almost all cases. Introduce an option -hexagon-gen-mux-threshold that controls the minimum distance between the instruction defining the predicate and the later of the two transfers. If the distance is closer than the threshold, mux will not be generated. Set the threshold to 0 by default. llvm-svn: 328346	2018-03-23 18:43:09 +00:00
Krzysztof Parzyszek	80f10e4fe5	[Hexagon] Avoid early if-conversion for one sided branches Patch by Anand Kodnani. llvm-svn: 328344	2018-03-23 18:00:18 +00:00
Simon Pilgrim	6c63e6c222	[X86][Btver2] Cleanup TEST instructions to use JFPA (+JFPX on ymms) function unit llvm-svn: 328343	2018-03-23 17:59:22 +00:00
Alex Shlyapnikov	83e7841419	[HWASan] Port HWASan to Linux x86-64 (LLVM) Summary: Porting HWASan to Linux x86-64, first of the three patches, LLVM part. The approach is similar to ARM case, trap signal is used to communicate memory tag check failure. int3 instruction is used to generate a signal, access parameters are stored in nop [eax + offset] instruction immediately following the int3 one. One notable difference is that x86-64 has to untag the pointer before use due to the lack of feature comparable to ARM's TBI (Top Byte Ignore). Reviewers: eugenis Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D44699 llvm-svn: 328342	2018-03-23 17:57:54 +00:00
Ana Pazos	41573804f2	[ARM] Fix "Constant pool entry out of range!" in Thumb1 mode This patch fixes PR36658, "Constant pool entry out of range!" in Thumb1 mode. In ARMConstantIslands::optimizeThumb2JumpTables() in Thumb1 mode, adjustBBOffsetsAfter() is not calculating postOffset correctly by properly accounting for the padding that is required for the constant pool that immediately follows the jump table branch instruction. Reviewers: t.p.northover, eli.friedman Reviewed By: t.p.northover Subscribers: chrib, tstellar, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D44709 llvm-svn: 328341	2018-03-23 17:53:27 +00:00
Krzysztof Parzyszek	570c6440cd	[Hexagon] Two fixes in early if-conversion - Fix checking for vector predicate registers. - Avoid speculating llvm.lifetime.end intrinsic. Patch by Harsha Jagasia and Brendon Cahoon. llvm-svn: 328339	2018-03-23 17:46:09 +00:00
Simon Pilgrim	e5c0a041ff	[X86][Btver2] Cleanup MOVMSK instructions to use JFPA function unit Add missing non-VEX and (V)PMOVMSKB instructions to the pattern llvm-svn: 328338	2018-03-23 17:38:59 +00:00
Simon Pilgrim	256f149bf0	[X86][Btver2] Vector permutes use a JFPU01 scheduler pipe and JFPX/JVALU function unit llvm-svn: 328331	2018-03-23 16:17:56 +00:00
Sanjay Patel	cd1f3e7a78	[InstCombine] auto-generate checks; NFC llvm-svn: 328329	2018-03-23 15:39:03 +00:00
Simon Pilgrim	ee282b3160	[X86][Btver2] Vector store instructions use a JFPU1 scheduler pipe and JSAGU/JSTC function units llvm-svn: 328328	2018-03-23 15:35:13 +00:00
Sanjay Patel	3547dcb3ae	[InstSimplify] regenerate checks, move tests; NFC llvm-svn: 328327	2018-03-23 15:31:31 +00:00
Zaara Syeda	6535993625	Re-commit: [MachineLICM] Add functions to MachineLICM to hoist invariant stores This patch adds functions to allow MachineLICM to hoist invariant stores. Currently, MachineLICM does not hoist any store instructions, however when storing the same value to a constant spot on the stack, the store instruction should be considered invariant and be hoisted. The function isInvariantStore iterates each operand of the store instruction and checks that each register operand satisfies isCallerPreservedPhysReg. The store may be fed by a copy, which is hoisted by isCopyFeedingInvariantStore. This patch also adds the PowerPC changes needed to consider the stack register as caller preserved. Differential Revision: https://reviews.llvm.org/D40196 llvm-svn: 328326	2018-03-23 15:28:15 +00:00
Sanjay Patel	d189b596ae	[InstCombine] regenerate test checks; NFC llvm-svn: 328325	2018-03-23 15:19:35 +00:00
Simon Pilgrim	1335b9c0ca	[X86][Btver2] Cleanup DPPS/DPPD instructions to use JFPA/JFPM function units llvm-svn: 328324	2018-03-23 15:17:50 +00:00
John Brawn	e3b44f9de6	[AArch64] Don't reduce the width of loads if it prevents combining a shift Loads and stores can only shift the offset register by the size of the value being loaded, but currently the DAGCombiner will reduce the width of the load if it's followed by a trunc making it impossible to later combine the shift. Solve this by implementing shouldReduceLoadWidth for the AArch64 backend and make it prevent the width reduction if this is what would happen, though do allow it if reducing the load width will let us eliminate a later sign or zero extend. Differential Revision: https://reviews.llvm.org/D44794 llvm-svn: 328321	2018-03-23 14:47:07 +00:00
Simon Pilgrim	5792e10ffb	[X86][Btver2] Fix MicroOps counts for DPPS/YMM memory folded instructions This was due to a misunderstanding over what llvm calls a micro-op (retirement unit) is actually called a macro-op on the AMD/Jaguar target. Folded loads don't affect num macro ops. llvm-svn: 328320	2018-03-23 14:45:03 +00:00
Simon Pilgrim	8619962c73	[X86][Btver2] Cleanup SSE42 PCMPISTR/PCMPESTR string instructions to correctly use JFPU1 scheduler pipe followed by JLAGU/JSAGU/JFPA/JVALU function units Fixes throughput to match Agner/Fam16h-SoG as well. llvm-svn: 328318	2018-03-23 14:27:26 +00:00
Matthew Simpson	6c289a1c74	[SLP] Stop counting cost of gather sequences with multiple uses When building the SLP tree, we look for reuse among the vectorized tree entries. However, each gather sequence is represented by a unique tree entry, even though the sequence may be identical to another one. This means, for example, that a gather sequence with two uses will be counted twice when computing the cost of the tree. We should only count the cost of the definition of a gather sequence rather than its uses. During code generation, the redundant gather sequences are emitted, but we optimize them away with CSE. So it looks like this problem just affects the cost model. Differential Revision: https://reviews.llvm.org/D44742 llvm-svn: 328316	2018-03-23 14:18:27 +00:00
Alexey Bataev	bff360865b	[DEBUGINFO] Add flag for DWARF2 to use sections as references. Summary: Some targets does not support labels inside debug sections, but support references in form `section+offset`. Patch adds initial support for this. Reviewers: echristo, probinson, jlebar Subscribers: llvm-commits, JDevlieghere Differential Revision: https://reviews.llvm.org/D43943 llvm-svn: 328314	2018-03-23 13:35:54 +00:00
Christof Douma	4a025cc79d	[ARM] Support float literals under XO When targeting execute-only and fp-armv8, float constants in a compare resulted in instruction selection failures. This is now fixed by using vmov.f32 where possible, otherwise the floating point constant is lowered into a integer constant that is moved into a floating point register. This patch also restores using fpcmp with immediate 0 under fp-armv8. Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443 llvm-svn: 328313	2018-03-23 13:02:03 +00:00
Florian Hahn	f73c3ece7f	Revert r328307: [IPSCCP] Use constant range information for comparisons of parameters. Reverted for now, due to it causing verifier failures. llvm-svn: 328312	2018-03-23 12:49:39 +00:00
Amara Emerson	f542355942	[GlobalISel] Fix legalizer combine to not use illegal input G_EXTRACT. This was being masked because GISel is enabled by default for -O0 and the abort was disabled. Modified test to explicitly enable abort. llvm-svn: 328311	2018-03-23 12:48:57 +00:00
Matthew Simpson	4316a2623a	[test] Allow for optional No-Op Barrier Pass in O0 pipeline llvm-svn: 328310	2018-03-23 12:47:54 +00:00
Simon Pilgrim	2755893834	[X86][SandyBridge] Fix missing comma that was causing string concatenation of 2 instregex entries Found while updating D44687 llvm-svn: 328308	2018-03-23 11:56:38 +00:00
Florian Hahn	b1feec087e	[IPSCCP] Use constant range information for comparisons of parameters. For comparisons with parameters, we can use the ParamState lattice elements which also provide constant range information. This improves the code for PR33253 further and gets us closer to use ValueLatticeElement for all values. Also, as we are using the range information in the solver directly, we do not need tryToReplaceWithConstantRange afterwards anymore. Reviewers: dberlin, mssimpso, davide, efriedma Reviewed By: mssimpso Differential Revision: https://reviews.llvm.org/D43762 llvm-svn: 328307	2018-03-23 11:56:00 +00:00
Simon Pilgrim	a1e3ea01ef	[X86][Btver2] Vector move/load/store instructions use a JFPU01 scheduler pipe and JFPX/JVALU function unit as well as the AGUs llvm-svn: 328304	2018-03-23 11:27:31 +00:00
Florian Hahn	52436a587e	[LoopUnroll] Simplify induction variables after peeling too. Loop peeling also has an impact on the induction variables, so we should benefit from induction variable simplification after peeling too. Reviewers: sanjoy, bogner, mzolotukhin, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D43878 llvm-svn: 328301	2018-03-23 10:38:12 +00:00

1 2 3 4 5 ...

52009 Commits