llvm-project

Commit Graph

Author	SHA1	Message	Date
Petr Hosek	710479cede	[CodeGen][X86] Fuchsia supports sincos* libcalls and sin+cos->sincos optimization Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D35748 llvm-svn: 308854	2017-07-23 22:30:00 +00:00
Craig Topper	07a7d56144	[X86] Add some hasSideEffects=0 flags. llvm-svn: 308835	2017-07-23 03:59:39 +00:00
Craig Topper	6912d7faa3	[X86] Add patterns for memory forms of SARX/SHLX/SHRX with careful complexity adjustment to keep shift by immediate using the legacy instructions. These patterns were only missing to favor using the legacy instructions when the shift was a constant. With careful adjustment of the pattern complexity we can make sure the immediate instructions still have priority over these patterns. llvm-svn: 308834	2017-07-23 03:59:37 +00:00
Craig Topper	abfe380f9a	[X86] Add nopq instruction which is a rex encoded version of nopl for gas compatibility. llvm-svn: 308818	2017-07-22 01:30:53 +00:00
Craig Topper	e88aef4b5f	[X86] Add register form of NOPL and NOPW for assembler/disassembler. Fixes PR32805. llvm-svn: 308817	2017-07-22 01:30:51 +00:00
Farhana Aleen	e4a89a6462	X86InterleaveAccess: A fix for bug33826 Reviewers: DavidKreitzer Differential Revision: https://reviews.llvm.org/D35638 llvm-svn: 308784	2017-07-21 21:35:00 +00:00
Jonas Paulsson	024e319489	[SystemZ, LoopStrengthReduce] This patch makes LSR generate better code for SystemZ in the cases of memory intrinsics, Load->Store pairs or comparison of immediate with memory. In order to achieve this, the following common code changes were made: * New TTI hook: LSRWithInstrQueries(), which defaults to false. Controls if LSR should do instruction-based addressing evaluations by calling isLegalAddressingMode() with the Instruction pointers. * In LoopStrengthReduce: handle address operands of memset, memmove and memcpy as address uses, and call isFoldableMemAccessOffset() for any LSRUse::Address, not just loads or stores. SystemZ changes: * isLSRCostLess() implemented with Insns first, and without ImmCost. * New function supportedAddressingMode() that is a helper for TTI methods looking at Instructions passed via pointers. Review: Ulrich Weigand, Quentin Colombet https://reviews.llvm.org/D35262 https://reviews.llvm.org/D35049 llvm-svn: 308729	2017-07-21 11:59:37 +00:00
Simon Pilgrim	32c377a1cf	[X86][SSE] Add pre-AVX2 support for (i32 bitcast(v32i1)) -> 2xMOVMSK Currently we only support (i32 bitcast(v32i1)) using the AVX2 VPMOVMSKB ymm instruction. This patch adds support for splitting pre-AVX2 targets into 2 x (V)PMOVMSKB xmm instructions and merging the integer results. In future we could probably generalize this to handle more cases. Differential Revision: https://reviews.llvm.org/D35303 llvm-svn: 308723	2017-07-21 09:58:50 +00:00
Craig Topper	31140ade70	[AVX-512] Fix a bug that prevented some non-temporal loads from using the movntdqa instruction. The bitconverts here had an input type of 128-bits and an output type of 256 bits. The input type should also have been 256 bits. llvm-svn: 308702	2017-07-21 00:40:42 +00:00
Craig Topper	27c12e088e	[X86] Allow masks with more than 6 bits set on the x << (y & mask) optimization for the 64-bit memory shifts. llvm-svn: 308657	2017-07-20 19:29:58 +00:00
Craig Topper	33225ef314	[X86] Use SARX/SHLX/SHRX instructions for (shift x (and y, (BitWidth-1))) Fixes PR33841. llvm-svn: 308591	2017-07-20 06:19:55 +00:00
Davide Italiano	5fc5d0a406	[X86] Don't try to scale down if that exceeds the bitwidth. Fixes the crash reported in PR33844. llvm-svn: 308503	2017-07-19 18:09:46 +00:00
Simon Pilgrim	e5c7925c5e	[X86][XOP] Use default AVX2 lowering for v4i64 ashr by splat constants XOP shifts only support 128-bit vectors, so we were ending up with less optimal codegen requiring constants llvm-svn: 308430	2017-07-19 10:29:31 +00:00
Craig Topper	106b5b6856	AMD znver1 Initial Scheduler model Summary: This patch adds the following 1. Adds a skeleton scheduler model for AMD Znver1. 2. Introduces the znver1 execution units and pipes. 3. Caters the instructions based on the generic scheduler classes. 4. Further additions to the scheduler model with instruction itineraries will be carried out incrementally based on a. Instructions types b. Registers used 5. Since itineraries are not added based on instructions, throughput information are bound to change when incremental changes are added. 6. Scheduler testcases are modified accordingly to suit the new model. Patch by Ganesh Gopalasubramanian. With minor formatting tweaks from me. Reviewers: craig.topper, RKSimon Subscribers: javed.absar, shivaram, ddibyend, vprasad Differential Revision: https://reviews.llvm.org/D35293 llvm-svn: 308411	2017-07-19 02:45:14 +00:00
Simon Pilgrim	483927aefb	[x86, CGP] increase memcmp() expansion up to 4 load pairs It should be a win to avoid going out to the system lib for all small memcmp() calls using scalar ops. For x86 32-bit, this means most everything up to 16 bytes. For 64-bit, that doubles because we can do 8-byte loads. Notes: Reduced from 4 to 2 loads for -Os behavior, which might not be optimal in all cases. It's effectively a question of how much do we trust the system implementation. Linux and macOS (and Windows I assume, but did not test) have optimized memcmp() code for x86, so it's probably not bad either way? PPC is using 8/4 for defaults on these. We do not expand at all for -Oz. There are still potential improvements to make for the CGP expansion IR and/or lowering such as avoiding select-of-constants (D34904) and not doing zexts to the max load type before doing a compare. We have special-case SSE/AVX codegen for (memcmp(x, y, 16/32) == 0) that will no longer be produced after this patch. I've shown the experimental justification for that change in PR33329: https://bugs.llvm.org/show_bug.cgi?id=33329#c12 TLDR: While the vector code is a likely winner, we can't guarantee that it's a winner in all cases on all CPUs, so I'm willing to sacrifice it for the greater good of expanding all small memcmp(). If we want to resurrect that codegen, it can be done by adjusting the CGP params or poking a hole to let those fall-through the CGP expansion. Committed on behalf of Sanjay Patel Differential Revision: https://reviews.llvm.org/D35067 llvm-svn: 308322	2017-07-18 15:55:30 +00:00
Craig Topper	f54a500101	[X86] Prevent an assertion failure if a gather intrinsic is passed a non-constant scale value. This isn't legal code, but we shouldn't crash on it. Now we just don't convert the gather intrinsic if the scale isn't constant and let it go through to isel where we'll report an isel failure. Fixes PR33772. llvm-svn: 308267	2017-07-18 06:49:23 +00:00
Martin Storsjo	2f24e93481	[AArch64] Extend CallingConv::X86_64_Win64 to AArch64 as well Rename the enum value from X86_64_Win64 to plain Win64. The symbol exposed in the textual IR is changed from 'x86_64_win64cc' to 'win64cc', but the numeric value is kept, keeping support for old bitcode. Differential Revision: https://reviews.llvm.org/D34474 llvm-svn: 308208	2017-07-17 20:05:19 +00:00
Simon Pilgrim	1cbe8c2ca5	[X86][AVX512] Add lowering of vXi32/vXi64 ISD::ROTL/ISD::ROTR Add support for lowering to ISD::ROTL/ISD::ROTR, including rotate by immediate Differential Revision: https://reviews.llvm.org/D35463 llvm-svn: 308177	2017-07-17 14:11:30 +00:00
Simon Pilgrim	64fff14bde	Strip trailing whitespace. NFCI llvm-svn: 308143	2017-07-16 18:37:23 +00:00
Amjad Aboud	4563c062b1	[X86] X86::CMOV to Branch heuristic based optimization. LLVM compiler recognizes opportunities to transform a branch into IR select instruction(s) - later it will be lowered into X86::CMOV instruction, assuming no other optimization eliminated the SelectInst. However, it is not always profitable to emit X86::CMOV instruction. For example, branch is preferable over an X86::CMOV instruction when: 1. Branch is well predicted 2. Condition operand is expensive, compared to True-value and the False-value operands In CodeGenPrepare pass there is a shallow optimization that tries to convert SelectInst into branch, but it is not enough. This commit implements a machine optimization pass that converts X86::CMOV instruction(s) into branch, based on a conservative heuristic. Differential Revision: https://reviews.llvm.org/D34769 llvm-svn: 308142	2017-07-16 17:39:56 +00:00
Simon Pilgrim	73ef87978f	[X86][SSE4A] Add EXTRQ/INSERTQ values to BTVER2 scheduling model llvm-svn: 308132	2017-07-16 12:06:06 +00:00
Hiroshi Inoue	7f46baff2c	fix typos in comments; NFC llvm-svn: 308127	2017-07-16 08:11:56 +00:00
Eric Christopher	4e332c7cf1	Add a set of comments explaining why getSubtargetImpl() is deleted on these targets. llvm-svn: 307999	2017-07-14 04:33:43 +00:00
Simon Pilgrim	5ee68bcc22	Fix whitespace indentation. NFCI. llvm-svn: 307894	2017-07-13 09:36:04 +00:00
Hiroshi Inoue	e9dea6e613	fix typos in comments and error messages; NFC llvm-svn: 307885	2017-07-13 06:48:39 +00:00
Sanjay Patel	4450e73b5e	[x86] improve SBB optimizations for SETB/SETA with subtract This is another step towards removing a combine that turns sext into select of constants and preparing the backend for an IR future where select is the canonical form. Earlier commits in this area: https://reviews.llvm.org/rL306040 https://reviews.llvm.org/rL306072 https://reviews.llvm.org/rL307404 (https://reviews.llvm.org/D34652) https://reviews.llvm.org/rL307471 llvm-svn: 307821	2017-07-12 17:56:46 +00:00
Davide Italiano	a63981aaa9	[X86/FastIsel] Fall-back to SelectionDAG when lowering soft-floats. FastIsel can't handle them, so we would end up crashing during register class selection. Fixes PR26522. Differential Revision: https://reviews.llvm.org/D35272 llvm-svn: 307797	2017-07-12 15:26:06 +00:00
Rafael Espindola	1beb702ba2	Fully fix the movw/movt addend. The issue is not if the value is pcrel. It is whether we have a relocation or not. If we have a relocation, the static linker will select the upper bits. If we don't have a relocation, we have to do it. llvm-svn: 307730	2017-07-11 23:18:25 +00:00
Konstantin Zhuravlyov	bb80d3e1d3	Enhance synchscope representation OpenCL 2.0 introduces the notion of memory scopes in atomic operations to global and local memory. These scopes restrict how synchronization is achieved, which can result in improved performance. This change extends existing notion of synchronization scopes in LLVM to support arbitrary scopes expressed as target-specific strings, in addition to the already defined scopes (single thread, system). The LLVM IR and MIR syntax for expressing synchronization scopes has changed to use syncscope("<scope>"), where <scope> can be "singlethread" (this replaces singlethread keyword), or a target-specific name. As before, if the scope is not specified, it defaults to CrossThread/System scope. Implementation details: - Mapping from synchronization scope name/string to synchronization scope id is stored in LLVM context; - CrossThread/System and SingleThread scopes are pre-defined to efficiently check for known scopes without comparing strings; - Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in the bitcode. Differential Revision: https://reviews.llvm.org/D21723 llvm-svn: 307722	2017-07-11 22:23:00 +00:00
Igor Breger	324d3791f8	[GlobalISel][X86] Use correct AND instructions. AND8ri8 not supported in 64bit. llvm-svn: 307630	2017-07-11 08:04:51 +00:00
Andrew V. Tischenko	ae9d6db769	[X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 (PR28573). The new version of the model is definitely faster. Differential Revision: https://reviews.llvm.org/D35198 llvm-svn: 307552	2017-07-10 16:36:03 +00:00
Gadi Haber	f4d154c089	This patch completely replaces the scheduling information for the SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target. The SandyBridge architects have provided us with more accurate information about each instruction latency, number of uOPs and used ports and I used it to replace the existing estimated SNB instructions scheduling and to add missing scheduling information. Please note that the patch extensively affects the X86 MC instr scheduling for SNB. Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX. The updated and extended information about each instruction includes the following details: • static latency of the instruction • number of uOps the instruction consists of • all ports used by the instruction's uOPs For example, the following code dictates that instructions, ADC64mr, ADC8mr, SBB64mr, SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use ports 4, ports 2 or 3 and port 0 and ports 0 or 1 or 5: def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> { let Latency = 9; let NumMicroOps = 6; let ResourceCycles = [1,2,2,1]; } def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>; def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>; def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>; def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>; Note that apart from the header, most of the X86SchedSandyBridge.td file was generated by a script. Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb Differential Revision: https://reviews.llvm.org/D35019#inline-304691 llvm-svn: 307529	2017-07-10 09:53:16 +00:00
Igor Breger	d8b51e134e	[GlobalISel][X86] Support G_LOAD/G_STORE i1. Summary: Support G_LOAD/G_STORE i1. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D35178 llvm-svn: 307527	2017-07-10 09:26:09 +00:00
Igor Breger	d48c5e4855	[GlobalISel][X86] extend G_ZEXT support. Summary: Mark G_ZEXT/G_SEXT i1 to i8/i16, i8 to i16 as legal. Support G_ZEXT i1 to i8/i16 instruction selection (C++ code). This patch is required to support G_LOAD/G_STORE i1. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D35177 llvm-svn: 307526	2017-07-10 09:07:34 +00:00
Simon Pilgrim	4050c77d33	[X86] Allow GHC calling convention to use YMM and ZMM registers GHC 8.4 will know how to use YMM and ZMM registers for calls. Submitted on behalf of @bgamari (Ben Gamari) Differential Revision: https://reviews.llvm.org/D34854 llvm-svn: 307504	2017-07-09 16:57:10 +00:00
Sanjay Patel	18ee908ca2	[x86] add SBB optimization for SETBE (ule) condition code x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess with missing optimizations. We handle some patterns, but miss logical variants. To clean that up, we should convert all select-of-constants to logic/math and enhance the combining for the expected patterns from that. Selecting 0 or -1 needs extra attention to produce the optimal code as shown here. Attempt to verify that all of these IR forms are logically equivalent: http://rise4fun.com/Alive/plxs Earlier steps in this series: rL306040 rL306072 rL307404 (D34652) As acknowledged in the earlier review, there's a possibility that some Intel uarch would prefer to produce an xor to clear the fake register operand with sbb %eax, %eax. This will likely need to be addressed in a separate pass. llvm-svn: 307471	2017-07-08 14:04:48 +00:00
Eric Christopher	8737f0650c	Remove a variable that was only used in asserts and had a duplicate copy in something we did use anyhow. llvm-svn: 307457	2017-07-08 01:03:29 +00:00
Sanjay Patel	dd36f75733	[x86] add SBB optimization for SETAE (uge) condition code x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess with missing optimizations. We handle some patterns, but miss logical variants. To clean that up, we should convert all select-of-constants to logic/math and enhance the combining for the expected patterns from that. DAGCombiner already has the foundation to allow the transforms, so we just need to fill in the holes for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the optimal code as shown here. Attempt to verify that all of these IR forms are logically equivalent: http://rise4fun.com/Alive/plxs Earlier steps in this series: rL306040 rL306072 Differential Revision: https://reviews.llvm.org/D34652 llvm-svn: 307404	2017-07-07 14:56:20 +00:00
Simon Pilgrim	8ae7e41bea	Fix spelling in comments. NFCI. llvm-svn: 307288	2017-07-06 18:17:07 +00:00
Simon Pilgrim	713600747e	[X86][SSE4A] Add support for shuffle combining to INSERTQI. llvm-svn: 307268	2017-07-06 15:34:17 +00:00
Simon Pilgrim	7b79fbd4ea	[X86][SSE] combineX86ShuffleChain - merge duplicate creations of integer mask types llvm-svn: 307257	2017-07-06 13:09:19 +00:00
Simon Pilgrim	77ad6d9bb2	[X86][SSE] combineX86ShuffleChain - merge duplicate 'Zeroable' element masks llvm-svn: 307255	2017-07-06 12:40:10 +00:00
Simon Pilgrim	cc0f785dca	[X86][SSE4A] Add support for shuffle combining to EXTRQ. llvm-svn: 307254	2017-07-06 12:22:58 +00:00
Simon Pilgrim	1dd0bd1949	[X86][SSE4A] Split EXTRQ/INSERTQ shuffle matching from lowering. NFCI. First step toward supporting shuffle combining to EXTRQ/INSERTQ. llvm-svn: 307250	2017-07-06 11:06:54 +00:00
Igor Breger	0c979d49eb	[GlobalISel][X86] For now, don't handle non-trivial function argument lowering. llvm-svn: 307142	2017-07-05 11:40:35 +00:00
Igor Breger	9d5571a226	[GlobalISel][X86] Allow graceful fallback for struct/array argument/return value lowering. Going to support it in a follow-up patch. llvm-svn: 307125	2017-07-05 06:24:13 +00:00
Simon Pilgrim	ac3e7f3f57	[X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shuffles With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types llvm-svn: 307099	2017-07-04 18:11:02 +00:00
Simon Pilgrim	f809c5f11c	Fix signed/unsigned comparison warnings llvm-svn: 307098	2017-07-04 17:42:01 +00:00
Simon Pilgrim	9f0a0bd20b	[X86][SSE4A] Generalized EXTRQI/INSERTQI shuffle decodes The existing decodes only worked for v16i8 vectors, this adds support for any 128-bit vector llvm-svn: 307095	2017-07-04 16:53:12 +00:00
Daniel Sanders	6ab0daade8	[globalisel][tablegen] Partially fix compile-time regressions by converting matcher to state-machine(s) Summary: Replace the matcher if-statements for each rule with a state-machine. This significantly reduces compile time, memory allocations, and cumulative memory allocation when compiling AArch64InstructionSelector.cpp.o after r303259 is recommitted. The following patches will expand on this further to fully fix the regressions. Reviewers: rovka, ab, t.p.northover, qcolombet, aditya_nandakumar Reviewed By: ab Subscribers: vitalybuka, aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33758 llvm-svn: 307079	2017-07-04 14:35:06 +00:00
Craig Topper	ad140cfb68	[X86] Add comment string for broadcast loads from the constant pool. Summary: When broadcasting from the constant pool its useful to print out the final vector similar to what we do for normal moves from the constant pool. I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead. Reviewers: spatel, RKSimon, zvi Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34923 llvm-svn: 307062	2017-07-04 05:46:11 +00:00
Craig Topper	a4c5caf67a	[X86] Add RDRAND feature to GLM CPU Summary: I believe this should be supported on GLM since RDSEED is. Reviewers: m_zuckerman, zvi, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34828 llvm-svn: 307060	2017-07-04 05:33:19 +00:00
Simon Pilgrim	fa6e675267	[X86][SSE4A] Add support for combining from EXTRQI/INSERTQI shuffles llvm-svn: 307048	2017-07-03 20:58:16 +00:00
Zvi Rackover	d7a1c334ce	DAGCombine: Combine BUILD_VECTOR to TRUNCATE Summary: Add a combine for creating a truncate to replace a build_vector composed of extracts with indices that form a stride-2^N series. Example: v8i32 V = ... v4i32 build_vector((extract_elt V, 0), (extract_elt V, 2), (extract_elt V, 4), (extract_elt V, 6)) --> v4i32 truncate (bitcast V to v4i64) Related discussion in llvm-dev about canonicalizing shuffles to truncates in LLVM IR: http://lists.llvm.org/pipermail/llvm-dev/2017-January/108936.html. Reviewers: spatel, RKSimon, efriedma, igorb, craig.topper, wolfgangp, delena Reviewed By: delena Subscribers: guyblank, delena, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D34077 llvm-svn: 307036	2017-07-03 15:47:40 +00:00
Igor Breger	5c787ab346	[GlobalISel][X86] fix %ptr(p0) = G_CONSTANT selection. llvm-svn: 307019	2017-07-03 11:06:54 +00:00
Hiroshi Inoue	ddb34d84c9	fix trivial typos in comments; NFC llvm-svn: 307004	2017-07-03 06:32:59 +00:00
Simon Pilgrim	a9655ffb42	[X86][AVX512VPOPCNTDQ] Improve support for v16i8/v8i16/v16i16/ CTPOP Zero extend to v16i32/v8i64, use VPOPCNTDQ instructions and truncate back. llvm-svn: 306990	2017-07-02 19:32:37 +00:00
Simon Pilgrim	8971b2904e	[X86][SSE] Attempt to combine 64-bit and 32-bit shuffles to unary shuffles before bit shifts We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive llvm-svn: 306978	2017-07-02 14:16:25 +00:00
Simon Pilgrim	4cb5613c38	[X86][SSE] Attempt to combine 64-bit and 16-bit shuffles to unary shuffles before bit shifts We are combining shuffles to bit shifts before unary permutes, which means we can't fold loads plus the destination register is destructive The 32-bit shuffles are a bit tricky and will be dealt with in a later patch llvm-svn: 306977	2017-07-02 13:19:10 +00:00
Mohammed Agabaria	eb09a810e6	[X86][CM] update add\sub costs of vectors of 64 in X86\SLM arch this patch updates the cost of addq\subq (add\subtract of vectors of 64bits) based on the performance numbers of SLM arch. Differential Revision: https://reviews.llvm.org/D33983 llvm-svn: 306974	2017-07-02 12:16:15 +00:00
Igor Breger	717bd36c83	[GlobalISel][X86] Support G_GLOBAL_VALUE operation. Summary: Support G_GLOBAL_VALUE operation. For now, most of the PIC configurations are not implemented yet. Reviewers: zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34738 Conflicts: test/CodeGen/X86/GlobalISel/regbankselect-X86_64.mir llvm-svn: 306972	2017-07-02 08:58:29 +00:00
Igor Breger	b186a69aa5	[GlobalISel][X86] Support vector type G_UNMERGE_VALUES selection. Summary: Support vector type G_UNMERGE_VALUES selection. For now G_UNMERGE_VALUES marked as legal for any type, so nothing to do in legalizer. Reviewers: t.p.northover, qcolombet, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33665 llvm-svn: 306971	2017-07-02 08:15:49 +00:00
Hiroshi Inoue	bb703e8960	fix trivial typos; NFC suport -> support llvm-svn: 306968	2017-07-02 03:24:54 +00:00
Quentin Colombet	8cf805ae89	[X86] Move GISel accessor initialization from TargetMachine to Subtarget. NFC llvm-svn: 306921	2017-07-01 00:45:50 +00:00
Eric Christopher	b4fb256574	Make 0 argument getSubtargetImpl functions for the X86, AArch64, and PPC targets deleted so that no one is tempted to use them. llvm-svn: 306864	2017-06-30 19:49:05 +00:00
Simon Pilgrim	724990ab64	[X86][SSE] Pulled common variables to top of matchUnaryPermuteVectorShuffle. NFCI. llvm-svn: 306847	2017-06-30 18:00:14 +00:00
Daniel Jasper	559aa75382	Revert "r306529 - [X86] Correct dwarf unwind information in function epilogue" I am 99% sure that this breaks the PPC ASAN build bot: http://lab.llvm.org:8011/builders/sanitizer-ppc64be-linux/builds/3112/steps/64-bit%20check-asan/logs/stdio If it doesn't go back to green, we can recommit (and fix the original commit message at the same time :) ). llvm-svn: 306676	2017-06-29 13:58:24 +00:00
Igor Breger	0cddd34876	[GlobalISel][X86] Support vector type G_MERGE_VALUES selection. Summary: Support vector type G_MERGE_VALUES selection. For now G_MERGE_VALUES marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: rovka, kristof.beyls, guyblank, llvm-commits Differential Revision: https://reviews.llvm.org/D33958 llvm-svn: 306665	2017-06-29 12:08:28 +00:00
Michael Zuckerman	4bcb9c3349	[LLVM][X86][Goldmont] Adding new target-cpu: Goldmont [LLVM SIDE] Connecting the GoldMont processor to its features. Reviewers: 1. igorb 2. zvi 3. delena 4. RKSimon 5. craig.topper Differential Revision: https://reviews.llvm.org/D34504 llvm-svn: 306658	2017-06-29 10:00:33 +00:00
Rafael Espindola	d926ea2ff7	Reuse existing variables. NFC. llvm-svn: 306586	2017-06-28 19:26:37 +00:00
Rafael Espindola	96367a3d1e	Fix PR33625. We were failing to convert this expression to pcrel. llvm-svn: 306573	2017-06-28 17:56:07 +00:00
Igor Breger	d5b59cf914	[GlobalISel][X86] Support bitwise operations : G_AND, G_OR, G_XOR Summary: Support G_AND, G_OR, G_XOR for i8/i16/i32/i64. Selection done via TableGen'erated code. Reviewers: zvi, guyblank, aymanmus, m_zuckerman Reviewed By: aymanmus Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34605 llvm-svn: 306533	2017-06-28 11:39:04 +00:00
Michael Zuckerman	f66840020c	Reverting commit 306414 on behalf of @gadi.haber llvm-svn: 306532	2017-06-28 11:23:31 +00:00
Petar Jovanovic	7b3a38ec30	[X86] Correct dwarf unwind information in function epilogue CFI instructions that set appropriate cfa offset and cfa register are now inserted in emitEpilogue() in X86FrameLowering. Majority of the changes in this patch: 1. Ensure that CFI instructions do not affect code generation. 2. Enable maintaining correct information about cfa offset and cfa register in a function when basic blocks are reordered, merged, split, duplicated. These changes are target independent and described below. Changed CFI instructions so that they: 1. are duplicable 2. are not counted as instructions when tail duplicating or tail merging 3. can be compared as equal Add information to each MachineBasicBlock about cfa offset and cfa register that are valid at its entry and exit (incoming and outgoing CFI info). Add support for updating this information when basic blocks are merged, split, duplicated, created. Add a verification pass (CFIInfoVerifier) that checks that outgoing cfa offset and register of predecessor blocks match incoming values of their successors. Incoming and outgoing CFI information is used by a late pass (CFIInstrInserter) that corrects CFA calculation rule for a basic block if needed. That means that additional CFI instructions get inserted at basic block beginning to correct the rule for calculating CFA. Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D18046 llvm-svn: 306529	2017-06-28 10:21:17 +00:00
Coby Tayree	41a5b55f50	[X86][AsmParser][MS-compatibility] Binary/Unary operators enhancements Introducing MOD binary operator https://msdn.microsoft.com/en-us/library/hha180wt.aspx Enhancing unary operators NEG and NOT, to support more complex patterns Differential Revision: https://reviews.llvm.org/D33876 llvm-svn: 306425	2017-06-27 16:58:27 +00:00
Gadi Haber	13759a7ed6	Updated and extended the information about each instruction in HSW and SNB to include the following data: • static latency • number of uOps the instruction consists of • all ports used by the instruction Reviewers: RKSimon zvi aymanmus m_zuckerman Differential Revision: https://reviews.llvm.org/D33897 llvm-svn: 306414	2017-06-27 15:05:13 +00:00
Ayman Musa	721d97f7b8	Recommitting rL305465 after fixing bug in TableGen in rL306251 & rL306371 [X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value are less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which then is lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zeroes all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in instructions selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 306402	2017-06-27 12:08:37 +00:00
Galina Kistanova	06a0e0e6a9	Fixed the warning introduced by r306289 to make ubuntu-gcc7.1-werror bot green. llvm-svn: 306369	2017-06-27 06:58:57 +00:00
Tim Northover	c2d5e6d637	AArch64: legalize G_EXTRACT operations. This is the dual problem to legalizing G_INSERTs so most of the code and testing was cribbed from there. llvm-svn: 306328	2017-06-26 20:34:13 +00:00
Marina Yatsina	f58dcb85d2	[inline asm] dot operator while using imm generates wrong ir + asm - llvm part Inline asm dot operator while using imm generates wrong ir and asm This also fixes bugzilla 32987: https://bugs.llvm.org//show_bug.cgi?id=32987 The clang part of the review that contains the test can be found here: https://reviews.llvm.org/D33040 commit on behalf of zizhar Differential Revision: https://reviews.llvm.org/D33039 llvm-svn: 306300	2017-06-26 16:03:42 +00:00
Ahmed Bougacha	58a197414e	[X86][AVX-512] Don't raise inexact in ceil, floor, round, trunc. The non-AVX-512 behavior was changed in r248266 to match N1778 (C bindings for IEEE-754 (2008)), which defined the four functions to not raise the inexact exception ("rint" is still defined as raising it). Update the AVX-512 lowering of these functions to match that: it should not be different. llvm-svn: 306299	2017-06-26 16:00:24 +00:00
Sanjay Patel	15748d239e	[x86] transform vector inc/dec to use -1 constant (PR33483) Convert vector increment or decrement to sub/add with an all-ones constant: add X, <1, 1...> --> sub X, <-1, -1...> sub X, <1, 1...> --> add X, <-1, -1...> The all-ones vector constant can be materialized using a pcmpeq instruction that is commonly recognized as an idiom (has no register dependency), so that's better than loading a splat 1 constant. AVX512 uses 'vpternlogd' for 512-bit vectors because there is apparently no better way to produce 512 one-bits. The general advantages of this lowering are: 1. pcmpeq has lower latency than a memop on every uarch I looked at in Agner's tables, so in theory, this could be better for perf, but... 2. That seems unlikely to affect any OOO implementation, and I can't measure any real perf difference from this transform on Haswell or Jaguar, but... 3. It doesn't look like it from the diffs, but this is an overall size win because we eliminate 16 - 64 constant bytes in the case of a vector load. If we're broadcasting a scalar load (which might itself be a bug), then we're replacing a scalar constant load + broadcast with a single cheap op, so that should always be smaller/better too. 4. This makes the DAG/isel output more consistent - we use pcmpeq already for padd x, -1 and psub x, -1, so we should use that form for +1 too because we can. If there's some reason to favor a constant load on some CPU, let's make the reverse transform for all of these cases (either here in the DAG or in a later machine pass). This should fix: https://bugs.llvm.org/show_bug.cgi?id=33483 Differential Revision: https://reviews.llvm.org/D34336 llvm-svn: 306289	2017-06-26 14:19:26 +00:00
Simon Pilgrim	c338ba48fc	[X86][SSE] Remove unused memopfsf32_128/memopfsf64_128 scalar memops The 'scalar' simd bitops were dropped a while ago llvm-svn: 306248	2017-06-25 17:04:58 +00:00
Simon Pilgrim	bed1fa1ac1	Strip trailing whitespace. NFCI. llvm-svn: 306247	2017-06-25 16:57:46 +00:00
Igor Breger	f5035d6ee5	[GlobalISel][X86] Support vector type G_EXTRACT selection. Summary: Support vector type G_EXTRACT selection. For now G_EXTRACT marked as legal for any type, so nothing to do in legalizer. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33957 llvm-svn: 306240	2017-06-25 11:42:17 +00:00
Dorit Nuzman	e0e0f1ddb0	[AVX2] [TTI CostModel] Add cost of interleaved loads/stores for AVX2 The cost of an interleaved access was only implemented for AVX512. For other X86 targets an overly conservative Base cost was returned, resulting in avoiding vectorization where it is actually profitable to vectorize. This patch starts to add costs for AVX2 for most prominent cases of interleaved accesses (stride 3,4 chars, for now). Note1: Improvements of up to ~4x were observed in some of EEMBC's rgb workloads; There is also a known issue of 15-30% degradations on some of these workloads, associated with an interleaved access followed by type promotion/widening; the resulting shuffle sequence is currently inefficient and will be improved by a series of patches that extend the X86InterleavedAccess pass (such as D34601 and more to follow). Note 2: The costs in this patch do not reflect port pressure penalties which can be very dominant in the case of interleaved accesses since most of the shuffle operations are restricted to a single port. Further tuning, that may incorporate these considerations, will be done on top of the upcoming improved shuffle sequences (that is, along with the abovementioned work to extend X86InterleavedAccess pass). Differential Revision: https://reviews.llvm.org/D34023 llvm-svn: 306238	2017-06-25 08:26:25 +00:00
Rafael Espindola	f351292141	Remove redundant argument. llvm-svn: 306189	2017-06-24 00:26:57 +00:00
Rafael Espindola	801b42de31	ARM: move some logic from processFixupValue to applyFixup. processFixupValue is called on every relaxation iteration. applyFixup is only called once at the very end. applyFixup is then the correct place to do last minute changes and value checks. While here, do proper range checks again for fixup_arm_thumb_bl. We used to do it, but dropped because of thumb2. We now do it again, but use the thumb2 range. llvm-svn: 306177	2017-06-23 22:52:36 +00:00
whitequark	00ede4dcc1	[X86] Fix SP adjustment in stack probes emitted on 32-bit Windows. Commit r306010 adjusted the condition as follows: - if (Is64Bit) { + if (!STI.isTargetWin32()) { The intent was to preserve the behavior on all Windows platforms but extend the behavior on 64-bit Windows platforms to every other one. (Before r306010, emitStackProbeCall only ever executed when emitting code for Windows triples.) Unfortunately, if (Is64Bit && STI.isOSWindows()) is not the same as if (!STI.isTargetWin32()) because of the way isTargetWin32() is defined: bool isTargetWin32() const { return !In64BitMode && (isTargetCygMing() || isTargetKnownWindowsMSVC()); } In practice this broke the JIT tests on 32-bit Windows, which did not satisfy the new condition: LLVM :: ExecutionEngine/MCJIT/2003-01-15-AlignmentTest.ll LLVM :: ExecutionEngine/MCJIT/2003-08-15-AllocaAssertion.ll LLVM :: ExecutionEngine/MCJIT/2003-08-23-RegisterAllocatePhysReg.ll LLVM :: ExecutionEngine/MCJIT/test-loadstore.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-01-15-AlignmentTest.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-08-15-AllocaAssertion.ll LLVM :: ExecutionEngine/OrcMCJIT/2003-08-23-RegisterAllocatePhysReg.ll LLVM :: ExecutionEngine/OrcMCJIT/test-loadstore.ll because %esp was not updated correctly. The failures are only visible on a MSVC 2017 Debug build, for which we do not have bots. llvm-svn: 306142	2017-06-23 18:58:10 +00:00
Sanjay Patel	3de6bad65f	[x86] fix value types for SBB transform (PR33560) I'm not sure yet why this wouldn't fail in the simple case, but clearly I used the wrong value type with: https://reviews.llvm.org/rL306040 ...and the bug manifests with: https://bugs.llvm.org/show_bug.cgi?id=33560 llvm-svn: 306139	2017-06-23 18:42:15 +00:00
Simon Pilgrim	6e85e92b6c	Remove trailing whitespace. NFCI. llvm-svn: 306121	2017-06-23 16:35:32 +00:00
Rafael Espindola	58173b9720	COFF: Produce an error on invalid pcrel relocs. X86_64 COFF only has support for 32 bit pcrel relocations. Produce an error on all others. Note that gnu as has extended the relocation values to support this. It is not clear if we should support the gnu extension. llvm-svn: 306082	2017-06-23 04:07:44 +00:00
Farhana Aleen	9bd593e0d7	Fixed a (product) build error that was due to an unused variable Details: There was a use but it was in the assert which was not exercised during product build. Reviewers: Andrew Kaylor Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306073	2017-06-22 23:56:31 +00:00
Sanjay Patel	359ae44fb4	[x86] add/sub (X==0) --> sbb(cmp X, 1) This is very similar to the transform in: https://reviews.llvm.org/rL306040 ...but in this case, we use cmp X, 1 to set the carry bit as needed. Again, we can show that all of these are logically equivalent (although InstCombine currently canonicalizes to a form not seen here), and if we believe IACA, then this is the smallest/fastest code. Eg, with SNB: | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | | --------------------------------------------------------------------- | 1 | 1.0 | | | | | | | cmp edi, 0x1 | 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax The larger motivation is to clean up all select-of-constants combining/lowering because we're missing some common cases. llvm-svn: 306072	2017-06-22 23:47:15 +00:00
Farhana Aleen	4b652a5335	Supported lowerInterleavedStore() in X86InterleavedAccess. Reviewers: RKSimon, DavidKreitzer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D32658 llvm-svn: 306068	2017-06-22 22:59:04 +00:00
Craig Topper	792fc92be2	[AVX-512] Remove and autoupgrade the masked integer compare intrinsics Summary: These intrinsics aren't used by clang and haven't been for a while. There's some really terrible codegen in the 32-bit target for avx512bw due to i64 not being legal. But as I said these intrinsics aren't used by clang even before this patch so this codegen reflects our clang behavior today. Reviewers: spatel, RKSimon, zvi, igorb Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34389 llvm-svn: 306047	2017-06-22 20:11:01 +00:00
Sanjay Patel	41a34e4111	[x86] add/sub (X==0) --> sbb(neg X) Our handling of select-of-constants is lumpy in IR (https://reviews.llvm.org/D24480), lumpy in DAGCombiner, and lumpy in X86ISelLowering. That's why we only had the 'sbb' codegen in 1 out of the 4 tests. This is a step towards smoothing that out. First, show that all of these IR forms are equivalent: http://rise4fun.com/Alive/mx Second, show that the 'sbb' version is faster/smaller. IACA output for SandyBridge (later Intel and AMD chips are similar based on Agner's tables): This is the "obvious" x86 codegen (what gcc appears to produce currently): | Num Of | Ports pressure in cycles | | | Uops | 0 - DV | 1 | 2 - D | 3 - D | 4 | 5 | | --------------------------------------------------------------------- | 1* | | | | | | | | xor eax, eax | 1 | 1.0 | | | | | | CP | test edi, edi | 1 | | | | | | 1.0 | CP | setnz al | 1 | | 1.0 | | | | | CP | neg eax This is the adc version: | 1* | | | | | | | | xor eax, eax | 1 | 1.0 | | | | | | CP | cmp edi, 0x1 | 2 | | 1.0 | | | | 1.0 | CP | adc eax, 0xffffffff And this is sbb: | 1 | 1.0 | | | | | | | neg edi | 2 | | 1.0 | | | | 1.0 | CP | sbb eax, eax If IACA is trustworthy, then sbb became a single uop in Broadwell, so this will be clearly better than the alternatives going forward. llvm-svn: 306040	2017-06-22 18:11:19 +00:00
Rafael Espindola	8a261c2565	Add a common error checking for some invalid expressions. This refactors a bit of duplicated code and fixes an assertion failure on ELF. llvm-svn: 306035	2017-06-22 17:25:35 +00:00
whitequark	cebe8241ca	[X86] Add support for "probe-stack" attribute This commit adds prologue code emission for stack probe function calls. Reviewed By: majnemer Differential Revision: https://reviews.llvm.org/D34387 llvm-svn: 306010	2017-06-22 15:42:53 +00:00
Igor Breger	1c29be7e4f	[GlobalISel][X86] Support vector type G_INSERT legalization/selection. Summary: Support vector type G_INSERT legalization/selection. Split from https://reviews.llvm.org/D33665 Reviewers: qcolombet, t.p.northover, zvi, guyblank Reviewed By: guyblank Subscribers: guyblank, rovka, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D33956 llvm-svn: 305989	2017-06-22 09:43:35 +00:00
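
Note on 00ede4dcc1 (r306142): the message states that `Is64Bit && STI.isOSWindows()` and `!STI.isTargetWin32()` are not the same condition. The sketch below models only the boolean logic quoted in that message. The `Target` struct and the free functions are hypothetical stand-ins for the X86Subtarget queries, not LLVM's actual API, and the flag combination shown is chosen for illustration rather than taken from the failing bots.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget state referenced in the commit
// message; the real queries live on X86Subtarget.
struct Target {
  bool In64BitMode;     // Is64Bit in the quoted diff
  bool IsOSWindows;     // models STI.isOSWindows()
  bool IsCygMing;       // models isTargetCygMing()
  bool IsKnownMSVC;     // models isTargetKnownWindowsMSVC()
};

// Mirrors the definition quoted in the commit message.
constexpr bool isTargetWin32(const Target &T) {
  return !T.In64BitMode && (T.IsCygMing || T.IsKnownMSVC);
}

// First condition named in the message: Is64Bit && STI.isOSWindows().
constexpr bool is64BitWindows(const Target &T) {
  return T.In64BitMode && T.IsOSWindows;
}

// Second condition named in the message: !STI.isTargetWin32().
constexpr bool notTargetWin32(const Target &T) {
  return !isTargetWin32(T);
}

// Illustrative (assumed) case: a 32-bit Windows target whose environment is
// neither Cygwin/MinGW nor a known MSVC environment.
constexpr Target Example{/*In64BitMode=*/false, /*IsOSWindows=*/true,
                         /*IsCygMing=*/false, /*IsKnownMSVC=*/false};

// The two conditions disagree for this target.
static_assert(!is64BitWindows(Example), "not a 64-bit Windows target");
static_assert(notTargetWin32(Example), "yet isTargetWin32() is false");
```

For any target where both environment flags are false, `isTargetWin32()` returns false regardless of bitness, so its negation holds even in 32-bit mode.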

15151 Commits