llvm-project

Commit Graph

Author	SHA1	Message	Date
Mandeep Singh Grang	d104673257	[llvm] Remove redundant return [NFC] Reviewers: davidxl, olista01, Eugene.Zelenko Reviewed By: Eugene.Zelenko Subscribers: sdardis, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D39917 llvm-svn: 317995	2017-11-12 03:47:50 +00:00
Craig Topper	ac250825c6	[X86] Use vrndscaleps/pd for 128/256 ffloor/ftrunc/fceil/fnearbyint/frint when avx512vl is enabled. This matches what we do for scalar and 512-bit types. llvm-svn: 317991	2017-11-11 21:44:51 +00:00
Simon Pilgrim	294b87b432	[X86] Attempt to match multiple binary reduction ops at once. NFCI matchBinOpReduction currently matches against a single opcode, but we already have a case where we repeat calls to try to match against AND/OR and I'll be shortly adding another case for SMAX/SMIN/UMAX/UMIN (D39729). This NFCI patch alters matchBinOpReduction to try and pattern match against any of the provided list of candidate bin ops at once to save time. Differential Revision: https://reviews.llvm.org/D39726 llvm-svn: 317985	2017-11-11 18:16:55 +00:00
Craig Topper	0ccec70ff5	[X86] Add scalar register class versions of VRNDSCALE instructions and rename the existing versions to _Int. This is consistent with our normal implementation of scalar instructions. While there, disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size, which is also our standard practice. llvm-svn: 317977	2017-11-11 08:24:15 +00:00
Craig Topper	80405076b0	[X86] Inline some SDNode operand multiclass operands that don't vary. NFC llvm-svn: 317975	2017-11-11 08:24:12 +00:00
Craig Topper	4a63843706	[X86] Set the execution domain for VFPCLASS to SSEPackedSingle/Double. llvm-svn: 317974	2017-11-11 06:57:44 +00:00
Craig Topper	1a093934a9	[X86] Set the execution domain for vptest instruction to the integer domain. llvm-svn: 317973	2017-11-11 06:19:12 +00:00
Craig Topper	0eb4a43384	[X86] Correct the execution domain on ROUND/VROUND instructions. llvm-svn: 317968	2017-11-11 02:26:05 +00:00
Craig Topper	bf9b944ea7	[X86] Remove the default for one of the arguments to some tablegen multiclasses. NFC No one ever uses this default and probably shouldn't since it sets the execution domain to generic. llvm-svn: 317967	2017-11-11 02:26:02 +00:00
Krzysztof Parzyszek	e8926438a9	Recommit r317904: [Hexagon] Create HexagonISelDAGToDAG.h, NFC The Windows builder did not reconstruct the HexagonGenDAGISel.inc file after the TableGen binary has changed. llvm-svn: 317921	2017-11-10 20:09:46 +00:00
Konstantin Zhuravlyov	27b0a033d8	AMDGPU/NFC: Split Processors.td into GCNProcessors.td and R600Processors.td Differential Revision: https://reviews.llvm.org/D39880 llvm-svn: 317920	2017-11-10 20:01:58 +00:00
Krzysztof Parzyszek	79dae95f4a	Revert "[Hexagon] Create HexagonISelDAGToDAG.h, NFC" This reverts r317904: broke Windows build. llvm-svn: 317916	2017-11-10 19:27:18 +00:00
Craig Topper	bb001c6ddc	[X86] Merge the template method selectAddrOfGatherScatterNode into selectVectorAddr. NFCI Just need to initialize a couple variables differently based on the node type. No need for a whole separate template method. llvm-svn: 317915	2017-11-10 19:26:04 +00:00
Mandeep Singh Grang	5f043ae2e1	[RISCV] Silence an unused variable warning in release builds [NFC] Summary: Also minor cleanups: 1. Avoided multiple calls to Fixup.getKind() 2. Avoided multiple calls to getFixupKindInfo() 3. Removed a redundant return. Reviewers: asb, apazos Reviewed By: asb Subscribers: rbar, johnrusso, llvm-commits Differential Revision: https://reviews.llvm.org/D39881 llvm-svn: 317908	2017-11-10 19:09:28 +00:00
Krzysztof Parzyszek	89765acc6c	[Hexagon] Create HexagonISelDAGToDAG.h, NFC llvm-svn: 317904	2017-11-10 18:39:45 +00:00
Jonas Paulsson	4b017e682d	[RegAlloc, SystemZ] Increase number of LOCRs by passing "hard" regalloc hints. * The method getRegAllocationHints() is now of bool type instead of void. If true is returned, regalloc (AllocationOrder) will only try to allocate the hints, as opposed to merely trying them before non-hinted registers. * TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with an increase in number of LOCRs. In this case, it is desired to force the hints even though there is a slight increase in spilling, because if a non-hinted register would be allocated, the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR (Load On Condition) SystemZ instruction must have both operands in either the low or high part of the 64 bit register. Reviewers: Quentin Colombet and Ulrich Weigand https://reviews.llvm.org/D36795 llvm-svn: 317879	2017-11-10 08:46:26 +00:00
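The difference between ordinary and "hard" hints described in the commit above can be illustrated with a small standalone model. This is only a sketch: `allocationOrder` and its parameters are hypothetical names, registers are plain integers, and none of it is LLVM's actual AllocationOrder code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of the order in which candidate registers are tried.
// With soft hints, hinted registers come first and all others follow.
// With hard hints (the target's hook returned true), only hints are tried.
std::vector<unsigned> allocationOrder(const std::vector<unsigned> &hints,
                                      const std::vector<unsigned> &allRegs,
                                      bool hardHints) {
  std::vector<unsigned> order(hints);
  if (hardHints)
    return order; // non-hinted registers are never considered
  for (unsigned reg : allRegs)
    if (std::find(hints.begin(), hints.end(), reg) == hints.end())
      order.push_back(reg); // append the remaining registers after the hints
  return order;
}
```

Calling `allocationOrder({3, 5}, {1, 2, 3, 4, 5}, true)` yields only the hinted registers, whereas passing `false` yields the hints followed by the other registers. In the hard case a failed allocation leads to spilling rather than falling back to a non-hinted register, which is the trade-off the commit message accepts for LOCR.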
Craig Topper	1a0da2db5f	[X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C) Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG. llvm-svn: 317878	2017-11-10 08:22:37 +00:00
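The FMADDSUB/FMSUBADD combine in the commit above rests on a lane-wise identity that can be checked with a scalar model. The sketch below is an illustration only: `Vec4`, `fmaddsub`, `fmsubadd` and `fneg` are hypothetical helpers that mimic the x86 lane semantics (FMADDSUB subtracts in even lanes and adds in odd lanes; FMSUBADD does the opposite), not LLVM or intrinsic APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<double, 4>;

// FMADDSUB model: even lanes compute a*b - c, odd lanes compute a*b + c.
Vec4 fmaddsub(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (i % 2 == 0) ? a[i] * b[i] - c[i] : a[i] * b[i] + c[i];
  return r;
}

// FMSUBADD model: even lanes compute a*b + c, odd lanes compute a*b - c.
Vec4 fmsubadd(const Vec4 &a, const Vec4 &b, const Vec4 &c) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = (i % 2 == 0) ? a[i] * b[i] + c[i] : a[i] * b[i] - c[i];
  return r;
}

// Lane-wise negation, standing in for the FNEG node.
Vec4 fneg(const Vec4 &v) {
  Vec4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = -v[i];
  return r;
}
```

With these definitions, `fmaddsub(a, b, fneg(c))` and `fmsubadd(a, b, c)` produce the same lanes, and symmetrically `fmsubadd(a, b, fneg(c))` matches `fmaddsub(a, b, c)`, which is the "opposite direction" the message mentions.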
Yaxun Liu	35845f06a4	[AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment r600 uses dummy pointer info for lowering load/store. Since dummy pointer info assumes address space 0, this causes isel failure when temporary load/store SDNodes are generated for amdgiz environment. Since the offset is not constant, FixedStack pseudo source value cannot be used to create the pointer info. This patch creates pointer info using llvm undef value. At least this provides correct address space so that isel can be done correctly. Differential Revision: https://reviews.llvm.org/D39698 llvm-svn: 317862	2017-11-10 02:03:28 +00:00
Yaxun Liu	920cc2f813	[AMDGPU] Fix pointer info for pseudo source for r600 The pointer info for pseudo source for r600 is not correct when alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39670 llvm-svn: 317861	2017-11-10 01:53:24 +00:00
Ulrich Weigand	d39e9dca1b	[SystemZ] Add support for the "o" inline asm constraint We don't really need any special handling of "offsettable" memory addresses, but since some existing code uses inline asm statements with the "o" constraint, add support for this constraint for compatibility purposes. llvm-svn: 317807	2017-11-09 16:31:57 +00:00
Simon Dardis	c2d3e38ba6	[mips] Correct microMIPS jump and add unconditional branch pseudo Correct the definition of 'j' as being unavailable for microMIPS32R6 and provide the 'b' assembly idiom for codegen purposes for microMIPS32r3. Provide the necessary 'br' pattern for microMIPS32R6 as it no longer incorrectly uses the 'j' instruction. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39741 llvm-svn: 317801	2017-11-09 16:02:18 +00:00
Alex Bradbury	8c345c5aa9	[RISCV] MC layer support for the standard RV32A instruction set extension llvm-svn: 317791	2017-11-09 15:00:03 +00:00
Alex Bradbury	a47514ce3f	[RISCV] MC layer support for the standard RV32M instruction set extension llvm-svn: 317788	2017-11-09 14:46:30 +00:00
Andrew V. Tischenko	f8c75b8794	Sched model improving on btver2: JFPU01 resource, vtestp* for xmm. Differential Revision: https://reviews.llvm.org/D39802 llvm-svn: 317785	2017-11-09 14:19:59 +00:00
Andrew V. Tischenko	3543f0a712	Add -print-schedule scheduling comments to inline asm. Differential Revision: https://reviews.llvm.org/D39728 llvm-svn: 317782	2017-11-09 12:45:40 +00:00
Craig Topper	5bfa5ffe5e	[X86] Give priority to EVEX FMA instructions over FMA4 instructions. No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority. llvm-svn: 317775	2017-11-09 08:26:26 +00:00
Vitaly Buka	bee1964d80	Fix "default label in switch which covers all enumeration values" warning llvm-svn: 317771	2017-11-09 07:46:13 +00:00
Craig Topper	7a6e294a6c	[X86] Make X86ISD::FMADDS3 isel patterns commutable. This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND. llvm-svn: 317769	2017-11-09 06:17:05 +00:00
Marek Olsak	58410f37ff	AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4 Summary: Only 56 shaders (out of 48486) are affected. Totals from affected shaders (changed stats only): SGPRS: 2420 -> 2460 (1.65 %) Spilled VGPRs: 94 -> 112 (19.15 %) Scratch size: 524 -> 528 (0.76 %) dwords per thread Code Size: 187400 -> 184992 (-1.28 %) bytes One DiRT Showdown shader spills 6 more VGPRs. One Grid Autosport shader spills 12 more VGPRs. The other 54 shaders only have a decrease in code size. (I'm ignoring the SGPR noise) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39012 llvm-svn: 317755	2017-11-09 01:52:55 +00:00
Marek Olsak	5cec64195c	AMDGPU: Lower buffer store and atomic intrinsics manually Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39060 llvm-svn: 317754	2017-11-09 01:52:48 +00:00
Marek Olsak	4c421a2db2	AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4 Summary: Only 3 (out of 48486) shaders are affected. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38951 llvm-svn: 317753	2017-11-09 01:52:36 +00:00
Marek Olsak	6a0548acaa	AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4 Summary: -9.9% code size decrease in affected shaders. Totals (changed stats only): SGPRS: 2151462 -> 2170646 (0.89 %) VGPRS: 1634612 -> 1640288 (0.35 %) Spilled SGPRs: 8942 -> 8940 (-0.02 %) Code Size: 52940672 -> 51727288 (-2.29 %) bytes Max Waves: 373066 -> 371718 (-0.36 %) Totals from affected shaders: SGPRS: 283520 -> 302704 (6.77 %) VGPRS: 227632 -> 233308 (2.49 %) Spilled SGPRs: 3966 -> 3964 (-0.05 %) Code Size: 12203080 -> 10989696 (-9.94 %) bytes Max Waves: 44070 -> 42722 (-3.06 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38950 llvm-svn: 317752	2017-11-09 01:52:30 +00:00
Marek Olsak	b953cc36e2	AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4 Summary: Only constant offsets (*_IMM opcodes) are merged. It reuses code for LDS load/store merging. It relies on the scheduler to group loads. The results are mixed, I think they are mostly positive. Most shaders are affected, so here are total stats only: SGPRS: 2072198 -> 2151462 (3.83 %) VGPRS: 1628024 -> 1634612 (0.40 %) Spilled SGPRs: 7883 -> 8942 (13.43 %) Spilled VGPRs: 97 -> 101 (4.12 %) Scratch size: 1488 -> 1492 (0.27 %) dwords per thread Code Size: 60222620 -> 52940672 (-12.09 %) bytes Max Waves: 374337 -> 373066 (-0.34 %) There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more VGPRs (now 37), but 12% decrease in code size. These are the new stats for SGPR spilling. We already spill a lot SGPRs, so it's uncertain whether more spilling will make any difference since SGPRs are always spilled to VGPRs: SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh alien_isolation 2938 100 0.0 batman_arkham_origins 589 6 0.0 bioshock-infinite 1769 4 0.0 borderlands2 3968 22 0.0 counter_strike_glob.. 1142 60 0.1 deus_ex_mankind_div.. 1410 79 0.1 dirt-showdown 533 4 0.0 dirt_rally 364 1163 3.2 divinity 1052 2 0.0 dota2 1747 7 0.0 f1-2015 776 1515 2.0 grid_autosport 1767 1505 0.9 hitman 1413 273 0.2 left_4_dead_2 1762 4 0.0 life_is_strange 1296 26 0.0 mad_max 358 96 0.3 metro_2033_redux 2670 60 0.0 payday2 1362 22 0.0 portal 474 3 0.0 saints_row_iv 1704 8 0.0 serious_sam_3_bfe 392 1348 3.4 shadow_of_mordor 1418 12 0.0 shadow_warrior 3956 239 0.1 talos_principle 324 1735 5.4 thea 172 17 0.1 tomb_raider 1449 215 0.1 total_war_warhammer 242 56 0.2 ue4_effects_cave 295 55 0.2 ue4_elemental 572 12 0.0 unigine_tropics 210 56 0.3 unigine_valley 278 152 0.5 victor_vran 1262 84 0.1 yofrankie 82 2 0.0 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38949 llvm-svn: 317751	2017-11-09 01:52:23 +00:00
Marek Olsak	ffadcb744b	AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM Summary: -5.3% code size in affected shaders. Changed stats only: 48486 shaders in 30489 tests Totals: SGPRS: 2086406 -> 2072430 (-0.67 %) VGPRS: 1626872 -> 1627960 (0.07 %) Spilled SGPRs: 7865 -> 7912 (0.60 %) Code Size: 60978060 -> 60188764 (-1.29 %) bytes Max Waves: 374530 -> 374342 (-0.05 %) Totals from affected shaders: SGPRS: 299664 -> 285688 (-4.66 %) VGPRS: 233844 -> 234932 (0.47 %) Spilled SGPRs: 3959 -> 4006 (1.19 %) Code Size: 14905272 -> 14115976 (-5.30 %) bytes Max Waves: 46202 -> 46014 (-0.41 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38915 llvm-svn: 317750	2017-11-09 01:52:17 +00:00
Craig Topper	93e27d2ecc	[X86] Make sure we don't read too many operands from X86ISD::FMADDS1/FMADDS3 nodes when doing FNEG combine. r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes. llvm-svn: 317748	2017-11-09 01:06:47 +00:00
Craig Topper	cfd510678f	[X86] X86MaskedGatherSDNode shouldn't inherit from MaskedGatherScatterSDNode The classof implementation in MaskedGatherScatterSDNode doesn't consider X86MaskedGatherSDNode, so it's misleading. llvm-svn: 317733	2017-11-08 22:26:41 +00:00
Craig Topper	61f81f9637	[X86] Preserve memory refs when folding loads into divides. This is similar to what we already do for multiplies. Without this we can't unfold and hoist an invariant load. llvm-svn: 317732	2017-11-08 22:26:39 +00:00
Craig Topper	55029d811f	[X86] Remove an if check on the result of a cast. NFC cast takes a non-null input and produces a non-null output. So this if can never fail. llvm-svn: 317731	2017-11-08 22:26:37 +00:00
Reid Kleckner	7adb2fdbba	Revert "Correct dwarf unwind information in function epilogue for X86" This reverts r317579, originally committed as r317100. There is a design issue with marking CFI instructions duplicatable. Not all targets support the CFIInstrInserter pass, and targets like Darwin can't cope with duplicated prologue setup CFI instructions. The compact unwind info emission fails. When the following code is compiled for arm64 on Mac at -O3, the CFI instructions end up getting tail duplicated, which causes compact unwind info emission to fail: int a, c, d, e, f, g, h, i, j, k, l, m; void n(int o, int *b) { if (g) f = 0; for (; f < o; f++) { m = a; if (l > j * k > i) j = i = k = d; h = b[c] - e; } } We get assembly that looks like this: ; BB#1: ; %if.then Lloh3: adrp x9, _f@GOTPAGE Lloh4: ldr x9, [x9, _f@GOTPAGEOFF] mov w8, wzr Lloh5: str wzr, [x9] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.lt LBB0_3 b LBB0_7 LBB0_2: ; %entry.if.end_crit_edge Lloh6: adrp x8, _f@GOTPAGE Lloh7: ldr x8, [x8, _f@GOTPAGEOFF] Lloh8: ldr w8, [x8] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.ge LBB0_7 LBB0_3: ; %for.body.lr.ph Note the multiple .cfi_def* directives. Compact unwind info emission can't handle that. llvm-svn: 317726	2017-11-08 21:31:14 +00:00
Alex Bradbury	fa18b9e73c	Set hasSideEffects=0 for PHI and fix affected passes Previously, hasSideEffects was ? for TargetOpcode::PHI and would be inferred as 1. D37065 sets the previously inferred properties explicitly. This patch sets hasSideEffects=0 for PHI, as it is for G_PHI. MachineInstr::isSafeToMove has been updated so it still returns false for PHI. Additionally, HexagonBitSimplify relied on a PHI node having the hasUnmodeledSideEffects property. This patch fixes that assumption. Differential Revision: https://reviews.llvm.org/D37097 llvm-svn: 317721	2017-11-08 20:19:16 +00:00
Craig Topper	78a770402a	[X86] Correct the implementation of BEXTR load folding to use the shift as the parent node and pass a separate root. We were calling tryFoldLoad with the 'and' node as both the root and parent node of the load. But the parent of the load should be the shift that precedes the 'and', while the 'and' node is correctly the root node. To fix this I had to make tryFoldLoad take a separate use and root input. I've added a convenience version with the old signature to avoid updating the other call sites. llvm-svn: 317720	2017-11-08 20:17:33 +00:00
Sam Clegg	6368442fb7	[WebAssembly] Update test expectations I believe these were fixed in rL317707 Differential Revision: https://reviews.llvm.org/D39813 llvm-svn: 317718	2017-11-08 20:14:06 +00:00
Craig Topper	e6094f9bd9	[X86] Don't call validateInstruction from MatchAndEmitInstruction when MatchingInlineAsm is set. The MCInst won't be populated. Without this we can't parse gather instructions in ms inline asm blocks. The validateInstruction function was introduced in r316700 to check gather constraints. llvm-svn: 317713	2017-11-08 19:38:48 +00:00
Dan Gohman	0828ba1e1e	[WebAssembly] Call signExtend to get sign extended register Patch by Jatin Bhateja! Differential Revision: https://reviews.llvm.org/D39529 llvm-svn: 317710	2017-11-08 19:24:21 +00:00
Dan Gohman	b465aa0504	[WebAssembly] Revise the strategy for inline asm. Previously, an "r" constraint would mean the compiler provides a value on WebAssembly's operand stack. This was tricky to use properly, particularly since it isn't possible to declare a new local from within an inline asm string. With this patch, "r" provides the value in a WebAssembly local, and the local index is provided to the inline asm string. This requires inline asm to use get_local and set_local to read the register. This does potentially result in larger code size, however inline asm should hopefully be quite rare in WebAssembly. This also means that the "m" constraint can no longer be supported, as WebAssembly has nothing like a "memory operand" that includes an implicit get_local. This fixes PR34599 for the wasm32-unknown-unknown-wasm target (though not for the ELF target). llvm-svn: 317707	2017-11-08 19:18:08 +00:00
Alex Bradbury	a337675cdb	[RISCV] Initial support for function calls Note that this is just enough for simple function call examples to generate working code. Support for varargs etc follows in future patches. Differential Revision: https://reviews.llvm.org/D29936 llvm-svn: 317691	2017-11-08 13:41:21 +00:00
Alex Bradbury	74913e1c70	[RISCV] Codegen for conditional branches A good portion of this patch is the extra functions that needed to be implemented to support the test case. e.g. storeRegToStackSlot, loadRegFromStackSlot, eliminateFrameIndex. Setting ISD::BR_CC to Expand may appear non-obvious on an architecture with branch+cmp instructions. However, I found it much easier to deal with matching the expanded form. I had to change simm13_lsb0 and simm21_lsb0 to inherit from the Operand<OtherVT> class rather than Operand<i32> in order to keep tablegen happy. This isn't a big deal, but it does seem a shame to lose the uniformity across immediate types when there's not an obvious benefit (I'm hoping a tablegen expert will educate me on what I'm missing here!). Differential Revision: https://reviews.llvm.org/D29935 llvm-svn: 317690	2017-11-08 13:31:40 +00:00
Alex Bradbury	ec8aa91305	[RISCV] Codegen support for memory operations on global addresses Differential Revision: https://reviews.llvm.org/D39103 llvm-svn: 317688	2017-11-08 13:24:21 +00:00
Alex Bradbury	cfa6291bb1	[RISCV] Codegen support for memory operations This required the implementation of RISCVTargetInstrInfo::copyPhysReg. Support for lowering global addresses follow in the next patch. Differential Revision: https://reviews.llvm.org/D29934 llvm-svn: 317685	2017-11-08 12:20:01 +00:00
Alex Bradbury	0f0e1b54f0	[RISCV] Codegen support for materializing constants Differential Revision: https://reviews.llvm.org/D39101 llvm-svn: 317684	2017-11-08 12:02:22 +00:00
Simon Dardis	789f7ca265	[mips] Guard indirect and tailcall pseudo instructions correctly. Previously these pseudo instructions were not guarded by ISA, so their selection was dependent on the ordering of the entries in the DAG matcher. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39723 llvm-svn: 317681	2017-11-08 11:13:44 +00:00
Alex Bradbury	cc988415fe	[NFCI] Ensure TargetOpcode::* are compatible with guessInstructionProperties=0 rL162640 introduced CodeGenTarget::guessInstructionProperties. If a target sets guessInstructionProperties=0 in its FooInstrInfo, tablegen will error if it has to guess properties from patterns. Unfortunately, guessInstructionProperties=0 can't be used with current upstream LLVM as instructions in the TargetOpcode namespace are always included and sometimes have inferred properties for mayLoad, mayStore, and hasSideEffects. This patch provides the simplest possible fix to this problem, setting default values for these fields in the TargetOpcode scope. There is no intended functional change, as the explicitly set properties should match what was previously inferred. A number of the instructions had hasSideEffects=1 inferred unintentionally. This patch makes it explicit, while future patches (such as D37097) correct the property. Differential Revision: https://reviews.llvm.org/D37065 llvm-svn: 317674	2017-11-08 09:26:06 +00:00
Craig Topper	65e6d0b758	[X86] Add patterns to fold EVEX store with EVEX encoded vcvtps2ph instructions. Remove bad pattern that had vf432 vcvtps2ph storing 128-bits. llvm-svn: 317662	2017-11-08 04:00:31 +00:00
Craig Topper	b832ee68b4	[X86] Allow legacy vcvtps2ph intrinsics to select EVEX encoded instructions. Rely on EVEX->VEX to convert back. Missed store folding opportunities will be fixed in a subsequent commit. llvm-svn: 317661	2017-11-08 04:00:30 +00:00
David Blaikie	3f833edc7c	Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layering This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647	2017-11-08 01:01:31 +00:00
Matt Arsenault	4709ab9124	AMDGPU: Set correct sched model on v_mad_u64_u32 llvm-svn: 317645	2017-11-08 00:48:25 +00:00
Sriraman Tallam	056b3fd6fb	Attribute nonlazybind should not affect calls to functions with hidden visibility. Differential Revision: https://reviews.llvm.org/D39625 llvm-svn: 317639	2017-11-08 00:01:05 +00:00
Justin Lebar	da9e0bd3a2	[NVPTX] Implement __nvvm_atom_add_gen_d builtin. Summary: This just seems to have been an oversight. We already supported the f64 atomic add with an explicit scope (e.g. "cta"), but not the scopeless version. Reviewers: tra Subscribers: jholewinski, sanjoy, cfe-commits, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D39638 llvm-svn: 317623	2017-11-07 22:10:54 +00:00
Graham Yiu	5cd044e8c8	Use new vector insert half-word and byte instructions when we see insertelement on '8 x i16' and '16 x i8' types. Also extended existing lit testcase to cover these cases. Differential Revision: https://reviews.llvm.org/D34630 llvm-svn: 317613	2017-11-07 20:55:43 +00:00
Krzysztof Parzyszek	385a4e0489	[Hexagon] Make a test more flexible in HexagonLoopIdiomRecognition An "or" that sets the sign-bit can be replaced with a "xor", if the sign-bit was known to be clear before. With some changes to instruction combining, the simple sign-bit check was failing. Replace it with a more flexible one to catch more cases. llvm-svn: 317592	2017-11-07 17:05:54 +00:00
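The equivalence the Hexagon commit above relies on (an "or" that sets the sign bit equals an "xor" when the sign bit is known to be clear) can be checked on plain 32-bit integers. This is a minimal sketch with hypothetical helper names; it is not the HexagonLoopIdiomRecognition code, which works on LLVM IR values and known-bits analysis.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t SignBit = 0x80000000u;

// True when the sign bit of x is clear, the precondition for the rewrite.
bool signBitClear(std::uint32_t x) { return (x & SignBit) == 0; }

// Setting the sign bit with "or": the result always has the sign bit set.
std::uint32_t setSignWithOr(std::uint32_t x) { return x | SignBit; }

// Flipping the sign bit with "xor": equals the "or" form only if the
// sign bit was clear beforehand.
std::uint32_t flipSignWithXor(std::uint32_t x) { return x ^ SignBit; }
```

For any `x` where `signBitClear(x)` holds, both helpers return the same value. When the sign bit is already set, the "xor" form clears it instead, so the replacement is only valid under the known-clear condition.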
Florian Hahn	b936810833	[AArch64][SVE] Asm: Add support for (ADD\|SUB)_ZZZ Patch [5/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions. Patch by Sander De Smalen. Reviewed by: rengolin Differential Revision: https://reviews.llvm.org/D39091 llvm-svn: 317591	2017-11-07 16:58:13 +00:00
Florian Hahn	91f11e5ad1	[AArch64][SVE] Asm: Add SVE (Z) Register definitions and parsing support Patch [3/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions. To summarise, this patch adds: * SVE register definitions * Methods to parse SVE register operands * Methods to print SVE register operands * RegKind SVEDataVector to distinguish it from other data types like scalar register or Neon vector. * k_SVEDataRegister and SVEDataRegOp to describe SVE registers (which will be extended by further patches with e.g. ElementWidth and the shift-extend type). Patch by Sander De Smalen. Reviewed by: rengolin Differential Revision: https://reviews.llvm.org/D39089 llvm-svn: 317590	2017-11-07 16:45:48 +00:00
Florian Hahn	d825bbdc41	[AArch64][SVE] Asm: Set SVE as unsupported feature for existing scheduler models. Patch [4/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions. We add SVE as unsupported feature for CPUs that don't have SVE to prevent errors from scheduler models saying it lacks information for these instructions. Patch by Sander De Smalen. Reviewed by: rengolin Differential Revision: https://reviews.llvm.org/D39090 llvm-svn: 317582	2017-11-07 15:03:11 +00:00
Petar Jovanovic	e2a585dddc	Reland "Correct dwarf unwind information in function epilogue for X86" Reland r317100 with minor fix regarding ComputeCommonTailLength function in BranchFolding.cpp. Skipping top CFI instructions block needs to be executed at several more return points in ComputeCommonTailLength(). Original r317100 message: "Correct dwarf unwind information in function epilogue for X86" This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: - CFI instructions do not affect code generation - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Changed CFI instructions so that they: - are duplicable - are not counted as instructions when tail duplicating or tail merging - can be compared as equal Added CFIInstrInserter pass: - analyzes each basic block to determine cfa offset and register valid at its entry and exit - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. 
llvm-svn: 317579	2017-11-07 14:40:27 +00:00
Alexey Bataev	e25a6fd390	[SLP] Fix PR35047: Fix default cost model for cast op in X86. Summary: The cost calculation for the default case on the X86 target does not always follow the correct way because of a missing 4th argument in the `BaseT::getCastInstrCost()` call. Added this missing parameter. Reviewers: hfinkel, mkuper, RKSimon, spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39687 llvm-svn: 317576	2017-11-07 14:23:44 +00:00
Florian Hahn	c4422247b3	[AArch64][SVE] Asm: Replace 'IsVector' by 'RegKind' in AArch64AsmParser (NFC) Patch [2/5] in a series to add assembler/disassembler support for AArch64 SVE unpredicated ADD/SUB instructions. This change is a non functional change that adds RegKind as an alternative to 'isVector' to prepare it for newer types (SVE data vectors and predicate vectors) that will be added in next patches (where the SVE data vector is added as part of this patch set) Patch by Sander De Smalen. Reviewed by: rengolin Differential Revision: https://reviews.llvm.org/D39088 llvm-svn: 317569	2017-11-07 13:07:50 +00:00
Kristof Beyls	af9814a1fc	[GlobalISel] Enable legalizing non-power-of-2 sized types. This changes the interface of how targets describe how to legalize, see the below description. 1. Interface for targets to describe how to legalize. In GlobalISel, the API in the LegalizerInfo class is the main interface for targets to specify which types are legal for which operations, and what to do to turn illegal type/operation combinations into legal ones. For each operation the type sizes that can be legalized without having to change the size of the type are specified with a call to setAction. This isn't different to how GlobalISel worked before. For example, for a target that supports 32 and 64 bit adds natively: for (auto Ty : {s32, s64}) setAction({G_ADD, 0, Ty}, Legal); or for a target that needs a library call for a 32 bit division: setAction({G_SDIV, s32}, Libcall); The main conceptual change to the LegalizerInfo API is in specifying how to legalize the type sizes for which a change of size is needed. For example, in the above example, how to specify how all types from i1 to i8388607 (apart from s32 and s64 which are legal) need to be legalized and expressed in terms of operations on the available legal sizes (again, i32 and i64 in this case). Before, the implementation only allowed specifying power-of-2-sized types (e.g. setAction({G_ADD, 0, s128}, NarrowScalar)). A worse limitation was that if you'd wanted to specify how to legalize all the sized types as allowed by the LLVM-IR LangRef, i1 to i8388607, you'd have to call setAction 8388607-3 times and probably would need a lot of memory to store all of these specifications. Instead, the legalization actions that need to change the size of the type are specified now using a "SizeChangeStrategy". 
For example: setLegalizeScalarToDifferentSizeStrategy( G_ADD, 0, widenToLargerAndNarrowToLargest); This example indicates that for type sizes for which there is a larger size that can be legalized towards, do it by Widening the size. For example, G_ADD on s17 will be legalized by first doing WidenScalar to make it s32, after which it's legal. The "NarrowToLargest" indicates what to do if there is no larger size that can be legalized towards. E.g. G_ADD on s92 will be legalized by doing NarrowScalar to s64. Another example, taken from the ARM backend is: for (unsigned Op : {G_SDIV, G_UDIV}) { setLegalizeScalarToDifferentSizeStrategy(Op, 0, widenToLargerTypesUnsupportedOtherwise); if (ST.hasDivideInARMMode()) setAction({Op, s32}, Legal); else setAction({Op, s32}, Libcall); } For this example, G_SDIV on s8, on a target without a divide instruction, would be legalized by first doing action (WidenScalar, s32), followed by (Libcall, s32). The same principle is also followed for when the number of vector lanes on vector data types need to be changed, e.g.: setAction({G_ADD, LLT::vector(8, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(16, 8)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(8, 16)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(2, 32)}, LegalizerInfo::Legal); setAction({G_ADD, LLT::vector(4, 32)}, LegalizerInfo::Legal); setLegalizeVectorElementToDifferentSizeStrategy( G_ADD, 0, widenToLargerTypesUnsupportedOtherwise); As currently implemented here, vector types are legalized by first making the vector element size legal, followed by then making the number of lanes legal. The strategy to follow in the first step is set by a call to setLegalizeVectorElementToDifferentSizeStrategy, see example above. 
The strategy followed in the second step is "moreToWiderTypesAndLessToWidest" (see code for its definition), indicating that vectors are widened to more elements so they map to natively supported vector widths, or when there isn't a legal wider vector, split the vector to map it to the widest vector supported. Therefore, for the above specification, some example legalizations are: * getAction({G_ADD, LLT::vector(3, 3)}) returns {WidenScalar, LLT::vector(3, 8)} * getAction({G_ADD, LLT::vector(3, 8)}) then returns {MoreElements, LLT::vector(8, 8)} * getAction({G_ADD, LLT::vector(20, 8)}) returns {FewerElements, LLT::vector(16, 8)} 2. Key implementation aspects. How to legalize a specific (operation, type index, size) tuple is represented by mapping intervals of integers representing a range of size types to an action to take, e.g.: setScalarAction({G_ADD, LLT::scalar(1)}, {{1, WidenScalar}, // bit sizes [ 1, 32[ {32, Legal}, // bit sizes [32, 33[ {33, WidenScalar}, // bit sizes [33, 64[ {64, Legal}, // bit sizes [64, 65[ {65, NarrowScalar} // bit sizes [65, +inf[ }); Please note that most of the code to do the actual lowering of non-power-of-2 sized types is currently missing, this is just trying to make it possible for targets to specify what is legal, and how non-legal types should be legalized. Probably quite a bit of further work is needed in the actual legalizing and the other passes in GlobalISel to support non-power-of-2 sized types. I hope the documentation in LegalizerInfo.h and the examples provided in the various {Target}LegalizerInfo.cpp and LegalizerInfoTest.cpp explain well enough how this is meant to be used. This drops the need for LLT::{half,double}...Size(). Differential Revision: https://reviews.llvm.org/D30529 llvm-svn: 317560	2017-11-07 10:34:34 +00:00
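The interval representation described in "Key implementation aspects" above can be sketched as a sorted table plus a binary search. This is only an illustration under stated assumptions: `Action`, `SizeAndAction`, `lookupAction`, and `exampleAddTable` are hypothetical names invented here, not the types or functions used in LegalizerInfo.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical stand-in for the legalizer actions named in the commit message.
enum class Action { Legal, WidenScalar, NarrowScalar };

// An entry {Start, A} means: action A applies to every bit size from Start up
// to (but excluding) the Start of the next entry. Entries are sorted by Start.
using SizeAndAction = std::pair<uint32_t, Action>;

// Returns the action for a scalar of the given bit size.
Action lookupAction(const std::vector<SizeAndAction> &Table, uint32_t Bits) {
  assert(!Table.empty() && Bits >= Table.front().first);
  // Find the first interval that starts strictly after Bits, then step back
  // one entry to reach the interval that contains Bits.
  auto It = std::upper_bound(
      Table.begin(), Table.end(), Bits,
      [](uint32_t Size, const SizeAndAction &Entry) {
        return Size < Entry.first;
      });
  return std::prev(It)->second;
}

// The G_ADD table from the commit message (s32 and s64 legal).
std::vector<SizeAndAction> exampleAddTable() {
  return {{1, Action::WidenScalar},
          {32, Action::Legal},
          {33, Action::WidenScalar},
          {64, Action::Legal},
          {65, Action::NarrowScalar}};
}
```

With this table, a lookup for 17 bits yields WidenScalar and a lookup for 92 bits yields NarrowScalar, matching the s17 and s92 examples in the message. Five entries cover every size, which is the point of storing intervals rather than one entry per bit width.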
Bjorn Steinbrink	c02b237e46	[X86] Don't clobber reserved registers with stack adjustments Summary: Calls using invoke in funclet based functions are assumed to clobber all registers, which causes the stack adjustment using pops to consider all registers not defined by the call to be undefined, which can unfortunately include the base pointer, if one is needed. To prevent this (and possibly other hazards), skip reserved registers when looking for candidate registers. This fixes issue #45034 in the Rust compiler. Reviewers: mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39636 llvm-svn: 317551	2017-11-07 08:50:21 +00:00
Craig Topper	e7fb300226	[X86] Add patterns to fold a 64-bit load into the EVEX vcvtph2ps instructions. llvm-svn: 317548	2017-11-07 07:13:07 +00:00
Craig Topper	0231b1d445	[X86] Add patterns for folding a v16i8 with the VEX vcvtph2ps intrinsics. Disable the peephole pass to prove that the pattern is working. llvm-svn: 317547	2017-11-07 07:13:06 +00:00
Craig Topper	cf8e6d0a76	[X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics. Looks like there's some missed load folding opportunities for i64 loads. llvm-svn: 317544	2017-11-07 07:13:03 +00:00
Craig Topper	afc3c8206e	[X86] Use IMPLICIT_DEF in VEX/EVEX vcvtss2sd/vcvtsd2ss patterns instead of a COPY_TO_REGCLASS. ExeDepsFix pass should take care of making the registers match. llvm-svn: 317542	2017-11-07 04:44:22 +00:00
Craig Topper	4ad81b51ed	[X86] Remove 'Requires' from instructions with no patterns. NFC llvm-svn: 317541	2017-11-07 04:44:21 +00:00
Matt Arsenault	6119f80034	AMDGPU: Remove redundant combine This combine was already done in two places. The generic combiner already has done this since r217610, for adds (with a single use). This one was added in r303641, and added support for handling or as well. r313251 later added support to the generic combine for or. It also turns out the isOrEquivalentToAdd check is not necessary for this combine. Additionally, we already reproduce this combine in yet another place in the backend, although in that version multiple uses of the add are still folded if it will allow a fold into the addressing mode. That version needs to be improved to understand ors though, as well as the correct legal offsets for private. llvm-svn: 317526	2017-11-07 00:06:32 +00:00
Craig Topper	428a4e6374	[X86] Make FeatureAVX512 imply FeatureF16C. The EVEX to VEX pass is already assuming this is true under AVX512VL. We had special patterns to use zmm instructions if VLX and F16C weren't available. Instead just make AVX512 imply F16C to make the EVEX to VEX behavior explicitly legal and remove the extra patterns. All known CPUs with AVX512 have F16C so this should be safe for now. llvm-svn: 317521	2017-11-06 22:49:04 +00:00
Craig Topper	cb6c38612e	[X86] Make FeatureAVX512 imply FeatureFMA. Previously our VEX patterns were checking Subtarget.hasFMA() which checked FMA || AVX512. So we were behaving as if AVX512 implied it anyway. Which means we'd allow VEX encoded 128/256 FMA when AVX512F was enabled but AVX512VL is off. Regardless of the FMA flag. EVEX to VEX also transforms scalar EVEX FMA instructions to their VEX versions even without the FMA flag. Similarly for 128/256 under AVX512VL. So this makes AVX512 imply FeatureFMA to make our current behavior explicit. All known CPUs that support AVX512 have VEX FMA instructions. llvm-svn: 317520	2017-11-06 22:49:01 +00:00
Graham Yiu	52a52a6cab	Fix buildbot breakages from r317503. Add parentheses to assignment when using result as a condition. llvm-svn: 317508	2017-11-06 21:04:19 +00:00
Graham Yiu	030621bbcb	Adds code to PPC ISEL lowering to recognize byte inserts from vector_shuffles, and use P9 shift and vector insert byte instructions instead of vperm. Extends tests from vector insert half-word. Differential Revision: https://reviews.llvm.org/D34497 llvm-svn: 317503	2017-11-06 20:18:30 +00:00
Guozhi Wei	e3b8d9a312	[PPC] Use xxbrd to speed up bswap64 Power doesn't have bswap instructions, so llvm generates the following code sequence for bswap64. rotldi 5, 3, 16 rotldi 4, 3, 8 rotldi 9, 3, 24 rotldi 10, 3, 32 rotldi 11, 3, 48 rotldi 12, 3, 56 rldimi 4, 5, 8, 48 rldimi 4, 9, 16, 40 rldimi 4, 10, 24, 32 rldimi 4, 11, 40, 16 rldimi 4, 12, 48, 8 rldimi 4, 3, 56, 0 But Power9 has vector bswap instructions, which can also be used to speed up the scalar bswap intrinsic. With this patch, bswap64 can be translated to: mtvsrdd 34, 3, 3 xxbrd 34, 34 mfvsrld 3, 34 Differential Revision: https://reviews.llvm.org/D39510 llvm-svn: 317499	2017-11-06 19:09:38 +00:00
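For reference, the operation both instruction sequences above compute is a reversal of the eight bytes of a 64-bit value. A minimal portable sketch follows; the function name `bswap64Reference` is hypothetical, and this is a plain model of the result, not the code the PowerPC backend emits.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference model of a 64-bit byte swap: the lowest byte of the
// input becomes the highest byte of the result, and so on.
uint64_t bswap64Reference(uint64_t Value) {
  uint64_t Result = 0;
  for (int Byte = 0; Byte < 8; ++Byte) {
    // Shift the bytes gathered so far up and append the current low byte.
    Result = (Result << 8) | (Value & 0xFF);
    Value >>= 8;
  }
  return Result;
}
```

Swapping twice returns the original value, which gives a quick sanity check for any implementation of the intrinsic.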
Matt Arsenault	4f6318fe1b	AMDGPU: Select v_mad_u64_u32 and v_mad_i64_i32 llvm-svn: 317492	2017-11-06 17:04:37 +00:00
Sanjay Patel	629c411538	[IR] redefine 'UnsafeAlgebra' / 'reassoc' fast-math-flags and add 'trans' fast-math-flag As discussed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html and again more recently: http://lists.llvm.org/pipermail/llvm-dev/2017-October/118118.html ...this is a step in cleaning up our fast-math-flags implementation in IR to better match the capabilities of both clang's user-visible flags and the backend's flags for SDNode. As proposed in the above threads, we're replacing the 'UnsafeAlgebra' bit (which had the 'umbrella' meaning that all flags are set) with a new bit that only applies to algebraic reassociation - 'AllowReassoc'. We're also adding a bit to allow approximations for library functions called 'ApproxFunc' (this was initially proposed as 'libm' or similar). ...and we're out of bits. 7 bits ought to be enough for anyone, right? :) FWIW, I did look at getting this out of SubclassOptionalData via SubclassData (spacious 16-bits), but that's apparently already used for other purposes. Also, I don't think we can just add a field to FPMathOperator because Operator is not intended to be instantiated. We'll defer movement of FMF to another day. We keep the 'fast' keyword. I thought about removing that, but seeing IR like this: %f.fast = fadd reassoc nnan ninf nsz arcp contract afn float %op1, %op2 ...made me think we want to keep the shortcut synonym. Finally, this change is binary incompatible with existing IR as seen in the compatibility tests. This statement: "Newer releases can ignore features from older releases, but they cannot miscompile them. For example, if nsw is ever replaced with something else, dropping it would be a valid way to upgrade the IR." ( http://llvm.org/docs/DeveloperPolicy.html#ir-backwards-compatibility ) ...provides the flexibility we want to make this change without requiring a new IR version. Ie, we're not loosening the FP strictness of existing IR. 
At worst, we will fail to optimize some previously 'fast' code because it's no longer recognized as 'fast'. This should get fixed as we audit/squash all of the uses of 'isFast()'. Note: an inter-dependent clang commit to use the new API name should closely follow commit. Differential Revision: https://reviews.llvm.org/D39304 llvm-svn: 317488	2017-11-06 16:27:15 +00:00
Simon Pilgrim	ad9b9720e8	[X86][SSE] Merge combineExtractVectorElt_SSE into combineExtractVectorElt. NFCI. We still early-out for X86ISD::PEXTRW/X86ISD::PEXTRB so no actual change in behaviour, but it'll make it easier to add support in a future patch. llvm-svn: 317485	2017-11-06 15:28:25 +00:00
Simon Pilgrim	14450720e6	[X86][SSE] Combine EXTRACT_VECTOR_ELT with combineExtractWithShuffle before XFormVExtractWithShuffleIntoLoad combineExtractWithShuffle can handle more complex shuffles/bitcasts than we can with the equivalent code in XFormVExtractWithShuffleIntoLoad. Mainly a compile time improvement now (combineExtractWithShuffle combines will have always failed late on inside XFormVExtractWithShuffleIntoLoad), and will let us merge combineExtractVectorElt_SSE in a future commit. llvm-svn: 317481	2017-11-06 14:34:19 +00:00
Yaxun Liu	cc56a8b108	[AMDGPU] Change alloca addr space of r600 to 5 for amdgiz environment Differential Revision: https://reviews.llvm.org/D39657 llvm-svn: 317479	2017-11-06 14:32:33 +00:00
Jonas Paulsson	e54cc1a436	[SystemZ] implement hasDivRemOp() SystemZ can do division and remainder in a single instruction for scalar integer types, which are now reflected by returning true in this hook for those cases. Review: Ulrich Weigand llvm-svn: 317477	2017-11-06 13:10:31 +00:00
Yaxun Liu	1ac16619d2	[AMDGPU] Fix assertion due to assuming pointer in default addr space is 32 bit The backend assumes pointer in default addr space is 32 bit, which is not true for the new addr space mapping and causes assertion for unresolved functions. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39643 llvm-svn: 317476	2017-11-06 13:01:33 +00:00
Simon Dardis	169df4e24b	[mips] Add movep for microMIPS32R6 and fix microMIPS32r3 version Previously, the 'movep' instruction was defined for microMIPS32r3 and shared that definition with microMIPS32R6. 'movep' was re-encoded for microMIPS32r6, so this patch provides the correct encoding. Secondly, correct the encoding of the 'rs' and 'rt' operands which have an instruction specific encoding for the registers those operands accept. Finally, correct the decoding of the 'dst_regs' operand which was extracting the relevant field from the instruction, but was actually extracting the field from the already extracted field. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39495 llvm-svn: 317475	2017-11-06 12:59:53 +00:00
Mohammed Agabaria	6691758364	[LV][X86] update the cost of interleaving mem. access of floats Recommit: This patch updates the costs of interleaved loads of v8f32 of stride 3 and 8. Fixed the location of the lit test so that it works with make check-all. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317471	2017-11-06 10:56:20 +00:00
Simon Dardis	e57795384c	[mips] Fix PR35140 Mark all symbols involved with TLS relocations as being TLS symbols. This resolves PR35140. Thanks to Alex Crichton for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39591 llvm-svn: 317470	2017-11-06 10:50:04 +00:00
Uriel Korach	bb86686a8b	[X86][AVX512] Improve lowering of AVX512 test intrinsics Added TESTM and TESTNM to the list of instructions that already zero unused upper bits and do not need the redundant shift left and shift right instructions afterwards. Added a pattern for TESTM and TESTNM in iselLowering, so now icmp(neq,and(X,Y), 0) folds into TESTM and icmp(eq,and(X,Y), 0) folds into TESTNM. This commit is a preparation for lowering the test and testn X86 intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38732 llvm-svn: 317465	2017-11-06 09:22:38 +00:00
Uriel Korach	eb47d95d52	[X86] Replace duplicate function call with variable. NFC Change from: if (N->getOperand(0).getValueType() == MVT::v8i32 || N->getOperand(0).getValueType() == MVT::v8f32) to: EVT OpVT = N->getOperand(0).getValueType(); if (OpVT == MVT::v8i32 || OpVT == MVT::v8f32) Change-Id: I5a105f8710b73a828e6cfcd55fac2eae6153ce25 llvm-svn: 317464	2017-11-06 08:32:45 +00:00
Zvi Rackover	3122698040	X86 ISel: Basic support for variable-index vector permutations Summary: Try to lower a BUILD_VECTOR composed of extract-extract chains that can be reasoned to be a permutation of a vector by indices in a non-constant vector. We saw this pattern created by ISPC, which resorts to creating it due to the requirement that shufflevector's mask operand be a constant vector. I didn't check this but we could possibly use this pattern for lowering the X86 permute C-intrinsics instead of llvm.x86 intrinsics. This change can be followed by more improvements: 1. Handle vectors with undef elements. 2. Utilize pshufb and zero-mask-blending to support more efficient construction of vectors with constant-0 elements. 3. Use smaller-element vectors of same width, and "interpolate" the indices, when no native operation available. Reviewers: RKSimon, craig.topper Reviewed By: RKSimon Subscribers: chandlerc, DavidKreitzer Differential Revision: https://reviews.llvm.org/D39126 llvm-svn: 317463	2017-11-06 08:25:46 +00:00
Jina Nahias	3844f1ad5c	Revert "adding a pattern for broadcastm" This reverts commit r317457. Change-Id: If07f1fca1e3453d16c1dac906e87768661384e91 llvm-svn: 317462	2017-11-06 07:48:58 +00:00
Jina Nahias	7b705f1f91	[x86][AVX512] Lowering Broadcastm intrinsics to LLVM IR This patch, together with a matching clang patch (https://reviews.llvm.org/D38683), implements the lowering of X86 broadcastm intrinsics to IR. Differential Revision: https://reviews.llvm.org/D38684 Change-Id: I709ac0b34641095397e994c8ff7e15d1315b3540 llvm-svn: 317458	2017-11-06 07:09:24 +00:00
Jina Nahias	9c6561b648	adding a pattern for broadcastm Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 317457	2017-11-06 07:09:09 +00:00
Craig Topper	70eaeae7f0	[X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible. llvm-svn: 317454	2017-11-06 05:48:26 +00:00
Craig Topper	07dac55d95	[X86] Add scalar FMA ISD nodes without rounding mode. NFC Next step is to use them for the legacy FMA scalar intrinsics as well. This will enable the legacy intrinsics to use EVEX encoded opcodes and the extended registers. llvm-svn: 317453	2017-11-06 05:48:25 +00:00
Craig Topper	eff606cc0e	[X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics. Fixes PR35161. llvm-svn: 317445	2017-11-06 04:04:01 +00:00
Craig Topper	d6471cb934	[X86] Add missing predicate to a pattern. NFC Other patterns had higher priority so this wasn't noticed. But we shouldn't be dependent on pattern order. llvm-svn: 317442	2017-11-05 21:14:06 +00:00
Craig Topper	4e2f53511a	[X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I missed in r317413. llvm-svn: 317441	2017-11-05 21:14:05 +00:00
Craig Topper	948c39c480	[X86] Fix outdated comment. NFC llvm-svn: 317440	2017-11-05 21:14:04 +00:00
Mohammed Agabaria	acd69dbc7c	[REVERT][LV][X86] update the cost of interleaving mem. access of floats Reverted; my changes will be committed later after fixing the failure. This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317433	2017-11-05 09:36:54 +00:00
Mohammed Agabaria	f74c767de6	[LV][X86] update the cost of interleaving mem. access of floats This patch contains update of the costs of interleaved loads of v8f32 of stride 3 and 8. Differential Revision: https://reviews.llvm.org/D39403 llvm-svn: 317432	2017-11-05 09:06:23 +00:00
Craig Topper	692c8efe30	[X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled. Summary: AVX512 added RCP14 and RSQRT14 instructions which improve accuracy over the legacy RCP and RSQRT instructions, but not enough accuracy to remove the need for a Newton Raphson refinement. Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use it for fast math optimization of division and reciprocal sqrt. I think switching the legacy intrinsics may be surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed. As far as the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggests they don't change which instructions they use here. This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions. Going forward I think our focus should be on -Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14. -Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER -Supporting double precision. Reviewers: zvi, DavidKreitzer, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39583 llvm-svn: 317413	2017-11-04 18:26:41 +00:00
Craig Topper	e5d44cefea	[X86] Teach EVEX->VEX pass to turn SHUFI32X4/SHUFF32X4/SHUFI64X2/SHUFF64X2 into VPERM2F128/VPERM2I128. This recovers some of the tests that were changed by r317403. llvm-svn: 317410	2017-11-04 18:10:03 +00:00
Yaxun Liu	0d9673cff2	[AMDGPU] Remove hardcoded address space value from AMDGPULibFunc AMDGPULibFunc hardcodes address space values of the old address space mapping, which causes invalid addrspacecast instructions and undefined functions in APPSDK sample MonteCarloAsianDP. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39616 llvm-svn: 317409	2017-11-04 17:37:43 +00:00
Craig Topper	a96d62b360	[X86] Teach shuffle lowering to use 256-bit SHUF128 when possible. This allows masked operations to be used and allows the register allocator to use YMM16-31 if necessary. As a follow up I'll look into teaching EVEX->VEX how to turn this back into PERM2X128 if any of the additional features don't work out. llvm-svn: 317403	2017-11-04 06:44:47 +00:00
Craig Topper	d21a53f246	[X86] Give unary PERMI priority over SHUF128 in lowerV8I64VectorShuffle to make it possible to fold a load. llvm-svn: 317382	2017-11-03 22:48:13 +00:00
David Blaikie	1be62f0327	Move TargetFrameLowering.h to CodeGen where it's implemented This header already includes a CodeGen header and is implemented in lib/CodeGen, so move the header there to match. This fixes a link error with modular codegeneration builds - where a header and its implementation are circularly dependent and so need to be in the same library, not split between two like this. llvm-svn: 317379	2017-11-03 22:32:11 +00:00
Aaron Ballman	ecf0e95267	Add llvm::for_each as a range-based extension to <algorithm> and make use of it in some cases where it is a clearer alternative to std::for_each. llvm-svn: 317356	2017-11-03 20:01:25 +00:00
Evandro Menezes	9dcf099944	[AArch64] Fix the number of iterations for the Newton series The number of iterations was incorrectly determined for DP FP vector types and the tests were insufficient to flag this issue. Differential revision: https://reviews.llvm.org/D39507 llvm-svn: 317349	2017-11-03 18:56:36 +00:00
Simon Dardis	d3b9f61c52	[mips] Match 'ins' and its variants with C++ code Change the ISel matching of 'ins', 'dins[mu]' from tablegen code to C++ code. This resolves an issue where ISel would select 'dins' instead of 'dinsm' when the instruction's size and position were individually in range but their sum was out of range according to the ISA specification. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39117 llvm-svn: 317331	2017-11-03 15:35:13 +00:00
Andrew V. Tischenko	0916c6b654	Fix for Bug 34475 - LOCK/REP/REPNE prefixes emitted as instruction on their own. Differential Revision: https://reviews.llvm.org/D39546 llvm-svn: 317330	2017-11-03 15:25:13 +00:00
Simon Pilgrim	ae1f013495	[X86][SSE] Add PACKUS support to combineVectorTruncation Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317315	2017-11-03 11:33:48 +00:00
Diana Picus	acf4bf21ab	[ARM GlobalISel] Move the check for Thumb higher up We're currently bailing out for Thumb targets while lowering formal parameters, but there used to be some other checks before it, which could've caused some functions (e.g. those without formal parameters) to sneak through unnoticed. llvm-svn: 317312	2017-11-03 10:30:12 +00:00
Martin Storsjo	9befcd7d8d	[AArch64] Use dwarf exception handling on MinGW Ideally we should probably produce WinEH here as well, but until then, we can use dwarf exceptions, without any further changes required in clang, libunwind or libcxxabi. Differential Revision: https://reviews.llvm.org/D39535 llvm-svn: 317304	2017-11-03 07:33:20 +00:00
Craig Topper	333897ec31	[X86] Remove PALIGNR/VALIGN handling from combineBitcastForMaskedOp and move to isel patterns instead. Prefer 128-bit VALIGND/VALIGNQ over PALIGNR during lowering when possible. llvm-svn: 317299	2017-11-03 06:48:02 +00:00
Sriraman Tallam	7cdb10f1aa	Avoid PLT for external calls when attribute nonlazybind is used. Differential Revision: https://reviews.llvm.org/D39065 llvm-svn: 317292	2017-11-03 00:10:19 +00:00
Quentin Colombet	b6afac1f9a	[AArch64][RegisterBankInfo] Add mapping for G_FPEXT. This fixes http://llvm.org/PR32560. We were missing a description for half floating point type and as a result were using the FPR 32 mapping. Because of the size mismatch the generic code was complaining that the default mapping is not appropriate. Fix the mapping description so that the default mapping can be properly applied. llvm-svn: 317287	2017-11-02 23:38:19 +00:00
Quentin Colombet	619d649878	[AArch64][RegisterBankInfo] Add FPR16 support in value mapping. NFC. llvm-svn: 317286	2017-11-02 23:38:13 +00:00
Craig Topper	086c04c8a7	[X86] Give AVX512VL instructions priority over their AVX equivalents. I thought we had gotten all these priority bugs worked out, but I guess not. llvm-svn: 317283	2017-11-02 23:23:37 +00:00
Konstantin Zhuravlyov	275a4f76c4	AMDGPU: Fix warning discovered by r317266 [-Wunused-private-field] llvm-svn: 317280	2017-11-02 22:35:22 +00:00
Krzysztof Parzyszek	058014fca5	[Hexagon] Prefer L2_loadrub_io over L4_loadrub_rr If the offset is an immediate, avoid putting it in a register to get Rs+Rt<<#0. llvm-svn: 317275	2017-11-02 21:56:59 +00:00
Konstantin Zhuravlyov	b695cd41b3	AMDGPU: Remove outdated fixme (it was already fixed) llvm-svn: 317266	2017-11-02 20:48:06 +00:00
Simon Dardis	725acb2d91	[mips] Use register scavenging with MSA. MSA stores and loads to the stack are more likely to require an emergency GPR spill slot due to the smaller offsets available with those instructions. Handle this by overestimating the size of the stack by determining the largest offset presuming that all callee save registers are spilled and accounting of incoming arguments when determining whether an emergency spill slot is required. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39056 llvm-svn: 317204	2017-11-02 12:47:22 +00:00
Sam Parker	242052c6b4	[ARM] and, or, xor and add with shl combine The generic dag combiner will fold: (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) (shl (or x, c1), c2) -> (or (shl x, c2), c1 << c2) This can create constants which are too large to use as an immediate. Many ALU operations are also capable of performing the shl, so we can unfold the transformation to prevent a mov imm instruction from being generated. Other patterns, such as b + ((a << 1) | 510), can also be simplified in the same manner. Differential Revision: https://reviews.llvm.org/D38084 llvm-svn: 317197	2017-11-02 10:43:10 +00:00
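The two folds quoted above rest on the fact that a left shift distributes over add and or in 32-bit modular arithmetic, so the folded and unfolded forms always compute the same value. A small sketch of that identity follows; the function names are hypothetical and this models only the arithmetic, not the ARM combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Unfolded form: (shl (add x, c1), c2). The constant c1 stays small.
uint32_t shlOfAdd(uint32_t X, uint32_t C1, unsigned C2) {
  assert(C2 < 32);
  return (X + C1) << C2;
}

// Folded form: (add (shl x, c2), c1 << c2). The constant grows by the shift.
uint32_t addOfShl(uint32_t X, uint32_t C1, unsigned C2) {
  assert(C2 < 32);
  return (X << C2) + (C1 << C2);
}

// The same pair of forms for or.
uint32_t shlOfOr(uint32_t X, uint32_t C1, unsigned C2) {
  assert(C2 < 32);
  return (X | C1) << C2;
}

uint32_t orOfShl(uint32_t X, uint32_t C1, unsigned C2) {
  assert(C2 < 32);
  return (X << C2) | (C1 << C2);
}
```

For the pattern in the message, (a << 1) | 510 equals (a | 255) << 1 because 510 is 255 << 1, so the unfolded form needs only the smaller constant 255.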
Andrew V. Tischenko	3c8bf5ec37	The patch updates sched numbers for YMM AVX instrs such as VMOVx, VORx, VXOR, VPERMILx, VBROADCASTx, etc. PR32857 should be closed. Differential Revision: https://reviews.llvm.org/D39227 llvm-svn: 317196	2017-11-02 10:33:41 +00:00
Petar Jovanovic	bb5c84fb57	Revert "Correct dwarf unwind information in function epilogue for X86" This reverts r317100 as it introduced sanitizer-x86_64-linux-autoconf buildbot failure (build #15606). llvm-svn: 317136	2017-11-01 23:05:52 +00:00
Craig Topper	3837322a6b	[X86] Use foreach in X86.td to combine some of the CPU names that are obviously aliases. NFC llvm-svn: 317134	2017-11-01 22:15:49 +00:00
Craig Topper	7a754c4622	[X86] Add CMOV feature to 'i686' processor, making it a proper alias of pentiumpro which I believe it should be. This is consistent with current gcc behavior. llvm-svn: 317133	2017-11-01 22:15:40 +00:00
Simon Pilgrim	e152c2c447	[X86][SSE] Add PACKUS support to LowerTruncate Similar to the existing code to lower to PACKSS, we can use PACKUS if the input vector's leading zero bits extend all the way to the packed/truncated value. We have to account for pre-SSE41 targets not supporting PACKUSDW llvm-svn: 317128	2017-11-01 21:52:29 +00:00
Craig Topper	4e56ba271e	[X86] Add custom code to EVEX to VEX pass to turn unmasked 128-bit VALIGND/Q into VPALIGNR if the extended registers aren't being used. This will enable us to prefer VALIGND/Q during shuffle lowering in order to get the extended register encoding space when BWI isn't available. But if we end up not using the extended registers we can switch VPALIGNR for the shorter VEX encoding. Differential Revision: https://reviews.llvm.org/D39401 llvm-svn: 317122	2017-11-01 21:00:59 +00:00
Konstantin Zhuravlyov	435151ad75	AMDGPU: Fix set but not used warnings related to AMDGPUAS Differential Revision: https://reviews.llvm.org/D39499 llvm-svn: 317114	2017-11-01 19:12:38 +00:00
Craig Topper	ca1aa83cbe	[X86] Prevent fast isel from folding loads into the instructions listed in hasPartialRegUpdate. This patch moves the check for opt size and hasPartialRegUpdate into the lower level implementation of foldMemoryOperandImpl to catch the entry point that fast isel uses. We're still folding undef register instructions in AVX that we should also probably disable, but that's a problem for another patch. Unfortunately, this requires reordering a bunch of functions which is why the diff is so large. I can do the function reordering separately if we want. Differential Revision: https://reviews.llvm.org/D39402 llvm-svn: 317112	2017-11-01 18:10:06 +00:00
Graham Yiu	671526148c	Adds code to PPC ISEL lowering to recognize half-word inserts from vector_shuffles, and use P9 shift and vector insert instructions instead of vperm. Differential Revision: https://reviews.llvm.org/D34160 llvm-svn: 317111	2017-11-01 18:06:56 +00:00
Craig Topper	5ae677e102	[X86] Add 64-bit int to float/double conversion with AVX to X86FastISel::X86SelectSIToFP Summary: [X86] Teach fast isel to handle i64 sitofp with AVX. For some reason we only handled i32 sitofp with AVX. But with SSE only we support i64 so we should do the same with AVX. Also add i686 command lines for the 32-bit tests. 64-bit tests are in a separate file to avoid a fast-isel abort failure in 32-bit mode. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39450 llvm-svn: 317102	2017-11-01 16:23:06 +00:00
Andrew V. Tischenko	3d971e39f8	Update VCVTx, VMOVNTPx and VROUNDYPx instructions scheduling on btver2. Differential Revision: https://reviews.llvm.org/D39059 llvm-svn: 317101	2017-11-01 16:10:20 +00:00
Petar Jovanovic	f2faee92aa	Correct dwarf unwind information in function epilogue for X86 This patch aims to provide correct dwarf unwind information in function epilogue for X86. It consists of two parts. The first part inserts CFI instructions that set appropriate cfa offset and cfa register in emitEpilogue() in X86FrameLowering. This part is X86 specific. The second part is platform independent and ensures that: - CFI instructions do not affect code generation - Unwind information remains correct when a function is modified by different passes. This is done in a late pass by analyzing information about cfa offset and cfa register in BBs and inserting additional CFI directives where necessary. Changed CFI instructions so that they: - are duplicable - are not counted as instructions when tail duplicating or tail merging - can be compared as equal Added CFIInstrInserter pass: - analyzes each basic block to determine cfa offset and register valid at its entry and exit - verifies that outgoing cfa offset and register of predecessor blocks match incoming values of their successors - inserts additional CFI directives at basic block beginning to correct the rule for calculating CFA Having CFI instructions in function epilogue can cause incorrect CFA calculation rule for some basic blocks. This can happen if, due to basic block reordering, or the existence of multiple epilogue blocks, some of the blocks have wrong cfa offset and register values set by the epilogue block above them. CFIInstrInserter is currently run only on X86, but can be used by any target that implements support for adding CFI instructions in epilogue. Patch by Violeta Vukobrat. Differential Revision: https://reviews.llvm.org/D35844 llvm-svn: 317100	2017-11-01 16:04:11 +00:00
Simon Pilgrim	778810eb42	[X86][SSE] Begun generalizing truncateVectorWithPACKSS to work with PACKSS/PACKUS functions Renamed to truncateVectorWithPACK llvm-svn: 317098	2017-11-01 15:31:51 +00:00
Roger Ferrer Ibanez	9dfbc10522	Revert r313618 "[ARM] Use ADDCARRY / SUBCARRY" That change causes PR35103, so reverting until I figure it out. llvm-svn: 317092	2017-11-01 14:06:57 +00:00
NAKAMURA Takumi	1657f2ad99	Fix warnings discovered by rL317076. [-Wunused-private-field] llvm-svn: 317091	2017-11-01 13:47:55 +00:00
NAKAMURA Takumi	f7d7a59b9e	Suppress a warning discovered by rL317076. [-Wunused-private-field] llvm-svn: 317090	2017-11-01 13:47:51 +00:00
Simon Pilgrim	f657ba0cb6	[X86][SSE] Truncate with PACKSS any input with sufficient sign-bits So far we've only been using PACKSS truncations with 'all-bits or zero-bits' patterns (vector comparison results etc.). When really we can safely use it for any case as long as the number of sign bits reaches down to the last 16-bits (or 8-bits if we're truncating to bytes). The next steps after this are to add the equivalent support for PACKUS and to support packing to sub-128 bit vectors for truncating stores etc. Differential Revision: https://reviews.llvm.org/D39476 llvm-svn: 317086	2017-11-01 11:47:44 +00:00
Craig Topper	688f0ca6a7	[X86] Add more type qualifiers to INSERT_SUBREG operations in rotate patterns so they don't get created with a v64i8 type. Not sure why tablegen didn't error on this. Fixes PR35158. llvm-svn: 317079	2017-11-01 07:11:32 +00:00
Craig Topper	a827f84dcc	[X86] Add AVX512 support to X86FastISel::fastMaterializeFloatZero. llvm-svn: 317059	2017-11-01 00:47:45 +00:00
Benjamin Kramer	f9ab3ddb8f	[AMDGPU] Clean up symbols in the global namespace. llvm-svn: 317051	2017-10-31 23:21:30 +00:00
Marek Olsak	5914ece6aa	AMDGPU: Select s_buffer_load_dword with a non-constant SGPR offset Summary: Apps that benefit: - alien isolation - bioshock infinite - civilization: beyond earth - company of heroes 2 - dirt showdown - dota 2 - F1 2015 - grid autosport - hitman - legend of grimrock - serious sam 3: bfe - shadow warrior - talos principle - total war: warhammer - UE4 demos: effects cave, elemental, sun temple Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38914 llvm-svn: 317038	2017-10-31 21:06:42 +00:00
Reid Kleckner	39970069b1	[X86][AsmParser] Treat '%' as the modulo operator under Intel syntax It can't be a register prefix, anyway. This is consistent with the masm docs on MSDN: https://msdn.microsoft.com/en-us/library/t4ax90d2.aspx This is a straight-forward extension of our support for "MOD" implemented in https://reviews.llvm.org/D33876 / r306425 llvm-svn: 317011	2017-10-31 16:47:38 +00:00
Simon Pilgrim	f3c33ca83e	[X86][SSE] Add VSRLI/VSRAI/VSLLI demanded elts support to computeKnownBits/ComputeNumSignBits Mainly a perf improvement, as most combines will have occurred before we lower to these instructions llvm-svn: 317005	2017-10-31 16:06:21 +00:00
Michael Zuckerman	9e58831cb8	[AVX512] Adding new patterns for extract_subvector of vXi1 Extract subvector of vXi1 from vYi1 is poorly supported by LLVM and most of the time ends with an assertion. This patch fixes this issue by adding new patterns to the TD file. Reviewers: 1. guyblank 2. igorb 3. zvi 4. ayman 5. craig.topper Differential Revision: https://reviews.llvm.org/D39292 Change-Id: Ideb4d7e946c8d40cfce2920891f2d89fe64c58f8 llvm-svn: 316981	2017-10-31 10:00:19 +00:00
Craig Topper	beed653135	[X86] Make AVX512_512_SET0 XMM16-31 lower to 128-bit XOR when AVX512VL is enabled. Use 128-bit VLX instruction when VLX is enabled. Unfortunately, this weakens our ability to do domain fixing when AVX512DQ is not enabled, but it is consistent with our 256-bit behavior. Maybe we should add custom handling to domain fixing to allow EVEX integer XOR/AND/OR/ANDN to switch to VEX encoded fp instructions if the high registers aren't being used? llvm-svn: 316978	2017-10-31 06:01:04 +00:00
Craig Topper	668b1ab6f1	[X86] Clang-format some code. NFC llvm-svn: 316973	2017-10-31 02:34:29 +00:00
Javed Absar	d13d419d4a	[AArch64]: range loopify frame-lowering llvm-svn: 316960	2017-10-30 22:00:06 +00:00
Craig Topper	9f01f6093c	[X86] Add AVX512 support to fast isel's X86ChooseCmpOpcode. llvm-svn: 316955	2017-10-30 21:09:19 +00:00
Stefan Pintilie	6262fd4b0a	Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat" Revert r316478. A test case has failed. Will recommit this change once we find and fix the failure. This reverts commit 7c330fabaedaba3d02c58bc3cc1198896c895f34. llvm-svn: 316952	2017-10-30 19:55:38 +00:00
Jina Nahias	5bf6620b15	[X86][AVX512] Adding a pattern for broadcastm intrinsic. Differential Revision: https://reviews.llvm.org/D38312 Change-Id: I71c8605a8e4c98013ef25289694afc5cfd46bb0b llvm-svn: 316921	2017-10-30 16:37:28 +00:00
Rafael Espindola	6f36637be0	Move isDSOLocal check and add a comment. llvm-svn: 316920	2017-10-30 16:32:31 +00:00
Fangrui Song	2696db90d1	[PPC CodeGen] Fix the bitreverse.i64 intrinsic. Summary: The two 32-bit words were swapped. Update a test omitted in reverted r316270. Reviewers: jtony, aaron.ballman Subscribers: nemanjai, kbarton Differential Revision: https://reviews.llvm.org/D39163 llvm-svn: 316916	2017-10-30 16:03:44 +00:00
Craig Topper	4e13d4de52	[X86] Make sure we don't create locked inc/dec instructions when the carry flag is being used. Summary: INC/DEC don't update the carry flag so we need to make sure we don't try to use it. This patch introduces new X86ISD opcodes for locked INC/DEC. Teaches lowerAtomicArithWithLOCK to emit these nodes if INC/DEC is not slow or the function is being optimized for size. An additional flag is added that allows the INC/DEC to be disabled if the caller determines that the carry flag is being requested. The test_sub_1_cmp_1_setcc_ugt test is currently showing this bug. The other test case changes are recovering cases that were regressed in r316860. This should fully fix PR35068 finishing the fix started in r316860. Reviewers: RKSimon, zvi, spatel Reviewed By: zvi Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39411 llvm-svn: 316913	2017-10-30 14:51:37 +00:00
Craig Topper	367cc12fa9	[X86] Remove AVX512 early out from X86FastISel::X86SelectCmp. This shouldn't be needed anymore since i1 isn't a legal type. llvm-svn: 316912	2017-10-30 14:50:11 +00:00
Yaxun Liu	c928f2a6d4	[AMDGPU] Emit metadata for hidden arguments for kernel enqueue Identifies kernels which perform device side kernel enqueues and emits metadata for the associated hidden kernel arguments. Such kernels are marked with calls-enqueue-kernel function attribute by AMDGPUOpenCLEnqueueKernelLowering pass and later on hidden kernel arguments metadata HiddenDefaultQueue and HiddenCompletionAction are emitted for them. Differential Revision: https://reviews.llvm.org/D39255 llvm-svn: 316907	2017-10-30 14:30:28 +00:00
Clement Courbet	b2c3eb8cf1	[CodeGen][ExpandMemcmp] Allow memcmp to expand to vector loads (2). - Targets that want to support memcmp expansions now return the list of supported load sizes. - Expansion codegen does not assume that all power-of-two load sizes smaller than the max load size are valid. For example, this is not the case for x86(32bit)+sse2. Fixes PR34887. llvm-svn: 316905	2017-10-30 14:19:33 +00:00
Krzysztof Parzyszek	bef1c56724	[Hexagon] Allow the RDF optimizations to be run in .mir testcases llvm-svn: 316904	2017-10-30 14:11:52 +00:00
Javed Absar	5cde1ccb29	[GlobalISel|ARM] : Allow legalizing G_FSUB Adding support for VSUB. Reviewed by: @rovka Differential Revision: https://reviews.llvm.org/D39261 llvm-svn: 316902	2017-10-30 13:51:56 +00:00
Andrew V. Tischenko	f94da596a7	Invalid use of 'w' suffix on push and pop using 64-bit register. Differential Revision: https://reviews.llvm.org/D38626 llvm-svn: 316898	2017-10-30 12:02:06 +00:00
Jina Nahias	e63db55c67	Revert "[X86][AVX512] Adding a pattern for broadcastm intrinsic." This reverts commit r316890. Change-Id: I683cceee9848ef309b452293086b1f26a941950d llvm-svn: 316894	2017-10-30 10:35:53 +00:00
Jina Nahias	70280f9a0d	[X86][AVX512] Adding a pattern for broadcastm intrinsic. Differential Revision: https://reviews.llvm.org/D38312 Change-Id: I6551fb13879e098aed74de410e29815cf37d9ab5 llvm-svn: 316890	2017-10-30 09:59:52 +00:00
Craig Topper	85bcc297c3	[X86] Rearrange code in X86InstrInfo.cpp to put all the foldMemoryOperandImpl methods together without partial/undef register handling in the middle. NFC I have a future patch that wants to make use of one of the partial functions in one of the earlier memory folding methods and the current ordering prevents that. llvm-svn: 316883	2017-10-30 04:39:18 +00:00
Craig Topper	c848355335	[X86] Simplify code by removing an unnecessary temporary variable. NFC llvm-svn: 316882	2017-10-30 03:35:44 +00:00
Craig Topper	730414b0ca	[X86] Move some EVEX->VEX code to a helper function to prepare for a future patch. NFC llvm-svn: 316881	2017-10-30 03:35:43 +00:00
Craig Topper	495a1bc893	[X86] Remove combine that turns X86ISD::LSUB into X86ISD::LADD. Update patterns that depended on this. If the carry flag is being used, this transformation isn't safe. This does prevent some test cases from using DEC now, but I'll try to look into that separately. Fixes PR35068. llvm-svn: 316860	2017-10-29 06:51:04 +00:00
Craig Topper	7a60e29185	[X86] Fix typo in comment. NFC llvm-svn: 316859	2017-10-29 06:51:02 +00:00
Craig Topper	912f3b8e4b	[X86] Use the extended vector register classes in fast isel with AVX512F/VL. llvm-svn: 316857	2017-10-29 05:14:26 +00:00
Craig Topper	5f2289a13c	[X86] Add AVX512 support to X86FastISel::X86SelectFPExt and X86FastISel::X86SelectFPTrunc. llvm-svn: 316856	2017-10-29 02:50:31 +00:00
Craig Topper	1e30d783dd	[X86] Add AVX512 support to X86FastISel::X86MaterializeFP llvm-svn: 316853	2017-10-29 02:18:41 +00:00
Craig Topper	0692ca4bd2	[X86] Remove invalid code from LowerVSELECT. This code attempted to say that v8i16/v16i16 VSELECT is legal if BWI and VLX are enabled, but the only way we could reach this point is if the condition was not a vXi1 type. Which means it really wasn't legal. We don't have any tests that exercise this code. So I'm hoping it wasn't really reachable. llvm-svn: 316851	2017-10-28 23:10:13 +00:00
Simon Pilgrim	294f88dfa0	[X86][SSE] Combine 128-bit target shuffles to PACKSS/PACKUS. llvm-svn: 316845	2017-10-28 20:51:27 +00:00
Simon Pilgrim	bd3852aa5e	[X86][SSE] Split off matchVectorShuffleWithPACK. NFCI. Split matchVectorShuffleWithPACK from lowerVectorShuffleWithPACK so that we can reuse it for target shuffle combines llvm-svn: 316844	2017-10-28 20:27:22 +00:00
Craig Topper	40f0584f08	[X86] Fix a mistake in the X86ISelDAGToDAG.cpp code for MUL8r/IMUL8r. I think this code is unreachable due to some promotions that occur elsewhere. I'll look into that to be sure, but for now I thought I should at least fix the obvious typo. llvm-svn: 316840	2017-10-28 19:56:57 +00:00
Craig Topper	202b559ae0	[X86] Replace some default cases in X86SelectShift with llvm_unreachable. llvm-svn: 316839	2017-10-28 19:56:56 +00:00
Sanjay Patel	b049173157	[SimplifyCFG] use pass options and remove the latesimplifycfg pass This is no-functional-change-intended. This is repackaging the functionality of D30333 (defer switch-to-lookup-tables) and D35411 (defer folding unconditional branches) with pass parameters rather than a named "latesimplifycfg" pass. Now that we have individual options to control the functionality, we could decouple when these fire (but that's an independent patch if desired). The next planned step would be to add another option bit to disable the sinking transform mentioned in D38566. This should also make it clear that the new pass manager needs to be updated to limit simplifycfg in the same way as the old pass manager. Differential Revision: https://reviews.llvm.org/D38631 llvm-svn: 316835	2017-10-28 18:43:07 +00:00
Simon Pilgrim	25808c303f	[X86][SSE] Rename truncateVectorCompareWithPACKSS to truncateVectorWithPACKSS. NFC. We no longer rely on the vector source being a comparison result, just have sufficient sign bits. llvm-svn: 316834	2017-10-28 17:59:56 +00:00
Craig Topper	f8b92661b8	[X86] Remove unneeded MVT::i1 related code from fast isel. llvm-svn: 316825	2017-10-28 05:52:23 +00:00
Tom Stellard	d0c6cf2e8c	AMDGPU/GlobalISel: Mark 32-bit G_FADD as legal Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38439 llvm-svn: 316815	2017-10-27 23:57:41 +00:00
Krzysztof Parzyszek	4dc04e6a70	[Hexagon] Adjust patterns to reflect instruction selection preferences llvm-svn: 316804	2017-10-27 22:24:49 +00:00
David Blaikie	8699f71310	Add a few missing headers for modularization/IWYU/etc Several cases where class definitions are required for DenseMap pointer traits handling. llvm-svn: 316803	2017-10-27 22:12:46 +00:00
Rafael Espindola	2393c3b4e1	Handle undefined weak hidden symbols on all architectures. We were handling the non-hidden case in lib/Target/TargetMachine.cpp, but the hidden case was handled in architecture dependent code and only X86_64 and AArch64 were covered. While it is true that some code sequences in some ABIs might be able to produce the correct value at runtime, that doesn't seem to be the common case. I left the AArch64 code in place since it also forces a got access for non-pic code. It is not clear if that is needed, but it is probably better to change that in another commit. llvm-svn: 316799	2017-10-27 21:18:48 +00:00
Craig Topper	d69453290e	[X86] Remove fast-isel code for handling i8 shifts. This is handled by auto generated code. llvm-svn: 316797	2017-10-27 21:00:59 +00:00
Craig Topper	728fa7b4e2	[X86] Teach fastisel to use VLX VMOVNTDQA for v4f64 and 256-bit integers when available. This looks to have been missed from r280682. llvm-svn: 316790	2017-10-27 20:13:10 +00:00
Krzysztof Parzyszek	92a2635bbd	[Hexagon] Fix an incorrect assertion in HexagonConstExtenders.cpp Making sure that an instruction has fewer operands than required, then attempting to access one out of range is going to fail. llvm-svn: 316785	2017-10-27 18:52:28 +00:00
Simon Pilgrim	5e3808afa2	[X86][F16C] Fix btver2 AGU pipe scheduling Use the store AGU for stores, and the load AGU needs to be the first pipe for loads llvm-svn: 316771	2017-10-27 16:34:58 +00:00
David Blaikie	6265130054	InstructionSelectorImpl.h: Modularize/remove ODR violations by using a static member function to expose the debug name llvm-svn: 316715	2017-10-26 23:39:54 +00:00
Eli Friedman	d5dfb62de7	[ARM] Honor -mfloat-abi for libcall calling convention As far as I can tell, this matches gcc: -mfloat-abi determines the calling convention for all functions except those explicitly defined as soft-float in the ARM RTABI. This change only affects cases where the user specifies -mfloat-abi to override the default calling convention derived from the target triple. Fixes https://bugs.llvm.org//show_bug.cgi?id=34530. Differential Revision: https://reviews.llvm.org/D38299 llvm-svn: 316708	2017-10-26 21:42:32 +00:00
Craig Topper	b8d7d4d683	[X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support 64-bit extensions. If the extend type is 64-bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG operation. This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case. This also adds known bit and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG. Differential Revision: https://reviews.llvm.org/D38275 llvm-svn: 316702	2017-10-26 21:12:03 +00:00
Craig Topper	8a2a104129	[X86] Teach the assembly parser to warn on duplicate registers in gather instructions. Fixes PR32238. Differential Revision: https://reviews.llvm.org/D39077 llvm-svn: 316700	2017-10-26 21:03:54 +00:00
Sanjay Patel	ac50f3e907	[x86] use an insert op to put one variable element into a constant of vectors Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it. Differential Revision: https://reviews.llvm.org/D38756 llvm-svn: 316685	2017-10-26 18:27:55 +00:00
Yichao Yu	221dae31a5	Clear LastMappingSymbols and LastEMS(Info) when resetting the ARM(AArch64)ELFStreamer Summary: This causes a segfault on ARM when (I think) the pass manager is used multiple times. Reset sets the (last) current section to NULL without saving the corresponding LastEMSInfo back into the map. The next use of the streamer then saves the LastEMSInfo for the NULL section, leaving the LastEMSInfo mapping for the last current section (the one that was there before the reset) NULL, which causes the LastEMSInfo to be set to NULL when the section is used again. The reuse of the section (pointer) might mean that the map was holding dangling pointers previously, which is why I went for clearing the map and resetting the info, making it as similar to the state right after the constructor runs as possible. The AArch64 one doesn't segfault (since LastEMS isn't a pointer) but it seems to have the same issue. The segfault is likely caused by https://reviews.llvm.org/D30724 which turns LastEMSInfo into a pointer. As mentioned above, it seems that the actual issue was older though. No test is included since the test is believed to be too complicated for such an obvious fix and not worth doing. Reviewers: llvm-commits, shankare, t.p.northover, peter.smith, rengolin Reviewed By: rengolin Subscribers: mgorny, aemerson, rengolin, javed.absar, kristof.beyls Differential Revision: https://reviews.llvm.org/D38588 llvm-svn: 316679	2017-10-26 17:36:43 +00:00
Sean Fertile	c70d28bff5	Represent runtime preemption in the IR. Currently we do not represent runtime preemption in the IR, which has several drawbacks: 1) The semantics of GlobalValues differ depending on the object file format you are targeting (as well as the relocation-model and -fPIE value). 2) We have no way of disabling inlining of run time interposable functions, since in the IR we only know if a function is link-time interposable. Because of this llvm cannot support elf-interposition semantics. 3) In LTO builds of executables we will have extra knowledge that a symbol resolved to a local definition and can't be preemptable, but have no way to propagate that knowledge through the compiler. This patch adds preemptability specifiers to the IR with the following meaning: dso_local --> means the compiler may assume the symbol will resolve to a definition within the current linkage unit and the symbol may be accessed directly even if the definition is not within this compilation unit. dso_preemptable --> means that the compiler must assume the GlobalValue may be replaced with a definition from outside the current linkage unit at runtime. To ease transitioning dso_preemptable is treated as a 'default' in that low-level codegen will still do the same checks it did previously to see if a symbol should be accessed indirectly. Eventually when IR producers emit the specifiers on all Globalvalues we can change dso_preemptable to mean 'always access indirectly', and remove the current logic. Differential Revision: https://reviews.llvm.org/D20217 llvm-svn: 316668	2017-10-26 15:00:26 +00:00
Marek Olsak	2232243863	AMDGPU: Handle s_buffer_load_dword hazard on SI Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39171 llvm-svn: 316666	2017-10-26 14:43:02 +00:00
Simon Dardis	b633acac9f	[mips] Fix (dis)assembly of abs.fmt for micromips These instructions were previously marked as codegen only preventing them from being assembled as microMIPS or disassembled. Reviewers: atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D39123 llvm-svn: 316656	2017-10-26 11:36:54 +00:00
Simon Dardis	13452383cd	[mips] Fix PR35071 PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past debug instructions when removing branches for the control flow optimizer, which led to duplicated conditional branches. If the target of the branch was a removable block, only the conditional branch in the terminating position would have its MBB operands updated, leaving the first branch with a dangling MBB operand. The MIPS long branch pass would then trigger an assertion when attempting to examine the instruction with dangling MBB operand. This resolves PR35071. Thanks to Alex Richardson for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39288 llvm-svn: 316654	2017-10-26 10:58:36 +00:00
Hiroshi Inoue	b72b1fb0de	[PowerPC] Use record-form instruction for Less-or-Equal -1 and Greater-or-Equal 1 Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0. This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities. Differential Revision: https://reviews.llvm.org/D38941 llvm-svn: 316647	2017-10-26 09:01:51 +00:00
Craig Topper	0551556ed2	[AsmParser][TableGen] Add VariantID argument to the generated mnemonic spell check function so it can use the correct table based on variant. I'm considering implementing the mnemonic spell checker for x86, and that would require the separate intel and att variants. llvm-svn: 316641	2017-10-26 06:46:41 +00:00
Craig Topper	2a06028c0a	[AsmParser][TableGen] Make the generated mnemonic spell checker function a file local static function. Also only emit in targets that specifically request it. This is required so we don't get an unused static function error. llvm-svn: 316640	2017-10-26 06:46:40 +00:00
Craig Topper	619b15283d	[X86] Use correct type for return value of ComputeAvailableFeatures in the AsmParser. NFC There aren't enough used bits to make this a functional change, but we should fix it for consistency. llvm-svn: 316639	2017-10-26 06:46:38 +00:00
David Blaikie	cc7763ba92	Hexagon: Fold a single-use textual header into its use llvm-svn: 316604	2017-10-25 19:52:21 +00:00
Krzysztof Parzyszek	27056da9a8	[Hexagon] Account for negative offset when limiting max deviation In getOffsetRange, Max can be set to 0 to force the extender replacement to be at or below the original value. This would cause the new offset to be non-negative, which is preferred for memory instructions (to reduce the likelihood of it getting constant-extended due to predication). The problem happens when the range is shifted by an offset (present in the instruction being examined) and the offset is negative. The entire range for the allowable deviation will then be strictly negative. This creates a problem, since 0 is assumed to be a valid deviation. llvm-svn: 316601	2017-10-25 18:46:40 +00:00
Craig Topper	6fae2eedf3	[X86] Add avx512vpopcntdq to Knights Mill As indicated by Table 1-1 in Intel Architecture Instruction Set Extensions and Future Features Programming Reference from October 2017. llvm-svn: 316592	2017-10-25 17:10:32 +00:00
Simon Dardis	7af3edc4f4	[mips] Clean up some whitespace (NFC). Also test that my email address was updated. llvm-svn: 316575	2017-10-25 13:35:53 +00:00
Diana Picus	b35022121d	[ARM GlobalISel] Fix call opcodes We were generating BLX for all the calls, which was incorrect in most cases. Update ARMCallLowering to generate BL for direct calls, and BLX, BX_CALL or BMOVPCRX_CALL for indirect calls. llvm-svn: 316570	2017-10-25 11:42:40 +00:00
Sam Parker	1f742117bd	[ARM] OrCombineToBFI function Extract the functionality to combine OR to BFI into its own function. Differential Revision: https://reviews.llvm.org/D39001 llvm-svn: 316563	2017-10-25 08:37:33 +00:00
Sam Parker	ccb209bb97	[ARM] Swap cmp operands for automatic shifts Swap the compare operands if the lhs is a shift and the rhs isn't, as in arm and T2 the shift can be performed by the compare for its second operand. Differential Revision: https://reviews.llvm.org/D39004 llvm-svn: 316562	2017-10-25 08:33:06 +00:00
Martin Storsjo	373c8efa1e	[AArch64] Add support for dllimport of values and functions Previously, the dllimport attribute did the right thing in terms of treating it as a pointer to a value, but this makes sure the names get mangled properly, and calls to such functions load the function from the __imp_ pointer. This is based on SVN r212431 and r212430 where the same was implemented for ARM. Differential Revision: https://reviews.llvm.org/D38530 llvm-svn: 316555	2017-10-25 07:25:18 +00:00
Matt Arsenault	28f52e51f1	AMDGPU: Add max-mix-insts subtarget feature llvm-svn: 316553	2017-10-25 07:00:51 +00:00
Yonghong Song	9af998e86e	bpf: fix an uninitialized variable issue Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 316519	2017-10-24 21:36:33 +00:00
David Blaikie	c70b392e49	ARMAddressingModes.h: Don't mark header functions as file local llvm-svn: 316517	2017-10-24 21:29:21 +00:00
David Blaikie	4016da602e	HexagonDepTimingClasses.h: Don't mark header functions as file local llvm-svn: 316508	2017-10-24 21:29:16 +00:00
David Blaikie	75bda3006b	WebAssemblyAsmPrinter.h: Include WebAssemblyMachineFunctionInfo for use with MachineFunction::getInfo llvm-svn: 316507	2017-10-24 21:29:15 +00:00
David Blaikie	1032b51aa0	X86Operand.h: Include X86MCTargetDesc.h for SSE register enum/names llvm-svn: 316506	2017-10-24 21:29:15 +00:00
David Blaikie	6a2b124248	X86AsmPrinter.h: Add missing header for complete type needed for MCCodeEmitter dtor. llvm-svn: 316505	2017-10-24 21:29:14 +00:00
Artem Belevich	cb8f6328dc	[NVPTX] allow address space inference for volatile loads/stores. If particular target supports volatile memory access operations, we can avoid AS casting to generic AS. Currently it's only enabled in NVPTX for loads and stores that access global & shared AS. Differential Revision: https://reviews.llvm.org/D39026 llvm-svn: 316495	2017-10-24 20:31:44 +00:00
Gadi Haber	323f2e1715	[X86][Broadwell] Added the instruction scheduling information for the Broadwell CPU. Adding the scheduling information for the Broadwell (BDW) CPU target. This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target. We used the scheduling information retrieved from the Broadwell architects in order to create the file. The scheduling information includes latency, number of micro-Ops and used ports by each BDW instruction. The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175. Performance fluctuations may be expected due to code alignment effects. Reviewers: zvi, RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D39054 Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01 llvm-svn: 316492	2017-10-24 20:19:47 +00:00
Yonghong Song	ee68d8e41f	bpf: fix a bug in trunc-op optimization Previous implementation for per-function scope is incorrect and too conservative. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 316481	2017-10-24 18:21:10 +00:00
Stefan Pintilie	8f0c783095	[PowerPC] Try to simplify a Swap if it feeds a Splat If we have the situation where a Swap feeds a Splat we can sometimes change the index on the Splat and then remove the Swap instruction. Fixed the test case that was failing and recommit after pulling the original commit. Original revision is here: https://reviews.llvm.org/D39009 llvm-svn: 316478	2017-10-24 17:44:27 +00:00
Yonghong Song	0f836d5dc5	bpf: fix a bug in bpf-isel trunc-op optimization In BPF backend, we try to optimize away redundant trunc operations so that kernel verifier rewrite remains valid. Previous implementation only works for a single function. This patch fixed the issue for multiple functions. It clears internal map data structure before performing optimization for each function. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> llvm-svn: 316469	2017-10-24 17:29:03 +00:00
Simon Pilgrim	5e8c3f328f	[X86][AVX] ComputeNumSignBitsForTargetNode - add support for X86ISD::VTRUNC llvm-svn: 316462	2017-10-24 17:04:57 +00:00
Saleem Abdulrasool	fb490a0bcc	PowerPC: support the separator character in the IAS PowerPC uses ; as a comment leader and the @ as a separator character. Support this properly. llvm-svn: 316454	2017-10-24 16:19:56 +00:00
Simon Pilgrim	0a12c239b6	[X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of just PACKSSWB. By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits. llvm-svn: 316448	2017-10-24 15:38:16 +00:00
Oliver Stannard	03ded27bbc	[ARM] Error for invalid shift in memory operand Report a diagnostic when we fail to parse a shift in a memory operand because the shift type is not an identifier. Without this, we were silently ignoring the whole instruction. Differential revision: https://reviews.llvm.org/D39237 llvm-svn: 316441	2017-10-24 14:19:08 +00:00
Simon Pilgrim	c36dd6ae9c	[X86] truncateVectorCompareWithPACKSS - remove duplicate variables. NFCI. llvm-svn: 316440	2017-10-24 14:18:32 +00:00
Andrew V. Tischenko	f4fbe4a51b	Update f16c instruction scheduling on btver2. Differential Revision: https://reviews.llvm.org/D39051 llvm-svn: 316435	2017-10-24 13:38:30 +00:00
Zvi Rackover	bf31bf78e7	X86CallFrameOptimization: Update comments and variable names. NFCI. Following up on D38738. llvm-svn: 316434	2017-10-24 13:24:26 +00:00
Zvi Rackover	31b101a186	X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms Summary: r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However, X86CallFrameOptimization does not recognize these memory accesses so it will not replace them with push's when profitable. This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms. An alternative fix would be to prevent the 'store 0/1 idioms' patterns from firing when accessing the stack. This would save the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire we may result in cases where neither X86CallFrameOptimization nor the patterns for 'store 0/1 idioms' fire. Fixes pr34863 Reviewers: DavidKreitzer, guyblank, aymanmus Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38738 llvm-svn: 316431	2017-10-24 12:13:05 +00:00
Marek Olsak	ce76ea0394	AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1) Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427	2017-10-24 10:27:13 +00:00
Marek Olsak	2114fc3bcb	AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D38543 llvm-svn: 316426	2017-10-24 10:26:59 +00:00
Oliver Stannard	ce256a3a01	[ARM] Replace development diagnostics with normal DEBUG macro * Remove the -arm-asm-parser-dev-diags option. * Use normal DEBUG(dbgs()) printing for the extra development information about missing diagnostics. Differential Revision: https://reviews.llvm.org/D39194 llvm-svn: 316423	2017-10-24 09:46:56 +00:00
Oliver Stannard	6d5a5b98ab	[ARM] tSETEND needs IsThumb This is the Thumb encoding, so the Requires list must include IsThumb. No test because we happen to select the ARM one first, but that's just luck. Differential Revision: https://reviews.llvm.org/D39190 llvm-svn: 316421	2017-10-24 09:03:33 +00:00
Oliver Stannard	c507b370a1	[ARM] Remove tCPS alias which just crashed This alias caused a crash when trying to print the "cps #0" instruction in a diagnostic for thumbv6 (which doesn't have that instruction). The comment was incorrect, this instruction is UNPREDICTABLE if no flag bits are set, so I don't think it's worth keeping. Differential Revision: https://reviews.llvm.org/D39191 llvm-svn: 316420	2017-10-24 08:55:36 +00:00
Zvi Rackover	3c0d385598	X86: Fix X86CallFrameOptimization to search for the COPY StackPointer SelectionDAG inserts a copy of ESP into a virtual register. X86CallFrameOptimization assumed that the COPY, if present, is always right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a wrong assumption as the COPY can be located anywhere between the call-frame setup instruction and its first use. If the COPY happened to be located in a different location than what X86CallFrameOptimization assumed, visiting it while processing the call chain would lead to a conservative bail-out. The fix is quite straightforward: scan ahead for the stack-pointer copy and make note of it so it can be ignored while processing the call chain. Fixes pr34903 Differential Revision: https://reviews.llvm.org/D38730 llvm-svn: 316416	2017-10-24 07:38:29 +00:00
Omer Paparo Bivas	2251c79aba	[MC] Adding code padding for performance stability - infrastructure. NFC. Infrastructure designed for padding code with nop instructions in key places such that performance improvement will be achieved. The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known. This patch by itself is an NFC. Future patches will make use of this infrastructure to implement required policies for code padding. Reviewers: aaboud zvi craig.topper gadi.haber Differential revision: https://reviews.llvm.org/D34393 Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e llvm-svn: 316413	2017-10-24 06:16:03 +00:00
Zvi Rackover	c6d0b6c103	X86: Register the X86CallFrameOptimization pass Summary: The motivation of this change is to enable .mir testing for this pass. Added one test case to cover the functionality, this same case will be improved by a future patch. Reviewers: igorb, guyblank, DavidKreitzer Reviewed By: guyblank, DavidKreitzer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38729 llvm-svn: 316412	2017-10-24 05:47:07 +00:00
Konstantin Zhuravlyov	339e74440a	AMDGPU: Initialize WavefrontSize from TD files Differential Revision: https://reviews.llvm.org/D39205 llvm-svn: 316389	2017-10-23 23:02:39 +00:00
Simon Pilgrim	321e54f72d	[X86][SSE] combineBitcastvxi1 - use PACKSSWB directly to pack v8i16 to v16i8 Avoid difficulties determining the number of sign bits later on in shuffle lowering to lower to PACKSS llvm-svn: 316383	2017-10-23 22:05:02 +00:00
Stefan Pintilie	52bbd587ac	Revert "[PowerPC] Try to simplify a Swap if it feeds a Splat" Revert commit r316366. Previous commit causes p8-scalar_vector_conversions.ll to fail. This reverts commit 990e764ad8a2eec206ce5dda6aefab059ccd4e92. llvm-svn: 316371	2017-10-23 20:22:23 +00:00
Krzysztof Parzyszek	6f06b6edff	[Hexagon] Return the correct chain edge for i1 function calls In HexagonISelLowering, there is code to handle the case when a function returns an i1 type. In this case, we need to generate extra nodes to copy the result from R0 to a predicate register. The code was returning the wrong value for the chain edge which caused an assert "Wrong topological sorting" when converting the instructions to MIs. This patch fixes the problem by returning the chain for the final copy. Patch by Brendon Cahoon. llvm-svn: 316367	2017-10-23 19:35:25 +00:00
Stefan Pintilie	feafa1d7f0	[PowerPC] Try to simplify a Swap if it feeds a Splat If we have the situation where a Swap feeds a Splat we can sometimes change the index on the Splat and then remove the Swap instruction. Differential Revision: https://reviews.llvm.org/D39009 llvm-svn: 316366	2017-10-23 19:33:31 +00:00
Krzysztof Parzyszek	273678823b	[Hexagon] Add extra pattern for S4_addaddi One combination was missing: add(add(x,y),c). llvm-svn: 316363	2017-10-23 19:07:50 +00:00
Daniel Sanders	d66e0901ae	[globalisel][tablegen] Import stores and allow GISel to automatically substitute zero regs like WZR/XZR/$zero. This patch enables the import of stores. Unfortunately, doing so by itself loses an optimization where storing 0 to memory makes use of WZR/XZR. To mitigate this, this patch also introduces a new feature that allows register operands to nominate a zero register. When this is done, GlobalISel will substitute (G_CONSTANT 0) with the nominated register automatically. This is currently configured to only apply to the stores. Applying it to GPR32/GPR64 register classes in general will be done after review (see https://reviews.llvm.org/D39150). llvm-svn: 316360	2017-10-23 18:19:24 +00:00
Matt Arsenault	a030e2688f	AMDGPU: Cleanup local atomic node names llvm-svn: 316349	2017-10-23 17:16:43 +00:00
Matt Arsenault	b791802aef	AMDGPU: Fix default range in non-kernel functions The range should be assumed to be the hardware maximum if a workitem intrinsic is used in a callable function which does not know the restricted limit of the calling kernel. llvm-svn: 316346	2017-10-23 17:09:35 +00:00
Craig Topper	8d5a246ebe	[X86] Change VMPTRST to use PS instead of TB to match VMPTRLD. llvm-svn: 316340	2017-10-23 16:22:40 +00:00
Craig Topper	1db2f0828e	[X86] Change RDRAND to use PS instead of TB. Should be no functional change for now. A future disassembler change will prevent disassembling with 0xf2/0xf3. llvm-svn: 316339	2017-10-23 16:22:38 +00:00
Craig Topper	4d93adfed5	[X86] Change XRSTOR to use PS instead of TB to match XSAVE. I don't think this changes anything functionally yet, but I plan to fix the disassembler to use this to disable matching certain instructions with 0xf3/0xf2/0x66 prefixes. llvm-svn: 316337	2017-10-23 16:11:33 +00:00
Simon Pilgrim	1dcb913be6	[X86][SSE] Remove AssertZext stage from PEXTRW/PEXTRB lowering. NFCI. Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection. Differential Revision: https://reviews.llvm.org/D39169 llvm-svn: 316336	2017-10-23 16:00:57 +00:00
Andrew V. Tischenko	777308b548	Update DPPD/DPPS instruction scheduling on btver2. Differential Revision: https://reviews.llvm.org/D39046 llvm-svn: 316334	2017-10-23 15:53:30 +00:00
Craig Topper	8f182fdd8b	[X86] Add PTWRITE instruction for assembler and disassembler. llvm-svn: 316333	2017-10-23 15:53:21 +00:00
Craig Topper	5f0339d2f3	[X86] Add RDPID instruction for assembler and disassembler. llvm-svn: 316332	2017-10-23 15:53:16 +00:00
Andrew V. Tischenko	eff4fc0d41	Fix for Bug 30718 - Failure to disassemble certain MOV with rex.R. The issue was in illegal segment register index. Differential Revision: https://reviews.llvm.org/D38786 llvm-svn: 316319	2017-10-23 09:36:33 +00:00
Haojian Wu	1afddd4136	Fix a -Wpedantic warning. llvm-svn: 316315	2017-10-23 09:02:59 +00:00
Sam Parker	487ab86942	[ARM] Allow unrolling of multi-block loops. Before, loop unrolling was only enabled for loops with a single block. This restriction has been removed and replaced by: - allow a maximum of two exiting blocks, - a four basic block limit for cores with a branch predictor. Differential Revision: https://reviews.llvm.org/D38952 llvm-svn: 316313	2017-10-23 08:05:14 +00:00
Craig Topper	326008c615	[X86] Fix disassembly of EVEX rounding control and SAE instructions. Fixes PR31955. llvm-svn: 316308	2017-10-23 02:26:24 +00:00
Benjamin Kramer	a7c822a238	[X86] Add missing override. NFC. llvm-svn: 316299	2017-10-22 19:16:31 +00:00
Simon Pilgrim	ce55eab936	Strip trailing whitespace. NFCI. llvm-svn: 316296	2017-10-22 18:38:57 +00:00
Marina Yatsina	f9371d821f	Add logic to greedy reg alloc to avoid bad eviction chains This fixes bugzilla 26810 https://bugs.llvm.org/show_bug.cgi?id=26810 This is intended to prevent sequences like: movl %ebp, 8(%esp) # 4-byte Spill movl %ecx, %ebp movl %ebx, %ecx movl %edi, %ebx movl %edx, %edi cltd idivl %esi movl %edi, %edx movl %ebx, %edi movl %ecx, %ebx movl %ebp, %ecx movl 16(%esp), %ebp # 4 - byte Reload Such sequences are created in 2 scenarios: Scenario #1: vreg0 is evicted from physreg0 by vreg1 Evictee vreg0 is intended for region splitting with split candidate physreg0 (the reg vreg0 was evicted from) Region splitting creates a local interval because of interference with the evictor vreg1 (normally region splitting creates 2 intervals, the "by reg" and "by stack" intervals. Local interval created when interference occurs.) one of the split intervals ends up evicting vreg2 from physreg1 Evictee vreg2 is intended for region splitting with split candidate physreg1 one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills Scenario #2 vreg0 is evicted from physreg0 by vreg1 vreg2 is evicted from physreg2 by vreg3 etc Evictee vreg0 is intended for region splitting with split candidate physreg1 Region splitting creates a local interval because of interference with the evictor vreg1 one of the split intervals ends up evicting back original evictor vreg1 from physreg0 (the reg vreg0 was evicted from) Another evictee vreg2 is intended for region splitting with split candidate physreg1 one of the split intervals ends up evicting vreg3 from physreg2 etc.. until someone spills As compile time was a concern, I've added a flag to control whether we do cost calculations for local intervals we expect to be created (it's on by default for X86 target, off for the rest). Differential Revision: https://reviews.llvm.org/D35816 Change-Id: Id9411ff7bbb845463d289ba2ae97737a1ee7cc39 llvm-svn: 316295	2017-10-22 17:59:38 +00:00
Momchil Velikov	d6a4ab3d49	[ARM] Dynamic stack alignment for 16-bit Thumb This patch implements dynamic stack (re-)alignment for 16-bit Thumb. When targeting processors that support only the 16-bit Thumb instruction set, the compiler ignores the alignment attributes of automatic variables and may silently generate incorrect code. Differential revision: https://reviews.llvm.org/D38143 llvm-svn: 316289	2017-10-22 11:56:35 +00:00
Guy Blank	92d5ce3bd4	[X86] Add a pass to convert instruction chains between domains. The pass scans the function to find instruction chains that define registers in the same domain (closures). It then calculates the cost of converting the closure to another domain. If found profitable, the instructions are converted to instructions in the other domain and the register classes are changed accordingly. This commit adds the pass infrastructure and a simple conversion from the GPR domain to the Mask domain. Differential Revision: https://reviews.llvm.org/D37251 Change-Id: Ic2cf1d76598110401168326d411128ae2580a604 llvm-svn: 316288	2017-10-22 11:43:08 +00:00
Craig Topper	a33846aca6	[X86] Add VEX_WIG to applicable AVX512 instructions. This should be NFC. Will be used in future patches to fix disassembler bugs. llvm-svn: 316284	2017-10-22 06:18:23 +00:00
Craig Topper	1bcb0d8a7f	[X86] Add VEX_WIG to VROUNDSSrr/VROUNDSSrm/VROUNDSDrr/VROUNDSDrm llvm-svn: 316283	2017-10-22 06:18:20 +00:00
Craig Topper	158bc6474a	[X86] Don't allow gather/scatter to disassembler if memory operand does not use a SIB byte. Fixes PR34998. llvm-svn: 316282	2017-10-22 04:32:30 +00:00
Simon Pilgrim	ab6dbe2b29	Strip trailing whitespace. NFCI. llvm-svn: 316277	2017-10-21 20:40:49 +00:00
Aaron Ballman	fc02869c96	Reverting r316270 due to failing build bots. http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/12899 http://lab.llvm.org:8011/builders/clang-x86-windows-msvc2015/builds/7951 llvm-svn: 316276	2017-10-21 20:38:15 +00:00
Simon Pilgrim	3cb024490a	[X86][SSE] Add extractps/pextrd equivalence to domain tables Differential Revision: https://reviews.llvm.org/D39135 llvm-svn: 316274	2017-10-21 20:19:48 +00:00
Craig Topper	ca2382d809	[X86] Fix disassembling of EVEX instructions to stop accidentally decoding the SIB index register as an XMM/YMM/ZMM register. This introduces a new operand type to encode whether the index register should be XMM/YMM/ZMM. And new code to fixup the results created by readSIB. This has the nice effect of removing a bunch of code that hard coded the name of every GATHER and SCATTER instruction to map the index type. This fixes PR32807. llvm-svn: 316273	2017-10-21 20:03:20 +00:00
Simon Pilgrim	cb028c7321	Fix MSVC 'result of 32-bit shift implicitly converted to 64 bits' warning. NFCI. llvm-svn: 316271	2017-10-21 17:23:04 +00:00
Fangrui Song	c7b749bd06	[PPC CodeGen] Fix the bitreverse.i64 intrinsic. Summary: The two 32-bit words were swapped. Subscribers: nemanjai, kbarton Differential Revision: https://reviews.llvm.org/D38705 llvm-svn: 316270	2017-10-21 16:59:40 +00:00
Craig Topper	fcf27188d7	[X86] Do not generate __multi3 for mul i128 on X86 Summary: __multi3 is not available on x86 (32-bit). Setting lib call name for MULI_128 to nullptr forces DAGTypeLegalizer::ExpandIntRes_MUL to generate instructions for 128-bit multiply instead of a call to an undefined function. This fixes PR20871 though it may be worth looking at why licm and indvars combine to generate 65-bit multiplies in that test. Patch by Riyaz V Puthiyapurayil Reviewers: craig.topper, schweitz Reviewed By: craig.topper, schweitz Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D38668 llvm-svn: 316254	2017-10-21 02:26:00 +00:00
Krzysztof Parzyszek	9d19c8cac9	[Packetizer] Add function to check for aliasing between instructions llvm-svn: 316243	2017-10-20 22:08:40 +00:00
Sam Clegg	12fd3da9d1	[WebAssembly] MC: Fix crash when -g specified. At this point we don't output any debug sections or their relocations. Differential Revision: https://reviews.llvm.org/D39076 llvm-svn: 316240	2017-10-20 21:28:38 +00:00
Daniel Sanders	1e4569fdc1	[globalisel][tablegen] Fix small spelling nits. NFC ComplexRendererFn -> ComplexRendererFns Corrected a couple lingering references to tied operands that were missed. llvm-svn: 316237	2017-10-20 20:55:29 +00:00
Krzysztof Parzyszek	022922b31a	[Hexagon] Report error instead of crashing on wrong inline-asm constraints llvm-svn: 316236	2017-10-20 20:24:44 +00:00
Krzysztof Parzyszek	64e5d7d3ae	[Hexagon] Reorganize and update instruction patterns llvm-svn: 316228	2017-10-20 19:33:12 +00:00
Simon Pilgrim	29b32472b4	[X86][SSE] getTargetShuffleMask - check shuffle input value types. NFCI. To help identify shuffle combine issues llvm-svn: 316222	2017-10-20 18:07:50 +00:00
Dave Lee	f9b72327b0	Make x86 __ehhandler comdat if parent function is Summary: This change comes from using lld for i686-windows-msvc. Before this change, lld emits an error of: error: relocation against symbol in discarded section: .xdata It's possible that this could be addressed in lld, but I think this change is reasonable on its own. At a high level, this is being generated: A (.text comdat) -> B (.text) -> C (.xdata comdat) Where A is a C++ inline function, which references B, an exception handler thunk, which references C, the exception handling info. With this structure, lld will error when applying relocations to B if the C it references has been discarded (some other C has been selected). This change checks if A is comdat, and if so places the exception registration thunk (B) in the comdat group of A (and B). It appears that MSVC makes the __ehhandler function comdat. Is it possible that duplicate thunks are being emitted into the final binary with other linkers, or are they stripping the unused thunks? Reviewers: rnk, majnemer, compnerd, smeenai Reviewed By: rnk, compnerd Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38940 llvm-svn: 316219	2017-10-20 17:04:43 +00:00
Krzysztof Parzyszek	3818aeaeb9	[Hexagon] Allow redefinition with immediates for hw loop conversion Normally, if the registers holding the induction variable's bounds are redefined inside of the loop's body, the loop cannot be converted to a hardware loop. However, if the redefining instruction is actually loading an immediate value into the register, this conversion is both possible and legal (since the immediate itself will be used in the loop setup in the preheader). llvm-svn: 316218	2017-10-20 16:56:33 +00:00
Aleksandar Beserminji	143572984d	Revert "[mips] Reordering callseq* nodes to be linear" This reverts commit r314507, because the original patch is causing test failures. llvm-svn: 316215	2017-10-20 14:35:41 +00:00
Eugene Leviant	27b226fb65	[ARM] Use post-RA MI scheduler when +use-misched is set Differential revision: https://reviews.llvm.org/D39100 llvm-svn: 316214	2017-10-20 14:29:17 +00:00
Nemanja Ivanovic	0026c06e11	Disabling the transformation introduced in r315888 The commit at https://reviews.llvm.org/rL315888 is causing some failures with internal testing. Disabling this code until we can resolve the issues. llvm-svn: 316199	2017-10-20 00:36:46 +00:00
Alex Bradbury	c6c4e8bd5a	[RISCV] Add missing hunk from r316188 r316188 didn't set guessInstructionProperties=1 as it should have done. llvm-svn: 316189	2017-10-19 21:43:29 +00:00
Alex Bradbury	8971842f43	[RISCV] Initial codegen support for ALU operations This adds the minimum necessary to support codegen for simple ALU operations on RV32. Prolog and epilog insertion, support for memory operations etc etc follow in future patches. Leave guessInstructionProperties=1 until https://reviews.llvm.org/D37065 is reviewed and lands. Differential Revision: https://reviews.llvm.org/D29933 llvm-svn: 316188	2017-10-19 21:37:38 +00:00
Craig Topper	7bce79a539	[X86] Remove LowerEXTRACT_SUBVECTOR handler. All EXTRACT_SUBVECTORs are marked as legal. llvm-svn: 316182	2017-10-19 20:59:40 +00:00
Graham Yiu	488782efa3	The cost of splitting a large vector instruction is not being taken into account by the getUserCost function. This was leading to some loops being over unrolled. The cost of a vector instruction is now being multiplied by the cost of the type legalization. This will return a more accurate cost. Committing on behalf of Brad Nemanich (brad.nemanich@ibm.com) Differential Revision: https://reviews.llvm.org/D38961 llvm-svn: 316174	2017-10-19 18:16:31 +00:00
Krzysztof Parzyszek	e4d0e199bf	[Hexagon] Fix store conversion from rr to io in optimize addressing modes llvm-svn: 316170	2017-10-19 16:59:22 +00:00
Alex Bradbury	3c941e7ed9	[RISCV] RISCVAsmParser: early exit if RISCVOperand isn't immediate as expected This is necessary to avoid an assertion in the included test case and similar assembler inputs. llvm-svn: 316168	2017-10-19 16:22:51 +00:00
Alex Bradbury	baa54d4ac8	[RISCV][NFC] Drop unused parameter from createImm helper in RISCVAsmParser llvm-svn: 316167	2017-10-19 16:09:20 +00:00
Simon Pilgrim	fdd63d1535	[X86] Replace custom scalar integer absolute matching with ISD::ABS lowering. x86 has its own copy of integer absolute pattern matching to combine directly to a SUB+CMOV. This patch removes the x86 combine and adds custom lowering support for ISD::ABS instead, allowing us to use the DAGCombiner version. Additional test cases are already covered by iabs.ll (rL315706 and rL315711). Differential Revision: https://reviews.llvm.org/D38895 llvm-svn: 316162	2017-10-19 15:02:24 +00:00
Alex Bradbury	ee7c7ecd03	[RISCV] Prepare for the use of variable-sized register classes While parameterising by XLen, also take the opportunity to clean up the formatting of the RISCV .td files. This commit unifies the in-tree code with my patchset at <https://github.com/lowrisc/riscv-llvm>. llvm-svn: 316159	2017-10-19 14:29:03 +00:00
Sumanth Gundapaneni	e1983bcf55	[Hexagon] New HVX target features. This patch lets the llvm tools handle the new HVX target features that are added by frontend (clang). The target-features are of the form "hvx-length64b" for 64 Byte HVX mode, "hvx-length128b" for 128 Byte mode HVX. "hvx-double" is an alias to "hvx-length128b" and will soon be deprecated. The hvx version target feature is updated from "+hvx" to "+hvxv{version_number}". Eg: "+hvxv62" For the correct HVX code generation, the user must use the following target features. For 64B mode: "+hvxv62" "+hvx-length64b" For 128B mode: "+hvxv62" "+hvx-length128b" Clang picks a default length if none is specified. If for some reason, no hvx-length is specified to llvm, the compilation will bail out. There is a corresponding clang patch. Differential Revision: https://reviews.llvm.org/D38851 llvm-svn: 316101	2017-10-18 18:07:07 +00:00
Sumanth Gundapaneni	9d954c4169	[Hexagon] Update Hexagon ArchEnum and sync some downstream changes(NFC) Differential Revision: https://reviews.llvm.org/D38850 llvm-svn: 316099	2017-10-18 17:45:22 +00:00
Krzysztof Parzyszek	8c53c95137	[Hexagon] Mark vector loads as predicable, update instruction mappings All loads of form V6_vL32b_{,cur,nt,tmp,nt_cur,nt_tmp}_{ai,pi,ppu} are predicable on v62 (but not on v60). Mark them all as predicable in the instruction definitions, and handle the v60 case in HII::isPredicable. llvm-svn: 316098	2017-10-18 17:36:46 +00:00
Konstantin Zhuravlyov	8d5e9e110c	AMDGPU: Rename MaxFlatWorkgroupSize to MaxFlatWorkGroupSize for consistency Differential Revision: https://reviews.llvm.org/D38957 llvm-svn: 316097	2017-10-18 17:31:09 +00:00
Alex Bradbury	13ce95b77f	[RISCV] Bugfix createRISCVELFObjectWriter r315275 set the IsLittleEndian parameter incorrectly. This patch corrects this, and adds a test to ensure such mistakes will be caught in the future. llvm-svn: 316091	2017-10-18 16:11:31 +00:00
Andre Vieira	d4a25707f0	[ARM] Fix disassembly for conditional VMRS and VMSR instructions in ARM mode Differential Revision: https://reviews.llvm.org/D38347 llvm-svn: 316085	2017-10-18 14:47:37 +00:00
Simon Dardis	03c2c65b2d	[mips] Fix analyzeBranch to handle debug data In the case where there was a conditional branch followed by an unconditional branch with a debug instruction separating them, MipsInstrInfo::analyzeBranch would not skip past the debug instruction when searching for the second branch, which gave erroneous results about the control flow of the block. This could lead the branch folder to merge the non-fall-through case into its predecessor, leaving the conditional branch with a dangling basic block operand. This resolves PR34975. Thanks to Alexander Richardson for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39003 llvm-svn: 316084	2017-10-18 14:35:29 +00:00
NAKAMURA Takumi	6f43bd4bde	Untabify. llvm-svn: 316079	2017-10-18 13:31:28 +00:00
Dylan McKay	bebde41ec5	[AVR] Update to current LLVM API r315410 broke a number of things in the AVR backend, which are now fixed. llvm-svn: 316076	2017-10-18 12:35:15 +00:00
Michael Zuckerman	49293264cc	[AVX512][AVX2]Cost calculation for interleave load/store patterns {v8i8,v16i8,v32i8,v64i8} This patch adds accurate instructions cost. The formula presents two cases(stride 3 and stride 4) and calculates the cost according to the VF and stride. Reviewers: 1. delena 2. Farhana 3. zvi 4. dorit 5. Ayal Differential Revision: https://reviews.llvm.org/D38762 Change-Id: If4cfbd4ac0e63694e8144cb78c7fa34850647ff7 llvm-svn: 316072	2017-10-18 11:41:55 +00:00
Hiroshi Inoue	5388e66d3a	[PowerPC] Use helper functions to check sign-/zero-extended value Helper functions to identify sign- and zero-extending machine instructions were introduced in rL315888. This patch makes PPCInstrInfo::optimizeCompareInstr use the helper functions. It simplifies the code and also makes more optimizations possible since the helper can do more analysis than the original check code; I observed that about 5000 more compare instructions are eliminated while building LLVM. Also, this patch fixes a bug in the helpers' ANDIo instruction handling due to the order of checks. This bug causes a failure in an existing test case for optimizeCompareInstr. Differential Revision: https://reviews.llvm.org/D38988 llvm-svn: 316071	2017-10-18 10:31:19 +00:00
Michael Zuckerman	72a6f893cb	Fixing bug issue https://bugs.llvm.org/show_bug.cgi?id=34978 Change-Id: I7f13d5bcb181be2860377df7b40e1579a8ad4add llvm-svn: 316067	2017-10-18 08:04:31 +00:00
Daniel Sanders	30247fd1d9	[aarch64][globalisel] Register banks and classes should have distinct names. Otherwise they are ambiguous in MIR. llvm-svn: 316047	2017-10-18 00:12:43 +00:00
Wei Ding	7ab1f7a421	AMDGPU : Fix an error for the llvm.cttz implementation. Differential Revision: http://reviews.llvm.org/D39014 llvm-svn: 316037	2017-10-17 21:49:52 +00:00
Matthias Braun	a2f96b5bde	AArch64: Enable AES instruction fusion on Cyclone. Note that cyclone itself doesn't fuse, but newer apple chips do and we are using cyclone as the default when targeting apple OSes. The current code also does not capture all fusion patterns of apple CPUs yet; I am still looking for ways to refactor the code nicely to extend it. llvm-svn: 316036	2017-10-17 21:46:15 +00:00
Tim Northover	350a87eaf1	AArch64: account for possible frame index operand in compares. If the address of a local is used in a comparison, AArch64 can fold the address-calculation into the comparison via "adds". Unfortunately, a couple of places (both hit in this one test) are not ready to deal with that yet and just assume the first source operand is a register. llvm-svn: 316035	2017-10-17 21:43:52 +00:00
Eugene Zelenko	6cadde7f40	[Transforms] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 316034	2017-10-17 21:27:42 +00:00
Konstantin Zhuravlyov	7dabe9ced7	AMDGPU: Start generating metadata for MaxFlatWorkGroupSize Differential Revision: https://reviews.llvm.org/D38958 llvm-svn: 316024	2017-10-17 20:03:21 +00:00
Yichao Yu	a46eb8e649	Fix `FaultMaps` crash when the out streamer is reused Summary: Make sure the map is cleared before processing a new module. Similar to what is done on `StackMaps`. This issue is similar to D38588, though this time for FaultMaps (on x86) rather than ARM/AArch64. Other than possible mixing of information between modules, the crash is caused by the pointer values in the map that were allocated by the bump pointer allocator that is unwound when emitting the next file. This issue has been around since 3.8. This issue is likely much harder to write a test for since AFAICT it requires emitting something much more complicated (and possibly real code) instead of just some random bytes. Reviewers: skatkov, sanjoy Reviewed By: skatkov, sanjoy Subscribers: sanjoy, aemerson, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D38924 llvm-svn: 315990	2017-10-17 11:44:34 +00:00
Gadi Haber	1e0f1f476a	[X86][SKL] Updated scheduling information for the SkylakeClient target Updated the scheduling information for the SkylakeClient target with the following changes: 1. regrouped the instructions after adding load and store latencies. 2. regrouped the instructions after adding identified missing ports in several groups. The changes were made after revisiting the latencies impact of all the load and store uOps. Reviewers: zvi, RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D38727 Change-Id: I778a308cc11e490e8fa5e27e2047412a1dca029f llvm-svn: 315978	2017-10-17 06:47:04 +00:00
Craig Topper	fbb1985c14	[X86] Fix typo in comment. NFC llvm-svn: 315969	2017-10-17 04:17:54 +00:00
Mark Searles	4e3d6160db	Use the return value of UpdateNodeOperands(); in some cases, UpdateNodeOperands() modifies the node in-place and using the return value isn’t strictly necessary. However, it does not necessarily modify the node, but may return a resultant node if it already exists in the DAG. See comments in UpdateNodeOperands(). In that case, the return value must be used to avoid such scenarios as an infinite loop (node is assumed to have been updated, so added back to the worklist, and re-processed; however, node hasn’t changed so it is once again passed to UpdateNodeOperands(), assumed modified, added back to worklist; cycle infinitely repeats). Differential Revision: https://reviews.llvm.org/D38466 llvm-svn: 315957	2017-10-16 23:38:53 +00:00
Quentin Colombet	0bd2825517	Re-apply [AArch64][RegisterBankInfo] Use the statically computed mappings for COPY This reverts commit r315823, thus re-applying r315781. Also make sure we don't use G_BITCAST mapping for non-generic registers. Non-generic registers don't have a type but do have a reg bank. Something the COPY mapping knows how to deal with but the G_BITCAST mapping doesn't. -- Original Commit Message -- We used to resort to the generic implementation to get the mappings for COPYs. The generic implementation resorts to table lookup and dynamically allocated objects to get the valid mappings. Given we already know how to map G_BITCAST and have the static mappings for them, use that code path for COPY as well. This is much more efficient. Improve the compile time of RegBankSelect by up to 20%. Note: When we eventually generate all the mappings via TableGen, we wouldn't have to do that dance to shave compile time. The intent of this change was to make sure that moving to static structure really pays off. NFC. llvm-svn: 315947	2017-10-16 22:28:40 +00:00
Quentin Colombet	9f20af6135	[AArch64][RegisterBankInfo] Add mapping support for G_BITCAST of s128 Anything bigger than 64-bit just map to FPR. llvm-svn: 315946	2017-10-16 22:28:38 +00:00
Quentin Colombet	7c114d3d70	[AArch64][LegalizerInfo] Mark s128 G_BITCAST legal We used to mark all G_BITCAST of 128-bit legal but only for vector types. Scalars of this size are just fine as well. llvm-svn: 315945	2017-10-16 22:28:27 +00:00
Krzysztof Parzyszek	72518eaa6f	Add iterator range MachineRegisterInfo::liveins(), adopt users, NFC llvm-svn: 315927	2017-10-16 19:08:41 +00:00
Krzysztof Parzyszek	02893de4ef	[Hexagon] Rangify some loops, NFC Recommit r315763 with a fix. llvm-svn: 315925	2017-10-16 18:43:08 +00:00
Simon Dardis	0d378a9eed	[mips][micromips] Fix (dis)assembly of bc1(t|f) Previously these instructions were marked codegen only and had an under-specified instruction description that did not record the fcc register. Reviewers: atanasyan, abeserminji Differential Revision: https://reviews.llvm.org/D38847 llvm-svn: 315905	2017-10-16 14:20:22 +00:00
Simon Pilgrim	73bd5aa049	Fix or vs || typo. llvm-svn: 315903	2017-10-16 14:01:59 +00:00
Stefan Maksimovic	ee6b5a79dc	[mips] Provide alternate predicates for constant synthesis Ordering of patterns should not be of importance anymore since the predicates used are mutually exclusive now. llvm-svn: 315901	2017-10-16 13:18:21 +00:00
Hiroshi Inoue	a7eb78b47f	[PowerPC] fix up in sign-/zero-extension elimination This patch fixes a potential problem in my previous commit (https://reviews.llvm.org/rL315888) by adding a null check. llvm-svn: 315900	2017-10-16 12:11:15 +00:00
Andrew V. Tischenko	bfc9061593	This patch is a result of D37262: The issues with X86 prefixes. It closes PR7709, PR17697, PR19251, PR32809 and PR21640. There could be other bugs closed by this patch. llvm-svn: 315899	2017-10-16 11:14:29 +00:00
Daniel Sanders	01805b6747	[aarch64][globalisel] Fix a crash in selectAddrModeIndexed() caused by incorrect G_FRAME_INDEX handling The wrong operand was being rendered to the result instruction. The crash was detected by Bitcode/simd_ops/AArch64_halide_runtime.bc llvm-svn: 315890	2017-10-16 05:39:30 +00:00
Yonghong Song	6621cf67cf	bpf: fix bug on silently truncating 64-bit immediate We came across an llvm bug when compiling some testcases that 64-bit immediates are silently truncated into 32-bit and then packed into BPF_JMP | BPF_K encoding. This caused comparison with wrong value. This bug looks to be introduced by r308080. The Select_Ri pattern is supposed to be lowered into J_Ri while the latter only supports 32-bit immediate encoding, therefore Select_Ri should have a similar immediate predicate check to what J_Ri is doing. Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Yonghong Song <yhs@fb.com> llvm-svn: 315889	2017-10-16 04:14:53 +00:00
Hiroshi Inoue	e3a3e3c9e9	[PowerPC] Eliminate sign- and zero-extensions if already sign- or zero-extended This patch enables redundant sign- and zero-extension elimination in PowerPC MI Peephole pass. If the input value of a sign- or zero-extension is known to be already sign- or zero-extended, the operation is redundant and can be eliminated. One common case is sign-extensions for a method parameter or for a method return value; they must be sign- or zero-extended as defined in PPC ELF ABI. For example of the following simple code, two extsw instructions are generated before the invocation of int_func and before the return. With this patch, both extsw are eliminated. void int_func(int); void ii_test(int a) { if (a & 1) return int_func(a); } Such redundant sign- or zero-extensions are quite common in many programs; e.g. I observed about 60,000 occurrences of the elimination while compiling the LLVM+CLANG. Differential Revision: https://reviews.llvm.org/D31319 llvm-svn: 315888	2017-10-16 04:12:57 +00:00
Daniel Sanders	ea8711b88e	Re-commit r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook. At this point, we can import the simplest G_LOAD rules and select load instructions using them. Further patches will add support for the predicates to enable additional loads as well as the stores. The previous commit failed on MSVC due to a failure to convert an initializer_list to a std::vector. Hopefully, MSVC will accept this version. Depends on D37457 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D37458 llvm-svn: 315887	2017-10-16 03:36:29 +00:00
Daniel Sanders	ce72d611af	Revert r315885: [globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* MSVC doesn't like one of the constructors. llvm-svn: 315886	2017-10-16 02:15:39 +00:00
Daniel Sanders	6735ea86cd	[globalisel][tblgen] Add support for iPTR and implement am_unscaled* and am_indexed* Summary: iPTR is a pointer of subtarget-specific size to any address space. Therefore type checks on this size derive the SizeInBits from a subtarget hook. At this point, we can import the simplests G_LOAD rules and select load instructions using them. Further patches will support for the predicates to enable additional loads as well as the stores. Depends on D37457 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: kristof.beyls, javed.absar, llvm-commits, igorb Differential Revision: https://reviews.llvm.org/D37458 llvm-svn: 315885	2017-10-16 01:16:35 +00:00
Krzysztof Parzyszek	7467119149	[Hexagon] Add LLVM_ATTRIBUTE_UNUSED to operator<<, NFC This should silence "unused function" warnings. llvm-svn: 315883	2017-10-16 00:29:47 +00:00
Daniel Sanders	df39cbae2f	Re-commit r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator Summary: It's possible for a ComplexPattern to be used as an operator in a match pattern. This is used by the load/store patterns in AArch64 to name the suboperands returned by ComplexPattern predicate so that they can be broken apart and referenced independently in the result pattern. This patch adds support for this in order to enable the import of load/store patterns. Depends on D37445 Hopefully fixed the ambiguous constructor that a large number of bots reported. Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D37456 llvm-svn: 315869	2017-10-15 18:22:54 +00:00
Daniel Sanders	bb082a36d3	Revert r315863: [globalisel][tablegen] Import ComplexPattern when used as an operator A large number of bots are failing on an ambiguous constructor call. llvm-svn: 315866	2017-10-15 17:51:07 +00:00
Daniel Sanders	b95b867dd8	[globalisel][tablegen] Import ComplexPattern when used as an operator Summary: It's possible for a ComplexPattern to be used as an operator in a match pattern. This is used by the load/store patterns in AArch64 to name the suboperands returned by ComplexPattern predicate so that they can be broken apart and referenced independently in the result pattern. This patch adds support for this in order to enable the import of load/store patterns. Depends on D37445 Reviewers: ab, qcolombet, t.p.northover, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D37456 llvm-svn: 315863	2017-10-15 17:03:36 +00:00
Craig Topper	2738117326	[X86] Remove the SlowBTMem feature flag entirely Turns out we have no patterns on the instructions that were using this feature flag for other reasons. These instructions are slow on all modern CPUs so it seems unlikely that we will spend any effort supporting these instructions going forward. So we might as well just kill off the feature flag and just fix up the comments. llvm-svn: 315862	2017-10-15 16:57:33 +00:00
Craig Topper	a5af4a64d0	[AVX512] Don't mark EXTLOAD as legal with AVX512. Continue using custom lowering. Summary: This was impeding our ability to combine the extending shuffles with other shuffles as you can see from the test changes. There's one special case that needed to be added to use VZEXT directly for v8i8->v8i64 since the custom lowering requires v64i8. Reviewers: RKSimon, zvi, delena Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38714 llvm-svn: 315860	2017-10-15 16:41:17 +00:00
Craig Topper	a1f9c9dd8b	[X86] Add FeatureSlowBTMem to Haswell, Broadwell, Skylake, Cannonlake, and Knights Landing CPUs. Summary: I see nothing in Agner Fog's tables to indicate that this improved between Ivy Bridge and Haswell. It's also set for all Atom CPUs so I assume KNL should have it too. Reviewers: RKSimon, zvi, gadi.haber Reviewed By: gadi.haber Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38890 llvm-svn: 315859	2017-10-15 16:41:15 +00:00
Aaron Ballman	615eb47035	Reverting r315590; it did not include changes for llvm-tblgen, which is causing link errors for several people. Error LNK2019 unresolved external symbol "public: void __cdecl `anonymous namespace'::MatchableInfo::dump(void)const " (?dump@MatchableInfo@?A0xf4f1c304@@QEBAXXZ) referenced in function "public: void __cdecl `anonymous namespace'::AsmMatcherEmitter::run(class llvm::raw_ostream &)" (?run@AsmMatcherEmitter@?A0xf4f1c304@@QEAAXAEAVraw_ostream@llvm@@@Z) llvm-tblgen D:\llvm\2017\utils\TableGen\AsmMatcherEmitter.obj 1 llvm-svn: 315854	2017-10-15 14:32:27 +00:00
Amjad Aboud	c8d67979c0	[X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565 Differential Revision: https://reviews.llvm.org/D38359 llvm-svn: 315851	2017-10-15 11:00:56 +00:00
Craig Topper	a9cd59fb5d	[X86] Lower vselect with constant condition to vector_shuffle even with AVX512 instructions. Summary: It's better to use our shuffle lowering code to handle these than loading an immediate into a k-register. It really feels like this should be a DAG combine optimization rather than a lowering operation, but that's a problem for another day. Reviewers: RKSimon, delena, zvi Reviewed By: delena Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38932 llvm-svn: 315849	2017-10-15 06:39:07 +00:00
Vitaly Buka	7450398e01	Remove unused variables llvm-svn: 315847	2017-10-15 05:35:02 +00:00
Davide Italiano	76067588dc	[Hexagon] Mark RangeTree::dump() with LLVM_DUMP_METHOD. GCC otherwise emits a "defined but not used" warning on the member function. llvm-svn: 315838	2017-10-14 23:46:01 +00:00
Konstantin Zhuravlyov	8c18f5b3d4	AMDGPU: Don't use TargetStreamer if it has not been initialized Fixes cfe/trunk/test/Misc/backend-resource-limit-diagnostics.cl test after r315808 We may hit few other similar issues, but I want to discuss good solution offline. llvm-svn: 315830	2017-10-14 22:16:26 +00:00
Simon Pilgrim	36fe00ee17	[X86][SSE] Don't attempt to reduce the imul vector width of odd sized vectors (PR34947) llvm-svn: 315825	2017-10-14 19:57:19 +00:00
Bruno Cardoso Lopes	caac2fbd19	Revert "[AArch64][RegisterBankInfo] Use the statically computed mappings for COPY" This reverts commit r315781, breaks: http://green.lab.llvm.org/green/job/Compiler_Verifiers_GlobalISEL/9882 llvm-svn: 315823	2017-10-14 19:31:03 +00:00
Konstantin Zhuravlyov	a01d8b0b63	AMDGPU: Bring HSA metadata on par with the specification Differential Revision: https://reviews.llvm.org/D38753 llvm-svn: 315821	2017-10-14 19:03:51 +00:00
Simon Pilgrim	f5b9f353c3	Pull out repeated calls to VT.getVectorNumElements(). NFCI. llvm-svn: 315818	2017-10-14 17:37:42 +00:00
Simon Pilgrim	cded82837d	Use DAG::getBitcast() helper. NFCI. llvm-svn: 315815	2017-10-14 17:14:42 +00:00
Konstantin Zhuravlyov	219066bab8	AMDGPU: Improve note directive verification in assembler - Do not allow amd_amdgpu_isa directives on non-amdgcn architectures - Do not allow amd_amdgpu_hsa_metadata on non-amdhsa OSes - Do not allow amd_amdgpu_pal_metadata on non-amdpal OSes Differential Revision: https://reviews.llvm.org/D38750 llvm-svn: 315812	2017-10-14 16:15:28 +00:00
Konstantin Zhuravlyov	eda425edd4	AMDGPU: Do not emit deprecated notes for code object v3 Differential Revision: https://reviews.llvm.org/D38749 llvm-svn: 315810	2017-10-14 15:59:07 +00:00
Konstantin Zhuravlyov	9c05b2bc3b	AMDGPU: Add support for isa version note - Emit NT_AMD_AMDGPU_ISA - Add assembler parsing for isa version directive - If isa version directive does not match command line arguments, then return error Differential Revision: https://reviews.llvm.org/D38748 llvm-svn: 315808	2017-10-14 15:40:33 +00:00
Simon Pilgrim	f367c27d2d	[X86][SSE] Support combining AND(EXTRACT(SHUF(X)), C) -> EXTRACT(SHUF(X)) If we are applying a byte mask to a value extracted from a shuffle, see if we can combine the mask into shuffle. Fixes the last issue with PR22415 llvm-svn: 315807	2017-10-14 15:01:36 +00:00
Craig Topper	f7e777763d	[X86] Add patterns for vzmovl+cvtpd2dq/cvttpd2dq with a load. llvm-svn: 315802	2017-10-14 07:04:48 +00:00
Craig Topper	61010a85b8	[X86] Add AVX512 versions of VCVTPD2PS to load folding tables. llvm-svn: 315801	2017-10-14 05:55:43 +00:00
Craig Topper	ee277e190c	[X86] Add patterns for vzmovl+cvtpd2ps with a load. llvm-svn: 315800	2017-10-14 05:55:42 +00:00
Craig Topper	aec05a9303	[X86] Remove some patterns for bitcasted alignednonedtemporalloads. These select the same instruction as the non-bitcasted pattern. So this provides no additional value. llvm-svn: 315799	2017-10-14 04:18:11 +00:00
Craig Topper	009f0aaeb0	[X86] Remove unnecessary bitconverts as the root of patterns for zero extended VCVTPD2UDQZ128rr and VCVTTPD2UDQZ128rr. We don't need a bitconvert as a root pattern in these cases. The types in the other parts of the pattern are sufficient to express the behavior of these instructions. llvm-svn: 315798	2017-10-14 04:18:10 +00:00
Craig Topper	d746747d03	[X86] Add additional patterns for folding loads with 128-bit VCVTDQ2PD and VCVTUDQ2PD. This matches the patterns we have for the SSE/AVX version. This is a prerequisite for D38714. llvm-svn: 315797	2017-10-14 04:18:09 +00:00
Craig Topper	134241e4af	[X86] Add AVX512 flavors of VCVTDQ2PD plus VCVTUDQ2PD to the load folding tables. llvm-svn: 315796	2017-10-14 04:18:08 +00:00
Craig Topper	0b64e67b0d	[X86] Remove TB_NO_REVERSE from VCVTDQ2PDYrr and VCVTPS2PDYrr in the load folding tables. I believe these were added incorrectly under the belief that the load size was smaller than the input register size, but that's not true. llvm-svn: 315795	2017-10-14 04:18:07 +00:00
Craig Topper	53b0cb7fa9	[X86] Add an additional isel pattern to CVTDQ2PDrm/VCVTDQ2PDrm to enable load folding without the peephole pass. This pattern is already used in AVX512VL version of these instructions. Though AVX512VL version is missing other patterns. llvm-svn: 315794	2017-10-14 04:18:06 +00:00
Quentin Colombet	dc2da06c55	[AArch64][RegisterBankInfo] Use the statically computed mappings for COPY We use to resort on the generic implementation to get the mappings for COPYs. The generic implementation resorts on table lookup and dynamically allocated objects to get the valid mappings. Given we already know how to map G_BITCAST and have the static mappings for them, use that code path for COPY as well. This is much more efficient. Improve the compile time of RegBankSelect by up to 20%. Note: When we eventually generate all the mappings via TableGen, we wouldn't have to do that dance to shave compile time. The intent of this change was to make sure that moving to static structure really pays off. NFC. llvm-svn: 315781	2017-10-14 00:43:48 +00:00
Krzysztof Parzyszek	a7e5c84590	Revert r315763: "[Hexagon] Rangify some loops, NFC" Broke some builds (using libstdc++). llvm-svn: 315769	2017-10-13 21:57:11 +00:00
Craig Topper	f6c69564e7	[X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations. For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP. We may be able to use this for AVX1 as well which would allow us to remove more isel patterns. I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP. Differential Revision: https://reviews.llvm.org/D38836 llvm-svn: 315768	2017-10-13 21:56:48 +00:00
Krzysztof Parzyszek	63ca5d6196	[Hexagon] Rangify some loops, NFC llvm-svn: 315763	2017-10-13 21:43:00 +00:00
Daniel Sanders	11300cead8	[globalisel][tablegen] Add support for fpimm and import of APInt/APFloat based ImmLeaf. Summary: There's only a tablegen testcase for IntImmLeaf and not a CodeGen one because the relevant rules are rejected for other reasons at the moment. On AArch64, it's because there's an SDNodeXForm attached to the operand. On X86, it's because the rule either emits multiple instructions or has another predicate using PatFrag which cannot easily be supported at the same time. Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: aemerson, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D36569 llvm-svn: 315761	2017-10-13 21:28:03 +00:00
Matt Arsenault	e11d8aca77	AMDGPU: Implement hasBitPreservingFPLogic llvm-svn: 315754	2017-10-13 21:10:22 +00:00
Benjamin Kramer	9f21ca6361	[Hexagon] Avoid unused variable warnings in release builds. No functionality change intended. llvm-svn: 315749	2017-10-13 20:46:14 +00:00
Matt Arsenault	550c66d10f	AMDGPU: Look for src mods before fp_extend When selecting modifiers for mad_mix instructions, look at fneg/fabs that occur before the conversion. llvm-svn: 315748	2017-10-13 20:45:49 +00:00
Daniel Sanders	649c585710	[aarch64] Support APInt and APFloat in ImmLeaf subclasses and make AArch64 use them. Summary: The purpose of this patch is to expose more information about ImmLeaf-like PatLeaf's so that GlobalISel can learn to import them. Previously, ImmLeaf could only be used to test int64_t's produced by sign-extending an APInt. Other tests on immediates had to use the generic PatLeaf and extract the constant using C++. With this patch, tablegen will know how to generate predicates for APInt, and APFloat. This will allow it to 'do the right thing' for both SelectionDAG and GlobalISel which require different methods of extracting the immediate from the IR. This is NFC for SelectionDAG since the new code is equivalent to the previous code. It's also NFC for FastISel because FastIselShouldIgnore is 1 for the ImmLeaf subclasses. Enabling FastIselShouldIgnore == 0 for these new subclasses will require a significant re-factor of FastISel. For GlobalISel, it's currently NFC because the relevant code to import the affected rules is not yet present. This will be added in a later patch. Depends on D36086 Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar Reviewed By: qcolombet Subscribers: bjope, aemerson, rengolin, javed.absar, igorb, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D36534 llvm-svn: 315747	2017-10-13 20:42:18 +00:00
Matt Arsenault	4d70754e3c	AMDGPU: Implement isFPExtFoldable This helps match v_mad_mix* in some cases. llvm-svn: 315744	2017-10-13 20:18:59 +00:00
Matt Arsenault	f2db97d8fa	DAG: Add opcode and source type to isFPExtFree This is only currently used for mad/fma transforms. This is the only case where it should be used for AMDGPU, so add an opcode to be sure. llvm-svn: 315740	2017-10-13 19:55:45 +00:00
Krzysztof Parzyszek	7c9c05888c	[Hexagon] Minimize number of repeated constant extenders Each constant extender requires an extra instruction, which adds to the code size and also reduces the number of available slots in an instruction packet. In most cases, the value of a repeated constant extender could be loaded into a register, and the instructions using the extender could be replaced with their counterparts that use that register instead. This patch adds a pass that tries to reduce the number of constant extenders, including extenders which differ only in an immediate offset known at compile time, e.g. @global and @global+12. llvm-svn: 315735	2017-10-13 19:02:59 +00:00
Craig Topper	5d692917f4	[X86] Add initial skeleton support for knm cpu This adds Intel's Knights Mill CPU to valid CPU names for the backend. For now its an alias of "knl", but ultimately we need to support AVX5124FMAPS and AVX5124VNNIW instruction sets for it. Differential Revision: https://reviews.llvm.org/D38811 llvm-svn: 315722	2017-10-13 18:10:17 +00:00
Craig Topper	5805fb3dfc	[X86] Fix some inconsistent formatting in the processor feature lists. llvm-svn: 315696	2017-10-13 16:06:06 +00:00
Craig Topper	54541c4675	[X86] Add ProcIntelBDW to BroadwellProc class not BDWFeatures class. This isn't a property we want inherited. llvm-svn: 315695	2017-10-13 16:04:08 +00:00
Krzysztof Parzyszek	a0f2f7c413	[Hexagon] Add patterns for cmpb/cmph with immediate arguments Patch by Sumanth Gundapaneni. llvm-svn: 315692	2017-10-13 15:43:12 +00:00
Craig Topper	0817346aef	[X86] Stop creating CMOV nodes with a second MVT::Glue result Summary: We seem to inconsistently create CMOV nodes some with a Glue result and some without. But I can't find any cases that use the Glue result. So I've tried to remove all the place that did this. Reviewers: RKSimon, spatel, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38664 llvm-svn: 315686	2017-10-13 15:28:35 +00:00
Craig Topper	bf0de9d3b6	[X86] Remove patterns that select unmasked vbroadcastf2x32/vbroadcasti2x32. Prefer vbroadcastsd/vpbroadcastq instead. There's no advantage to using these instructions when they aren't masked. This enables some additional execution domain switching without needing to update the table. llvm-svn: 315674	2017-10-13 06:07:10 +00:00
Matthias Braun	bb8507e63c	Revert "TargetMachine: Merge TargetMachine and LLVMTargetMachine" Reverting to investigate layering effects of MCJIT not linking libCodeGen but using TargetMachine::getNameWithPrefix() breaking the lldb bots. This reverts commit r315633. llvm-svn: 315637	2017-10-12 22:57:28 +00:00
Matthias Braun	3a9c114b24	TargetMachine: Merge TargetMachine and LLVMTargetMachine Merge LLVMTargetMachine into TargetMachine. - There is no in-tree target anymore that just implements TargetMachine but not LLVMTargetMachine. - It should still be possible to stub out all the various functions in case a target does not want to use lib/CodeGen - This simplifies the code and avoids methods ending up in the wrong interface. Differential Revision: https://reviews.llvm.org/D38489 llvm-svn: 315633	2017-10-12 22:28:54 +00:00
Craig Topper	060cb43721	[X86] Add CLWB intrinsic. llvm part llvm-svn: 315613	2017-10-12 20:08:31 +00:00
Wei Ding	5676acad9e	Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. Differential Revision: http://reviews.llvm.org/D37348 llvm-svn: 315610	2017-10-12 19:37:14 +00:00
Konstantin Zhuravlyov	70303c011f	AMDGPU/NFC: Move AMDGPU specific note types to ELF.h Differential Revision: https://reviews.llvm.org/D38747 llvm-svn: 315608	2017-10-12 18:59:54 +00:00
Artem Belevich	3bafc2f0d9	[NVPTX] Implemented wmma intrinsics and instructions. WMMA = "Warp Level Matrix Multiply-Accumulate". These are the new instructions introduced in PTX6.0 and available on sm_70 GPUs. Differential Revision: https://reviews.llvm.org/D38645 llvm-svn: 315601	2017-10-12 18:27:55 +00:00
Reid Kleckner	1a7e387849	[codeview] Don't emit FPO data in funclet prologues Attempt 3 to work around bugs in FPO data with funclets. llvm-svn: 315600	2017-10-12 18:20:35 +00:00
Konstantin Zhuravlyov	63e87f5a02	AMDGPU: Fix warnings introduced in r315526 llvm-svn: 315596	2017-10-12 17:34:05 +00:00
Lei Huang	0724fea2da	[PowerPC] Add profitability check for conversion to mtctr loops Add profitability checks for modifying counted loops to use the mtctr instruction. The latency of mtctr is only justified if there are more than 4 comparisons that will be removed as a result. Usually counted loops are formed relatively early and before unrolling, so most low trip count loops often don't survive. However we want to ensure that if they do, we do not mistakenly update them to mtctr loops. Use CodeMetrics to ensure we are only doing this for small loops with small trip counts. Differential Revision: https://reviews.llvm.org/D38212 llvm-svn: 315592	2017-10-12 16:43:33 +00:00
Tim Renouf	c8ffffe462	[AMDGPU] For amdpal, widen interpolation mode workaround Summary: The interpolation mode workaround ensures that at least one interpolation mode is enabled in PSInputAddr. It does not also check PSInputEna on the basis that the user might enable bits in that depending on run-time state. However, for amdpal os type, the user does not enable some bits after compilation based on run-time states; the register values being generated here are the final ones set in the hardware. Therefore, apply the workaround to PSInputAddr and PSInputEnable together. (The case where a bit is set in PSInputAddr but not in PSInputEnable is where the frontend set up an input arg for a particular interpolation mode, but nothing uses that input arg. Really we should have an earlier pass that removes such an arg.) Reviewers: arsenm, nhaehnle, dstuttard Subscribers: kzhuravl, wdng, yaxunl, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D37758 llvm-svn: 315591	2017-10-12 16:16:41 +00:00
Don Hinton	3e0199f7eb	[dump] Remove NDEBUG from test to enable dump methods [NFC] Summary: Add LLVM_FORCE_ENABLE_DUMP cmake option, and use it along with LLVM_ENABLE_ASSERTIONS to set LLVM_ENABLE_DUMP. Remove NDEBUG and only use LLVM_ENABLE_DUMP to enable dump methods. Move definition of LLVM_ENABLE_DUMP from config.h to llvm-config.h so it'll be picked up by public headers. Differential Revision: https://reviews.llvm.org/D38406 llvm-svn: 315590	2017-10-12 16:16:06 +00:00
Sanjay Patel	3a72909b7e	[x86] replace isEqualTo with == for efficiency This is a follow-up suggested in D37534. Patch by Yulia Koval. llvm-svn: 315589	2017-10-12 16:15:38 +00:00
Simon Pilgrim	0903085ec3	[X86][SSE] Pull out repeated INSERT_VECTOR_ELT code from LowerBUILD_VECTOR v16i8/v8i16 insertion. NFCI. llvm-svn: 315587	2017-10-12 15:52:01 +00:00
Reid Kleckner	d925f98375	Speculative build fix 2 llvm-svn: 315542	2017-10-12 00:28:28 +00:00
Wei Mi	1736efd16a	Revert r307036 because of PR34919. llvm-svn: 315540	2017-10-12 00:24:52 +00:00
Reid Kleckner	9c0126ec0b	Speculative build fix, apparently I built llc without my patch applied to test it llvm-svn: 315539	2017-10-12 00:20:50 +00:00
Reid Kleckner	29cfa6f11f	[codeview] Disable FPO in functions using EH funclets Funclets are emitted by WinException which doesn't have access to X86TargetStreamer so it's hard to make a quick fix for this. llvm-svn: 315538	2017-10-12 00:06:57 +00:00

45037 Commits