llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	28f97f1dbc	AMDGPU: Don't hardcode num defs for MUBUF instructions This shouldn't change anything since the no-ret atomics are selected later. llvm-svn: 357084	2019-03-27 16:12:29 +00:00
Matt Arsenault	e9ad7e9a71	AMDGPU: wave_barrier is not isBarrier This is not a control flow instruction, so should not be marked as isBarrier. This fixes a verifier error if followed by unreachable. llvm-svn: 357081	2019-03-27 15:54:45 +00:00
Yonghong Song	6c56edfe42	[BPF] use std::map to ensure consistent output The .BTF.ext FuncInfoTable and LineInfoTable contain information organized per ELF section. Current definition of FuncInfoTable/LineInfoTable is: std::unordered_map<uint32_t, std::vector<BTFFuncInfo>> FuncInfoTable std::unordered_map<uint32_t, std::vector<BTFLineInfo>> LineInfoTable where the key is the section name off in the string table. The unordered_map may cause the order of section output different for different platforms. The same for unordered map definition of std::unordered_map<std::string, std::unique_ptr<BTFKindDataSec>> DataSecEntries where BTF_KIND_DATASEC entries may have different ordering for different platforms. This patch fixed the issue by using std::map. Test static-var-derived-type.ll is modified to generate two DataSec's which will ensure the ordering is the same for all supported platforms. Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 357077	2019-03-27 15:45:27 +00:00
Matt Arsenault	bbc59d8d0d	AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics The offset operand index is different for atomics. llvm-svn: 357073	2019-03-27 15:41:00 +00:00
Dmitry Preobrazhensky	40f0162a9a	Revert of 357063 [AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes Reason: the change was mistakenly committed before review llvm-svn: 357066	2019-03-27 13:49:52 +00:00
Sander de Smalen	90d1b551e1	[AArch64] NFC: Cleanup isAArch64FrameOffsetLegal Cleanup isAArch64FrameOffsetLegal by: - Merging the large switch statement to reuse AArch64InstrInfo::getMemOpInfo(). - Using AArch64InstrInfo::getUnscaledLdSt() to determine whether an instruction has an unscaled variant. - Simplifying the logic that calculates the offset to fit the immediate. Reviewers: paquette, evandro, eli.friedman, efriedma Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D59636 llvm-svn: 357064	2019-03-27 13:16:19 +00:00
Dmitry Preobrazhensky	bcc4d53835	[AMDGPU][MC] Corrected handling of tied src for atomic return MUBUF opcodes See bug 40917: https://bugs.llvm.org/show_bug.cgi?id=40917 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D59305 llvm-svn: 357063	2019-03-27 13:07:41 +00:00
Sander de Smalen	46edefe3c4	[AArch64] Adds cases for LDRSHWui and LDRSHXui to getMemOpInfo This patch also adds cases PRFUMi and PRFMui. This change was discussed in https://reviews.llvm.org/D59635. llvm-svn: 357059	2019-03-27 10:39:03 +00:00
Simon Pilgrim	ccb71b2985	Revert rL356864 : [X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. ........ Causes PR41249 llvm-svn: 357057	2019-03-27 10:25:02 +00:00
Craig Topper	7da7b97487	[X86] When iselling (x << C1) and/or/xor C2 as (x and/or/xor (C2>>C1)) << C1, go through the isel table instead of manually selecting. Previously we manually selected the AND/OR/XOR with immediate and the SHL(or ADD if the shift is 1). But this was missing out on the opportunity to use a 64 bit AND with a 32-bit immediate and possibly other isel tricks we have built into the tables. Instead, insert the new nodes into the DAG using insertDAGNode and allow them each to be selected through the normal table. llvm-svn: 357049	2019-03-27 04:45:58 +00:00
QingShan Zhang	5321dcd608	[NFC][PowerPC] Custom PowerPC specific machine-scheduler This patch lays the groundwork for extending the generic machine scheduler by providing a PPC-specific implementation. There are no functional changes as this is an incremental patch that simply provides the necessary overrides which just encapsulate the behavior of the generic scheduler. Subsequent patches will add specific behavior. Differential Revision: https://reviews.llvm.org/D59284 llvm-svn: 357047	2019-03-27 03:50:16 +00:00
Craig Topper	22387a56fe	[X86] Simplify some code in matchBitExtract by using ANY_EXTEND. We were manually outputting the code we would get from selecting ANY_EXTEND. We can save some code by just letting an ANY_EXTEND go through isel on its own. llvm-svn: 357045	2019-03-27 02:08:03 +00:00
Guozhi Wei	330dcd9dab	[PPC] Refactor PPCBranchSelector.cpp This patch splits the huge function PPCBranchSelector.cpp:runOnMachineFunction into several smaller functions. No functional change. Differential Revision: https://reviews.llvm.org/D59623 llvm-svn: 357033	2019-03-26 21:27:38 +00:00
Stefan Pintilie	e1d79a87c6	[PowerPC] Remove UseVSXReg The UseVSXReg flag can be safely removed and the code cleaned up. Patch By: Yi-Hong Liu Differential Revision: https://reviews.llvm.org/D58685 llvm-svn: 357028	2019-03-26 20:28:21 +00:00
Sam Clegg	492f752969	[WebAssembly] Initial implementation of PIC code generation This change implements lowering of references global symbols in PIC mode. This change implements lowering of global references in PIC mode using a new @GOT reference type. @GOT references can be used with function or data symbol names combined with the get_global instruction. In this case the linker will insert the wasm global that stores the address of the symbol (either in memory for data symbols or in the wasm table for function symbols). For now I'm continuing to use the R_WASM_GLOBAL_INDEX_LEB relocation type for this type of reference which means that this relocation type can refer to either a global or a function or data symbol. We could choose to introduce specific relocation types for GOT entries in the future. See the current dynamic linking proposal: https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md Differential Revision: https://reviews.llvm.org/D54647 llvm-svn: 357022	2019-03-26 19:46:15 +00:00
Heejin Ahn	54551c1df7	[WebAssembly] Don't analyze branches after CFGStackify Summary: `WebAssembly::analyzeBranch` now does not analyze anything if the function is CFG stackified. We were previously doing similar things by checking if a branch's operand is whether an integer or an MBB, but this failed to bail out when a BB did not have any terminators. Consider this case: ``` bb0: try $label0 call @foo // unwinds to %ehpad bb1: ... br $label0 // jumps to %cont. can be deleted ehpad: catch ... cont: end_try ``` Here `br $label0` will be deleted in CFGStackify's `removeUnnecessaryInstrs` function, because we jump to the %cont block even without the branch. But in this case, MachineVerifier fails to verify this, because `ehpad` is not a successor of `bb1` even if `bb1` does not have any terminators. MachineVerifier incorrectly thinks `bb1` falls through to the next block. This pass now consistently rejects all analysis after CFGStackify whether a BB has terminators or not, also making the MachineVerifier work. (MachineVerifier does not try to verify relationships between BBs if `analyzeBranch` fails, the behavior we want after CFGStackify.) This also adds a new option `-wasm-disable-ehpad-sort` for testing. This option helps create the sorted order we want to test, and without the fix in this patch, the tests in cfg-stackify-eh.ll fail at MachineVerifier with `-wasm-disable-ehpad-sort`. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59740 llvm-svn: 357015	2019-03-26 18:21:20 +00:00
Heejin Ahn	1aaa481fc1	[WebAssembly] Add CFGStacikfied field to WebAssemblyFunctionInfo Summary: This adds `CFGStackified` field and its serialization to WebAssemblyFunctionInfo. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59747 llvm-svn: 357011	2019-03-26 17:46:14 +00:00
Heejin Ahn	52221d56bc	[WebAssembly] Support WebAssemblyFunctionInfo serialization Summary: The framework for supporting target-specific MachineFunctionInfo was added in r356215. This adds serialization support for WebAssemblyFunctionInfo on top of that. This patch only adds the framework and does not actually serialize anything at this point; we have to add YAML mapping later for the fields in WebAssemblyFunctionInfo we want to serialize if necessary. Reviewers: dschuff, arsenm Subscribers: sunfish, wdng, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59737 llvm-svn: 357009	2019-03-26 17:35:35 +00:00
Heejin Ahn	222718fdd2	[WebAssembly] Fix a bug when mixing TRY/LOOP markers Summary: When TRY and LOOP markers are in the same BB and END_TRY and END_LOOP markers are in the same BB, END_TRY should be _before_ END_LOOP, because LOOP is always before TRY if they are in the same BB. (TRY is placed in the latest possible position, whereas LOOP is in the earliest possible position.) Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59751 llvm-svn: 357008	2019-03-26 17:29:55 +00:00
Heejin Ahn	44a5a4b107	[WebAssembly] Fix bugs in BLOCK/TRY placement Summary: Before we placed all TRY/END_TRY markers before placing BLOCK/END_BLOCK markers. This couldn't handle this case: ``` bb0: br bb2 bb1: // nearest common dominator of bb3 and bb4 br_if ... bb3 br bb4 bb2: ... bb3: call @foo // unwinds to ehpad bb4: call @bar // unwinds to ehpad ehpad: catch ... ``` When we placed TRY markers, we placed it in bb1 because it is the nearest common dominator of bb3 and bb4. But because bb0 jumps to bb2, when we placed block markers, we ended up with interleaved scopes like ``` block try end_block catch end_try ``` which was not correct. This patch fixes the bug by placing BLOCK and TRY markers in one pass while iterating BBs in a function. This also adds some more routines to `placeTryMarkers`, because we now have to assume that there can be previously placed BLOCK and END_BLOCK. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59739 llvm-svn: 357007	2019-03-26 17:15:55 +00:00
Jonas Paulsson	8f8c38174e	[SystemZ] Remove LRMux pseudo instruction. This instruction is unused and not needed. Review: Ulrich Weigand. llvm-svn: 356997	2019-03-26 15:13:48 +00:00
Luis Marques	614fd9d830	[RISCV] Improve codegen for icmp {ne,eq} with a constant Adds two patterns to improve the codegen of GPR value comparisons with small constants. Instead of first loading the constant into another register and then doing an XOR of those registers, these patterns directly use the constant as an XORI immediate. llvm-svn: 356990	2019-03-26 12:55:00 +00:00
Oliver Stannard	5c90238479	[ARM][Asm] Accept upper case coprocessor number and registers Differential revision: https://reviews.llvm.org/D59760 llvm-svn: 356984	2019-03-26 10:24:03 +00:00
Craig Topper	4dcabf8ddf	[X86] In matchBitExtract, place all of the new nodes before Node's position in the DAG for the topological sort. We were using OrigNBits, but that put all the nodes before the node we used to start the control computation. This caused some node earlier than the sequence we inserted to be selected before the sequence we created. We want our new sequence to be selected first since it depends on OrigNBits. I don't have a test case. Found by reviewing the code. llvm-svn: 356979	2019-03-26 05:31:32 +00:00
Craig Topper	10576fea82	[X86] In matchBitExtract, if we need to truncate the BEXTR make sure we put the BEXTR at Node's position in the DAG for the topological sort. We were using OrigNBits, but that doesn't guarantee that it will be selected before the nodes that make up X. llvm-svn: 356978	2019-03-26 05:12:23 +00:00
Craig Topper	795ebe3bff	[X86] Remove unneeded FIXME. NFC We do fold loads right below this. llvm-svn: 356977	2019-03-26 05:12:21 +00:00
Craig Topper	fd880d30b1	X86Parser: Fix potential reference to deleted object Within the MatchFPUWaitAlias function, Operands[0] is potentially overwritten leading to &Op referencing a deleted object. To fix this, assign the reference after the function. Differential Revision: https://reviews.llvm.org/D57376 llvm-svn: 356973	2019-03-26 03:12:43 +00:00
Craig Topper	3dce29b8e9	X86AsmParser: Do not process a non-existent token This error can only happen if an unfinished operation is at Eof. Patch by Brandon Jones Differential Revision: https://reviews.llvm.org/D57379 llvm-svn: 356972	2019-03-26 03:12:41 +00:00
Eli Friedman	1e5d569c8c	[ARM] Add missing memory operands to a bunch of instructions. This should hopefully lead to minor improvements in code generation, and more accurate spill/reload comments in assembly. Also fix isLoadFromStackSlotPostFE/isStoreToStackSlotPostFE so they don't lead to misleading assembly comments for merged memory operands; this is technically orthogonal, but in practice the relevant memory operand lists don't show up without this change. Differential Revision: https://reviews.llvm.org/D59713 llvm-svn: 356963	2019-03-25 22:42:30 +00:00
Matt Arsenault	8bbc159786	Revert "AMDGPU: Scavenge register instead of findUnusedReg" This reverts r356149. This is crashing on rocBLAS. llvm-svn: 356958	2019-03-25 21:41:40 +00:00
Matt Arsenault	77bf2e3704	AMDGPU: Remove unnecessary check for isFullCopy Subregister indexes are not used for physical register operands, so isFullCopy is implied by the physical register check. llvm-svn: 356956	2019-03-25 21:28:53 +00:00
Eli Friedman	92d0d13366	[AArch64] Prefer "mov" over "orr" to materialize constants. This is generally more readable due to the way the assembler aliases work. (This causes a lot of test changes, but it's not really as scary as it looks at first glance; it's just mechanically changing a bunch of checks for orr to check for mov instead.) Differential Revision: https://reviews.llvm.org/D59720 llvm-svn: 356954	2019-03-25 21:25:28 +00:00
Matt Arsenault	bc978872de	AMDGPU: Set hasSideEffects 0 on _term instructions These were defaulting to true, but they are just wrappers around bit operations. This avoids regressions in the exec mask optimization passes in a future commit. llvm-svn: 356952	2019-03-25 21:10:12 +00:00
Konstantin Zhuravlyov	51809cbc98	AMDGPU: Add support for cross address space synchronization scopes Differential Revision: https://reviews.llvm.org/D59517 llvm-svn: 356946	2019-03-25 20:50:21 +00:00
Matt Arsenault	fa28455116	AMDGPU: Preserve LiveIntervals in WQM This seems to already be done, but wasn't marked. llvm-svn: 356922	2019-03-25 16:47:42 +00:00
Petar Avramovic	a034a64f84	[MIPS GlobalISel] Select copy for arguments from FPRBRegBank Move selectCopy into MipsInstructionSelector class. Select copy for arguments from FPRBRegBank for MIPS32. Differential Revision: https://reviews.llvm.org/D59644 llvm-svn: 356886	2019-03-25 11:38:06 +00:00
Petar Avramovic	3dfa368d5d	[MIPS GlobalISel] Add floating point register bank Add floating point register bank for MIPS32. Implement getRegBankFromRegClass for float register classes. Differential Revision: https://reviews.llvm.org/D59643 llvm-svn: 356883	2019-03-25 11:30:46 +00:00
Petar Avramovic	5a457e08f6	[MIPS GlobalISel] Lower float and double arguments in registers Lower float and double arguments in registers for MIPS32. When float/double argument is passed through gpr registers select appropriate move instruction. Differential Revision: https://reviews.llvm.org/D59642 llvm-svn: 356882	2019-03-25 11:23:41 +00:00
Diana Picus	254b11a0fd	[ARM GlobalISel] 64-bit memops should be aligned We currently use only VLDR/VSTR for all 64-bit loads/stores, so the memory operands must be word-aligned. Mark aligned operations as legal and narrow non-aligned ones to 32 bits. While we're here, also mark non-power-of-2 loads/stores as unsupported. llvm-svn: 356872	2019-03-25 08:54:29 +00:00
Craig Topper	a17287f084	[X86] Update some of the getMachineNode calls from X86ISelDAGToDAG to also include a VT for a EFLAGS result. This makes the nodes consistent with how they would be emitted from the isel table. llvm-svn: 356870	2019-03-25 07:22:18 +00:00
Craig Topper	1cc01c3228	[X86] When selecting (x << C1) op C2 as (x op (C2>>C1)) << C1, use the operation VT for the target constant. Normally when the nodes we use here(AND32ri8 for example) are selected their immediates are just converted from ConstantSDNode to TargetConstantSDNode without changing VT from the original operation VT. So we should still be emitting them with the operation VT. Theoretically this could expose more accurate opportunities for CSE. llvm-svn: 356869	2019-03-25 06:53:45 +00:00
Craig Topper	3810e35d3f	[X86] Remove GetLo8XForm and use GetLo32XForm instead. NFCI We were using this to create an AND32ri8 node from a 64-bit and, but that node normally still uses a 32-bit immediate. So we should just truncate the existing immediate to i32. We already verified it has the same value in bits 31:7. llvm-svn: 356868	2019-03-25 06:53:44 +00:00
Craig Topper	5b43446831	[X86] Remove a couple unused SDNodeXForms. NFC llvm-svn: 356867	2019-03-25 06:53:43 +00:00
Craig Topper	7c2554dd92	Revert r356688 "[X86] Don't avoid folding multiple use sign extended 8-bit immediate into instructions under optsize." Looking back over how the one use optimization works, I don't think this is the right way to fix this. llvm-svn: 356866	2019-03-25 01:25:32 +00:00
Simon Pilgrim	87d4ab8b92	[X86][SSE41] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Enable SSE41 ZERO_EXTEND_VECTOR_INREG shuffle combines - for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern we reduce the shuffles (port5-bottleneck on Intel) at the expense of creating a zero (pxor v,v) and an extra register move - which is a good trade off as these are pretty cheap and in most cases it doesn't increase register pressure. This also exposed a missed opportunity to use combine to ZERO_EXTEND_VECTOR_INREG with folded loads - even if we're in the float domain. llvm-svn: 356864	2019-03-24 19:06:35 +00:00
Heejin Ahn	803c7782d5	[WebAssembly] Rename a variable in CFGSort (NFC) Class `RegionInfo` was `SortUnitInfo` before, so the variables were named `SUI`. Now the class name is `RegionInfo`, so this renames `SUI` to `RI`, matching the class name. llvm-svn: 356861	2019-03-24 17:34:40 +00:00
Simon Pilgrim	a71c0ed471	[X86][AVX] Start shuffle combining from ZERO_EXTEND_VECTOR_INREG (PR40685) Just enable this for AVX for now as SSE41 introduces extra register moves for the PMOVZX(PSHUFD(V)) -> UNPCKH(V,0) pattern (but otherwise helps reduce port5 usage on Intel targets). Only AVX support is required for PR40685 as the issue is due to 8i8->8i32 zext shuffle leftovers. llvm-svn: 356858	2019-03-24 16:30:35 +00:00
Sanjay Patel	7d676dfd86	[x86] improve the default expansion of uaddsat/usubsat This is yet another step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613 uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y usubsat X, Y --> (X >u Y) ? X - Y : 0 We can't count on a sane vector ISA, so override the default (umin/umax) expansion of unsigned add/sub saturate in cases where we do not have umin/umax. Differential Revision: https://reviews.llvm.org/D59006 llvm-svn: 356855	2019-03-24 13:55:54 +00:00
Sanjay Patel	2e92846d36	[x86] reduce code duplication; NFC llvm-svn: 356836	2019-03-23 15:00:52 +00:00
Eli Friedman	b906bba576	[ARM] Don't form "ands" when it isn't scheduled correctly. In r322972/r323136, the iteration here was changed to catch cases at the beginning of a basic block... but we accidentally deleted an important safety check. Restore that check to the way it was. Fixes https://bugs.llvm.org/show_bug.cgi?id=41116 Differential Revision: https://reviews.llvm.org/D59680 llvm-svn: 356809	2019-03-22 20:49:15 +00:00

51382 Commits
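
The uaddsat/usubsat entry above (r356855) quotes two expansions: `uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y` and `usubsat X, Y --> (X >u Y) ? X - Y : 0`. The following is a minimal scalar sketch of those two formulas, not the code from the commit itself, which operates on vector DAG nodes. The function names `uaddsat8` and `usubsat8` and the 8-bit width are assumptions chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: unsigned saturating add, following
//   uaddsat X, Y --> (X >u (X + Y)) ? -1 : X + Y
// The wrapped sum is smaller than X exactly when the addition overflowed.
inline std::uint8_t uaddsat8(std::uint8_t x, std::uint8_t y) {
    const std::uint8_t sum = static_cast<std::uint8_t>(x + y);  // wraps modulo 256
    return x > sum ? UINT8_MAX : sum;                           // -1 is all-ones
}

// Hypothetical helper: unsigned saturating subtract, following
//   usubsat X, Y --> (X >u Y) ? X - Y : 0
inline std::uint8_t usubsat8(std::uint8_t x, std::uint8_t y) {
    return x > y ? static_cast<std::uint8_t>(x - y) : std::uint8_t{0};
}
```

Both forms need only an unsigned compare and a select, which matches the commit's stated motivation of covering targets that lack umin/umax.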