llvm-project

Commit Graph

Author	SHA1	Message	Date
Amara Emerson	7bc4fad0fb	[AArch64][GlobalISel] Implement narrowing of G_SEXT. We need this to narrow a sext to s128. Differential Revision: https://reviews.llvm.org/D65357 llvm-svn: 367164	2019-07-26 23:46:38 +00:00
Jessica Paquette	aa8b9993c2	[AArch64][GlobalISel] Select @llvm.aarch64.stlxr for 32-bit pointers Add partial instruction selection for intrinsics like this: ``` declare i32 @llvm.aarch64.stlxr(i64, i32*) ``` (This only handles the case where a G_ZEXT is feeding the intrinsic.) Also make sure that the added store instruction actually has the memory op from the original G_STORE. Update select-stlxr-intrin.mir and arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D65355 llvm-svn: 367163	2019-07-26 23:28:53 +00:00
Vlad Tsyrklevich	485b8789de	Revert "[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler." This reverts r367100, it appears to be causing test failures after Nico's revert of r367091. llvm-svn: 367141	2019-07-26 18:14:21 +00:00
Sean Fertile	9df6177d38	[PowerPC][AIX]Add lowering of MCSymbol MachineOperand. Adds machine operand lowering for MCSymbolSDNodes to the PowerPC backend. This is needed to produce call instructions in assembly for AIX because the callee operand is a MCSymbolSDNode. The test is XFAIL'ed for asserts due to a (valid) assertion in PEI that the AIX ABI isn't supported yet. Differential Revision: https://reviews.llvm.org/D63738 llvm-svn: 367133	2019-07-26 17:25:27 +00:00
Michael Liao	711556e6a8	[AMDGPU] Fix typo. llvm-svn: 367131	2019-07-26 17:13:59 +00:00
Cullen Rhodes	2cde8b5db6	[AArch64][SVE2] Rename bitperm feature to sve2-bitperm Summary: The bitperm feature flag is now prefixed with SVE2, as it is for all other SVE2 extensions Patch by Maciej Gabka. Reviewers: sdesmalen, rovka, chill, SjoerdMeijer, rengolin Reviewed By: SjoerdMeijer, rengolin Differential Revision: https://reviews.llvm.org/D65327 llvm-svn: 367124	2019-07-26 15:57:50 +00:00
Sam Parker	3da59e5513	[ARM][ParallelDSP] Combine structs Combine OpChain and BinOpChain structs as OpChain is a base class to BinOpChain that is never used. llvm-svn: 367114	2019-07-26 14:11:40 +00:00
Sean Fertile	9bd22fec0d	[PowerPC] Add getCRSaveOffset to improve readability. [NFC] In preparation for AIX support in FrameLowering: replace a number of literal '8' that represent the stack offset of the condition register save area with a member in PPCFrameLowering. Patch by Chris Bowler. llvm-svn: 367111	2019-07-26 14:02:17 +00:00
Petar Avramovic	cf21794566	[MIPS GlobalISel] Fix check for void return during lowerCall Void return used to have unsigned with value 0 for virtual register but with addition of Register class and changes to arguments to lowerCall this is no longer valid. Check for void return by inspecting the Ty field in OrigRet. Differential Revision: https://reviews.llvm.org/D65321 llvm-svn: 367107	2019-07-26 13:19:37 +00:00
Carl Ritson	0b28357053	[AMDGPU] Move WQM/WWM intrinsic instruction selection to AMDGPUISelDAGToDAG Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65328 llvm-svn: 367105	2019-07-26 13:11:44 +00:00
Petar Avramovic	b1fc6f6130	[MIPS GlobalISel] Select inttoptr and ptrtoint Select G_INTTOPTR and G_PTRTOINT for MIPS32. Differential Revision: https://reviews.llvm.org/D65217 llvm-svn: 367104	2019-07-26 13:08:06 +00:00
Simon Pilgrim	d93e8ece7b	[X86][SSE] Replace PMULDQ GetDemandedBits combine with SimplifyMultipleUseDemandedBits handler. This removes a GetDemandedBits user and allows us to benefit from the DemandedElts propagated through SimplifyDemandedBits. llvm-svn: 367100	2019-07-26 11:10:20 +00:00
Sam Parker	7440065bd8	[NFC][ARM][ParallelDSP] Cleanup isNarrowSequence Remove unused logic. llvm-svn: 367099	2019-07-26 10:57:42 +00:00
Carl Ritson	00e89b428b	[AMDGPU] Add llvm.amdgcn.softwqm intrinsic Add llvm.amdgcn.softwqm intrinsic which behaves like llvm.amdgcn.wqm only if there is other WQM computation in the shader. Reviewers: nhaehnle, tpr Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64935 llvm-svn: 367097	2019-07-26 09:54:12 +00:00
Momchil Velikov	898d953693	[AArch64] Define ETE and TRBE system registers Embedded Trace Extension and Trace Buffer Extension are optional future architecture extensions. (cf. https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools) Their system registers are documented here: https://developer.arm.com/docs/ddi0601/a ETE shares register names with ETM. One exception is the ETE TRCEXTINSELR0 register, which has the same encoding as the ETM TRCEXTINSELR register (but different semantics). This patch treats them as aliases: the assembler will accept both names, emitting identical encoding, and the disassembler will keep disassembling to TRCEXTINSELR. Differential Revision: https://reviews.llvm.org/D63707 llvm-svn: 367093	2019-07-26 09:19:08 +00:00
Sam Parker	c760b5da11	[ARM][LowOverheadLoops] Add CPSR defs Both WhileLoopStart and LoopEnd may get turned into a cmp and br pair, so add an implicit def to these pseudo instructions in case that WLS and LE aren't generated. Differential Revision: https://reviews.llvm.org/D65275 llvm-svn: 367089	2019-07-26 08:15:01 +00:00
Pengfei Wang	9ad565f70e	[WinEH] Allocate space in funclets stack to save XMM CSRs Summary: This is an alternate approach to D57970. Currently funclets reuse the same stack slots that are used in the parent function for saving callee-saved xmm registers. If the parent function modifies a callee-saved xmm register before an exception is thrown, the catch handler will overwrite the original saved value. This patch allocates space in funclets stack for saving callee-saved xmm registers and uses RSP instead of RBP to access memory. Reviewers: andrew.w.kaylor, LuoYuanke, annita.zhang, craig.topper, RKSimon Subscribers: rnk, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63396 Signed-off-by: pengfei <pengfei.wang@intel.com> llvm-svn: 367088	2019-07-26 07:33:15 +00:00
Matt Arsenault	a9ea8a9aae	AMDGPU/GlobalISel: Handle most function return types handleAssignments gives up pretty easily on structs, and i8 values for some reason. The other case that doesn't work is when an implicit sret needs to be inserted if the return size exceeds the number of return registers. llvm-svn: 367082	2019-07-26 02:36:05 +00:00
Amara Emerson	c07fe307b4	[AArch64][GlobalISel] Simplify zext/sext selection, use MachineIRBuilder. NFC. llvm-svn: 367075	2019-07-26 00:01:09 +00:00
Yonghong Song	329abf2939	[BPF] fix typedef issue for offset relocation Currently, the CO-RE offset relocation does not work if any struct/union member or array element is a typedef. For example, typedef const int arr_t[7]; struct input { arr_t a; }; func(...) { struct input *in = ...; ... __builtin_preserve_access_index(&in->a[1]) ... } The BPF backend calculated default offset is 0 while 4 is the correct answer. Similar issues exist for struct/union typedef's. When getting struct/union member or array element type, we should trace down to the type by skipping typedef and qualifiers const/volatile as this is what clang did to generate getelementptr instructions. (const/volatile member type qualifiers are already ignored by clang.) This patch fixed this issue, for each access index, skipping typedef and const/volatile/restrict BTF types. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D65259 llvm-svn: 367062	2019-07-25 21:47:27 +00:00
Amara Emerson	e54dc6b8b5	[AArch64][GlobalISel] Fix G_SELECT legalization fallback after r366943. Changes the order of legalization of G_ICMP suggested by Petar in D65079. llvm-svn: 367060	2019-07-25 21:44:52 +00:00
Yonghong Song	d8efec97be	[BPF] fix CO-RE incorrect index access string Currently, we expect the CO-RE offset relocation records a string encoding the original getelementptr access index, so kernel bpf loader can decode it correctly. For example, struct s { int a; int b; }; struct t { int c; int d; }; #define _(x) (__builtin_preserve_access_index(x)) int get_value(const void *addr1, const void *addr2); int test(struct s *arg1, struct t *arg2) { return get_value(_(&arg1->b), _(&arg2->d)); } We expect two offset relocations: reloc 1: type s, access index 0, 1 reloc 2: type t, access index 0, 1 Two globals are created to retain access indexes for the above two relocations with global variable names. The first global has a name "0:1:". Unfortunately, the second global has the name "0:1:.1" as the llvm internals automatically add suffix ".1" to a global with the same name. Later on, the BPF backend peels the last character and records "0:1" and "0:1:." in the relocation table. This is not desirable. BPF backend could use the global variable suffix knowledge to generate correct access str. This patch rather took an approach not relying on that knowledge. It generates "s:0:1:" and "t:0:1:" to avoid global variable suffixes and later on generate correct index access string "0:1" for both records. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D65258 llvm-svn: 367030	2019-07-25 16:01:26 +00:00
Michael Liao	53f967f2bd	[AMDGPU] Run `unreachable-mbb-elimination` after isel to clean up PHIs. Summary: - As LCSSA is turned on just before isel, it may create PHI of the flow, which is consumed by pseudo structurized CFG instructions. When those PHIs are eliminated in O0, COPY may be placed wrongly as these pseudo structurized CFG instructions are considered part of the prologue of the MBB. - Run extra `unreachable-mbb-elimination` at the end of isel to clean up PHIs. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64353 llvm-svn: 367023	2019-07-25 14:50:18 +00:00
Momchil Velikov	a655f476b0	[AArch64][SVE] Allow explicit size specifier for predicate operand ... for the vector forms of `{SQ,UQ,}{INC,DEC}P` instructions. Also continue supporting the existing behaviour of not requiring an explicit size specifier. The preferred disassembly is with the specifier. This is implemented by redefining instruction forms to require vector predicates with explicit size and adding aliases, which allow a predicate with no size. Differential Revision: https://reviews.llvm.org/D65145 llvm-svn: 367019	2019-07-25 13:56:04 +00:00
Matt Arsenault	a85af76c72	AMDGPU: Don't assert on v4f16 arguments to shader calling conventions llvm-svn: 367018	2019-07-25 13:55:07 +00:00
Simon Pilgrim	447fe31964	[X86] concatSubVectors - remove unnecessary args. NFCI. All these args can be cheaply recomputed and it makes it much easier to use the function as a quick helper. llvm-svn: 367014	2019-07-25 13:05:46 +00:00
Pablo Barrio	275954539d	[ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1 Summary: Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1. Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the Arm architecture. Neoverse N1 implements both AArch32 and AArch64. Cortex-A65: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65 Cortex-A65AE: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae Neoverse E1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1 Neoverse N1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1 Patch by Diogo Sampaio and Pablo Barrio Reviewers: samparker, LukeCheeseman, sbaranga, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64406 llvm-svn: 367007	2019-07-25 10:59:45 +00:00
Kai Luo	985e52a4c1	[PowerPC][NFC] Make `getDefMIPostRA` public llvm-svn: 366995	2019-07-25 08:36:44 +00:00
Kai Luo	5c8af53806	[PowerPC][NFC] Added `getDefMIPostRA` method Summary: In PostRA phase, we often have to find out the most recent definition of a register. This patch adds getDefMIPostRA so that other methods can use it rather than implementing it repeatedly. Differential Revision: https://reviews.llvm.org/D65131 llvm-svn: 366990	2019-07-25 07:47:52 +00:00
Seiya Nuta	21277e3ec2	[MC] Add MCInstrAnalysis::evaluateMemoryOperandAddress Summary: Add a new method which tries to compute the target address referenced by an operand. This patch supports x86_64 RIP-relative addressing for now. It is necessary to print referenced symbol names in llvm-objdump. Reviewers: andreadb, MaskRay, grosbach, jgalenson, craig.topper Reviewed By: MaskRay, craig.topper Subscribers: bcain, rupprecht, jhenderson, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63847 llvm-svn: 366987	2019-07-25 06:57:09 +00:00
Eli Friedman	82e109279d	[ARM] Remove dead code from ARMConstantIslands. tLDRHi is not a pc-relative load; it can't directly refer to a constant pool or jump table. llvm-svn: 366963	2019-07-24 23:36:14 +00:00
Jessica Paquette	728b18f29f	[AArch64][GlobalISel] Select immediate modes for ADD when selecting G_GEP Before, we weren't able to select things like this for G_GEP: add x0, x8, #8 And instead we'd materialize the 8. This teaches GISel to do that. It gives some considerable code size savings on 252.eon-- about 4%! Differential Revision: https://reviews.llvm.org/D65248 llvm-svn: 366959	2019-07-24 23:11:01 +00:00
Amara Emerson	de81bd0faa	[AArch64][GlobalISel] Don't try to use GISel if subtarget doesn't have neon or fp. Throughout the legalizerinfo we currently make the assumption that the target has neon and FP target features available. Fixing it will require a refactor of the whole thing, so until then make sure we fall back. Works around PR42734 Differential Revision: https://reviews.llvm.org/D65244 llvm-svn: 366957	2019-07-24 23:00:04 +00:00
Roman Lebedev	017e272c3a	[Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH InstCombine does the opposite fold, in hope that `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as it is seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse. Much like with my recent "hoist add/sub by/from const" patches, we should get almost universal win if we hoist constant, there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by constant, but by something else, the live-range of that something else may reduce. Special care needs to be applied not to disturb x86 `BT` / hexagon `tstbit` instruction pattern. And to not get into endless combine loop. Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm Reviewed By: spatel Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62871 llvm-svn: 366955	2019-07-24 22:57:22 +00:00
Jessica Paquette	68499112cf	[AArch64][GlobalISel] Fold G_MUL into XRO load addressing mode when possible If we have a G_MUL, and either the LHS or the RHS of that mul is the legal shift value for a load addressing mode, we can fold it into the load. This gives some code size savings on some SPEC tests. The best are around 2% on 300.twolf and 3% on 254.gap. Differential Revision: https://reviews.llvm.org/D65173 llvm-svn: 366954	2019-07-24 22:49:42 +00:00
Amara Emerson	13af1ed8e3	[GlobalISel] Support for inlining memcpy, memset and memmove calls. This introduces a new family of combiner helper routines that re-use the target specific cost model from SelectionDAG, and generate inline implementations of the memcpy family of intrinsics. The combines are only enabled at optimization levels higher than -O0, and give very substantial performance improvements. Differential Revision: https://reviews.llvm.org/D65167 llvm-svn: 366951	2019-07-24 22:17:31 +00:00
Stanislav Mekhanoshin	c43784ff26	[AMDGPU] Increase kernel padding To support prefetch mode 3 we need to pad current cacheline and fill 3 cachelines after. Current padding is only sufficient for mode 2. Differential Revision: https://reviews.llvm.org/D65236 llvm-svn: 366938	2019-07-24 19:40:13 +00:00
David Green	cd7a6fa314	[ARM] Rewrite how VCMP are lowered, using a single node This removes the VCEQ/VCNE/VCGE/VCEQZ/etc nodes, just using two called VCMP and VCMPZ with an extra operand as the condition code. I believe this will make some combines simpler, allowing us to just look at these codes and not the operands. It also helps fill in a missing VCGTUZ MVE selection without adding extra nodes for it. Differential Revision: https://reviews.llvm.org/D65072 llvm-svn: 366934	2019-07-24 17:36:47 +00:00
Simon Pilgrim	7d318b2bb1	[DAGCombine] matchBinOpReduction - add partial reduction matching This patch adds support for recognizing cases where a larger vector type is being used to reduce just the elements in the lower subvector: e.g. <8 x i32> reduction pattern in a <16 x i32> vector: <4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u> <2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> <1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u> matchBinOpReduction returns the lower extracted subvector in such cases, assuming isExtractSubvectorCheap accepts the extraction. I've only enabled it for X86 reduction sums so far. I intend to enable it for the bitop/minmax cases in future patches, and eventually I think it's worth turning it on all the time. This is mainly just a case of ensuring calls to matchBinOpReduction don't make assumptions on the vector width based on the original vector extraction. Fixes the x86 partial reduction sum cases in PR33758 and PR42023. Differential Revision: https://reviews.llvm.org/D65047 llvm-svn: 366933	2019-07-24 17:29:56 +00:00
David Green	047a0b6575	[ARM] Disable MVE fptosi and friends This prevents us from trying to convert an i1 predicate vector to a float, or vice-versa. Better patterns are possible, which will follow in a subsequent commit. For now we just expand them. Differential Revision: https://reviews.llvm.org/D65066 llvm-svn: 366931	2019-07-24 17:26:26 +00:00
Jessica Paquette	c19c30776a	[AArch64][GlobalISel] Make vector dup optimization look at last elt of ZeroVec Fix an off-by-one error which made us not look at the last element of the zero vector. This caused a miscompile in 188.ammp. Differential Revision: https://reviews.llvm.org/D65168 llvm-svn: 366930	2019-07-24 17:18:51 +00:00
David Green	b342bddbe2	[ARM] More MVE compare vector splat combines for ANDs Adds some extra r register compare combines, this time for ANDs. Differential Revision: https://reviews.llvm.org/D65062 llvm-svn: 366928	2019-07-24 17:08:09 +00:00
David Green	93b5f61295	[ARM] MVE compare vector splat combine MVE VCMP instructions can use a general purpose register as the second operand. This adds the combines for it, selecting from a compare of a vdup. Differential Revision: https://reviews.llvm.org/D65061 llvm-svn: 366924	2019-07-24 16:58:41 +00:00
Dmitry Preobrazhensky	5e1dd02c90	[AMDGPU][MC][GFX10] Enabled GFX10 assembly with arbitrary wavesize assumed by the code Reviewers: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D65216 llvm-svn: 366921	2019-07-24 16:50:17 +00:00
David Green	bab4d8ac5a	[ARM] Better OR's for MVE compares This adds a DeMorgan combine for OR's of compares to turn them into AND's, helping prevent them from going into and out of gpr registers. It also fills in the VCLE and VCLT nodes that MVE can select, allowing it to invert more compares. Differential Revision: https://reviews.llvm.org/D65059 llvm-svn: 366920	2019-07-24 16:42:09 +00:00
Stanislav Mekhanoshin	5cdacea297	[AMDGPU] Add all vgpr classes to asm parser Differential Revision: https://reviews.llvm.org/D65158 llvm-svn: 366917	2019-07-24 16:21:18 +00:00
Matt Arsenault	0e7d8698b5	AMDGPU/GlobalISel: Don't assume instruction can be erased when selecting exts The G_ANYEXT handling can end up reaching selectCOPY, which mutates the instruction in place. llvm-svn: 366915	2019-07-24 16:05:53 +00:00
David Green	69fba7434e	[ARM] Better AND's for MVE compares Add a number of folds to convert and(vcmp, vcmp) into a single VPT block, where the second vcmp becomes predicated on the first. The VCMP; VPST; VCMP will eventually be converted to VPT; VCMP in the VPTBlockPass. Differential Revision: https://reviews.llvm.org/D65058 llvm-svn: 366910	2019-07-24 14:42:05 +00:00
David Green	4fc78c496e	[ARM] MVE floating point compares and selects Much like integers, this adds MVE floating point compares and select. It requires a lot more buildvector/shuffle code because we may need to expand the compares without mve.fp, and requires support for and/or because of the way we lower llvm condition codes. Some original code by David Sherwood Differential Revision: https://reviews.llvm.org/D65054 llvm-svn: 366909	2019-07-24 14:28:22 +00:00
David Green	a4a4698c16	[ARM] Basic And/Or/Xor handling for MVE predicates This adds some basic, "worst case" handling for MVE predicate Or/And/Xor. It does this by going into and out of GPRs, doing the operation on scalars. Code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65053 llvm-svn: 366907	2019-07-24 14:17:54 +00:00
Simi Pallipurath	724888af45	[ARM] Make sure that the constant pool is not placed in the middle of an IT block. This change makes sure that llvm does not emit an invalid IT block by putting the constant pool in the middle of an IT block. We have code to try to avoid putting a constant island in the middle of an IT block, but it only works if we see an IT between the one currently referencing CPE and possible insertion point. If the first instruction we look at is the VLDRD after the IT, we never see the IT and do not realize that the instruction doing the load could be in an IT block itself. Differential Revision: https://reviews.llvm.org/D64621 Change-Id: I24cecb37cded75e8992870bd997f6226853bd920 llvm-svn: 366905	2019-07-24 13:54:14 +00:00
Sjoerd Meijer	a19f5a76e6	Test commit. NFC. Removed 2 trailing whitespaces in 2 files that used to be in different repos to test my new github monorepo workflow. llvm-svn: 366904	2019-07-24 13:30:36 +00:00
David Green	c7e55d4f52	[ARM] MVE predicate register support This adds support code for building and shuffling i1 predicate registers. It generally uses two basic principles, either converting the predicate into a scalar (through a PREDICATE_CAST) and doing scalar operations on it there, or by converting the register to a full vector register and back. Some of the code here is not super efficient but will hopefully cover most cases of moving i1 vectors around and can be improved in subsequent patches. Some code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65052 llvm-svn: 366890	2019-07-24 11:51:36 +00:00
David Green	b9d96ceca0	[ARM] MVE integer compares and selects This adds the very basics for MVE vector predication, adding integer VCMP and VSEL instruction support. This is done through predicate registers (MVT::v16i1, MVT::v8i1, MVT::v4i1), but otherwise using same mechanics as NEON to custom lower setcc's through ARMISD::VCXX nodes (VCEQ, VCGT, VCEQZ, etc). An extra VCNE was added, as this can be handled sensibly by MVE's expanded number of VCMP condition codes. (There are also VCLE and VCLT which are added later). VPSEL is also added here, simply selecting on the vselect. Original code by David Sherwood. Differential Revision: https://reviews.llvm.org/D65051 llvm-svn: 366885	2019-07-24 11:08:14 +00:00
Sam Parker	aeb21b96a0	[ARM][ParallelDSP] Fix pointer operand reordering While combining two loads into a single load, we often need to reorder the pointer operands for the new load. This reordering was broken in the cases where there was a chain of values that built up the pointer. Differential Revision: https://reviews.llvm.org/D65193 llvm-svn: 366881	2019-07-24 09:38:39 +00:00
Chen Zheng	8b7e82be12	[PowerPC][NFC] use opcode instead of MachineInstr for instrHasImmForm(). llvm-svn: 366867	2019-07-24 04:50:23 +00:00
Fangrui Song	305ace7cc8	[AArch64] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r366857 llvm-svn: 366866	2019-07-24 01:59:44 +00:00
Amara Emerson	511f7f5785	[AArch64][GlobalISel] Add support for s128 loads, stores, extracts, truncs. We need to be able to load and store s128 for memcpy inlining, where we want to generate Q register mem ops. Making these legal also requires that we add some support in other instructions. Regbankselect should also know about these since they have no GPR register class that can hold them, so need special handling to live on the FPR bank. Differential Revision: https://reviews.llvm.org/D65166 llvm-svn: 366857	2019-07-23 22:05:13 +00:00
Jessica Paquette	a2fae1e3e9	[GlobalISel][AArch64] Save a copy on G_SELECT by fixing condition to GPR The condition can never be fed by FPRs, so it should always be on a GPR. Differential Revision: https://reviews.llvm.org/D65157 llvm-svn: 366854	2019-07-23 21:39:50 +00:00
Eli Friedman	b27fc95e89	[ARM] Add opt-bisect support to ARMParallelDSP. llvm-svn: 366851	2019-07-23 20:48:46 +00:00
Yi-Hong Lyu	41a010a4ef	[PowerPC] Remove redundant load immediate instructions Currently PowerPC backend emits code like this: r3 = li 0 std r3, 264(r1) r3 = li 0 std r3, 272(r1) This patch fixes that and other cases where a register already contains a value that is loaded so we will get: r3 = li 0 std r3, 264(r1) std r3, 272(r1) Differential Revision: https://reviews.llvm.org/D64220 llvm-svn: 366840	2019-07-23 19:11:07 +00:00
Craig Topper	76bc3d6e07	[X86] In lowerVectorShuffle, instead of creating a new node to canonicalize the shuffle mask by commuting, just commute the mask and swap V1/V2. LegalizeDAG tries to legalize the DAG by legalizing nodes before their operands. If we create a new node, we end up legalizing it after its operands. This prevents some of the optimizations that can be done when the operand is a build_vector since the build_vector will have been legalized to something else. Differential Revision: https://reviews.llvm.org/D65132 llvm-svn: 366835	2019-07-23 18:46:15 +00:00
Jessica Paquette	2b404d01e8	[GlobalISel][AArch64] Teach GISel to handle shifts in load addressing modes When we select the XRO variants of loads, we can pull in very specific shifts (of the size of an element). E.g. ``` ldr x1, [x2, x3, lsl #3] ``` This teaches GISel to handle these when they're coming from shifts specifically. This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg` which recognizes this pattern. This also packs this up with `selectAddrModeRegisterOffset` into `selectAddrModeXRO`. This is intended to be equivalent to `selectAddrModeXRO` in AArch64ISelDAGtoDAG. Also update load-addressing-modes to show that all of the cases here work. Differential Revision: https://reviews.llvm.org/D65119 llvm-svn: 366819	2019-07-23 16:09:42 +00:00
Sam Parker	57e87dd81b	[ARM][LowOverheadLoops] Fix branch target codegen While lowering test.set.loop.iterations, it wasn't checked how the brcond was using the result and so the wls could branch to the loop preheader instead of not entering it. The same was true for loop.decrement.reg. So brcond and br_cc are now lowered manually when using the hwloop intrinsics. During this we now check whether the result has been negated and whether we're using SETEQ or SETNE and 0 or 1. We can then figure out which basic block the WLS and LE should be targeting. Differential Revision: https://reviews.llvm.org/D64616 llvm-svn: 366809	2019-07-23 14:08:46 +00:00
Simon Pilgrim	c60c12fb10	Fix MSVC warning about extending a uint32_t shift result to uint64_t. NFCI. llvm-svn: 366808	2019-07-23 14:04:54 +00:00
David Green	fdedf240f8	[ARM] Rename NEONModImm to VMOVModImm. NFC Rename NEONModImm to VMOVModImm as it is used in both NEON and MVE. llvm-svn: 366790	2019-07-23 09:19:24 +00:00
Zi Xuan Wu	57d17ec2e1	[PowerPC] Replace float load/store pair with integer load/store pair when it's only used in load/store Replace float load/store pair with integer load/store pair when it's only used in load/store, because float load/store instructions cost more cycles than integer load/store. A typical scenario is a call passing more than 13 float arguments, where some of them must be passed on the stack. So we need a load/store pair to do such a memory operation if the variable is a global variable. Differential Revision: https://reviews.llvm.org/D64195 llvm-svn: 366775	2019-07-23 03:34:40 +00:00
Matt Arsenault	827427f65b	AMDGPU: Don't use SDNodeXForm for DS offset output The xform has no real value when it's used on a complex pattern output. The complex pattern was already creating TargetConstants with i16, so this was just unnecessary machinery. This allows global isel to import the simple cases once the complex pattern is implemented. llvm-svn: 366743	2019-07-22 21:38:11 +00:00
Craig Topper	510e6fadaa	[X86] When using AND+PACKUS in lowerV16I8Shuffle, generate the build vector directly in v16i8 with the correct 0x00 or 0xFF elements rather than using another VT and bitcasting it. The build_vector will become a constant pool load. By using the desired type initially, it ensures we don't generate a bitcast of the constant pool load which will need to be folded with the load. While experimenting with another patch, I noticed that when the load type and the constant pool type don't match, then SimplifyDemandedBits can't handle it. While we should probably fix that, this was a simple way to fix the issue I saw. llvm-svn: 366732	2019-07-22 19:58:49 +00:00
Jason Liu	8dd563ef4b	[NFC][PowerPC]Change ADDIStocHA to ADDIStocHA8 to follow 64-bit naming convention Summary: Since we are planning to add ADDIStocHA for 32bit in later patch, we decided to change 64bit one first to follow naming convention with 8 behind opcode. Patch by: Xiangling_L Differential Revision: https://reviews.llvm.org/D64814 llvm-svn: 366731	2019-07-22 19:55:33 +00:00
Sean Fertile	942537d9fa	Stubs out TLOF for AIX and add support for common vars in assembly output. Stubs out a TargetLoweringObjectFileXCOFF class, implementing only SelectSectionForGlobal for common symbols. Also adds an override of EmitGlobalVariable in PPCAIXAsmPrinter which adds a number of defensive errors and adds support for emitting common globals. llvm-svn: 366727	2019-07-22 19:15:29 +00:00
Sean Fertile	324d33dd4e	[PowerPC] Fix comment on MO_PLT Target Operand Flag. [NFC] Patch by Xiangling Liao. llvm-svn: 366724	2019-07-22 18:47:59 +00:00
Sam Parker	4379a40088	[ARM][LowOverheadLoops] Revert remaining pseudos ARMLowOverheadLoops would assert a failure if it did not find all the pseudo instructions that comprise the hardware loop. Instead of doing this, iterate through all the instructions of the function and revert any remaining pseudo instructions that haven't been converted. Differential Revision: https://reviews.llvm.org/D65080 llvm-svn: 366691	2019-07-22 14:16:40 +00:00
Matt Arsenault	937d0ee5d8	AMDGPU/GlobalISel: Remove unnecessary code The minnum/maxnum case are dead, and the cvt is handled by the default. llvm-svn: 366685	2019-07-22 13:05:25 +00:00
David Green	8876a312a8	[ARM] Fix for MVE VPT block pass We need to ensure that the number of T's is correct when adding multiple instructions into the same VPT block. Differential revision: https://reviews.llvm.org/D65049 llvm-svn: 366684	2019-07-22 12:51:38 +00:00
Simon Pilgrim	b3d719e1cf	[X86] EltsFromConsecutiveLoads - support common source loads (REAPPLIED) This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load. A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte offsetted loads then attempt to match against a previous load that has already been confirmed to be a consecutive match. Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle. Fixed out of bounds load assert identified in rL366501 Differential Revision: https://reviews.llvm.org/D64551 llvm-svn: 366681	2019-07-22 12:44:10 +00:00
Christudasan Devadasan	006cf8c03d	Added address-space mangling for stack related intrinsics Modified the following 3 intrinsics: int_addressofreturnaddress, int_frameaddress & int_sponentry. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D64561 llvm-svn: 366679	2019-07-22 12:42:48 +00:00
Oliver Stannard	6771a89fa0	[IPRA][ARM] Make use of the "returned" parameter attribute ARM has code to recognise uses of the "returned" function parameter attribute which guarantee that the value passed to the function in r0 will be returned in r0 unmodified. IPRA replaces the regmask on call instructions, so needs to be told about this to avoid reverting the optimisation. Differential revision: https://reviews.llvm.org/D64986 llvm-svn: 366669	2019-07-22 08:44:36 +00:00
Jay Foad	298500ae33	[AMDGPU] Save some work when an atomic op has no uses Summary: In the atomic optimizer, save doing a bunch of work and generating a bunch of dead IR in the fairly common case where the result of an atomic op (i.e. the value that was in memory before the atomic op was performed) is not used. NFC. Reviewers: arsenm, dstuttard, tpr Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64981 llvm-svn: 366667	2019-07-22 07:19:44 +00:00
Simon Pilgrim	86fa3270ef	[X86] SimplifyDemandedVectorEltsForTargetNode - Move SUBV_BROADCAST narrowing handling. NFCI. Move the narrowing of SUBV_BROADCAST to where we handle all the other opcodes. llvm-svn: 366660	2019-07-21 19:04:44 +00:00
Simon Pilgrim	adec0f2252	[X86][SSE] Use PSADBW to improve vXi8 sum reduction (PR42674) As detailed on PR42674, we can reduce a vXi8 down until we have the final <8 x i8>, and then use PSADBW with zero, to sum those values. We then extract the bottom i8, discarding any overflow from the upper bits of the i16 result. llvm-svn: 366636	2019-07-20 15:20:11 +00:00
Jessica Paquette	41affad967	[GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs Sometimes, you can end up with cross-bank copies between same-sized GPRs and FPRs, which feed into G_STOREs. When these copies feed only into stores, they aren't necessary; we can just store using the original register bank. This provides some minor code size savings for some floating point SPEC benchmarks. (Around 0.2% for 453.povray and 450.soplex) This issue doesn't seem to show up due to regbankselect or anything similar. So, this patch introduces an early select function, `contractCrossBankCopyIntoStore` which performs the contraction when possible. The selector then continues normally and selects the correct store opcode, eliminating needless copies along the way. Differential Revision: https://reviews.llvm.org/D65024 llvm-svn: 366625	2019-07-20 01:55:35 +00:00
Guanzhong Chen	5204f7611f	[WebAssembly] Compute and export TLS block alignment Summary: Add immutable WASM global `__tls_align` which stores the alignment requirements of the TLS segment. Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang. The expected usage has now changed to: __wasm_init_tls(memalign(__builtin_wasm_tls_align(), __builtin_wasm_tls_size())); Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton Reviewed By: tlively Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D65028 llvm-svn: 366624	2019-07-19 23:34:16 +00:00
Matt Arsenault	f3bfb85bce	AMDGPU/GlobalISel: Legalize GEP for other 32-bit address spaces llvm-svn: 366621	2019-07-19 22:28:44 +00:00
Stanislav Mekhanoshin	05d9e6a2a3	[AMDGPU] Autogenerate register sequences in tuples Differential Revision: https://reviews.llvm.org/D65007 llvm-svn: 366619	2019-07-19 21:43:42 +00:00
Stanislav Mekhanoshin	7b5a54e369	[AMDGPU] Fixed occupancy calculation for gfx10 Differential Revision: https://reviews.llvm.org/D65010 llvm-svn: 366616	2019-07-19 21:29:51 +00:00
Matt Arsenault	5e23f42820	AMDGPU: Avoid custom predicates for stores with glue llvm-svn: 366613	2019-07-19 21:01:30 +00:00
Matt Arsenault	e3401a9b86	AMDGPU: Redefine setcc condition PatLeafs Avoid using custom code predicates. llvm-svn: 366609	2019-07-19 20:24:40 +00:00
Matt Arsenault	48c0df5d46	AMDGPU: Don't rely on m0 being -1 for GWS offsets This only works if the high bits of m0 are also 0, so m0 would have to be set to 0xffff. llvm-svn: 366608	2019-07-19 20:01:24 +00:00
Matt Arsenault	85f3890126	AMDGPU: Force s_waitcnt after GWS instructions This is apparently required to be the immediately following instruction, so force it into a bundle with a waitcnt. llvm-svn: 366607	2019-07-19 19:47:30 +00:00
Stanislav Mekhanoshin	01fcf9238f	[AMDGPU] Allow register tuples to set asm names This change reverts most of the previous register name generation. The real problem is that RegisterTuple does not generate asm names. Added optional operand to RegisterTuple. This way we can simplify register name access and dramatically reduce the size of static tables for the backend. Differential Revision: https://reviews.llvm.org/D64967 llvm-svn: 366598	2019-07-19 18:05:01 +00:00
Matt Arsenault	7df225dfc2	AMDGPU/GlobalISel: Fix MMO flags for kernel argument loads The DAG lowering sets dereferenceable and invariant, not nontemporal. llvm-svn: 366597	2019-07-19 17:52:56 +00:00
Matt Arsenault	08494f6231	AMDGPU/GlobalISel: Selection for fminnum/fmaxnum v2f16 case doesn't work yet because the VOP3P complex patterns haven't been ported yet. llvm-svn: 366585	2019-07-19 14:42:40 +00:00
Matt Arsenault	b60a2ae40e	AMDGPU/GlobalISel: Support arguments with multiple registers Handles structs used directly in argument lists. llvm-svn: 366584	2019-07-19 14:29:30 +00:00
Matt Arsenault	fecf43eba3	AMDGPU/GlobalISel: Rewrite lowerFormalArguments This should now handle everything except structs passed as multiple registers. I think most of the packing logic should be handled by handleAssignments, but I'm unclear on what the contract is for multiple registers. This is copying how x86 handles this. This does change the behavior of the test_sgpr_alignment0 amdgpu_vs test. I don't think shader arguments should try to follow the alignment, and registers need to be repacked. I also don't think it matters, since I think the pointers are packed to the beginning of the argument list anyway. llvm-svn: 366582	2019-07-19 14:15:18 +00:00
Matt Arsenault	1022c0dfde	AMDGPU: Decompose all values to 32-bit pieces for calling conventions This is the more natural lowering, and presents more opportunities to reduce 64-bit ops to 32-bit. This should also help avoid issues graphics shaders have had with 64-bit values, and simplify argument lowering in globalisel. llvm-svn: 366578	2019-07-19 13:57:44 +00:00
Dmitry Preobrazhensky	4ccb7f8c45	[AMDGPU][MC] Corrected parsing of branch offsets See bug 40820: https://bugs.llvm.org/show_bug.cgi?id=40820 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64629 llvm-svn: 366571	2019-07-19 13:12:47 +00:00
Than McIntosh	e238a4c757	[X86] for split stack, not save/restore nested arg if unused Summary: For split-stack, if the nested argument (i.e. R10) is not used, no need to save/restore it in the prologue. Reviewers: thanm Reviewed By: thanm Subscribers: mstorsjo, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64673 llvm-svn: 366569	2019-07-19 12:54:44 +00:00
Oliver Stannard	8780c0dda2	Don't update NoTrappingFPMath and FPDenormalMode in resetTargetOptions We'd like to remove this whole function, because these are properties of functions, not the target as a whole. These two are easy to remove because they are only used for emitting ARM build attributes, which expects them to represent the defaults for the whole module, not just the last function generated. This is needed to get correct build attributes when using IPRA on ARM, because IPRA causes resetTargetOptions to get called before ARMAsmPrinter::emitAttributes. Differential revision: https://reviews.llvm.org/D64929 llvm-svn: 366562	2019-07-19 10:37:37 +00:00
Mikhail Maltsev	0b001f94a5	[ARM] Add <saturate> operand to SQRSHRL and UQRSHLL Summary: According to the new Armv8-M specification https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the instructions SQRSHRL and UQRSHLL now have an additional immediate operand <saturate>. The new assembly syntax is: SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm where <saturate> can be either 64 (the existing behavior) or 48, in that case the result is saturated to 48 bits. The new operand is encoded as follows: #64 Encoded as sat = 0 #48 Encoded as sat = 1 sat is bit 7 of the instruction bit pattern. This patch adds a new assembler operand class MveSaturateOperand which implements parsing and encoding. Decoding is implemented in DecodeMVEOverlappingLongShift. Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer Reviewed By: simon_tatham Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64810 llvm-svn: 366555	2019-07-19 09:46:28 +00:00
Jay Foad	7d06ffff46	[AMDGPU] Simplify the exclusive scan used for optimized atomics Summary: Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8, 16, 32) instead of starting off shifting by 1, 2 and 3 and then doing a 3-way ADD, because: 1. It simplifies the compiler a little. 2. It minimizes vgpr pressure because each instruction is now of the form vn = vn + vn << c. 3. It is more friendly to the DPP combiner, which currently can't combine into an ADD3 instruction. Because of #2 and #3 the end result is improved from this: v_add_u32_dpp v4, v3, v3 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 v_mov_b32_dpp v5, v3 row_shr:2 row_mask:0xf bank_mask:0xf v_mov_b32_dpp v1, v3 row_shr:3 row_mask:0xf bank_mask:0xf v_add3_u32 v1, v4, v5, v1 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf To this: v_add_u32_dpp v1, v1, v1 row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0 s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:4 row_mask:0xf bank_mask:0xe s_nop 1 v_add_u32_dpp v1, v1, v1 row_shr:8 row_mask:0xf bank_mask:0xc s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:15 row_mask:0xa bank_mask:0xf s_nop 1 v_add_u32_dpp v1, v1, v1 row_bcast:31 row_mask:0xc bank_mask:0xf I.e. two fewer computational instructions, one extra nop where we could schedule something else. Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64411 llvm-svn: 366543	2019-07-19 08:40:37 +00:00
Hsiangkai Wang	18ccfadd46	[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame. It is necessary to generate fixups in .debug_frame or .eh_frame as relaxation is enabled due to the address delta may be changed after relaxation. There is an opcode with 6-bits data in debug frame encoding. So, we also need 6-bits fixup types. Differential Revision: https://reviews.llvm.org/D58335 llvm-svn: 366524	2019-07-19 02:03:34 +00:00
Amara Emerson	cf12c7815f	[GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTs and legalize later. I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't do that unless we delay lowering to actual function calls. This patch changes the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and then has each target specify, using the new custom legalizer hook for intrinsics, that it wants them expanded into a libcall. Differential Revision: https://reviews.llvm.org/D64895 llvm-svn: 366516	2019-07-19 00:24:45 +00:00
Stanislav Mekhanoshin	a9c71e01e7	[AMDGPU] Drop Reg32 and use regular AsmName This allows to reduce generated AMDGPUGenAsmWriter.inc by ~100Kb. Differential Revision: https://reviews.llvm.org/D64952 llvm-svn: 366505	2019-07-18 22:18:33 +00:00
Jessica Paquette	7a1dcc5ff1	[GlobalISel][AArch64] Add support for base register + offset register loads Add support for folding G_GEPs into loads of the form ``` ldr reg, [base, off] ``` when possible. This can save an add before the load. Currently, this is only supported for loads of 64 bits into 64 bit registers. Add a new addressing mode function, `selectAddrModeRegisterOffset` which performs this folding when it is profitable. Also add a test for addressing modes for G_LOAD. Differential Revision: https://reviews.llvm.org/D64944 llvm-svn: 366503	2019-07-18 21:50:11 +00:00
Reid Kleckner	ba9c9e62cb	Revert [X86] EltsFromConsecutiveLoads - support common source loads This reverts r366441 (git commit `48104ef7c9`) This causes clang to fail to compile some file in Skia. Reduction soon. llvm-svn: 366501	2019-07-18 21:26:41 +00:00
Guanzhong Chen	df4479200b	[WebAssembly] Fix __builtin_wasm_tls_base intrinsic Summary: Properly generate the outchain for the `__builtin_wasm_tls_base` intrinsic. Also marked the intrinsic pure, per @sunfish's suggestion. Reviewers: tlively, aheejin, sbc100, sunfish Reviewed By: tlively Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits, sunfish Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D64949 llvm-svn: 366499	2019-07-18 21:17:52 +00:00
Guanzhong Chen	801fa8e6b9	[WebAssembly] Implement __builtin_wasm_tls_base intrinsic Summary: Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local block and scan through it for memory leaks. Reviewers: tlively, aheejin, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D64900 llvm-svn: 366475	2019-07-18 17:53:22 +00:00
Peter Collingbourne	aa6a7df64a	MC: AArch64: Add support for prel_g* relocation specifiers. Differential Revision: https://reviews.llvm.org/D64683 llvm-svn: 366462	2019-07-18 16:54:33 +00:00
Peter Collingbourne	76427f849f	AArch64: Unify relocation restrictions between MOVK/MOVN/MOVZ. There doesn't seem to be a practical reason for these instructions to have different restrictions on the types of relocations that they may be used with, notwithstanding the language in the ELF AArch64 spec that implies that specific relocations are meant to be used with specific instructions. For example, we currently forbid the first instruction in the following sequence, despite it currently being used by clang to generate a global reference under -mcmodel=large: movz x0, #:abs_g0_nc:foo movk x0, #:abs_g1_nc:foo movk x0, #:abs_g2_nc:foo movk x0, #:abs_g3:foo Therefore, allow MOVK/MOVN/MOVZ to accept the union of the set of relocations that they currently accept individually. Differential Revision: https://reviews.llvm.org/D64466 llvm-svn: 366461	2019-07-18 16:51:53 +00:00
Hsiangkai Wang	657277e0f1	Revert "[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame." This reverts commit 17e3cbf5fe656483d9016d0ba9e1d0cd8629379e. llvm-svn: 366444	2019-07-18 15:06:50 +00:00
Hsiangkai Wang	e43ce1a958	[DebugInfo] Generate fixups as emitting DWARF .debug_frame/.eh_frame. It is necessary to generate fixups in .debug_frame or .eh_frame as relaxation is enabled due to the address delta may be changed after relaxation. There is an opcode with 6-bits data in debug frame encoding. So, we also need 6-bits fixup types. Differential Revision: https://reviews.llvm.org/D58335 llvm-svn: 366442	2019-07-18 14:47:34 +00:00
Simon Pilgrim	48104ef7c9	[X86] EltsFromConsecutiveLoads - support common source loads This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load. A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte-offset loads are then matched against a previous load that has already been confirmed to be a consecutive match. Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle. Differential Revision: https://reviews.llvm.org/D64551 llvm-svn: 366441	2019-07-18 14:33:25 +00:00
Sanjay Patel	e654785912	[x86] try harder to form LEA from ADD to avoid flag conflicts (PR40483) LEA doesn't affect flags, so use it more liberally to replace an ADD when we know that the ADD operands affect flags. In the motivating example from PR40483: https://bugs.llvm.org/show_bug.cgi?id=40483 ...this lets us avoid duplicating a math op just to avoid flag conflict. As mentioned in the TODO comments, this heuristic can be extended to fire more often if that leads to more improvements. Differential Revision: https://reviews.llvm.org/D64707 llvm-svn: 366431	2019-07-18 12:48:01 +00:00
Diogo N. Sampaio	11512e742b	[ARM][DAGCOMBINE][FIX] PerformVMOVRRDCombine Summary: PerformVMOVRRDCombine omits adding an offset of 4 to the PointerInfo when converting a f64 = load[M] to {i32, i32} = {load[M], load[M + 4]}, which would allow the machine scheduler to break dependencies with the second load. - pr42638 Reviewers: eli.friedman, dmgreen, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64870 llvm-svn: 366423	2019-07-18 10:05:56 +00:00
Alex Bradbury	b8d352a08b	[RISCV] Reset NoPHIS MachineFunctionProperty in emitSelectPseudo We inserted PHIs where there were none before, so the property must be reset. This error was found on an EXPENSIVE_CHECKS build. llvm-svn: 366412	2019-07-18 07:52:41 +00:00
Craig Topper	8da0402210	[X86] Disable combineConcatVectors for vXi1 vectors. I'm not convinced the code this calls is properly vetted for vXi1 vectors. Experimental vector widening legalization testing for D55251 is now hitting an assertion failure inside EltsFromConsecutiveLoads. This is occurring from a v2i1 load having a store size different than its VT size. Hopefully this commit will keep such issues from happening. llvm-svn: 366405	2019-07-18 06:18:06 +00:00
Alex Bradbury	8aba95d64c	[RISCV] Avoid signed integer overflow UB in RISCVMatInt::generateInstSeq Found by UBSan. llvm-svn: 366398	2019-07-18 04:02:58 +00:00
Alex Bradbury	ad73a436dc	[RISCV] Don't access an invalidated iterator in RISCVInstrInfo::removeBranch Issue found by ASan. llvm-svn: 366397	2019-07-18 03:23:47 +00:00
Fangrui Song	f358cf8de2	[AArch64] Add dependency from AArch64CodeGen to TransformUtils to fix -DBUILD_SHARED_LIBS=on link error after D64173/r366361 This fixes: ld.lld: error: undefined symbol: llvm::findAllocaForValue(llvm::Value, llvm::DenseMap<llvm::Value, llvm::AllocaInst, llvm::DenseMapInfo<llvm::Value>, llvm::detail::DenseMapPair<llvm::Value, llvm::AllocaInst> >&) >>> referenced by AArch64StackTagging.cpp llvm-svn: 366396	2019-07-18 01:53:08 +00:00
Stanislav Mekhanoshin	7872d76a16	[AMDGPU] Simplify AMDGPUInstPrinter::printRegOperand() Differential Revision: https://reviews.llvm.org/D64892 llvm-svn: 366385	2019-07-17 22:58:43 +00:00
Craig Topper	61fff7a337	[X86] Make sure we mark 128/256 MLOAD as Legal with VLX when min-legal-vector-width=256 is in effect. This started triggering an assertion after r364718 when we made these Custom under AVX2. llvm-svn: 366382	2019-07-17 22:26:00 +00:00
Stanislav Mekhanoshin	9c7f4264d3	[AMDGPU] Stop special casing flat_scratch for register name Differential Revision: https://reviews.llvm.org/D64885 llvm-svn: 366376	2019-07-17 21:35:11 +00:00
Evgeniy Stepanov	f45fd429b7	Speculative fix for stack-tagging.ll failure. Depending on the evaluation order of function call arguments, the current code may insert a use before def. llvm-svn: 366375	2019-07-17 21:27:44 +00:00
Evgeniy Stepanov	851339fb29	Basic MTE stack tagging instrumentation. Summary: Use MTE intrinsics to tag stack variables in functions with sanitize_memtag attribute. Reviewers: pcc, vitalybuka, hctim, ostannard Subscribers: srhines, mgorny, javed.absar, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64173 llvm-svn: 366361	2019-07-17 19:24:12 +00:00
Evgeniy Stepanov	d752f5e953	Basic codegen for MTE stack tagging. Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now. Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are used to implement a tagged stack frame pointer in a virtual register. Differential Revision: https://reviews.llvm.org/D64172 llvm-svn: 366360	2019-07-17 19:24:02 +00:00
Momchil Velikov	0e2b74a2b0	Revert [AArch64] Add support for Transactional Memory Extension (TME) This reverts r366322 (git commit `4b8da3a503`) llvm-svn: 366355	2019-07-17 17:43:32 +00:00
Daniil Fukalov	d912a9ba9b	[AMDGPU] Tune inlining parameters for AMDGPU target Summary: Since the target has no significant advantage of vectorization, the vector instructions threshold bonus should be optional. The amdgpu-inline-arg-alloca-cost parameter default value and the target InliningThresholdMultiplier value are tuned accordingly. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64642 llvm-svn: 366348	2019-07-17 16:51:29 +00:00
Matt Arsenault	06eed42213	AMDGPU: Use getTargetConstant Avoids creating an extra intermediate mov. llvm-svn: 366340	2019-07-17 15:35:36 +00:00
Alex Bradbury	ab009a602e	[AsmPrinter] Make the encoding of call sites in .gcc_except_table configurable and use for RISC-V The original behavior was to always emit the offsets to each call site in the call site table as uleb128 values, however on some architectures (e.g. RISCV) these uleb128 offsets into the code cannot always be resolved until link time (because relaxation will invalidate any calculated offsets), and there are no appropriate relocations for uleb128 values. As a consequence it needs to be possible to specify an alternative. This also switches RISCV to use DW_EH_PE_udata4 for call site encodings in .gcc_except_table Differential Revision: https://reviews.llvm.org/D63415 Patch by Edward Jones. llvm-svn: 366329	2019-07-17 14:00:35 +00:00
Jay Foad	70235c642e	[AMDGPU] Optimize atomic AND/OR/XOR Summary: Extend the atomic optimizer to handle AND, OR and XOR. Reviewers: arsenm, sheredom Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64809 llvm-svn: 366323	2019-07-17 13:40:03 +00:00
Momchil Velikov	4b8da3a503	[AArch64] Add support for Transactional Memory Extension (TME) TME is a future architecture technology, documented in https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools https://developer.arm.com/docs/ddi0601/a More about the future architectures: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and TCANCEL and the target feature/arch extension "tme". It also implements TME builtin functions, defined in ACLE Q2 2019 (https://developer.arm.com/docs/101028/latest) Patch by Javed Absar and Momchil Velikov Differential Revision: https://reviews.llvm.org/D64416 llvm-svn: 366322	2019-07-17 13:23:27 +00:00
Justin Hibbits	0257c6b659	PowerPC: Fix register spilling for SPE registers Summary: Missed in the original commit, use the correct callee-saved register list for spilling, instead of the standard SVR432 list. This avoids needlessly spilling the SPE non-volatile registers when they're not used. As part of this, also add where missing, and sort, the spill opcode checks for SPE and SPE4 register classes. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D56703 llvm-svn: 366319	2019-07-17 12:30:48 +00:00
Justin Hibbits	5214956eaa	PowerPC/SPE: Fix load/store handling for SPE Summary: Pointed out in a comment for D49754, register spilling will currently spill SPE registers at almost any offset. However, the instructions `evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256 (unsigned) bytes from the base register, as the offset must fit into a 5-bit offset, which ranges from 0-31 (indexed in double-words). The update to the register spill test is taken partially from the test case shown in D49754. Additionally, pointed out by Kei Thomsen, globals will currently use evldd/evstdd, though the offset isn't known at compile time, so may exceed the 8-bit (unsigned) offset permitted. This fixes that as well, by forcing it to always use evlddx/evstddx when accessing globals. Part of the patch contributed by Kei Thomsen. Reviewers: nemanjai, hfinkel, joerg Subscribers: kbarton, jsji, llvm-commits Differential Revision: https://reviews.llvm.org/D54409 llvm-svn: 366318	2019-07-17 12:30:04 +00:00
Petar Avramovic	1e62635d05	[MIPS GlobalISel] ClampScalar and select pointer G_ICMP Add narrowScalar to half of original size for G_ICMP. ClampScalar G_ICMP's operands 2 and 3 to s32. Select G_ICMP for pointers for MIPS32. Pointer compare is same as for integers, it is enough to declare them as legal type. Differential Revision: https://reviews.llvm.org/D64856 llvm-svn: 366317	2019-07-17 12:08:01 +00:00
Nicolai Haehnle	8b7041a5c6	AMDGPU/GFX10: Apply the VMEM-to-scalar-write hazard also to writes to EXEC Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630 Reviewers: rampitec, mareko Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64807 Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc llvm-svn: 366314	2019-07-17 11:22:57 +00:00
Nicolai Haehnle	a256b8b7d7	AMDGPU: Improve alias analysis for GDS Summary: GDS cannot alias anything else. Original patch by: Marek Olšák Reviewers: arsenm, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64114 Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0 llvm-svn: 366313	2019-07-17 11:22:19 +00:00
Diana Picus	37e403d18c	[ARM GlobalISel] Cleanup CallLowering. NFC Migrate CallLowering::lowerReturnVal to use the same infrastructure as lowerCall/FormalArguments and remove the now obsolete code path from splitToValueTypes. Forgot to push this earlier. llvm-svn: 366308	2019-07-17 10:01:27 +00:00
Simon Atanasyan	4c1e440892	[mips] Use mult/mflo pattern on 64-bit targets prior to MIPS64 The `MUL` instruction is available starting from the MIPS32/MIPS64 targets. llvm-svn: 366301	2019-07-17 08:11:40 +00:00
Simon Atanasyan	a884afb6f8	[mips] Implement .cplocal directive This directive forces to use the alternate register for context pointer. For example, this code: .cplocal $4 jal foo expands to: ld $25, %call16(foo)($4) jalr $25 Differential Revision: https://reviews.llvm.org/D64743 llvm-svn: 366300	2019-07-17 08:11:31 +00:00
Simon Atanasyan	7f308af5ee	[mips] Support the "o" inline asm constraint Like other LLVM targets, we do not handle "offsettable" memory addresses in any special way. In other words, the "o" constraint is an exact equivalent of the "m" one. But some existing code requires the "o" constraint support. This fixes PR42589. Differential Revision: https://reviews.llvm.org/D64792 llvm-svn: 366299	2019-07-17 08:11:15 +00:00
Stanislav Mekhanoshin	e5012ab308	[AMDGPU] Autogenerate register asm names Differential Revision: https://reviews.llvm.org/D64839 llvm-svn: 366283	2019-07-16 23:44:21 +00:00
Guanzhong Chen	0a8d4df799	[WebAssembly] Compile all TLS on Emscripten as local-exec Summary: Currently, on Emscripten, dynamic linking is not supported with threads. This means that if thread-local storage is used, it must be used in a statically-linked executable. Hence, local-exec is the only possible model. This diff compiles all TLS variables to use local-exec on Emscripten as a temporary measure until dynamic linking is supported with threads. The goal for this is to allow C++ types with constructors to be thread-local. Currently, when `clang` compiles a `thread_local` variable with a constructor, it generates `__tls_guard` variable: @__tls_guard = internal thread_local global i8 0, align 1 As no TLS model is specified, this is treated as general-dynamic, which we do not support (and cannot support without implementing dynamic linking support with threads in Emscripten). As a result, any C++ constructor in `thread_local` variables would not compile. By compiling all `thread_local` as local-exec, `__tls_guard` will compile and we can support C++ constructors with TLS without implementing dynamic linking with threads. Depends on D64537 Reviewers: tlively, aheejin, sbc100 Reviewed By: aheejin Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64776 llvm-svn: 366275	2019-07-16 22:22:08 +00:00
Guanzhong Chen	42bba4b852	[WebAssembly] Implement thread-local storage (local-exec model) Summary: Thread local variables are placed inside a `.tdata` segment. Their symbols are offsets from the start of the segment. The address of a thread local variable is computed as `__tls_base` + the offset from the start of the segment. `.tdata` segment is a passive segment and `memory.init` is used once per thread to initialize the thread local storage. `__tls_base` is a wasm global. Since each thread has its own wasm instance, it is effectively thread local. Currently, `__tls_base` must be initialized at thread startup, and so cannot be used with dynamic libraries. `__tls_base` is to be initialized with a new linker-synthesized function, `__wasm_init_tls`, which takes as an argument a block of memory to use as the storage for thread locals. It then initializes the block of memory and sets `__tls_base`. As `__wasm_init_tls` will handle the memory initialization, the memory does not have to be zeroed. To help allocating memory for thread-local storage, a new compiler intrinsic is introduced: `__builtin_wasm_tls_size()`. This intrinsic function returns the size of the thread-local storage for the current function. The expected usage is to run something like the following upon thread startup: __wasm_init_tls(malloc(__builtin_wasm_tls_size())); Reviewers: tlively, aheejin, kripken, sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D64537 llvm-svn: 366272	2019-07-16 22:00:45 +00:00
Sanjay Patel	d746a210e1	[x86] use more phadd for reductions This is part of what is requested by PR42023: https://bugs.llvm.org/show_bug.cgi?id=42023 There's an extension needed for FP add, but exactly how we would specify that using flags is not clear to me, so I left that as a TODO. We're still missing patterns for partial reductions when the input vector is 256-bit or 512-bit, but I think that's a failure of vector narrowing. If we can reduce the widths, then this matching should work on those tests. Differential Revision: https://reviews.llvm.org/D64760 llvm-svn: 366268	2019-07-16 21:30:41 +00:00
Matt Arsenault	f8c8284455	AMDGPU/GlobalISel: Select G_ASHR llvm-svn: 366257	2019-07-16 20:31:25 +00:00
Matt Arsenault	e5b28b98e9	AMDGPU/GlobalISel: Select G_LSHR llvm-svn: 366256	2019-07-16 20:25:43 +00:00
Jinsong Ji	65e34a3143	[PowerPC][HTM] Fix impossible reg-to-reg copy assert with ttest builtin Summary: This is exposed by our internal testing. The reduced testcase will assert with "Impossible reg-to-reg copy" We can't use COPY to do 32-bit to 64-bit conversion. Reviewers: kbarton, hfinkel, nemanjai Reviewed By: hfinkel Subscribers: hiraditya, MaskRay, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64499 llvm-svn: 366255	2019-07-16 20:24:33 +00:00
Matt Arsenault	1b69fd275d	AMDGPU/GlobalISel: Select G_SHL I think this manages to not break the DAG handling with the divergent predicates because the standalone divergent patterns end up with a higher priority than the pattern on the instruction definition. The 16-bit versions don't work yet. llvm-svn: 366254	2019-07-16 20:15:30 +00:00
Stanislav Mekhanoshin	6e0fa292c2	[AMDGPU] Change register type for v32 vectors When it is AReg_1024 this results in unnecessary copying into AGPRs of a 32 element vectors even though they are not intended for an mfma instruction. Differential Revision: https://reviews.llvm.org/D64815 llvm-svn: 366252	2019-07-16 20:06:00 +00:00
Matt Arsenault	2d10407719	AMDGPU/GlobalISel: Fix selection of private stores llvm-svn: 366249	2019-07-16 19:27:44 +00:00
Matt Arsenault	7161fb0be5	AMDGPU/GlobalISel: Select private loads llvm-svn: 366248	2019-07-16 19:22:21 +00:00
Matt Arsenault	dad1f89210	AMDGPU/GlobalISel: Select flat stores llvm-svn: 366246	2019-07-16 18:42:53 +00:00
Matt Arsenault	7eb1902cd5	AMDGPU: Add register classes to flat store patterns For some reason GlobalISelEmitter needs register classes to import these, although it works for the load patterns. llvm-svn: 366242	2019-07-16 18:26:42 +00:00
Matt Arsenault	8f8d07e93b	AMDGPU: Replace store PatFrags Convert the easy cases to formats understood for GlobalISel. llvm-svn: 366240	2019-07-16 18:21:25 +00:00
Matt Arsenault	35c96598b1	AMDGPU/GlobalISel: Select flat loads Now that the patterns use the new PatFrag address space support, the only blocker to importing most load patterns is the addressing mode complex patterns. llvm-svn: 366237	2019-07-16 18:05:29 +00:00
Jay Foad	17060f0a54	[AMDGPU] Optimize atomic max/min Summary: Extend the atomic optimizer to handle signed and unsigned max and min operations, as well as add and subtract. Reviewers: arsenm, sheredom, critson, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64328 llvm-svn: 366235	2019-07-16 17:44:54 +00:00
Matt Arsenault	c6fd5abecc	AMDGPU: Redefine load PatFrags Rewrite PatFrags using the new PatFrag address space matching in tablegen. These will now work with both SelectionDAG and GlobalISel. llvm-svn: 366234	2019-07-16 17:38:50 +00:00
Michael Liao	b3f967d411	[AMDGPU] Add the adjusted FP as a livein register. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64145 llvm-svn: 366223	2019-07-16 15:57:12 +00:00
Matt Arsenault	22c4a147a9	AMDGPU/GlobalISel: Fix test failures in release build Apparently the check for legal instructions during instruction select does not happen without an asserts build, so these would successfully select in release, and fail in debug. Make s16 and/or/xor legal. These can just be selected directly to the 32-bit operation, as is already done in SelectionDAG, so just make them legal. llvm-svn: 366210	2019-07-16 14:28:30 +00:00
Kyrylo Tkachov	eb72138340	[AArch64] Implement __jcvt intrinsic from Armv8.3-A The jcvt intrinsic defined in ACLE [1] is available when __ARM_FEATURE_JCVT is defined. This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function. The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used. I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example). make check-all didn't show any new failures. [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics Differential Revision: https://reviews.llvm.org/D64495 llvm-svn: 366197	2019-07-16 09:27:39 +00:00
Kyrylo Tkachov	a3e26d1a6c	[NFC] Test commit: add full stop at end of comment llvm-svn: 366195	2019-07-16 09:15:01 +00:00
Craig Topper	c0b2ed664b	[X86] In combineStore, don't convert v2f32 load/store pairs to f64 loads/stores. Type legalization can take care of this. This gives DAG combine a little more time with the original types. llvm-svn: 366182	2019-07-16 05:52:27 +00:00
Alex Bradbury	1ffceaa543	[RISCV] Match GNU tools canonical JALR and add aliases The canonical GNU form of JALR resembles a load/store instruction rather than placing the immediate offset as a separate argument, so match this behaviour. Also add parser-only aliases for the three-operand form, and add other shorter aliases also emitted by GNU tools. Differential Revision: https://reviews.llvm.org/D55277 Patch by James Clarke. llvm-svn: 366179	2019-07-16 04:56:43 +00:00
Rui Ueyama	49a3ad21d6	Fix parameter name comments using clang-tidy. NFC. This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch: $ git clone https://github.com/llvm/llvm-project.git $ cd llvm-project $ mkdir build $ cd build $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \ -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm $ ninja $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \ -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \ ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h} llvm-svn: 366177	2019-07-16 04:46:31 +00:00
Alex Bradbury	bb479ca311	[RISCV] Avoid overflow when determining number of nops for code align RISCVAsmBackend::shouldInsertExtraNopBytesForCodeAlign() assumed that the align specified would be greater than or equal to the minimum nop length, but that is not always the case - for example if a user specifies ".align 0" in assembly. Differential Revision: https://reviews.llvm.org/D63274 Patch by Edward Jones. llvm-svn: 366176	2019-07-16 04:40:25 +00:00
Alex Bradbury	e9ad0cf6cf	[RISCV] Fix a potential issue in shouldInsertFixupForCodeAlign() The bool result of shouldInsertExtraNopBytesForCodeAlign() is not checked but the returned nop count is unconditionally read even though it could be uninitialized. Differential Revision: https://reviews.llvm.org/D63285 Patch by Edward Jones. llvm-svn: 366175	2019-07-16 04:37:19 +00:00
Alex Bradbury	ef8577ef98	[RISCV][NFC] Split PseudoCALL pattern out from instruction Since PseudoCALL defines AsmString, it can be generated from assembly, and so code-gen patterns should be defined separately to be consistent with the style of the RISCV backend. Other pseudo-instructions exist that have code-gen patterns defined directly, but these instructions are purely for code-gen and cannot be written in assembly. Differential Revision: https://reviews.llvm.org/D64012 Patch by James Clarke. llvm-svn: 366174	2019-07-16 03:56:45 +00:00
Alex Bradbury	a3c7b27419	[RISCV][NFC] Fix HasStedExtA -> HasStdExtA typo in comment Differential Revision: https://reviews.llvm.org/D64011 Patch by James Clarke. llvm-svn: 366173	2019-07-16 03:54:08 +00:00
Alex Bradbury	4ac0b9be23	[RISCV] Make RISCVELFObjectWriter::getRelocType check IsPCRel Previously, this function didn't check the IsPCRel argument. But doing so is a useful check for errors, and also seemingly necessary for FK_Data_4 (which we produce a R_RISCV_32_PCREL relocation for if IsPCRel). Other than R_RISCV_32_PCREL, this should be NFC. Future exception handling related patches will include tests that capture this behaviour. llvm-svn: 366172	2019-07-16 03:47:34 +00:00
Matt Arsenault	1739b700b1	AMDGPU: Avoid code predicates for extload PatFrags Use the MemoryVT field. This will be necessary for tablegen to automatically handle patterns for GlobalISel. Doesn't handle the d16 lo/hi patterns. Those are a special case since it involves the custom node type. llvm-svn: 366168	2019-07-16 02:46:05 +00:00
Craig Topper	51193871da	[X86] Teach convertToThreeAddress to handle SUB with immediate We mostly avoid sub with immediate but there are a couple cases that can create them. One is the add 128, %rax -> sub -128, %rax trick in isel. The other is when a SUB immediate gets created for a compare where both the flags and the subtract value are used. If we are unable to linearize the SelectionDAG to satisfy the flag user and the sub result user from the same instruction, we will clone the sub immediate for the two uses. The one that produces flags will eventually become a compare. The other will have its flag output dead, and could then be considered for LEA creation. I added additional test cases to add.ll to show that the sub -128 trick gets converted to LEA and a case where we don't need to convert it. This showed up in the current codegen for PR42571. Differential Revision: https://reviews.llvm.org/D64574 llvm-svn: 366151	2019-07-15 23:07:56 +00:00
Heejin Ahn	1cf6922660	[WebAssembly] Add missing utility methods for exnref type Summary: This adds missing utility methods and copy instruction handling for `exnref` type and also adds tests. `tee` instruction tests are missing because `isTee` is currently only used in ExplicitLocals pass and testing that pass in mir requires serialization of stackified registers in mir files, which is a bit nontrivial because `MachineFunctionInfo` only has info of vreg numbers (which are large integers) but not the mir's register numbers. But this change is quite trivial anyway. Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64705 llvm-svn: 366149	2019-07-15 23:04:00 +00:00
Heejin Ahn	9f96a58ccc	[WebAssembly] Rename except_ref type to exnref Summary: We agreed to rename `except_ref` to `exnref` for consistency with other reference types in https://github.com/WebAssembly/exception-handling/issues/79. This also renames WebAssemblyInstrExceptRef.td to WebAssemblyInstrRef.td in order to use the file for other reference types in future. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64703 llvm-svn: 366145	2019-07-15 22:49:25 +00:00
Wouter van Oortmerssen	292e21d8bc	[WebAssembly] Assembler: support special floats: infinity / nan Summary: These are emitted as identifiers by the InstPrinter, so we should parse them as such. These could potentially clash with symbols of the same name, but that is out of our (the WebAssembly backend) control. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64770 llvm-svn: 366139	2019-07-15 22:13:39 +00:00
Austin Kerbow	423b4a18a4	[AMDGPU] Enable merging m0 initializations. Summary: Enable hoisting and merging m0 defs that are initialized with the same immediate value. Fixes bug where removed instructions are not considered to interfere with other inits, and make sure to not hoist inits before block prologues. Reviewers: rampitec, arsenm Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64766 llvm-svn: 366135	2019-07-15 22:07:05 +00:00
Simon Atanasyan	becae2b232	[mips] Print BEQZL and BNEZL pseudo instructions One of the reasons - to be compatible with GNU tools. llvm-svn: 366133	2019-07-15 21:46:38 +00:00
Matt Arsenault	b082f1055b	AMDGPU: Use standalone MUBUF load patterns We already do this for the flat and DS instructions, although it is certainly uglier and more verbose. This will allow using separate pattern definitions for extload and zextload. Currently we get away with using a single PatFrag with custom predicate code to check if the extension type is a zextload or anyextload. The generic mechanism the global isel emitter understands treats these as mutually exclusive. I was considering making the pattern emitter accept zextload or sextload extensions for anyextload patterns, but in global isel, the different extending loads have distinct opcodes, and there is currently no mechanism for an opcode matcher to try multiple (and there probably is very little need for one beyond this case). llvm-svn: 366132	2019-07-15 21:41:44 +00:00
Matt Arsenault	66ee934440	AMDGPU/GlobalISel: Allow scalar s1 and/or/xor If a 1-bit value is in a 32-bit VGPR, the scalar opcodes set SCC to whether the result is 0. If the inputs are SCC, these can be copied to a 32-bit SGPR to produce an SCC result. llvm-svn: 366125	2019-07-15 20:20:18 +00:00
Matt Arsenault	c8291c94f8	AMDGPU/GlobalISel: Select G_AND/G_OR/G_XOR llvm-svn: 366121	2019-07-15 19:50:07 +00:00
Matt Arsenault	ad19b50c00	AMDGPU/GlobalISel: Don't constrain source register of VCC copies This is a hack until I come up with a better way of dealing with the pseudo-register banks used for boolean values. If the use instruction constrains the register, the selector for the def instruction won't see that the bank was VCC. A 1-bit SReg_32 could ambiguously have been SCCRegBank or VCCRegBank in wave32. This is necessary to successfully select branches with an and/or/xor condition. llvm-svn: 366120	2019-07-15 19:48:36 +00:00
Matt Arsenault	e1b52f4180	AMDGPU/GlobalISel: Fix selecting vcc->vcc bank copies The extra test change is correct, although how it arrives there is a bug that needs work. With wave32, the test for isVCC ambiguously reports true for an SCC or VCC source. A new allocatable pseudo register class for SCC may be necessary. llvm-svn: 366119	2019-07-15 19:46:48 +00:00
Matt Arsenault	3bfdb54d88	AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC llvm-svn: 366118	2019-07-15 19:45:49 +00:00
Matt Arsenault	18b7133843	AMDGPU/GlobalISel: Fix handling of sgpr (not scc bank) s1 to VCC This was emitting a copy from a 32-bit register to a 64-bit. llvm-svn: 366117	2019-07-15 19:44:07 +00:00
Matt Arsenault	6ed315f89b	AMDGPU/GlobalISel: Custom legalize G_INSERT_VECTOR_ELT llvm-svn: 366116	2019-07-15 19:43:04 +00:00
Matt Arsenault	b0e04c018c	AMDGPU/GlobalISel: Custom legalize G_EXTRACT_VECTOR_ELT Turn the constant cases into G_EXTRACTs. llvm-svn: 366115	2019-07-15 19:40:59 +00:00
Matt Arsenault	5dfd466032	AMDGPU/GlobalISel: Fix G_ICMP for wave32 llvm-svn: 366114	2019-07-15 19:39:31 +00:00
David Green	dc56995c57	[ARM] MVE vector for 64bit types We need to make sure that we are sensibly dealing with vectors of types v2i64 and v2f64, even if most of the time we cannot generate native operations for them. This mostly adds a lot of testing, plus fixes up a couple of the issues found. And, or and xor can be legal for v2i64, and shifts combining needs a slight fixup. Differential Revision: https://reviews.llvm.org/D64316 llvm-svn: 366106	2019-07-15 18:42:54 +00:00
Matt Arsenault	90bdfb3daf	AMDGPU/GlobalISel: Widen vector extracts llvm-svn: 366103	2019-07-15 18:31:10 +00:00
Matt Arsenault	53fa759ff5	AMDGPU/GlobalISel: Handle llvm.amdgcn.if.break llvm-svn: 366102	2019-07-15 18:25:24 +00:00
Matt Arsenault	b390121efb	AMDGPU/GlobalISel: Select llvm.amdgcn.end.cf llvm-svn: 366099	2019-07-15 18:18:46 +00:00
Sanjay Patel	eb99165b97	[x86] try to keep FP casted+truncated+extracted vector element out of GPRs inttofp (trunc (extelt X, 0)) --> inttofp (extelt (bitcast X), 0) We have pseudo-vectorization of scalar int to FP casts, so this tries to make that more likely by replacing a truncate with a bitcast. I didn't see any test diffs starting from 'uitofp', so I left that as a TODO. We can't only match the shorter trunc+extract pattern because there's an opposing transform somewhere, so we infinite loop. Waiting to try this during lowering is another possibility. A motivating case is shown in PR39975 and included in the test diffs here: https://bugs.llvm.org/show_bug.cgi?id=39975 Differential Revision: https://reviews.llvm.org/D64710 llvm-svn: 366098	2019-07-15 18:17:23 +00:00
Craig Topper	81971b2b79	[X86] Return UNDEF from LowerScalarImmediateShift when the shift amount is out of range. I think we only turn out of range shifts to undef when all elements are out of range or the shift amount is a splat out of range. I'm not sure which, I didn't check. During lowering we can split a shift where some elements are out of range into multiple shifts. This can create a new shift with a splat shift amount that is out of range. This patch returns undef for this case. Fixes PR42615. Differential Revision: https://reviews.llvm.org/D64699 llvm-svn: 366096	2019-07-15 17:56:57 +00:00
Matt Arsenault	49169a963e	AMDGPU: Add 24-bit mul intrinsics Insert these during codegenprepare. This works around a DAG issue where generic combines eliminate the and asserting the high bits are zero, which then exposes an unknown read source to the mul combine. It isn't worth the hassle of trying to insert an AssertZext or something to try to deal with it. llvm-svn: 366094	2019-07-15 17:50:31 +00:00
Stanislav Mekhanoshin	7938424eb9	[AMDGPU] Copy missing predicate from pseudo to real NFC at the moment, needed for future commit. Differential Revision: https://reviews.llvm.org/D64761 llvm-svn: 366092	2019-07-15 17:49:25 +00:00
David Green	8e7eee617a	[ARM] Minor formatting in ARMInstrMVE.td. NFC llvm-svn: 366089	2019-07-15 17:29:06 +00:00
Matt Arsenault	a65913e752	AMDGPU/GlobalISel: Select easy cases for G_BUILD_VECTOR llvm-svn: 366087	2019-07-15 17:26:43 +00:00
Matt Arsenault	cc02b17082	AMDGPU/GlobalISel: RegBankSelect for G_CONCAT_VECTORS llvm-svn: 366086	2019-07-15 17:20:40 +00:00
Stanislav Mekhanoshin	fd08dcb9db	[AMDGPU] fixed scheduler crash in gfx908 For some reason scheduler can send down an SUnit without an instruction. Differential Revision: https://reviews.llvm.org/D64709 llvm-svn: 366074	2019-07-15 15:34:05 +00:00
Dmitry Preobrazhensky	5153b1723a	[AMDGPU][MC][GFX9][GFX10] Added support of GET_DOORBELL message Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64729 llvm-svn: 366071	2019-07-15 15:12:16 +00:00
Dmitry Preobrazhensky	8d879c8d95	[AMDGPU][MC] Corrected encoding of src0 for DS_GWS_* instructions See bug 42599: https://bugs.llvm.org/show_bug.cgi?id=42599 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D64716 llvm-svn: 366067	2019-07-15 14:37:57 +00:00
Simon Pilgrim	60fb5e97a0	[X86] isTargetShuffleEquivalent - assert the expected mask is correctly formed. NFCI. While we don't make any assumptions about the actual mask, assert that the expected mask only contains valid mask element values. llvm-svn: 366066	2019-07-15 14:29:14 +00:00
Simon Atanasyan	83ae0b5eb4	[mips] Remove "else-after-return". NFC llvm-svn: 366064	2019-07-15 13:12:36 +00:00
David Green	6e89887642	[ARM] MVE Vector Shifts This adds basic lowering for MVE shifts. There are many shifts in MVE, but the instructions handled here are: VSHL (imm) VSHRu (imm) VSHRs (imm) VSHL (vector) VSHL (register) MVE, like NEON before it, doesn't have shift right by a vector (or register). We instead have to negate the amount and shift in the opposite direction. This means we have to convert any SHR's into a form of SHL (that is still signed or unsigned) with a negated condition and selecting from there. MVE still does have shifting by an immediate for SHL, ASR and LSR. This adds lowering for these and for register forms, which work well for shift lefts but may require an extra fold of neg(vdup(x)) -> vdup(neg(x)) to potentially work optimally for right shifts. Differential Revision: https://reviews.llvm.org/D64212 llvm-svn: 366056	2019-07-15 11:35:39 +00:00
David Green	f059147a10	[ARM] Move Shifts after Bits. NFC This just moves the shift instruction definitions further down the ARMInstrMVE.td file, to make positioning patterns slightly more natural. llvm-svn: 366054	2019-07-15 11:22:05 +00:00
David Green	da750b1688	[ARM] Adjust how NEON shifts are lowered This adjusts the way that we lower NEON shifts to use a DAG target node, not via a neon intrinsic. This is useful for handling MVE shift operations in the same way. It also renames some of the immediate shift nodes for consistency, and moves some of the processing of immediate shifts into LowerShift allowing it to capture more cases. Differential Revision: https://reviews.llvm.org/D64426 llvm-svn: 366051	2019-07-15 10:44:50 +00:00
Bill Wendling	796ed134cc	Remove set but unused variable. llvm-svn: 366041	2019-07-15 06:35:28 +00:00
Craig Topper	635d103e0b	[X86] Separate the memory size of vzext_load/vextract_store from the element size of the result type. Use them to improve the codegen of v2f32 loads/stores with sse1 only. Summary: SSE1 only supports v4f32. But does have instructions like movlps/movhps that load/store 64-bits of memory. This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT. Enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to just check the store size not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places a simple integer type might make the most sense. I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/stores nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load. If you want I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64528 llvm-svn: 366034	2019-07-15 02:02:31 +00:00
Craig Topper	9450b0084a	[X86] Remove offset of 8 from the call to FuseInst for UNPCKLPDrr folding added in r365287. This was copy/pasted from above and I forgot to change it. We just need the default offset of 0 here. Fixes PR42616. llvm-svn: 366011	2019-07-14 04:13:33 +00:00
David Green	458a720ec1	[ARM] Add sign and zero extend patterns for MVE The vmovlb instructions can be used to sign or zero extend vector registers between types. This adds some patterns for them and relevant testing. The VBICIMM generation is also put behind a hasNEON check (as is already done for VORRIMM). Code originally by David Sherwood. Differential Revision: https://reviews.llvm.org/D64069 llvm-svn: 366008	2019-07-13 15:43:00 +00:00
David Green	07a7ec2021	[ARM] MVE VNEG instruction patterns This selects integer VNEG instructions, which can be especially useful with shifts. Differential Revision: https://reviews.llvm.org/D64204 llvm-svn: 366006	2019-07-13 15:26:51 +00:00
David Green	4ce648b5e8	[ARM] MVE integer abs Similar to floating point abs, we also have instructions for integers. Differential Revision: https://reviews.llvm.org/D64027 llvm-svn: 366005	2019-07-13 14:58:32 +00:00
David Green	701bf714db	[ARM] MVE integer min and max This simply makes the MVE integer min and max instructions legal and adds the relevant patterns for them. Differential Revision: https://reviews.llvm.org/D64026 llvm-svn: 366004	2019-07-13 14:48:54 +00:00
David Green	ac5bcbeb9f	[ARM] MVE VRINT support This adds support for the floor/ceil/trunc/... series of instructions, converting to various forms of VRINT. They use the same suffixes as their floating point counterparts. There is no VRINTR, so nearbyint is expanded. Also added a copysign test, to show it is expanded. Differential Revision: https://reviews.llvm.org/D63985 llvm-svn: 366003	2019-07-13 14:38:53 +00:00
David Green	ec8af0db6c	[ARM] MVE minnm and maxnm instructions This adds the patterns for minnm and maxnm from the fminnum and fmaxnum nodes, similar to scalar types. Original patch by Simon Tatham Differential Revision: https://reviews.llvm.org/D63870 llvm-svn: 366002	2019-07-13 14:29:02 +00:00
Sanjay Patel	2097f75eab	[x86] simplify cmov with same true/false operands llvm-svn: 365998	2019-07-13 12:04:52 +00:00
Stanislav Mekhanoshin	1dfae6fe50	[AMDGPU] use v32f32 for 3 mfma intrinsics These should really use v32f32, but were defined as v32i32 due to the lack of the v32f32 type. Differential Revision: https://reviews.llvm.org/D64667 llvm-svn: 365972	2019-07-12 22:42:01 +00:00
Wouter van Oortmerssen	d8ddf83950	[WebAssembly] refactored utilities to not depend on MachineInstr Summary: Most of these functions can work for MachineInstr and MCInst equally now. Reviewers: dschuff Subscribers: MatzeB, sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64643 llvm-svn: 365965	2019-07-12 22:08:25 +00:00
Evgeniy Stepanov	32452487ae	Factor out resolveFrameOffsetReference (NFC). Split AArch64FrameLowering::resolveFrameIndexReference in two parts * Finding frame offset for the index. * Finding base register and offset to that register. The second part will be used to implement a virtual frame pointer in armv8.5 MTE stack instrumentation lowering. Reviewers: pcc, vitalybuka, hctim, ostannard Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64171 llvm-svn: 365958	2019-07-12 21:13:55 +00:00
Matt Arsenault	51a05d72ae	AMDGPU: Drop remnants of byval support for shaders Before 2018, mesa used to use byval interchangeably with inreg, which didn't really make sense. Fix tests still using it to avoid breaking in a future commit. llvm-svn: 365953	2019-07-12 20:12:17 +00:00
David Tenty	ae79a2c390	Fix missing use of defined() in include guard Subscribers: arsenm, jvesely, nhaehnle, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64657 llvm-svn: 365952	2019-07-12 20:12:15 +00:00
Nikita Popov	411fa4c0df	[SystemZ] Fix addcarry of addcarry of const carry (PR42606) This fixes https://bugs.llvm.org/show_bug.cgi?id=42606 by extending D64213. Instead of only checking if the carry comes from a matching operation, we now check the full chain of carries. Otherwise we might custom lower the outermost addcarry, but then generically legalize an inner addcarry. Differential Revision: https://reviews.llvm.org/D64658 llvm-svn: 365949	2019-07-12 20:03:34 +00:00
Craig Topper	b828f0b90a	[X86] Use MachineInstr::findRegisterDefOperand to simplify some code in optimizeCompareInstr. NFCI llvm-svn: 365946	2019-07-12 19:26:35 +00:00
Ulrich Weigand	38ec89a670	[SystemZ] Fix build bot failure after r365932 Insert LLVM_FALLTHROUGH to avoid compiler warning. llvm-svn: 365942	2019-07-12 18:44:51 +00:00
Stanislav Mekhanoshin	495b0f5cc3	[AMDGPU] Extend MIMG opcode to 8 bits This is NFC, but required for future commit. Differential Revision: https://reviews.llvm.org/D64649 llvm-svn: 365940	2019-07-12 18:38:06 +00:00
Ulrich Weigand	0f0a8b7784	[SystemZ] Add support for new cpu architecture - arch13 This patch series adds support for the next-generation arch13 CPU architecture to the SystemZ backend. This includes: - Basic support for the new processor and its features. - Assembler/disassembler support for new instructions. - CodeGen for new instructions, including new LLVM intrinsics. - Scheduler description for the new processor. - Detection of arch13 as host processor. Note: No currently available Z system supports the arch13 architecture. Once new systems become available, the official system name will be added as supported -march name. llvm-svn: 365932	2019-07-12 18:13:16 +00:00
Craig Topper	98f931639b	[X86] Add NEG to isUseDefConvertible. We can use the C flag from NEG to detect that the input was zero. Really we could probably use the Z flag too. But C matches what we'd do for usubo 0, X. Haven't found a test case for this due to the usubo formation in CGP. But I verified if I comment out the CGP code this transformation catches some of the same cases. llvm-svn: 365929	2019-07-12 17:52:17 +00:00
Jay Foad	27ec195f39	[AMDGPU] Fix DPP combiner check for exec modification Summary: r363675 changed the exec modification helper function, now called execMayBeModifiedBeforeUse, so that if no UseMI is specified it checks all instructions in the basic block, even beyond the last use. That meant that the DPP combiner no longer worked in any basic block that ended with a control flow instruction, and in particular it didn't work on code sequences generated by the atomic optimizer. Fix it by reinstating the old behaviour but in a new helper function execMayBeModifiedBeforeAnyUse, and limiting the number of instructions scanned. Reviewers: arsenm, vpykhtin Subscribers: kzhuravl, nemanjai, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kbarton, MaskRay, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64393 llvm-svn: 365910	2019-07-12 15:59:40 +00:00
Jay Foad	7816ad918f	[AMDGPU] Restrict v_cndmask_b32 abs/neg modifiers to f32 Summary: D64497 allowed abs/neg source modifiers on v_cndmask_b32 but it doesn't make any sense to apply them to f16 operands; they would interpret the bits of the value as an f32, giving nonsensical results. This patch restricts them to f32 operands. Reviewers: arsenm, hakzsam Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64636 llvm-svn: 365904	2019-07-12 15:02:59 +00:00
Fangrui Song	b251cc0d91	Delete dead stores llvm-svn: 365903	2019-07-12 14:58:15 +00:00
Djordje Todorovic	0739ccd3b5	Revert "[DwarfDebug] Dump call site debug info" A build failure was found on the SystemZ platform. This reverts commit 9e7e73578e54cd22b3c7af4b54274d743b6607cc. llvm-svn: 365886	2019-07-12 09:45:12 +00:00
Sam Elliott	fafec5155e	[RISCV] Allow parsing dot '.' in assembly Summary: Useful for jumps, such as `j .`. I am not sure who should review this. Do not hesitate to change the reviewers if needed. Reviewers: asb, jrtc27, lenary Reviewed By: lenary Subscribers: MaskRay, lenary, hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63669 Patch by John LLVM (JohnLLVM) llvm-svn: 365881	2019-07-12 08:36:07 +00:00
Bryant Wong	7ba838d29c	Test commit. NFC. Formatting fix. llvm-svn: 365878	2019-07-12 08:25:59 +00:00
Simon Atanasyan	ee5af50eb0	[mips] Fix JmpLink to texternalsym and tglobaladdr on microMIPS R6 There is no match for the `MipsJmpLink texternalsym` and `MipsJmpLink tglobaladdr` patterns for microMIPS R6. As a result LLVM incorrectly selects the `JALRC16` compact 2-byte instruction which takes a target instruction address from a register only and assigns `R_MIPS_32` relocation for this instruction. This relocation completely overwrites `JALRC16` and nearby instructions. This patch adds missed matching patterns, selects `BALC` instruction and assigns a correct `R_MICROMIPS_PC26_S1` relocation. Differential Revision: https://reviews.llvm.org/D64552 llvm-svn: 365870	2019-07-12 04:58:45 +00:00
Michael Liao	16d3c1ac03	[AMDGPU] Skip calculating callee saved registers for entry function. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64596 llvm-svn: 365846	2019-07-11 23:53:30 +00:00
Matt Arsenault	e5fb434d92	AMDGPU: s_waitcnt field should be treated as unsigned Also make it an ImmLeaf, so it should work with global isel as well, which was part of the point of moving it in the first place. llvm-svn: 365842	2019-07-11 23:42:57 +00:00
Stanislav Mekhanoshin	28550c8680	[AMDGPU] Fixed asan error with agpr spilling Instruction was used after it was erased. llvm-svn: 365837	2019-07-11 22:30:11 +00:00
Stanislav Mekhanoshin	937ff6e701	[AMDGPU] gfx908 agpr spilling Differential Revision: https://reviews.llvm.org/D64594 llvm-svn: 365833	2019-07-11 21:54:13 +00:00
Stanislav Mekhanoshin	7d2019bb96	[AMDGPU] gfx908 hazard recognizer Differential Revision: https://reviews.llvm.org/D64593 llvm-svn: 365829	2019-07-11 21:30:34 +00:00
Stanislav Mekhanoshin	b83e283e65	[AMDGPU] gfx908 scheduling Differential Revision: https://reviews.llvm.org/D64590 llvm-svn: 365826	2019-07-11 21:25:00 +00:00
Stanislav Mekhanoshin	e67cc380a8	[AMDGPU] gfx908 mfma support Differential Revision: https://reviews.llvm.org/D64584 llvm-svn: 365824	2019-07-11 21:19:33 +00:00
Wouter van Oortmerssen	a617967d68	[WebAssembly] Assembler: support negative float constants. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64367 llvm-svn: 365802	2019-07-11 18:18:07 +00:00
Benjamin Kramer	fa1a4e4de5	[NVPTX] Use atomicrmw fadd instead of intrinsics AutoUpgrade the old intrinsics to atomicrmw fadd. llvm-svn: 365796	2019-07-11 17:11:25 +00:00
Sanjay Patel	5cc7c9ab93	[X86] Merge negated ISD::SUB nodes into X86ISD::SUB equivalent (PR40483) Follow up to D58597, where it was noted that the commuted ISD::SUB variant was having problems with lack of combines. See also D63958 where we untangled setcc/sub pairs. Differential Revision: https://reviews.llvm.org/D58875 llvm-svn: 365791	2019-07-11 15:56:33 +00:00
Matt Arsenault	b725d27350	AMDGPU/GlobalISel: Move kernel argument handling to separate function llvm-svn: 365782	2019-07-11 14:18:25 +00:00
Tim Northover	67828edbbd	OpaquePtr: switch to GlobalValue::getValueType in a few places. NFC. llvm-svn: 365770	2019-07-11 13:13:02 +00:00
Fangrui Song	f9ca13cb5f	[X86] -fno-plt: use GOT __tls_get_addr only if GOTPCRELX is enabled Summary: As of binutils 2.32, ld has a bogus TLS relaxation error when the GD/LD code sequence using R_X86_64_GOTPCREL (instead of R_X86_64_GOTPCRELX) is attempted to be relaxed to IE/LE (binutils PR24784). gold and lld are good. In gcc/config/i386/i386.md, there is a configure-time check of as/ld support and the GOT relaxation will not be used if as/ld doesn't support it: if (flag_plt || !HAVE_AS_IX86_TLS_GET_ADDR_GOT) return "call\t%P2"; return "call\t{*%p2@GOT(%1)|[DWORD PTR %p2@GOT[%1]]}"; In clang, -DENABLE_X86_RELAX_RELOCATIONS=OFF is the default. The ld.bfd bogus error can be reproduced with: thread_local int a; int main() { return a; } clang -fno-plt -fpic a.cc -fuse-ld=bfd GOTPCRELX gained relative good support in 2016, which is considered relatively new. It is even difficult to conditionally default to -DENABLE_X86_RELAX_RELOCATIONS=ON due to cross compilation reasons. So work around the ld.bfd bug by only using GOT when GOTPCRELX is enabled. Reviewers: dalias, hjl.tools, nikic, rnk Reviewed By: nikic Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64304 llvm-svn: 365752	2019-07-11 10:10:09 +00:00
Sam Parker	08b4a8da07	[ARM][LowOverheadLoops] Correct offset checking This patch addresses a couple of problems: 1) The maximum supported offset of LE is -4094. 2) The offset of WLS also needs to be checked, this uses a maximum positive offset of 4094. The use of BasicBlockUtils has been changed because the block offsets weren't being initialised, but the isBBInRange checks both positive and negative offsets. ARMISelLowering has been tweaked because the test case presented another pattern that we weren't supporting. llvm-svn: 365749	2019-07-11 09:56:15 +00:00
Simon Tatham	7916198a41	[ARM] Remove nonexistent unsigned forms of MVE VQDMLAH. The VQDMLAH.U8, VQDMLAH.U16 and VQDMLAH.U32 instructions don't actually exist: the Armv8.1-M architecture spec only lists signed forms of that instruction. The unsigned ones were added in error: they existed in an early draft of the spec, but they were removed before the public version, and we missed that particular spec change. Also affects the variant forms VQDMLASH, VQRDMLAH and VQRDMLASH. Reviewers: miyuki Subscribers: javed.absar, kristof.beyls, hiraditya, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D64502 llvm-svn: 365747	2019-07-11 09:52:15 +00:00
Petar Avramovic	962524070a	[MIPS GlobalISel] Skip copies in addUseDef and addDefUses Skip copies between virtual registers during search for UseDefs and DefUses. Since each operand has one def search for UseDefs is straightforward. But since operand can have many uses, we have to check all uses of each copy we traverse during search for DefUses. Differential Revision: https://reviews.llvm.org/D64486 llvm-svn: 365744	2019-07-11 09:28:34 +00:00

53349 Commits