Commit Graph

34561 Commits

Author SHA1 Message Date
Matt Arsenault 443556c18f AMDGPU/GlobalISel: Fix some legalization of < dword vector stores
This avoids many instances of failing to legalize a vector truncstore
of <4 x s8> to 2 bytes. We don't perfectly handle every truncstore
yet, largely because the given set of legalization actions can't
actually differentiate between changing the result type and changing
the memory type.
2020-06-26 18:07:39 -04:00
Matt Arsenault c2e403c19d GlobalISel: Don't fail translate on weak cmpxchg
The translation of cmpxchg added by
9481399c0f specifically skipped weak
cmpxchg due to not understanding the meaning. Weak cmpxchg was added
in 420a216817. As explained in the
commit message, the weak mode is implicit in how
ATOMIC_CMP_SWAP_WITH_SUCCESS is lowered. If it's expanded to a regular
ATOMIC_CMP_SWAP, it's replaced with a strong cmpxchg.

This handling seems weird to me, but this was already following the
DAG behavior. I would expect the strong IR instruction to not have the
boolean output. Failing that, I might expect the IRTranslator to emit
ATOMIC_CMP_SWAP and a constant for the boolean.
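For illustration (not part of the patch), a minimal C11 sketch of what a weak cmpxchg means at the source level: a weak compare-exchange may fail spuriously, which is why it sits in a retry loop, and it is this form that reaches the IRTranslator as `cmpxchg weak`.

```
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v if it is positive, using a weak compare-exchange in a
 * retry loop. A weak cmpxchg may fail spuriously even when *v == old. */
bool increment_if_positive(atomic_int *v) {
    int old = atomic_load_explicit(v, memory_order_relaxed);
    while (old > 0) {
        if (atomic_compare_exchange_weak_explicit(
                v, &old, old + 1,
                memory_order_acq_rel, memory_order_relaxed))
            return true; /* the swap succeeded */
        /* failure (possibly spurious): 'old' was reloaded; retry */
    }
    return false;
}
```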
2020-06-26 17:52:18 -04:00
Francesco Petrogalli ddbdff3acc [sve][acle] Recommit https://reviews.llvm.org/D82501
The original patch was reverted in
ff5ccf258e
because the C tests were accidentally left out.

This patch is an NFC recommit of https://reviews.llvm.org/D82501, together
with the SVE ACLE tests for the C svreinterpret intrinsics for brain
float types.
2020-06-26 20:45:29 +00:00
Amy Huang 8b59c26bf3 Extend or truncate __ptr32/__ptr64 pointers when dereferenced.
Summary:
A while ago I implemented the functionality to lower Microsoft __ptr32
and __ptr64 pointers, which are stored as 32-bit and 64-bit pointers
and are extended/truncated to the appropriate pointer size when
dereferenced.
This patch adds an addrspacecast to cast from the __ptr32/__ptr64
pointer to a default address space when dereferencing.
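As a hedged sketch (assuming Clang's `-fms-extensions` mode on an x86-64 target; not taken from the patch itself), the kind of code this affects looks like:

```
/* __ptr32 pointers are stored in 32 bits; per this patch, dereferencing
 * them goes through an addrspacecast to the default address space so the
 * pointer is extended to the full width before the access. */
int load_through_ptr32(int *__ptr32 p) {
    return *p; /* 32-bit pointer value extended, then loaded through */
}

void store_through_ptr64(int *__ptr64 q, int v) {
    *q = v; /* truncated instead when the target pointer is 32-bit */
}
```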

Bug: https://bugs.llvm.org/show_bug.cgi?id=42359

Reviewers: hans, arsenm, RKSimon

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81517
2020-06-26 13:33:54 -07:00
Francesco Petrogalli ff5ccf258e Revert "[sve][acle] Add reinterpret intrinsics for brain float."
This reverts commit a15722c5ce.

The commit has to be reverted because I accidentally submitted
https://reviews.llvm.org/D82501 without the C tests that were added in
an earlier version of the patch.
2020-06-26 20:19:49 +00:00
Paul Walker 3a98d5d7e7 [SVE] Code generation for fixed length vector adds.
Summary:
Teach LowerToPredicatedOp to lower fixed length vector operations.

Add AArch64ISD nodes and isel patterns for predicated integer
and floating point adds.

Together this enables SVE code generation for fixed length vector adds.
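A hedged example of the source this enables (the exact fixed-length-SVE driver flag, e.g. `-msve-vector-bits=512`, is an assumption about the in-progress option; the codegen path is what this patch adds):

```
/* A 512-bit fixed length vector add; with fixed length SVE codegen this
 * can now lower through the new predicated AArch64ISD add nodes. */
typedef int v16si __attribute__((vector_size(64))); /* 16 x i32 */

v16si add_fixed(v16si a, v16si b) {
    return a + b; /* integer add; FP adds are handled analogously */
}
```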

Reviewers: rengolin, efriedma

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82483
2020-06-26 19:54:41 +00:00
Matt Arsenault 9e03bdebc1 AMDGPU: Add llvm.amdgcn.sqrt intrinsic
I spread the GlobalISel test into the regular one, which I've been
avoiding so far.
2020-06-26 15:07:07 -04:00
Sanjay Patel 67043ed885 [AArch64] add vector test for merged condition branching; NFC 2020-06-26 14:22:11 -04:00
Amy Kwan fa0da7ec6a [PowerPC] Add support for llvm.ppc.dcbt, llvm.ppc.dcbtst, llvm.ppc.isync intrinsics
This patch adds LLVM intrinsics for the dcbt (Data Cache Block Touch),
dcbtst (Data Cache Block Touch for Store) and isync (Instruction
Synchronize) instructions.

The intrinsics for dcbt and dcbtst in this patch are named llvm.ppc.dcbt.with.hint
and llvm.ppc.dcbtst.with.hint respectively, as intrinsics named llvm.ppc.dcbt
and llvm.ppc.dcbtst already exist. However, the original variants of the
intrinsics do not accept the TH immediate field, whereas these variants do.

Differential Revision: https://reviews.llvm.org/D79633
2020-06-26 13:02:18 -05:00
Philip Reames 2e17bba324 Migrate last batch of tests to gc-live bundle format
For context, for anyone following along: we've not completed the migration of statepoint to the operand bundle form. The only remaining piece is to actually version the statepoint intrinsic to remove the old inline operand sets. That will follow when I have some time; the delay is useful here to allow downstream migrations.
2020-06-26 10:28:27 -07:00
Francesco Petrogalli a15722c5ce [sve][acle] Add reinterpret intrinsics for brain float.
Reviewers: kmclaughlin, efriedma, ctetreau, sdesmalen, david-arm

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82501
2020-06-26 15:20:58 +00:00
Matt Arsenault 431daedee4 AMDGPU/GlobalISel: Fix legacy clover kernel argument ABI
This had an extra attempt to align the pointer, which only had an
effect with a base kernel argument offset; only clover used to
rely on that.
2020-06-26 10:03:05 -04:00
Matt Arsenault 54573528ae AMDGPU/GlobalISel: Add baseline checks for legacy clover kernel ABI
I'm not sure we actually need to support this now, since I think
clover now always explicitly uses amdgcn-mesa-mesa3d rather than the
ill-defined amdgcn-- behavior.
2020-06-26 10:03:05 -04:00
Matt Arsenault b1cfa64cb1 AMDGPU/GlobalISel: Uncomment some fixed tests 2020-06-26 10:03:05 -04:00
Anatoly Trosinenko cb56fa2196 [MSP430] Update register names
When writing a unit test on replacing standard epilogue sequences with `BR __mspabi_func_epilog_<N>`, by manually asm-clobbering `rN` - `r10` for N = 4..10, everything worked well except for a seeming inability to clobber r4.

The problem was that the MSP430 code generator in LLVM used an obsolete name, FP, for that register. Things were made worse by the fact that when `llc` reads an unknown register name, it silently ignores it.

That is, I cannot use the `fp` register name from C code because Clang does not accept it (exactly like GCC). But the accepted name `r4` is not recognised by `llc` (it can be used in listings passed to `llvm-mc`, and even `fp` is replaced with `r4` by `llvm-mc`). So I can specify either `fp` or `r4` in the string literal of `asm(...)`, but neither in the clobber list.

This patch replaces `MSP430::FP` with `MSP430::R4` in the backend code (even the [MSP430 EABI](http://www.ti.com/lit/an/slaa534/slaa534.pdf) doesn't mention FP as a register name). The R0 - R3 registers, on the other hand, are left as is in the backend code (after all, they have some special meaning at the ISA level); it is just ensured that Clang renames them as expected by the downstream tools. There is probably not much sense in marking them clobbered, but they are renamed just in case, for use in potentially different contexts.
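The clobber-list experiment described above, as a minimal C sketch (the empty asm body is illustrative):

```
/* Before this patch, "r4" here was silently ignored by llc because the
 * backend only knew the register under the obsolete name FP. */
void clobber_callee_saved(void) {
    __asm__ volatile("" ::: "r4", "r5", "r6", "r7", "r8", "r9", "r10");
}
```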

Differential Revision: https://reviews.llvm.org/D82184
2020-06-26 15:32:07 +03:00
Kerry McLaughlin edcfef8fee [AArch64][SVE] Add bfloat16 support to store intrinsics
Summary:
Bfloat16 support added for the following intrinsics:
 - ST1
 - STNT1

Reviewers: sdesmalen, c-rhodes, fpetrogalli, efriedma, stuij, david-arm

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82448
2020-06-26 11:05:56 +01:00
Kerry McLaughlin 0ccfe1b267 [AArch64][SVE] Predicate bfloat16 load patterns with HasBF16
Reviewers: sdesmalen, c-rhodes, efriedma, fpetrogalli

Reviewed By: fpetrogalli

Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82464
2020-06-26 10:38:24 +01:00
Cullen Rhodes c65d4eb5d3 [AArch64][SVE] Guard perm and select bfloat16 intrinsic patterns
Summary:
Permutation and selection bfloat16 intrinsic patterns should be guarded
on the feature flag `+bf16`. Missed in D82182 and D80850.

Reviewers: sdesmalen, fpetrogalli, kmclaughlin, efriedma

Reviewed By: fpetrogalli

Differential Revision: https://reviews.llvm.org/D82492
2020-06-26 09:35:36 +00:00
David Green d428f88152 [ARM] VCVTT fpround instruction selection
Similar to the recent patch for fpext, this adds vcvtb and vcvtt with
insert into vector instruction selection patterns for fptruncs. This
helps clear up a lot of register shuffling that we would otherwise do.

Differential Revision: https://reviews.llvm.org/D81637
2020-06-26 10:24:06 +01:00
David Green 76e0e1a55d [ARM] VCVTT instruction selection
We currently extract and convert from the top lane of an f16 vector using a
VMOVX;VCVTB pair. We can simplify that to use a single VCVTT. The
pattern is mostly copied from a vector extract pattern, but produces a
VCVTTHS f32 directly.

This had to move some code around so that ARMInstrVFP had access to the
required pattern frags that were previously part of ARMInstrNEON.

Differential Revision: https://reviews.llvm.org/D81556
2020-06-26 08:58:55 +01:00
Sjoerd Meijer 243a5329d4 [SelectionDAG] Lower @llvm.get.active.lane.mask to setcc
This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp
ule, and creates vectors for its 2 arguments on which the comparison is
performed.
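A scalar C model of the lowering (illustrative only, not the patch itself):

```
#include <stdbool.h>
#include <stdint.h>

/* @llvm.get.active.lane.mask(base, trip_count) becomes a vector
 * 'icmp ule': lane i is active while base + i <= trip_count. */
static void active_lane_mask(bool *mask, unsigned num_lanes,
                             uint32_t base, uint32_t trip_count) {
    for (unsigned i = 0; i < num_lanes; ++i)
        mask[i] = (base + i) <= trip_count; /* the ule setcc */
}
```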

Differential Revision: https://reviews.llvm.org/D82292
2020-06-26 07:46:38 +01:00
Sjoerd Meijer 1319d9bb84 [ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass
Don't revert the get.active.lane.mask intrinsic here; this is moved to isel
legalization in D82292.

Differential Revision: https://reviews.llvm.org/D82105
2020-06-26 07:42:39 +01:00
Amy Kwan e0c02dc980 [PowerPC][Power10] Implement centrifuge, vector gather every nth bit, vector evaluate Builtins in LLVM/Clang
This patch implements builtins for the following prototypes:

unsigned long long __builtin_cfuged (unsigned long long, unsigned long long);
vector unsigned long long vec_cfuge (vector unsigned long long, vector unsigned long long);
unsigned long long vec_gnb (vector unsigned __int128, const unsigned int);
vector unsigned char vec_ternarylogic (vector unsigned char, vector unsigned char, vector unsigned char, const unsigned int);
vector unsigned short vec_ternarylogic (vector unsigned short, vector unsigned short, vector unsigned short, const unsigned int);
vector unsigned int vec_ternarylogic (vector unsigned int, vector unsigned int, vector unsigned int, const unsigned int);
vector unsigned long long vec_ternarylogic (vector unsigned long long, vector unsigned long long, vector unsigned long long, const unsigned int);
vector unsigned __int128 vec_ternarylogic (vector unsigned __int128, vector unsigned __int128, vector unsigned __int128, const unsigned int);
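A usage sketch for the prototypes above (assumes a Power10 target such as `-mcpu=pwr10` with `<altivec.h>`; the 0xE8 truth table is an illustrative choice encoding majority-of-three):

```
#include <altivec.h>

unsigned long long centrifuge(unsigned long long src,
                              unsigned long long mask) {
    return __builtin_cfuged(src, mask); /* centrifuge src's bits per mask */
}

vector unsigned char majority3(vector unsigned char a,
                               vector unsigned char b,
                               vector unsigned char c) {
    /* vec_ternarylogic evaluates an arbitrary 3-input boolean function
     * given as an 8-bit truth table; 0xE8 is majority(a, b, c). */
    return vec_ternarylogic(a, b, c, 0xE8);
}
```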

Differential Revision: https://reviews.llvm.org/D80970
2020-06-25 21:34:41 -05:00
Amara Emerson 97a34b5f8d [AArch64][GlobalISel] Fix extended shift addressing mode selection not handling sxth.
The complex pattern for extended shift offsets only allows sxtw as the extend,
not sxth. Our equivalent function was not rejecting SXTH, so we
were miscompiling. This was exposed by D81992.
2020-06-25 17:24:32 -07:00
Wouter van Oortmerssen b9a539c010 [WebAssembly] Adding 64-bit versions of __stack_pointer and other globals
We have 6 globals, all of which except for __table_base are 64-bit under wasm64.

Differential Revision: https://reviews.llvm.org/D82130
2020-06-25 15:52:44 -07:00
Jessica Paquette 7fb84dff69 [AArch64][GlobalISel] Port buildvector -> dup pattern from AArch64ISelLowering
Given this:

```
%x:_(<n x sK>) = G_BUILD_VECTOR %lane, ...
...
%y:_(<n x sK>) = G_SHUFFLE_VECTOR %x(<n x sK>), %foo, shufflemask(0, 0, ...)
```

We can produce:

```
%y:_(<n x sK>) = G_DUP %lane(sK)
```

Doesn't seem to be too common, but AArch64ISelLowering attempts to do this
before trying to produce a DUPLANE. Might as well port it.

Also make it so that when the splat has an undef mask, we try setting it to
0. SDAG does this, and it makes sure that when we get the build vector operand,
we actually get a source operand.

Differential Revision: https://reviews.llvm.org/D81979
2020-06-25 14:19:06 -07:00
Philip Reames 5d65529e50 Migrate a couple of codegen tests to gc-live format 2020-06-25 14:11:20 -07:00
David Green d79b57b8bb [ARM] Split FPExt loads
This extends PerformSplittingToWideningLoad to also handle FP_Ext, as
well as sign and zero extends. It uses an integer extending load
followed by a VCVTL on the bottom lanes to efficiently perform an fpext
on a smaller than legal type.

The existing code had to be rewritten a little to not just split the
node in two and let legalization handle it from there, but to actually
split into legal chunks.
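A hedged C shape that exercises this path (the types are illustrative and assume a target where `__fp16` vectors and `__builtin_convertvector` are available):

```
typedef __fp16 v8f16 __attribute__((vector_size(16)));
typedef float  v8f32 __attribute__((vector_size(32)));

/* A wider-than-legal fpext of a loaded f16 vector: now split into legal
 * chunks and lowered as an extending integer load plus VCVTL. */
v8f32 load_and_extend(const v8f16 *p) {
    v8f16 v = *p;
    return __builtin_convertvector(v, v8f32); /* fpext each lane */
}
```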

Differential Revision: https://reviews.llvm.org/D81340
2020-06-25 21:55:13 +01:00
Sanjay Patel 7231295830 [x86] add vector test for merged condition branching; NFC 2020-06-25 16:28:10 -04:00
Philip Reames b5769a777f Migrate a couple of codegen tests to gc-live format 2020-06-25 12:58:52 -07:00
David Green 8532b2ee89 [ARM] MVE VCVT lowering for f16->f32 extends
This adds code to lower f16 to f32 fp_exts using MVE VCVT
instructions, similar to a recent similar patch for fp_trunc. Again it
goes through the lowering of a BUILD_VECTOR, but is slightly simpler,
only having to deal with interleaved indices. It adds a VCVTL node to
lower to, similar to VCVTN.

Differential Revision: https://reviews.llvm.org/D81339
2020-06-25 20:54:26 +01:00
Craig Topper 6673d69226 [X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature.
The PREFETCHW instruction was originally part of the 3DNow! instruction
set. But it was given its own CPUID bit on later CPUs, just before 3DNow!
was deprecated.

We were setting the -mprfchw flag if -m3dnow was passed or the CPU
supported 3dnow unless -mno-prfchw was passed. But -march=native
on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw.
So -march=k8 will behave differently than -march=native on a K8
for example.

So remove this implicit setting from the frontend and instead
enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled.

Also enable PRFCHW flag on amdfam10/barcelona which seems to be
where this CPUID bit was introduced. That CPU also supported
3dnow.
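For reference, a write prefetch from C that can now select PREFETCHW whenever either feature is enabled (a sketch; `__builtin_prefetch` is the standard GCC/Clang builtin):

```
/* rw=1 requests a prefetch-for-write; with 3dnow OR prfchw enabled the
 * x86 backend may now emit PREFETCHW for this. */
void prefetch_for_write(void *p) {
    __builtin_prefetch(p, /*rw=*/1, /*locality=*/3);
}
```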
2020-06-25 12:46:52 -07:00
Craig Topper 01c18f9199 Revert "[X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature."
This is failing on the bots.

This reverts commit 636d31a5c3.
2020-06-25 11:43:02 -07:00
David Green 0bfb4c2506 [ARM] Add FP_ROUND handling to splitting MVE stores
This splits MVE vector stores of a fp_trunc in the same way that we do
for standard trunc's. It extends PerformSplittingToNarrowingStores to
handle fp_round, splitting the store into pieces and adding a VCVTNb to
perform the actual fp_round. The actual store is then converted to an
integer store so that it can truncate bottom lanes of the result.

Differential Revision: https://reviews.llvm.org/D81141
2020-06-25 19:37:15 +01:00
Craig Topper 636d31a5c3 [X86] Don't imply -mprfchw when -m3dnow is specified. Enable prefetchw in the backend with 3dnow feature.
The PREFETCHW instruction was originally part of the 3DNow! instruction
set. But it was given its own CPUID bit on later CPUs, just before 3DNow!
was deprecated.

We were setting the -mprfchw flag if -m3dnow was passed or the CPU
supported 3dnow unless -mno-prfchw was passed. But -march=native
on a CPU without the PRFCHW CPUID bit set will pass -mno-prfchw.
So -march=k8 will behave differently than -march=native on a K8
for example.

So remove this implicit setting from the frontend and instead
enable the backend to use PREFETCHW if 3dnow OR prfchw is enabled.

Also enable PRFCHW flag on amdfam10/barcelona which seems to be
where this CPUID bit was introduced. That CPU also supported
3dnow.
2020-06-25 11:25:35 -07:00
Francesco Petrogalli 7200fa38a9 [sve][acle] Add some C intrinsics for brain float types.
Summary:
The following intrinsics have been added:

svuint16_t svcnt[_bf16]_m(svuint16_t inactive, svbool_t pg, svbfloat16_t op)
svuint16_t svcnt[_bf16]_x(svbool_t pg, svbfloat16_t op)
svuint16_t svcnt[_bf16]_z(svbool_t pg, svbfloat16_t op)

svbfloat16_t svtbl[_bf16](svbfloat16_t data, svuint16_t indices)

svbfloat16_t svtbl2[_bf16](svbfloat16x2_t data, svuint16_t indices)

svbfloat16_t svtbx[_bf16](svbfloat16_t fallback, svbfloat16_t data, svuint16_t indices)
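A usage sketch for the intrinsics above (assumes `<arm_sve.h>` and a target with SVE and bf16 enabled; the concrete names instantiate the `[_bf16]` forms listed):

```
#include <arm_sve.h>

svuint16_t popcount_bf16(svbool_t pg, svbfloat16_t op) {
    return svcnt_bf16_x(pg, op); /* per-element bit count of bf16 data */
}

svbfloat16_t permute_bf16(svbfloat16_t data, svuint16_t indices) {
    return svtbl_bf16(data, indices); /* table lookup by element index */
}
```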

Reviewers: c-rhodes, kmclaughlin, efriedma, sdesmalen, ctetreau

Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D82429
2020-06-25 16:31:01 +00:00
David Green 3cb2190b0b [ARM] MVE VCVT lowering for f32->f16 truncs
This adds code to lower f32 to f16 fp_truncs using a pair of MVE VCVT
instructions. Due to v4f16 not being legal, fp_rounds are often split up
fairly early. So this reconstructs the vcvts from a buildvector of
fp_rounds from two vector inputs. Something like:

BUILDVECTOR(FP_ROUND(EXTRACT_ELT(X, 0),
            FP_ROUND(EXTRACT_ELT(Y, 0),
            FP_ROUND(EXTRACT_ELT(X, 1),
            FP_ROUND(EXTRACT_ELT(Y, 1), ...)

It adds a VCVTN node to handle this, which like VMOVN or VQMOVN lowers
into the top/bottom lanes of an MVE instruction.
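A hedged C shape that produces the buildvector-of-fp_rounds pattern above (illustrative types, assuming a target where `__fp16` vectors are supported):

```
typedef float  v4f32 __attribute__((vector_size(16)));
typedef __fp16 v8f16 __attribute__((vector_size(16)));

/* Interleave-truncate two f32 vectors into one f16 vector; this is the
 * shape the new VCVTN node reconstructs into a pair of MVE VCVTs. */
v8f16 truncate_interleave(v4f32 x, v4f32 y) {
    return (v8f16){(__fp16)x[0], (__fp16)y[0], (__fp16)x[1], (__fp16)y[1],
                   (__fp16)x[2], (__fp16)y[2], (__fp16)x[3], (__fp16)y[3]};
}
```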

Differential Revision: https://reviews.llvm.org/D81139
2020-06-25 15:59:36 +01:00
Sam Tebbs 187f627a50 [ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics
This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication.

Differential revision: https://reviews.llvm.org/D82377
2020-06-25 11:54:29 +01:00
Shawn Landden de9f842c55 [PowerPC] add popcount CodeGen test; NFC 2020-06-25 12:41:33 +04:00
Piotr Sobczak 0045786f14 [AMDGPU] Select s_cselect
Summary:
Add patterns to select s_cselect in the isel.

Handle more cases of implicit SCC accesses in si-fix-sgpr-copies
to allow new patterns to work.

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, asbirlea, kerbowa, llvm-commits

Tags: #llvm

Re-commit D81925 with a bugfix D82370.

Differential Revision: https://reviews.llvm.org/D81925
Differential Revision: https://reviews.llvm.org/D82370
2020-06-25 10:38:23 +02:00
Craig Topper a5041987ed [X86] Emit a reg-reg copy for fast isel of vector bitcasts.
Previously we just updated a map and moved on. But it's possible
we cached known bits information with the vreg that can be used by
another basic block. If the other basic block has a different view
of the VT, these known bits won't make sense.

By emitting a copy we ensure we have different vregs before and
after the bitcast. This prevents the known bits from being used
with the wrong type.

Differential Revision: https://reviews.llvm.org/D82517
2020-06-24 20:15:21 -07:00
Wang, Pengfei b2eb1c5793 [X86] Fix a typo error.
Summary: The typo caused the MULX32Hrm opcode to be emitted as MULX32Hrr.

Reviewed by: craig.topper

Differential Revision: https://reviews.llvm.org/D82472
2020-06-25 10:06:27 +08:00
Pengfei Wang bcb75344a5 [X86][NFC] Pre-commit test case for the following patch. 2020-06-24 18:37:01 -07:00
Scott Linder 4d81aec40c [MIR] Fix CFI_INSTRUCTION escape printing
Summary:
The printer seems to intend not to print the trailing comma, but has a
copy-paste error for the last value in the escape; the parser
enforces having no trailing comma, but somehow a test was never included
to actually confirm this.

Reviewers: thegameg, arsenm

Reviewed By: thegameg, arsenm

Subscribers: wdng, arsenm, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82478
2020-06-24 18:15:28 -04:00
Yuanfang Chen ebc88811b5 Remove Passes dependency on CodeGen
The dependency was introduced in
5134020ea6. The only functional change
from this removal would be the new PM interface for the two codegen
passes. This is not necessary since we don't have a codegen pipeline using the
new PM yet. This removal is to break the potential circular dependency between
Passes and CodeGen once codegen begins to gain new PM support.
2020-06-24 14:52:46 -07:00
Amy Kwan d82f26cc4b [PowerPC][Power10] Implement Count Leading/Trailing Zeroes Builtins under bit Mask in LLVM/Clang
This patch implements builtins for the following prototypes:

unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long)
unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long)
vector unsigned long long vec_cntlzm (vector unsigned long long, vector unsigned long long)
vector unsigned long long vec_cnttzm (vector unsigned long long, vector unsigned long long)
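A usage sketch for the prototypes above (a Power10 target is assumed):

```
/* Count leading zeroes of src, considering only the bit positions
 * selected by mask. */
unsigned long long clz_under_mask(unsigned long long src,
                                  unsigned long long mask) {
    return __builtin_cntlzdm(src, mask);
}
```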

Differential Revision: https://reviews.llvm.org/D80941
2020-06-24 16:03:45 -05:00
Sanjay Patel 26fd3ffa78 [x86][AArch64] add tests for fmul-fma combine; NFC
As discussed in D80801, there's a possible overstep in
what is allowed by the 'contract' fast-math-flag.
2020-06-24 15:56:32 -04:00
tatz.j@northeastern.edu af5e61bf4f [NVPTX] Fix for NVPTX module asm regression
Currently, module asm ends up emitted twice, and at the wrong place in the PTX.
This patch moves module asm generation into emitStartOfAsmFile(), which puts it
at the correct location in the generated PTX.

Differential Revision: https://reviews.llvm.org/D82280
2020-06-24 11:17:09 -07:00
Craig Topper 1a4f888980 [X86] Rename O3-pipeline.ll to opt-pipeline.ll and add O1/O2 command lines
Eric Christopher asked me about possibly disabling some passes at
-O1/-Og. Figured a good first step was to test all the pipelines.
They all appear to be the same for now. Hoping we can use FileCheck
prefixes for differences to avoid repeating the contents 3 times.
2020-06-24 11:09:50 -07:00
dstuttar e8775c8d81 [AMDGPU] Make sure to fix implicit operands on insertBranch
Summary:
Without fixImplicitOperands we may end up creating default implicit operands
that are the wrong wave size.

Includes a simple test that provokes insertBranch in the correct way to expose
the issue being fixed.

Change-Id: I92bdcdee9fcb7b4d91529b84e76a48ac8218483e

Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, kerbowa, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82459
2020-06-24 16:50:48 +01:00