llvm-project

Author	SHA1	Message	Date
Kerry McLaughlin	da4ef9b4c8	[SVE][Inline-Asm] Support for SVE asm operands Summary: Adds the following inline asm constraints for SVE: - w: SVE vector register with full range, Z0 to Z31 - x: Restricted to registers Z0 to Z15 inclusive. - y: Restricted to registers Z0 to Z7 inclusive. This change also adds the "z" modifier to interpret a register as an SVE register. Not all of the bitconvert patterns added by this patch are used, but they have been included here for completeness. Reviewers: t.p.northover, sdesmalen, rovka, momchil.velikov, rengolin, cameron.mcinally, greened Reviewed By: sdesmalen Subscribers: javed.absar, tschuett, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66302 llvm-svn: 370673	2019-09-02 16:12:31 +00:00
Simon Pilgrim	fb5661a884	[X86] getPMOVMSKB - add MVT::v64i8 handling and remove from combineBitcastvxi1. NFCI. llvm-svn: 370670	2019-09-02 15:10:35 +00:00
Jay Foad	6e18266aa4	Partially revert D61491 "AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0" Summary: D61491 caused us to use relocs when they're not strictly necessary, to refer to symbols in the text section. This is a pessimization and it's a problem for some loaders that don't support relocs yet. Reviewers: nhaehnle, arsenm, tpr Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D65813 llvm-svn: 370667	2019-09-02 14:40:57 +00:00
Dmitry Preobrazhensky	4aa90ea58e	[AMDGPU][MC][GFX10] Corrected constant bus checks to exclude null See AMD SWDEV-157286 Reviewers: atamazov, arsenm Differential Revision: https://reviews.llvm.org/D65229 llvm-svn: 370665	2019-09-02 14:19:52 +00:00
Dmitry Preobrazhensky	9c68eddbbe	[AMDGPU][MC][GFX10] Enabled null with 64-bit operands See Bug 42745: https://bugs.llvm.org/show_bug.cgi?id=42745 Reviewers: atamazov, arsenm https://reviews.llvm.org/D65231 llvm-svn: 370660	2019-09-02 13:42:25 +00:00
Dmitry Preobrazhensky	fe2ee4c46a	[AMDGPU][MC][GFX10] Corrected constant bus limit for 64-bit shift instructions See bug 42744: https://bugs.llvm.org/show_bug.cgi?id=42744 Reviewers: atamazov, arsenm Differential Revision: https://reviews.llvm.org/D65228 llvm-svn: 370652	2019-09-02 12:50:05 +00:00
Andrea Di Biagio	528f68144b	[X86][BtVer2] Fix latency and throughput of conditional SIMD store instructions. On BtVer2 conditional SIMD stores are heavily microcoded. The latency is directly proportional to the number of packed elements extracted from the input vector. Also, according to micro-benchmarks, most of the computation seems to be done in the integer unit. Only a minority of the uOPs is executed by the FPU. The observed behaviour on the FPU looks similar to this: - The input MASK value is moved to the Integer Unit -- [ a VMOVMSK-like uOP-executed on JFPU0]. - In parallel, each element of the input XMM/YMM is extracted and then sent to the IntegerUnit through JFPU1. As expected, a (conditional) store is executed for every extracted element. Interestingly, a (speculative) load is executed for every extracted element too. It is as-if a "LOAD - BIT_EXTRACT- CMOV" sequence of uOPs is repeated by the integer unit for every conditionally stored element. VMASKMOVDQU is a special case: the number of speculative loads is always 2 (presumably, one load per quadword). That means, extra shifts and masking is performed on (one of) the loaded quadwords before each conditional store (that also explains the big number of non-FP uOPs retired). This patch replaces the existing writes for conditional SIMD stores (i.e. WriteFMaskedStore, and WriteFMaskedStoreY) with the following new writes: WriteFMaskedStore32 [ XMM Packed Single ] WriteFMaskedStore32Y [ YMM Packed Single ] WriteFMaskedStore64 [ XMM Packed Double ] WriteFMaskedStore64Y [ YMM Packed Double ] Added a wrapper class named X86SchedWriteMaskMove in X86Schedule.td to describe both RM and MR variants for conditional SIMD moves in a single tablegen definition. Instances of that class are then passed in input to multiclass avx_movmask_rm when constructing MASKMOVPS/PD definitions. Since this patch introduces new writes, I had to update all the X86 scheduling models. Differential Revision: https://reviews.llvm.org/D66801 llvm-svn: 370649	2019-09-02 12:32:28 +00:00
Simon Pilgrim	05a3a92751	[X86] combineHorizontalPredicateResult - pull out repeated getTargetLoweringInfo() calls. NFCI. llvm-svn: 370637	2019-09-02 10:42:48 +00:00
Craig Topper	3ab210862a	[X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is a broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point. Differential Revision: https://reviews.llvm.org/D67017 llvm-svn: 370620	2019-09-01 22:14:36 +00:00
Simon Pilgrim	07de5292e5	[X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI. Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed. Cleanup the in-lane shuffle mask generation to make it more obvious what's going on. Some prep work noticed while investigating the poor shuffle code mentioned in D66004. llvm-svn: 370613	2019-09-01 16:04:28 +00:00
Simon Pilgrim	27cc2efaf2	Fix shadow variable warning. NFCI. llvm-svn: 370610	2019-09-01 13:10:18 +00:00
David Green	8469a39af3	[ARM] Remove MVE masked loads/stores These were never enabled correctly and are causing other problems. Taking them out for the moment, whilst we work on the issues. This reverts r370329. llvm-svn: 370607	2019-09-01 10:11:40 +00:00
Matt Arsenault	ede9a5293d	AMDGPU: Remove unused custom node definition llvm-svn: 370603	2019-09-01 02:00:08 +00:00
Craig Topper	1594605416	[X86] Replace some COPY_TO_REGCLASS from GR32/GR64 to VR128 in isel patterns with VMOVDI2PDIrr/VMOV64toPQIrr. This is what the copies will eventually be turned into. We don't use COPY_TO_REGCLASS for scalar_to_vector patterns. So we should use the real instruction here too. llvm-svn: 370601	2019-08-31 23:52:25 +00:00
Craig Topper	1329cc6e01	[X86] Compress the flag bits in the folding tables to make room for more bits in an upcoming patch. llvm-svn: 370600	2019-08-31 23:52:21 +00:00
David Bolvansky	8caa16ec13	[NFC] Fixed -Wdocumentation warning /srv/llvm-buildbot-srcatch/llvm-build-dir/clang-x86_64-debian-fast/llvm.src/lib/Target/AMDGPU/AMDGPUGenRegisterBankInfo.def:98:1: warning: not a Doxygen trailing comment [-Wdocumentation] 1 warning generated. llvm-svn: 370596	2019-08-31 18:44:57 +00:00
Simon Pilgrim	f8d1d00190	[X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170) EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars. llvm-svn: 370592	2019-08-31 16:21:31 +00:00
Simon Pilgrim	cffbec63d6	Fix shadow variable warning by making CondCodes names more explicit. NFCI. llvm-svn: 370589	2019-08-31 15:19:59 +00:00
Simon Pilgrim	ad020c0af1	Fix shadow variable warning. NFCI. llvm-svn: 370585	2019-08-31 15:01:03 +00:00
Simon Pilgrim	2d89007f61	[X86ISelLowering] combineCMov - cleanup CMOV->LEA codegen. NFCI. Only compute the diff once and we don't need the truncation code (assert the bitwidth is correct just to be safe). llvm-svn: 370583	2019-08-31 14:18:26 +00:00
Simon Pilgrim	7238353da2	[X86ISelLowering] LowerSELECT - remove duplicate value type. NFCI. VT of SELECT result and selection ops will be the same. llvm-svn: 370581	2019-08-31 13:14:52 +00:00
Thomas Lively	d0d9317061	[WebAssembly] Add SIMD QFMA/QFMS Summary: Adds clang builtins and LLVM intrinsics for these experimental instructions. They are not implemented in engines yet, but that is ok because the user must opt into using them by calling the builtins. Reviewers: aheejin, dschuff Reviewed By: aheejin Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D67020 llvm-svn: 370556	2019-08-31 00:12:29 +00:00
Reid Kleckner	185ddc08ee	Fix SEH_NoReturn machine verifier error llvm-svn: 370543	2019-08-30 22:40:51 +00:00
Reid Kleckner	657a06c619	[MC] Avoid crashes from improperly nested or wrong target .seh_handlerdata directives llvm-svn: 370540	2019-08-30 22:25:55 +00:00
Reid Kleckner	a33474d595	[X86] Print register names in .seh_* directives Also improve assembler parser register validation for .seh_ directives. This requires moving X86-specific seh directive handling into the x86 backend, which addresses some assembler FIXMEs. Differential Revision: https://reviews.llvm.org/D66625 llvm-svn: 370533	2019-08-30 21:23:05 +00:00
Reid Kleckner	0bb1630685	[Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn Users have complained llvm.trap produces two ud2 instructions on Win64, one for the trap, and one for unreachable. This change fixes that. TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play: - the unwinder - C++ EH, _CxxFrameHandler3 & co The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function. The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet. I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet. I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added. MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet. I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change. Differential Revision: https://reviews.llvm.org/D66980 llvm-svn: 370525	2019-08-30 20:46:39 +00:00
Craig Topper	18e8d02e8c	[X86] Pass v32i16/v64i8 in zmm registers on KNL target. gcc and icc pass these types in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations. Fixes PR42957 Differential Revision: https://reviews.llvm.org/D66708 llvm-svn: 370495	2019-08-30 17:35:08 +00:00
Evgeniy Stepanov	04647f5e22	MemTag: unchecked load/store optimization. Summary: MTE allows memory access to bypass tag check iff the address argument is [SP, #imm]. This change takes advantage of this to demote uses of tagged addresses to regular FrameIndex operands, reducing register pressure in large functions. MO_TAGGED target flag is used to signal that the FrameIndex operand refers to memory that might be tagged, and needs to be handled with care. Such operand must be lowered to [SP, #imm] directly, without a scratch register. The transformation pass attempts to predict when the offset will be out of range and disable the optimization. AArch64RegisterInfo::eliminateFrameIndex has an escape hatch in case this prediction has been wrong, but it is quite inefficient and should be avoided. Reviewers: pcc, vitalybuka, ostannard Subscribers: mgorny, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66457 llvm-svn: 370490	2019-08-30 17:23:02 +00:00
Craig Topper	66f03ba17d	[X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site. I'm looking at unfolding broadcast loads on AVX512 which will require refactoring this code to select broadcast opcodes instead of regular load/stores in some cases. Merging them to avoid further complicating their interfaces. llvm-svn: 370484	2019-08-30 16:05:57 +00:00
Petar Avramovic	e96892a8aa	[MIPS GlobalISel] Lower uitofp Add custom lowering for G_UITOFP for MIPS32. Differential Revision: https://reviews.llvm.org/D66930 llvm-svn: 370432	2019-08-30 05:51:12 +00:00
Petar Avramovic	6412b56513	[MIPS GlobalISel] Lower fptoui Add lower for G_FPTOUI. Algorithm is similar to the SDAG version in TargetLowering::expandFP_TO_UINT. Lower G_FPTOUI for MIPS32. Differential Revision: https://reviews.llvm.org/D66929 llvm-svn: 370431	2019-08-30 05:44:02 +00:00
Jinsong Ji	a070f12e57	[PowerPC][NFC] Use inline Subtarget->isPPC64() To be consistent with all the other instances. llvm-svn: 370428	2019-08-30 03:16:41 +00:00
Fangrui Song	7704b54389	[PPC32] Emit R_PPC_GOT_TPREL16 instead of R_PPC_GOT_TPREL16_LO Unlike ppc64, which has ADDISgotTprelHA+LDgotTprelL pairs, ppc32 just uses LDgotTprelL32, so it does not make much sense to use _LO without a paired _HA. Emit R_PPC_GOT_TPREL16 instead of R_PPC_GOT_TPREL16_LO to match GCC, and get better linker relocation checks. Note, R_PPC_GOT_TPREL16_{HA,LO} don't have good linker support: (a) lld does not support R_PPC_GOT_TPREL16_{HA,LO}. (b) Top of tree ld.bfd does not support R_PPC_GOT_REL16_HA Initial-Exec -> Local-Exec relaxation: // a.o addis 3, 3, tsd_tls@got@tprel@ha lwz 3, tsd_tls@got@tprel@l(3) add 3, 3, tsd_tls@tls // b.o .section .tdata,"awT"; .globl tsd_tls; tsd_tls: // ld/ld-new a.o b.o internal error, aborting at ../../bfd/elf32-ppc.c:7952 in ppc_elf_relocate_section Reviewed By: adalava Differential Revision: https://reviews.llvm.org/D66925 llvm-svn: 370426	2019-08-30 02:20:49 +00:00
Craig Topper	160ed4cab4	[X86] Explicitly list all the always trivially rematerializable instructions. Add a default with an llvm_unreachable for anything we don't expect. This seems safer than just blindly returning true for anything missing from the switch. llvm-svn: 370424	2019-08-30 00:54:36 +00:00
Dan Gohman	da84b688f9	[WebAssembly] Make __attribute__((used)) not imply export. Add a WASM_SYMBOL_NO_STRIP flag, so that __attribute__((used)) doesn't need to imply exporting. When targeting Emscripten, have WASM_SYMBOL_NO_STRIP imply exporting. Differential Revision: https://reviews.llvm.org/D62542 llvm-svn: 370415	2019-08-29 22:40:00 +00:00
Jinsong Ji	1ed7d2119e	[PowerPC] Support extended mnemonics mffprwz etc. Summary: Reported in https://github.com/opencv/opencv/issues/15413. We have several extended mnemonics for Move To/From Vector-Scalar Register Instructions, e.g. mffprd, mtfprd etc. We only support one of them; this patch adds the others. Reviewers: nemanjai, steven.zhang, hfinkel, #powerpc Reviewed By: hfinkel Subscribers: wuzish, qcolombet, hiraditya, kbarton, MaskRay, shchenz, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66963 llvm-svn: 370411	2019-08-29 21:53:59 +00:00
Jessica Paquette	04e657be28	[AArch64][GlobalISel] Select arithmetic extended register patterns This teaches GISel to select patterns which fold an extend plus optional shift into the addressing mode. In particular, adds and subs. Factor out the arith extended register ComplexPatterns in AArch64InstrFormats.td and create GISel equivalents. Add some equivalent functions to the ones in AArch64ISelDAGToDAG: - `selectArithExtendedRegister` - `narrowExtendRegIfNeeded` - `getExtendTypeForInst` `getExtendTypeForInst` includes the checks for loads and stores. This will be used for WRO addressing modes in loads + stores. Teach selectCopy to properly handle subregister copies on the same bank in order to support `narrowExtendRegIfNeeded`. The extended register must be a GPR32, so we need to support same-bank subregister copies. Fix a bug in getSubRegForClass which would cause registers on things like GPR32common to end up getting ssub. Just change the check to look for FPR32 rather than GPR32. For tests: - Add select-arith-extended-reg.mir - Update addsub_ext.ll to include GlobalISel checks Differential Revision: https://reviews.llvm.org/D66835 llvm-svn: 370410	2019-08-29 21:53:58 +00:00
Reid Kleckner	5b79e603d3	[X86] Don't emit unreachable stack adjustments Summary: This is a minor improvement on our past attempts to do this. Fixes PR43155. Reviewers: hans Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66905 llvm-svn: 370409	2019-08-29 21:24:41 +00:00
Reid Kleckner	81e458d001	Allow '@' to appear in x86 mingw symbols Summary: There is no reason to differ in assembler behavior here between -msvc and -gnu targets. Without this setting, the text after the '@' is interpreted as a symbol variable, like foo@IMGREL. Reviewers: mstorsjo Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D66974 llvm-svn: 370408	2019-08-29 21:15:02 +00:00
Reid Kleckner	fe47ed67fc	Fix the build for MSVC builds using M_PI llvm-svn: 370405	2019-08-29 20:32:53 +00:00
Simon Pilgrim	3d705a1fa4	[X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159) ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply so a multiply by zero of the lower 32-bits should result in a zero 64-bit element. llvm-svn: 370404	2019-08-29 20:22:08 +00:00
Matt Arsenault	cbd1782c79	AMDGPU/GlobalISel: Legalize sin/cos llvm-svn: 370402	2019-08-29 20:06:48 +00:00
Craig Topper	5a43fdd313	[X86] Remove what little support we had for MPX -Deprecate -mmpx and -mno-mpx command line options -Remove CPUID detection of mpx for -march=native -Remove MPX from all CPUs -Remove MPX preprocessor define I've left the "mpx" string in the backend so we don't fail on old IR, but it's not connected to anything. gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX Differential Revision: https://reviews.llvm.org/D66669 llvm-svn: 370393	2019-08-29 18:09:02 +00:00
Matt Arsenault	caff0a88dd	GlobalISel: Add known bits to InstructionSelector AMDGPU uses this for some addressing mode selection patterns. The analysis run itself doesn't do anything so it seems easier to just always require this than adding a way to opt in. llvm-svn: 370388	2019-08-29 17:24:32 +00:00
Jessica Paquette	ba04f5fac1	[GlobalISel][AArch64] Select llvm.aarch64.stxr* intrinsics. Add a GISelPredicateCode to the stxr_* PatFrags in AArch64InstrAtomics.td. This allows us to select these intrinsics. Differential Revision: https://reviews.llvm.org/D65779 llvm-svn: 370382	2019-08-29 16:55:55 +00:00
Jessica Paquette	b8b23a1648	[GlobalISel][AArch64] Use a GISelPredicateCode to select llvm.aarch64.stlxr.* Remove manual selection code for this intrinsic and use a GISelPredicateCode instead. This allows us to fully select this intrinsic without any tricky custom C++ matching. Differential Revision: https://reviews.llvm.org/D65780 llvm-svn: 370380	2019-08-29 16:45:19 +00:00
Jessica Paquette	87720ac8c8	[AArch64][GlobalISel] Select @llvm.aarch64.ldxr.* intrinsics Same thing as D66897, but for ldxr.* instead. Add a GISelPredicateCode to the ldxr_* definitions, which allows us to import them. Add select-ldxr-intrin.mir, and update arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D66898 llvm-svn: 370378	2019-08-29 16:33:01 +00:00
Jessica Paquette	c327daeea5	[AArch64][GlobalISel] Select @llvm.aarch64.ldaxr.* intrinsics Add a GISelPredicateCode to ldaxr_. This allows us to import the patterns for @llvm.aarch64.ldaxr., and thus select them. Add `isLoadStoreOfNumBytes` for the GISelPredicateCode, since each of these intrinsics involves the same check. Add select-ldaxr-intrin.mir, and update arm64-ldxr-stxr.ll. Differential Revision: https://reviews.llvm.org/D66897 llvm-svn: 370377	2019-08-29 16:16:38 +00:00
Simon Atanasyan	b23857c149	[mips] Inline emitStoreWithSymOffset and emitLoadWithSymOffset methods. NFC Both methods `MipsTargetStreamer::emitStoreWithSymOffset` and `MipsTargetStreamer::emitLoadWithSymOffset` are almost the same and differ in argument names only. These methods are used in a single place so it's better to inline their code and remove the original methods. llvm-svn: 370354	2019-08-29 13:19:50 +00:00
Simon Atanasyan	3464b91ef7	[mips] Fix expanding `lw/sw $reg1, symbol($reg2)` instruction When a "base" in the `lw/sw $reg1, symbol($reg2)` instruction is a register and generated code is position independent, backend does not add the "base" value to the symbol address. ``` lw $reg1, %got(symbol)($gp) lw/sw $reg1, 0($reg1) ``` This patch fixes the bug and adds the missed `addu` instruction by passing `BaseReg` into the `loadAndAddSymbolAddress` routine and handles the case when the `BaseReg` is the zero register to escape redundant `move reg, reg` instruction: ``` lw $reg1, %got(symbol)($gp) addu $reg1, $reg1, $reg2 lw/sw $reg1, 0($reg1) ``` Differential Revision: https://reviews.llvm.org/D66894 llvm-svn: 370353	2019-08-29 13:19:38 +00:00

53659 Commits