llvm-project

Commit Graph

Author	SHA1	Message	Date
Matt Arsenault	72bde9aa7e	AMDGPU: Scavenge register instead of findUnusedReg llvm-svn: 356149	2019-03-14 14:19:01 +00:00
Matt Arsenault	3a31b3f6e8	AMDGPU: Don't add unnecessary convergent attributes These are redundant with the intrinsic declaration. llvm-svn: 356143	2019-03-14 13:46:09 +00:00
Sam Parker	4c4ff13d3c	[ARM][ParallelDSP] Enable multiple uses of loads When choosing whether a pair of loads can be combined into a single wide load, we check that the load only has a sext user and that sext also only has one user. But this can prevent the transformation in the cases when parallel macs use the same loaded data multiple times. To enable this, we need to fix up any other uses after creating the wide load: generating a trunc and a shift + trunc pair to recreate the narrow values. We also need to keep a record of which loads have already been widened. Differential Revision: https://reviews.llvm.org/D59215 llvm-svn: 356132	2019-03-14 11:14:13 +00:00
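The fix-up this message describes (a trunc for one narrow value, a shift + trunc pair for the other) can be illustrated outside LLVM. The following is a minimal sketch, assuming two adjacent 16-bit values and a little-endian layout; the function names are hypothetical and this is not the ARMParallelDSP code itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Load two adjacent 16-bit values with a single 32-bit load.
// (On a little-endian target the first element ends up in the low half.)
uint32_t wideLoad(const int16_t *p) {
  uint32_t wide;
  std::memcpy(&wide, p, sizeof(wide));
  return wide;
}

// Recreate the first narrow value: a plain truncation of the wide value.
int16_t narrowLow(uint32_t wide) {
  return static_cast<int16_t>(static_cast<uint16_t>(wide));
}

// Recreate the second narrow value: shift right by 16, then truncate.
int16_t narrowHigh(uint32_t wide) {
  return static_cast<int16_t>(static_cast<uint16_t>(wide >> 16));
}
```

Any remaining user of the original narrow loads can then be pointed at `narrowLow` or `narrowHigh` of the single wide value instead of issuing its own load.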
Sam Parker	3b2ba20afd	[ARM] Run ARMParallelDSP in the IRPasses phase Run EarlyCSE before ParallelDSP and do this in the backend IR opt phase. Differential Revision: https://reviews.llvm.org/D59257 llvm-svn: 356130	2019-03-14 10:57:40 +00:00
Alex Bradbury	fec503acb6	[RISCV] Fix rL356123 The wrong version of the patch was committed. This fixes typos that broke the build. llvm-svn: 356124	2019-03-14 08:31:35 +00:00
Alex Bradbury	8dbc6398e1	[RISCV][NFC] Rename callee saved regs 'CSR' to CSR_ILP32_LP64 and minor RISCVRegisterInfo refactoring The CSR renaming further prepares the way for an upcoming patch adding support for more RISC-V ABIs. Modify RISCVRegisterInfo::getCalleeSavedRegs and RISCVRegisterInfo::getReservedRegs to do MF->getSubtarget<RISCVSubtarget>() once rather than multiple times. llvm-svn: 356123	2019-03-14 08:28:48 +00:00
Craig Topper	54a0b53308	[X86] Add patterns for rotr by immediate to fix PR41057. Prior to the introduction of funnel shift intrinsics we could count on rotate by immediates preferring to use rotl since that's what MatchRotate would check first. The or+shift pattern doesn't have a direction so one must be chosen arbitrarily. With funnel shift, there is a direction: fshr will try to use rotr first, while fshl will try to use rotl first. This patch adds the isel patterns for rotr to complement the rotl patterns. I've put the rotr by 1 patterns in the instruction patterns and moved the rotl by bitwidth-1 patterns to separate Pat patterns. Fixes PR41057. llvm-svn: 356121	2019-03-14 07:07:26 +00:00
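The relationship between "rotr by 1" and "rotl by bitwidth-1" mentioned in this message follows from the general identity rotr(x, n) == rotl(x, bitwidth - n). A minimal sketch for 32-bit values, with hypothetical helper names (this is plain C++, not the isel patterns themselves):

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by n bits; n is reduced modulo 32, and n == 0 is handled
// separately to avoid an undefined shift by 32.
uint32_t rotl32(uint32_t x, unsigned n) {
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

// Rotate right by n bits, with the same handling of n.
uint32_t rotr32(uint32_t x, unsigned n) {
  n &= 31;
  return n ? (x >> n) | (x << (32 - n)) : x;
}

// Check the identity rotr(x, n) == rotl(x, 32 - n) for every n in 1..31.
bool rotateIdentityHolds(uint32_t x) {
  for (unsigned n = 1; n < 32; ++n)
    if (rotr32(x, n) != rotl32(x, 32 - n))
      return false;
  return true;
}
```

Because both forms compute the same value, a backend needs patterns for whichever direction the incoming node happens to use.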
Jessica Paquette	85ace6269f	[AArch64][GlobalISel] Gardening: Simplify subregister copy in selectBuildVector NFC. Some more preliminary factoring for G_INSERT_VECTOR_ELT. Also better code-reuse, etc., etc. Differential Revision: https://reviews.llvm.org/D59323 llvm-svn: 356107	2019-03-13 23:29:54 +00:00
Jessica Paquette	16d67a3e32	[GlobalISel][AArch64] Gardening: Factor out vector inserts Factor out the vector insert code in `selectBuildVector`. Replace part of it with `emitScalarToVector`, since it was pretty much equivalent. This will make implementing G_INSERT_VECTOR_ELT easier. Differential Revision: https://reviews.llvm.org/D59322 llvm-svn: 356106	2019-03-13 23:22:23 +00:00
Jessica Paquette	bb1aced80d	[GlobalISel][AArch64] Gardening: Factor out code to find lane indices Some more refactoring for G_INSERT_VECTOR_ELT. Factor out the code used to find a lane index from `selectExtractElt`. Put it into a more general-purpose `getConstantValueForReg` function. This will be shared with the code for G_INSERT_VECTOR_ELT. Differential Revision: https://reviews.llvm.org/D59324 llvm-svn: 356101	2019-03-13 21:19:29 +00:00
Stanislav Mekhanoshin	da644c025d	[AMDGPU] Silence gcc 7 warnings Differential Revision: https://reviews.llvm.org/D59330 llvm-svn: 356100	2019-03-13 21:15:52 +00:00
Tim Renouf	ed0b9af997	[AMDGPU] Switched HSA metadata to use MsgPackDocument Summary: MsgPackDocument is the lighter-weight replacement for MsgPackTypes. This commit switches AMDGPU HSA metadata processing to use MsgPackDocument instead of MsgPackTypes. Differential Revision: https://reviews.llvm.org/D57024 Change-Id: I0751668013abe8c87db01db1170831a76079b3a6 llvm-svn: 356081	2019-03-13 18:55:50 +00:00
Craig Topper	84abec2855	[X86] Check for 64-bit mode in X86Subtarget::hasCmpxchg16b() The feature flag alone can't be trusted since it can be passed via -mattr. Need to ensure 64-bit mode as well. We had a 64 bit mode check on the instruction to make the assembler work correctly. But we weren't guarding any of our lowering code or the hooks for the AtomicExpandPass. I've added 32-bit command lines to atomic128.ll with and without cx16. The tests there would all previously fail if -mattr=cx16 was passed to them. I had to move one test case for f128 to a new file as it seems to have a different 32-bit mode or possibly sse issue. Differential Revision: https://reviews.llvm.org/D59308 llvm-svn: 356078	2019-03-13 18:48:50 +00:00
Simon Pilgrim	bef4fe056d	[X86][AVX] Add X86ISD::VTRUNC handling to SimplifyDemandedVectorEltsForTargetNode llvm-svn: 356067	2019-03-13 17:00:18 +00:00
Simon Pilgrim	d9aa879b67	[X86][AVX] Add combineConcatVectors support to improve subvector handling Attempt to combine CONCAT_VECTORS nodes, which we only really have pre-legalization. This encourages a lot of X86ISD::SUBV_BROADCAST generation, so I've added SimplifyDemandedVectorEltsForTargetNode handling for this at the same time. The X86ISD::VTRUNC regression in shuffle-vs-trunc-256-widen.ll will be handled in a future commit. llvm-svn: 356064	2019-03-13 16:37:30 +00:00
Alex Bradbury	8a70468a27	[RISCV] Only mark fp as reserved if the function has a dedicated frame pointer This follows similar logic in the ARM and Mips backends, and allows the free use of s0 in functions without a dedicated frame pointer. The changes in callee-saved-gprs.ll most clearly show the effect of this patch. llvm-svn: 356063	2019-03-13 16:33:45 +00:00
Simon Atanasyan	3f4870b692	[mips] Join some adjacent `let DecoderNamespace` blocks. NFC llvm-svn: 356059	2019-03-13 16:00:42 +00:00
Sanjay Patel	0a251e4076	[x86] limit extractelement of setcc to pre-legalization A fuzzer found the crasher: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13700 The bug was introduced recently here: rL355741 This is the quick fix. If we need to do this transform later, then we'd have to extend/truncate the vector setcc element type to the scalar setcc type (i8). llvm-svn: 356053	2019-03-13 14:49:52 +00:00
Simon Atanasyan	9bfd140ddb	[mips] Fix encoding of the `mov.d` command for microMIPS R6 Before this change LLVM emits non-microMIPS variant of the `mov.d` command for microMIPS code. Differential Revision: http://reviews.llvm.org/D59045 llvm-svn: 356052	2019-03-13 14:23:12 +00:00
Simon Atanasyan	ab45d68406	[mips] Define `mov.d` instructions using `ABSS_M` multiclass. NFC llvm-svn: 356051	2019-03-13 14:22:58 +00:00
Simon Pilgrim	0c1e5aacd3	Fix signed/unsigned mismatch warning. NFCI. llvm-svn: 356046	2019-03-13 13:14:14 +00:00
Simon Atanasyan	b9d9e0be3c	[mips] Map SW instruction to its microMIPS R6 variant To provide mapping between standard and microMIPS R6 variants of the `sw` command we have to rename SWSP_xxx commands from "sw" to "swsp". Otherwise `tablegen` starts to show the error `Multiple matches found for `SW'`. After that to restore printing SWSP command as `sw`, I add an appropriate `MipsInstAlias` instance. We also need to implement "size reduction" for microMIPS R6. But this task is for separate patch. After that the `micromips-lwsp-swsp.ll` test case will be extended. Differential Revision: http://reviews.llvm.org/D59046 llvm-svn: 356045	2019-03-13 13:09:30 +00:00
Simon Pilgrim	7abbd70300	[X86][AVX] lowerShuffleAsBroadcast - improve load folding by avoiding bitcasts AVX1 broadcasts were failing as we were adding bitcasts that caused MayFoldLoad's hasOneUse to return false. This patch stops introducing bitcasts so early and also replaces the broadcast index scaling through bitcasts (which can't succeed in some cases) to instead just keep track of the bitoffset which can be converted back to the broadcast index later on. Differential Revision: https://reviews.llvm.org/D58888 llvm-svn: 356043	2019-03-13 12:20:39 +00:00
Simon Atanasyan	c2b975a75c	[MIPS][microMIPS] Fix PseudoMTLOHI_MM matching and expansion On micromips MipsMTLOHI is always matched to PseudoMTLOHI_DSP regardless of +dsp argument. This patch checks if the HasDSP predicate is present for PseudoMTLOHI_DSP so PseudoMTLOHI_MM can be matched when appropriate. Add expansion of PseudoMTLOHI_MM instruction into a mtlo/mthi pair. Patch by Mirko Brkusanin. Differential Revision: http://reviews.llvm.org/D59203 llvm-svn: 356039	2019-03-13 11:04:38 +00:00
Jonas Hahnfeld	c64d73cce2	[ELF] Fix GCC8 warnings about "fall through", NFCI Add break statements in Object/ELF.cpp since the code should consider the generic tags for Hexagon, MIPS, and PPC. Add a test (copied from llvm-readobj) to show that this works correctly (earlier versions of this patch would have asserted). The warnings in X86ELFObjectWriter.cpp are actually false-positives since the nested switch() handles all possible values and returns in all cases. Make this explicit by adding llvm_unreachable's. Differential Revision: https://reviews.llvm.org/D58837 llvm-svn: 356037	2019-03-13 10:38:17 +00:00
Alex Bradbury	18f95e6a6f	[RISCV] Replace incorrect use of sizeof with array_lengthof RISCVDisassembler was incorrectly using sizeof(Arr) when it should have used sizeof(Arr)/sizeof(Arr[0]). Update to use array_lengthof instead. llvm-svn: 356035	2019-03-13 09:22:57 +00:00
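The bug class fixed here is easy to reproduce in isolation: `sizeof(Arr)` yields the size in bytes, not the number of elements. A minimal sketch follows; `arrayLength` is a hypothetical stand-in written for this example (it plays the role that `array_lengthof` plays in the commit), and `Regs` is an invented table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: deduces the element count N from the array type.
template <typename T, std::size_t N>
constexpr std::size_t arrayLength(T (&)[N]) {
  return N;
}

// Invented example table with four 16-bit entries.
static const uint16_t Regs[] = {10, 11, 12, 13};

// Size of the whole array in bytes (the incorrect bound for indexing).
std::size_t byteSize() { return sizeof(Regs); }

// Number of elements (the correct bound for indexing).
std::size_t elementCount() { return arrayLength(Regs); }
```

With 16-bit elements the byte size is twice the element count, so a bounds check written with `sizeof(Regs)` would accept indices past the end of the table.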
Craig Topper	750efba67c	[X86] Enable printAliasInstr for the Intel assembly printer so that AAM and AAD will print without an immediate when the immediate is 10. llvm-svn: 355997	2019-03-13 00:43:03 +00:00
Heejin Ahn	8b49b6bed6	[WebAssembly] Place 'try' and 'catch' correctly wrt EH_LABELs Summary: After instruction selection phase, possibly-throwing calls, which were previously invoke, are wrapped in `EH_LABEL` instructions. For example: ``` EH_LABEL <mcsymbol .Ltmp0> CALL_VOID @foo ... EH_LABEL <mcsymbol .Ltmp1> ``` `EH_LABEL` is also placed at the beginning of EH pads: ``` bb.1 (landing-pad): EH_LABEL <mcsymbol .Ltmp2> ... ``` And we'd like to maintain this relationship, so when we place a `try`, ``` TRY ... EH_LABEL <mcsymbol .Ltmp0> CALL_VOID @foo ... EH_LABEL <mcsymbol .Ltmp1> ``` When we place a `catch`, ``` bb.1 (landing-pad): EH_LABEL <mcsymbol .Ltmp2> %0:except_ref = CATCH ... ... ``` Previously we didn't treat EH_LABELs specially, so `try` was placed right before a call, and `catch` was placed at the beginning of an EH pad. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58914 llvm-svn: 355996	2019-03-13 00:37:31 +00:00
Jason Liu	a03ae73c29	Add XCOFF triple object format type for AIX This patch adds an XCOFF triple object format type into LLVM. This XCOFF triple object file type will be used later by object file and assembly generation for the AIX platform. Differential Revision: https://reviews.llvm.org/D58930 llvm-svn: 355989	2019-03-12 22:01:10 +00:00
Philip Reames	9134f84ba4	For faulting ops, include a comment w/the fault destination A faulting_op is one that has specified behavior when a fault occurs, generally redirecting control flow to another location. This change just adds a comment to the assembly output which makes it both human readable, and machine checkable w/o having to parse the FaultMap section. This is used to split a test file into two parts, so that I can (in a near future commit) easily extend the test file to demonstrate another case. llvm-svn: 355982	2019-03-12 21:05:31 +00:00
Matt Arsenault	caf1316f71	IR: Add immarg attribute This indicates an intrinsic parameter is required to be a constant, and should not be replaced with a non-constant value. Add the attribute to all AMDGPU and generic intrinsics that comments indicate it should apply to. I scanned other target intrinsics, but I don't see any obvious comments indicating which arguments are intended to be only immediates. This breaks one questionable testcase for the autoupgrade. I'm unclear on whether the autoupgrade is supposed to really handle declarations which were never valid. The verifier fails because the attributes now refer to a parameter past the end of the argument list. llvm-svn: 355981	2019-03-12 21:02:54 +00:00
Sanjay Patel	737c27a9cd	[x86] scalarize extractelement 0 of FP vselect llvm-svn: 355955	2019-03-12 19:20:45 +00:00
Jinsong Ji	9dc2c1d564	Set useful flags for vector imm setting instructions Vector imm setting instructions like XXLXORz/XXLXORspz/XXLXORdpz Should behave like LI8. We should set corresponding flags to allow rematerialization and other opts in LICM, RA, Scheduling etc. Differential Revision: https://reviews.llvm.org/D58645 llvm-svn: 355948	2019-03-12 18:27:09 +00:00
Eli Friedman	74b6aae4e8	[RISCV][MC] Find matching pcrel_hi fixup in more cases. If a symbol points to the end of a fragment, instead of searching for fixups in that fragment, search in the next fragment. Fixes spurious assembler error with subtarget change next to "la" pseudo-instruction, or expanded equivalent. Alternate proposal to fix the problem discussed in https://reviews.llvm.org/D58759. Testcase by Ana Pazos. Differential Revision: https://reviews.llvm.org/D58943 llvm-svn: 355946	2019-03-12 18:14:16 +00:00
Craig Topper	5c1177a68f	[X86] Arrange more CPU features to inherit from earlier CPUs. NFCI This makes SandyBridge inherit back to Westmere/Nehalem. Make bdver1-4 inherit from each other and btver2 inherit from btver1. llvm-svn: 355935	2019-03-12 16:35:30 +00:00
Jinsong Ji	06bee01d2b	[NFC][PowerPC] Assert when trying to generate directmove below P8. This was found when we generated a COPY from G8RC to F8RC in EmitInstrWithCustomInserter without checking for the proper architecture: we silently generated mtvsrd, which requires P8 and up. This is an NFC patch that adds an assert when we call copyPhysReg, in case someone accidentally generates a COPY from G8RC to F8RC for P7 and below. llvm-svn: 355920	2019-03-12 14:01:29 +00:00
David Stuttard	20ea21c6ed	[AMDGPU] Add support for immediate operand for S_ENDPGM Summary: Add support for immediate operand in S_ENDPGM Change-Id: I0c56a076a10980f719fb2a8f16407e9c301013f6 Reviewers: alexshap Subscribers: qcolombet, arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, eraman, arphaman, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59213 llvm-svn: 355902	2019-03-12 09:52:58 +00:00
David Blaikie	eae78b5157	Hexagon RDF: Replace function template (plus explicit specializations) with non-template overloads For the design in question, overloads seem to be a much simpler and less subtle solution. This removes ODR issues, and errors of the kind where code that uses the specialization in question will accidentally and erroneously specialize the primary template. This only "works" by accident; the program is ill-formed NDR. (Found with -Wundefined-func-template.) Patch by Thomas Köppe! Differential Revision: https://reviews.llvm.org/D58998 llvm-svn: 355880	2019-03-11 23:10:33 +00:00
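The hazard named in this message is a C++ rule: an explicit specialization must be declared before its first use in every translation unit, otherwise the primary template is implicitly instantiated and the program is ill-formed with no diagnostic required. Non-template overloads sidestep the rule entirely. A minimal sketch of the overload style, using invented types `NodeA` and `NodeB` rather than the actual Hexagon RDF classes:

```cpp
#include <cassert>
#include <string>

// Invented stand-ins for two kinds of node.
struct NodeA { int id; };
struct NodeB { int id; };

// One ordinary function per supported type. Overload resolution selects the
// right one, and a call with an unsupported type fails to compile instead of
// silently instantiating a primary template.
std::string describe(const NodeA &n) { return "A" + std::to_string(n.id); }
std::string describe(const NodeB &n) { return "B" + std::to_string(n.id); }
```

A call such as `describe(NodeA{1})` resolves at compile time with no dependence on which declarations happen to be visible before the first use of a specialization.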
Craig Topper	a958d40e78	[X86] Remove ProcModel and ProcFeatures tablegen classes. Move all feature lists into a ProcessorFeatures class. ProcFeatures was a class that just concatenated two feature lists together and gave it a name. We used it to inherit features between CPUs. ProcModel took two CPU feature lists and concatenated them before deferring to ProcessorModel. This was to allow inherited features and specific features to be passed to each CPU. Both of these allowed for only very rigid CPU inheritance rules. With this patch we now store all of the lists we were using for inheritance in one object and do any list concatenation we want there. Then we just pass whatever list we want from this class into the ProcessorModel class for each CPU. Hopefully this gives us more flexibility to build up feature lists in whatever ways we think make sense. Perhaps untangling ISA flags and tuning flags. I've only touched the CPUs that were directly affected by the removal of the ProcModel and ProcFeatures classes. We should move more of the feature lists into ProcessorFeatures. llvm-svn: 355872	2019-03-11 22:29:00 +00:00
Jessica Paquette	607774c960	Recommit "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT" After r355865, we should be able to safely select G_EXTRACT_VECTOR_ELT without running into any problematic intrinsics. Also add a fix for lane copies, which don't support index 0. llvm-svn: 355871	2019-03-11 22:18:01 +00:00
Evgeniy Stepanov	aedec3f684	Remove ASan asm instrumentation. Summary: It is incomplete and has no users AFAIK. Reviewers: pcc, vitalybuka Subscribers: srhines, kubamracek, mgorny, krytarowski, eraman, hiraditya, jdoerfert, #sanitizers, llvm-commits, thakis Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D59154 llvm-svn: 355870	2019-03-11 21:50:10 +00:00
Alex Bradbury	4d20cc21c7	[RISCV] Do a sign-extension in a compare-and-swap of 32 bit in RV64A AtomicCmpSwapWithSuccess is legalised into an AtomicCmpSwap plus a comparison. This requires an extension of the value which, by default, is a zero-extension. When we later lower AtomicCmpSwap into a PseudoCmpXchg32 and then expanded in RISCVExpandPseudoInsts.cpp, the lr.w instruction does a sign-extension. This mismatch of extensions causes the comparison to fail when the compared value is negative. This change overrides TargetLowering::getExtendForAtomicOps for RISC-V so it does a sign-extension instead. Differential Revision: https://reviews.llvm.org/D58829 Patch by Ferran Pallarès Roca. llvm-svn: 355869	2019-03-11 21:41:22 +00:00
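The extension mismatch described in this message can be modelled with plain integers. This is a minimal sketch, assuming 64-bit registers holding 32-bit values; the helper names are hypothetical and the code is a model of the comparison, not the RISC-V lowering.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extension: the upper 32 bits are cleared.
uint64_t zeroExtend32(int32_t v) {
  return static_cast<uint32_t>(v);
}

// Sign-extension: the upper 32 bits replicate bit 31.
uint64_t signExtend32(int32_t v) {
  return static_cast<uint64_t>(static_cast<int64_t>(v));
}

// Models the 64-bit register comparison: the value read from memory is
// sign-extended (as the message says lr.w does), and it is compared against
// an expected value that was extended by the caller.
bool compareMatches(int32_t inMemory, uint64_t extendedExpected) {
  return signExtend32(inMemory) == extendedExpected;
}
```

For non-negative values both extensions agree, which is why the mismatch only shows up when the compared value is negative.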
Alex Bradbury	b6d322bdc2	[RISCV] Allow fp as an alias of s0 The RISC-V Assembly Programmer's Manual defines fp as another alias of x8. However, our tablegen rules only recognise s0. This patch adds fp as another alias of x8. GCC also accepts fp. Differential Revision: https://reviews.llvm.org/D59209 Patch by Ferran Pallarès Roca. llvm-svn: 355867	2019-03-11 21:35:26 +00:00
Jessica Paquette	42d16501e6	[GlobalISel][AArch64] Always fall back on aarch64.neon.addp.* Overloaded intrinsics aren't necessarily safe for instruction selection. One such intrinsic is aarch64.neon.addp.*. This is a temporary workaround to ensure that we always fall back on that intrinsic. Eventually this will be replaced with a proper solution. https://bugs.llvm.org/show_bug.cgi?id=40968 Differential Revision: https://reviews.llvm.org/D59062 llvm-svn: 355865	2019-03-11 20:51:17 +00:00
Alex Bradbury	2c6c84e52c	[RISCV][NFC] Convert some MachineBasicBlock::iterator(MI) to MI.getIterator() llvm-svn: 355864	2019-03-11 20:43:29 +00:00
Nikita Popov	aa7cfa75f9	[SDAG][AArch64] Legalize VECREDUCE Fixes https://bugs.llvm.org/show_bug.cgi?id=36796. Implement basic legalizations (PromoteIntRes, PromoteIntOp, ExpandIntRes, ScalarizeVecOp, WidenVecOp) for VECREDUCE opcodes. There are more legalizations missing (esp float legalizations), but there's no way to test them right now, so I'm not adding them. This also includes a few more changes to make this work somewhat reasonably: * Add support for expanding VECREDUCE in SDAG. Usually experimental.vector.reduce is expanded prior to codegen, but if the target does have native vector reduce, it may of course still be necessary to expand due to legalization issues. This uses a shuffle reduction if possible, followed by a naive scalar reduction. * Allow the result type of integer VECREDUCE to be larger than the vector element type. For example we need to be able to reduce a v8i8 into an (nominally) i32 result type on AArch64. * Use the vector operand type rather than the scalar result type to determine the action, so we can control exactly which vector types are supported. Also change the legalize vector op code to handle operations that only have vector operands, but no vector results, as is the case for VECREDUCE. * Default VECREDUCE to Expand. On AArch64 (only target using VECREDUCE), explicitly specify for which vector types the reductions are supported. This does not handle anything related to VECREDUCE_STRICT_*. Differential Revision: https://reviews.llvm.org/D58015 llvm-svn: 355860	2019-03-11 20:22:13 +00:00
Jinsong Ji	c6063e83d5	[NFC][PowerPC] Add comment for PPCAsmPrinter::printOperand Patch by Yi-Hong Lyu llvm-svn: 355848	2019-03-11 17:57:49 +00:00
Stanislav Mekhanoshin	e98944ed47	Use bitset for assembler predicates The AMDGPU target has run out of Subtarget feature flags, hitting the limit of 64. AssemblerPredicates use at most uint64_t for their representation. At the same time CodeGen exhausted this a long time ago and switched to a FeatureBitset with the current limit of 192 bits. This patch completes the transition to the bitset for feature bits, extending it to the asm matcher and MC code emitter. Differential Revision: https://reviews.llvm.org/D59002 llvm-svn: 355839	2019-03-11 17:04:35 +00:00
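The limitation described here is that a `uint64_t` mask cannot represent feature number 64 or higher, while a fixed-size bitset can. A minimal sketch using `std::bitset` as a stand-in for LLVM's FeatureBitset; the type alias and function names are assumptions made for this example, and only the 192-bit width comes from the message.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Width taken from the commit message; the alias itself is hypothetical.
constexpr std::size_t MaxFeatures = 192;
using FeatureSet = std::bitset<MaxFeatures>;

// Build a feature set from a list of feature indices.
FeatureSet makeFeatureSet(std::initializer_list<std::size_t> bits) {
  FeatureSet set;
  for (std::size_t bit : bits)
    set.set(bit);
  return set;
}

// A predicate is satisfied when every required feature is available.
bool hasAllFeatures(const FeatureSet &available, const FeatureSet &required) {
  return (available & required) == required;
}
```

Feature indices such as 70 or 150 work here exactly like index 3, which is not possible with a single 64-bit mask.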
Stanislav Mekhanoshin	266f1574ce	[AMDGPU] Mark enum types in SIDefines.h as unsigned MSVC issues some warnings about signed/unsigned comparison. Differential Revision: https://reviews.llvm.org/D59171 llvm-svn: 355836	2019-03-11 16:49:32 +00:00
Petar Jovanovic	28e13eb098	[MIPS][microMIPS] Add a pattern to match TruncIntFP A pattern needed to match TruncIntFP was missing. This was causing multiple tests from llvm test suite to fail during compilation for micromips. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D58722 llvm-svn: 355825	2019-03-11 14:13:31 +00:00
Petar Avramovic	5229f47f9f	[MIPS GlobalISel] NarrowScalar G_UMULH NarrowScalar G_UMULH in LegalizerHelper using multiplyRegisters helper function. NarrowScalar G_UMULH for MIPS32. Differential Revision: https://reviews.llvm.org/D58825 llvm-svn: 355815	2019-03-11 10:08:44 +00:00
Petar Avramovic	0b17e59b5c	[MIPS GlobalISel] NarrowScalar G_MUL Narrow Scalar G_MUL for MIPS32. Revisit NarrowScalar implementation in LegalizerHelper. Introduce new helper function multiplyRegisters. It performs generic multiplication of values held in multiple registers. Generated instructions use only types NarrowTy and i1. Destination can be same or two times size of the source. Differential Revision: https://reviews.llvm.org/D58824 llvm-svn: 355814	2019-03-11 10:00:17 +00:00
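The idea behind multiplying values held in multiple registers can be sketched with 32-bit parts standing in for NarrowTy. This is an illustration of the arithmetic only, under the assumption that the destination has the same size as the source (two parts in, two parts out); the function names are hypothetical and this is not the LegalizerHelper implementation.

```cpp
#include <cassert>
#include <cstdint>

// Low 32 bits of a 32x32 product (the role of a narrow multiply).
uint32_t mulLo(uint32_t a, uint32_t b) { return a * b; }

// High 32 bits of a 32x32 product (the role of a narrow "multiply high").
// The 64-bit intermediate is used here only to model that operation.
uint32_t mulHi(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}

// Multiply two 64-bit values, each split into two 32-bit parts, keeping only
// the low 64 bits of the result.
uint64_t mul64ViaNarrow(uint64_t a, uint64_t b) {
  uint32_t a0 = static_cast<uint32_t>(a), a1 = static_cast<uint32_t>(a >> 32);
  uint32_t b0 = static_cast<uint32_t>(b), b1 = static_cast<uint32_t>(b >> 32);

  // Part 0: low half of a0*b0.
  uint32_t r0 = mulLo(a0, b0);
  // Part 1: cross products plus the high half of a0*b0. Carries out of this
  // part are discarded because the result is truncated to two parts.
  uint32_t r1 = mulLo(a1, b0) + mulLo(a0, b1) + mulHi(a0, b0);

  return (static_cast<uint64_t>(r1) << 32) | r0;
}
```

A destination twice the size of the source would additionally need carry propagation between parts, which this sketch leaves out.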
Craig Topper	00afa193f1	[X86] Enable sse2_cvtsd2ss intrinsic to use an EVEX encoded instruction. llvm-svn: 355810	2019-03-11 06:01:04 +00:00
Craig Topper	f1e7482e69	[X86] Remove apparently unneeded patterns for storing a bitcasted extractelement. I suspect if this pattern was seen, DAG combine would just change the type of the store to eliminate the bitcast. llvm-svn: 355809	2019-03-11 06:01:02 +00:00
Craig Topper	dc488767b2	[X86] Use 'UseAVX' in place of 'HasAVX, NoAVX512'. NFC They mean the same thing, but 'HasAVX, NoAVX512' only appears in this one place. Every other place uses UseAVX. llvm-svn: 355808	2019-03-11 06:01:00 +00:00
Craig Topper	f19d6a4073	[X86] Add SCALAR_SINT_TO_FP/SCALAR_UINT_TO_FP ISD opcodes without rounding mode. After this we no longer need to match FROUND_CURRENT or FROUND_NO_EXC during isel so I remove those. llvm-svn: 355807	2019-03-11 04:37:01 +00:00
Craig Topper	ecbc141dbf	[X86] Split SCALEF(S) ISD opcodes into a version without rounding mode. llvm-svn: 355806	2019-03-11 04:36:59 +00:00
Craig Topper	a0b5338834	[X86] Split RCP28/RSQRT/GETEXP/EXP2 ISD opcodes into SAE and current direction nodes. Remove rounding mode operand. llvm-svn: 355805	2019-03-11 04:36:57 +00:00
Craig Topper	ba7d654526	[X86] Rename _RND versions of RANGE/REDUCE/GETMANT/RDNSCALE ISD opcodes to _SAE. Remove SAE operand. No need to explicitly store it and match it during isel. llvm-svn: 355804	2019-03-11 04:36:55 +00:00
Craig Topper	244ffcdf0d	[X86] Rename X86ISD::CVTPH2PS_RND to CVTPH2PS_SAE. Remove SAE operand. llvm-svn: 355803	2019-03-11 04:36:53 +00:00
Craig Topper	6059b1737e	[X86] Rename the CVTT*_RND ISD nodes to _SAE and remove the SAE operand. Split VFPROUNDS_RND/VFPEXT(S)_RND into versions without rounding operand. For VFPEXT(S) we only need current rounding mode and an SAE version. Neither need extra operand. llvm-svn: 355802	2019-03-11 04:36:51 +00:00
Craig Topper	4c544ca993	[X86] Rename X86ISD::CMPM_RND and X86ISD::FSETCCM_RND to _SAE instead of _RND. Remove rounding operand. The operand could only be the SAE encoding so no need to include it. llvm-svn: 355801	2019-03-11 04:36:49 +00:00
Craig Topper	704303a2a1	[X86] Split the VFIXUPIMM/VFIXUPIMMS nodes into a current rounding mode and SAE ISD opcode. Remove matching of FROUND_CURRENT and FROUND_NO_EXC for these nodes from isel table. llvm-svn: 355800	2019-03-11 04:36:47 +00:00
Craig Topper	b7e6bfe579	[X86] Begin removing matching of FROUND_CURRENT and FROUND_NO_EXC from isel tables. Instead I plan to have dedicated nodes for FROUND_CURRENT and FROUND_NO_EXC. This patch starts with FADDS/FSUBS/FMULS/FDIVS/FMAXS/FMINS/FSQRTS. llvm-svn: 355799	2019-03-11 04:36:44 +00:00
Zi Xuan Wu	428dcd5c3f	[PowerPC] Remove the override of isMachineVerifierClean() to open machine verifier After fixing all asserts found by the machine verifier in the PowerPC target with the following patches, we can activate the machine verifier by default. rL293769, rL348566, rL349030, rL349029, rL350113, rL350111, rL350799, rL350165, rL355378, rL352174, rL354762, rL350115 It's also found in PR#27456, https://bugs.llvm.org/show_bug.cgi?id=27456 Differential Revision: https://reviews.llvm.org/D59011 llvm-svn: 355798	2019-03-11 03:31:09 +00:00
Craig Topper	d8ebbe4a76	[X86] Remove unneeded isel patterns from VCVTSI2SDZ and VCVTUSI2SDZ. NFC We had patterns using X86ISD::SCALAR_SINT_TO_FP_RND/SCALAR_UINT_TO_FP_RND for these instructions. There's nothing to round. Instead, we use a regular sint_to_fp/uint_to_fp and a movsd as the pattern for these. llvm-svn: 355796	2019-03-11 01:20:38 +00:00
Craig Topper	4cf8cdc51d	[X86] Remove VCVTSI2SDZrrb_Int as it shouldn't exist. This would convert a signed 32-bit integer to double precision with rounding. But there's nothing to round. llvm-svn: 355795	2019-03-11 01:20:37 +00:00
Sanjay Patel	26e06e859e	[x86] add x86-specific opcodes to extractelement scalarization list llvm-svn: 355792	2019-03-10 18:56:21 +00:00
Craig Topper	66c9690ad6	[X86] Remove unused variable. NFC llvm-svn: 355790	2019-03-10 17:36:41 +00:00
Craig Topper	93e15dfacc	[X86] Make lowering of intrinsics with rounding mode stricter so that only valid rounding modes are lowered. Update tests accordingly Many of our tests were not using valid rounding mode immediates. Clang verifies this in the frontend when it creates the intrinsics from builtins, but the backend would still lower invalid immediates. With this change we will now leave them as intrinsics if the immediate is invalid. This will cause an isel selection failure. llvm-svn: 355789	2019-03-10 17:20:45 +00:00
Craig Topper	0dc8c52d4e	[X86] Remove dead code from the handler for INTR_TYPE_SCALAR_MASK_RM. The code in here handles nodes with 6 or 7 operands. But only the 6 operand case is ever used these days. llvm-svn: 355788	2019-03-10 17:20:42 +00:00
Craig Topper	1a872f2b15	Recommit r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary." Includes a fix to emit a CheckOpcode for build_vector when immAllZerosV/immAllOnesV is used as a pattern root. This means it can't be used to look through bitcasts when used as a root, but that's probably ok. This extra CheckOpcode will ensure that the first match in the isel table will be a SwitchOpcode which is needed by the caching optimization in the ISel Matcher. Original commit message: Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead, we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts. By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up. This removes something like 40,000 bytes from the X86 isel table. Differential Revision: https://reviews.llvm.org/D58595 llvm-svn: 355784	2019-03-10 05:21:52 +00:00
Alex Bradbury	62c8a57a74	[RISCV][NFC] Minor refactoring of CC_RISCV Immediately check if we need to early-exit as we have a return value that can't be returned directly. Also tweak following if/else. llvm-svn: 355773	2019-03-09 11:16:27 +00:00
Alex Bradbury	bd0eff316a	[RISCV][NFC] Split out emitSelectPseudo from EmitInstrWithCustomInserter It's cleaner and more consistent to have a separate helper function here. llvm-svn: 355772	2019-03-09 09:30:14 +00:00
Alex Bradbury	fea4957177	[RISCV] Support -target-abi at the MC layer and for codegen This patch adds proper handling of -target-abi, as accepted by llvm-mc and llc. Lowering (codegen) for the hard-float ABIs will follow in a subsequent patch. However, this patch does add MC layer support for the hard float and RVE ABIs (emission of the appropriate ELF flags https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#-file-header). ABI parsing must be shared between codegen and the MC layer, so we add computeTargetABI to RISCVUtils. A warning will be printed if an invalid or unrecognized ABI is given. Differential Revision: https://reviews.llvm.org/D59023 llvm-svn: 355771	2019-03-09 09:28:06 +00:00
Thomas Lively	972d7d514b	[WebAssembly] Use named operands to identify loads and stores Summary: Uses the named operands tablegen feature to look up the indices of offset, address, and p2align operands for all load and store instructions. This replaces brittle, incorrect logic for identifying loads and store when eliminating frame indices, which previously crashed on bulk-memory ops. It also cleans up the SetP2Alignment pass. Reviewers: aheejin, dschuff Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59007 llvm-svn: 355770	2019-03-09 04:31:37 +00:00
Ana Pazos	5254d1baae	[RISCV] Allow access to FP CSRs without F extension Summary: Floating-point CSRs should be accessible even when F extension is not enabled. But pseudo instructions that access floating point CSRs still require the F extension. GNU tools already implement this behavior. RISC-V spec is pending update to reflect this behavior and to extend it to pseudo instructions that access floating point CSRs. Reviewers: asb Reviewed By: asb Subscribers: asb, rbar, johnrusso, simoncook, sabuasal, niosHD, kito-cheng, shiva0217, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, llvm-commits Differential Revision: https://reviews.llvm.org/D58932 llvm-svn: 355753	2019-03-08 23:01:08 +00:00
Amara Emerson	7a05d1c1f1	[AArch64][GlobalISel] Fix i1 arguments not being zero-extended as required by ABI. Fixes PR41001. llvm-svn: 355745	2019-03-08 22:17:00 +00:00
Sanjay Patel	f84083b4db	[x86] scalarize extract element 0 of FP cmp An extension of D58282 noted in PR39665: https://bugs.llvm.org/show_bug.cgi?id=39665 This doesn't answer the request to use movmsk, but that's an independent problem. We need this and probably still need scalarization of FP selects because we can't do that as a target-independent transform (although it seems likely that targets besides x86 should have this transform). llvm-svn: 355741	2019-03-08 21:54:41 +00:00
Alexey Bataev	a8b3eb46b5	[NVPTX][DEBUGINFO]Temp workaround for crash of ptxas: disable packed bytes in debug sections. Summary: This patch works around the bug in the ptxas tool with the processing of bytes separated by the comma symbol. The emission of the packed string is temporarily disabled. Reviewers: tra Subscribers: jholewinski, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59148 llvm-svn: 355740	2019-03-08 21:29:17 +00:00
Mitch Phillips	790edbc16e	[HWASan] Save + print registers when tag mismatch occurs in AArch64. Summary: This change modifies the instrumentation to allow users to view the registers at the point at which the tag mismatch occurred. Most of the heavy lifting is done in the runtime library, where we save the registers to the stack and emit unwind information. This allows us to reduce the overhead, as very little additional work needs to be done in each __hwasan_check instance. In this implementation, the fast path of __hwasan_check is unmodified. An additional 4 instructions (16B) are emitted in the slow path in every __hwasan_check instance. This may increase binary size somewhat, but as most of the work is done in the runtime library, it's manageable. The failure trace now contains a list of registers at the point at which the failure occurred, in a format similar to that of Android's tombstones. It currently has the following format: Registers where the failure occurred (pc 0x0055555561b4): x0 0000000000000014 x1 0000007ffffff6c0 x2 1100007ffffff6d0 x3 12000056ffffe025 x4 0000007fff800000 x5 0000000000000014 x6 0000007fff800000 x7 0000000000000001 x8 12000056ffffe020 x9 0200007700000000 x10 0200007700000000 x11 0000000000000000 x12 0000007fffffdde0 x13 0000000000000000 x14 02b65b01f7a97490 x15 0000000000000000 x16 0000007fb77376b8 x17 0000000000000012 x18 0000007fb7ed6000 x19 0000005555556078 x20 0000007ffffff768 x21 0000007ffffff778 x22 0000000000000001 x23 0000000000000000 x24 0000000000000000 x25 0000000000000000 x26 0000000000000000 x27 0000000000000000 x28 0000000000000000 x29 0000007ffffff6f0 x30 00000055555561b4 ... and is printed after the dump of memory tags around the buggy address. Every register is saved exactly as it was at the point where the tag mismatch occurs, with the exception of x16/x17. These registers are used in the tag mismatch calculation as scratch registers during __hwasan_check, and cannot be saved without affecting the fast path. 
As these registers are designated as scratch registers for linking, there should be no important information in them that could aid in debugging. Reviewers: pcc, eugenis Reviewed By: pcc, eugenis Subscribers: srhines, kubamracek, mgorny, javed.absar, krytarowski, kristof.beyls, hiraditya, jdoerfert, llvm-commits, #sanitizers Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D58857 llvm-svn: 355738	2019-03-08 21:22:35 +00:00
Matt Arsenault	e8c03a2511	AMDGPU: Move d16 load matching to preprocess step When matching half of the build_vector to a load, there could still be a hidden dependency on the other half of the build_vector the pattern wouldn't detect. If there was an additional chain dependency on the other value, a cycle could be introduced. I don't think a tablegen pattern is capable of matching the necessary conditions, so move this into PreprocessISelDAG. Check isPredecessorOf for the other value to avoid a cycle. This has a warning that it's expensive, so this should probably be moved into an MI pass eventually that will have more freedom to reorder instructions to help match this. That is currently complicated by the lack of a computeKnownBits type mechanism for the selected function. llvm-svn: 355731	2019-03-08 20:58:11 +00:00
Matt Arsenault	26e76ef0e2	DAG: Don't try to cluster loads with tied inputs This avoids breaking possible value dependencies when sorting loads by offset. AMDGPU has some load instructions that write into the high or low bits of the destination register, and have a tied input for the other input bits. These can easily have the same base pointer, but be a swizzle so the high address load needs to come first. This was inserting glue forcing the opposite ordering, producing a cycle the InstrEmitter would assert on. It may be potentially expensive to look for the dependency between the other loads, so just skip any where this could happen. Fixes bug 40936 by reverting r351379, which added a hacky attempt to fix this by adding chains in this case, which I think was just working around broken glue before the InstrEmitter. The core of the patch is re-implementing the fix for that problem. llvm-svn: 355728	2019-03-08 20:46:15 +00:00
Matt Arsenault	f587fd9ce1	AMDGPU: Don't bother checking the chain in areLoadsFromSameBasePtr This is only called in contexts that are verifying the chain itself, and the query itself is only asking about the address. llvm-svn: 355723	2019-03-08 20:30:51 +00:00
Matt Arsenault	07f904befb	AMDGPU: Correct DS implementation of areLoadsFromSameBasePtr This was checking the wrong operands for the base register and the offsets. The indexes are shifted by the number of output registers from the machine instruction definition, and the chain is moved to the end. llvm-svn: 355722	2019-03-08 20:30:50 +00:00
Alexey Bataev	78fcb8381f	[DEBUG_INFO][NVPTX]Emit empty .debug_loc section in presence of the debug option. Summary: If the LLVM module shows that it has debug info, but the file is actually empty and the real debug info is not emitted, the ptxas tool emits error 'Debug information not found in presence of .target debug'. We need at least one empty debug section to silence this message. Section `.debug_loc` is not emitted for PTX and we can emit an empty `.debug_loc` section if the `debug` option was emitted. Reviewers: tra Subscribers: jholewinski, aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D57250 llvm-svn: 355719	2019-03-08 20:08:04 +00:00
Sanjay Patel	b22f438df3	[x86] prevent infinite looping from inverse shuffle transforms llvm-svn: 355713	2019-03-08 19:20:28 +00:00
Diogo N. Sampaio	c20c37ba7f	[ARM][FIX] Fix vfmal.f16 and vfmsl.f16 operand The indexed variants of the vfmal.f16 and vfmsl.f16 instructions use the upper bits of the indexed operand to store the index (1 bit for the double variant, 2 bits for the quad). This limits the usable registers to d0 - d7 or s0 - s15. This patch enforces this limitation. Differential Revision: https://reviews.llvm.org/D59021 llvm-svn: 355707	2019-03-08 17:11:20 +00:00
Michael Platings	308e82eceb	[IR][ARM] Add function pointer alignment to datalayout Use this feature to fix a bug on ARM where 4 byte alignment is incorrectly assumed. Differential Revision: https://reviews.llvm.org/D57335 llvm-svn: 355685	2019-03-08 10:44:06 +00:00
Carl Ritson	1a98dc1840	[AMDGPU] V_CVT_F32_UBYTE{0,1,2,3} are full rate instructions Summary: Fix a bug in the scheduling model where V_CVT_F32_UBYTE{0,1,2,3} are incorrectly marked as quarter rate instructions. Reviewers: arsenm, rampitec Reviewed By: rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D59091 llvm-svn: 355671	2019-03-08 09:03:11 +00:00
Craig Topper	4505c99e72	[X86] Improve the type checking in isLegalMaskedLoad and isLegalMaskedGather. We were just checking pointer size and type primitive size. But this caused unintended things like vectors of half being accepted by masked load/store. For FP we now explicitly check for only double and float. For pointers we now let any pointer through. Trusting that only 32 and 64 would be used to generate assembly. We only check bitwidth after checking that the type is an integer. llvm-svn: 355667	2019-03-08 07:33:43 +00:00
Craig Topper	d0c2dba644	[X86] Correct scheduler information for rotate by constant for Haswell, Broadwell, and Skylake. Rotate with explicit immediate is a single uop from Haswell on. An immediate of 1 has a dependency on the previous writer of flags, but the other immediate values do not. The implicit rotate by 1 instruction is 2 uops. But the flags are merged after the rotate uop so the data result does not see the flag dependency. But I don't think we have any way of modeling that. RORX is 1 uop without the load. 2 uops with the load. We currently model these with WriteShift/WriteShiftLd. Differential Revision: https://reviews.llvm.org/D59077 llvm-svn: 355636	2019-03-07 21:22:56 +00:00
Craig Topper	b3af5d3e57	[X86] Model ADC/SBB with immediate 0 more accurately in the Haswell scheduler model Haswell and possibly Sandybridge have an optimization for ADC/SBB with immediate 0 to use a single uop flow. This only applies GR16/GR32/GR64 with an 8-bit immediate. It does not apply to GR8. It also does not apply to the implicit AX/EAX/RAX forms. Differential Revision: https://reviews.llvm.org/D59058 llvm-svn: 355635	2019-03-07 21:22:51 +00:00
Konstantin Zhuravlyov	47f0bf8f1f	AMDHSA: Code object v3 updates - Copy kernel symbol attributes into kernel descriptor attributes - Make sure kernel symbol's visibility is not "higher" than protected Differential Revision: https://reviews.llvm.org/D59057 llvm-svn: 355630	2019-03-07 19:58:29 +00:00
Vlad Tsyrklevich	2e1479e2f2	Delete x86_64 ShadowCallStack support Summary: ShadowCallStack on x86_64 suffered from the same racy security issues as Return Flow Guard and had performance overhead as high as 13% depending on the benchmark. x86_64 ShadowCallStack was always an experimental feature and never shipped a runtime required to support it, as such there are no expected downstream users. Reviewers: pcc Reviewed By: pcc Subscribers: mgorny, javed.absar, hiraditya, jdoerfert, cfe-commits, #sanitizers, llvm-commits Tags: #clang, #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D59034 llvm-svn: 355624	2019-03-07 18:56:36 +00:00
Jinsong Ji	de3348ae3f	[PowerPC] Run clang format to avoid compiling warning. llvm-svn: 355623	2019-03-07 18:55:21 +00:00
Mitch Phillips	92dd321a14	Rollback of rL355585. Introduces memory leak in FunctionTest.GetPointerAlignment that breaks sanitizer buildbots: ``` ================================================================= ==2453==ERROR: LeakSanitizer: detected memory leaks Direct leak of 128 byte(s) in 1 object(s) allocated from: #0 0x610428 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:105 #1 0x16936bc in llvm::User::operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/IR/User.cpp:151:19 #2 0x7c3fe9 in Create /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/IR/Function.h:144:12 #3 0x7c3fe9 in (anonymous namespace)::FunctionTest_GetPointerAlignment_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/IR/FunctionTest.cpp:136 #4 0x1a836a0 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #5 0x1a836a0 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2474 #6 0x1a85c55 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11 #7 0x1a870d0 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28 #8 0x1aa5b84 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43 #9 0x1aa4d30 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #10 0x1aa4d30 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4257 #11 0x1a6b656 in RUN_ALL_TESTS 
/b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46 #12 0x1a6b656 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50 #13 0x7f5af37a22e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) Indirect leak of 40 byte(s) in 1 object(s) allocated from: #0 0x610428 in operator new(unsigned long) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/projects/compiler-rt/lib/asan/asan_new_delete.cc:105 #1 0x151be6b in make_unique<llvm::ValueSymbolTable> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/ADT/STLExtras.h:1349:29 #2 0x151be6b in llvm::Function::Function(llvm::FunctionType, llvm::GlobalValue::LinkageTypes, unsigned int, llvm::Twine const&, llvm::Module) /b/sanitizer-x86_64-linux-bootstrap/build/llvm/lib/IR/Function.cpp:241 #3 0x7c4006 in Create /b/sanitizer-x86_64-linux-bootstrap/build/llvm/include/llvm/IR/Function.h:144:16 #4 0x7c4006 in (anonymous namespace)::FunctionTest_GetPointerAlignment_Test::TestBody() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/unittests/IR/FunctionTest.cpp:136 #5 0x1a836a0 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #6 0x1a836a0 in testing::Test::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2474 #7 0x1a85c55 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11 #8 0x1a870d0 in testing::TestCase::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28 #9 0x1aa5b84 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43 #10 0x1aa4d30 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> 
/b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc #11 0x1aa4d30 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/src/gtest.cc:4257 #12 0x1a6b656 in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46 #13 0x1a6b656 in main /b/sanitizer-x86_64-linux-bootstrap/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50 #14 0x7f5af37a22e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) SUMMARY: AddressSanitizer: 168 byte(s) leaked in 2 allocation(s). ``` See http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap/builds/11358/steps/check-llvm%20asan/logs/stdio for more information. Also introduces use-of-uninitialized-value in ConstantsTest.FoldGlobalVariablePtr: ``` ==7070==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x14e703c in User /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/User.h:79:5 #1 0x14e703c in Constant /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/Constant.h:44 #2 0x14e703c in llvm::GlobalValue::GlobalValue(llvm::Type, llvm::Value::ValueTy, llvm::Use, unsigned int, llvm::GlobalValue::LinkageTypes, llvm::Twine const&, unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/GlobalValue.h:78 #3 0x14e5467 in GlobalObject /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/GlobalObject.h:34:9 #4 0x14e5467 in llvm::GlobalVariable::GlobalVariable(llvm::Type, bool, llvm::GlobalValue::LinkageTypes, llvm::Constant, llvm::Twine const&, llvm::GlobalValue::ThreadLocalMode, unsigned int, bool) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/IR/Globals.cpp:314 #5 0x6938f1 in llvm::(anonymous namespace)::ConstantsTest_FoldGlobalVariablePtr_Test::TestBody() /b/sanitizer-x86_64-linux-fast/build/llvm/unittests/IR/ConstantsTest.cpp:565:18 #6 0x1a240a1 in HandleExceptionsInMethodIfSupported<testing::Test, void> 
/b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc #7 0x1a240a1 in testing::Test::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2474 #8 0x1a26d26 in testing::TestInfo::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2656:11 #9 0x1a2815f in testing::TestCase::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:2774:28 #10 0x1a43de8 in testing::internal::UnitTestImpl::RunAllTests() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:4649:43 #11 0x1a42c47 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc #12 0x1a42c47 in testing::UnitTest::Run() /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/gtest.cc:4257 #13 0x1a0dfba in RUN_ALL_TESTS /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/include/gtest/gtest.h:2233:46 #14 0x1a0dfba in main /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/UnitTestMain/TestMain.cpp:50 #15 0x7f2081c412e0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202e0) #16 0x4dff49 in _start (/b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/unittests/IR/IRTests+0x4dff49) SUMMARY: MemorySanitizer: use-of-uninitialized-value /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/IR/User.h:79:5 in User ``` See http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/30222/steps/check-llvm%20msan/logs/stdio for more information. llvm-svn: 355616	2019-03-07 18:13:39 +00:00
Petar Avramovic	3d3120dc9a	[MIPS GlobalISel] Fix mul operands Unsigned mul high for MIPS32 is selected into two PseudoInstructions: PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for some of its operands. Registers in this class have appropriate hi and lo register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc. mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td. In functions where mul and PseudoMULTu are present fastRegisterAllocator will "run out of registers during register allocation" because 'calcSpillCost' for $ac0 will return spillImpossible because subregisters $lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction. Differential Revision: https://reviews.llvm.org/D58715 llvm-svn: 355594	2019-03-07 13:28:29 +00:00
Michael Platings	fd4156ed4d	[IR][ARM] Add function pointer alignment to datalayout Use this feature to fix a bug on ARM where 4 byte alignment is incorrectly assumed. Differential Revision: https://reviews.llvm.org/D57335 llvm-svn: 355585	2019-03-07 09:15:23 +00:00
Craig Topper	3acc4236b8	[X86] Enable combineFMinNumFMaxNum for 512 bit vectors when AVX512 is enabled. Simplified by just checking if the vector type is legal rather than listing all combinations of types and features. Fixes PR40984. llvm-svn: 355582	2019-03-07 06:30:19 +00:00
Aakanksha Patil	c56d2afc63	AMDGPU: Handle "uniform-work-group-size" attribute (fix for RADV) A previous patch for "uniform-work-group-size" attribute was found to break some RADV and possibly radeon SI tests and had to be retracted. This patch fixes that. Differential Revision: http://reviews.llvm.org/D58993 llvm-svn: 355574	2019-03-07 00:54:04 +00:00
Simon Atanasyan	83b88441ad	[mips] Replace assertion by error message while lowering `RETURNADDR` and `FRAMEADDR` MIPS target supports lowering `RETURNADDR` and `FRAMEADDR` for a current frame only. It's better to show an error message than crash on an assertion if `__builtin_return_address` is invoked with a non-zero argument. llvm-svn: 355558	2019-03-06 22:40:28 +00:00
Abderrazek Zaafrani	5ced596198	[AArch64] Improve FP16 instruction selection for vector round and vector convert from half instructions https://reviews.llvm.org/D58855 llvm-svn: 355545	2019-03-06 20:30:06 +00:00
Mitch Phillips	318028f00f	Revert "[IR][ARM] Add function pointer alignment to datalayout" This reverts commit `2391bfca97`. This reverts rL355522 (https://reviews.llvm.org/D57335). Kills buildbots that use '-Werror' with the following error: /var/lib/buildbot/sanitizer-buildbot6/sanitizer-x86_64-linux-android/build/llvm/lib/IR/Value.cpp:657:7: error: default label in switch which covers all enumeration values [-Werror,-Wcovered-switch-default] See buildbots http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/30200/steps/check-llvm%20asan/logs/stdio for more information. llvm-svn: 355537	2019-03-06 19:17:18 +00:00
Amara Emerson	21f44dfe9c	[AArch64] Remove a stray test from the AArch64 directory. llvm-svn: 355534	2019-03-06 18:54:07 +00:00
Simon Pilgrim	9d6347cfc1	[DAGCombine] Improve select (not Cond), N1, N2 -> select Cond, N2, N1 fold Move the x86 combine from D58974 into the DAGCombine VSELECT code and update the SELECT version to use the isBooleanFlip helper as well. Requested by @spatel on D59006 llvm-svn: 355533	2019-03-06 18:52:52 +00:00
Guozhi Wei	11308bdb43	[PPC] Adjust the computed branch offset for the possible shorter distance In file PPCBranchSelector.cpp we tend to overestimate code size due to large alignment and inline assembly. Usually this causes a larger computed branch offset, which is not a big problem. But sometimes it may also cause a smaller computed branch offset than the actual branch offset. If the offset is close to the limit of the encoding, it may cause a problem at run time. The following is a simplified example (columns: instruction, actual address, estimated address). ... bne Far 100 10c .p2align 4 Near: 110 110 ... Far: 8108 8108 Actual offset: 0x8108 - 0x100 = 0x8008 Computed offset: 0x8108 - 0x10c = 0x7ffc The computed offset is at most ((1 << alignment) - 4) bytes smaller than the actual offset. So we add this number to the offset for safety. Differential Revision: https://reviews.llvm.org/D57718 llvm-svn: 355529	2019-03-06 18:22:22 +00:00
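The arithmetic in the commit above can be replayed as a small self-contained check. This is only a sketch of the reasoning, not the PPCBranchSelector code: the helper names `maxUnderestimate` and `fitsInSigned16` are hypothetical, and the 16-bit signed displacement limit is an assumption about the conditional branch encoding that the message refers to as "the limit of encoding".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): the most by which an estimated
// branch offset can fall short of the real one when a .p2align directive
// lies between the branch and its target. Instructions are 4 bytes, so the
// padding that the estimate can miss is at most (1 << Log2Align) - 4 bytes.
constexpr int64_t maxUnderestimate(unsigned Log2Align) {
  return (int64_t{1} << Log2Align) - 4;
}

// Assumption: the branch displacement must fit in a signed 16-bit field.
constexpr bool fitsInSigned16(int64_t Offset) {
  return Offset >= -0x8000 && Offset <= 0x7fff;
}

// Addresses taken from the simplified example in the commit message.
constexpr int64_t ActualOffset = 0x8108 - 0x100;    // 0x8008
constexpr int64_t EstimatedOffset = 0x8108 - 0x10c; // 0x7ffc

void checkWorkedExample() {
  // The estimate looks encodable although the real offset is not.
  assert(fitsInSigned16(EstimatedOffset));
  assert(!fitsInSigned16(ActualOffset));
  // Adding the worst-case shortfall for .p2align 4 closes the gap.
  const int64_t SafeOffset = EstimatedOffset + maxUnderestimate(4);
  assert(SafeOffset == ActualOffset);
  assert(!fitsInSigned16(SafeOffset));
}
```

With the adjustment applied, the branch in the example is treated as out of range instead of being encoded with a displacement that silently wraps.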
Krzysztof Parzyszek	9c005bbdd4	[Hexagon] Avoid creating 5-instruction packets with vgather pseudos Change the resource usage of the vgather pseudos from SLOT0+LD to SLOT0+SLOT1. llvm-svn: 355524	2019-03-06 17:43:50 +00:00
Michael Platings	2391bfca97	[IR][ARM] Add function pointer alignment to datalayout Use this feature to fix a bug on ARM where 4 byte alignment is incorrectly assumed. Differential Revision: https://reviews.llvm.org/D57335 llvm-svn: 355522	2019-03-06 17:24:11 +00:00
Ryan Taylor	67f36903ae	[AMDGPU] Add support for 64 bit buffer atomic arithmetic instructions Summary: This adds support for 64 bit buffer atomic arithmetic instructions but does not include cmpswap as that depends on a fix to the way the register pairs are handled Change-Id: Ib207ea65fb69487ccad5066ea647ae8ddfe2ce61 Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58918 llvm-svn: 355520	2019-03-06 17:02:06 +00:00
Strahinja Petrovic	94fccc93de	[PowerPC] Add secure plt support for TLS symbols This patch supports secure plt mode for TLS symbols. Differential Revision: https://reviews.llvm.org/D45520 llvm-svn: 355513	2019-03-06 15:00:10 +00:00
Simon Pilgrim	468bb2e601	[X86][SSE] VSELECT(XOR(Cond,-1), LHS, RHS) --> VSELECT(Cond, RHS, LHS) As noticed on D58965 DAGCombiner::visitSELECT has something similar, so we should be able to move this to DAGCombiner and support VSELECT as well at some point. Differential Revision: https://reviews.llvm.org/D58974 llvm-svn: 355494	2019-03-06 10:54:43 +00:00
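The fold in the commit above rests on a per-lane identity: selecting with an inverted mask is the same as selecting with the original mask and swapped operands. The following is a minimal sketch of that identity on plain arrays, assuming lanes whose condition is either all ones or all zeros; `Vec4`, `vselect` and `vnot` are hypothetical names for illustration and not DAG nodes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Per-lane select: an all-ones condition lane picks LHS, an all-zeros lane
// picks RHS.
Vec4 vselect(const Vec4 &Cond, const Vec4 &LHS, const Vec4 &RHS) {
  Vec4 Result{};
  for (std::size_t I = 0; I < Result.size(); ++I)
    Result[I] = (Cond[I] & LHS[I]) | (~Cond[I] & RHS[I]);
  return Result;
}

// XOR(Cond, -1): flip every bit of every lane.
Vec4 vnot(const Vec4 &V) {
  Vec4 Result{};
  for (std::size_t I = 0; I < Result.size(); ++I)
    Result[I] = V[I] ^ 0xFFFFFFFFu;
  return Result;
}
```

For any such mask `C`, `vselect(vnot(C), A, B)` and `vselect(C, B, A)` produce the same lanes, which is why the XOR can be dropped.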
Craig Topper	c0e01d29a4	[X86] Enable the add with 128 -> sub with -128 encoding trick with X86ISD::ADD when the carry flag isn't used. This allows us to use an 8-bit sign extended immediate instead of a 16 or 32 bit immediate. Also do similar for 0x80000000 with 64-bit adds to avoid having to use a movabsq. llvm-svn: 355485	2019-03-06 07:36:38 +00:00
Craig Topper	97a1c4c340	[X86] Suppress load folding for add/sub with 128 immediate. 128 won't fit in a sign extended 8-bit immediate, but we can negate it to -128 and use the other operation. This results in a shorter encoding since the move would have used 16 or 32 bits for the immediate. llvm-svn: 355484	2019-03-06 07:36:36 +00:00
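The two commits above rely on the fact that 128 does not fit in a sign-extended 8-bit immediate while -128 does, so `add x, 128` can be emitted as `sub x, -128`. The sketch below illustrates the choice; `fitsInSExtImm8`, `pickShorterEncoding` and `apply` are hypothetical helpers, not the X86 backend's code, and the model ignores flags (the commit notes the trick is only used when the carry flag isn't needed).

```cpp
#include <cassert>
#include <cstdint>

// True if Imm can be encoded as a sign-extended 8-bit immediate.
constexpr bool fitsInSExtImm8(int64_t Imm) { return Imm >= -128 && Imm <= 127; }

enum class Op { Add, Sub };

struct AddSubImm {
  Op Opcode;
  int64_t Imm;
};

// Hypothetical helper: if the immediate itself needs a wide encoding but its
// negation fits in 8 bits, flip the operation and negate the immediate.
// The only value for which this fires is 128.
constexpr AddSubImm pickShorterEncoding(Op Opcode, int64_t Imm) {
  if (!fitsInSExtImm8(Imm) && fitsInSExtImm8(-Imm))
    return {Opcode == Op::Add ? Op::Sub : Op::Add, -Imm};
  return {Opcode, Imm};
}

// Evaluate the chosen form on a 32-bit value with wraparound arithmetic.
// Only the data result is modeled; the carry flag would differ.
constexpr uint32_t apply(AddSubImm I, uint32_t X) {
  const uint32_t Imm = static_cast<uint32_t>(I.Imm);
  return I.Opcode == Op::Add ? X + Imm : X - Imm;
}
```

For example, `pickShorterEncoding(Op::Add, 128)` yields a subtract of -128, and applying it to 1000 still gives 1128.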
Craig Topper	112ea336c3	[X86] Remove periods from the end of SubtargetFeature descriptions since the help printer adds a period. Most features don't have periods already, but some did. When there is a period it causes llc -mattr=+help to print 2 periods. llvm-svn: 355474	2019-03-06 02:36:48 +00:00
Heejin Ahn	3c20b34d24	[WebAssembly] Remove trailing whitespaces in tests (NFC) Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58955 llvm-svn: 355472	2019-03-06 02:00:22 +00:00
Florian Hahn	13bbcb3264	[ARM] Sink zext/sext operands for add and sub to enable vsubl generation. This uses the infrastructure added in rL353152 to sink zext and sexts to sub/add users, to enable vsubl/vaddl generation when NEON is available. See https://bugs.llvm.org/show_bug.cgi?id=40025. Reviewers: SjoerdMeijer, t.p.northover, samparker, efriedma Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D58063 llvm-svn: 355460	2019-03-06 00:10:03 +00:00
Heejin Ahn	5c644c9bca	[WebAssembly] Simplify iterator navigations (NFC) Summary: - Replaces some uses of `MachineFunction::iterator(MBB)` with `MBB->getIterator()` and `MachineBasicBlock::iterator(MI)` with `MI->getIterator()`, which are simpler. - Replaces some uses of `std::prev` or `std::next` that take a MachineFunction or MachineBasicBlock iterator with `getPrevNode` and `getNextNode`, which are also simpler. Reviewers: sbc100 Subscribers: dschuff, sunfish, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58913 llvm-svn: 355444	2019-03-05 21:05:09 +00:00
Heejin Ahn	ef9d6aea45	[WebAssembly] Disable MachineBlockPlacement pass Summary: This pass hurts code size for wasm and sometimes generates irreducible control flow. Context: https://github.com/emscripten-core/emscripten/pull/8233 Reviewers: kripken, dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58953 llvm-svn: 355437	2019-03-05 20:35:34 +00:00
Craig Topper	57fd733140	Revert r355224 "[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary." This caused the first matcher in the isel table for many targets to be Opc_Scope instead of Opc_SwitchOpcode. This leads to a significant increase in isel match failures. llvm-svn: 355433	2019-03-05 19:18:16 +00:00
Guozhi Wei	f124e75656	[X86] In X86DomainReassignment.cpp add enclosed registers to EnclosedEdges The variable X86DomainReassignment::EnclosedEdges is used to store registers that have been enclosed in some closure, so those registers will be ignored when creating new closures. But no register has ever been put into this set, so a single register can be enclosed in multiple closures, which significantly increases compile time. This patch adds a register into EnclosedEdges when it is enclosed into a closure. Differential Revision: https://reviews.llvm.org/D58646 llvm-svn: 355430	2019-03-05 18:54:34 +00:00
Matt Arsenault	870397739e	AMDGPU: Preserve undef flag when expanding SI_IF Fixes undefined value verifier error. llvm-svn: 355426	2019-03-05 18:38:00 +00:00
Craig Topper	4a9dd7c39b	[X86] Enable 8-bit SHL to convert to LEA Differential Revision: https://reviews.llvm.org/D58870 llvm-svn: 355425	2019-03-05 18:37:41 +00:00
Craig Topper	216bf7f03b	[X86] Allow 8-bit INC/DEC to be converted to LEA. We already do this for 16/32/64 as well as 8-bit add with register/immediate. Might as well do it for 8-bit INC/DEC too. Differential Revision: https://reviews.llvm.org/D58869 llvm-svn: 355424	2019-03-05 18:37:37 +00:00
Craig Topper	572e94ca02	[X86] Enable 8-bit OR with disjoint bits to convert to LEA We already support 8-bits adds in convertToThreeAddress. But we can also support 8-bit OR if the bits are disjoint. We already do this for 16/32/64. Differential Revision: https://reviews.llvm.org/D58863 llvm-svn: 355423	2019-03-05 18:37:33 +00:00
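The OR-to-LEA conversion in the commit above works because OR and ADD agree whenever the two operands share no set bits: no bit position can generate a carry. A brief exhaustive check over all 8-bit pairs, using a hypothetical `haveDisjointBits` helper (not the name used in the backend):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if no bit is set in both operands.
constexpr bool haveDisjointBits(unsigned A, unsigned B) { return (A & B) == 0; }

// Exhaustively confirm that an 8-bit OR of disjoint operands equals the
// 8-bit ADD, which is what allows the OR to be rewritten as an LEA.
void checkOrEqualsAdd() {
  for (unsigned A = 0; A < 256; ++A)
    for (unsigned B = 0; B < 256; ++B)
      if (haveDisjointBits(A, B))
        assert(static_cast<uint8_t>(A | B) == static_cast<uint8_t>(A + B));
}
```

When the operands overlap, the identity fails (for example `3 | 1` is 3 but `3 + 1` is 4), so the disjointness check is what makes the rewrite safe.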
Jessica Paquette	00d5847b5c	Revert "[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT" This broke test-suite::aarch64_neon_intrinsics.test Reverting while I look into it. Example failure: http://lab.llvm.org:8011/builders/clang-cmake-aarch64-quick/builds/17740 llvm-svn: 355408	2019-03-05 15:47:00 +00:00
Carl Ritson	9e3f7d8ad0	[AMDGPU] Fix DPP operand order in atomic optimizer Summary: Ensure order of operands in DPP atomic optimizer final WWM step is appropriate for sub instructions. Change-Id: I631d050e1c00a3b4bc7c11a90437064403c4cf30 Reviewers: sheredom, tpr Reviewed By: sheredom Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, t-tye, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58900 llvm-svn: 355394	2019-03-05 12:21:44 +00:00
Heejin Ahn	c7397613d2	[WebAssembly] Rename a variable in LateEHPrepare (NFC) llvm-svn: 355387	2019-03-05 11:11:34 +00:00
Oliver Stannard	4a9086b537	[ARM] Fix select_cc lowering for fp16 When lowering a select_cc node where the true and false values are of type f16, we can't use a general conditional move because the FP16 instructions do not support conditional execution. Instead, we must ensure that the condition code is one of the four supported by the VSEL instruction. Differential revision: https://reviews.llvm.org/D58813 llvm-svn: 355385	2019-03-05 10:42:34 +00:00
David Stuttard	81eec58a0d	[AMDGPU] Omit KILL instructions from hazard recognizer Summary: In some cases the KILL was causing a hazard to be introduced as these were scheduled into hazard slots, but don't result in an instruction. KILL shouldn't be considered for hazard recognition. Change-Id: Ib6d2a2160f8c94cd0ce611ab198c7e4f46aeffcf Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58898 llvm-svn: 355384	2019-03-05 10:25:16 +00:00
Chen Zheng	9cfe7e81f1	[PowerPC] fix killed/dead flag after convert x-form to d-form transformation. Differential Revision: https://reviews.llvm.org/D58428 llvm-svn: 355378	2019-03-05 04:56:54 +00:00
Scott Linder	efec1396ac	[AMDGPU] Implement AMDGPUMCInstrAnalysis Implement MCInstrAnalysis for AMDGPU, with default implementations save for `evaluateBranch`. Differential Revision: https://reviews.llvm.org/D58400 llvm-svn: 355373	2019-03-05 03:02:00 +00:00
Craig Topper	6a6ce5be84	[X86] Reduce some patterns by using FP instructions for integer types even when AVX2 is available and execution domain fixing will do the right thing We have quite a few cases of using FP instructions for integer operations when only AVX1 is available. Then we switch to integer instructions with AVX2. In a lot of these cases execution domain fixing will take care of turning FP instructions into integer if its profitable. With this patch we just keep on using the FP instructions even with AVX2. I've only handled some cases that don't require messing with patterns that are defined in the instruction definition. Those will require more subtle multiclass work possibly involving null_frag, hasSideEffects = 0, etc. Differential Revision: https://reviews.llvm.org/D58470 llvm-svn: 355361	2019-03-05 01:14:25 +00:00
Yonghong Song	d82247cb80	[BPF] Do not generate BTF sections unnecessarily If there are no types/non-empty strings, do not generate .BTF section. If there is no func_info/line_info, do not generate .BTF.ext section. Signed-off-by: Yonghong Song <yhs@fb.com> Differential Revision: https://reviews.llvm.org/D58936 llvm-svn: 355360	2019-03-05 01:01:21 +00:00
Jessica Paquette	caf62b1d47	[GlobalISel][AArch64] Add selection support for G_EXTRACT_VECTOR_ELT This adds instruction selection support for G_EXTRACT_VECTOR_ELT for cases where the index is defined by a G_CONSTANT. It also factors out the lane copy opcode selection part into its own function, `getLaneCopyOpcode`. This is used by both `selectUnmergeValues` and `selectExtractElt`. Differential Revision: https://reviews.llvm.org/D58469 llvm-svn: 355344	2019-03-04 22:35:32 +00:00
Jessica Paquette	0632e12f89	[GlobalISel][AArch64] Legalize vector G_SELECT Just scalarize it, and add a test showing it works. Differential Revision: https://reviews.llvm.org/D58747 llvm-svn: 355339	2019-03-04 21:12:46 +00:00
Amara Emerson	8acb0d9c82	Re-commit r355104: "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1." The code to materialize a mask from a constant pool load tried to use a 128 bit LDR to load a 64 bit constant pool entry, which was 8 byte aligned. This resulted in a link failure in the NEON tests in the test suite since the LDR address was unaligned. This change fixes that to instead emit a 64 bit LDR if the entry is 64 bit, before converting back to a 128 bit register for the TBL. llvm-svn: 355326	2019-03-04 19:16:00 +00:00
Wouter van Oortmerssen	f3feb6adb9	[WebAssembly] Add support for data sections in the assembler. Summary: This is quite minimal so far, introduce them with .section, fill them with .int8 or .asciz, end with .size Reviewers: dschuff, sbc100, aheejin Subscribers: jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58660 llvm-svn: 355321	2019-03-04 17:18:04 +00:00
Dmitry Preobrazhensky	6023d5990d	[AMDGPU][MC] Enable lds_direct operand for v_readfirstlane_b32, v_readlane_b32 and v_writelane_b32 See bug 40662: https://bugs.llvm.org/show_bug.cgi?id=40662 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58713 llvm-svn: 355312	2019-03-04 12:48:32 +00:00
Jeremy Morse	09d8ea5282	[X86] Avoid codegen changes when DBG_VALUE appears between lowered selects X86TargetLowering::EmitLoweredSelect presently detects sequences of CMOV pseudo instructions without accounting for debug intrinsics. This leads to different codegen with and without option -g, if a DBG_VALUE instruction lands in the middle of several lowered selects. Work around this by skipping over debug instructions when looking for CMOV sequences, and sinking those debug insts into the EmitLoweredSelect sunk block. This might slightly shift where variables appear in the instruction sequence, but won't re-order assignments. Differential Revision: https://reviews.llvm.org/D58672 llvm-svn: 355307	2019-03-04 10:56:02 +00:00
Oliver Stannard	181afc7f3b	[ARM] Fix selection of VLDR.16 instruction with imm offset The isScaledConstantInRange function takes upper and lower bounds which are checked after dividing by the scale, so the bounds checks for half, single and double precision should all be the same. Previously, we had wrong bounds checks for half precision, so selected an immediate the instructions can't actually represent. Differential revision: https://reviews.llvm.org/D58822 llvm-svn: 355305	2019-03-04 09:17:38 +00:00
Jonas Hahnfeld	65a401f6a9	[AArch64/ARM] Fix two compiler warnings in InstructionSelector, NFCI 1) GCC complains that KnownValid is set but not used. 2) In ARMInstructionSelector::selectGlobal() the code is mixing "enumeral and non-enumeral type in conditional expression". Solve this by casting to unsigned which is the final type anyway. Differential Revision: https://reviews.llvm.org/D58834 llvm-svn: 355304	2019-03-04 08:51:32 +00:00
Heejin Ahn	195a62e9ae	[WebAssembly] Delete ThrowUnwindDest map from WasmEHFuncInfo Summary: Before when we implemented the first EH proposal, 'catch <tag>' instruction may not catch an exception so there were multiple EH pads an exception can unwind to. That means a BB could have multiple EH pad successors. Now after we switched to the new proposal, every 'catch' instruction catches an exception, and there is only one catchpad per catchswitch, so we at most have one EH pad successor, making `ThrowUnwindDest` map in `WasmEHInfo` unnecessary. Keeping `ThrowUnwindDest` map in `WasmEHInfo` has its own problems, because other optimization passes can split a BB that contains possibly throwing calls (previously invokes), and we have to update the map every time that happens, which is not easy for common CodeGen passes. This also correctly updates successor info in LateEHPrepare when we add a rethrow instruction. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58486 llvm-svn: 355296	2019-03-03 22:35:56 +00:00
Simon Pilgrim	e48be5d698	Remove unused variable. NFCI. llvm-svn: 355289	2019-03-03 14:23:07 +00:00
Simon Pilgrim	d8e91a54c0	[X86] getShuffleScalarElt - peek through insert/extract subvector nodes. llvm-svn: 355288	2019-03-03 14:11:05 +00:00
Simon Pilgrim	11149ea433	[X86] Pull out combineToConsecutiveLoads helper. NFCI. llvm-svn: 355287	2019-03-03 13:53:27 +00:00
Craig Topper	ce68659772	[X86] Prefer VPBLENDD for v2i64/v4i64 blends with AVX2. We were using VPBLENDW for v2i64 and VBLENDPD for v4i64. VPBLENDD has better throughput than VPBLENDW on some CPUs so it makes sense to use it when possible. VBLENDPD will probably become VBLENDD during execution domain fixing, but we might as well use integer in isel while we can. This should work around some issues with the domain fixing pass preferring PBLENDW when we start with PBLENDW. There may still be some v8i16 cases that could use PBLENDD. llvm-svn: 355281	2019-03-03 00:18:07 +00:00
Thomas Lively	43876ae7bc	[WebAssembly] Expand operations not supported by SIMD Summary: This prevents crashes in instruction selection when these operations are used. The tests check that the scalar version of the instruction is used where applicable, although some expansions do not use the scalar version. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58859 llvm-svn: 355261	2019-03-02 03:32:25 +00:00
Amaury Sechet	f24abf6511	[X86] Improve use of SHLD/SHRD Summary: This extends the variety of pattern that can generate a SHLD instead of using two shifts. This fixes a regression that would be introduced by D57367 or D33587 Reviewers: RKSimon, craig.topper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57389 llvm-svn: 355260	2019-03-02 02:44:16 +00:00
Thomas Lively	7ec62fde78	Revert "[WebAssembly][WIP] Expand operations not supported by SIMD" This was accidentally committed without tests or review. llvm-svn: 355254	2019-03-02 00:55:16 +00:00
Thomas Lively	f5a8c28e7e	[WebAssembly][WIP] Expand operations not supported by SIMD llvm-svn: 355247	2019-03-02 00:18:07 +00:00
Craig Topper	4cfc39179e	[TableGen][SelectionDAG][X86] Add specific isel matchers for immAllZerosV/immAllOnesV. Remove bitcasts from X86 patterns that are no longer necessary. Previously we had build_vector PatFrags that called ISD::isBuildVectorAllZeros/Ones. Internally the ISD::isBuildVectorAllZeros/Ones look through bitcasts, but we aren't able to take advantage of that in isel. Instead, we have to canonicalize the types of the all zeros/ones build_vectors and insert bitcasts. Then we have to pattern match those exact bitcasts. By emitting specific matchers for these 2 nodes, we can make isel look through any bitcasts without needing to explicitly match them. We should also be able to remove the canonicalization to vXi32 from lowering, but I've left that for a follow up. This removes something like 40,000 bytes from the X86 isel table. Differential Revision: https://reviews.llvm.org/D58595 llvm-svn: 355224	2019-03-01 20:18:38 +00:00
Vlad Tsyrklevich	8925138007	Revert "[MIPS GlobalISel] Fix mul operands" This reverts commit r355178, it is causing ASan failures on the sanitizer bots. llvm-svn: 355219	2019-03-01 18:58:22 +00:00
Thomas Lively	d295f51469	Revert "[WebAssembly] Lower SIMD shifts since they are fixed in V8" They weren't fixed in V8. Oops. llvm-svn: 355208	2019-03-01 17:43:55 +00:00
Oliver Stannard	82fbbc21fd	[ARM] Fix FP16 stack loads/stores for Thumb2 with frame pointer The new addressing mode added for the v8.2A FP16 instructions uses bit 8 of the immediate to encode the sign of the offset, like the other FP loads/stores, so it needs to be treated the same way. Differential revision: https://reviews.llvm.org/D58816 llvm-svn: 355201	2019-03-01 14:20:28 +00:00
Oliver Stannard	e019e6223b	[ARM] Consider undefined-on-NaN conditions in checkVSELConstraints This function was not checking for the condition code variants which are undefined if either input is NaN, so we were missing selection of the VSEL instruction in some cases when using -fno-honor-nans or -ffast-math. Differential revision: https://reviews.llvm.org/D58812 llvm-svn: 355199	2019-03-01 13:58:25 +00:00
Diana Picus	54829ec5d0	[ARM GlobalISel] Support G_CTLZ for Thumb2 Same as ARM mode but with different opcode. llvm-svn: 355191	2019-03-01 10:12:28 +00:00
Stanislav Mekhanoshin	bb98841399	[AMDGPU] Mark ds instructions as maybeAtomic These were not recognized as potential atomics by the memory legalizer. The test was working not because the legalizer did the right thing, but because it skipped all these instructions. When I fixed the DS description, the test started to fail because the region address changed from 4 to 2 a while ago. Differential Revision: https://reviews.llvm.org/D58802 llvm-svn: 355179	2019-03-01 07:59:17 +00:00
Petar Avramovic	9bf43b5c26	[MIPS GlobalISel] Fix mul operands Unsigned mul high for MIPS32 is selected into two PseudoInstructions: PseudoMULTu and PseudoMFHI that use accumulator register class ACC64 for some of its operands. Registers in this class have appropriate hi and lo register as subregisters: $lo0 and $hi0 are subregisters of $ac0 etc. mul instruction implicit-defs $lo0 and $hi0 according to MipsInstrInfo.td. In functions where mul and PseudoMULTu are present fastRegisterAllocator will "run out of registers during register allocation" because 'calcSpillCost' for $ac0 will return spillImpossible because subregisters $lo0 and $hi0 of $ac0 are reserved by mul instruction above. A solution is to mark implicit-defs of $lo0 and $hi0 as dead in mul instruction. Differential Revision: https://reviews.llvm.org/D58715 llvm-svn: 355178	2019-03-01 07:35:57 +00:00
Petar Avramovic	a48285a190	[MIPS GlobalISel] Select G_UMULH Legalize G_UMULO and select G_UMULH for MIPS32. Differential Revision: https://reviews.llvm.org/D58714 llvm-svn: 355177	2019-03-01 07:25:44 +00:00
Thomas Lively	c4b674955c	[WebAssembly] Lower SIMD shifts since they are fixed in V8 Reviewers: sbc100 Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58800 llvm-svn: 355163	2019-03-01 01:38:54 +00:00
Tom Stellard	33634d1b25	AMDGPU/GlobalISel: Implement select for G_INSERT Re-commit r344310. Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D53116 llvm-svn: 355159	2019-03-01 00:50:26 +00:00
Thomas Lively	ae79f42a2f	[WebAssembly] Fix crash when @llvm.global_dtors is external Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58799 llvm-svn: 355157	2019-03-01 00:12:13 +00:00
Tom Stellard	41f32196a0	AMDGPU/GlobalISel: Implement select for G_EXTRACT Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49714 llvm-svn: 355156	2019-02-28 23:37:48 +00:00
Joerg Sonnenberger	01530291ea	[PPC] Secure PLT only has meaning for PIC llvm-svn: 355154	2019-02-28 23:33:09 +00:00
Eli Friedman	d19a7060c6	[AArch64] [Windows] Don't skip constructing UnwindHelp. In certain cases, the first non-frame-setup instruction in a function is a branch. For example, it could be a cbz on an argument. Make sure we correctly allocate the UnwindHelp, and find an appropriate register to use to initialize it. Fixes https://bugs.llvm.org/show_bug.cgi?id=40184 Differential Revision: https://reviews.llvm.org/D58752 llvm-svn: 355136	2019-02-28 20:38:45 +00:00
Abderrazek Zaafrani	abfd10807c	[AArch64] Improve FP16 vector convert from short instructions. https://reviews.llvm.org/D58563 llvm-svn: 355134	2019-02-28 20:21:46 +00:00
Sanjay Patel	7fc6ef7dd7	[x86] scalarize extract element 0 of FP math This is another step towards ensuring that we produce the optimal code for reductions, but there are other potential benefits as seen in the tests diffs: 1. Memory loads may get scalarized resulting in more efficient code. 2. Memory stores may get scalarized resulting in more efficient code. 3. Complex ops like fdiv/sqrt get scalarized which may be faster instructions depending on uarch. 4. Even simple ops like addss/subss/mulss/roundss may result in faster operation/less frequency throttling when scalarized depending on uarch. The TODO comment suggests 1 or more follow-ups for opcodes that can currently result in regressions. Differential Revision: https://reviews.llvm.org/D58282 llvm-svn: 355130	2019-02-28 19:47:04 +00:00
Jiong Wang	0a039660fa	bpf: disassembler support for XADD under sub-register mode Like the other load/store instructions, "w" register is preferred when disassembling BPF_STX | BPF_W | BPF_XADD. v1 -> v2: - Updated testcase insn-unit.s (Yonghong) Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 355127	2019-02-28 19:22:34 +00:00
Jiong Wang	3da8bcd0a0	bpf: enable sub-register code-gen for XADD Supporting sub-register code-gen for XADD is like supporting any other Load and Store patterns. No new instruction is introduced. lock (u32 )(r1 + 0) += w2 has exactly the same underlying insn as: lock (u32 )(r1 + 0) += r2 BPF_W width modifier has guaranteed they behave the same at runtime. This patch merely teaches BPF back-end that BPF_W width modifier could work with the GPR32 register class and that's all needed for sub-register code-gen support for XADD. test/CodeGen/BPF/xadd.ll updated to include sub-register code-gen tests. A new testcase test/CodeGen/BPF/xadd_legal.ll is added to make sure the legal case could pass on all code-gen modes. It could also test dead Def check on GPR32. If there is no proper handling like what has been done inside BPFMIChecking.cpp:hasLivingDefs, then this testcase will fail. Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 355126	2019-02-28 19:21:28 +00:00
Jiong Wang	3d7c265e11	bpf: improve dead Defs check for XADD BPF XADD semantics require all Defs of XADD are dead, meaning any result of XADD insn is not used. However, BPF backend hasn't enabled sub-register liveness tracking, so when the source and destination operands of XADD are GPR32, there is no sub-register dead info. If we rely on the generic MachineInstr::allDefsAreDead, then we will raise false alarm on GPR32 Def. This was fine as there was no sub-register code-gen support for XADD which will be added by the next patch. To support GPR32 Def, ideally we could just enable sub-register liveness tracking on BPF backend, then allDefsAreDead could work on GPR32 Def. This requires implementing TargetSubtargetInfo::enableSubRegLiveness on BPF. However, sub-register liveness tracking module inside LLVM is actually designed for the situation where one register could be split into more than one sub-register, in which case each sub-register could have its own liveness and killing one of them doesn't kill the others. So, tracking liveness for each makes sense. For BPF, each 64-bit register could only have one 32-bit sub-register. This is exactly the case which LLVM thinks brings no benefits for doing sub-register tracking, because the live range of the sub-register must always equal that of its parent register, therefore liveness tracking is disabled even if the back-end has implemented enableSubRegLiveness. The detailed information is at r232695: Author: Matthias Braun <matze@braunis.de> Date: Thu Mar 19 00:21:58 2015 +0000 Do not track subregister liveness when it brings no benefits Hence, for BPF, we enhance MachineInstr::allDefsAreDead. Given the solo sub-register always has the same liveness as its parent register, LLVM is already attaching an implicit 64-bit register Def whenever there is a sub-register Def. The liveness of the implicit 64-bit Def is available.
For example, for "lock (u32 )(r0 + 4) += w9", the MachineOperand info could be: $w9 = XADDW32 killed $r0, 4, $w9(tied-def 0), implicit killed $r9, implicit-def dead $r9 Even though w9 is not marked as Dead, the parent register r9 is marked as Dead correctly, and it is safe to use such information for our purpose. v1 -> v2: - Simplified code logic inside hasLiveDefs. (Yonghong) Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiong Wang <jiong.wang@netronome.com> llvm-svn: 355124	2019-02-28 19:20:26 +00:00
Craig Topper	38427c47b9	[X86] Don't peek through bitcasts before checking ISD::isBuildVectorOfConstantSDNodes in combineTruncatedArithmetic We don't have any combines that can look through a bitcast to truncate a build vector of constants. So the truncate will stick around and give us something like this pattern (binop (trunc X), (trunc (bitcast (build_vector)))) which has two truncates in it. Which will be reversed by hoistLogicOpWithSameOpcodeHands in the generic DAG combiner. Thus causing an infinite loop. Even if we had a combine for (truncate (bitcast (build_vector))), I think it would need to be implemented in getNode otherwise DAG combiner visit ordering would probably still visit the binop first and reverse it. Or combineTruncatedArithmetic would need to do its own constant folding. Differential Revision: https://reviews.llvm.org/D58705 llvm-svn: 355116	2019-02-28 18:49:29 +00:00
Amara Emerson	8d70e6425c	Revert "[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1." Seems to break some neon intrinsics tests. llvm-svn: 355115	2019-02-28 18:47:29 +00:00
Thomas Lively	f3b4f99007	[WebAssembly] Remove uses of ThreadModel Summary: In the clang UI, replaces -mthread-model posix with -matomics as the source of truth on threading. In the backend, replaces -thread-model=posix with the atomics target feature, which is now collected on the WebAssemblyTargetMachine along with all other used features. These collected features will also be used to emit the target features section in the future. The default configuration for the backend is thread-model=posix and no atomics, which was previously an invalid configuration. This change makes the default valid because the thread model is ignored. A side effect of this change is that objects are never emitted with passive segments. It will instead be up to the linker to decide whether sections should be active or passive based on whether atomics are used in the final link. Reviewers: aheejin, sbc100, dschuff Subscribers: mehdi_amini, jgravelle-google, hiraditya, sunfish, steven_wu, dexonsmith, rupprecht, jfb, jdoerfert, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D58742 llvm-svn: 355112	2019-02-28 18:39:08 +00:00
Amara Emerson	85c3afd7f6	[AArch64][GlobalISel] Add support for 64 bit vector shuffle using TBL1. This extends the existing support for shufflevector to handle cases like <2 x float>, which we can implement by concating the vectors and using a TBL1. Differential Revision: https://reviews.llvm.org/D58684 llvm-svn: 355104	2019-02-28 16:43:11 +00:00
Kadir Cetinkaya	1b1b1a6135	[Target][ARM] Add a usage for SrcSz to unbreak build-bots without assertions llvm-svn: 355101	2019-02-28 15:55:11 +00:00
Bjorn Pettersson	d30f308a9f	Add support for computing "zext of value" in KnownBits. NFCI Summary: The description of KnownBits::zext() and KnownBits::zextOrTrunc() has confusingly been telling that the operation is equivalent to zero extending the value we're tracking. That has not been true, instead the user has been forced to explicitly set the extended bits as known zero afterwards. This patch adds a second argument to KnownBits::zext() and KnownBits::zextOrTrunc() to control if the extended bits should be considered as known zero or as unknown. Reviewers: craig.topper, RKSimon Reviewed By: RKSimon Subscribers: javed.absar, hiraditya, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58650 llvm-svn: 355099	2019-02-28 15:45:29 +00:00
Stefan Pintilie	a073a18460	[PowerPC] Removed STATISTIC that was causing build errors. llvm-svn: 355087	2019-02-28 12:40:28 +00:00
Stefan Pintilie	bd5429ef38	[PowerPC] Move the stack pointer update instruction later in the prologue and earlier in the epilogue. Move the stdu instruction in the prologue and epilogue. This should provide a small performance boost in functions that are able to do this. I've kept this change rather conservative at the moment and functions with frame pointers or base pointers will not try to move the stack pointer update. Differential Revision: https://reviews.llvm.org/D42590 llvm-svn: 355085	2019-02-28 12:23:28 +00:00
Simon Pilgrim	134bc19079	[X86][AVX] Remove superfluous insert_subvector(zero, bitcast(x)) -> bitcast(insert_subvector(zero, x)) fold This is caught by other existing bitcast folds. llvm-svn: 355084	2019-02-28 11:39:52 +00:00
Diana Picus	cf0ff638bc	[ARM GlobalISel] Make arm_i32imm an IntImmLeaf This gets rid of some duplication in the TableGen definition, but it forces us to keep both a pointer and a reference to the subtarget in the ARMInstructionSelector. That is pretty ugly but it might be a reasonable trade-off, since the TableGen descriptions should outlive the code in the selector (or in the worst case we can update to use just the reference when we get rid of DAGISel). Differential Revision: https://reviews.llvm.org/D58031 llvm-svn: 355083	2019-02-28 11:13:05 +00:00
Simon Pilgrim	87aeff8bbb	[X86][AVX] Fold vf64 concat_vectors(movddup(x),movddup(x)) -> broadcast(x) llvm-svn: 355078	2019-02-28 10:53:58 +00:00
Diana Picus	3b7beafc77	[ARM GlobalISel] Support global variables for Thumb2 Add the same level of support as for ARM mode (i.e. still no TLS support). In most cases, it is sufficient to replace the opcodes with the t2-equivalent, but there are some idiosyncrasies that I decided to preserve because I don't understand the full implications: * For ARM we use LDRi12 to load from constant pools, but for Thumb we use t2LDRpci (I'm not sure if the ideal would be to use t2LDRi12 for Thumb as well, or to use LDRcp for ARM). * For Thumb we don't have an equivalent for MOV|LDRLIT_ga_pcrel_ldr, so we have to generate MOV|LDRLIT_ga_pcrel plus a load from GOT. The tests are in separate files because they're hard enough to read even without doubling the number of checks. llvm-svn: 355077	2019-02-28 10:42:47 +00:00
Craig Topper	6ca7398a1e	[X86] Use PreprocessISelDAG to convert vector sra/srl/shl to the X86 specific variable shift ISD opcodes. This allows us to use the same set of isel patterns for sra/srl/shl which are undefined for out of range shifts and intrinsic shifts which aren't undefined. Doing this late allows DAG combine to have every opportunity to optimize the sra/srl/shl nodes. This removes about 7000 bytes from the isel table and simplifies the td files. llvm-svn: 355071	2019-02-28 07:21:26 +00:00
Craig Topper	240315aa64	[X86] Use X86::LAST_VALID_COND instead of assuming X86::COND_S is the last encoding. NFC llvm-svn: 355059	2019-02-28 01:00:31 +00:00
Matt Arsenault	09a09ef8b7	AMDGPU: Fix typo llvm-svn: 355056	2019-02-28 00:52:33 +00:00
Matt Arsenault	5d567dc137	AMDGPU: Enable function calls by default Fixes some crashes on illegal call situations which are unfortunately still valid IR. llvm-svn: 355051	2019-02-28 00:40:32 +00:00
Abderrazek Zaafrani	2fc498a652	[AArch64] Generate FP16 vector compare instructions. https://reviews.llvm.org/D58561 llvm-svn: 355050	2019-02-28 00:31:38 +00:00
Matt Arsenault	aa03bcd23c	AMDGPU: Fix crashes in invalid call cases We have to at least tolerate calls to kernels, possibly with a mismatched calling convention on the callsite. llvm-svn: 355049	2019-02-28 00:28:44 +00:00
Matt Arsenault	d3093c2f1f	GlobalISel: Implement fewerElementsVector for phi llvm-svn: 355048	2019-02-28 00:16:32 +00:00
Matt Arsenault	72bcf15dbf	GlobalISel: Implement moreElementsVector for phi llvm-svn: 355047	2019-02-28 00:01:05 +00:00
Joerg Sonnenberger	6a198366a0	Default to Secure PLT on PPC for NetBSD and OpenBSD. This matches the default settings of clang. llvm-svn: 355038	2019-02-27 21:53:14 +00:00
Philip Reames	288a95fc8c	Separate volatility and atomicity/ordering in SelectionDAG At the moment, we mark every atomic memory access as being also volatile. This is unnecessarily conservative and prohibits many legal transforms (DCE, folding, etc.). This patch removes MOVolatile from the MachineMemOperands of atomic, but not volatile, instructions. This should be strictly NFC after a series of previous patches which have gone in to ensure backend code is conservative about handling of isAtomic MMOs. Once it's in and baked for a bit, we'll start working through removing unnecessary bailouts one by one. We applied this same strategy to the middle end a few years ago, with good success. To make sure this patch itself is NFC, it is built on top of a series of other patches which adjust code to (for the moment) be as conservative for an atomic access as for a volatile access and build up a test corpus (mostly in test/CodeGen/X86/atomics-unordered.ll). Previously landed D57593 Fix a bug in the definition of isUnordered on MachineMemOperand D57596 [CodeGen] Be conservative about atomic accesses as for volatile D57802 Be conservative about unordered accesses for the moment rL353959: [Tests] First batch of cornercase tests for unordered atomics. rL353966: [Tests] RMW folding tests w/unordered atomic operations. rL353972: [Tests] More unordered atomic lowering tests. rL353989: [SelectionDAG] Inline a single use helper function, and remove last non-MMO interface rL354740: [Hexagon, SystemZ] Be super conservative about atomics rL354800: [Lanai] Be super conservative about atomics rL354845: [ARM] Be super conservative about atomics Attention Out of Tree Backend Owners: This patch may break you. If it does, you can use the TLI getMMOFlags hook to restore the MOVolatile to any instruction you need to. (See llvm-dev thread titled "PSA: Changes to how atomics are handled in backends" started Feb 27, 2019.)
Differential Revision: https://reviews.llvm.org/D57601 llvm-svn: 355025	2019-02-27 20:20:08 +00:00
Simon Pilgrim	1001a6ab03	[X86][AVX] Pull out some INSERT_SUBVECTOR combines into a combineConcatVectorOps helper. NFCI A lot of the INSERT_SUBVECTOR combines can be more generally handled as if they have come from a CONCAT_VECTORS node. I've been investigating adding a CONCAT_VECTORS combine to X86, but this is a much easier first step that avoids the issue of handling a number of pre-legalization issues that I've encountered. Differential Revision: https://reviews.llvm.org/D58583 llvm-svn: 355015	2019-02-27 18:46:32 +00:00
Dmitry Preobrazhensky	7904231edb	[AMDGPU][MC] Added register size check for VOP3/SDWA/DPP operands See bug 37943: https://bugs.llvm.org/show_bug.cgi?id=37943 Reviewers: artem.tamazov, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D58287 llvm-svn: 354974	2019-02-27 13:58:48 +00:00
Dmitry Preobrazhensky	ef92035827	[AMDGPU][MC][GFX8+] Added syntactic sugar for 'vgpr index' operand of instructions s_set_gpr_idx_on and s_set_gpr_idx_mode See bug 39331: https://bugs.llvm.org/show_bug.cgi?id=39331 Reviewers: artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D58288 llvm-svn: 354969	2019-02-27 13:12:12 +00:00
Simon Pilgrim	71bb6850cf	[X86][AVX] Only combine loads to broadcasts for legal types Thanks to @echristo for spotting this. llvm-svn: 354961	2019-02-27 11:17:25 +00:00
Yonghong Song	cc290a9e91	[BPF] Don't fail for static variables Currently, the LLVM will print an error like Unsupported relocation: try to compile with -O2 or above, or check your static variable usage if user defines more than one static variables in a single ELF section (e.g., .bss or .data). There is ongoing effort to support static and global variables in libbpf and kernel. This patch removed the assertion so user programs with static variables won't fail compilation. The static variable in-section offset is written to the "imm" field of the corresponding to-be-relocated bpf instruction. Below is an example to show how the application (e.g., libbpf) can relate variable to relocations. -bash-4.4$ cat g1.c static volatile long a = 2; static volatile int b = 3; int test() { return a + b; } -bash-4.4$ clang -target bpf -O2 -c g1.c -bash-4.4$ llvm-readelf -r g1.o Relocation section '.rel.text' at offset 0x158 contains 2 entries: Offset Info Type Symbol's Value Symbol's Name 0000000000000000 0000000400000001 R_BPF_64_64 0000000000000000 .data 0000000000000018 0000000400000001 R_BPF_64_64 0000000000000000 .data -bash-4.4$ llvm-readelf -s g1.o Symbol table '.symtab' contains 6 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS g1.c 2: 0000000000000000 8 OBJECT LOCAL DEFAULT 4 a 3: 0000000000000008 4 OBJECT LOCAL DEFAULT 4 b 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 5: 0000000000000000 64 FUNC GLOBAL DEFAULT 2 test -bash-4.4$ llvm-objdump -d g1.o g1.o: file format ELF64-BPF Disassembly of section .text: 0000000000000000 test: 0: 18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0 ll 2: 79 11 00 00 00 00 00 00 r1 = (u64 )(r1 + 0) 3: 18 02 00 00 08 00 00 00 00 00 00 00 00 00 00 00 r2 = 8 ll 5: 61 20 00 00 00 00 00 00 r0 = (u32 )(r2 + 0) 6: 0f 10 00 00 00 00 00 00 r0 += r1 7: 95 00 00 00 00 00 00 00 exit -bash-4.4$ . 
from symbol table, static variable "a" is in section #4, offset 0. . from symbol table, static variable "b" is in section #4, offset 8. . the first relocation is against symbol #4: 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 and in-section offset 0 (see llvm-objdump result) . the second relocation is against symbol #4: 4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 and in-section offset 8 (see llvm-objdump result) . therefore, the first relocation is for variable "a", and the second relocation is for variable "b". Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> llvm-svn: 354954	2019-02-27 05:36:15 +00:00
Heejin Ahn	82da1ffc16	[WebAssembly] Fix ScopeTops info in CFGStackify for EH pads Summary: When creating `ScopeTops` info for `try` ~ `catch` ~ `end_try`, we should create not only `end_try` -> `try` mapping but also `catch` -> `try` mapping as well. If this is not created, `block` and `end_block` markers later added may span across an existing `catch`, resulting in the incorrect code like: ``` try block --| (X) catch | end_block --| end_try ``` Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58605 llvm-svn: 354945	2019-02-27 01:35:14 +00:00
Heejin Ahn	cf699b4534	[WebAssembly] Remove unnecessary instructions after TRY marker placement Summary: This removes unnecessary instructions after TRY marker placement. There are two cases: - `end`/`end_block` can be removed if they overlap with `try`/`end_try` and they have the same return types. - `br` right before `catch` that branches to after `end_try` can be deleted. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58591 llvm-svn: 354939	2019-02-27 00:50:53 +00:00
Jonas Paulsson	129826cd9f	[SystemZ] Pass regalloc hints to help Load-and-Test transformations. Since there is no "Load-and-Test-High" instruction, the 32 bit load of a register to be compared with 0 can only be implemented with LT if the virtual GRX32 register ends up in a low part (GR32 register). This patch detects these cases and passes the GR32 registers (low parts) as (soft) hints in getRegAllocationHints(). Review: Ulrich Weigand. llvm-svn: 354935	2019-02-27 00:18:28 +00:00
Stanislav Mekhanoshin	da1628eb67	[AMDGPU] Fixed hang during DAG combine SITargetLowering::reassociateScalarOps() does not touch constants so that DAGCombiner::ReassociateOps() does not revert the combine. However a global address is not a ConstantSDNode. Switched to the method used by DAGCombiner::ReassociateOps() itself to detect constants. Differential Revision: https://reviews.llvm.org/D58695 llvm-svn: 354926	2019-02-26 20:56:25 +00:00
Reid Kleckner	8fda7e15e6	[X86] Fix bug in vectorcall calling convention Original implementation can't correctly handle __m256 and __m512 types passed by reference through stack. This patch fixes it. Patch by Wei Xiao! Differential Revision: https://reviews.llvm.org/D57643 llvm-svn: 354921	2019-02-26 19:48:16 +00:00
Petar Avramovic	bd39569913	[MIPS GlobalISel] Select G_UADDO Lower G_UADDO. Legalize G_UADDO for MIPS32 Differential Revision: https://reviews.llvm.org/D58671 llvm-svn: 354900	2019-02-26 17:22:42 +00:00
Ganesh Gopalasubramanian	e172d7008d	[X86] AMD znver2 enablement This patch enables the following 1) AMD family 17h "znver2" tune flag (-march, -mcpu). 2) ISAs that are enabled for "znver2" architecture. 3) For the time being, it uses the znver1 scheduler model. 4) Tests are updated. 5) Scheduler descriptions are yet to be put in place. Reviewers: craig.topper Differential Revision: https://reviews.llvm.org/D58343 llvm-svn: 354897	2019-02-26 16:55:10 +00:00
Jonas Paulsson	c110b5b69f	[SystemZ] Wait with selection of legal vector/FP constants until Select(). This patch aims to make sure that any such constant that can be generated with a vector instruction (for example VGBM) is recognized as such during legalization and kept as a target independent node through post-legalize DAGCombining. Two new functions named isVectorConstantLegal() and loadVectorConstant() replace old ways of handling vector/FP constants. A new struct named SystemZVectorConstantInfo is used to cache the results of isVectorConstantLegal() and pass them onto loadVectorConstant(). Support for fp128 constants in the presence of FeatureVectorEnhancements1 (z14) has been added. Review: Ulrich Weigand https://reviews.llvm.org/D58270 llvm-svn: 354896	2019-02-26 16:47:59 +00:00
Simon Atanasyan	8cb497027d	[mips] Emit `.module softfloat` directive This change fixes crash on an assertion in case of using `soft float` ABI for mips32r6 target. llvm-svn: 354882	2019-02-26 14:45:17 +00:00
Igor Kudrin	2d3faad706	[llvm-objdump] Implement -Mreg-names-raw/-std options. The --disassembler-options, or -M, are used to customize the disassembler and affect its output. The two implemented options allow selecting register names on ARM: * With -Mreg-names-raw, the disassembler uses rNN for all registers. * With -Mreg-names-std it prints sp, lr and pc for r13, r14 and r15, which is the default behavior of llvm-objdump. Differential Revision: https://reviews.llvm.org/D57680 llvm-svn: 354870	2019-02-26 12:15:14 +00:00
Luke Cheeseman	9e285bef2b	[ARM] Add Cortex-M35P - Add LLVM backend support for Cortex-M35P - Documentation can be found at https://developer.arm.com/products/processors/cortex-m/cortex-m35p Differential Revision: https://reviews.llvm.org/D57763 llvm-svn: 354868	2019-02-26 12:02:12 +00:00
Dan Gohman	c71132c0be	[WebAssembly] Properly align fp128 arguments in outgoing varargs arguments For outgoing varargs arguments, it's necessary to check the OrigAlign field of the corresponding OutputArg entry to determine argument alignment, rather than just computing an alignment from the argument value type. This is because types like fp128 are split into multiple argument values, with narrower types that don't reflect the ABI alignment of the full fp128. This fixes the printf("printfL: %4.*Lf\n", 2, lval); testcase. Differential Revision: https://reviews.llvm.org/D58656 llvm-svn: 354846	2019-02-26 05:20:19 +00:00
Philip Reames	38b14e33a8	[ARM] Be super conservative about atomics As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that. Differential Revision: https://reviews.llvm.org/D58490 Note: D58498 landed in several pieces as individual backends were approved. This is the last chunk. llvm-svn: 354845	2019-02-26 04:30:33 +00:00
Heejin Ahn	d2a56ac661	[WebAssembly] Fix a bug deleting instruction in a ranged for loop Summary: We shouldn't delete elements while iterating a ranged for loop. Reviewers: dschuff Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58519 llvm-svn: 354844	2019-02-26 04:08:49 +00:00
Reid Kleckner	2f055f026a	[X86] Fix bug in x86_intrcc with arg copy elision Summary: Use a custom calling convention handler for interrupts instead of fixing up the locations in LowerMemArgument. This way, the offsets are correct when constructed and we don't need to account for them in as many places. Depends on D56883 Replaces D56275 Reviewers: craig.topper, phil-opp Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D56944 llvm-svn: 354837	2019-02-26 02:11:25 +00:00
Matt Arsenault	752579736e	RegBankSelect: Handle slightly more complex value mappings Try to use concat_vectors. Also remove unnecessary assert on pointers. Fixes asserting for <4 x s16> operations and 64-bit pointers for AMDGPU. llvm-svn: 354828	2019-02-25 22:24:13 +00:00
Matt Arsenault	f4bfe4cd17	AMDGPU/GlobalISel: Fix bit ops for non-power-of-2 sizes llvm-svn: 354825	2019-02-25 21:32:48 +00:00
Matt Arsenault	82b103998b	AMDGPU/GlobalISel: Clamp max implicit_def elements llvm-svn: 354818	2019-02-25 20:46:06 +00:00
Matt Arsenault	f97ace5639	AMDGPU: Remove IntrReadMem from memtime/memrealtime intrinsics EarlyCSE with MemorySSA was able to use this to merge multiple calls with no intervening store. llvm-svn: 354814	2019-02-25 20:16:11 +00:00
Craig Topper	316c58e8f1	[X86] Improve detection of unneeded shift amount masking to also handle the case that the LHS has known zeroes in it If the LHS has known zeros, the RHS immediate will have had bits removed. So call computeKnownBits to get the known zeroes so we can handle this case. Differential Revision: https://reviews.llvm.org/D58475 llvm-svn: 354811	2019-02-25 19:42:47 +00:00
Matt Arsenault	fd6fd00773	AMDGPU: Correct definitions for bitset instructions These really read and write the result register, so these need a tied input. llvm-svn: 354809	2019-02-25 19:24:46 +00:00
Nikita Popov	fcbd7f6495	[Mips] Fix missing masking in fast-isel of br (PR40325) Fixes https://bugs.llvm.org/show_bug.cgi?id=40325 by zero extending (and x, 1) the condition before branching on it. To avoid regressing trivial cases, I'm combining emission of cmp+br sequences for the single-use + same block case (similar to what we do in x86). icmpbr1.ll still regresses due to the cross-bb usage of the condition. Differential Revision: https://reviews.llvm.org/D58576 llvm-svn: 354808	2019-02-25 18:54:17 +00:00
Amara Emerson	6bcfa1c419	[AArch64][GlobalISel] Refactor selectBuildVector to use MachineIRBuilder. NFC. This is a preparatory change as I want to use emitScalarToVector() elsewhere, and in general we want to transition to MIRBuilder instead of using BuildMI directly. Differential Revision: https://reviews.llvm.org/D58528 llvm-svn: 354807	2019-02-25 18:52:54 +00:00
Philip Reames	a64de6720b	[Lanai] Be super conservative about atomics As requested during review of D57601 <https://reviews.llvm.org/D57601>, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that. Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review. llvm-svn: 354800	2019-02-25 17:36:10 +00:00
David Green	b504f104b2	[ARM] Add some more missing T1 opcodes for the peephole optimiser This adds a few extra Thumb1 opcodes to improve the peephole optimiser's ability to remove redundant cmp instructions. tADC and tSBC require a small fixup to prevent MOVS being moved past the instruction, giving the wrong flags. Differential Revision: https://reviews.llvm.org/D58281 llvm-svn: 354791	2019-02-25 15:50:54 +00:00
Luke Cheeseman	59f77e7891	[AArch64] Add support for Cortex-A76 and Cortex-A76AE - Add LLVM backend support for Cortex-A76 and Cortex-A76AE - Documentation can be found at https://developer.arm.com/products/processors/cortex-a/cortex-a76 llvm-svn: 354788	2019-02-25 15:08:27 +00:00
Simon Pilgrim	c61f1e8e6c	[X86] Merge ISD::ADD/SUB nodes into X86ISD::ADD/SUB equivalents (PR40483) Avoid ADD/SUB instruction duplication by reusing the X86ISD::ADD/SUB results. Includes ADD commutation - I tried to include NEG+SUB SUB commutation as well but this causes regressions as we don't have good combine coverage to simplify X86ISD::SUB. Differential Revision: https://reviews.llvm.org/D58597 llvm-svn: 354771	2019-02-25 11:19:37 +00:00
Simon Tatham	b70fc0c5fd	[ARM] Make fullfp16 instructions not conditionalisable. More or less all the instructions defined in the v8.2a full-fp16 extension are defined as UNPREDICTABLE if you put them in an IT block (Thumb) or use with any condition other than AL (ARM). LLVM didn't know that, and was happy to conditionalise them. In order to force these instructions to count as not predicable, I had to make a small Tablegen change. The code generation back end mostly decides if an instruction was predicable by looking for something it can identify as a predicate operand; there's an isPredicable bit flag that overrides that check in the positive direction, but nothing that overrides it in the negative direction. (I considered the alternative approach of actually removing the predicate operand from those instructions, but thought that it would be more painful overall for instructions differing only in data type to have different shapes of operand list. This way, the only code that has to notice the difference is the if-converter.) So I've added an isUnpredicable bit alongside isPredicable, and set that bit on the right subset of FP16 instructions, and also on the VSEL, VMAXNM/VMINNM and VRINT[ANPM] families which should be unpredicable for all data types. I've included a couple of representative regression tests, both of which previously caused an fp16 instruction to be conditionalised in ARM state and (with -arm-no-restrict-it) to be put in an IT block in Thumb. Reviewers: SjoerdMeijer, t.p.northover, efriedma Reviewed By: efriedma Subscribers: jdoerfert, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57823 llvm-svn: 354768	2019-02-25 10:39:53 +00:00
Kang Zhang	4faa4090c9	[PowerPC] Enhance the fast selection of fptoi & fptrunc instruction and clean up related asserts Summary: Fast selection of llvm fptoi & fptrunc instructions does not handle VSX instruction support well. We'd use VSX float convert integer instruction instead of non-vsx float convert integer instruction if the operand register class is VSSRC or VSFRC because i32 and i64 are mapped to VSSRC and VSFRC correspondingly if VSX feature is enabled. For float trunc instruction, we do similar work like float convert integer instruction to try to use VSX instruction. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D58430 llvm-svn: 354762	2019-02-25 02:46:16 +00:00
Simon Pilgrim	cfaf663a35	[X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637) It's proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step. llvm-svn: 354757	2019-02-24 19:57:52 +00:00
Craig Topper	3fe4bd464c	[X86] Fix tls variable lowering issue with large code model Summary: The problem here is the lowering for tls variable. Below is the DAG for the code. SelectionDAG has 11 nodes: t0: ch = EntryToken t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64 t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 t12: i64 = add t8, t11 t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64 t6: ch = CopyToReg t0, Register:i32 %0, t4 And when mcmodel is large, below instruction can NOT be folded. t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10] t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64 So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0" When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails. The patch is to fold the load and X86ISD::WrapperRIP. Fixes PR26906 Patch by LuoYuanke Reviewers: craig.topper, rnk, annita.zhang, wxiao3 Reviewed By: rnk Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58336 llvm-svn: 354756	2019-02-24 19:33:37 +00:00
Craig Topper	5532a98737	[X86][SSE] Use pblendw for v4i32/v2i64 during isel. Summary: Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58574 llvm-svn: 354755	2019-02-24 19:23:41 +00:00
Craig Topper	ce2bd19c49	[X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake. Summary: The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate. The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate. Reviewers: RKSimon, andreadb Reviewed By: RKSimon Subscribers: gbedwell, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58581 llvm-svn: 354754	2019-02-24 19:23:39 +00:00
Craig Topper	be3348573e	[LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents Summary: When promoting the overflow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it. We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used. Reviewers: spatel, RKSimon, nikic Reviewed By: nikic Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58567 llvm-svn: 354753	2019-02-24 19:23:36 +00:00
Simon Pilgrim	4f4f9abdfa	[X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC. Name better matches the other similar 'lane permute' and 'repeated mask' functions we have. llvm-svn: 354749	2019-02-24 17:30:06 +00:00
Heejin Ahn	20cf0749cb	[WebAssembly] Rename a variable in CFGStackify (NFC) llvm-svn: 354744	2019-02-24 08:30:06 +00:00
Heejin Ahn	25d924b41f	[WebAssembly] Merge two identical switch case routines into one (NFC) llvm-svn: 354743	2019-02-24 08:19:55 +00:00
Philip Reames	33d7e49bb7	[Hexagon, SystemZ] Be super conservative about atomics As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that. Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review. llvm-svn: 354740	2019-02-24 00:45:09 +00:00
Craig Topper	be9eeb5526	Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" And its follow ups r354511, r354640. A follow patch will fix the issue that caused it to be reverted. llvm-svn: 354737	2019-02-23 21:41:42 +00:00
Craig Topper	ccc860cb81	Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract" r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit." These were reverted in r354713 as their context depended on other patches that were reverted for a bug. llvm-svn: 354734	2019-02-23 19:51:32 +00:00
Nikita Popov	e661f946a7	[WebAssembly] Fix select of and (PR40805) Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by patterns added in D53676. I'm removing the patterns entirely here, as they are not correct in the general case. If necessary something more specific can be added in the future. Differential Revision: https://reviews.llvm.org/D58575 llvm-svn: 354733	2019-02-23 18:59:01 +00:00
Simon Pilgrim	f383a47b7d	[X86][AVX] combineInsertSubvector - remove concat_vectors(load(x),load(x)) --> sub_vbroadcast(x) D58053/rL354340 added this to EltsFromConsecutiveLoads directly llvm-svn: 354732	2019-02-23 18:53:03 +00:00
Simon Pilgrim	398d0b9e96	Fix MSVC constant truncation warnings. NFCI. llvm-svn: 354731	2019-02-23 18:49:02 +00:00
Simon Pilgrim	e08f177ea2	[X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x) For AVX1, limit this to i32/f32/i64/f64 loading cases only. llvm-svn: 354730	2019-02-23 18:34:05 +00:00
Simon Pilgrim	31793733a0	[X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 it's just a single VPERMPD), followed by a BLENDPD. llvm-svn: 354729	2019-02-23 17:10:47 +00:00
Craig Topper	75afc0105c	[X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel. Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t. This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost. The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results. llvm-svn: 354724	2019-02-23 08:34:10 +00:00
Jordan Rupprecht	6387fa2715	[NFC] Fix typos: preceeding -> preceding llvm-svn: 354715	2019-02-23 01:28:32 +00:00
Reid Kleckner	e3876637cf	Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types" r354363 caused https://crbug.com/934963#c1, which has a plain C reduced test case. I also had to revert some dependent changes: - r354648 - r354647 - r354640 - r354511 llvm-svn: 354713	2019-02-23 01:19:42 +00:00
Craig Topper	a9697f24cf	[X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalize to 128-bits. If the input type will be promoted to 128 bits it's better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register. Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization. llvm-svn: 354709	2019-02-23 00:35:02 +00:00
Konstantin Zhuravlyov	9a278bf6b5	Revert "AMDGPU/NFC: Cleanup subtarget predicates" It breaks one of our downstream merges, so revert it temporarily while investigating failures downstream llvm-svn: 354700	2019-02-22 23:21:06 +00:00
Sam Clegg	8fffa1dfa3	[WebAssembly] Remove unneeded MCSymbolRefExpr variants We record the type of the symbol (event/function/data/global) in the MCWasmSymbol and so it should always be clear how to handle a relocation based on the symbol itself. The exception is a function which still needs the special @TYPEINDEX when the relocation contains the signature rather than the address of the function. Differential Revision: https://reviews.llvm.org/D58472 llvm-svn: 354697	2019-02-22 22:29:34 +00:00
Matt Arsenault	476e26b5d3	AMDGPU: Use removeAllRegUnitsForPhysReg llvm-svn: 354686	2019-02-22 19:03:36 +00:00
Sam Clegg	a5e68748bf	[WebAssembly] Remove debug statement submitted in rL354657 Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58549 llvm-svn: 354684	2019-02-22 19:00:03 +00:00
Sanjay Patel	a9e289174a	[x86] allow narrowing of vector UINT_TO_FP As discussed in: D56864 D58197 Always use the narrow (128-bit) instruction when possible. We already had the signed int version of this transform. llvm-svn: 354675	2019-02-22 15:47:45 +00:00
Sanjay Patel	1baf7896cc	[x86] simplify code in combineExtractSubvector; NFC Only the 1st fold is attempted pre-legalization, but it requires legal (simple) types too, so we don't need an EVT in any of the code. llvm-svn: 354674	2019-02-22 15:28:22 +00:00
Petar Jovanovic	6083106b12	[mips][micromips] fix filling delay slots for PseudoIndirectBranch_MM Filling a delay slot in 32bit jump instructions with a 16bit instruction can cause issues. According to the documentation such an operation is unpredictable. This patch adds opcode Mips::PseudoIndirectBranch_MM alongside Mips::PseudoIndirectBranch and other instructions that are expanded to jr instruction and do not allow a 16bit instruction in their delay slots. Patch by Mirko Brkusanin. Differential Revision: https://reviews.llvm.org/D58507 llvm-svn: 354672	2019-02-22 14:53:58 +00:00
David Green	acb628b2af	[ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Reapplying this after the first attempt broke non-thumb1 code as the t2ADDri instruction can be used with frame indices. In thumb1 we use tADDframe. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354667	2019-02-22 12:23:31 +00:00
Diana Picus	35e1c6663c	[ARM GlobalISel] Support floating point for Thumb2 This is exactly the same as arm mode, so for the instruction selector tests we just extract them to a new file and run with the same checks for both arm and thumb mode. For the legalizer we need to update the tests for soft float a bit, but only because BL and tBL are slightly different. We could be pedantic and check that we get a well-formed BL for arm mode and a tBL for thumb, but for the purposes of the legalizer test it's sufficient to just skip over the predicate operands in the checks. Also note that we have the pedantic checks in the divmod test, so we're covered. llvm-svn: 354665	2019-02-22 09:54:54 +00:00
Heejin Ahn	85631d8b50	[WebAssembly] Remove getBottom function from CFGStackify (NFC) Summary: This removes `getBottom` function and the bookkeeping map of <begin marker instruction, bottom BB>. Reviewers: dschuff Subscribers: sunfish, sbc100, jgravelle-google, jdoerfert, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58319 llvm-svn: 354657	2019-02-22 07:19:30 +00:00
Craig Topper	3a391fc0e8	[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit. Type legalization is causing two nodes to be created here, but we can use a single node to extend from v8i16 to v2i64. llvm-svn: 354648	2019-02-22 01:49:53 +00:00
Craig Topper	427404c769	[X86] Fix some copy/paste mistakes that caused a VR128 to be used as the address of a load in an isel pattern This was introduced in r354511. Fixes PR40811. llvm-svn: 354640	2019-02-22 00:04:35 +00:00
Matt Arsenault	aa6fb4c45e	AMDGPU: Remove debugger related subtarget features As far as I know these aren't needed anymore. llvm-svn: 354634	2019-02-21 23:27:46 +00:00
Craig Topper	2b34fdc67f	[X86] Remove hasSideEffects=1 from the X87 pseudos with folded load. This was done in r321424 to prevent scheduling from reordering things. But now that we model FPCW as a dependency, I don't think the same scheduling we were trying to prevent can occur. llvm-svn: 354628	2019-02-21 22:00:15 +00:00
Konstantin Zhuravlyov	c2650178a1	AMDGPU/NFC: Cleanup subtarget predicates Differential Revision: https://reviews.llvm.org/D58522 llvm-svn: 354620	2019-02-21 20:43:43 +00:00
Sanjay Patel	234a5e8ea4	[x86] vectorize more cast ops in lowering to avoid register file transfers This is a follow-up to D56864. If we're extracting from a non-zero index before casting to FP, then shuffle the vector and optionally narrow the vector before doing the cast: cast (extelt V, C) --> extelt (cast (extract_subv (shuffle V, [C...]))), 0 This might be enough to close PR39974: https://bugs.llvm.org/show_bug.cgi?id=39974 Differential Revision: https://reviews.llvm.org/D58197 llvm-svn: 354619	2019-02-21 20:40:39 +00:00
Amara Emerson	1abe05c0dd	Re-land "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR" Thanks to Richard Trieu for pointing out that the failures were due to a use-after-free of an ArrayRef. llvm-svn: 354616	2019-02-21 20:20:16 +00:00
Krzysztof Parzyszek	f6e875bacf	[Hexagon] Use misaligned load instead of trap0(#0 ) for __builtin_trap The trap instruction is intercepted by various runtime environments, and instead of a crash it creates confusion. This reapplies r354606 with a fix. llvm-svn: 354611	2019-02-21 19:42:39 +00:00
Krzysztof Parzyszek	948c9f93c4	Revert r354606, it breaks asan tests llvm-svn: 354609	2019-02-21 19:33:58 +00:00
Krzysztof Parzyszek	5f47fac3a2	[Hexagon] Use misaligned load instead of trap0(#0 ) for __builtin_trap The trap instruction is intercepted by various runtime environments, and instead of a crash it creates confusion. llvm-svn: 354606	2019-02-21 18:39:22 +00:00
Mark Searles	599ce44d3f	[AMDGPU] remove unused AssemblerPredicates An internal build is hitting asserts complaining about too many subtarget features: llvm/utils/TableGen/Types.cpp:42: const char* llvm::getMinimalTypeForEnumBitfield(uint64_t): Assertion `MaxIndex <= 64 && "Too many bits"' failed. llvm/utils/TableGen/AsmMatcherEmitter.cpp:1476: void {anonymous}::AsmMatcherInfo::buildInfo(): Assertion `SubtargetFeatures.size() <= 64 && "Too many subtarget features!"' failed. The short-term solution is to remove a few unused AssemblerPredicates to get under the limit. The long-term solution seems to be to revisit these asserts. E.g., rather than hardcoded '64', use the standard sized std::bitset like the other places that track subtarget features. Differential Revision: https://reviews.llvm.org/D58516 llvm-svn: 354604	2019-02-21 18:19:54 +00:00
Matt Arsenault	2e0ee47712	AMDGPU/GlobalISel: Make phis legal llvm-svn: 354592	2019-02-21 15:48:13 +00:00
Nirav Dave	dce91c1edb	[X86] Fix copy-paste error in @ccz flag. @ccz operand should be equivalent to @cce. llvm-svn: 354588	2019-02-21 15:28:31 +00:00
Matt Arsenault	b10fa8df3f	AMDGPU/GlobalISel: Fix bit count ops for non-power-of-2 types llvm-svn: 354587	2019-02-21 15:22:20 +00:00
Alex Bradbury	db67be889d	[RISCV][NFC] IsEligibleForTailCallOptimization -> isEligibleForTailCallOptimization Also clang-format the modified hunks. llvm-svn: 354584	2019-02-21 14:31:41 +00:00
Alex Bradbury	047170cfc3	[RISCV] Add implied zero offset load/store alias patterns Allow load/store instructions with implied zero offset for compatibility with GNU assembler. Differential Revision: https://reviews.llvm.org/D57141 Patch by James Clarke. llvm-svn: 354581	2019-02-21 14:09:34 +00:00
Diana Picus	dcaa939ab7	[ARM GlobalISel] Support G_FRAME_INDEX for Thumb2 Same as arm mode. llvm-svn: 354579	2019-02-21 13:00:02 +00:00
Simon Pilgrim	e6b338cbef	[X86][SSE] combineX86ShufflesRecursively - moved to generic op input index lookup. NFCI. We currently bail if the target shuffle decodes to more than 2 input vectors, this change alters the input index to work for any number of inputs for when we drop that requirement. llvm-svn: 354575	2019-02-21 12:24:49 +00:00
David Green	7a183a86be	Revert 354564: [ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs I believe it's causing bootstrap failures for A32 code. I'll take a look at what's wrong. llvm-svn: 354569	2019-02-21 11:03:13 +00:00
David Spickett	8ac2b181a1	[AArch64] Print instruction before atomic semantic annotations Commit r353303 added annotations when acquire semantics were dropped from an instruction. printAnnotation was called before printInstruction. So if you didn't set a separate comment output stream you got <comment><instr> instead of <instr><comment> as expected. To fix this move the new printAnnotation to after the instruction is printed. Differential Revision: https://reviews.llvm.org/D58059 llvm-svn: 354565	2019-02-21 10:42:49 +00:00
David Green	89efe24eba	[ARM] Add some missing thumb1 opcodes to enable peephole optimisation of CMPs This adds a number of missing Thumb1 opcodes so that the peephole optimiser can remove redundant CMP instructions. Differential Revision: https://reviews.llvm.org/D57833 llvm-svn: 354564	2019-02-21 10:30:09 +00:00
Sam Parker	6ed47bee27	[ARM] Negative constants mishandled in ARM CGP During type promotion, sometimes we convert an add with a negative constant into a sub with a positive constant. The loop that performs this transformation has two issues: - it iterates over a set, causing non-determinism. - it breaks, instead of continuing, when it finds the first non-negative operand. Differential Revision: https://reviews.llvm.org/D58452 llvm-svn: 354557	2019-02-21 09:33:18 +00:00
Sam Clegg	1634516e35	[WebAssembly] Default to something reasonable in WebAssemblyAddMissingPrototypes Previously if we couldn't derive a prototype for a "no-prototype" function from C we would leave it as is: void foo(...) With this change we instead give it an empty signature and remove the "no-prototype" attribute. This fixes the current wasm waterfall test failure. Tags: #llvm Differential Revision: https://reviews.llvm.org/D58488 llvm-svn: 354544	2019-02-21 03:27:00 +00:00
Stanislav Mekhanoshin	42e229e130	[AMDGPU] fix commuted case of sub combine Differential Revision: https://reviews.llvm.org/D58481 llvm-svn: 354543	2019-02-21 02:58:00 +00:00
Amara Emerson	71f2a5e60f	Revert "[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR" This reverts r354521 because it broke the bots, but passes on Darwin somehow. llvm-svn: 354532	2019-02-21 00:31:13 +00:00
Sam Clegg	6028c969ac	[WebAssembly] Don't error on conflicting uses of prototype-less functions When we can't determine with certainty the signature of a function import we pick the first signature we find rather than error'ing out. The resulting program might not do what is expected since we might pick the wrong signature. However, since it is undefined behavior in C to use the same function with different signatures, this seems better than refusing to compile such programs. Fixes PR40472 Differential Revision: https://reviews.llvm.org/D58304 llvm-svn: 354523	2019-02-20 22:40:57 +00:00
Amara Emerson	a946d057b4	[AArch64][GlobalISel] Implement partial support for G_SHUFFLE_VECTOR This change makes some basic type combinations for G_SHUFFLE_VECTOR legal, and implements them with a very pessimistic TBL2 instruction in the selector. For TBL2, support is also needed to generate constant pool entries and load from them in order to materialize the mask register. Currently supports <2 x s64> and <4 x s32> result types. Differential Revision: https://reviews.llvm.org/D58466 llvm-svn: 354521	2019-02-20 22:11:39 +00:00
Tom Stellard	79b5c3842b	AMDGPU/GlobalISel: Move SMRD selection logic to TableGen Reviewers: arsenm Reviewed By: arsenm Subscribers: volkan, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D52922 llvm-svn: 354516	2019-02-20 21:02:37 +00:00
Nikita Popov	c3b496de7a	[SDAG] Support vector UMULO/SMULO Second part of https://bugs.llvm.org/show_bug.cgi?id=40442. This adds an extra UnrollVectorOverflowOp() method to SDAG, because the general UnrollOverflowOp() method can't deal with multiple results. Additionally we need to expand UMULO/SMULO during vector op legalization, as it may result in unrolling, which may need additional type legalization. Differential Revision: https://reviews.llvm.org/D57997 llvm-svn: 354513	2019-02-20 20:41:44 +00:00
Craig Topper	31823fba2e	[X86] Add more load folding patterns for blend instructions as a follow up to r354363. This avoids depending on the peephole pass to do load folding. Also adds some load folding for some insert_subvector patterns that use blend. All of this was found by temporarily adding TB_NO_FORWARD to the blend immediate entries in the load folding tables. I've added -disable-peephole to some of the affected tests from that experiment to ensure we're testing isel patterns. llvm-svn: 354511	2019-02-20 20:18:20 +00:00
Simon Pilgrim	dca47c659c	[X86][SSE] combineX86ShufflesRecursively - begin generalizing the number of shuffle inputs. NFCI. We currently bail if the target shuffle decodes to more than 2 input vectors, this is some initial cleanup that still has the limit but generalizes the opindices to an array that will be necessary when we drop the limit. llvm-svn: 354489	2019-02-20 17:58:29 +00:00
Matt Arsenault	75e30c4d5d	GlobalISel: Fix fewerElementsVector for ctlz with different result type Also complete the set of related operations. llvm-svn: 354480	2019-02-20 16:42:52 +00:00
Matt Arsenault	c4d07554e4	GlobalISel: Implement moreElementsVector for g_insert results llvm-svn: 354477	2019-02-20 16:11:22 +00:00
Krzysztof Parzyszek	6128ac5a8f	[Hexagon] Split vector pairs for ISD::SIGN_EXTEND and ISD::ZERO_EXTEND llvm-svn: 354473	2019-02-20 15:05:19 +00:00
Petar Avramovic	dee5846b4a	[MIPS MSA] Avoid some DAG combines for vector shifts DAG combiner combines two shifts into shift + and with bitmask. Avoid such combines for vectors since leaving two vector shifts as they are produces better end results. Differential Revision: https://reviews.llvm.org/D58225 llvm-svn: 354461	2019-02-20 13:42:44 +00:00
Craig Topper	e4025c5eb1	[X86] Remove FeatureSlowIncDec from Sandy Bridge and later Intel Core CPUs Summary: Inc and Dec were at one point slow on Intel CPUs due to their tendency to cause partial flag stalls on P6 derived CPU cores. This is because these instructions are defined to preserve the carry flag. This partial flag stall issue persisted until Sandy Bridge when flag merging was changed to be handled as a data dependency instead of as a stall until retirement. Sandy Bridge and later CPUs rename the C flag separately from OSPAZ so there is no flag merge needed on INC/DEC to preserve the C flag. Given these improvements I don't know why INC/DEC was ever considered slow on Sandy Bridge. If anything they should have been disabled on the earlier CPUs instead. Note after this patch, INC/DEC are still considered slow on Silvermont, Goldmont, Knights Landing and our generic "x86-64" CPU. Reviewers: spatel, RKSimon, chandlerc Reviewed By: chandlerc Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D58412 llvm-svn: 354436	2019-02-20 05:39:11 +00:00
Eric Christopher	2534592b9f	Temporarily Revert "[X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)" As this has broken the lto bootstrap build for 3 days and is showing a significant regression on the Dither_benchmark results (from the LLVM benchmark suite) -- specifically, on the BENCHMARK_FLOYD_DITHER_128, BENCHMARK_FLOYD_DITHER_256, and BENCHMARK_FLOYD_DITHER_512; the others are unchanged. These have regressed by about 28% on Skylake, 34% on Haswell, and over 40% on Sandybridge. This reverts commit r353923. llvm-svn: 354434	2019-02-20 04:42:07 +00:00
Kito Cheng	303217e8b4	[RISCV] Implement pseudo instructions for load/store from a symbol address. Summary: These pseudo-instructions allow load/store instructions to load from or store to a symbol, and they always use PC-relative addressing to generate the symbol address. Reviewers: asb, apazos, rogfer01, jrtc27 Differential Revision: https://reviews.llvm.org/D50496 llvm-svn: 354430	2019-02-20 03:31:32 +00:00
Chen Zheng	ffece2dfcf	[PowerPC] exploit P9 instruction maddld. Differential Revision: https://reviews.llvm.org/D58364 llvm-svn: 354427	2019-02-20 02:30:06 +00:00
Heejin Ahn	20ea1826f7	[WebAssembly] Refactor atomic operation definitions (NFC) Summary: - Make `ATOMIC_I`, `ATOMIC_NRI`, `AtomicLoad`, `AtomicStore` classes and make other operations inherit from them - Factor the common opcode prefix '0xfe' out from the opcodes into the common class - Reorder instructions in the order of increasing opcodes Reviewers: tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58338 llvm-svn: 354421	2019-02-20 01:29:34 +00:00
Heejin Ahn	3477bd12a0	[WebAssembly] Fix load/store name detection for atomic instructions Summary: Fixed a bug in the routine in AsmParser that determines whether the current instruction is a load or a store. Atomic instructions' prefixes are not `atomic_` but `atomic.`, and all atomic instructions are also memory instructions. Also fixed the printing format of atomic instructions to match other memory instructions and added encoding tests for atomic instructions. Reviewers: aardappel, tlively Subscribers: dschuff, sbc100, jgravelle-google, sunfish, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58337 llvm-svn: 354419	2019-02-20 01:14:36 +00:00
Wouter van Oortmerssen	8a28ce1a12	[WebAssembly] Fixed disassembler not knowing about OPERAND_EVENT Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D58414 llvm-svn: 354416	2019-02-20 00:55:59 +00:00
Thomas Lively	2e1504091e	[WebAssembly] Update MC for bulk memory Summary: Rename MemoryIndex to InitFlags and implement logic for determining data segment layout in ObjectYAML and MC. Also adds a "passive" flag for the .section assembler directive although this cannot be assembled yet because the assembler does not support data sections. Reviewers: sbc100, aardappel, aheejin, dschuff Subscribers: jgravelle-google, hiraditya, sunfish, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D57938 llvm-svn: 354397	2019-02-19 22:56:19 +00:00


51455 Commits