llvm-project

Commit Graph

Author	SHA1	Message	Date
Craig Topper	c2e8a421ac	[X86] Don't widen 128/256-bit strict compares with vXi1 result to 512-bits on KNL. If we widen the compare we might trigger a spurious exception from the garbage data. We have two choices here. Explicitly force the upper bits to zero. Or use a legacy VEX vcmpps/pd instruction and convert the XMM/YMM result to mask register. I've chosen to go with the second option. I'm not sure which is really best. In some cases we could get rid of the zeroing since the producing instruction probably already zeroed it. But we lose the ability to fold a load. So which is best is dependent on surrounding code. Differential Revision: https://reviews.llvm.org/D74522	2020-02-13 13:26:40 -08:00
Fangrui Song	0dce409cee	[AsmPrinter] De-capitalize Emit{Function,BasicBlock}* and Emit{Start,End}OfAsmFile	2020-02-13 13:22:49 -08:00
Thomas Lively	e252293d06	[WebAssembly] Add cbrt function signatures Summary: Fixes a crash in the backend where optimizations produce calls to the cbrt runtime functions. Fixes PR 44227. Reviewers: aheejin Subscribers: dschuff, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74259	2020-02-13 13:18:42 -08:00
Matt Arsenault	5adbf7d57f	AMDGPU/GlobalISel: Make G_TRUNC legal This is required to be legal. I'm not sure how we were getting away without defining any rules for it.	2020-02-13 15:25:52 -05:00
Matt Arsenault	de256478e6	GlobalISel: Don't use LLT references These should always be passed by value	2020-02-13 15:25:30 -05:00
Frederic Bastien	019ab61e25	[NVPTX, LSV] Move the LSV optimization pass to later when the graph is cleaner This allows it to recognize more loads as being consecutive when the loads' addresses are complex at the start. Differential Revision: https://reviews.llvm.org/D74444	2020-02-13 12:15:38 -08:00
Vedant Kumar	02b72f564c	Revert "Recommit "[SCCP] Remove forcedconstant, go to overdefined instead"" This reverts commit `bb310b3f73`. This breaks the stage2 ASan build, see: https://bugs.llvm.org/show_bug.cgi?id=44898 rdar://59431448	2020-02-13 11:55:18 -08:00
Greg Clayton	e8e97b28cd	Fix buildbots that create shared libraries from GSYM library by adding a dependency on LLVMDebugInfoDWARF.	2020-02-13 11:43:07 -08:00
Greg Clayton	22d63b6318	Fix buildbots by not using "and" and "not".	2020-02-13 11:35:43 -08:00
Greg Clayton	19602b7194	Add a DWARF transformer class that converts DWARF to GSYM. Summary: The DWARF transformer is added as a class so it can be unit tested fully. The DWARF is converted to GSYM format and handles many special cases for functions: - omit functions in compile units with 4 byte addresses whose address is UINT32_MAX (dead stripped) - omit functions in compile units with 8 byte addresses whose address is UINT64_MAX (dead stripped) - omit any functions whose high PC is <= low PC (dead stripped) - StringTable builder doesn't copy strings, so we need to make backing copies of strings but only when needed. Many strings come from sections in object files and won't need to have backing copies, but some do. - When a function doesn't have a mangled name, store the fully qualified name by creating a string by traversing the parent decl context DIEs and then. If we don't do this, we end up having cases where some function might appear in the GSYM as "erase" instead of "std::vector<int>::erase". - omit any functions whose address isn't in the optional TextRanges member variable of DwarfTransformer. This allows object file to register address ranges that are known valid code ranges and can help omit functions that should have been dead stripped, but just had their low PC values set to zero. In this case we have many functions that all appear at address zero and can omit these functions by making sure they fall into good address ranges on the object file. Many compilers do this when the DWARF has a DW_AT_low_pc with a DW_FORM_addr, and a DW_AT_high_pc with a DW_FORM_data4 as the offset from the low PC. In this case the linker can't write the same address to both the high and low PC since there is only a relocation for the DW_AT_low_pc, so many linkers tend to just zero it out. Reviewers: aprantl, dblaikie, probinson Subscribers: mgorny, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74450	2020-02-13 10:48:37 -08:00
Yuanfang Chen	4ad7685258	Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""" This reverts commit `80a34ae311` with fixes. Previously, since bots turning on EXPENSIVE_CHECKS are essentially turning on MachineVerifierPass by default on X86 and the fact that inline-asm-avx-v-constraint-32bit.ll and inline-asm-avx512vl-v-constraint-32bit.ll are not expected to generate functioning machine code, this would go down to `report_fatal_error` in MachineVerifierPass. Here passing `-verify-machineinstrs=0` to make the intent explicit.	2020-02-13 10:16:06 -08:00
Yuanfang Chen	17122ec10a	Revert "Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"""" This reverts commit `bb51d24330`.	2020-02-13 10:08:05 -08:00
Yuanfang Chen	bb51d24330	Revert "Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`""" This reverts commit `80a34ae311` with fixes. On bots llvm-clang-x86_64-expensive-checks-ubuntu and llvm-clang-x86_64-expensive-checks-debian only, llc returns 0 for these two tests unexpectedly. I tweaked the RUN line a little bit in the hope that LIT is the culprit since this change is not in the codepath these tests are testing. llvm\test\CodeGen\X86\inline-asm-avx-v-constraint-32bit.ll llvm\test\CodeGen\X86\inline-asm-avx512vl-v-constraint-32bit.ll	2020-02-13 10:02:53 -08:00
Nikita Popov	f0b57d8071	[MemorySSA] Don't verify MemorySSA unless VerifyMemorySSA enabled MemorySSA is often taking up an unreasonable fraction of runtime in assertion enabled builds. Turns out that there is one code-path that runs verifyMemorySSA() even if VerifyMemorySSA is not enabled. This patch makes it conditional as well. Differential Revision: https://reviews.llvm.org/D74505	2020-02-13 18:46:58 +01:00
Matt Arsenault	bfe3779459	AMDGPU: Use v_perm_b32 to implement bswap Also greatly improve i64 lowering. LegalizeIntegerTypes does the correct narrowing if i64 isn't legal. Just workaround this for SelectionDAG by making i64 legal and splitting in the patterns.	2020-02-13 09:45:31 -08:00
John Brawn	0ec5797296	[ARM] Fix infinite loop when lowering STRICT_FP_EXTEND If the target has FP64 but not FP16 then we have custom lowering for FP_EXTEND and STRICT_FP_EXTEND with type f64. However if the extend is from f32 to f64 the current implementation will cause an infinite loop for STRICT_FP_EXTEND due to emitting a merge_values of the original node which after replacement becomes a merge_values of itself. Fix this by not doing anything for f32 to f64 extend when we have FP64, though for STRICT_FP_EXTEND we have to do the strict-to-nonstrict mutation as that doesn't happen automatically for opcodes with custom lowering. Differential Revision: https://reviews.llvm.org/D74559	2020-02-13 16:12:50 +00:00
Sean Fertile	b2d1e002ca	[PowerPC][NFC] Small cleanup to restore CR field code in PPCFrameLowering. Skip the loop over the CalleSavedInfos in 'restoreCalleeSavedRegisters' when the register is a CR field and we are not targeting 32-bit ELF. This is safe because: 1) The helper function 'restoreCRs' returns if the target is not 32-bit ELF, making all the code in the loop related to CR fields dead for every other subtarget. This code is only called on ELF right now, but the patch to extend it for AIX also needs to skip 'restoreCRs'. 2) The loop will not otherwise modify the iterator, so the iterator manipulations at the bottom of the loop end up setting 'I' to its current value. This simplification allows us to remove one argument from 'restoreCRs'. Also add a helper function to determine if a register is one of the callee saved condition register fields.	2020-02-13 09:50:28 -05:00
Simon Pilgrim	32176133fa	Move FIXME to start of comment so visual studio actually tags it. NFC.	2020-02-13 14:28:50 +00:00
Qiu Chaofan	87c773082a	[PowerPC] Exploit VSX rounding instrs for rint Exploit native VSX rounding instruction, x(v|s)r(d|s)pic, which does rounding using the current rounding mode. According to the C standard library, rint may raise the INEXACT exception while nearbyint won't. Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D72685	2020-02-13 20:59:50 +08:00
stozer	9bda7ab835	Re-revert: Recover debug intrinsics when killing duplicated/empty blocks This reverts commit `61b35e4111`. This commit causes a timeout in chromium builds; likely to have a similar cause to the previous timeout issue caused by this commit (see `6ded69f294` for more details). It is possible that there is no way to fix this bug that will not cause this issue; further investigations as to the efficiency of handling large amounts of debug info will be necessary.	2020-02-13 11:48:19 +00:00
Daniel Kiss	d5a186a600	[AArch64] Fix BTI landing pad generation. In some cases a BTI landing pad is inserted even though a compatible instruction is already there. Meta instructions do not count in this case, so skip them in the check for the first instructions in the function. Differential revision: https://reviews.llvm.org/D74492	2020-02-13 10:44:34 +00:00
Kerry McLaughlin	671cbc1fbb	[AArch64][SVE] Add mul/mla/mls lane & dup intrinsics Summary: Implements the following intrinsics: - @llvm.aarch64.sve.dup - @llvm.aarch64.sve.mul.lane - @llvm.aarch64.sve.mla.lane - @llvm.aarch64.sve.mls.lane Reviewers: c-rhodes, sdesmalen, dancgr, efriedma, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74222	2020-02-13 10:32:59 +00:00
David Green	9d4c597541	[ARM] Fix ReconstructShuffle for bigendian Simon pointed out that this function is doing a bitcast, which can be incorrect for big endian. That makes the lowering of VMOVN in MVE wrong, but the function is shared between Neon and MVE so both can be incorrect. This attempts to fix things by using the newly added VECTOR_REG_CAST instead of the BITCAST. As it may now be used on Neon, I've added the relevant patterns for it there too. I've also added a quick dag combine for it to remove them where possible. Differential Revision: https://reviews.llvm.org/D74485	2020-02-13 09:56:46 +00:00
Igor Kudrin	2ba4df6c11	[DebugInfo] Fix dumping CIE ID in .eh_frame sections. We do not keep the actual value of the CIE ID field, because it is predefined, and use a constant when dumping a CIE record. The issue was that the predefined value is different for .debug_frame and .eh_frame sections, but we always printed the one which corresponds to .debug_frame. The patch fixes that by choosing an appropriate constant to print. See the following for more information about .eh_frame sections: https://refspecs.linuxfoundation.org/LSB_5.0.0/LSB-Core-generic/LSB-Core-generic/ehframechpt.html Differential Revision: https://reviews.llvm.org/D73627	2020-02-13 15:42:14 +07:00
Johannes Doerfert	3f3ec9c40b	[OpenMP][FIX] Collect blocks to be outlined after finalization Finalization can introduce new blocks we need to outline as well so it makes sense to identify the blocks that need to be outlined after finalization happened. There was also a minor unit test adjustment to account for the fact that we have a single outlined exit block now.	2020-02-13 00:42:22 -06:00
Yonghong Song	61bd33e37b	[BPF] explicit warning of not supporting dynamic stack allocation Currently, BPF does not support dynamic stack allocation. For a program like below: extern void bar(int *); void foo(int n) { int a[n]; bar(a); } The current error message looks like: unimplemented operand UNREACHABLE executed at /.../llvm/lib/Target/BPF/BPFISelLowering.cpp:199! Let us make the error message explicit so it will be clear to the user what the problem is. With this patch, the error message looks like: fatal error: error in backend: Unsupported dynamic stack allocation ... Differential Revision: https://reviews.llvm.org/D74521	2020-02-12 20:43:06 -08:00
Johannes Doerfert	70cac41a2b	Reapply "[OpenMP][IRBuilder] Perform finalization (incl. outlining) late" Reapply `8a56d64d76` with minor fixes. The problem was that cancellation can cause new edges to the parallel region exit block which is not outlined. The CodeExtractor will encode the information which "exit" was taken as a return value. The fix is to ensure we do not return any value from the outlined function, to prevent control to value conversion we ensure a single exit block for the outlined region. This reverts commit `3aac953afa`.	2020-02-12 22:29:07 -06:00
Serguei Katkov	a6f38b4697	[Statepoint] Remove redundant clear of call target on register Patchable statepoint is lowered into sequence of nops, so zeroed call target should not be on register. It is better to use getTargetConstant instead of getConstant to select zero constant for call target. Reviewers: reames Reviewed By: reames Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D74465	2020-02-13 10:25:50 +07:00
Austin Kerbow	5db0b2521c	[AMDGPU][GlobalISel] Handle 64byte EltSIze in getRegSplitParts Reviewers: arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, rovka, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74518	2020-02-12 19:11:52 -08:00
Fangrui Song	c662795b07	[AsmPrinter][ELF] Emit local alias for ExternalLinkage dso_local GlobalAlias	2020-02-12 17:08:22 -08:00
Amy Huang	de1d90299b	Revert "[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets" This reverts commit `11c16e7159` because it causes a crash in chromium code. See https://reviews.llvm.org/rG11c16e71598d51f15b4cfd0f719c4dabcc0bebf7.	2020-02-12 17:00:37 -08:00
Johannes Doerfert	3aac953afa	Revert "[OpenMP][IRBuilder] Perform finalization (incl. outlining) late" This reverts commit `8a56d64d76`. Will be recommitted once the clang test problem is addressed.	2020-02-12 18:50:43 -06:00
Matt Arsenault	d1b393d92c	AMDGPU/GlobalISel: Select G_CTTZ_ZERO_UNDEF Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.	2020-02-12 16:19:46 -08:00
Matt Arsenault	045a8921d7	AMDGPU/GlobalISel: Select G_CTLZ_ZERO_UNDEF Directly select this rather than going through the intermediate instruction, which may provide some combine value in the future.	2020-02-12 16:19:45 -08:00
Matt Arsenault	e174c278ca	AMDGPU/GlobalISel: Fix mapping G_ICMP with constrained result When SI_IF is inserted, it constrains the source register with a register class, which was quite likely a G_ICMP. This was incorrectly treating it as a scalar, and then applyMappingImpl would end up producing invalid MIR since this was unexpected. Also fix not using all VGPR sources for vcc outputs.	2020-02-12 16:19:45 -08:00
Johannes Doerfert	8a56d64d76	[OpenMP][IRBuilder] Perform finalization (incl. outlining) late In order to fix PR44560 and to prepare for loop transformations we now finalize a function late, which will also do the outlining late. The logic is as before but the actual outlining step happens now after the function was fully constructed. Once we have loop transformations we can apply them in the finalize step before the outlining. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D74372	2020-02-12 17:55:01 -06:00
Johannes Doerfert	23f41f16d4	[Attributor] Use fine-grained liveness in all helpers We used coarse-grained liveness before, thus we looked if the instruction was executed, but we did not use fine-grained liveness, hence if the instruction was needed or could be deleted even if the surrounding ones are live. This patch introduces this level of liveness checks together with other liveness queries, e.g., for uses. For more control we enforce that all liveness queries go through the Attributor. Tests have been adjusted to reflect the changes or augmented to prevent deletion of the parts we want to check. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D73313	2020-02-12 17:36:38 -06:00
Johannes Doerfert	b2c76002ca	[Attributor] Ignore uses if a value is simplified If we have a replacement for a value, via AAValueSimplify, the original value will lose all its uses. Thus, as long as a value is simplified we can skip the uses in checkForAllUses, given that these uses are transitive uses for the simplified version and will therefore affect the simplified version as necessary. Since this allowed us to remove calls without side-effects and a known return value, we need to make sure not to eliminate `musttail` calls. Those we keep around, or later remove the entire `musttail` call chain.	2020-02-12 17:36:38 -06:00
Johannes Doerfert	86509e8c3b	[Attributor] Use assumed information to determine side-effects We relied on wouldInstructionBeTriviallyDead before but that function does not take assumed information, especially for calls, into account. The replacement, AAIsDead::isAssumeSideEffectFree, does. This change makes AAIsDeadCallSiteReturn more complex as we can have a dead call or only dead users. The tests have been modified to include a side effect where there was none in order to keep the coverage. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D73311	2020-02-12 17:36:38 -06:00
Guozhi Wei	369d086d78	[MBP] Partial tail duplication into hot predecessors Current tail duplication embedded in MBP duplicates a BB into all or none of its predecessors without too much cost analysis. So sometimes it is duplicated into cold predecessors, and in other cases it may miss the duplication into hot predecessors. This patch improves tail duplication in 3 aspects: 1) A successor can be duplicated into part of its predecessors. 2) A more fine-grained benefit analysis, combined with 1, now a successor is duplicated into hot predecessors only. 3) If a successor can't be duplicated into one predecessor, it doesn't impact the duplication into other predecessors. Differential Revision: https://reviews.llvm.org/D73387	2020-02-12 15:22:33 -08:00
Ehud Katz	d8a2ea9fd5	[LoopExtractor] Fix legacy pass dependencies Fixes a memory leak of allocating `LoopInfoWrapperPass` and `DominatorTreeWrapperPass`.	2020-02-12 22:39:21 +02:00
Vedant Kumar	34d9f93977	[AddressSanitizer] Ensure only AllocaInst is passed to dbg.declare Various parts of the LLVM code generator assume that the address argument of a dbg.declare is not a `ptrtoint`-of-alloca. ASan breaks this assumption, and this results in local variables sometimes being unavailable at -O0. GlobalISel, SelectionDAG, and FastISel all do not appear to expect dbg.declares to have a `ptrtoint` as an operand. This means that they do not place entry block allocas in the usual side table reserved for local variables available in the whole function scope. This isn't always a problem, as LLVM can try to lower the dbg.declare to a DBG_VALUE, but those DBG_VALUEs can get dropped for all the usual reasons DBG_VALUEs get dropped. In the ObjC test case I'm looking at, the cause happens to be that `replaceDbgDeclare` has hoisted dbg.declares into the entry block, causing LiveDebugValues to "kill" the DBG_VALUEs because the lexical dominance check fails. To address this, I propose: 1) Have ASan (always) pass an alloca to dbg.declares (this patch). This is a narrow bugfix for -O0 debugging. 2) Make replaceDbgDeclare not move dbg.declares around. This should be a generic improvement for optimized debug info, as it would prevent the lexical dominance check in LiveDebugValues from killing as many variables. This means reverting llvm/r227544, which fixed an assertion failure (llvm.org/PR22386) but no longer seems to be necessary. I was able to complete a stage2 build with the revert in place. rdar://54688991 Differential Revision: https://reviews.llvm.org/D74369	2020-02-12 11:24:02 -08:00
Jay Foad	32aac25637	[KnownBits] Introduce anyext instead of passing a flag into zext Summary: This was a very odd API, where you had to pass a flag into a zext function to say whether the extended bits really were zero or not. All callers passed in a literal true or false. I think it's much clearer to make the function name reflect the operation being performed on the value we're tracking (rather than on the KnownBits Zero and One fields), so zext means the value is being zero extended and new function anyext means the value is being extended with unknown bits. NFC. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74482	2020-02-12 19:06:53 +00:00
Huihui Zhang	5350a48931	[ConstantFold][SVE] Fix constant fold for FoldReinterpretLoadFromConstPtr. Summary: Bail out early for scalable vectors. As global variables are not expected to be scalable. Use explicit call of getFixedSize() to assert on places where scalable size doesn't make sense. Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74424	2020-02-12 10:24:50 -08:00
Florian Hahn	bb310b3f73	Recommit "[SCCP] Remove forcedconstant, go to overdefined instead" This version includes a fix for a set of crashes caused by marking values depending on a yet unknown & tracked call as overdefined. In some cases, we would later discover that the call has a constant result and try to mark a user of it as constant, although it was already marked as overdefined. Most instruction handlers bail out early if the instruction is already overdefined. But that is not necessary for CastInsts for example. By skipping values that depend on skipped calls, we resolve the crashes and also improve the precision in some cases (see resolvedundefsin-tracked-fn.ll). Note that we may not skip PHI nodes that may depend on a skipped call, but they can be safely marked as overdefined, as we bail out early if the PHI node is overdefined. This reverts the revert commit fa74b31a3e9cd844c7ce2087978568e3f5ec8519.	2020-02-12 18:02:18 +00:00
Anh Tuyen Tran	a5b6480d05	[NFC] Remove extra headers included in Loop Unroll and LoopUnrollAndJam files Summary: This refactor patch removes some header files which are not needed and also add some to meet IWYU principles. Reviewers: rnk (Reid Kleckner), Meinersbur (Michael Kruse), dmgreen (Dave Green) Reviewed By: dmgreen (Dave Green), rnk (Reid Kleckner), Meinersbur (Michael Kruse) Subscribers: dmgreen (Dave Green), Whitney (Whitney Tsang), hiraditya (Aditya Kumar), zzheng (Z. Zheng), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D73498	2020-02-12 17:57:56 +00:00
Jessica Paquette	45417b7aa7	[AArch64][GlobalISel] Properly implement widening for TB(N)Z When we have to widen to a 64-bit register, we have to emit a SUBREG_TO_REG. Add a general-purpose widening helper which emits the correct SUBREG_TO_REG instruction based off of a desired size and add a testcase. Also remove some asserts which are technically incorrect in `emitTestBit`. - p0 doesn't count as a scalar type, so we need to check `!Ty.isVector()` instead - Whenever we have a s1, the Size/Bit checks are too conservative, so just remove them Replace these asserts with less conservative ones where applicable. Differential Revision: https://reviews.llvm.org/D74427	2020-02-12 09:24:58 -08:00
Alina Sbirlea	4f33a68973	Compute ORE, BPI, BFI in Loop passes. Summary: Passes ORE, BPI, BFI are not being preserved by Loop passes, hence it is incorrect to retrieve these passes as cached. This patch makes the loop passes in question compute a new instance. In some of these cases, however, it may be beneficial to change the Loop pass to a Function pass instead, similar to the change for LoopUnrollAndJam. Reviewers: chandlerc, dmgreen, jdoerfert, reames Subscribers: mehdi_amini, hiraditya, zzheng, steven_wu, dexonsmith, Whitney, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D72891	2020-02-12 09:15:18 -08:00
Simon Pilgrim	ff307c8120	[X86] combineFneg - generalize FMA negations with isNegatibleForFree/getNegatedExpression This has a really interesting side effect in that it improves some UMAX/UMIN reduction code which had redundant XOR(SHUFFLE(XOR(X,SIGNMASK)),SIGNMASK) patterns - the getNegatibleCost recognises it as FNEG(SHUFFLE(FNEG(X))).... We have a lot of FNEG patterns bitcasted to the integer domain for XOR signbit twiddling which is similar to what we do to allow UMAX/UMIN to be lowered using SMAX/SMIN. Differential Revision: https://reviews.llvm.org/D74231	2020-02-12 16:07:27 +00:00
Sven van Haastregt	665dcdacc0	Add missing newlines at EOF; NFC	2020-02-12 15:57:25 +00:00
Danilo Carvalho Grael	fc8d033e96	[AArch64][SVE] Add addsub carry long intrinsics Summary: Add intrinsics for the following instructions: - adclb, adclt, sbclb, sbclt Reviewers: kmclaughlin, c-rhodes, sdesmalen, efriedma, rengolin Reviewed By: kmclaughlin Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74328	2020-02-12 10:49:10 -05:00
Victor Huang	caa10988be	[PowerPC] Add options for PPC to enable/disable using non-volatile CR An option is added for PowerPC to disable use of non-volatile CR register fields and avoid CR spilling in the prologue. Differential Revision: https://reviews.llvm.org/D69835	2020-02-12 09:23:11 -06:00
Anil Mahmud	ab4d606421	[PowerPC] Add support for intrinsic llvm.ppc.eieio Add support for the intrinsic llvm.ppc.eieio to emit the instruction eieio. Differential Revision: https://reviews.llvm.org/D69066	2020-02-12 09:02:17 -06:00
Anil Mahmud	b413e5c309	[PowerPC] Add support for intrinsics llvm.ppc.dcbfl and llvm.ppc.dcbflp Added support for the intrinsic llvm.ppc.dcbfl and llvm.ppc.dcbflp. These will be used for emitting cache control instructions dcbfl and dcbflp which are actually mnemonics for using dcbf instruction with different immediate arguments. dcbfl ra, rb -> dcbf ra, rb, 1 dcbflp, ra, rb -> dcbf ra, rb, 3 Differential Revision: https://reviews.llvm.org/D68411	2020-02-12 09:02:17 -06:00
James Henderson	bf4d8f2952	[DebugInfo] Add checks for v2 directory and file name table terminators The DWARFv2-4 specification for the line table header states that the include directories and file name tables both end with a single null byte. Prior to this change, the parser did not detect if this byte was missing, because it also stopped reading the tables once it reached the prologue end, as claimed by the header_length field. This change adds a check that the terminator has been seen at the end of each table. Reviewed by: dblaikie, MaskRay Differential Revision: https://reviews.llvm.org/D74413	2020-02-12 14:49:22 +00:00
James Henderson	23cf0a30b1	[DebugInfo] Add check for zero debug line opcode_base The number of standard opcodes is defined to be opcode_base - 1, so a value of 0 for the opcode_base caused a crash as an attempt was made to reserve many entries in a vector. This change fixes the crash, by issuing a warning and skipping reading of standard opcode lengths in the event of an opcode_base of 0. Reviewed by: dblaikie Differential Revision: https://reviews.llvm.org/D74309	2020-02-12 14:49:22 +00:00
James Henderson	1da62b51a5	[DebugInfo] Print version in error message in decimal Also remove some test duplication and add a test case that shows the maximum version is rejected (this also shows that the value in the error message is actually in decimal, and not just missing an 0x prefix). Reviewed by: dblaikie Differential Revision: https://reviews.llvm.org/D74403	2020-02-12 14:49:22 +00:00
stozer	61b35e4111	Re-reapply: Recover debug intrinsics when killing duplicated/empty blocks This reverts commit `636c93ed11`. The original patch caused build failures on TSan buildbots. Commit `6ded69f294` fixes this issue by reducing the rate at which empty debug intrinsics propagate, reducing the memory footprint and preventing a fatal spike.	2020-02-12 14:36:30 +00:00
Matt Arsenault	fa61e200e5	AMDGPU/GlobalISel: Widen non-power-of-2 load results Load extra bits if suitably aligned. This allows using widened 3-vector loads on SI, and fixes legalization for <9 x s32> (which LSV apparently forms frequently on lowered kernel argument lists). Fix incorrectly treating these as legal on SI. This should emit a 64-bit store and a 32-bit store. I think all of the load and store rules are just about complete, but due for a rewrite.	2020-02-12 09:35:10 -05:00
Florian Hahn	81dbb6aec6	Recommit "[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk)." This includes a fix for the sanitizer failures. This reverts the revert commit `42f8b915eb`.	2020-02-12 14:17:50 +00:00
Hans Wennborg	a19de32095	Fix unused function warning (PR44808)	2020-02-12 15:12:48 +01:00
Ayman Musa	35f02aa021	Revert "[AggressiveInstCombine] Add support for ICmp instr that feeds a select instr's condition operand." This reverts commit `cf155150f9`.	2020-02-12 15:04:49 +02:00
Ayman Musa	cf155150f9	[AggressiveInstCombine] Add support for ICmp instr that feeds a select instr's condition operand.	2020-02-12 15:01:27 +02:00
stozer	ffeb64db35	Reapply "[DebugInfo] Prevent explosion of debug intrinsics during jump threading" This reverts commit `6ded69f294`.	2020-02-12 12:39:54 +00:00
Ayman Musa	3bda9059b8	[AggressiveInstCombine] Add support for select instruction. Differential Revision: https://reviews.llvm.org/D72837	2020-02-12 13:59:34 +02:00
Simon Pilgrim	9eb426c88c	[TargetLowering] Add NegatibleCost enum for isNegatibleForFree return codes The isNegatibleForFree/getNegatedExpression methods currently rely on a raw char value to indicate whether a negation is beneficial or not. This patch replaces the char return value with a NegatibleCost enum to more clearly demonstrate what is implied. It also renames isNegatibleForFree to getNegatibleCost to more accurately reflect what's going on. Differential Revision: https://reviews.llvm.org/D74221	2020-02-12 11:51:42 +00:00
stozer	6ded69f294	Revert "[DebugInfo] Prevent explosion of debug intrinsics during jump threading" This reverts commit `fe6f6cd6b8`. Found test failure on several buildbots.	2020-02-12 11:48:00 +00:00
Ayman Musa	49a4d85f6d	[NFC][AggressiveInstCombine] Remove redundant std::max. Differential Revision: https://reviews.llvm.org/D74476	2020-02-12 13:47:40 +02:00
stozer	fe6f6cd6b8	[DebugInfo] Prevent explosion of debug intrinsics during jump threading This patch is a fix following the revert of `72ce759` (https://reviews.llvm.org/rG72ce759928e6dfee6a9efa310b966c19722352ba) and fixes the failure that it caused. The above patch failed on the Thread Sanitizer buildbot with an out of memory error. After an investigation, the cause was identified as an explosion in debug intrinsics while running the Jump Threading pass on ModuleMap.ll. The above patch prevented debug intrinsics from being dropped when their Basic Block was deleted due to being "empty". In this case, one of the functions in ModuleMap.ll had (after many optimization passes) a very large number of debug intrinsics representing a set of repeatedly inlined variables. Previously the vast majority of these were silently dropped during Jump Threading when their blocks were deleted, but as of the above patch they survived for longer, causing a large increase in the number of debug intrinsics. These intrinsics were then repeatedly cloned by the Jump Threading pass as edges were threaded, multiplying the intrinsic count further. The memory consumed by this process spiralled out of control, crashing the buildbot that uses TSan (which has an estimated 5-10x memory overhead compared to non-sanitized builds). This patch adds RemoveRedundantDbgInstrs to the Jump Threading pass, in order to reduce the number of debug intrinsics down to a manageable amount in cases where many intrinsics for the same variable end up bunched together contiguously, as in this case. Differential Revision: https://reviews.llvm.org/D73054	2020-02-12 11:22:54 +00:00
Ehud Katz	2470d2988a	[ConstantFolding] Fold calls to FP remainder function With the fixed implementation of the "remainder" operation in rG9d0956ebd471, we can now add support to folding calls to it. Differential Revision: https://reviews.llvm.org/D69777	2020-02-12 13:21:18 +02:00
Jay Foad	e9900b1fbf	[AMDGPU] Add one more pass to LLVMInitializeAMDGPUTarget	2020-02-12 11:19:14 +00:00
Djordje Todorovic	97ed706a96	Revert "[DebugInfo] Enable the debug entry values feature by default" This reverts commit rG9f6ff07f8a39. Found a test failure on clang-with-thin-lto-ubuntu buildbot.	2020-02-12 11:59:04 +01:00
Florian Hahn	fa74b31a3e	Revert "[SCCP] Remove forcedconstant, go to overdefined instead" This causes a crash for the reproducer below enum { a }; enum b { c, d }; e; static _Bool g(struct f *h, enum b i) { i &&j(); return a; } static k(char h, enum b i) { _Bool l = g(e, i); l; } m(h) { k(h, c); g(h, d); } This reverts commit `aadb635e04`.	2020-02-12 09:41:19 +00:00
Clement Courbet	15488ff24b	[CodeGen] Fix the computation of the alignment of split stores. Summary: Right now the alignment of the lower half of a store is computed as align/2, which fails for unaligned stores (align = 1), and is overly pessimistic for, e.g., an 8-byte store aligned to 4 bytes. Fixes PR44851 Fixes PR44877 Reviewers: gchatelet, spatel, lebedev.ri Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74311	2020-02-12 10:37:30 +01:00
Djordje Todorovic	9f6ff07f8a	[DebugInfo] Enable the debug entry values feature by default This patch enables the debug entry values feature. - Remove the (CC1) experimental -femit-debug-entry-values option - Enable it for x86, arm and aarch64 targets - Resolve the test failures - Leave the llc experimental option for targets that do not support the CallSiteInfo yet Differential Revision: https://reviews.llvm.org/D73534	2020-02-12 10:25:14 +01:00
Ehud Katz	9d0956ebd4	[APFloat] Fix FP remainder operation Reimplement IEEEFloat::remainder() function. Fix PR3359. Differential Revision: https://reviews.llvm.org/D69776	2020-02-12 10:42:55 +02:00
Nicolai Hähnle	ab2f610f38	AMDGPU: llvm.amdgcn.writelane is a source of divergence Summary: Consider: %r = call i32 @llvm.amdgcn.writelane(i32 0, i32 1, i32 2) This produces a value that is 0 on lane 1, and 2 everywhere else; i.e., it is divergent. Reported-by: Marek Olsak <Marek.Olsak@amd.com> Reviewers: arsenm, foad, mareko Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74400	2020-02-12 09:12:56 +01:00
Nicolai Hähnle	07a5b849f7	SelectionDAG: Fix bug in ClusterNeighboringLoads Summary: The method attempts to find loads that can be legally clustered by looking for loads consuming the same chain glue token. However, the old code looks at _all_ users of values produced by the chain node -- including uses of the loaded/returned value of volatile loads or atomics. This could lead to circular dependencies which then failed during scheduling. With this change, we filter out users by getResNo, i.e. by which SDValue value they use, to ensure that we only look at users of the chain glue token. This appears to be a rather old bug, which is perhaps surprising. However, the test case is actually quite fragile (i.e., it is hidden by fairly small changes), and the test _must_ use volatile loads for the bug to manifest. Reviewers: arsenm, bogner, craig.topper, foad Subscribers: MatzeB, jvesely, wdng, hiraditya, javed.absar, jfb, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74253	2020-02-12 09:12:55 +01:00
Kazushi (Jam) Marukawa	42a16dacda	[VE] Bit operator isel Summary: Isel and tests for bswap,brev,ctpop,ctlz,cttz,rotl,rotr Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D74304	2020-02-12 09:02:13 +01:00
Craig Topper	746395a446	[X86] Remove unnecessary hasSideEffects = 0, mayLoad = 1 from an instruction with a pattern. NFC	2020-02-11 23:26:29 -08:00
Craig Topper	3988b7046a	[X86] Correct the predicate on some patterns for 128 and 256 EVEX versions of VCVTPS2PH. These should require AVX512VL not AVX512F. The legacy VEX patterns will match first unless AVX512VL is enabled so this doesn't cause a functional issue.	2020-02-11 23:26:29 -08:00
Igor Kudrin	07e50c7b91	[DebugInfo] Add support for DWARF64 into DWARFDebugAddr. Differential Revision: https://reviews.llvm.org/D74198	2020-02-12 13:33:01 +07:00
Igor Kudrin	dc16612393	[DebugInfo] Simplify DWARFDebugAddr. The patch removes unnecessary members of DWARFDebugAddr and further simplifies the implementation by separating parsing methods of tables in the DWARFv5 and pre-standard formats. Differential Revision: https://reviews.llvm.org/D74197	2020-02-12 13:33:00 +07:00
Igor Kudrin	de9604232a	[DebugInfo] Refine error messages in DWARFDebugAddr. As a preparation for the subsequent patches, this updates the wordings of some error messages in DWARFDebugAddr. Differential Revision: https://reviews.llvm.org/D74196	2020-02-12 13:33:00 +07:00
Igor Kudrin	292b67f993	[DebugInfo] Use "an address table" in diagnostic messages of DWARFDebugAddr. This replaces a collocation "a .debug_addr table" with "an address table" because the latter sounds more accurate. Differential Revision: https://reviews.llvm.org/D74407	2020-02-12 13:33:00 +07:00
Igor Kudrin	675c4bebaf	[DebugInfo] Do not dump header field for pre-DWARFv5 address tables. As there is no header in pre-DWARFv5 address tables, and we fill the class data members with some artificial values, we should not dump them as that might be misleading. Differential Revision: https://reviews.llvm.org/D74195	2020-02-12 13:33:00 +07:00
Igor Kudrin	5d58eb9f4f	[DebugInfo] Fix reading addresses in DWARFDebugAddr. Addresses in the address tables may have relocations, so the relocations must be resolved to read the correct address. That is especially important for targets that use RELA relocations because in that case addends are stored in relocation sections. Differential Revision: https://reviews.llvm.org/D74404	2020-02-12 13:32:59 +07:00
Craig Topper	0daf9b8e41	[X86][LegalizeTypes] Add SoftPromoteHalf support STRICT_FP_EXTEND and STRICT_FP_ROUND This adds a strict version of FP16_TO_FP and FP_TO_FP16 and uses them to implement soft promotion for the half type. This is enough to provide basic support for __fp16 with strictfp. Add the necessary X86 support to use VCVTPS2PH/VCVTPH2PS when F16C is enabled.	2020-02-11 22:30:04 -08:00
Yuanfang Chen	80a34ae311	Revert "Reland "[Support] make report_fatal_error `abort` instead of `exit`"" This reverts commit rGcd5b308b828e, rGcd5b308b828e, rG8cedf0e2994c. There are issues to be investigated for polly bots and bots turning on EXPENSIVE_CHECKS.	2020-02-11 20:41:53 -08:00
Matt Arsenault	6d4ebada79	AMDGPU: Use conditions directly in division expansion This was creating a select on true/false values, and then comparing that later. This produced more work for later combines, which can be avoided by just using the boolean values. This was copied from the original DAG expansion, which also has the same problem. This doesn't have an observable change using SelectionDAG, but since GlobalISel is missing these optimizations, the final code was noticeably longer.	2020-02-11 23:11:30 -05:00
Yuanfang Chen	8cedf0e299	Reland "[Support] make report_fatal_error `abort` instead of `exit`" Summary: Reland D67847 after D73742 is committed. Replace `sys::Process::Exit(1)` with `abort` in `report_fatal_error`. After this patch, for tools turning on `CrashRecoveryContext`, crash handler installed by `CrashRecoveryContext` is called unless they installed a non-returning handler using `llvm::install_fatal_error_handler` like `cc1_main` currently does. Reviewers: rnk, MaskRay, aganea, hans, espindola, jhenderson Subscribers: jholewinski, qcolombet, dschuff, jyknight, emaste, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, rupprecht, jocewei, jsji, Jim, dmgreen, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, kerbowa, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74456	2020-02-11 18:20:40 -08:00
Austin Kerbow	3a312c3ee5	[AMDGPU][GlobalISel] Refactor selectDS1Addr1Offset/selectDS64Bit4ByteAligned Differential Revision: https://reviews.llvm.org/D74261	2020-02-11 16:57:13 -08:00
Matt Arsenault	b30e122333	AMDGPU: Don't expand more special div cases in IR These have nicer expansions implemented in the DAG. Ideally we would either directly implement all of these special expansions, or stop expanding division in the IR.	2020-02-11 19:01:06 -05:00
Reid Kleckner	a349c09162	Fix MSVC build with C++ EH enabled Mark the CrashRecoveryContextImpl constructor noexcept, so that MSVC won't emit an unwind helper to clean up the allocation from `new` if the constructor throws an exception. Otherwise, MSVC complains: llvm\lib\Support\CrashRecoveryContext.cpp(220): error C2712: \ Cannot use __try in functions that require object unwinding The other simple fix would be to wrap `new` in a static helper or lambda. Users have reported that Tensorflow builds LLVM with /EHsc.	2020-02-11 15:56:10 -08:00
Matt Arsenault	86f9117d47	AMDGPU: Don't report 2-byte alignment as fast This is apparently worse than 1-byte alignment. This does not attempt to decompose 2-byte aligned wide stores, but will stop trying to produce them. Also fix bug in LoadStoreVectorizer which was decreasing the alignment and vectorizing stack accesses. It was assuming a stack object was an alloca that could have its base alignment changed, which is not true if the pointer is derived from a function argument.	2020-02-11 18:35:00 -05:00
Justin Lebar	1bd6123b78	Use std::foo_t rather than std::foo in LLVM. Summary: C++14 migration. No functional change. Reviewers: bkramer, JDevlieghere, lebedev.ri Subscribers: MatzeB, hiraditya, jkorous, dexonsmith, arphaman, kadircet, lebedev.ri, usaxena95, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D74384	2020-02-11 15:12:51 -08:00
Matt Arsenault	f734ce0488	AMDGPU: Fix crash on v3i15 kernel arguments This was split into 3 i15 arguments. The i15 piece needs to be rounded to a simple MVT for the memory type.	2020-02-11 18:11:39 -05:00
Matt Arsenault	92c62582fc	AMDGPU: Directly use rcp intrinsic in idiv expansions Since natural fdiv lowering is now more conservative even with denormals disabled, we get a slower expansion from just a plain 1.0/fdiv. Directly emit the rcp intrinsic when using it to implement integer division to avoid a pointlessly complex sequence.	2020-02-11 18:11:39 -05:00
Matt Arsenault	b87e3e2d0d	AMDGPU: Don't create potentially dead rcp declarations This will introduce unused declarations if this doesn't reach any of the paths that will really use it.	2020-02-11 18:11:39 -05:00
Aditya Nandakumar	bdc3c73454	[MachO] Pad section data to pointer size bytes https://reviews.llvm.org/D74273 Pad macho section data to pointer size bytes, so that relocation table and symbol table following section data will be pointer size aligned. Patch by pguo.	2020-02-11 14:52:21 -08:00
Craig Topper	846d0ac43e	[X86] Don't disable code in combineHorizontalPredicateResult just because we have avx512 We aren't doing a good job of optimizing AVX512 outside of this code. So remove the bail out for AVX512 and replace with a FIXME. This at least gets us the AVX2 codegen. Differential Revision: https://reviews.llvm.org/D74431	2020-02-11 14:36:29 -08:00
Huihui Zhang	88de9338f2	[ConstantFold][SVE] Fix constant fold for vector call. Summary: Do not iterate on scalable vectors. Reviewers: sdesmalen, efriedma, apazos, huntergr, willlovett Reviewed By: sdesmalen Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74419	2020-02-11 14:06:15 -08:00
Krzysztof Parzyszek	61ca996e79	[Hexagon] Don't generate short vectors in ISD::SELECT in preprocessing Selection DAG preprocessing runs long after legalization, so make sure that the types can be handled by the selection code.	2020-02-11 15:27:33 -06:00
lewis-revill	a6bd1256ce	[DebugInfo] Call site entries cannot be generated for FrameSetup calls Instructions marked as FrameSetup do not cause requestLabelAfterInsn to be called and so no such label is generated. Call instructions which require call site entries to be generated require this label to be present in order to calculate the return PC offset/address, but the check for whether the call instruction is marked as FrameSetup was not present. Therefore in the case where a call instruction is marked as FrameSetup, an assertion failure occurs if a call site entry is to be generated. This is the case with RISC-V's implementation of save/restore via library calls. Differential Revision: https://reviews.llvm.org/D71593	2020-02-11 21:23:18 +00:00
lewis-revill	07f7c00208	[RISCV] Add support for save/restore of callee-saved registers via libcalls This patch adds the support required for using the __riscv_save and __riscv_restore libcalls to implement a size-optimization for prologue and epilogue code, whereby the spill and restore code of callee-saved registers is implemented by common functions to reduce code duplication. Logic is also included to ensure that if both this optimization and shrink wrapping are enabled then the prologue and epilogue code can be safely inserted into the basic blocks chosen by shrink wrapping. Differential Revision: https://reviews.llvm.org/D62686	2020-02-11 21:23:03 +00:00
Johannes Doerfert	52aec3221f	[Attributor][NFC] Clarify the documentation a bit more	2020-02-11 15:11:55 -06:00
Johannes Doerfert	8e62968d45	[Attributor] Identify dead uses in PHIs (almost) based on dead edges As an approximation to a dead edge we can check if the terminator is dead. If so, the corresponding operand use in a PHI node is dead even if the PHI node itself is not.	2020-02-11 15:11:55 -06:00
Lang Hames	ca6f58486f	[ORC] Fix symbol dependence propagation algorithm in ObjectLinkingLayer. ObjectLinkingLayer was not correctly propagating dependencies through local symbols within an object. This could cause symbol lookup to return before a searched-for symbol is ready if the following conditions are met: (1) The definition of the symbol being searched for transitively depends on a local symbol within the same object, and that local symbol in turn transitively depends on an external symbol provided by some other module in the JIT. (2) Concurrent compilation is enabled. (3) Thread scheduling causes the lookup of the searched-for symbol to return before all transitive dependencies of the looked-up symbol are emitted. This bug was found by inspection and has not been observed in practice. A jitlink test case has been added to verify that symbol dependencies are correctly propagated through local symbol definitions.	2020-02-11 12:56:41 -08:00
Lang Hames	86787f159a	[ORC] Add debug logging to JITDylib::addDependencies.	2020-02-11 12:56:40 -08:00
Jay Foad	9df0c264d4	[AMDGPU] Fix implicit operands for ENTER_WWM pseudo Summary: SIInstrInfo::expandPostRAPseudo converts ENTER_WWM in-place into an S_OR_SAVEEXEC instruction that needs certain implicit operands. Without this patch I get errors like this that make it harder to use -stop-after to bisect the pass pipeline: $ llc -march=amdgcn test/CodeGen/AMDGPU/wqm.ll -stop-after=postrapseudos -o - | sed -E 's/ (from|into) custom "TargetCustom[0-9]+"//' | llc -march=amdgcn -x=mir error: <stdin>:1295:70: missing implicit register operand 'implicit-def $scc' renamable $sgpr2_sgpr3 = S_OR_SAVEEXEC_B64 -1, implicit-def $exec ^ Note that this error is currently only generated by MIParser but it comes with a FIXME comment: // FIXME: Move the implicit operand verification to the machine verifier. Reviewers: critson, arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74428	2020-02-11 20:11:41 +00:00
Sterling Augustine	417375d785	Allow retrieving source files relative to the compilation directory. Summary: DWARF stores source-file names in three parts: <compilation_directory><include_directory><filename> Prior to this change, the code only allowed retrieving either all three as the absolute path, or just the filename. But many compile-command lines--especially those in hermetic build systems--don't specify an absolute path, nor just the filename, but rather the path relative to the compilation directory. This feature allows retrieving them in that style. Add tests for path printing styles. Modify createBasicPrologue to handle include directories. Subscribers: aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73383	2020-02-11 11:46:20 -08:00
Alina Sbirlea	0cecafd647	[BasicAA] Make BasicAA a cfg pass. Summary: Part of the changes in D44564 made BasicAA not CFG only due to it using PhiAnalysisValues which may have values invalidated. Subsequent patches (rL340613) appear to have addressed this limitation. BasicAA should not be invalidated by non-CFG-altering passes. A concrete example is MemCpyOpt which preserves CFG, but we are testing it invalidates BasicAA. llvm-dev RFC: https://groups.google.com/forum/#!topic/llvm-dev/eSPXuWnNfzM Reviewers: john.brawn, sebpop, hfinkel, brzycki Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74353	2020-02-11 11:30:08 -08:00
Craig Topper	d7de7ac370	[X86] Raise the latency for VectorImul from 4 to 5 in Skylake scheduler models Based on uops.info these should have 5 cycle latency as they did on Haswell/Broadwell. I have no additional internal information from Intel. This was also shown as a discrepancy in the spreadsheet that was sent with an early llvm-dev post about llvm-exegesis. It also matches Agner Fog. Differential Revision: https://reviews.llvm.org/D74357	2020-02-11 11:24:25 -08:00
Stanislav Mekhanoshin	453a8f3af7	[AMDGPU] Remove AMDGPURegisterInfo R600 and GCN do not have anything in common in terms of register file organization anymore. Differential Revision: https://reviews.llvm.org/D74426	2020-02-11 11:13:38 -08:00
Teresa Johnson	80d0a137a5	Restore "[WPD/LowerTypeTests] Delay lowering/removal of type tests until after ICP" This restores commit `748bb5a0f1`, along with a fix for a Chromium test suite build issue (and a new test for that case). Differential Revision: https://reviews.llvm.org/D73242	2020-02-11 10:48:05 -08:00
Cyndy Ishida	8c3d0d6a5f	[llvm][TextAPI] add simulators to output Summary: * for <= tbd_v3, simulator platforms appear the same as the real platform and we distinguish them by the architecture. fixes: rdar://problem/59161559 Reviewers: ributzka, steven_wu Reviewed By: ributzka Subscribers: hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74416	2020-02-11 10:37:37 -08:00
Jordan Rupprecht	734f086b42	[NFC] Fix unused var in release builds	2020-02-11 10:10:52 -08:00
Yonghong Song	29bc5dd194	[BPF] implement isTruncateFree and isZExtFree in BPFTargetLowering Currently, isTruncateFree() and isZExtFree() callbacks return false as they are not implemented in BPF backend. This may cause suboptimal code generation. For example, if the load in the context of zero extension has more than one use, the pattern zextload{i8,i16,i32} will not be generated. Rather, the load will be matched first and then the result is zero extended. For example, in the test together with this commit, we have I1: %0 = load i32, i32* %data_end1, align 4, !tbaa !2 I2: %conv = zext i32 %0 to i64 ... I3: %2 = load i32, i32* %data, align 4, !tbaa !7 I4: %conv2 = zext i32 %2 to i64 ... I5: %4 = trunc i64 %sub.ptr.lhs.cast to i32 I6: %conv13 = sub i32 %4, %2 ... The I1 and I2 will match to one zextloadi32 DAG node, where SUBREG_TO_REG is used to convert a 32bit register to 64bit one. During code generation, SUBREG_TO_REG is a noop. The %2 in I3 is used in both I4 and I6. If isTruncateFree() is false, the current implementation will generate a SLL_ri and SRL_ri for the zext part during lowering. This patch implements isTruncateFree() in the BPF backend, so for the above example, I3 and I4 will generate a zextloadi32 DAG node, with SUBREG_TO_REG generated during lowering to Machine IR. isZExtFree() is also implemented as it should help code gen as well. This patch also enables the change in https://reviews.llvm.org/D73985 since it won't kick in generates MOV_32_64 machine instruction. Differential Revision: https://reviews.llvm.org/D74101	2020-02-11 09:59:19 -08:00
Johannes Doerfert	185e9b083e	[Attributor][NFC] Improve documentation	2020-02-11 11:19:34 -06:00
Johannes Doerfert	f95553923f	[Attributor] Return uses do not free pointers If a pointer is returned that does not mean it is freed in the current (function) scope. We can ignore such uses in AANoFree.	2020-02-11 11:02:59 -06:00
Johannes Doerfert	4c62a35860	[Attributor][FIX] Remove duplicate, half-broken functionality The changeXXXAfterManifest functions are better suited to deal with changes so we should prefer them. These functions also recursively delete dead instructions which is why we see test changes.	2020-02-11 11:02:59 -06:00
Johannes Doerfert	77a9e61c9a	[Attributor][NFC] Improve debug message	2020-02-11 11:02:59 -06:00
Nikita Popov	5a8819b216	[InstCombine] Use replaceOperand() in more places This is a followup to D73803, which uses the replaceOperand() helper in more places. This should be NFC apart from changes to worklist order. Differential Revision: https://reviews.llvm.org/D73919	2020-02-11 17:38:23 +01:00
Nikita Popov	5eb19bf4a2	[X86CmovConversion] Make heuristic for optimized cmov depth more conservative (PR44539) Fix/workaround for https://bugs.llvm.org/show_bug.cgi?id=44539. As discussed there, this pass makes some overly optimistic assumptions, as it does not have access to actual branch weights. This patch makes the computation of the depth of the optimized cmov more conservative, by assuming a distribution of 75/25 rather than 50/50 and placing the weights to get the more conservative result (larger depth). The fully conservative choice would be std::max(TrueOpDepth, FalseOpDepth), but that would break at least one existing test (which may or may not be an issue in practice). Differential Revision: https://reviews.llvm.org/D74155	2020-02-11 17:33:11 +01:00
Eric Astor	8d5bf0422b	[ms] [llvm-ml] Add support for attempted register parsing Summary: Add a new method (tryParseRegister) that attempts to parse a register specification. MASM allows the use of IFDEF <register>, as well as IFDEF <symbol>. To accommodate this, we make it possible to check whether a register specification can be parsed at the current location, without failing the entire parse if it can't. Reviewers: thakis Reviewed By: thakis Tags: #llvm Differential Revision: https://reviews.llvm.org/D73486	2020-02-11 10:45:33 -05:00
Jonas Paulsson	0311e28e9c	[SystemZ] Bugfix in emitSelect() When more than one SelectPseudo instruction is handled a new MBB is returned. This must not be done if that would result in leaving an unhandled isel pseudo behind in the original MBB. Fixes https://bugs.llvm.org/show_bug.cgi?id=44849. Review: Ulrich Weigand Differential Revision: https://reviews.llvm.org/D74352	2020-02-11 10:41:01 -05:00
Florian Hahn	aadb635e04	[SCCP] Remove forcedconstant, go to overdefined instead This patch removes forcedconstant to simplify things for the move to ValueLattice, which includes constant ranges, but no forced constants. This patch removes forcedconstant and changes ResolvedUndefsIn to mark instructions with unknown operands as overdefined. This means we do not do simplifications based on undef directly in SCCP any longer, but this seems to hardly come up in practice (see stats below), presumably because InstCombine & others take care of most of the relevant folds already. It is still beneficial to keep ResolvedUndefIn, as it allows us delaying going to overdefined until we propagated all known information. I also built MultiSource, SPEC2000 and SPEC2006 and compared sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact is quite low: Tests: 244 Same hash: 238 (filtered out) Remaining: 6 Metric: sccp.IPNumInstRemoved Program base patch diff test-suite...arks/VersaBench/dbms/dbms.test 4.00 3.00 -25.0% test-suite...TimberWolfMC/timberwolfmc.test 38.00 34.00 -10.5% test-suite...006/453.povray/453.povray.test 158.00 155.00 -1.9% test-suite.../CINT2000/176.gcc/176.gcc.test 668.00 668.00 0.0% test-suite.../CINT2006/403.gcc/403.gcc.test 1209.00 1209.00 0.0% test-suite...arks/mafft/pairlocalalign.test 76.00 76.00 0.0% Tests: 244 Same hash: 238 (filtered out) Remaining: 6 Metric: sccp.NumInstRemoved Program base patch diff test-suite...arks/mafft/pairlocalalign.test 185.00 175.00 -5.4% test-suite.../CINT2006/403.gcc/403.gcc.test 2059.00 2056.00 -0.1% test-suite.../CINT2000/176.gcc/176.gcc.test 2358.00 2357.00 -0.0% test-suite...006/453.povray/453.povray.test 317.00 317.00 0.0% test-suite...TimberWolfMC/timberwolfmc.test 12.00 12.00 0.0% Reviewers: davide, efriedma, mssimpso Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D61314	2020-02-11 15:24:15 +00:00
Sjoerd Meijer	6b0ed508fa	[ARM][MVE] Tail-Predication: recognise (again) active lanes IR pattern A small IR change in calculating the active lanes resulted in no longer recognising tail-predication. Now recognise both an 'add' and 'or' in the expression that calculates the active lanes. Differential Revision: https://reviews.llvm.org/D74394	2020-02-11 15:18:18 +00:00
Alexandre Ganea	faace36508	[Clang][Driver] After default -fintegrated-cc1, make llvm::report_fatal_error() generate preprocessed source + reproducer.sh again. Added a test for #pragma clang __debug llvm_fatal_error to test for the original issue. Added llvm::sys::Process::Exit() and replaced ::exit() in places where it was appropriate. This new function would call the current CrashRecoveryContext if one is running on the same thread; or call ::exit() otherwise. Fixes PR44705. Differential Revision: https://reviews.llvm.org/D73742	2020-02-11 10:17:30 -05:00
Andrew Wei	db875f6655	[RISCV] Optimize seteq/setne pattern expansions for better code size ADDI(C.ADDI) may achieve better code size than XORI, since XORI has no C extension. This patch transforms two patterns and gets almost equivalent results. Differential Revision: https://reviews.llvm.org/D71774	2020-02-11 22:45:15 +08:00
Kadir Cetinkaya	42f8b915eb	Revert "[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk)." This reverts commit `d0c4d4fe09`. Revert "[DSE,MSSA] Move more passing test cases from todo to simple.ll." This reverts commit `02266e64bb`. Revert "[DSE,MSSA] Adjust mda-with-dbg-values.ll to MSSA backed DSE." This reverts commit `74f03e4ff0`.	2020-02-11 15:34:48 +01:00
Simon Pilgrim	fa620fc8e2	[X86] combineConcatVectorOps - reuse IsSplat and remove duplicate code. NFC.	2020-02-11 13:37:57 +00:00
Sanjay Patel	a2a0f9a43a	[VectorCombine] remove unused debug counter; NFC The variable was added to the initial commit via copy/paste of existing code, but it wasn't actually used in the code. We can add it back with the proper usage if/when that is needed.	2020-02-11 08:24:07 -05:00
Simon Pilgrim	11c16e7159	[X86][SSE] lowerShuffleAsBitRotate - lower to vXi8 shuffles to ROTL on pre-SSSE3 targets Without PSHUFB we are better using ROTL (expanding to OR(SHL,SRL)) than using the generic v16i8 shuffle lowering - but if we can widen to v8i16 or more then the existing shuffles are still the better option.	2020-02-11 12:21:03 +00:00
Alexey Lapshin	cc9b4fb6c9	[Debuginfo][NFC] Rename error handling functions using the same pattern. Summary: That patch is extracted from https://reviews.llvm.org/D74308. Currently there are two patterns to name error handling functions: using "Callback" and "Handler". This patch uses "Handler" for all usage places. Reviewers: jhenderson, dblaikie, probinson, aprantl Reviewed By: jhenderson, dblaikie Subscribers: hiraditya, llvm-commits Tags: #llvm, #debug-info Differential Revision: https://reviews.llvm.org/D74354	2020-02-11 14:50:53 +03:00
Mirko Brkusanin	5ba931a84a	[Mips] Add intrinsics for 4-byte and 8-byte MSA loads/stores. New intrinsics are implemented for when we need to port SIMD code from other architectures and only load or store portions of MSA registers. The following intrinsics are added, which only load/store element 0 of a vector: v4i32 __builtin_msa_ldrq_w (const void *, imm_n2048_2044); v2i64 __builtin_msa_ldr_d (const void *, imm_n4096_4088); void __builtin_msa_strq_w (v4i32, void *, imm_n2048_2044); void __builtin_msa_str_d (v2i64, void *, imm_n4096_4088); Differential Revision: https://reviews.llvm.org/D73644	2020-02-11 11:47:30 +01:00
Kerry McLaughlin	e7755f9e4f	[AArch64][SVE] Add SVE2 intrinsics for complex integer dot product Summary: Implements the following intrinsics: - @llvm.aarch64.sve.cdot - @llvm.aarch64.sve.cdot.lane Reviewers: sdesmalen, efriedma, dancgr, c-rhodes, rengolin Reviewed By: efriedma Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73687	2020-02-11 10:28:31 +00:00
OCHyams	35e0ab647b	[DebugInfo][NFC] Fixup the UserValue methods to use FragmentInfo Fixup the UserValue methods to use FragmentInfo instead of DIExpression because the DIExpression is only ever used to get the FragmentInfo. The DIExpression is meaningless in the UserValue class because each definition point added to a UserValue may have a unique DIExpression. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D74057	2020-02-11 10:20:24 +00:00
OCHyams	3aa33fde03	[DebugInfo][NFC] Rename the class DbgValueLocation to DbgVariableValue Rename the class DbgValueLocation to DbgVariableValue and instances from Loc to DbgValue. These names better express the new semantics introduced in D74053. The class previously represented a { Location } only. It now represents a { Location, DIExpression } pair which together describe a value. Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D74055	2020-02-11 10:20:24 +00:00
OCHyams	1e40799324	[DebugInfo] Teach LDV how to handle identical variable fragments LiveDebugVariables uses interval maps to explicitly represent DBG_VALUE intervals. DBG_VALUEs are filtered into an interval map based on their { Variable, DIExpression }. The interval map will coalesce adjacent entries that use the same { Location }. Under this model, DBG_VALUEs which refer to the same bits of the same variable will be filtered into different interval maps if they have different DIExpressions, which means the original intervals will not be properly preserved. This patch fixes the problem by using { Variable, Fragment } to filter the DBG_VALUEs into maps, and coalesces adjacent entries iff they have the same { Location, DIExpression } pair. The solution is not perfect because similar issues appear when partially overlapping fragments are encountered, but is far simpler than a complete solution (i.e. D70121). Fixes: pr41992, pr43957 Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D74053	2020-02-11 10:20:24 +00:00
Jay Foad	b06a13f541	[AMDGPU] Fix non-deterministic iteration order Summary: As far as I know this did not affect code generation, but it did affect the order of -debug-only=si-wqm output and the naming of autonamed values in -print-after=si-wqm output. Reviewers: arsenm, rampitec, nhaehnle Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, mgrang, kerbowa, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74317	2020-02-11 09:19:30 +00:00
Craig Topper	798305d29b	[X86] Custom lower ISD::FP16_TO_FP and ISD::FP_TO_FP16 on f16c targets instead of using isel patterns. We need to use vector instructions for these operations. Previously we handled this with isel patterns that used extra instructions and copies to handle the conversions. Now we use custom lowering to emit the conversions. This allows them to be pattern matched and optimized on their own. For example we can now emit vpextrw to store the result if it's going directly to memory. I've forced the upper elements to VCVTPHS2PS to zero to keep some code similar. Zeroes will be needed for strictfp. I've added a DAG combine for (fp16_to_fp (fp_to_fp16 X)) to avoid extra instructions in between to be closer to the previous codegen. This is a step towards strictfp support for f16 conversions.	2020-02-10 22:01:48 -08:00
River Riddle	52086f802e	[llvm][TableGen] Define FieldInit::isConcrete overload Summary: There are a few field init values that are concrete but not complete/foldable (e.g. `?`). This allows for using those values as initializers without erroring out. Example: ``` class A { string value = ?; } class B<A impl> : A { let value = impl.value; // This currently emits an error. let value = ?; // This doesn't emit an error. } ``` Differential Revision: https://reviews.llvm.org/D74360	2020-02-10 18:04:58 -08:00
diggerlin	09d26b79d2	[NFC] Refactor the tuple of symbol information with structure for llvm-objdump SUMMARY: Refactor the std::tuple<uint64_t, StringRef, uint8_t> into a structure Reviewers: daltenty Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D74240	2020-02-10 19:23:01 -05:00
Amara Emerson	067dd9c6b1	[GlobalISel][CallLowering] Use stripPointerCasts(). A downstream test exposed a simple logic bug with the manual pointer stripping code, fix that by just using stripPointerCasts() on the value. I don't think there's a way to expose this issue upstream.	2020-02-10 15:43:57 -08:00
Sanjay Patel	b8ebc11f03	[EarlyCSE] avoid crashing when detecting min/max/abs patterns (PR41083) As discussed in PR41083: https://bugs.llvm.org/show_bug.cgi?id=41083 ...we can assert/crash in EarlyCSE using the current hashing scheme and instructions with flags. ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or other flags when detecting patterns such as min/max/abs composed of compare+select. But the value numbering / hashing mechanism used by EarlyCSE intersects those flags to allow more CSE. Several alternatives to solve this are discussed in the bug report. This patch avoids the issue by doing simple matching of min/max/abs patterns that never requires instruction flags. We give up some CSE power because of that, but that is not expected to result in much actual performance difference because InstCombine will canonicalize these patterns when possible. It even has this comment for abs/nabs: /// Canonicalize all these variants to 1 pattern. /// This makes CSE more likely. (And this patch adds PhaseOrdering tests to verify that the expected transforms are still happening in the standard optimization pipelines. I left this code to use ValueTracking's "flavor" enum values, so we don't have to change the callers' code. If we decide to go back to using the ValueTracking call (by changing the hashing algorithm instead), it should be obvious how to replace this chunk. Differential Revision: https://reviews.llvm.org/D74285	2020-02-10 17:25:34 -05:00
Hiroshi Yamauchi	bb383ae612	[CallPromotionUtils] Add tryPromoteCall. Summary: It attempts to devirtualize a call on alloca through vtable loads. Reviewers: davidxl Subscribers: mgorny, Prazek, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71308	2020-02-10 13:43:16 -08:00
Xiangling Liao	660b0d7f7b	[AIX] Enable frame pointer for AIX and add related test suite This patch: - enable frame pointer for AIX; - update some of red zone comments; - add/update testcases; Differential Revision: https://reviews.llvm.org/D72454	2020-02-10 15:43:41 -05:00
Matt Arsenault	f270da6bfc	RegisterCoalescer: Add LaneMask to debug printing	2020-02-10 12:34:33 -08:00
Sanjay Patel	62ce7e650a	[InstCombine] fix use check when canonicalizing abs/nabs We were checking for extra uses of the negated operand even if we were not going to create it as part of this canonicalization. This was showing up as a regression when we limit EarlyCSE as proposed in D74285.	2020-02-10 14:57:37 -05:00
diggerlin	aa86311e62	[AIX][XCOFF] Support Mergeable2ByteCString and Mergeable4ByteCString SUMMARY: This patch adds support for Mergeable2ByteCString and Mergeable4ByteCString Reviewers: daltenty Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D74164	2020-02-10 14:45:54 -05:00
Rachel Craik	1f55420065	[LoopCacheAnalysis]: Add support for negative stride LoopCacheAnalysis currently assumes the loop will be iterated over in a forward direction. This patch addresses the issue by using the absolute value of the stride when iterating backwards. Note: this patch will treat negative and positive array access the same, resulting in the same cost being calculated for single and bi-directional access patterns. This should be improved in a subsequent patch. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D73064	2020-02-10 13:22:35 -05:00
Jonas Paulsson	fcdb99e0b5	[SystemZ] Add a subtarget cache like some other targets already have. With this, each function is compiled with the SystemZSubtarget initialized from the function's attributes. Review: Ulrich Weigand. Differential Revision: https://reviews.llvm.org/D74086	2020-02-10 13:10:58 -05:00
Matt Arsenault	7af7b96a9b	AMDGPU: Move R600 test compatibility hack Instead of handling the r600 intrinsics on amdgcn, handle the amdgcn intrinsics on r600.	2020-02-10 10:02:06 -08:00
Simon Pilgrim	f319074824	[X86] combineConcatVectorOps - combine X86ISD::PACKSS ops	2020-02-10 17:48:02 +00:00
Simon Pilgrim	74c0f98cf5	[X86] combineConcatVectorOps - combine X86ISD::VPERMI ops	2020-02-10 17:48:01 +00:00
Simon Pilgrim	2463b8c97d	[X86] combineConcatVectorOps - combine VSHLI/VSRAI/VSRLI ops Non-AVX512BW targets failed to concatenate 256-bit shifts back to 512-bits (split during 512-bit shuffle lowering as they don't have v32i16/v64i8 types).	2020-02-10 16:59:09 +00:00
David Stenberg	982944525c	Revert "[InstCombine][DebugInfo] Fold constants wrapped in metadata" This reverts commit `b54a8ec1bc`. The commit triggered debug invariance (different output with/without -g). The patch seems to have exposed a pre-existing invariance problem in GlobalOpt, which I'll write a bug report for.	2020-02-10 17:58:33 +01:00
Stanislav Mekhanoshin	ed3527c648	[AMDGPU] Split R600 and GCN subregs These are generated and do not need to have the same values. We are defining separate subregs for R600 and GCN but then using AMDGPU subregs on R600. Differential Revision: https://reviews.llvm.org/D74248	2020-02-10 08:29:56 -08:00
Simon Pilgrim	06617c4522	[X86] Add lowerShuffleAsBitRotate (PR44379) As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets. This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway. There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch. REAPPLIED rGe82e17d4d4ca after reversion at rG39eade73a567 - fixed offset matching in matchShuffleAsBitRotate.	2020-02-10 16:16:56 +00:00
Kadir Cetinkaya	5731b6672d	Revert "[OpenMP] Fix unused variable" This breaks under asan, see http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/38597/steps/check-clang%20asan/logs/stdio This reverts commit `bb50454295`. Revert "[FIX] Ordering problem accidentally introduced with D72304" This reverts commit `08c0a06d8f`. Revert "[OpenMP][OMPIRBuilder] Add Directives (master and critical) to OMPBuilder." This reverts commit `e8a436c5ea`.	2020-02-10 16:34:59 +01:00
Bill Wendling	c55cf4afa9	Revert "Remove redundant "std::move"s in return statements" The build failed with error: call to deleted constructor of 'llvm::Error' errors. This reverts commit `1c2241a793`.	2020-02-10 07:07:40 -08:00
James Henderson	b1c7bfe6da	[DebugInfo] Reject line tables of version > 5 If a debug line section with version of greater than 5 is encountered, prior to this change the parser would accept it and treat it as version 5. This might work to some extent, but then it might not at all, as it really depends on the format of the unspecified future version, which will be different (otherwise there would be no point in changing the version number). Any information we could provide has a good chance of being invalid, so we should just refuse to parse such tables. Reviewed by: dblaikie, MaskRay Differential Revision: https://reviews.llvm.org/D74204	2020-02-10 14:43:10 +00:00
Bill Wendling	1c2241a793	Remove redundant "std::move"s in return statements	2020-02-10 06:39:44 -08:00
Luke Geeson	a67db83681	[AArch64] Make Read Write System Registers Read Only This patch makes the following System Registers Read Only: - CurrentEL - ICH_MISR_EL2 - PMBIDR_EL1 - PMSIDR_EL1 as found in: https://developer.arm.com/docs/ddi0595/e/aarch64-system-registers Relative line numbers were also added to the tests so we get more informative error messages on failure. Change-Id: I963b4f01ca5737b58f9e8e7abe9ca1d99e328758	2020-02-10 14:34:24 +00:00
Sebastian Neubauer	7cddd15e56	[SelectionDAG] Optimize build_vector of truncates and shifts Add a simplification to fuse a manual vector extract with shifts and truncate into a bitcast. Unpacking and packing values into vectors is only optimized with extractelement instructions, not when manually unpacked using shifts and truncates. This patch simplifies shifts and truncates into a bitcast if possible. Simplify (build_vec (trunc $1) (trunc (srl $1 width)) (trunc (srl $1 (2 * width))) ...) to (bitcast $1) Differential Revision: https://reviews.llvm.org/D73892	2020-02-10 15:04:07 +01:00
Kai Nacke	34946dfd79	[SystemZ] Add implementation for the intrinsic llvm.read_register This change implements the llvm intrinsic llvm.read_register for the SystemZ platform which returns the value of the specified register (http://llvm.org/docs/LangRef.html#llvm-read-register-and-llvm-write-register-intrinsics). This implementation returns the value of the stack register, and can be extended to return the value of other registers. The implementation for this intrinsic exists on various other platforms including Power, x86, ARM, etc. but missing on SystemZ. Reviewers: uweigand Differential Revision: https://reviews.llvm.org/D73378	2020-02-10 08:19:10 -05:00
Hans Wennborg	ea9850b6c7	Fix an unused variable warning	2020-02-10 14:08:18 +01:00
Mikael Holmen	a50c0b0df7	Fix compiler warning when compiling without asserts [NFC]	2020-02-10 13:55:52 +01:00
Kadir Cetinkaya	bb50454295	[OpenMP] Fix unused variable	2020-02-10 13:47:20 +01:00
Kerry McLaughlin	92a7875092	[AArch64][SVE] SVE2 intrinsics for complex integer arithmetic Summary: Adds the following SVE2 intrinsics: - cadd & sqcadd - cmla & sqrdcmlah - saddlbt, ssublbt & ssubltb Reviewers: sdesmalen, dancgr, efriedma, cameron.mcinally, c-rhodes, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73636	2020-02-10 12:14:56 +00:00
Simon Pilgrim	39eade73a5	Revert rGe82e17d4d4cac8b2df00094e80d5e1cb22795664 - [X86] Add lowerShuffleAsBitRotate (PR44379) As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets. This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway. There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch. Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types). --- Internal shuffle tests indicate there's a bug somewhere that I haven't been able to track down yet.	2020-02-10 12:14:26 +00:00
Florian Hahn	d0c4d4fe09	[DSE] Add first version of MemorySSA-backed DSE (Bottom up walk). This patch adds a first version of a MemorySSA based DSE. It is missing a lot of features, which will get added as follow-ups, to help to keep the review manageable. The patch uses the following general approach: given a MemoryDef, walk upwards to find clobbering MemoryDefs that may be killed by the starting def. Then check that there are no uses that may read the location of the original MemoryDef in between both MemoryDefs. A bit more concretely: For all MemoryDefs StartDef: 1. Get the next dominating clobbering MemoryDef (DomAccess) by walking upwards. 2. Check that there no reads between DomAccess and the StartDef by checking all uses starting at DomAccess and walking until we see StartDef. 3. For each found DomDef, check that: 1. There are no barrier instructions between DomDef and StartDef (like throws or stores with ordering constraints). 2. StartDef is executed whenever DomDef is executed. 3. StartDef completely overwrites DomDef. 4. Erase DomDef from the function and MemorySSA. The patch uses a very simple approach to guarantee that no throwing instructions are between 2 stores: We only allow accesses to stack objects, access that are in the same basic block if the block does not contain any throwing instructions or accesses in functions that do not contain any throwing instructions. This will get lifted later. Besides adding support for the missing cases, there is plenty of additional potential for improvements as follow-up work, e.g. the way we visit stores (could be just a traversal of the MemorySSA, rather than collecting them up-front), using the alias information discovered during walking to optimize the MemorySSA. This is loosely based on D40480 by Dave Green. Reviewers: dmgreen, rnk, efriedma, bryant, asbirlea, Tyker Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D72700	2020-02-10 11:52:11 +00:00
Kerry McLaughlin	e299a08149	[AArch64][SVE] SVE2 intrinsics for character match & histogram generation Summary: Implements the following intrinsics: - @llvm.aarch64.sve.histcnt - @llvm.aarch64.sve.histseg - @llvm.aarch64.sve.match - @llvm.aarch64.sve.nmatch Reviewers: c-rhodes, sdesmalen, dancgr, efriedma, rengolin Reviewed By: c-rhodes Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74117	2020-02-10 11:08:00 +00:00
Kerry McLaughlin	5e1d7bb679	[AArch64][SVE] Add SVE2 intrinsics for widening DSP operations Summary: Implements the following intrinsics: - @llvm.aarch64.sve.[s|u]abalb - @llvm.aarch64.sve.[s|u]abalt - @llvm.aarch64.sve.[s|u]addlb - @llvm.aarch64.sve.[s|u]addlt - @llvm.aarch64.sve.[s|u]sublb - @llvm.aarch64.sve.[s|u]sublt - @llvm.aarch64.sve.[s|u]abdlb - @llvm.aarch64.sve.[s|u]abdlt - @llvm.aarch64.sve.sqdmullb - @llvm.aarch64.sve.sqdmullt - @llvm.aarch64.sve.[s|u]mullb - @llvm.aarch64.sve.[s|u]mullt Reviewers: sdesmalen, dancgr, efriedma, cameron.mcinally, rengolin Reviewed By: sdesmalen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D73719	2020-02-10 10:37:59 +00:00
Florian Hahn	da52b9c118	[DSE] Add tests for MemorySSA based DSE. This copies the DSE tests into a MSSA subdirectory to test the MemorySSA backed DSE implementation, without disturbing the original tests. Differential Revision: https://reviews.llvm.org/D72145	2020-02-10 10:28:43 +00:00
Djordje Todorovic	3a4dc577c9	[CSInfo] Fix the assertions regarding updating the CSInfo The call site info was not updated correctly when deleting corresponding call instructions. Differential Revision: https://reviews.llvm.org/D73700	2020-02-10 10:55:06 +01:00
Kai Nacke	a5040d5ec9	[SystemZ] Disable vector ABI when using option -march=arch[8|9|10] When specifying -march=arch[8|9|10], those CPU types do NOT support the vector extension. In this case the vector ABI must be disabled. The generated data layout should NOT contain 64-v128. Reviewers: uweigand Differential Revision: https://reviews.llvm.org/D74146	2020-02-10 04:14:05 -05:00
Djordje Todorovic	68908993eb	[CSInfo] Use isCandidateForCallSiteEntry() when updating the CSInfo Use the isCandidateForCallSiteEntry(). This should mostly be an NFC, but there are some parts ensuring the moveCallSiteInfo() and copyCallSiteInfo() operate with call site entry candidates (both Src and Dest should be the call site entry candidates). Differential Revision: https://reviews.llvm.org/D74122	2020-02-10 10:03:14 +01:00
Sebastian Neubauer	8756869170	[AMDGPU] Add a16 feature to gfx10 Based on D72931 This adds a new feature called A16 which is enabled for gfx10. gfx9 keeps the R128A16 feature so it can share all the instruction encodings with gfx7/8. Differential Revision: https://reviews.llvm.org/D73956	2020-02-10 09:04:23 +01:00
Johannes Doerfert	87ddf1f4fa	[Attributor] Simple casts preserve no-alias property This is a minimal but important advancement over the existing code. A cast with an operand that is only used in the cast retains the no-alias property of the operand.	2020-02-10 01:11:32 -06:00
Amara Emerson	21c9d9ad43	[GlobalISel][CallLowering] Tighten constantexpr check for callee. I'm not sure there's a test case for this, but it's better to be safe.	2020-02-09 22:59:48 -08:00
Johannes Doerfert	8155439331	[Attributor] Allow PHI nodes in AAValueConstantRangeFloating Traversing PHI nodes is natural with the genericValueTraversal but also a bit tricky. The problem is similar to the ones we have seen in AAAlign and AADereferenceable, namely that we continue to increase the range in each iteration. We use a pessimistic approach here to stop the iterations. Nevertheless, optimistic information can now be propagated through a PHI node.	2020-02-10 00:55:10 -06:00
Johannes Doerfert	63adbb9a0e	[Attributor][FIX] Remove FIXME that seems outdated The change is performed as stated by the FIXME and the tests are adjusted. All changes look fine to me and values can be inferred as undef without it being an error.	2020-02-10 00:55:10 -06:00
Johannes Doerfert	7e7e6594b3	[Attributor] Allow SelectInst in AAValueConstantRangeFloating The genericValueTraversal will already handle SelectInst properly and we just needed to allow them in the initialize method.	2020-02-10 00:55:09 -06:00
Johannes Doerfert	ffdbd2a06c	[Attributor] Look through (some) casts in AAValueConstantRangeFloating Casts can be handled natively by the ConstantRange class. We do limit it to extends for now as we assume an integer type in different locations. A TODO and a test case with a FIXME was added to remove that restriction in the future.	2020-02-10 00:38:01 -06:00
Johannes Doerfert	028db8c490	[Attributor][FIX] Call right base method in AAValueConstantRangeFloating We now call the base class method as we should.	2020-02-10 00:38:01 -06:00
Craig Topper	06ba969c9d	[X86] Make (insert_vector_elt (v8i16 zerovec), i16 %x, 0) generate the same code as (v8i16 (build_vector %x, 0, 0, 0, 0, 0, 0, 0)). Instead of using a insrw to element 0, use movzx and movd. Same for v16i8.	2020-02-09 21:52:11 -08:00
Michael Liao	ab3da5dd66	Fix `-Wparentheses` warning. NFC.	2020-02-10 00:45:02 -05:00
Craig Topper	05d44204fa	[X86] Use MOVZX instead of MOVSX in f16_to_fp isel patterns. Using sign extend forces the adjacent element to either all zeros or all ones. But all ones is a NAN. So that doesn't seem like a great idea. Trying to work on supporting this with strict FP where NAN would definitely be bad.	2020-02-09 20:39:52 -08:00
Shiva Chen	64f417200e	[RISCV] Fix incorrect FP base CFI offset for variable argument functions When the FP exists, the FP base CFI directive offset should take the size of variable arguments into account. Differential Revision: https://reviews.llvm.org/D73862	2020-02-10 11:56:08 +08:00
Fangrui Song	512c03bac4	[DebugInfo] Add a DWARFDataExtractor constructor that takes ArrayRef<uint8_t> Similar to D67797 (DataExtractor).	2020-02-09 17:45:32 -08:00
Matt Arsenault	312a9d1b83	GlobalISel: Fix narrowScalar for G_{CTLZ|CTTZ}_ZERO_UNDEF Narrow these for 64-bit VALU for AMDGPU.	2020-02-09 19:02:38 -05:00
Matt Arsenault	c437f6c687	AMDGPU/GlobalISel: Split 64-bit G_CTPOP in RegBankSelect	2020-02-09 18:39:33 -05:00
Matt Arsenault	6135f5eda4	GlobalISel: Fix narrowing of G_CTLZ/G_CTTZ The result type is separate from the source type.	2020-02-09 18:11:43 -05:00
Matt Arsenault	2126c70e3a	AMDGPU/GlobalISel: Don't mis-select vector index on a constant Vector indexing with a constant index should be folded out in the legalizer, but this was accidentally falling through. This would produce the indexing operation with $noreg. Handle this case as a dynamic index just in case a bug like this happens again in the future.	2020-02-09 18:02:37 -05:00
Matt Arsenault	f4a38c114e	AMDGPU/GlobalISel: Look through casts when legalizing vector indexing We were failing to find constants that were casted. I feel like the artifact combiner should have folded the constant in the trunc before the custom lowering, but that doesn't happen.	2020-02-09 18:02:10 -05:00
Matt Arsenault	00115d767f	AMDGPU: Remove dead kill handling At one point a custom node was used for kill handling, but now the intrinsic is directly selected. Remove leftover pattern machinery.	2020-02-09 17:59:24 -05:00
Matt Arsenault	6e1770821f	AMDGPU: Fix SI_IF lowering when the save exec reg has terminator uses Reverts part of `6524a7a2b9`. Since that commit, the expansion was ignoring the actual save exec register produced by the instruction, and looking at other instructions. I do not understand why it was looking at other instructions, but relying on this scan was wrong. Fixes verifier errors after SI_IF is tail duplicated, which should be correct to do. The results were fed into a phi, which was lowered to the S_MOV_B64_term instructions.	2020-02-09 17:59:19 -05:00
Simon Pilgrim	29e646fe65	[X86] combineConcatVectorOps - combine VROTLI/VROTRI ops Fix issue mentioned on rGe82e17d4d4ca - non-AVX512BW targets failed to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).	2020-02-09 21:50:10 +00:00
Craig Topper	656d66f5fc	[X86] Use custom isel for (X86sbb_flag 0, 0) so we can use 32-bit SBB for i8/i16. We were using MOV32r0 and an extract_subreg as an input. By using custom isel we can move the extract_subreg to after the SBB instead of on the input.	2020-02-09 13:19:35 -08:00
Craig Topper	e1cbfecdb8	[X86] Add flag result VT to a MOV32r0 created in X86DAGToDAGISel::Select The flag isn't used, but I believe this matches the MOV32r0 that would be created by the table emitter. This should allow this node to be CSEed with any others created by the table.	2020-02-09 13:19:21 -08:00
Simon Pilgrim	e82e17d4d4	[X86] Add lowerShuffleAsBitRotate (PR44379) As noted on PR44379, we didn't attempt to lower vector shuffles using bit rotations on XOP/AVX512F targets. This patch lowers to uniform ISD:ROTL nodes - ROTR isn't supported by XOP and they are interchangeable for constant values anyway. There might be cases where targets without ISD:ROTL support would benefit from this (expanding to SRL+SHL+OR), which I'll investigate in a future patch. Also, non-AVX512BW targets fail to concatenate 256-bit rotations back to 512-bits (split during shuffle lowering as they don't have v32i16/v64i8 types).	2020-02-09 21:15:03 +00:00
Craig Topper	dd262222b4	[X86] Use MVT::i32 for the type of a MOV32r0 created in X86DAGToDAGISel::Select. Not sure if this really matters. The VT isn't really used after this point. At best it might affect CSE.	2020-02-09 11:57:42 -08:00
Craig Topper	dbcc1392b3	[X86] Remove isel patterns that include a vselect/X86selects and a strict FP node. A vselect+strictfp node is not equivalent to a masked operation. The exceptions of the strictfp node are not masked by a vselect after it so we can't match it to a masked operation. We already had a hack in IsLegalToFold to prevent these patterns from matching. This patch removes that hack and removes the patterns.	2020-02-09 11:45:54 -08:00
Simon Pilgrim	29621b2534	[X86] Rename matchShuffleAsRotate - matchShuffleAsByteRotate. NFCI. A matchShuffleAsBitRotate variant will be added soon and we need to make the difference more obvious.	2020-02-09 18:35:50 +00:00
Sanjay Patel	a17f03bd93	[VectorCombine] new IR transform pass for partial vector ops We have several bug reports that could be characterized as "reducing scalarization", and this topic was also raised on llvm-dev recently: http://lists.llvm.org/pipermail/llvm-dev/2020-January/138157.html ...so I'm proposing that we deal with these patterns in a new, lightweight IR vector pass that runs before/after other vectorization passes. There are 4 alternate options that I can think of to deal with this kind of problem (and we've seen various attempts at all of these), but they all have flaws: InstCombine - can't happen without TTI, but we don't want target-specific folds there. SDAG - too late to assist other vectorization passes; TLI is not equipped for these kind of cost queries; limited to a single basic block. CGP - too late to assist other vectorization passes; would need to re-implement basic cleanups like CSE/instcombine. SLP - doesn't fit with existing transforms; limited to a single basic block. This initial patch/transform is based on existing code in AggressiveInstCombine: we walk backwards through the function looking for a pattern match. But we diverge from that cost-independent IR canonicalization pass by using TTI to decide if the vector alternative is profitable. We probably have at least 10 similar bug reports/patterns (binops, constants, inserts, cheap shuffles, etc) that would fit in this pass as follow-up enhancements. It's possible that we could iterate on a worklist to fix-point like InstCombine does, but it's safer to start with a most basic case and evolve from there, so I didn't try to do anything fancy with this initial implementation. Differential Revision: https://reviews.llvm.org/D73480	2020-02-09 10:04:41 -05:00
Simon Pilgrim	3ec6de07e9	Fix signed/unsigned warning.	2020-02-09 13:35:03 +00:00
Simon Pilgrim	644d56b432	[X86] Recognise ROTLI/ROTRI rotations as faux shuffles Allows us to combine rotations with shuffles. One of many things necessary to fix PR44379 (lowering shuffles to rotations)	2020-02-09 12:25:49 +00:00
Ehud Katz	3b70ee27a5	[LoopExtractor] Convert LoopExtractor from LoopPass to ModulePass The LoopExtractor created new functions (by definition), which violates the restrictions of a LoopPass. The correct implementation of this pass should be as a ModulePass. Includes reverting rL82990 implications on the LoopExtractor. Fixes PR3082 and PR8929. Differential Revision: https://reviews.llvm.org/D69069	2020-02-09 12:25:21 +02:00
serge_sans_paille	e67cbac812	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This is a recommit of `39f50da2a3` with proper LiveIn declaration, better option handling and more portable testing. Differential Revision: https://reviews.llvm.org/D68720	2020-02-09 10:42:45 +01:00
serge-sans-paille	4546211600	Revert "Support -fstack-clash-protection for x86" This reverts commit `0fd51a4554`. Failures: http://lab.llvm.org:8011/builders/llvm-clang-win-x-armv7l/builds/4354	2020-02-09 10:06:31 +01:00
serge_sans_paille	0fd51a4554	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This is a recommit of `39f50da2a3` with proper LiveIn declaration, better option handling and more portable testing. Differential Revision: https://reviews.llvm.org/D68720	2020-02-09 09:35:42 +01:00
Craig Topper	e629674176	[X86] Add more scalar intrinsic instructions to isNonFoldablePartialRegisterLoad. I think this covers most if not all of the scalar intrinsic instructions.	2020-02-08 20:41:36 -08:00
Johannes Doerfert	b0c77c36d2	[Attributor] Add an Attributor CGSCC pass and run it In addition to the module pass, this patch introduces a CGSCC pass that runs the Attributor on a strongly connected component of the call graph (both old and new PM). The Attributor was always designed to be used on a subset of functions which makes this patch mostly mechanical. The one change is that we give up `norecurse` deduction in the module pass in favor of doing it during the CGSCC pass. This makes the interfaces simpler but can be revisited if needed. Reviewed By: hfinkel Differential Revision: https://reviews.llvm.org/D70767	2020-02-08 21:27:34 -06:00
Fangrui Song	ee3f13b81d	Fix -Wunused-lambda-capture for -DLLVM_ENABLE_ASSERTIONS=off builds after `6556c615f3`	2020-02-08 19:03:58 -08:00
Craig Topper	0152b106ae	[X86] Add the recently added (V)CVTSS2SI/CVTSD2SI instructions used for LRINT/LLRINT to the load folding tables.	2020-02-08 17:54:48 -08:00
fady	e8a436c5ea	[OpenMP][OMPIRBuilder] Add Directives (master and critical) to OMPBuilder. Add support for Master and Critical directive in the OMPIRBuilder. Both make use of a new common interface for emitting inlined OMP regions called `emitInlinedRegion` which was added in this patch as well. Also this patch modifies clang to use the new directives when `-fopenmp-enable-irbuilder` commandline option is passed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D72304	2020-02-08 18:55:48 -06:00
Johannes Doerfert	e565db49c6	[OpenMP][Opt] Delete terminating and read-only parallel regions Parallel regions known to be read-only, e.g., after we removed all dead write accesses, and terminating (`willreturn`) can be removed. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D69954	2020-02-08 18:52:04 -06:00
Johannes Doerfert	e28936f613	[OpenMP][Opt] Annotate known runtime functions and deduplicate more This adds ~27 more runtime calls to the OpenMPKinds.def file, all with attributes. We deduplicate 16 of those automatically in function = thread scope. And we annotate all of them automatically during the OpenMPOpt discovery step. A test with all omp_XXXX runtime calls to track annotation coverage is included. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D69984	2020-02-08 18:35:39 -06:00
Craig Topper	d643a39aba	[X86] Use any_fadd/sub/mul/div/sqrt with the AVX512 scalar_*_patterns. Making sure not to use them with patterns for masked instructions. Also fix FMA patterns that were matching strict_fma+x86selects to masked instructions.	2020-02-08 15:54:40 -08:00
Nikita Popov	a05932931c	[InstCombine] Refactor foldICmpAndShift(); NFCI Separate out handling for shl, lshr and ashr. The combined handling obscured some overly pessimistic requirements for the transform.	2020-02-08 22:27:43 +01:00
Johannes Doerfert	9548b74a83	[OpenMP] Introduce the OpenMPOpt transformation pass The OpenMPOpt pass is a CGSCC pass in which OpenMP specific optimizations can reside. The OpenMPOpt pass uses the OpenMPKinds.def file to identify runtime calls and their uses. This allows targeted transformations and eases their implementation. This initial patch deduplicates `__kmpc_global_thread_num` and `omp_get_thread_num` calls. We can also identify arguments that are equivalent to such a call result and use it instead. Later we can determine "gtid" arguments based on the use in kernel functions etc. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D69930	2020-02-08 14:47:03 -06:00
Johannes Doerfert	72277ecd62	Introduce a CallGraph updater helper class The CallGraphUpdater is a helper that simplifies the process of updating the call graph, both old and new style, while running a CGSCC pass. The uses are contained in different commits, e.g. D70767. More functionality is added as we need it. Reviewed By: modocache, hfinkel Differential Revision: https://reviews.llvm.org/D70927	2020-02-08 14:16:48 -06:00
George Burgess IV	f8c9ceb1ce	[SimplifyLibCalls] Add __strlen_chk. Bionic has had `__strlen_chk` for a while. Optimizing that into a constant is quite profitable, when possible. Differential Revision: https://reviews.llvm.org/D74079	2020-02-08 11:51:00 -08:00
Nikita Popov	a148b9e990	[InstCombine] Fix infinite min/max canonicalization loop (PR44541) While D72944 also fixes https://bugs.llvm.org/show_bug.cgi?id=44541, it does so in a more roundabout manner and there might be other loopholes to trigger the same issue. This is a more direct fix, that prevents the transform if the min/max is based on a non-canonical sub X, 0 instruction. Differential Revision: https://reviews.llvm.org/D73849	2020-02-08 20:42:17 +01:00
Craig Topper	eeb63944e4	[LegalizeTypes][ARM][AArch64][PowerPC][RISCV][X86] Use BUILD_PAIR to return expanded integer results from ReplaceNodeResults instead of just returning two results. Remove code from LegalizeTypes that allowed this to work. We were already using BUILD_PAIR for this in some places so this standardizes on a single way to do this.	2020-02-08 09:52:31 -08:00
Simon Pilgrim	4aa7b9cc96	[X86] X86InstComments - add FMA4 comments These typically match the FMA3 equivalents, although the multiply operands sometimes get flipped due to the FMA3 permute variants.	2020-02-08 17:02:00 +00:00
Simon Pilgrim	10417ad2e4	[X86] Standardize BROADCAST enum names (PR31079) Tweak EVEX implementation names so it matches the other variants by adding the 'r' prefix. Oddly some of the subvec broadcast ops already matched.	2020-02-08 16:55:00 +00:00
Nikita Popov	5b2b67be8e	[InstCombine] Remove unnecessary worklist push; NFCI This is no longer needed after `d4627b90a0`, should have dropped it there...	2020-02-08 17:09:28 +01:00
Nikita Popov	d4627b90a0	[InstCombine] Avoid modifying instructions in-place As discussed on D73919, this replaces a few cases where we were modifying multiple operands of instructions in-place with the creation of a new instruction, which we generally prefer nowadays. This tends to be more readable and less prone to worklist management bugs. Test changes are only superficial (instruction naming and order).	2020-02-08 17:05:56 +01:00
Nikita Popov	9d03b7d0d0	[InstCombine] Use swapValues(); NFC Less code, and makes it more obvious that these operands do not need to be added back to the worklist.	2020-02-08 16:57:28 +01:00
Nikita Popov	23db9724d0	[InstCombine] Fix infinite loop in min/max load/store bitcast combine (PR44835) Fixes https://bugs.llvm.org/show_bug.cgi?id=44835. Skip the transform if it wouldn't actually do anything (apart from removing and reinserting the same instructions). Note that the test case doesn't loop on current master anymore, only on the LLVM 10 release branch. The issue is already mitigated on master due to worklist order fixes, but we should fix the root cause there as well. As a side note, we should probably assert in combineLoadToNewType() that it does not combine to the same type. Not doing this here, because this assertion would also be triggered in another place right now. Differential Revision: https://reviews.llvm.org/D74278	2020-02-08 16:55:22 +01:00
Simon Pilgrim	0ed79e9b8f	[X86] Standardize VPSLLDQ/VPSRLDQ enum names (PR31079) Tweak EVEX implementation names so it matches the other variants	2020-02-08 14:54:44 +00:00
serge-sans-paille	658495e6ec	Revert "Support -fstack-clash-protection for x86" This reverts commit `e229017732`. Failures: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-debian/builds/2604 http://lab.llvm.org:8011/builders/llvm-clang-win-x-aarch64/builds/4308	2020-02-08 14:26:22 +01:00
Victor Campos	af2a384581	Revert "[ARM] Improve codegen of volatile load/store of i64" This reverts commit `60e0120c91`.	2020-02-08 13:18:45 +00:00
Igor Kudrin	1ea99a2ebc	[DebugInfo] Allow reading an address table with a mismatched address. This case does not look like an unrecoverable error. Differential Revision: https://reviews.llvm.org/D74194	2020-02-08 20:00:03 +07:00
serge_sans_paille	e229017732	Support -fstack-clash-protection for x86 Implement protection against the stack clash attack [0] through inline stack probing. Probe stack allocation every PAGE_SIZE during frame lowering or dynamic allocation to make sure the page guard, if any, is touched when touching the stack, in a similar manner to GCC[1]. This extends the existing `probe-stack' mechanism with a special value `inline-asm'. Technically the former uses function call before stack allocation while this patch provides inlined stack probes and chunk allocation. Only implemented for x86. [0] https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt [1] https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00556.html This is a recommit of `39f50da2a3` with better option handling and more portable testing Differential Revision: https://reviews.llvm.org/D68720	2020-02-08 13:31:52 +01:00
Benjamin Kramer	ef83d46b6b	Use heterogeneous lookup for std::map<std::string with a StringRef. NFCI.	2020-02-08 13:28:29 +01:00
Benjamin Kramer	e4230a9f6c	ArrayRef'ize spillCalleeSavedRegisters. NFCI.	2020-02-08 12:19:23 +01:00
Simon Pilgrim	7f5b3fa73c	[X86][SSE] Add X86ISD::FRCP handling to isNegatibleForFree Peek through X86ISD::FRCP nodes to see if there is a negatible input.	2020-02-08 10:56:27 +00:00
Simon Pilgrim	4229f12a22	[TargetLowering] Remove isDesirableToCombineBuildVectorToShuffleTruncate target hook. NFC. This hasn't been used for years, its original implementation, D35700, had bugs that caused the reversion of most of the code, and since then x86 shuffle lowering/combining has handled most cases and can deal with the rest as well.	2020-02-08 08:55:51 +00:00
Craig Topper	2af1640f9a	[LegalizeDAG][X86][AMDGPU] Use ANY_EXTEND instead of ZERO_EXTEND when promoting ISD::CTTZ/CTTZ_ZERO_UNDEF. Summary: For CTTZ we place a set bit just past where the non-promoted type stopped so the extended bits won't be used for the count. For CTTZ_ZERO_UNDEF we don't care what happens if no bits are set in the original type and we end up counting into the extended bits. So we can just use ANY_EXTEND for both cases. This matches what is done in type legalization for these operations. We make no effort to force the upper bits to zero. Differential Revision: https://reviews.llvm.org/D74111	2020-02-07 22:25:56 -08:00
Akira Hatanaka	4dcc029edb	[ObjC][ARC] Keep track of phis that have been discovered to avoid an infinite loop This fixes a bug introduced in `6770fbb314`. rdar://problem/59137105	2020-02-07 20:33:11 -08:00
Sam Clegg	caeb6cfbc2	[WebAssembly] Fix signature of __powitf2 libcall Add tests for @llvm.powi.f64/f128. See: https://llvm.org/docs/LangRef.html#llvm-powi-intrinsic Differential Revision: https://reviews.llvm.org/D74274	2020-02-07 20:30:47 -08:00
Heejin Ahn	5b5cbfe135	[WebAssembly] Add debug info to insts in Emscripten SjLj Summary: This makes sure all newly created instructions in Emscripten SjLj have appropriate debug info attached. Fixes https://github.com/emscripten-core/emscripten/issues/9797. Reviewers: kripken Subscribers: dschuff, aprantl, sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D74269	2020-02-07 19:08:39 -08:00
Akira Hatanaka	6770fbb314	[ObjC][ARC] Delete ARC runtime calls that take inert phi values This improves on the following patch, which removed ARC runtime calls taking inert global variables: https://reviews.llvm.org/D62433 rdar://problem/59137105	2020-02-07 16:31:36 -08:00
David Blaikie	ba9cae58bb	IR Linking: Support merging Warning+Max module metadata flags Summary: Debug Info Version was changed to use "Max" instead of "Warning" per the original design intent - but this makes old/new IR unlinkable, since mismatched merge styles are a linking failure. It seems possible/maybe reasonable to actually support the combination of these two flags: Warn, but then use the maximum value rather than the first value/earlier module's value. Reviewers: tejohnson Differential Revision: https://reviews.llvm.org/D74257	2020-02-07 16:29:58 -08:00
Amara Emerson	35c63d66aa	[GlobalISel][CallLowering] Look through bitcasts from constant function pointers. Calls to ObjC's objc_msgSend function are done by bitcasting the function global to the required function type signature. This patch looks through this bitcast so that we can do a direct call with bl on arm64 instead of using an indirect blr. Differential Revision: https://reviews.llvm.org/D74241	2020-02-07 15:32:54 -08:00
Craig Topper	bb717d3f46	[X86] Correct the implementation of the avx512 masked fmsubadd autoupgrade code to not leave the negate unconnected. This was causing us to generate fmaddsub instead of fmsubadd if rounding control is not 4.	2020-02-07 15:27:05 -08:00

131407 Commits