llvm-project

Commit Graph

Author	SHA1	Message	Date
Geoffrey Martin-Noble	c17ae2916c	Remove unnecessary header include which violates layering This was introduced in https://reviews.llvm.org/D89774, but I don't think it should be necessary. Reviewed By: TaWeiTu, aeubanks Differential Revision: https://reviews.llvm.org/D89843	2020-10-20 20:14:03 -07:00
Carl Ritson	324a15cead	[AMDGPU][NFC] Fix missing size in comment	2020-10-21 11:38:21 +09:00
Cyndy Ishida	acb33cba6d	[llvm] Fix ODRViolations for VersionTuple YAML specializations NFC It appears for Swift there was confusing errors when trying to parse APINotes, when libAPINotes and libInterfaceStub are linked, they both export symbol `__ZN4llvm4yaml7yamlizeINS_12VersionTupleEEENSt3__19enable_ifIXsr16has_ScalarTraitsIT_EE5valueEvE4typeERNS0_2IOERS5_bRNS0_12EmptyContextE`, and discovered same symbol defined within llvm-ifs. This consolidates the boilerplate into YAMLTraits and defers the specific validation in reading the whole input. fixes: rdar://problem/70450563 Reviewed By: phosek, dblaikie Differential Revision: https://reviews.llvm.org/D89764	2020-10-20 18:29:15 -07:00
Hubert Tong	134ffa8138	NFC: Fix -Wsign-compare warnings on 32-bit builds Comparing 32-bit `ptrdiff_t` against 32-bit `unsigned` results in `-Wsign-compare` warnings for both GCC and Clang. The warning for the cases in question appear to identify an issue where the `ptrdiff_t` value would be mutated via conversion to an unsigned type. The warning is resolved by using the usual arithmetic conversions to safely preserve the value of the `unsigned` operand while trying to convert to a signed type. Host platforms where `unsigned` has the same width as `unsigned long long` will need to make a different change, but using an explicit cast has disadvantages that can be avoided for now. Reviewed By: dantrushin Differential Revision: https://reviews.llvm.org/D89612	2020-10-20 20:52:10 -04:00
Austin Kerbow	ebdcef20ce	[AMDGPU] Avoid inserting noops during scheduling Passes that are run after the post-RA scheduler may insert instructions like waitcnt which eliminate the need for certain noops. After this patch the scheduler is still aware of possible latency from hazards but noops will not be inserted until the dedicated hazard recognizer pass is run. Depends on D89753. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D89754	2020-10-20 17:11:36 -07:00
Austin Kerbow	37d907899f	[HazardRec] Allow inserting multiple wait-states simultaneously If a target can encode multiple wait-states into a noop allow emitting such instructions directly. Reviewed By: rampitec, dmgreen Differential Revision: https://reviews.llvm.org/D89753	2020-10-20 17:03:47 -07:00
Tony	1bc7bfffdb	[AMDGPU] Optimize waitcnt insertion for flat memory operations Change waitcnt insertion to check the memory operand tokens to see if flat memory operations access VMEM in the same way it does to check if accessing LDS. This avoids adding waitcnt for counters for address spaces that are not accessed. In addition, only generate the pessimistic waitcnt 0 if a flat memory operation appears to access both VMEM and LDS. This benefits flat memory operations that explicitly specify the address space as GLOBAL or LOCAL. Differential Revision: https://reviews.llvm.org/D89618	2020-10-20 22:55:12 +00:00
Craig Topper	1298252f80	[X86] Move 'int $3' -> 'int3' handling in the assembler to processInstruction. Instead of handling before parsing, just fix it after parsing.	2020-10-20 15:22:00 -07:00
Craig Topper	702aae368a	[X86] Move 's{hr,ar,hl} , <op>' to 'shift <op>' optimization in the assembler into processInstruction. Instead of detecting the mnemonic and hacking the operands before parsing. Just fix it up after parsing.	2020-10-20 15:20:46 -07:00
Kazu Hirata	96f372c1e7	[AsmWriter] Construct SlotTracker with the function This patch teaches BasicBlock::print to construct an instance of SlotTracker with the containing function. Without this patch, we dump: * IR Dump After LoopInstSimplifyPass * ; Preheader: br label %1 ; Loop: <badref>: ; preds = %1, %0 br label %1 Note "<badref>" above. This happens because BasicBlock::print calls: SlotTracker SlotTable(this->getModule()); Note that this constructor does not add the contents of functions to the slot table. That is, basic blocks are left unnumbered. This patch fixes the problem by switching to: SlotTracker SlotTable(this->getParent()); which does add the contents of the Module and the function, this->getParent(), to the slot table. Differential Revision: https://reviews.llvm.org/D89567	2020-10-20 15:01:40 -07:00
Paul C. Anagnostopoulos	332ff48dee	[AMDGPU] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans Differential Revision: https://reviews.llvm.org/D89796	2020-10-20 16:23:15 -04:00
Shimin Cui	95bda510fb	[ConstantFold] Fold the comparison of bitcasted global values This is to simplify icmp instructions in the form like: %cmp = icmp eq i32 (i8, i8)* bitcast (i32 (i32, i32)* @f32 to i32 %(i8, i8)), bitcast (i32 (i64, i64) @f64 to i32 (i8, i8)*) Here @f32 and @f64 are two functions. Differential Revision: https://reviews.llvm.org/D87850	2020-10-20 12:41:49 -07:00
David Stenberg	0c0fcea557	Handle value uses wrapped in metadata for the use-list order When generating the use-list order, also consider value uses that are operands which are wrapped in metadata; e.g. llvm.dbg.value operands. This fixes PR36778. The test case is based on the reproducer from that report. Reviewed By: dexonsmith Differential Revision: https://reviews.llvm.org/D53758	2020-10-20 20:05:59 +02:00
Nicolai Hähnle	848a68a032	DomTree: Extract (mostly) read-only logic into type-erased base classes Avoid having to instantiate and compile a subset of the dominator tree logic separately for each node type. More importantly, this allows generic algorithms to be built on top of dominator trees without writing them as templates -- such algorithms can now use opaque CfgBlockRef and CfgInterface instead. A type-erased implementation of dominator trees could be written in terms of CfgInterface as well, but doing so would change the current trade-off: it would slightly reduce code size at the cost of a slight runtime overhead. This patch does not change the trade-off, as it only does type-erasure where basic blocks can be treated in a fully opaque way, i.e. it only moves methods that don't require iteration over CFG successors and predecessors. v5: - rename generic_{begin,end,children} back without the generic_ prefix and refer explictly to base class methods in NewGVN, which wants to mutate the order of dominator tree node children directly v6: - style change: iDom -> idom; it's arguable whether this is really invalid, since it is actually standard camelCase, but clang-tidy complains about it so... shrug - rename {to,from}Generic -> {wrap,unwrap}Ref Change-Id: Ib860dc04cf8bb093d8ed00be7def40d662213672 Differential Revision: https://reviews.llvm.org/D83089	2020-10-20 19:53:07 +02:00
Ta-Wei Tu	529ecd19df	[NPM] port -unify-loop-exits to NPM Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D89774	2020-10-20 10:46:57 -07:00
vnalamot	89f7ccea6f	[AMDGPU] Remove getAllVGPR32() which cannot handle Accum VGPRs properly Remove getAllVGPR32() interface and update the SGPR spill code to use a proper method to get the relevant VGPR registers list. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D89806	2020-10-20 23:15:24 +05:30
Ta-Wei Tu	59286b36df	[NPM] Port -mergereturn to NPM Reviewed By: aeubanks Differential Revision: https://reviews.llvm.org/D89781	2020-10-20 10:33:58 -07:00
Florian Hahn	2e58010208	[DSE] Do not scan users of memory terminators for further reads. isMemTerminator checks if the current def is a memory terminator that terminates the memory pointed to by DefLoc. We do not have to add any of their users to the worklist, because the follow-on users cannot read the memory in question. This leads to more stores eliminated in the presence of lifetime calls. Previously we added the users of those intrinsics to the worklist, limiting elimination. In terms of removed stores, this gives a nice boost on some benchmarks (MultiSource/SPEC2000/SPEC2006 on X86 with -flto -O3): Same hash: 205 (filtered out) Remaining: 32 Metric: dse.NumFastStores Program base patch diff test-suite...000/197.parser/197.parser.test 4.00 8.00 100.0% test-suite...rolangs-C++/family/family.test 4.00 7.00 75.0% test-suite...marks/7zip/7zip-benchmark.test 1722.00 2189.00 27.1% test-suite...CFP2000/177.mesa/177.mesa.test 30.00 38.00 26.7% test-suite :: External/Nurbs/nurbs.test 44.00 49.00 11.4% test-suite...lications/sqlite3/sqlite3.test 115.00 128.00 11.3% test-suite...006/447.dealII/447.dealII.test 2715.00 3013.00 11.0% test-suite...ProxyApps-C++/CLAMR/CLAMR.test 237.00 261.00 10.1% test-suite...tions/lambda-0.1.3/lambda.test 40.00 44.00 10.0% test-suite...3.xalancbmk/483.xalancbmk.test 1366.00 1475.00 8.0% test-suite...abench/jpeg/jpeg-6a/cjpeg.test 13.00 14.00 7.7% test-suite...oxyApps-C++/miniFE/miniFE.test 43.00 46.00 7.0% test-suite...lications/ClamAV/clamscan.test 230.00 246.00 7.0% test-suite...006/450.soplex/450.soplex.test 284.00 299.00 5.3% test-suite...nsumer-jpeg/consumer-jpeg.test 21.00 22.00 4.8%	2020-10-20 16:55:22 +01:00
Simon Pilgrim	ec228fbfc0	[InstCombine] SimplifyDemandedUseBits - replace dyn_cast<ConstantInt> with m_ConstantInt. NFCI.	2020-10-20 16:45:16 +01:00
Jay Foad	4913e3627c	[AMDGPU] Remove unused declaration. NFC. The implementation of this method was removed in D89706.	2020-10-20 16:31:42 +01:00
Simon Pilgrim	ce13549761	[InstCombine] foldOrOfICmps - use m_Specific instead of explicit comparisons. NFCI.	2020-10-20 16:26:41 +01:00
Michael Liao	2a0e4d1c01	[amdgpu] Enhance AMDGPU AA. - In general, a generic point may alias to pointers in all other address spaces. However, for certain cases enforced by the programming model, we may found a generic point won't alias to pointers to local objects. * When a generic pointer is loaded from the constant address space, it could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it won't alias to pointers to the PRIVATE or LOCAL address space. * When a generic pointer is passed as a kernel argument, it also could only be a pointer to the GLOBAL or CONSTANT address space. Thus, it also won't alias to pointers to the PRIVATE or LOCAL address space. Differential Revision: https://reviews.llvm.org/D89525	2020-10-20 09:54:12 -04:00
Florian Hahn	6439fde6d4	[DSE] Bail out from getLocForWriteEx if call is not argmemonly/inacc_mem. This change should currently not have any impact, but guard against further inconsistencies between MemoryLocation and function attributes.	2020-10-20 14:37:53 +01:00
Georgii Rymar	6487ffafd1	Reland "[yaml2obj][ELF] - Simplify the code that performs sections validation." This reverts commit `1b589f4d4d` and relands the D89463 with the fix: update `MappingTraits<FileFilter>::validate()` in ClangTidyOptions.cpp to match the new signature (change the return type to "std::string" from "StringRef"). Original commit message: This: Changes the return type of MappingTraits<T>>::validate to std::string instead of StringRef. It allows to create more complex error messages. It introduces std::vector<std::pair<StringRef, bool>> getEntries(): a new virtual method of Section, which is the base class for all sections. It returns names of special section specific keys (e.g. "Entries") and flags that says if them exist in a YAML. The code in validate() uses this list of entries descriptions to generalize validation. This approach was discussed in the D89039 thread. Differential revision: https://reviews.llvm.org/D89463	2020-10-20 16:25:33 +03:00
Sanjay Patel	7c516504a1	[InstSimplify] allow vector splats for icmp-of-neg folds	2020-10-20 09:24:36 -04:00
Simon Pilgrim	e372a5f86f	[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support Reapplied rGa704d8238c86 with a check for integer/integervector types to prevent matching with pointer types	2020-10-20 14:14:26 +01:00
Georgii Rymar	1b589f4d4d	Revert "[yaml2obj][ELF] - Simplify the code that performs sections validation." This reverts commit `b9e2b59680`.	2020-10-20 15:16:56 +03:00
Nicolai Hähnle	c0cdd22c72	Introduce CfgTraits abstraction The CfgTraits abstraction simplfies writing algorithms that are generic over the type of CFG, and enables writing such algorithms as regular non-template code that operates on opaque references to CFG blocks and values. Implementations of CfgTraits provide operations on the concrete CFG types, e.g. `IrCfgTraits::BlockRef` is `BasicBlock `. CfgInterface is an abstract base class which provides operations on opaque types CfgBlockRef and CfgValueRef. Those opaque types encapsulate a `void `, but the meaning depends on the concrete CFG type. For example, MachineCfgTraits -- for use with MachineIR in SSA form -- encodes a Register inside CfgValueRef. Converting between concrete references and opaque/generic ones is done by CfgTraits::{fromGeneric,toGeneric}. Convenience methods CfgTraits::{un}wrap{Iterator,Range} are available as well. Writing algorithms in terms of CfgInterface adds some overhead (virtual method calls, plus in same cases it removes the opportunity to inline iterators), but can be much more convenient since generic algorithms can be written as non-templates. This patch adds implementations of CfgTraits for all CFGs on which dominator trees are calculated, so that the dominator tree can be ported to this machinery. Only IrCfgTraits (LLVM IR) and MachineCfgTraits (Machine IR in SSA form) are complete, the other implementations are limited to the absolute minimum required to make the upcoming dominator tree changes work. v5: - fix MachineCfgTraits::blockdef_iterator and allow it to iterate over the instructions in a bundle - use MachineBasicBlock::printName v6: - implement predecessors/successors for all CfgTraits implementations - fix error in unwrapRange - rename toGeneric/fromGeneric into wrapRef/unwrapRef to have naming that is consistent with {wrap,unwrap}{Iterator,Range} - use getVRegDef instead of getUniqueVRegDef v7: - std::forward fix in wrapping_iterator - fix typos v8: - cleanup operators on CfgOpaqueType - address other review comments Change-Id: Ia75f4f268fded33fca11218a7d578c9aec1f3f4d Differential Revision: https://reviews.llvm.org/D83088	2020-10-20 13:50:52 +02:00
Simon Pilgrim	e346ea9905	[InstCombine] SimplifyDemandedUseBits - pass APInt by const reference. NFCI.	2020-10-20 12:13:08 +01:00
Carl Ritson	be2afbd019	[AMDGPU] Remove fix up operand from SI_ELSE Remove immediate operand from SI_ELSE which indicates if EXEC has been modified. Instead always emit code that handles EXEC and remove unnecessary instructions during pre-RA optimisation. This facilitates passes (i.e. SIWholeQuadMode) adding exec mask manipulation post control flow lowering, and pre control flow lower passes do not need to be aware of SI_ELSE handling. Reviewed By: nhaehnle Differential Revision: https://reviews.llvm.org/D89644	2020-10-20 19:15:21 +09:00
Georgii Rymar	3be2b0d1a1	[yaml2obj][NFCI] - Address post commit comments for "[yaml2obj][ELF] - Simplify the code that performs sections validation." This addresses post commit comments for D89463.	2020-10-20 12:51:19 +03:00
Georgii Rymar	b9e2b59680	[yaml2obj][ELF] - Simplify the code that performs sections validation. This: 1) Changes the return type of `MappingTraits<T>>::validate` to `std::string` instead of `StringRef`. It allows to create more complex error messages. 2) It introduces std::vector<std::pair<StringRef, bool>> getEntries(): a new virtual method of Section, which is the base class for all sections. It returns names of special section specific keys (e.g. "Entries") and flags that says if them exist in a YAML. The code in validate() uses this list of entries descriptions to generalize validation. This approach was discussed in the D89039 thread. Differential revision: https://reviews.llvm.org/D89463	2020-10-20 11:28:23 +03:00
Carl Ritson	6aabbeadae	[AMDGPU][NFC] Tidy SIOptimizeExecMaskingPreRA for extensibility Remove duplicate code and move things around to make it easier to add additional optimisations to the pass. Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D89619	2020-10-20 17:22:43 +09:00
Ulrich Weigand	c299f3555d	[SystemZ] Fix disassembler crashes The "Size" value returned by SystemZDisassembler::getInstruction is used by common code even in the case where the routine returns failure. If that Size value exceeds the number of bytes remaining in the section, that could cause disassembler crashes. Fixed by never returning more than the number of bytes remaining.	2020-10-20 10:21:42 +02:00
Evgeny Leviant	991e86156c	[ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89460	2020-10-20 11:14:21 +03:00
David Green	6dcbc323fd	Revert "[ARM][LowOverheadLoops] Adjust Start insertion." This reverts commit `38f625d0d1`. This commit contains some holes in its logic and has been causing issues since it was commited. The idea sounds OK but some cases were not handled correctly. Instead of trying to fix that up later it is probably simpler to revert it and work to reimplement it in a more reliable way.	2020-10-20 08:55:21 +01:00
Atmn Patel	595c615606	[IR] Adds mustprogress as a LLVM IR attribute This adds the LLVM IR attribute `mustprogress` as defined in LangRef through D86233. This attribute will be applied to functions with in languages like C++ where forward progress is guaranteed. Functions without this attribute are not required to make progress. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D85393	2020-10-20 03:09:57 -04:00
Luqman Aden	51892a42da	[COFF][ARM] Fix CodeView for Windows on 32bit ARM targets. Create the LLVM / CodeView register mappings for the 32-bit ARM Window targets. Reviewed By: compnerd Differential Revision: https://reviews.llvm.org/D89622	2020-10-19 22:16:16 -07:00
Jonas Devlieghere	f44fb13025	[FileCollector] Move interface into FileCollectorBase (NFC) For the reproducers in LLDB we want to switch to an "immediate mode" FileCollector that writes every file encountered straight to disk so we can generate the actual mapping out-of-process. This patch moves the interface into a separate base class. Differential revision: https://reviews.llvm.org/D89742	2020-10-19 21:37:20 -07:00
Max Kazantsev	a10a64e7e3	[SCEV] Recommit "Use nw flag and symbolic iteration count to sharpen ranges of AddRecs", attempt 2 Fixed wrapping range case & proof methods reduced to constant range checks to save compile time. Differential Revision: https://reviews.llvm.org/D89381	2020-10-20 11:32:36 +07:00
Serguei Katkov	38799975ce	[IRCE] Do not transform if loop has small number of iterations IRCE has some overhead for runtime checks and in case number of iteration is small the overhead can kill the benefit from optimizations. This CL bases on BlockFrequencyInfo of pre-header and header to estimate the number of loop iterations. If it is less than irce-min-estimated-iters we do not transform the loop. Probably it is better to make more complex cost model but for simplicity it seems the be enough. The usage of BFI is added only for new pass manager and tries to use it efficiently. Reviewers: ebrevnov, dantrushin, asbirlea, mkazantsev Reviewed By: mkazantsev Subscribers: llvm-commits, fhahn Differential Revision: https://reviews.llvm.org/D89541	2020-10-20 10:33:59 +07:00
Qiu Chaofan	1b2fe71ecf	[DAGCombiner] Tighten reasscociation of visitFMA From LangRef, FMF contract should not enable reassociating to form arbitrary contractions. So it should not help rearrange nodes like (fma (fmul x, c1), c2, y) into (fma x, c1*c2, y). Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89527	2020-10-20 10:13:01 +08:00
Volodymyr Sapsai	a28678e20a	Revert "Reland "[Modules] Add stats to measure performance of building and loading modules."" This reverts commit `4000c9ee18`. Test "LLVM :: Other/statistic.ll" is failing on Windows.	2020-10-19 18:27:30 -07:00
Wang, Pengfei	3a85472af2	[X86] Fix assert fail when element type is i1. extract_vector_elt will turn type vxi1 into i8, which triggers the assertion fail. Since we don't really handle vxi1 cases in below code, we can just return from here. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89096	2020-10-20 09:26:32 +08:00
Volodymyr Sapsai	4000c9ee18	Reland "[Modules] Add stats to measure performance of building and loading modules." Measure amount of high-level or fixed-cost operations performed during building/loading modules and during header search. High-level operations like building a module or processing a .pcm file are motivated by previous issues where clang was re-building modules or re-reading .pcm files unnecessarily. Fixed-cost operations like `stat` calls are tracked because clang cannot change how long each operation takes but it can perform fewer of such operations to improve the compile time. Also tracking such stats over time can help us detect compile-time regressions. Added stats are more stable than the actual measured compilation time, so expect the detected regressions to be less noisy. On relanding drop stats in MemoryBuffer.cpp as their value is pretty low but affects a lot of clients and many of those aren't interested in modules and header search. rdar://problem/55715134 Reviewed By: aprantl, bruno Differential Revision: https://reviews.llvm.org/D86895	2020-10-19 15:44:11 -07:00
Stanislav Mekhanoshin	6ddadf9901	[AMDGPU] flat scratch ST addressing mode on gfx10 GFX10 enables third addressing mode for flat scratch instructions, an ST mode. In that mode both register operands are omitted and only swizzled offset is used in addition to flat_scratch base. Differential Revision: https://reviews.llvm.org/D89501	2020-10-19 15:29:52 -07:00
Jordan Rupprecht	8a377f1e3c	[NFC] Inline assertion-only variable	2020-10-19 15:11:37 -07:00
Sergei Trofimovich	1eb812e06d	[VE] Fix initializer visibility Before the change attempt to link libLTO.so against shared LLVM library failed as: ``` [ 76%] Linking CXX shared library ../../lib/libLTO.so ... /usr/bin/cmake -E cmake_link_script CMakeFiles/LTO.dir/link.txt --verbose=1 c++ -o ...libLTO.so.12git ...ibLLVM-12git.so ld: CMakeFiles/LTO.dir/lto.cpp.o: in function `llvm::InitializeAllTargetInfos()': include/llvm/Config/Targets.def:31: undefined reference to `LLVMInitializeVETargetInfo' ``` It happens because on linux llvm build system sets default symbol visibility to "hidden". The fix is to set visibility back to "default" for exported APIs with LLVM_EXTERNAL_VISIBILITY. Bug: https://bugs.llvm.org/show_bug.cgi?id=47847 Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89633	2020-10-19 22:54:41 +01:00
Amy Huang	ea693a1627	[NPM] Port module-debuginfo pass to the new pass manager Port pass to NPM and update tests in DebugInfo/Generic. Differential Revision: https://reviews.llvm.org/D89730	2020-10-19 14:31:17 -07:00
Roman Lebedev	e0567582b8	[NFCI][SCEV] Always refer to enum SCEVTypes as enum, not integer The main tricky thing here is forward-declaring the enum: we have to specify it's underlying data type. In particular, this avoids the danger of switching over the SCEVTypes, but actually switching over an integer, and not being notified when some case is not handled. I have updated most of such switches to be exaustive and not have a default case, where it's pretty obvious to be the intent, however not all of them.	2020-10-20 00:10:22 +03:00
Roman Lebedev	d4b0aa9773	[NFC][SCEV] BuildConstantFromSCEV(): reformat, NFC Makes diff in next commit more readable	2020-10-20 00:10:22 +03:00
Roman Lebedev	3355284b2d	[NFC][SCEVExpander] isHighCostExpansionHelper(): rewrite as a switch If we switch over an enum, compiler can easily issue a diagnostic if some case is not handled. However with an if cascade that isn't so. Experimental evidence suggests new behavior to be superior.	2020-10-20 00:10:22 +03:00
Evgenii Stepanov	188a7d6710	Add alloca size threshold for StackTagging initializer merging. Summary: Initializer merging generates pretty inefficient code for large allocas that also happens to trigger an exponential algorithm somewhere in Machine Instruction Scheduler. See https://bugs.llvm.org/show_bug.cgi?id=47867. This change adds an upper limit for the alloca size. The default limit is selected such that worst case size of memtag-generated code is similar to non-memtag (but because of the ISA quirks, this case is realized at the different value of alloca size, ex. memset inlining triggers at sizes below 512, but stack tagging instructions are 2x shorter, so limit is approx. 256). We could try harder to emit more compact code with initializer merging, but that would only affect large, sparsely initialized allocas, and those are doing fine already. Reviewers: vitalybuka, pcc Subscribers: llvm-commits	2020-10-19 13:44:07 -07:00
Craig Topper	edd0cb11bd	[SelectionDAG][X86] Enable SimplifySetCC CTPOP transforms for vector splats This enables these transforms for vectors: (ctpop x) u< 2 -> (x & x-1) == 0 (ctpop x) u> 1 -> (x & x-1) != 0 (ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0) (ctpop x) != 1 --> (x == 0) \|\| ((x & x-1) != 0) All enabled if CTPOP isn't Legal. This differs from the scalar behavior where the first two are done unconditionally and the last two are done if CTPOP isn't Legal or Custom. The Legal check produced better results for vectors based on X86's custom handling. Might be worth re-visiting scalars here. I disabled the looking through truncate for vectors. The code that creates new setcc can use the same result VT as the original setcc even if we truncated the input. That may work work for most scalars, but definitely wouldn't work for vectors unless it was a vector of i1. Fixes or at least improves PR47825 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89346	2020-10-19 12:56:59 -07:00
Craig Topper	e28376ec28	[X86] Add i32->float and i64->double bitcast pseudo instructions to store folding table. We have pseudo instructions we use for bitcasts between these types. We have them in the load folding table, but not the store folding table. This adds them there so they can be used for stack spills. I added an exact size check so that we don't fold when the stack slot is larger than the GPR. Otherwise the upper bits in the stack slot would be garbage. That would be fine for Eli's test case in PR47874, but I'm not sure its safe in general. A step towards fixing PR47874. Next steps are to change the ADDSSrr_Int pseudo instructions to use FR32 as the second source register class instead of VR128. That will keep the coalescer from promoting the register class of the bitcast instruction which will make the stack slot 4 bytes instead of 16 bytes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D89656	2020-10-19 12:53:14 -07:00
Lang Hames	9898d9d885	[ORC] Fix a missing include.	2020-10-19 12:13:55 -07:00
Tony	6be9c7d2dc	[AMDGPU] Correct comment typo in SIMemoryLegaliizer.cpp	2020-10-19 18:50:28 +00:00
Jay Foad	56f6bf1a8d	[AMDGPU] Remove MUL_LOHI_U24/MUL_LOHI_I24 These were introduced in r279902 on the grounds that using separate MUL_U24/MUL_I24 and MULHI_U24/MULHI_I24 nodes would introduce multiple uses of the operands, which would prevent SimplifyDemandedBits from simplifying the operands. This has since been fixed by D24672 "AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operations" No functional change intended. At least it has no effect on lit tests. Differential Revision: https://reviews.llvm.org/D89706	2020-10-19 19:15:34 +01:00
Amy Kwan	6a946fd06f	[DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook, Use isOperationLegalOrCustom directly instead. MULH is often expanded on targets. This patch removes the isMulhCheaperThanMulShift hook and uses isOperationLegalOrCustom instead. Differential Revision: https://reviews.llvm.org/D80485	2020-10-19 12:23:04 -05:00
Tony	151e297034	[AMDGPU] Simplify cumode handling in SIMemoryLegalizer Differential Revision: https://reviews.llvm.org/D89663	2020-10-19 17:13:45 +00:00
Mircea Trofin	225065b9a8	[NFC][MC] Type [MC]Register uses in MachineTraceMetrics Differential Revision: https://reviews.llvm.org/D89710	2020-10-19 09:49:52 -07:00
Lang Hames	c89447b659	[ORC] Fix unused variable warning.	2020-10-19 09:06:33 -07:00
Simon Pilgrim	adb52e5f9e	[InstCombine] foldOrOfICmps - only fold (icmp_eq B, 0) \| (icmp_ult/gt A, B) for integer types Fixes a number of stage2 buildbots that were failing when I generalized the m_ConstantInt() logic - that didn't match for pointer types but m_Zero() does......	2020-10-19 17:05:38 +01:00
Mircea Trofin	d454328ea8	[ML] Add final reward logging facility. Allow logging final rewards. A final reward is logged only once, and is serialized as all-zero values, except for the last one. Differential Revision: https://reviews.llvm.org/D89626	2020-10-19 08:44:50 -07:00
Simon Pilgrim	482e6f0041	Revert rGa704d8238c86bac: "[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support" This reverts commit `a704d8238c`. Causing stage2 build failures on some bots.	2020-10-19 16:03:36 +01:00
Simon Pilgrim	de885f1b2a	[InstCombine] Add (icmp ne A, 0) \| (icmp ne B, 0) --> (icmp ne (A\|B), 0) vector support Scalar cases were already being handled by foldLogOpOfMaskedICmps (so this was dead code), but refactoring to support non-uniform vectors will take some time, so tweak this fold in the meantime.	2020-10-19 15:41:21 +01:00
Paul C. Anagnostopoulos	2871c6c93f	[Aarch64] [TableGen] Clean up !if(!eq(boolean, 1) and related booleans. Differential Revision: https://reviews.llvm.org/D89551	2020-10-19 10:33:55 -04:00
Simon Pilgrim	ecd25086d1	[InstCombine] Add (icmp eq B, 0) \| (icmp ult/gt A, B) -> (icmp ule A, B-1) vector support	2020-10-19 15:23:48 +01:00
Simon Pilgrim	a704d8238c	[InstCombine] Add or((icmp ult/ule (A + C1), C3), (icmp ult/ule (A + C2), C3)) uniform vector support	2020-10-19 14:55:18 +01:00
Simon Pilgrim	1d90e53044	[InstCombine] foldOrOfICmps - pull out repeated getOperand() calls. NFCI.	2020-10-19 14:28:08 +01:00
Paul C. Anagnostopoulos	dc5d6632b0	[TableGen] Enhance !empty and !size to handle strings and DAGs. Fix bug in the type checking for !empty, !head, !size, !tail.	2020-10-19 09:22:20 -04:00
Piotr Sobczak	c872faf6e0	[AMDGPU] Do not generate S_CMP_LG_U64 on gfx7 S_CMP_LG_U64 was added in gfx8 and is guarded by hasScalarCompareEq64(). Rewrite S_CMP_LG_U64 to S_OR_B32 + S_CMP_LG_U32 for targets that do not support 64-bit scalar compare. Differential Revision: https://reviews.llvm.org/D89536	2020-10-19 14:44:31 +02:00
Kazushi (Jam) Marukawa	6bb60d3e26	[VE] Add setcc for fp128 Add setcc for fp128 and clean existing ISel patterns. Also add a regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89683	2020-10-19 21:36:57 +09:00
Kazushi (Jam) Marukawa	fb2bb6fad4	[VE] Add cast to/from fp128 patterns Add cast to/from fp128 patterns. Clean other cast patterns too. Update a regression test by adding missing tests. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89682	2020-10-19 21:35:27 +09:00
Hans Wennborg	0628bea513	Revert "[PM/CC1] Add -f[no-]split-cold-code CC1 option to toggle splitting" This broke Chromium's PGO build, it seems because hot-cold-splitting got turned on unintentionally. See comment on the code review for repro etc. > This patch adds -f[no-]split-cold-code CC1 options to clang. This allows > the splitting pass to be toggled on/off. The current method of passing > `-mllvm -hot-cold-split=true` to clang isn't ideal as it may not compose > correctly (say, with `-O0` or `-Oz`). > > To implement the -fsplit-cold-code option, an attribute is applied to > functions to indicate that they may be considered for splitting. This > removes some complexity from the old/new PM pipeline builders, and > behaves as expected when LTO is enabled. > > Co-authored by: Saleem Abdulrasool <compnerd@compnerd.org> > Differential Revision: https://reviews.llvm.org/D57265 > Reviewed By: Aditya Kumar, Vedant Kumar > Reviewers: Teresa Johnson, Aditya Kumar, Fedor Sergeev, Philip Pfaffe, Vedant Kumar This reverts commit `273c299d5d`.	2020-10-19 12:31:14 +02:00
Simon Pilgrim	0b7b446a40	[InstCombine] Support vectors-with-undef in and(logicalshift(1,X),1) --> zext(X == 0) fold	2020-10-19 11:10:32 +01:00
Kazushi (Jam) Marukawa	8796746b2a	[VE] Support select_cc Add missing ISel patterns related to select_cc DAG nodes. Add regression test of all combination of possible scalar types. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89672	2020-10-19 18:54:25 +09:00
Kazushi (Jam) Marukawa	f2fd42098c	[VE] Add VBRD/VMV instructions Add VBRD/VMV vector instructions. In order to do that, also support VM512 registers and RV instruction format in MC layer. Also add regression tests for new instructions. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89641	2020-10-19 18:33:54 +09:00
Kazushi (Jam) Marukawa	7a09aec804	[VE] Add LSV/LVS/LVM/SVM instructions Add LSV/LVS/LVM/SVM vector instructions and regression tests. Also update AsmParser to support new format of operands. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89499	2020-10-19 18:32:48 +09:00
Kazushi (Jam) Marukawa	25955cbae4	[VE] Support br_cc comparing fp128 Support br_cc instruction comparing fp128 values. Add a br_cc.ll regression test for all kind of br_cc instructions. And, clean existing branch regression tests, this time. Clean a brcond.ll regression test for brcond instruction. Remove mixed branch1.ll regression test. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89627	2020-10-19 18:29:39 +09:00
Kazushi (Jam) Marukawa	af8b444de3	[VE] Update ISel patterns for select instruction Add an ISel pattern for fp128 select instruction and optimize generated code for other types' select. instructions. Add a regression test also. Reviewed By: simoll Differential Revision: https://reviews.llvm.org/D89509	2020-10-19 18:28:21 +09:00
Lang Hames	f35707047e	[ORC] Break up C-API header Orc.h, and add JITEventListener support. This patch breaks Orc.h up into Orc.h, LLJIT.h and OrcEE.h. Orc.h contain core Orc utilities. LLJIT.h contains LLJIT specific types and functions. OrcEE.h contains types and functions that depend on ExecutionEngine. The intent is that these headers should match future library divisions: Clients who only use Orc.h should only need to link againt the Orc core libraries, clients using LLJIT.h will also need to link against LLVM core, and clients using OrcEE.h will also have to link against ExecutionEngine. In addition to breaking up the Orc.h header this patch introduces functions to: (1) Set the object linking layer creation function on LLJITBuilder. (2) Create an RTDyldObjectLinkingLayer instance (particularly for use in (1)). (3) Register JITEventListeners with an RTDyldObjectLinkingLayer. Together (1), (2) and (3) can be used to force use of RTDyldObjectLinkingLayer as the underlying JIT linker for LLJIT, rather than the platform default, and to register event listeners with the RTDyldObjectLinkingLayer.	2020-10-19 01:59:04 -07:00
Lang Hames	00369849e1	[ORC] Add function to get pool entry string. Patch by Andres Freund. Thanks Andres!	2020-10-19 01:59:04 -07:00
Lang Hames	24afffe63a	[ORC] Add C API support for defining absolute symbols. Also tweaks the definition of TryToGenerate to make it dovetail more neatly with the new function.	2020-10-19 01:59:04 -07:00
Lang Hames	b6ca0c7dd5	[ORC] Add support for custom generators to the C bindings. C API clients can now define a custom definition generator by providing a callback function (to implement DefinitionGenerator::tryToGenerate) and context object. All arguments for the DefinitionGenerator::tryToGenerate method have been given C API counterparts, and the API allows for optionally asynchronous generation.	2020-10-19 01:59:04 -07:00
Lang Hames	19402ce79a	[Support] Add a C-API function to create a StringError instance. This will allow C API clients to return errors from callbacks. This functionality will be used in upcoming Orc C-bindings functions.	2020-10-19 01:59:04 -07:00
Lang Hames	91d1f417fd	[ORC] Add basic ResourceTracker support to the OrcV2 C Bindings. Based on a patch by Andres Freund. Thanks Andres!	2020-10-19 01:59:04 -07:00
Lang Hames	49c065ae70	[ORC] Rename LLVMOrcJITDylibDefinitionGeneratorRef. The DefinitionGenerator class has been moved out of JITDylib. This updates the C API type and function names to reflect that.	2020-10-19 01:59:04 -07:00
Lang Hames	40f3fb52f7	[ORC] Fix C API function name. Patch by Andres Freund. Thanks Andres!	2020-10-19 01:59:03 -07:00
Lang Hames	35e48d7b91	[ORC] Add C API functions to obtain and clear the symbol string pool. Symbol string pool entries are ref counted, but not automatically cleared. This can cause the size of the pool to grow without bound if it's not periodically cleared. These functions allow that to be done via the C API.	2020-10-19 01:59:03 -07:00
Lang Hames	14cb9b4e21	[ORC] Add a C API function to set the ExecutionSession error reporter.	2020-10-19 01:59:03 -07:00
Lang Hames	c88d9eae8a	[ORC] Fix a memory leak in the OrcV2 C API (and some comment typos). The LLVMOrcLLJITAddLLVMIRModule function was leaking its LLVMOrcThreadSafeModuleRef argument. Wrapping the argument in a unique_ptr fixes this.	2020-10-19 01:59:03 -07:00
Lang Hames	069919c9ba	[ORC] Update Symbol Lookup / DefinitionGenerator system. This patch moves definition generation out from the session lock, instead running it under a per-dylib generator lock. It also makes the DefinitionGenerator::tryToGenerate method optionally asynchronous: Generators are handed an opaque LookupState object which can be captured to stop/restart the lookup process. The new scheme provides the following benefits and guarantees: (1) Queries that do not need to attempt definition generation (because all requested symbols matched against existing definitions in the JITDylib) can proceed without being blocked by any running definition generators. (2) Definition generators can capture the LookupState to continue their work asynchronously. This allows generators to run for an arbitrary amount of time without blocking a thread. Definition generators that do not need to run asynchronously can return without capturing the LookupState to eliminate unnecessary recursion and improve lookup performance. (3) Definition generators still do not need to worry about concurrency or re-entrance: Since they are still run under a (per-dylib) lock, generators will never be re-entered concurrently, or given overlapping symbol sets to generate. Finally, the new system distinguishes between symbols that are candidates for generation (generation candidates) and symbols that failed to match for a query (due to symbol visibility). This fixes a bug where an unresolved symbol could trigger generation of a duplicate definition for an existing hidden symbol.	2020-10-19 01:59:03 -07:00
Lang Hames	5d2e359ce6	[ORC] Move DefinitionGenerator out of JITDylib. This will make it easier to implement asynchronous definition generators.	2020-10-19 01:59:03 -07:00
Lang Hames	680845ec0d	[ORC] Move MaterializationResponsibility methods to ExecutionSession. MaterializationResponsibility, JITDylib, and ExecutionSession collectively manage the OrcV2 core JIT state. Responsibility for maintaining and updating this state has previously been spread among these classes, resulting in implementations that are each non-trivial, but all tightly coupled. This has in turn made reading the code and reasoning about state update and locking rules difficult. The core state model can be simplified by thinking of MaterializationResponsibility and JITDylib as facets of ExecutionSession. This commit is the first in a series intended to refactor Core.cpp to reflect this model. Operations on MaterializationResponsibility and JITDylib will forward to implementation methods inside ExecutionSession. Raw state will remain with the original classes, but in most cases will only be modified by the ExecutionSession.	2020-10-19 01:59:03 -07:00
Evgeny Leviant	8a7ca143f8	[ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate Differential revision: https://reviews.llvm.org/D89553	2020-10-19 11:37:54 +03:00
Roman Lebedev	d083d55c2c	[NFC][SCEV] Rename SCEVCastExpr into SCEVIntegralCastExpr All existing SCEV cast types operate on integers. D89456 will add SCEVPtrToIntExpr cast expression type. I believe this is best for consistency. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D89455	2020-10-19 10:59:53 +03:00
David Sherwood	3945b69e81	[SVE][CodeGen] Replace more TypeSize comparison operators with their scalar equivalents In certain places in llvm/lib/CodeGen we were relying upon the TypeSize comparison operators when in fact the code was only ever expecting either scalar values or fixed width vectors. This patch changes a few functions that were always expecting to work on scalar or fixed width types: 1. DAGCombiner::mergeTruncStores - deals with scalar integers only. 2. DAGCombiner::ReduceLoadWidth - not valid for vectors. 3. DAGCombiner::createBuildVecShuffle - should only be used for fixed width vectors. 4. SelectionDAGLegalize::ExpandFCOPYSIGN and SelectionDAGLegalize::getSignAsIntValue - only work on scalars. Differential Revision: https://reviews.llvm.org/D88562	2020-10-19 08:38:50 +01:00
David Sherwood	35a531fb45	[SVE][CodeGen][NFC] Replace TypeSize comparison operators with their scalar equivalents In certain places in llvm/lib/CodeGen we were relying upon the TypeSize comparison operators when in fact the code was only ever expecting either scalar values or fixed width vectors. I've changed some of these places to use the equivalent scalar operator. Differential Revision: https://reviews.llvm.org/D88482	2020-10-19 08:30:31 +01:00
David Sherwood	f693f915a0	[SVE][CodeGen] Replace uses of TypeSize comparison operators In certain places in the code we can never end up in a situation where we're mixing fixed width and scalable vector types. For example, we can't have truncations and extends that change the lane count. Also, in other places such as GenWidenVectorStores and GenWidenVectorLoads we know from the behaviour of FindMemType that we can never choose a vector type with a different scalable property. In various places I have used EVT::bitsXY functions instead of TypeSize::isKnownXY, where it probably makes sense to keep an assert that scalable properties match. Differential Revision: https://reviews.llvm.org/D88654	2020-10-19 08:08:41 +01:00
David Sherwood	d67d8f8790	[SVE][AArch64] Replace TypeSize comparisons with their integer equivalents In many places in the AArch64 backend we are comparing TypeSize objects, but in fact we are only ever expecting fixed width types. I've changed all such comparisons to use their integer equivalents by replacing calls to getSizeInBits() with getFixedSizeInBits(), etc. Differential Revision: https://reviews.llvm.org/D89116	2020-10-19 07:41:33 +01:00
Max Kazantsev	199826baa8	[NFC][SCEV] Use getMinusOne where possible	2020-10-19 12:56:09 +07:00
Kai Luo	354d3106c6	[PowerPC] Skip combining (uint_to_fp x) if x is not simple type Current powerpc64le backend hits ``` Combining: t7: f64 = uint_to_fp t6 llc: llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:291: llvm::MVT llvm::EVT::getSimpleVT() const: Assertion `isSimple() && "Expected a SimpleValueType!"' failed. ``` This patch fixes it by skipping combination if `t6` is not simple type. Fixed https://bugs.llvm.org/show_bug.cgi?id=47660. Reviewed By: #powerpc, steven.zhang Differential Revision: https://reviews.llvm.org/D88388	2020-10-19 05:23:46 +00:00
Lang Hames	ad92f16ccc	[ORC][examples] Update Kaleidoscope and BuildingAJIT tutorial series to OrcV2. This patch updates the Kaleidoscope and BuildingAJIT tutorial series (chapter 1-4) to OrcV2. Chapter 5 of the BuildingAJIT series is removed -- it will be re-instated once we have in-tree support for out-of-process JITing. This patch only updates the tutorial code, not the text. Patches welcome for that, otherwise I will try to update it in a few weeks.	2020-10-18 21:03:04 -07:00
Lang Hames	0aec49c853	[ORC] Add support for resource tracking/removal (removable code). This patch introduces new APIs to support resource tracking and removal in Orc. It is intended as a thread-safe generalization of the removeModule concept from OrcV1. Clients can now create ResourceTracker objects (using JITDylib::createResourceTracker) to track resources for each MaterializationUnit (code, data, aliases, absolute symbols, etc.) added to the JIT. Every MaterializationUnit will be associated with a ResourceTracker, and ResourceTrackers can be re-used for multiple MaterializationUnits. Each JITDylib has a default ResourceTracker that will be used for MaterializationUnits added to that JITDylib if no ResourceTracker is explicitly specified. Two operations can be performed on ResourceTrackers: transferTo and remove. The transferTo operation transfers tracking of the resources to a different ResourceTracker object, allowing ResourceTrackers to be merged to reduce administrative overhead (the source tracker is invalidated in the process). The remove operation removes all resources associated with a ResourceTracker, including any symbols defined by MaterializationUnits associated with the tracker, and also invalidates the tracker. These operations are thread safe, and should work regardless of the the state of the MaterializationUnits. In the case of resource transfer any existing resources associated with the source tracker will be transferred to the destination tracker, and all future resources for those units will be automatically associated with the destination tracker. In the case of resource removal all already-allocated resources will be deallocated, any if any program representations associated with the tracker have not been compiled yet they will be destroyed. If any program representations are currently being compiled then they will be prevented from completing: their MaterializationResponsibility will return errors on any attempt to update the JIT state. Clients (usually Layer writers) wishing to track resources can implement the ResourceManager API to receive notifications when ResourceTrackers are transferred or removed. The MaterializationResponsibility::withResourceKeyDo method can be used to create associations between the key for a ResourceTracker and an allocated resource in a thread-safe way. RTDyldObjectLinkingLayer and ObjectLinkingLayer are updated to use the ResourceManager API to enable tracking and removal of memory allocated by the JIT linker. The new JITDylib::clear method can be used to trigger removal of every ResourceTracker associated with the JITDylib (note that this will only remove resources for the JITDylib, it does not run static destructors). This patch includes unit tests showing basic usage. A follow-up patch will update the Kaleidoscope and BuildingAJIT tutorial series to OrcV2 and will use this API to release code associated with anonymous expressions.	2020-10-18 21:02:54 -07:00
Lang Hames	6154c4115c	[ORC] Remove OrcV1 APIs. This removes all legacy layers, legacy utilities, the old Orc C bindings, OrcMCJITReplacement, and OrcMCJITReplacement regression tests. ExecutionEngine and MCJIT are not affected by this change.	2020-10-18 21:02:44 -07:00
Hubert Tong	2980ce98be	Fix various format specifier mismatches Format specifiers of incorrect length are replaced with format specifier macros from `<cinttypes>` matching the typedefs used to declare the type of the value being printed. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D89637	2020-10-18 12:39:15 -04:00
Nikita Popov	6de8d7f1ad	[BasicAA] Accept AATags by const reference (NFC) Rather than swapping the value, the sizes, the AA tags and the underlying objects multiple times, invoke the helper methods with swapped arguments.	2020-10-18 18:19:01 +02:00
Florian Hahn	f5cf7f544b	[DSE] Do not consider 'noop' intrinsics as read-clobbers. isNoopIntrinsic returns true for some intrinsics that are modeled in MemorySSA but do not actually read or write any memory and do not block DSE. Such intrinsics should not be considered as read-clobbers.	2020-10-18 15:51:05 +01:00
Nikita Popov	f9172d3c7b	[AA] Add helper to update result (NFC) This pattern was repeated a few times, and for some reason always using insert or try_emplace, even though we know in advance that we're looking for an existing entry and not trying to create a new one.	2020-10-18 16:43:26 +02:00
Fangrui Song	e7813a930a	Delete unneeded X86RegisterInfo::hasReservedSpillSlot. NFC Only PowerPC and RISCV need to override it.	2020-10-17 23:18:55 -07:00
Craig Topper	5d69eecc53	[X86] Mark the Key Locker instructions as NotMemoryFoldable to make the X86FoldTablesEmitter not crash. loadiwkey and aesenc128kl share the same opcode but one is memory and one is register. But they're behavior is quite different. We were crashing because one has an output register and one doesn't and the backend couldn't account for that. But since they aren't foldable we can just add NotMemoryFoldable so they won't be looked at.	2020-10-17 18:02:54 -07:00
Dávid Bolvanský	65e94cc946	[InferAttrs] Add argmemonly attribute to string libcalls Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89602	2020-10-18 01:33:26 +02:00
Nikita Popov	9d2b8300b7	[BasicAA] Avoid alias query if result cannot be used (NFCI) Rather then querying first and then checking additional conditions, check the conditions first. They are much cheaper than the alias query.	2020-10-18 00:00:15 +02:00
Nikita Popov	3c6fe0fc77	[BasicAA] Fix stale comment (NFC) DataLayout is always around...	2020-10-17 23:58:58 +02:00
Dávid Bolvanský	2a75e956e5	Revert "[InferAttrs] Add argmemonly attribute to string libcalls" This reverts commit `b77dd32a6f`. Sanitizer tests are broken.	2020-10-17 23:29:02 +02:00
Dávid Bolvanský	b77dd32a6f	[InferAttrs] Add argmemonly attribute to string libcalls Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89602	2020-10-17 22:42:36 +02:00
Roman Lebedev	ec54867df5	[SCEV] Model `ashr exact x, C` as `(abs(x) EXACT/u (1<<C)) * signum(x)` It's not pretty, but probably better than modelling it as an opaque SCEVUnknown, i guess. It is relevant e.g. for the loop that was brought up in https://bugs.llvm.org/show_bug.cgi?id=46786#c26 as an example of what we'd be able to better analyze once SCEV handles `ptrtoint` (D89456). But as it is evident, even if we deal with `ptrtoint` there, we also fail to model such an `ashr`. Also, modeling of mul-of-exact-shr/div could use improvement. As per alive2: https://alive2.llvm.org/ce/z/tnfZKd ``` define i8 @src(i8 %0) { %2 = ashr exact i8 %0, 4 ret i8 %2 } declare i8 @llvm.abs(i8, i1) declare i8 @llvm.smin(i8, i8) declare i8 @llvm.smax(i8, i8) define i8 @tgt(i8 %x) { %abs_x = call i8 @llvm.abs(i8 %x, i1 false) %div = udiv exact i8 %abs_x, 16 %t0 = call i8 @llvm.smax(i8 %x, i8 -1) %t1 = call i8 @llvm.smin(i8 %t0, i8 1) %r = mul nsw i8 %div, %t1 ret i8 %r } ``` Transformation seems to be correct!	2020-10-17 21:22:24 +03:00
Roman Lebedev	130cc662b5	[NFC][SCEV] Refactor getAbsExpr() out of createSCEV()	2020-10-17 21:21:02 +03:00
Roman Lebedev	be1678bdb9	[NFC][SCEV] Add 'getMinusOne()' method	2020-10-17 21:20:58 +03:00
Sanjay Patel	53e92b4c0e	[InstCombine] (~A & B) ^ A -> A \| B Differential Revision: https://reviews.llvm.org/D86395	2020-10-17 12:20:18 -04:00
Nikita Popov	50cc9a0e61	[MemCpyOpt] Extract common function for unwinding check These two cases should be using the same logic. Not NFC, as this resolves the TODO regarding use of the underlying object.	2020-10-17 15:30:39 +02:00
Pedro Tammela	60b19424bb	[NFC] fix some typos in LoopUnrollPass This patch fixes a couple of typos in the LoopUnrollPass.cpp comments Differential Revision: https://reviews.llvm.org/D89603	2020-10-17 14:20:55 +01:00
David Green	b93d74ac9c	[ARM] Basic getArithmeticReductionCost reduction costs This adds some basic costs for MVE reductions - currently just costing the simple legal add vectors as a single MVE instruction. More complex costing can be added in the future when the framework more readily allows it. Differential Revision: https://reviews.llvm.org/D88980	2020-10-17 10:29:00 +01:00
David Green	d79ee3a807	[ARM] Add a very basic active_lane_mask cost This adds a very basic cost for active_lane_mask under MVE - making the assumption that they will be free and then apologizing for that in a comment. In reality they may either be free (by being nicely folded into a tail predicated loop), cost the same as a VCTP or be expanded into vdup's, adds and cmp's. It is difficult to detect the difference from a single getIntrinsicInstrCost call, so makes the assumption that the vectorizer is adding them, and only added them where it makes sense. We may need to change this in the future to better model predicate costs in the vectorizer, especially at -Os or non-tail predicated loops. The vectorizer currently does not query the cost of these instructions but that will change in the future and a zero cost there probably makes the most sense at the moment. Differential Revision: https://reviews.llvm.org/D88989	2020-10-17 10:09:42 +01:00
Juneyoung Lee	62a0ec1612	Add support for !noundef metatdata on loads This patch adds metadata !noundef and makes load instructions can optionally have it. A load with !noundef always return a well-defined value (has no undef bit or isn't poison). If the loaded value isn't well defined, the behavior is undefined. This metadata can be used to encode the assumption from C/C++ that certain reads of variables should have well-defined values. It is helpful for optimizing freeze instructions away, because freeze can be removed when its operand has well-defined value, and showing that a load from arbitrary location is well-defined is usually hard otherwise. The same information can be encoded with llvm.assume with operand bundle; using metadata is chosen because I wasn't sure whether code motion can be freely done when llvm.assume is inserted from clang instead. The existing codebase already is stripping unknown metadata when doing code motion, so using metadata is UB-safe as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D89050	2020-10-17 13:50:10 +09:00
Alok Kumar Sharma	0538353b3b	[DebugInfo] Support for DWARF operator DW_OP_over LLVM rejects DWARF operator DW_OP_over. This DWARF operator is needed for Flang to support assumed rank array. Summary: Currently LLVM rejects DWARF operator DW_OP_over. Below error is produced when llvm finds this operator. [..] invalid expression !DIExpression(151, 20, 16, 48, 30, 35, 80, 34, 6) warning: ignoring invalid debug info in over.ll [..] There were some parts missing in support of this operator, which are now completed. Testing -added a unit testcase -check-debuginfo -check-llvm Reviewed By: aprantl Differential Revision: https://reviews.llvm.org/D89208	2020-10-17 08:42:28 +05:30
Craig Topper	278bd06891	[TargetLowering] Extract simplifySetCCs ctpop into a separate function. NFCI As requested in D89346. This allows us to add some early outs. I reordered some checks a little bit to make the more common bail outs happen earlier. Like checking opcode before checking hasOneUse. And I moved the bit width check to make sure it was safe to look through a truncate to the spot where we look through truncates instead of after. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D89494	2020-10-16 19:47:56 -07:00
Alina Sbirlea	dc97138123	[MemorySSA] Verify clobbering within reachable blocks. Resolves PR45976.	2020-10-16 17:46:28 -07:00
Amara Emerson	4ad459997e	[AArch64][GlobalISel] Select csinc if a select has a 1 on RHS. Differential Revision: https://reviews.llvm.org/D89513	2020-10-16 16:49:52 -07:00
Albion Fung	d30155feaa	[PowerPC] Implementation of 128-bit Binary Vector Rotate builtins This patch implements 128-bit Binary Vector Rotate builtins for PowerPC10. Differential Revision: https://reviews.llvm.org/D86819	2020-10-16 18:03:22 -04:00
Jameson Nash	4242df1470	Revert "make the AsmPrinterHandler array public" I messed up one of the tests.	2020-10-16 17:22:07 -04:00
Jameson Nash	ac2def2d8d	make the AsmPrinterHandler array public This lets external consumers customize the output, similar to how AssemblyAnnotationWriter lets the caller define callbacks when printing IR. The array of handlers already existed, this just cleans up the code so that it can be exposed publically. Differential Revision: https://reviews.llvm.org/D74158	2020-10-16 16:27:31 -04:00
Artem Belevich	c36c0fabd1	[VectorCombine] Avoid crossing address space boundaries. We can not bitcast pointers across different address spaces, and VectorCombine should be careful when it attempts to find the original source of the loaded data. Differential Revision: https://reviews.llvm.org/D89577	2020-10-16 13:19:31 -07:00
Stanislav Mekhanoshin	874524ab88	[AMDGPU] Drop array size in AMDGCNGPUs and R600GPUs Differential Revision: https://reviews.llvm.org/D89568	2020-10-16 12:37:22 -07:00
Nikita Popov	74c8c2d903	Revert "Recommit "[SCEV] Use nw flag and symbolic iteration count to sharpen ranges of AddRecs"" This reverts commit `32b72c3165`. While better than before, this change still introduces a large compile-time regression (>3% on mafft): https://llvm-compile-time-tracker.com/compare.php?from=fbd62fe60fb2281ca33da35dc25ca3c87ec0bb51&to=32b72c3165bf65cca2e8e6197b59eb4c4b60392a&stat=instructions Additionally, the logic here doesn't look quite right to me, I will comment in more detail on the differential revision.	2020-10-16 21:36:33 +02:00
Arthur Eubanks	faf5210420	[CGSCC] Add -abort-on-max-devirt-iterations-reached option Aborts if we hit the max devirtualization iteration. Will be useful for testing that changes to devirtualization don't cause devirtualization to repeat passes more times than necessary. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D89519	2020-10-16 12:34:52 -07:00
Austin Kerbow	978fbd8268	[AMDGPU] Run hazard recognizer pass later If instructions were removed in peephole passes after the hazard recognizer was run it is possible that new hazards could be introduced. Fixes: SWDEV-253090 Reviewed By: rampitec, arsenm Differential Revision: https://reviews.llvm.org/D89077	2020-10-16 12:15:51 -07:00
Amara Emerson	39c05a1a71	[AArch64][GlobalISel] Add selection support for v2s32 and v2s64 reductions for FADD/ADD. We'll need legalizer lower() support for the other types to work. Differential Revision: https://reviews.llvm.org/D89159	2020-10-16 11:41:57 -07:00
Benjamin Kramer	b740899c50	[Indvars][NFCI] Simplify assertion. This should be semantically identical. Also avoids unused variable warnings in Release builds.	2020-10-16 19:58:55 +02:00
Amara Emerson	32f77eea2d	[AArch64][GlobalISel] Regbankselect reductions to use FPR bank for scalars. Differential Revision: https://reviews.llvm.org/D89075	2020-10-16 10:42:15 -07:00
Amara Emerson	9190411fcf	[AArch64][GlobalISel] Add basic legalizer rules for supported add/fadd reductions. NEON is pretty limited in it's reduction support. As a first step add some basic rules for the legal types we can select. Differential Revision: https://reviews.llvm.org/D89070	2020-10-16 10:35:46 -07:00
Amara Emerson	6042c25b0a	[GlobalISel] Add translation support for vector reduction intrinsics. In order to prevent the ExpandReductions pass from expanding some intrinsics before they get to codegen, I had to add a -disable-expand-reductions flag for testing purposes. Differential Revision: https://reviews.llvm.org/D89028	2020-10-16 10:17:53 -07:00
Jay Foad	1417abe54c	[AMDGPU] Add new llvm.amdgcn.fma.legacy intrinsic Differential Revision: https://reviews.llvm.org/D89558	2020-10-16 17:10:21 +01:00
Jay Foad	0c1381d795	[llc] Use -filetype=null to disable MIR printing If you use -stop-after or similar options, llc will normally print MIR. This patch checks for -filetype=null as a special case to disable MIR printing. As the comment says, "The Null output is intended for use for performance analysis ...", and I found this useful for timing a subset of the passes that llc runs without the significant overhead of printing MIR just to send it to /dev/null. Differential Revision: https://reviews.llvm.org/D89476	2020-10-16 16:51:56 +01:00
Matt Arsenault	0a7cd99a70	Reapply "OpaquePtr: Add type to sret attribute" This reverts commit `eb9f7c28e5`. Previously this was incorrectly handling linking of the contained type, so this merges the fixes from D88973.	2020-10-16 11:05:02 -04:00
Krzysztof Parzyszek	97533b10b2	[Hexagon] Fix license headers in some .td files, NFC	2020-10-16 10:03:05 -05:00
Simon Pilgrim	83ae625f0c	[InstCombine] visitAnd - pull out repeated I.getType() calls. NFCI.	2020-10-16 15:43:11 +01:00
Simon Pilgrim	253f24cf4c	[InstCombine] Remove custom and(trunc(and(x,c1)),c2) fold This is more correctly handled by canEvaluateTruncated (one use checks etc.) and covers all the tests cases that were added for this fold.	2020-10-16 15:43:10 +01:00
Matt Arsenault	ce16b6835b	AMDGPU: Don't kill super-register with overlapping copy This would end up killing part of the result super-register, resulting in a verifier error on a later use of the overlapping registers. We could add kills of any non-aliasing registers, but we should be moving away from relying on kill flags.	2020-10-16 09:34:35 -04:00

1 2 3 4 5 ...

140363 Commits