llvm-project

Commit Graph

Author	SHA1	Message	Date
Johannes Doerfert	14cb0bdf2b	[Attributor][NFC] Replace the nested AAMap with a key pair No functional change is intended. --- Single run of the Attributor module and then CGSCC pass (oldPM) for SPASS/clause.c (~10k LLVM-IR loc): Before: ``` calls to allocation functions: 512375 (362871/s) temporary memory allocations: 98746 (69933/s) peak heap memory consumption: 22.54MB peak RSS (including heaptrack overhead): 106.78MB total memory leaked: 269.10KB ``` After: ``` calls to allocation functions: 509833 (338534/s) temporary memory allocations: 98902 (65671/s) peak heap memory consumption: 18.71MB peak RSS (including heaptrack overhead): 103.00MB total memory leaked: 269.10KB ``` Difference: ``` calls to allocation functions: -2542 (-27042/s) temporary memory allocations: 156 (1659/s) peak heap memory consumption: -3.83MB peak RSS (including heaptrack overhead): 0B total memory leaked: 0B ```	2020-05-03 22:10:47 -05:00
Johannes Doerfert	95e0d28b71	[Attributor] Remember only necessary dependences Before we eagerly put dependences into the QueryMap as soon as we encountered them (via `Attributor::getAAFor<>` or `Attributor::recordDependence`). Now we will wait to see if the dependence is useful, that is if the target is not already in a fixpoint state at the end of the update. If so, there is no need to record the dependence at all. Due to the abstraction via `Attributor::updateAA` we will now also treat the very first update (during attribute creation) as we do subsequent updates. Finally this resolves the problematic usage of QueriedNonFixAA. --- Single run of the Attributor module and then CGSCC pass (oldPM) for SPASS/clause.c (~10k LLVM-IR loc): Before: ``` calls to allocation functions: 554675 (389245/s) temporary memory allocations: 101574 (71280/s) peak heap memory consumption: 28.46MB peak RSS (including heaptrack overhead): 116.26MB total memory leaked: 269.10KB ``` After: ``` calls to allocation functions: 512465 (345559/s) temporary memory allocations: 98832 (66643/s) peak heap memory consumption: 22.54MB peak RSS (including heaptrack overhead): 106.58MB total memory leaked: 269.10KB ``` Difference: ``` calls to allocation functions: -42210 (-727758/s) temporary memory allocations: -2742 (-47275/s) peak heap memory consumption: -5.92MB peak RSS (including heaptrack overhead): 0B total memory leaked: 0B ```	2020-05-03 22:01:51 -05:00
Johannes Doerfert	8228153f87	[Attributor][NFC] Encode IRPositions in the bits of a single pointer This reduces memory consumption for IRPositions by eliminating the vtable pointer and the `KindOrArgNo` integer. Since each abstract attribute has an associated IRPosition, the 12-16 bytes we save add up quickly. No functional change is intended. --- Single run of the Attributor module and then CGSCC pass (oldPM) for SPASS/clause.c (~10k LLVM-IR loc): Before: ``` calls to allocation functions: 469545 (260135/s) temporary memory allocations: 77137 (42735/s) peak heap memory consumption: 30.50MB peak RSS (including heaptrack overhead): 119.50MB total memory leaked: 269.07KB ``` After: ``` calls to allocation functions: 468999 (274108/s) temporary memory allocations: 77002 (45004/s) peak heap memory consumption: 28.83MB peak RSS (including heaptrack overhead): 118.05MB total memory leaked: 269.07KB ``` Difference: ``` calls to allocation functions: -546 (5808/s) temporary memory allocations: -135 (1436/s) peak heap memory consumption: -1.67MB peak RSS (including heaptrack overhead): 0B total memory leaked: 0B ``` --- CTMark 15 runs Metric: compile_time Program lhs rhs diff test-suite...:: CTMark/sqlite3/sqlite3.test 25.07 24.09 -3.9% test-suite...Mark/mafft/pairlocalalign.test 14.58 14.14 -3.0% test-suite...-typeset/consumer-typeset.test 21.78 21.58 -0.9% test-suite :: CTMark/SPASS/SPASS.test 21.95 22.03 0.4% test-suite :: CTMark/lencod/lencod.test 25.43 25.50 0.3% test-suite...ark/tramp3d-v4/tramp3d-v4.test 23.88 23.83 -0.2% test-suite...TMark/7zip/7zip-benchmark.test 60.24 60.11 -0.2% test-suite :: CTMark/kimwitu++/kc.test 15.69 15.69 -0.0% test-suite...:: CTMark/ClamAV/clamscan.test 25.43 25.42 -0.0% test-suite :: CTMark/Bullet/bullet.test 37.63 37.62 -0.0% Geomean difference -0.8% --- Reviewed By: lebedev.ri Differential Revision: https://reviews.llvm.org/D78722	2020-05-03 12:15:19 -05:00
Johannes Doerfert	6bf16ee4c5	[Attributor][NFC] Let AbstractAttribute be an IRPosition Since every AbstractAttribute so far, and for the foreseeable future, corresponds to a single IRPosition we can simplify the class structure. We already did this for IRAttribute but there is no reason to stop there.	2020-05-03 12:13:40 -05:00
Alexey Lapshin	4f576ea731	[Debuginfo][NFC] Avoid double calling of DWARFDie::find(DW_AT_name). Summary: The current implementation of DWARFDie::getName(DINameKind Kind) could lead to a double call to DWARFDie::find(DW_AT_name) in the following scenario: getName(LinkageName); getName(ShortName); getName(LinkageName) calls find(DW_AT_name) if the linkage name is not found. Then, it is called again in getName(ShortName). This patch allows requesting LinkageName and ShortName separately to avoid the extra call to find(DW_AT_name). It helps D74169 to parse clang debuginfo faster (~1%). Reviewers: clayborg, dblaikie Differential Revision: https://reviews.llvm.org/D79173	2020-05-03 14:00:25 +03:00
Reid Kleckner	5070cecd72	[PDB] Bypass generic deserialization code for publics sorting The number of public symbols is very large, and each deserialization does a few heap allocations. The public symbols are serialized by the linker, so we can assume they have the expected layout and use it directly. Saves O(#publics) temporary heap allocations and shrinks some data structures.	2020-05-02 18:14:50 -07:00
Reid Kleckner	7af4bb1641	[PDB] Remove unique_ptr wrapper around C13 line table subsections This accounts for a large portion of the memory allocations in LLD. This DebugSubsectionRecordBuilder object can be stored directly in C13Builders, it mostly wraps other subsections. Remove the container kind field from the object. It is always the same for all elements in the vector, and we can pass it in during writing.	2020-05-02 16:35:07 -07:00
Reid Kleckner	270d3faf6e	[COFF] Add and use a zero-copy tokenizer for .drectve This generalizes the main Windows command line tokenizer to be able to produce StringRef substrings as well as freshly copied C strings. The implementation is still shared with the normal tokenizer, which is important, because we have unit tests for that. .drectve sections can be very long. They can potentially list up to every symbol in the object file by name. It is worth avoiding these string copies. This saves a lot of memory when linking chrome.dll with PGO instrumentation: BEFORE AFTER % IMP peak memory: 6657.76MB 4983.54MB -25% real: 4m30.875s 2m26.250s -46% The time improvement may not be real, my machine was noisy while running this, but the peak memory usage improvement should be real. This change may also help apps that heavily use dllexport annotations, because those also use linker directives in object files. Apps that do not use many directives are unlikely to be affected. Reviewed By: thakis Differential Revision: https://reviews.llvm.org/D79262	2020-05-02 10:47:02 -07:00
Benjamin Kramer	f7bf28b2c0	[SmallVector] Weaken the predicate for the memcpy optimization We don't require the type to be trivially assignable. While the standard says that only is_trivially_copyable types may be memcpy'd, this seems overly strict. We never assign the type, so there's no way for the type to observe that the copy/move construction got elided. This is important for std::pair<POD, POD>, which is not trivially assignable and probably never will be because changing that would break ABI. As a side-effect this no longer allows types with deleted copy/move constructors in SmallVector. That's an unintended side-effect of is_trivially_copyable anyways. Shrinks Release+Asserts clang by 20k.	2020-05-02 19:40:42 +02:00
Benjamin Kramer	d3bc86c2ed	[Allocator] Make Deallocate() pass alignment and make it use (de)allocate_buffer This lets it use sized deallocation and make more efficient alignment decisions. Also adjust BumpPtrAllocator to always allocate at alignof(std::max_align_t).	2020-05-02 16:08:46 +02:00
Sam McCall	d10c995b4d	std::isspace -> llvm::isSpace (where locale should be ignored) I've left out some cases where I wasn't totally sure this was right or whether the include was ok (compiler-rt) or idiomatic (flang).	2020-05-02 15:36:04 +02:00
Sam McCall	b283ae7af8	[ADT] Add locale-independent isSpace() to StringExtras. NFC Use this in clangd, will follow up with replacements for isspace where locale-dependent is clearly not intended.	2020-05-02 15:20:05 +02:00
Nikita Popov	b7e2358220	Remove getNumUses() comparisons (NFC) getNumUses() scans the full use list. Don't use it if we only want to check whether there are zero or one uses.	2020-05-02 11:05:19 +02:00
Xing GUO	ff6a0b6a8e	[Object] Change ObjectFile::getSymbolValue() return type to Expected<uint64_t> Summary: In D77860, we have changed `getSymbolFlags()` return type to `Expected<uint32_t>`. This change helps bubble the error further up the stack. Reviewers: jhenderson, grimar, JDevlieghere, MaskRay Reviewed By: jhenderson Subscribers: hiraditya, MaskRay, rupprecht, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79075	2020-05-02 14:04:44 +08:00
Craig Topper	e39c7ab2b9	[CostModel][X86][ARM] Teach default implementation of getCastInstrCost to not add a split/join cost if source type and the destination type both have a SplitVector action If both the source and the destination need to be split then the two halves of the split operation are completely independent and don't need to be split or joined. So we don't need to assess a cost for the split or join. Differential Revision: https://reviews.llvm.org/D79111	2020-05-01 18:55:23 -07:00
Christopher Tetreault	0ee7b7e3f1	[SVE] Fix invalid use of VectorType::getNumElements() in PatternMatch Summary: Update cst_pred_ty to only work on FixedVectorType. It operates on integers and integer vectors, and returns true if the predicate returns true for all elements of the vector. This operation is not possible on scalable vectors. Make this behavior explicit in the code and document the fact that it only tests fixed width vectors. Identified by test LLVM.Transforms/InstCombine::nsw.ll Reviewers: efriedma, c-rhodes, david-arm, spatel Reviewed By: david-arm Subscribers: tschuett, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79195	2020-05-01 12:35:08 -07:00
River Riddle	1165a35a73	[IndexedAccessorRange] Only offset the base if the index is non-zero. This is more efficient and removes the need for derived ranges to handle the degenerate empty case.	2020-05-01 11:56:39 -07:00
Nico Weber	b9d50bdff2	Fix pr31836 on Windows too, and correctly handle repeated separators. The approach in D30000 assumes that the '/' returned by path::begin() is the first element for absolute paths, but that's not true on Windows. Also, on Windows backslashes in include lines often end up escaped so that there are two of them. Having backslashes in include lines is undefined behavior in most cases and implementation-defined behavior in C++20, but since clang treats it as normal repeated path separators, the diagnostic should too. Unbreaks -Wnonportable-include-path for absolute paths on Windows, and unbreaks it on non-Windows in the case of absolute paths with repeated directory separators. This affects e.g. the `#include __FILE__` technique if the file passed to clang has the wrong case for the drive letter. Before: C:\src\llvm-project>bin\clang-cl.exe c:\src\llvm-project\test.cc c:\\src\\llvm-project\\test.cc(4,10): warning: non-portable path to file '"c\\srccllvm-projectctest.cc.'; specified path differs in case from file name on disk [-Wnonportable-include-path] ^ Now: C:\src\llvm-project> out\gn\bin\clang-cl c:\src\llvm-project\test.cc c:\\src\\llvm-project\\test.cc(4,10): warning: non-portable path to file '"C:\\src\\llvm-project\\test.cc"'; specified path differs in case from file name on disk [-Wnonportable-include-path] ^ Differential Revision: https://reviews.llvm.org/D79223	2020-05-01 14:17:01 -04:00
Melanie Blower	fce82c0ed3	Revert "Reapply "Add support for #pragma float_control" with improvements to" This reverts commit `69aacaf699`.	2020-05-01 10:31:09 -07:00
Fangrui Song	3e4f343d4b	[ADT] Add DenseSetImpl(begin, end)	2020-05-01 10:10:45 -07:00
Melanie Blower	69aacaf699	Reapply "Add support for #pragma float_control" with improvements to test cases Add support for #pragma float_control Reviewers: rjmccall, erichkeane, sepavloff Differential Revision: https://reviews.llvm.org/D72841 This reverts commit `85dc033cac`, and makes corrections to the test cases that failed on buildbots.	2020-05-01 10:03:30 -07:00
Melanie Blower	85dc033cac	Revert "Add support for #pragma float_control" This reverts commit `4f1e9a17e9` due to a failure on a buildbot; sorry for the noise.	2020-05-01 06:36:58 -07:00
Melanie Blower	4f1e9a17e9	Add support for #pragma float_control Reviewers: rjmccall, erichkeane, sepavloff Differential Revision: https://reviews.llvm.org/D72841	2020-05-01 06:14:24 -07:00
Benjamin Kramer	7a5a1e9460	[IR] AttributeList::getContext has a single user, remove it.	2020-05-01 14:18:29 +02:00
Hubert Tong	5d806e254e	[XCOFF] Clean-up enum use in BinaryFormat/XCOFF.h; NFC Summary: This patch splits mostly unrelated size constants into separate constexpr variables, sets explicit underlying types for the enumerations to match the fields they are used for, and improves various comments. This patch also replaces `<cname>` headers with `<name.h>` headers to match the usage of the declared names as global namespace members in the file. Reviewers: jasonliu, DiggerLin, daltenty, sfertile Reviewed By: jasonliu, sfertile Differential Revision: https://reviews.llvm.org/D79136	2020-04-30 20:48:30 -04:00
Sergey Dmitriev	cfea3dc102	[AbstractCallSite] Look through constant cast expression when checking for callee use Summary: That makes AbstractCallSite::isCallee(const Use *) behavior consistent with AbstractCallSite constructor. Reviewers: jdoerfert Reviewed By: jdoerfert Subscribers: mgorny, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79188	2020-04-30 15:09:57 -07:00
Sam Clegg	65e64f6d65	[WebAssembly] Fix test failure after `0a6c4d8d2e` Reverting part of https://reviews.llvm.org/D79137 which caused a failure in an ObjectYAML test.	2020-04-30 14:27:08 -07:00
Florian Hahn	19ab53f1e2	[LoopVersioning] Update setAliasChecks to take ArrayRef argument (NFC). This cleanup was suggested as part of D78458.	2020-04-30 22:17:12 +01:00
Eli Friedman	c671345153	[IRBuilder][NFC] Dereference MaybeAlign that's known non-None.	2020-04-30 14:08:37 -07:00
Alexey Bataev	b5be1c5419	[OPENMP50]Basic support for uses_allocators clause. Summary: Added parsing/sema/serialization support for uses_allocators clause. Reviewers: jdoerfert Subscribers: yaxunl, guansong, arphaman, cfe-commits, caomhin Tags: #clang Differential Revision: https://reviews.llvm.org/D78577	2020-04-30 16:24:36 -04:00
Sam Clegg	0a6c4d8d2e	[WebAssembly] Add support for defined wasm globals in MC and lld This change adds support for defined wasm globals in the .s format, the MC layer, and wasm-ld. Currently there is no support for custom initialization and all wasm globals are initialized to zero. Fixes: PR45742 Differential Revision: https://reviews.llvm.org/D79137	2020-04-30 12:43:15 -07:00
Arthur Eubanks	a90948fd6e	[NFC] Rename ByValOrInalloca to PassPointeeByValue Summary: In preparation for preallocated. Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79152	2020-04-30 09:42:13 -07:00
Jann Horn	cfe36e4c6a	[AddressSanitizer] Refactor: Permit >1 interesting operands per instruction Summary: Refactor getInterestingMemoryOperands() so that information about the pointer operand is returned through an array of structures instead of passing each piece of information separately by-value. This is in preparation for returning information about multiple pointer operands from a single instruction. A side effect is that, instead of repeatedly generating the same information through isInterestingMemoryAccess(), it is now simply collected once and then passed around; that's probably more efficient. HWAddressSanitizer has a bunch of copypasted code from AddressSanitizer, so these changes have to be duplicated. This is patch 3/4 of a patch series: https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments [glider: renamed llvm::InterestingMemoryOperand::Type to OpType to fix GCC compilation] Reviewers: kcc, glider Reviewed By: glider Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77618	2020-04-30 17:09:13 +02:00
Raphael Isemann	5e6f167aa9	Include SmallVector.h in IPO.h to fix modules build [NFC] This file currently doesn't compile under LLVM_ENABLE_MODULES as SmallVector is used in this header but is never forward declared or included in any way. Let's include SmallVector.h instead and get rid of the SmallVectorImpl fwd declaration which is now no longer necessary.	2020-04-30 16:33:55 +02:00
Alexander Potapenko	7e7754df32	Revert an accidental commit of four AddressSanitizer refactor CLs I couldn't make arc land the changes properly, for some reason they all got squashed. Reverting them now to land cleanly. Summary: This reverts commit `cfb5f89b62`. Reviewers: kcc, thejh Subscribers:	2020-04-30 16:15:43 +02:00
diggerlin	a2c8cd1812	[AIX] emit .extern and .weak directive linkage SUMMARY: emit .extern and .weak directive linkage Reviewers: hubert.reinterpretcast, Jason Liu Subscribers: wuzish, nemanjai, hiraditya Differential Revision: https://reviews.llvm.org/D76932	2020-04-30 09:54:10 -04:00
Jann Horn	cfb5f89b62	[AddressSanitizer] Refactor ClDebug{Min,Max} handling Summary: A following commit will split the loop over ToInstrument into two. To avoid having to duplicate the condition for suppressing instrumentation sites based on ClDebug{Min,Max}, refactor it out into a new function. While we're at it, we can also avoid the indirection through NumInstrumented for setting FunctionModified. This is patch 1/4 of a patch series: https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments Reviewers: kcc, glider Reviewed By: glider Subscribers: jfb, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D77616	2020-04-30 15:30:46 +02:00
Cullen Rhodes	7e4c26bb88	[AArch64][SVE] Remove unused FP reduction intrinsic definitions Summary: FP reductions no longer use these intrinsics since D78723. Reviewers: efriedma, sdesmalen Reviewed By: efriedma, sdesmalen Differential Revision: https://reviews.llvm.org/D79010	2020-04-30 10:18:40 +00:00
Evgeniy Brevnov	3e68a66704	[BPI][NFC] Reuse post dominator tree from analysis manager when available Summary: Currently BPI unconditionally creates a post dominator tree each time. While this is not incorrect, we can save compile time by reusing the existing post dominator tree (when it's valid) provided by the analysis manager. Reviewers: skatkov, taewookoh, yrouban Reviewed By: skatkov Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78987	2020-04-30 11:31:03 +07:00
Mircea Trofin	3ab319b295	[llvm][NFC] Use CallBase explicitly instead of Instruction in FunctionComparator Reviewers: dblaikie, craig.topper Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79098	2020-04-29 15:37:46 -07:00
Alina Sbirlea	161ccfe5ba	[MemorySSA] Pass DT to the upward iterator for proper PhiTranslation. Summary: A valid DominatorTree is needed to do PhiTranslation. Before this patch, a MemoryUse could be optimized to an access outside a loop, while the address it loads from is modified in the loop. This can lead to a miscompile. Reviewers: george.burgess.iv Subscribers: Prazek, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79068	2020-04-29 14:28:31 -07:00
Christopher Tetreault	7ef15c869a	[NFC] Make ConstantVector/ConstantDataVector::getType() return a FixedVectorType Reviewers: efriedma, huihuiz, dexonsmith, spatel Reviewed By: efriedma Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79122	2020-04-29 14:23:40 -07:00
Jan Korous	4c53f4202a	[FileCollector] move Root creation If we don't handle the errors we can't rely on the directory being created early anyway. Differential Revision: https://reviews.llvm.org/D78959	2020-04-29 11:47:23 -07:00
Christopher Tetreault	0700cb64b5	[SVE] Upgrade VectorType tests to test new types Reviewers: efriedma, sdesmalen, c-rhodes, ddunbar Reviewed By: sdesmalen Subscribers: huntergr, tschuett, rkruppe, psnobl, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78831	2020-04-29 11:45:46 -07:00
Anh Tuyen Tran	c7878ad231	[VFDatabase] Scalar functions are vector functions with VF =1 Summary: Return scalar function when VF==1. The new trivial mapping scalar --> scalar when VF==1 to prevent false positive for "isVectorizable" query. Author: masoud.ataei (Masoud Ataei) Reviewers: Whitney (Whitney Tsang), fhahn (Florian Hahn), pjeeva01 (Jeeva P.), fpetrogalli (Francesco Petrogalli), rengolin (Renato Golin) Reviewed By: fpetrogalli (Francesco Petrogalli) Subscribers: hiraditya (Aditya Kumar), llvm-commits, LLVM Tag: LLVM Differential Revision: https://reviews.llvm.org/D78054	2020-04-29 17:20:37 +00:00
Hiroshi Yamauchi	1831986826	[PGO][PGSO] Prep for enabling non-cold code size opts under non-partial-profile sample PGO. Summary: - Distinguish between partial-profile and non-partial-profile sample PGO. - Add a flag for partial-profile sample PGO. - Tune the sample PGO cutoff. - No default behavior change (yet). Reviewers: davidxl Subscribers: eraman, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D78949	2020-04-29 08:57:47 -07:00
Mircea Trofin	e61247c0a8	[llvm][NFC] Change parameter type to more specific CallBase in IndirectCallPromotion Reviewers: dblaikie, craig.topper, wmi Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79047	2020-04-29 08:42:32 -07:00
Simon Pilgrim	1be7f2de1b	Revert rG5c4b4a62256876 "PseudoSourceValue.h - reduce GlobalValue.h include to forward declaration. NFC." Causes buildbot failures.	2020-04-29 16:12:19 +01:00
Simon Pilgrim	5c4b4a6225	PseudoSourceValue.h - reduce GlobalValue.h include to forward declaration. NFC. Fix MachineMemOperand.h implicit dependency on Type.h via PseudoSourceValue.h	2020-04-29 15:39:27 +01:00
Simon Pilgrim	090cae8491	[TTI] Add DemandedElts to getScalarizationOverhead The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the availability of legal INSERT_VECTOR_ELT ops is more limited. This patch does 2 things: 1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of an ISD::BUILD_VECTOR pattern. 2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs. This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing. A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well. Reviewed By: @craig.topper Differential Revision: https://reviews.llvm.org/D78216	2020-04-29 12:00:38 +01:00


40625 Commits