Before we eagerly put dependences into the QueryMap as soon as we
encountered them (via `Attributor::getAAFor<>` or
`Attributor::recordDependence`). Now we will wait to see if the
dependence is useful, that is if the target is not already in a fixpoint
state at the end of the update. If so, there is no need to record the
dependence at all.
Due to the abstraction via `Attributor::updateAA` we will now also treat
the very first update (during attribute creation) as we do subsequent
updates.
Finally this resolves the problematic usage of QueriedNonFixAA.
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 554675 (389245/s)
temporary memory allocations: 101574 (71280/s)
peak heap memory consumption: 28.46MB
peak RSS (including heaptrack overhead): 116.26MB
total memory leaked: 269.10KB
```
After:
```
calls to allocation functions: 512465 (345559/s)
temporary memory allocations: 98832 (66643/s)
peak heap memory consumption: 22.54MB
peak RSS (including heaptrack overhead): 106.58MB
total memory leaked: 269.10KB
```
Difference:
```
calls to allocation functions: -42210 (-727758/s)
temporary memory allocations: -2742 (-47275/s)
peak heap memory consumption: -5.92MB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
Since every AbstractAttribute so far, and for the foreseeable future,
corresponds to a single IRPosition we can simplify the class structure.
We already did this for IRAttribute but there is no reason to stop
there.
Summary:
Current implementation of DWARFDie::getName(DINameKind Kind) could
lead to double call to DWARFDie::find(DW_AT_name) in following
scenario:
getName(LinkageName);
getName(ShortName);
getName(LinkageName) calls find(DW_AT_name) if linkage name is not
found. Then, it is called again in getName(ShortName). This patch
alows to request LinkageName and ShortName separately
to avoid extra call to find(DW_AT_name).
It helps D74169 to parse clang debuginfo faster(~1%).
Reviewers: clayborg, dblaikie
Differential Revision: https://reviews.llvm.org/D79173
The number of public symbols is very large, and each deserialization
does a few heap allocations. The public symbols are serialized by the
linker, so we can assume they have the expected layout and use it
directly.
Saves O(#publics) temporary heap allocations and shrinks some data
structures.
This accounts for a large portion of the memory allocations in LLD.
This DebugSubsectionRecordBuilder object can be stored directly in
C13Builders, it mostly wraps other subsections.
Remove the container kind field from the object. It is always the same
for all elements in the vector, and we can pass it in during writing.
This generalizes the main Windows command line tokenizer to be able to
produce StringRef substrings as well as freshly copied C strings. The
implementation is still shared with the normal tokenizer, which is
important, because we have unit tests for that.
.drective sections can be very long. They can potentially list up to
every symbol in the object file by name. It is worth avoiding these
string copies.
This saves a lot of memory when linking chrome.dll with PGO
instrumentation:
BEFORE AFTER % IMP
peak memory: 6657.76MB 4983.54MB -25%
real: 4m30.875s 2m26.250s -46%
The time improvement may not be real, my machine was noisy while running
this, but that the peak memory usage improvement should be real.
This change may also help apps that heavily use dllexport annotations,
because those also use linker directives in object files. Apps that do
not use many directives are unlikely to be affected.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D79262
We don't require the type to be trivially assignable. While the standard
says that only is_trivially_copyable types may be memcpy'd, this seems
overly strict. We never assign the type, so there's no way for the type
to observe that the copy/move construction got elided. This is important
for std::pair<POD, POD>, which is not trivially assignable and probably
never will be because changing that would break ABI.
As a side-effect this no longer allows types with deleted copy/move
constructors in SmallVector. That's an unintended side-effect of
is_trivially_copyable anyways.
Shrinks Release+Asserts clang by 20k.
This lets it use sized deallocation and make more efficient alignment
decisions. Also adjust BumpPtrAllocator to always allocate at
alignof(std::max_align_t).
Summary:
In D77860, we have changed `getSymbolFlags()` return type to `Expected<uint32_t>`.
This change helps bubble the error further up the stack.
Reviewers: jhenderson, grimar, JDevlieghere, MaskRay
Reviewed By: jhenderson
Subscribers: hiraditya, MaskRay, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79075
If both the source and the destination need to be split then the two halves of the split operation are completely independent and don't need to be split or joined. So we don't need to assess a cost for the split or join.
Differential Revision: https://reviews.llvm.org/D79111
Summary:
Update cst_pred_ty to only work on FixedVectorType. It operates on
integers and integer vectors, and returns true if the predicate returns
true for all elements of the vector. This operation is not possible on
scalable vectors. Make this behavior explicit in the code and document
the fact that it only tests fixed width vectors.
Identified by test LLVM.Transforms/InstCombine::nsw.ll
Reviewers: efriedma, c-rhodes, david-arm, spatel
Reviewed By: david-arm
Subscribers: tschuett, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79195
The approach in D30000 assumes that the '/' returned by path::begin()
is the first element for absolute paths, but that's not true on
Windows.
Also, on Windows backslashes in include lines often end up escaped
so that there are two of them. Having backslashes in include lines
is undefined behavior in most cases and implementation-defined
behavior in C++20, but since clang treats it as normal repeated
path separators, the diagnostic should too.
Unbreaks -Wnonportable-include-path for absolute paths on Windows,
and unbreaks it on non-Windows in the case of absolute paths with
repeated directory separators.
This affects e.g. the `#include __FILE__` technique if the file
passed to clang has the wrong case for the drive letter. Before:
C:\src\llvm-project>bin\clang-cl.exe c:\src\llvm-project\test.cc
c:\\src\\llvm-project\\test.cc(4,10): warning: non-portable path to file
'"c\\srccllvm-projectctest.cc.'; specified path differs in case from
file name on disk [-Wnonportable-include-path]
^
Now:
C:\src\llvm-project> out\gn\bin\clang-cl c:\src\llvm-project\test.cc
c:\\src\\llvm-project\\test.cc(4,10): warning: non-portable path to file
'"C:\\src\\llvm-project\\test.cc"'; specified path differs in case from
file name on disk [-Wnonportable-include-path]
^
Differential Revision: https://reviews.llvm.org/D79223
test cases
Add support for #pragma float_control
Reviewers: rjmccall, erichkeane, sepavloff
Differential Revision: https://reviews.llvm.org/D72841
This reverts commit 85dc033cac, and makes
corrections to the test cases that failed on buildbots.
Summary:
This patch splits mostly unrelated size constants into separate
constexpr variables, sets explicit underlying types for the enumerations
to match the fields they are used for, and improves various comments.
This patch also replaces `<cname>` headers with `<name.h>` headers to
match the usage of the declared names as global namespace members in the
file.
Reviewers: jasonliu, DiggerLin, daltenty, sfertile
Reviewed By: jasonliu, sfertile
Differential Revision: https://reviews.llvm.org/D79136
This change add support for defined wasm globals in the .s format,
the MC layer, and wasm-ld
Currently there is no support custom initialization and all wasm
globals are initialized to zero.
Fixes: PR45742
Differential Revision: https://reviews.llvm.org/D79137
Summary:
Refactor getInterestingMemoryOperands() so that information about the
pointer operand is returned through an array of structures instead of
passing each piece of information separately by-value.
This is in preparation for returning information about multiple pointer
operands from a single instruction.
A side effect is that, instead of repeatedly generating the same
information through isInterestingMemoryAccess(), it is now simply collected
once and then passed around; that's probably more efficient.
HWAddressSanitizer has a bunch of copypasted code from AddressSanitizer,
so these changes have to be duplicated.
This is patch 3/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
[glider: renamed llvm::InterestingMemoryOperand::Type to OpType to fix
GCC compilation]
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77618
This file currently doesn't compile under LLVM_ENABLE_MODULES as SmallVector
is used in this header but is never forward declared or included in any way.
Let's include SmallVector.h instead and get rid of the SmallVectorImpl fwd
declaration which is now no longer necessary.
I couldn't make arc land the changes properly, for some reason they all got
squashed. Reverting them now to land cleanly.
Summary: This reverts commit cfb5f89b62.
Reviewers: kcc, thejh
Subscribers:
Summary:
A following commit will split the loop over ToInstrument into two.
To avoid having to duplicate the condition for suppressing instrumentation
sites based on ClDebug{Min,Max}, refactor it out into a new function.
While we're at it, we can also avoid the indirection through
NumInstrumented for setting FunctionModified.
This is patch 1/4 of a patch series:
https://reviews.llvm.org/D77616 [PATCH 1/4] [AddressSanitizer] Refactor ClDebug{Min,Max} handling
https://reviews.llvm.org/D77617 [PATCH 2/4] [AddressSanitizer] Split out memory intrinsic handling
https://reviews.llvm.org/D77618 [PATCH 3/4] [AddressSanitizer] Refactor: Permit >1 interesting operands per instruction
https://reviews.llvm.org/D77619 [PATCH 4/4] [AddressSanitizer] Instrument byval call arguments
Reviewers: kcc, glider
Reviewed By: glider
Subscribers: jfb, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77616
Summary: FP reductions no longer use these intrinsics since D78723.
Reviewers: efriedma, sdesmalen
Reviewed By: efriedma, sdesmalen
Differential Revision: https://reviews.llvm.org/D79010
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.
Reviewers: skatkov, taewookoh, yrouban
Reviewed By: skatkov
Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78987
Summary:
A valid DominatorTree is needed to do PhiTranslation.
Before this patch, a MemoryUse could be optimized to an access outside a loop, while the address it loads from is modified in the loop.
This can lead to a miscompile.
Reviewers: george.burgess.iv
Subscribers: Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79068
The improvements to the x86 vector insert/extract element costs in D74976 resulted in the estimated costs for vector initialization and scalarization increasing higher than should be expected. This is particularly noticeable on pre-SSE4 targets where the available of legal INSERT_VECTOR_ELT ops is more limited.
This patch does 2 things:
1 - it implements X86TTIImpl::getScalarizationOverhead to more accurately represent the typical costs of a ISD::BUILD_VECTOR pattern.
2 - it adds a DemandedElts mask to getScalarizationOverhead to permit the SLP's BoUpSLP::getGatherCost to be rewritten to use it directly instead of accumulating raw vector insertion costs.
This fixes PR45418 where a v4i8 (zext'd to v4i32) was no longer vectorizing.
A future patch should extend X86TTIImpl::getScalarizationOverhead to tweak the EXTRACT_VECTOR_ELT scalarization costs as well.
Reviewed By: @craig.topper
Differential Revision: https://reviews.llvm.org/D78216