DataExtractor::GetMaxS64Bitfield performs a shift with UB in order to
construct a bitmask when bitfield_bit_size is 64. The current
implementation actually does “work” in this case, because the assumption
that the shift result is 0 holds, and 0 minus 1 gives the all-ones value
(the correct mask). However, a more readable and maintainable approach
is to use an off-the-shelf, UB-free helper.
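For illustration, a minimal sketch of the UB-free approach, assuming the
helper in question is llvm::maskTrailingOnes from llvm/Support/MathExtras.h
(the exact helper the patch picks is an assumption here):
```
#include "llvm/Support/MathExtras.h"
#include <cstdint>

// maskTrailingOnes handles a width of 64 without shifting by the full bit
// width, so there is no UB for bitfield_bit_size == 64.
uint64_t bitfieldMask(uint32_t bitfield_bit_size) {
  return llvm::maskTrailingOnes<uint64_t>(bitfield_bit_size);
}
```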
Fixes a UBSan issue:
"col" : 37,
"description" : "invalid-shift-exponent",
"filename" : "/Users/vsk/src/llvm-project-master/lldb/source/Utility/DataExtractor.cpp",
"instrumentation_class" : "UndefinedBehaviorSanitizer",
"line" : 615,
"memory_address" : 0,
"summary" : "Shift exponent 64 is too large for 64-bit type 'uint64_t' (aka 'unsigned long long')",
rdar://59117758
Differential Revision: https://reviews.llvm.org/D73913
This ports the existing case for G_XOR from `getTestBitOperand` in
AArch64ISelLowering into GlobalISel.
The idea is to flip between TBZ and TBNZ while walking through G_XORs.
Let's say we have
```
tbz (xor x, c), b
```
Let's say the `b`-th bit in `c` is 1. Then
- If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 0.
- If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 1.
So, then
```
tbz (xor x, c), b == tbnz x, b
```
Let's say the `b`-th bit in `c` is 0. Then
- If the `b`-th bit in `x` is 1, the `b`-th bit in `(xor x, c)` is 1.
- If the `b`-th bit in `x` is 0, then the `b`-th bit in `(xor x, c)` is 0.
So, then
```
tbz (xor x, c), b == tbz x, b
```
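A small, hedged sketch of the bit reasoning above (illustrative names only,
not the actual GlobalISel combine code):
```
#include <cstdint>

// Returns true if the branch polarity must flip (tbz <-> tbnz) when the test
// looks through (xor x, c): this happens exactly when bit `b` of `c` is set.
bool shouldInvertTestBit(uint64_t C, unsigned Bit) {
  return (C >> Bit) & 1;
}
```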
Differential Revision: https://reviews.llvm.org/D73929
This implements the following optimization:
```
(tbz (shl x, c), b) -> (tbz x, b-c)
```
This transform appears in `getTestBitOperand` in AArch64ISelLowering.cpp.
If we test bit `b` of `shl x, c`, we can fold away the `shl` by looking `c` bits
to the right of `b` in `x`, provided the adjusted bit index still fits in the type.
So, we can just test the `(b-c)`-th bit.
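A hedged sketch of the index adjustment (illustrative names, not the actual
combine code):
```
#include <cstdint>

// Testing bit `b` of (x << c) is the same as testing bit `b - c` of `x`,
// provided b >= c so the adjusted index stays inside the type.
bool lookThroughShl(unsigned Bit, uint64_t ShiftAmt, unsigned &NewBit) {
  if (Bit < ShiftAmt)
    return false;
  NewBit = Bit - ShiftAmt;
  return true;
}
```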
Differential Revision: https://reviews.llvm.org/D73924
* [NFC] Renamed local `matching_module_list` to `matching_modules` for
conciseness.
* [NFC] Eliminated redundant local variable `num_matches` to reduce the risk
that changes get it out of sync with `matching_modules.GetSize()`.
* Used an early return from the case where the specified symbol file matches
multiple modules. This is a slight behavior change, but it's an improvement:
It didn't make sense to tell the user that the symbol file simultaneously
matched multiple modules and no modules.
* [NFC] Used an early return from the case where no matches are found, to
better align with LLVM coding style.
* [NFC] Simplified call of `AppendWarningWithFormat("%s", stuff)` to
`AppendWarning(stuff)`. I don't think this adds any copies. It does
construct a StringRef, but it was going to have to scan the string for the
length anyway.
* [NFC] Removed unnecessary comments and reworded others for clarity.
* Used an early return if the symbol file could not be loaded. This is a
behavior change because previously it could fail silently.
* Used an early return if the object file could not be retrieved from the
symbol file. Again, this is a change because now there's an error message.
* [NFC] Eliminated a namespace alias that wasn't particularly helpful.
Differential Revision: https://reviews.llvm.org/D73594
Summary:
llvm-objdump -macho will no longer print "Contents of" headers when
disassembling section contents when -no-leading-headers is specified.
For historical reasons, this flag is independent of -no-leading-addr.
Reviewers: ab, pete, jhenderson
Reviewed By: jhenderson
Subscribers: rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73574
Summary:
Replace the generic zero- and one-result builders in LLVM::CallOp with a custom
builder that takes an LLVMFuncOp, which can be used to extract the result type
and create the symbol reference attribute. This is merely a convenience for
upcoming changes. The ODS-generated builders remain present.
Introduce LLVM::LLVMType::isVoidTy by analogy with the underlying LLVM type.
Differential Revision: https://reviews.llvm.org/D73895
As a result of recent changes to the Android size classes, the malloc_free_loop
benchmark started exhausting the 8192 size class at 32768 iterations. To avoid
this problem (and to make the test more realistic), change the benchmark to
use a variety of size classes.
Differential Revision: https://reviews.llvm.org/D73918
The tool is now looked for in the source directory rather than in the
install directory, which should avoid the problem of the tool not being
found.
The tests still aren't being run on Windows, but they should now run on
other platforms that have a shell, which hopefully also means Perl.
Differential Revision: https://reviews.llvm.org/D69781
Summary:
This patch allows for late initialisation of the GWP-ASan allocator. Previously, if late initialisation occurred, the sample counter was never updated, meaning we would end up having to wait for 2^32 allocations before getting a sampled allocation.
Now, we initialise the sampling mechanism in init() as well. We require init() to be called single-threaded, so this isn't a problem.
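A purely illustrative sketch of the idea, with hypothetical names (not the
GWP-ASan API): seed the next-sample counter during init() so the first
sampled allocation does not require exhausting an unseeded 32-bit counter.
```
#include <cstdint>

struct Sampler {
  // Allocations remaining until the next sampled allocation.
  uint32_t AllocsUntilSample = 0;
};

// Hypothetical init(): called once, single-threaded, so a plain store is fine.
void init(Sampler &S, uint32_t SampleRate) {
  // Without this seeding step the counter is never updated and the first
  // sample effectively requires ~2^32 allocations.
  S.AllocsUntilSample = SampleRate;
}
```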
Reviewers: eugenis
Reviewed By: eugenis
Subscribers: merge_guards_bot, mgorny, #sanitizers, llvm-commits, cferris
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73896
Summary:
The AIX assembler .space directive can't take a second, non-zero argument to fill
with, but LLVM emitFill currently assumes it can. We add a flag to the AsmInfo
to indicate whether non-zero fill is supported, and if non-zero fill is not
available we just splat the .byte directives instead.
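Purely for illustration (hypothetical helper, not the real MC streamer API),
the fallback amounts to:
```
#include <cstdint>
#include <string>

// Emit a .space when the assembler supports the requested fill, otherwise
// splat the value as one .byte directive per byte.
std::string emitFillDirectives(uint64_t NumBytes, uint8_t FillValue,
                               bool NonZeroFillSupported) {
  std::string Out;
  if (FillValue == 0 || NonZeroFillSupported) {
    Out = ".space " + std::to_string(NumBytes);
    if (FillValue != 0)
      Out += ", " + std::to_string(FillValue);
    return Out + "\n";
  }
  for (uint64_t I = 0; I < NumBytes; ++I)
    Out += ".byte " + std::to_string(FillValue) + "\n";
  return Out;
}
```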
Reviewers: stevewan, sfertile, DiggerLin, jasonliu, Xiangling_L
Reviewed By: jasonliu
Subscribers: Xiangling_L, wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73554
This change has two components. The first moves the generated file
for a namespace to the directory named after the namespace, in
a file named 'index.<format>'. This greatly improves the browsing
experience since the index page is shown by default for a directory.
The second improves the markdown output by adding links to the
referenced pages for child objects and a link back to the source
code.
Patch By: Clayton
Differential Revision: https://reviews.llvm.org/D72954
LinalgDependenceGraph was not updated after successful producer-consumer
fusion for linalg ops. This patch fixes that by reconstructing the
LinalgDependenceGraph on every iteration. This is very inefficient and
should be improved by updating the dependence graph only when necessary.
Given
```
tb(n)z (and x, m), b
```
Where the `b`-th bit of `m` is 1,
```
tb(n)z (and x, m), b == tb(n)z x, b
```
So, we can walk past a `G_AND` in this case.
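A hedged sketch of the check (illustrative names, not the actual combine
code):
```
#include <cstdint>

// We may look through (and x, m) when bit `b` of the mask is set, because the
// and leaves bit `b` of `x` unchanged in that case.
bool canLookThroughAnd(uint64_t Mask, unsigned Bit) {
  return (Mask >> Bit) & 1;
}
```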
Also add test/CodeGen/AArch64/GlobalISel/opt-fold-and-tbz-tbnz.mir to test this.
Differential Revision: https://reviews.llvm.org/D73790
convertPtrAddToAdd improved overall code size and quality by a significant amount,
but at -O0 we generated some cross-class copies due to the fact that we emitted
G_PTRTOINT and G_INTTOPTR around the G_ADD. Unfortunately, at -O0 we don't run any
register coalescing, so these cross-class copies escaped as moves, and
we ended up regressing 3 benchmarks on CTMark (though the change was still a win overall).
This patch changes the lowering to instead emit the G_ADD directly into the
destination register, and then forcibly changes the destination LLT to s64 from p0. This
should be ok, as all uses of the register should now be selected and therefore
the LLT doesn't matter to the users. It does however matter for the importer
patterns, which will fail to select a G_ADD if there is a p0 LLT.
I'm not able to get rid of the G_PTRTOINT on the source yet however. We can't
use the same trick of breaking the type system since that could break the
selection of the defining instruction. Thus with -O0 we still end up with a
cross class copy on source.
Code size improvements on -O0:
Program baseline new diff
test-suite :: CTMark/Bullet/bullet.test 965520 949164 -1.7%
test-suite...TMark/7zip/7zip-benchmark.test 1069456 1052600 -1.6%
test-suite...ark/tramp3d-v4/tramp3d-v4.test 1213692 1199804 -1.1%
test-suite...:: CTMark/sqlite3/sqlite3.test 421680 419736 -0.5%
test-suite...-typeset/consumer-typeset.test 837076 833380 -0.4%
test-suite :: CTMark/lencod/lencod.test 799712 796976 -0.3%
test-suite...:: CTMark/ClamAV/clamscan.test 688264 686132 -0.3%
test-suite :: CTMark/kimwitu++/kc.test 1002344 999648 -0.3%
test-suite...Mark/mafft/pairlocalalign.test 422296 421768 -0.1%
test-suite :: CTMark/SPASS/SPASS.test 656792 656532 -0.0%
Geomean difference -0.6%
Differential Revision: https://reviews.llvm.org/D73910
Start using a new strategy with a combination of merges and unmerges.
This allows scalarizing before lowering, which in cases like
<2 x s128> avoids producing giant illegal shifts.
RegAllocGreedy uses a fairly compile-time-intensive splitting heuristic
called region splitting. This heuristic is disabled via another heuristic
when it is likely not to be worth the compile time. The only way
to control this second heuristic was a command line option (huge-size-for-split).
This commit gives more control over this heuristic by making it overridable
by the target using a target hook in TargetRegisterInfo called
shouldRegionSplitForVirtReg.
The default implementation of this hook keeps the heuristic as it was
before this patch.
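A hedged sketch of what a target override could look like (the parameter list
is assumed from the description above, and the other TargetRegisterInfo
overrides a real target must provide are omitted):
```
class MyTargetRegisterInfo : public TargetRegisterInfo {
public:
  bool shouldRegionSplitForVirtReg(const MachineFunction &MF,
                                   const LiveInterval &VirtReg) const override {
    // Always request region splitting for this target, trading compile time
    // for potentially better allocation; deferring to the base class would
    // keep the default heuristic instead.
    return true;
  }
};
```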
```
src/UnwindCursor.hpp:1344:51: error: operator '?:' has lower precedence than '|';
'|' will be evaluated first [-Werror,-Wbitwise-conditional-parentheses]
_info.flags = isSingleWordEHT ? 1 : 0 | scope32 ? 0x2 : 0; // Use enum?
~~~~~~~~~~~ ^
src/UnwindCursor.hpp:1344:51: note: place parentheses around the '|' expression
to silence this warning
_info.flags = isSingleWordEHT ? 1 : 0 | scope32 ? 0x2 : 0; // Use enum?
^
( )
src/UnwindCursor.hpp:1344:51: note: place parentheses around the '?:' expression
to evaluate it first
_info.flags = isSingleWordEHT ? 1 : 0 | scope32 ? 0x2 : 0; // Use enum?
^
( )
```
But `0 |` is a no-op for either of those two interpretations, so I think
what was meant here was
```
_info.flags = (isSingleWordEHT ? 1 : 0) | (scope32 ? 0x2 : 0); // Use enum?
```
Previously, if `isSingleWordEHT` was set, bit 2 would never be set. Now
it is. From what I can tell, the only thing that checks this bitmask is
ProcessDescriptors in Unwind-EHABI.cpp, and that only cares about bit 1,
so in practice this shouldn't have much of an effect.
Differential Revision: https://reviews.llvm.org/D73890
ClangBuildAnalyzer results show that a lot of time is spent
instantiating AnalysisManager::getResultImpl across the code base:
**** Templates that took longest to instantiate:
50445 ms: llvm::AnalysisManager<llvm::Function>::getResultImpl (412 times, avg 122 ms)
47797 ms: llvm::AnalysisManager<llvm::Function>::getResult<llvm::TargetLibraryAnalysis> (389 times, avg 122 ms)
46894 ms: std::tie<const unsigned long long, const bool> (2452 times, avg 19 ms)
43851 ms: llvm::BumpPtrAllocatorImpl<llvm::MallocAllocator, 4096, 4096>::Allocate (3228 times, avg 13 ms)
33911 ms: std::tie<const unsigned int, const unsigned int, const unsigned int, const unsigned int> (897 times, avg 37 ms)
33854 ms: std::tie<const unsigned long long, const unsigned long long> (1897 times, avg 17 ms)
27886 ms: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (11156 times, avg 2 ms)
I mentioned this result to @chandlerc, and he suggested this direction.
AnalysisManager is already explicitly instantiated, and getResultImpl
doesn't need to be inlined. Move the definition to an Impl header, and
include that header in files that explicitly instantiate
AnalysisManager. There are only four (real) IR units:
- function
- module
- loop
- cgscc
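As a generic illustration of the pattern (hypothetical names, not the actual
PassManager code): keep only the declaration in the widely included header,
move the definition into an "Impl" header, and include that Impl header only
in the few files that explicitly instantiate the class.
```
// Manager.h -- included everywhere; only the declaration of the heavy member.
template <typename IRUnitT> class Manager {
public:
  int getResultImpl(IRUnitT &IR); // definition intentionally not inline
};

// ManagerImpl.h -- included only by the explicitly-instantiating files.
template <typename IRUnitT> int Manager<IRUnitT>::getResultImpl(IRUnitT &IR) {
  (void)IR;
  return 0; // stand-in for the expensive body
}

// OneIRUnit.cpp -- one explicit instantiation per IR unit, so the body is
// compiled a handful of times instead of in every TU including Manager.h.
struct Function {};
template class Manager<Function>;
```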
Looking at a specific transform (ArgumentPromotion.cpp), here are three
compilations before & after this change:
BEFORE:
$ for i in $(seq 3) ; do ./ccit.bat ; done
peak memory: 258.15MB
real: 0m6.297s
peak memory: 257.54MB
real: 0m5.906s
peak memory: 257.47MB
real: 0m6.219s
AFTER:
$ for i in $(seq 3) ; do ./ccit.bat ; done
peak memory: 235.35MB
real: 0m5.454s
peak memory: 234.72MB
real: 0m5.235s
peak memory: 234.39MB
real: 0m5.469s
The 20MB of memory saved seems real, and the time improvement looks real
as well.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73817
Summary:
llvm-objdump started warning when asked to disassemble a section that
isn't present in the input files, in Yuanfang Chen's change:
d16c162c94. The problem is that the
logic was restricted only to the generic llvm-objdump parser, not to the
Mach-O-specific parser used for Apple toolchain compatibility. The
solution is to log section names from the Mach-O parser.
The macho-cstring-dump.test has been updated to fail if it encounters
this new warning in the future.
Reviewers: pete, ab, lhames, jhenderson, grimar, MaskRay, ychen
Reviewed By: jhenderson, grimar
Subscribers: rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73586
I previously removed the code in ValueObject::GetExpressionPath that
took advantage of the parameter `qualify_cxx_base_classes`. As a result,
the parameter is now unused and can be removed.
Summary:
I think that there are very few things from clang that actually need forward
declaration, so not having a ClangForward header makes sense.
Differential Revision: https://reviews.llvm.org/D73827
Summary:
The method appendLoopsToWorklist is duplicated in LoopUnroll and in the
LoopPassManager as an internal method. Make it a utility.
Reviewers: dmgreen, chandlerc, fedor.sergeev, yamauchi
Subscribers: mehdi_amini, hiraditya, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73569
Summary:
* Most of the simplifications in SimplifyShuffleVectorInst depend on the
concrete value of, or the length of, the mask vector. For scalable
vectors, this cannot be known at compile time.
** For these cases, detect whether the vector is scalable before attempting
the transformation.
* The functions ShuffleVectorInst::getMaskValue and
ShuffleVectorInst::getShuffleMask access the value of the constant mask.
However, since the length of the mask is unknown at compile time, these
functions do not work for scalable vectors. Add asserts to ensure that
the input mask is not scalable.
Reviewers: efriedma, sdesmalen, apazos, chrisj, huihuiz
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73555
Adds a replaceOperand() helper, which is like Instruction::setOperand()
but also adds the old operand to the worklist. This reduces the amount of
missing or incorrect worklist management.
This patch only applies the helper to a relatively small subset of
setOperand() calls in InstCombine, namely those of the pattern
`I.setOperand(); return &I;`, where it is most obviously applicable.
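A minimal sketch of the helper as a method fragment of an InstCombine-style
class (hedged: the real code may differ in how it queues the old operand):
```
Instruction *replaceOperand(Instruction &I, unsigned OpNum, Value *V) {
  // Re-queue the operand being replaced so it is revisited; it may now be
  // dead or newly simplifiable.
  if (auto *OldOp = dyn_cast<Instruction>(I.getOperand(OpNum)))
    Worklist.add(OldOp);
  I.setOperand(OpNum, V);
  return &I;
}
```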
Differential Revision: https://reviews.llvm.org/D73803
This renames Worklist.AddDeferred() to Worklist.add() and
Worklist.Add() to Worklist.push(). The intention here is that
Worklist.add() should be the go-to method for explicit worklist
management, while the raw Worklist.push() is mostly for
InstCombine internals. I will then migrate uses of Worklist.push()
to Worklist.add() in followup changes.
As suggested by spatel on D73411, I'm also changing the remaining
method names to start with a lowercase character, in line with current
coding standards.
Differential Revision: https://reviews.llvm.org/D73745
Summary:
Clang -fpic defaults to -fno-semantic-interposition (GCC -fpic defaults
to -fsemantic-interposition).
Users need to specify -fsemantic-interposition to get semantic
interposition behavior.
Semantic interposition is currently a best-effort feature. There may
still be some cases where it is not handled well.
Reviewers: peter.smith, rnk, serge-sans-paille, sfertile, jfb, jdoerfert
Subscribers: dschuff, jyknight, dylanmckay, nemanjai, jvesely, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, arphaman, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73865
Follow-up to D73135. If the target doesn't have hard float (the default
for ARM), we would hit an assertion when trying to soften the result of vector
reduction intrinsics. This patch marks these intrinsics for expansion as well.
(A bit odd to use vectors on a target without hard float ... but
that's where you end up if you expose target-independent vector types.)
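For illustration only: one common way a target marks such nodes for expansion
is via setOperationAction in its TargetLowering constructor. Whether this
patch uses exactly this mechanism, and these opcodes/types, is an assumption.
```
// Inside the target's TargetLowering constructor (illustrative types only):
setOperationAction(ISD::VECREDUCE_FADD, MVT::v4f32, Expand);
setOperationAction(ISD::VECREDUCE_FMUL, MVT::v4f32, Expand);
```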
Differential Revision: https://reviews.llvm.org/D73854