I'm /guessing/ this isn't terribly testable without a very large input
file. Even if it were generated from a more compact assembly file, it's
probably best not to create a giant temporary test file - if I'm wrong about
that, or anyone has good suggestions for testing, I'm all ears!
Based on post-commit review feedback from Igor Kudrin on
eed0242330
A downstream test exposed a simple logic bug in the manual pointer-stripping
code; fix it by just using stripPointerCasts() on the value.
I don't think there's a way to expose this issue upstream.
Add an optional table lookup after the existing logarithm computation
for MidSize < Size <= MaxSize during size -> class lookups. The lookup is
O(1) because it indexes a table precomputed (via constexpr) from the size
table. Switch to this approach for the Android size class maps.
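As a rough, self-contained sketch of the table-lookup idea (the MidSize,
MaxSize, step and class sizes below are made up for illustration and are not
the actual Android configuration):

  // Sketch only: hypothetical MidSize/MaxSize/step and class sizes.
  #include <array>
  #include <cstddef>

  constexpr size_t MidSize = 512, MaxSize = 4096, Step = 64;
  constexpr std::array<size_t, 5> ClassSizes = {768, 1024, 2048, 3072, 4096};

  // Precompute, at compile time, the class index for every Step-sized bucket
  // in (MidSize, MaxSize]. This relies on every class size in that range
  // being a multiple of Step.
  constexpr auto makeTable() {
    std::array<unsigned, (MaxSize - MidSize) / Step> T{};
    for (size_t I = 0; I < T.size(); ++I) {
      const size_t BucketMax = MidSize + (I + 1) * Step;
      unsigned C = 0;
      while (ClassSizes[C] < BucketMax)
        ++C;
      T[I] = C;
    }
    return T;
  }

  constexpr auto Table = makeTable();

  // O(1): one subtraction, one division by a constant, one table index.
  unsigned classOf(size_t Size) { // requires MidSize < Size <= MaxSize
    return Table[(Size - MidSize - 1) / Step];
  }
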
Other approaches considered:
- Binary search was found to have an unacceptable (~30%) performance cost.
- An approach using NEON instructions (see older version of D73824) was found
to be slightly slower than this approach on newer SoCs but significantly
slower on older ones.
By selecting the values in the size tables to minimize wastage (for example,
by passing the malloc_info output of a target program to the included
compute_size_class_config program), we can increase the density of allocations
at a small (~0.5% on bionic malloc_sql_trace as measured using an identity
table) performance cost.
Reduces RSS on specific Android processes as follows (KB):
                               Before   After
zygote (median of 50 runs)      26836   26792 (-0.2%)
zygote64 (median of 50 runs)    30384   30076 (-1.0%)
dex2oat (median of 3 runs)     375792  372952 (-0.8%)
I also measured the amount of whole-system idle dirty heap on Android by
rebooting the system and then running the following script repeatedly until
the results were stable:
for i in $(seq 1 50); do grep -A5 scudo: /proc/*/smaps | grep Pss: | cut -d: -f2 | awk '{s+=$1} END {print s}' ; sleep 1; done
I did this 3 times both before and after this change and the results were:
Before: 365650, 356795, 372663
After: 344521, 356328, 342589
These results are noisy so it is hard to make a definite conclusion, but
there does appear to be a significant effect.
On other platforms, increase the sizes of all size classes by a fixed offset
equal to the size of the allocation header. This has also been found to improve
density: allocation sizes are often a power of 2, and without the offset the
header would push such an allocation into the next size class, wasting space.
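As a made-up worked example (the 16-byte header and the 64/128-byte class
sizes are assumptions for illustration, not the actual configuration):

  // Hypothetical numbers, for illustration only.
  constexpr unsigned Header = 16;
  constexpr unsigned Request = 64;              // power-of-two malloc() size
  constexpr unsigned Needed = Request + Header; // 80 bytes with the header
  static_assert(128 - Needed == 48, "waste with plain power-of-two classes");
  static_assert((64 + Header) - Needed == 0, "no waste with offset classes");
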
Differential Revision: https://reviews.llvm.org/D73824
This lets us remove two pointer indirections (one by removing the pointer,
and another by making the AllocatorPtr declaration hidden) in the C++ wrappers.
Differential Revision: https://reviews.llvm.org/D74356
Summary:
Instead of hand-crafting an offset into the structure returned by
dlopen(3) to get at the link map, use the documented API. This is
described in dlinfo(3): by calling it with `RTLD_DI_LINKMAP`, the
dynamic linker ensures the right address is returned.
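A minimal sketch of the documented approach (assuming a glibc Linux or FreeBSD
system; the library name is just an example):

  #include <dlfcn.h>
  #include <link.h>
  #include <stdio.h>

  int main() {
    void *Handle = dlopen("libz.so.1", RTLD_NOW); // link with -ldl on older glibc
    if (!Handle)
      return 1;
    // Ask the dynamic linker for the link map instead of relying on the
    // undocumented layout of the handle returned by dlopen().
    struct link_map *Map = nullptr;
    if (dlinfo(Handle, RTLD_DI_LINKMAP, &Map) == 0 && Map)
      printf("%s loaded at base %p\n", Map->l_name, (void *)Map->l_addr);
    dlclose(Handle);
    return 0;
  }
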
This is a recommit of 92e267a94d, with
dlinfo(3) explicitly being referenced only for FreeBSD, non-Android
Linux, NetBSD and Solaris. Other OSes will have to add their own
implementation.
Reviewers: devnexen, emaste, MaskRay, krytarowski
Reviewed By: krytarowski
Subscribers: krytarowski, vitalybuka, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73990
This breaks macOS, because TARGET_OS_EMBEDDED is always defined. Thanks
to Jason Molenda for pointing this out.
Revert "Do not define AcceptPIDFromInferior when it will not be used"
This reverts commit d23c15a687.
This reverts commit 936d1427da.
As discussed in PR41083:
https://bugs.llvm.org/show_bug.cgi?id=41083
...we can assert/crash in EarlyCSE using the current hashing scheme and
instructions with flags.
ValueTracking's matchSelectPattern() may rely on overflow (nsw, etc) or
other flags when detecting patterns such as min/max/abs composed of
compare+select. But the value numbering / hashing mechanism used by
EarlyCSE intersects those flags to allow more CSE.
Several alternatives to solve this are discussed in the bug report.
This patch avoids the issue by doing simple matching of min/max/abs
patterns that never requires instruction flags. We give up some CSE
power because of that, but that is not expected to result in much
actual performance difference because InstCombine will canonicalize
these patterns when possible. It even has this comment for abs/nabs:
/// Canonicalize all these variants to 1 pattern.
/// This makes CSE more likely.
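As a self-contained illustration of the idea (this is not the EarlyCSE code;
the enums and integer value IDs are stand-ins), classifying a compare+select
as min/max needs only the predicate and operand order, never nsw/nuw or
fast-math flags:

  #include <optional>

  enum class Pred { SLT, SGT };     // signed less-than / greater-than
  enum class Flavor { SMin, SMax };

  // "Sel = Pred(A, B) ? X : Y", with values identified by integer IDs.
  std::optional<Flavor> matchMinMax(Pred P, int A, int B, int X, int Y) {
    const bool SameOrder = (X == A && Y == B);
    const bool Swapped = (X == B && Y == A);
    if (!SameOrder && !Swapped)
      return std::nullopt;
    if (P == Pred::SLT)
      return SameOrder ? Flavor::SMin : Flavor::SMax;
    return SameOrder ? Flavor::SMax : Flavor::SMin; // Pred::SGT
  }
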
(And this patch adds PhaseOrdering tests to verify that the expected
transforms are still happening in the standard optimization pipelines.)
I left this code to use ValueTracking's "flavor" enum values, so we
don't have to change the callers' code. If we decide to go back to
using the ValueTracking call (by changing the hashing algorithm
instead), it should be obvious how to replace this chunk.
Differential Revision: https://reviews.llvm.org/D74285
Null-check and adjust a TypeLoc before casting it to a FunctionTypeLoc.
This fixes a crash in -fsanitize=nullability-return, and also makes the
location of the nonnull type available when the return type is adjusted.
rdar://59263039
Differential Revision: https://reviews.llvm.org/D74355
Summary: The lit feature object-emission was added because Hexagon did not support the integrated assembler, so some tests needed to be turned off with a Hexagon target. Hexagon now supports the integrated assembler, so this feature can be removed.
Reviewers: bcain, kparzysz, jverma, whitequark, JDevlieghere
Reviewed By: JDevlieghere
Subscribers: mehdi_amini, hiraditya, steven_wu, dexonsmith, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73568
Summary: It attempts to devirtualize a call on an alloca through vtable loads.
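For illustration only, a made-up source example of the kind of call this
targets (not the pass itself):

  struct Base {
    virtual int f() { return 1; }
  };

  int callOnLocal() {
    Base B;          // lives on the stack, i.e. an alloca in LLVM IR
    Base *P = &B;
    return P->f();   // the vtable load can be folded and the call
                     // resolved to Base::f() at compile time
  }
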
Reviewers: davidxl
Subscribers: mgorny, Prazek, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71308
We have spv.entry_point_abi for specifying the local workgroup size.
It should be decorated onto input gpu.func ops to drive the SPIR-V
CodeGen to generate the proper SPIR-V module execution mode. Compared
to using command-line options for specifying the configuration, using
attributes also has the benefits that 1) we are now able to use
different local workgroup sizes for different entry points and 2) the
tests contain the configuration directly.
Differential Revision: https://reviews.llvm.org/D74012
Summary:
After D72555 landed, `linalg.indexed_generic` also accepts ranked
tensors as input and output. Add a test for it.
Differential Revision: https://reviews.llvm.org/D74267
The plugin expects to have undefined references to symbols exported
by the loading process, which isn't supported by shared libraries
on Windows.
Differential Revision: https://reviews.llvm.org/D74042
This patch:
- enables the frame pointer for AIX;
- updates some of the red zone comments;
- adds/updates test cases.
Differential Revision: https://reviews.llvm.org/D72454
The test got re-enabled after d54d71b67e landed.
However, it seems that the order is still not deterministic: it
currently passes with -DLLVM_ENABLE_EXPENSIVE_CHECKS=OFF but randomly
fails with expensive checks ON.
Test that instcombine and early-cse can cooperate
to reduce sequences of select patterns that are not
composed of the same underlying instructions.
There's a bug in EarlyCSE (PR41083), and we can test
how much a possible fix (D74285) may affect optimization.
Summary:
This revision adds EDSC support for VectorOps to enable the creation of a `vector_matmul` declaratively. The `vector_matmul` is a simple configuration
of the `vector.contract` op that follows the StructuredOps abstraction.
Differential Revision: https://reviews.llvm.org/D74284
We were checking for extra uses of the negated operand even
if we were not going to create it as part of this canonicalization.
This was showing up as a regression when we limit EarlyCSE as
proposed in D74285.
Summary:
This patch enables support for Mergeable2ByteCString and Mergeable4ByteCString.
Reviewers: daltenty
Subscribers: wuzish, nemanjai, hiraditya
Differential Revision: https://reviews.llvm.org/D74164
Summary:
The return address validation in D71372 will fail if the memory permissions can't be determined. Many embedded stubs either don't implement the qMemoryRegionInfo packet, or don't have memory permissions at all.
Remove the return from the if clause that calls GetLoadAddressPermissions, so this call failing doesn't cause the step out to abort. Instead, assume that the memory permission check doesn't apply to this type of target.
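A hypothetical, self-contained sketch of the control-flow change (the stub
type and permission bit below are stand-ins, not the actual LLDB code):

  #include <cstdint>

  struct StubProcess {
    // Returns false when permissions cannot be determined, e.g. when the
    // remote stub does not implement qMemoryRegionInfo.
    bool GetLoadAddressPermissions(uint64_t Addr, uint32_t &Perms) {
      (void)Addr; (void)Perms;
      return false;
    }
  };

  constexpr uint32_t PermExecutable = 1u << 0; // stand-in permission bit

  bool returnAddressLooksValid(StubProcess &Process, uint64_t ReturnAddr) {
    uint32_t Perms = 0;
    if (!Process.GetLoadAddressPermissions(ReturnAddr, Perms))
      // Before: this failure aborted the step-out. Now: assume the
      // executable-memory check simply does not apply to this target.
      return true;
    return (Perms & PermExecutable) != 0;
  }
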
Reviewers: labath, jingham, clayborg, mossberg
Reviewed By: labath, jingham
Subscribers: lldb-commits
Tags: #lldb
Differential Revision: https://reviews.llvm.org/D72513
`vector' uses the keyword-and-predefine mode from gcc, while __vector is
reliably supported.
As a side effect, this change also makes the code consistent in its usage of __vector.
Differential Revision: https://reviews.llvm.org/D74129
Summary:
- This option forces a preamble rebuild to handle the odd case
of a missing header file being added
Reviewers: sammccall
Subscribers: ilya-biryukov, javed.absar, MaskRay, jkorous, arphaman, jfb, kadircet, usaxena95, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D73916
Summary:
The refactoring has caused a failure in
http://lab.llvm.org:8011/builders/clang-ppc64le-linux/builds/29265
The idea of failing the symbolization when the symbolizer buffer is too small
was incorrect. The symbolizer can be invoked for other frames that may fit into
the buffer and get symbolized.
Reviewers: vitalybuka, eugenis
Subscribers: dberris, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D74343
ConstantInt values are always represented as constant ranges with a
single element. getConstantInt is obsolete, as pointed out by @nikic
during D60581.
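For illustration with stand-in types (not the LLVM ConstantRange/ValueLattice
classes): a constant C is just the single-element half-open range [C, C+1),
so a separate constant accessor is redundant:

  #include <cassert>
  #include <cstdint>
  #include <optional>

  struct Range {                      // half-open interval [Lo, Hi)
    int64_t Lo, Hi;
    std::optional<int64_t> getSingleElement() const {
      return Hi - Lo == 1 ? std::optional<int64_t>(Lo) : std::nullopt;
    }
  };

  int main() {
    const Range C{42, 43};            // models the constant 42
    assert(C.getSingleElement() == 42);
  }
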
Reviewers: nikic
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D74329
LoopCacheAnalysis currently assumes the loop will be iterated over in
a forward direction. This patch addresses the issue by using the
absolute value of the stride when iterating backwards.
Note: this patch treats negative and positive array accesses the
same, resulting in the same cost being calculated for single- and
bi-directional access patterns. This should be improved in a
subsequent patch.
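For illustration (a made-up example, not taken from the patch's tests): both
loops below access A with unit-magnitude stride and the same locality, so
they should receive the same cache cost:

  void sums(const int *A, int N, int &Fwd, int &Bwd) {
    Fwd = Bwd = 0;
    for (int I = 0; I < N; ++I)
      Fwd += A[I];        // stride +1
    for (int I = N - 1; I >= 0; --I)
      Bwd += A[I];        // stride -1: the analysis now uses its absolute value
  }
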
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D73064
Summary:
Instead of hand-crafting an offset into the structure returned by
dlopen(3) to get at the link map, use the documented API. This is
described in dlinfo(3): by calling it with `RTLD_DI_LINKMAP`, the
dynamic linker ensures the right address is returned.
Reviewers: devnexen, emaste, MaskRay, krytarowski
Reviewed By: krytarowski
Subscribers: krytarowski, vitalybuka, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D73990