When asking how many parts are required for a scalable vector type
there are occasions when it cannot be computed. For example, <vscale x 1 x i3>
is one such vector for AArch64+SVE because at the moment no matter how we
promote the i3 type we never end up with a legal vector. This means
that getTypeConversion returns TypeScalarizeScalableVector as the
LegalizeKind, and then getTypeLegalizationCost returns an invalid cost.
This then causes BasicTTImpl::getNumberOfParts to dereference an invalid
cost, which triggers an assert. This patch changes getNumberOfParts to
return 0 for such cases, since the definition of getNumberOfParts in
TargetTransformInfo.h states that we can use a return value of 0 to represent
an unknown answer.
Currently, LoopVectorize.cpp is the only place where we need to check for
0 as a return value, because all other instances will not currently
ask for the number of parts for <vscale x 1 x iX> types.
In addition, I have changed the target-independent interface for
getNumberOfParts to return 1 and assume there is a single register
that can fit the type. The loop vectoriser has lots of tests that are
target-independent and they relied upon the 0 value to mean the
answer is known and that we are not scalarising the vector.
I have added tests here that show we correctly return an invalid cost
for VF=vscale x 1 when the loop contains unusual types such as i7:
Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll
Differential Revision: https://reviews.llvm.org/D113772
There are various tests that need to be adjusted to test the right
thing with instruction referencing -- usually because the internal
representation of variables is different, sometimes that location lists
change. This patch makes a bunch of tests explicitly not use
instruction referencing, so that a check-llvm test with instruction
referencing on for x86_64 doesn't fail. I'll then convert the tests
to have instr-ref CHECK lines, and similar.
Differential Revision: https://reviews.llvm.org/D113194
This patch adds the conversion pattern for
`fir.box_tdes`.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D113931
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
Similar other cases in the current function (e.g. when the step is 1 or
-1), applying loop guards can lead to tighter upper bounds for the
backedge-taken counts.
Fixes PR52464.
Reviewed By: reames, nikic
Differential Revision: https://reviews.llvm.org/D113578
Currently the stepvector intrinsic only supports element types that
are integers of size 8 bits or more. This patch adds support for the
creation of stepvectors with smaller element types by creating
the intrinsic with i8 elements that we then truncate to the requested
size.
It's not currently possible to write a vectoriser test to exercise
this code path so I have added a unit test here:
llvm/unittests/IR/IRBuilderTest.cpp
Differential Revision: https://reviews.llvm.org/D113767
Add test coverage for a problem that was fixed by D113493: when updating
live intervals, fix handling of live ranges that were previously tied to
an early-clobber def but no longer are.
Delegate updating of LiveIntervals to each target's
convertToThreeAddress implementation, instead of repairing LiveIntervals
after the fact in TwoAddressInstruction::convertInstTo3Addr.
Differential Revision: https://reviews.llvm.org/D113493
The information in these perations is used by other operation.
At this point they should not have anymore uses.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D113971
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>
This change make WidenVecRes_SELECT work for scalable vectors.
This patch is split from [D110319](https://reviews.llvm.org/D110319)
Signed-off-by: Eric Tang <tangxingxin1008@gmail.com>
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D110388
Mention support for MinGW in the docs. Rename the existing windows
CI jobs to Clang-cl, as both Clang-cl and MinGW are equally much
"Windows", just different toolchain environments.
Add an XFAIL for a recently added test that fails in the MinGW DLL
configuration (with an explanation of what's causing the failure).
Differential Revision: https://reviews.llvm.org/D112215
Patch D113697 added default function arguments to template specializations of `ConvertToBinary`.
According to https://en.cppreference.com/w/cpp/language/template_specialization this not allowed:
> Default function arguments cannot be specified in explicit specializations of function templates, member function templates, and member functions of class templates when the class is implicitly instantiated.
It happens to compile with gcc, clang and msvc 14.30 (Visual Studio 2022), but not msvc 14.29 (Visual Studio 2020). Even for the compilers that syntactically accept it, the default argument will never be used (only the default argument of the template declaration). From https://en.cppreference.com/w/cpp/language/function_template
> Note that only non-template and primary template overloads participate in overload resolution.
That is, the explicit function template specialization is not added to the overload candidate set. Only after all the parameter types are known, are the explicit specializations chosen, at which point the default function argument is ignored.
Also see D85657.
Reviewed By: klausler
Differential Revision: https://reviews.llvm.org/D114032
The key_type type definition for map containers is useful in some
generic, template-based programming scenarios. The addition of key_type
to MapVector is consistent with other map types like DenseMap.
Differential Revision: https://reviews.llvm.org/D113242
As discussed in https://reviews.llvm.org/D113809#3128636. It's a bit
unfortunate to move the asserts away from the structs whose sizes
they're checking, but it's a far better developer experience when one of
the asserts is violated, because you get a single error instead of every
single source file including the header erroring out.
This reverts commit 39e9f5d368.
Reverting, as we needed to re-revert the benchmarks move because it was
causing a build failure in the Fuchsia bots due to the way they consume
libcxx's CMakeLists. I want to make sure I understand where the fix
should be for that. After that, I'll incorporate the change here in the
re-reland.
The reworking of the gdb client tests into the PlatformClientTestBase broke
the test for this. I did the mutatis mutandis for the move, but the test
still fails. Reverting till I have time to figure out why.
This reverts commit b715b79d54.
This struct was added and was intended to be used, but it was missed in the original patch.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D114041
The `r_address` field of `relocation_info` is only 4 bytes, so our
offset field (which is the `r_address` field adjusted for subsection
splitting) also only needs to be 4 bytes. This reduces the structure
size from 32 bytes to 24 bytes.
Combined with https://reviews.llvm.org/D113813, this is a minor perf
improvement for linking an internal app, tested on two machines:
```
smol-relocs baseline difference (95% CI)
sys_time 7.367 ± 0.138 7.543 ± 0.157 [ +0.9% .. +3.8%]
user_time 21.843 ± 0.351 21.861 ± 0.450 [ -1.3% .. +1.4%]
wall_time 20.301 ± 0.307 20.556 ± 0.324 [ +0.1% .. +2.4%]
samples 16 16
smol-relocs baseline difference (95% CI)
sys_time 2.923 ± 0.050 2.992 ± 0.018 [ +1.4% .. +3.4%]
user_time 10.345 ± 0.039 10.448 ± 0.023 [ +0.8% .. +1.2%]
wall_time 12.068 ± 0.071 12.229 ± 0.021 [ +1.0% .. +1.7%]
samples 15 12
```
More importantly though, this change by itself reduces our maximum
resident set size by 220 MB (2.75%, from 7.85 GB to 7.64 GB) on the
first machine. On the second machine, it reduces it by 125 MB (1.94%,
from 6.31 GB to 6.19 GB).
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113818
We can lay out Symbol more optimally to reduce its size from 56 bytes to
48 bytes by eliminating unnecessary padding, and we can lay out Defined
such that its bitfield members are placed in the tail padding of Symbol
(on ABIs which support this), to reduce it from 96 bytes to 80 bytes (8
bytes from the Symbol reduction, and 8 bytes from the tail padding
reuse).
This is perf-neutral for an internal app (results from two different
machines):
```
smol-syms baseline difference (95% CI)
sys_time 7.430 ± 0.202 7.440 ± 0.193 [ -2.6% .. +2.9%]
user_time 21.443 ± 0.513 21.206 ± 0.396 [ -3.3% .. +1.1%]
wall_time 20.453 ± 0.534 20.222 ± 0.488 [ -3.7% .. +1.5%]
samples 9 8
smol-syms baseline difference (95% CI)
sys_time 3.011 ± 0.050 3.040 ± 0.052 [ -0.4% .. +2.3%]
user_time 10.416 ± 0.075 10.496 ± 0.091 [ +0.1% .. +1.4%]
wall_time 12.229 ± 0.144 12.354 ± 0.192 [ -0.1% .. +2.1%]
samples 14 13
```
However, on the first machine, it reduces maximum resident set size by
65.9 MB (0.8%, from 7.92 GB to 7.85 GB). On the second machine, it
reduces it by 92 MB (1.4%, from 6.40 GB to 6.31 GB).
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113813
It was checking for 64-bit builds incorrectly. Unfortunately,
ConcatInputSection has grown a bit in the meantime, and I don't see any
obvious way to shrink it. Perhaps icfEqClass could use 32-bit hashes
instead of 64-bit ones, but xxHash64 is supposed to be much faster than
xxHash32 (https://github.com/Cyan4973/xxHash#benchmarks), so that sounds
like a loss. (Unrelatedly, we should really look at using XXH3 instead
of xxHash64 now.)
Reviewed By: #lld-macho, int3
Differential Revision: https://reviews.llvm.org/D113809
We don't actually need a local copy of the main executable to debug
a remote process. So instead of treating "no local module" as an error,
see if the LaunchInfo has an executable it wants lldb to use, and if so
use it. Then report whatever error the remote server returns.
Differential Revision: https://reviews.llvm.org/D113521
Instead of popping them and then immediately throwing them away, we can
just filter out globals and items in different scopes before adding them
to WorkList. Shouldn't change anything but keep the queue smaller.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D113864
It was being used occasionally already, and using it on the constructor
and getDbgEntityID has obvious type safety benefits.
Also use llvm_unreachable in the switch as usual, but since only these
two values are used in constructor calls I think it's still NFC.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D113862
LineCoverageIterator is not providing access to a mutable object. Fix it
to iterate over `const LineCoverageStats` so that `operator->()`
compiles again after 6b9b86db9d.
LLDB doesn't use only the python stable ABI, which means loading
it into an incompatible python can cause the process to crash.
_lldb.so should be named with the full EXT_SUFFIX from sysconfig
-- such as _lldb.cpython-39-darwin.so -- so this doesn't happen.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D112972
This is another attempt at D59351 which attempted to add --update-section, but
with some heuristics for adjusting segment/section offsets/sizes in the event
the data copied into the section is larger than the original size of the section.
We are opting to not support this case. GNU's objcopy was able to do this because
the linker and objcopy are tightly coupled enough that segment reformatting was
simpler. This is not the case with llvm-objcopy and lld where they like to be separated.
This will attempt to copy data into the section without changing any other
properties of the parent segment (if the section is part of one).
Differential Revision: https://reviews.llvm.org/D112116
Apparently "{sys.prefix}/bin/python3" isn't where you find the
python interpreter on windows, so the test I wrote for
-print-script-interpreter-info is failing.
We can't rely on sys.executable at runtime, because that will point
to lldb.exe not python.exe.
We can't just record sys.executable from build time, because python
could have been moved to a different location.
But it should be OK to apply relative path from sys.prefix to sys.executable
from build-time to the sys.prefix at runtime.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D113650