Add missing coverage exposed by D114783.
There should be no associated IRELATIVE, otherwise (a) glibc ld.so may
crash (b) it wastes space (c) unused IPLT causes confusion.
This produced a few verifier warnings that came up while I was
investigating something else here. Change the assembly to match the
comment so it's warning free. Doesn't seem necessary to change the
CHECKs for the test since it's just a bug in the test, not in the code
under test.
Avoid duplicating the string decoding - improve the error messages down
in form parsing (& produce an Expected<const char*> instead of
Optional<const char*> to communicate the extra error details)
If all symbols in a lookup match before we reach the end of the search order
then bail out of the search-order loop early.
This should reduce unnecessary contention on the session lock and improve
readability of the debug logs.
This makes a bunch of these call sites independent of a follow-up change
I'm making to have getAsCString return Expected<const char*> for more
descriptive error messages so that the failures there can be
communicated up to DWARFVerifier (or other callers who want to provide
more verbose diagnostics) so DWARFVerifier doesn't have to re-implement
the string lookup logic and error checking.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.
In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.
A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12
2: 24
3: 28 _Z5funcAi:18
3.1: 28 _Z5funcBi:30
3: _Z5funcAi:1467398
0: 10
1: 10 _Z8funcLeafi:11
3: 24
1: _Z8funcLeafi:1467299
0: 6
1: 6
3: 287884
4: 287864 _Z3fibi:315608
15: 23
!CFGChecksum: 138828622701
!Attributes: 2
!CFGChecksum: 281479271677951
!Attributes: 2
```
Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
- Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
Defined in [`specialized.algorithms`](wg21.link/specialized.algorithms).
Also:
- refactor the existing non-range implementation so that most of it
can be shared between the range-based and non-range-based algorithms;
- remove an existing test for the non-range version of
`uninitialized_default_construct{,_n}` that likely triggered undefined
behavior (it read the values of built-ins after default-initializing
them, essentially reading uninitialized memory).
Reviewed By: #libc, Quuxplusone, ldionne
Differential Revision: https://reviews.llvm.org/D115315
Approximately revert D103431.
LDS variables are allocated at kernel launch and deallocated at kernel exit.
The address is therefore kernel execution dependent. Global variables are
initialized by values written to .data, which can't be done for a LDS variable
as there is no kernel running, or by a global constructor. Initializing the
global to the address of some LDS allocated by a global constructor is possible
but indistinguishable from undef.
Assigning the address of a LDS variable to a global should be a sema error. It
isn't for openmp, haven't checked other languages. Failing that it could be set
to undef, perhaps in this pass.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D115413
A few tests in the test suite require support for Bash. For example,
tests that run a program and send data through stdin to it require some
way of piping the data in, and we use a Bash script for that.
However, some executors (e.g. an embedded systems simulator) do not
support Bash, so these tests will fail. This commit adds a Lit feature
that tries to detect whether Bash is available through conventional
means, and disables the tests that require it otherwise.
Differential Revision: https://reviews.llvm.org/D114612
Break up the vectorization pre-condition into the part checking for
static shape and the rest checking if the linalg op is supported by
vectorization. This allows checking if an op could be vectorized if it
had static shapes.
Differential Revision: https://reviews.llvm.org/D115754
Introduce a FreeBSDKernel plugin that provides the ability to read
FreeBSD kernel core dumps. The plugin utilizes libfbsdvmcore to provide
support for both "full memory dump" and minidump formats across variety
of architectures supported by FreeBSD. It provides the ability to read
kernel memory, as well as the crashed thread status with registers
on arm64, i386 and x86_64.
Differential Revision: https://reviews.llvm.org/D114911
The pdb lldb tests do not work correctly with both the VS2019 and VS2017 toolsets at the moment. This change updates several of the tests to work with both toolsets. Unfortunately, this makes the tests suboptimal for both toolsets, but we can update them to be better for VS2019 once we officially drop VS2017. This change is meant to bridge the gap until the update happens, so that the buildbots can work with either toolset.
Differential Revision: https://reviews.llvm.org/D115482
~(iN X s>> (N-1)) & Y --> (X s< 0) ? 0 : Y
https://alive2.llvm.org/ce/z/JKlQ9x
This is similar to D111410 / 727e642e97 ,
but it includes a 'not' of the signbit and so it
saves an instruction in the basic pattern.
DAGCombiner or target-specific folds can expand
this back into bit-hacks.
The diffs in the logical-select tests are not true
regressions - running early-cse and another round
of instcombine is expected in a normal opt pipeline,
and that reduces back to a minimal form as shown
in the duplicated PhaseOrdering test.
I have no understanding of the SystemZ diffs, so
I made the minimal edits suggested by FileCheck to
make that test pass again. That whole test file is
wrong though. It is running the entire optimizer (-O2)
to check IR, and then topping that by even running
codegen and checking asm. It needs to be split up.
Fixes#52631
While the default value for the amdgpu-flat-work-group-size attribute,
"1, 256", matches the defaults from Clang, some users of the ROCDL dialect,
namely Tensorflow, use larger workgroups, such as 1024. Therefore,
instead of hardcoding this value, we add a rocdl.max_flat_work_group_size
attribute that can be set on GPU kernels to override the default value.
Reviewed By: whchung
Differential Revision: https://reviews.llvm.org/D115741
Currently, we'll try to instantiate a ClangREPL for every known
language. The plugin manager already knows what languages it supports,
so rely on that to only instantiate a REPL when we know the requested
language is supported.
rdar://86439474
Differential revision: https://reviews.llvm.org/D115698
The 1st test corresponds to a minimally optimized (mem2reg)
version of the example in:
issue #52631
The 2nd test copies an existing instcombine test with the
same pattern. If we canonicalize differently, we can miss
reducing to minimal form in a single invocation of
-instcombine, but that should not escape the normal opt
pipeline.
The IceLake scheduler model is still mainly a copy of the SkylakeServer model.
This patch adjusts the integer shuffle classes to account for most instructions now working on Port 1 as well as Port 5.
This is based off Agner + uops.info as well as the PR48110 report.
Differential Revision: https://reviews.llvm.org/D115547
Test that STRICT_FMINNUM/FMAXNUM are lowered to libcalls for f32/f64.
The RISC-V instructions don't match the behavior of fmin/fmax libcalls
with respect to SNaN.
Promoting FMINNUM/FMAXNUM for f16 needs more work outside of the
RISC-V backend.
Reviewed By: asb, arcbbb
Differential Revision: https://reviews.llvm.org/D115680
This patch revises the warning fix done in
a93b1792f1. Specifically, it rolls the
MRI.getType call into the assert, thereby avoiding the named variable.
If a copy related symbol (say `copy`) is referenced in two .o
files, this change removes a duplicated line from the -Map output:
```
202470 202470 1 1 .bss.rel.ro
202470 202470 1 1 <internal>:(.bss.rel.ro)
202470 202470 1 1 copy
removed 202470 202470 1 1 copy
```
Differential Revision: https://reviews.llvm.org/D115697
This patch adds a new tool chain, HIPSPVToolChain, for emitting HIP
device code as SPIR-V binary. The SPIR-V binary is emitted by using an
external tool, SPIRV-LLVM-Translator, temporarily. We intend to switch
the translator to the llc tool when the SPIR-V backend lands on LLVM
and proves to work well on HIP implementations which consume SPIR-V.
Before the SPIR-V emission the tool chain loads an optional external
pass plugin, either automatically from a HIP installation or from a
path pointed by --hipspv-pass-plugin, and runs passes that are meant
to expand/lower HIP features that do not have direct counterpart in
SPIR-V (e.g. dynamic shared memory).
Code emission for SPIR-V will be enabled and HIPSPVToolChain tests
will be added in the follow up patch part 3.
Other changes: New option ‘-nohipwrapperinc’ is added to exclude HIP
include wrappers. The reason for the addition is that they cause
compile errors when compiling HIP sources for the host side for HIPCL
and HIPLZ implementations. New option is added to avoid this issue.
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D110618
Make clang-scan-deps use the virtual path for module maps instead of the on disk
path. This is needed so that modulemap relative lookups are done correctly in
the actual module builds. The file dependencies still use the on disk path as
that's what matters for build invalidation.
Differential Revision: https://reviews.llvm.org/D114206
In order to support constrained FP intrinsics we need to model FRM
dependency. Whether or not a instruction uses FRM is based on a 3
bit field in the instruction. Because of this we can't add
'Uses = [FRM]' to the tablegen descriptions.
This patch examines the immediate after isel and adds an implicit
use of FRM. This idea came from Roger Ferrer Ibanez.
Other ideas:
We could be overly conservative and just pretend all instructions with
frm field read the FRM register. Or we could have pseudoinstructions
for CodeGen with rounding mode.
Reviewed By: asb, frasercrmck, arcbbb
Differential Revision: https://reviews.llvm.org/D115555
In 2015-05, GCC added the configure option `--enable-default-pie`. When enabled,
* in the absence of -fno-pic/-fpie/-fpic (and their upper-case variants), -fPIE is the default.
* in the absence of -no-pie/-pie/-shared/-static/-static-pie, -pie is the default.
This has been adopted by all(?) major distros.
I think default PIE is the majority in the Linux world, but
--disable-default-pie users is not that uncommon because GCC upstream hasn't
switched the default yet (https://gcc.gnu.org/PR103398).
This patch add CLANG_DEFAULT_PIE_ON_LINUX which allows distros to use default PIE.
The option is justified as its adoption can be very high among Linux distros
to make Clang default match GCC, and is likely a future-new-default, at which
point we will remove CLANG_DEFAULT_PIE_ON_LINUX.
The lit feature `default-pie-on-linux` can be handy to exclude default PIE sensitive tests.
Reviewed By: foutrelis, sylvestre.ledru, thesamesam
Differential Revision: https://reviews.llvm.org/D113372
When building LLVM static libraries, we should not make symbols more
visible than CMAKE_CXX_VISIBILITY_PRESET, since the goal may be to have
a purely hidden llvm embedded in another library. Instead, we only
define LLVM_EXTERNAL_VISIBILITY for the dynamic library build (when
LLVM_BUILD_LLVM_DYLIB=YES).
Differential Revision: https://reviews.llvm.org/D113610
needsPltAddr is equivalent to `needsCopy && isFunc`. In many places, it is
equivalent to `needsCopy` because the non-STT_FUNC cases are ruled out.
Reviewed By: ikudrin, peter.smith
Differential Revision: https://reviews.llvm.org/D115603
In debug info, we expect variables to have location info if they are used and we don't want location info for functions that are not used. However, if an unused function is inlined, we could have the scenario where a function is not in the final binary but its static variable is in the final binary. Ensure that variables in the final binary have location debug info even if their scope was inlined.
Also add `--implicit-check-not` to a test for clarity.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D115565
We already do this for splat nodes that carry a VL, but not for
splats that use VLMAX.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D115483
Introduce a FreeBSDKernel plugin that provides the ability to read
FreeBSD kernel core dumps. The plugin utilizes libfbsdvmcore to provide
support for both "full memory dump" and minidump formats across variety
of architectures supported by FreeBSD. It provides the ability to read
kernel memory, as well as the crashed thread status with registers
on arm64, i386 and x86_64.
Differential Revision: https://reviews.llvm.org/D114911