Xcode 14 no longer puts the Rosetta expanded shared cache in a directory
named "16.0". Instead, it includes the real version number (e.g. 13.0),
the build string and the architecture, similar to the device support
directory names for iOS, tvOS and watchOS.
Currently, when there are multiple directories, we might end up picking
the wrong one in GetSDKDirectoryForCurrentOSVersion. The problem is that
without the build string we have no way to differentiate between
multiple directories with the same version number. This patch fixes the
problem by using GetOSBuildString which, as the name implies, returns
the build string if known.
This also adds a test for Rosetta debugging on Apple Silicon. Depending
on whether the Rosetta expanded shared cache is present, the test
ensures that there is or isn't a diagnostic about reading out of memory.
rdar://97576121
Differential revision: https://reviews.llvm.org/D130540
I noticed that the test TestSetWatchpoint.py was failing every so often
on macOS. The failure was in the last assert, that after destroying the
SBTarget containing it, the SBWatchpoint was still saying it was valid.
IsValid in this case just meant the watchpoint weak pointer could be turned
into a shared pointer. The watchpoint shared pointers have two strong references
in general, one to the "Target::m_last_created_watchpoint", and one in the
Target::m_watchpoint_list. Target::Destroy reset the last created watchpoint
but neglected to call RemoveAll on the watchpoint list (it does the analogous
work for the internal & external breakpoint lists...) This patch does the
equivalent cleanup for the watchpoint list.
DR2338 clarified that it was undefined behavior to set the value outside the
range of the enumerations values for an enum without a fixed underlying type.
We should diagnose this with a constant expression context.
Differential Revision: https://reviews.llvm.org/D130058
In this patch we replace common code patterns with the use of utility
functions for dealing with profiling metadata. There should be no change
in functionality, as the existing checks should be preserved in all
cases.
Reviewed By: bogner, davidxl
Differential Revision: https://reviews.llvm.org/D128860
Currently, there is significant code duplication for dealing with
MD_prof metadata throughout the compiler. These utility functions can
improve code reuse and simplify boilerplate code when dealing with
profiling metadata, such as branch weights. The inent is to provide a
uniform set of APIs that allow common tasks, such as identifying
specific types of MD_prof metadata and extracting branch weights.
Future patches can build on this initial implementation and clean up the
different implementations across the compiler.
Reviewed By: bogner
Differential Revision: https://reviews.llvm.org/D128858
Currently, the IR to MIR translator can only handle two kinds of constant
inputs to dbg.values intrinsics: constant integers and constant floats. In
particular, it cannot handle pointers created from IntToPtr ConstantExpression
objects.
This patch addresses the limitation above by replacing the IntToPtr with
its input integer prior to converting the dbg.value input.
Patch by Felipe Piovezan!
Differential Revision: https://reviews.llvm.org/D130642
Summary:
Linkers use `--verbose` to let users investigate search libraries among
other things. The linker wrapper was incorrectly not forwarding this to
the linker job. This patch simply renames this so users can still see
verbose messages from the linker if it was passed.
This patch adds support for AsmPrinter `-mmlir` options to the Flang driver.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D130598
Since the calendar is added in C++20 the existing operators are removed.
Implements part of:
- P1614R2 The Mothership has Landed
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D129887
This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled) for RISCV when vector instructions are available for the target configuration.
At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible.
This exposes a large amount of code which hasn't otherwise been on by default, and thus may not have been fully exercised. Given that, having issues fall out is not unexpected. If you find issues, please make sure to include as much information as you can when reverting this change.
Differential Revision: https://reviews.llvm.org/D129013
Lambdas with trailing return type 'auto' are annotated incorrectly. It causes a misformatting. The simpliest code to reproduce is:
```
auto list = {[]() -> auto { return 0; }};
```
Fixes https://github.com/llvm/llvm-project/issues/54798
Reviewed By: HazardyKnusperkeks, owenpan, curdeius
Differential Revision: https://reviews.llvm.org/D130299
Lifting the core functionalities of the clang-offload-bundler into a
user-facing library/API. This will allow online and JIT compilers to
bundle and unbundle files without spawning a new process.
This patch lifts the classes and functions used to implement
the clang-offload-bundler into a separate OffloadBundler.cpp,
and defines three top-level API functions in OfflaodBundler.h.
BundleFiles()
UnbundleFiles()
UnbundleArchives()
This patch also introduces a Config class that locally stores the
previously global cl::opt options and arrays to allow users to call
the APIs in a multi-threaded context, and introduces an
OffloadBundler class to encapsulate the top-level API functions.
We also lift the BundlerExecutable variable, which is specific
to the clang-offload-bundler tool, from the API, and replace
its use with an ObjcopyPath variable. This variable must be set
in order to internally call llvm-objcopy.
Finally, we move the API files from
clang/tools/clang-offload-bundler into clang/lib/Driver and
clang/include/clang/Driver.
Differential Revision: https://reviews.llvm.org/D129873
The instruction is used to modify wave priority with the intent
to affect VALU execution and currently we can reschedule VALU
around it since that VALU does not have side effects.
Differential Revision: https://reviews.llvm.org/D130654
This patch adds canonicalization conditions for omp.atomic.update thus
eliminating it when it becomes just a write or a no-op due to other
changes during canonicalization.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D126531
Teach libDebugInfo (llvm-dwarfdump) and lldb about DWARF tags and
attributes for pointer authentication. These values have been emitted by
Apple clang for several releases. Although upstream LLVM doesn't emit
these values yet, we hope to upstream that part sometime soon.
Differential revision: https://reviews.llvm.org/D130215
The problem here is target independent, but particularly painful on RISCV. If we chose to vectorize such that vscale x 2 x i32 is our widest type and fits in a register, a naive expansion of i64 comparisons results in comparisons and index types at <scalabe x 2 x i64>. This requires both an LMUL of 2, and a VSETVLI toggle in the loop. Note that we could have used <vscale x 2 x i32> for the compairons legally given the range of the trip count.
Use a delegating constructor to remove the last use of the deprecated
ctor of `TypeErasedDataflowAnalysis`, and then delete it.
Differential Revision: https://reviews.llvm.org/D130653
Without this, the intrinsic will be expanded to an integer; thereby an
explicit copy (from GPR to SIMD register) will be codegen'd. This matches the
general convention of using "v1" types to represent scalar integer operations in
vector registers.
The similar approach is observed in D56616, and the pattern likely applies on
other intrinsic that accepts integer scalars (e.g.,
int_aarch64_neon_sqdmulls_scalar)
Differential Revision: https://reviews.llvm.org/D130548
This adds similar heuristics to G_GLOBAL_VALUE, querying the cost of
materializing a specific constant in code size. Doing so prevents us from
sinking constants which require multiple instructions to generate into
use blocks.
Code size savings on CTMark -Os:
Program size.__text
before after diff
ClamAV/clamscan 381940.00 382052.00 0.0%
lencod/lencod 428408.00 428428.00 0.0%
SPASS/SPASS 411868.00 411876.00 0.0%
kimwitu++/kc 449944.00 449944.00 0.0%
Bullet/bullet 463588.00 463556.00 -0.0%
sqlite3/sqlite3 284696.00 284668.00 -0.0%
consumer-typeset/consumer-typeset 414492.00 414424.00 -0.0%
7zip/7zip-benchmark 595244.00 594972.00 -0.0%
mafft/pairlocalalign 247512.00 247368.00 -0.1%
tramp3d-v4/tramp3d-v4 372884.00 372044.00 -0.2%
Geomean difference -0.0%
Differential Revision: https://reviews.llvm.org/D130554
I am playing with the LoopDataPrefetch pass and found out that it
bails to work with a pointer in a non-zero address space. This
patch adds the target callback to check if an address space is to
be considered for prefetching. Default implementation still only
allows address space 0, so this is NFCI.
This does not currently affect any known targets, but seems to be
generally useful for the future.
Differential Revision: https://reviews.llvm.org/D129795
D98179 added a mechanism to sort tests by test time to run slow tests
early, increasing potential parallelism. It also added a feature where
negative tests would be marked as negative, allowing subsequent test
runs to run them earlier. Unfortunately it never actually stored the
negative time, even if all the other code seemed to be inplace to sort
them early. Luckily the fix seems simple.
Differential Revision: https://reviews.llvm.org/D130570
I don't have any evidence these particular uses are actually causing any
issues, but we should avoid accidentally truncating immediate values
depending on the host.
We can use slli.uw by C followed by sh1add. Similar can be done
for multiples of 5 and 9. We need to make sure that C is less than
32 to stay in bounds of the 5-bit immediate for slli.uw.
We have existing patterns for (mul X, 3<<C) that use sh1add
followed by slli. That order doesn't allow the and to be folded.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130146
A mul by a negated power of 2 is a slli followed by neg. This doesn't
require any constant materialization and may be lower latency than mul.
The neg may also be foldable into other arithmetic.
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D130047
We can't guarantee the long always 64 bits like WINDOWS or LLP64 data
model (rare but we should consider).
So use int64_t from inttypes.h and safe in this case.
Fixes https://github.com/llvm/llvm-project/issues/55911 .
Change `sinf` range reduction to mod pi/16 to be shared with `cosf`.
Previously, `sinf` used range reduction `mod pi`, but this cannot be used to implement `cosf` since the minimax algorithm for `cosf` does not converge due to critical points at `pi/2`. In order to be able to share the same range reduction functions for both `sinf` and `cosf`, we change the range reduction to `mod pi/16` for the following reasons:
- The table size is sufficiently small: 32 entries for `sin(k * pi/16)` with `k = 0..31`. It could be reduced to 16 entries if we treat the final sign separately, with an extra multiplication at the end.
- The polynomials' degrees are reduced to 7/8 from 15, with extra computations to combine `sin` and `cos` with trig sum equality.
- The number of exceptional cases reduced to 2 (with FMA) and 3 (without FMA).
- The latency is reduced while maintaining similar throughput as before.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D130629
After evaluating the new style I noticed inner namespaces are now
indented. I am not fond of that style and I've seen some other review
comment in this regard so I propose we remove this option and use the
LLVM default not to indent it.
Reviewed By: ldionne, philnik, var-const, #libc
Differential Revision: https://reviews.llvm.org/D129441
All of our other tests are functionality tests constrained to some
specific configuration. This one is intended to float with the
default configuration so that changes in that default are visible
in reviews. Note that our current default does not enable
vectorization at all; thus the current output is unvectorized.
Sparse compiler failed on the provided test (when the sparse kernel is nested in a scf structrual operator).
Reviewed By: bixia
Differential Revision: https://reviews.llvm.org/D130609
The options -f{no-}color-diagnostics have been supported in driver. This
supports the behaviors in scanning, parsing, and semantics, and the
behaviors are exactly the same as the driver.
To illustrate the added behaviour, consider the following input file:
```! file.f90
program m
integer :: i = k
end
```
In the following invocations, "error: Must be a constant value" _will be_
formatted:
```
$ flang-new file.f90
error: Semantic errors in file.f90
./file.f90:2:18: error: Must be a constant value
integer :: i = k
```
Note that "error: Semantic errors in file.f90" is also formatted, which
is supported in https://reviews.llvm.org/D126164.
Also note that only "error", "warning" and "portability" are formatted.
Check the following input file:
```! file2.f90
program m
integer :: i =
end
```
```
$ flang-new test2.f90
error: Could not parse test2.f90
./test2.f90:2:11: error: expected '('
integer :: i =
^
./test2.f90:2:3: in the context: statement function definition
integer :: i =
^
...
```
The "error: Could not parse test2.f90" and "error: expected '('" are
formatted. Others such as "in the context" are not formatted yet, which
may or may not be supported.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D126166