This feature requires support of __opencl_c_images, so diagnostics for that is provided as well.
Also, ensure that cl_khr_3d_image_writes feature macro is set to the same value.
Reviewed By: Anastasia
Differential Revision: https://reviews.llvm.org/D106260
nm does not show size for aliased symbols,
we should still extract them if they are external.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D107112
https://reviews.llvm.org/D45592 added a nice feature to be able to specify a breakpoint by a relative path. E.g. passing foo.cpp or bar/foo.cpp or zaz/bar/foo.cpp is fine. However, https://reviews.llvm.org/D68671 by mistake disabled the test that ensured this functionality works. With time, someone made a small mistake and fully broke the functionality.
So, I'm making a very simple fix and the test passes.
Differential Revision: https://reviews.llvm.org/D107126
This patch makes it possible to list a td_library as a rule's data
attribute and get its source files and all its transitive dependencies
at runtime. This is useful for, e.g. shell tests running tblgen.
Note that this is a bit different from how a "normal" (e.g. C++) library
rule would work because those have actual library outputs and the
td_library rule just bundles some source files and includes. If someone
wanted to make use of the includes, they would have to access the TdInfo
provider, but this keeps simple things simple.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D106922
These are unconditionally included in the CMake build as well and
necessary for some odd platforms (even though the C++11 standard says
they shouldn't be).
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D107123
This makes the logic used to determine if targets have the given
features the same as is used in CMake. Incidentally, it enables these
features for the targets added in https://reviews.llvm.org/D106921
which were missing because this was previously a hardcoded list.
Reviewed By: chandlerc
Differential Revision: https://reviews.llvm.org/D107019
Add disassembler support for the NORM and NORMH instructions. These instructions
only exist when the ARC processor is configured with the "norm" extension.
fferential Revision: https://reviews.llvm.org/D107118
I moved the code that tries to combine away each unmerge def into a method in
ArtifactValueFinder class itself. This removes a logically messy lambda and
makes it easier to use the value-finder in more places in future.
Android has its own CMAKE_SYSTEM_NAME, but the OS is Linux (Android
target triples look like aarch64-none-linux-android21). The driver will
therefore search for compiler-rt libraries in the "linux" directory and
not the "android" directory, so the default placement of Android
compiler-rt libraries was incorrect. You could fix it by specifying
COMPILER_RT_OS_DIR manually, but it also makes sense to fix the default,
to save others from having to discover and fix the issue for themselves.
This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following:
- openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared.
- openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode.
- openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall.
- openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint.
Reviewed By: jhuber6
Differential Revision: https://reviews.llvm.org/D106802
This patch prevents GlobalISel from optimizing out redundant branch
instructions when compiling without optimizations.
The motivating example is code like the following common pattern in
Swift, where users expect to be able to set a breakpoint on the early
exit:
public func f(b: Bool) {
guard b else {
return // I would like to set a breakpoint here.
}
...
}
The patch modifies two places in GlobalISEL: The first one is in
IRTranslator.cpp where the removal of redundant branches is made
conditional on the optimization level. The second one is in
AArch64InstructionSelector.cpp where an -O0 *only* optimization is
being removed.
Disabling these optimizations increases code size at -O0 by
~8%. However, doing so improves debuggability, and debug builds are
the primary reason why developers compile without optimizations. We
thus concluded that this is the right trade-off.
rdar://79515454
This tenatively reapplies the patch without modifications, the LLDB
test that has blocked this from landing previously has since been
modified to hopefully no longer be sensitive to this change.
Differential Revision: https://reviews.llvm.org/D105238
When we vectorize a scalar constant, the vector constant is inserted before its
first user if the scalar constant is defined outside the loops to be vectorized.
It is possible that the vector constant does not dominate all its users. To fix
the problem, we find the innermost vectorized loop that encloses that first user
and insert the vector constant at the top of the loop body.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D106609
This commit contains two mildly separate concepts.
First, sending out reviews for things like this is a bit of a
complicated endeavor, since the reviewer list is relatively long, and I
generally rely on prior CLs in this area to find an authoritative list.
Life's quite a bit easier if phab usernames are readily available on the
doc. So part 1 is making those available.
Second, it seems to me that, at the moment, Phabricator makes the most
sense for membership changes (incl. security group nominations). My
reasoning for this is detailed in the diff, and to some extent in
comment #1 of this bug
<https://bugs.chromium.org/p/llvm/issues/detail?id=12#c1>. This change
adds prose to recommend the use of Phabricator for nominations as a
result.
Differential Revision: https://reviews.llvm.org/D106917
Make broadcastable needs the output shape to determine whether the operation
includes additional broadcasting. Include some canonicalizations for TOSA
to remove unneeded reshape.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D106846
accidentally used int64 when they should have been int32. This lead to
a Windows build unit test error (Linux did not catch the problem).
Differential Revision: https://reviews.llvm.org/D107107
Same as 91bd3ad128, this doesn't really
change anything but gives the registers better names than the ones
tablegen would define. And fills in the missing gaps.
Adds magic version header to AllocatorState. This can be used by
out-of-process crash handlers, like Crashpad on Fuchsia, to do offline
reconstruction of GWP-ASan crash metadata.
Crashpad on Fuchsia is intending on dumping the AllocationMetadata pool
and the AllocatorState directly into the minidump. Then, using the
version number, they can unpack the data on serverside using a versioned
unpack tool.
Also add some asserts to make sure the version number gets bumped if the
internal structs get changed.
Reviewed By: eugenis, mcgrathr
Differential Revision: https://reviews.llvm.org/D106690
`PadTensorOp` has verification logic to make sure
result dim must be static if all the padding values are static.
Cast folding might add more static information for the src operand
of `PadTensorOp` which might change a valid operation to be invalid.
Change the canonicalizing pattern to fix this.
This option is a subset of -Bsymbolic-functions. It applies to STB_GLOBAL
STT_FUNC definitions.
The address of a vague linkage function (STB_WEAK STT_FUNC, e.g. an inline
function, a template instantiation) seen by a -Bsymbolic-functions linked
shared object may be different from the address seen from outside the shared
object. Such cases are uncommon. (ELF/Mach-O programs may use
`-fvisibility-inlines-hidden` to break such pointer equality. On Windows,
correct dllexport and dllimport are needed to make pointer equality work.
Windows link.exe enables /OPT:ICF by default so different inline functions may
have the same address.)
```
// a.cc -> a.o -> a.so (-Bsymbolic-functions)
inline void f() {}
void *g() { return (void *)&f; }
// b.cc -> b.o -> exe
// The address is different!
inline void f() {}
```
-Bsymbolic-non-weak-functions is a safer (C++ conforming) subset of
-Bsymbolic-functions, which can make such programs work.
Implementations usually emit a vague linkage definition in a COMDAT group. We
could detect the group (with more code) but I feel that we should just check
STB_WEAK for simplicity. A weak definition will thus serve as an escape hatch
for rare cases when users want interposition on definitions.
GNU ld feature request: https://sourceware.org/bugzilla/show_bug.cgi?id=27871
Longer write-up: https://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic
If Linux distributions migrate to protected non-vague-linkage external linkage
functions by default, the linker option can still be handy because it allows
rapid experiment without recompilation. Protected function addresses currently
have deep issues in GNU ld.
Reviewed By: peter.smith
Differential Revision: https://reviews.llvm.org/D102570
Rather than throwing an error. This way we can still use files like
hwasan_dynamic_shadow.cpp for other platforms without leading to a
preprocessor error.
Differential Revision: https://reviews.llvm.org/D106979
The only remaining plugin dependency in Mangled is CPlusPlusLanguage which it
uses to extract information from C++ mangled names. The static function
GetDemangledNameWithoutArguments is written specifically for C++, so it
would make sense for this specific functionality to live in a
C++-related plugin. In order to keep this functionality in Mangled
without maintaining this dependency, I added
`Language::GetDemangledFunctionNameWithoutArguments`.
Differential Revision: https://reviews.llvm.org/D105215
This patch adds an environment variable field. This is usually used as
the basic type of a List field. This is needed to create the process
launch form.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D106999
This patch adds a Create Target form for the LLDB GUI. Additionally, an
Arch Field was introduced to input an arch and the file and directory
fields now have a required property.
Reviewed By: clayborg
Differential Revision: https://reviews.llvm.org/D106192
The way this test generates object file results in relocation sections for .dwo sections. This is not legal. Re-wrote it to avoid those relocation sections.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D107012
result descriptor (e.g., maxloc, minloc, maxval, minval, all, any, count,
parity, findloc, etc.)
Also add a scalar case for these intrinsic unit tests.
Differential Revision: https://reviews.llvm.org/D106820
Patch by Mohammad Fawaz
This issues started happening after
b373b5990d
Basically, if the memcpy is volatile, the collectUsers() function should
return false, just like we do for volatile loads.
Differential Revision: https://reviews.llvm.org/D106950
Two-level distributed barrier is a new experimental barrier designed
for Intel hardware that has better performance in some cases than the
default hyper barrier.
This barrier is designed to handle fine granularity parallelism where
barriers are used frequently with little compute and memory access
between barriers. There is no need to use it for codes with few
barriers and large granularity compute, or memory intensive
applications, as little difference will be seen between this barrier
and the default hyper barrier. This barrier is designed to work
optimally with a fixed number of threads, and has a significant setup
time, so should NOT be used in situations where the number of threads
in a team is varied frequently.
The two-level distributed barrier is off by default -- hyper barrier
is used by default. To use this barrier, you must set all barrier
patterns to use this type, because it will not work with other barrier
patterns. Thus, to turn it on, the following settings are required:
KMP_FORKJOIN_BARRIER_PATTERN=dist,dist
KMP_PLAIN_BARRIER_PATTERN=dist,dist
KMP_REDUCTION_BARRIER_PATTERN=dist,dist
Branching factors (set with KMP_FORKJOIN_BARRIER, KMP_PLAIN_BARRIER,
and KMP_REDUCTION_BARRIER) are ignored by the two-level distributed
barrier.
Patch fixed for ITTNotify disabled builds and non-x86 builds
Co-authored-by: Jonathan Peyton <jonathan.l.peyton@intel.com>
Co-authored-by: Vladislav Vinogradov <vlad.vinogradov@intel.com>
Differential Revision: https://reviews.llvm.org/D103121
D106850 introduced a simplification for llvm.vscale by looking at the
surrounding function's vscale_range attributes. The call that's being
simplified may not yet have been inserted into the IR. This happens for
example during function cloning.
This patch fixes the issue by checking if the instruction is in a
parent basic block.
* Adds source targets (not included in the full set that downstreams use by default) to bundle mlir-c/ headers into the mlir/_mlir_libs/include directory.
* Adds a minimal entry point to get include and library directories.
* Used by npcomp to export a full CAPI (which is then used by the Torch extension to link npcomp).
Reviewed By: mikeurbach
Differential Revision: https://reviews.llvm.org/D107090