This makes the default build closer to a -DLLVM_ENABLE_RUNTIMES=all build.
The layout is arguably superior because different libraries of target triples
are in different directories, similar to GCC/Debian multiarch.
When LLVM_DEFAULT_TARGET_TRIPLE is x86_64-unknown-linux-gnu,
`lib/clang/14.0.0/lib/libclang_rt.asan-x86_64.a`
is moved to
`lib/clang/14.0.0/lib/x86_64-unknown-linux-gnu/libclang_rt.asan.a`.
In addition, if the host compiler supports -m32 (multilib),
`lib/clang/14.0.0/lib/libclang_rt.asan-i386.a`
is moved to
`lib/clang/14.0.0/lib/i386-unknown-linux-gnu/libclang_rt.asan.a`.
Clang has been detecting both paths for lib/Driver/ToolChains/Gnu.cpp since 2018 (D50547).
---
Note: Darwin needs to be disabled. The hierarchy needs to be sorted out.
The current -DLLVM_DEFAULT_TARGET_TRIPLE=off state is like:
```
lib/clang/14.0.0/lib/darwin/libclang_rt.profile_ios.a
lib/clang/14.0.0/lib/darwin/libclang_rt.profile_iossim.a
lib/clang/14.0.0/lib/darwin/libclang_rt.profile_osx.a
```
Windows needs to be disabled: https://reviews.llvm.org/D107799?id=368557#2963311
Differential Revision: https://reviews.llvm.org/D107799
9ee64c3746 has started using
COMPILER_RT_HAS_OMIT_FRAME_POINTER_FLAG inside scudo. However,
the relevant CMake check was performed in builtin-config-ix.cmake,
so the definition was missing when builtins were not built. Move
the check to config-ix.cmake, so that it runs unconditionally of
the components being built.
Fixes PR#51847
Differential Revision: https://reviews.llvm.org/D109812
On x86_64-unknown-linux-gnu, `-m32` tests set LD_LIBRARY_PATH to
`config.compiler_rt_libdir` (`$build/lib/clang/14.0.0/lib/x86_64-unknown-linux-gnu`)
instead of i386-unknown-linux-gnu, so `-shared-libsan` executables
cannot find their runtime (e.g. `TestCases/replaceable_new_delete.cpp`).
Detect -m32 and -m64 in config.target_cflags, and adjust `config.compiler_rt_libdir`.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D108859
Some setjmp calls within libc cannot be intercepted while their matching
longjmp calls can be. This causes problems if our setjmp/longjmp
interceptors don't use the exact same format as libc for populating and
reading the jmp_buf.
We add a magic field to our jmp_buf and populate it in setjmp. This
allows our longjmp interceptor to notice when a libc jmp_buf is passed
to it.
See discussion on https://reviews.llvm.org/D109699 and
https://reviews.llvm.org/D69045.
Fixes https://github.com/google/sanitizers/issues/1244.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D109787
This change adds the system libc++ header location to the driver. As well we define
the `__LIBC_NO_CPP_MATH_OVERLOADS__` macro when using those headers, in order to suppress
conflicting C++ overloads in the system libc headers that were used by XL C++.
Reviewed By: ZarkoCA
Differential Revision: https://reviews.llvm.org/D109078
\x{XXXX} \u{XXXX} and \o{OOOO} are accepted in all languages mode
in characters and string literals.
This is a feature proposed for both C++ (P2290R1) and C (N2785). The
papers have been seen by both committees but are not yet adopted into
either standard. However, they do have support from both committees.
Based off the worse case numbers generated by D103695, the AVX2/512 bit reversing/counting costs were higher than necessary (based off instruction counts instead of actual throughput).
Summary: Making the late transformations opt-in results in less surprising behavior when composing multiple calls to the codegen strategy.
Reviewers:
Subscribers:
Differential revision: https://reviews.llvm.org/D109820
AliasInfo can now use union-find for a much more efficient implementation.
This brings no functional changes but large performance gains on more complex examples.
Differential Revision: https://reviews.llvm.org/D109819
Under some situations under Thumb1, we could be stuck in an infinite
loop recombining the same instruction. This puts a limit on that, not
combining SUBC with SUBE repeatedly.
This extends the reduction logic in the vectorizer to handle intrinsic
versions of min and max, both the floating point variants already
created by instcombine under fastmath and the integer variants from
D98152.
As a bonus this allows us to match a chain of min or max operations into
a single reduction, similar to how add/mul/etc work.
Differential Revision: https://reviews.llvm.org/D109645
When searching for hidden identity shuffles (added at rG41146bfe82aecc79961c3de898cda02998172e4b), only peek through bitcasts to the source operand if it is a vector type as well.
Adds support for a feature macro `__opencl_c_images` in C++ for
OpenCL 2021 enabling a respective optional core feature from
OpenCL 3.0.
This change aims to achieve compatibility between C++ for OpenCL
2021 and OpenCL 3.0.
Differential Revision: https://reviews.llvm.org/D109002
This is a first step towards addressing the last remaining limitation of
the VPlan version of sinkScalarOperands: the legacy version can
partially sink operands. For example, if a GEP has uniform users outside
the sink target block, then the legacy version will sink all scalar
GEPs, other than the one for lane 0.
This patch works towards addressing this case in the VPlan version by
detecting such cases and duplicating the sink candidate. All users
outside of the sink target will be updated to use the uniform clone.
Note that this highlights an issue with VPValue naming. If we duplicate
a replicate recipe, they will share the same underlying IR value and
both VPValues will have the same name ir<%gep>.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D104254
Create a new document that explain both stages of the process in a single
place, merge and deduplicate the content from the two previous documents. Also
extend the documentation to account for the recent changes in pass structure
due to standard dialect splitting and translation being more flexible.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D109605
Added '-print-pipeline-passes' printing of parameters for those passes
declared with *_WITH_PARAMS macro in PassRegistry.def.
Note that it only prints the parameters declared inside *_WITH_PARAMS as
in a few cases there appear to be additional parameters not parsable.
The following passes are now covered (i.e. all of those with *_WITH_PARAMS in
PassRegistry.def).
LoopExtractorPass - loop-extract
HWAddressSanitizerPass - hwsan
EarlyCSEPass - early-cse
EntryExitInstrumenterPass - ee-instrument
LowerMatrixIntrinsicsPass - lower-matrix-intrinsics
LoopUnrollPass - loop-unroll
AddressSanitizerPass - asan
MemorySanitizerPass - msan
SimplifyCFGPass - simplifycfg
LoopVectorizePass - loop-vectorize
MergedLoadStoreMotionPass - mldst-motion
GVN - gvn
StackLifetimePrinterPass - print<stack-lifetime>
SimpleLoopUnswitchPass - simple-loop-unswitch
Differential Revision: https://reviews.llvm.org/D109310
This check should ensure we don't reproduce the problem fixed by
02df443d28
More accurately, it checks every llvm::Any::TypeId symbol in libLLVM-x.so and
make sure they have weak linkage and are not local to the library, which would
lead to duplicate definition if another weak version of the symbol is defined in
another linked library.
Differential Revision: https://reviews.llvm.org/D109252
Invalid frame addresses exist in call stack samples due to bad unwinding. This could happen to frame-pointer-based unwinding and the callee functions that do not have the frame pointer chain set up. It isn't common when the program is built with the frame pointer omission disabled, but can still happen with third-party static libs built with frame pointer omitted.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D109638
This reverts commit 81f8ad1769.
This seems to break the shared libs build
(linaro-flang-aarch64-sharedlibs bot) with:
undefined reference to `Fortran::semantics::IsCoarray(Fortran::semantics::Symbol const&)
(from tools/flang/lib/Evaluate/CMakeFiles/obj.FortranEvaluate.dir/tools.cpp.o)
When linking lib/libFortranEvaluate.so.14git
This seems in-line with the intent and how we build tools around it.
Update the description for the flag accordingly.
Also use an injected thread pool in MLIROptMain, now we will create
threads up-front and reuse them across split buffers.
Differential Revision: https://reviews.llvm.org/D109802
Both copy/alloc ops are using memref dialect after this change.
Reviewed By: silvas, mehdi_amini
Differential Revision: https://reviews.llvm.org/D109480
The fmul is a canonicalizing operation, and fneg is not so this would
break denormals that need flushing and also would not quiet signaling
nans. Fold to fsub instead, which is also canonicalizing.
Pseudo probe instrumentation was missing from O0 build. It is needed in cases where some source files are built in O0 while the others are built in optimize mode.
Reviewed By: wenlei, wlei, wmi
Differential Revision: https://reviews.llvm.org/D109531
Rather than depending on the hex dump from obj2yaml. Now the test shows the
expected function body in a human readable format.
Differential Revision: https://reviews.llvm.org/D109730
This simple heuristic uses the estimated live range length combined
with the number of registers in the class to switch which heuristic to
use. This was taking the raw number of registers in the class, even
though not all of them may be available. AMDGPU heavily relies on
dynamically reserved numbers of registers based on user attributes to
satisfy occupancy constraints, so the raw number is highly misleading.
There are still a few problems here. In the original testcase that
made me notice this, the live range size is incorrect after the
scheduler rearranges instructions, since the instructions don't have
the original InstrDist offsets. Additionally, I think it would be more
appropriate to use the number of disjointly allocatable registers in
the class. For the AMDGPU register tuples, there are a large number of
registers in each tuple class, but only a small fraction can actually
be allocated at the same time since they all overlap with each
other. It seems we do not have a query that corresponds to the number
of independently allocatable registers. Relatedly, I'm still debugging
some allocation failures where overlapping tuples seem to not be
handled correctly.
The test changes are mostly noise. There are a handful of x86 tests
that look like regressions with an additional spill, and a handful
that now avoid a spill. The worst looking regression is likely
test/Thumb2/mve-vld4.ll which introduces a few additional
spills. test/CodeGen/AMDGPU/soft-clause-exceeds-register-budget.ll
shows a massive improvement by completely eliminating a large number
of spills inside a loop.