After landing D121813 the binary size increase introduced by this change can be minimized by using --gc-sections link options. D121813 allows each individual callbacks to be optimized out if not used.
Reviewed By: vitalybuka, MaskRay
Differential Revision: https://reviews.llvm.org/D122407
Fix clang crash and add bitfield support in __builtin_dump_struct.
In clang13.0.x, a struct with three or more members and a bitfield at
the same time will cause a crash. In clang15.x, as long as the struct
has one bitfield, it will cause a crash in clang.
Open issue: https://github.com/llvm/llvm-project/issues/54462
Differential Revision: https://reviews.llvm.org/D122248
CUDA/HIP determines whether a function can be called based on
the device/host attributes of callee and caller. Clang assumes the
caller is CurContext. This is correct in most cases, however, it is
not correct in OpenMP parallel region when CUDA/HIP program
is compiled with -fopenmp. This causes incorrect overloading
resolution and missed diagnostics.
To get the correct caller, clang needs to chase the parent chain
of DeclContext starting from CurContext until a function decl
or a lambda decl is reached. Sema API is adapted to achieve that
and used to determine the caller in hostness check.
Reviewed by: Artem Belevich, Richard Smith
Differential Revision: https://reviews.llvm.org/D121765
We can not bitcast pointers across different address spaces. This was
previously fixed in D89577 but then in D93229 an enhancement was added
which peeks further through the ponter operand, opening up the
possibility that address-space violations could be introduced.
Instead of bailing as the previous fix did, simply insert an
addrspacecast cast instruction.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D121787
This information isn't preserved in the DWARF description of function
types (though probably should be - it's preserved on the function
declarations/definitions themselves through the DW_AT_noreturn attribute
- but we should move or also include that in the subroutine type itself
too - but for now, with it not being there, the DWARF is lossy and
can't be reconstructed)
Most intrinsics, especially "default" ones, will not call back into the
IR module. `nocallback` encodes this nicely. As it was not used before,
this patch also makes use of `nocallback` in the Attributor which
results in many more `norecurse` deductions.
Tablegen part is mechanical, test updates by script.
Differential Revision: https://reviews.llvm.org/D118680
--build-id was introduced as "approximation of true uniqueness across all
binaries that might be used by overlapping sets of people". It does not require
the some resistance mentioned below. In practice, people just use --build-id=md5
for 16-byte build ID and --build-id=sha1 for 20-byte build ID.
BLAKE3 has 256-bit key length, which provides 128-bit security against
(second-)preimage, collision, and differentiability attacks. Its portable
implementation is fast. It additionally provides Arm Neon/AVX2/AVX-512. Just
implement --build-id={md5,sha1} with truncated BLAKE3.
Linking clang 14 RelWithDebInfo with --threads=8 on a Skylake CPU:
* 1.13x as fast with --build-id=md5
* 1.15x as fast with --build-id=sha1
--threads=4 on Apple m1:
* 1.25x as fast with --build-id=md5
* 1.17x as fast with --build-id=sha1
Reviewed By: ikudrin
Differential Revision: https://reviews.llvm.org/D121531
Using a BumpPtrAllocator introduced memory leaks for APIRecords as they
contain a std::vector. This meant that we needed to always keep a
reference to the records in APISet and arrange for their destructor to
get called appropriately. This was further complicated by the need for
records to own sub-records as these subrecords would still need to be
allocated via the BumpPtrAllocator and the owning record would now need
to arrange for the destructor of its subrecords to be called
appropriately.
Since APIRecords contain a std::vector so whenever elements get added to
that there is an associated heap allocation regardless. Since
performance isn't currently our main priority it makes sense to use
regular unique_ptr to keep track of APIRecords, this way we don't need
to arrange for destructors to get called.
The BumpPtrAllocator is still used for strings such as USRs so that we
can easily de-duplicate them as necessary.
Differential Revision: https://reviews.llvm.org/D122331
Changes from original BLAKE3 sources:
* `blake.h`:
* Changes to avoid conflicts if a client also links with its own BLAKE3 version:
* Renamed the header macro guard with `LLVM_C_` prefix
* Renamed the C symbols to add the `llvm_` prefix
* Added a top header comment that references the CC0 license and points to the `LICENSE` file in the repo.
* `blake3_impl.h`: Added `#define`s to remove some of `llvm_` prefixes for the rest of the internal implementation.
* Implementation files:
* Added a top header comment for `blake.c`
* Used `llvm_` prefix for the C public API functions
* Used `LLVM_LIBRARY_VISIBILITY` for internal implementation functions
* Added `.private_extern`/`.hidden` in assembly files to reduce visibility of the internal implementation functions
* `README.md`:
* added a note about where the sources originated from
* Used the C++ BLAKE3 class and `llvm_` prefixed C API in place of examples and API documentation.
* Removed instructions about how to build the files.
BLAKE3 is a cryptographic hash function that is secure and very performant.
The C implementation originates from https://github.com/BLAKE3-team/BLAKE3/tree/1.3.1/c
License is at https://github.com/BLAKE3-team/BLAKE3/blob/1.3.1/LICENSE
This patch adds:
* `llvm/include/llvm-c/blake3.h`: The BLAKE3 C API
* `llvm/include/llvm/Support/BLAKE3.h`: C++ wrapper of the C API
* `llvm/lib/Support/BLAKE3`: Directory containing the BLAKE3 C implementation files, including the `LICENSE` file
* `llvm/unittests/Support/BLAKE3Test.cpp`: unit tests for the BLAKE3 C++ wrapper
This initial patch contains the pristine BLAKE3 sources, a follow-up patch will introduce
LLVM-specific prefixes to avoid conflicts if a client also links with its own BLAKE3 version.
And here's some timings comparing BLAKE3 with LLVM's SHA1/SHA256/MD5.
Timings include `AVX512`, `AVX2`, `neon`, and the generic/portable implementations.
The table shows the speed-up multiplier of BLAKE3 for hashing 100 MBs:
| Processor | SHA1 | SHA256 | MD5 |
|-------------------------|-------|--------|------|
| Intel Xeon W (AVX512) | 10.4x | 27x | 9.4x |
| Intel Xeon W (AVX2) | 6.5x | 17x | 5.9x |
| Intel Xeon W (portable) | 1.3x | 3.3x | 1.1x |
| M1Pro (neon) | 2.1x | 4.7x | 2.8x |
| M1Pro (portable) | 1.1x | 2.4x | 1.5x |
Differential Revision: https://reviews.llvm.org/D121510
This patch adds the nowait parameter to `createSingle` in
OpenMPIRBuilder and handling for IR generation from OpenMP Dialect.
Also added tests for the same.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D122371
The main code for the FILE struct is only enabled on platforms that it
works on, but before this patch the tests were included unconditionally.
Reviewed By: sivachandra, lntue
Differential Revision: https://reviews.llvm.org/D122363
I audited all uses of _LIBCPP_ASSERT to make sure that we only used it
for "basic assertions", i.e. assertions with constant-time conditions.
I also audited all uses of _LIBCPP_DEBUG_ASSERT to make sure we used it
only for debug-mode assertions, and in one case had to change for
_LIBCPP_ASSERT instead.
As a fly-by, I also changed a couple of tests against nullptr or 0 to
be more explicit.
After this patch, all uses of _LIBCPP_ASSERT should be with constant-time
conditions, and all uses of _LIBCPP_DEBUG_ASSERT should be with conditions
that we only want to check when the debug mode is enabled.
Differential Revision: https://reviews.llvm.org/D122395
As mentioned in D120782, the loop block order can be different depending
on if LoopInfo is incrementally updated or freshly computed.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D122195
Currently, we only print how threads involved in data race are created from their parent threads.
Add a runtime flag 'print_full_thread_history' to print thread creation stacks for the threads involved in the data race and their ancestors up to the main thread.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D122131
Adds basic parsing/sema/serialization support for the
#pragma omp target parallel loop directive.
Differential Revision: https://reviews.llvm.org/D122359
Re-commit of 32e8b550e5
This patch rearranges emission of CFI instructions, so the resulting
DWARF and `.eh_frame` information is precise at every instruction.
The current state is that the unwind info is emitted only after the
function prologue. This is fine for synchronous (e.g. C++) exceptions,
but the information is generally incorrect when the program counter is
at an instruction in the prologue or the epilogue, for example:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
mov x29, sp
.cfi_def_cfa w29, 16
...
```
after the `stp` is executed the (initial) rule for the CFA still says
the CFA is in the `sp`, even though it's already offset by 16 bytes
A correct unwind info could look like:
```
stp x29, x30, [sp, #-16]! // 16-byte Folded Spill
.cfi_def_cfa_offset 16
mov x29, sp
.cfi_def_cfa w29, 16
...
```
Having this information precise up to an instruction is useful for
sampling profilers that would like to get a stack backtrace. The end
goal (towards this patch is just a step) is to have fully working
`-fasynchronous-unwind-tables`.
Reviewed By: danielkiss, MaskRay
Differential Revision: https://reviews.llvm.org/D111411
This patch adds lowering for the `!$acc wait` directive
from the PFT to OpenACC dialect.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D122399
The alignment needs to be part of the folding set hash. This is
handled by getAssertAlign when nodes are created, but needs to repeated here.
No test case as I found it as part of a very early experimental patch.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D122279
Asan complained about uninitialized bool
`invalid-bool-load`
llvm/lib/Target/X86/AsmParser/X86Operand.h:389:12: runtime error: load
of value 171, which is not a valid value for type 'bool'
Differential Revision: https://reviews.llvm.org/D122405
This is the orignal patch + a check that LLVM_BUILD_EXAMPLES is enabled before
adding a dependency on the 'Bye' example pass.
Original summary:
Add cli options for new passmanager plugin support to lld.
Currently it is not possible to load dynamic NewPM plugins with lld. This is an
incremental update to D76866. While that patch only added cli options for
llvm-lto2, this adds them for lld as well. This is especially useful for running
dynamic plugins on the linux kernel with LTO.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D120490
This patch moves pointer induction handling from VPWidenPHIRecipe to its
own recipe. In the process, it adds all information required to generate
code for pointer inductions without relying on Legal to access the list
of induction phis.
Alternatively VPWidenPHIRecipe could also take an optional pointer to InductionDescriptor.
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D121615
This patch adds lowering for the `!$acc data`
from the PFT to OpenACC dialect.
This patch is part of the upstreaming effort from fir-dev branch.
Depends on D122384
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D122398
Add the "(0 - X) --> (X * -1)" reverse identity to the list of alternate form binops.
We need a little hack to make the existing logic work because it does not expect to
move constants from op0 to op1, but the code comment hopefully makes that clear.
I don't think there are any other identities like that.
Fixes#54364
Differential Revision: https://reviews.llvm.org/D122390
Add missing commit SHAs in a few cases, and use "All Platforms" when
the change affected all platforms instead of explicitly listing the
platforms, since there's too many to list exhaustively.
Clangd ignores fixits if the diagnsotics is outside the main file (e.g.
a note on a declaration from a header), but the fix might still be inside the
main file (e.g. change the function call).
This patch changes the logic to retain fixes that touch main file, if the
diagnostic owning them is still inside main file, even if they are attached to a
note outside the main file.
Differential Revision: https://reviews.llvm.org/D122315
This patch adds lowering for the `!$acc update`
from the PFT to OpenACC dialect.
This patch is part of the upstreaming effort from fir-dev branch.
Depends on D122387
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D122396
This patch adds lowering for the `!$acc init` and `!$acc shutdown`
from the PFT to OpenACC dialect.
This patch is part of the upstreaming effort from fir-dev branch.
Depends on D122384
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D122387
This patch adds lowering for the `!$acc exit data` directive
from the PFT to OpenACC dialect.
This patch is part of the upstreaming effort from fir-dev branch.
Depends on D122384
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D122386
This patch adds lowering for the `!$acc enter data` directive
from the PFT to OpenACC dialect.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D122384
This patches adds lowering tests for some array
expressions use cases.
This patch is part of the upstreaming effort from fir-dev branch.
Reviewed By: PeteSteinfeld
Differential Revision: https://reviews.llvm.org/D122380
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Eric Schweitz <eschweitz@nvidia.com>