Current clang generates extra set of simd variant function attribute
with extra 'v' encoding.
For example:
_ZGVbN2v__Z5add_1Pf vs _ZGVbN2vv__Z5add_1Pf
The problem is due to declaration of ParamAttrs following:
llvm::SmallVector<ParamAttrTy, 8> ParamAttrs(ParamPositions.size());
where ParamPositions.size() is grown after following assignment:
Pos = ParamPositions[PVD];
So the PVD is not find in ParamPositions.
The problem is ParamPositions need to set for each FD decl. To fix this
Move ParamPositions's init inside while loop for each FD.
Differential Revision: https://reviews.llvm.org/D122338
After landing D121813 the binary size increase introduced by this change can be minimized by using --gc-sections link options. D121813 allows each individual callbacks to be optimized out if not used.
Reviewed By: vitalybuka, MaskRay
Differential Revision: https://reviews.llvm.org/D122407
Fix clang crash and add bitfield support in __builtin_dump_struct.
In clang13.0.x, a struct with three or more members and a bitfield at
the same time will cause a crash. In clang15.x, as long as the struct
has one bitfield, it will cause a crash in clang.
Open issue: https://github.com/llvm/llvm-project/issues/54462
Differential Revision: https://reviews.llvm.org/D122248
CUDA/HIP determines whether a function can be called based on
the device/host attributes of callee and caller. Clang assumes the
caller is CurContext. This is correct in most cases, however, it is
not correct in OpenMP parallel region when CUDA/HIP program
is compiled with -fopenmp. This causes incorrect overloading
resolution and missed diagnostics.
To get the correct caller, clang needs to chase the parent chain
of DeclContext starting from CurContext until a function decl
or a lambda decl is reached. Sema API is adapted to achieve that
and used to determine the caller in hostness check.
Reviewed by: Artem Belevich, Richard Smith
Differential Revision: https://reviews.llvm.org/D121765
This information isn't preserved in the DWARF description of function
types (though probably should be - it's preserved on the function
declarations/definitions themselves through the DW_AT_noreturn attribute
- but we should move or also include that in the subroutine type itself
too - but for now, with it not being there, the DWARF is lossy and
can't be reconstructed)
Most intrinsics, especially "default" ones, will not call back into the
IR module. `nocallback` encodes this nicely. As it was not used before,
this patch also makes use of `nocallback` in the Attributor which
results in many more `norecurse` deductions.
Tablegen part is mechanical, test updates by script.
Differential Revision: https://reviews.llvm.org/D118680
Using a BumpPtrAllocator introduced memory leaks for APIRecords as they
contain a std::vector. This meant that we needed to always keep a
reference to the records in APISet and arrange for their destructor to
get called appropriately. This was further complicated by the need for
records to own sub-records as these subrecords would still need to be
allocated via the BumpPtrAllocator and the owning record would now need
to arrange for the destructor of its subrecords to be called
appropriately.
Since APIRecords contain a std::vector so whenever elements get added to
that there is an associated heap allocation regardless. Since
performance isn't currently our main priority it makes sense to use
regular unique_ptr to keep track of APIRecords, this way we don't need
to arrange for destructors to get called.
The BumpPtrAllocator is still used for strings such as USRs so that we
can easily de-duplicate them as necessary.
Differential Revision: https://reviews.llvm.org/D122331
Adds basic parsing/sema/serialization support for the
#pragma omp target parallel loop directive.
Differential Revision: https://reviews.llvm.org/D122359
This simplifies completeness comparisons against OpenCLBuiltins.td and
also makes the header no longer "claim" the identifiers "x", "y" and
"z".
Continues the direction set out in D119560.
Reformat some misindentation that is coincidentally close to a piece
being worked on.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D122314
Since function parameters and return values are passed via param space, we
can force special alignment for values hold in it which will add vectorization
options. This change may be done if the function has private or internal
linkage. Special alignment is forced during 2 phases.
1) Instruction selection lowering. Here we use special alignment for function
prototypes (changing both own return value and parameters alignment), call
lowering (changing both callee's return value and parameters alignment).
2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that
belong to param space (or are casted to it). We only handle cases when all
uses of such parameters are loads from it. For such loads, we can change the
alignment according to special type alignment and the load offset. Then,
load-store-vectorizer IR pass will perform vectorization where alignment
allows it.
Special alignment calculated as maximum from default ABI type alignment and
alignment 16. Alignment 16 is chosen because it's the maximum size of
vectorized ld.param & st.param.
Before specifying such special alignment, we should check if it is a multiple
of the alignment that the type already has. For example, if a value has an
enforced alignment of 64, default ABI alignment of 4 and special alignment
of 16, we should preserve 64.
This patch will be followed by a refactoring patch that removes duplicating
code in handling byval and non-byval arguments.
Differential Revision: https://reviews.llvm.org/D120129
Since function parameters and return values are passed via param space, we
can force special alignment for values hold in it which will add vectorization
options. This change may be done if the function has private or internal
linkage. Special alignment is forced during 2 phases.
1) Instruction selection lowering. Here we use special alignment for function
prototypes (changing both own return value and parameters alignment), call
lowering (changing both callee's return value and parameters alignment).
2) IR pass nvptx-lower-args. Here we change alignment of byval parameters that
belong to param space (or are casted to it). We only handle cases when all
uses of such parameters are loads from it. For such loads, we can change the
alignment according to special type alignment and the load offset. Then,
load-store-vectorizer IR pass will perform vectorization where alignment
allows it.
Special alignment calculated as maximum from default ABI type alignment and
alignment 16. Alignment 16 is chosen because it's the maximum size of
vectorized ld.param & st.param.
Before specifying such special alignment, we should check if it is a multiple
of the alignment that the type already has. For example, if a value has an
enforced alignment of 64, default ABI alignment of 4 and special alignment
of 16, we should preserve 64.
This patch will be followed by a refactoring patch that removes duplicating
code in handling byval and non-byval arguments.
Differential Revision: https://reviews.llvm.org/D121549
In some cases, we need to set alternative toolchain path other than the
default with system (headers, libraries, dynamic linker prefix, ld path,
etc.), e.g., to pick up newer components, but keep sysroot at the same
time (to pick up extra packages).
This change introduces a new option --overlay-platform-toolchain to set
up such alternative toolchain path.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D121992
Move the SourceRange from the old ParsedAttributesWithRange into
ParsedAttributesView, so we have source range information available
everywhere we use attributes.
This also removes ParsedAttributesWithRange (replaced by simply using
ParsedAttributes) and ParsedAttributesVieWithRange (replaced by using
ParsedAttributesView).
Differential Revision: https://reviews.llvm.org/D121201
Summary:
There was a naming conflict in the getNVPTXTargetFeatures function that
prevented some compilers from correctly disambiguating between the
enumeration and variable of the same name. Rename the variable to avoid
this.
specialization
Before the patch, the compiler would crash for the test due to
inconsistent linkage.
This patch tries to avoid it by make the linkage consistent for template
and its specialization. After the patch, the behavior of compiler would
be partially correct for the case.
The correct one is:
```
export template<class T>
void f() {}
template<>
void f<int>() {}
```
In this case, the linkage for both declaration should be external (the
wording I get by consulting in WG21 is "the linkage for name f should be
external").
And for the case:
```
template<class T>
void f() {}
export template<>
void f<int>() {}
```
Compiler should reject it. This isn't done now. After all, this patch would
stop a crash.
Reviewed By: iains, aaron.ballman, dblaikie
Differential Revision: https://reviews.llvm.org/D120397
For MachO, lower `@llvm.global_dtors` into `@llvm_global_ctors` with
`__cxa_atexit` calls to avoid emitting the deprecated `__mod_term_func`.
Reuse the existing `WebAssemblyLowerGlobalDtors.cpp` to accomplish this.
Enable fallback to the old behavior via Clang driver flag
(`-fregister-global-dtors-with-atexit`) or llc / code generation flag
(`-lower-global-dtors-via-cxa-atexit`). This escape hatch will be
removed in the future.
Differential Revision: https://reviews.llvm.org/D121736
Currently we create offloading entries to register device variables with
the host. When we register a variable we will look up the symbol in the
device image and map the device address to the host address. This is a
problem when the symbol is declared with hidden visibility or internal
linkage. This means the symbol is not accessible externally and we
cannot get its address. We should still allow static variables to be
declared on the device, but ew should not create an offloading entry for
them so they exist independently on the host and device.
Fixes#54309
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D122352
This improves the getHostCPUName check for Apple M1 CPUs, which
previously would always be considered cyclone instead. This also enables
`-march=native` support when building on M1 CPUs which would previously
fail. This isn't as sophisticated as the X86 CPU feature checking which
consults the CPU via getHostCPUFeatures, but this is still better than
before. This CPU selection could also be invalid if this was run on an
iOS device instead, ideally we can improve those cases as they come up.
Differential Revision: https://reviews.llvm.org/D119788