Summary: We've had these target independent intrinsics for at least a year and a half. Looks like they do exactly what we need here and the backend already supports them.
Reviewers: RKSimon, delena, spatel, GBuella
Reviewed By: RKSimon
Subscribers: cfe-commits, llvm-commits
Differential Revision: https://reviews.llvm.org/D47693
llvm-svn: 334366
We currently support them only in AArch64. The NEON Reference,
however, says they are 'ARMv7, ARMv8' intrinsics.
Differential Revision: https://reviews.llvm.org/D47446
llvm-svn: 334362
Symbols are cleaned up from the program state map when they go out of scope.
Memory regions are cleaned up when the corresponding object is destroyed, and
additionally in 'checkDeadSymbols' in case destructor modeling was incomplete.
Differential Revision: https://reviews.llvm.org/D47416
llvm-svn: 334352
This check will mark raw pointers to C++ standard library container internal
buffers 'released' when the objects themselves are destroyed. Such information
can be used by MallocChecker to warn about use-after-free problems.
In this first version, 'std::basic_string's are supported.
Differential Revision: https://reviews.llvm.org/D47135
llvm-svn: 334348
SmallSet forwards to SmallPtrSet for pointer types. SmallPtrSet supports iteration, but a normal SmallSet doesn't. So if it wasn't for the forwarding, this wouldn't work.
These places were found by hiding the begin/end methods in the SmallSet forwarding.
llvm-svn: 334339
There are many masked intrinsics that just wrap a select around a legacy intrinsic from a pre-avx512 instruciton set. If that intrinsic is implemented as a macro, nothing prevents it from being used when only the older feature was enabled. This likely generates very poor code since we don't have a good way to convert from the scalar masked type used by the intrinsic into a vector control for a legacy blend instruction. If we even have a blend instruction to use.
By adding a feature to the select builtins we can prevent and diagnose misuse of these intrinsics.
llvm-svn: 334334
I'd like to make the select builtins require an avx512f, avx512bw, or avx512vl fature to match what is normally required to get masking. Truncate is special in that there are instructions with a 128/256-bit masked result even without avx512vl.
By using special buitlins we can emit a select without using the 128/256-bit select builtins.
llvm-svn: 334331
I'm looking into making the select builtins require avx512f, avx512bw, or avx512vl since masking operations generally require those features.
The extract builtins are funny because the 512-bit versions return a 128 or 256 bit vector with masking even when avx512vl is not supported.
llvm-svn: 334330
These builtins are all handled by CGBuiltin.cpp so it doesn't much matter what the immediate type is, but int matches the intrinsic spec.
llvm-svn: 334310
CGM.GetAddrOfConstantCString() sets the adress of the created GlobalValue
to unnamed. When emitting the object file LLVM will mark the surrounding
section as SHF_MERGE iff the string is nul-terminated and contains no
other nuls (see IsNullTerminatedString). This results in problems when
saving temporaries because LLVM doesn't set an EntrySize, so reading in
the serialized assembly file fails.
This never happened for the GPU binaries because they usually contain
a nul-character somewhere. Instead this only affected the module ID
when compiling relocatable device code.
However, this points to a potentially larger problem: If we put a
constant string into a named section, we really want the data to end
up in that section in the object file. To avoid LLVM merging sections
this patch unmarks the GlobalVariable's address as unnamed which also
fixes the problem of invalid serialized assembly files when saving
temporaries.
Differential Revision: https://reviews.llvm.org/D47902
llvm-svn: 334281
Test changes are due to differences in how we generate undef elements now. We also changed the types used for extractf128_si256/insertf128_si256 to match the signature of the builtin that previously existed which this patch resurrects. This also matches gcc.
llvm-svn: 334261
The windows-msvc target is meant to be ABI compatible with MSVC,
including the exception handling. Ensure that a windows-msvc triple
always equates to the MSVC personality being used.
This mostly affects the GNUStep and ObjFW Obj-C runtimes. To the best of
my knowledge, those are normally not used with windows-msvc triples. I
believe WinObjC is based on GNUStep (or it at least uses libobjc2), but
that also takes the approach of wrapping Obj-C exceptions in C++
exceptions, so the MSVC personality function is the right one to use
there as well.
Differential Revision: https://reviews.llvm.org/D47862
llvm-svn: 334253
Edited loop_proto and its converter to make more "vectorizable" code
according to kcc's comment in D47666
- Removed all while loops
- Can only index into array with induction variable
Patch By: emmettneyman
Differential Revision: https://reviews.llvm.org/D47920
llvm-svn: 334252
This reapplies r334224 and adds explicit triples to some tests to fix
them on Windows (where otherwise they would have run with the default
windows-msvc triple, which I'm changing the behavior for).
Original commit message:
The body of a `@finally` needs to be executed on both exceptional and
non-exceptional paths. On landingpad platforms, this is straightforward:
the `@finally` body is emitted as a normal (non-exceptional) cleanup,
and then a catch-all is emitted which branches to that cleanup (the
cleanup has code to conditionally re-throw based on a flag which is set
by the catch-all).
Unfortunately, we can't use the same approach for MSVC exceptions, where
the catch-all will be emitted as a catchpad. We can't just branch to the
cleanup from within the catchpad, since we can only exit it via a
catchret, at which point the exception is destroyed and we can't
rethrow. We could potentially emit the finally body inside the catchpad
and have the normal cleanup path somehow branch into it, but that would
require some new IR construct that could branch into a catchpad.
Instead, after discussing it with Reid Kleckner, we decided that
frontend outlining was the best approach, similar to how SEH `__finally`
works today. We decided to use CapturedStmt (which was also suggested by
Reid) rather than CaptureFinder (which is what `__finally` uses) since
the latter doesn't handle a lot of cases we care about, e.g. self
accesses, property accesses, block captures, etc. Extending
CaptureFinder to handle those additional cases proved unwieldy, whereas
CapturedStmt already took care of all of those. In theory `__finally`
could also be moved over to CapturedStmt, which would remove some
existing limitations (e.g. the inability to capture this), although
CaptureFinder would still be needed for SEH filters.
The one case supported by `@finally` but not CapturedStmt (or
CaptureFinder for that matter) is arbitrary control flow out of the
`@finally`, e.g. having a return statement inside a `@finally`. We can
add that support as a follow-up, but in practice we've found it to be
used very rarely anyway.
Differential Revision: https://reviews.llvm.org/D47564
llvm-svn: 334251
The windows-msvc target is used for MSVC ABI compatibility, including
the exceptions model. It doesn't make sense to pair a windows-msvc
target with a non-MSVC exception model. This would previously cause an
assertion failure; explicitly error out for it in the frontend instead.
This also allows us to reduce the matrix of target/exception models a
bit (see the modified tests), and we can possibly simplify some of the
personality code in a follow-up.
Differential Revision: https://reviews.llvm.org/D47853
llvm-svn: 334243
This reverts commit r334224.
This is causing buildbot failures on Windows, presumably because some
tests don't specify a triple. I'll test this on Windows locally and
recommit with the tests fixed.
llvm-svn: 334240
Adds support for these intrinsics, which are ARM and ARM64 only:
_interlockedbittestandreset_acq
_interlockedbittestandreset_rel
_interlockedbittestandreset_nf
_interlockedbittestandset_acq
_interlockedbittestandset_rel
_interlockedbittestandset_nf
Refactor the bittest intrinsic handling to decompose each intrinsic into
its action, its width, and its atomicity.
llvm-svn: 334239
We still emit shufflevector instructions we just do it from CGBuiltin.cpp now. This ensures the intrinsics that use this are only available on CPUs that support the feature.
I also added range checking to the immediate, but only checked it is 8 bits or smaller. We should maybe be stricter since we never use all 8 bits, but gcc doesn't seem to do that.
llvm-svn: 334237
The body of a `@finally` needs to be executed on both exceptional and
non-exceptional paths. On landingpad platforms, this is straightforward:
the `@finally` body is emitted as a normal (non-exceptional) cleanup,
and then a catch-all is emitted which branches to that cleanup (the
cleanup has code to conditionally re-throw based on a flag which is set
by the catch-all).
Unfortunately, we can't use the same approach for MSVC exceptions, where
the catch-all will be emitted as a catchpad. We can't just branch to the
cleanup from within the catchpad, since we can only exit it via a
catchret, at which point the exception is destroyed and we can't
rethrow. We could potentially emit the finally body inside the catchpad
and have the normal cleanup path somehow branch into it, but that would
require some new IR construct that could branch into a catchpad.
Instead, after discussing it with Reid Kleckner, we decided that
frontend outlining was the best approach, similar to how SEH `__finally`
works today. We decided to use CapturedStmt (which was also suggested by
Reid) rather than CaptureFinder (which is what `__finally` uses) since
the latter doesn't handle a lot of cases we care about, e.g. self
accesses, property accesses, block captures, etc. Extending
CaptureFinder to handle those additional cases proved unwieldy, whereas
CapturedStmt already took care of all of those. In theory `__finally`
could also be moved over to CapturedStmt, which would remove some
existing limitations (e.g. the inability to capture this), although
CaptureFinder would still be needed for SEH filters.
The one case supported by `@finally` but not CapturedStmt (or
CaptureFinder for that matter) is arbitrary control flow out of the
`@finally`, e.g. having a return statement inside a `@finally`. We can
add that support as a follow-up, but in practice we've found it to be
used very rarely anyway.
Differential Revision: https://reviews.llvm.org/D47564
llvm-svn: 334224
This breaks the OpenFlags enumeration into two separate
enumerations: OpenFlags and CreationDisposition. The first
controls the behavior of the API depending on whether or not
the target file already exists, and is not a flags-based
enum. The second controls more flags-like values.
This yields a more easy to understand API, while also allowing
flags to be passed to the openForRead api, where most of the
values didn't make sense before. This also makes the apis more
testable as it becomes easy to enumerate all the configurations
which make sense, so I've added many new tests to exercise all
the different values.
llvm-svn: 334221
Summary:
Created a new protobuf and protobuf-to-C++ "converter" that wraps the entire C++ code in a single for loop.
- Slightly changed cxx_proto.proto -> cxx_loop_proto.proto
- Made some changes to proto_to_cxx files to handle the new kind of protobuf
- Created ExampleClangLoopProtoFuzzer to test new protobuf and "converter"
Patch by Emmett Neyman
Reviewers: kcc, vitalybuka, morehouse
Reviewed By: vitalybuka, morehouse
Subscribers: mgorny, llvm-commits, cfe-commits
Differential Revision: https://reviews.llvm.org/D47843
llvm-svn: 334216
We still lower them to native shuffle IR, but we do it in CGBuiltin.cpp now. This allows us to check the target feature and ensure the immediate fits in 8 bits.
This also improves our -O0 codegen slightly because we're able to see the zeroinitializer in the shuffle. It looks like it got lost behind a store+load previously.
llvm-svn: 334208
Summary: We were missing the case when python-style comments in text protos start with `##`.
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47870
llvm-svn: 334179
Summary:
When requirement imposed by __target__ attributes on functions
are not satisfied, prefer printing those requirements, which
are explicitly mentioned in the attributes.
This makes such messages more useful, e.g. printing avx512f instead of avx2
in the following scenario:
```
$ cat foo.c
static inline void __attribute__((__always_inline__, __target__("avx512f")))
x(void)
{
}
int main(void)
{
x();
}
$ clang foo.c
foo.c:7:2: error: always_inline function 'x' requires target feature 'avx2', but would be inlined into function 'main' that is compiled without support for 'avx2'
x();
^
1 error generated.
```
bugzilla: https://bugs.llvm.org/show_bug.cgi?id=37338
Reviewers: craig.topper, echristo, dblaikie
Reviewed By: craig.topper, echristo
Differential Revision: https://reviews.llvm.org/D46541
llvm-svn: 334174
Previous, if no Decl's were checked, visibility was set to false. Switch it
so that in cases of no Decl's, return true. These are the Decl's after being
filtered. Also remove an unreachable return statement since it is directly
after another return statement.
llvm-svn: 334160
Summary:
We recently switch to using a selects in the intrinsics header files for FMA instructions. But the 512-bit versions support flavors with rounding mode which must be an Integer Constant Expression. This has forced those intrinsics to be implemented as macros. As it stands now the mask and mask3 intrinsics evaluate one of their macro arguments twice. If that argument itself is another intrinsic macro, we can end up over expanding macros. Or if its something we can CSE later it would show up multiple times when it shouldn't.
I tried adding __extension__ around the macro and making it an expression statement and declaring a local variable. But whatever name you choose for the local variable can never be used as the name of an input to the macro in user code. If that happens you would end up with the same name on the LHS and RHS of an assignment after expansion. We might be safe if we use __ in front of the variable names because those names are reserved and user code shouldn't use that, but I wasn't sure I wanted to make that claim.
The other option which I've chosen here, is to add back _mask, _maskz, and _mask3 flavors of the builtin which we will expand in CGBuiltin.cpp to replicate the argument as needed and insert any fneg needed on the third operand to make a subtract. The _maskz isn't truly necessary if we have an unmasked version or if we use the masked version with a -1 mask and wrap a select around it. But I've chosen to make things more uniform.
I separated out the scalar builtin handling to avoid too many things going on in EmitX86FMAExpr. It was different enough due to the extract and insert that the minor duplication of the CreateCall was probably worth it.
Reviewers: tkrupa, RKSimon, spatel, GBuella
Reviewed By: tkrupa
Subscribers: cfe-commits
Differential Revision: https://reviews.llvm.org/D47724
llvm-svn: 334159
An attempt to use dynamic_cast while rtti is disabled, used to emit the error:
cannot use dynamic_cast with -fno-rtti
and a similar one for typeid.
This patch changes that to:
use of dynamic_cast requires -frtti
Differential Revision: https://reviews.llvm.org/D47291
llvm-svn: 334153
Avoid storing information for definitions since those can be out-of-line and
vary between modules even when the declarations are the same.
llvm-svn: 334151
-fseh-exceptions is only meaningful for MinGW targets, and that driver
already has logic to pass either -fdwarf-exceptions or -fseh-exceptions
as appropriate. -fseh-exceptions is just a no-op for MSVC triples, and
passing it to cc1 causes unnecessary confusion.
Differential Revision: https://reviews.llvm.org/D47850
llvm-svn: 334145
We were already performing checks on non-template variables,
but the checks on templated ones were missing.
Differential Revision: https://reviews.llvm.org/D45231
llvm-svn: 334143
HIP uses clang-offload-bundler to bundle intermediate files for host
and different gpu archs together. When a file is unbundled,
clang-offload-bundler should be called only once, and the objects
for host and different gpu archs should be passed to the next
jobs. This is because Driver maintains CachedResults which maps
triple-arch string to output files for each job.
This patch fixes a bug in Driver::BuildJobsForActionNoCache which
uses incorrect key for CachedResults for HIP which causes
clang-offload-bundler being called mutiple times and incorrect
output files being used.
It only affects HIP.
Differential Revision: https://reviews.llvm.org/D47555
llvm-svn: 334128
Factor out the common setjmp call emission code.
Based on a patch by Chris January
Differential Revision: https://reviews.llvm.org/D47784
llvm-svn: 334112
When looking up a template name, we can find an overload set containing a
function template and an unresolved non-type using declaration.
llvm-svn: 334106
NFC for targets other than PS4.
Simplify users' workflow when enabling asan or ubsan and calling the linker separately.
Differential Revision: https://reviews.llvm.org/D47375
llvm-svn: 334096
Do not memory map the main file if the flag UserFilesAreVolatile is set to true
in ASTUnit when calling FileSystem::getBufferForFile.
Differential Revision: https://reviews.llvm.org/D47460
llvm-svn: 334070
Summary:
Since Z3 tests have been not been running [1] some tests needed to be
updated. I also added a regression test for [1].
[1] https://reviews.llvm.org/D47722
Reviewers: george.karpenkov, NoQ, ddcc
Reviewed By: george.karpenkov
Subscribers: mikhail.ramalho, dcoughlin, xazax.hun, szepet, zzheng, a.sidorin, cfe-commits
Differential Revision: https://reviews.llvm.org/D47726
llvm-svn: 334067