Summary:
This adds the 'f' inline assembly constraint, as supported by GCC. An
'f'-constrained operand is passed in a floating point register. Exactly
which kind of floating-point register (32-bit or 64-bit) is decided
based on the operand type and the available standard extensions (-f and
-d, respectively).
This patch adds support in both the Clang frontend and in LLVM itself.
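As an illustrative sketch (not part of the patch text), an 'f'-constrained operand with the D extension might be used like this, placing both operands in FP registers:

  // Minimal sketch: with -march=rv64ifd, 'f' selects a 64-bit FP register
  // for the double-typed operands below.
  double twice(double x) {
    double r;
    asm("fadd.d %0, %1, %1" : "=f"(r) : "f"(x));
    return r;
  }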
Reviewers: asb, lewis-revill
Reviewed By: asb
Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, s.egerton, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65500
llvm-svn: 367403
In `CodeGenFunction::EmitAArch64BuiltinExpr()`, bulk move all of the aarch64 MSVC-builtin cases to an earlier point in the function (the `// Handle non-overloaded intrinsics first` switch block) in order to avoid an unreachable in `GetNeonType()`. The NEON type-overloading logic is not appropriate for the Windows builtins.
Fixes https://llvm.org/pr42775
Differential Revision: https://reviews.llvm.org/D65403
llvm-svn: 367323
Summary:
In D62801, a new function attribute, `willreturn`, was introduced. In short, a function with `willreturn` is guaranteed to return to the call site (a more precise definition is in the LangRef).
In this patch, `willreturn` is added to LLVM intrinsics.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: jvesely, nhaehnle, sstefan1, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64904
llvm-svn: 367184
This patch changes the following tests to run under the new pass manager only:
```
Clang :: CodeGen/avx512-reduceMinMaxIntrin.c (1 of 4)
Clang :: CodeGen/avx512vl-builtins.c (2 of 4)
Clang :: CodeGen/avx512vlbw-builtins.c (3 of 4)
Clang :: CodeGen/avx512f-builtins.c (4 of 4)
```
The new PM added extra bitcasts that weren't checked before. For
reduceMinMaxIntrin.c, the issue was mostly the allocas being in a different
order. Other changes involved extra bitcasts and differently ordered loads and
stores, but the logic should still be the same.
Differential revision: https://reviews.llvm.org/D65110
llvm-svn: 367157
The maximum alignment used by the ARM architecture is 64 bits, not 128.
This could cause overaligned memory accesses for 128-bit NEON vectors,
which have unpredictable behaviour.
This fixes: https://bugs.llvm.org/show_bug.cgi?id=42668
Patch by: Diogo Sampaio (diogo.sampaio@arm.com)
Differential Revision: https://reviews.llvm.org/D65000
Change-Id: I5a62b766491f15dd51e4cfe6625929db897f67e3
llvm-svn: 367119
changes were made to the patch since then.
--------
[NewPM] Port Sancov
This patch contains a port of SanitizerCoverage to the new pass manager. This one's a bit hefty.
Changes:
- Split SanitizerCoverageModule into two classes: SanitizerCoverage, which runs
over functions, and ModuleSanitizerCoverage, which runs over modules.
- ModuleSanitizerCoverage exists to add two module-level calls to initialization
functions, but only if there is a function that was instrumented by sancov.
- Added legacy and new PM wrapper classes that own instances of the two new classes.
- Update llvm tests and add clang tests.
llvm-svn: 367053
Adds the SVE vector and predicate registers to the list of known registers.
Patch by Kerry McLaughlin.
Reviewers: erichkeane, sdesmalen, rengolin
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D64739
llvm-svn: 366878
Modified the intrinsics int_addressofreturnaddress, int_frameaddress, and int_sponentry.
This commit depends on the changes in rL366679
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D64563
llvm-svn: 366683
with '-mframe-pointer'
After D56351 and D64294, frame pointer handling is migrated to tri-state
(all, non-leaf, none) in clang driver and on the function attribute.
This patch makes the frame pointer handling cc1 option tri-state.
Reviewers: chandlerc, rnk, t.p.northover, MaskRay
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D56353
llvm-svn: 366645
Summary:
Add immutable WASM global `__tls_align` which stores the alignment
requirements of the TLS segment.
Add `__builtin_wasm_tls_align()` intrinsic to get this alignment in Clang.
The expected usage has now changed to:
  __wasm_init_tls(memalign(__builtin_wasm_tls_align(),
                           __builtin_wasm_tls_size()));
Reviewers: tlively, aheejin, sbc100, sunfish, alexcrichton
Reviewed By: tlively
Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D65028
llvm-svn: 366624
Summary:
Regular LTO modules do not need LTO Unit splitting, only ThinLTO does
(they must be consistently split into regular and Thin units for
optimizations such as whole program devirtualization and lower type
tests). In order to avoid spurious errors from LTO when combining with
split ThinLTO modules, always set this flag for regular LTO modules.
Reviewers: pcc
Subscribers: mehdi_amini, Prazek, inglorion, steven_wu, dexonsmith, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D65009
llvm-svn: 366623
The RISC-V hard float calling convention requires the frontend to:
* Detect cases where, once "flattened", a struct can be passed using
int+fp or fp+fp registers under the hard float ABI and coerce to the
appropriate type(s)
* Track usage of GPRs and FPRs in order to gate the above, and to
determine when signext/zeroext attributes must be added to integer
scalars
This patch attempts to do this in compliance with the documented ABI,
and uses ABIArgInfo::CoerceAndExpand in order to do this. @rjmccall, as
author of that code I've tagged you as reviewer for initial feedback on
my usage.
Note that a previous version of the ABI indicated that when passing an
int+fp struct using a GPR+FPR, the int would need to be sign or
zero-extended appropriately. GCC never did this and the ABI was changed,
which makes life easier as ABIArgInfo::CoerceAndExpand can't currently
handle sign/zero-extension attributes.
Re-landed after backing out 366450 due to missed hunks.
Differential Revision: https://reviews.llvm.org/D60456
llvm-svn: 366480
Summary:
Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local
block and scan through it for memory leaks.
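A hedged sketch of the intended use; scan_region stands in for a hypothetical LeakSanitizer hook and is not part of this patch:

  extern void scan_region(void *start, unsigned long len);  /* hypothetical LSan hook */

  void scan_thread_locals(void) {
    void *base = __builtin_wasm_tls_base();        /* start of this thread's TLS block */
    scan_region(base, __builtin_wasm_tls_size());  /* scan the whole block */
  }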
Reviewers: tlively, aheejin, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64900
llvm-svn: 366475
The RISC-V hard float calling convention requires the frontend to:
* Detect cases where, once "flattened", a struct can be passed using
int+fp or fp+fp registers under the hard float ABI and coerce to the
appropriate type(s)
* Track usage of GPRs and FPRs in order to gate the above, and to
determine when signext/zeroext attributes must be added to integer
scalars
This patch attempts to do this in compliance with the documented ABI,
and uses ABIArgInfo::CoerceAndExpand in order to do this. @rjmccall, as
author of that code I've tagged you as reviewer for initial feedback on
my usage.
Note that a previous version of the ABI indicated that when passing an
int+fp struct using a GPR+FPR, the int would need to be sign or
zero-extended appropriately. GCC never did this and the ABI was changed,
which makes life easier as ABIArgInfo::CoerceAndExpand can't currently
handle sign/zero-extension attributes.
Differential Revision: https://reviews.llvm.org/D60456
llvm-svn: 366450
Remove the dependency on malloc in the implementation of the mm_malloc function
in the PowerPC intrinsics, and the alignment assumption on glibc.
Reviewed By: Hal Finkel
Differential Revision: https://reviews.llvm.org/D64850
llvm-svn: 366406
As discussed in D64780, the wording of this warning message is being
changed to say 'is not supported' instead of 'ignored', and the
diag ID itself is being changed to warn_cconv_not_supported.
llvm-svn: 366368
Summary:
Thread local variables are placed inside a `.tdata` segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as `__tls_base` + the offset from the start of the segment.
The `.tdata` segment is a passive segment and `memory.init` is used once per thread
to initialize the thread local storage.
`__tls_base` is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, `__tls_base` must be initialized
at thread startup, and so cannot be used with dynamic libraries.
`__tls_base` is to be initialized with a new linker-synthesized function,
`__wasm_init_tls`, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
`__tls_base`. As `__wasm_init_tls` will handle the memory initialization,
the memory does not have to be zeroed.
To help allocating memory for thread-local storage, a new compiler intrinsic
is introduced: `__builtin_wasm_tls_size()`. This intrinsic function returns
the size of the thread-local storage for the current function.
The expected usage is to run something like the following upon thread startup:
__wasm_init_tls(malloc(__builtin_wasm_tls_size()));
Reviewers: tlively, aheejin, kripken, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64537
llvm-svn: 366272
The original commit is r366076. It was temporarily reverted (r366155)
due to a test failure. This resubmission makes the test more robust by accepting
regexes instead of hardcoded names/references in several places.
This is a followup patch for https://reviews.llvm.org/D61809.
Handle unnamed bitfield properly and add more test cases.
Fixed the unnamed bitfield issue. The unnamed bitfield is ignored
by debug info, so we need to ignore such a struct/union member
when we try to get the member index in the debug info.
D61809 contains two test cases, but they are not enough, as they do
not check the generated IR at a fine-grained level and do not
include semantic-checking tests.
This patch adds unit tests for both codegen and semantic checking of
the new intrinsic.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 366231
The jcvt intrinsic defined in ACLE [1] is available when __ARM_FEATURE_JCVT is defined.
This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function.
The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used.
I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
make check-all didn't show any new failures.
[1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
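A hedged usage sketch; the fallback cast is illustrative only and does not match the JavaScript out-of-range semantics:

  #include <arm_acle.h>
  #include <stdint.h>

  int32_t js_to_int32(double d) {
  #ifdef __ARM_FEATURE_JCVT
    return __jcvt(d);    /* lowers to a single FJCVTZS instruction */
  #else
    return (int32_t)d;   /* illustrative fallback with different semantics */
  #endif
  }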
Differential Revision: https://reviews.llvm.org/D64495
llvm-svn: 366197
i.e., recent 5745eccef54ddd3caca278d1d292a88b2281528b:
* Bump the function_type_mismatch handler version, as its signature has changed.
* The function_type_mismatch handler can return successfully now, so
SanitizerKind::Function must be AlwaysRecoverable (like for
SanitizerKind::Vptr).
* But the minimal runtime would still unconditionally treat a call to the
function_type_mismatch handler as failure, so disallow -fsanitize=function in
combination with -fsanitize-minimal-runtime (like it was already done for
-fsanitize=vptr).
* Add tests.
Differential Revision: https://reviews.llvm.org/D61479
llvm-svn: 366186
Add "memtag" sanitizer that detects and mitigates stack memory issues
using armv8.5 Memory Tagging Extension.
It is similar in principle to HWASan, which is a software implementation
of the same idea, but there are enough differences to warrant a new
sanitizer type IMHO. It is also expected to have very different
performance properties.
The new sanitizer does not have a runtime library (it may grow one
later, along with a "debugging" mode). Similar to SafeStack and
StackProtector, the instrumentation pass (in a follow up change) will be
inserted in all cases, but will only affect functions marked with the
new sanitize_memtag attribute.
Reviewers: pcc, hctim, vitalybuka, ostannard
Subscribers: srhines, mehdi_amini, javed.absar, kristof.beyls, hiraditya, cryptoad, steven_wu, dexonsmith, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64169
llvm-svn: 366123
This is a followup patch for https://reviews.llvm.org/D61809.
Handle unnamed bitfield properly and add more test cases.
Fixed the unnamed bitfield issue. The unnamed bitfield is ignored
by debug info, so we need to ignore such a struct/union member
when we try to get the member index in the debug info.
D61809 contains two test cases, but they are not enough, as they do
not check the generated IR at a fine-grained level and do not
include semantic-checking tests.
This patch adds unit tests for both codegen and semantic checking of
the new intrinsic.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 366076
gcc PowerPC supports 3 representations of long double:
* -mlong-double-64
long double has the same representation as double but is mangled as `e`.
In clang, this is the default on AIX, FreeBSD and Linux musl.
* -mlong-double-128
2 possible 128-bit floating point representations:
+ -mabi=ibmlongdouble
IBM extended double format. Mangled as `g`
In clang, this is the default on Linux glibc.
+ -mabi=ieeelongdouble
IEEE 754 quadruple-precision format. Mangled as `u9__ieee128` (`U10__float128` before gcc 8.2)
This is currently unavailable.
This patch adds -mabi=ibmlongdouble and -mabi=ieeelongdouble, and thus
makes the IEEE 754 quadruple-precision long double available for
languages supported by clang.
Reviewed By: hfinkel
Differential Revision: https://reviews.llvm.org/D64283
llvm-svn: 366044
When processing the command line options march, mcpu and mfpu, we store
the implied target features on a vector. The change D62998 introduced a
temporary vector, where the processed features get accumulated. When
calling DecodeARMFeaturesFromCPU, which sets the default features for
the specified CPU, we certainly don't want to override the features
that have been explicitly specified on the command line. Therefore, the
default features should appear first in the final vector. This problem
became evident once I added the missing (unhandled) target features in
ARM::getExtensionFeatures.
Differential Revision: https://reviews.llvm.org/D63936
llvm-svn: 366027
This patch series adds support for the next-generation arch13
CPU architecture to the SystemZ backend.
This includes:
- Basic support for the new processor and its features.
- Support for low-level builtins mapped to new LLVM intrinsics.
- New high-level intrinsics in vecintrin.h.
- Indicate support by defining __VEC__ == 10303.
Note: No currently available Z system supports the arch13
architecture. Once new systems become available, the
official system name will be added as a supported -march name.
llvm-svn: 365933
This patch makes the driver option -mlong-double-128 available for X86
and PowerPC. The CC1 option -mlong-double-128 is available on all targets
for users to test on unsupported targets.
On PowerPC, -mlong-double-128 uses the IBM extended double format
because we don't support -mabi=ieeelongdouble yet (D64283).
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D64277
llvm-svn: 365866
This patch contains a port of SanitizerCoverage to the new pass manager. This one's a bit hefty.
Changes:
- Split SanitizerCoverageModule into two classes: SanitizerCoverage, which runs
over functions, and ModuleSanitizerCoverage, which runs over modules.
- ModuleSanitizerCoverage exists to add two module-level calls to initialization
functions, but only if there is a function that was instrumented by sancov.
- Added legacy and new PM wrapper classes that own instances of the two new classes.
- Update llvm tests and add clang tests.
Differential Revision: https://reviews.llvm.org/D62888
llvm-svn: 365838
An os_log_helper FunctionDecl may not have a body. Ignore these for the
purposes of debug entry value emission.
Fixes an assertion failure seen in a stage2 build of clang:
Assertion failed: (FD->hasBody() && "Functions must have body here"),
function analyzeParametersModification
llvm-svn: 365716
All the command lines are for 64-bit mode, but sometimes I compile
the tests in 32-bit mode to see what assembly we get, and these need
to be skipped to do that.
llvm-svn: 365668
The CCCR_Ignore action is only used for Microsoft calling conventions,
mainly because MSVC does not warn when a calling convention would be
ignored by the current target. This behavior is actually somewhat
important, since windows.h uses WINAPI (which expands to __stdcall)
widely. This distinction didn't matter much before the introduction of
__vectorcall to x64 and the ability to make that the default calling
convention with /Gv. Now, we can't just ignore __stdcall for x64, we
have to treat it as an explicit __cdecl annotation.
Fixes PR42531
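A minimal sketch of the effect; the declarations are illustrative only (WINAPI expands to __stdcall):

  /* With /Gv the default convention is __vectorcall, so the explicit
     __stdcall below must now be honored as __cdecl on x64 rather than
     silently ignored. */
  void __stdcall Callback(int code);   /* treated as __cdecl on x64 */
  void DefaultConv(int code);          /* picks up the /Gv __vectorcall default */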
llvm-svn: 365579
Ignore trailing NullStmts in compound expressions when determining the result type and value. This is to match the GCC behavior which ignores semicolons at the end of compound expressions.
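For example, a small sketch of the GCC-compatible behaviour:

  /* The trailing null statements are ignored, so the statement
     expression still has type int and value 42. */
  int x = ({ 42;; });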
Patch by Dominic Ferreira.
llvm-svn: 365498
For background of BPF CO-RE project, please refer to
http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs
that are adjustable to struct/union layout changes, so the same
program can run on multiple kernels, with adjustments made
before loading based on native kernel structures.
In order to do this, we need to keep track of GEP (getelementptr)
instruction base and result debuginfo types, so we
can adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard,
as various optimizations may have tweaked the GEPs; also, since a
union is replaced by a structure in IR, it is impossible to track the
field index for union member accesses.
Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introduced.
addr = preserve_array_access_index(base, index, dimension)
addr = preserve_union_access_index(base, di_index)
addr = preserve_struct_access_index(base, gep_index, di_index)
here,
base: the base pointer for the array/union/struct access.
index: the last access index for array, the same for IR/DebugInfo layout.
dimension: the array dimension.
gep_index: the access index based on IR layout.
di_index: the access index based on user/debuginfo types.
If using these intrinsics blindly, i.e., transforming all GEPs
to these intrinsics and later on reducing them to GEPs, we have
seen up to 7% more instructions generated. To avoid such an overhead,
a clang builtin is proposed:
base = __builtin_preserve_access_index(base)
such that the user wraps to-be-relocated GEPs in this builtin
and preserve_*_access_index intrinsics only apply to
those GEPs. Such a builtin will prevent performance degradation
if people do not use CO-RE, even for programs which use
bpf_probe_read().
For example, consider the following:
  $ cat test.c
  struct sk_buff {
    int i;
    int b1:1;
    int b2:2;
    union {
      struct {
        int o1;
        int o2;
      } o;
      struct {
        char flags;
        char dev_id;
      } dev;
      int netid;
    } u[10];
  };
  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
      = (void *) 4;
  #define _(x) (__builtin_preserve_access_index(x))
  int bpf_prog(struct sk_buff *ctx) {
    char dev_id;
    bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
    return dev_id;
  }
  $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
      test.c >& log
The generated IR looks like below:
...
define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
%2 = alloca %struct.sk_buff*, align 8
%3 = alloca i8, align 1
store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
%4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
%5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
%6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
%struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
%7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
[10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
%8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
%union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
%9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
%10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
%struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
%11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
%12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
%13 = sext i8 %12 to i32, !dbg !54
call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
ret i32 %13, !dbg !57
}
!19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
!26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
!34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)
Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
attached to instructions to provide struct/union debuginfo type information.
For &ctx->u[5].dev.dev_id,
. The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
. The "%7 = ..." represents array subscript "5".
. The "%8 = ..." represents union member "dev" with index 1 for DI layout.
. The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.
Basically, by recursively traversing the use-def chain for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access indices
can be recovered.
The intrinsics also contain enough information to regenerate code for the IR layout.
For array and structure intrinsics, the proper GEP can be constructed.
For union intrinsics, replacing all uses of "addr" with "base" should be enough.
Signed-off-by: Yonghong Song <yhs@fb.com>
Differential Revision: https://reviews.llvm.org/D61809
llvm-svn: 365438
For background of BPF CO-RE project, please refer to
http://vger.kernel.org/bpfconf2019.html
In summary, BPF CO-RE intends to compile bpf programs
that are adjustable to struct/union layout changes, so the same
program can run on multiple kernels, with adjustments made
before loading based on native kernel structures.
In order to do this, we need to keep track of GEP (getelementptr)
instruction base and result debuginfo types, so we
can adjust on the host based on kernel BTF info.
Capturing such information as an IR optimization is hard,
as various optimizations may have tweaked the GEPs; also, since a
union is replaced by a structure in IR, it is impossible to track the
field index for union member accesses.
Three intrinsic functions, preserve_{array,union,struct}_access_index,
are introduced.
addr = preserve_array_access_index(base, index, dimension)
addr = preserve_union_access_index(base, di_index)
addr = preserve_struct_access_index(base, gep_index, di_index)
here,
base: the base pointer for the array/union/struct access.
index: the last access index for array, the same for IR/DebugInfo layout.
dimension: the array dimension.
gep_index: the access index based on IR layout.
di_index: the access index based on user/debuginfo types.
If using these intrinsics blindly, i.e., transforming all GEPs
to these intrinsics and later on reducing them to GEPs, we have
seen up to 7% more instructions generated. To avoid such an overhead,
a clang builtin is proposed:
base = __builtin_preserve_access_index(base)
such that the user wraps to-be-relocated GEPs in this builtin
and preserve_*_access_index intrinsics only apply to
those GEPs. Such a builtin will prevent performance degradation
if people do not use CO-RE, even for programs which use
bpf_probe_read().
For example, consider the following:
  $ cat test.c
  struct sk_buff {
    int i;
    int b1:1;
    int b2:2;
    union {
      struct {
        int o1;
        int o2;
      } o;
      struct {
        char flags;
        char dev_id;
      } dev;
      int netid;
    } u[10];
  };
  static int (*bpf_probe_read)(void *dst, int size, const void *unsafe_ptr)
      = (void *) 4;
  #define _(x) (__builtin_preserve_access_index(x))
  int bpf_prog(struct sk_buff *ctx) {
    char dev_id;
    bpf_probe_read(&dev_id, sizeof(char), _(&ctx->u[5].dev.dev_id));
    return dev_id;
  }
  $ clang -target bpf -O2 -g -emit-llvm -S -mllvm -print-before-all \
      test.c >& log
The generated IR looks like below:
...
define dso_local i32 @bpf_prog(%struct.sk_buff*) #0 !dbg !15 {
%2 = alloca %struct.sk_buff*, align 8
%3 = alloca i8, align 1
store %struct.sk_buff* %0, %struct.sk_buff** %2, align 8, !tbaa !45
call void @llvm.dbg.declare(metadata %struct.sk_buff** %2, metadata !43, metadata !DIExpression()), !dbg !49
call void @llvm.lifetime.start.p0i8(i64 1, i8* %3) #4, !dbg !50
call void @llvm.dbg.declare(metadata i8* %3, metadata !44, metadata !DIExpression()), !dbg !51
%4 = load i32 (i8*, i32, i8*)*, i32 (i8*, i32, i8*)** @bpf_probe_read, align 8, !dbg !52, !tbaa !45
%5 = load %struct.sk_buff*, %struct.sk_buff** %2, align 8, !dbg !53, !tbaa !45
%6 = call [10 x %union.anon]* @llvm.preserve.struct.access.index.p0a10s_union.anons.p0s_struct.sk_buffs(
%struct.sk_buff* %5, i32 2, i32 3), !dbg !53, !llvm.preserve.access.index !19
%7 = call %union.anon* @llvm.preserve.array.access.index.p0s_union.anons.p0a10s_union.anons(
[10 x %union.anon]* %6, i32 1, i32 5), !dbg !53
%8 = call %union.anon* @llvm.preserve.union.access.index.p0s_union.anons.p0s_union.anons(
%union.anon* %7, i32 1), !dbg !53, !llvm.preserve.access.index !26
%9 = bitcast %union.anon* %8 to %struct.anon.0*, !dbg !53
%10 = call i8* @llvm.preserve.struct.access.index.p0i8.p0s_struct.anon.0s(
%struct.anon.0* %9, i32 1, i32 1), !dbg !53, !llvm.preserve.access.index !34
%11 = call i32 %4(i8* %3, i32 1, i8* %10), !dbg !52
%12 = load i8, i8* %3, align 1, !dbg !54, !tbaa !55
%13 = sext i8 %12 to i32, !dbg !54
call void @llvm.lifetime.end.p0i8(i64 1, i8* %3) #4, !dbg !56
ret i32 %13, !dbg !57
}
!19 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "sk_buff", file: !3, line: 1, size: 704, elements: !20)
!26 = distinct !DICompositeType(tag: DW_TAG_union_type, scope: !19, file: !3, line: 5, size: 64, elements: !27)
!34 = distinct !DICompositeType(tag: DW_TAG_structure_type, scope: !26, file: !3, line: 10, size: 16, elements: !35)
Note that @llvm.preserve.{struct,union}.access.index calls have metadata llvm.preserve.access.index
attached to instructions to provide struct/union debuginfo type information.
For &ctx->u[5].dev.dev_id,
. The "%6 = ..." represents struct member "u" with index 2 for IR layout and index 3 for DI layout.
. The "%7 = ..." represents array subscript "5".
. The "%8 = ..." represents union member "dev" with index 1 for DI layout.
. The "%10 = ..." represents struct member "dev_id" with index 1 for both IR and DI layout.
Basically, by recursively traversing the use-def chain for the 3rd argument of bpf_probe_read() and
examining all preserve_*_access_index calls, the debuginfo struct/union/array access indices
can be recovered.
The intrinsics also contain enough information to regenerate code for the IR layout.
For array and structure intrinsics, the proper GEP can be constructed.
For union intrinsics, replacing all uses of "addr" with "base" should be enough.
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 365435
-mlong-double-64 is supported on some ports of gcc (i386, x86_64, and ppc{32,64}).
On many other targets, there will be an error:
error: unrecognized command line option '-mlong-double-64'
This patch makes the driver option -mlong-double-64 available for x86
and ppc. The CC1 option -mlong-double-64 is available on all targets for
users to test on unsupported targets.
LongDoubleSize is added as a VALUE_LANGOPT so that the option can be
shared with -mlong-double-128 when we support it in clang.
Also, make powerpc*-linux-musl default to use 64-bit long double. It is
currently the only supported ABI on musl and is also how people
configure powerpc*-linux-musl-gcc.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D64067
llvm-svn: 365412
Summary:
Change the vuqadd scalar intrinsics to have the second argument as unsigned values, not signed,
according to https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics
So now the compiler correctly warns that an undefined negative float conversion is being done.
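A hedged sketch of the updated signature in use; vuqadds_s32 is one of the affected scalar intrinsics:

  #include <arm_neon.h>
  #include <stdint.h>

  /* The second operand is now unsigned, so passing a negative
     floating-point value here gets the expected conversion warning. */
  int32_t saturating_acc(int32_t acc, uint32_t delta) {
    return vuqadds_s32(acc, delta);
  }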
Reviewers: LukeCheeseman, john.brawn
Reviewed By: john.brawn
Subscribers: john.brawn, javed.absar, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D64242
llvm-svn: 365300
Summary:
Change the vsqadd scalar intrinsics to have the second argument as signed values, not unsigned,
according to https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics
The existing unsigned argument can cause faulty code, as a negative float-to-unsigned conversion is
undefined, which llvm/clang optimizes away.
Reviewers: LukeCheeseman, john.brawn
Reviewed By: john.brawn
Subscribers: john.brawn, javed.absar, kristof.beyls, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D64239
llvm-svn: 365298
Summary:
Prior to r329065, we used [-max, max] as the range of representable
values because LLVM's `fptrunc` did not guarantee defined behavior when
truncating from a larger floating-point type to a smaller one. Now that
has been fixed, we can make clang follow normal IEEE 754 semantics in this
regard and take the larger range [-inf, +inf] as the range of representable
values.
In practice, this affects two parts of the frontend:
* the constant evaluator no longer treats floating-point evaluations
that result in +-inf as being undefined (because they no longer leave
the range of representable values of the type)
* UBSan no longer treats conversions to floating-point type that are
outside the [-max, +max] range as being undefined
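As a hedged illustration of the second point (the value is illustrative):

  /* 1e39 exceeds float's finite range; with this change the truncation
     simply yields +inf per IEEE 754 and is no longer reported by
     -fsanitize=float-cast-overflow, nor rejected by the constant evaluator. */
  const float f = (float)1e39;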
In passing, also remove the float-divide-by-zero sanitizer from
-fsanitize=undefined, on the basis that while it's undefined per C++
rules (and we disallow it in constant expressions for that reason), it
is defined by Clang / LLVM / IEEE 754.
Reviewers: rnk, BillyONeal
Subscribers: cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D63793
llvm-svn: 365272
Summary:
"ww" and "ws" are both constraint codes for VSX vector registers that
hold scalar double data. "ww" is preferred for float while "ws" is
preferred for double.
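A hedged inline-asm sketch; the instruction choice is illustrative only, and %x prints the full VSX register number:

  double negate(double x) {
    double r;
    /* "ws" selects a VSX register holding a scalar double; "ww" would
       be the preferred choice for a float operand. */
    asm("xsnegdp %x0, %x1" : "=ws"(r) : "ws"(x));
    return r;
  }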
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D64119
llvm-svn: 365106
Attach a unique DISubprogram to a function declaration that will be
used for call site debug info.
([7/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D60714
llvm-svn: 364502
Summary:
The changes in D59673 made the choice redundant, since we can achieve
single-file split DWARF just by not setting an output file name.
Like llc we can also derive whether to enable Split DWARF from whether
-split-dwarf-file is set, so we don't need the flag at all anymore.
The test CodeGen/split-debug-filename.c distinguished between having set
or not set -enable-split-dwarf with -split-dwarf-file, but we can
probably just always emit the metadata into the IR.
The flag -split-dwarf wasn't used at all anymore.
Reviewers: dblaikie, echristo
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D63167
llvm-svn: 364479
Emit the debug info flag that indicates that a parameter has unchanged
value throughout a function.
([5/13] Introduce the debug entry values.)
Co-authored-by: Ananth Sowda <asowda@cisco.com>
Co-authored-by: Nikola Prica <nikola.prica@rt-rk.com>
Co-authored-by: Ivan Baev <ibaev@cisco.com>
Differential Revision: https://reviews.llvm.org/D58035
llvm-svn: 364424
"To" selects an odd-numbered GPR, and "Te" an even one. There are some
8.1-M instructions that have one too few bits in their register fields
and require registers of particular parity, without necessarily using
a consecutive even/odd pair.
Also, the constraint letter "t" should select an MVE q-register, when
MVE is present. This didn't need any source changes, but some extra
tests have been added.
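A hedged sketch of the new constraints in use; the instruction is ordinary and only illustrates the register-parity requirement:

  int parity_demo(int a, int b) {
    int r;
    /* "Te" requests an even-numbered GPR, "To" an odd-numbered one. */
    asm("add %0, %1, %2" : "=Te"(r) : "Te"(a), "To"(b));
    return r;
  }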
Reviewers: dmgreen, samparker, SjoerdMeijer
Subscribers: javed.absar, eraman, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60709
llvm-svn: 364331
This patch ensures that SimplifyCFGPass comes before SampleProfileLoaderPass
on PGO runs in the new PM and fixes clang/test/CodeGen/pgo-sample.c.
Differential Revision: https://reviews.llvm.org/D63626
llvm-svn: 364201
Unnamed bit-fields should not be represented in the TBAA metadata
because they do not represent storage fields (they only affect layout).
Zero-sized fields should not be represented in the TBAA metadata
because by definition they have no associated storage (so we will never
emit a load or store through them), and they might not appear in
declaration order within the struct layout.
Fixes a verifier failure when emitting a TBAA-enabled load through a
class type containing a zero-sized field.
llvm-svn: 364140
_MM_FROUND_CUR_DIRECTION is the behavior of the intrinsics that
don't take a rounding mode argument. So a better test
is using _MM_FROUND_NO_EXC with the SAE only intrinsics and
an explicit rounding mode with the intrinsics that support
embedded rounding mode.
llvm-svn: 364127
As per the discussion on D58375, we disable tests that have optimizations under
the new PM. This patch adds -fno-experimental-new-pass-manager to RUNS that:
- Already run with optimizations (-O1 or higher) that were missed in D58375.
- Explicitly test new PM behavior along side some new PM RUNS, but are missing
this flag if new PM is enabled by default.
- Specify -O without the number. Based on getOptimizationLevel(), it seems the
default is 2, and the IR appears to be the same when changed to -O2, so
update the test to explicitly say -O2 and provide -fno-experimental-new-pass-manager.
Differential Revision: https://reviews.llvm.org/D63156
llvm-svn: 364066
This fixes CodeGen/available-externally-suppress.c when the new pass manager is
turned on by default. available_externally was not emitted during -O2 -flto
runs when it should still be retained for link time inlining purposes. This can
be fixed by checking that we aren't LTOPrelinking when adding the
EliminateAvailableExternallyPass.
Differential Revision: https://reviews.llvm.org/D63580
llvm-svn: 363971
This fixes CodeGen/x86_64-instrument-functions.c when running under the new
pass manager. The pass should run before any other pass so that inlining does
not cause `__cyg_profile_func_enter/exit()` calls to be dropped from inlined functions.
Differential Revision: https://reviews.llvm.org/D63577
llvm-svn: 363969
These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.
Make the 512-bit versions also explicit aliases instead of copy
pasting the code.
llvm-svn: 363961
- CodeGen/flatten.c will fail under the new PM because the new PM AlwaysInliner
seems to intentionally inline functions but not call sites marked with
alwaysinline (D23299)
- Tests that check remarks happen to check them for the inliner which is not
turned on at O0. These tests just check that remarks work, but we can make
separate tests for the new PM with -O1 so we can turn on the inliner and
check the remarks with minimal changes.
Differential Revision: https://reviews.llvm.org/D62225
llvm-svn: 363846
This introduced MMX instructions in code that wasn't previously using
them, breaking programs using 64-bit vectors and x87 floating-point in
the same application. See discussion on the code review for more
details.
> According to System V i386 ABI: the __m64 type parameter and return
> value are passed by MMX registers. But current implementation treats
> __m64 as i64 which results in parameter passing by stack and returning
> by EDX and EAX.
>
> This patch fixes the bug (https://bugs.llvm.org/show_bug.cgi?id=41029)
> for Linux and NetBSD.
>
> Patch by Wei Xiao (wxiao3)
>
> Differential Revision: https://reviews.llvm.org/D59744
llvm-svn: 363790
Summary:
When a function argument or return type is a homogeneous aggregate
which contains an FP16 vector but the target does not support FP16
operations natively, the type must be converted into an array of
integer vectors by the front end (otherwise LLVM will handle FP16
vectors incorrectly by scalarizing them and promoting FP16 to float,
see https://reviews.llvm.org/D50507).
Currently the logic for checking whether or not a given homogeneous
aggregate contains FP16 vectors is incorrect: it only looks at the
type of the first vector.
This patch fixes the issue by adding a new method
ARMABIInfo::containsAnyFP16Vectors and using it. The traversal logic
of this method is largely the same as in
ABIInfo::isHomogeneousAggregate.
Reviewers: eli.friedman, olista01, ostannard
Reviewed By: ostannard
Subscribers: ostannard, john.brawn, javed.absar, kristof.beyls, pbarrio, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D63437
llvm-svn: 363687
Use -fsave-optimization-record=<format> to specify a different format
than the default, which is YAML.
For now, only YAML is supported.
llvm-svn: 363573
Summary:
With Split DWARF the resulting object file (then called skeleton CU)
contains the file name of another ("DWO") file with the debug info.
This can be a problem for remote compilation, as it will contain the
name of the file on the compilation server, not on the client.
To use Split DWARF with remote compilation, one needs to either
* make sure only relative paths are used, and mirror the build directory
structure of the client on the server,
* inject the desired file name on the client directly.
Since llc already supports the latter solution, we're just copying that
over. We allow setting the actual output filename separately from the
value of the DW_AT_[GNU_]dwo_name attribute in the skeleton CU.
Fixes PR40276.
Reviewers: dblaikie, echristo, tejohnson
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D59673
llvm-svn: 363496
Summary:
This is the first in a series of changes trying to align clang -cc1
flags for Split DWARF with those of llc. The unfortunate side effect of
having -split-dwarf-output for single file Split DWARF will disappear
again in a subsequent change.
The change is the result of a discussion in D59673.
Reviewers: dblaikie, echristo
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D63130
llvm-svn: 363494
ARM has a special target feature called soft-float-abi. This feature is
special, since we get it passed to us explicitly in the frontend, but
filter it out before it can land in any target feature strings in LLVM
IR.
__attribute__((target(""))) doesn't quite filter these features out
properly, so today, we get warnings about soft-float-abi being an
unknown feature from the backend.
This CL has us filter soft-float-abi out at a slightly different point,
so we don't end up passing these invalid features to the backend.
Differential Revision: https://reviews.llvm.org/D61750
llvm-svn: 363346
Add an AssumptionCache callback to the InlineFunctionInfo used for the
AlwaysInlinerPass, to match the codegen of the AlwaysInlinerLegacyPass and generate
llvm.assume. This fixes CodeGen/builtin-movdir.c when the new PM is enabled by
default.
Differential Revision: https://reviews.llvm.org/D63170
llvm-svn: 363287
This contains the part of D62225 which fixes CodeGen/split-debug-single-file.c
by not placing .dwo sections when using -enable-split-dwarf=split.
Differential Revision: https://reviews.llvm.org/D63168
llvm-svn: 363281
Port emmintrin.h, which includes the Intel SSE2 intrinsics implementation, to the PowerPC platform (using Altivec).
The new headers containing those implementations are located in a directory named ppc_wrappers,
which has higher priority when the platform is PowerPC on Linux. They were mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
It's a follow-up patch of D62121.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Differential Revision: https://reviews.llvm.org/D62569
llvm-svn: 363122
According to the System V i386 ABI, the __m64 type parameter and return
value are passed by MMX registers. But the current implementation treats
__m64 as i64 which results in parameter passing by stack and returning
by EDX and EAX.
This patch fixes the bug (https://bugs.llvm.org/show_bug.cgi?id=41029)
for Linux and NetBSD.
Patch by Wei Xiao (wxiao3)
Differential Revision: https://reviews.llvm.org/D59744
llvm-svn: 363116
Two recently added tests mention complications for cross-compile, but
they do not actually enforce native compilation. This patch makes them
require native compilation to avoid the complications they mention.
llvm-svn: 363070
Change D60691 caused some knock-on failures that weren't caught by the
existing tests. Firstly, selecting a CPU that should have had a
restricted FPU (e.g. `-mcpu=cortex-m4`, which should have 16 d-regs
and no double precision) could give the unrestricted version, because
`ARM::getFPUFeatures` returned a list of features including subtracted
ones (here `-fp64`,`-d32`), but `ARMTargetInfo::initFeatureMap` threw
away all the ones that didn't start with `+`. Secondly, the
preprocessor macros didn't reliably match the actual compilation
settings: for example, `-mfpu=softvfp` could still set `__ARM_FP` as
if hardware FP was available, because the list of features on the cc1
command line would include things like `+vfp4`,`-vfp4d16` and clang
didn't realise that one of those cancelled out the other.
I've fixed both of these issues by rewriting `ARM::getFPUFeatures` so
that it returns a list that enables every FP-related feature
compatible with the selected FPU and disables every feature not
compatible, which is more verbose but means clang doesn't have to
understand the dependency relationships between the backend features.
Meanwhile, `ARMTargetInfo::handleTargetFeatures` is testing for all
the various forms of the FP feature names, so that it won't miss cases
where it should have set `HW_FP` to feed into feature test macros.
That in turn caused an ordering problem when handling `-mcpu=foo+bar`
together with `-mfpu=something_that_turns_off_bar`. To fix that, I've
arranged that the `+bar` suffixes on the end of `-mcpu` and `-march`
cause feature names to be put into a separate vector which is
concatenated after the output of `getFPUFeatures`.
Another side effect of all this is to fix a bug where `clang -target
armv8-eabi` by itself would fail to set `__ARM_FEATURE_FMA`, even
though `armv8` (aka Arm v8-A) implies FP-Armv8 which has FMA. That was
because `HW_FP` was being set to a value including only the `FPARMV8`
bit, but that feature test macro was testing only the `VFP4FPU` bit.
Now `HW_FP` ends up with all the bits set, so it gives the right
answer.
Changes to tests included in this patch:
* `arm-target-features.c`: I had to change basically all the expected
results. (The Cortex-M4 test in there should function as a
regression test for the accidental double-precision bug.)
* `arm-mfpu.c`, `armv8.1m.main.c`: switched to using `CHECK-DAG`
everywhere so that those tests are no longer sensitive to the order
of cc1 feature options on the command line.
* `arm-acle-6.5.c`: been updated to expect the right answer to that
FMA test.
* `Preprocessor/arm-target-features.c`: added a regression test for
the `mfpu=softvfp` issue.
Reviewers: SjoerdMeijer, dmgreen, ostannard, samparker, JamesNagurne
Reviewed By: ostannard
Subscribers: srhines, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D62998
llvm-svn: 362791
These builtins should work with immediate or variable shift operand for
gcc compatibility.
Differential Revision: https://reviews.llvm.org/D62850
llvm-svn: 362786
LLVM IR recently added a Type parameter to the byval Attribute, so that
when pointers become opaque and no longer have an element type the
information will still be present in IR.
For now the Type parameter is optional (which is why Clang didn't need
this change at the time), but it will become mandatory soon.
llvm-svn: 362652
Tests that use -O1, -O2 and -O3 would often produce different results
with the new pass manager which makes these tests fail. Disable new PM
explicitly for these tests.
Differential Revision: https://reviews.llvm.org/D58375
llvm-svn: 362580
Summary: According to the C99 standard, long long is at least 64 bits in
size. However, OpenCL C defines long long as a 128-bit signed
integer. This prevents one from using x86 builtins when compiling OpenCL C
code for x86 targets. The patch changes long long to long for OpenCL
only.
Patch by: Alexander Batashev <alexander.batashev@intel.com>
Reviewers: craig.topper, Ka-Ka, eandrews, erichkeane, Anastasia
Reviewed By: Ka-Ka, erichkeane, Anastasia
Subscribers: a.elovikov, yaxunl, Anastasia, cfe-commits, ivankara, etyurin, asavonic
Tags: #clang
Differential Revision: https://reviews.llvm.org/D62580
llvm-svn: 362391
The recent change D60691 introduced a bug in clang when handling
option combinations such as `-mcpu=cortex-m4 -mfpu=none`. Those
options together should select Cortex-M4 but disable all use of
hardware FP, but in fact, now hardware FP instructions can still be
generated in that mode.
The reason is that the handling of FPUVersion::NONE disables all
the same feature names it used to, of which the base one is `vfp2`.
But now there are further features below that, like `vfp2d16fp` and
(following D60694) `fpregs`, which also need to be turned off to
disable hardware FP completely.
Added a tiny test which double-checks that compiling a simple FP
function doesn't access the FP registers.
Reviewers: SjoerdMeijer, dmgreen
Reviewed By: dmgreen
Subscribers: lebedev.ri, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D62729
llvm-svn: 362380
Port xmmintrin.h, which includes the Intel SSE intrinsics implementation, to the PowerPC platform (using Altivec).
The new headers containing those implementations are located in a directory named ppc_wrappers,
which has higher priority when the platform is PowerPC on Linux. They were mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D62121
llvm-svn: 362190
Since byval is now a typed attribute it gets sorted slightly differently by
LLVM when the order of attributes is being canonicalized. This updates the few
Clang tests that depend on the old order.
Clang patch is unchanged.
llvm-svn: 362129
Syntax:
  asm [volatile] goto ( AssemblerTemplate
                      :
                      : InputOperands
                      : Clobbers
                      : GotoLabels)
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
A new LLVM IR instruction, "callbr", is used for inline asm goto instead of "call" for plain inline asm.
For:
asm goto("testl %0, %0; jne %l1;" :: "r"(cond)::label_true, loop);
IR:
callbr void asm sideeffect "testl $0, $0; jne ${1:l};", "r,X,X,~{dirflag},~{fpsr},~{flags}"(i32 %0, i8* blockaddress(@foo, %label_true), i8* blockaddress(@foo, %loop)) #1
to label %asm.fallthrough [label %label_true, label %loop], !srcloc !3
asm.fallthrough:
The compiler needs to generate:
1> a dummy constraint 'X' for each label.
2> a unique fallthrough label for each asm goto statement ("asm.fallthrough%number").
Diagnostics:
1> duplicate asm operand names used in outputs, inputs and labels.
2> goto out of scope.
llvm-svn: 362045
Since byval is now a typed attribute it gets sorted slightly differently by
LLVM when the order of attributes is being canonicalized. This updates the few
Clang tests that depend on the old order.
llvm-svn: 362013
The `__builtin_msa_ctcmsa` and `__builtin_msa_cfcmsa` builtins are mapped
to the `ctcmsa` and `cfcmsa` instructions respectively. While MSA
control registers have indices in the 0..7 range, the instructions accept a
register index in the 0..31 range [1].
[1] MIPS Architecture for Programmers Volume IV-j:
The MIPS64 SIMD Architecture Module
https://www.mips.com/?do-download=the-mips64-simd-architecture-module
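A hedged usage sketch; control register index 1 is assumed to be MSACSR, and the index must be a compile-time constant:

  int save_msacsr(void) {
    return __builtin_msa_cfcmsa(1);   /* copy from MSA control register 1 */
  }

  void restore_msacsr(int value) {
    __builtin_msa_ctcmsa(1, value);   /* copy to MSA control register 1 */
  }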
llvm-svn: 361967
According to i386 System V ABI 2.1: Structures and unions assume the
alignment of their most strictly aligned component. But the current
implementation always treats them as 4-byte aligned, which results
in incorrect code, e.g.:
 1  #include <immintrin.h>
 2  typedef union {
 3    int d[4];
 4    __m128 m;
 5  } M128;
 6  extern void foo(int, ...);
 7  void test(void)
 8  {
 9    M128 a;
10    foo(1, a);
11    foo(1, a.m);
12  }
The first call (line 10) takes the second arg as 4-byte aligned while
the second call (line 11) takes the second arg as 16-byte aligned.
The alignments of the two calls are contradictory, because they
should be the same.
This patch fixes the bug by following the i386 System V ABI, and applies it to
Linux only, since other System V OSes (e.g. Darwin, PS4 and FreeBSD) don't
want to spend any effort dealing with the ramifications of ABI breaks
at present.
Patch by Wei Xiao (wxiao3)
Differential Revision: https://reviews.llvm.org/D60748
llvm-svn: 361934
Port xmmintrin.h, which includes the Intel SSE intrinsics implementation, to the PowerPC platform (using Altivec).
The new headers containing those implementations are located in a directory named ppc_wrappers,
which has higher priority when the platform is PowerPC on Linux. They were mainly developed by Steven Munroe,
with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
Patched by: Qiu Chaofan <qiucf@cn.ibm.com>
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D62121
llvm-svn: 361928
As for other floating-point rounding builtins that can be optimized
when built with -fno-math-errno, this patch adds support for lrint
and llrint. It currently only optimizes for the AArch64 backend.
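A small sketch of code that can now avoid the libm calls when built with -fno-math-errno:

  #include <math.h>

  long      to_long(double x)  { return lrint(x); }   /* -> llvm.lrint  */
  long long to_llong(double x) { return llrint(x); }  /* -> llvm.llrint */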
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D62019
llvm-svn: 361878
Those two subtarget features were awkward because their semantics are
reversed: each one indicates the _lack_ of support for something in
the architecture, rather than the presence. As a consequence, you
don't get the behavior you want if you combine two sets of feature
bits.
Each SubtargetFeature for an FP architecture version now comes in four
versions, one for each combination of those options. So you can still
say (for example) '+vfp2' in a feature string and it will mean what
it's always meant, but there's a new string '+vfp2d16sp' meaning the
version without those extra options.
A lot of this change is just mechanically replacing positive checks
for the old features with negative checks for the new ones. But one
more interesting change is that I've rearranged getFPUFeatures() so
that the main FPU feature is appended to the output list *before*
rather than after the features derived from the Restriction field, so
that -fp64 and -d32 can override defaults added by the main feature.
Reviewers: dmgreen, samparker, SjoerdMeijer
Subscribers: srhines, javed.absar, eraman, kristof.beyls, hiraditya, zzheng, Petar.Avramovic, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D60691
llvm-svn: 361845
Summary:
NewPassManager is not using CodeGenOpts values before this patch.
[to be coupled with D61616]
Reviewers: chandlerc
Subscribers: jlebar, cfe-commits, llvm-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61617
llvm-svn: 361534
Overaligned and underaligned types (i.e. types where the alignment has been
increased or decreased using the aligned and packed attributes) weren't being
correctly handled in all cases, as the unadjusted alignment should be used.
This patch also adjusts getTypeUnadjustedAlign to correctly handle typedefs of
non-aggregate types, which it appears it never had to handle before.
Differential Revision: https://reviews.llvm.org/D62152
llvm-svn: 361372
We shouldn't really make assumptions about possible sizes for long and long long. And longer term we should probably support vectorizing these intrinsics. By making the result types not fixed we can support vectors as well.
Differential Revision: https://reviews.llvm.org/D62026
llvm-svn: 361169
This patch implements a limited form of autolinking primarily designed to allow
either the --dependent-library compiler option, or "comment lib" pragmas (
https://docs.microsoft.com/en-us/cpp/preprocessor/comment-c-cpp?view=vs-2017) in
C/C++ e.g. #pragma comment(lib, "foo"), to cause an ELF linker to automatically
add the specified library to the link when processing the input file generated
by the compiler.
Currently this extension is unique to LLVM and LLD. However, care has been taken
to design this feature so that it could be supported by other ELF linkers.
The design goals were to provide:
- A simple linking model for developers to reason about.
- The ability to override autolinking from the linker command line.
- Source code compatibility, where possible, with "comment lib" pragmas in other
environments (MSVC in particular).
Dependent library support is implemented differently for ELF platforms than on
the other platforms. Primarily this difference is that on ELF we pass the
dependent library specifiers directly to the linker without manipulating them.
This is in contrast to other platforms where they are mapped to a specific
linker option by the compiler. This difference is a result of the greater
variety of ELF linkers and the fact that ELF linkers tend to handle libraries in
a more complicated fashion than on other platforms. This forces us to defer
handling the specifiers to the linker.
In order to achieve a level of source code compatibility with other platforms
we have restricted this feature to work with libraries that meet the following
"reasonable" requirements:
1. There are no competing defined symbols in a given set of libraries, or
if they exist, the program owner doesn't care which is linked to their
program.
2. There may be circular dependencies between libraries.
The binary representation is a mergeable string section (SHF_MERGE,
SHF_STRINGS), called .deplibs, with custom type SHT_LLVM_DEPENDENT_LIBRARIES
(0x6fff4c04). The compiler forms this section by concatenating the arguments of
the "comment lib" pragmas and --dependent-library options in the order they are
encountered. Partial (-r, -Ur) links are handled by concatenating .deplibs
sections with the normal mergeable string section rules. As an example, #pragma
comment(lib, "foo") would result in:
.section ".deplibs","MS",@llvm_dependent_libraries,1
.asciz "foo"
For LTO, equivalent information to the contents of the .deplibs section can be
retrieved by LLD for bitcode input files.
LLD processes the dependent library specifiers in the following way:
1. Dependent libraries which are found from the specifiers in .deplibs sections
of relocatable object files are added when the linker decides to include that
file (which could itself be in a library) in the link. Dependent libraries
behave as if they were appended to the command line after all other options. As
a consequence the set of dependent libraries are searched last to resolve
symbols.
2. It is an error if a file cannot be found for a given specifier.
3. Any command line options in effect at the end of the command line parsing apply
to the dependent libraries, e.g. --whole-archive.
4. The linker tries to add a library or relocatable object file from each of the
strings in a .deplibs section by; first, handling the string as if it was
specified on the command line; second, by looking for the string in each of the
library search paths in turn; third, by looking for a lib<string>.a or
lib<string>.so (depending on the current mode of the linker) in each of the
library search paths.
5. A new command line option --no-dependent-libraries tells LLD to ignore the
dependent libraries.
Rationale for the above points:
1. Adding the dependent libraries last makes the process simple to understand
from a developer's perspective. All linkers are able to implement this scheme.
2. Reporting an error for libraries that are not found seems like better behavior
than failing the link during symbol resolution.
3. It seems useful for the user to be able to apply command line options which
will affect all of the dependent libraries. There is a potential problem of
surprise for developers, who might not realize that these options would apply
to these "invisible" input files; however, despite the potential for surprise,
this is easy for developers to reason about and gives developers the control
that they may require.
4. This algorithm takes into account all of the different ways that ELF linkers
find input files. The linker tries the different search methods in order, from
most obvious to least obvious.
5. I considered adding finer-grained control over which dependent libraries were
ignored (e.g. MSVC has /nodefaultlib:<library>); however, I concluded that this
is not necessary: if finer control is required, developers can fall back to
using the command line directly.
RFC thread: http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html.
Differential Revision: https://reviews.llvm.org/D60274
llvm-svn: 360984
As with other floating-point rounding builtins that can be optimized when built
with -fno-math-errno, this patch adds support for lround and llround. It
currently only optimizes for the AArch64 backend.
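A minimal sketch of the kind of code this affects (assuming -fno-math-errno; the intrinsic lowering is currently only exploited on AArch64):
```
#include <math.h>

// With -fno-math-errno these calls can be emitted as the llvm.lround /
// llvm.llround intrinsics instead of libcalls.
long to_long(double x)      { return lround(x); }
long long to_llong(float x) { return llroundf(x); }
```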
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61392
llvm-svn: 360896
Summary:
The definitions of the builtins __builtin_bswap32, __builtin_bitreverse32, __builtin_rotateleft32 and __builtin_rotateright32 rely on the int type being 32 bits wide on the target.
The definitions of the builtins __builtin_bswap64, __builtin_bitreverse64, __builtin_rotateleft64, and __builtin_rotateright64 rely on the long long type being 64 bits wide.
On targets where this is not the case (e.g. AVR) clang will generate faulty code (the wrong LLVM intrinsics).
This patch adds support for using 'Z' (the int32_t type) in Builtins.def. The builtins above are changed to be based on the int32_t type instead of the int type, and the int64_t type instead of the long long type.
The AVR backend (experimental) has a native int type that is only 16 bits wide. The supplied testcase would therefore fail on trunk, as clang converts e.g. __builtin_bitreverse32 into llvm.bitreverse.i16 on AVR.
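An illustration of why the prototypes matter (the intrinsic names in the comments are for explanation only):
```
#include <stdint.h>

// On a target whose int is only 16 bits (e.g. AVR) these must still operate
// on 32-/64-bit values, hence the int32_t/int64_t-based prototypes in
// Builtins.def.
uint32_t swap32(uint32_t x) { return __builtin_bswap32(x); }          // llvm.bswap.i32
uint64_t rotl8(uint64_t x)  { return __builtin_rotateleft64(x, 8); }  // funnel-shift intrinsic
```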
Reviewers: dylanmckay, spatel, rsmith, efriedma
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D61845
llvm-svn: 360863
Port the hardware-assisted address sanitizer to the new PM following the same guidelines as msan and tsan.
Changes:
- Separate HWAddressSanitizer into a pass class and a sanitizer class.
- Create new PM wrapper pass for the sanitizer class.
- Use the getOrInsert pattern for some module level initialization declarations.
- Also enable kernel-hwasan in the new PM
- Update llvm tests and add clang test.
Differential Revision: https://reviews.llvm.org/D61709
llvm-svn: 360707
> extension allowing a "static" declaration to follow an "extern"
> declaration to stop working.
It introduced asserts for some "static-following-extern" cases, breaking the
Chromium build. See the cfe-commits thread for reproducer.
llvm-svn: 360657
Summary:
A COFF stub indirects the reference to a symbol through memory. A
.refptr.$sym global variable pointer is created to refer to $sym.
Typically mingw uses these for external global variable declarations,
but we can use them for weak function declarations as well.
Updates the dso_local classification to add a special case for
extern_weak symbols on COFF in both clang and LLVM.
Fixes PR37598
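A rough sketch of the case this covers (maybe_present is a hypothetical weak declaration; whether the reference goes through a .refptr stub depends on the dso_local classification described above):
```
// A weak, undefined function declaration lowers to an extern_weak symbol.
__attribute__((weak)) void maybe_present(void);

int call_if_present(void) {
  if (maybe_present) {   // null unless a definition is linked in
    maybe_present();
    return 1;
  }
  return 0;
}
```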
Reviewers: smeenai, mstorsjo
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D61615
llvm-svn: 360207
Keep looking for decl-specifiers after an unknown identifier. Don't
issue diagnostics about an error type specifier conflicting with later
type specifiers.
llvm-svn: 360117
This matches the behavior of the old pass manager. There are some
targets that don't have a target machine at all (e.g. le32, spir), whose tests
would never run with the new pass manager. Similarly, we would need
to disable tests for targets that are disabled.
Differential Revision: https://reviews.llvm.org/D58374
llvm-svn: 360100
In MinGW, setjmp isn't expanded as a builtin in the compiler (like it
is for MSVC), but manually hooked up as calls to the right underlying
functions in headers. Using the actual CRT's real setjmp/longjmp
functions requires this intrinsic. (Currently this is worked around by
using MinGW specific reimplementations of setjmp/longjmp on aarch64.)
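For context, plain setjmp/longjmp usage of the kind affected (on MinGW these resolve to the CRT's real functions via headers rather than a builtin expansion):
```
#include <setjmp.h>

static jmp_buf env;

int run(void) {
  if (setjmp(env) != 0)
    return 1;       // reached via the longjmp below
  longjmp(env, 1);  // transfers control back to the setjmp above
}
```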
Differential Revision: https://reviews.llvm.org/D61592
llvm-svn: 360082
Summary:
1. Enable the infrastructure for AVX512_BF16, which supports BFLOAT16 on Cooper Lake;
2. Enable intrinsics for the VCVTNE2PS2BF16, VCVTNEPS2BF16 and DPBF16PS instructions: Vector Neural Network Instructions supporting BFLOAT16 inputs, and conversion instructions from IEEE single precision.
For more details about BF16 intrinsic, please refer to the latest ISE document: https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference
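A hedged sketch of how the new instructions are typically reached from C/C++ (the intrinsic and type names such as __m512bh and _mm512_dpbf16_ps are assumptions based on the usual naming, not quoted from this patch; the AVX512_BF16 feature must be enabled):
```
#include <immintrin.h>

__m512 bf16_dot(__m512 acc, __m512 a, __m512 b, __m512 c, __m512 d) {
  __m512bh ab = _mm512_cvtne2ps_pbh(a, b); // VCVTNE2PS2BF16: fp32 pairs -> bf16
  __m512bh cd = _mm512_cvtne2ps_pbh(c, d);
  return _mm512_dpbf16_ps(acc, ab, cd);    // DPBF16PS: bf16 dot product into fp32
}
```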
Patch by LiuTianle
Reviewers: craig.topper, smaslov, LuoYuanke, wxiao3, annita.zhang, spatel, RKSimon
Reviewed By: craig.topper
Subscribers: mgorny, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D60552
llvm-svn: 360018
According to the alignment section of the ARM64 ABI document linked below, MSVC may
increase the alignment of global data based on its total size. Clang doesn't do this.
Compiling the same symbol with different alignments in Clang and MSVC can cause a
link error because some instruction encodings, like 64-bit LDR/STR with immediate,
require the target to be 8 bytes aligned, and the linker could combine a code stream
containing such an LDR/STR instruction from MSVC with 4-byte-aligned data from Clang
in the final image, which cannot actually be linked together
(see https://bugs.llvm.org/show_bug.cgi?id=41506 for more details).
https://docs.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=vs-2019#alignment
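An illustrative global that hits the divergence (assuming the MSVC rule above applies to an 8-byte object):
```
// MSVC would align this 8-byte global to 8 on ARM64, allowing a 64-bit
// LDR/STR with immediate to address it; Clang previously kept it 4-byte
// aligned, and mixing the two at link time fails.
struct Pair { int a, b; };
Pair g_pair = {1, 2};
```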
Differential Revision: https://reviews.llvm.org/D61225
llvm-svn: 359744
Summary:
C guarantees that brace-init with fewer initializers than members in the
aggregate will initialize the rest of the aggregate as-if it were static
initialization. In turn static initialization guarantees that padding is
initialized to zero bits.
Quoth the Standard:
C17 6.7.9 Initialization ❡21
If there are fewer initializers in a brace-enclosed list than there are elements
or members of an aggregate, or fewer characters in a string literal used to
initialize an array of known size than there are elements in the array, the
remainder of the aggregate shall be initialized implicitly the same as objects
that have static storage duration.
C17 6.7.9 Initialization ❡10
If an object that has automatic storage duration is not initialized explicitly,
its value is indeterminate. If an object that has static or thread storage
duration is not initialized explicitly, then:
* if it has pointer type, it is initialized to a null pointer;
* if it has arithmetic type, it is initialized to (positive or unsigned) zero;
* if it is an aggregate, every member is initialized (recursively) according to
these rules, and any padding is initialized to zero bits;
* if it is a union, the first named member is initialized (recursively)
according to these rules, and any padding is initialized to zero bits;
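A small illustration of the guarantee quoted above (C semantics; the values in the comment follow C17 6.7.9):
```
struct S { char c; /* implicit padding */ int i; long tail; };

void init_demo(void) {
  struct S s = { 'a' };  // s.i == 0, s.tail == 0, and the padding is zero bits
  (void)s;
}
```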
<rdar://problem/50188861>
Reviewers: glider, pcc, kcc, rjmccall, erik.pilkington
Subscribers: jkorous, dexonsmith, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D61280
llvm-svn: 359628
Summary:
This patch adds support for __builtin_dcbf for PPC.
__builtin_dcbf copies the contents of a modified block from the data cache
to main memory and flushes the copy from the data cache.
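A minimal usage sketch (the publish() helper is illustrative, not from the patch):
```
// Write the data, then flush the containing cache block back to main memory.
void publish(int *slot, int value) {
  *slot = value;
  __builtin_dcbf(slot);  // copy the modified block to memory, flush the cached copy
}
```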
Differential revision: https://reviews.llvm.org/D59843
llvm-svn: 359517
us emitting the operand of __builtin_constant_p if it has side-effects.
Original commit message:
Fix interactions between __builtin_constant_p and constexpr to match
current trunk GCC.
GCC permits information from outside the operand of
__builtin_constant_p (but in the same constant evaluation context) to be
used within that operand; clang now does so too. A few other minor
deviations from GCC's behavior showed up in my testing and are also
fixed (matching GCC):
* Clang now supports nullptr_t as the argument type for
__builtin_constant_p
* Clang now returns true from __builtin_constant_p if called with a
null pointer
* Clang now returns true from __builtin_constant_p if called with an
integer cast to pointer type
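Small examples of the behaviors listed above (expected results, matching GCC, are in the comments):
```
int p_null(void) { return __builtin_constant_p((void *)0); }  // now returns 1
int p_cast(void) { return __builtin_constant_p((int *)42); }  // now returns 1
```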
llvm-svn: 359367
This provides intrinsics support for Memory Tagging Extension (MTE),
which was introduced with the Armv8.5-a architecture.
These intrinsics are available when __ARM_FEATURE_MEMORY_TAGGING is defined.
Each intrinsic is described in detail in the ACLE Q1 2019 documentation:
https://developer.arm.com/docs/101028/latest
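A short sketch using two of the intrinsics (names follow the ACLE document referenced above; available from arm_acle.h when __ARM_FEATURE_MEMORY_TAGGING is defined):
```
#include <arm_acle.h>

void *tag_granule(void *p) {
#ifdef __ARM_FEATURE_MEMORY_TAGGING
  void *tagged = __arm_mte_create_random_tag(p, 0); // choose a random tag for p
  __arm_mte_set_tag(tagged);                        // store that tag for p's granule
  return tagged;
#else
  return p;
#endif
}
```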
Reviewed By: Tim Northover, David Spickett
Differential Revision: https://reviews.llvm.org/D60485
llvm-svn: 359348
This reverts r359250 (git commit 4730604bd3)
The newly added test should use -cc1 and -emit-llvm and there are other
test failures that need fixing.
llvm-svn: 359251
Statically link certain runtime library functions for MSVC/GNU Windows
environments. This is consistent with MSVC behavior.
Fixes LNK4286 and LNK4217 warnings from link.exe when linking the static
CRT:
LINK : warning LNK4286: symbol '__std_terminate' defined in 'libvcruntime.lib(ehhelpers.obj)' is imported by 'ASAN_NOINST_TEST_OBJECTS.asan_noinst_test.cc.x86_64-calls.o'
LINK : warning LNK4286: symbol '__std_terminate' defined in 'libvcruntime.lib(ehhelpers.obj)' is imported by 'ASAN_NOINST_TEST_OBJECTS.asan_test_main.cc.x86_64-calls.o'
LINK : warning LNK4217: symbol '_CxxThrowException' defined in 'libvcruntime.lib(throw.obj)' is imported by 'ASAN_NOINST_TEST_OBJECTS.gtest-all.cc.x86_64-calls.o' in function '"int `public: static class UnitTest::GetInstance * __cdecl testing::UnitTest::GetInstance(void)'::`1'::dtor$5" (?dtor$5@?0??GetInstance@UnitTest@testing@@SAPEAV12@XZ@4HA)'
Reviewers: mstorsjo, efriedma, TomTan, compnerd, smeenai, mgrang
Subscribers: abdulras, theraven, smeenai, pcc, mehdi_amini, javed.absar, inglorion, kristof.beyls, dexonsmith, cfe-commits
Differential Revision: https://reviews.llvm.org/D55229
llvm-svn: 359250
These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
provided by CUDA-10.x on sm_75 (AKA Turing) GPUs.
Also added a feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to be able to compile code that uses the new 'mma'
instruction as inline assembly (e.g. used by NVIDIA's CUTLASS library
https://github.com/NVIDIA/cutlass/blob/master/cutlass/arch/mma.h#L101)
Differential Revision: https://reviews.llvm.org/D60279
llvm-svn: 359248
Currently InstrProf lowering is not enabled for Clang PGO instrumentation in
the new pass manager. The following command
"-fprofile-instr-generate -fexperimental-new-pass-manager ..." is broken.
This CL enables InstrProf lowering pass for Clang PGO instrumentation in the
new pass manager.
Differential Revision: https://reviews.llvm.org/D61138
llvm-svn: 359215
Summary:
The opt level was not being passed down to the ThinLTO backend when
invoked via clang (for distributed ThinLTO).
This exposed an issue where the new PM was asserting if the Thin or
regular LTO backend pipelines were invoked with -O0 (not a new issue,
could be provoked by invoking in-process *LTO backends via the linker using the
new PM and -O0). Fix this similarly to the old PM, where -O0 only does the
necessary lowering of type metadata (WPD and LowerTypeTest passes) and
then quits, rather than asserting.
Reviewers: xur
Subscribers: mehdi_amini, inglorion, eraman, hiraditya, steven_wu, dexonsmith, cfe-commits, llvm-commits, pcc
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D61022
llvm-svn: 359025
Without this patch, APSInt inherits APInt::isNegative, which merely
checks the sign bit without regard to whether the type is actually
signed. isNonNegative and isStrictlyPositive call isNegative and so
are also affected.
This patch adjusts APSInt to override isNegative, isNonNegative, and
isStrictlyPositive with implementations that consider whether the type
is signed.
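A minimal sketch of the changed semantics (assuming llvm/ADT/APSInt.h):
```
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/APSInt.h"

bool demo() {
  // The value `true` as a 1-bit unsigned APSInt; its sign bit happens to be set.
  llvm::APSInt True(llvm::APInt(1, 1), /*isUnsigned=*/true);
  // APInt::isNegative() looks only at the sign bit and would say "negative";
  // APSInt now takes signedness into account.
  return True.isStrictlyPositive() && !True.isNegative();
}
```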
A large set of Clang OpenMP tests are affected. Without this patch,
these tests assume that `true` is not a valid argument for clauses
like `collapse`. Indeed, `true` fails APInt::isStrictlyPositive but
not APSInt::isStrictlyPositive. This patch adjusts those tests to
assume `true` should be accepted.
This patch also adds tests revealing various other similar fixes due
to APSInt::isNegative calls in Clang's ExprConstant.cpp and
SemaExpr.cpp: `++` and `--` overflow in `constexpr`, evaluated object
size based on `alloc_size`, `<<` and `>>` shift count validation, and
OpenMP array section validation.
Reviewed By: lebedev.ri, ABataev, hfinkel
Differential Revision: https://reviews.llvm.org/D59712
llvm-svn: 359012
Port mmintrin.h, which contains the x86 MMX intrinsics implementation, to the PowerPC platform (using Altivec).
To make the include process correct, PowerPC's toolchain class is overridden to insert a new headers directory (named ppc_wrappers) into the path. Basic test cases for several intrinsic functions are added.
The header is mainly developed by Steven Munroe, with contributions from Paul Clarke, Bill Schmidt, Jinsong Ji and Zixuan Wu.
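A small sketch of the kind of code that now builds for PowerPC (the function itself is illustrative):
```
#include <mmintrin.h>

// MMX-style code compiled for PowerPC; the operation is emulated with Altivec.
__m64 add_bytes(__m64 a, __m64 b) {
  return _mm_add_pi8(a, b);  // 8 x i8 add
}
```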
Reviewed By: Jinsong Ji
Differential Revision: https://reviews.llvm.org/D59924
llvm-svn: 358949
I plan to use this as the basis for backend IR test cases. We currently crash hard when using 32- or 64-bit mask registers without avx512bw.
llvm-svn: 358435
The pattern we replaced these with may be too hard to match as demonstrated by
PR41496 and PR41316.
This patch restores the intrinsics so that we can start focusing
on optimizing them.
I've mostly reverted the original patch that removed them, though I modified
the avx512 intrinsics to not have masking built in.
Differential Revision: https://reviews.llvm.org/D60674
llvm-svn: 358427
[MS] Add metadata for __declspec(allocator)
Original summary:
Emit !heapallocsite in the metadata for calls to functions marked with
__declspec(allocator). Eventually this will be emitted as S_HEAPALLOCSITE debug
info in codeview.
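A hedged usage sketch (my_alloc is a hypothetical allocator; an MSVC environment is assumed for __declspec):
```
#include <stddef.h>

__declspec(allocator) void *my_alloc(size_t n);

int *make_ints(size_t n) {
  // This call site is tagged with !heapallocsite metadata.
  return (int *)my_alloc(n * sizeof(int));
}
```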
Differential Revision: https://reviews.llvm.org/D60237
llvm-svn: 358307