Set the default alignment control variables for the z/OS target and add a test case for the alignment rules on z/OS.
Reviewed By: abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D88845
(it was introduced in https://lists.llvm.org/pipermail/llvm-dev/2015-January/080956.html)
This canonicalization seems dubious.
Most importantly, while it does not create `inttoptr` casts by itself,
it may cause them to appear later, see e.g. D88788.
I think it's pretty obvious that this is an undesirable outcome;
by now we've established that seemingly no-op `inttoptr`/`ptrtoint` casts
are not no-ops, and we are no longer eager to look past them.
Which means, e.g., that given
```
%a = load i32
%b = inttoptr %a
%c = inttoptr %a
```
we likely won't be able to tell that `%b` and `%c` are the same thing.
As we can see in D88789 / D88788 / D88806 / D75505,
we can't really teach SCEV about this (not without https://bugs.llvm.org/show_bug.cgi?id=47592 at least),
and we can't recover the situation post-inlining in InstCombine.
So it really does look like this fold is actively breaking
otherwise-good IR, in a way that is not recoverable.
And that means this fold isn't helpful for exposing the patterns it produces
to the passes that are otherwise unaware of them.
Thus, I propose to simply not perform such a canonicalization.
The original motivational RFC does not state what larger problem
the canonicalization was trying to solve, so I'm not sure
how this plays out in the larger picture.
On vanilla LLVM test-suite + RawSpeed, this results in an
increase of asm instructions and final object size by ~+0.05%;
it decreases the final count of bitcasts by -4.79% (-28990),
ptrtoint casts by -15.41% (-3423),
and inttoptr casts by -25.59% (-6919, *sic*).
Overall, there are -0.04% fewer IR blocks and -0.39% fewer instructions.
See https://bugs.llvm.org/show_bug.cgi?id=47592
Differential Revision: https://reviews.llvm.org/D88789
This is one of the reasons for the extra invalidations in D84959. In
practice, I don't think we have use cases needing this. This simplifies
the pipeline a bit and prunes corner cases when considering
invalidations.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D85676
We were taking multiple pointer arguments in the builtin.
gcc accepts a single void*.
The cast from void* to __m128i* caused the IR generation to assume
the pointer was aligned.
Instead make the builtin take a single void*, emit i8* GEPs to
adjust, then cast to <2 x i64>* and perform a store with an alignment of 1.
This patch implements the builtins for xvtdivdp, xvtdivsp, xvtsqrtdp, xvtsqrtsp.
The instructions correspond to the following builtins:
  int vec_test_swdiv(vector double v1, vector double v2);
  int vec_test_swdivs(vector float v1, vector float v2);
  int vec_test_swsqrt(vector double v1);
  int vec_test_swsqrts(vector float v1);
This patch depends on D88274, which fixes the bug in copying from CRRC to GPRC/G8RC.
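For reference, a minimal usage sketch (assuming a VSX-enabled PowerPC target and <altivec.h>; the exact condition the result encodes is defined by the underlying xvtdivdp-style test instructions):
```c
#include <altivec.h>

/* Returns the target's test-for-software-divide result for these operands. */
int test_sw_divide(vector double num, vector double den) {
  return vec_test_swdiv(num, den);
}
```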
Reviewed By: steven.zhang, amyk
Differential Revision: https://reviews.llvm.org/D88278
This is an alternate fix (see D87835) for a bug where a NaN constant
gets wrongly transformed into Infinity via truncation.
In this patch, we uniformly convert any SNaN to QNaN while raising
'invalid op'.
But we don't have a way to directly specify a 32-bit SNaN value in LLVM IR,
so those are always encoded/decoded by calling convert from/to 64-bit hex.
See D88664 for a clang fix needed to allow this change.
Differential Revision: https://reviews.llvm.org/D88238
This goes with the APFloat change proposed in
D88238.
This is copied from the MIPS-specific test in
builtin-nan-legacy.c to verify that the normal
behavior is correct on other targets without the
complication of an inverted quiet bit.
On some targets, the preferred alignment is larger than the ABI alignment in some cases. For example,
on AIX we have special power alignment rules which cause that. Previously, to support
those cases, we added a “PreferredAlignment” field to the `RecordLayout` to store the AIX
special alignment values, as the community suggested.
However, that patch alone is not enough. There are places in Clang where `PreferredAlignment`
should have been used instead of the ABI-specified alignment. This patch is aimed at fixing those
spots.
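For illustration, the kind of layout difference the power rules produce (a sketch; the specific numbers reflect my understanding of the AIX rules, not anything taken from this patch):
```c
struct FirstDouble {   /* first member is double: the struct's preferred   */
  double d;            /* alignment is raised to 8 under AIX power rules   */
  int i;
};

struct LaterDouble {   /* double is not the first member: no special       */
  int i;               /* treatment, the double stays at a lower alignment */
  double d;
};
```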
Differential Revision: https://reviews.llvm.org/D86790
Key Locker provides a mechanism to encrypt and decrypt data with an AES key without having access
to the raw key value by converting AES keys into “handles”. These handles can be used to perform the
same encryption and decryption operations as the original AES keys, but they only work on the current
system and only until they are revoked. If software revokes Key Locker handles (e.g., on a reboot),
then any previous handles can no longer be used.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D88398
This happens in glibc's headers. It's important that we recognize these
functions so that we can mark them as returns_twice.
Differential Revision: https://reviews.llvm.org/D88518
GCC 7 introduced -fprofile-update={atomic,prefer-atomic} (prefer-atomic makes a
best effort, since some targets do not support atomics) to increment counters
atomically, which is exactly what we have done with -fprofile-instr-generate
(D50867) and -fprofile-arcs (b5ef137c11).
This patch adds the option to clang to surface the internal options at the driver level.
GCC 7 also turned on -fprofile-update=prefer-atomic when -pthread is specified,
but that caused a performance regression
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89307), so we don't follow suit.
Differential Revision: https://reviews.llvm.org/D87737
This reverts commit 55c4ff91bd.
Issues were introduced as discussed in https://reviews.llvm.org/D88241
where this change made previous bugs in the linker and BitCodeWriter
visible.
Move abstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.
Currently, the compiler may transform code like
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
    p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
    bpf_probe_read(buf, buf_size, p2);
  }
to
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
  if (a) {
    bpf_probe_read(buf, buf_size, p2);
  }
and eventually the assembly code looks like
  reloc_exist = 1;
  reloc_member_offset = 10; // calculate member offset from base
  p2 = base + reloc_member_offset;
  if (reloc_exist) {
    bpf_probe_read(buf, buf_size, p2);
  }
If, during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (does not exist), the reloc_member_offset relocation cannot
be resolved and will be patched with an illegal instruction.
This will cause a verifier failure.
This patch attempts to address this issue by doing chaining
analysis and replacing chains with special globals right
after clang code gen. This removes the CSE possibility
described above. The IR typically looks like
  %6 = load @llvm.sk_buff:0:50$0:0:0:2:0
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6
for a particular address computation relocation.
But this transformation has another consequence: code sinking
may happen like below:
  PHI = <possibly different @preserve_*_access_globals>
  %7 = bitcast %struct.sk_buff* %2 to i8*
  %8 = getelementptr i8, i8* %7, %6
For such cases, we will not be able to generate relocations since
multiple relocations are merged into one.
This patch introduces a passthrough builtin
to prevent such optimization. Inline assembly seems to have more
impact on optimization, e.g., inlining, whereas using a passthrough builtin has
less impact on optimizations.
A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does the following:
- report a fatal error if any reloc global appears in a PHI node
- remove all bpf passthrough builtin functions
Changes for existing CORE tests:
- for clang tests, add the "-Xclang -disable-llvm-passes" flag to
avoid the builtin->reloc_global transformation so the tests are still
able to check correctness of the clang-generated IR.
- for llvm CodeGen/BPF tests, add an "opt -O2 <ir_file> | llvm-dis" command
before the "llc" command, since "opt" is needed to run the newly-placed
builtin->reloc_global transformation. Add a target triple to the IR
file since "opt" requires it.
- Since a target triple is added to the IR file, if a test may produce
different results for different endianness, two tests are
created, one for bpfeb and another for bpfel, e.g., some tests
for relocations of lshift/rshift of bitfields.
- field-reloc-bitfield-1.ll has different relocations compared to
the old code. This is because, for the structure in the test,
the new code computes a struct layout alignment of 4 while the old code
computed 8. Align 8 is more precise and permits a doubleword load. With align 4,
the new mechanism uses 4-byte loads, so it generates different
relocations.
- test intrinsic-transforms.ll is removed. It was used to test
CSE on intrinsics so we do not lose metadata. Now that metadata is attached
to the global and not the instruction, it won't get lost with CSE.
Differential Revision: https://reviews.llvm.org/D87153
Instead of explicitly emitting a setc in the inline asm instructions,
we can use flag output. This allows the backend to use the flag
directly if it is needed by a branch. Previously we needed a test
instruction to convert the register back to a flag.
If the flag can't be used directly, the backend will emit a setcc.
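Roughly, the kind of pattern this enables (a hand-written sketch, not the actual header change; the "=@ccc" carry-flag output constraint is assumed to be the one used):
```c
/* Read the carry flag produced by `bt` directly via a flag-output
   constraint instead of materializing it with setc. */
static inline int bit_is_set(unsigned long value, unsigned long bit) {
  int carry;
  __asm__("bt %2, %1" : "=@ccc"(carry) : "r"(value), "r"(bit));
  return carry;
}
```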
Differential Revision: https://reviews.llvm.org/D87888
This patch legalizes the v256i1 and v512i1 types that will be used for MMA.
It implements loads and stores of these types.
v256i1 is a pair of VSX registers, so for this type, we load/store the two
underlying registers. v512i1 is used for MMA accumulators. So in addition to
loading and storing the 4 associated VSX registers, we generate instructions to
prime (copy the VSX registers to the accumulator) after loading and unprime
(copy the accumulator back to the VSX registers) before storing.
This patch also adds the UACC register class that is necessary to implement the
loads and stores. This class represents accumulators in their unprimed form and
allows the distinction between primed and unprimed accumulators to avoid invalid
copies of the VSX registers associated with primed accumulators.
Differential Revision: https://reviews.llvm.org/D84968
- `-cl-fp32-correctly-rounded-divide-sqrt` is an OpenCL-specific option,
and the `correctly-rounded-divide-sqrt-fp-math` attribute should be added
for OpenCL only.
Differential revision: https://reviews.llvm.org/D88303
There appears to be a mis-compile with MemorySSA-backed DSE in
combination with llvm.lifetime.end. It currently appears like
DSE is doing the right thing and the llvm.lifetime.end markers
are incorrect. The reverted patch uncovers the mis-compile.
This patch temporarily switches back to the legacy DSE
implementation, while we investigate.
This reverts commit 9d172c8e9c.
Make the corresponding change that was made for byval in
b7141207a4. Like byval, this requires a
bulk update of the IR tests to include the type before this can be
be mandatory.
PAC/BTI-related codegen in the AArch64 backend is controlled by a set
of LLVM IR function attributes, added to the function by Clang, based
on command-line options and GCC-style function attributes. However,
functions generated in the LLVM middle end (for example,
asan.module.ctor or __llvm_gcov_write_out) do not get any attributes
and the backend incorrectly does not do any PAC/BTI code generation.
This patch records the default state of PAC/BTI codegen in a set of
LLVM IR module-level attributes, based on command-line options:
* "sign-return-address", with non-zero value means generate code to
sign return addresses (PAC-RET), zero value means disable PAC-RET.
* "sign-return-address-all", with non-zero value means enable PAC-RET
for all functions, zero value means enable PAC-RET only for
functions that spill LR.
* "sign-return-address-with-bkey", with non-zero value means use the B-key
for signing, zero value means use the A-key.
This set of attributes is always added for AArch64 targets (as
opposed, for example, to interpreting a missing attribute as having a
value of 0) in order to be able to check for conflicts when combining
module attributes during LTO.
Module-level attributes are overridden by function level attributes.
All the decision making about whether or not to generate PAC and/or
BTI code is factored out into AArch64FunctionInfo; there shouldn't be
any places left, other than AArch64FunctionInfo, that directly
examine PAC/BTI attributes, except AArch64AsmPrinter.cpp, which
is/will be handled by a separate patch.
Differential Revision: https://reviews.llvm.org/D85649
Adding this test so that I can extend it in a follow-on patch with
expected IR for AIX when I implement complex handling in
AIXABIInfo.
Reviewed By: daltenty, ZarkoCA
Differential Revision: https://reviews.llvm.org/D88105
Add the ability to selectively instrument a subset of functions by dividing the functions into N logical groups and then selecting a group to cover. By selecting different groups over time you could cover the entire application incrementally with lower overhead than instrumenting the entire application at once.
Differential Revision: https://reviews.llvm.org/D87953
This patch implements custom codegen for the vec_replace_elt and
vec_replace_unaligned builtins.
These builtins map to the @llvm.ppc.altivec.vinsw and @llvm.ppc.altivec.vinsd
intrinsics depending on the arguments. The main motivation for doing custom
codegen for these intrinsics is because there are float and double versions of
the builtin. Normally, converting the float to an integer would be done via
fptoui in the IR. This is incorrect, as fptoui truncates the value, and we must
ensure the value is not truncated. Therefore, we provide custom codegen to utilize
a bitcast instead, as bitcasts do not truncate.
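To illustrate the distinction the custom codegen relies on (a generic sketch, not the builtin itself):
```c
#include <stdint.h>
#include <string.h>

uint32_t as_value(float f) { /* fptoui-style conversion: 1.5f -> 1 */
  return (uint32_t)f;
}

uint32_t as_bits(float f) {  /* bitcast-style reinterpretation: 1.5f -> 0x3FC00000 */
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}
```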
Differential Revision: https://reviews.llvm.org/D83500
I believe the inline asm emitted here should have a memory clobber since it writes to memory.
It was also missing the dirflag clobber that we use by default along with flags and fpsr. To avoid missing defaults in the future, get the default list from the target.
Differential Revision: https://reviews.llvm.org/D88121
This patch implements the vec_[all|any]_[eq | ne | lt | gt | le | ge] builtins for vector signed/unsigned __int128.
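A minimal usage sketch of one of these (assuming a Power10 target, e.g. -mcpu=pwr10, and <altivec.h>):
```c
#include <altivec.h>

int all_equal(vector signed __int128 a, vector signed __int128 b) {
  return vec_all_eq(a, b); /* non-zero iff the elements compare equal */
}
```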
Differential Revision: https://reviews.llvm.org/D87910
D87921 was reverted in commit b89059a313
as it was causing an unknown llvm PPC bot failure. Reapplying the patch
after confirming that it is not responsible. The build bot failure that
caused the revert: https://reviews.llvm.org/D87921#2286644.
The wrong placement of the addPass call with optimizations enabled led to
-funique-internal-linkage-names being disabled.
Fixed the placement of the MPM.addPass call for UniqueInternalLinkageNames to make it
work correctly with -O2 and the new pass manager. Updated the tests to explicitly
check O0 and O1.
Differential Revision: https://reviews.llvm.org/D87921
This completes the circle, complementing -lto-embed-bitcode
(specifically, post-merge-pre-opt). Using -thinlto-assume-merged skips
function importing. The index file is still needed for the other data it
contains.
Differential Revision: https://reviews.llvm.org/D87949
This patch implements the vector string isolate (predicate and non-predicate
versions) builtins. The predicate builtins are custom selected within PPCISelDAGToDAG.
Differential Revision: https://reviews.llvm.org/D87671
This patch implements the 128-bit vector divide extended builtins in Clang/LLVM.
These builtins map to the vdivesq and vdiveuq instructions respectively.
Differential Revision: https://reviews.llvm.org/D87729
Set the default wchar_t type on z/OS, making it unsigned by default.
Reviewed By: hubert.reinterpretcast, fanbo-meng
Differential Revision: https://reviews.llvm.org/D87624
Fixed the placement of the MPM.addPass call for UniqueInternalLinkageNames to make
it work correctly with -O2 and the new pass manager. Updated the tests to
explicitly check O0 and O2.
Previously, the addPass was placed before BackendUtil.cpp#L1373, which is wrong
as MPM gets assigned at this point and any additions to the pass vector before
this are wrong. This change just moves it after MPM is assigned and places it at
a point where O0 and O0+ can share it.
Differential Revision: https://reviews.llvm.org/D87921
This patch implements the vec_gen[b|h|w|d|q]m function prototypes in altivec.h
in order to utilize the move to VSR with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82725
This switches to using DSE + MemorySSA by default again, after
fixing the issues reported after the first commit.
Notable fixes: fc82006331, a0017c2bc2.
This reverts commit 3a59628f3c.
This patch implements the vec_cntm function prototypes in altivec.h in order to
utilize the vector count mask bits instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82726
Currently assume x18 is used as the pointer to the shadow call stack. The user shall pass
the flags:
"-fsanitize=shadow-call-stack -ffixed-x18"
Runtime support is needed to set up x18.
If SCS is desired, all parts of the program should be built with -ffixed-x18 to
maintain interoperability.
There's no particular reason that we must use x18 as the SCS pointer. Any register
may be used, as long as it does not have a designated purpose already, like RA or
passing call arguments.
Differential Revision: https://reviews.llvm.org/D84414
Enable canonicalization of SPF_ABS and SPF_NABS to the abs intrinsic.
To be conservative, the one-use check on the comparison is retained;
this may be relaxed if all goes well.
It's pretty likely that this will uncover places that are missing
handling for the abs() intrinsic. Please report any observed performance
regressions.
Differential Revision: https://reviews.llvm.org/D87188
Instead of relying on whether a certain identifier is a builtin, introduce BuiltinAttr to specify a declaration as having builtin semantics.
This fixes incompatible redeclarations of builtins, as reverting the identifier as being builtin due to one incompatible redeclaration would have broken the rest of the builtin calls.
Mostly-compatible redeclarations of builtins also no longer have builtin semantics. They don't call the builtin nor inherit their attributes.
A long-standing FIXME regarding builtins inside a namespace enclosed in extern "C" not being recognized is also addressed.
Due to the more correct handling, attributes for builtin functions are added in more places, resulting in more useful warnings.
Tests are updated to reflect that.
Intrinsics without an inline definition in intrin.h had `inline` and `static` removed, as they had no effect and would otherwise cause the intrinsics to no longer be recognized as builtins.
A pthread_create() related test is XFAIL-ed, as it relied on it being recognized as a builtin based on its name.
The builtin declaration syntax is too restrictive and doesn't allow custom structs, function pointers, etc.
It seems to be the only case and fixing this would require reworking the current builtin syntax, so this seems acceptable.
Fixes PR45410.
Reviewed By: rsmith, yutsumi
Differential Revision: https://reviews.llvm.org/D77491
This patch adds support for implicit casting between GNU vectors and SVE
vectors when `__ARM_FEATURE_SVE_BITS==N`, as defined by the Arm C
Language Extensions (ACLE, version 00bet5, section 3.7.3.3) for SVE [1].
This behavior makes it possible to use GNU vectors with ACLE functions
that operate on VLAT. For example:
  typedef int8_t vec __attribute__((vector_size(32)));
  vec f(vec x) { return svasrd_x(svptrue_b8(), x, 1); }
Tests are also added for implicit casting between GNU and fixed-length
SVE vectors created by the 'arm_sve_vector_bits' attribute. This
behavior makes it possible to use VLST with existing interfaces that
operate on GNUT. For example:
  typedef int8_t vec1 __attribute__((vector_size(32)));
  void f(vec1);
  #if __ARM_FEATURE_SVE_BITS==256 && __ARM_FEATURE_SVE_VECTOR_OPERATORS
  typedef svint8_t vec2 __attribute__((arm_sve_vector_bits(256)));
  void g(vec2 x) { f(x); } // OK
  #endif
The `__ARM_FEATURE_SVE_VECTOR_OPERATORS` feature macro indicates
interoperability with the GNU vector extension. This is the first patch
providing support for this feature, which once complete will be enabled
by the `-msve-vector-bits` flag, as the `__ARM_FEATURE_SVE_BITS` feature
currently is.
[1] https://developer.arm.com/documentation/100987/latest
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D87607
This will embed bitcode after (Thin)LTO merge, but before optimizations.
In the case where the ThinLTO backend is called from clang, the .llvmcmd
section is also produced. Doing so in the case where the caller is the
linker doesn't yet have a motivation, and would require plumbing through
command line args.
Differential Revision: https://reviews.llvm.org/D87636
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm smax/smin/umax/umin intrinsics.
This patch updates the SSE/AVX integer MINMAX intrinsics to emit the generic equivalents instead of the icmp+select code pattern.
Differential Revision: https://reviews.llvm.org/D87603
This patch introduces the new .bb_addr_map section feature which allows us to emit the bits needed for mapping binary profiles to basic blocks into a separate section.
The format of the emitted data is represented as follows. It includes a header for every function:
| Address of the function | -> 8 bytes (pointer size)
| Number of basic blocks in this function (>0) | -> ULEB128
The header is followed by a BB record for every basic block. These records are ordered in the same order as MachineBasicBlocks are placed in the function. Each BB Info is structured as follows:
| Offset of the basic block relative to function begin | -> ULEB128
| Binary size of the basic block | -> ULEB128
| BB metadata | -> ULEB128 [ MBB.isReturn() OR MBB.hasTailCall() << 1 OR MBB.isEHPad() << 2 ]
The new feature will replace the existing "BB labels" functionality with -basic-block-sections=labels.
The .bb_addr_map section scrubs the specially-encoded BB symbols from the binary and makes it friendly to profilers and debuggers.
Furthermore, the new feature reduces the binary size overhead from 70% bloat to only 12%.
For more information and results please refer to the RFC: https://lists.llvm.org/pipermail/llvm-dev/2020-July/143512.html
Reviewed By: MaskRay, snehasish
Differential Revision: https://reviews.llvm.org/D85408
avx512-reduceIntrin.c wasn't bothering with the exhaustive alloca/store/load/bitcast checks and avx512-reduceMinMaxIntrin.c shouldn't need to either.
This makes it a lot easier to maintain, as the update script still doesn't work properly on x86 targets.
gcov is an "Edge Profiling with Edge Counters" application according to
Optimally Profiling and Tracing Programs (1994).
The minimum number of counters necessary is |E|-(|V|-1). The unmeasured edges
form a spanning tree. Both GCC --coverage and clang -fprofile-generate leverage
this optimization. This patch implements the optimization for clang --coverage.
The produced .gcda files are much smaller now.
i.e., change the workflow from
* .gcno for function A
* .gcno for function B
* .gcno for function C
* .gcda for function A
* .gcda for function B
* .gcda for function C
to
* .gcno for function A
* .gcda for function A
* .gcno for function B
* .gcda for function B
* .gcno for function C
* .gcda for function C
Currently there is duplicate logic in .gcno & .gcda processing: how functions
are filtered, which edges are instrumented, etc. This refactor enables simplification.
Since we always process .gcno, in -fprofile-arcs -fno-test-coverage mode,
__llvm_internal_gcov_emit_function_args.0 will have non-zero checksums.
After the recent discussion on cfe-dev 'Can indirect class parameters be
noalias?' [1], it seems like using noalias is problematic for
current C++, but should be allowed for C-only code.
This patch introduces a new option to let the user indicate that it is
safe to mark indirect class parameters as noalias. Note that this also
applies to external callers, e.g. it might not be safe to use this flag
for C functions that are called by C++ functions.
In targets that allocate indirect arguments in the called function, this
enables more aggressive optimizations with respect to memory operations
and brings a ~1% - 2% codesize reduction for some programs.
[1] : http://lists.llvm.org/pipermail/cfe-dev/2020-July/066353.html
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D85473
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
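For context, the kind of source construct that produces such an alignment assumption (a sketch; the builtin shown is the usual way to request one):
```c
int sum4(const int *p) {
  /* Assert to the compiler that p is 32-byte aligned; this is what gets
     lowered to the alignment-assumption code discussed above. */
  const int *q = (const int *)__builtin_assume_aligned(p, 32);
  return q[0] + q[1] + q[2] + q[3];
}
```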
d4ce862f introduced HasStrictFP to disable generating constrained FP
operations for platforms lacking support. Since work for enabling
constrained FP on PowerPC is almost done, we'd like to enable it.
Reviewed By: kpn, steven.zhang
Differential Revision: https://reviews.llvm.org/D87223
The tests have been updated and I plan to move them from the MSSA
directory up.
Some end-to-end tests needed small adjustments. One difference to the
legacy DSE is that legacy DSE also deletes trivially dead instructions
that are unrelated to memory operations. Because MemorySSA-backed DSE
just walks the MemorySSA, we only visit/check memory instructions. But
removing unrelated dead instructions is not really DSE's job and other
passes will clean up.
One noteworthy change is in llvm/test/Transforms/Coroutines/ArgAddr.ll,
but I think this comes down to legacy DSE not handling instructions that
may throw correctly in that case. To cover this with MemorySSA-backed
DSE, we need an update to llvm.coro.begin to treat its return value as
belonging to the same underlying object as the passed pointer.
There are some minor cases MemorySSA-backed DSE currently misses, e.g. related
to atomic operations, but I think those can be implemented after the switch.
This has been discussed on llvm-dev:
http://lists.llvm.org/pipermail/llvm-dev/2020-August/144417.html
For MultiSource/SPEC2000/SPEC2006, the number of eliminated stores
goes from ~17500 (legacy DSE) to ~26300 (MemorySSA-backed). More numbers
and details in the thread on llvm-dev.
Impact on CTMark:
```
Legacy Pass Manager
                         exec instrs    size-text
O3                       + 0.60%        - 0.27%
ReleaseThinLTO           + 1.00%        - 0.42%
ReleaseLTO-g.            + 0.77%        - 0.33%
RelThinLTO (link only)   + 0.87%        - 0.42%
RelLO-g (link only)      + 0.78%        - 0.33%
```
http://llvm-compile-time-tracker.com/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
```
New Pass Manager
                         exec instrs.   size-text
O3                       + 0.95%        - 0.25%
ReleaseThinLTO           + 1.34%        - 0.41%
ReleaseLTO-g.            + 1.71%        - 0.35%
RelThinLTO (link only)   + 0.96%        - 0.41%
RelLO-g (link only)      + 2.21%        - 0.35%
```
http://195.201.131.214:8000/compare.php?from=3f22e96d95c71ded906c67067d75278efb0a2525&to=ae8be4642533ff03803967ee9d7017c0d73b0ee0&stat=instructions
Reviewed By: asbirlea, xbolva00, nikic
Differential Revision: https://reviews.llvm.org/D87163
There are still plenty of tests that specify x86 as a triple but most shouldn't be doing anything very target specific - we can move any ones that I have missed on a case by case basis.
In the standard C library, both rint and nearbyint return the rounding result
in the current rounding mode, but nearbyint never raises the inexact exception.
On PowerPC, x(v|s)r(d|s)pic may modify FPSCR XX, raising the inexact
exception. So we can't select constrained fnearbyint into xvrdpic.
One exception here is xsrqpi, which will not raise the inexact exception, so
fnearbyint f128 is okay here.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D87220
This patch resumes the work of D16586.
According to the AAPCS, volatile bit-fields should
be accessed using containers of the width of their
declarative type. In such a case:
```
struct S1 {
short a : 1;
}
```
should be accessed using loads and stores of the width
(sizeof(short)), whereas now the compiler only loads
the minimum required width (char in this case).
However, as discussed in D16586,
that could overwrite non-volatile bit-fields, which
conflicted with the C and C++ object models by creating
data race conditions on data that is not part of the bit-field,
e.g.
```
struct S2 {
short a;
int b : 16;
}
```
Accessing `S2.b` would also access `S2.a`.
The AAPCS Release 2020Q2
(https://documentation-service.arm.com/static/5efb7fbedbdee951c1ccf186?token=)
section 8.1 Data Types, page 36, "Volatile bit-fields -
preserving number and width of container accesses" has been
updated to avoid conflict with the C++ Memory Model.
Now it reads in the note:
```
This ABI does not place any restrictions on the access widths of bit-fields where the container
overlaps with a non-bit-field member or where the container overlaps with any zero length bit-field
placed between two other bit-fields. This is because the C/C++ memory model defines these as being
separate memory locations, which can be accessed by two threads simultaneously. For this reason,
compilers must be permitted to use a narrower memory access width (including splitting the access into
multiple instructions) to avoid writing to a different memory location. For example, in
struct S { int a:24; char b; }; a write to a must not also write to the location occupied by b, this requires at least two
memory accesses in all current Arm architectures. In the same way, in struct S { int a:24; int:0; int b:8; };,
writes to a or b must not overwrite each other.
```
Patch D16586 was updated to follow such behavior by verifying that we
only change volatile bit-field access when:
- it won't overlap with any other non-bit-field member
- we only access memory inside the bounds of the record
- we avoid overlapping zero-length bit-fields.
Preserving the number of memory accesses will
be implemented by D67399.
Differential Revision: https://reviews.llvm.org/D72932
The following people contributed to this patch:
- Diogo Sampaio
- Ties Stuij
Discussed with @craig.topper and @spatel - this is to try and tidy up the codegen folder and move the x86 specific tests (as opposed to general tests that just happen to use x86 triples) into subfolders. It's up to other targets if they follow suit.
It also helps speed up test iterations as using wildcards on lit commands often misses some filenames.
We're now getting close to having the necessary analysis/combines etc. for the new generic llvm.abs.* intrinsics.
This patch updates the SSE/AVX ABS vector intrinsics to emit the generic equivalents instead of the icmp+sub+select code pattern.
Differential Revision: https://reviews.llvm.org/D87101
This patch implements the vec_expandm function prototypes in altivec.h in order
to utilize the vector expand with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82727
These overloads are listed in appendix A of the ELFv2 ABI specification
without a requirement for ISA 3.0. So these need to be available on
all Altivec-capable architectures. The implementation in altivec.h
erroneously had them guarded for Power9 due to the availability of
the VCMPNE[BHW] instructions. However these need to be implemented
in terms of the VCMPEQ[BHW] instructions on older architectures.
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47423
The load builtins in altivec.h do not have const in the signature
for the pointer parameter. This prevents using them for loading
from constant pointers. A notable case for such a use is Eigen.
This patch simply adds the missing const.
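A minimal sketch of the use case this unblocks (assuming an Altivec-enabled target and <altivec.h>; the pointer is assumed to be suitably aligned for vec_ld):
```c
#include <altivec.h>

vector float load_row(const float *row) {
  /* Previously rejected: the builtin's prototype lacked const on the
     pointer parameter. */
  return vec_ld(0, row);
}
```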
Fixes: https://bugs.llvm.org/show_bug.cgi?id=47408
The __ARM_FEATURE_SVE_BITS feature macro is specified in the Arm C
Language Extensions (ACLE) for SVE [1] (version 00bet5). From the spec,
where __ARM_FEATURE_SVE_BITS==N:
When N is nonzero, indicates that the implementation is generating
code for an N-bit SVE target and that the arm_sve_vector_bits(N)
attribute is available.
This was defined in D83550 as __ARM_FEATURE_SVE_BITS_EXPERIMENTAL and
enabled under the -msve-vector-bits flag to simplify initial tests.
This patch drops _EXPERIMENTAL now there is support for the feature.
[1] https://developer.arm.com/documentation/100987/latest
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D86720
This patch implements the Vector Multiply builtins (the vmulxxd family of instructions) and adds the appropriate test cases for these builtins. The builtins utilize the vector multiply instructions introduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83955
As a prerequisite to doing experimental builds of pieces of FreeBSD PowerPC64 as little-endian, allow actually targeting it.
This is needed so basic platform definitions are pulled in. Without it, the compiler will only run freestanding.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D73425
This patch implements the Vector Load with Zero and Signed Extend builtins (lxvr_x for b, h, w, d) and adds the appropriate test cases for these builtins. The builtins utilize the vector load instructions introduced with ISA 3.1.
Differential Revision: https://reviews.llvm.org/D82502#inline-797941
This relands D85743 with a fix for test
CodeGen/attr-arm-sve-vector-bits-call.c that disables the new pass
manager with '-fno-experimental-new-pass-manager'. The test was failing due
to IR differences with the new pass manager which broke the Fuchsia
builder [1]. Reverted in 2e7041f.
[1] http://lab.llvm.org:8011/builders/fuchsia-x86_64-linux/builds/10375
Original summary:
This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.
VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:
* Implicit casting between VLA <-> VLS types.
* Coercion of VLS types in function args/return.
* Mangling of VLS types.
Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.
The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:
__SVE_VLS<typename, unsigned>
where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:
#if __ARM_FEATURE_SVE_BITS==512
// Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
// Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
#endif
The latest ACLE specification (00bet5) does not contain details of this
mangling scheme, it will be specified in the next revision. The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.
[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85743
It's not undefined behavior for an unsigned left shift to overflow (i.e. to
shift bits out), but it has been the source of bugs and exploits in certain
codebases in the past. As we do in other parts of UBSan, this patch adds a
dynamic checker which acts beyond UBSan and checks other sources of errors. The
option is enabled as part of -fsanitize=integer.
The flag is named -fsanitize=unsigned-shift-base.
This matches the shift-base and shift-exponent flags.
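An example of the kind of shift the new check reports (well-defined C, but a set bit is shifted out):
```c
#include <stdint.h>

uint32_t scale(uint32_t x) {
  /* With -fsanitize=unsigned-shift-base (or -fsanitize=integer), a call
     such as scale(0x80000000u) is flagged at runtime because the top bit
     is lost. */
  return x << 1;
}
```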
<rdar://problem/46129047>
Differential Revision: https://reviews.llvm.org/D86000
This patch adjusts the following ARM/AArch64 LLVM IR intrinsics:
- neon_bfmmla
- neon_bfmlalb
- neon_bfmlalt
so that they take and return bf16 and float types. Previously these
intrinsics used <8 x i8> and <4 x i8> vectors (a leftover from the
implementation lacking a bf16 IR type).
The neon_vbfdot[q] intrinsics are adjusted similarly. This change
required some additional selection patterns for vbfdot itself and
also for vector shuffles (in a previous patch) because of SelectionDAG
transformations kicking in and mangling the original code.
This patch makes the generated IR cleaner (less useless bitcasts are
produced), but it does not affect the final assembly.
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D86146
This patch implements codegen for the 'arm_sve_vector_bits' type
attribute, defined by the Arm C Language Extensions (ACLE) for SVE [1].
The purpose of this attribute is to define vector-length-specific (VLS)
versions of existing vector-length-agnostic (VLA) types.
VLSTs are represented as VectorType in the AST and fixed-length vectors
in the IR everywhere except in function args/return. Implemented in this
patch is codegen support for the following:
* Implicit casting between VLA <-> VLS types.
* Coercion of VLS types in function args/return.
* Mangling of VLS types.
Casting is handled by the CK_BitCast operation, which has been extended
to support the two new vector kinds for fixed-length SVE predicate and
data vectors, where the cast is implemented through memory rather than a
bitcast which is unsupported. Implementing this as a normal bitcast
would require relaxing checks in LLVM to allow bitcasting between
scalable and fixed types. Another option was adding target-specific
intrinsics, although codegen support would need to be added for these
intrinsics. Given this, casting through memory seemed like the best
approach as it's supported today and existing optimisations may remove
unnecessary loads/stores, although there is room for improvement here.
Coercion of VLSTs in function args/return from fixed to scalable is
implemented through the AArch64 ABI in TargetInfo.
The VLA and VLS types are defined by the ACLE to map to the same
machine-level SVE vectors. VLS types are mangled in the same way as:
__SVE_VLS<typename, unsigned>
where the first argument is the underlying variable-length type and the
second argument is the SVE vector length in bits. For example:
#if __ARM_FEATURE_SVE_BITS==512
// Mangled as 9__SVE_VLSIu11__SVInt32_tLj512EE
typedef svint32_t vec __attribute__((arm_sve_vector_bits(512)));
// Mangled as 9__SVE_VLSIu10__SVBool_tLj512EE
typedef svbool_t pred __attribute__((arm_sve_vector_bits(512)));
#endif
The latest ACLE specification (00bet5) does not contain details of this
mangling scheme, it will be specified in the next revision. The
mangling scheme is otherwise defined in the appendices to the Procedure
Call Standard for the Arm Architecture, see [2] for more information.
[1] https://developer.arm.com/documentation/100987/latest
[2] https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#appendix-c-mangling
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85743
This patch adds type information for SVE ACLE vector types,
by describing them as vectors, with a lower bound of 0, and
an upper bound described by a DWARF expression using the
AArch64 Vector Granule register (VG), which contains the
runtime multiple of 64bit granules in an SVE vector.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86101
This patch implements the function prototypes vec_mulh and vec_dive in order to
utilize the vector multiply high (vmulh[s|u][w|d]) and vector divide extended
(vdive[s|u][w|d]) instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82609
Support -march=sapphirerapids for x86.
Compared with Icelake Server, it includes 14 more new features. They are
amxtile, amxint8, amxbf16, avx512bf16, avx512vp2intersect, cldemote,
enqcmd, movdir64b, movdiri, ptwrite, serialize, shstk, tsxldtrk, and waitpkg.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D86503
As suggested by @rsmith on PR47267, by replacing the builtin_memcpy bitcast pattern with builtin_bit_cast we can use _castf32_u32, _castu32_f32, _castf64_u64 and _castu64_f64 inside constant expressions (constexpr). Although __builtin_bit_cast was added for C++20, it works in all clang C/C++ modes.
Differential Revision: https://reviews.llvm.org/D86398
Currently ConstantExpr::getWithOperands does not handle FNeg and
subsequently treats FNeg as a binary operator, leading to an assertion
failure or a segmentation fault if built without assertions.
Originally I reproduced this with llvm-dis on a bitcode file, which I
unfortunately cannot share and also cannot really reduce.
But PR45426 describes the same issue and has a reproducer with Clang, so
I'll go with that.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D86274
Commit dbcfbffc adds the ppc.readflm and ppc.setflm intrinsics to read or
write the FPSCR register. This patch adds them to Clang.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D85874
This is a first step patch to enable constexpr support and testing for a large number of x86 intrinsics.
All I've done here is provide a DEFAULT_FN_ATTRS_CONSTEXPR variant of our existing DEFAULT_FN_ATTRS tag approach that adds constexpr on C++ builds. The clang CUDA headers do something similar.
I've started with POPCNT mainly as it's tiny and the intrinsics are wrappers around generic __builtin_* intrinsics which already act as constexpr.
Differential Revision: https://reviews.llvm.org/D86229
This adds parsing and codegen support for 'tune' in the target attribute.
I've implemented this so that 'arch' in the target attribute implicitly disables 'tune' from the command line. I'm not sure what gcc does here, but since -march implies -mtune, I assume 'arch' in the target attribute implies 'tune' in the target attribute.
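A hedged sketch of what the attribute accepts (the specific CPU names are only examples):
```c
/* Tune for a specific CPU without changing the required ISA. */
__attribute__((target("tune=icelake-server")))
int tuned(int x) { return x + 1; }

/* 'arch=' in the attribute implies 'tune=', overriding -mtune from the
   command line, mirroring how -march implies -mtune. */
__attribute__((target("arch=skylake")))
int arched(int x) { return x + 1; }
```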
Differential Revision: https://reviews.llvm.org/D86187
This patch implements the vec_extractm function prototypes in altivec.h in
order to utilize the vector extract with mask instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D82675
Pin the test to use -enable-npm-optnone.
Before, optnone wasn't implemented under NPM, so the LPM and NPM runs produced different IR. Now with -enable-npm-optnone, that is no longer necessary.
Reviewed By: ychen
Differential Revision: https://reviews.llvm.org/D86008
This fails due to the clang invocation running at -O0, producing an optnone function.
Then even with -O2 in the later invocations, LoopVectorizePass doesn't run on the optnone function.
So split this into an -O0 run and an -O2 run.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86011
When casting an enum with a fixed bool underlying type, the cast should use
an IntegralToBoolean instead of an IntegralCast, as is required per Core
Issue 2338.
Fixes PR47055: Incorrect codegen for enum with bool underlying type
Differential Revision: https://reviews.llvm.org/D85612
This was done by turning on -enable-npm-optnone and fixing failures.
That will be enabled in a follow-up change for ease of reverting.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D85457
These functions won't ever unwind. This is useful for MemorySanitizer
as it simplifies handling __atomic_load in particular.
Differential Revision: https://reviews.llvm.org/D85573
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83338
This recommits the following patches now that D85684 has landed
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
COFF targets have a max object alignment of 8192, so trying to create
one with a larger size results in an unreachable in WinCOFFObjectWriter.
The reproducer I have uses thread local storage; however, other
alignments are likely affected as well.
This patch sets the MaxVectorAlign for COFF to 8192. Additionally,
though there is no longer a way to reproduce that I could find, it
correctly sets the MaxTLSAlign for COFF to that value as well, so that
if anyone comes up with a situation where this is true, it will cause an
error.
Differential Revision: https://reviews.llvm.org/D85543
When we use mask compare intrinsics under the strict FP option, the masked
elements shouldn't raise any exception. So, we can't replace the
intrinsic with a full compare + "and" operation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D85385
Fixes pr/11710.
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Resubmit after breaking Windows and OSX builds.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D80242
This patch adds the missing information to the LF_BUILDINFO record, which allows for rebuilding a .CPP without any external dependency but the .OBJ itself (other than the compiler).
Some external tools that we are using (Recode, Live++) are extracting the information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO stores a full path to the compiler, the PWD (CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variables). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
If the CPU string is empty, the target feature map may end up having
an empty string inserted into it. The symptom of the problem is a warning
message:
'+' is not a recognized feature for this target (ignoring feature)
Also, the target-features attribute in the module will have an empty
string in it.
This patch adds noundef to return value and arguments of standard I/O functions.
With this patch, passing undef or poison to the functions becomes undefined
behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++,
passing undef to them was already UB in source.
With this patch, the functions cannot return undef or poison anymore as well.
According to the C17 standard, ungetc/ungetwc/fgetpos/ftell can generate an unspecified
value; 3.19.3 says an unspecified value is a valid value of the relevant type,
and using an unspecified value is unspecified behavior, which is not UB, so it
cannot be undef (using undef is UB when, e.g., it is used as a branch condition).
— The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10).
— The details of the value stored by the fgetpos function (7.21.9.1).
— The details of the value returned by the ftell function for a text stream (7.21.9.4).
In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D85345
This patch implements the function prototypes vec_extractl and vec_extracth in altivec.h to utilize the vector extract double element instructions introduced in Power10.
Differential Revision: https://reviews.llvm.org/D84622
The _ExtInt(1) in getTypeForMem was hitting the bool logic for expanding
to an 8-bit value. The result was an assert, or a `store i1 %0, i8* %2, align 1`,
since the parameter IS an i1. This patch changes the 'forMem' test to
exclude ext-int from the bool test.
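The pattern the fix is about, roughly (unsigned is used here so the 1-bit width is valid on its own):
```c
void set_flag(unsigned _ExtInt(1) *p, unsigned _ExtInt(1) v) {
  *p = v; /* previously this store tripped the bool-widening path in getTypeForMem */
}
```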
This patch simplified IR generation for __builtin_btf_type_id().
For __builtin_btf_type_id(obj, flag), previously the IR builtin
looked like
  if (obj is an lvalue)
    llvm.bpf.btf.type.id(obj.ptr, 1, flag) !type
  else
    llvm.bpf.btf.type.id(obj, 0, flag) !type
The purpose of the 2nd argument is to differentiate
__builtin_btf_type_id(obj, flag) where obj is an lvalue
vs.
__builtin_btf_type_id(obj.ptr, flag)
Note that obj or obj.ptr is never used by the backend
and the `obj` argument is only used to derive the type.
This code sequence is subject to potential llvm CSE when
- obj is the same, e.g., nullptr
- flag is the same
- metadata type is different, e.g., a typedef of struct "s"
and struct "s".
In the above, we don't want CSE since their metadata is different.
This patch changes the IR builtin to
  llvm.bpf.btf.type.id(seq_num, flag) !type
where seq_num is always increasing. This will prevent potential
llvm CSE.
Also report an error if the type name is empty for
a remote relocation, since remote relocation needs a non-empty
type name to do relocation against vmlinux.
Differential Revision: https://reviews.llvm.org/D85174
This allows people to use `int8_t` instead of `char`, -funsigned-char,
and generally decouples SIMD from the specialness of `char`.
And it makes intrinsics like `__builtin_wasm_add_saturate_s_i8x16`
and `__builtin_wasm_add_saturate_u_i8x16` use signed and unsigned
element types, respectively.
Differential Revision: https://reviews.llvm.org/D85074
This patch added the following additional compile-once
run-everywhere (CO-RE) relocations:
- existence/size of typedef, struct/union or enum type
- enum value and enum value existence
These additional relocations will make CO-RE bpf programs more
adaptive for potential kernel internal data structure changes.
For existence/size relocations, the following two code patterns
are supported:
1. uint32_t __builtin_preserve_type_info(*(<type> *)0, flag);
2. <type> var;
uint32_t __builtin_preserve_field_info(var, flag);
flag = 0 for existence relocation and flag = 1 for size relocation.
For enum value existence and enum value relocations, the following code
pattern is supported:
uint64_t __builtin_preserve_enum_value(*(<enum_type> *)<enum_value>,
flag);
flag = 0 means existence relocation and flag = 1 means enum value
relocation. In the above, <enum_type> can be an enum type or
a typedef to enum type. The <enum_value> needs to be an enumerator
value from the same enum type. The return type is uint64_t to
permit potential 64bit enumerator values.
Differential Revision: https://reviews.llvm.org/D83242
In order to follow the NEC Aurora SX VE ABI correctly, change to sign/zero
extend integer arguments and return values smaller than 64 bits in clang.
Also update the regression test.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D85071
Specified in https://github.com/WebAssembly/simd/pull/237, these
instructions load the first vector lane from memory and zero the other
lanes. Since these instructions are not officially part of the SIMD
proposal, they are only available on an opt-in basis via LLVM
intrinsics and clang builtin functions. If these instructions are
merged to the proposal, this implementation will change so that the
instructions will be generated from normal IR. At that point the
intrinsics and builtin functions would be removed.
This PR also changes the opcodes for the experimental f32x4.qfm{a,s}
instructions because their opcodes conflicted with those of the
v128.load{32,64}_zero instructions. The new opcodes were chosen to
match those used in V8.
Differential Revision: https://reviews.llvm.org/D84820
ipconstprop is going to get removed and checking opt with specific
passes makes the tests more fragile.
The tests retain the important checks that !callback metadata is created
correctly.
As mentioned on D70376, LVI can currently cause performance issues
when running under NewPM. The problem is that, unlike the legacy
pass manager, NewPM will not immediately discard the LVI analysis
if the following pass does not need it. This is a problem, because
LVI has a high memory requirement, and mass invalidation of LVI
values is very inefficient. LVI should only be alive during passes
that actively interact with it.
This patch addresses the issue by explicitly abandoning LVI after CVP,
which gets us back to the LegacyPM behavior.
Differential Revision: https://reviews.llvm.org/D84959
Power10 introduces new instructions for vector multiply, divide and modulus.
These instructions can be exploited by the builtin functions: vec_mul, vec_div,
and vec_mod, respectively.
This patch adds the function prototype vec_mod, as vec_mul and vec_div
have previously been implemented in altivec.h.
This patch also adds the following front end tests:
vec_mul for v2i64
vec_div for v4i32 and v2i64
vec_mod for v4i32 and v2i64
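A minimal usage sketch for the new prototype (assuming a Power10 target, e.g. -mcpu=pwr10, and <altivec.h>):
```c
#include <altivec.h>

vector signed int mod_words(vector signed int a, vector signed int b) {
  return vec_mod(a, b); /* expected to use the new vector modulus instructions */
}
```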
Differential Revision: https://reviews.llvm.org/D82576
fptosi/fptoui have similar, but not identical, semantics. In
particular, the behavior on overflow is different.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit. (The
corresponding patch for 32-bit is more involved because the equivalent
intrinsics don't exist, as far as I can tell.)
Differential Revision: https://reviews.llvm.org/D84703
Problem:
Right now, our "Running pass" is not accurate when passes are wrapped in adaptor because adaptor is never skipped and a pass could be skipped. The other problem is that "Running pass" for a adaptor is before any "Running pass" of passes/analyses it depends on. (for example, FunctionToLoopPassAdaptor). So the order of printing is not the actual order.
Solution:
Doing things like PassManager::Debuglogging is very intrusive because we need to specify Debuglogging whenever an adaptor is created. (Actually, right now we're not specifying Debuglogging for some sub-PassManagers. Check PassBuilder.)
This patch moves debug logging for passes into a PassInstrumentation callback. That way we can be sure that all running passes are logged and in the correct order.
This could also be used to implement hierarchy pass logging in legacy PM. We could also move logging of pass manager to this if we want.
The test fixes look messy. They include the following changes:
- Remove PassInstrumentationAnalysis
- Remove PassAdaptor
- If a PassAdaptor is for a real pass, the pass is added
- Pass reorder (to the correct order), related to PassAdaptor
- Add missing passes (due to Debuglogging not passed down)
Reviewed By: asbirlea, aeubanks
Differential Revision: https://reviews.llvm.org/D84774
A list of target features is disabled when there is no hardware
floating-point support. This is the case when one of the following
options is passed to clang:
- -mfloat-abi=soft
- -mfpu=none
This option list, however, is missing the extension "+nofp" that can be
specified in -march flags, such as "-march=armv8-a+nofp".
This patch also disables unsupported target features when nofp is passed
to -march.
Differential Revision: https://reviews.llvm.org/D82948
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
This patch implements the `vec_xst_trunc` function in altivec.h in order to
utilize the Store VSX Vector Rightmost [byte | half | word | doubleword] Indexed
instructions introduced in Power10.
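A rough usage sketch (function name invented; the exact overload set lives in altivec.h): store only the rightmost element of the quadword vector, truncated to the pointee width, at ptr + offset.
```
#include <altivec.h>

void store_rightmost_word(vector signed __int128 v, signed long long offset,
                          signed int *ptr) {
  vec_xst_trunc(v, offset, ptr); // expected to emit stxvrwx on Power10
}
```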
Differential Revision: https://reviews.llvm.org/D82467
Implement __builtin_eh_return_data_regno for SystemZ.
Match behavior of GCC.
Author: slavek-kucera
Differential Revision: https://reviews.llvm.org/D84341
Previously, the vins*vlx instructions were incorrectly defined with i64 as the
second argument. This patch fixes the issue by correcting the second argument
of the vins*vlx instructions/intrinsics to be i32.
Differential Revision: https://reviews.llvm.org/D84277
I was trying to pick this up a bit when reviewing D48426 (& perhaps D69778) - in any case, looks like D48426 added a module level flag that might not be needed.
The D48426 implementation worked by setting a module level flag; then, when code generating contents from the PCH, a special case in ASTContext::DeclMustBeEmitted would be used to delay emitting the definition of these functions if they came from a module with this flag.
This strategy is similar to the one initially implemented for modular codegen that was removed in D29901 in favor of the modular decls list and a bit on each decl to specify whether it's homed to a module.
One major difference between PCH object support and modular code generation, other than the specific list of decls that are homed, is the compilation model: MSVC PCH modules are built into the object file for some other source file (when compiling that source file /Yc is specified to say "this compilation is where the PCH is homed"), whereas modular code generation invokes a separate compilation for the PCH alone. So the current modular code generation test to decide if a decl should be emitted, "is the module where this decl is serialized the current main file", has to be extended (as Lubos did in D69778) to also test the command line flag -building-pch-with-obj.
Otherwise the whole thing is basically streamlined down to the modular code generation path.
This even offers one extra material improvement compared to the existing divergent implementation: Homed functions are not emitted into object files that use the pch. Instead at -O0 they are not emitted into the IR at all, and at -O1 they are emitted using available_externally (existing functionality implemented for modular code generation). The pch-codegen test has been updated to reflect this new behavior.
[If possible: I'd love it if we could not have the extra MSVC-style way of accessing dllexport-pch-homing, and just do it the modular codegen way, but I understand that it might be a limitation of existing build systems. @hans / @thakis: Do either of you know if it'd be practical to move to something more similar to .pcm handling, where the pch itself is passed to the compilation, rather than homed as a side effect of compiling some other source file?]
Reviewers: llunak, hans
Differential Revision: https://reviews.llvm.org/D83652
The implementation of the xvtlsbb builtins/intrinsics was not correct, as the
intrinsics previously used i1 as an argument type. This patch changes the i1
argument type used in these intrinsics to i32 instead, as having the second
argument as an i1 can lead to issues in the backend.
Differential Revision: https://reviews.llvm.org/D84291
For a long time, the InstCombine pass handled target specific
intrinsics. Having target specific code in general passes was noted as
an area for improvement for a long time.
D81728 moves most target specific code out of the InstCombine pass.
Applying the target specific combinations in an extra pass would
probably result in inferior optimizations compared to the current
fixed-point iteration, therefore the InstCombine pass resorts to newly
introduced functions in the TargetTransformInfo when it encounters
unknown intrinsics.
The patch should not have any effect on generated code (under the
assumption that code never uses intrinsics from a foreign target).
This introduces three new functions:
TargetTransformInfo::instCombineIntrinsic
TargetTransformInfo::simplifyDemandedUseBitsIntrinsic
TargetTransformInfo::simplifyDemandedVectorEltsIntrinsic
A few target specific parts are left in the InstCombine folder, where
it makes sense to share code. The largest left-over part in
InstCombineCalls.cpp is the code shared between arm and aarch64.
This allows moving about 3000 lines out of InstCombine into the targets.
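For illustration, a rough sketch of the shape of the first hook as a target might implement it (function name and wiring simplified, not the exact interface):
```
#include "llvm/ADT/Optional.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Transforms/InstCombine/InstCombiner.h"

using namespace llvm;

// InstCombine calls into TTI when it meets an intrinsic it does not know;
// returning None means "no target-specific simplification applies".
Optional<Instruction *> exampleInstCombineIntrinsic(InstCombiner &IC,
                                                    IntrinsicInst &II) {
  switch (II.getIntrinsicID()) {
  default:
    return None;
  // case Intrinsic::my_target_intrinsic:   // hypothetical intrinsic
  //   return IC.replaceInstUsesWith(II, /* simplified value */ nullptr);
  }
}
```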
Differential Revision: https://reviews.llvm.org/D81728
The LowerMatrixIntrinsics pass wasn't yet running under the new pass
manager; this adds LowerMatrixIntrinsics to the pipeline (in the same place
where it runs in the old PM).
Differential Revision: https://reviews.llvm.org/D84180
Sometimes we also want to avoid merging inline assembly. This patch adds
the nomerge function attribute to inline assembly.
Reviewed By: zequanwu
Differential Revision: https://reviews.llvm.org/D84225
Use 'o' for the mangling specification instead of 'e'. This fixes an
error in the backend caused by a mismatch between the data layouts
generated by the backend and the frontend.
rdar://problem/64168540
GCC r187297 (2012-05) introduced `__gcov_dump` and `__gcov_reset`.
`__gcov_flush = __gcov_dump + __gcov_reset`
The resolution to https://gcc.gnu.org/PR93623 ("No need to dump gcdas when forking", targeting GCC 11.0) removed the superfluous and undocumented __gcov_flush.
Close PR38064.
Reviewed By: calixte, serge-sans-paille
Differential Revision: https://reviews.llvm.org/D83149
Previously, the vins* intrinsics were incorrectly defined to have their second
and third arguments as i64. This patch fixes the second and third arguments
of the vins* instructions and intrinsics to be i32 instead.
Differential Revision: https://reviews.llvm.org/D83497
This reverts most of the following patches due to reports of miscompiles.
I've left the added test cases with comments updated to be FIXMEs.
1cf6f210a2 [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2 [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2 [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329a [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
4c5a93bd landed an adjustment to handle the C++20 no_unique_address attribute
correctly: clang treats empty members in aggregate types differently when they
carry this attribute. This commit adds the necessary test for the PowerPC
target to reflect this change.
In 2b3c505, the pointer arguments of the matrix load and store
intrinsics were changed to always use the element type of the vector
argument.
This patch updates the MatrixBuilder to not add the pointer type to the
overloaded types and adjusts the clang/mlir tests.
This should fix a few build failures on GreenDragon, including
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O0-g/7891/
Some of the system registers readable on AArch64 and ARM platforms
return different values with each read (for example a timer counter),
these shouldn't be hoisted outside loops or otherwise interfered with,
but the normal @llvm.read_register intrinsic is only considered to read
memory.
This introduces a separate @llvm.read_volatile_register intrinsic and
maps all system-registers on ARM platforms to use it for the
__builtin_arm_rsr calls. Registers declared with asm("r9") or similar
are unaffected.
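A minimal sketch of the kind of read this affects (AArch64 virtual counter; function name invented):
```
#include <stdint.h>

// With this change the builtin is lowered through
// @llvm.read_volatile_register, so repeated reads are not hoisted or merged.
static inline uint64_t read_virtual_counter(void) {
  return __builtin_arm_rsr64("cntvct_el0");
}
```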
Summary:
NOTE: There is a mailing list discussion on this: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html
Complementary to the assumption outliner prototype in D71692, this patch
shows how we could simplify the code emitted for an alignment
assumption. The generated code is smaller, less fragile, and it makes it
easier to recognize the additional use as an "assumption use".
As mentioned in D71692 and on the mailing list, we could adopt this
scheme, and similar schemes for other patterns, without adopting the
assumption outlining.
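A hedged source-level example of the kind of alignment assumption whose emitted code this simplifies (function name invented):
```
// Clang's lowering of __builtin_assume_aligned is what the simplified
// assumption encoding targets.
void *assume_32_byte_aligned(void *p) {
  return __builtin_assume_aligned(p, 32);
}
```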
Reviewers: hfinkel, xbolva00, lebedev.ri, nikic, rjmccall, spatel, jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: thopre, yamauchi, kuter, fhahn, merge_guards_bot, hiraditya, bollu, rkruppe, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71739
This change fixes an SEH bug (exposed by test58 & test61 in the MSVC test xcpt4u.c):
when an except-filter is located inside a finally, the frame pointer generated today
via the intrinsic @llvm.eh.recoverfp is the frame pointer of the immediate
parent _finally, not the frame pointer of the outermost host function.
The fix is to retrieve the establisher's frame pointer that was previously saved in
the parent's frame.
The prolog of a filter inside a _finally should look like the code below:
%0 = call i8* @llvm.eh.recoverfp(i8* bitcast (@"?fin$0@0@main@@"), i8*%frame_pointer)
%1 = call i8* @llvm.localrecover(i8* bitcast (@"?fin$0@0@main@@"), i8*%0, i32 0)
%2 = bitcast i8* %1 to i8**
%3 = load i8*, i8** %2, align 8
Differential Revision: https://reviews.llvm.org/D77982
Previously, Clang diagnosed this code by default:
void f(int a[static 0]);
saying that "static has no effect on zero-length arrays", which was
accurate.
However, static array extents require that the caller of the function
pass a nonnull pointer to an array of *at least* that number of
elements, but it can pass more (see C17 6.7.6.3p6). Given that we allow
zero-sized arrays as a GNU extension and that it's valid to pass more
elements than specified by the static array extent, we now support
zero-sized static array extents with the usual semantics because it can
be useful in cases like:
void my_bzero(char p[static 0], int n);
my_bzero(&c+1, 0); //ok
my_bzero(t+k,n-k); //ok, pattern from actual code
This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding an .OBJ without any external dependency but the .OBJ itself (other than the compiler executable).
Some tools need this information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD (which is the CWD at program startup), a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variable). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.
For more information see PR36198 and D43002.
Differential Revision: https://reviews.llvm.org/D80833
Use the new -fexperimental-strict-floating-point flag in more cases to
fix the arm and aarch64 bots.
Differential Revision: https://reviews.llvm.org/D80952
We currently have strict floating point/constrained floating point enabled
for all targets. Constrained SDAG nodes get converted to the regular ones
before reaching the target layer. In theory this should be fine.
However, the changes are exposed to users through multiple clang options
already in use in the field, and the changes are _completely_ _untested_
on almost all of our targets. Bugs have already been found, like
"https://bugs.llvm.org/show_bug.cgi?id=45274".
This patch disables constrained floating point options in clang everywhere
except X86 and SystemZ. A warning will be printed when this happens.
Use the new -fexperimental-strict-floating-point flag to force allowing
strict floating point on hosts that aren't already marked as supporting
it (X86 and SystemZ).
Differential Revision: https://reviews.llvm.org/D80952
Many platform ABIs have special support for passing aggregates that
either just contain a single member of floating-point type, or else
a homogeneous set of members of the same floating-point type.
When making this determination, any extra "empty" members of the
aggregate type will typically be ignored. However, in C++ (at least
in all prior versions), no data member would actually count as empty,
even if its type is an empty record -- it would still be considered
to take up at least one byte of space, and therefore make those ABI
special cases not apply.
This is now changing in C++20, which introduced the [[no_unique_address]]
attribute. Members of empty record type, if they also carry this
attribute, now do *not* take up any space in the type, and therefore
the ABI special cases for single-element or homogeneous aggregates
should apply.
The C++ Itanium ABI has been updated accordingly, and GCC 10 has
added support for this new case. This patch now adds support to
LLVM. This is cross-platform; it affects all platforms that use
the single-element or homogeneous aggregate ABI special case and
implement this using any of the following common subroutines
in lib/CodeGen/TargetInfo.cpp:
isEmptyField
isEmptyRecord
isSingleElementStruct
isHomogeneousAggregate
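An illustrative C++20 example (type names invented) of the case that now qualifies for the special treatment:
```
// With [[no_unique_address]], the empty member takes no storage, so Payload
// counts as a single-element aggregate and the floating-point ABI special
// case can apply.
struct Empty {};

struct Payload {
  [[no_unique_address]] Empty tag;
  double value;
};

double take(Payload p) { return p.value; }
```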
This enables _InterlockedAnd64/_InterlockedOr64/_InterlockedXor64/_InterlockedDecrement64/_InterlockedIncrement64/_InterlockedExchange64/_InterlockedExchangeAdd64/_InterlockedExchangeSub64 on 32-bit Windows
The backend already knows how to expand these to a loop using cmpxchg8b on 32-bit targets.
Fixes PR46595
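A minimal usage sketch (function name invented, assuming an i686-pc-windows-msvc style target):
```
#include <intrin.h>

__int64 add_to_counter(__int64 volatile *counter, __int64 delta) {
  // Now available on 32-bit Windows; the backend expands it to a cmpxchg8b loop.
  return _InterlockedExchangeAdd64(counter, delta);
}
```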
Differential Revision: https://reviews.llvm.org/D83254
The SystemZ ABI specifies that aggregate types with just a single
member of floating-point type shall be passed as if they were just
a scalar of that type. This applies to both struct and class types
(but not unions).
However, the current ABI support code in clang only checks this
case for struct types, which means that for class types, generated
code does not adhere to the platform ABI.
Fixed by accepting both struct and class types in the
SystemZABIInfo::GetSingleElementType routine.
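An illustrative example (type names invented) of the case this fixes:
```
// Per the SystemZ ABI both of these should be passed like a plain double;
// with this fix the class version now is as well.
struct SDouble { double d; };
class CDouble { public: double d; };

double pass_struct(SDouble v) { return v.d; }
double pass_class(CDouble v) { return v.d; }
```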
When using __sync_nand_and_fetch with __int128, the wrong value for the
'invert' operand gets emitted to the xor in the case where the integer size
is greater than 64 bits.
This is because the code uses llvm::ConstantInt::get, which zero extends the
value when the type is wider than 64 bits, so instead of the -1 that we
require, it ends up as 18446744073709551615.
This patch replaces the call to llvm::ConstantInt::get with the call
to llvm::Constant::getAllOnesValue which works for all integer types.
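A small sketch of the affected pattern (function name invented; requires a target with 128-bit atomics):
```
// nand-and-fetch computes *p = ~(*p & v); the all-ones 'invert' mask must
// cover the full 128 bits, which the fixed lowering now produces.
__int128 nand_fetch(__int128 *p, __int128 v) {
  return __sync_nand_and_fetch(p, v);
}
```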
Reviewers: jfp, erichkeane, rjmccall, hfinkel
Differential Revision: https://reviews.llvm.org/D82832
There are now more SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
Differential Revision: https://reviews.llvm.org/D82943
This covers both the existing memory functions as well as the new bulk memory proposal.
Added new test files since changes were also required in the inputs.
Also removes unused init/drop intrinsics rather than trying to make them work for 64-bit.
Differential Revision: https://reviews.llvm.org/D82821
Mark these tests as only failing on PowerPC. Avoids unexpected passes on
other bots.
Fingers crossed.
Differential Revision: https://reviews.llvm.org/D80952
We currently have strict floating point/constrained floating point enabled
for all targets. Constrained SDAG nodes get converted to the regular ones
before reaching the target layer. In theory this should be fine.
However, the changes are exposed to users through multiple clang options
already in use in the field, and the changes are _completely_ _untested_
on almost all of our targets. Bugs have already been found, like
"https://bugs.llvm.org/show_bug.cgi?id=45274".
This patch disables constrained floating point options in clang everywhere
except X86 and SystemZ. A warning will be printed when this happens.
Differential Revision: https://reviews.llvm.org/D80952
Summary:
Change stack alignment from 64 bits to 128 bits to follow ABI correctly.
And add a regression test for datalayout.
Reviewers: simoll, k-ishizaka
Reviewed By: simoll
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #llvm, #ve, #clang
Differential Revision: https://reviews.llvm.org/D83173
Making -g[no-]column-info opt out reduces the length of a typical CC1 command line.
Additionally, in a non-debug compile, we won't see -dwarf-column-info.
An assume bundle can have more than one entry with the same name,
but at least AlignmentFromAssumptionsPass::extractAlignmentInfo() uses
getOperandBundle("align"), which internally assumes that this isn't the
case, and happily crashes otherwise.
Minimal reduced reproducer: run `opt -alignment-from-assumptions` on
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
%0 = type { i64, %1*, i8*, i64, %2, i32, %3*, i8* }
%1 = type opaque
%2 = type { i8, i8, i16 }
%3 = type { i32, i32, i32, i32 }
; Function Attrs: nounwind
define i32 @f(%0* noalias nocapture readonly %arg, %0* noalias %arg1) local_unnamed_addr #0 {
bb:
call void @llvm.assume(i1 true) [ "align"(%0* %arg, i64 8), "align"(%0* %arg1, i64 8) ]
ret i32 0
}
; Function Attrs: nounwind willreturn
declare void @llvm.assume(i1) #1
attributes #0 = { nounwind "reciprocal-estimates"="none" }
attributes #1 = { nounwind willreturn }
This is what we'd have with -mllvm -enable-knowledge-retention
This reverts commit c95ffadb24.