These intrinsics should always take an immediate for the rounding mode.
The base instruction comes from before EVEX embdedded rounding. The
user should always provide the immediate rather than us assuming
CUR_DIRECTION.
Make the 512-bit versions also explicit aliases instead of copy
pasting the code.
llvm-svn: 363961
Summary:
This patch adds support for the handling of the variables under the declare target to clause.
The variables in this case are handled like link variables are. A pointer is created on the host and then mapped to the device. The runtime will then copy the address of the host variable in the device pointer.
Reviewers: ABataev, AlexEichenberger, caomhin
Reviewed By: ABataev
Subscribers: guansong, jdoerfert, cfe-commits
Tags: #clang
Differential Revision: https://reviews.llvm.org/D63108
llvm-svn: 363959
This is a stopgap measure to make it easier to integrate the PSTL into
libc++. In the future, we should have a system similar to what libc++
does, where we specify settings at configuration time and generate a
__config_site header that is part of the PSTL.
llvm-svn: 363958
Summary:
BLSI sets the C flag is the input is not zero. So if its followed
by a TEST of the input where only the Z flag is consumed, we can
replace it with the opposite check of the C flag.
We should be able to do the same for BLSMSK and BLSR, but the
naive test case for those is being optimized to a subo by
CodeGenPrepare.
Reviewers: spatel, RKSimon
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D63589
llvm-svn: 363957
The form that compares against 0 is better because:
1. It removes a use of the input value.
2. It's the more standard form for this pattern: https://graphics.stanford.edu/~seander/bithacks.html#DetermineIfPowerOf2
3. It results in equal or better codegen (tested with x86, AArch64, ARM, PowerPC, MIPS).
This is a root cause for PR42314, but probably doesn't completely answer the codegen request:
https://bugs.llvm.org/show_bug.cgi?id=42314
Alive proof:
https://rise4fun.com/Alive/9kG
Name: is power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp eq i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp eq i32 %a2, 0
Name: is not power-of-2
%neg = sub i32 0, %x
%a = and i32 %neg, %x
%r = icmp ne i32 %a, %x
=>
%dec = add i32 %x, -1
%a2 = and i32 %dec, %x
%r = icmp ne i32 %a2, 0
llvm-svn: 363956
Thought of this case while working on something else. We appear to get it right in all of the variations I tried, but that's by accident. So, add a test which would catch the potential bug.
llvm-svn: 363953
The attribute can specify elimination for leaf or non-leaf, so it
should always be considered. I copied this bug from AArch64, which
probably should also be fixed.
llvm-svn: 363949
This change reverts r363649; effectively re-landing r363626. At this point
clang::Index::CodegenNameGeneratorImpl has been refactored into
clang::AST::ASTNameGenerator. This makes it so that the previous circular link
dependency no longer exists, fixing the previous share lib
(-DBUILD_SHARED_LIBS=ON) build issue which was the reason for r363649.
Clang interface stubs (previously referred to as clang-ifsos) is a new frontend
action in clang that allows the generation of stub files that contain mangled
name info that can be used to produce a stub library. These stub libraries can
be useful for breaking up build dependencies and controlling access to a
library's internal symbols. Generation of these stubs can be invoked by:
clang -fvisibility=<visibility> -emit-interface-stubs \
-interface-stub-version=<interface format>
Notice that -fvisibility (along with use of visibility attributes) can be used
to control what symbols get generated. Currently the interface format is
experimental but there are a wide range of possibilities here.
Currently clang-ifs produces .ifs files that can be thought of as analogous to
object (.o) files, but just for the mangled symbol info. In a subsequent patch
I intend to add support for merging the .ifs files into one .ifs/.ifso file
that can be the input to something like llvm-elfabi to produce something like a
.so file or .dll (but without any of the code, just symbols).
Differential Revision: https://reviews.llvm.org/D60974
llvm-svn: 363948
If we construct an object in some arbitrary non-default addr space
it should fail unless either:
- There is an implicit conversion from the address space to default
/generic address space.
- There is a matching ctor qualified with an address space that is
either exactly matching or convertible to the address space of an
object.
Differential Revision: https://reviews.llvm.org/D62156
llvm-svn: 363944
Summary:
AIX system headers need stdint.h and inttypes.h to be re-enterable when macro _STD_TYPES_T is defined so that limit macro definitions such as UINT32_MAX can be found. This patch attempts to allow that on AIX.
Reviewers: hubert.reinterpretcast, jasonliu, mclow.lists, EricWF
Reviewed by: hubert.reinterpretcast, mclow.lists
Subscribers: jfb, jsji, christof, cfe-commits, libcxx-commits, llvm-commits
Tags: #LLVM, #clang, #libc++
Differential Revision: https://reviews.llvm.org/D59253
llvm-svn: 363939
This includes integer arithmetic of various kinds (add/sub/multiply,
saturating and not), and the immediate forms of VMOV and VMVN that
load an immediate into all lanes of a vector.
Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover
Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D62674
llvm-svn: 363936
AMDGPU target needs to override getRegClass() used during
instruction selection. We now may have either 32 or 64 bit
conditional registers used in the same instructions. For
that purpose special SReg_1 register class is created which
is dynamically resolved to either SReg_64 or SGPR_32 depending
on the subtarget attributes.
Differential Revision: https://reviews.llvm.org/D63205
llvm-svn: 363931
ELFState<ELFT>::addSymbols method looks a bit strange.
User code have to create the destination symbols vector outside,
add a null symbol and then pass it to addSymbols when it seems
the more natural logic is to isolate all work with symbols inside some
function, build the list right there and return it.
Differential revision: https://reviews.llvm.org/D63493
llvm-svn: 363930
Summary:
Our rule to create R_*_RELATIVE for absolute relocation types were
loose. D63121 made it stricter but it failed to create R_*_RELATIVE for
R_ARM_TARGET1 and R_PPC64_TOC. rLLD363236 worked around that by
reinstating the original behavior for ARM and PPC64.
This patch is an attempt to simplify the logic.
Note, in ld.bfd, R_ARM_TARGET2 --target2=abs also creates
R_ARM_RELATIVE. This seems a very uncommon scenario (moreover,
--target2=got-rel is the default), so I do not implement any logic
related to it.
Also, delete R_AARCH64_ABS32 from AArch64::getDynRel. We don't have
working ILP32 support yet. Allowing it would create an incorrect
R_AARCH64_RELATIVE.
For MIPS, the (if SymbolRel, then RelativeRel) code is to keep its
behavior unchanged.
Note, in ppc64-abs64-dyn.s, R_PPC64_TOC gets an incorrect addend because
computeAddend() doesn't compute the correct address. We seem to have the
wrong behavior for a long time. The important thing seems that a dynamic
relocation R_PPC64_TOC should not be created as the dynamic loader will
error R_PPC64_TOC is not supported.
Reviewers: atanasyan, grimar, peter.smith, ruiu, sfertile, espindola
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D63383
llvm-svn: 363928
ARM and RISC-V do not support TLS relaxations. However, for General
Dynamic and Local Dynamic models, if we are producing an executable and
the symbol is non-preemptable, we know it must be defined and the
R_ARM_TLS_DTPMOD32/R_RISCV_TLS_DTPMOD{32,64} dynamic relocation can be
omitted because it is always 1. This may be necessary for static linking
as DTPMOD may not be expected at load time.
Merge handleARMTlsRelocation() into handleTlsRelocation(). This requires
more logic to R_TLSGD_PC and R_TLSLD_PC. Because we use SymbolicRel to
resolve the relocation at link time, R_ARM_TLS_DTPMOD32 can be deleted
from relocateOne(). It cannot be used as a static relocation type.
As a bonus, the additional logic in R_TLSGD_PC code can be shared by the
TLS support for RISC-V (D63220).
Reviewed By: ruiu
Differential Revision: https://reviews.llvm.org/D63333
llvm-svn: 363927
We tend to only test for scalar/scalar consts when really we could support non-uniform vectors using ISD::matchUnaryPredicate/matchBinaryPredicate etc.
llvm-svn: 363924
Remove most of the abstraction over ptrace() register operations,
as it has little value and introduces more code than it saves.
Instead, leave a single ptrace() wrapper method and call it directly
from ReadRegisterSet() and WriteRegisterSet() with correct PT_* request
and buffer.
Remove the remaining direct ReadGPR() and WriteGPR() invocations
with ReadRegisterSet() and WriteRegisterSet().
Cleanup suggested by Pavel Labath in D63545.
Differential Revision: https://reviews.llvm.org/D63594
llvm-svn: 363923
That commit changed DIERef from a struct to a class, but did not update
the forward-declarations. This fixes one forward-declaration, and
removes other (unused) decls.
llvm-svn: 363915
Nothing of these tests made much sense. Loops were iterating too much, and I
also don't think it was actually testing anything. I think we simply want to
check that AEK_SOME_EXT returns "+some_ext".
I've given the AArch64 tests the same treatment as they very similarly didn't
made any sense either.
This fixes PR42316.
Differential Revision: https://reviews.llvm.org/D63569
llvm-svn: 363913